Redoing the DotNetNuke Friendly Url Provider for Human Reading and Search Engine Indexing
NOTE: Due to popular demand, there is now a Support Forum for support requests on this module, or if you just want the thing to work ASAP, I can provide one-on-one support or install it for you.
What I did previously was improve upon Scott McCulloch's work by working out a way to handle parameters. All well and good, and it worked OK. But it just still didn't look quite right. What's more, I wanted to use the rel="tag" microformat on my tagging module, and I was stuck with the .aspx extension on everything, which the microformat doesn't recognise.
So after some back and forwards on the Ventrian forum where I posted my update, I went away and had a good think about it. There were two things nagging at me:
- 1. The performance would drop off drastically with the number of pages in a site, due to the iterative search for a match.
- 2. The only way to get truly nice Urls is to ditch the .aspx page extension. When you think about it, it doesn't serve the user at all. It's really only there to make it easier on the web server. It had to go.
Doing number 1 was easy: just do a dictionary-based lookup on the page path. If it's found, great. If not, 404 coming right up.
It works like this (if you're not intimately familiar with DNN, you might want to skip to the next part). On the first request, a dictionary of path information is built up. This contains the path (example: mysite/mypage) and the actual DNN request Url (example: default.aspx?tabid=37). The dictionary is stashed away in the Cache.
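To make the shape of that dictionary concrete, here is a minimal sketch in Python rather than the provider's actual .NET code. The names `build_page_dictionary`, `get_page_dictionary` and `_cache` are made up for illustration, the plain dict is only a stand-in for the ASP.NET Cache, and lowercasing the path is my assumption, not something the provider is known to do.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical stand-in for the ASP.NET Cache: just a module-level dict.
_cache: Dict[str, Dict[str, str]] = {}


def build_page_dictionary(pages: Iterable[Tuple[str, int]]) -> Dict[str, str]:
    """Map each friendly page path to the DNN request Url behind it."""
    return {
        # Lowercasing and slash-trimming are assumptions made for this sketch.
        path.strip("/").lower(): f"default.aspx?tabid={tab_id}"
        for path, tab_id in pages
    }


def get_page_dictionary(pages: Iterable[Tuple[str, int]]) -> Dict[str, str]:
    """Build the dictionary on the first request, then reuse the cached copy."""
    if "page_dictionary" not in _cache:
        _cache["page_dictionary"] = build_page_dictionary(pages)
    return _cache["page_dictionary"]
```

With a single page (path mysite/mypage, tab id 37), the dictionary holds one entry mapping mysite/mypage to default.aspx?tabid=37, and later requests get the cached copy back without a rebuild.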
So far so good. The incoming url is deconstructed into segments. Working backwards from the full url (mysite/mypage/mypath/myvalue) the url is tried against the dictionary for a 'hit'. If a page entry in the dictionary is found, great, the url is rewritten and processing continues.
If the dictionary didn't contain the page, then the last segment is removed, so mysite/mypage/mypath/myvalue is trimmed to mysite/mypage/mypath. This trimming process is repeated until there are no segments left. If nothing was found after all that, the page dictionary is rebuilt one more time, just in case it's a new page, then the process is repeated. If still nothing, then no url rewriting will be done and a 404 will probably occur.
There is obviously a great deal more complexity in it than that, but that's the basic algorithm.
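The trim-and-retry lookup described above could be sketched roughly like this, again in Python as an illustration and not the provider's real code. `find_page`, `rewrite` and the `rebuild` callback are hypothetical names. Returning the trimmed-off segments alongside the hit is my assumption about what a caller would want; how the provider turns them into parameters isn't shown here.

```python
from typing import Callable, Dict, List, Optional, Tuple

Hit = Tuple[str, List[str]]  # (rewritten Url, segments trimmed off the end)


def find_page(path: str, pages: Dict[str, str]) -> Optional[Hit]:
    """Try the full path first, then drop one trailing segment at a time."""
    segments = [s for s in path.strip("/").split("/") if s]
    for end in range(len(segments), 0, -1):
        candidate = "/".join(segments[:end])
        if candidate in pages:  # one dictionary lookup per attempt
            return pages[candidate], segments[end:]
    return None  # ran out of segments without a hit


def rewrite(path: str, pages: Dict[str, str],
            rebuild: Callable[[], Dict[str, str]]) -> Optional[Hit]:
    """Look the path up; on a miss, rebuild the dictionary once and retry."""
    hit = find_page(path, pages)
    if hit is None:
        # It might be a brand new page, so rebuild one more time.
        hit = find_page(path, rebuild())
    return hit  # None means no rewriting is done (probably a 404)
```

The point of the design is that the cost depends on the number of segments in the url, not on the number of pages in the site, which is what fixes the performance worry from number 1.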
So onto number 2. Some will look at you with fear in their eyes when you mention removing the .aspx extension from asp.net pages and declare that 'you cannot change the laws of physics'. But in reality there is nothing difficult about removing the .aspx extension from asp.net pages. There are really two (easy) ways.
- -> Implement an ISAPI Rewriting DLL. There are some commercial ones available, and some open-source ones. I assume these work OK, but they looked like way too much work for me.
- -> Direct all calls to the website through asp.net by mapping a wildcard (*) to the aspnet_isapi.dll
Given I was already rewriting Urls, the second of those options was the path of least resistance for me. Now I realise that people using DotNetNuke on a shared host aren't going to be able to do this, but if you've got your own server, it's not difficult. By mapping all requests to go through asp.net, all requests for items on the website (or virtual directory) end up going through the DNN Url Rewriter. Eek - better make sure they work then.
To fix this, I implemented a couple of regex filters to be placed in the web.config. These restrict what items will be passed along to IIS unfettered and what will be rewritten by the UrlRewrite code. And that was about it: because of the way the dictionary lookup works, the .aspx is redundant anyway. Because all entries in the d