404 Handling is the process of managing what happens when a visitor comes to your site and enters a Url, or follows a link, that isn't valid. The correct handling of 404 errors can really mean the difference between a professional looking site, and one that still looks like it is in development. The name ‘404 Error’ is used because the web server returns a response code of ‘404’, meaning it couldn’t find a matching file/resource for the Url.
A well-developed 404 strategy will retain users, keep search engines up to date and in general make your site look like a polished and finished effort.
This post takes a look at what happens when a 404 occurs, how DotNetNuke handles it, and how the new version of Url Master (2.0) can handle 404 errors.
Background on DotNetNuke 404 Error Handling
In order to discuss issues behind DotNetNuke 404 handling, it’s worthwhile to just take a background look at how DotNetNuke handles all Urls that are requested. As I’m sure you are aware, DotNetNuke sits on the ‘Active Server Pages .NET’ (ASP.NET) platform, which runs in ‘Internet Information Services’ (IIS), which in turn runs on Microsoft Windows.
DotNetNuke Url Handling
This diagram demonstrates what happens when a Url is requested to a DotNetNuke installation. The steps are:
- Request from browser arrives at website
- IIS inspects the Url extension, and sends the request according to the current configuration. Static files (.jpg, .gif, .css etc) are read directly from the file system, and streamed back as the response. ASP.NET files (.aspx, .axd, .asmx, etc) are passed onto the ASP.NET runtime.
- ASP.NET requests are passed to the configured Url Rewriter for DotNetNuke.
a) If a match is found with a page in DotNetNuke, then the Url is rewritten so that it looks like /default.aspx?tabid=xx etc. ‘Default.aspx’ is one of the only physical files that you can see if you look in the website directory for a DotNetNuke installation.
b) If no match is found with a page (or rule) in DotNetNuke, the Url is not rewritten and is passed back to ASP.NET without modification.
c) ASP.NET will then attempt to locate the file (or handler, if defined in the web.config file) and execute the contents. This is the same for both rewritten requests and requests not touched by the Url Rewriter.
- If ASP.NET cannot find a matching file or handler for the Url, it will drop into its defined 404 error configuration. If a 404 Error handler is specified, then ASP.NET will do a 302 redirect to that Url, and the 404 page/handler will stream the contents back along with a ‘200 OK’ status.
- If ASP.NET did find a matching file or handler, that code is executed, the contents are streamed back and a ‘200 OK’ Http status is returned.
- For IIS static files, if IIS cannot find a matching file, it will look up the configuration of the IIS website. If it finds a custom 404 error handler, that page will be shown. Otherwise, it will show a standard ‘IIS’ 404 error page, which is different depending on the version and other settings. In both cases, IIS will return a ‘404 Not Found’ Http status. If the missing file is an image, you’ll see a broken image in the web page.
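To make the steps above concrete, here is a toy model of the request flow. It is written in Python purely for illustration and is not DotNetNuke or ASP.NET code. The function name `handle_request`, the `dnn_pages` lookup and the example Urls are all assumptions made for this sketch; the only thing it tries to capture is which Http statuses the browser ends up seeing in each case.

```python
import os

# Extensions that IIS hands to the ASP.NET runtime (from the list above).
ASPNET_EXTENSIONS = {".aspx", ".axd", ".asmx"}


def handle_request(path, dnn_pages, physical_files, custom_404=None):
    """Return the Http statuses a browser would see for one request.

    dnn_pages      -- hypothetical map of friendly Url -> tabid
    physical_files -- set of files that really exist on disk
    custom_404     -- Url of an ASP.NET 404 handler, or None if not configured
    """
    extension = os.path.splitext(path)[1].lower()

    # Static files are served (or not) by IIS directly.
    if extension not in ASPNET_EXTENSIONS:
        return [200] if path in physical_files else [404]

    # Url Rewriter: a known page becomes /default.aspx?tabid=xx.
    if path in dnn_pages:
        rewritten = "/default.aspx?tabid=%d" % dnn_pages[path]
        path = rewritten.split("?")[0]

    # ASP.NET looks for a matching physical file.
    if path in physical_files:
        return [200]

    # No match: with a handler configured, the visitor is redirected
    # (302) and the error page itself comes back as 200 OK.
    if custom_404 is not None:
        return [302, 200]
    return [404]
```

With a page map of `{"/about-us.aspx": 42}` and only `/default.aspx` and `/logo.gif` on disk, a request for `/missing.aspx` yields `[302, 200]` when a handler is configured, which is exactly the sequence that causes trouble with search engines.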
The reason I have detailed this process is to highlight problems with it. I’ll do that by showing what you want to happen when someone requests a file that doesn’t exist.
Ideal 404 Handling for a Website
This is what I think should happen when you get a 404 error on your site, regardless of underlying technology.
1. The 404 Error Page should immediately identify itself as associated with your site.
You can choose the wording, whether it’s a formal ‘404 Error on Url xyx’ or the informal ‘Whoopsie! Lost that document’. But it’s important that the page shown at least looks like it belongs to the target website, by some level of branding such as colours, fonts, logos.
2. The 404 Error Page should show a set of links where a person can get back on track. The gold standard is where it runs a search on the server and returns some results, but a simple set of links back into the main part of the site is probably good enough for most people. Just suggesting someone go back and try again isn’t much help, particularly if they clicked on a link in an email or on some social media site.
3. You should be aware of any 404 Errors happening.
4. Your 404 page must return a 404 status to search engine indexes.
The Http status codes are there for browsers and search engine indexing robots to understand and act upon. If you don’t return a 404 status code when a Url does not return anything, search engines don’t know that the page doesn’t exist. It’s very important to return a 404 status code so that dead links and pages can be removed from the page index.
How a standard DotNetNuke Installation Fails the 404 Criteria
Using the above criteria, here’s where a standard DotNetNuke installation fails:
1. There is no standard 404 page for DotNetNuke. Depending on ‘where’ the error lies, DNN will either show the ASP.NET 404 page (no branding, no links) or the IIS 404 Page (no branding, no links).
You can overcome this by creating your own 404 page, and putting it in the root of your website, so that any 404 errors are routed to that page. This is how dotnetnuke.com does it. The problem with this approach is when you have a multi-portal DotNetNuke installation – you are forced to have a generic page, because all portals will use the same 404 handler. There’s no way for ASP.NET to display a per-portal error page because ASP.NET just sees a DotNetNuke install as one website. The portal segmentation is achieved within DotNetNuke itself.
2. Again, having no standard error page means not having links for the site, and, again, if you have a multi-portal install, you can’t easily determine which links to show for which site.
3. DotNetNuke will not log any 404 errors in its site log, or through an analytics package (unless you include the analytics script on a custom 404 page – but again for multi-portal installs this can’t work properly). You have to look at the IIS website logs to discover what 404s are occurring.
4. Crucially, an ASP.NET application using this standard handling doesn’t actually return a correct 404 Http status to a search engine that follows a dead link. Instead, the search engine gets a 302 redirect, and then a 200 OK return status. The search engines assume that the original Url has now changed, and they index the 404 error page for that Url instead. If you don’t believe me, try searching Google for ‘the resource cannot be found’. Once you get past the first 5 pages of programmers asking questions on how to fix the problem, you’ll see thousands, if not millions, of ASP.NET sites that have the generic ASP.NET error page indexed for previously-valid Urls. While this probably doesn’t present too much of a problem, SEO wise, it would be better if search engines could automatically remove dead links for your site instead of leaving them lying around. Some links really are just dead and should be removed. Sometimes search engine ‘bots’ take guesses at Urls while indexing your site, and you definitely don’t want to have them endlessly indexing your site errors because every guessed-at link returns a valid 200 page.
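If you want to see what a search engine sees for a dead link on your own site, a small script can request the Url without following redirects automatically and record each status along the way. The following is a minimal sketch using only the Python standard library; the function names `status_chain` and `classify` and the labels they return are made up for this example.

```python
import http.client
from urllib.parse import urljoin, urlsplit

REDIRECT_STATUSES = (301, 302, 303, 307, 308)


def status_chain(url, max_hops=5):
    """Request url, following redirects by hand, and return every status seen."""
    statuses = []
    for _ in range(max_hops):
        parts = urlsplit(url)
        if parts.scheme == "https":
            conn = http.client.HTTPSConnection(parts.netloc, timeout=10)
        else:
            conn = http.client.HTTPConnection(parts.netloc, timeout=10)
        try:
            target = parts.path or "/"
            if parts.query:
                target += "?" + parts.query
            conn.request("GET", target)
            response = conn.getresponse()
            status = response.status
            location = response.getheader("Location")
        finally:
            conn.close()
        statuses.append(status)
        if status in REDIRECT_STATUSES and location:
            url = urljoin(url, location)  # Location may be relative
        else:
            break
    return statuses


def classify(statuses):
    """Label the status chain returned for a Url that is known to be dead."""
    if not statuses:
        return "no response"
    if statuses == [404]:
        return "true 404"
    if statuses[-1] == 200:
        return "soft 404"  # the dead Url looks like a live page to a robot
    if statuses[-1] == 404:
        return "404 after redirect"
    return "other"
```

Calling `classify(status_chain(...))` on a deliberately wrong Url for your site should give `"true 404"`. The 302 followed by 200 described above comes out as `"soft 404"`.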
For SEO purposes, though, the most important part is point 3: logging your 404 errors so that you can set up redirects to capture the valuable traffic if a popular link is wrong.
404 Handling with Url Master 2.0
[Note : Url Master 2.0 is currently in Beta Testing, and is not yet available for public release. This information is posted to alert readers of upcoming functionality which will be available soon.]
When Url Master version 1.0 came out, it didn’t take long before people started asking about how to handle 404 errors better in their DotNetNuke installs. The emails and forum posts have continued at a steady rate the whole time the module has been released. Thanks to many conversations with many different people, Url Master version 2.0 contains probably the most robust 404 handling you can put into an ASP.NET application. Here are the details:
Per-portal specification of 404 Pages
This is selected within the module settings for each portal in the installation. Within each portal on a site, these are the choices you have on how to handle 404 errors:
- Default ASP.NET 404 Handling (use existing methods)
This is for when you have implemented your own 404 handling scheme, and are happy with it, or you don’t particularly care about 404 errors.
- Show a selected DotNetNuke page for 404 Errors.
This method allows you to select a page in your DotNetNuke site to show when a 404 occurs. You can tailor the content of this page to exactly what you want users to see, without having any more skill than it takes to create any other DotNetNuke page. Because it is a full DNN page, it will show the correct menus, links, skin, logos, etc. You can place anything on the page; script, sitemap modules, html, flash, images – anything you can do with a DotNetNuke page.
- Show a specified Url in the site.
This method allows you to show another page/handler in your site for 404 errors. You might have a simple html page set up, or you might just want to use the standard DotNetNuke ErrorPage.aspx. Or, you might have written a customised handler for 404 errors which then redirects to a specific page depending on the Url. This option allows you to tailor the handling to your needs endlessly.
- Show a specific DotNetNuke page for some 404 Errors, and show a specified Url for others.
This method shows a specified DotNetNuke page (as per the second option) for most errors, but for requested Urls that match a regular expression, it uses a Url that you specify instead. This allows you to have a generic 404 for most errors, but a specific page for other types. For example, you can match on Urls that have ‘ProductId’ and show a product search page.
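The decision behind this last option boils down to a single regular expression test. Here is a rough sketch of that logic in Python. It is not the module’s actual code, and the function name `choose_404_target` and the example page names are invented for illustration.

```python
import re


def choose_404_target(requested_url, default_page, special_pattern, special_url):
    """Pick what to show for a 404: the special Url if the pattern matches,
    otherwise the general-purpose 404 page."""
    if re.search(special_pattern, requested_url, re.IGNORECASE):
        return special_url
    return default_page


# A dead product link goes to a (hypothetical) product search page;
# everything else gets the generic 404 page.
target = choose_404_target(
    "/shop/item.aspx?ProductId=17",
    default_page="/page-not-found.aspx",
    special_pattern=r"ProductId",
    special_url="/product-search.aspx",
)
```

Whether the real matching is case-insensitive is an assumption here; check the module settings if your Urls differ only by case.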
If you specify either showing a DotNetNuke page or Url within the site, the module will return a true 404 Http status when the page is shown. This method doesn’t use a 302 redirect to a specified page like the standard ASP.NET handling does. The original, incorrect Url will still be in the browser bar for a visitor, not a generic Url like ‘ErrorPage.aspx’.
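The difference between redirecting to an error page and returning a true 404 in place is easiest to see in a tiny example. The sketch below uses a plain Python WSGI application as a stand-in for the real thing (again, it is not how Url Master is implemented): the error content is chosen by host name, loosely mimicking per-portal pages, and sent back with a 404 status for the Url that was actually requested. The names `make_app` and `not_found_by_host` are assumptions for this example.

```python
def make_app(known_pages, not_found_by_host, fallback_html="<h1>404 Not Found</h1>"):
    """Build a WSGI app that answers unknown paths with a real 404.

    known_pages       -- map of path -> html for pages that exist
    not_found_by_host -- map of host name -> html of that site's 404 page
    """
    def app(environ, start_response):
        path = environ.get("PATH_INFO", "/")
        host = environ.get("HTTP_HOST", "")
        if path in known_pages:
            status = "200 OK"
            html = known_pages[path]
        else:
            # No redirect: the branded error page is returned for the
            # original Url, together with the 404 status.
            status = "404 Not Found"
            html = not_found_by_host.get(host, fallback_html)
        body = html.encode("utf-8")
        start_response(status, [
            ("Content-Type", "text/html; charset=utf-8"),
            ("Content-Length", str(len(body))),
        ])
        return [body]
    return app
```

To try it locally, pass the app to `wsgiref.simple_server.make_server("", 8000, app)` and call `serve_forever()`. Requesting an unknown path leaves that path in the browser bar and returns a 404 status.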
Note that all of the above options are strictly point-and-click. There’s no need to modify web.config files, edit custom html files or use FTP to move things around.
404 Error Logging
The Url Master module now has the ability to log 404 Errors per portal in your installation. What this means is that you have a convenient place to check on 404 errors and look for patterns of recurring Urls being requested. This allows you to identify any problem Urls, and, if found, you can set up a 301 redirect for that Url to somewhere valid in your site, using the Custom Redirect functionality built into the Url Master module.
The 404 Log will provide the following information:
- Date/Time of request
- Url requested that resulted in a 404.
- IP Address request originated from
- User Agent (browser type, or search engine bot type)
- The originally requested Url, if the Url was rewritten before returning the 404 (helps identify Url rewriting errors)
- The identified page or portal alias, if matched correctly
The 404 Log is self-managing and keeps its size to a maximum number of rows, to prevent filling up of database tables and disk space.
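For anyone curious how a self-trimming log like this can work, here is a minimal sketch using Python and an in-memory SQLite database. The table name, column names and the 1000-row cap are assumptions for illustration only, and don’t reflect the module’s actual schema or limit. It also shows the kind of ‘recurring Urls’ query you might run before setting up a redirect.

```python
import sqlite3
from datetime import datetime, timezone

MAX_ROWS = 1000  # assumed cap, for illustration only


def open_log():
    """Create an in-memory log table with the fields listed above."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE log404 (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               requested_at TEXT, portal_id INTEGER, url TEXT, ip TEXT,
               user_agent TEXT, original_url TEXT, matched TEXT)"""
    )
    return conn


def log_404(conn, portal_id, url, ip, user_agent,
            original_url=None, matched=None, max_rows=MAX_ROWS):
    """Record one 404, then drop the oldest rows beyond max_rows."""
    conn.execute(
        "INSERT INTO log404 (requested_at, portal_id, url, ip, user_agent,"
        " original_url, matched) VALUES (?, ?, ?, ?, ?, ?, ?)",
        (datetime.now(timezone.utc).isoformat(), portal_id, url, ip,
         user_agent, original_url, matched),
    )
    conn.execute(
        "DELETE FROM log404 WHERE id NOT IN"
        " (SELECT id FROM log404 ORDER BY id DESC LIMIT ?)",
        (max_rows,),
    )
    conn.commit()


def recurring_urls(conn, portal_id, min_hits=2):
    """Return (url, hits) pairs for Urls that keep producing 404s."""
    return conn.execute(
        "SELECT url, COUNT(*) AS hits FROM log404 WHERE portal_id = ?"
        " GROUP BY url HAVING COUNT(*) >= ? ORDER BY hits DESC, url",
        (portal_id, min_hits),
    ).fetchall()
```

Trimming on every insert keeps the example short. A real implementation might trim less often, or cap the log per portal rather than for the whole table.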
Other 404 Errors in IIS
Keen readers will note that this discussion hasn’t covered errors for non-DotNetNuke related Urls, such as static files like images and CSS files. These are not as important, because they show up as broken/missing pieces of information within the site, rather than entire pages showing errors. If custom errors are required for these items, I suggest using the standard IIS configuration to point the 404 handling to the pages you create, or create a new custom page to show 404 errors for static files.
In addition to 404 error handling, Url Master version 2.0 also contains similar handling for ‘500 Application Errors’. A 500 error is shown when something goes seriously wrong with the site and it is not working properly. You can use the same settings to show customised 500 error pages for your site, when it encounters unhandled errors. Unhandled errors are ones that DotNetNuke itself doesn’t detect (when you get an error in a module, you’ll still see the familiar ‘an error has occurred’ message). This gives administrators greater control over how the application behaves when things seriously go wrong. You can use a DotNetNuke page, or an external file in the site, like an html file. Or, you can just elect to use the ‘ErrorPage.aspx’. It’s important to note, though, that not all errors can be caught like this: some are so serious that the site never quite gets started in the first place. But you’ll know when that happens, because the server will return nothing at all.
I’m hoping that this is a comprehensive step forwards in 404 handling in DotNetNuke websites. I’m also hoping that this post doesn’t flush out any more suggestions and that I have covered all angles. I can’t wait to release the finished product and finally close the work item on my release management system. The one that says ‘Handle 404 Errors properly’, and that has been open and waiting since June 2008.
Thanks to all the customers, well-wishers and altruistic types who have contributed their suggestions to how 404 errors should be handled. You know who you are, and I can’t say enough how much this module improves through community feedback.