iFinity Blogs 

Building Friendly Urls into DotNetNuke Modules – Part 3 – Implementing new Url Schemes

by Bruce Chapman on Tuesday, November 23, 2010 6:47 PM

This is part 3 of a blog series about building Friendly Urls into DotNetNuke modules.  The previous parts are:

Part 1 : Why Friendly Urls and Understanding DotNetNuke Module Urls

Part 2 : Improving Module Urls

This third part in the series covers practical implementations of the ‘Level 1’ and ‘Level 2’ improvements introduced in Part 2.

Part 2 Recap

In Part 2 of the series, I introduced three levels of DNN Url improvement:

  1. Incorporating contextual Keywords into the Url
  2. Reformatting the Url for length and increasing the keyword ratio
  3. Creating totally custom Url values with only keywords in them

I’m happy to say that I’ve been contacted by several people developing new modules that have taken some of my advice on board and who are creating DNN modules right now with keyword-specific values. 

I’m going to cover how to implement the changes in levels 1 and 2, and give some important information about preserving existing links.

It’s time to delve in and look at some specific code to help us all work our way forwards.

Achieving Level 1 Urls in 10 minutes of Work

All DNN Urls should be generated by calling the NavigateUrl() function, which lives in the DotNetNuke.Common.Globals class (in code the method name is spelled NavigateURL).   This set of overloads takes the Tab Id and some other options, and returns a Url in the format that the DotNetNuke Url Rewriting system is designed to understand.   If you’re developing modules for DNN, you should know this call by heart.

However, underneath the covers, what really happens is that the NavigateUrl() call checks a global host setting to see if the ‘Friendly Urls’ value for the DNN installation is ‘true’, and if so, delegates the construction of the Url out to the Friendly Url Provider.  If you read Part 1 of the series, you’ll know that the Friendly Url Provider is one half of the DNN Url duo – the other half being the Url Rewriter.  It’s the Friendly Url Provider that constructs the Urls that we see through the DNN world, and returns something in the format of example.com/tabname/tabid/xx/key/value/default.aspx rather than example.com/default.aspx?tabid=xx&key=value.

One thing that the Friendly Url Provider API allows, and the ‘NavigateUrl’ calls do not, is supplying a ‘pagename’ parameter.

The NavigateUrl function(s) always supply ‘default.aspx’ as the ‘pagename’ parameter, which is why so many DNN Urls you see end in /default.aspx.  It is literally hard-coded that way: a constant value is passed into the Friendly Url API.

The trick here is to circumvent the NavigateUrl() call and just call the Friendly Url Provider directly, and supply your own ‘pagename’ value.  It doesn’t have to be default.aspx  - it can be anything ending in .aspx, such as ‘migratory-birds.aspx’.

Here’s the code:

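  //assume we know the tab (TabInfo), entryTitle (string) and entryId (int) relevant to the module
  //the page name must end in .aspx, e.g. 'migratory-birds.aspx'
  string friendlyUrlPart = MakeUrlSafe(entryTitle) + ".aspx";
  string url = "";
  if (HostSettings.GetHostSetting("UseFriendlyUrls") == "Y")
  {
     //Friendly Urls are on : call the provider directly so we can supply our own page name
     url = DotNetNuke.Services.Url.FriendlyUrl.FriendlyUrlProvider.Instance().FriendlyUrl(tab,
       "~/Default.aspx?TabId=" + tab.TabID.ToString() + "&entryId=" + entryId.ToString(), friendlyUrlPart);
  }
  else
  {
     //Friendly Urls are off : use the standard call (additional parameters are "key=value" strings)
     url = DotNetNuke.Common.Globals.NavigateURL(tab.TabID, "", "entryId=" + entryId.ToString());
  }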

 

That’s it.  Your only job is to come up with the right bit of text to feed in to create your Friendly[er] Url.  Oh, and a bit of code for ‘MakeUrlSafe’.   This procedure cleans up a Url by removing both ‘unsafe’ Url characters (such as : / & ?) and ‘unwanted’ characters, which would otherwise be encoded in the end result.  Unwanted characters include accented characters (å, ä, â and others) and characters like spaces and quotation marks.   You can either write this yourself or you can find an example in the ‘MultiContentController.cs’ class of the companion module to this series.  But you’ll have to read to the end of the post to find that.

Using code like this will result in Urls that end in your keyword page name, such as /migratory-birds.aspx, instead of /default.aspx.  And incidentally, that’s how the Blog module Urls are formed.  I think the movement to change the Blog module Urls was partially inspired by my original post, but someone will likely correct me if that isn’t the case at all.

In reality, it would be better if the ‘NavigateUrl’ call already contained an overload so you don’t have to check the ‘UseFriendlyUrls’ setting at all.  I originally opened a DotNetNuke support request in Gemini 2 years ago to get this : happily due to me writing this series, it appears to have been re-opened.  Here’s the request : http://support.dotnetnuke.com/issue/ViewIssue.aspx?id=7400&PROJID=23

The request outlines how the ‘pagename’ value could be incorporated into the base ‘NavigateUrl’ call to encourage module developers to write code similar to that outlined above.

Achieving Level 2 Urls – Reformat the Url and increase the Keyword Ratio

I’m not silly enough to put a time limit on this type of change because it’s highly dependent on the existing implementation for your module.    Sometimes it will be easy, sometimes more difficult.

In the example given in part 2, we want to change a Url from :

example.com/species/tabid/75/entryid/56/default.aspx

into:

example.com/species/56-migratory-birds

or, if the standard DNN Url Rewriting solution is to be used:
example.com/species/tabid/75/entry/56-migratory-birds/default.aspx

The first example is achieved through a combination of changing the way the module reads the Urls and implementing the Url Master module.

For now, we’ll concentrate on the changes to the module code lest this just sound like one big advertisement.

Removing or Replacing the Url ‘Key’ values

The first thing to note with the original Url is that ‘entryid’ means nothing, and is only used as a programming artefact.   Many developers reflexively use querystring key values like this without ever stopping to think about it.  They do this because they are bubbling up a part of the database design into the interface, and, well, ‘EntryId’ is the name of the column on the database, so it’s plain good coding to match the querystring value with the database column name.  Right?

Well, no.  The Url, as I have previously argued, is part of the user experience, and shouldn’t be considered a programming artefact like a variable name or database column name.  You wouldn’t show a visitor that the page has a variable called _sessionId or that all the initialization code runs in the Page_Init procedure, so don’t show them that your database has a column called ‘EntryId’.

So, the first step is to either customise that value in the Url, or, if you can, remove it altogether.

By customise, I mean, give the administrator of the site control over the value.  It’s dead simple to allow an administrator access to change the Key value for a querystring.  Just put it into the module settings as a value, then retrieve it when the module loads up, and then go looking for what is in the Url.  I’ve seen the highly-useful and flexible ‘Property Agent’ module from Ventrian be used for, well, things other than properties.   And it doesn’t look half strange when ‘propertyId’ is used to identify something that isn’t a property.
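
As a minimal sketch of that idea (not the code of any particular module): the helper below reads the key name from the module settings and then looks that key up in the querystring. The setting name ‘UrlKeyName’, the ‘entryid’ default and the class and method names are all assumptions made for illustration; the settings are passed in as a plain Hashtable so the sketch stands on its own.

```csharp
using System.Collections;
using System.Collections.Specialized;

public static class ModuleUrlKey
{
    // Hypothetical setting name and default key; neither comes from DNN itself.
    private const string KeySettingName = "UrlKeyName";
    private const string DefaultKey = "entryid";

    // Returns the raw querystring value stored under the administrator-chosen key,
    // or null if the key is not present in the request.
    public static string GetEntryValue(Hashtable moduleSettings, NameValueCollection queryString)
    {
        string keyName = moduleSettings[KeySettingName] as string;
        if (string.IsNullOrEmpty(keyName))
            keyName = DefaultKey; // nothing configured yet: fall back to the default key
        return queryString[keyName];
    }
}
```

The same setting would also need to be read when the module builds its links, so that the generated Urls and the lookup always agree on the key.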

By remove it altogether, I mean get rid of it completely.  You can only do this if you’ve got the Url Master module (or some other rewriting module that allows odd numbers of querystring items).  The Url Master module will translate the friendly url (/key/value/key2/value2) into a querystring like this : &key=value&key2=value2.

But if you don’t have any key items, then it will do this:

&value=value2

You can’t look up this item using Request.QueryString["key"] anymore.  But you can look it up using this type of logic:

//requires: using System.Collections.Generic; using System.Collections.Specialized; using System.Text.RegularExpressions;
private static List<string> GetEligibleQueryStringParameters(NameValueCollection queryString, string ignoreKeys)
{
    if (ignoreKeys == null)
        ignoreKeys = "Tab,error,ctl,runningDefault,language";
    //turn the comma-separated list into a regex alternation : Tab|error|ctl|...
    string regexFilter = ignoreKeys.Replace(',', '|');
    List<string> queryStringPath = new List<string>();
    foreach (string key in queryString.Keys)
    {
        if (key != null)
        {
            //skip the DNN control keys (tabid, ctl, etc)
            if (!Regex.IsMatch(key, regexFilter, RegexOptions.IgnoreCase))
            {
                //add the key
                queryStringPath.Add(key);
                //add the value
                queryStringPath.Add(queryString[key]);
            }
        }
        else
        {
            if (queryString[key] != null)
            {
                //you can have a null key and an actual value
                queryStringPath.Add(queryString[key]);
            }
        }
    }
    return queryStringPath;
}

What this logic does is take the QueryString collection from the Request.QueryString value, and return the values that aren’t part of the DNN Control values (ie Tabid, ctl, etc) as a simple string list.  If you feed a Url like this:

example.com/species/tabid/75/entryid/56/migratory-birds/default.aspx

into the above procedure, you’ll be returned a List that looks like this:

entryid,56,migratory-birds

It’s trivial to iterate this list looking for the thing that is a numeric value. You don’t even need to know what the original ‘key’ looked like.
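
To make that concrete, here is a small sketch of the iteration, assuming the list has the shape shown above. The class and method names are made up for illustration.

```csharp
using System.Collections.Generic;

public static class UrlPathReader
{
    // Walks the list of module-specific Url values and returns the first one
    // that parses as an integer. Returns false (and -1) if none does.
    public static bool TryFindNumericValue(IList<string> queryStringPath, out int id)
    {
        id = -1;
        foreach (string item in queryStringPath)
        {
            int parsed;
            if (int.TryParse(item, out parsed))
            {
                id = parsed;
                return true;
            }
        }
        return false;
    }
}
```

Fed the list entryid,56,migratory-birds, this finds 56 without ever needing to know that the key was called ‘entryid’.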

Note that you’ll want to allow some customisation of the ignoreKeys value.  I’d suggest putting that into the module settings with the default shown.  This would be to allow the module flexibility with future changes in DNN and cross-compatibility with other modules your module may share a page with.

So, using this logic, and if using the Url Master module, and removing the ‘entryid’ completely, you can have this Url:

example.com/species/56-migratory-birds

and, fed into the procedure, you’ll end up with this returned:

56-migratory-birds

Once you have this list, you can stop thinking in terms of ‘key/value’ pairs and just think about the data given to you by the Url.

Ideally you might allow an option, so that if the module-administrator wishes, they can (a) leave a ‘key’ value as a default, (b) substitute the key value with something else and (c) discard it altogether if the Url Rewriter being used allows it.

One final note : some sharp readers may be thinking ‘aha! but if someone modifies the Url slightly and changes the order, then this logic will no longer work’.  To that I say : correct!  We are trying to make canonical Urls here – one single Url that points to one single piece of content.  If someone messes up the Url and transposes the values – that Url is no longer valid, even if the valid information is still buried in there.  I’m intentionally discarding Url flexibility in the quest to make the Url short and concise.  If you can’t work out what to return, then either redirect the user to a ‘home’ (for the module) page or show a 404 error.

Rethinking the Url Contents

The next step is to re-think what actually goes into the Url.  Take, for example, a DNN Forum Url like this one:

http://www.ifinity.com.au/Products/Support_Forums/forumid/8/threadid/212/scope/posts

The most important item in this Url is the threadid.  The forumid is not needed if you have the threadid – the query that returns the thread from the database can easily determine the forumid.  And the ‘scope/posts’ – well, this is obviously used to control what is displayed on the page.  But you can indicate a condition by the absence of something.  So the module logic should assume ‘posts’ whenever the ‘scope’ value is missing.  Almost all web content has a default view – the one that an anonymous, newly arrived person will see if they just hit the site.

This is the most important Url of all, because it’s what new visitors and search engines will see.

All the other Urls (edit pages, special views, etc) aren’t anywhere near as important.  In fact, it doesn’t really matter if your edit Url is pig-ugly because you have to be logged in to view it, and it’s not going to get indexed or posted on Facebook.
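
The ‘assume the default view when the value is missing’ rule takes only a couple of lines. This is a sketch only: the ‘scope’ key and the ‘posts’ default come from the forum Url above, while the class and method names are invented for illustration.

```csharp
using System.Collections.Specialized;

public static class ForumUrlDefaults
{
    // Returns the requested scope, or "posts" when the Url carries no scope at all,
    // so the default view never needs to appear in the Url.
    public static string GetScope(NameValueCollection queryString)
    {
        string scope = queryString["scope"];
        return string.IsNullOrEmpty(scope) ? "posts" : scope;
    }
}
```

The link-building code then simply leaves the scope out whenever it equals the default, and only the less important Urls carry the extra segment.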

Just those two improvements would drastically reduce the size of the Url to /threadid/212.  And if I could customise the ‘Key’ value via the module settings, I could change that to be:

/support-request/212

Because the forum on this site is primarily for providing support of modules, that would make a lot more sense.

But we need to go further – that thread is the ‘Url Master Module Release History’ thread – something which you can’t deduce from just looking at the Url.

In the code listed above for ‘Level 1’ improvements, I suggested adding in a descriptive keyword to the Url via the Friendly Url API.  You could do that here, but you still end up with something like /212/url-master-release-history

What I’m going to suggest here is that you combine the Id and the descriptive text to create what I call a Url Value.  The Url Value is the bit of data that is used to find your content.  It breaks the 1:1 relationship between Id in the database and Id in the Url.

For the forum example, the Url Value could be /212-Url-Master-Release-History

Once you have the Url Value in your code, it’s pretty simple to get the relevant Id from the front using a little bit of Regex:

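//requires: using System.Text.RegularExpressions;
public static bool ReturnIdFromUrlValue(string urlValue, out int value)
{
    value = -1;//initialise to -1 to show if failed
    bool success = false;
    //capture the leading digits (the id) from a value like 212-Url-Master-Release-History
    Regex idRegex = new Regex(@"^(?<id>\d+)-.+", RegexOptions.IgnoreCase);
    Match idmatch = idRegex.Match(urlValue);
    if (idmatch.Success)
    {
        string rawValue = idmatch.Groups["id"].Value;
        if (int.TryParse(rawValue, out value))
            success = true;
        else
            value = -1;//TryParse resets the value to 0 on failure
    }
    return success;
}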

Note this includes no rewriting – just a way of interpreting the already-rewritten Url.

So, by combining the first piece of code (which decomposes the Url into a list of module-specific values) and this piece of code, you can obtain the correct Id from a Url that doesn’t look anything like the original.

Getting your Friendly Url

Now that I have shown how to get the correct id value from a much-changed Url, how do you go about creating your Friendly Url to start with?

The first item to decide on is where to get your Url Value from.  The ideal place is something that is already content and keyword specific.  Sometimes this might be something like a blog or forum title, or perhaps some other closely-related field.   You can also create a custom field so that administrators can enter a value in.  But if you do this, I would suggest pre-emptively filling this value out with something related.  An empty box can provoke severe writers block in some people, particularly if they aren’t completely sure what they are looking at.   Any cursory examination of website usernames will show that up pretty quickly.

Once you’ve decided on your source data, the next important thing is to clean up the value.   What this means is removing any non-ASCII characters from the content, and removing any illegal Url characters as well.  Illegal characters are those like :/?& which form part of the control characters for Urls.  Unwanted characters are those which become encoded when viewed in a browser, and include spaces, punctuation markers and other characters like accented characters such as å and è.  While some punctuation marks will show up without encoding, they tend to produce issues in the rather messy business of Url posting and sharing, so I recommend boiling it down to essentially a 26 character set and a couple of safe punctuation markers like – and _.

This next piece of code covers this process, and it comes from the MultiContent module which is the companion module for this series.

//requires: using System.Text;
public static string MakeUrlSafe(string urlName, string punctuationReplacement, int maxLength)
{
    if (urlName == null) urlName = "";
    //decompose accented characters into a base character plus combining marks
    urlName = urlName.Normalize(System.Text.NormalizationForm.FormD);
    const string illegalCharacters = "#%&*{}\\:<>?/+'.";
    const string unwantedCharacters = ";,\"+!'{}[]()^$*";
    StringBuilder outUrl = new StringBuilder(urlName.Length);
    int i = 0;
    foreach (char c in urlName)
    {
        if (!illegalCharacters.Contains(c.ToString()))
        {
            //can't have leading . or trailing .
            if (!((i <= 0 || i == urlName.Length - 1) && c == '.'))
            {
                if (c == ' ' || unwantedCharacters.Contains(c.ToString()))
                    //replace spaces, commas and semicolons
                    outUrl.Append(punctuationReplacement);
                else
                    //drop the combining marks, which leaves the unaccented base character
                    if (System.Globalization.CharUnicodeInfo.GetUnicodeCategory(c) != System.Globalization.UnicodeCategory.NonSpacingMark)
                    {
                        outUrl.Append(c);
                    }
            }
        }
        //note : the length limit counts input characters, not output characters
        i++;
        if (i >= maxLength)
            break;
    }
    string result = outUrl.ToString();
    //replace double replacements (skipped when there is no replacement character)
    if (!string.IsNullOrEmpty(punctuationReplacement))
    {
        string doubleReplacement = punctuationReplacement + punctuationReplacement;
        if (result.Contains(doubleReplacement))
        {
            result = result.Replace(doubleReplacement, punctuationReplacement);
            //once more for triples
            result = result.Replace(doubleReplacement, punctuationReplacement);
        }
    }
    return result;
}

This routine cleans up the Url and returns a Url Value that is safe for use, has unwanted punctuation replaced with a character you've supplied (which is adjustable through module settings, of course!) and is restricted in length.

While on the length topic, don’t make the length limit too short – it isn’t a creative writing challenge like trying to name a set of documents in DOS 2.0 with an 8.3 filename set.  Instead you should be aiming to balance between giving your end-users flexibility in creating a good Url Value and not giving them too much space to mess things up by including too much.  I personally feel that many blogging platforms would be better served by enforcing a shorter Url when the blog title itself is rather long.

Of course, if you’re going to include your Id value, then you can include that in a copy of the above routine, so that the entire logic gives back the value of what the ‘Url Value’ is going to be – all ready to feed into the NavigateUrl() call, to get back your Friendly Url.
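
One way that combination could look is sketched below. It assumes the title has already been cleaned by a routine like the one above and that ‘-’ is the separator between the Id and the text; the class and method names are hypothetical.

```csharp
using System.Globalization;

public static class UrlValueBuilder
{
    // Blends the database id with an already-cleaned title to form a Url Value
    // such as 212-Url-Master-Release-History.
    public static string BuildUrlValue(int id, string safeTitle)
    {
        string idPart = id.ToString(CultureInfo.InvariantCulture);
        if (string.IsNullOrEmpty(safeTitle))
            return idPart;
        // trim stray separators so we never emit 212--title or a trailing '-'
        string title = safeTitle.Trim('-');
        if (title.Length == 0)
            return idPart;
        return idPart + "-" + title;
    }
}
```

Note that a bare Id with no text (the fallback for an empty title) will not match an Id-dash-text pattern when the Url is read back, so the reading code would need to handle that case separately.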

Handling Old Urls when Replacing with New Urls

Once you’ve designed a great new Url scheme and implemented it, assuming you have an existing module – you now have a problem.  All those linked, bookmarked and indexed Urls still need to work.  You don’t want to 404 an entire back-catalog of links with the rollout of a new version of your module.

Handling your old Urls is simple : just leave the old Url handling code in place, and put your new Url handling code first in the logic, with a success/fail marker for whether the new routine was able to obtain the id from the Url. If it’s a fail, then you can check for the correct value using the old logic.
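
A sketch of that ordering, under stated assumptions: the new scheme is an Id-dash-text Url Value found somewhere in the list of module-specific Url values, and the old scheme is an ‘entryid’ querystring key. The class and method names are invented for illustration.

```csharp
using System.Collections.Generic;
using System.Collections.Specialized;
using System.Text.RegularExpressions;

public static class EntryIdResolver
{
    // New-style Url Values look like 56-migratory-birds.
    private static readonly Regex UrlValueRegex = new Regex(@"^(?<id>\d+)-.+");

    public static bool TryGetEntryId(IList<string> pathValues, NameValueCollection queryString, out int entryId)
    {
        // 1. new scheme first : look for an id-text Url Value
        foreach (string value in pathValues)
        {
            Match match = UrlValueRegex.Match(value ?? "");
            if (match.Success && int.TryParse(match.Groups["id"].Value, out entryId))
                return true;
        }
        // 2. new scheme failed : fall back to the old key/value lookup
        if (int.TryParse(queryString["entryid"], out entryId))
            return true;
        entryId = -1; // neither scheme produced an id
        return false;
    }
}
```

Supporting a further ‘old’ format is then just one more fallback step before the final failure.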

Once you have obtained the correct value (for example, the database id of the linked content) from your Url, you can then do a reverse comparison against the requested Url.   This will tell you whether or not the Url was requested with the correct, latest, friendly Url.  You can obtain the latest, friendly Url simply by calling the Friendly Url generation logic that you use to create your Friendly Urls.

When you’ve got the Url, then compare it to the incoming Url of the current request – if they are different, then you know the incorrect version was used – this approach is flexible enough that you can support any number of ‘old’ formats.

This code is pretty simple:

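public void RedirectIfOldUrl(int databaseId)
{
    string friendlyUrl = GetFriendlyUrl(databaseId);
    //the originally requested Url, as stored in the context items before rewriting
    string requestedUrl = Context.Items["UrlRewrite:OriginalUrl"] as string;
    //case-insensitive comparison; skip the redirect if the original Url isn't available
    if (requestedUrl != null && string.Compare(friendlyUrl, requestedUrl, true) != 0)
    {
        //add a header to explain what happened
        Response.AppendHeader("X-Redirect-Reason", "Old Url " + requestedUrl);
        //do a 301 redirect
        Response.Status = "301 Moved Permanently";
        Response.AddHeader("Location", friendlyUrl);
        //stop processing so the rest of the page isn't rendered
        Response.End();
    }
}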

You'll note that this also adds a response header in as part of the redirect. This is just something that I always put in when doing something like a redirect, because if it is playing up, you'll want to find out why, and most debugging tools like Firebug or Fiddler allow you to look at the response headers.

Also important is the use of case-insensitive comparisons - you don't want a difference in case alone to trigger a redirect loop. You may also wish to test your module in http/https configuration, and with different browsers as well.

Finally (and most important) you should have a setting which allows you to switch this feature off. If it isn't working, the usual symptom is a 301 redirect loop. Firefox and Chrome will detect this and show a message, but IE will just sit there blank, never actually loading the page. This will result in reports that the site is broken and you won't be able to diagnose what the problem is until you hook up some type of Http traffic monitoring tool.

Tying it all Together

You can mix and match all of the techniques here to modify the way your module code works with regards to the Urls.   However you do it, you’ll need to cover these three key areas:

  1. Interpreting your incoming requests to identify the correct / relevant id which will allow you to retrieve the unique piece of content associated with that Url.    If you keep the key/value paradigm you may not need to change anything.
  2. Building your Friendly Url – this is the process of taking the unique id and blending it with a text value from somewhere in your module to create the most relevant keyword text for the Url.
  3. Redirecting your Old Urls – working out when the best Url wasn’t used, and redirecting to that best Url.

Note that the techniques and samples on this page are ideally suited for integration into existing modules.  For new modules, the techniques presented in the next post in the series will go further than this.

Companion Module

This Blog Series (and the associated Conference presentation) has a companion module.  This is called the ‘iFinity MultiContent’ module.  This module is a simple way of displaying different Text/Html content on a page, by varying the Url value.  It’s built with the purpose of displaying the techniques that will be shown throughout this series of blog posts, and comes available with source code.

The MultiContent module shows ‘Level 3’ Urls at work, and the source code shows the techniques used to do this.

You can download it (for free) from this location : iFinity MultiContent Module

The next (and final) part of the series will cover the techniques used by the MultiContent module to create short, keyword-only module Urls.

3 comment(s) so far...

Lucas Jans 12/21/2010

Wow, this is a lot to take in. Thanks for putting this together. As a former SEO turned project manager, it drives me nuts to see bad URLs in DNN.

One question I have is about linking to pages within a module. DNN is great at inserting that javascript link which performs a postback and then redirects to the correct page. But for link juice, this is terrible. The spiders will have little reference of site structure and your juice will die on the first page.

What is the best way to insert the friendly URL as a link to other parts of the same module?

 
Bruce Chapman 12/21/2010

@Lucas that's a very good question and it's a bit remiss of me not to include it. If you have a module which has parts spread across several pages, then you shouldn't use postback links to go between pages. You should just use normal links. In order to know which tabs the other parts of the modules are on, it's best to include a dropdown within the module settings, configured at either a tabModule or module level. The drop down is a list of pages, and the administrator can choose which page the other 'part' of the module is on, which is stored in the module settings. Then, when the module is building links to the other parts of the module the settings are read, and the url generated in the normal way. An example would be a shopping cart module. The product listing page settings would have a setting for 'product detail page'. Then, when the links for the product details are being built, the setting would be read, and the links built to point to the details. Thus it is all spiderable and very seo friendly, particularly if the links include the product name as the main part of the Url.

 
Bruce Chapman 12/12/2011

New functionality has been added to the Url Master module, which allows you to build friendly url providers as a plug in. This allows developers to encapsulate much of the functionality described in this post, but with most of the more technical parts abstracted away and managed by the Url Master module.

See more on this here : http://www.ifinity.com.au/2011/10/26/Build_your_own_Custom_DotNetNuke_Module_Provider_and_start_creating_the_Friendly_Urls_you_need_for_your_SEO_strategy

(note above Url is built with new Blog Module Friendly Url Provider plug-in, which modifies the blog Urls for the DNN Blog module)
