iFinity Blogs 

A Google Sitemap Provider for DotNetNuke using the ASP.NET Provider model

by Bruce Chapman on Thursday, October 19, 2006 1:14 PM

Introduction

In a previous article I wrote about a Google Sitemap generator using the ASP.NET Provider model. That whole exercise was a learning experience, driven by wanting to sharpen my knowledge of the intricacies of the Provider model. I had only ever worked with existing code, and hadn't written my own providers from 'File->New Project', if you get my drift. This new entry takes that idea one step further and implements it for DotNetNuke-based websites.

It was all a lead-in to the 'real' project, which was developing a DotNetNuke-specific Google Sitemap generator using the Provider model. Sure, these are already around. I myself have downloaded and used the DotNetNuke Google Site Map developed by bitethebullet.co.uk (the person behind it seems to want to remain anonymous, but good work there). However, I found it slightly limited, particularly when you start adding modules where the number of URLs for a single page (tab) starts to grow. The standard DNN Blog module is one example: a single page can have many different URLs, one for each Blog and Blog Entry. Then you've got the date-based Archive URLs. A more sophisticated, module-specific Google Sitemap tool was (is) needed.

My Requirements

I develop websites based on DotNetNuke, and most of them share a common core which is pretty much the same: a series of pages with HTML modules on them. Specific websites have specific functionality, such as e-commerce modules, enquiry modules, and of course, blog modules. So, to get a Google Sitemap generator that would cater for all the different (read: complicated) module types, I realised I needed a flexible model. Enter the Provider model.

The requirements were:

1. Base SiteMap generator which would index 'normal' DotNetNuke pages, and obey rules to do with 'hidden' pages and pages available only to registered users.

2. Extensible model so that run-time configuration using web.config could be achieved for more complex modules.

3. Everything to be accomplished using Assemblies with no .aspx, .ashx or any other type of ASP.NET page. Everything had to drop into the \bin directory for deployment.

My Design

Using my original Google Sitemap provider model as a starting point, I added in DNN-specific code. The original prototype for an ordinary ASP.NET site simply iterated over all of the files on the web server and built up a sitemap based on physical files. As DotNetNuke uses a single page (default.aspx) and determines the content based on the request URL, the new design relies on reading the DNN Tabs collection for a specific portal, and building up the Sitemap this way.

Each page has to be checked for its security level. Pages that are not visible to the public are not put into the sitemap, and hidden pages are included in or left out of the sitemap based on a switch set in the web.config file.

This works quite well and generates a successful sitemap for any 'standard' DNN site (version 4 upwards). This takes care of my requirement number 1.

As in my earlier prototype design, the ASP.NET Handler was built into the provider DLL, so that the handler and the provider are in the same assembly. There are reasons for this not being conceptually correct, but I chose binary simplicity over true separation of components. This takes care of my requirement number 3.

Extending the Model

As in my requirement number 2, a flexible and extensible model is required to cater for more complex modules sitting on DotNetNuke pages. This is where the ASP.NET Provider model comes in. I ensured that the Base Provider Class was designed to be derived from by setting the class accessibility modifiers, and created a new Assembly called BlogGoogleSiteMap. The main type in this assembly, BlogGoogleSiteMapProvider, inherits from the original GoogleSiteMapProvider type. This gives it the base functionality for generating a Sitemap, transforming the starting URL into a DotNetNuke PortalAlias instance and other assorted functions. However, by redefining the SitePages(siteURL) method (which returns an object collection of the logical page URLs for the given actual URL), the new BlogGoogleSiteMap provider handles the specific Blog nuances.

Stitching it together is done in a procedure in the base DNN Google Site Map Provider type. This method, called <> reads each of the modules on a specific DNN page (actually an instance of a TabInfo object). For each of these modules, the ModuleDef type instance is loaded and the FriendlyName of the module definition is read. This gives us a unique indicator of what modules are on the page - in effect telling what the content of the page is. You could implement a switch or Case statement here, and call a specific piece of code based on the module definition. But that would violate requirement number 2 - an extensible model. Each specific module definition means a new Module-specific Assembly to reference, and the whole SiteMap Provider would lose portability. You'd have to upload every binary for every module you've ever programmed into the Google Sitemap provider.

Instead, the design uses a simple naming format of ModuleDef.FriendlyName + ".GoogleSiteMapProvider" to locate the correct provider for the specific module in the web.config.

For instance, when searching for a Blog entry in a collection of modules on a DNN page, you'll come across the ModuleDefinition FriendlyName of 'View_Blog'. To define a Google Sitemap Provider for this module definition, all that is required is an entry in the web.config:

The base DNNGoogleSiteMap Provider has this FriendlyName => correct Google Sitemap Provider lookup built into it. Each module on each page is checked for an entry in the 'googlesitemaps' configuration section. When it finds a matching module, it will load the named Provider and call it to get the list of URLs for the specific page/module combination. If it doesn't find a module entry in the config section, it just returns the 'normal' DNN page URL for the specific page without attempting to load any module-specific Provider.
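The lookup described above can be sketched in a few lines. This is only an illustration in Python, not the actual .NET code: the PROVIDERS dictionary stands in for the provider entries read from web.config, and blog_provider, urls_for_page and the example URLs are hypothetical names made up for the sketch.

```python
# Hypothetical stand-in for the provider entries read from web.config:
# keys are provider names, values are callables returning URLs for a page.
def blog_provider(page_url):
    # Placeholder only: a real blog provider would enumerate blogs and entries.
    return [page_url, page_url + "?entryid=1"]

PROVIDERS = {"View_Blog.GoogleSiteMapProvider": blog_provider}
SUFFIX = ".GoogleSiteMapProvider"  # naming convention from the article


def urls_for_page(page_url, module_friendly_names, providers=PROVIDERS):
    """Collect sitemap URLs for one page, consulting module-specific providers."""
    urls = []
    for friendly_name in module_friendly_names:
        # Provider name = ModuleDef.FriendlyName + ".GoogleSiteMapProvider"
        provider = providers.get(friendly_name + SUFFIX)
        if provider is not None:
            urls.extend(provider(page_url))
    # No module on the page has a registered provider: use the plain page URL.
    return urls or [page_url]
```

The point of the convention is that the base code never needs a compile-time reference to any module: adding a new module type only means adding a new named entry.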

Provider-per-module

With this framework in place, any specific module in DNN that uses more than just the standard page URL can have a specific GoogleSiteMap Provider developed for it. This Provider can then just be dropped into the specific website that uses that module. So if you are creating a lot of different DNN sites, all using a different mixture of modules, you can quickly configure the required Google SiteMap for the specific website, just by dropping different Assemblies into the \bin directory and modifying the web.config file. No recompiles or modifications to the DNN base are needed.

The DotNetNuke Blog Google Sitemap provider

The specific Blog Google Sitemap Provider works in a specific way. Firstly, by reverse engineering the Blog code and testing with some examples, I figured out that there really is only one 'Blog Set' per portal. You can put blogs and blog entries across specific pages on the site, and you can associate specific blogs with specific pages, but the relationship is on a portal->Blogs->Entries rather than Portal->Page->Blogs->Entries as you might expect on first glance. There isn't really a way of associating a specific blog with a specific page, as all blogs can be viewed on all blog-specific pages through the link navigation that comes standard in the blog module. I can understand why the designers did it this way, as it gives complete flexibility for a visitor to browse the entire set of blogs/entries on a site without hunting around for them.

With this in mind, each page can therefore have a full set of Blog-related URLs associated with it. Depending on the number of blogs, this can quickly build to a high number. But most Blog installations I have seen tend to stick the entire blog-set on a single page in the site and leave it at that.

The Blog Sitemap Provider iterates through each Blog and Blog entry and puts in an entry for each specific URL in the Blog. The set of URLs for a specific page might be:

http://www.yoursite.com/blog/tabid/15/default.aspx //the standard blog page
http://www.yoursite.com/blog/tabid/15/blogid/1/default.aspx //the standard page for BlogID = 1
http://www.yoursite.com/blog/tabid/15/entryid/2/default.aspx //the specific URL for EntryID = 2 (entries are unique across all blogs)
http://www.yoursite.com/blog/tabid/15/blogid/1/entryid/2/default.aspx //the specific URL for BlogID = 1, EntryID = 2

The last URL (BlogID, EntryID) will produce the same page as the 3rd in the list (EntryID only) because each EntryID is unique across the Portal, regardless of which actual Blog it belongs to. Since the two URLs return identical content, and following Google's guidelines about identical content, the last URL doesn't get submitted by the Blog Google Sitemap Provider.
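As a rough sketch of that enumeration (in Python for illustration, not the provider's real code), the function below builds the URL set for one blog page and deliberately leaves out the combined blogid/entryid form. The function name and the shape of the blogs argument are assumptions made for the example.

```python
def blog_urls(base, tab_id, blogs):
    """Build the sitemap URLs for a blog page.

    blogs maps a blog id to the list of its entry ids; entry ids are
    assumed to be unique across the whole portal, as described above.
    """
    root = f"{base}/blog/tabid/{tab_id}"
    urls = [f"{root}/default.aspx"]  # the standard blog page
    for blog_id, entry_ids in blogs.items():
        urls.append(f"{root}/blogid/{blog_id}/default.aspx")
        for entry_id in entry_ids:
            # Entry-only form. The blogid + entryid form would return the
            # same page, so it is skipped to avoid duplicate content.
            urls.append(f"{root}/entryid/{entry_id}/default.aspx")
    return urls
```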

The Blog Google Sitemap Provider also has a configurable attribute in its web.config Provider entry which specifies whether or not to include the Blog archive. Setting this to true includes links to the Archive of blogs. These pages may or may not be identical to the individual Entry URLs, depending on whether the site customarily has at most one entry per day. It is a judgement call by the website owner whether including the Archives in the Google Sitemap is necessary or not. Archives have a specific URL pattern, and for some reason this always reverts to parameter-driven (non-friendly??) URLs, such as:

http://www.yoursite.com/default.aspx?tabid=15&BlogDate=2006-10-11

The way it actually works is that the Blog page will show all entries in the month up until the date specified. So submitting a date of 11-Oct-2006 will return all Blog entries from the 1st October, 2006 to the 11th October, 2006. Again, whether or not this produces a unique page depends on the number of entries. A blog with one entry a month will produce roughly the same page content, but a blog with one or two entries a week will provide pages distinct enough to make including the Archives worthwhile. Remember that there will be a distinct URL for each and every Blog entry. There is a limit on the Google Sitemap file of 50,000 URLs, but if you've done 50,000 Blog entries perhaps a career in writing awaits you instead of configuring Google Sitemaps.
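To make the archive behaviour concrete, here is a small Python sketch. The URL pattern is copied from the example above; the helper names archive_url and archive_window are hypothetical.

```python
from datetime import date


def archive_url(base, tab_id, day):
    """Archive URL in the parameter-driven form shown above."""
    return f"{base}/default.aspx?tabid={tab_id}&BlogDate={day.isoformat()}"


def archive_window(day):
    """Range of entries an archive date shows: the 1st of that month up to the date."""
    return day.replace(day=1), day
```

For 11-Oct-2006, archive_window gives the range 1 October 2006 to 11 October 2006, which is why two archive dates in the same month can return heavily overlapping pages.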

The page update frequency and page priority elements in the standard Google schema are optional, and there is a school of thought that says 'don't provide any information, it can only get you into trouble'. I don't agree with this, and obviously Google wants to know how often your pages get updated. Some might want to say 'every day' and 'priority = 1' thinking this will get them more frequent Googlebot visits, and somehow, higher page rankings. Google couldn't be clearer on this point: their Sitemap help section states that these are hints only, and it is up to the Googlebot whether it follows them or not. I figure there is no point telling the Googlebot that a page is updated daily when in reality it never changes. It wouldn't take much of a smart programmer to compare the last cached copy with the current version of a page and determine that no content is different.

With this in mind, in the Blog Provider, I have developed a simple algorithm to compare the time between entries, and to supply a rough estimate of page update frequency depending on how often the page is getting updated. For the Blog pages, this is how many new entries are going in. For the Entry pages, this really is dependent on how many new comments are being added to your Blog entry. If nobody comments on the page (or you have comments turned off) then that entry, once posted, is probably never going to change. Accordingly, it will be shown as PageUpdateFrequency=Never in the associated Sitemap.
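One way such an estimate could work is sketched below in Python. This is not the algorithm from the provider itself: the use of an average gap and the day thresholds are assumptions chosen for the illustration. The returned strings are standard sitemap changefreq values.

```python
from datetime import datetime, timedelta


def estimate_changefreq(timestamps):
    """Map the average gap between updates to a sitemap changefreq value.

    timestamps holds the times a page changed: new entries for a blog
    page, or the post time plus comment times for an entry page.
    """
    if len(timestamps) < 2:
        # Posted once and never touched again (for example, no comments).
        return "never"
    ordered = sorted(timestamps)
    average_gap = (ordered[-1] - ordered[0]) / (len(ordered) - 1)
    # Thresholds below are illustrative assumptions, not the article's values.
    if average_gap <= timedelta(days=1):
        return "daily"
    if average_gap <= timedelta(days=7):
        return "weekly"
    if average_gap <= timedelta(days=31):
        return "monthly"
    return "yearly"
```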

The page priority is a relative term - and by filling this out, you are ranking pages within your own site on importance. Given this, the Blog Provider rates the newest Blog pages as whatever is set as the defaultPagePriority in the web.config. However, it halves this value for the Blog Archives, as you would expect the archived pages to be less relevant than the newer postings. I'd like to do a long-term study on Sitemaps and web logs to see if changing the Page update frequency actually changes the way that the Googlebot accesses the pages in a site, but that will have to go into the 'one day' pile of projects-to-do. Actually, given the database-centric DotNetNuke site logs, it's probably not that hard and would yield interesting data when studied over a significant period. Back to the topic...
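For reference, the priority rule described above (default priority for current pages, half of it for archives) might look like this when a single sitemap entry is written out. It is a Python sketch using the standard library's ElementTree; the function name and the two-decimal formatting are assumptions.

```python
import xml.etree.ElementTree as ET


def url_element(loc, changefreq, default_priority, is_archive=False):
    """Build one <url> entry; archive pages get half the default priority."""
    priority = default_priority / 2 if is_archive else default_priority
    url = ET.Element("url")
    ET.SubElement(url, "loc").text = loc  # ElementTree escapes '&' on output
    ET.SubElement(url, "changefreq").text = changefreq
    ET.SubElement(url, "priority").text = f"{priority:.2f}"
    return url
```

With a defaultPagePriority of 0.5, an archive page would be written with a priority of 0.25.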

Installing and Configuring the example DNN and Blog Google Sitemap Providers

If you've downloaded the code and wish to install it on your site, first place all of the DLLs into your site's \bin directory. This includes the Utility DLLs and other associated items in the download. Then open your web.config and make the following modifications (of course, I don't need to tell you to back up your web.config first, do I??)

In the <configSections> element, place the following entry:

<section name="googlesitemaps"
type="iFinity.DNN.Modules.GoogleSiteMap.GoogleSiteMapSection,
iFinity.DNN.GoogleSiteMapProvider" />

This entry tells ASP.NET that there is a configuration section called 'googlesitemaps' when the in-built Google Sitemap HttpHandler is called, which brings us to the next entry required, the HttpHandler. In the <httpHandlers> section, add the following entry:

<add verb="*" path="GoogleSiteMap.axd"
type="iFinity.DNN.Modules.GoogleSiteMap.GoogleSiteMapHandler,
iFinity.DNN.GoogleSiteMapProvider" />

This tells ASP.NET that any request coming for GoogleSiteMap.axd should load the GoogleSiteMapHandler, located in the iFinity.DNN.GoogleSiteMapProvider Assembly. Within this handler lies the code that then loads the actual Provider for the GoogleSiteMap.

The next entry in the web.config is the section that ASP.NET was notified of in the first entry above. This contains the actual specification for the Providers that are used in providing Google Sitemap services. This entry should be placed at the end of the web.config file, underneath the section, and before the section closing element.

              
type="iFinity.DNN.Modules.GoogleSiteMap.GoogleSiteMapProvider"
defaultPagePriority="0.5" defaultPageUpdateFrequency="daily"
includeHiddenPages="false"/>

The element provides the place to put any custom providers for specific modules in the DNN framework. The first entry is the 'default' provider and is the base DNN Google Sitemap Provider I have developed. This is in the same assembly as the HTTP handler. The defaultPageUpdateFrequency and defaultPagePriority attributes tell the Provider what to output in the Sitemap XML. There is also an attribute for specifying whether or not hidden pages should be included in the Sitemap or not.

The second entry is the Blog-module specific entry. The naming standard of this relates to the explanation earlier of how the default Provider discovers and loads Module-specific Providers. Because the name is 'View_Blog.GoogleSiteMapProvider' (an inbuilt naming standard) the default Provider knows that this particular Provider should be called for the Sitemap entries whenever a Module on a Page, using the ModuleDefinition FriendlyName of 'View_Blog' is found. Because the BlogGoogleSiteMapProvider uses the GoogleSiteMapProvider as a base class, it also has the 'defaultPagePriority', 'defaultPageUpdateFrequency' and 'includeHiddenPages' attributes. However, the Blog Provider also adds in a new Attribute called 'showArchives' which was covered earlier. This list of attributes can be expanded indefinitely for individual Module-specific requirements.

Creating your own DNN Module-specific Google Sitemap Providers

If you've developed your own Private Assembly module for DotNetNuke and it uses more than the standard Page URL to deliver content, the Provider model outlined is a good way to deliver a Google Sitemap for it. All you need to do is create a new Assembly, reference the iFinity.DNN.Modules.GoogleSiteMap Provider Assembly and derive your own Provider type from the base GoogleSiteMapProvider type. You can then redefine the SitePages(siteURL) method to index your page in whichever way is most appropriate. The List of SitePage objects returned from your Custom Provider will then be included in the overall list of SitePage objects that the GoogleSiteMapProvider first generates and then transforms into Sitemap-schema compliant XML.
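The shape of such a derived provider is sketched below in Python rather than .NET, purely to show the pattern. The base class here is a simplified stand-in for GoogleSiteMapProvider, and NewsSiteMapProvider, its article_ids and the articleid query parameter are hypothetical.

```python
class GoogleSiteMapProvider:
    """Simplified stand-in for the base provider: one URL per page."""

    def site_pages(self, site_url):
        return [site_url]


class NewsSiteMapProvider(GoogleSiteMapProvider):
    """Hypothetical module-specific provider for a news articles module."""

    def __init__(self, article_ids):
        self.article_ids = article_ids

    def site_pages(self, site_url):
        # Start from the base behaviour, then add the module's own URLs.
        pages = super().site_pages(site_url)
        pages += [f"{site_url}?articleid={a}" for a in self.article_ids]
        return pages
```

Overriding a single method keeps everything else (handler, config loading, XML output) in the base assembly, which is what makes a per-module provider cheap to write.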

Shortcomings and potential expansions.

The Provider based model for Google Sitemaps is quite simple - but then Google sitemaps are deceptively simple themselves. There are a few things that could be changed. I haven't used the code for long enough to determine any major shortcomings with the approach, except for perhaps performance. But given the Googlebot tends to only read the Sitemap on perhaps a once-daily basis, I see it as a respectable tradeoff to get flexibility in the approach.

Expansion, apart from adding more module-specific providers, could include GZip compression of the Sitemap within the base code, as Google accepts GZip-compressed sitemaps. It could also be changed into a Sitemap-set of files, to get around the 50,000 URL limit for a large site (think of a classified listing site, where each item for sale gets its own URL). Google allows the definition of a sitemap index file, which then refers to individual sitemaps. This could be done by the base provider generating the index file, and a series of individual providers returning their own sitemap files. I don't have the requirement for this at the moment, but it could be done easily enough by changing the base code provided here.
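A rough idea of how the split-and-compress expansion could work is sketched below in Python. Nothing like this exists in the download: the file names, the build_sitemap_set function and its max_urls parameter are all assumptions, and the namespace shown is the sitemaps.org one rather than whatever the original code emits.

```python
import gzip
from xml.sax.saxutils import escape

NS = "http://www.sitemaps.org/schemas/sitemap/0.9"  # assumed namespace


def build_sitemap_set(urls, base, max_urls):
    """Split urls into gzipped sitemap files plus one index file.

    Returns a dict mapping file name to file content (bytes).
    """
    files = {}
    for start in range(0, len(urls), max_urls):
        part = urls[start:start + max_urls]
        body = "".join(f"<url><loc>{escape(u)}</loc></url>" for u in part)
        xml = f'<urlset xmlns="{NS}">{body}</urlset>'
        name = f"sitemap{len(files) + 1}.xml.gz"
        files[name] = gzip.compress(xml.encode("utf-8"))
    # The index file lists the location of every individual sitemap.
    index = "".join(f"<sitemap><loc>{escape(base)}/{n}</loc></sitemap>" for n in files)
    files["sitemapindex.xml"] = f'<sitemapindex xmlns="{NS}">{index}</sitemapindex>'.encode("utf-8")
    return files
```

In the design described above, each module-specific provider could produce one of the individual files while the base provider writes only the index.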

Summary

Hopefully this code will be of use to someone else, as it in itself is based on an open source project and the work of others. I'd like to know that people find it a useful way of incorporating Google Sitemaps for their custom DNN Modules without having to re-write the entire sitemap-generating code each time. Maybe it could even be included in the core of a future DNN release and a standard based around the concept for custom module providers to adopt!

This Article is also published on 'The Code Project' and various ASP.NET/DotNetNuke Forums



Update 11 Feb 2008


I have updated the Google Sitemap Provider code. See here for details


If you just want to download the latest code, go to Free DotNetNuke Downloads page.


17 comment(s) so far...

Anonymous 10/19/2006

I forgot to add: when you have installed the Sitemap generator, your sitemap will be available at : www.yoursite.com/googlesitemap.axd.

This is the URL you should put into your Google Sitemaps account.

You can view my sitemap at http://www.iFinity.com.au/googlesitemap.axd

 
Anonymous 10/23/2006

The provider model you have developed is a really great idea. There are loads of modules that create "virtual" pages. One I can think of immediately that a lot of people would want in their site map is CataLook online store.

 
Anonymous 10/24/2006

Great Article, I'm looking at building an extension for my news articles module now.

 
Anonymous 12/12/2006

I installed this on my site but I get a blank page back. There are no errors generated just a blank page. I double checked all the web.config entries and they all appear to be blank. I get this on both localhost and the webs live site. Any ideas what might cause this behavior

 
Anonymous 12/12/2006

I made a comment a few minutes ago about a blank page. I then realized I just installed Snapsis.com pageblaster I uncommented that and am now getting this error.


-
-


Do you have any ideas?

 
Anonymous 12/12/2006

Larry

It's difficult to diagnose but perhaps you don't have your web.config entries correct? Make sure that all of the web.config entries match the example. You won't be able to post xml on this blog. I have only tested the code against a DNN 4.02 installation so if you have an earlier installation it may be to do with this.

 
Anonymous 1/8/2007

Any chance this could be modified to work with DNN 3.x too? The other sitemap programs I've tried for 3.x have all the limitations you mention in your article.

 
Bruce Chapman 1/8/2007

Tracy

The principal difference between DotNetNuke 4.x and 3.x is the use of .NET 2.0 vs .NET 1.1. I don't have an environment capable of compiling ASP.NET 1.1 applications. However, I don't think there is any intrinsic reason the code wouldn't work on 3.x with some modification. Specifically you'd have to investigate the backwards compatibility of the web.config code as some changes were made in this area in .NET 2.0. Additionally, I have used the List<> generic type within the code and this is a .NET 2.0 addition. The basic concept, though, would be backwards portable if you had the skills.

My first advice would be to upgrade your installation to the equivalent .NET 2.0 platform and upgrade to DotNetNuke 4.x, but I realise that's not always possible. However, support for the 3.x codebase will be soon dropped by the Core team as per the DotNetNuke site.

My 2nd piece of advice would be to open up the source code in .NET 1.1 and see what happened. It won't compile because of the later code features, but these can be refactored to use .NET 1.1 methods and types.

 
Anonymous 4/19/2007

Thanks Bruce, this worked a treat for us. It was so useful that I decided to blog about it myself!
http://www.netpotential.co.nz/readme/Blogannouncements/tabid/81/EntryID/21/Default.aspx

This is the first time I did a 'Trackback' post - did it work?
Cheers,
Tim

 
Anonymous 7/23/2007

Bruce, thanks for sharing this. One question though. What about using in the Sitemap XML file the Permalink URL for blog entries (which is in the form http://www.myDNN site.com/Default.aspx?tabid=87&EntryID=220) instead of the 'Friendly URL'? Could this be acheieved easily? The fact that the full Permalink is stored in a field in the DNN database should make it easier (as the URL does not need to be 'assemebled'). The Permalink URL has many advantages from a search engine point of view (e.g. it will never change while a friendly URL can change). Any ideas on how to achieve this?
Thanks,
Eoghano

 
Anonymous 7/23/2007

Eoghano

Yes putting in the permalink Url is very easy, it's just a case of retrieving it from the database instead of building it up the way I do using the friendly urls. I've supplied the source code, so you can modify it to suit yourself, as long as the original copyright messages are maintained.

I disagree with you about the permalink being a better way though, because having a friendly url means having keywords in the url, which is far better from a search engine point of view than meaningless query string values. In addition, the whole point of the sitemap is to keep search engines updated on what url's are in your site, so if one does change the search engine can pick up on it quickly. If I'm doing a future release, I might add it in as a configuration option, but I don't plan on using the permalink for my sites.

-Bruce

 
Anonymous 1/23/2008

Hi Bruce

I've installed this as per the instructions on DNN 4.8 but every time I install the new web.config I get an error (Server Error in '/'). I have to quickly switch back to the backed-up web.config. All the DLLs are installed in /bin. I've checked and double-checked my web.config and can't see what's wrong.

Any ideas?

 
Anonymous 1/23/2008

@Gus

Can you check the event logs of the server to see if there is an error there? The code definitely works with 4.8. Do you get the error on every request, or only when you request the GoogleSiteMap.axd??

 
Anonymous 1/25/2008

as soon as I overwrite the old web.config with the new one when I browse to any site on the portal it returns the following error:

Server Error in '/' Application.

Nothing is recorded in the event log when this happens.

I can't see any syntax error in web.config.
Any idea?

 
Anonymous 2/20/2008

@Gus

You've most likely got a compilation problem in your web.config. I'd do a comparison of your old web.config with the new one using a text comparison tool (I use WinMerge, it's free). 99 times out of 100 the error you are seeing is a web.config compilation error.

 
mb 8/5/2008

FYI...I don't think your sitemap is up right now. When I went to the link you refer to above, I receive the following: "An Exception has occured : Object reference not set to an instance of an object. ...".

 
Bruce Chapman 8/5/2008

Right you are - I'll have it fixed in a jiffy!
