Googlebot and ASP.NET 2.0 URL Rewriting
URL rewriting is an important part of search engine optimisation (SEO), so when I built a new site a few weeks ago I decided that its URLs would be friendly, free of messy querystring parameters, and full of relevant keywords. Something like this:
http://www.mysite.com/gadgets/widget.aspx
Instead of this:
http://www.mysite.com/DisplayPage.aspx?productId=747393
I remembered Scott Guthrie’s ASP.NET URL rewriting article, which explains the various options, and chose the one that best suited my situation: an HttpModule that calls the HttpContext.RewritePath method. Scott’s article covers the implementation details, and the approach works really well.
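Scott’s article covers the ASP.NET code itself. As a language-neutral illustration of the same idea, here is a minimal Python sketch that uses a WSGI middleware in place of an HttpModule. It swaps the incoming path and querystring before the application sees them, so the visitor keeps the friendly URL while the handler still receives productId. This is an analogy, not the ASP.NET implementation; the REWRITES table, RewriteMiddleware and display_page are hypothetical names, and a real site would need its own lookup from friendly URL to product.

```python
from urllib.parse import parse_qs

# Hypothetical friendly-URL table: path -> (real handler path, querystring).
REWRITES = {
    "/gadgets/widget.aspx": ("/DisplayPage.aspx", "productId=747393"),
}


class RewriteMiddleware:
    """Rewrite the request path before the wrapped WSGI app sees it.

    This is an internal rewrite, not a redirect: the client keeps the
    friendly URL, much like HttpContext.RewritePath in an HttpModule.
    """

    def __init__(self, app, rewrites):
        self.app = app
        self.rewrites = rewrites

    def __call__(self, environ, start_response):
        target = self.rewrites.get(environ.get("PATH_INFO", ""))
        if target is not None:
            path, query = target
            environ = dict(environ)  # copy so the caller's dict is untouched
            environ["PATH_INFO"] = path
            environ["QUERY_STRING"] = query
        return self.app(environ, start_response)


def display_page(environ, start_response):
    """Hypothetical stand-in for DisplayPage.aspx: echo what it received."""
    params = parse_qs(environ.get("QUERY_STRING", ""))
    product_id = params.get("productId", ["?"])[0]
    body = f"{environ.get('PATH_INFO', '')} productId={product_id}".encode()
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [body]


app = RewriteMiddleware(display_page, REWRITES)

if __name__ == "__main__":
    # Simulate a request for the friendly URL without running a server.
    environ = {"PATH_INFO": "/gadgets/widget.aspx"}
    print(b"".join(app(environ, lambda status, headers: None)).decode())
    # prints: /DisplayPage.aspx productId=747393
```

To try it in a real browser, wsgiref.simple_server.make_server("", 8000, app).serve_forever() would serve it locally.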
A few weeks after launching the new site, I noticed Google had indexed only a few hundred pages; the bulk of my content was still missing. Once diagnostic data started arriving in Google Webmaster Tools, the problem was obvious: I had thousands of unreachable URLs, each throwing an ASP.NET error! This was puzzling because the pages loaded fine in Firefox and IE, and only the URL-rewritten pages were throwing errors.
I needed to see what happened when Googlebot requested a page, so I installed a Firefox add-on called User Agent Switcher, which makes Firefox identify itself to the web server as Googlebot by sending this user-agent string:
Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
To my surprise, and slight relief, the page threw an ASP.NET exception:
Exception message: Cannot use a leading .. to exit above the top directory.
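If you would rather reproduce this from a script than a browser add-on, a minimal Python sketch using only the standard library is to request the same URL twice, once with an ordinary browser user-agent and once with Googlebot’s, and compare the status codes. The URL is the article’s placeholder site, and BROWSER_UA is an arbitrary illustrative string; an ASP.NET exception would normally surface as an HTTP 500, which urllib raises as HTTPError.

```python
import urllib.error
import urllib.request

GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
# Illustrative desktop browser string; any ordinary browser user-agent will do.
BROWSER_UA = "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.8.1.3) Gecko/20070309 Firefox/2.0.0.3"


def build_request(url, user_agent):
    """Build a GET request that presents the given User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_status(url, user_agent, timeout=10):
    """Return the HTTP status code the server sends for this user agent."""
    try:
        with urllib.request.urlopen(build_request(url, user_agent), timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # Error responses (e.g. a 500 from an unhandled exception) land here.
        return err.code


if __name__ == "__main__":
    url = "http://www.mysite.com/gadgets/widget.aspx"  # placeholder from the article
    for label, ua in (("browser", BROWSER_UA), ("Googlebot", GOOGLEBOT_UA)):
        print(label, fetch_status(url, ua))
```

A 200 for the browser line and a 500 for the Googlebot line would point to the same user-agent-dependent failure described here.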
After some searching, I found that the problem lies in the way ASP.NET performs browser detection and the adaptive rendering that follows from it. ASP.NET doesn’t seem to recognise Googlebot’s new user-agent string, so it downgrades to a buggy rendering mode that uses the Html32TextWriter instead of the usual HtmlTextWriter. There’s more information about the problem, and some fixes, in these articles:
ASP.NET 2.0 Mozilla Browser Detection Hole
Get GoogleBot to crash your .NET 2.0 site
The solution, as described on Brendan Kowitz’s .Net Blog, is to add a new browser definition (.browser) file to your App_Browsers folder. This tells ASP.NET to use the working HtmlTextWriter, rather than Html32TextWriter, when rendering for Googlebot.
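To make the fallback concrete, here is a toy Python model of definition-tree matching. It is a deliberately simplified assumption, not ASP.NET’s real .browser files or matching rules: each hypothetical BrowserDef node carries a regex and some capabilities, and resolve() descends into the first child whose regex matches the user-agent, with more specific nodes overriding their parents. Googlebot’s string starts with "Mozilla" but matches none of the more specific children, so it keeps the generic tagwriter; adding a Googlebot node, which plays the role of the extra App_Browsers file, changes the result.

```python
import re
from dataclasses import dataclass, field


@dataclass
class BrowserDef:
    """Toy stand-in for one node in a browser definition tree."""
    name: str
    pattern: str       # regex tested against the User-Agent string
    capabilities: dict
    children: list = field(default_factory=list)


def resolve(node, user_agent, caps=None):
    """Merge capabilities down the tree, following the first matching child."""
    caps = dict(caps or {})
    caps.update(node.capabilities)
    caps["browser"] = node.name
    for child in node.children:
        if re.search(child.pattern, user_agent):
            return resolve(child, user_agent, caps)
    return caps  # no more specific match: keep what we have so far


def build_tree(include_googlebot=False):
    """Hypothetical, heavily simplified tree (not ASP.NET's real definitions)."""
    mozilla_children = [
        BrowserDef("IE", r"MSIE", {"tagwriter": "HtmlTextWriter"}),
        BrowserDef("Gecko", r"Gecko", {"tagwriter": "HtmlTextWriter"}),
    ]
    if include_googlebot:
        # Plays the role of the extra definition file added to App_Browsers.
        mozilla_children.append(
            BrowserDef("Googlebot", r"Googlebot", {"tagwriter": "HtmlTextWriter"})
        )
    mozilla = BrowserDef("Mozilla", r"^Mozilla", {}, mozilla_children)
    return BrowserDef("Default", r".*", {"tagwriter": "Html32TextWriter"}, [mozilla])


GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
FIREFOX_UA = "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.8.1.3) Gecko/20070309 Firefox/2.0.0.3"

if __name__ == "__main__":
    print(resolve(build_tree(), FIREFOX_UA))
    print(resolve(build_tree(), GOOGLEBOT_UA))
    print(resolve(build_tree(include_googlebot=True), GOOGLEBOT_UA))
```

In this model, Firefox resolves to the Gecko node, Googlebot falls back to the generic Mozilla node with Html32TextWriter, and the extra Googlebot node restores HtmlTextWriter.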
This is a fairly new problem, introduced towards the end of March 2007 when Google deployed their BigDaddy infrastructure update, which included an update to the Googlebot user agent string.