<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Chris Fulstow &#187; asp.net 2.0</title>
	<atom:link href="http://chrisfulstow.com/category/asp-net-2-0/feed/" rel="self" type="application/rss+xml" />
	<link>http://chrisfulstow.com</link>
	<description>ASP.NET Tech Lead and Web Developer</description>
	<lastBuildDate>Sat, 05 Jun 2010 01:32:55 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.1</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Googlebot and ASP.NET 2.0 URL Rewriting</title>
		<link>http://chrisfulstow.com/googlebot-and-asp-net-2-0-url-rewriting/</link>
		<comments>http://chrisfulstow.com/googlebot-and-asp-net-2-0-url-rewriting/#comments</comments>
		<pubDate>Thu, 17 May 2007 02:42:00 +0000</pubDate>
		<dc:creator>Chris Fulstow</dc:creator>
				<category><![CDATA[asp.net 2.0]]></category>
		<category><![CDATA[google]]></category>
		<category><![CDATA[seo]]></category>

		<guid isPermaLink="false">http://chrisfulstow.com/googlebot-and-asp-net-2-0-url-rewriting/</guid>
		<description><![CDATA[URL rewriting is a really important part of search engine optimisation (SEO), so when I built a new site a few weeks ago I decided that pages would be nice and friendly, free of messy querystring parameters, and full of relevant keywords.  This sort of thing:
http://www.mysite.com/gadgets/widget.aspx
Instead of this:
http://www.mysite.com/DisplayPage.aspx?productId=747393
I remembered Scott Guthrie&#8217;s ASP.NET URL rewriting <a href="http://chrisfulstow.com/googlebot-and-asp-net-2-0-url-rewriting/" class="more-link">More &#62;</a>]]></description>
			<content:encoded><![CDATA[<p><span style="font-weight: bold;">URL rewriting</span> is a really important part of <span style="font-weight: bold;">search engine optimisation</span> (SEO), so when I built a new site a few weeks ago I decided that pages would be nice and friendly, free of messy querystring parameters, and full of relevant keywords.  This sort of thing:</p>
<p>http://www.mysite.com/gadgets/widget.aspx</p>
<p>Instead of this:</p>
<p>http://www.mysite.com/DisplayPage.aspx?productId=747393</p>
<p>I remembered <a href="http://weblogs.asp.net/scottgu/archive/2007/02/26/tip-trick-url-rewriting-with-asp-net.aspx">Scott Guthrie&#8217;s ASP.NET URL rewriting article</a>, which explains the various options, and chose the one that best suited my situation: an <a href="http://www.codeproject.com/useritems/http-module-ip-security.asp">HttpModule</a> that calls the <a href="http://msdn2.microsoft.com/en-us/library/system.web.httpcontext.rewritepath%28vs.80%29.aspx">HttpContext.RewritePath</a> method.  The implementation details are explained in Scott&#8217;s article, and the approach works really well.</p>
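<p>For illustration only, here is a minimal sketch of that kind of module.  It is not the code used on the site: the class name <span style="font-family:courier new;">FriendlyUrlModule</span> and the hard-coded mapping are assumptions, it assumes the application runs at the site root, and the module would still need to be registered in the httpModules section of web.config.</p>
<pre style="font-family:courier new;font-size:85%;">using System;
using System.Web;

// Hypothetical module name, used here for illustration only.
public class FriendlyUrlModule : IHttpModule
{
    public void Init(HttpApplication application)
    {
        // Inspect every request before ASP.NET maps it to a page.
        application.BeginRequest += OnBeginRequest;
    }

    private static void OnBeginRequest(object sender, EventArgs e)
    {
        HttpContext context = ((HttpApplication)sender).Context;
        string path = context.Request.Path;

        // Hard-coded mapping for illustration; a real site would look up
        // the product that belongs to the friendly URL.
        if (path.EndsWith("/gadgets/widget.aspx", StringComparison.OrdinalIgnoreCase))
        {
            context.RewritePath("/DisplayPage.aspx?productId=747393");
        }
    }

    public void Dispose()
    {
        // Nothing to clean up in this sketch.
    }
}</pre>
<p>The visitor still sees the friendly URL in the address bar, while ASP.NET executes DisplayPage.aspx with the querystring behind the scenes.</p>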
<p>A few weeks after launching the new site, I noticed Google had indexed only a few hundred pages, and the bulk of my content was still missing.  As soon as I started getting diagnostic data from <a href="https://www.google.com/webmasters/tools/">Google Webmaster Tools</a> I could see the problem: I had thousands of unreachable URLs, each throwing an ASP.NET error!  This was weird because the pages loaded fine in Firefox and IE, and only the URL-rewritten pages were throwing errors.</p>
<p>I needed to see what was happening when <span style="font-weight: bold;">Googlebot</span> looked at the page, so I installed a Firefox plug-in called <a href="https://addons.mozilla.org/firefox/addon/59">User Agent Switcher</a> which lets Firefox appear to the web server as Googlebot by changing the <a href="http://en.wikipedia.org/wiki/User_agent">user-agent</a> string to:</p>
<p><span style="font-size:85%;"><span style="font-family:courier new;">Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)</span></span></p>
<p>To my surprise, and slight relief, the page threw an ASP.NET exception:</p>
<p><span style="font-size:85%;"><span style="font-family:courier new;">Exception message: Cannot use a leading .. to exit above the top directory.</span></span></p>
<p>After searching for a solution, I found that the problem is caused by the way ASP.NET does browser detection and subsequent adaptive rendering.  It doesn&#8217;t seem to recognise Googlebot&#8217;s new user-agent string, so it downgrades to a buggy rendering mode that uses the Html32TextWriter instead of the usual HtmlTextWriter.  There&#8217;s more information about the problem and some fixes in these articles:</p>
<p><a href="http://www.kowitz.net/2006/12/11/ASPNET+20+Mozilla+Browser+Detection+Hole.aspx">ASP.NET 2.0 Mozilla Browser Detection Hole</a><br /><a href="http://todotnet.com/archive/0001/01/01/7472.aspx">Get GoogleBot to crash your .NET 2.0 site</a></p>
<p>The solution, as described on Brendan Kowitz&#8217;s .Net Blog, is to add a new <a href="http://msdn2.microsoft.com/en-us/library/ms228122.aspx">browser definition file</a> to your site&#8217;s <span style="font-weight: bold;">App_Browsers</span> folder.  This tells ASP.NET to use the working HtmlTextWriter (and not Html32TextWriter) when rendering for Googlebot.</p>
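<p>As a rough sketch of what such a file might look like (this is not the exact file from the linked articles), the definition below could be saved as something like <span style="font-family:courier new;">App_Browsers/googlebot.browser</span>.  The file name, the browser id and the simple <span style="font-family:courier new;">Googlebot</span> match pattern are assumptions chosen for illustration; the linked articles use a broader pattern that also covers other Mozilla-compatible crawlers.</p>
<pre style="font-family:courier new;font-size:85%;">&lt;browsers&gt;
  &lt;!-- Hypothetical id; inherits the stock Mozilla definition --&gt;
  &lt;browser id="GooglebotMozilla" parentID="Mozilla"&gt;
    &lt;identification&gt;
      &lt;!-- Regular expression tested against the User-Agent header --&gt;
      &lt;userAgent match="Googlebot" /&gt;
    &lt;/identification&gt;
    &lt;capabilities&gt;
      &lt;!-- Render with HtmlTextWriter rather than Html32TextWriter --&gt;
      &lt;capability name="tagwriter" value="System.Web.UI.HtmlTextWriter" /&gt;
    &lt;/capabilities&gt;
  &lt;/browser&gt;
&lt;/browsers&gt;</pre>
<p>After adding a file like this, it is worth requesting a rewritten page again with the Googlebot user-agent string to check that the exception has gone.</p>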
<p>This is a fairly new problem, introduced towards the end of March 2007 when Google deployed their <a href="http://www.mattcutts.com/blog/indexing-timeline/">BigDaddy</a> infrastructure update, which included an update to the Googlebot user agent string.</p>
]]></content:encoded>
			<wfw:commentRss>http://chrisfulstow.com/googlebot-and-asp-net-2-0-url-rewriting/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>
