BotDetect ASP.NET CAPTCHA Known Issues

SSL Offloading and Audio CAPTCHA in IE

Issue ID

#4120861

Affected products

BotDetect ASP.NET CAPTCHA v2.0.*

Affected clients

IE 6.0, IE 7.0

Status

Pending

Description

If you use SSL offloading (i.e. HTTPS requests to your site are decrypted by a dedicated proxy and forwarded to your web server over plain HTTP), IE users will be unable to use the audio CAPTCHA.

Steps to reproduce

  • Open a page using the BotDetect ASP.NET CAPTCHA over HTTPS in Internet Explorer 6.0 or 7.0
  • Click the speaker icon - no sound will be played
  • Right-click the speaker icon and select "Open link in new tab" - an error will occur, saying IE is unable to download the file

Explanation

The problem is caused by IE file downloads failing when the server sends a file over SSL with the no-cache header. IE interprets the header as a strict prohibition on saving the file locally, so the sound file is not downloaded to the client. Since the sound file must be downloaded before it can be played, the CAPTCHA sound is never played.

In other browsers we tested this against (Firefox, Opera, Safari), this header is interpreted differently, meaning the sound file can be downloaded (and played), but is not reused. IE also works properly when not using SSL. In other words, it's the no-cache + SSL + IE use case that is problematic.

When not using SSL, the no-cache header is required, otherwise there are use cases when changing the CAPTCHA image doesn't result in the equivalent change in the CAPTCHA sound, since the browser reuses the old sound file.

So the solution we used in the latest versions of BotDetect is to send the no-cache header only if SSL is not used. In other words, the code is similar to:

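// Send no-cache only over plain HTTP; IE cannot download the sound file when no-cache is combined with SSL
if (!context.Request.IsSecureConnection)
{
   context.Response.Cache.SetCacheability(HttpCacheability.NoCache);
}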

Since this code is executed on the web server, it only works if the request reaches the web server over SSL. With SSL offloading, SSL requests are made to a separate network device, which decrypts the traffic and sends plain HTTP requests to the web server, so the web server does not know the original request was made over SSL.

In other words, context.Request.IsSecureConnection will be false at the point of execution even though the original request was made over SSL. As a result, the no-cache header is sent to IE clients and the original problem returns.

Workarounds

Our component needs to handle the broadest possible range of browsers and use cases, so the general solution is non-trivial: despite SSL offloading, the web server still needs to know the request was originally sent over SSL, and send the appropriate headers accordingly. This could be achieved, for example, by adding a special querystring parameter to the sound request, set by the sound-playing JavaScript depending on how the client sees the page (HTTPS or plain HTTP). We will investigate this solution in future versions of BotDetect.
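The server-side half of the querystring idea could look roughly like the sketch below. It is not BotDetect code: the parameter name "secure", its value "1", and the SoundRequestCaching class are hypothetical, and it assumes the sound-playing JavaScript appends the parameter when window.location.protocol is "https:".

```csharp
using System;
using System.Collections.Specialized;

public static class SoundRequestCaching
{
    // Hypothetical parameter name and value; not part of BotDetect.
    public const string SecureParameter = "secure";
    public const string SecureValue = "1";

    // Returns true when the no-cache header should be sent, i.e. only when
    // neither the server nor the client reports that the page uses HTTPS.
    public static bool ShouldSendNoCache(bool isSecureConnection, NameValueCollection query)
    {
        bool clientSawHttps = query != null && query[SecureParameter] == SecureValue;
        return !(isSecureConnection || clientSawHttps);
    }
}
```

In a handler, the result would replace the plain IsSecureConnection check, for example by passing context.Request.IsSecureConnection and context.Request.QueryString and calling SetCacheability(HttpCacheability.NoCache) only when the method returns true.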

You can try checking your SSL offloading settings; there may be an existing option for cases when the server needs to know it is being accessed via SSL even though SSL is offloaded. If the ASP.NET built-in HttpContext.Current.Request.IsSecureConnection property can be made to report the original protocol correctly, that would solve the problem.
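Some offloading devices can be configured to add a request header that records the original protocol. The sketch below assumes such a header is available and is named X-Forwarded-Proto, a common convention. Both the header and the OriginalSchemeDetector class are assumptions, so check what your device actually sends.

```csharp
using System;
using System.Collections.Specialized;

public static class OriginalSchemeDetector
{
    // Assumed header name; use whatever your offloading device is configured to send.
    public const string ForwardedProtoHeader = "X-Forwarded-Proto";

    // Returns true if the request reached the server over SSL, or if the
    // offloading device reports that the client originally used HTTPS.
    public static bool WasOriginallySecure(bool isSecureConnection, NameValueCollection headers)
    {
        if (isSecureConnection)
        {
            return true;
        }
        if (headers == null)
        {
            return false;
        }
        string proto = headers[ForwardedProtoHeader];
        if (proto == null)
        {
            return false;
        }
        return string.Equals(proto.Trim(), "https", StringComparison.OrdinalIgnoreCase);
    }
}
```

If you have the component source, a check like this could stand in for context.Request.IsSecureConnection, passing context.Request.Headers as the second argument. The header should only be trusted if the offloading device overwrites any value supplied by the client.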

Also, if you are only going to use the CAPTCHA on pages accessed over SSL, you don't have to handle the plain HTTP use case – which means you can remove the SSL-detecting logic from the code altogether and just assume SSL is always used. You can make this change in the component source code if you purchased it.

Cookieless ASP.NET Session State and search engine indexing

Issue ID

#4120741

Affected products

BotDetect ASP.NET CAPTCHA v2.0.*

Affected clients

Googlebot, Yahoo! Slurp, msnbot, other crawlers

Status

3rd party issue - cannot fix

Description

If you allow cookieless ASP.NET Session State in your application (the cookieless attribute of the <sessionState> element set to "true" or "AutoDetect" in your web.config file), your ASP.NET pages and some BotDetect CAPTCHA paths might get indexed multiple times in various search engines, with different URLs.

Steps to reproduce

  • Check that the Google, Yahoo or Live index of your website contains at least one ASP.NET page (using the site: operator) - for example, test/default.aspx
  • Set the cookieless attribute to "true" or "AutoDetect" in your application's web.config file
  • Re-check the Google, Yahoo or Live index of your website periodically - as the crawlers visit your ASP.NET pages, they will add many different copies of the same page with different URLs - for example, test/(S(xihahl55btbse4qpvz45qj55))/default.aspx, test/(S(upoy35555j25grr4dacr1bz1))/default.aspx, test/(S(3aebky5512wexi45oylubwud))/default.aspx...

Explanation

ASP.NET cookieless Session State is meant to ensure your ASP.NET pages work properly even for users who have disabled cookies in their browser. Instead of storing the SessionID in a cookie, as regular Session State does, it rewrites the URLs of your pages to contain the SessionID. This is useful for your site's visitors but can affect how your ASP.NET pages are indexed by search engines.

Since various crawlers (Googlebot, Yahoo! Slurp, msnbot, ...) all visit your site without any cookie support, the ASP.NET runtime rewrites your pages' URLs to contain the SessionID. And since the SessionID is highly dynamic and changes on every crawler visit, your pages will always have a different URL for the crawler, causing it to consider those URLs as new pages and index them separately. Over time, your site's index can accumulate a huge number of different entries for the same page. This in turn can cause the search engine to consider them duplicate content and move those pages to supplemental results, effectively removing them from user searches.

Workarounds

First, BotDetect CAPTCHA request paths should always be blocked in the robots.txt file, since they shouldn't be indexed by search engines at all:

User-Agent: Googlebot
Disallow: /*/LanapCaptcha.aspx
Disallow: /*/WebResource.axd

User-Agent: Yahoo! Slurp
Disallow: /*/LanapCaptcha.aspx
Disallow: /*/WebResource.axd

User-Agent: msnbot
Disallow: /*/LanapCaptcha.aspx
Disallow: /*/WebResource.axd

User-Agent: *
# other restrictions go below

Furthermore, you can try to prevent ASP.NET from using cookieless Session URLs for Googlebot, Yahoo! Slurp, msnbot and some other search engine bots. This is important if you want your ASP.NET pages to be included in the search engine index properly.

According to this article, just create an App_Browsers folder in your project's root folder and follow the instructions to create a genericmozilla5.browser file there. As a result, the bots in question should always get the same URL, even though they don't have cookie support.

If this doesn't help to get your pages back in the search engine index with proper URLs, and it's important to you that they get indexed and ranked properly, you should consider disabling cookieless ASP.NET Session State altogether. To do this, set the cookieless attribute of the <sessionState> element to "false" in your application's web.config file.
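As a minimal illustration, the relevant web.config fragment might look like the one below. Your existing <sessionState> element probably carries other attributes (mode, timeout and so on), which are omitted here and should be kept as they are.

```xml
<configuration>
  <system.web>
    <!-- Keep the SessionID in a cookie instead of rewriting it into page URLs -->
    <sessionState cookieless="false" />
  </system.web>
</configuration>
```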