Amazon A9's siteinfo.xml: almost a repeat of favicon.ico

Recently, I’ve seen a few 404 errors for requests for “siteinfo.xml.” siteinfo.xml is a file used by SiteInfo in the browser toolbar for Amazon’s A9 search engine, and it is automatically requested for every website a user visits.

This sounds pretty similar to the infamous “favorites icon” feature in Microsoft’s Internet Explorer. For every site a user visited with Internet Explorer, the browser would automatically request a file called favicon.ico, to be displayed in the browser’s location bar and bookmarks. A lot of people were not happy: all of a sudden, web servers were getting swamped with requests for this mysterious favicon.ico, which usually did not exist. These requests polluted many web server logs and were very annoying.

On some sites, especially dynamic ones, 404 errors are very expensive. Unfortunately, this is true of most Drupal-powered sites, including mine. With Drupal’s “pretty URLs” enabled, which use Apache’s mod_rewrite to, well, make URLs pretty, every request the web server does not handle itself (including ones that end in an error) goes through Drupal. Going through Drupal means a long bootstrapping process to initialize Drupal and load all its modules, plus at least one database query, just to find out that a URL does not exist and return a 404 error. Enough requests for a non-existent file can effectively become a denial-of-service (DoS) attack.
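To make the routing concrete, here is a minimal Python sketch of the decision the web server makes. It is only a rough model, assuming the classic clean-URL rules Drupal ships in its .htaccess (if the requested path is not an existing file or directory, rewrite it to index.php?q=path); the `route` function and its return strings are hypothetical and exist only for illustration.

```python
import os
import tempfile


def route(docroot: str, path: str) -> str:
    """Rough, hypothetical model of Drupal-style clean-URL routing.

    Assumption: Apache serves existing files and directories directly and
    rewrites everything else to index.php?q=<path>.
    """
    relative = path.lstrip("/")
    filename = os.path.join(docroot, relative)
    if os.path.isfile(filename) or os.path.isdir(filename):
        # Served straight from disk; Drupal never starts.
        return "static:" + relative
    # Nothing on disk: index.php must bootstrap Drupal and query the
    # database before it can even answer with a 404.
    return "index.php?q=" + relative


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as docroot:
        open(os.path.join(docroot, "robots.txt"), "w").close()
        print(route(docroot, "/robots.txt"))    # static:robots.txt
        print(route(docroot, "/siteinfo.xml"))  # index.php?q=siteinfo.xml
```

The point is the second branch: a file that simply isn’t there, like siteinfo.xml or favicon.ico, costs a full trip through Drupal rather than a cheap static 404.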

It seems Amazon’s A9 developers didn’t get the memo that people don’t like tools that request files that don’t exist.

Granted, it’s not too bad: I don’t think this toolbar has much market penetration, so it’s not as if millions of people are killing my site. The siteinfo.xml specification page also mentions that the file is fetched through A9 and cached, so it is not requested for every user who visits, only for the first one.
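Here is a small Python sketch of why the caching helps. It is a hypothetical model, not A9’s actual implementation: `SiteInfoCache` and the `fetch` callback are made-up names, and it assumes that a missing file (a 404) is cached just like a real one, which the specification page may or may not guarantee.

```python
from typing import Callable, Dict, Optional


class SiteInfoCache:
    """Hypothetical model of a central service that fetches a site's
    siteinfo.xml once and answers later toolbar users from its cache."""

    def __init__(self, fetch: Callable[[str], Optional[str]]) -> None:
        self._fetch = fetch  # returns the XML body, or None for a 404
        self._cache: Dict[str, Optional[str]] = {}

    def lookup(self, host: str) -> Optional[str]:
        if host not in self._cache:
            # Only the first lookup for a host reaches the website.
            # Caching the None (404) result as well is an assumption.
            self._cache[host] = self._fetch(host)
        return self._cache[host]


if __name__ == "__main__":
    requests_seen = []

    def fake_fetch(host: str) -> Optional[str]:
        requests_seen.append(host)
        return None  # this site has no siteinfo.xml

    cache = SiteInfoCache(fake_fetch)
    for _ in range(1000):
        cache.lookup("example.com")
    print(len(requests_seen))  # 1
```

A thousand toolbar users visiting the same site would, under this model, cost the site a single 404 rather than a thousand.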

Kudos to Amazon’s programmers for being a bit brighter than Microsoft’s, but eh, I can’t say they are that much brighter when they designed a system so similar to the favicon.ico debacle.

