February 18, 2008
6:39 pm

Loren reports that Colin Cochrane found this: “Over the weekend, Yahoo’s Delicious (del.icio.us) social bookmarking property has been blocking spiders and bots from non-Yahoo search engines from crawling the site and identifying new web pages, sites and bookmarks.” He adds that ‘This isn’t a simple robots.txt exclusion, but rather a 404 response that is now being served based on the requesting User-Agent.’

I took a look at del.icio.us’ robots.txt and found that it was disallowing Googlebot, Slurp, Teoma, and msnbot for the following:

Disallow: /inbox
Disallow: /subscriptions
Disallow: /network
Disallow: /search
Disallow: /post
Disallow: /login
Disallow: /rss
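
To see what rules like these would and would not block on their own, here is a minimal sketch using Python's standard urllib.robotparser. The file layout in ROBOTS_TXT is an assumed reconstruction (the post only lists the crawlers and the paths, not the exact file), and the helper name allowed is made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Assumed reconstruction of the rules quoted above; the real file's
# exact layout is not shown in the post.
ROBOTS_TXT = """\
User-agent: Googlebot
User-agent: Slurp
User-agent: Teoma
User-agent: msnbot
Disallow: /inbox
Disallow: /subscriptions
Disallow: /network
Disallow: /search
Disallow: /post
Disallow: /login
Disallow: /rss
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(agent, path):
    """Return True if robots.txt alone would let `agent` fetch `path`."""
    return parser.can_fetch(agent, path)


# Compare the home page with one of the disallowed paths for each crawler.
for agent in ("Googlebot", "Slurp", "Teoma", "msnbot"):
    print(agent, "/ ->", allowed(agent, "/"), "| /search ->", allowed(agent, "/search"))
```

Under this reconstruction, the rules only cover specific paths such as /search and /rss, so the home page stays crawlable as far as robots.txt is concerned. That fits the observation that the 404 responses are something separate from the robots.txt exclusion.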

Seeing that the robots.txt was blocking these search engine spiders, I tried accessing del.icio.us with my User-Agent switcher set to each of the disallowed User-Agents and received the same 404 response for each one.
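
The same check can be scripted instead of using a browser extension. The sketch below uses only Python's standard library. The User-Agent strings are illustrative approximations rather than the exact headers each crawler sends, and the function names status_for and compare_user_agents are hypothetical.

```python
import urllib.error
import urllib.request

# Illustrative User-Agent strings (assumptions, not verified crawler headers).
USER_AGENTS = {
    "browser": "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) Firefox/2.0",
    "Googlebot": "Googlebot/2.1 (+http://www.google.com/bot.html)",
    "Slurp": "Mozilla/5.0 (compatible; Yahoo! Slurp)",
    "Teoma": "Mozilla/2.0 (compatible; Ask Jeeves/Teoma)",
    "msnbot": "msnbot/1.0 (+http://search.msn.com/msnbot.htm)",
}


def status_for(url, user_agent, timeout=10):
    """Return the HTTP status code the server sends for this User-Agent."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urlopen raises on 4xx/5xx responses; the status code is still available.
        return err.code


def compare_user_agents(url, user_agents=USER_AGENTS):
    """Map each label to the status code returned for its User-Agent."""
    return {label: status_for(url, ua) for label, ua in user_agents.items()}


# Example call (makes live requests, so it is left commented out):
# print(compare_user_agents("http://del.icio.us/"))
```

If a site is filtering on User-Agent as described, the browser entry would come back with 200 while the crawler entries come back with 404. What a given site returns today may of course differ from what was observed at the time.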

Colin also found that Delicious pages listed in Google are lacking a cache, title, description and other information.

Yahoo!, Search Engine, Spider, Crawling, Search Bots, Delicious, Bookmark, Google, Ask.com, MSN, Slurp, Teoma
