Last year, Google published an SEO Report Card covering 100 Google properties, in which they rated how well their own sites were optimized for search. This morning a tweet revealed that Google Translate pages are appearing in Google search results: pages for individual translation requests have been indexed.
A site owner might also want to block these types of pages from being crawled and indexed to increase crawl efficiency and ensure the most valuable pages on the site are being crawled and indexed instead.
Vanessa says, "When I asked Google about this, they confirmed that indeed it was simply a matter of the Google Translate team not being aware of the issue and said they would resolve it."
Blocking Autogenerated Search Pages From Being Indexed
In the case of Google Translate, the ideal scenario is that the home page (http://translate.google.com/#) and any secondary pages (such as http://translate.google.com/translate_tools) be indexed, but that pages generated by translation requests not be indexed.
The best way to do this would be to add a disallow line to the site's robots.txt file that blocks indexing based on a pattern match of the URL query parameter. For instance:
    Disallow: /*q=
This pattern would prevent search engines from indexing any URLs containing q=. (The * before the q= means that the q= can appear anywhere in the URL.)
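To make the wildcard behavior concrete, here is a minimal Python sketch of how a crawler might evaluate such a rule. It assumes the rule is a line like Disallow: /*q= and that * matches any run of characters, as described above. The helper names (rule_to_regex, is_blocked) and the sample URLs are hypothetical, and real crawlers may differ in details.

```python
import re
from urllib.parse import urlsplit


def rule_to_regex(rule: str) -> re.Pattern:
    """Turn a robots.txt path rule into a regex.

    Assumption: '*' matches any run of characters and a trailing '$'
    anchors the end of the URL, as major search engines document.
    """
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    # Escape the literal pieces and join them with '.*' for each wildcard.
    body = ".*".join(re.escape(part) for part in rule.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_blocked(url: str, rule: str) -> bool:
    """Return True if the URL's path plus query string matches the rule."""
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    # Rules are matched from the start of the path; fragments are ignored.
    return rule_to_regex(rule).match(target) is not None


if __name__ == "__main__":
    rule = "/*q="
    for url in (
        "http://translate.google.com/?q=hello",         # hypothetical request URL
        "http://translate.google.com/translate_tools",  # secondary page
        "http://translate.google.com/#",                # home page
    ):
        print(url, "blocked" if is_blocked(url, rule) else "allowed")
```

Note that a rule this broad also matches any parameter whose name merely ends in q (for example aq=), so it is worth checking against a list of real URLs before deploying it.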
Adding the disallow pattern shown above to the www.google.com/robots.txt file wouldn't work, because search engines don't check that file when crawling the translate subdomain. It would instead cause search engines not to index URLs that match the pattern on www.google.com.
translate.google.com (and every other google.com subdomain) should have its own robots.txt file that's customized for that subdomain.
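The per-subdomain behavior follows from how crawlers locate the file: they request /robots.txt from the same scheme and host as the page being crawled. A small Python sketch of that lookup follows (the function name robots_url is hypothetical, chosen for illustration):

```python
from urllib.parse import urlsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt location that governs the given page.

    The file is looked up on the page's own host, so rules placed on
    www.google.com never apply to translate.google.com.
    """
    parts = urlsplit(page_url)
    return f"{parts.scheme}://{parts.netloc}/robots.txt"


if __name__ == "__main__":
    print(robots_url("http://translate.google.com/translate_tools"))
    print(robots_url("http://www.google.com/"))
```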
Using the meta robots tag
If Google isn't able to create a separate robots.txt file for the translate subdomain, they should first remove the file that's there (and from other subdomains as well, as it could be causing unexpected indexing results for those subdomains). Then, they should use the meta robots tag on the individual pages they want blocked:
    <meta name="robots" content="noindex">
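One rough way to confirm that a page actually carries the tag is to parse its HTML and look for a robots meta element containing noindex. The Python sketch below uses only the standard library. It assumes the standard name="robots" form of the tag, and the class and function names are hypothetical.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives from any <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() != "robots":
            return
        # The content attribute is a comma-separated list of directives.
        for directive in attr.get("content", "").split(","):
            directive = directive.strip().lower()
            if directive:
                self.directives.add(directive)


def has_noindex(html: str) -> bool:
    """Return True if the page asks search engines not to index it."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


if __name__ == "__main__":
    page = '<html><head><meta name="robots" content="noindex"></head></html>'
    print(has_noindex(page))
```

Keep in mind that a crawler can only see this tag if it is allowed to fetch the page, which is why the text above suggests removing the blocking robots.txt file first.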