A newer version of Google’s quality rater guidelines (PDF), used internally to evaluate the quality of search results, has been leaked, and it reveals some interesting things about Google’s approach to search. Brian Ussery gave it a look already. According to the document, which is dated April 2007 and at least looks legitimate, a quality rater’s job is to first research and understand a specific search query – say, [cell phones] – and then look at the quality of a website returned for this query. The rater then assigns a rating to this specific “query-page” combo and proceeds to the next “query-page” task.
The quality rating options for a specific URL are: vital, useful, relevant, not relevant, off-topic, didn’t load, foreign language, and unratable. Additionally, a URL can be flagged as not spam, maybe spam, or spam, as well as malicious or pornography. We have reason to believe the feedback given by quality raters is then incorporated into the algorithms by Google engineers, or helps decide which fine-tuned variant of the algorithms to proceed with. (Google once told people their results are “completely objectively,” “completely automated” and “independent of the beliefs” of people working at Google, but they have since revised many occurrences of such wording on their help pages.)
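To make the rating scheme concrete, here is a minimal sketch of the “query-page” task as a data structure, with the rating options and flags listed above. The class and field names are my own illustration, not anything from the leaked document.

```python
from enum import Enum
from dataclasses import dataclass

class Rating(Enum):
    # The eight quality rating options listed in the guidelines.
    VITAL = "vital"
    USEFUL = "useful"
    RELEVANT = "relevant"
    NOT_RELEVANT = "not relevant"
    OFF_TOPIC = "off-topic"
    DIDNT_LOAD = "didn't load"
    FOREIGN_LANGUAGE = "foreign language"
    UNRATABLE = "unratable"

class SpamFlag(Enum):
    # The three-level spam judgment kept separate from the quality rating.
    NOT_SPAM = "not spam"
    MAYBE_SPAM = "maybe spam"
    SPAM = "spam"

@dataclass
class QueryPageTask:
    # One "query-page" combo: a query, a URL returned for it,
    # the quality rating, plus the separate spam and content flags.
    query: str
    url: str
    rating: Rating
    spam: SpamFlag = SpamFlag.NOT_SPAM
    malicious: bool = False
    pornography: bool = False

task = QueryPageTask("cell phones", "http://example.com/phones",
                     Rating.USEFUL)
```

Note how the spam flag lives alongside, not inside, the quality rating – mirroring the point below that spam is judged separately from result quality.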
As before, queries are grouped into the types navigational (when a user just wants to locate a specific webpage they have in mind, e.g. typing [ibm]), informational (when someone researches a topic to find out more about something, e.g. querying for [tsunami]), and transactional (when the user wants to buy or download something, or is looking for some other type of resource, as opposed to information, e.g. a query like [download adobe reader]). These types are not mutually exclusive; Google lists the query [“ipod nano”] as an example of something which is navigational (going to the product page), informational (reading a review or info on the product page), as well as transactional (perhaps purchasing the product).
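Since the types are not mutually exclusive, each query maps to a *set* of types rather than a single label. A small sketch, with names of my own choosing, using the document’s own example queries:

```python
# The three query types from the guidelines; a query can carry
# any combination of them, so we store them as sets.
NAVIGATIONAL, INFORMATIONAL, TRANSACTIONAL = "nav", "info", "trans"

query_types = {
    "ibm": {NAVIGATIONAL},
    "tsunami": {INFORMATIONAL},
    "download adobe reader": {TRANSACTIONAL},
    # Google's own example of a query that is all three at once:
    "ipod nano": {NAVIGATIONAL, INFORMATIONAL, TRANSACTIONAL},
}
```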
Spam is treated separately from search results evaluation. A web page may be spammy even if it’s considered “vital” for some queries or is very authoritative. “Webspam is the term for web pages that are designed by webmasters to trick search engine robots and direct traffic to their websites,” explains Google. Web pages that include ads and content scraped from other sites, but don’t add any original information, are considered spam. “When trying to decide if a page is Spam, it is helpful to ask yourself this question: If I remove the scraped (copied) content, the ads, and the links to other pages, is there anything of value left? If the answer is no, the page is probably Spam.”
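The self-test quoted above is mechanical enough to sketch in code. A literal-minded illustration, assuming a page already broken into labeled parts – the function and data layout are my own, not anything from the leaked document:

```python
def looks_like_spam(page_parts):
    """Apply the guideline's question: remove the scraped (copied)
    content, the ads, and the links to other pages, then ask
    whether anything of value is left."""
    removable = {"scraped", "ads", "links"}
    remainder = [text for label, text in page_parts.items()
                 if label not in removable and text.strip()]
    # If nothing substantive remains, the page is probably spam.
    return not remainder

spam_page = {"scraped": "Copied product blurb...",
             "ads": "Buy cell phones now!",
             "links": "More cell phone sites...",
             "original": ""}
# The same page with genuine original content passes the test.
real_page = dict(spam_page, original="A hands-on review I wrote myself.")
```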
Google, Search, Guidelines, Spam