Google Launches Improved URL Parameter Handling in Webmaster Tools


In 2009, Google launched the 'URL parameter' handling feature of Google Webmaster Tools, which enabled site owners to specify which parameters on their site were optional and which were required.

A year later, the company improved the URL parameter feature by adding an option to specify a default value.

In the earlier version, you could specify that a parameter should be ignored during crawling and indexing. This was helpful when the parameter was completely optional and didn't change the content of the associated page. For instance, http://www.example.com/glee-is-awesome/reviews and http://www.example.com/glee-is-awesome/reviews?sessionid=1234 both load the same content, so the sessionid parameter can be ignored entirely.
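As a rough illustration of what "ignore this parameter" means, here is a minimal Python sketch that normalizes URLs by dropping parameters marked as ignorable. The IGNORED_PARAMS set and the strip_ignored helper are hypothetical names; they show the idea, not how Googlebot actually implements it.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical site setting: parameters that never change page content.
IGNORED_PARAMS = frozenset({"sessionid"})

def strip_ignored(url, ignored=IGNORED_PARAMS):
    """Return the URL with every ignored query parameter removed."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k not in ignored]
    # An empty query string makes urlunsplit omit the "?" entirely.
    return urlunsplit(parts._replace(query=urlencode(kept)))

a = "http://www.example.com/glee-is-awesome/reviews"
b = "http://www.example.com/glee-is-awesome/reviews?sessionid=1234"
print(strip_ignored(a) == strip_ignored(b))  # True
```

Both URLs from the example reduce to the same string, which is why a crawler told to ignore sessionid only needs to fetch one of them.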

The 2010 update solved a similar but slightly different use case. For example, http://www.example.com/glee-is-awesome/episodes?sort=newest and http://www.example.com/glee-is-awesome/episodes?sort=oldest both load the same content in different sort orders, but the sort parameter can't be ignored: without it (http://www.example.com/glee-is-awesome/episodes), the content won't load at all. In this case, Googlebot needs the parameter but only one preferred value of it. You can specify, for instance, "newest" as the value for the "sort" parameter, and Google will use that version of the URL as the canonical one. Google will then index that version and consolidate links from all other versions into it. Google applies this value whenever it encounters the parameter anywhere on the site. Note that you have to choose a listed value; you can't manually type in a value that Google hasn't yet crawled.
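A minimal sketch of the "preferred value" idea, assuming a hypothetical PREFERRED_VALUES mapping and made-up inbound link counts: every URL carrying sort is rewritten to sort=newest, and link counts are summed onto that canonical URL. This illustrates the concept only; it is not Google's actual canonicalization logic.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical setting: whenever "sort" appears, treat "newest" as canonical.
PREFERRED_VALUES = {"sort": "newest"}

def canonical_url(url, preferred=PREFERRED_VALUES):
    """Rewrite each pinned parameter to its preferred value."""
    parts = urlsplit(url)
    params = [(k, preferred.get(k, v))
              for k, v in parse_qsl(parts.query, keep_blank_values=True)]
    return urlunsplit(parts._replace(query=urlencode(params)))

def consolidate_links(inbound):
    """Sum inbound link counts (URL -> count) onto each canonical URL."""
    totals = {}
    for url, count in inbound.items():
        key = canonical_url(url)
        totals[key] = totals.get(key, 0) + count
    return totals

base = "http://www.example.com/glee-is-awesome/episodes"
# Made-up link counts, purely for illustration.
print(consolidate_links({base + "?sort=newest": 3, base + "?sort=oldest": 2}))
# {'http://www.example.com/glee-is-awesome/episodes?sort=newest': 5}
```

Unlike the ignore case, the parameter is kept; only its value is normalized, so the bare URL without sort is never produced.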

Google says it has seen a positive impact from these features so far. Now the search giant has improved the feature again, enabling site owners to specify how a parameter changes the content of the page.

So what's new? You can now specify whether or not a parameter changes the content on the page. For parameters that don't, once you specify that, you're done! Things get more complicated if a parameter does change the content on the page; in that case you have a number of options, described in more detail below. This latest incarnation of the feature affects both which URLs are crawled and how the parameters are handled.

First, some background on what URL parameters are, for those who are new to them:

Google says:

Crawling and indexing pages with identical content is an inefficient use of our resources. It can limit the number of pages we can crawl on your site, and duplicate content in our index can hinder your pages' performance in our search results. Duplicate content often occurs when sites make the same content available via different URLs, for example by using session IDs or other parameters, like this:

http://www.example.com/products/women/dresses/green.htm
http://www.example.com/products/women?category=dresses&color=green
http://example.com/shop/index.php?product_id=32&highlight=green+dress&cat_id=1&sessionid=123&affid=431

In this case, all these URLs point to the same content: a collection of real green dresses.

When Google detects duplicate content, such as variations caused by URL parameters, we group the duplicate URLs into one cluster and select what we think is the "best" URL to represent the cluster in search results. We then consolidate properties of the URLs in the cluster, such as link popularity, to the representative URL. Consolidating properties from duplicates into one representative URL often provides users with more accurate search results.
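Google doesn't describe how it fingerprints pages or picks the "best" URL, so the following Python sketch makes loud assumptions: pages are grouped by an exact SHA-256 hash of their body, the shortest URL in each cluster is chosen as the representative, and inbound link counts (invented for the example) are summed onto it. Real duplicate detection is far fuzzier than an exact hash.

```python
import hashlib
from collections import defaultdict

def fingerprint(body):
    """Exact-match fingerprint of a page body (an assumption; real systems are fuzzier)."""
    return hashlib.sha256(body.encode("utf-8")).hexdigest()

def cluster_and_consolidate(pages, inbound_links):
    """Group URLs with identical bodies, pick one representative per cluster,
    and sum the cluster's inbound link counts onto it.

    pages: URL -> page body; inbound_links: URL -> link count.
    Returns: representative URL -> consolidated link count.
    """
    clusters = defaultdict(list)
    for url, body in pages.items():
        clusters[fingerprint(body)].append(url)

    result = {}
    for urls in clusters.values():
        # Hypothetical "best URL" rule: shortest URL, ties broken alphabetically.
        representative = min(urls, key=lambda u: (len(u), u))
        result[representative] = sum(inbound_links.get(u, 0) for u in urls)
    return result

green = "<h1>Green dresses</h1>"  # stand-in page body
pages = {
    "http://www.example.com/products/women/dresses/green.htm": green,
    "http://www.example.com/products/women?category=dresses&color=green": green,
    "http://example.com/shop/index.php?product_id=32&highlight=green+dress&cat_id=1&sessionid=123&affid=431": green,
}
links = {url: n for url, n in zip(pages, [4, 1, 2])}  # made-up counts
print(cluster_and_consolidate(pages, links))
# {'http://www.example.com/products/women/dresses/green.htm': 7}
```

The shortest-URL rule is only a convenient stand-in; the point is that the cluster collapses to one URL that inherits the combined link signal.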

To improve this process, we recommend using the parameter handling tool to give Google information about how to handle URLs containing specific parameters. We'll do our best to take this information into account; however, there may be cases when the provided suggestions do more harm than good for a site.

In general, URL parameters fall into one of two categories:

  • Parameters that don't change page content: for example, sessionid, affiliateid. Parameters like these are often used to track visits and referrers. They have no effect on the actual content of the page. For example, the following URLs all point to the exact same content:
          http://www.example.com/products/women/dresses?sessionid=12345
          http://www.example.com/products/women/dresses?sessionid=34567
          http://www.example.com/products/women/dresses?sessionid=34567&source=google.com
        
  • Parameters that change or determine the content of a page: for example, brand, gender, country, sortorder. For example, a parameter can affect content as follows:
    • Sorts (for example, sort=price_ascending): Changes the order in which content is presented.
    • Narrows (for example, t-shirt_size=XS): Filters the content on the page.
    • Specifies (for example, store=women): Determines the set of content displayed on a page.
    • Translates (for example, lang=fr): Displays a translated version of the content.
    • Paginates (for example, page=2): Displays a specific page of a long listing or article.
    • Other: Changes content in ways other than those described above.
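The two categories above can be modeled as a small lookup table. In this sketch the Effect enum, the PARAM_EFFECTS mapping, and content_key are hypothetical names; unknown parameters are conservatively treated as content-changing, and URLs are compared only on the parameters that might change content.

```python
from enum import Enum
from urllib.parse import urlsplit, parse_qsl

class Effect(Enum):
    NONE = "no change"  # tracking parameters such as sessionid, affiliateid
    SORTS = "sorts"
    NARROWS = "narrows"
    SPECIFIES = "specifies"
    TRANSLATES = "translates"
    PAGINATES = "paginates"
    OTHER = "other"

# Hypothetical classification for one example site.
PARAM_EFFECTS = {
    "sessionid": Effect.NONE,
    "affiliateid": Effect.NONE,
    "source": Effect.NONE,
    "sort": Effect.SORTS,
    "t-shirt_size": Effect.NARROWS,
    "store": Effect.SPECIFIES,
    "lang": Effect.TRANSLATES,
    "page": Effect.PAGINATES,
}

def content_key(url, effects=PARAM_EFFECTS):
    """Identify a page by host, path, and only the parameters that may
    change its content. Unknown parameters count as OTHER (content-changing)."""
    parts = urlsplit(url)
    params = sorted(
        (k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
        if effects.get(k, Effect.OTHER) is not Effect.NONE
    )
    return (parts.netloc, parts.path, tuple(params))
```

With this table, the three sessionid URLs listed earlier produce the same key, while adding page=2 produces a different one. Sorting the remaining parameters also makes the key independent of parameter order.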

Multiple parameters

A single URL may contain many parameters, and you can specify settings for each of them. More restrictive settings override less restrictive settings. For example, here are three parameters and their settings:

  • shopping-category (Every URL)
  • sort-by (Only URLs with value = production-year)
  • sort-order (Only URLs with value = asc)

Based on these settings, Google would crawl the following URL: www.example.com?shopping-category=DVD-movies&sort-by=production-year&sort-order=asc.

However, Google would not crawl this URL: www.example.com?shopping-category=shoes&sort-by=size&sort-order=asc. This is because the settings tell Google to crawl only those URLs where the value of the sort-by parameter equals production-year. Because shoes are never sorted by production year, this overly restrictive setting results in a lot of content going uncrawled.
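The "more restrictive wins" rule amounts to a logical AND across parameters: a URL is crawled only if each of its parameters passes its own setting. Here's a hedged Python sketch; the SETTINGS encoding and should_crawl are hypothetical, parameters without a setting are simply allowed, and the example URLs are given an http:// scheme for clarity.

```python
from urllib.parse import urlsplit, parse_qsl

# Hypothetical encoding of the three settings above:
# ("every", None) = crawl any value; ("only", x) = crawl only value x.
SETTINGS = {
    "shopping-category": ("every", None),
    "sort-by": ("only", "production-year"),
    "sort-order": ("only", "asc"),
}

def should_crawl(url, settings=SETTINGS):
    """Crawl a URL only if each of its parameters passes its own setting,
    so the most restrictive setting decides. Unlisted parameters pass."""
    for name, value in parse_qsl(urlsplit(url).query, keep_blank_values=True):
        mode, allowed = settings.get(name, ("every", None))
        if mode == "only" and value != allowed:
            return False
    return True

dvds = "http://www.example.com/?shopping-category=DVD-movies&sort-by=production-year&sort-order=asc"
shoes = "http://www.example.com/?shopping-category=shoes&sort-by=size&sort-order=asc"
print(should_crawl(dvds), should_crawl(shoes))  # True False
```

The shoes URL passes the shopping-category and sort-order checks but fails on sort-by=size, which is exactly how one overly narrow setting can hide a whole section of a site.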

To access the feature, follow the steps below:

  1. On the Dashboard, under Site configuration, click URL parameters.
  2. Next to the parameter you want, click Edit. (If the parameter isn’t listed, click Add parameter. Note that this tool is case sensitive, so be sure to type your parameter exactly as it appears in your URL.)
  3. If the parameter doesn't affect the content displayed to the user, select No ... in the Does this parameter change... list, and then click Save. If the parameter does affect the display of content, click Yes: Changes, reorders, or narrows page content, and then select how you want Google to crawl URLs with this parameter.
    • Let Googlebot decide. Select if you're unsure of the parameter's behavior, or if the behavior changes for different parts of the site. Googlebot will analyze your site to determine how best to handle the parameter. This is a good general option.
    • Every URL. Googlebot will use the value of this parameter to determine if a URL is unique. For example, www.example.com/dresses/real.htm?productid=1202938 will be considered an entirely different URL from www.example.com/dresses/real.htm?productid=5853729. Before selecting this option, be sure that the parameter really does change the page content; otherwise, Googlebot might unnecessarily crawl duplicate content on your site.
    • Only URLs with value=x. Googlebot will crawl only those URLs where the value of this parameter matches this specified value. URLs with a different parameter value won’t be crawled. This is useful if, for example, your site uses the parameter value to change the order in which otherwise identical content is displayed. For example, www.example.com/dresses/real.htm?sort=price_high contains the same content as www.example.com/dresses/real.htm?sort=price_low. Use this setting to tell Googlebot to crawl only those URLs where sort=price_low (thus avoiding crawling duplicate content).
    • No URLs. Googlebot won't crawl any URLs containing this parameter. For example, telling Googlebot not to crawl URLs with parameters such as pricefrom and priceto (like http://www.example.com/search?category=shoe&brand=nike&color=red&size=5&pricefrom=10&priceto=1000) can prevent the unnecessary crawling of content already available from http://www.example.com/search?category=shoe&brand=nike&color=red&size=5.
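The four crawl options can be sketched as a per-parameter policy. This is an illustration under stated assumptions, not Googlebot's behavior: the Crawl enum and SETTINGS are hypothetical, and "Let Googlebot decide" is treated here as "don't filter", since Googlebot's own analysis isn't documented.

```python
from enum import Enum
from urllib.parse import urlsplit, parse_qsl

class Crawl(Enum):
    LET_GOOGLEBOT_DECIDE = "let Googlebot decide"
    EVERY_URL = "every URL"
    ONLY_VALUE = "only URLs with value=x"
    NO_URLS = "no URLs"

# Hypothetical settings mirroring the examples in the list above.
SETTINGS = {
    "productid": (Crawl.EVERY_URL, None),
    "sort": (Crawl.ONLY_VALUE, "price_low"),
    "pricefrom": (Crawl.NO_URLS, None),
    "priceto": (Crawl.NO_URLS, None),
}

def should_crawl(url, settings=SETTINGS):
    """Apply each parameter's crawl option; any single rejection wins."""
    for name, value in parse_qsl(urlsplit(url).query, keep_blank_values=True):
        mode, wanted = settings.get(name, (Crawl.LET_GOOGLEBOT_DECIDE, None))
        if mode is Crawl.NO_URLS:
            return False  # any URL containing this parameter is skipped
        if mode is Crawl.ONLY_VALUE and value != wanted:
            return False  # a different value of a pinned parameter
        # EVERY_URL: each value is a distinct, crawlable URL.
        # LET_GOOGLEBOT_DECIDE: Googlebot's own analysis isn't modeled here,
        # so this sketch simply does not filter on it (an assumption).
    return True
```

For instance, the pricefrom/priceto search URL is rejected while the same search without the price range is crawled, and both productid URLs remain crawlable as distinct pages.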

[Source: Webmaster Tools Help]