Google Custom Search Engine (CSE) - Specifying URL patterns

Google Custom Search Engine (CSE) is quite easy to build and offers a very powerful way of building your own search engine on top of Google search. You can exclude sites, add labels for drill-down and even change the ranking of results for your search engine. In this blog post, we’ll look at the basic element of Custom Search: “URL patterns”.

URL patterns specify the part of the web you want to search or exclude from your search. Custom Search is based on approximation algorithms that use these patterns to give you your customized results.

Consider the “I Love Veggies” search engine that we created. Here's how the “I Love Veggies” search engine made use of patterns effectively:

  • Be very specific. Use the longest possible pattern for specifying a site. For example, in the "I Love Veggies" search engine, we wanted to search all of www.goveg.com, so we added “www.goveg.com/*” as a pattern. But we wanted to search only the vegetarian part of the “allrecipes.com” site. So instead of adding all of “allrecipes.com/*”, we added the more specific “allrecipes.com/Recipes/Everyday-Cooking/Vegetarian/*”.
  • Specify multiple pages in a site with a "*" at the end of the pattern. If you specify just "www.goveg.com", Custom Search will search only the single page http://www.goveg.com. You need to remember this only if you write your XML file of annotations directly. If you use the Control Panel, it automatically adds the "/*" at the end for you, unless you indicate otherwise.
  • Sometimes, you might have a few hosts on a domain with the same path that you want to search. In our example, we wanted to search "mideastfood.about.com/od/vegetarianrecipes/*" and "indianfood.about.com/od/vegetarianrecipes/*". In such a case, it is better to specify these patterns individually instead of using a very general "*.about.com/od/vegetarianrecipes/*", because the more specific the patterns, the better the approximation.
  • In the hostname, you can use the * only at the beginning of the pattern, and it can only represent a full token (a complete dot-separated part of the hostname). For example, "*.about.com/*" is a valid pattern and so is "*.food.about.com/*". However, "*ood.about.com/*" is not valid, nor is "food.*.about.com/*". [Google Custom Search Blog]
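
The rules above can be sketched in a few lines of Python. This is only an illustration of how the patterns read, not Google's actual matching logic (which the post describes as approximate). The function names are hypothetical, and the sketch assumes that a leading "*." stands for one or more whole hostname parts and that a pattern without a trailing "*" refers to a single page.

```python
def is_valid_host_wildcard(pattern: str) -> bool:
    """Check the hostname wildcard rule: '*' may appear only as a leading '*.'."""
    host = pattern.split("/", 1)[0]
    if "*" not in host:
        return True
    # Assumption: the only accepted form is "*." followed by a wildcard-free host.
    return host.startswith("*.") and len(host) > 2 and "*" not in host[2:]


def _strip_scheme(url: str) -> str:
    for scheme in ("http://", "https://"):
        if url.startswith(scheme):
            return url[len(scheme):]
    return url


def _split_host_path(text: str) -> tuple[str, str]:
    host, _, path = text.partition("/")
    return host.lower(), "/" + path  # the path always starts with "/"


def matches(pattern: str, url: str) -> bool:
    """Illustrative check of whether a URL falls under a URL pattern."""
    if not is_valid_host_wildcard(pattern):
        raise ValueError(f"invalid hostname wildcard in pattern: {pattern}")
    p_host, p_path = _split_host_path(pattern)
    u_host, u_path = _split_host_path(_strip_scheme(url))

    if p_host.startswith("*."):
        # Assumption: "*.about.com" covers hosts that end in ".about.com".
        if not u_host.endswith(p_host[1:]):
            return False
    elif u_host != p_host:
        return False

    if p_path.endswith("*"):
        # Trailing "*": everything under the given path prefix.
        return u_path.startswith(p_path[:-1])
    # No trailing "*": just the single page.
    return u_path.rstrip("/") == p_path.rstrip("/")


if __name__ == "__main__":
    for p in ["*.about.com/*", "*.food.about.com/*",
              "*ood.about.com/*", "food.*.about.com/*"]:
        print(p, is_valid_host_wildcard(p))
    print(matches("www.goveg.com", "http://www.goveg.com/recipes"))
    print(matches("www.goveg.com/*", "http://www.goveg.com/recipes"))
```

Running the sketch against the examples from this post shows why the trailing "/*" matters: "www.goveg.com" alone covers only the home page, while "www.goveg.com/*" covers every page under it.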