In a Bing Search blog post, Jim Kleban provides an overview of Bing Speller technology, with some examples of recent improvements shipped in December 2012. According to the post, “Bing Speller handles tens of thousands of queries per second and computes corrections within tens of milliseconds.”
“In close collaboration with Microsoft Research we have developed advanced machine learning models to build a great speller. The team working on speller relevance runs thousands of experiments every week – ranging from improving data freshness to improving ranking fundamentals – to deliver a better search experience,” Kleban wrote.
Bing’s Speller processes tens of millions of data points mined from searches, web pages, clicks and user actions to help you find the best possible results. Bing even adapts to constantly evolving vocabularies on the internet!
For example: “Is it Swarzinegar, Swarneger, Scwarznagger or Schwartiznegar?” These are just a few of the more than 2,000 different ways users on Bing have typed their queries in the hope of finding “Schwarzenegger.” “You can see that many of the characters are missing between the misspelled and the corrected version. The Bing Speller employs statistical mapping schemes and phonetic similarity measures to identify and correct the misspellings,” he said.
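To make the idea of phonetic similarity concrete, here is a minimal sketch in Python. It is not Bing's implementation, whose details the post does not describe. The `phonetic_key` rules, the `correct` helper and the tiny vocabulary are all assumptions made up for illustration: each word is reduced to a crude sound-alike key, and the candidate whose key is closest by edit distance wins.

```python
import re


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with the standard dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def phonetic_key(word: str) -> str:
    """Hypothetical, deliberately crude sound-alike key (not a standard algorithm)."""
    w = word.lower()
    # Fold a few letter groups that tend to sound alike.
    for src, dst in (("sch", "s"), ("ck", "k"), ("ph", "f")):
        w = w.replace(src, dst)
    # Keep the first letter, then drop vowels and the weak consonants h, w, y.
    key = w[:1] + "".join(ch for ch in w[1:] if ch not in "aeiouhwy")
    # Collapse runs of the same letter ("gg" -> "g").
    return re.sub(r"(.)\1+", r"\1", key)


def correct(query: str, vocabulary: list[str]) -> str:
    """Pick the vocabulary entry closest by phonetic key, then by raw spelling."""
    q_key = phonetic_key(query)
    return min(
        vocabulary,
        key=lambda cand: (edit_distance(q_key, phonetic_key(cand)),
                          edit_distance(query.lower(), cand.lower())),
    )


# Toy vocabulary, assumed for illustration only.
VOCABULARY = ["schwarzenegger", "stallone", "spielberg", "washington"]

for typo in ["Swarzinegar", "Swarneger", "Scwarznagger", "Schwartiznegar"]:
    print(typo, "->", correct(typo, VOCABULARY))
```

With this toy vocabulary, all four misspellings quoted above map to “schwarzenegger,” because their keys differ from its key by at most one character. A production speller would presumably learn such mappings statistically from query logs rather than rely on hand-written rules.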
Sometimes, even when an individual word is spelled correctly, the Bing Speller corrects it to better suit the searcher’s intent. “One of the most powerful clues we have to find the right spelling is the context of the query,” Kleban explained. For example, someone recently searched for the phrase “how can you sea if money is reel.” In this case, the Bing Speller corrects “sea” to “see” and “reel” to “real.” Though the two words were correctly spelled, they don’t make sense in the context of the search. “The correction has a big impact on the quality of the web results,” Kleban added.
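Context-sensitive correction of this kind can be sketched with a toy bigram model. The example below is an illustration under stated assumptions, not a description of how the Bing Speller works: the confusion sets, the bigram counts and the `correct_query` function are all invented here, and a real system would derive such statistics from very large query and click logs.

```python
from collections import Counter

# Hypothetical sets of correctly spelled words that are easily confused.
CONFUSION_SETS = {
    "sea": ["see"], "see": ["sea"],
    "reel": ["real"], "real": ["reel"],
}

# Made-up bigram counts standing in for statistics mined from real text.
BIGRAM_COUNTS = Counter({
    ("you", "see"): 50, ("see", "if"): 40,
    ("you", "sea"): 1, ("sea", "if"): 1,
    ("is", "real"): 60, ("is", "reel"): 2,
    ("the", "sea"): 80, ("fishing", "reel"): 30,
})


def context_score(prev, word, nxt):
    """Sum the bigram counts that link a word to its left and right neighbors."""
    score = 0
    if prev is not None:
        score += BIGRAM_COUNTS[(prev, word)]
    if nxt is not None:
        score += BIGRAM_COUNTS[(word, nxt)]
    return score


def correct_query(query: str) -> str:
    """Swap a word for a confusable alternative only if the context scores it strictly higher."""
    words = query.lower().split()
    for i, word in enumerate(words):
        prev = words[i - 1] if i > 0 else None
        nxt = words[i + 1] if i + 1 < len(words) else None
        best, best_score = word, context_score(prev, word, nxt)
        for candidate in CONFUSION_SETS.get(word, []):
            score = context_score(prev, candidate, nxt)
            if score > best_score:  # ties keep the user's original word
                best, best_score = candidate, score
        words[i] = best
    return " ".join(words)


print(correct_query("how can you sea if money is reel"))
print(correct_query("the sea is calm"))
```

Under these invented counts, the first query is rewritten with “see” and “real,” while “the sea is calm” is left alone, because “sea” fits its neighbors there better than “see” does.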
The aim of the Bing Speller is to correct these queries so that users receive relevant web results that match their intent even when their query is misspelled. “It takes state-of-the-art machine learning, statistical modeling, information retrieval, and significant engineering muscle to deliver high quality web scale spell correction at high speeds,” Kleban wrote.