The Research @ Google team has explained the company's Table Search initiative and its recent improvements. "Organizing the collection of structured data that the web offers in tables and helping users find the most useful tables is a key mission of Table Search from Google Research," wrote Johnny Chen, Product Manager at Google Research, in a blog post titled "Better table search through Machine Learning and Knowledge."
Chen acknowledges that Table Search is still a long way from perfect, but "we made a few steps forward recently by revamping how we determine which tables are 'good' (one that contains meaningful structured data) and which ones are 'bad' (for example, a table that holds the layout of a Web page)."
In particular, the team "switched from a rule-based system to a machine learning classifier that can tease out subtleties from the table features and enables rapid quality improvement iterations." Chen explains that "this new classifier is a support vector machine (SVM) that makes use of multiple kernel functions which are automatically combined and optimized using training examples. Several of these kernel combining techniques were in fact studied and developed within Google Research."
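The post does not describe the classifier in more detail, so what follows is only a minimal sketch of the general idea: a support vector machine whose kernel is a weighted sum of base kernels, with the weights learned from labeled examples. Everything concrete here is an assumption for illustration: the three table features, the toy training set, the two base kernels (linear and RBF), kernel-target alignment as the combining rule (one simple technique among many; the post does not say which one Table Search uses), and a small dual coordinate descent solver standing in for a production SVM library.

```python
import math

# Hypothetical features for one table: (fraction of numeric cells,
# fraction of empty cells, 1.0 if a header row was detected else 0.0).
# The post does not list Table Search's real features.
TRAIN_X = [
    (0.80, 0.05, 1.0), (0.60, 0.10, 1.0), (0.70, 0.00, 1.0), (0.90, 0.10, 0.0),
    (0.10, 0.60, 0.0), (0.00, 0.80, 0.0), (0.20, 0.50, 1.0), (0.05, 0.70, 0.0),
]
TRAIN_Y = [1, 1, 1, 1, -1, -1, -1, -1]  # +1 = "good" data table, -1 = "bad" layout table


def linear_kernel(a, b):
    return sum(x * y for x, y in zip(a, b))


def rbf_kernel(a, b, gamma=1.0):
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))


BASE_KERNELS = [linear_kernel, rbf_kernel]


def alignment(kernel, xs, ys):
    """Kernel-target alignment <K, yy^T>_F / (||K||_F * ||yy^T||_F)."""
    n = len(xs)
    K = [[kernel(a, b) for b in xs] for a in xs]
    num = sum(K[i][j] * ys[i] * ys[j] for i in range(n) for j in range(n))
    norm_k = math.sqrt(sum(v * v for row in K for v in row))
    return num / (norm_k * n)  # ||yy^T||_F == n for labels in {-1, +1}


def learn_weights(xs, ys):
    """Weight each base kernel by its non-negative alignment, normalized to sum to 1."""
    scores = [max(alignment(k, xs, ys), 0.0) for k in BASE_KERNELS]
    total = sum(scores)
    if total == 0.0:
        return [1.0 / len(scores)] * len(scores)
    return [s / total for s in scores]


def combined_kernel(a, b, weights):
    # Weighted sum of base kernels; the constant +1.0 absorbs the SVM bias term.
    return sum(w * k(a, b) for w, k in zip(weights, BASE_KERNELS)) + 1.0


def train_svm(xs, ys, weights, C=100.0, epochs=200):
    """Dual coordinate descent for a hinge-loss SVM with box constraint 0 <= alpha_i <= C."""
    n = len(xs)
    K = [[combined_kernel(a, b, weights) for b in xs] for a in xs]
    alpha = [0.0] * n
    for _ in range(epochs):
        for i in range(n):
            margin = ys[i] * sum(alpha[j] * ys[j] * K[j][i] for j in range(n))
            # Exact minimization of the dual along coordinate i, then clip to [0, C].
            alpha[i] = min(max(alpha[i] - (margin - 1.0) / K[i][i], 0.0), C)
    return alpha


def predict(x, xs, ys, alpha, weights):
    score = sum(a * y * combined_kernel(xi, x, weights) for a, y, xi in zip(alpha, ys, xs))
    return 1 if score >= 0 else -1


weights = learn_weights(TRAIN_X, TRAIN_Y)
alpha = train_svm(TRAIN_X, TRAIN_Y, weights)
print("kernel weights:", weights)
print(predict((0.85, 0.05, 1.0), TRAIN_X, TRAIN_Y, alpha, weights))
print(predict((0.05, 0.75, 0.0), TRAIN_X, TRAIN_Y, alpha, weights))
```

Weighting by alignment favors base kernels whose notion of similarity agrees with the labels, and it picks the weights in a separate step before training. Other multiple kernel learning methods instead optimize the kernel weights jointly with the SVM objective.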
The team also leveraged the Knowledge Graph to better understand the tables. In particular, Chen points to "the improved algorithms for identifying the context and topics of each table, the entities represented in the table and the properties they have." This knowledge helps the classifier "make a better decision on the quality of the table, and also enables better matching of the table to the user query."
Finally, Google now allows users to import Web tables found through Table Search into their Google Drive account as Fusion Tables. "Once in Fusion Tables, the data can be visualized, updated, and accessed programmatically using the Fusion Tables API," Chen said in concluding the post.