Google on Wednesday announced “Dataset Search,” a new search engine for datasets that aims to organize information for the scientific community and help users discover data from the sciences, government, and some news organizations.
The service will be a companion of sorts to Google Scholar, the company’s search engine for academic studies and reports, and lets users find datasets wherever they are hosted online, whether on a publisher’s site, a digital library, or an author’s web page.
Dataset Search can be used by “scientists, data journalists, data geeks, and anyone who is interested in a specific topic or want to satisfy their intellectual curiosity,” Google stated. With this feature, users can search for datasets on topics across “environmental and social sciences, as well as data from other disciplines, including government data and data provided by news organizations, such as ProPublica,” Google said.
Institutions and publishers must include the “dataset schema” metadata tags (schema.org markup) in the web pages that describe their data for those datasets to be eligible for crawling and indexing by Google’s new Dataset Search engine. Google is encouraging data providers to adopt this schema markup so that searchers can easily discover their datasets within this vertical search feature.
Google said, “As more data repositories use the schema.org standard to describe their datasets, the variety and coverage of datasets that users will find in Dataset Search, will continue to grow.”
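As a rough illustration of what such markup can look like, the sketch below builds a schema.org Dataset description and wraps it in a script element for embedding in a web page. It assumes the JSON-LD encoding, which is one common way to express schema.org markup; the function names, the dataset name, and the example.org URL are hypothetical, and the fields shown are not a statement of Google’s exact requirements.

```python
import json


def dataset_jsonld(name, description, url):
    """Return a schema.org Dataset description as a JSON-LD string."""
    record = {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
    }
    return json.dumps(record, indent=2)


def script_tag(jsonld):
    """Wrap JSON-LD in the script element used to embed it in an HTML page."""
    return '<script type="application/ld+json">\n' + jsonld + "\n</script>"


# Hypothetical dataset, for illustration only.
markup = script_tag(
    dataset_jsonld(
        name="Example daily weather observations",
        description="Illustrative placeholder description of a dataset.",
        url="https://example.org/datasets/daily-weather",
    )
)
print(markup)
```

A publisher would place the printed element in the HTML of the page that describes the dataset.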
Google also recently launched a search feature for discovering “tabular data” in Search, focused more on news organizations and data journalists. That initiative uses the same metadata, along with the linked tabular data, to provide answers to queries directly in search results.
To start using Dataset Search, head over to toolbox.google.com/datasetsearch, enter what you are looking for, and Google will guide you to the published dataset on the repository provider’s site.
The example screenshot below shows a [daily weather] query in Dataset Search:
In the search results, you will find data from NASA and NOAA, as well as from academic repositories such as Harvard’s Dataverse and the Inter-university Consortium for Political and Social Research (ICPSR).
Although Dataset Search already supports multiple languages, Google says it will add support for additional languages soon.
Google has also published the schema markup requirements for developers at this site.
“To create Dataset search, we developed guidelines for dataset providers to describe their data in a way that Google (and other search engines) can better understand the content of their pages. These guidelines include salient information about datasets: who created the dataset, when it was published, how the data was collected, what the terms are for using the data, etc. We then collect and link this information, analyze where different versions of the same dataset might be, and find publications that may be describing or discussing the dataset. Our approach is based on an open standard for describing this information (schema.org) and anybody who publishes data can describe their dataset this way. We encourage dataset providers, large and small, to adopt this common standard so that all datasets are part of this robust ecosystem.”
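The salient information named in the quote can be checked against a dataset description with a few lines of code. The sketch below is only an illustration: the mapping from each item to a schema.org Dataset property (`creator`, `datePublished`, `measurementTechnique`, `license`) is an assumption, since the quote does not list property names, and the record and helper function are hypothetical.

```python
# Assumed mapping from the information named in the quote to schema.org
# Dataset properties; the quote itself does not list property names.
SALIENT_PROPERTIES = {
    "who created the dataset": "creator",
    "when it was published": "datePublished",
    "how the data was collected": "measurementTechnique",
    "what the terms are for using the data": "license",
}


def missing_information(record):
    """Return the salient items that a dataset description leaves out."""
    return [
        label
        for label, prop in SALIENT_PROPERTIES.items()
        if not record.get(prop)
    ]


# Hypothetical record, for illustration only.
record = {
    "@context": "https://schema.org/",
    "@type": "Dataset",
    "name": "Example daily weather observations",
    "creator": {"@type": "Organization", "name": "Example Weather Lab"},
    "datePublished": "2018-09-05",
    "license": "https://creativecommons.org/licenses/by/4.0/",
}

print(missing_information(record))
```

For this record the check reports that the description does not say how the data was collected, because no `measurementTechnique` value is present.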