Yahoo! Search Index Now Built with Open-Source Hadoop Architecture


The Yahoo! Search Blog has announced that they’re now using “open-source Apache Hadoop” to process the Webmap -- the application which produces the index from the billions of pages crawled by Yahoo! Search.
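The announcement doesn't show what the Webmap jobs actually look like. As a rough illustration only, here is a single-process Python sketch of the map, shuffle and reduce phases of the MapReduce model that Hadoop implements, using a toy inverted index (term to list of page URLs) as the example. This is not Yahoo!'s Webmap code. The function names and sample URLs are hypothetical, and a real Hadoop job runs these same phases distributed across a cluster rather than in one process.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical toy example of the MapReduce model; not Yahoo!'s Webmap code.


def map_page(url: str, text: str) -> Iterator[Tuple[str, str]]:
    """Map phase: emit a (term, url) pair for every term on a crawled page."""
    for term in text.lower().split():
        yield term, url


def shuffle(pairs: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Shuffle phase: group emitted values by key, as the framework would."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_term(term: str, urls: List[str]) -> Tuple[str, List[str]]:
    """Reduce phase: collapse a term's URLs into a sorted, de-duplicated list."""
    return term, sorted(set(urls))


def build_index(pages: Dict[str, str]) -> Dict[str, List[str]]:
    """Run map, shuffle and reduce over a dict of url -> page text."""
    mapped = (pair for url, text in pages.items() for pair in map_page(url, text))
    return dict(reduce_term(t, urls) for t, urls in shuffle(mapped).items())


if __name__ == "__main__":
    crawl = {
        "http://a.example/": "Hadoop processes the webmap",
        "http://b.example/": "Open source hadoop",
    }
    index = build_index(crawl)
    print(index["hadoop"])  # ['http://a.example/', 'http://b.example/']
```

Because each map call looks at one page and each reduce call looks at one term, both phases can be spread across many machines. That independence is what lets a framework like Hadoop scale the same logic to billions of pages.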

"Our implementation of a Hadoop-based Webmap is part of a larger strategy of Yahoo! moving toward openness, both in our infrastructure and throughout the network (our recent OpenID announcement is another good example). Using open source software is a win-win situation for Yahoo! and the wider community. We achieve cost savings, faster processing, reduced maintenance, and increased scale, and the community can benefit from the myriad improvements it took to make Hadoop viable for such a large-scale commercial implementation."

Matt McAlister also posted about the Hadoop implementation. Hadoop replaces the proprietary system Yahoo! used previously, and its benefits include cost savings and scalability.

The irony of this development, however, is that it comes just as Microsoft may be about to take over Yahoo!. Microsoft is known for proprietary technology, the opposite of what's going on here. Jeremy Zawodny also recorded a video interview with two of the engineers who worked on the project.
