Microsoft is releasing two new Hadoop connectors to help customers exploit the benefits of unstructured data in both SQL and non-SQL environments. “We will soon release a Community Technology Preview (CTP) of two new Hadoop connectors – one for SQL Server and one for PDW.
Connectors will include:
- Hadoop to SQL Server Parallel Data Warehouse (PDW) for large data volumes.
- Hadoop to SQL Server 2008 R2 or SQL Server ‘Denali’ software,” announced the SQL Server team.
“The connectors provide interoperability between SQL Server/PDW and Hadoop environments, enabling customers to transfer data between Hadoop and SQL Server/PDW. With these connectors, customers can more easily integrate Hadoop with their Microsoft Enterprise Data Warehouses and Business Intelligence solutions to gain deeper business insights from both structured and unstructured data.”
Microsoft brings over a decade of Big Data expertise to the market. For instance, the company uses it at Bing to deliver the best search results (over 100 PB of data). Over the years Microsoft has invested steadily in unstructured data, including support for binary files, FILESTREAM in SQL Server, semantic search, FileTable, StreamInsight and geospatial data types.
“Microsoft understands that customers are working with unstructured data in different environments such as Hadoop; we are committed to providing these customers with interoperability to enable them to move data between their Hadoop and SQL Server environments,” notes the SQL Server team.
For those new to Hadoop, the “Apache Hadoop software library is a framework that supports distributed processing of large data sets across clusters of computers using a simple programming model. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage.
Hadoop is highly scalable and can support petabytes of data. One of its key attractions is cost: through the use of commodity servers, Hadoop dramatically reduces the cost of analyzing large data volumes. As an example, there is an application of Hadoop at the New York Times that processed 4 TB of images, producing up to 11 million PDF files in 24 hours for only $240 in computational cost.”
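The “simple programming model” mentioned in the quote is generally understood to be MapReduce. For readers who have not seen it, here is a rough, single-machine sketch of the idea in plain Python. It does not use Hadoop or the new connectors at all; the function names (map_phase, shuffle, reduce_phase, run_job) and the sample input are made up for illustration. On a real cluster, the framework would run the map and reduce steps in parallel across many machines and handle the grouping step itself.

```python
from collections import defaultdict


def map_phase(record):
    """Emit (word, 1) pairs for one input record (here, a line of text)."""
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group values by key, standing in for what the framework does
    between the map and reduce steps."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values collected for one key into a single result."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle and reduce in sequence on one machine."""
    pairs = (pair for record in records for pair in map_phase(record))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical sample input, not taken from the announcement.
    lines = ["big data big insights", "data moves between systems"]
    print(run_job(lines))
```

Because each map call looks at one record and each reduce call looks at one key, the work can be split across as many machines as are available, which is what lets the approach scale from a single server to thousands.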
[Via: SQL Server team]