Microsoft SQL Server 2008 ETL World Record

At the launch of SQL Server 2008, you may have seen the references to world-record performance doing a load of data using SSIS. Microsoft and Unisys announced a record for loading data into a relational database using an Extract, Transform and Load (ETL) tool. Over 1 TB of TPC-H data was loaded in under 30 […]

At the launch of SQL Server 2008, you may have seen the references to world-record performance doing a load of data using SSIS. Microsoft and Unisys announced a record for loading data into a relational database using an Extract, Transform and Load (ETL) tool. Over 1 TB of TPC-H data was loaded in under 30 minutes. I wanted to provide some background material in the form of a Q&A on the record, since it’s hard to give many details in the context of a launch event. We are also planning a paper that talks about all this, so think of this article as a place-holder until the full paper comes along. I hope you find this background information useful.

How fast was the data load? More than one terabyte of data was parsed from flat files, transferred over the network and loaded into the destination database in less than 30 minutes, a world record beating all previously published results using an ETL tool.  That is a rate in excess of 2 TB per hour (650+ MB/second).   To be precise, 1.18TB of flat file data was loaded in 1794 seconds.  This is equivalent to 1.00TB in 25 minutes 20 seconds or 2.36TB per hour.

Why is this important? Businesses have ever-increasing volumes of data stored in many heterogeneous systems.  Thay want to know that an ETL tool they choose will be able to support any data volumes they might require.  Microsoft has been making a significant investment in SQL Server Integration Services (SSIS), and this record illustrates the capability of SQL Server Integration Services 2008, SQL Server 2008 and the Unisys ES7000 to handle a significant volume of data at a dramatic speed.

Why not just do a bulk load of the data? It is rare in businesses today that data is always available on the destination system, and does not need to be standardized or corrected for errors before loading. These rare cases are the times that bulk loading data makes sense. Data integration can involve complex transformation rules, error checking and data standardization techniques. ETL tools like SSIS can perform these functions such as moving data between systems, reformatting data, integrity checking, key lookups, tracking lineage, and more. SSIS has proven itself to be a versatile ETL tool, and now it is shown to be the fastest one as well.

Full Article

Microsoft, SQL Server 2008, ETL, Database, Launch, Benchmarks, Announcement, Performance, SSIS, Integration Services, SQL Server Performance