On November 22, 2008, Google announced the results of the first ever “petasort” (sorting a petabyte-worth of 100-byte records, following the Sort Benchmark rules) “completed in just over six hours on 4000 computers.”
Now, the company repeated the experiment using 8000 computers — and the “execution time was 33 minutes,” an order of magnitude improvement, revealed member of Google’s Systems Infrastructure over at Google Research blog.
The team said “We are excited by these results. While internal improvements to the MapReduce framework contributed significantly, a large part of the credit goes to numerous advances in Google’s hardware, cluster management system, and storage stack.”
“MapReduce, is a key framework for running multiple processes simultaneously at Google. Thousands of applications, supporting most services offered by Google, have been expressed in MapReduce.”
“While not many MapReduce applications operate at a petabyte scale, some do. Their scale is likely to continue growing quickly. The need to help such applications scale motivated us to experiment with data sets larger than one petabyte. In particular, sorting a ten petabyte input set took 6 hours and 27 minutes to complete on 8000 computers. We are not aware of any other sorting experiment successfully completed at this scale,” the team noted.
The team also announced of looking for candidates for software engineering position with Google.