TidyFS: Microsoft's Distributed File System for Parallel Computations on Clusters


Later this week at the Usenix '11 conference, the Microsoft researchers behind TidyFS will share more about their work publicly.

TidyFS is a distributed file system under development by Microsoft Research for parallel computations on clusters. On commodity "shared-nothing" clusters, the primary workloads tend to be generated by distributed execution engines like MapReduce, Hadoop or Microsoft's Dryad, the Microsoft researchers note in the abstract of their presentation. Other vendors have created distributed file systems for these workloads, such as the Google File System (GFS) and the Hadoop Distributed File System (HDFS). Microsoft has one in development, too: TidyFS.

The architectural diagram below, from Microsoft, shows how researchers envisioned a year ago that TidyFS and other experimental components would fit together:

Per the TidyFS white paper:

The TidyFS storage system is composed of three components: a metadata server; a node service that performs housekeeping tasks running on each cluster computer that stores data; and the TidyFS Explorer, a graphical user interface which allows users to view the state of the system.
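To make the division of labor among those three components concrete, here is a minimal Python sketch. It is not TidyFS code and does not reflect its real interfaces: the class names, the methods, and the idea that "housekeeping" means deleting local data the metadata server no longer attributes to a node are all assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class MetadataServer:
    """Hypothetical stand-in: records which nodes hold which named data items."""
    locations: dict[str, set[str]] = field(default_factory=dict)

    def register(self, item: str, node: str) -> None:
        # Note that `node` holds a copy of `item`.
        self.locations.setdefault(item, set()).add(node)

    def unregister(self, item: str, node: str) -> None:
        # Forget that `node` holds a copy of `item` (no-op if unknown).
        self.locations.get(item, set()).discard(node)


@dataclass
class NodeService:
    """Hypothetical per-machine service that runs housekeeping tasks."""
    name: str
    local_items: set[str] = field(default_factory=set)

    def housekeeping(self, metadata: MetadataServer) -> list[str]:
        # Assumed task: drop local items that the metadata server
        # no longer lists as stored on this node.
        stale = sorted(
            item for item in self.local_items
            if self.name not in metadata.locations.get(item, set())
        )
        self.local_items.difference_update(stale)
        return stale


def explorer_view(metadata: MetadataServer) -> str:
    """Text stand-in for the graphical explorer: one line per item."""
    return "\n".join(
        f"{item}: {', '.join(sorted(nodes)) or '(no copies)'}"
        for item, nodes in sorted(metadata.locations.items())
    )
```

In this toy model the metadata server is the single source of truth, each node service reconciles its local disk against it, and the explorer only reads state. That arrangement is one plausible reading of the white paper's description, not a confirmed design detail.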

Microsoft Research has been actively deploying and using TidyFS for the past year on a research cluster of 256 servers running large-scale, data-intensive computations, according to the white paper. The research cluster is used only for programs run using DryadLINQ, which is a parallelizing compiler for .NET programs that run on Dryad.

"On a typical day, several terabytes of data are read and written to TidyFS through the execution of DryadLINQ programs," the white paper notes.

The experimental TidyFS cluster also makes use of a cluster-wide scheduler, codenamed "Quincy," and a computational cache manager, codenamed "Nectar."

As with all Microsoft research projects, there is no absolute guarantee as to when and if TidyFS will evolve into a commercial product or part of a commercial product.

We've embedded the full white paper below:

[via: All About Microsoft]