Introducing Columnstore Indexes: A New Feature in SQL Server "Denali" Known As Project "Apollo"

Do you have a data warehouse? Do you wish your queries would run faster? Check out the new columnstore index (aka Project "Apollo") in SQL Server Code Name "Denali" as of CTP3."SQL Server's traditional indexes, clustered and nonclustered, are based on the B-tree. B-trees are great for finding data that match a predicate on the […]

Do you have a data warehouse? Do you wish your queries would run faster? Check out the new columnstore index (aka Project "Apollo") in SQL Server Code Name "Denali" as of CTP3.

"SQL Server's traditional indexes, clustered and nonclustered, are based on the B-tree. B-trees are great for finding data that match a predicate on the primary key. They're also reasonably fast when you need to scan all the data in a table," explains Susan Price, Senior Program Manager, SQL Server Database Engine Team.

So why use a column store? There are two main reasons:

  1. Compression. Most general-purpose relational database management systems, including SQL Server, store data in row-wise fashion. This organization is sometimes called a row store. Both heaps and B-trees are row stores because they store the values from each column in a given row contiguously. When you want to find all the values associated with a row, having the data stored together on one page is very efficient. Storing data by rows is less ideal for compressing the data. Most compression algorithms exploit the similarities of a group of values. The values from different columns usually are not very similar. When data is stored row-wise, the number of rows per page is relatively few, so the opportunities to exploit similarity among values are limited. A column store organizes data in column-wise fashion. Data from a single column are stored contiguously. Usually there is repetition and similarity among values within a column. The column store organization allows compression algorithms to exploit that similarity.
  2. Fetching only needed columns. When data is stored column-wise, each column can be accessed independently of the other columns. If a query touches only a subset of the columns in a table, IO is reduced. Data warehouse fact tables are often wide as well long. Typical queries touch only 10 - 15% of the columns. That means a column store can reduce IO by 85 - 90%, a huge speedup in systems that are often IO bound, meaning the query speed is limited by the speed at which needed data can be transferred from disk into memory.

"Columnstore indexes are available in CTP 3 of SQL Server Code Name "Denali." You can create a columnstore index on your table by using a slight variation on existing syntax for creating indexes. To create an index named mycolumnstoreindex on a table named mytable with three columns, named col1, col2, and col3, use the following syntax," informs Price:

CREATE NONCLUSTERED COLUMNSTORE INDEX mycolumnstoreindex ON mytable (col1, col2, col3);

"To avoid typing the names of all the columns in the table, you can use the Object Explorer in Management Studio to create the index as follows:

  1. Expand the tree structure for the table and then right click on the Indexes icon.
  2. Select New Index and then Nonclustered columnstore index
  3. Click Add in the wizard and it will give you a list of columns with check boxes.
  4. You can either choose columns individually or click the box next to Name at the top, which will put checks next to all the columns. Click OK.
  5. Click OK," Price explained.

For more information about using columnstore indexes, check out the MSDN article Columnstore Indexes and SQL Server Columnstore Index FAQ on the TechNet wiki.

[Via:SQL Server Team Blog ]