In memory columnstore indexes--make your data warehouse

DESCRIPTION

Presentation on SQL Server 2012 and 2014 Columnstore Indexing feature presented to Philadelphia SQL BI Usergroup on November 19, 2013

Who Moved My Tuple—Columnstore Indexes in SQL Server 2014

Joe D’Antoni, Philadelphia SQL Server Users Group, 25 March 2014

Joe D’Antoni

Joe has over 15 years of experience with a wide variety of data platforms, at both Fortune 50 companies and smaller organizations

He is a frequent speaker on database administration, big data, and career management

He is the co-president of the Philadelphia SQL Server User’s Group

He wants you to make sure you can restore your data

Joedantoni.wordpress.com – Blog, Slides

http://bit.ly/SQLColumnstore -- Slides, Resources

Agenda

• Indexes: a basic overview

• Columnstore: an introduction

• Query Performance: Demo

• 2012 and 2014: What’s Changing?

• 2014: Demo

Questions

Indexes

• Data structure that speeds up data retrieval by maintaining an extra copy of data

• Can be filtered

• Can be function based, or ordered

• Penalty is that writes become more expensive

• More storage required

Indexes in SQL Server

• Clustered vs. Nonclustered

• Clustered Index—Index Organized Table

• Non-clustered index “just an index”

Clustered Index

• Data is kept in key order as it is inserted into pages

• Data in a clustered index is stored on disk only once (it is the table data itself)

• A table without a clustered index is called a heap: no order at all

Clustered Index Layout

LastName    FirstName  Address          PhoneNumber
Gates       Bill       101 Money Ln     (206)-555-1111
Smith       John       101 Anywhere Rd  (212)-566-1112
Smith       John       181 Uphill Way   (215)-555-2425
Zuckerberg  Mark       1 Hacker Way     (650)-555-9999

New record to be inserted:

Ellison     Larry      1 Oracle Way     (650)-555-1245

After the insert:

LastName    FirstName  Address          PhoneNumber
Ellison     Larry      1 Oracle Way     (650)-555-1245
Gates       Bill       101 Money Ln     (206)-555-1111
Smith       John       101 Anywhere Rd  (212)-566-1112
Smith       John       181 Uphill Way   (215)-555-2425
Zuckerberg  Mark       1 Hacker Way     (650)-555-9999
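The ordered insert shown in the layout above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the whole row tuple stands in for the clustered key, and a sorted list stands in for SQL Server's pages.

```python
import bisect

# Rows are (LastName, FirstName, Address, PhoneNumber). The tuple's natural
# ordering stands in for the clustered key (an assumption for illustration).
clustered = [
    ("Gates", "Bill", "101 Money Ln", "(206)-555-1111"),
    ("Smith", "John", "101 Anywhere Rd", "(212)-566-1112"),
    ("Smith", "John", "181 Uphill Way", "(215)-555-2425"),
    ("Zuckerberg", "Mark", "1 Hacker Way", "(650)-555-9999"),
]

new_row = ("Ellison", "Larry", "1 Oracle Way", "(650)-555-1245")

# insort locates the slot by binary search and shifts later rows down,
# which hints at why writes into an ordered structure cost more.
bisect.insort(clustered, new_row)

print([row[0] for row in clustered])
# ['Ellison', 'Gates', 'Smith', 'Smith', 'Zuckerberg']
```

The new row lands at the front because "Ellison" sorts before "Gates"; nothing is appended at the end.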

Non-Clustered Index

• Duplicate copy of the indexed data in the table

• Provides a pointer from the index to the table data

• Imposes no specific order on the table data itself

Non-Clustered Index Layout

LastName    FirstName  Address          PhoneNumber
Gates       Bill       101 Money Ln     (206)-555-1111
Smith       John       101 Anywhere Rd  (212)-566-1112
Smith       John       181 Uphill Way   (215)-555-2425
Zuckerberg  Mark       1 Hacker Way     (650)-555-9999

New record to be inserted:

Ellison     Larry      1 Oracle Way     (650)-555-1245

After the insert:

LastName    FirstName  Address          PhoneNumber
Gates       Bill       101 Money Ln     (206)-555-1111
Smith       John       101 Anywhere Rd  (212)-566-1112
Smith       John       181 Uphill Way   (215)-555-2425
Zuckerberg  Mark       1 Hacker Way     (650)-555-9999
Ellison     Larry      1 Oracle Way     (650)-555-1245
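A rough Python sketch of the layout above: the table itself is a heap that takes new rows at the end, while a separate structure of (key, row position) pairs acts as the nonclustered index. The names heap, index, insert and seek are hypothetical, and the index is modeled as a sorted list rather than the B-tree SQL Server actually uses.

```python
import bisect

# Heap: rows sit wherever they were inserted; the row id is the list position.
heap = [
    ("Gates", "Bill", "101 Money Ln", "(206)-555-1111"),
    ("Smith", "John", "101 Anywhere Rd", "(212)-566-1112"),
    ("Smith", "John", "181 Uphill Way", "(215)-555-2425"),
    ("Zuckerberg", "Mark", "1 Hacker Way", "(650)-555-9999"),
]

# Hypothetical nonclustered index on LastName: sorted (key, row_id) pairs.
index = sorted((row[0], row_id) for row_id, row in enumerate(heap))


def insert(row):
    """Append to the heap, then add a pointer entry to the index."""
    heap.append(row)
    bisect.insort(index, (row[0], len(heap) - 1))


def seek(last_name):
    """Return every heap row with this LastName by following index pointers."""
    pos = bisect.bisect_left(index, (last_name, -1))
    rows = []
    while pos < len(index) and index[pos][0] == last_name:
        rows.append(heap[index[pos][1]])
        pos += 1
    return rows


insert(("Ellison", "Larry", "1 Oracle Way", "(650)-555-1245"))
print(heap[-1][0])          # Ellison (appended at the end of the heap)
print(index[0])             # ('Ellison', 4)
print(len(seek("Smith")))   # 2
```

Every insert now writes twice, once to the heap and once to the index, which is the write penalty and extra storage mentioned earlier in the deck.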

So Why All This Talk About Indexes?

Data Warehouse Queries

• Data warehouses have a lot of data

• Querying lots of data can take a really long time

• Processing data row by row may not be the most efficient way to perform aggregations

Traditional Approaches To Improving Performance

• Partitioned Tables

• Indexed Views

• Data Compression

Compression in SQL Server

Uncompressed Table

LastName    FirstName  Address          PhoneNumber
Ellison     Larry      1 Oracle Way     (650)-555-1245
Gates       Bill       101 Money Ln     (206)-555-1111
Smith       John       101 Anywhere Rd  (212)-566-1112
Smith       John       181 Uphill Way   (215)-555-2425
Zuckerberg  Mark       1 Hacker Way     (650)-555-9999

Row Compressed Table

LastName    FirstName  Address          PhoneNumber
Ellison     Larry      1 Oracle Way     (650)-555-1245
Gates       Bill       101 Money Ln     (206)-555-1111
Smith       John       101 Anywhere Rd  (212)-566-1112
Smith       John       181 Uphill Way   (215)-555-2425
Zuckerberg  Mark       1 Hacker Way     (650)-555-9999

Page Compressed Table

LastName    FirstName  Address          PhoneNumber
Ellison     Larry      1 ***c** W**     (650)-555-*245
G*t**       B***       *0* M**** **     *2***********
S***h       J***       *** ******** **  *************
*****       ****       *8* Up**** ***   *************
Z********   ****       * ******* ***    *************

Introducing Columnstore Indexes (SQL 2012)

• Data is stored in columns, as opposed to rows

• This allows a much higher rate of compression

• Columns not used in a query are simply not scanned, nor returned

• Recommended practice is to add most columns in a table to the index

Original Table

Fn     Ln      AreaCode  Phone     StNum  StName    StType  City         State
A      Disney  661       872-4547  111    Wilson    Dr      Bakersfield  CA
Al     Disney  530       778-3737  222    Main      St      Lewiston     CA
Amy    Disney  209       577-5824  410    Park      Av      Santa Rosa   CA
Anita  Disney  559       642-4472  89     Ahwahnee  St      San Diego    CA
Anita  Disney  209       966-4472  781    Mariposa  Dr      Napa         CA
Ann    Disney  949       830-1883  3      Amato     Ct      Yountville   CA

Split in Columns

Fn:        A, Al, Amy, Anita, Anita, Ann
Ln:        Disney, Disney, Disney, Disney, Disney, Disney
AreaCode:  661, 530, 209, 559, 209, 949
Phone:     872-4547, 778-3737, 577-5824, 642-4472, 966-4472, 830-1883
StNum:     111, 222, 410, 89, 781, 3
StName:    Wilson, Main, Park, Ahwahnee, Mariposa, Amato
StType:    Dr, St, Av, St, Dr, Ct
City:      Bakersfield, Lewiston, Santa Rosa, San Diego, Napa, Yountville
State:     CA, CA, CA, CA, CA, CA
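The split from rows into columns can be sketched as a simple pivot in Python. The names rows, COLUMNS, column_store and count_by are hypothetical. The point is that a query touching one column only reads that column's list.

```python
from collections import Counter

COLUMNS = ("Fn", "Ln", "AreaCode", "Phone", "StNum",
           "StName", "StType", "City", "State")

# Row-wise storage: one tuple per row, as in the original table.
rows = [
    ("A", "Disney", "661", "872-4547", "111", "Wilson", "Dr", "Bakersfield", "CA"),
    ("Al", "Disney", "530", "778-3737", "222", "Main", "St", "Lewiston", "CA"),
    ("Amy", "Disney", "209", "577-5824", "410", "Park", "Av", "Santa Rosa", "CA"),
    ("Anita", "Disney", "559", "642-4472", "89", "Ahwahnee", "St", "San Diego", "CA"),
    ("Anita", "Disney", "209", "966-4472", "781", "Mariposa", "Dr", "Napa", "CA"),
    ("Ann", "Disney", "949", "830-1883", "3", "Amato", "Ct", "Yountville", "CA"),
]

# Column-wise storage: one list per column.
column_store = {name: [row[i] for row in rows] for i, name in enumerate(COLUMNS)}


def count_by(column_name):
    """Aggregate over a single column; the other eight lists are never read."""
    return Counter(column_store[column_name])


print(column_store["Fn"])            # ['A', 'Al', 'Amy', 'Anita', 'Anita', 'Ann']
print(count_by("AreaCode")["209"])   # 2
```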

Columnstore Compressed

Fn:        A*l*my*nita********
Ln:        Disney******************************
AreaCode:  6615302*9*******4*
Phone:     872-4547***-3*3****-****6**-****9**-******0-1***
StNum:     1112224*0897**3
StName:    WilsonMa**P*rk*hw***e****i*******t*
StType:    DrStAv****C*
City:      BakersfieldL*wi*tonS**** ******* DiegoNapaYountville
State:     CA**********
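To see why a single column compresses so well, here is a minimal Python sketch of two generic columnar techniques, run-length encoding and dictionary encoding. These are stand-ins chosen for illustration, not SQL Server's actual compression algorithm, and the function names are hypothetical.

```python
from itertools import groupby


def run_length_encode(values):
    """Collapse consecutive repeats into (value, count) pairs."""
    return [(value, sum(1 for _ in group)) for value, group in groupby(values)]


def dictionary_encode(values):
    """Replace each value with a small integer id into a sorted dictionary."""
    dictionary = sorted(set(values))
    ids = {value: i for i, value in enumerate(dictionary)}
    return dictionary, [ids[value] for value in values]


state = ["CA", "CA", "CA", "CA", "CA", "CA"]
st_type = ["Dr", "St", "Av", "St", "Dr", "Ct"]

print(run_length_encode(state))     # [('CA', 6)]
print(dictionary_encode(st_type))   # (['Av', 'Ct', 'Dr', 'St'], [2, 3, 0, 3, 2, 1])
```

Values in one column share a type and often repeat, so six State values shrink to a single pair. The same rows stored side by side would interleave unrelated values and repeat far less.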

Columnar Data Storage

From Microsoft SIGMOD Paper

So How Are Columnstores So Much Faster?

• Very good compression ratio for column-oriented data

• Better use of memory

• Segment elimination skips large chunks of data

• Batch mode

• Processes data in batches of about 1,000 rows rather than row by row

• 7-40x CPU savings with batch mode

“The key to getting the best performance is to make sure your queries process the large majority of data in batch mode.”
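Segment elimination can be sketched as follows, assuming each segment keeps the minimum and maximum of its values as metadata. The Segment class, the segment size of 1,000 and the function names are hypothetical and only meant to show the idea.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    values: list

    def __post_init__(self):
        # Metadata kept per segment so a query can skip it without reading it.
        self.min_value = min(self.values)
        self.max_value = max(self.values)


def build_segments(values, segment_size):
    """Cut one column into fixed-size segments."""
    return [Segment(values[i:i + segment_size])
            for i in range(0, len(values), segment_size)]


def sum_where_between(segments, low, high):
    """Sum values in [low, high], touching only segments that can match."""
    total, scanned = 0, 0
    for segment in segments:
        if segment.max_value < low or segment.min_value > high:
            continue  # eliminated using metadata alone
        scanned += 1
        total += sum(v for v in segment.values if low <= v <= high)
    return total, scanned


segments = build_segments(list(range(10_000)), segment_size=1_000)
total, scanned = sum_where_between(segments, 2_500, 2_999)
print(scanned, "of", len(segments), "segments scanned")   # 1 of 10 segments scanned
```

The skipping works well here because the data is sorted, so each segment covers a narrow range. With randomly ordered values most segments would overlap the predicate and few could be eliminated.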

Columnstore All The Things?

• Awesome performance, so what’s the negative?

• Can’t update/insert in 2012

• Can only be a nonclustered index, so we are storing more data on disk

• Data types are somewhat limited

• One index per table

• Can’t be a sorted index

Update Process (2012)

[Diagram: the fact table holds Partitions 1, 2 and 3. Data to be loaded goes into a staging table, a columnstore index is built on the staging table, and a partition switch moves the data from staging into the fact table as Partition 4.]
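The 2012 load pattern can be modeled loosely in Python: build the columnar structure on the staging data first, then attach it to the fact table as a new partition without copying rows. All names here (COLUMNS, build_columnstore, switch_in, the sample rows) are hypothetical, and a dictionary stands in for the partitioned table.

```python
COLUMNS = ("order_id", "amount")  # hypothetical fact table columns


def build_columnstore(rows, column_names):
    """Pivot staged rows into per-column lists (stand-in for the index build)."""
    return {name: [row[i] for row in rows] for i, name in enumerate(column_names)}


# Existing fact table: partition number -> already-built columnar data.
fact_table = {
    1: build_columnstore([(1, 10.0), (2, 12.5)], COLUMNS),
    2: build_columnstore([(3, 7.0), (4, 9.5)], COLUMNS),
    3: build_columnstore([(5, 20.0), (6, 1.5)], COLUMNS),
}

# Step 1: load new data into staging and build its columnstore there.
staging_rows = [(7, 8.0), (8, 30.0)]
staged_index = build_columnstore(staging_rows, COLUMNS)


def switch_in(table, partition_number, staged):
    """Attach staged data as a new partition; only a reference changes hands."""
    if partition_number in table:
        raise ValueError("target partition must be empty")
    table[partition_number] = staged


# Step 2: switch the staged partition into the fact table.
switch_in(fact_table, 4, staged_index)
print(sorted(fact_table))   # [1, 2, 3, 4]
```

The existing partitions are never modified, which is how the pattern works around the read-only columnstore index in 2012.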

So Where To Use Columnstore Indexes?

• Only on large tables: fact tables and dimension tables > 3 million rows

• Include every column

• Structure queries as star joins with grouping and aggregation

More details here

Columnstore 2014

Columnstore in 2014

• Fewer Data Type Limitations

• Updateable

• Can be Clustered Index

• New Archival Compression Mode

• Batch Mode Improvements

Columnstore Trickle Updates (2014)

[Diagram: updates to the index are collected until they reach 2^20 (1,048,576) rows; the tuple mover then moves them into the index.]

This is the process when loading 102,399 rows or fewer
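The trickle-load flow can be sketched in Python as a row-wise buffer that closes when it fills up, plus a background step that converts closed buffers to columnar form. The class and attribute names are hypothetical, and the rowgroup limit is a parameter set to a tiny value so the example is easy to follow.

```python
class ColumnstoreSketch:
    """Toy model of trickle inserts; not SQL Server's actual implementation."""

    def __init__(self, rowgroup_limit):
        self.rowgroup_limit = rowgroup_limit
        self.delta_store = []   # open, row-wise buffer receiving new rows
        self.closed = []        # full buffers waiting for the tuple mover
        self.compressed = []    # rowgroups already converted to columns

    def insert(self, row):
        self.delta_store.append(row)
        if len(self.delta_store) == self.rowgroup_limit:
            # Buffer is full: close it and start a fresh one.
            self.closed.append(self.delta_store)
            self.delta_store = []

    def tuple_mover(self):
        """Convert every closed buffer from rows into per-column lists."""
        while self.closed:
            rowgroup = self.closed.pop(0)
            self.compressed.append([list(column) for column in zip(*rowgroup)])


cs = ColumnstoreSketch(rowgroup_limit=4)   # tiny limit, for illustration only
for i in range(10):
    cs.insert((i, i * i))
cs.tuple_mover()

print(len(cs.compressed))   # 2
print(cs.delta_store)       # [(8, 64), (9, 81)]
```

Queries in such a design have to read both the columnar rowgroups and whatever still sits in the open buffer, so a table with many small trickle inserts carries some row-wise data at any given moment.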

Columnstore Bulk Insert

Columnstore Updates (2014)

• Bulk inserts go through a special API

• Updates are processed as inserts and deletes, so they are an expensive operation
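Treating an update as a delete plus an insert can be sketched like this, assuming compressed data is never modified in place: a delete only marks a position, and the new value goes to a separate buffer. UpdatableColumn and its members are hypothetical names for illustration.

```python
class UpdatableColumn:
    """Toy model of one column where compressed data is immutable."""

    def __init__(self, values):
        self.compressed = list(values)   # stand-in for a compressed segment
        self.deleted = set()             # positions marked as deleted
        self.delta = []                  # newly inserted values

    def delete(self, position):
        self.deleted.add(position)       # mark only; nothing is rewritten

    def update(self, position, new_value):
        # An update is a delete of the old value plus an insert of the new one.
        self.delete(position)
        self.delta.append(new_value)

    def scan(self):
        """Return live values: unmarked compressed values, then new ones."""
        live = [v for i, v in enumerate(self.compressed) if i not in self.deleted]
        return live + self.delta


column = UpdatableColumn([10, 20, 30])
column.update(1, 25)
print(column.scan())         # [10, 30, 25]
print(column.compressed)     # [10, 20, 30] (old value still takes up space)
```

Each update does two pieces of work and leaves the old value behind until the data is rebuilt, which is one way to read the "expensive operation" remark.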

Columnstore Compression Effect

[Chart: Columnstore Compression. Series: No CS, Clustered CS, Archival CS; x-axis 1 to 7; y-axis 0 to 300.]

[Chart: Columnstore Archival Compression. Series: Clustered CS, Archival CS; x-axis 1 to 7; y-axis 0 to 80.]

• Average space savings of columnstore versus no compression—69%

• Average space savings of columnstore Archival versus regular columnstore—29%
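As a rough back-of-the-envelope check, the two averages can be combined to estimate archival columnstore against no compression. This assumes the percentages compose multiplicatively, which averages over different tables do not strictly guarantee.

```python
columnstore_savings = 0.69   # columnstore vs. no compression (from the slide)
archival_savings = 0.29      # archival vs. regular columnstore (from the slide)

# Fraction of the original size left after each step.
remaining_after_columnstore = 1 - columnstore_savings
remaining_after_archival = remaining_after_columnstore * (1 - archival_savings)

combined_savings = 1 - remaining_after_archival
print(f"{combined_savings:.1%}")   # 78.0%
```

Under that assumption, archival compression leaves about 22% of the uncompressed size, compared with 31% for regular columnstore.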

Columnstore 2014 Demo

What Do We Do Differently in 2014?

• Best practices are mostly the same

• Batch mode gets enhanced and gains more query types

• No need to worry about dropping and rebuilding indexes, just append data

• Still focus on large tables where data is not frequently updated

• Archival Compression Good for old unused data

Questions

Contact jdanton1@yahoo.com

Joedantoni.wordpress.com

@jdanton

http://bit.ly/SQLColumnstore -- Slides, Resources
