Patrick McFadin | Chief Evangelist, DataStax | @PatrickMcFadin
Data Model on Fire
#CASSANDRAEU
Friday, October 18, 13
Data Model is King
• With 2.0 we now have more choices
• Sometimes the data model is only the first part
• Understanding the underlying engine helps
• You aren’t done until you tune
Load test baby!
Light Weight Transactions
The race is on

T0  Process 1
SELECT firstName, lastName
FROM users
WHERE username = 'pmcfadin';

(0 rows)

T1  Process 2
SELECT firstName, lastName
FROM users
WHERE username = 'pmcfadin';

(0 rows)

Got nothing! Good to go!

T2  Process 1
INSERT INTO users (username, firstname, lastname, email, password, created_date)
VALUES ('pmcfadin', 'Patrick', 'McFadin', ['[email protected]'], 'ba27e03fd95e507daf2937c937d499ab', '2011-06-20 13:50:00');

T3  Process 2
INSERT INTO users (username, firstname, lastname, email, password, created_date)
VALUES ('pmcfadin', 'Paul', 'McFadin', ['[email protected]'], 'ea24e13ad95a209ded8912e937d499de', '2011-06-20 13:51:00');

This one wins
Solution LWT: Process 1

T0
INSERT INTO users (username, firstname, lastname, email, password, created_date)
VALUES ('pmcfadin', 'Patrick', 'McFadin', ['[email protected]'], 'ba27e03fd95e507daf2937c937d499ab', '2011-06-20 13:50:00')
IF NOT EXISTS;

T1
 [applied]
-----------
      True

• Check performed for record
• Paxos ensures exclusive access
• applied = true: Success
Solution LWT: Process 2

T2
INSERT INTO users (username, firstname, lastname, email, password, created_date)
VALUES ('pmcfadin', 'Paul', 'McFadin', ['[email protected]'], 'ea24e13ad95a209ded8912e937d499de', '2011-06-20 13:51:00')
IF NOT EXISTS;

T3
 [applied] | username | created_date             | firstname | lastname
-----------+----------+--------------------------+-----------+----------
     False | pmcfadin | 2011-06-20 13:50:00-0700 |   Patrick |  McFadin

• applied = false: Rejected
• No record stomping!
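The difference between the racy read-then-insert and the conditional insert can be sketched with a toy in-memory model. This is not a Cassandra client: `ToyUsersTable` and its methods are hypothetical names, and a plain Python lock stands in for the Paxos round that the real `IF NOT EXISTS` relies on.

```python
import threading


class ToyUsersTable:
    """In-memory stand-in for the users table (hypothetical, not a driver API)."""

    def __init__(self):
        self._rows = {}
        self._lock = threading.Lock()  # stands in for the Paxos round

    def insert(self, username, row):
        # Plain INSERT is an upsert: the last writer silently wins.
        self._rows[username] = dict(row)

    def insert_if_not_exists(self, username, row):
        # INSERT ... IF NOT EXISTS: the check and the write are one step.
        with self._lock:
            existing = self._rows.get(username)
            if existing is not None:
                return False, dict(existing)  # [applied] = False plus current row
            self._rows[username] = dict(row)
            return True, None  # [applied] = True

    def get(self, username):
        return self._rows.get(username)


# Without LWT: both processes saw (0 rows), so both insert; Paul overwrites Patrick.
plain = ToyUsersTable()
plain.insert("pmcfadin", {"firstname": "Patrick"})
plain.insert("pmcfadin", {"firstname": "Paul"})

# With LWT: the second insert is rejected and the existing row is returned.
lwt = ToyUsersTable()
first = lwt.insert_if_not_exists("pmcfadin", {"firstname": "Patrick"})
second = lwt.insert_if_not_exists("pmcfadin", {"firstname": "Paul"})

print(plain.get("pmcfadin"))
print(first)
print(second)
```

The rejected call hands back the row that is already there, which mirrors how the `[applied] = False` result on the slide includes the existing `Patrick` record.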
LWT Fine Print
• Light Weight Transactions solve edge conditions
• They have a latency cost
  • Be aware
  • Load test
  • Consider in your data model
• Now go shut down that ZooKeeper mess you have!
Form Versioning: Revisited
Form Versioning Pt 1
• From “Next top data model”
• Great idea, but edge conditions

CREATE TABLE working_version (
    username varchar,
    form_id int,
    version_number int,
    locked_by varchar,
    form_attributes map<varchar,varchar>,
    PRIMARY KEY ((username, form_id), version_number)
) WITH CLUSTERING ORDER BY (version_number DESC);

• Each user has a form
• Each form needs versioning
• Need an exclusive lock on the form
Form Versioning Pt 1

1. Insert first version
INSERT INTO working_version (username, form_id, version_number, locked_by, form_attributes)
VALUES ('pmcfadin', 1138, 1, '',
    {'FirstName<text>':'First Name: ',
     'LastName<text>':'Last Name: ',
     'EmailAddress<text>':'Email Address: ',
     'Newsletter<radio>':'Y,N'});

2. Lock for one user
UPDATE working_version
SET locked_by = 'pmcfadin'
WHERE username = 'pmcfadin'
AND form_id = 1138
AND version_number = 1;

3. Insert new version. Release lock
INSERT INTO working_version (username, form_id, version_number, locked_by, form_attributes)
VALUES ('pmcfadin', 1138, 2, null,
    {'FirstName<text>':'First Name: ',
     'LastName<text>':'Last Name: ',
     'EmailAddress<text>':'Email Address: ',
     'Newsletter<checkbox>':'Y'});

Danger Zone
Form Versioning Pt 2

1. Insert first version (Exclusive lock)
INSERT INTO working_version (username, form_id, version_number, locked_by, form_attributes)
VALUES ('pmcfadin', 1138, 1, 'pmcfadin',
    {'FirstName<text>':'First Name: ',
     'LastName<text>':'Last Name: ',
     'EmailAddress<text>':'Email Address: ',
     'Newsletter<radio>':'Y,N'})
IF NOT EXISTS;

Accepted
UPDATE working_version
SET form_attributes['EmailAddress<text>'] = 'Primary Email Address: '
WHERE username = 'pmcfadin'
AND form_id = 1138
AND version_number = 1
IF locked_by = 'pmcfadin';

Rejected (sorry dude)
UPDATE working_version
SET form_attributes['EmailAddress<text>'] = 'Email Adx: '
WHERE username = 'pmcfadin'
AND form_id = 1138
AND version_number = 1
IF locked_by = 'dude';
Form Versioning Pt 2
• Old way: Edge cases with problems
  • Use external locking?
  • Take your chances?
• New way: Managed expectations (LWT)
  • Exclusive by existence check
  • Continued with IF clause
  • Downside: More latency
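The "continued with IF clause" step can be illustrated with a small stand-in for the conditional update. The function `update_if_locked_by` is a hypothetical helper operating on a Python dict, not a driver call; it only mimics the accept or reject decision that `UPDATE ... IF locked_by = ...` makes.

```python
def update_if_locked_by(row, expected_owner, attribute, value):
    """Toy model of UPDATE ... IF locked_by = <expected_owner>.

    Returns True when the condition holds and the write is applied,
    False when the row is locked by someone else and nothing changes.
    """
    if row.get("locked_by") != expected_owner:
        return False
    row["form_attributes"][attribute] = value
    return True


# Version 1 of form 1138, created with locked_by = 'pmcfadin'.
row = {
    "locked_by": "pmcfadin",
    "form_attributes": {"EmailAddress<text>": "Email Address: "},
}

accepted = update_if_locked_by(row, "pmcfadin", "EmailAddress<text>", "Primary Email Address: ")
rejected = update_if_locked_by(row, "dude", "EmailAddress<text>", "Email Adx: ")

print(accepted, rejected, row["form_attributes"]["EmailAddress<text>"])
```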
Fire: Bring it
Cassandra 2.0 Fire
• Great changes in both 1.2 and 2.0 for perf
• Three big changes in 2.0 I like
  • Single pass compaction
  • Hints to reduce SSTable reads
  • Faster index reads from off-heap
Why is this important?
• Reducing SSTable reads means fewer seeks
• Disk seeks can add up fast
• 5 seeks on SATA = 60ms of just disk!

Avg Access Time*   Rotation Speed
12 ms              7200 RPM
7 ms               10k RPM
5 ms               15k RPM
0.04 ms            SSD

* Source: www.tomshardware.com

Shared storage == Great sadness
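The 60ms figure is just the access time multiplied by the number of seeks. A minimal sketch of that arithmetic for each row of the table, assuming one seek per SSTable touched and no cache hits (the function name is hypothetical):

```python
# Average access times from the table above, in milliseconds.
ACCESS_TIME_MS = {
    "7200 RPM": 12.0,
    "10k RPM": 7.0,
    "15k RPM": 5.0,
    "SSD": 0.04,
}


def disk_time_ms(seeks, access_ms):
    # Assumption: every seek pays the full average access time.
    return seeks * access_ms


for media, access_ms in ACCESS_TIME_MS.items():
    print(f"{media:>8}: {disk_time_ms(5, access_ms):.2f} ms for 5 seeks")
```

Dropping a single seek from a read on a 7200 RPM disk saves about 12 ms under this model, which is why the SSTables column in the histograms matters so much.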
Quick Diversion
• cfhistograms is your friend
• Histograms of statistics per table
• Collected...
  • per read
  • per write
  • SSTable flush
  • Compaction

nodetool cfhistograms <keyspace> <table>
How do I even read this thing!
Histograms How to: Offset

nodetool cfhistograms videodb users

videodb/users histograms
Offset  SSTables  Write Latency  Read Latency  Partition Size  Cell Count
                  (micros)       (micros)      (bytes)
1       107       0              0             0               0
2       0         0              0             0               0
10      0         0              0             0               5
250     0         5              0             0               0
800     0         10             50            0               0
1250    0         0              300           5               0

• Unit-less column
• Units are assigned by each column
• Numerical buckets
Histograms How to: SSTables

nodetool cfhistograms videodb users

videodb/users histograms
Offset  SSTables  Write Latency  Read Latency  Partition Size  Cell Count
                  (micros)       (micros)      (bytes)
1       107       0              0             0               0
2       2         0              0             0               0
10      0         0              0             0               5
250     0         5              0             0               0
800     0         10             50            0               0
1250    0         0              300           5               0

• Per read. How many seeks?
• Offset is number of SSTables read
• Fewer == lower read latency
• 107 reads took 1 seek to satisfy
Histograms How to: Write Latency
• Per write. How fast?
• Offset is microseconds
Histograms How to: Read Latency
• Per read. How fast?
• Offset is microseconds
Histograms How to: Partition Size
• Per partition (storage row)
• Offset is size in bytes
• 5 partitions are 1250 bytes
Histograms How to: Cell Count
• Per partition (storage row)
• Offset is count of cells in partition
• 5 partitions have 10 cells
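Once each column is read as (offset, count) buckets, a rough percentile falls out of a cumulative sum. The sketch below uses the counts from the videodb/users example; `percentile_offset` is a hypothetical helper, and it reports the bucket offset the percentile lands in rather than an exact value, since the histogram only keeps bucket counts.

```python
def percentile_offset(buckets, pct):
    """Return the smallest offset whose cumulative count reaches pct percent.

    buckets is a list of (offset, count) pairs taken from one histogram column.
    Returns None when the column has no samples.
    """
    total = sum(count for _, count in buckets)
    if total == 0:
        return None
    threshold = total * pct / 100.0
    running = 0
    for offset, count in sorted(buckets):
        running += count
        if running >= threshold:
            return offset
    return None


# Read Latency column (micros) and SSTables column from the example output.
read_latency = [(1, 0), (2, 0), (10, 0), (250, 0), (800, 50), (1250, 300)]
sstables = [(1, 107), (2, 2)]

print(percentile_offset(read_latency, 50))  # bucket holding the median read
print(percentile_offset(sstables, 95))      # SSTables touched by 95% of reads
```

Here 300 of the 350 sampled reads sit in the 1250 micros bucket, so the median lands there, while nearly every read is satisfied from a single SSTable.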
Histograms + Data Model
• Your data model is the key to success
• How do you ensure that?
  Test
  Measure
  Repeat
Real World Example
• Real Customer
• Needed very tight SLA on reads
Problem
• Read response highly variable
• Loading data increases latency
Offset  SSTables  Write Latency  Read Latency  Partition Size  Cell Count
                  (micros)       (micros)      (bytes)
1       2016550   0              0             0               0
2       2064495   0              0             0               0
3       434526    0              0             0               0
4       51084     0              0             0               0
5       0         0              0             0               0
6       0         0              0             0               0
7       0         0              0             0               0
8       0         0              0             0               0
10      0         0              0             0               1629
12      0         0              0             0               2971
14      0         0              0             0               1286
17      0         0              0             0               68
20      0         0              0             0               188
24      0         0              0             0               101
29      0         0              0             0               50799
35      0         0              0             0               269
42      0         0              0             0               132414
50      0         0              0             0               32943
60      0         0              0             0               62099
72      0         0              0             0               116855
86      0         0              0             0               41562
103     0         0              0             0               42796
124     0         0              0             0               46719
149     0         0              0             0               57693
179     0         0              3             0               27659
215     0         0              18            0               26941
258     0         0              47            0               21589
310     0         0              71            0               19494
372     0         0              141           0               8681
446     0         0              67            0               9499
535     0         0              36466         1629            9360
642     0         0              263829        0               4349
770     0         0              608488        2971            4242
924     0         0              209549        1468            2422
1109    0         0              398845        59              1685
1331    0         0              625099        45105           954
1597    0         0              462636        5731            610
1916    0         0              499920        132391          366
2299    0         0              380787        16265           303
2759    0         0              285323        20015           188
3311    0         0              202417        30980           106
3973    0         0              148920        44973           64
4768    0         0              106452        38502           55
5722    0         0              81533         69479           23
6866    0         0              55470         39218           15
8239    0         0              43512         23027           3
9887    0         0              30810         58498           2
11864   0         0              22375         73629           0
14237   0         0              15148         33444           1
17084   0         0              12047         28321           0
20501   0         0              11298         17021           0
24601   0         0              9652          13072           3
29521   0         0              6715          7790            0
35425   0         0              13788         7764            0
42510   0         0              15322         5890            0
51012   0         0              8585          4046            0
61214   0         0              5041          2973            0
73457   0         0              2892          1954            0
88148   0         0              1543          936             0
105778  0         0              900           661             0
126934  0         0              486           409             0
152321  0         0              285           289             0
182785  0         0              124           178             0
219342  0         0              35            126             0
263210  0         0              8             76              0
315852  0         0              0             68              0

• Compactions behind
• Disk IO problems
• How to optimize?
Offset  SSTables  Write Latency  Read Latency  Partition Size  Cell Count
                  (micros)       (micros)      (bytes)
1       2045656   0              0             0               0
2       1813961   0              0             0               0
3       70496     0              0             0               0
4       0         0              0             0               0
5       0         0              0             0               0
6       0         0              0             0               0
7       0         0              0             0               0
8       0         0              0             0               0
10      0         0              0             0               47
12      0         0              0             0               860
14      0         0              0             0               393
17      0         0              0             0               50
20      0         0              0             0               0
24      0         0              0             0               21
29      0         0              0             0               34489
35      0         0              0             0               32
42      0         0              0             0               97226
50      0         0              0             0               24490
60      0         0              0             0               47077
72      0         0              0             0               94761
86      0         0              0             0               32559
103     0         0              0             0               33885
124     0         0              0             0               37051
149     0         0              1             0               48429
179     0         0              17            0               23272
215     0         0              95            0               22459
258     0         0              84            0               17953
310     0         0              174           0               16178
372     0         0              53082         0               7123
446     0         0              318074        0               7836
535     0         0              423140        47              7904
642     0         0              382926        0               3552
770     0         0              365670        860             3525
924     0         0              414824        392             1998
1109    0         0              442701        46              1411
1331    0         0              335862        30325           757
1597    0         0              302920        4082            518
1916    0         0              236448        97224           294
2299    0         0              171726        11843           254
2759    0         0              122880        15160           162
3311    0         0              90413         23484           89
3973    0         0              66682         34799           62
4768    0         0              53385         29619           54
5722    0         0              39121         53155           23
6866    0         0              26828         30702           12
8239    0         0              18930         18627           3
9887    0         0              12517         47739           2
11864   0         0              8269          61853           0
14237   0         0              6049          28875           1
17084   0         0              4614          24391           0
20501   0         0              5868          14450           0
24601   0         0              6167          11112           0
29521   0         0              2879          6609            0
35425   0         0              2054          6654            0
42510   0         0              8913          4986            0
51012   0         0              4429          3352            0
61214   0         0              1541          2465            0
73457   0         0              560           1607            0
88148   0         0              192           809             0
105778  0         0              59            523             0
126934  0         0              19            333             0
152321  0         0              0             262             0

2 ms!
Fewer seeks

• Tuned data disk
• Compactions better
• 1 fewer seek overall
• Further tuning made it even better!
What about the partition size?
Partition Size
• Tuning is an option based on size in bytes
• All about the reads
• index_interval
  • How many samples taken
  • Lower for faster access but more memory usage
• column_index_size_in_kb
  • Add column indexes to a row when the data reaches this size
  • Partial row reads? Maybe smaller.
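The trade-off behind both settings can be roughed out with simple division. This is a back-of-the-envelope model, not Cassandra's actual bookkeeping: the helper names are hypothetical, and the defaults of 128 for index_interval and 64 for column_index_size_in_kb are assumed to be the stock values for this release line.

```python
def index_samples(partition_count, index_interval=128):
    # Rough model: one in-memory index sample per index_interval partition keys.
    # Lowering index_interval means more samples, so more memory.
    return partition_count // index_interval


def column_index_entries(partition_bytes, column_index_size_in_kb=64):
    # Rough model: one column index entry per column_index_size_in_kb of row data.
    # Partitions smaller than the threshold get no column index at all.
    return partition_bytes // (column_index_size_in_kb * 1024)


# Halving index_interval doubles the number of samples held in memory.
print(index_samples(1_000_000, 128), index_samples(1_000_000, 64))

# A 315852-byte partition (the largest bucket in the histograms above)
# gets finer-grained column indexes as the threshold is lowered.
print(column_index_entries(315852, 64), column_index_entries(315852, 16))
```

Under this model a smaller column_index_size_in_kb only helps partitions large enough to cross the threshold, which is why the Partition Size histogram is the place to look before changing it.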
Tuning results
• Spent a lot of time tuning disk
• Played with
  • index_interval (Lowered)
  • concurrent_reads (Increased)
  • column_index_size_in_kb (Lowered)
220 Million Ops/Day
10000 Transactions/Sec Peak
9ms at 95th percentile. Measured at the application!
Offset  SSTables  Write Latency  Read Latency  Row Size        Column Count
1       27425403  0              0             0               0
2       0         0              0             0               0
3       0         0              0             0               0
4       0         0              1             0               0
5       0         0              24            0               0
6       0         0              56            0               0
7       0         0              92            0               0
8       0         0              283           0               0
10      0         0              2834          0               0
12      0         0              11954         0               0
14      0         0              32621         0               1218345
17      0         0              135311        0               0
20      0         0              314195        0               0
24      0         0              610665        0               0
29      0         0              536736        0               0
35      0         0              162541        0               0
42      0         0              25277         0               0
50      0         0              7847          0               0
60      0         0              5864          0               0
72      0         0              9580          0               0
86      0         0              5517          0               0
103     0         0              3822          0               0
124     0         0              1850          0               0
149     0         0              394           0               0
179     0         0              253           0               0
215     0         0              305           0               0
258     0         0              4657297       0               0
310     0         0              12748409      0               0
372     0         0              7475534       0               0
446     0         0              263549        0               0
535     0         0              217171        0               0
642     0         0              41908         1218345         0
770     0         0              24876         0               0
924     0         0              13566         0               0
1109    0         0              10875         0               0
1331    0         0              9379          0               0
1597    0         0              7111          0               0
1916    0         0              5333          0               0
2299    0         0              5072          0               0
2759    0         0              3987          0               0
3311    0         0              5290          0               0
3973    0         0              5169          0               0
4768    0         0              2867          0               0
5722    0         0              2093          0               0
6866    0         0              3177          0               0
8239    0         0              2161          0               0
9887    0         0              1552          0               0
11864   0         0              1200          0               0
14237   0         0              834           0               0
17084   0         0              1380          0               0
20501   0         0              6219          0               0
24601   0         0              4977          0               0
29521   0         0              2114          0               0
35425   0         0              6479          0               0
42510   0         0              18417         0               0
51012   0         0              5532          0               0

• The two hump problem
• Reads awesome until
  • Compaction!
• Solution:
  • Throttle down compaction
  • Tune disk
  • Ignore it
Disk + Data Model
• Understand the internals
  • Size of partition
  • Compaction
• Learn how to measure
• Load test
*More? My data modeling talks:
The Data Model is Dead, Long Live the Data Model
Become a Super Modeler
The World's Next Top Data Model
Thank you! Time for questions...