Professional Cassandra support and services
Tuesday, August 10, 2010
Cassandra: Present & Future
Jonathan Ellis
@spyced
Cassandra 0.6 & 0.7
Jonathan Ellis
@spyced
Quiet change of policy
• 0.5.1 was bug fixes only
• Too early to be strict about a bugfix-only policy in the stable branch, especially with 0.7 taking longer and breaking more compatibility
• Maybe after 1.0?
Figure: mails sent per month (y-axis 0 to 1500), by release: Jan (0.5), Feb (0.5.1), Mar, Apr (0.6, 0.6.1), May (0.6.2), Jun (0.6.3), Jul (0.6.4)
Lots of bug fixes
• 85 issues marked Resolved/Fixed in 0.6 branch after 0.6 released
Runtime configuration
• concurrent reads, writes (0.6.2)
• making it easier to bandage your foot after you shoot it
• PhiConvictThreshold (0.6.2)
Performance
• JVM GC defaults (0.6.2)
• Faster commitlog (0.6.2)
• Faster range slice, Hadoop jobs (0.6.1, 0.6.2)
• Better parallelization of multiget (0.6.4)
• UTF8Type, UUIDType optimizations (0.6.5)
Bulletproofing
• HH disable (0.6.2)
• compaction priority (0.6.3)
• HH hourly scan (0.6.3)
• JMX metrics for row-level bloom filters (0.6.3)
• Flow control (0.6.4, 0.6.5)
• HH paging (0.6.5)
• Dynamic snitch (0.6.5)
Hinted Handoff
• 0.6.0: send hints to natural replicas
• 0.6.0: fix row-level concurrency bottleneck
• 0.6.2: option to disable entirely
• 0.6.3: remove hourly scan
• 0.6.4: lower priority
• 0.6.5: paging of large hinted rows
• 0.7.0: large rows
Why keep HH around?
https://www.cloudkick.com/blog/2010/jan/12/visual-ec2-latency/
Compaction priority
-XX:+UseThreadPriorities \
-XX:ThreadPriorityPolicy=42 \
-Dcassandra.compaction.priority=1 \
Extended to HH in 0.6.4
http://www.javamex.com/tutorials/threads/priority_what.shtml
JMX for bloom filters
• o.a.c.db:ColumnFamilyStores
• getBloomFilterFalsePositives
• [not in nodetool yet]
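To make the false-positive metric concrete, here is a toy Bloom filter in Python that counts its own false positives: lookups where the filter answers "maybe present" for a key that was never added. This is an illustrative sketch only, not Cassandra's implementation; the names `ToyBloomFilter`, `might_contain`, and `false_positives` are hypothetical, and the ground-truth set exists only so the toy can detect its own mistakes.

```python
import hashlib


class ToyBloomFilter:
    """Minimal Bloom filter that tracks its own false positives (illustrative only)."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = [False] * num_bits
        self.keys = set()          # ground truth, kept only to detect false positives
        self.false_positives = 0   # hypothetical counter, analogous to a monitoring metric

    def _positions(self, key):
        # Derive num_hashes bit positions by salting a SHA-1 digest with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha1(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        self.keys.add(key)
        for pos in self._positions(key):
            self.bits[pos] = True

    def might_contain(self, key):
        maybe = all(self.bits[pos] for pos in self._positions(key))
        if maybe and key not in self.keys:
            # The filter said "maybe" but the key is absent: a false positive.
            self.false_positives += 1
        return maybe


bf = ToyBloomFilter(num_bits=64, num_hashes=2)
for i in range(20):
    bf.add(f"row{i}")

# Probe keys that were never added; any "maybe" answer here is a false positive.
misses = [f"absent{i}" for i in range(200)]
hits = sum(bf.might_contain(k) for k in misses)
print("false positives:", bf.false_positives, "of", len(misses))
```

A deliberately undersized filter (64 bits for 20 keys) is used so that false positives actually show up; a rising count of this kind suggests the filter is too small for the data it covers.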
Flow control in 0.5
• Why backpressure doesn’t fit Cassandra
Flow Control in 0.6.4
• Replica nodes drop hopeless requests on the floor
• Coordinator node is unaffected
• TimedOutException signals client to back off
• Requires enough memory to buffer RPCTimeout’s worth of requests
• (In the short term, you’re still screwed)
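The replica-side behavior described above can be sketched as a queue whose consumer discards any request that has already waited longer than the RPC timeout, since the coordinator will have given up on it by then. This is a minimal Python sketch assuming a single drain loop; `RPC_TIMEOUT`, `Request`, and `drain` are hypothetical names and the timeout value is made up for illustration.

```python
from collections import deque
from dataclasses import dataclass

RPC_TIMEOUT = 10.0  # seconds; assumed value for illustration


@dataclass
class Request:
    name: str
    enqueued_at: float  # time the request arrived at the replica


def drain(queue, now, handle):
    """Process queued requests, dropping those older than RPC_TIMEOUT.

    Returns (processed, dropped) as lists of request names.
    """
    processed, dropped = [], []
    while queue:
        req = queue.popleft()
        if now - req.enqueued_at > RPC_TIMEOUT:
            # Hopeless: the coordinator has already timed out, so skip the work.
            dropped.append(req.name)
            continue
        handle(req)
        processed.append(req.name)
    return processed, dropped


queue = deque([
    Request("old-write", enqueued_at=0.0),
    Request("stale-read", enqueued_at=2.0),
    Request("fresh-write", enqueued_at=95.0),
])
processed, dropped = drain(queue, now=100.0, handle=lambda req: None)
print("processed:", processed, "dropped:", dropped)
```

Note that the queue itself is unbounded in this sketch, which mirrors the memory requirement on the slide: the node must be able to hold a timeout's worth of requests before the dropping takes effect.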
Flow Control, 0.6.4
Figure: IncomingTcpConnection, Message Deserializer, Mutation and Read stages; queue labels: Uncapped, Capped at 4096
Figure: IncomingTcpConnection, Message Deserializer, Mutation, Read, and Gossip stages
Flow Control, 0.6.5
Figure: IncomingTcpConnection, Mutation, Read, and Gossip stages; queue label: Uncapped
Dynamic snitch
• sortByProximity
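One way to picture a dynamic `sortByProximity` is to order replicas by a score derived from recently observed latencies, so that slow nodes are tried last. The Python sketch below assumes a simple moving-window average as the score; `LatencyTracker`, `record`, `score`, and `sort_by_proximity` are hypothetical names, and the scoring Cassandra actually uses may differ.

```python
from collections import defaultdict, deque


class LatencyTracker:
    """Keeps a bounded window of latency samples per endpoint (illustrative only)."""

    def __init__(self, window=100):
        self.samples = defaultdict(lambda: deque(maxlen=window))

    def record(self, endpoint, latency_ms):
        self.samples[endpoint].append(latency_ms)

    def score(self, endpoint):
        # Lower is better. Endpoints with no samples score 0.0 so they get tried.
        window = self.samples.get(endpoint)
        if not window:
            return 0.0
        return sum(window) / len(window)

    def sort_by_proximity(self, endpoints):
        # sorted() is stable, so ties keep the caller's original order.
        return sorted(endpoints, key=self.score)


tracker = LatencyTracker()
for ms in (5, 7, 6):
    tracker.record("10.0.0.1", ms)
for ms in (40, 60):
    tracker.record("10.0.0.2", ms)
for ms in (1, 2):
    tracker.record("10.0.0.3", ms)

ordered = tracker.sort_by_proximity(["10.0.0.1", "10.0.0.2", "10.0.0.3"])
print(ordered)
```

Because the sort is stable, a static ordering (for example, by rack or data center) passed in by the caller is preserved whenever the latency scores tie.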
Open problems
• Linux/mmap/swap unholy trio (0.6.5)
• Memory fragmentation (0.6.5?)
• Compaction effect on caches (0.7.1?)
mmap and swap
• The problem
• Mitigations
• mmap_index_only
• swappiness=0
• turn off swap
• mlockall at startup (Xms=Xmx)
GC Fragmentation
• Culprit of infamous CASSANDRA-1014?
• Mitigation: tune with much larger new generation / tenuring threshold?
Compaction and caches
• Compaction wrecks the OS fs cache
• Wrecks Cassandra key cache, too
• (but not row cache)
0.7
New in 0.7
• live schema changes
• large rows
• secondary indexes
• efficient Streaming
• DatacenterStrategy
Live schema changes
• Details: http://www.riptano.com/blog/live-schema-updates-cassandra-07
Large rows
• 0.6: smaller of {2GB, memory limit}
• 0.7: in_memory_compaction_limit_in_mb
Secondary indexes
Streaming in 0.6
Figure: ring with nodes A, F, L, T, W; range label: (A-L]
Figure: ring with nodes A, F, L, T, W; range labels: (A-F], (F-L], (A-F]
Figure: ring with nodes A, F, L, T, W; labels: Data, Index, Filter
Streaming in 0.7
Figure: ring with nodes A, F, L, T, W; labels: Index, Filter
DatacenterStrategy
• RackAwareStrategy is tuned for 3 replicas and 2 data centers
• DS allows configuring replicas per data center, per Keyspace
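Per-data-center replica counts can be pictured as walking the ring clockwise from the key's position and taking nodes until each data center has its configured number of replicas. The Python sketch below works under that assumption only; the ring layout, the `replicas_per_dc` option, and the `pick_replicas` function are hypothetical and not taken from Cassandra's configuration or code.

```python
def pick_replicas(ring, start_index, replicas_per_dc):
    """Choose replicas by walking the ring until every data center's quota is met.

    ring: list of (node, datacenter) tuples in token order.
    start_index: position of the first node at or after the key's token.
    replicas_per_dc: mapping of datacenter name to desired replica count.
    """
    needed = dict(replicas_per_dc)  # copy so the caller's options are not mutated
    chosen = []
    for step in range(len(ring)):
        node, dc = ring[(start_index + step) % len(ring)]
        if needed.get(dc, 0) > 0:
            chosen.append(node)
            needed[dc] -= 1
        if all(count == 0 for count in needed.values()):
            break
    return chosen


# Hypothetical five-node ring spread over two data centers.
ring = [("A", "dc1"), ("F", "dc2"), ("L", "dc1"), ("T", "dc2"), ("W", "dc1")]

# Hypothetical per-keyspace setting: two replicas in dc1, one in dc2.
keyspace_options = {"dc1": 2, "dc2": 1}

replicas = pick_replicas(ring, start_index=1, replicas_per_dc=keyspace_options)
print(replicas)
```

Since the quotas are just a mapping passed in per call, a different keyspace can use a different mapping over the same ring, which is the flexibility the slide contrasts with a strategy tuned for one fixed layout.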
Minor features in 0.7
• read_repair_chance
• per-keyspace request scheduling
• Hadoop OutputFormat
• Per CF what used to be global (gc_grace_seconds, memtable thresholds)
0.7 API changes
• String keys become byte[]
• Thrift keyspace argument moved to set_keyspace
• i64 timestamp becomes Clock
• SlicePredicate for _count methods
0.7 performance
• Reads roughly 100% faster, thanks largely to removing String creation
• Row-cached reads up to 8x faster after optimizations by tjake and jbellis
• Optimizations for reads of large rows
• 0.7.1? ~20% improvement everywhere from Thrift optimizations
Thrift
• OOMs on malformed packets
• Python Unicode string issues
• PHP support is buggy and maintainerless
After 0.7.0
• IndexOperator.GT
• Triggers / plugins
• Avro?
• On-disk data format improvements (compression, hierarchical data?)
• Auth
Questions