Cassandra Metrics By: Chris Lohfink

Metrics lightning talk

Introduction to thread pool metrics in Cassandra

Blackbird

About Me

• Engineer at Blackbird

• Worked with C* since 0.8 (3 years)
• 7 years as a Java/Python developer
• Interests
  o Data Science
  o Hobbyist Electronics
  o Development

About Cassandra

• Fault tolerant to a fault
  o Easy to ignore until it gets bad

• Like all other systems:
  o If there are few events, no one pays attention to it
  o If there are a lot of events, you need to keep an eye on it
  o When things happen, you need information to diagnose them quickly

Basically...

Lots of Metrics

A lot of data, but without context or understanding it isn't of much use

… but you have lots of pretty graphs

Disclaimer

This is not all of the important metrics; in fact, it is missing many critical ones:

• Heap
• OS metrics
• Latencies
• Log messages

An Example for a little background

[Figure: a Client Request, the MessagingService, and the ReadStage (x32 threads), RequestResponse, and ReadRepairStage thread pools, with several stages annotated 2^31-1]

Cassandra Key Metrics

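● Cassandra's internal messaging is based on SEDA (staged event-driven architecture), with many asynchronous elements

● It's easy to overrun the processing capabilities of a stage that is not in the request's feedback loop (e.g. ReadRepairStage): since the client is not waiting on it, nothing slows new work down while its queue grows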
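To see why a stage outside the feedback loop can run away, here is a toy discrete-time model (a sketch, not Cassandra code). It assumes a closed loop of clients that each keep exactly one read in flight, a read stage that serves up to `read_capacity` reads per tick, and a hypothetical worst case where every completed read fires off one repair task that nobody waits for. The function name `simulate` and all the numbers are made up for illustration; 32 just mirrors the ReadStage x32 from the diagram.

```python
def simulate(ticks: int, clients: int, read_capacity: int,
             repair_capacity: int) -> tuple[int, int]:
    """Toy model of two stages; returns (read_queue, repair_queue) after `ticks`.

    Hypothetical assumptions: each client keeps exactly one read in flight,
    and every completed read enqueues one repair task that nobody waits for.
    """
    read_queue = clients   # closed loop: one outstanding read per client
    repair_queue = 0
    for _ in range(ticks):
        served = min(read_capacity, read_queue)
        read_queue -= served   # reads answered this tick...
        read_queue += served   # ...and each of those clients sends its next read,
                               # so the read queue never exceeds `clients`
        # Repair work is fire-and-forget: it is enqueued no matter how far
        # behind the repair stage already is.
        repair_queue += served
        repair_queue -= min(repair_capacity, repair_queue)
    return read_queue, repair_queue


# 32 reads per tick feeding a repair stage that finishes only 8 tasks per tick,
# versus one that keeps up.
print(simulate(ticks=10, clients=50, read_capacity=32, repair_capacity=8))   # (50, 240)
print(simulate(ticks=10, clients=50, read_capacity=32, repair_capacity=40))  # (50, 0)
```

In this model the read side stays at 50 however long it runs, because the clients themselves are the throttle, while the repair backlog grows by 24 tasks every tick. In tpstats terms, that is a Pending count that only ever climbs on the stage outside the loop.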

Access the metrics

● nodetool tpstats

    Pool Name               Active   Pending   Completed   Blocked   All time blocked
    ReadStage               0        0         113702      0         0
    RequestResponseStage    0        0         0           0         0
    MutationStage           0        0         164503      0         0
    ...
    InternalResponseStage   0        0         0           0         0
    HintedHandoff           0        0         0           0         0

    Message type            Dropped
    RANGE_SLICE             0
    READ_REPAIR             0
    ...
    REQUEST_RESPONSE        0
    COUNTER_MUTATION        0

● JMX: org.apache.cassandra.request:type=* and org.apache.cassandra.internal:type=*

● Metrics Reporter
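If all you have is `nodetool tpstats` output (for example, captured by a cron job), a few lines of Python are enough to pull out the pools with anything pending or blocked and the message types with drops. This is a minimal sketch that assumes the two-section layout of the sample output above: a `Pool Name` header followed by rows of five numeric columns, then a `Message type` / `Dropped` section. The layout differs between Cassandra versions, so check it against your own output. `parse_tpstats` and `needs_attention` are hypothetical helper names.

```python
# Column order of the thread-pool section, as in the sample tpstats output.
COLUMNS = ("active", "pending", "completed", "blocked", "all_time_blocked")


def parse_tpstats(text: str) -> tuple[dict[str, dict[str, int]], dict[str, int]]:
    """Split `nodetool tpstats` text into per-pool counters and dropped-message counts."""
    pools: dict[str, dict[str, int]] = {}
    dropped: dict[str, int] = {}
    section = None
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue
        if parts[0] == "Pool":        # "Pool Name Active Pending ..." header
            section = "pools"
        elif parts[0] == "Message":   # "Message type Dropped" header
            section = "dropped"
        elif section == "pools" and len(parts) == 1 + len(COLUMNS):
            pools[parts[0]] = dict(zip(COLUMNS, map(int, parts[1:])))
        elif section == "dropped" and len(parts) == 2:
            dropped[parts[0]] = int(parts[1])
        # anything else (e.g. "..." lines) is ignored
    return pools, dropped


def needs_attention(pools: dict[str, dict[str, int]], dropped: dict[str, int]) -> list[str]:
    """Names of pools with pending or blocked tasks, then message types with drops."""
    flagged = [name for name, c in pools.items()
               if c["pending"] or c["blocked"] or c["all_time_blocked"]]
    flagged += [kind for kind, count in dropped.items() if count]
    return flagged


SAMPLE = """\
Pool Name               Active   Pending   Completed   Blocked   All time blocked
ReadStage               0        0         113702      0         0
MutationStage           0        0         164503      0         0
HintedHandoff           0        0         0           0         0

Message type            Dropped
READ_REPAIR             0
COUNTER_MUTATION        0
"""

pools, dropped = parse_tpstats(SAMPLE)
print(pools["ReadStage"]["completed"])   # 113702
print(needs_attention(pools, dropped))   # []
```

The healthy sample flags nothing. A row such as `ReadRepairStage 1 42 10 0 0` would be flagged for its 42 pending tasks.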

MBean attribute (tpstats name): description

• ActiveCount (Active): Number of tasks pulled off the queue that a thread is currently processing.

• PendingTasks (Pending): Number of tasks in the queue waiting for a thread.

• CompletedTasks (Completed): Number of tasks completed.

• CurrentlyBlockedTasks (Blocked): Number of tasks currently blocked. When a pool reaches its core pool size (configurable or set per stage), it begins queuing tasks until the max size is reached; once that happens, new tasks block until there is room in the queue.

• TotalBlockedTasks (All time blocked): Total number of tasks that have been blocked.
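The five attributes are easier to keep straight with a toy model of a single stage. The class below is a hypothetical, single-threaded bookkeeping sketch, not Cassandra's executor: up to `max_threads` tasks can be active at once, up to `max_queue` more wait in the queue, and anything beyond that counts as a submitter that would block until there is room in the queue. `StageModel` and its methods are invented for illustration; `snapshot()` reports the counters under the MBean attribute names listed above.

```python
class StageModel:
    """Hypothetical single-threaded model of one stage's thread-pool counters."""

    def __init__(self, max_threads: int, max_queue: int):
        if max_threads < 1 or max_queue < 1:
            raise ValueError("this sketch needs at least one thread and one queue slot")
        self.max_threads = max_threads
        self.max_queue = max_queue
        self.active = 0          # tasks a thread is currently processing
        self.pending = 0         # tasks sitting in the queue
        self.completed = 0
        self.blocked = 0         # submitters waiting for room in the queue
        self.total_blocked = 0   # every submitter that ever had to wait

    def submit(self) -> None:
        if self.active < self.max_threads:
            self.active += 1                 # an idle thread picks it up right away
        elif self.pending < self.max_queue:
            self.pending += 1                # all threads busy: queue it
        else:
            self.blocked += 1                # queue full: the submitter blocks
            self.total_blocked += 1

    def finish_one(self) -> None:
        if self.active == 0:
            raise RuntimeError("no task is running")
        self.active -= 1
        self.completed += 1
        if self.pending:                     # the freed thread pulls the next queued task
            self.pending -= 1
            self.active += 1
        if self.blocked and self.pending < self.max_queue:
            self.blocked -= 1                # a blocked submitter gets the free queue slot
            self.pending += 1

    def snapshot(self) -> dict[str, int]:
        return {
            "ActiveCount": self.active,
            "PendingTasks": self.pending,
            "CompletedTasks": self.completed,
            "CurrentlyBlockedTasks": self.blocked,
            "TotalBlockedTasks": self.total_blocked,
        }


stage = StageModel(max_threads=2, max_queue=1)
for _ in range(5):
    stage.submit()
print(stage.snapshot())
# {'ActiveCount': 2, 'PendingTasks': 1, 'CompletedTasks': 0, 'CurrentlyBlockedTasks': 2, 'TotalBlockedTasks': 2}
stage.finish_one()
print(stage.snapshot())
# {'ActiveCount': 2, 'PendingTasks': 1, 'CompletedTasks': 1, 'CurrentlyBlockedTasks': 1, 'TotalBlockedTasks': 2}
```

In this model the blocked counters only move once both the threads and the queue are saturated, so they point to a more severe backlog than Pending alone.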

Examples

• Read/Mutation Stage
  o Too many reads/writes, disk failure, poor tuning

• ReplicateOnWrite (CounterMutationStage in 2.1+)
  o High throughput of counter increments

• FlushWriter
  o Writes overrunning disk capabilities, poor tuning
  o Large collections

• GossipStage
  o vnodes + many servers (pre 2.0.3)

Questions

?