
Automating Cassandra Repairs

Radovan Zvoncek
[email protected]

github.com/spotify/cassandra-reaper
#CassandraSummit

About zvo

Likes pancakes

Does this for the 3rd time

Works at Spotify

Working at Spotify

Work is autonomous

Squads are responsible for their full stack

Including Cassandra

Cassandra

Node's data (diagram)

Replication (diagram)

Running Cassandra

Requires many things

One of them is keeping data consistent

Otherwise it can get lost or reappear

Eventually

Eventual consistency

Read Repairs

Hinted Handoff

Anti-entropy Repair

Anti-entropy Repair

Coordinated process

Four steps:
1: Hash
2: Compare
3: Stream
4: Merge

Can go wild...
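A minimal Python sketch of the hash-and-compare idea behind the first two steps. This is not Cassandra's actual implementation (which builds Merkle trees per token range and then streams and merges only the differing ranges); the dict-of-rows replicas and helper names here are invented purely for illustration.

```python
import hashlib

def range_hash(rows):
    """Step 1 (Hash): hash all rows in one token range."""
    h = hashlib.sha256()
    for key, value in sorted(rows.items()):
        h.update(f"{key}={value}".encode())
    return h.hexdigest()

def mismatched_ranges(replica_a, replica_b):
    """Step 2 (Compare): find ranges whose hashes differ between replicas.

    replica_a / replica_b: {range_id: {row_key: row_value}}
    The differing ranges are the ones that would need streaming
    and merging (steps 3 and 4).
    """
    return [
        r for r in replica_a
        if range_hash(replica_a[r]) != range_hash(replica_b.get(r, {}))
    ]

# Toy example: range 1 agrees, range 2 is out of sync.
a = {1: {"k1": "v1"}, 2: {"k2": "v2"}}
b = {1: {"k1": "v1"}, 2: {"k2": "stale"}}
print(mismatched_ranges(a, b))  # -> [2]
```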

Repair gone wild

Eats a lot of disk IO
● because of hashing all the data

Saturates the network
● because of streaming a lot of data around

Fills up the disk
● because of receiving all replicas, possibly from all other data centers

Causes a ton of compactions
● because of having to merge the received data

… one better be careful

Careful repair

nodetool repair
● all three intervals

Partitioner range
● nodetool repair -pr
● this interval only

Start & end tokens
● nodetool repair -pr -st -et
● a part of the interval only

Requires splitting the ring into smaller intervals
Smaller intervals mean less data
Less data means fewer repairs gone wild
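A rough Python sketch of what splitting one token interval and repairing the pieces with -st/-et looks like. The Murmur3 token bounds are real (-2^63 to 2^63-1), but the segment count, keyspace name, and direct nodetool invocation are assumptions for illustration; a real tool would read the ring from the cluster and handle failures.

```python
import subprocess

def split_range(start_token, end_token, segments):
    """Split one token interval into `segments` smaller intervals."""
    step = (end_token - start_token) // segments
    bounds = [start_token + i * step for i in range(segments)] + [end_token]
    return list(zip(bounds[:-1], bounds[1:]))

def repair_interval(keyspace, start, end):
    """Repair one small interval using start/end tokens (-st / -et)."""
    subprocess.run(
        ["nodetool", "repair", "-st", str(start), "-et", str(end), keyspace],
        check=True,
    )

# Hypothetical: split the full Murmur3 token range into 16 segments.
for st, et in split_range(-(2**63), 2**63 - 1, 16):
    repair_interval("my_keyspace", st, et)
```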

Careful repair

Smaller intervals also mean more intervals

More intervals mean more actual repairs

Repairs need to be babysat :(

Careful repair

The Spotify way

Feature teams are meant to do features
● not waste time operating their C* clusters

Cron-ing nodetool repair is no good
● mostly due to no feedback loop

This all led to the creation of the Reaper

The Reaper

REST(ish) service

Does a lot of JMX

Orchestrates repairs for you

The reaping

You:
curl http://reaper/cluster --data '{"seedHost": "my.cassandra.host.net"}'

The Reaper:
● Figures out cluster info (e.g. name, partitioner)

The reaping

You:
curl http://reaper/repair_run --data '{"clusterName": "myCluster"}'

The Reaper:
● Prepares repair intervals

The reaping

You:
curl -X PUT http://reaper/repair_run/42 -d state=RUNNING

The Reaper:
● Starts triggering repairs of repair intervals
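The same three-step interaction, sketched with Python's requests library. The endpoints and payloads mirror the curl examples above; the base URL and the run id 42 come from the slides, and everything else (response handling, error cases) is left out, so treat this as an illustration of the flow rather than a reference client.

```python
import requests

REAPER = "http://reaper"  # assumed base URL, as in the curl examples

# 1. Register a cluster; Reaper figures out cluster info (name, partitioner).
requests.post(REAPER + "/cluster",
              data='{"seedHost": "my.cassandra.host.net"}')

# 2. Create a repair run; Reaper prepares the repair intervals.
requests.post(REAPER + "/repair_run",
              data='{"clusterName": "myCluster"}')

# 3. Flip run 42 to RUNNING; Reaper starts triggering repairs of the intervals.
requests.put(REAPER + "/repair_run/42", data={"state": "RUNNING"})
```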

Reaper’s features

Carefulness - doesn’t kill a node
● checks for node load
● backs off after repairing an interval

Resilience - retries when things break
● because things break all the time

Parallelism - no idle nodes
● multiple small intervals in parallel

Scheduling - set up things only once
● regular full-ring repairs

Persistence - state saved somewhere
● a bit of extra resilience
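A hedged sketch of how the carefulness and resilience bullets above could combine for a single repair segment: back off while the node looks busy, retry when an attempt fails. The thresholds, retry limit, and the node_load/run_repair helpers are invented for illustration; Reaper's actual logic lives in the repo linked below.

```python
import time

MAX_ATTEMPTS = 3        # assumed retry limit
LOAD_THRESHOLD = 0.8    # assumed "node too busy" cutoff
BACKOFF_SECONDS = 60    # assumed pause between attempts

def repair_segment(segment, node_load, run_repair):
    """Illustrative carefulness + resilience loop for one segment.

    node_load(): returns current load of the replica nodes (0..1).
    run_repair(segment): triggers the repair, returns True on success.
    """
    for _attempt in range(MAX_ATTEMPTS):
        if node_load() > LOAD_THRESHOLD:
            # Carefulness: don't pile repairs onto an already busy node.
            time.sleep(BACKOFF_SECONDS)
            continue
        if run_repair(segment):
            # Back off a little even after success, before the next segment.
            time.sleep(BACKOFF_SECONDS)
            return True
        # Resilience: things break all the time, so loop and try again.
    return False  # give up; the run can mark this segment as failed
```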

What we reaped

First repair done 2015-01-28
1,700 repairs since then, recently 90 per week
176,000 (16%) segments failed at least once
60 repair failures


Reaper’s Future

CASSANDRA-10070

Whatever is needed until then

Greatest benefit

Cassandra Reaper automates a very tedious maintenance operation of Cassandra clusters in a rather smart, efficient and careful manner, while requiring minimal Cassandra expertise


Thank you!

github.com/spotify/cassandra-reaper

#CassandraSummit