Continuous Deployment with C*: Treating C* as First-Class Code
Michael Kjellman (@mkjellman), Software Engineer, Barracuda Networks

Continuous Deployment with Cassandra


DESCRIPTION

Michael Kjellman, Software Engineer at Barracuda Networks, has offered to present on his experiences with Apache Cassandra. Come learn about:
• Continuous Deployments with Cassandra
• Upgrading Cassandra
• When Upgrades Go Wrong
• Coding Complexity Moved to Operations (How to Prepare and Plan)
• Why 'Apt-get/Yum Install Cassandra' is a bad idea
• Why You Should Treat Cassandra’s Code like it's Your Own


Page 1: Continuous Deployment with Cassandra

Continuous Deployment with C*: Treating C* as First-Class Code

Michael Kjellman (@mkjellman)

Software Engineer, Barracuda Networks

Page 2: Continuous Deployment with Cassandra
Page 3: Continuous Deployment with Cassandra

C* At Barracuda
• Powers 100% of our Spam and Webfilter Backend
• 48 Node Cluster
• 2 Datacenters
• Requests: 20k writes/sec, 30k reads/sec
• Latency: 1 ms/write, 1.6 ms/read
• > 30TB of Data
• Almost entirely native protocol/CQL3

Page 4: Continuous Deployment with Cassandra

Hardware Configuration
• 32GB of RAM
• 1x SSD
• 2x Spinning Disks
• 2x 6 Core AMD

Page 5: Continuous Deployment with Cassandra

Key Configuration Options
• key_cache_size_in_mb: 1024
• row_cache_size_in_mb: 0
• memtable_total_space_in_mb: 2048
• HEAP_NEWSIZE = "1200M" (-Xmn)
• MAX_HEAP_SIZE = "8G" (-Xmx)
• -XX:SurvivorRatio=6

• Sidenote: Java 7u40 is out!

Page 6: Continuous Deployment with Cassandra

How do I keep my graphs pretty during a C* upgrade?

September 18th 2013

Page 7: Continuous Deployment with Cassandra

Make a C* Build
$> git clone http://git-wip-us.apache.org/repos/asf/cassandra.git
$> cd cassandra
$> git checkout -t origin/cassandra-1.2
$> git log
$> vim build.xml (change version number every time you make a build!)
$> ant clean release
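The manual "change version number" step is easy to forget, so it could be scripted. The following is a minimal sketch, not the speaker's tooling: it assumes the version lives in a `base.version` property in build.xml, and the helper name `stamp_version` and the example suffix are hypothetical.

```python
import re


def stamp_version(build_xml_text, new_version):
    """Return build.xml text with the base.version property set to new_version.

    Assumption: build.xml declares <property name="base.version" value="..."/>.
    """
    pattern = r'(<property\s+name="base\.version"\s+value=")[^"]*(")'
    stamped, count = re.subn(
        pattern,
        lambda m: m.group(1) + new_version + m.group(2),
        build_xml_text,
        count=1,
    )
    if count != 1:
        # Fail loudly rather than ship a build with an unchanged version.
        raise ValueError("base.version property not found in build.xml")
    return stamped


if __name__ == "__main__":
    # Hypothetical usage: tag the build with an internal suffix before `ant clean release`.
    with open("build.xml") as f:
        text = f.read()
    with open("build.xml", "w") as f:
        f.write(stamp_version(text, "1.2.9-internal1"))
```

Raising on a missing property keeps a build from going out with a stale version number.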

Page 8: Continuous Deployment with Cassandra

Deployment
• Make release
• Test release with CCM
• Push release to Puppet (deals with config, etc.)
• Run controlled and scripted rolling restart, one datacenter at a time
  – flush
  – stop
  – start
  – validate node
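The flush, stop, start, validate loop can be sketched as a small script. This is an illustration under stated assumptions, not the script used at Barracuda: it assumes nodes are addressed by IP, that the service is controlled with `ssh <host> sudo service cassandra stop|start`, and that a node counts as valid once `nodetool status` reports it as `UN` (Up/Normal). All function names are hypothetical.

```python
import subprocess
import time


def run(cmd):
    """Run a command and return its stdout; raises CalledProcessError on failure."""
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout


def node_is_up(host, status_output):
    """True if `nodetool status` output lists host as Up/Normal (UN)."""
    for line in status_output.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "UN" and fields[1] == host:
            return True
    return False


def restart_node(host, run=run, sleep=time.sleep, attempts=30, wait_s=10):
    """Flush, stop, start, then wait until the node validates as UN."""
    run(["nodetool", "-h", host, "flush"])
    # Assumption: the service is managed via ssh + `service cassandra ...`.
    run(["ssh", host, "sudo", "service", "cassandra", "stop"])
    run(["ssh", host, "sudo", "service", "cassandra", "start"])
    for _ in range(attempts):
        try:
            if node_is_up(host, run(["nodetool", "-h", host, "status"])):
                return
        except subprocess.CalledProcessError:
            pass  # nodetool cannot connect while the node is still starting
        sleep(wait_s)
    raise RuntimeError(f"{host} did not return to UN state")


def rolling_restart(datacenters, restart=restart_node):
    """Restart one node at a time, finishing one datacenter before the next."""
    for hosts in datacenters.values():
        for host in hosts:
            restart(host)
```

Because `restart_node` raises when a node fails validation, the rollout stops at the first bad node instead of continuing through the cluster.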

Page 9: Continuous Deployment with Cassandra

Automate, Automate, Automate

Page 10: Continuous Deployment with Cassandra

So, why not just apt-get install cassandra?

• Makes running a custom release in the future a complete nightmare
• Lost visibility into changes in the release
• WHY are you upgrading?
• Treat a C* build just as if it were a release of your code. What commits did you put into your own release?
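One way to answer "what commits went into this release" is to list the commits between the previously deployed ref and the new one. A minimal sketch, assuming a local clone of the Cassandra repository; the function names and the example refs are hypothetical.

```python
import subprocess


def parse_oneline(log_text):
    """Parse `git log --oneline` output into (sha, subject) pairs."""
    commits = []
    for line in log_text.splitlines():
        if not line.strip():
            continue
        sha, _, subject = line.partition(" ")
        commits.append((sha, subject))
    return commits


def commits_between(old_ref, new_ref, repo_dir="."):
    """List commits reachable from new_ref but not from old_ref."""
    out = subprocess.run(
        ["git", "log", "--oneline", f"{old_ref}..{new_ref}"],
        cwd=repo_dir,
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return parse_oneline(out)


if __name__ == "__main__":
    # Hypothetical refs: the last deployed build and the branch head.
    for sha, subject in commits_between("last-deployed", "origin/cassandra-1.2"):
        print(sha, subject)
```

Reviewing this list before each rollout gives the same visibility into a C* build that a changelog gives for application code.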

Page 11: Continuous Deployment with Cassandra

Simply Put:

MY CODE DOESN’T WORK WITHOUT A STABLE C* CLUSTER

Page 12: Continuous Deployment with Cassandra

When things go wrong
• Every commit (those by C* committers or my own) comes with potential bugs and regressions
• Gossip Bugs Can Bite Hard:
  – CASSANDRA-5665: Gossiper.handleMajorStateChange can lose existing node ApplicationState
• At 48 nodes, even small mistakes are massive

Page 13: Continuous Deployment with Cassandra

Writing your code to deal with node failure

• Upgrading a C* cluster means constant node failures for the duration of the rolling restart
• How does your code deal with read latency and retries?
  – CASSANDRA-4705: Eager Retries for reads for 2.0+
• The mythical “constantly failing” code != stability.
  – Handle exceptions (and node/read failures) gracefully!
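On the application side, graceful handling of a restarting node usually means bounded retries with backoff rather than failing on the first error. The sketch below is driver-agnostic and illustrative only: `NodeUnavailable` is a hypothetical stand-in for whatever timeout or unavailable exception your client library raises, and `with_retries` is not part of any Cassandra driver. It is client-side logic, separate from the server-side eager retries tracked in CASSANDRA-4705.

```python
import random
import time


class NodeUnavailable(Exception):
    """Hypothetical stand-in for a driver's timeout/unavailable error."""


def with_retries(operation, attempts=4, base_delay_s=0.05, sleep=time.sleep):
    """Call operation(), retrying on NodeUnavailable with jittered exponential backoff."""
    for attempt in range(attempts):
        try:
            return operation()
        except NodeUnavailable:
            if attempt == attempts - 1:
                raise  # out of retries: surface the failure to the caller
            # Back off 1x, 2x, 4x... the base delay, with jitter to avoid retry storms.
            sleep(base_delay_s * (2 ** attempt) * random.uniform(0.5, 1.5))
```

Bounding the attempts matters: during a rolling restart a retry usually lands on a healthy replica, but an unbounded loop would hide a real outage.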

Page 14: Continuous Deployment with Cassandra

Why treat C* like your own code
• Using C* will move much of your own application logic to C*
• The bugs have to go somewhere!
• Data replication at database layer or at application layer

Page 15: Continuous Deployment with Cassandra

QUESTIONS?
Thanks for Listening!