47
Cassandra stress Christopher Batey @chbatey The Last Pickle

The Best and Worst of Cassandra-stress Tool (Christopher Batey, The Last Pickle) | C* Summit 2016

Embed Size (px)

Citation preview

Cassandra stress

Christopher Batey@chbatey

The Last Pickle

Overview

● Importance of early realistic testing● What is cassandra stress● How you use it

Why do Cassandra projects fail?

How cassandra projects roll

● New shiny project● Cassandra is a good fit● Build application against a single node cassandra cluster● Go to production● Fail

Cassandra is not an agile database

● Schema migrations :(● Data migrations :(● Must know queries up front

You must iterate on your schema early

● Know your queries● Know your data● Test them against a realistic cluster

Cassandra stress

cassandra-stress

Where

<install>/tools/bin/cassandra-stress

<install>/tools/bin/cassandra-stressd

<install>/tools/bin/compaction-stress

Resources

Commandscassandra-stress help

---Commands---read : Multiple concurrent reads - the cluster must first be populated by a write test

write : Multiple concurrent writes against the cluster

mixed : Interleaving of any basic commands

user : Interleaving of user provided queries, with configurable ratio and distribution

help : Print help for a command or option

10

Command help

cassandra-stress help write

Usage: write n=? [no-warmup] [truncate=?] [cl=?] OR Usage: write duration=? [no-warmup] [truncate=?] [cl=?]

err<? (default=0.02) Run until the standard error of the mean is below this fraction n>? (default=30) Run at least this many iterations before accepting uncertainty convergence n<? (default=200) Run at most this many iterations before accepting uncertainty convergence no-warmup Do not warmup the process truncate=? (default=never) Truncate the table: never, before performing any work, or before each iteration cl=? (default=LOCAL_ONE) Consistency level to use n=? Number of operations to perform duration=? Time to run in (in seconds, minutes or hours)

Example

cassandra-stress write n=100 no-warmup truncate=once cl=ONE

Output

Op rate : 654 op/s [WRITE: 654 op/s]Partition rate : 654 pk/s [WRITE: 654 pk/s]Row rate : 654 row/s [WRITE: 654 row/s]Latency mean : 59.5 ms [WRITE: 59.5 ms]Latency median : 64.0 ms [WRITE: 64.0 ms]Latency 95th percentile : 79.3 ms [WRITE: 79.3 ms]Latency 99th percentile : 84.0 ms [WRITE: 84.0 ms]Latency 99.9th percentile : 84.9 ms [WRITE: 84.9 ms]Latency max : 84.9 ms [WRITE: 84.9 ms]Total partitions : 100 [WRITE: 100]Total errors : 0 [WRITE: 0]Total GC count : 0Total GC memory : 0.000 KiBTotal GC time : 0.0 secondsAvg GC time : NaN msStdDev GC time : 0.0 msTotal operation time : 00:00:00

Options

---Options----pop : Population distribution and intra-partition visit order-insert : Insert specific options relating to various methods for batching and splitting partition-rate : Thread count, rate limit or automatic mode (default is auto)-mode : Thrift or CQL with options-errors : How to handle errors when encountered during stress-sample : Specify the number of samples to collect for measuring latency-schema : Replication settings, compression, compaction, etc.-node : Nodes to connect to-log : Where to log progress to, and the interval at which to do it-transport : Custom transport factories-port : The port to connect to cassandra nodes on-sendto : Specify a stress server to send this command to

Options help

cassandra-stress help -mode

Usage: -mode native [unprepared] cql3 [compression=?] [port=?] [user=?] [password=?] [auth-provider=?] [maxPending=?] [connectionsPerHost=?] [protocolVersion=?]

user=? username password=? password unprepared force use of unprepared statements compression=? (default=none) port=? (default=9046) auth-provider=? Fully qualified implementation of com.datastax.driver.core.AuthProvider maxPending=? (default=) Maximum pending requests per connection connectionsPerHost=? (default=) Number of connections per host protocolVersion=? (default=NEWEST_SUPPORTED) CQL Protocol Version

User mode

● YAML profile

Scenario: tracking your staff

● Get all the activities for a staff member● Get the latest event for a staff member● Is Cassandra a good fit?

Defining your schema

table: staff_activities

table_definition: | CREATE TABLE staff_activities ( name text, when timeuuid, what text, PRIMARY KEY(name, when) )

Column metadata

columnspec: - name: name size: uniform(5..10) # The names of the staff members are between 5-10 characters population: uniform(1..10) # 10 possible staff members to pick from

- name: when cluster: uniform(20..500) # Staff members do between 20 and 500 activities

- name: what size: normal(10..100,50)

Column metadata

- name: name size: uniform(5..10) population: uniform(1..10)

Column metadata

- name: when cluster: uniform(20..500)

Column metadata

- name: what size: NORMAL(0..500,250,10)

Distributions

cassandra-stress print

Usage: print dist=DIST(?)

dist=DIST(?): A mathematical distribution EXP(min..max) EXTREME(min..max,shape) QEXTREME(min..max,shape,quantas) GAUSSIAN(min..max,stdvrng) GAUSSIAN(min..max,mean,stdev) UNIFORM(min..max) FIXED(val) SEQ(min..max)

Distributions

cassandra-stress print dist="UNIFORM(0..100)" % of samples Range % of total10.0 10 10.020.0 20 20.030.0 30 30.040.0 40 40.050.0 50 50.060.0 60 60.070.0 70 70.080.0 80 80.090.0 90 90.095.0 95 95.099.0 99 99.0100.0 100 100.0

Distributionscassandra-stress print dist="FIXED(100)" % of samples Range % of total10.0 100 100.020.0 100 100.030.0 100 100.040.0 100 100.050.0 100 100.060.0 100 100.070.0 100 100.080.0 100 100.090.0 100 100.095.0 100 100.099.0 100 100.0100.0 100 100.0

Distributionscassandra-stress print dist="FIXED(100)" % of samples Range % of total10.0 100 100.020.0 100 100.030.0 100 100.040.0 100 100.050.0 100 100.060.0 100 100.070.0 100 100.080.0 100 100.090.0 100 100.095.0 100 100.099.0 100 100.0100.0 100 100.0

Distributionscassandra-stress print "dist=NORMAL(0..500,250,10)" % of samples Range % of total10.0 237 47.420.0 241 48.230.0 244 48.840.0 247 49.450.0 250 50.060.0 252 50.470.0 255 51.080.0 258 51.690.0 262 52.495.0 266 53.299.0 273 54.6100.0 500 100.0

Column metadata

columnspec: - name: name size: uniform(5..10) # The names of the staff members are between 5-10 characters population: uniform(1..10) # 10 possible staff members to pick from

- name: when cluster: uniform(20..500) # Staff members do between 20 and 500 activities

- name: what size: normal(10..100,50)

Insertion of data

insert: # we only update a single partition in any given insert partitions: fixed(1)

# we want to insert a single row per partition and we have between 20 and 500 # rows per partition select: fixed(1)/500

batchtype: UNLOGGED

Queries

queries: events: cql: select * from staff_activities where name = ?

latest_event: cql: select * from staff_activities where name = ? LIMIT 1

Running

cassandra-stress user profile=./tlp-users.yaml

duration=1h

"ops(insert=1,latest_event=1,events=1)"

truncate=once

cl=LOCAL_QUORUM

Throughput mode (default)

● Keeps increasing the number of clients (threads) until no performance gain

Limit mode

● You specify a target TPS e.g. 10,000 writes / seconds● Replaced with fixed/throttle mode in trunk

Graphing

Graphing

Graphing - adding to existing graph

Other features

Daemon mode

client

Overriding protocol version

cassandra-stress -mode native protocolVersion=X

Coordinated omission

● Limit mode replaced with throttle and fixed mode

http://psy-lob-saw.blogspot.co.uk/2016/07/fixing-co-in-cstress.html

Summary

● Not very flexible for inserting data● No LWT support● Distributed mode limited

Questions?Christopher Batey

@chbatey

The Last Pickle

Coordinated omission

Coordinated omission

Coordinated omission