43
Financial Time Series Cassandra 1.2 Jake Luciani and Carl Yeksigian BlueMountain Capital

NYC* Big Tech Day 2013: Financial Time Series

Embed Size (px)

DESCRIPTION

A talk about how BlueMountain Capital utilizes Cassandra to store Financial data, with @tjake and @carlyeks.

Citation preview

Page 1: NYC* Big Tech Day 2013: Financial Time Series

Financial Time SeriesCassandra 1.2

Jake Luciani and Carl YeksigianBlueMountain Capital

Page 2: NYC* Big Tech Day 2013: Financial Time Series

Know your problem.

1000s of consumers..creating and reading data as fast as possible..consistent to all readers..and handle ad-hoc user queries..quickly..across datacenters.

Page 3: NYC* Big Tech Day 2013: Financial Time Series

Know your data.AAPL price

MSFT price

Page 4: NYC* Big Tech Day 2013: Financial Time Series

Know your queries.Time Series Query

start (10am)

end (2pm)

1 minute periods

Start, End, Periodicity defines query

Page 5: NYC* Big Tech Day 2013: Financial Time Series

Know your queries.Cross Section Query

As Of time defines the query

As Of Time (11am)

Page 6: NYC* Big Tech Day 2013: Financial Time Series

Know your queries.

● Cross sections are for random data● Storing for Cross Sections means thousands of

writes, inconsistent queries● We also need bitemporality, but it's hard, so let's

ignore it in the query

Page 7: NYC* Big Tech Day 2013: Financial Time Series

Know your users.

A million, billion writes per second..and reads are fast and happen at the same time..and we can answer everything consistently..and it scales to new use cases quickly..and it's all done yesterday

Page 8: NYC* Big Tech Day 2013: Financial Time Series

Since we can't optimize for everything.

Let's optimize for Time Series.

Page 9: NYC* Big Tech Day 2013: Financial Time Series

Data Model (in C* 1.1)

AAPL lastPrice:2013-03-18:2013-03-19 0E-34-88-FF-26-E3-2C

lastPrice:2013-03-19:2012-03-19

lastPrice:2013-03-19:2013-03-20

0E-34-88-FF-26-E3-3D

0E-34-88-FF-26-E3-4E

Page 10: NYC* Big Tech Day 2013: Financial Time Series

But we're using C* 1.2.CQL3

V-nodesJBOD

Pooled Decompression buffers

SSD Aware

Parallel CompactionOff-Heap Bloom Filters

Metrics!Concurrent Schema Creation

Page 11: NYC* Big Tech Day 2013: Financial Time Series

CREATE TABLE tsdata (id blob,property string,asof_ticks bigint,knowledge_ticks bigint,value blob,PRIMARY KEY(id,property,asof_ticks,knowledge_ticks)

)WITH COMPACT STORAGEAND CLUSTERING ORDER BY(asof_ticks DESC, knowledge_ticks DESC)

Data Model (CQL 3)

Page 12: NYC* Big Tech Day 2013: Financial Time Series

SELECT * FROM tsdataWHERE id = 0x12345AND property = 'lastPrice'AND asof_ticks >= 1234567890AND asof_ticks <= 2345678901

CQL3 Queries: Time Series

Page 13: NYC* Big Tech Day 2013: Financial Time Series

CQL3 Queries: Cross Section

SELECT * FROM tsdataWHERE id = 0x12345AND property = 'lastPrice'AND asof_ticks = 1234567890AND knowledge_ticks < 2345678901LIMIT 1

Page 14: NYC* Big Tech Day 2013: Financial Time Series

Data Overload!

All points between start and endEven though we have a periodicity

All knowledge timesEven though we only want latest

Page 15: NYC* Big Tech Day 2013: Financial Time Series

A Service, not an app

C*

Olympus

Olympus

Olym

pusOly

mpu

s

Olympus

Olympus Olympus

OlympusApp

App

App

App

App

App

App

App

App

App

Page 16: NYC* Big Tech Day 2013: Financial Time Series

Filtration

Filter everything by knowledge time

Filter time series by periodicity

200k points filtered down to 300

ServiceFilter

AAPL:lastPrice:2013-03-18:2013-03-19AAPL:lastPrice:2013-03-19:2013-03-19AAPL:lastPrice:2013-03-19:2013-03-20AAPL:lastPrice:2013-03-20:2013-03-20AAPL:lastPrice:2013-03-20:2013-03-21

AAPL:lastPrice:2013-03-18:2013-03-19AAPL:lastPrice:2013-03-19:2013-03-20AAPL:lastPrice:2013-03-20:2013-03-21Cassandra Reads

Page 17: NYC* Big Tech Day 2013: Financial Time Series

Pushdown Filters

● To provide periodicity on raw data, downsample on write

● There are still cases where we don't know how to sample

● This filtering should be pushed to C*● The coordinator node should apply a filter to the

result set

Page 18: NYC* Big Tech Day 2013: Financial Time Series

Complex Value Types

Not every value is a doubleSome values belong togetherBid and Ask should come back together

Page 19: NYC* Big Tech Day 2013: Financial Time Series

Thrift

Thrift structures as valuesTyped, extensible schemaUnion types give us a way to deserialize any type

Page 20: NYC* Big Tech Day 2013: Financial Time Series

Thrift: Union Types

https://gist.github.com/carlyeks/5199559

Page 21: NYC* Big Tech Day 2013: Financial Time Series

But that's the easy part...

Page 22: NYC* Big Tech Day 2013: Financial Time Series

Scaling...

The first rule of scaling is you do not just turn eveything to 11.

Page 23: NYC* Big Tech Day 2013: Financial Time Series

Scaling...

Step 1 - Fast Machines for your workloadStep 2 - Avoid Java GC for your workloadStep 3 - Tune Cassandra for your workloadStep 4 - Prefetch and cache for your workload

Page 24: NYC* Big Tech Day 2013: Financial Time Series

Can't fix what you can't measure

Riemann (http://riemann.io)Easily push application and system metrics into a single systemWe push 4k metrics per second to a single Riemann instance

Page 25: NYC* Big Tech Day 2013: Financial Time Series

Metrics: Riemann

Yammer Metrics with Riemann

https://gist.github.com/carlyeks/5199090

Page 26: NYC* Big Tech Day 2013: Financial Time Series

Metrics: Riemann

Push stream based metrics libraryRiemann Dash for Why is it Slow?

Graphite for Why was itSlow?

Page 27: NYC* Big Tech Day 2013: Financial Time Series

VisualVM-The greatest tool EVER

Many useful plugins...Just start jstatd on each server and go!

Page 28: NYC* Big Tech Day 2013: Financial Time Series

Scaling Reads: Machines

SSDs for hot dataJBOD configAs many cores as possible (> 16)10GbE networkBonded network cardsJumbo frames

Page 29: NYC* Big Tech Day 2013: Financial Time Series

JBOD is a lifesaver

SSDs are great until they aren't anymore

JBOD allowed passive recovery in the face of simultaneous disk failures (SSDs had a bad firmware)

Page 30: NYC* Big Tech Day 2013: Financial Time Series

Scaling Reads: JVM

-Xmx12G-Xmn1600M-XX:SurvivorRatio=16-XX:+UseCompressedOops

-XX:+UseTLAB yields ~15% Boost!(Thread local allocators, good for SEDA architectures)

JVM

Magic!

Page 31: NYC* Big Tech Day 2013: Financial Time Series

Scaling Reads: Cassandra

Changes we've made:● Configuration● Compaction● Compression● Pushdown Filters

Page 32: NYC* Big Tech Day 2013: Financial Time Series

Scaling Cassandra: Configuration

Hinted HandoffHHO single threaded, 100kb throttle

Page 33: NYC* Big Tech Day 2013: Financial Time Series

Scaling Cassandra: Configuration

memtable size2048mb, instead of 1/3 heap

We're using a 12gb heap; leaves enough room for memtables while the majority is left for reads and compaction.

Page 34: NYC* Big Tech Day 2013: Financial Time Series

Scaling Cassandra: Configuration

Half-Sync Half-Async serverNo thread dedicated to an idle connectionWe have a lot of idle connections

Page 35: NYC* Big Tech Day 2013: Financial Time Series

Scaling Cassandra: Configuration

Multithreaded compaction, 4 coresMore threads to compact means fastToo many threads means resource contention

Page 36: NYC* Big Tech Day 2013: Financial Time Series

Scaling Cassandra: Configuration

Disabled internode compressionCaused too much GC and Latency

On a 10GbE network, who needs compression?

Page 37: NYC* Big Tech Day 2013: Financial Time Series

Leveled Compaction

Wide rows means data can be spread across a huge number of SSTablesLeveled Compaction puts a bound on the worst case (*)Fewer SSTables to read means lower latency, as shown below; orange SSTables get read

L0

L1

L2

L3

L4

L5

* In Theory

Page 38: NYC* Big Tech Day 2013: Financial Time Series

Leveled CompactionBreaking Bad

L0

L1

L2

L3

L4

L5

Under high write load, forced to read all of the L0 files

Page 39: NYC* Big Tech Day 2013: Financial Time Series

Hybrid CompactionBreaking Better

L0

L1

L2

L3

L4

L5

{HybridCompaction

Size Tiered

Leveled

Size Tiering Level 0

Page 40: NYC* Big Tech Day 2013: Financial Time Series

Better Compression:New LZ4Compressor

LZ4 Compression is 40% faster than Google's Snappy...

LZ4 JNI

Snappy JNI

LZ4 Sun Unsafe

Blocks in Cassandra are so small we don't see the same in production but the 95% latency is improved and it works with Java 7

Page 41: NYC* Big Tech Day 2013: Financial Time Series

CRC Check Chance

CRC check of each compressed block causes reads to be 2x SLOWER.Lowered crc_check_chance to 10% of reads.

A move to JNI would cause a 30x boost

Page 42: NYC* Big Tech Day 2013: Financial Time Series

Current Stats

● 12 nodes● 2 DataCenters● RF=6● 150k Writes/sec at EACH_QUORUM● 100k Reads/sec at LOCAL_QUORUM● > 6 Billion points (without replication)● 2TB on disk (compressed) ● Read Latency 50%/95% is 1ms/10ms

Page 43: NYC* Big Tech Day 2013: Financial Time Series

Questions?

Thank you!

@tjake and @carlyeks