43
CASSANDRA Your New Best Girl Toby DiPasquale | Philly ETE 2010

CASSANDRA · 2013-07-10 · CASSANDRA Your New Best Girl Toby DiPasquale | Philly ETE 2010. In the beginning, we had paper filing systems Library == database, card catalog == index

  • Upload
    others

  • View
    3

  • Download
    0

Embed Size (px)

Citation preview

Page 2: CASSANDRA · 2013-07-10 · CASSANDRA Your New Best Girl Toby DiPasquale | Philly ETE 2010. In the beginning, we had paper filing systems Library == database, card catalog == index

In the beginning, we had paper filing systemsLibrary == database, card catalog == index

Page 3: CASSANDRA · 2013-07-10 · CASSANDRA Your New Best Girl Toby DiPasquale | Philly ETE 2010. In the beginning, we had paper filing systems Library == database, card catalog == index

Then came mainframes and hierarchical (ISAM) storageISAM == Indexed Sequential Access Method

Page 4: CASSANDRA · 2013-07-10 · CASSANDRA Your New Best Girl Toby DiPasquale | Philly ETE 2010. In the beginning, we had paper filing systems Library == database, card catalog == index

Now we are living in a relational worldDefault tool for most engineers for most jobs

Page 5: CASSANDRA · 2013-07-10 · CASSANDRA Your New Best Girl Toby DiPasquale | Philly ETE 2010. In the beginning, we had paper filing systems Library == database, card catalog == index

What’s wrong with RDMBS? Served us well for a long timeToo often are default tool without thinking about the constraints

Page 6: CASSANDRA · 2013-07-10 · CASSANDRA Your New Best Girl Toby DiPasquale | Philly ETE 2010. In the beginning, we had paper filing systems Library == database, card catalog == index

Consistency is king

Favors ACID consistency above all other considerationsDoesn’t allow for CAP tradeoffs (by design)

Page 7: CASSANDRA · 2013-07-10 · CASSANDRA Your New Best Girl Toby DiPasquale | Philly ETE 2010. In the beginning, we had paper filing systems Library == database, card catalog == index

Difficult to scale

3NF makes it difficult to partition data across physical instancesJOINs have to be denormalized in order to shard, losing consistency

Page 8: CASSANDRA · 2013-07-10 · CASSANDRA Your New Best Girl Toby DiPasquale | Philly ETE 2010. In the beginning, we had paper filing systems Library == database, card catalog == index

Complex Implementations

General case architecture makes it slow, many features not needed for a given appOperations overhead is very high, most complex software in most apps

Page 9: CASSANDRA · 2013-07-10 · CASSANDRA Your New Best Girl Toby DiPasquale | Philly ETE 2010. In the beginning, we had paper filing systems Library == database, card catalog == index

Consistency

Availability

Partition-Tolerance

you can pick twoframework for understanding tradeoffs in distributed system design

Page 10: CASSANDRA · 2013-07-10 · CASSANDRA Your New Best Girl Toby DiPasquale | Philly ETE 2010. In the beginning, we had paper filing systems Library == database, card catalog == index

RDBMS* = CA

* Using asynchronous replication

RDBMS forces you to give up partition-toleranceThis is a bad choice for large distributed systems

Page 11: CASSANDRA · 2013-07-10 · CASSANDRA Your New Best Girl Toby DiPasquale | Philly ETE 2010. In the beginning, we had paper filing systems Library == database, card catalog == index

What if I need something else?

Page 12: CASSANDRA · 2013-07-10 · CASSANDRA Your New Best Girl Toby DiPasquale | Philly ETE 2010. In the beginning, we had paper filing systems Library == database, card catalog == index
Page 13: CASSANDRA · 2013-07-10 · CASSANDRA Your New Best Girl Toby DiPasquale | Philly ETE 2010. In the beginning, we had paper filing systems Library == database, card catalog == index

Dynamo + BigTable = Cassandra

Takes AP/architecture from Dynamo and column-orientation/access patterns from BigTable

Page 14: CASSANDRA · 2013-07-10 · CASSANDRA Your New Best Girl Toby DiPasquale | Philly ETE 2010. In the beginning, we had paper filing systems Library == database, card catalog == index

Originally created by Facebook, dumped source, abandoned itNow top-level Apache project

Page 15: CASSANDRA · 2013-07-10 · CASSANDRA Your New Best Girl Toby DiPasquale | Philly ETE 2010. In the beginning, we had paper filing systems Library == database, card catalog == index

Pros

Page 16: CASSANDRA · 2013-07-10 · CASSANDRA Your New Best Girl Toby DiPasquale | Philly ETE 2010. In the beginning, we had paper filing systems Library == database, card catalog == index

Column-oriented

Data is stored by ColumnFamilyMakes it fast for querying, better able to be compressed

Page 17: CASSANDRA · 2013-07-10 · CASSANDRA Your New Best Girl Toby DiPasquale | Philly ETE 2010. In the beginning, we had paper filing systems Library == database, card catalog == index

No SPoF

All nodes are equal, all service queriesNode will proxy to other node if asking for data it doesn’t have

Page 18: CASSANDRA · 2013-07-10 · CASSANDRA Your New Best Girl Toby DiPasquale | Philly ETE 2010. In the beginning, we had paper filing systems Library == database, card catalog == index

Auto Scaling(fo realz)

Just add more nodes with same configNew nodes request data from old ones and split ring amongst themselves

Page 19: CASSANDRA · 2013-07-10 · CASSANDRA Your New Best Girl Toby DiPasquale | Philly ETE 2010. In the beginning, we had paper filing systems Library == database, card catalog == index

Read Repair

Will choose replica with latest timestamp in case of conflict (only when R > 1)Requires that client clocks be synchronized to work right

Page 20: CASSANDRA · 2013-07-10 · CASSANDRA Your New Best Girl Toby DiPasquale | Philly ETE 2010. In the beginning, we had paper filing systems Library == database, card catalog == index

R + W > N

Tunable consistencyif R + W > N, strong(ish) consistency, else eventual consistency

Page 21: CASSANDRA · 2013-07-10 · CASSANDRA Your New Best Girl Toby DiPasquale | Philly ETE 2010. In the beginning, we had paper filing systems Library == database, card catalog == index

Location-aware

Will take into account rack and datacenter location if givenUsed to balance replicas for robustness

Page 22: CASSANDRA · 2013-07-10 · CASSANDRA Your New Best Girl Toby DiPasquale | Philly ETE 2010. In the beginning, we had paper filing systems Library == database, card catalog == index

Gossip

Uses gossip protocols to bring nodes back up to speed with the current stateDetects failures of its peers constantly

Page 23: CASSANDRA · 2013-07-10 · CASSANDRA Your New Best Girl Toby DiPasquale | Philly ETE 2010. In the beginning, we had paper filing systems Library == database, card catalog == index

Anti-entropy

Disks fail, uses Merkle trees to determine bad instancesHelps Cassandra choose most correct replica in cases of conflict

Page 24: CASSANDRA · 2013-07-10 · CASSANDRA Your New Best Girl Toby DiPasquale | Philly ETE 2010. In the beginning, we had paper filing systems Library == database, card catalog == index

Fast Reads

lg(n) access time as dataset growsBloom filters help reduce disk access

Page 25: CASSANDRA · 2013-07-10 · CASSANDRA Your New Best Girl Toby DiPasquale | Philly ETE 2010. In the beginning, we had paper filing systems Library == database, card catalog == index

Super Fast Writes

LSM (append-only) structure of SSTable makes writes really, really cheap

Page 26: CASSANDRA · 2013-07-10 · CASSANDRA Your New Best Girl Toby DiPasquale | Philly ETE 2010. In the beginning, we had paper filing systems Library == database, card catalog == index

Cons

Page 27: CASSANDRA · 2013-07-10 · CASSANDRA Your New Best Girl Toby DiPasquale | Philly ETE 2010. In the beginning, we had paper filing systems Library == database, card catalog == index

No JOINs

Need to denormalize data to get use out of Cassandra

Page 28: CASSANDRA · 2013-07-10 · CASSANDRA Your New Best Girl Toby DiPasquale | Philly ETE 2010. In the beginning, we had paper filing systems Library == database, card catalog == index

No (automatic) Indices

No secondary indices maintained by CassandraNeed to create and maintain your own

Page 29: CASSANDRA · 2013-07-10 · CASSANDRA Your New Best Girl Toby DiPasquale | Philly ETE 2010. In the beginning, we had paper filing systems Library == database, card catalog == index

Thrift

Thrift doesn’t currently allow streamingSomewhat hard to work with

Page 30: CASSANDRA · 2013-07-10 · CASSANDRA Your New Best Girl Toby DiPasquale | Philly ETE 2010. In the beginning, we had paper filing systems Library == database, card catalog == index

Datamodel

Takes some getting used toResembles a 4- or 5-dimensional map

Page 31: CASSANDRA · 2013-07-10 · CASSANDRA Your New Best Girl Toby DiPasquale | Philly ETE 2010. In the beginning, we had paper filing systems Library == database, card catalog == index

Keyspace ~= Schema

Namespace, doesn’t carry any schema metadataOne keyspace per app, usually

Page 32: CASSANDRA · 2013-07-10 · CASSANDRA Your New Best Girl Toby DiPasquale | Philly ETE 2010. In the beginning, we had paper filing systems Library == database, card catalog == index

ColumnFamily ~= Table

Think of it like: SortedMap<String, Row>

Page 33: CASSANDRA · 2013-07-10 · CASSANDRA Your New Best Girl Toby DiPasquale | Philly ETE 2010. In the beginning, we had paper filing systems Library == database, card catalog == index

Row ~= Row

Again, best thought of as: SortedMap<Name, Value>Rows can have many columns (or supercolumns)

Page 34: CASSANDRA · 2013-07-10 · CASSANDRA Your New Best Girl Toby DiPasquale | Philly ETE 2010. In the beginning, we had paper filing systems Library == database, card catalog == index

Column ~= Column

Contains name, value and last-update timestamp

Page 35: CASSANDRA · 2013-07-10 · CASSANDRA Your New Best Girl Toby DiPasquale | Philly ETE 2010. In the beginning, we had paper filing systems Library == database, card catalog == index

SuperColumn ~=Column of Columns

SuperColumn is a SortedMap<Name, Value>, where value is a ColumnSet ColumnFamily to use Columns or SuperColumns in config

Page 36: CASSANDRA · 2013-07-10 · CASSANDRA Your New Best Girl Toby DiPasquale | Philly ETE 2010. In the beginning, we had paper filing systems Library == database, card catalog == index

Applications

• Logging

• Output of batch jobs

• Social network data

• Search engine

Page 37: CASSANDRA · 2013-07-10 · CASSANDRA Your New Best Girl Toby DiPasquale | Philly ETE 2010. In the beginning, we had paper filing systems Library == database, card catalog == index
Page 38: CASSANDRA · 2013-07-10 · CASSANDRA Your New Best Girl Toby DiPasquale | Philly ETE 2010. In the beginning, we had paper filing systems Library == database, card catalog == index
Page 39: CASSANDRA · 2013-07-10 · CASSANDRA Your New Best Girl Toby DiPasquale | Philly ETE 2010. In the beginning, we had paper filing systems Library == database, card catalog == index
Page 40: CASSANDRA · 2013-07-10 · CASSANDRA Your New Best Girl Toby DiPasquale | Philly ETE 2010. In the beginning, we had paper filing systems Library == database, card catalog == index
Page 41: CASSANDRA · 2013-07-10 · CASSANDRA Your New Best Girl Toby DiPasquale | Philly ETE 2010. In the beginning, we had paper filing systems Library == database, card catalog == index

Thanks!