
Cassandra Seattle Tech Startups 3-10-10



Slide deck from my presentation at the Seattle Tech Startups meetup on 3-10-10 about how Frugal Mechanic uses Cassandra.


Page 1

A highly scalable, eventually consistent, distributed, structured key-value store.

How we use it @ Frugal Mechanic

Eric Peters (@ericpeters) STS 3-10-10

Page 2

About

• Search 2.5M unique Auto Parts, fitting 250M car configurations for over 100 retailers

Data, Data, Data. We Need To:

• Quickly Process More than 50 Feeds & Data Sources

• Support 10M source-specific SKUs

• Handle 300M SKU Part Fitments

• Be flexible with new columns

• Store and persist raw data before we cherry-pick which data to use


Page 3

Who Uses Cassandra?


Page 4

Cassandra Design Goals

• High availability

• Eventual consistency

– trade off strong consistency in favor of high availability

• Incremental scalability

• Optimistic replication

• “Knobs” to tune tradeoffs between consistency, durability and latency

• Low total cost of ownership

• Minimal administration

Slide “borrowed” from Avinash Lakshman
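The “knobs” bullet refers to per-request consistency levels (Cassandra's include ONE, QUORUM and ALL). A common rule of thumb from Dynamo-style replication: with replication factor N, a read that waits for R replicas is guaranteed to see a write acknowledged by W replicas when R + W > N, because the two replica sets must overlap. The minimal sketch below only evaluates that rule; the function names and the replication factor of 3 are hypothetical, and nothing here talks to Cassandra.

```python
def quorum(n: int) -> int:
    """Smallest majority of n replicas (what a QUORUM request waits for)."""
    return n // 2 + 1


def read_overlaps_write(n: int, r: int, w: int) -> bool:
    """True when any R replicas read must include at least one of the
    W replicas that acknowledged the write, i.e. R + W > N."""
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("r and w must be between 1 and n")
    return r + w > n


N = 3  # hypothetical replication factor
for label, r, w in [
    ("ONE read, ONE write", 1, 1),
    ("QUORUM read, QUORUM write", quorum(N), quorum(N)),
    ("ONE read, ALL write", 1, N),
]:
    print(f"{label}: {read_overlaps_write(N, r, w)}")
```

With N = 3 this prints False, True, True: lower levels buy latency and availability at the cost of possibly stale reads, higher levels do the opposite.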


Page 5

Cassandra write properties

• No reads

• No seeks

• Fast

• Atomic within ColumnFamily

• Always writable

Slide “borrowed” from Avinash Lakshman
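Why are writes free of reads and seeks? In Cassandra's log-structured design, a write is appended to a sequential commit log and applied to an in-memory memtable; full memtables are later flushed to immutable, sorted SSTable files. The toy model below illustrates only that shape. The class name, the flush threshold and the in-memory stand-ins for files are assumptions for illustration, not Cassandra's implementation.

```python
import json

Key = tuple[str, str]    # (row key, column name)
Cell = tuple[str, int]   # (value, timestamp)


class ToyWritePath:
    """Toy log-structured write path (illustrative only)."""

    def __init__(self, memtable_limit: int = 3) -> None:
        self.commit_log: list[str] = []                    # append-only, sequential
        self.memtable: dict[Key, Cell] = {}                # in memory only
        self.sstables: list[list[tuple[Key, Cell]]] = []   # immutable, sorted
        self.memtable_limit = memtable_limit

    def write(self, row: str, column: str, value: str, ts: int) -> None:
        # 1. Durability: a sequential append, no disk seek.
        self.commit_log.append(json.dumps([row, column, value, ts]))
        # 2. Apply to the memtable; the newer timestamp wins. Nothing is
        #    read from disk, so older on-disk versions are never consulted.
        key = (row, column)
        current = self.memtable.get(key)
        if current is None or ts >= current[1]:
            self.memtable[key] = (value, ts)
        # 3. Flush when full: the memtable becomes a new sorted SSTable.
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self) -> None:
        if self.memtable:
            self.sstables.append(sorted(self.memtable.items()))
            self.memtable = {}


wp = ToyWritePath(memtable_limit=2)
wp.write("amazon/b0002jmuwk", "name", "Motorcraft FA1632 Air Filter", 1264701499346)
wp.write("amazon/b0002jmuwk", "brand/name", "Motorcraft", 1264701499346)
print(len(wp.commit_log), len(wp.memtable), len(wp.sstables))  # 2 0 1
```

Because a write never reads or rewrites anything on disk, the cost of merging versions is deferred to reads and to background compaction.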

Page 6

Cassandra read properties

• Read multiple SSTables

• Slower than writes (but still fast)

• Seeks can be mitigated with more RAM

• Scales to billions of rows

Slide “borrowed” from Avinash Lakshman
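The flip side of the write design: a column may have versions in the memtable and in several SSTables, so a read has to check each place that might hold it and keep the version with the newest timestamp. Cassandra narrows the set of files to check with per-SSTable Bloom filters and indexes, and more RAM lets more reads be served from cache instead of seeking on disk. The sketch below shows only the merge step, with plain dicts standing in for the memtable and SSTables; the names and data are hypothetical.

```python
from typing import Optional

Key = tuple[str, str]    # (row key, column name)
Cell = tuple[str, int]   # (value, timestamp)


def read_column(key: Key, memtable: dict[Key, Cell],
                sstables: list[dict[Key, Cell]]) -> Optional[str]:
    """Return the newest value for key across the memtable and all SSTables."""
    newest: Optional[Cell] = None
    for source in [memtable, *sstables]:
        cell = source.get(key)  # a real SSTable lookup could cost a disk seek
        if cell is not None and (newest is None or cell[1] > newest[1]):
            newest = cell
    return None if newest is None else newest[0]


k = ("row1", "color")
memtable = {k: ("green", 30)}
sstables = [
    {k: ("red", 10)},
    {k: ("blue", 20), ("row1", "size"): ("large", 20)},
]
print(read_column(k, memtable, sstables))                  # green
print(read_column(("row1", "size"), memtable, sstables))   # large
print(read_column(("row2", "color"), memtable, sstables))  # None
```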

Page 7

MySQL Comparison

• MySQL, > 50 GB data

– Writes average: ~300 ms

– Reads average: ~350 ms

• Cassandra, > 50 GB data

– Writes average: 0.12 ms

– Reads average: 15 ms

Slide “borrowed” from Avinash Lakshman
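Taking the slide's averages at face value (they are quoted from Avinash Lakshman's deck, not re-measured here), the relative speedups are simple ratios:

```python
# Average latencies in milliseconds, as quoted on the slide (> 50 GB of data).
MYSQL_MS = {"write": 300.0, "read": 350.0}
CASSANDRA_MS = {"write": 0.12, "read": 15.0}


def speedup(op: str) -> float:
    """How many times lower Cassandra's average latency is for op."""
    return MYSQL_MS[op] / CASSANDRA_MS[op]


for op in ("write", "read"):
    print(f"{op}s: ~{speedup(op):,.0f}x")
```

That works out to roughly 2,500x for writes and about 23x for reads, so the gap is far wider for writes, in line with the write and read properties on the two previous slides.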

Page 8

ColumnFamilies

{ // this is a column
    name: "emailAddress",
    value: "[email protected]",
    timestamp: 123456789
}

Examples from http://www.slideshare.net/jbellis/cassandra-open-source-bigtable-dynamo
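Each column carries its own timestamp, and that timestamp is how replicas settle on a value: when two versions of the same column meet, the one with the higher timestamp wins (last write wins). A minimal sketch, assuming integer timestamps supplied by the client; the `Column` class and `reconcile` function are hypothetical, the email addresses are placeholders, and the tie-breaking rule is an arbitrary choice of this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    """A column as on the slide: name, value and timestamp."""
    name: str
    value: str
    timestamp: int


def reconcile(a: Column, b: Column) -> Column:
    """Last write wins: keep the version with the higher timestamp.
    On a tie this sketch keeps `a`."""
    if a.name != b.name:
        raise ValueError("can only reconcile two versions of the same column")
    return a if a.timestamp >= b.timestamp else b


older = Column("emailAddress", "old@example.com", 123456789)
newer = Column("emailAddress", "new@example.com", 123456790)
print(reconcile(older, newer).value)   # new@example.com
print(reconcile(newer, older).value)   # new@example.com
```

Since the winner does not depend on arrival order (ties aside), replicas that receive the same writes in different orders still converge, which is what eventual consistency relies on.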

Page 9

Super ColumnFamilies

{ // this is a SuperColumn
    name: "homeAddress",
    // with an infinite list of Columns
    value: { // note: the keys are the names of the Columns
        street: {name: "street", value: "1234 x street", timestamp: 123456789},
        city: {name: "city", value: "san francisco", timestamp: 123456789},
        zip: {name: "zip", value: "94107", timestamp: 123456789},
    }
}

Examples from http://www.slideshare.net/jbellis/cassandra-open-source-bigtable-dynamo
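A SuperColumn adds one level of nesting: its value is itself a map of Columns keyed by column name, and Cassandra keeps those subcolumns sorted by name. The sketch below models that with a dict plus `sorted()`. The class names are hypothetical, and Python string ordering is an assumption standing in for whatever comparator the ColumnFamily is configured with.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Column:
    name: str
    value: str
    timestamp: int


@dataclass
class SuperColumn:
    """A named map of Columns, keyed by column name."""
    name: str
    columns: dict[str, Column] = field(default_factory=dict)

    def insert(self, col: Column) -> None:
        # Newer timestamp wins if the subcolumn already exists.
        current = self.columns.get(col.name)
        if current is None or col.timestamp >= current.timestamp:
            self.columns[col.name] = col

    def column_names(self) -> list[str]:
        # Subcolumns come back ordered by name, not by insertion order.
        return sorted(self.columns)


home = SuperColumn("homeAddress")
home.insert(Column("street", "1234 x street", 123456789))
home.insert(Column("city", "san francisco", 123456789))
home.insert(Column("zip", "94107", 123456789))
print(home.column_names())   # ['city', 'street', 'zip']
```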

Page 10

JSON Column Example

UserProfile = { // this is a ColumnFamily
    phatduckk: { // this is the key to this Row inside the CF
        // now we have an infinite # of columns in this row
        username: "phatduckk",
        email: "[email protected]",
        phone: "(900) 976-6666"
    }, // end row
    ieure: { // this is the key to another row in the CF
        // now we have another infinite # of columns in this row
        username: "ieure",
        email: "[email protected]",
        phone: "(888) 555-1212",
        age: "66",
        gender: "undecided"
    },
}

Examples from: http://arin.s3.amazonaws.com/pub/docs/WTF-is-a-SuperColumn.pdf
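The point of the example is that rows in one ColumnFamily need not share a set of columns: `ieure` has `age` and `gender`, while `phatduckk` does not. That is what the “be flexible with new columns” requirement on the About slide leans on. A sketch with plain dicts (timestamps and the obfuscated email values are left out; the helper names and the added `city` value are hypothetical):

```python
# UserProfile ColumnFamily: row key -> {column name: value}.
user_profile: dict[str, dict[str, str]] = {
    "phatduckk": {"username": "phatduckk", "phone": "(900) 976-6666"},
    "ieure": {"username": "ieure", "phone": "(888) 555-1212",
              "age": "66", "gender": "undecided"},
}


def add_column(cf: dict[str, dict[str, str]], row_key: str,
               name: str, value: str) -> None:
    """Add a column to one row; no other row is touched."""
    cf.setdefault(row_key, {})[name] = value


def column_names_in_use(cf: dict[str, dict[str, str]]) -> list[str]:
    """Union of column names across rows (each row has its own set)."""
    names: set[str] = set()
    for row in cf.values():
        names.update(row)
    return sorted(names)


print(column_names_in_use(user_profile))
# ['age', 'gender', 'phone', 'username']
add_column(user_profile, "phatduckk", "city", "san francisco")
print(sorted(user_profile["phatduckk"]))
# ['city', 'phone', 'username']
```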

Page 11

JSON Super Column Example

AddressBook = { // this is a ColumnFamily of type Super
    phatduckk: { // this is the key to this row inside the Super CF
        // the key here is the name of the owner of the address book
        // now we have an infinite # of super columns in this row
        // the keys inside the row are the names for the SuperColumns
        // each of these SuperColumns is an address book entry
        friend1: {street: "8th street", zip: "90210", city: "Beverly Hills", state: "CA"},
        // this is the address book entry for John in phatduckk's address book
        John: {street: "Howard street", zip: "94404", city: "FC", state: "CA"},
        Kim: {street: "X street", zip: "87876", city: "Balls", state: "VA"},
        // we can have an infinite # of SuperColumns (aka address book entries)
    }, // end row
    ieure: { // this is the key to another row in the Super CF
        // all the address book entries for ieure
        joey: {street: "A ave", zip: "55485", city: "Hell", state: "NV"},
        William: {street: "Armpit Dr", zip: "93301", city: "Bakersfield", state: "CA"},
    },
}

Examples from: http://arin.s3.amazonaws.com/pub/docs/WTF-is-a-SuperColumn.pdf
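Reading from a Super ColumnFamily takes three lookups instead of two: the row key (the address book's owner), the SuperColumn name (the entry), then the column name. A sketch with nested dicts holding the slide's data; the helper functions are hypothetical:

```python
from typing import Optional

# AddressBook Super CF: owner -> entry name -> column name -> value.
address_book: dict[str, dict[str, dict[str, str]]] = {
    "phatduckk": {
        "friend1": {"street": "8th street", "zip": "90210", "city": "Beverly Hills", "state": "CA"},
        "John": {"street": "Howard street", "zip": "94404", "city": "FC", "state": "CA"},
        "Kim": {"street": "X street", "zip": "87876", "city": "Balls", "state": "VA"},
    },
    "ieure": {
        "joey": {"street": "A ave", "zip": "55485", "city": "Hell", "state": "NV"},
        "William": {"street": "Armpit Dr", "zip": "93301", "city": "Bakersfield", "state": "CA"},
    },
}


def get_field(owner: str, entry: str, column: str) -> Optional[str]:
    """Row key -> SuperColumn -> Column; None if any level is missing."""
    return address_book.get(owner, {}).get(entry, {}).get(column)


def entries_in_state(owner: str, state: str) -> list[str]:
    """Names of one owner's entries whose 'state' column equals state."""
    row = address_book.get(owner, {})
    return sorted(name for name, cols in row.items() if cols.get("state") == state)


print(get_field("phatduckk", "John", "zip"))   # 94404
print(entries_in_state("phatduckk", "CA"))     # ['John', 'friend1']
```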

Page 12

Why Cassandra?

• It offers column-oriented data storage, so you have a bit more structure than plain key/value stores.

• Fast! Writes average 0.12 ms (300M writes ≈ 10 hrs)

• Written in Java; an Apache Software Foundation project

• People smarter than us are using it to solve even bigger problems than ours; if they can scale it, so can we
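The “300M == 10hrs” figure is the 0.12 ms average write latency multiplied by the 300M part fitments from the About slide: 300,000,000 × 0.12 ms = 36,000,000 ms = 36,000 s = 10 hours. That assumes the writes are issued one after another; concurrent clients would finish sooner.

```python
FITMENTS = 300_000_000   # SKU part fitments (About slide)
WRITE_MS = 0.12          # average write latency (MySQL Comparison slide)


def serial_write_hours(n_writes: int, ms_per_write: float) -> float:
    """Wall-clock hours if each write waits for the previous one."""
    return n_writes * ms_per_write / 1000 / 3600


print(round(serial_write_hours(FITMENTS, WRITE_MS), 2))   # 10.0
```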


Page 13

Frugal Mechanic’s Data


Page 14

Modeling Part Information

cassandra> get FrugalMechanic.RawParts['amazon/b0002jmuwk']

=> (column=thumb_image_url, value=http://ecx.images-amazon.com/images/I/31JQJNSAV6L._SL75_.jpg, timestamp=1264701499346)

=> (column=sku/FA1632, value=manufacturer, timestamp=1264701499346)

=> (column=sku/B0002JMUWK, value=asin, timestamp=1266339317829)

=> (column=name, value=Motorcraft FA1632 Air Filter, timestamp=1264701499346)

=> (column=large_image_url, value=http://ecx.images-amazon.com/images/I/31JQJNSAV6L.jpg, timestamp=1264701499346)

=> (column=description, value=Motorcraft Air Filter is designed to filter outside air that enters the vehicle. It is manufactured from leak proof polyurethane seals. This air filter chemically treats dry type cleaner elements to withstand damage from oil and moisture. It is resistant to temperature extremes and has a 98.5% efficiency standard. This air filter is treated to enhance capacity and efficiency as well as facilitates hassle free installation. It is backed by a 12 month warranty.<br/><br/>

Features:<br />

<ul>

<li>Efficiently filters outside air</li>

<li>Withstands damage from oil and moisture</li>

<li>Easy to install</li>

<li>12 months warranty</li>

<li>Leak proof</li>

</ul>

, timestamp=1264701499346)

=> (column=category/name, value=Automotive / Categories / Replacement Parts / Filters / Air Filters & Accessories / Air Filters, timestamp=1264701499346)

=> (column=category/browsenodeid, value=15727081,15727321, timestamp=1264701499346)

=> (column=cassandra_write_date, value=2010-02-16T16:55:17.809Z, timestamp=1266339317813)

=> (column=brand/name, value=Motorcraft, timestamp=1264701499346)

=> (column=brand/manufacturer, value=Motorcraft, timestamp=1264701499346)

Returned 11 results.
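The RawParts row appears to use a `prefix/` naming convention (`sku/…`, `brand/…`, `category/…`) to keep related fields together in a single row; for the `sku/` columns the SKU itself sits in the column name and the value records which kind of SKU it is. The slide does not show how Frugal Mechanic's code reads these rows, so the helper below is a hypothetical client-side sketch that regroups such columns into nested dicts.

```python
def group_columns(row: dict[str, str]) -> dict[str, object]:
    """Turn 'prefix/rest' column names into nested dicts, splitting on the
    first '/' only; names without '/' stay at the top level."""
    grouped: dict[str, object] = {}
    for name, value in row.items():
        if "/" in name:
            prefix, rest = name.split("/", 1)
            bucket = grouped.setdefault(prefix, {})
            if not isinstance(bucket, dict):
                raise ValueError(f"{prefix!r} is used as both a column and a prefix")
            bucket[rest] = value
        else:
            if isinstance(grouped.get(name), dict):
                raise ValueError(f"{name!r} is used as both a column and a prefix")
            grouped[name] = value
    return grouped


# A subset of the columns returned above.
raw_part = {
    "sku/FA1632": "manufacturer",
    "sku/B0002JMUWK": "asin",
    "name": "Motorcraft FA1632 Air Filter",
    "brand/name": "Motorcraft",
    "brand/manufacturer": "Motorcraft",
    "category/browsenodeid": "15727081,15727321",
}
part = group_columns(raw_part)
print(part["sku"])     # {'FA1632': 'manufacturer', 'B0002JMUWK': 'asin'}
print(part["brand"])   # {'name': 'Motorcraft', 'manufacturer': 'Motorcraft'}
```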


Page 15

Modeling Part Prices

cassandra> get FrugalMechanic.RawPartPrices['amazon/b0002jmuwk']

=> (column=ATVPDKIKX0DER, value={"site":"ATVPDKIKX0DER","price":"13.45","buyUrlVar1":"B0002JMUWK","buyUrlVar2":"ATVPDKIKX0DER","updatedOn":"2010-01-28T17:58:19.346Z"}, timestamp=1264701499346)

=> (column=AOMQHH38LHK76, value={"site":"AOMQHH38LHK76","price":"8.64","buyUrlVar1":"B0002JMUWK","buyUrlVar2":"AOMQHH38LHK76","updatedOn":"2010-01-28T17:58:19.346Z"}, timestamp=1264701499346)

=> (column=ADG953YR6NRBF, value={"site":"ADG953YR6NRBF","price":"16.5","buyUrlVar1":"B0002JMUWK","buyUrlVar2":"ADG953YR6NRBF","updatedOn":"2010-01-28T17:58:19.346Z"}, timestamp=1264701499346)

=> (column=A8F3HAQ1FDLH8, value={"site":"A8F3HAQ1FDLH8","price":"12.94","buyUrlVar1":"B0002JMUWK","buyUrlVar2":"A8F3HAQ1FDLH8","updatedOn":"2010-01-28T17:58:19.346Z"}, timestamp=1264701499346)

=> (column=A3TW1WCPSO49LP, value={"site":"A3TW1WCPSO49LP","price":"19.72","buyUrlVar1":"B0002JMUWK","buyUrlVar2":"A3TW1WCPSO49LP","updatedOn":"2010-01-28T17:58:19.346Z"}, timestamp=1264701499346)

=> (column=A3NMYM0J8WG63N, value={"price":"$18.14","buyUrlVar2":"A3NMYM0J8WG63N","buyUrlVar1":"B0002JMUWK"}, timestamp=1260473908931)

=> (column=A1DPIC5NQU31S0, value={"site":"A1DPIC5NQU31S0","price":"16.13","buyUrlVar1":"B0002JMUWK","buyUrlVar2":"A1DPIC5NQU31S0","updatedOn":"2010-01-28T17:58:19.346Z"}, timestamp=1264701499346)

=> (column=A1ATZ3MAARQNEF, value={"site":"A1ATZ3MAARQNEF","price":"16.0","buyUrlVar1":"B0002JMUWK","buyUrlVar2":"A1ATZ3MAARQNEF","updatedOn":"2010-01-28T17:58:19.346Z"}, timestamp=1264701499346)

Returned 8 results.

cassandra>
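In RawPartPrices each column name is a seller ID and each value is a small JSON document. The output also shows the format drifting: the `A3NMYM0J8WG63N` column has an older timestamp, a `$`-prefixed price and no `site` or `updatedOn` fields, so anything reading these rows has to tolerate both shapes. Below is a hypothetical sketch that picks the lowest price from such a row; the JSON is trimmed to the `price` field, and `Decimal` avoids binary floating-point rounding for money.

```python
import json
from decimal import Decimal


def parse_price(raw: str) -> Decimal:
    """Accept both "13.45" and the older "$18.14" style."""
    return Decimal(raw.strip().lstrip("$"))


def cheapest_offer(price_columns: dict[str, str]) -> tuple[str, Decimal]:
    """Return (seller column name, price) for the lowest price in one row.
    Raises ValueError if the row has no columns."""
    offers = {seller: parse_price(json.loads(blob)["price"])
              for seller, blob in price_columns.items()}
    seller = min(offers, key=lambda s: offers[s])
    return seller, offers[seller]


# Values from the output above, trimmed to the "price" field.
row = {
    "ATVPDKIKX0DER": '{"price":"13.45"}',
    "AOMQHH38LHK76": '{"price":"8.64"}',
    "ADG953YR6NRBF": '{"price":"16.5"}',
    "A3NMYM0J8WG63N": '{"price":"$18.14"}',
}
print(cheapest_offer(row))   # ('AOMQHH38LHK76', Decimal('8.64'))
```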

Page 16

Modeling Part Fitments

cassandra> get FrugalMechanic.RawPartFitments['amazon/b0002jmuwk']

=> (column={"year":"2009","make":"Ford","model":"F-150","engine":"5.4L V8","notes":"TYPE: 269 - HEIGHT: 7.81 - OUTSIDE: 4.26B - INSIDE: 6.10T"}, value=1, timestamp=1266339317864)

=> (column={"year":"2009","make":"Ford","model":"F-150","engine":"4.6L V8","notes":"TYPE: 269 - HEIGHT: 7.81 - OUTSIDE: 4.26B - INSIDE: 6.10T"}, value=1, timestamp=1266339317862)

...

=> (column={"year":"1997","make":"Ford","model":"E-250 Econoline","engine":"5.4L V8 CNG","notes":"GAS ENG"}, value=1, timestamp=1266339317860)

=> (column={"year":"1997","make":"Ford","model":"E-150 Econoline","engine":"5.4L V8","notes":"All"}, value=1, timestamp=1266339317860)

=> (column={"year":"1997","make":"Ford","model":"E-150 Econoline Club Wagon","engine":"5.4L V8","notes":"All"}, value=1, timestamp=1266339317860)

=> (column={"year":"1996-1999","make":"Ford","model":"All","engine":"4.6L V8 DOHC","notes":"All"}, value=1, timestamp=1266339317860)

Returned 200 results.

cassandra>
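RawPartFitments turns the usual layout around: the fitment itself (year, make, model, engine, notes) is a JSON document stored as the column name, and the value is just a `1` placeholder. Because column names are unique within a row, an identical fitment string can appear only once per part. A hypothetical sketch that decodes such a row and filters it by make and year, handling both single years and ranges such as `1996-1999`:

```python
import json


def parse_fitments(columns: dict[str, str]) -> list[dict[str, str]]:
    """Decode each column name (a JSON object); the values are ignored."""
    return [json.loads(name) for name in columns]


def fits(fitment: dict[str, str], make: str, year: int) -> bool:
    """Match on make and year; 'year' may be 'YYYY' or 'YYYY-YYYY'."""
    if fitment["make"] != make:
        return False
    start, _, end = fitment["year"].partition("-")
    return int(start) <= year <= int(end or start)


# Three of the 200 columns shown above.
row = {
    '{"year":"2009","make":"Ford","model":"F-150","engine":"5.4L V8",'
    '"notes":"TYPE: 269 - HEIGHT: 7.81 - OUTSIDE: 4.26B - INSIDE: 6.10T"}': "1",
    '{"year":"1997","make":"Ford","model":"E-150 Econoline","engine":"5.4L V8","notes":"All"}': "1",
    '{"year":"1996-1999","make":"Ford","model":"All","engine":"4.6L V8 DOHC","notes":"All"}': "1",
}
matches = [f for f in parse_fitments(row) if fits(f, "Ford", 1998)]
print([(f["model"], f["engine"]) for f in matches])   # [('All', '4.6L V8 DOHC')]
```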


Page 17

Great Resources

• NoSQL West Intro: http://cloudera-todd.s3.amazonaws.com/nosql.pdf (Video: http://www.vimeo.com/5145059)

• Cassandra Talk (Rackspace): Video + PPT: http://www.parleys.com/#st=5&id=1866

• Cassandra Talk (Facebook): PPT: http://static.last.fm/johan/nosql-20090611/cassandra_nosql.ppt (Video: http://vimeo.com/5185526)

• Cassandra Talk (Digg): http://nosql.mypopescu.com/post/334198583/presentation-cassandra-in-production-digg-arin

• WTF is a Super Column: http://arin.s3.amazonaws.com/pub/docs/WTF-is-a-SuperColumn.pdf

• Get Up and Running w/Cassandra: http://blog.evanweaver.com/articles/2009/07/06/up-and-running-with-cassandra/


Page 18

Questions?
