HBaseCon 2012 | Content Addressable Storages for Fun and Profit - Berk Demir, StumbleUpon

Content Addressable Storages for fun and profit.

Berk D. Demir, @bddemir

I break and fix things at @StumbleUpon.

Problem.

Serve lots of static assets with low latency and high availability.

Understanding data.

A lot. Small. Frequent.

100 million. 19 kilobytes.

Updates.

BLOBs don’t change. They get replaced. We want to keep all without duplicating.

Content Addressable Store

Store immutable content and authoritatively address it with a cryptographic hash.
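A minimal sketch of the idea, assuming an in-memory dict in place of a real backing store and MD5 as the hash. The `ContentStore` class and its method names are hypothetical, made up for illustration, and are not from the talk.

```python
import hashlib


class ContentStore:
    """Toy in-memory content addressable store (illustrative only)."""

    def __init__(self):
        self._blobs = {}  # digest (bytes) -> content (bytes)

    def put(self, content: bytes) -> bytes:
        # The address is derived from the content itself.
        key = hashlib.md5(content).digest()
        # Identical content maps to the same key, so it is stored only once.
        self._blobs.setdefault(key, content)
        return key

    def get(self, key: bytes) -> bytes:
        return self._blobs[key]

    def __len__(self):
        return len(self._blobs)


store = ContentStore()
k1 = store.put(b"body { color: red; }")
k2 = store.put(b"body { color: red; }")   # duplicate upload, same address
k3 = store.put(b"body { color: blue; }")  # "replaced" asset, new address
```

Nothing is ever overwritten here: a replaced asset simply lands under a different key, and the old version stays reachable under its own.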

We had ideas.

Very bad ideas.

Very bad. Shared Storage, i.e., NFS.

Bad ideas.

Bad. AWS S3, RS Cloud Files, ...

Distributed: AFS, Gluster, HDFS (Oh my!)

Bad ideas. Take 2

o_O

Write a distributed, fault tolerant, replicating, multi datacenter, fast CAS for BLOBs.

Reimplementing a lot of things is generally not a good sign.

Reuse. Don’t reimplement.

HBase

Distributed, fault tolerant, replicating, multi datacenter, fast.

Immutable rows with compact keys, separated into different column families based on their access patterns.

Row key: MD5, 16 bytes (SHA-1, 20 bytes)

m: Metadata, 9 bytes

d: BLOB, many bytes

One table to rule them all.

MAX_FILESIZE => 20G, VERSION => 1,

BLOCKCACHE => true, BLOOMFILTER => ROW

Pre-split into 512 regions at table creation time.
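Because the row keys are hash digests, they should spread roughly uniformly over the key space, so evenly spaced split points are a natural choice. The sketch below assumes that, and assumes a 2-byte prefix is enough to express the boundaries; `split_keys` and `region_index` are hypothetical helpers, not the talk's actual tooling.

```python
import bisect


def split_keys(num_regions=512, prefix_bytes=2):
    """Return num_regions - 1 evenly spaced split keys as byte prefixes."""
    space = 256 ** prefix_bytes
    if not 2 <= num_regions <= space:
        raise ValueError("num_regions must fit in the chosen prefix space")
    return [
        ((i * space) // num_regions).to_bytes(prefix_bytes, "big")
        for i in range(1, num_regions)
    ]


def region_index(row_key, splits):
    """Index of the region [start, end) that would hold row_key."""
    # Each split key is the inclusive start of the next region, so count
    # how many split keys are less than or equal to the row key.
    return bisect.bisect_right(splits, row_key)


splits = split_keys()
first_region = region_index(bytes(16), splits)       # all-zero key
last_region = region_index(b"\xff" * 16, splits)     # all-ones key
```

With 512 regions over a 2-byte prefix, consecutive boundaries are 128 apart, starting at `00 80`.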

Scala, Finagle, asynchbase, Varnish

HTTP has a lot to offer.

Verbs: GET HEAD PUT DELETE

GET /KwIEec5utYGrKmzXYLgFzg HTTP/1.1
Host: b9.sustatic.com
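The 22-character path in the request above is consistent with a 16-byte MD5 digest encoded as base64 with the padding stripped. A sketch of the mapping between URL path and row key, assuming the URL-safe base64 alphabet (the sample path alone cannot confirm which alphabet is used); `key_to_path` and `path_to_key` are hypothetical names.

```python
import base64
import hashlib


def key_to_path(digest: bytes) -> str:
    """Encode a binary row key as an unpadded URL path segment."""
    token = base64.urlsafe_b64encode(digest).rstrip(b"=")
    return "/" + token.decode("ascii")


def path_to_key(path: str) -> bytes:
    """Decode a URL path back into the binary row key."""
    token = path.lstrip("/")
    # Restore the padding that was stripped for the URL.
    padded = token + "=" * (-len(token) % 4)
    return base64.urlsafe_b64decode(padded)


row_key = path_to_key("/KwIEec5utYGrKmzXYLgFzg")
digest = hashlib.md5(b"example asset").digest()
round_trip = path_to_key(key_to_path(digest))
```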

Headers:
Cache-Control: max-age=<1 year>
Last-Modified: <cell timestamp>
Content-MD5: <row key: base64>
Content-Disposition: attachment; filename=su.xpi
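One way the headers above could be derived from a stored row, as a sketch. It assumes the cell timestamp is in milliseconds since the epoch (HBase's default) and takes "1 year" to mean 365 days; the `response_headers` function and its `filename` parameter are hypothetical, since the deck does not say where the filename is kept.

```python
import base64
from email.utils import formatdate

ONE_YEAR_SECONDS = 365 * 24 * 60 * 60


def response_headers(row_key, cell_timestamp_ms, filename=None):
    """Build HTTP response headers for an immutable, content-addressed BLOB."""
    headers = {
        # Content never changes under a given key, so cache it for a long time.
        "Cache-Control": "max-age=%d" % ONE_YEAR_SECONDS,
        # Convert milliseconds to seconds and format as an HTTP date in GMT.
        "Last-Modified": formatdate(cell_timestamp_ms / 1000, usegmt=True),
        # The row key is the MD5 digest, so it doubles as the checksum.
        "Content-MD5": base64.b64encode(row_key).decode("ascii"),
    }
    if filename is not None:
        headers["Content-Disposition"] = "attachment; filename=%s" % filename
    return headers


example = response_headers(bytes(16), 0, filename="su.xpi")
```

Every header here comes from data the row already holds, so no extra lookup is needed to answer a GET or HEAD.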

HBase and HTTP are the perfect tools to build simple, reliable, fast data services.

Get excited and build things!

Thanks.

Like the design of this slide deck?

Direct your positive feedback to Coda Hale (@coda)