
The Art of Big Data


DESCRIPTION

Slides for my talk at the Naval Postgraduate School PhD Seminar


Page 1: The Art of Big Data

Krishna Sankar, http://doubleclix.wordpress.com

EC4000 – PhD Guest Seminar, Naval Postgraduate School

Nov 29, 2011

The road lies plain before me;--'tis a theme

Single and of determined bounds; …

- Wordsworth, The Prelude

Page 2: The Art of Big Data

What is Big Data ?

Big Data to smart data

Big Data Pipeline

Analytic Algorithms

Storage - NOSQL

Processing - Hadoop …

Analytics/Modeling

R

Visualization

o  Agenda
o  To cover the broad picture of the Big Data domain …
o  Understand the waypoints & drill down into one area (NOSQL)
o  Can do others later

Page 3: The Art of Big Data

Thanks to … the giants whose shoulders I am standing on

Special thanks to: Peter Ateshian, NPS; Prof Murali Tummala, NPS; Shirley Bailes, O'Reilly; Ed Dumbill, O'Reilly; Jeff Barr, AWS; Jenny Kohr Chynoweth, AWS

Page 4: The Art of Big Data

When I think of my own native land, In a moment I seem to be there;

But, alas! recollection at hand Soon hurries me back to despair.

- Cowper, The Solitude of Alexander Selkirk

Page 5: The Art of Big Data

What is Big Data ? “Big data” is data that becomes large enough that it cannot be processed using conventional methods. @twitter

Ref: http://radar.oreilly.com/2010/09/the-smaq-stack-for-big-data.html

"Big data" is less about size, more about flow & velocity - persisting petabytes per year is easier than processing terabytes per hour. @twitter
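To put rough numbers on the flow/velocity point, here is a quick back-of-the-envelope calculation in plain Python (the 1 PB and 1 TB figures are simply the round numbers from the quote, not measurements):

```python
# Rough throughput comparison: persisting 1 PB/year vs. processing 1 TB/hour.
PB = 10**15          # bytes
TB = 10**12          # bytes

persist_rate = PB / (365 * 24 * 3600)   # bytes/second to land 1 PB over a year
process_rate = TB / 3600                # bytes/second to chew through 1 TB in an hour

print(f"1 PB/year ~ {persist_rate / 1e6:.0f} MB/s sustained ingest")
print(f"1 TB/hour ~ {process_rate / 1e6:.0f} MB/s sustained processing")
# ~32 MB/s vs. ~278 MB/s: the "hourly" workload needs roughly 9x the bandwidth,
# and processing usually involves far more work than simply writing bytes to disk.
```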

Page 6: The Art of Big Data

What is Big Data ?

Ref: http://www.ciol.com/News/News/News-Reports/Vinod-Khosla%E2%80%99s-cool-dozen-tech-innovations/156307/0/
http://yourstory.in/2011/11/vinod-khoslas-keynote-at-nasscom-product-conclave-reject-punditry-believe-in-an-idea-take-risk-and-succeed/

Vinod Khosla's Cool Dozen !
①  Consumers : "Widespread innovation in technologies that reduce data overload for users" ~ Data Reduction
②  Businesses : "Simple solutions to handle the deluge of data generated from various sources …" ~ Big Data Analytics
TV 2.0, Education, Social NEXT, Tools for sharing interest, Publishing, …

Page 7: The Art of Big Data

①  Volume
    o  Scale
②  Velocity
    o  Data change rate vs. decision window
③  Variety
    o  Different sources & formats
    o  Structured vs. unstructured
④  Variability
    o  Breadth of interpretation & depth of analytics
⑤  Contextual
    o  Dynamic variability
    o  Recommendation
⑥  Connectedness

EBC322

http://doubleclix.wordpress.com/2011/09/13/when-is-big-data-really-big-data/
http://www.hpts.ws/posters/Poster2011_13_Bulkowski.pdf


Page 12: The Art of Big Data

I.  Two main types – based on collection
  i.  Big Data Streams
    o  Data in "motion"
    o  Twitter fire hose, Facebook, G+
  ii.  Big Data Logs
    o  Data "at rest"
    o  Logs, DW, external market data, POS, …

II.  Typically, Big Data has a non-deterministic angle as well …
  o  Creative discovery
  o  Iterative, model-based analytics
  o  Explore questions to ask

III.  Smart Data = Big Data + context + embedded/interactive (inference, reasoning) models
  o  Model driven
  o  Declaratively interactive

http://www.slideshare.net/leonsp/hadoop-slides-11-what-is-big-data
http://www.slideshare.net/Dataversity/wed-1550-bacvanskivladimircolor

Page 13: The Art of Big Data

Twitter
§  200 million tweets/day
§  Peak 10,000/second
§  How would you handle the fire hose for social network analytics ?

http://goo.gl/dcBsQ

Storage
§  4U box = 40 TB
§  1 PB = 25 boxes !

Zynga
§  "Analytics company, not a gaming company!"
§  Harvests data : 15 TB/day
§  Test new features
§  Target advertising
§  230 million players/month

AWS – 600 billion objects!

Page 14: The Art of Big Data

•  6 billion messages per day
•  2 PB (w/ compression) online
•  6 PB w/ replication
•  250 TB/month growth
•  HBase infrastructure

Page 15: The Art of Big Data

Ref: http://www.hpts.ws/sessions/2011HPTS-TomFastner.pdf

Path analysis, A/B testing

50 TB/day; 240 nodes, 84 PB Teradata installation

Very systematic – the diagram speaks volumes!

Page 16: The Art of Big Data

•  "… they didn't need a genius, … but build the world's most impressive dilettante … battling the efficient human mind with spectacular flamboyant inefficiency" – Final Jeopardy by Stephen Baker

•  15 TB memory, across 90 IBM Power 750 servers, in 10 racks
•  1 TB of dataset
•  200 million pages processed by Hadoop
•  This is a good example of connected data
   –  Contextual w/ variability
   –  Breadth of interpretation
   –  Analytics depth

http://doubleclix.wordpress.com/2011/03/01/the-education-of-a-machine-%E2%80%93-review-of-book-%E2%80%9Cfinal-jeopardy%E2%80%9D-by-stephen-baker/
http://doubleclix.wordpress.com/2011/02/17/watson-at-jeopardy-a-race-of-machines/

Page 17: The Art of Big Data

[Diagram: the Big Data landscape – Storage (NOSQL, Object Store, Block Store), Parallelism (HPC, Map/Reduce), Analytics (Web Analytics, Log Analytics, Social Media, Social Graph, Knowledge Graph), Inference (Recommendation/Inference Engines, Machine Learning, Classification, Clustering, Search, Indexing, Mahout), Distributed and Warehouse-style Applications, Cloud, Architecture]

Big Data

Page 18: The Art of Big Data

Big  Data  to  Smart  Data

“A towel is about the most massively useful thing an interstellar hitchhiker can have … any man who can hitch the length and breadth of the Galaxy, rough it … win through, and still know where his towel is, is clearly a man to be reckoned with.”

- From The Hitchhiker's Guide to the Galaxy, by Douglas Adams. Published by Harmony Books in 1979

Page 19: The Art of Big Data

Big data to smart data • Summary

1.  Don't throw away any data !
2.  Be ready for different ways of organizing the data

http://goo.gl/fGw7r

Page 20: The Art of Big Data

Big  Data  Pipeline

If a problem has no solution, it is not a problem, but a fact, not to be solved but to be coped with, over time …

- Peres’s Law

Page 21: The Art of Big Data

Big Data Pipeline •  Stages
   o  Collect
   o  Store
   o  Transform & Analyze
   o  Model & Reason
   o  Predict, Recommend & Visualize

•  Different systems have different characteristics
   o  Infrastructure optimization based on application/hardware attribute correlation (short term)
      •  Hadoop, Splunk, internal dashboard
   o  Application performance trends (medium term)
      •  Analytics, modeling, …
   o  Product metrics
      •  Feature set vs. usage, what is important to users, stratification
      •  Modeling using R, visualization layers like Tableau
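A minimal sketch of the stages as ordinary function composition, just to make the Collect → Store → Transform/Analyze → Model flow concrete; every name here is hypothetical and not tied to any particular tool:

```python
# Hypothetical end-to-end pipeline; each stage is a plain function.

def collect(sources):
    """Pull raw events from logs, feeds, POS systems, etc."""
    return [event for source in sources for event in source]

def store(events, sink):
    """Persist raw events (stand-in for HDFS, a NOSQL store, flat files, ...)."""
    sink.extend(events)
    return sink

def transform(events):
    """Clean and aggregate, e.g. count events per user."""
    counts = {}
    for e in events:
        counts[e["user"]] = counts.get(e["user"], 0) + 1
    return counts

def model(counts):
    """Trivial 'model': flag heavy users (real modeling would happen in R/Mahout)."""
    return {user: ("heavy" if n > 2 else "light") for user, n in counts.items()}

raw = collect([[{"user": "a"}, {"user": "b"}], [{"user": "a"}, {"user": "a"}]])
archive = store(raw, [])
print(model(transform(archive)))   # {'a': 'heavy', 'b': 'light'}
```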

Page 22: The Art of Big Data

[Diagram: the Big Data pipeline maps the data characteristics – Volume, Velocity, Variety, Variability, Connectedness, Context, Model, Infer-ability – onto the stage verbs: Decomplexify! Contextualize! Network! Reason! Infer!]

•  Collect : Logs, Scribe, Flume, Hadoop, …
•  Store : SQL, NOSQL, HDFS, XML, files, …
•  Transform & Analyze : SQL, BI tools, Hadoop, Pig, Hive, .NET Dryad, various other tools
•  Model & Reason : Hand-coded programs, R, Mahout, …
•  Predict, Recommend & Visualize : Internal dashboards, Tableau

Ref: http://goo.gl/Mm83k

Page 23: The Art of Big Data

The  NOSQL  !

I AM monarch of all I survey; My right there is none to dispute;

From the centre all round to the sea I am lord of the fowl and the brute

- Cowper, The Solitude of Alexander Selkirk

Build to Fail - “It is working” is not binary

Page 24: The Art of Big Data

Agenda
•  Opening Gambit
   –  NOSQL : Toil, Tears & Sweat !
•  The Pragmas
   –  ABCs of NOSQL [ACID, BASE & CAP]
•  The Mechanics
   –  Algorithmics & Mechanisms (for reference)

Referenced links @ http://doubleclix.wordpress.com/2010/06/20/nosql-talk-references/

Page 25: The Art of Big Data

What is NOSQL Anyway ?

•  NOSQL != NoSQL or NOSQL != (!SQL)
•  NOSQL = Not Only SQL
•  Can be traced back to Eric Evans[2]!
   –  You can ask him during the afternoon session!
•  Unfortunate name, but it has stuck now
•  "Non-Relational" could have been better
•  Usually operational, definitely distributed
•  NOSQL has certain semantics – it need not stay that way

Page 26: The Art of Big Data

[Diagram: the NOSQL landscape by data model – Ref: [22,51,52]]

•  Key-Value : In-memory – Memcached, Redis; Disk-based – SimpleDB, Tokyo Cabinet, Dynamo, Voldemort, Azure TS
•  Column : Google BigTable, HBase, Cassandra, HyperTable
•  Document : CouchDB, MongoDB, Lotus Domino, Riak
•  Graph : Neo4j, FlockDB, InfiniteGraph

Page 27: The Art of Big Data

WHAT WORKS NOSQL Tales from the field

When I think of my own native land, In a moment I seem to be there;

But, alas! recollection at hand Soon hurries me back to despair.

- Cowper, The Solitude of Alexander Selkirk

Page 28: The Art of Big Data

•  Designer – augmenting an RDBMS with a distributed key-value store [40 : a good talk by Geir]
•  Invitation-only designer brand sales
•  Limited inventory sales – start at 12:00, members have 10 min to grab them. 500K mails every day
•  Keeps brand value, hidden from search
•  Interesting load properties
•  Each item is a row in the DB – BUY NOW reserves it
   –  Can't order more
•  Started out as a Rails app
   –  shared nothing
•  Narrow peaks – half of revenue

Page 29: The Art of Big Data

Christian Louboutin Effect

•  ½ amz for Louboutin
•  Use Voldemort
•  Inventory, shopping cart, checkout
•  Partition by prod ID
•  Shared infrastructure – "fog" not "cloud" – Joyent!
•  In-memory inventory
•  Not afraid of a sale anymore!

And SQL DBs are still relevant !

Page 30: The Art of Big Data

Typical NOSQL Example: Bit.ly
•  Bit.ly, URL shortening service, uses MongoDB
•  User, title, URL, hash, labels [I-5], sort by time
•  Scale – ~50M users, ~10K concurrent, ~1.25B shortens per month
•  Criteria:
   –  Simple, zippy FAST, very flexible, reasonable durability, low cost of ownership
•  Sharded by userid (see the sketch below)
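A minimal sketch of what "sharded by userid" means in practice; this is purely illustrative (a hypothetical shard count and hash choice), not bit.ly's or MongoDB's actual sharding code:

```python
import hashlib

NUM_SHARDS = 16  # hypothetical shard count

def shard_for(user_id: str) -> int:
    """Map a user id to a shard deterministically, so all of one user's
    shortens land on the same shard and can be queried together."""
    digest = hashlib.md5(user_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_SHARDS

print(shard_for("user42"))   # the same user id always maps to the same shard
```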

Page 31: The Art of Big Data

•  New kind of "dictionary" – a word repository, GPS for English – context, pronunciations, twitter … developer API
•  Characteristics [I-6, Tony Tam's presentation]
   –  RO-centric, 10,000 reads for every write
   –  Hit a wall with MySQL (4B rows)
   –  MongoDB reads were so good that a memcached layer was not required
   –  MongoDB used 4 times the MySQL storage
•  Another example:
   –  Voldemort – Unified Communications, IP-phone data stored keyed off of phone number. Data relatively stable

Page 32: The Art of Big Data

Large Hadron Collider @ CERN
•  DAS is part of a giant data management enterprise (CMS)
   –  Polyglot persistence (SQL + NOSQL: MongoDB, CouchDB, memcached, HDFS, Lustre, Oracle, MySQL, …)
•  Data Aggregation System [I-1,I-2,I-3,I-4]
   –  Uses MongoDB
   –  Distributed model, 2-6 PB data
   –  Combines info from different metadata sources, queried without knowing of their existence – the user has domain knowledge but shouldn't have to deal with the various formats, interfaces and query semantics
   –  DAS aggregates, caches and presents data as JSON documents – preserving security & integrity

And SQL DBs are still relevant !

Page 33: The Art of Big Data

Scaling Twitter

Page 34: The Art of Big Data

•  Digg
   –  RDBMS places more burden on reads than writes [I-8]
   –  Looked at NOSQL, selected Cassandra
      •  Column-oriented, so more structure than key-value

•  Heard from NoSQL Boston [http://twitter.com/#search?q=%23nosqllive]
   –  Baidu: 120-node HyperTable cluster managing 600 TB of data
   –  StumbleUpon uses HBase for analytics
   –  Twitter's current Cassandra cluster: 45 nodes

Page 35: The Art of Big Data

•  Adobe is an HBase shop [I-10,I-11,2]
•  Adobe SaaS infrastructure – tagging, content aggregation, search, storage and so forth
•  Dynamic schema & huge number of records [I-5]
•  40 million records in 2008 to 1 billion, with 50 ms response
•  NOSQL was not mature in 2008, now good enough
•  Prod analytics: 40 nodes; the largest cluster has 100 nodes

•  BBC is a CouchDB shop [I-13]
•  Sweet spot:
   •  Multi-master, multi-datacenter replication
   •  Interactive mediums
   •  Old data to CouchDB
   •  Thus free up the DB to do work!

Page 36: The Art of Big Data

•  Cloudkick is a Cassandra shop [I-12]
•  Cloudkick offers cloud management services
•  Stores metrics data
•  Linear scalability for write load
•  Massive write performance
   •  Memory table & serial commit log
•  Low operational costs
•  Data structure
   –  Metrics, rolled-up data, statuses at time slice : all indexed by timestamp

Page 37: The Art of Big Data

•  Guardian/UK
   –  Runs on Redis [I-14] !
   –  "Long-term The Guardian is looking towards the adoption of a schema-free database to sit alongside its Oracle database and is investigating CouchDB. … the relational database is now just a component in the overall data management story, alongside data caching, data stores, search engines etc."
   –  NOSQL can increase the performance of relational data by offloading specific data and tasks

And SQL DBs are still relevant ! "The evil that SQL DBs do lives after them; the good is oft interred with their bones …"

Page 38: The Art of Big Data

NOSQL at Netflix
•  Netflix is fully in the cloud
•  Uses NOSQL across the globe
•  Customer profiles, watch log, usage logging (see next slide)
   –  No multi-record locking
•  No DBA !
•  Easier schema changes
•  Less complex, highly available data store
•  Joins happen in the applications

http://www.hpts.ws/sessions/nosql-ecosystem.pdf
http://www.hpts.ws/sessions/GlobalNetflixHPTS.pdf

Page 39: The Art of Big Data
Page 40: The Art of Big Data

21 NOSQL Themes
•  Web scale
•  Scale incrementally / continuous growth
•  Oddly shaped & exponentially connected
•  Structure data as it will be used – i.e. read, query
•  Know your queries/updates in advance [96], but you can change them later
•  Compute attributes at run time
•  Create a few large entities with optional parts (see the sketch after this list)
   –  Normalization creates many small entities
•  Define schemas in models (not in databases)
•  Avoid impedance mismatch
•  Narrow down & solve your core problem
•  Solve the right problem with the right tool

Ref: [I-8]
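A small illustration of "a few large entities with optional parts" versus many normalized small entities, using hypothetical order data (illustrative only):

```python
# Normalized: many small entities, joined together at read time.
customers = {"c1": {"name": "Ada"}}
orders    = {"o1": {"customer_id": "c1", "total": 42.0}}
items     = [{"order_id": "o1", "sku": "towel", "qty": 1}]

# Denormalized, NOSQL-style: one large document shaped the way it will be read.
order_doc = {
    "_id": "o1",
    "customer": {"id": "c1", "name": "Ada"},   # embedded; optional parts are fine
    "total": 42.0,
    "items": [{"sku": "towel", "qty": 1}],
}
# One key lookup returns everything the "order" page needs - no join -
# at the cost of duplicating the customer name across order documents.
print(order_doc["customer"]["name"], len(order_doc["items"]))
```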

Page 41: The Art of Big Data

21 NOSQL Themes
•  Existing solutions are clunky [1] (in certain situations)
•  Scale automatically – "becoming prohibitively costly (in terms of manpower) to operate" – Twitter [I-9]
•  Distribution & partitioning are built into NOSQL
   •  RDBMS distribution & sharding is not fun and is expensive
      –  Lose most functionality along the way
•  Data at the center, flexible schema, fewer joins
•  The value of NOSQL is in flexibility as much as it is in "Big Data"

Page 42: The Art of Big Data

21 NOSQL Themes
•  Requirements [3]
   –  Data will not fit in one node
      •  And so the system needs to partition/distribute the data
   –  Nodes will fail, but data needs to be safe – replication!
   –  Low latency for real-time use
•  Data locality
   –  Row-based structures need to read the whole row, even for a single column
   –  Column-based structures need to scan for each row
   •  Solution : column storage with locality (see the layout sketch below)
      –  Keep data that is read together; don't read what you don't care about
      •  For example friends – other data

Ref: 3
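A toy illustration of the locality point, assuming a hypothetical table of users where one common query needs only the "friends" column:

```python
# Row-oriented layout: each record is stored contiguously, so reading one
# column still drags the whole row (bio, photos, ...) off disk.
rows = [
    {"user": "a", "friends": ["b", "c"], "bio": "long text ...", "photos": ["p1", "p2"]},
    {"user": "b", "friends": ["a"],      "bio": "long text ...", "photos": ["p3"]},
]

# Column-style layout with locality: the values that are read together are
# stored together, so a friends scan touches only this structure.
friends_column = {row["user"]: row["friends"] for row in rows}
print(friends_column)   # {'a': ['b', 'c'], 'b': ['a']}
```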

Page 43: The Art of Big Data

ABCs of NOSQL - ACID, BASE & CAP

The woods are lovely, dark, and deep, But I have promises to keep,

And miles to go before I sleep, And miles to go before I sleep.

-Frost

Page 44: The Art of Big Data

CAP Principle

[Diagram: triangle with vertices Consistency, Availability, Partition]

"CAP Principle → Strong Consistency, High Availability, Partition-resilience: pick at most 2" [37]

Which feature to discard depends on the nature of your system [41]

Page 45: The Art of Big Data

CAP Principle (same triangle as above)

C-A, No P → Single DB server, no network partition

Page 46: The Art of Big Data

CAP Principle (same triangle as above)

C-P, No A → Block transactions in case of partition failure

Page 47: The Art of Big Data

CAP Principle (same triangle as above)

A-P, No C → Expiration-based caching, voting majority

Interesting (& controversial) from the NOSQL perspective

Page 48: The Art of Big Data

ABCs of NOSQL
•  ACID
   o  Atomicity, Consistency, Isolation & Durability – fundamental properties of a SQL DBMS
•  BASE [35,39]
   o  Basically Available, Soft state (Scalable), Eventually consistent
•  CAP [36,39]
   o  Consistency, Availability & Partitioning
   o  This C is ~A+C
      •  i.e. Atomic Consistency [36]

Page 49: The Art of Big Data

ACID
•  Atomicity
   o  All or nothing
•  Consistent
   o  From one consistent state to another
      •  e.g. referential integrity
   o  But it is also application dependent
      •  e.g. minimum account balance
      •  Predicates, invariants, …
•  Isolation
•  Durability

Page 50: The Art of Big Data

CAP Pragmas
•  Preconditions
   o  The domain is scalable web apps
   o  Low latency for real-time use
   o  A small subset of SQL functionality
   o  Horizontal scaling
•  Pritchett [35] talks about relaxing consistency across functional groups rather than within functional groups
•  Idempotency to consider (see the sketch below)
   o  Updates (inc/dec) are rarely idempotent
   o  Order-preserving transactions are not idempotent either
   o  MVCC is an answer for this (CouchDB)
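The idempotency point in a nutshell, with a toy counter standing in for any replicated value (illustrative Python, not any particular store's API):

```python
balance = {"acct1": 100}

def increment(key, amount):
    """Not idempotent: replaying the same message changes the result."""
    balance[key] += amount

def set_value(key, value):
    """Idempotent: applying it once or five times leaves the same state."""
    balance[key] = value

increment("acct1", 10); increment("acct1", 10)    # duplicate delivery -> 120, should be 110
set_value("acct1", 110); set_value("acct1", 110)  # duplicate delivery -> still 110
print(balance)  # {'acct1': 110}
```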

Page 51: The Art of Big Data

Consistency
•  Strict consistency
   o  Any read on data X will return the most recent write on X [42]
•  Sequential consistency
   o  Maintains the sequential order seen by multiple processes (no mention of time)
•  Linearizability
   o  Add timestamps from loosely synchronized processes

Page 52: The Art of Big Data

Consistency
•  Write availability, not read availability [44]
•  Even load distribution is easier in eventually consistent systems
•  Multi-data-center support is easier in eventually consistent systems
•  Some problems are not solvable with eventually consistent systems
•  Code is sometimes simpler to write in strongly consistent systems

Page 53: The Art of Big Data

CAP Essentials – 1 of 3
•  "CAP Principle → Strong Consistency, High Availability, Partition-resilience: pick at most 2" [37]
   o  C-A, No P → Single DB server, no network partition
   o  C-P, No A → Block transactions in case of partition failure
   o  A-P, No C → Expiration-based caching, voting majority
•  Which feature to discard depends on the nature of your system [41]

Page 54: The Art of Big Data

CAP Essentials – 2 of 3
•  Yield vs. Harvest [37]
   o  Yield → probability of completing a request
   o  Harvest → fraction of data reflected in the response
•  Some systems tolerate < 100% harvest (e.g. search, i.e. approximate answers are OK); others need 100% harvest (e.g. transactions, i.e. correct behavior = a single well-defined response)
•  For sub-systems that tolerate harvest degradation, CAP makes sense

Page 56: The Art of Big Data

CAP Essentials – 3 of 3
•  Trading harvest for yield – AP
•  Application decomposition: use NOSQL in the appropriate sub-systems that have state management and data semantics matching the operational features & impedance
   o  Hence Not Only SQL, not No SQL
   o  Intelligent homing to tolerate partition failures [44]
   o  Multiple zones in a region (150 miles - 5 ms)
   o  Twitter tweets in Cassandra and MySQL
   o  BBC using MongoDB for offloading the DBMS
   o  Polyglot persistence at the LHC @ CERN

Most important point in the whole presentation

Page 57: The Art of Big Data

Eventual Consistency & AMZ
•  Distribution transparency [38]
•  In larger distributed systems, network partitions are a given
•  Consistency models
   o  Strong
   o  Weak
      •  Has an inconsistency window between an update and a guaranteed view
   o  Eventual
      •  If there are no new updates, all will see the value, eventually

Page 58: The Art of Big Data

Eventual Consistency & AMZ
•  Guarantee variations [38]
   o  Read-your-writes
   o  Session consistency
   o  Monotonic read consistency
      •  Access will not return a previous value
   o  Monotonic write consistency
      •  Serialize writes by the same process
•  Guarantee order (vector clocks, MVCC – see the sketch below)
   o  Example : Amz cart merger (let cart adds succeed even with partial failure)
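A minimal vector-clock sketch showing how order can be reconstructed when replicas accept writes independently; this is an illustrative toy, not Dynamo's implementation:

```python
def increment(clock, node):
    """Bump this node's entry before it accepts a write."""
    clock = dict(clock)
    clock[node] = clock.get(node, 0) + 1
    return clock

def descends(a, b):
    """True if clock a has seen everything clock b has (a >= b component-wise)."""
    return all(a.get(n, 0) >= c for n, c in b.items())

def conflict(a, b):
    """Neither clock descends from the other: concurrent writes that need
    reconciliation (e.g. the cart-merge example above)."""
    return not descends(a, b) and not descends(b, a)

v1 = increment({}, "nodeA")      # {'nodeA': 1}
v2 = increment(v1, "nodeB")      # descends from v1
v3 = increment(v1, "nodeC")      # concurrent with v2
print(descends(v2, v1), conflict(v2, v3))   # True True
```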

Page 59: The Art of Big Data

Eventual Consistency & AMZ - SimpleDB
•  SimpleDB strong consistency semantics [49,50]
   o  Until Feb 2010, SimpleDB only supported eventual consistency, i.e. GetAttributes after PutAttributes might not reflect the write for some time (~1 second)
   o  On Feb 24, AWS added a ConsistentRead=True attribute for reads
   o  Such a read will reflect all writes that got a 200 OK up to that time!

Page 60: The Art of Big Data

Eventual Consistency & AMZ - SimpleDB
•  SimpleDB strong consistency semantics [49,50]
   o  Also added conditional put/delete
   o  Put succeeds only if an attribute has a specified value (Expected.1.Value=) or existence state (Expected.1.Exists = true/false)
   o  The same conditional-check capability exists for delete as well
   o  Only on one attribute !
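For illustration, the request parameters involved look roughly like the following, shown as plain dicts rather than a specific SDK call (domain and item names are hypothetical; authentication parameters are omitted):

```python
# Eventually consistent read (the default) vs. consistent read.
get_attributes_request = {
    "Action": "GetAttributes",
    "DomainName": "mydomain",       # hypothetical domain/item names
    "ItemName": "item1",
    "ConsistentRead": "true",       # added Feb 2010: reflects all acknowledged writes
}

# Conditional put: succeeds only if the expected attribute currently has the given value.
put_attributes_request = {
    "Action": "PutAttributes",
    "DomainName": "mydomain",
    "ItemName": "item1",
    "Attribute.1.Name": "state",
    "Attribute.1.Value": "reserved",
    "Expected.1.Name": "state",     # the single attribute the condition may test
    "Expected.1.Value": "available",
}
print(get_attributes_request["ConsistentRead"], put_attributes_request["Expected.1.Value"])
```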

Page 61: The Art of Big Data

Eventual Consistency & AMZ – S3
•  S3 is an eventual consistency system
   o  Versioning
   o  "S3 PUT & COPY synchronously store data across multiple facilities before returning SUCCESS"
   o  Repairs lost redundancy, repairs bit-rot
   o  Reduced Redundancy option for data that can be reproduced (99.999999999% vs. 99.99%)
      •  Approx 1/3rd less
   o  CloudFront for caching

Page 62: The Art of Big Data

!SQL ?
•  "We conclude that the current RDBMS code lines, while attempting to be a "one size fits all" solution, in fact excel at nothing. Hence, they are 25-year-old legacy code lines that should be retired in favor of a collection of "from scratch" specialized engines." [43]
•  "Current systems were built in an era where resources were incredibly expensive, and every computing system was watched over by a collection of wizards in white lab coats, responsible for the care, feeding, tuning and optimization of the system. In that era, computers were expensive and people were cheap."
•  "The 1970 - 1985 period was a time of intense debate, a myriad of ideas, & considerable upheaval. We predict the next fifteen years will have the same feel."

Page 63: The Art of Big Data

Further deliberation
•  Daniel Abadi [45], Mike Stonebraker [46], James Hamilton [47] and Pat Helland [48] are all good reads for further deliberation

Page 64: The Art of Big Data

NOSQL Internals & Algorithmics

Page 65: The Art of Big Data

Caveats
•  A representative subset of the mechanics and mechanisms used in the NOSQL world
•  These are being refined & newer ones are being tried
•  Presented at a system level – to show how the techniques play a part in delivering a capability
•  The NOSQL papers and other references are there for further deliberation
•  Even if we don't cover everything fully, that's OK. I want to introduce some of the concepts so that you get an appreciation …

Page 66: The Art of Big Data

NOSQL Mechanics
•  Horizontal scalability
   –  Gossip (cluster membership)
   –  Failure detection
   –  Consistent hashing
   –  Replication techniques
      •  Hinted handoff
      •  Merkle trees
   –  Sharding in MongoDB, regions in HBase
•  Performance
   –  SSTables/memtables
   –  LSM w/ Bloom filter
•  Integrity/version reconciliation
   –  Timestamps
   –  Vector clocks
   –  MVCC
   –  Semantic vs. syntactic reconciliation

Page 67: The Art of Big Data

Consistent Hashing
•  Origin: web caching – to decrease "hot spots"
•  Three goals [87]
   –  Smooth evolution
      •  When a new machine joins, minimum rebalance work and impact
   –  Spread
      •  Objects are assigned to a minimum number of nodes
   –  Load
      •  The number of distinct objects assigned to a node is small

Page 68: The Art of Big Data

Consistent Hashing
•  The hash keyspace/token range is divided into partitions/ranges
•  Cassandra – choice of partitioner
   –  OrderPreservingPartitioner – key = token (for range queries)
   –  Also saw a CollatingOrderPreservingPartitioner
•  Partitions are assigned to nodes that are logically arranged in a ring topology
•  Amz (Dynamo) – assigns sets of (random) multiple points to different machines depending on load
•  Cassandra – monitors load & distributes
•  Specific join & leave protocols
•  Replication – next 3 consecutive nodes
•  Cassandra – rack-aware, datacenter-aware
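A compact consistent-hash-ring sketch with virtual nodes, to show why a joining node takes over only a small slice of the keys; purely illustrative (names and node counts are made up):

```python
import bisect
import hashlib

class HashRing:
    def __init__(self, nodes, vnodes=8):
        self.ring = []                       # sorted list of (token, node)
        for node in nodes:
            self.add(node, vnodes)

    def _token(self, key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add(self, node, vnodes=8):
        """Assign several points on the ring to one physical node."""
        for i in range(vnodes):
            bisect.insort(self.ring, (self._token(f"{node}#{i}"), node))

    def lookup(self, key):
        """Walk clockwise from the key's token to the first node point."""
        idx = bisect.bisect(self.ring, (self._token(key), "")) % len(self.ring)
        return self.ring[idx][1]

ring = HashRing(["n1", "n2", "n3"])
before = {k: ring.lookup(k) for k in ("alpha", "beta", "gamma", "delta")}
ring.add("n4")                               # a new node joins
after = {k: ring.lookup(k) for k in before}
moved = [k for k in before if before[k] != after[k]]
print(before, "moved:", moved)               # only a fraction of keys change owner
```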

Page 69: The Art of Big Data

Consistent Hashing - Hinted handoff
•  What happens when a node is not available ?
   –  Maybe it is under load
   –  Maybe there is a network partition
•  Sloppy quorum & hinted handoff
•  R/W performed on the first N healthy nodes
•  The replica is sent to a stand-in node with a hint in its metadata & then transferred when the actual node is back up
•  Burdens neighboring nodes
•  Cassandra 0.6.2 default is disabled (I think)

Page 70: The Art of Big Data

Consistent Hashing - Replication
•  What happens when a new node joins ?
   –  It gets one or more partitions
   –  Dynamo : copy the whole partition
   –  Cassandra : replicate the keyset
   –  Cassandra : working on a BitTorrent-type protocol to copy from replicas

Page 71: The Art of Big Data

Anti-entropy
•  Merge and reconciliation operations
   –  Operate on two states and return a new state [86]
•  Merkle trees
   –  Dynamo uses Merkle trees to detect inconsistencies between replicas
   –  AntiEntropy in Cassandra exchanges Merkle trees and, if they disagree, performs range repair via compaction [91,92]
   –  Cassandra uses the Scuttlebutt reconciliation [86]
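A toy Merkle-tree comparison showing how two replicas can find a divergent range by exchanging only hashes; this is an illustrative sketch, not Cassandra's or Dynamo's implementation (it assumes a power-of-two leaf count for brevity):

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha1(data).digest()

def merkle_levels(leaves):
    """Build the tree bottom-up; each level halves the number of hashes."""
    level = [h(x) for x in leaves]
    levels = [level]
    while len(level) > 1:
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels                      # levels[-1][0] is the root hash

replica_a = [b"k1=v1", b"k2=v2",    b"k3=v3", b"k4=v4"]
replica_b = [b"k1=v1", b"k2=STALE", b"k3=v3", b"k4=v4"]

ta, tb = merkle_levels(replica_a), merkle_levels(replica_b)
if ta[-1] != tb[-1]:                   # roots differ: some range is out of sync
    bad = [i for i, (x, y) in enumerate(zip(ta[0], tb[0])) if x != y]
    print("repair leaf ranges:", bad)  # -> [1]; only k2 needs to be streamed
```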

Page 72: The Art of Big Data

Gossip
•  Membership & failure detection
•  Based on emergence without rigidity – pulse-coupled oscillators, biological systems like fireflies ! [90]
•  Also used for state propagation
   –  Used in Dynamo/Cassandra

Page 73: The Art of Big Data

Gossip
•  Cassandra exchanges heartbeat state, application state and so forth
•  Every second, it picks a random live node and a random unreachable node and exchanges key-value structures
•  Some nodes play the part of seeds
•  Seeds / initial contact points are in a static conf file (storage.conf)
•  They could also come from a configuration service like ZooKeeper
•  To guard against node flap, there is explicit membership join and leave – now you know why hinted handoff was added

Page 74: The Art of Big Data

Membership & Failure Detection
•  Consensus & atomic broadcast are impossible to solve in an asynchronous distributed system [88,89]
   –  Cannot differentiate between a slow system and a crashed system
•  Completeness
   –  Every process that crashed will eventually be detected
•  Correctness
   –  A correct process is never suspected
•  In short, if you are dead somebody will notice it, and if you are alive, nobody will mistake you for dead !

Page 75: The Art of Big Data

Φ Accrual Failure Detector
•  Not a Boolean value but a probabilistic number that "accrues" over an exponential scale
•  Captures the degree of confidence that a corresponding monitored process has crashed [94] – a suspicion level
   –  Φ = 1 → prob(error) 10%
   –  Φ = 2 → prob(error) 1%
   –  Φ = 3 → prob(error) 0.1%
•  If the process is dead,
   –  Φ is monotonically increasing & Φ → ∞ as t → ∞
•  If the process is alive and kicking, Φ = 0
•  Accounts for lost messages, network latency and actual crashes of the system/process
•  With a well-known heartbeat period Δi, the network latency Δtr can be tracked by inter-arrival time modeling
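A bare-bones sketch of the Φ calculation. The original accrual detector fits a distribution to a sliding window of heartbeat inter-arrival times; this toy version assumes a simple exponential tail just to show the shape of the idea (all names are hypothetical):

```python
import math

class PhiAccrualDetector:
    def __init__(self):
        self.intervals = []          # observed heartbeat inter-arrival times
        self.last_heartbeat = None

    def heartbeat(self, now):
        if self.last_heartbeat is not None:
            self.intervals.append(now - self.last_heartbeat)
        self.last_heartbeat = now

    def phi(self, now):
        """phi = -log10(P(heartbeat still to come | node alive)); it keeps
        growing the longer we wait past the expected interval."""
        mean = sum(self.intervals) / len(self.intervals)
        elapsed = now - self.last_heartbeat
        p_later = math.exp(-elapsed / mean)   # exponential-tail assumption
        return -math.log10(p_later)

d = PhiAccrualDetector()
for t in (0, 1, 2, 3, 4):                        # heartbeats arriving every second
    d.heartbeat(t)
print(round(d.phi(5), 2), round(d.phi(12), 2))   # small soon after, much larger later
```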

Page 76: The Art of Big Data

Write/Read Mechanisms
•  Read & write go to a random node (StorageProxy)
•  The proxy coordinates the read and write strategy (R/W = ANY, QUORUM, et al)
•  Memtables/SSTables from Bigtable
•  Bloom filter / index
•  LSM trees

Page 77: The Art of Big Data

[Diagram: the LSM write/read path – a write goes to the commit log and an in-memory MemTable; MemTables are flushed to immutable SSTables on disk (maintained by compaction), each carrying its own Bloom filter (BF) and index; a read consults the MemTable plus the SSTables' Bloom filters and indexes. HBase equivalents: WAL, Memstore, HDFS file system.]

Page 78: The Art of Big Data

How… does HBase work again?

http://www.larsgeorge.com/2010/01/hbase-architecture-101-write-ahead-log.html

http://hbaseblog.com/2010/07/04/hug11-hbase-0-90-preview-wrap-up/

Page 79: The Art of Big Data

Bloom Filter
•  The Bloom filter answers the question "Might there be data for this key in this SSTable?" [Ref: Cassandra/HBase mailing lists]
   –  "Maybe" or
   –  "Definitely not"
   –  When the Bloom filter says "maybe", we have to go to disk to check the content of the SSTable
•  Depends on the implementation
   –  Redone in Cassandra
   –  HBase 0.20.x removed it; it will be back in 0.90 with a "jazzy" implementation
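A minimal Bloom filter sketch showing why the only possible answers are "maybe" and "definitely not"; illustrative only (real implementations size the bit array and hash count from a target false-positive rate):

```python
import hashlib

class BloomFilter:
    def __init__(self, size=1024, hashes=3):
        self.size, self.hashes = size, hashes
        self.bits = [False] * size

    def _positions(self, key: str):
        # Derive several bit positions per key from salted hashes.
        for i in range(self.hashes):
            digest = hashlib.md5(f"{i}:{key}".encode()).hexdigest()
            yield int(digest, 16) % self.size

    def add(self, key: str):
        for pos in self._positions(key):
            self.bits[pos] = True

    def might_contain(self, key: str) -> bool:
        """False means definitely absent; True means 'maybe' (go check the SSTable)."""
        return all(self.bits[pos] for pos in self._positions(key))

bf = BloomFilter()
bf.add("row-42")
print(bf.might_contain("row-42"), bf.might_contain("row-99"))  # True, (almost certainly) False
```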

Page 80: The Art of Big Data

Was it a vision, or a waking dream? Fled is that music:—do I wake or sleep?

-Keats, Ode to a Nightingale

Page 81: The Art of Big Data

•  http://www.readwriteweb.com/enterprise/2011/11/infographic-data-deluge---8-ze.php
•  http://www.crn.com/news/data-center/232200061/efficiency-or-bust-data-centers-drive-for-low-power-solutions-prompts-channel-growth.htm
•  http://www.quantumforest.com/2011/11/do-we-need-to-deal-with-big-data-in-r/
•  http://www.forbes.com/special-report/2011/migration.html
•  http://www.mercurynews.com/bay-area-news/ci_19368103
•  http://www.businessinsider.com/apple-new-data-center-north-carolina-created-50-jobs-2011-11