
The Art of Big Data


DESCRIPTION

Slides for my talk at the Naval Postgraduate School PhD Seminar


Page 1: The Art of Big Data

Krishna Sankar, http://doubleclix.wordpress.com

EC4000 – PhD Guest Seminar, Naval Postgraduate School

Nov 29, 2011

The road lies plain before me;--'tis a theme

Single and of determined bounds; …

- Wordsworth, The Prelude

Page 2: The Art of Big Data

What is Big Data ?

Big Data to smart data

Big Data Pipeline

Analytic Algorithms

Storage - NOSQL

Processing - Hadoop …

Analytics/Modeling

R

Visualization

o  Agenda
o  To cover the broad picture of the Big Data domain …
o  Understand the waypoints & drill down into one area (NOSQL)
o  Can do others later

Page 3: The Art of Big Data

Thanks to … the giants whose shoulders I am standing on

Special thanks to: Peter Ateshian, NPS; Prof Murali Tummala, NPS; Shirley Bailes, O'Reilly; Ed Dumbill, O'Reilly; Jeff Barr, AWS; Jenny Kohr Chynoweth, AWS

Page 4: The Art of Big Data

When I think of my own native land, In a moment I seem to be there;

But, alas! recollection at hand Soon hurries me back to despair.

- Cowper, The Solitude of Alexander Selkirk

Page 5: The Art of Big Data

What is Big Data ? “Big data” is data that becomes large enough that it cannot be processed using conventional methods. @twitter

Ref: http://radar.oreilly.com/2010/09/the-smaq-stack-for-big-data.html

"Big data" is less about size, more about flow & velocity - persisting petabytes per year is easier than processing terabytes per hour. @twitter
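To put rough numbers on the flow/velocity point, here is a quick back-of-the-envelope calculation in plain Python (the 1 PB and 1 TB figures are simply the round numbers from the quote, not measurements):

```python
# Rough throughput comparison: persisting 1 PB/year vs. processing 1 TB/hour.
PB = 10**15          # bytes
TB = 10**12          # bytes

persist_rate = PB / (365 * 24 * 3600)   # bytes/second to land 1 PB over a year
process_rate = TB / 3600                # bytes/second to chew through 1 TB in an hour

print(f"1 PB/year ~ {persist_rate / 1e6:.0f} MB/s sustained ingest")
print(f"1 TB/hour ~ {process_rate / 1e6:.0f} MB/s sustained processing")
# ~32 MB/s vs. ~278 MB/s: the "hourly" workload needs roughly 9x the bandwidth,
# and processing usually involves far more work than simply writing bytes to disk.
```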

Page 6: The Art of Big Data

What is Big Data ?

Ref: http://www.ciol.com/News/News/News-Reports/Vinod-Khosla%E2%80%99s-cool-dozen-tech-innovations/156307/0/
http://yourstory.in/2011/11/vinod-khoslas-keynote-at-nasscom-product-conclave-reject-punditry-believe-in-an-idea-take-risk-and-succeed/

Vinod Khosla's Cool Dozen !
①  Consumers : "Widespread innovation in technologies that reduce data overload for users" ~ Data Reduction
②  Businesses : "Simple solutions to handle the deluge of data generated from various sources …" ~ Big Data Analytics
TV 2.0, Education, Social NEXT, Tools for sharing interest, Publishing, …

Page 7: The Art of Big Data

①  Volume
    o  Scale
②  Velocity
    o  Data change rate vs. decision window
③  Variety
    o  Different sources & formats
    o  Structured vs. unstructured
④  Variability
    o  Breadth of interpretation & depth of analytics
⑤  Contextual
    o  Dynamic variability
    o  Recommendation
⑥  Connectedness

EBC322

http://doubleclix.wordpress.com/2011/09/13/when-is-big-data-really-big-data/
http://www.hpts.ws/posters/Poster2011_13_Bulkowski.pdf


Page 12: The Art of Big Data

I.  Two main types – based on collection
  i.  Big Data Streams
    o  Data in "motion"
    o  Twitter fire hose, Facebook, G+
  ii.  Big Data Logs
    o  Data "at rest"
    o  Logs, DW, external market data, POS, …

II.  Typically, Big Data has a non-deterministic angle as well …
  o  Creative discovery
  o  Iterative, model-based analytics
  o  Explore questions to ask

III.  Smart Data = Big Data + context + embedded/interactive (inference, reasoning) models
  o  Model driven
  o  Declaratively interactive

http://www.slideshare.net/leonsp/hadoop-slides-11-what-is-big-data
http://www.slideshare.net/Dataversity/wed-1550-bacvanskivladimircolor

Page 13: The Art of Big Data

Twitter
§  200 million tweets/day
§  Peak 10,000/second
§  How would you handle the fire hose for social network analytics ?

http://goo.gl/dcBsQ

Storage
§  4U box = 40 TB
§  1 PB = 25 boxes !

Zynga
§  "Analytics company, not a gaming company!"
§  Harvests data : 15 TB/day
§  Test new features
§  Target advertising
§  230 million players/month

AWS – 600 billion objects!

Page 14: The Art of Big Data

•  6 billion messages per day
•  2 PB (w/ compression) online
•  6 PB w/ replication
•  250 TB/month growth
•  HBase infrastructure

Page 15: The Art of Big Data

Ref: http://www.hpts.ws/sessions/2011HPTS-TomFastner.pdf

Path analysis, A/B testing

50 TB/day; 240 nodes, 84 PB Teradata installation

Very systematic – the diagram speaks volumes!

Page 16: The Art of Big Data

•  "… they didn't need a genius, … but build the world's most impressive dilettante … battling the efficient human mind with spectacular flamboyant inefficiency" – Final Jeopardy by Stephen Baker

•  15 TB memory, across 90 IBM Power 750 servers, in 10 racks
•  1 TB of dataset
•  200 million pages processed by Hadoop
•  This is a good example of connected data
   –  Contextual w/ variability
   –  Breadth of interpretation
   –  Analytics depth

http://doubleclix.wordpress.com/2011/03/01/the-education-of-a-machine-%E2%80%93-review-of-book-%E2%80%9Cfinal-jeopardy%E2%80%9D-by-stephen-baker/
http://doubleclix.wordpress.com/2011/02/17/watson-at-jeopardy-a-race-of-machines/

Page 17: The Art of Big Data

[Diagram: the Big Data landscape – Storage (NOSQL, Object Store, Block Store), Parallelism (HPC, Map/Reduce), Analytics (Web Analytics, Log Analytics, Social Media, Social Graph, Knowledge Graph), Inference (Recommendation/Inference Engines, Machine Learning, Classification, Clustering, Search, Indexing, Mahout), Distributed and Warehouse-style Applications, Cloud, Architecture]

Big Data

Page 18: The Art of Big Data

Big  Data  to  Smart  Data

“A towel is about the most massively useful thing an interstellar hitchhiker can have … any man who can hitch the length and breadth of the Galaxy, rough it … win through, and still know where his towel is, is clearly a man to be reckoned with.”

- From The Hitchhiker's Guide to the Galaxy, by Douglas Adams. Published by Harmony Books in 1979

Page 19: The Art of Big Data

Big data to smart data • Summary

1.  Don't throw away any data !
2.  Be ready for different ways of organizing the data

http://goo.gl/fGw7r

Page 20: The Art of Big Data

Big  Data  Pipeline

If a problem has no solution, it is not a problem, but a fact, not to be solved but to be coped with, over time …

- Peres’s Law

Page 21: The Art of Big Data

Big Data Pipeline •  Stages
   o  Collect
   o  Store
   o  Transform & Analyze
   o  Model & Reason
   o  Predict, Recommend & Visualize

•  Different systems have different characteristics
   o  Infrastructure optimization based on application/hardware attribute correlation (short term)
      •  Hadoop, Splunk, internal dashboard
   o  Application performance trends (medium term)
      •  Analytics, modeling, …
   o  Product metrics
      •  Feature set vs. usage, what is important to users, stratification
      •  Modeling using R, visualization layers like Tableau
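A minimal sketch of the stages as ordinary function composition, just to make the Collect → Store → Transform/Analyze → Model flow concrete; every name here is hypothetical and not tied to any particular tool:

```python
# Hypothetical end-to-end pipeline; each stage is a plain function.

def collect(sources):
    """Pull raw events from logs, feeds, POS systems, etc."""
    return [event for source in sources for event in source]

def store(events, sink):
    """Persist raw events (stand-in for HDFS, a NOSQL store, flat files, ...)."""
    sink.extend(events)
    return sink

def transform(events):
    """Clean and aggregate, e.g. count events per user."""
    counts = {}
    for e in events:
        counts[e["user"]] = counts.get(e["user"], 0) + 1
    return counts

def model(counts):
    """Trivial 'model': flag heavy users (real modeling would happen in R/Mahout)."""
    return {user: ("heavy" if n > 2 else "light") for user, n in counts.items()}

raw = collect([[{"user": "a"}, {"user": "b"}], [{"user": "a"}, {"user": "a"}]])
archive = store(raw, [])
print(model(transform(archive)))   # {'a': 'heavy', 'b': 'light'}
```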

Page 22: The Art of Big Data

[Diagram: the Big Data pipeline maps the data characteristics – Volume, Velocity, Variety, Variability, Connectedness, Context, Model, Infer-ability – onto the stage verbs: Decomplexify! Contextualize! Network! Reason! Infer!]

•  Collect : Logs, Scribe, Flume, Hadoop, …
•  Store : SQL, NOSQL, HDFS, XML, files, …
•  Transform & Analyze : SQL, BI tools, Hadoop, Pig, Hive, .NET Dryad, various other tools
•  Model & Reason : Hand-coded programs, R, Mahout, …
•  Predict, Recommend & Visualize : Internal dashboards, Tableau

Ref: http://goo.gl/Mm83k

Page 23: The Art of Big Data

The  NOSQL  !

I AM monarch of all I survey; My right there is none to dispute;

From the centre all round to the sea I am lord of the fowl and the brute

- Cowper, The Solitude of Alexander Selkirk

Build to Fail - “It is working” is not binary

Page 24: The Art of Big Data

Agenda
•  Opening Gambit
   –  NOSQL : Toil, Tears & Sweat !
•  The Pragmas
   –  ABCs of NOSQL [ACID, BASE & CAP]
•  The Mechanics
   –  Algorithmics & Mechanisms (for reference)

Referenced links @ http://doubleclix.wordpress.com/2010/06/20/nosql-talk-references/

Page 25: The Art of Big Data

What is NOSQL Anyway ?

•  NOSQL != NoSQL or NOSQL != (!SQL)
•  NOSQL = Not Only SQL
•  Can be traced back to Eric Evans[2]!
   –  You can ask him during the afternoon session!
•  Unfortunate name, but it has stuck now
•  "Non-Relational" could have been better
•  Usually operational, definitely distributed
•  NOSQL has certain semantics – it need not stay that way

Page 26: The Art of Big Data

[Diagram: the NOSQL landscape by data model – Ref: [22,51,52]]

•  Key-Value : In-memory – Memcached, Redis; Disk-based – SimpleDB, Tokyo Cabinet, Dynamo, Voldemort, Azure TS
•  Column : Google BigTable, HBase, Cassandra, HyperTable
•  Document : CouchDB, MongoDB, Lotus Domino, Riak
•  Graph : Neo4j, FlockDB, InfiniteGraph

Page 27: The Art of Big Data

WHAT WORKS NOSQL Tales from the field

When I think of my own native land, In a moment I seem to be there;

But, alas! recollection at hand Soon hurries me back to despair.

- Cowper, The Solitude of Alexander Selkirk

Page 28: The Art of Big Data

•  Designer – augmenting an RDBMS with a distributed key-value store [40 : a good talk by Geir]
•  Invitation-only designer brand sales
•  Limited inventory sales – start at 12:00, members have 10 min to grab them. 500K mails every day
•  Keeps brand value, hidden from search
•  Interesting load properties
•  Each item is a row in the DB – BUY NOW reserves it
   –  Can't order more
•  Started out as a Rails app
   –  shared nothing
•  Narrow peaks – half of revenue

Page 29: The Art of Big Data

Christian Louboutin Effect

•  ½ amz for Louboutin
•  Use Voldemort
•  Inventory, shopping cart, checkout
•  Partition by prod ID
•  Shared infrastructure – "fog" not "cloud" – Joyent!
•  In-memory inventory
•  Not afraid of a sale anymore!

And SQL DBs are still relevant !

Page 30: The Art of Big Data

Typical NOSQL Example: Bit.ly
•  Bit.ly, URL shortening service, uses MongoDB
•  User, title, URL, hash, labels [I-5], sort by time
•  Scale – ~50M users, ~10K concurrent, ~1.25B shortens per month
•  Criteria:
   –  Simple, zippy FAST, very flexible, reasonable durability, low cost of ownership
•  Sharded by userid (see the sketch below)
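A minimal sketch of what "sharded by userid" means in practice; this is purely illustrative (a hypothetical shard count and hash choice), not bit.ly's or MongoDB's actual sharding code:

```python
import hashlib

NUM_SHARDS = 16  # hypothetical shard count

def shard_for(user_id: str) -> int:
    """Map a user id to a shard deterministically, so all of one user's
    shortens land on the same shard and can be queried together."""
    digest = hashlib.md5(user_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_SHARDS

print(shard_for("user42"))   # the same user id always maps to the same shard
```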

Page 31: The Art of Big Data

•  New kind of "dictionary" – a word repository, GPS for English – context, pronunciations, twitter … developer API
•  Characteristics [I-6, Tony Tam's presentation]
   –  RO-centric, 10,000 reads for every write
   –  Hit a wall with MySQL (4B rows)
   –  MongoDB reads were so good that a memcached layer was not required
   –  MongoDB used 4 times the MySQL storage
•  Another example:
   –  Voldemort – Unified Communications, IP-phone data stored keyed off of phone number. Data relatively stable

Page 32: The Art of Big Data

Large Hadron Collider @ CERN
•  DAS is part of a giant data management enterprise (CMS)
   –  Polyglot persistence (SQL + NOSQL: MongoDB, CouchDB, memcached, HDFS, Lustre, Oracle, MySQL, …)
•  Data Aggregation System [I-1,I-2,I-3,I-4]
   –  Uses MongoDB
   –  Distributed model, 2-6 PB data
   –  Combines info from different metadata sources, queried without knowing of their existence – the user has domain knowledge but shouldn't have to deal with the various formats, interfaces and query semantics
   –  DAS aggregates, caches and presents data as JSON documents – preserving security & integrity

And SQL DBs are still relevant !

Page 33: The Art of Big Data

Scaling Twitter

Page 34: The Art of Big Data

•  Digg
   –  RDBMS places more burden on reads than writes [I-8]
   –  Looked at NOSQL, selected Cassandra
      •  Column-oriented, so more structure than key-value

•  Heard from NoSQL Boston [http://twitter.com/#search?q=%23nosqllive]
   –  Baidu: 120-node HyperTable cluster managing 600 TB of data
   –  StumbleUpon uses HBase for analytics
   –  Twitter's current Cassandra cluster: 45 nodes

Page 35: The Art of Big Data

•  Adobe is an HBase shop [I-10,I-11,2]
•  Adobe SaaS infrastructure – tagging, content aggregation, search, storage and so forth
•  Dynamic schema & huge number of records [I-5]
•  40 million records in 2008 to 1 billion, with 50 ms response
•  NOSQL was not mature in 2008, now good enough
•  Prod analytics: 40 nodes; the largest cluster has 100 nodes

•  BBC is a CouchDB shop [I-13]
•  Sweet spot:
   •  Multi-master, multi-datacenter replication
   •  Interactive mediums
   •  Old data to CouchDB
   •  Thus free up the DB to do work!

Page 36: The Art of Big Data

•  Cloudkick is a Cassandra shop [I-12]
•  Cloudkick offers cloud management services
•  Stores metrics data
•  Linear scalability for write load
•  Massive write performance
   •  Memory table & serial commit log
•  Low operational costs
•  Data structure
   –  Metrics, rolled-up data, statuses at time slice : all indexed by timestamp

Page 37: The Art of Big Data

•  Guardian/UK
   –  Runs on Redis [I-14] !
   –  "Long-term The Guardian is looking towards the adoption of a schema-free database to sit alongside its Oracle database and is investigating CouchDB. … the relational database is now just a component in the overall data management story, alongside data caching, data stores, search engines etc."
   –  NOSQL can increase the performance of relational data by offloading specific data and tasks

And SQL DBs are still relevant ! "The evil that SQL DBs do lives after them; the good is oft interred with their bones …"

Page 38: The Art of Big Data

NOSQL at Netflix
•  Netflix is fully in the cloud
•  Uses NOSQL across the globe
•  Customer profiles, watch log, usage logging (see next slide)
   –  No multi-record locking
•  No DBA !
•  Easier schema changes
•  Less complex, highly available data store
•  Joins happen in the applications

http://www.hpts.ws/sessions/nosql-ecosystem.pdf
http://www.hpts.ws/sessions/GlobalNetflixHPTS.pdf

Page 39: The Art of Big Data
Page 40: The Art of Big Data

21 NOSQL Themes
•  Web scale
•  Scale incrementally / continuous growth
•  Oddly shaped & exponentially connected
•  Structure data as it will be used – i.e. read, query
•  Know your queries/updates in advance [96], but you can change them later
•  Compute attributes at run time
•  Create a few large entities with optional parts (see the sketch after this list)
   –  Normalization creates many small entities
•  Define schemas in models (not in databases)
•  Avoid impedance mismatch
•  Narrow down & solve your core problem
•  Solve the right problem with the right tool

Ref: [I-8]
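A small illustration of "a few large entities with optional parts" versus many normalized small entities, using hypothetical order data (illustrative only):

```python
# Normalized: many small entities, joined together at read time.
customers = {"c1": {"name": "Ada"}}
orders    = {"o1": {"customer_id": "c1", "total": 42.0}}
items     = [{"order_id": "o1", "sku": "towel", "qty": 1}]

# Denormalized, NOSQL-style: one large document shaped the way it will be read.
order_doc = {
    "_id": "o1",
    "customer": {"id": "c1", "name": "Ada"},   # embedded; optional parts are fine
    "total": 42.0,
    "items": [{"sku": "towel", "qty": 1}],
}
# One key lookup returns everything the "order" page needs - no join -
# at the cost of duplicating the customer name across order documents.
print(order_doc["customer"]["name"], len(order_doc["items"]))
```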

Page 41: The Art of Big Data

21 NOSQL Themes
•  Existing solutions are clunky [1] (in certain situations)
•  Scale automatically – "becoming prohibitively costly (in terms of manpower) to operate" – Twitter [I-9]
•  Distribution & partitioning are built into NOSQL
   •  RDBMS distribution & sharding is not fun and is expensive
      –  Lose most functionality along the way
•  Data at the center, flexible schema, fewer joins
•  The value of NOSQL is in flexibility as much as it is in "Big Data"

Page 42: The Art of Big Data

21 NOSQL Themes
•  Requirements [3]
   –  Data will not fit in one node
      •  And so the system needs to partition/distribute the data
   –  Nodes will fail, but data needs to be safe – replication!
   –  Low latency for real-time use
•  Data locality
   –  Row-based structures need to read the whole row, even for a single column
   –  Column-based structures need to scan for each row
   •  Solution : column storage with locality (see the layout sketch below)
      –  Keep data that is read together; don't read what you don't care about
      •  For example friends – other data

Ref: 3
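A toy illustration of the locality point, assuming a hypothetical table of users where one common query needs only the "friends" column:

```python
# Row-oriented layout: each record is stored contiguously, so reading one
# column still drags the whole row (bio, photos, ...) off disk.
rows = [
    {"user": "a", "friends": ["b", "c"], "bio": "long text ...", "photos": ["p1", "p2"]},
    {"user": "b", "friends": ["a"],      "bio": "long text ...", "photos": ["p3"]},
]

# Column-style layout with locality: the values that are read together are
# stored together, so a friends scan touches only this structure.
friends_column = {row["user"]: row["friends"] for row in rows}
print(friends_column)   # {'a': ['b', 'c'], 'b': ['a']}
```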

Page 43: The Art of Big Data

ABCs of NOSQL - ACID, BASE & CAP

The woods are lovely, dark, and deep, But I have promises to keep,

And miles to go before I sleep, And miles to go before I sleep.

-Frost

Page 44: The Art of Big Data

CAP Principle

[Diagram: triangle with vertices Consistency, Availability, Partition]

"CAP Principle → Strong Consistency, High Availability, Partition-resilience: pick at most 2" [37]

Which feature to discard depends on the nature of your system [41]

Page 45: The Art of Big Data

CAP Principle (same triangle as above)

C-A, No P → Single DB server, no network partition

Page 46: The Art of Big Data

CAP Principle (same triangle as above)

C-P, No A → Block transactions in case of partition failure

Page 47: The Art of Big Data

CAP Principle (same triangle as above)

A-P, No C → Expiration-based caching, voting majority

Interesting (& controversial) from the NOSQL perspective

Page 48: The Art of Big Data

ABCs of NOSQL
•  ACID
   o  Atomicity, Consistency, Isolation & Durability – fundamental properties of a SQL DBMS
•  BASE [35,39]
   o  Basically Available, Soft state (Scalable), Eventually consistent
•  CAP [36,39]
   o  Consistency, Availability & Partitioning
   o  This C is ~A+C
      •  i.e. Atomic Consistency [36]

Page 49: The Art of Big Data

ACID
•  Atomicity
   o  All or nothing
•  Consistent
   o  From one consistent state to another
      •  e.g. referential integrity
   o  But it is also application dependent
      •  e.g. minimum account balance
      •  Predicates, invariants, …
•  Isolation
•  Durability

Page 50: The Art of Big Data

CAP Pragmas
•  Preconditions
   o  The domain is scalable web apps
   o  Low latency for real-time use
   o  A small subset of SQL functionality
   o  Horizontal scaling
•  Pritchett [35] talks about relaxing consistency across functional groups rather than within functional groups
•  Idempotency to consider (see the sketch below)
   o  Updates (inc/dec) are rarely idempotent
   o  Order-preserving transactions are not idempotent either
   o  MVCC is an answer for this (CouchDB)
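The idempotency point in a nutshell, with a toy counter standing in for any replicated value (illustrative Python, not any particular store's API):

```python
balance = {"acct1": 100}

def increment(key, amount):
    """Not idempotent: replaying the same message changes the result."""
    balance[key] += amount

def set_value(key, value):
    """Idempotent: applying it once or five times leaves the same state."""
    balance[key] = value

increment("acct1", 10); increment("acct1", 10)    # duplicate delivery -> 120, should be 110
set_value("acct1", 110); set_value("acct1", 110)  # duplicate delivery -> still 110
print(balance)  # {'acct1': 110}
```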

Page 51: The Art of Big Data

Consistency
•  Strict consistency
   o  Any read on data X will return the most recent write on X [42]
•  Sequential consistency
   o  Maintains the sequential order seen by multiple processes (no mention of time)
•  Linearizability
   o  Add timestamps from loosely synchronized processes

Page 52: The Art of Big Data

Consistency
•  Write availability, not read availability [44]
•  Even load distribution is easier in eventually consistent systems
•  Multi-data-center support is easier in eventually consistent systems
•  Some problems are not solvable with eventually consistent systems
•  Code is sometimes simpler to write in strongly consistent systems

Page 53: The Art of Big Data

CAP Essentials – 1 of 3
•  "CAP Principle → Strong Consistency, High Availability, Partition-resilience: pick at most 2" [37]
   o  C-A, No P → Single DB server, no network partition
   o  C-P, No A → Block transactions in case of partition failure
   o  A-P, No C → Expiration-based caching, voting majority
•  Which feature to discard depends on the nature of your system [41]

Page 54: The Art of Big Data

CAP Essentials – 2 of 3
•  Yield vs. Harvest [37]
   o  Yield → probability of completing a request
   o  Harvest → fraction of data reflected in the response
•  Some systems tolerate < 100% harvest (e.g. search, i.e. approximate answers are OK); others need 100% harvest (e.g. transactions, i.e. correct behavior = a single well-defined response)
•  For sub-systems that tolerate harvest degradation, CAP makes sense

Page 56: The Art of Big Data

CAP Essentials – 3 of 3
•  Trading harvest for yield – AP
•  Application decomposition: use NOSQL in the appropriate sub-systems that have state management and data semantics matching the operational features & impedance
   o  Hence Not Only SQL, not No SQL
   o  Intelligent homing to tolerate partition failures [44]
   o  Multiple zones in a region (150 miles - 5 ms)
   o  Twitter tweets in Cassandra and MySQL
   o  BBC using MongoDB for offloading the DBMS
   o  Polyglot persistence at the LHC @ CERN

Most important point in the whole presentation

Page 57: The Art of Big Data

Eventual Consistency & AMZ
•  Distribution transparency [38]
•  In larger distributed systems, network partitions are a given
•  Consistency models
   o  Strong
   o  Weak
      •  Has an inconsistency window between an update and a guaranteed view
   o  Eventual
      •  If there are no new updates, all will see the value, eventually

Page 58: The Art of Big Data

Eventual Consistency & AMZ
•  Guarantee variations [38]
   o  Read-your-writes
   o  Session consistency
   o  Monotonic read consistency
      •  Access will not return a previous value
   o  Monotonic write consistency
      •  Serialize writes by the same process
•  Guarantee order (vector clocks, MVCC – see the sketch below)
   o  Example : Amz cart merger (let cart adds succeed even with partial failure)
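A minimal vector-clock sketch showing how order can be reconstructed when replicas accept writes independently; this is an illustrative toy, not Dynamo's implementation:

```python
def increment(clock, node):
    """Bump this node's entry before it accepts a write."""
    clock = dict(clock)
    clock[node] = clock.get(node, 0) + 1
    return clock

def descends(a, b):
    """True if clock a has seen everything clock b has (a >= b component-wise)."""
    return all(a.get(n, 0) >= c for n, c in b.items())

def conflict(a, b):
    """Neither clock descends from the other: concurrent writes that need
    reconciliation (e.g. the cart-merge example above)."""
    return not descends(a, b) and not descends(b, a)

v1 = increment({}, "nodeA")      # {'nodeA': 1}
v2 = increment(v1, "nodeB")      # descends from v1
v3 = increment(v1, "nodeC")      # concurrent with v2
print(descends(v2, v1), conflict(v2, v3))   # True True
```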

Page 59: The Art of Big Data

Eventual Consistency & AMZ - SimpleDB
•  SimpleDB strong consistency semantics [49,50]
   o  Until Feb 2010, SimpleDB only supported eventual consistency, i.e. GetAttributes after PutAttributes might not reflect the write for some time (~1 second)
   o  On Feb 24, AWS added a ConsistentRead=True attribute for reads
   o  Such a read will reflect all writes that got a 200 OK up to that time!

Page 60: The Art of Big Data

Eventual Consistency & AMZ - SimpleDB
•  SimpleDB strong consistency semantics [49,50]
   o  Also added conditional put/delete
   o  Put succeeds only if an attribute has a specified value (Expected.1.Value=) or existence state (Expected.1.Exists = true/false)
   o  The same conditional-check capability exists for delete as well
   o  Only on one attribute !
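For illustration, the request parameters involved look roughly like the following, shown as plain dicts rather than a specific SDK call (domain and item names are hypothetical; authentication parameters are omitted):

```python
# Eventually consistent read (the default) vs. consistent read.
get_attributes_request = {
    "Action": "GetAttributes",
    "DomainName": "mydomain",       # hypothetical domain/item names
    "ItemName": "item1",
    "ConsistentRead": "true",       # added Feb 2010: reflects all acknowledged writes
}

# Conditional put: succeeds only if the expected attribute currently has the given value.
put_attributes_request = {
    "Action": "PutAttributes",
    "DomainName": "mydomain",
    "ItemName": "item1",
    "Attribute.1.Name": "state",
    "Attribute.1.Value": "reserved",
    "Expected.1.Name": "state",     # the single attribute the condition may test
    "Expected.1.Value": "available",
}
print(get_attributes_request["ConsistentRead"], put_attributes_request["Expected.1.Value"])
```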

Page 61: The Art of Big Data

Eventual Consistency & AMZ – S3
•  S3 is an eventual consistency system
   o  Versioning
   o  "S3 PUT & COPY synchronously store data across multiple facilities before returning SUCCESS"
   o  Repairs lost redundancy, repairs bit-rot
   o  Reduced Redundancy option for data that can be reproduced (99.999999999% vs. 99.99%)
      •  Approx 1/3rd less
   o  CloudFront for caching

Page 62: The Art of Big Data

!SQL ?
•  "We conclude that the current RDBMS code lines, while attempting to be a "one size fits all" solution, in fact excel at nothing. Hence, they are 25-year-old legacy code lines that should be retired in favor of a collection of "from scratch" specialized engines." [43]
•  "Current systems were built in an era where resources were incredibly expensive, and every computing system was watched over by a collection of wizards in white lab coats, responsible for the care, feeding, tuning and optimization of the system. In that era, computers were expensive and people were cheap."
•  "The 1970 - 1985 period was a time of intense debate, a myriad of ideas, & considerable upheaval. We predict the next fifteen years will have the same feel."

Page 63: The Art of Big Data

Further deliberation
•  Daniel Abadi [45], Mike Stonebraker [46], James Hamilton [47] and Pat Helland [48] are all good reads for further deliberation

Page 64: The Art of Big Data

NOSQL Internals & Algorithmics

Page 65: The Art of Big Data

Caveats
•  A representative subset of the mechanics and mechanisms used in the NOSQL world
•  These are being refined & newer ones are being tried
•  Presented at a system level – to show how the techniques play a part in delivering a capability
•  The NOSQL papers and other references are there for further deliberation
•  Even if we don't cover everything fully, that's OK. I want to introduce some of the concepts so that you get an appreciation …

Page 66: The Art of Big Data

NOSQL Mechanics
•  Horizontal scalability
   –  Gossip (cluster membership)
   –  Failure detection
   –  Consistent hashing
   –  Replication techniques
      •  Hinted handoff
      •  Merkle trees
   –  Sharding in MongoDB, regions in HBase
•  Performance
   –  SSTables/memtables
   –  LSM w/ Bloom filter
•  Integrity/version reconciliation
   –  Timestamps
   –  Vector clocks
   –  MVCC
   –  Semantic vs. syntactic reconciliation

Page 67: The Art of Big Data

Consistent Hashing
•  Origin: web caching – to decrease "hot spots"
•  Three goals [87]
   –  Smooth evolution
      •  When a new machine joins, minimum rebalance work and impact
   –  Spread
      •  Objects are assigned to a minimum number of nodes
   –  Load
      •  The number of distinct objects assigned to a node is small

Page 68: The Art of Big Data

Consistent Hashing
•  The hash keyspace/token range is divided into partitions/ranges
•  Cassandra – choice of partitioner
   –  OrderPreservingPartitioner – key = token (for range queries)
   –  Also saw a CollatingOrderPreservingPartitioner
•  Partitions are assigned to nodes that are logically arranged in a ring topology
•  Amz (Dynamo) – assigns sets of (random) multiple points to different machines depending on load
•  Cassandra – monitors load & distributes
•  Specific join & leave protocols
•  Replication – next 3 consecutive nodes
•  Cassandra – rack-aware, datacenter-aware
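A compact consistent-hash-ring sketch with virtual nodes, to show why a joining node takes over only a small slice of the keys; purely illustrative (names and node counts are made up):

```python
import bisect
import hashlib

class HashRing:
    def __init__(self, nodes, vnodes=8):
        self.ring = []                       # sorted list of (token, node)
        for node in nodes:
            self.add(node, vnodes)

    def _token(self, key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add(self, node, vnodes=8):
        """Assign several points on the ring to one physical node."""
        for i in range(vnodes):
            bisect.insort(self.ring, (self._token(f"{node}#{i}"), node))

    def lookup(self, key):
        """Walk clockwise from the key's token to the first node point."""
        idx = bisect.bisect(self.ring, (self._token(key), "")) % len(self.ring)
        return self.ring[idx][1]

ring = HashRing(["n1", "n2", "n3"])
before = {k: ring.lookup(k) for k in ("alpha", "beta", "gamma", "delta")}
ring.add("n4")                               # a new node joins
after = {k: ring.lookup(k) for k in before}
moved = [k for k in before if before[k] != after[k]]
print(before, "moved:", moved)               # only a fraction of keys change owner
```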

Page 69: The Art of Big Data

Consistent Hashing - Hinted handoff
•  What happens when a node is not available ?
   –  Maybe it is under load
   –  Maybe there is a network partition
•  Sloppy quorum & hinted handoff
•  R/W performed on the first N healthy nodes
•  The replica is sent to a stand-in node with a hint in its metadata & then transferred when the actual node is back up
•  Burdens neighboring nodes
•  Cassandra 0.6.2 default is disabled (I think)

Page 70: The Art of Big Data

Consistent Hashing - Replication
•  What happens when a new node joins ?
   –  It gets one or more partitions
   –  Dynamo : copy the whole partition
   –  Cassandra : replicate the keyset
   –  Cassandra : working on a BitTorrent-type protocol to copy from replicas

Page 71: The Art of Big Data

Anti-entropy
•  Merge and reconciliation operations
   –  Operate on two states and return a new state [86]
•  Merkle trees
   –  Dynamo uses Merkle trees to detect inconsistencies between replicas
   –  AntiEntropy in Cassandra exchanges Merkle trees and, if they disagree, performs range repair via compaction [91,92]
   –  Cassandra uses the Scuttlebutt reconciliation [86]
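A toy Merkle-tree comparison showing how two replicas can find a divergent range by exchanging only hashes; this is an illustrative sketch, not Cassandra's or Dynamo's implementation (it assumes a power-of-two leaf count for brevity):

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha1(data).digest()

def merkle_levels(leaves):
    """Build the tree bottom-up; each level halves the number of hashes."""
    level = [h(x) for x in leaves]
    levels = [level]
    while len(level) > 1:
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels                      # levels[-1][0] is the root hash

replica_a = [b"k1=v1", b"k2=v2",    b"k3=v3", b"k4=v4"]
replica_b = [b"k1=v1", b"k2=STALE", b"k3=v3", b"k4=v4"]

ta, tb = merkle_levels(replica_a), merkle_levels(replica_b)
if ta[-1] != tb[-1]:                   # roots differ: some range is out of sync
    bad = [i for i, (x, y) in enumerate(zip(ta[0], tb[0])) if x != y]
    print("repair leaf ranges:", bad)  # -> [1]; only k2 needs to be streamed
```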

Page 72: The Art of Big Data

Gossip
•  Membership & failure detection
•  Based on emergence without rigidity – pulse-coupled oscillators, biological systems like fireflies ! [90]
•  Also used for state propagation
   –  Used in Dynamo/Cassandra

Page 73: The Art of Big Data

Gossip
•  Cassandra exchanges heartbeat state, application state and so forth
•  Every second, it picks a random live node and a random unreachable node and exchanges key-value structures
•  Some nodes play the part of seeds
•  Seeds / initial contact points are in a static conf file (storage.conf)
•  They could also come from a configuration service like ZooKeeper
•  To guard against node flap, there is explicit membership join and leave – now you know why hinted handoff was added

Page 74: The Art of Big Data

Membership & Failure Detection
•  Consensus & atomic broadcast are impossible to solve in an asynchronous distributed system [88,89]
   –  Cannot differentiate between a slow system and a crashed system
•  Completeness
   –  Every process that crashed will eventually be detected
•  Correctness
   –  A correct process is never suspected
•  In short, if you are dead somebody will notice it, and if you are alive, nobody will mistake you for dead !

Page 75: The Art of Big Data

Φ Accrual Failure Detector
•  Not a Boolean value but a probabilistic number that "accrues" over an exponential scale
•  Captures the degree of confidence that a corresponding monitored process has crashed [94] – a suspicion level
   –  Φ = 1 → prob(error) 10%
   –  Φ = 2 → prob(error) 1%
   –  Φ = 3 → prob(error) 0.1%
•  If the process is dead,
   –  Φ is monotonically increasing & Φ → ∞ as t → ∞
•  If the process is alive and kicking, Φ = 0
•  Accounts for lost messages, network latency and actual crashes of the system/process
•  With a well-known heartbeat period Δi, the network latency Δtr can be tracked by inter-arrival time modeling
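A bare-bones sketch of the Φ calculation. The original accrual detector fits a distribution to a sliding window of heartbeat inter-arrival times; this toy version assumes a simple exponential tail just to show the shape of the idea (all names are hypothetical):

```python
import math

class PhiAccrualDetector:
    def __init__(self):
        self.intervals = []          # observed heartbeat inter-arrival times
        self.last_heartbeat = None

    def heartbeat(self, now):
        if self.last_heartbeat is not None:
            self.intervals.append(now - self.last_heartbeat)
        self.last_heartbeat = now

    def phi(self, now):
        """phi = -log10(P(heartbeat still to come | node alive)); it keeps
        growing the longer we wait past the expected interval."""
        mean = sum(self.intervals) / len(self.intervals)
        elapsed = now - self.last_heartbeat
        p_later = math.exp(-elapsed / mean)   # exponential-tail assumption
        return -math.log10(p_later)

d = PhiAccrualDetector()
for t in (0, 1, 2, 3, 4):                        # heartbeats arriving every second
    d.heartbeat(t)
print(round(d.phi(5), 2), round(d.phi(12), 2))   # small soon after, much larger later
```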

Page 76: The Art of Big Data

Write/Read Mechanisms
•  Read & write go to a random node (StorageProxy)
•  The proxy coordinates the read and write strategy (R/W = ANY, QUORUM, et al)
•  Memtables/SSTables from Bigtable
•  Bloom filter / index
•  LSM trees

Page 77: The Art of Big Data

[Diagram: the LSM write/read path – a write goes to the commit log and an in-memory MemTable; MemTables are flushed to immutable SSTables on disk (maintained by compaction), each carrying its own Bloom filter (BF) and index; a read consults the MemTable plus the SSTables' Bloom filters and indexes. HBase equivalents: WAL, Memstore, HDFS file system.]

Page 78: The Art of Big Data

How… does HBase work again?

http://www.larsgeorge.com/2010/01/hbase-architecture-101-write-ahead-log.html

http://hbaseblog.com/2010/07/04/hug11-hbase-0-90-preview-wrap-up/

Page 79: The Art of Big Data

Bloom Filter
•  The Bloom filter answers the question "Might there be data for this key in this SSTable?" [Ref: Cassandra/HBase mailing lists]
   –  "Maybe" or
   –  "Definitely not"
   –  When the Bloom filter says "maybe", we have to go to disk to check the content of the SSTable
•  Depends on the implementation
   –  Redone in Cassandra
   –  HBase 0.20.x removed it; it will be back in 0.90 with a "jazzy" implementation
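A minimal Bloom filter sketch showing why the only possible answers are "maybe" and "definitely not"; illustrative only (real implementations size the bit array and hash count from a target false-positive rate):

```python
import hashlib

class BloomFilter:
    def __init__(self, size=1024, hashes=3):
        self.size, self.hashes = size, hashes
        self.bits = [False] * size

    def _positions(self, key: str):
        # Derive several bit positions per key from salted hashes.
        for i in range(self.hashes):
            digest = hashlib.md5(f"{i}:{key}".encode()).hexdigest()
            yield int(digest, 16) % self.size

    def add(self, key: str):
        for pos in self._positions(key):
            self.bits[pos] = True

    def might_contain(self, key: str) -> bool:
        """False means definitely absent; True means 'maybe' (go check the SSTable)."""
        return all(self.bits[pos] for pos in self._positions(key))

bf = BloomFilter()
bf.add("row-42")
print(bf.might_contain("row-42"), bf.might_contain("row-99"))  # True, (almost certainly) False
```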

Page 80: The Art of Big Data

Was it a vision, or a waking dream? Fled is that music:—do I wake or sleep?

-Keats, Ode to a Nightingale

Page 81: The Art of Big Data

•  http://www.readwriteweb.com/enterprise/2011/11/infographic-data-deluge---8-ze.php
•  http://www.crn.com/news/data-center/232200061/efficiency-or-bust-data-centers-drive-for-low-power-solutions-prompts-channel-growth.htm
•  http://www.quantumforest.com/2011/11/do-we-need-to-deal-with-big-data-in-r/
•  http://www.forbes.com/special-report/2011/migration.html
•  http://www.mercurynews.com/bay-area-news/ci_19368103
•  http://www.businessinsider.com/apple-new-data-center-north-carolina-created-50-jobs-2011-11