Data Freeway: Scaling Out to Realtime

Authors: Eric Hwang, Sam Rash {ehwang,rash}@fb.com

Speaker: Haiping Wang ctqlwhp1022@gamil.com

Agenda

» Data at Facebook
» Realtime Requirements
» Data Freeway System Overview
» Realtime Components
  › Calligraphus/Scribe
  › HDFS use case and modifications
  › Calligraphus: a ZooKeeper use case
  › ptail
  › Puma
» Future Work

Big Data, Big Applications / Data at Facebook

» Lots of data
  › More than 500 million active users
  › 50 million users update their statuses at least once each day
  › More than 1 billion photos uploaded each month
  › More than 1 billion pieces of content (web links, news stories, blog posts, notes, photos, etc.) shared each week
  › Data rate: over 7 GB/second
» Numerous products can leverage the data
  › Revenue related: Ads Targeting
  › Product/User Growth related: AYML, PYMK, etc.
  › Engineering/Operation related: Automatic Debugging
  › Puma: streaming queries

Example: User related Application

» Major challenges: scalability, latency

Realtime Requirements

› Scalability: 10-15 GB/second
› Reliability: no single point of failure
› Data loss SLA: 0.01%
  • Loss due to hardware: at most 1 out of 10,000 machines can lose data
› Delay of less than 10 s for 99% of data
  • Typically we see 2 s
› Easy to use: as simple as 'tail -f /var/log/my-log-file'
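
To make the targets above concrete, the following sketch converts the 0.01% loss SLA into a byte budget at the stated ingest rates. The function name and the decimal-gigabyte convention are assumptions made for illustration; the talk does not specify either.

```python
# Back-of-the-envelope check of the loss SLA against the stated ingest rates.
# Assumption: 1 GB = 10**9 bytes (the slides do not say which unit is meant).

LOSS_SLA = 0.0001  # 0.01% expressed as a fraction, i.e. 1 in 10,000


def loss_budget_bytes_per_sec(ingest_gb_per_sec: float, sla: float = LOSS_SLA) -> float:
    """Return the maximum number of bytes per second that may be lost under the SLA."""
    return ingest_gb_per_sec * 10**9 * sla


for rate in (10, 15):
    budget_mb = loss_budget_bytes_per_sec(rate) / 10**6
    print(f"{rate} GB/s ingest: at most {budget_mb:.1f} MB/s may be lost")
```

At the upper end of the scalability target, even a 0.01% loss rate corresponds to roughly a megabyte of lost data every second, which is why the SLA is stated per machine rather than per byte.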

Data Freeway System Diagram

» Scribe & Calligraphus get data into the system
» HDFS at the core
» Ptail provides data out
» Puma is an emerging streaming analytics platform

Scribe

• Scalable distributed logging framework
• Very easy to use:
  • scribe_log(string category, string message)
• Mechanics:
  • Built on top of Thrift
  • Runs on every machine at Facebook and collects log data into a set of destinations
  • Buffers data on local disk if the network is down
• History:
  • 2007: Started at Facebook
  • October 2008: Open-sourced
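
The two-argument logging call and the buffer-to-local-disk behaviour can be illustrated with a small Python sketch. This is not Scribe's implementation: `BufferingLogger`, the `send` callback, and the tab-separated spool file are all hypothetical stand-ins, and the sketch assumes categories contain no tabs and messages contain no newlines.

```python
import os
import tempfile
from typing import Callable, List, Tuple

Entry = Tuple[str, str]  # (category, message)


class BufferingLogger:
    """Toy client: forwards entries downstream, spooling to local disk on failure."""

    def __init__(self, send: Callable[[List[Entry]], None], spool_path: str) -> None:
        self._send = send              # hypothetical transport; raises ConnectionError when down
        self._spool_path = spool_path  # hypothetical local buffer file

    def scribe_log(self, category: str, message: str) -> None:
        """Same shape as the call on the slide: a category and a message."""
        try:
            self.flush()                          # drain older entries first to keep order
            self._send([(category, message)])
        except ConnectionError:
            with open(self._spool_path, "a", encoding="utf-8") as spool:
                spool.write(f"{category}\t{message}\n")

    def flush(self) -> None:
        """Resend spooled entries; the spool is left untouched if sending fails."""
        if not os.path.exists(self._spool_path):
            return
        with open(self._spool_path, encoding="utf-8") as spool:
            entries = [tuple(line.rstrip("\n").split("\t", 1)) for line in spool]
        if entries:
            self._send(entries)                   # may raise ConnectionError
        os.remove(self._spool_path)


# Simulated network that can be switched on and off.
network = {"up": False}
delivered: List[Entry] = []


def send(entries: List[Entry]) -> None:
    if not network["up"]:
        raise ConnectionError("downstream unreachable")
    delivered.extend(entries)


spool_file = os.path.join(tempfile.mkdtemp(), "scribe.spool")
logger = BufferingLogger(send, spool_file)
logger.scribe_log("ads", "click 1")   # network down: entry is spooled locally
network["up"] = True
logger.scribe_log("ads", "click 2")   # network back: spool is drained, then the new entry is sent
print(delivered)
```

Draining the spool before sending the new entry keeps per-client ordering; a production logger would also bound the spool size and batch sends.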

Calligraphus

» What?
  › Scribe-compatible server written in Java
  › Emphasis on a modular, testable code base and on performance
» Why?
  › Extract a simpler design from the existing Scribe architecture
  › Cleaner integration with the Hadoop ecosystem
    • HDFS, ZooKeeper, HBase, Hive
» History
  › In production since November 2010
  › ZooKeeper integration since March 2011

HDFS : a different use case

» Message hub
  › Add concurrent reader support and sync
  › Writers plus concurrent readers form a pub/sub model

HDFS : add Sync

» Sync
  › Implemented in 0.20 (HDFS-200)
    • Partial chunks are flushed
    • Blocks are persisted
  › Provides durability
  › Lowers write-to-read latency
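
Why a sync shortens write-to-read latency can be seen without HDFS. The sketch below is only a local-file analogy in Python, not the HDFS-200 implementation: the writer's userspace buffer stands in for unflushed partial chunks, and `flush()` plus `os.fsync()` stand in for the sync call. The file name and buffer size are arbitrary assumptions.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "stream.log")


def visible_bytes(p: str) -> int:
    """Number of bytes a separate reader can see right now."""
    with open(p, "rb") as reader:
        return len(reader.read())


# A large userspace buffer keeps small writes inside the writer process.
writer = open(path, "wb", buffering=64 * 1024)
writer.write(b"event-1\n")
before_sync = visible_bytes(path)   # the record still sits in the writer's buffer

writer.flush()                      # push the partial chunk out of the process
os.fsync(writer.fileno())           # ask the OS to persist it (durability)
after_sync = visible_bytes(path)    # a concurrent reader can now see the record

writer.write(b"event-2\n")          # the writer keeps the file open and continues
writer.close()
print(before_sync, after_sync)
```

Without the explicit sync, a reader would see nothing until the buffer filled or the file was closed, which is the local analogue of waiting for a full block to be finalized.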

HDFS : Concurrent Reads Overview

» Without changes, stock Hadoop 0.20 does not allow access to the block being written
» Realtime apps need to read the block being written in order to achieve < 10 s latency

HDFS : Concurrent Reads Implementation

1. DFSClient asks the Namenode for blocks and locations
2. DFSClient asks the Datanode for the length of the block being written
3. DFSClient opens the last block
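
The three steps above can be sketched as a small in-memory simulation. The `Namenode`, `Datanode`, and `Block` classes here are hypothetical Python stand-ins, not the Hadoop classes; in particular, `synced_len` is an assumed field modelling how many bytes the writer's last sync made visible.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Block:
    block_id: str
    data: bytearray = field(default_factory=bytearray)
    synced_len: int = 0        # bytes made visible by the writer's last sync (assumed model)
    finalized: bool = False    # True once the writer has completed the block


class Datanode:
    def __init__(self) -> None:
        self.blocks: Dict[str, Block] = {}

    def visible_length(self, block_id: str) -> int:
        """Full length for finalized blocks, synced length for the block being written."""
        block = self.blocks[block_id]
        return len(block.data) if block.finalized else block.synced_len

    def read(self, block_id: str, length: int) -> bytes:
        return bytes(self.blocks[block_id].data[:length])


class Namenode:
    def __init__(self) -> None:
        self.files: Dict[str, List[str]] = {}      # path -> ordered block ids
        self.locations: Dict[str, Datanode] = {}   # block id -> datanode holding it

    def get_block_locations(self, path: str) -> List[Tuple[str, Datanode]]:
        return [(block_id, self.locations[block_id]) for block_id in self.files[path]]


def read_file(namenode: Namenode, path: str) -> bytes:
    out = bytearray()
    # Step 1: ask the namenode for blocks and locations.
    for block_id, datanode in namenode.get_block_locations(path):
        # Step 2: ask the datanode how much of the block is readable.
        length = datanode.visible_length(block_id)
        # Step 3: open the block (including the one being written) and read that much.
        out += datanode.read(block_id, length)
    return bytes(out)


dn = Datanode()
dn.blocks["blk_1"] = Block("blk_1", bytearray(b"aaaa"), synced_len=4, finalized=True)
dn.blocks["blk_2"] = Block("blk_2", bytearray(b"bbbb??"), synced_len=4)  # last 2 bytes unsynced
nn = Namenode()
nn.files["/category/file"] = ["blk_1", "blk_2"]
nn.locations = {"blk_1": dn, "blk_2": dn}

result = read_file(nn, "/category/file")
print(result)
```

The namenode's metadata alone cannot say how long the in-progress block is, which is why the reader has to ask the datanode in step 2.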

Calligraphus: Log Writer

[Diagram: Scribe categories (Category 1, 2, 3) → Calligraphus servers → HDFS]

How to persist to HDFS?

Calligraphus (Simple)

[Diagram: Scribe categories (Category 1, 2, 3) → Calligraphus servers → HDFS]

Number of categories × Number of servers = Total number of directories

Calligraphus (Stream Consolidation)

[Diagram: Scribe categories (Category 1, 2, 3) → Calligraphus servers (Routers and Writers), with ZooKeeper → HDFS]

Number of categories = Total number of directories
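
The difference in directory counts between the simple layout and stream consolidation can be checked with a short simulation. This is a sketch under stated assumptions: the path layout, the server and category names, and the static `owner` map (standing in for the ZooKeeper-backed assignment) are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Message = Tuple[str, str]  # (category, payload)


def simple_directories(traffic: Dict[str, List[Message]]) -> Set[str]:
    """Simple mode: each server writes its own directory for every category it receives."""
    return {
        f"/{category}/{server}"
        for server, messages in traffic.items()
        for category, _payload in messages
    }


def consolidated_directories(
    traffic: Dict[str, List[Message]], owner: Dict[str, str]
) -> Dict[str, Set[str]]:
    """Consolidated mode: routers forward each message to the writer owning its category.

    Returns a map from directory to the set of servers that write into it.
    """
    writers_per_dir: Dict[str, Set[str]] = defaultdict(set)
    for _receiving_server, messages in traffic.items():
        for category, _payload in messages:
            writer = owner[category]              # router lookup (static map in this sketch)
            writers_per_dir[f"/{category}"].add(writer)
    return dict(writers_per_dir)


categories = ["cat1", "cat2", "cat3"]
servers = ["s1", "s2", "s3"]
# Every server receives traffic for every category.
traffic = {s: [(c, f"{c}-from-{s}") for c in categories] for s in servers}
owner = {"cat1": "s1", "cat2": "s2", "cat3": "s3"}  # hypothetical assignment

simple = simple_directories(traffic)
consolidated = consolidated_directories(traffic, owner)
print(len(simple), len(consolidated))
```

With three categories and three servers, the simple layout produces nine directories while the consolidated layout produces three, each with a single writer.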

ZooKeeper: Distributed Map

» Design
  › ZooKeeper paths as tasks (e.g. /root/<category>/<bucket>)
  › Canonical ZooKeeper leader elections under each bucket for bucket ownership
  › Independent load management: leaders can release tasks
  › Reader-side caches
  › Frequent sync with policy db

[Diagram: ZooKeeper tree with a Root node, category nodes A, B, C, D, and buckets 1-5 under each category]
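
The per-bucket ownership scheme can be sketched without a ZooKeeper cluster. The class below is an in-memory stand-in, not the ZooKeeper API: it mimics the standard election recipe in which each candidate receives an increasing sequence number and the lowest live number is the leader. `BucketElection` and the writer names are hypothetical.

```python
from typing import Dict, Optional


class BucketElection:
    """In-memory stand-in for the election under one /root/<category>/<bucket> path."""

    def __init__(self, path: str) -> None:
        self.path = path
        self._next_seq = 0
        self._candidates: Dict[int, str] = {}   # sequence number -> server name

    def join(self, server: str) -> int:
        """Register a candidate; returns its sequence number (like a sequential znode)."""
        seq = self._next_seq
        self._next_seq += 1
        self._candidates[seq] = server
        return seq

    def leave(self, seq: int) -> None:
        """Release the task, or model a session expiry removing an ephemeral node."""
        self._candidates.pop(seq, None)

    def leader(self) -> Optional[str]:
        """The candidate with the lowest sequence number owns the bucket."""
        if not self._candidates:
            return None
        return self._candidates[min(self._candidates)]


# One election per (category, bucket), matching the A-D by 1-5 layout in the diagram.
elections = {
    f"/root/{category}/{bucket}": BucketElection(f"/root/{category}/{bucket}")
    for category in "ABCD"
    for bucket in range(1, 6)
}

path = "/root/A/1"
seq_w1 = elections[path].join("writer-1")
seq_w2 = elections[path].join("writer-2")
first_owner = elections[path].leader()

elections[path].leave(seq_w1)               # writer-1 releases the task to shed load
second_owner = elections[path].leader()
print(first_owner, second_owner)
```

Because each bucket has its own election, a writer can give up one bucket without affecting its ownership of any other, which is what makes the load management independent.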

Canonical Realtime ptail Application

Hides the fact that we have many HDFS instances: the user can specify a category and get a stream

Checkpointing

Puma
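
A checkpointed tail over several sources can be sketched with local files standing in for the per-instance HDFS files of one category. This is not ptail itself: `CheckpointedTail`, the JSON checkpoint format, and the file names are assumptions chosen for illustration.

```python
import json
import os
import tempfile
from typing import Dict, List


class CheckpointedTail:
    """Tails several files as one stream and can resume from saved byte offsets."""

    def __init__(self, paths: List[str], checkpoint_path: str) -> None:
        self._paths = paths
        self._checkpoint_path = checkpoint_path
        self.offsets: Dict[str, int] = {p: 0 for p in paths}
        if os.path.exists(checkpoint_path):          # resume after a restart
            with open(checkpoint_path, encoding="utf-8") as f:
                self.offsets.update(json.load(f))

    def poll(self) -> List[str]:
        """Return complete lines appended since the last poll, across all files."""
        lines: List[str] = []
        for path in self._paths:
            with open(path, "rb") as f:
                f.seek(self.offsets[path])
                for raw in f:
                    if not raw.endswith(b"\n"):
                        break                        # partial line: wait for the writer
                    lines.append(raw[:-1].decode("utf-8"))
                    self.offsets[path] += len(raw)
        return lines

    def checkpoint(self) -> None:
        """Persist offsets atomically so a crash never leaves a torn checkpoint."""
        tmp = self._checkpoint_path + ".tmp"
        with open(tmp, "w", encoding="utf-8") as f:
            json.dump(self.offsets, f)
        os.replace(tmp, self._checkpoint_path)


def append(path: str, data: bytes) -> None:
    with open(path, "ab") as f:
        f.write(data)


workdir = tempfile.mkdtemp()
file_a = os.path.join(workdir, "hdfs-a.log")   # stand-in for one HDFS instance
file_b = os.path.join(workdir, "hdfs-b.log")   # stand-in for another
ckpt = os.path.join(workdir, "ptail.ckpt")

append(file_a, b"a1\na2\n")
append(file_b, b"b1\n")

tail = CheckpointedTail([file_a, file_b], ckpt)
first = tail.poll()
tail.checkpoint()

append(file_a, b"a3\n")
restarted = CheckpointedTail([file_a, file_b], ckpt)   # simulated process restart
second = restarted.poll()
print(first, second)
```

A consumer that stores its own state together with the checkpoint can restart without re-reading or skipping data, which is the property Puma relies on.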

Puma Overview

» Realtime analytics platform
» Metrics
  › count, sum, unique count, average, percentile
» Uses ptail checkpointing for accurate calculations in the case of failure
» Puma nodes are sharded by keys in the input stream
» HBase for persistence
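
The metrics and key-based sharding listed above can be illustrated with a toy aggregator. This is a sketch, not Puma's design: the `Shard` class, the CRC32-based shard function, the event shape, and the nearest-rank percentile are all assumptions, and persistence and checkpointing are left out. Keeping every value in memory is done for clarity only.

```python
import math
import zlib
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Event = Tuple[str, str, float]  # (key, user_id, value)


def shard_for(key: str, num_shards: int) -> int:
    """Stable key-to-shard mapping (CRC32 is an arbitrary choice for this sketch)."""
    return zlib.crc32(key.encode("utf-8")) % num_shards


class Shard:
    """Holds all state for the keys routed to it."""

    def __init__(self) -> None:
        self._values: Dict[str, List[float]] = defaultdict(list)
        self._users: Dict[str, Set[str]] = defaultdict(set)

    def add(self, key: str, user_id: str, value: float) -> None:
        self._values[key].append(value)
        self._users[key].add(user_id)

    def metrics(self, key: str, percentile: float = 99.0) -> Dict[str, float]:
        values = sorted(self._values.get(key, []))
        if not values:
            raise KeyError(key)
        # Nearest-rank percentile: smallest value with at least p% of data at or below it.
        rank = max(1, math.ceil(percentile / 100 * len(values)))
        return {
            "count": len(values),
            "sum": sum(values),
            "unique_count": len(self._users[key]),
            "average": sum(values) / len(values),
            "percentile": values[rank - 1],
        }


NUM_SHARDS = 4
shards = [Shard() for _ in range(NUM_SHARDS)]
events: List[Event] = [
    ("page_a", "u1", 10.0),
    ("page_a", "u2", 30.0),
    ("page_a", "u1", 20.0),
    ("page_b", "u3", 5.0),
]
for key, user_id, value in events:
    shards[shard_for(key, NUM_SHARDS)].add(key, user_id, value)

page_a = shards[shard_for("page_a", NUM_SHARDS)].metrics("page_a", percentile=50)
print(page_a)
```

Routing by key means every event for a given key lands on the same shard, so unique counts and percentiles can be computed locally without cross-shard merging.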

Puma Write Path

Puma Read Path

» Performance
  › Elapsed time typically 200-300 ms for 30-day queries
  › 99th percentile, cross-country, < 500 ms for 30-day queries

Future Work

» Puma
  › Enhance functionality: add application-level transactions on HBase
  › Streaming SQL interface
» Compression