
Apache BookKeeper

A High Performance and Low Latency Storage Service

Hello!

I am Sijie Guo

◉ PMC Chair of Apache BookKeeper

◉ Co-creator of Apache DistributedLog

◉ Twitter Messaging/Pub-Sub Team

◉ Yahoo! R&D Beijing

Challenges in Distributed Systems

Expect Failures

Up to 10% annual failure rates for disks/servers

Symptoms

Problem 1: Not Available


Problem 2: Inconsistencies

CAP

More Issues

Problem 3: Split Brain

Figure: Two Writers, each acting as Writer A and each issuing Write A’

Problem 4: Failure Detection

Figure: nodes A, B, C

Problem 5: Recovery

Figure: nodes A, B, C

Recovery Protocol

Consistency

Solutions

Overview

Enter Apache BookKeeper

BookKeeper - Durable Storage

A building block for reliable systems

Commodity Hardware

Durability

Replication, Consistency, Recovery

Client Library

Ledger Abstraction

Ledger

◉ Segment

◉ Block / Object

◉ Append-Only File

◉ ...
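To make the ledger abstraction concrete, here is a minimal in-memory sketch. It assumes a single process and models none of BookKeeper's replication, bookies, or metadata store; the names `Ledger`, `LedgerClosedError`, `add_entry`, `read_entry`, and `close` are hypothetical and are not the BookKeeper client API. It only shows the shape of the abstraction: an append-only sequence of entries addressed by consecutive entry ids, which becomes immutable once closed.

```python
class LedgerClosedError(Exception):
    """Raised when appending to a ledger that has already been closed."""


class Ledger:
    """Hypothetical in-memory stand-in for a ledger: an append-only
    sequence of entries addressed by consecutive entry ids (0, 1, 2, ...).
    No replication, persistence, or metadata store is modeled."""

    def __init__(self, ledger_id: int) -> None:
        self.ledger_id = ledger_id
        self._entries: list[bytes] = []
        self._closed = False

    def add_entry(self, data: bytes) -> int:
        # Entries can only be appended; existing entries are never modified.
        if self._closed:
            raise LedgerClosedError(f"ledger {self.ledger_id} is closed")
        self._entries.append(bytes(data))
        return len(self._entries) - 1  # id of the entry just added

    def read_entry(self, entry_id: int) -> bytes:
        if not 0 <= entry_id < len(self._entries):
            raise IndexError(f"entry {entry_id} does not exist")
        return self._entries[entry_id]

    def close(self) -> int:
        # After close the contents are fixed; return the last entry id.
        self._closed = True
        return len(self._entries) - 1
```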

Guarantees

If an entry has been acknowledged, it must be readable.

If an entry is read once, it must always be readable.

History

◉ Initial Use Case - Hadoop NameNode HA

◉ 2008: Open-sourced as a ZooKeeper contrib project

◉ 2011: Became a sub-project of ZooKeeper

◉ 2012: Yahoo! Push Notification

◉ 2012~Now: DistributedLog, Pulsar, Majordodo

◉ 2015~Now: Salesforce Distributed Store

Inside Apache BookKeeper

Details

Architecture

Figure: an application client, three bookies, a metadata store, and a ledger

Reliable Writes

◉ Store digest along with entry

◉ Fsync entries before responding

◉ Ack an entry only when a quorum of bookies has accepted

○ All previous entries

○ This entry

Figure: three bookies, with an entry accepted by a quorum
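The first two write rules can be sketched on the bookie side as a journal append. This is a minimal illustration, not BookKeeper's journal format: the record layout (`_HEADER`), the use of CRC32 via `zlib.crc32` as the digest, and the function names `append_entry` and `read_entries` are all assumptions. What it shows is the ordering: the digest is stored with the entry, and `os.fsync` completes before the function returns, i.e. before the bookie would respond to the client.

```python
import os
import struct
import zlib

# Hypothetical record layout: ledger id, entry id, payload length, CRC32 of payload.
_HEADER = struct.Struct(">qqII")


def append_entry(path: str, ledger_id: int, entry_id: int, payload: bytes) -> None:
    """Append one entry with its digest and fsync before returning."""
    digest = zlib.crc32(payload)
    record = _HEADER.pack(ledger_id, entry_id, len(payload), digest) + payload
    with open(path, "ab") as f:
        f.write(record)
        f.flush()             # move Python's buffer into the OS
        os.fsync(f.fileno())  # force the OS to write it to stable storage
    # Only after this point would it be safe to acknowledge the write.


def read_entries(path: str):
    """Yield (ledger_id, entry_id, payload), verifying each stored digest."""
    with open(path, "rb") as f:
        while True:
            header = f.read(_HEADER.size)
            if not header:
                return
            if len(header) < _HEADER.size:
                raise ValueError("truncated record header")
            ledger_id, entry_id, length, digest = _HEADER.unpack(header)
            payload = f.read(length)
            if len(payload) < length or zlib.crc32(payload) != digest:
                raise ValueError(f"corrupt entry {ledger_id}:{entry_id}")
            yield ledger_id, entry_id, payload
```

Verifying the digest on read means a corrupted entry is detected and reported instead of being served silently.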

Consistency - LastAddPushed

Figure: a writer adds entries to a ledger; LastAddPushed marks the latest entry the writer has pushed to the bookies

Consistency - LastAddConfirmed

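Figure: a writer adds entries and receives acks for the adds; readers read up to LastAddConfirmed, the highest entry that has been acknowledged along with every entry before it. A second panel shows ownership of the ledger changing to a new writer.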
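The rule that an entry is acknowledged only once it and all previous entries have reached a quorum is what makes LastAddConfirmed safe for readers. The sketch below shows writer-side bookkeeping for LastAddPushed and LastAddConfirmed, assuming a fixed ack quorum and bookies identified by name; `LacTracker` and its methods are hypothetical, and a real BookKeeper client tracks considerably more state (ensembles, write quorums, bookie failures).

```python
class LacTracker:
    """Hypothetical writer-side tracking of LastAddPushed (LAP) and
    LastAddConfirmed (LAC). ack_quorum = bookie acks needed per entry."""

    def __init__(self, ack_quorum: int) -> None:
        self.ack_quorum = ack_quorum
        self.last_add_pushed = -1     # highest entry id sent to bookies
        self.last_add_confirmed = -1  # highest id acked together with all earlier ids
        self._acks: dict[int, set[str]] = {}  # unconfirmed entry id -> acking bookies

    def push(self) -> int:
        # Assign the next entry id; the entry would now be sent to bookies.
        self.last_add_pushed += 1
        self._acks[self.last_add_pushed] = set()
        return self.last_add_pushed

    def on_bookie_ack(self, entry_id: int, bookie: str) -> int:
        # Record the ack (a set ignores duplicates from the same bookie);
        # acks for entries that are already confirmed are ignored.
        if entry_id in self._acks:
            self._acks[entry_id].add(bookie)
        # Advance LAC over the contiguous prefix of quorum-acked entries,
        # so no entry is confirmed before all entries preceding it.
        nxt = self.last_add_confirmed + 1
        while nxt in self._acks and len(self._acks[nxt]) >= self.ack_quorum:
            del self._acks[nxt]
            self.last_add_confirmed = nxt
            nxt += 1
        return self.last_add_confirmed
```

Acks can arrive out of order: if entry 1 reaches its quorum before entry 0, LastAddConfirmed stays at -1 until entry 0 is also acknowledged, and then moves straight to 1.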

Fencing

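Fencing is how a new owner stops the old writer in the split-brain scenario. The following is a simplified model under stated assumptions: every entry is sent to all three bookies, an add succeeds with acks from two of them, and fencing a ledger on a bookie makes that bookie reject further adds to it. The names `Bookie`, `FencedError`, and `write_with_quorum` are hypothetical. Once the new owner has fenced two bookies, the old writer can reach at most one, so it can never again collect an ack quorum.

```python
class FencedError(Exception):
    """Raised by a bookie when an add targets a fenced ledger."""


class Bookie:
    """Hypothetical in-memory bookie that supports fencing."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.entries: dict[tuple[int, int], bytes] = {}  # (ledger, entry) -> data
        self.fenced: set[int] = set()

    def add(self, ledger_id: int, entry_id: int, data: bytes) -> None:
        # Once a ledger is fenced here, no writer can add to it on this bookie.
        if ledger_id in self.fenced:
            raise FencedError(f"ledger {ledger_id} is fenced on {self.name}")
        self.entries[(ledger_id, entry_id)] = data

    def fence(self, ledger_id: int) -> int:
        # Mark the ledger fenced and report the last entry id stored locally.
        self.fenced.add(ledger_id)
        ids = [eid for (lid, eid) in self.entries if lid == ledger_id]
        return max(ids, default=-1)


def write_with_quorum(bookies, ack_quorum, ledger_id, entry_id, data) -> bool:
    """Send an add to every bookie; succeed only if ack_quorum accept it."""
    acks = 0
    for b in bookies:
        try:
            b.add(ledger_id, entry_id, data)
            acks += 1
        except FencedError:
            pass
    return acks >= ack_quorum
```

The last entry ids returned by `fence` are what a new owner would use as the starting point for recovering the ledger's end.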

Read Entry & Read LAC

Figure: a client reading from bookies B1, B2, B3

◉ Read Entry K: speculative reads on timeouts

◉ Read LAC: quorum read
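Speculative reads on timeouts can be sketched with `asyncio`: send the read for entry K to one bookie, and if it has not answered within a timeout, send the same read to the next bookie that stores the entry, taking whichever answer arrives first. This assumes each replica is represented by a hypothetical async callable that takes an entry id and returns bytes; it is not BookKeeper's client API, and `spec_timeout` is a placeholder parameter. In this sketch a failed read moves on to the next replica immediately rather than waiting out the timeout.

```python
import asyncio


async def speculative_read(replicas, entry_id, spec_timeout):
    """Return the first successful read of entry_id.

    replicas: list of async callables, each taking entry_id and returning bytes.
    The read starts on replicas[0]; each time spec_timeout seconds pass without
    an answer (or a read fails), the same read is also sent to the next replica."""
    pending = set()
    remaining = list(replicas)
    last_error = None
    try:
        while remaining or pending:
            if remaining:
                # Issue the read to one more replica.
                pending.add(asyncio.ensure_future(remaining.pop(0)(entry_id)))
            done, pending = await asyncio.wait(
                pending,
                timeout=spec_timeout if remaining else None,
                return_when=asyncio.FIRST_COMPLETED,
            )
            for task in done:
                if task.exception() is None:
                    return task.result()  # first successful answer wins
                last_error = task.exception()
        raise last_error or LookupError("no replicas to read from")
    finally:
        # Outstanding reads are no longer needed.
        for task in pending:
            task.cancel()
```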

Long Poll Read

Figure: a client sends a long poll read to bookies B1, B2, B3, with speculative long poll
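A long poll read lets a reader that is already caught up wait at the bookie for LastAddConfirmed to advance, instead of polling repeatedly. A minimal bookie-side sketch, assuming an `asyncio` server and a hypothetical `LacWatcher` class: the reader passes the last LAC it has seen, and the call returns as soon as the LAC moves past it, or returns the unchanged LAC when the timeout expires. Create the watcher inside a running event loop, since it holds an `asyncio.Condition`.

```python
import asyncio


class LacWatcher:
    """Hypothetical bookie-side LAC state supporting long poll reads."""

    def __init__(self) -> None:
        self.lac = -1
        self._changed = asyncio.Condition()

    async def update_lac(self, new_lac: int) -> None:
        # Called when the writer's adds advance the LAC; wakes waiting readers.
        async with self._changed:
            if new_lac > self.lac:
                self.lac = new_lac
                self._changed.notify_all()

    async def long_poll(self, previous_lac: int, timeout: float) -> int:
        """Return once LAC exceeds previous_lac, or the current LAC on timeout."""
        async with self._changed:
            try:
                await asyncio.wait_for(
                    self._changed.wait_for(lambda: self.lac > previous_lac),
                    timeout,
                )
            except asyncio.TimeoutError:
                pass  # nothing new arrived; report the LAC as it stands
            return self.lac
```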

Inside a Bookie

Use Cases

Apache BookKeeper as a Building Block

Projects built on BookKeeper

◉ Twitter: Apache DistributedLog

◉ Yahoo: Pulsar - Cloud Messaging Service

◉ Salesforce Distributed Store

◉ Huawei - HDFS NameNode HA

◉ HubSpot - WAL

◉ Majordodo - Distributed Resource Manager

Apache DistributedLog (Twitter)

Apache DistributedLog

Figure: a log as a sequence of records, oldest to newest, split into Log Segment X, Log Segment X+1, and Log Segment X+2, stored in Apache BookKeeper
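The segment layout in the figure can be sketched as a log that rolls over to a new segment once the current one is full. This is a hypothetical in-memory model: `SegmentedLog`, `LogSegment`, and the record-count rollover policy are assumptions for illustration. In DistributedLog each log segment is stored as a BookKeeper ledger; here a Python list stands in for it. Retention works on whole segments, so dropping old data means discarding the oldest sealed segments.

```python
from dataclasses import dataclass, field


@dataclass
class LogSegment:
    seq_no: int                                  # position of this segment in the log
    records: list = field(default_factory=list)  # records in append order
    sealed: bool = False                         # no more appends once sealed


class SegmentedLog:
    """Hypothetical log made of segments, ordered oldest to newest."""

    def __init__(self, max_records_per_segment: int) -> None:
        self.max_records = max_records_per_segment
        self.segments = [LogSegment(seq_no=0)]

    def append(self, record) -> tuple[int, int]:
        current = self.segments[-1]
        if len(current.records) >= self.max_records:
            # Roll over: seal the current segment and start a new one.
            current.sealed = True
            current = LogSegment(seq_no=current.seq_no + 1)
            self.segments.append(current)
        current.records.append(record)
        # Position of the record: (segment sequence number, index in segment).
        return current.seq_no, len(current.records) - 1

    def read_all(self):
        # Readers see records in order across segments, oldest first.
        for seg in self.segments:
            yield from seg.records

    def truncate_before(self, seq_no: int) -> None:
        # Retention: drop whole sealed segments older than seq_no.
        self.segments = [
            s for s in self.segments if s.seq_no >= seq_no or not s.sealed
        ]
```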

Apache DistributedLog

Figure: DistributedLog architecture, by layer:

◉ Segments: Metadata Store, Log Segment Store (BK), Cold Storage (HDFS)

◉ Raw Streams: Log Streams (abstraction & naming, data management, efficient write & read, intra-cluster & geo replication)

◉ Serving: Write Proxy and Read Proxy (ownership tracking, batching, compression, record cache, rate limiting, quota)

◉ Applications with different consumer models: DBs (e.g., Twitter’s Manhattan), Deferred RPC (queuing), Self-serve Pub/Sub, Stream Computing, Cross-DC Replication

DistributedLog at Twitter

◉ Manhattan Key/Value Store - WAL

◉ Durable Deferred RPC - Journal

◉ Real-Time Search Indexing - Change Propagation

◉ Self-serve Pub/Sub - Message Delivery, Ads Pipeline

◉ Stream Computing

○ Source & Sink

○ Stateful Processing in Heron (coming soon)

◉ Reliable Cross Datacenter Replication

Scale DistributedLog at Twitter

◉ 1.5 trillion records/day, 17.5 petabytes/day

◉ O(10) thousand streams, O(1) million live ledgers

◉ O(10^2) bookies, O(10^3) proxies

◉ Record sizes range from 100 bytes to 20 KB and beyond

◉ Data is retained from hours to days, and up to a year

◉ Replication factor of 3 or 5 (9 or 15 for global use cases)
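For a rough sense of scale, the daily totals above convert to average per-second rates as follows, assuming load spread evenly over the 86,400 seconds in a day and decimal units (1 PB = 10^15 bytes). Peak rates will be higher, and the slide does not say whether the byte total includes replication, so the second figure inherits that ambiguity.

$$
\frac{1.5\times10^{12}\ \text{records/day}}{86{,}400\ \text{s/day}} \approx 1.74\times10^{7}\ \text{records/s}
$$

$$
\frac{17.5\times10^{15}\ \text{bytes/day}}{86{,}400\ \text{s/day}} \approx 2.03\times10^{11}\ \text{bytes/s} \approx 203\ \text{GB/s}
$$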

DistributedLog Resources

◉ Website - https://distributedlog.io

◉ Mailing List - dev@distributedlog.incubator.apache.org

◉ Project Ideas - https://cwiki.apache.org/confluence/display/DL/Project+Ideas

◉ Paper - “DistributedLog: A high performance replicated log service” (ICDE 2017)

Yahoo! Pulsar (Cloud Messaging Service)

Yahoo! Pulsar

◉ Distributed Pub/Sub Messaging Platform

◉ Flexible Messaging Model - Topic and Queue

◉ Durable, Low Latency

◉ Strong Ordering and Consistency Guarantees

◉ Geo Replication

◉ Apache BookKeeper as Durable Message Store

Yahoo! Pulsar

Scale Pulsar at Yahoo!

◉ 100 billion messages per day

◉ More than 1.4 million topics

◉ Average publish latency across services of less than 5 ms

◉ 10+ data centers, cross-region replication

Pulsar Performance

Salesforce Distributed Store

Salesforce Application Storage

◉ Store for Persistent WAL, Data and Objects

◉ Low, Constant Write Latencies

◉ Low, Constant Random Read Latencies

◉ Highly Available, Consistent

◉ Distributed and Linearly Scalable

◉ On Commodity Hardware

Heterogeneous Stores

Roadmap, Releases, Future

Community


◉ 7 PMC Members

◉ 10+ Committers

◉ 20+ Active Contributors

◉ 5+ Companies actively using/contributing

○ Twitter

○ Yahoo!

○ Salesforce

○ Huawei

○ EMC

Release 4.5.0

◉ Netty 4 Upgrade - Performance Improvements

◉ Security (Authentication & Authorization) Support

◉ Explicit LAC

◉ Long Poll Read Support

◉ Auto Re-replication Improvements

◉ ...

Future

◉ Scalable Segment Store

○ Object, Log, File, Stream, …

◉ Long Term Storage

○ Disk Scrubber

○ Better Lifecycle Management

○ …

◉ Beyond the limit

○ 128-bit support

○ Scalable metadata management

Any questions?

You can find me at

◉ @sijieg

◉ guosijie@gmail.com

Thanks!