Apache BookKeeper
A High Performance and Low Latency Storage Service
Hello!
I am Sijie Guo
◉ PMC Chair of Apache BookKeeper
◉ Co-creator of Apache DistributedLog
◉ Twitter Messaging/Pub-Sub Team
◉ Yahoo! R&D Beijing
Challenges in Distributed Systems
Expect Failures
up to 10% annual failure rates for disks/servers
Symptoms
Problem 1: Not Available
Problem 2: Inconsistencies
CAP
More Issues
Problem 3: Split Brain
[Diagram: Two Writers, each acting as Writer A and issuing Write A’]
Problem 4: Failure Detection
[Diagram: nodes A, B, C]
Problem 5: Recovery
[Diagram: nodes A, B, C]
Recovery Protocol
Consistency
Solutions
Enter Apache BookKeeper
Overview
BookKeeper - Durable Storage
A building block for reliable systems
Commodity Hardware
Durability
Replication, Consistency, Recovery
Client Library
Ledger Abstraction
Ledger
◉ Segment
◉ Block / Object
◉ Append-Only File
◉ ...
Guarantees
◉ If an entry has been acknowledged, it must be readable
◉ If an entry is read once, it must always be readable
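The ledger abstraction and its two guarantees can be illustrated with a tiny in-memory model. This is only a sketch: `InMemoryLedger`, `add_entry`, `read_entry` and `close` are hypothetical names chosen for illustration, not the BookKeeper client API, and nothing here is replicated or durable.

```python
class LedgerClosedError(Exception):
    """Raised when appending to a ledger that has been closed (sealed)."""


class InMemoryLedger:
    """Toy append-only ledger; NOT the BookKeeper client API."""

    def __init__(self):
        self._entries = []      # entry id == index in this list
        self._closed = False

    def add_entry(self, payload: bytes) -> int:
        """Append an entry and return its id once it counts as acknowledged."""
        if self._closed:
            raise LedgerClosedError("ledger is sealed; no more appends")
        self._entries.append(bytes(payload))
        return len(self._entries) - 1

    def read_entry(self, entry_id: int) -> bytes:
        """Any acknowledged entry id stays readable, before and after close."""
        if not 0 <= entry_id < len(self._entries):
            raise KeyError(entry_id)
        return self._entries[entry_id]

    def close(self) -> int:
        """Seal the ledger and return the id of its last entry (-1 if empty)."""
        self._closed = True
        return len(self._entries) - 1
```

Entries are never modified or removed in this model, so an acknowledged entry can be read any number of times and still reads the same after the ledger is closed.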
History
◉ Initial Use Case - Hadoop NameNode HA
◉ 2008: Open sourced as a contrib module of ZooKeeper
◉ 2011: Sub-Project of ZooKeeper
◉ 2012: Yahoo! Push Notification
◉ 2012~Now: DistributedLog, Pulsar, Majordodo
◉ 2015~Now: Salesforce Distributed Store
Inside of Apache BookKeeper
Details
Architecture
[Diagram: APP, Client, three Bookies, Metadata Store, Ledger]
Reliable Writes
◉ Store digest along with entry
◉ Fsync entries before responding
◉ Ack when accepted by a quorum
○ All Previous Entries
○ This Entry
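The write rules above can be sketched as follows. The code is a simplified illustration, not BookKeeper's implementation: CRC32 is assumed as the digest, and `make_entry`, `is_intact` and `AckTracker` are hypothetical names. An entry is acknowledged only once it, and every entry before it, has been accepted by `ack_quorum` bookies.

```python
import zlib


def make_entry(entry_id: int, payload: bytes) -> dict:
    """Attach a CRC32 digest so a reader can detect a corrupted entry."""
    return {"id": entry_id, "payload": payload, "digest": zlib.crc32(payload)}


def is_intact(entry: dict) -> bool:
    """Recompute the digest and compare it with the stored one."""
    return zlib.crc32(entry["payload"]) == entry["digest"]


class AckTracker:
    """Acks entry N only when N and every earlier entry reached the ack quorum."""

    def __init__(self, ack_quorum: int):
        self.ack_quorum = ack_quorum
        self._accepted = {}          # entry id -> set of bookies that accepted it
        self.last_acked = -1         # highest entry id acknowledged to the app

    def on_bookie_accept(self, entry_id: int, bookie: str) -> list:
        """Record one bookie response; return entry ids that become acked."""
        self._accepted.setdefault(entry_id, set()).add(bookie)
        newly_acked = []
        nxt = self.last_acked + 1
        # Advance only across a contiguous run of quorum-accepted entries.
        while len(self._accepted.get(nxt, ())) >= self.ack_quorum:
            newly_acked.append(nxt)
            self.last_acked = nxt
            nxt += 1
        return newly_acked
```

With `ack_quorum=2`, entry 1 reaching two bookies before entry 0 produces no ack; both are acknowledged together once entry 0 reaches its second bookie.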
Consistency - LastAddPushed
[Diagram: entry sequence 0 to 12 with the LastAddPushed pointer. Labels: Writer, Add entries]
Consistency - LastAddConfirmed
[Diagram: entry sequence 0 to 12 with the LastAddConfirmed pointer. Labels: Reader, Writer, Ownership Changed, Add entries, Ack Adds]
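The relationship between the two pointers can be sketched as below. It assumes a single writer whose acks arrive in entry order; `WriterView` and `readable_range` are hypothetical names for illustration. LastAddPushed runs ahead as entries are sent, LastAddConfirmed follows as they are acknowledged, and readers stay at or below LastAddConfirmed.

```python
class WriterView:
    """Tracks the two pointers from the slides for a single writer."""

    def __init__(self):
        self.last_add_pushed = -1      # highest entry id sent to bookies
        self.last_add_confirmed = -1   # highest entry id acknowledged in order

    def push(self) -> int:
        """Send the next entry and return its id."""
        self.last_add_pushed += 1
        return self.last_add_pushed

    def confirm(self, entry_id: int) -> None:
        """Record an ack; in this sketch acks must arrive in entry order."""
        if entry_id != self.last_add_confirmed + 1:
            raise ValueError("acks must be confirmed in entry order")
        if entry_id > self.last_add_pushed:
            raise ValueError("cannot confirm an entry that was never pushed")
        self.last_add_confirmed = entry_id


def readable_range(lac: int) -> range:
    """A reader may only consume entries up to and including LAC."""
    return range(0, lac + 1)
```

The invariant is `last_add_confirmed <= last_add_pushed`: entries between the two pointers are in flight and not yet visible to readers.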
Fencing
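As a rough sketch of the idea: when ownership changes, the new owner marks the ledger as fenced on bookies, and a fenced bookie rejects further writes to that ledger, so the old writer can no longer collect an ack quorum. The classes and `quorum_write` below are hypothetical and leave out the recovery steps that follow fencing.

```python
class FencedError(Exception):
    """Raised when a write hits a ledger that a bookie has fenced."""


class Bookie:
    def __init__(self, name: str):
        self.name = name
        self.fenced = set()      # ledger ids that no longer accept writes
        self.entries = {}        # (ledger id, entry id) -> payload

    def fence(self, ledger_id: int) -> None:
        self.fenced.add(ledger_id)

    def add_entry(self, ledger_id: int, entry_id: int, payload: bytes) -> None:
        if ledger_id in self.fenced:
            raise FencedError(f"{self.name}: ledger {ledger_id} is fenced")
        self.entries[(ledger_id, entry_id)] = payload


def quorum_write(bookies, ledger_id, entry_id, payload, ack_quorum) -> bool:
    """Return True only if at least ack_quorum bookies accept the write."""
    accepted = 0
    for bookie in bookies:
        try:
            bookie.add_entry(ledger_id, entry_id, payload)
            accepted += 1
        except FencedError:
            pass
    return accepted >= ack_quorum
```

With three bookies and an ack quorum of 2, fencing two of them is enough to make every later write from the old owner fail, which addresses the two-writers (split brain) problem described earlier.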
Read Entry & Read LAC
[Diagram: Client sends Read Entry K to bookies B1, B2, B3; Speculative Reads on Timeouts]
[Diagram: Client sends Read LAC to bookies B1, B2, B3; Quorum Read]
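Speculative reads on timeouts can be illustrated with a small deterministic simulation. This is a sketch under stated assumptions, not the client's actual read path: the client asks the first replica, then asks one more replica each time `speculative_timeout_ms` elapses without a response, and takes whichever answer arrives first. The function name and parameters are hypothetical.

```python
def speculative_read(latencies_ms, speculative_timeout_ms):
    """Simulate reading one entry from replicas with speculative retries.

    latencies_ms[i] is how long replica i takes to answer once asked, or
    None if it never answers. The client asks replica 0 at t=0 and asks the
    next replica each time speculative_timeout_ms passes without a response.
    Returns (winning replica index, completion time in ms), or None if no
    replica ever answers.
    """
    best = None
    for i, latency in enumerate(latencies_ms):
        sent_at = i * speculative_timeout_ms
        # Stop issuing requests once an earlier one has already completed.
        if best is not None and best[1] <= sent_at:
            break
        if latency is None:
            continue
        done_at = sent_at + latency
        if best is None or done_at < best[1]:
            best = (i, done_at)
    return best
```

For example, with latencies of 500 ms, 10 ms and 10 ms and a 50 ms speculative timeout, the read completes at 60 ms from the second replica instead of waiting 500 ms for the first.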
Long Poll Read
[Diagram: Client sends Long Poll Read to bookies B1, B2, B3; Speculative Long Poll]
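A long poll read lets a reader wait for LAC to advance instead of polling repeatedly. The sketch below models only the waiting behavior inside one process, using a condition variable; `LacNotifier`, `advance` and `long_poll` are hypothetical names, and the network protocol and the speculative variant are not modeled.

```python
import threading


class LacNotifier:
    """Lets readers block until LAC moves past the value they already know."""

    def __init__(self):
        self._lac = -1
        self._cond = threading.Condition()

    def advance(self, new_lac: int) -> None:
        """Move LAC forward (never backward) and wake any waiting readers."""
        with self._cond:
            if new_lac > self._lac:
                self._lac = new_lac
                self._cond.notify_all()

    def long_poll(self, known_lac: int, timeout_s: float):
        """Return the new LAC, or None if nothing changed before the timeout."""
        with self._cond:
            advanced = self._cond.wait_for(lambda: self._lac > known_lac, timeout_s)
            return self._lac if advanced else None
```

A reader passes the LAC it already knows; the call returns as soon as a larger value is available, or `None` when the timeout expires first.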
Inside a Bookie
Use Cases
Apache BookKeeper as a Building Block
Projects built on BookKeeper
◉ Twitter: Apache DistributedLog
◉ Yahoo: Pulsar - Cloud Messaging Service
◉ Salesforce: Distributed Store
◉ Huawei: HDFS NameNode HA
◉ HubSpot: WAL
◉ Majordodo: Distributed Resource Manager
Apache DistributedLog (Twitter)
Apache DistributedLog
[Diagram: a log stream of records from Oldest to Newest, divided into Log Segment X, Log Segment X+1 and Log Segment X+2, on top of Apache BookKeeper]
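The segmented log in the diagram can be modeled roughly as below: records get increasing sequence numbers, the stream rolls to a new segment when the current one is full, and old data is dropped one whole segment at a time. `LogStream` and its methods are hypothetical, and rolling by record count is an assumption made for this sketch.

```python
class LogStream:
    """Toy log stream made of fixed-capacity segments (hypothetical model)."""

    def __init__(self, records_per_segment: int):
        self.records_per_segment = records_per_segment
        self.segments = [[]]         # oldest segment first, newest last
        self.next_seq = 1            # sequence number of the next record

    def append(self, payload: bytes) -> int:
        """Append to the newest segment, rolling to a new one when it is full."""
        if len(self.segments[-1]) >= self.records_per_segment:
            self.segments.append([])
        seq = self.next_seq
        self.segments[-1].append((seq, payload))
        self.next_seq += 1
        return seq

    def truncate_oldest(self) -> int:
        """Drop the oldest segment, keeping the active one; return records dropped."""
        if len(self.segments) <= 1:
            return 0
        return len(self.segments.pop(0))

    def read_all(self):
        """Return all retained records from oldest to newest."""
        return [rec for seg in self.segments for rec in seg]
```

Truncating by segment keeps retention cheap: dropping old data removes a whole segment rather than rewriting the log.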
Apache DistributedLog
[Diagram: layered architecture, with a Metadata Store alongside the layers]
- Applications: Different Consumer models
  DBs (e.g., Twitter’s Manhattan), Deferred RPC (queuing), Self-serve Pub/Sub, Stream Computing, Cross DC Replication
- Serving: WriteProxy, ReadProxy
  Ownership Tracking; Batching, Compression; Record Cache; Rate Limiting, Quota
- Raw Streams: Log Streams
  Abstraction & Naming; Data Management; Efficient Write & Read; Intra-cluster & Geo Replication
- Segments: Log Segment Store (BK), Cold Storage (HDFS)
DistributedLog at Twitter
◉ Manhattan Key/Value Store - WAL
◉ Durable Deferred RPC - Journal
◉ Real-Time Search Indexing - Change Propagation
◉ Self-serve Pub/Sub - Message Delivery, Ads Pipeline
◉ Stream Computing
○ Source & Sink
○ Stateful Processing in Heron (coming soon)
◉ Reliable Cross Datacenter Replication
Scale DistributedLog at Twitter
◉ 1.5 trillion records/day, 17.5 petabytes/day
◉ O(10) thousand streams, O(1) million live ledgers
◉ O(10^2) bookies, O(10^3) proxies
◉ Record sizes range from 100 bytes to 20 KB and beyond
◉ Data is kept from hours to days, even up to a year
◉ Replication factor is 3 or 5; 9 or 15 for the global use case
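The daily totals above can be turned into per-second rates with a little arithmetic. This is a back-of-the-envelope sketch: it assumes 1 PB = 10^15 bytes and treats the 17.5 PB/day figure as the total size of the 1.5 trillion records, which the slide does not state explicitly (it may, for instance, include replication).

```python
SECONDS_PER_DAY = 24 * 60 * 60

records_per_day = 1.5e12          # 1.5 trillion records/day (from the slide)
bytes_per_day = 17.5e15           # 17.5 PB/day, assuming 1 PB = 10**15 bytes

records_per_second = records_per_day / SECONDS_PER_DAY
bytes_per_second = bytes_per_day / SECONDS_PER_DAY
avg_bytes_per_record = bytes_per_day / records_per_day

print(f"{records_per_second / 1e6:.1f} million records/s")
print(f"{bytes_per_second / 1e9:.1f} GB/s")
print(f"{avg_bytes_per_record / 1e3:.1f} KB per record on average")
```

Under these assumptions the numbers work out to roughly 17 million records per second, about 200 GB per second, and an average of about 11.7 KB per record, which sits inside the 100 bytes to 20 KB range quoted above.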
DistributedLog Resources
◉ Website - https://distributedlog.io
◉ Mail List -
◉ Project Ideas - https://cwiki.apache.org/confluence/display/DL/Project+Ideas
◉ Paper - “DistributedLog: A high performance replicated log service” (ICDE 2017)
Yahoo! Pulsar (Cloud Messaging Service)
Yahoo! Pulsar
◉ Distributed Pub/Sub Messaging Platform
◉ Flexible Messaging Model - Topic and Queue
◉ Durable, Low Latency
◉ Strong Ordering and Consistency Guarantees
◉ Geo Replication
◉ Apache BookKeeper as Durable Message Store
Scale Pulsar at Yahoo!
◉ 100 billion messages per day
◉ More than 1.4 million topics
◉ Average publish latency across services under 5 ms
◉ 10+ data centers, cross-region replication
Pulsar Performance
Salesforce Distributed Store
Salesforce Application Storage
◉ Store for Persistent WAL, Data and Objects
◉ Low, Constant Write Latencies
◉ Low, Constant Random Read Latencies
◉ Highly Available, Consistent
◉ Distributed and Linearly Scalable
◉ On Commodity Hardware
Heterogeneous Stores
Roadmap, Releases, Future
Community
◉ 7 PMC Members
◉ 10+ Committers
◉ 20+ Active Contributors
◉ 5+ Companies actively using/contributing
○ Twitter
○ Yahoo!
○ Salesforce
○ Huawei
○ EMC
Release 4.5.0
◉ Netty 4 Upgrade - Performance Improvements
◉ Security (Authentication & Authorization) Support
◉ Explicit LAC
◉ Long Poll Read Support
◉ Auto Re-replication Improvements
◉ ...
Future
◉ Scalable Segment Store
○ Object, Log, File, Stream, …
◉ Long Term Storage
○ Disk Scrubber
○ Better Lifecycle Management
○ …
◉ Beyond the limit
○ 128 bits support
○ Scalable metadata management