Replicated RocksDB at Pinterest
Bo Liu, Software Engineer, Serving Systems
Pinterest Engineering
August 31, 2016
Example 1: Online event tracking system
Writes (via Kafka):
• John saw Pin 1, Pin 2, … Pin K at Time T
Reads:
• Fetch the last 1,000 Pins seen by John
• Fetch the number of Pins seen by John between Time T1 and T2
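To make the two reads concrete: on an ordered key-value store such as RocksDB, keying each event by user and then timestamp turns both reads into range scans. The sketch below assumes that key layout (the talk does not show the real schema) and uses a sorted Python list with `bisect` as a stand-in for RocksDB's sorted key space; `EventStore` and its methods are hypothetical names.

```python
import bisect
import math
from typing import List, Tuple

Key = Tuple[str, int, str]  # (user, timestamp, pin)


class EventStore:
    """Toy stand-in for an ordered key-value store such as RocksDB.

    Keys stay sorted by (user, timestamp, pin), so one user's events are
    contiguous and time-ordered, as they would be under an assumed RocksDB
    key layout of user id, separator, big-endian timestamp.
    """

    def __init__(self) -> None:
        self._keys: List[Key] = []

    def record_view(self, user: str, ts: int, pins: List[str]) -> None:
        # Write: "John saw Pin 1, Pin 2, ... Pin K at Time T" becomes K keys.
        for pin in pins:
            bisect.insort(self._keys, (user, ts, pin))

    def _bounds(self, user: str, t1: float, t2: float) -> Tuple[int, int]:
        # Half-open index range of this user's events with t1 <= ts <= t2
        # (timestamps are integers, so "ts < t2 + 1" means "ts <= t2").
        lo = bisect.bisect_left(self._keys, (user, t1))
        hi = bisect.bisect_left(self._keys, (user, t2 + 1))
        return lo, hi

    def last_n(self, user: str, n: int) -> List[str]:
        # Read: the newest n Pins, a reverse scan from the end of the user's range.
        lo, hi = self._bounds(user, -math.inf, math.inf)
        return [pin for _, _, pin in reversed(self._keys[max(lo, hi - n):hi])]

    def count_between(self, user: str, t1: int, t2: int) -> int:
        # Read: number of Pins seen in [t1, t2], the size of a range scan.
        lo, hi = self._bounds(user, t1, t2)
        return hi - lo
```

In RocksDB itself, `last_n` and `count_between` would correspond to iterator seeks over the same key range rather than list slicing.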
Example 2: Board-based Pin retrieval and ranking system
Writes (via Kafka):
• John just followed Board 1
• Pin 1 was just saved to Board 1
Reads:
• Fetch the most relevant Pins from the Boards John follows
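Example 2's read joins two write streams: follow edges (user to Board) and saves (Board to Pin). A minimal sketch under stated assumptions: in-memory maps replace the storage layer, `BoardPinIndex` and its methods are hypothetical, and ranking is a placeholder that simply prefers the most recently saved Pins, not the relevance model the real system uses.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


class BoardPinIndex:
    def __init__(self) -> None:
        self.follows: Dict[str, Set[str]] = defaultdict(set)  # user -> Boards followed
        self.board_pins: Dict[str, List[Tuple[int, str]]] = defaultdict(list)  # Board -> (save time, Pin)

    def follow(self, user: str, board: str) -> None:
        # Write: "John just followed Board 1".
        self.follows[user].add(board)

    def save(self, pin: str, board: str, ts: int) -> None:
        # Write: "Pin 1 was just saved to Board 1".
        self.board_pins[board].append((ts, pin))

    def top_pins(self, user: str, k: int) -> List[str]:
        # Read: gather candidate Pins from every followed Board, then rank.
        # Placeholder score: latest save time wins, ties broken by Pin id.
        latest: Dict[str, int] = {}
        for board in self.follows.get(user, ()):
            for ts, pin in self.board_pins.get(board, ()):
                latest[pin] = max(ts, latest.get(pin, ts))
        ranked = sorted(latest.items(), key=lambda item: (-item[1], item[0]))
        return [pin for pin, _ in ranked[:k]]
```

Because both write streams update the index independently, a newly followed Board or a newly saved Pin shows up in the next read without any batch recomputation.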
Example 3: Distributed storage system with data structure support
Writes (via Kafka):
• Add u to HyperLogLog A
• Add e to List B
Reads:
• Fetch List B
• Fetch the unique member count of HyperLogLog A
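Example 3 needs the server to understand data structures, not just opaque values. As a refresher on the HyperLogLog half, here is a minimal sketch of the standard estimator (Flajolet et al.: 2^p registers, harmonic-mean estimate, linear counting for small cardinalities; relative standard error about 1.04/√m, so roughly 1.6% at p = 12) alongside a plain list. Assumptions: BLAKE2b truncated to 64 bits is an arbitrary hash choice for the sketch, state lives in memory rather than in RocksDB, and `DataStructureStore`, `hll_add`, `hll_count`, `list_append`, `list_get` are hypothetical names, not the system's API.

```python
import hashlib
import math
from typing import Dict, List


class HyperLogLog:
    """Minimal HyperLogLog cardinality estimator (illustrative only)."""

    def __init__(self, p: int = 12) -> None:
        self.p = p                      # index bits
        self.m = 1 << p                 # number of registers
        self.registers = [0] * self.m
        self.alpha = 0.7213 / (1 + 1.079 / self.m)  # bias constant, valid for m >= 128

    def add(self, item: str) -> None:
        # 64-bit hash: the top p bits pick a register, the rest give the rank.
        digest = hashlib.blake2b(item.encode(), digest_size=8).digest()
        h = int.from_bytes(digest, "big")
        idx = h >> (64 - self.p)
        rest = h & ((1 << (64 - self.p)) - 1)
        rank = (64 - self.p) - rest.bit_length() + 1  # position of the leftmost 1-bit
        self.registers[idx] = max(self.registers[idx], rank)

    def count(self) -> int:
        raw = self.alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            return round(self.m * math.log(self.m / zeros))  # linear counting
        return round(raw)


class DataStructureStore:
    """Hypothetical per-key data structures, kept in memory instead of RocksDB."""

    def __init__(self) -> None:
        self.hlls: Dict[str, HyperLogLog] = {}
        self.lists: Dict[str, List[str]] = {}

    def hll_add(self, key: str, member: str) -> None:    # "Add u to HyperLogLog A"
        hll = self.hlls.get(key)
        if hll is None:
            hll = self.hlls[key] = HyperLogLog()
        hll.add(member)

    def hll_count(self, key: str) -> int:                # "Fetch the unique member count"
        return self.hlls[key].count() if key in self.hlls else 0

    def list_append(self, key: str, elem: str) -> None:  # "Add e to List B"
        self.lists.setdefault(key, []).append(elem)

    def list_get(self, key: str) -> List[str]:           # "Fetch List B"
        return list(self.lists.get(key, []))
```

Re-adding a member never changes the registers, which is why duplicate events from the write stream do not inflate the count.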
Common system architecture
Diagram: Application API and Admin API over Application Logic and Admin Logic, four RocksDB instances, and a RocksDB Replicator, with an Admin tool and ZooKeeper outside the server.
• Generate cluster config (Admin tool, stored in ZooKeeper)
• Load config when start
• Create/Open DB
• Add/Remove DB for replication
• Data replication of local updates and remote updates
• Cluster management (Admin tool → Admin API)
• GetDB()
• Read/Write
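Read in build order, the diagram implies a startup path followed by a serving path. A minimal sketch of that sequence, assuming Admin Logic performs the startup steps and that the cluster config is a dict mapping DB name to a role and optional upstream (the real ZooKeeper config format is not shown); `AdminLogic`, `StubReplicator`, and the dict-backed "DBs" are hypothetical stand-ins.

```python
from typing import Dict, Optional, Tuple


class StubReplicator:
    """Records AddDB()/RemoveDB() calls; a stand-in for the RocksDB Replicator."""

    def __init__(self) -> None:
        self.dbs: Dict[str, Tuple[str, Optional[str]]] = {}

    def add_db(self, name: str, role: str, upstream: Optional[str]) -> None:
        self.dbs[name] = (role, upstream)

    def remove_db(self, name: str) -> None:
        self.dbs.pop(name, None)


class AdminLogic:
    def __init__(self, replicator: StubReplicator) -> None:
        self.replicator = replicator
        self.dbs: Dict[str, dict] = {}  # name -> stand-in for an open RocksDB

    def start(self, cluster_config: Dict[str, dict]) -> None:
        # Load config when start (passed in here instead of read from ZooKeeper).
        for name, spec in cluster_config.items():
            self.dbs[name] = {}  # Create/Open DB
            # Add DB for replication with its per-instance role.
            self.replicator.add_db(name, spec["role"], spec.get("upstream"))

    def get_db(self, name: str) -> dict:
        # GetDB(): Application Logic gets a handle, then reads/writes it directly.
        return self.dbs[name]
```

Keeping the DB handles in Admin Logic and handing them out through `get_db` means Application Logic never needs to know which DBs are masters or slaves.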
RocksDB replicator design
• Support async Master-Slave replication only
• Replicate multiple RocksDBs in one process
• Replication role at RocksDB instance level
• Work reactively (AddDB(), RemoveDB())
• Low replication latency
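The design points suggest a registry inside the replicator process: many DBs, each with its own role, registered and unregistered on demand rather than managed by the replicator itself. A minimal sketch under that reading; `ReplicatorRegistry`, `Role`, `ReplicatedDB`, and the `ip:port` upstream string are hypothetical, and the rule that only masters serve update requests is an assumption of this sketch.

```python
import threading
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Optional


class Role(Enum):
    MASTER = "master"
    SLAVE = "slave"


@dataclass
class ReplicatedDB:
    name: str
    role: Role
    upstream: Optional[str] = None  # "ip:port" of the master, slaves only


class ReplicatorRegistry:
    """One replicator per process, many DBs; the role is per DB, not per host."""

    def __init__(self) -> None:
        self._lock = threading.Lock()  # AddDB()/RemoveDB() may race with worker threads
        self._dbs: Dict[str, ReplicatedDB] = {}

    def add_db(self, name: str, role: Role, upstream: Optional[str] = None) -> None:
        if role is Role.SLAVE and not upstream:
            raise ValueError("a slave DB needs an upstream ip:port")
        with self._lock:
            if name in self._dbs:
                raise KeyError(f"{name} is already registered")
            self._dbs[name] = ReplicatedDB(name, role, upstream)

    def remove_db(self, name: str) -> None:
        with self._lock:
            self._dbs.pop(name, None)

    def slaves_to_pull(self) -> List[ReplicatedDB]:
        # Only slave DBs pull from an upstream; masters just answer requests.
        with self._lock:
            return [db for db in self._dbs.values() if db.role is Role.SLAVE]

    def can_serve(self, name: str) -> bool:
        # Only master DBs answer "get updates since SEQ#" in this sketch.
        with self._lock:
            db = self._dbs.get(name)
            return db is not None and db.role is Role.MASTER
```

Since roles live on each `ReplicatedDB`, one process can be master for some DBs and slave for others at the same time.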
RocksDB replicator implementation
• RocksDB WAL sequence # as global replication sequence #
• fbthrift for RPC
• Pull & Push
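Using the WAL sequence number as the replication sequence number means a slave's own latest sequence number doubles as its replication cursor: it asks upstream for everything after that number and applies the batches under the master's numbers. RocksDB's C++ API exposes `DB::GetLatestSequenceNumber()` and `DB::GetUpdatesSince()` for this kind of tailing; the Python sketch below only mimics the idea with an in-memory log. `SeqLog` and its methods are hypothetical, and assigning one number per batch is a simplification (RocksDB's own numbering is finer-grained).

```python
from typing import Dict, List, Tuple

Batch = Dict[str, str]  # key -> value writes applied together


class SeqLog:
    """In-memory stand-in for a RocksDB instance plus its WAL."""

    def __init__(self) -> None:
        self.data: Dict[str, str] = {}
        self.log: List[Tuple[int, Batch]] = []  # (sequence number, batch), ascending

    def latest_seq(self) -> int:
        return self.log[-1][0] if self.log else 0

    def write(self, batch: Batch) -> int:
        # Local write on a master: assign the next sequence number.
        return self.apply(self.latest_seq() + 1, batch)

    def apply(self, seq: int, batch: Batch) -> int:
        # Apply a batch under a given sequence number (slaves reuse the master's).
        if seq <= self.latest_seq():
            raise ValueError("sequence numbers must increase")
        self.data.update(batch)
        self.log.append((seq, batch))
        return seq

    def updates_since(self, seq: int) -> List[Tuple[int, Batch]]:
        # Everything strictly after `seq`; a linear scan keeps the sketch short.
        return [(s, b) for s, b in self.log if s > seq]
```

A slave catches up with `for seq, batch in master.updates_since(slave.latest_seq()): slave.apply(seq, batch)`; no separate replication offset has to be stored.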
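RocksDB replicator workflow
Diagram: a replicator process with a Thrift server and worker threads, hosting DB1 (Master) and DB2 (Slave, Upstream: ip_Port).
Slave side (DB2):
• Take the latest SEQ # of DB2
• Get updates since SEQ# for DB2 from the upstream
• Receive the updates since SEQ# for DB2
• Apply updates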
Master side (DB1), updates available:
• A "Get updates since SEQ# for DB1" request arrives at the Thrift server, which sends it to a worker thread
• Has updates since SEQ#? Yes, this is the data
• Response goes back to the requesting slave
Master side (DB1), no updates yet:
• A "Get updates since SEQ# for DB1" request is handed to a worker thread
• Has updates since SEQ#? No, wait for my notification
• Writes arrive at DB1
• These are the new updates
• Response goes back to the requesting slave
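The long-poll exchange on the master ("No, wait for my notification" followed by "These are the new updates") maps naturally onto a condition variable: the worker thread handling a request waits until a write pushes the latest sequence number past the requested one, and writers notify waiters. A minimal in-process sketch, assuming a direct method call in place of the fbthrift RPC and a caller-chosen timeout so a parked request cannot hang forever; `MasterDB`, `SlaveDB`, `get_updates_since`, and `pull_once` are hypothetical names.

```python
import threading
from typing import Dict, List, Tuple

Batch = Dict[str, str]


class MasterDB:
    """Serves "get updates since SEQ#" and can park a request until new writes land."""

    def __init__(self) -> None:
        self._log: List[Tuple[int, Batch]] = []
        self._cond = threading.Condition()

    def write(self, batch: Batch) -> int:
        with self._cond:
            seq = (self._log[-1][0] if self._log else 0) + 1
            self._log.append((seq, batch))
            self._cond.notify_all()  # "These are the new updates": wake parked requests
            return seq

    def get_updates_since(self, seq: int, timeout: float) -> List[Tuple[int, Batch]]:
        # Runs on a worker thread. "Has updates since SEQ#?" If not, wait
        # for a writer's notification or give up after `timeout` seconds.
        with self._cond:
            self._cond.wait_for(lambda: bool(self._log) and self._log[-1][0] > seq,
                                timeout=timeout)
            return [(s, b) for s, b in self._log if s > seq]


class SlaveDB:
    def __init__(self, upstream: MasterDB) -> None:
        self.upstream = upstream  # the real system would use a Thrift client for ip:port
        self.data: Dict[str, str] = {}
        self.latest_seq = 0

    def pull_once(self, timeout: float = 1.0) -> int:
        # Send the latest local SEQ#, then apply the response in sequence order.
        updates = self.upstream.get_updates_since(self.latest_seq, timeout)
        for seq, batch in updates:
            self.data.update(batch)
            self.latest_seq = seq
        return len(updates)
```

A slave would call `pull_once()` in a loop: when the master already has data the call returns immediately (the "Yes, this is the data" path), otherwise it blocks until the next write or the timeout. This is one way a pull-based protocol can still deliver updates with push-like latency.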
Performance
• Production load: 1 MB/s, P99 12 ms, Max 60 ms
• Synthetic load: 76 MB/s, P99 106 ms, Max 224 ms
• Developer velocity: Build a production-quality real-time counter service in one week
Open source - coming soon
Thank you
Serving Systems Team @Pinterest: Bo Liu, Shu Zhang, Jian Fang, Jinru He, Linda Lo, Yongsheng Wu
Data Analytics Team @Pinterest: Bryant Xiao, Justin Mejorada Pier, Shuo Xiang, Qingxian Lai, Tien Nguyen, Chunyan Wang
Q&A