hbasecon
Removable Singularity: A Story of HBase Upgrade at Pinterest
Tian-Ying Chang, Storage & Caching, Engineering @ Pinterest
Agenda
§ Usage of HBase in Pinterest
§ Upgrading Situation and Challenges
§ Migration Steps
§ Performance Tuning
§ Final Notes
HBase @ Pinterest
§ Early applications: homefeed, search & discovery
§ UserMetaStore: a generic storage layer for all applications using user KV data
§ Zen: a graph service layer
• Many applications can be modeled as a graph
• Usage of HBase flourished after the Zen release
§ Other misc. use cases, e.g., analytics, OpenTSDB

40+ HBase clusters on 0.94
Need Upgrading
§ HBase 0.94 is no longer supported by the community
§ Newer versions have better reliability, availability and performance
§ Easier to contribute back to the community
Singularity of HBase 0.96
§ RPC protocol changes
§ Data format changes
§ HDFS folder structure changes
§ API changes
§ Generally considered "impossible" to upgrade live, without downtime
http://blog.cloudera.com/blog/2012/06/the-singularity-hbase-compatibility-and-extensibility/
The Dilemma
§ Singularity: cannot live-upgrade from HBase 0.94 to a later version
• Needs downtime
§ Pinterest hosts critical online real-time services on HBase
• Cannot afford downtime
§ Stuck?
Fast Forward
§ Successfully upgraded production clusters last year
• Chose one of the most loaded clusters as the pilot
• No application redeploy needed on the day of switching to the 1.2 cluster
• Live switch with no interruption to the Pinterest site
§ Big performance improvement
[Chart: P99 latency of different APIs, annotated with the point of live switch]
How did we do it?
[Diagram sequence: a client, locating the cluster via ZKClient, reads/writes against HBase 0.94; a second cluster is attached step by step]

Known procedure: migrating between two 0.94 clusters using native replication
1. Build empty cluster
2. Set up replication
3. Export snapshot
4. Recover table from snapshot
5. Replication drain
Then switch the client to the new cluster.

The same procedure from 0.94 to 1.2, with native replication swapped for a non-native replication path:
1. Build empty 1.2 cluster
2. Set up replication
3. Export snapshot
4. Recover 1.2 table from 0.94 snapshot
5. Replication drain
Then switch the client to the 1.2 cluster, keeping replication back to 0.94.
Major Problems to Solve
§ The client must be able to talk to both 0.94 and 1.2 automatically
§ Data must be replicated between HBase 0.94 and 1.2 bidirectionally
AsyncHBase Client
§ Chose AsyncHBase for its better throughput and latency
• Stock AsyncHBase 1.7 can talk to both 0.94 and later versions by detecting the HBase version and using the matching protocol
§ But we could not directly use stock AsyncHBase 1.7
• We had made many private improvements internally
• Needed to make those private features work with the 1.2 cluster
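The dual-protocol idea above can be sketched in a few lines. This is not AsyncHBase code; it is a hypothetical Python illustration of dispatching on a detected server version, where the function and protocol names are assumptions:

```python
# Sketch only: choose a wire protocol based on the server's version
# string, mimicking how a client can support both the pre-0.96 RPC
# format and the protobuf-based RPC introduced in 0.96+.

def parse_version(version_string):
    """Turn a version like '0.94.27' into a comparable tuple (0, 94, 27)."""
    return tuple(int(part) for part in version_string.split("."))

def choose_protocol(server_version):
    """Return which RPC dialect to speak to a given HBase server."""
    if parse_version(server_version) < (0, 96):
        return "writable-rpc"   # pre-singularity wire format
    return "protobuf-rpc"       # 0.96+ wire format
```

The key point is that the version probe happens once per server connection, so the same client instance can serve reads and writes against both clusters during the migration.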
AsyncHBase Client Improvements
§ BatchGet (open sourced in OpenTSDB 2.4)
§ SmallScan
§ Ping feature to handle AWS network issues
§ Pluggable metrics framework
§ Metrics broken down by RPC/region/RS
• Useful for debugging issues with better slicing and dicing
§ Rate-limiting feature
• Automatically throttle/blacklist requests based on, e.g., latency
• Easier and better place to do throttling than on the HBase RS side
§ Open sourcing in progress
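The latency-based throttling bullet can be illustrated with a minimal sketch. This is an assumption about the general shape of such a feature, not Pinterest's implementation; the class and threshold values are hypothetical:

```python
from collections import deque

class LatencyThrottler:
    """Hypothetical sketch of client-side throttling: if recent request
    latencies to a destination exceed a threshold, start rejecting
    requests before they ever reach the region server."""

    def __init__(self, threshold_ms=50.0, window=100):
        self.threshold_ms = threshold_ms
        # Keep only the most recent `window` latency samples.
        self.samples = deque(maxlen=window)

    def record(self, latency_ms):
        """Record the observed round-trip latency of a completed request."""
        self.samples.append(latency_ms)

    def should_throttle(self):
        """Throttle once a full window of samples averages above threshold."""
        if len(self.samples) < self.samples.maxlen:
            return False  # not enough data yet
        avg = sum(self.samples) / len(self.samples)
        return avg > self.threshold_ms
```

Doing this in the client, as the slide argues, lets each application shed its own load instead of pushing the decision onto an already-overloaded region server.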
Live Data Migration
§ Export snapshots from 0.94, recover tables in 1.2
• Relatively easy, since we were already doing it between 0.94 clusters
• Modified our existing DR/backup tool to work between 0.94 and 1.2
§ Bidirectional live replication between 0.94 and 1.2
• Breaking changes in the RPC protocol mean native replication does not work
• Used thrift replication to overcome the issue
Thrift Replication
§ Patch from Flurry: HBASE-12814
§ Fixed a bug in the 0.98/1.2 version
• Threading bug exposed during production data testing at high write QPS
• Fixed by implementing a thrift client connection pool for each replication sink
• The fix also made replication more performant
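The connection-pool fix can be sketched generically. This is a minimal illustration of the pattern (one pool of clients per replication sink, checked out per thread), not the actual HBase patch; all names are hypothetical:

```python
import queue

class ConnectionPool:
    """Sketch of a per-sink connection pool: instead of sharing one
    thrift client across replication threads (the source of the
    threading bug), each thread checks a connection out, uses it,
    and returns it."""

    def __init__(self, factory, size=4):
        self._pool = queue.Queue()
        for _ in range(size):
            self._pool.put(factory())

    def checkout(self):
        # Blocks if all connections are currently in use.
        return self._pool.get()

    def checkin(self, conn):
        self._pool.put(conn)
```

Beyond fixing the race, a pool also explains the performance gain the slide mentions: multiple sink connections let replication ship edits concurrently instead of serializing on a single client.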
§ Bidirectional replication is needed for potential rollback
§ Verification!!
• Chckr: tool for checking data replication consistency between the 0.94 and 1.2 clusters
• Uses a configurable timestamp parameter to eliminate false positives caused by replication delay
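The timestamp trick can be shown concretely. This is a hypothetical sketch of a Chckr-style comparison (the function and data shapes are assumptions): cells written after a cutoff are skipped, because they may simply not have replicated yet.

```python
def compare_rows(rows_a, rows_b, cutoff_ts):
    """Report keys whose values differ between two clusters, skipping
    cells written after `cutoff_ts` so in-flight replication does not
    show up as a false positive.

    rows_a / rows_b map key -> (value, write_timestamp)."""
    mismatches = []
    for key in sorted(set(rows_a) | set(rows_b)):
        va = rows_a.get(key)
        vb = rows_b.get(key)
        # Newest timestamp seen for this key on either side.
        ts = max(v[1] for v in (va, vb) if v is not None)
        if ts > cutoff_ts:
            continue  # too recent: replication may still be catching up
        if va != vb:
            mismatches.append(key)
    return mismatches
```

Setting the cutoff a comfortable margin behind "now" (larger than the observed replication lag) is what turns a noisy diff into an actionable consistency report.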
Upgrade Steps (Recap)
§ Build a new, empty 1.2 cluster
§ Set up master/master thrift replication between 0.94 and 1.2
§ Export a snapshot from 0.94 into the 1.2 cluster
§ Recover the table in the 1.2 cluster
§ Drain replication
§ Monitor health metrics
§ Switch the client to the 1.2 cluster
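The recap above is an ordered runbook, and ordering is the whole point: the client switch must come last, after replication has drained and health metrics look good. A minimal driver sketch, where every step name stands in for real HBase tooling (snapshot export, clone, replication admin, client config flip) and the callables are hypothetical:

```python
# Ordered runbook for the upgrade; each entry is executed in sequence.
UPGRADE_STEPS = [
    "build_empty_1_2_cluster",
    "setup_thrift_replication",
    "export_snapshot",
    "recover_table_from_snapshot",
    "drain_replication",
    "monitor_health_metrics",
    "switch_client_to_1_2",
]

def run_upgrade(actions):
    """Execute the runbook in order; `actions` maps step name -> callable
    returning True on success. Stops at the first failure and returns the
    steps completed, so the operator can roll back (replication is kept
    bidirectional for exactly that reason)."""
    done = []
    for step in UPGRADE_STEPS:
        if not actions[step]():
            break
        done.append(step)
    return done
```

Stopping on first failure rather than continuing is deliberate: every later step assumes the earlier ones hold, e.g. recovering the table only makes sense against a cluster that already has replication queuing up new edits.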
Performance
§ Used production dark read/write traffic to verify performance
§ Measured round-trip latency at the AsyncHBase layer
• Cannot compare server-side latency directly, since 1.2 reports p999 server-side latency while 0.94 does not
• Used the metric breakdown by RPC/region/RS with our Ostrich implementation to compare performance
[Charts: latency comparison for Get, BatchGet, Put, CompAndSet, Delete]
Read Performance
§ Default config has worse p99 latency**
• Bucket cache hurts p99 latency due to bad GC
§ Short-circuit read helps latency
§ Use CMS instead of G1GC
§ Native checksum helps latency

** The read-path off-heap feature from 2.0 should help a lot (HBASE-17138)
Write Performance
§ Used a write-heavy use case to expose perf issues
§ Metrics showed much higher WAL sync ops than 0.94
§ The Disruptor-based WAL sync implementation caused too many WAL sync operations
§ hbase.regionserver.hlog.syncer.count defaults to 5; changed it to 1
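The syncer-count change is a plain configuration edit. A sketch of the corresponding hbase-site.xml fragment, assuming the property is set cluster-wide on the region servers:

```xml
<!-- hbase-site.xml: reduce the number of WAL syncer threads from the
     default of 5 to 1, cutting the redundant sync operations described
     above for this write-heavy workload -->
<property>
  <name>hbase.regionserver.hlog.syncer.count</name>
  <value>1</value>
</property>
```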
Thanks!
Community help from Michael Stack and Rahul Gidwani
© Copyright, All Rights Reserved, Pinterest Inc. 2017
We are hiring!