Slides at hcj13w (http://hcj2013w.eventbrite.com/)
Storage infrastructure using HBase behind LINE messages
NHN Japan Corp. LINE Server Task Force
Shunsuke Nakamura @sunsuk7tp
13.1.21 2 Hadoop Conference Japan 2013 Winter
To support LINE's users, we have built message storage that is:
• Large scale (tens of billions of rows/day)
• Responsive (under 10 ms)
• Highly available (dual clusters)
Outline
• About LINE • LINE & Storage requirements • What we achieved • Today’s topics
– IDC online migration – NN failover – Stabilizing LINE message cluster
• Conclusion
LINE - A global messenger powered by NHN Japan -
Devices: 5 different mobile platforms + desktop support
New year 2013 in Japan
3 times traffic explosion; LINE Storage had no problems :)
(plotted per 1 min)
[Chart: number of requests in an HBase cluster, usual peak hours vs. New Year 2013 — a 3x spike driven by greetings such as "あけおめ!" and "新年好!" ("Happy New Year!" in Japanese and Chinese)]
LINE on Hadoop: storages for service, backup, and log
• For HBase, M/R, and log archive
• Bulk migration and ad-hoc analysis
• For HBase and Sharded-Redis
• Collecting Apache and Tomcat logs
• KPI and log analysis
LINE service requirements
LINE is a…
• Messaging service: should be fast
• Global service: downtime not allowed
But it is not a simple messaging service: messages are synchronized between phones & PCs, so messages should be kept for a while.
LINE's storage requirements
• HA
• No data loss
• Low latency
• Easy scale-out
• Flexible schema management
• Eventual consistency
Our selection is HBase
• Low latency for large amounts of data
• Linearly scalable
• Relatively lower operating cost
  – Replication by nature
  – Automatic failover
• Data model fits our requirements
  – Semi-structured
  – Timestamp
[Chart: stored rows per day in a cluster, on a scale of 2–10 billion rows/day]
What we achieved with HBase
• No data loss
  – Persistent
  – Data replication
• Automatic recovery from server failure
• Reasonable performance for large data sets
  – Hundreds of billions of rows
  – Write: ~1 ms
  – Read: 1–10 ms
Many issues we had
• Heterogeneous storage coordination
• IDC online migration
• Flush & compaction storms caused by "too many HLogs"
• Row & column distribution
• Secondary index
• Region management
  – Load and size balancing
  – RS allocation
  – META region
  – M/R
• Monitoring for diagnostics
• Traffic burst on decommission
• NN problems
• Performance degradation
  – Hotspot problem
  – Timeout burst
  – GC problem
• Client bugs
  – Thread blocking on server failure (HBASE-6364)
Today’s topics
• IDC online migration
• NN failover
• Stabilizing LINE message cluster
IDC online migration
Why?
• Move whole HBase clusters and data
• For better network infrastructure
• Without downtime
IDC online migration
[Diagram: before migration, the app server writes only to src-HBase]
IDC online migration
• Write to both clusters (client-level replication)
[Diagram: the app server writes to both src-HBase and dst-HBase]
IDC online migration
• New data: incremental replication
• Old data: bulk migration
• dst's timestamp equals src's
[Diagram: the app server writes to both src-HBase and dst-HBase]
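Keeping the destination's timestamps equal to the source's is the key to the dual-write step: if each cluster assigned its own timestamp, the same message would end up as different cell versions. A minimal sketch of the idea (the `FakeTable` class and function names are illustrative stand-ins for HBase client handles, not LINE's actual code):

```python
import time

class FakeTable:
    """Stand-in for an HBase table handle (hypothetical, for illustration)."""
    def __init__(self):
        self.cells = []

    def put(self, rowkey, value, ts):
        # An HBase Put can carry an explicit timestamp; we record it here.
        self.cells.append((rowkey, ts, value))

def dual_write(src, dst, rowkey, value, ts=None):
    """Write one cell to both clusters with a single pinned timestamp,
    so the destination's cell versions equal the source's."""
    if ts is None:
        ts = int(time.time() * 1000)  # HBase timestamps are epoch millis
    src.put(rowkey, value, ts)
    dst.put(rowkey, value, ts)
    return ts
```

Because both writes share one timestamp, later bulk-migrated old data and incrementally replicated new data converge to identical versions on dst.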
LINE HBase Replicator & BulkMigrator
Replicator is for incremental replication BulkMigrator is for bulk migration
LINE HBase Replicator
• Our own implementation
• Prefer pull to push
• Throughput throttling
• Workload isolation of replicator and RS
• Rowkey conversion and filtering
[Diagram: HBase's built-in replication pushes from src-HBase to dst-HBase; LINE HBase Replicator instead pulls from src-HBase and writes to dst-HBase]
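The pull model combined with throttling can be sketched as a simple loop (names and the default rate are illustrative assumptions, not LINE's actual values):

```python
import time

def pull_replicate(fetch_batch, apply_batch, batch_size=500, max_rows_per_sec=5000):
    """Pull-model replication loop with throughput throttling (a sketch).

    fetch_batch(n) returns up to n pending edits from the source
    (an empty list means we are caught up); apply_batch(edits) writes
    them to the destination cluster. Running this outside the RS
    isolates the replicator's workload from serving traffic."""
    applied = 0
    while True:
        edits = fetch_batch(batch_size)
        if not edits:
            return applied  # caught up with the source
        apply_batch(edits)
        applied += len(edits)
        # Simple rate cap: sleep long enough that the sustained rate
        # never exceeds max_rows_per_sec on the destination cluster.
        time.sleep(len(edits) / max_rows_per_sec)
```

Pulling lets the replicator decide its own pace, which is what makes the throttling and workload isolation in the bullets above possible.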
LINE HBase Replicator: a simple daemon that replicates local regions
1. HLogTracker reads a checkpoint and selects the next HLog.
2. For each entry in the HLog:
   1. Filter & convert the HLog.Entry
   2. Create Puts and batch them to dst HBase
• Periodic checkpointing
• Generally, entries are replicated within seconds
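The steps above can be sketched as one replication pass (a simplification: `ckpt` is a one-element list standing in for a persisted checkpoint file, and `convert` models the filter/rowkey-conversion step):

```python
def replicate_pass(hlogs, ckpt, convert, apply_puts):
    """One pass of the replicator daemon over local HLogs (a sketch).

    hlogs: ordered list of (log_name, entries). convert(entry) filters
    and rewrites an entry (e.g. rowkey conversion), returning None to
    skip it; apply_puts(puts) batches the converted Puts to dst HBase."""
    for name, entries in hlogs:
        if ckpt[0] is not None and name <= ckpt[0]:
            continue  # this HLog was already replicated
        puts = [p for p in map(convert, entries) if p is not None]
        if puts:
            apply_puts(puts)
        ckpt[0] = name  # checkpoint after each fully-replicated log
```

Checkpointing per log means a restarted daemon resumes from the last fully replicated HLog rather than re-reading everything.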
Bulk migration
1. MapReduce between any storages
   – Map tasks only
   – Read source, write destination
   – Task scheduling problem depends on region allocation
2. Non-MapReduce version (BulkMigrator)
   – Our own implementation
   – HBase → HBase
   – On each RS, scan & batch by region
   – Throughput throttling
   – Slow, but easy to implement and debug
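The BulkMigrator approach (scan one region at a time, batch to the destination, throttle) can be sketched as follows; the parameter defaults and callback names are assumptions for illustration:

```python
import time

def bulk_migrate(regions, scan_region, batch_put, batch_size=1000,
                 max_rows_per_sec=20000):
    """BulkMigrator-style copy (a sketch): on each RS, scan one region
    at a time and write rows to the destination in batches, sleeping
    between batches to cap throughput on both clusters."""
    migrated = 0

    def flush(buf):
        nonlocal migrated
        batch_put(buf)
        migrated += len(buf)
        time.sleep(len(buf) / max_rows_per_sec)  # throughput throttling

    for region in regions:
        buf = []
        for row in scan_region(region):  # sequential scan of one region
            buf.append(row)
            if len(buf) >= batch_size:
                flush(buf)
                buf = []
        if buf:
            flush(buf)
    return migrated
```

Working region-by-region keeps each scan local to one RS, which is what makes this slower but much simpler to reason about than the MapReduce variant.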
NN failover
Background
• Our HBase has a SPOF: the NameNode
• Based on "Apache Hadoop HA Configuration" (http://blog.cloudera.com/blog/2009/07/hadoop-ha-configuration/)
• Furthermore, we added Pacemaker
  – Heartbeat can't detect whether the NN is running
Previous: HA-NN (DRBD + VIP + Pacemaker)
NameNode failure in 2012.10
HA-NN failover failed
• It was not the NameNode process that failed
• Incorrect leader election under network partitioning
• Complicated configuration
  – Easy to get wrong, difficult to control
  – Pacemaker scripting was not straightforward
  – VIP is risky for HDFS
• DRBD split-brain problem
  – Protocol C
  – Unable to re-sync while the service is online
Now: In-house NN failure handling
• Bye-bye old HA-NN
  – We had to restart whole HBase clusters after NN failover
• Alternative ideas
  – Quorum-based leader election (using ZK)
  – Using an L4 switch
  – Implementing our own AvatarNode
• We chose the safer solution, accepting a little downtime
In-house NN failure handling (1)
• rsync with --link-dest, run periodically
In-house NN failure handling (2)
[Diagram: NameNode failure]
In-house NN failure handling (3)
Stabilizing LINE message cluster
Stabilizing LINE message cluster: performance
• Case 1: "Too many HLogs"
• Case 2: Hotspot problems
• Case 3: META region workload isolation
• Case 4: Region mappings to RS
(also: RS GC storms, H/W failure handling)
Case 1: "Too many HLogs"
• Effect
  – MemStore flush storm
  – Compaction storm
• Cause
  – Different growth rates across regions
  – Heterogeneous tables on one RS
• Solution
  – Region balancing
  – External flush scheduler
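An external flush scheduler can spread flushes out instead of letting the RS hit its HLog limit and force-flush many regions at once. A sketch of one possible heuristic (the 32-log limit is a common HBase default, and the one-region-per-surplus-log rule is an illustrative simplification, not LINE's exact policy):

```python
def pick_regions_to_flush(oldest_unflushed_seq, hlog_count, max_hlogs=32):
    """External flush scheduler heuristic (a sketch).

    oldest_unflushed_seq: dict of region -> oldest unflushed sequence id
    (lower = older edits). Flushing the regions that pin the oldest
    edits lets the RS archive its oldest HLogs gradually, avoiding the
    flush storm that happens when HBase itself hits the limit."""
    excess = hlog_count - max_hlogs
    if excess <= 0:
        return []  # under the limit: nothing to do
    by_age = sorted(oldest_unflushed_seq, key=oldest_unflushed_seq.get)
    return by_age[:excess]  # flush oldest-pinning regions first
```

Run periodically, this keeps the HLog count hovering just under the limit rather than sawtoothing into forced-flush storms.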
Case 1: Number of HLogs
[Charts: HLog count over peak and off-peak hours. Worse case: nothing is flushed until forced flushes pile up into a flush storm. Better case: periodic flushes keep the HLog count low.]
Case 2: Hotspot problems
• Effect
  – Excessive GC
  – RS performance degradation (high CPU usage)
• Cause: Get/Scan on
  – A row or column updated too frequently
  – A row with too many columns (+ tombstones)
• Solution
  – Schema and row/column distribution are important
  – Hotspot region isolation
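One common way to get good row distribution is salting: prefix the rowkey with a hash-derived bucket so writes for hot keys spread across regions. This is an illustration of the general idea, not necessarily LINE's actual schema:

```python
import hashlib

def salted_rowkey(user_id, ts_millis, buckets=16):
    """Salted rowkey sketch: a fixed-width salt derived from the user id
    spreads that user's writes over `buckets` key ranges, so a hot user
    no longer hammers a single region."""
    salt = int(hashlib.md5(user_id.encode()).hexdigest(), 16) % buckets
    # Fixed-width fields keep lexicographic ordering sane within a bucket.
    return f"{salt:02d}|{user_id}|{ts_millis:013d}"
```

The trade-off is that a scan for one user stays cheap (one bucket), while a full time-range scan must fan out across all buckets.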
Case 3: META region workload isolation
• Effect
  1. High CPU on the RS
  2. Excessive timeouts
  3. META lookup timeouts
• Cause
  – Inefficient exception handling in the HBase client
  – Hotspot region and META on the same RS
• Solution
  – A META-only RS
Case 4: Region mappings to RS
• Effect
  – Region mappings are not restored on RS restart
  – Some mappings aren't restored properly even after a graceful restart (graceful_stop.sh --restart --reload)
• Cause
  – HBase does not support this well
• Solution
  – Periodically dump the mappings and restore them
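The dump-and-restore step can be sketched like this (a simplification: in practice the map would be read from META, and `move` would be the HBase admin API's region-move call):

```python
import json

def dump_assignments(assignments, path):
    """Persist the current region -> RegionServer map to disk (sketch)."""
    with open(path, "w") as f:
        json.dump(assignments, f)

def restore_assignments(current, path, move):
    """Compare the live map against the dump and call move(region, rs)
    for every region that is not on its recorded RegionServer."""
    with open(path) as f:
        wanted = json.load(f)
    for region, rs in wanted.items():
        if current.get(region) != rs:
            move(region, rs)
```

Because only mismatched regions are moved, re-running the restore after a restart is cheap and idempotent.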
Summary
• IDC online migration
  – Without downtime
  – LINE HBase Replicator & BulkMigrator
• NN failover
  – A solution simple enough for someone who asks "What's Hadoop?"
• Stabilizing LINE message cluster
  – Improved RS response time
Conclusion
We won 100M users by adopting HBase.
LINE Storage is a successful example of a messaging service using HBase.