43
Page 1 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Operating and Supporting Apache HBase - Best Practices and Improvements Tanvir Kherada ([email protected]) Enis Soztutar ([email protected])

Operating and Supporting Apache HBase Best Practices and Improvements

Embed Size (px)

Citation preview

Page 1: Operating and Supporting Apache HBase Best Practices and Improvements

Page 1 © Hortonworks Inc. 2011 – 2014. All Rights Reserved

Operating and Supporting Apache HBase - Best Practices and Improvements

Tanvir Kherada ([email protected])Enis Soztutar ([email protected])

Page 2: Operating and Supporting Apache HBase Best Practices and Improvements

Page 2 © Hortonworks Inc. 2011 – 2014. All Rights Reserved

About Us

Tanvir Kherada

Primary SME for HBase / Phoenix

Technical team lead @Hortonworks support

Enis Soztutar

Committer and PMC member in Apache HBase, Phoenix, and Hadoop

HBase/Phoenix dev @Hortonworks

Page 3: Operating and Supporting Apache HBase Best Practices and Improvements

Page 3 © Hortonworks Inc. 2011 – 2014. All Rights Reserved

Outline

Tools to debug: HBase UI and HBCK Top 3 categories of issues SmartSense Improvements for better operability Metrics and Alerts

Page 4: Operating and Supporting Apache HBase Best Practices and Improvements

Page 4 © Hortonworks Inc. 2011 – 2014. All Rights Reserved

Tools

Page 5: Operating and Supporting Apache HBase Best Practices and Improvements

Page 5 © Hortonworks Inc. 2011 – 2014. All Rights Reserved

HBase UI

Load Distribution Debug Dump Runtime Configuration RPC Tasks

Page 6: Operating and Supporting Apache HBase Best Practices and Improvements

Page 6 © Hortonworks Inc. 2011 – 2014. All Rights Reserved

HBase UI – Load Distribution Request Per Second Read Request Count per RegionServer Write Request Count per RegionServer

Page 7: Operating and Supporting Apache HBase Best Practices and Improvements

Page 7 © Hortonworks Inc. 2011 – 2014. All Rights Reserved

HBase UI – Debug Dump contains Thread Dumps

Page 8: Operating and Supporting Apache HBase Best Practices and Improvements

Page 8 © Hortonworks Inc. 2011 – 2014. All Rights Reserved

HBase UI – Runtime Configurations Runtime configurations can be reviewed from UI Consolidated view of every relevant configuration.

Page 9: Operating and Supporting Apache HBase Best Practices and Improvements

Page 9 © Hortonworks Inc. 2011 – 2014. All Rights Reserved

HBase UI – Tasks Tasks can be reviewed and monitored Like major compactions. RPC calls

Page 10: Operating and Supporting Apache HBase Best Practices and Improvements

Page 10 © Hortonworks Inc. 2011 – 2014. All Rights Reserved

HBCK Covered extensively later while we discuss inconsistencies

Page 11: Operating and Supporting Apache HBase Best Practices and Improvements

Page 11 © Hortonworks Inc. 2011 – 2014. All Rights Reserved

Regionserver Stability Issues

Page 12: Operating and Supporting Apache HBase Best Practices and Improvements

Page 12 © Hortonworks Inc. 2011 – 2014. All Rights Reserved

Region Server Crashes – JVM Pauses Hbase’s high availability comes from excellent orchestration conducted by ZooKeeper on

monitoring every RS and Hbase Master Zookeeper issues a shutdown of RS if a heartbeat check to RS is not responded within

timeout Extended JVM pauses at a RS can manifest as unresponsive RS causing ZK to issue a

shutdown

ZK RS HeartBeat Check

I am ok

ZK

RS In GC

ShutDown Issued

HeartBeat Check

No Response

Page 13: Operating and Supporting Apache HBase Best Practices and Improvements

Page 13 © Hortonworks Inc. 2011 – 2014. All Rights Reserved

Region Server Crashes - Garbage Collection Pause What do we see in RS Logs?

2016-06-13 18:13:20,533 WARN regionserver/b-bdata-r07f4-prod.phx2.symcpe.net/100.80.148.53:60020 util.Sleeper: We slept 82136ms instead of 3000ms, this is likely due to a long garbage collecting pause and it's usually bad

2016-06-13 18:13:20,533 WARN JvmPauseMonitor util.JvmPauseMonitor: Detected pause in JVM or host machine (eg GC): pause of approximately 79669ms

GC pool 'ParNew' had collection(s): count=2 time=65742ms GC pool 'ConcurrentMarkSweep' had collection(s): count=1 time=14253ms

Page 14: Operating and Supporting Apache HBase Best Practices and Improvements

Page 14 © Hortonworks Inc. 2011 – 2014. All Rights Reserved

Region Server Crashes - Garbage Collection Pause GC Tuning Recommendation for CMS and YoungGen.

– hbase-env.sh-Xmx32g

-Xms32g

-Xmn2500m

-XX:PermSize=128m (eliminated in Java 8)

-XX:MaxPermSize=128m (eliminated in Java 8)

-XX:SurvivorRatio=4

-XX:CMSInitiatingOccupancyFraction=50

-XX:+UseCMSInitiatingOccupancyOnly

Also test G1 for your use case.

Page 15: Operating and Supporting Apache HBase Best Practices and Improvements

Page 15 © Hortonworks Inc. 2011 – 2014. All Rights Reserved

RS Crashes - Non GC JVM Pause Disk IO GC logs show unusual behavior What we’ve seen is a delta between user time and real time taken in GC logs.

2015-07-06T23:55:10.642-0700: 7271.224: [GC2015-07-06T23:55:41.688-0700: 7302.270: [ParNew: 420401K->1077K(471872K), 0.0347330 secs]

1066189K->646865K(32453440K), 31.0811340 secs] [Times: user=0.77 sys=0.01, real=31.08 secs]

This is that classic head scracthing moment.

Page 16: Operating and Supporting Apache HBase Best Practices and Improvements

Page 16 © Hortonworks Inc. 2011 – 2014. All Rights Reserved

RS Crashes - Non GC JVM Pause Disk IO With no further leads in RS logs and GC logs we focus on system level information. /var/log/message provides significant leads Right when the we see that unusual delta between user and real clocks in GC logs we see

the following in system logs

kernel: sd 0:0:0:0: attempting task abort! scmd(ffff8809f5b7ddc0)

kernel: sd 0:0:0:0: [sda] CDB: Write(10): 2a 00 17 0b 1c c8 00 00 08 00kernel: scsi target0:0:0: handle(0x0007), sas_address(0x4433221102000000), phy(2)kernel: scsi target0:0:0: enclosure_logical_id(0x500605b009941140), slot(0)

kernel: sd 0:0:0:0: task abort: SUCCESS scmd(ffff8809f5b7ddc0)

Enabling DEBUG logging at disk driver level clearly showed 30 seconds pauses during write operations.

Page 17: Operating and Supporting Apache HBase Best Practices and Improvements

Page 17 © Hortonworks Inc. 2011 – 2014. All Rights Reserved

RS Crashes - Non GC JVM Pause CPU Halts RS Logs show long JVM pause However; it explicitly clarifies that it’s a non GC Pause

2016-02-11 04:59:33,859 WARN [JvmPauseMonitor] util.JvmPauseMonitor: Detected pause in JVM or host machine (eg GC): pause of approximately 140009ms No GCs detected 2016-02-11 04:59:33,861 WARN [regionserver60020.compactionChecker] util.Sleeper: We slept 140482ms instead of

We look at other component logs on the same machine. DataNode logs show break in activity around the same time frame. We don’t see exceptions in DN logs. But certainly break in log continuation.

Page 18: Operating and Supporting Apache HBase Best Practices and Improvements

Page 18 © Hortonworks Inc. 2011 – 2014. All Rights Reserved

RS Crashes - Non GC JVM Pause CPU Halts Start looking at system level information dmesg buffer logs by running dmesg command provides leads on CPU pauses

INFO: task java:100759 blocked for more than 120 seconds. Not tainted 2.6.32-431.el6.x86_64 #1"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.java D 000000000000001b 0 100759 100731 0x00000080

This was identified as a kernel level Red Hat bug Root Cause: hpsa driver can block CPU's workqueue for up to 10 minutes timeout as it waits

for controller's acknowledgment. When this happens it results in stalled workqueue. And since the tty work ended up in the same CPU workqueue, we have the hung task

Page 19: Operating and Supporting Apache HBase Best Practices and Improvements

Page 19 © Hortonworks Inc. 2011 – 2014. All Rights Reserved

Mitigate JVM Pauses Mitigate Crashes from JVM Pauses?

– Extend ZK Tick Time in zoo.cfg– Extend zookeeper.session.timeout to match tick time in hbase-site.xml

How Much?

$ cat hbase-hbase*.log | grep –i pause

97903ms102732ms106956ms112824ms125318ms165652ms – Biggest Pause so Far

Consider – 180000ms

Not my favorite workaround.

Cons?

• Now ZK will wait for extended time to issue a shutdown.

• Makes Hbase fall short on its High Availability promises.

• Make every effort to debug and resolve pauses.

Page 20: Operating and Supporting Apache HBase Best Practices and Improvements

Page 20 © Hortonworks Inc. 2011 – 2014. All Rights Reserved

Read Write Performance

Page 21: Operating and Supporting Apache HBase Best Practices and Improvements

Page 21 © Hortonworks Inc. 2011 – 2014. All Rights Reserved

Write Performance

• Write to WAL caps your write performance. • Relies on throughput of DataNode Pipeline

• Writes to Memstore is instantaneous • Writes build up in RS heap• Flushes eventually on the disk

Page 22: Operating and Supporting Apache HBase Best Practices and Improvements

Page 22 © Hortonworks Inc. 2011 – 2014. All Rights Reserved

Write Performance How to go about debugging Write Performance issues in really huge clusters?

– Thanks to Hbase community, starting Hbase 0.99 onwards we have DN pipeline printed for slow Hlog Sync.– For Hlog writes slower than what is configured as hbase.regionserver.hlog.slowsync.ms we now print DN

pipeline in RS logs.

2016-06-23 05:01:06,972 INFO [sync.2] wal.FSHLog: Slow sync cost: 131006 ms, current pipeline: [DatanodeInfoWithStorage[10.189.115.117:50010,DS-c9d2a4b4-710b-4b3a-bd9d-93e8ba443f60,DISK], DatanodeInfoWithStorage[10.189.115.121:50010,DS-7b7ba04c-f654-4a50-ad3b-16116a593d37,DISK], DatanodeInfoWithStorage[10.189.111.128:50010,DS-8abb86da-84ac-413f-80a3-56ea7db1cb59,DISK]]

Tracking slow DN prior to Hbase 0.99 was a very convoluted process. – It starts with tracking which RS has RPC call queue length backing up– Identify the most recent WAL file associated with that RS– Run hadoop fsck –files –blocks –locations <WAL file>– Identify DN involved with hosting blocks for the most recent WAL file

Page 23: Operating and Supporting Apache HBase Best Practices and Improvements

Page 23 © Hortonworks Inc. 2011 – 2014. All Rights Reserved

Read Performance Hbase provides block caching which can improve subsequent scans However first read has to follow the read path of hitting HDFS first and the disk eventually. Read performance ideally depends on how fast the disks are responding.

Best Practices to Improve Read Performance Major Compactions - Once a day during low traffic hour. Balanced Cluster – Even distribution of regions across all region servers

Page 24: Operating and Supporting Apache HBase Best Practices and Improvements

Page 24 © Hortonworks Inc. 2011 – 2014. All Rights Reserved

Read Performance – Best Practices Major Compaction

– Consolidates multiple store files into one– Drastically improves block locality to avoid remote calls to read data.– Review Block Locality Metrics in RegionServer UI

Page 25: Operating and Supporting Apache HBase Best Practices and Improvements

Page 25 © Hortonworks Inc. 2011 – 2014. All Rights Reserved

Read Performance – Best Practices Balanced Cluster

– Even distribution of regions across all regionserver– Balancer if turned on runs ever 5 minutes and keeps balancing the cluster– It prevents a regionserver from being the most sought after regionserver. Preventing Hot Spotting

Other Configs– Enable HDFS Short Circuit – Turned on by Default in HDP distribution.– Client Scanner Cache hbase.client.scanner.caching. Set to 100 in HDP by default

Page 26: Operating and Supporting Apache HBase Best Practices and Improvements

Page 26 © Hortonworks Inc. 2011 – 2014. All Rights Reserved

Inconsistencies

Page 27: Operating and Supporting Apache HBase Best Practices and Improvements

Page 27 © Hortonworks Inc. 2011 – 2014. All Rights Reserved

Inconsistencies Hbase stores information in multiple places which includes

Unhandled situation within Meta, ZK, HDFS or Master just throws the entire system out of sync causing inconsistencies

Region Splits is an extremely complex and orchestrated work flow. It includes interaction with all of the above mentioned components and has very little room for error.

We’ve seen the most inconsistencies coming out of region splits.– Lingering reference files– Catalog Janitor prematurely deleting parent store file. HBASE-13331

HDFS Zookeeper

META Master Memory

Page 28: Operating and Supporting Apache HBase Best Practices and Improvements

Page 28 © Hortonworks Inc. 2011 – 2014. All Rights Reserved

Inconsistencies Symptoms

Client Hbase Region Not Serving

Retries/Time Out

Page 29: Operating and Supporting Apache HBase Best Practices and Improvements

Page 29 © Hortonworks Inc. 2011 – 2014. All Rights Reserved

Inconsistencies Tools to identify and resolve inconsistencies

HBCK – Great Tool to identify inconsistencies• Can be executed from any hbase client machine• Confirms if Hbase is healthy or has inconsistencies• Provides fix options to resolve inconsistencies

HBCK not a silver bullet• Deep dive into RS logs• Review Znodes• Hbase Master UI • Won’t run if Master has not initialized

Page 30: Operating and Supporting Apache HBase Best Practices and Improvements

Page 30 © Hortonworks Inc. 2011 – 2014. All Rights Reserved

Inconsistencies Some of the inconsistencies we see

– ERROR: Region { meta => xxx,\x1A,1440904364342.ffdece0f3fc5323055b56b4d79e99e16., hdfs => null, deployed => } found in META, but not in HDFS or deployed on any region server

– This is broken meta even though it says file missing on HDFS.– hbase hbck -fixMeta

Zookeeper

Master Memory

HDFS

META

Page 31: Operating and Supporting Apache HBase Best Practices and Improvements

Page 31 © Hortonworks Inc. 2011 – 2014. All Rights Reserved

Inconsistencies Some of the inconsistencies we see

– ERROR: There is a hole in the region chain between X and Y. You need to create a new .regioninfo and region dir in hdfs to plug the hole.

– This is broken HDFS. Expected region directory is missing– hbase hbck –fixHdfsOrphans -fixHdfsHoles

ZookeeperHDFS

Master MemoryMETA

Page 32: Operating and Supporting Apache HBase Best Practices and Improvements

Page 32 © Hortonworks Inc. 2011 – 2014. All Rights Reserved

Inconsistencies Some of the inconsistencies we see

– ERROR: Found lingering reference file hdfs://namenode.example.com:8020/apps/hbase/data/XXX/f1d15a5a44f966f3f6ef1db4bd2b1874/a/d730de20dcf148939c683bb20ed1acad.5dedd121a18d32879460713467db8736

– Region Splits did not complete successfully leaving lingering reference files– hbase hbck -fixReferenceFiles

ZookeeperHDFS

Master MemoryMETA

Page 33: Operating and Supporting Apache HBase Best Practices and Improvements

Page 33 © Hortonworks Inc. 2011 – 2014. All Rights Reserved

Inconsistencies Some of the inconsistencies we see

– HBCK reporting 0 inconsistencies after running the fixes.– However hbase master UI is still reporting RIT

– Restart Hbase Master to resolve this.

ZookeeperHDFS

Master MemoryMETA

Page 34: Operating and Supporting Apache HBase Best Practices and Improvements

Page 34 © Hortonworks Inc. 2011 – 2014. All Rights Reserved

Inconsistencies Not Always Straight Forward

– ERROR: Region { meta => null, hdfs => hdfs://xxx/hbase/yyy/00e2eed3bd0c3e8993fb2e130dbaa9b8, deployed => } on HDFS, but not listed in META or deployed on any region server

– Inconsistency of this nature needs deeper dive into other inconsistencies– It also need assessment of logs.

HDFS

Master Memory

Zookeeper

META

Page 35: Operating and Supporting Apache HBase Best Practices and Improvements

Page 35 © Hortonworks Inc. 2011 – 2014. All Rights Reserved

Inconsistencies

Hbase Hbck Best Practices

• Redirect output to a file hbase hbck >>/tmp/hbck.txt

• Larger clusters run table specific hbck fixes • hbase hbck –fixMeta mytable

• Avoid running hbck with –repair flag.

Page 36: Operating and Supporting Apache HBase Best Practices and Improvements

Page 36 © Hortonworks Inc. 2011 – 2014. All Rights Reserved

SmartSense

Page 37: Operating and Supporting Apache HBase Best Practices and Improvements

Page 37 © Hortonworks Inc. 2011 – 2014. All Rights Reserved

SmartSense Great at detecting setup/config issues proactively

– Ulimits– Dedicated ZK drives– Transparent Huge Pages– Swapiness

This is common knowledge. However; if you don’t have it setup SmartSense will prompt for resolution

Page 38: Operating and Supporting Apache HBase Best Practices and Improvements

Page 38 © Hortonworks Inc. 2011 – 2014. All Rights Reserved

SmartSense

Page 39: Operating and Supporting Apache HBase Best Practices and Improvements

Page 39 © Hortonworks Inc. 2011 – 2014. All Rights Reserved

Improvements in Ops and Stability

Page 40: Operating and Supporting Apache HBase Best Practices and Improvements

Page 40 © Hortonworks Inc. 2011 – 2014. All Rights Reserved

Metrics You MUST have a metric solution to successfully operate HBase cluster(s)

– GC Times, pause times– Gets / Puts, Scans per second– Memstore and Block cache (use memory!)– Queues (RPC, flush, compaction)– Replication (lag, queue, etc)– Load Distribution, per-server view– Look at HDFS and system(cpu, disk) metrics as well

Use OpenTSDB if nothing else is available New versions keep adding more and more metrics

– Pause times, more master metrics, per-table metrics, FS latencies, etc How to chose important metrics out of hundreds available? Region Server and Master UI is your friend

Page 41: Operating and Supporting Apache HBase Best Practices and Improvements

Page 41 © Hortonworks Inc. 2011 – 2014. All Rights Reserved

Grafana + AMS

<insert grafana>

Page 42: Operating and Supporting Apache HBase Best Practices and Improvements

Page 42 © Hortonworks Inc. 2011 – 2014. All Rights Reserved

Other Improvements Canary Tool

– Monitor per-regionserver / per-region, do actual reads and writes, create alerts Procedure V2 based assignments

– Robust cluster ops (HBase-2.0)– Eliminate states in multiple places– Less manual intervention will be needed

Bigger Heaps– Reduce garbage being generated– More offheap stuff (eliminate buffer copy, ipc buffers, memstore, cells, etc)

Graceful handling of peak loads– RPC scheduling– client backoff

Rolling Upgradable, no downtime

Page 43: Operating and Supporting Apache HBase Best Practices and Improvements

Page 43 © Hortonworks Inc. 2011 – 2014. All Rights Reserved

Thanks. Q & A