Sizing Your HBase Cluster Lars George | @larsgeorge EMEA Chief Architect @ Cloudera

HBase Sizing Guide


This talk was given during the HBase Meetup on the 15th of October, 2014 at the Google Offices in Chelsea.


Page 1: HBase Sizing Guide

Sizing Your HBase Cluster

Lars George | @larsgeorge

EMEA Chief Architect @ Cloudera

Page 2: HBase Sizing Guide

2

Agenda

•  Introduction

•  Technical Background/Primer

•  Best Practices

•  Summary

©2014 Cloudera, Inc. All rights reserved.

Page 3: HBase Sizing Guide

3

Who I am…

Lars George [EMEA Chief Architect]

•  Clouderan since October 2010

•  Hadooper since mid 2007

•  HBase/Whirr Committer (of Hearts)

•  github.com/larsgeorge


Page 4: HBase Sizing Guide

4

Bruce Lee: ”As you think, so shall you become.”


Page 5: HBase Sizing Guide

5

Introduction


Page 6: HBase Sizing Guide

6

HBase Sizing Is...

•  Making the most out of the cluster you have by...
  –  Understanding how HBase uses low-level resources
  –  Helping HBase understand your use-case by configuring it appropriately
  - and/or -
  –  Designing the use-case to help HBase along

•  Being able to gauge how many servers are needed for a given use-case


Page 7: HBase Sizing Guide

7

Technical Background

“To understand your fear is the beginning of really seeing…”

— Bruce Lee


Page 8: HBase Sizing Guide

8

HBase Dilemma

Although HBase can host many applications, they may require completely opposite features

Use-case spectrum: Events and Time Series vs. Entities and Message Stores

Page 9: HBase Sizing Guide

9

Competing Resources

•  Reads and writes compete for the same low-level resources
  –  Disk (HDFS) and network I/O
  –  RPC handlers and threads
  –  Memory (Java heap)

•  Otherwise they exercise completely separate code paths

Page 10: HBase Sizing Guide

10

Memory Sharing

•  By default every region server divides its memory (i.e. the given maximum heap) into
  –  40% for in-memory stores (write ops)
  –  20% (40%) for block caching (read ops)
  –  The remaining space (here 40% or 20%) goes toward usual Java heap usage
    •  Objects etc.
    •  Region information (HFile metadata)
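A minimal sketch of how those shares divide a region server heap (illustrative only; the function name and numbers are mine, the default shares are from the slide):

```python
def heap_split(heap_gb, memstore_share=0.4, blockcache_share=0.2):
    """Divide a region server heap using the default shares above."""
    memstore = heap_gb * memstore_share      # in-memory stores (write ops)
    blockcache = heap_gb * blockcache_share  # block cache (read ops)
    other = heap_gb - memstore - blockcache  # objects, HFile metadata, etc.
    return memstore, blockcache, other

print(heap_split(10))  # 10GB heap -> (4.0, 2.0, 4.0)
```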

•  Share of memory needs to be tweaked

Page 11: HBase Sizing Guide

11

Writes

•  The cluster size is often determined by the write performance
  –  Simple schema design implies writing to all regions (entities) or only one region (events)

•  Writes follow the log-structured merge tree (LSM) approach:
  –  Store mutations in the in-memory store and the write-ahead log
  –  Flush out aggregated, sorted maps at a specified threshold - or - when under pressure
  –  Discard logs with no pending edits
  –  Perform regular compactions of store files
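The LSM steps above can be sketched as a toy store (illustrative only; real HBase memstores are concurrent skip lists and store files are HFiles, none of which is modeled here):

```python
class ToyLSMStore:
    """Toy LSM store: mutations hit a WAL and an in-memory map, then flush as sorted runs."""

    def __init__(self, flush_threshold=4):
        self.memstore = {}     # in-memory store, newest value per key
        self.wal = []          # write-ahead log, appended before the memstore
        self.store_files = []  # flushed, sorted, immutable "files"
        self.flush_threshold = flush_threshold

    def put(self, key, value):
        self.wal.append((key, value))  # durability first
        self.memstore[key] = value
        if len(self.memstore) >= self.flush_threshold:  # check on every mutation
            self.flush()

    def flush(self):
        # write out an aggregated, sorted map; the log holds no pending edits afterwards
        self.store_files.append(sorted(self.memstore.items()))
        self.memstore.clear()
        self.wal.clear()

    def compact(self):
        # merge all store files into one, newer values overwriting older ones
        merged = {}
        for run in self.store_files:
            merged.update(run)
        self.store_files = [sorted(merged.items())]

store = ToyLSMStore()
for i in range(8):
    store.put(f"row-{i}", i)
print(len(store.store_files))  # 2 flushes of 4 rows each
store.compact()
print(len(store.store_files), len(store.store_files[0]))  # 1 file, 8 rows
```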

Page 12: HBase Sizing Guide

12

Writes: Flushes and Compactions

[Chart: store file sizes in MB (0-1000) over time, older to newer, illustrating the sawtooth pattern of flushes and compactions]

Page 13: HBase Sizing Guide

13

Flushes

•  Every mutation call (put, delete etc.) causes a check for a flush

•  If threshold is met, flush file to disk and schedule a compaction –  Try to compact newly flushed files quickly

•  The compaction returns - if necessary - where a region should be split

Page 14: HBase Sizing Guide

14

Compaction Storms

•  Premature flushing because of # of logs or memory pressure
  –  Files will be smaller than the configured flush size

•  The background compactions are hard at work merging small flush files into the existing, larger store files
  –  Rewrite hundreds of MB over and over

Page 15: HBase Sizing Guide

15

Dependencies

•  Flushes happen across all stores/column families, even if just one triggers it

•  The flush size is compared to the size of all stores combined
  –  Many column families dilute the size
  –  Example: 55MB + 5MB + 4MB
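The dilution effect can be sketched in a few lines (the function name is mine; 128MB is a typical flush size):

```python
def flush_triggered(store_sizes_mb, flush_size_mb=128):
    """Flush fires when the COMBINED size of all stores crosses the threshold."""
    return sum(store_sizes_mb) >= flush_size_mb

# Three column families: the 55MB store alone is far below 128MB, and all
# three stores are flushed together once the combined size crosses it,
# producing two tiny files alongside the large one.
print(flush_triggered([55, 5, 4]))    # 64MB combined -> no flush yet
print(flush_triggered([120, 5, 4]))   # 129MB combined -> all three flush
```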

Page 16: HBase Sizing Guide

16

Write-Ahead Log

•  Currently only one per region server
  –  Shared across all stores (i.e. column families)
  –  Synchronized on file append calls

•  Work being done on mitigating this
  –  WAL compression
  –  Multithreaded WAL with ring buffer
  –  Multiple WALs per region server
  ➜ Start more than one region server per node?

Page 17: HBase Sizing Guide

17

Write-Ahead Log (cont.)

•  Size set to 95% of default block size –  64MB or 128MB, but check config!

•  Keep number low to reduce recovery time –  Limit set to 32, but can be increased

•  Increase size of logs - and/or - increase the number of logs before blocking

•  Compute number based on fill distribution and flush frequencies

Page 18: HBase Sizing Guide

18

Write-Ahead Log (cont.)

•  Writes are synchronized across all stores
  –  A large cell in one family can stop all writes of another
  –  In this case the RPC handlers go binary, i.e. either all work or all block

•  Can be bypassed on writes, but means no real durability and no replication –  Maybe use coprocessor to restore dependent data sets (preWALRestore)

Page 19: HBase Sizing Guide

19

Some Numbers

•  Typical write performance of HDFS is 35-50MB/s

Cell Size    OPS
0.5MB        70-100
100KB        350-500
10KB         3500-5000 ??
1KB          35000-50000 ????

This is way too high in practice - Contention!
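A quick sanity check of these figures (a sketch; the function name is mine, and the slide rounds, e.g. 1KB cells at 35MB/s come out at ~35,800 ops/s):

```python
def ops_per_sec(throughput_mb_s, cell_size_bytes):
    """Naive ops/s from raw sequential write throughput; ignores all contention."""
    return throughput_mb_s * 1024 * 1024 / cell_size_bytes

# At the 35-50MB/s raw HDFS write rate quoted above:
for label, size in [("0.5MB", 512 * 1024), ("100KB", 100 * 1024),
                    ("10KB", 10 * 1024), ("1KB", 1024)]:
    low, high = ops_per_sec(35, size), ops_per_sec(50, size)
    print(f"{label}: {low:.0f}-{high:.0f} ops/s")
```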

Page 20: HBase Sizing Guide

20

Some More Numbers

•  Under real world conditions the rate is less, more like 15MB/s or less

– Thread contention and serialization overhead are the cause of massive slowdowns

Cell Size    OPS
0.5MB        10
100KB        100
10KB         800
1KB          6000

Page 21: HBase Sizing Guide

21

Write Performance

•  There are many factors to the overall write performance of a cluster
  –  Key distribution ➜ Avoid region hotspots
  –  Handlers ➜ Do not pile up too early
  –  Write-ahead log ➜ Bottleneck #1
  –  Compactions ➜ Badly tuned, they can cause ever-increasing background noise

Page 22: HBase Sizing Guide

22

Cheat Sheet

•  Ensure you have enough or large enough write-ahead logs

•  Ensure you do not oversubscribe available memstore space

•  Ensure to set flush size large enough but not too large

•  Check write-ahead log usage carefully

•  Enable compression to store more data per node

•  Tweak compaction algorithm to peg background I/O at some level

•  Consider putting uneven column families in separate tables

•  Check metrics carefully for block cache, memstore, and all queues

Page 23: HBase Sizing Guide

23

Example: Write to All Regions

•  Java Xmx heap at 10GB

•  Memstore share at 40% (default) –  10GB Heap x 0.4 = 4GB

•  Desired flush size at 128MB –  4GB / 128MB = 32 regions max!

•  For a WAL size of 128MB x 0.95
  –  4GB / (128MB x 0.95) = ~33 partially uncommitted logs to keep around

•  Region size at 20GB –  20GB x 32 regions = 640GB raw storage used
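The arithmetic on this slide can be checked directly (a sketch; the variable names are mine, all numbers are from the slide):

```python
heap_gb = 10
memstore_share = 0.4           # default memstore fraction of the heap
flush_size_mb = 128
wal_size_mb = 128 * 0.95       # WAL rolls at 95% of the 128MB block size
region_size_gb = 20

memstore_mb = heap_gb * memstore_share * 1024          # 4096MB for all memstores
max_regions = int(memstore_mb // flush_size_mb)        # 32 actively written regions max
wal_count = memstore_mb / wal_size_mb                  # ~33 logs with uncommitted edits
raw_storage_gb = max_regions * region_size_gb          # 640GB raw storage used

print(max_regions, int(wal_count), raw_storage_gb)     # 32 33 640
```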

Page 24: HBase Sizing Guide

24

Notes

•  Compute memstore sizes based on number of written-to regions x flush size

•  Compute number of logs to keep based on fill and flush rate

•  Ultimately the capacity is driven by
  –  Java heap
  –  Region count and size
  –  Key distribution

Page 25: HBase Sizing Guide

25

Reads

•  Locate and route request to appropriate region server –  Client caches information for faster lookups

•  Eliminate store files if possible using time ranges or Bloom filter

•  Try block cache, if block is missing then load from disk
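The store-file elimination step can be illustrated with a toy Bloom filter (illustrative only; the class and sizes are mine, and HBase's actual implementation differs):

```python
import hashlib

class ToyBloomFilter:
    """Tiny Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, bits=1024, hashes=3):
        self.bits, self.hashes, self.array = bits, hashes, 0

    def _positions(self, key):
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.bits

    def add(self, key):
        for pos in self._positions(key):
            self.array |= 1 << pos

    def might_contain(self, key):
        return all(self.array >> pos & 1 for pos in self._positions(key))

# One filter per store file: a read only opens files whose filter
# cannot rule out the requested row (false positives possible but rare).
files = []
for rows in (["row-1", "row-2"], ["row-3", "row-4"]):
    bf = ToyBloomFilter()
    for row in rows:
        bf.add(row)
    files.append((rows, bf))

candidates = [rows for rows, bf in files if bf.might_contain("row-3")]
print(candidates)
```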

Page 26: HBase Sizing Guide

26

Seeking with Bloom Filters

Page 27: HBase Sizing Guide

27

Writes: Where’s the Data at?

[Chart: store file sizes in MB (0-1000) over time, older to newer, contrasting existing row mutations with unique row inserts]

Page 28: HBase Sizing Guide

28

Block Cache

•  Use exported metrics to see the effectiveness of the block cache
  –  Check fill and eviction rates, as well as hit ratios ➜ random reads are not ideal

•  Tweak up or down as needed, but watch overall heap usage

•  You absolutely need the block cache –  Set to 10% at least for short term benefits

Page 29: HBase Sizing Guide

29

Testing: Scans

HBase scan performance
•  Use available tools to test
•  Determine raw and KeyValue read performance
  –  Raw is just bytes, while KeyValue means block parsing

•  Insert data using YCSB, then compact table –  Single region enforced

•  Two test cases –  Small data: 1 column with 1 byte value –  Large(r) data: 1 column with 1KB value

•  About same size for both in total: 15GB


Page 30: HBase Sizing Guide

30

Testing: Scans


Page 31: HBase Sizing Guide

31

Scan Row Range

•  Set start and end key to limit scan size

Page 32: HBase Sizing Guide

32

Best Practices

“If you spend too much time thinking about a thing, you'll never get it done.”

— Bruce Lee


Page 33: HBase Sizing Guide

33

How to Plan

Advice on

•  Number of nodes

•  Number of disk and total disk capacity

•  RAM capacity

•  Region sizes and count

•  Compaction tuning


Page 34: HBase Sizing Guide

34

Advice on Nodes

•  Use the previous example to compute effective storage based on heap size, region count and size
  –  (10GB heap x 0.4 / 128MB) x 20GB = 640GB, if all regions are active
  –  Address more storage with read-from-only regions

•  Typical advice is to use more nodes with fewer, smaller disks (6 x 1TB SATA or 600GB SAS, or SSDs)

•  CPU is not an issue, I/O is (even with compression)


Page 35: HBase Sizing Guide

35

Advice on Nodes

•  Memory is not an issue; heap sizes stay small because of Java garbage collection limitations
  –  Up to 20GB has been used
  –  Newer versions of Java should help
  –  Use off-heap cache

•  Current servers typically have 48GB+ memory


Page 36: HBase Sizing Guide

36

Advice on Tuning

•  Trade off throughput against size of single data points –  This might cause schema redesign

•  Trade off read performance against write amplification –  Advise users to understand read/write performance and background write amplification

➜ This drives the number of nodes needed!


Page 37: HBase Sizing Guide

37

Advice on Cluster Sizing

•  Compute the number of nodes needed based on
  –  Total storage needed
  –  Throughput required for reads and writes

•  Assume ≈15MB/s minimum for each read and write –  Increasing the KeyValue sizes improves this
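A sizing sketch built from the two constraints above (the function name, the 6TB/node figure derived from the earlier 6 x 1TB disk advice, and the choice to pool read and write rates into one 15MB/s budget are all my simplifying assumptions):

```python
import math

def nodes_needed(storage_tb, write_mb_s, read_mb_s,
                 node_storage_tb=6.0, node_throughput_mb_s=15.0):
    """Take the larger of the storage-bound and throughput-bound node counts."""
    by_storage = storage_tb / node_storage_tb
    by_throughput = (write_mb_s + read_mb_s) / node_throughput_mb_s
    return math.ceil(max(by_storage, by_throughput))

# Hypothetical use-case: 20TB retained, 60MB/s writes, 30MB/s reads
print(nodes_needed(20, 60, 30))  # throughput-bound: 6 nodes
```

Larger KeyValues raise the effective per-node rate, which lowers the throughput-bound count accordingly.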


Page 38: HBase Sizing Guide

38

Example: Twitter Firehose


Page 39: HBase Sizing Guide

39

Example: Consume Data


Page 40: HBase Sizing Guide

40

HBase Heap Usage

•  Overall addressable amount of data is driven by heap size
  –  Only read-from regions need space for indexes, filters
  –  Written-to regions also need MemStore space

•  Java heap space is still limited, as garbage collections will cause pauses
  –  Typically up to 20GB heap
  –  Or invest in pause-less GC

Page 41: HBase Sizing Guide

41

Summary

“All fixed set patterns are incapable of adaptability or pliability. The truth is outside of all fixed patterns.”

— Bruce Lee


Page 42: HBase Sizing Guide

42

WHAT, BRUCE? IT DEPENDS? ☹


Page 43: HBase Sizing Guide

43

Checklist

To plan for the size of an HBase cluster you have to:

•  Know the use-case
  –  Read/write mix
  –  Expected throughput
  –  Retention policy

•  Optimize the schema and compaction strategy –  Devise a schema that allows for only some regions being written to

•  Take “known” numbers to compute cluster size


Page 44: HBase Sizing Guide

Thank you @larsgeorge