Page 1: Intro to HBase Internals & Schema Design (for HBase users)

Intro to HBase Internals & Schema Design (for HBase Users)

Alex Baranau, Sematext International, 2012

Monday, July 9, 12

About Me

Software Engineer at Sematext International

@abaranau (abaranau)

Logical view

Physical view

Schema design

Other/Advanced topics

Why?

Why should I (HBase user) care about HBase internals?

HBase will not automatically adjust cluster settings to suit your usage patterns

Schema design, table settings (defined upon creation), etc. depend on HBase implementation aspects

Logical View

Logical View: Regions

HBase cluster serves multiple tables, distinguished by name

Each table consists of rows

Each row contains cells: (row key, column family, column, timestamp) -> value

Table is split into Regions (table shards, each contains full rows), defined by start and end row keys
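
A minimal sketch of how a row key maps to a Region when Regions are defined by start and end row keys. The split points and the `region_for` helper are hypothetical; this is not HBase client code.

```python
from bisect import bisect_right

# Hypothetical region boundaries: each region begins at its start key
# (inclusive) and ends just before the next region's start key.
# The empty string stands for "start of table".
REGION_START_KEYS = ["", "g", "n", "t"]


def region_for(row_key):
    """Return (start_key, end_key) of the region that holds row_key."""
    i = bisect_right(REGION_START_KEYS, row_key) - 1
    start = REGION_START_KEYS[i]
    # None stands for "end of table" on the last region.
    end = REGION_START_KEYS[i + 1] if i + 1 < len(REGION_START_KEYS) else None
    return start, end
```

Because rows are sorted by key, a binary search over the start keys is enough to find the owning Region.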

Logical View: Regions are Shards

Regions are “atoms of distribution”

Each Region is assigned to a single RegionServer (RS, HBase cluster slave)

Rows of a particular Region are served by a single RS (cluster slave)

Regions are distributed evenly across RSs

Region has configurable max size

When region reaches max size (or on request) it is split into two smaller regions, which can be assigned to different RSs

Logical View: Regions on Cluster







[Diagram: three RegionServers, each serving two Regions]


Logical View: Regions Load

It is essential that Regions under load are evenly distributed across the cluster

It is the HBase user’s job to make sure the above is true. Note: an even distribution of Regions over the cluster doesn’t imply that the load is evenly distributed
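
A small illustration of the note above, with made-up numbers: the Region count per RS is perfectly even, yet one RS takes almost all the requests. All names and request counts are hypothetical.

```python
from collections import Counter

# Hypothetical assignment: two regions per RegionServer, so the region
# count is perfectly even.
assignment = {
    "region-a": "rs1", "region-b": "rs1",
    "region-c": "rs2", "region-d": "rs2",
    "region-e": "rs3", "region-f": "rs3",
}

# Hypothetical request counts per region: region-a is hot.
requests = {
    "region-a": 9000, "region-b": 200,
    "region-c": 300, "region-d": 100,
    "region-e": 250, "region-f": 150,
}

regions_per_rs = Counter(assignment.values())

load_per_rs = Counter()
for region, rs in assignment.items():
    load_per_rs[rs] += requests[region]
```

Here `regions_per_rs` is 2 everywhere, while `load_per_rs` shows rs1 serving far more requests than the other two together.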

Logical View: Regions Load

Take into account that rows are stored sorted by row key

Avoid writing rows with sequential keys; it causes RS hotspotting*

When writing data with monotonically increasing/decreasing keys, data is written to one RS at a time

Use pre-splitting of the table upon creation

Starting with single region means using one RS for some time

In general, splitting can be expensive

Increase max region size

* see
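
A sketch of why sequential keys hotspot and how a pre-split table plus a hashed key prefix spreads the writes. Key salting is one common remedy and is an assumption here, not something the slide prescribes; the bucket count, split keys and helper names are hypothetical.

```python
import hashlib
from bisect import bisect_right
from collections import Counter

N_BUCKETS = 4  # assumed number of pre-split regions / salt buckets

# Split points for a table pre-split into N_BUCKETS regions by salt prefix.
SPLIT_KEYS = ["%02d" % b for b in range(1, N_BUCKETS)]  # ["01", "02", "03"]


def salted(key):
    """Prefix key with a stable bucket number derived from a hash of the key."""
    bucket = int(hashlib.md5(key.encode()).hexdigest(), 16) % N_BUCKETS
    return "%02d_%s" % (bucket, key)


def region_index(row_key):
    """Index of the pre-split region that row_key sorts into."""
    return bisect_right(SPLIT_KEYS, row_key)


# Monotonically increasing keys, e.g. timestamp-like values.
keys = ["2012-07-09.%05d" % i for i in range(1000)]

plain = Counter(region_index(k) for k in keys)
spread = Counter(region_index(salted(k)) for k in keys)
```

With plain keys every write lands in the same Region; with salted keys all four Regions receive writes. The trade-off is that a range scan over the original keys then needs one scan per bucket.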

Logical View: Slow RSs

When load is distributed evenly, watch for the slowest RSs (HBase slaves)

Since every Region is served by a single RS, one slow RS can slow down the whole cluster, e.g. when:

data is written into multiple RSs at even pace (random value-based row keys)

data is being read from many RSs when doing scan

Physical View

Physical View: Write/Read Flow



[Diagram: a Region containing Stores (one per CF), each with a MemStore and HFiles; the Write Ahead Log and HFiles live in HDFS; write and read paths]
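
A toy model of the write/read flow named on this slide, assuming the usual order: a write is logged to the Write Ahead Log and applied to the MemStore, a flush turns the MemStore into an HFile, and a read checks the MemStore and then the HFiles. The `Store` class is a simplified sketch, not HBase code; real reads merge cells by timestamp rather than by file order.

```python
class Store:
    """Toy model of one Store (one column family of one Region)."""

    def __init__(self, wal):
        self.wal = wal        # shared write-ahead log (a plain list here)
        self.memstore = {}    # in-memory buffer: row key -> value
        self.hfiles = []      # immutable flushed files, oldest first

    def put(self, row, value):
        self.wal.append((row, value))  # 1. log the edit for durability
        self.memstore[row] = value     # 2. apply it in memory

    def flush(self):
        """Write the memstore out as a new sorted, immutable HFile."""
        if self.memstore:
            self.hfiles.append(dict(sorted(self.memstore.items())))
            self.memstore = {}

    def get(self, row):
        """Check the memstore first, then HFiles from newest to oldest."""
        if row in self.memstore:
            return self.memstore[row]
        for hfile in reversed(self.hfiles):
            if row in hfile:
                return hfile[row]
        return None


wal = []
store = Store(wal)
store.put("row1", "v1")
store.flush()
store.put("row1", "v2")
store.put("row2", "x")
```

After these calls, `store.get("row1")` is served from the MemStore, while the older value still sits in the first HFile until a compaction rewrites it.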


Physical: Speed up Writing

Enabling and enlarging the client-side write buffer reduces the number of RPC operations

warn: possible loss of buffered data

in case of client failure; design for failover

in case of write failure (networking/server-side issues); can be handled on client

Disabling WAL increases write speed

warn: possible data loss in case of RS failure

Use bulk import functionality (writes HFiles directly, which can be later added to HBase)
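
A sketch of the client-side buffering idea and its risk. `BufferedWriter` and `send_rpc` are hypothetical names, and the buffer is sized in number of puts for simplicity (an assumption; a real client buffer would more likely be sized in bytes).

```python
class BufferedWriter:
    """Toy client that batches puts and sends them in one RPC per flush."""

    def __init__(self, send_rpc, buffer_size):
        self.send_rpc = send_rpc        # hypothetical callable taking a list of puts
        self.buffer_size = buffer_size  # max number of buffered puts
        self.buffer = []

    def put(self, row, value):
        self.buffer.append((row, value))
        if len(self.buffer) >= self.buffer_size:
            self.flush()

    def flush(self):
        if self.buffer:
            self.send_rpc(self.buffer)
            self.buffer = []


rpcs = []  # each element is one RPC payload
writer = BufferedWriter(rpcs.append, buffer_size=100)
for i in range(250):
    writer.put("row-%d" % i, i)

unsent = len(writer.buffer)  # puts that would be lost if the client died now
writer.flush()
```

250 puts go out in 3 RPCs instead of 250, but just before the final flush there are `unsent` puts that exist only in client memory.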

Physical: Memstore Flushes

When memstore is flushed, N HFiles are created (one per CF)

Memstore size which causes flushing is configured on two levels:

per RS: % of heap occupied by memstores

per table: size in MB of single memstore (per CF) of Region

When a Region’s memstore flushes, the memstores of all CFs are flushed

Uneven data amount between CFs causes too many flushes & creation of too many HFiles (one per CF every time)

In most cases having one CF is the best design
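
A small simulation of the uneven-CF problem described above. It assumes, as a simplification of the slide's description, that a flush triggers when any single CF's memstore reaches the threshold and that all non-empty CF memstores are then flushed together. Sizes are unitless and hypothetical.

```python
def count_hfiles(writes, flush_threshold):
    """Simulate flushes for one Region; return HFile sizes created per CF.

    writes: iterable of (cf, size) pairs in arrival order.
    Assumption: a flush triggers when any single CF's memstore reaches
    flush_threshold, and then every non-empty CF memstore is flushed.
    """
    memstore = {}
    hfiles = {}
    for cf, size in writes:
        memstore[cf] = memstore.get(cf, 0) + size
        if memstore[cf] >= flush_threshold:
            for name, used in memstore.items():
                if used > 0:
                    hfiles.setdefault(name, []).append(used)
            memstore = {name: 0 for name in memstore}
    return hfiles


# CF "big" receives 100x more data per write than CF "small".
writes = [("big", 100), ("small", 1)] * 50
hfiles = count_hfiles(writes, flush_threshold=1000)
```

Both CFs end up with five HFiles, but the files of "small" hold only about 10 units each: many tiny files that later have to be compacted.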

Physical: Memstore Flushes

Important: there are Memstore size thresholds which cause writes to be blocked, so slow memstore flushes and overuse of memory by memstore can cause write perf degradation

Hint: watch for flush queue size metric on RSs

At the same time the more memory memstore uses the better for writing/reading perf (unless it reaches those “write blocking” thresholds)
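
A sketch of the two thresholds discussed above: one that requests a flush and a higher one that blocks writes. The function, parameter names and default values are illustrative assumptions, not HBase settings.

```python
def memstore_state(used_mb, flush_size_mb=128, block_multiplier=2):
    """Classify a Region memstore by how full it is.

    Returns 'ok', 'flush' (a flush is requested, writes continue) or
    'blocked' (writes wait until a flush frees memory).
    Assumption: blocking starts at flush_size_mb * block_multiplier.
    """
    if used_mb >= flush_size_mb * block_multiplier:
        return "blocked"
    if used_mb >= flush_size_mb:
        return "flush"
    return "ok"
```

The gap between 'flush' and 'blocked' is the headroom that slow flushes eat into: if flushing cannot keep up with incoming writes, the memstore grows through that gap and writes stall.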

Physical: Memstore Flushes

Example of good situation



Physical: HFiles Compaction

HFiles are periodically compacted into bigger HFiles containing same data

Reading from fewer HFiles is faster

Important: there’s a configured max number of files in a Store which, when reached, causes writes to block

Hint: watch for compaction queue size metric on RSs


[Diagram: HFiles within a Store (per CF)]
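
A minimal sketch of compaction as a merge of sorted files into one file with the same cells. HFiles are modelled as sorted lists of (row key, timestamp, value) tuples; this layout and the sample data are assumptions for illustration.

```python
import heapq


def compact(hfiles):
    """Merge several sorted HFiles into one sorted HFile with the same cells.

    Each HFile is modelled as a list of (row_key, timestamp, value) tuples
    sorted in ascending order.
    """
    return list(heapq.merge(*hfiles))


store_before = [
    [("a", 1, "a1"), ("c", 1, "c1")],
    [("b", 2, "b2")],
    [("a", 3, "a3"), ("d", 3, "d3")],
]
compacted = compact(store_before)
store_after = [compacted]
```

A read for row "a" has to consult three files before compaction and one file after it, while the set of cells is unchanged.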


Physical: Data Locality

RSs are usually collocated with HDFS DataNodes

[Diagram: two Slave Nodes, each running a DataNode and a RegionServer]




Physical: Data Locality

HBase tries to assign Regions to RSs so that Region data is stored physically on the same node, but this sometimes fails:

after Region splits there’s no guarantee that there’s a node that has all blocks (HDFS level) of new Region and

no guarantee that HBase will not re-assign this Region to different RS in future (even distribution of Regions takes preference over data locality)

There is ongoing work towards better preserving data locality

Physical: Data Locality

Also, data locality can break when:

Adding new slaves to cluster

Removing slaves from cluster

Incl. node failures

Hint: look at network IO between slaves when writing/reading data; it should be minimal


make sure HDFS is well balanced (use balancer tool)

try to rebalance Regions in HBase cluster if possible (HBase Master restart will do that) to regain data locality

Pre-split the table on creation to limit (ideally avoid) splits and Region movement; managing splits manually sometimes helps
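
One way to put a number on data locality is the fraction of a Region's HDFS blocks that have a replica on the node of the serving RS. The `locality` function and the block placement below are a hypothetical sketch, not an HBase metric API.

```python
def locality(region_blocks, rs_host):
    """Fraction of a Region's HDFS blocks with a replica on rs_host.

    region_blocks: list of sets, one per block, holding the hosts that
    store a replica of that block.
    """
    if not region_blocks:
        return 1.0
    local = sum(1 for hosts in region_blocks if rs_host in hosts)
    return local / len(region_blocks)


# Hypothetical replica placement for the four blocks of one Region.
blocks = [
    {"node1", "node2", "node3"},
    {"node1", "node3", "node4"},
    {"node2", "node3", "node4"},
    {"node1", "node2", "node4"},
]
```

If this Region is served from node1, three of four blocks are read locally; if it is moved to a freshly added node5, every read goes over the network.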

Schema Design (very briefly)

Schema: row keys

Using row key (or keys range) is the most efficient way to retrieve the data from HBase

Row key design is a major part of schema design

Note: no secondary indices available out of the box

Row Key                        Data
‘login_2012-03-01.00:09:17’    d:{‘user’:‘alex’}
...                            ...
‘login_2012-03-01.23:59:35’    d:{‘user’:‘otis’}
‘login_2012-03-02.00:00:21’    d:{‘user’:‘david’}
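
A sketch of why a key range is cheap to retrieve: rows are sorted by key, so a scan with a start row (inclusive) and a stop row (exclusive) is a contiguous slice. The `scan` helper is hypothetical and uses the sample keys from this slide.

```python
from bisect import bisect_left

# Row keys as stored: sorted lexicographically (sample data from the slide).
rows = sorted([
    "login_2012-03-01.00:09:17",
    "login_2012-03-01.23:59:35",
    "login_2012-03-02.00:00:21",
])


def scan(start_row, stop_row):
    """Return row keys in [start_row, stop_row), like a start/stop-row scan."""
    lo = bisect_left(rows, start_row)
    hi = bisect_left(rows, stop_row)
    return rows[lo:hi]


march_first = scan("login_2012-03-01", "login_2012-03-02")
```

All logins of 2012-03-01 come back from one contiguous range. Finding all actions of user 'alex' with this key layout would require a full scan, which motivates storing the data under additional row keys.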

Schema: row keys

Redundancy is OK!

warn: changing two rows in HBase is not an atomic operation

Row Key                             Data
‘login_2010-01-01.00:09:17’         d:{‘user’:‘alex’}
...                                 ...
‘login_2012-03-01.23:59:35’         d:{‘user’:‘otis’}
‘alex_2010-01-01.00:09:17’          d:{‘action’:‘login’}
...                                 ...
‘otis_2012-03-01.23:59:35’          d:{‘action’:‘login’}
‘alex_login_2010-01-01.00:09:17’    d:{‘device’:‘pc’}
...                                 ...
‘otis_login_2012-03-01.23:59:35’    d:{‘device’:‘mobile’}
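
A sketch of writing one event redundantly under several row keys, one per access path, following the key layouts in the table above. The `index_rows` helper is hypothetical.

```python
def index_rows(user, action, timestamp, device):
    """Build the redundant rows written for one event, one per access path."""
    return {
        "%s_%s" % (action, timestamp): {"user": user},
        "%s_%s" % (user, timestamp): {"action": action},
        "%s_%s_%s" % (user, action, timestamp): {"device": device},
    }


rows = index_rows("alex", "login", "2010-01-01.00:09:17", "pc")
```

Each entry would be a separate put, so a failure between puts can leave the copies inconsistent; the application has to tolerate or repair that.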

Schema: Relations

Not relational

No joins

Denormalization is OK! Use ‘nested entities’

Row Key          Data
                 professor_math_firstname:David, professor_math_lastname:Smart,
                 professor_cs_firstname:Jack, professor_cs_lastname:Weird,
‘prof_dsmart’    d:{...}
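
A sketch of the 'nested entities' idea: related entities are flattened into column qualifiers of a single row instead of being joined at read time. The qualifier layout follows the column names on this slide; the `flatten` helper is hypothetical.

```python
def flatten(entity_type, entities):
    """Flatten nested entities into column qualifiers for a single row.

    entities: dict mapping entity id -> dict of field -> value.
    Qualifier layout follows the slide: <type>_<id>_<field>.
    """
    columns = {}
    for entity_id, fields in entities.items():
        for field, value in fields.items():
            columns["%s_%s_%s" % (entity_type, entity_id, field)] = value
    return columns


professors = {
    "math": {"firstname": "David", "lastname": "Smart"},
    "cs": {"firstname": "Jack", "lastname": "Weird"},
}
columns = flatten("professor", professors)
```

Reading the row returns the nested professors in the same request, with no join.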




Schema: row key/CF/qual size

HBase stores cells individually

great for “sparse” data

row key, CF name and column name are stored with each cell, which may affect the amount of data to be stored and managed

keep them short

serialize and store many values in a single cell

Row Key Data
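
A rough estimate of per-cell storage, to show how key, CF and qualifier length add up. The fixed overhead assumes a KeyValue-style layout (length fields, timestamp, type byte); treat the exact byte counts as an approximation, and the sample names as hypothetical.

```python
def cell_size(row_key, family, qualifier, value):
    """Approximate stored bytes for one cell, before compression.

    Assumes a KeyValue-style layout: 4 B key length, 4 B value length,
    2 B row length, 1 B family length, 8 B timestamp, 1 B type,
    plus the raw bytes of row key, family, qualifier and value.
    """
    fixed = 4 + 4 + 2 + 1 + 8 + 1
    return fixed + len(row_key) + len(family) + len(qualifier) + len(value)


row = b"login_2012-03-01.00:09:17"
verbose = cell_size(row, b"details", b"user_name", b"alex")
compact = cell_size(row, b"d", b"u", b"alex")
saved_per_cell = verbose - compact
```

The 4-byte value is dwarfed by the key in both cases; shortening the CF and qualifier names saves `saved_per_cell` bytes on every single cell, and packing several values into one cell avoids repeating the row key.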



Other/Advanced Topics

Advanced: Co-Processors

CoProcessors API (HBase 0.92.0+) allows you to:

execute (querying/aggregation/etc.) logic on server side (think of it as stored procedures in an RDBMS)

perform auditing of actions performed on server side (think of it as triggers in an RDBMS)

apply security rules for data access

and much more
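
A conceptual sketch of why server-side aggregation helps. It does not use the CoProcessors API; the regions and both functions are hypothetical stand-ins that only count how many items travel to the client.

```python
# Hypothetical data: three regions, each holding row -> numeric value.
regions = [
    {"a": 1, "b": 2},
    {"m": 3, "n": 4, "o": 5},
    {"x": 6},
]


def client_side_sum(regions):
    """Ship every value to the client, then aggregate there.

    Returns (total, number of items transferred).
    """
    shipped = [v for region in regions for v in region.values()]
    return sum(shipped), len(shipped)


def server_side_sum(regions):
    """Aggregate inside each region, ship one partial result per region.

    Returns (total, number of items transferred).
    """
    partials = [sum(region.values()) for region in regions]
    return sum(partials), len(partials)
```

Both return the same total, but the server-side variant transfers one partial result per Region instead of one item per row.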

Other: Use Compression

Using compression:

reduces the amount of data to be stored on disks

reduces the amount of data to be transferred when an RS reads data from a non-local replica

increases amount of CPU used, but CPU isn’t usually a bottleneck

Favor compression speed over compression ratio

SNAPPY is good

Use wisely:

e.g. avoid wasting CPU cycles on compressing images

compression can be configured on per CF basis, so storing non-compressible data in separate CF sometimes helps

data blocks are uncompressed in memory; make sure this doesn’t cause OOME

note: when scanning (seeking data to return for scan) many data blocks can be uncompressed even if none of the data will be returned from those blocks
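
A quick illustration of the "use wisely" point. It uses `zlib` from the Python standard library as a stand-in codec (not SNAPPY, which the slide recommends) and random bytes as a stand-in for already-compressed data such as images.

```python
import os
import zlib


def ratio(data):
    """Compressed size divided by original size (lower is better)."""
    return len(zlib.compress(data)) / len(data)


# Repetitive, text-like data compresses very well.
text = b"login_2012-03-01 user=alex action=login device=pc\n" * 2000

# Random bytes stand in for already-compressed data such as images.
noise = os.urandom(len(text))

text_ratio = ratio(text)
noise_ratio = ratio(noise)
```

The text shrinks to a small fraction of its size, while the random data does not shrink at all, so the CPU spent compressing it is wasted. Putting such data in a separate CF with compression disabled avoids that cost.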

Other: Use Monitoring


Ganglia, Cacti, other*. Just use it!


Sematext is hiring!
