
© Copyright 2013 Cloudera. All rights reserved. Not to be reproduced without prior written consent.

An Introduction to Apache HBase
Ian Wrigley
Curriculum Manager, Cloudera
@iwrigley


Agenda

§ What is HBase?
§ HBase usage scenarios
§ HBase table basics
§ HBase architecture
§ HBase schema fundamentals
§ Questions


§ HBase is a distributed, scalable big data store built on top of Hadoop

§ Sometimes referred to as a ‘NoSQL’ data store
  – Access patterns are restricted to just get, put, scan (partial or full table scan)
  – Does not use SQL to access the data

§ Goal: low-latency, consistent, random read/write of data

§ Based on Google’s BigTable
  – For a long time, the data store for the Google Web Crawler’s data, GMail, Google Analytics…

What is HBase?


                               RDBMS                          HBase
Data layout                    Row or column-oriented         Column Family-oriented
Transactions                   Multi-row ACID                 Single row only
Query language                 SQL                            get/put/scan
Security                       Authentication/Authorization   Column Family-level authentication/authorization
Indexes                        On arbitrary columns           Row-key only
Max data size                  TBs                            PB+
Read/write throughput limits   1000s queries/second           Millions of “queries”/second

HBase is NOT a Traditional RDBMS


§ Hadoop provides:
  – Fault tolerance
  – Scalability
  – Batch processing with MapReduce

§ HBase provides:
  – Random reads and writes
  – High throughput
  – Caching

§ HBase data is all stored in HDFS

§ Note: HBase does not use MapReduce!
  – HBase is real-time, MapReduce is not
  – Although it is possible to run MapReduce jobs on data in HBase tables

HBase is Built on Hadoop


§ Writes
  – 1-3ms
  – 1,000 to 10,000 per node per second

§ Reads
  – 0-3ms cached
  – 10-30ms from disk
  – 10,000-40,000 reads/sec/node from cache

§ Read and write data anywhere in the table
  – No requirement for sequential writes

Low-Latency Random Data Access




§ Lots of data
  – Hundreds of Gigabytes up to Petabytes

§ High write throughput
  – 1000s/second per node
      – Scales to hundreds of thousands of writes/second across the cluster

§ Scalable cache capacity
  – Adding nodes adds to available cache

§ Data layout
  – Excels at key lookup
  – No penalty for sparse columns

Usage Scenarios for HBase


§ Use HBase if…
  – You need random write, random read, or both (but not neither)
  – You need to do many thousands of operations per second on multiple TB of data
  – Your access patterns are well-known and simple

§ Don’t use HBase if…
  – You only append to your dataset, and tend to read the whole thing
  – You primarily do ad-hoc analytics (ill-defined access patterns)
  – Your data easily fits on one beefy node
  – You’re only doing it because it’s what the cool kids are using

When To Use HBase


§ eBay
  – ‘Cassini’ cluster indexes the entire eBay site inventory
  – Approx 15TB of data
  – Random write: 200,000,000 rows/day
  – Bulk data import: 500,000,000 rows in 30 minutes
  – 1.2TB of data imported each day

§ Facebook
  – Uses HBase for its messaging store
      – Stores small messages and message indexes in HBase
      – 75B+ R/W operations/day
      – At peak, 1.5M operations/second
      – 2PB+ of data in HBase

A Couple of Large HBase Users…




§ Tables consist of rows and columns

§ Every row has a row key (analogous to a primary key in a traditional RDBMS)
  – Rows are stored sorted by row key for fast lookups

§ All columns in HBase belong to a particular column family

§ A table has one or more column families
  – Typically a table will have a small number of column families
  – Column families should rarely change
  – A column family can have any number of columns
  – Columns within a family are sorted and stored together
  – Columns only exist when inserted
      – NULLs are free

§ Table cells are versioned, uninterpreted arrays of bytes

Overview


Row key   info:height   info:state   roles:hadoop                           roles:hbase
cutting   ‘9ft’         ‘CA’         ‘Founder’
tlipcon   ‘5ft7’        ‘CA’         ‘PMC’ @ts=2011, ‘Committer’ @ts=2010   ‘Committer’

§ The row key is the implicit PRIMARY KEY in RDBMS terms
§ Data is all byte[] in HBase
§ A single cell might have different values at different timestamps
§ Different rows may have different sets of columns (table is sparse)
§ Column format is family:qualifier

Logical View as ‘Records’



§ Physically, data is stored on a per-Column Family basis as a sorted map
  – Ordered by row key, column key in ascending order
  – For the same row key and column qualifier, ordered by timestamp in descending order

Physical Storage

Row key   Column key   Timestamp       Cell value
Row1      info:aaa     1273516197868   valueA
Row1      info:bbb     1273871824184   valueB
Row1      info:ccc     1273746289103   valueC
Row2      info:hello   1273878447049   i_am_a_value
Row3      info:aaa     1273616297446   another_value

Sorted by Row key and Column
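The sort order described on this slide can be sketched with a plain Java TreeMap. This is only a toy model, not HBase code: the class names `CellKey` and `SortedStoreSketch` are hypothetical, keys are modelled as Java strings rather than byte arrays, and the "olderValueA" entry is an invented second version added to show the timestamp ordering.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model of the per-column-family sort order; not HBase code.
final class CellKey {
    final String row;
    final String column;
    final long timestamp;

    CellKey(String row, String column, long timestamp) {
        this.row = row;
        this.column = column;
        this.timestamp = timestamp;
    }

    // Row key ascending, then column key ascending, then timestamp descending.
    static final Comparator<CellKey> ORDER = (a, b) -> {
        int c = a.row.compareTo(b.row);
        if (c != 0) return c;
        c = a.column.compareTo(b.column);
        if (c != 0) return c;
        return Long.compare(b.timestamp, a.timestamp); // newest first
    };
}

final class SortedStoreSketch {
    // Inserts cells in arbitrary order; the map keeps them in storage order.
    static TreeMap<CellKey, String> sample() {
        TreeMap<CellKey, String> store = new TreeMap<CellKey, String>(CellKey.ORDER);
        store.put(new CellKey("Row2", "info:hello", 1273878447049L), "i_am_a_value");
        store.put(new CellKey("Row1", "info:bbb", 1273871824184L), "valueB");
        store.put(new CellKey("Row1", "info:aaa", 1273516197000L), "olderValueA"); // invented older version
        store.put(new CellKey("Row1", "info:aaa", 1273516197868L), "valueA");
        return store;
    }

    static List<String> valuesInStorageOrder() {
        return new ArrayList<String>(sample().values());
    }

    public static void main(String[] args) {
        for (Map.Entry<CellKey, String> e : sample().entrySet()) {
            CellKey k = e.getKey();
            System.out.println(k.row + " " + k.column + " " + k.timestamp + " " + e.getValue());
        }
    }
}
```

Iterating the map visits both versions of Row1/info:aaa newest first, then Row1/info:bbb, then Row2, which is the order a scan over this toy store would see.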


§ By default, HBase keeps three versions of each cell

§ The versions are sorted by their timestamp (in descending order)

Versions

Key    Column    Value        Timestamp
rowA   Fam:foo   New value    1275340679713
rowA   Fam:foo   Old value    1275091706190
rowB   Fam:foo   Some value   1274999316683

Sorted in descending order
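A minimal sketch of version retention, assuming a hypothetical `VersionedCellSketch` class that models a single cell. It trims old versions eagerly on every write, which is a simplification for illustration and not a description of how HBase itself discards them.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeMap;

// Toy model of one versioned cell; not HBase code.
final class VersionedCellSketch {
    private final int maxVersions;
    // Timestamps are kept in descending order, so the first entry is the newest.
    private final TreeMap<Long, String> versions =
        new TreeMap<Long, String>(Comparator.<Long>reverseOrder());

    VersionedCellSketch(int maxVersions) {
        this.maxVersions = maxVersions;
    }

    void put(long timestamp, String value) {
        versions.put(timestamp, value);
        // Drop the oldest versions once the limit is exceeded.
        while (versions.size() > maxVersions) {
            versions.pollLastEntry();
        }
    }

    // Returns the value with the highest timestamp, or null if the cell is empty.
    String latest() {
        return versions.isEmpty() ? null : versions.firstEntry().getValue();
    }

    // Returns all retained values, newest first.
    List<String> allVersions() {
        return new ArrayList<String>(versions.values());
    }

    public static void main(String[] args) {
        VersionedCellSketch cell = new VersionedCellSketch(3);
        cell.put(1275091706190L, "Old value");
        cell.put(1275340679713L, "New value");
        System.out.println(cell.latest());
        System.out.println(cell.allVersions());
    }
}
```

A read without an explicit timestamp corresponds to `latest()`; asking for older versions corresponds to walking further down the descending list.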


§ Columns are grouped into Column Families (CFs)

§ All column family members have the same prefix
  – E.g., info:height and info:state
  – The “:” delimits the CF from the qualifier

§ Columns can be created on the fly

§ Physically, all column family members are stored together

§ Column families must be declared at schema definition time

§ Tuning and storage settings can be specified for each Column Family

HBase Columns and Column Families


Attribute     Possible values          Default
COMPRESSION   NONE, GZ, LZO, SNAPPY    NONE
VERSIONS      1+                       3
TTL           1-2147483647 (seconds)   FOREVER (special value, means the data is never deleted)
BLOCKSIZE     1 byte - 2GB             64K
IN_MEMORY     true, false              false
BLOCKCACHE    true, false              true

Column Family Attributes


§ Scaling relational tables often means partitioning or sharding data
  – HBase automatically partitions data in regions
  – A region is a range of rows
  – Regions are automatically split (broken into two) when they become too large

§ In relational databases, one might normalize tables and use joins to retrieve data
  – HBase does not support explicit joins
  – A lookup by row key implicitly joins data from column families if necessary

Comparison with RDBMS Design


§ Bytes-in/bytes-out interface

§ Anything that can be converted to an array of bytes can be stored
  – Input can be strings, numbers, complex objects, images, etc.

§ Cell size
  – Practical limits to the size of values
  – In general, cell size should not consistently be above 2-3MB
  – For large cell size:
      – Increase the block size
      – Increase the maximum region size for the table
      – Keep the index size reasonable

§ Counters
  – Synchronization is done on the RegionServer (not client)

Supported Data Types


§ Data operations
  – Get
  – Put
  – Scan
  – Increment
  – CheckAndPut
  – Delete

§ Access via HBase shell, Java API, REST proxy

Access HBase via its API


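// Classes used here come from org.apache.hadoop.conf, org.apache.hadoop.hbase,
// org.apache.hadoop.hbase.client and org.apache.hadoop.hbase.util.
byte[] row = Bytes.toBytes("rowkey");
byte[] family = Bytes.toBytes("cf1");        // column family
byte[] qualifier = Bytes.toBytes("colname"); // column qualifier within the family
byte[] putVal = Bytes.toBytes("cell value here");

Configuration config = HBaseConfiguration.create();
HTable table = new HTable(config, "myTable");

// Write one cell
Put p = new Put(row);
p.add(family, qualifier, putVal);
table.put(p);

// Read it back
Get g = new Get(row);
Result r = table.get(g);
byte[] getVal = r.getValue(family, qualifier);

// Compare array contents (JUnit); assertEquals would compare references
assertArrayEquals(putVal, getVal);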

Access HBase via its API (cont’d)




§ ZooKeeper
  – A centralized service used to maintain configuration information for HBase

§ Catalog Tables
  – Keep track of the locations of region servers and regions

§ Master
  – Monitors all region server instances in the cluster
  – The interface for all metadata changes

§ RegionServer
  – Responsible for serving and managing regions

§ Region
  – A set of rows belonging to a table

Major Components of an HBase Cluster


§ ZooKeeper service
  – Stores global information about the cluster
  – Provides synchronization and detects master node failure
  – Holds the location of the -ROOT- table and the master

ZooKeeper

[Figure: cluster diagram with a Client, ZooKeeper, the Master and two RegionServers. The Client looks up the Master and -ROOT- in ZooKeeper, then reads and writes data; the Client rarely needs the Master. The Master registers the Master and -ROOT- locations, assigns regions to RegionServers and checks the health of RegionServers. Each RegionServer has an HLog. For table 'Foo', RegionServer 1 holds Region 1 (Key001 to Key499) and RegionServer 2 holds Region 2 (Key500 to Key999) and Region 3.]


§ Responsible for coordinating the region servers

§ Assigns regions, detects region server failures

§ Handles schema changes

§ Master runs several background threads
  – LoadBalancer periodically reassigns regions in the cluster
  – CatalogJanitor periodically checks and cleans up the .META. table

§ An HBase cluster can have multiple masters
  – Upon startup all compete to run the cluster
  – If the active master loses its lease in ZooKeeper, the remaining masters compete for the master role

Master


§ Daemons which run on some (typically all) of the slave nodes in the cluster

§ Serve data for reads and writes of rows contained in regions

§ Regions which become too large will automatically be split

RegionServers


§ -ROOT- Catalog Table
  – A table that lists the location of the .META. table

§ .META. Catalog Table
  – A table that lists all the regions and their locations

Catalog Tables


§ Holds a subset of a table’s rows, like a partition
  – Region is specified by its startKey and endKey

§ A table may have one or more regions
  – Comprised of a store per column family

§ New regions are automatically created as tables grow
  – Each region may live on a different node
  – Made up of several HDFS files (store files)

Regions

[Figure: a RegionServer with its HLog, hosting one region of table 'Foo' and one region of table 'Bar'. Each region contains a Store (ColumnFamily1) holding row 1, row 2, row 3, …]
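Because each region covers a contiguous range of row keys, finding the region for a given row amounts to finding the greatest start key that is not larger than the row key. The sketch below illustrates that idea with a TreeMap. It is a toy model, not HBase code: `RegionLookupSketch` and the region names are hypothetical, keys are Java strings rather than byte arrays, and the first region is assumed to start at the empty key.

```java
import java.util.Map;
import java.util.TreeMap;

// Toy model of locating the region that holds a row key; not HBase code.
final class RegionLookupSketch {
    // Maps each region's start key to a (hypothetical) region name.
    private final TreeMap<String, String> regionsByStartKey = new TreeMap<String, String>();

    void addRegion(String startKey, String regionName) {
        regionsByStartKey.put(startKey, regionName);
    }

    // The owning region is the one with the greatest start key <= rowKey.
    String regionFor(String rowKey) {
        Map.Entry<String, String> e = regionsByStartKey.floorEntry(rowKey);
        return e == null ? null : e.getValue();
    }

    public static void main(String[] args) {
        RegionLookupSketch table = new RegionLookupSketch();
        table.addRegion("", "Region 1");       // assumed: first region starts at the empty key
        table.addRegion("Key500", "Region 2");
        System.out.println(table.regionFor("Key123"));
        System.out.println(table.regionFor("Key750"));
    }
}
```

With these two regions, "Key123" resolves to Region 1 and "Key750" to Region 2. A split would simply add one more start key to the map.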


§ When regions get too big (256MB by default) they are automatically split

§ Resulting regions may be served by the same, or different, RegionServers

§ HBase periodically ‘balances’ the regions across RegionServers

Region Splits


§ Data is first written to the region’s Write-Ahead-Log (WAL) and then to memstore
  – The WAL is required for crash recovery if the memstore is lost

§ Memstore is flushed to an immutable file in HDFS (store file) periodically

§ Eventually these store files will be aggregated and cleaned up during a compaction

Data Storage
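The write path on this slide (WAL first, then memstore, periodic flush to immutable store files, eventual compaction) can be mimicked in a few lines of plain Java. This is a toy, in-memory model and not HBase code: `WritePathSketch` is a hypothetical class, and flushing after a fixed number of entries is an assumption made to keep the example short.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model of WAL, memstore, store files and compaction; not HBase code.
final class WritePathSketch {
    private final int flushThreshold;                               // assumed: entries per flush
    final List<String> wal = new ArrayList<String>();               // append-only log
    private TreeMap<String, String> memstore = new TreeMap<String, String>();
    final List<Map<String, String>> storeFiles = new ArrayList<Map<String, String>>(); // oldest first

    WritePathSketch(int flushThreshold) {
        this.flushThreshold = flushThreshold;
    }

    void put(String rowKey, String value) {
        wal.add(rowKey + "=" + value);   // 1. record the edit in the WAL first
        memstore.put(rowKey, value);     // 2. then apply it to the sorted memstore
        if (memstore.size() >= flushThreshold) {
            flush();
        }
    }

    // Freeze the memstore as an immutable "store file" and start a new one.
    private void flush() {
        storeFiles.add(Collections.unmodifiableMap(memstore));
        memstore = new TreeMap<String, String>();
    }

    // Reads check the memstore first, then store files from newest to oldest.
    String get(String rowKey) {
        if (memstore.containsKey(rowKey)) {
            return memstore.get(rowKey);
        }
        for (int i = storeFiles.size() - 1; i >= 0; i--) {
            String v = storeFiles.get(i).get(rowKey);
            if (v != null) return v;
        }
        return null;
    }

    // Compaction: merge all store files into one, newer values winning.
    void compact() {
        TreeMap<String, String> merged = new TreeMap<String, String>();
        for (Map<String, String> file : storeFiles) {
            merged.putAll(file);
        }
        storeFiles.clear();
        if (!merged.isEmpty()) {
            storeFiles.add(Collections.unmodifiableMap(merged));
        }
    }
}
```

The model leaves out crash recovery, but it shows why the WAL matters: the memstore is the only place recent edits live until a flush, so replaying `wal` is the only way to rebuild it.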




§ Schema design is a combination of (amongst other things):
  – Designing the keys (row and column)
  – Segregating data into column families
  – Choosing appropriate compression and block size settings

§ Similar techniques are needed to scale most systems
  – e.g., indexes, partitioning data, consistent hashing

§ Overcome shortcomings of architecture
  – Denormalization -> Replacement for JOINs
  – Duplication -> Design for reads
  – Intelligent Keys -> Implement indexing, sorting and optimize reads

§ You must consider your access pattern when designing the table schema
  – Failure to do so will result in dreadful performance

Schema Fundamentals


§ Recommend no more than three Column Families

§ Column Families allow for separation of data
  – Used by columnar databases for fast analytical queries, but on column level only
  – Data across CFs is typically not accessed simultaneously

§ Attributes are applied on a per-Column Family basis
  – e.g., different or no compression depending on the content type

Column Family Design


§ Row keys cannot be changed
  – Row must be deleted and then re-inserted

§ Rows are sorted on insert, not on scan

§ Keys are ordered lexicographically
  – E.g., 1,10,100,11,12,13 . . . 2,20,21, . . .
  – Preserve natural ordering of numbers by left padding with 0’s

§ The row key is the only key/index on the table
  – No secondary keys/indexes or foreign keys

§ Selecting the appropriate row keys for your application is critical for performance!

Row Key Design
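The effect of lexicographic ordering, and of left padding with zeros, is easy to reproduce with ordinary string sorting. The sketch below is plain Java, not HBase code; `RowKeyPaddingSketch` is a hypothetical helper and the fixed width of 5 digits is an arbitrary assumption.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Toy demonstration of lexicographic key order with and without zero padding.
final class RowKeyPaddingSketch {
    // Left-pads a non-negative number with zeros to the given width.
    static String pad(long n, int width) {
        return String.format(Locale.ROOT, "%0" + width + "d", n);
    }

    // Builds string keys from numeric ids and sorts them as strings.
    static List<String> sortedKeys(long[] ids, boolean padded) {
        List<String> keys = new ArrayList<String>();
        for (long id : ids) {
            keys.add(padded ? pad(id, 5) : Long.toString(id));
        }
        Collections.sort(keys);
        return keys;
    }

    public static void main(String[] args) {
        long[] ids = {1, 2, 10, 11, 100};
        System.out.println(sortedKeys(ids, false)); // string order, not numeric order
        System.out.println(sortedKeys(ids, true));  // padding restores numeric order
    }
}
```

Unpadded, the ids 1, 2, 10, 11, 100 sort as 1, 10, 100, 11, 2. Padded to five digits they sort as 00001, 00002, 00010, 00011, 00100, which matches numeric order as long as no id needs more digits than the chosen width.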


§ This is not a relational database!
  – Typically there are only a few, large (denormalized) tables
  – Each table will have a small number of Column Families
      – Within each CF you may have hundreds or thousands of columns

§ Think about your access patterns…
  – Columns that are accessed together should be assigned to the same Column Family
  – Row keys determine how closely on disk rows are stored
      – Recall that data is assigned to Regions according to Row keys

§ Include enough information in your Row Key so that you can avoid table scans

§ Be wary of hotspotting

How to Design your Table?


§ Be careful with the design of your Row Key

§ Example: a monotonically increasing value – 0000001, 0000002, 0000003, 0000004, etc.

§ All writes will go to the same RegionServer
  – Even after the region splits
  – Results in an absolute bound on write performance

§ Instead, try to have your row keys distributed around the regions
  – MD5 hash of the row key for example

§ Consider your read pattern

Hotspotting: or, How to Ruin HBase Performance
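One way to read the "MD5 hash of the row key" suggestion is to store each row under the hex digest of its natural key, so that consecutive natural keys land far apart in the sorted key space. The sketch below uses only the JDK's MessageDigest; `HashedRowKeySketch` is a hypothetical helper, and using the full digest as the row key is just one possible choice.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Toy helper that derives a well-distributed row key from a sequential one.
final class HashedRowKeySketch {
    // Returns the MD5 digest of the key as 32 lowercase hex characters.
    static String md5Hex(String key) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(key.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder();
            for (byte b : digest) {
                sb.append(String.format("%02x", b & 0xff));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is one of the algorithms every Java platform must provide.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Sequential natural keys map to unrelated-looking row keys.
        System.out.println(md5Hex("0000001"));
        System.out.println(md5Hex("0000002"));
    }
}
```

The trade-off ties back to the last bullet on the slide: hashed keys spread writes, but rows that were adjacent under the natural key are no longer adjacent on disk, so range scans over the natural key order stop being cheap.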


§ HBase schema design is difficult!

§ It’s the single most critical factor in the performance of your HBase cluster

§ There’s way more to it than we have time to cover here

Schema Design: Conclusion




§ Questions? Ask away!

§ Thanks to Truecar for hosting the event and providing the food and drink

§ Thanks to Cloudera for providing this evening’s speaker

§ Discount on Cloudera’s HBase training course: HBaseSoCal
  – 15% off any HBase training class delivered by Cloudera
  – Expires 07/01/13

§ Teamtreehouse.com is offering a 3-month pass. Learn to build websites, create iPhone and Android apps, code with Ruby on Rails and PHP, or start a business. Please email Subash directly for that. His details are available on the meetup website

Questions
