Hadoop and HBase on the Cloud: A Case Study on Performance and Isolation. Konstantin V. Shvachko Jagane Sundar February 26, 2013

02.28.13 WANdisco ApacheCon 2013


DESCRIPTION

WANdisco is a provider of non-stop software for global enterprises to meet the challenges of Big Data and distributed software development.

KEY HIGHLIGHTS

Session 1: Tuesday, Feb. 26, 5:15 p.m.–6 p.m. Hadoop and HBase on the Cloud: A Case Study on Performance and Isolation. Cloud infrastructure is a flexible tool for orchestrating multiple Hadoop and HBase clusters, providing strict isolation of data and compute resources for multiple customers. Most importantly, our benchmarks show that a virtualized environment allows for higher average utilization of per-node resources. For more session information, visit http://na.apachecon.com/schedule/presentation/131/.

CO-PRESENTERS

Dr. Konstantin V. Shvachko, Chief Architect, Big Data, WANdisco, and Jagane Sundar, CTO/VP Engineering, Big Data, WANdisco.

A veteran Hadoop developer and respected author, Konstantin Shvachko is a technical expert specializing in efficient data structures and algorithms for large-scale distributed storage systems. Konstantin joined WANdisco through the AltoStor acquisition; before that he was founder and Chief Scientist at AltoScale, a Hadoop- and HBase-as-a-Platform company acquired by VertiCloud. Konstantin played a lead architectural role at eBay, building two generations of the organization's Hadoop platform. At Yahoo!, he worked on the Hadoop Distributed File System (HDFS). Konstantin has dozens of publications and presentations to his credit and is currently a member of the Apache Hadoop PMC. He holds a Ph.D. in Computer Science and an M.S. in Mathematics from Moscow State University, Russia.

Jagane Sundar has extensive big data, cloud, virtualization, and networking experience and joined WANdisco through its AltoStor acquisition. Before AltoStor, Jagane was founder and CEO of AltoScale, a Hadoop- and HBase-as-a-Platform company acquired by VertiCloud. His experience with Hadoop began as Director of Hadoop Performance and Operability at Yahoo! Jagane's accomplishments include the creation of Livebackup, the development of a user-mode TCP stack for Precision I/O, and the development of the NFS and PPP clients and parts of the TCP stack for JavaOS at Sun Microsystems. Jagane received his B.E. in Electronics and Communications Engineering from Anna University.

ABOUT WANDISCO

WANdisco (LSE: WAND) is a provider of enterprise-ready, non-stop software solutions that enable globally distributed organizations to meet today's data challenges of secure storage, scalability, and availability. WANdisco's products are differentiated by the company's patented, active-active data replication technology, serving crucial high availability (HA) requirements, including Hadoop Big Data and Application Lifecycle Management (ALM). Fortune Global 1000 companies including AT&T, Motorola, Intel, and Halliburton rely on WANdisco for performance, reliability, security, and availability. For additional information, please visit www.wandisco.com.


Page 1

Hadoop and HBase on the Cloud:

A Case Study on Performance and Isolation.

Konstantin V. Shvachko Jagane Sundar

February 26, 2013

Page 2

Founders of AltoStor and AltoScale

Jagane: WANdisco, CTO and VP Engineering of Big Data

– Director of Hadoop Performance and Operability at Yahoo!

– Big data, cloud, virtualization, and networking experience

Konstantin: WANdisco, Chief Architect

– Hadoop, HDFS at Yahoo! & eBay

– Efficient data structures and algorithms for large-scale distributed storage systems

– Giraffa - file system with distributed metadata & data utilizing HDFS and HBase.

Hosted on Apache Extras

Authors

Page 3

The Hadoop Distributed File System (HDFS)

– Reliable storage layer

– NameNode – namespace and block management

– DataNodes – block replica containers

MapReduce – distributed computation framework

– Simple computational model

– JobTracker – job scheduling, resource management, lifecycle coordination

– TaskTracker – task execution module

Analysis and transformation of very large amounts of data using commodity servers

A reliable, scalable, high-performance distributed computing system

What is Apache Hadoop

[Diagram: a Hadoop cluster – the NameNode and JobTracker coordinate DataNode/TaskTracker pairs on each slave node; DataNodes store block replicas, TaskTrackers execute tasks]

Page 4

Table: big, sparse, loosely structured

– Collection of rows, sorted by row keys

– Rows can have an arbitrary number of columns

Table is split horizontally into Regions

– Dynamic table partitioning

– Region Servers serve regions to applications

Columns are grouped into Column Families

– Vertical partition of tables

Distributed cache: Regions are loaded into nodes' RAM

– Real-time access to data

A distributed key-value store for real-time access to semi-structured data

What is Apache HBase

[Diagram: an HBase cluster layered on Hadoop – the HBase Master runs alongside the NameNode and JobTracker, and each slave node co-locates a DataNode, TaskTracker, and RegionServer]
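To make the region mechanics above concrete, here is a minimal sketch (hypothetical key ranges and function names, not HBase's actual client API) of how a horizontally partitioned, key-sorted table routes a row key to its region:

```python
from bisect import bisect_right

# Hypothetical region boundaries: each region holds a contiguous,
# sorted range of row keys (the "horizontal" partitioning above).
region_start_keys = ["", "row_c", "row_m", "row_t"]  # 4 regions

def region_for(row_key):
    """Index of the region whose key range contains row_key."""
    return bisect_right(region_start_keys, row_key) - 1

# Nearby keys land in the same region, which is what makes short
# range scans cheap: they touch a single RegionServer.
assert region_for("row_a") == 0
assert region_for("row_n") == 2
assert region_for("row_z") == 3
```

Because rows are kept sorted by key, region lookup is a binary search over start keys, and regions can be split dynamically as they grow.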

Page 5

A Parallel Computational Model and Distributed Framework

What is MapReduce

[Diagram: a word-count style MapReduce example over the words "dogs like cats" – mappers emit intermediate pairs such as (C, 3) and (V, 1), which the reduce phase aggregates into totals (C, 8) and (V, 4)]
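The map/shuffle/reduce pipeline in the figure can be illustrated with a classic word count (a plain-Python sketch of the computational model, not Hadoop's actual MapReduce API):

```python
from collections import defaultdict
from itertools import chain

def map_phase(split):
    # Map: each mapper processes one input split independently,
    # emitting (key, value) pairs -- here, (word, 1).
    return [(word, 1) for word in split.split()]

def reduce_phase(pairs):
    # Shuffle groups pairs by key; reduce then sums each group.
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)

splits = ["dogs like cats", "cats like dogs", "dogs like dogs"]
counts = reduce_phase(chain.from_iterable(map_phase(s) for s in splits))
assert counts == {"dogs": 4, "like": 3, "cats": 2}
```

In a real cluster the map calls run in parallel on the TaskTrackers holding each split, and the shuffle moves intermediate pairs across nodes to the reducers.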

Page 6

I/O utilization

– Can run at the speed of spinning drives

– Examples: DFSIO, Terasort (well tuned)

Network utilization – optimized by design

– Data locality: tasks are executed on nodes where their input data resides. No massive transfers

– Block replication of 3 requires only two data transfers

– Map writes transient output locally

– Shuffle requires cross-node transfers

CPU utilization

1. IO-bound workloads preclude using more CPU time

2. Cluster provisioning: peak-load performance vs. average utilization tradeoff

Low Average CPU Utilization on Hadoop Clusters

What is the Problem
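The "two data transfers" point above follows from the write pipeline: with data locality the first replica lands on the writer's own DataNode, and only the remaining replicas cross the network. A trivial sketch of that arithmetic (hypothetical helper name, not HDFS code):

```python
def cross_node_transfers(replication=3, writer_is_datanode=True):
    # With data locality the first replica is written locally;
    # each additional replica costs one network hop in the pipeline.
    remote_replicas = replication - 1 if writer_is_datanode else replication
    return max(remote_replicas, 0)

assert cross_node_transfers(3) == 2   # the default replication of 3
assert cross_node_transfers(3, writer_is_datanode=False) == 3
```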

Page 7

Computation of Pi

– Pure CPU workload, no input or output data

– An enormous number of FFTs over extremely large numbers

– The record Pi run over-heated the datacenter

Well-tuned Terasort is CPU intensive

Compression – marginal utilization gain

Production clusters run cold

1. IO-bound workloads

2. Conservative provisioning of cluster resources to meet strict SLAs

CPU Load

Two quadrillionth (2×10^15th) digit of π is 0

Page 8

72 GB – total RAM / node

– 4 GB – DataNode

– 2 GB – TaskTracker

– 16 GB – RegionServer

– 2 GB – per individual task: 25 task slots (17 maps and 8 reduces)

Average utilization vs. peak-load performance

– Oversubscription (28 task slots)

– Better average utilization

– MR tasks can starve HBase RegionServers

Better isolation of resources → aggressive resource allocation

Rule of thumb

Cluster Provisioning Dilemma
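The slide's memory budget can be checked with back-of-envelope arithmetic (a sketch of the provisioning math using only the numbers given above):

```python
# Per-node RAM budget from the slide.
total_ram_gb = 72
daemons_gb = {"DataNode": 4, "TaskTracker": 2, "RegionServer": 16}
per_task_gb = 2

# RAM left after the long-running daemons determines the slot count.
task_slots = (total_ram_gb - sum(daemons_gb.values())) // per_task_gb
assert task_slots == 25  # matches the 17 map + 8 reduce slots above

# Oversubscribing to 28 slots nominally exceeds physical RAM, which is
# why MapReduce tasks can starve the RegionServer without isolation.
oversubscribed_gb = sum(daemons_gb.values()) + 28 * per_task_gb
assert oversubscribed_gb > total_ram_gb
```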

Page 9

Goal: eliminate disk IO contention

Faster non-volatile storage devices improve IO performance

– Advantage in random reads

– Similar performance for sequential IO

More RAM: HBase caching

With non-spinning storage

Increasing IO Rate

Page 10

DFSIO benchmark measures average throughput of IO operations

– Write

– Read (sequential)

– Append

– Random Read (new)

MapReduce job

– Map: every mapper performs the same operation (write or read) and measures its throughput

– Single reducer: aggregates the performance results

Random Reads (MAPREDUCE-4651)

– Random Read DFSIO randomly chooses an offset for each read

– Backward Read DFSIO reads files in reverse order

– Skip Read DFSIO seeks ahead after every portion read

– All three avoid read-ahead buffering

– Similar results for all three random-read modifications

Standard Hadoop Benchmark measuring HDFS performance

What is DFSIO
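The three random-read modes can be sketched as offset generators (hypothetical helper names; DFSIO itself is a Java MapReduce job, so this is only a model of its access patterns):

```python
import random

FILE_SIZE = 10 * 1024**3   # 10 GB files, as in the benchmark dataset
READ_SIZE = 1024**2        # 1 MB per read

def random_read_offsets(n, seed=0):
    # Random Read: each read starts at a random 1 MB-aligned offset.
    rng = random.Random(seed)
    last_chunk = FILE_SIZE // READ_SIZE - 1
    return [rng.randint(0, last_chunk) * READ_SIZE for _ in range(n)]

def backward_read_offsets(n):
    # Backward Read: read the file chunk by chunk in reverse order.
    return [FILE_SIZE - (i + 1) * READ_SIZE for i in range(n)]

def skip_read_offsets(n, skip=READ_SIZE):
    # Skip Read: seek ahead after every portion read.
    # All three patterns defeat OS read-ahead buffering.
    return [i * (READ_SIZE + skip) for i in range(n)]

assert all(o % READ_SIZE == 0 for o in random_read_offsets(100))
offsets = backward_read_offsets(3)
assert offsets[0] > offsets[1] > offsets[2]  # strictly descending
```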

Page 11

Four-node cluster: Hadoop 1.0.3, HBase 0.92.1

– 1 master node: NameNode, JobTracker

– 3 slave nodes: DataNode, TaskTracker

Node configuration

– Intel 8-core processor with hyper-threading

– 24 GB RAM

– Four 1 TB 7200 rpm SATA drives

– 1 Gbps network interfaces

DFSIO dataset

– 72 files of size 10 GB each

– Total data read: 7 GB

– Single read size: 1 MB

– Concurrent readers: from 3 to 72

DFSIO

Benchmarking Environment

Page 12

[Chart: aggregate throughput (MB/sec, 0–1800) vs. concurrent readers (3, 12, 24, 48, 72), comparing disks and flash]

Increasing Load with Random Reads

Random Reads

Page 13

YCSB allows defining a mix of read / write operations, and measures latency and throughput

– Compares different databases: relational and NoSQL

– Data is represented as a table of records with a fixed number of fields

– A unique key identifies each record

Main operations

– Insert: insert a new record

– Read: read a record

– Update: update a record by replacing the value of one field

– Scan: scan a random number of consecutive records, starting at a random record key

Yahoo! Cloud Serving Benchmark

What is YCSB
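A workload's operation mix can be modeled as weighted sampling (a sketch of the idea with hypothetical names; YCSB's real workload files configure proportions such as `readproportion` and `insertproportion`):

```python
import random

def next_operation(mix, rng):
    # Pick the next operation with probability equal to its share.
    r = rng.random()
    cumulative = 0.0
    for op, fraction in mix.items():
        cumulative += fraction
        if r < cumulative:
            return op
    return op  # guard against floating-point rounding

# The "reads with heavy insert load" mix from the workload table.
mix = {"insert": 0.55, "read": 0.45}
rng = random.Random(42)
ops = [next_operation(mix, rng) for _ in range(10_000)]
insert_share = ops.count("insert") / len(ops)
assert 0.50 < insert_share < 0.60  # tracks the 55% target
```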

Page 14

Four-node cluster

– 1 master node: NameNode, JobTracker, HBase Master, ZooKeeper

– 3 slave nodes: DataNode, TaskTracker, RegionServer

– Physical master node

– 2 to 4 VMs per slave node; max 12 VMs

YCSB datasets of two different sizes: 10 and 30 million records

– dstat collects system resource metrics: CPU, memory usage, disk and network stats

YCSB

Benchmarking Environment

Page 15

YCSB Workloads

Workload                       | Insert % | Read % | Update % | Scan %
Data Load                      | 100      |        |          |
Reads with heavy insert load   | 55       | 45     |          |
Short range scans (workload E) | 5        |        |          | 95

Page 16

Random reads and scans are substantially faster with flash

Average Workloads Throughput

[Chart: relative throughput (% ops/sec, 0–100) for Data Load, Reads with Inserts, and Short range Scans, comparing disks and flash]

Page 17

[Chart: throughput (ops/sec, 0–3000) vs. concurrent threads (64, 256, 512, 1024) for physical nodes and for 6, 9, and 12 VMs]

Adding one VM per node increases overall performance by 20% on average

Short range Scans: Throughput

Page 18

Latency grows linearly with the number of threads on physical nodes

Short range Scans: Latency

[Chart: average latency (ms, 0–1200) vs. concurrent threads (64, 256, 512, 1024) for physical nodes and for 6, 9, and 12 VMs]

Page 19

A virtualized cluster drastically increases CPU utilization

CPU Utilization Comparison

CPU, physical nodes:      user 4%  | system 3%  | wait 1% | idle 92%
CPU, virtualized cluster: user 55% | system 23% | wait 1% | idle 21%

The physical-node cluster generates a very light CPU load – 92% idle. With VMs the CPU can be driven close to 100% at peaks.

Page 20

Latency of reads on a mixed workload: 45% reads and 55% inserts

Reads with Inserts

[Chart: read latency (ms, 0–700) for 10 million and 30 million record datasets, comparing disks and flash]

Page 21

HDFS

– Sequential IO is handled well by disk storage

– Flash substantially outperforms disks on workloads with random reads

HBase write-only workloads show only marginal improvement on flash

Using multiple VMs per node provides 100% peak utilization of HW resources

– CPU utilization on physical-node clusters is a fraction of its capacity

The combination of flash storage and virtualization implies high performance of HBase for random-read and mixed read/write workloads

Virtualization serves two main functions:

– Resource utilization: running more server processes per node

– Resource isolation: designating a certain percentage of resources to each server so that they do not starve each other

VMs make it possible to exploit the random-read advantage of flash for Hadoop

Conclusions

Page 22

Thank you

Konstantin V. Shvachko Jagane Sundar