DESCRIPTION
WANdisco is a provider of non-stop software for global enterprises to meet the challenges of Big Data and distributed software development.

KEY HIGHLIGHTS
Session 1: Tuesday, Feb. 26, 5:15 p.m.-6 p.m.
Hadoop and HBase on the Cloud: A Case Study on Performance and Isolation
Cloud infrastructure is a flexible tool to orchestrate multiple Hadoop and HBase clusters, providing strict isolation of data and compute resources for multiple customers. Most importantly, our benchmarks show that a virtualized environment allows for higher average utilization of per-node resources. For more session information, visit http://na.apachecon.com/schedule/presentation/131/.

CO-PRESENTERS
Dr. Konstantin V. Shvachko, Chief Architect, Big Data, WANdisco, and Jagane Sundar, CTO/VP Engineering, Big Data, WANdisco.
A veteran Hadoop developer and respected author, Konstantin Shvachko is a technical expert specializing in efficient data structures and algorithms for large-scale distributed storage systems. Konstantin joined WANdisco through the AltoStor acquisition; before that he was founder and Chief Scientist at AltoScale, a Hadoop and HBase-as-a-Platform company acquired by VertiCloud. Konstantin played a lead architectural role at eBay, building two generations of the organization's Hadoop platform. At Yahoo!, he worked on the Hadoop Distributed File System (HDFS). Konstantin has dozens of publications and presentations to his credit and is currently a member of the Apache Hadoop PMC. He holds a Ph.D. in Computer Science and an M.S. in Mathematics from Moscow State University, Russia.
Jagane Sundar has extensive big data, cloud, virtualization, and networking experience and joined WANdisco through its AltoStor acquisition. Before AltoStor, Jagane was founder and CEO of AltoScale, a Hadoop and HBase-as-a-Platform company acquired by VertiCloud. His experience with Hadoop began as Director of Hadoop Performance and Operability at Yahoo! Jagane's accomplishments include the creation of Livebackup, development of a user-mode TCP stack for Precision I/O, and development of the NFS and PPP clients and parts of the TCP stack for JavaOS at Sun Microsystems. Jagane received his B.E. in Electronics and Communications Engineering from Anna University.

About WANdisco
WANdisco (LSE: WAND) is a provider of enterprise-ready, non-stop software solutions that enable globally distributed organizations to meet today's data challenges of secure storage, scalability and availability. WANdisco's products are differentiated by the company's patented, active-active data replication technology, serving crucial high availability (HA) requirements, including Hadoop Big Data and Application Lifecycle Management (ALM). Fortune Global 1000 companies including AT&T, Motorola, Intel and Halliburton rely on WANdisco for performance, reliability, security and availability. For additional information, please visit www.wandisco.com.
Hadoop and HBase on the Cloud:
A Case Study on Performance and Isolation.
Konstantin V. Shvachko Jagane Sundar
February 26, 2013
Founders of AltoStor and AltoScale
Jagane: WANdisco, CTO and VP Engineering of Big Data
– Director of Hadoop Performance and Operability at Yahoo!
– Big data, cloud, virtualization, and networking experience
Konstantin: WANdisco, Chief Architect
– Hadoop, HDFS at Yahoo! & eBay
– Efficient data structures and algorithms for large-scale distributed storage systems
– Giraffa - file system with distributed metadata & data utilizing HDFS and HBase.
Hosted on Apache Extras
/ page 2
Authors
The Hadoop Distributed File System (HDFS)
– Reliable storage layer
– NameNode – namespace and block management
– DataNodes – block replica containers
MapReduce – distributed computation framework
– Simple computational model
– JobTracker – job scheduling, resource management, lifecycle coordination
– TaskTracker – task execution module
Analysis and transformation of very large amounts of data using commodity servers
/ page 3
A reliable, scalable, high performance distributed computing system
What is Apache Hadoop
Diagram: a NameNode and JobTracker coordinating the slave nodes; each slave runs a DataNode (holding block replicas) and a TaskTracker (executing tasks).
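To make these roles concrete, the following is a minimal Java sketch (not from the talk) of a client writing and then reading a file through the standard HDFS FileSystem API; the path and contents are hypothetical and error handling is omitted.

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsExample {
  public static void main(String[] args) throws IOException {
    Configuration conf = new Configuration();   // picks up core-site.xml / hdfs-site.xml
    FileSystem fs = FileSystem.get(conf);       // the client asks the NameNode for metadata
    Path file = new Path("/tmp/example.txt");   // hypothetical path

    // Write: the NameNode allocates blocks, the client streams the bytes to DataNodes.
    FSDataOutputStream out = fs.create(file, true);
    out.writeUTF("hello hdfs");
    out.close();

    // Read: block locations come from the NameNode, the data comes from DataNodes.
    FSDataInputStream in = fs.open(file);
    System.out.println(in.readUTF());
    in.close();
  }
}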
Table: big, sparse, loosely structured
– Collection of rows, sorted by row keys
– Rows can have an arbitrary number of columns
Table is split horizontally into Regions
– Dynamic table partitioning
– Region Servers serve regions to applications
Columns grouped into Column Families
– Vertical partition of tables
Distributed Cache: regions are loaded into nodes' RAM
– Real-time access to data
/ page 4
A distributed key-value storage for real-time access to semi-structured data
What is Apache HBase
Diagram: the HBase Master alongside the NameNode and JobTracker; RegionServers co-located with the DataNodes and TaskTrackers on the slave nodes.
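As an illustration of this data model (row keys, column families, real-time reads), here is a small sketch against the 0.92-era HBase client API; the table name, column family, and values are hypothetical, not taken from the talk.

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseExample {
  public static void main(String[] args) throws IOException {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "usertable");   // client locates the region serving each row key

    // Rows are sorted by row key; columns live inside a column family.
    Put put = new Put(Bytes.toBytes("user1000"));
    put.add(Bytes.toBytes("cf"), Bytes.toBytes("field0"), Bytes.toBytes("value0"));
    table.put(put);

    // Real-time read of the same row.
    Get get = new Get(Bytes.toBytes("user1000"));
    Result result = table.get(get);
    System.out.println(Bytes.toString(
        result.getValue(Bytes.toBytes("cf"), Bytes.toBytes("field0"))));

    table.close();
  }
}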
/ page 5
A Parallel Computational Model and Distributed Framework
What is MapReduce
Example: counting consonants (C) and vowels (V) in "dogs like cats" – map emits per-word counts (dogs: C,3 V,1; like: C,2 V,2; cats: C,3 V,1), reduce aggregates them to C,8 and V,4.
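A hedged sketch of how the consonant/vowel count above could be expressed with the Hadoop MapReduce API; the class names and job driver are illustrative, not the presenters' code.

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class VowelConsonantCount {

  // Map: for each word emit its consonant count as ("C", n) and vowel count as ("V", m),
  // e.g. "dogs" -> (C, 3), (V, 1).
  public static class LetterMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
      for (String word : value.toString().toLowerCase().split("\\s+")) {
        int vowels = 0, consonants = 0;
        for (char c : word.toCharArray()) {
          if ("aeiou".indexOf(c) >= 0) vowels++;
          else if (Character.isLetter(c)) consonants++;
        }
        context.write(new Text("C"), new IntWritable(consonants));
        context.write(new Text("V"), new IntWritable(vowels));
      }
    }
  }

  // Reduce: sum the per-word counts; for "dogs like cats" this yields (C, 8) and (V, 4).
  public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable v : values) sum += v.get();
      context.write(key, new IntWritable(sum));
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = new Job(new Configuration(), "vowel-consonant-count");
    job.setJarByClass(VowelConsonantCount.class);
    job.setMapperClass(LetterMapper.class);
    job.setReducerClass(SumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}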
I/O utilization
– Can run at the speed of spinning drives
– Examples: DFSIO, Terasort (well tuned)
Network utilization – optimized by design
– Data locality: tasks are executed on nodes where their input data resides, so there are no massive transfers
– Block replication of 3 requires two data transfers
– Map writes transient output locally
– Shuffle requires cross-node transfers
CPU utilization
1. IO-bound workloads preclude using more CPU time
2. Cluster provisioning: peak-load performance vs. average utilization tradeoff
/ page 6
Low Average CPU Utilization on Hadoop Clusters
What is the Problem
Computation of Pi
– pure CPU workload, no input or output data
– Enormous number of FFTs computing amazingly large numbers
– Record Pi run over-heated the datacenter
Well tuned Terasort is CPU intensive
Compression – marginal utilization gain
Production clusters run cold
1. IO bound workloads
2. Conservative provisioning of cluster resources to meet strict SLAs
/ page 7
CPU Load
Two quadrillionth (10^15) digit of π is 0
72 GB - total RAM / node
– 4 GB – DataNode
– 2 GB – TaskTracker
– 16 GB – RegionServer
– 2 GB – per individual task: 25 task slots (17 maps and 8 reduces)
Average utilization vs peak-load performance
– Oversubscription (28 task slots)
– Better average utilization
– MR Tasks can starve HBase RegionServers
Better Isolation of resources → Aggressive resource allocation
/ page 8
Rule of thumb
Cluster Provisioning Dilemma
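A quick arithmetic check of the rule of thumb above (a back-of-the-envelope note, not from the slide): 4 GB (DataNode) + 2 GB (TaskTracker) + 16 GB (RegionServer) + 25 × 2 GB (task slots) = 72 GB, i.e. the full per-node RAM is accounted for. Oversubscribing to 28 slots nominally requires 78 GB, so it relies on tasks not all using their 2 GB at once, which is why MR tasks can end up starving the RegionServer.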
Goal: Eliminate disk IO contention
Faster non-volatile storage devices improve IO performance
– Advantage in random reads
– Similar performance for sequential IOs
More RAM: HBase caching
/ page 9
With non-spinning storage
Increasing IO Rate
DFSIO benchmark measures average throughput for IO operations
– Write
– Read (sequential)
– Append
– Random Read (new)
MapReduce job
– Map: the same operation (write or read) for all mappers; measures throughput
– Single reducer: aggregates the performance results
Random Reads (MAPREDUCE-4651)
– Random Read DFSIO randomly chooses an offset for each read
– Backward Read DFSIO reads files in reverse order
– Skip Read DFSIO seeks ahead after every portion read
– These modes avoid read-ahead buffering
– Similar results for all three random read modifications
/ page 10
Standard Hadoop Benchmark measuring HDFS performance
What is DFSIO
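The following is a hedged Java sketch of the random-read access pattern the new DFSIO mode exercises (seek to a random offset, read a 1 MB portion), not the benchmark's actual implementation; the file argument and loop bound are illustrative.

import java.io.IOException;
import java.util.Random;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class RandomReadSketch {
  public static void main(String[] args) throws IOException {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    Path file = new Path(args[0]);                 // e.g. one of the 10 GB DFSIO files
    long fileSize = fs.getFileStatus(file).getLen();

    byte[] buffer = new byte[1024 * 1024];         // 1 MB per read, as in the setup below
    Random random = new Random();
    FSDataInputStream in = fs.open(file);

    long start = System.currentTimeMillis();
    long bytesRead = 0;
    for (int i = 0; i < 1000; i++) {               // illustrative number of reads
      long offset = (long) (random.nextDouble() * (fileSize - buffer.length));
      in.seek(offset);                             // random offset defeats read-ahead buffering
      bytesRead += in.read(buffer, 0, buffer.length);
    }
    in.close();

    double seconds = (System.currentTimeMillis() - start) / 1000.0;
    System.out.println("Throughput: " + (bytesRead / (1024 * 1024)) / seconds + " MB/sec");
  }
}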
Four-node cluster: Hadoop 1.0.3, HBase 0.92.1
– 1 master node: NameNode, JobTracker
– 3 slave nodes: DataNode, TaskTracker
Node configuration
– Intel 8-core processor with hyper-threading
– 24 GB RAM
– Four 1 TB 7200 rpm SATA drives
– 1 Gbps network interfaces
DFSIO dataset
– 72 files of size 10 GB each
– Total data read: 7 GB
– Single read size: 1 MB
– Concurrent readers: from 3 to 72
/ page 11
DFSIO
Benchmarking Environment
Chart: aggregate throughput (MB/sec) vs. number of concurrent readers (3, 12, 24, 48, 72) – disks vs. flash.
/ page 12
Increasing Load with Random Reads
Random Reads
YCSB allows defining a mix of read/write operations and measures their latency and throughput
– Compares different databases: relational and NoSQL
– Data is represented as a table of records with a fixed number of fields
– A unique key identifies each record
Main operations
– Insert: insert a new record
– Read: read a record
– Update: update a record by replacing the value of one field
– Scan: scan a random number of consecutive records, starting at a random record key
/ page 13
Yahoo! Cloud Serving Benchmark
What is YCSB
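As a hedged illustration of the Scan operation behind workload E (start at a random record key, read a short range of consecutive records), here is a small sketch against the HBase client API; the table name, key format, and scan length are assumptions, not YCSB's internal code.

import java.io.IOException;
import java.util.Random;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class ShortRangeScan {
  public static void main(String[] args) throws IOException {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "usertable");

    // Pick a random start key and a short scan length, as workload E does.
    Random random = new Random();
    String startKey = "user" + (1000000 + random.nextInt(9000000));   // hypothetical key format
    int scanLength = 1 + random.nextInt(100);

    Scan scan = new Scan(Bytes.toBytes(startKey));
    scan.setCaching(scanLength);                    // fetch the short range in one round trip
    ResultScanner scanner = table.getScanner(scan);
    int count = 0;
    for (Result row : scanner) {
      if (++count >= scanLength) break;             // stop after the requested range
    }
    scanner.close();
    table.close();
  }
}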
Four-node cluster
– 1 master node: NameNode, JobTracker, HBase Master, ZooKeeper
– 3 slave nodes: DataNode, TaskTracker, RegionServer
– Physical master node
– 2 to 4 VMs on each slave node, max 12 VMs
YCSB datasets of two different sizes: 10 and 30 million records
– dstat collects system resource metrics: CPU, memory usage, disk and network stats
/ page 14
YCSB
Benchmarking Environment
/ page 15
YCSB Workloads
Workload                           Insert %   Read %   Update %   Scan %
Data Load                             100        –         –         –
Reads with heavy insert load           55       45         –         –
Short range scans (workload E)          5        –         –        95
/ page 16
Random reads and Scans substantially faster with flash
Average Workloads Throughput
Chart: workload throughput, normalized (% ops/sec), for Data Load, Reads with Inserts, and Short Range Scans – disks vs. flash.
Chart: throughput (ops/sec) vs. concurrent threads (64, 256, 512, 1024) – physical nodes vs. 6, 9, and 12 VMs.
/ page 17
Adding one VM per node increases overall performance by 20% on average
Short range Scans: Throughput
/ page 18
Latency grows linearly with number of threads on physical nodes
Short range Scans: Latency
Chart: average latency (ms) vs. concurrent threads (64, 256, 512, 1024) – physical nodes vs. 6, 9, and 12 VMs.
/ page 19
Virtualized Cluster drastically increases CPU utilization
CPU Utilization comparison
CPU, physical nodes: user 4%, system 3%, wait 1%, idle 92%
CPU, virtualized cluster: user 55%, system 23%, wait 1%, idle 21%
The physical-node cluster generates very light CPU load – 92% idle
With VMs the CPU can be driven close to 100% at peaks
/ page 20
Latency of reads on mixed workload: 45% reads and 55% inserts
Reads with Inserts
Chart: read latency (ms) on datasets of 10 million and 30 million records – disks vs. flash.
HDFS
– Sequential IO is handled well by disk storage
– Flash substantially outperforms disks on workloads with random reads
The HBase write-only workload shows only marginal improvement with flash
Using multiple VMs per node provides 100% peak utilization of HW resources
– CPU utilization on physical-node clusters is a fraction of their capacity
The combination of flash storage and virtualization yields high HBase performance for random-read and mixed read/write workloads
Virtualization serves two main functions:
– Resource utilization, by running more server processes per node
– Resource isolation, by designating a certain percentage of resources to each server and not letting them starve each other
/ page 21
VMs allow Hadoop to utilize the random-read advantage of flash
Conclusions
Thank you
Konstantin V. Shvachko Jagane Sundar