Performance of Hadoop on OpenStack
Andrew Lazarev, Mirantis, 2014
Introduction
Environment description
Direct virtualization impact
Real-life workload
Data locality
Conclusion
Agenda
What Is Hadoop?
Ambari (Management)
ZooKeeper (Coordination)
Oozie (Scheduling)
HDFS (File System)
HBase (NoSQL Store)
MapReduce (Programming Framework)
Pig (Data Flow)
Hive (SQL)
Storm (Real-time computation)
- Core Apache Hadoop
Easy to operate cluster
One-click self-service provisioning
Sharing hardware between several Hadoop clusters
Tenant isolation on hypervisor and network layers
Comparable performance with much more flexibility
Why Virtualize Hadoop?
Sahara - OpenStack Data Processing project
OpenStack Integrated
Supports Hadoop 1 and 2
Different vendors (Apache, Hortonworks, Intel*)
Cluster provisioning and on-demand job execution
How To Virtualize?
Direct impact
  Disk write
  Disk read
  Network
  CPU
Virtualization Impact
Indirect impact
  Lack of low-level system control
  Resources for hypervisor operation
Virtualization Impact
Mirantis OpenStack Express cluster
20 nodes
  CPU: 24 x 2.10 GHz (2 x Intel Xeon CPU E5-2620)
  Memory: 8 x 4.0 GB, 32.0 GB total
  Disk: 1 drive, 0.9 TB (WDC WD1003FBYX-0)
  Network: 2 x 1 GbE
Environment
Host OS: CentOS 6.5
VM OS: CentOS 6.5
Mirantis OpenStack
QEMU-KVM 1.2.0
Network: Neutron + GRE
Open vSwitch 1.10.2
Environment (continuation)
Hadoop: Vanilla Apache 1.2.1
Bare metal setup:
  19 Hadoop nodes
OpenStack setup:
  1 Controller + 19 Computes
  19 (or 57) VMs with Hadoop
Environment (continuation)
Disk Write (using dd)
*greater is better
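The slides do not show the exact dd invocation, so the following is only a minimal sketch of the same kind of measurement: write a fixed number of fixed-size blocks sequentially, force them to the device, and report throughput. The function name, block size, and block count are assumptions chosen for illustration, not the values used in the talk.

```python
import os
import tempfile
import time


def sequential_write_mb_per_s(path, block_size=1024 * 1024, block_count=64):
    """Write block_count zero-filled blocks to path; return throughput in MB/s.

    Hypothetical helper: block_size and block_count are illustrative defaults,
    not the parameters used for the dd runs in the talk.
    """
    block = bytes(block_size)
    start = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(block_count):
            f.write(block)
        f.flush()
        # Force the data to the device so the page cache does not hide the cost.
        os.fsync(f.fileno())
    elapsed = time.perf_counter() - start
    return (block_size * block_count) / elapsed / 1e6


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "ddtest.bin")
        print(f"{sequential_write_mb_per_s(target):.1f} MB/s")
```

Running the same script on the bare-metal host and inside a VM gives two comparable numbers; as on the chart, greater is better.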
TestDFSIO - built-in Hadoop IO test
  write test
  read test
1000 files of 1 GB (1 TB total)
Disk Write (hadoop test)
Disk Write (hadoop test)
*less is better
disk_cachemodes param in nova.conf
  writethrough (default) - guest disk write cache is disabled
  writeback - guest disk write cache is enabled
Disk Cache Mode
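A minimal sketch of what the setting might look like, rendered with Python's configparser. The section name and the "file" disk type are assumptions: the option takes "disk_type=cache_mode" pairs, and depending on the OpenStack release the libvirt options live in [libvirt] or in [DEFAULT]. The helper name is hypothetical.

```python
import configparser
import io

# Cache modes accepted by QEMU/libvirt for a guest disk.
VALID_CACHE_MODES = {"none", "writethrough", "writeback", "directsync", "unsafe"}


def render_disk_cachemodes(mode, disk_type="file", section="libvirt"):
    """Return a nova.conf fragment that sets disk_cachemodes.

    Assumptions: section and disk_type are illustrative; check the
    configuration reference for the release actually deployed.
    """
    if mode not in VALID_CACHE_MODES:
        raise ValueError(f"unknown cache mode: {mode}")
    config = configparser.ConfigParser()
    config[section] = {"disk_cachemodes": f"{disk_type}={mode}"}
    buf = io.StringIO()
    config.write(buf)
    return buf.getvalue()


if __name__ == "__main__":
    print(render_disk_cachemodes("writeback"))
```

The compute service has to be restarted and instances recreated before a change like this takes effect on guest disks.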
Writeback cache enabled
One large VM with all memory per host
Disk Write (dd, writeback cache)
Disk Write (dd, writeback cache)
*greater is better
Disk Write (hadoop test, writeback cache)
*less is better
QEMU 1.4: high-performance virtio-blk data plane implementation
  +108.0% on random write (based on a Red Hat presentation at KVM Forum)
Disk Write - Way To Improve
Disk Read (using hdparm)
*greater is better
Disk Read (hadoop test)
*less is better
Network (OVS+GRE)
*greater is better
PI - built-in Hadoop test
Depends mostly on CPU
50 series of 10,000,000,000 probes
CPU (hadoop test)
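To show why this test is almost pure CPU work, here is a small single-process sketch of a quasi-Monte Carlo pi estimate using a Halton sequence, which to my understanding is the approach the Hadoop pi example takes. It is not the Hadoop code itself; function names and the sample count are assumptions.

```python
import math


def halton(index, base):
    """Return element `index` (1-based) of the Halton sequence in `base`."""
    result = 0.0
    fraction = 1.0 / base
    while index > 0:
        result += (index % base) * fraction
        index //= base
        fraction /= base
    return result


def estimate_pi(samples):
    """Estimate pi from `samples` quasi-random points in the unit square.

    A point counts as a hit when it falls inside the circle of radius 0.5
    centred in the square; hits / samples approximates pi / 4.
    """
    inside = 0
    for i in range(1, samples + 1):
        x = halton(i, 2) - 0.5
        y = halton(i, 3) - 0.5
        if x * x + y * y <= 0.25:
            inside += 1
    return 4.0 * inside / samples


if __name__ == "__main__":
    estimate = estimate_pi(100_000)
    print(estimate, abs(estimate - math.pi))
```

Each probe is a handful of arithmetic operations with no disk or network access, so the run time is dominated by the CPU.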
CPU (hadoop test)
*less is better
Built-in Hadoop test
Represents real Hadoop workload
Involves
  IO
  Networking
  Computation
Sorting 200,000,000 100-byte entries (20 GB)
Writeback cache enabled
Terasort
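The data set size quoted above follows directly from the record count and record size. A short check, with the constant and function names chosen here only for illustration:

```python
RECORDS = 200_000_000   # entries to sort, from the slide
RECORD_BYTES = 100      # size of one entry, from the slide


def dataset_bytes(records=RECORDS, record_bytes=RECORD_BYTES):
    """Total input size in bytes."""
    return records * record_bytes


if __name__ == "__main__":
    total = dataset_bytes()
    print(total / 10**9, "GB (decimal)")
    print(round(total / 2**30, 2), "GiB (binary)")
```

So "20 GB" here is the decimal figure; in binary units the same data set is a little under 19 GiB.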
Terasort
*less is better
Hadoop can consider distance between nodes
Intelligent task scheduling
Reading data from close data nodes
Data Locality
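The idea of "distance between nodes" can be sketched as a tree distance over location paths, where a VM's location is the physical host it runs on followed by the VM name. This mirrors the general notion behind Hadoop's topology awareness but is not its implementation; the path layout, host names, and function names below are all hypothetical.

```python
def path_components(location):
    """Split a location such as '/host1/vm-a' into ['host1', 'vm-a']."""
    return [part for part in location.split("/") if part]


def distance(a, b):
    """Number of tree hops between two locations via their common ancestor."""
    pa, pb = path_components(a), path_components(b)
    common = 0
    for x, y in zip(pa, pb):
        if x != y:
            break
        common += 1
    return (len(pa) - common) + (len(pb) - common)


def pick_closest(reader, replicas):
    """Choose the replica location with the smallest distance to the reader."""
    return min(replicas, key=lambda replica: distance(reader, replica))


if __name__ == "__main__":
    reader = "/host1/vm-tasktracker"
    replicas = ["/host2/vm-datanode", "/host1/vm-datanode"]
    # The replica on the same physical host is preferred.
    print(pick_closest(reader, replicas))
```

With this layout a VM is at distance 0 from itself, 2 from another VM on the same physical host, and 4 from a VM on a different host, so a scheduler that minimizes distance keeps reads on the local host where possible.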
(Diagram: six nodes)
Data Locality
*greater is better
Network within host comparable to disk speed
Allows Hadoop process isolation (VM per process)
Test:
  1 Master Node (JobTracker + NameNode)
  18 DataNodes
  18 TaskTrackers
  TeraSort of 20 GB data
Data Locality
Terasort (data locality)
*less is better
Only 6% performance impact for composite test
Performance continuously improving with external library upgrades (QEMU, Open vSwitch)
Much more topology flexibility
Isolation at low cost
  between clusters
  between nodes within a cluster
Conclusion
Q&A
Thank you!
Andrew Lazarev
Launchpad/GitHub/IRC: alazarev
E-Mail: [email protected]