A Performance Comparison of Container-based Virtualization Systems for MapReduce Clusters
Miguel G. Xavier, Marcelo V. Neves, Cesar A. F. De Rose [email protected]
Faculty of Informatics, PUCRS, Porto Alegre, Brazil
February 13, 2014
Outline
• Introduction
• Container-based Virtualization
• MapReduce
• Evaluation
• Conclusion
Introduction
• Virtualization
  • Allows resources to be shared
  • Hardware independence, availability, isolation and security
  • Better manageability
  • Widely used in datacenters/cloud computing
• MapReduce clusters and virtualization
  • Usage scenarios
    • Better resource sharing
    • Cloud computing
• However, hypervisor-based technologies have traditionally been avoided in MapReduce environments
Container-based Virtualization
• A group of processes on a Linux box, put together in an isolated environment
• A lightweight virtualization layer
• Non-virtualized drivers
• Shared operating system
[Figure: side-by-side stack diagrams. Container-based virtualization: Hardware → Host OS → Virtualization Layer → Guest Processes. Hypervisor-based virtualization: Hardware → Host OS → Virtualization Layer → Guest OS (per VM) → Guest Processes.]
Container-based Virtualization
• Each container has:
  • Its own network interface (and IP address)
    • Bridged, routed, …
  • Its own filesystem
• Isolation (security)
  • Containers A and B can't see each other
• Isolation (resource usage)
  • RAM, CPU, I/O
• Current systems
  • Linux-VServer, OpenVZ, LXC
Container-based Virtualization
• Implements Linux namespaces (a minimal sketch follows below)
  • Mount – mounting/unmounting file systems
  • UTS – hostname, domain name
  • IPC – SysV message queues, semaphores, shared memory segments
  • Network – IPv4/IPv6 stacks, routing, firewall, /proc/net, sockets
  • PID – its own set of PIDs
  • Chroot acts as the filesystem namespace
• Current systems
  • Linux-VServer, OpenVZ, LXC
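To make the namespace idea concrete, here is a minimal sketch (an addition to the slides, not from them). It assumes Linux, root privileges, and Python 3.12+, where os.unshare() is exposed directly:

    # Minimal namespace sketch: give this process its own UTS namespace,
    # so hostname changes stay invisible to the rest of the system.
    # Assumes Linux, root privileges, and Python 3.12+ (os.unshare).
    import os
    import socket

    os.unshare(os.CLONE_NEWUTS)        # detach into a fresh UTS namespace
    socket.sethostname("container-a")  # visible only inside this namespace
    print(os.uname().nodename)         # prints "container-a"; host unchanged

Container systems combine several such namespaces (mount, PID, network, …) to build the isolated environment described above.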
Container-based Systems
• Linux-VServer
  • Implements its own features in the Linux kernel
  • Limits the scope of the file system for different processes through the traditional chroot
• OpenVZ
  • A kernel patch set providing containers with its own resource controllers (user beancounters)
• Linux Containers (LXC)
  • Based on cgroups (a hedged sketch follows below)
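As a rough illustration of the cgroup mechanism LXC builds on (added here, not from the slides): a manager creates a per-container cgroup and writes limits into its control files. The sketch assumes the cgroup v1 layout in use around 2014, root privileges, and a hypothetical cgroup name:

    # Hedged sketch: apply CPU and memory limits to a container via the
    # cgroup v1 filesystem. Paths and values are illustrative; needs root.
    import os

    CG = "/sys/fs/cgroup"
    name = "demo-container"  # hypothetical container/cgroup name

    for controller, knob, value in [
        ("cpu",    "cpu.shares",            "512"),         # relative CPU weight
        ("memory", "memory.limit_in_bytes", str(1 << 30)),  # 1 GiB cap
    ]:
        path = os.path.join(CG, controller, name)
        os.makedirs(path, exist_ok=True)
        with open(os.path.join(path, knob), "w") as f:
            f.write(value)

    # Move the current process into both cgroups; children inherit them.
    for controller in ("cpu", "memory"):
        with open(os.path.join(CG, controller, name, "tasks"), "w") as f:
            f.write(str(os.getpid()))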
Hypervisor- vs Container-based Systems

  Hypervisor                     Container
  -----------------------------  ------------------------
  Different kernel per guest OS  Single kernel
  Device emulation               Syscalls
  Many FS caches                 Single FS cache
  Limits per machine             Limits per process
  High performance overhead      Low performance overhead
MapReduce
• MapReduce
  • A parallel programming model
  • Simplicity, efficiency and high scalability
  • It has become a de facto standard for large-scale data analysis
• MapReduce has also attracted the attention of the HPC community
  • A simpler approach to the parallelism problem
  • A highly visible case: MapReduce has been used successfully by companies like Google, Yahoo!, Facebook and Amazon (a toy sketch of the model follows below)
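To illustrate the programming model itself (an addition to the slides, not the Hadoop API): a job is a map function that emits key/value pairs plus a reduce function that aggregates all values sharing a key. A toy, single-process word count:

    # Toy, single-process illustration of the MapReduce model (word count).
    # Real frameworks run many map/reduce tasks in parallel and shuffle
    # the intermediate pairs between machines.
    from collections import defaultdict

    def map_fn(text):
        for word in text.split():   # map: emit a (word, 1) pair per token
            yield word, 1

    def reduce_fn(pairs):
        counts = defaultdict(int)
        for word, n in pairs:       # reduce: sum the counts per key
            counts[word] += n
        return dict(counts)

    print(reduce_fn(map_fn("to be or not to be")))
    # -> {'to': 2, 'be': 2, 'or': 1, 'not': 1}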
MapReduce and Containers
• Apache Mesos
  • Shares a cluster among multiple different frameworks
  • Creates another level of resource management
  • Management is taken away from the cluster's RMS
• Apache YARN
  • Hadoop Next Generation
  • Better job scheduling/monitoring
  • Uses virtualization to share a cluster among different applications
Evaluation
• Experimental environment
  • Hadoop cluster composed of 4 nodes
  • Two processors with 8 cores (no hardware threads) per node
  • 16 GB of memory per node
  • 146 GB of disk per node
• Performance analysis
  • Through micro-benchmarks (a hedged invocation sketch follows below)
    • HDFS evaluation (TestDFSIO)
    • NameNode evaluation (NNBench)
    • MapReduce evaluation (MRBench)
  • Through macro-benchmarks (WordCount, TeraSort)
• Isolation analysis
  • Through the IBS benchmark
• At least 50 executions were performed for each experiment
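For reference, the HDFS micro-benchmark is typically launched as below (a hedged sketch, not from the slides; the test jar's file name varies across Hadoop releases):

    # Hedged sketch: run the classic TestDFSIO write test. "-nrFiles" is
    # the number of files and "-fileSize" the size of each file in MB;
    # the jar name is illustrative and release-dependent.
    import subprocess

    subprocess.run(
        ["hadoop", "jar", "hadoop-test.jar", "TestDFSIO",
         "-write", "-nrFiles", "16", "-fileSize", "1000"],
        check=True,
    )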
HDFS Evaluation
• Settings:
  • Replication factor of 3
  • File sizes from 100 MB to 3000 MB
• All container-based systems show performance similar to native
• OpenVZ's results show a loss of about 3 Mbps
  • This is due to the CFQ I/O scheduler
[Figure: HDFS throughput (Mbps) vs. file size; series: LXC, native, OpenVZ, VServer]
HDFS Evaluation
• All container-based systems obtained performance results similar to native
• Linux-VServer uses a physical (non-virtualized) network interface
[Figure: HDFS throughput (Mbps) vs. file size; series: LXC, native, OpenVZ, VServer]
NameNode Evaluation using NNBench
• The NNBench benchmark was chosen to evaluate the NameNode component
  • Generates operations on 1000 files on HDFS
• Linux-VServer reaches the lowest latency, at an average of 48 ms, while LXC obtained the worst result, at an average of 56 ms
• In absolute terms, the differences are not significant
• More importantly, no exception was observed during the high HDFS management stress, and all systems were able to respond as effectively as the native one

                     Native   LXC     OpenVZ   VServer
  Open/Read (ms)     0.51     0.52    0.51     0.49
  Create/Write (ms)  54.65    56.89   51.96    48.90
MapReduce Evaluation using MRBench
• The results obtained from MRBench show that the MapReduce layer suffers no substantial effect when running on the different container-based virtualization systems

                       Native   LXC     OpenVZ   VServer
  Execution time (ms)  14251    13577   14304    13614
Analyzing Performance with WordCount
[Figure: WordCount execution time (seconds) for Native, LXC, OpenVZ and VServer]
• 30 GB of input data
• OpenVZ's peak of performance degradation is explained by its I/O scheduler overhead
Analyzing Performance with TeraSort
[Figure: TeraSort execution time (seconds) for Native, LXC, OpenVZ and VServer]
• Standard map/reduce sort
• Steps:
  • Generate 30 GB of input data
  • Run the sort on that input (a hedged invocation sketch follows below)
• An HDFS block size of 64 MB
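For concreteness (an addition, not from the slides; the examples jar name and HDFS paths are illustrative), the two steps map onto Hadoop's standard TeraGen/TeraSort example programs. TeraGen takes the number of 100-byte rows to generate, so 30 GB is roughly 300 million rows:

    # Hedged sketch of the two TeraSort steps; jar name and paths are
    # illustrative. TeraGen's argument is the number of 100-byte rows,
    # so ~30 GB corresponds to ~300,000,000 rows.
    import subprocess

    subprocess.run(["hadoop", "jar", "hadoop-examples.jar",
                    "teragen", "300000000", "/terasort/input"], check=True)
    subprocess.run(["hadoop", "jar", "hadoop-examples.jar",
                    "terasort", "/terasort/input", "/terasort/output"], check=True)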
Performance Isolation
[Figure: isolation methodology — the execution time of a baseline application in Container A is measured first in isolation, then again while Container B runs a stress test; the two execution times yield the performance degradation (%)]
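The slides do not spell out the metric, but a natural reading of the diagram is the relative slowdown of the baseline application (a hedged reconstruction, not the authors' stated formula):

    def degradation_pct(t_baseline: float, t_under_stress: float) -> float:
        """Performance degradation (%): relative increase in the baseline
        application's execution time when the stress test runs in the
        neighboring container (hedged reconstruction of the metric)."""
        return 100.0 * (t_under_stress - t_baseline) / t_baseline

    # Example: 120 s alone vs. 130 s under stress -> ~8.3% degradation
    print(round(degradation_pct(120.0, 130.0), 1))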
Performance Isolation

        CPU   Memory   I/O    Fork bomb
  LXC   0%    8.3%     5.5%   0%
• We chose LXC as the representative container-based virtualization system to evaluate
• The per-container limits on CPU usage work well
  • No significant impact was noted
• A little performance degradation (memory and I/O) needs to be taken into account
• The fork bomb stress test reveals that LXC has a security subsystem that keeps the attack from affecting the other containers
Conclusions
• We found that all container-based systems reach near-native performance for MapReduce workloads
• The performance isolation results revealed that LXC has improved its ability to restrict resources among containers
• Although some works already take advantage of container-based systems on MR clusters, this work demonstrated the benefits of using container-based systems to support MapReduce clusters
Future Work
• We plan to study performance isolation at the network level
• We plan to study scalability while increasing the number of nodes
• We plan to study aspects of green computing, such as the trade-off between performance and energy consumption
Thank you for your attention!