The Art of CPU-Pinning: Evaluating and Improving the Performance of Virtualization and Containerization Platforms
Davood GhatrehSamani, Chavit Denninart, Joseph Bacik†, Mohsen Amini Salehi‡
High Performance Cloud Computing Lab (HPCC), School of Computing and Informatics, University of Louisiana Lafayette
Introduction
• Execution platforms
  1. Bare-Metal (BM)
  2. Hardware Virtualization (VM)
  3. OS Virtualization (containers, CN)
• Choosing a proper execution platform, based on the imposed overhead
  ▪ Container on top of VM (VMCN) is not studied and compared to other platforms in depth
Introduction
• Overhead behavior and trend
  ▪ Different execution platforms (BM, VM, CN, VMCN)
  ▪ Different workload patterns (CPU intensive, IO intensive, etc.)
  ▪ Increasing compute resources
  ▪ Compute tuning applied (CPU pinning)
• Cloud solution architect challenge:
  ▪ Which execution platform suits what kind of workload
Hardware Virtualization (VM)
• Operates based on a hypervisor
• VM: a process running in the hypervisor
• Hypervisor has no visibility into the VM's processes
• KVM: a popular open-source hypervisor
  ▪ AWS Nitro C5 VM type
OS Virtualization (Container)
• Lightweight OS layer virtualization
• No resource abstraction (CPU, Memory, etc.)
• Host OS has complete visibility into the container processes
• Container = namespace + cgroups
• Docker: The most widely adopted container technology
CPU Provisioning in Virtualized Platform
• Default: Time sharing
  ▪ Linux: Completely Fair Scheduler (CFS)
  ▪ All CPU cores are utilized even if there is only one VM in the host or the workload is not heavy
  ▪ Each CPU quantum: different set of CPU cores
  ▪ Called Vanilla mode in this study
• Pinned: Fixed set of CPU cores for all quanta
  ▪ Override default host/hypervisor OS scheduler
  ▪ Process is distributed only among those designated CPU cores
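As a minimal sketch of the pinned mode on a Linux host, the snippet below restricts a process to a fixed set of cores using Python's `os.sched_setaffinity`. The helper name `pin_to_cores` and the choice of cores are illustrative assumptions, and this is not necessarily how pinning was configured in the study.

```python
import os


def pin_to_cores(cores, pid=0):
    """Restrict process `pid` (0 = the calling process) to the given CPU cores.

    Linux-only: os.sched_setaffinity is not available on every platform.
    Returns the affinity set reported by the kernel after the change.
    """
    if not hasattr(os, "sched_setaffinity"):
        raise NotImplementedError("CPU affinity is not supported on this platform")
    os.sched_setaffinity(pid, set(cores))
    return os.sched_getaffinity(pid)


if __name__ == "__main__" and hasattr(os, "sched_getaffinity"):
    # Cores the scheduler may currently use for this process (vanilla mode).
    allowed = sorted(os.sched_getaffinity(0))
    # Hypothetical choice: pin to the first two usable cores.
    pinned = pin_to_cores(allowed[:2])
    print("pinned to cores:", sorted(pinned))
```

Once pinned, the scheduler only places the process on the designated cores, which corresponds to the "fixed set of CPU cores for all quanta" described above.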
Application types and measurements
• Measured performance metric
  ▪ Total Execution Time
• Overhead ratio
  ▪ $\text{Overhead ratio} = \dfrac{\text{Average execution time offered by the platform}}{\text{Average execution time of bare-metal}}$
• Performance monitoring and profiling tools
  ▪ BCC (BPF Compiler Collection: cpudist, offcputime), iostat, perf, htop, top
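The overhead ratio can be computed directly from repeated timing runs. The sketch below assumes two lists of execution times in seconds; the function name `overhead_ratio` and the sample timings are made up for illustration and are not measurements from the study.

```python
from statistics import mean


def overhead_ratio(platform_times, bare_metal_times):
    """Average execution time on a platform divided by the bare-metal average."""
    baseline = mean(bare_metal_times)
    if baseline <= 0:
        raise ValueError("bare-metal average must be positive")
    return mean(platform_times) / baseline


# Made-up timings in seconds, for illustration only.
bm_runs = [10.0, 10.0, 10.0]
vm_runs = [12.0, 13.0, 11.0]
ratio = overhead_ratio(vm_runs, bm_runs)
print(f"overhead ratio: {ratio:.2f}")
```

A ratio of 1.0 means the platform matches bare-metal; larger values indicate more imposed overhead.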
Configuration of instance types
• Host server: DELL PowerEdge R830
  ▪ 4×Intel Xeon E5-4628L v4
    o Each processor: 1.80 GHz, 35 MB cache, and 14 processing cores (28 threads)
    o 112 homogeneous cores
  ▪ 384 GB memory
    o 24×16 GB DDR4 DRAM
  ▪ RAID1 (2×900 GB HDD 10k) storage
Motivation
• In-depth study of container on top of VM (VMCN)
• Comparing different execution platforms (BM, VM, CN, VMCN) all together
• Real-life applications with different workload patterns
• Finding an overhead trend by increasing resource configurations
• Involving CPU pinning in the evaluation
Contribution to this work
• Unveiling
  ▪ PSO (Platform Size Overhead)
  ▪ CHR (Container to Host core Ratio)
• Leverage PSO and CHR to define overhead behavior pattern for
  ▪ Different resource configurations
  ▪ Different workload types
• A set of best practices for cloud solution architects
  ▪ Which execution platform suits what kind of workload
Experiment and analysis: Video Processing Workload Using FFmpeg
• FFmpeg: Widely used video transcoder
  ▪ Very high processing demand
  ▪ Multithreaded (up to 16 cores)
  ▪ Small memory footprint
  ▪ Representative of a CPU-intensive workload
• Workload:
  ▪ Function: codec change from AVC (H.264) to HEVC (H.265)
  ▪ Source video file: 30 MB HD video
  ▪ Mean and confidence interval collected across 20 executions for each platform
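The slides do not state how the confidence interval was computed. As one possible sketch, the snippet below summarizes repeated runs with a mean and a normal-approximation interval; the 95% level, the normal approximation, and the name `mean_with_ci` are all assumptions. With only 20 runs, a Student's t quantile would give a slightly wider interval.

```python
from statistics import NormalDist, mean, stdev


def mean_with_ci(samples, confidence=0.95):
    """Return (mean, (low, high)) using a normal-approximation interval."""
    if len(samples) < 2:
        raise ValueError("need at least two samples to estimate spread")
    m = mean(samples)
    # Two-sided quantile of the standard normal, e.g. about 1.96 for 95%.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * stdev(samples) / len(samples) ** 0.5
    return m, (m - half_width, m + half_width)


# Made-up execution times in seconds, for illustration only.
runs = [41.2, 40.8, 42.1, 41.5, 40.9]
avg, (low, high) = mean_with_ci(runs)
print(f"mean = {avg:.2f} s, 95% CI = [{low:.2f}, {high:.2f}]")
```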
Experiment and analysis: Parallel Processing Workload Using MPI
• MPI: Widely used HPC platform
  ▪ Multi-threaded
  ▪ Resource usage footprint highly depends on the MPI program
• Workload
  ▪ Applications: MPI_Search, Prime_MPI
  ▪ Compute intensive; however, communication between CPU cores dominates the computation
  ▪ Mean and confidence interval collected across 20 executions for each platform
Experiment and analysis: Web-based Workload Using WordPress
• WordPress
  ▪ PHP-based CMS: Apache Web Server + MySQL
  ▪ IO intensive (network and disk interrupts)
• Workload
  ▪ A simple website is set up on WordPress
  ▪ Browsing behavior of a web user is recorded
  ▪ 1,000 simultaneous web users are simulated
    o Apache JMeter
  ▪ Each experiment is performed 6 times
  ▪ Mean execution time (response time) of these web processes is recorded
NoSQL Workload using Apache Cassandra
• Apache Cassandra:
  ▪ Distributed NoSQL, Big Data platform
  ▪ Demands compute, memory, and disk IO
• Workload
  ▪ 1,000 operations within one second
    o Cassandra-stress
    o 25% Write, 75% Read
    o 100 threads, each one simulating one user
  ▪ Each experiment is repeated 20 times
  ▪ Average execution time (response time) of all the synthesized operations
Cross-Application Overhead Analysis
• Platform-Type Overhead (PTO)
  ▪ Resource abstraction (VM)
  ▪ Constant trend
  ▪ Pinning is not helpful
• Platform-Size Overhead (PSO)
  ▪ Diminished by increasing the number of CPU cores
  ▪ Specific to containers
  ▪ Only reported by IBM for Docker (WebSphere tuning)
  ▪ Pinning helps a lot
Parameters affecting PSO
1. Container Resource Usage Tracking
  • cgroups
2. Container-to-Host Core Ratio (CHR)
  • $\text{CHR} = \dfrac{\text{Cores assigned to the container}}{\text{Total number of host cores}}$
3. IO Operations
4. Multitasking
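The CHR definition is a simple ratio and can be sketched as follows. The 112-core host matches the server used in this study; the function name and the container sizes in the loop are hypothetical choices for illustration.

```python
def container_to_host_core_ratio(container_cores, host_cores):
    """CHR: cores assigned to the container over total host cores."""
    if host_cores <= 0:
        raise ValueError("host_cores must be positive")
    if not 0 < container_cores <= host_cores:
        raise ValueError("container_cores must be between 1 and host_cores")
    return container_cores / host_cores


HOST_CORES = 112  # total cores of the host server in this study

# Hypothetical container sizes, for illustration only.
for cores in (8, 16, 32, 64):
    chr_value = container_to_host_core_ratio(cores, HOST_CORES)
    print(f"{cores:>3} cores -> CHR = {chr_value:.3f}")
```

For example, a 16-core container on the 112-core host has a CHR of 16/112, roughly 0.14.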
Impact of CHR on PSO
• Lower value of CHR imposes a larger overhead (PSO)
• Application characteristics define the value of CHR
• CPU intensive
  • 0.14 < CHR < 0.28
• IO intensive: higher
  • 0.28 < CHR < 0.57
Container Resource Usage Tracking
• OS scheduler allocates all available CPU cores to the CN process
• cgroups collects usages cumulatively
  • Each scheduling event has a different CPU allocation for that CN
• cgroups is an atomic (kernel space) process
  • Container has to be suspended while aggregating resource usage
• OS scheduling enforces process migration and cgroups enforces resource usage tracking: the two effects are synergistic
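To make the cumulative tracking concrete, the sketch below parses a per-core usage line in the format of the cgroup v1 file `cpuacct.usage_percpu` (one cumulative nanosecond counter per core, space-separated). The sample values are made up, the helper name `summarize_percpu_usage` is hypothetical, and hosts running cgroup v2 expose CPU usage differently.

```python
def summarize_percpu_usage(line):
    """Summarize a space-separated line of per-core usage counters (ns)."""
    counters = [int(field) for field in line.split()]
    return {
        "total_ns": sum(counters),                        # aggregated usage
        "cores_used": sum(1 for c in counters if c > 0),  # cores that ran the CN
        "cores_seen": len(counters),                      # cores on the host
    }


# Made-up counters for an 8-core host, for illustration only.
sample = "1200 0 800 0 500 500 0 0"
summary = summarize_percpu_usage(sample)
print(summary)
```

Under the reasoning above, a vanilla container whose work migrates between cores would be expected to show usage spread across many counters, whereas a pinned container concentrates it on the designated cores.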
The Impact of IO operations on PSO
Experimental Setup: MPI task
• CPU pinning can mitigate this kind of overhead
Summary
1. Application characteristics are decisive for the imposed overhead
2. CPU pinning reduces the overhead for IO-bound applications running on containers
3. CHR plays a significant role in the overhead of containers
4. Containers may induce higher overhead in comparison to VMs
5. Containers on top of VMs (called VMCN) impose a lower overhead for IO intensive applications
Best Practices
1. Avoid small vanilla containers
2. Use pinning for CPU-bound containers
3. Not worthwhile to use pinning for CPU-bound VMs
4. Use pinning for IO intensive workloads
5. CPU intensive applications: 0.07 < CHR < 0.14
6. IO intensive applications: 0.14 < CHR < 0.28
7. Ultra IO intensive applications: 0.28 < CHR < 0.57
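The CHR ranges in the best practices above can be turned into whole-core container sizes for a given host. The sketch below is one way to do that; the dictionary keys and the function name `container_core_range` are hypothetical, and the bounds are treated as strictly exclusive, as written in the list.

```python
import math

# CHR ranges from the best practices above (exclusive bounds).
CHR_RANGES = {
    "cpu_intensive": (0.07, 0.14),
    "io_intensive": (0.14, 0.28),
    "ultra_io_intensive": (0.28, 0.57),
}


def container_core_range(host_cores, workload):
    """Smallest and largest whole core counts whose CHR lies inside the range."""
    low, high = CHR_RANGES[workload]
    min_cores = math.floor(low * host_cores) + 1   # first count with CHR > low
    max_cores = math.ceil(high * host_cores) - 1   # last count with CHR < high
    if min_cores > max_cores:
        raise ValueError("no whole core count falls inside the CHR range")
    return min_cores, max_cores


HOST_CORES = 112  # host server used in this study
for workload in CHR_RANGES:
    print(workload, container_core_range(HOST_CORES, workload))
```

On smaller hosts a range may contain no whole core count at all, which is why the function raises an error instead of silently rounding.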