Elastic Memory Management for Cloud Data Analytics

Jingjing Wang and Magdalena Balazinska

Large-Scale Cloud Data Analytics

[Diagram: systems for large-scale data: Myria, Spark, Hadoop, Impala, Giraph]

• Memory intensive
• Deployed on clusters
• Shared environments

Memory management

Container-Based Cloud Memory Management

[Diagram: a resource manager places Spark apps and Myria queries in containers]

• Hard limits: good for isolation but lack flexibility
• Estimating memory usage before execution is hard

Inaccurate Memory Estimates Affect Performance

[Figure: query time (s) versus heap size (GB, 0 to 16) for a self-join query on Myria, Spark 1.1, and Spark 2.0, with an OOM region marked]

• Application failures due to out-of-memory
• Performance degradation due to garbage collection

Our Approach: ElasticMem

• Make container memory limits dynamic
• Allocate memory to multiple applications
  – Perform actions: garbage collection, change memory limits, etc.
• Predict how memory actions affect performance
  – Use predictions to drive memory allocation decisions
• Our focus: analytical (relational) queries in Java-based containers (JVM)

Implementing Dynamic Heap Adjustment in a JVM

• OpenJDK has a rigid design:
  – Reserve heap space based on user-specified value
  – Cannot be changed during runtime
• But memory overcommitting + 64-bit address space opens up an opportunity
  – Reserve and commit a large address space
    • Does not physically occupy memory
  – Adjust limits according to actual usage
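
The idea of reserving a large region up front and moving a limit inside it can be illustrated outside the JVM. The following is a minimal Python sketch, not the OpenJDK modification itself: the class ElasticHeap and its methods are hypothetical names, and it assumes an OS that backs anonymous mappings lazily, so the reserved region costs little until it is touched.

```python
import mmap

class ElasticHeap:
    """Toy heap (hypothetical): a large reserved mapping plus a movable limit."""

    def __init__(self, reserved_bytes, initial_limit):
        # Anonymous mapping; pages are typically backed only on first touch.
        self.region = mmap.mmap(-1, reserved_bytes)
        self.reserved = reserved_bytes
        self.limit = initial_limit
        self.used = 0

    def allocate(self, nbytes):
        """Bump-allocate nbytes; refuse if it would cross the current limit."""
        if self.used + nbytes > self.limit:
            return False
        self.region[self.used:self.used + nbytes] = b"\x01" * nbytes
        self.used += nbytes
        return True

    def set_limit(self, new_limit):
        """Move the limit at runtime, staying within [used, reserved]."""
        if not self.used <= new_limit <= self.reserved:
            raise ValueError("limit must lie between used and reserved size")
        self.limit = new_limit

MIB = 1024 * 1024
heap = ElasticHeap(reserved_bytes=64 * MIB, initial_limit=1 * MIB)
first = heap.allocate(1 * MIB)   # fits exactly under the initial limit
second = heap.allocate(4096)     # refused: the limit is reached
heap.set_limit(2 * MIB)          # the equivalent of a GROW action
third = heap.allocate(4096)      # succeeds after the limit moves
```

The reserved size never changes after construction; only the limit does, which mirrors the "adjust limits according to actual usage" bullet.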

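Implementing Dynamic Heap Adjustment in a JVM

[Diagram: the resource manager sends actions to the JVM (OpenJDK 7) over a socket: change limit (GROW), GC (shrink), and kill. The heap holds live and dead objects; the used portion sits below the dynamic limit.]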

Dynamic Memory Allocation

• Problem description:
  – Multiple queries sharing memory
  – At each timestep, allocate memory by performing actions
  – Goal: reduce query times and failures
• 0-1 knapsack problem:
  – Capacity: total memory
  – Items: JVM memory usages after performing actions
  – Item value: defined on multiple attributes
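
To make the formulation concrete, here is a small brute-force sketch in Python. It is a simplification rather than the scheduler from the talk: it assumes each query gets exactly one action per timestep, collapses the multi-attribute item value into a single numeric cost, and all names and numbers are illustrative.

```python
from itertools import product

def best_allocation(options, capacity):
    """Pick one action per query so total memory fits and total cost is lowest.

    options: one list per query of (action, memory_after_gb, cost) tuples.
    Returns (total_cost, [actions]) or None if nothing fits.
    """
    best = None
    for choice in product(*options):
        memory = sum(mem for _, mem, _ in choice)
        if memory > capacity:
            continue  # exceeds total memory, so this combination is infeasible
        cost = sum(c for _, _, c in choice)
        if best is None or cost < best[0]:
            best = (cost, [action for action, _, _ in choice])
    return best

# Two queries, each with three candidate actions (made-up numbers).
options = [
    [("GROW", 6, 1), ("GC", 3, 4), ("KILL", 0, 100)],
    [("GROW", 5, 1), ("GC", 4, 3), ("KILL", 0, 100)],
]
result = best_allocation(options, capacity=10)
```

With 10 GB available, growing both queries would need 11 GB, so the sketch settles on growing the first and collecting the second. Exhaustive search is only workable for a handful of queries; it is used here for clarity.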

Dynamic Memory Allocation

[Diagram: over successive timesteps, the Spark apps and Myria queries sharing memory each receive an action: GC, GROW, PAUSE, or KILL]

Values of Actions and States

• Kill (KILL): # of killed queries, fewer is better
• Pause (NOOP): # of paused queries, fewer is better
• Cost to acquire more memory (cost)
  – Time/space efficiency

Action   Value.KILL   Value.NOOP   Value.cost
KILL     1            0            N/A
NOOP     0            1            N/A
Others   0            0            time/space

• Value of a state: sum of action values
• Lexicographic order
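
The table and the lexicographic ordering map naturally onto tuples. The sketch below is one possible encoding, with hypothetical function names; it assumes the N/A cost entries can be treated as 0 when summing, which the slide does not state.

```python
def action_value(action, time_s=0.0, space_gb=1.0):
    """Return (kills, pauses, cost) for one action, following the table."""
    if action == "KILL":
        return (1, 0, 0.0)   # cost is N/A in the table; 0.0 is an assumption
    if action == "NOOP":
        return (0, 1, 0.0)   # cost is N/A in the table; 0.0 is an assumption
    return (0, 0, time_s / space_gb)  # other actions: time per unit of space

def state_value(actions):
    """Value of a state: component-wise sum of its action values."""
    values = [action_value(*a) for a in actions]
    return tuple(sum(column) for column in zip(*values))

# State A kills one query; state B pauses two and runs a slow GC.
state_a = state_value([("KILL",), ("GROW", 0.2, 1.0)])
state_b = state_value([("NOOP",), ("NOOP",), ("GC", 3.0, 2.0)])

# Python compares tuples lexicographically: kills first, then pauses, then cost.
preferred = min(state_a, state_b)
```

Here state B wins despite two pauses and a higher cost, because avoiding a kill dominates every later component.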

Values of Actions: Time and Space

• Increase memory limit (GROW):
  – Space: estimated heap growth
    • Maximum heap usage change in the past few timesteps
  – Time: acquiring and accessing memory from OS
    • Run a calibration program
• Reclaim memory (GC actions)
  – Space: size of recycled memory
  – Time: GC time
  – How to predict them from heap states?
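
The GROW space estimate can be sketched in a few lines of Python. The window size and the function name are assumptions; the slide only says "the past few timesteps".

```python
def estimate_growth(heap_usage, window=3):
    """Largest per-timestep change in heap usage over the last `window` steps.

    heap_usage: heap usage samples, oldest first, one per timestep.
    Returns 0 when there are fewer than two samples.
    """
    recent = heap_usage[-(window + 1):]
    deltas = [later - earlier for earlier, later in zip(recent, recent[1:])]
    return max(deltas, default=0)

# Usage in MB at five consecutive timesteps (made-up numbers).
samples = [100, 150, 140, 220, 260]
growth = estimate_growth(samples, window=3)
```

For these samples the last three changes are -10, +80, and +40, so the estimate is 80 MB.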

Build Performance Models from Heap States

• Our focus: analytical (relational) queries
  – Large in-memory data structures
• Pick hash tables as our focus
  – Predict time & space for different GCs from stats

[Diagram: a query plan with two Select operators, a Join, and an Aggregate, annotated with the hash tables the operators hold (one marked ×2)]

Features of Hash Tables

[Diagram: the hash table of a Join over two Selects. Keys (zip code), long: 98195, 95054. Values (name), String: Alice, Bob, Carol, Alice.]

• # of tuples
• # of keys
• Schema information
  – # of long columns
  – # of String columns
  – Sum of lengths of Strings
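
As an illustration of the statistics listed above, the Python sketch below computes them for a handful of rows. The function name is hypothetical, Python int and str stand in for long and String, and the pairing of names to zip codes is assumed for the example, since the diagram's layout did not survive extraction.

```python
def hash_table_features(rows, key_index=0):
    """Collect the listed statistics for rows that would populate a hash table."""
    keys = {row[key_index] for row in rows}
    schema_row = rows[0] if rows else ()
    return {
        "num_tuples": len(rows),
        "num_keys": len(keys),
        "num_long_columns": sum(isinstance(v, int) for v in schema_row),
        "num_string_columns": sum(isinstance(v, str) for v in schema_row),
        "sum_string_lengths": sum(
            len(v) for row in rows for v in row if isinstance(v, str)
        ),
    }

# (zip code, name) rows; the grouping is illustrative only.
rows = [(98195, "Alice"), (98195, "Bob"), (95054, "Carol"), (95054, "Alice")]
features = hash_table_features(rows)
```

This reading sums string lengths over all stored values; a schema-level reading of "sum of lengths" would be a different, equally plausible choice.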

Features of Hash Tables

• # of tuples & # of keys: total & delta since last GC
• 7 features to collect, 4 values to predict

[Diagram: the hash table's contents (keys 98195 and 95054; values Alice, Bob, Carol) before and after a GC, next to heap layouts for a young collection and a full collection across the young and old generations. L marks live objects, D marks dead objects.]
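
One way to track totals and deltas since the last GC is sketched below in Python. The class is hypothetical, and the seven-element layout (two totals, two deltas, three schema statistics) is an assumed reading that happens to add up to seven; the slide does not enumerate the features.

```python
class GcFeatureTracker:
    """Remembers tuple and key counts at the last GC to compute deltas."""

    def __init__(self):
        self.tuples_at_gc = 0
        self.keys_at_gc = 0

    def record_gc(self, num_tuples, num_keys):
        """Call after each GC so later deltas are measured from this point."""
        self.tuples_at_gc = num_tuples
        self.keys_at_gc = num_keys

    def features(self, num_tuples, num_keys, long_cols, str_cols, str_len_sum):
        """Return the assumed 7-feature vector for the current heap state."""
        return [
            num_tuples, num_tuples - self.tuples_at_gc,
            num_keys, num_keys - self.keys_at_gc,
            long_cols, str_cols, str_len_sum,
        ]

tracker = GcFeatureTracker()
tracker.record_gc(num_tuples=1000, num_keys=200)
vector = tracker.features(1500, 260, long_cols=1, str_cols=1, str_len_sum=9000)
```

A vector like this would be the model input; the four predicted values are the live sizes and GC times listed in the evaluation.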

Evaluation: GC Models

• Model: M5P in Weka
• Training: generate hash tables with specific feature values

[Diagram: training loop: pick feature values, run a query with the generated hash table, collect stats, train Weka M5P]

Evaluation: GC Models

• Testing: 17 TPC-H queries, randomly trigger GCs

Value to predict                                               Relative absolute error (RAE)
Total size of live objects in the young generation (ylive)    23%
Total size of live objects in the old generation (olive)      6%
Time for a young generation GC (gcy)                           25%
Time for an old generation GC (gco)                            22%
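
For readers unfamiliar with the metric, the Python sketch below computes relative absolute error in its usual form: the total absolute error of the predictions divided by the total absolute error of always predicting the mean. It assumes the slide uses this standard definition, which is the one Weka reports; the numbers are made up.

```python
def relative_absolute_error(actual, predicted):
    """RAE = sum|predicted - actual| / sum|actual - mean(actual)|."""
    mean_actual = sum(actual) / len(actual)
    model_error = sum(abs(p - a) for a, p in zip(actual, predicted))
    baseline_error = sum(abs(a - mean_actual) for a in actual)
    return model_error / baseline_error

actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.0, 3.0, 5.0]
rae = relative_absolute_error(actual, predicted)
```

In this example the model is off by 1 in total and the mean predictor is off by 4, giving an RAE of 25%. Values below 100% mean the model beats the mean predictor.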

Evaluation: Scheduling

• One Amazon EC2 r3.4xlarge instance
• 4 most memory intensive TPC-H queries with scale factors 1 and 2 on Myria
• Original: OpenJDK 7u85
  – Serial execution / fully parallel
• Elastic: our approach
  – Resubmit: resubmitting killed queries serially after all queries complete

Compare to Serial: Much Less Time in Most Cases

[Figure: elapsed time (s) versus total memory (GB, 10 to 90) for Elastic-Resubmit (U=1/12), Elastic (U=1/12), and Original with DOP=1, 4, and 8; bar labels give the # of completed queries. A second panel compares Elastic-Resubmit (U=1/12) with Original, Serial.]

Compare to Fully Parallel: Fewer Failures, Less Time

[Figure: elapsed time (s) versus total memory (GB, 10 to 90) for Elastic-Resubmit (U=1/12), Elastic (U=1/12), and Original with DOP=1, 4, and 8; bar labels give the # of completed queries. A second panel compares Elastic-Resubmit (U=1/12) with Original, Fully Parallel.]

• Advantages of ElasticMem:
  – Automatically adjusts concurrency level
  – Faster query executions and fewer failures
  – Low overhead in case serial execution is necessary

GC Time Reduction: Up to 80%

[Figure: GC time reduction over fully parallel (%) versus total memory (GB, 10 to 90) for Elastic with U=1/8, U=1/12, U=1/16, U=500MB, and U=1000MB]

• Different memory increments U:
  – Fixed (U=500MB) or dynamic (U=1/12 of free space)
• When memory is abundant, careful tuning of U is not required
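
The two styles of increment can be contrasted with a short Python sketch. The function and parameter names are hypothetical, and capping a fixed step at the remaining free space is an assumption, not something the slide specifies.

```python
def grow_increment_mb(free_mb, fixed_mb=None, divisor=None):
    """Size of one GROW step, either fixed or a fraction of free space.

    Pass fixed_mb=500 for U=500MB, or divisor=12 for U=1/12 of free space.
    """
    if (fixed_mb is None) == (divisor is None):
        raise ValueError("specify exactly one of fixed_mb or divisor")
    step = fixed_mb if fixed_mb is not None else free_mb / divisor
    return min(step, free_mb)  # assumed cap: never grow past what is free

dynamic_step = grow_increment_mb(6000, divisor=12)   # 1/12 of 6000 MB free
fixed_step = grow_increment_mb(6000, fixed_mb=500)   # fixed 500 MB
capped_step = grow_increment_mb(300, fixed_mb=500)   # only 300 MB left
```

The dynamic form shrinks its steps as free memory runs low, while the fixed form keeps asking for the same amount until it hits the cap.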

Other Results

• Query time saving up to 30%
• Elastic methods use memory more efficiently

ElasticMem: Conclusion

• Scheduling with hard memory limits is inefficient
• Avoid using containers with hard limits by modifying JVM
• Design a scheduling algorithm to allocate memory across multiple applications in real time
  – Build models to predict GC time and space saving
  – Reduce query time up to 30%, GC time up to 80%, use memory more efficiently