Bolt: I Know What You Did Last Summer... In the Cloud

Christina Delimitrou, Cornell University ([email protected])
Christos Kozyrakis, Stanford University ([email protected])

Abstract

Cloud providers routinely schedule multiple applications per physical host to increase efficiency. The resulting interference on shared resources often leads to performance degradation and, more importantly, security vulnerabilities. Interference can leak important information ranging from a service's placement to confidential data, like private keys.

We present Bolt, a practical system that accurately detects the type and characteristics of applications sharing a cloud platform based on the interference an adversary sees on shared resources. Bolt leverages online data mining techniques that only require 2-5 seconds for detection. In a multi-user study on EC2, Bolt correctly identifies the characteristics of 385 out of 436 diverse workloads. Extracting this information enables a wide spectrum of previously-impractical cloud attacks, including denial of service attacks (DoS) that increase tail latency by 140x, as well as resource freeing (RFA) and co-residency attacks. Finally, we show that while advanced isolation mechanisms, such as cache partitioning, lower detection accuracy, they are insufficient to eliminate these vulnerabilities altogether. To do so, one must either disallow core sharing, or only allow it between threads of the same application, leading to significant inefficiencies and performance penalties.

CCS Concepts: • Security and privacy → Systems security; • Computer systems organization → Cloud computing

Keywords: cloud computing; security; interference; isolation; datacenter; latency; denial of service attack; data mining

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]. ASPLOS '17, April 08-12, 2017, Xi'an, China. © 2017 Copyright held by the owner/author(s). Publication rights licensed to ACM. ISBN 978-1-4503-4465-4/17/04. DOI: http://dx.doi.org/10.1145/3037697.3037703

1. Introduction

Cloud computing has reached proliferation by offering resource flexibility and cost efficiency [3, 1, 28]. Cost efficiency is achieved through multi-tenancy, i.e., by co-scheduling multiple jobs from multiple users on the same physical hosts to increase utilization. However, multi-tenancy leads to interference in shared resources, such as last-level caches or network switches, causing unpredictable performance [48, 16, 53]. More importantly, unmanaged contention leads to security and privacy vulnerabilities [39]. This has prompted significant work on side-channel [43, 70] and distributed denial of service (DDoS) attacks [30, 2, 70], data leakage exploitations [80, 43], and attacks that pinpoint target VMs in a cloud system [60, 78, 75, 32]. Most of these schemes leverage the lack of strictly enforced resource isolation between co-scheduled instances, and the naming conventions cloud frameworks use for machines, to extract confidential information from victim applications, such as encryption keys.

This work presents Bolt, a practical system that can extract detailed information about the type, functionality, and characteristics of applications sharing resources in a cloud system. Bolt uses online data mining techniques to quickly determine the pressure an application puts on each of several shared resources. We show that this information is sufficient to determine the framework type (e.g., Hadoop), functionality (e.g., DNN), and dataset characteristics of a co-scheduled application, as well as the resources it is most sensitive to. Bolt periodically collects statistics on the resource pressure applications incur and projects this signal against datasets from previously seen workloads. Since detection repeats periodically, Bolt accounts for changes in application behavior and can distinguish between multiple co-residents on the same physical host. We validate Bolt's detection accuracy with a controlled experiment in a 40-server cluster using virtualized instances. Out of 108 co-scheduled victim applications, including batch and real-time analytics, and latency-critical services, such as key-value stores and databases, Bolt correctly identifies the type and characteristics of 87% of workloads. We also used Bolt in a user study on Amazon EC2. We asked 20 users to launch applications of their preference on EC2 instances and not disclose any information on their type and characteristics to us. Bolt correctly labeled 277 out of 436 launched jobs, and determined the resource characteristics of 385, even when more than 5 jobs share a physical host.


The information obtained by Bolt makes several cloud attacks practical and difficult to detect. For example, we use Bolt to launch host-based DoS attacks that use information on the victim's resource sensitivity to inject carefully-crafted contentious programs that degrade the victim's performance. In the 40-server cluster, Bolt's DoS attack translates to a tail latency increase of up to 140x for interactive workloads. Unlike traditional DoS attacks that saturate compute and memory resources, Bolt maintains low CPU utilization by only introducing interference in the most critical resources, making it resilient to DoS mitigation techniques such as load-triggered VM migration. We have also used Bolt to launch resource freeing attacks (RFA) that force the victim to yield its resources to the adversary [67], and VM co-residency detection attacks that pinpoint where in a shared cluster a specific application resides [69]. We show that the information obtained through data mining is critical to escalate the impact of the attack, reduce its time and cost, and make it difficult to detect.

Finally, we examine to what degree current isolation techniques can alleviate these security vulnerabilities. We analyze bare-metal systems, containers, and virtual machines with techniques like thread pinning, memory bandwidth isolation, and network and cache partitioning. These are the main isolation techniques available today, and while they progressively reduce application detection accuracy from 81% to 50%, they are not sufficient to eliminate these vulnerabilities. We show that the only method currently available to reduce accuracy down to 14% is core isolation, where an application is only allowed to share cores with itself to avoid malicious co-residents. However, since application threads contend for shared on-chip resources, performance degrades by 34% on average. Alternatively, if we allocate more cores per application to mitigate performance unpredictability, we end up with resource underutilization and cloud inefficiency. We hope that this study will motivate public cloud providers to introduce stricter isolation solutions in their platforms, and systems architects to develop fine-grain isolation techniques that provide strong isolation and performance predictability at high utilization.

2. Related Work

Performance unpredictability is a well-studied problem in public clouds that stems from platform heterogeneity, resource interference, software bugs, and load variation [12, 47, 62, 18, 17, 20, 54, 35, 21, 62, 59, 37]. A lot of recent work has proposed isolation techniques to reduce unpredictability by eliminating interference [58, 61, 45, 65, 64]. With Bolt, we show that unpredictability due to interference also hides security vulnerabilities, since it enables an adversary to extract information about an application's type and characteristics. Below, we discuss related work with respect to cloud vulnerabilities, such as VM placement detection, DDoS, and side-channel attacks.

VM co-residency detection: Cloud multi-tenancy has motivated a line of work on locating a target VM in a public cloud. Ristenpart et al. [60] showed that the IP machine naming conventions of cloud providers allowed adversarial users to narrow down where a victim VM resided in a cluster. Xu et al. [74] and Herzberg et al. [32] extended this study, resulting in part in cloud providers changing their naming conventions, reducing the effectiveness of network topology-based co-residency attacks. Following this development, Varadarajan et al. [69] evaluated the susceptibility of three cloud providers to VM placement attacks, and showed that techniques like virtual private clouds (VPC) render some of them ineffective. Similarly, Zhang et al. [78] designed HomeAlone, a system that detects VM placement by issuing side-channels in the L2 cache during periods of low traffic. Finally, Han et al. [31] proposed VM placement strategies that defend against placement attacks, although they are not specifically geared towards public clouds. With Bolt, we show that leveraging simple data mining techniques on the pressure applications introduce on shared resources increases the accuracy of VM co-residency detection significantly. Bolt does not rely on knowing the cloud's network topology or host IPs, making it resilient against mitigation techniques such as VPCs.

Performance attacks: Once a victim application is located, an adversary can negatively affect its performance. Distributed Denial of Service attacks [51, 23, 67, 34] in the cloud have increased in number and impact over the past years. This has generated a lot of interest in detection and prevention techniques [55, 14, 30]. Bakshi et al. [2], for example, developed a system that detects abnormally high network traffic that could signal an upcoming DDoS attack, while Crosby et al. [13] and Edge [22] proposed a new DoS attack relying on algorithmic complexity that drives CPU usage up. Finally, resource-freeing attacks (RFAs) also hurt a victim's performance, while additionally forcing it to yield its resources to the adversary [67]. While RFAs are effective, they require significant compute and network resources, and are prone to defenses such as live VM migration that cloud providers introduce to mitigate performance unpredictability due to resource saturation. In contrast, Bolt launches host-based attacks on the same machine as the victim that take advantage of the victim's resource sensitivity and keep resource utilization moderate, therefore evading defense mechanisms.

Side-channel attacks: There are also attacks that attempt to extract confidential information from co-scheduled applications, such as private keys [81, 42, 76, 4]. Zhang et al. [80] proposed a system that launches side-channel attacks in a virtualized environment, and cross-tenant side-channel attacks in PaaS clouds [79]. Wu et al. [73], on the other hand, used the memory bus of an x86 processor to launch a covert-channel attack and degrade the victim's performance. On the defense side, Perez-Botero et al. [56] analyzed the vul-

[Figure 1: Overview of the application detection process. Bolt first measures the pressure co-residents place in shared resources (~5 sec of microbenchmark profiling), then uses data mining (SVD + SGD, ~50 msec, followed by Pearson similarity scoring, ~30 msec) to determine the type and characteristics of co-scheduled applications.]

nerabilities of common hypervisors, and Wang et al. [70] proposed a system for intrusion detection in cloud settings, while Liu et al. [43] and Varadarajan et al. [68] designed scheduler-based defenses against covert- and side-channel attacks in the memory bus. The latter system controls the overlapping execution of different VMs and injects noise in the memory bus to prevent an adversary from extracting confidential information. Bolt does not rely on accurate microarchitectural event measurements through performance counters to detect application placement, and is therefore resilient to techniques that limit the fidelity of time-keeping and performance monitoring to thwart information leakage in side-channel attacks [49].

3. Bolt

3.1 Threat Model

Bolt targets IaaS providers that operate public clouds for mutually untrusting users. Multiple VMs can be co-scheduled on the same server. Each VM has no control over where it is placed, and no a priori information on other VMs on the same physical host. For now, we assume that the cloud provider is neutral with respect to detection by adversarial VMs, i.e., it does not assist such attacks or employ additional resource isolation techniques beyond what is available by default to hinder attacks by adversarial users. In Section 6 we explore how additional isolation techniques affect Bolt's detection accuracy.

Adversarial VM: An adversarial VM has the goal of determining the nature and characteristics of any applications co-scheduled on the same physical host, and negatively impacting their performance. Adversarial VMs start with zero knowledge of co-scheduled workloads.

Friendly VM: This is a normal VM scheduled on a physical host that runs one or more applications. Friendly VMs do not attempt to determine the existence and characteristics of other co-scheduled VMs. They also do not employ any schemes to prevent detection, such as memory pattern obfuscation [27].

3.2 Application Detection

Detection relies on inferring workload characteristics from the contention the adversary experiences in shared resources. For simplicity, we assume one co-scheduled victim job for now, and generalize in Section 3.3. Figure 1 shows an overview of the system's operation.

Bolt instantiates an adversarial VM on the same physical host as the victim (friendly) VM. Bolt uses its VM to run a few microbenchmarks of tunable intensity that each put progressively more pressure on a specific shared resource. The resources stressed by the microbenchmarks include on-chip resources, such as functional units and different levels of the cache hierarchy, and off-chip resources, such as the memory, network, and storage subsystems [15]. Each microbenchmark progressively increases its intensity from 0 to 100% until it detects pressure from the co-scheduled workload, i.e., until the microbenchmark's performance is worse than its expected value when running in isolation. The intensity of the microbenchmark at that point captures the pressure the victim incurs in shared resource i, and is denoted ci, where i ∈ [1, N], N = 10, and ci ∈ [0, 100]. The 10 resources are: L1 instruction and L1 data cache, L2 and last-level cache, memory capacity and memory bandwidth, CPU, network bandwidth, disk capacity, and disk bandwidth. Large values of ci imply high pressure in resource i. For unconstrained resources, e.g., the last-level cache, 100% pressure means that the benchmark takes over the entire resource capacity. For resources constrained through partitioning mechanisms, e.g., memory capacity in VMs, 100% corresponds to taking over the entire memory partition allocated to the VM.

Bolt uses 2-3 microbenchmarks for profiling, requiring approximately 2-5 seconds in total. It randomly selects one core and one uncore benchmark to get a more representative snapshot of the co-resident's resource profile. If the core is not shared between the adversarial and friendly VMs, Bolt measures zero pressure in the shared core resource. It then adds a third microbenchmark for an additional uncore resource (shared caches, memory, network, or storage).
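The intensity-ramp probe described above can be sketched as follows. This is a minimal illustration, not Bolt's implementation: `stress` stands in for one round of a real microbenchmark at a given intensity, and `baseline_latency` for its calibrated performance in isolation.

```python
import time

def probe_pressure(stress, baseline_latency, tolerance=1.10, steps=20):
    """Ramp a microbenchmark's intensity from 0 to 100% until its measured
    performance degrades past the isolated-run expectation; the intensity
    at that point approximates the co-resident's pressure c_i."""
    for step in range(steps + 1):
        intensity = int(100 * step / steps)
        start = time.perf_counter()
        stress(intensity)                       # one round at this intensity
        latency = time.perf_counter() - start
        if latency > tolerance * baseline_latency(intensity):
            return intensity                    # contention detected here
    return 100                                  # resource fully available to us
```

The `tolerance` slack avoids flagging measurement noise as contention; Bolt's actual threshold is not specified here, so 10% is an illustrative choice.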

We use this sparse signal to determine the application's pressure in all other resources, its type, and characteristics. Online data mining techniques have recently been shown to effectively solve similar problems in cluster management by finding similarities between new, unknown applications and previously-seen workloads [19]. The Quasar cluster manager, for example, used collaborative filtering to find similarities in terms of heterogeneity, interference sensitivity, and provisioning. The advantage of collaborative filtering is that it can identify application similarities without a priori knowledge of application types and critical features. Unfortunately, this means that it is also unable to label the victim workloads and classify their functionality, making it insufficient for the adversary's purpose.

Practical data mining: Bolt instead feeds the profiling signal to a hybrid recommender using feature augmentation [9, 25] that determines the type and resource characteristics of victim workloads. The recommender combines collaborative filtering and content-based similarity detection [9, 29]. The former has good scaling properties, relaxed sparsity constraints, and offers conceptual insight on similarities, while the latter exploits contextual information for accurate resource profile matching.

First, a collaborative filtering system recovers the pressure the victim places in non-profiled resources [16, 83]. The system relies on matrix factorization with singular value decomposition (SVD) [72] and PQ-reconstruction with stochastic gradient descent (SGD) [6, 38] to find similarities between the new victim application and previously-seen workloads. SVD produces three matrices: U, Σ, and V. The singular values σi in Σ correspond to similarity concepts, such as the intensity of compute operations, or the correlation between high network and disk traffic. Similarity concepts are in the order of the number of examined resources, and are ordered by decreasing magnitude in Σ. Large singular values reveal stronger similarity concepts (higher confidence in workload correlation), while smaller values correspond to weaker similarity concepts, and are typically discarded during the dimensionality reduction by SVD. Matrix U(m×r) captures the correlation between each application and similarity concept, and V(n×r) the correlation between each resource and similarity concept.
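As a rough illustration of this SVD step (not Bolt's code; the toy utility matrix below is invented), one can factorize the application-by-resource pressure matrix and keep only the similarity concepts that preserve 90% of the energy:

```python
import numpy as np

# Toy utility matrix: rows = previously-seen applications, columns = shared
# resources, entries = measured pressure in [0, 100] (values are made up).
M = np.array([[80., 10., 60.,  5.],
              [75., 15., 55., 10.],
              [10., 90., 20., 70.],
              [ 5., 85., 25., 65.]])

U, s, Vt = np.linalg.svd(M, full_matrices=False)

# Keep the r strongest similarity concepts that preserve 90% of the total
# energy (sum of squared singular values), as in the paper's footnote.
energy = np.cumsum(s**2) / np.sum(s**2)
r = int(np.searchsorted(energy, 0.90)) + 1

# Low-rank approximation: U holds application-to-concept correlations,
# Vt holds resource-to-concept correlations.
M_r = U[:, :r] @ np.diag(s[:r]) @ Vt[:r]
```

The residual Frobenius error of `M_r` is bounded by the discarded energy, which is what makes truncation safe for similarity matching.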

Once we have identified critical similarity concepts and discarded inconsequential information, a content-based system uses weighted Pearson correlation coefficients to compute the similarity between the resource profile uj of a new application j and the applications the system has previously seen. Weights correspond to the values of the r most critical similarity concepts,1 and resource profiles to the rows of the matrix of left singular vectors U. Using traditional Pearson correlation would discard the application-specific information that certain resources are more critical for a given workload. Similarity between applications A and B is given by:

Weighted Pearson(A, B) = cov(u_A, u_B, σ) / √( cov(u_A, u_A, σ) · cov(u_B, u_B, σ) )    (1)

where u_A is the correlation of application A with each similarity concept σ_i, and cov(u_A, u_B, σ) = Σ_i σ_i (u_{A,i} − m(u_A, σ)) (u_{B,i} − m(u_B, σ)) / Σ_i σ_i,

1 We keep the r largest singular values, such that we preserve 90% of the total energy: Σ_{i=1}^{r} σ_i² = 90% · Σ_{i=1}^{n} σ_i².
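Equation (1) can be computed directly; the following sketch (ours, not the paper's code) mirrors the weighted covariance and weighted mean definitions above:

```python
import numpy as np

def weighted_pearson(uA, uB, sigma):
    """Weighted Pearson similarity from Eq. (1): sigma holds the r largest
    singular values, uA and uB are rows of the left singular matrix U."""
    sigma = np.asarray(sigma, dtype=float)

    def wmean(u):
        # m(u, sigma): weighted mean of u under weights sigma
        return np.sum(sigma * u) / np.sum(sigma)

    def wcov(u, v):
        # cov(u, v, sigma): weighted covariance under weights sigma
        return np.sum(sigma * (u - wmean(u)) * (v - wmean(v))) / np.sum(sigma)

    return wcov(uA, uB) / np.sqrt(wcov(uA, uA) * wcov(uB, uB))
```

With uniform weights this reduces to the ordinary Pearson coefficient, which is a quick sanity check for the implementation.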

[Figure 2: Correlation between the pressure in various resources (L1-i and L1-d cache, L2 and last-level cache, CPU, memory capacity and bandwidth, disk capacity and bandwidth, network bandwidth; each axis 0-100%) and the probability (0.0-1.0) of a co-scheduled application being memcached. For example, workloads with high LLC pressure and very high L1-i pressure have a high probability of being memcached.]

the covariance of A and B under weights σ, and m(u_A, σ) = Σ_i σ_i · u_{A,i} / Σ_i σ_i the weighted mean for A. The output of the hybrid recommender is a distribution of similarity scores of how closely a victim resembles different previously-seen applications. For example, a victim may be 65% similar to a memcached workload, 18% similar to a Spark job running PageRank, 10% similar to a Hadoop job running an SVM classifier, and 3% to a Hadoop job running k-means. The 95th percentile of the recommender's end-to-end latency is 80 msec. Apart from application labels, this analysis yields information on the resources the victim is sensitive to, enabling several practical performance attacks (Section 5).

System insights from data mining: Before dimensionality reduction, each similarity concept corresponds to a shared resource. Different resources convey different amounts of information about a workload, and thus have different value to Bolt for detection. The magnitude of each similarity concept reflects how strongly it captures application similarities. For the controlled experiment of Section 3.4, which involves batch and interactive jobs, the resources with most value are the last-level and L1-i caches, followed by compute intensity and memory bandwidth. While the exact order depends on the application mix, correlating similarity concepts to resources shows that certain resources are more prone to leaking information about a workload, and their isolation should be prioritized.

Finally, we show how resource pressure correlates to the probability that an application is of a specific type. Figure 2 shows the probability that an unknown workload is a memcached instance with a read-mostly load and KB-range values, as a function of its measured resource pressure (details on methodology in Section 3.4). We decouple the 10 resources in 2D plots for clarity. From the heatmaps it becomes clear that cache activity is a very strong indicator of workload type, with applications with very high L1-i and high LLC pressure corresponding to memcached with a high probability. Disk traffic also conveys a lot of information, with zero disk usage signaling a memcached workload with very high likelihood. Similar graphs can be created to fingerprint other application types. The high-heat areas around the red regions of each graph correspond to memcached jobs with different rd:wr ratios and value sizes, and memory-bound workloads like Spark.
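Heatmaps like those in Figure 2 are essentially conditional probabilities estimated over a labeled training set. A toy version for one 2-D slice, using entirely synthetic data (the cluster centers and counts are invented), could be built as:

```python
import numpy as np

# Synthetic training profiles: 200 "memcached" points clustered at high
# LLC/L1-i pressure, plus 200 other workloads spread uniformly.
rng = np.random.default_rng(0)
llc   = np.concatenate([rng.normal(80, 5, 200), rng.uniform(0, 100, 200)])
l1i   = np.concatenate([rng.normal(90, 5, 200), rng.uniform(0, 100, 200)])
is_mc = np.concatenate([np.ones(200), np.zeros(200)])

# Estimate P(memcached | llc bin, l1i bin) by binning both classes.
bins = np.linspace(0, 100, 5)
hits, _, _  = np.histogram2d(llc[is_mc == 1], l1i[is_mc == 1], bins=[bins, bins])
total, _, _ = np.histogram2d(llc, l1i, bins=[bins, bins])
prob = np.divide(hits, total, out=np.zeros_like(hits), where=total > 0)
```

The high-pressure corner of `prob` comes out close to 1 while the low-pressure corner stays near 0, mirroring the "high LLC + very high L1-i implies memcached" pattern the paper reads off its heatmaps.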

3.3 Challenges

Multiple co-residents: When a single application shares a host with Bolt, detection is straightforward. However, cloud operators colocate VMs on the same physical host to maximize the infrastructure's utility, and disentangling the contention caused by several jobs is challenging. By default, Bolt uses two benchmarks - one core and one uncore - to measure resource pressure. If the recommender cannot determine the type of co-scheduled workload based on them (all Pearson correlation coefficients below 0.1), either the application type has not been seen before, or the resource pressure is the result of multiple co-residents. If at least one of the co-residents shares a core with Bolt, i.e., the core benchmark returns non-zero pressure, we profile with an additional core benchmark and use it to determine the type of co-runner. Because hyperthreads are not shared between multiple active instances, this allows accurately measuring pressure in core resources. The remainder of the uncore interference is used to determine the other co-scheduled workloads. This assumes a linear relationship between jobs for the resources of memory bandwidth and I/O bandwidth, which can introduce some inaccuracies, but in practice does not significantly impact Bolt's detection ability (see Section 3.4).
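The linearity assumption for bandwidth-like resources suggests a simple decomposition: once one co-runner is identified, subtract its known profile from the measured uncore pressure to estimate what the remaining tenants contribute. The helper below is our own illustrative sketch, not code from the paper:

```python
import numpy as np

def residual_uncore_pressure(measured, identified_profile):
    """Assuming (as Bolt does for memory/I/O bandwidth) that co-residents'
    pressure adds roughly linearly, subtract the profile of the co-runner we
    already identified to estimate the pressure from remaining tenants."""
    residual = np.asarray(measured, float) - np.asarray(identified_profile, float)
    return np.clip(residual, 0.0, 100.0)   # pressure is defined on [0, 100]
```

The clipping acknowledges the inaccuracy the paper mentions: when contention is not actually additive, the naive subtraction can go negative.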

Finally, there are occurrences where none of the co-scheduled applications share a core with Bolt. In this case, the system must rely solely on uncore resource pressure to detect applications. To disentangle the resource characteristics across co-residents, Bolt employs a shutter profiling mode, which involves frequent, brief profiling phases (10-50 msec each) in uncore resources. The goal of this mode is to capture at least one of the co-scheduled workloads during a low-pressure phase, which would reveal the resource usage of only one of the co-residents, much like a high-speed camera shutter attempts to capture a fast-moving object still. Figure 3 shows how shutter profiling works. When the shutter is open (upper left figure), thus profiling is off, Bolt cannot differentiate between the two victim VMs. When the shutter is closed during profiling, Bolt checks for changes in uncore resource pressure. If both victims are at high load, differentiating between them is difficult (lower left); similarly when both VMs are at low load (lower right). However, when only VM2 is at low load, the pressure Bolt captures is primarily from VM1, allowing the system to detect its type, and, based on the remaining pressure, also detect VM2. This technique is particularly effective for user-interactive applications that go through intermittent phases of low load; it is less effective for mixes of services with constant steady-state load, such as long-running analytics or logging services. We will

[Figure 3: Shutter profiling mode.]

consider whether additional input signals, such as per-job cache miss rate curves, can improve detection accuracy for the latter workloads.

Application phases: Datacenter applications are notorious for going through multiple phases during their execution. Online services in particular follow diurnal patterns, with high load during the day and low load at night [3, 50, 44]. Moreover, an application may not be immediately detected, especially during its initialization phase. Cloud users may purchase instances to run a service, and then maintain them to execute other applications, e.g., analytics over different datasets. In these cases, the results of Bolt's detection can become obsolete over time. We address this by periodically repeating the profiling and detection steps for co-scheduled workloads. Each iteration takes 2-5 seconds for profiling, and a few milliseconds for the recommender's analysis. Section 3.4 shows a sensitivity study on how frequently detection needs to happen.
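The periodic re-detection described above amounts to a simple loop; `profile` and `classify` below are placeholders for the microbenchmark phase and the recommender, and the period is a tunable the paper studies rather than a fixed value:

```python
import time

def detection_loop(profile, classify, period_s=60, rounds=3):
    """Periodically re-profile and re-classify co-residents so detection
    results do not go stale as applications change phase.
    (Names and period are illustrative, not from the paper.)"""
    history = []
    for _ in range(rounds):
        signal = profile()              # ~2-5 s of microbenchmark probing
        history.append(classify(signal))  # ~milliseconds of recommender work
        time.sleep(period_s)
    return history
```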

3.4 Detection Accuracy

We first evaluate Bolt's detection accuracy in a controlled environment where all applications are known. We use a 40-machine cluster with 8-core (2-way hyperthreaded) Xeon-class servers, and schedule a total of 108 workloads, including batch analytics in Hadoop [46] and Spark [77], and latency-critical services, such as webservers, memcached [26], and Cassandra [10]. For each application type there are several different workloads with respect to algorithms, framework versions, datasets, and input load patterns. Friendly applications are scheduled using a least-loaded (LL) scheduler that allocates resources on the machines with the most available compute, memory, and storage. All workloads are provisioned for peak requirements to reduce interference. Even so, interference between co-residents exists, since the scheduler does not account for the sensitivity applications have to contention. At the end of this section, we evaluate how a scheduler that accounts for cross-application interference affects detection accuracy. The training set consists of 120 diverse applications that include webservers, various analytics algorithms and datasets, and several key-value stores and databases. The training set is selected to provide sufficient coverage of the space of resource characteristics. Figure 4 shows the pressure jobs in the training set put on compute, memory, storage, and network bandwidth.

Figure 4: Coverage of resource characteristics for applications in the training set. (Scatter plots of CPU pressure vs. memory pressure, and network pressure vs. storage pressure, each 0-100%.)

Applications    LL scheduler    Quasar scheduler
Aggregate            87               89
memcached            78               80
Hadoop               92               92
Spark                85               86
Cassandra            90               89
SPEC CPU2006         84               85

Table 1: Bolt's detection accuracy (%) in the controlled experiment with the least-loaded (LL) scheduler and Quasar.

While there are clusters of applications that saturate several resources, the selected workloads cover the majority of the resource usage space. This enables Bolt to match any resource profile to information from the training set, even if the new application has not been previously seen. Increasing the training set size further did not improve detection accuracy. Finally, note that there is no overlap between training and testing sets in terms of algorithms, datasets, and input loads.

In each server we instantiate an adversarial VM running Ubuntu 14.04. By default the VM has 4 vCPUs (2 physical cores) to generate enough contention (Fig. 10 shows a sensitivity analysis on the VM size). The remainder of each machine is allocated to one or more friendly VMs. The maximum number of friendly VMs, 5 in our setting, depends on the available servers. Friendly VMs run Ubuntu 14.04 or Debian 8.0. Adversarial VMs have no a priori information on the number and type of their co-residents. Applications are allowed to share a physical core but must be on different hyperthreads (vCPUs). While sharing a single hyperthread is common in private deployments hosting batch jobs in small containers [7], it is uncommon in public clouds, where 1 vCPU is the minimum guaranteed size of dedicated resources for non-idling instances.

Table 1 shows Bolt's detection accuracy per application class and in aggregate. We signal a detection as correct if Bolt correctly identifies the framework (e.g., Hadoop) or service (e.g., memcached) the application uses, and the algorithm, e.g., SVM on Hadoop, or user load characteristics, e.g., read- vs. write-heavy load. We do not currently examine more detailed features, such as the distribution of individual query types. Bolt correctly identifies 87% of jobs, and for certain application classes, like databases and analytics, the accuracy

Figure 5: Star charts showing the resource profiles of two Hadoop jobs and one unknown job across ten resources (CPU, L1-i, L1-d, L2, LLC, MemCap, MemBW, NetBW, DiskCap, DiskBW). The closer the shaded area is to a vertex, the higher the pressure on the specific resource. The unknown job's similarity to the two Hadoop profiles is 0.29 and 0.78, respectively.

Figure 6: Detection accuracy as a function of the number of co-runners (left) and of the applications' dominant resource (right: L1-i, L1-d, L2, LLC, CPU, MemC, MemBw, Net, DiskBw, DiskC; the number on each bar shows how many applications have that resource as dominant).

exceeds 90%. Misclassified jobs are typically identified as workloads with the same or similar critical resources.

Per-application profiles: Note that although in Table 1 we group applications by programming framework or online service, each framework does not correspond to a single resource profile. Profiles of applications within a framework can vary greatly depending on functionality, complexity, and dataset features. Bolt's recommender system matches resource profiles to specific algorithmic classes and dataset characteristics within each framework. Figure 5 shows an example where two Hadoop jobs, one running word count on a small dataset and one running a recommender system on a very large dataset, exhibit very different resource profiles. While the third application is also a Hadoop job, it is identified as very similar to the recommender, as opposed to word count.

Number of co-residents: The number of victim applications affects detection accuracy. Figure 6a shows accuracy as a function of the number of victims per machine. When the number of co-residents is less than 2, accuracy exceeds 95%. As the number increases, accuracy drops, since it becomes progressively more difficult to differentiate between the contention caused by each workload. When there are 5 co-scheduled applications, accuracy is 67%. Interestingly, accuracy is higher for 4 than for 3 applications, since with 4 workloads the probability of sharing a core, and thus getting an accurate measurement of core pressure, is higher. While aggressive multi-tenancy affects accuracy, large numbers of co-residents in public clouds are unlikely in practice [41].
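The similarity scores in Figure 5 can be computed, for instance, as the cosine similarity between two resource-pressure vectors over the ten profiled resources. This is an illustrative sketch; the paper does not specify the exact metric Bolt's recommender uses:

```python
import math

RESOURCES = ["CPU", "L1-i", "L1-d", "L2", "LLC",
             "MemCap", "MemBW", "NetBW", "DiskCap", "DiskBW"]

def cosine_similarity(a, b):
    """Similarity of two resource-pressure profiles, given as dicts of
    resource -> pressure in [0, 1]; 1.0 means identical shape."""
    va = [a.get(r, 0.0) for r in RESOURCES]
    vb = [b.get(r, 0.0) for r in RESOURCES]
    dot = sum(x * y for x, y in zip(va, vb))
    na = math.sqrt(sum(x * x for x in va))
    nb = math.sqrt(sum(x * x for x in vb))
    return dot / (na * nb) if na and nb else 0.0
```

Under such a metric, two jobs stressing the same resources score near 1.0 even if their absolute pressures differ, which matches the intuition that profile shape, not magnitude, identifies the workload class.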

Dominant resource: We now examine the correlation between the resource an application puts the most pressure on and Bolt's ability to correctly detect it (Figure 6). The numbers on each bar show how many applications have each resource as dominant. In general, the applications that are most easily detected are those with high instruction cache (L1-i), memory bandwidth, network bandwidth, and disk capacity pressure. These are workloads that fall in one of the following categories: latency-critical services with large codebases, such as webservers; in-memory analytics, like Spark; and disk-bound analytics, like Hadoop. Interestingly, L2 activity is a poor indicator of application type, in contrast to L1 and LLC, since it does not capture a significant change in the working set size (from 32KB to 256KB for our platforms).

Number of iterations: In this controlled experiment we stop the detection process upon correct identification of a workload. For several jobs this requires more than one iteration of profiling and data mining. Figure 7a shows the fraction of workloads that were correctly identified after N iterations. 71% of victims require only a single iteration, while an additional 15% required a second. A small fraction of applications needed more than two iterations, while jobs that were not identified correctly by the sixth iteration did not benefit from additional iterations. The number of applications per machine also affects the iterations required for detection (Figure 7b). When a single job is scheduled on a machine, one iteration is sufficient for almost all workloads. As the number of victims per machine increases, additional iterations are needed to disentangle the contention signals of each co-scheduled job.

Figure 8 shows a case of detection over several iterations. The victim is a 4-vCPU instance executing different consecutive jobs. Detection occurs every 20 sec by default (see Fig. 10 for a sensitivity study on profiling frequency). After the initial iteration, Bolt detects the job as mcf from SPEC CPU2006. In the next two iterations, the interference profile of the victim continues to fit mcf's characteristics. At t=60sec, Bolt detects that the job profile has changed and now resembles an SVM classifier running on Mahout [46]. Similarly, at t=180sec, the victim changes again, to a Spark data mining workload. Changes are typically captured within a few seconds.

Resource pressure: Figure 9 shows the correlation between the pressure of a victim in a given resource and Bolt's ability to correctly detect it. We plot detection accuracy for three core and three uncore resources; the results are similar for the remaining four. In almost all cases, very low or very high pressure carries the most value for detection. For disk bandwidth, accuracy remains high except for the 20-50% region, which contains many application classes with moderate disk activity, including analytics in Hadoop, databases, and graph applications. Resource pressure also affects the number of iterations Bolt needs to correctly identify a workload. For resources with a lot of value for detection, such as

Figure 7: PDFs (%) of the iterations (1-6) required for detection in total (left) and as a function of the number of co-residents (right, 1-5 apps).

Figure 8: Example of workload phase detection by Bolt. Resource pressure (%) over time (min) as the victim runs SPEC, Hadoop, Spark, memcached, and Cassandra jobs in sequence; one curve per resource (L1-i, L1-d, L2, LLC, CPU, MemCap, MemBW, NetBW, DiskCap, DiskBW).

the L1-i and LLC, when pressure is moderate, several iterations are needed for accuracy to increase. In contrast, for off-chip resources like network bandwidth, this effect is less pronounced. In general, when moderate pressure hurts detection accuracy (Figure 9), more iterations are needed for correct identification.

Scheduler: So far we have used a least-loaded scheduler, which is commonly used in datacenters [33, 63]. This scheduler does not account for resource contention between applications, leading to suboptimal performance [16, 24, 5, 84, 82]. Recent work has shown that if interference is accounted for at the scheduler level, both performance and utilization improve [48, 19, 53]. We now evaluate the impact of such a scheduler on Bolt's detection accuracy. Quasar [19] leverages machine learning to quickly determine which applications can be co-scheduled on the same machine without destructive interference. We use Quasar to schedule all victim applications and then inject Bolt in each physical host for detection. In a real setting, Bolt can trick an interference-aware scheduler by incurring very low levels of interference initially, until it is co-scheduled with a victim application and can determine its type and characteristics.

Table 1 shows Bolt's accuracy with the least-loaded scheduler (LL) and Quasar. Interestingly, accuracy increases slightly with Quasar, 2% on average, but in general the impact of the scheduler is small. There are two reasons for the increase: first, neither LL nor Quasar shares a single hyperthread between jobs, so core pressure measurements remain accurate. Second, because Quasar only co-schedules jobs with different critical resources, it provides Bolt with a less "noisy" interference signal for uncore resources, making distinguishing co-residents easier. Therefore, reducing interference in software alone is not sufficient to mitigate these security threats.

Figure 9: Detection accuracy (%) as a function of the pressure (%) victims place in various shared resources (left: L1-i, LLC, CPU; right: MemCap, NetBW, DiskBW).

Figure 10: Sensitivity of detection accuracy (%) to (a) profiling interval (sec), (b) adversarial VM size (vCPUs), and (c) number of profiling benchmarks.

Sensitivity analysis: Finally, we examine how design decisions in Bolt affect detection. Figure 10a shows how accuracy changes as profiling frequency decreases. For profiling intervals beyond 30 sec, accuracy drops rapidly. If profiling only occurs every 5 minutes, almost half the victims are incorrectly identified. Note that if workloads are long-running and mostly stable, profiling can be less frequent without such an impact on accuracy. Figure 10b shows accuracy as a function of the adversarial VM's size. We examine sizes offered as on-demand instances by EC2 [1]. If the adversary has fewer than 4 vCPUs, its resources are insufficient to create enough contention to capture the co-residents' pressure. Accuracy continues to grow for larger sizes; however, the larger the adversarial instance is, the less likely it is to be co-scheduled with other VMs, nullifying the value of Bolt. Finally, Figure 10c shows accuracy as a function of the number of microbenchmarks used for profiling. A single benchmark is not sufficient to fingerprint the characteristics of a workload; however, using more than 3 benchmarks has diminishing returns in accuracy. Unless otherwise specified, we use 20-sec profiling intervals, 4-vCPU adversarial VMs, and 2 benchmarks for initial profiling.

3.5 Limitations

While Bolt can accurately detect most frequently-run workloads, it has certain limitations. First, when no job shares a core with the adversary, the system assumes a linear relation between the co-residents' resource pressure in uncore resources. While this is true in some cases, it may not be generally accurate. We will explore alternative ways of distinguishing co-residents in future work. Second, the system cannot differentiate between multiple jobs running in a single instance and multiple instances running one job each. While this does not affect detection, if a covert-channel attack is aimed at a particular user, more accurate information would increase the adversary's leverage. Similarly, Bolt cannot currently distinguish between two copies of the same job sharing a platform at low load and one copy of the job that runs at higher load.

4. Cloud User Study

We now evaluate Bolt in a real-world environment with multiple users and unknown applications. We conducted a multi-user study on Amazon EC2, subject to the following rules:

• We engaged 20 independent users from two academic institutions (Cornell University and Stanford University), provided them with access to a shared cluster of 200 c3.8xlarge instances on EC2, with 32 vCPUs and 60GB of memory each, and asked them to submit several applications of their choosing.

• On each instance we maintain a 4-vCPU VM for Bolt. All other resources are made available to the users as VMs.

• Each user submits one or more applications, and all users have equal priority, i.e., no application will be preempted to accommodate the job of another user.

• Users were instructed to behave amenably, i.e., to avoid consuming disproportionate resources, so that the system can observe a representative job sample from all users.

• Users were also advised to pin their applications to specific cores to eliminate interference from the OS scheduler. Physical cores can be shared across jobs; however, each vCPU (hardware thread) is dedicated to a single application.

• Users can optionally select the instance(s) they want to launch their applications on. We provide a spreadsheet with a list of all available instances and their utilization at each point in time. This information, as well as any a priori information on the type, number, and characteristics of submitted applications, is concealed from Bolt.

• If a user does not select an instance, a least-loaded scheduler selects the VM with the most available cores and memory.

Once users start launching jobs in the cluster, we instantiate Bolt on each machine and activate it periodically to detect the type, characteristics, and number of co-scheduled applications. The entire experiment lasts approximately 4 hours, beyond which point all instances are terminated. Figure 11 shows the PDF of application types the different users launched, provided by the users after the completion of the experiment. Different colors correspond to different users. A total of 436 jobs were submitted to the cluster. The application mix includes analytics jobs in Hadoop and Spark, scientific benchmarks (Parsec, Bioparallel), hardware synthesis tools (Cadence, VivadoHLS), multicore (zsim) and n-body simulations, email clients, web browsers, web servers (http), music and video streaming services, and databases, among others. Each job may take one or more vCPUs. Even though

Figure 11: Probability distribution function of applications launched in the cloud study, per user (20 users, 53 application labels). Label legend: 1 hadoop, 2 spark, 3 email, 4 browser, 5 cadence, 6 zsim, 7 video, 8 latex, 9 MLPython, 10 make, 11 mem$d, 12 http server, 13 spec, 14 matlab, 15 mysql, 16 vivado, 17 parsec, 18 vim, 19 scala, 20 php, 21 postgres, 22 musicStream, 23 minebench, 24 n-body sim, 25 ppt, 26 OS img, 27 pdfview, 28 scons, 29 du -h, 30 cr/del cgroup, 31 bioparallel, 32 storm, 33 cpu burn, 34 audacity, 35 javascript, 36 create VMs, 37 html, 38 cassandra, 39 mongoDB, 40 mkdir, 41 cp/mv, 42 sirius, 43 oProfile, 44 dwnld LF, 45 rsync, 46 ping, 47 photoshop, 48 ssh, 49 rm, 50 skype, 51 zipkin, 52 graphX, 53 ix.

Figure 12: Detection accuracy of (a) application labels and (b) resource characteristics, as occurrences per application label (1-53); (c) number of active jobs per instance over time.

Figure 11 groups applications by framework, the individual logic and datasets differ between jobs in the same framework. Note that we have not updated the training set for this study; it consists of the same 120 applications described in Section 3.4.

Figure 12a shows the number of applications that were correctly identified by name across the different categories. In total, 277 jobs are correctly labeled. As expected, Bolt could not assign a label to application types it has not seen before, such as email clients or image editing applications. However, even when Bolt cannot label an application, it can still identify the resources it is sensitive to, as shown in Figure 12b. For 385 out of 436 jobs, Bolt correctly identifies the job's resource characteristics. For performance attacks, such as the ones described in Section 5, this information is sufficient to dramatically degrade the performance of a victim application. Finally, Figure 12c shows the number of co-scheduled applications per instance throughout the duration of the experiment. The bottom 14 instances remained unused; for the remaining 186 instances, the number of active applications ranges from 1 to 6, with each job taking one or more vCPUs. The majority of misclassified applications correspond to instances with 5 or more concurrently active jobs.

5. Security Attacks

Bolt makes several cloud attacks practical and difficult to detect. We discuss three possible attacks that leverage the information obtained through detection.

5.1 Internal Denial of Service Attack

Attack setting: Denial of service attacks hurt the performance of a victim service by overloading its resources [14, 2, 51]. In cloud settings specifically, they can be categorized in two types: external and internal (or host-based) attacks. External attacks are the most conventional form of DoS [51, 13, 22]. These attacks utilize external servers to direct excessive traffic to the victims, flooding their resources and hurting their availability. External DoS attacks affect mostly PaaS and SaaS systems, and include IP spoofing, synchronization (SYN) flooding, smurf, buffer overflow, land, and teardrop attacks.

In contrast, internal DoS attacks take advantage of IaaS and PaaS cloud multi-tenancy to launch adversarial programs on the same host as the victim and degrade its performance [14, 60, 34, 52]. For example, Ristenpart et al. [60] showed how an adversarial user can leverage the IP naming conventions of IaaS clouds to locate a victim VM and degrade its performance. Cloud providers are starting to build defenses against such attacks. For example, EC2 offers autoscaling, which will increase the instances of a service under heavy resource usage. Similarly, there is related work on mechanisms that detect saturation in memory bandwidth

Figure 13: 99th-percentile latency and CPU utilization (%) over time with Bolt and a naive DoS attack that saturates CPU resources; the detection and migration points are marked.

and can trigger VM migration [71]. This means that DoS attacks that simply overload a physical host are ineffective when such defenses are in place. We focus on host-based DoS attacks and make them resilient against such defenses by avoiding resource saturation. We leverage the information obtained by Bolt to construct custom interference programs that stress the resources the victim is most sensitive to, without overloading the host.

Bolt creates custom contentious workloads by combining the microbenchmarks used to measure the resource pressure of the victim. Since these microbenchmarks are tunable, Bolt configures their intensity to a higher point than their measured pressure c_i during detection. For example, if a victim is detected as memcached and its most critical resources are the L1-i cache (81% pressure) and the last level cache (LLC) (78% pressure), Bolt uses the two microbenchmarks for L1-i and LLC at a higher intensity than what memcached can tolerate.

Impact: We use the DoS attack setting above against the 108 applications of the controlled experiment (Section 3.4). The DoS attack degrades performance by 2.2x on average, and up to 9.8x, in terms of execution time. Degradation is much more pronounced for interactive workloads, like key-value stores, with tail latency increasing by 8-140x, a dramatic impact for applications with strict tail latency SLAs.
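The intensity-selection step described above can be sketched as follows. The margin factor, the top-2 choice, and the saturation cap are hypothetical; the paper only states that intensity is set above the measured pressure c_i without saturating the host:

```python
def plan_attack(pressures, top_k=2, margin=1.2, cap=1.0):
    """Given measured victim pressures c_i (resource -> value in [0, 1]),
    pick the top-k most critical resources and run the matching
    microbenchmark slightly above the victim's tolerated pressure,
    while staying below saturation (cap) to evade migration triggers."""
    critical = sorted(pressures, key=pressures.get, reverse=True)[:top_k]
    return {r: min(pressures[r] * margin, cap) for r in critical}
```

For the memcached example in the text, `plan_attack({"L1-i": 0.81, "LLC": 0.78, "NetBW": 0.10})` selects the L1-i and LLC microbenchmarks at intensities above 0.81 and 0.78, respectively, and leaves the uncontended resources alone.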

We now examine the impact of the DoS attack on utilization. If it translates to resource saturation, there is a high probability that the DoS will be detected and the victim migrated to a new machine. The experimental cluster supports live migration: if CPU utilization (sampled every 1 sec) exceeds 70%, the victim is migrated to an unloaded host. Figure 13 compares the tail latency and CPU utilization with Bolt to that of a naive DoS that saturates the CPU through a compute-intensive kernel. We focus on a single victim VM running memcached. The overhead of migration (time between initiating migration and latency returning to normal levels) for this VM is 8 sec. Performance degradation is similar for both systems until t=80sec, at which point the victim is migrated to a new host due to the compute-intensive kernel causing utilization to exceed 70%. While during migration performance continues to degrade, once the VM resumes in the new server, latency returns to nominal levels. In contrast, Bolt keeps utilization low and impacts the victim's performance beyond t=80sec.

5.2 Resource Freeing Attack

Attack setting: The goal of a resource freeing attack (RFA) is to modify the workload of a victim such that it frees up resources for the adversary, improving its performance [67, 66]. The adversarial VM consists of two components: a beneficiary and a helper. The beneficiary is the program whose performance the attacker wants to improve, and the helper is the program that forces the victim to yield its resources. The RFA works by adding load on the victim's critical resources, causing other resources to free up. For example, if the victim is a memory-bound Spark job, introducing additional memory traffic will result in Spark stalling in memory and lessening its pressure on the other resources until it can reclaim its required memory bandwidth. When launched in a public cloud, the victim of an RFA ends up paying more and achieving worse performance compared to running in isolation.

We now create a proof-of-concept RFA for a webserver, a network-bound Hadoop job, and a memory-bound Spark job. Launching RFAs requires in-depth knowledge of the victim's resource requirements [67], which Bolt provides by identifying the victim's dominant resource. Once the dominant resource is detected, the runtime applies a custom helper program that saturates this critical resource. For the webserver, the helper is a CPU-intensive benchmark launching CGI requests, causing the victim to saturate its CPU usage, servicing fewer real user requests and freeing up cache resources. For the Hadoop job, we use a network-intensive benchmark similar to iperf, which saturates network bandwidth and frees up CPU and memory resources for the beneficiary. Similarly, for Spark we use a streaming memory benchmark that slows down k-means and frees network and compute resources for the adversary. The selection of the beneficiary is of lesser importance; without loss of generality, we select mcf, a CPU-intensive benchmark from SPEC CPU2006. The same methodology can be used with other beneficiaries, conditioned on their critical resource not overlapping with the victim's.

Impact: Table 2 shows the performance degradation for the three victim applications and the improvement in execution time for the beneficiary. The webserver suffers the most in terms of queries per second, as the helper's CGI requests pollute its cache, preventing it from servicing legitimate user requests. Hadoop and Spark experience significant degradations in execution time due to network and memory bandwidth saturation, respectively. Execution time for mcf improves by 16-38%, benefiting from the victim stalling in its critical resource, which improves mcf's cache and CPU usage.
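The helper-selection logic above reduces to a lookup from the detected dominant resource to a contentious helper benchmark. The mapping keys mirror the three examples in the text, but the helper names are hypothetical placeholders, not Bolt's actual programs:

```python
# Hypothetical mapping from a victim's dominant resource (as detected
# by Bolt) to a helper benchmark that saturates that resource.
HELPERS = {
    "CPU":   "cgi_request_flood",  # CPU-bound CGI requests (webserver victim)
    "NetBW": "iperf_like_stream",  # saturates network bandwidth (Hadoop victim)
    "MemBW": "mem_stream",         # streaming memory traffic (Spark victim)
}

def pick_helper(dominant_resource):
    """Return the helper to launch alongside the beneficiary, or None
    if no helper targets the detected resource."""
    return HELPERS.get(dominant_resource)
```

The beneficiary is chosen independently, with the one constraint the text states: its critical resource must not overlap with the victim's.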

5.3 VM Co-residency Detection

Attack setting: Sections 3.4 and 4 showed that we can identify the applications sharing a cloud infrastructure. However, a malicious user is rarely interested in a random service running on a public cloud. More often, they target a specific

Victim                          Beneficiary       Target
App               Perf          App     Perf      Resource
Apache Webserver  -64% (QPS)    mcf     +24%      CPU
Hadoop (SVM)      -36% (Exec)   mcf     +16%      Network BW
Spark (k-means)   -52% (Exec)   mcf     +38%      Memory BW

Table 2: RFA impact on the victims and beneficiaries.

workload, e.g., a competitive e-commerce site. Therefore, they need to pinpoint where the target resides in a practical manner. This requires a launch strategy and a mechanism for co-residency detection [69]. The attack is practical if the target is located with high confidence, in reasonable time, and with modest resource costs. Bolt enables this attack to be carried out in a practical way, and remains resilient against defense mechanisms, such as virtual private clouds (VPCs), which make internal IP addresses private to a single tenant. Once a target VM is located, the adversary can launch RFA or DoS attacks, as previously described. Co-residency detection attacks rely on leakage of logical information, e.g., IP address, or on observing the impact of a side-channel attack due to resource contention. Bolt relies on a variation of the latter approach.

Assume a system of N servers. A target victim user launches k VMs. The adversary launches n malicious VMs. Instances are launched simultaneously to avoid malicious-malicious instance co-residency [69]. The probability that at least one adversarial VM will be co-scheduled with a victim instance is P(f) = 1 - (1 - k/N)^n. The adversary then uses a two-process detection scheme. The sender is a process that creates contention to degrade the victim's performance. The receiver is the process that detects this degradation, running either on the same host as the victim (cooperative detection) or externally (uncooperative detection). We assume an uncooperative victim, which is the more general case. Bolt works as the sender to determine the type of co-residents on each sampled host and identify any of the desired type. Assume m VMs of the desired type have been detected in the sample set. Bolt now needs to prune down this space to VM(s) from the specific target victim. It launches an external receiver aiming at the victim over a public channel, e.g., HTTP or a key-value protocol, to ping the victim service. This is the only point of communication with the victim. At the same time, the sender incurs contention in the resources the victim is most sensitive to. If the sender and victim are co-residents, the receiver's requests will be slower due to contention. Bolt quickly prunes down the sampled VMs by determining the type of co-residents. In a large cluster, this reduces the time to pinpoint a victim and the resource costs for the adversary. It also increases the confidence in the detection in the presence of noise from other co-residents.

Impact: We evaluate co-residency detection on the same 40-node cluster. The victim is a SQL server instantiating a single VM. The cluster also hosts 7 other VMs running SQL. Other active VMs host key-value stores, Hadoop, and Spark jobs. Bolt launches 10 senders simultaneously on

randomly-selected machines (P(f) = ) and detects 3 VMs running SQL in the sample set. It then introduces memory interference as the receiver initiates a large number of SQL queries. While these queries have 8.16 msec mean latency without interference, latency now becomes 26.14 msec. Given a ~3x latency increase, we can conclude that one of the co-residents is the victim SQL server. Detection required 6 sec from instantiation to receiver detection, and 11 adversary VMs.
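The co-residency probability P(f) = 1 - (1 - k/N)^n defined above can be computed directly; the numbers below are illustrative only, since the experiment's value is elided in the text:

```python
def coresidency_probability(n_servers, k_victims, n_adversarial):
    """P(f) = 1 - (1 - k/N)^n: probability that at least one of n
    adversarial VMs lands on a server hosting one of k victim VMs,
    assuming uniform placement, at most one victim VM per server,
    and no adversarial-adversarial co-residency."""
    return 1.0 - (1.0 - k_victims / n_servers) ** n_adversarial
```

The probability grows quickly with n: each additional adversarial VM multiplies the miss probability by (1 - k/N), which is why a modest number of senders suffices on a 40-node cluster.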

6. Improving Security via Resource Isolation

Since interference is at the heart of Bolt's detection methodology, isolation mechanisms that attenuate contention should reduce its accuracy. We first evaluate to what extent existing isolation techniques mitigate security vulnerabilities, and then highlight trade-offs between security, performance, and utilization. We use the same setting as in the controlled experiment of Section 3.4. We evaluate 4 resource-specific isolation techniques and 3 settings for OS-level isolation mechanisms: baremetal, containerized, and virtualized. The baremetal system does not employ any OS-level isolation. The containerized setup uses Linux containers (lxc) and assigns cores to applications via the cpuset cgroup. Both containers and VMs constrain memory capacity. Baremetal experiments do not enforce memory capacity allocations, and the Linux scheduler is free to float applications across cores.

The first resource-specific isolation technique is thread pinning to physical cores, to constrain interference from scheduling actions like context switching. The number of cores an application is allocated can change dynamically, and is limited by how fast Linux can migrate tasks, typically in the tens of milliseconds.

For network isolation, we use the outbound network bandwidth partitioning capabilities of Linux's traffic control. Specifically, we use the qdisc [8] scheduler with the hierarchical token bucket (HTB) queueing discipline to enforce egress bandwidth limits. The limits are set to the maximum traffic burst rate for each application to avoid contention (the ceil parameter in the HTB interface). Ingress network bandwidth isolation has been extensively studied in previous work [36]; these approaches can be applied here as well.
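As an illustration of the egress limits described above, the per-application tc commands can be generated as follows. The device name, class IDs, and rates are hypothetical examples; setting rate equal to ceil mirrors the text's choice of capping each application at its own maximum burst rate:

```python
def htb_commands(dev, classes):
    """Build `tc` commands that create one HTB class per application,
    with rate == ceil == the app's maximum burst rate (Mbit/s), so no
    class can exceed its own peak and contend with others."""
    cmds = [f"tc qdisc add dev {dev} root handle 1: htb"]
    for classid, rate_mbit in classes.items():
        cmds.append(
            f"tc class add dev {dev} parent 1: classid 1:{classid} "
            f"htb rate {rate_mbit}mbit ceil {rate_mbit}mbit")
    return cmds
```

A filter (e.g., by cgroup or destination port) would still be needed to steer each application's traffic into its class; that step is omitted here.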

For DRAM bandwidth isolation, there is no commercially available partitioning mechanism. To enforce bandwidth isolation we use the following approach: we monitor the DRAM bandwidth usage of each application through performance counters [45], and modify the scheduler to only colocate jobs on machines that can accommodate their aggregate peak memory bandwidth requirements. This requires extensive application knowledge, and is used simply to highlight the benefits of DRAM bandwidth isolation.
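The scheduler-side check described above amounts to a simple admission test against each machine's peak DRAM bandwidth. This is a sketch with hypothetical numbers and interface; the paper does not give the modified scheduler's actual API:

```python
def can_colocate(machine_peak_bw, resident_peak_bws, candidate_peak_bw):
    """Admit a job only if the aggregate of peak DRAM bandwidth demands
    (current residents plus the candidate), measured via performance
    counters, still fits within the machine's peak DRAM bandwidth."""
    return sum(resident_peak_bws) + candidate_peak_bw <= machine_peak_bw
```

Because the test uses peak rather than average demands, it is conservative: it prevents bandwidth contention at the cost of leaving some bandwidth unused most of the time.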

Finally, for last level cache (LLC) isolation, we use the Cache Allocation Technology (CAT) available in recent Intel chips [11]. CAT partitions the LLC in ways, which in

Figure 14: Detection accuracy (%) with progressively added isolation techniques (none, thread pinning, +net BW partitioning, +mem BW partitioning, +cache partitioning, +core isolation) on baremetal, Linux containers, and virtual machines.

a highly-associative cache allows for non-overlapping partitions at the granularity of a few percent of the LLC capacity. Each co-resident is allocated one partition, configured to its current capacity requirements [57]. Partitions can be resized at runtime by reprogramming specific low-level registers (MSRs); changes take effect after a few milliseconds.

We add one isolation mechanism at a time in the three system settings (baremetal, containers, VMs). Figure 14 shows the impact of isolation techniques on Bolt's detection accuracy. As expected, when no isolation is used, baremetal allows for a significantly higher detection accuracy than container- and VM-based systems, mostly due to the latter constraining core and memory capacity usage. As a result, introducing thread pinning benefits baremetal the most, since it reduces core contention. It also benefits container- and VM-based setups, to a lesser degree, by eliminating unpredictability introduced by the Linux scheduler (e.g., context switching) [40]. The dominant resource of each application determines which isolation technique benefits it the most. Thread pinning mostly benefits workloads bound by on-chip resources, such as L1/L2 caches and cores. Adding network bandwidth partitioning lowers detection accuracy for all three settings almost equally. It primarily benefits network-bound workloads, for which network interference conveys the most information for detection. Memory bandwidth isolation further reduces accuracy by 10% on average, benefiting jobs dominated by DRAM traffic. Finally, cache partitioning has the most dramatic reduction in accuracy, especially for LLC-bound workloads. We attribute this to the importance cache pressure has as a detection signal. The number of co-residents also affects the extent to which isolation helps. The more co-scheduled applications exist per machine, the more isolation techniques degrade accuracy, as they make distinguishing between co-residents harder.

Unfortunately, even when all the techniques are on, accuracy is still 50%, due to two reasons: current techniques are not fine-grained enough to allow strict and scalable isolation, and core resources (L1/L2 caches, CPU) are prone to interference due to contending hyperthreads [61]. To evaluate the latter hypothesis, we modify the scheduler such that hyperthreads of different jobs are never scheduled on the same physical core; e.g., if a job needs 7 vCPUs, it is allocated 4 dedicated physical cores. The grey bars of Figure 14 show the detection accuracy. Baremetal instances still allow certain applications to be detected, but for containerized and virtualized settings accuracy drops to 14%. The remaining accuracy corresponds to disk-heavy workloads. Improving security, however, comes at a performance penalty of 34% on average in execution time, as threads of the same job are forced to contend with one another. Alternatively, users can overprovision their resource reservations to avoid degradation, which results in a 45% drop in utilization. This means that the cloud provider cannot leverage CPU idleness to share machines, decreasing the cost benefits of cloud computing. Note that enforcing core isolation alone is also not sufficient, as it allows a detection accuracy of 46%.

Discussion: The previous analysis highlights a design problem with current datacenter platforms. Traditional multicores are prone to contention, which will only worsen as more cores are integrated in each server and multi-tenancy becomes more pronounced. Existing isolation techniques are insufficient to mitigate security vulnerabilities, and techniques that provide reasonable security guarantees sacrifice performance or cost efficiency through low utilization. This highlights the need for new, fine-grain, and coordinated isolation techniques that guarantee security at high utilization for shared resources.

7. Conclusions

We have presented Bolt, a practical system that uses online data mining techniques to identify the type and characteristics of applications running on shared cloud systems, and enables practical attacks that degrade their performance. In a 40-server cluster, Bolt correctly identifies 87 out of 108 workloads, and degrades tail latency by up to 140x. We also used Bolt in a user study on EC2 to detect the characteristics of unknown jobs submitted by multiple users. Finally, we showed that while existing isolation techniques are helpful, they are not sufficient to mitigate such attacks. Bolt reveals real, easy-to-exploit threats in public clouds. We hope that this work will motivate cloud providers and computer scientists to develop and deploy stricter isolation primitives in cloud systems.

Acknowledgements

We sincerely thank all the participants in the user study for their time and effort. We also thank Daniel Sanchez, Ed Suh, Grant Ayers, Mingyu Gao, Ana Klimovic, and the rest of the MAST group, as well as the anonymous reviewers, for their feedback on earlier versions of this manuscript. This work was supported by the Stanford Platform Lab, NSF grant CNS-1422088, and a John and Norma Balen Sesquicentennial Faculty Fellowship.

References

[1] Amazon EC2. http://aws.amazon.com/ec2.

[2] Aman Bakshi and Yogesh B. Dujodwala. Securing cloud from DDoS attacks using intrusion detection system in virtual machine. In Proc. of the Second International Conference on Communication Software and Networks (ICCSN), 2010.

[3] Luiz Barroso and Urs Hoelzle. The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines. MC Publishers, 2009.

[4] Naomi Benger, Joop van de Pol, Nigel P. Smart, and Yuval Yarom. "Ooh aah... just a little bit": A small amount of side channel can go a long way. In Proc. of the International Cryptographic Hardware and Embedded Systems Workshop (CHES), Busan, South Korea, 2014.

[5] Major Bhadauria and Sally A. McKee. An approach to resource-aware co-scheduling for CMPs. In Proc. of the 24th ACM International Conference on Supercomputing (ICS), Tsukuba, Japan, 2010.

[6] Leon Bottou. Large-scale machine learning with stochastic gradient descent. In Proc. of the International Conference on Computational Statistics (COMPSTAT), Paris, France, 2010.

[7] Eric Brewer. Kubernetes: The path to cloud native. http://goo.gl/QgkzYB. SOCC Keynote, August 2015.

[8] Martin A. Brown. Traffic Control HOWTO. http://linux-ip.net/articles/Traffic-Control-HOWTO.

[9] Robin Burke. Hybrid recommender systems: Survey and experiments. User Modeling and User-Adapted Interaction, 12(4):331–370, November 2002.

[10] Apache Cassandra. http://cassandra.apache.org.

[11] Intel® 64 and IA-32 Architectures Software Developer's Manual, vol. 3B: System Programming Guide, Part 2. September 2014.

[12] Ludmila Cherkasova, Diwaker Gupta, and Amin Vahdat. Comparison of the three CPU schedulers in Xen. SIGMETRICS Perform. Eval. Rev., 35(2):42–51, September 2007.

[13] Scott A. Crosby and Dan S. Wallach. Denial of service via algorithmic complexity attacks. In Proc. of the 12th USENIX Security Symposium, Washington, DC, 2003.

[14] Marwan Darwish, Abdelkader Ouda, and Luiz Fernando Capretz. Cloud-based DDoS attacks and defenses. In Proc. of i-Society, Toronto, ON, 2013.

[15] Christina Delimitrou and Christos Kozyrakis. iBench: Quantifying Interference for Datacenter Workloads. In Proc. of the 2013 IEEE International Symposium on Workload Characterization (IISWC), Portland, OR, September 2013.

[16] Christina Delimitrou and Christos Kozyrakis. Paragon: QoS-Aware Scheduling for Heterogeneous Datacenters. In Proc. of the Eighteenth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), Houston, TX, 2013.

[17] Christina Delimitrou and Christos Kozyrakis. QoS-Aware Scheduling in Heterogeneous Datacenters with Paragon. ACM Transactions on Computer Systems (TOCS), 31(4), December 2013.

[18] Christina Delimitrou and Christos Kozyrakis. Quality-of-Service-Aware Scheduling in Heterogeneous Datacenters with Paragon. IEEE Micro, Special Issue on Top Picks from the Computer Architecture Conferences, May/June 2014.

[19] Christina Delimitrou and Christos Kozyrakis. Quasar: Resource-Efficient and QoS-Aware Cluster Management. In Proc. of the Nineteenth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), Salt Lake City, UT, 2014.

[20] Christina Delimitrou and Christos Kozyrakis. HCloud: Resource-Efficient Provisioning in Shared Cloud Systems. In Proc. of the Twenty-First International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), April 2016.

[21] Christina Delimitrou, Daniel Sanchez, and Christos Kozyrakis. Tarcil: Reconciling Scheduling Speed and Quality in Large Shared Clusters. In Proc. of the Sixth ACM Symposium on Cloud Computing (SOCC), August 2015.

[22] Jake Edge. Denial of service via hash collisions. http://lwn.net/Articles/474912, January 2012.

[23] Benjamin Farley, Ari Juels, Venkatanathan Varadarajan, Thomas Ristenpart, Kevin D. Bowers, and Michael M. Swift. More for your money: Exploiting performance heterogeneity in public clouds. In Proc. of the ACM Symposium on Cloud Computing (SOCC), San Jose, CA, 2012.

[24] Alexandra Fedorova, Margo Seltzer, and Michael D. Smith. Improving performance isolation on chip multiprocessors via an operating system scheduler. In Proc. of the 16th Intl. Conference on Parallel Architecture and Compilation Techniques (PACT), Brasov, Romania, 2007.

[25] Alexander Felfernig and Robin Burke. Constraint-based recommender systems: Technologies and research issues. In Proc. of the ACM International Conference on Electronic Commerce (ICEC), Innsbruck, Austria, 2008.

[26] Brad Fitzpatrick. Distributed caching with memcached. Linux Journal, 2004(124), 2004.

[27] Oded Goldreich and Rafail Ostrovsky. Software protection and simulation on oblivious RAMs. J. ACM, 43(3):431–473, May 1996.

[28] Google Container Engine. https://cloud.google.com/container-engine.

[29] Asela Gunawardana and Christopher Meek. A unified approach to building hybrid recommender systems. In Proc. of the Third ACM Conference on Recommender Systems (RecSys), New York, NY, 2009.

[30] Sanchika Gupta and Padam Kumar. VM profile based optimized network attack pattern detection scheme for DDoS attacks in cloud. In Proc. of SSCC, Mysore, India, 2013.

[31] Yi Han, Tansu Alpcan, Jeffrey Chan, and Christopher Leckie. Security games for virtual machine allocation in cloud computing. In Proc. of the 4th International Conference on Decision and Game Theory for Security, Fort Worth, TX, 2013.

[32] Amir Herzberg, Haya Shulman, Johanna Ullrich, and Edgar Weippl. Cloudoscopy: Services discovery and topology mapping. In Proc. of the ACM Workshop on Cloud Computing Security Workshop (CCSW), Berlin, Germany, 2013.

[33] Ben Hindman, Andy Konwinski, Matei Zaharia, Ali Ghodsi, Anthony D. Joseph, Randy Katz, Scott Shenker, and Ion Stoica. Mesos: A platform for fine-grained resource sharing in the data center. In Proc. of NSDI, Boston, MA, 2011.

[34] Jingwei Huang, David M. Nicol, and Roy H. Campbell. Denial-of-service threat to Hadoop/YARN clusters with multi-tenancy. In Proc. of the IEEE International Congress on Big Data, Washington, DC, 2014.

[35] Alexandru Iosup, Nezih Yigitbasi, and Dick Epema. On the performance variability of production cloud services. In Proc. of CCGRID, Newport Beach, CA, 2011.

[36] Vimalkumar Jeyakumar, Mohammad Alizadeh, David Mazieres, Balaji Prabhakar, Changhoon Kim, and Albert Greenberg. EyeQ: Practical network performance isolation at the edge. In Proc. of the 10th USENIX Conference on Networked Systems Design and Implementation (NSDI), Lombard, IL, 2013.

[37] Yaakoub El Khamra, Hyunjoo Kim, Shantenu Jha, and Manish Parashar. Exploring the performance fluctuations of HPC workloads on clouds. In Proc. of CloudCom, Indianapolis, IN, 2010.

[38] Krzysztof C. Kiwiel. Convergence and efficiency of subgradient methods for quasiconvex minimization. Mathematical Programming (Series A), 90(1):1–25, 2001.

[39] Ruby B. Lee. Rethinking computers for cybersecurity. IEEE Computer, 48(4):16–25, 2015.

[40] Jacob Leverich and Christos Kozyrakis. Reconciling high server utilization and sub-millisecond quality-of-service. In Proc. of EuroSys, Amsterdam, The Netherlands, 2014.

[41] Host server CPU utilization in Amazon EC2 cloud. http://goo.gl/2LTx4T.

[42] Fangfei Liu, Yuval Yarom, Qian Ge, Gernot Heiser, and Ruby B. Lee. Last-level cache side-channel attacks are practical. In Proc. of the IEEE Symposium on Security and Privacy (S&P), San Jose, CA, 2015.

[43] Fei Liu, Lanfang Ren, and Hongtao Bai. Mitigating cross-VM side channel attack on multiple tenants cloud platform. Journal of Computers, 9(4):1005–1013, April 2014.

[44] David Lo, Liqun Cheng, Rama Govindaraju, Luiz Andre Barroso, and Christos Kozyrakis. Towards energy proportionality for large-scale latency-critical workloads. In Proc. of the 41st Annual International Symposium on Computer Architecture (ISCA), Minneapolis, MN, 2014.

[45] David Lo, Liqun Cheng, Rama Govindaraju, Parthasarathy Ranganathan, and Christos Kozyrakis. Heracles: Improving resource efficiency at scale. In Proc. of the 42nd Annual International Symposium on Computer Architecture (ISCA), Portland, OR, 2015.

[46] Mahout. http://mahout.apache.org.

[47] Dave Mangot. EC2 variability: The numbers revealed. http://tech.mangot.com/roller/dave/entry/ec2_variability_the_numbers_revealed.

[48] Jason Mars and Lingjia Tang. Whare-map: Heterogeneity in "homogeneous" warehouse-scale computers. In Proc. of ISCA, Tel-Aviv, Israel, 2013.

[49] Robert Martin, John Demme, and Simha Sethumadhavan. TimeWarp: Rethinking timekeeping and performance monitoring mechanisms to mitigate side-channel attacks. In Proc. of the International Symposium on Computer Architecture (ISCA), Portland, OR, 2012.

[50] David Meisner, Christopher M. Sadler, Luiz Andre Barroso, Wolf-Dietrich Weber, and Thomas F. Wenisch. Power management of online data-intensive services. In Proc. of the 38th Annual International Symposium on Computer Architecture (ISCA), pages 319–330, 2011.

[51] Jelena Mirkovic and Peter Reiher. A taxonomy of DDoS attack and DDoS defense mechanisms. ACM SIGCOMM Computer Communication Review (CCR), April 2004.

[52] Thomas Moscibroda and Onur Mutlu. Memory performance attacks: Denial of memory service in multi-core systems. In Proc. of the 16th USENIX Security Symposium, Boston, MA, 2007.

[53] Ripal Nathuji, Aman Kansal, and Alireza Ghaffarkhah. Q-Clouds: Managing performance interference effects for QoS-aware clouds. In Proc. of EuroSys, Paris, France, 2010.

[54] Simon Ostermann, Alexandru Iosup, Nezih Yigitbasi, Radu Prodan, Thomas Fahringer, and Dick Epema. A performance analysis of EC2 cloud computing services for scientific computing. Lecture Notes on Cloud Computing, 34:115–131, 2010.

[55] Tao Peng, Christopher Leckie, and Kotagiri Ramamohanarao. Survey of network-based defense mechanisms countering the DoS and DDoS problems. ACM Comput. Surv., 39(1), April 2007.

[56] Diego Perez-Botero, Jakub Szefer, and Ruby B. Lee. Characterizing hypervisor vulnerabilities in cloud computing servers. In Proc. of the 2013 International Workshop on Security in Cloud Computing (SCC@ASIACCS), Hangzhou, China, 2013.

[57] Moinuddin K. Qureshi and Yale N. Patt. Utility-based cache partitioning: A low-overhead, high-performance, runtime mechanism to partition shared caches. In Proc. of the 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-39), 2006.

[58] Himanshu Raj, Ripal Nathuji, Abhishek Singh, and Paul England. Resource management for isolation enhanced cloud services. In Proc. of the ACM Workshop on Cloud Computing Security (CCSW), Chicago, IL, 2009.

[59] Suhail Rehman and Majd Sakr. Initial findings for provisioning variation in cloud computing. In Proc. of CloudCom, Indianapolis, IN, 2010.

[60] Thomas Ristenpart, Eran Tromer, Hovav Shacham, and Stefan Savage. Hey, you, get off of my cloud: Exploring information leakage in third-party compute clouds. In Proc. of the ACM Conference on Computer and Communications Security (CCS), Chicago, IL, 2009.

[61] Daniel Sanchez and Christos Kozyrakis. Vantage: Scalable and Efficient Fine-Grain Cache Partitioning. In Proc. of the 38th Annual International Symposium on Computer Architecture (ISCA-38), San Jose, CA, June 2011.

[62] Jorg Schad, Jens Dittrich, and Jorge-Arnulfo Quiane-Ruiz. Runtime measurements in the cloud: Observing, analyzing, and reducing variance. Proc. VLDB Endow., 3(1-2):460–471, September 2010.

[63] Malte Schwarzkopf, Andy Konwinski, Michael Abd-El-Malek, and John Wilkes. Omega: Flexible, scalable schedulers for large compute clusters. In Proc. of EuroSys, Prague, Czech Republic, 2013.

[64] Alan Shieh, Srikanth Kandula, Albert Greenberg, and Changhoon Kim. Seawall: Performance isolation for cloud datacenter networks. In Proc. of the USENIX Conference on Hot Topics in Cloud Computing (HotCloud), Boston, MA, 2010.

[65] David Shue, Michael J. Freedman, and Anees Shaikh. Performance isolation and fairness for multi-tenant cloud storage. In Proc. of the 10th USENIX Conference on Operating Systems Design and Implementation (OSDI), Hollywood, CA, 2012.

[66] Dan Tsafrir, Yoav Etsion, and Dror G. Feitelson. Secretly monopolizing the CPU without superuser privileges. In Proc. of the 16th USENIX Security Symposium, Boston, MA, 2007.

[67] Venkatanathan Varadarajan, Thawan Kooburat, Benjamin Farley, Thomas Ristenpart, and Michael Swift. Resource-freeing attacks: Improve your cloud performance (at your neighbor's expense). In Proc. of the Conference on Computer and Communications Security (CCS), Raleigh, NC, 2012.

[68] Venkatanathan Varadarajan, Thomas Ristenpart, and Michael Swift. Scheduler-based defenses against cross-VM side-channels. In Proc. of the 23rd USENIX Security Symposium, San Diego, CA, 2014.

[69] Venkatanathan Varadarajan, Yinqian Zhang, Thomas Ristenpart, and Michael Swift. A placement vulnerability study in multi-tenant public clouds. In Proc. of the 24th USENIX Security Symposium, Washington, DC, 2015.

[70] Huaibin Wang, Haiyun Zhou, and Chundong Wang. Virtual machine-based intrusion detection system framework in cloud computing environment. Journal of Computers, October 2012.

[71] Hui Wang, Canturk Isci, Lavanya Subramanian, Jongmoo Choi, Depei Qian, and Onur Mutlu. A-DRM: Architecture-aware distributed resource management of virtualized clusters. In Proc. of the 11th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments (VEE), Istanbul, Turkey, 2015.

[72] Ian H. Witten, Eibe Frank, and Geoffrey Holmes. Data Mining: Practical Machine Learning Tools and Techniques, 3rd Edition.

[73] Zhenyu Wu, Zhang Xu, and Haining Wang. Whispers in the hyper-space: High-speed covert channel attacks in the cloud. In Proc. of the 21st USENIX Security Symposium, Bellevue, WA, 2012.

[74] Yunjing Xu, Michael Bailey, Farnam Jahanian, Kaustubh Joshi, Matti Hiltunen, and Richard Schlichting. An exploration of L2 cache covert channels in virtualized environments. In Proc. of the 3rd ACM Workshop on Cloud Computing Security Workshop (CCSW), Chicago, IL, 2011.

[75] Zhang Xu, Haining Wang, and Zhenyu Wu. A measurement study on co-residence threat inside the cloud. In Proc. of the 24th USENIX Security Symposium, Washington, DC, 2015.

[76] Yuval Yarom and Katrina Falkner. Flush+Reload: A high resolution, low noise, L3 cache side-channel attack. In Proc. of the 23rd USENIX Security Symposium, San Diego, CA, 2014.

[77] Matei Zaharia, Mosharaf Chowdhury, Tathagata Das, Ankur Dave, Justin Ma, Murphy McCauly, Michael J. Franklin, Scott Shenker, and Ion Stoica. Resilient distributed datasets: A fault-tolerant abstraction for in-memory cluster computing. In Proc. of NSDI, San Jose, CA, 2012.

[78] Yinqian Zhang, Ari Juels, Alina Oprea, and Michael K. Reiter. HomeAlone: Co-residency detection in the cloud via side-channel analysis. In Proc. of the IEEE Symposium on Security and Privacy, Oakland, CA, 2011.

[79] Yinqian Zhang, Ari Juels, Michael K. Reiter, and Thomas Ristenpart. Cross-tenant side-channel attacks in PaaS clouds. In Proc. of the ACM SIGSAC Conference on Computer and Communications Security (CCS), Scottsdale, AZ, 2014.

[80] Yinqian Zhang, Ari Juels, Michael K. Reiter, and Thomas Ristenpart. Cross-VM side channels and their use to extract private keys. In Proc. of the ACM Conference on Computer and Communications Security (CCS), Raleigh, NC, 2012.

[81] Yinqian Zhang and Michael K. Reiter. Düppel: Retrofitting commodity operating systems to mitigate cache side channels in the cloud. In Proc. of the ACM Conference on Computer and Communications Security (CCS), Berlin, Germany, 2013.

[82] Fangfei Zhou, Manish Goel, Peter Desnoyers, and Ravi Sundaram. Scheduler vulnerabilities and coordinated attacks in cloud computing. J. Comput. Secur., 21(4):533–559, July 2013.

[83] Jieming Zhu, Pinjia He, Zibin Zheng, and Michael R. Lyu. Towards online, accurate, and scalable QoS prediction for runtime service adaptation. In Proc. of the IEEE International Conference on Distributed Computing Systems (ICDCS), Madrid, Spain, 2014.

[84] Sergey Zhuravlev, Sergey Blagodurov, and Alexandra Fedorova. Addressing shared resource contention in multicore processors via scheduling. In Proc. of the Fifteenth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), Pittsburgh, PA, 2010.


The information obtained by Bolt makes several cloud attacks practical and difficult to detect. For example, we use Bolt to launch host-based DoS attacks that use information on the victim's resource sensitivity to inject carefully-crafted contentious programs that degrade the victim's performance. In the 40-server cluster, Bolt's DoS attack translates to a tail latency increase of up to 140x for interactive workloads. Unlike traditional DoS attacks that saturate compute and memory resources, Bolt maintains low CPU utilization by only introducing interference in the most critical resources, making it resilient to DoS mitigation techniques, such as load-triggered VM migration. We have also used Bolt to launch resource freeing attacks (RFAs) that force the victim to yield its resources to the adversary [67], and VM co-residency detection attacks that pinpoint where in a shared cluster a specific application resides [69]. We show that the information obtained through data mining is critical to escalate the impact of the attack, reduce its time and cost, and make it difficult to detect.

Finally, we examine to what degree current isolation techniques can alleviate these security vulnerabilities. We analyze baremetal systems, containers, and virtual machines with techniques like thread pinning, memory bandwidth isolation, and network and cache partitioning. These are the main isolation techniques available today, and while they progressively reduce application detection accuracy from 81% to 50%, they are not sufficient to eliminate these vulnerabilities. We show that the only method currently available to reduce accuracy down to 14% is core isolation, where an application is only allowed to share cores with itself to avoid malicious co-residents. However, since application threads contend for shared on-chip resources, performance degrades by 34% on average. Alternatively, if we allocate more cores per application to mitigate performance unpredictability, we end up with resource underutilization and cloud inefficiency. We hope that this study will motivate public cloud providers to introduce stricter isolation solutions in their platforms, and systems architects to develop fine-grain isolation techniques that provide strong isolation and performance predictability at high utilization.

2. Related Work

Performance unpredictability is a well-studied problem in public clouds that stems from platform heterogeneity, resource interference, software bugs, and load variation [12, 47, 62, 18, 17, 20, 54, 35, 21, 62, 59, 37]. A lot of recent work has proposed isolation techniques to reduce unpredictability by eliminating interference [58, 61, 45, 65, 64]. With Bolt, we show that unpredictability due to interference also hides security vulnerabilities, since it enables an adversary to extract information about an application's type and characteristics. Below we discuss related work with respect to cloud vulnerabilities, such as VM placement detection, DDoS, and side-channel attacks.

VM co-residency detection: Cloud multi-tenancy has motivated a line of work on locating a target VM in a public cloud. Ristenpart et al. [60] showed that the IP machine naming conventions of cloud providers allowed adversarial users to narrow down where a victim VM resided in a cluster. Xu et al. [74] and Herzberg et al. [32] extended this study, resulting in part in cloud providers changing their naming conventions, which reduced the effectiveness of network topology-based co-residency attacks. Following this development, Varadarajan et al. [69] evaluated the susceptibility of three cloud providers to VM placement attacks, and showed that techniques like virtual private clouds (VPC) render some of them ineffective. Similarly, Zhang et al. [78] designed HomeAlone, a system that detects VM placement by issuing side-channels in the L2 cache during periods of low traffic. Finally, Han et al. [31] proposed VM placement strategies that defend against placement attacks, although they are not specifically geared towards public clouds. With Bolt, we show that leveraging simple data mining techniques on the pressure applications introduce on shared resources increases the accuracy of VM co-residency detection significantly. Bolt does not rely on knowing the cloud's network topology or host IPs, making it resilient against mitigation techniques such as VPCs.

Performance attacks: Once a victim application is located, an adversary can negatively affect its performance. Distributed Denial of Service attacks [51, 23, 67, 34] in the cloud have increased in number and impact over the past years. This has generated a lot of interest in detection and prevention techniques [55, 14, 30]. Bakshi et al. [2], for example, developed a system that detects abnormally high network traffic that could signal an upcoming DDoS attack, while Crosby et al. [13] and Edge [22] proposed a new DoS attack relying on algorithmic complexity that drives CPU usage up. Finally, resource-freeing attacks (RFAs) also hurt a victim's performance, while additionally forcing it to yield its resources to the adversary [67]. While RFAs are effective, they require significant compute and network resources, and are prone to defenses such as live VM migration, which cloud providers introduce to mitigate performance unpredictability due to resource saturation. In contrast, Bolt launches host-based attacks on the same machine as the victim, which take advantage of the victim's resource sensitivity and keep resource utilization moderate, therefore evading defense mechanisms.

Side-channel attacks: There are also attacks that attempt to extract confidential information from co-scheduled applications, such as private keys [81, 42, 76, 4]. Zhang et al. [80] proposed a system that launches side-channel attacks in a virtualized environment, and cross-tenant side-channel attacks in PaaS clouds [79]. Wu et al. [73], on the other hand, used the memory bus of an x86 processor to launch a covert-channel attack and degrade the victim's performance. On the defense side, Perez-Botero et al. [56] analyzed the vul-

Figure 1: Overview of the application detection process. Bolt first measures the pressure co-residents place in shared resources, and then uses data mining to determine the type and characteristics of co-scheduled applications. (The figure shows the adversarial VM profiling the victim for ~5 sec to obtain a sparse pressure vector [c_i, 0, c_j, 0, ..., c_k, 0], reconstructing the full applications-by-resources utilization matrix with SVD and SGD in ~50 msec, and ranking previously-seen applications by Pearson similarity, e.g., App 3: 80%, App 2: 45%, App 4: 21%, in ~30 msec.)

nerabilities of common hypervisors, and Wang et al. [70] proposed a system for intrusion detection in cloud settings, while Liu et al. [43] and Varadarajan et al. [68] designed scheduler-based defenses against covert- and side-channel attacks in the memory bus. The latter system controls the overlapping execution of different VMs and injects noise in the memory bus to prevent an adversary from extracting confidential information. Bolt does not rely on accurate microarchitectural event measurements through performance counters to detect application placement, and is therefore resilient to techniques that limit the fidelity of time-keeping and performance monitoring to thwart information leakage in side-channel attacks [49].

3. Bolt

3.1 Threat Model

Bolt targets IaaS providers that operate public clouds for mutually untrusting users. Multiple VMs can be co-scheduled on the same server. Each VM has no control over where it is placed, and no a priori information on other VMs on the same physical host. For now, we assume that the cloud provider is neutral with respect to detection by adversarial VMs, i.e., it does not assist such attacks, or employ additional resource isolation techniques beyond what is available by default to hinder attacks by adversarial users. In Section 6 we explore how additional isolation techniques affect Bolt's detection accuracy.

Adversarial VM: An adversarial VM has the goal of determining the nature and characteristics of any applications co-scheduled on the same physical host, and negatively impacting their performance. Adversarial VMs start with zero knowledge of co-scheduled workloads.

Friendly VM: This is a normal VM scheduled on a physical host that runs one or more applications. Friendly VMs do not attempt to determine the existence and characteristics of other co-scheduled VMs. They also do not employ any schemes to prevent detection, such as memory pattern obfuscation [27].

3.2 Application Detection

Detection relies on inferring workload characteristics from the contention the adversary experiences in shared resources.

For simplicity, we assume one co-scheduled victim job for now, and generalize in Section 3.3. Figure 1 shows an overview of the system's operation.

Bolt instantiates an adversarial VM on the same physical host as the victim friendly VM. Bolt uses its VM to run a few microbenchmarks of tunable intensity that each put progressively more pressure on a specific shared resource. The resources stressed by the microbenchmarks include on-chip resources, such as functional units and different levels of the cache hierarchy, and off-chip resources, such as the memory, network, and storage subsystems [15]. Each microbenchmark progressively increases its intensity from 0 to 100%, until it detects pressure from the co-scheduled workload, i.e., until the microbenchmark's performance is worse than its expected value when running in isolation. The intensity of the microbenchmark at that point captures the pressure the victim incurs in shared resource i, and is denoted c_i, where i ∈ [1, N], N = 10, and c_i ∈ [0, 100]. The 10 resources are: the L1 instruction and L1 data caches, the L2 and last level caches, memory capacity and memory bandwidth, CPU, network bandwidth, disk capacity, and disk bandwidth. Large values of c_i imply high pressure in resource i. For unconstrained resources, e.g., the last level cache, 100% pressure means that the benchmark takes over the entire resource capacity. For resources constrained through partitioning mechanisms, e.g., memory capacity in VMs, 100% corresponds to taking over the entire memory partition allocated to the VM.
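The ramp-up loop that produces each c_i can be sketched as follows; `run_at` and `baseline_at` are stand-ins for the tunable microbenchmarks of [15] and their isolated performance profiles (the names, step, and tolerance are our assumptions, not Bolt's):

```python
def measure_pressure(run_at, baseline_at, step=5, tol=0.05):
    """Raise a microbenchmark's intensity (0-100%) until its measured
    performance drops below its expected value in isolation; the
    intensity at that point is the pressure c_i on resource i."""
    for intensity in range(0, 101, step):
        if run_at(intensity) < (1 - tol) * baseline_at(intensity):
            return intensity
    return 100  # no co-runner pressure detected up to full intensity

# Toy victim that starts interfering once the benchmark claims >= 60%
c_i = measure_pressure(run_at=lambda x: 1.0 if x < 60 else 0.7,
                       baseline_at=lambda x: 1.0)
print(c_i)  # → 60
```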

Bolt uses 2-3 microbenchmarks for profiling, requiring approximately 2-5 seconds in total. It randomly selects one core and one uncore benchmark, to get a more representative snapshot of the co-resident's resource profile. If the core is not shared between the adversarial and friendly VMs, Bolt measures zero pressure in the shared core resource. It then adds a third microbenchmark for an additional uncore resource (shared caches, memory, network, or storage).

We use this sparse signal to determine the application's pressure in all other resources, as well as its type and characteristics. Online data mining techniques have recently been shown to effectively solve similar problems in cluster management, by finding similarities between new, unknown applications and previously-seen workloads [19]. The Quasar cluster manager, for example, used collaborative filtering to find similarities in terms of heterogeneity, interference sensitivity, and provisioning. The advantage of collaborative filtering is that it can identify application similarities without a priori knowledge of application types and critical features. Unfortunately, this means that it is also unable to label the victim workloads and classify their functionality, making it insufficient for the adversary's purpose.

Practical data mining: Bolt instead feeds the profiling signal to a hybrid recommender using feature augmentation [9, 25] that determines the type and resource characteristics of victim workloads. The recommender combines collaborative filtering and content-based similarity detection [9, 29]. The former has good scaling properties, relaxed sparsity constraints, and offers conceptual insight on similarities, while the latter exploits contextual information for accurate resource profile matching.

First, a collaborative filtering system recovers the pressure the victim places in non-profiled resources [16, 83]. The system relies on matrix factorization with singular value decomposition (SVD) [72] and PQ-reconstruction with stochastic gradient descent (SGD) [6, 38] to find similarities between the new victim application and previously-seen workloads. SVD produces three matrices, U, Σ, and V. The singular values σ_i in Σ correspond to similarity concepts, such as the intensity of compute operations, or the correlation between high network and disk traffic. The number of similarity concepts is on the order of the number of examined resources, and concepts are ordered by decreasing magnitude in Σ. Large singular values reveal stronger similarity concepts (higher confidence in workload correlation), while smaller values correspond to weaker similarity concepts, and are typically discarded during the dimensionality reduction by SVD. Matrix U (m×r) captures the correlation between each application and similarity concept, and V (n×r) the correlation between each resource and similarity concept.
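The recovery step can be illustrated with a toy PQ-reconstruction: fit a rank-r factorization to the observed entries of the applications-by-resources pressure matrix with SGD, then read the missing entries off P·Qᵀ. The matrix sizes, learning rate, and epoch count below are illustrative, not the paper's:

```python
import numpy as np

def pq_reconstruct(R, observed, r=1, lr=0.05, epochs=2000, seed=0):
    """Fill in missing entries of the (apps x resources) matrix R by
    fitting the factorization P @ Q.T with SGD on observed entries only."""
    rng = np.random.default_rng(seed)
    M, N = R.shape
    P = 0.1 * rng.standard_normal((M, r))
    Q = 0.1 * rng.standard_normal((N, r))
    idx = np.argwhere(observed)
    for _ in range(epochs):
        rng.shuffle(idx)
        for i, j in idx:
            err = R[i, j] - P[i] @ Q[j]  # error on one observed entry
            P[i], Q[j] = P[i] + lr * err * Q[j], Q[j] + lr * err * P[i]
    return P @ Q.T

# Rank-1 ground truth: 4 apps x 3 resources, with two entries unprofiled
truth = np.outer([0.2, 0.4, 0.6, 0.8], [0.5, 0.25, 1.0])
observed = np.ones_like(truth, dtype=bool)
observed[0, 2] = observed[3, 1] = False
recon = pq_reconstruct(truth, observed)
print(float(np.round(np.max(np.abs(recon - truth)), 3)))  # small residual error
```

In Bolt's setting, each row is a previously-profiled application and the victim's sparse measurement forms a new row; the same factorization then yields its pressure in the non-profiled resources.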

Once we have identified critical similarity concepts and discarded inconsequential information, a content-based system uses weighted Pearson correlation coefficients to compute the similarity between the resource profile u_j of a new application j and the applications the system has previously seen. Weights correspond to the values of the r most critical similarity concepts,¹ and resource profiles to the rows of the matrix of left singular vectors U. Using traditional Pearson correlation would discard the application-specific information that certain resources are more critical for a given workload. Similarity between applications A and B is given by:

Weighted Pearson(A, B) = cov(u_A, u_B, σ) / √(cov(u_A, u_A, σ) · cov(u_B, u_B, σ))    (1)

where u_A is the correlation of application A with each similarity concept σ_i, and

cov(u_A, u_B, σ) = Σ_i σ_i (u_Ai − m(u_A, σ)) (u_Bi − m(u_B, σ)) / Σ_i σ_i

¹ We keep the r largest singular values such that we preserve 90% of the total energy: Σ_{i=1..r} σ_i² = 0.9 · Σ_{i=1..n} σ_i².

[Figure 2: heatmaps of pairwise resource-pressure combinations (L1-i vs. LLC, L1-d vs. CPU, memory capacity vs. memory bandwidth, disk capacity vs. network bandwidth, disk bandwidth vs. L2 cache) against the probability (0.0-1.0) of the co-scheduled application being memcached.]

Figure 2: We show the correlation between the pressure in various resources and the probability of a co-scheduled application being memcached. For example, workloads with high LLC pressure and very high L1-i pressure have a high probability of being memcached.

Here, cov(u_A, u_B, σ) is the covariance of A and B under weights σ, and m(u_A, σ) = Σ_i σ_i · u_Ai / Σ_i σ_i is the weighted mean for A.

The output of the hybrid recommender is a distribution of similarity scores of how closely a victim resembles different previously-seen applications. For example, a victim may be 65% similar to a memcached workload, 18% similar to a Spark job running PageRank, 10% similar to a Hadoop job running an SVM classifier, and 3% to a Hadoop job running k-means. The 95th percentile of the recommender's end-to-end latency is 80 msec. Apart from application labels, this analysis yields information on the resources the victim is sensitive to, enabling several practical performance attacks (Section 5).

System insights from data mining: Before dimensionality reduction, each similarity concept corresponds to a shared resource. Different resources convey different amounts of information about a workload, and thus have different value to Bolt for detection. The magnitude of each similarity concept reflects how strongly it captures application similarities. For the controlled experiment of Section 3.4, which involves batch and interactive jobs, the resources with the most value are the last-level (LLC) and L1-i caches, followed by compute intensity and memory bandwidth. While the exact order depends on the application mix, correlating similarity concepts to resources shows that certain resources are more prone to leaking information about a workload, and their isolation should be prioritized.
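A minimal implementation of the weighted Pearson similarity in Equation (1), assuming u_A and u_B are rows of U (per-concept correlations) and the weights σ are the r retained singular values:

```python
import numpy as np

def weighted_mean(u, sigma):
    # m(u, sigma) = sum_i sigma_i * u_i / sum_i sigma_i
    return np.sum(sigma * u) / np.sum(sigma)

def weighted_cov(u_a, u_b, sigma):
    # cov(u_a, u_b, sigma): covariance of the two profiles under weights sigma
    return np.sum(sigma * (u_a - weighted_mean(u_a, sigma))
                        * (u_b - weighted_mean(u_b, sigma))) / np.sum(sigma)

def weighted_pearson(u_a, u_b, sigma):
    # Equation (1): weighted covariance normalized by the weighted variances
    return weighted_cov(u_a, u_b, sigma) / np.sqrt(
        weighted_cov(u_a, u_a, sigma) * weighted_cov(u_b, u_b, sigma))
```

Like the unweighted coefficient, this is scale-invariant and lies in [-1, 1], but concepts with larger singular values dominate the score.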

Finally, we show how resource pressure correlates to the probability that an application is of a specific type. Figure 2 shows the probability that an unknown workload is a memcached instance with a read-mostly load and KB-range values, as a function of its measured resource pressure (details on methodology in Section 3.4). We decouple the 10 resources in 2D plots for clarity. From the heatmaps, it becomes clear that cache activity is a very strong indicator of workload type, with applications with very high L1-i and high LLC pressure corresponding to memcached with a high probability. Disk traffic also conveys a lot of information, with zero disk usage signaling a memcached workload with very high likelihood. Similar graphs can be created to fingerprint other application types. The high-heat areas around the red regions of each graph correspond to memcached jobs with different read:write ratios and value sizes, and to memory-bound workloads like Spark.

3.3 Challenges

Multiple co-residents: When a single application shares a host with Bolt, detection is straightforward. However, cloud operators colocate VMs on the same physical host to maximize the infrastructure's utility. Disentangling the contention caused by several jobs is challenging. By default, Bolt uses two benchmarks (one core and one uncore) to measure resource pressure. If the recommender cannot determine the type of co-scheduled workload based on them (all Pearson correlation coefficients below 0.1), either the application type has not been seen before, or the resource pressure is the result of multiple co-residents. If at least one of the co-residents shares a core with Bolt, i.e., the core benchmark returns non-zero pressure, we profile with an additional core benchmark and use it to determine the type of co-runner. Because hyperthreads are not shared between multiple active instances, this allows accurately measuring pressure in core resources. The remainder of the uncore interference is used to determine the other co-scheduled workloads. This assumes a linear relationship between jobs for the resources of memory bandwidth and I/O bandwidth, which can introduce some inaccuracies, but in practice does not significantly impact Bolt's detection ability (see Section 3.4).
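The linear-decomposition step can be sketched as follows; once one co-runner is identified from core pressure, its known uncore profile is subtracted from the measured total to characterize the rest. The resource names, pressure values, and the identified co-runner's profile below are purely illustrative:

```python
# Measured total uncore pressure on the host (fractions of capacity).
measured_uncore = {"MemBW": 0.85, "NetBW": 0.40, "DiskBW": 0.10}

# Uncore profile of the co-runner already identified via core-resource
# profiling (e.g., a Spark-like job); values are hypothetical.
identified_profile = {"MemBW": 0.60, "NetBW": 0.05, "DiskBW": 0.02}

# Linear assumption: remaining pressure is attributed to the other
# co-residents and fed back to the recommender.
residual = {res: max(0.0, measured_uncore[res] - identified_profile[res])
            for res in measured_uncore}
```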

Finally, there are occurrences where none of the co-scheduled applications share a core with Bolt. In this case, the system must rely solely on uncore resource pressure to detect applications. To disentangle the resource characteristics across co-residents, Bolt employs a shutter profiling mode, which involves frequent, brief profiling phases (10-50 msec each) in uncore resources. The goal of this mode is to capture at least one of the co-scheduled workloads during a low-pressure phase, which would reveal the resource usage of only one of the co-residents, much like a high-speed camera shutter attempts to capture a fast-moving object still. Figure 3 shows how shutter profiling works. When the shutter is open (upper left figure), profiling is off, and Bolt cannot differentiate between the two victim VMs. When the shutter is closed during profiling, Bolt checks for changes in uncore resource pressure. If both victims are at high load, differentiating between them is difficult (lower left); similarly when both VMs are at low load (lower right). However, when only VM2 is at low load, the pressure Bolt captures is primarily from VM1, allowing the system to detect its type, and, based on the remaining pressure, also detect VM2. This technique is particularly effective for user-interactive applications that go through intermittent phases of low load; it is less effective for mixes of services with constant, steady-state load, such as long-running analytics or logging services. We will

[Figure 3: four panels, each showing a server hosting Bolt alongside victim VMs VM1 and VM2 under different combinations of high and low load.]

Figure 3: Shutter profiling mode.

consider whether additional input signals, such as per-job cache miss rate curves, can improve detection accuracy for the latter workloads.

Application phases: Datacenter applications are notorious for going through multiple phases during their execution. Online services, in particular, follow diurnal patterns, with high load during the day and low load at night [3, 50, 44]. Moreover, an application may not be immediately detected, especially during its initialization phase. Cloud users may purchase instances to run a service, and then maintain them to execute other applications, e.g., analytics over different datasets. In these cases, the results of Bolt's detection can become obsolete over time. We address this by periodically repeating the profiling and detection steps for co-scheduled workloads. Each iteration takes 2-5 seconds for profiling, and a few milliseconds for the recommender's analysis. Section 3.4 shows a sensitivity study on how frequently detection needs to happen.
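In its simplest form, the shutter idea described above reduces to attributing the lowest-pressure sample to one VM and the residual to the other; the sample values below are illustrative, not measured data:

```python
# Brief uncore-pressure samples (e.g., combined LLC pressure) taken by
# shutter profiling over time; two victims share the host.
samples = [0.80, 0.78, 0.45, 0.82, 0.44, 0.79]

# A sample captured while one victim idles approximates the other
# victim's pressure alone (low-load shutter).
vm1_estimate = min(samples)

# Steady state approximates both victims active; the residual is
# attributed to the second VM under the linear assumption.
steady_state = max(samples)
vm2_estimate = steady_state - vm1_estimate
```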

3.4 Detection Accuracy

We first evaluate Bolt's detection accuracy in a controlled environment where all applications are known. We use a 40-machine cluster with 8-core (2-way hyperthreaded) Xeon-class servers, and schedule a total of 108 workloads, including batch analytics in Hadoop [46] and Spark [77], and latency-critical services, such as webservers, memcached [26], and Cassandra [10]. For each application type there are several different workloads with respect to algorithms, framework versions, datasets, and input load patterns. Friendly applications are scheduled using a least-loaded (LL) scheduler that allocates resources on the machines with the most available compute, memory, and storage. All workloads are provisioned for peak requirements to reduce interference. Even so, interference between co-residents exists, since the scheduler does not account for the sensitivity applications have to contention. At the end of this section, we evaluate how a scheduler that accounts for cross-application interference affects detection accuracy. The training set consists of 120 diverse applications that include webservers, various analytics algorithms and datasets, and several key-value stores and databases. The training set is selected to provide sufficient coverage of the space of resource characteristics. Figure 4 shows the pressure jobs in the training set put in compute, memory, storage, and network bandwidth.

[Figure 4: scatter plots of CPU pressure vs. memory pressure, and network pressure vs. storage pressure, for the applications in the training set.]

Figure 4: Coverage of resource characteristics for applications in the training set.

                         Detection accuracy (%)
Applications    Least Loaded scheduler    Quasar scheduler
Aggregate               87                      89
memcached               78                      80
Hadoop                  92                      92
Spark                   85                      86
Cassandra               90                      89
speccpu2006             84                      85

Table 1: Bolt's detection accuracy in the controlled experiment with the least loaded scheduler and Quasar.

While there are clusters of applications that saturate several resources, the selected workloads cover the majority of the resource usage space. This enables Bolt to match any resource profile to information from the training set, even if the new application has not been previously seen. Increasing the training set size further did not improve detection accuracy. Finally, note that there is no overlap between training and testing sets in terms of algorithms, datasets, and input loads.

In each server we instantiate an adversarial VM running Ubuntu 14.04. By default, the VM has 4 vCPUs (2 physical cores) to generate enough contention (Fig. 10 shows a sensitivity analysis on the VM size). The remainder of each machine is allocated to one or more friendly VMs. The max VM number, 5 in our setting, depends on the available servers. Friendly VMs run Ubuntu 14.04 or Debian 8.0. Adversarial VMs have no a priori information on the number and type of their co-residents. Applications are allowed to share a physical core, but must be on different hyperthreads (vCPUs). While sharing a single hyperthread is common in private deployments hosting batch jobs in small containers [7], it is uncommon in public clouds, where 1 vCPU is the minimum guaranteed size of dedicated resources for non-idling instances.

Table 1 shows Bolt's detection accuracy per application class, and in aggregate. We signal a detection as correct if Bolt correctly identifies the framework (e.g., Hadoop) or service (e.g., memcached) the application uses, and the algorithm, e.g., SVM on Hadoop, or the user load characteristics, e.g., read- vs. write-heavy load. We do not currently examine more detailed features, such as the distribution of individual query types. Bolt correctly identifies 87% of jobs, and for certain application classes, like databases and analytics, the accuracy

[Figure 5: star charts over ten resources (CPU, L1-i, L1-d, L2, LLC, memory capacity, memory bandwidth, network bandwidth, disk capacity, disk bandwidth) for Hadoop wordCount (S), Hadoop recommender (L), and a new unknown application; the unknown application's similarity is 0.29 to the former and 0.78 to the latter.]

Figure 5: Star charts showing the resource profiles of two Hadoop and one unknown job. The closer the shaded area is to a vertex, the higher the pressure in the specific resource.

[Figure 6: bar charts of detection accuracy vs. the number of co-scheduled applications (1-5, left) and vs. each application's dominant resource (L1-i, L1-d, L2, LLC, CPU, MemC, MemBw, Net, DiskBw, DiskC, right); bar annotations give the number of applications per dominant resource.]

Figure 6: Detection accuracy as a function of the number of co-runners (left) and the apps' dominant resources (right).

exceeds 90%. Misclassified jobs are typically identified as workloads with the same or similar critical resources.

Per-application profiles: Note that although in Table 1 we group applications by programming framework or online service, each framework does not correspond to a single resource profile. Profiles of applications within a framework can vary greatly, depending on functionality, complexity, and dataset features. Bolt's recommender system matches resource profiles to specific algorithmic classes and dataset characteristics within each framework. Figure 5 shows an example where two Hadoop jobs, one running word count on a small dataset and one running a recommender system on a very large dataset, exhibit very different resource profiles. While the third application is also a Hadoop job, it is identified as very similar to the recommender, as opposed to word count.

Number of co-residents: The number of victim applications affects detection accuracy. Figure 6a shows accuracy as a function of the number of victims per machine. When the number of co-residents is less than 2, accuracy exceeds 95%. As the number increases, accuracy drops, since it becomes progressively more difficult to differentiate between the contention caused by each workload. When there are 5 co-scheduled applications, accuracy is 67%. Interestingly, accuracy is higher for 4 than for 3 applications, since with 4 workloads the probability of sharing a core, and thus getting an accurate measurement of core pressure, is higher. While aggressive multi-tenancy affects accuracy, large numbers of co-residents in public clouds are unlikely in practice [41].

Dominant resource: We now examine the correlation between the resource an application puts the most pressure on and Bolt's ability to correctly detect it (Figure 6). The numbers on each bar show how many applications have each resource as dominant. In general, the applications that are most easily detected are ones with high instruction cache (L1-i), memory bandwidth, network bandwidth, and disk capacity pressure. These are workloads that fall in one of the following categories: latency-critical services with large codebases, such as webservers; in-memory analytics, like Spark; and disk-bound analytics, like Hadoop. Interestingly, L2 activity is a poor indicator of application type, in contrast to L1 and LLC, since it does not capture a significant change in the working set size (from 32KB to 256KB for our platforms).

Number of iterations: In this controlled experiment, we stop the detection process upon correct identification of a workload. For several jobs, this requires more than one iteration of profiling and data mining. Figure 7a shows the fraction of workloads that were correctly identified after N iterations. 71% of victims require only a single iteration, while an additional 15% required a second. A small fraction of applications needed more than two iterations, while jobs that were not identified correctly by the sixth iteration did not benefit from additional iterations. The number of applications per machine also affects the iterations required for detection (Figure 7b). When a single job is scheduled on a machine, one iteration is sufficient for almost all workloads. As the number of victims per machine increases, additional iterations are needed to disentangle the contention signals of each co-scheduled job.

Figure 8 shows a case of detection over several iterations. The victim is a 4-vCPU instance executing different consecutive jobs. Detection occurs every 20 sec by default (see Fig. 10 for a sensitivity study on profiling frequency). After the initial iteration, Bolt detects the job as mcf from SPECCPU2006. In the next two iterations, the interference profile of the victim continues to fit mcf's characteristics. At t=60 sec, Bolt detects that the job profile has changed, and now resembles an SVM classifier running on Mahout [46]. Similarly, at t=180 sec, the victim changes again, to a Spark data mining workload. Changes are typically captured within a few seconds.

Resource pressure: Figure 9 shows the correlation between the pressure of a victim in a given resource and Bolt's ability to correctly detect it. We plot detection accuracy for three core and three uncore resources; the results are similar for the remaining four. In almost all cases, very low or very high pressure carries the most value for detection. For disk bandwidth, accuracy remains high except for the 20-50% region, which contains many application classes with moderate disk activity, including analytics in Hadoop, databases, and graph applications. Resource pressure also affects the number of iterations Bolt needs to correctly identify a workload. For resources with a lot of value for detection, such as

[Figure 7: PDFs of the number of iterations (1-6) until detection, in total and broken down by the number of co-resident applications (1-5).]

Figure 7: PDFs of iterations required for detection in total (left) and as a function of the co-residents' number (right).

[Figure 8: resource pressure over time (0-7 min) across the ten profiled resources, as the victim transitions through SPEC, Hadoop, Spark, memcached, and Cassandra phases.]

Figure 8: Example of workload phase detection by Bolt.

the L1-i and LLC, when pressure is moderate, several iterations are needed for accuracy to increase. In contrast, for off-chip resources, like network bandwidth, this effect is less pronounced. In general, when moderate pressure hurts detection accuracy (Figure 9), more iterations are needed for correct identification.

Scheduler: So far we have used a least-loaded scheduler, which is commonly used in datacenters [33, 63]. This scheduler does not account for resource contention between applications, leading to suboptimal performance [16, 24, 5, 84, 82]. Recent work has shown that if interference is accounted for at the scheduler level, both performance and utilization improve [48, 19, 53]. We now evaluate the impact of such a scheduler on Bolt's detection accuracy. Quasar [19] leverages machine learning to quickly determine which applications can be co-scheduled on the same machine without destructive interference. We use Quasar to schedule all victim applications, and then inject Bolt in each physical host for detection. In a real setting, Bolt can trick an interference-aware scheduler by incurring very low levels of interference initially, until it is co-scheduled with a victim application and can determine its type and characteristics.

Table 1 shows Bolt's accuracy with the least loaded scheduler (LL) and Quasar. Interestingly, accuracy increases slightly with Quasar, 2% on average, but in general the impact of the scheduler is small. There are two reasons for the increase: first, both LL and Quasar do not share a single hyperthread between jobs, so core pressure measurements remain accurate. Second, because Quasar only co-schedules jobs with different critical resources, it provides Bolt with a less "noisy" interference signal for uncore resources, making distinguishing co-residents easier. Therefore, reducing interference in software alone is not sufficient to mitigate these security threats.

[Figure 9: detection accuracy vs. resource pressure for three core (L1-i, LLC, CPU) and three uncore (memory capacity, network bandwidth, disk bandwidth) resources.]

Figure 9: Detection accuracy as a function of the pressure victims place in various shared resources.

[Figure 10: detection accuracy vs. profiling interval (0-300 sec), vs. adversarial VM size (0-30 vCPUs), and vs. number of profiling benchmarks (0-10).]

Figure 10: Sensitivity to (a) profiling frequency, (b) adversarial VM size, and (c) profiling benchmarks.

Sensitivity analysis: Finally, we examine how design decisions in Bolt affect detection. Figure 10a shows how accuracy changes as profiling frequency decreases. For profiling intervals beyond 30 sec, accuracy drops rapidly. If profiling only occurs every 5 minutes, almost half the victims are incorrectly identified. Note that if workloads are long-running and mostly stable, profiling can be less frequent without such an impact on accuracy. Figure 10b shows accuracy as a function of the adversarial VM's size. We examine sizes offered as on-demand instances by EC2 [1]. If the adversary has fewer than 4 vCPUs, its resources are insufficient to create enough contention to capture the co-residents' pressure. Accuracy continues to grow for larger sizes; however, the larger the adversarial instance is, the less likely it will be co-scheduled with other VMs, nullifying the value of Bolt. Finally, Figure 10c shows accuracy as a function of the number of microbenchmarks used for profiling. A single benchmark is not sufficient to fingerprint the characteristics of a workload; however, using more than 3 benchmarks has diminishing returns in accuracy. Unless otherwise specified, we use 20 sec profiling intervals, 4-vCPU adversarial VMs, and 2 benchmarks for initial profiling.

3.5 Limitations

While Bolt can accurately detect most frequently-run workloads, it has certain limitations. First, when no job shares a core with the adversary, the system assumes a linear relation between the co-residents' resource pressure in uncore resources. While this is true in some cases, it may not be generally accurate. We will explore alternative ways of distinguishing co-residents in future work. Second, the system cannot differentiate between multiple jobs running in a single instance and multiple instances running one job each. While this does not affect detection, if a covert-channel attack is aimed at a particular user, more accurate information would increase the adversary's leverage. Similarly, Bolt cannot currently distinguish between two copies of the same job sharing a platform at low load, and one copy of the job that runs at higher load.

4. Cloud User Study

We now evaluate Bolt in a real-world environment with multiple users and unknown applications. We conducted a multi-user study on Amazon EC2, subject to the following rules:

• We have engaged 20 independent users from two academic institutions (Cornell University and Stanford University), provided them with access to a shared cluster of 200 c3.8xlarge instances on EC2, with 32 vCPUs and 60GB of memory each, and asked them to submit several applications of their choosing.

• On each instance, we maintain a 4-vCPU VM for Bolt. All other resources are made available to the users as VMs.

• Each user submits one or more applications, and all users have equal priority, i.e., no application will be preempted to accommodate the job of another user.

• Users were instructed to behave amenably, i.e., to avoid consuming disproportionate resources, so that the system can observe a representative job sample from all users.

• Users were also advised to pin their applications to specific cores to eliminate interference from the OS scheduler. Physical cores can be shared across jobs; however, each vCPU (hardware thread) is dedicated to a single application.

• Users can optionally select the instance(s) they want to launch their applications on. We provide a spreadsheet with a list of all available instances and their utilization at each point in time. This information, as well as any a priori information on the type, number, and characteristics of submitted applications, is concealed from Bolt.

• If a user does not select an instance, a least-loaded scheduler selects the VM with the most available cores and memory.

Once users start launching jobs in the cluster, we instantiate Bolt on each machine, and activate it periodically to detect the type, characteristics, and number of co-scheduled applications. The entire experiment lasts approximately 4 hours, beyond which point all instances are terminated. Figure 11 shows the PDF of application types the different users launched, provided by the users after the completion of the experiment. Different colors correspond to different users. A total of 436 jobs were submitted to the cluster. The application mix includes analytics jobs in Hadoop and Spark, scientific benchmarks (Parsec, Bioparallel), hardware synthesis tools (Cadence, VivadoHLS), multicore (zsim) and n-body simulations, email clients, web browsers, web servers (http), music and video streaming services, and databases, among others. Each job may take one or more vCPUs. Even though

[Figure 11: histogram of occurrences per application label (1-53), colored by user (Users 1-20). Labels: 1: hadoop, 2: spark, 3: email, 4: browser, 5: cadence, 6: zsim, 7: video, 8: latex, 9: ML Python, 10: make, 11: mem$d, 12: http server, 13: spec, 14: matlab, 15: mysql, 16: vivado, 17: parsec, 18: vim, 19: scala, 20: php, 21: postgres, 22: musicStream, 23: minebench, 24: n-body sim, 25: ppt, 26: OS img, 27: pdfview, 28: scons, 29: du -h, 30: cr/del cgroup, 31: bioparallel, 32: storm, 33: cpu burn, 34: audacity, 35: javascript, 36: create VMs, 37: html, 38: cassandra, 39: mongoDB, 40: mkdir, 41: cp/mv, 42: sirius, 43: oProfile, 44: dwnld LF, 45: rsync, 46: ping, 47: photoshop, 48: ssh, 49: rm, 50: skype, 51: zipkin, 52: graphX, 53: ix.]

Figure 11: Probability distribution function of applications launched in the cloud study, per user.

[Figure 12: occurrences per application label for (a) correctly identifying the application name and (b) correctly identifying the application characteristics; (c) the number of active jobs per instance over time.]

Figure 12: Detection accuracy of (a) application labels and (b) resource characteristics. Fig. 12(c): number of jobs per instance.

figure 11 groups applications by framework, the individual logic and datasets differ between jobs in the same framework. Note that we have not updated the training set for this study; it consists of the same 120 applications described in Section 3.4.

Figure 12a shows the number of applications that were correctly identified by name, across the different categories. In total, 277 jobs are correctly labeled. As expected, Bolt could not assign a label to application types it has not seen before, such as email clients or image editing applications. However, even when Bolt cannot label an application, it can still identify the resources it is sensitive to, as shown in figure 12b. For 385 out of 436 jobs, Bolt correctly identifies the job's resource characteristics. For performance attacks, such as the ones described in Section 5, this information is sufficient to dramatically degrade the performance of a victim application. Finally, figure 12c shows the number of co-scheduled applications per instance throughout the duration of the experiment. The bottom 14 instances remained unused; for the remaining 186 instances, the number of active applications ranges from 1 to 6, with each job taking one or more vCPUs. The majority of misclassified applications correspond to instances with 5 or more concurrently active jobs.

5. Security Attacks

Bolt makes several cloud attacks practical and difficult to detect. We discuss three possible attacks that leverage the information obtained through detection.

5.1 Internal Denial of Service Attack

Attack setting: Denial of service attacks hurt the performance of a victim service by overloading its resources [14, 2, 51]. In cloud settings specifically, they can be categorized in two types: external and internal (or host-based) attacks. External attacks are the most conventional form of DoS [51, 13, 22]. These attacks utilize external servers to direct excessive traffic to the victims, flooding their resources and hurting their availability. External DoS attacks affect mostly PaaS and SaaS systems, and include IP spoofing, synchronization (SYN) flooding, smurf, buffer overflow, land, and teardrop attacks.

In contrast, internal DoS attacks take advantage of IaaS and PaaS cloud multi-tenancy to launch adversarial programs on the same host as the victim and degrade its performance [14, 60, 34, 52]. For example, Ristenpart et al. [60] showed how an adversarial user can leverage the IP naming conventions of IaaS clouds to locate a victim VM and degrade its performance. Cloud providers are starting to build defenses against such attacks. For example, EC2 offers autoscaling, which will increase the instances of a service under heavy resource usage. Similarly, there is related work on mechanisms that detect saturation in memory bandwidth

[Figure 13: 99th-percentile latency and CPU utilization over time (0-120 s) for Bolt and a naïve DoS attack; the naïve attack triggers detection and migration of the victim, after which its latency recovers.]

Figure 13: Latency and utilization with Bolt and a naïve DoS attack that saturates CPU resources.

and can trigger VM migration [71]. This means that DoS attacks that simply overload a physical host are ineffective when such defenses are in place. We focus on host-based DoS attacks, and make them resilient against such defenses by avoiding resource saturation. We leverage the information obtained by Bolt to construct custom interference programs that stress the resources the victim is most sensitive to, without overloading the host.

Bolt creates custom contentious workloads by combining the microbenchmarks used to measure the resource pressure of the victim. Since these microbenchmarks are tunable, Bolt configures their intensity to a higher point than the victim's measured pressure c_i during detection. For example, if a victim is detected as memcached, and its most critical resources are the L1-i cache (81% pressure) and the last level cache (LLC) (78% pressure), Bolt uses the two microbenchmarks for L1-i and LLC at a higher intensity than what memcached can tolerate.

Impact: We use the DoS attack setting above against the 108 applications of the controlled experiment (Section 3.4). The DoS attack degrades performance by 2.2x on average, and up to 9.8x, in terms of execution time. Degradation is much more pronounced for interactive workloads, like key-value stores, with tail latency increasing by 8-140x, a dramatic impact for applications with strict tail latency SLAs.
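The intensity-selection step might be sketched as follows; the pressure values mirror the memcached example above, while the 10% margin is an assumed parameter, not a value taken from the paper:

```python
# Victim's measured pressure c_i on its most critical resources,
# as reported by the detection step (memcached example above).
victim_pressure = {"L1-i": 0.81, "LLC": 0.78}

# Assumed headroom above c_i; an illustrative tuning knob, not Bolt's.
MARGIN = 0.10

# Run each resource's tunable microbenchmark slightly above the
# victim's tolerance, capped at full intensity to avoid saturating
# the host and triggering migration defenses.
payload = {res: min(1.0, c + MARGIN) for res, c in victim_pressure.items()}
```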

We now examine the impact of the DoS attack on utilization. If it translates to resource saturation, there is a high probability that the DoS will be detected and the victim migrated to a new machine. The experimental cluster supports live migration. If CPU utilization (sampled every 1 sec) exceeds 70%, the victim is migrated to an unloaded host. Figure 13 compares the tail latency and CPU utilization with Bolt to that of a naïve DoS that saturates the CPU through a compute-intensive kernel. We focus on a single victim VM running memcached. The overhead of migration (time between initiating migration and latency returning to normal levels) for this VM is 8 sec. Performance degradation is similar for both systems until t=80 sec, at which point the victim is migrated to a new host, due to the compute-intensive kernel causing utilization to exceed 70%. While during migration performance continues to degrade, once the VM resumes in the new server, latency returns to nominal levels. In contrast, Bolt keeps utilization low, and continues to impact the victim's performance beyond t=80 sec.

5.2 Resource Freeing Attack
Attack setting: The goal of a resource freeing attack (RFA) is to modify the workload of a victim such that it frees up resources for the adversary, improving its performance [67, 66]. The adversarial VM consists of two components: a beneficiary and a helper. The beneficiary is the program whose performance the attacker wants to improve, and the helper is the program that forces the victim to yield its resources. The RFA works by adding load in the victim's critical resources, causing other resources to free up. For example, if the victim is a memory-bound Spark job, introducing additional memory traffic will result in Spark stalling in memory and lessening its pressure in the other resources until it can reclaim its required memory bandwidth. When launched in a public cloud, the victim of an RFA ends up paying more and achieving worse performance compared to running in isolation.

We now create a proof-of-concept RFA for a webserver, a network-bound Hadoop job, and a memory-bound Spark job. Launching RFAs requires in-depth knowledge of the victim's resource requirements [67], which Bolt provides by identifying the victim's dominant resource. Once the dominant resource is detected, the runtime applies a custom helper program that saturates this critical resource. For the webserver, the helper is a CPU-intensive benchmark launching CGI requests, causing the victim to saturate its CPU usage, servicing fewer real user requests and freeing up cache resources. For the Hadoop job, we use a network-intensive benchmark similar to iperf, which saturates network bandwidth and frees up CPU and memory resources for the beneficiary. Similarly, for Spark, we use a streaming memory benchmark that slows down k-means and frees network and compute resources for the adversary. The selection of the beneficiary is of lesser importance; without loss of generality, we select mcf, a CPU-intensive benchmark from SPEC CPU2006. The same methodology can be used with other beneficiaries, conditioned on their critical resource not overlapping with the victim's.
Impact: Table 2 shows the performance degradation for the three victim applications and the improvement in execution time for the beneficiary. The webserver suffers the most in terms of queries per second, as the helper's CGI requests pollute its cache, preventing it from servicing legitimate user requests. Hadoop and Spark experience significant degradations in execution time due to network and memory bandwidth saturation, respectively. Execution time for mcf improves by 16-38%, benefiting from the victim stalling in its critical resource to improve its cache and CPU usage.
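The dominant-resource-to-helper mapping above can be sketched as a simple lookup. The helper names are hypothetical labels for the benchmarks the text describes; the mapping itself is an illustration, not Bolt's code:

```python
# Sketch: once Bolt identifies the victim's dominant resource, the RFA
# runtime picks a helper that saturates that resource. Helper names are
# hypothetical stand-ins for the benchmarks described in the text.

HELPERS = {
    "CPU": "cgi_request_storm",      # webserver victim: CPU-intensive CGI requests
    "netBW": "iperf_like_stream",    # Hadoop victim: saturate network bandwidth
    "memBW": "streaming_mem_scan",   # Spark victim: streaming memory benchmark
}

def pick_helper(dominant_resource):
    """Return the helper program for the victim's dominant resource."""
    return HELPERS[dominant_resource]

assert pick_helper("netBW") == "iperf_like_stream"
```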

5.3 VM Co-residency Detection
Attack setting: Sections 3.4 and 4 showed that we can identify the applications sharing a cloud infrastructure. However, a malicious user is rarely interested in a random service running on a public cloud. More often they target a specific

                 Victim                 Beneficiary        Target
  App               Perf          App      Perf            Resource
  Apache Webserver  -64% (QPS)    mcf      +24%            CPU
  Hadoop (SVM)      -36% (Exec)   mcf      +16%            Network BW
  Spark (k-means)   -52% (Exec)   mcf      +38%            Memory BW

Table 2: RFA impact on the victims and beneficiaries.

workload, e.g., a competitive e-commerce site. Therefore, they need to pinpoint where the target resides in a practical manner. This requires a launch strategy and a mechanism for co-residency detection [69]. The attack is practical if the target is located with high confidence, in reasonable time, and with modest resource costs. Bolt enables this attack to be carried out in a practical way and remains resilient against defense mechanisms such as virtual private clouds (VPCs), which make internal IP addresses private to a single tenant. Once a target VM is located, the adversary can launch RFA or DoS attacks as previously described. Co-residency detection attacks rely on leakage of logical information, e.g., IP address, or on observing the impact of a side-channel attack due to resource contention. Bolt relies on a variation of the latter approach.

Assume a system of N servers. A target victim user launches k VMs. The adversary launches n malicious VMs. Instances are launched simultaneously to avoid malicious-malicious instance co-residency [69]. The probability that at least one adversarial VM will be co-scheduled with a victim instance is P(f) = 1 - (1 - k/N)^n. The adversary then uses a two-process detection scheme. The sender is a process that creates contention to degrade the victim's performance. The receiver is the process that detects this degradation, running either on the same host as the victim (cooperative detection) or externally (uncooperative detection). We assume an uncooperative victim, which is the more general case. Bolt works as the sender to determine the type of co-residents on each sampled host and identify any of the desired type. Assume m VMs of the desired type have been detected in the sample set. Bolt now needs to prune down this space to VM(s) from the specific target victim. It launches an external receiver aiming at the victim over a public channel, e.g., HTTP or a key-value protocol, to ping the victim service. This is the only point of communication with the victim. At the same time, the sender incurs contention in the resources the victim is most sensitive to. If the sender and victim are co-residents, the receiver's requests will be slower due to contention. Bolt quickly prunes down the sampled VMs by determining the type of co-residents. In a large cluster this reduces the time to pinpoint a victim and the resource costs for the adversary. It also increases the confidence in the detection in the presence of noise from other co-residents.
Impact: We evaluate co-residency detection on the same 40-node cluster. The victim is a SQL server instantiating a single VM. The cluster also hosts 7 other VMs running SQL. Other active VMs host key-value stores, Hadoop, and Spark jobs. Bolt launches 10 senders simultaneously on
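The co-residency probability is straightforward to compute. A small sketch of the formula above; the example values (40 servers, 10 adversarial launches) mirror the evaluation setup, and the single victim VM (k = 1) is an assumption taken from the described scenario:

```python
# P(f) = 1 - (1 - k/N)^n: probability that at least one of n adversarial
# VMs lands on a host running one of the victim's k VMs across N servers.

def coresidency_prob(N, k, n):
    """N: servers; k: victim VMs; n: adversarial VMs launched."""
    return 1.0 - (1.0 - k / N) ** n

# Example under assumed parameters: a 40-server cluster, a single victim
# VM, and 10 simultaneously launched adversarial VMs.
p = coresidency_prob(40, 1, 10)
print(round(p, 3))  # 0.224
```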

randomly-selected machines (P(f) = ) and detects 3 VMs running SQL in the sample set. It then introduces memory interference as the receiver initiates a large number of SQL queries. While these queries have 816msec mean latency without interference, latency now becomes 2614msec. Given a ~3x latency increase, we can conclude that one of the co-residents is the victim SQL server. Detection required 6sec from instantiation to receiver detection and 11 adversary VMs.

6. Improving Security via Resource Isolation
Since interference is at the heart of Bolt's detection methodology, isolation mechanisms that attenuate contention should reduce its accuracy. We first evaluate to what extent existing isolation techniques mitigate security vulnerabilities, and then highlight trade-offs between security, performance, and utilization. We use the same setting as in the controlled experiment of Section 3.4. We evaluate four resource-specific isolation techniques and three settings for OS-level isolation mechanisms: baremetal, containerized, and virtualized. The baremetal system does not employ any OS-level isolation. The containerized setup uses Linux containers (lxc) and assigns cores to applications via cpuset cgroups. Both containers and VMs constrain memory capacity. Baremetal experiments do not enforce memory capacity allocations, and the Linux scheduler is free to float applications across cores.

The first resource-specific isolation technique is thread pinning to physical cores, which constrains interference from scheduling actions like context switching. The number of cores an application is allocated can change dynamically and is limited by how fast Linux can migrate tasks, typically in the tens of milliseconds.
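On Linux, per-thread pinning of this kind is exposed through the sched_setaffinity interface, the same effect cpuset cgroups achieve for containers. A minimal sketch (Linux-only; the choice of pinning to the lowest-numbered available core is arbitrary):

```python
import os

# Sketch: pin the calling process to one physical core, then restore the
# original allocation. Dynamically re-pinning a running task like this is
# what bounds how fast core allocations can change (tens of milliseconds).

avail = os.sched_getaffinity(0)  # cores this process may currently use
pin = {min(avail)}               # pin to a single core (arbitrary choice)
os.sched_setaffinity(0, pin)
assert os.sched_getaffinity(0) == pin

os.sched_setaffinity(0, avail)   # grow the allocation back dynamically
```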

For network isolation, we use the outbound network bandwidth partitioning capabilities of Linux's traffic control. Specifically, we use the qdisc [8] scheduler with the hierarchical token bucket queueing discipline (HTB) to enforce egress bandwidth limits. The limits are set to the maximum traffic burst rate for each application to avoid contention (the ceil parameter in the HTB interface). Ingress network bandwidth isolation has been extensively studied in previous work [36]; these approaches can be applied here as well.

For DRAM bandwidth isolation, there is no commercially available partitioning mechanism. To enforce bandwidth isolation, we use the following approach: we monitor the DRAM bandwidth usage of each application through performance counters [45] and modify the scheduler to only colocate jobs on machines that can accommodate their aggregate peak memory bandwidth requirements. This requires extensive application knowledge and is used simply to highlight the benefits of DRAM bandwidth isolation.
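The scheduler change amounts to a feasibility check on aggregate peak bandwidth. A minimal sketch; the bandwidth figures and machine names are hypothetical, and the first-fit placement loop is an illustrative simplification:

```python
# Sketch: only colocate a job on a machine if the aggregate peak DRAM
# bandwidth of its residents plus the new job stays within the machine's
# capacity. All numbers (GB/s) and names are hypothetical.

def can_colocate(resident_bws, new_job_bw, capacity_bw):
    """True if the machine can absorb the new job's peak DRAM bandwidth."""
    return sum(resident_bws) + new_job_bw <= capacity_bw

machines = {"m1": [20.0, 15.0], "m2": [5.0]}  # resident peak bandwidths
CAPACITY = 40.0                                # per-machine DRAM BW budget

def place(job_bw):
    """First-fit placement respecting the bandwidth budget."""
    for name, residents in machines.items():
        if can_colocate(residents, job_bw, CAPACITY):
            residents.append(job_bw)
            return name
    return None  # no machine can accommodate the job's peak bandwidth

print(place(10.0))  # m2 (m1 is already at 35 of 40 GB/s)
```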

Finally, for last level cache (LLC) isolation, we use the Cache Allocation Technology (CAT) available in recent Intel chips [11]. CAT partitions the LLC by cache ways, which in a highly-associative cache allows for non-overlapping partitions at the granularity of a few percent of the LLC capacity. Each co-resident is allocated one partition configured to its current capacity requirements [57]. Partitions can be resized at runtime by reprogramming specific low-level registers (MSRs); changes take effect after a few milliseconds.

[Figure 14 plots detection accuracy (%) as isolation mechanisms are added cumulatively (None, Thread Pinning, +Net BW Partitioning, +Mem BW Partitioning, +Cache Partitioning, +Core Isolation) for the baremetal, Linux container, and virtual machine settings.]

Figure 14: Detection accuracy with isolation techniques.

We add one isolation mechanism at a time in the three system settings (baremetal, containers, VMs). Figure 14 shows the impact of isolation techniques on Bolt's detection accuracy. As expected, when no isolation is used, baremetal allows for a significantly higher detection accuracy than container- and VM-based systems, mostly due to the latter constraining core and memory capacity usage. As a result, introducing thread pinning benefits baremetal the most, since it reduces core contention. It also benefits container- and VM-based setups to a lesser degree, by eliminating unpredictability introduced by the Linux scheduler (e.g., context switching) [40]. The dominant resource of each application determines which isolation technique benefits it the most. Thread pinning mostly benefits workloads bound by on-chip resources, such as L1/L2 caches and cores. Adding network bandwidth partitioning lowers detection accuracy for all three settings almost equally. It primarily benefits network-bound workloads, for which network interference conveys the most information for detection. Memory bandwidth isolation further reduces accuracy by 10% on average, benefiting jobs dominated by DRAM traffic. Finally, cache partitioning has the most dramatic reduction in accuracy, especially for LLC-bound workloads. We attribute this to the importance of cache pressure as a detection signal. The number of co-residents also affects the extent to which isolation helps. The more co-scheduled applications exist per machine, the more isolation techniques degrade accuracy, as they make distinguishing between co-residents harder.

Unfortunately, even when all the techniques are on, accuracy is still 50%, for two reasons: current techniques are not fine-grain enough to allow strict and scalable isolation, and core resources (L1/L2 caches, CPU) are prone to interference due to contending hyperthreads [61]. To evaluate the latter hypothesis, we modify the scheduler such that hyperthreads of different jobs are never scheduled on the same physical core, e.g., if a job needs 7 vCPUs, it is allocated 4 dedicated physical cores. The grey bars of Figure 14 show the detection accuracy. Baremetal instances still allow certain applications to be detected, but for containerized and virtualized settings accuracy drops to 14%. The remaining accuracy corresponds to disk-heavy workloads. Improving security, however, comes at a performance penalty of 34% on average in execution time, as threads of the same job are forced to contend with one another. Alternatively, users can overprovision their resource reservations to avoid degradation, which results in a 45% drop in utilization. This means that the cloud provider cannot leverage CPU idleness to share machines, decreasing the cost benefits of cloud computing. Note that enforcing core isolation alone is also not sufficient, as it allows a detection accuracy of 46%.
Discussion: The previous analysis highlights a design problem with current datacenter platforms. Traditional multicores are prone to contention, which will only worsen as more cores are integrated in each server and multi-tenancy becomes more pronounced. Existing isolation techniques are insufficient to mitigate security vulnerabilities, and techniques that provide reasonable security guarantees sacrifice performance or cost efficiency through low utilization. This highlights the need for new fine-grain and coordinated isolation techniques that guarantee security at high utilization for shared resources.
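The hyperthread-separation accounting described above (a 7-vCPU job receiving 4 dedicated physical cores) can be sketched as a ceiling division over the SMT width:

```python
import math

# Sketch: with hyperthread separation, a job's vCPUs may only share SMT
# siblings with threads of the same job, so a job with v vCPUs needs
# ceil(v / 2) dedicated physical cores on a 2-way SMT machine.

def dedicated_cores(vcpus, smt_ways=2):
    """Physical cores reserved for a job under hyperthread separation."""
    return math.ceil(vcpus / smt_ways)

assert dedicated_cores(7) == 4  # the example from the text
assert dedicated_cores(8) == 4
```

The unused sibling thread on the last core of an odd-sized job is exactly the idleness that costs the provider utilization.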

7. Conclusions
We have presented Bolt, a practical system that uses online data mining techniques to identify the type and characteristics of applications running on shared cloud systems, and enables practical attacks that degrade their performance. In a 40-server cluster, Bolt correctly identifies 87 out of 108 workloads and degrades tail latency by up to 140x. We also used Bolt in a user study on EC2 to detect the characteristics of unknown jobs submitted by multiple users. Finally, we showed that while existing isolation techniques are helpful, they are not sufficient to mitigate such attacks. Bolt reveals real, easy-to-exploit threats in public clouds. We hope that this work will motivate cloud providers and computer scientists to develop and deploy stricter isolation primitives in cloud systems.

Acknowledgements
We sincerely thank all the participants in the user study for their time and effort. We also thank Daniel Sanchez, Ed Suh, Grant Ayers, Mingyu Gao, Ana Klimovic, and the rest of the MAST group, as well as the anonymous reviewers, for their feedback on earlier versions of this manuscript. This work was supported by the Stanford Platform Lab, NSF grant CNS-1422088, and a John and Norma Balen Sesquicentennial Faculty Fellowship.

References
[1] Amazon EC2. http://aws.amazon.com/ec2.

[2] Aman Bakshi and Yogesh B. Dujodwala. Securing cloud from DDoS attacks using intrusion detection system in virtual machine. In Proc. of the Second International Conference on Communication Software and Networks (ICCSN), 2010.

[3] Luiz Barroso and Urs Hoelzle. The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines. MC Publishers, 2009.

[4] Naomi Benger, Joop van de Pol, Nigel P. Smart, and Yuval Yarom. "Ooh aah... just a little bit": A small amount of side channel can go a long way. In Proc. of the International Cryptographic Hardware and Embedded Systems Workshop (CHES), Busan, South Korea, 2014.

[5] Major Bhadauria and Sally A. McKee. An approach to resource-aware co-scheduling for CMPs. In Proc. of the 24th ACM International Conference on Supercomputing (ICS), Tsukuba, Japan, 2010.

[6] Leon Bottou. Large-scale machine learning with stochastic gradient descent. In Proceedings of the International Conference on Computational Statistics (COMPSTAT), Paris, France, 2010.

[7] Eric Brewer. Kubernetes: The path to cloud native. http://goo.gl/QgkzYB, SOCC Keynote, August 2015.

[8] Martin A. Brown. Traffic control HOWTO. http://linux-ip.net/articles/Traffic-Control-HOWTO.

[9] Robin Burke. Hybrid recommender systems: Survey and experiments. User Modeling and User-Adapted Interaction, 12(4):331–370, November 2002.

[10] Apache Cassandra. http://cassandra.apache.org.

[11] Intel® 64 and IA-32 Architectures Software Developer's Manual, vol. 3B: System Programming Guide, Part 2, September 2014.

[12] Ludmila Cherkasova, Diwaker Gupta, and Amin Vahdat. Comparison of the three CPU schedulers in Xen. SIGMETRICS Perform. Eval. Rev., 35(2):42–51, September 2007.

[13] Scott A. Crosby and Dan S. Wallach. Denial of service via algorithmic complexity attacks. In Proceedings of the 12th USENIX Security Symposium, Washington, DC, 2003.

[14] Marwan Darwish, Abdelkader Ouda, and Luiz Fernando Capretz. Cloud-based DDoS attacks and defenses. In Proc. of i-Society, Toronto, ON, 2013.

[15] Christina Delimitrou and Christos Kozyrakis. iBench: Quantifying Interference for Datacenter Workloads. In Proceedings of the 2013 IEEE International Symposium on Workload Characterization (IISWC), Portland, OR, September 2013.

[16] Christina Delimitrou and Christos Kozyrakis. Paragon: QoS-Aware Scheduling for Heterogeneous Datacenters. In Proceedings of the Eighteenth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), Houston, TX, USA, 2013.

[17] Christina Delimitrou and Christos Kozyrakis. QoS-Aware Scheduling in Heterogeneous Datacenters with Paragon. In ACM Transactions on Computer Systems (TOCS), Vol. 31, Issue 4, December 2013.

[18] Christina Delimitrou and Christos Kozyrakis. Quality-of-Service-Aware Scheduling in Heterogeneous Datacenters with Paragon. In IEEE Micro Special Issue on Top Picks from the Computer Architecture Conferences, May/June 2014.

[19] Christina Delimitrou and Christos Kozyrakis. Quasar: Resource-Efficient and QoS-Aware Cluster Management. In Proceedings of the Nineteenth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), Salt Lake City, UT, USA, 2014.

[20] Christina Delimitrou and Christos Kozyrakis. HCloud: Resource-Efficient Provisioning in Shared Cloud Systems. In Proceedings of the Twenty First International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), April 2016.

[21] Christina Delimitrou, Daniel Sanchez, and Christos Kozyrakis. Tarcil: Reconciling Scheduling Speed and Quality in Large Shared Clusters. In Proceedings of the Sixth ACM Symposium on Cloud Computing (SOCC), August 2015.

[22] Jake Edge. Denial of service via hash collisions. http://lwn.net/Articles/474912/, January 2012.

[23] Benjamin Farley, Ari Juels, Venkatanathan Varadarajan, Thomas Ristenpart, Kevin D. Bowers, and Michael M. Swift. More for your money: Exploiting performance heterogeneity in public clouds. In Proc. of the ACM Symposium on Cloud Computing (SOCC), San Jose, CA, 2012.

[24] Alexandra Fedorova, Margo Seltzer, and Michael D. Smith. Improving performance isolation on chip multiprocessors via an operating system scheduler. In Proceedings of the 16th Intl. Conference on Parallel Architecture and Compilation Techniques (PACT), Brasov, Romania, 2007.

[25] Alexander Felfernig and Robin Burke. Constraint-based recommender systems: Technologies and research issues. In Proceedings of the ACM International Conference on Electronic Commerce (ICEC), Innsbruck, Austria, 2008.

[26] Brad Fitzpatrick. Distributed caching with memcached. In Linux Journal, Volume 2004, Issue 124, 2004.

[27] Oded Goldreich and Rafail Ostrovsky. Software protection and simulation on oblivious RAMs. J. ACM, 43(3):431–473, May 1996.

[28] Google Container Engine. https://cloud.google.com/container-engine.

[29] Asela Gunawardana and Christopher Meek. A unified approach to building hybrid recommender systems. In Proc. of the Third ACM Conference on Recommender Systems (RecSys), New York, NY, 2009.

[30] Sanchika Gupta and Padam Kumar. VM profile based optimized network attack pattern detection scheme for DDoS attacks in cloud. In Proc. of SSCC, Mysore, India, 2013.

[31] Yi Han, Tansu Alpcan, Jeffrey Chan, and Christopher Leckie. Security games for virtual machine allocation in cloud computing. In 4th International Conference on Decision and Game Theory for Security, Fort Worth, TX, 2013.

[32] Amir Herzberg, Haya Shulman, Johanna Ullrich, and Edgar Weippl. Cloudoscopy: Services discovery and topology mapping. In Proceedings of the ACM Workshop on Cloud Computing Security Workshop (CCSW), Berlin, Germany, 2013.

[33] Ben Hindman, Andy Konwinski, Matei Zaharia, Ali Ghodsi, Anthony D. Joseph, Randy Katz, Scott Shenker, and Ion Stoica. Mesos: A platform for fine-grained resource sharing in the data center. In Proceedings of NSDI, Boston, MA, 2011.

[34] Jingwei Huang, David M. Nicol, and Roy H. Campbell. Denial-of-service threat to Hadoop/YARN clusters with multi-tenancy. In Proc. of the IEEE International Congress on Big Data, Washington, DC, 2014.

[35] Alexandru Iosup, Nezih Yigitbasi, and Dick Epema. On the performance variability of production cloud services. In Proceedings of CCGRID, Newport Beach, CA, 2011.

[36] Vimalkumar Jeyakumar, Mohammad Alizadeh, David Mazieres, Balaji Prabhakar, Changhoon Kim, and Albert Greenberg. EyeQ: Practical network performance isolation at the edge. In Proc. of the 10th USENIX Conference on Networked Systems Design and Implementation (NSDI), Lombard, IL, 2013.

[37] Yaakoub El Khamra, Hyunjoo Kim, Shantenu Jha, and Manish Parashar. Exploring the performance fluctuations of HPC workloads on clouds. In Proceedings of CloudCom, Indianapolis, IN, 2010.

[38] Krzysztof C. Kiwiel. Convergence and efficiency of subgradient methods for quasiconvex minimization. In Mathematical Programming (Series A), (Berlin, Heidelberg: Springer) 90(1), pp. 1-25, 2001.

[39] Ruby B. Lee. Rethinking computers for cybersecurity. IEEE Computer, 48(4):16–25, 2015.

[40] Jacob Leverich and Christos Kozyrakis. Reconciling high server utilization and sub-millisecond quality-of-service. In Proceedings of EuroSys, Amsterdam, The Netherlands, 2014.

[41] Host server CPU utilization in Amazon EC2 cloud. http://goo.gl/2LTx4T.

[42] Fangfei Liu, Yuval Yarom, Qian Ge, Gernot Heiser, and Ruby B. Lee. Last-level cache side-channel attacks are practical. In Proc. of the IEEE Symposium on Security and Privacy (S&P), San Jose, CA, 2015.

[43] Fei Liu, Lanfang Ren, and Hongtao Bai. Mitigating cross-VM side channel attack on multiple tenants cloud platform. In Journal of Computers, Vol. 9, No. 4 (2014), 1005-1013, April 2014.

[44] David Lo, Liqun Cheng, Rama Govindaraju, Luiz Andre Barroso, and Christos Kozyrakis. Towards energy proportionality for large-scale latency-critical workloads. In Proceedings of the 41st Annual International Symposium on Computer Architecture (ISCA), Minneapolis, MN, 2014.

[45] David Lo, Liqun Cheng, Rama Govindaraju, Parthasarathy Ranganathan, and Christos Kozyrakis. Heracles: Improving resource efficiency at scale. In Proc. of the 42nd Annual International Symposium on Computer Architecture (ISCA), Portland, OR, 2015.

[46] Mahout. http://mahout.apache.org.

[47] Dave Mangot. EC2 variability: The numbers revealed. http://tech.mangot.com/roller/dave/entry/ec2_variability_the_numbers_revealed.

[48] Jason Mars and Lingjia Tang. Whare-map: Heterogeneity in "homogeneous" warehouse-scale computers. In Proceedings of ISCA, Tel-Aviv, Israel, 2013.

[49] Robert Martin, John Demme, and Simha Sethumadhavan. Timewarp: Rethinking timekeeping and performance monitoring mechanisms to mitigate side-channel attacks. In Proceedings of the International Symposium on Computer Architecture (ISCA), Portland, OR, 2012.

[50] David Meisner, Christopher M. Sadler, Luiz Andre Barroso, Wolf-Dietrich Weber, and Thomas F. Wenisch. Power management of online data-intensive services. In Proceedings of the 38th Annual International Symposium on Computer Architecture, pages 319–330, 2011.

[51] Jelena Mirkovic and Peter Reiher. A taxonomy of DDoS attack and DDoS defense mechanisms. ACM SIGCOMM Computer Communication Review (CCR), April 2004.

[52] Thomas Moscibroda and Onur Mutlu. Memory performance attacks: Denial of memory service in multi-core systems. In Proc. of the 16th USENIX Security Symposium (SS), Boston, MA, 2007.

[53] Ripal Nathuji, Aman Kansal, and Alireza Ghaffarkhah. Q-clouds: Managing performance interference effects for QoS-aware clouds. In Proceedings of EuroSys, Paris, France, 2010.

[54] Simon Ostermann, Alexandru Iosup, Nezih Yigitbasi, Radu Prodan, Thomas Fahringer, and Dick Epema. A performance analysis of EC2 cloud computing services for scientific computing. In Lecture Notes on Cloud Computing, Volume 34, p. 115-131, 2010.

[55] Tao Peng, Christopher Leckie, and Kotagiri Ramamohanarao. Survey of network-based defense mechanisms countering the DoS and DDoS problems. ACM Comput. Surv., 39(1), April 2007.

[56] Diego Perez-Botero, Jakub Szefer, and Ruby B. Lee. Characterizing hypervisor vulnerabilities in cloud computing servers. In Proceedings of the 2013 International Workshop on Security in Cloud Computing, SCC@ASIACCS, Hangzhou, China, 2013.

[57] Moinuddin K. Qureshi and Yale N. Patt. Utility-based cache partitioning: A low-overhead, high-performance, runtime mechanism to partition shared caches. In Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 39), 2006.

[58] Himanshu Raj, Ripal Nathuji, Abhishek Singh, and Paul England. Resource management for isolation enhanced cloud services. In Proc. of the ACM Workshop on Cloud Computing Security (CCSW), Chicago, IL, 2009.

[59] Suhail Rehman and Majd Sakr. Initial findings for provisioning variation in cloud computing. In Proceedings of CloudCom, Indianapolis, IN, 2010.

[60] Thomas Ristenpart, Eran Tromer, Hovav Shacham, and Stefan Savage. Hey, you, get off of my cloud: Exploring information leakage in third-party compute clouds. In Proc. of the ACM Conference on Computer and Communications Security (CCS), Chicago, IL, 2009.

[61] Daniel Sanchez and Christos Kozyrakis. Vantage: Scalable and Efficient Fine-Grain Cache Partitioning. In Proceedings of the 38th Annual International Symposium on Computer Architecture (ISCA-38), San Jose, CA, June 2011.

[62] Jorg Schad, Jens Dittrich, and Jorge-Arnulfo Quiane-Ruiz. Runtime measurements in the cloud: Observing, analyzing, and reducing variance. Proceedings of the VLDB Endowment, 3(1-2):460–471, September 2010.

[63] Malte Schwarzkopf, Andy Konwinski, Michael Abd-El-Malek, and John Wilkes. Omega: flexible, scalable schedulers for large compute clusters. In Proceedings of EuroSys, Prague, Czech Republic, 2013.

[64] Alan Shieh, Srikanth Kandula, Albert Greenberg, and Changhoon Kim. Seawall: Performance isolation for cloud datacenter networks. In Proc. of the USENIX Conference on Hot Topics in Cloud Computing (HotCloud), Boston, MA, 2010.

[65] David Shue, Michael J. Freedman, and Anees Shaikh. Performance isolation and fairness for multi-tenant cloud storage. In Proc. of the 10th USENIX Conference on Operating Systems Design and Implementation (OSDI), Hollywood, CA, 2012.

[66] Dan Tsafrir, Yoav Etsion, and Dror G. Feitelson. Secretly monopolizing the CPU without superuser privileges. In Proc. of the 16th USENIX Security Symposium, Boston, MA, 2007.

[67] Venkatanathan Varadarajan, Thawan Kooburat, Benjamin Farley, Thomas Ristenpart, and Michael Swift. Resource-freeing attacks: Improve your cloud performance (at your neighbor's expense). In Proc. of the Conference on Computer and Communications Security (CCS), Raleigh, NC, 2012.

[68] Venkatanathan Varadarajan, Thomas Ristenpart, and Michael Swift. Scheduler-based defenses against cross-VM side-channels. In Proc. of the 23rd USENIX Security Symposium, San Diego, CA, 2014.

[69] Venkatanathan Varadarajan, Yinqian Zhang, Thomas Ristenpart, and Michael Swift. A placement vulnerability study in multi-tenant public clouds. In Proc. of the 24th USENIX Security Symposium (USENIX Security), Washington, DC, 2015.

[70] Huaibin Wang, Haiyun Zhou, and Chundong Wang. Virtual machine-based intrusion detection system framework in cloud computing environment. In Journal of Computers, October 2012.

[71] Hui Wang, Canturk Isci, Lavanya Subramanian, Jongmoo Choi, Depei Qian, and Onur Mutlu. A-DRM: Architecture-aware distributed resource management of virtualized clusters. In Proceedings of the 11th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments (VEE), Istanbul, Turkey, 2015.

[72] Ian H. Witten, Eibe Frank, and Geoffrey Holmes. Data Mining: Practical Machine Learning Tools and Techniques, 3rd Edition.

[73] Zhenyu Wu, Zhang Xu, and Haining Wang. Whispers in the hyper-space: High-speed covert channel attacks in the cloud. In Proc. of the 21st USENIX Security Symposium, Bellevue, WA, 2012.

[74] Yunjing Xu, Michael Bailey, Farnam Jahanian, Kaustubh Joshi, Matti Hiltunen, and Richard Schlichting. An exploration of L2 cache covert channels in virtualized environments. In Proc. of the 3rd ACM Workshop on Cloud Computing Security (CCSW), Chicago, IL, 2011.

[75] Zhang Xu, Haining Wang, and Zhenyu Wu. A measurement study on co-residence threat inside the cloud. In Proc. of the 24th USENIX Security Symposium (USENIX Security), Washington, DC, 2015.

[76] Yuval Yarom and Katrina Falkner. Flush+Reload: A high resolution, low noise, L3 cache side-channel attack. In Proc. of the 23rd USENIX Security Symposium, San Diego, CA, 2014.

[77] Matei Zaharia, Mosharaf Chowdhury, Tathagata Das, Ankur Dave, Justin Ma, Murphy McCauly, Michael J. Franklin, Scott Shenker, and Ion Stoica. Resilient distributed datasets: A fault-tolerant abstraction for in-memory cluster computing. In Proceedings of NSDI, San Jose, CA, 2012.

[78] Yinqian Zhang, Ari Juels, Alina Oprea, and Michael K. Reiter. HomeAlone: Co-residency detection in the cloud via side-channel analysis. In Proc. of the IEEE Symposium on Security and Privacy, Oakland, CA, 2011.

[79] Yinqian Zhang, Ari Juels, Michael K. Reiter, and Thomas Ristenpart. Cross-tenant side-channel attacks in PaaS clouds. In Proc. of the ACM SIGSAC Conference on Computer and Communications Security (CCS), Scottsdale, AZ, 2014.

[80] Yinqian Zhang, Ari Juels, Michael K. Reiter, and Thomas Ristenpart. Cross-VM side channels and their use to extract private keys. In Proceedings of the ACM Conference on Computer and Communications Security (CCS), Raleigh, NC, 2012.

[81] Yinqian Zhang and Michael K. Reiter. Düppel: retrofitting commodity operating systems to mitigate cache side channels in the cloud. In Proc. of the ACM Conference on Computer and Communications Security (CCS), Berlin, Germany, 2013.

[82] Fangfei Zhou, Manish Goel, Peter Desnoyers, and Ravi Sundaram. Scheduler vulnerabilities and coordinated attacks in cloud computing. J. Comput. Secur., 21(4):533–559, July 2013.

[83] Jieming Zhu, Pinjia He, Zibin Zheng, and Michael R. Lyu. Towards online, accurate, and scalable QoS prediction for runtime service adaptation. In Proc. of the IEEE International Conference on Distributed Computing Systems (ICDCS), Madrid, Spain, 2014.

[84] Sergey Zhuravlev, Sergey Blagodurov, and Alexandra Fedorova. Addressing shared resource contention in multicore processors via scheduling. In Proc. of the Fifteenth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), Pittsburgh, PA, 2010.


[Figure 1 depicts two servers, each hosting a victim VM (N vCPUs) and an adversarial VM (4 vCPUs). The adversary measures a sparse pressure vector [ci 0 cj 0 0 0 ... ck 0] in ~5 sec, reconstructs the full pressure matrix of M applications by N resources via SVD + SGD (~50 msec), and ranks known applications by Pearson similarity (~30 msec), e.g., App 3: 80%, App 2: 45%, App 4: 21%.]

Figure 1: Overview of the application detection process. Bolt first measures the pressure co-residents place in shared resources and then uses data mining to determine the type and characteristics of co-scheduled applications.

nerabilities of common hypervisors, and Wang et al. [70] proposed a system for intrusion detection in cloud settings, while Liu et al. [43] and Varadarajan et al. [68] designed scheduler-based defenses against covert- and side-channel attacks in the memory bus. The latter system controls the overlapping execution of different VMs and injects noise in the memory bus to prevent an adversary from extracting confidential information. Bolt does not rely on accurate microarchitectural event measurements through performance counters to detect application placement, and is therefore resilient to techniques that limit the fidelity of time-keeping and performance monitoring to thwart information leakage in side-channel attacks [49].

3. Bolt

3.1 Threat Model

Bolt targets IaaS providers that operate public clouds for mutually untrusting users. Multiple VMs can be co-scheduled on the same server. Each VM has no control over where it is placed, and no a priori information on other VMs on the same physical host. For now, we assume that the cloud provider is neutral with respect to detection by adversarial VMs, i.e., it does not assist such attacks, or employ additional resource isolation techniques beyond what is available by default to hinder attacks by adversarial users. In Section 6 we explore how additional isolation techniques affect Bolt's detection accuracy.

Adversarial VM: An adversarial VM has the goal of determining the nature and characteristics of any applications co-scheduled on the same physical host, and negatively impacting their performance. Adversarial VMs start with zero knowledge of co-scheduled workloads.

Friendly VM: This is a normal VM scheduled on a physical host that runs one or more applications. Friendly VMs do not attempt to determine the existence and characteristics of other co-scheduled VMs. They also do not employ any schemes to prevent detection, such as memory pattern obfuscation [27].

3.2 Application Detection

Detection relies on inferring workload characteristics from the contention the adversary experiences in shared resources.

For simplicity, we assume one co-scheduled victim job for now, and generalize in Section 3.3. Figure 1 shows an overview of the system's operation.

Bolt instantiates an adversarial VM on the same physical host as the victim (friendly) VM. Bolt uses its VM to run a few microbenchmarks of tunable intensity that each put progressively more pressure on a specific shared resource. The resources stressed by the microbenchmarks include on-chip resources, such as functional units and different levels of the cache hierarchy, and off-chip resources, such as the memory, network, and storage subsystems [15]. Each microbenchmark progressively increases its intensity from 0 to 100% until it detects pressure from the co-scheduled workload, i.e., until the microbenchmark's performance is worse than its expected value when running in isolation. The intensity of the microbenchmark at that point captures the pressure the victim incurs in shared resource i, and is denoted c_i, where i ∈ [1,N], N=10, and c_i ∈ [0,100]. The 10 resources are: L1 instruction and L1 data cache, L2 and last level cache, memory capacity and memory bandwidth, CPU, network bandwidth, disk capacity, and disk bandwidth. Large values of c_i imply high pressure in resource i. For unconstrained resources, e.g., last level cache, 100% pressure means that the benchmark takes over the entire resource capacity. For resources constrained through partitioning mechanisms, e.g., memory capacity in VMs, 100% corresponds to taking over the entire memory partition allocated to the VM.

Bolt uses 2-3 microbenchmarks for profiling, requiring approximately 2-5 seconds in total. It randomly selects one core and one uncore benchmark to get a more representative snapshot of the co-resident's resource profile. If the core is not shared between the adversarial and friendly VMs, Bolt measures zero pressure in the shared core resource. It then adds a third microbenchmark for an additional uncore resource (shared caches, memory, network, or storage).
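The ramp-until-degradation loop above can be sketched as follows. This is a minimal illustration, not Bolt's actual code: `run_benchmark`, the mock latency model, and the identity mapping from the detection intensity to c_i are all assumptions, since the tunable microbenchmarks themselves are not shown in the paper excerpt.

```python
def measure_pressure(run_benchmark, isolated_baseline, tolerance=0.05, step=5):
    """Increase a microbenchmark's intensity from 0 to 100% until its
    performance is worse than the expected value in isolation; the
    intensity at that point is reported as the pressure c_i the
    co-resident places on resource i (identity mapping assumed here)."""
    for intensity in range(0, 101, step):
        latency = run_benchmark(intensity)            # run one short phase
        if latency > isolated_baseline * (1 + tolerance):
            return intensity                          # contention detected
    return 0  # no degradation at any intensity: no measurable pressure

# Hypothetical stand-in for an LLC microbenchmark: contention from the
# co-resident becomes visible once the benchmark's intensity exceeds 40%.
def mock_llc_benchmark(intensity):
    return 1.2 if intensity > 40 else 1.0             # latency, arbitrary units

c_llc = measure_pressure(mock_llc_benchmark, isolated_baseline=1.0)
```

Running two or three such probes (one core, one or two uncore) yields the sparse pressure vector fed to the data mining stage.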

We use this sparse signal to determine the application's pressure in all other resources, its type, and characteristics. Online data mining techniques have recently been shown to effectively solve similar problems in cluster management, by finding similarities between new, unknown applications and previously-seen workloads [19]. The Quasar cluster manager, for example, used collaborative filtering to find similarities in terms of heterogeneity, interference sensitivity, and provisioning. The advantage of collaborative filtering is that it can identify application similarities without a priori knowledge of application types and critical features. Unfortunately, this means that it is also unable to label the victim workloads and classify their functionality, making it insufficient for the adversary's purposes.

Practical data mining: Bolt instead feeds the profiling signal to a hybrid recommender using feature augmentation [9, 25] that determines the type and resource characteristics of victim workloads. The recommender combines collaborative filtering and content-based similarity detection [9, 29]. The former has good scaling properties, relaxed sparsity constraints, and offers conceptual insight on similarities, while the latter exploits contextual information for accurate resource profile matching.

First, a collaborative filtering system recovers the pressure the victim places in non-profiled resources [16, 83]. The system relies on matrix factorization with singular value decomposition (SVD) [72] and PQ-reconstruction with stochastic gradient descent (SGD) [6, 38] to find similarities between the new victim application and previously-seen workloads. SVD produces three matrices: U, Σ, and V. The singular values σ_i in Σ correspond to similarity concepts, such as the intensity of compute operations, or the correlation between high network and disk traffic. Similarity concepts are in the order of the number of examined resources, and are ordered by decreasing magnitude in Σ. Large singular values reveal stronger similarity concepts (higher confidence in workload correlation), while smaller values correspond to weaker similarity concepts, and are typically discarded during the dimensionality reduction by SVD. Matrix U (m×r) captures the correlation between each application and similarity concept, and V (n×r) the correlation between each resource and similarity concept.
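A minimal sketch of the PQ-reconstruction step with SGD follows. The rank r=2, learning rate, regularization, and the small pressure matrix are illustrative assumptions, not Bolt's configuration; the point is only that fitting a low-rank factorization to the observed entries fills in the victim's non-profiled resources.

```python
import numpy as np

def pq_reconstruct(M, mask, r=2, lr=0.05, reg=0.02, epochs=3000, seed=0):
    """Rank-r factorization M ~ P Q^T fit by SGD over observed entries
    only; returns a dense estimate, including non-profiled resources."""
    rng = np.random.default_rng(seed)
    m, n = M.shape
    P = 0.1 * rng.standard_normal((m, r))   # app-to-concept factors
    Q = 0.1 * rng.standard_normal((n, r))   # resource-to-concept factors
    observed = list(zip(*np.where(mask)))
    for _ in range(epochs):
        for i, j in observed:
            err = M[i, j] - P[i] @ Q[j]
            P[i], Q[j] = (P[i] + lr * (err * Q[j] - reg * P[i]),
                          Q[j] + lr * (err * P[i] - reg * Q[j]))
    return P @ Q.T

# Pressures (as fractions of 100%) for three known apps and a victim
# whose last two resources were not profiled (unobserved in the mask).
M = np.array([[0.80, 0.60, 0.20, 0.10],
              [0.78, 0.62, 0.22, 0.12],
              [0.10, 0.15, 0.70, 0.80],
              [0.79, 0.61, 0.00, 0.00]])
mask = np.ones_like(M, dtype=bool)
mask[3, 2:] = False                         # victim's non-profiled resources
estimate = pq_reconstruct(M, mask)          # row 3 now has all four entries
```

Because the victim's observed entries track the first two applications, the reconstruction places its missing pressures near theirs.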

Once we have identified critical similarity concepts and discarded inconsequential information, a content-based system uses weighted Pearson correlation coefficients to compute the similarity between the resource profile u_j of a new application j and the applications the system has previously seen. Weights correspond to the values of the r most critical similarity concepts,¹ and resource profiles to the rows of the matrix of left singular vectors U. Using traditional Pearson correlation would discard the application-specific information that certain resources are more critical for a given workload. Similarity between applications A and B is given by:

$$\text{Weighted Pearson}(A,B) = \frac{\text{cov}(u_A, u_B, \sigma)}{\sqrt{\text{cov}(u_A, u_A, \sigma)\cdot\text{cov}(u_B, u_B, \sigma)}} \qquad (1)$$

where $u_A$ is the correlation of application $A$ with each similarity concept $\sigma_i$, $\text{cov}(u_A, u_B, \sigma) = \frac{\sum_i \sigma_i (u_{A,i} - m(u_A,\sigma))(u_{B,i} - m(u_B,\sigma))}{\sum_i \sigma_i}$

¹ We keep the $r$ largest singular values, such that we preserve 90% of the total energy: $\sum_{i=1}^{r}\sigma_i^2 = 0.9 \sum_{i=1}^{n}\sigma_i^2$.

[Figure 2 heatmaps: probability (0.0-1.0) of the co-scheduled application being memcached as a function of measured pressure (%) in resource pairs: L1-i cache vs. last level cache, L1-d cache vs. CPU, memory capacity vs. memory bandwidth, disk capacity vs. network bandwidth, and disk bandwidth vs. L2 cache.]

Figure 2: We show the correlation between the pressure in various resources and the probability of a co-scheduled application being memcached. For example, workloads with high LLC pressure and very high L1-i pressure have a high probability of being memcached.

the covariance of $A$ and $B$ under weights $\sigma$, and $m(u_A,\sigma) = \frac{\sum_i \sigma_i \cdot u_{A,i}}{\sum_i \sigma_i}$ the weighted mean for $A$. The output of the hybrid recommender is a distribution of similarity scores of how closely a victim resembles different previously-seen applications. For example, a victim may be 65% similar to a memcached workload, 18% similar to a Spark job running PageRank, 10% similar to a Hadoop job running an SVM classifier, and 3% to a Hadoop job running k-means. The 95th percentile of the recommender's end-to-end latency is 80 msec. Apart from application labels, this analysis yields information on the resources the victim is sensitive to, enabling several practical performance attacks (Section 5).

System insights from data mining: Before dimensionality reduction, each similarity concept corresponds to a shared resource. Different resources convey different amounts of information about a workload, and thus have different value to Bolt for detection. The magnitude of each similarity concept reflects how strongly it captures application similarities. For the controlled experiment of Section 3.4, which involves batch and interactive jobs, the resources with most value are the LLC and L1-i caches, followed by compute intensity and memory bandwidth. While the exact order depends on the application mix, correlating similarity concepts to resources shows that certain resources are more prone to leaking information about a workload, and their isolation should be prioritized.
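The weighted statistics of Eq. 1 can be sketched directly. The concept vectors and weights below are made-up examples (a victim, a previously-seen memcached job, and a previously-seen Spark job), not values from the paper.

```python
import numpy as np

def weighted_mean(u, sigma):
    # m(u, sigma) = sum_i sigma_i * u_i / sum_i sigma_i
    return np.sum(sigma * u) / np.sum(sigma)

def weighted_cov(ua, ub, sigma):
    # cov(ua, ub, sigma) as defined after Eq. 1
    da = ua - weighted_mean(ua, sigma)
    db = ub - weighted_mean(ub, sigma)
    return np.sum(sigma * da * db) / np.sum(sigma)

def weighted_pearson(ua, ub, sigma):
    # Eq. 1: similarity under the r retained singular values as weights
    return weighted_cov(ua, ub, sigma) / np.sqrt(
        weighted_cov(ua, ua, sigma) * weighted_cov(ub, ub, sigma))

sigma = np.array([3.0, 1.5, 0.4])      # r = 3 retained singular values
u_victim = np.array([0.9, 0.1, 0.3])   # victim's row of U
u_memc   = np.array([0.8, 0.2, 0.3])   # previously-seen memcached job
u_spark  = np.array([0.1, 0.9, 0.2])   # previously-seen Spark job

s_memc = weighted_pearson(u_victim, u_memc, sigma)    # near 1
s_spark = weighted_pearson(u_victim, u_spark, sigma)  # much lower
```

Normalizing the scores across all previously-seen applications yields the similarity distribution described above.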

Finally, we show how resource pressure correlates to the probability that an application is of a specific type. Figure 2 shows the probability that an unknown workload is a memcached instance with a read-mostly load and KB-range values, as a function of its measured resource pressure (details on methodology in Section 3.4). We decouple the 10 resources in 2D plots for clarity. From the heatmaps, it becomes clear that cache activity is a very strong indicator of workload type, with applications with very high L1-i and high LLC pressure corresponding to memcached with a high probability. Disk traffic also conveys a lot of information, with zero disk usage signaling a memcached workload with very high likelihood. Similar graphs can be created to fingerprint other application types. The high heat areas around the red regions of each graph correspond to memcached jobs with different rd:wr ratios and value sizes, and memory-bound workloads like Spark.

3.3 Challenges

Multiple co-residents: When a single application shares a host with Bolt, detection is straightforward. However, cloud operators colocate VMs on the same physical host to maximize the infrastructure's utility. Disentangling the contention caused by several jobs is challenging. By default, Bolt uses two benchmarks - one core and one uncore - to measure resource pressure. If the recommender cannot determine the type of co-scheduled workload based on them (all Pearson correlation coefficients below 0.1), either the application type has not been seen before, or the resource pressure is the result of multiple co-residents. If at least one of the co-residents shares a core with Bolt, i.e., the core benchmark returns non-zero pressure, we profile with an additional core benchmark and use it to determine the type of co-runner. Because hyperthreads are not shared between multiple active instances, this allows accurately measuring pressure in core resources. The remainder of the uncore interference is used to determine the other co-scheduled workloads. This assumes a linear relationship between jobs for the resources of memory bandwidth and I/O bandwidth, which can introduce some inaccuracies, but in practice does not significantly impact Bolt's detection ability (see Section 3.4).
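The linear-decomposition assumption for uncore pressure amounts to a subtraction; the sketch below uses illustrative resource names and values to show how the residual signal, attributed to the remaining co-residents, would be computed.

```python
def residual_pressure(measured, identified):
    """Subtract the known uncore profile of an already-identified
    co-runner from the measured pressure. The residual is fed back to
    the recommender to identify the remaining co-residents. Assumes
    pressures add linearly (stated for memory and I/O bandwidth)."""
    return {res: max(0.0, measured[res] - identified.get(res, 0.0))
            for res in measured}

measured   = {"LLC": 80.0, "MemBW": 70.0, "NetBW": 30.0}
identified = {"LLC": 50.0, "MemBW": 60.0}   # profile of the detected co-runner
remaining  = residual_pressure(measured, identified)
```

The clamp at zero reflects that contention cannot be negative; where the linearity assumption fails, the residual absorbs the error, which is one source of the inaccuracies noted above.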

Finally, there are occurrences where none of the co-scheduled applications share a core with Bolt. In this case, the system must rely solely on uncore resource pressure to detect applications. To disentangle the resource characteristics across co-residents, Bolt employs a shutter profiling mode, which involves frequent, brief profiling phases (10-50 msec each) in uncore resources. The goal of this mode is to capture at least one of the co-scheduled workloads during a low-pressure phase, which would reveal the resource usage of only one of the co-residents, much like a high speed camera shutter attempts to capture a fast-moving object still. Figure 3 shows how shutter profiling works. When the shutter is open (upper left figure), and thus profiling is off, Bolt cannot differentiate between the two victim VMs. When the shutter is closed during profiling, Bolt checks for changes in uncore resource pressure. If both victims are at high load, differentiating between them is difficult (lower left); similarly when both VMs are at low load (lower right). However, when only VM2 is at low load, the pressure Bolt captures is primarily from VM1, allowing the system to detect its type, and, based on the remaining pressure, also detect VM2. This technique is particularly effective for user-interactive applications that go through intermittent phases of low load; it is less effective for mixes of services with constant steady-state load, such as long-running analytics or logging services. We will

Figure 3: Shutter profiling mode.

consider whether additional input signals, such as per-job cache miss rate curves, can improve detection accuracy for the latter workloads.

Application phases: Datacenter applications are notorious for going through multiple phases during their execution. Online services in particular follow diurnal patterns, with high load during the day and low load in the night [3, 50, 44]. Moreover, an application may not be immediately detected, especially during its initialization phase. Cloud users may purchase instances to run a service, and then maintain them to execute other applications, e.g., analytics over different datasets. In these cases, the results of Bolt's detection can become obsolete over time. We address this by periodically repeating the profiling and detection steps for co-scheduled workloads. Each iteration takes 2-5 seconds for profiling and a few milliseconds for the recommender's analysis. Section 3.4 shows a sensitivity study on how frequently detection needs to happen.
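The shutter intuition above can be sketched as follows; the pressure samples and the drop threshold are invented for illustration. Windows whose total uncore pressure falls well below the running level are candidate moments when one co-resident is idle, so the residual pressure can be attributed to the other.

```python
import statistics

def shutter_candidates(samples, drop=0.25):
    """samples: total uncore pressure observed in consecutive 10-50 msec
    profiling windows. Returns indices of windows whose pressure falls
    at least `drop` (fraction) below the median -- candidate shutters
    where only one co-resident is active."""
    med = statistics.median(samples)
    return [i for i, s in enumerate(samples) if s < med * (1 - drop)]

# Two co-residents; one briefly idles during windows 2 and 5.
windows = [82, 80, 45, 81, 79, 40, 83]
idle_windows = shutter_candidates(windows)
```

For steady-state co-residents no window ever drops below the threshold, matching the paper's observation that shutter profiling is less effective for such mixes.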

3.4 Detection Accuracy

We first evaluate Bolt's detection accuracy in a controlled environment where all applications are known. We use a 40-machine cluster with 8-core (2-way hyperthreaded) Xeon-class servers, and schedule a total of 108 workloads, including batch analytics in Hadoop [46] and Spark [77], and latency-critical services, such as webservers, memcached [26], and Cassandra [10]. For each application type there are several different workloads with respect to algorithms, framework versions, datasets, and input load patterns. Friendly applications are scheduled using a least-loaded (LL) scheduler that allocates resources on the machines with the most available compute, memory, and storage. All workloads are provisioned for peak requirements to reduce interference. Even so, interference between co-residents exists, since the scheduler does not account for the sensitivity applications have to contention. At the end of this section, we evaluate how a scheduler that accounts for cross-application interference affects detection accuracy. The training set consists of 120 diverse applications that include webservers, various analytics algorithms and datasets, and several key-value stores and databases. The training set is selected to provide sufficient coverage of the space of resource characteristics. Figure 4 shows the pressure jobs in the training set put in compute, memory, storage, and network bandwidth.

[Figure 4 scatter plots: CPU pressure (%) vs. memory pressure (%), and network pressure (%) vs. storage pressure (%), for the applications in the training set.]

Figure 4: Coverage of resource characteristics for applications in the training set.

Applications     Detection accuracy (%)
                 Least Loaded scheduler   Quasar scheduler
Aggregate                 87                    89
memcached                 78                    80
Hadoop                    92                    92
Spark                     85                    86
Cassandra                 90                    89
speccpu2006               84                    85

Table 1: Bolt's detection accuracy in the controlled experiment with the least loaded scheduler and Quasar.

While there are clusters of applications that saturate several resources, the selected workloads cover the majority of the resource usage space. This enables Bolt to match any resource profile to information from the training set, even if the new application has not been previously seen. Increasing the training set size further did not improve detection accuracy. Finally, note that there is no overlap between training and testing sets in terms of algorithms, datasets, and input loads.

In each server we instantiate an adversarial VM running Ubuntu 14.04. By default, the VM has 4 vCPUs (2 physical cores) to generate enough contention (Fig. 10 shows a sensitivity analysis on the VM size). The remainder of each machine is allocated to one or more friendly VMs. The maximum number of VMs, 5 in our setting, depends on the available servers. Friendly VMs run Ubuntu 14.04 or Debian 8.0. Adversarial VMs have no a priori information on the number and type of their co-residents. Applications are allowed to share a physical core, but must be on different hyperthreads (vCPUs). While sharing a single hyperthread is common in private deployments hosting batch jobs in small containers [7], it is uncommon in public clouds, where 1 vCPU is the minimum guaranteed size of dedicated resources for non-idling instances.

Table 1 shows Bolt's detection accuracy per application class, and in aggregate. We signal a detection as correct if Bolt correctly identifies the framework (e.g., Hadoop) or service (e.g., memcached) the application uses, and the algorithm, e.g., SVM on Hadoop, or user load characteristics, e.g., read- vs. write-heavy load. We do not currently examine more detailed features, such as the distribution of individual query types. Bolt correctly identifies 87% of jobs, and for certain application classes, like databases and analytics, the accuracy

[Figure 5 star charts over the ten resources (CPU, L1-i, L1-d, L2, LLC, MemCap, MemBW, NetBW, DiskCap, DiskBW): Hadoop word count (small dataset), Hadoop recommender (large dataset), and a new, unknown app, with similarity 0.29 to the former and 0.78 to the latter.]

Figure 5: Star charts showing the resource profiles of two Hadoop jobs and one unknown job. The closer the shaded area is to a vertex, the higher the pressure in the specific resource.

[Figure 6 bar charts: detection accuracy (%) vs. number of co-scheduled applications (1-5), and detection accuracy (%) per dominant resource (L1-i, L1-d, L2, LLC, CPU, MemC, MemBw, Net, DiskBw, DiskC), annotated with the number of applications having each resource as dominant (12, 5, 5, 19, 18, 17, 15, 11, 4, 2).]

Figure 6: Detection accuracy as a function of the number of co-runners (left) and the apps' dominant resources (right).

exceeds 90%. Misclassified jobs are typically identified as workloads with the same or similar critical resources.

Per-application profiles: Note that although in Table 1 we group applications by programming framework or online service, each framework does not correspond to a single resource profile. Profiles of applications within a framework can vary greatly, depending on functionality, complexity, and dataset features. Bolt's recommender system matches resource profiles to specific algorithmic classes and dataset characteristics within each framework. Figure 5 shows an example where two Hadoop jobs, one running word count on a small dataset and one running a recommender system on a very large dataset, exhibit very different resource profiles. While the third application is also a Hadoop job, it is identified as very similar to the recommender, as opposed to word count.

Number of co-residents: The number of victim applications affects detection accuracy. Figure 6a shows accuracy as a function of the number of victims per machine. When the number of co-residents is less than 2, accuracy exceeds 95%. As the number increases, accuracy drops, since it becomes progressively more difficult to differentiate between the contention caused by each workload. When there are 5 co-scheduled applications, accuracy is 67%. Interestingly, accuracy is higher for 4 than for 3 applications, since with 4 workloads the probability of sharing a core, and thus getting an accurate measurement of core pressure, is higher. While aggressive multi-tenancy affects accuracy, large numbers of co-residents in public clouds are unlikely in practice [41].

Dominant resource: We now examine the correlation between the resource an application puts the most pressure on, and Bolt's ability to correctly detect it (Figure 6). The numbers on each bar show how many applications have each resource as dominant. In general, the applications that are most easily detected are ones with high instruction cache (L1-i), memory bandwidth, network bandwidth, and disk capacity pressure. These are workloads that fall in one of the following categories: latency-critical services with large codebases, such as webservers; in-memory analytics, like Spark; and disk-bound analytics, like Hadoop. Interestingly, L2 activity is a poor indicator of application type, in contrast to L1 and LLC, since it does not capture a significant change in the working set size (from 32KB to 256KB for our platforms).

Number of iterations: In this controlled experiment, we stop the detection process upon correct identification of a workload. For several jobs, this requires more than one iteration of profiling and data mining. Figure 7a shows the fraction of workloads that were correctly identified after N iterations. 71% of victims only require a single iteration, while an additional 15% required a second. A small fraction of applications needed more than two iterations, while jobs that were not identified correctly by the sixth iteration did not benefit from additional iterations. The number of applications per machine also affects the iterations required for detection (Figure 7b). When a single job is scheduled on a machine, one iteration is sufficient for almost all workloads. As the number of victims per machine increases, additional iterations are needed to disentangle the contention signals of each co-scheduled job.

Figure 8 shows a case of detection over several iterations. The victim is a 4-vCPU instance executing different consecutive jobs. Detection occurs every 20 sec by default (see Fig. 10 for a sensitivity study on profiling frequency). After the initial iteration, Bolt detects the job as mcf from SPECCPU2006. In the next two iterations, the interference profile of the victim continues to fit mcf's characteristics. At t=60 sec, Bolt detects that the job profile has changed, and now resembles an SVM classifier running on Mahout [46]. Similarly, at t=180 sec the victim changes again, to a Spark data mining workload. Changes are typically captured within a few seconds.

Resource pressure: Figure 9 shows the correlation between the pressure of a victim in a given resource and Bolt's ability to correctly detect it. We plot detection accuracy for three core and three uncore resources; the results are similar for the remaining four. In almost all cases, very low or very high pressure carries the most value for detection. For disk bandwidth, accuracy remains high, except for the 20-50% region, which contains many application classes with moderate disk activity, including analytics in Hadoop, databases, and graph applications. Resource pressure also affects the number of iterations Bolt needs to correctly identify a workload. For resources with a lot of value for detection, such as

[Figure 7 bar charts: PDF (%) of iterations (1-6) until detection, in total and broken down by the number of co-residents (1-5 apps).]

Figure 7: PDFs of iterations required for detection in total (left) and as a function of the number of co-residents (right).

[Figure 8 timeline: resource pressure (%) over 7 minutes across the ten resources (L1-i, L1-d, L2, LLC, CPU, memCap, memBw, NetBw, DiskCap, DiskBw), as the victim instance switches between SPEC, Hadoop, Spark, memcached, and Cassandra jobs.]

Figure 8: Example of workload phase detection by Bolt.

the L1-i and LLCs, when pressure is moderate, several iterations are needed for accuracy to increase. In contrast, for off-chip resources, like network bandwidth, this effect is less pronounced. In general, when moderate pressure hurts detection accuracy (Figure 9), more iterations are needed for correct identification.

Scheduler: So far we have used a least loaded scheduler, which is commonly-used in datacenters [33, 63]. This scheduler does not account for resource contention between applications, leading to suboptimal performance [16, 24, 5, 84, 82]. Recent work has shown that if interference is accounted for at the scheduler level, both performance and utilization improve [48, 19, 53]. We now evaluate the impact of such a scheduler on Bolt's detection accuracy. Quasar [19] leverages machine learning to quickly determine which applications can be co-scheduled on the same machine without destructive interference. We use Quasar to schedule all victim applications, and then inject Bolt in each physical host for detection. In a real setting, Bolt can trick an interference-aware scheduler by incurring very low levels of interference initially, until it is co-scheduled with a victim application and can determine its type and characteristics.

Table 1 shows Bolt's accuracy with the least loaded scheduler (LL) and Quasar. Interestingly, accuracy increases slightly with Quasar, 2% on average, but in general the impact of the scheduler is small. There are two reasons for the increase: first, both LL and Quasar do not share a single hyperthread between jobs, so core pressure measurements remain accurate. Second, because Quasar only co-schedules jobs with different critical resources, it provides Bolt with a less "noisy" interference signal for uncore resources, making distinguishing co-residents easier. Therefore, reducing interference in software alone is not sufficient to mitigate these security threats.

[Figure 9 plots: detection accuracy (%) vs. resource pressure (%) for L1-i, LLC, CPU, MemCap, NetBW, and DiskBW.]

Figure 9: Detection accuracy as a function of the pressure victims place in various shared resources.

[Figure 10 plots: accuracy (%) vs. (a) profiling interval (0-300 sec), (b) adversarial VM size (0-30 vCPUs), and (c) number of profiling benchmarks (0-10).]

Figure 10: Sensitivity to (a) profiling frequency, (b) adversarial VM size, and (c) profiling benchmarks.

Sensitivity analysis: Finally, we examine how design decisions in Bolt affect detection. Figure 10a shows how accuracy changes as profiling frequency decreases. For profiling intervals beyond 30 sec, accuracy drops rapidly. If profiling only occurs every 5 minutes, almost half the victims are incorrectly identified. Note that if workloads are long-running and mostly-stable, profiling can be less frequent without such an impact on accuracy. Figure 10b shows accuracy as a function of the adversarial VM's size. We examine sizes offered as on-demand instances by EC2 [1]. If the adversary has fewer than 4 vCPUs, its resources are insufficient to create enough contention to capture the co-residents' pressure. Accuracy continues to grow for larger sizes; however, the larger the adversarial instance is, the less likely it is to be co-scheduled with other VMs, nullifying the value of Bolt. Finally, Figure 10c shows accuracy as a function of the number of microbenchmarks used for profiling. A single benchmark is not sufficient to fingerprint the characteristics of a workload; however, using more than 3 benchmarks has diminishing returns in accuracy. Unless otherwise specified, we use 20 sec profiling intervals, 4 vCPU adversarial VMs, and 2 benchmarks for initial profiling.

3.5 Limitations

While Bolt can accurately detect most frequently-run workloads, it has certain limitations. First, when no job shares a core with the adversary, the system assumes a linear relation between the co-residents' resource pressure in uncore resources. While this is true in some cases, it may not be generally accurate. We will explore alternative ways of distinguishing co-residents in future work. Second, the system cannot differentiate between multiple jobs running in a single instance and multiple instances running one job each. While this does not affect detection, if a covert-channel attack is aimed at a particular user, more accurate information would increase the adversary's leverage. Similarly, Bolt cannot currently distinguish between two copies of the same job sharing a platform at low load, and one copy of the job that runs at higher load.

4. Cloud User Study

We now evaluate Bolt in a real-world environment, with multiple users and unknown applications. We conducted a multi-user study on Amazon EC2, subject to the following rules:

• We have engaged 20 independent users from two academic institutions (Cornell University and Stanford University), provided them with access to a shared cluster of 200 c3.8xlarge instances on EC2, with 32 vCPUs and 60GB of memory each, and asked them to submit several applications of their choosing.

• On each instance, we maintain a 4-vCPU VM for Bolt. All other resources are made available to the users as VMs.

• Each user submits one or more applications, and all users have equal priority, i.e., no application will be preempted to accommodate the job of another user.

• Users were instructed to behave amenably, i.e., to avoid consuming disproportionate resources, so that the system can observe a representative job sample from all users.

• Users were also advised to pin their applications to specific cores to eliminate interference from the OS scheduler. Physical cores can be shared across jobs; however, each vCPU (hardware thread) is dedicated to a single application.

• Users can optionally select the instance(s) they want to launch their applications on. We provide a spreadsheet with a list of all available instances and their utilization at each point in time. This information, as well as any a priori information on the type, number, and characteristics of submitted applications, is concealed from Bolt.

• If a user does not select an instance, a least-loaded scheduler selects the VM with the most available cores and memory.

Once users start launching jobs in the cluster, we instantiate Bolt on each machine, and activate it periodically to detect the type, characteristics, and number of co-scheduled applications. The entire experiment lasts approximately 4 hours, beyond which point all instances are terminated. Figure 11 shows the PDF of application types the different users launched, provided by the users after the completion of the experiment. Different colors correspond to different users. A total of 436 jobs were submitted to the cluster. The application mix includes analytics jobs in Hadoop and Spark, scientific benchmarks (Parsec, Bioparallel), hardware synthesis tools (Cadence, Vivado HLS), multicore (zsim) and n-body simulations, email clients, web browsers, web servers (http), music and video streaming services, and databases, among others. Each job may take one or more vCPUs. Even though

[Figure 11 histogram: occurrences of each of 53 application labels, colored per user (Users 1-20). Label legend: 1: hadoop, 2: spark, 3: email, 4: browser, 5: cadence, 6: zsim, 7: video, 8: latex, 9: ML Python, 10: make, 11: mem$d, 12: http server, 13: spec, 14: matlab, 15: mysql, 16: vivado, 17: parsec, 18: vim, 19: scala, 20: php, 21: postgres, 22: musicStream, 23: minebench, 24: n-body sim, 25: ppt, 26: OS img, 27: pdfview, 28: scons, 29: du -h, 30: cr/del cgroup, 31: bioparallel, 32: storm, 33: cpu burn, 34: audacity, 35: javascript, 36: create VMs, 37: html, 38: cassandra, 39: mongoDB, 40: mkdir, 41: cp/mv, 42: sirius, 43: oProfile, 44: dwnld LF, 45: rsync, 46: ping, 47: photoshop, 48: ssh, 49: rm, 50: skype, 51: zipkin, 52: graphX, 53: ix.]

Figure 11: Probability distribution function of applications launched in the cloud study, per user.

[Figure 12 plots: occurrences per application label where Bolt correctly identifies (a) the application name and (b) its resource characteristics; (c) number of active jobs across the 200 instances over the 4-hour experiment.]

Figure 12: Detection accuracy of (a) application labels and (b) resource characteristics. (c) Number of jobs per instance.

Figure 11 groups applications by framework, the individual logic and datasets differ between jobs in the same framework. Note that we have not updated the training set for this study; it consists of the same 120 applications described in Section 3.4.

Figure 12(a) shows the number of applications that were correctly identified by name across the different categories. In total, 277 jobs are correctly labeled. As expected, Bolt could not assign a label to application types it has not seen before, such as email clients or image editing applications. However, even when Bolt cannot label an application, it can still identify the resources it is sensitive to, as shown in Figure 12(b). For 385 out of 436 jobs, Bolt correctly identifies the job's resource characteristics. For performance attacks, such as the ones described in Section 5, this information is sufficient to dramatically degrade the performance of a victim application. Finally, Figure 12(c) shows the number of co-scheduled applications per instance throughout the duration of the experiment. The bottom 14 instances remained unused; for the remaining 186 instances, the number of active applications ranges from 1 to 6, with each job taking one or more vCPUs. The majority of misclassified applications correspond to instances with 5 or more concurrently active jobs.

5. Security Attacks
Bolt makes several cloud attacks practical and difficult to detect. We discuss three possible attacks that leverage the information obtained through detection.

5.1 Internal Denial of Service Attack
Attack setting: Denial of service attacks hurt the performance of a victim service by overloading its resources [14, 2, 51]. In cloud settings specifically, they can be categorized in two types: external and internal (or host-based) attacks. External attacks are the most conventional form of DoS [51, 13, 22]. These attacks utilize external servers to direct excessive traffic to the victims, flooding their resources and hurting their availability. External DoS attacks mostly affect PaaS and SaaS systems, and include IP spoofing, synchronization (SYN) flooding, smurf, buffer overflow, land, and teardrop attacks.

In contrast, internal DoS attacks take advantage of IaaS and PaaS cloud multi-tenancy to launch adversarial programs on the same host as the victim and degrade its performance [14, 60, 34, 52]. For example, Ristenpart et al. [60] showed how an adversarial user can leverage the IP naming conventions of IaaS clouds to locate a victim VM and degrade its performance. Cloud providers are starting to build defenses against such attacks. For example, EC2 offers autoscaling that will increase the instances of a service under heavy resource usage. Similarly, there is related work on mechanisms that detect saturation in memory bandwidth


Figure 13: Latency and utilization with Bolt and a naïve DoS attack that saturates CPU resources.

and can trigger VM migration [71]. This means that DoS attacks that simply overload a physical host are ineffective when such defenses are in place. We focus on host-based DoS attacks, and make them resilient against such defenses by avoiding resource saturation. We leverage the information obtained by Bolt to construct custom interference programs that stress the resources the victim is most sensitive to, without overloading the host.

Bolt creates custom contentious workloads by combining the microbenchmarks used to measure the resource pressure of the victim. Since these microbenchmarks are tunable, Bolt configures their intensity to a higher point than their measured pressure ci during detection. For example, if a victim is detected as memcached and its most critical resources are the L1-i cache (81% pressure) and the last level cache (LLC) (78% pressure), Bolt uses the two microbenchmarks for L1-i and LLC at a higher intensity than what memcached can tolerate.
Impact: We use the DoS attack setting above against the 108 applications of the controlled experiment (Section 3.4). The DoS attack results in performance degraded by 2.2x on average, and by up to 9.8x, in terms of execution time. Degradation is much more pronounced for interactive workloads like key-value stores, with tail latency increasing by 8-140x, a dramatic impact for applications with strict tail latency SLAs.
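The intensity-tuning step above can be sketched as follows. This is our own minimal illustration, not Bolt's implementation: the resource names and the 10% intensity margin are assumptions.

```python
# Sketch (not Bolt's actual code) of assembling a custom DoS profile:
# rank per-resource pressure scores measured during detection, keep the
# victim's most critical resources, and run each matching microbenchmark
# slightly above the pressure the victim was measured to exert.

def build_attack_profile(pressures, top_k=2, margin=0.10):
    """pressures maps resource name -> measured pressure c_i in [0, 1].
    Returns the intensity to configure for each chosen microbenchmark."""
    ranked = sorted(pressures.items(), key=lambda kv: kv[1], reverse=True)
    return {res: min(1.0, c * (1.0 + margin)) for res, c in ranked[:top_k]}

# Values from the text: memcached is most sensitive to the L1-i cache (0.81)
# and the LLC (0.78); the other two scores are made-up fillers.
profile = build_attack_profile({"l1i": 0.81, "llc": 0.78,
                                "dram_bw": 0.35, "net_bw": 0.20})
print(profile)
```

Capping intensity at 1.0 keeps the interference below outright saturation, which matters for evading the migration defenses discussed next.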

We now examine the impact of the DoS attack on utilization. If it translates to resource saturation, there is a high probability that the DoS will be detected and the victim migrated to a new machine. The experimental cluster supports live migration; if CPU utilization (sampled every 1 sec) exceeds 70%, the victim is migrated to an unloaded host. Figure 13 compares the tail latency and CPU utilization with Bolt to that of a naïve DoS that saturates the CPU through a compute-intensive kernel. We focus on a single victim VM running memcached. The overhead of migration (time between initiating migration and latency returning to normal levels) for this VM is 8 sec. Performance degradation is similar for both systems until t=80 sec, at which point the victim is migrated to a new host due to the compute-intensive kernel causing utilization to exceed 70%. While performance continues to degrade during migration, once the VM resumes in the new server latency returns to nominal levels. In contrast, Bolt keeps utilization low and continues to impact the victim's performance beyond t=80 sec.
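The migration defense modeled in this experiment reduces to a simple threshold rule, which can be sketched as follows (the sampling interval and 70% threshold come from the text; everything else is illustrative):

```python
# Minimal model of the utilization-triggered migration defense: CPU
# utilization is sampled once per second, and the victim is live-migrated
# the first time a sample exceeds the threshold.

def migration_trigger(util_samples, threshold=70.0):
    """Return the index (second) at which migration is triggered, or None
    if utilization never exceeds the threshold."""
    for t, u in enumerate(util_samples):
        if u > threshold:
            return t
    return None

naive_dos = [40, 62, 85, 93]   # compute-intensive kernel saturates the CPU
bolt_dos  = [40, 48, 52, 50]   # Bolt deliberately stays below the trigger
print(migration_trigger(naive_dos), migration_trigger(bolt_dos))
```

This is why avoiding saturation matters: the naïve attack trips the trigger and loses its victim, while Bolt's profile never does.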

5.2 Resource Freeing Attack
Attack setting: The goal of a resource freeing attack (RFA) is to modify the workload of a victim such that it frees up resources for the adversary, improving its performance [67, 66]. The adversarial VM consists of two components, a beneficiary and a helper. The beneficiary is the program whose performance the attacker wants to improve, and the helper is the program that forces the victim to yield its resources. The RFA works by adding load in the victim's critical resources, causing other resources to free up. For example, if the victim is a memory-bound Spark job, introducing additional memory traffic will result in Spark stalling in memory and lessening its pressure in the other resources, until it can reclaim its required memory bandwidth. When launched in a public cloud, the victim of an RFA ends up paying more and achieving worse performance compared to running in isolation.

We now create a proof-of-concept RFA for a webserver, a network-bound Hadoop job, and a memory-bound Spark job. Launching RFAs requires in-depth knowledge of the victim's resource requirements [67], which Bolt provides by identifying the victim's dominant resource. Once the dominant resource is detected, the runtime applies a custom helper program that saturates this critical resource. For the webserver, the helper is a CPU-intensive benchmark launching CGI requests, causing the victim to saturate its CPU usage, servicing fewer real user requests and freeing up cache resources. For the Hadoop job, we use a network-intensive benchmark similar to iperf, which saturates network bandwidth and frees up CPU and memory resources for the beneficiary. Similarly, for Spark we use a streaming memory benchmark that slows down k-means and frees network and compute resources for the adversary. The selection of the beneficiary is of lesser importance; without loss of generality we select mcf, a CPU-intensive benchmark from SPECCPU2006. The same methodology can be used with other beneficiaries, provided that their critical resource does not overlap with the victim's.
Impact: Table 2 shows the performance degradation for the three victim applications and the improvement in execution time for the beneficiary. The webserver suffers the most in terms of queries per second, as the helper's CGI requests pollute its cache, preventing it from servicing legitimate user requests. Hadoop and Spark experience significant degradations in execution time due to network and memory bandwidth saturation, respectively. Execution time for mcf improves by 16-38%, benefiting from the victim stalling in its critical resource to improve its cache and CPU usage.
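The helper-selection logic above can be sketched as a small lookup plus an overlap check. The benchmark names are hypothetical placeholders for the helpers described in the text, not Bolt's actual components:

```python
# Hypothetical mapping from a victim's detected dominant resource to a
# helper microbenchmark, mirroring the three proof-of-concept RFAs above.
HELPERS = {
    "cpu":     "cgi_request_storm",   # webserver victim
    "net_bw":  "iperf_like_stream",   # network-bound Hadoop victim
    "dram_bw": "streaming_memory",    # memory-bound Spark victim
}

def plan_rfa(victim_dominant_resource, beneficiary_critical_resource):
    """Pick the helper that saturates the victim's dominant resource, and
    reject beneficiaries whose critical resource overlaps with it (the
    condition stated in the text for choosing a beneficiary)."""
    if victim_dominant_resource == beneficiary_critical_resource:
        raise ValueError("beneficiary would contend with its own helper")
    return HELPERS[victim_dominant_resource]

print(plan_rfa("dram_bw", "cpu"))   # Spark victim, mcf (CPU-bound) beneficiary
```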

5.3 VM Co-residency Detection
Attack setting: Sections 3.4 and 4 showed that we can identify the applications sharing a cloud infrastructure. However, a malicious user is rarely interested in a random service running on a public cloud. More often they target a specific

Victim App          Victim Perf    Beneficiary App   Beneficiary Perf   Target Resource
Apache Webserver    -64% (QPS)     mcf               +24%               CPU
Hadoop (SVM)        -36% (Exec)    mcf               +16%               Network BW
Spark (k-means)     -52% (Exec)    mcf               +38%               Memory BW

Table 2: RFA impact on the victims and beneficiaries.

workload, e.g., a competitive e-commerce site. Therefore they need to pinpoint where the target resides in a practical manner. This requires a launch strategy and a mechanism for co-residency detection [69]. The attack is practical if the target is located with high confidence, in reasonable time, and with modest resource costs. Bolt enables this attack to be carried out in a practical way, and remains resilient against defense mechanisms such as virtual private clouds (VPCs), which make internal IP addresses private to a single tenant. Once a target VM is located, the adversary can launch RFA or DoS attacks as previously described. Co-residency detection attacks rely on leakage of logical information, e.g., IP address, or on observing the impact of a side-channel attack due to resource contention. Bolt relies on a variation of the latter approach.

Assume a system of N servers. A target victim user launches k VMs. The adversary launches n malicious VMs. Instances are launched simultaneously to avoid malicious-malicious instance co-residency [69]. The probability that at least one adversarial VM will be co-scheduled with a victim instance is P(f) = 1 - (1 - k/N)^n. The adversary then uses a two-process detection scheme. The sender is a process that creates contention to degrade the victim's performance. The receiver is the process that detects this degradation, running either on the same host as the victim (cooperative detection) or externally (uncooperative detection). We assume an uncooperative victim, which is the more general case. Bolt works as the sender to determine the type of co-residents on each sampled host and identify any of the desired type. Assume m VMs of the desired type have been detected in the sample set. Bolt now needs to prune down this space to VM(s) from the specific target victim. It launches an external receiver aiming at the victim over a public channel, e.g., HTTP or a key-value protocol, to ping the victim service. This is the only point of communication with the victim. At the same time, the sender incurs contention in the resources the victim is most sensitive to. If the sender and victim are co-residents, the receiver's requests will be slower due to contention. Bolt quickly prunes down the sampled VMs by determining the type of co-residents. In a large cluster this reduces the time to pinpoint a victim and the resource costs for the adversary. It also increases the confidence in the detection in the presence of noise from other co-residents.
Impact: We evaluate co-residency detection on the same 40-node cluster. The victim is a SQL server instantiating a single VM. The cluster also hosts 7 other VMs running SQL. Other active VMs host key-value stores, Hadoop, and Spark jobs. Bolt launches 10 senders simultaneously on

randomly-selected machines (P(f) = ) and detects 3 VMs running SQL in the sample set. It then introduces memory interference as the receiver initiates a large number of SQL queries. While these queries have 8.16 msec mean latency without interference, latency now becomes 26.14 msec. Given a ~3x latency increase, we can conclude that one of the co-residents is the victim SQL server. Detection required 6 sec from instantiation to receiver detection, and 11 adversary VMs.
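Two pieces of this attack can be sketched directly from the text: the co-residency probability and the receiver's decision rule. The uniform-placement assumption behind the formula and the 2x decision threshold are our simplifications (the text reports a ~3x observed increase):

```python
def p_coresident(N, k, n):
    """P(f) = 1 - (1 - k/N)^n: probability that at least one of n
    adversarial VMs shares a host with one of the victim's k VMs,
    assuming uniform random placement over N servers."""
    return 1.0 - (1.0 - k / N) ** n

def likely_coresident(base_latency_ms, contended_latency_ms, factor=2.0):
    """Uncooperative detection rule: if the external receiver's request
    latency rises by at least `factor` while the sender generates
    contention, conclude the sender shares a host with the victim."""
    return contended_latency_ms / base_latency_ms >= factor

# One victim VM in a 40-server cluster, probed with 10 adversarial senders:
print(round(p_coresident(40, 1, 10), 3))
# The ~3.2x latency jump reported in the experiment clears the threshold:
print(likely_coresident(8.16, 26.14))
```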

6. Improving Security via Resource Isolation
Since interference is at the heart of Bolt's detection methodology, isolation mechanisms that attenuate contention should reduce its accuracy. We first evaluate to what extent existing isolation techniques mitigate security vulnerabilities, and then highlight trade-offs between security, performance, and utilization. We use the same setting as in the controlled experiment of Section 3.4. We evaluate 4 resource-specific isolation techniques and 3 settings for OS-level isolation mechanisms: baremetal, containerized, and virtualized. The baremetal system does not employ any OS-level isolation. The containerized setup uses Linux containers (lxc) and assigns cores to applications via cpuset cgroups. Both containers and VMs constrain memory capacity. Baremetal experiments do not enforce memory capacity allocations, and the Linux scheduler is free to float applications across cores.

The first resource-specific isolation technique is thread pinning to physical cores, to constrain interference from scheduling actions like context switching. The number of cores an application is allocated can change dynamically, and is limited by how fast Linux can migrate tasks, typically in the tens of milliseconds.
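As a minimal illustration of pinning, Linux exposes a per-process affinity interface; cpuset cgroups, as used in the containerized setup, enforce the same kind of CPU placement at the container level. This sketch uses Python's standard-library wrapper around that syscall (Linux-only):

```python
import os

def pin_process(pid, cpus):
    """Pin all threads of `pid` to the given logical CPU IDs via Linux's
    sched_setaffinity interface; pid=0 means the calling process.
    Returns the resulting affinity set for verification."""
    os.sched_setaffinity(pid, set(cpus))
    return os.sched_getaffinity(pid)

# Restrict the current process to CPU 0 (assumes CPU 0 exists and is usable).
print(pin_process(0, {0}))
```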

For network isolation we use the outbound network bandwidth partitioning capabilities of Linux's traffic control. Specifically, we use the qdisc [8] scheduler with the hierarchical token bucket (HTB) queueing discipline to enforce egress bandwidth limits. The limits are set to the maximum traffic burst rate for each application to avoid contention (ceil parameter in the HTB interface). Ingress network bandwidth isolation has been extensively studied in previous work [36]; these approaches can be applied here as well.
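The HTB configuration described above corresponds to a pair of `tc` invocations per application class. The sketch below only builds the command strings (nothing is executed); the device name, handle, and class IDs are illustrative, while the rate/ceil flags follow standard tc syntax:

```python
def htb_egress_commands(dev, classid, ceil_mbit):
    """Return the tc commands that cap one class's egress bandwidth with
    the HTB queueing discipline, with ceil set to the application's
    maximum burst rate as described in the text."""
    return [
        f"tc qdisc add dev {dev} root handle 1: htb",
        f"tc class add dev {dev} parent 1: classid 1:{classid} "
        f"htb rate {ceil_mbit}mbit ceil {ceil_mbit}mbit",
    ]

for cmd in htb_egress_commands("eth0", 10, 500):
    print(cmd)
```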

For DRAM bandwidth isolation there is no commercially available partitioning mechanism. To enforce bandwidth isolation we use the following approach: we monitor the DRAM bandwidth usage of each application through performance counters [45], and modify the scheduler to only colocate jobs on machines that can accommodate their aggregate peak memory bandwidth requirements. This requires extensive application knowledge, and is used simply to highlight the benefits of DRAM bandwidth isolation.
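The scheduler-side admission check this describes is a one-line capacity test. In this sketch the bandwidth figures are assumed values, standing in for the counter-derived peak measurements:

```python
def can_colocate(resident_peak_bw, new_job_peak_bw, machine_bw_capacity):
    """Admit a job onto a machine only if the aggregate peak DRAM bandwidth
    of all co-residents stays within the machine's capacity (all values in
    the same unit, e.g., GB/s)."""
    return sum(resident_peak_bw) + new_job_peak_bw <= machine_bw_capacity

# e.g., a machine with an assumed 60 GB/s of DRAM bandwidth:
print(can_colocate([20, 25], 10, 60))   # fits
print(can_colocate([20, 25], 20, 60))   # would oversubscribe DRAM bandwidth
```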

Finally, for last level cache (LLC) isolation, we use the Cache Allocation Technology (CAT) available in recent Intel chips [11]. CAT partitions the LLC in ways, which in a highly-associative cache allows for non-overlapping partitions at the granularity of a few percent of the LLC capacity. Each co-resident is allocated one partition, configured to its current capacity requirements [57]. Partitions can be resized at runtime by reprogramming specific low-level registers (MSRs); changes take effect after a few milliseconds.

Figure 14: Detection accuracy with isolation techniques (mechanisms added cumulatively, for baremetal, Linux containers, and virtual machines).
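Way-based partitioning of this kind reduces to building contiguous, non-overlapping bitmasks, one per co-resident, of the form written to CAT's per-class-of-service mask MSRs. The 20-way LLC below is an assumed configuration, and this sketch omits the MSR writes themselves:

```python
def cat_way_masks(ways_per_tenant, total_ways=20):
    """Build contiguous, non-overlapping way bitmasks for way-partitioned
    LLC allocation; each co-resident gets a partition sized to its current
    capacity needs. Mask contiguity is a CAT requirement."""
    if sum(ways_per_tenant) > total_ways:
        raise ValueError("requested more ways than the LLC has")
    masks, base = [], 0
    for w in ways_per_tenant:
        masks.append(((1 << w) - 1) << base)
        base += w
    return masks

# Three co-residents sized to 4, 6, and 10 ways of a 20-way LLC:
print([bin(m) for m in cat_way_masks([4, 6, 10])])
```

Resizing a partition amounts to recomputing and rewriting these masks, which is why changes take effect within milliseconds.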

We add one isolation mechanism at a time in the three system settings (baremetal, containers, VMs). Figure 14 shows the impact of isolation techniques on Bolt's detection accuracy. As expected, when no isolation is used, baremetal allows for a significantly higher detection accuracy than container- and VM-based systems, mostly due to the latter constraining core and memory capacity usage. As a result, introducing thread pinning benefits baremetal the most, since it reduces core contention. It also benefits container- and VM-based setups to a lesser degree, by eliminating unpredictability introduced by the Linux scheduler (e.g., context switching) [40]. The dominant resource of each application determines which isolation technique benefits it the most. Thread pinning mostly benefits workloads bound by on-chip resources, such as L1/L2 caches and cores. Adding network bandwidth partitioning lowers detection accuracy for all three settings almost equally. It primarily benefits network-bound workloads, for which network interference conveys the most information for detection. Memory bandwidth isolation further reduces accuracy by 10% on average, benefiting jobs dominated by DRAM traffic. Finally, cache partitioning has the most dramatic reduction in accuracy, especially for LLC-bound workloads. We attribute this to the importance cache pressure has as a detection signal. The number of co-residents also affects the extent to which isolation helps: the more co-scheduled applications exist per machine, the more isolation techniques degrade accuracy, as they make distinguishing between co-residents harder.

Unfortunately, even when all the techniques are on, accuracy is still 50%, due to two reasons: current techniques are not fine-grain enough to allow strict and scalable isolation, and core resources (L1/L2 caches, CPU) are prone to interference due to contending hyperthreads [61]. To evaluate the latter hypothesis, we modify the scheduler such that hyperthreads of different jobs are never scheduled on the same physical core; e.g., if a job needs 7 vCPUs, it is allocated 4 dedicated physical cores. The grey bars of Figure 14 show the detection accuracy. Baremetal instances still allow certain applications to be detected, but for containerized and virtualized settings accuracy drops to 14%. The remaining accuracy corresponds to disk-heavy workloads. Improving security, however, comes at a performance penalty of 34% on average in execution time, as threads of the same job are forced to contend with one another. Alternatively, users can overprovision their resource reservations to avoid degradation, which results in a 45% drop in utilization. This means that the cloud provider cannot leverage CPU idleness to share machines, decreasing the cost benefits of cloud computing. Note that enforcing core isolation alone is also not sufficient, as it allows a detection accuracy of 46%.
Discussion: The previous analysis highlights a design problem with current datacenter platforms. Traditional multicores are prone to contention, which will only worsen as more cores are integrated in each server and multi-tenancy becomes more pronounced. Existing isolation techniques are insufficient to mitigate security vulnerabilities, and techniques that provide reasonable security guarantees sacrifice performance or cost efficiency through low utilization. This highlights the need for new fine-grain and coordinated isolation techniques that guarantee security at high utilization for shared resources.
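The hyperthread-exclusion policy evaluated above (threads of different jobs never share a physical core) rounds each job's vCPU count up to whole cores. A minimal sketch, assuming 2-way SMT as in the 7-vCPU example from the text:

```python
import math

def dedicated_cores(vcpus, smt_width=2):
    """Number of physical cores a job must be given when hyperthreads of
    different jobs may not share a core: vCPUs rounded up to whole
    smt_width-way cores (e.g., 7 vCPUs -> 4 dedicated cores)."""
    return math.ceil(vcpus / smt_width)

print(dedicated_cores(7))
```

The rounding slack (an odd vCPU count strands one hyperthread) is one source of the utilization loss discussed above.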

7. Conclusions
We have presented Bolt, a practical system that uses online data mining techniques to identify the type and characteristics of applications running on shared cloud systems, and enables practical attacks that degrade their performance. In a 40-server cluster, Bolt correctly identifies 87 out of 108 workloads, and degrades tail latency by up to 140x. We also used Bolt in a user study on EC2 to detect the characteristics of unknown jobs submitted by multiple users. Finally, we showed that while existing isolation techniques are helpful, they are not sufficient to mitigate such attacks. Bolt reveals real, easy-to-exploit threats in public clouds. We hope that this work will motivate cloud providers and computer scientists to develop and deploy stricter isolation primitives in cloud systems.

Acknowledgements
We sincerely thank all the participants in the user study for their time and effort. We also thank Daniel Sanchez, Ed Suh, Grant Ayers, Mingyu Gao, Ana Klimovic, and the rest of the MAST group, as well as the anonymous reviewers, for their feedback on earlier versions of this manuscript. This work was supported by the Stanford Platform Lab, NSF grant CNS-1422088, and a John and Norma Balen Sesquicentennial Faculty Fellowship.

References
[1] Amazon EC2. http://aws.amazon.com/ec2.

[2] Aman Bakshi and Yogesh B Dujodwala Securing cloudfrom ddos attacks using intrusion detection system in virtualmachine In Proc of the Second International Conference onCommunication Software and Networks (ICCSN) 2010

[3] Luiz Barroso and Urs Hoelzle The Datacenter as a Com-puter An Introduction to the Design of Warehouse-Scale Ma-chines MC Publishers 2009

[4] Naomi Benger, Joop van de Pol, Nigel P. Smart, and Yuval Yarom. "Ooh aah... just a little bit": A small amount of side channel can go a long way. In Proc. of the International Cryptographic Hardware and Embedded Systems Workshop (CHES), Busan, South Korea, 2014.

[5] Major Bhadauria and Sally A McKee An approach toresource-aware co-scheduling for cmps In Proc of the 24thACM International Conference on Supercomputing (ICS)Tsukuba Japan 2010

[6] Leon Bottou Large-scale machine learning with stochas-tic gradient descent In Proceedings of the InternationalConference on Computational Statistics (COMPSTAT) ParisFrance 2010

[7] Eric Brewer. Kubernetes: The path to cloud native. http://goo.gl/QgkzYB, SOCC Keynote, August 2015.

[8] Martin A. Brown. Traffic control howto. http://linux-ip.net/articles/Traffic-Control-HOWTO.

[9] Robin Burke. Hybrid recommender systems: Survey and experiments. User Modeling and User-Adapted Interaction, 12(4):331-370, November 2002.

[10] Apache Cassandra. http://cassandra.apache.org.

[11] Intel 64 and IA-32 Architecture Software Developer's Manual, vol. 3B: System Programming Guide, Part 2, September 2014.

[12] Ludmila Cherkasova, Diwaker Gupta, and Amin Vahdat. Comparison of the three cpu schedulers in Xen. SIGMETRICS Perform. Eval. Rev., 35(2):42-51, September 2007.

[13] Scott A Crosby and Dan S Wallach Denial of service viaalgorithmic complexity attacks In Proceedings of the 12thConference on USENIX Security Symposium WashingtonDC 2003

[14] Marwan Darwish Abdelkader Ouda and Luiz FernandoCapretz Cloud-based ddos attacks and defenses In Procof i-Society Toronto ON 2013

[15] Christina Delimitrou and Christos Kozyrakis iBench Quan-tifying Interference for Datacenter Workloads In Proceed-ings of the 2013 IEEE International Symposium on WorkloadCharacterization (IISWC) Portland OR September 2013

[16] Christina Delimitrou and Christos Kozyrakis Paragon QoS-Aware Scheduling for Heterogeneous Datacenters In Pro-ceedings of the Eighteenth International Conference on Ar-chitectural Support for Programming Languages and Oper-ating Systems (ASPLOS) Houston TX USA 2013

[17] Christina Delimitrou and Christos Kozyrakis QoS-AwareScheduling in Heterogeneous Datacenters with Paragon InACM Transactions on Computer Systems (TOCS) Vol 31Issue 4 December 2013

[18] Christina Delimitrou and Christos Kozyrakis Quality-of-Service-Aware Scheduling in Heterogeneous Datacenterswith Paragon In IEEE Micro Special Issue on Top Picks fromthe Computer Architecture Conferences MayJune 2014

[19] Christina Delimitrou and Christos Kozyrakis Quasar Resource-Efficient and QoS-Aware Cluster Management In Proceed-ings of the Nineteenth International Conference on Architec-tural Support for Programming Languages and OperatingSystems (ASPLOS) Salt Lake City UT USA 2014

[20] Christina Delimitrou and Christos Kozyrakis HCloudResource-Efficient Provisioning in Shared Cloud SystemsIn Proceedings of the Twenty First International Conferenceon Architectural Support for Programming Languages andOperating Systems (ASPLOS) April 2016

[21] Christina Delimitrou Daniel Sanchez and Christos KozyrakisTarcil Reconciling Scheduling Speed and Quality in LargeShared Clusters In Proceedings of the Sixth ACM Symposiumon Cloud Computing (SOCC) August 2015

[22] Jake Edge. Denial of service via hash collisions. http://lwn.net/Articles/474912/, January 2012.

[23] Benjamin Farley Ari Juels Venkatanathan Varadarajan ThomasRistenpart Kevin D Bowers and Michael M Swift More foryour money Exploiting performance heterogeneity in publicclouds In Proc of the ACM Symposium on Cloud Computing(SOCC) San Jose CA 2012

[24] Alexandra Fedorova Margo Seltzer and Michael D SmithImproving performance isolation on chip multiprocessors viaan operating system scheduler In Proceedings of the 16thIntl Conference on Parallel Architecture and CompilationTechniques (PACT) Brasov Romania 2007

[25] Alexander Felfernig and Robin Burke Constraint-based rec-ommender systems Technologies and research issues InProceedings of the ACM International Conference on Elec-tronic Commerce (ICEC) Innsbruck Austria 2008

[26] Brad Fitzpatrick Distributed caching with memcached InLinux Journal Volume 2004 Issue 124 2004

[27] Oded Goldreich and Rafail Ostrovsky. Software protection and simulation on oblivious RAMs. J. ACM, 43(3):431-473, May 1996.

[28] Google Container Engine. https://cloud.google.com/container-engine.

[29] Asela Gunawardana and Christopher Meek A unified ap-proach to building hybrid recommender systems In Proc ofthe Third ACM Conference on Recommender Systems (Rec-Sys) New York NY 2009

[30] Sanchika Gupta and Padam Kumar Vm profile based opti-mized network attack pattern detection scheme for ddos at-tacks in cloud In Proc of SSCC Mysore India 2013

[31] Yi Han Tansu Alpcan Jeffrey Chan and Christopher LeckieSecurity games for virtual machine allocation in cloud com-puting In 4th International Conference on Decision andGame Theory for Security Fort Worth TX 2013

[32] Amir Herzberg Haya Shulman Johanna Ullrich and EdgarWeippl Cloudoscopy Services discovery and topology map-ping In Proceedings of the ACM Workshop on Cloud Com-puting Security Workshop (CCSW) Berlin Germany 2013

[33] Ben Hindman Andy Konwinski Matei Zaharia Ali GhodsiAnthony D Joseph Randy Katz Scott Shenker and Ion Sto-ica Mesos A platform for fine-grained resource sharing inthe data center In Proceedings of NSDI Boston MA 2011

[34] Jingwei Huang David M Nicol and Roy H CampbellDenial-of-service threat to hadoopyarn clusters with multi-tenancy In Proc of the IEEE International Congress on BigData Washington DC 2014

[35] Alexandru Iosup Nezih Yigitbasi and Dick Epema Onthe performance variability of production cloud services InProceedings of CCGRID Newport Beach CA 2011

[36] Vimalkumar Jeyakumar Mohammad Alizadeh David MazieresBalaji Prabhakar Changhoon Kim and Albert GreenbergEyeq Practical network performance isolation at the edge InProc of the 10th USENIX Conference on Networked SystemsDesign and Implementation (NSDI) Lombard IL 2013

[37] Yaakoub El Khamra Hyunjoo Kim Shantenu Jha and Man-ish Parashar Exploring the performance fluctuations of hpcworkloads on clouds In Proceedings of CloudCom Indi-anapolis IN 2010

[38] Krzysztof C Kiwiel Convergence and efficiency of subgra-dient methods for quasiconvex minimization In Mathemat-ical Programming (Series A) (Berlin Heidelberg Springer)90 (1) pp 1-25 2001

[39] Ruby B Lee Rethinking computers for cybersecurity IEEEComputer 48(4)16ndash25 2015

[40] Jacob Leverich and Christos Kozyrakis Reconciling highserver utilization and sub-millisecond quality-of-service InProceedings of EuroSys Amsterdam The Netherlands 2014

[41] Host server CPU utilization in Amazon EC2 cloud. http://goo.gl/2LTx4T.

[42] Fangfei Liu Yuval Yarom Qian Ge Gernot Heiser andRuby B Lee Last-level cache side-channel attacks are prac-tical In Proc of IEEE Symposium on Security and Privacy(SampP) San Jose CA 2015

[43] Fei Liu Lanfang Ren and Hongtao Bai Mitigating cross-vm side channel attack on multiple tenants cloud platform InJournal of Computers Vol 9 No 4 (2014) 1005-1013 April2014

[44] David Lo Liqun Cheng Rama Govindaraju Luiz Andre Bar-roso and Christos Kozyrakis Towards energy proportionalityfor large-scale latency-critical workloads In Proceedings ofthe 41st Annual International Symposium on Computer Ar-chitecuture (ISCA) Minneapolis MN 2014

[45] David Lo Liqun Cheng Rama Govindaraju ParthasarathyRanganathan and Christos Kozyrakis Heracles Improvingresource efficiency at scale In Proc of the 42Nd AnnualInternational Symposium on Computer Architecture (ISCA)Portland OR 2015

[46] Mahout. http://mahout.apache.org.

[47] Dave Mangot. EC2 variability: The numbers revealed. http://tech.mangot.com/roller/dave/entry/ec2_variability_the_numbers_revealed.

[48] Jason Mars and Lingjia Tang. Whare-map: Heterogeneity in "homogeneous" warehouse-scale computers. In Proceedings of ISCA, Tel-Aviv, Israel, 2013.

[49] Robert Martin John Demme and Simha SethumadhavanTimewarp Rethinking timekeeping and performance moni-toring mechanisms to mitigate side-channel attacks In Pro-ceedings of the International Symposium on Computer Archi-tecture (ISCA) Portland OR 2012

[50] David Meisner, Christopher M. Sadler, Luiz Andre Barroso, Wolf-Dietrich Weber, and Thomas F. Wenisch. Power management of online data-intensive services. In Proceedings of the 38th Annual International Symposium on Computer Architecture, pages 319-330, 2011.

[51] Jelena Mirkovic and Peter Reiher A taxonomy of ddos attackand ddos defense mechanisms ACM SIGCOMM ComputerCommunication Review (CCR) April 2004

[52] Thomas Moscibroda and Onur Mutlu Memory performanceattacks Denial of memory service in multi-core systemsIn Proc of 16th USENIX Security Symposium on USENIXSecurity Symposium (SS) Boston MA 2007

[53] Ripal Nathuji, Aman Kansal, and Alireza Ghaffarkhah. Q-clouds: Managing performance interference effects for QoS-aware clouds. In Proceedings of EuroSys, Paris, France, 2010.

[54] Simon Ostermann Alexandru Iosup Nezih Yigitbasi RaduProdan Thomas Fahringer and Dick Epema A performanceanalysis of ec2 cloud computing services for scientific com-puting In Lecture Notes on Cloud Computing Volume 34p115-131 2010

[55] Tao Peng Christopher Leckie and Kotagiri RamamohanaraoSurvey of network-based defense mechanisms countering thedos and ddos problems ACM Comput Surv 39(1) April2007

[56] Diego Perez-Botero Jakub Szefer and Ruby B Lee Char-acterizing hypervisor vulnerabilities in cloud computingservers In Proceedings of the 2013 International Work-shop on Security in Cloud Computing SCCASIACCSHangzhou China 2013

[57] Moinuddin K Qureshi and Yale N Patt Utility-based cachepartitioning A low-overhead high-performance runtimemechanism to partition shared caches In Proceedings ofthe 39th Annual IEEEACM International Symposium on Mi-croarchitecture MICRO 39 2006

[58] Himanshu Raj Ripal Nathuji Abhishek Singh and Paul Eng-land Resource management for isolation enhanced cloud ser-vices In Proc of the ACM Workshop on Cloud ComputingSecurity (CCSW) Chicago IL 2009

[59] Suhail Rehman and Majd Sakr Initial findings for provision-ing variation in cloud computing In Proceedings of Cloud-Com Indianapolis IN 2010

[60] Thomas Ristenpart Eran Tromer Hovav Shacham and Ste-fan Savage Hey you get off of my cloud Exploring infor-mation leakage in third-party compute clouds In Proc of theACM Conference on Computer and Communications Security(CCS) Chicago IL 2009

[61] Daniel Sanchez and Christos Kozyrakis Vantage Scalableand Efficient Fine-Grain Cache Partitioning In Proceedingsof the 38th annual International Symposium in ComputerArchitecture (ISCA-38) San Jose CA June 2011

[62] Jorg Schad, Jens Dittrich, and Jorge-Arnulfo Quiane-Ruiz. Runtime measurements in the cloud: Observing, analyzing, and reducing variance. Proceedings VLDB Endow., 3(1-2):460-471, September 2010.

[63] Malte Schwarzkopf Andy Konwinski Michael Abd-El-Malek and John Wilkes Omega flexible scalable sched-ulers for large compute clusters In Proceedings of EuroSysPrague Czech Republic 2013

[64] Alan Shieh Srikanth Kandula Albert Greenberg and ChanghoonKim Seawall Performance isolation for cloud datacenternetworks In Proc of the USENIX Conference on Hot Topicsin Cloud Computing (HotCloud) Boston MA 2010

[65] David Shue Michael J Freedman and Anees Shaikh Perfor-mance isolation and fairness for multi-tenant cloud storage InProc of the 10th USENIX Conference on Operating SystemsDesign and Implementation (OSDI) Hollywood CA 2012

[66] Dan Tsafrir Yoav Etsion and Dror G Feitelson Secretlymonopolizing the cpu without superuser privileges In Procof 16th USENIX Security Symposium on USENIX SecuritySymposium Boston MA 2007

[67] Venkatanathan Varadarajan Thawan Kooburat BenjaminFarley Thomas Ristenpart and Michael Swift Resource-freeing attacks Improve your cloud performance (at yourneighborrsquos expense) In Proc of the Conference on Computerand Communications Security (CCS) Raleigh 2012

[68] Venkatanathan Varadarajan Thomas Ristenpart and MichaelSwift Scheduler-based defenses against cross-vm side-channels In Proc of the 23rd Usenix Security SymposiumSan Diego CA 2014

[69] Venkatanathan Varadarajan Yinqian Zhang Thomas Risten-part and Michael Swift A placement vulnerability study inmulti-tenant public clouds In Proc of the 24th USENIX Secu-rity Symposium (USENIX Security) Washington DC 2015

[70] Huaibin Wang Haiyun Zhou and Chundong Wang Vir-tual machine-based intrusion detection system framework incloud computing environment In Journal of Computers Oc-tober 2012

[71] Hui Wang Canturk Isci Lavanya Subramanian JongmooChoi Depei Qian and Onur Mutlu A-drm Architecture-aware distributed resource management of virtualized clus-ters In Proceedings of the 11th ACM SIGPLANSIGOPSinternational conference on Virtual Execution Environments(VEE) Istanbul Turkey 2015

[72] Ian H Witten Eibe Frank and Geoffrey Holmes DataMining Practical Machine Learning Tools and Techniques3rd Edition

[73] Zhenyu Wu Zhang Xu and Haining Wang Whispers inthe hyper-space High-speed covert channel attacks in thecloud In Proc of the 21st USENIX Conference on SecuritySymposium (USENIX Security) Bellevue WA 2012

[74] Yunjing Xu Michael Bailey Farnam Jahanian KaustubhJoshi Matti Hiltunen and Richard Schlichting An explo-ration of l2 cache covert channels in virtualized environmentsIn Proc of the 3rd ACM Workshop on Cloud Computing Se-curity Workshop (CCSW) Chicago IL 2011

[75] Zhang Xu Haining Wang and Zhenyu Wu A measurementstudy on co-residence threat inside the cloud In Proc ofthe 24th USENIX Security Symposium (USENIX Security)Washington DC 2015

[76] Yuval Yarom and Katrina Falkner Flush+reload a highresolution low noise l3 cache side-channel attack In Proc ofthe 23rd Usenix Security Symposium San Diego CA 2014

[77] Matei Zaharia Mosharaf Chowdhury Tathagata Das AnkurDave Justin Ma Murphy McCauly Michael J FranklinScott Shenker and Ion Stoica Resilient distributed datasetsA fault-tolerant abstraction for in-memory cluster computingIn Proceedings of NSDI San Jose CA 2012

[78] Yinqian Zhang Ari Juels Alina Oprea and Michael K Re-iter Homealone Co-residency detection in the cloud via side-channel analysis In Proc of the IEEE Symposium on Securityand Privacy Oakland CA 2011

[79] Yinqian Zhang Ari Juels Michael K Reiter and ThomasRistenpart Cross-tenant side-channel attacks in paas cloudsIn Proc of the ACM SIGSAC Conference on Computer andCommunications Security (CCS) Scottsdale AZ 2014

[80] Yinqian Zhang Ari Juels Michael K Reiter and ThomasRistenpart Cross-vm side channels and their use to ex-tract private keys In Proceedings of the ACM Conferenceon Computer and Communications Security (CCS) RaleighNC 2012

[81] Yinqian Zhang and Michael K Reiter Duppel retrofittingcommodity operating systems to mitigate cache side channelsin the cloud In Proc of the ACM Conference on Computerand Communications Security (CCS) Berlin Germany 2013

[82] Fangfei Zhou Manish Goel Peter Desnoyers and Ravi Sun-daram Scheduler vulnerabilities and coordinated attacks incloud computing J Comput Secur 21(4)533ndash559 July2013

[83] Jieming Zhu Pinjia He Zibin Zheng and Michael R LyuTowards online accurate and scalable qos prediction for run-time service adaptation In Proc of the IEEE InternationalConference on Distributed Computing Systems (ICDCS)Madrid Spain 2014

[84] Sergey Zhuravlev Sergey Blagodurov and Alexandra Fe-dorova Addressing shared resource contention in multicoreprocessors via scheduling In Proc of the Fifteenth Editionof on Architectural Support for Programming Languages andOperating Systems (ASPLOS) Pittsburgh PA 2010


ities in terms of heterogeneity, interference sensitivity, and provisioning. The advantage of collaborative filtering is that it can identify application similarities without a priori knowledge of application types and critical features. Unfortunately, this means that it is also unable to label the victim workloads and classify their functionality, making it insufficient for the adversary's purpose.

Practical data mining: Bolt instead feeds the profiling signal to a hybrid recommender using feature augmentation [9, 25] that determines the type and resource characteristics of victim workloads. The recommender combines collaborative filtering and content-based similarity detection [9, 29]. The former has good scaling properties and relaxed sparsity constraints, and offers conceptual insight on similarities, while the latter exploits contextual information for accurate resource profile matching.

First, a collaborative filtering system recovers the pressure the victim places on non-profiled resources [16, 83]. The system relies on matrix factorization with singular value decomposition (SVD) [72] and PQ-reconstruction with stochastic gradient descent (SGD) [6, 38] to find similarities between the new victim application and previously-seen workloads. SVD produces three matrices: U, Σ, and V. The singular values σi in Σ correspond to similarity concepts, such as the intensity of compute operations or the correlation between high network and disk traffic. The number of similarity concepts is on the order of the number of examined resources, and concepts are ordered by decreasing magnitude in Σ. Large singular values reveal stronger similarity concepts (higher confidence in workload correlation), while smaller values correspond to weaker similarity concepts and are typically discarded during the dimensionality reduction by SVD. Matrix U (m×r) captures the correlation between each application and similarity concept, and V (n×r) the correlation between each resource and similarity concept.

Once we have identified critical similarity concepts and discarded inconsequential information, a content-based system uses weighted Pearson correlation coefficients to compute the similarity between the resource profile u_j of a new application j and the applications the system has previously seen. Weights correspond to the values of the r most critical similarity concepts,¹ and resource profiles to the rows of the matrix of left singular vectors U. Using the traditional Pearson correlation would discard the application-specific information that certain resources are more critical for a given workload. The similarity between applications A and B is given by

\[
\text{Weighted Pearson}(A,B) = \frac{\mathrm{cov}(u_A, u_B; \sigma)}{\sqrt{\mathrm{cov}(u_A, u_A; \sigma)\,\mathrm{cov}(u_B, u_B; \sigma)}} \tag{1}
\]

where $u_A$ is the correlation of application $A$ with each similarity concept $\sigma_i$, and
\[
\mathrm{cov}(u_A, u_B; \sigma) = \frac{\sum_i \sigma_i\,\big(u_{A,i} - m(u_A;\sigma)\big)\big(u_{B,i} - m(u_B;\sigma)\big)}{\sum_i \sigma_i}
\]

¹ We keep the $r$ largest singular values such that we preserve 90% of the total energy: $\sum_{i=1}^{r} \sigma_i^2 = 0.9 \sum_{i=1}^{n} \sigma_i^2$.


Figure 2: Correlation between the pressure in various resources and the probability of a co-scheduled application being memcached. For example, workloads with high LLC pressure and very high L1-i pressure have a high probability of being memcached.

the covariance of $A$ and $B$ under weights $\sigma$, and $m(u_A;\sigma) = \sum_i \sigma_i u_{A,i} / \sum_i \sigma_i$ the weighted mean for $A$. The output of the hybrid recommender is a distribution of similarity scores indicating how closely a victim resembles different previously-seen applications. For example, a victim may be 65% similar to a memcached workload, 18% similar to a Spark job running PageRank, 10% similar to a Hadoop job running an SVM classifier, and 3% to a Hadoop job running k-means. The 95th percentile of the recommender's end-to-end latency is 80 msec. Apart from application labels, this analysis yields information on the resources the victim is sensitive to, enabling several practical performance attacks (Section 5).

System insights from data mining: Before dimensionality reduction, each similarity concept corresponds to a shared resource. Different resources convey different amounts of information about a workload, and thus have different value to Bolt for detection. The magnitude of each similarity concept reflects how strongly it captures application similarities. For the controlled experiment of Section 3.4, which involves batch and interactive jobs, the resources with the most value are the LLC and L1-i caches, followed by compute intensity and memory bandwidth. While the exact order depends on the application mix, correlating similarity concepts to resources shows that certain resources are more prone to leaking information about a workload, and their isolation should be prioritized.

Finally, we show how resource pressure correlates with the probability that an application is of a specific type. Figure 2 shows the probability that an unknown workload is a memcached instance with a read-mostly load and KB-range values, as a function of its measured resource pressure (details on methodology in Section 3.4). We decouple the 10 resources in 2D plots for clarity. From the heatmaps it becomes clear that cache activity is a very strong indicator of workload type, with applications with very high L1-i and high LLC pressure corresponding to memcached with high probability. Disk traffic also conveys a lot of information, with zero disk usage signaling a memcached workload with very high likelihood. Similar graphs can be created to fingerprint other application types. The high-heat areas around the red regions of each graph correspond to memcached jobs with different rd:wr ratios and value sizes, and to memory-bound workloads like Spark.

3.3 Challenges

Multiple co-residents: When a single application shares a host with Bolt, detection is straightforward. However, cloud operators colocate VMs on the same physical host to maximize the infrastructure's utility. Disentangling the contention caused by several jobs is challenging. By default, Bolt uses two benchmarks - one core and one uncore - to measure resource pressure. If the recommender cannot determine the type of a co-scheduled workload based on them (all Pearson correlation coefficients below 0.1), either the application type has not been seen before, or the resource pressure is the result of multiple co-residents. If at least one of the co-residents shares a core with Bolt, i.e., the core benchmark returns non-zero pressure, we profile with an additional core benchmark and use it to determine the type of the co-runner. Because hyperthreads are not shared between multiple active instances, this allows accurately measuring pressure in core resources. The remainder of the uncore interference is used to determine the other co-scheduled workloads. This assumes a linear relationship between jobs for resources such as memory bandwidth and I/O bandwidth, which can introduce some inaccuracies, but in practice does not significantly impact Bolt's detection ability (see Section 3.4).
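The linear-decomposition step can be illustrated with a toy example. The pressure vectors below are made up for illustration; in Bolt they come from the core/uncore microbenchmarks:

```python
import numpy as np

# Made-up uncore pressure vectors (fraction of each resource's capacity):
# the total contention measured on the host, and the known profile of a
# co-runner already identified through the core benchmark.
resources = ["LLC", "MemBW", "NetBW", "DiskBW"]
total_pressure = np.array([0.80, 0.70, 0.30, 0.10])
identified_job = np.array([0.55, 0.45, 0.05, 0.05])

# Linearity assumption: co-residents' pressures add up, so the residual
# is attributed to the remaining, still-unknown co-residents and fed
# back into the recommender.
residual = np.clip(total_pressure - identified_job, 0.0, 1.0)
print(dict(zip(resources, residual.round(2))))
```

As the text notes, this additive model is an approximation; contention effects on memory and I/O bandwidth are not strictly linear.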

Finally, there are occurrences where none of the co-scheduled applications shares a core with Bolt. In this case, the system must rely solely on uncore resource pressure to detect applications. To disentangle the resource characteristics across co-residents, Bolt employs a shutter profiling mode, which involves frequent, brief profiling phases (10-50 msec each) in uncore resources. The goal of this mode is to capture at least one of the co-scheduled workloads during a low-pressure phase, which would reveal the resource usage of only one of the co-residents, much like a high-speed camera shutter attempts to capture a fast-moving object still. Figure 3 shows how shutter profiling works. When the shutter is open (upper left figure), and thus profiling is off, Bolt cannot differentiate between the two victim VMs. When the shutter is closed during profiling, Bolt checks for changes in uncore resource pressure. If both victims are at high load, differentiating between them is difficult (lower left); similarly when both VMs are at low load (lower right). However, when only VM2 is at low load, the pressure Bolt captures is primarily from VM1, allowing the system to detect its type and, based on the remaining pressure, also detect VM2. This technique is particularly effective for user-interactive applications that go through intermittent phases of low load; it is less effective for mixes of services with constant steady-state load, such as long-running analytics or logging services. We will


Figure 3: Shutter profiling mode.

consider whether additional input signals, such as per-job cache miss rate curves, can improve detection accuracy for the latter workloads.

Application phases: Datacenter applications are notorious for going through multiple phases during their execution. Online services in particular follow diurnal patterns, with high load during the day and low load at night [3, 50, 44]. Moreover, an application may not be immediately detected, especially during its initialization phase. Cloud users may purchase instances to run a service and then maintain them to execute other applications, e.g., analytics over different datasets. In these cases, the results of Bolt's detection can become obsolete over time. We address this by periodically repeating the profiling and detection steps for co-scheduled workloads. Each iteration takes 2-5 seconds for profiling, and a few milliseconds for the recommender's analysis. Section 3.4 shows a sensitivity study on how frequently detection needs to happen.
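The periodic re-detection loop can be sketched as follows; `profile_co_residents` and `classify` are hypothetical stand-ins for Bolt's microbenchmark profiling and recommender steps:

```python
import time

def detection_loop(profile_co_residents, classify, interval_s=20.0, rounds=3):
    """Repeatedly profile (2-5 s of contention measurements) and classify,
    so detection results stay fresh as victims change phase or are replaced."""
    history = []
    for _ in range(rounds):
        pressure = profile_co_residents()   # profiling: seconds
        history.append(classify(pressure))  # data mining: milliseconds
        time.sleep(interval_s)              # wait until the next interval
    return history

# Toy run with constant fake signals; a real deployment would run
# indefinitely, with interval_s tuned per the sensitivity study.
labels = detection_loop(lambda: {"LLC": 0.8}, lambda p: "memcached",
                        interval_s=0.0, rounds=2)
print(labels)  # -> ['memcached', 'memcached']
```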

3.4 Detection Accuracy

We first evaluate Bolt's detection accuracy in a controlled environment where all applications are known. We use a 40-machine cluster with 8-core (2-way hyperthreaded) Xeon-class servers, and schedule a total of 108 workloads, including batch analytics in Hadoop [46] and Spark [77], and latency-critical services, such as webservers, memcached [26], and Cassandra [10]. For each application type there are several different workloads with respect to algorithms, framework versions, datasets, and input load patterns. Friendly applications are scheduled using a least-loaded (LL) scheduler that allocates resources on the machines with the most available compute, memory, and storage. All workloads are provisioned for peak requirements to reduce interference. Even so, interference between co-residents exists, since the scheduler does not account for the sensitivity applications have to contention. At the end of this section we evaluate how a scheduler that accounts for cross-application interference affects detection accuracy. The training set consists of 120 diverse applications that include webservers, various analytics algorithms and datasets, and several key-value stores and databases. The training set is selected to provide sufficient coverage of the space of resource characteristics. Figure 4 shows the pressure jobs in the training set put on compute, memory, storage, and network bandwidth.


Figure 4: Coverage of resource characteristics for applications in the training set.

Applications   Detection accuracy (%)
               Least Loaded scheduler   Quasar scheduler
Aggregate      87                       89
memcached      78                       80
Hadoop         92                       92
Spark          85                       86
Cassandra      90                       89
speccpu2006    84                       85

Table 1: Bolt's detection accuracy in the controlled experiment with the least loaded scheduler and Quasar.

While there are clusters of applications that saturate several resources, the selected workloads cover the majority of the resource usage space. This enables Bolt to match any resource profile to information from the training set, even if the new application has not been previously seen. Increasing the training set size further did not improve detection accuracy. Finally, note that there is no overlap between training and testing sets in terms of algorithms, datasets, and input loads.

In each server we instantiate an adversarial VM running Ubuntu 14.04. By default, the VM has 4 vCPUs (2 physical cores) to generate enough contention (Fig. 10 shows a sensitivity analysis on the VM size). The remainder of each machine is allocated to one or more friendly VMs. The maximum number of VMs, 5 in our setting, depends on the available servers. Friendly VMs run Ubuntu 14.04 or Debian 8.0. Adversarial VMs have no a priori information on the number and type of their co-residents. Applications are allowed to share a physical core, but must be on different hyperthreads (vCPUs). While sharing a single hyperthread is common in private deployments hosting batch jobs in small containers [7], it is uncommon in public clouds, where 1 vCPU is the minimum guaranteed size of dedicated resources for non-idling instances.

Table 1 shows Bolt's detection accuracy per application class and in aggregate. We signal a detection as correct if Bolt correctly identifies the framework (e.g., Hadoop) or service (e.g., memcached) the application uses, and the algorithm, e.g., SVM on Hadoop, or user load characteristics, e.g., read- vs. write-heavy load. We do not currently examine more detailed features, such as the distribution of individual query types. Bolt correctly identifies 87% of jobs, and for certain application classes, like databases and analytics, the accuracy


Figure 5: Star charts showing the resource profiles of two Hadoop jobs and one unknown job. The closer the shaded area is to a vertex, the higher the pressure on the specific resource. (The unknown job's similarity to the word count and recommender profiles is 0.29 and 0.78, respectively.)


Figure 6: Detection accuracy as a function of the number of co-runners (left) and the apps' dominant resources (right).

exceeds 90%. Misclassified jobs are typically identified as workloads with the same or similar critical resources.

Per-application profiles: Note that although in Table 1 we group applications by programming framework or online service, each framework does not correspond to a single resource profile. Profiles of applications within a framework can vary greatly, depending on functionality, complexity, and dataset features. Bolt's recommender system matches resource profiles to specific algorithmic classes and dataset characteristics within each framework. Figure 5 shows an example where two Hadoop jobs, one running word count on a small dataset and one running a recommender system on a very large dataset, exhibit very different resource profiles. While the third application is also a Hadoop job, it is identified as very similar to the recommender, as opposed to word count.

Number of co-residents: The number of victim applications affects detection accuracy. Figure 6a shows accuracy as a function of the number of victims per machine. When the number of co-residents is less than 2, accuracy exceeds 95%. As the number increases, accuracy drops, since it becomes progressively more difficult to differentiate between the contention caused by each workload. When there are 5 co-scheduled applications, accuracy is 67%. Interestingly, accuracy is higher for 4 than for 3 applications, since with 4 workloads the probability of sharing a core, and thus getting an accurate measurement of core pressure, is higher. While aggressive multi-tenancy affects accuracy, large numbers of co-residents in public clouds are unlikely in practice [41].

Dominant resource: We now examine the correlation between the resource an application puts the most pressure on and Bolt's ability to correctly detect it (Figure 6). The numbers on each bar show how many applications have each resource as dominant. In general, the applications that are most easily detected are ones with high instruction cache (L1-i), memory bandwidth, network bandwidth, and disk capacity pressure. These are workloads that fall in one of the following categories: latency-critical services with large codebases, such as webservers; in-memory analytics, like Spark; and disk-bound analytics, like Hadoop. Interestingly, L2 activity is a poor indicator of application type, in contrast to L1 and LLC, since it does not capture a significant change in working set size (from 32KB to 256KB for our platforms).

Number of iterations: In this controlled experiment, we stop the detection process upon correct identification of a workload. For several jobs this requires more than one iteration of profiling and data mining. Figure 7a shows the fraction of workloads that were correctly identified after N iterations. 71% of victims only require a single iteration, while an additional 15% required a second. A small fraction of applications needed more than two iterations, while jobs that were not identified correctly by the sixth iteration did not benefit from additional iterations. The number of applications per machine also affects the iterations required for detection (Figure 7b). When a single job is scheduled on a machine, one iteration is sufficient for almost all workloads. As the number of victims per machine increases, additional iterations are needed to disentangle the contention signals of each co-scheduled job.

Figure 8 shows a case of detection over several iterations. The victim is a 4-vCPU instance executing different consecutive jobs. Detection occurs every 20 sec by default (see Fig. 10 for a sensitivity study on profiling frequency). After the initial iteration, Bolt detects the job as mcf from SPECCPU2006. In the next two iterations, the interference profile of the victim continues to fit mcf's characteristics. At t=60 sec, Bolt detects that the job profile has changed and now resembles an SVM classifier running on Mahout [46]. Similarly, at t=180 sec, the victim changes again, to a Spark data mining workload. Changes are typically captured within a few seconds.

Resource pressure: Figure 9 shows the correlation between the pressure of a victim in a given resource and Bolt's ability to correctly detect it. We plot detection accuracy for three core and three uncore resources; the results are similar for the remaining four. In almost all cases, very low or very high pressure carries the most value for detection. For disk bandwidth, accuracy remains high except for the 20-50% region, which contains many application classes with moderate disk activity, including analytics in Hadoop, databases, and graph applications. Resource pressure also affects the number of iterations Bolt needs to correctly identify a workload. For resources with a lot of value for detection, such as


Figure 7: PDFs of iterations required for detection, in total (left) and as a function of the number of co-residents (right).


Figure 8: Example of workload phase detection by Bolt.

the L1-i and LLC caches, when pressure is moderate, several iterations are needed for accuracy to increase. In contrast, for off-chip resources like network bandwidth, this effect is less pronounced. In general, when moderate pressure hurts detection accuracy (Figure 9), more iterations are needed for correct identification.

Scheduler: So far we have used a least-loaded scheduler, which is commonly used in datacenters [33, 63]. This scheduler does not account for resource contention between applications, leading to suboptimal performance [16, 24, 5, 84, 82]. Recent work has shown that if interference is accounted for at the scheduler level, both performance and utilization improve [48, 19, 53]. We now evaluate the impact of such a scheduler on Bolt's detection accuracy. Quasar [19] leverages machine learning to quickly determine which applications can be co-scheduled on the same machine without destructive interference. We use Quasar to schedule all victim applications, and then inject Bolt in each physical host for detection. In a real setting, Bolt can trick an interference-aware scheduler by incurring very low levels of interference initially, until it is co-scheduled with a victim application and can determine its type and characteristics.

Table 1 shows Bolt's accuracy with the least-loaded scheduler (LL) and Quasar. Interestingly, accuracy increases slightly with Quasar, 2% on average, but in general the impact of the scheduler is small. There are two reasons for the increase: first, both LL and Quasar do not share a single hyperthread between jobs, so core pressure measurements remain accurate. Second, because Quasar only co-schedules jobs with different critical resources, it provides Bolt with a less "noisy" interference signal for uncore resources, making distinguishing co-residents easier. Therefore, reducing interference in software alone is not sufficient to mitigate these security threats.


Figure 9: Detection accuracy as a function of the pressure victims place on various shared resources.


Figure 10: Sensitivity to (a) profiling frequency, (b) adversarial VM size, and (c) profiling benchmarks.

Sensitivity analysis: Finally, we examine how design decisions in Bolt affect detection. Figure 10a shows how accuracy changes as profiling frequency decreases. For profiling intervals beyond 30 sec, accuracy drops rapidly. If profiling only occurs every 5 minutes, almost half the victims are incorrectly identified. Note that if workloads are long-running and mostly stable, profiling can be less frequent without such an impact on accuracy. Figure 10b shows accuracy as a function of the adversarial VM's size. We examine sizes offered as on-demand instances by EC2 [1]. If the adversary has fewer than 4 vCPUs, its resources are insufficient to create enough contention to capture the co-residents' pressure. Accuracy continues to grow for larger sizes; however, the larger the adversarial instance is, the less likely it is to be co-scheduled with other VMs, nullifying the value of Bolt. Finally, Figure 10c shows accuracy as a function of the number of microbenchmarks used for profiling. A single benchmark is not sufficient to fingerprint the characteristics of a workload; however, using more than 3 benchmarks has diminishing returns in accuracy. Unless otherwise specified, we use 20-sec profiling intervals, 4-vCPU adversarial VMs, and 2 benchmarks for initial profiling.

3.5 Limitations

While Bolt can accurately detect most frequently-run workloads, it has certain limitations. First, when no job shares a core with the adversary, the system assumes a linear relation between the co-residents' resource pressure in uncore resources. While this is true in some cases, it may not be generally accurate. We will explore alternative ways of distinguishing co-residents in future work. Second, the system cannot differentiate between multiple jobs running in a single instance and multiple instances running one job each. While this does not affect detection, if a covert-channel attack is aimed at a particular user, more accurate information would increase the adversary's leverage. Similarly, Bolt cannot currently distinguish between two copies of the same job sharing a platform at low load and one copy of the job that runs at higher load.

4. Cloud User Study

We now evaluate Bolt in a real-world environment with multiple users and unknown applications. We conducted a multi-user study on Amazon EC2, subject to the following rules:

• We engaged 20 independent users from two academic institutions (Cornell University and Stanford University), provided them with access to a shared cluster of 200 c3.8xlarge instances on EC2, with 32 vCPUs and 60GB of memory each, and asked them to submit several applications of their choosing.

• On each instance we maintain a 4-vCPU VM for Bolt. All other resources are made available to the users as VMs.

• Each user submits one or more applications, and all users have equal priority, i.e., no application will be preempted to accommodate the job of another user.

• Users were instructed to behave amenably, i.e., to avoid consuming disproportionate resources, so that the system can observe a representative job sample from all users.

• Users were also advised to pin their applications to specific cores to eliminate interference from the OS scheduler. Physical cores can be shared across jobs; however, each vCPU (hardware thread) is dedicated to a single application.

• Users can optionally select the instance(s) they want to launch their applications on. We provide a spreadsheet with a list of all available instances and their utilization at each point in time. This information, as well as any a priori information on the type, number, and characteristics of submitted applications, is concealed from Bolt.

• If a user does not select an instance, a least-loaded scheduler selects the VM with the most available cores and memory.

Once users start launching jobs in the cluster, we instantiate Bolt on each machine and activate it periodically to detect the type, characteristics, and number of co-scheduled applications. The entire experiment lasts approximately 4 hours, beyond which point all instances are terminated. Figure 11 shows the PDF of application types the different users launched, provided by the users after the completion of the experiment. Different colors correspond to different users. A total of 436 jobs were submitted to the cluster. The application mix includes analytics jobs in Hadoop and Spark, scientific benchmarks (Parsec, Bioparallel), hardware synthesis tools (Cadence, Vivado HLS), multicore (zsim) and n-body simulations, email clients, web browsers, web servers (http), music and video streaming services, and databases, among others. Each job may take one or more vCPUs. Even though

Figure 11: Probability distribution function of applications launched in the cloud study, per user. Labels: 1 hadoop, 2 spark, 3 email, 4 browser, 5 cadence, 6 zsim, 7 video, 8 latex, 9 MLPython, 10 make, 11 mem$d, 12 http server, 13 spec, 14 matlab, 15 mysql, 16 vivado, 17 parsec, 18 vim, 19 scala, 20 php, 21 postgres, 22 musicStream, 23 minebench, 24 n-body sim, 25 ppt, 26 OS img, 27 pdfview, 28 scons, 29 du -h, 30 crdel cgroup, 31 bioparallel, 32 storm, 33 cpu burn, 34 audacity, 35 javascript, 36 create VMs, 37 html, 38 cassandra, 39 mongoDB, 40 mkdir, 41 cpmv, 42 sirius, 43 oProfile, 44 dwnld LF, 45 rsync, 46 ping, 47 photoshop, 48 ssh, 49 rm, 50 skype, 51 zipkin, 52 graphX, 53 ix.

[Figure 12 plots: (a) "Correctly Identifying App Name" and (b) "Correctly Identifying App Characteristics", occurrences (0-30) per application label (1-53); (c) instances (0-200) vs. time (0-200 min), shaded by number of active jobs (0-60).]

Figure 12: Detection accuracy of (a) application labels and (b) resource characteristics; (c) number of jobs per instance.

Figure 11 groups applications by framework, the individual logic and datasets differ between jobs in the same framework. Note that we have not updated the training set for this study; it consists of the same 120 applications described in Section 3.4.

Figure 12a shows the number of applications that were correctly identified by name across the different categories. In total, 277 jobs are correctly labeled. As expected, Bolt could not assign a label to application types it has not seen before, such as email clients or image editing applications. However, even when Bolt cannot label an application, it can still identify the resources it is sensitive to, as shown in Figure 12b. For 385 out of 436 jobs, Bolt correctly identifies the job's resource characteristics. For performance attacks, such as the ones described in Section 5, this information is sufficient to dramatically degrade the performance of a victim application. Finally, Figure 12c shows the number of co-scheduled applications per instance throughout the duration of the experiment. The bottom 14 instances remained unused; for the remaining 186 instances, the number of active applications ranges from 1 to 6, with each job taking one or more vCPUs. The majority of misclassified applications correspond to instances with 5 or more concurrently active jobs.

5. Security Attacks

Bolt makes several cloud attacks practical and difficult to detect. We discuss three possible attacks that leverage the information obtained through detection.

5.1 Internal Denial of Service Attack

Attack setting: Denial of service attacks hurt the performance of a victim service by overloading its resources [14, 2, 51]. In cloud settings specifically, they can be categorized in two types: external and internal (or host-based) attacks. External attacks are the most conventional form of DoS [51, 13, 22]. These attacks utilize external servers to direct excessive traffic to the victims, flooding their resources and hurting their availability. External DoS attacks mostly affect PaaS and SaaS systems, and include IP spoofing, synchronization (SYN) flooding, smurf, buffer overflow, land, and teardrop attacks.

In contrast, internal DoS attacks take advantage of IaaS and PaaS cloud multi-tenancy to launch adversarial programs on the same host as the victim and degrade its performance [14, 60, 34, 52]. For example, Ristenpart et al. [60] showed how an adversarial user can leverage the IP naming conventions of IaaS clouds to locate a victim VM and degrade its performance. Cloud providers are starting to build defenses against such attacks. For example, EC2 offers autoscaling that will increase the instances of a service under heavy resource usage. Similarly, there is related work on mechanisms that detect saturation in memory bandwidth

[Figure 13 plots: 99th percentile latency and CPU utilization (%) vs. time (0-120 s) for Bolt and the naïve attack, with detection and migration points marked.]

Figure 13: Latency and utilization with Bolt and a naïve DoS attack that saturates CPU resources.

and can trigger VM migration [71]. This means that DoS attacks that simply overload a physical host are ineffective when such defenses are in place. We focus on host-based DoS attacks and make them resilient against such defenses by avoiding resource saturation. We leverage the information obtained by Bolt to construct custom interference programs that stress the resources the victim is most sensitive to, without overloading the host.

Bolt creates custom contentious workloads by combining the microbenchmarks used to measure the resource pressure of the victim. Since these microbenchmarks are tunable, Bolt configures their intensity to a higher point than their measured pressure c_i during detection. For example, if a victim is detected as memcached, and its most critical resources are the L1-i cache (81% pressure) and the last level cache (LLC) (78% pressure), Bolt uses the two microbenchmarks for L1-i and LLC at a higher intensity than what memcached can tolerate.

Impact: We use the DoS attack setting above against the 108 applications of the controlled experiment (Section 3.4). The DoS attack degrades performance by 2.2x on average, and up to 9.8x, in terms of execution time. Degradation is much more pronounced for interactive workloads like key-value stores, with tail latency increasing by 8-140x, a dramatic impact for applications with strict tail latency SLAs.
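The assembly of the contentious mix described above can be sketched as follows; a minimal Python sketch, not Bolt's actual code: the resource names, the profile values beyond the memcached example, and the `margin` knob are hypothetical.

```python
def build_dos_mix(pressure_profile, top_k=2, margin=10):
    """Given per-resource pressure scores c_i measured during detection,
    pick the victim's most critical resources and set each corresponding
    microbenchmark's intensity above what the victim tolerates."""
    critical = sorted(pressure_profile.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    return {resource: min(100, c + margin) for resource, c in critical}

# Example profile for a detected memcached victim (L1-i: 81%, LLC: 78%).
profile = {"l1i": 81, "llc": 78, "dram_bw": 35, "net_bw": 40}
print(build_dos_mix(profile))  # {'l1i': 91, 'llc': 88}
```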

We now examine the impact of the DoS attack on utilization. If it translates to resource saturation, there is a high probability that the DoS will be detected and the victim migrated to a new machine. The experimental cluster supports live migration; if CPU utilization (sampled every 1 sec) exceeds 70%, the victim is migrated to an unloaded host. Figure 13 compares the tail latency and CPU utilization with Bolt to that of a naïve DoS that saturates the CPU through a compute-intensive kernel. We focus on a single victim VM running memcached. The overhead of migration (the time between initiating migration and latency returning to normal levels) for this VM is 8 sec. Performance degradation is similar for both systems until t=80 sec, at which point the victim is migrated to a new host, due to the compute-intensive kernel causing utilization to exceed 70%. While during migration performance continues to degrade, once the VM resumes in the new server, latency returns to nominal levels. In contrast, Bolt keeps utilization low and impacts the victim's performance beyond t=80 sec.
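Why the targeted attack evades a utilization-based defense can be illustrated with a toy simulation; the utilization traces below are made up, only the 70% threshold comes from the text:

```python
def detect_dos(utilization_trace, threshold=70.0):
    """Return the first sample index at which a utilization-triggered
    defense would initiate migration, or None if the attack stays
    under the radar."""
    for t, u in enumerate(utilization_trace):
        if u > threshold:
            return t
    return None

naive = [40, 65, 85, 95, 95]   # saturating kernel: caught at the third sample
bolt  = [40, 55, 60, 62, 61]   # targeted interference: never exceeds 70%
print(detect_dos(naive), detect_dos(bolt))  # 2 None
```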

5.2 Resource Freeing Attack

Attack setting: The goal of a resource freeing attack (RFA) is to modify the workload of a victim such that it frees up resources for the adversary, improving its performance [67, 66]. The adversarial VM consists of two components: a beneficiary and a helper. The beneficiary is the program whose performance the attacker wants to improve, and the helper is the program that forces the victim to yield its resources. The RFA works by adding load to the victim's critical resources, causing other resources to free up. For example, if the victim is a memory-bound Spark job, introducing additional memory traffic will result in Spark stalling in memory and lessening its pressure on the other resources, until it can reclaim its required memory bandwidth. When launched in a public cloud, the victim of an RFA ends up paying more and achieving worse performance compared to running in isolation.

We now create a proof-of-concept RFA for a webserver, a network-bound Hadoop job, and a memory-bound Spark job. Launching RFAs requires in-depth knowledge of the victim's resource requirements [67], which Bolt provides by identifying the victim's dominant resource. Once the dominant resource is detected, the runtime applies a custom helper program that saturates this critical resource. For the webserver, the helper is a CPU-intensive benchmark launching CGI requests, causing the victim to saturate its CPU usage, servicing fewer real user requests and freeing up cache resources. For the Hadoop job, we use a network-intensive benchmark similar to iperf, which saturates network bandwidth and frees up CPU and memory resources for the beneficiary. Similarly, for Spark, we use a streaming memory benchmark that slows down k-means and frees network and compute resources for the adversary. The selection of the beneficiary is of lesser importance; without loss of generality, we select mcf, a CPU-intensive benchmark from SPECCPU2006. The same methodology can be used with other beneficiaries, conditioned on their critical resource not overlapping with the victim's.

Impact: Table 2 shows the performance degradation for the three victim applications and the improvement in execution time for the beneficiary. The webserver suffers the most in terms of queries per second, as the helper's CGI requests pollute its cache, preventing it from servicing legitimate user requests. Hadoop and Spark experience significant degradations in execution time, due to network and memory bandwidth saturation respectively. Execution time for mcf improves by 16-38%, benefiting from the victim stalling in its critical resource to improve its cache and CPU usage.
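The dominant-resource-to-helper mapping described above can be sketched as follows; a minimal Python sketch where the helper names are hypothetical stand-ins for the benchmarks in the text:

```python
# Hypothetical mapping from the victim's dominant resource (as identified
# by Bolt) to a helper that saturates it, mirroring the three
# proof-of-concept RFAs above.
HELPERS = {
    "cpu":        "cgi_request_storm",   # webserver victim
    "network_bw": "iperf_like_flood",    # Hadoop (SVM) victim
    "memory_bw":  "stream_memory_scan",  # Spark (k-means) victim
}

def pick_helper(dominant_resource):
    """Choose the helper program that loads the victim's critical resource."""
    return HELPERS[dominant_resource]

def valid_beneficiary(beneficiary_resource, victim_resource):
    """A beneficiary only gains if its own critical resource does not
    overlap with the resource the helper saturates."""
    return beneficiary_resource != victim_resource

print(pick_helper("network_bw"))               # iperf_like_flood
print(valid_beneficiary("cpu", "network_bw"))  # True
```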

5.3 VM Co-residency Detection

Attack setting: Sections 3.4 and 4 showed that we can identify the applications sharing a cloud infrastructure. However, a malicious user is rarely interested in a random service running on a public cloud. More often, they target a specific

Victim App         Victim Perf    Beneficiary App   Beneficiary Perf   Target Resource
Apache Webserver   -64% (QPS)     mcf               +24%               CPU
Hadoop (SVM)       -36% (Exec)    mcf               +16%               Network BW
Spark (k-means)    -52% (Exec)    mcf               +38%               Memory BW

Table 2: RFA impact on the victims and beneficiaries.

workload, e.g., a competitive e-commerce site. Therefore, they need to pinpoint where the target resides in a practical manner. This requires a launch strategy and a mechanism for co-residency detection [69]. The attack is practical if the target is located with high confidence, in reasonable time, and with modest resource costs. Bolt enables this attack to be carried out in a practical way, and remains resilient against defense mechanisms such as virtual private clouds (VPCs), which make internal IP addresses private to a single tenant. Once a target VM is located, the adversary can launch RFA or DoS attacks as previously described. Co-residency detection attacks rely on leakage of logical information, e.g., IP addresses, or on observing the impact of a side-channel attack due to resource contention. Bolt relies on a variation of the latter approach.

Assume a system of N servers. A target victim user launches k VMs. The adversary launches n malicious VMs. Instances are launched simultaneously to avoid malicious-malicious instance co-residency [69]. The probability that at least one adversarial VM will be co-scheduled with a victim instance is P(f) = 1 - (1 - k/N)^n. The adversary then uses a two-process detection scheme. The sender is a process that creates contention to degrade the victim's performance. The receiver is the process that detects this degradation, running either on the same host as the victim (cooperative detection) or externally (uncooperative detection). We assume an uncooperative victim, which is the more general case. Bolt works as the sender to determine the type of co-residents on each sampled host and identify any of the desired type. Assume m VMs of the desired type have been detected in the sample set. Bolt now needs to prune down this space to VM(s) from the specific target victim. It launches an external receiver aimed at the victim over a public channel, e.g., HTTP or a key-value protocol, to ping the victim service. This is the only point of communication with the victim. At the same time, the sender incurs contention in the resources the victim is most sensitive to. If the sender and victim are co-residents, the receiver's requests will be slower due to contention. Bolt quickly prunes down the sampled VMs by determining the type of co-residents. In a large cluster, this reduces the time to pinpoint a victim and the resource costs for the adversary. It also increases the confidence in the detection in the presence of noise from other co-residents.

Impact: We evaluate co-residency detection on the same 40-node cluster. The victim is a SQL server instantiating a single VM. The cluster also hosts 7 other VMs running SQL. Other active VMs host key-value stores, Hadoop, and Spark jobs. Bolt launches 10 senders simultaneously on

randomly-selected machines (P(f) = 1 - (39/40)^10 ≈ 0.22, per the formula above) and detects 3 VMs running SQL in the sample set. It then introduces memory interference as the receiver initiates a large number of SQL queries. While these queries have 8.16 msec mean latency without interference, latency now becomes 26.14 msec. Given a ~3x latency increase, we can conclude that one of the co-residents is the victim SQL server. Detection required 6 sec from instantiation to receiver detection, and 11 adversary VMs.
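The placement probability from the formula in the attack setting can be checked numerically; a small Python sketch under the same uniform-placement assumption:

```python
def coresidency_prob(n_servers, k_victim_vms, n_adversary_vms):
    """P(f) = 1 - (1 - k/N)^n: probability that at least one adversarial
    VM lands on a host running a victim VM, assuming uniform placement."""
    return 1.0 - (1.0 - k_victim_vms / n_servers) ** n_adversary_vms

# The cluster study: N=40 servers, a single victim VM, 10 senders.
print(round(coresidency_prob(40, 1, 10), 2))  # 0.22
```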

6. Improving Security via Resource Isolation

Since interference is at the heart of Bolt's detection methodology, isolation mechanisms that attenuate contention should reduce its accuracy. We first evaluate to what extent existing isolation techniques mitigate security vulnerabilities, and then highlight trade-offs between security, performance, and utilization. We use the same setting as in the controlled experiment of Section 3.4. We evaluate 4 resource-specific isolation techniques and 3 settings for OS-level isolation mechanisms: baremetal, containerized, and virtualized. The baremetal system does not employ any OS-level isolation. The containerized setup uses Linux containers (lxc) and assigns cores to applications via cpuset cgroups. Both containers and VMs constrain memory capacity. Baremetal experiments do not enforce memory capacity allocations, and the Linux scheduler is free to float applications across cores.

The first resource-specific isolation technique is thread pinning to physical cores, which constrains interference from scheduling actions like context switching. The number of cores an application is allocated can change dynamically, and is limited by how fast Linux can migrate tasks, typically in the tens of milliseconds.
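At the process level, this kind of pinning can be sketched with the `sched_setaffinity` syscall exposed by Python's `os` module (Linux-only; a minimal illustration, not the paper's implementation, which uses cpuset cgroups):

```python
import os

# Pin the current process to CPU 0 (Linux-only; assumes CPU 0 is
# available to this process). cpuset cgroups achieve the same effect
# for a whole container.
os.sched_setaffinity(0, {0})
print(os.sched_getaffinity(0))  # {0}
```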

For network isolation, we use the outbound network bandwidth partitioning capabilities of Linux's traffic control. Specifically, we use the qdisc [8] scheduler with the hierarchical token bucket (HTB) queueing discipline to enforce egress bandwidth limits. The limits are set to the maximum traffic burst rate for each application to avoid contention (the ceil parameter in the HTB interface). Ingress network bandwidth isolation has been extensively studied in previous work [36]; these approaches can be applied here as well.
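A possible HTB configuration along these lines (the device name, rates, and classifier are hypothetical; the `ceil` values stand in for the measured per-application burst rates):

```shell
# Root HTB qdisc with a default class for unclassified traffic.
tc qdisc add dev eth0 root handle 1: htb default 30
tc class add dev eth0 parent 1: classid 1:1 htb rate 10gbit ceil 10gbit
# One class per co-resident; 'ceil' is set to the application's maximum
# observed burst rate, as described in the text.
tc class add dev eth0 parent 1:1 classid 1:10 htb rate 1gbit ceil 2gbit
# Classify the tenant's traffic (here, by source IP) into its class.
tc filter add dev eth0 protocol ip parent 1: prio 1 \
   u32 match ip src 10.0.0.2/32 flowid 1:10
```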

For DRAM bandwidth isolation, there is no commercially available partitioning mechanism. To enforce bandwidth isolation, we use the following approach: we monitor the DRAM bandwidth usage of each application through performance counters [45], and modify the scheduler to only colocate jobs on machines that can accommodate their aggregate peak memory bandwidth requirements. This requires extensive application knowledge, and is used simply to highlight the benefits of DRAM bandwidth isolation.
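The bandwidth-aware admission check described above can be sketched as follows; a minimal Python sketch with hypothetical GB/s numbers:

```python
def fits_memory_bandwidth(host_jobs_bw, new_job_bw, host_bw_capacity):
    """Admit a job only if the aggregate peak DRAM bandwidth of all
    co-located jobs stays within the host's measured capacity."""
    return sum(host_jobs_bw) + new_job_bw <= host_bw_capacity

# Hypothetical numbers, in GB/s: two jobs using 20 and 15 on a 60 GB/s host.
print(fits_memory_bandwidth([20, 15], 18, 60))  # True
print(fits_memory_bandwidth([20, 15], 30, 60))  # False
```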

Finally, for last level cache (LLC) isolation, we use the Cache Allocation Technology (CAT) available in recent Intel chips [11]. CAT partitions the LLC in ways, which in

[Figure 14 plot: detection accuracy (%) under None, +Thread Pinning, +Net BW Partitioning, +Mem BW Partitioning, +Cache Partitioning, and +Core Isolation, for baremetal, Linux containers, and virtual machines.]

Figure 14: Detection accuracy with isolation techniques.

a highly-associative cache allows for non-overlapping partitions at the granularity of a few percent of the LLC capacity. Each co-resident is allocated one partition, configured to its current capacity requirements [57]. Partitions can be resized at runtime by reprogramming specific low-level registers (MSRs); changes take effect after a few milliseconds.
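A sketch of what programming CAT looks like, using the L3 mask MSRs documented in the Intel SDM [11] and, alternatively, the `pqos` tool from intel-cmt-cat (the mask values, CLOS IDs, and core ranges are hypothetical):

```shell
# Hypothetical 20-way LLC: give CLOS 1 the low 4 ways, CLOS 0 the rest.
# MSR 0xC90 is IA32_L3_QOS_MASK_0, 0xC91 is IA32_L3_QOS_MASK_1.
wrmsr 0xC90 0xFFFF0   # CLOS 0: ways 4-19
wrmsr 0xC91 0x0000F   # CLOS 1: ways 0-3 (non-overlapping)

# Higher-level alternative via intel-cmt-cat's pqos tool:
pqos -e "llc:1=0x0000F"   # set CLOS 1's capacity bitmask
pqos -a "llc:1=4-7"       # associate cores 4-7 with CLOS 1
```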

We add one isolation mechanism at a time in the three system settings (baremetal, containers, VMs). Figure 14 shows the impact of isolation techniques on Bolt's detection accuracy. As expected, when no isolation is used, baremetal allows for a significantly higher detection accuracy than container- and VM-based systems, mostly due to the latter constraining core and memory capacity usage. As a result, introducing thread pinning benefits baremetal the most, since it reduces core contention. It also benefits container- and VM-based setups to a lesser degree, by eliminating unpredictability introduced by the Linux scheduler (e.g., context switching) [40]. The dominant resource of each application determines which isolation technique benefits it the most. Thread pinning mostly benefits workloads bound by on-chip resources, such as L1/L2 caches and cores. Adding network bandwidth partitioning lowers detection accuracy for all three settings almost equally. It primarily benefits network-bound workloads, for which network interference conveys the most information for detection. Memory bandwidth isolation further reduces accuracy by 10% on average, benefiting jobs dominated by DRAM traffic. Finally, cache partitioning has the most dramatic reduction in accuracy, especially for LLC-bound workloads. We attribute this to the importance cache pressure has as a detection signal. The number of co-residents also affects the extent to which isolation helps. The more co-scheduled applications exist per machine, the more isolation techniques degrade accuracy, as they make distinguishing between co-residents harder.

Unfortunately, even when all the techniques are on, accuracy is still 50%, due to two reasons: current techniques are not fine-grained enough to allow strict and scalable isolation, and core resources (L1/L2 caches, CPU) are prone to interference due to contending hyperthreads [61]. To evaluate the latter hypothesis, we modify the scheduler such that hyperthreads of different jobs are never scheduled on the same physical core, e.g., if a job needs 7 vCPUs, it is allocated 4 dedicated physical cores. The grey bars of Figure 14 show the detection accuracy. Baremetal instances still allow certain applications to be detected, but for containerized and virtualized settings, accuracy drops to 14%. The remaining accuracy corresponds to disk-heavy workloads. Improving security, however, comes at a performance penalty of 34% on average in execution time, as threads of the same job are forced to contend with one another. Alternatively, users can overprovision their resource reservations to avoid degradation, which results in a 45% drop in utilization. This means that the cloud provider cannot leverage CPU idleness to share machines, decreasing the cost benefits of cloud computing. Note that enforcing core isolation alone is also not sufficient, as it allows a detection accuracy of 46%.

Discussion: The previous analysis highlights a design problem with current datacenter platforms. Traditional multicores are prone to contention, which will only worsen as more cores are integrated in each server and multi-tenancy becomes more pronounced. Existing isolation techniques are insufficient to mitigate security vulnerabilities, and techniques that provide reasonable security guarantees sacrifice performance or cost efficiency through low utilization. This highlights the need for new fine-grain and coordinated isolation techniques that guarantee security at high utilization for shared resources.

7. Conclusions

We have presented Bolt, a practical system that uses online data mining techniques to identify the type and characteristics of applications running on shared cloud systems, and enables practical attacks that degrade their performance. In a 40-server cluster, Bolt correctly identifies 87 out of 108 workloads, and degrades tail latency by up to 140x. We also used Bolt in a user study on EC2 to detect the characteristics of unknown jobs submitted by multiple users. Finally, we show that while existing isolation techniques are helpful, they are not sufficient to mitigate such attacks. Bolt reveals real, easy-to-exploit threats in public clouds. We hope that this work will motivate cloud providers and computer scientists to develop and deploy stricter isolation primitives in cloud systems.

Acknowledgements

We sincerely thank all the participants in the user study for their time and effort. We also thank Daniel Sanchez, Ed Suh, Grant Ayers, Mingyu Gao, Ana Klimovic, and the rest of the MAST group, as well as the anonymous reviewers, for their feedback on earlier versions of this manuscript. This work was supported by the Stanford Platform Lab, NSF grant CNS-1422088, and a John and Norma Balen Sesquicentennial Faculty Fellowship.

References

[1] Amazon EC2. http://aws.amazon.com/ec2.

[2] Aman Bakshi and Yogesh B. Dujodwala. Securing cloud from DDoS attacks using intrusion detection system in virtual machine. In Proc. of the Second International Conference on Communication Software and Networks (ICCSN), 2010.

[3] Luiz Barroso and Urs Hoelzle. The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines. MC Publishers, 2009.

[4] Naomi Benger, Joop van de Pol, Nigel P. Smart, and Yuval Yarom. "Ooh aah, just a little bit": A small amount of side channel can go a long way. In Proc. of the International Cryptographic Hardware and Embedded Systems Workshop (CHES), Busan, South Korea, 2014.

[5] Major Bhadauria and Sally A. McKee. An approach to resource-aware co-scheduling for CMPs. In Proc. of the 24th ACM International Conference on Supercomputing (ICS), Tsukuba, Japan, 2010.

[6] Leon Bottou. Large-scale machine learning with stochastic gradient descent. In Proceedings of the International Conference on Computational Statistics (COMPSTAT), Paris, France, 2010.

[7] Eric Brewer. Kubernetes: The path to cloud native. http://goo.gl/QgkzYB, SOCC Keynote, August 2015.

[8] Martin A. Brown. Traffic Control HOWTO. http://linux-ip.net/articles/Traffic-Control-HOWTO.

[9] Robin Burke. Hybrid recommender systems: Survey and experiments. User Modeling and User-Adapted Interaction, 12(4):331–370, November 2002.

[10] Apache Cassandra. http://cassandra.apache.org.

[11] Intel® 64 and IA-32 Architectures Software Developer's Manual, vol. 3B: System Programming Guide, Part 2, September 2014.

[12] Ludmila Cherkasova, Diwaker Gupta, and Amin Vahdat. Comparison of the three CPU schedulers in Xen. SIGMETRICS Perform. Eval. Rev., 35(2):42–51, September 2007.

[13] Scott A. Crosby and Dan S. Wallach. Denial of service via algorithmic complexity attacks. In Proceedings of the 12th USENIX Security Symposium, Washington, DC, 2003.

[14] Marwan Darwish, Abdelkader Ouda, and Luiz Fernando Capretz. Cloud-based DDoS attacks and defenses. In Proc. of i-Society, Toronto, ON, 2013.

[15] Christina Delimitrou and Christos Kozyrakis. iBench: Quantifying Interference for Datacenter Workloads. In Proceedings of the 2013 IEEE International Symposium on Workload Characterization (IISWC), Portland, OR, September 2013.

[16] Christina Delimitrou and Christos Kozyrakis. Paragon: QoS-Aware Scheduling for Heterogeneous Datacenters. In Proceedings of the Eighteenth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), Houston, TX, USA, 2013.

[17] Christina Delimitrou and Christos Kozyrakis. QoS-Aware Scheduling in Heterogeneous Datacenters with Paragon. In ACM Transactions on Computer Systems (TOCS), Vol. 31, Issue 4, December 2013.

[18] Christina Delimitrou and Christos Kozyrakis. Quality-of-Service-Aware Scheduling in Heterogeneous Datacenters with Paragon. In IEEE Micro Special Issue on Top Picks from the Computer Architecture Conferences, May/June 2014.

[19] Christina Delimitrou and Christos Kozyrakis. Quasar: Resource-Efficient and QoS-Aware Cluster Management. In Proceedings of the Nineteenth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), Salt Lake City, UT, USA, 2014.

[20] Christina Delimitrou and Christos Kozyrakis. HCloud: Resource-Efficient Provisioning in Shared Cloud Systems. In Proceedings of the Twenty-First International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), April 2016.

[21] Christina Delimitrou, Daniel Sanchez, and Christos Kozyrakis. Tarcil: Reconciling Scheduling Speed and Quality in Large Shared Clusters. In Proceedings of the Sixth ACM Symposium on Cloud Computing (SOCC), August 2015.

[22] Jake Edge. Denial of service via hash collisions. http://lwn.net/Articles/474912, January 2012.

[23] Benjamin Farley, Ari Juels, Venkatanathan Varadarajan, Thomas Ristenpart, Kevin D. Bowers, and Michael M. Swift. More for your money: Exploiting performance heterogeneity in public clouds. In Proc. of the ACM Symposium on Cloud Computing (SOCC), San Jose, CA, 2012.

[24] Alexandra Fedorova, Margo Seltzer, and Michael D. Smith. Improving performance isolation on chip multiprocessors via an operating system scheduler. In Proceedings of the 16th Intl. Conference on Parallel Architecture and Compilation Techniques (PACT), Brasov, Romania, 2007.

[25] Alexander Felfernig and Robin Burke. Constraint-based recommender systems: Technologies and research issues. In Proceedings of the ACM International Conference on Electronic Commerce (ICEC), Innsbruck, Austria, 2008.

[26] Brad Fitzpatrick. Distributed caching with memcached. In Linux Journal, Volume 2004, Issue 124, 2004.

[27] Oded Goldreich and Rafail Ostrovsky. Software protection and simulation on oblivious RAMs. J. ACM, 43(3):431–473, May 1996.

[28] Google Container Engine. https://cloud.google.com/container-engine.

[29] Asela Gunawardana and Christopher Meek. A unified approach to building hybrid recommender systems. In Proc. of the Third ACM Conference on Recommender Systems (RecSys), New York, NY, 2009.

[30] Sanchika Gupta and Padam Kumar. VM profile based optimized network attack pattern detection scheme for DDoS attacks in cloud. In Proc. of SSCC, Mysore, India, 2013.

[31] Yi Han, Tansu Alpcan, Jeffrey Chan, and Christopher Leckie. Security games for virtual machine allocation in cloud computing. In Proc. of the 4th International Conference on Decision and Game Theory for Security, Fort Worth, TX, 2013.

[32] Amir Herzberg, Haya Shulman, Johanna Ullrich, and Edgar Weippl. Cloudoscopy: Services discovery and topology mapping. In Proceedings of the ACM Workshop on Cloud Computing Security Workshop (CCSW), Berlin, Germany, 2013.

[33] Ben Hindman, Andy Konwinski, Matei Zaharia, Ali Ghodsi, Anthony D. Joseph, Randy Katz, Scott Shenker, and Ion Stoica. Mesos: A platform for fine-grained resource sharing in the data center. In Proceedings of NSDI, Boston, MA, 2011.

[34] Jingwei Huang, David M. Nicol, and Roy H. Campbell. Denial-of-service threat to Hadoop/YARN clusters with multi-tenancy. In Proc. of the IEEE International Congress on Big Data, Washington, DC, 2014.

[35] Alexandru Iosup, Nezih Yigitbasi, and Dick Epema. On the performance variability of production cloud services. In Proceedings of CCGRID, Newport Beach, CA, 2011.

[36] Vimalkumar Jeyakumar, Mohammad Alizadeh, David Mazieres, Balaji Prabhakar, Changhoon Kim, and Albert Greenberg. EyeQ: Practical network performance isolation at the edge. In Proc. of the 10th USENIX Conference on Networked Systems Design and Implementation (NSDI), Lombard, IL, 2013.

[37] Yaakoub El Khamra, Hyunjoo Kim, Shantenu Jha, and Manish Parashar. Exploring the performance fluctuations of HPC workloads on clouds. In Proceedings of CloudCom, Indianapolis, IN, 2010.

[38] Krzysztof C. Kiwiel. Convergence and efficiency of subgradient methods for quasiconvex minimization. Mathematical Programming (Series A), Berlin, Heidelberg: Springer, 90(1):1–25, 2001.

[39] Ruby B. Lee. Rethinking computers for cybersecurity. IEEE Computer, 48(4):16–25, 2015.

[40] Jacob Leverich and Christos Kozyrakis. Reconciling high server utilization and sub-millisecond quality-of-service. In Proceedings of EuroSys, Amsterdam, The Netherlands, 2014.

[41] Host server CPU utilization in Amazon EC2 cloud. http://goo.gl/2LTx4T.

[42] Fangfei Liu, Yuval Yarom, Qian Ge, Gernot Heiser, and Ruby B. Lee. Last-level cache side-channel attacks are practical. In Proc. of the IEEE Symposium on Security and Privacy (S&P), San Jose, CA, 2015.

[43] Fei Liu, Lanfang Ren, and Hongtao Bai. Mitigating cross-VM side channel attack on multiple tenants cloud platform. Journal of Computers, 9(4):1005–1013, April 2014.

[44] David Lo, Liqun Cheng, Rama Govindaraju, Luiz Andre Barroso, and Christos Kozyrakis. Towards energy proportionality for large-scale latency-critical workloads. In Proceedings of the 41st Annual International Symposium on Computer Architecture (ISCA), Minneapolis, MN, 2014.

[45] David Lo, Liqun Cheng, Rama Govindaraju, Parthasarathy Ranganathan, and Christos Kozyrakis. Heracles: Improving resource efficiency at scale. In Proc. of the 42nd Annual International Symposium on Computer Architecture (ISCA), Portland, OR, 2015.

[46] Mahout. http://mahout.apache.org.

[47] Dave Mangot. EC2 variability: The numbers revealed. http://tech.mangot.com/roller/dave/entry/ec2_variability_the_numbers_revealed.

[48] Jason Mars and Lingjia Tang. Whare-map: Heterogeneity in "homogeneous" warehouse-scale computers. In Proceedings of ISCA, Tel-Aviv, Israel, 2013.

[49] Robert Martin, John Demme, and Simha Sethumadhavan. TimeWarp: Rethinking timekeeping and performance monitoring mechanisms to mitigate side-channel attacks. In Proceedings of the International Symposium on Computer Architecture (ISCA), Portland, OR, 2012.

[50] David Meisner, Christopher M. Sadler, Luiz Andre Barroso, Wolf-Dietrich Weber, and Thomas F. Wenisch. Power management of online data-intensive services. In Proceedings of the 38th Annual International Symposium on Computer Architecture, pages 319–330, 2011.

[51] Jelena Mirkovic and Peter Reiher. A taxonomy of DDoS attack and DDoS defense mechanisms. ACM SIGCOMM Computer Communication Review (CCR), April 2004.

[52] Thomas Moscibroda and Onur Mutlu. Memory performance attacks: Denial of memory service in multi-core systems. In Proc. of the 16th USENIX Security Symposium (SS), Boston, MA, 2007.

[53] Ripal Nathuji, Aman Kansal, and Alireza Ghaffarkhah. Q-Clouds: Managing performance interference effects for QoS-aware clouds. In Proceedings of EuroSys, Paris, France, 2010.

[54] Simon Ostermann, Alexandru Iosup, Nezih Yigitbasi, Radu Prodan, Thomas Fahringer, and Dick Epema. A performance analysis of EC2 cloud computing services for scientific computing. In Lecture Notes on Cloud Computing, Volume 34, pages 115–131, 2010.

[55] Tao Peng, Christopher Leckie, and Kotagiri Ramamohanarao. Survey of network-based defense mechanisms countering the DoS and DDoS problems. ACM Comput. Surv., 39(1), April 2007.

[56] Diego Perez-Botero, Jakub Szefer, and Ruby B. Lee. Characterizing hypervisor vulnerabilities in cloud computing servers. In Proceedings of the 2013 International Workshop on Security in Cloud Computing (SCC@ASIACCS), Hangzhou, China, 2013.

[57] Moinuddin K. Qureshi and Yale N. Patt. Utility-based cache partitioning: A low-overhead, high-performance, runtime mechanism to partition shared caches. In Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 39), 2006.

[58] Himanshu Raj, Ripal Nathuji, Abhishek Singh, and Paul England. Resource management for isolation enhanced cloud services. In Proc. of the ACM Workshop on Cloud Computing Security (CCSW), Chicago, IL, 2009.

[59] Suhail Rehman and Majd Sakr. Initial findings for provisioning variation in cloud computing. In Proceedings of CloudCom, Indianapolis, IN, 2010.

[60] Thomas Ristenpart, Eran Tromer, Hovav Shacham, and Stefan Savage. Hey, you, get off of my cloud: Exploring information leakage in third-party compute clouds. In Proc. of the ACM Conference on Computer and Communications Security (CCS), Chicago, IL, 2009.

[61] Daniel Sanchez and Christos Kozyrakis. Vantage: Scalable and Efficient Fine-Grain Cache Partitioning. In Proceedings of the 38th Annual International Symposium on Computer Architecture (ISCA-38), San Jose, CA, June 2011.

[62] Jorg Schad, Jens Dittrich, and Jorge-Arnulfo Quiane-Ruiz. Runtime measurements in the cloud: Observing, analyzing, and reducing variance. Proceedings of the VLDB Endowment, 3(1-2):460–471, September 2010.

[63] Malte Schwarzkopf, Andy Konwinski, Michael Abd-El-Malek, and John Wilkes. Omega: Flexible, scalable schedulers for large compute clusters. In Proceedings of EuroSys, Prague, Czech Republic, 2013.

[64] Alan Shieh, Srikanth Kandula, Albert Greenberg, and Changhoon Kim. Seawall: Performance isolation for cloud datacenter networks. In Proc. of the USENIX Conference on Hot Topics in Cloud Computing (HotCloud), Boston, MA, 2010.

[65] David Shue, Michael J. Freedman, and Anees Shaikh. Performance isolation and fairness for multi-tenant cloud storage. In Proc. of the 10th USENIX Conference on Operating Systems Design and Implementation (OSDI), Hollywood, CA, 2012.

[66] Dan Tsafrir, Yoav Etsion, and Dror G. Feitelson. Secretly monopolizing the CPU without superuser privileges. In Proc. of the 16th USENIX Security Symposium, Boston, MA, 2007.

[67] Venkatanathan Varadarajan, Thawan Kooburat, Benjamin Farley, Thomas Ristenpart, and Michael Swift. Resource-freeing attacks: Improve your cloud performance (at your neighbor's expense). In Proc. of the Conference on Computer and Communications Security (CCS), Raleigh, NC, 2012.

[68] Venkatanathan Varadarajan Thomas Ristenpart and MichaelSwift Scheduler-based defenses against cross-vm side-channels In Proc of the 23rd Usenix Security SymposiumSan Diego CA 2014

[69] Venkatanathan Varadarajan Yinqian Zhang Thomas Risten-part and Michael Swift A placement vulnerability study inmulti-tenant public clouds In Proc of the 24th USENIX Secu-rity Symposium (USENIX Security) Washington DC 2015

[70] Huaibin Wang Haiyun Zhou and Chundong Wang Vir-tual machine-based intrusion detection system framework incloud computing environment In Journal of Computers Oc-tober 2012

[71] Hui Wang Canturk Isci Lavanya Subramanian JongmooChoi Depei Qian and Onur Mutlu A-drm Architecture-aware distributed resource management of virtualized clus-ters In Proceedings of the 11th ACM SIGPLANSIGOPSinternational conference on Virtual Execution Environments(VEE) Istanbul Turkey 2015

[72] Ian H Witten Eibe Frank and Geoffrey Holmes DataMining Practical Machine Learning Tools and Techniques3rd Edition

[73] Zhenyu Wu Zhang Xu and Haining Wang Whispers inthe hyper-space High-speed covert channel attacks in thecloud In Proc of the 21st USENIX Conference on SecuritySymposium (USENIX Security) Bellevue WA 2012

[74] Yunjing Xu Michael Bailey Farnam Jahanian KaustubhJoshi Matti Hiltunen and Richard Schlichting An explo-ration of l2 cache covert channels in virtualized environmentsIn Proc of the 3rd ACM Workshop on Cloud Computing Se-curity Workshop (CCSW) Chicago IL 2011

[75] Zhang Xu Haining Wang and Zhenyu Wu A measurementstudy on co-residence threat inside the cloud In Proc ofthe 24th USENIX Security Symposium (USENIX Security)Washington DC 2015

[76] Yuval Yarom and Katrina Falkner Flush+reload a highresolution low noise l3 cache side-channel attack In Proc ofthe 23rd Usenix Security Symposium San Diego CA 2014

[77] Matei Zaharia Mosharaf Chowdhury Tathagata Das AnkurDave Justin Ma Murphy McCauly Michael J FranklinScott Shenker and Ion Stoica Resilient distributed datasetsA fault-tolerant abstraction for in-memory cluster computingIn Proceedings of NSDI San Jose CA 2012

[78] Yinqian Zhang Ari Juels Alina Oprea and Michael K Re-iter Homealone Co-residency detection in the cloud via side-channel analysis In Proc of the IEEE Symposium on Securityand Privacy Oakland CA 2011

[79] Yinqian Zhang Ari Juels Michael K Reiter and ThomasRistenpart Cross-tenant side-channel attacks in paas cloudsIn Proc of the ACM SIGSAC Conference on Computer andCommunications Security (CCS) Scottsdale AZ 2014

[80] Yinqian Zhang Ari Juels Michael K Reiter and ThomasRistenpart Cross-vm side channels and their use to ex-tract private keys In Proceedings of the ACM Conferenceon Computer and Communications Security (CCS) RaleighNC 2012

[81] Yinqian Zhang and Michael K Reiter Duppel retrofittingcommodity operating systems to mitigate cache side channelsin the cloud In Proc of the ACM Conference on Computerand Communications Security (CCS) Berlin Germany 2013

[82] Fangfei Zhou Manish Goel Peter Desnoyers and Ravi Sun-daram Scheduler vulnerabilities and coordinated attacks incloud computing J Comput Secur 21(4)533ndash559 July2013

[83] Jieming Zhu Pinjia He Zibin Zheng and Michael R LyuTowards online accurate and scalable qos prediction for run-time service adaptation In Proc of the IEEE InternationalConference on Distributed Computing Systems (ICDCS)Madrid Spain 2014

[84] Sergey Zhuravlev Sergey Blagodurov and Alexandra Fe-dorova Addressing shared resource contention in multicoreprocessors via scheduling In Proc of the Fifteenth Editionof on Architectural Support for Programming Languages andOperating Systems (ASPLOS) Pittsburgh PA 2010


with zero disk usage, signaling a memcached workload with very high likelihood. Similar graphs can be created to fingerprint other application types. The high-heat areas around the red regions of each graph correspond to memcached jobs with different rd:wr ratios and value sizes, and to memory-bound workloads like Spark.

3.3 Challenges

Multiple co-residents: When a single application shares a host with Bolt, detection is straightforward. However, cloud operators colocate VMs on the same physical host to maximize the infrastructure's utility. Disentangling the contention caused by several jobs is challenging. By default, Bolt uses two benchmarks - one core and one uncore - to measure resource pressure. If the recommender cannot determine the type of the co-scheduled workload based on them (all Pearson correlation coefficients below 0.1), either the application type has not been seen before, or the resource pressure is the result of multiple co-residents. If at least one of the co-residents shares a core with Bolt, i.e., the core benchmark returns non-zero pressure, we profile with an additional core benchmark and use it to determine the type of the co-runner. Because hyperthreads are not shared between multiple active instances, this allows accurately measuring pressure in core resources. The remainder of the uncore interference is used to determine the other co-scheduled workloads. This assumes a linear relationship between jobs for the resources of memory bandwidth and I/O bandwidth, which can introduce some inaccuracies, but in practice does not significantly impact Bolt's detection ability (see Section 3.4).
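As a rough sketch of this matching-and-disentangling step, the logic looks like the following; the data layout, fingerprint values, and function names are illustrative assumptions, not Bolt's actual implementation:

```python
# Sketch of matching a measured pressure profile against training-set
# fingerprints via Pearson correlation, and attributing the residual
# pressure to other co-residents. All values below are made up.
THRESHOLD = 0.1  # below this, no known application type matches

# Hypothetical fingerprints: per-resource pressure (e.g. L1-i, MemBW, LLC, DiskBW).
TRAINING_SET = {
    "memcached": [80.0, 10.0, 75.0, 5.0],
    "hadoop":    [30.0, 60.0, 40.0, 70.0],
    "spark":     [20.0, 85.0, 65.0, 10.0],
}

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def best_match(measured):
    """Return the training label with the highest correlation, or None if
    every coefficient falls below the 0.1 threshold (unknown type or
    multiple co-residents)."""
    scores = {label: pearson(measured, fp) for label, fp in TRAINING_SET.items()}
    label, score = max(scores.items(), key=lambda kv: kv[1])
    return label if score >= THRESHOLD else None

def split_residual(total, first_corunner):
    """Assuming (as Bolt does) a linear relationship in uncore resources,
    attribute the remaining pressure to the other co-residents."""
    fp = TRAINING_SET[first_corunner]
    return [max(0.0, t - f) for t, f in zip(total, fp)]
```

For instance, if the total measured pressure matches memcached plus a residual, `split_residual` yields the pressure vector the remaining co-residents are responsible for.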

Finally, there are occurrences where none of the co-scheduled applications share a core with Bolt. In this case, the system must rely solely on uncore resource pressure to detect applications. To disentangle the resource characteristics across co-residents, Bolt employs a shutter profiling mode, which involves frequent, brief profiling phases (10-50 msec each) in uncore resources. The goal of this mode is to capture at least one of the co-scheduled workloads during a low-pressure phase, which would reveal the resource usage of only one of the co-residents, much like a high-speed camera shutter attempts to capture a fast-moving object still. Figure 3 shows how shutter profiling works. When the shutter is open (upper left figure), and thus profiling is off, Bolt cannot differentiate between the two victim VMs. When the shutter is closed during profiling, Bolt checks for changes in uncore resource pressure. If both victims are at high load, differentiating between them is difficult (lower left); similarly when both VMs are at low load (lower right). However, when only VM2 is at low load, the pressure Bolt captures is primarily from VM1, allowing the system to detect its type and, based on the remaining pressure, also detect VM2. This technique is particularly effective for user-interactive applications that go through intermittent phases of low load; it is less effective for mixes of services with constant steady-state load, such as long-running analytics or logging services. We will

Figure 3: Shutter profiling mode.

consider whether additional input signals, such as per-job cache miss rate curves, can improve detection accuracy for the latter workloads.

Application phases: Datacenter applications are notorious for going through multiple phases during their execution. Online services in particular follow diurnal patterns, with high load during the day and low load during the night [3, 50, 44]. Moreover, an application may not be immediately detected, especially during its initialization phase. Cloud users may purchase instances to run a service and then maintain them to execute other applications, e.g., analytics over different datasets. In these cases, the results of Bolt's detection can become obsolete over time. We address this by periodically repeating the profiling and detection steps for co-scheduled workloads. Each iteration takes 2-5 seconds for profiling and a few milliseconds for the recommender's analysis. Section 3.4 shows a sensitivity study on how frequently detection needs to happen.
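The periodic re-detection loop could look roughly like the following sketch; the function names and the `profile`/`classify` hooks are assumptions for illustration, not Bolt's actual interfaces:

```python
# Sketch of periodic re-detection to handle application phases: profile for
# a few seconds, run the recommender, and repeat so results do not go stale.
# The 20-second default interval matches the setting used in the paper's
# controlled experiment.
import time

def detection_loop(profile, classify, on_change, interval_s=20, rounds=3):
    """profile() returns a resource-pressure vector (2-5 s of profiling);
    classify() maps it to an application label; on_change() is notified
    whenever the detected label differs from the previous round."""
    previous = None
    for _ in range(rounds):
        label = classify(profile())
        if label != previous:
            on_change(label)  # e.g., victim switches from mcf to an SVM job
            previous = label
        time.sleep(interval_s)
    return previous
```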

3.4 Detection Accuracy

We first evaluate Bolt's detection accuracy in a controlled environment where all applications are known. We use a 40-machine cluster with 8-core (2-way hyperthreaded) Xeon-class servers and schedule a total of 108 workloads, including batch analytics in Hadoop [46] and Spark [77], and latency-critical services such as webservers, memcached [26], and Cassandra [10]. For each application type there are several different workloads with respect to algorithms, framework versions, datasets, and input load patterns. Friendly applications are scheduled using a least-loaded (LL) scheduler that allocates resources on the machines with the most available compute, memory, and storage. All workloads are provisioned for peak requirements to reduce interference. Even so, interference between co-residents exists, since the scheduler does not account for the sensitivity applications have to contention. At the end of this section, we evaluate how a scheduler that accounts for cross-application interference affects detection accuracy. The training set consists of 120 diverse applications that include webservers, various analytics algorithms and datasets, and several key-value stores and databases. The training set is selected to provide sufficient coverage of the space of resource characteristics. Figure 4 shows the pressure jobs in the training set put in compute, memory, storage, and network bandwidth.

Figure 4: Coverage of resource characteristics for applications in the training set (CPU, memory, network, and storage pressure).

Applications    Detection accuracy (%)
                Least Loaded    Quasar
Aggregate            87            89
memcached            78            80
Hadoop               92            92
Spark                85            86
Cassandra            90            89
speccpu2006          84            85

Table 1: Bolt's detection accuracy in the controlled experiment with the least loaded scheduler and Quasar.

While there are clusters of applications that saturate several resources, the selected workloads cover the majority of the resource usage space. This enables Bolt to match any resource profile to information from the training set, even if the new application has not been previously seen. Increasing the training set size further did not improve detection accuracy. Finally, note that there is no overlap between training and testing sets in terms of algorithms, datasets, and input loads.

In each server we instantiate an adversarial VM running Ubuntu 14.04. By default, the VM has 4 vCPUs (2 physical cores) to generate enough contention (Fig. 10 shows a sensitivity analysis on the VM size). The remainder of each machine is allocated to one or more friendly VMs. The maximum number of VMs, 5 in our setting, depends on the available servers. Friendly VMs run Ubuntu 14.04 or Debian 8.0. Adversarial VMs have no a priori information on the number and type of their co-residents. Applications are allowed to share a physical core, but must be on different hyperthreads (vCPUs). While sharing a single hyperthread is common in private deployments hosting batch jobs in small containers [7], it is uncommon in public clouds, where 1 vCPU is the minimum guaranteed size of dedicated resources for non-idling instances.

Table 1 shows Bolt's detection accuracy per application class and in aggregate. We signal a detection as correct if Bolt correctly identifies the framework (e.g., Hadoop) or service (e.g., memcached) the application uses, and the algorithm, e.g., SVM on Hadoop, or user load characteristics, e.g., read- vs. write-heavy load. We do not currently examine more detailed features, such as the distribution of individual query types. Bolt correctly identifies 87% of jobs, and for certain application classes, like databases and analytics, the accuracy

Figure 5: Star charts showing the resource profiles of two Hadoop jobs (word count and recommender) and one unknown job. The closer the shaded area is to a vertex, the higher the pressure in the specific resource. (Similarity of the unknown job: 0.29 to word count, 0.78 to the recommender.)

Figure 6: Detection accuracy as a function of the number of co-runners (left) and the apps' dominant resources (right).

exceeds 90%. Misclassified jobs are typically identified as workloads with the same or similar critical resources.

Per-application profiles: Note that, although in Table 1 we group applications by programming framework or online service, each framework does not correspond to a single resource profile. Profiles of applications within a framework can vary greatly depending on functionality, complexity, and dataset features. Bolt's recommender system matches resource profiles to specific algorithmic classes and dataset characteristics within each framework. Figure 5 shows an example where two Hadoop jobs, one running word count on a small dataset and one running a recommender system on a very large dataset, exhibit very different resource profiles. While the third application is also a Hadoop job, it is identified as very similar to the recommender, as opposed to word count.

Number of co-residents: The number of victim applications affects detection accuracy. Figure 6a shows accuracy as a function of the number of victims per machine. When the number of co-residents is less than 2, accuracy exceeds 95%. As the number increases, accuracy drops, since it becomes progressively more difficult to differentiate between the contention caused by each workload. When there are 5 co-scheduled applications, accuracy is 67%. Interestingly, accuracy is higher for 4 than for 3 applications, since with 4 workloads the probability of sharing a core, and thus getting an accurate measurement of core pressure, is higher. While aggressive multi-tenancy affects accuracy, large numbers of co-residents in public clouds are unlikely in practice [41].
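The profile comparison behind the Figure 5 star charts can be sketched as follows; the paper does not specify its similarity metric, so this uses cosine similarity over the ten resource axes as an illustrative stand-in, and the profile values are made up:

```python
# Sketch of comparing ten-dimensional resource profiles, as in the star
# charts: two known Hadoop jobs and one unknown job. The similarity metric
# (cosine) and all pressure values are illustrative assumptions.
import math

AXES = ["CPU", "L1-i", "L1-d", "L2", "LLC",
        "MemCap", "MemBW", "NetBW", "DiskCap", "DiskBW"]

def cosine_similarity(p, q):
    """Cosine of the angle between two resource-pressure vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm

# Hypothetical profiles (pressure 0-1 per axis), loosely mimicking Figure 5.
word_count  = [0.3, 0.2, 0.2, 0.1, 0.2, 0.2, 0.3, 0.2, 0.6, 0.7]
recommender = [0.7, 0.4, 0.4, 0.2, 0.6, 0.7, 0.7, 0.3, 0.4, 0.5]
unknown     = [0.7, 0.4, 0.5, 0.2, 0.6, 0.6, 0.7, 0.3, 0.4, 0.5]

# The unknown job matches the recommender far better than word count,
# mirroring how the third Hadoop job in Figure 5 is classified.
assert cosine_similarity(unknown, recommender) > cosine_similarity(unknown, word_count)
```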

Dominant resource: We now examine the correlation between the resource an application puts the most pressure on and Bolt's ability to correctly detect it (Figure 6). The numbers on each bar show how many applications have each resource as dominant. In general, the applications that are most easily detected are ones with high instruction cache (L1-i), memory bandwidth, network bandwidth, and disk capacity pressure. These are workloads that fall in one of the following categories: latency-critical services with large codebases, such as webservers; in-memory analytics, like Spark; and disk-bound analytics, like Hadoop. Interestingly, L2 activity is a poor indicator of application type, in contrast to L1 and LLC, since it does not capture a significant change in the working set size (from 32KB to 256KB for our platforms).

Number of iterations: In this controlled experiment, we stop the detection process upon correct identification of a workload. For several jobs this requires more than one iteration of profiling and data mining. Figure 7a shows the fraction of workloads that were correctly identified after N iterations. 71% of victims only require a single iteration, while an additional 15% required a second. A small fraction of applications needed more than two iterations, while jobs that were not identified correctly by the sixth iteration did not benefit from additional iterations. The number of applications per machine also affects the iterations required for detection (Figure 7b). When a single job is scheduled on a machine, one iteration is sufficient for almost all workloads. As the number of victims per machine increases, additional iterations are needed to disentangle the contention signals of each co-scheduled job.

Figure 8 shows a case of detection over several iterations. The victim is a 4-vCPU instance executing different consecutive jobs. Detection occurs every 20 sec by default (see Fig. 10 for a sensitivity study on profiling frequency). After the initial iteration, Bolt detects the job as mcf from SPECCPU2006. In the next two iterations, the interference profile of the victim continues to fit mcf's characteristics. At t=60 sec, Bolt detects that the job profile has changed and now resembles an SVM classifier running on Mahout [46]. Similarly, at t=180 sec, the victim changes again, to a Spark data mining workload. Changes are typically captured within a few seconds.

Resource pressure: Figure 9 shows the correlation between the pressure of a victim in a given resource and Bolt's ability to correctly detect it. We plot detection accuracy for three core and three uncore resources; the results are similar for the remaining four. In almost all cases, very low or very high pressure carries the most value for detection. For disk bandwidth, accuracy remains high except for the 20-50% region, which contains many application classes with moderate disk activity, including analytics in Hadoop, databases, and graph applications. Resource pressure also affects the number of iterations Bolt needs to correctly identify a workload. For resources with a lot of value for detection, such as

Figure 7: PDFs of iterations required for detection in total (left) and as a function of the number of co-residents (right).

Figure 8: Example of workload phase detection by Bolt (resource pressure over time as the victim runs SPEC, Hadoop, Spark, memcached, and Cassandra jobs in sequence).

the L1-i cache and LLC, when pressure is moderate, several iterations are needed for accuracy to increase. In contrast, for off-chip resources, like network bandwidth, this effect is less pronounced. In general, when moderate pressure hurts detection accuracy (Figure 9), more iterations are needed for correct identification.

Scheduler: So far we have used a least loaded scheduler, which is commonly used in datacenters [33, 63]. This scheduler does not account for resource contention between applications, leading to suboptimal performance [16, 24, 5, 84, 82]. Recent work has shown that if interference is accounted for at the scheduler level, both performance and utilization improve [48, 19, 53]. We now evaluate the impact of such a scheduler on Bolt's detection accuracy. Quasar [19] leverages machine learning to quickly determine which applications can be co-scheduled on the same machine without destructive interference. We use Quasar to schedule all victim applications and then inject Bolt in each physical host for detection. In a real setting, Bolt can trick an interference-aware scheduler by incurring very low levels of interference initially, until it is co-scheduled with a victim application and can determine its type and characteristics.

Table 1 shows Bolt's accuracy with the least loaded scheduler (LL) and Quasar. Interestingly, accuracy increases slightly with Quasar, 2% on average, but in general the impact of the scheduler is small. There are two reasons for the increase: first, both LL and Quasar do not share a single hyperthread between jobs, so core pressure measurements remain accurate. Second, because Quasar only co-schedules jobs with different critical resources, it provides Bolt with a less "noisy" interference signal for uncore resources, making it easier to distinguish co-residents. Therefore, reducing interference in software alone is not sufficient to mitigate these security threats.

Figure 9: Detection accuracy as a function of the pressure victims place in various shared resources (shown for L1-i, LLC, CPU, MemCap, NetBW, and DiskBW).

Figure 10: Sensitivity to (a) profiling frequency, (b) adversarial VM size, and (c) number of profiling benchmarks.

Sensitivity analysis: Finally, we examine how design decisions in Bolt affect detection. Figure 10a shows how accuracy changes as profiling frequency decreases. For profiling intervals beyond 30 sec, accuracy drops rapidly. If profiling only occurs every 5 minutes, almost half the victims are incorrectly identified. Note that if workloads are long-running and mostly stable, profiling can be less frequent without such an impact on accuracy. Figure 10b shows accuracy as a function of the adversarial VM's size. We examine sizes offered as on-demand instances by EC2 [1]. If the adversary has fewer than 4 vCPUs, its resources are insufficient to create enough contention to capture the co-residents' pressure. Accuracy continues to grow for larger sizes; however, the larger the adversarial instance is, the less likely it will be co-scheduled with other VMs, nullifying the value of Bolt. Finally, Figure 10c shows accuracy as a function of the number of microbenchmarks used for profiling. A single benchmark is not sufficient to fingerprint the characteristics of a workload; however, using more than 3 benchmarks has diminishing returns in accuracy. Unless otherwise specified, we use 20-sec profiling intervals, 4-vCPU adversarial VMs, and 2 benchmarks for initial profiling.

3.5 Limitations

While Bolt can accurately detect most frequently-run workloads, it has certain limitations. First, when no job shares a core with the adversary, the system assumes a linear relation between the co-residents' resource pressure in uncore resources. While this is true in some cases, it may not be generally accurate. We will explore alternative ways of distinguishing co-residents in future work. Second, the system cannot differentiate between multiple jobs running in a single instance and multiple instances running one job each. While this does not affect detection, if a covert-channel attack is aimed at a particular user, more accurate information would increase the adversary's leverage. Similarly, Bolt cannot currently distinguish between two copies of the same job sharing a platform at low load and one copy of the job that runs at higher load.

4. Cloud User Study

We now evaluate Bolt in a real-world environment with multiple users and unknown applications. We conducted a multi-user study on Amazon EC2, subject to the following rules:

• We engaged 20 independent users from two academic institutions (Cornell University and Stanford University), provided them with access to a shared cluster of 200 c3.8xlarge instances on EC2, with 32 vCPUs and 60GB of memory each, and asked them to submit several applications of their choosing.

• On each instance we maintain a 4-vCPU VM for Bolt. All other resources are made available to the users as VMs.

• Each user submits one or more applications, and all users have equal priority, i.e., no application will be preempted to accommodate the job of another user.

• Users were instructed to behave amenably, i.e., to avoid consuming disproportionate resources, so that the system can observe a representative job sample from all users.

• Users were also advised to pin their applications to specific cores to eliminate interference from the OS scheduler. Physical cores can be shared across jobs; however, each vCPU (hardware thread) is dedicated to a single application.

• Users can optionally select the instance(s) they want to launch their applications on. We provide a spreadsheet with a list of all available instances and their utilization at each point in time. This information, as well as any a priori information on the type, number, and characteristics of submitted applications, is concealed from Bolt.

• If a user does not select an instance, a least-loaded scheduler selects the VM with the most available cores and memory.

Once users start launching jobs in the cluster, we instantiate Bolt on each machine and activate it periodically to detect the type, characteristics, and number of co-scheduled applications. The entire experiment lasts approximately 4 hours, beyond which point all instances are terminated. Figure 11 shows the PDF of application types the different users launched, provided by the users after the completion of the experiment. Different colors correspond to different users. A total of 436 jobs were submitted to the cluster. The application mix includes analytics jobs in Hadoop and Spark, scientific benchmarks (Parsec, Bioparallel), hardware synthesis tools (Cadence, Vivado HLS), multicore (zsim) and n-body simulations, email clients, web browsers, web servers (http), music and video streaming services, and databases, among others. Each job may take one or more vCPUs. Even though

Figure 11: Probability distribution function of applications launched in the cloud study, per user. Application labels: 1: hadoop, 2: spark, 3: email, 4: browser, 5: cadence, 6: zsim, 7: video, 8: latex, 9: MLPython, 10: make, 11: mem$d, 12: http server, 13: spec, 14: matlab, 15: mysql, 16: vivado, 17: parsec, 18: vim, 19: scala, 20: php, 21: postgres, 22: musicStream, 23: minebench, 24: n-body sim, 25: ppt, 26: OS img, 27: pdfview, 28: scons, 29: du -h, 30: cr/del cgroup, 31: bioparallel, 32: storm, 33: cpu burn, 34: audacity, 35: javascript, 36: create VMs, 37: html, 38: cassandra, 39: mongoDB, 40: mkdir, 41: cp/mv, 42: sirius, 43: oProfile, 44: dwnld LF, 45: rsync, 46: ping, 47: photoshop, 48: ssh, 49: rm, 50: skype, 51: zipkin, 52: graphX, 53: ix.

Figure 12: Detection accuracy of (a) application labels and (b) resource characteristics; (c) number of jobs per instance over time.

Figure 11 groups applications by framework, the individual logic and datasets differ between jobs in the same framework. Note that we have not updated the training set for this study; it consists of the same 120 applications described in Section 3.4.

Figure 12a shows the number of applications that were correctly identified by name across the different categories. In total, 277 jobs are correctly labeled. As expected, Bolt could not assign a label to application types it has not seen before, such as email clients or image editing applications. However, even when Bolt cannot label an application, it can still identify the resources it is sensitive to, as shown in Figure 12b. For 385 out of 436 jobs, Bolt correctly identifies the job's resource characteristics. For performance attacks, such as the ones described in Section 5, this information is sufficient to dramatically degrade the performance of a victim application. Finally, Figure 12c shows the number of co-scheduled applications per instance throughout the duration of the experiment. The bottom 14 instances remained unused; for the remaining 186 instances, the number of active applications ranges from 1 to 6, with each job taking one or more vCPUs. The majority of misclassified applications correspond to instances with 5 or more concurrently active jobs.

5. Security Attacks

Bolt makes several cloud attacks practical and difficult to detect. We discuss three possible attacks that leverage the information obtained through detection.

5.1 Internal Denial of Service Attack

Attack setting: Denial of service attacks hurt the performance of a victim service by overloading its resources [14, 2, 51]. In cloud settings specifically, they can be categorized in two types: external and internal (or host-based) attacks. External attacks are the most conventional form of DoS [51, 13, 22]. These attacks utilize external servers to direct excessive traffic to the victims, flooding their resources and hurting their availability. External DoS attacks affect mostly PaaS and SaaS systems, and include IP spoofing, synchronization (SYN) flooding, smurf, buffer overflow, land, and teardrop attacks.

In contrast, internal DoS attacks take advantage of IaaS and PaaS cloud multi-tenancy to launch adversarial programs on the same host as the victim and degrade its performance [14, 60, 34, 52]. For example, Ristenpart et al. [60] showed how an adversarial user can leverage the IP naming conventions of IaaS clouds to locate a victim VM and degrade its performance. Cloud providers are starting to build defenses against such attacks. For example, EC2 offers autoscaling that will increase the instances of a service under heavy resource usage. Similarly, there is related work on mechanisms that detect saturation in memory bandwidth

Figure 13: Latency and utilization with Bolt and a naïve DoS attack that saturates CPU resources.

and can trigger VM migration [71]. This means that DoS attacks that simply overload a physical host are ineffective when such defenses are in place. We focus on host-based DoS attacks and make them resilient against such defenses by avoiding resource saturation. We leverage the information obtained by Bolt to construct custom interference programs that stress the resources the victim is most sensitive to, without overloading the host.

Bolt creates custom contentious workloads by combining the microbenchmarks used to measure the resource pressure of the victim. Since these microbenchmarks are tunable, Bolt configures their intensity to a higher point than their measured pressure c_i during detection. For example, if a victim is detected as memcached and its most critical resources are the L1-i cache (81% pressure) and the last level cache (LLC) (78% pressure), Bolt uses the two microbenchmarks for L1-i and LLC at a higher intensity than what memcached can tolerate.

Impact: We use the DoS attack setting above against the 108 applications of the controlled experiment (Section 3.4). The DoS attack degrades performance by 2.2x on average, and up to 9.8x, in terms of execution time. Degradation is much more pronounced for interactive workloads, like key-value stores, with tail latency increasing by 8-140x, a dramatic impact for applications with strict tail latency SLAs.
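Returning to the memcached example, the intensity-tuning step might be sketched as follows; the 10% margin and the 95% saturation cap are illustrative assumptions (the text only states that intensity is set above the measured pressure while avoiding saturation):

```python
# Sketch of assembling a custom DoS workload: for each of the victim's most
# critical resources, run the matching tunable microbenchmark at an
# intensity above the measured pressure c_i. The margin and cap values are
# illustrative, not Bolt's actual parameters.
def build_dos_config(victim_pressure, top_k=2, margin=10.0):
    """victim_pressure: dict resource -> measured pressure (% of capacity).
    Returns the top-k critical resources with an intensity above what the
    victim can tolerate, capped below saturation to evade migration
    triggers."""
    critical = sorted(victim_pressure.items(),
                      key=lambda kv: kv[1], reverse=True)[:top_k]
    return {resource: min(pressure + margin, 95.0)
            for resource, pressure in critical}

# Example from the text: memcached with L1-i at 81% and LLC at 78%.
config = build_dos_config({"L1-i": 81.0, "LLC": 78.0, "MemBW": 20.0})
# config -> {"L1-i": 91.0, "LLC": 88.0}
```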

We now examine the impact of the DoS attack on utilization. If it translates to resource saturation, there is a high probability that the DoS will be detected and the victim migrated to a new machine. The experimental cluster supports live migration; if CPU utilization (sampled every 1 sec) exceeds 70%, the victim is migrated to an unloaded host. Figure 13 compares the tail latency and CPU utilization with Bolt to those of a naïve DoS that saturates the CPU through a compute-intensive kernel. We focus on a single victim VM running memcached. The overhead of migration (the time between initiating migration and latency returning to normal levels) for this VM is 8 sec. Performance degradation is similar for both systems until t=80 sec, at which point the victim is migrated to a new host, due to the compute-intensive kernel causing utilization to exceed 70%. While during migration performance continues to degrade, once the VM resumes on the new server, latency returns to nominal levels. In contrast, Bolt keeps utilization low and impacts the victim's performance beyond t=80 sec.
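The migration defense modeled in this experiment amounts to a simple threshold monitor; a minimal sketch, with illustrative names:

```python
# Sketch of the migration defense the experiment models: sample CPU
# utilization once per second and trigger live migration when any sample
# exceeds 70%. A naive CPU-saturating DoS trips this trigger; Bolt's
# attack deliberately stays below it.
UTIL_THRESHOLD = 70.0

def should_migrate(util_samples):
    """util_samples: per-second CPU utilization readings (%)."""
    return any(u > UTIL_THRESHOLD for u in util_samples)

assert should_migrate([40, 55, 92])      # naive DoS saturates the CPU
assert not should_migrate([40, 55, 62])  # Bolt keeps utilization low
```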

5.2 Resource Freeing Attack

Attack setting: The goal of a resource freeing attack (RFA) is to modify the workload of a victim such that it frees up resources for the adversary, improving its performance [67, 66]. The adversarial VM consists of two components: a beneficiary and a helper. The beneficiary is the program whose performance the attacker wants to improve, and the helper is the program that forces the victim to yield its resources. The RFA works by adding load in the victim's critical resources, causing other resources to free up. For example, if the victim is a memory-bound Spark job, introducing additional memory traffic will result in Spark stalling in memory and lessening its pressure on the other resources until it can reclaim its required memory bandwidth. When launched in a public cloud, the victim of an RFA ends up paying more and achieving worse performance compared to running in isolation.

We now create a proof-of-concept RFA for a webserver, a network-bound Hadoop job, and a memory-bound Spark job. Launching RFAs requires in-depth knowledge of the victim's resource requirements [67], which Bolt provides by identifying the victim's dominant resource. Once the dominant resource is detected, the runtime applies a custom helper program that saturates this critical resource. For the webserver, the helper is a CPU-intensive benchmark launching CGI requests, causing the victim to saturate its CPU usage, servicing fewer real user requests and freeing up cache resources. For the Hadoop job we use a network-intensive benchmark similar to iperf, which saturates network bandwidth and frees up CPU and memory resources for the beneficiary. Similarly, for Spark we use a streaming memory benchmark that slows down k-means and frees network and compute resources for the adversary. The selection of the beneficiary is of lesser importance; without loss of generality we select mcf, a CPU-intensive benchmark from SPEC CPU2006. The same methodology can be used with other beneficiaries, provided that their critical resource does not overlap with the victim's.

Impact: Table 2 shows the performance degradation for the three victim applications and the improvement in execution time for the beneficiary. The webserver suffers the most in terms of queries per second, as the helper's CGI requests pollute its cache, preventing it from servicing legitimate user requests. Hadoop and Spark experience significant degradations in execution time due to network and memory bandwidth saturation, respectively. Execution time for mcf improves by 16-38%, as it benefits from the victim stalling in its critical resource to improve its own cache and CPU usage.

5.3 VM Co-residency Detection

Attack setting: Sections 3.4 and 4 showed that we can identify the applications sharing a cloud infrastructure. However, a malicious user is rarely interested in a random service running on a public cloud. More often, they target a specific

Victim App (Perf)                Beneficiary App (Perf)    Target Resource
Apache Webserver (-64% QPS)      mcf (+24%)                CPU
Hadoop SVM (-36% exec time)      mcf (+16%)                Network BW
Spark k-means (-52% exec time)   mcf (+38%)                Memory BW

Table 2: RFA impact on the victims and beneficiaries.
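The helper-selection step described above can be sketched as a simple dispatch from the victim's detected dominant resource to a saturating helper; the benchmark names below are illustrative stand-ins for the ones used in the text:

```python
# Illustrative mapping from a victim's dominant resource (as detected
# by Bolt) to a helper program that saturates it. Names are assumptions.

HELPERS = {
    "CPU":    "cgi_request_storm",  # CPU-intensive CGI requests (webserver victim)
    "NetBW":  "iperf_like_stream",  # saturates network bandwidth (Hadoop victim)
    "MemBW":  "memory_stream",      # streaming memory traffic (Spark victim)
}

def pick_helper(dominant_resource, beneficiary="mcf"):
    """Return the (helper, beneficiary) pair for an RFA. The beneficiary
    is chosen so its critical resource does not overlap the victim's."""
    return HELPERS[dominant_resource], beneficiary

print(pick_helper("NetBW"))  # ('iperf_like_stream', 'mcf')
```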

workload, e.g., a competitive e-commerce site. Therefore, they need to pinpoint where the target resides in a practical manner. This requires a launch strategy and a mechanism for co-residency detection [69]. The attack is practical if the target is located with high confidence, in reasonable time, and with modest resource costs. Bolt enables this attack to be carried out in a practical way, and remains resilient against defense mechanisms such as virtual private clouds (VPCs), which make internal IP addresses private to a single tenant. Once a target VM is located, the adversary can launch RFA or DoS attacks as previously described. Co-residency detection attacks rely on leakage of logical information, e.g., IP addresses, or on observing the impact of a side-channel attack due to resource contention. Bolt relies on a variation of the latter approach.

Assume a system of N servers. A target victim user launches k VMs, and the adversary launches n malicious VMs. Instances are launched simultaneously to avoid malicious-malicious instance co-residency [69]. The probability that at least one adversarial VM will be co-scheduled with a victim instance is P(f) = 1 - (1 - k/N)^n. The adversary then uses a two-process detection scheme. The sender is a process that creates contention to degrade the victim's performance. The receiver is the process that detects this degradation, running either on the same host as the victim (cooperative detection) or externally (uncooperative detection). We assume an uncooperative victim, which is the more general case. Bolt works as the sender, determining the type of co-residents on each sampled host and identifying any of the desired type. Assume m VMs of the desired type have been detected in the sample set. Bolt now needs to prune down this space to VM(s) from the specific target victim. It launches an external receiver aimed at the victim over a public channel, e.g., HTTP or a key-value protocol, to ping the victim service. This is the only point of communication with the victim. At the same time, the sender incurs contention in the resources the victim is most sensitive to. If the sender and victim are co-residents, the receiver's requests will be slower due to contention. Bolt quickly prunes down the sampled VMs by determining the type of co-residents. In a large cluster this reduces the time to pinpoint a victim and the resource costs for the adversary. It also increases the confidence in the detection in the presence of noise from other co-residents.

Impact: We evaluate co-residency detection on the same 40-node cluster. The victim is a SQL server instantiating a single VM. The cluster also hosts 7 other VMs running SQL; other active VMs host key-value stores, Hadoop, and Spark jobs. Bolt launches 10 senders simultaneously on randomly-selected machines (P(f) = …) and detects 3 VMs running SQL in the sample set. It then introduces memory interference as the receiver initiates a large number of SQL queries. While these queries have 8.16 msec mean latency without interference, latency now becomes 26.14 msec. Given a ~3x latency increase, we can conclude that one of the co-residents is the victim SQL server. Detection required 6 sec from instantiation to receiver detection, and 11 adversary VMs.
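The co-residency probability and the receiver's contention test above can be expressed directly; the 2x latency threshold is an assumed knob, not a value from the paper:

```python
import math

def p_coloc(N, k, n):
    """Probability that at least one of n adversarial VMs lands on the
    same host as one of the victim's k VMs across N servers:
    P(f) = 1 - (1 - k/N)**n."""
    return 1.0 - (1.0 - k / N) ** n

def is_coresident(base_latency_ms, probed_latency_ms, threshold=2.0):
    """Uncooperative detection: flag co-residency when the receiver's
    request latency under sender contention exceeds `threshold` times
    the contention-free baseline (threshold is an assumed knob)."""
    return probed_latency_ms / base_latency_ms >= threshold

# Numbers from the text: 40 servers, 10 senders, a single victim VM;
# SQL query latency rises from 8.16 ms to 26.14 ms (~3.2x).
print(round(p_coloc(40, 1, 10), 3))   # ~0.224
print(is_coresident(8.16, 26.14))     # True
```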

6 Improving Security via Resource Isolation

Since interference is at the heart of Bolt's detection methodology, isolation mechanisms that attenuate contention should reduce its accuracy. We first evaluate to what extent existing isolation techniques mitigate security vulnerabilities, and then highlight trade-offs between security, performance, and utilization. We use the same setting as in the controlled experiment of Section 3.4. We evaluate four resource-specific isolation techniques and three settings for OS-level isolation mechanisms: baremetal, containerized, and virtualized. The baremetal system does not employ any OS-level isolation. The containerized setup uses Linux containers (lxc) and assigns cores to applications via the cpuset cgroup. Both containers and VMs constrain memory capacity. Baremetal experiments do not enforce memory capacity allocations, and the Linux scheduler is free to float applications across cores.

The first resource-specific isolation technique is thread pinning to physical cores, which constrains interference from scheduling actions like context switching. The number of cores an application is allocated can change dynamically, and is limited by how fast Linux can migrate tasks, typically in the tens of milliseconds.

For network isolation we use the outbound network bandwidth partitioning capabilities of Linux's traffic control. Specifically, we use the qdisc [8] scheduler with the hierarchical token bucket (HTB) queueing discipline to enforce egress bandwidth limits. The limits are set to the maximum traffic burst rate of each application to avoid contention (the ceil parameter in the HTB interface). Ingress network bandwidth isolation has been extensively studied in previous work [36]; these approaches can be applied here as well.
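An HTB egress cap of this kind might look as follows; the device name, rates, and the source IP used for classification are illustrative, and the ceil would be set to each VM's measured peak burst rate:

```shell
# Illustrative egress-bandwidth partitioning with tc HTB (requires root).
tc qdisc add dev eth0 root handle 1: htb default 30
tc class add dev eth0 parent 1: classid 1:1 htb rate 10gbit
# Per-VM class: guaranteed 1 Gbit/s, allowed to burst up to its peak (ceil).
tc class add dev eth0 parent 1:1 classid 1:10 htb rate 1gbit ceil 2gbit
# Steer the VM's traffic (here, by source IP) into its class.
tc filter add dev eth0 protocol ip parent 1: prio 1 \
   u32 match ip src 10.0.0.42/32 flowid 1:10
```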

For DRAM bandwidth isolation there is no commercially available partitioning mechanism. To enforce bandwidth isolation we use the following approach: we monitor the DRAM bandwidth usage of each application through performance counters [45] and modify the scheduler to only colocate jobs on machines that can accommodate their aggregate peak memory bandwidth requirements. This requires extensive application knowledge and is used simply to highlight the benefits of DRAM bandwidth isolation.
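The scheduler-level admission test above reduces to a simple budget check; the per-socket bandwidth budget below is an illustrative number, and peak demands are assumed known from performance-counter profiling:

```python
# Sketch of the scheduler-level DRAM-bandwidth "isolation": only admit
# a job if the machine's aggregate peak bandwidth demand still fits.

PEAK_MEMBW_GBPS = 60.0  # illustrative per-machine budget

def can_colocate(resident_peaks, new_job_peak, budget=PEAK_MEMBW_GBPS):
    """Admit the new job only if the sum of peak DRAM-bandwidth
    demands (in GB/s) stays within the machine's budget."""
    return sum(resident_peaks) + new_job_peak <= budget

print(can_colocate([25.0, 20.0], 10.0))  # True  (55 <= 60)
print(can_colocate([25.0, 20.0], 20.0))  # False (65 > 60)
```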

Finally, for last level cache (LLC) isolation we use the Cache Allocation Technology (CAT) available in recent Intel chips [11]. CAT partitions the LLC in ways, which in a highly-associative cache allows for non-overlapping partitions at the granularity of a few percent of the LLC capacity. Each co-resident is allocated one partition configured to its current capacity requirements [57]. Partitions can be resized at runtime by reprogramming specific low-level registers (MSRs); changes take effect after a few milliseconds.

[Figure 14: Detection accuracy (%) as isolation techniques are added cumulatively (none, thread pinning, +network BW partitioning, +memory BW partitioning, +cache partitioning, +core isolation) for baremetal, Linux containers, and virtual machines.]
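Sizing such way-granular partitions can be sketched as computing contiguous capacity bitmasks (CBMs); the 20-way LLC is an illustrative assumption, and this sketch only computes the masks, omitting the MSR writes that actually program CAT:

```python
# Sketch of sizing a contiguous CAT capacity bitmask (CBM) from a
# co-resident's capacity requirement. CAT requires contiguous masks
# of at least one way; the 20-way LLC below is an assumption.

def cbm_for_fraction(fraction, total_ways=20, offset=0):
    """Return a contiguous capacity bitmask covering roughly `fraction`
    of the LLC, starting at way `offset`."""
    ways = max(1, round(fraction * total_ways))
    return ((1 << ways) - 1) << offset

# A co-resident sized to ~20% of a 20-way LLC gets 4 ways: 0b1111.
print(bin(cbm_for_fraction(0.20)))            # 0b1111
# The next partition starts after the first, non-overlapping.
print(bin(cbm_for_fraction(0.10, offset=4)))  # 0b110000
```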

We add one isolation mechanism at a time in the three system settings (baremetal, containers, VMs). Figure 14 shows the impact of isolation techniques on Bolt's detection accuracy. As expected, when no isolation is used, baremetal allows for a significantly higher detection accuracy than container- and VM-based systems, mostly because the latter constrain core and memory capacity usage. As a result, introducing thread pinning benefits baremetal the most, since it reduces core contention. It also benefits container- and VM-based setups, to a lesser degree, by eliminating unpredictability introduced by the Linux scheduler (e.g., context switching) [40]. The dominant resource of each application determines which isolation technique benefits it the most. Thread pinning mostly benefits workloads bound by on-chip resources, such as L1/L2 caches and cores. Adding network bandwidth partitioning lowers detection accuracy for all three settings almost equally. It primarily benefits network-bound workloads, for which network interference conveys the most information for detection. Memory bandwidth isolation further reduces accuracy by 10% on average, benefiting jobs dominated by DRAM traffic. Finally, cache partitioning causes the most dramatic reduction in accuracy, especially for LLC-bound workloads. We attribute this to the importance of cache pressure as a detection signal. The number of co-residents also affects the extent to which isolation helps: the more co-scheduled applications exist per machine, the more isolation techniques degrade accuracy, as they make distinguishing between co-residents harder.

Unfortunately, even when all the techniques are enabled, accuracy is still 50%, for two reasons: current techniques are not fine-grained enough to allow strict and scalable isolation, and core resources (L1/L2 caches, CPU) are prone to interference from contending hyperthreads [61]. To evaluate the latter hypothesis, we modify the scheduler such that hyperthreads of different jobs are never scheduled on the same physical core; e.g., if a job needs 7 vCPUs, it is allocated 4 dedicated physical cores. The grey bars of Figure 14 show the detection accuracy. Baremetal instances still allow certain applications to be detected, but for containerized and virtualized settings accuracy drops to 14%. The remaining accuracy corresponds to disk-heavy workloads. Improving security, however, comes at a performance penalty of 34% on average in execution time, as threads of the same job are forced to contend with one another. Alternatively, users can overprovision their resource reservations to avoid degradation, which results in a 45% drop in utilization. This means that the cloud provider cannot leverage CPU idleness to share machines, decreasing the cost benefits of cloud computing. Note that enforcing core isolation alone is also not sufficient, as it allows a detection accuracy of 46%.

Discussion: The previous analysis highlights a design problem with current datacenter platforms. Traditional multicores are prone to contention, which will only worsen as more cores are integrated in each server and multi-tenancy becomes more pronounced. Existing isolation techniques are insufficient to mitigate security vulnerabilities, and techniques that provide reasonable security guarantees sacrifice performance or cost efficiency through low utilization. This highlights the need for new fine-grained and coordinated isolation techniques that guarantee security at high utilization for shared resources.
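The hyperthread-exclusive allocation policy above amounts to rounding a job's vCPU count up to whole physical cores; two hyperthreads per core is assumed, matching the paper's example:

```python
import math

# Sketch of the modified allocation policy: hyperthreads of different
# jobs never share a physical core, so a job's vCPUs are rounded up
# to whole physical cores (2 hyperthreads per core assumed).

def dedicated_cores(vcpus, threads_per_core=2):
    return math.ceil(vcpus / threads_per_core)

# The example from the text: a job that needs 7 vCPUs is
# allocated 4 dedicated physical cores.
print(dedicated_cores(7))  # 4
```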

7 Conclusions

We have presented Bolt, a practical system that uses online data mining techniques to identify the type and characteristics of applications running on shared cloud systems, and that enables practical attacks degrading their performance. In a 40-server cluster, Bolt correctly identifies 87% of the 108 workloads and degrades tail latency by up to 140x. We also used Bolt in a user study on EC2 to detect the characteristics of unknown jobs submitted by multiple users. Finally, we showed that while existing isolation techniques are helpful, they are not sufficient to mitigate such attacks. Bolt reveals real, easy-to-exploit threats in public clouds. We hope that this work will motivate cloud providers and computer scientists to develop and deploy stricter isolation primitives in cloud systems.

Acknowledgements

We sincerely thank all the participants in the user study for their time and effort. We also thank Daniel Sanchez, Ed Suh, Grant Ayers, Mingyu Gao, Ana Klimovic, and the rest of the MAST group, as well as the anonymous reviewers, for their feedback on earlier versions of this manuscript. This work was supported by the Stanford Platform Lab, NSF grant CNS-1422088, and a John and Norma Balen Sesquicentennial Faculty Fellowship.

References

[1] Amazon EC2. http://aws.amazon.com/ec2.

[2] Aman Bakshi and Yogesh B. Dujodwala. Securing cloud from DDoS attacks using intrusion detection system in virtual machine. In Proc. of the Second International Conference on Communication Software and Networks (ICCSN), 2010.

[3] Luiz Barroso and Urs Hoelzle. The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines. MC Publishers, 2009.

[4] Naomi Benger, Joop van de Pol, Nigel P. Smart, and Yuval Yarom. "Ooh aah... just a little bit": A small amount of side channel can go a long way. In Proc. of the International Cryptographic Hardware and Embedded Systems Workshop (CHES), Busan, South Korea, 2014.

[5] Major Bhadauria and Sally A. McKee. An approach to resource-aware co-scheduling for CMPs. In Proc. of the 24th ACM International Conference on Supercomputing (ICS), Tsukuba, Japan, 2010.

[6] Léon Bottou. Large-scale machine learning with stochastic gradient descent. In Proceedings of the International Conference on Computational Statistics (COMPSTAT), Paris, France, 2010.

[7] Eric Brewer. Kubernetes: The path to cloud native. http://goo.gl/QgkzYB, SOCC keynote, August 2015.

[8] Martin A. Brown. Traffic control HOWTO. http://linux-ip.net/articles/Traffic-Control-HOWTO.

[9] Robin Burke. Hybrid recommender systems: Survey and experiments. User Modeling and User-Adapted Interaction, 12(4):331-370, November 2002.

[10] Apache Cassandra. http://cassandra.apache.org.

[11] Intel® 64 and IA-32 Architectures Software Developer's Manual, vol. 3B: System Programming Guide, Part 2. September 2014.

[12] Ludmila Cherkasova, Diwaker Gupta, and Amin Vahdat. Comparison of the three CPU schedulers in Xen. SIGMETRICS Perform. Eval. Rev., 35(2):42-51, September 2007.

[13] Scott A. Crosby and Dan S. Wallach. Denial of service via algorithmic complexity attacks. In Proceedings of the 12th USENIX Security Symposium, Washington, DC, 2003.

[14] Marwan Darwish, Abdelkader Ouda, and Luiz Fernando Capretz. Cloud-based DDoS attacks and defenses. In Proc. of i-Society, Toronto, ON, 2013.

[15] Christina Delimitrou and Christos Kozyrakis. iBench: Quantifying Interference for Datacenter Workloads. In Proceedings of the 2013 IEEE International Symposium on Workload Characterization (IISWC), Portland, OR, September 2013.

[16] Christina Delimitrou and Christos Kozyrakis. Paragon: QoS-Aware Scheduling for Heterogeneous Datacenters. In Proceedings of the Eighteenth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), Houston, TX, USA, 2013.

[17] Christina Delimitrou and Christos Kozyrakis. QoS-Aware Scheduling in Heterogeneous Datacenters with Paragon. ACM Transactions on Computer Systems (TOCS), Vol. 31, Issue 4, December 2013.

[18] Christina Delimitrou and Christos Kozyrakis. Quality-of-Service-Aware Scheduling in Heterogeneous Datacenters with Paragon. IEEE Micro Special Issue on Top Picks from the Computer Architecture Conferences, May/June 2014.

[19] Christina Delimitrou and Christos Kozyrakis. Quasar: Resource-Efficient and QoS-Aware Cluster Management. In Proceedings of the Nineteenth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), Salt Lake City, UT, USA, 2014.

[20] Christina Delimitrou and Christos Kozyrakis. HCloud: Resource-Efficient Provisioning in Shared Cloud Systems. In Proceedings of the Twenty-First International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), April 2016.

[21] Christina Delimitrou, Daniel Sanchez, and Christos Kozyrakis. Tarcil: Reconciling Scheduling Speed and Quality in Large Shared Clusters. In Proceedings of the Sixth ACM Symposium on Cloud Computing (SOCC), August 2015.

[22] Jake Edge. Denial of service via hash collisions. http://lwn.net/Articles/474912, January 2012.

[23] Benjamin Farley, Ari Juels, Venkatanathan Varadarajan, Thomas Ristenpart, Kevin D. Bowers, and Michael M. Swift. More for your money: Exploiting performance heterogeneity in public clouds. In Proc. of the ACM Symposium on Cloud Computing (SOCC), San Jose, CA, 2012.

[24] Alexandra Fedorova, Margo Seltzer, and Michael D. Smith. Improving performance isolation on chip multiprocessors via an operating system scheduler. In Proceedings of the 16th Intl. Conference on Parallel Architecture and Compilation Techniques (PACT), Brasov, Romania, 2007.

[25] Alexander Felfernig and Robin Burke. Constraint-based recommender systems: Technologies and research issues. In Proceedings of the ACM International Conference on Electronic Commerce (ICEC), Innsbruck, Austria, 2008.

[26] Brad Fitzpatrick. Distributed caching with memcached. Linux Journal, Volume 2004, Issue 124, 2004.

[27] Oded Goldreich and Rafail Ostrovsky. Software protection and simulation on oblivious RAMs. J. ACM, 43(3):431-473, May 1996.

[28] Google Container Engine. https://cloud.google.com/container-engine.

[29] Asela Gunawardana and Christopher Meek. A unified approach to building hybrid recommender systems. In Proc. of the Third ACM Conference on Recommender Systems (RecSys), New York, NY, 2009.

[30] Sanchika Gupta and Padam Kumar. VM profile based optimized network attack pattern detection scheme for DDoS attacks in cloud. In Proc. of SSCC, Mysore, India, 2013.

[31] Yi Han, Tansu Alpcan, Jeffrey Chan, and Christopher Leckie. Security games for virtual machine allocation in cloud computing. In 4th International Conference on Decision and Game Theory for Security, Fort Worth, TX, 2013.

[32] Amir Herzberg, Haya Shulman, Johanna Ullrich, and Edgar Weippl. Cloudoscopy: Services discovery and topology mapping. In Proceedings of the ACM Workshop on Cloud Computing Security (CCSW), Berlin, Germany, 2013.

[33] Ben Hindman, Andy Konwinski, Matei Zaharia, Ali Ghodsi, Anthony D. Joseph, Randy Katz, Scott Shenker, and Ion Stoica. Mesos: A platform for fine-grained resource sharing in the data center. In Proceedings of NSDI, Boston, MA, 2011.

[34] Jingwei Huang, David M. Nicol, and Roy H. Campbell. Denial-of-service threat to Hadoop/YARN clusters with multi-tenancy. In Proc. of the IEEE International Congress on Big Data, Washington, DC, 2014.

[35] Alexandru Iosup, Nezih Yigitbasi, and Dick Epema. On the performance variability of production cloud services. In Proceedings of CCGRID, Newport Beach, CA, 2011.

[36] Vimalkumar Jeyakumar, Mohammad Alizadeh, David Mazières, Balaji Prabhakar, Changhoon Kim, and Albert Greenberg. EyeQ: Practical network performance isolation at the edge. In Proc. of the 10th USENIX Conference on Networked Systems Design and Implementation (NSDI), Lombard, IL, 2013.

[37] Yaakoub El Khamra, Hyunjoo Kim, Shantenu Jha, and Manish Parashar. Exploring the performance fluctuations of HPC workloads on clouds. In Proceedings of CloudCom, Indianapolis, IN, 2010.

[38] Krzysztof C. Kiwiel. Convergence and efficiency of subgradient methods for quasiconvex minimization. Mathematical Programming (Series A), 90(1):1-25, Berlin, Heidelberg: Springer, 2001.

[39] Ruby B. Lee. Rethinking computers for cybersecurity. IEEE Computer, 48(4):16-25, 2015.

[40] Jacob Leverich and Christos Kozyrakis. Reconciling high server utilization and sub-millisecond quality-of-service. In Proceedings of EuroSys, Amsterdam, The Netherlands, 2014.

[41] Host server CPU utilization in Amazon EC2 cloud. http://goo.gl/2LTx4T.

[42] Fangfei Liu, Yuval Yarom, Qian Ge, Gernot Heiser, and Ruby B. Lee. Last-level cache side-channel attacks are practical. In Proc. of the IEEE Symposium on Security and Privacy (S&P), San Jose, CA, 2015.

[43] Fei Liu, Lanfang Ren, and Hongtao Bai. Mitigating cross-VM side channel attack on multiple tenants cloud platform. Journal of Computers, 9(4):1005-1013, April 2014.

[44] David Lo, Liqun Cheng, Rama Govindaraju, Luiz André Barroso, and Christos Kozyrakis. Towards energy proportionality for large-scale latency-critical workloads. In Proceedings of the 41st Annual International Symposium on Computer Architecture (ISCA), Minneapolis, MN, 2014.

[45] David Lo, Liqun Cheng, Rama Govindaraju, Parthasarathy Ranganathan, and Christos Kozyrakis. Heracles: Improving resource efficiency at scale. In Proc. of the 42nd Annual International Symposium on Computer Architecture (ISCA), Portland, OR, 2015.

[46] Mahout. http://mahout.apache.org.

[47] Dave Mangot. EC2 variability: The numbers revealed. http://tech.mangot.com/roller/dave/entry/ec2_variability_the_numbers_revealed.

[48] Jason Mars and Lingjia Tang. Whare-map: Heterogeneity in "homogeneous" warehouse-scale computers. In Proceedings of ISCA, Tel-Aviv, Israel, 2013.

[49] Robert Martin, John Demme, and Simha Sethumadhavan. TimeWarp: Rethinking timekeeping and performance monitoring mechanisms to mitigate side-channel attacks. In Proceedings of the International Symposium on Computer Architecture (ISCA), Portland, OR, 2012.

[50] David Meisner, Christopher M. Sadler, Luiz André Barroso, Wolf-Dietrich Weber, and Thomas F. Wenisch. Power management of online data-intensive services. In Proceedings of the 38th Annual International Symposium on Computer Architecture, pages 319-330, 2011.

[51] Jelena Mirkovic and Peter Reiher. A taxonomy of DDoS attack and DDoS defense mechanisms. ACM SIGCOMM Computer Communication Review (CCR), April 2004.

[52] Thomas Moscibroda and Onur Mutlu. Memory performance attacks: Denial of memory service in multi-core systems. In Proc. of the 16th USENIX Security Symposium (SS), Boston, MA, 2007.

[53] Ripal Nathuji, Aman Kansal, and Alireza Ghaffarkhah. Q-Clouds: Managing performance interference effects for QoS-aware clouds. In Proceedings of EuroSys, Paris, France, 2010.

[54] Simon Ostermann, Alexandru Iosup, Nezih Yigitbasi, Radu Prodan, Thomas Fahringer, and Dick Epema. A performance analysis of EC2 cloud computing services for scientific computing. Lecture Notes on Cloud Computing, Volume 34, pages 115-131, 2010.

[55] Tao Peng, Christopher Leckie, and Kotagiri Ramamohanarao. Survey of network-based defense mechanisms countering the DoS and DDoS problems. ACM Comput. Surv., 39(1), April 2007.

[56] Diego Perez-Botero, Jakub Szefer, and Ruby B. Lee. Characterizing hypervisor vulnerabilities in cloud computing servers. In Proceedings of the 2013 International Workshop on Security in Cloud Computing (SCC@ASIACCS), Hangzhou, China, 2013.

[57] Moinuddin K. Qureshi and Yale N. Patt. Utility-based cache partitioning: A low-overhead, high-performance, runtime mechanism to partition shared caches. In Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 39), 2006.

[58] Himanshu Raj, Ripal Nathuji, Abhishek Singh, and Paul England. Resource management for isolation enhanced cloud services. In Proc. of the ACM Workshop on Cloud Computing Security (CCSW), Chicago, IL, 2009.

[59] Suhail Rehman and Majd Sakr. Initial findings for provisioning variation in cloud computing. In Proceedings of CloudCom, Indianapolis, IN, 2010.

[60] Thomas Ristenpart, Eran Tromer, Hovav Shacham, and Stefan Savage. Hey, you, get off of my cloud: Exploring information leakage in third-party compute clouds. In Proc. of the ACM Conference on Computer and Communications Security (CCS), Chicago, IL, 2009.

[61] Daniel Sanchez and Christos Kozyrakis. Vantage: Scalable and Efficient Fine-Grain Cache Partitioning. In Proceedings of the 38th Annual International Symposium on Computer Architecture (ISCA-38), San Jose, CA, June 2011.

[62] Jörg Schad, Jens Dittrich, and Jorge-Arnulfo Quiané-Ruiz. Runtime measurements in the cloud: Observing, analyzing, and reducing variance. Proceedings of the VLDB Endowment, 3(1-2):460-471, September 2010.

[63] Malte Schwarzkopf, Andy Konwinski, Michael Abd-El-Malek, and John Wilkes. Omega: flexible, scalable schedulers for large compute clusters. In Proceedings of EuroSys, Prague, Czech Republic, 2013.

[64] Alan Shieh, Srikanth Kandula, Albert Greenberg, and Changhoon Kim. Seawall: Performance isolation for cloud datacenter networks. In Proc. of the USENIX Conference on Hot Topics in Cloud Computing (HotCloud), Boston, MA, 2010.

[65] David Shue, Michael J. Freedman, and Anees Shaikh. Performance isolation and fairness for multi-tenant cloud storage. In Proc. of the 10th USENIX Conference on Operating Systems Design and Implementation (OSDI), Hollywood, CA, 2012.

[66] Dan Tsafrir, Yoav Etsion, and Dror G. Feitelson. Secretly monopolizing the CPU without superuser privileges. In Proc. of the 16th USENIX Security Symposium, Boston, MA, 2007.

[67] Venkatanathan Varadarajan, Thawan Kooburat, Benjamin Farley, Thomas Ristenpart, and Michael Swift. Resource-freeing attacks: Improve your cloud performance (at your neighbor's expense). In Proc. of the Conference on Computer and Communications Security (CCS), Raleigh, NC, 2012.

[68] Venkatanathan Varadarajan, Thomas Ristenpart, and Michael Swift. Scheduler-based defenses against cross-VM side-channels. In Proc. of the 23rd USENIX Security Symposium, San Diego, CA, 2014.

[69] Venkatanathan Varadarajan, Yinqian Zhang, Thomas Ristenpart, and Michael Swift. A placement vulnerability study in multi-tenant public clouds. In Proc. of the 24th USENIX Security Symposium, Washington, DC, 2015.

[70] Huaibin Wang, Haiyun Zhou, and Chundong Wang. Virtual machine-based intrusion detection system framework in cloud computing environment. Journal of Computers, October 2012.

[71] Hui Wang, Canturk Isci, Lavanya Subramanian, Jongmoo Choi, Depei Qian, and Onur Mutlu. A-DRM: Architecture-aware distributed resource management of virtualized clusters. In Proceedings of the 11th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments (VEE), Istanbul, Turkey, 2015.

[72] Ian H. Witten, Eibe Frank, and Geoffrey Holmes. Data Mining: Practical Machine Learning Tools and Techniques, 3rd Edition.

[73] Zhenyu Wu, Zhang Xu, and Haining Wang. Whispers in the hyper-space: High-speed covert channel attacks in the cloud. In Proc. of the 21st USENIX Security Symposium, Bellevue, WA, 2012.

[74] Yunjing Xu, Michael Bailey, Farnam Jahanian, Kaustubh Joshi, Matti Hiltunen, and Richard Schlichting. An exploration of L2 cache covert channels in virtualized environments. In Proc. of the 3rd ACM Workshop on Cloud Computing Security (CCSW), Chicago, IL, 2011.

[75] Zhang Xu, Haining Wang, and Zhenyu Wu. A measurement study on co-residence threat inside the cloud. In Proc. of the 24th USENIX Security Symposium, Washington, DC, 2015.

[76] Yuval Yarom and Katrina Falkner. Flush+Reload: A high resolution, low noise, L3 cache side-channel attack. In Proc. of the 23rd USENIX Security Symposium, San Diego, CA, 2014.

[77] Matei Zaharia, Mosharaf Chowdhury, Tathagata Das, Ankur Dave, Justin Ma, Murphy McCauly, Michael J. Franklin, Scott Shenker, and Ion Stoica. Resilient distributed datasets: A fault-tolerant abstraction for in-memory cluster computing. In Proceedings of NSDI, San Jose, CA, 2012.

[78] Yinqian Zhang, Ari Juels, Alina Oprea, and Michael K. Reiter. HomeAlone: Co-residency detection in the cloud via side-channel analysis. In Proc. of the IEEE Symposium on Security and Privacy, Oakland, CA, 2011.

[79] Yinqian Zhang, Ari Juels, Michael K. Reiter, and Thomas Ristenpart. Cross-tenant side-channel attacks in PaaS clouds. In Proc. of the ACM SIGSAC Conference on Computer and Communications Security (CCS), Scottsdale, AZ, 2014.

[80] Yinqian Zhang, Ari Juels, Michael K. Reiter, and Thomas Ristenpart. Cross-VM side channels and their use to extract private keys. In Proceedings of the ACM Conference on Computer and Communications Security (CCS), Raleigh, NC, 2012.

[81] Yinqian Zhang and Michael K. Reiter. Düppel: Retrofitting commodity operating systems to mitigate cache side channels in the cloud. In Proc. of the ACM Conference on Computer and Communications Security (CCS), Berlin, Germany, 2013.

[82] Fangfei Zhou, Manish Goel, Peter Desnoyers, and Ravi Sundaram. Scheduler vulnerabilities and coordinated attacks in cloud computing. J. Comput. Secur., 21(4):533-559, July 2013.

[83] Jieming Zhu, Pinjia He, Zibin Zheng, and Michael R. Lyu. Towards online, accurate, and scalable QoS prediction for runtime service adaptation. In Proc. of the IEEE International Conference on Distributed Computing Systems (ICDCS), Madrid, Spain, 2014.

[84] Sergey Zhuravlev, Sergey Blagodurov, and Alexandra Fedorova. Addressing shared resource contention in multicore processors via scheduling. In Proc. of the Fifteenth Edition of Architectural Support for Programming Languages and Operating Systems (ASPLOS), Pittsburgh, PA, 2010.


[Figure 4: Coverage of resource characteristics for applications in the training set — scatter plots of memory pressure (%) vs. CPU pressure (%) and storage pressure (%) vs. network pressure (%).]

Applications    Detection accuracy (%)
                Least Loaded scheduler   Quasar scheduler
Aggregate       87                       89
memcached       78                       80
Hadoop          92                       92
Spark           85                       86
Cassandra       90                       89
speccpu2006     84                       85

Table 1: Bolt's detection accuracy in the controlled experiment with the least loaded scheduler and Quasar.

While there are clusters of applications that saturate several resources, the selected workloads cover the majority of the resource usage space. This enables Bolt to match any resource profile to information from the training set, even if the new application has not been previously seen. Increasing the training set size further did not improve detection accuracy. Finally, note that there is no overlap between training and testing sets in terms of algorithms, datasets, and input loads.

In each server we instantiate an adversarial VM running Ubuntu 14.04. By default the VM has 4 vCPUs (2 physical cores) to generate enough contention (Fig. 10 shows a sensitivity analysis on the VM size). The remainder of each machine is allocated to one or more friendly VMs. The maximum number of VMs, 5 in our setting, depends on the available servers. Friendly VMs run Ubuntu 14.04 or Debian 8.0. Adversarial VMs have no a priori information on the number and type of their co-residents. Applications are allowed to share a physical core, but must be on different hyperthreads (vCPUs). While sharing a single hyperthread is common in private deployments hosting batch jobs in small containers [7], it is uncommon in public clouds, where 1 vCPU is the minimum guaranteed size of dedicated resources for non-idling instances.

Table 1 shows Bolt's detection accuracy per application class and in aggregate. We signal a detection as correct if Bolt correctly identifies the framework (e.g., Hadoop) or service (e.g., memcached) the application uses, and the algorithm, e.g., SVM on Hadoop, or user load characteristics, e.g., read- vs. write-heavy load. We do not currently examine more detailed features, such as the distribution of individual query types. Bolt correctly identifies 87% of jobs, and for certain application classes, like databases and analytics, the accuracy

Figure 5: Star charts showing the resource profiles (CPU, L1-i, L1-d, L2, LLC, MemCap, MemBW, NetBW, DiskCap, DiskBW) of two Hadoop jobs (word count, recommender) and one unknown job. The closer the shaded area is to a vertex, the higher the pressure in the specific resource. The unknown job's similarity is 0.29 to word count and 0.78 to the recommender.

Figure 6: Detection accuracy as a function of the number of co-runners (left) and of the applications' dominant resources (right).

exceeds 90%. Misclassified jobs are typically identified as workloads with the same or similar critical resources.

Per-application profiles: Note that although in Table 1 we group applications by programming framework or online service, each framework does not correspond to a single resource profile. Profiles of applications within a framework can vary greatly, depending on functionality, complexity, and dataset features. Bolt's recommender system matches resource profiles to specific algorithmic classes and dataset characteristics within each framework. Figure 5 shows an example where two Hadoop jobs, one running word count on a small dataset and one running a recommender system on a very large dataset, exhibit very different resource profiles. While the third application is also a Hadoop job, it is identified as very similar to the recommender, as opposed to word count.

Number of co-residents: The number of victim applications affects detection accuracy. Figure 6a shows accuracy as a function of the number of victims per machine. When the number of co-residents is less than 2, accuracy exceeds 95%. As the number increases, accuracy drops, since it becomes progressively more difficult to differentiate between the contention caused by each workload. When there are 5 co-scheduled applications, accuracy is 67%. Interestingly, accuracy is higher for 4 than for 3 applications, since with 4 workloads the probability of sharing a core, and thus getting an accurate measurement of core pressure, is higher. While aggressive multi-tenancy affects accuracy, large numbers of co-residents in public clouds are unlikely in practice [41].
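Bolt's actual matcher uses online data mining over the training set; as a rough illustrative sketch (not the paper's algorithm), resource profiles like those in Figure 5 can be treated as pressure vectors and compared with cosine similarity:

```python
import math

# Resource axes from Figure 5; pressures are assumed normalized to [0, 1].
AXES = ["CPU", "L1-i", "L1-d", "L2", "LLC", "MemCap",
        "MemBW", "NetBW", "DiskCap", "DiskBW"]

def cosine_similarity(a, b):
    # 1.0 means the two profiles have an identical "shape" on the star chart.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def best_match(unknown, training_set):
    # Return the name of the known profile most similar to the unknown job.
    return max(training_set,
               key=lambda name: cosine_similarity(unknown, training_set[name]))
```

An unknown Hadoop job whose pressure vector lies close to the recommender's would thus be labeled as the recommender rather than word count, mirroring the Figure 5 example.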

Dominant resource: We now examine the correlation between the resource an application puts the most pressure on and Bolt's ability to correctly detect it (Figure 6). The numbers on each bar show how many applications have each resource as dominant. In general, the applications that are most easily detected are those with high instruction cache (L1-i), memory bandwidth, network bandwidth, and disk capacity pressure. These are workloads that fall in one of the following categories: latency-critical services with large codebases, such as webservers; in-memory analytics, like Spark; and disk-bound analytics, like Hadoop. Interestingly, L2 activity is a poor indicator of application type, in contrast to L1 and LLC, since it does not capture a significant change in the working set size (from 32KB to 256KB for our platforms).

Number of iterations: In this controlled experiment, we stop the detection process upon correct identification of a workload. For several jobs, this requires more than one iteration of profiling and data mining. Figure 7a shows the fraction of workloads that were correctly identified after N iterations. 71% of victims only require a single iteration, while an additional 15% required a second. A small fraction of applications needed more than two iterations, while jobs that were not identified correctly by the sixth iteration did not benefit from additional iterations. The number of applications per machine also affects the iterations required for detection (Figure 7b). When a single job is scheduled on a machine, one iteration is sufficient for almost all workloads. As the number of victims per machine increases, additional iterations are needed to disentangle the contention signals of each co-scheduled job.

Figure 8 shows a case of detection over several iterations. The victim is a 4-vCPU instance executing different consecutive jobs. Detection occurs every 20 sec by default (see Fig. 10 for a sensitivity study on profiling frequency). After the initial iteration, Bolt detects the job as mcf from SPECCPU2006. In the next two iterations, the interference profile of the victim continues to fit mcf's characteristics. At t=60 sec, Bolt detects that the job profile has changed and now resembles an SVM classifier running on Mahout [46]. Similarly, at t=180 sec, the victim changes again to a Spark data mining workload. Changes are typically captured within a few seconds.

Resource pressure: Figure 9 shows the correlation between the pressure of a victim in a given resource and Bolt's ability to correctly detect it. We plot detection accuracy for three core and three uncore resources; the results are similar for the remaining four. In almost all cases, very low or very high pressure carries the most value for detection. For disk bandwidth, accuracy remains high except for the 20-50% region, which contains many application classes with moderate disk activity, including analytics in Hadoop, databases, and graph applications. Resource pressure also affects the number of iterations Bolt needs to correctly identify a workload. For resources with a lot of value for detection, such as

Figure 7: PDFs of the number of iterations required for detection, in total (left) and as a function of the number of co-residents (right).

Figure 8: Example of workload phase detection by Bolt; resource pressure over time as the victim runs SPEC, Hadoop, Spark, memcached, and Cassandra jobs in sequence.

the L1-i and LLC, when pressure is moderate, several iterations are needed for accuracy to increase. In contrast, for off-chip resources like network bandwidth, this effect is less pronounced. In general, when moderate pressure hurts detection accuracy (Figure 9), more iterations are needed for correct identification.

Scheduler: So far we have used a least loaded scheduler, which is commonly used in datacenters [33, 63]. This scheduler does not account for resource contention between applications, leading to suboptimal performance [16, 24, 5, 84, 82]. Recent work has shown that if interference is accounted for at the scheduler level, both performance and utilization improve [48, 19, 53]. We now evaluate the impact of such a scheduler on Bolt's detection accuracy. Quasar [19] leverages machine learning to quickly determine which applications can be co-scheduled on the same machine without destructive interference. We use Quasar to schedule all victim applications and then inject Bolt in each physical host for detection. In a real setting, Bolt can trick an interference-aware scheduler by incurring very low levels of interference initially, until it is co-scheduled with a victim application and can determine its type and characteristics.

Table 1 shows Bolt's accuracy with the least loaded scheduler (LL) and Quasar. Interestingly, accuracy increases slightly with Quasar, 2% on average, but in general the impact of the scheduler is small. There are two reasons for the increase: first, both LL and Quasar do not share a single hyperthread between jobs, so core pressure measurements remain accurate. Second, because Quasar only co-schedules jobs with different critical resources, it provides Bolt with a less "noisy" interference signal for uncore resources, making distinguishing co-residents easier. Therefore, reducing interference in software alone is not sufficient to mitigate these security threats.

Figure 9: Detection accuracy as a function of the pressure victims place on various shared resources (L1-i, LLC, CPU, memory capacity, network bandwidth, and disk bandwidth).

Figure 10: Sensitivity to (a) profiling frequency, (b) adversarial VM size, and (c) number of profiling benchmarks.

Sensitivity analysis: Finally, we examine how design decisions in Bolt affect detection. Figure 10a shows how accuracy changes as profiling frequency decreases. For profiling intervals beyond 30 sec, accuracy drops rapidly. If profiling only occurs every 5 minutes, almost half the victims are incorrectly identified. Note that if workloads are long-running and mostly stable, profiling can be less frequent without such an impact on accuracy. Figure 10b shows accuracy as a function of the adversarial VM's size. We examine sizes offered as on-demand instances by EC2 [1]. If the adversary has fewer than 4 vCPUs, its resources are insufficient to create enough contention to capture the co-residents' pressure. Accuracy continues to grow for larger sizes; however, the larger the adversarial instance is, the less likely it is to be co-scheduled with other VMs, nullifying the value of Bolt. Finally, Figure 10c shows accuracy as a function of the number of microbenchmarks used for profiling. A single benchmark is not sufficient to fingerprint the characteristics of a workload; however, using more than 3 benchmarks has diminishing returns in accuracy. Unless otherwise specified, we use 20 sec profiling intervals, 4-vCPU adversarial VMs, and 2 benchmarks for initial profiling.

3.5 Limitations

While Bolt can accurately detect most frequently-run workloads, it has certain limitations. First, when no job shares a core with the adversary, the system assumes a linear relation between the co-residents' resource pressure in uncore resources. While this is true in some cases, it may not be generally accurate. We will explore alternative ways of distinguishing co-residents in future work. Second, the system cannot differentiate between multiple jobs running in a single instance and multiple instances running one job each. While this does not affect detection, if a covert-channel attack is aimed at a particular user, more accurate information would increase the adversary's leverage. Similarly, Bolt cannot currently distinguish between two copies of the same job sharing a platform at low load and one copy of the job that runs at higher load.

4. Cloud User Study

We now evaluate Bolt in a real-world environment with multiple users and unknown applications. We conducted a multi-user study on Amazon EC2, subject to the following rules:

• We engaged 20 independent users from two academic institutions (Cornell University and Stanford University), provided them with access to a shared cluster of 200 c3.8xlarge instances on EC2, with 32 vCPUs and 60GB of memory each, and asked them to submit several applications of their choosing.

• On each instance we maintain a 4-vCPU VM for Bolt. All other resources are made available to the users as VMs.

• Each user submits one or more applications, and all users have equal priority, i.e., no application will be preempted to accommodate the job of another user.

• Users were instructed to behave amenably, i.e., to avoid consuming disproportionate resources, so that the system can observe a representative job sample from all users.

• Users were also advised to pin their applications to specific cores to eliminate interference from the OS scheduler. Physical cores can be shared across jobs; however, each vCPU (hardware thread) is dedicated to a single application.

• Users can optionally select the instance(s) they want to launch their applications on. We provide a spreadsheet with a list of all available instances and their utilization at each point in time. This information, as well as any a priori information on the type, number, and characteristics of submitted applications, is concealed from Bolt.

• If a user does not select an instance, a least-loaded scheduler selects the VM with the most available cores and memory.

Once users start launching jobs in the cluster, we instantiate Bolt on each machine and activate it periodically to detect the type, characteristics, and number of co-scheduled applications. The entire experiment lasts approximately 4 hours, beyond which point all instances are terminated. Figure 11 shows the PDF of application types the different users launched, provided by the users after the completion of the experiment. Different colors correspond to different users. A total of 436 jobs were submitted to the cluster. The application mix includes analytics jobs in Hadoop and Spark, scientific benchmarks (PARSEC, Bioparallel), hardware synthesis tools (Cadence, Vivado HLS), multicore (zsim) and n-body simulations, email clients, web browsers, web servers (http), music and video streaming services, and databases, among others. Each job may take one or more vCPUs. Even though

Figure 11: Probability distribution function of applications launched in the cloud study, per user. Application labels: 1: hadoop, 2: spark, 3: email, 4: browser, 5: cadence, 6: zsim, 7: video, 8: latex, 9: MLPython, 10: make, 11: mem$d, 12: http server, 13: spec, 14: matlab, 15: mysql, 16: vivado, 17: parsec, 18: vim, 19: scala, 20: php, 21: postgres, 22: musicStream, 23: minebench, 24: n-body sim, 25: ppt, 26: OS img, 27: pdfview, 28: scons, 29: du -h, 30: cr/del cgroup, 31: bioparallel, 32: storm, 33: cpu burn, 34: audacity, 35: javascript, 36: create VMs, 37: html, 38: cassandra, 39: mongoDB, 40: mkdir, 41: cp/mv, 42: sirius, 43: oProfile, 44: dwnld LF, 45: rsync, 46: ping, 47: photoshop, 48: ssh, 49: rm, 50: skype, 51: zipkin, 52: graphX, 53: ix.

Figure 12: Detection accuracy for (a) application labels and (b) resource characteristics; (c) number of active jobs per instance over time.

Figure 11 groups applications by framework, the individual logic and datasets differ between jobs in the same framework. Note that we have not updated the training set for this study; it consists of the same 120 applications described in Section 3.4.

Figure 12a shows the number of applications that were correctly identified by name across the different categories. In total, 277 jobs are correctly labeled. As expected, Bolt could not assign a label to application types it has not seen before, such as email clients or image editing applications. However, even when Bolt cannot label an application, it can still identify the resources it is sensitive to, as shown in Figure 12b. For 385 out of 436 jobs, Bolt correctly identifies the job's resource characteristics. For performance attacks, such as the ones described in Section 5, this information is sufficient to dramatically degrade the performance of a victim application. Finally, Figure 12c shows the number of co-scheduled applications per instance throughout the duration of the experiment. The bottom 14 instances remained unused; for the remaining 186 instances, the number of active applications ranges from 1 to 6, with each job taking one or more vCPUs. The majority of misclassified applications correspond to instances with 5 or more concurrently active jobs.

5. Security Attacks

Bolt makes several cloud attacks practical and difficult to detect. We discuss three possible attacks that leverage the information obtained through detection.

5.1 Internal Denial of Service Attack

Attack setting: Denial of service attacks hurt the performance of a victim service by overloading its resources [14, 2, 51]. In cloud settings specifically, they can be categorized in two types: external and internal (or host-based) attacks. External attacks are the most conventional form of DoS [51, 13, 22]. These attacks utilize external servers to direct excessive traffic to the victims, flooding their resources and hurting their availability. External DoS attacks affect mostly PaaS and SaaS systems, and include IP spoofing, synchronization (SYN) flooding, smurf, buffer overflow, land, and teardrop attacks.

In contrast, internal DoS attacks take advantage of IaaS and PaaS cloud multi-tenancy to launch adversarial programs on the same host as the victim and degrade its performance [14, 60, 34, 52]. For example, Ristenpart et al. [60] showed how an adversarial user can leverage the IP naming conventions of IaaS clouds to locate a victim VM and degrade its performance. Cloud providers are starting to build defenses against such attacks. For example, EC2 offers autoscaling that will increase the instances of a service under heavy resource usage. Similarly, there is related work on mechanisms that detect saturation in memory bandwidth

Figure 13: 99th percentile latency and CPU utilization with Bolt and with a naive DoS attack that saturates CPU resources; the detection and migration points are marked.

and can trigger VM migration [71]. This means that DoS attacks that simply overload a physical host are ineffective when such defenses are in place. We focus on host-based DoS attacks and make them resilient against such defenses by avoiding resource saturation. We leverage the information obtained by Bolt to construct custom interference programs that stress the resources the victim is most sensitive to, without overloading the host.

Bolt creates custom contentious workloads by combining the microbenchmarks used to measure the resource pressure of the victim. Since these microbenchmarks are tunable, Bolt configures their intensity to a higher point than the measured pressure c_i during detection. For example, if a victim is detected as memcached and its most critical resources are the L1-i cache (81% pressure) and the last level cache (LLC) (78% pressure), Bolt uses the two microbenchmarks for L1-i and LLC at a higher intensity than what memcached can tolerate.

Impact: We use the DoS attack setting above against the 108 applications of the controlled experiment (Section 3.4). The DoS attack degrades performance by 2.2x on average, and up to 9.8x, in terms of execution time. Degradation is much more pronounced for interactive workloads like key-value stores, with tail latency increasing by 8-140x, a dramatic impact for applications with strict tail latency SLAs.
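The intensity-tuning step described above can be sketched as follows; the function names and the headroom parameter are illustrative assumptions, not Bolt's actual implementation:

```python
def plan_dos(victim_profile, headroom=0.15, top_k=2):
    # victim_profile: resource name -> measured pressure c_i in [0, 1].
    # Pick the resources the victim is most sensitive to and configure the
    # corresponding tunable microbenchmarks just above that pressure,
    # capped at full intensity so the host itself is not saturated.
    targets = sorted(victim_profile, key=victim_profile.get, reverse=True)[:top_k]
    return {r: min(1.0, victim_profile[r] + headroom) for r in targets}
```

For the memcached example above (L1-i at 0.81, LLC at 0.78), this would run the L1-i and LLC microbenchmarks at intensities slightly above the victim's measured pressure.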

We now examine the impact of the DoS attack on utilization. If the attack translates to resource saturation, there is a high probability that the DoS will be detected and the victim migrated to a new machine. The experimental cluster supports live migration. If CPU utilization (sampled every 1 sec) exceeds 70%, the victim is migrated to an unloaded host. Figure 13 compares the tail latency and CPU utilization with Bolt to those of a naive DoS that saturates the CPU through a compute-intensive kernel. We focus on a single victim VM running memcached. The overhead of migration (time between initiating migration and latency returning to normal levels) for this VM is 8 sec. Performance degradation is similar for both systems until t=80 sec, at which point the victim is migrated to a new host, due to the compute-intensive kernel causing utilization to exceed 70%. While performance continues to degrade during migration, once the VM resumes in the new server, latency returns to nominal levels. In contrast, Bolt keeps utilization low and continues to impact the victim's performance beyond t=80 sec.
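The utilization-triggered defense described above can be sketched as a simple threshold monitor; the threshold and per-second sampling mirror the experiment, while the function itself is an illustrative assumption:

```python
def should_migrate(samples, threshold=70.0):
    # samples: per-second CPU utilization (%) of the host.
    # Return the index of the first sample that crosses the threshold
    # (triggering live migration), or None if the attack stays under it,
    # as Bolt's contention does.
    for t, util in enumerate(samples):
        if util > threshold:
            return t
    return None
```

A naive CPU-saturating DoS trips this monitor and gets the victim migrated away; Bolt's targeted interference stays below the threshold and keeps degrading the victim.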

5.2 Resource Freeing Attack

Attack setting: The goal of a resource freeing attack (RFA) is to modify the workload of a victim such that it frees up resources for the adversary, improving its performance [67, 66]. The adversarial VM consists of two components: a beneficiary and a helper. The beneficiary is the program whose performance the attacker wants to improve, and the helper is the program that forces the victim to yield its resources. The RFA works by adding load on the victim's critical resources, causing other resources to free up. For example, if the victim is a memory-bound Spark job, introducing additional memory traffic will result in Spark stalling in memory and lessening its pressure on the other resources until it can reclaim its required memory bandwidth. When launched in a public cloud, the victim of an RFA ends up paying more and achieving worse performance compared to running in isolation.

We now create a proof-of-concept RFA for a webserver, a network-bound Hadoop job, and a memory-bound Spark job. Launching RFAs requires in-depth knowledge of the victim's resource requirements [67], which Bolt provides by identifying the victim's dominant resource. Once the dominant resource is detected, the runtime applies a custom helper program that saturates this critical resource. For the webserver, the helper is a CPU-intensive benchmark launching CGI requests, causing the victim to saturate its CPU usage, servicing fewer real user requests and freeing up cache resources. For the Hadoop job, we use a network-intensive benchmark similar to iperf, which saturates network bandwidth and frees up CPU and memory resources for the beneficiary. Similarly, for Spark, we use a streaming memory benchmark that slows down k-means and frees network and compute resources for the adversary. The selection of the beneficiary is of lesser importance; without loss of generality, we select mcf, a CPU-intensive benchmark from SPECCPU2006. The same methodology can be used with other beneficiaries, conditioned on their critical resource not overlapping with the victim's.

Impact: Table 2 shows the performance degradation for the three victim applications and the improvement in execution time for the beneficiary. The webserver suffers the most in terms of queries per second, as the helper's CGI requests pollute its cache, preventing it from servicing legitimate user requests. Hadoop and Spark experience significant degradations in execution time due to network and memory bandwidth saturation, respectively. Execution time for mcf improves by 16-38%, as it benefits from the victim stalling on its critical resource to improve its own cache and CPU usage.

5.3 VM Co-residency Detection

Attack setting: Sections 3.4 and 4 showed that we can identify the applications sharing a cloud infrastructure. However, a malicious user is rarely interested in a random service running on a public cloud. More often, they target a specific

Victim                           Beneficiary    Target Resource
Apache Webserver  -64% (QPS)     mcf  +24%      CPU
Hadoop (SVM)      -36% (Exec)    mcf  +16%      Network BW
Spark (k-means)   -52% (Exec)    mcf  +38%      Memory BW

Table 2: RFA impact on the victims and beneficiaries.

workload, e.g., a competitive e-commerce site. Therefore, they need to pinpoint where the target resides in a practical manner. This requires a launch strategy and a mechanism for co-residency detection [69]. The attack is practical if the target is located with high confidence, in reasonable time, and with modest resource costs. Bolt enables this attack to be carried out in a practical way, and remains resilient against defense mechanisms such as virtual private clouds (VPCs), which make internal IP addresses private to a single tenant. Once a target VM is located, the adversary can launch RFA or DoS attacks as previously described. Co-residency detection attacks rely on leakage of logical information, e.g., IP addresses, or on observing the impact of a side-channel attack due to resource contention. Bolt relies on a variation of the latter approach.

Assume a system of N servers. A target victim user launches k VMs. The adversary launches n malicious VMs. Instances are launched simultaneously to avoid malicious-malicious instance co-residency [69]. The probability that at least one adversarial VM will be co-scheduled with a victim instance is P(f) = 1 - (1 - k/N)^n. The adversary then uses a two-process detection scheme. The sender is a process that creates contention to degrade the victim's performance. The receiver is the process that detects this degradation, running either on the same host as the victim (cooperative detection) or externally (uncooperative detection). We assume an uncooperative victim, which is the more general case. Bolt works as the sender to determine the type of co-residents on each sampled host and identify any of the desired type. Assume m VMs of the desired type have been detected in the sample set. Bolt now needs to prune down this space to VM(s) from the specific target victim. It launches an external receiver aiming at the victim over a public channel, e.g., HTTP or a key-value protocol, to ping the victim service. This is the only point of communication with the victim. At the same time, the sender incurs contention in the resources the victim is most sensitive to. If the sender and victim are co-residents, the receiver's requests will be slower due to contention. Bolt quickly prunes down the sampled VMs by determining the type of co-residents. In a large cluster, this reduces the time to pinpoint a victim and the resource costs for the adversary. It also increases the confidence in the detection in the presence of noise from other co-residents.

Impact: We evaluate co-residency detection on the same 40-node cluster. The victim is a SQL server instantiating a single VM. The cluster also hosts 7 other VMs running SQL. Other active VMs host key-value stores, Hadoop, and Spark jobs. Bolt launches 10 senders simultaneously on

randomly-selected machines (P(f) = ...) and detects 3 VMs running SQL in the sample set. It then introduces memory interference as the receiver initiates a large number of SQL queries. While these queries have 8.16 msec mean latency without interference, latency now becomes 26.14 msec. Given a ~3x latency increase, we can conclude that one of the co-residents is the victim SQL server. Detection required 6 sec from instantiation to receiver detection, and 11 adversary VMs.
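The co-residency probability above, P(f) = 1 - (1 - k/N)^n, can be computed directly; a minimal sketch, assuming independent uniform placement of the adversary's VMs over the N servers:

```python
def coresidency_probability(N, k, n):
    # Probability that at least one of the adversary's n VMs lands on a
    # host running one of the victim's k VMs, out of N servers:
    # P(f) = 1 - (1 - k/N)^n.
    return 1.0 - (1.0 - k / N) ** n
```

For instance, with the 40-node cluster above and a single-VM victim, an adversary launching more senders monotonically increases its chance of co-residency.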

6. Improving Security via Resource Isolation

Since interference is at the heart of Bolt's detection methodology, isolation mechanisms that attenuate contention should reduce its accuracy. We first evaluate to what extent existing isolation techniques mitigate security vulnerabilities, and then highlight trade-offs between security, performance, and utilization. We use the same setting as in the controlled experiment of Section 3.4. We evaluate four resource-specific isolation techniques and three settings for OS-level isolation mechanisms: baremetal, containerized, and virtualized. The baremetal system does not employ any OS-level isolation. The containerized setup uses Linux containers (lxc) and assigns cores to applications via cpuset cgroups. Both containers and VMs constrain memory capacity. Baremetal experiments do not enforce memory capacity allocations, and the Linux scheduler is free to float applications across cores.

The first resource-specific isolation technique is thread pinning to physical cores, to constrain interference from scheduling actions like context switching. The number of cores an application is allocated can change dynamically and is limited by how fast Linux can migrate tasks, typically in the tens of milliseconds.
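Pinning of this kind can be done from user space on Linux with the standard-library affinity calls (the equivalent of `cpuset` cgroups or `taskset`); a minimal, Linux-only sketch:

```python
import os

def pin_to_cores(cores, pid=0):
    # Restrict a process (pid 0 = the caller) to the given physical
    # core IDs, then report the affinity mask actually in effect.
    os.sched_setaffinity(pid, cores)
    return os.sched_getaffinity(pid)
```

Re-pinning to a different core set at runtime corresponds to the dynamic core reallocation described above, bounded by the kernel's task-migration latency.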

For network isolation, we use the outbound network bandwidth partitioning capabilities of Linux's traffic control. Specifically, we use the qdisc [8] scheduler with the hierarchical token bucket (HTB) queueing discipline to enforce egress bandwidth limits. The limits are set to the maximum traffic burst rate for each application to avoid contention (the ceil parameter in the HTB interface). Ingress network bandwidth isolation has been extensively studied in previous work [36]; these approaches can be applied here as well.
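The HTB setup described above maps onto `tc` commands of the following shape; the helper that generates them, the device name, and the rates are illustrative placeholders:

```python
def htb_egress_commands(dev, classes):
    # classes: classid -> (guaranteed rate, ceil), where ceil is set to the
    # application's maximum burst rate, as described above.
    cmds = [f"tc qdisc add dev {dev} root handle 1: htb"]
    for classid, (rate, ceil) in classes.items():
        cmds.append(
            f"tc class add dev {dev} parent 1: classid 1:{classid} "
            f"htb rate {rate} ceil {ceil}")
    return cmds
```

Running the generated commands (as root, on the egress interface) attaches one HTB class per application; filters would then steer each application's traffic into its class.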

For DRAM bandwidth isolation, there is no commercially available partitioning mechanism. To enforce bandwidth isolation, we use the following approach: we monitor the DRAM bandwidth usage of each application through performance counters [45] and modify the scheduler to only colocate jobs on machines that can accommodate their aggregate peak memory bandwidth requirements. This requires extensive application knowledge and is used simply to highlight the benefits of DRAM bandwidth isolation.
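The scheduler modification reduces to a simple admission check; the function name and units (GB/s) are illustrative assumptions:

```python
def can_colocate(host_peak_bw, resident_bws, new_job_bw):
    # Admit a new job onto a host only if the aggregate peak DRAM
    # bandwidth demand of all co-residents stays within the host's
    # capacity -- a software stand-in for hardware bandwidth partitioning.
    return sum(resident_bws) + new_job_bw <= host_peak_bw
```

Peak demands would come from the per-application performance-counter measurements mentioned above, which is exactly why this approach requires a priori application knowledge.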

Finally, for last level cache (LLC) isolation, we use the Cache Allocation Technology (CAT) available in recent Intel chips [11]. CAT partitions the LLC in ways, which in

Figure 14: Detection accuracy on baremetal, Linux containers, and virtual machines as isolation mechanisms are added cumulatively: none, thread pinning, +network bandwidth partitioning, +memory bandwidth partitioning, +cache partitioning, +core isolation.

a highly-associative cache allows for non-overlapping partitions at the granularity of a few percent of the LLC capacity. Each co-resident is allocated one partition, configured to its current capacity requirements [57]. Partitions can be resized at runtime by reprogramming specific low-level registers (MSRs); changes take effect after a few milliseconds.

We add one isolation mechanism at a time in the three system settings (baremetal, containers, VMs). Figure 14 shows the impact of isolation techniques on Bolt's detection accuracy. As expected, when no isolation is used, baremetal allows for a significantly higher detection accuracy than container- and VM-based systems, mostly due to the latter constraining core and memory capacity usage. As a result, introducing thread pinning benefits baremetal the most, since it reduces core contention. It also benefits container- and VM-based setups, to a lesser degree, by eliminating unpredictability introduced by the Linux scheduler (e.g., context switching) [40]. The dominant resource of each application determines which isolation technique benefits it the most. Thread pinning mostly benefits workloads bound by on-chip resources, such as L1/L2 caches and cores. Adding network bandwidth partitioning lowers detection accuracy for all three settings almost equally. It primarily benefits network-bound workloads, for which network interference conveys the most information for detection. Memory bandwidth isolation further reduces accuracy by 10% on average, benefiting jobs dominated by DRAM traffic. Finally, cache partitioning brings the most dramatic reduction in accuracy, especially for LLC-bound workloads. We attribute this to the importance cache pressure has as a detection signal. The number of co-residents also affects the extent to which isolation helps. The more co-scheduled applications exist per machine, the more isolation techniques degrade accuracy, as they make distinguishing between co-residents harder.

Unfortunately, even when all the techniques are on, accuracy is still 50%, due to two reasons: current techniques are not fine-grain enough to allow strict and scalable isolation, and core resources (L1/L2 caches, CPU) are prone to interference due to contending hyperthreads [61]. To evaluate the latter hypothesis, we modify the scheduler such that hyperthreads of different jobs are never scheduled on the same physical core, e.g., if a job needs 7 vCPUs, it is allocated 4 dedicated physical cores. The grey bars of Figure 14 show the detection accuracy. Bare-metal instances still allow certain applications to be detected, but for containerized and virtualized settings accuracy drops to 14%. The remaining accuracy corresponds to disk-heavy workloads. Improving security, however, comes at a performance penalty of 34% on average in execution time, as threads of the same job are forced to contend with one another. Alternatively, users can overprovision their resource reservations to avoid degradation, which results in a 45% drop in utilization. This means that the cloud provider cannot leverage CPU idleness to share machines, decreasing the cost benefits of cloud computing. Note that enforcing core isolation alone is also not sufficient, as it allows a detection accuracy of 46%.

Discussion: The previous analysis highlights a design problem with current datacenter platforms. Traditional multicores are prone to contention, which will only worsen as more cores are integrated in each server and multi-tenancy becomes more pronounced. Existing isolation techniques are insufficient to mitigate security vulnerabilities, and techniques that provide reasonable security guarantees sacrifice performance or cost efficiency through low utilization. This highlights the need for new fine-grain and coordinated isolation techniques that guarantee security at high utilization for shared resources.
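The core-allocation rule above (hyperthreads of different jobs never share a physical core) reduces to rounding a job's vCPU count up to whole SMT cores. A minimal sketch, assuming 2-way SMT and a hypothetical helper name:

```python
import math

def physical_cores_needed(vcpus, smt_width=2):
    """Dedicated physical cores for a job when hyperthreads are never shared
    across jobs: round the vCPU count up to whole cores."""
    return math.ceil(vcpus / smt_width)
```

This matches the example in the text: a 7-vCPU job receives 4 dedicated physical cores, leaving one hardware thread idle rather than sharing it with another tenant.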

7 Conclusions
We have presented Bolt, a practical system that uses online data mining techniques to identify the type and characteristics of applications running on shared cloud systems, and enables practical attacks that degrade their performance. In a 40-server cluster, Bolt correctly identifies 87 out of 108 workloads, and degrades tail latency by up to 140x. We also used Bolt in a user study on EC2 to detect the characteristics of unknown jobs submitted by multiple users. Finally, we show that while existing isolation techniques are helpful, they are not sufficient to mitigate such attacks. Bolt reveals real, easy-to-exploit threats in public clouds. We hope that this work will motivate cloud providers and computer scientists to develop and deploy stricter isolation primitives in cloud systems.

Acknowledgements
We sincerely thank all the participants in the user study for their time and effort. We also thank Daniel Sanchez, Ed Suh, Grant Ayers, Mingyu Gao, Ana Klimovic, and the rest of the MAST group, as well as the anonymous reviewers, for their feedback on earlier versions of this manuscript. This work was supported by the Stanford Platform Lab, NSF grant CNS-1422088, and a John and Norma Balen Sesquicentennial Faculty Fellowship.

References

[1] Amazon EC2. http://aws.amazon.com/ec2.

[2] Aman Bakshi and Yogesh B. Dujodwala. Securing cloud from DDoS attacks using intrusion detection system in virtual machine. In Proc. of the Second International Conference on Communication Software and Networks (ICCSN), 2010.

[3] Luiz Barroso and Urs Hoelzle. The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines. MC Publishers, 2009.

[4] Naomi Benger, Joop van de Pol, Nigel P. Smart, and Yuval Yarom. "Ooh aah... just a little bit": A small amount of side channel can go a long way. In Proc. of the International Cryptographic Hardware and Embedded Systems Workshop (CHES), Busan, South Korea, 2014.

[5] Major Bhadauria and Sally A. McKee. An approach to resource-aware co-scheduling for CMPs. In Proc. of the 24th ACM International Conference on Supercomputing (ICS), Tsukuba, Japan, 2010.

[6] Leon Bottou. Large-scale machine learning with stochastic gradient descent. In Proceedings of the International Conference on Computational Statistics (COMPSTAT), Paris, France, 2010.

[7] Eric Brewer. Kubernetes: The path to cloud native. http://goo.gl/QgkzYB, SOCC Keynote, August 2015.

[8] Martin A. Brown. Traffic control HOWTO. http://linux-ip.net/articles/Traffic-Control-HOWTO.

[9] Robin Burke. Hybrid recommender systems: Survey and experiments. User Modeling and User-Adapted Interaction, 12(4):331-370, November 2002.

[10] Apache Cassandra. http://cassandra.apache.org.

[11] Intel® 64 and IA-32 Architecture Software Developer's Manual, vol. 3B: System Programming Guide, Part 2. September 2014.

[12] Ludmila Cherkasova, Diwaker Gupta, and Amin Vahdat. Comparison of the three CPU schedulers in Xen. SIGMETRICS Perform. Eval. Rev., 35(2):42-51, September 2007.

[13] Scott A. Crosby and Dan S. Wallach. Denial of service via algorithmic complexity attacks. In Proceedings of the 12th Conference on USENIX Security Symposium, Washington, DC, 2003.

[14] Marwan Darwish, Abdelkader Ouda, and Luiz Fernando Capretz. Cloud-based DDoS attacks and defenses. In Proc. of i-Society, Toronto, ON, 2013.

[15] Christina Delimitrou and Christos Kozyrakis. iBench: Quantifying Interference for Datacenter Workloads. In Proceedings of the 2013 IEEE International Symposium on Workload Characterization (IISWC), Portland, OR, September 2013.

[16] Christina Delimitrou and Christos Kozyrakis. Paragon: QoS-Aware Scheduling for Heterogeneous Datacenters. In Proceedings of the Eighteenth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), Houston, TX, USA, 2013.

[17] Christina Delimitrou and Christos Kozyrakis. QoS-Aware Scheduling in Heterogeneous Datacenters with Paragon. In ACM Transactions on Computer Systems (TOCS), Vol. 31, Issue 4, December 2013.

[18] Christina Delimitrou and Christos Kozyrakis. Quality-of-Service-Aware Scheduling in Heterogeneous Datacenters with Paragon. In IEEE Micro Special Issue on Top Picks from the Computer Architecture Conferences, May/June 2014.

[19] Christina Delimitrou and Christos Kozyrakis. Quasar: Resource-Efficient and QoS-Aware Cluster Management. In Proceedings of the Nineteenth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), Salt Lake City, UT, USA, 2014.

[20] Christina Delimitrou and Christos Kozyrakis. HCloud: Resource-Efficient Provisioning in Shared Cloud Systems. In Proceedings of the Twenty-First International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), April 2016.

[21] Christina Delimitrou, Daniel Sanchez, and Christos Kozyrakis. Tarcil: Reconciling Scheduling Speed and Quality in Large Shared Clusters. In Proceedings of the Sixth ACM Symposium on Cloud Computing (SOCC), August 2015.

[22] Jake Edge. Denial of service via hash collisions. http://lwn.net/Articles/474912, January 2012.

[23] Benjamin Farley, Ari Juels, Venkatanathan Varadarajan, Thomas Ristenpart, Kevin D. Bowers, and Michael M. Swift. More for your money: Exploiting performance heterogeneity in public clouds. In Proc. of the ACM Symposium on Cloud Computing (SOCC), San Jose, CA, 2012.

[24] Alexandra Fedorova, Margo Seltzer, and Michael D. Smith. Improving performance isolation on chip multiprocessors via an operating system scheduler. In Proceedings of the 16th Intl. Conference on Parallel Architecture and Compilation Techniques (PACT), Brasov, Romania, 2007.

[25] Alexander Felfernig and Robin Burke. Constraint-based recommender systems: Technologies and research issues. In Proceedings of the ACM International Conference on Electronic Commerce (ICEC), Innsbruck, Austria, 2008.

[26] Brad Fitzpatrick. Distributed caching with memcached. In Linux Journal, Volume 2004, Issue 124, 2004.

[27] Oded Goldreich and Rafail Ostrovsky. Software protection and simulation on oblivious RAMs. J. ACM, 43(3):431-473, May 1996.

[28] Google Container Engine. https://cloud.google.com/container-engine.

[29] Asela Gunawardana and Christopher Meek. A unified approach to building hybrid recommender systems. In Proc. of the Third ACM Conference on Recommender Systems (RecSys), New York, NY, 2009.

[30] Sanchika Gupta and Padam Kumar. VM profile based optimized network attack pattern detection scheme for DDoS attacks in cloud. In Proc. of SSCC, Mysore, India, 2013.

[31] Yi Han, Tansu Alpcan, Jeffrey Chan, and Christopher Leckie. Security games for virtual machine allocation in cloud computing. In 4th International Conference on Decision and Game Theory for Security, Fort Worth, TX, 2013.

[32] Amir Herzberg, Haya Shulman, Johanna Ullrich, and Edgar Weippl. Cloudoscopy: Services discovery and topology mapping. In Proceedings of the ACM Workshop on Cloud Computing Security Workshop (CCSW), Berlin, Germany, 2013.

[33] Ben Hindman, Andy Konwinski, Matei Zaharia, Ali Ghodsi, Anthony D. Joseph, Randy Katz, Scott Shenker, and Ion Stoica. Mesos: A platform for fine-grained resource sharing in the data center. In Proceedings of NSDI, Boston, MA, 2011.

[34] Jingwei Huang, David M. Nicol, and Roy H. Campbell. Denial-of-service threat to Hadoop/YARN clusters with multi-tenancy. In Proc. of the IEEE International Congress on Big Data, Washington, DC, 2014.

[35] Alexandru Iosup, Nezih Yigitbasi, and Dick Epema. On the performance variability of production cloud services. In Proceedings of CCGRID, Newport Beach, CA, 2011.

[36] Vimalkumar Jeyakumar, Mohammad Alizadeh, David Mazieres, Balaji Prabhakar, Changhoon Kim, and Albert Greenberg. EyeQ: Practical network performance isolation at the edge. In Proc. of the 10th USENIX Conference on Networked Systems Design and Implementation (NSDI), Lombard, IL, 2013.

[37] Yaakoub El Khamra, Hyunjoo Kim, Shantenu Jha, and Manish Parashar. Exploring the performance fluctuations of HPC workloads on clouds. In Proceedings of CloudCom, Indianapolis, IN, 2010.

[38] Krzysztof C. Kiwiel. Convergence and efficiency of subgradient methods for quasiconvex minimization. In Mathematical Programming (Series A) (Berlin, Heidelberg: Springer), 90(1), pp. 1-25, 2001.

[39] Ruby B. Lee. Rethinking computers for cybersecurity. IEEE Computer, 48(4):16-25, 2015.

[40] Jacob Leverich and Christos Kozyrakis. Reconciling high server utilization and sub-millisecond quality-of-service. In Proceedings of EuroSys, Amsterdam, The Netherlands, 2014.

[41] Host server CPU utilization in Amazon EC2 cloud. http://goo.gl/2LTx4T.

[42] Fangfei Liu, Yuval Yarom, Qian Ge, Gernot Heiser, and Ruby B. Lee. Last-level cache side-channel attacks are practical. In Proc. of the IEEE Symposium on Security and Privacy (S&P), San Jose, CA, 2015.

[43] Fei Liu, Lanfang Ren, and Hongtao Bai. Mitigating cross-VM side channel attack on multiple tenants cloud platform. In Journal of Computers, Vol. 9, No. 4 (2014), 1005-1013, April 2014.

[44] David Lo, Liqun Cheng, Rama Govindaraju, Luiz Andre Barroso, and Christos Kozyrakis. Towards energy proportionality for large-scale latency-critical workloads. In Proceedings of the 41st Annual International Symposium on Computer Architecture (ISCA), Minneapolis, MN, 2014.

[45] David Lo, Liqun Cheng, Rama Govindaraju, Parthasarathy Ranganathan, and Christos Kozyrakis. Heracles: Improving resource efficiency at scale. In Proc. of the 42nd Annual International Symposium on Computer Architecture (ISCA), Portland, OR, 2015.

[46] Mahout. http://mahout.apache.org.

[47] Dave Mangot. EC2 variability: The numbers revealed. http://tech.mangot.com/roller/dave/entry/ec2_variability_the_numbers_revealed.

[48] Jason Mars and Lingjia Tang. Whare-map: Heterogeneity in "homogeneous" warehouse-scale computers. In Proceedings of ISCA, Tel-Aviv, Israel, 2013.

[49] Robert Martin, John Demme, and Simha Sethumadhavan. Timewarp: Rethinking timekeeping and performance monitoring mechanisms to mitigate side-channel attacks. In Proceedings of the International Symposium on Computer Architecture (ISCA), Portland, OR, 2012.

[50] David Meisner, Christopher M. Sadler, Luiz Andre Barroso, Wolf-Dietrich Weber, and Thomas F. Wenisch. Power management of online data-intensive services. In Proceedings of the 38th Annual International Symposium on Computer Architecture, pages 319-330, 2011.

[51] Jelena Mirkovic and Peter Reiher. A taxonomy of DDoS attack and DDoS defense mechanisms. ACM SIGCOMM Computer Communication Review (CCR), April 2004.

[52] Thomas Moscibroda and Onur Mutlu. Memory performance attacks: Denial of memory service in multi-core systems. In Proc. of the 16th USENIX Security Symposium (SS), Boston, MA, 2007.

[53] Ripal Nathuji, Aman Kansal, and Alireza Ghaffarkhah. Q-Clouds: Managing performance interference effects for QoS-aware clouds. In Proceedings of EuroSys, Paris, France, 2010.

[54] Simon Ostermann, Alexandru Iosup, Nezih Yigitbasi, Radu Prodan, Thomas Fahringer, and Dick Epema. A performance analysis of EC2 cloud computing services for scientific computing. In Lecture Notes on Cloud Computing, Volume 34, pp. 115-131, 2010.

[55] Tao Peng, Christopher Leckie, and Kotagiri Ramamohanarao. Survey of network-based defense mechanisms countering the DoS and DDoS problems. ACM Comput. Surv., 39(1), April 2007.

[56] Diego Perez-Botero, Jakub Szefer, and Ruby B. Lee. Characterizing hypervisor vulnerabilities in cloud computing servers. In Proceedings of the 2013 International Workshop on Security in Cloud Computing (SCC@ASIACCS), Hangzhou, China, 2013.

[57] Moinuddin K. Qureshi and Yale N. Patt. Utility-based cache partitioning: A low-overhead, high-performance, runtime mechanism to partition shared caches. In Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 39), 2006.

[58] Himanshu Raj, Ripal Nathuji, Abhishek Singh, and Paul England. Resource management for isolation enhanced cloud services. In Proc. of the ACM Workshop on Cloud Computing Security (CCSW), Chicago, IL, 2009.

[59] Suhail Rehman and Majd Sakr. Initial findings for provisioning variation in cloud computing. In Proceedings of CloudCom, Indianapolis, IN, 2010.

[60] Thomas Ristenpart, Eran Tromer, Hovav Shacham, and Stefan Savage. Hey, you, get off of my cloud: Exploring information leakage in third-party compute clouds. In Proc. of the ACM Conference on Computer and Communications Security (CCS), Chicago, IL, 2009.

[61] Daniel Sanchez and Christos Kozyrakis. Vantage: Scalable and Efficient Fine-Grain Cache Partitioning. In Proceedings of the 38th Annual International Symposium on Computer Architecture (ISCA-38), San Jose, CA, June 2011.

[62] Jorg Schad, Jens Dittrich, and Jorge-Arnulfo Quiane-Ruiz. Runtime measurements in the cloud: Observing, analyzing, and reducing variance. Proceedings VLDB Endow., 3(1-2):460-471, September 2010.

[63] Malte Schwarzkopf, Andy Konwinski, Michael Abd-El-Malek, and John Wilkes. Omega: flexible, scalable schedulers for large compute clusters. In Proceedings of EuroSys, Prague, Czech Republic, 2013.

[64] Alan Shieh, Srikanth Kandula, Albert Greenberg, and Changhoon Kim. Seawall: Performance isolation for cloud datacenter networks. In Proc. of the USENIX Conference on Hot Topics in Cloud Computing (HotCloud), Boston, MA, 2010.

[65] David Shue, Michael J. Freedman, and Anees Shaikh. Performance isolation and fairness for multi-tenant cloud storage. In Proc. of the 10th USENIX Conference on Operating Systems Design and Implementation (OSDI), Hollywood, CA, 2012.

[66] Dan Tsafrir, Yoav Etsion, and Dror G. Feitelson. Secretly monopolizing the CPU without superuser privileges. In Proc. of the 16th USENIX Security Symposium, Boston, MA, 2007.

[67] Venkatanathan Varadarajan, Thawan Kooburat, Benjamin Farley, Thomas Ristenpart, and Michael Swift. Resource-freeing attacks: Improve your cloud performance (at your neighbor's expense). In Proc. of the Conference on Computer and Communications Security (CCS), Raleigh, NC, 2012.

[68] Venkatanathan Varadarajan, Thomas Ristenpart, and Michael Swift. Scheduler-based defenses against cross-VM side-channels. In Proc. of the 23rd USENIX Security Symposium, San Diego, CA, 2014.

[69] Venkatanathan Varadarajan, Yinqian Zhang, Thomas Ristenpart, and Michael Swift. A placement vulnerability study in multi-tenant public clouds. In Proc. of the 24th USENIX Security Symposium (USENIX Security), Washington, DC, 2015.

[70] Huaibin Wang, Haiyun Zhou, and Chundong Wang. Virtual machine-based intrusion detection system framework in cloud computing environment. In Journal of Computers, October 2012.

[71] Hui Wang, Canturk Isci, Lavanya Subramanian, Jongmoo Choi, Depei Qian, and Onur Mutlu. A-DRM: Architecture-aware distributed resource management of virtualized clusters. In Proceedings of the 11th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments (VEE), Istanbul, Turkey, 2015.

[72] Ian H. Witten, Eibe Frank, and Geoffrey Holmes. Data Mining: Practical Machine Learning Tools and Techniques. 3rd Edition.

[73] Zhenyu Wu, Zhang Xu, and Haining Wang. Whispers in the hyper-space: High-speed covert channel attacks in the cloud. In Proc. of the 21st USENIX Security Symposium, Bellevue, WA, 2012.

[74] Yunjing Xu, Michael Bailey, Farnam Jahanian, Kaustubh Joshi, Matti Hiltunen, and Richard Schlichting. An exploration of L2 cache covert channels in virtualized environments. In Proc. of the 3rd ACM Workshop on Cloud Computing Security Workshop (CCSW), Chicago, IL, 2011.

[75] Zhang Xu, Haining Wang, and Zhenyu Wu. A measurement study on co-residence threat inside the cloud. In Proc. of the 24th USENIX Security Symposium (USENIX Security), Washington, DC, 2015.

[76] Yuval Yarom and Katrina Falkner. Flush+Reload: A high resolution, low noise, L3 cache side-channel attack. In Proc. of the 23rd USENIX Security Symposium, San Diego, CA, 2014.

[77] Matei Zaharia, Mosharaf Chowdhury, Tathagata Das, Ankur Dave, Justin Ma, Murphy McCauly, Michael J. Franklin, Scott Shenker, and Ion Stoica. Resilient distributed datasets: A fault-tolerant abstraction for in-memory cluster computing. In Proceedings of NSDI, San Jose, CA, 2012.

[78] Yinqian Zhang, Ari Juels, Alina Oprea, and Michael K. Reiter. HomeAlone: Co-residency detection in the cloud via side-channel analysis. In Proc. of the IEEE Symposium on Security and Privacy, Oakland, CA, 2011.

[79] Yinqian Zhang, Ari Juels, Michael K. Reiter, and Thomas Ristenpart. Cross-tenant side-channel attacks in PaaS clouds. In Proc. of the ACM SIGSAC Conference on Computer and Communications Security (CCS), Scottsdale, AZ, 2014.

[80] Yinqian Zhang, Ari Juels, Michael K. Reiter, and Thomas Ristenpart. Cross-VM side channels and their use to extract private keys. In Proceedings of the ACM Conference on Computer and Communications Security (CCS), Raleigh, NC, 2012.

[81] Yinqian Zhang and Michael K. Reiter. Düppel: Retrofitting commodity operating systems to mitigate cache side channels in the cloud. In Proc. of the ACM Conference on Computer and Communications Security (CCS), Berlin, Germany, 2013.

[82] Fangfei Zhou, Manish Goel, Peter Desnoyers, and Ravi Sundaram. Scheduler vulnerabilities and coordinated attacks in cloud computing. J. Comput. Secur., 21(4):533-559, July 2013.

[83] Jieming Zhu, Pinjia He, Zibin Zheng, and Michael R. Lyu. Towards online, accurate, and scalable QoS prediction for runtime service adaptation. In Proc. of the IEEE International Conference on Distributed Computing Systems (ICDCS), Madrid, Spain, 2014.

[84] Sergey Zhuravlev, Sergey Blagodurov, and Alexandra Fedorova. Addressing shared resource contention in multicore processors via scheduling. In Proc. of the Fifteenth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), Pittsburgh, PA, 2010.


Dominant resource: We now examine the correlation between the resource an application puts the most pressure on and Bolt's ability to correctly detect it (Figure 6). The numbers on each bar show how many applications have each resource as dominant. In general, the applications that are most easily detected are ones with high instruction cache (L1-i), memory bandwidth, network bandwidth, and disk capacity pressure. These are workloads that fall in one of the following categories: latency-critical services with large codebases, such as webservers; in-memory analytics, like Spark; and disk-bound analytics, like Hadoop. Interestingly, L2 activity is a poor indicator of application type, in contrast to L1 and LLC, since it does not capture a significant change in the working set size (from 32KB to 256KB for our platforms).

Number of iterations: In this controlled experiment we stop the detection process upon correct identification of a workload. For several jobs this requires more than one iteration of profiling and data mining. Figure 7a shows the fraction of workloads that were correctly identified after N iterations. 71% of victims only require a single iteration, while an additional 15% required a second. A small fraction of applications needed more than two iterations, while jobs that were not identified correctly by the sixth iteration did not benefit from additional iterations. The number of applications per machine also affects the iterations required for detection (Figure 7b). When a single job is scheduled on a machine, one iteration is sufficient for almost all workloads. As the number of victims per machine increases, additional iterations are needed to disentangle the contention signals of each co-scheduled job.
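The profile-and-classify loop described above can be sketched as follows. This is an illustration only, not Bolt's implementation: `profile_once` and `classify` stand in for Bolt's microbenchmark profiling and data-mining steps, and `dominant_resource` is a hypothetical helper for the dominant-resource analysis.

```python
def dominant_resource(pressure):
    """Return the resource with the highest measured pressure (values in %)."""
    return max(pressure, key=pressure.get)

def detect(profile_once, classify, max_iters=6):
    """Iterative detection sketch: profile, classify, and stop once the label
    repeats; as in the experiment above, give up after a handful of iterations.
    Returns (label, iterations_used)."""
    last = None
    for i in range(1, max_iters + 1):
        label = classify(profile_once())
        if label == last:
            return label, i
        last = label
    return last, max_iters
```

With a stable victim the loop converges on the second iteration (the first classification plus one confirming pass); noisier co-residency would require more passes, matching the trend in Figure 7b.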

Figure 8 shows a case of detection over several iterations. The victim is a 4-vCPU instance executing different consecutive jobs. Detection occurs every 20 sec by default (see Fig. 10 for a sensitivity study on profiling frequency). After the initial iteration, Bolt detects the job as mcf from SPECCPU2006. In the next two iterations, the interference profile of the victim continues to fit mcf's characteristics. At t=60 sec, Bolt detects that the job profile has changed and now resembles an SVM classifier running on Mahout [46]. Similarly, at t=180 sec, the victim changes again, to a Spark data mining workload. Changes are typically captured within a few seconds.

Resource pressure: Figure 9 shows the correlation between the pressure of a victim in a given resource and Bolt's ability to correctly detect it. We plot detection accuracy for three core and three uncore resources; the results are similar for the remaining four. In almost all cases, very low or very high pressure carries the most value for detection. For disk bandwidth, accuracy remains high except for the 20-50% region, which contains many application classes with moderate disk activity, including analytics in Hadoop, databases, and graph applications. Resource pressure also affects the number of iterations Bolt needs to correctly identify a workload. For resources with a lot of value for detection, such as
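Phase changes like the mcf-to-Mahout transition above can be flagged with a simple per-resource delta test between consecutive profiling intervals. The sketch below is an assumption about how such a test might look, not Bolt's actual detector; the threshold value is illustrative.

```python
def profile_changed(prev, curr, threshold=15.0):
    """Flag a workload phase change when any resource's pressure moves by more
    than `threshold` percentage points between two profiling intervals."""
    return any(abs(curr[r] - prev[r]) > threshold for r in prev)
```

Small fluctuations within a phase stay below the threshold, while a phase switch (e.g., a memory-bound job replaced by a network-bound one) trips it and triggers re-classification.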

Figure 7: PDFs of iterations required for detection, in total (left) and as a function of the number of co-residents (right). (x-axis: iterations until detection (1-6); y-axis: PDF (%); right panel plots curves for 1-5 co-scheduled apps.)

Figure 8: Example of workload phase detection by Bolt. (x-axis: time (min); y-axis: resource pressure (%); victim phases: SPEC, Hadoop, Spark, mem$d, Cassandra; resources tracked: L1-i, L1-d, L2, LLC, CPU, memCap, memBw, NetBw, DiskCap, DiskBw.)

the L1-i and LLCs, when pressure is moderate, several iterations are needed for accuracy to increase. In contrast, for off-chip resources like network bandwidth, this effect is less pronounced. In general, when moderate pressure hurts detection accuracy (Figure 9), more iterations are needed for correct identification.

Scheduler: So far we have used a least-loaded scheduler, which is commonly used in datacenters [33, 63]. This scheduler does not account for resource contention between applications, leading to suboptimal performance [16, 24, 5, 84, 82]. Recent work has shown that if interference is accounted for at the scheduler level, both performance and utilization improve [48, 19, 53]. We now evaluate the impact of such a scheduler on Bolt's detection accuracy. Quasar [19] leverages machine learning to quickly determine which applications can be co-scheduled on the same machine without destructive interference. We use Quasar to schedule all victim applications, and then inject Bolt in each physical host for detection. In a real setting, Bolt can trick an interference-aware scheduler by incurring very low levels of interference initially, until it is co-scheduled with a victim application and can determine its type and characteristics.

Table 1 shows Bolt's accuracy with the least-loaded scheduler (LL) and Quasar. Interestingly, accuracy increases slightly with Quasar, 2% on average, but in general the impact of the scheduler is small. There are two reasons for the increase: first, neither LL nor Quasar shares a single hyperthread between jobs, so core pressure measurements remain accurate. Second, because Quasar only co-schedules jobs with different critical resources, it provides Bolt with a less "noisy" interference signal for uncore resources, making distinguishing co-residents easier. Therefore, reducing interference in software alone is not sufficient to mitigate these security threats.

Figure 9: Detection accuracy as a function of the pressure victims place in various shared resources. (x-axis: resource pressure (%); y-axis: accuracy (%); resources shown: L1-i, LLC, CPU, MemCap, NetBW, DiskBW.)

Figure 10: Sensitivity to (a) profiling frequency, (b) adversarial VM size, and (c) profiling benchmarks. (y-axes: accuracy (%); x-axes: profiling interval (sec), adversarial VM size (vCPUs), and number of benchmarks, respectively.)

Sensitivity analysis: Finally, we examine how design decisions in Bolt affect detection. Figure 10a shows how accuracy changes as profiling frequency decreases. For profiling intervals beyond 30 sec, accuracy drops rapidly. If profiling only occurs every 5 minutes, almost half the victims are incorrectly identified. Note that if workloads are long-running and mostly stable, profiling can be less frequent without such an impact on accuracy. Figure 10b shows accuracy as a function of the adversarial VM's size. We examine sizes offered as on-demand instances by EC2 [1]. If the adversary has fewer than 4 vCPUs, its resources are insufficient to create enough contention to capture the co-residents' pressure. Accuracy continues to grow for larger sizes; however, the larger the adversarial instance is, the less likely it is to be co-scheduled with other VMs, nullifying the value of Bolt. Finally, Figure 10c shows accuracy as a function of the number of microbenchmarks used for profiling. A single benchmark is not sufficient to fingerprint the characteristics of a workload; however, using more than 3 benchmarks has diminishing returns in accuracy. Unless otherwise specified, we use 20-sec profiling intervals, 4-vCPU adversarial VMs, and 2 benchmarks for initial profiling.

3.5 Limitations
While Bolt can accurately detect the most frequently-run workloads, it has certain limitations. First, when no job shares a core with the adversary, the system assumes a linear relation between the co-residents' resource pressure in uncore resources. While this is true in some cases, it may not be generally accurate. We will explore alternative ways of distinguishing co-residents in future work. Second, the system cannot differentiate between multiple jobs running in a single instance and multiple instances running one job each. While this does not affect detection, if a covert-channel attack is aimed at a particular user, more accurate information would increase the adversary's leverage. Similarly, Bolt cannot currently distinguish between two copies of the same job sharing a platform at low load and one copy of the job that runs at higher load.
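The linear-additivity assumption discussed above implies a simple decomposition: subtract the pressure attributed to already-identified co-residents from the aggregate signal and attribute the remainder to the unidentified job. The sketch below is a hedged illustration of that arithmetic (the function name and data layout are assumptions; pressures are in %):

```python
def attribute_pressure(total, known):
    """Under the linear-additivity assumption for uncore resources, estimate
    the remaining co-resident's pressure by subtracting the contributions of
    already-identified jobs from the aggregate signal (clamped at zero)."""
    residual = {}
    for resource, value in total.items():
        used = sum(job.get(resource, 0.0) for job in known)
        residual[resource] = max(0.0, value - used)
    return residual
```

As the text notes, additivity does not always hold (e.g., under cache thrashing contention can be super-linear), which is exactly why this decomposition can misattribute pressure.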

4 Cloud User Study
We now evaluate Bolt in a real-world environment with multiple users and unknown applications. We conducted a multi-user study on Amazon EC2, subject to the following rules:

• We have engaged 20 independent users from two academic institutions (Cornell University and Stanford University), provided them with access to a shared cluster of 200 c3.8xlarge instances on EC2, with 32 vCPUs and 60GB of memory each, and asked them to submit several applications of their choosing.

• On each instance we maintain a 4-vCPU VM for Bolt. All other resources are made available to the users as VMs.

• Each user submits one or more applications, and all users have equal priority, i.e., no application will be preempted to accommodate the job of another user.

• Users were instructed to behave amenably, i.e., to avoid consuming disproportionate resources, so that the system can observe a representative job sample from all users.

• Users were also advised to pin their applications to specific cores to eliminate interference from the OS scheduler. Physical cores can be shared across jobs; however, each vCPU (hardware thread) is dedicated to a single application.

• Users can optionally select the instance(s) they want to launch their applications on. We provide a spreadsheet with a list of all available instances and their utilization at each point in time. This information, as well as any a priori information on the type, number, and characteristics of submitted applications, is concealed from Bolt.

• If a user does not select an instance, a least-loaded scheduler selects the VM with the most available cores and memory.
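The least-loaded policy in the last rule can be sketched as a selection over (free vCPUs, free memory) pairs. The function name and data layout below are assumptions for illustration, not part of the study's tooling:

```python
def least_loaded(instances):
    """Pick the instance with the most free vCPUs, breaking ties by free
    memory. `instances` maps an instance id to (free_vcpus, free_mem_gb)."""
    return max(instances, key=lambda i: instances[i])
```

Tuple comparison gives the tie-breaking for free: Python compares free vCPUs first and falls back to free memory only when vCPU counts match.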

Once users start launching jobs in the cluster, we instantiate Bolt on each machine and activate it periodically to detect the type, characteristics, and number of co-scheduled applications. The entire experiment lasts approximately 4 hours, beyond which point all instances are terminated. Figure 11 shows the PDF of application types the different users launched, provided by the users after the completion of the experiment. Different colors correspond to different users. A total of 436 jobs were submitted to the cluster. The application mix includes analytics jobs in Hadoop and Spark, scientific benchmarks (Parsec, Bioparallel), hardware synthesis tools (Cadence, VivadoHLS), multicore (zsim) and n-body simulations, email clients, web browsers, web servers (http), music and video streaming services, and databases, among others. Each job may take one or more vCPUs. Even though

Figure 11: Probability distribution function of applications launched in the cloud study, per user. (x-axis: application label (1-53); y-axis: occurrences; colors distinguish Users 1-20. Application labels: 1: hadoop, 2: spark, 3: email, 4: browser, 5: cadence, 6: zsim, 7: video, 8: latex, 9: MLPython, 10: make, 11: mem$d, 12: http server, 13: spec, 14: matlab, 15: mysql, 16: vivado, 17: parsec, 18: vim, 19: scala, 20: php, 21: postgres, 22: musicStream, 23: minebench, 24: n-body sim, 25: ppt, 26: OS img, 27: pdfview, 28: scons, 29: du -h, 30: crdel cgroup, 31: bioparallel, 32: storm, 33: cpu burn, 34: audacity, 35: javascript, 36: create VMs, 37: html, 38: cassandra, 39: mongoDB, 40: mkdir, 41: cpmv, 42: sirius, 43: oProfile, 44: dwnld LF, 45: rsync, 46: ping, 47: photoshop, 48: ssh, 49: rm, 50: skype, 51: zipkin, 52: graphX, 53: ix.)

Figure 12: Detection accuracy of (a) application labels and (b) resource characteristics; Fig. 12(c): number of jobs per instance. (Panels (a) and (b): occurrences per application label (1-53) correctly identified by name and by resource characteristics, respectively; panel (c): active jobs per instance over time (min).)

figure 11 groups applications by framework, the individual logic and datasets differ between jobs in the same framework. Note that we have not updated the training set for this study; it consists of the same 120 applications described in Section 3.4.

Figure 12a shows the number of applications that were correctly identified by name across the different categories. In total, 277 jobs were correctly labeled. As expected, Bolt could not assign a label to application types it had not seen before, such as email clients or image editing applications. However, even when Bolt cannot label an application, it can still identify the resources it is sensitive to, as shown in figure 12b. For 385 out of 436 jobs, Bolt correctly identifies the job's resource characteristics. For performance attacks, such as the ones described in Section 5, this information is sufficient to dramatically degrade the performance of a victim application. Finally, figure 12c shows the number of co-scheduled applications per instance throughout the duration of the experiment. The bottom 14 instances remained unused; for the remaining 186 instances, the number of active applications ranges from 1 to 6, with each job taking one or more vCPUs. The majority of misclassified applications correspond to instances with 5 or more concurrently active jobs.

5. Security Attacks
Bolt makes several cloud attacks practical and difficult to detect. We discuss three possible attacks that leverage the information obtained through detection.

5.1 Internal Denial of Service Attack
Attack setting: Denial of service attacks hurt the performance of a victim service by overloading its resources [14, 2, 51]. In cloud settings specifically, they can be categorized in two types: external and internal (or host-based) attacks. External attacks are the most conventional form of DoS [51, 13, 22]. These attacks utilize external servers to direct excessive traffic to the victims, flooding their resources and hurting their availability. External DoS attacks mostly affect PaaS and SaaS systems and include IP spoofing, synchronization (SYN) flooding, smurf, buffer overflow, land, and teardrop attacks.

In contrast, internal DoS attacks take advantage of IaaS and PaaS cloud multi-tenancy to launch adversarial programs on the same host as the victim and degrade its performance [14, 60, 34, 52]. For example, Ristenpart et al. [60] showed how an adversarial user can leverage the IP naming conventions of IaaS clouds to locate a victim VM and degrade its performance. Cloud providers are starting to build defenses against such attacks. For example, EC2 offers autoscaling that will increase the instances of a service under heavy resource usage. Similarly, there is related work on mechanisms that detect saturation in memory bandwidth

[Figure 13: 99th percentile latency and CPU utilization over time with Bolt and a naïve DoS attack that saturates CPU resources.]

and can trigger VM migration [71]. This means that DoS attacks that simply overload a physical host are ineffective when such defenses are in place. We focus on host-based DoS attacks and make them resilient against such defenses by avoiding resource saturation. We leverage the information obtained by Bolt to construct custom interference programs that stress the resources the victim is most sensitive to, without overloading the host.

Bolt creates custom contentious workloads by combining the microbenchmarks used to measure the resource pressure of the victim. Since these microbenchmarks are tunable, Bolt configures their intensity to a higher point than the pressure c_i measured during detection. For example, if a victim is detected as memcached and its most critical resources are the L1-i cache (81% pressure) and the last level cache (LLC) (78% pressure), Bolt uses the two microbenchmarks for L1-i and LLC at a higher intensity than what memcached can tolerate.

Impact: We use the DoS attack setting above against the 108 applications of the controlled experiment (Section 3.4). The DoS attack degrades performance by 2.2x on average, and up to 9.8x, in terms of execution time. Degradation is much more pronounced for interactive workloads like key-value stores, with tail latency increasing by 8-140x, a dramatic impact for applications with strict tail latency SLAs.
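The intensity selection above can be sketched as follows; the resource labels, pressure scores, and the `build_dos_workload` helper are illustrative, not Bolt's actual interface:

```python
# Sketch: compose a contentious workload from the pressure scores c_i
# measured during detection. Bolt's real microbenchmarks are tunable
# contentious kernels, one per shared resource.

def build_dos_workload(pressure, top_k=2, margin=10):
    """Rank resources by measured pressure and set each selected
    microbenchmark's intensity `margin` points above what the victim
    tolerates, capped at 100%."""
    ranked = sorted(pressure.items(), key=lambda kv: kv[1], reverse=True)
    return [(res, min(100, c + margin)) for res, c in ranked[:top_k]]

# memcached example from the text: L1-i at 81% and LLC at 78% pressure.
plan = build_dos_workload({"l1i": 81, "llc": 78, "net": 35, "dram": 22})
print(plan)  # [('l1i', 91), ('llc', 88)]
```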

We now examine the impact of the DoS attack on utilization. If it translates to resource saturation, there is a high probability that the DoS will be detected and the victim migrated to a new machine. The experimental cluster supports live migration: if CPU utilization (sampled every 1 sec) exceeds 70%, the victim is migrated to an unloaded host. Figure 13 compares the tail latency and CPU utilization with Bolt to that of a naïve DoS that saturates the CPU through a compute-intensive kernel. We focus on a single victim VM running memcached. The overhead of migration (time between initiating migration and latency returning to normal levels) for this VM is 8 sec. Performance degradation is similar for both systems until t=80 sec, at which point the victim is migrated to a new host because the compute-intensive kernel causes utilization to exceed 70%. While performance continues to degrade during migration, once the VM resumes on the new server, latency returns to nominal levels. In contrast, Bolt keeps utilization low and continues to impact the victim's performance beyond t=80 sec.
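The defense's migration decision reduces to a simple threshold test over utilization samples, which is why staying below the threshold defeats it; a minimal sketch with illustrative values:

```python
def should_migrate(cpu_util_samples, threshold=70.0):
    """Migration trigger from the text: CPU utilization is sampled every
    second and the VM is live-migrated once a sample exceeds 70%.
    Threshold and sample values are illustrative."""
    return any(u > threshold for u in cpu_util_samples)

print(should_migrate([55, 62, 81]))  # True: a naive DoS saturates the CPU
print(should_migrate([48, 52, 60]))  # False: Bolt stays under the threshold
```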

5.2 Resource Freeing Attack
Attack setting: The goal of a resource freeing attack (RFA) is to modify the workload of a victim such that it frees up resources for the adversary, improving its performance [67, 66]. The adversarial VM consists of two components: a beneficiary and a helper. The beneficiary is the program whose performance the attacker wants to improve, and the helper is the program that forces the victim to yield its resources. The RFA works by adding load to the victim's critical resources, causing other resources to free up. For example, if the victim is a memory-bound Spark job, introducing additional memory traffic will result in Spark stalling in memory and lessening its pressure on the other resources until it can reclaim its required memory bandwidth. When launched in a public cloud, the victim of an RFA ends up paying more and achieving worse performance compared to running in isolation.

We now create a proof-of-concept RFA for a webserver, a network-bound Hadoop job, and a memory-bound Spark job. Launching RFAs requires in-depth knowledge of the victim's resource requirements [67], which Bolt provides by identifying the victim's dominant resource. Once the dominant resource is detected, the runtime applies a custom helper program that saturates this critical resource. For the webserver, the helper is a CPU-intensive benchmark launching CGI requests, causing the victim to saturate its CPU usage, service fewer real user requests, and free up cache resources. For the Hadoop job, we use a network-intensive benchmark similar to iperf, which saturates network bandwidth and frees up CPU and memory resources for the beneficiary. Similarly, for Spark, we use a streaming memory benchmark that slows down k-means and frees network and compute resources for the adversary. The selection of the beneficiary is of lesser importance; without loss of generality, we select mcf, a CPU-intensive benchmark from SPECCPU2006. The same methodology can be used with other beneficiaries, conditioned on their critical resource not overlapping with the victim's.

Impact: Table 2 shows the performance degradation for the three victim applications and the improvement in execution time for the beneficiary. The webserver suffers the most in terms of queries per second, as the helper's CGI requests pollute its cache, preventing it from servicing legitimate user requests. Hadoop and Spark experience significant degradations in execution time due to network and memory bandwidth saturation, respectively. Execution time for mcf improves by 16-38%, benefiting from the victim stalling in its critical resource to improve its cache and CPU usage.
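The dominant-resource-to-helper selection can be sketched as a lookup plus the overlap check described above; the helper names are hypothetical labels for the benchmarks in the text, not real binaries:

```python
# Hypothetical mapping from a victim's dominant resource to the helper
# programs described in the text (CGI flood, iperf-like stream, streaming
# memory scan).
HELPERS = {
    "cpu":     "cgi_request_flood",
    "network": "iperf_like_stream",
    "dram":    "streaming_memory_scan",
}

def pick_helper(victim_dominant, beneficiary_critical):
    """Select a helper for the victim's dominant resource, rejecting
    pairings where the helper would saturate the beneficiary's own
    critical resource (the overlap condition from the text)."""
    if victim_dominant == beneficiary_critical:
        raise ValueError("helper would contend with the beneficiary")
    return HELPERS[victim_dominant]

print(pick_helper("network", "cpu"))  # iperf_like_stream
```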

5.3 VM Co-residency Detection
Attack setting: Sections 3.4 and 4 showed that we can identify the applications sharing a cloud infrastructure. However, a malicious user is rarely interested in a random service running on a public cloud. More often, they target a specific

Victim App         Victim Perf    Beneficiary App   Beneficiary Perf   Target Resource
Apache Webserver   -64% (QPS)     mcf               +24%               CPU
Hadoop (SVM)       -36% (Exec)    mcf               +16%               Network BW
Spark (k-means)    -52% (Exec)    mcf               +38%               Memory BW

Table 2: RFA impact on the victims and beneficiaries.

workload, e.g., a competitive e-commerce site. Therefore, they need to pinpoint where the target resides in a practical manner. This requires a launch strategy and a mechanism for co-residency detection [69]. The attack is practical if the target is located with high confidence, in reasonable time, and with modest resource costs. Bolt enables this attack to be carried out in a practical way, and remains resilient against defense mechanisms such as virtual private clouds (VPCs), which make internal IP addresses private to a single tenant. Once a target VM is located, the adversary can launch RFA or DoS attacks as previously described. Co-residency detection attacks rely on leakage of logical information, e.g., IP addresses, or on observing the impact of a side-channel attack due to resource contention. Bolt relies on a variation of the latter approach.

Assume a system of N servers. A target victim user launches k VMs, and the adversary launches n malicious VMs. Instances are launched simultaneously to avoid malicious-malicious instance co-residency [69]. The probability that at least one adversarial VM will be co-scheduled with a victim instance is P(f) = 1 - (1 - k/N)^n. The adversary then uses a two-process detection scheme. The sender is a process that creates contention to degrade the victim's performance. The receiver is the process that detects this degradation, running either on the same host as the victim (cooperative detection) or externally (uncooperative detection). We assume an uncooperative victim, which is the more general case. Bolt works as the sender to determine the type of co-residents on each sampled host and identify any of the desired type. Assume m VMs of the desired type have been detected in the sample set. Bolt now needs to prune down this space to VM(s) from the specific target victim. It launches an external receiver aimed at the victim over a public channel, e.g., HTTP or a key-value protocol, to ping the victim service. This is the only point of communication with the victim. At the same time, the sender incurs contention in the resources the victim is most sensitive to. If the sender and victim are co-residents, the receiver's requests will be slower due to contention. Bolt quickly prunes down the sampled VMs by determining the type of co-residents. In a large cluster, this reduces the time to pinpoint a victim and the resource costs for the adversary. It also increases the confidence in the detection in the presence of noise from other co-residents.

Impact: We evaluate co-residency detection on the same 40-node cluster. The victim is a SQL server instantiating a single VM. The cluster also hosts 7 other VMs running SQL; other active VMs host key-value stores, Hadoop, and Spark jobs. Bolt launches 10 senders simultaneously on

randomly-selected machines (P(f) = ) and detects 3 VMs running SQL in the sample set. It then introduces memory interference as the receiver initiates a large number of SQL queries. While these queries have 8.16 msec mean latency without interference, latency now becomes 26.14 msec. Given a ~3x latency increase, we can conclude that one of the co-residents is the victim SQL server. Detection required 6 sec from instantiation to receiver detection, and 11 adversary VMs.
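The placement probability, and the number of adversarial VMs needed for a target confidence, follow directly from the formula P(f) = 1 - (1 - k/N)^n, assuming independent uniform placement:

```python
import math

def coresidency_prob(N, k, n):
    """P(f) = 1 - (1 - k/N)**n: chance that at least one of n adversarial
    VMs lands on a host running one of the victim's k VMs across N
    servers, assuming independent uniform placement."""
    return 1.0 - (1.0 - k / N) ** n

def vms_needed(N, k, target):
    """Smallest n achieving P(f) >= target."""
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - k / N))

# The experiment's scale: 40 servers, a single victim VM, 10 senders.
print(round(coresidency_prob(40, 1, 10), 3))  # 0.224
print(vms_needed(40, 1, 0.9))                 # 91 VMs for 90% confidence
```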

6. Improving Security via Resource Isolation
Since interference is at the heart of Bolt's detection methodology, isolation mechanisms that attenuate contention should reduce its accuracy. We first evaluate to what extent existing isolation techniques mitigate security vulnerabilities, and then highlight trade-offs between security, performance, and utilization. We use the same setting as in the controlled experiment of Section 3.4. We evaluate 4 resource-specific isolation techniques and 3 settings for OS-level isolation mechanisms: baremetal, containerized, and virtualized. The baremetal system does not employ any OS-level isolation. The containerized setup uses Linux containers (lxc) and assigns cores to applications via cpuset cgroups. Both containers and VMs constrain memory capacity. Baremetal experiments do not enforce memory capacity allocations, and the Linux scheduler is free to float applications across cores.

The first resource-specific isolation technique is thread pinning to physical cores, which constrains interference from scheduling actions like context switching. The number of cores an application is allocated can change dynamically, limited by how fast Linux can migrate tasks, typically in the tens of milliseconds.
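On Linux, this kind of pinning is available from user space via sched_setaffinity; a minimal, Linux-only sketch that pins to one of the cores the process already owns (core IDs differ across machines):

```python
import os

# Linux-only: pin the calling process (pid 0 = self) to a single core it
# already owns, then restore the original allowance. Re-pinning is just
# another call, bounded by kernel task-migration latency (tens of ms).
allowed = sorted(os.sched_getaffinity(0))   # cores currently allowed
os.sched_setaffinity(0, {allowed[0]})       # pin to one dedicated core
assert os.sched_getaffinity(0) == {allowed[0]}
os.sched_setaffinity(0, set(allowed))       # grow the allocation back
```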

For network isolation, we use the outbound network bandwidth partitioning capabilities of Linux's traffic control. Specifically, we use the qdisc [8] scheduler with the hierarchical token bucket queueing discipline (HTB) to enforce egress bandwidth limits. The limits are set to the maximum traffic burst rate for each application to avoid contention (the ceil parameter in the HTB interface). Ingress network bandwidth isolation has been extensively studied in previous work [36]; these approaches can be applied here as well.
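The HTB setup can be sketched as the `tc` invocations below; the function only builds the command strings (applying them needs root), and the device name and rates are illustrative:

```python
def htb_egress_commands(dev, classes):
    """Build `tc` invocations for HTB egress limits: one root qdisc plus
    one class per application, with `ceil` set to its maximum burst rate,
    mirroring the configuration described in the text. Commands are
    returned rather than executed."""
    cmds = [f"tc qdisc add dev {dev} root handle 1: htb default 1"]
    for cid, (rate, ceil_rate) in classes.items():
        cmds.append(
            f"tc class add dev {dev} parent 1: classid 1:{cid} "
            f"htb rate {rate} ceil {ceil_rate}"
        )
    return cmds

for cmd in htb_egress_commands("eth0", {10: ("200mbit", "500mbit")}):
    print(cmd)
```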

For DRAM bandwidth isolation, there is no commercially available partitioning mechanism. To enforce bandwidth isolation, we use the following approach: we monitor the DRAM bandwidth usage of each application through performance counters [45], and modify the scheduler to only colocate jobs on machines that can accommodate their aggregate peak memory bandwidth requirements. This requires extensive application knowledge, and is used simply to highlight the benefits of DRAM bandwidth isolation.
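The bandwidth-aware admission check reduces to a sum against the host's capacity; a minimal sketch with illustrative GB/s figures:

```python
def can_colocate(resident_peak_bw, new_job_peak_bw, host_bw_capacity):
    """Admit a job only if the aggregate peak DRAM bandwidth of all
    co-residents stays within the host's capacity. In practice the peak
    figures come from performance counters; values are illustrative."""
    return sum(resident_peak_bw) + new_job_peak_bw <= host_bw_capacity

print(can_colocate([25, 20], 10, 60))  # True: 55 GB/s fits in 60
print(can_colocate([25, 20], 20, 60))  # False: 65 GB/s oversubscribes
```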

Finally, for last level cache (LLC) isolation, we use the Cache Allocation Technology (CAT) available in recent Intel chips [11]. CAT partitions the LLC in ways, which in

[Figure 14: Detection accuracy with isolation techniques added cumulatively (none, thread pinning, +net BW partitioning, +mem BW partitioning, +cache partitioning, +core isolation), for baremetal, Linux containers, and virtual machines.]

a highly-associative cache allows for non-overlapping partitions at the granularity of a few percent of the LLC capacity. Each co-resident is allocated one partition, configured to its current capacity requirements [57]. Partitions can be resized at runtime by reprogramming specific low-level registers (MSRs); changes take effect after a few milliseconds.
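The non-overlapping way partitions CAT enforces can be sketched as contiguous bitmasks, one per co-resident, of the kind written into the L3 mask MSRs; the 20-way LLC and per-tenant way counts are illustrative:

```python
def cat_way_masks(ways_per_tenant, total_ways=20):
    """Carve an LLC into non-overlapping, contiguous way bitmasks, one
    per co-resident, as written into the L3 capacity mask MSRs under
    CAT. CAT requires each mask to be a contiguous run of set bits."""
    masks, base = [], 0
    for ways in ways_per_tenant:
        if base + ways > total_ways:
            raise ValueError("not enough cache ways for all tenants")
        masks.append(((1 << ways) - 1) << base)  # contiguous run of bits
        base += ways
    return masks

# Three co-residents sized to their occupancy needs on a 20-way LLC.
masks = cat_way_masks([8, 6, 4])
print([hex(m) for m in masks])  # ['0xff', '0x3f00', '0x3c000']
```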

We add one isolation mechanism at a time in the three system settings (baremetal, containers, VMs). Figure 14 shows the impact of isolation techniques on Bolt's detection accuracy. As expected, when no isolation is used, baremetal allows for significantly higher detection accuracy than container- and VM-based systems, mostly because the latter constrain core and memory capacity usage. As a result, introducing thread pinning benefits baremetal the most, since it reduces core contention. It also benefits container- and VM-based setups, to a lesser degree, by eliminating unpredictability introduced by the Linux scheduler (e.g., context switching) [40]. The dominant resource of each application determines which isolation technique benefits it the most. Thread pinning mostly benefits workloads bound by on-chip resources, such as L1/L2 caches and cores. Adding network bandwidth partitioning lowers detection accuracy for all three settings almost equally. It primarily benefits network-bound workloads, for which network interference conveys the most information for detection. Memory bandwidth isolation further reduces accuracy by 10% on average, benefiting jobs dominated by DRAM traffic. Finally, cache partitioning causes the most dramatic reduction in accuracy, especially for LLC-bound workloads. We attribute this to the importance of cache pressure as a detection signal. The number of co-residents also affects the extent to which isolation helps: the more applications are co-scheduled per machine, the more isolation techniques degrade accuracy, as they make distinguishing between co-residents harder.

Unfortunately, even when all the techniques are on, accuracy is still 50%, for two reasons: current techniques are not fine-grained enough to allow strict and scalable isolation, and core resources (L1/L2 caches, CPU) are prone to interference from contending hyperthreads [61]. To evaluate the latter hypothesis, we modify the scheduler such that hyperthreads of different jobs are never scheduled on the same physical core; e.g., if a job needs 7 vCPUs, it is allocated 4 dedicated physical cores. The grey bars of Figure 14 show the detection accuracy. Baremetal instances still allow certain applications to be detected, but for containerized and virtualized settings accuracy drops to 14%. The remaining accuracy corresponds to disk-heavy workloads. Improving security, however, comes at a performance penalty of 34% on average in execution time, as threads of the same job are forced to contend with one another. Alternatively, users can overprovision their resource reservations to avoid degradation, which results in a 45% drop in utilization. This means that the cloud provider cannot leverage CPU idleness to share machines, decreasing the cost benefits of cloud computing. Note that enforcing core isolation alone is also not sufficient, as it allows a detection accuracy of 46%.

Discussion: The previous analysis highlights a design problem with current datacenter platforms. Traditional multicores are prone to contention, which will only worsen as more cores are integrated in each server and multi-tenancy becomes more pronounced. Existing isolation techniques are insufficient to mitigate security vulnerabilities, and techniques that provide reasonable security guarantees sacrifice performance or cost efficiency through low utilization. This highlights the need for new fine-grain and coordinated isolation techniques that guarantee security at high utilization for shared resources.

7. Conclusions
We have presented Bolt, a practical system that uses online data mining techniques to identify the type and characteristics of applications running on shared cloud systems, and enables practical attacks that degrade their performance. In a 40-server cluster, Bolt correctly identifies 87 out of 108 workloads, and degrades tail latency by up to 140x. We also used Bolt in a user study on EC2 to detect the characteristics of unknown jobs submitted by multiple users. Finally, we showed that while existing isolation techniques are helpful, they are not sufficient to mitigate such attacks. Bolt reveals real, easy-to-exploit threats in public clouds. We hope that this work will motivate cloud providers and computer scientists to develop and deploy stricter isolation primitives in cloud systems.

Acknowledgements
We sincerely thank all the participants in the user study for their time and effort. We also thank Daniel Sanchez, Ed Suh, Grant Ayers, Mingyu Gao, Ana Klimovic, and the rest of the MAST group, as well as the anonymous reviewers, for their feedback on earlier versions of this manuscript. This work was supported by the Stanford Platform Lab, NSF grant CNS-1422088, and a John and Norma Balen Sesquicentennial Faculty Fellowship.

References
[1] Amazon EC2. http://aws.amazon.com/ec2.

[2] Aman Bakshi and Yogesh B Dujodwala Securing cloudfrom ddos attacks using intrusion detection system in virtualmachine In Proc of the Second International Conference onCommunication Software and Networks (ICCSN) 2010

[3] Luiz Barroso and Urs Hoelzle The Datacenter as a Com-puter An Introduction to the Design of Warehouse-Scale Ma-chines MC Publishers 2009

[4] Naomi Benger, Joop van de Pol, Nigel P. Smart, and Yuval Yarom. "Ooh aah... just a little bit": A small amount of side channel can go a long way. In Proc. of the International Cryptographic Hardware and Embedded Systems Workshop (CHES), Busan, South Korea, 2014.

[5] Major Bhadauria and Sally A McKee An approach toresource-aware co-scheduling for cmps In Proc of the 24thACM International Conference on Supercomputing (ICS)Tsukuba Japan 2010

[6] Leon Bottou Large-scale machine learning with stochas-tic gradient descent In Proceedings of the InternationalConference on Computational Statistics (COMPSTAT) ParisFrance 2010

[7] Eric Brewer. Kubernetes: The path to cloud native. http://goo.gl/QgkzYB. SOCC Keynote, August 2015.

[8] Martin A. Brown. Traffic control HOWTO. http://linux-ip.net/articles/Traffic-Control-HOWTO.

[9] Robin Burke. Hybrid recommender systems: Survey and experiments. User Modeling and User-Adapted Interaction, 12(4):331-370, November 2002.

[10] Apache Cassandra. http://cassandra.apache.org.

[11] Intel® 64 and IA-32 Architectures Software Developer's Manual, vol. 3B: System Programming Guide, Part 2, September 2014.

[12] Ludmila Cherkasova, Diwaker Gupta, and Amin Vahdat. Comparison of the three CPU schedulers in Xen. SIGMETRICS Perform. Eval. Rev., 35(2):42-51, September 2007.

[13] Scott A Crosby and Dan S Wallach Denial of service viaalgorithmic complexity attacks In Proceedings of the 12thConference on USENIX Security Symposium WashingtonDC 2003

[14] Marwan Darwish Abdelkader Ouda and Luiz FernandoCapretz Cloud-based ddos attacks and defenses In Procof i-Society Toronto ON 2013

[15] Christina Delimitrou and Christos Kozyrakis iBench Quan-tifying Interference for Datacenter Workloads In Proceed-ings of the 2013 IEEE International Symposium on WorkloadCharacterization (IISWC) Portland OR September 2013

[16] Christina Delimitrou and Christos Kozyrakis Paragon QoS-Aware Scheduling for Heterogeneous Datacenters In Pro-ceedings of the Eighteenth International Conference on Ar-chitectural Support for Programming Languages and Oper-ating Systems (ASPLOS) Houston TX USA 2013

[17] Christina Delimitrou and Christos Kozyrakis. QoS-Aware Scheduling in Heterogeneous Datacenters with Paragon. In ACM Transactions on Computer Systems (TOCS), Vol. 31, Issue 4, December 2013.

[18] Christina Delimitrou and Christos Kozyrakis Quality-of-Service-Aware Scheduling in Heterogeneous Datacenterswith Paragon In IEEE Micro Special Issue on Top Picks fromthe Computer Architecture Conferences MayJune 2014

[19] Christina Delimitrou and Christos Kozyrakis Quasar Resource-Efficient and QoS-Aware Cluster Management In Proceed-ings of the Nineteenth International Conference on Architec-tural Support for Programming Languages and OperatingSystems (ASPLOS) Salt Lake City UT USA 2014

[20] Christina Delimitrou and Christos Kozyrakis HCloudResource-Efficient Provisioning in Shared Cloud SystemsIn Proceedings of the Twenty First International Conferenceon Architectural Support for Programming Languages andOperating Systems (ASPLOS) April 2016

[21] Christina Delimitrou Daniel Sanchez and Christos KozyrakisTarcil Reconciling Scheduling Speed and Quality in LargeShared Clusters In Proceedings of the Sixth ACM Symposiumon Cloud Computing (SOCC) August 2015

[22] Jake Edge. Denial of service via hash collisions. http://lwn.net/Articles/474912. January 2012.

[23] Benjamin Farley Ari Juels Venkatanathan Varadarajan ThomasRistenpart Kevin D Bowers and Michael M Swift More foryour money Exploiting performance heterogeneity in publicclouds In Proc of the ACM Symposium on Cloud Computing(SOCC) San Jose CA 2012

[24] Alexandra Fedorova Margo Seltzer and Michael D SmithImproving performance isolation on chip multiprocessors viaan operating system scheduler In Proceedings of the 16thIntl Conference on Parallel Architecture and CompilationTechniques (PACT) Brasov Romania 2007

[25] Alexander Felfernig and Robin Burke Constraint-based rec-ommender systems Technologies and research issues InProceedings of the ACM International Conference on Elec-tronic Commerce (ICEC) Innsbruck Austria 2008

[26] Brad Fitzpatrick Distributed caching with memcached InLinux Journal Volume 2004 Issue 124 2004

[27] Oded Goldreich and Rafail Ostrovsky. Software protection and simulation on oblivious RAMs. J. ACM, 43(3):431-473, May 1996.

[28] Google Container Engine. https://cloud.google.com/container-engine.

[29] Asela Gunawardana and Christopher Meek A unified ap-proach to building hybrid recommender systems In Proc ofthe Third ACM Conference on Recommender Systems (Rec-Sys) New York NY 2009

[30] Sanchika Gupta and Padam Kumar Vm profile based opti-mized network attack pattern detection scheme for ddos at-tacks in cloud In Proc of SSCC Mysore India 2013

[31] Yi Han Tansu Alpcan Jeffrey Chan and Christopher LeckieSecurity games for virtual machine allocation in cloud com-puting In 4th International Conference on Decision andGame Theory for Security Fort Worth TX 2013

[32] Amir Herzberg Haya Shulman Johanna Ullrich and EdgarWeippl Cloudoscopy Services discovery and topology map-ping In Proceedings of the ACM Workshop on Cloud Com-puting Security Workshop (CCSW) Berlin Germany 2013

[33] Ben Hindman Andy Konwinski Matei Zaharia Ali GhodsiAnthony D Joseph Randy Katz Scott Shenker and Ion Sto-ica Mesos A platform for fine-grained resource sharing inthe data center In Proceedings of NSDI Boston MA 2011

[34] Jingwei Huang David M Nicol and Roy H CampbellDenial-of-service threat to hadoopyarn clusters with multi-tenancy In Proc of the IEEE International Congress on BigData Washington DC 2014

[35] Alexandru Iosup Nezih Yigitbasi and Dick Epema Onthe performance variability of production cloud services InProceedings of CCGRID Newport Beach CA 2011

[36] Vimalkumar Jeyakumar Mohammad Alizadeh David MazieresBalaji Prabhakar Changhoon Kim and Albert GreenbergEyeq Practical network performance isolation at the edge InProc of the 10th USENIX Conference on Networked SystemsDesign and Implementation (NSDI) Lombard IL 2013

[37] Yaakoub El Khamra Hyunjoo Kim Shantenu Jha and Man-ish Parashar Exploring the performance fluctuations of hpcworkloads on clouds In Proceedings of CloudCom Indi-anapolis IN 2010

[38] Krzysztof C Kiwiel Convergence and efficiency of subgra-dient methods for quasiconvex minimization In Mathemat-ical Programming (Series A) (Berlin Heidelberg Springer)90 (1) pp 1-25 2001

[39] Ruby B. Lee. Rethinking computers for cybersecurity. IEEE Computer, 48(4):16-25, 2015.

[40] Jacob Leverich and Christos Kozyrakis Reconciling highserver utilization and sub-millisecond quality-of-service InProceedings of EuroSys Amsterdam The Netherlands 2014

[41] Host server CPU utilization in Amazon EC2 cloud. http://goo.gl/2LTx4T.

[42] Fangfei Liu Yuval Yarom Qian Ge Gernot Heiser andRuby B Lee Last-level cache side-channel attacks are prac-tical In Proc of IEEE Symposium on Security and Privacy(SampP) San Jose CA 2015

[43] Fei Liu, Lanfang Ren, and Hongtao Bai. Mitigating cross-VM side channel attack on multiple tenants cloud platform. In Journal of Computers, Vol. 9, No. 4, pp. 1005-1013, April 2014.

[44] David Lo Liqun Cheng Rama Govindaraju Luiz Andre Bar-roso and Christos Kozyrakis Towards energy proportionalityfor large-scale latency-critical workloads In Proceedings ofthe 41st Annual International Symposium on Computer Ar-chitecuture (ISCA) Minneapolis MN 2014

[45] David Lo Liqun Cheng Rama Govindaraju ParthasarathyRanganathan and Christos Kozyrakis Heracles Improvingresource efficiency at scale In Proc of the 42Nd AnnualInternational Symposium on Computer Architecture (ISCA)Portland OR 2015

[46] Mahout. http://mahout.apache.org.

[47] Dave Mangot. EC2 variability: The numbers revealed. http://tech.mangot.com/roller/dave/entry/ec2_variability_the_numbers_revealed.

[48] Jason Mars and Lingjia Tang. Whare-map: Heterogeneity in "homogeneous" warehouse-scale computers. In Proceedings of ISCA, Tel-Aviv, Israel, 2013.

[49] Robert Martin John Demme and Simha SethumadhavanTimewarp Rethinking timekeeping and performance moni-toring mechanisms to mitigate side-channel attacks In Pro-ceedings of the International Symposium on Computer Archi-tecture (ISCA) Portland OR 2012

[50] David Meisner, Christopher M. Sadler, Luiz Andre Barroso, Wolf-Dietrich Weber, and Thomas F. Wenisch. Power management of online data-intensive services. In Proceedings of the 38th Annual International Symposium on Computer Architecture, pages 319-330, 2011.

[51] Jelena Mirkovic and Peter Reiher A taxonomy of ddos attackand ddos defense mechanisms ACM SIGCOMM ComputerCommunication Review (CCR) April 2004

[52] Thomas Moscibroda and Onur Mutlu Memory performanceattacks Denial of memory service in multi-core systemsIn Proc of 16th USENIX Security Symposium on USENIXSecurity Symposium (SS) Boston MA 2007

[53] Ripal Nathuji Aman Kansal and Alireza Ghaffarkhah Q-clouds Managing performance interference effects for qos-aware clouds In Proceedings of EuroSys ParisFrance 2010

[54] Simon Ostermann Alexandru Iosup Nezih Yigitbasi RaduProdan Thomas Fahringer and Dick Epema A performanceanalysis of ec2 cloud computing services for scientific com-puting In Lecture Notes on Cloud Computing Volume 34p115-131 2010

[55] Tao Peng Christopher Leckie and Kotagiri RamamohanaraoSurvey of network-based defense mechanisms countering thedos and ddos problems ACM Comput Surv 39(1) April2007

[56] Diego Perez-Botero Jakub Szefer and Ruby B Lee Char-acterizing hypervisor vulnerabilities in cloud computingservers In Proceedings of the 2013 International Work-shop on Security in Cloud Computing SCCASIACCSHangzhou China 2013

[57] Moinuddin K Qureshi and Yale N Patt Utility-based cachepartitioning A low-overhead high-performance runtimemechanism to partition shared caches In Proceedings ofthe 39th Annual IEEEACM International Symposium on Mi-croarchitecture MICRO 39 2006

[58] Himanshu Raj Ripal Nathuji Abhishek Singh and Paul Eng-land Resource management for isolation enhanced cloud ser-vices In Proc of the ACM Workshop on Cloud ComputingSecurity (CCSW) Chicago IL 2009

[59] Suhail Rehman and Majd Sakr Initial findings for provision-ing variation in cloud computing In Proceedings of Cloud-Com Indianapolis IN 2010

[60] Thomas Ristenpart Eran Tromer Hovav Shacham and Ste-fan Savage Hey you get off of my cloud Exploring infor-mation leakage in third-party compute clouds In Proc of theACM Conference on Computer and Communications Security(CCS) Chicago IL 2009

[61] Daniel Sanchez and Christos Kozyrakis Vantage Scalableand Efficient Fine-Grain Cache Partitioning In Proceedingsof the 38th annual International Symposium in ComputerArchitecture (ISCA-38) San Jose CA June 2011

[62] Jorg Schad, Jens Dittrich, and Jorge-Arnulfo Quiane-Ruiz. Runtime measurements in the cloud: Observing, analyzing, and reducing variance. Proceedings VLDB Endow., 3(1-2):460-471, September 2010.

[63] Malte Schwarzkopf Andy Konwinski Michael Abd-El-Malek and John Wilkes Omega flexible scalable sched-ulers for large compute clusters In Proceedings of EuroSysPrague Czech Republic 2013

[64] Alan Shieh Srikanth Kandula Albert Greenberg and ChanghoonKim Seawall Performance isolation for cloud datacenternetworks In Proc of the USENIX Conference on Hot Topicsin Cloud Computing (HotCloud) Boston MA 2010

[65] David Shue Michael J Freedman and Anees Shaikh Perfor-mance isolation and fairness for multi-tenant cloud storage InProc of the 10th USENIX Conference on Operating SystemsDesign and Implementation (OSDI) Hollywood CA 2012

[66] Dan Tsafrir, Yoav Etsion, and Dror G. Feitelson. Secretly monopolizing the CPU without superuser privileges. In Proc. of the 16th USENIX Security Symposium, Boston, MA, 2007.

[67] Venkatanathan Varadarajan, Thawan Kooburat, Benjamin Farley, Thomas Ristenpart, and Michael Swift. Resource-freeing attacks: Improve your cloud performance (at your neighbor's expense). In Proc. of the Conference on Computer and Communications Security (CCS), Raleigh, NC, 2012.

[68] Venkatanathan Varadarajan, Thomas Ristenpart, and Michael Swift. Scheduler-based defenses against cross-VM side-channels. In Proc. of the 23rd USENIX Security Symposium, San Diego, CA, 2014.

[69] Venkatanathan Varadarajan, Yinqian Zhang, Thomas Ristenpart, and Michael Swift. A placement vulnerability study in multi-tenant public clouds. In Proc. of the 24th USENIX Security Symposium (USENIX Security), Washington, DC, 2015.

[70] Huaibin Wang, Haiyun Zhou, and Chundong Wang. Virtual machine-based intrusion detection system framework in cloud computing environment. In Journal of Computers, October 2012.

[71] Hui Wang, Canturk Isci, Lavanya Subramanian, Jongmoo Choi, Depei Qian, and Onur Mutlu. A-DRM: Architecture-aware distributed resource management of virtualized clusters. In Proceedings of the 11th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments (VEE), Istanbul, Turkey, 2015.

[72] Ian H. Witten, Eibe Frank, and Geoffrey Holmes. Data Mining: Practical Machine Learning Tools and Techniques. 3rd Edition.

[73] Zhenyu Wu, Zhang Xu, and Haining Wang. Whispers in the hyper-space: High-speed covert channel attacks in the cloud. In Proc. of the 21st USENIX Conference on Security Symposium (USENIX Security), Bellevue, WA, 2012.

[74] Yunjing Xu, Michael Bailey, Farnam Jahanian, Kaustubh Joshi, Matti Hiltunen, and Richard Schlichting. An exploration of L2 cache covert channels in virtualized environments. In Proc. of the 3rd ACM Workshop on Cloud Computing Security Workshop (CCSW), Chicago, IL, 2011.

[75] Zhang Xu, Haining Wang, and Zhenyu Wu. A measurement study on co-residence threat inside the cloud. In Proc. of the 24th USENIX Security Symposium (USENIX Security), Washington, DC, 2015.

[76] Yuval Yarom and Katrina Falkner. Flush+Reload: A high resolution, low noise, L3 cache side-channel attack. In Proc. of the 23rd USENIX Security Symposium, San Diego, CA, 2014.

[77] Matei Zaharia, Mosharaf Chowdhury, Tathagata Das, Ankur Dave, Justin Ma, Murphy McCauly, Michael J. Franklin, Scott Shenker, and Ion Stoica. Resilient distributed datasets: A fault-tolerant abstraction for in-memory cluster computing. In Proceedings of NSDI, San Jose, CA, 2012.

[78] Yinqian Zhang, Ari Juels, Alina Oprea, and Michael K. Reiter. HomeAlone: Co-residency detection in the cloud via side-channel analysis. In Proc. of the IEEE Symposium on Security and Privacy, Oakland, CA, 2011.

[79] Yinqian Zhang, Ari Juels, Michael K. Reiter, and Thomas Ristenpart. Cross-tenant side-channel attacks in PaaS clouds. In Proc. of the ACM SIGSAC Conference on Computer and Communications Security (CCS), Scottsdale, AZ, 2014.

[80] Yinqian Zhang, Ari Juels, Michael K. Reiter, and Thomas Ristenpart. Cross-VM side channels and their use to extract private keys. In Proceedings of the ACM Conference on Computer and Communications Security (CCS), Raleigh, NC, 2012.

[81] Yinqian Zhang and Michael K. Reiter. Düppel: Retrofitting commodity operating systems to mitigate cache side channels in the cloud. In Proc. of the ACM Conference on Computer and Communications Security (CCS), Berlin, Germany, 2013.

[82] Fangfei Zhou, Manish Goel, Peter Desnoyers, and Ravi Sundaram. Scheduler vulnerabilities and coordinated attacks in cloud computing. J. Comput. Secur., 21(4):533-559, July 2013.

[83] Jieming Zhu, Pinjia He, Zibin Zheng, and Michael R. Lyu. Towards online, accurate, and scalable QoS prediction for runtime service adaptation. In Proc. of the IEEE International Conference on Distributed Computing Systems (ICDCS), Madrid, Spain, 2014.

[84] Sergey Zhuravlev, Sergey Blagodurov, and Alexandra Fedorova. Addressing shared resource contention in multicore processors via scheduling. In Proc. of the Fifteenth Edition of Architectural Support for Programming Languages and Operating Systems (ASPLOS), Pittsburgh, PA, 2010.


Figure 9: Detection accuracy as a function of the pressure victims place in various shared resources (L1-i, LLC, CPU, memory capacity, network bandwidth, disk bandwidth).

Figure 10: Sensitivity to (a) profiling frequency, (b) adversarial VM size, and (c) profiling benchmarks.

Sensitivity analysis. Finally, we examine how design decisions in Bolt affect detection. Figure 10a shows how accuracy changes as profiling frequency decreases. For profiling intervals beyond 30 sec, accuracy drops rapidly. If profiling only occurs every 5 minutes, almost half the victims are incorrectly identified. Note that if workloads are long-running and mostly-stable, profiling can be less frequent without such an impact on accuracy. Figure 10b shows accuracy as a function of the adversarial VM's size. We examine sizes offered as on-demand instances by EC2 [1]. If the adversary has fewer than 4 vCPUs, its resources are insufficient to create enough contention to capture the co-residents' pressure. Accuracy continues to grow for larger sizes; however, the larger the adversarial instance is, the less likely it will be co-scheduled with other VMs, nullifying the value of Bolt. Finally, Figure 10c shows accuracy as a function of the number of microbenchmarks used for profiling. A single benchmark is not sufficient to fingerprint the characteristics of a workload; however, using more than 3 benchmarks has diminishing returns in accuracy. Unless otherwise specified, we use 20 sec profiling intervals, 4-vCPU adversarial VMs, and 2 benchmarks for initial profiling.

3.5 Limitations

While Bolt can accurately detect most frequently-run workloads, it has certain limitations. First, when no job shares a core with the adversary, the system assumes a linear relation between the co-residents' resource pressure in uncore resources. While this is true in some cases, it may not be generally accurate. We will explore alternative ways of distinguishing co-residents in future work. Second, the system cannot differentiate between multiple jobs running in a single instance and multiple instances running one job each. While this does not affect detection, if a covert-channel attack is aimed at a particular user, more accurate information would increase the adversary's leverage. Similarly, Bolt cannot currently distinguish between two copies of the same job sharing a platform at low load and one copy of the job that runs at higher load.

4. Cloud User Study

We now evaluate Bolt in a real-world environment with multiple users and unknown applications. We conducted a multi-user study on Amazon EC2, subject to the following rules:

• We have engaged 20 independent users from two academic institutions (Cornell University and Stanford University), provided them with access to a shared cluster of 200 c3.8xlarge instances on EC2, with 32 vCPUs and 60GB of memory each, and asked them to submit several applications of their choosing.

• On each instance, we maintain a 4-vCPU VM for Bolt. All other resources are made available to the users as VMs.

• Each user submits one or more applications, and all users have equal priority, i.e., no application will be preempted to accommodate the job of another user.

• Users were instructed to behave amenably, i.e., to avoid consuming disproportionate resources, so that the system can observe a representative job sample from all users.

• Users were also advised to pin their applications to specific cores to eliminate interference from the OS scheduler. Physical cores can be shared across jobs; however, each vCPU (hardware thread) is dedicated to a single application.

• Users can optionally select the instance(s) they want to launch their applications on. We provide a spreadsheet with a list of all available instances and their utilization at each point in time. This information, as well as any a priori information on the type, number, and characteristics of submitted applications, is concealed from Bolt.

• If a user does not select an instance, a least-loaded scheduler selects the VM with the most available cores and memory.
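The fallback placement policy in the last rule can be sketched as follows. This is an illustrative reconstruction, not the study's actual scheduler code; the instance fields and tie-breaking on memory are assumptions consistent with the rule above.

```python
# Illustrative least-loaded scheduler: pick the instance with the most
# available vCPUs, breaking ties on available memory.

def least_loaded(instances):
    """instances: list of dicts with 'free_vcpus' and 'free_mem_gb'."""
    return max(instances, key=lambda i: (i["free_vcpus"], i["free_mem_gb"]))

fleet = [
    {"name": "i-01", "free_vcpus": 6,  "free_mem_gb": 12},
    {"name": "i-02", "free_vcpus": 20, "free_mem_gb": 40},
    {"name": "i-03", "free_vcpus": 20, "free_mem_gb": 55},
]
print(least_loaded(fleet)["name"])  # i-03: most free cores, then most memory
```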

Once users start launching jobs in the cluster, we instantiate Bolt on each machine and activate it periodically to detect the type, characteristics, and number of co-scheduled applications. The entire experiment lasts approximately 4 hours, beyond which point all instances are terminated. Figure 11 shows the PDF of application types the different users launched, provided by the users after the completion of the experiment. Different colors correspond to different users. A total of 436 jobs were submitted to the cluster. The application mix includes analytics jobs in Hadoop and Spark, scientific benchmarks (Parsec, Bioparallel), hardware synthesis tools (Cadence, VivadoHLS), multicore (zsim) and n-body simulations, email clients, web browsers, web servers (http), music and video streaming services, and databases, among others. Each job may take one or more vCPUs. Even though

Figure 11: Probability distribution function of applications launched in the cloud study, per user. Label legend: 1: hadoop, 2: spark, 3: email, 4: browser, 5: cadence, 6: zsim, 7: video, 8: latex, 9: ML Python, 10: make, 11: mem$d, 12: http server, 13: spec, 14: matlab, 15: mysql, 16: vivado, 17: parsec, 18: vim, 19: scala, 20: php, 21: postgres, 22: musicStream, 23: minebench, 24: n-body sim, 25: ppt, 26: OS img, 27: pdfview, 28: scons, 29: du -h, 30: cr/del cgroup, 31: bioparallel, 32: storm, 33: cpu burn, 34: audacity, 35: javascript, 36: create VMs, 37: html, 38: cassandra, 39: mongoDB, 40: mkdir, 41: cp/mv, 42: sirius, 43: oProfile, 44: dwnld LF, 45: rsync, 46: ping, 47: photoshop, 48: ssh, 49: rm, 50: skype, 51: zipkin, 52: graphX, 53: ix.

Figure 12: Detection accuracy of (a) application labels and (b) resource characteristics; (c) number of jobs per instance over time.

Figure 11 groups applications by framework, the individual logic and datasets differ between jobs in the same framework. Note that we have not updated the training set for this study; it consists of the same 120 applications described in Section 3.4.

Figure 12a shows the number of applications that were correctly identified by name across the different categories. In total, 277 jobs are correctly labeled. As expected, Bolt could not assign a label to application types it has not seen before, such as email clients or image editing applications. However, even when Bolt cannot label an application, it can still identify the resources it is sensitive to, as shown in Figure 12b. For 385 out of 436 jobs, Bolt correctly identifies the job's resource characteristics. For performance attacks, such as the ones described in Section 5, this information is sufficient to dramatically degrade the performance of a victim application. Finally, Figure 12c shows the number of co-scheduled applications per instance throughout the duration of the experiment. The bottom 14 instances remained unused; for the remaining 186 instances, the number of active applications ranges from 1 to 6, with each job taking one or more vCPUs. The majority of misclassified applications correspond to instances with 5 or more concurrently active jobs.

5. Security Attacks

Bolt makes several cloud attacks practical and difficult to detect. We discuss three possible attacks that leverage the information obtained through detection.

5.1 Internal Denial of Service Attack

Attack setting. Denial of service attacks hurt the performance of a victim service by overloading its resources [14, 2, 51]. In cloud settings specifically, they can be categorized in two types: external and internal (or host-based) attacks. External attacks are the most conventional form of DoS [51, 13, 22]. These attacks utilize external servers to direct excessive traffic to the victims, flooding their resources and hurting their availability. External DoS attacks affect mostly PaaS and SaaS systems and include IP spoofing, synchronization (SYN) flooding, smurf, buffer overflow, land, and teardrop attacks.

In contrast, internal DoS attacks take advantage of IaaS and PaaS cloud multi-tenancy to launch adversarial programs on the same host as the victim and degrade its performance [14, 60, 34, 52]. For example, Ristenpart et al. [60] showed how an adversarial user can leverage the IP naming conventions of IaaS clouds to locate a victim VM and degrade its performance. Cloud providers are starting to build defenses against such attacks. For example, EC2 offers autoscaling that will increase the instances of a service under heavy resource usage. Similarly, there is related work on mechanisms that detect saturation in memory bandwidth

Figure 13: Latency and utilization with Bolt and a naïve DoS attack that saturates CPU resources.

and can trigger VM migration [71]. This means that DoS attacks that simply overload a physical host are ineffective when such defenses are in place. We focus on host-based DoS attacks and make them resilient against such defenses by avoiding resource saturation. We leverage the information obtained by Bolt to construct custom interference programs that stress the resources the victim is most sensitive to, without overloading the host.

Bolt creates custom contentious workloads by combining the microbenchmarks used to measure the resource pressure of the victim. Since these microbenchmarks are tunable, Bolt configures their intensity to a higher point than their measured pressure ci during detection. For example, if a victim is detected as memcached, and its most critical resources are the L1-i cache (81% pressure) and the last level cache (LLC) (78% pressure), Bolt uses the two microbenchmarks for L1-i and LLC at a higher intensity than what memcached can tolerate.

Impact. We use the DoS attack setting above against the 108 applications of the controlled experiment (Section 3.4). The DoS attack results in degraded performance by 2.2x on average, and up to 9.8x, in terms of execution time. Degradation is much more pronounced for interactive workloads like key-value stores, with tail latency increasing by 8-140x, a dramatic impact for applications with strict tail latency SLAs.
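The attack construction above can be sketched as follows. This is a hedged illustration, not Bolt's code: the resource names, the number of selected resources, and the intensity margin are assumptions; only the memcached pressure values come from the text.

```python
# Sketch: pick the victim's most critical resources by measured pressure
# c_i (0-100), and run each matching microbenchmark at an intensity above
# what the victim was measured to tolerate, capped at 100%.

def build_attack(pressure, top_k=2, margin=10):
    critical = sorted(pressure, key=pressure.get, reverse=True)[:top_k]
    return {r: min(100, pressure[r] + margin) for r in critical}

# Example from the text: memcached with 81% L1-i and 78% LLC pressure.
victim = {"l1i": 81, "llc": 78, "membw": 35, "netbw": 60, "diskbw": 5}
print(build_attack(victim))  # {'l1i': 91, 'llc': 88}
```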

We now examine the impact of the DoS attack on utilization. If it translates to resource saturation, there is a high probability that the DoS will be detected and the victim migrated to a new machine. The experimental cluster supports live migration. If CPU utilization (sampled every 1 sec) exceeds 70%, the victim is migrated to an unloaded host. Figure 13 compares the tail latency and CPU utilization with Bolt to that of a naïve DoS that saturates the CPU through a compute-intensive kernel. We focus on a single victim VM running memcached. The overhead of migration (time between initiating migration and latency returning to normal levels) for this VM is 8 sec. Performance degradation is similar for both systems until t=80sec, at which point the victim is migrated to a new host, due to the compute-intensive kernel causing utilization to exceed 70%. While during migration performance continues to degrade, once the VM resumes in the new server, latency returns to nominal levels. In contrast, Bolt keeps utilization low and impacts the victim's performance beyond t=80sec.
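The migration defense described above reduces to a simple threshold test, which can be sketched as follows (the 70% threshold and 1-sec sampling are from the text; the sample traces are illustrative, not measured data):

```python
# Sketch of the defense: per-second CPU utilization samples trigger live
# migration when any reading exceeds 70%. A naive DoS trips the trigger;
# Bolt's targeted contention stays under it.

MIGRATION_THRESHOLD = 70.0  # percent

def should_migrate(samples):
    """samples: per-second CPU utilization readings for one host."""
    return any(u > MIGRATION_THRESHOLD for u in samples)

naive_dos = [40, 65, 92, 95]   # compute-bound kernel saturates the CPU
bolt_dos  = [35, 42, 38, 44]   # targeted contention, low utilization
print(should_migrate(naive_dos), should_migrate(bolt_dos))  # True False
```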

5.2 Resource Freeing Attack

Attack setting. The goal of a resource freeing attack (RFA) is to modify the workload of a victim such that it frees up resources for the adversary, improving its performance [67, 66]. The adversarial VM consists of two components: a beneficiary and a helper. The beneficiary is the program whose performance the attacker wants to improve, and the helper is the program that forces the victim to yield its resources. The RFA works by adding load in the victim's critical resources, causing other resources to free up. For example, if the victim is a memory-bound Spark job, introducing additional memory traffic will result in Spark stalling in memory and lessening its pressure in the other resources, until it can reclaim its required memory bandwidth. When launched in a public cloud, the victim of an RFA ends up paying more and achieving worse performance compared to running in isolation.

We now create a proof of concept RFA for a webserver, a network-bound Hadoop job, and a memory-bound Spark job. Launching RFAs requires in-depth knowledge of the victim's resource requirements [67], which Bolt provides by identifying the victim's dominant resource. Once the dominant resource is detected, the runtime applies a custom helper program that saturates this critical resource. For the webserver, the helper is a CPU-intensive benchmark launching CGI requests, causing the victim to saturate its CPU usage, servicing fewer real user requests and freeing up cache resources. For the Hadoop job, we use a network-intensive benchmark similar to iperf, which saturates network bandwidth and frees up CPU and memory resources for the beneficiary. Similarly, for Spark, we use a streaming memory benchmark that slows down k-means and frees network and compute resources for the adversary. The selection of the beneficiary is of lesser importance; without loss of generality, we select mcf, a CPU-intensive benchmark from SPECCPU2006. The same methodology can be used with other beneficiaries, conditioned to their critical resource not overlapping with the victim's.

Impact. Table 2 shows the performance degradation for the three victim applications and the improvement in execution time for the beneficiary. The webserver suffers the most in terms of queries per second, as the helper's CGI requests pollute its cache, preventing it from servicing legitimate user requests. Hadoop and Spark experience significant degradations in execution time due to network and memory bandwidth saturation, respectively. Execution time for mcf improves by 16-38%, benefiting from the victim stalling in its critical resource to improve its cache and CPU usage.
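The dominant-resource-to-helper mapping above can be sketched as a simple lookup. The helper names are illustrative paraphrases of the benchmarks described in the text, not Bolt identifiers.

```python
# Sketch: map the victim's dominant resource (as detected by Bolt) to
# the RFA helper that saturates it, per the proof of concept above.

HELPERS = {
    "cpu":        "cgi_request_storm",   # webserver: CPU-intensive CGI load
    "network_bw": "iperf_like_flood",    # Hadoop: saturate network bandwidth
    "memory_bw":  "streaming_memory",    # Spark: saturate DRAM bandwidth
}

def pick_helper(dominant_resource):
    return HELPERS[dominant_resource]

print(pick_helper("network_bw"))  # iperf_like_flood
```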

5.3 VM Co-residency Detection

Attack setting. Sections 3.4 and 4 showed that we can identify the applications sharing a cloud infrastructure. However, a malicious user is rarely interested in a random service running on a public cloud. More often they target a specific

Victim                        Beneficiary      Target
App                Perf       App    Perf      Resource
Apache Webserver   -64% (QPS) mcf    +24%      CPU
Hadoop (SVM)       -36% (Exec) mcf   +16%      Network BW
Spark (k-means)    -52% (Exec) mcf   +38%      Memory BW

Table 2: RFA impact on the victims and beneficiaries.

workload, e.g., a competitive e-commerce site. Therefore, they need to pinpoint where the target resides in a practical manner. This requires a launch strategy and a mechanism for co-residency detection [69]. The attack is practical if the target is located with high confidence, in reasonable time, and with modest resource costs. Bolt enables this attack to be carried out in a practical way, and remains resilient against defense mechanisms such as virtual private clouds (VPCs), which make internal IP addresses private to a single tenant. Once a target VM is located, the adversary can launch RFA or DoS attacks as previously described. Co-residency detection attacks rely on leakage of logical information, e.g., IP address, or on observing the impact of a side-channel attack due to resource contention. Bolt relies on a variation of the latter approach.

Assume a system of N servers. A target victim user launches k VMs. The adversary launches n malicious VMs. Instances are launched simultaneously to avoid malicious-malicious instance co-residency [69]. The probability that at least one adversarial VM will be co-scheduled with a victim instance is P(f) = 1 - (1 - k/N)^n. The adversary then uses a two-process detection scheme. The sender is a process that creates contention to degrade the victim's performance. The receiver is the process that detects this degradation, running either on the same host as the victim (cooperative detection) or externally (uncooperative detection). We assume an uncooperative victim, which is the more general case. Bolt works as the sender to determine the type of co-residents on each sampled host and identify any of the desired type. Assume m VMs of the desired type have been detected in the sample set. Bolt now needs to prune down this space to VM(s) from the specific target victim. It launches an external receiver, aiming at the victim over a public channel, e.g., HTTP or a key-value protocol, to ping the victim service. This is the only point of communication with the victim. At the same time, the sender incurs contention in the resources the victim is most sensitive to. If the sender and victim are co-residents, the receiver's requests will be slower due to contention. Bolt quickly prunes down the sampled VMs by determining the type of co-residents. In a large cluster, this reduces the time to pinpoint a victim and the resource costs for the adversary. It also increases the confidence in the detection in the presence of noise from other co-residents.

Impact. We evaluate co-residency detection on the same 40-node cluster. The victim is a SQL server, instantiating a single VM. The cluster also hosts 7 other VMs running SQL. Other active VMs host key-value stores, Hadoop, and Spark jobs. Bolt launches 10 senders simultaneously on
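The co-residency probability above is straightforward to evaluate. A minimal sketch; the example parameters (N=40, k=1, n=10) are assumptions mirroring the cluster setup described in the Impact paragraph, not values stated by the formula itself.

```python
# P(f) = 1 - (1 - k/N)^n: probability that at least one of n adversarial
# VMs lands on a host with one of the victim's k VMs, out of N servers.

def coresidency_prob(N, k, n):
    return 1 - (1 - k / N) ** n

# e.g., one victim VM in a 40-node cluster, 10 adversarial VMs:
p = coresidency_prob(N=40, k=1, n=10)
print(round(p, 3))  # 0.224
```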

randomly-selected machines (P(f) = ) and detects 3 VMs running SQL in the sample set. It then introduces memory interference, as the receiver initiates a large number of SQL queries. While these queries have 8.16 msec mean latency without interference, latency now becomes 26.14 msec. Given a ~3x latency increase, we can conclude that one of the co-residents is the victim SQL server. Detection required 6 sec from instantiation to receiver detection, and 11 adversary VMs.
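The receiver's decision implied above reduces to a latency-ratio test, sketched here under assumptions: the 2x decision threshold is illustrative (the text only reports that a ~3x inflation was judged conclusive), and the latencies are the text's 8.16 and 26.14 msec readings.

```python
# Sketch: flag co-residency when the victim's request latency under the
# sender's contention inflates well beyond the uncontended baseline.

def coresident(base_ms, contended_ms, threshold=2.0):
    return contended_ms / base_ms >= threshold

print(coresident(8.16, 26.14))  # True: a ~3.2x slowdown under contention
```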

6. Improving Security via Resource Isolation

Since interference is at the heart of Bolt's detection methodology, isolation mechanisms that attenuate contention should reduce its accuracy. We first evaluate to what extent existing isolation techniques mitigate security vulnerabilities, and then highlight trade-offs between security, performance, and utilization. We use the same setting as in the controlled experiment of Section 3.4. We evaluate 4 resource-specific isolation techniques and 3 settings for OS-level isolation mechanisms: baremetal, containerized, and virtualized. The baremetal system does not employ any OS-level isolation. The containerized setup uses Linux containers (lxc) and assigns cores to applications via cpuset cgroups. Both containers and VMs constrain memory capacity. Baremetal experiments do not enforce memory capacity allocations, and the Linux scheduler is free to float applications across cores.

The first resource-specific isolation technique is thread pinning to physical cores, to constrain interference from scheduling actions like context switching. The number of cores an application is allocated can change dynamically, and is limited by how fast Linux can migrate tasks, typically in the tens of milliseconds.

For network isolation, we use the outbound network bandwidth partitioning capabilities of Linux's traffic control. Specifically, we use the qdisc [8] scheduler with the hierarchical token bucket queueing discipline (HTB) to enforce egress bandwidth limits. The limits are set to the maximum traffic burst rate for each application to avoid contention (ceil parameter in the HTB interface). Ingress network bandwidth isolation has been extensively studied in previous work [36]; these approaches can be applied here as well.
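An HTB setup of the kind described above might be programmed as follows. The sketch only composes the tc commands as strings (it does not execute them); the device name and class-ID scheme are assumptions, and per the text, rate and ceil are both set to each application's maximum burst rate.

```python
# Sketch: build tc/HTB commands that cap each application's traffic
# class at its measured maximum burst rate (HTB 'rate' and 'ceil').

def htb_commands(dev, app_rates_mbit):
    cmds = [f"tc qdisc add dev {dev} root handle 1: htb"]
    for i, (app, mbit) in enumerate(sorted(app_rates_mbit.items()), start=10):
        cmds.append(
            f"tc class add dev {dev} parent 1: classid 1:{i} "
            f"htb rate {mbit}mbit ceil {mbit}mbit  # {app}"
        )
    return cmds

for c in htb_commands("eth0", {"hadoop": 400, "memcached": 900}):
    print(c)
```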

For DRAM bandwidth isolation, there is no commercially available partitioning mechanism. To enforce bandwidth isolation, we use the following approach: we monitor the DRAM bandwidth usage of each application through performance counters [45], and modify the scheduler to only colocate jobs on machines that can accommodate their aggregate peak memory bandwidth requirements. This requires extensive application knowledge, and is used simply to highlight the benefits of DRAM bandwidth isolation.
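The scheduler-side admission check described above can be sketched as a one-line capacity test; the bandwidth figures in the example are illustrative, not measurements from the paper.

```python
# Sketch: colocate a job only if the machine can accommodate the
# aggregate peak DRAM bandwidth of all residents plus the newcomer.

def can_colocate(resident_peaks_gbs, new_peak_gbs, machine_bw_gbs):
    return sum(resident_peaks_gbs) + new_peak_gbs <= machine_bw_gbs

# 60 GB/s machine, residents peak at 25 + 20 GB/s:
print(can_colocate([25, 20], 10, 60))  # True: 55 <= 60
print(can_colocate([25, 20], 20, 60))  # False: 65 > 60
```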

Finally, for last level cache (LLC) isolation, we use the Cache Allocation Technology (CAT) available in recent Intel chips [11]. CAT partitions the LLC in ways, which in

Figure 14: Detection accuracy with isolation techniques (none, thread pinning, +net BW partitioning, +mem BW partitioning, +cache partitioning, +core isolation), for baremetal, Linux containers, and virtual machines.

a highly-associative cache allows for non-overlapping partitions at the granularity of a few percent of the LLC capacity. Each co-resident is allocated one partition, configured to its current capacity requirements [57]. Partitions can be resized at runtime by reprogramming specific low-level registers (MSRs); changes take effect after a few milliseconds.
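Since CAT allocates by ways, a capacity share maps to a contiguous run of way bits (CAT masks must be contiguous). A hedged sketch of that mapping; the 20-way LLC size and the layout of partitions are assumptions for illustration, not the paper's configuration.

```python
# Sketch: translate a capacity fraction into a contiguous CAT-style
# way bitmask, starting at a given way of a total_ways-way LLC.

import math

def way_mask(start_way, capacity_fraction, total_ways=20):
    ways = max(1, math.ceil(capacity_fraction * total_ways))
    return ((1 << ways) - 1) << start_way

# A co-resident sized at 25% of a 20-way LLC, starting at way 0:
print(hex(way_mask(0, 0.25)))  # 0x1f: ways 0-4 (5 of 20 ways)
```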

We add one isolation mechanism at a time in the three system settings (baremetal, containers, VMs). Figure 14 shows the impact of isolation techniques on Bolt's detection accuracy. As expected, when no isolation is used, baremetal allows for a significantly higher detection accuracy than container- and VM-based systems, mostly due to the latter constraining core and memory capacity usage. As a result, introducing thread pinning benefits baremetal the most, since it reduces core contention. It also benefits container- and VM-based setups to a lesser degree, by eliminating unpredictability introduced by the Linux scheduler (e.g., context switching) [40]. The dominant resource of each application determines which isolation technique benefits it the most. Thread pinning mostly benefits workloads bound by on-chip resources, such as L1/L2 caches and cores. Adding network bandwidth partitioning lowers detection accuracy for all three settings almost equally. It primarily benefits network-bound workloads, for which network interference conveys the most information for detection. Memory bandwidth isolation further reduces accuracy by 10% on average, benefiting jobs dominated by DRAM traffic. Finally, cache partitioning has the most dramatic reduction in accuracy, especially for LLC-bound workloads. We attribute this to the importance cache pressure has as a detection signal. The number of co-residents also affects the extent to which isolation helps. The more co-scheduled applications exist per machine, the more isolation techniques degrade accuracy, as they make distinguishing between co-residents harder.

Unfortunately, even when all the techniques are on, accuracy is still 50%, for two reasons: current techniques are not fine-grain enough to allow strict and scalable isolation, and core resources (L1/L2 caches, CPU) are prone to interference due to contending hyperthreads [61]. To evaluate the latter hypothesis, we modify the scheduler such that hyperthreads of different jobs are never scheduled on the same physical core; e.g., if a job needs 7 vCPUs, it is allocated 4 dedicated physical cores. The grey bars of Figure 14 show the detection accuracy. Baremetal instances still allow certain applications to be detected, but for containerized and virtualized settings accuracy drops to 14%. The remaining accuracy corresponds to disk-heavy workloads. Improving security, however, comes at a performance penalty of 34% on average in execution time, as threads of the same job are forced to contend with one another. Alternatively, users can overprovision their resource reservations to avoid degradation, which results in a 45% drop in utilization. This means that the cloud provider cannot leverage CPU idleness to share machines, decreasing the cost benefits of cloud computing. Note that enforcing core isolation alone is also not sufficient, as it allows a detection accuracy of 46%.

Discussion. The previous analysis highlights a design problem with current datacenter platforms. Traditional multicores are prone to contention, which will only worsen as more cores are integrated in each server and multi-tenancy becomes more pronounced. Existing isolation techniques are insufficient to mitigate security vulnerabilities, and techniques that provide reasonable security guarantees sacrifice performance or cost efficiency through low utilization. This highlights the need for new fine-grain and coordinated isolation techniques that guarantee security at high utilization for shared resources.
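The hyperthread-exclusion policy evaluated above amounts to rounding each job's vCPU demand up to whole physical cores. A minimal sketch, assuming 2 hardware threads per core as in the text's example:

```python
# Sketch: never let hyperthreads of different jobs share a physical
# core, so a job's vCPU demand is rounded up to whole cores.

import math

def dedicated_cores(vcpus, threads_per_core=2):
    return math.ceil(vcpus / threads_per_core)

# The text's example: a 7-vCPU job gets 4 dedicated physical cores.
print(dedicated_cores(7))  # 4
```

This rounding is exactly where the utilization loss comes from: an odd vCPU count strands one hardware thread per job.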

7. Conclusions

We have presented Bolt, a practical system that uses online data mining techniques to identify the type and characteristics of applications running on shared cloud systems, and enables practical attacks that degrade their performance. In a 40-server cluster, Bolt correctly identifies 87 out of 108 workloads, and degrades tail latency by up to 140x. We also used Bolt in a user study on EC2 to detect the characteristics of unknown jobs submitted by multiple users. Finally, we show that while existing isolation techniques are helpful, they are not sufficient to mitigate such attacks. Bolt reveals real, easy-to-exploit threats in public clouds. We hope that this work will motivate cloud providers and computer scientists to develop and deploy stricter isolation primitives in cloud systems.

Acknowledgements

We sincerely thank all the participants in the user study for their time and effort. We also thank Daniel Sanchez, Ed Suh, Grant Ayers, Mingyu Gao, Ana Klimovic, and the rest of the MAST group, as well as the anonymous reviewers, for their feedback on earlier versions of this manuscript. This work was supported by the Stanford Platform Lab, NSF grant CNS-1422088, and a John and Norma Balen Sesquicentennial Faculty Fellowship.

References[1] Amazon ec2 httpawsamazoncomec2

[2] Aman Bakshi and Yogesh B Dujodwala Securing cloudfrom ddos attacks using intrusion detection system in virtualmachine In Proc of the Second International Conference onCommunication Software and Networks (ICCSN) 2010

[3] Luiz Barroso and Urs Hoelzle The Datacenter as a Com-puter An Introduction to the Design of Warehouse-Scale Ma-chines MC Publishers 2009

[4] Naomi Benger Joop van de Pol Nigel P Smart and YuvalYarom rdquoooh aah just a little bitrdquo A small amount of sidechannel can go a long way In Proc of the InternationalCryptographic Hardware and Embedded Systems Workshop(CHES) Busan South Korea 2014

[5] Major Bhadauria and Sally A McKee An approach toresource-aware co-scheduling for cmps In Proc of the 24thACM International Conference on Supercomputing (ICS)Tsukuba Japan 2010

[6] Leon Bottou Large-scale machine learning with stochas-tic gradient descent In Proceedings of the InternationalConference on Computational Statistics (COMPSTAT) ParisFrance 2010

[7] Eric Brewer Kubernetes The path to cloud native httpgooglQgkzYB SOCC Keynote August 2015

[8] Martin A. Brown. Traffic control howto. http://linux-ip.net/articles/Traffic-Control-HOWTO/.

[9] Robin Burke. Hybrid recommender systems: Survey and experiments. User Modeling and User-Adapted Interaction, 12(4):331–370, November 2002.

[10] Apache Cassandra. http://cassandra.apache.org.

[11] Intel® 64 and IA-32 Architecture Software Developer's Manual, vol. 3B: System Programming Guide, Part 2. September 2014.

[12] Ludmila Cherkasova, Diwaker Gupta, and Amin Vahdat. Comparison of the three CPU schedulers in Xen. SIGMETRICS Perform. Eval. Rev., 35(2):42–51, September 2007.

[13] Scott A. Crosby and Dan S. Wallach. Denial of service via algorithmic complexity attacks. In Proceedings of the 12th USENIX Security Symposium, Washington, DC, 2003.

[14] Marwan Darwish, Abdelkader Ouda, and Luiz Fernando Capretz. Cloud-based DDoS attacks and defenses. In Proc. of i-Society, Toronto, ON, 2013.

[15] Christina Delimitrou and Christos Kozyrakis. iBench: Quantifying Interference for Datacenter Workloads. In Proceedings of the 2013 IEEE International Symposium on Workload Characterization (IISWC), Portland, OR, September 2013.

[16] Christina Delimitrou and Christos Kozyrakis. Paragon: QoS-Aware Scheduling for Heterogeneous Datacenters. In Proceedings of the Eighteenth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), Houston, TX, USA, 2013.

[17] Christina Delimitrou and Christos Kozyrakis. QoS-Aware Scheduling in Heterogeneous Datacenters with Paragon. In ACM Transactions on Computer Systems (TOCS), Vol. 31, Issue 4, December 2013.

[18] Christina Delimitrou and Christos Kozyrakis. Quality-of-Service-Aware Scheduling in Heterogeneous Datacenters with Paragon. In IEEE Micro Special Issue on Top Picks from the Computer Architecture Conferences, May/June 2014.

[19] Christina Delimitrou and Christos Kozyrakis. Quasar: Resource-Efficient and QoS-Aware Cluster Management. In Proceedings of the Nineteenth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), Salt Lake City, UT, USA, 2014.

[20] Christina Delimitrou and Christos Kozyrakis. HCloud: Resource-Efficient Provisioning in Shared Cloud Systems. In Proceedings of the Twenty First International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), April 2016.

[21] Christina Delimitrou, Daniel Sanchez, and Christos Kozyrakis. Tarcil: Reconciling Scheduling Speed and Quality in Large Shared Clusters. In Proceedings of the Sixth ACM Symposium on Cloud Computing (SOCC), August 2015.

[22] Jake Edge. Denial of service via hash collisions. http://lwn.net/Articles/474912/, January 2012.

[23] Benjamin Farley, Ari Juels, Venkatanathan Varadarajan, Thomas Ristenpart, Kevin D. Bowers, and Michael M. Swift. More for your money: Exploiting performance heterogeneity in public clouds. In Proc. of the ACM Symposium on Cloud Computing (SOCC), San Jose, CA, 2012.

[24] Alexandra Fedorova, Margo Seltzer, and Michael D. Smith. Improving performance isolation on chip multiprocessors via an operating system scheduler. In Proceedings of the 16th Intl. Conference on Parallel Architecture and Compilation Techniques (PACT), Brasov, Romania, 2007.

[25] Alexander Felfernig and Robin Burke. Constraint-based recommender systems: Technologies and research issues. In Proceedings of the ACM International Conference on Electronic Commerce (ICEC), Innsbruck, Austria, 2008.

[26] Brad Fitzpatrick. Distributed caching with memcached. In Linux Journal, Volume 2004, Issue 124, 2004.

[27] Oded Goldreich and Rafail Ostrovsky. Software protection and simulation on oblivious RAMs. J. ACM, 43(3):431–473, May 1996.

[28] Google container engine. https://cloud.google.com/container-engine.

[29] Asela Gunawardana and Christopher Meek. A unified approach to building hybrid recommender systems. In Proc. of the Third ACM Conference on Recommender Systems (RecSys), New York, NY, 2009.

[30] Sanchika Gupta and Padam Kumar. VM profile based optimized network attack pattern detection scheme for DDoS attacks in cloud. In Proc. of SSCC, Mysore, India, 2013.

[31] Yi Han, Tansu Alpcan, Jeffrey Chan, and Christopher Leckie. Security games for virtual machine allocation in cloud computing. In 4th International Conference on Decision and Game Theory for Security, Fort Worth, TX, 2013.

[32] Amir Herzberg, Haya Shulman, Johanna Ullrich, and Edgar Weippl. Cloudoscopy: Services discovery and topology mapping. In Proceedings of the ACM Workshop on Cloud Computing Security Workshop (CCSW), Berlin, Germany, 2013.

[33] Ben Hindman, Andy Konwinski, Matei Zaharia, Ali Ghodsi, Anthony D. Joseph, Randy Katz, Scott Shenker, and Ion Stoica. Mesos: A platform for fine-grained resource sharing in the data center. In Proceedings of NSDI, Boston, MA, 2011.

[34] Jingwei Huang, David M. Nicol, and Roy H. Campbell. Denial-of-service threat to Hadoop/YARN clusters with multi-tenancy. In Proc. of the IEEE International Congress on Big Data, Washington, DC, 2014.

[35] Alexandru Iosup, Nezih Yigitbasi, and Dick Epema. On the performance variability of production cloud services. In Proceedings of CCGRID, Newport Beach, CA, 2011.

[36] Vimalkumar Jeyakumar, Mohammad Alizadeh, David Mazieres, Balaji Prabhakar, Changhoon Kim, and Albert Greenberg. EyeQ: Practical network performance isolation at the edge. In Proc. of the 10th USENIX Conference on Networked Systems Design and Implementation (NSDI), Lombard, IL, 2013.

[37] Yaakoub El Khamra, Hyunjoo Kim, Shantenu Jha, and Manish Parashar. Exploring the performance fluctuations of HPC workloads on clouds. In Proceedings of CloudCom, Indianapolis, IN, 2010.

[38] Krzysztof C. Kiwiel. Convergence and efficiency of subgradient methods for quasiconvex minimization. In Mathematical Programming (Series A) (Berlin, Heidelberg: Springer), 90(1), pp. 1–25, 2001.

[39] Ruby B. Lee. Rethinking computers for cybersecurity. IEEE Computer, 48(4):16–25, 2015.

[40] Jacob Leverich and Christos Kozyrakis. Reconciling high server utilization and sub-millisecond quality-of-service. In Proceedings of EuroSys, Amsterdam, The Netherlands, 2014.

[41] Host server CPU utilization in Amazon EC2 cloud. http://goo.gl/2LTx4T.

[42] Fangfei Liu, Yuval Yarom, Qian Ge, Gernot Heiser, and Ruby B. Lee. Last-level cache side-channel attacks are practical. In Proc. of the IEEE Symposium on Security and Privacy (S&P), San Jose, CA, 2015.

[43] Fei Liu, Lanfang Ren, and Hongtao Bai. Mitigating cross-VM side channel attack on multiple tenants cloud platform. In Journal of Computers, Vol. 9, No. 4 (2014), 1005–1013, April 2014.

[44] David Lo, Liqun Cheng, Rama Govindaraju, Luiz Andre Barroso, and Christos Kozyrakis. Towards energy proportionality for large-scale latency-critical workloads. In Proceedings of the 41st Annual International Symposium on Computer Architecture (ISCA), Minneapolis, MN, 2014.

[45] David Lo, Liqun Cheng, Rama Govindaraju, Parthasarathy Ranganathan, and Christos Kozyrakis. Heracles: Improving resource efficiency at scale. In Proc. of the 42nd Annual International Symposium on Computer Architecture (ISCA), Portland, OR, 2015.

[46] Mahout. http://mahout.apache.org.

[47] Dave Mangot. EC2 variability: The numbers revealed. http://tech.mangot.com/roller/dave/entry/ec2_variability_the_numbers_revealed.

[48] Jason Mars and Lingjia Tang. Whare-map: heterogeneity in "homogeneous" warehouse-scale computers. In Proceedings of ISCA, Tel-Aviv, Israel, 2013.

[49] Robert Martin, John Demme, and Simha Sethumadhavan. Timewarp: Rethinking timekeeping and performance monitoring mechanisms to mitigate side-channel attacks. In Proceedings of the International Symposium on Computer Architecture (ISCA), Portland, OR, 2012.

[50] David Meisner, Christopher M. Sadler, Luiz Andre Barroso, Wolf-Dietrich Weber, and Thomas F. Wenisch. Power management of online data-intensive services. In Proceedings of the 38th Annual International Symposium on Computer Architecture, pages 319–330, 2011.

[51] Jelena Mirkovic and Peter Reiher. A taxonomy of DDoS attack and DDoS defense mechanisms. ACM SIGCOMM Computer Communication Review (CCR), April 2004.

[52] Thomas Moscibroda and Onur Mutlu. Memory performance attacks: Denial of memory service in multi-core systems. In Proc. of the 16th USENIX Security Symposium, Boston, MA, 2007.

[53] Ripal Nathuji, Aman Kansal, and Alireza Ghaffarkhah. Q-clouds: Managing performance interference effects for QoS-aware clouds. In Proceedings of EuroSys, Paris, France, 2010.

[54] Simon Ostermann, Alexandru Iosup, Nezih Yigitbasi, Radu Prodan, Thomas Fahringer, and Dick Epema. A performance analysis of EC2 cloud computing services for scientific computing. In Lecture Notes on Cloud Computing, Volume 34, pp. 115–131, 2010.

[55] Tao Peng, Christopher Leckie, and Kotagiri Ramamohanarao. Survey of network-based defense mechanisms countering the DoS and DDoS problems. ACM Comput. Surv., 39(1), April 2007.

[56] Diego Perez-Botero, Jakub Szefer, and Ruby B. Lee. Characterizing hypervisor vulnerabilities in cloud computing servers. In Proceedings of the 2013 International Workshop on Security in Cloud Computing (SCC@ASIACCS), Hangzhou, China, 2013.

[57] Moinuddin K. Qureshi and Yale N. Patt. Utility-based cache partitioning: A low-overhead, high-performance, runtime mechanism to partition shared caches. In Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 39), 2006.

[58] Himanshu Raj, Ripal Nathuji, Abhishek Singh, and Paul England. Resource management for isolation enhanced cloud services. In Proc. of the ACM Workshop on Cloud Computing Security (CCSW), Chicago, IL, 2009.

[59] Suhail Rehman and Majd Sakr. Initial findings for provisioning variation in cloud computing. In Proceedings of CloudCom, Indianapolis, IN, 2010.

[60] Thomas Ristenpart, Eran Tromer, Hovav Shacham, and Stefan Savage. Hey, you, get off of my cloud: Exploring information leakage in third-party compute clouds. In Proc. of the ACM Conference on Computer and Communications Security (CCS), Chicago, IL, 2009.

[61] Daniel Sanchez and Christos Kozyrakis. Vantage: Scalable and Efficient Fine-Grain Cache Partitioning. In Proceedings of the 38th Annual International Symposium on Computer Architecture (ISCA-38), San Jose, CA, June 2011.

[62] Jorg Schad, Jens Dittrich, and Jorge-Arnulfo Quiane-Ruiz. Runtime measurements in the cloud: Observing, analyzing, and reducing variance. Proceedings VLDB Endow., 3(1-2):460–471, September 2010.

[63] Malte Schwarzkopf, Andy Konwinski, Michael Abd-El-Malek, and John Wilkes. Omega: flexible, scalable schedulers for large compute clusters. In Proceedings of EuroSys, Prague, Czech Republic, 2013.

[64] Alan Shieh, Srikanth Kandula, Albert Greenberg, and Changhoon Kim. Seawall: Performance isolation for cloud datacenter networks. In Proc. of the USENIX Conference on Hot Topics in Cloud Computing (HotCloud), Boston, MA, 2010.

[65] David Shue, Michael J. Freedman, and Anees Shaikh. Performance isolation and fairness for multi-tenant cloud storage. In Proc. of the 10th USENIX Conference on Operating Systems Design and Implementation (OSDI), Hollywood, CA, 2012.

[66] Dan Tsafrir, Yoav Etsion, and Dror G. Feitelson. Secretly monopolizing the CPU without superuser privileges. In Proc. of the 16th USENIX Security Symposium, Boston, MA, 2007.

[67] Venkatanathan Varadarajan, Thawan Kooburat, Benjamin Farley, Thomas Ristenpart, and Michael Swift. Resource-freeing attacks: Improve your cloud performance (at your neighbor's expense). In Proc. of the Conference on Computer and Communications Security (CCS), Raleigh, NC, 2012.

[68] Venkatanathan Varadarajan, Thomas Ristenpart, and Michael Swift. Scheduler-based defenses against cross-VM side-channels. In Proc. of the 23rd USENIX Security Symposium, San Diego, CA, 2014.

[69] Venkatanathan Varadarajan, Yinqian Zhang, Thomas Ristenpart, and Michael Swift. A placement vulnerability study in multi-tenant public clouds. In Proc. of the 24th USENIX Security Symposium (USENIX Security), Washington, DC, 2015.

[70] Huaibin Wang, Haiyun Zhou, and Chundong Wang. Virtual machine-based intrusion detection system framework in cloud computing environment. In Journal of Computers, October 2012.

[71] Hui Wang, Canturk Isci, Lavanya Subramanian, Jongmoo Choi, Depei Qian, and Onur Mutlu. A-DRM: Architecture-aware distributed resource management of virtualized clusters. In Proceedings of the 11th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments (VEE), Istanbul, Turkey, 2015.

[72] Ian H. Witten, Eibe Frank, and Geoffrey Holmes. Data Mining: Practical Machine Learning Tools and Techniques, 3rd Edition.

[73] Zhenyu Wu, Zhang Xu, and Haining Wang. Whispers in the hyper-space: High-speed covert channel attacks in the cloud. In Proc. of the 21st USENIX Security Symposium (USENIX Security), Bellevue, WA, 2012.

[74] Yunjing Xu, Michael Bailey, Farnam Jahanian, Kaustubh Joshi, Matti Hiltunen, and Richard Schlichting. An exploration of L2 cache covert channels in virtualized environments. In Proc. of the 3rd ACM Workshop on Cloud Computing Security Workshop (CCSW), Chicago, IL, 2011.

[75] Zhang Xu, Haining Wang, and Zhenyu Wu. A measurement study on co-residence threat inside the cloud. In Proc. of the 24th USENIX Security Symposium (USENIX Security), Washington, DC, 2015.

[76] Yuval Yarom and Katrina Falkner. Flush+Reload: a high resolution, low noise, L3 cache side-channel attack. In Proc. of the 23rd USENIX Security Symposium, San Diego, CA, 2014.

[77] Matei Zaharia, Mosharaf Chowdhury, Tathagata Das, Ankur Dave, Justin Ma, Murphy McCauly, Michael J. Franklin, Scott Shenker, and Ion Stoica. Resilient distributed datasets: A fault-tolerant abstraction for in-memory cluster computing. In Proceedings of NSDI, San Jose, CA, 2012.

[78] Yinqian Zhang, Ari Juels, Alina Oprea, and Michael K. Reiter. HomeAlone: Co-residency detection in the cloud via side-channel analysis. In Proc. of the IEEE Symposium on Security and Privacy, Oakland, CA, 2011.

[79] Yinqian Zhang, Ari Juels, Michael K. Reiter, and Thomas Ristenpart. Cross-tenant side-channel attacks in PaaS clouds. In Proc. of the ACM SIGSAC Conference on Computer and Communications Security (CCS), Scottsdale, AZ, 2014.

[80] Yinqian Zhang, Ari Juels, Michael K. Reiter, and Thomas Ristenpart. Cross-VM side channels and their use to extract private keys. In Proceedings of the ACM Conference on Computer and Communications Security (CCS), Raleigh, NC, 2012.

[81] Yinqian Zhang and Michael K. Reiter. Düppel: retrofitting commodity operating systems to mitigate cache side channels in the cloud. In Proc. of the ACM Conference on Computer and Communications Security (CCS), Berlin, Germany, 2013.

[82] Fangfei Zhou, Manish Goel, Peter Desnoyers, and Ravi Sundaram. Scheduler vulnerabilities and coordinated attacks in cloud computing. J. Comput. Secur., 21(4):533–559, July 2013.

[83] Jieming Zhu, Pinjia He, Zibin Zheng, and Michael R. Lyu. Towards online, accurate, and scalable QoS prediction for runtime service adaptation. In Proc. of the IEEE International Conference on Distributed Computing Systems (ICDCS), Madrid, Spain, 2014.

[84] Sergey Zhuravlev, Sergey Blagodurov, and Alexandra Fedorova. Addressing shared resource contention in multicore processors via scheduling. In Proc. of the Fifteenth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), Pittsburgh, PA, 2010.


Figure 11: Probability distribution function of applications launched in the cloud study per user (occurrences per application label, across 20 users). Application labels: 1 hadoop, 2 spark, 3 email, 4 browser, 5 cadence, 6 zsim, 7 video, 8 latex, 9 MLPython, 10 make, 11 mem$d, 12 http server, 13 spec, 14 matlab, 15 mysql, 16 vivado, 17 parsec, 18 vim, 19 scala, 20 php, 21 postgres, 22 musicStream, 23 minebench, 24 n-body sim, 25 ppt, 26 OS img, 27 pdfview, 28 scons, 29 du -h, 30 crdel cgroup, 31 bioparallel, 32 storm, 33 cpu burn, 34 audacity, 35 javascript, 36 create VMs, 37 html, 38 cassandra, 39 mongoDB, 40 mkdir, 41 cpmv, 42 sirius, 43 oProfile, 44 dwnld LF, 45 rsync, 46 ping, 47 photoshop, 48 ssh, 49 rm, 50 skype, 51 zipkin, 52 graphX, 53 ix.

Figure 12: Detection accuracy of (a) application labels and (b) resource characteristics; (c) number of jobs per instance over time.

Figure 11 groups applications by framework; the individual logic and datasets differ between jobs in the same framework. Note that we have not updated the training set for this study; it consists of the same 120 applications described in Section 3.4.

Figure 12a shows the number of applications that were correctly identified by name across the different categories. In total, 277 jobs are correctly labeled. As expected, Bolt could not assign a label to application types it has not seen before, such as email clients or image editing applications. However, even when Bolt cannot label an application, it can still identify the resources it is sensitive to, as shown in Figure 12b. For 385 out of 436 jobs, Bolt correctly identifies the job's resource characteristics. For performance attacks, such as the ones described in Section 5, this information is sufficient to dramatically degrade the performance of a victim application. Finally, Figure 12c shows the number of co-scheduled applications per instance throughout the duration of the experiment. The bottom 14 instances remained unused; for the remaining 186 instances, the number of active applications ranges from 1 to 6, with each job taking one or more vCPUs. The majority of misclassified applications correspond to instances with 5 or more concurrently active jobs.

5. Security Attacks

Bolt makes several cloud attacks practical and difficult to detect. We discuss three possible attacks that leverage the information obtained through detection.

5.1 Internal Denial of Service Attack

Attack setting: Denial of service attacks hurt the performance of a victim service by overloading its resources [14, 2, 51]. In cloud settings specifically, they can be categorized into two types: external and internal (or host-based) attacks. External attacks are the most conventional form of DoS [51, 13, 22]. These attacks utilize external servers to direct excessive traffic to the victims, flooding their resources and hurting their availability. External DoS attacks affect mostly PaaS and SaaS systems, and include IP spoofing, synchronization (SYN) flooding, smurf, buffer overflow, land, and teardrop attacks.

In contrast, internal DoS attacks take advantage of IaaS and PaaS cloud multi-tenancy to launch adversarial programs on the same host as the victim and degrade its performance [14, 60, 34, 52]. For example, Ristenpart et al. [60] showed how an adversarial user can leverage the IP naming conventions of IaaS clouds to locate a victim VM and degrade its performance. Cloud providers are starting to build defenses against such attacks. For example, EC2 offers autoscaling that will increase the instances of a service under heavy resource usage. Similarly, there is related work on mechanisms that detect saturation in memory bandwidth and can trigger VM migration [71]. This means that DoS attacks that simply overload a physical host are ineffective when such defenses are in place. We focus on host-based DoS attacks, and make them resilient against such defenses by avoiding resource saturation. We leverage the information obtained by Bolt to construct custom interference programs that stress the resources the victim is most sensitive to, without overloading the host.

Figure 13: Latency and utilization with Bolt and a naïve DoS attack that saturates CPU resources (99th percentile latency and CPU utilization over time, with detection and migration points marked).

Bolt creates custom contentious workloads by combining the microbenchmarks used to measure the resource pressure of the victim. Since these microbenchmarks are tunable, Bolt configures their intensity to a higher point than their measured pressure c_i during detection. For example, if a victim is detected as memcached and its most critical resources are the L1-i cache (81% pressure) and the last level cache (LLC) (78% pressure), Bolt uses the two microbenchmarks for L1-i and LLC at a higher intensity than what memcached can tolerate.

Impact: We use the DoS attack setting above against the 108 applications of the controlled experiment (Section 3.4). The DoS attack results in degraded performance by 2.2x on average, and up to 9.8x in terms of execution time. Degradation is much more pronounced for interactive workloads like key-value stores, with tail latency increasing by 8-140x, a dramatic impact for applications with strict tail latency SLAs.
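The intensity-tuning step can be sketched as follows. This is a minimal illustration, not the paper's implementation: the resource names and pressure profile are hypothetical, and the margin above the measured pressure is an assumed parameter.

```python
# Sketch: build a custom interference workload from a victim's
# detected pressure profile (hypothetical values on a 0-100 scale,
# one per tunable microbenchmark).
def build_attack(pressure_profile, top_k=2, margin=10):
    """Pick the victim's most contended resources and run the
    corresponding microbenchmarks slightly above that pressure."""
    ranked = sorted(pressure_profile.items(), key=lambda kv: kv[1], reverse=True)
    attack = {}
    for resource, pressure in ranked[:top_k]:
        # Cap intensity at 100 so the host is never saturated,
        # which would trigger migration-based defenses.
        attack[resource] = min(pressure + margin, 100)
    return attack

# Example: profile detected for a memcached-like victim.
profile = {"L1i": 81, "LLC": 78, "memBW": 35, "netBW": 20}
print(build_attack(profile))  # -> {'L1i': 91, 'LLC': 88}
```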

We now examine the impact of the DoS attack on utilization. If it translates to resource saturation, there is a high probability that the DoS will be detected and the victim migrated to a new machine. The experimental cluster supports live migration; if CPU utilization (sampled every 1 sec) exceeds 70%, the victim is migrated to an unloaded host. Figure 13 compares the tail latency and CPU utilization with Bolt to that of a naïve DoS that saturates the CPU through a compute-intensive kernel. We focus on a single victim VM running memcached. The overhead of migration (time between initiating migration and latency returning to normal levels) for this VM is 8 sec. Performance degradation is similar for both systems until t=80 sec, at which point the victim is migrated to a new host, due to the compute-intensive kernel causing utilization to exceed 70%. While during migration performance continues to degrade, once the VM resumes in the new server, latency returns to nominal levels. In contrast, Bolt keeps utilization low, and impacts the victim's performance beyond t=80 sec.

5.2 Resource Freeing Attack

Attack setting: The goal of a resource freeing attack (RFA) is to modify the workload of a victim such that it frees up resources for the adversary, improving its performance [67, 66]. The adversarial VM consists of two components: a beneficiary and a helper. The beneficiary is the program whose performance the attacker wants to improve, and the helper is the program that forces the victim to yield its resources. The RFA works by adding load in the victim's critical resources, causing other resources to free up. For example, if the victim is a memory-bound Spark job, introducing additional memory traffic will result in Spark stalling in memory, lessening its pressure in the other resources until it can reclaim its required memory bandwidth. When launched in a public cloud, the victim of an RFA ends up paying more and achieving worse performance compared to running in isolation.

We now create a proof-of-concept RFA for a webserver, a network-bound Hadoop job, and a memory-bound Spark job. Launching RFAs requires in-depth knowledge of the victim's resource requirements [67], which Bolt provides by identifying the victim's dominant resource. Once the dominant resource is detected, the runtime applies a custom helper program that saturates this critical resource. For the webserver, the helper is a CPU-intensive benchmark launching CGI requests, causing the victim to saturate its CPU usage, servicing fewer real user requests, and freeing up cache resources. For the Hadoop job, we use a network-intensive benchmark similar to iperf, which saturates network bandwidth and frees up CPU and memory resources for the beneficiary. Similarly, for Spark we use a streaming memory benchmark that slows down k-means, and frees network and compute resources for the adversary. The selection of the beneficiary is of lesser importance; without loss of generality, we select mcf, a CPU-intensive benchmark from SPECCPU2006. The same methodology can be used with other beneficiaries, conditioned to their critical resource not overlapping with the victim's.

Impact: Table 2 shows the performance degradation for the three victim applications, and the improvement in execution time for the beneficiary. The webserver suffers the most in terms of queries per second, as the helper's CGI requests pollute its cache, preventing it from servicing legitimate user requests. Hadoop and Spark experience significant degradations in execution time, due to network and memory bandwidth saturation respectively. Execution time for mcf improves by 16-38%, benefiting from the victim stalling in its critical resource to improve its cache and CPU usage.
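The helper-selection logic described above amounts to a lookup from the detected dominant resource to a helper that saturates it, plus a check that the beneficiary does not contend on the same resource. A minimal sketch; the helper names are illustrative placeholders, not tools from the paper:

```python
# Sketch: choose an RFA helper that saturates the victim's dominant
# resource. Helper names are hypothetical placeholders.
HELPERS = {
    "CPU": "cgi_request_storm",   # CPU-intensive CGI requests
    "netBW": "iperf_like_flood",  # saturates network bandwidth
    "memBW": "stream_memcopy",    # streaming memory benchmark
}

def pick_helper(dominant_resource):
    """Return the helper for the victim's dominant resource,
    or None if no helper targets that resource."""
    return HELPERS.get(dominant_resource)

def valid_rfa(beneficiary_resource, victim_resource):
    """The beneficiary's critical resource must not overlap the
    resource the helper is saturating on the victim's behalf."""
    return beneficiary_resource != victim_resource

print(pick_helper("netBW"))       # helper for a Hadoop-like victim
print(valid_rfa("CPU", "memBW"))  # mcf (CPU) vs. a memory-bound victim
```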

5.3 VM Co-residency Detection

Attack setting: Sections 3.4 and 4 showed that we can identify the applications sharing a cloud infrastructure. However, a malicious user is rarely interested in a random service running on a public cloud. More often, they target a specific workload, e.g., a competitive e-commerce site. Therefore, they need to pinpoint where the target resides in a practical manner. This requires a launch strategy and a mechanism for co-residency detection [69]. The attack is practical if the target is located with high confidence, in reasonable time, and with modest resource costs. Bolt enables this attack to be carried out in a practical way, and remains resilient against defense mechanisms such as virtual private clouds (VPCs), which make internal IP addresses private to a single tenant. Once a target VM is located, the adversary can launch RFA or DoS attacks as previously described. Co-residency detection attacks rely on leakage of logical information, e.g., IP address, or on observing the impact of a side-channel attack due to resource contention. Bolt relies on a variation of the latter approach.

  Victim                          Beneficiary          Target
  App               Perf          App     Perf         Resource
  Apache Webserver  -64% (QPS)    mcf     +24%         CPU
  Hadoop (SVM)      -36% (Exec)   mcf     +16%         Network BW
  Spark (k-means)   -52% (Exec)   mcf     +38%         Memory BW

Table 2: RFA impact on the victims and beneficiaries.

Assume a system of N servers. A target victim user launches k VMs, and the adversary launches n malicious VMs. Instances are launched simultaneously to avoid malicious-malicious instance co-residency [69]. The probability that at least one adversarial VM will be co-scheduled with a victim instance is P(f) = 1 - (1 - k/N)^n. The adversary then uses a two-process detection scheme. The sender is a process that creates contention to degrade the victim's performance. The receiver is the process that detects this degradation, running either on the same host as the victim (cooperative detection) or externally (uncooperative detection). We assume an uncooperative victim, which is the more general case. Bolt works as the sender to determine the type of co-residents on each sampled host and identify any of the desired type. Assume m VMs of the desired type have been detected in the sample set. Bolt now needs to prune down this space to VM(s) from the specific target victim. It launches an external receiver aiming at the victim over a public channel, e.g., HTTP or a key-value protocol, to ping the victim service. This is the only point of communication with the victim. At the same time, the sender incurs contention in the resources the victim is most sensitive to. If the sender and victim are co-residents, the receiver's requests will be slower due to contention. Bolt quickly prunes down the sampled VMs by determining the type of co-residents. In a large cluster, this reduces the time to pinpoint a victim and the resource costs for the adversary. It also increases the confidence in the detection in the presence of noise from other co-residents.

Impact: We evaluate co-residency detection on the same 40-node cluster. The victim is a SQL server instantiating a single VM. The cluster also hosts 7 other VMs running SQL. Other active VMs host key-value stores, Hadoop, and Spark jobs. Bolt launches 10 senders simultaneously on randomly-selected machines, and detects 3 VMs running SQL in the sample set. It then introduces memory interference as the receiver initiates a large number of SQL queries. While these queries have 8.16 msec mean latency without interference, latency now becomes 26.14 msec. Given a ~3x latency increase, we can conclude that one of the co-residents is the victim SQL server. Detection required 6 sec from instantiation to receiver detection, and 11 adversary VMs.
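The co-residency probability above can be checked numerically. A minimal sketch; the example values (40 servers, one victim VM, 10 senders) mirror the experiment's setup, but the computed probability is ours, not a figure from the paper:

```python
# Sketch: probability that at least one of n adversarial VMs lands
# on the same host as one of the victim's k VMs, in a cluster of
# N servers: P(f) = 1 - (1 - k/N)^n.
def coresidency_prob(N, k, n):
    return 1.0 - (1.0 - k / N) ** n

# Example: 40 servers, a single victim VM, 10 sender VMs.
p = coresidency_prob(N=40, k=1, n=10)
print(round(p, 3))  # -> 0.224
```

As expected, the probability grows quickly with the number of senders n, which is why pruning the sample set cheaply (via detection) matters to the adversary's resource budget.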

6. Improving Security via Resource Isolation

Since interference is at the heart of Bolt's detection methodology, isolation mechanisms that attenuate contention should reduce its accuracy. We first evaluate to what extent existing isolation techniques mitigate security vulnerabilities, and then highlight trade-offs between security, performance, and utilization. We use the same setting as in the controlled experiment of Section 3.4. We evaluate 4 resource-specific isolation techniques, and 3 settings for OS-level isolation mechanisms: baremetal, containerized, and virtualized. The baremetal system does not employ any OS-level isolation. The containerized setup uses Linux containers (lxc), and assigns cores to applications via cpuset cgroups. Both containers and VMs constrain memory capacity. Baremetal experiments do not enforce memory capacity allocations, and the Linux scheduler is free to float applications across cores.

The first resource-specific isolation technique is thread pinning to physical cores, which constrains interference from scheduling actions like context switching. The number of cores an application is allocated can change dynamically, and is limited by how fast Linux can migrate tasks, typically in the tens of milliseconds.
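On Linux, this kind of pinning is exposed through the sched_setaffinity interface. A minimal sketch using Python's standard os module (Linux-only; which core ids exist depends on the host):

```python
import os

# Sketch: pin the calling process to a fixed set of cores
# (Linux-only API). Constraining placement like this limits
# context-switching interference from the OS scheduler.
def pin_to_cores(cores):
    os.sched_setaffinity(0, set(cores))  # pid 0 = calling process
    return os.sched_getaffinity(0)

# Example: restrict this process to the first core it is
# currently allowed to run on (core ids depend on the host).
first = sorted(os.sched_getaffinity(0))[0]
print(pin_to_cores([first]))
```

In the paper's setting the pinned set would be adjusted dynamically as an application's core allocation changes, at the cost of task-migration latency.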

For network isolation, we use the outbound network bandwidth partitioning capabilities of Linux's traffic control. Specifically, we use the qdisc [8] scheduler with the hierarchical token bucket (HTB) queueing discipline to enforce egress bandwidth limits. The limits are set to the maximum traffic burst rate for each application to avoid contention (the ceil parameter in the HTB interface). Ingress network bandwidth isolation has been extensively studied in previous work [36]; these approaches can be applied here as well.
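For illustration, an HTB egress limit of this kind can be expressed as tc commands. The device name, class id, and rates below are hypothetical examples, and the sketch only builds the command strings rather than invoking tc (which requires root):

```python
# Sketch: build the tc commands for an HTB egress limit.
# Device, class id, and rates are hypothetical examples.
def htb_commands(dev, classid, rate_mbit, ceil_mbit):
    return [
        f"tc qdisc add dev {dev} root handle 1: htb",
        f"tc class add dev {dev} parent 1: classid 1:{classid} "
        f"htb rate {rate_mbit}mbit ceil {ceil_mbit}mbit",
    ]

# Example: guarantee one application 200 Mbit/s of egress, with
# ceil set to its maximum burst rate (here, 500 Mbit/s).
for cmd in htb_commands("eth0", 10, 200, 500):
    print(cmd)
```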

For DRAM bandwidth isolation, there is no commercially available partitioning mechanism. To enforce bandwidth isolation, we use the following approach: we monitor the DRAM bandwidth usage of each application through performance counters [45], and modify the scheduler to only colocate jobs on machines that can accommodate their aggregate peak memory bandwidth requirements. This requires extensive application knowledge, and is used simply to highlight the benefits of DRAM bandwidth isolation.
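The scheduler modification above amounts to a bandwidth-aware admission check. A minimal sketch, with per-job peak bandwidths and the machine's capacity as assumed example values (GB/s), not measurements from the paper:

```python
# Sketch: colocate a new job on a machine only if the sum of peak
# DRAM bandwidth demands stays within the machine's capacity.
def can_colocate(resident_peaks_gbs, new_peak_gbs, capacity_gbs):
    return sum(resident_peaks_gbs) + new_peak_gbs <= capacity_gbs

# Example: machine with 60 GB/s of DRAM bandwidth.
resident = [18.0, 22.5]                    # peak demands of resident jobs
print(can_colocate(resident, 12.0, 60.0))  # -> True
print(can_colocate(resident, 25.0, 60.0))  # -> False
```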

Finally, for last-level cache (LLC) isolation, we use the Cache Allocation Technology (CAT) available in recent Intel chips [11]. CAT partitions the LLC in ways, which in a highly-associative cache allows for non-overlapping partitions at the granularity of a few percent of the LLC capacity. Each co-resident is allocated one partition, configured to its current capacity requirements [57]. Partitions can be resized at runtime by reprogramming specific low-level registers (MSRs); changes take effect after a few milliseconds.

Figure 14: Detection accuracy with isolation techniques, added cumulatively (none, thread pinning, +net BW partitioning, +mem BW partitioning, +cache partitioning, +core isolation), for baremetal, Linux containers, and virtual machines.

We add one isolation mechanism at a time in the three system settings (baremetal, containers, VMs). Figure 14 shows the impact of isolation techniques on Bolt's detection accuracy. As expected, when no isolation is used, baremetal allows for a significantly higher detection accuracy than container- and VM-based systems, mostly due to the latter constraining core and memory capacity usage. As a result, introducing thread pinning benefits baremetal the most, since it reduces core contention. It also benefits container- and VM-based setups to a lesser degree, by eliminating unpredictability introduced by the Linux scheduler (e.g., context switching) [40]. The dominant resource of each application determines which isolation technique benefits it the most. Thread pinning mostly benefits workloads bound by on-chip resources, such as L1/L2 caches and cores. Adding network bandwidth partitioning lowers detection accuracy for all three settings almost equally. It primarily benefits network-bound workloads, for which network interference conveys the most information for detection. Memory bandwidth isolation further reduces accuracy by 10% on average, benefiting jobs dominated by DRAM traffic. Finally, cache partitioning has the most dramatic reduction in accuracy, especially for LLC-bound workloads. We attribute this to the importance cache pressure has as a detection signal. The number of co-residents also affects the extent to which isolation helps. The more co-scheduled applications exist per machine, the more isolation techniques degrade accuracy, as they make distinguishing between co-residents harder.

Unfortunately, even when all the techniques are on, accuracy is still 50%, for two reasons: current techniques are not fine-grain enough to allow strict and scalable isolation, and core resources (L1/L2 caches, CPU) are prone to interference due to contending hyperthreads [61]. To evaluate the latter hypothesis, we modify the scheduler such that hyperthreads of different jobs are never scheduled on the same physical core, e.g., if a job needs 7 vCPUs, it is allocated 4 dedicated physical cores. The grey bars of Figure 14 show the detection accuracy. Baremetal instances still allow certain applications to be detected, but for containerized and virtualized settings accuracy drops to 14%. The remaining accuracy corresponds to disk-heavy workloads. Improving security, however, comes at a performance penalty of 34% on average in execution time, as threads of the same job are forced to contend with one another. Alternatively, users can overprovision their resource reservations to avoid degradation, which results in a 45% drop in utilization. This means that the cloud provider cannot leverage CPU idleness to share machines, decreasing the cost benefits of cloud computing. Note that enforcing core isolation alone is also not sufficient, as it allows a detection accuracy of 46%.

Discussion: The previous analysis highlights a design problem with current datacenter platforms. Traditional multicores are prone to contention, which will only worsen as more cores are integrated in each server and multi-tenancy becomes more pronounced. Existing isolation techniques are insufficient to mitigate security vulnerabilities, and techniques that provide reasonable security guarantees sacrifice performance or cost efficiency through low utilization. This highlights the need for new fine-grain and coordinated isolation techniques that guarantee security at high utilization for shared resources.

7 ConclusionsWe have presented Bolt a practical system that uses onlinedata mining techniques to identify the type and character-istics of applications running on shared cloud systems andenables practical attacks that degrade their performance Ina 40-server cluster Bolt correctly identifies 87 out of 108workloads and degrades tail latency by up to 140x We alsoused Bolt in a user study on EC2 to detect the character-istics of unknown jobs submitted by multiple users Finallywe show that while existing isolation techniques are helpfulthey are not sufficient to mitigate such attacks Bolt revealsreal easy-to-exploit threats in public clouds We hope thatthis work will motivate cloud providers and computer sci-entists to develop and deploy stricter isolation primitives incloud systems

Acknowledgements

We sincerely thank all the participants in the user study for their time and effort. We also thank Daniel Sanchez, Ed Suh, Grant Ayers, Mingyu Gao, Ana Klimovic, and the rest of the MAST group, as well as the anonymous reviewers, for their feedback on earlier versions of this manuscript. This work was supported by the Stanford Platform Lab, NSF grant CNS-1422088, and a John and Norma Balen Sesquicentennial Faculty Fellowship.

References

[1] Amazon EC2. http://aws.amazon.com/ec2.
[2] Aman Bakshi and Yogesh B. Dujodwala. Securing cloud from DDoS attacks using intrusion detection system in virtual machine. In Proc. of the Second International Conference on Communication Software and Networks (ICCSN), 2010.
[3] Luiz Barroso and Urs Hoelzle. The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines. MC Publishers, 2009.
[4] Naomi Benger, Joop van de Pol, Nigel P. Smart, and Yuval Yarom. "Ooh aah... just a little bit": A small amount of side channel can go a long way. In Proc. of the International Cryptographic Hardware and Embedded Systems Workshop (CHES), Busan, South Korea, 2014.
[5] Major Bhadauria and Sally A. McKee. An approach to resource-aware co-scheduling for CMPs. In Proc. of the 24th ACM International Conference on Supercomputing (ICS), Tsukuba, Japan, 2010.
[6] Leon Bottou. Large-scale machine learning with stochastic gradient descent. In Proc. of the International Conference on Computational Statistics (COMPSTAT), Paris, France, 2010.
[7] Eric Brewer. Kubernetes: The path to cloud native. http://goo.gl/QgkzYB. SOCC Keynote, August 2015.
[8] Martin A. Brown. Traffic Control HOWTO. http://linux-ip.net/articles/Traffic-Control-HOWTO.
[9] Robin Burke. Hybrid recommender systems: Survey and experiments. User Modeling and User-Adapted Interaction, 12(4):331-370, November 2002.
[10] Apache Cassandra. http://cassandra.apache.org.
[11] Intel 64 and IA-32 Architectures Software Developer's Manual, vol. 3B: System Programming Guide, Part 2, September 2014.
[12] Ludmila Cherkasova, Diwaker Gupta, and Amin Vahdat. Comparison of the three CPU schedulers in Xen. SIGMETRICS Perform. Eval. Rev., 35(2):42-51, September 2007.
[13] Scott A. Crosby and Dan S. Wallach. Denial of service via algorithmic complexity attacks. In Proc. of the 12th USENIX Security Symposium, Washington, DC, 2003.
[14] Marwan Darwish, Abdelkader Ouda, and Luiz Fernando Capretz. Cloud-based DDoS attacks and defenses. In Proc. of i-Society, Toronto, ON, 2013.
[15] Christina Delimitrou and Christos Kozyrakis. iBench: Quantifying interference for datacenter workloads. In Proc. of the IEEE International Symposium on Workload Characterization (IISWC), Portland, OR, September 2013.
[16] Christina Delimitrou and Christos Kozyrakis. Paragon: QoS-aware scheduling for heterogeneous datacenters. In Proc. of the Eighteenth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), Houston, TX, 2013.
[17] Christina Delimitrou and Christos Kozyrakis. QoS-aware scheduling in heterogeneous datacenters with Paragon. ACM Transactions on Computer Systems (TOCS), 31(4), December 2013.
[18] Christina Delimitrou and Christos Kozyrakis. Quality-of-service-aware scheduling in heterogeneous datacenters with Paragon. IEEE Micro Special Issue on Top Picks from the Computer Architecture Conferences, May/June 2014.
[19] Christina Delimitrou and Christos Kozyrakis. Quasar: Resource-efficient and QoS-aware cluster management. In Proc. of the Nineteenth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), Salt Lake City, UT, 2014.
[20] Christina Delimitrou and Christos Kozyrakis. HCloud: Resource-efficient provisioning in shared cloud systems. In Proc. of the Twenty-First International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), April 2016.
[21] Christina Delimitrou, Daniel Sanchez, and Christos Kozyrakis. Tarcil: Reconciling scheduling speed and quality in large shared clusters. In Proc. of the Sixth ACM Symposium on Cloud Computing (SOCC), August 2015.

[22] Jake Edge. Denial of service via hash collisions. http://lwn.net/Articles/474912, January 2012.
[23] Benjamin Farley, Ari Juels, Venkatanathan Varadarajan, Thomas Ristenpart, Kevin D. Bowers, and Michael M. Swift. More for your money: Exploiting performance heterogeneity in public clouds. In Proc. of the ACM Symposium on Cloud Computing (SOCC), San Jose, CA, 2012.
[24] Alexandra Fedorova, Margo Seltzer, and Michael D. Smith. Improving performance isolation on chip multiprocessors via an operating system scheduler. In Proc. of the 16th Intl. Conference on Parallel Architectures and Compilation Techniques (PACT), Brasov, Romania, 2007.
[25] Alexander Felfernig and Robin Burke. Constraint-based recommender systems: Technologies and research issues. In Proc. of the ACM International Conference on Electronic Commerce (ICEC), Innsbruck, Austria, 2008.
[26] Brad Fitzpatrick. Distributed caching with memcached. Linux Journal, 2004(124), 2004.
[27] Oded Goldreich and Rafail Ostrovsky. Software protection and simulation on oblivious RAMs. J. ACM, 43(3):431-473, May 1996.
[28] Google Container Engine. https://cloud.google.com/container-engine.
[29] Asela Gunawardana and Christopher Meek. A unified approach to building hybrid recommender systems. In Proc. of the Third ACM Conference on Recommender Systems (RecSys), New York, NY, 2009.
[30] Sanchika Gupta and Padam Kumar. VM profile based optimized network attack pattern detection scheme for DDoS attacks in cloud. In Proc. of SSCC, Mysore, India, 2013.
[31] Yi Han, Tansu Alpcan, Jeffrey Chan, and Christopher Leckie. Security games for virtual machine allocation in cloud computing. In Proc. of the 4th International Conference on Decision and Game Theory for Security, Fort Worth, TX, 2013.
[32] Amir Herzberg, Haya Shulman, Johanna Ullrich, and Edgar Weippl. Cloudoscopy: Services discovery and topology mapping. In Proc. of the ACM Cloud Computing Security Workshop (CCSW), Berlin, Germany, 2013.
[33] Ben Hindman, Andy Konwinski, Matei Zaharia, Ali Ghodsi, Anthony D. Joseph, Randy Katz, Scott Shenker, and Ion Stoica. Mesos: A platform for fine-grained resource sharing in the data center. In Proc. of NSDI, Boston, MA, 2011.
[34] Jingwei Huang, David M. Nicol, and Roy H. Campbell. Denial-of-service threat to Hadoop/YARN clusters with multi-tenancy. In Proc. of the IEEE International Congress on Big Data, Washington, DC, 2014.
[35] Alexandru Iosup, Nezih Yigitbasi, and Dick Epema. On the performance variability of production cloud services. In Proc. of CCGRID, Newport Beach, CA, 2011.
[36] Vimalkumar Jeyakumar, Mohammad Alizadeh, David Mazieres, Balaji Prabhakar, Changhoon Kim, and Albert Greenberg. EyeQ: Practical network performance isolation at the edge. In Proc. of the 10th USENIX Conference on Networked Systems Design and Implementation (NSDI), Lombard, IL, 2013.
[37] Yaakoub El Khamra, Hyunjoo Kim, Shantenu Jha, and Manish Parashar. Exploring the performance fluctuations of HPC workloads on clouds. In Proc. of CloudCom, Indianapolis, IN, 2010.
[38] Krzysztof C. Kiwiel. Convergence and efficiency of subgradient methods for quasiconvex minimization. Mathematical Programming (Series A), 90(1):1-25, 2001.
[39] Ruby B. Lee. Rethinking computers for cybersecurity. IEEE Computer, 48(4):16-25, 2015.
[40] Jacob Leverich and Christos Kozyrakis. Reconciling high server utilization and sub-millisecond quality-of-service. In Proc. of EuroSys, Amsterdam, The Netherlands, 2014.
[41] Host server CPU utilization in Amazon EC2 cloud. http://goo.gl/2LTx4T.

[42] Fangfei Liu, Yuval Yarom, Qian Ge, Gernot Heiser, and Ruby B. Lee. Last-level cache side-channel attacks are practical. In Proc. of the IEEE Symposium on Security and Privacy (S&P), San Jose, CA, 2015.
[43] Fei Liu, Lanfang Ren, and Hongtao Bai. Mitigating cross-VM side channel attack on multiple tenants cloud platform. Journal of Computers, 9(4):1005-1013, April 2014.
[44] David Lo, Liqun Cheng, Rama Govindaraju, Luiz Andre Barroso, and Christos Kozyrakis. Towards energy proportionality for large-scale latency-critical workloads. In Proc. of the 41st Annual International Symposium on Computer Architecture (ISCA), Minneapolis, MN, 2014.
[45] David Lo, Liqun Cheng, Rama Govindaraju, Parthasarathy Ranganathan, and Christos Kozyrakis. Heracles: Improving resource efficiency at scale. In Proc. of the 42nd Annual International Symposium on Computer Architecture (ISCA), Portland, OR, 2015.
[46] Mahout. http://mahout.apache.org.
[47] Dave Mangot. EC2 variability: The numbers revealed. http://tech.mangot.com/roller/dave/entry/ec2_variability_the_numbers_revealed.
[48] Jason Mars and Lingjia Tang. Whare-map: Heterogeneity in "homogeneous" warehouse-scale computers. In Proc. of ISCA, Tel-Aviv, Israel, 2013.
[49] Robert Martin, John Demme, and Simha Sethumadhavan. TimeWarp: Rethinking timekeeping and performance monitoring mechanisms to mitigate side-channel attacks. In Proc. of the International Symposium on Computer Architecture (ISCA), Portland, OR, 2012.
[50] David Meisner, Christopher M. Sadler, Luiz Andre Barroso, Wolf-Dietrich Weber, and Thomas F. Wenisch. Power management of online data-intensive services. In Proc. of the 38th Annual International Symposium on Computer Architecture (ISCA), pages 319-330, 2011.
[51] Jelena Mirkovic and Peter Reiher. A taxonomy of DDoS attack and DDoS defense mechanisms. ACM SIGCOMM Computer Communication Review (CCR), April 2004.
[52] Thomas Moscibroda and Onur Mutlu. Memory performance attacks: Denial of memory service in multi-core systems. In Proc. of the 16th USENIX Security Symposium, Boston, MA, 2007.
[53] Ripal Nathuji, Aman Kansal, and Alireza Ghaffarkhah. Q-Clouds: Managing performance interference effects for QoS-aware clouds. In Proc. of EuroSys, Paris, France, 2010.
[54] Simon Ostermann, Alexandru Iosup, Nezih Yigitbasi, Radu Prodan, Thomas Fahringer, and Dick Epema. A performance analysis of EC2 cloud computing services for scientific computing. Lecture Notes on Cloud Computing, 34:115-131, 2010.
[55] Tao Peng, Christopher Leckie, and Kotagiri Ramamohanarao. Survey of network-based defense mechanisms countering the DoS and DDoS problems. ACM Comput. Surv., 39(1), April 2007.
[56] Diego Perez-Botero, Jakub Szefer, and Ruby B. Lee. Characterizing hypervisor vulnerabilities in cloud computing servers. In Proc. of the International Workshop on Security in Cloud Computing (SCC@ASIACCS), Hangzhou, China, 2013.
[57] Moinuddin K. Qureshi and Yale N. Patt. Utility-based cache partitioning: A low-overhead, high-performance, runtime mechanism to partition shared caches. In Proc. of the 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-39), 2006.
[58] Himanshu Raj, Ripal Nathuji, Abhishek Singh, and Paul England. Resource management for isolation enhanced cloud services. In Proc. of the ACM Workshop on Cloud Computing Security (CCSW), Chicago, IL, 2009.
[59] Suhail Rehman and Majd Sakr. Initial findings for provisioning variation in cloud computing. In Proc. of CloudCom, Indianapolis, IN, 2010.
[60] Thomas Ristenpart, Eran Tromer, Hovav Shacham, and Stefan Savage. Hey, you, get off of my cloud: Exploring information leakage in third-party compute clouds. In Proc. of the ACM Conference on Computer and Communications Security (CCS), Chicago, IL, 2009.
[61] Daniel Sanchez and Christos Kozyrakis. Vantage: Scalable and efficient fine-grain cache partitioning. In Proc. of the 38th Annual International Symposium on Computer Architecture (ISCA), San Jose, CA, June 2011.
[62] Jorg Schad, Jens Dittrich, and Jorge-Arnulfo Quiane-Ruiz. Runtime measurements in the cloud: Observing, analyzing, and reducing variance. Proc. VLDB Endow., 3(1-2):460-471, September 2010.

[63] Malte Schwarzkopf, Andy Konwinski, Michael Abd-El-Malek, and John Wilkes. Omega: Flexible, scalable schedulers for large compute clusters. In Proc. of EuroSys, Prague, Czech Republic, 2013.
[64] Alan Shieh, Srikanth Kandula, Albert Greenberg, and Changhoon Kim. Seawall: Performance isolation for cloud datacenter networks. In Proc. of the USENIX Workshop on Hot Topics in Cloud Computing (HotCloud), Boston, MA, 2010.
[65] David Shue, Michael J. Freedman, and Anees Shaikh. Performance isolation and fairness for multi-tenant cloud storage. In Proc. of the 10th USENIX Conference on Operating Systems Design and Implementation (OSDI), Hollywood, CA, 2012.
[66] Dan Tsafrir, Yoav Etsion, and Dror G. Feitelson. Secretly monopolizing the CPU without superuser privileges. In Proc. of the 16th USENIX Security Symposium, Boston, MA, 2007.
[67] Venkatanathan Varadarajan, Thawan Kooburat, Benjamin Farley, Thomas Ristenpart, and Michael Swift. Resource-freeing attacks: Improve your cloud performance (at your neighbor's expense). In Proc. of the ACM Conference on Computer and Communications Security (CCS), Raleigh, NC, 2012.
[68] Venkatanathan Varadarajan, Thomas Ristenpart, and Michael Swift. Scheduler-based defenses against cross-VM side-channels. In Proc. of the 23rd USENIX Security Symposium, San Diego, CA, 2014.
[69] Venkatanathan Varadarajan, Yinqian Zhang, Thomas Ristenpart, and Michael Swift. A placement vulnerability study in multi-tenant public clouds. In Proc. of the 24th USENIX Security Symposium, Washington, DC, 2015.
[70] Huaibin Wang, Haiyun Zhou, and Chundong Wang. Virtual machine-based intrusion detection system framework in cloud computing environment. Journal of Computers, October 2012.
[71] Hui Wang, Canturk Isci, Lavanya Subramanian, Jongmoo Choi, Depei Qian, and Onur Mutlu. A-DRM: Architecture-aware distributed resource management of virtualized clusters. In Proc. of the 11th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments (VEE), Istanbul, Turkey, 2015.
[72] Ian H. Witten, Eibe Frank, and Geoffrey Holmes. Data Mining: Practical Machine Learning Tools and Techniques, 3rd Edition.
[73] Zhenyu Wu, Zhang Xu, and Haining Wang. Whispers in the hyper-space: High-speed covert channel attacks in the cloud. In Proc. of the 21st USENIX Security Symposium, Bellevue, WA, 2012.
[74] Yunjing Xu, Michael Bailey, Farnam Jahanian, Kaustubh Joshi, Matti Hiltunen, and Richard Schlichting. An exploration of L2 cache covert channels in virtualized environments. In Proc. of the 3rd ACM Cloud Computing Security Workshop (CCSW), Chicago, IL, 2011.
[75] Zhang Xu, Haining Wang, and Zhenyu Wu. A measurement study on co-residence threat inside the cloud. In Proc. of the 24th USENIX Security Symposium, Washington, DC, 2015.
[76] Yuval Yarom and Katrina Falkner. Flush+Reload: A high resolution, low noise, L3 cache side-channel attack. In Proc. of the 23rd USENIX Security Symposium, San Diego, CA, 2014.
[77] Matei Zaharia, Mosharaf Chowdhury, Tathagata Das, Ankur Dave, Justin Ma, Murphy McCauley, Michael J. Franklin, Scott Shenker, and Ion Stoica. Resilient distributed datasets: A fault-tolerant abstraction for in-memory cluster computing. In Proc. of NSDI, San Jose, CA, 2012.
[78] Yinqian Zhang, Ari Juels, Alina Oprea, and Michael K. Reiter. HomeAlone: Co-residency detection in the cloud via side-channel analysis. In Proc. of the IEEE Symposium on Security and Privacy, Oakland, CA, 2011.
[79] Yinqian Zhang, Ari Juels, Michael K. Reiter, and Thomas Ristenpart. Cross-tenant side-channel attacks in PaaS clouds. In Proc. of the ACM SIGSAC Conference on Computer and Communications Security (CCS), Scottsdale, AZ, 2014.
[80] Yinqian Zhang, Ari Juels, Michael K. Reiter, and Thomas Ristenpart. Cross-VM side channels and their use to extract private keys. In Proc. of the ACM Conference on Computer and Communications Security (CCS), Raleigh, NC, 2012.
[81] Yinqian Zhang and Michael K. Reiter. Duppel: Retrofitting commodity operating systems to mitigate cache side channels in the cloud. In Proc. of the ACM Conference on Computer and Communications Security (CCS), Berlin, Germany, 2013.
[82] Fangfei Zhou, Manish Goel, Peter Desnoyers, and Ravi Sundaram. Scheduler vulnerabilities and coordinated attacks in cloud computing. J. Comput. Secur., 21(4):533-559, July 2013.
[83] Jieming Zhu, Pinjia He, Zibin Zheng, and Michael R. Lyu. Towards online, accurate, and scalable QoS prediction for runtime service adaptation. In Proc. of the IEEE International Conference on Distributed Computing Systems (ICDCS), Madrid, Spain, 2014.
[84] Sergey Zhuravlev, Sergey Blagodurov, and Alexandra Fedorova. Addressing shared resource contention in multicore processors via scheduling. In Proc. of the Fifteenth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), Pittsburgh, PA, 2010.


[Figure 13: 99th percentile latency and CPU utilization over time with Bolt and a naïve DoS attack that saturates CPU resources; detection and migration points are marked.]

and can trigger VM migration [71]. This means that DoS attacks that simply overload a physical host are ineffective when such defenses are in place. We focus on host-based DoS attacks and make them resilient against such defenses by avoiding resource saturation. We leverage the information obtained by Bolt to construct custom interference programs that stress the resources the victim is most sensitive to, without overloading the host.

Bolt creates custom contentious workloads by combining the microbenchmarks used to measure the resource pressure of the victim. Since these microbenchmarks are tunable, Bolt configures their intensity to a higher point than the pressure c_i measured during detection. For example, if a victim is detected as memcached and its most critical resources are the L1 instruction cache (81% pressure) and the last-level cache (LLC) (78% pressure), Bolt uses the two microbenchmarks for the L1-i cache and the LLC at a higher intensity than what memcached can tolerate.

Impact: We use the DoS attack setting above against the 108 applications of the controlled experiment (Section 3.4). The DoS attack degrades performance by 2.2x on average, and by up to 9.8x, in terms of execution time. Degradation is much more pronounced for interactive workloads like key-value stores, with tail latency increasing by 8-140x, a dramatic impact for applications with strict tail latency SLAs.
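The composition step above can be sketched as follows. This is a rough illustration, not Bolt's actual interface: the function name, the resource labels, and the 15% intensity margin are all assumptions; only the memcached-style pressure values come from the text.

```python
# Hypothetical sketch: pick the victim's most pressured resources and dial
# the matching tunable microbenchmarks past the measured pressure c_i.
INTENSITY_MARGIN = 1.15  # assumed margin above the victim's tolerance

def build_dos_workload(pressure, top_k=2):
    """pressure maps resource name -> measured pressure c_i in [0, 100].
    Returns the intensity to configure for the victim's top_k resources."""
    targets = sorted(pressure, key=pressure.get, reverse=True)[:top_k]
    return {r: min(100.0, pressure[r] * INTENSITY_MARGIN) for r in targets}

# memcached-like profile from the text: L1-i at 81%, LLC at 78%.
profile = {"l1i": 81.0, "llc": 78.0, "net": 22.0, "dram": 35.0}
workload = build_dos_workload(profile)
assert set(workload) == {"l1i", "llc"} and workload["l1i"] > 81.0
```

The margin ensures the victim cannot absorb the contention, while the `min(100, ...)` cap keeps the microbenchmark short of outright saturating the host, which is the property that evades the migration defense discussed below.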

We now examine the impact of the DoS attack on utilization. If the attack translates to resource saturation, there is a high probability that the DoS will be detected and the victim migrated to a new machine. The experimental cluster supports live migration: if CPU utilization (sampled every 1 sec) exceeds 70%, the victim is migrated to an unloaded host. Figure 13 compares the tail latency and CPU utilization with Bolt to that of a naïve DoS that saturates the CPU through a compute-intensive kernel. We focus on a single victim VM running memcached. The overhead of migration (the time between initiating migration and latency returning to normal levels) for this VM is 8 sec. Performance degradation is similar for both systems until t=80 sec, at which point the victim of the naïve attack is migrated to a new host, as the compute-intensive kernel causes utilization to exceed 70%. While performance continues to degrade during migration, once the VM resumes on the new server, latency returns to nominal levels. In contrast, Bolt keeps utilization low and continues to impact the victim's performance beyond t=80 sec.

5.2 Resource Freeing Attack

Attack setting: The goal of a resource-freeing attack (RFA) is to modify the workload of a victim such that it frees up resources for the adversary, improving its performance [67, 66]. The adversarial VM consists of two components: a beneficiary and a helper. The beneficiary is the program whose performance the attacker wants to improve, and the helper is the program that forces the victim to yield its resources. The RFA works by adding load on the victim's critical resources, causing other resources to free up. For example, if the victim is a memory-bound Spark job, introducing additional memory traffic will result in Spark stalling in memory, lessening its pressure on the other resources until it can reclaim its required memory bandwidth. When launched in a public cloud, the victim of an RFA ends up paying more and achieving worse performance compared to running in isolation.

We now create a proof-of-concept RFA for a webserver, a network-bound Hadoop job, and a memory-bound Spark job. Launching RFAs requires in-depth knowledge of the victim's resource requirements [67], which Bolt provides by identifying the victim's dominant resource. Once the dominant resource is detected, the runtime applies a custom helper program that saturates this critical resource. For the webserver, the helper is a CPU-intensive benchmark launching CGI requests, causing the victim to saturate its CPU usage, service fewer real user requests, and free up cache resources. For the Hadoop job, we use a network-intensive benchmark similar to iperf, which saturates network bandwidth and frees up CPU and memory resources for the beneficiary. Similarly, for Spark, we use a streaming memory benchmark that slows down k-means and frees network and compute resources for the adversary. The selection of the beneficiary is of lesser importance; without loss of generality, we select mcf, a CPU-intensive benchmark from SPECCPU2006. The same methodology can be used with other beneficiaries, provided their critical resource does not overlap with the victim's.

Impact: Table 2 shows the performance degradation for the three victim applications and the improvement in execution time for the beneficiary. The webserver suffers the most in terms of queries per second, as the helper's CGI requests pollute its cache, preventing it from servicing legitimate user requests. Hadoop and Spark experience significant degradation in execution time due to network and memory bandwidth saturation, respectively. Execution time for mcf improves by 16-38%, benefiting from the victim stalling on its critical resource to improve its cache and CPU usage.
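The helper-selection logic described above, including the constraint that the beneficiary's critical resource must not overlap the victim's, can be sketched as follows. The benchmark labels are descriptive placeholders for the three proofs of concept, not real tool names.

```python
# Illustrative mapping from a victim's dominant resource (as detected by a
# Bolt-like profiler) to a helper that saturates it. All names are assumed.
HELPERS = {
    "cpu": "cgi-request-storm",        # webserver: CPU-heavy CGI requests
    "network_bw": "iperf-like-flood",  # Hadoop: saturate network bandwidth
    "memory_bw": "stream-memory",      # Spark: streaming memory traffic
}

def plan_rfa(victim_dominant, beneficiary_dominant):
    """Return the helper for the victim's dominant resource, or None when
    the beneficiary would contend for the very resource being saturated."""
    if victim_dominant == beneficiary_dominant:
        return None  # the RFA cannot pay off: the freed resources are not the ones the beneficiary needs
    return HELPERS.get(victim_dominant)

assert plan_rfa("memory_bw", "cpu") == "stream-memory"
assert plan_rfa("cpu", "cpu") is None
```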

Victim              Perf.         Beneficiary  Perf.   Target Resource
Apache webserver    -64% (QPS)    mcf          +24%    CPU
Hadoop (SVM)        -36% (exec.)  mcf          +16%    Network BW
Spark (k-means)     -52% (exec.)  mcf          +38%    Memory BW

Table 2: RFA impact on the victims and beneficiaries.

5.3 VM Co-residency Detection

Attack setting: Sections 3.4 and 4 showed that we can identify the applications sharing a cloud infrastructure. However, a malicious user is rarely interested in a random service running on a public cloud. More often, they target a specific workload, e.g., a competitive e-commerce site. Therefore, they need to pinpoint where the target resides in a practical manner. This requires a launch strategy and a mechanism for co-residency detection [69]. The attack is practical if the target is located with high confidence, in reasonable time, and with modest resource costs. Bolt enables this attack to be carried out in a practical way, and remains resilient against defense mechanisms such as virtual private clouds (VPCs), which make internal IP addresses private to a single tenant. Once a target VM is located, the adversary can launch the RFA or DoS attacks described previously. Co-residency detection attacks rely either on leakage of logical information, e.g., IP addresses, or on observing the impact of a side-channel attack caused by resource contention. Bolt relies on a variation of the latter approach.

Assume a system of N servers. A target victim launches k VMs, and the adversary launches n malicious VMs. Instances are launched simultaneously, to avoid malicious-malicious instance co-residency [69]. The probability that at least one adversarial VM is co-scheduled with a victim instance is P(f) = 1 - (1 - k/N)^n. The adversary then uses a two-process detection scheme. The sender is a process that creates contention to degrade the victim's performance. The receiver is the process that detects this degradation, running either on the same host as the victim (cooperative detection) or externally (uncooperative detection). We assume an uncooperative victim, which is the more general case. Bolt acts as the sender, determining the type of co-residents on each sampled host and identifying any of the desired type. Assume m VMs of the desired type have been detected in the sample set. Bolt now needs to prune this space down to the VM(s) of the specific target victim. It launches an external receiver aimed at the victim over a public channel, e.g., HTTP or a key-value protocol, to ping the victim service. This is the only point of communication with the victim. At the same time, the sender incurs contention in the resources the victim is most sensitive to. If the sender and victim are co-residents, the receiver's requests will be slower due to contention. Bolt quickly prunes down the sampled VMs by determining the type of co-residents. In a large cluster, this reduces the time to pinpoint a victim and the resource costs for the adversary. It also increases confidence in the detection in the presence of noise from other co-residents.

Impact: We evaluate co-residency detection on the same 40-node cluster. The victim is a SQL server instantiating a single VM. The cluster also hosts 7 other VMs running SQL; other active VMs host key-value stores, Hadoop, and Spark jobs. Bolt launches 10 senders simultaneously on randomly-selected machines (with co-residency probability P(f) as given above) and detects 3 VMs running SQL in the sample set. It then introduces memory interference as the receiver initiates a large number of SQL queries. While these queries have 816 msec mean latency without interference, latency now becomes 2614 msec. Given the ~3x latency increase, we can conclude that one of the co-residents is the victim SQL server. Detection required 6 sec, from instantiation to receiver detection, and 11 adversary VMs.
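The co-residency probability P(f) = 1 - (1 - k/N)^n is easy to sanity-check numerically. The k, n, N values in this sketch are illustrative, not the experiment's:

```python
# P(f): probability that at least one of n adversarial VMs lands on a host
# holding one of the victim's k VMs, out of N servers (independent placement).
def coresidency_probability(k, n, N):
    return 1.0 - (1.0 - k / N) ** n

# Illustrative numbers: 8 victim VMs, 10 adversarial VMs, 40 servers.
p = coresidency_probability(k=8, n=10, N=40)
print(round(p, 3))  # → 0.893
```

Even a modest adversarial footprint yields high hit probability when the victim runs several VMs, which is why the pruning step, and not the launch strategy, dominates the cost of the attack.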

6. Improving Security via Resource Isolation

Since interference is at the heart of Bolt's detection methodology, isolation mechanisms that attenuate contention should reduce its accuracy. We first evaluate to what extent existing isolation techniques mitigate security vulnerabilities, and then highlight trade-offs between security, performance, and utilization. We use the same setting as in the controlled experiment of Section 3.4. We evaluate four resource-specific isolation techniques and three settings for OS-level isolation mechanisms: baremetal, containerized, and virtualized. The baremetal system does not employ any OS-level isolation. The containerized setup uses Linux containers (lxc) and assigns cores to applications via the cpuset cgroup. Both containers and VMs constrain memory capacity. Baremetal experiments do not enforce memory capacity allocations, and the Linux scheduler is free to float applications across cores.

The first resource-specific isolation technique is thread pinning to physical cores, which constrains interference from scheduling actions like context switching. The number of cores an application is allocated can change dynamically, and is limited by how fast Linux can migrate tasks, typically in the tens of milliseconds.
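On Linux, this kind of pinning can be expressed with the standard affinity syscalls. A minimal sketch (Linux-only; in a real system the core IDs would come from the cluster allocator, and the paper's setup uses cpuset cgroups rather than per-process calls):

```python
import os

# Restrict the given process to an explicit core set and return the
# resulting affinity mask. os.sched_setaffinity is available on Linux.
def pin_to_cores(pid, cores):
    os.sched_setaffinity(pid, cores)
    return os.sched_getaffinity(pid)

if __name__ == "__main__":
    original = os.sched_getaffinity(0)
    target = {min(original)}              # pick one core we may legally use
    assert pin_to_cores(0, target) == target
    os.sched_setaffinity(0, original)     # restore the original mask
```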

For network isolation, we use the outbound network bandwidth partitioning capabilities of Linux's traffic control. Specifically, we use the qdisc [8] scheduler with the hierarchical token bucket (HTB) queueing discipline to enforce egress bandwidth limits. The limits are set to the maximum traffic burst rate for each application, to avoid contention (the ceil parameter in the HTB interface). Ingress network bandwidth isolation has been extensively studied in previous work [36]; those approaches can be applied here as well.
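As a rough illustration of such an HTB setup (the device name, rates, ports, and class IDs are invented for the example; the paper derives the actual limits per application, and the commands require root):

```shell
DEV=eth0
# Root HTB qdisc; unclassified traffic falls into class 1:30.
tc qdisc add dev "$DEV" root handle 1: htb default 30
# One class per co-resident, capped at its maximum burst rate via ceil.
tc class add dev "$DEV" parent 1: classid 1:10 htb rate 400mbit ceil 400mbit
tc class add dev "$DEV" parent 1: classid 1:20 htb rate 150mbit ceil 150mbit
# Steer an application's traffic into its class, e.g. by destination port.
tc filter add dev "$DEV" protocol ip parent 1: prio 1 \
   u32 match ip dport 11211 0xffff flowid 1:10
```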

For DRAM bandwidth isolation, there is no commercially available partitioning mechanism. To enforce bandwidth isolation, we use the following approach: we monitor the DRAM bandwidth usage of each application through performance counters [45], and modify the scheduler to only colocate jobs on machines that can accommodate their aggregate peak memory bandwidth requirements. This requires extensive application knowledge, and is used simply to highlight the benefits of DRAM bandwidth isolation.
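The scheduler-side admission check just described reduces to a simple budget test. A sketch, where the per-host bandwidth budget is an assumed figure rather than a number from the paper:

```python
# Colocate a candidate job only if the host can still absorb the aggregate
# peak DRAM bandwidth of all residents plus the candidate.
HOST_DRAM_BW_GBPS = 60.0  # assumed per-host budget from counters

def can_colocate(resident_peaks_gbps, candidate_peak_gbps):
    return sum(resident_peaks_gbps) + candidate_peak_gbps <= HOST_DRAM_BW_GBPS

print(can_colocate([22.0, 18.5], 12.0))  # fits: 52.5 <= 60
print(can_colocate([22.0, 18.5], 25.0))  # oversubscribed: 65.5 > 60
```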

Finally, for last-level cache (LLC) isolation, we use the Cache Allocation Technology (CAT) available in recent Intel chips [11]. CAT partitions the LLC in ways, which in a highly-associative cache allows for non-overlapping partitions at the granularity of a few percent of the LLC capacity. Each co-resident is allocated one partition, configured to its current capacity requirements [57]. Partitions can be resized at runtime by reprogramming specific low-level registers (MSRs); changes take effect after a few milliseconds.

[Figure 14: Detection accuracy with isolation techniques.]
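The bitmask arithmetic behind such way-based partitioning can be illustrated as follows. This is a toy sketch: CAT expresses each partition as a contiguous bitmask over cache ways, but the way count and per-tenant shares here are invented, and real hardware is programmed through the capacity-mask MSRs, not Python.

```python
# Carve an LLC of num_ways ways into non-overlapping, contiguous capacity
# bitmasks, one per co-resident, in the style of CAT capacity masks.
def carve_llc(num_ways, shares):
    """shares: requested ways per co-resident; returns one bitmask each."""
    assert sum(shares) <= num_ways, "partitions must not overlap"
    masks, base = [], 0
    for ways in shares:
        masks.append(((1 << ways) - 1) << base)  # contiguous run of set bits
        base += ways
    return masks

# A 20-way LLC split 8/8/4 among three co-residents:
masks = carve_llc(20, [8, 8, 4])
assert masks == [0xFF, 0xFF00, 0xF0000]
assert all(a & b == 0 for i, a in enumerate(masks) for b in masks[i+1:])
```

The pairwise-AND check at the end is the non-overlap property that makes the partitions isolation boundaries rather than mere hints.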

We add one isolation mechanism at a time in the three system settings (baremetal, containers, VMs). Figure 14 shows the impact of the isolation techniques on Bolt's detection accuracy. As expected, when no isolation is used, baremetal allows for a significantly higher detection accuracy than container- and VM-based systems, mostly because the latter constrain core and memory capacity usage. As a result, introducing thread pinning benefits baremetal the most, since it reduces core contention. It also benefits container- and VM-based setups to a lesser degree, by eliminating unpredictability introduced by the Linux scheduler (e.g., context switching) [40]. The dominant resource of each application determines which isolation technique benefits it the most. Thread pinning mostly benefits workloads bound by on-chip resources, such as L1/L2 caches and cores. Adding network bandwidth partitioning lowers detection accuracy for all three settings almost equally. It primarily benefits network-bound workloads, for which network interference conveys the most information for detection. Memory bandwidth isolation further reduces accuracy by 10% on average, benefiting jobs dominated by DRAM traffic. Finally, cache partitioning causes the most dramatic reduction in accuracy, especially for LLC-bound workloads. We attribute this to the importance of cache pressure as a detection signal. The number of co-residents also affects the extent to which isolation helps: the more co-scheduled applications exist per machine, the more the isolation techniques degrade accuracy, as they make distinguishing between co-residents harder.

Unfortunately even when all the techniques are on ac-curacy is still 50 due to two reasons current techniquesare not fine-grain enough to allow strict and scalable isola-tion and core resources (L1L2 caches CPU) are prone tointerference due to contending hyperthreads [61] To eval-uate the latter hypothesis we modify the scheduler suchthat hyperthreads of different jobs are never scheduled onthe same physical core eg if a job needs 7vCPUs it isallocated 4 dedicated physical cores The grey bars of Fig-ure 14 show the detection accuracy Baremetal instances stillallow certain applications to be detected but for container-

ized and virtualized settings accuracy drops to 14 The re-maining accuracy corresponds to disk-heavy workloads Im-proving security however comes at a performance penaltyof 34 on average in execution time as threads of the samejob are forced to contend with one another Alternativelyusers can overprovision their resource reservations to avoiddegradation which results in a 45 drop in utilization Thismeans that the cloud provider cannot leverage CPU idlenessto share machines decreasing the cost benefits of cloud com-puting Note that enforcing core isolation alone is also notsufficient as it allows a detection accuracy of 46Discussion The previous analysis highlights a design prob-lem with current datacenter platforms Traditional multi-cores are prone to contention which will only worsen asmore cores are integrated in each server and multi-tenancybecomes more pronounced Existing isolation techniques areinsufficient to mitigate security vulnerabilities and tech-niques that provide reasonable security guarantees sacrificeperformance or cost efficiency through low utilization Thishighlights the need for new fine-grain and coordinated iso-lation techniques that guarantee security at high utilizationfor shared resources

7 ConclusionsWe have presented Bolt a practical system that uses onlinedata mining techniques to identify the type and character-istics of applications running on shared cloud systems andenables practical attacks that degrade their performance Ina 40-server cluster Bolt correctly identifies 87 out of 108workloads and degrades tail latency by up to 140x We alsoused Bolt in a user study on EC2 to detect the character-istics of unknown jobs submitted by multiple users Finallywe show that while existing isolation techniques are helpfulthey are not sufficient to mitigate such attacks Bolt revealsreal easy-to-exploit threats in public clouds We hope thatthis work will motivate cloud providers and computer sci-entists to develop and deploy stricter isolation primitives incloud systems

AcknowledgementsWe sincerely thank all the participants in the user study fortheir time and effort We also thank Daniel Sanchez EdSuh Grant Ayers Mingyu Gao Ana Klimovic and the restof the MAST group as well as the anonymous reviewersfor their feedback on earlier versions of this manuscriptThis work was supported by the Stanford Platform LabNSF grant CNS-1422088 and a John and Norma BalenSesquicentennial Faculty Fellowship

References

[1] Amazon EC2. http://aws.amazon.com/ec2.

[2] Aman Bakshi and Yogesh B. Dujodwala. Securing cloud from DDoS attacks using intrusion detection system in virtual machine. In Proc. of the Second International Conference on Communication Software and Networks (ICCSN), 2010.

[3] Luiz Barroso and Urs Hoelzle. The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines. MC Publishers, 2009.

[4] Naomi Benger, Joop van de Pol, Nigel P. Smart, and Yuval Yarom. "Ooh aah... just a little bit": A small amount of side channel can go a long way. In Proc. of the International Cryptographic Hardware and Embedded Systems Workshop (CHES), Busan, South Korea, 2014.

[5] Major Bhadauria and Sally A. McKee. An approach to resource-aware co-scheduling for CMPs. In Proc. of the 24th ACM International Conference on Supercomputing (ICS), Tsukuba, Japan, 2010.

[6] Leon Bottou. Large-scale machine learning with stochastic gradient descent. In Proceedings of the International Conference on Computational Statistics (COMPSTAT), Paris, France, 2010.

[7] Eric Brewer. Kubernetes: The path to cloud native. http://goo.gl/QgkzYB. SOCC Keynote, August 2015.

[8] Martin A. Brown. Traffic control howto. http://linux-ip.net/articles/Traffic-Control-HOWTO.

[9] Robin Burke. Hybrid recommender systems: Survey and experiments. User Modeling and User-Adapted Interaction, 12(4):331–370, November 2002.

[10] Apache Cassandra. http://cassandra.apache.org.

[11] Intel® 64 and IA-32 Architecture Software Developer's Manual, vol. 3B: System Programming Guide, Part 2, September 2014.

[12] Ludmila Cherkasova, Diwaker Gupta, and Amin Vahdat. Comparison of the three CPU schedulers in Xen. SIGMETRICS Perform. Eval. Rev., 35(2):42–51, September 2007.

[13] Scott A. Crosby and Dan S. Wallach. Denial of service via algorithmic complexity attacks. In Proceedings of the 12th Conference on USENIX Security Symposium, Washington, DC, 2003.

[14] Marwan Darwish, Abdelkader Ouda, and Luiz Fernando Capretz. Cloud-based DDoS attacks and defenses. In Proc. of i-Society, Toronto, ON, 2013.

[15] Christina Delimitrou and Christos Kozyrakis. iBench: Quantifying Interference for Datacenter Workloads. In Proceedings of the 2013 IEEE International Symposium on Workload Characterization (IISWC), Portland, OR, September 2013.

[16] Christina Delimitrou and Christos Kozyrakis. Paragon: QoS-Aware Scheduling for Heterogeneous Datacenters. In Proceedings of the Eighteenth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), Houston, TX, USA, 2013.

[17] Christina Delimitrou and Christos Kozyrakis. QoS-Aware Scheduling in Heterogeneous Datacenters with Paragon. In ACM Transactions on Computer Systems (TOCS), Vol. 31, Issue 4, December 2013.

[18] Christina Delimitrou and Christos Kozyrakis. Quality-of-Service-Aware Scheduling in Heterogeneous Datacenters with Paragon. In IEEE Micro Special Issue on Top Picks from the Computer Architecture Conferences, May/June 2014.

[19] Christina Delimitrou and Christos Kozyrakis. Quasar: Resource-Efficient and QoS-Aware Cluster Management. In Proceedings of the Nineteenth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), Salt Lake City, UT, USA, 2014.

[20] Christina Delimitrou and Christos Kozyrakis. HCloud: Resource-Efficient Provisioning in Shared Cloud Systems. In Proceedings of the Twenty First International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), April 2016.

[21] Christina Delimitrou, Daniel Sanchez, and Christos Kozyrakis. Tarcil: Reconciling Scheduling Speed and Quality in Large Shared Clusters. In Proceedings of the Sixth ACM Symposium on Cloud Computing (SOCC), August 2015.

[22] Jake Edge. Denial of service via hash collisions. http://lwn.net/Articles/474912/, January 2012.

[23] Benjamin Farley, Ari Juels, Venkatanathan Varadarajan, Thomas Ristenpart, Kevin D. Bowers, and Michael M. Swift. More for your money: Exploiting performance heterogeneity in public clouds. In Proc. of the ACM Symposium on Cloud Computing (SOCC), San Jose, CA, 2012.

[24] Alexandra Fedorova, Margo Seltzer, and Michael D. Smith. Improving performance isolation on chip multiprocessors via an operating system scheduler. In Proceedings of the 16th Intl. Conference on Parallel Architecture and Compilation Techniques (PACT), Brasov, Romania, 2007.

[25] Alexander Felfernig and Robin Burke. Constraint-based recommender systems: Technologies and research issues. In Proceedings of the ACM International Conference on Electronic Commerce (ICEC), Innsbruck, Austria, 2008.

[26] Brad Fitzpatrick. Distributed caching with memcached. In Linux Journal, Volume 2004, Issue 124, 2004.

[27] Oded Goldreich and Rafail Ostrovsky. Software protection and simulation on oblivious RAMs. J. ACM, 43(3):431–473, May 1996.

[28] Google Container Engine. https://cloud.google.com/container-engine.

[29] Asela Gunawardana and Christopher Meek. A unified approach to building hybrid recommender systems. In Proc. of the Third ACM Conference on Recommender Systems (RecSys), New York, NY, 2009.

[30] Sanchika Gupta and Padam Kumar. VM profile based optimized network attack pattern detection scheme for DDoS attacks in cloud. In Proc. of SSCC, Mysore, India, 2013.

[31] Yi Han, Tansu Alpcan, Jeffrey Chan, and Christopher Leckie. Security games for virtual machine allocation in cloud computing. In 4th International Conference on Decision and Game Theory for Security, Fort Worth, TX, 2013.

[32] Amir Herzberg, Haya Shulman, Johanna Ullrich, and Edgar Weippl. Cloudoscopy: Services discovery and topology mapping. In Proceedings of the ACM Workshop on Cloud Computing Security Workshop (CCSW), Berlin, Germany, 2013.

[33] Ben Hindman, Andy Konwinski, Matei Zaharia, Ali Ghodsi, Anthony D. Joseph, Randy Katz, Scott Shenker, and Ion Stoica. Mesos: A platform for fine-grained resource sharing in the data center. In Proceedings of NSDI, Boston, MA, 2011.

[34] Jingwei Huang, David M. Nicol, and Roy H. Campbell. Denial-of-service threat to Hadoop/YARN clusters with multi-tenancy. In Proc. of the IEEE International Congress on Big Data, Washington, DC, 2014.

[35] Alexandru Iosup, Nezih Yigitbasi, and Dick Epema. On the performance variability of production cloud services. In Proceedings of CCGRID, Newport Beach, CA, 2011.

[36] Vimalkumar Jeyakumar, Mohammad Alizadeh, David Mazieres, Balaji Prabhakar, Changhoon Kim, and Albert Greenberg. EyeQ: Practical network performance isolation at the edge. In Proc. of the 10th USENIX Conference on Networked Systems Design and Implementation (NSDI), Lombard, IL, 2013.

[37] Yaakoub El Khamra, Hyunjoo Kim, Shantenu Jha, and Manish Parashar. Exploring the performance fluctuations of HPC workloads on clouds. In Proceedings of CloudCom, Indianapolis, IN, 2010.

[38] Krzysztof C. Kiwiel. Convergence and efficiency of subgradient methods for quasiconvex minimization. In Mathematical Programming (Series A) (Berlin, Heidelberg: Springer), 90(1), pp. 1–25, 2001.

[39] Ruby B. Lee. Rethinking computers for cybersecurity. IEEE Computer, 48(4):16–25, 2015.

[40] Jacob Leverich and Christos Kozyrakis. Reconciling high server utilization and sub-millisecond quality-of-service. In Proceedings of EuroSys, Amsterdam, The Netherlands, 2014.

[41] Host server CPU utilization in Amazon EC2 cloud. http://goo.gl/2LTx4T.

[42] Fangfei Liu, Yuval Yarom, Qian Ge, Gernot Heiser, and Ruby B. Lee. Last-level cache side-channel attacks are practical. In Proc. of the IEEE Symposium on Security and Privacy (S&P), San Jose, CA, 2015.

[43] Fei Liu, Lanfang Ren, and Hongtao Bai. Mitigating cross-VM side channel attack on multiple tenants cloud platform. In Journal of Computers, Vol. 9, No. 4 (2014), 1005–1013, April 2014.

[44] David Lo, Liqun Cheng, Rama Govindaraju, Luiz Andre Barroso, and Christos Kozyrakis. Towards energy proportionality for large-scale latency-critical workloads. In Proceedings of the 41st Annual International Symposium on Computer Architecture (ISCA), Minneapolis, MN, 2014.

[45] David Lo, Liqun Cheng, Rama Govindaraju, Parthasarathy Ranganathan, and Christos Kozyrakis. Heracles: Improving resource efficiency at scale. In Proc. of the 42nd Annual International Symposium on Computer Architecture (ISCA), Portland, OR, 2015.

[46] Mahout. http://mahout.apache.org.

[47] Dave Mangot. EC2 variability: The numbers revealed. http://tech.mangot.com/roller/dave/entry/ec2_variability_the_numbers_revealed.

[48] Jason Mars and Lingjia Tang. Whare-map: Heterogeneity in "homogeneous" warehouse-scale computers. In Proceedings of ISCA, Tel-Aviv, Israel, 2013.

[49] Robert Martin, John Demme, and Simha Sethumadhavan. Timewarp: Rethinking timekeeping and performance monitoring mechanisms to mitigate side-channel attacks. In Proceedings of the International Symposium on Computer Architecture (ISCA), Portland, OR, 2012.

[50] David Meisner, Christopher M. Sadler, Luiz Andre Barroso, Wolf-Dietrich Weber, and Thomas F. Wenisch. Power management of online data-intensive services. In Proceedings of the 38th Annual International Symposium on Computer Architecture, pages 319–330, 2011.

[51] Jelena Mirkovic and Peter Reiher. A taxonomy of DDoS attack and DDoS defense mechanisms. ACM SIGCOMM Computer Communication Review (CCR), April 2004.

[52] Thomas Moscibroda and Onur Mutlu. Memory performance attacks: Denial of memory service in multi-core systems. In Proc. of the 16th USENIX Security Symposium (SS), Boston, MA, 2007.

[53] Ripal Nathuji, Aman Kansal, and Alireza Ghaffarkhah. Q-clouds: Managing performance interference effects for QoS-aware clouds. In Proceedings of EuroSys, Paris, France, 2010.

[54] Simon Ostermann, Alexandru Iosup, Nezih Yigitbasi, Radu Prodan, Thomas Fahringer, and Dick Epema. A performance analysis of EC2 cloud computing services for scientific computing. In Lecture Notes on Cloud Computing, Volume 34, pp. 115–131, 2010.

[55] Tao Peng, Christopher Leckie, and Kotagiri Ramamohanarao. Survey of network-based defense mechanisms countering the DoS and DDoS problems. ACM Comput. Surv., 39(1), April 2007.

[56] Diego Perez-Botero, Jakub Szefer, and Ruby B. Lee. Characterizing hypervisor vulnerabilities in cloud computing servers. In Proceedings of the 2013 International Workshop on Security in Cloud Computing (SCC@ASIACCS), Hangzhou, China, 2013.

[57] Moinuddin K. Qureshi and Yale N. Patt. Utility-based cache partitioning: A low-overhead, high-performance, runtime mechanism to partition shared caches. In Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 39, 2006.

[58] Himanshu Raj, Ripal Nathuji, Abhishek Singh, and Paul England. Resource management for isolation enhanced cloud services. In Proc. of the ACM Workshop on Cloud Computing Security (CCSW), Chicago, IL, 2009.

[59] Suhail Rehman and Majd Sakr. Initial findings for provisioning variation in cloud computing. In Proceedings of CloudCom, Indianapolis, IN, 2010.

[60] Thomas Ristenpart, Eran Tromer, Hovav Shacham, and Stefan Savage. Hey, you, get off of my cloud: Exploring information leakage in third-party compute clouds. In Proc. of the ACM Conference on Computer and Communications Security (CCS), Chicago, IL, 2009.

[61] Daniel Sanchez and Christos Kozyrakis. Vantage: Scalable and Efficient Fine-Grain Cache Partitioning. In Proceedings of the 38th Annual International Symposium on Computer Architecture (ISCA-38), San Jose, CA, June 2011.

[62] Jorg Schad, Jens Dittrich, and Jorge-Arnulfo Quiane-Ruiz. Runtime measurements in the cloud: Observing, analyzing, and reducing variance. Proceedings VLDB Endow., 3(1-2):460–471, September 2010.

[63] Malte Schwarzkopf, Andy Konwinski, Michael Abd-El-Malek, and John Wilkes. Omega: flexible, scalable schedulers for large compute clusters. In Proceedings of EuroSys, Prague, Czech Republic, 2013.

[64] Alan Shieh, Srikanth Kandula, Albert Greenberg, and Changhoon Kim. Seawall: Performance isolation for cloud datacenter networks. In Proc. of the USENIX Conference on Hot Topics in Cloud Computing (HotCloud), Boston, MA, 2010.

[65] David Shue, Michael J. Freedman, and Anees Shaikh. Performance isolation and fairness for multi-tenant cloud storage. In Proc. of the 10th USENIX Conference on Operating Systems Design and Implementation (OSDI), Hollywood, CA, 2012.

[66] Dan Tsafrir, Yoav Etsion, and Dror G. Feitelson. Secretly monopolizing the CPU without superuser privileges. In Proc. of the 16th USENIX Security Symposium, Boston, MA, 2007.

[67] Venkatanathan Varadarajan, Thawan Kooburat, Benjamin Farley, Thomas Ristenpart, and Michael Swift. Resource-freeing attacks: Improve your cloud performance (at your neighbor's expense). In Proc. of the Conference on Computer and Communications Security (CCS), Raleigh, NC, 2012.

[68] Venkatanathan Varadarajan, Thomas Ristenpart, and Michael Swift. Scheduler-based defenses against cross-VM side-channels. In Proc. of the 23rd USENIX Security Symposium, San Diego, CA, 2014.

[69] Venkatanathan Varadarajan, Yinqian Zhang, Thomas Ristenpart, and Michael Swift. A placement vulnerability study in multi-tenant public clouds. In Proc. of the 24th USENIX Security Symposium (USENIX Security), Washington, DC, 2015.

[70] Huaibin Wang, Haiyun Zhou, and Chundong Wang. Virtual machine-based intrusion detection system framework in cloud computing environment. In Journal of Computers, October 2012.

[71] Hui Wang, Canturk Isci, Lavanya Subramanian, Jongmoo Choi, Depei Qian, and Onur Mutlu. A-DRM: Architecture-aware distributed resource management of virtualized clusters. In Proceedings of the 11th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments (VEE), Istanbul, Turkey, 2015.

[72] Ian H. Witten, Eibe Frank, and Geoffrey Holmes. Data Mining: Practical Machine Learning Tools and Techniques, 3rd Edition.

[73] Zhenyu Wu, Zhang Xu, and Haining Wang. Whispers in the hyper-space: High-speed covert channel attacks in the cloud. In Proc. of the 21st USENIX Conference on Security Symposium (USENIX Security), Bellevue, WA, 2012.

[74] Yunjing Xu, Michael Bailey, Farnam Jahanian, Kaustubh Joshi, Matti Hiltunen, and Richard Schlichting. An exploration of L2 cache covert channels in virtualized environments. In Proc. of the 3rd ACM Workshop on Cloud Computing Security Workshop (CCSW), Chicago, IL, 2011.

[75] Zhang Xu, Haining Wang, and Zhenyu Wu. A measurement study on co-residence threat inside the cloud. In Proc. of the 24th USENIX Security Symposium (USENIX Security), Washington, DC, 2015.

[76] Yuval Yarom and Katrina Falkner. Flush+Reload: A high resolution, low noise, L3 cache side-channel attack. In Proc. of the 23rd USENIX Security Symposium, San Diego, CA, 2014.

[77] Matei Zaharia, Mosharaf Chowdhury, Tathagata Das, Ankur Dave, Justin Ma, Murphy McCauly, Michael J. Franklin, Scott Shenker, and Ion Stoica. Resilient distributed datasets: A fault-tolerant abstraction for in-memory cluster computing. In Proceedings of NSDI, San Jose, CA, 2012.

[78] Yinqian Zhang, Ari Juels, Alina Oprea, and Michael K. Reiter. HomeAlone: Co-residency detection in the cloud via side-channel analysis. In Proc. of the IEEE Symposium on Security and Privacy, Oakland, CA, 2011.

[79] Yinqian Zhang, Ari Juels, Michael K. Reiter, and Thomas Ristenpart. Cross-tenant side-channel attacks in PaaS clouds. In Proc. of the ACM SIGSAC Conference on Computer and Communications Security (CCS), Scottsdale, AZ, 2014.

[80] Yinqian Zhang, Ari Juels, Michael K. Reiter, and Thomas Ristenpart. Cross-VM side channels and their use to extract private keys. In Proceedings of the ACM Conference on Computer and Communications Security (CCS), Raleigh, NC, 2012.

[81] Yinqian Zhang and Michael K. Reiter. Düppel: Retrofitting commodity operating systems to mitigate cache side channels in the cloud. In Proc. of the ACM Conference on Computer and Communications Security (CCS), Berlin, Germany, 2013.

[82] Fangfei Zhou, Manish Goel, Peter Desnoyers, and Ravi Sundaram. Scheduler vulnerabilities and coordinated attacks in cloud computing. J. Comput. Secur., 21(4):533–559, July 2013.

[83] Jieming Zhu, Pinjia He, Zibin Zheng, and Michael R. Lyu. Towards online, accurate, and scalable QoS prediction for runtime service adaptation. In Proc. of the IEEE International Conference on Distributed Computing Systems (ICDCS), Madrid, Spain, 2014.

[84] Sergey Zhuravlev, Sergey Blagodurov, and Alexandra Fedorova. Addressing shared resource contention in multicore processors via scheduling. In Proc. of the Fifteenth Edition of Architectural Support for Programming Languages and Operating Systems (ASPLOS), Pittsburgh, PA, 2010.


Victim App         Victim Perf    Beneficiary App   Benef. Perf   Target Resource
Apache Webserver   -64% (QPS)     mcf               +24%          CPU
Hadoop (SVM)       -36% (Exec)    mcf               +16%          Network BW
Spark (k-means)    -52% (Exec)    mcf               +38%          Memory BW

Table 2: RFA impact on the victims and beneficiaries.

workload, e.g., a competitive e-commerce site. Therefore, they need to pinpoint where the target resides in a practical manner. This requires a launch strategy and a mechanism for co-residency detection [69]. The attack is practical if the target is located with high confidence, in reasonable time, and with modest resource costs. Bolt enables this attack to be carried out in a practical way, and remains resilient against defense mechanisms such as virtual private clouds (VPCs), which make internal IP addresses private to a single tenant. Once a target VM is located, the adversary can launch RFA or DoS attacks as previously described. Co-residency detection attacks rely on leakage of logical information, e.g., IP address, or on observing the impact of a side-channel attack due to resource contention. Bolt relies on a variation of the latter approach.

Assume a system of N servers. A target victim user launches k VMs, and the adversary launches n malicious VMs. Instances are launched simultaneously to avoid malicious-malicious instance co-residency [69]. The probability that at least one adversarial VM will be co-scheduled with a victim instance is P(f) = 1 - (1 - k/N)^n. The adversary then uses a two-process detection scheme. The sender is a process that creates contention to degrade the victim's performance. The receiver is the process that detects this degradation, running either on the same host as the victim (cooperative detection) or externally (uncooperative detection). We assume an uncooperative victim, which is the more general case. Bolt works as the sender to determine the type of co-residents on each sampled host and identify any of the desired type. Assume m VMs of the desired type have been detected in the sample set. Bolt now needs to prune down this space to VM(s) from the specific target victim. It launches an external receiver aiming at the victim over a public channel, e.g., HTTP or a key-value protocol, to ping the victim service. This is the only point of communication with the victim. At the same time, the sender incurs contention in the resources the victim is most sensitive to. If the sender and victim are co-residents, the receiver's requests will be slower due to contention. Bolt quickly prunes down the sampled VMs by determining the type of co-residents. In a large cluster, this reduces the time to pinpoint a victim and the resource costs for the adversary. It also increases the confidence in the detection in the presence of noise from other co-residents.

Impact: We evaluate co-residency detection on the same 40-node cluster. The victim is a SQL server instantiating a single VM. The cluster also hosts 7 other VMs running SQL. Other active VMs host key-value stores, Hadoop, and Spark jobs. Bolt launches 10 senders simultaneously on randomly-selected machines (P(f) = ) and detects 3 VMs running SQL in the sample set. It then introduces memory interference as the receiver initiates a large number of SQL queries. While these queries have 816msec mean latency without interference, latency now becomes 2614msec. Given a ~3x latency increase, we can conclude that one of the co-residents is the victim SQL server. Detection required 6sec from instantiation to receiver detection, and 11 adversary VMs.
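The placement probability and the sender/receiver decision above can be sketched in a few lines. A minimal sketch: the helper names and the 3x slowdown threshold are illustrative assumptions, not part of Bolt's implementation.

```python
import math

def coresidency_prob(N, k, n):
    """P(f) = 1 - (1 - k/N)^n: probability that at least one of n
    adversarial VMs lands on a host running one of the victim's k VMs,
    assuming uniform-random placement over N servers."""
    return 1.0 - (1.0 - k / N) ** n

def vms_needed(N, k, target):
    """Smallest n such that P(f) >= target."""
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - k / N))

def is_coresident(baseline_ms, contended_ms, threshold=3.0):
    """Receiver-side check: if the victim's requests slow down by
    ~threshold-x while the sender injects contention, conclude the
    sender shares the victim's host."""
    return contended_ms / baseline_ms >= threshold
```

For the experiment above (N=40 servers, k=1 victim VM, n=10 senders), `coresidency_prob` gives roughly 0.22, and the 816msec-to-2614msec latency jump clears the 3x threshold.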

6. Improving Security via Resource Isolation

Since interference is at the heart of Bolt's detection methodology, isolation mechanisms that attenuate contention should reduce its accuracy. We first evaluate to what extent existing isolation techniques mitigate security vulnerabilities, and then highlight trade-offs between security, performance, and utilization. We use the same setting as in the controlled experiment of Section 3.4. We evaluate 4 resource-specific isolation techniques and 3 settings for OS-level isolation mechanisms: baremetal, containerized, and virtualized. The baremetal system does not employ any OS-level isolation. The containerized setup uses Linux containers (lxc) and assigns cores to applications via cpuset cgroups. Both containers and VMs constrain memory capacity. Baremetal experiments do not enforce memory capacity allocations, and the Linux scheduler is free to float applications across cores.

The first resource-specific isolation technique is thread pinning to physical cores, which constrains interference from scheduling actions like context switching. The number of cores an application is allocated can change dynamically, and is limited by how fast Linux can migrate tasks, typically in the tens of milliseconds.
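The cpuset-style pinning described above amounts to handing each co-resident a disjoint core range. A minimal sketch of that bookkeeping; the function name and greedy contiguous layout are our assumptions (in practice the ranges would be written to cpuset cgroups or applied with sched_setaffinity):

```python
def assign_cpusets(core_counts, total_cores):
    """Greedily give each job a disjoint, contiguous block of cores,
    returned in the range-string format used by cpuset.cpus,
    e.g. {'jobA': '0-3', 'jobB': '4-5'}."""
    cpusets, next_core = {}, 0
    for job, count in core_counts.items():
        if next_core + count > total_cores:
            raise ValueError(f"not enough cores for {job}")
        cpusets[job] = f"{next_core}-{next_core + count - 1}"
        next_core += count
    return cpusets
```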

For network isolation, we use the outbound network bandwidth partitioning capabilities of Linux's traffic control. Specifically, we use the qdisc [8] scheduler with the hierarchical token bucket queueing discipline (HTB) to enforce egress bandwidth limits. The limits are set to the maximum traffic burst rate for each application to avoid contention (the ceil parameter in the HTB interface). Ingress network bandwidth isolation has been extensively studied in previous work [36]; these approaches can be applied here as well.
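The egress caps above map to a short tc command sequence. A sketch that only builds the command strings; the device name, class ids, and rates are illustrative, and real deployments would also attach filters to steer each job's traffic into its class:

```python
def htb_egress_commands(dev, caps_mbit):
    """Build tc commands for per-job HTB egress caps. Setting
    rate == ceil pins each class at its maximum burst rate, matching
    the 'ceil' usage described in Section 6."""
    cmds = [f"tc qdisc add dev {dev} root handle 1: htb"]
    for i, (job, mbit) in enumerate(caps_mbit.items(), start=10):
        # one HTB class per job, hard-capped at its burst rate
        cmds.append(
            f"tc class add dev {dev} parent 1: classid 1:{i} "
            f"htb rate {mbit}mbit ceil {mbit}mbit"
        )
    return cmds
```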

For DRAM bandwidth isolation, there is no commercially available partitioning mechanism. To enforce bandwidth isolation, we use the following approach: we monitor the DRAM bandwidth usage of each application through performance counters [45] and modify the scheduler to only colocate jobs on machines that can accommodate their aggregate peak memory bandwidth requirements. This requires extensive application knowledge, and is used simply to highlight the benefits of DRAM bandwidth isolation.
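The scheduler-side bandwidth check described above reduces to a simple admission test. A sketch under our own naming and units; in the paper's setup the per-job peak bandwidth would come from performance counters:

```python
def can_colocate(new_job_bw, resident_bws, machine_bw):
    """Admit a job on a machine only if the aggregate peak DRAM
    bandwidth of all co-residents (e.g. in GB/s) stays within the
    machine's capacity."""
    return new_job_bw + sum(resident_bws) <= machine_bw
```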

Finally, for last-level cache (LLC) isolation, we use the Cache Allocation Technology (CAT) available in recent Intel chips [11]. CAT partitions the LLC in ways, which in a highly-associative cache allows for non-overlapping partitions at the granularity of a few percent of the LLC capacity. Each co-resident is allocated one partition, configured to its current capacity requirements [57]. Partitions can be resized at runtime by reprogramming specific low-level registers (MSRs); changes take effect after a few milliseconds.

Figure 14: Detection accuracy (%) with isolation techniques (None, Thread Pinning, +Net BW Partitioning, +Mem BW Partitioning, +Cache Partitioning, +Core Isolation) for baremetal, Linux containers, and virtual machines.
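The CAT partitioning step can be sketched as computing per-tenant capacity bitmasks (CBMs); CAT requires each mask to be a contiguous run of ways, and real systems program the masks via MSRs (or tools such as Intel's pqos). The function below is an illustrative sketch of the mask computation only:

```python
def cat_way_masks(way_requests, total_ways):
    """Give each tenant a contiguous, non-overlapping run of LLC ways
    and return its capacity bitmask (CBM) as an int."""
    masks, next_way = {}, 0
    for tenant, ways in way_requests.items():
        if next_way + ways > total_ways:
            raise ValueError(f"LLC has only {total_ways} ways")
        # 'ways' consecutive set bits, starting at position next_way
        masks[tenant] = ((1 << ways) - 1) << next_way
        next_way += ways
    return masks
```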

We add one isolation mechanism at a time in the three system settings (baremetal, containers, VMs). Figure 14 shows the impact of isolation techniques on Bolt's detection accuracy. As expected, when no isolation is used, baremetal allows for a significantly higher detection accuracy than container- and VM-based systems, mostly due to the latter constraining core and memory capacity usage. As a result, introducing thread pinning benefits baremetal the most, since it reduces core contention. It also benefits container- and VM-based setups to a lesser degree, by eliminating unpredictability introduced by the Linux scheduler (e.g., context switching) [40]. The dominant resource of each application determines which isolation technique benefits it the most. Thread pinning mostly benefits workloads bound by on-chip resources, such as L1/L2 caches and cores. Adding network bandwidth partitioning lowers detection accuracy for all three settings almost equally. It primarily benefits network-bound workloads, for which network interference conveys the most information for detection. Memory bandwidth isolation further reduces accuracy by 10% on average, benefiting jobs dominated by DRAM traffic. Finally, cache partitioning yields the most dramatic reduction in accuracy, especially for LLC-bound workloads. We attribute this to the importance cache pressure has as a detection signal. The number of co-residents also affects the extent to which isolation helps: the more co-scheduled applications exist per machine, the more isolation techniques degrade accuracy, as they make distinguishing between co-residents harder.

Unfortunately, even when all the techniques are on, accuracy is still 50%, for two reasons: current techniques are not fine-grain enough to allow strict and scalable isolation, and core resources (L1/L2 caches, CPU) are prone to interference due to contending hyperthreads [61]. To evaluate the latter hypothesis, we modify the scheduler such that hyperthreads of different jobs are never scheduled on the same physical core; e.g., if a job needs 7 vCPUs, it is allocated 4 dedicated physical cores. The grey bars of Figure 14 show the detection accuracy. Baremetal instances still allow certain applications to be detected, but for containerized and virtualized settings accuracy drops to 14%. The remaining accuracy corresponds to disk-heavy workloads. Improving security, however, comes at a performance penalty of 34% on average in execution time, as threads of the same job are forced to contend with one another. Alternatively, users can overprovision their resource reservations to avoid degradation, which results in a 45% drop in utilization. This means that the cloud provider cannot leverage CPU idleness to share machines, decreasing the cost benefits of cloud computing. Note that enforcing core isolation alone is also not sufficient, as it allows a detection accuracy of 46%.

Discussion: The previous analysis highlights a design problem with current datacenter platforms. Traditional multicores are prone to contention, which will only worsen as more cores are integrated in each server and multi-tenancy becomes more pronounced. Existing isolation techniques are insufficient to mitigate security vulnerabilities, and techniques that provide reasonable security guarantees sacrifice performance or cost efficiency through low utilization. This highlights the need for new fine-grain and coordinated isolation techniques that guarantee security at high utilization for shared resources.
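The no-shared-hyperthread policy above rounds every job up to whole physical cores, stranding hardware threads that no other tenant may use. A sketch assuming 2-way SMT (the function names are ours):

```python
import math

def dedicated_cores(vcpus, smt_width=2):
    """Physical cores needed when hyperthreads never cross jobs:
    e.g. a 7-vCPU job on 2-way SMT gets ceil(7/2) = 4 dedicated cores."""
    return math.ceil(vcpus / smt_width)

def stranded_threads(vcpus, smt_width=2):
    """Hardware threads left idle by the rounding, unusable by
    other tenants."""
    return dedicated_cores(vcpus, smt_width) * smt_width - vcpus
```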

7. Conclusions

We have presented Bolt, a practical system that uses online data mining techniques to identify the type and characteristics of applications running on shared cloud systems, and enables practical attacks that degrade their performance. In a 40-server cluster, Bolt correctly identifies 87 out of 108 workloads, and degrades tail latency by up to 140x. We also used Bolt in a user study on EC2 to detect the characteristics of unknown jobs submitted by multiple users. Finally, we show that while existing isolation techniques are helpful, they are not sufficient to mitigate such attacks. Bolt reveals real, easy-to-exploit threats in public clouds. We hope that this work will motivate cloud providers and computer scientists to develop and deploy stricter isolation primitives in cloud systems.

Acknowledgements

We sincerely thank all the participants in the user study for their time and effort. We also thank Daniel Sanchez, Ed Suh, Grant Ayers, Mingyu Gao, Ana Klimovic, and the rest of the MAST group, as well as the anonymous reviewers, for their feedback on earlier versions of this manuscript. This work was supported by the Stanford Platform Lab, NSF grant CNS-1422088, and a John and Norma Balen Sesquicentennial Faculty Fellowship.


[21] Christina Delimitrou Daniel Sanchez and Christos KozyrakisTarcil Reconciling Scheduling Speed and Quality in LargeShared Clusters In Proceedings of the Sixth ACM Symposiumon Cloud Computing (SOCC) August 2015

[22] Jake Edge Denial of service via hash collisions httplwnnetArticles474912 January 2012

[23] Benjamin Farley Ari Juels Venkatanathan Varadarajan ThomasRistenpart Kevin D Bowers and Michael M Swift More foryour money Exploiting performance heterogeneity in publicclouds In Proc of the ACM Symposium on Cloud Computing(SOCC) San Jose CA 2012

[24] Alexandra Fedorova Margo Seltzer and Michael D SmithImproving performance isolation on chip multiprocessors viaan operating system scheduler In Proceedings of the 16thIntl Conference on Parallel Architecture and CompilationTechniques (PACT) Brasov Romania 2007

[25] Alexander Felfernig and Robin Burke Constraint-based rec-ommender systems Technologies and research issues InProceedings of the ACM International Conference on Elec-tronic Commerce (ICEC) Innsbruck Austria 2008

[26] Brad Fitzpatrick Distributed caching with memcached InLinux Journal Volume 2004 Issue 124 2004

[27] Oded Goldreich and Rafail Ostrovsky Software protectionand simulation on oblivious rams J ACM 43(3)431ndash473May 1996

[28] Google container engine httpscloudgooglecomcontainer-engine

[29] Asela Gunawardana and Christopher Meek A unified ap-proach to building hybrid recommender systems In Proc ofthe Third ACM Conference on Recommender Systems (Rec-Sys) New York NY 2009

[30] Sanchika Gupta and Padam Kumar Vm profile based opti-mized network attack pattern detection scheme for ddos at-tacks in cloud In Proc of SSCC Mysore India 2013

[31] Yi Han Tansu Alpcan Jeffrey Chan and Christopher LeckieSecurity games for virtual machine allocation in cloud com-puting In 4th International Conference on Decision andGame Theory for Security Fort Worth TX 2013

[32] Amir Herzberg Haya Shulman Johanna Ullrich and EdgarWeippl Cloudoscopy Services discovery and topology map-ping In Proceedings of the ACM Workshop on Cloud Com-puting Security Workshop (CCSW) Berlin Germany 2013

[33] Ben Hindman Andy Konwinski Matei Zaharia Ali GhodsiAnthony D Joseph Randy Katz Scott Shenker and Ion Sto-ica Mesos A platform for fine-grained resource sharing inthe data center In Proceedings of NSDI Boston MA 2011

[34] Jingwei Huang David M Nicol and Roy H CampbellDenial-of-service threat to hadoopyarn clusters with multi-tenancy In Proc of the IEEE International Congress on BigData Washington DC 2014

[35] Alexandru Iosup Nezih Yigitbasi and Dick Epema Onthe performance variability of production cloud services InProceedings of CCGRID Newport Beach CA 2011

[36] Vimalkumar Jeyakumar Mohammad Alizadeh David MazieresBalaji Prabhakar Changhoon Kim and Albert GreenbergEyeq Practical network performance isolation at the edge InProc of the 10th USENIX Conference on Networked SystemsDesign and Implementation (NSDI) Lombard IL 2013

[37] Yaakoub El Khamra Hyunjoo Kim Shantenu Jha and Man-ish Parashar Exploring the performance fluctuations of hpcworkloads on clouds In Proceedings of CloudCom Indi-anapolis IN 2010

[38] Krzysztof C Kiwiel Convergence and efficiency of subgra-dient methods for quasiconvex minimization In Mathemat-ical Programming (Series A) (Berlin Heidelberg Springer)90 (1) pp 1-25 2001

[39] Ruby B Lee Rethinking computers for cybersecurity IEEEComputer 48(4)16ndash25 2015

[40] Jacob Leverich and Christos Kozyrakis Reconciling highserver utilization and sub-millisecond quality-of-service InProceedings of EuroSys Amsterdam The Netherlands 2014

[41] Host server cpu utilization in amazon ec2 cloud httpgoogl2LTx4T

[42] Fangfei Liu Yuval Yarom Qian Ge Gernot Heiser andRuby B Lee Last-level cache side-channel attacks are prac-tical In Proc of IEEE Symposium on Security and Privacy(SampP) San Jose CA 2015

[43] Fei Liu Lanfang Ren and Hongtao Bai Mitigating cross-vm side channel attack on multiple tenants cloud platform InJournal of Computers Vol 9 No 4 (2014) 1005-1013 April2014

[44] David Lo Liqun Cheng Rama Govindaraju Luiz Andre Bar-roso and Christos Kozyrakis Towards energy proportionalityfor large-scale latency-critical workloads In Proceedings ofthe 41st Annual International Symposium on Computer Ar-chitecuture (ISCA) Minneapolis MN 2014

[45] David Lo Liqun Cheng Rama Govindaraju ParthasarathyRanganathan and Christos Kozyrakis Heracles Improvingresource efficiency at scale In Proc of the 42Nd AnnualInternational Symposium on Computer Architecture (ISCA)Portland OR 2015

[46] Mahout httpmahoutapacheorg

[47] Dave Mangot Ec2 variability The numbers revealed httptechmangotcomrollerdaveentryec2_variability_the_numbers_revealed

[48] Jason Mars and Lingjia Tang Whare-map heterogeneity inrdquohomogeneousrdquo warehouse-scale computers In Proceedingsof ISCA Tel-Aviv Israel 2013

[49] Robert Martin John Demme and Simha SethumadhavanTimewarp Rethinking timekeeping and performance moni-toring mechanisms to mitigate side-channel attacks In Pro-ceedings of the International Symposium on Computer Archi-tecture (ISCA) Portland OR 2012

[50] David Meisner Christopher M Sadler Luiz Andre BarrosoWolf-Dietrich Weber and Thomas F Wenisch Power man-agement of online data-intensive services In Proceedings ofthe 38th annual international symposium on Computer archi-tecture pages 319ndash330 2011

[51] Jelena Mirkovic and Peter Reiher A taxonomy of ddos attackand ddos defense mechanisms ACM SIGCOMM ComputerCommunication Review (CCR) April 2004

[52] Thomas Moscibroda and Onur Mutlu Memory performanceattacks Denial of memory service in multi-core systemsIn Proc of 16th USENIX Security Symposium on USENIXSecurity Symposium (SS) Boston MA 2007

[53] Ripal Nathuji Aman Kansal and Alireza Ghaffarkhah Q-clouds Managing performance interference effects for qos-aware clouds In Proceedings of EuroSys ParisFrance 2010

[54] Simon Ostermann Alexandru Iosup Nezih Yigitbasi RaduProdan Thomas Fahringer and Dick Epema A performanceanalysis of ec2 cloud computing services for scientific com-puting In Lecture Notes on Cloud Computing Volume 34p115-131 2010

[55] Tao Peng Christopher Leckie and Kotagiri RamamohanaraoSurvey of network-based defense mechanisms countering thedos and ddos problems ACM Comput Surv 39(1) April2007

[56] Diego Perez-Botero Jakub Szefer and Ruby B Lee Char-acterizing hypervisor vulnerabilities in cloud computingservers In Proceedings of the 2013 International Work-shop on Security in Cloud Computing SCCASIACCSHangzhou China 2013

[57] Moinuddin K Qureshi and Yale N Patt Utility-based cachepartitioning A low-overhead high-performance runtimemechanism to partition shared caches In Proceedings ofthe 39th Annual IEEEACM International Symposium on Mi-croarchitecture MICRO 39 2006

[58] Himanshu Raj Ripal Nathuji Abhishek Singh and Paul Eng-land Resource management for isolation enhanced cloud ser-vices In Proc of the ACM Workshop on Cloud ComputingSecurity (CCSW) Chicago IL 2009

[59] Suhail Rehman and Majd Sakr Initial findings for provision-ing variation in cloud computing In Proceedings of Cloud-Com Indianapolis IN 2010

[60] Thomas Ristenpart Eran Tromer Hovav Shacham and Ste-fan Savage Hey you get off of my cloud Exploring infor-mation leakage in third-party compute clouds In Proc of theACM Conference on Computer and Communications Security(CCS) Chicago IL 2009

[61] Daniel Sanchez and Christos Kozyrakis Vantage Scalableand Efficient Fine-Grain Cache Partitioning In Proceedingsof the 38th annual International Symposium in ComputerArchitecture (ISCA-38) San Jose CA June 2011

[62] Jorg Schad Jens Dittrich and Jorge-Arnulfo Quiane-RuizRuntime measurements in the cloud Observing analyzingand reducing variance Proceedings VLDB Endow 3(1-2)460ndash471 September 2010

[63] Malte Schwarzkopf Andy Konwinski Michael Abd-El-Malek and John Wilkes Omega flexible scalable sched-ulers for large compute clusters In Proceedings of EuroSysPrague Czech Republic 2013

[64] Alan Shieh Srikanth Kandula Albert Greenberg and ChanghoonKim Seawall Performance isolation for cloud datacenternetworks In Proc of the USENIX Conference on Hot Topicsin Cloud Computing (HotCloud) Boston MA 2010

[65] David Shue Michael J Freedman and Anees Shaikh Perfor-mance isolation and fairness for multi-tenant cloud storage InProc of the 10th USENIX Conference on Operating SystemsDesign and Implementation (OSDI) Hollywood CA 2012

[66] Dan Tsafrir Yoav Etsion and Dror G Feitelson Secretlymonopolizing the cpu without superuser privileges In Procof 16th USENIX Security Symposium on USENIX SecuritySymposium Boston MA 2007

[67] Venkatanathan Varadarajan Thawan Kooburat BenjaminFarley Thomas Ristenpart and Michael Swift Resource-freeing attacks Improve your cloud performance (at yourneighborrsquos expense) In Proc of the Conference on Computerand Communications Security (CCS) Raleigh 2012

[68] Venkatanathan Varadarajan Thomas Ristenpart and MichaelSwift Scheduler-based defenses against cross-vm side-channels In Proc of the 23rd Usenix Security SymposiumSan Diego CA 2014

[69] Venkatanathan Varadarajan Yinqian Zhang Thomas Risten-part and Michael Swift A placement vulnerability study inmulti-tenant public clouds In Proc of the 24th USENIX Secu-rity Symposium (USENIX Security) Washington DC 2015

[70] Huaibin Wang Haiyun Zhou and Chundong Wang Vir-tual machine-based intrusion detection system framework incloud computing environment In Journal of Computers Oc-tober 2012

[71] Hui Wang Canturk Isci Lavanya Subramanian JongmooChoi Depei Qian and Onur Mutlu A-drm Architecture-aware distributed resource management of virtualized clus-ters In Proceedings of the 11th ACM SIGPLANSIGOPSinternational conference on Virtual Execution Environments(VEE) Istanbul Turkey 2015

[72] Ian H Witten Eibe Frank and Geoffrey Holmes DataMining Practical Machine Learning Tools and Techniques3rd Edition

[73] Zhenyu Wu Zhang Xu and Haining Wang Whispers inthe hyper-space High-speed covert channel attacks in thecloud In Proc of the 21st USENIX Conference on SecuritySymposium (USENIX Security) Bellevue WA 2012

[74] Yunjing Xu Michael Bailey Farnam Jahanian KaustubhJoshi Matti Hiltunen and Richard Schlichting An explo-ration of l2 cache covert channels in virtualized environmentsIn Proc of the 3rd ACM Workshop on Cloud Computing Se-curity Workshop (CCSW) Chicago IL 2011

[75] Zhang Xu Haining Wang and Zhenyu Wu A measurementstudy on co-residence threat inside the cloud In Proc ofthe 24th USENIX Security Symposium (USENIX Security)Washington DC 2015

[76] Yuval Yarom and Katrina Falkner Flush+reload a highresolution low noise l3 cache side-channel attack In Proc ofthe 23rd Usenix Security Symposium San Diego CA 2014

[77] Matei Zaharia Mosharaf Chowdhury Tathagata Das AnkurDave Justin Ma Murphy McCauly Michael J FranklinScott Shenker and Ion Stoica Resilient distributed datasetsA fault-tolerant abstraction for in-memory cluster computingIn Proceedings of NSDI San Jose CA 2012

[78] Yinqian Zhang Ari Juels Alina Oprea and Michael K Re-iter Homealone Co-residency detection in the cloud via side-channel analysis In Proc of the IEEE Symposium on Securityand Privacy Oakland CA 2011

[79] Yinqian Zhang Ari Juels Michael K Reiter and ThomasRistenpart Cross-tenant side-channel attacks in paas cloudsIn Proc of the ACM SIGSAC Conference on Computer andCommunications Security (CCS) Scottsdale AZ 2014

[80] Yinqian Zhang Ari Juels Michael K Reiter and ThomasRistenpart Cross-vm side channels and their use to ex-tract private keys In Proceedings of the ACM Conferenceon Computer and Communications Security (CCS) RaleighNC 2012

[81] Yinqian Zhang and Michael K Reiter Duppel retrofittingcommodity operating systems to mitigate cache side channelsin the cloud In Proc of the ACM Conference on Computerand Communications Security (CCS) Berlin Germany 2013

[82] Fangfei Zhou Manish Goel Peter Desnoyers and Ravi Sun-daram Scheduler vulnerabilities and coordinated attacks incloud computing J Comput Secur 21(4)533ndash559 July2013

[83] Jieming Zhu Pinjia He Zibin Zheng and Michael R LyuTowards online accurate and scalable qos prediction for run-time service adaptation In Proc of the IEEE InternationalConference on Distributed Computing Systems (ICDCS)Madrid Spain 2014

[84] Sergey Zhuravlev Sergey Blagodurov and Alexandra Fe-dorova Addressing shared resource contention in multicoreprocessors via scheduling In Proc of the Fifteenth Editionof on Architectural Support for Programming Languages andOperating Systems (ASPLOS) Pittsburgh PA 2010

[Figure 14: Detection accuracy (%) as isolation mechanisms are added cumulatively: none, thread pinning, +net BW partitioning, +mem BW partitioning, +cache partitioning, +core isolation; shown for baremetal, Linux containers, and virtual machines.]

a highly-associative cache allows for non-overlapping partitions at the granularity of a few percent of the LLC capacity. Each co-resident is allocated one partition, configured to its current capacity requirements [57]. Partitions can be resized at runtime by reprogramming specific low-level registers (MSRs); changes take effect after a few milliseconds.
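Concretely, on processors with Intel's Cache Allocation Technology, resizing a partition amounts to rewriting a capacity bitmask in the IA32_L3_QOS_MASK_n MSRs documented in Intel's SDM [11]. The sketch below is illustrative only: it assumes root access to Linux's /dev/cpu/*/msr interface and a 20-way LLC, and the helper names are our own, not part of any API.

```python
import os
import struct

# IA32_L3_QOS_MASK_0; the mask for class-of-service n lives at 0xC90 + n
IA32_L3_MASK_BASE = 0xC90

def ways_to_mask(num_ways, total_ways=20):
    """Build the contiguous capacity bitmask CAT requires for a
    partition spanning `num_ways` of a `total_ways`-way LLC."""
    if not 0 < num_ways <= total_ways:
        raise ValueError("partition must use between 1 and total_ways ways")
    return (1 << num_ways) - 1

def resize_partition(clos, num_ways, cpu=0):
    """Reprogram the capacity mask of class-of-service `clos` by
    writing the MSR through /dev/cpu (requires root); the change
    takes effect within a few milliseconds."""
    with open("/dev/cpu/%d/msr" % cpu, "wb") as msr:
        msr.seek(IA32_L3_MASK_BASE + clos)
        msr.write(struct.pack("<Q", ways_to_mask(num_ways)))
```

For example, `resize_partition(clos=1, num_ways=4)` would shrink partition 1 to 4 of 20 ways, i.e., a 20% share of the LLC.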

We add one isolation mechanism at a time in the three system settings (baremetal, containers, VMs). Figure 14 shows the impact of isolation techniques on Bolt's detection accuracy. As expected, when no isolation is used, baremetal allows for a significantly higher detection accuracy than container- and VM-based systems, mostly due to the latter constraining core and memory capacity usage. As a result, introducing thread pinning benefits baremetal the most, since it reduces core contention. It also benefits container- and VM-based setups, to a lesser degree, by eliminating unpredictability introduced by the Linux scheduler (e.g., context switching) [40]. The dominant resource of each application determines which isolation technique benefits it the most. Thread pinning mostly benefits workloads bound by on-chip resources, such as L1/L2 caches and cores. Adding network bandwidth partitioning lowers detection accuracy for all three settings almost equally; it primarily benefits network-bound workloads, for which network interference conveys the most information for detection. Memory bandwidth isolation further reduces accuracy by 10% on average, benefiting jobs dominated by DRAM traffic. Finally, cache partitioning causes the most dramatic reduction in accuracy, especially for LLC-bound workloads. We attribute this to the importance of cache pressure as a detection signal. The number of co-residents also affects the extent to which isolation helps: the more co-scheduled applications exist per machine, the more isolation techniques degrade accuracy, as they make distinguishing between co-residents harder.
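Thread pinning itself needs no special hardware; on Linux it reduces to a single affinity call per process. A minimal sketch (the helper name is ours):

```python
import os

def pin_process(pid, cpus):
    """Restrict `pid` (0 = the calling process) to a fixed CPU set,
    preventing the Linux scheduler from migrating its threads and
    removing context-switch-induced unpredictability."""
    os.sched_setaffinity(pid, cpus)
    # Return the effective set so callers can verify the pinning.
    return os.sched_getaffinity(pid)
```

A scheduler applying this policy would pin each co-resident to a disjoint CPU set, e.g., `pin_process(0, {0, 1})` for one tenant and `{2, 3}` for another.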

Unfortunately, even when all the techniques are on, accuracy is still 50%, for two reasons: current techniques are not fine-grained enough to allow strict and scalable isolation, and core resources (L1/L2 caches, CPU) are prone to interference due to contending hyperthreads [61]. To evaluate the latter hypothesis, we modify the scheduler such that hyperthreads of different jobs are never scheduled on the same physical core; e.g., if a job needs 7 vCPUs, it is allocated 4 dedicated physical cores. The grey bars of Figure 14 show the detection accuracy. Baremetal instances still allow certain applications to be detected, but for containerized and virtualized settings accuracy drops to 14%. The remaining accuracy corresponds to disk-heavy workloads. Improving security, however, comes at a performance penalty of 34% on average in execution time, as threads of the same job are forced to contend with one another. Alternatively, users can overprovision their resource reservations to avoid degradation, which results in a 45% drop in utilization. This means that the cloud provider cannot leverage CPU idleness to share machines, decreasing the cost benefits of cloud computing. Note that enforcing core isolation alone is also not sufficient, as it allows a detection accuracy of 46%.

Discussion: The previous analysis highlights a design problem with current datacenter platforms. Traditional multicores are prone to contention, which will only worsen as more cores are integrated in each server and multi-tenancy becomes more pronounced. Existing isolation techniques are insufficient to mitigate security vulnerabilities, and techniques that provide reasonable security guarantees sacrifice performance or cost efficiency through low utilization. This highlights the need for new fine-grain and coordinated isolation techniques that guarantee security at high utilization for shared resources.
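The vCPU-to-physical-core rounding in the hyperthread-separation policy above (7 vCPUs mapping to 4 dedicated cores on 2-way SMT machines) is simply a ceiling division by the SMT width; a minimal sketch, with a hypothetical helper name:

```python
import math

def dedicated_cores(vcpus, smt_width=2):
    """Physical cores a job needs when hyperthreads of different
    jobs may never share a core: round up to whole cores, so an
    odd vCPU count leaves one sibling thread unavailable to others."""
    return math.ceil(vcpus / smt_width)
```

For example, `dedicated_cores(7)` returns 4; the eighth hardware thread on the fourth core stays idle (or runs the same job), which is the utilization cost this policy pays for isolation.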

7. Conclusions

We have presented Bolt, a practical system that uses online data mining techniques to identify the type and characteristics of applications running on shared cloud systems, and enables practical attacks that degrade their performance. In a 40-server cluster, Bolt correctly identifies 87 out of 108 workloads, and degrades tail latency by up to 140x. We also used Bolt in a user study on EC2 to detect the characteristics of unknown jobs submitted by multiple users. Finally, we show that while existing isolation techniques are helpful, they are not sufficient to mitigate such attacks. Bolt reveals real, easy-to-exploit threats in public clouds. We hope that this work will motivate cloud providers and computer scientists to develop and deploy stricter isolation primitives in cloud systems.

Acknowledgements

We sincerely thank all the participants in the user study for their time and effort. We also thank Daniel Sanchez, Ed Suh, Grant Ayers, Mingyu Gao, Ana Klimovic, and the rest of the MAST group, as well as the anonymous reviewers, for their feedback on earlier versions of this manuscript. This work was supported by the Stanford Platform Lab, NSF grant CNS-1422088, and a John and Norma Balen Sesquicentennial Faculty Fellowship.

References

[1] Amazon EC2. http://aws.amazon.com/ec2.

[2] Aman Bakshi and Yogesh B. Dujodwala. Securing cloud from DDoS attacks using intrusion detection system in virtual machine. In Proc. of the Second International Conference on Communication Software and Networks (ICCSN), 2010.

[3] Luiz Barroso and Urs Hoelzle. The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines. MC Publishers, 2009.

[4] Naomi Benger, Joop van de Pol, Nigel P. Smart, and Yuval Yarom. "Ooh aah... just a little bit": A small amount of side channel can go a long way. In Proc. of the International Cryptographic Hardware and Embedded Systems Workshop (CHES), Busan, South Korea, 2014.

[5] Major Bhadauria and Sally A. McKee. An approach to resource-aware co-scheduling for CMPs. In Proc. of the 24th ACM International Conference on Supercomputing (ICS), Tsukuba, Japan, 2010.

[6] Leon Bottou. Large-scale machine learning with stochastic gradient descent. In Proceedings of the International Conference on Computational Statistics (COMPSTAT), Paris, France, 2010.

[7] Eric Brewer. Kubernetes: The path to cloud native. http://goo.gl/QgkzYB. SOCC Keynote, August 2015.

[8] Martin A. Brown. Traffic control HOWTO. http://linux-ip.net/articles/Traffic-Control-HOWTO.

[9] Robin Burke. Hybrid recommender systems: Survey and experiments. User Modeling and User-Adapted Interaction, 12(4):331–370, November 2002.

[10] Apache Cassandra. http://cassandra.apache.org.

[11] Intel® 64 and IA-32 Architecture Software Developer's Manual, vol. 3B: System Programming Guide, Part 2, September 2014.

[12] Ludmila Cherkasova, Diwaker Gupta, and Amin Vahdat. Comparison of the three CPU schedulers in Xen. SIGMETRICS Perform. Eval. Rev., 35(2):42–51, September 2007.

[13] Scott A. Crosby and Dan S. Wallach. Denial of service via algorithmic complexity attacks. In Proceedings of the 12th Conference on USENIX Security Symposium, Washington, DC, 2003.

[14] Marwan Darwish, Abdelkader Ouda, and Luiz Fernando Capretz. Cloud-based DDoS attacks and defenses. In Proc. of i-Society, Toronto, ON, 2013.

[15] Christina Delimitrou and Christos Kozyrakis. iBench: Quantifying Interference for Datacenter Workloads. In Proceedings of the 2013 IEEE International Symposium on Workload Characterization (IISWC), Portland, OR, September 2013.

[16] Christina Delimitrou and Christos Kozyrakis. Paragon: QoS-Aware Scheduling for Heterogeneous Datacenters. In Proceedings of the Eighteenth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), Houston, TX, USA, 2013.

[17] Christina Delimitrou and Christos Kozyrakis. QoS-Aware Scheduling in Heterogeneous Datacenters with Paragon. In ACM Transactions on Computer Systems (TOCS), Vol. 31, Issue 4, December 2013.

[18] Christina Delimitrou and Christos Kozyrakis. Quality-of-Service-Aware Scheduling in Heterogeneous Datacenters with Paragon. In IEEE Micro Special Issue on Top Picks from the Computer Architecture Conferences, May/June 2014.

[19] Christina Delimitrou and Christos Kozyrakis. Quasar: Resource-Efficient and QoS-Aware Cluster Management. In Proceedings of the Nineteenth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), Salt Lake City, UT, USA, 2014.

[20] Christina Delimitrou and Christos Kozyrakis. HCloud: Resource-Efficient Provisioning in Shared Cloud Systems. In Proceedings of the Twenty-First International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), April 2016.

[21] Christina Delimitrou, Daniel Sanchez, and Christos Kozyrakis. Tarcil: Reconciling Scheduling Speed and Quality in Large Shared Clusters. In Proceedings of the Sixth ACM Symposium on Cloud Computing (SOCC), August 2015.

[22] Jake Edge. Denial of service via hash collisions. http://lwn.net/Articles/474912/. January 2012.

[23] Benjamin Farley, Ari Juels, Venkatanathan Varadarajan, Thomas Ristenpart, Kevin D. Bowers, and Michael M. Swift. More for your money: Exploiting performance heterogeneity in public clouds. In Proc. of the ACM Symposium on Cloud Computing (SOCC), San Jose, CA, 2012.

[24] Alexandra Fedorova, Margo Seltzer, and Michael D. Smith. Improving performance isolation on chip multiprocessors via an operating system scheduler. In Proceedings of the 16th Intl. Conference on Parallel Architecture and Compilation Techniques (PACT), Brasov, Romania, 2007.

[25] Alexander Felfernig and Robin Burke. Constraint-based recommender systems: Technologies and research issues. In Proceedings of the ACM International Conference on Electronic Commerce (ICEC), Innsbruck, Austria, 2008.

[26] Brad Fitzpatrick. Distributed caching with memcached. In Linux Journal, Volume 2004, Issue 124, 2004.

[27] Oded Goldreich and Rafail Ostrovsky. Software protection and simulation on oblivious RAMs. J. ACM, 43(3):431–473, May 1996.

[28] Google Container Engine. https://cloud.google.com/container-engine.

[29] Asela Gunawardana and Christopher Meek. A unified approach to building hybrid recommender systems. In Proc. of the Third ACM Conference on Recommender Systems (RecSys), New York, NY, 2009.

[30] Sanchika Gupta and Padam Kumar. VM profile based optimized network attack pattern detection scheme for DDoS attacks in cloud. In Proc. of SSCC, Mysore, India, 2013.

[31] Yi Han, Tansu Alpcan, Jeffrey Chan, and Christopher Leckie. Security games for virtual machine allocation in cloud computing. In 4th International Conference on Decision and Game Theory for Security, Fort Worth, TX, 2013.

[32] Amir Herzberg, Haya Shulman, Johanna Ullrich, and Edgar Weippl. Cloudoscopy: Services discovery and topology mapping. In Proceedings of the ACM Workshop on Cloud Computing Security Workshop (CCSW), Berlin, Germany, 2013.

[33] Ben Hindman, Andy Konwinski, Matei Zaharia, Ali Ghodsi, Anthony D. Joseph, Randy Katz, Scott Shenker, and Ion Stoica. Mesos: A platform for fine-grained resource sharing in the data center. In Proceedings of NSDI, Boston, MA, 2011.

[34] Jingwei Huang, David M. Nicol, and Roy H. Campbell. Denial-of-service threat to Hadoop/YARN clusters with multi-tenancy. In Proc. of the IEEE International Congress on Big Data, Washington, DC, 2014.

[35] Alexandru Iosup, Nezih Yigitbasi, and Dick Epema. On the performance variability of production cloud services. In Proceedings of CCGRID, Newport Beach, CA, 2011.

[36] Vimalkumar Jeyakumar, Mohammad Alizadeh, David Mazieres, Balaji Prabhakar, Changhoon Kim, and Albert Greenberg. EyeQ: Practical network performance isolation at the edge. In Proc. of the 10th USENIX Conference on Networked Systems Design and Implementation (NSDI), Lombard, IL, 2013.

[37] Yaakoub El Khamra, Hyunjoo Kim, Shantenu Jha, and Manish Parashar. Exploring the performance fluctuations of HPC workloads on clouds. In Proceedings of CloudCom, Indianapolis, IN, 2010.

[38] Krzysztof C. Kiwiel. Convergence and efficiency of subgradient methods for quasiconvex minimization. In Mathematical Programming (Series A) (Berlin, Heidelberg: Springer), 90(1), pp. 1–25, 2001.

[39] Ruby B. Lee. Rethinking computers for cybersecurity. IEEE Computer, 48(4):16–25, 2015.

[40] Jacob Leverich and Christos Kozyrakis. Reconciling high server utilization and sub-millisecond quality-of-service. In Proceedings of EuroSys, Amsterdam, The Netherlands, 2014.

[41] Host server CPU utilization in Amazon EC2 cloud. http://goo.gl/2LTx4T.

[42] Fangfei Liu, Yuval Yarom, Qian Ge, Gernot Heiser, and Ruby B. Lee. Last-level cache side-channel attacks are practical. In Proc. of the IEEE Symposium on Security and Privacy (S&P), San Jose, CA, 2015.

[43] Fei Liu, Lanfang Ren, and Hongtao Bai. Mitigating cross-VM side channel attack on multiple tenants cloud platform. In Journal of Computers, Vol. 9, No. 4 (2014), 1005–1013, April 2014.

[44] David Lo, Liqun Cheng, Rama Govindaraju, Luiz Andre Barroso, and Christos Kozyrakis. Towards energy proportionality for large-scale latency-critical workloads. In Proceedings of the 41st Annual International Symposium on Computer Architecture (ISCA), Minneapolis, MN, 2014.

[45] David Lo, Liqun Cheng, Rama Govindaraju, Parthasarathy Ranganathan, and Christos Kozyrakis. Heracles: Improving resource efficiency at scale. In Proc. of the 42nd Annual International Symposium on Computer Architecture (ISCA), Portland, OR, 2015.

[46] Mahout. http://mahout.apache.org.

[47] Dave Mangot. EC2 variability: The numbers revealed. http://tech.mangot.com/roller/dave/entry/ec2_variability_the_numbers_revealed.

[48] Jason Mars and Lingjia Tang. Whare-map: Heterogeneity in "homogeneous" warehouse-scale computers. In Proceedings of ISCA, Tel-Aviv, Israel, 2013.

[49] Robert Martin, John Demme, and Simha Sethumadhavan. Timewarp: Rethinking timekeeping and performance monitoring mechanisms to mitigate side-channel attacks. In Proceedings of the International Symposium on Computer Architecture (ISCA), Portland, OR, 2012.

[50] David Meisner, Christopher M. Sadler, Luiz Andre Barroso, Wolf-Dietrich Weber, and Thomas F. Wenisch. Power management of online data-intensive services. In Proceedings of the 38th Annual International Symposium on Computer Architecture, pages 319–330, 2011.

[51] Jelena Mirkovic and Peter Reiher. A taxonomy of DDoS attack and DDoS defense mechanisms. ACM SIGCOMM Computer Communication Review (CCR), April 2004.

[52] Thomas Moscibroda and Onur Mutlu. Memory performance attacks: Denial of memory service in multi-core systems. In Proc. of the 16th USENIX Security Symposium (SS), Boston, MA, 2007.

[53] Ripal Nathuji, Aman Kansal, and Alireza Ghaffarkhah. Q-Clouds: Managing performance interference effects for QoS-aware clouds. In Proceedings of EuroSys, Paris, France, 2010.

[54] Simon Ostermann, Alexandru Iosup, Nezih Yigitbasi, Radu Prodan, Thomas Fahringer, and Dick Epema. A performance analysis of EC2 cloud computing services for scientific computing. In Lecture Notes on Cloud Computing, Volume 34, pp. 115–131, 2010.

[55] Tao Peng, Christopher Leckie, and Kotagiri Ramamohanarao. Survey of network-based defense mechanisms countering the DoS and DDoS problems. ACM Comput. Surv., 39(1), April 2007.

[56] Diego Perez-Botero, Jakub Szefer, and Ruby B. Lee. Characterizing hypervisor vulnerabilities in cloud computing servers. In Proceedings of the 2013 International Workshop on Security in Cloud Computing, SCC@ASIACCS, Hangzhou, China, 2013.

[57] Moinuddin K. Qureshi and Yale N. Patt. Utility-based cache partitioning: A low-overhead, high-performance, runtime mechanism to partition shared caches. In Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 39, 2006.

[58] Himanshu Raj, Ripal Nathuji, Abhishek Singh, and Paul England. Resource management for isolation enhanced cloud services. In Proc. of the ACM Workshop on Cloud Computing Security (CCSW), Chicago, IL, 2009.

[59] Suhail Rehman and Majd Sakr. Initial findings for provisioning variation in cloud computing. In Proceedings of CloudCom, Indianapolis, IN, 2010.

[60] Thomas Ristenpart, Eran Tromer, Hovav Shacham, and Stefan Savage. Hey, you, get off of my cloud: Exploring information leakage in third-party compute clouds. In Proc. of the ACM Conference on Computer and Communications Security (CCS), Chicago, IL, 2009.

[61] Daniel Sanchez and Christos Kozyrakis. Vantage: Scalable and Efficient Fine-Grain Cache Partitioning. In Proceedings of the 38th Annual International Symposium on Computer Architecture (ISCA-38), San Jose, CA, June 2011.

[62] Jorg Schad, Jens Dittrich, and Jorge-Arnulfo Quiane-Ruiz. Runtime measurements in the cloud: Observing, analyzing, and reducing variance. Proceedings VLDB Endow., 3(1-2):460–471, September 2010.

[63] Malte Schwarzkopf, Andy Konwinski, Michael Abd-El-Malek, and John Wilkes. Omega: flexible, scalable schedulers for large compute clusters. In Proceedings of EuroSys, Prague, Czech Republic, 2013.

[64] Alan Shieh, Srikanth Kandula, Albert Greenberg, and Changhoon Kim. Seawall: Performance isolation for cloud datacenter networks. In Proc. of the USENIX Conference on Hot Topics in Cloud Computing (HotCloud), Boston, MA, 2010.

[65] David Shue, Michael J. Freedman, and Anees Shaikh. Performance isolation and fairness for multi-tenant cloud storage. In Proc. of the 10th USENIX Conference on Operating Systems Design and Implementation (OSDI), Hollywood, CA, 2012.

[66] Dan Tsafrir, Yoav Etsion, and Dror G. Feitelson. Secretly monopolizing the CPU without superuser privileges. In Proc. of the 16th USENIX Security Symposium, Boston, MA, 2007.

[67] Venkatanathan Varadarajan, Thawan Kooburat, Benjamin Farley, Thomas Ristenpart, and Michael Swift. Resource-freeing attacks: Improve your cloud performance (at your neighbor's expense). In Proc. of the Conference on Computer and Communications Security (CCS), Raleigh, 2012.

[68] Venkatanathan Varadarajan, Thomas Ristenpart, and Michael Swift. Scheduler-based defenses against cross-VM side-channels. In Proc. of the 23rd USENIX Security Symposium, San Diego, CA, 2014.

[69] Venkatanathan Varadarajan, Yinqian Zhang, Thomas Ristenpart, and Michael Swift. A placement vulnerability study in multi-tenant public clouds. In Proc. of the 24th USENIX Security Symposium (USENIX Security), Washington, DC, 2015.

[70] Huaibin Wang, Haiyun Zhou, and Chundong Wang. Virtual machine-based intrusion detection system framework in cloud computing environment. In Journal of Computers, October 2012.

[71] Hui Wang, Canturk Isci, Lavanya Subramanian, Jongmoo Choi, Depei Qian, and Onur Mutlu. A-DRM: Architecture-aware distributed resource management of virtualized clusters. In Proceedings of the 11th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments (VEE), Istanbul, Turkey, 2015.

[72] Ian H. Witten, Eibe Frank, and Geoffrey Holmes. Data Mining: Practical Machine Learning Tools and Techniques, 3rd Edition.

[73] Zhenyu Wu, Zhang Xu, and Haining Wang. Whispers in the hyper-space: High-speed covert channel attacks in the cloud. In Proc. of the 21st USENIX Conference on Security Symposium (USENIX Security), Bellevue, WA, 2012.

[74] Yunjing Xu, Michael Bailey, Farnam Jahanian, Kaustubh Joshi, Matti Hiltunen, and Richard Schlichting. An exploration of L2 cache covert channels in virtualized environments. In Proc. of the 3rd ACM Workshop on Cloud Computing Security Workshop (CCSW), Chicago, IL, 2011.

[75] Zhang Xu Haining Wang and Zhenyu Wu A measurementstudy on co-residence threat inside the cloud In Proc ofthe 24th USENIX Security Symposium (USENIX Security)Washington DC 2015

[76] Yuval Yarom and Katrina Falkner Flush+reload a highresolution low noise l3 cache side-channel attack In Proc ofthe 23rd Usenix Security Symposium San Diego CA 2014

[77] Matei Zaharia Mosharaf Chowdhury Tathagata Das AnkurDave Justin Ma Murphy McCauly Michael J FranklinScott Shenker and Ion Stoica Resilient distributed datasetsA fault-tolerant abstraction for in-memory cluster computingIn Proceedings of NSDI San Jose CA 2012

[78] Yinqian Zhang Ari Juels Alina Oprea and Michael K Re-iter Homealone Co-residency detection in the cloud via side-channel analysis In Proc of the IEEE Symposium on Securityand Privacy Oakland CA 2011

[79] Yinqian Zhang Ari Juels Michael K Reiter and ThomasRistenpart Cross-tenant side-channel attacks in paas cloudsIn Proc of the ACM SIGSAC Conference on Computer andCommunications Security (CCS) Scottsdale AZ 2014

[80] Yinqian Zhang Ari Juels Michael K Reiter and ThomasRistenpart Cross-vm side channels and their use to ex-tract private keys In Proceedings of the ACM Conferenceon Computer and Communications Security (CCS) RaleighNC 2012

[81] Yinqian Zhang and Michael K Reiter Duppel retrofittingcommodity operating systems to mitigate cache side channelsin the cloud In Proc of the ACM Conference on Computerand Communications Security (CCS) Berlin Germany 2013

[82] Fangfei Zhou Manish Goel Peter Desnoyers and Ravi Sun-daram Scheduler vulnerabilities and coordinated attacks incloud computing J Comput Secur 21(4)533ndash559 July2013

[83] Jieming Zhu Pinjia He Zibin Zheng and Michael R LyuTowards online accurate and scalable qos prediction for run-time service adaptation In Proc of the IEEE InternationalConference on Distributed Computing Systems (ICDCS)Madrid Spain 2014

[84] Sergey Zhuravlev Sergey Blagodurov and Alexandra Fe-dorova Addressing shared resource contention in multicoreprocessors via scheduling In Proc of the Fifteenth Editionof on Architectural Support for Programming Languages andOperating Systems (ASPLOS) Pittsburgh PA 2010

  • Introduction
  • Related Work
  • Bolt
    • Threat Model
    • Application Detection
    • Challenges
    • Detection Accuracy
    • Limitations
      • Cloud User Study
      • Security Attacks
        • Internal Denial of Service Attack
        • Resource Freeing Attack
        • VM Co-residency Detection
          • Improving Security via Resource Isolation
          • Conclusions
Page 13: Bolt: I Know What You Did Last Summer In the Cloudcsl.stanford.edu/~christos/publications/2017.bolt.asplos.pdf · 2018-01-09 · Bolt: I Know What You Did Last Summer... In the Cloud

References

[1] Amazon EC2. http://aws.amazon.com/ec2.

[2] Aman Bakshi and Yogesh B. Dujodwala. Securing cloud from DDoS attacks using intrusion detection system in virtual machine. In Proc. of the Second International Conference on Communication Software and Networks (ICCSN), 2010.

[3] Luiz Barroso and Urs Hoelzle. The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines. MC Publishers, 2009.

[4] Naomi Benger, Joop van de Pol, Nigel P. Smart, and Yuval Yarom. "Ooh aah... just a little bit": A small amount of side channel can go a long way. In Proc. of the International Cryptographic Hardware and Embedded Systems Workshop (CHES), Busan, South Korea, 2014.

[5] Major Bhadauria and Sally A. McKee. An approach to resource-aware co-scheduling for CMPs. In Proc. of the 24th ACM International Conference on Supercomputing (ICS), Tsukuba, Japan, 2010.

[6] Leon Bottou. Large-scale machine learning with stochastic gradient descent. In Proceedings of the International Conference on Computational Statistics (COMPSTAT), Paris, France, 2010.

[7] Eric Brewer. Kubernetes: The path to cloud native. http://goo.gl/QgkzYB, SOCC Keynote, August 2015.

[8] Martin A. Brown. Traffic control HOWTO. http://linux-ip.net/articles/Traffic-Control-HOWTO.

[9] Robin Burke. Hybrid recommender systems: Survey and experiments. User Modeling and User-Adapted Interaction, 12(4):331–370, November 2002.

[10] Apache Cassandra. http://cassandra.apache.org.

[11] Intel® 64 and IA-32 Architecture Software Developer's Manual, vol. 3B: System Programming Guide, Part 2. September 2014.

[12] Ludmila Cherkasova, Diwaker Gupta, and Amin Vahdat. Comparison of the three CPU schedulers in Xen. SIGMETRICS Perform. Eval. Rev., 35(2):42–51, September 2007.

[13] Scott A. Crosby and Dan S. Wallach. Denial of service via algorithmic complexity attacks. In Proceedings of the 12th USENIX Security Symposium, Washington, DC, 2003.

[14] Marwan Darwish, Abdelkader Ouda, and Luiz Fernando Capretz. Cloud-based DDoS attacks and defenses. In Proc. of i-Society, Toronto, ON, 2013.

[15] Christina Delimitrou and Christos Kozyrakis. iBench: Quantifying Interference for Datacenter Workloads. In Proceedings of the 2013 IEEE International Symposium on Workload Characterization (IISWC), Portland, OR, September 2013.

[16] Christina Delimitrou and Christos Kozyrakis. Paragon: QoS-Aware Scheduling for Heterogeneous Datacenters. In Proceedings of the Eighteenth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), Houston, TX, USA, 2013.

[17] Christina Delimitrou and Christos Kozyrakis. QoS-Aware Scheduling in Heterogeneous Datacenters with Paragon. In ACM Transactions on Computer Systems (TOCS), Vol. 31, Issue 4, December 2013.

[18] Christina Delimitrou and Christos Kozyrakis. Quality-of-Service-Aware Scheduling in Heterogeneous Datacenters with Paragon. In IEEE Micro Special Issue on Top Picks from the Computer Architecture Conferences, May/June 2014.

[19] Christina Delimitrou and Christos Kozyrakis. Quasar: Resource-Efficient and QoS-Aware Cluster Management. In Proceedings of the Nineteenth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), Salt Lake City, UT, USA, 2014.

[20] Christina Delimitrou and Christos Kozyrakis. HCloud: Resource-Efficient Provisioning in Shared Cloud Systems. In Proceedings of the Twenty-First International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), April 2016.

[21] Christina Delimitrou, Daniel Sanchez, and Christos Kozyrakis. Tarcil: Reconciling Scheduling Speed and Quality in Large Shared Clusters. In Proceedings of the Sixth ACM Symposium on Cloud Computing (SOCC), August 2015.

[22] Jake Edge. Denial of service via hash collisions. http://lwn.net/Articles/474912, January 2012.

[23] Benjamin Farley, Ari Juels, Venkatanathan Varadarajan, Thomas Ristenpart, Kevin D. Bowers, and Michael M. Swift. More for your money: Exploiting performance heterogeneity in public clouds. In Proc. of the ACM Symposium on Cloud Computing (SOCC), San Jose, CA, 2012.

[24] Alexandra Fedorova, Margo Seltzer, and Michael D. Smith. Improving performance isolation on chip multiprocessors via an operating system scheduler. In Proceedings of the 16th Intl. Conference on Parallel Architecture and Compilation Techniques (PACT), Brasov, Romania, 2007.

[25] Alexander Felfernig and Robin Burke. Constraint-based recommender systems: Technologies and research issues. In Proceedings of the ACM International Conference on Electronic Commerce (ICEC), Innsbruck, Austria, 2008.

[26] Brad Fitzpatrick. Distributed caching with memcached. In Linux Journal, Volume 2004, Issue 124, 2004.

[27] Oded Goldreich and Rafail Ostrovsky. Software protection and simulation on oblivious RAMs. J. ACM, 43(3):431–473, May 1996.

[28] Google Container Engine. https://cloud.google.com/container-engine.

[29] Asela Gunawardana and Christopher Meek. A unified approach to building hybrid recommender systems. In Proc. of the Third ACM Conference on Recommender Systems (RecSys), New York, NY, 2009.

[30] Sanchika Gupta and Padam Kumar. VM profile based optimized network attack pattern detection scheme for DDoS attacks in cloud. In Proc. of SSCC, Mysore, India, 2013.

[31] Yi Han, Tansu Alpcan, Jeffrey Chan, and Christopher Leckie. Security games for virtual machine allocation in cloud computing. In 4th International Conference on Decision and Game Theory for Security, Fort Worth, TX, 2013.

[32] Amir Herzberg, Haya Shulman, Johanna Ullrich, and Edgar Weippl. Cloudoscopy: Services discovery and topology mapping. In Proceedings of the ACM Workshop on Cloud Computing Security Workshop (CCSW), Berlin, Germany, 2013.

[33] Ben Hindman, Andy Konwinski, Matei Zaharia, Ali Ghodsi, Anthony D. Joseph, Randy Katz, Scott Shenker, and Ion Stoica. Mesos: A platform for fine-grained resource sharing in the data center. In Proceedings of NSDI, Boston, MA, 2011.

[34] Jingwei Huang, David M. Nicol, and Roy H. Campbell. Denial-of-service threat to Hadoop/YARN clusters with multi-tenancy. In Proc. of the IEEE International Congress on Big Data, Washington, DC, 2014.

[35] Alexandru Iosup, Nezih Yigitbasi, and Dick Epema. On the performance variability of production cloud services. In Proceedings of CCGRID, Newport Beach, CA, 2011.

[36] Vimalkumar Jeyakumar, Mohammad Alizadeh, David Mazieres, Balaji Prabhakar, Changhoon Kim, and Albert Greenberg. EyeQ: Practical network performance isolation at the edge. In Proc. of the 10th USENIX Conference on Networked Systems Design and Implementation (NSDI), Lombard, IL, 2013.

[37] Yaakoub El Khamra, Hyunjoo Kim, Shantenu Jha, and Manish Parashar. Exploring the performance fluctuations of HPC workloads on clouds. In Proceedings of CloudCom, Indianapolis, IN, 2010.

[38] Krzysztof C. Kiwiel. Convergence and efficiency of subgradient methods for quasiconvex minimization. In Mathematical Programming (Series A) (Berlin, Heidelberg: Springer), 90(1), pp. 1–25, 2001.

[39] Ruby B. Lee. Rethinking computers for cybersecurity. IEEE Computer, 48(4):16–25, 2015.

[40] Jacob Leverich and Christos Kozyrakis. Reconciling high server utilization and sub-millisecond quality-of-service. In Proceedings of EuroSys, Amsterdam, The Netherlands, 2014.

[41] Host server CPU utilization in Amazon EC2 cloud. http://goo.gl/2LTx4T.

[42] Fangfei Liu, Yuval Yarom, Qian Ge, Gernot Heiser, and Ruby B. Lee. Last-level cache side-channel attacks are practical. In Proc. of the IEEE Symposium on Security and Privacy (S&P), San Jose, CA, 2015.

[43] Fei Liu, Lanfang Ren, and Hongtao Bai. Mitigating cross-VM side channel attack on multiple tenants cloud platform. In Journal of Computers, Vol. 9, No. 4, pp. 1005–1013, April 2014.

[44] David Lo, Liqun Cheng, Rama Govindaraju, Luiz Andre Barroso, and Christos Kozyrakis. Towards energy proportionality for large-scale latency-critical workloads. In Proceedings of the 41st Annual International Symposium on Computer Architecture (ISCA), Minneapolis, MN, 2014.

[45] David Lo, Liqun Cheng, Rama Govindaraju, Parthasarathy Ranganathan, and Christos Kozyrakis. Heracles: Improving resource efficiency at scale. In Proc. of the 42nd Annual International Symposium on Computer Architecture (ISCA), Portland, OR, 2015.

[46] Mahout. http://mahout.apache.org.

[47] Dave Mangot. EC2 variability: The numbers revealed. http://tech.mangot.com/roller/dave/entry/ec2_variability_the_numbers_revealed.

[48] Jason Mars and Lingjia Tang. Whare-map: Heterogeneity in "homogeneous" warehouse-scale computers. In Proceedings of ISCA, Tel-Aviv, Israel, 2013.

[49] Robert Martin, John Demme, and Simha Sethumadhavan. TimeWarp: Rethinking timekeeping and performance monitoring mechanisms to mitigate side-channel attacks. In Proceedings of the International Symposium on Computer Architecture (ISCA), Portland, OR, 2012.

[50] David Meisner, Christopher M. Sadler, Luiz Andre Barroso, Wolf-Dietrich Weber, and Thomas F. Wenisch. Power management of online data-intensive services. In Proceedings of the 38th Annual International Symposium on Computer Architecture, pages 319–330, 2011.

[51] Jelena Mirkovic and Peter Reiher. A taxonomy of DDoS attack and DDoS defense mechanisms. ACM SIGCOMM Computer Communication Review (CCR), April 2004.

[52] Thomas Moscibroda and Onur Mutlu. Memory performance attacks: Denial of memory service in multi-core systems. In Proc. of the 16th USENIX Security Symposium (SS), Boston, MA, 2007.

[53] Ripal Nathuji, Aman Kansal, and Alireza Ghaffarkhah. Q-Clouds: Managing performance interference effects for QoS-aware clouds. In Proceedings of EuroSys, Paris, France, 2010.

[54] Simon Ostermann, Alexandru Iosup, Nezih Yigitbasi, Radu Prodan, Thomas Fahringer, and Dick Epema. A performance analysis of EC2 cloud computing services for scientific computing. In Lecture Notes on Cloud Computing, Volume 34, pp. 115–131, 2010.

[55] Tao Peng, Christopher Leckie, and Kotagiri Ramamohanarao. Survey of network-based defense mechanisms countering the DoS and DDoS problems. ACM Comput. Surv., 39(1), April 2007.

[56] Diego Perez-Botero, Jakub Szefer, and Ruby B. Lee. Characterizing hypervisor vulnerabilities in cloud computing servers. In Proceedings of the 2013 International Workshop on Security in Cloud Computing (SCC@ASIACCS), Hangzhou, China, 2013.

[57] Moinuddin K. Qureshi and Yale N. Patt. Utility-based cache partitioning: A low-overhead, high-performance, runtime mechanism to partition shared caches. In Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 39, 2006.

[58] Himanshu Raj, Ripal Nathuji, Abhishek Singh, and Paul England. Resource management for isolation enhanced cloud services. In Proc. of the ACM Workshop on Cloud Computing Security (CCSW), Chicago, IL, 2009.

[59] Suhail Rehman and Majd Sakr. Initial findings for provisioning variation in cloud computing. In Proceedings of CloudCom, Indianapolis, IN, 2010.

[60] Thomas Ristenpart, Eran Tromer, Hovav Shacham, and Stefan Savage. Hey, you, get off of my cloud: Exploring information leakage in third-party compute clouds. In Proc. of the ACM Conference on Computer and Communications Security (CCS), Chicago, IL, 2009.

[61] Daniel Sanchez and Christos Kozyrakis. Vantage: Scalable and Efficient Fine-Grain Cache Partitioning. In Proceedings of the 38th Annual International Symposium on Computer Architecture (ISCA-38), San Jose, CA, June 2011.

[62] Jorg Schad, Jens Dittrich, and Jorge-Arnulfo Quiane-Ruiz. Runtime measurements in the cloud: Observing, analyzing, and reducing variance. Proceedings VLDB Endow., 3(1-2):460–471, September 2010.

[63] Malte Schwarzkopf, Andy Konwinski, Michael Abd-El-Malek, and John Wilkes. Omega: Flexible, scalable schedulers for large compute clusters. In Proceedings of EuroSys, Prague, Czech Republic, 2013.

[64] Alan Shieh, Srikanth Kandula, Albert Greenberg, and Changhoon Kim. Seawall: Performance isolation for cloud datacenter networks. In Proc. of the USENIX Conference on Hot Topics in Cloud Computing (HotCloud), Boston, MA, 2010.

[65] David Shue, Michael J. Freedman, and Anees Shaikh. Performance isolation and fairness for multi-tenant cloud storage. In Proc. of the 10th USENIX Conference on Operating Systems Design and Implementation (OSDI), Hollywood, CA, 2012.

[66] Dan Tsafrir, Yoav Etsion, and Dror G. Feitelson. Secretly monopolizing the CPU without superuser privileges. In Proc. of the 16th USENIX Security Symposium, Boston, MA, 2007.

[67] Venkatanathan Varadarajan, Thawan Kooburat, Benjamin Farley, Thomas Ristenpart, and Michael Swift. Resource-freeing attacks: Improve your cloud performance (at your neighbor's expense). In Proc. of the Conference on Computer and Communications Security (CCS), Raleigh, NC, 2012.

[68] Venkatanathan Varadarajan, Thomas Ristenpart, and Michael Swift. Scheduler-based defenses against cross-VM side-channels. In Proc. of the 23rd USENIX Security Symposium, San Diego, CA, 2014.

[69] Venkatanathan Varadarajan, Yinqian Zhang, Thomas Ristenpart, and Michael Swift. A placement vulnerability study in multi-tenant public clouds. In Proc. of the 24th USENIX Security Symposium (USENIX Security), Washington, DC, 2015.

[70] Huaibin Wang, Haiyun Zhou, and Chundong Wang. Virtual machine-based intrusion detection system framework in cloud computing environment. In Journal of Computers, October 2012.

[71] Hui Wang, Canturk Isci, Lavanya Subramanian, Jongmoo Choi, Depei Qian, and Onur Mutlu. A-DRM: Architecture-aware distributed resource management of virtualized clusters. In Proceedings of the 11th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments (VEE), Istanbul, Turkey, 2015.

[72] Ian H. Witten, Eibe Frank, and Geoffrey Holmes. Data Mining: Practical Machine Learning Tools and Techniques. 3rd Edition.

[73] Zhenyu Wu, Zhang Xu, and Haining Wang. Whispers in the hyper-space: High-speed covert channel attacks in the cloud. In Proc. of the 21st USENIX Conference on Security Symposium (USENIX Security), Bellevue, WA, 2012.

[74] Yunjing Xu, Michael Bailey, Farnam Jahanian, Kaustubh Joshi, Matti Hiltunen, and Richard Schlichting. An exploration of L2 cache covert channels in virtualized environments. In Proc. of the 3rd ACM Workshop on Cloud Computing Security Workshop (CCSW), Chicago, IL, 2011.

[75] Zhang Xu, Haining Wang, and Zhenyu Wu. A measurement study on co-residence threat inside the cloud. In Proc. of the 24th USENIX Security Symposium (USENIX Security), Washington, DC, 2015.

[76] Yuval Yarom and Katrina Falkner. Flush+Reload: A high resolution, low noise, L3 cache side-channel attack. In Proc. of the 23rd USENIX Security Symposium, San Diego, CA, 2014.

[77] Matei Zaharia, Mosharaf Chowdhury, Tathagata Das, Ankur Dave, Justin Ma, Murphy McCauly, Michael J. Franklin, Scott Shenker, and Ion Stoica. Resilient distributed datasets: A fault-tolerant abstraction for in-memory cluster computing. In Proceedings of NSDI, San Jose, CA, 2012.

[78] Yinqian Zhang, Ari Juels, Alina Oprea, and Michael K. Reiter. HomeAlone: Co-residency detection in the cloud via side-channel analysis. In Proc. of the IEEE Symposium on Security and Privacy, Oakland, CA, 2011.

[79] Yinqian Zhang, Ari Juels, Michael K. Reiter, and Thomas Ristenpart. Cross-tenant side-channel attacks in PaaS clouds. In Proc. of the ACM SIGSAC Conference on Computer and Communications Security (CCS), Scottsdale, AZ, 2014.

[80] Yinqian Zhang, Ari Juels, Michael K. Reiter, and Thomas Ristenpart. Cross-VM side channels and their use to extract private keys. In Proceedings of the ACM Conference on Computer and Communications Security (CCS), Raleigh, NC, 2012.

[81] Yinqian Zhang and Michael K. Reiter. Düppel: Retrofitting commodity operating systems to mitigate cache side channels in the cloud. In Proc. of the ACM Conference on Computer and Communications Security (CCS), Berlin, Germany, 2013.

[82] Fangfei Zhou, Manish Goel, Peter Desnoyers, and Ravi Sundaram. Scheduler vulnerabilities and coordinated attacks in cloud computing. J. Comput. Secur., 21(4):533–559, July 2013.

[83] Jieming Zhu, Pinjia He, Zibin Zheng, and Michael R. Lyu. Towards online, accurate, and scalable QoS prediction for runtime service adaptation. In Proc. of the IEEE International Conference on Distributed Computing Systems (ICDCS), Madrid, Spain, 2014.

[84] Sergey Zhuravlev, Sergey Blagodurov, and Alexandra Fedorova. Addressing shared resource contention in multicore processors via scheduling. In Proc. of the Fifteenth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), Pittsburgh, PA, 2010.

  • Introduction
  • Related Work
  • Bolt
    • Threat Model
    • Application Detection
    • Challenges
    • Detection Accuracy
    • Limitations
      • Cloud User Study
      • Security Attacks
        • Internal Denial of Service Attack
        • Resource Freeing Attack
        • VM Co-residency Detection
          • Improving Security via Resource Isolation
          • Conclusions
Page 14: Bolt: I Know What You Did Last Summer In the Cloudcsl.stanford.edu/~christos/publications/2017.bolt.asplos.pdf · 2018-01-09 · Bolt: I Know What You Did Last Summer... In the Cloud

[31] Yi Han Tansu Alpcan Jeffrey Chan and Christopher LeckieSecurity games for virtual machine allocation in cloud com-puting In 4th International Conference on Decision andGame Theory for Security Fort Worth TX 2013

[32] Amir Herzberg Haya Shulman Johanna Ullrich and EdgarWeippl Cloudoscopy Services discovery and topology map-ping In Proceedings of the ACM Workshop on Cloud Com-puting Security Workshop (CCSW) Berlin Germany 2013

[33] Ben Hindman Andy Konwinski Matei Zaharia Ali GhodsiAnthony D Joseph Randy Katz Scott Shenker and Ion Sto-ica Mesos A platform for fine-grained resource sharing inthe data center In Proceedings of NSDI Boston MA 2011

[34] Jingwei Huang David M Nicol and Roy H CampbellDenial-of-service threat to hadoopyarn clusters with multi-tenancy In Proc of the IEEE International Congress on BigData Washington DC 2014

[35] Alexandru Iosup Nezih Yigitbasi and Dick Epema Onthe performance variability of production cloud services InProceedings of CCGRID Newport Beach CA 2011

[36] Vimalkumar Jeyakumar Mohammad Alizadeh David MazieresBalaji Prabhakar Changhoon Kim and Albert GreenbergEyeq Practical network performance isolation at the edge InProc of the 10th USENIX Conference on Networked SystemsDesign and Implementation (NSDI) Lombard IL 2013

[37] Yaakoub El Khamra Hyunjoo Kim Shantenu Jha and Man-ish Parashar Exploring the performance fluctuations of hpcworkloads on clouds In Proceedings of CloudCom Indi-anapolis IN 2010

[38] Krzysztof C Kiwiel Convergence and efficiency of subgra-dient methods for quasiconvex minimization In Mathemat-ical Programming (Series A) (Berlin Heidelberg Springer)90 (1) pp 1-25 2001

[39] Ruby B Lee Rethinking computers for cybersecurity IEEEComputer 48(4)16ndash25 2015

[40] Jacob Leverich and Christos Kozyrakis Reconciling highserver utilization and sub-millisecond quality-of-service InProceedings of EuroSys Amsterdam The Netherlands 2014

[41] Host server cpu utilization in amazon ec2 cloud httpgoogl2LTx4T

[42] Fangfei Liu Yuval Yarom Qian Ge Gernot Heiser andRuby B Lee Last-level cache side-channel attacks are prac-tical In Proc of IEEE Symposium on Security and Privacy(SampP) San Jose CA 2015

[43] Fei Liu Lanfang Ren and Hongtao Bai Mitigating cross-vm side channel attack on multiple tenants cloud platform InJournal of Computers Vol 9 No 4 (2014) 1005-1013 April2014

[44] David Lo Liqun Cheng Rama Govindaraju Luiz Andre Bar-roso and Christos Kozyrakis Towards energy proportionalityfor large-scale latency-critical workloads In Proceedings ofthe 41st Annual International Symposium on Computer Ar-chitecuture (ISCA) Minneapolis MN 2014

[45] David Lo Liqun Cheng Rama Govindaraju ParthasarathyRanganathan and Christos Kozyrakis Heracles Improvingresource efficiency at scale In Proc of the 42Nd AnnualInternational Symposium on Computer Architecture (ISCA)Portland OR 2015

[46] Mahout httpmahoutapacheorg

[47] Dave Mangot Ec2 variability The numbers revealed httptechmangotcomrollerdaveentryec2_variability_the_numbers_revealed

[48] Jason Mars and Lingjia Tang Whare-map heterogeneity inrdquohomogeneousrdquo warehouse-scale computers In Proceedingsof ISCA Tel-Aviv Israel 2013

[49] Robert Martin John Demme and Simha SethumadhavanTimewarp Rethinking timekeeping and performance moni-toring mechanisms to mitigate side-channel attacks In Pro-ceedings of the International Symposium on Computer Archi-tecture (ISCA) Portland OR 2012

[50] David Meisner Christopher M Sadler Luiz Andre BarrosoWolf-Dietrich Weber and Thomas F Wenisch Power man-agement of online data-intensive services In Proceedings ofthe 38th annual international symposium on Computer archi-tecture pages 319ndash330 2011

[51] Jelena Mirkovic and Peter Reiher A taxonomy of ddos attackand ddos defense mechanisms ACM SIGCOMM ComputerCommunication Review (CCR) April 2004

[52] Thomas Moscibroda and Onur Mutlu Memory performanceattacks Denial of memory service in multi-core systemsIn Proc of 16th USENIX Security Symposium on USENIXSecurity Symposium (SS) Boston MA 2007

[53] Ripal Nathuji Aman Kansal and Alireza Ghaffarkhah Q-clouds Managing performance interference effects for qos-aware clouds In Proceedings of EuroSys ParisFrance 2010

[54] Simon Ostermann Alexandru Iosup Nezih Yigitbasi RaduProdan Thomas Fahringer and Dick Epema A performanceanalysis of ec2 cloud computing services for scientific com-puting In Lecture Notes on Cloud Computing Volume 34p115-131 2010

[55] Tao Peng Christopher Leckie and Kotagiri RamamohanaraoSurvey of network-based defense mechanisms countering thedos and ddos problems ACM Comput Surv 39(1) April2007

[56] Diego Perez-Botero Jakub Szefer and Ruby B Lee Char-acterizing hypervisor vulnerabilities in cloud computingservers In Proceedings of the 2013 International Work-shop on Security in Cloud Computing SCCASIACCSHangzhou China 2013

[57] Moinuddin K Qureshi and Yale N Patt Utility-based cachepartitioning A low-overhead high-performance runtimemechanism to partition shared caches In Proceedings ofthe 39th Annual IEEEACM International Symposium on Mi-croarchitecture MICRO 39 2006

[58] Himanshu Raj Ripal Nathuji Abhishek Singh and Paul Eng-land Resource management for isolation enhanced cloud ser-vices In Proc of the ACM Workshop on Cloud ComputingSecurity (CCSW) Chicago IL 2009

[59] Suhail Rehman and Majd Sakr Initial findings for provision-ing variation in cloud computing In Proceedings of Cloud-Com Indianapolis IN 2010

[60] Thomas Ristenpart Eran Tromer Hovav Shacham and Ste-fan Savage Hey you get off of my cloud Exploring infor-mation leakage in third-party compute clouds In Proc of theACM Conference on Computer and Communications Security(CCS) Chicago IL 2009

[61] Daniel Sanchez and Christos Kozyrakis Vantage Scalableand Efficient Fine-Grain Cache Partitioning In Proceedingsof the 38th annual International Symposium in ComputerArchitecture (ISCA-38) San Jose CA June 2011

[62] Jorg Schad Jens Dittrich and Jorge-Arnulfo Quiane-RuizRuntime measurements in the cloud Observing analyzingand reducing variance Proceedings VLDB Endow 3(1-2)460ndash471 September 2010

[63] Malte Schwarzkopf Andy Konwinski Michael Abd-El-Malek and John Wilkes Omega flexible scalable sched-ulers for large compute clusters In Proceedings of EuroSysPrague Czech Republic 2013

[64] Alan Shieh Srikanth Kandula Albert Greenberg and ChanghoonKim Seawall Performance isolation for cloud datacenternetworks In Proc of the USENIX Conference on Hot Topicsin Cloud Computing (HotCloud) Boston MA 2010

[65] David Shue Michael J Freedman and Anees Shaikh Perfor-mance isolation and fairness for multi-tenant cloud storage InProc of the 10th USENIX Conference on Operating SystemsDesign and Implementation (OSDI) Hollywood CA 2012

[66] Dan Tsafrir Yoav Etsion and Dror G Feitelson Secretlymonopolizing the cpu without superuser privileges In Procof 16th USENIX Security Symposium on USENIX SecuritySymposium Boston MA 2007

[67] Venkatanathan Varadarajan Thawan Kooburat BenjaminFarley Thomas Ristenpart and Michael Swift Resource-freeing attacks Improve your cloud performance (at yourneighborrsquos expense) In Proc of the Conference on Computerand Communications Security (CCS) Raleigh 2012

[68] Venkatanathan Varadarajan Thomas Ristenpart and MichaelSwift Scheduler-based defenses against cross-vm side-channels In Proc of the 23rd Usenix Security SymposiumSan Diego CA 2014

[69] Venkatanathan Varadarajan Yinqian Zhang Thomas Risten-part and Michael Swift A placement vulnerability study inmulti-tenant public clouds In Proc of the 24th USENIX Secu-rity Symposium (USENIX Security) Washington DC 2015

[70] Huaibin Wang Haiyun Zhou and Chundong Wang Vir-tual machine-based intrusion detection system framework incloud computing environment In Journal of Computers Oc-tober 2012

[71] Hui Wang Canturk Isci Lavanya Subramanian JongmooChoi Depei Qian and Onur Mutlu A-drm Architecture-aware distributed resource management of virtualized clus-ters In Proceedings of the 11th ACM SIGPLANSIGOPSinternational conference on Virtual Execution Environments(VEE) Istanbul Turkey 2015

[72] Ian H Witten Eibe Frank and Geoffrey Holmes DataMining Practical Machine Learning Tools and Techniques3rd Edition

[73] Zhenyu Wu Zhang Xu and Haining Wang Whispers inthe hyper-space High-speed covert channel attacks in thecloud In Proc of the 21st USENIX Conference on SecuritySymposium (USENIX Security) Bellevue WA 2012

[59] Suhail Rehman and Majd Sakr. Initial findings for provisioning variation in cloud computing. In Proceedings of CloudCom, Indianapolis, IN, 2010.

[60] Thomas Ristenpart, Eran Tromer, Hovav Shacham, and Stefan Savage. Hey, you, get off of my cloud: Exploring information leakage in third-party compute clouds. In Proc. of the ACM Conference on Computer and Communications Security (CCS), Chicago, IL, 2009.

[61] Daniel Sanchez and Christos Kozyrakis. Vantage: Scalable and efficient fine-grain cache partitioning. In Proceedings of the 38th Annual International Symposium on Computer Architecture (ISCA-38), San Jose, CA, June 2011.

[62] Jorg Schad, Jens Dittrich, and Jorge-Arnulfo Quiane-Ruiz. Runtime measurements in the cloud: Observing, analyzing, and reducing variance. Proceedings of the VLDB Endowment, 3(1-2):460–471, September 2010.

[63] Malte Schwarzkopf, Andy Konwinski, Michael Abd-El-Malek, and John Wilkes. Omega: Flexible, scalable schedulers for large compute clusters. In Proceedings of EuroSys, Prague, Czech Republic, 2013.

[64] Alan Shieh, Srikanth Kandula, Albert Greenberg, and Changhoon Kim. Seawall: Performance isolation for cloud datacenter networks. In Proc. of the USENIX Conference on Hot Topics in Cloud Computing (HotCloud), Boston, MA, 2010.

[65] David Shue, Michael J. Freedman, and Anees Shaikh. Performance isolation and fairness for multi-tenant cloud storage. In Proc. of the 10th USENIX Conference on Operating Systems Design and Implementation (OSDI), Hollywood, CA, 2012.

[66] Dan Tsafrir, Yoav Etsion, and Dror G. Feitelson. Secretly monopolizing the CPU without superuser privileges. In Proc. of the 16th USENIX Security Symposium, Boston, MA, 2007.

[67] Venkatanathan Varadarajan, Thawan Kooburat, Benjamin Farley, Thomas Ristenpart, and Michael Swift. Resource-freeing attacks: Improve your cloud performance (at your neighbor's expense). In Proc. of the Conference on Computer and Communications Security (CCS), Raleigh, NC, 2012.

[68] Venkatanathan Varadarajan, Thomas Ristenpart, and Michael Swift. Scheduler-based defenses against cross-VM side-channels. In Proc. of the 23rd USENIX Security Symposium, San Diego, CA, 2014.

[69] Venkatanathan Varadarajan, Yinqian Zhang, Thomas Ristenpart, and Michael Swift. A placement vulnerability study in multi-tenant public clouds. In Proc. of the 24th USENIX Security Symposium (USENIX Security), Washington, DC, 2015.

[70] Huaibin Wang, Haiyun Zhou, and Chundong Wang. Virtual machine-based intrusion detection system framework in cloud computing environment. In Journal of Computers, October 2012.

[71] Hui Wang, Canturk Isci, Lavanya Subramanian, Jongmoo Choi, Depei Qian, and Onur Mutlu. A-DRM: Architecture-aware distributed resource management of virtualized clusters. In Proceedings of the 11th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments (VEE), Istanbul, Turkey, 2015.

[72] Ian H. Witten, Eibe Frank, and Geoffrey Holmes. Data Mining: Practical Machine Learning Tools and Techniques, 3rd Edition.

[73] Zhenyu Wu, Zhang Xu, and Haining Wang. Whispers in the hyper-space: High-speed covert channel attacks in the cloud. In Proc. of the 21st USENIX Conference on Security Symposium (USENIX Security), Bellevue, WA, 2012.

[74] Yunjing Xu, Michael Bailey, Farnam Jahanian, Kaustubh Joshi, Matti Hiltunen, and Richard Schlichting. An exploration of L2 cache covert channels in virtualized environments. In Proc. of the 3rd ACM Workshop on Cloud Computing Security Workshop (CCSW), Chicago, IL, 2011.

[75] Zhang Xu, Haining Wang, and Zhenyu Wu. A measurement study on co-residence threat inside the cloud. In Proc. of the 24th USENIX Security Symposium (USENIX Security), Washington, DC, 2015.

[76] Yuval Yarom and Katrina Falkner. Flush+Reload: A high resolution, low noise, L3 cache side-channel attack. In Proc. of the 23rd USENIX Security Symposium, San Diego, CA, 2014.

[77] Matei Zaharia, Mosharaf Chowdhury, Tathagata Das, Ankur Dave, Justin Ma, Murphy McCauly, Michael J. Franklin, Scott Shenker, and Ion Stoica. Resilient distributed datasets: A fault-tolerant abstraction for in-memory cluster computing. In Proceedings of NSDI, San Jose, CA, 2012.

[78] Yinqian Zhang, Ari Juels, Alina Oprea, and Michael K. Reiter. HomeAlone: Co-residency detection in the cloud via side-channel analysis. In Proc. of the IEEE Symposium on Security and Privacy, Oakland, CA, 2011.

[79] Yinqian Zhang, Ari Juels, Michael K. Reiter, and Thomas Ristenpart. Cross-tenant side-channel attacks in PaaS clouds. In Proc. of the ACM SIGSAC Conference on Computer and Communications Security (CCS), Scottsdale, AZ, 2014.

[80] Yinqian Zhang, Ari Juels, Michael K. Reiter, and Thomas Ristenpart. Cross-VM side channels and their use to extract private keys. In Proceedings of the ACM Conference on Computer and Communications Security (CCS), Raleigh, NC, 2012.

[81] Yinqian Zhang and Michael K. Reiter. Düppel: Retrofitting commodity operating systems to mitigate cache side channels in the cloud. In Proc. of the ACM Conference on Computer and Communications Security (CCS), Berlin, Germany, 2013.

[82] Fangfei Zhou, Manish Goel, Peter Desnoyers, and Ravi Sundaram. Scheduler vulnerabilities and coordinated attacks in cloud computing. Journal of Computer Security, 21(4):533–559, July 2013.

[83] Jieming Zhu, Pinjia He, Zibin Zheng, and Michael R. Lyu. Towards online, accurate, and scalable QoS prediction for runtime service adaptation. In Proc. of the IEEE International Conference on Distributed Computing Systems (ICDCS), Madrid, Spain, 2014.

[84] Sergey Zhuravlev, Sergey Blagodurov, and Alexandra Fedorova. Addressing shared resource contention in multicore processors via scheduling. In Proc. of the Fifteenth Edition of the International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), Pittsburgh, PA, 2010.