

This paper is included in the Proceedings of the 24th USENIX Security Symposium, August 12–14, 2015, Washington, D.C. ISBN 978-1-931971-232. Open access to the Proceedings of the 24th USENIX Security Symposium is sponsored by USENIX.


https://www.usenix.org/conference/usenixsecurity15/technical-sessions/presentation/dinh


M2R: Enabling Stronger Privacy in MapReduce Computation

Tien Tuan Anh Dinh, Prateek Saxena, Ee-Chien Chang, Beng Chin Ooi, Chunwang Zhang
School of Computing, National University of Singapore

[email protected], [email protected], [email protected], [email protected], [email protected]

Abstract

New big-data analysis platforms can enable distributed computation on encrypted data by utilizing trusted computing primitives available in commodity server hardware. We study techniques for ensuring privacy-preserving computation in the popular MapReduce framework. In this paper, we first show that protecting only individual units of distributed computation (e.g. map and reduce units), as proposed in recent works, leaves several important channels of information leakage exposed to the adversary. Next, we analyze a variety of design choices in achieving a stronger notion of private execution that is the analogue of using a distributed oblivious-RAM (ORAM) across the platform. We develop a simple solution which avoids using the expensive ORAM construction, and incurs only an additive logarithmic factor of overhead to the latency. We implement our solution in a system called M2R, which enhances an existing Hadoop implementation, and evaluate it on seven standard MapReduce benchmarks. We show that it is easy to port most existing applications to M2R by changing fewer than 43 lines of code. M2R adds fewer than 500 lines of code to the TCB, which is less than 0.16% of the Hadoop codebase. M2R offers a factor of 1.3× to 44.6× lower overhead than extensions of previous solutions with equivalent privacy. M2R adds a total of 17% to 130% overhead over the insecure baseline solution that ignores the leakage channels M2R addresses.

1 Introduction

The threat of data theft in public and private clouds from insiders (e.g. curious administrators) is a serious concern. Encrypting data on the cloud storage is one standard technique which allows users to protect their sensitive data from such insider threats. However, once the data is encrypted, enabling computation on it poses a significant challenge. To enable privacy-preserving computation, a range of security primitives have surfaced recently, including trusted computing support for hardware-isolated computation [2, 5, 38, 40] as well as purely cryptographic techniques [20, 21, 47]. These primitives show promising ways for running computation securely on a single device running an untrusted software stack. For instance, trusted computing primitives can isolate units of computation on an untrusted cloud server. In this approach, the hardware provides a confidential and integrity-protected execution environment to which encryption keys can be made available for decrypting the data before computing on it. Previous works have successfully demonstrated how to securely execute a unit of user-defined computation on an untrusted cloud node, using support from hardware primitives available in commodity CPUs [8, 14, 38, 39, 49].

In this paper, we study the problem of enabling privacy-preserving distributed computation on an untrusted cloud. A sensitive distributed computation task comprises many units of computation which are scheduled to run on a multi-node cluster (or cloud). The input and output data between units of computation are sent over channels controlled by the cloud provisioning system, which may be compromised. We assume that each computation node in the cluster is equipped with a CPU that supports trusted computing primitives (for example, TPMs or Intel SGX). Our goal is to enable a privacy-preserving execution of a distributed computation task. Consequently, we focus on designing privacy in the popular MapReduce framework [17]. However, our techniques can be applied to other distributed dataflow frameworks such as Spark [62], Dryad [26], and epiC [27].

Problem. A MapReduce computation consists of two types of units of computation, namely map and reduce, each of which takes key-value tuples as input. The MapReduce provisioning platform, for example Hadoop [1], is responsible for scheduling the map/reduce operations for execution in a cluster and for providing a data channel between them [31]. We aim to achieve a strong level of security in the distributed execution of a MapReduce task (or job) — that is, the adversary learns nothing beyond the execution time and the number of inputs and outputs of each computation unit. If we view each unit of computation as one atomic operation of a larger distributed program, the execution can be thought of as running a set of operations on data values passed via a data channel (or a global “RAM”) under the adversary’s control. That is, our definition of privacy is analogous to the strong level of privacy offered by the well-known oblivious RAM protocol in the monolithic processor case [22].

We assume that the MapReduce provisioning platform is compromised, say running malware on all nodes in the cluster. Our starting point in developing a defense is a baseline system which runs each unit of computation (map/reduce instance) in a hardware-isolated process, as proposed in recent systems [49, 59]. Inputs and outputs of each computation unit are encrypted, thus the adversary observes only encrypted data. While this baseline offers a good starting point, merely encrypting data in transit between units of computation is not sufficient (see Section 3). For instance, the adversary can observe the pattern of data reads/writes between units. As another example, the adversary can learn the synchronization between map and reduce units due to the scheduling structure of the provisioning platform. Further, the adversary has the ability to duplicate computation, or tamper with the routing of encrypted data to observe variations in the execution of the program.

Challenges. There are several challenges in building a practical system that achieves our model of privacy. First, to execute map or reduce operations on a single computation node, one could run all computation units — including the entire MapReduce platform — in an execution environment that is protected by use of existing trusted computing primitives. However, such a solution would entail little trust given the large TCB, besides being unwieldy to implement. For instance, a standard implementation of the Hadoop stack is over 190K lines of code. The scope of exploit from vulnerabilities in such a TCB is large. Therefore, the first challenge is to enable practical privacy by minimizing the increase in platform TCB and without requiring any algorithmic changes to the original application.

The second challenge is in balancing the needs of privacy and performance. Addressing the leakage channels discussed above using generic methods easily yields a solution with poor practical efficiency. For instance, hiding data read/write patterns between specific map and reduce operations could be achieved by a generic oblivious RAM (ORAM) solution [22, 55]. However, such a solution would introduce a slowdown polylogarithmic in the size of the intermediate data exchanged, which could degrade performance by over 100× when gigabytes of data are processed.

Our Approach. We make two observations that enable us to achieve our model of privacy in a MapReduce implementation. First, on a single node, most of the MapReduce codebase can stay outside of the TCB (i.e. code performing I/O and scheduling related tasks). Thus, we design four new components that integrate readily into the existing MapReduce infrastructure. These components, which amount to fewer than 500 lines of code, are the only pieces of trusted logic that need to be in the TCB, and are run in a protected environment on each computation node. Second, MapReduce computation (and computation in distributed dataflow frameworks in general) has a specific structure of data exchange and execution between map and reduce operations; that is, the map writes the data completely before it is consumed by the reduce. Exploiting this structure, we design a component called the secure shuffler which achieves the desired security but is much less expensive than a generic ORAM solution, adding only an additive O(log N) term to the latency, where N is the size of the data.

Results. We have implemented a system called M2R based on Hadoop [1]. We ported 7 applications from a popular big-data benchmark suite [25] and evaluated them on a cluster. The results confirm three findings. First, porting MapReduce jobs to M2R requires small development effort: changing less than 45 lines of code. Second, our solution offers a factor of 1.3× to 44.6× (median 11.2×) reduction in overhead compared to existing solutions with equivalent privacy, and a total of 17%−130% overhead over the baseline solution which protects against none of the attacks we focus on in this paper. Our overhead is moderately high, but M2R has high compatibility and is usable with high-sensitivity big data analysis tasks (e.g. in medical, social or financial data analytics). Third, the design is scalable and adds a TCB of less than 0.16% of the original Hadoop codebase.

Contributions. In summary, our work makes three keycontributions:

  • Privacy-preserving distributed computation. We define a new pragmatic level of privacy which can be achieved in the MapReduce framework, requiring no algorithmic restructuring of applications.

  • Attacks. We show that merely encrypting data in enclaved execution (with hardware primitives) is insecure, leading to significant privacy loss.

  • Practical Design. We design a simple, non-intrusive architectural change to MapReduce. We implement it in a real Hadoop implementation and benchmark its performance cost for privacy-sensitive applications.

2 The Problem

Our goal is to enable privacy-preserving computation for distributed dataflow frameworks. Our current design and implementation are specific to the MapReduce framework, whose computation structure is nevertheless similar to that of other distributed dataflow engines [26, 27, 62], differing mainly in the operations supported.

Figure 1: The MapReduce computation model.

Background on MapReduce. The MapReduce language enforces a strict structure: the computation task is split into map and reduce operations. Each instance of a map or reduce, called a computation unit (or unit), takes a list of key-value tuples.1 A MapReduce task consists of sequential phases of map and reduce operations. Once the map step is finished, the intermediate tuples are grouped by their key-components. This process of grouping is known as shuffling. All tuples belonging to one group are processed by a reduce instance, which expects to receive tuples sorted by their key-component. Outputs of the reduce step can be used as inputs for the map step in the next phase, creating a chained MapReduce task. Figure 1 shows the dataflow from the map to the reduce operations via the shuffling step. In the actual implementation, the provisioning of all map units on one cluster node is locally handled by a mapper process, and similarly, by a reducer process for reduce units.

1 To avoid confusion of the tuple key with the cryptographic key, we refer to the first component in the tuple as the key-component.

2.1 Threat Model

The adversary is a malicious insider in the cloud, aiming to subvert the confidentiality of the client’s computation running on the MapReduce platform. We assume that the adversary has complete access to the network and storage back-end of the infrastructure and can tamper with the persistent storage or network traffic. For each computation node in the cluster, we assume that the adversary can corrupt the entire software stack, say by installing malware.

We consider an adversary that perpetrates both passive and active attacks. A passive or honest-but-curious attacker passively observes the computation session, behaving honestly in relaying data between computation units, but aims to infer sensitive information from the observed data. This is a pragmatic model which includes adversaries that observe data backed up periodically on disk for archival, or have access to performance monitoring interfaces. An active or malicious attacker (e.g. an installed malware) can deviate arbitrarily from the expected behavior and tamper with any data under its control. Our work considers both such attacks.

There are at least two direct attacks that an adversary can mount on a MapReduce computation session. First, the adversary can observe data passing between computation units. If the data is left unencrypted, this leads to a direct breach in confidentiality. Second, the adversary can subvert the computation of each map/reduce instance by tampering with its execution. To address these basic threats, we start with a baseline system described below.

Baseline System. We consider the baseline system in which each computation unit is hardware-isolated and executed privately. We assume that the baseline system guarantees that the program can only be invoked on its entire input dataset, or else it aborts in its first map phase. Data blocks entering and exiting a computation unit are encrypted with authenticated encryption, and all side-channels from each computation unit are assumed to be masked [51]. Intermediate data is decrypted only in a hardware-attested computation unit, which has limited memory to securely process up to T input tuples. Systems achieving this baseline have been previously proposed, based on differing underlying hardware mechanisms; VC3 is a recent system built on Intel SGX [49].
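To make the baseline concrete, below is a minimal sketch of how a tuple could be sealed with AES-GCM authenticated encryption before leaving a computation unit. The class and method names are ours, not part of any system described here, and key management and padding are elided.

import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

class TupleSeal {
    // Encrypts one (already fixed-size, padded) tuple under a symmetric key.
    // Output layout: 12-byte random IV || ciphertext || 16-byte GCM tag.
    static byte[] seal(byte[] key, byte[] paddedTuple) throws Exception {
        byte[] iv = new byte[12];
        new SecureRandom().nextBytes(iv);
        Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
        c.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"),
               new GCMParameterSpec(128, iv));              // 128-bit auth tag
        byte[] ctAndTag = c.doFinal(paddedTuple);
        byte[] out = new byte[iv.length + ctAndTag.length];
        System.arraycopy(iv, 0, out, 0, iv.length);
        System.arraycopy(ctAndTag, 0, out, iv.length, ctAndTag.length);
        return out;
    }
}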

Note that in this baseline system, the MapReduce provisioning platform is responsible for invoking various trusted units of computation in hardware-isolated processes, passing encrypted data between them. In Section 3, we explain why this baseline system leaks significant information, and subsequently define a stronger privacy objective.

2.2 Problem Definition

Ideally, the distributed execution of the MapReduce program should leak nothing to the adversary, except the total size of the input, the total size of the output and the running time. The aforementioned baseline system fails to achieve this ideal privacy. It leaks two types of information: (a) the input and output size, and processing time of each individual computation unit, and (b) the dataflow among the computation units.

We stress that the leakage of (b) is significant in many applications since it reveals relationships among the inputs. For instance, in the well-known example of computing Pagerank scores for an encrypted graph [44], flows from one computation unit to another correspond to edges in the input graph. Hence, leaking the dataflow essentially reveals the whole graph edge-structure!

Techniques for hiding or reducing the leakage in (a) by padding the input/output size and introducing timing delays are known [35, 41]. Such measures can often require algorithmic redesign of the application [9] or use of specialized programming languages or hardware [33, 34], and can lead to large overheads for applications where the worst-case running time is significantly larger than the average case. We leave incorporating these orthogonal defenses out of scope.

Instead, in this work, we advocate focusing on eliminating the leakage in (b), while providing a formulation that clearly captures the information that might be revealed. We formulate the admissible leakage as Ψ, which captures the information (a) mentioned above, namely the input/output size and running time of each trusted computation unit invoked in the system. We formalize this intuition by defining the execution as a protocol between trusted components and the adversary, and define our privacy goal as achieving privacy modulo-Ψ.

Execution Protocol. Consider an honest execution of a program on input I = 〈x1, x2, . . . , xn〉. For a given map-reduce phase, let there be n map computation units. Let us label the map computation units such that the unit with label i takes xi as input. Recall that the tuples generated by the map computation units are to be shuffled, and divided into groups according to their key-components. Let K be the set of unique key-components and let π : [n+1, n+m] → K be a randomly chosen permutation, where m = |K|. Next, m reduce computation units are to be invoked. We label them starting from n+1, such that computation unit i takes the tuples with key-component π(i) as input.

Let Ii, Oi, Ti be the respective input size (measured by number of tuples), output size, and processing time of computation unit i, and call Ψi = 〈Ii, Oi, Ti〉 the IO-profile of computation unit i. The profile Ψ of the entire execution on input I is the sequence of Ψi for all computation units i ∈ [1, . . . , n+m] in the execution protocol. If an adversary A can initiate the above protocol and observe Ψ, we say that the adversary has access to Ψ.
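As a concrete (hypothetical) instance, anticipating the Wordcount example of Section 3 with n input files of L words each: every map unit reads one file and emits one tuple per word, and each reduce unit aggregates one word's group into a single output tuple, so the honest execution's profile would be

    Ψi = 〈L, L, Ti〉 for map units i ∈ [1, n], and
    Ψn+j = 〈c_π(n+j), 1, Tn+j〉 for reduce units j ∈ [1, m],

where c_w denotes the total number of occurrences of word w across the whole input collection.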

Now, let us consider the execution of the program on the same input I = 〈x1, x2, . . . , xn〉 under a MapReduce provisioning protocol by an adversary A. A semi-honest adversary A can obtain information on the value of the input, output and processing time of every trusted instance, including information on trusted instances other than the map and reduce computation units. If the adversary is malicious, it can further tamper with the inputs and invocations of the instances. In the protocol, the adversary controls 6 parameters:

(C1) the start time of each computation instance,
(C2) the end time of each instance,
(C3) the encrypted tuples passed to its inputs,
(C4) the number of computation instances,
(C5) the order of computation units executed,
(C6) the total number of map-reduce phases executed.

Since the adversary A can obtain “more” information and tamper with the execution, a question to ask is: can the adversary A gain more knowledge compared to an adversary Â who only has access to Ψ? Using the standard notions of indistinguishability2 and adversaries [28], we define a secure protocol as follows:

2 Non-negligible advantage in a distinguishing game.

Definition 1 (Privacy modulo-Ψ). A provisioning protocol for a program is modulo-Ψ private if, for any adversary A executing the MapReduce protocol, there is an adversary Â with access only to Ψ, such that the outputs of A and Â are indistinguishable.

The definition states that the output of an adversary can be directly seen as a deduction made from the information available to it. The fact that all adversaries have output indistinguishable from one which knows only Ψ suggests that no additional information can be gained by any A beyond that implied by knowledge of Ψ.

Remarks. First, our definition follows the scenario proposed by Canetti [11], which facilitates universal composition. Hence, if a protocol is private modulo-Ψ for one map-reduce phase, then an entire sequence of phases executed is private modulo-Ψ. Note that our proposed M2R consists of a sequence of map, shuffle, and reduce phases where each phase starts only after the previous phase has completed, and the chain of MapReduce jobs is carried out sequentially. Thus, universal composition can be applied. Second, we point out that if the developer restructures the original computation to make the IO-profile the same for all inputs, then Ψ leaks nothing about the input. Therefore, the developer can consider using orthogonal techniques to mask timing latencies [41] and hide trace paths and IO patterns [34] to achieve ideal privacy, if the performance considerations permit so.

2.3 Assumptions

In this work, we make specific assumptions about the baseline system we build upon. First, we assume that the underlying hardware sufficiently protects each computation unit from malware and snooping attacks. The range of threats protected against varies with the underlying trusted computing hardware. For instance, traditional TPMs protect against software-only attacks but not against physical access to RAM via attacks such as cold-boot [24]. More recent trusted computing primitives, such as Intel SGX [40], encrypt physical memory and therefore offer stronger protection against adversaries with direct physical access. Therefore, we do not focus on the specifics of how to protect each computation unit, as these are likely to change with the hardware platform used in deployment. In fact, our design can be implemented in any virtualization-assisted isolation that protects user-level processes on a malicious guest OS [12, 52, 57], before Intel SGX becomes available on the market.

Second, an important assumption we make is that information leakage via side-channels (e.g. cache latencies, power) from a computation unit is minimal. This is indeed a valid concern and an area of active research. Both software and hardware-based solutions are emerging, but they are orthogonal to our techniques [18, 29].

Finally, to enable arbitrary computation on encrypted data, decryption keys need to be made available to each hardware-isolated computation unit. This provisioning of the client’s keys to the cloud requires a set of trusted administrator interfaces and privileged software. We assume that such trusted key provisioning exists, as is shown in recent work [49, 65].

3 Attacks

In this section, we explain why a baseline system that merely encrypts the output of each computation unit leaks significantly more than a system that achieves privacy modulo-Ψ. We explain various subtle attack channels that our solution eliminates, with an example.

Running Example. Let us consider the canonical example of the Wordcount job in MapReduce, wherein the goal is to count the number of occurrences of each word in a set of input files. The map operation takes one file as input, and for each word w in the file, outputs the tuple 〈w, 1〉. All outputs are encrypted with standard authenticated encryption. Each reduce operation takes as input all the tuples with the same tuple-key, i.e. the same word, and aggregates the values. Hence the output of the reduce operations is an encrypted list of tuples 〈w, wc〉, where wc is the frequency of word w across all input files. For simplicity, we assume that the input is a set of files F = F1, . . . , Fn, where each file has the same number of words and is small enough to be processed by a map operation.3

What does Privacy modulo-Ψ Achieve? Here all the map computation units output tuples of the same size, and after grouping, each reduce unit receives tuples grouped by words. The sizes of the map outputs and of the groups constitute Ψ, and a private modulo-Ψ execution therefore leaks some statistical information about the collection of files in aggregate, namely the frequency distribution of words in F. However, it leaks nothing about the contents of words in the individual files — for instance, the frequency of words in any given file, and the common words between any pair of files, are not leaked. As we show next, the baseline system permits many inference attacks as it fails to achieve privacy modulo-Ψ. In fact, eliminating the remaining leakage in this example may not be easy, as it may require a priori knowledge about the probability distribution of words in F (e.g. using differential privacy [48]).

3 Files can be processed in fixed-size blocks, so this assumption is without any loss of generality.
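For reference, here is a minimal plaintext sketch of the Wordcount logic just described; encryption, padding and the actual Hadoop API are deliberately omitted, and the names are ours.

import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class WordcountSketch {
    // map: one input file -> one tuple <w, 1> per word occurrence
    static List<Map.Entry<String, Integer>> map(String fileContents) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String w : fileContents.split("\\s+"))
            if (!w.isEmpty()) out.add(Map.entry(w, 1));
        return out;
    }

    // reduce: all tuples sharing key-component w -> one tuple <w, wc>
    static Map.Entry<String, Integer> reduce(String w, List<Integer> counts) {
        int wc = 0;
        for (int c : counts) wc += c;
        return Map.entry(w, wc);
    }
}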

Passive Attacks. Consider a semi-honest adversary that executes the provisioning protocol, but aims to infer additional information. The adversary controls the 6 parameters C1-C6 (Section 2.2) in the execution protocol. The number of units (C4) and map-reduce phases executed (C6) are dependent on (and implied by) Ψ in an honest execution, and do not leak any additional information about the input. However, parameters C1, C2, C3 and C5 may directly leak additional information, as explained below.

  • Dataflow Patterns (Channel C3). Assume that the encrypted tuples are of the same size, and hence do not individually leak anything about the underlying plaintext. However, since the adversary constitutes the data communication channel, it can correlate the tuples written out by a map unit and read by a specific reduce unit. In the Wordcount example, the i-th map unit processes words in the file Fi, and then the intermediate tuples are sorted before being fed to reduce units. By observing which map outputs are grouped together to the same reduce unit, the adversary can learn that a word wi in file Fi is the same as a word wj in file Fj; this is true if they are received by the same reduce unit as one group. Thus, data access patterns leak significant information about overlapping words in files.

  • Order of Execution (Channel C5). A deterministic order of execution of nodes in any step can reveal information about the underlying tuples beyond what is implied by Ψ. For instance, if the provisioning protocol always sorts tuple-keys and assigns them to reduce units in sorted order, then the adversary learns significant information. In the Wordcount example, if the first reduce unit always corresponds to words appearing first in the sorted order, this would leak information about the specific words processed by that reduce unit. This is not directly implied by Ψ.

  • Time-of-Access (Channels C1, C2). Even if data access patterns are eliminated, time-of-access is another channel of leakage. For instance, an optimizing scheduler may start to move tuples to the reduce units even before the map step is completed (pipelining) to gain efficiency. In such cases, the adversary can correlate which blocks written by map units are read by which reduce units. If the outputs of all but the i-th map unit are delayed, and the j-th reduce unit completes, then the adversary can deduce that there is no dataflow from the i-th map unit to the j-th reduce unit. In general, if computation units in a subsequent step do not synchronize to obtain outputs from all units in the previous step, the times of start and completion leak information.

Active Attacks. While we allow the adversary to abort the computation session at any time, we aim to prevent the adversary from using active attacks to gain advantage in breaking confidentiality. We remind readers that in our baseline system, the adversary can only invoke the program with its complete input set, without tampering with any original inputs. The output tuple-set of each computation unit is encrypted with an authenticated encryption scheme, so the adversary cannot tamper with individual tuples. Despite these preliminary defenses, several channels for active attacks exist:

  • Tuple Tampering. The adversary may attempt to duplicate or eliminate an entire output tuple-set produced by a computation unit, even though it cannot forge individual tuples. As an illustration, suppose the adversary wants to learn how many words are unique to an input file Fi. To do this, the adversary can simply drop the output of the i-th map unit. If the number of tuples in the final output reduces by k, the tuples eliminated correspond to k unique words in Fi.

  • Misrouting Tuples. The adversary can reorder intermediate tuples or route data blocks intended for one reduce unit to another. These attacks subvert our confidentiality goals. For instance, the adversary can bypass the shuffler altogether and route the output of the i-th map unit to a single reduce unit. The output of this reduce unit leaks the number of unique words in Fi. Similar inference attacks can be achieved by duplicating output tuples of the reduce unit and observing the result.

4 Design

Our goal is to design a MapReduce provisioning protocol which is private modulo-Ψ and adds only a small amount of code to the TCB of the existing MapReduce platform. We explain the design choices available and our observations that lead to an efficient and clean security design.

Figure 2: The data flow in M2R. Filled components are trusted. Input, intermediate and output tuples are encrypted. The original map and reduce operations are replaced with mapT and reduceT. New components are the mixer nodes, which use mixT, and another trusted component called groupT.

4.1 Architecture Overview

The computation proceeds in phases, each consisting of a map step, a shuffle step, and a reduce step. Figure 2 depicts the 4 new trusted components our design introduces into the dataflow pipeline of MapReduce. These four new TCB components are mapT, reduceT, mixT and groupT. Two of these correspond to the execution of map and reduce units. They ensure that output tuples from the map and reduce units are encrypted and each tuple is of the same size. The other two components implement the critical role of secure shuffling. We explain our non-intrusive mechanism for secure shuffling in Section 4.2. Further, all integrity checks to defeat active attacks are designed to be distributed, requiring minimal global synchronization. The shuffler in the MapReduce platform is responsible for grouping tuples, and invoking reduce units on disjoint ranges of tuple-keys. On each cluster node, the reducer checks the grouped order and the expected range of tuples received using the trusted groupT component. The outputs of the reduce units are then fed back into the next round of the map-reduce phase.

Minimizing TCB. In our design, a major part of theMapReduce’s software stack deals with job schedulingand I/O operations, hence it can be left outside of theTCB. Our design makes no change to the grouping andscheduling algorithms, and they are outside our TCB asshown in the Figure 2. Therefore, the design is concep-tually simple and requires no intrusive changes to be im-plemented over existing MapReduce implementations.Developers need to modify their original applications toprepare them for execution in a hardware-protected pro-

6

Page 8: M2R: Enabling Stronger Privacy in MapReduce Computation

USENIX Association 24th USENIX Security Symposium 453

cess in our baseline system, as proposed in previous sys-tems [38, 39, 49]. Beyond this modification made by thebaseline system to the original MapReduce, M2R requiresa few additional lines of code to invoke the new privacy-enhancing TCB components. That is, MapReduce appli-cations need modifications only to invoke components inour TCB. Next, we explain how our architecture achievesprivacy and integrity in a MapReduce execution, alongwith the design of these four TCB components.

4.2 Privacy-Preserving Execution

For any given execution, we wish to ensure that each computation step in a phase is private modulo-Ψ. If the map step, the shuffle step, and the reduce step are individually private modulo-Ψ, then by the property of serial composability, the entire phase and a sequence of phases can be shown to be private. We discuss the design of these steps in this section, assuming an honest-but-curious adversary limited to passive attacks. The case of malicious adversaries is discussed in Section 4.3.

4.2.1 Secure Shuffling

As discussed in the previous section, the key challenge is performing secure shuffling. Consider the naive approach in which we simply move the entire shuffler into the platform TCB of each cluster node. To see why this is insecure, consider the grouping step of the shuffler, often implemented as a distributed sort or hash-based grouping algorithm. The grouping algorithm can only process a limited number of tuples locally at each mapper, so access to intermediate tuples must go over the network during the grouping process. Here, network data access patterns from the shuffler leak information. For example, if the shuffler were implemented using a standard merge sort implementation, the merge step leaks the relative position of the pointers in sorted sub-arrays as it fetches parts of each sub-array from the network incrementally.4

One generic solution to hide data access patterns is to employ an ORAM protocol when communicating with the untrusted storage backend. The grouping step will then access data obliviously, thereby hiding all correlations between grouped tuples. This solution achieves strong privacy, but with an overhead of O(log^k N) for each access when the total number of tuples is N [55]. Advanced techniques can be employed to reduce the overhead to O(log N), i.e. k = 1 [43]. Nevertheless, using a sorting algorithm for grouping, the total overhead becomes O(N log^{k+1} N), which translates to a factor of 30−100× slowdown when processing gigabytes of shuffled data.

4 This can reveal, for instance, whether the first sub-array is strictly less than the first element in the second sorted sub-array.

Figure 3: High-level overview of the map-mix-reduce execution using a 2-round mix network.

A more advanced solution is to perform oblivious sorting using sorting networks, for example, an odd-even or bitonic sorting network [23]. Such an approach hides data access patterns, and admits an O(log^2 N) latency (additive only). However, sorting networks are often designed for a fixed number of small inputs and are hard to adapt to tens of gigabytes of distributed data.

We make a simple observation which yields a non-intrusive solution. Our main observation is that in MapReduce and other dataflow frameworks, the sequence of data access patterns is fixed: it consists of cycles of tuple writes followed by reads. The reduce units start reading and processing their inputs only after the map units have finished. In our solution, we rewrite intermediate encrypted tuples with re-randomized tuple keys such that there is no linkability between the re-randomized tuples and the original encrypted map output tuples. We observe that this step can be realized by secure mix networks [30]. The privacy of the computation reduces directly to the problem of secure mixing. The total latency added by our solution is an additive term of O(log N) in the worst case. Since the MapReduce shuffle step is based on sorting, which already admits an O(N log N) overhead, our design retains the asymptotic runtime complexity of the original framework.

Our design achieves privacy using a cascaded mix network (or cascaded-mix) to securely shuffle tuples [30]. The procedure consists of a cascade of κ intermediate steps, as shown in Figure 3. It has κ identical steps (called mixing steps), each employing a number of trusted computation units called mixT units, the execution of which can be distributed over multiple nodes called mixers. Each mixT takes a fixed number T of tuples that it can process in memory, and passes exactly the same number of encrypted tuples to all mixT units in the subsequent step. Therefore, in each step of the cascade, the mixer utilizes N/T mixT units for mixing N tuples. At κ = log(N/T), the network ensures the strongest possible unlinkability, that is, the output distribution is statistically indistinguishable from a random distribution [30].

Each mixT unit decrypts the tuples it receives from the previous step, randomly permutes them using a linear-time algorithm, and re-encrypts the permuted tuples with a fresh, randomly chosen symmetric key. These keys are known only to mixT units, and can be derived using a secure key-derivation function from a common secret. The processing time of each mixT is padded to a constant. Note that the re-encryption time has low variance over different inputs, therefore such padding incurs low overhead.
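A schematic sketch of one mixT unit's work follows, under our own simplifying assumptions: seal/open stand in for the authenticated decryption and re-encryption under the freshly derived key, the batch size is assumed divisible by the fan-out m, and padding of the processing time is elided.

import java.security.SecureRandom;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class MixTSketch {
    static byte[] open(byte[] ct, byte[] key) { /* authenticated decrypt (placeholder) */ return ct; }
    static byte[] seal(byte[] pt, byte[] key) { /* encrypt under fresh key (placeholder) */ return pt; }

    // Decrypt T tuples from the previous step, permute them in linear time,
    // re-encrypt, and deal them out evenly to the m mixT units of the next step.
    static List<List<byte[]>> mixStep(List<byte[]> in, byte[] oldKey,
                                      byte[] newKey, int m, SecureRandom rnd) {
        List<byte[]> buf = new ArrayList<>(in.size());
        for (byte[] ct : in) buf.add(seal(open(ct, oldKey), newKey));
        Collections.shuffle(buf, rnd);            // linear-time Fisher-Yates
        List<List<byte[]>> out = new ArrayList<>(m);
        int chunk = buf.size() / m;               // assumes m divides the batch
        for (int i = 0; i < m; i++)
            out.add(new ArrayList<>(buf.subList(i * chunk, (i + 1) * chunk)));
        return out;
    }
}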

Let Ω represent the numbers of input and output tuples of the cascaded-mix with κ steps. Intuitively, when κ is sufficiently large, a semi-honest adversary who has observed the execution does not gain more knowledge than Ω. The following lemma states that this is indeed the case. We present the proof in Appendix A.

Lemma 1. Cascaded-mix is private modulo-Ω under a semi-honest adversary, given that the underlying encryption scheme is semantically secure.

4.2.2 Secure Grouping

After the mixing step, the shuffler can group the randomized tuple keys using its original (unmodified) grouping algorithm, which is not in the TCB. The output of the cascaded-mix is thus fed into the existing grouping algorithm of MapReduce, which combines all tuples with the same tuple-key and forwards them to reducers. Readers will notice that if the outputs of the last step of the cascaded-mix were probabilistically encrypted, this grouping step would need to be done in a trusted component. In our design, we add a last, (κ+1)-th, step to the cascade to accommodate the requirement for subsequent grouping. This last step uses a deterministic symmetric encryption Fs, with a secret key s, to encrypt the key-component of the final output tuples. Specifically, a tuple 〈a, b〉 is encrypted to a ciphertext of the form 〈Fs(a), E(a, b)〉, where E(·) is a probabilistic encryption scheme. This ensures that two shuffled tuples with the same tuple-key have the same ciphertext for the key-component of the tuple, and hence the subsequent grouping algorithm can group them without decrypting the tuples. The secret key s is randomized in each invocation of the cascaded-mix, thereby randomizing the ciphertexts across two map-reduce phases or jobs.
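For illustration, one standard way to instantiate such a deterministic tag is to use HMAC-SHA256 as the pseudorandom function Fs; this concrete choice is our assumption, not mandated by the design.

import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

class GroupTag {
    // F_s(a): deterministic for a fixed secret s, so equal key-components
    // yield equal tags and the untrusted grouping code can bucket tuples
    // without decrypting them; re-randomizing s changes all tags.
    static byte[] prfTag(byte[] s, byte[] keyComponent) throws Exception {
        Mac f = Mac.getInstance("HmacSHA256");
        f.init(new SecretKeySpec(s, "HmacSHA256"));
        return f.doFinal(keyComponent);
    }
}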

What the adversary gains by observing the last step of mixing is the groups of tuples, which are permuted using Fs(·). Thus, if Fs(·) is a pseudorandom function family, the adversary can only learn the size of each group, which is already implied by Ψ. Putting it all together with Lemma 1, we have:

Theorem 1. The protocol M2R is modulo-Ψ private (under a semi-honest adversary), assuming that the underlying private-key encryption is semantically secure, and Fs(·) is a pseudorandom function family.

4.3 Execution Integrity

So far, we have considered the privacy of the protocol against honest-but-curious adversaries. However, a malicious adversary can deviate arbitrarily from the protocol by mounting active attacks using the 6 parameters under its control. In this section, we explain the techniques necessary to prevent active attacks.

The program execution in M2R can be viewed as a directed acyclic graph (DAG), where vertices denote trusted computation units and edges denote the flow of encrypted data blocks. M2R has 4 kinds of trusted computation units, or vertices in the DAG: mapT, mixT, groupT, and reduceT. At a high level, our integrity-checking mechanism works by ensuring that nodes at the j-th level (in topologically sorted order) check the consistency of the execution at level j−1. If they detect that the adversary has deviated from or tampered with the execution or outputs of level j−1, then they abort the execution.

The MapReduce provisioning system is responsible for invoking trusted computation units, and is free to decide the total number of units spawned at each level j. We do not restrict the MapReduce scheduling algorithm in deciding which tuples are processed by which reduce unit, and their allocation to nodes in the cluster. However, we ensure that all tuples output at level j−1 are processed at level j, and that there are no duplicates. Note that this requirement ensures that a computation at step j starts only after the outputs of the previous step are passed to it, implicitly synchronizing the start of the computation units at step j. Under this constraint, it can be shown that channels C1-C2 (start and end time of each computation node) only allow the adversary to delay an entire step, or distinguish the outputs of units within one step, which is already implied by Ψ. We omit a detailed proof in this paper. Using these facts, we can show that a malicious adversary has no additional advantage compared to an honest-but-curious adversary, stated formally below.

Theorem 2. The protocol M2R is private modulo-Ψ under a malicious adversary, assuming that the underlying authenticated encryption is semantically secure (confidentiality) and secure under chosen message attack (integrity), and Fs(·) is a pseudorandom function family.

Proof Sketch: Given a malicious adversary A that executes the M2R protocol, we can construct an adversary Â that simulates A, but only has access to Ψ, in the following way. To simulate A, the adversary Â needs to fill in information not present in Ψ. For the output of a trusted unit, the simulation simply fills in random tuples, where the number of tuples is derived from Ψ. The timing information can likewise be filled in. Whenever A deviates from the protocol and feeds a different input to a trusted instance, the simulation expects the instance to halt and fills in the information accordingly. Note that the input to A and the input constructed for the simulator Â could have the same DAG of program execution, although the encrypted tuples are different. Suppose there is a distinguisher that distinguishes A and Â; let us consider the two cases: either the two DAGs are the same or different. If there is a non-negligible probability that they are the same, then we can construct a distinguisher to contradict the security of the encryption, or of Fs(·). If there is a non-negligible probability that they are different, we can forge a valid authentication tag. Hence, the outputs of A and Â are indistinguishable.

Integrity Checks. Nearly all our integrity checks can be distributed across the cluster, with checking of invariants done locally at each trusted computation. Therefore, our integrity-checking mechanism can largely bundle the integrity metadata with the original data. No global synchronization is necessary, except for the case of the groupT units, as they consume data output by an untrusted grouping step. The groupT checks ensure that the ordering of the grouped tuples received by the designated reduceT is preserved. In addition, groupT units synchronize to ensure that each reducer processes a distinct range of tuple-keys, and that all the tuple-keys are processed by at least one of the reduce units.

4.3.1 Mechanisms

In the DAG corresponding to a program execution, the MapReduce provisioning system assigns unique instance ids. Let the vertex i at level j have the designated id (i, j), and let the total number of units at level j be |Vj|. When a computation instance is spawned, its designated instance id (i, j) and the total number of units |Vj| are passed as auxiliary input parameters by the provisioning system. Each vertex with id (i, j) is an operation of type mapT, groupT, mixT or reduceT, denoted by the function OpType(i, j). The basic mechanism for integrity-checking consists of each vertex emitting a tagged-block as output, which can be checked by trusted components in the next stage. Specifically, the tagged-block is a 6-tuple B = 〈O, LvlCnt, SrcID, DstID, DstLvl, DstType〉, where:

O is the encrypted output tuple-set,
LvlCnt is the number of units at the source level,
SrcID is the instance id of the source vertex,
DstID is the instance id of the destination vertex, or NULL,
DstLvl is the level of the destination vertex,
DstType is the destination operation type.
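Rendered as plain Java, this could look as follows (a sketch; field and type names are ours, and the authenticated-encryption wrapper around the whole block is elided):

// Identifies vertex idx at a given level of the execution DAG.
record UnitId(int idx, int lvl) {}

// The 6-tuple tagged-block emitted by every trusted vertex.
record TaggedBlock(
    byte[] o,          // O: encrypted output tuple-set
    int lvlCnt,        // LvlCnt: number of units at the source level
    UnitId srcId,      // SrcID: instance id of the source vertex
    UnitId dstId,      // DstID: destination vertex id, or null if unconstrained
    int dstLvl,        // DstLvl: level of the destination vertex
    String dstType) {} // DstType: expected operation type at the destination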

In our design, each vertex with id (i, j) fetches the tagged-blocks from all vertices at the previous level, denoted by the multiset B, and performs the following consistency checks on B:

1. The LvlCnt values for all b ∈ B are the same; call this common value ℓ(B).
2. The SrcID values for all b ∈ B are distinct.
3. For the set S = {SrcID(b) | b ∈ B}, |S| = ℓ(B).
4. For all b ∈ B, DstLvl(b) = j.
5. For all b ∈ B, DstID(b) = (i, j) or NULL.
6. For all b ∈ B, DstType(b) = OpType(i, j).

Conditions 1, 2 and 3 ensure that tagged-blocks from all units in the previous level are read and that they are distinct. Thus, the adversary has not dropped or duplicated any output tuple-set. Condition 4 ensures that the computation nodes are ordered sequentially, that is, the adversary cannot misroute data bypassing certain levels. Condition 6 further checks that execution progresses in the expected order — for instance, the map step is followed by a mix, subsequently followed by a group step, and so on. We explain later in this section how each vertex independently decides the right or expected order. Condition 5 states that if the source vertex wishes to fix the recipient id of a tagged-block, it can verifiably enforce this by setting it to a non-NULL value.
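Using the TaggedBlock sketch above, the six checks could be coded as follows (again a sketch under our naming; the comments map each test to the numbered condition):

import java.util.HashSet;
import java.util.List;
import java.util.Set;

class ConsistencyCheck {
    // Runs checks 1-6 at vertex `self` of type `opType` over the multiset B
    // of tagged-blocks fetched from the previous level; false aborts execution.
    static boolean consistent(List<TaggedBlock> B, UnitId self, String opType) {
        if (B.isEmpty()) return false;
        int lvlCnt = B.get(0).lvlCnt();
        Set<UnitId> srcs = new HashSet<>();
        for (TaggedBlock b : B) {
            if (b.lvlCnt() != lvlCnt) return false;                 // check 1
            if (!srcs.add(b.srcId())) return false;                 // check 2
            if (b.dstLvl() != self.lvl()) return false;             // check 4
            if (b.dstId() != null && !b.dstId().equals(self))
                return false;                                       // check 5
            if (!opType.equals(b.dstType())) return false;          // check 6
        }
        return srcs.size() == lvlCnt;                               // check 3
    }
}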

Each tagged-block is encrypted with standard authenticated encryption, protecting the integrity of all metadata in it. We explain next how each trusted computation vertex encodes the tagged-block.

Map-to-Mix Dataflow. Each mixT reads the output metadata of all mapT units. Thus, each mixT knows the total number of tuples N generated in the entire map step, by summing up the counts of encrypted tuples received. From this, each mixT independently determines the total number of mixers in the system as N/T. Note that T is the preconfigured number of tuples that each mixT can process securely without invoking disk accesses, typically about 100M tuples. Therefore, this step is completely decentralized and requires no coordination between mixT units. A mapT unit invoked with id (i, j) simply emits tagged-blocks with the following structure: 〈·, |Vj|, (i, j), NULL, j+1, mixT〉.

Mix-to-Mix Dataflow. Each mixT re-encrypts and permutes a fixed number (T) of tuples. In a κ-step cascaded mix network, at any step s (s < κ−1) the mixT outputs T/m tuples to each one of the m mixT units in step s+1. To ensure this, each mixT adds metadata to its tagged-block output so that it reaches only the specified mixT unit of the next stage. To do so, we use the DstType field, which is set to type mixTs+1 by the mixer at step s. Thus, each mixT node knows the total number of tuples being shuffled N (encoded in OpType), its step number in the cascaded mix, and the public value T. From this, each mixT can determine the correct number of cascade steps to perform, and can abort the execution if the adversary tries to avoid any step of the mixing.
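For instance, each mixT could derive the public cascade parameters as below (a sketch; we assume base-2 logarithms and rounding up, which the text leaves implicit):

class MixParams {
    // Number of mixT units per cascade step: N/T, rounded up.
    static long mixersPerStep(long n, long t) { return (n + t - 1) / t; }

    // Cascade depth kappa = log(N/T), the point at which the output
    // distribution becomes statistically indistinguishable from random.
    static int cascadeSteps(long n, long t) {
        return Math.max(1, (int) Math.ceil(Math.log((double) n / t) / Math.log(2)));
    }
}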

Mix-to-Group Dataflow. In our design, the last mix step (i, j) writes the original tuple as 〈Fs(k), (k, v, tctr)〉, where the second part of this tuple is protected with authenticated encryption. The value tctr is called a tuple-counter, which makes each tuple globally distinct in the job. Specifically, it encodes the value (i, j, ctr), where ctr is a counter unique to the instance (i, j). The assumption here is that all such output tuples will be grouped by the first component, and each group will be forwarded to reducers with no duplicates. To ensure that the outputs received are correctly ordered and untampered, the last mixT nodes send a special tagged-block to groupT nodes. This tagged-block contains the count of tuples corresponding to Fs(k) generated by the mixT unit with id (i, j). With this information each groupT node can locally check that:

  • For each received group corresponding to g = Fs(k), the count of distinct tuples (k, ·, i, j, ctr) it receives tallies with that specified in the tagged-block received from mixT node (i, j), for all blocks in B.

Finally, groupT units need to synchronize to check that there is no overlap between tuple-key ranges. This requires an additional exchange of tokens between groupT units containing the range of group keys and tuple-counters that each unit processes.

Group-to-Reduce Dataflow. There is a one-to-one mapping between groupT units and reduceT units, where the former checks the correctness of the tuple group before forwarding it to the designated reduceT. This communication is analogous to that between mixT units, so we omit a detailed description for brevity.

5 Implementation

Baseline Setup. The design of M2R can be implemented differently depending on the underlying architectural primitives available. For instance, we could implement our solution using Intel SGX, using the mechanisms of VC3 to achieve our baseline. However, Intel SGX is not yet available in shipping CPUs, therefore we use a trusted-hypervisor approach to implement the baseline system, which minimizes the performance overheads of the baseline system. We use Intel TXT to securely boot a trusted Xen-4.4.3 hypervisor kernel, ensuring its static boot integrity.5 The inputs and outputs of map and reduce units are encrypted with AES-GCM using 256-bit keys. The original Hadoop jobs are executed as user-level processes in ring 3, attested at launch by the hypervisor, making the assumption that they are protected during subsequent execution. The MapReduce jobs are modified to call into our TCB components implemented as x86 code, which can be compiled with SFI constraints for additional safety. The hypervisor loads, verifies and executes the TCB components within its address space in ring 0. The rest of the Hadoop stack runs in ring 3 and invokes the units by making hypercalls. Note that the TCB components can be isolated as user-level processes in the future, but this is only meaningful if the processes are protected by stronger solutions such as Intel SGX or other systems [12, 14, 52].

M2R TCB. Our main contributions are beyond the baseline system. We add four new components to the TCB of the baseline system. We have modified a standard Hadoop implementation to invoke the mixT and groupT units before and after the grouping step. These two components add a total of 90 LoC to the platform TCB. No changes are necessary to the original grouping algorithm. Each mapT and reduceT implements the trusted map and reduce operation — the same as in the baseline system. They are compiled together with a static utility code which is responsible for (a) padding each tuple to a fixed size, (b) encrypting tuples with authenticated encryption, (c) adding and verifying the metadata for tagged-blocks, and (d) recording the instance id of each unit. Most of these changes are fairly straightforward to implement. To execute an application, the client encrypts and uploads all the data to M2R nodes. The user then submits M2R applications and finally decrypts the results.

6 Evaluation

This section describes M2R's performance in a small cluster under real workloads. We ported 7 data-intensive jobs from a standard benchmark suite to M2R, making less than 25% changes in the number of lines of code (LoC) of the original Hadoop jobs. The applications add fewer than 500 LoC to the TCB, or less than 0.16% of the entire Hadoop software stack. M2R adds 17−130% overhead in running time to the baseline system. We also compare M2R with another system offering the same level of privacy, in which encrypted tuples are sent back to a trusted client. We show that M2R is up to 44.6× faster compared to this solution.

5 Other hypervisor solutions such as TrustVisor [39], Overshadow [14], Nova [56], or SecVisor [50] could equivalently be used.


Job       | LoC changed (vs. Hadoop job) | TCB increase (vs. Hadoop codebase) | Input size (vs. plaintext size) | Shuffled bytes | # App hypercalls | # Platform hypercalls
Wordcount | 10 (15%)   | 370 (0.14%) | 2.1G (1.06×) | 4.2G | 3277173  | 35
Index     | 28 (24%)   | 370 (0.14%) | 2.5G (1.15×) | 8G   | 3277173  | 59
Grep      | 13 (13%)   | 355 (0.13%) | 2.1G (1.06×) | 75M  | 3277174  | 10
Aggregate | 16 (18%)   | 395 (0.15%) | 2G (1.19×)   | 289M | 18121377 | 12
Join      | 30 (22%)   | 478 (0.16%) | 2G (1.19×)   | 450M | 11010647 | 14
Pagerank  | 42 (20%)   | 429 (0.15%) | 2.5G (4×)    | 2.6G | 1750000  | 21
KMeans    | 113 (7%)   | 400 (0.12%) | 1G (1.09×)   | 11K  | 12000064 | 8

Table 1: Summary of the porting effort and TCB increase for various M2R applications, and the application runtime cost factors. The number of app hypercalls consists of both mapT and reduceT invocations; the number of platform hypercalls includes groupT and mixT invocations.

Job       | Baseline (vs. no encryption) | M2R (% increase vs. baseline) | Download-and-compute (× M2R)
Wordcount | 570 (221) | 1156 (100%) | 1859 (1.6×)
Index     | 666 (423) | 1549 (130%) | 2061 (1.3×)
Grep      | 70 (48)   | 106 (50%)   | 1686 (15.9×)
Aggregate | 125 (80)  | 205 (64%)   | 9140 (44.6×)
Join      | 422 (211) | 510 (20%)   | 5716 (11.2×)
Pagerank  | 521 (334) | 755 (44%)   | 1209 (1.6×)
KMeans    | 123 (71)  | 145 (17%)   | 6071 (41.9×)

Table 2: Overall running time (s) of M2R applications in comparison with other systems: (1) the baseline system protecting computation only on single nodes, (2) the download-and-compute system, which does not use trusted primitives but instead sends the encrypted tuples back to trusted servers when homomorphically encrypted computation is not possible [59].


6.1 Setup & Benchmarks

We select a standard benchmark suite for evaluating Hadoop under large workloads, called the HiBench suite [25]. The 7 benchmark applications, listed in Table 1, cover a wide range of data-intensive tasks: compute intensive (KMeans, Grep, Pagerank), shuffle intensive (Wordcount, Index), database queries (Join, Aggregate), and iterative (KMeans, Pagerank). The size of the encrypted input data is between 1 GB and 2.5 GB in these case studies. Different applications have different amounts of shuffled data, ranging from small sizes (75MB in Grep, 11K in KMeans) to large sizes (4.2GB in Wordcount, 8GB in Index).

Our implementation uses the Xen-4.3.3 64-bit hypervisor compiled with the trusted boot option. The rest of the M2R stack runs on Ubuntu 13.04, 64-bit version. We conduct our experiments on a cluster of commodity servers, each equipped with 1 quad-core Intel CPU at 1.8GHz, a 1TB hard drive, 8GB RAM and a 1GB Ethernet card. We vary our setup to have between 1 and 4 compute nodes (running mappers and reducers) and between 1 and 4 mixer nodes implementing a 2-step cascaded mix network. The results presented below are from running with 4 compute nodes and 4 mixers, each reserving a 100MB buffer for mixing, averaged over 10 executions.

6.2 Results: Performance

Overheads & Cost Breakdown. We observe a linear scale-up with the number of nodes in the cluster, which confirms the scalability of M2R. In our benchmarks (Table 2), we observe a total overhead of between 17%−130% over the baseline system that simply encrypts inputs and outputs of map/reduce units, and utilizes none of our privacy-enhancing techniques. It can also be seen that in all applications except Grep and KMeans, running time is proportional to the size of data transferred during shuffling (the shuffled-bytes column in Table 1). To understand the cost factors contributing to the overhead, we measure the time taken by the secure shuffler, by the mapT and reduceT units, and by the rest of the Hadoop system, which comprises the time spent on I/O, scheduling and other book-keeping tasks. This relative cost breakdown is detailed in Figure 4. From the result, we observe that the cost of the secure shuffler is significant. Therefore, reducing the overheads of shuffling, by avoiding the generic ORAM solution, is well incentivized and critical to reducing the overall overheads. The two benchmarks which have high overheads of over 100%, namely Wordcount and Index, incur this cost primarily due to the cost of privacy-preserving shuffling of a large amount of data. In benchmarks where the shuffled data is small (Grep, KMeans), the use of mapT/reduceT adds relatively larger overheads than the secure shuffler. The second observation is that the total cost of both the shuffler and the other trusted components is comparable to that of Hadoop, which provides evidence that M2R preserves the asymptotic complexity of Hadoop.

Comparison to Previous Solutions. Apart from the baseline system, a second point of comparison is previously proposed systems that send encrypted tuples to the user for private computation. Systems such as Monomi [59] and AutoCrypt [58] employ homomorphic encryption for computing on encrypted data on single servers. For operations that cannot be done on the server using partially homomorphic encryption, such Monomi-like systems forward the data to a trusted set of servers (or to the client's private cloud) for decryption. We refer to this as the download-and-compute approach. We estimate the performance of a Monomi-like system extended to distributed computation tasks, for achieving privacy equivalent to ours. For the comparison, we assume that the system uses Paillier, ElGamal, and randomized search schemes for homomorphic computation, but not OPE or deterministic schemes (since those leak more than M2R and our baseline system do).


Figure 4: Normalized cost breakdown for M2R applications. The running time consists of the time taken by mapT and reduceT plus the time taken by the secure shuffler; the rest comes from the Hadoop runtime.

We run operations that would fall outside the expressiveness of the allowed homomorphic operations, including shuffling, as a separate network request to the trusted client, and we batch these network requests into one per MapReduce step. We assume that the network round-trip latency to the client is only 1 ms, an optimistic approximation since the average round-trip delay in the same data center is 10-100 ms [4, 61]. We find that this download-and-compute approach is slower than ours by a factor of 1.3× to 44.6× (Table 2), with the median benchmark running 11.2× slower. The overheads are low for case studies where most of the computation can be handled by homomorphic operations, but most of the benchmarks require conversions between homomorphic schemes (thereby requiring decryption) [58, 59] or computation on plaintext values.
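The estimate above amounts to a simple additive cost model; the sketch below makes it explicit. The constants in the usage example (tuple count, per-tuple cryptographic cost) are illustrative placeholders, not measurements from [59].

```java
// A minimal cost model for the download-and-compute estimate
// (illustrative; the constants below are placeholders, not measurements).
public class DownloadAndCompute {
    // Estimated extra time when an operation cannot be expressed with
    // the available homomorphic schemes and encrypted tuples must be
    // shipped to a trusted server for decryption and re-encryption.
    static double estimateSeconds(long tuples, int mapReduceSteps,
                                  double rttSeconds, double perTupleCryptoSeconds) {
        // One batched network request per MapReduce step (as assumed in
        // the text), plus per-tuple cryptographic work at the client.
        return mapReduceSteps * rttSeconds + tuples * perTupleCryptoSeconds;
    }

    public static void main(String[] args) {
        // e.g. 10 million tuples, 3 steps, 1 ms RTT, 1 microsecond per tuple
        System.out.println(estimateSeconds(10_000_000L, 3, 0.001, 1e-6) + " s");
    }
}
```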

Platform-Specific Costs. Readers may wonder whether the evaluation results are significantly affected by the choice of our implementation platform. We find that the dominant costs we report here are largely complementary to the costs incurred by the specifics of the underlying platform. To examine this, we conduct a micro-benchmark evaluating the cost of context switches and the total time spent in the trusted components. On our platform, the cost of each hypercall (a switch to trusted logic) is small (13 µs), and the execution time of each trusted component is largely proportional to the size of its input data, as shown in Figure 5. The time taken by the trusted computation grows near-linearly with the input data size, showing that the constant overheads of context switches and other platform specifics do not contribute significantly to the reported results. This implies that simple optimizations, such as batching multiple trusted-code invocations, would not yield significant improvements, since the overheads are proportional to the total size of data rather than the number of invocations. The total number of invocations (via hypercalls) of app-specific trusted logic (mapT, reduceT) is proportional to the total number of input tuples, which amounts to less than half a second of overhead even for millions of input tuples.

Figure 5: Cost of executing a mapT instance of the Wordcount and Aggregate jobs, and the cost of executing mixT. Input sizes (number of ciphertexts per input) vary from 2 to 10^6.

The number of invocations of the other components (mixT and groupT) is much smaller (8-59), and each invocation operates on large inputs of a few gigabytes; the dominant cost is therefore not that of context switches, but that of the multi-step shuffling operation itself and the I/O overheads.
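The style of micro-benchmark used here can be reproduced with a timing harness along the following lines; TrustedCall is a hypothetical stand-in for a hypercall into a trusted component, not part of M2R's interface.

```java
// Illustrative timing harness for the micro-benchmark: it measures how
// trusted-component execution time scales with input size. TrustedCall
// is a hypothetical stand-in for a hypercall into a trusted component.
interface TrustedCall {
    void run(byte[] input);
}

class MicroBench {
    static double secondsFor(TrustedCall call, int inputBytes) {
        byte[] input = new byte[inputBytes];    // synthetic input of the given size
        long t0 = System.nanoTime();
        call.run(input);
        return (System.nanoTime() - t0) / 1e9;  // wall-clock seconds for one invocation
    }
}
```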

6.3 Results: Security & Porting Effort

Porting effort. We find that the effort to adapt all benchmarks to M2R is modest. For each benchmark, we report the number of Java LoC we changed in order to invoke the trusted components in M2R, measured using the sloccount tool (http://www.dwheeler.com/sloccount). Table 1 shows that all applications except KMeans need fewer than 43 LoC changed. Most changes involve data marshaling before and after invoking the mapT and reduceT units. KMeans is more complex, as it is part of the Mahout distribution and depends on many other utility classes. Despite this, the change is only 113 LoC, or merely 7% of the original KMeans implementation.

TCB increase. We define our TCB increase as the total size of the four trusted components. This represents the additional code running on top of a base TCB, which in our case is Xen. Note that our design can eliminate the base TCB altogether in the future by using SGX enclaves, retaining only the main trusted components we propose in M2R. The TCB increase comprises per-application trusted code and platform trusted code. The former consists of the code for loading and executing the mapT and reduceT units (213 LoC) as well as the code implementing their logic. Each map/reduce codebase itself is small, fewer than 200 LoC, and runs as a trusted component in the baseline system as well. The platform trusted code includes that of mixT and groupT, which amounts to 90 LoC altogether. The entire Hadoop software stack is over 190K LoC, and M2R avoids pulling it into the TCB. Table 1 shows that all jobs have TCB increases of fewer than 500 LoC, merely 0.16% of the Hadoop codebase.

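To make the nature of these changes concrete, the sketch below shows how a ported mapper might marshal encrypted tuples into a trusted map unit. The TrustedMapT interface and its invoke method are hypothetical stand-ins for illustration, not M2R's actual component API.

```java
import java.util.List;

// Hypothetical sketch of porting a mapper to M2R. TrustedMapT below is
// an illustrative stand-in, not M2R's real API: the untrusted code only
// marshals encrypted tuples in and out of the isolated unit.
interface TrustedMapT {
    // Runs inside the isolated unit: decrypts the input tuple, applies
    // the map logic, and re-encrypts the output tuples.
    List<byte[]> invoke(byte[] encryptedInputTuple);
}

class PortedWordcountMapper {
    private final TrustedMapT mapT;

    PortedWordcountMapper(TrustedMapT mapT) {
        this.mapT = mapT;
    }

    void map(byte[] encryptedLine, List<byte[]> encryptedOutput) {
        // Before porting, the mapper tokenized plaintext here. After
        // porting, it only forwards ciphertexts to mapT and collects
        // the encrypted (word, 1) tuples that the trusted unit emits.
        encryptedOutput.addAll(mapT.invoke(encryptedLine));
    }
}
```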


Job         M2R                       Baseline (additional leakage)
Wordcount   # unique words + count    word-file relationship
Index       # unique words + count    word-file relationship
Grep        nothing                   nothing
Aggregate   # groups + group size     record-group relationship
Join        # groups + group size     record-group relationship
Pagerank    node in-degree            whole input graph
KMeans      nothing                   nothing

Table 3: Remaining leakage of M2R applications, compared with thatin the baseline system.

Security. M2R achieves stronger privacy than previous platforms that propose to use encrypted computation for big-data analysis. Our definition allows the adversary to observe an admissible amount of information, captured by Ψ, in the computation, but hides everything else. It is possible to quantitatively analyze the increased privacy in information-theoretic terms by assuming a probability distribution over the input data [37, 53]. Here, however, we present a qualitative description in Table 3, highlighting how much privacy is gained by the techniques introduced in M2R over the baseline system. For instance, consider the two case studies that incur the most performance overhead (Wordcount, Index). In these examples, merely encrypting the map/reduce tuples leaks information about which file contains which words. This may allow adversaries to learn the specific keywords in each file in the dataset. In M2R, this leakage is reduced to learning only the total number of unique words in the complete database and the count of each, hiding information about individual files. Similarly, M2R hides which records are in which group for the database operations (Aggregate and Join). For Pagerank, the baseline system leaks the complete edge structure of the input graph, giving away which pairs of nodes have an edge, whereas M2R reduces this leakage to only the in-degree of graph vertices. In the two remaining case studies, M2R provides no additional benefit over the baseline.

7 Related Work

Privacy-preserving data processing. One of M2R's goals is to offer large-scale data processing in a privacy-preserving manner on untrusted clouds. Most systems with this capability are in the database domain, i.e., they support SQL query processing. CryptDB [47] takes a purely cryptographic approach, showing the practicality of using partially homomorphic encryption schemes [3, 15, 45, 46, 54]. CryptDB works only on a restricted set of SQL queries and is therefore unable to support arbitrary computation. Monomi [59] supports more complex queries by adopting the download-and-compute approach for them. As shown in our evaluation, such an approach incurs overheads an order of magnitude larger.

There exist alternatives that support outsourcing of query processing to a third party via server-side trusted hardware, e.g., IBM 4764/5 cryptographic co-processors. TrustedDB [7] demonstrated that a secure outsourced database solution can be built and run at a fraction of the monetary cost of any cryptography-enabled private data processing. However, the system requires expensive hardware and a large TCB, which includes the entire SQL server stack. Cipherbase improves upon TrustedDB by additionally encrypting data with partially homomorphic schemes and by introducing a trusted entity for query optimization [6]. M2R differs from these systems in two fundamental aspects. First, it supports general computation on any type of data, as opposed to being restricted to SQL and structured database semantics. Second, and more importantly, M2R provides confidentiality in a distributed execution environment, which introduces more threats than a single-machine environment.

VC3 is a recent system offering privacy-preserving general-purpose data processing [49]. It considers MapReduce and utilizes Intel SGX to maintain a small TCB. This system is complementary to M2R, as it focuses on techniques for isolated computation, key management, and so on, which we do not consider. The privacy model in our system is stronger than that of VC3, which does not consider traffic-analysis attacks.

GraphSC offers a security guarantee similar to M2R's for specialized graph-processing tasks [42]. It provides a graph-based programming model similar to GraphLab's [36], as opposed to the dataflow model exposed by M2R. GraphSC does not employ trusted primitives, but it assumes two non-colluding parties. It ensures data-oblivious and secure computation with two main techniques: sorting and garbled circuits. However, these techniques result in large performance overheads: a small Pagerank job in GraphSC is 200,000× to 500,000× slower than in GraphLab without security. M2R, by contrast, incurs only a 2× to 5× increase in running time because it leverages trusted primitives for computation on encrypted data. A direct comparison of the oblivious sorting used therein against our secure shuffler is promising future work.

Techniques for isolated computation. The current implementation of M2R uses a Xen-based trusted hypervisor for isolated computation. Overshadow [14] and CloudVisor [63] are techniques with a large TCB, whereas Flicker [38] and TrustVisor [39] reduce the TCB at the cost of performance. More recently, Minibox [32] enhances a TrustVisor-like hypervisor with two-way protection, providing security for both the OS and the applications (or PALs). Advanced hardware-based techniques such as Intel SGX [40] and Bastion [12] provide a hardware-protected secure mode in which applications can be executed at hardware speed. All these techniques are complementary to ours.


Mix networks. The concept of a mix network was first described in the design of untraceable electronic mail [13]. Since then, a body of research has concentrated on building, analyzing, and attacking anonymous communication systems [16, 19]. Canetti presented the first definition of security that is preserved under composition [11], and others have since shown that mix networks are secure under Canetti's framework [10, 60]. Security properties of cascaded mix networks were studied in [30]. We use these theoretical results in our design.
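For illustration, one step of such a cascaded mix can be sketched as follows: each mixer re-randomizes every ciphertext and emits the batch in a fresh random order. The Reencryptor interface is a hypothetical placeholder for a semantically secure re-encryption primitive, not a specific library API.

```java
import java.security.SecureRandom;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Sketch of a single step of a cascaded mix network (illustrative).
// Each mixer re-randomizes every ciphertext and outputs the batch in a
// fresh random order, unlinking its inputs from its outputs.
interface Reencryptor {                  // hypothetical placeholder
    byte[] reencrypt(byte[] ciphertext);
}

class MixStep {
    static List<byte[]> mix(List<byte[]> batch, Reencryptor enc, SecureRandom rnd) {
        List<byte[]> out = new ArrayList<>(batch.size());
        for (byte[] c : batch) {
            out.add(enc.reencrypt(c));   // fresh randomness hides linkage
        }
        Collections.shuffle(out, rnd);   // uniformly random permutation
        return out;
    }
}
```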

8 Conclusion & Future Work

In this paper, we defined a model of privacy-preserving distributed execution of MapReduce jobs. We analyzed various attack channels that break data confidentiality in a baseline system that employs both encryption and trusted computing primitives. Our new design realizes the defined level of security and takes a significant step towards lower performance overhead while requiring a small TCB. Our experiments with M2R showed that the system requires little effort to port legacy MapReduce applications, and that it is scalable.

Systems such as M2R provide evidence that specialized designs for hiding data access patterns are practical alternatives to generic constructions such as ORAM. How much special-purpose constructions benefit important practical systems, compared to generic constructions, is an area of future work. A more immediate direction is to integrate our design into other distributed dataflow systems. Although they share a similar structure of computation, those systems are based on different sets of computation primitives and different execution models, which presents both opportunities and challenges for reducing the performance overheads of our design. Another avenue for future work is to realize our model of privacy-preserving distributed computation in the emerging in-memory big-data platforms [64], where only very small overheads from security mechanisms can be tolerated.

9 Acknowledgements

The first author was funded by the National Research Foundation, Prime Minister's Office, Singapore, under its Competitive Research Programme (CRP Award No. NRF-CRP8-2011-08). Special thanks to Shruti Tople and Loi Luu for their help in preparing the manuscript. We thank the anonymous reviewers for their insightful comments that helped us improve the discussions in this work.

References

[1] Apache Hadoop. http://hadoop.apache.org.

[2] Trusted Computing Group. www.trustedcomputinggroup.org.

[3] R. Agrawal, J. Kiernan, R. Srikant, and Y. Xu. Order preserving encryption for numeric data. In SIGMOD, pages 563–574, 2004.

[4] M. Alizadeh, A. Greenberg, D. A. Maltz, J. Padhye, P. Patel, B. Prabhakar, S. Sengupta, and M. Sridharan. Data center TCP (DCTCP). In SIGCOMM, 2010.

[5] T. Alves and D. Felton. TrustZone: Integrated hardware and software security. ARM white paper, 2004.

[6] A. Arasu, S. Blanas, K. Eguro, M. Joglekar, R. Kaushik, D. Kossmann, R. Ramamurthy, P. Upadhyaya, and R. Venkatesan. Secure database-as-a-service with Cipherbase. In SIGMOD, pages 1033–1036, 2013.

[7] S. Bajaj and R. Sion. TrustedDB: A trusted hardware-based database with privacy and data confidentiality. In SIGMOD, pages 205–216, 2011.

[8] A. Baumann, M. Peinado, and G. Hunt. Shielding applications from an untrusted cloud with Haven. In OSDI, 2014.

[9] M. Blanton, A. Steele, and M. Aliasgari. Data-oblivious graph algorithms for secure computation and outsourcing. In ASIACCS, pages 207–218, 2013.

[10] J. Camenisch and A. Mityagin. Mix-network with stronger security. In Privacy Enhancing Technologies, 2006.

[11] R. Canetti. Universally composable security: A new paradigm for cryptographic protocols. In IEEE Symposium on Foundations of Computer Science, 2001.

[12] D. Champagne and R. B. Lee. Scalable architectural support for trusted software. In HPCA, 2010.

[13] D. L. Chaum. Untraceable electronic mail, return addresses, and digital pseudonyms. Communications of the ACM, 1981.

[14] X. Chen, T. Garfinkel, E. Lewis, P. Subrahmanyam, C. Waldspurger, D. Boneh, J. Dwoskin, and D. Ports. Overshadow: A virtualization-based approach to retrofitting protection in commodity operating systems. In ASPLOS, pages 2–13, 2008.

[15] R. Curtmola, J. A. Garay, S. Kamara, and R. Ostrovsky. Searchable symmetric encryption: Improved definitions and efficient constructions. In ACM CCS, 2006.

[16] G. Danezis and C. Diaz. A survey of anonymous communication channels. Technical Report MSR-TR-2008-35, Microsoft Research, 2008.

[17] J. Dean and S. Ghemawat. MapReduce: Simplified data processing on large clusters. In OSDI, 2004.

[18] J. Demme, R. Martin, A. Waksman, and S. Sethumadhavan. Side-channel vulnerability factor: A metric for measuring information leakage. In ISCA, 2012.


[19] R. Dingledine. Anonymity bibliography.http://freehaven.net/anonbib/.

[20] C. Gentry. Fully homomorphic encryption using ideal lattices. In ACM Symposium on Theory of Computing, 2009.

[21] C. Gentry and S. Halevi. A working implementation offully homomorphic encryption. In EUROCRYPT, 2010.

[22] O. Goldreich and R. Ostrovsky. Software protection and simulation on oblivious RAMs. Journal of the ACM, 1996.

[23] M. T. Goodrich and M. Mitzenmacher. Privacy-preserving access of outsourced data via oblivious RAM simulation. In ICALP, 2011.

[24] J. A. Halderman, S. D. Schoen, N. Heninger, W. Clarkson, W. Paul, J. A. Calandrino, A. J. Feldman, J. Appelbaum, and E. W. Felten. Lest we remember: Cold-boot attacks on encryption keys. Communications of the ACM, 52(5):91–98, 2009.

[25] S. Huang, J. Huang, Y. Liu, L. Yi, and J. Dai. HiBench: A representative and comprehensive Hadoop benchmark suite. In ICDE Workshops, 2010.

[26] M. Isard, M. Budiu, Y. Yu, A. Birrell, and D. Fetterly. Dryad: Distributed data-parallel programs from sequential building blocks. In EuroSys, 2007.

[27] D. Jiang, G. Chen, B. C. Ooi, K.-L. Tan, and S. Wu. epiC: An extensible and scalable system for processing big data. In VLDB, 2014.

[28] J. Katz and Y. Lindell. Introduction to Modern Cryptography. CRC Press, 2014.

[29] T. Kim, M. Peinado, and G. Mainar-Ruiz. StealthMem: System-level protection against cache-based side channel attacks in the cloud. In USENIX Security, 2012.

[30] M. Klonowski and M. Kutyłowski. Provable anonymity for networks of mixes. In Information Hiding, pages 26–38. Springer, 2005.

[31] F. Li, B. C. Ooi, M. T. Özsu, and S. Wu. Distributed data management using MapReduce. ACM Computing Surveys, 46(6), 2014.

[32] Y. Li, J. McCune, J. Newsome, A. Perrig, B. Baker, and W. Drewry. MiniBox: A two-way sandbox for x86 native code. In USENIX ATC, 2014.

[33] C. Liu, M. Hicks, A. Harris, M. Tiwari, M. Maas, and E. Shi. GhostRider: A hardware-software system for memory trace oblivious computation. In ASPLOS, 2015.

[34] C. Liu, M. Hicks, and E. Shi. Memory trace oblivious program execution. In IEEE CSF, pages 51–65, 2013.

[35] C. Liu, Y. Huang, E. Shi, J. Katz, and M. Hicks. Automating efficient RAM-model secure computation. In IEEE Symposium on Security and Privacy, pages 623–638, 2014.

[36] Y. Low, J. Gonzalez, A. Kyrola, D. Bickson, C. Guestrin, and J. M. Hellerstein. Distributed GraphLab: A framework for machine learning and data mining in the cloud. In VLDB, 2012.

[37] L. Luu, S. Shinde, P. Saxena, and B. Demsky. A model counter for constraints over unbounded strings. In PLDI, 2014.

[38] J. M. McCune, B. Parno, A. Perrig, M. K. Reiter, and H. Isozaki. Flicker: An execution infrastructure for TCB minimization. In EuroSys, 2008.

[39] J. McCune, Y. Li, N. Qu, Z. Zhou, A. Datta, V. Gligor, and A. Perrig. TrustVisor: Efficient TCB reduction and attestation. In IEEE Symposium on Security and Privacy, pages 143–158, 2010.

[40] F. McKeen, I. Alexandrovich, A. Berenzon, C. V. Rozas, H. Shafi, V. Shanbhogue, and U. R. Savagaonkar. Innovative instructions and software model for isolated execution. In HASP, 2013.

[41] D. Molnar, M. Piotrowski, D. Schultz, and D. Wagner. The program counter security model: Automatic detection and removal of control-flow side channel attacks. In Information Security and Cryptology (ICISC 2005), pages 156–168. Springer, 2006.

[42] K. Nayak, X. S. Wang, S. Ioannidis, U. Weinsberg, N. Taft, and E. Shi. GraphSC: Parallel secure computation made easy. In IEEE Symposium on Security and Privacy, 2015.

[43] O. Ohrimenko. Data-oblivious algorithms for privacy-preserving access to cloud storage. PhD thesis, Brown University, 2014.

[44] L. Page, S. Brin, R. Motwani, and T. Winograd. The PageRank citation ranking: Bringing order to the web. Technical report, Stanford InfoLab, 1999.

[45] P. Paillier. Public-key cryptosystems based on composite degree residuosity classes. In EUROCRYPT, 1999.

[46] R. A. Popa, F. H. Li, and N. Zeldovich. An ideal-security protocol for order-preserving encoding. In IEEE Symposium on Security and Privacy, pages 463–477, 2013.

[47] R. A. Popa, C. M. S. Redfield, N. Zeldovich, and H. Balakrishnan. CryptDB: Protecting confidentiality with encrypted query processing. In SOSP, 2011.

[48] I. Roy, S. T. V. Setty, A. Kilzer, V. Shmatikov, and E. Witchel. Airavat: Security and privacy for MapReduce. In NSDI, 2010.

[49] F. Schuster, M. Costa, C. Fournet, C. Gkantsidis, M. Peinado, G. Mainar-Ruiz, and M. Russinovich. VC3: Trustworthy data analytics in the cloud. Technical report, Microsoft Research, 2014.

[50] A. Seshadri, M. Luk, N. Qu, and A. Perrig. SecVisor: A tiny hypervisor to provide lifetime kernel code integrity for commodity OSes. In SOSP, pages 335–350, 2007.

[51] S. Shinde, Z. L. Chua, V. Narayanan, and P. Saxena. Preventing your faults from telling your secrets: Defenses against pigeonhole attacks. CoRR, abs/1506.04832, 2015.

[52] S. Shinde, S. Tople, D. Kathayat, and P. Saxena. PodArch: Protecting legacy applications with a purely hardware TCB. Technical report, National University of Singapore, 2015.


[53] V. Shmatikov and M.-H. Wang. Measuring relationship anonymity in mix networks. In ACM Workshop on Privacy in the Electronic Society, pages 59–62, 2006.

[54] D. X. Song, D. Wagner, and A. Perrig. Practical techniques for searches on encrypted data. In IEEE Symposium on Security and Privacy, 2000.

[55] E. Stefanov, M. van Dijk, E. Shi, C. Fletcher, L. Ren, X. Yu, and S. Devadas. Path ORAM: An extremely simple oblivious RAM protocol. In CCS, 2013.

[56] U. Steinberg and B. Kauer. NOVA: A microhypervisor-based secure virtualization architecture. In EuroSys, 2010.

[57] J. Szefer and R. B. Lee. Architectural support for hypervisor-secure virtualization. In ASPLOS, 2012.

[58] S. Tople, S. Shinde, Z. Chen, and P. Saxena. AutoCrypt: Enabling homomorphic computation on servers to protect sensitive web content. In ACM CCS, pages 1297–1310, 2013.

[59] S. Tu, M. F. Kaashoek, S. Madden, and N. Zeldovich. Processing analytical queries over encrypted data. In VLDB, 2013.

[60] D. Wikström. A universally composable mix-net. In Theory of Cryptography (TCC), 2004.

[61] C. Wilson, H. Ballani, T. Karagiannis, and A. Rowtron. Better never than late: Meeting deadlines in datacenter networks. In SIGCOMM, 2011.

[62] M. Zaharia, M. Chowdhury, T. Das, A. Dave, J. Ma, M. McCauley, M. J. Franklin, S. Shenker, and I. Stoica. Resilient distributed datasets: A fault-tolerant abstraction for in-memory cluster computing. In NSDI, 2012.

[63] F. Zhang, J. Chen, H. Chen, and B. Zang. CloudVisor: Retrofitting protection of virtual machines in multi-tenant cloud with nested virtualization. In SOSP, pages 203–216, 2011.

[64] H. Zhang, G. Chen, B. C. Ooi, K.-L. Tan, and M. Zhang. In-memory big data management and processing: A survey. TKDE, 27(7):1920–1948, 2015.

[65] Z. Zhou, J. Han, Y.-H. Lin, A. Perrig, and V. Gligor. KISS: "Key It Simple and Secure" corporate key management. In TRUST, 2013.

Appendix A Security Analysis

Proof (Lemma 1): Consider the "ideal mixer" that takes as input a sequence ⟨x1, ..., xN⟩ where each xi ∈ [1, N], picks a permutation p : [1, N] → [1, N] uniformly at random, and then outputs the sequence ⟨xp(1), xp(2), ..., xp(N)⟩. Klonowski et al. [30] investigated the effectiveness of the cascaded network of mixing and showed that O(log(N/T)) steps suffice to bring the distribution of the mixed sequence statistically close to the output of the ideal mixer, where T is the number of items an instance can process in memory. Our proof relies on this result.
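For concreteness, the ideal mixer is nothing more than a uniformly random permutation of its input, as the following minimal sketch shows.

```java
import java.security.SecureRandom;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// The "ideal mixer": outputs the input sequence under a uniformly
// random permutation p (a direct transcription of the definition).
class IdealMixer {
    static <T> List<T> mix(List<T> xs) {
        List<T> out = new ArrayList<>(xs);
        Collections.shuffle(out, new SecureRandom()); // picks p uniformly
        return out;                                   // <x_p(1), ..., x_p(N)>
    }
}
```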

Let us assume that κ, the number of steps carried out by the cascaded mix, is sufficiently large that the distribution of the mixed sequence is statistically close to that of the ideal mixer.

Consider an adversary S that executes the cascaded mix. We construct an adversary A that simulates S but has access only to Ω. To fill in the tuple values not present in Ω, the simulation simply uses random tuples. Note that the number of tuples can be derived from Ω.

Now, suppose that on input x1, ..., xN, the outputs of A and S can be distinguished by some D. We want to show that this contradicts the semantic security of the underlying encryption scheme, by constructing a distinguisher D′ that can distinguish multiple ciphertexts from random with polynomial-time sampling (i.e., the distinguisher sends the challenger multiple messages and receives more than one sample).

Let z = ⟨z1, z2, ..., zN⟩ be the output of the mixer on input x1, ..., xN. The distinguisher D′ asks the challenger for a sequence of ciphertexts of z. Let the ci,j be the ciphertexts returned by the challenger, where ci,j is the i-th ciphertext of zj. To emulate S, D′ likewise needs to feed the simulation with the intermediate data generated by mixT. Let yi,j be the i-th intermediate ciphertext in round j that D′ generates for the emulation. The yi,j are generated as follows:

1. D′ simulates the cascaded mix by randomly picking a permutation for every mixT. Let pj : [1, N] → [1, N] be the overall permutation for round j, and let p̄j be the cumulative permutation that moves the i-th ciphertext in the input to its location after j rounds. That is, p̄j(i) = pj(p̄j−1(i)), with p̄0(i) = i.

2. Set yi,j = cp̄j(i),j for each i, j.
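The recurrence in step 1 composes the per-round permutations round by round; the short sketch below (illustrative, 0-indexed) shows this composition.

```java
// Illustrative composition of per-round permutations into the
// cumulative permutations p̄_j used in step 2 (arrays are 0-indexed).
class CumulativePermutation {
    // perRound[j][i] is the round-(j+1) permutation p applied to
    // position i; the result pbar[j][i] is the position of input i
    // after j+1 rounds.
    static int[][] compose(int[][] perRound, int n) {
        int rounds = perRound.length;
        int[][] pbar = new int[rounds][n];
        int[] prev = new int[n];
        for (int i = 0; i < n; i++) prev[i] = i;   // p̄_0 is the identity
        for (int j = 0; j < rounds; j++) {
            for (int i = 0; i < n; i++) {
                pbar[j][i] = perRound[j][prev[i]]; // p̄_j(i) = p_j(p̄_{j-1}(i))
            }
            prev = pbar[j];
        }
        return pbar;
    }
}
```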

Let v be the output of D′'s simulation. Note that if the ci,j are random ciphertexts, then the distribution of v is the same as the output distribution of A. On the other hand, if the ci,j are ciphertexts of z, then the input to the emulation is statistically close to the input of S, and thus the distribution of v is statistically close to the output distribution of S.

Since D distinguishes the output of S from that of A, the distinguisher D′ (which runs D on v) can distinguish the ciphertexts of z from random, contradicting the semantic security of the encryption scheme.
