
Countering Targeted File Attacks using LocationGuard

Mudhakar Srivatsa and Ling Liu
College of Computing, Georgia Institute of Technology

    {mudhakar, lingliu}@cc.gatech.edu

    Abstract

Serverless file systems, exemplified by CFS, Farsite and OceanStore, have received significant attention from both the industry and the research community. These file systems store files on a large collection of untrusted nodes that form an overlay network. They use cryptographic techniques to maintain file confidentiality and integrity from malicious nodes. Unfortunately, cryptographic techniques cannot protect a file holder from a Denial-of-Service (DoS) or a host compromise attack. Hence, most of these distributed file systems are vulnerable to targeted file attacks, wherein an adversary attempts to attack a small (chosen) set of files by attacking the nodes that host them. This paper presents LocationGuard, a location hiding technique for securing overlay file storage systems from targeted file attacks. LocationGuard has three essential components: (i) the location key, consisting of a random bit string (e.g., 128 bits) that serves as the key to the location of a file, (ii) the routing guard, a secure algorithm that protects accesses to a file in the overlay network given its location key such that neither its key nor its location is revealed to an adversary, and (iii) a set of four location inference guards. Our experimental results quantify the overhead of employing LocationGuard and demonstrate its effectiveness against DoS attacks, host compromise attacks and various location inference attacks.

    1 Introduction

A new breed of serverless file storage services, like CFS [7], Farsite [1], OceanStore [15] and SiRiUS [10], has recently emerged. In contrast to traditional file systems, they harness the resources available at desktop workstations that are distributed over a wide-area network. The collective resources available at these desktop workstations amount to several peta-flops of computing power and several hundred peta-bytes of storage space [1].

These emerging trends have motivated serverless file storage as one of the most popular applications over decentralized overlay networks. An overlay network is a virtual network formed by nodes (desktop workstations) on top of an existing TCP/IP network. Overlay networks typically support a lookup protocol. A lookup operation identifies the location of a file given its filename. The location of a file denotes the IP-address of the node that currently hosts the file.

There are four important issues that need to be addressed to enable wide deployment of serverless file systems for mission critical applications.

Efficiency of the lookup protocol. Two kinds of lookup protocols have been commonly deployed: the Gnutella-like broadcast based lookup protocols [9] and the distributed hash table (DHT) based lookup protocols [25, 19, 20]. File systems like CFS, Farsite and OceanStore use DHT-based lookup protocols because of their ability to locate any file in a small and bounded number of hops.

Malicious and unreliable nodes. Serverless file storage services are faced with the challenge of harnessing the collective resources of loosely coupled, insecure, and unreliable machines to provide a secure and reliable file-storage service. To complicate matters further, some of the nodes in the overlay network could be malicious. CFS employs cryptographic techniques to maintain file data confidentiality and integrity. Farsite permits file write and update operations by using a Byzantine fault-tolerant group of meta-data servers (directory service). Both CFS and Farsite use replication as a technique to provide higher fault-tolerance and availability.

Targeted File Attacks. A major drawback of serverless file systems like CFS, Farsite and OceanStore is that they are vulnerable to targeted attacks on files.


In a targeted attack, an adversary is interested in compromising a small set of target files through a DoS attack or a host compromise attack. A denial-of-service attack would render the target file unavailable; a host compromise attack could corrupt all the replicas of a file, thereby effectively wiping out the target file from the file system. The fundamental problem with these systems is that: (i) the number of replicas (R) maintained by the system is usually much smaller than the number of malicious nodes (B), and (ii) the replicas of a file are stored at publicly known locations. Hence, malicious nodes can easily launch DoS or host compromise attacks on the set of R replica holders of a target file (R << B).

Efficient Access Control. A read-only file system like CFS can exercise access control by simply encrypting the contents of each file, and distributing the keys only to the legal users of that file. Farsite, a read-write file system, exercises access control using access control lists (ACL) that are maintained using a Byzantine fault-tolerant protocol. However, access control is not truly distributed in Farsite because all users are authenticated by a small collection of directory-group servers. Further, PKI (public-key infrastructure) based authentication and Byzantine fault tolerance based authorization are known to be more expensive than a simple and fast capability-based access control mechanism [6].

In this paper we present LocationGuard as an effective technique for countering targeted file attacks. The fundamental idea behind LocationGuard is to hide the very location of a file and its replicas such that a legal user who possesses a file's location key can easily and securely locate the file on the overlay network; but without knowing the file's location key, an adversary would not be able to even locate the file, let alone access it or attempt to attack it. Further, an adversary would not even be able to learn whether a particular file exists in the file system or not. LocationGuard comprises three essential components. The first component of LocationGuard is a location key, which is a 128-bit string used as a key to the location of a file in the overlay network. A file's location key is used to generate legal capabilities (tokens) that can be used to access its replicas. The second component is the routing guard, a secure algorithm to locate a file in the overlay network given its location key such that neither the key nor the location is revealed to an adversary. The third component is an extensible collection of location inference guards, which protect the system from traffic analysis based inference attacks, such as lookup frequency inference attacks, end-user IP-address inference attacks, file replica inference attacks, and file size inference attacks. LocationGuard presents a careful combination of location key, routing guard, and location inference guards, aiming at making it very hard for an adversary to infer the location of a target file by either actively or passively observing the overlay network.

In addition to traditional cryptographic guarantees like file confidentiality and integrity, LocationGuard mitigates denial-of-service (DoS) and host compromise attacks, while adding very little performance overhead and minimal storage overhead to the file system. Our initial experiments quantify the overhead of employing LocationGuard and demonstrate its effectiveness against DoS attacks, host compromise attacks and various location inference attacks.

The rest of the paper is organized as follows. Section 2 provides terminology and background on overlay networks and serverless file systems like CFS and Farsite. Section 3 describes our threat model in detail. We present an abstract functional description of LocationGuard in Section 4. Section 5 describes the design of our location keys and Section 6 presents a detailed description of the routing guard. We outline a brief discussion on overall system management in Section 8 and present a thorough experimental evaluation of LocationGuard in Section 9. Finally, we present some related work in Section 10, and conclude the paper in Section 11.

    2 Background and Terminology

In this section, we give a brief overview of the vital properties of DHT-based overlay networks and their lookup protocols (e.g., Chord [25], CAN [19], Pastry [20], Tapestry [3]). All these lookup protocols are fundamentally based on distributed hash tables, but differ in algorithmic and implementation details. All of them store the mapping between a particular search key and its associated data (file) in a distributed manner across the network, rather than storing it at a single location like a conventional hash table. Given a search key, these techniques locate its associated data (file) in a small and bounded number of hops within the overlay network. This is realized using three main steps. First, nodes and search keys are hashed to a common identifier space such that each node is given a unique identifier and is made responsible for a certain set of search keys.


Second, the mapping of search keys to nodes uses policies like numerical closeness or contiguous regions between two node identifiers to determine the (non-overlapping) region (segment) that each node will be responsible for. Third, a small and bounded lookup cost is guaranteed by maintaining a tiny routing table and a neighbor list at each node.

In the context of a file system, the search key can be a filename and the identifier can be the IP address of a node. All the available nodes' IP addresses are hashed using a hash function, and each node stores a small routing table (for example, Chord's routing table has only m entries for an m-bit hash function, and typically m = 128) to locate other nodes. Now, to locate a particular file, its filename is hashed using the same hash function and the node responsible for that file is obtained using the concrete mapping policy. This operation of locating the appropriate node is called a lookup.
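To make the hashing and lookup steps concrete, here is a minimal sketch in Python. It is not the paper's implementation: it assumes SHA-256 truncated to an m = 128 bit ring and a plain successor scan in place of Chord's finger-table routing, and the node addresses and example filename are made up.

```python
import hashlib

M = 128                      # identifier bits (the text cites m = 128 as typical)
RING = 1 << M

def chord_id(key: str) -> int:
    """Hash a filename or node address onto the common identifier ring."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest(), "big") % RING

def successor(ident: int, node_ids: list[int]) -> int:
    """Return the first node identifier that succeeds `ident` on the ring.
    (A real Chord node reaches this successor in O(log N) hops via its
    finger table; here we simply scan a sorted list.)"""
    ring = sorted(node_ids)
    for nid in ring:
        if nid >= ident:
            return nid
    return ring[0]            # wrap around the ring

# Illustrative nodes and lookup (addresses are made up):
nodes = {chord_id(addr): addr for addr in ["10.0.0.1", "10.0.0.2", "10.0.0.3"]}
holder = nodes[successor(chord_id("report.txt"), list(nodes))]
print("file 'report.txt' is hosted at", holder)
```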

Serverless file systems like CFS, Farsite and OceanStore are layered on top of DHT-based protocols. These file systems typically provide the following properties: (1) a file lookup is guaranteed to succeed if and only if the file is present in the system, (2) file lookup terminates in a small and bounded number of hops, (3) files are uniformly distributed among all active nodes, and (4) the system handles dynamic node joins and leaves.

In the rest of this paper, we assume that Chord [25] is used as the overlay network's lookup protocol. However, the results presented in this paper are applicable to most DHT-based lookup protocols.

    3 Threat Model

Adversary refers to a logical entity that controls and coordinates all actions by malicious nodes in the system. A node is said to be malicious if the node either intentionally or unintentionally fails to follow the system's protocols correctly. For example, a malicious node may corrupt the files assigned to it and incorrectly (maliciously) implement file read/write operations. This definition of adversary permits collusion among malicious nodes.

We assume that the underlying IP-network layer may be insecure. However, we assume that the underlying IP-network infrastructure, such as the domain name service (DNS) and the network routers, cannot be subverted by the adversary.

An adversary is capable of performing two types of attacks on the file system, namely, the denial-of-service attack and the host compromise attack. When a node is under a denial-of-service attack, the files stored at that node are unavailable. When a node is compromised, the files stored at that node could be either unavailable or corrupted. We model the malicious nodes as having a large but bounded amount of physical resources at their disposal. More specifically, we assume that a malicious node may be able to perform a denial-of-service attack only on a finite and bounded number of good nodes, denoted by α. We limit the rate at which malicious nodes may compromise good nodes and use λ to denote the mean rate per malicious node at which a good node can be compromised. For instance, when there are B malicious nodes in the system, the net rate at which good nodes are compromised is λ * B (node compromises per unit time). Note that it does not really help for one adversary to pose as multiple nodes (say, using virtualization technology) since the effective compromise rate depends only on the aggregate strength of the adversary. Every compromised node behaves maliciously. For instance, a compromised node may attempt to compromise other good nodes. Every good node that is compromised would independently recover at rate µ. Note that the recovery of a compromised node is analogous to cleaning up a virus or a worm from an infected node. When the recovery process ends, the node stops behaving maliciously. Unless otherwise specified, we assume that the rates λ and µ follow an exponential distribution.
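As a rough illustration of this threat model (not part of the paper's simulator), the sketch below samples exponentially distributed compromise and recovery times; the parameter values are arbitrary.

```python
import random

B = 100      # number of malicious nodes (illustrative)
lam = 0.01   # mean per-malicious-node compromise rate (compromises per unit time)
mu = 0.05    # per-node recovery rate

# Net compromise rate is lam * B; inter-event times are exponentially distributed.
def time_to_next_compromise() -> float:
    return random.expovariate(lam * B)

def time_to_recover() -> float:
    return random.expovariate(mu)

print("expected time to next compromise:", 1.0 / (lam * B))
print("sampled compromise time:", time_to_next_compromise())
print("sampled recovery time:", time_to_recover())
```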

    3.1 Targeted File Attacks

A targeted file attack refers to an attack wherein an adversary attempts to attack a small (chosen) set of files in the system. An attack on a file is successful if the target file is either rendered unavailable or corrupted. Let f_n denote the name of a file f and f_d denote the data in file f. Given R replicas of a file f, file f is unavailable (or corrupted) if at least a threshold number c_r of its replicas are unavailable (or corrupted). For example, for read/write files maintained by a Byzantine quorum [1], c_r = ⌈R/3⌉. For encrypted and authenticated files, c_r = R, since the file can be successfully recovered as long as at least one of its replicas is available (and uncorrupted) [7]. Most P2P trust management systems such as [27] use a simple majority vote on the replicas to compute the actual trust values of peers; thus we have c_r = ⌈R/2⌉.
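The three corruption thresholds quoted above are easy to tabulate; the helper below is purely illustrative (the scheme labels are ours, not the paper's).

```python
import math

def corruption_threshold(R: int, scheme: str) -> int:
    """Minimum number of replicas an adversary must corrupt (or make
    unavailable) to defeat a file, for the schemes cited in the text."""
    if scheme == "byzantine_quorum":        # read/write files, Farsite-style [1]
        return math.ceil(R / 3)
    if scheme == "encrypted_authenticated": # read-only files, CFS-style [7]
        return R
    if scheme == "majority_vote":           # P2P trust management [27]
        return math.ceil(R / 2)
    raise ValueError(scheme)

print([corruption_threshold(7, s) for s in
       ("byzantine_quorum", "encrypted_authenticated", "majority_vote")])
# -> [3, 7, 4] for R = 7 replicas
```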


Distributed file systems like CFS and Farsite are highly vulnerable to targeted file attacks since the target file can be rendered unavailable (or corrupted) by attacking a very small set of nodes in the system. The key problem arises from the fact that these systems store the replicas of a file f at publicly known locations [13] for easy lookup. For instance, CFS stores a file f at locations derivable from the public key of its owner. An adversary can attack any set of c_r replica holders of file f to render file f unavailable (or corrupted). Farsite utilizes a small collection of publicly known nodes for implementing a Byzantine fault-tolerant directory service. On compromising the directory service, an adversary could obtain the locations of all the replicas of a target file.

Files on an overlay network have two primary attributes: (i) content and (ii) location. File content could be protected from an adversary using cryptographic techniques. However, if the location of a file on the overlay network is publicly known, then the file holder is susceptible to DoS and host compromise attacks. LocationGuard provides mechanisms to hide files in an overlay network such that only a legal user who possesses a file's location key can easily locate it. Further, an adversary would not even be able to learn whether a particular file exists in the file system or not. Thus, any previously known attacks on file contents would not be applicable unless the adversary succeeds in locating the file. It is important to note that LocationGuard is oblivious to whether or not file contents are encrypted. Hence, LocationGuard can be used to protect files whose contents cannot be encrypted, say, to permit regular expression based keyword search on file contents.

    4 LocationGuard

    4.1 Overview

We first present a high level overview of LocationGuard. Figure 1 shows an architectural overview of a file system powered by LocationGuard. LocationGuard operates on top of an overlay network of N nodes. Figure 2 provides a sketch of the conceptual design of LocationGuard. The LocationGuard scheme guards the location of each file and its access with two objectives: (1) to hide the actual location of a file and its replicas such that only legal users who hold the file's location key can easily locate the file on the overlay network, and (2) to guard lookups on the overlay network from being eavesdropped by an adversary.

LocationGuard consists of three core components. The first component is the location key, which controls the transformation of a filename into its location on the overlay network, analogous to a traditional cryptographic key that controls the transformation of plaintext into ciphertext. The second component is the routing guard, which makes the location of a file unintelligible. The routing guard is, to some extent, analogous to a traditional cryptographic algorithm that makes a file's contents unintelligible. The third component of LocationGuard is an extensible package of location inference guards that protect the file system from indirect attacks. Indirect attacks are those attacks that exploit a file's metadata information, such as file access frequency, end-user IP-address, equivalence of file replica contents and file size, to infer the location of a target file on the overlay network. In this paper we focus only on the first two components, namely, the location key and the routing guard. For a detailed discussion of location inference guards, refer to our technical report [23].

In the following subsections, we first present the main concepts behind location keys and location hiding (Section 4.2) and describe a reference model for serverless file systems that operate on LocationGuard (Section 4.3). Then we present the concrete design of LocationGuard's core components: the location key (Section 5) and the routing guard (Section 6).

    4.2 Concepts and Definitions

In this section we define the concept of location keys and their location hiding properties. We discuss the concrete design of the location key implementation, and how location keys and location guards protect a file system from targeted file attacks, in the subsequent sections.

Consider an overlay network of size N with a Chord-like lookup protocol Γ. Let f^1, f^2, ..., f^R denote the R replicas of a file f. The location of a replica f^i refers to the IP-address of the node (replica holder) that stores replica f^i. A file lookup algorithm is defined as a function that accepts f^i and outputs its location on the overlay network. Formally, Γ : f^i → loc maps a replica f^i to its location loc on the overlay network P.

Figure 1: LocationGuard: System Architecture

Figure 2: LocationGuard: Conceptual Design

Definition 1 (Location Key): A location key lk of a file f is a relatively small amount of information (an m-bit binary string, typically m = 128) that is used by a lookup algorithm Ψ : (f, lk) → loc to customize the transformation of a file into its location such that the following three properties are satisfied:

1. Given the location key of a file f, it is easy to locate the R replicas of file f.

2. Without knowing the location key of a file f, it is hard for an adversary to locate any of its replicas.

3. The location key lk of a file f should not be exposed to an adversary when it is used to access the file f.

Informally, location keys are keys with a location hiding property. Each file in the system is associated with a location key that is kept secret by the users of that file. A location key for a file f determines the locations of its replicas in the overlay network. Note that the lookup algorithm Ψ is publicly known; only a file's location key is kept secret.

Property 1 ensures that valid users of a file f can easily access it provided they know its location key lk. Property 2 guarantees that illegal users who do not have the correct location key will not be able to locate the file on the overlay network, making it harder for an adversary to launch a targeted file attack. Property 3 warrants that no information about the location key lk of a file f is revealed to an adversary when executing the lookup algorithm Ψ.

Having defined the concept of location keys, we present a reference model for a file system that operates on LocationGuard. We use this reference model to present a concrete design of LocationGuard's three core components: the location key, the routing guard and the location inference guards.

    4.3 Reference Model

A serverless file system may implement read/write operations by exercising access control in a number of ways. For example, Farsite [1] uses an access control list maintained among a small number of directory servers through a Byzantine fault-tolerant protocol. CFS [7], a read-only file system, implements access control by encrypting the files and distributing the file encryption keys only to the legal users of a file. In this section we show how a LocationGuard based file system exercises access control.

In contrast to other serverless file systems, a LocationGuard based file system does not directly authenticate any user attempting to access a file. Instead, it uses location keys to implement a capability-based access control mechanism, that is, any user who presents the correct file capability (token) is permitted access to that file. In addition to file token based access control, LocationGuard derives the encryption key for a file from its location key. This makes it very hard for an adversary to read file data from compromised nodes. Furthermore, it utilizes the routing guard and location inference guards to secure the locations of files being accessed on the overlay network. Our access control policy is simple: if you can name a file, then you can access it. However, we do not use a file name directly; instead, we use a pseudo-filename (a 128-bit binary string) generated from a file's name and its location key (see Section 5 for details). The responsibility of access control is divided among the file owner, the legal file users, and the file replica holders, and is managed in a decentralized manner.

File Owner. Given a file f, its owner u is responsible for securely distributing f's location key lk (only) to those users who are authorized to access the file f.


Legal User. A user u who has obtained the valid location key of file f is called a legal user of f. Legal users are authorized to access any replica of file f. Given file f's location key lk, a legal user u can generate the replica location token rlt^i for its i-th replica. Note that we use rlt^i as both the pseudo-filename and the capability of f^i. The user u now uses the lookup algorithm Ψ to obtain the IP-address of node r = Ψ(rlt^i) (lookup on pseudo-filename rlt^i). User u gains access to replica f^i by presenting the token rlt^i to node r (capability rlt^i).

Non-malicious Replica Holder. Assume that a node r is responsible for storing replica f^i. Internally, node r stores this file content under the file name rlt^i. Note that node r does not need to know the actual file name (f) of a locally stored file rlt^i. Also, by design, given the internal file name rlt^i, node r cannot guess its actual file name (see Section 5). When node r receives a read/write request on a file rlt^i, it checks whether a file named rlt^i is present locally. If so, it directly performs the requested operation on the local file rlt^i. Access control follows from the fact that it is very hard for an adversary to guess correct file tokens.
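The replica holder's access check thus reduces to a lookup on the presented token. The sketch below is a hypothetical in-memory illustration of that behavior, not the paper's implementation; the class and method names are made up.

```python
class ReplicaHolder:
    """Sketch of a good node r: replicas are stored under their tokens rlt_i,
    and r never learns the real filename f. Access control is simply
    'present the right token'."""
    def __init__(self):
        self._files = {}                       # pseudo-filename (token) -> bytes

    def store(self, token: str, data: bytes):  # initial placement by the owner
        self._files[token] = data

    def request(self, op: str, token: str, data: bytes = b"") -> bytes:
        if token not in self._files:           # unknown capability: deny
            raise PermissionError("no such file")
        if op == "write":
            self._files[token] = data
        return self._files[token]
```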

Malicious Replica Holder. Let us consider the case where the node r that stores a replica f^i is malicious. Note that node r's response to a file read/write request can be undefined. Recall that we have assumed that the replicas stored at malicious nodes are always under attack (up to c_r − 1 out of R file replicas could be unavailable or corrupted). Hence, the fact that a malicious replica holder incorrectly implements file read/write operations, or that the adversary is aware of the tokens of those file replicas stored at malicious nodes, does not harm the system. Also, by design, an adversary who knows one token rlt^i for replica f^i would not be able to guess the file name f, its location key lk, or the tokens for the other replicas of file f (see Section 5).

Adversary. An adversary cannot access any replica of file f stored at a good node simply because it cannot guess the token rlt^i without knowing its location key. However, when a good node is compromised, an adversary would be able to directly obtain the tokens for all files stored at that node. In general, an adversary could compile a list of tokens as it compromises good nodes, and corrupt the file replicas corresponding to these tokens at any later point in time. Eventually, the adversary would succeed in corrupting c_r or more replicas of a file f without knowing its location key. LocationGuard addresses such attacks using the location rekeying technique discussed in Section 7.3.

In the subsequent sections, we show how to generate a replica location token rlt^i (1 ≤ i ≤ R) from a file f and its location key (Section 5), and how the lookup algorithm Ψ performs a lookup on a pseudo-filename rlt^i without revealing the capability rlt^i to malicious nodes in the overlay network (Section 6). It is important to note that the ability to guard the lookup from attacks like eavesdropping is critical to the ultimate goal of the file location hiding scheme, since a lookup operation using a lookup protocol (such as Chord) on an identifier rlt^i typically proceeds in plain text through a sequence of nodes on the overlay network. Hence, an adversary may collect file tokens by simply sniffing lookup queries over the overlay network. The adversary could use these stolen file tokens to perform write operations on the corresponding file replicas, and thus corrupt them, without knowing their location keys.

    5 Location Keys

The first and simplest component of LocationGuard is the concept of location keys. The design of location keys needs to address the following two questions: (1) How to choose a location key? (2) How to use a location key to generate a replica location token, the capability to access a file replica?

The first step in designing location keys is to determine the type of string used as a location key. Let user u be the owner of a file f. User u should choose a long random bit string (128 bits) lk as the location key for file f. The location key lk should be hard to guess. For example, the key lk should not be semantically attached to or derived from the file name (f) or the owner name (u).

The second step is to find a pseudo-random function to derive the replica location tokens rlt^i (1 ≤ i ≤ R) from the filename f and its location key lk. The pseudo-filename rlt^i is used as a file replica identifier to locate the i-th replica of file f on the overlay network. Let E_lk(x) denote a keyed pseudo-random function with input x and a secret key lk, and let ‖ denote string concatenation. We derive the location token rlt^i = E_lk(f_n ‖ i) (recall that f_n denotes the name of file f). Given a replica's identifier rlt^i, one can use the lookup protocol Ψ to locate it on the overlay network. The function E should satisfy the following conditions:


1a) Given (f_n ‖ i) and lk, it is easy to compute E_lk(f_n ‖ i).

2a) Given (f_n ‖ i), it is hard to guess E_lk(f_n ‖ i) without knowing lk.

2b) Given E_lk(f_n ‖ i), it is hard to guess the file name f_n.

2c) Given E_lk(f_n ‖ i) and the file name f_n, it is hard to guess lk.

Condition 1a) ensures that it is very easy for a valid user to locate a file f as long as it is aware of the file's location key lk. Condition 2a) states that it should be very hard for an adversary to guess the location of a target file f without knowing its location key. Condition 2b) ensures that even if an adversary obtains the identifier rlt^i of replica f^i, he/she cannot deduce the file name f. Finally, Condition 2c) requires that even if an adversary obtains the identifiers of one or more replicas of file f, he/she would not be able to derive the location key lk from them; hence, the adversary still has no clue about the remaining replicas of the file f (by Condition 2a). Conditions 2b) and 2c) play an important role in ensuring a good location hiding property, because for any given file f some of its replicas could be stored at malicious nodes, and thus an adversary could be aware of some of the replica identifiers. Finally, observe that Condition 1a) and Conditions {2a), 2b), 2c)} map to Property 1 and Property 2 of Definition 1 (in Section 4.2) respectively.

A number of cryptographic tools satisfy the requirements specified in Conditions 1a), 2a), 2b) and 2c). Some possible candidates for the function E are (i) a keyed-hash function like HMAC-MD5 [14], (ii) a symmetric key encryption algorithm like DES [8] or AES [16], and (iii) a PKI based encryption algorithm like RSA [21]. We chose a keyed-hash function like HMAC-MD5 because it can be computed very efficiently: HMAC-MD5 computation is about 40 times faster than AES encryption and about 1000 times faster than RSA encryption using the standard OpenSSL library [17]. In the remaining part of this paper, we use khash to denote a keyed-hash function that is used to derive a file's replica location tokens from its name and its secret location key.
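A minimal sketch of the token derivation rlt^i = E_lk(f_n ‖ i) follows. It instantiates the keyed hash with HMAC-SHA256 rather than the HMAC-MD5 the authors benchmarked, and the filename, separator and key-generation details are illustrative assumptions.

```python
import hmac, hashlib, secrets

R = 7                                    # number of replicas (as used in Section 9)

def new_location_key() -> bytes:
    return secrets.token_bytes(16)       # 128-bit random location key lk

def replica_tokens(lk: bytes, filename: str, replicas: int = R) -> list[str]:
    """rlt_i = E_lk(f_n || i), instantiated here with HMAC-SHA256."""
    return [hmac.new(lk, f"{filename}||{i}".encode(), hashlib.sha256).hexdigest()
            for i in range(1, replicas + 1)]

lk = new_location_key()
tokens = replica_tokens(lk, "budget.xls")
print(tokens[0])   # pseudo-filename / capability for replica 1
```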

    6 Routing guard

The second and fundamental component of LocationGuard is the routing guard. The design of the routing guard aims at securing the lookup of a file f such that it is very hard for an adversary to obtain the replica location tokens by eavesdropping on the overlay network. Concretely, let rlt^i (1 ≤ i ≤ R) denote a replica location token derived from the file name f, the replica number i, and f's location key lk. We need to secure the lookup algorithm Ψ(rlt^i) such that the lookup on pseudo-filename rlt^i does not reveal the capability rlt^i to other nodes on the overlay network.

Note that a file's capability rlt^i does not reveal the file's name, but it allows an adversary to write to the file and thus corrupt it (see the reference file system in Section 4.3).

There are two possible approaches to implementing a secure lookup algorithm: (1) a centralized approach and (2) a decentralized approach. In the centralized approach, one could use a trusted location server [12] to return the location of any file on the overlay network. However, such a location server would become a viable target for DoS and host compromise attacks.

In this section, we present a decentralized secure lookup protocol that is built on top of the Chord protocol. Note that a naive Chord-like lookup protocol Γ(rlt^i) cannot be used directly because it reveals the token rlt^i to other nodes on the overlay network.

    6.1 Overview

The fundamental idea behind the routing guard is as follows. Given a file f's location key lk and replica number i, we want to find a safe region in the identifier space where we can obtain a huge collection of obfuscated tokens, denoted by OTK^i, such that, with high probability, Γ(otk^i) = Γ(rlt^i) for all otk^i ∈ OTK^i. We call otk^i ∈ OTK^i an obfuscated identifier of the token rlt^i. Each time a user u wishes to look up a token rlt^i, it performs a lookup on some randomly chosen token otk^i from the obfuscated identifier set OTK^i. The routing guard ensures that even if an adversary were to observe obfuscated identifiers from the set OTK^i for one full year, it would be highly infeasible for the adversary to guess the token rlt^i.

We now describe the concrete implementation of the routing guard.


For the sake of simplicity, we assume a unit circle for Chord's identifier space; that is, node identifiers and file identifiers are real values from 0 to 1 that are arranged on the Chord ring in the anti-clockwise direction. Let ID(r) denote the identifier of node r. If r is the destination node of a lookup on file identifier rlt^i, i.e., r = Γ(rlt^i), then r is the node that immediately succeeds rlt^i in the anti-clockwise direction on the Chord ring. Formally, r = Γ(rlt^i) if ID(r) ≥ rlt^i and there exists no other node, say v, on the Chord ring such that ID(r) > ID(v) ≥ rlt^i.

We first introduce the concept of safe obfuscation to guide us in finding an obfuscated identifier set OTK^i for a given replica location token rlt^i. We say that an obfuscated identifier otk^i is a safe obfuscation of identifier rlt^i if and only if a lookup on both rlt^i and otk^i results in the same physical node r. For example, in Figure 3, identifier otk^i_1 is a safe obfuscation of identifier rlt^i (Γ(rlt^i) = Γ(otk^i_1) = r), while identifier otk^i_2 is unsafe (Γ(otk^i_2) = r′ ≠ r).

We define the set OTK^i as the set of all identifiers in the range (rlt^i − srg, rlt^i), where srg denotes a safe obfuscation range (0 ≤ srg < 1). When a user intends to query for a replica location token rlt^i, the user actually performs a lookup on an obfuscated identifier otk^i = obfuscate(rlt^i) = rlt^i − random(0, srg). The function random(0, srg) returns a number chosen uniformly at random from the range (0, srg).
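On the unit-circle identifier space the obfuscation step is a single randomized subtraction; the sketch below assumes the token has already been mapped to [0, 1) and that a safe range srg has been fixed (how srg is chosen is the subject of Section 6.2). It is a minimal illustration, not the paper's code.

```python
import random

def obfuscate(rlt: float, srg: float) -> float:
    """Return otk = rlt - random(0, srg) on the unit Chord ring.
    The subtraction wraps around the ring (mod 1)."""
    return (rlt - random.uniform(0.0, srg)) % 1.0

# e.g. a token at 0.4213 obfuscated within a safe range of 1e-5:
print(obfuscate(0.4213, 1e-5))
```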

    We choose a safe value srg such that:

(C1) With high probability, any obfuscated identifier otk^i is a safe obfuscation of the token rlt^i.

(C2) Given a set of obfuscated identifiers otk^i, it is very hard for an adversary to guess the actual identifier rlt^i.

Note that if srg is too small, condition C1 is more likely to hold, while condition C2 is more likely to fail. In contrast, if srg is too big, condition C2 is more likely to hold but condition C1 is more likely to fail. In our first prototype development of LocationGuard, we introduce a system defined parameter pr_sq to denote the minimum probability with which any obfuscation is required to be safe. In the subsequent sections, we present a technique to derive srg as a function of pr_sq. This permits us to quantify the tradeoff between condition C1 and condition C2.

    6.2 Determining the Safe Obfuscation Range

Observe from Figure 3 that an obfuscation rand applied to identifier rlt^i is safe if rlt^i − rand > ID(r′), where r′ is the immediate predecessor of node r on the Chord ring. Thus, we need rand < rlt^i − ID(r′). The expression rlt^i − ID(r′) denotes the distance between identifiers rlt^i and ID(r′) on the Chord identifier ring, denoted by dist(rlt^i, ID(r′)). Hence, we say that an obfuscation rand is safe with respect to identifier rlt^i if and only if rand < dist(rlt^i, ID(r′)), or equivalently, rand is chosen from the range (0, dist(rlt^i, ID(r′))).

We use Theorem 6.1 to show that Pr(dist(rlt^i, ID(r′)) > x) = e^(−x·N), where N denotes the number of nodes on the overlay network and x denotes any value satisfying 0 ≤ x < 1. Informally, the theorem states that the probability that the predecessor node r′ is further away from the identifier rlt^i decreases exponentially with the distance. For a detailed proof please refer to our technical report [23].

Since an obfuscation rand is safe with respect to rlt^i if dist(rlt^i, ID(r′)) > rand, the probability that an obfuscation rand is safe can be calculated as e^(−rand·N).

Now, one can ensure that the minimum probability of any obfuscation being safe is pr_sq as follows. We first use pr_sq to obtain an upper bound on rand: from e^(−rand·N) ≥ pr_sq, we have rand ≤ −log_e(pr_sq)/N. Hence, if rand is chosen from a safe range (0, srg), where srg = −log_e(pr_sq)/N, then all obfuscations are guaranteed to be safe with probability greater than or equal to pr_sq.

For instance, when we set pr_sq = 1 − 2^−20 and N = 1 million nodes, srg = −log_e(pr_sq)/N ≈ 2^−40. Hence, on a 128-bit Chord ring, rand could be chosen from a range of size 2^128 · 2^−40 = 2^88. Table 1 shows the size of a pr_sq-safe obfuscation range srg for different values of pr_sq. Observe that if we set pr_sq = 1, then srg = −log_e(pr_sq)/N = 0. Hence, if we want 100% safety, the obfuscation range srg must be zero, i.e., the token rlt^i cannot be obfuscated.
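The bound above is straightforward to evaluate; the following sketch computes srg = −log_e(pr_sq)/N on the unit circle and the corresponding range size on a 128-bit ring, reproducing the 2^−40 and 2^88 figures used in the example. This is our own numerical check, not code from the paper.

```python
import math

def safe_range(pr_sq: float, n_nodes: int) -> float:
    """srg on the unit circle such that every obfuscation drawn from
    (0, srg) is safe with probability at least pr_sq."""
    return -math.log(pr_sq) / n_nodes

N = 10**6
srg = safe_range(1 - 2**-20, N)
print(srg)                        # ~9.5e-13, i.e. roughly 2^-40
print(math.log2(srg * 2**128))    # ~88 bits of obfuscation range on a 128-bit ring
```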

Theorem 6.1: Let N denote the total number of nodes in the system. Let dist(x, y) denote the distance between two identifiers x and y on Chord's unit circle. Let node r′ be the node that is the immediate predecessor of an identifier rlt^i on the anti-clockwise unit-circle Chord ring, and let ID(r′) denote the identifier of node r′. Then, the probability that the distance between identifiers rlt^i and ID(r′) exceeds x is given by Pr(dist(rlt^i, ID(r′)) > x) = e^(−x·N), for any 0 ≤ x < 1.


  • rr’

    1

    2

    rlt

    otk

    otk

    Figure 3: Lookup Using File IdentifierObfuscation: Illustration

1 − pr_sq           2^-10   2^-15   2^-20   2^-25   2^-30
srg                 2^98    2^93    2^88    2^83    2^78
E[retries]          2^-10   2^-15   2^-20   2^-25   2^-30
hardness (years)    2^38    2^33    2^28    2^23    2^18

Table 1: Lookup Identifier Obfuscation


    6.3 Ensuring Safe Obfuscation

When pr_sq < 1, there is a small probability that an obfuscated identifier is not safe, namely 1 − pr_sq > 0. We first discuss the motivation for detecting and repairing unsafe obfuscations and then describe how our routing guard guarantees good safety through a self-detection and self-healing process.

We first motivate the need for ensuring safe obfuscations. Let node r be the result of a lookup on identifier rlt^i and node v (v ≠ r) be the result of a lookup on an unsafe obfuscated identifier otk^i. To perform a file read/write operation after locating the node that stores the file f, the user has to present the location token rlt^i to node v.

If the user does not check for unsafe obfuscation, then the file token rlt^i would be exposed to some other node v ≠ r. If node v were malicious, it could misuse the capability rlt^i to corrupt the file replica actually stored at node r.

We require a user to verify whether an obfuscated identifier is safe or not using the following check: an obfuscated identifier otk^i is considered safe if and only if rlt^i ∈ (otk^i, ID(v)), where v = Γ(otk^i). By the definition of v and otk^i, we have otk^i ≤ ID(v) and otk^i ≤ rlt^i (since rand ≥ 0). If otk^i ≤ rlt^i ≤ ID(v), node v should be the immediate successor of the identifier rlt^i and thus be responsible for it. If the check fails, i.e., rlt^i > ID(v), then node v is definitely not a successor of the identifier rlt^i, and the user can flag otk^i as an unsafe obfuscation of rlt^i. For example, referring to Figure 3, otk^i_1 is safe because rlt^i ∈ (otk^i_1, ID(r)) and r = Γ(otk^i_1), while otk^i_2 is unsafe because rlt^i ∉ (otk^i_2, ID(r′)) and r′ = Γ(otk^i_2).

When an obfuscated identifier is flagged as unsafe, the user needs to retry the lookup operation with a new obfuscated identifier. This retry process continues until a safe obfuscation is found or max_retries rounds have been attempted. Since the probability of an unsafe obfuscation is extremely small over multiple random choices of obfuscated tokens (otk^i), a retry is rarely needed. We also found in our experiments that the number of retries required is almost always zero and seldom exceeds one. We believe that setting max_retries to two would suffice even in a highly conservative setting. Table 1 shows the expected number of retries required for a lookup operation for different values of pr_sq.
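Putting the obfuscation, the safety check and the retry loop together, a guarded lookup might look like the sketch below. It is only an illustration under our own simplifications: Γ is a toy successor scan on the unit circle, ring wrap-around corner cases are ignored, and max_retries = 2 follows the suggestion above.

```python
import random

def gamma(ident: float, node_ids: list[float]) -> float:
    """Toy Chord lookup on the unit circle: first node id >= ident (with wrap)."""
    ring = sorted(node_ids)
    return next((nid for nid in ring if nid >= ident), ring[0])

def guarded_lookup(rlt: float, srg: float, node_ids: list[float],
                   max_retries: int = 2) -> float:
    """Look up rlt via an obfuscated identifier, verifying safe obfuscation:
    otk is safe iff rlt lies in (otk, ID(v)) for v = Gamma(otk)."""
    for _ in range(max_retries + 1):
        otk = rlt - random.uniform(0.0, srg)   # obfuscate (Section 6.1)
        v = gamma(otk, node_ids)
        if otk <= rlt <= v:                    # v also succeeds rlt: safe
            return v
    raise RuntimeError("no safe obfuscation found within max_retries")

nodes = [random.random() for _ in range(1000)]
print(guarded_lookup(0.4213, 1e-5, nodes))
```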

    6.4 Strength of Routing guard

The strength of the routing guard refers to its ability to counter lookup-sniffing based attacks. A typical lookup sniffing attack is the range sieving attack. Informally, in a range sieving attack, an adversary sniffs lookup queries on the overlay network and attempts to deduce the actual identifier rlt^i from its multiple obfuscated identifiers. We show that an adversary would have to expend 2^28 years to discover a replica location token rlt^i even if it has observed 2^25 obfuscated identifiers of rlt^i. Note that 2^25 obfuscated identifiers would be available to an adversary if the file replica f^i were accessed once a second for one full year by some legal user of the file f.

One can show that, given multiple obfuscated identifiers, it is non-trivial for an adversary to categorize them into groups such that all obfuscated identifiers in a group are obfuscations of the same identifier.


To simplify the description of the range sieving attack, we consider the worst-case scenario where an adversary is capable of categorizing obfuscated identifiers (say, based on their numerical proximity).

We first describe the range sieving attack concretely, assuming that pr_sq and srg (from Theorem 6.1) are public knowledge. When an adversary obtains an obfuscated identifier otk^i, the adversary knows that the actual capability rlt^i definitely lies within the range RG = (otk^i, otk^i + srg), where (0, srg) denotes a pr_sq-safe range. In fact, if obfuscations are chosen uniformly at random from (0, srg), then given an obfuscated identifier otk^i, the adversary knows nothing more than the fact that the actual identifier rlt^i is uniformly distributed over the range RG = (otk^i, otk^i + srg). However, if a persistent adversary obtains multiple obfuscated identifiers {otk^i_1, otk^i_2, ..., otk^i_nid} that belong to the same target file, the adversary can sieve the identifier space as follows. Let RG_1, RG_2, ..., RG_nid denote the ranges corresponding to the nid random obfuscations of the identifier rlt^i. Then the capability of the target file is guaranteed to lie in the sieved range RG_s = ∩_{j=1}^{nid} RG_j. Intuitively, as the number of obfuscated identifiers (nid) increases, the size of the sieved range RG_s decreases. For all tokens tk ∈ RG_s, the likelihood that the obfuscated identifiers {otk^i_1, otk^i_2, ..., otk^i_nid} are obfuscations of the identifier tk is equal. Hence, the adversary is left with no smarter strategy for searching the sieved range RG_s than a brute-force attack on some random enumeration of the identifiers tk ∈ RG_s.

Let E[|RG_s|] denote the expected size of the sieved range. Theorem 6.2 shows that E[|RG_s|] = srg/nid. Hence, if the safe range srg is significantly larger than nid, then the routing guard can tolerate the range sieving attack. Recall the example in Section 6 where pr_sq = 1 − 2^−20 and N = 10^6, giving a safe range srg = 2^88. Suppose that a target file is accessed once per second for one year; this results in 2^25 file accesses. An adversary who logs all obfuscated identifiers over a year could sieve the range down to about E[|RG_s|] = 2^63. Assuming that the adversary performs a brute-force attack on the sieved range by attempting one file read operation per millisecond, the adversary could try 2^35 read operations per year. Thus, it would take the adversary about 2^63/2^35 = 2^28 years to discover the actual file identifier. For a detailed proof of Theorem 6.2 refer to our technical report [23].
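The arithmetic behind the 2^28-year estimate can be checked directly; the sketch below plugs in the quantities from the text (srg = 2^88 on the 128-bit ring, 2^25 logged accesses, one brute-force read per millisecond). It is a back-of-the-envelope check, not code from the paper.

```python
ACCESSES_PER_YEAR = 2**25     # ~one file access per second for a year (the text's figure)

def sieving_hardness_years(srg_bits: int, nid: int, reads_per_sec: int = 1000) -> float:
    """Expected years for a brute-force scan of the sieved range
    E[|RG_s|] = srg / nid (Theorem 6.2)."""
    sieved = 2**srg_bits / nid
    reads_per_year = reads_per_sec * ACCESSES_PER_YEAR
    return sieved / reads_per_year

print(sieving_hardness_years(srg_bits=88, nid=ACCESSES_PER_YEAR))   # ~2^28 years
```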

Table 1 summarizes the hardness of breaking the obfuscation scheme for different values of pr_sq (the minimum probability of safe obfuscation), assuming that the adversary has logged 2^25 file accesses (one access per second for one year) and that the nodes permit at most one file access per millisecond.

Discussion. An interesting observation follows from the above analysis: the amount of time taken to break the file identifier obfuscation technique is almost independent of the number of attackers. This is a desirable property. It implies that as the number of attackers in the system increases, the hardness of breaking the file capabilities does not decrease. Location key based systems have this property because the time taken for a brute-force attack on a file identifier is fundamentally limited by the rate at which a hosting node permits accesses on the files stored locally. On the contrary, a brute-force attack on a cryptographic key is inherently parallelizable and thus becomes more powerful as the number of attackers increases.

Theorem 6.2: Let nid denote the number of obfuscated identifiers that correspond to a target file, let RG_s denote the range sieved using the range sieving attack, and let srg denote the maximum amount of obfuscation that can be pr_sq-safely added to a file identifier. Then, the expected size of the range RG_s is E[|RG_s|] = srg/nid.

    7 Location Inference Guards

Inference attacks on location keys refer to attacks wherein an adversary attempts to infer the location of a file using indirect techniques. We broadly classify inference attacks on location keys into two categories: passive inference attacks and host compromise based inference attacks. It is important to note that none of the inference attacks described below would be effective in the absence of collusion among malicious nodes.

    7.1 Passive inference attacks

Passive inference attacks refer to attacks wherein an adversary attempts to infer the location of a target file by passively observing the overlay network. We studied two passive inference attacks on location keys.


The lookup frequency inference attack is based on the ability of malicious nodes to observe the frequency of lookup queries on the overlay network. Assuming that the adversary knows the relative file popularity, it can use the target file's lookup frequency to infer its location. It has been observed that the general popularity of the web pages accessed over the Internet follows a Zipf-like distribution [27]. An adversary may study the frequency of file accesses by sniffing lookup queries and match the observed file access frequency profile with an actual (pre-determined) frequency profile to infer the location of a target file. This is analogous to performing a frequency analysis attack on old symmetric key ciphers like the Caesar cipher [26].
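As a toy illustration of this frequency-analysis idea (entirely hypothetical data, not an experiment from the paper), an adversary could rank the sniffed identifiers by lookup count and match them against a presumed-known popularity ranking:

```python
from collections import Counter

# Observed lookups sniffed by malicious nodes: identifier -> count (made up).
observed = Counter({"id_a": 970, "id_b": 480, "id_c": 330, "id_d": 240})

# Presumed-known popularity ranking of files (most popular first), e.g. Zipf-like.
known_popularity = ["payroll.db", "budget.xls", "notes.txt", "todo.txt"]

# Match ranks: the most-looked-up identifier is guessed to hide the most popular file.
ranked_ids = [ident for ident, _ in observed.most_common()]
guess = dict(zip(known_popularity, ranked_ids))
print(guess)   # adversary's guess of which identifier hides which file
```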

The end-user IP-address inference attack is based on the assumption that the identity of the end-user can be inferred from its IP-address by an overlay network node r when the user requests node r to perform a lookup on its behalf. A malicious node r could log and report this information to the adversary. Recall that we have assumed that an adversary could be aware of the owner and the legal users of a target file. Assuming that a user accesses only a small subset of the total number of files on the overlay network (including the target file), the adversary can narrow down the set of nodes on the overlay network that may potentially hold the target file. Note that this is a worst-case assumption; in most cases it may not be possible to associate a user with one or a small number of IP-addresses (say, when the user obtains an IP-address dynamically (DHCP [2]) from a large ISP (Internet Service Provider)).

7.2 Host compromise based inference attacks

Host compromise based inference attacks require the adversary to perform an active host compromise attack before it can infer the location of a target file. We studied two host compromise based inference attacks on location keys.

The file replica inference attack attempts to infer the identity of a file from its contents; note that an adversary can reach the contents of a file only after it compromises the file holder (unless the file holder is malicious). The file f could be encrypted to rule out the possibility of identifying a file from its contents. Even when the replicas are encrypted, an adversary can exploit the fact that all the replicas of file f are identical.

When an adversary compromises a good node, it can extract a list of identifier and file content pairs (or hashes of the file contents) stored at that node. Note that an adversary could perform a frequency inference attack on the replicas stored at malicious nodes and infer their filenames. Hence, if an adversary were to obtain the encrypted contents of one of the replicas of a target file f, it could examine the extracted list of identifiers and file contents to obtain the identities of the other replicas. Once the adversary has the locations of c_r copies of a file f, the file f can be attacked easily. This attack is especially plausible for read-only files since their contents do not change over a long period of time. On the other hand, the update frequency on read-write files might guard them from the file replica inference attack.

The file size inference attack is based on the assumption that an adversary might be aware of the target file's size. Malicious nodes (and compromised nodes) report the sizes of the files stored at them to the adversary. If the sizes of the files stored on the overlay network follow a skewed distribution, the adversary would be able to identify the target file (much like the lookup frequency inference attack).

For a detailed discussion of inference attacks and techniques to curb them, please refer to our technical report [23]. Identifying other potential inference attacks and developing defenses against them is part of our ongoing work.

    7.3 Location Rekeying

In addition to the inference attacks listed above, there could be other possible inference attacks on a LocationGuard based file system. In due course of time, the adversary might be able to gather enough information to infer the location of a target file. Location rekeying is a general defense against both known and unknown inference attacks. Users can periodically choose new location keys so as to render all past inferences made by an adversary useless. In order to secure the system from Biham's key collision attacks [4], one may associate an initialization vector (IV) with the location key and change the IV rather than the location key itself.

Location rekeying is analogous to rekeying of cryptographic keys. Unfortunately, rekeying is an expensive operation: rekeying cryptographic keys requires data to be re-encrypted; rekeying location keys requires files to be relocated on the overlay network.


Hence, it is important to keep the rekeying frequency small enough to reduce performance overheads, yet large enough to secure files on the overlay network. In our experiments section, we estimate the periodicity with which location keys have to be changed in order to reduce the probability of an attack on a target file.

    8 Discussion

In this section, we briefly discuss a number of issues related to the security, distribution and management of LocationGuard.

Key Security. We have assumed that in LocationGuard based file systems it is the responsibility of the legal users to secure location keys from an adversary. If a user has to access thousands of files, then the user is responsible for the secrecy of thousands of location keys. One viable solution could be to compile all location keys into one key-list file, encrypt the file and store it on the overlay network. The user then needs to keep secret only the one location key that corresponds to the key-list. This 128-bit location key could be physically protected using tamper-proof hardware devices, smartcards, etc.
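One plausible way to realize the key-list idea, sketched below under our own assumptions, is to derive both the key-list's location tokens and its encryption key from the single master location key; the actual encryption step (e.g., with AES) is omitted, and all names and labels are illustrative.

```python
import hmac, hashlib, json

def derive(master_lk: bytes, label: str) -> bytes:
    """Derive a sub-key from the single 128-bit master location key."""
    return hmac.new(master_lk, label.encode(), hashlib.sha256).digest()

master_lk = bytes(16)   # placeholder; in practice a random 128-bit key
enc_key = derive(master_lk, "keylist-encryption")                 # encrypts the key-list
tokens = [derive(master_lk, f"keylist||{i}").hex() for i in range(1, 8)]

# The key-list maps filenames to their (hex-encoded) location keys.
key_list = json.dumps({"budget.xls": "<location key>", "notes.txt": "<location key>"})
# encrypt(key_list, enc_key) would then be stored at the overlay locations given
# by `tokens`; the user only has to remember master_lk.
print(tokens[0])
```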

Key Distribution. Secure distribution of keys has been a major challenge in large-scale distributed systems. The problem of distributing location keys is very similar to that of distributing cryptographic keys. Typically, keys are distributed using out-of-band techniques. For instance, one could use a PGP [18] based secure email service to transfer location keys from a file owner to file users.

Key Management. Managing location keys efficiently becomes an important issue when (i) an owner owns several thousand files, and (ii) the set of legal users for a file varies significantly over time. In the former scenario, the file owner could reduce the key management cost by assigning one location key to a group of files. Any user who obtains the location key for a file f would implicitly be authorized to access the group of files to which f belongs. However, the latter scenario may seriously impede the system's performance in situations where the location key has to be changed each time the group membership changes.

The major overhead for LocationGuard arises from key distribution and key management; location rekeying could also be an important factor.

Key security, distribution and management in LocationGuard using group key management protocols [11] are part of our ongoing research work.

Other issues that are not discussed in this paper include the problem of a valid user illegally distributing the capabilities (tokens) to an adversary, and the robustness of the lookup protocol and the overlay network in the presence of malicious nodes. In this paper we assume that all valid users are well behaved and that the lookup protocol is robust. Readers may refer to [24] for a detailed discussion of the robustness of lookup protocols on DHT-based overlay networks.

    9 Experimental Evaluation

In this section, we report results from our simulation-based experiments to evaluate the LocationGuard approach for building secure wide-area network file systems. We implemented our simulator using a discrete event simulation [8] model, and implemented the Chord lookup protocol [25] on an overlay network comprising N = 1024 nodes. In all experiments reported in this paper, a random p = 10% of the N nodes are chosen to behave maliciously. We set the number of replicas of a file to R = 7 and vary the corruption threshold cr in our experiments. We simulated the bad nodes as having large but bounded power, characterized by the parameters α (DoS attack strength), λ (node compromise rate) and µ (node recovery rate) (see the threat model in Section 3). We study the cost of the location key technique by quantifying the overhead of using location keys, and we evaluate its benefits by measuring the effectiveness of location keys against DoS and host compromise based targeted file attacks.

    9.1 LocationGuard

Operational Overhead. We first quantify the performance and storage overheads incurred by location keys. All measurements presented in this section were obtained on a 900 MHz Intel Pentium III processor running RedHat Linux 9.0. Let us consider a typical file read/write operation. The operation consists of the following steps: (i) generate the file replica identifiers, (ii) look up the replica holders on the overlay network, and (iii) process the request at the replica holders. Step (i) requires computations using the keyed-hash function with location keys, which otherwise would have required computations using a normal hash function. We found that the computation time difference between HMAC-MD5 (a keyed-hash function) and MD5 (a normal hash function) is negligibly small (on the order of a few microseconds) using the standard OpenSSL library [17]. Step (ii) involves a pseudo-random number generation (a few microseconds using the OpenSSL library) and may require lookups to be retried in the event that the obfuscated identifier turns out to be unsafe. Given that unsafe obfuscations are extremely rare (see Table 1), retries are only required occasionally and thus this overhead is negligible. Step (iii) adds no overhead because our access check is almost free: as long as the user can present the correct filename (token), the replica holder will honor a request on that file.
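The HMAC-MD5 versus MD5 comparison is easy to reproduce; the sketch below uses Python's standard `hmac`/`hashlib` modules rather than OpenSSL, so absolute numbers will differ from the Pentium III figures above, but the gap between the keyed and unkeyed hash should remain in the microsecond range.

```python
import hashlib
import hmac
import os
import timeit

key = os.urandom(16)              # 128-bit location key
name = b"report.pdf:replica:3"    # hypothetical identifier input

# Time 100,000 identifier derivations with and without the location key.
plain = timeit.timeit(lambda: hashlib.md5(name).digest(), number=100_000)
keyed = timeit.timeit(lambda: hmac.new(key, name, hashlib.md5).digest(), number=100_000)

print(f"MD5:      {plain / 100_000 * 1e6:.2f} us per identifier")
print(f"HMAC-MD5: {keyed / 100_000 * 1e6:.2f} us per identifier")
```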

Now, let us compare the storage overhead at the users and at the nodes that are part of the overlay network. Users need to store only an additional 128-bit location key (16 bytes) along with other file meta-data for each file they want to access. Even a user who accesses 1 million files on the overlay network needs to store only an additional 16 MB of location keys. Further, there is no extra storage overhead on the rest of the nodes on the overlay network. For a detailed description of our implementation of LocationGuard and benchmark results for file read and write operations, refer to our tech-report [23].

Denial of Service Attacks. Figure 4 shows the probability of an attack for varying α and different values of the corruption threshold (cr). Without knowledge of the location of file replicas, an adversary is forced to attack (DoS) a random collection of nodes in the system and hope that at least cr replicas of the target file are attacked. Observe that if the malicious nodes are more powerful (larger α), or if the corruption threshold cr is very low with respect to the size (N) of the network, then the probability of an attack is higher. If an adversary were aware of the R replica holders of a target file, then even a weak collection of B malicious nodes, such as B = 102 (i.e., 10% of N) with α = R/B = 7/102 ≈ 0.07, could easily attack the target file. It is also true that for a file system to handle DoS attacks on a file with α = 1, it would require a large number of replicas (R close to B) to be maintained for each file. For example, when B = 10% × N and N = 1024, the system needs to maintain as many as 100+ replicas of each file. Clearly, without location keys, the effort required for an adversary to attack a target file (i.e., make it unavailable) depends only on R and is independent of the number of good nodes (G) in the system. On the contrary, the location key based techniques scale the hardness of an attack with the number of good nodes in the system. Thus, even with a very small R, the location key based system can make it very hard for any adversary to launch a targeted file attack.
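The blind-DoS setting of Figure 4 can be approximated with a small Monte Carlo sketch: the adversary disables roughly α·B randomly chosen nodes, and the attack succeeds if at least cr of the R replica holders fall into that set. The uniform placement of replicas and attacked nodes is our own simplifying assumption, not the paper's simulator.

```python
import random

def attack_probability(N=1024, B=102, R=7, cr=4, alpha=1.0, trials=20_000):
    """Estimate the probability that a blind DoS attack of strength alpha
    disables at least cr of the R replicas of a target file."""
    attacked = min(N, int(alpha * B))          # nodes the adversary can take down
    hits = 0
    for _ in range(trials):
        dos_set = set(random.sample(range(N), attacked))
        replicas = random.sample(range(N), R)  # replica holders, hidden by location keys
        if sum(r in dos_set for r in replicas) >= cr:
            hits += 1
    return hits / trials

for a in (0.5, 1.0, 2.0, 4.0):
    print(f"alpha={a}: P(attack) ~ {attack_probability(alpha=a):.3f}")
```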

Host Compromise Attacks. To further evaluate the effectiveness of location keys against targeted file attacks, we evaluate location keys against host compromise attacks. Our first experiment on host compromise attacks shows the probability of an attack on the target file assuming that the adversary cannot collect capabilities (tokens) stored at the compromised nodes. Hence, the target file is attacked if cr or more of its replicas are stored at either malicious nodes or compromised nodes. Figure 5 shows the probability of an attack for different values of the corruption threshold (cr) and varying ρ = µ/λ (measured in number of node recoveries per node compromise). We ran the simulation for a duration of 100/λ time units. Recall that 1/λ denotes the mean time required for one malicious node to compromise a good node. Note that if the simulation were run for infinite time, the probability of attack would always be one, because at some point in time cr or more replicas of a target file would be assigned to malicious (or compromised) nodes in the system.

From Figure 5 we observe that when ρ ≤ 1, the system is highly vulnerable since the node recovery rate is lower than the node compromise rate. Note that while a DoS attack could tolerate powerful malicious nodes (α > 1), the host compromise attack cannot tolerate the situation where the node compromise rate is higher than the recovery rate (ρ ≤ 1). This is primarily because of the cascading effect of the host compromise attack: the larger the number of compromised nodes, the higher the rate at which other good nodes are compromised (see the adversary model in Section 3). Table 2 shows the mean fraction of good nodes (G′) that are in an uncompromised state for different values of ρ. Observe from Table 2 that when ρ = 1, most of the good nodes are in a compromised state.
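For intuition about the cascading dynamics behind Table 2, the following crude discrete-time sketch lets bad and compromised nodes convert good nodes at rate λ per attacker, while compromised nodes recover at rate µ; the way the hazard is spread over healthy nodes, the time step, and all parameter choices are our own approximations rather than the paper's simulator.

```python
import numpy as np

def simulate_gprime(N=1024, p_bad=0.10, rho=1.5, lam=1.0, t_max=200.0, dt=0.01, seed=0):
    """Crude discrete-time simulation of compromise vs. recovery.
    Returns the long-run fraction of good nodes left uncompromised (G')."""
    rng = np.random.default_rng(seed)
    bad = int(p_bad * N)                # permanently malicious nodes
    good = N - bad
    mu = rho * lam
    compromised = 0
    samples = []
    steps = int(t_max / dt)
    for step in range(steps):
        attackers = bad + compromised
        healthy = good - compromised
        # each attacker compromises good nodes at rate lam; spread the hazard
        # uniformly over the currently healthy good nodes
        p_comp = min(1.0, lam * attackers * dt / max(healthy, 1))
        newly = rng.binomial(healthy, p_comp)
        recovered = rng.binomial(compromised, min(1.0, mu * dt))
        compromised = int(np.clip(compromised + newly - recovered, 0, good))
        if step > steps // 2:           # discard the initial transient
            samples.append(1 - compromised / good)
    return float(np.mean(samples))

print(simulate_gprime(rho=1.5))   # roughly comparable to the G' row of Table 2
```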

As we have mentioned in Section 4.3, the adversary could collect the capabilities (tokens) of the file replicas stored at compromised nodes; these tokens can be used by the adversary at any point in the future to corrupt these replicas using a simple write operation. Hence, our second experiment on host compromise attacks measures the probability of an attack assuming that the adversary collects the file tokens stored at compromised nodes.


[Figure 4: Probability of a Target File Attack for N = 1024 nodes and R = 7 using a DoS attack. X-axis: corruption threshold (cr), 3 to 7; Y-axis: probability of successful attack; one curve per α ∈ {0.5, 1.0, 2.0, 4.0}.]

[Figure 5: Probability of a Target File Attack for N = 1024 nodes and R = 7 using a host compromise attack (with no token collection). X-axis: corruption threshold (cr), 3 to 7; Y-axis: probability of successful attack; one curve per ρ ∈ {1.1, 1.2, 1.5, 3.0}.]

[Figure 6: Probability of a Target File Attack for N = 1024 nodes and R = 7 using a host compromise attack with token collection from compromised nodes. X-axis: fraction of good nodes compromised, 0 to 1; Y-axis: probability of successful attack; one curve per ρ ∈ {1.1, 1.2, 1.5, 3.0}.]

ρ     0.5   1.0   1.1    1.2    1.5    3.0
G′    0     0     0.05   0.44   0.77   0.96

Table 2: Mean Fraction of Good Nodes in an Uncompromised State (G′)

Figure 6 shows the mean effort required to locate all the replicas of a target file (cr = R). The effort required is expressed in terms of the fraction of good nodes that need to be compromised by the adversary to attack the target file.

Note that in the absence of location keys, an adversary needs to compromise at most R good nodes in order to mount a successful targeted file attack. Clearly, location key based techniques increase the required effort by several orders of magnitude. For instance, when ρ = 3, an adversary has to compromise 70% of the good nodes in the system merely to raise the probability of an attack to a nominal value of 0.1, even under the assumption that the adversary collects file capabilities from compromised nodes. If an adversary compromises every good node in the system once, it gets to know the tokens of all files stored on the overlay network. In Section 7.3 we proposed location rekeying to protect the file system from such attacks. The appropriate period of location rekeying can be derived from Figure 6. For instance, when ρ = 3, if a user wants to keep the attack probability below 0.1, the time interval between rekeyings should equal the amount of time it takes for an adversary to compromise 70% of the good nodes in the system. Table 3 shows the time taken (normalized by 1/λ) for an adversary to increase the attack probability on a target file to 0.1 for different values of ρ. Observe that as ρ increases, location rekeying can be more and more infrequent.

Lookup Guard. We performed the following experiments on our lookup identifier obfuscation technique (see Section 6): (i) we studied the effect of the obfuscation range on the probability of a safe obfuscation, (ii) we measured the number of lookup retries, and (iii) we measured the expected size of the sieved range (using the range sieving attack). We found that the safe range for identifier obfuscation is very large even for large values of sq (very close to 1); we observed that the number of lookup retries is almost zero and seldom exceeds one; and we found that the size of the sieved range is too large to attempt a brute force attack even if file accesses over one full year were logged. Finally, our experimental results very closely match the analytical results shown in Table 1. For more details on our experimental results, refer to our tech-report [23].

    10 Related Work

Serverless distributed file systems like CFS [7], Farsite [1], OceanStore [15] and SiRiUS [10] have received significant attention from both the industry and the research community. These file systems store files on a large collection of untrusted nodes that form an overlay network. They use cryptographic techniques to ensure file data confidentiality and integrity. Unfortunately, cryptographic techniques cannot protect a file holder from DoS or host compromise attacks. LocationGuard presents low overhead and highly effective techniques to guard a distributed file system from such targeted file attacks.


ρ                    0.5   1.0   1.1    1.2   1.5   3.0
Rekeying Interval    0     0     0.43   1.8   4.5   6.6

Table 3: Time Interval between Location Rekeying (normalized by 1/λ time units)


The Secure Overlay Services (SOS) paper [13] describes an architecture that proactively prevents DoS attacks using secure overlay tunneling and routing via consistent hashing. However, the assumptions and the applications in [13] are noticeably different from ours. For example, the SOS paper uses the overlay network to introduce randomness and anonymity into the SOS architecture, making it difficult for malicious nodes to attack target applications of interest. LocationGuard techniques treat the overlay network as a part of the target applications we are interested in and introduce randomness and anonymity through location key based hashing and lookup based file identifier obfuscation, making it difficult for malicious nodes to target their attacks on a small subset of nodes in the system, namely the replica holders of the target file of interest.

The Hydra OS [6] and EROS [22] proposed capability-based file access control mechanisms. LocationGuard can be viewed as an implementation of capability-based access control on a wide-area network. The most important challenge for LocationGuard is that of keeping a file's capability secret and yet being able to perform a lookup on it (see Section 6).

Indirect attacks, such as attempts to compromise cryptographic keys held by the system administrator, or fault attacks like RSA timing attacks, glitch attacks, and hardware and software implementation bugs [5], have been the most popular techniques to attack cryptographic algorithms. Similarly, attackers might resort to inference attacks on LocationGuard, since a brute force attack (even with range sieving) on location keys is highly infeasible. Due to space restrictions we have not been able to include location inference guards in this paper. For details on location inference guards, refer to our tech-report [23].

    11 Conclusion

In this paper we have proposed LocationGuard for securing wide-area serverless file sharing systems from targeted file attacks. Analogous to traditional cryptographic keys that hide the contents of a file, LocationGuard hides the location of a file on an overlay network. LocationGuard retains traditional cryptographic guarantees like file data confidentiality and integrity. In addition, LocationGuard guards a target file from DoS and host compromise attacks, provides a simple and efficient access control mechanism, and adds minimal performance and storage overhead to the system. We presented experimental results that demonstrate the effectiveness of our techniques against targeted file attacks. In conclusion, LocationGuard mechanisms make it possible to build simple and secure wide-area network file systems.

Acknowledgements

This research is partially supported by NSF CNS, NSF ITR, an IBM SUR grant, and an HP Equipment Grant. Any opinions, findings, and conclusions or recommendations expressed in the project material are those of the authors and do not necessarily reflect the views of the sponsors.

    References

[1] A. Adya, W. Bolosky, M. Castro, G. Cermak, R. Chaiken, J. R. Douceur, J. Howell, J. R. Lorch, M. Theimer, and R. P. Wattenhofer. Farsite: Federated, available and reliable storage for an incompletely trusted environment. In Proceedings of the 5th International Symposium on OSDI, 2002.

[2] I. R. Archives. RFC 2131: Dynamic host configuration protocol. http://www.faqs.org/rfcs/rfc2131.html.

[3] B. Zhao, J. Kubiatowicz, and A. Joseph. Tapestry: An infrastructure for fault-tolerant wide-area location and routing. Technical Report UCB/CSD-01-1141, University of California, Berkeley, 2001.

[4] E. Biham. How to decrypt or even substitute DES-encrypted messages in 2^28 steps. Information Processing Letters, 84, 2002.

[5] D. Boneh and D. Brumley. Remote timing attacks are practical. In Proceedings of the 12th USENIX Security Symposium, 2003.

[6] E. Cohen and D. Jefferson. Protection in the Hydra operating system. In Proceedings of the ACM Symposium on Operating Systems Principles, 1975.

[7] F. Dabek, M. F. Kaashoek, D. Karger, R. Morris, and I. Stoica. Wide-area cooperative storage with CFS. In Proceedings of the 18th ACM SOSP, October 2001.

[8] FIPS. Data encryption standard (DES). http://www.itl.nist.gov/fipspubs/fip46-2.htm.

[9] Gnutella. The Gnutella home page. http://gnutella.wego.com/.

[10] E. J. Goh, H. Shacham, N. Modadugu, and D. Boneh. SiRiUS: Securing remote untrusted storage. In Proceedings of NDSS, 2003.

[11] H. Harney and C. Muckenhirn. Group key management protocol (GKMP) architecture. http://www.rfc-archive.org/getrfc.php?rfc=2094.

[12] T. Jaeger and A. D. Rubin. Preserving integrity in remote file location and retrieval. In Proceedings of NDSS, 1996.

[13] A. Keromytis, V. Misra, and D. Rubenstein. SOS: Secure overlay services. In Proceedings of ACM SIGCOMM, 2002.

[14] H. Krawczyk, M. Bellare, and R. Canetti. RFC 2104 - HMAC: Keyed-hashing for message authentication. http://www.faqs.org/rfcs/rfc2104.html.

[15] J. Kubiatowicz, D. Bindel, Y. Chen, S. Czerwinski, P. Eaton, D. Geels, R. Gummadi, S. Rhea, H. Weatherspoon, W. Weimer, C. Wells, and B. Zhao. OceanStore: An architecture for global-scale persistent storage. In Proceedings of the 9th International Conference on Architectural Support for Programming Languages and Operating Systems, November 2000.

[16] NIST. AES: Advanced encryption standard. http://csrc.nist.gov/CryptoToolkit/aes/.

[17] OpenSSL. OpenSSL: The open source toolkit for SSL/TLS. http://www.openssl.org/.

[18] PGP. Pretty good privacy. http://www.pgp.com/.

[19] S. Ratnasamy, P. Francis, M. Handley, R. Karp, and S. Shenker. A scalable content-addressable network. In Proceedings of the SIGCOMM Annual Conference on Data Communication, August 2001.

[20] A. Rowstron and P. Druschel. Pastry: Scalable, distributed object location and routing for large-scale peer-to-peer systems. In Proceedings of the 18th IFIP/ACM International Conference on Distributed Systems Platforms (Middleware 2001), November 2001.

[21] RSA. RSA Security - Public-key cryptography standards (PKCS). http://www.rsasecurity.com/rsalabs/pkcs/.

[22] J. S. Shapiro, J. M. Smith, and D. J. Farber. EROS: A fast capability system. In Proceedings of the 17th ACM Symposium on Operating Systems Principles, 1999.

[23] M. Srivatsa and L. Liu. Countering targeted file attacks using location keys. Technical Report GIT-CERCS-04-31, Georgia Institute of Technology, 2004.

[24] M. Srivatsa and L. Liu. Vulnerabilities and security issues in structured overlay networks: A quantitative analysis. In Proceedings of the Annual Computer Security Applications Conference (ACSAC), 2004.

[25] I. Stoica, R. Morris, D. Karger, M. Kaashoek, and H. Balakrishnan. Chord: A scalable peer-to-peer lookup service for internet applications. In Proceedings of the SIGCOMM Annual Conference on Data Communication, August 2001.

[26] MathWorld. The Caesar cipher. http://www.mathworld.com.

[27] L. Xiong and L. Liu. PeerTrust: Supporting reputation-based trust for peer-to-peer electronic communities. IEEE Transactions on Knowledge and Data Engineering (TKDE), Vol. 16, No. 7, 2004.
