Distributed Detection of Node Replication Attacks in ...suri/Spr05/secure5A.pdf · in Sensor Networks Bryan Parnoy Adrian Perrigz Carnegie Mellon University fparno, [email protected]

Distributed Detection of Node Replication Attacksin Sensor Networks∗

Bryan Parno† Adrian Perrig‡

Carnegie Mellon University{parno, perrig}@cmu.edu

Virgil Gligor§

University of [email protected]

Abstract

The low-cost, off-the-shelf hardware components inunshielded sensor-network nodes leave them vulnera-ble to compromise. With little effort, an adversary maycapture nodes, analyze and replicate them, and sur-reptitiously insert these replicas at strategic locationswithin the network. Such attacks may have severe con-sequences; they may allow the adversary to corrupt net-work data or even disconnect significant parts of the net-work. Previous node replication detection schemes de-pend primarily on centralized mechanisms with singlepoints of failure, or on neighborhood voting protocolsthat fail to detect distributed replications. To addressthese fundamental limitations, we propose two new al-gorithms based on emergent properties [17], i.e., prop-erties that arise only through the collective action ofmultiple nodes. Randomized Multicast distributes nodelocation information to randomly-selected witnesses, ex-ploiting the birthday paradox to detect replicated nodes,while Line-Selected Multicast uses the topology of thenetwork to detect replication. Both algorithms pro-vide globally-aware, distributed node-replica detection,and Line-Selected Multicast displays particularly strongperformance characteristics. We show that emergent al-gorithms represent a promising new approach to sensornetwork security; moreover, our results naturally extendto other classes of networks in which nodes can be cap-tured, replicated and re-inserted by an adversary.

∗The views and conclusions contained in this paper are those of theauthors and should not be interpreted as representing the official poli-cies, either expressed or implied, of Bosch, Carnegie Mellon Univer-sity, NSF, the Army Research Office, the Army Research Laboratory,the U.S. Government or any of its agencies.

†Bryan Parno is supported in part by an NDSEG Fellowship, whichis sponsored by the Department of Defense.

‡This research was supported in part by CyLab at Carnegie Mellonunder grant DAAD19-02-1-0389 from the Army Research Office, andgrant CAREER CNS-0347807 from NSF, and by a gift from Bosch.

§Virgil Gligor was supported in part by the U.S. Army ResearchOffice under Award No. DAAD19-01-1-0494, and by the U.S. ArmyResearch Laboratory under Cooperative Agreement DAAD19-01-2-0011 for the Collaborative Technology Alliance for Communicationsand Networks.

1 Introduction

The ease of deploying sensor networks contributes totheir appeal. They can quickly scale to large configura-tions, since administrators can simply drop new sensorsinto the desired locations in the existing network. Tojoin the network, new nodes require neither administra-tive intervention nor interaction with a base station; in-stead, they typically initiate simple neighbor discoveryprotocols [6, 13] by broadcasting their prestored creden-tials (e.g., their unique ID and/or the unique ID of theirkeys).

Unfortunately, sensor nodes typically employ low-cost commodity hardware components unprotected bythe type of physical shielding that could preclude ac-cess to a sensor’s memory, processing, sensing and com-munication components. Cost considerations make itimpractical to use shielding that could detect pressure,voltage, and temperature changes [11, 33, 36] that anadversary might use to access a sensor’s internal state.Deploying unshielded sensor nodes in hostile environ-ments enables an adversary to capture, replicate, and in-sert duplicated nodes at chosen network locations withlittle effort. Thus, if the adversary compromises evena single node, she can replicate it indefinitely, spread-ing her influence throughout the network. If left un-detected, node replication leaves any network vulner-able to a large class of insidious attacks. Using repli-cated nodes, the adversary can subvert data aggregationprotocols by injecting false data or suppressing legiti-mate data. Further, blame for abnormal behavior cannow be spread across the replicas, reducing the likeli-hood that any one node exceeds the detection thresh-old. Even more insidiously, node replicas placed at ju-diciously chosen locations can revoke legitimate nodesand disconnect the network by triggering correct execu-tion of node-revocation protocols that rely on thresholdvoting schemes [6, 10, 13, 27].

Previous approaches for detecting node replicationtypically rely on centralized monitoring, since localizedvoting systems [6, 27] cannot detect distributed replica-tion. Centralized schemes require all of the nodes in thenetwork to transfer a list of their neighbors’ claimed lo-

cations1 to a central base station that can examine thelists for conflicting location claims. Like all centralizedapproaches, this creates a single-point of failure. If theadversary can compromise the base-station or interferewith its communications, then the centralized approachwill fail. Also, the nodes surrounding the base stationare subjected to an undue communication burden thatmay shorten the network’s life expectancy.

In this paper, we use two different emergent algo-rithms to provide the first examples of globally-awaredistributed node-replication detection systems. Theemergent nature of our algorithms makes them ex-tremely resilient to active attacks, and both protocolsseek to minimize power consumption by limiting com-munication, while still operating within the extremelylimited memory capacity of typical sensor nodes. Anemergent algorithm leverages the features that no indi-vidual node can provide, but that emerge through thecollaboration of many nodes. Our first protocol, Ran-domized Multicast, distributes location claims to a ran-domly selected set of witness nodes. The Birthday Para-dox predicts that a collision will occur with high prob-ability if the adversary attempts to replicate a node.Our second protocol, Line-Selected Multicast, exploitsthe routing topology of the network to select witnessesfor a node’s location and utilizes geometric probabil-ity to detect replicated nodes. This protocol has mod-est communication and memory requirements. Further-more, our solutions apply equally well to any class ofnetwork in which the adversary can capture, replicateand insert additional nodes. Examples include wirelessad hoc networks and peer-to-peer networks. We arguethat such networks require the resiliency of emergent se-curity techniques to resist an adversary that can subvertan arbitrary number of nodes at unpredictable locations.We expect that distributed algorithms based on emer-gent properties will provide the best defenses for attacksagainst these systems.

In the following section, we provide a more detaileddescription of the node replication attack that we planto thwart, and we supply a summary of notation usedthroughout the paper. Then, in Section 3 we summa-rize some of the earlier proposals and explain why theyfail to prevent replication attacks. After discussing somepreliminary approaches to distributed detection in Sec-tion 4, we present and analyze our two primary pro-tocols, Randomized Multicast and Line-Selected Mul-ticast, in Sections 5 and 6 respectively. We compareand contrast the protocols, discuss synchronization andauthentication issues and generalize our algorithms inSection 8. Finally, we review related research in Sec-

1To prevent the adversary from using the location information tofind and disable nodes, we could instead broadcast a locator uniqueto the node’s neighborhood that would reveal less information but stillbe verifiable by the neighbors. For example, the locator could consistof the node’s list of neighbors. If the list becomes prohibitively long,each node can broadcast the list to its neighbors but sign a hash of thelist. The neighbors verify that they are on the list, check the hash, andthen only propagate the hash value, instead of the entire list.

tion 9 and present our future work and conclusions inSections 10 and 11.

2 Background

2.1 Goals

For a given sensor network, we would like to detecta node replication attack, i.e., an attempt by the ad-versary to add one or more nodes to the network thatuse the same ID as another node in the network. Ideally,we would like to detect this behavior without centralizedmonitoring, since centralized solutions suffer from sev-eral inherent drawbacks (see Section 3.1). The schemeshould also revoke the replicated nodes, so that non-faulty nodes in the network cease to communicate withany nodes injected in this fashion.

We evaluate each protocol’s security by examiningthe probability of detecting an attack given that the ad-versary inserts L replicas of a subverted node. The pro-tocol must provide robust detection even if the adver-sary captures additional nodes. We also evaluate the ef-ficiency of each protocol. In a sensor network, commu-nication (both sending and receiving) requires at leastan order of magnitude more power than any other oper-ation [14], so our first priority must be minimizing com-munication, both for the network as a whole and for theindividual nodes (since hotspots will quickly exhaust anode’s power supply). Moreover, sensor nodes typicallyhave a limited amount of memory, often on the order of afew kilobytes [14]. Thus, any protocol requiring a largeamount of memory will be impractical.

2.2 Sensor Network Environments

A sensor network typically consists of hundreds, oreven thousands, of small, low-cost nodes distributedover a wide area. The nodes are expected to functionin an unsupervised fashion even if new nodes are added,or old nodes disappear (e.g., due to power loss or acci-dental damage). While some networks include a centrallocation for data collection, many operate in an entirelydistributed manner, allowing the operators to retrieve ag-gregated data from any of the nodes in the network. Fur-thermore, data collection may only occur at irregular in-tervals. For example, many military applications striveto avoid any centralized and fixed points of failure. In-stead, data is collected by mobile units (e.g., unmannedaerial units, foot soldiers, etc.) that access the sensornetwork at unpredictable locations and utilize the firstsensor node they encounter as a conduit for the informa-tion accumulated by the network. Since these networksoften operate in an unsupervised fashion for long periodsof time, we would like to detect a node replication attacksoon after it occurs. If we wait until the next data col-lection cycle, the adversary has time to use its presencein the network to corrupt data, decommission legitimate

nodes, or otherwise subvert the network’s intended pur-pose.

We also assume that the adversary cannot readily cre-ate new IDs for nodes. Newsome et al. describe sev-eral techniques to prevent the adversary from deployingnodes with arbitrary IDs [27]. For example, we can tieeach node’s ID to the unique knowledge it possesses. Ifthe network uses a key predistribution scheme [6, 13],then a node’s ID could correspond to the set of secretkeys it shares with its neighbors (e.g., a node’s ID isgiven by the hash of its secret keys). In this system, anadversary gains little advantage by claiming to possessan ID without actually holding the appropriate keys. As-suming the sensor network implements this safeguard,an adversary cannot create a new ID without guessingthe appropriate keys (for most systems, this is infeasi-ble), so instead the adversary must capture and clone alegitimate node.

2.3 Adversary Model

In examining the security of a sensor network, wetake a conservative approach by assuming that the ad-versary has the ability to surreptitiously capture a lim-ited number of legitimate sensor nodes. We limit thepercentage of nodes captured, since an adversary thatcan capture most or all of the nodes in the networkcan obviously subvert any protocol running in the net-work. Having captured these nodes, the adversary canemploy arbitrary attacks on the nodes to extract theirprivate information. For example, the adversary mightexploit the unshielded nature of the nodes to read theircryptographic information from memory. The adversarycould then clone the node by loading the node’s crypto-graphic information onto multiple generic sensor nodes.Since sensor networks are inherently designed to facili-tate ad hoc deployment, these clones can then be easilyinserted into arbitrary locations within the network, sub-ject only to the constraint that each inserted node sharesat least one key with some of its neighbors. We allowall of the nodes under the adversary’s control to commu-nicate and collaborate, but we make the simplifying as-sumption that any cloned node has at least one legitimatenode as a neighbor. In Section 8.4, we show how we canremove this assumption while retaining security. We as-sume that the adversary operates in a stealthy manner,attempting to avoid detection, since detection could trig-ger an automated protocol to sweep the network, using atechnique such as SWATT [32] to remove compromisednodes, or draw human attention and/or intervention. Inthe following discussion, we will also assume that nodesunder the adversary’s control (both the subverted nodesand their clones) continue to follow the protocols de-scribed. This allows us to focus on the details of theprotocols, but in Section 10, we will suggest methodsfor relaxing this assumption.

As described above, our adversary model differs fromthe Dolev-Yao adversary [9] in several respects. Tra-

ditionally used to analyze cryptographic protocols, theDolev-Yao model allows the adversary to read and writemessages at any location within the network. However,in our discussion, we restrict the adversary to read andwrite messages using only the nodes under its control.On the other hand, our model also allows the adversaryto subvert and replicate existing nodes in an adaptivemanner, capabilities not available to the Dolev-Yao ad-versary. These capabilities allow the adversary to mod-ify both the network topology and the “trust” topology,since the set of legitimate nodes changes as the adver-sary subverts nodes and inserts additional replicas.

2.4 Notation

For clarity, we list the symbols and notation usedthroughout the paper below:

n Number of nodes in the networkd Average degree of each nodep Probability a neighbor will replicate

location informationg Number of witnesses selected by

each neighborlα Location node α claims to occupy

H(M) Hash of MKα α’s public keyK−1

α α’s private key{M}K

−1α

α’s signature on MS Set of all possible node IDs

3 Previous Protocols

Thus far, protocols for detecting node replicationhave relied on a trusted base station to provide globaldetection. For the sake of completeness, we also discussthe use of localized voting mechanisms. We considerthese protocols in the abstract; for specific examples ofprevious protocols, see Section 9. Until now, it was gen-erally believed that these two alternatives exhausted thespace of possibilities. This paper expands the designspace to offer new alternatives with strong security andefficiency characteristics.

3.1 Centralized Detection

The most straightforward detection scheme requireseach node to send a list of its neighbors and their claimedlocations to the base station. The base station can thenexamine every neighbor list to look for replicated nodes.If it discovers one or more replicas, it can revoke thereplicated nodes by flooding the network with an authen-ticated revocation message.

While conceptually simple, this approach suffersfrom several drawbacks inherent in a centralized system.First, the base station becomes a single point of failure.

Any compromise of the base station or the communica-tion channel around the base station will render this pro-tocol useless. Furthermore, the nodes closest to the basestation will receive the brunt of the routing load and willbecome attractive targets for the adversary. The protocolalso delays revocation, since the base station must waitfor all of the reports to come in, analyze them for con-flicts and then flood revocations throughout the network.A distributed or local protocol could potentially revokereplicated nodes in a more timely fashion. Finally, manynetworks do not have the luxury of a powerful base sta-tion, making a distributed solution a necessity.

In terms of security, this protocol achieves 100% de-tection of all replicated nodes, assuming all messagessuccessfully reach the base station. As far as efficiency,if we assume that the average path length2 to the basestation is O(

√n) and each node has an average degree d

(for d � n), then this protocol requires O(n√

n) com-munication for all of the reports from the nodes to reachthe base station. The storage required at each node isO(d). At the base station, the protocol requires O(n ·d),though storage is presumably less of a concern for thebase station.

3.2 Local Detection

To avoid relying on a central base station, we couldinstead rely on a node’s neighbors to perform replica-tion detection. Using a voting mechanism, the neigh-bors can reach a consensus on the legitimacy of a givennode. Unfortunately, while achieving detection in a dis-tributed fashion, this method fails to detect distributednode replication in disjoint neighborhoods within thenetwork. As long as the replicated nodes are at leasttwo hops away from each other, a purely local approachcannot succeed.

4 Preliminary Approaches

One might imagine addressing the shortcomings ofpreviously proposed protocols by implementing dis-tributed detection using a simple broadcast scheme, orby using deterministic replication of location claims. Tothe best of our knowledge, neither of these protocolshave been discussed in the literature. Despite their draw-backs, we discuss them to provide background and intu-ition for our two primary protocols, Randomized Mul-ticast and Line-Selected Multicast, presented in Sec-tions 5 and 6 respectively. In all four protocols, we as-sume that nodes know their own geographic positions.Numerous researchers have proposed schemes for deter-mining node location, using everything from highly ab-stract graph embeddings [28], to connectivity informa-tion [8], to powerful beacon nodes placed on the perime-ter of the network [5]. Some of these proposals require

2This will hold true if the sensor network deployment approximatesany regular polygon.

that some or all of the nodes have GPS receivers, butmany do not. For our purposes, any of these protocolswill suffice. We also assume that the nodes in the net-work remain relatively stationary, at least for the timeit takes to perform one round of replication detection.If the network designers anticipate occasional mobility,they can schedule regular detection rounds. As long as anode successfully participates in a round, it can continueto communicate until the next round, even if its positionchanges in the interim. We discuss additional timing de-tails in Section 8.2.

4.1 Node-To-Network Broadcasting

One approach to distributed detection utilizes a sim-ple broadcast protocol. Essentially, each node in the net-work uses an authenticated broadcast message to floodthe network with its location information. Each nodestores the location information for its neighbors and if itreceives a conflicting claim, revokes the offending node.

This protocol achieves 100% detection of all dupli-cate location claims under the assumption that the broad-casts reach every node. This assumption may not holdif the adversary can jam key areas or otherwise interferewith communication paths through the network. Nodescould employ redundant messages or authenticated ac-knowledgment techniques to try to thwart such an attack.In terms of efficiency, this protocol requires each nodeto store location information about its d neighbors. Onenode’s location broadcast requires O(n) messages, as-suming the nodes employ a duplicate suppression algo-rithm in which each node only broadcasts a given mes-sage once. Thus, the total communication cost for theprotocol is O(n2). Given the simplicity of the schemeand the level of security achieved, this cost may be justi-fiable for small networks. However, for large networks,the n2 factor is too costly, so we investigate schemeswith a lower cost.

4.2 Deterministic Multicast

To improve on the communication cost of the previ-ous protocol, we describe a detection protocol that onlyshares a node’s location claim with a limited subset ofdeterministically chosen “witness” nodes. When a nodebroadcasts its location claim, its neighbors forward thatclaim to a subset of the nodes called witnesses. The wit-nesses are chosen as a function of the node’s ID. If theadversary replicates a node, the witnesses will receivetwo different location claims for the same node ID. Theconflicting location claims become evidence to triggerthe revocation of the replicated node.

More formally, in this protocol, whenever node γhears a location claim lα from node α, it computesF (α) = {ω1, ω2, . . . , ωg}, where F maps each node IDin the set of possible node IDs, S, to a set of g node IDs:

F : S → {σ : σ ∈ 2S , |σ| = g} (1)

The nodes with IDs in the set {ω1, ω2, . . . , ωg} consti-tute the witnesses for node α. Node γ forwards lα toeach of these witnesses. If α claims to be at more thanone location, the witnesses will receive conflicting loca-tion claims, which they can flood through the network,discrediting α.

In this protocol, each node in the network stores g lo-cation claims on average. For communication, assumingα’s neighbors do not collaborate, we will need each ofα’s neighbors to probabilistically decide which of the ωi

to inform. If each node selects g ln g

drandom destinations

from the set of possible ωi, then the coupon collector’sproblem [7] assures us that each of the ωi’s will receiveat least one of the location claims. Assuming an aver-age network path length of O(

√n) nodes, this results in

O( g ln g√

n

d) messages. Unfortunately, this cost does not

provide much security. Since F is a deterministic func-tion, an adversary can also determine the ωis. Thus, theybecome targets for subversion. If the adversary can cap-ture or jam all g of the messages destined to the ωis, thenshe can create as many replicas of α as she desires (lim-ited only by the requirement that no two replicas sharea neighbor). Since the communication costs of this pro-tocol grow as O(g ln g), we cannot afford a large valuefor g, and yet a small value for g allows the adversary al-most unlimited replication abilities after compromisinga fixed number of nodes; in other words, if the adversarycontrols the g witnesses for α, she can create unlimitedreplicas of α and suppress the conflicting reports arriv-ing at the witness nodes. These disadvantages make thisprotocol unappealing.

5 Randomized Multicast

To improve the resiliency of the deterministic mul-ticast protocol discussed in Section 4.2, we propose anew protocol that randomizes the witnesses for a givennode’s location claim, so that the adversary cannot an-ticipate their identities. When a node announces its lo-cation, each of its neighbors sends a copy of the locationclaim to a set of randomly selected witness nodes. If theadversary replicates a node, then two sets of witnesseswill be selected. In a network of n nodes, if each loca-tion produces

√n witnesses, then the birthday paradox

predicts at least one collision with high probability, i.e.,at least one witness will receive a pair of conflicting lo-cation claims. The two conflicting locations claims formsufficient evidence to revoke the node, so the witness canflood the pair of locations claims through the network,and each node can independently confirm the revocationdecision.

5.1 Assumptions

As discussed in Section 4, our protocols assume thateach node knows its own location. We also assume thatthe network utilizes an identity-based public key sys-tem such that each node α is deployed with a private

key, K−1α , and any other node can calculate α’s public

key using α’s ID, i.e., Kα = f(α). If necessary, wecould replace this system with a more traditional PKIin which we assume the network authorities use a mas-ter public/private-key pair (KM , K−1

M ) to sign α’s pub-lic key; however, transmitting this public-key certificatewill have a substantial communication overhead.

Traditionally, researchers have assumed that publickey systems exceed the memory and computational ca-pacity of sensor nodes. However, public key cryptogra-phy on new sensor hardware may not be as prohibitiveas traditionally assumed. In recent work, Malan et al.demonstrate that they can successfully generate 163 bitECC keys on the MICA 2 in under 34 seconds [22]. Fur-thermore, the latest generation of Telos sensors comewith 10KB of RAM and can achieve 5x the data rate ofthe MICA 2, making public-key algorithms more prac-tical. In Section 8.3.2, we discuss how we could insteaduse symmetric-key cryptography to lower the computa-tional overhead, at the expense of additional communi-cation.

5.2 Protocol Description

At a high level, the protocol has each node broadcastits location claim, along with a signature authenticatingthe claim. Each of the node’s neighbors probabilisti-cally forwards the claim to a randomly selected set ofwitness nodes. If any witness receives two different lo-cation claims for the same node ID, it can revoke thereplicated node. The birthday paradox ensures that wedetect replication with high probability using a relativelylimited number of witnesses.

More formally, each node α broadcasts a locationclaim to its neighbors, β1, β2, . . . , βd. The loca-tion claim has the format 〈IDα, lα, {H(IDα, lα)}K−1

α〉,

where lα represents α’s location (e.g., geographic coor-dinates (x, y)). Upon hearing this announcement, eachneighbor, βi, verifies α’s signature and the plausibilityof lα (for example, if each node knows its own posi-tion and has some knowledge of the maximum prop-agation radius of the communication layer, then it canloosely bound α’s set of potential locations). Then, withprobability p, each neighbor selects g random locationswithin the network and uses geographic routing (e.g.,GPSR [19]) to forward α’s claim to the nodes closest tothe chosen locations (as in GHT [30]). Since we haveassumed the nodes are distributed randomly, this shouldproduce a random selection from the nodes in the net-work. In Section 5.3, we show that the probability ofselecting the same node more than once is generally neg-ligible. Collectively, the nodes chosen by the neighborsconstitute the witnesses for α.

Each witness that receives a location claim, first ver-ifies the signature. Then, it checks the ID against all ofthe location claims it has received thus far. If it ever re-ceives two different locations claims for the same nodeID, then it has detected a node replication attack, and

these two location claims serve as evidence to revoke thenode. It blacklists α from further communication by im-mediately flooding the network with the pair of conflict-ing location claims, lα and l′α. Each node receiving thispair can independently verify the signatures and agreewith the revocation decision. Thus, the sensor networkboth detects and defeats the node replication attack in afully distributed manner. Furthermore, the randomiza-tion prevents the adversary from predicting which nodewill detect the replication.

5.3 Security Analysis

Let malicious node α claim to be at L locations, l1,l2,. . . , lL. We would like to determine the probability ofa collision using the randomized multicast protocol out-lined above, since a collision at a witness correspondsto detection of α’s replication. At each location li, p · dnodes randomly select g witnesses. If the neighbors co-ordinated perfectly, this would store α’s location claimat exactly p · d · g locations. However, since we pre-fer to have each neighbor act independently, there maybe some amount of overlap between the witnesses eachneighbor selects. To determine the impact of this over-lap, we would like to determine the number of nodes,Nreceive, that will receive the location claim assumingthe neighbors choose witnesses independently. If Pclaim

is the probability that a node hears at least one claim andPnone is the probability that a node hears no locationclaims, then we have:

E[Nreceive] = n · Pclaim (2)

Pclaim = 1 − Pnone (3)

Since each neighbor is assumed to select g random,unique witness locations, the probability (Pf ) that anode fails to hear any of the g announcements from oneneighbor is:

Pf = 1 − g

n(4)

Since each neighbor decides independently whether tosend out location claims, the number of nodes that ac-tually send out location claims is distributed binomially,with mean p · d and variance d · p(1− p). For a networkwith d = 20 and p = 1

10, the variance will be less than

0.005, so we will approximate the number of neighborsthat send out locations claims as p · d. Since the neigh-bors choose their destinations independently, we have:

Pnone =

(

1 − g

n

)p·d

(5)

Combining equations 2, 3 and 5, the number of witnessnodes that receive at least one location claim is:

E[Nreceive] = n ·(

1 −(

1 − g

n

)p·d)

(6)

The Binomial Theorem allows us to approximate(1 − x)

y as (1− xy) for small x, so as long as g � n,

we have Nreceive ≈ p·d·g, so overlapping witness loca-tions should not impact the security of the protocol. Asan example, in a network with n = 10, 000, g = 100,d = 20, and p = 0.1, perfect coordination would tell200 nodes, while independent selection would tell 199.Thus, for the remainder of the analysis, we will assumethat p · d · g nodes receive each location claim.

If the adversary inserts L replicas of α, we wouldlike to determine the probability that two conflicting lo-cation reports collide at some witness node, since thiscorresponds to the probability that a witness detects thenode replication. Note that even if there are more thantwo replicas of α, we still only need two location claimsto collide in order to completely revoke all L of thereplicas, since one collision will prompt a network-wideflood of the duplicate claims (li and lj) and any othernode that has heard location claim lk for k 6= i, j willalso revoke α.

Following the standard derivation of the birthdayparadox [7], the probability Pnc1 that the p · d · g re-cipients of claim l1 do not receive any of the p · d · gcopies of claim l2 is given by:

Pnc1 =

(

1 − p · d · gn

)p·d·g

(7)

Similarly, the probability Pnc2 that the 2·p·d·g recipientsof claims l1 and l2 do not receive any of the p·d·g copiesof claim l3 is given by:

Pnc2 =

(

1 − 2 · p · d · gn

)p·d·g

(8)

Thus, the probability Pnc of no collisions at all is givenby:

Pnc =

L−1∏

i=1

(

1 − i · p · d · gn

)p·d·g

(9)

Using the standard approximation that (1+x) ≤ ex, wehave:

Pnc ≤L−1∏

i=1

e−i·p2

·d2·g2

n (10)

≤ e−p2

·d2·g2

n

� L−1i=1 i (11)

≤ e−p2

·d2·g2

n

L(L−1)2 (12)

Since the probability of a collision, Pc, is simply 1−Pnc,the probability of detecting L replicas is:

Pc ≥ 1 − e−p2

·d2·g2

n

L(L−1)2 (13)

Thus, if n = 10, 000, g = 100, d = 20, and p = 0.05,we will detect a single replication of α with probabilitygreater than 63%, and if α is replicated twice, we willdetect it with probability greater than 95%.

Unlike the deterministic proposal (Section 4.2), weno longer need to worry about the adversary using a lim-ited number of captured nodes to enable an unlimited

amount of replication. If the adversary captures neigh-boring nodes α and β, then the total number of claimsabout either node is reduced by 1

d, essentially reducing

the neighbor count of each node by one. Since all of theprotocol decisions are made locally by individual nodes,the adversary has only two options remaining: it can dis-rupt the routing of packets from the remaining legitimateneighbors or it can capture all of the legitimate neigh-bors. Routing disruptions create tell-tale signs of the ad-versary’s presence in the network and will be avoided bya prudent adversary. Capturing all of the neighbors of anode targeted for replication leads to a practical attackwhich we address in Section 8.4.

5.4 Efficiency Analysis

This scheme still poses a relatively high storage cost.On average, each node will have to store p · d · g loca-tion claims. To ensure a collision with greater than 50%probability, p·d·g will have to be on the order of O(

√n).

Even if we can reduce the size of each claim to the pay-load of a packet, 36B, our hypothetical network withn = 10, 000, g = 100, d = 20, and p = 0.05 will re-quire, on average, that each node store 3,700 B, which isjust under 91% of the Mica 2’s total RAM. Similarly, thecommunication requirements of the protocol are non-trivial. For each node, we generate p · d · g messagesthat must be evenly spread throughout the network. In anetwork randomly deployed on the unit square, the aver-age distance between any two randomly chosen nodes isapproximately 0.521

√n ≈

√n

2, so the communication

costs for the network are on the order of O(n√

n·p·d·g).As mentioned before, p · d · g ≈ O(

√n), so our com-

munication costs are O(n2), equivalent to those of thenaive broadcast scheme outlined in Section 4.1.

We can employ a number of enhancements to im-prove both the communication and space requirementsof our protocol. First, we can trade resiliency for mem-ory and communication savings. For example, in ourhypothetical network3, if we are willing to allow the ad-versary to create up to four replicas of α, then we canreduce the number of messages sent out, g, by 75%.Since communication costs are O(g2) we save on com-munication and require less than 1KB of space, but wewould still detect the adversary’s presence with proba-bility greater than 50%. We can also save both commu-nication and space by reducing the number of locationclaims sent out by 1

d, i.e. g′ = g

d. Each recipient of one

of these claims uses a broadcast message to query herneighbors as to whether they have a conflicting value.Even with these additional queries, the communicationcosts are now O(

√n · p · g) messages per node. Fi-

nally, we can reduce the storage burden by introducinga loose notion of synchronization into the process – seeSection 8.2.

3The network in which n = 10, 000, g = 100, d = 20, andp = 0.05.

6 Line-Selected Multicast

6.1 Overview

To reduce the communication costs of our ran-domized multicast protocol, we investigate a differentscheme to detect conflicting location claims. Inspiredby Braginsky and Estrin’s work on Rumor Routing [4],we note that nodes in a sensor network function bothas sensing units and as routers. For a location claim totravel from node α to node γ, it must pass through sev-eral intermediate nodes as well. If these intermediatenodes also store the location claim, then we have effec-tively drawn a line across the network. If a conflictinglocation claim ever crosses the line, then the node at theintersection will detect the conflict and initiate a revoca-tion broadcast. Since the expected number of intersec-tions, c, of x randomly drawn lines4 intersecting withinthe bounds of the unit circle is given by:

E(c) = x(x − 1)

(

1

6+

245

144π2

)

(14)

we only need a few such lines to insure an intersection.For example, with only three lines, we expect two colli-sions (see Solomon’s lecture notes [34] for details of thederivation). With this insight in mind, we can craft an al-ternate detection protocol with improved performance.

6.2 Protocol Outline

Our new protocol modifies the Randomized Multi-cast protocol, so that we fix p · d · g as a small constantr. When α’s neighbors send out the evidence of α’s lo-cation claim to the r witnesses, each of the nodes alongthe route stores a copy of the location claim as well. Forexample, let βi send a copy of α’s location claim lα to γj

via σ1, σ2 . . . σm. Upon receiving lα, σk verifies the sig-nature on the claim, checks for a conflict with the claimsalready in its buffer, stores a copy of lα in its buffer, andthen forwards lα to σk+1. If any of the nodes discov-ers a conflict, i.e., finds another location claim l′ for αsuch that lα 6= l′α, then it floods the network with theunforgeable evidence (the conflicting set of signed loca-tion claims) of α’s attempted replication, resulting in adistributed revocation of α. If the collision happens tooccur at a replica, it still does not preclude another col-lision from occurring elsewhere in the network. Also,since all protocol decisions are made locally and proba-bilistically, the adversary cannot predict the location ofthe collision, so the probability of a collision occurringat a node under the adversary’s control will be negligi-ble, unless the adversary already has an overwhelmingpresence within the network.

PSfrag replacementsX

A

B

C1

R

C2

R

C3

R

R

Figure 1. Line-Segment Intersection Given threerandomly selected points, A, B and X , then if the fourthrandomly-selected point Y falls in any of the Ci regions,the resulting quadrilateral will be convex. If Y falls inany of the R regions, the figure will be re-entrant. ForXY to intersect AB, Y must fall in region C3.PSfrag replacements

11

α′

β1′

β3′

γ1′

γ3′

β2′

γ2′

α

β1

β2 β3

γ1

γ2

γ3

σ

Figure 2. Line-Selected Multicast In this figure, theadversary has created a replica of α, α′. Neighbors(βi and βi

′) at these locations all report these claims torandomly selected witnesses γi and γi

′, which results inan intersection at σ.

6.3 Analysis

As described above, our protocol actually “draws”line-segments, not lines, through the network. Unfor-tunately, the probability of two segments intersectingis considerably less than that of two lines intersecting(given above by Equation 14). To find the probability

4Lines drawn by randomly selecting two points, p1 and p2, fromwithin the unit circle, and drawing the line extending through them.

that two line-segments intersect5, we can use the solu-tion to Sylvester’s Four-Point Problem. The Four-PointProblem asks for the probability that four randomly se-lected points in a convex domain will form a re-entrantquadrilateral, i.e., one in which one point falls withinthe triangle formed by the other three points. Solomonshows that if the points are selected from a circular do-main, then the probability that the points form a re-entrant quadrilateral is 35

12π2 [34]. If we select our firstthree points at random, then we can divide the regioninto seven sections: four re-entrant and three convex (seeFigure 1). The two line-segments will only intersect ifthe fourth point falls in the convex region C3. Thus, theprobability of intersection is given by:

Pintersect =1

3

(

1 − 35

12π2

)

≈ 0.235 (15)

To further complicate the analysis, our line segments arenot drawn independently at random, but originate froma central point and radiate out in random directions (seeFigure 2). However, Monte-Carlo simulations indicatethat even if we only draw two random segments origi-nating from α and two from α′, the probability of at leastone intersection is greater than 56%, and with five linesegments per point, we have a 95% probability of inter-section. Since an intersection corresponds to detectinga node replication attack, we can detect an attack withhigh probability, using only a constant number of line-segments. While this approach depends on the routingtopology of the network, our simulations indicate thatit reliably detects node replication in realistic scenarios(see Section 7).

Since we only use a constant number of line-segments per node, the Line-Selected Multicast proto-col has very reasonable performance characteristics. As-suming each line-segment is of length O(

√n), then our

protocol only requires O(n√

n) communication for theentire network and each node stores O(

√n) location

claims. We can also reduce the storage requirement byusing the time synchronization enhancements describedin Section 8.2.

7 Simulations

To verify the accuracy of our theoretical predictions,we ran simulations to measure the communication re-quirements of our two primary protocols, RandomizedMulticast and Line-Selected Multicast. Since Line-Selected Multicast relies on the topology of the networkto detect node replications, we also evaluated its detec-tion rate in a variety of network configurations. Ex-amples of the network topologies tested appear in Ap-pendix A.

5Segments drawn by randomly selecting two points, p1 and p2,from within the unit circle, and drawing the line segment connectingthem.

1000 2000 3000 4000 5000 6000 7000 8000 9000 10000Number of Nodes

0

500

1000

1500

2000

2500

3000

Tot

al N

umbe

r of

Pac

kets

Sen

t and

Rec

eive

d

Randomized MulticastLine-Selected MulticastSQRT(n)

(a) Communication Overhead This figure indicates theaverage amount of communication per node. Note thatRandomized Multicast scales linearly with the number ofnodes, while Line-Selected Multicast scales as

√

n. Theerror bars on the data from the two Multicast schemesrepresent the standard error.

Thin H Thin Cross S Large Cross L Large H Uniform0

0.1

0.2

0.3

0.4

0.5

Ave

rage

Pro

babi

lity

of D

etec

tion

(b) Probability of Detection This graph illustrates the av-erage probability of detecting a single node replicationusing Line-Selected Multicast in a variety of topologies.See Appendix A for examples of the graphs tested.

Figure 3. Simulation Results

In our simulations, we deploy n nodes uniformly atrandom within a 500 × 500 square, with n varying be-tween 1,000 and 10,000. We assume the standard unit-disc bidirectional communication model and we adjustthe communication range, so that each node will haveapproximately 40 neighbors on average6. We use an av-erage of the total number of messages sent or receivedper node as a measure of the communication require-ments, and we measure resiliency by counting the num-ber of times we must run the protocol in order to detecta single node replication (i.e., we select a random nodeand insert one replica into the network). Thus, we cal-culate the probability of detection, Pd as:

Pd =1

# repetitions(16)

For the Randomized Multicast protocol, we used

p · d · g =√

n

which theoretically gives us a 63% probability to de-tect replication, and for Line-Selected Multicast we usedr = 6 (i.e., each location claim creates six line seg-ments). For each network configuration, we generatedtwenty random graphs and averaged the results of tentrials on each graph.

As shown in Figure 3(a), the simulations closelymatch our theoretical predictions. The communicationfor Randomized Multicast scales linearly with the num-ber of nodes, while the Line-Selected Multicast only

6Our simulations show little variance with values of d ranging from10 to 50, though the communication required by the Randomized Mul-ticast drops slightly for larger values of d.

grows at O(√

n). Our simulations also indicate that themaximum amount of communication required of anyone node as compared to the average case scales loga-rithmically with the number of nodes in the graph. UsingRandomized Multicast in a network with 1,000 nodesrequires a maximum of four times as much communica-tion as the average case, and with 10,000 nodes, it re-quires seven times as much. With Line-Selected Multi-cast, the maximum amount of communication in a net-work with 1,000 nodes is twice as much as the averagecase and four times as much for 10,000 nodes.

As Figure 3(b) illustrates, Line-Selected Multicast re-liably detects node replications in a variety of irregularnetwork configurations. To improve the probability ofdetection, we can always repeat the protocol or add afew additional line segments per node.

8 Discussion

In this section, we compare the performance of thevarious protocols we have discussed, and we presentseveral techniques based on loose-time synchronizationthat reduce the storage requirements of the protocols.We also discuss potential issues that may arise from ouruse of public-key cryptography, as well as symmetricalternatives that would require less computational over-head at the price of additional communication. In Sec-tion 8.4, we describe a sophisticated attack that appliesto all of the protocols we have discussed, and we presenta defense against it. Finally, based on the success ofour Randomized Multicast and Line-Selected Multicastprotocols, we argue that algorithms based on emergentproperties offer the most promising techniques for pro-

Communication Memory

Broadcast O(n2) O(d)

Deterministic Multicast O( g ln g√

n

d) O(g)

Randomized Multicast O(n2) O(√

n)

Line-Selected Multicast O(n√

n) O(√

n)

Table 1. Summary of Protocol Costs This tables il-lustrates the memory and communication costs for eachprotocol. The communication costs are for the entirenetwork and the memory costs are per node. The Line-Selected Multicast protocol offers the most efficient so-lution in terms of resiliency versus cost.

viding security in sensor networks.

8.1 Protocol Comparison

Table 1 summarizes the costs for each of the dis-tributed protocols previously discussed. The Broadcastprotocol offers the simplest solution, but the commu-nication overhead will only be tolerable for small net-works. Deterministic Multicast improves on the com-munication requirements, but by selecting a fixed setof witnesses, it loses resiliency. An attacker can per-form unlimited replications after only compromising afixed number of nodes. Randomized Multicast pro-vides excellent resiliency, since it prevents the adversaryfrom anticipating the identity of the witnesses. Unfor-tunately, it imposes communication overhead equal tothat of the Broadcast scheme. However, for networksin which the number of nodes is less than the square ofthe average degree, Randomized-Multicast will tend tobe more space efficient. Finally, Line-Selected Multicastuses less communication than Broadcast or RandomizedMulticast, but provides comparable or greater resiliency,making it a particularly attractive choice. Our simula-tions confirm that this resiliency remains in a variety ofnetwork configurations (see Section 7). We can also re-duce the storage requirements for these protocols by us-ing the time synchronization enhancements described inSection 8.2.

8.2 Synchronized Detection

We now consider the synchronization issues involvedin detecting node replication. All of the protocols de-scribed above require a loose notion of synchroniza-tion to insure timely detection. Various protocols existthat can offer the coarse-grained level of synchroniza-tion that we require. For example, Reference-BroadcastSynchronization (RBS) enables pairwise synchroniza-tion with low overhead and high precision [12]. Huand Servetto describe a protocol that provides asymp-totically optimal time synchronization in dense net-

works [18]. Any of these protocols will suffice for ourpurposes.

Deciding how often to perform the detection protocoltrades efficiency of detection against the communicationand storage costs required by each iteration. However,as we describe below, we can leverage our assumptionof loose synchronization to mitigate the cost of runningthe protocols.

8.2.1 High Noon

In one modification, detection happens during a fixedwindow of time (of length t) that occurs every T unitsof time (for T � t). At the beginning of each time win-dow, each node broadcasts its location claim to its neigh-bors, who then resend it to random locations in the net-work. Each node looks for conflicts in location claimsarriving during the time window and revokes conflictingnodes. After time t has elapsed, the nodes forget all ofthe location claims (but continue to remember the list ofrevoked nodes). Using this modification, the nodes onlydevote significant memory resources to detection duringthe brief time window of length t. The rest of the time(T − t), they can utilize their entire memory for non-detection purposes.

8.2.2 Time Slots

As an alternate approach, we assume the node IDs arerandomly distributed on some fixed interval [0..N ]. Fora given protocol parameter k, we can divide time intoepochs of length T , with each epoch consisting of k timeslots. During each epoch, in time slot s, all of the nodeswith IDs located in the interval [s· N

k..(s+1)· N

k] broad-

cast their location claims to their neighbors, who thenfollow the standard protocol. Nodes receiving a locationclaim from a node with an ID in that interval store theclaim for the duration of the time slot and check for con-flicts. At the end of the time slot, they can forget all ofthe claims they have heard. Using this method reducesthe storage requirements at each node to O( p·d·g

k).

8.2.3 Security Requirements

Since both of these modifications operate deterministi-cally, the adversary might attempt to launch her replica-tion attack between time slots or refuse to follow the pro-tocol (i.e., by not broadcasting her location at the spec-ified time). However, we can thwart this behavior byhaving each node remember which neighbors it heardfrom during the previous epoch. Later, if a node hearsfrom a neighbor that did not participate in the previousepoch, it will refuse to communicate with that node untilthe node successfully participates in a detection epoch.This effectively precludes the adversary from avoidingrandomized detection.7 Since the nodes in most sensor

7Technically, both of these modifications would require some addi-tional configuration to account for propagation delays and uncertaintyas to the size and width of the network, but the essential idea remains

networks must already remember a list of their neigh-bors, this would only require an additional d bits. Ofgreater concern is the fact that this modification limitswhen new nodes can join the network. This can be mit-igated by appropriate choices for t, T , and k, as well asdecisions regarding deployment timing.

8.3 Authentication

The use of public-key signatures requires several ex-tensions to our protocols to prevent the signatures them-selves from becoming a security threat. Also, while weargued in Section 5.1 that algorithmic and hardware im-provements are beginning to make public-key cryptog-raphy practical for sensor networks, we also present amechanism that utilizes symmetric one-time signaturesas an alternative to the public-key authentication previ-ously assumed. While the symmetric signatures reducethe computational overhead of generating and verifyingsignatures, they require additional communication, mak-ing them less appealing than the public-key approach.

8.3.1 Public Key Security Adjustments

To prevent the use of signatures from becoming a secu-rity liability, we need to ensure that the adversary can-not perform a Denial-of-Service attack by making nodesverify bogus signatures or by reporting its neighbors’position claims to every node in the network, rather thanprobabilistically reporting them to g nodes. In practice,these two attacks are unlikely to be a concern. Sincethe basic protocol currently requires each node that re-ceives a location claim to verify the signatures containedwithin the claim, legitimate nodes will immediately de-tect any attempt to inject faulty signatures. If we addsome form of neighbor-to-neighbor authentication, thena node identifying a faulty signature can also assign ap-propriate blame for the fault. It may then choose toblacklist or otherwise revoke the guilty party. If the ad-versary can flood a node with faulty signatures not orig-inating from a node (via some other broadcast source),then the adversary could just as easily perform a jam-ming attack or otherwise interfere with legitimate com-munication. Neighbor-to-neighbor authentication alsoprevents the adversary from framing an innocent thirdparty.

To address the second concern, we note that whena legitimate node, γ, forwards a location claim to its grandomly chosen locations, these location claims mustbe routed through its neighbors. If the neighbors lis-ten promiscuously, they can all detect γ’s attempts toforward the same location claim more than g times andrefuse to forward additional claims. Unfortunately, thisdoes require each node to keep a count associated witheach neighbor, but it may be necessary to prevent the ad-versary from wasting the network’s collective memory.

sound.

8.3.2 Symmetric Alternatives

As an alternative to computation-intensive public-keyalgorithms, researchers have proposed using more effi-cient symmetric cryptographic mechanisms for sensornetworks – for example, broadcast authentication basedon one-way chains and time synchronization [20, 29], orone-time signatures based on one-way functions [25].

For the purposes of this paper, if sensor nodes onlyneed to sign a single location statement, a one-timesignature will suffice. For example, we could use theMerkle-Winternitz signature [25], which has alreadybeen successfully used for stream signatures [31]. Toverify a Merkle-Winternitz signature, a verifier onlyneeds to possess an authentic verification value, i.e., thepublic key, which in this case is a hash value over severalone-way chain values. However, since storing all of thepublic values of all the nodes would have a high over-head, we could instead construct a Merkle hash tree [24]over all public values, and the resulting root node couldbe used to authenticate any one-time signature in thattree. For signature verification, a node would includeall the values in the hash tree that allow a verifier torecompute the root node of the hash tree; for n nodes,this approach would require ln n extra values. Using theparameters suggested by Rohatgi [31] for the Merkle-Winternitz signature, a signature is 230 bytes large, andthe additional verification values would require an addi-tional 100 bytes. This is quite large for a sensor network,and thus we suggest using asymmetric cryptography toachieve smaller messages, despite the higher verificationcost. Since message transmission accounts for the ma-jority of energy consumption, we may conserve energyby sending smaller messages but requiring a higher com-putation overhead.

8.4 Masked-Replication Attacks

In our discussion of the resiliency of the various pro-tocols, we have assumed that each node has at least onelegitimate neighbor. Without this assumptions, all of theprotocols discussed thus far may fail to detect node repli-cation. If the adversary compromises all of α’s neigh-bors, then she can create one replica of α without fearof detection, since the compromised nodes will not sendout location claims for the original α. To create a sec-ond replica of α, the adversary must also compromisethe nodes surrounding the first replica. Thus, the adver-sary must compromise an additional d nodes for eachreplica of α she wishes to create. However, if the com-promised nodes mask the replicated nodes, the adversarycan improve her foothold in the network. For example,if the adversary compromises nodes µ1, µ2 . . . µk, thenshe can assign replicas of {µ1, µ2 . . . µi−1, µi+1 . . . µk}as neighbors of µi. This gives the adversary the influ-ence of k2 nodes after she compromises k nodes. As aconcrete example, suppose the adversary compromisesnodes µ1 and µ2. Now, she can create replicas µ′

1 andµ′

2. By assigning µ1 as µ′2’s only neighbor, and µ2 as

µ′1’s only neighbor, the compromised nodes can mask

the replicas. None of the protocols discussed will de-tect these replicas, since they rely on a replicated node’sneighbors to propagate the replica’s location claims.

Fortunately, we can thwart this attack with a straight-forward defense. Each node β maintains a list of the mnodes from which it has seen the most traffic. For eachnode α on the list, β appoints itself a pseudo-neighborof α with probability

Ppseudo ∝ trafficα

claimsα

(17)

where trafficα is the amount of traffic that β has seenfrom α, and claimsα is the number of location claimsβ has seen concerning α. As a pseudo-neighbor, β re-quests a location claim directly from α. If α fails torespond, β ceases to forward traffic from α. If α does re-spond, β follows the same protocol-dependent behavioras the real neighbors of α (e.g., in the Randomized Mul-ticast, β will forward the location claim to a randomlyselected set of witness nodes).

The use of pseudo-neighbors successfully defeatsmasked-replication attacks. If the masked nodes failto send out location claims, then the closest legitimatenodes will have a higher probability of appointing them-selves pseudo-neighbors. If the replicated nodes fail torespond to requests from the pseudo-nodes, the legiti-mate nodes will cut off communication. By forcing allnodes to provide location claims, we can rely on the re-siliency of our detection algorithm to revoke replicatednodes.

8.5 Emergent Properties

Our two primary algorithms, Randomized Multicastand Line-Selected Multicast, achieve distributed detec-tion of global events, while imposing low overheadsand providing high resiliency. Their strong propertiesare a result of their emergent nature. Emergent algo-rithms utilize the collective efforts of multiple sensornodes to provide capabilities beyond those of any indi-vidual node. For example, after a network’s initial de-ployment, a topology emerges as nodes exchange neigh-bor information and establish a routing infrastructure.Since emergent algorithms operate in a distributed fash-ion, they tend to be highly robust in the face of individ-ual node failures, and they avoid the problems inherentin centralized solutions. These properties make themideal for security applications, particularly in a settingin which we cannot predict the number or identity ofthe sensors that will be subverted by an adversary. Tothe best of our knowledge, our protocols represent thefirst application of emergent algorithms to the problemof security in sensor networks, and we believe that ad-ditional research will only continue the trend towardsdistributed solutions. Furthermore, emergent algorithmsapply more generally to security in other classes of net-works in which individual nodes are vulnerable to com-

promise (e.g., peer-to-peer networks and ad-hoc wire-less networks), allowing solutions in one class to extendnaturally to the others.

9 Related Work

Eschenauer and Gligor [13] propose centralized noderevocation in sensor networks. When the base stationdetects a misbehaving node, it broadcasts a message torevoke that node. Chan, Perrig, and Song [6] proposea localized mechanism for sensor network node revoca-tion; in their approach, nodes can revoke their neigh-bors. Neither paper discusses a distributed approach fordetecting distributed intrusions.

In broadcast encryption, key revocation has beenan important mechanism to recover from compromisedkeys [2, 3, 15, 16, 21, 26]; however, these approaches donot provide duplicate node detection in sensor networks.Similarly, the results of key revocation in the context ofmulticast content distribution [35, 37] do not apply here.

The Sybil attack is related to the node replicationattack—in the Sybil attack, a malicious node gains anunfair advantage by claiming multiple identities [10].Douceur presents countermeasures for peer-to-peer net-works that involve resource verification (computation,communication, and memory) [10]. Newsome et al.present techniques to defend against the Sybil attack insensor networks [27]. Their countermeasures includewireless network testing, key space verification, andcentral node registration. Only their centralized noderegistration technique (similar to the one we describe inSection 3.1) can detect a node replication attack, but itis brittle in the face of node compromise and has a highoverhead, as we show in this paper.

Bawa et al. [1] propose an algorithm for counting thenumber of members of a peer-to-peer network that is re-lated in spirit to our approach; they propose a randomsampling approach that provides an estimate of the net-work size after O(

√n) samples are drawn and a dupli-

cate node is sampled (for a network of size n), by usingthe birthday paradox. Our Randomized Multicast alsorelys on the birthday paradox.

10 Future Work

In the preceding discussion, we have assumed that thenodes controlled by the adversary continue to follow theprotocols described. In our future work, we would liketo explore additional mechanisms to ensure that our pro-tocols continue to function even in the face of misbehav-ing nodes. For example, McCune et al. describe a tech-nique that uses secure implicit sampling to detect nodesthat suppress or drop messages [23]. We could also usesome of the techniques described in Section 8.2 to peri-odically sweep the network for replicas, thus preventingthe adversary from establishing a significant foothold inthe network.

11 Conclusion

We have discussed various approaches used to detectnode replication. In Section 3, we show how central-ized approaches place excessive trust in the base stationand excessive load on those nodes near it. Local vot-ing schemes are ill equipped to detect distributed nodereplication. In contrast, we present two schemes that en-able distributed detection of distributed events. The fi-nal scheme, Line-Selected Multicast, provides excellentresiliency while achieving near optimal communicationoverhead with only modest memory requirements. Bothof our primary schemes illustrate the power of emer-gent properties in sensor networks. Given the adversarymodel typically assumed in sensor networks, we arguethat the security of such networks will increasingly de-pend on emergent algorithms. Cost considerations andunattended deployment will always leave individual sen-sors vulnerable to compromise. Since we cannot predictthe exact nature or number of targets the adversary willselect, the network must collectively resist, report andrevoke compromised nodes in a manner that goes be-yond traditional intrusion detection systems. We expectthat emergent algorithms will ultimately provide the bestdefense against these insidious attacks.

Acknowledgements

The authors would like to thank Haowen Chen andDawn Song for their insightful comments and sugges-tions. Diana Seymour provided discerning observationsand invaluable help editing the paper. We would alsolike to thank the anonymous reviewers for their helpfulsuggestions.

References

[1] M. Bawa, H. Garcia-Molina, A. Gionis, and R. Motwani.Estimating aggregates on a peer-to-peer network. Tech-nical report, Stanford University, 2003.

[2] C. Blundo and A. Cresti. Space requirements for broad-cast encryption. In Advances in Cryptology (EURO-CRYPT), 1995.

[3] C. Blundo, L. Mattos, and D. Stinson. Trade-offs be-tween communication and storage in unconditionally se-cure schemes for broadcast encryption and interactivekey distribution. In Advances in Cryptology (CRYPTO),1996.

[4] D. Braginsky and D. Estrin. Rumor routing algorithmfor sensor networks. In Proceedings of ACM Workshopon Wireless Sensor Networks and Applications, 2002.

[5] N. Bulusu, J. Heidemann, and D. Estrin. GPS-less low-cost outdoor localization for very small devices. IEEEPersonal Communications Magazine, October 2000.

[6] H. Chan, A. Perrig, and D. Song. Random key predistri-bution schemes for sensor networks. In Proceedings ofIEEE Symposium on Security and Privacy, May 2003.

[7] T. Cormen, C. Leiserson, R. Rivest, and C. Stein. Intro-duction to Algorithms. MIT Press, 2001.

[8] L. Doherty, K. S. J. Pister, and L. E. Ghaoui. Convexposition estimation in wireless sensor networks. In Pro-ceedings of IEEE Infocom, 2001.

[9] D. Dolev and A. C. Yao. On the security of public keyprotocols. IEEE Transactions on Information Theory,1983.

[10] J. R. Douceur. The Sybil attack. In Proceedings of Work-shop on Peer-to-Peer Systems (IPTPS), Mar. 2002.

[11] J. Dyer, M. Lindemann, R. Perez, R. Sailer, L. vanDoorn, S. W. Smith, and S. Weingart. Building the IBM4758 Secure Coprocessor. IEEE Computer, 2001.

[12] J. Elson, L. Girod, and D. Estrin. Fine-grained net-work time synchronization using reference broadcasts.SIGOPS Oper. Syst. Rev., 2002.

[13] L. Eschenauer and V. Gligor. A key-managementscheme for distributed sensor networks. In Proceedingsof the ACM Conference on Computer and Communica-tion Security (CCS), Nov. 2002.

[14] D. Estrin, R. Govindan, J. S. Heidemann, and S. Kumar.Next century challenges: Scalable coordination in sensornetworks. In Mobile Computing and Networking, 1999.

[15] A. Fiat and M. Naor. Broadcast encryption. In Advancesin Cryptology (CRYPTO), 1994.

[16] J. Garay, J. Staddon, and A. Wool. Long-lived broad-cast encryption. In Advances in Cryptology (CRYPTO),2000.

[17] V. D. Gligor. Security of emergent properties in ad-hocnetworks. In Proceedings of International Workshop onSecurity Protocols, Apr. 2004.

[18] A. Hu and S. D. Servetto. Asymptotically optimal timesynchronization in dense sensor networks. In Proceed-ings of ACM International Conference on Wireless Sen-sor Networks and Applications, 2003.

[19] B. Karp and H. T. Kung. GPSR: Greedy perimeter state-less routing for wireless networks. In Proceedings ofConference on Mobile Computing and Networking (Mo-biCom), Aug. 2000.

[20] D. Liu and P. Ning. Efficient distribution of key chaincommitments for broadcast authentication in distributedsensor networks. In Proceedings of Network and Dis-tributed System Security Symposium (NDSS), Feb. 2003.

[21] M. Luby and J. Staddon. Combinatorial bounds forbroadcast encryption. In Advances in Cryptology(EUROCRYPT), 1998.

[22] D. Malan, M. Welsh, and M. Smith. A public-key infras-tructure for key distribution in TinyOS based on ellipticcurve cryptography. In Proceedings of IEEE Conferenceon Sensor and Ad hoc Communications and Networks(SECON), Oct. 2004.

[23] J. M. McCune, E. Shi, A. Perrig, and M. K. Reiter. De-tection of denial-of-message attacks on sensor networkbroadcasts. In Proceedings of IEEE Symposium on Se-curity and Privacy, May 2005.

[24] R. Merkle. Protocols for public key cryptosystems. InProceedings of the IEEE Symposium on Research in Se-curity and Privacy, Apr. 1980.

[25] R. Merkle. A digital signature based on a conven-tional encryption function. In Advances in Cryptology(CRYPTO), 1988.

[26] D. Naor, M. Naor, and J. Lotspiech. Revocation andtracing schemes for stateless receivers. In Advances inCryptology (CRYPTO), 2001.

[27] J. Newsome, E. Shi, D. Song, and A. Perrig. The Sybilattack in sensor networks: Analysis and defenses. InProceedings of IEEE Conference on Information Pro-cessing in Sensor Networks (IPSN), Apr. 2004.

[28] J. Newsome and D. Song. GEM: Graph embedding forrouting and data-centric storage in sensor networks with-out geographic information. In ACM Conference on Em-bedded Networked Sensor Systems (SenSys), Nov. 2003.

[29] A. Perrig, R. Szewczyk, V. Wen, D. Culler, and J. D.Tygar. SPINS: Security protocols for sensor networks.In ACM Conference on Mobile Computing and Networks(MobiCom), July 2001.

[30] S. Ratnasamy, B. Karp, L. Yin, F. Yu, D. Estrin,R. Govindan, and S. Shenker. GHT: A geographic hashtable for data-centric storage. In Proceedings of ACM In-ternational Workshop on Wireless Sensor Networks andApplications (WSNA), Sept. 2002.

[31] P. Rohatgi. A compact and fast hybrid signature schemefor multicast packet. In Proceedings of ACM Conferenceon Computer and Communications Security (CCS), Nov.1999.

[32] A. Seshadri, A. Perrig, L. van Doorn, and P. Khosla.Swatt: Software-based attestation for embedded devices.In Proceedings of IEEE Symposium on Security and Pri-vacy, May 2004.

[33] S. W. Smith and S. Weingart. Building a high-performance, programmable secure coprocessor. Com-puter Networks, Apr. 1999. Special Issue on ComputerNetwork Security.

[34] H. Solomon. Geometric Probability. Society for Indus-trial and Applied Mathematics (SIAM), 1978.

[35] D. Wallner, E. Harder, and R. Agee. Key manage-ment for multicast: Issues and architectures. InternetRequest for Comment RFC 2627, Internet EngineeringTask Force, June 1999.

[36] S. Weingart. Physical security devices for computer sub-systems: A survey of attacks and defenses. In Cryp-tographic Hardware and Embedded Systems (CHES),Aug. 2000.

[37] C. Wong, M. Gouda, and S. Lam. Secure group com-munications using key graphs. In Proceedings of ACMSIGCOMM, 1998.

A Network Topologies

(a) Thin H (b) Thin Cross (c) S

(d) Large Cross (e) L (f) Large H

Figure 4. Assorted Network Topologies We tested the Line-Selected Multicast protocol with a variety of network topolo-gies. Samples of each are shown here. Dots represent nodes, and lines represent connections between neighbors.

Documents

Distributed Detection of Node Replication Attacks in ...suri/Spr05/secure5A.pdf · in Sensor Networks Bryan Parnoy Adrian Perrigz Carnegie Mellon University fparno, [email protected]