24
Provably Secure Competitive Routing against Proactive Byzantine Adversaries via Reinforcement Learning Baruch Awerbuch, David Holmer, and Herbert Rubens Department of Computer Science Johns Hopkins University Baltimore, MD {baruch, dholmer, herb}@cs.jhu.edu Technical Report Version 2 October 5th, 2003 Abstract The Swarm Intelligence paradigm has recently been demonstrated as an effective approach for routing in small static network configurations with no adversarial intervention. These algorithms have also been proven to be robust and resilient to changes in node configuration. However, none of the existing routing algorithms can withstand a dynamic proactive adversarial attack, where the network may be completely controlled by byzantine adversaries. The routing protocol presented in this work attempts to provide throughput competitive route selection against an adversary which is essentially unlimited; more specifically, the adversary benefits from complete collusion of adversarial nodes, can engage in arbitrary byzantine behavior and can mount arbitrary adaptive attacks, dynamically changing its attack with each new packet. In this work, we show how to use the Swarm Intelligence paradigm and Distributed Reinforcement Learning in order to develop provably secure routing against byzantine adversaries. A proof ofthe convergence time of our algorithm is presented as well as preliminary simulation results. 1 Introduction The motivation for this work is to design robust routing for wired overlay networks. In general, routing protocols are susceptible to a wide variety of attacks. A malicious node may advertise false routing information, try to redirect routes, or perform a denial of service attack by engaging a node in resource consuming activities such as routing packets in a loop. The goal of this work is to propose a generic framework for routing protocols which operate in a dynamic byzantine adversarial environment. Such a strong model has not been considered in the literature to the best of our knowledge. In fact, one does not even need to consider the full power of byzantine attacks. Even relatively benign adaptive dynamic denial of service attacks are already sufficient to break most existing algorithmic work. (In section 2.5, we briefly describe why adaptive DOS attacks are so devastating.) What is remarkable about our result is the ability to prove near-optimality bounds under completely arbitrary adversarial behavior. We use techniques similar to the “Swarm Intelligence” and Distributed Reinforcement Learning paradigm. Swarm Intelligence is a set of learning and biologically-inspired approaches to solve hard optimization problems using distributed cooperative agents. Motivation comes from work which explored the behaviors of ants and how they coordinate each other’s selection of routes based on a pheromone secretion. 1

Provably Secure Competitive Routing against …...Provably Secure Competitive Routing against Proactive Byzantine Adversaries via Reinforcement Learning Baruch Awerbuch, David Holmer,

  • Upload
    others

  • View
    26

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Provably Secure Competitive Routing against …...Provably Secure Competitive Routing against Proactive Byzantine Adversaries via Reinforcement Learning Baruch Awerbuch, David Holmer,

Provably Secure Competitive Routing against Proactive Byzantine

Adversaries via Reinforcement Learning

Baruch Awerbuch, David Holmer, and Herbert Rubens

Department of Computer Science

Johns Hopkins University

Baltimore, MD

baruch, dholmer, [email protected]

Technical Report Version 2

October 5th, 2003

Abstract

The Swarm Intelligence paradigm has recently been demonstrated as an effective approach forrouting in small static network configurations with no adversarial intervention. These algorithmshave also been proven to be robust and resilient to changes in node configuration. However, noneof the existing routing algorithms can withstand a dynamic proactive adversarial attack, where thenetwork may be completely controlled by byzantine adversaries.

The routing protocol presented in this work attempts to provide throughput competitive routeselection against an adversary which is essentially unlimited; more specifically, the adversary benefitsfrom complete collusion of adversarial nodes, can engage in arbitrary byzantine behavior and canmount arbitrary adaptive attacks, dynamically changing its attack with each new packet. In thiswork, we show how to use the Swarm Intelligence paradigm and Distributed Reinforcement Learningin order to develop provably secure routing against byzantine adversaries. A proof of the convergencetime of our algorithm is presented as well as preliminary simulation results.

1 Introduction

The motivation for this work is to design robust routing for wired overlay networks. In general, routingprotocols are susceptible to a wide variety of attacks. A malicious node may advertise false routinginformation, try to redirect routes, or perform a denial of service attack by engaging a node in resourceconsuming activities such as routing packets in a loop.

The goal of this work is to propose a generic framework for routing protocols which operate ina dynamic byzantine adversarial environment. Such a strong model has not been considered in theliterature to the best of our knowledge. In fact, one does not even need to consider the full powerof byzantine attacks. Even relatively benign adaptive dynamic denial of service attacks are alreadysufficient to break most existing algorithmic work. (In section 2.5, we briefly describe why adaptiveDOS attacks are so devastating.)

What is remarkable about our result is the ability to prove near-optimality bounds under completelyarbitrary adversarial behavior. We use techniques similar to the “Swarm Intelligence” and DistributedReinforcement Learning paradigm. Swarm Intelligence is a set of learning and biologically-inspiredapproaches to solve hard optimization problems using distributed cooperative agents. Motivationcomes from work which explored the behaviors of ants and how they coordinate each other’s selectionof routes based on a pheromone secretion.

1

Page 2: Provably Secure Competitive Routing against …...Provably Secure Competitive Routing against Proactive Byzantine Adversaries via Reinforcement Learning Baruch Awerbuch, David Holmer,

As in [46], one can imagine a model of network routing, such that the network is populated byartificial ants (packets) that make use of the trail laying principle; at each node an ant encounters onthe journey to its destination, it leaves an amount of pheromone which evaporates with time, but is anincreasing function of the frequency of traversal of that location. The ant then selects the next nodeon its journey on the basis of the local pheromone distribution [46]. The routing decision is determinedby these pheromone distributions.

In our Byzantine routing approach, we act similarly: the process of route detection and faultavoidance is carried out by a distributed process of “learning” fault free paths, in spite of deceptivetechniques pursued by adversaries. The routing process creates and adjusts a probability distributionat each node for the node’s neighbors. The probability associated with a neighbor reflects the relativelikelihood of that neighbor forwarding and eventually delivering the packet to the destination.

It may appear that our Byzantine-inspired routing model may lead to impractical algorithms intypical (non-Byzantine) settings. However, ant-inspired routing algorithms were developed and testedin real network environments by British Telecomm and NTT for both wired and wireless networkswith superior results [11, 51, 16, 12, 44, 46, 9, 19]. AntNet, a particular such algorithm, was tested inrouting for data communication networks [11]. The algorithm performed better than OSPF, distributedBellman-Ford with various dynamic metrics, and various modifications of shortest path with a dynamiccost metric [8, 44].

2 Network and Security Model

In the following section we describe both our network and security models and any assumptions thatwe make. We also fully explain the adversarial model that we consider in this work and the challengesto mitigating the adversarial threats from both an algorithmic and security stand point.

2.1 Security Assumptions

Authentication and Symmetric Cryptographic Primitives. We assume existence of efficientcryptographic primitives. This requires pairwise shared keys between the sender and other nodes inthe network 1 which are established on-demand. The public-key infrastructure used for authenticationcan be either completely distributed (as described in [26]), or Certificate Authority (CA) based. Inthe latter case, a distributed cluster of peer CA’s sharing a common certificate and revocation listcan be deployed to improve the CA’s availability. However, the issues of establishing authenticationmechanisms are beyond the scope of this paper.

No trusted parties assumed. This work does not assume the existence of any trusted servers orrouters. In this work we consider only the source and the destination to be trusted. The only assumptionis that any intermediate node on the path between the source and destination can be authenticatedand can participate in the protocol. Nodes that can not be authenticated do not participate in theprotocol.

Authenticated Link State Routing We assume the existence of authenticated link state routinginformation. The authentication is used to prevent external adversaries from participating in the rout-ing protocol. However, byzantine adversaries can still influence the routing protocol, so the informationis not trusted. This routing information is used as a starting point which is provided to our protocolas input.

1We discourage group shared keys since this is an invitation for impersonation in a cooperative environment.

2

Page 3: Provably Secure Competitive Routing against …...Provably Secure Competitive Routing against Proactive Byzantine Adversaries via Reinforcement Learning Baruch Awerbuch, David Holmer,

2.2 Considered Attacks

Even authenticated nodes may exhibit byzantine behavior at times. The goal of our protocol is notto explicitly detect byzantine behavior (which is essentially impossible) but rather to avoid byzantineparticipants in such a way that the traffic losses are nearly optimal.

We define byzantine behavior as any action by an authenticated node that results in disruption ordegradation of the routing service. We assume that an intermediate node can exhibit such behavioreither alone or in collusion with other nodes. More generally, we use the term fault to refer to anydisruption that causes significant loss or delay in the network. A fault can be caused by byzantinebehavior, external adversaries, lower layer influences, and certain types of normal network behaviorsuch as bursting traffic.

2.3 Adversary models overview

We now provide the list of all the possible adversarial models captured by our work. We would liketo classify adversaries based on a number of dimensions. In each dimension, we rank each adversarybased on its power. We then address the adversary which is the “strongest” in each dimension.

Malignancy of the attacks:

• A benign adversary can drop packets, e.g. exercise a remote Denial of Service (DOS) attack, butcannot infiltrate routers and cannot not engage in active attacks such as modifying or miss-routing(tunnelling) messages or acknowledgements.

• Static malicious adversary is the next level up in the adversarial hierarchy. This class of attacksallows the adversary to permanently control a static subset of the routers. An adversarial networknode (router) may employ the following attacks:

– Libel: assigning the blame to its neighbors for the failure to transmit or receive packets.

– Impersonation: pretending to be somebody else.

– Miss-routing: sending a message in the wrong direction.

– Topology-distortion: incorrectly advertising topology information to draw traffic into itself(dropping it later).

– Ack-distortion: attempting to send fake acknowledgements to confuse the protocol

and, basically do anything else an adversary can think of to confuse and disrupt the routingalgorithm.

• Proactive malicious adversaries: is the ultimate adversary that we would like to confront. It maytake control of a dynamic subset of routers at different times, getting in and out, completely atwill, in spite of these routers being previously authenticated. While a router or edge is “possessed”by an adversary, it executes any of the byzantine or DOS attacks above.

Collusion power of adversary:

• The easiest case involves no collusion: each adversarial router acts on its own.

• The hardest possible case that we would like to handle in this paper is complete collusion: alladversarial routers act as a team, engaging in simultaneous collusion with other routers which areunder adversarial control (e.g., enabling “tunnelling” or “wormhole” attacks, as well as “switchingblame” between compromised routers for the purpose of avoiding detection.)

3

Page 4: Provably Secure Competitive Routing against …...Provably Secure Competitive Routing against Proactive Byzantine Adversaries via Reinforcement Learning Baruch Awerbuch, David Holmer,

Adaptivity/Intelligence of the adversary:

• The easiest model is that of a completely static adversary. This adversary designs a static faultpattern ahead of time, while knowing the algorithm, but without knowing any coin flips used bythe algorithm. It applies some pattern of attacks to all the packets on a given router (edge). Itcompletely destroys all static deterministic or randomized algorithms, such as min-hop routingpath, random path, random selection among all edge-disjoint paths, etc.) However, some of theexisting work on byzantine routing handles such an adversary, e.g, [6].

• Dynamic adversary (which is the focus of this paper) designs a dynamic router infiltration patternahead of time, with full knowledge of the routing algorithm, but without knowledge of recent coinflips used by the algorithm. Each compromised router may decide to forward or to drop packetsat will, e.g., forwarding all the “path establishment” or “topology update” packets, and thendropping the actual traffic. This adversary can completely destroy any deterministic routingalgorithm e.g., that attempts to collect traffic statistics and choose the next path based on lowestfault rate on previously sampled paths, e.g., RON [4] and ODSBR [6]. The only known algorithmthat can succeed against this type of adversary is a “learning” style algorithm as in [7]; howeverthat protocol was only designed for DOS attacks and fails in the case of byzantine behavior.

2.4 System and Security challenges

Authentication is of little use. Although one might assume that, once authenticated, a nodeshould be trusted, there are many scenarios where this is not appropriate. For example, a miss-configured router might be authenticated and participating in the routing protocol, but unable tosuccessfully deliver packets. In addition, a router could be under attack and unable to deliver packetsto destinations it has advertised. Thus, we focus on providing routing survivability under a dynamic“proactive” [20] adversarial model, where dynamically changing subsets of intermediate nodes canperform arbitrary byzantine attacks such as creating routing loops, miss-routing packets on non-optimalpaths, or selectively dropping packets (black hole). Only the source and destination nodes are assumedto be trusted.

Maintaining link-state databases is expensive and not useful. While this sounds counter-intuitive, the standard proactive “link-state” routing model, where topology updates are being broad-cast through the network, is not applicable to this type of byzantine setting. Indeed, nothing preventsbyzantine nodes from claiming perfect connectivity with every node in the network and then com-pletely destroying all traffic. It is provably impossible under certain circumstances, for example when amajority of the nodes are malicious, to attribute a byzantine fault occurring along a path to a specificnode, even using expensive and complex byzantine agreement. Consequently, the topology databaseused in link state methods cannot be trusted and must be verified by the protocol.

2.5 Algorithmic challenges

Past performance is no guarantee of future success. The problem is that we are makingdecisions “online”, where the online algorithm needs to service requests by selecting actions while havingonly information regarding past requests, and no (or very limited) information regarding the futurerequests. There are no stochastic assumptions about the way the adversary is acting or the sequenceof fault patterns generated. Moreover, it is also assumed that there is a powerful adversary thatgenerates the worst input sequence for the online algorithm. A fundamental question regarding onlinealgorithms is how to evaluate their performance. It is rare, and in some cases outright impossible thatone deterministic algorithm always outperforms another deterministic algorithm. One classical method

4

Page 5: Provably Secure Competitive Routing against …...Provably Secure Competitive Routing against Proactive Byzantine Adversaries via Reinforcement Learning Baruch Awerbuch, David Holmer,

has been to assume a stochastic model regarding the environment, and compare the performance ofdifferent algorithms on the same model. However, in an adversarial setting, there is no reason why theadversary should follow the rules of the stochastic model we invented - if anything, the adversary willdo exactly the the opposite of what our model would predict. (There is no reason to hope that theadversary will not know our model.)

One may be tempted to think that converging to the best fixed route may be easily accomplishedby utilizing a “greedy” heuristic: keep track of past error rates on different links, and simply select thepath with the smallest overall failure rate, namely the sum of the failure rates of its links. For example,existing work on routing in overlay networks, such as RON [4] runs a greedy strategy of windows of aspecific size. The intuition is very strong: extrapolate the past failure pattern into the future. This maywork if failure pattern is static or if there is a statistical model of failures. However this method failsquite spectacularly in the case of dynamic adversaries. For example, consider 100 options of choosing arelay point between the sender and receiver, which define 100 different paths. At time i path i (modulo100) is under control of the adversary, and is failing all packets; at all other times the path is perfect.The greedy algorithm will always pick the worst path, even though at any moment in time only 1% ofthe paths are faulty.

To see the principle flaw in this greedy strategy, consider the performance of an investor who tries tofollow the best performing stock on the stock market, with the naive assumption that “past performanceis a guarantee of future results”. In fact, in an adversarial setting, e.g., the stock market, it may verywell be the case that past performance correlates negatively with future results, and algorithms mustnot be fooled into this easy trap.

Our path selection must withstand competitive analysis. In general, there may not be anideal fault-free path, but yet there may be the best path in terms of accumulated loss on that path.Our goal is to make path selections, such that our overall performance is comparable to that of the“best path”.

Our goal is thus to find a robust randomized algorithm that works well on all inputs, in the sensethat the expected behavior of our algorithm is comparable to the optimum fixed path on each input.Our goal is to design a distributed algorithm for sending messages on paths with the overall smallestloss ratio. The contribution of this paper is a novel distributed algorithm for route selection andits proof of optimality in the sense that the total number of messages lost in our algorithm exhibitsa very small additive gap with optimum prescient route assignment. The optimum assignment ismade with complete knowledge of adversary actions, in the past, current, and future, and has infinitecomputational power. The only restriction on the optimum route is that it must change somewhat lessfrequently.

The loss of performance of the algorithm that we are seeking should be based on competitiveanalysis [47]. We compare the performance of our online algorithms to the best static off-line. Namelythe off-line algorithm selects a fixed path and uses it during all time steps. Our results bound thedifference between the cost of the best static off-line and our online algorithms. This difference is alsoreferred to in the learning literature as regret.

For example, consider a network with potentially 200 users, around 1000 potential links and aguarantee of existence of one fixed path of 7 hops between sender and receiver that has an average linkfault rate of 0.01 %, i.e. 99.99% of the time the whole path is reliable. The only problem is selectingone out of around 2007 possible paths, while only having information about past experiments overthese paths. In this case, one can construct a counter-example in which greedy may never succeed indelivering a single message, i.e. has a 100% fault rate, in spite of thoroughly selecting the best of the2007 paths so far! The explanation to this counter-intuitive fact is that any deterministic algorithmcan be easily fooled by an adversary, forcing it to pick the currently worst path that always fails.

5

Page 6: Provably Secure Competitive Routing against …...Provably Secure Competitive Routing against Proactive Byzantine Adversaries via Reinforcement Learning Baruch Awerbuch, David Holmer,

In contrast, the “competitive” randomized algorithm that we are seeking should be able to “zeroin” on the reliable path, or at least get comparable performance. The algorithm we are seeking shouldguarantee, for any adversarial behavior (subject to the adversary’s “pledge” to keep some path oflength 7 hops being 99.99% reliable on each link), a fault rate of just below 7 · 0.01% = 0.07% overlong sequences (this is averaged over the coin tosses of our algorithm).

Randomness is not a guarantee of success. Another “quick fix” is to try to select a randompath. There is a misconception in the community that selecting one of many node-disjoint paths, orselecting a random path will somehow guarantee success. (The stock-market equivalent of this belief isasserting that a stock index such as the NASDAQ 100 cannot go down by much.) There is somethingappealing about random strategies in the sense that they allow the spreading of risk; what is wrongwith the above quick fix is that there is no attempt to adapt the probabilities. In fact, it is easy togenerate counter-examples to either one of these policies. For example, consider a faulty edge leadinginto a dense subnetwork; generating paths at random pretty much guarantees hitting this faulty edge.

Multiple paths do not guarantee success. In has been generally recognized that sending packetsalong “multiple paths” is more “fault-tolerant” than sending packets over a single path. However, it iseasy to come up with a example involving a collection of edge-disjoint paths (e.g., disjoint paths in anexpander where the error probability is constant) such that each path has at least one fault, and yet arandom path is likely to not have any fault with high probability.

2.6 Contribution of this paper

General statement. The main result of this paper is a routing protocol that combines all thefollowing features:

• flooding free delivery: no flooding of data packets is taking place even under heavy traffic load.Each packet is sent over a selected path of certain bounded length to its destination.

• near-optimum loss: for every adversarial pattern our algorithms achieves near-optimal loss onthat sequence.

The routing protocol presented in this work attempts to provide throughput competitive routeselection against an adversary which is (essentially) not limited; more specifically, the adversary benefitsfrom complete collusion of adversarial nodes, can engage in arbitrary byzantine behavior and can mountarbitrary adaptive attacks, dynamically changing its attack with each new packet.

It appears from the description of the model that no possible solution exists against such a strongadversary. Indeed our model does allow the adversary to compromise the whole network. Now, oncethe adversary is completely in control of everybody, it appears that our algorithm cannot possiblyaccomplish any packet deliveries. If the adversary wiped out the whole network than surely the failureto communicate is not being attributed to a naive routing algorithm, but rather to the information-theoretic inability to accomplish communication by any algorithm, including the optimal. In this case,our algorithm should be “off the hook”. More dangerous to us are situations where a “post-mortem”analysis indicates that it was indeed possible to route the messages by a “reasonable benchmark”strategy, but our algorithm in fact failed to match the performance of such reasonable strategy interms of overall packet loss.

6

Page 7: Provably Secure Competitive Routing against …...Provably Secure Competitive Routing against Proactive Byzantine Adversaries via Reinforcement Learning Baruch Awerbuch, David Holmer,

Quantitative definition of optimality. Informally, we pick as a benchmark a fixed strategy (bestfixed path) that knows the pattern of the attack of the dynamic adversary, and has optimized itsselection in advance. In fact, a stronger version of our results can be proved, by picking a somewhatstronger dynamic benchmark, which will in fact be allowed to change, albeit less frequently than thealgorithm (this is described below). We still need to formally define the “best path”.

Definition 2.1 We consider a path to be “instance-optimal” if, for a given schedule of adversarialinfiltration/attacks on routers/network edges, processors, it minimizes, among all possible paths, theaverage number of faults, attacks, message distortions or other adversarial action on a packet’s path.Let this sum be the called “hypothetical loss” of a path.

Definition 2.2 The “hypothetical regret” of a routing strategy is the worst case normalized difference,over all possible adversarial actions between the following two factors:

• the actual loss of our routing strategy, and

• the above hypothetical loss of instance-optimal path.

Notice that the benchmark strategy and our strategy are both evaluated on the same input instance,namely the particular instance of adversarial behavior. The difference is that our algorithm does notknow the adversarial behavior, while the benchmark strategy was custom-tailored for this instance.

Our algorithm is outlined in Sec. 3 and formally described in section B. The following theorem isan informal statement of Thm. A.1.

Theorem 2.3 For all ǫ > 0, the algorithm accomplishes regret upper-bounded by ǫ.

Extension to dynamic benchmarks. The statement of Thm. 2.3 can be strengthened to dynamicbenchmarks. Let us make the following notations: E is the set of the edges in the network, and ǫsa isan arbitrarily chosen constant, d is degree of the network, and H is the length of routing paths beingconsidered, and ǫsa, ǫex is the error tolerance parameters that we choose (e.g,, ǫsa = ǫex = 0.01).

More precisely, we will allow our dynamic benchmark to change every τben = T ·τpha packets, where

τpha = Ω(|E|

ǫsa) and T =

H2

ǫex2· log d.

In contrast, our algorithm will changes its policy each τpha packets. In other words, we expect ouralgorithm to “learn” the best path after T phases, each phase lasting τpha packets. Consider a scheduleof possessed processors (i.e., a time-table indicating at which time each router is being possessed),which is unknown to our routing strategy, but may be known to somebody else. The statement ofThm. 2.3 applies in this case as well.

3 Major ideas of our algorithm

The algorithm we present is executed at each source node with respect to a specific destination. Theactual data packets are source routed and the integrity of the packets is protected using a reverseordered HMAC technique described in Section 4. Every source is maintaining its own graph andprobabilities of reaching specific destinations. This information is not shared between nodes becauseof the byzantine adversarial model which is assumed. Source nodes only rely on other nodes in thenetwork to forward packets and return acknowledgments. It is also assumed that the nodes may choosenot to forward the packets or not to return acknowledgements and the source adjusts its probability

7

Page 8: Provably Secure Competitive Routing against …...Provably Secure Competitive Routing against Proactive Byzantine Adversaries via Reinforcement Learning Baruch Awerbuch, David Holmer,

distribution accordingly. It is important to note that this approach does not rely on intermediary nodesto make hop by hop routing decisions as in other approaches. The following description provides anoverview of the major ideas of our algorithm.

We outline below the main components of our algorithm, postponing formal presentation to SectionB and its proof to section C.

Establishing a feedback mechanism. We require that each packet be acknowledged using a se-cure acknowledgement scheme by the destination and each intermediate node along the path. If anadversary lies on the path it can disrupt the packet or the ack propagation. As a result of our secureacknowledgement process, any adversarial disruption results in the following: the source receives anack from the last intermediate node before the disruption, indicating the first edge on the path (we callit the separator) that appears to have failed. (See Sec.B.2.) We refer to the edges between the sourceand the separator as lucky. The separator and all the edges between the separator and the destinationare considered unlucky. This feedback mechanism allows the algorithm to establish a reliability history.

The intuition behind this approach is that the reliability history of a given link in the network,reflects not only the reliability of that link, but rather the sources ability to traverse the path up to andincluding the given link. For example, if a link in the network is completely reliable, but the best pathleading to that link is unreliable, then the reliability history of the link will be low under our scheme.The end result is that the reliability history of a link correlates with the connectivity of its endpointto the source.

Randomized path strategy. Given a set of paths and their reliability history, a strategy for selectinga path is required. The greedy strategy would always pick the path with the best reliability history.However, as previously discussed, the greedy strategy is deterministic and thus vulnerable to adversarialmanipulation. An oblivious randomized strategy would simply spread packets evenly over all theavailable paths. This type of oblivious strategy will be immune to adversarial manipulation, but willachieve poor performance in the case where the quality of paths is heterogenous (i.e. when some pathsare much better than the average). Our strategy, as in [7], is to use a random path, but to weight thelikelihood of using a path by the exponent of that path’s reliability. Given a loss rate x for a path,βx will be the weight assigned to that path, where β is a small value less than one. The value of β iscrucial for the performance of our algorithm. As β approaches zero, our decision sequence resemblesthat of the greedy strategy, and thus will be vulnerable to adversarial attacks. As β approaches one, ourdecision sequence will resemble that of the inefficient oblivious random strategy. Using an appropriatesmall value of β allows our strategy to preference efficient paths like the greedy strategy, but maintainenough randomness to avoid adversarial manipulation. The optimal value of β can be determined usingthe “best expert” framework of [30, 13]. (See Sec. B.3.) More intuition can be obtained by work onthe non-stochastic multi-armed bandit problem [5] and its distributed analog in [7].

Path selection algorithm. In order to realize our exponentially weighted randomized strategy, anefficient algorithm for generating the sequence of paths from a general graph is necessary. In a smallnetwork, it is possible for the source to simply list all possible paths, compute the weight for each, andperform a normal weighted random selection. However, since the number of paths in the network cangrow exponentially with the number of nodes, this simple technique is not scalable. Our solution is touse a two step process in order to generate the weighted random paths.

The first step is to transform the general undirected network graph (e.g. Fig. 1) into a layereddirected graph from the sender to the receiver (Fig. 2). The receiver resides only in layer zero of thislayered graph. All nodes that can reach the receiver in i hops and can be reached by the sender inH − i hops are represented in layer i of the graph. The sender is represented in layer H of the graph,

8

Page 9: Provably Secure Competitive Routing against …...Provably Secure Competitive Routing against Proactive Byzantine Adversaries via Reinforcement Learning Baruch Awerbuch, David Holmer,

s

a

r

b

c

Figure 1: The original network.

s3 s

2

a3 a

2

b2

s4

s1

c1

r0

Figure 2: Constructed layered graph.

and an edge from the sender to itself in the general graph is assumed. The resulting layered graph canbe computed in polynomial time, and represents the union of all possible paths of H hops or less fromthe sender to the receiver. The purpose of this layered directed graph is to prune out all infeasiblepaths in order to simplify the next step of the path selection process. Byzantine adversaries may stillcreate false links which will not be pruned by this step. In the worst case they may cause the networkto seem fully connected, in which case all nodes except the receiver will have representatives in all butthe first and last layers. This will slow the convergence time of the algorithm but does not affect itsoptimality.

The second step uses a weighted parent selection process starting from the receiver in order togenerate the random path. The weight of each incoming edge to a node in the layered graph representsthe source’s ability to communicate with that node over the edge. Thus, the last hop in a path fromthe sender to a given node is chosen by a weighted random selection from the set of incoming edges.Recursive application of this step builds a complete path from the sender to the receiver. Since all pathsin the layered graph originate from the sender and the reliability history of each link represents thefull path reliability to the sender, weighted random paths can be selected using this efficient recursiveprocess.

Sampling unexplored territory. A subtle but important component of the algorithm is its abilityto sample edges in the network, since this is the only way it is able to detect changes in the adversarialfault pattern. This means that the algorithm is probabilistically deviating from the optimal routes inorder to make sure that feedback is obtained from each edge in the network. This is reminiscent ofthe classical multi-armed bandit algorithms in [5] and its extension to network routing in [7], but ourimplementation here is quite different.

This deviation is accomplished by forcing the path to pass, with small specified probability, over acompletely random edge. This is accomplished by generating the path in segments. The first segmentis generated by using the above technique to find a path from the sender to the start of the link, thesecond segment is the link itself, and the final segment also uses the above technique to find the pathfrom the end of the link to the receiver.

Let us give some intuition on why this is desirable. By allowing β to be small the algorithm willmove quickly to the least fault path, but will only utilize even slightly less reliable paths with lowprobability. In order to ensure that the algorithm is making correct decisions, it continuously needs tohave updated feedback on the other paths in the network. This is accomplished by random sampling.Ideally, if every data packet was used as a sampling packet, meaning the sampling rate was 100%, thealgorithm would have extremely high loss rates from exploring faulty paths, but would have extremelyaccurate estimates of the current reliability of paths in the network. On the other hand if the algorithmonly samples paths with 1% of its traffic, then it will be slightly slower in detecting changes, but willbe sending a higher percentage of its packets over links that do not fail.

9

Page 10: Provably Secure Competitive Routing against …...Provably Secure Competitive Routing against Proactive Byzantine Adversaries via Reinforcement Learning Baruch Awerbuch, David Holmer,

It is important to note that the algorithm samples with actual data packets and not small ex-ploratory test packets. This is critically important since we are working under a strong byzantineadversarial model. It is assumed that if we send test packets, the adversaries will be aware that we aretesting links and allow the packets to pass through the network. This would increase our probability ofselecting the adversarial controlled links. By utilizing real data packets, the only way for the adversaryto “draw us in” would be to allow enough data packets to successfully pass through its links, that itslinks appeared more reliable then other links in the network. However, our goal is to deliver packetsfrom the source to the destination. If the adversary chooses to deliver these packets, then it is notexhibiting adversarial behavior. If the adversary decides to increase its fault rate as we increase therate at which we utilize its links, the algorithm will immediately respond since it would not be receivingacks for packets.

4 Byzantine Fault Detection

Our detection algorithm is based on using acknowledgments (acks) of the data packets. If a valid ackis not received within a timeout, it is assumed that the packet has been lost. Note that this definitionof loss includes both malicious and non-malicious causes. A loss can be caused by packet drop due tobuffer overflow, packet corruption, a malicious attempt to modify the packet contents, or any otherevent that prevents either the packet or the ack from being received and verified within the timeout.We would like to emphasize that our protocol provides fault avoidance and never disconnects nodesfrom the network. Thus, the impact of false positives, due to normal events such as bursting traffic,is drastically reduced. This provides a much more flexible solution than one where nodes are declaredfaulty and excluded from the network.

4.1 Fault Detection Overview

Our fault detection protocol requires the destination and each intermediate node to return an ack tothe source, for every successfully received data packet. We use shared keys between the source andeach intermediate node as a basis for our cryptographic primitives in order to avoid the prohibitivelyhigh cost of using public key cryptography on a per packet basis. These pairwise shared keys can beestablished on-demand via a key exchange protocol such as Diffie-Hellman [18], authenticated usingdigital signatures.

4.2 Source Route Specification

The mechanism for specifying the source route on a packet is essential for the correct operation of theprotocol. The source route is specified as a list of addresses, where the order of the addresses indicatesthe path which the packet will traverse. The list is “onion” encrypted [52]. Each route is specified byan address indicating the next hop, an HMAC of the packet (not including the list), and the encryptedremaining list. Both the HMAC and the encrypted remaining list are computed with the shared keybetween the source and that node. An HMAC [3] using a one-way hash function such as SHA1 [1] anda standard block cipher encryption algorithm such as AES [2] can be used.

When a node receives a packet it verifies the HMAC of the packet and decrypts the remainder of thesource route using its shared key with the source. If the HMAC verifies, the node proceeds to forwardthe packet to the next hop. Since the source route is onion encrypted the intermediate nodes are onlyaware of the next hop and do not know the full path that the packet will traverse. This prevents theadversary from strategically dropping packets which are intended to traverse specific nodes. It can droppackets based on the selected next hop, but this will only incriminate the adversary, thus decreasing

10

Page 11: Provably Secure Competitive Routing against …...Provably Secure Competitive Routing against Proactive Byzantine Adversaries via Reinforcement Learning Baruch Awerbuch, David Holmer,

Source Destination

0.05

0.10 0.04

0.03

1 3

5

42

0

Figure 3: Simple Simulation Topology

the probability that the source selects it as a forwarding node.

4.3 Acknowledgment Specification

For each successfully received data packet, the destination generates an ack packet containing thesequence number of the data packet and an HMAC for authentication. Each probe appends its ownHMAC to the ack and forwards it along the reverse path towards the source. Timeouts are set at everynode so that if the ack is not received, the node gives up waiting and generates its own ack packet. Inorder for the protocol to locate the fault, only the node immediately preceding the fault location shouldtimeout and generate an ack. To accomplish this, a staggered timeout scheme is utilized. A simpletechnique for implementing the staggered timeout scheme is to use a fixed parameter that representsan upper bound on the amount of time it takes for a packet to traverse a single link. A node may thencompute its timeout duration by multiplying this parameter by the number of hops needed to traversethe remainder of the path to the destination and return back to the current position along the path.

When the source receives the ack packet, it attempts to verify the accumulated HMACs startingfrom the end of the packet. The first HMAC, if any, that does not verify, defines the link on which aloss is registered, i.e. the protocol registers a loss on the link between the last valid HMAC and the firstencountered invalid HMAC. If the source times out the ack, a loss is registered on the link between thesource and the first node. The result of this ack creation, timeout, and verification technique, is thatif an adversary drops, modifies, or delays either an ack or data packet, then a loss will be registered onthe affected link.

5 Implementation Results

In order to substantiate the claims made in this work, simulations were conducted to investigate theconvergence time of the algorithm. The simulations were conducted by developing a simple programwhich would simulate the decision making process of the algorithm and examine its performance againstadversarial inputs. The simulation consisted of a source selecting a path to the destination at each unitof time. An adversarial model would then select which nodes at the current unit of time were faulty.The packet would traverse the graph and receive positive feedback from the destination if there wereno faulty nodes on the path, or from the last non-faulty node before the packet was dropped. Usingthis feedback the algorithm would adjust its probabilities and compute a new path for the next packet.

5.1 Simple Configuration

The first set of simulations consisted of 10,000 packets which were sent from the source to the des-tination. The nodes were arranged as indicated in figure 3. The intermediary nodes are labelled 0

11

Page 12: Provably Secure Competitive Routing against …...Provably Secure Competitive Routing against Proactive Byzantine Adversaries via Reinforcement Learning Baruch Awerbuch, David Holmer,

Beta = 0.5 Sample Rate = 0.10

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

1 1001 2001 3001 4001 5001 6001 7001 8001 9001 10001

Packets Sent

Perc

en

t o

n L

ink 0 to 1

0 to 2

1 to 3

1 to 4

2 to 3

2 to 4

3 to 5

4 to 5

Beta = 0.1 Sample Rate = 0.10

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

1 1001 2001 3001 4001 5001 6001 7001 8001 9001 10001

Packets Sent

Perc

en

t o

n L

ink 0 to 1

0 to 2

1 to 3

1 to 4

2 to 3

2 to 4

3 to 5

4 to 5

Beta = 0.05 Sample Rate = 0.10

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

1 1001 2001 3001 4001 5001 6001 7001 8001 9001 10001

Packets Sent

Pe

rce

nt

on

Lin

k 0 to 1

0 to 2

1 to 3

1 to 4

2 to 3

2 to 4

3 to 5

4 to 5

Beta = 0.05 Sample Rate = 0.01

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

1 1001 2001 3001 4001 5001 6001 7001 8001 9001 10001

Packets Sent

Perc

en

t o

n L

ink 0 to 1

0 to 2

1 to 3

1 to 4

2 to 3

2 to 4

3 to 5

4 to 5

Figure 4: Simple Configuration Results

through 5 and their fault rates are indicated. At every unit of time each node was probabilisticallyselected to fail based on the nodes fault rate. On this particular simple configuration the optimalpath would be from the source to node 1, from node 1 to node 3, and then from node 3 to node 5.The results of the simulation are the sources estimated link preference metrics at every unit of time.The experiment was run for different values of β, which is the base of the exponent described in thealgorithm specification. As the value of β decreases, the algorithm responds faster to changes and isable to converge faster. However, as the algorithm responds faster it begins to resemble the greedyalgorithm which has vulnerabilities. In order to explore the effects of β on the convergence time of thealgorithm, multiple experiments were run with the same adversarial input. The results are indicatedin Figure 4. The graphs in this figure show the probability of the source selecting a specific edge atevery unit of time.

With β=0.5 the results show that the source is slowly realizing which path is correct and is sendinga majority of its traffic over the correct sequence of nodes. Notice that once the algorithm convergedit was sending approximately 90% of its traffic across the first hop to node 1 which was experiencinga 5% loss rate and only 10% of its traffic to node 2 which had a 10% loss rate. While this seemsreasonable, notice the convergence when β=0.1 or better yet 0.05. With these values the source is ableto completely differentiate between nodes and converge quickly on the best path. Since the algorithmis continuously exploring sub-optimal paths with a small fraction of its traffic, this low value of β allowsit to react quickly as the adversary changes its fault pattern. While the optimal algorithm might knowthe fault pattern ahead of time, the algorithm we present is able to follow it very closely. If the speedat which the adversary moves is slightly slower then our decision making process, then our algorithmwould in fact follow the optimal at every step.

In order to investigate the effects of the sampling percentage on the convergence time an additional

12

Page 13: Provably Secure Competitive Routing against …...Provably Secure Competitive Routing against Proactive Byzantine Adversaries via Reinforcement Learning Baruch Awerbuch, David Holmer,

0

4

5

7

8

9

10

11 14

16

17

19

20

21

22

23

24

25

1

2

3 6 15

13

DestinationSource

12 18

Figure 5: Large Simulation Topology

Beta=0.05 Sample Rate=0.01

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

1 1001 2001 3001 4001 5001 6001 7001 8001 9001

Packets Sent

Perc

en

t o

n L

ink

Beta=0.05 Sample Rate=0.10

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

1 1001 2001 3001 4001 5001 6001 7001 8001 9001

Packets Sent

Perc

en

t o

n L

ink

Figure 6: Large Simulation Results

experiment was done using the same simple configuration described above, but with the sampling ratedecreased from 10% to 1%. The results of this experiment are also indicated in Figure 4. It appearsthat the lower sampling rate inhibits the algorithms ability to converge as well as the 10% samplingrate.

5.2 Large Configuration

The previous example provided a demonstration of the effects of various parameters on the performanceof our algorithm. While the previous example showed the convergence of the algorithm, it is somewhatless interesting since the example consisted of a small set of both nodes and links. In this secondexample we will consider a network with 25 nodes forming a 10 layered graph, with 3 nodes at eachlayer (except at the source and destination). The topology of the network is indicated in Figure 5. Inthis example there exists an optimal path from the source to the destination which experiences no loss.This optimal path is indicated in Figure 5 by the bold line. All other links in the network exhibit 10%loss, meaning that when we sample them they will successfully forward the packet 90% of the time.As a result, this should make it more difficult for our algorithm to discover the optimal path since thenon-optimal paths in the network only experience marginal loss. This simulation consisted of a sourceattempting to deliver 10,000 packets to the destination. The graphs in Figures 6 show the results whenwe are performing random sampling with 1% and 10% of our packets respectively. In this experimentβ equal to 0.05 was used.

The results indicate that the algorithm is able to successfully converge on the optimal path afterapproximately 1000 packets are sent. Once the algorithm learned the best path it was able to sendapproximately 99% of its traffic successfully to the destination. The graph visually indicates this byshowing the sources link preferences at every unit of time. When the simulation begins the sourceconsiders all of the links in the network to be equal and then learns their reliability by sending trafficacross the links and receiving feedback. As the number of packets (or trials) increases the sourcesknowledge of the network continuously becomes more accurate. This is evident as the reliable paths

13

Page 14: Provably Secure Competitive Routing against …...Provably Secure Competitive Routing against Proactive Byzantine Adversaries via Reinforcement Learning Baruch Awerbuch, David Holmer,

become separated from the less reliable paths and selected with near 100% probability. Since thesource is continuously sampling the less desirable edges it is able to respond quickly to changes in theadversarial fault pattern.

6 Existing Work

Existing Algorithmic Work: The only algorithmic results that possibly work under this strongadversarial model are based on a computational learning framework [30, 13]. Near optimal learningalgorithms, with reliable global information for finding a shortest path in a graph, where at eachtime a different known cost is assigned to each edge, were studied in [30, 13]; these solutions havean exponential computational overhead. Polynomial computational overhead schemes were recentlysuggested by [53, 28]. The solutions in [30, 13] and [53, 28] correspond to “link-state” routing, andthe case where the adversary can only exercise dynamic DOS attacks, but cannot cheat. Byzantinebehavior causes such algorithms to collapse. Most recently, “on-demand” routing against an obliviousadversary was suggested in [7]. In this model, an adversary cannot cheat and the pattern of cheatingand blocking is assumed to be oblivious, i.e., this work does not handle dynamic DOS attacks.

Reinforcement Learning: Interest in applications of ant-based routing in MANETs has risen andmany recent papers have addressed the subject [32, 8]. Gunes et al [33] considers ant-based approachto routing in MANETs, with a completely reactive algorithm. Marwaha et al. [45] studies a hybridapproach using both AODV and reactive Ant-based exploration. Baras et al [8] describes a newalgorithm that utilizes the inherent broadcast nature of wireless networks to multicast control andsignaling packets (ants). ARAMA [32] uses analogous approach. Work on Swarm Intelligence isdescribed in [31, 11, 51, 16, 12, 44, 46, 9, 19, 32, 8].

Networking/Security: algorithms for MANETs. A substantial research effort has gone into thedevelopment of secure routing algorithms both for wireless MANETs and overlay networks. A numberof routing algorithms have been proposed for wireless MANET’s; these protocols can generally becategorized as either proactive or reactive. They include DSDV [39], AODV [40], DSR [27], and severalothers. Many researchers focused on securing classes of (wired) routing protocols such as link-state[23, 14, 56, 21] and distance-vector [50]. Others addressed in detail the security issues of well-knownprotocols such as OSPF [37] and BGP [49].

As mentioned in [23], source authentication is more of a concern in routing than confidentiality.Papadimitratos and Haas showed in [38] how impersonation and replay attacks can be prevented for on-demand routing by disabling route caching and providing end-to-end authentication using an HMAC [3]primitive which relies on the existence of security associations between sources and destinations. Dahillet al. [17] focus on providing hop-by-hop authentication for the route discovery stage of two well-knownon-demand protocols: AODV [40] and DSR [27], relying on digital signatures. Other significant worksinclude SEAD [24] and Ariadne [25] that provide efficient secure solutions for the DSDV [39] and DSR[27] routing protocols, respectively. While SEAD uses one-way hash chains to provide authentication,Ariadne uses a variant of the Tesla [42] source authentication technique to achieve similar securitygoals.

Marti et al. [34] address a problem similar to the one we consider, survivability of the routingservice when nodes selectively drop packets. They take advantage of the wireless cards promiscuousmode and have trusted nodes monitoring their neighbors. Links with an unreliable history are avoidedin order to achieve robustness. Although the idea of using the promiscuous mode is interesting, thissolution does not work well in multi-rate wireless networks because nodes might not hear their neighbors

14

Page 15: Provably Secure Competitive Routing against …...Provably Secure Competitive Routing against Proactive Byzantine Adversaries via Reinforcement Learning Baruch Awerbuch, David Holmer,

forwarding communication due to different modulations. In addition, this method is not robust againstcollaborating adversaries.

Collaborating adversaries can be handled by new work of Mittal and Vigna (CCS 2002) [35] whichplaces “sensors” in various locations, monitoring both topology updates as well as network traffic inorder to detect adversarial behavior. In order to decide whether a routing update is malicious or not,a router needs to have reliable, nonlocal topology information. Reliance on external sensors is needed,because, unfortunately, RIP routers do not have this information. It is necessary to check every possiblepath (there are exponentially many of them) from every router to every other router, since adversariesmay engage in selective message blocking. The obvious drawback is the introduction of the sensors;second is reliance on their reliability. Also, message overhead of the algorithm grows with (potentiallyexponential) number of paths.

The problem of source authentication for routing protocols was explored using digital signatures[37] or symmetric cryptography based methods: hash chains [23], chains of one-time signatures [56]or HMAC [21]. Intrusion detection is another topic that researchers focused on, for generic link-state[55, 43] or OSPF [54].

Perlman [41] designed the Network-layer Protocol with Byzantine Robustness (NPBR) which ad-dresses denial of service at the expense of flooding and digital signatures. The problem of byzantinenodes that simply drop packets (black holes) in wired networks is explored in [15, 10]. The approachin [15] is to use a number of trusted nodes to probe their neighbors, assuming a limited model andwithout discussing how probing packets are disguised from the adversary. A different technique, flowconservation, is used in [10]. Based on the observation that for a correct node the number of bytesentering a node should be equal to the number of bytes exiting the node (within a threshold), theauthors suggest a scheme where nodes monitor the flow in the network. This is done by requiring eachnode to have a copy of the routing table of their neighbors and reporting the incoming and outgoingdata. Although interesting, the scheme does not work when two or more adversarial nodes collude.

Using digital signatures to verify packet paths has been considered for distance vector, link stateadvertisements and BGP [48, 36, 29]. Other work attempts to reduce the performance overheadassociated with securing routing protocols using digital signatures in context of efficiency and securityof updates in link state routing (Hauser et al. [22]).

7 Conclusion

In this work we have presented an online algorithm which is based on the Swarm Intelligence paradigmand Distributed Reinforcement Learning approach. Through mathematical analysis and simulationresults we have shown the algorithms competitive performance under a strong adversarial model con-sisting of dynamic proactive adversarial attacks, where the network may be completely controlled bybyzantine adversaries.

The results of this work indicate validity of our approach and motivate the need for future workin this direction. We intend on implementing this protocol in a more realistic simulation environmentand exploring the effects of both mobility and more sophisticated active adversarial attacks on thealgorithm.

15

Page 16: Provably Secure Competitive Routing against …...Provably Secure Competitive Routing against Proactive Byzantine Adversaries via Reinforcement Learning Baruch Awerbuch, David Holmer,

References

[1] Secure Hash Standard (SHA1). Number FIPS 180-1. National Institute for Standards and Technology (NIST), 1995.http://www.itl.nist.gov/fipspubs/fip180-1.htm.

[2] Advanced Encryption Standard (AES). Number FIPS 197. National Institute for Standards and Technology (NIST),2001. http://csrc.nist.gov/encryption/aes/.

[3] The Keyed-Hash Message Authentication Code (HMAC). Number FIPS 198. National Institute for Standards andTechnology (NIST), 2002. http://csrc.nist.gov/publications/fips/index.html.

[4] David G. Andersen, Hari Balakrishnan, M. Frans Kaashoek, and Robert Morris. Resilient overlay networks. InProceedings 18th ACM SOSP, Banff, Canada, 10 October 2001.

[5] Peter Auer, Nicolo Cesa-Bianchi, Yoav Freund, and Robert E. Schapire. Gambling in a rigged casino: the adversarialmulti-armed bandit problem. In Proceedings of the 36th Annual Symposium on Foundations of Computer Science,pages 322–331. IEEE Computer Society Press, Los Alamitos, CA, 1995.

[6] Baruch Awerbuch, Dave Holmer, Cristina Nita-Rotaru, and Herb Rubens. An on-demand secure routing protocolresilent to byzantine failures. In Wireless Security Workshop Proceedings, September 2002.

[7] Baruch Awerbuch and Yishay Mansour. Online learning of reliable network paths. In PODC, 2003. to appear.

[8] John S. Baras and Harsh Mehta. Dynamic adaptive routing in manets. In Procedings of Annual ARL CTA Symposium,2003.

[9] E. Bonabeau, F. Henaux, S. Guerin, D. Snyers, P. Kuntz, and G. Theraulaz. Routing in telecommunications networkswith “smart” ant-like agents. In Intelligent Agents for Telecommunications Applications ’98 (IATA’98), 1998.

[10] Kirk A. Bradley, Steven Cheung, Nick Puketza, Biswanath Mukherjee, and Ronald A. Olsson. Detecting disruptiverouters: A distributed network monitoring approach. In IEEE Symposium on Security and Privacy, 1998.

[11] G. Di Caro and M. Dorigo. AntNet: a mobile agents approach to adaptive routing. Technical Report IRIDIA/97-12,Universite Libre de Bruxelles, Belgium, 1997.

[12] Gianni Di Caro and Marco Dorigo. Antnet: Distributed stigmergetic control for communications networks. Journalof Artificial Intelligence Research, 9:317–365, 1998.

[13] Nicolo Cesa-Bianchi, Yoav Freund, David P. Helmbold, David Haussler, Robert E. Schapire, and Manfred K. War-muth. How to use expert advice. In Proc. 25th ACM Symp. on Theory of Computing, pages 382–391, 1993. Toappear, Journal of the Association for Computing Machinery.

[14] Steven Cheung. An efficient message authentication scheme for link state routing. In The 13th Annual ComputerSecurity Applications Conference, pages 90–98, December 1997.

[15] Steven Cheung and Karl Levitt. Protecting routing infrastructures from denial of service using cooperative intrusiondetection. In New Security Paradigms Workshop, 1997.

[16] P. Druschel D. Subramaniam and J. Chen. Ants and reinforcement learning : A case study in routing in dynamicnetworks. In Proceedings of IEEE MILCOM, Atlantic City, NJ, 1997.

[17] B. Dahill, B.N. Levine, C. Shields, and E. Royer. A secure routing protocol for ad hoc networks. Technical Report01-37, Department of Computer Scinece, University of Massachusetts, August 2001.

[18] W. Diffie and M. E. Hellman. New directions in cryptography. IEEE Trans. Inform. Theory, IT-22:644–654, November1976.

[19] Marco Dorigo, Vittorio Maniezzo, and Alberto Colorni. The Ant System: Optimization by a colony of cooperatingagents. IEEE Transactions on Systems, Man, and Cybernetics Part B: Cybernetics, 26(1):29–41, 1996.

[20] Yair Frankel, Peter Gemmell, Philip D. MacKenzie, and Moti Yung. Optimal resilience proactive public-key cryp-tosystems. In IEEE Symposium on Foundations of Computer Science, pages 384–393, 1997.

[21] Michael T. Goodrich. Efficient and secure network routing algorithms. Provisional patent filing., January 2001.

[22] Ralf Hauser, Antoni Przygienda, and Gene Tsudik. Reducing the cost of security in link state routing. In In InternetSociety Symposium on Network and Distributed Systems Security,, pages 93–99, 1997.

[23] Ralf Hauser, Tony Przygienda, , and Gene Tsudik. Reducing the cost of security in link-state routing. In Symposiumof Network and Distributed Systems Security, 1997.

[24] Yih-Chun Hu, David B. Johnson, and Adrian Perrig. SEAD: Secure efficient distance vector routing for mobilewireless ad hoc networks. In The 4th IEEE Workshop on Mobile Computing Systems and Applications. IEEE, June2002.

[25] Yih-Chun Hu, Adrian Perrig, and David B. Johnson. Ariadne: A secure on-demand routing protocol for ad hocnetworks. In The 8th ACM International Conference on Mobile Computing and Networking, September 2002. Toappear.

16

Page 17: Provably Secure Competitive Routing against …...Provably Secure Competitive Routing against Proactive Byzantine Adversaries via Reinforcement Learning Baruch Awerbuch, David Holmer,

[26] J.-P. Hubaux, L. Buttyan, and S. Capkun. The quest for security in mobile ad hoc networks. In The 2nd ACMSymposium on Mobile Ad Hoc Networking and Computing, October 2001.

[27] David B. Johnson, David A. Maltz, and Josh Broch. DSR: The Dynamic Source Routing Protocol for Multi-HopWireless Ad Hoc Networks. in Ad Hoc Networking, chapter 5, pages 139–172. Addison-Wesley, 2001.

[28] Adam Kalai and Santosh Vempala. Efficient algorithms for the online decision problem. In Proc. of 16th Conf. onComputational Learning Theory, Wash. DC, 2003.

[29] S. Kent, C. Lynn, J. Mikkelson, and K. Seo. Real world performance and deployment issues. In Secure BorderGateway Protocol (SecureBGP). In Proceedings of the Symposium on Network and Distributed System Security,February 2000.

[30] Nick Littlestone and Manfred K. Warmuth. The weighted majority algorithm. Information and Computation,108:212–261, 1994. A preliminary version appeared in FOCS 1989.

[31] M. Littman and J. Boyan. A distributed reinforcement learning scheme for network routing. In Proceedings of theInternational Workshop on Applications of Neural Networks to Telecommunications. Alspector, J., Goodman, R. andBrown, T. X. (Ed.), pages 45–51, 1993.

[32] U. Sorges M. Gunes and I. Bouazizi. The ant colony based routing algorithm for manets. In Proc. of the 2002 ICPPWorkshop on Ad Hoc Networks (IWAHN 2002), pages 79–85. IEEE Computer Society Press, 2002.

[33] U. Sorges M. Gunes and I. Bouazizi. ara - the ant colony based routing algorithm for manets. In Proc. of the 2002ICPP Workshop on Ad Hoc Networks (IWAHN 2002), pages 79–85. IEEE Computer Society Press Stephan Olariuedt., 2002.

[34] S. Marti, T. Giuli, K. Lai, and M. Baker. Mitigating routing misbehavior in mobile ad hoc networks. In The 6thInternational Conference on Mobile Computing and Networking, August 2000.

[35] Vishal Mittal and Giovanni Vigna. Sensor-based intrusion detection for intra-domain distance-vector routing. InProceedings of the CCS, pages 127 – 137. ACM, 2002.

[36] S. Murphy and M. Badger. Digital signature protection of the ospf routing protocol, 1996.

[37] S. L. Murphy and M. R. Badger. Digital signature protection of the OSPF routing protocol. In Symposium onNetworks and Distributed Systems Security, 1996.

[38] P. Papadimitratos and Z.J. Haas. Secure routing for mobile ad hoc networks. In SCS Communication Networks andDistributed Systems Modeling and Simulation Conference, pages 27–31, January 2002.

[39] Charles E. Perkins and Pravin Bhagwat. Highly dynamic destination-sequenced distance-vector routing (DSDV) formobile computers. In ACM SIGCOMM’94 Conference on Communications Architectures, Protocols and Applications,1994.

[40] Charles E. Perkins and Elizabeth M. Royer. Ad hoc Networking, chapter Ad hoc On-Demand Distance VectorRouting. Addison-Wesley, 2000.

[41] Radia Perlman. Network Layer Protocols with Byzantine Robustness. PhD thesis, MIT LCS TR-429, October 1988.

[42] Adrian Perrig, Ran Canetti, Dawn Song, and Doug Tygar. Efficient and secure source authentication for multicast.In Network and Distributed System Security Symposium, February 2001.

[43] Diheng Qu, Brian M. Vetter, Feiyi Wang, Ravindra Narayan, S. Felix Wu, Y. Frank Jou, Fengmin Gong, and ChandruSargor. Statistical anomaly detection for link-state routing protocols. In IEEE Symposium on Security and Privacy(5 Minutes), May 1997.

[44] O. E. Holland R. Schoonderwoerd and J. L. Bruten. Ant-like agents for load balancing in telecommunicationsnetworks. In Proc. First ACM International Conference on Autonomous Agents, Marina del Rey, California, pages209–216, 1997.

[45] C. K. Tham S. Marwaha and D. Srinivasan. Mobile agents based routing protocol for mobile ad hoc networks. In inProceedings of IEEE Globecom,, 2002.

[46] Ruud Schoonderwoerd, Owen E. Holland, Janet L. Bruten, and Leon J. M. Rothkrantz. Ant-based load balancingin telecommunications networks. Adaptive Behavior, 5(2):169–207, 1996.

[47] Sleator and Tarjan. Amortized efficiency of list update and paging rules. Communication of the ACM, 28(2):202–208,1985.

[48] B. Smith and J. Garcia-Luna-Aceves. Securing the border gateway routing protocol. In Proc. Global Internet’96,London, UK, 20-21 November 1996., 1996.

[49] B. Smith and J.J. Garcia-Luna-Aceves. Efficient security mechanisms for the border gateway routing protocol.Computer Communications (Elsevier), 21(3):203–210, 1998.

[50] Bradley R. Smith, Shree Murthy, and J.J. Garcia-Luna-Aceves. Securing distance-vector routing protocols. InSymposium on Networks and Distributed Systems Security, 1997.

17

Page 18: Provably Secure Competitive Routing against …...Provably Secure Competitive Routing against Proactive Byzantine Adversaries via Reinforcement Learning Baruch Awerbuch, David Holmer,

[51] Devika Subramanian, Peter Druschel, and Johnny Chen. Ants and reinforcement learning: A case study in routingin dynamic networks. In IJCAI (2), pages 832–839, 1997.

[52] Paul F. Syverson, David M. Goldschlag, and Michael G. Reed. Anonymous connections and onion routing. In IEEESymposium on Security and Privacy, 1997.

[53] Eiji Takimoto and Manfred K. Warmuth. Path kernels and multiplicative updates. In COLT Proceedings, 2002.

[54] S.F. Wu, H.C. Chang, D. Qu, F. Wang F. Jou, F. Gong, C. Sargor, and R. Cleaveland. JiNao: Design andimplementation of a scalable intrusion detection system for the OSPF routing protocol. Journal of Computer Networksand ISDN Systems, 1999.

[55] Shyhtsun F. Wu, Fei yi Wang, Brian M. Vetter, W. Rance Cleaveland, Y. Frank Jou, Fengmin Gong, and Chan-dramouli Sargor. Intrusion detection for link-state routing protocols. In IEEE Symposium on Security and Privacy,1997.

[56] K. Zhang. Efficient protocols for signing routing messages. In Symposium on Networks and Distributed SystemsSecurity, 1998.

Appendix

A The Formal Model

A.1 Routing strategy.

Consider an overlay network G(V,E) with a source s and receiver r, and a hop count H. At each integertime t ∈ N, source sends packet to the receiver over a path of H hops at the most. We normalize ourtime units in such a way that transmission of a packet over H hops takes 1 unit of time at the most.

Let Π(s, r) be collection of paths from s to r. A H-routing algorithm Π consists of, for each timet ∈ N, a deterministic or randomized routing policy πt. A deterministic routing strategy specifies amapping, for each t ∈ N,

πt : V × V h → E

whose arguments are:

• current router v ∈ V

• prefix of the current routing path Π′ ∈ V h, that leads from source s to v

and whose value, e = π(v, t,Π′) is an outgoing edge e = (v → u) ∈ E which is the next edge to betaken by the packet. A probabilistic routing strategy specifies a mapping

ζ : V × V h × E → (0, 1)

which specifies the probability ζ(v, t,Π′, e) of continuing on edge e, after arriving at router v throughrouting path Π′.

The deterministic routing strategy is a special case of the probabilistic strategy, and thus in therest of the paper we describe probabilistic strategies. The description of the strategy, namely mappingζ, is carried in the packet.

A routing strategy is memoryless if it is independent of path traversed by the packet. An algorithmis τ -stationary if its strategy is changing at most once in τ steps.

18

Page 19: Provably Secure Competitive Routing against …...Provably Secure Competitive Routing against Proactive Byzantine Adversaries via Reinforcement Learning Baruch Awerbuch, David Holmer,

A.2 Adversary Model

We denote by A(Π) the strategy of the adversary against routing policy Π. It obliviously to actions ofthe algorithm, but aware of its routing policy. At each moment in time t ∈ N, the adversary picks asubset of nodes under its control, VA(t) ⊂ V . Let UA(t) ∈ E be the subset of edges incident to VA(t).

For each node v and each time t, we define

λA(e, t) =

1 if e ∈ UA(t)1 otherwise

(1)

For each time t ∈ N, the adversary picks an infiltration strategy

λA : N→ 2E

of the subset of edges, and a blocking strategy γA

and whose value γA(t, e,Π′, π) ∈ 0, 1 is the decision whether to block the packet or not to blockit. Namely,

γA(t, e,Π′, π) =

1 if e ∈ Ut and e blocks at time t0 otherwise

Note that e /∈ Ut means that e is not controlled by adversary and there is no way to block thispacket.

The values γA(t, e,Π′, π) are generally not known to the algorithm. Our secure acknowledgmentscheme alleviates this problem.

A.3 Evaluating the algorithm

Formalizing the benchmark cost. Consider now our benchmark strategy ΠA against a givenadversary A that must be fixed in some interval, say (0, τben). Specifically, this strategy consists ofa path Π = (e1, e2, . . . , eH) picked by the benchmark algorithm. For any 0 ≤ t ≤ τben, define theinstantaneous faultiness of such path

λ(Π, t) =H

i=0

λA(ej , t)

Best path ΠA is minimizing loss for A:

ΠA = argminΠExpt∈0,τben[λ+(Π, t)]

and our benchmark is Expt∈0,τben[λ(ΠA, t)].

Formalizing the algorithm cost. Let our algorithm B pick pick path ΠB(t) = (e1( t), e2(t), . . . , ej−1(t))

to edge e at layer j at time t. Note that the loss factor for our path at time t, for an edge e is

ψ(ΠB(t)) = 1−j

i=1

(1− γA(t, ej(t),Πj(t), π))

The algorithms average loss is Expt∈0,τalg[ψ(Π(t))].

19

Page 20: Provably Secure Competitive Routing against …...Provably Secure Competitive Routing against Proactive Byzantine Adversaries via Reinforcement Learning Baruch Awerbuch, David Holmer,

The error term. Our loss is maximum, overall selection of A and all selection of ΠA of

δB = maxA,ΠA

Expt∈0,τalg[ψ(ΠB(t))− λ(ΠA, t]

Theorem A.1 The algorithm B in section B accomplishes δB ≤ ǫ for all ǫ in window τalg = τben · τpha.Here,

τpha = Ω(|E|

ǫsa) (2)

T =H2

ǫex2· log d (3)

B Details of the Algorithm

B.1 Transforming a graph into a layered directed graph

Suppose that a packet starts at source s, and while traversing the network carries a hop count. Whenpacket arrives to node v after traversing i hops, we consider the packet as if it arrives at “virtual” nodevH−i. We can consider a directed levelled acyclic “virtual” graph G′(V ′, E′) where

V ′ = vi|v ∈ V, i ≤ H

andE′ = vi → ui+1|(u, v) ∈ E

Suppose we have a sender s, a receiver r, and a hop count H.

V 0 ← r0repeat j = 0 downto H − 1

V j+1 ← vj+1 | ∃uj ∈ Vj , (u, v) ∈ E

Ej+1 ← vj+1 → uj | ∃uj ∈ Vj , (u, v) ∈ E

remove all edges/vertices unreachable from sH

Figure 7: Algorithm for constructing layered graph.

We observe that a directed levelled graph G′ of depth H simulates any communication where packettraverses a bounded number of hops H ≤ |V |. Thus, for the rest of the paper, we consider, withoutloss of generality, our network graph to be levelled, directed, acyclic, and every node is reachable fromthe source.

Now, we will imagine a node s and r which we refer to as the “source” and “sink” nodes respectively.The sink node r will be placed in layer 0 of the graph.

B.2 Feedback mechanism

For each packet sent at phase at time t, let Π(t) be the collection of nodes traversed by this packet. Forǫac fraction of all packets, each node will (indirectly) ack this packet, and the path will be partitionedinto sub-paths Π(t) = Π+(t)·Π−(t), so that edges (vj , vj−1) ∈ Π+(t) have acked, while edges (vj , vj−1) ∈Π−(t) have failed to ack.

20

Page 21: Provably Secure Competitive Routing against …...Provably Secure Competitive Routing against Proactive Byzantine Adversaries via Reinforcement Learning Baruch Awerbuch, David Holmer,

We also divide time into phases T1, T2, . . . TT, each of length τpha, so that t ∈ Ti is τpha · i ≤ t ≤τpha · (i+ 1).

Let Ti(e) = t ∈ T |e ∈ Π(t) be the set of times edge e appears on the path, and, similarly letT +

i (e) = t ∈ T |e ∈ Π+(t) be the set of times edge e appears on the path and succeeded to ack, andlet T −

i (e) = t ∈ T |e ∈ Π+(t) be the number of times e failed to ack. We define the “faultiness” ofedge (x, y) at time t as

ψ(x, y, t) =

1 if (x, y) ∈ Π−(t)0 if (x, y) ∈ Π+(t)undefined otherwise

(4)

and we defined the average of a function f(t) that is defined at times t1, . . . tk in phase Ti as

Expt∈Ti[f(t)] =1

k

k∑

i=1

f(ti) (5)

Also, the average of a function over phases, from phase 1 till phase j of a function f(i) will be

Expi∈0,i[f(j)] =1

j

j∑

i=1

f(i) (6)

Define average grade for (x, y) in a phase j as

ψ(x, y, Tj) = Expt∈Tj−1[ψ(x, y, Ti]) (7)

and let the average cumulative weight be

Ψ(x, y, Ti) = Expi∈0,i−1[ψ(x, y, Tj ]) (8)

This construction is used in [7].

B.3 Selecting Incoming probabilities

The weight of the link (x, y) is updated as

ξ(x, y, Ti)← βΨ(x,y,Ti)) (9)

Finally, each node v sets a probability of its incoming edges as

π(v, u, Ti)←ξ(v, u, Ti)

w∈I(v)

ξ(v, w, Ti)(10)

This construction is used in [7]. Below, we provide the code for selecting the path (which is run atthe source node); see Fig. 8.

21

Page 22: Provably Secure Competitive Routing against …...Provably Secure Competitive Routing against Proactive Byzantine Adversaries via Reinforcement Learning Baruch Awerbuch, David Holmer,

Path Generationi← Hvi ← r /* start at the sink */repeat H times

select vi−1 ← u with prob. π(vi, u,H − i)i← i− 1

Figure 8: Links state version of the algorithm. The above algorithm is performed at the source node.

B.4 Sampling the unexplored territory

Also, we can consider

ζr|e : E → (0, 1)

with the meaning of probability ζ(e)r|e is that, while routing message from s to r, and conditionedon passing through edge e, each node v chooses an outgoing edge e = (v → u) ∈ for u ∈ O(v) (whereO(v) are outgoing edges of v (with probability proportional to ζ(e)r|e.

Note that ζ(e)r|(u,v) can be computed according to following:

ζ(e)r|(u,v) =

ζ(e)u if e precedes (u, v)1 if e = (u, v)ζ(e)r if e follows (u, v)

(11)

That is, in layers before u we route towards u. Once we reach node u we route on (u, v) with prob.1. After crossing (u, v) we route towards r.

Consider now probability

ζǫ(e) = (1− ǫ)ζr(e) +∑

e∈E

ǫ

|E|· ζr|e(e)

This corresponds to using ζr for (1− ǫ) fraction of the time, and using ζr|e with random e for ǫ|E| of

the time. Let us choose ǫ = ǫsa. This is the final probability distribution used by our routing algorithm.

This technique has not been used in this form in the literature. It is inspired by [7] and the originalwork of Auer et al [5].

C Proof of Near-Optimality of Loss Fraction

For the analysis we will relate the number of times a node v was disconnected from the source s to itsdistance from the source. The proof would be by induction on the distance from the source. Considerfirst the operation of experts in different phases.

The following bound is derived using techniques in [30, 7]. Define the adversarial pattern

λ(e, t) =

1 if e is adversarrail at time t0 otherwise

(12)

22

Page 23: Provably Secure Competitive Routing against …...Provably Secure Competitive Routing against Proactive Byzantine Adversaries via Reinforcement Learning Baruch Awerbuch, David Holmer,

Given a path Π = (e1, e2 . . . eH) at time t, define lossiness of this path as the number of edges onthis paths under adversary control

λ(π, t) =H

i=1

λ(ei, t)

Theorem C.1 There exists a randomized algorithm, BEX(v), that selects edge, at phase j, from v tou ∈ I(v) with probability proportional to

βExpi∈0,j−1[Ψ(v,u,Ti)]

such that for any sequence of T phases at node v, holds

Expi∈0,i[Ψ(BEX(v)] ≤ O(

log d

T)+

minu∈I(v)

Expi∈0,T[λ(v, u, Ti) + Ψ(BEX(u), Ti)]

Proof. The optimal prescient algorithm has the opportunity to pick, among all u ∈ I(v), fixed parentu∗ minimizing

Expi∈0,T[λ(v, u, Ti) + ψ(BEX(u, Ti)]

and accomplish loss of at most

Expi∈0,i[v, u∗, Ti] ≤ λ(v, u∗, Ti) + Ψ(BEX(u), Ti)

The best expert algorithm BEX(v) is trying to minimize its loss associated with parent selection,and option u∗ is one of the options available to it. However, due to its online decision making process,

best experts learns adapts to best possible option while suffering an additive regret of O(

log d

T) (i.e.,

its loss is O(

log d

T) higher) on sequence of T steps. The proof of the additive bound on regret follows

from an argument similar to [30]. ⊓⊔

The following lemma is using a proof technique similar to [7].

Lemma C.2 Let v be a node at distance j from s and W an arbitrary spanning tree Then,

Expi∈0,T[Ψ(BEX, v, Ti)] ≤

Expi∈0,T[λ(W, v, Ti)] + j ·

log d

T

Proof. The proof is by induction on j. For j = 1 the claim is trivial. For j = 2 we are simplyrunning the BEX algorithm and the bound is directly from Theorem C.1. Assume the claim hold for jand show it for j + 1. Let I(v) be the neighbors of v in level j. By the induction hypothesis, we havethat u ∈ I(v) is disconnected from s at most

Expi∈0,T[Ψ(BEX, v, Ti)] ≤

Expi∈0,T[λ(W, v, Ti)] + j ·

log d

T)

23

Page 24: Provably Secure Competitive Routing against …...Provably Secure Competitive Routing against Proactive Byzantine Adversaries via Reinforcement Learning Baruch Awerbuch, David Holmer,

Let uW be the parent of v in W. Note that v has the options of choosing uW; by doing so, the fractionof times v will get disconnected is at most

Expi∈0,T[λ(W, uW, Ti)] + j ·

log d

T)

By choosing uW, the fraction of times that v will get disconnected is upper bounded by the sum offraction of times uW gets disconnected and edge from v to uW gets disconnected, which is, by definition,at most Expi∈0,T[λ(v, uW, Ti]. Thus, we get

Expi∈0,T[λ(W, uW, Ti) + λ(v, uW, Ti] + j ·

log d

T

≤ Expi∈0,T[λ(W, uW, Ti)] + (j + 1) ·

log d

T)

The claim follows.

⊓⊔

24