On Tracing Attackers of Distributed Denial-of-Service Attack through
Distributed Approaches
WONG, Tsz Yeung
A Thesis Submitted in Partial Fulfillment
of the Requirements for the Degree of
Doctor of Philosophy
in
Computer Science and Engineering
© The Chinese University of Hong Kong
September 2007
The Chinese University of Hong Kong holds the copyright of this thesis. Any
person(s) intending to use a part or the whole of the materials in the thesis
in a proposed publication must seek copyright release from the Dean of the
Graduate School.
Thesis/Assessment Committee
Professor YOUNG Fung Yu (Chair)
Professor WONG Man Hon (Thesis Supervisor)
Professor LEE Moon Chuen (Committee Member)
Professor LEONG Hong Va (External Examiner)
Abstract
The denial-of-service attack has been a pressing problem in recent years, and denial-of-service defense research has blossomed into one of the main streams of network security. Techniques such as the pushback message, ICMP traceback, and packet filtering are remarkable results from this active field of research.
The focus of this thesis is to study and devise efficient and practical algorithms to tackle flood-based distributed denial-of-service attacks (flood-based DDoS attacks for short), and we aim to trace every location of the attackers. In this thesis, we propose a novel, divide-and-conquer traceback methodology. Tracing back the attackers on a global scale is always a difficult and tedious task. Instead, we suggest that one should first identify the Internet service providers (ISPs) that contribute to the flood-based DDoS attack by using a macroscopic traceback approach. After the concerned ISPs have been found, the traceback problem is narrowed down, and the attackers can be located by using a microscopic traceback approach.
For the macroscopic traceback problem, we propose an algorithm that leverages the well-known Chandy-Lamport distributed snapshot algorithm so that a set of border routers of the ISPs can correctly gather statistics in a coordinated fashion. The victim site can then deduce the local traffic intensities of all the participating routers. Given the collected statistics, we provide a method for the victim site to locate the attackers who sent out the dominating flows of packets. Our findings show that the proposed methodology can pinpoint the locations of the attackers in a short period of time.
In the second part of the thesis, we study a well-known technique for the microscopic traceback problem. The probabilistic packet marking (PPM for short) algorithm by Savage et al. has attracted the most attention for contributing the idea of IP traceback. The most interesting point of this approach is that it allows routers to encode certain information in the attack packets with a predetermined probability. Upon receiving a sufficient number of marked packets, the victim (or a data collection node) can construct the set of paths the attack packets traversed (i.e., the attack graph), and hence the victim can obtain the locations of the attackers. In this thesis, we present a discrete-time Markov chain model that calculates the precise number of marked packets required to construct the attack graph.
Though the PPM algorithm is a desirable approach to the microscopic traceback problem, it is not perfect: its termination condition is not well defined in the literature. More importantly, without a proper termination condition, the traceback results could be wrong. In this thesis, we provide a precise termination condition for the PPM algorithm. Based on this condition, we devise a new algorithm named the rectified probabilistic packet marking algorithm (RPPM algorithm for short). The most significant merit of the RPPM algorithm is that, when it terminates, the constructed attack graph is guaranteed to be correct with a specified level of confidence. Our findings show that the RPPM algorithm guarantees the correctness of the constructed attack graph under different probabilities that the routers mark the attack packets and under different structures of the network graphs. The RPPM algorithm provides an autonomous way for the original PPM algorithm to determine its termination, and it is a promising means of enhancing the reliability of the PPM algorithm.
Abstract (in Chinese)

In recent years, the distributed denial-of-service attack has become a pressing problem, and research on defending against it has become a major topic in network security. This active field of research has produced many remarkable results, such as the pushback message technique, the ICMP traceback technique, and packet filtering techniques.

This thesis studies defenses against flood-based denial-of-service attacks ("flood attacks" for short) and designs practical, efficient algorithms against them. The main direction of this thesis is to find the launching points of flood attacks. We propose a novel, divide-and-conquer traceback technique for tracing where a flood attack is launched. Since flood attacks are often global in scale, tracing their launching points is often difficult and tedious. We therefore propose a two-step traceback scheme. In the first step, all Internet service providers cooperate to find out which providers contain the launching points of the flood attack; we call this step the macroscopic traceback approach. In the next step, once those providers have been identified, the concerned providers adopt the microscopic traceback approach to trace all the launching points of the flood attack inside their networks.

This thesis proposes a macroscopic traceback algorithm. The algorithm is built upon the well-known Chandy-Lamport distributed snapshot algorithm to perform the traceback in a distributed fashion, and we name it the snapshot traceback algorithm. The snapshot traceback algorithm runs on the border routers of the providers; following the algorithm, these routers cooperatively collect statistics and send them to the victim site of the flood attack. From the routers' statistics, the victim site can rank the attack traffic emitted by each provider and thereby identify the likely launching points of the attack. Our findings show that the snapshot traceback algorithm is efficient and can locate the launching points in a short time.

This thesis then investigates the microscopic traceback algorithm. The probabilistic packet marking algorithm (PPM algorithm for short) is a well-known IP traceback algorithm and is suitable as a microscopic traceback algorithm. A notable feature of the PPM algorithm is that, according to a predetermined probability called the marking probability, the routers inside a provider's network selectively encode information into packets. Once the victim site of the flood attack has received enough marked packets, the PPM algorithm can compute the paths traversed by the attack packets and hence locate the launching points of the flood attack. In this thesis, we derive a Markov chain model that lets the victim site calculate precisely the number of marked packets needed to reconstruct the attack paths correctly.

Although the PPM algorithm is an excellent microscopic traceback algorithm, it is unfortunately not a perfect one, because no existing work gives a precise definition of its termination condition. More importantly, if the termination condition of the PPM algorithm is wrong, its traceback result (i.e., the paths traversed by the attack packets) will be wrong. This thesis derives a precise termination condition for the PPM algorithm. Since the new termination condition changes the algorithm, we name the new algorithm the rectified probabilistic packet marking algorithm (RPPM algorithm for short). The most important merit of the RPPM algorithm is that it guarantees its traceback result is correct above a specified level of confidence. Our findings show that, under different marking probabilities and network structures, the RPPM algorithm guarantees that the traceback result meets the specified confidence level. In summary, the merit of the RPPM algorithm is that it brings an automated termination condition to the PPM algorithm, thereby improving the reliability of the PPM algorithm.
Acknowledgement
In completing this thesis, I am most grateful to my thesis advisor, Dr. Man-
hon Wong, and my former thesis advisor, Dr. John Chi-shing Lui, who have
been giving continuous support and guidance to me throughout the past five
years.
I am also glad to have my colleagues in the Department of Computer Science and Engineering, especially Mr. C. M. Lee, Mr. Ray Lam, Mr. Y. T. Ma, Mr. T. B. Ma, Mr. Y. K. Liu, Mr. Y. K. Hui, Ms. Catherine Zhou, and Dr. L. C. Lau. They have given me invaluable advice and support throughout my years of research life.
Last but not least, I am most glad to have Ms. Elaine Chan who has been
giving me unconditional love and the strength to get through the difficulties I
encountered.
Contents
1 Defense Against Denial-of-Service Attack
1.1 Overview of Attack Methodology
1.1.1 Vulnerability-based attack
1.1.2 Flood-based attack
1.1.3 Worm attack
1.1.4 Flash crowd
1.2 Scope of the Thesis
1.2.1 General assumptions
1.2.2 A divide-and-conquer traceback approach
1.3 Structure and Contribution of the Thesis
1.4 Related Work
1.4.1 Distributed Snapshot Algorithm
1.4.2 DDoS Defense Mechanisms

2 Distributed Snapshot Traceback Algorithm
2.1 Overview and Problem Definition
2.1.1 Overview
2.1.2 Problem definition
2.1.3 Traceback methodology
2.1.4 How to perform the traceback
2.1.5 Difficulties of a distributed traffic measurement
2.2 Distributed Algorithm
2.2.1 Reasons for incorrect traceback result
2.2.2 Measuring the correct local traffic
2.2.3 The distributed snapshot algorithm
2.2.4 Pseudocode and execution of snapshot algorithm
2.2.5 Example in calculating the traceback result
2.3 Interpreting the Traceback Result
2.3.1 Investigation of the traffic inequality
2.3.2 Calculating bounds for the number of packets arrived at the victim site
2.4 Performance Evaluations
2.5 Implementation Issues
2.5.1 Topology construction
2.5.2 System overhead
2.5.3 Implementation issue based on ICMP traceback
2.5.4 An alternative to aggregate congestion control and pushback
2.5.5 Special deployment - acyclic network
2.5.6 Partial deployment
2.6 Chapter Summary

3 Probabilistic Packet Marking Algorithm
3.1 Structure of This Chapter
3.2 Goal and Structure of the PPM Algorithm
3.2.1 Global network and attack graph
3.2.2 Constructed graph
3.2.3 Structure of the PPM algorithm
3.3 Assumptions
3.3.1 Marked packets and PPM markings
3.3.2 Router
3.3.3 Packet marking probability
3.3.4 Attack source and attack pattern
3.3.5 Attack graph and packet routing
3.4 Graph Reconstruction Example
3.4.1 Packet marking
3.4.2 Attack graph reconstruction
3.5 Chapter Summary

4 Termination Condition of PPM Algorithm
4.1 Using the Upper-Bound Packet Number as the Termination Condition
4.1.1 Failure under the multiple-attacker environment
4.1.2 Simulation findings
4.1.3 Chapter structure
4.2 Packet-Type Model
4.2.1 Packet-type probability
4.2.2 Pseudocode of the calculation of the packet-type probabilities
4.2.3 Illustration of the calculation of the packet-type probability
4.3 Using Markov Chain Model to Find the Sufficient Packet Number
4.3.1 The Markov process
4.3.2 Example on discrete-time Markov chain modeling
4.3.3 Fundamental matrix
4.3.4 Example on calculating E[X]
4.4 Disproving the Upper-Bound Packet Number as the Termination Condition
4.5 Chapter Summary

5 Rectified Probabilistic Packet Marking Algorithm
5.1 Structure of This Chapter
5.2 Overview of the RPPM Algorithm
5.2.1 Working principle
5.2.2 Flow of rectified graph reconstruction procedure
5.3 Execution Diagram of the RPPM Algorithm
5.3.1 Types of states
5.3.2 Types of transitions
5.3.3 Worst-case, average-case, and best-case scenarios
5.3.4 Role of the execution diagram
5.4 Derivation of Termination Packet Number
5.4.1 Technique
5.4.2 State-change probability
5.4.3 TPN derivation
5.4.4 Section summary and TPN calculation subroutine
5.5 Graph Reconstruction Example
5.5.1 State C1
5.5.2 State C2
5.5.3 State C3
5.6 Simulation Result
5.6.1 Simulation environment
5.6.2 Simulation: different values of the marking probability
5.6.3 Simulation: different graph structures
5.6.4 Section summary
5.7 Supporting Routers with Multiple Victim Routes
5.7.1 Problem of multiple victim routes
5.7.2 Formulating an extra set of extended graphs
5.7.3 Reformulation of packet-type probability
5.7.4 Simulation: support for multiple victim routes
5.7.5 Section summary
5.8 Deployment Issues of the RPPM Algorithm
5.8.1 Choice of the marking probability
5.8.2 Execution time comparison between the PPM and the RPPM algorithms
5.8.3 Scalability issue in PPM algorithm
5.8.4 Precision problem
5.9 Chapter Summary

Bibliography
List of Figures
1.1 The architecture of a typical flood-based DDoS attack.
1.2 The architecture of a reflector attack.
1.3 The overview of the divide-and-conquer traceback approach.
2.1 An example network topology.
2.2 Asynchronous reading of outgoing traffic counters in Example B.
2.3 Correct accumulative local traffic without clock synchronization.
2.4 An example execution of the snapshot algorithm.
2.5 Bji = 0 under all circumstances.
2.6 Aji is the channel state.
2.7 A network topology with two attackers who reside in the local domains of R3 and R4.
2.8 A timing diagram that shows the progress of the distributed snapshot traceback algorithm.
2.9 Classification of pre-monitoring, monitoring, and post-monitoring packets.
2.10 The channel state of Link3,2 contains pre-monitoring (monitoring) packets from both R3 and R4 in the first (second) instance of the snapshot algorithm.
2.11 (a) Network topology and (b) legend for Simulations A and B.
2.12 Simulation A.1: bounds for the real local traffic under a constant traffic rate.
2.13 Simulation A.2: the real local traffic under an exponential on/off process.
2.14 Simulation A.3: effect of multiple attackers on the real local traffic bounds.
2.15 Simulation A.4: effect of new attackers' locations.
2.16 Simulation A.5: on different attack traffic rates.
2.17 Simulation B: simulation for a large-scale Internet topology.
2.18 (a) An acyclic network with one attacker who resides in the local domain of R3. (b) R3 maintains two accumulative outgoing traffic counters C3,1(t) and C3,2(t) for the links Link3,1 and Link3,2, respectively.
2.19 Another timing diagram that shows the progress of the distributed snapshot traceback algorithm.
2.20 (a) The same example network as Figure 2.7 with attacking domains R3 and R4, but the router R3 is an undeployed router. (b) Logically, a virtual link between the routers R2 and R4 is formed.
2.21 A timing diagram that shows the progress of the DDoS traceback algorithm under the partial deployment environment.
2.22 (a) In this example network, the router R2 is an undeployed router while the others are deployed routers. (b) As the undeployed router is transparent to the traceback protocol, the router R1 records the channel states of the virtual links Link3,1 and Link4,1.
2.23 The timing diagram under a partial deployment environment. A drawback is that the channel states of the virtual links Link3,1 and Link4,1 become indistinguishable at router R1.
3.1 A typical case of a DDoS attack toward the victim V.
3.2 The illustration of an attack graph: (a) an attack graph is not the entire network; the attack graph is the paths traversed by attack packets; (b) the attack graph may become larger than the actual one due to the lack of legitimacy of the packets.
3.3 The pseudocode of the packet marking procedure of the PPM algorithm.
3.4 The pseudocode of the path reconstruction procedure of the PPM algorithm.
3.5 The failure of the router R1 causes the route tables of R2, R3, and R4 to change. This results in a constructed graph with routers having multiple outgoing edges.
3.6 A step-by-step illustration of the reconstruction of the attack graph based on the incoming packet sequence in Table 3.1.
4.1 A six-router binary-tree network: the upper-bound equation cannot be applied under this multiple-attacker environment.
4.2 An eight-router tree network with four independent linear paths: another multiple-attacker environment.
4.3 Simulation result: number of marked packets required versus number of independent paths.
4.4 An increasing yet chaotic trend of the rate of change of the number of marked packets required.
4.5 The pseudocode of the packet-type probability calculation: it calculates the packet-type probability of every edge in the graph G.
4.6 (a) Ga: a simple example linear network with three edges. (b) Gb: an example network with multiple paths leading from R3 and R4 to the victim.
4.7 Example network G1: a linear network with three routers and one victim.
4.8 Illustration of the Markov chain model of the PPM algorithm with network G1 in Figure 4.7.
4.9 The transition probability matrix of the Markov chain shown in Figure 4.8.
4.10 Simulation result versus theoretical result: for network G1 in Figure 4.7, we obtain two close sets of results for the distribution of the sufficient packet number X.
4.11 Example network G2: totally 16,384 Markov states.
4.12 Probability distribution of the sufficient packet number on the 14-router binary-tree network G2.
4.13 Fundamental matrix calculated by Equation (4.13) with the transition probability matrix P shown in Figure 4.9.
4.14 The comparison between the simulation and the theoretical results: both results disprove the linear property proposed by previous work.
5.1 The design goal of the RPPM algorithm: to have a correct constructed graph with probability greater than P*.
5.2 The pseudocode of the rectified graph reconstruction procedure of the RPPM algorithm.
5.3 An execution diagram of the rectified graph reconstruction procedure of the RPPM algorithm constructing a graph with n edges.
5.4 Extended graph example: a constructed graph and its set of extended graphs.
5.5 The pseudocode of the termination packet number (TPN) calculation subroutine.
5.6 State C1: a constructed graph with one edge, and its extended graphs.
5.7 State C2: a constructed graph with two edges, and its extended graphs.
5.8 The simulations show that the larger the marking probability is, the closer the simulation result is to the worst-case execution.
5.9 RPPM algorithm simulation: 15-node linear network with random marking probability.
5.10 RPPM algorithm simulation: 14-router binary-tree network with random marking probability.
5.11 RPPM algorithm simulation: 14-router random-tree network with random marking probability.
5.12 RPPM algorithm simulation: 100-router random-tree network with marking probability 0.1.
5.13 RPPM algorithm simulation: 500-router random-tree network with marking probability 0.1.
5.14 RPPM algorithm simulation: 1,000-router random-tree network with marking probability 0.1.
5.15 When the routers have more than one victim route, the RPPM algorithm cannot guarantee the correctness of the constructed graph when the confidence level is larger than 0.59.
5.16 An illustration of the extended graph with the support of multiple victim routes.
5.17 The pseudocode of the packet-type probability calculation subroutine that supports multiple victim routes.
5.18 With the support for multiple victim routes, the RPPM algorithm can provide the guarantee of the correctness of the constructed graph.
5.19 Average number of marked packets required for a correct graph reconstruction against different values of the marking probability.
5.20 Average number of total packets (marked plus unmarked) required for a correct graph reconstruction against different values of the marking probability.
5.21 The number of marked packets recorded for the set of RPPM algorithm simulations carried out on a random-tree network with 14 routers.
5.22 The number of marked packets recorded for the set of RPPM algorithm simulations carried out on random-tree networks with 50 routers and 100 routers.
5.23 The number of marked packets recorded for the set of RPPM algorithm simulations carried out on random-tree networks with 500 routers and 1,000 routers.
5.24 The percentage increase in the number of marked packets when comparing the RPPM algorithm to the PPM algorithm at different network scales.
5.25 Scalability analysis: average number of marked packets collected by the PPM algorithm versus the size of the attack graph.
5.26 The pseudocode of repeating the RPPM algorithm to increase the runtime probability.
List of Tables
2.1 Computation of the accumulative local traffic in Example A by using Equation (2.1).
2.2 Computation of Li(t1, t2), the local traffic within [t1, t2], in Example A by using Equation (2.2).
2.3 Computation of accumulative local traffic at times ti,1 and ti,2.
2.4 The local traffic intensity counts only the packets in between the two instances of the snapshot algorithm.
3.1 A sequence of packets collected by the victim.
4.1 Packet-type probabilities for Ga in Figure 4.6.
4.2 Packet-type probabilities for Gb in Figure 4.6: after the path (R3, R2, R1, v) of Gb is considered.
4.3 Packet-type probabilities for Gb in Figure 4.6: after both paths (R3, R2, R1, v) and (R4, R1, v) of Gb are considered.
5.1 The marked packet-type probabilities of the extended graphs G1,1 and G1,2.
5.2 The marked packet-type probabilities of the extended graphs G2,1, G2,2, and G2,3.
5.3 The average number of packets and time required to form a correct constructed graph in a 100BaseT Ethernet.
Chapter 1
Defense Against
Denial-of-Service Attack
“If you know your enemies and know yourself, you will win a hundred times in a hundred battles.” — The Art of War, Sun Tzu.
The emergence of the Internet as a pervasive form of communication has led to the recent enormous deployment of E-business and information distribution services. However, the success of the Internet also attracts malicious attackers who abuse system resources and expose the inherent security problems of the Internet. The distributed denial-of-service (DDoS) attack is one of the most pressing problems on the Internet. Well-known commercial sites such as Yahoo!, Amazon, and eBay were attacked and were out of service for many hours during a series of DDoS attacks in February 2000 [1, 2]. Since then, DDoS attacks have increased in size, frequency, sophistication, and severity.
In this chapter, we examine what a distributed denial-of-service attack is. We dissect the methodologies of common DDoS attacks in Section 1.1. After we are familiar with the nature of DDoS attacks, we define the scope of this thesis in Section 1.2: to trace the locations of the attackers of a DDoS attack. In the same section, we suggest our approach against DDoS attacks on a worldwide scale, which we name the divide-and-conquer traceback approach. In Section 1.4, we introduce previous work related to this thesis. Roughly speaking, this covers the methodologies that will be introduced in later chapters, including the distributed snapshot algorithm, the packet filtering technique, and the IP traceback technique.
1.1 Overview of Attack Methodology
The goal of a DDoS attack is to degrade or even disable the service(s) provided by the target. In the example attack case in [1], the targeted services are the web services provided by Yahoo!, CNN, and Amazon. We classify DDoS attacks in terms of the attack methodology. A denial-of-service attack can be realized by either of two techniques:
1. exploiting vulnerabilities in network protocols and software; and
2. leveraging a high volume of address-spoofed, bogus traffic.
We name the former type of attack the vulnerability-based attack and the latter type the flood-based attack. These two kinds of attacks are usually mixed together in order to bring about a large amount of damage.
Note that an attacker always wants to disguise himself or herself as a set of legitimate users. There is a loophole in the TCP/IP protocol suite: no component, device, or authority on the Internet can check the identity of any packet sent. Say the attacker is sending a packet from a machine with address A; he or she can easily change the source address of the packet to address B without anyone noticing. We name this kind of packet a spoofed packet, since its source address is spoofed.
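To make this loophole concrete, the following sketch forges a minimal IPv4 header whose source field carries an arbitrary address. The addresses and helper names are illustrative choices of ours, not part of the thesis; the point is simply that no field in the header binds the "source" address to the machine that actually sends the packet, and a receiver recomputing the checksum still sees a perfectly valid header.

```python
import socket
import struct

def ipv4_checksum(header: bytes) -> int:
    """One's-complement sum over 16-bit words, per RFC 791."""
    if len(header) % 2:
        header += b"\x00"
    total = sum(struct.unpack("!%dH" % (len(header) // 2), header))
    while total >> 16:                      # fold carries back into 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def build_ipv4_header(src: str, dst: str, payload_len: int = 0) -> bytes:
    """Build a minimal 20-byte IPv4 header. `src` can be ANY address:
    nothing in the header ties it to the sending machine."""
    header = struct.pack(
        "!BBHHHBBH4s4s",
        (4 << 4) | 5,            # version 4, header length = 5 words
        0,                       # type of service
        20 + payload_len,        # total length
        0, 0,                    # identification, flags/fragment offset
        64,                      # TTL
        6,                       # protocol: TCP
        0,                       # checksum placeholder
        socket.inet_aton(src),   # spoofed source address
        socket.inet_aton(dst),   # victim's address
    )
    checksum = ipv4_checksum(header)
    return header[:10] + struct.pack("!H", checksum) + header[12:]

# The real sender may be 10.0.0.1, yet this header claims to be 192.0.2.7.
spoofed = build_ipv4_header("192.0.2.7", "203.0.113.5")
```

A header built this way verifies correctly at any receiver (the checksum over all twenty bytes folds to zero), which is precisely why the source address alone cannot be trusted for traceback.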
The advantage of sending spoofed packets is that the attacker keeps his or her location secret, so DDoS countermeasures cannot target him or her so easily. Though this exploits a vulnerability of the TCP/IP protocol suite, we choose not to classify attacks using spoofed packets as vulnerability-based attacks, because every attack uses spoofed packets. Henceforth, throughout the text, we always assume that every attacker sends spoofed packets.
1.1.1 Vulnerability-based attack
In the following sections, we introduce two severe kinds of vulnerability-based attacks. This kind of attack leverages flaws in protocol designs and defects in software. Once such vulnerabilities are exploited, the service provided by the victim is shut down or degraded.
TCP-SYN flood attack
The TCP-SYN flood attack [3] (or SYN attack) is an infamous vulnerability-based attack. Though the attack carries the word "flood," what it does is exploit a vulnerability in the implementation of SYN packet handling in the TCP/IP protocol. In a nutshell, this attack targets the three-way handshake of the TCP protocol [4]. The attack brings down a host by flooding it with enough spoofed SYN packets that they occupy all the available connections of the host. Eventually, no resources are left for further connections. The countermeasure to this threat is SYN cookies, introduced in [5]. Nowadays, most operating systems already have SYN cookies implemented inside the kernel.
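A minimal sketch of the SYN-cookie idea may help: instead of storing per-connection state for half-open connections, the server folds the connection identifiers into the initial sequence number it sends back, and validates the final ACK against a recomputation. The field layout, hash, and MSS table below are our own simplified assumptions, not the exact scheme of [5] or of any real kernel.

```python
import hashlib
import socket
import struct

SECRET = b"per-boot random secret"          # assumed: any per-boot secret key
MSS_TABLE = [536, 1220, 1440, 1460]         # coarse MSS values, 2 bits to pick one

def _cookie_hash(saddr, daddr, sport, dport, counter, mss_index):
    """Keyed hash over the connection 4-tuple, a slow counter, and the MSS."""
    data = struct.pack("!4s4sHHQB", saddr, daddr, sport, dport, counter, mss_index)
    return struct.unpack("!I", hashlib.sha256(data + SECRET).digest()[:4])[0]

def make_syn_cookie(saddr, daddr, sport, dport, mss_index, counter):
    """Encode connection state into a 32-bit initial sequence number:
    30 bits of keyed hash, 2 bits of MSS-table index. The server thus
    holds no memory for the half-open connection."""
    h = _cookie_hash(saddr, daddr, sport, dport, counter, mss_index)
    return ((h & 0x3FFFFFFF) << 2) | mss_index

def check_syn_cookie(cookie, saddr, daddr, sport, dport, counter):
    """On the final ACK of the handshake, recompute the cookie; a match
    recovers the negotiated MSS with no stored state. Returns the MSS,
    or None if the cookie is forged or stale."""
    mss_index = cookie & 0x3
    if make_syn_cookie(saddr, daddr, sport, dport, mss_index, counter) == cookie:
        return MSS_TABLE[mss_index]
    return None

client = socket.inet_aton("192.0.2.7")
server = socket.inet_aton("203.0.113.5")
cookie = make_syn_cookie(client, server, 40000, 80, 2, counter=17)
```

Spoofed SYN packets then cost the server nothing: a half-open entry is materialized only when a final ACK arrives carrying a cookie that validates, which an attacker who never sees the SYN-ACK cannot produce.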
Low-rate TCP attack
In [6], the authors proposed and realized a new form of attack that targets the congestion control mechanism of the TCP protocol. The attacker carefully orchestrates periodic attack bursts to exploit the fixed minimum TCP retransmission timeout so as to shut off most, if not all, legitimate TCP flows. Though there are no incident reports on the low-rate TCP attack yet, solutions [7, 8, 9] have already been proposed in the literature.
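The timing argument can be made concrete with a toy simulation. The constants below (a 1 s minimum RTO, 150 ms bursts, 50 ms round-trip pacing) are illustrative assumptions, and the model is our own sketch of the effect rather than the analysis in [6]: when the burst period matches the minimum RTO, every retransmission wakes up inside the next burst, while the same bursts on a mismatched period barely dent the flow.

```python
MIN_RTO = 1.0     # seconds: TCP's fixed minimum retransmission timeout (assumed)
BURST_LEN = 0.15  # the attacker floods the bottleneck for 150 ms per burst

def delivered_fraction(burst_period, duration=60.0):
    """Simulate one TCP flow competing with square-wave attack bursts that
    repeat every `burst_period` seconds. An attempt landing inside a burst
    is lost and the flow backs off by MIN_RTO; otherwise it proceeds at
    normal RTT pacing. Returns the fraction of successful attempts."""
    t, sent, ok = 0.0, 0, 0
    while t < duration:
        sent += 1
        if (t % burst_period) < BURST_LEN:  # attempt falls inside a burst
            t += MIN_RTO                    # loss: retry one minimum RTO later
        else:
            ok += 1
            t += 0.05                       # success: continue after one RTT
    return ok / sent

# Bursts synchronized with MIN_RTO starve the flow; a mistimed period does not.
synced = delivered_fraction(1.0)
mistimed = delivered_fraction(3.7)
```

Because the flow's retry timer always snaps to the minimum RTO, a burst train with exactly that period keeps the flow's throughput at zero while the attacker stays silent about 85% of the time, which is what makes the attack "low-rate."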
1.1.2 Flood-based attack
The flood-based attack aims to disable a victim host by leveraging a high volume of spoofed traffic. Once this type of DDoS attack is launched, the victim experiences an increasing load. The service is usually impacted significantly, and there are cases in which victims have broken down entirely. To realize such an attack, on the order of tens of thousands of computers are needed to place a significantly large burden on the victim. In reality, the attacker cannot own resources at such a scale and therefore steals them.

The attacker usually obtains computing resources by compromising a large number of computers in order to launch a large-scale flood-based attack. This can be realized by exploiting known vulnerabilities in widespread operating systems such as Microsoft Windows. When such an exploitation is done, the attacker usually gains the highest privilege on the compromised computer and can perform whatever acts he or she likes. We call those compromised computers zombies [10, 11]. Although the attack involves the technique of exploiting vulnerabilities, that technique is not the payload of the DDoS attack; the exploitation neither brings the zombies down nor degrades their computing performance.
Zombie attack
Once a large group of zombies has been gathered, the attacker loads attack
programs to the zombies. The zombies are then turned into unwitting attack-
ers, and the DDoS attack is then launched. Figure 1.1 shows the deployment
scenario and the entities involved in a DDoS attack using zombies [12]. The
attacker seated in front of his or her own computer controls a set of handlers
that are, again, obtained by exploiting vulnerabilities. These handlers are used
Figure 1.1: The architecture of a typical flood-based DDoS attack.
to control the zombies so that the attacker can become stealthy during the at-
tack. The zombies are the ones that are sending spoofed traffic to the victim.
There are occasions when, during an outbreak, the Internet becomes
paralyzed because this kind of attack usually targets widespread software.
It is always difficult to hunt down the attacker of a zombie attack. The
attacker always protects the communication between the handlers and the
zombies by encrypting the communication channels [12]. What we can do is
to ask the Internet service providers (ISPs) to help locate and filter the attack
traffic so as to ease the pain of the victim. Also, replacing legacy and buggy
software is a crucial step to reduce the number of handlers and zombies that
can be obtained by attackers. Moreover, intrusion detection systems (IDS)
[13, 14] should always be installed in order to detect and stop intrusions by
attackers promptly and effectively.
Reflector attack
There is another kind of automated attack using a similar architecture called
the reflector attack [15]. As shown in Figure 1.2, the main feature of this attack
is that the zombies are not attacking the victim directly but through a set of
Figure 1.2: The architecture of a reflector attack.
reflectors.
The zombies send spoofed packets with the source addresses set to the
victim’s address and the destination addresses set to the reflectors’ addresses.
The reflectors are usually some public servers, such as domain name servers,
and the content of the spoofed packets is usually a request for service from the
reflectors. The reflectors will then generate replies without knowing that the
requests are frauds. As a result, the reflectors send the replies to the victim as
the source addresses of the requests are set to the victim’s address.
The reflector attack is, therefore, by its nature, more detrimental than
using the zombie attack model alone because:
1. it amplifies the effect of the DDoS attack. Let us imagine that the at-
tacker has only one zombie. By sending spoofed packets to different
reflectors, one zombie is already enough to attack the victim in a dis-
tributed way;
2. it also degrades the services provided by the reflectors. During the re-
flector attack, the reflectors are loaded by the requests from the zombies,
and this degrades the services provided by the reflectors; and
3. it is more difficult to trace. Since the reflected flows come from
innocent hosts (given that the reflectors are not compromised), a traceback
can readily be carried out, but it only reveals the reflectors, not the
attack sources.
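The reflection mechanism itself can be sketched in a few lines; the addresses below are illustrative, and the dictionaries merely stand in for real IP packets.

```python
VICTIM = "203.0.113.7"                                # illustrative addresses
REFLECTORS = ["198.51.100.%d" % i for i in range(1, 5)]


def make_request(spoofed_src, reflector):
    # The zombie forges the source address to be the victim's.
    return {"src": spoofed_src, "dst": reflector, "type": "request"}


def reflect(packet):
    # A reflector answers any request it receives; the reply goes to
    # the packet's (forged) source address.
    return {"src": packet["dst"], "dst": packet["src"], "type": "reply"}


replies = [reflect(make_request(VICTIM, r)) for r in REFLECTORS]
# Every reply converges on the victim, although the zombie never
# addressed a single packet to the victim directly.
```

A single zombie spraying such requests over many reflectors already yields a distributed flood at the victim, which is precisely the amplification property listed above.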
Peer-to-peer attack
This is an emerging type of attack mechanism. The peer-to-peer (P2P for
short) DDoS attack does not attack the P2P file sharing network but makes
use of the P2P network to launch a DDoS attack [16]. A P2P file transfer
network usually has tens of thousands of clients joining it. One type of P2P
attack is to poison the file records shared by the clients. The attacker
writes a bogus entry claiming that a certain location, which is in fact the
victim, is providing a certain set of files (usually, the victim is not even a
member of the P2P network). When an innocent client follows the bogus entry
for a file-sharing service, the attempt ends in an error; meanwhile, the
victim is bombarded with tens of thousands of irrelevant file requests.
Automatic attack tools
Several well-known DDoS attack tools adopt the above attack architectures.
These tools are designed to be versatile so that they can mount different types
of attack payloads to the zombies. Several famous tools include the Tribe
Flood Network 2000 (TFN2K for short) [17], the Trinoo [18], and the Stachel-
draht [19]. These automatic attack tools are well designed and are effective in
launching DDoS attacks.
1.1.3 Worm attack
The worm attack is another form of automated attack. By definition, a worm
is a piece of software that runs on a computer without the computer owner's
consent. The worm has the ability to duplicate itself, and
has the duplicated copies infect other computers. From a functional point of
view, a worm infects a computer by exploiting vulnerabilities of the software
used on the target computer. A worm also carries a payload: some payloads
merely infect other computers, some harm the hosting computer, and some
attack target sites in a cooperative manner.
Code red
The Code Red [20] is a famous worm that roamed the Internet during the
summer of 2001. The worm exploited a vulnerability in the Microsoft IIS
server, which is widely deployed around the globe (around 20% market share as
of 2001 [21]). The payload of this worm was twofold: first, the worm tried to infect
as many IIS servers as possible, and then all the worms were coordinated to
launch a DDoS attack toward several victims such as the web server of the
U.S. White House.
In response, Microsoft announced the vulnerabilities with the correspond-
ing software patches provided. The attack ceased when the vulnerabilities were
fixed, and at the same time, the ISPs filtered the payload of the worm.
Slammer
Again, Microsoft was the target of another famous worm attack. The worm
named Slammer demonstrated a severe attack on the Internet in 2003 [22] by
using a vulnerability of the Microsoft SQL server. The Slammer is actually an
interesting worm attack incident: the only payload it carried was to
propagate itself with a blitz tactic. Once the worm infected a vulnerable
Microsoft SQL server, it immediately probed the network for other vulnerable
Microsoft SQL servers by rapidly firing malicious traffic at random IP
addresses. The
malicious traffic brought down many routers and then initiated a wave of rout-
ing table updates. When the failed routers were fixed and were online again,
the worm started another wave of routing table updates. The bombardment
of the malicious traffic, the failures of routers, and the changes of the routing
tables together shut the Internet down partially. This was, as a matter of fact,
a DDoS attack that targeted the Internet infrastructure.
1.1.4 Flash crowd
Despite the mentioned explicit attacks, there are scenarios in which the ser-
vices provided by the victim are degraded because of legitimate traffic. The
flash crowd happens when many users simultaneously send requests to one
Web site, usually because of special events attracting the interest of the mass
population. These events could be scheduled ones such as broadcasts of World
Cup matches, unpredictable events such as earthquakes, or links from popular
Web sites (see [23] for details).
In our context, the flash crowd is certainly not a DDoS attack. Neverthe-
less, the flash crowd behaves similarly to a DDoS attack. The victim and the
network itself can be overloaded by a flash crowd event, and the aggregated
volume of the legitimate traffic is comparable to that of a DDoS attack. In
the literature, publications have addressed this problem, and solutions have
been suggested [24, 25].
To conclude, the DDoS attack may take different attack forms, strategies,
and patterns. Interested readers can refer to survey articles [26, 27] for more
details.
1.2 Scope of the Thesis
In this thesis, we target the flood-based attack, and we aim to stop such an
attack when one can detect it. According to industrial practices against DDoS
attacks [28], one should do the following steps in response to a DDoS attack:
1. Preparation. Service providers have a high chance of successful defense
against a DDoS attack if they have laid the groundwork against it.
2. Detection. The ability to quickly identify an attack is critical to minimizing
the damage that the attack can cause.
3. Traceback. Once a service provider has detected an attack, the next
step is traceback: determining the source of the attack so that the
service provider can apply mitigation techniques or, if the source of
the attack lies in another network, inform the corresponding peer.
4. Containment. When an organization knows where an attack is coming
from, the organization should apply containment and filtering mecha-
nisms to stop the malicious traffic.
5. Postmortem. After a security incident, it is important for the orga-
nization to review what was most effective during an attack and what
could be improved.
The target of this thesis is to trace back: to locate the sources of the attack
flows that are contributing to the DDoS attack. In the following section, we
present some general assumptions.
1.2.1 General assumptions
We aim to locate the sources of the attack flows. Hence, if the attacker(s) are
using the attacking architecture mentioned in Section 1.1.2, we are concerned
only with the locations of the zombies or the locations of the reflectors in the
reflector attack.
We assume that the victim has the ability to detect that the providing
service is being degraded by overwhelming traffic. We also assume that the
victim is allowed to report the incident to the victim’s ISP, and the ISP will
then handle the incident.
We are not interested in discriminating between a legitimate flow and an
attack flow. We are also not interested in distinguishing between a flash crowd
or a DDoS attack. What we are concerned with is identifying flows that
degrade the service provided by the victim.
Last but not least, since we are concerned with the flood-based attack only,
we are not going to provide solutions that remedy vulnerability-based attacks
such as the low-rate TCP attack.
1.2.2 A divide-and-conquer traceback approach
As DDoS attacks are becoming more violent and the attack scale is enlarg-
ing, tracking down attackers across the globe is becoming more difficult and
more tedious. To provide relief from such an adverse reality, we propose a
divide-and-conquer approach so that the global-scale traceback problem can be
divided into tractable sub-problems.
Overview
From a technical point of view, in the case of launching a global-scale attack,
attack sources are spread across different Internet service providers (ISPs for
short), and these sources send attack traffic toward the ISP where the victim
resides. As shown in Figure 1.3, attackers located in ISPs C, D, and E send
traffic toward ISP A, where the victim resides.
We propose that the ISPs should coordinate and together discover
the ISPs that are contributing overwhelming traffic; we call this problem
the macroscopic traceback problem. After the problematic ISPs have been
identified, in the next step, each ISP should trace the location of attackers
within its administrative domain, and we call this problem the microscopic
traceback problem.
Specifically, a macroscopic traceback algorithm should be deployed within
Figure 1.3: The overview of the divide-and-conquer traceback approach.
the inter-ISP network. Referring to Figure 1.3, the border routers and the cou-
pling links between the border routers together form the inter-ISP network. To
facilitate the deployment of the macroscopic traceback algorithm, every border
router is connected to a macro-traceback processing node, which executes the
macroscopic traceback algorithm. On the other hand, a microscopic traceback
algorithm should be deployed within the intra-ISP network, and the intra-ISP
network is constructed by a network of backbone routers of an ISP. Again, a
processing node, namely the micro-traceback processing node, is added to help
trace the attackers within the network inside an ISP.
An example divide-and-conquer traceback execution
In addition to the architecture of the divide-and-conquer traceback approach,
Figure 1.3 also sets up an attack scenario. In the figure, we have five attacking
sources with the following distribution: ISP C contains three attackers while
ISPs D and E each contain one.
In the beginning, at the moment that a DDoS attack is detected, the victim,
which resides in ISP A, calls for the DDoS defense service from its ISP. In turn,
the border router of ISP A diverts the traffic sent toward the victim to the
macro-traceback processing node, and the macro-traceback processing node
initiates a macroscopic traceback algorithm. The macro-traceback processing
nodes of the remaining ISPs join the algorithm accordingly. The traceback
result of the macroscopic traceback algorithm should discover that ISPs C, D,
and E contain the sources of the attack.
Next, ISP A would inform ISPs C, D, and E about the traceback result.
In response, each border router of the concerned ISPs diverts all the outgoing
traffic sent toward the victim to the micro-traceback processing node. Each
processing node, running the microscopic traceback algorithm, aims to locate
the attack sources, which are sending traffic toward it. Once the traceback
result is ready, the concerned ISP can discover the locations of the attackers,
and follow-up actions, such as packet filtering, will then be carried out.
Justification
First of all, it will be attractive to the ISPs if the traceback algorithms
are deployed only within their administrative domains. To justify this claim,
consider the ISPs' point of view: they do not want to disclose any
information about their networks. The reason is simple: their peers are
actually competitors, not partners. Thus, any algorithm that executes across
multiple ISPs has difficulties
in deployment, and this is the reason for confining the microscopic traceback
algorithm within the intra-ISP network.
On the other hand, the divide-and-conquer approach not only narrows
down the traceback scope by using the macroscopic traceback algorithm but
also speeds up the traceback process by having multiple execution instances
of the microscopic traceback algorithm concurrently at different ISPs.
We believe that there is no silver bullet that can handle every kind of
flood-based DDoS attack; rather, one should use the right tool for the
right problem, the right model, and the right scenario. Therefore, in this
thesis, we choose to investigate the DDoS attack defense mechanism from two
different angles.
1.3 Structure and Contribution of the Thesis
In Chapter 2, we devise a macroscopic traceback algorithm. Leveraging the
well-known Chandy-Lamport’s distributed snapshot algorithm, we propose a
distributed algorithm that can correctly collect statistics (in a distributed
sense) from programmable routers in a coordinated fashion [29]. Then, by
analyzing the collected data, a victim can deduce the intensity of the traffic
generated by the network that is attached to every participating router. The
contribution of the algorithm is twofold. Firstly, this is the first piece of work
that applies a classical distributed algorithm in a DDoS attack defense mech-
anism effectively. Secondly, this work also provides a theoretical foundation to
measure Internet traffic in a distributed sense.
In Chapter 3, we analyze a promising microscopic traceback algorithm.
The probabilistic packet marking algorithm (PPM algorithm for short) by Sav-
age et al. [30] is an effective way to locate attackers using flood-based DDoS
attacks. In this chapter, we present an overview of the PPM algorithm. Yet,
the PPM algorithm is not perfect as its termination condition is not well-
defined in the literature. More importantly, it is found that, without a proper
termination condition, the attack graph constructed by the PPM algorithm
would be wrong. In Chapter 4, we study the termination condition of the
PPM algorithm. This is the first piece of work in the literature that studies
the termination condition of the PPM algorithm [31]. We present a discrete-
time Markov chain model that provides a precise calculation for the termina-
tion condition for the PPM algorithm. Nevertheless, the mechanism requires
knowledge of the attack graph in advance. This contradicts the purpose of the
traceback algorithm, which is designed to find the attack graph. This leads
us to abandon the current termination condition of the PPM algorithm.
To improve the termination condition of the PPM algorithm, we present
a new algorithm, the rectified probabilistic packet marking algorithm (RPPM
algorithm for short) in Chapter 5 [32]. The most significant merit of the RPPM
algorithm is that when the algorithm terminates, the algorithm guarantees
the correctness of the traceback result with a specified level of confidence.
Our findings show that the RPPM algorithm can guarantee such a correctness
under different deployment scenarios. As one of the major contributions of this
thesis, the RPPM algorithm provides an autonomous way, which is missing in
the original PPM algorithm, to determine its termination, and it is a promising
means to enhance the reliability of the PPM algorithm.
1.4 Related Work
The macroscopic snapshot algorithm that will be introduced in Chapter 2
leverages the well-known Chandy-Lamport distributed snapshot algorithm. In this
section, we first introduce the importance of this distributed snapshot algo-
rithm. Then, we introduce the development of the techniques against DDoS
attacks, mainly the packet filtering technique and the IP traceback technique.
1.4.1 Distributed Snapshot Algorithm
The very first distributed snapshot algorithm was proposed by Dijkstra and
Scholten [33]. Later, Chandy and Lamport proposed the consistent global
snapshot algorithm in [34], and the algorithm is derived from Lamport’s ear-
lier work on logical time [35]. Fischer et al. designed another algorithm for
consistent global snapshots, and this algorithm is tailored for transaction-based
systems [36].
The distributed snapshot algorithm has been applied to capture a consistent
global state of a distributed system. The primary use of the snapshot
algorithm is in checkpointing and rollback recovery [37]. The checkpointing
and recovery are vital properties that allow systems to make progress in the
presence of failures. In brief, checkpointing [38] is a technique to save the
states of an executing process. Processes achieve fault tolerance by saving
recovery information periodically during failure-free executions. Upon a failure,
a failed process uses the saved information to restart the computation from
an intermediate state, thereby reducing the amount of lost computation. The
recovery information includes the states of the participating processes, called
checkpoints.
In a distributed system, a global checkpointing scheme requires a coor-
dinated checkpointing of the participating processes. The Chandy-Lamport
distributed snapshot algorithm provides a provably consistent global state with
all the processes logically synchronized, without a global clock. Therefore, the
recovery of the distributed system is made possible with the checkpointing of
the global system. It is common practice for the distributed snapshot
algorithm to save the checkpointing data in a database system [39].
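To make the marker mechanism concrete, the following is a self-contained sketch of the Chandy-Lamport algorithm on two processes. The node names, the single in-flight message, and the round-robin delivery loop are all illustrative assumptions; only the marker rules themselves follow the algorithm.

```python
from collections import deque


class Node:
    # One participant in a marker-based snapshot. `state` is a counter
    # standing in for real application state; each incoming FIFO channel
    # is recorded from the moment the local state is saved until that
    # channel's marker arrives.
    def __init__(self, name, incoming):
        self.name, self.incoming = name, incoming
        self.state = 0
        self.snapshot = None      # recorded local state
        self.open = {}            # channels still being recorded
        self.recorded = {}        # src -> messages in flight on src->self

    def record_local(self, send_markers):
        self.snapshot = self.state
        self.open = {src: [] for src in self.incoming}
        send_markers(self.name)   # marker out on every outgoing channel

    def on_message(self, src, msg, send_markers):
        if msg == "MARKER":
            if self.snapshot is None:         # first marker seen
                self.record_local(send_markers)
            self.recorded[src] = self.open.pop(src, [])
        else:
            self.state += 1                   # apply the message
            if src in self.open:              # channel still recording
                self.open[src].append(msg)


# Two nodes, one FIFO channel in each direction.
channels = {("A", "B"): deque(), ("B", "A"): deque()}
nodes = {n: Node(n, [m]) for n, m in (("A", "B"), ("B", "A"))}


def send_markers(src):
    for (s, d), q in channels.items():
        if s == src:
            q.append("MARKER")


channels[("B", "A")].append("payload")   # a message already in flight
nodes["A"].record_local(send_markers)    # A initiates the snapshot

while any(channels.values()):            # deliver in FIFO order
    for (s, d), q in channels.items():
        if q:
            nodes[d].on_message(s, q.popleft(), send_markers)
```

In this run both nodes record local state 0 and the in-flight `payload` message is captured on the channel from B to A, so the recorded cut accounts for every message even though no global clock exists.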
1.4.2 DDoS Defense Mechanisms
One major security problem of the IP protocol is that the source address can
be filled by any user [40]. In a DDoS attack, attackers exploit this vulnerability
in order to hide their existences as well as to hinder authorities in tracing the
attack origins. In the literature, work has been done to mitigate the effects of
the DDoS attack by filtering the malicious packets before they can reach the
victim. On the other hand, there is also research work going on in order to
trace back the sources of the attack in the presence of the spoofed packets.
Packet filtering
One possible way to stop the attackers from spoofing the source addresses of
the malicious packets is ingress filtering [41]. Under such a filtering mechanism,
a router is configured to drop packets that arrive with illegitimate addresses.
This requires the participating routers to have the ability to examine every
packet that passes through as well as sufficient knowledge to distinguish be-
tween the legitimate packets and the illegitimate packets. The best way to
deploy ingress filtering is at the border routers of an AS1/ISP because it is
rather easy for the border routers of the ASes and the ISPs to acquire the
range of legitimate addresses.
However, the fatal problem of ingress filtering is that it requires widespread
deployment before the mechanism can efficiently remove most malicious
packets. Unfortunately, a significant fraction of the ISPs do not implement
1AS stands for autonomous system.
this approach. Moreover, even if ingress filtering were deployed globally, an
attacker could still launch an attack by setting the spoofed address to be a
member of the legitimate address range of the AS. On the other hand, a router
deployed with egress filtering [42] is commanded to drop packets that leave
the router with illegitimate addresses. However, one can notice that this
mechanism bears the same defects as ingress filtering.
Park et al. have proposed the route-based distributed packet filtering scheme
in [43]. For example, let AS 1, AS 2, and AS 3 be three distinct autonomous
systems. Under a normal situation, AS 2 receives and routes the packets from
AS 1 at its incoming interface. If an attacker at AS 1 sends a spoofed packet
with the source address that belongs to AS 3, based on the routing information
of AS 2, this packet is an abnormal packet, and it will then be dropped. The
authors analyze the distributed packet filtering scheme on the power-law-based
Internet model. The performance result shows that the main advantage of the
proposed scheme is that it does not require global deployment and can still
filter a significant fraction of the malicious packets.
Yau et al. [44] proposed a feedback-based mechanism to throttle the rate
at which the routers send packets toward the victim. The scheme is to have
programmable routers deployed on the network. When these routers receive
throttling signals sent from the victim, the routers restrict the flows sent toward
the victim. The contribution of this work is to guarantee max-min fairness
when throttling the flows, so that large flows bear the brunt of the
throttling while small flows can survive it.
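The max-min fairness idea behind such throttling can be sketched as a water-filling computation. This is an illustrative sketch, not Yau et al.'s actual router algorithm, and the flow rates and capacity below are assumptions.

```python
def maxmin_throttle(rates, capacity):
    # Water-filling sketch of a max-min fair rate limit: find the
    # largest per-flow cap r* such that sum(min(rate, r*)) <= capacity.
    # Small flows survive untouched; large flows absorb the cuts.
    flows = sorted(rates)
    remaining = float(capacity)
    for i, r in enumerate(flows):
        fair_share = remaining / (len(flows) - i)
        if r >= fair_share:          # every remaining flow is capped here
            return fair_share
        remaining -= r               # this flow fits under its fair share
    return float("inf")              # capacity not exhausted: no throttling


cap = maxmin_throttle([1, 2, 30, 40], capacity=13)
# cap == 5.0: the flows of rate 1 and 2 pass intact, while the two
# large flows (likely the attack flows) are each cut down to 5.
```

The throttled rates sum exactly to the capacity (1 + 2 + 5 + 5 = 13), illustrating how large flows take a large impact while small flows survive.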
Mahajan et al. proposed the aggregate congestion control in [24]. The sug-
gested method modifies the router’s congestion control algorithm, and the
method is twofold. Firstly, every router is equipped with the local ACC, which
can (i) identify congestion, (ii) classify and identify "bad" traffic aggregates
from the input queue of the router, (iii) rate-limit the arrivals in order to ease
the congestion, and (iv) send push-back messages to upstream routers when
the congestion cannot be eased by the local ACC alone. Secondly, the push-
back mechanism is rather passive compared to the active local ACC measure.
Since congestion can only be detected at downstream routers, upstream
routers are invoked to launch the local ACC and rate-limit the aggregates
specified by the push-back messages.
Chen et al. applied the ACC (aggregate congestion control) to mitigate
a DDoS attack [45]. The core of the DDoS attack defense mechanism is the
ability to detect the high-bandwidth aggregate, which is achieved by leveraging the ACC
technique. The authors suggest that the defense should take place on the edge
routers of an ISP, and the edge routers together form a defense perimeter.
The perimeter is then responsible for locating the packets belonging to the
high bandwidth aggregate. Once those packets are found, the edge routers
that discover the packets accordingly install a rate-limit filter, which drops
the packets according to the acceptance rate. The authors have proposed two
solutions for locating the edge router that admits the problematic aggregate:
one is done by multicasting, and the other by IP traceback.
Xu et al. suggested a methodology to sustain the availability of web services
under a DDoS attack [46]. The goal of the defense system is to, first, defend
against attacks using spoofed addresses, and second, minimize the system
resources consumed by adversaries who use legitimate addresses. To get rid
of the spoofed-address traffic, the authors suggest using the HTTP redirect
message. To mitigate the damage brought about by the adversary traffic using
legitimate addresses, the system is modeled as a minimax game. The goal is
to maximize the small traffic and to penalize the large traffic by suspending
the concerned connections.
IP traceback
Savage et al. proposed the probabilistic packet marking scheme in [30]. Every
router participating in this scheme marks the IP header of every packet passing
through it based on a pre-defined probability. At the victim site, the victim
can reconstruct the packet path (or the attack path) by collecting a sufficient
number of marked packets from the routers. Detailed analysis of the PPM
algorithm will be given in Chapter 3.
Following the work of Savage, many pieces of work researching the field of
IP traceback have emerged. In [47], the authors analyzed the time as well as
the number of packets that are sufficient to construct the attack graph with
a certain confidence level. In [48], the authors proposed an authentication
scheme based on the approach suggested by Savage. The aim of this work is
to hinder malicious parties from altering the marked field in the IP header
of the packets. Also, the authors have mentioned that if the victim site knows
the map of its upstream routers, the mechanism does not need to encode
the full IP address in the packet marking. They improved Savage’s marking
approach by hashing so as to achieve a lower false positive rate as well as a
lower computation overhead. On the other hand, Park et al. analyzed the work
of Savage, and pointed out that the spoofing of the marking field may impede
traceback by the victim site [49]. Attackers may be able to choose the spoofed
marking value and the source address in order to hide themselves.
Besides the IP traceback approach proposed by Savage, Dean et al. formulated
the traceback problem as a polynomial reconstruction problem [50, 51].
They use algebraic coding theory to encode traceback information in the
packet, similar to Savage’s approach. On the other hand, Snoeren et al. pro-
posed an efficient hash-based approach to trace back the attackers [52]. Every
packet that passes through a router is hashed into the storage device associ-
ated with the router. By tracking the storage device of every router, one can
derive the traversed path of a single packet.
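The hash-based idea can be sketched with a Bloom-filter-like digest table per router. The table size and the choice of three hash positions are assumptions made for illustration, not the actual parameters of the scheme in [52].

```python
import hashlib


class RouterDigest:
    # Sketch in the spirit of hash-based traceback [52]: each router
    # stores a digest of every forwarded packet in a small bit array,
    # and an investigator can later ask whether a given packet was seen.
    def __init__(self, bits=1 << 16):
        self.bits = bits
        self.table = bytearray(bits // 8)

    def _positions(self, packet):
        d = hashlib.sha256(packet).digest()
        # three positions derived from disjoint slices of the digest
        return [int.from_bytes(d[i:i + 4], "big") % self.bits
                for i in (0, 4, 8)]

    def record(self, packet):
        for pos in self._positions(packet):
            self.table[pos // 8] |= 1 << (pos % 8)

    def seen(self, packet):
        return all(self.table[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(packet))


r1, r2 = RouterDigest(), RouterDigest()
r1.record(b"attack-packet")      # the packet traversed r1 but not r2
```

Querying each router's digest for a captured attack packet reveals the routers it traversed; like any Bloom filter, the table admits false positives as it fills, which the real scheme controls by sizing and periodically paging out the tables.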
Adler formulated a new IP traceback scheme that is capable of tracing the
attacker with an arbitrary number of encoding bits in the attack packets [53].
According to the analysis, one can apply an IP traceback scheme that uses one
encoding bit per packet in a single-attacker environment. However, the lower
bound of the number of bits required is greater than one in a multiple-attacker
environment.
Sung et al.'s work is the first that combines an IP traceback approach with an
automatic packet filtering approach [54]. The scheme employs the IP traceback
approach by Savage to discover the attack path. Then, by setting up a defense
perimeter, the attack packets are filtered preferentially at the routers that are
far from the victim. By using this scheme, the attack packets can be filtered
at a far distance from the victim while the legitimate packets can reach the
victim instead of being dropped.
Unlike the probabilistic packet marking algorithm that marks packets ran-
domly, Belenky et al. suggested marking every packet that passes through the
edge routers of an ISP [55]. Then, by collecting the marked packets, one can
know from which edge router the attack traffic is coming.
For details on other IP traceback schemes, readers may refer to the detailed
survey articles in [56, 57].
Chapter 2
Distributed Snapshot Traceback
Algorithm
In this chapter, we present a distributed approach to effectively trace back the
location of potential flood-based attack sources. To begin with, we focus on the
macroscopic traceback problem introduced in the previous chapter. To the
best of our knowledge, this is the first work that considers the network model
of the macroscopic traceback problem. In general, this approach can also be
applied to the micro-network model without affecting the algorithm's
effectiveness.
Technically, our proposed approach is twofold. Firstly, our approach is
grounded in the programmable router architecture [58, 59, 60, 61] wherein
the participating routers can be programmed so that they can collaboratively
collect traffic statistics for a victim site, or the macro-traceback processing
node in the divide-and-conquer approach1. After the statistical information
has been forwarded to a victim site, the victim site can then do the following:
1. construct the attack graph with the network paths taken by all received
packets at the victim, and
1To be consistent, we stay with the name victim when we are addressing the macro-traceback processing node.
2. accurately determine the magnitudes or intensities of the traffic gener-
ated from the local network of each participating router, and we name
this traffic the local traffic.
Secondly, upon determining the intensity of the measured traffic from each par-
ticipating router, the victim site can determine a subset of attacking (border)
routers whose workload consumes a large percentage of the victim’s resource.
The contributions of this chapter are as follows:
• to provide an effective distributed traceback methodology to determine
the local traffic of participating routers;
• to measure the local traffic of all routers at the same logical time,
without requiring any global clock or global synchronization at each
participating router; and

• to assist the victim in efficiently locating the attackers who contribute
large attack flows.
The rest of the chapter is organized as follows. In Section 2.1, we formally
define the traceback problem and present the network setting wherein we
perform the distributed traceback. We also give an example to illustrate the
reason that one needs a distributed algorithm to carefully record the local
state of each participating router so as to achieve a correct traceback result.
Moreover, definitions and notations used throughout the chapter will be given.
In Section 2.2, we present the distributed algorithm to correctly record the
state of each participating router. In Section 2.3, we present the method to
correctly interpret the traceback results obtained by the distributed algorithm.
In Section 2.4, we carry out NS-2 simulations to illustrate the effectiveness of
our distributed traceback methodology. Implementation issues are discussed
in Section 2.5. Lastly, the conclusion of this chapter will be given in Section
2.6.
2.1 Overview and Problem Definition
In this section, we first present the overview of our approach, then we present
a network model. We also illustrate why one needs a distributed algorithm to
correctly perform the traceback under a DDoS attack.
2.1.1 Overview
To eliminate the detrimental effect of the flood-based DDoS attack, tracing the
location of the attacker and filtering out all the malicious packets are essential
steps. Since an attacker sends a huge number of packets compared with a
normal user, one can easily notice the large portion of traffic from the attacker
on the victim side through a traffic-intensity measuring mechanism.

However, this approach is not straightforward because the attackers usually
spoof the source addresses of the malicious packets. One can hardly measure
the traffic intensity of a particular host based on the source addresses of the
outgoing packets. Alternatively, we suggest measuring the intensity of the
outgoing traffic towards the victim on the routers. Certainly, this scheme
neither measures the traffic intensity of an individual user nor traces back
to a particular attacker. Nevertheless, it aims to identify the routers that
have a high volume of outgoing traffic towards the victim site, which indicates
that the origins of the attack lie in the domains of those routers.
In order to measure and collect the traffic intensities from the participating
routers, we propose a novel approach by applying the snapshot algorithm sug-
gested in [34], and we name the algorithm the distributed snapshot traceback
algorithm (snapshot algorithm for short). The snapshot algorithm provides a
means to coordinate all the participating routers in the traffic measurement
and the data collecting procedures. The algorithm also provides a way to
measure the traffic intensity correctly. The advantages of our approach are:
[Figure 2.1 shows an inverted, directed acyclic graph rooted at the victim ν: routers R1–R5, each with the LANs (LAN0–LAN5) of its local administrative domain, containing normal clients (C) and an attacker (A); traffic flows towards ν.]

Figure 2.1: An example network topology.
• easy to implement without large modifications to the routers, and

• fast; the approach requires only a few seconds to measure the traffic
intensities of the routers.
2.1.2 Problem definition
Let us first define our network model, which is well suited to the macroscopic
traceback problem.
Network components
An example network is shown in Figure 2.1. In the figure, an inverted, di-
rected acyclic graph rooted at V represents the network topology, and the root
node V represents a victim site. The graph is composed of the routers and
the local administrative domains (LANs) of the routers. For the simplicity of
illustration, the graph shows only the network components that participate
in transmitting and forwarding traffic to the victim site V. Let Ri be an
upstream router of V; the graph is thus a map of all routers that forward
traffic to V.
A LAN contains a number of end hosts, including some legitimate clients
of V and possibly some attackers of V. The traffic generated by the clients
and the attackers is forwarded by routers. For example, in Figure
2.1, router R1 serves as a gateway of LAN0 and LAN1, and these two LANs
are regarded as the local administrative domain (domain for short) of R1. A
router is responsible for sending traffic generated from its domain, and it is also
responsible for forwarding traffic generated from the domains of its upstream
routers.
For example, in Figure 2.1, routers R3, R4, and R5 are considered as the
upstream routers of R1. Particularly, routers R3 and R4 are regarded as the
immediate upstream routers of R1. We say that a router is a leaf router if it is
not connected to any upstream routers, such as R2, R3, and R5 in Figure 2.1.
Other routers are then called the transit routers. Throughout this work, we
let U(Ri) denote the set of immediate upstream routers of Ri and D(Ri) the
set of immediate downstream routers of Ri; the sets of all upstream and all
downstream routers of Ri are the respective transitive closures of these relations.
Note that, according to the divide-and-conquer approach described in Sec-
tion 1.2.2 (on Page 11), a router in this model represents the border router of
an ISP while the corresponding LAN represents the administrative domain of
the ISP.
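For concreteness, the upstream relations of Figure 2.1 can be sketched in a few lines of Python (our own illustration, not part of the thesis; the adjacency dictionary below is read off the figure):

```python
# Immediate-upstream relation U(Ri) of Figure 2.1, read off the topology.
imm_up = {'V': ['R1', 'R2'], 'R1': ['R3', 'R4'], 'R4': ['R5'],
          'R2': [], 'R3': [], 'R5': []}

def upstream(r):
    """All upstream routers of r: the transitive closure of imm_up."""
    found, stack = set(), list(imm_up[r])
    while stack:
        j = stack.pop()
        if j not in found:
            found.add(j)
            stack.extend(imm_up[j])
    return found

assert upstream('R1') == {'R3', 'R4', 'R5'}   # as stated in the text
# Leaf routers are those with no upstream routers.
assert {r for r in imm_up if r != 'V' and not imm_up[r]} == {'R2', 'R3', 'R5'}
```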
Assumptions
After we have introduced the network components, we discuss the assumptions
imposed on them.
We assume that the victim is equipped with the distributed snapshot trace-
back algorithm. This implies the victim has the ability and the computing
resources to execute the proposed traceback algorithm and to process the
traceback results. In reality, it is not always practical for the victim to install
the required software and hardware. Nevertheless, as mentioned in Section
1.2.2, the macro-traceback processing node will perform the traceback in the
victim’s place. Though the macro-traceback processing node is not the true
victim, we name the node that initiates the traceback algorithm the victim.
We assume that every router in the network (we mean the inter-ISP net-
work) is equipped with the traceback algorithm and has the computing re-
sources to run the traceback algorithm. Again, it is the macro-traceback pro-
cessing node which executes the traceback algorithm, instead of the router
itself. Nevertheless, we will discuss the feasibility of the router to execute the
proposed traceback algorithm in Section 2.5.2. More importantly, we assume
that every router in the network will not be compromised by any malicious
parties. A compromised router may disrupt the executions of the traceback
algorithm, or, even worse, report fake traceback results. This assumption
protects the execution of the traceback algorithm from being harmed by com-
promised routers.
Traffic classification
In our work, we classify traffic into three types: the transit traffic, the
local traffic, and the outgoing traffic. The transit traffic of Ri is the traffic
forwarded from the immediate upstream routers of Ri while the local traffic of
Ri represents the traffic generated from the local administrative domain of Ri.
Eventually, the outgoing traffic of Ri is the sum of the transit traffic and the
local traffic of Ri. To illustrate, let us consider the following example using
Figure 2.1. Part of the traffic to V was generated in LAN5, and this traffic
has to pass through routers R5, R4, and R1 before reaching V. Therefore,
the traffic from LAN5 is considered as the transit traffic of router R4. On the
other hand, the clients in LAN4 also generate traffic to V, and this traffic is
considered as the local traffic of R4. The union of these two streams of traffic
generated in LAN4 and LAN5 is considered as the outgoing traffic of R4.
We assume that each router maintains a counter which records the
accumulative volume of the outgoing traffic towards the victim site V, counted
in terms of the number of packets (for ease of presentation, we ignore the
counter wraparound problem).
Lastly, we define an attacker and his/her behavior as follows. An attacker is
a host which sends a high volume of traffic towards the victim site within a
short period of time (usually within seconds), and thereby consumes a large
portion of the victim's resources. An attacker may generate any kind of packets
with spoofed source addresses.
2.1.3 Traceback methodology
Before elaborating on the distributed algorithm, we formally define the following
concepts.
Definition 2.1 The accumulative outgoing traffic counter of the router Ri at
time t records the accumulative number of packets which are destined for the
victim site V up to time t. We denote the value of the accumulative outgoing
traffic counter of Ri at time t for the victim V as Ci(t).
Definition 2.2 The local traffic of the router Ri is the number of packets
which are destined for the victim site V generated within the local adminis-
trative domain of Ri in the time interval [t1, t2]. We denote the local traffic of
Ri in the time interval [t1, t2] as Li(t1, t2).
Formally, we let Ci(t) be the counter value of the accumulative outgoing
traffic of router Ri at time t and let U(Ri) be a set of immediate upstream
routers of Ri. Note that, in the case that the router Ri is serving more than
one victim, there will be different copies of the counter Ci(t) with different
values. The accumulative local traffic Ni(t) of router Ri at time t is given as
follows.
Accumulative Local Traffic

    Ni(t) = Ci(t),                                  if Ri is a leaf;
    Ni(t) = Ci(t) − Σ_{Rj ∈ U(Ri)} Cj(t),           otherwise.          (2.1)
Let Li(t1, t2) represent the local traffic generated by the router Ri within the
time interval [t1, t2]:

Local Traffic

    Li(t1, t2) = Ni(t2) − Ni(t1).          (2.2)
The implication of Equations (2.1) and (2.2) is that, by using the outgoing
traffic counters, one can deduce the accumulative local traffic to the victim
site V for every router by Equation (2.1). Then, by taking the difference of
these accumulative local traffic values between two time instants, as shown
in Equation (2.2), one can obtain the local traffic to the victim V for every
router within the measurement interval [t1, t2].
2.1.4 How to perform the traceback
We describe the steps of the traceback process as follows. When V receives a
huge amount of traffic that exceeds its pre-defined threshold of traffic loading,
V declares that it is under a DDoS attack, and the traceback procedure is
started. V signals all routers to read their outgoing traffic counters. In order
to determine the local traffic within a time interval [t1, t2], V needs to send the
counter reading signals to all participating routers twice: one at time t1 and
the other at time t2. Then, every router takes the counter value of its outgoing
traffic counter, and sends the counter statistics back to the victim accordingly.
Eventually, after V has collected these two sets of data from the participating
routers, V computes the local traffic generated from each domain within [t1, t2]
    Outgoing traffic counter at time t
    Time     C5(t)   C4(t)   C3(t)   C2(t)   C1(t)
    t = t1   20000   35000    5000   10000   65000
    t = t2   50000   66000    6000   11000   99000

Table 2.1: Computation of the accumulative local traffic in Example A by using Equation (2.1).
    Local traffic from t1 to t2
                   R5      R4     R3     R2     R1
    Li(t1, t2)  30000    1000   1000   1000   2000

Table 2.2: Computation of Li(t1, t2): the local traffic within [t1, t2] in Example A by using Equation (2.2).
by Equations (2.1) and (2.2). By comparing the intensities of the local traffic
of the participating routers, one can determine the locations of the attackers.
Traceback example
To illustrate this traceback process, let us consider the following simple but
illustrative example using the network topology of Figure 2.1 (on Page 25), and
we call this example Example A. There is only one attacker located in LAN5.
The attacker launches a DoS attack to the victim site V, and V initiates the
traceback procedure to determine the location of the attacker. For simplicity,
we assume that the initial values of all outgoing traffic counters of the routers
are zero (i.e., Ci(0) = 0 for all routers Ri in the network topology). The
counter values of all five routers’ outgoing traffic are taken at time instants t1
and t2. Table 2.1 depicts the outgoing traffic Ci(t) at time instants t1 and t2
for all routers, and the values of all routers’ accumulative local traffic Ni(t) at
time instants t1 and t2.
The local traffic Li(t1, t2) of each router generated within [t1, t2] are shown
in Table 2.2 wherein the computations are performed based on Equation (2.2).
[Figure 2.2 shows the timelines of R5 and R4: R5's counter is read at t1 and t2, while R4's counter is read at t1 and at a later instant t2'; a block of 20,000 packets leaves R5 after t2 but reaches R4 before t2'.]

Figure 2.2: Asynchronous reading of outgoing traffic counters in Example B.
Comparing the intensities of the local traffics of these five routers within the
interval [t1, t2], one can deduce that the domain of router R5 is the location of
the attacker.
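The numbers in Tables 2.1 and 2.2 can be replayed mechanically. The sketch below (ours, not from the thesis) applies Equations (2.1) and (2.2) to the Example A counter readings:

```python
# Immediate upstream routers in Figure 2.1; leaf routers map to empty lists.
imm_up = {'R1': ['R3', 'R4'], 'R4': ['R5'], 'R2': [], 'R3': [], 'R5': []}

# Outgoing traffic counters at t1 and t2 (Table 2.1).
C_t1 = {'R5': 20000, 'R4': 35000, 'R3': 5000, 'R2': 10000, 'R1': 65000}
C_t2 = {'R5': 50000, 'R4': 66000, 'R3': 6000, 'R2': 11000, 'R1': 99000}

def N(C):  # Equation (2.1): accumulative local traffic
    return {r: C[r] - sum(C[j] for j in imm_up[r]) for r in C}

# Equation (2.2): local traffic within [t1, t2].
L = {r: N(C_t2)[r] - N(C_t1)[r] for r in C_t1}
assert L == {'R5': 30000, 'R4': 1000, 'R3': 1000, 'R2': 1000, 'R1': 2000}
assert max(L, key=L.get) == 'R5'   # R5's domain hosts the attacker (Table 2.2)
```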
2.1.5 Difficulties of a distributed traffic measurement
By using this traffic measurement, one can easily jump to the conclusion that
a DDoS traceback is an easy task. However, we will show that there are some
deficiencies in this distributed counter reading approach. The major problem is
that Equations (2.1) and (2.2) are only correct if the network has a global clock
and all routers can perform synchronous reading of their respective outgoing
traffic counters. Let us consider Example A again but with asynchronous
reading of the counters; we call this Example B, and it is depicted in Figure 2.2.

A black rectangle in the figure represents the time instant at which the
outgoing traffic counter of a router is read. Since the second outgoing traffic
counters of R4 and R5 are not read simultaneously, some packets sent from R5
to V are not recorded by R5, but are recorded by R4. To illustrate
this numerically, C5(t2) becomes 30,000 instead of 50,000. Thus, L5(t1, t2)
becomes 10,000 while L4(t1, t′2) becomes 21,000.
This shows that asynchronous reading of the counters will mislead the
victim site to conclude that the domain of router R4 is the location of the
attacker. In the next section, we present a complete distributed algorithm
to precisely measure the local traffic of all routers in a synchronized manner
without a global clock among the routers.
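The mis-attribution in Example B is easy to reproduce numerically. In this sketch (ours), the naive Equations (2.1) and (2.2) are applied to the asynchronously read counters of R4 and R5:

```python
# Counter readings from Figure 2.2; R5 is R4's only immediate upstream router.
C_t1 = {'R5': 20000, 'R4': 35000}
C_t2 = {'R5': 30000, 'R4': 66000}   # R4's counter is read later, at t2'

L5 = C_t2['R5'] - C_t1['R5']                                 # R5 is a leaf
L4 = (C_t2['R4'] - C_t2['R5']) - (C_t1['R4'] - C_t1['R5'])   # naive Eq. (2.1)
assert (L5, L4) == (10000, 21000)   # R4's domain is wrongly implicated
```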
2.2 Distributed Algorithm
In this section, we present the complete distributed algorithm to measure the
local traffic of every router. We first define the notion of “correctness” for
measuring the local traffic of each participating router and demonstrate how
one can effectively achieve the required correctness. Besides the general proof
for the correctness of the proposed distributed algorithm, we also present an
example to illustrate the effectiveness of this distributed algorithm.
In the previous section, we used an example to illustrate that a straightforward
manner of reading the outgoing traffic counters can lead to an erroneous
conclusion (i.e., a wrong location of the attacker). The reason for this erroneous
conclusion is that the outgoing traffic counters of the immediate upstream
routers of R4 are not recorded correctly.
[Figure 2.3 shows the timelines of routers Rj and Ri, whose counters Cj(tj,k) and Ci(ti,k) are read at tj,k and ti,k respectively; the packet blocks Aji and Bji straddle the two reading instants.]

Figure 2.3: Correct accumulative local traffic without clock synchronization.
2.2.1 Reasons for incorrect traceback result
Figure 2.3 illustrates a timing diagram with two routers Ri and Rj , where
Rj is an immediate upstream router of Ri. The black rectangle in the figure
represents the time instant at which the outgoing traffic counter of a router Ri
is read, and we let this time instant be ti,k, where k ∈ {1, 2} indicates whether
the reading is taken for the first or the second time. We assume that Figure 2.3
illustrates the reading of the outgoing traffic counter for the kth time. Let
Cj(tj,k) be the value of Rj’s outgoing traffic counter at time tj,k and let Ci(ti,k)
be the value of Ri’s outgoing traffic counter at time ti,k. The Aji block in the
figure represents a sequence of packets that are sent to V through Rj before
tj,k but are received by Ri after ti,k. Correspondingly, the Bji block represents
a sequence of packets that are sent to V by Rj after tj,k but are received by
Ri before time ti,k. When Ri records the traffic counter Ci(ti,k), there may be
chances that
• the amount of traffic represented by Aji should be included in Ci(ti,k),
but it is in fact not considered; or
• the amount of traffic represented by Bji should not be included in Ci(ti,k),
but it is in fact counted.
These are the scenarios in which the mis-counting of packets happens, and
they will certainly lead to an erroneous conclusion.
2.2.2 Measuring the correct local traffic
Based on the above findings, we reconsider the formulation of the local traffic
calculation. Let the accumulative local traffic of router Ri at time ti,k
be Ni(ti,k). The accumulative local traffic of Ri at time ti,k is the difference
between the outgoing traffic of Ri at ti,k and the transit traffic received by Ri
at ti,k. Hence:
Ni(ti,k) = Ci(ti,k) − ( transit traffic received by Ri at ti,k ) .
The transit traffic received by Ri at time ti,k is the outgoing traffic sent from
all immediate upstream routers of Ri. On one hand, since packets in Aji are
not received by Ri at ti,k, but are recorded by Rj at tj,k, one needs to reduce
this traffic workload from Cj(tj,k). On the other hand, packets in Bji are
received by Ri at ti,k, but are not recorded by Rj at tj,k. Thus, one needs to
include this traffic workload in Cj(tj,k). Therefore, the correct accounting of
the accumulative local traffic of router Ri is defined as follows:

    Ni(ti,k) = Ci(ti,k) − Σ_{Rj ∈ U(Ri)} ( Cj(tj,k) − Aji + Bji ).
Let us apply the above equation back to Example B, illustrated in Figure 2.2.
The accumulative local traffic N4(t4,2) becomes:

    N4(t4,2) = C4(t4,2) − (C5(t5,2) − A54 + B54)
             = 66000 − (30000 − 0 + 20000)
             = 16000.

    ∴ L4(t4,1, t4,2) = 16000 − 15000 = 1000.
Hence, one can conclude that the attacker is in the domain of R5. In the fol-
lowing subsection, we present an efficient distributed algorithm to measure the
two sequences of packets Aji and Bji correctly so as to satisfy the correctness
criteria stated above.
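Numerically, the corrected accounting recovers R4's true local traffic in Example B (our sketch; the A54 and B54 values are read off Figure 2.2):

```python
# Second readings from Example B, with the correction terms discussed above.
C4_t2, C5_t2 = 66000, 30000
A54, B54 = 0, 20000            # 20,000 packets sent by R5 after t2 arrive before t2'

N4_t2 = C4_t2 - (C5_t2 - A54 + B54)
N4_t1 = 35000 - 20000          # the first readings were taken consistently
assert N4_t2 == 16000
assert N4_t2 - N4_t1 == 1000   # R4's domain generated only 1,000 packets
```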
2.2.3 The distributed snapshot algorithm
We make use of the result in [34] to collect all outgoing traffic counters in a
coordinated manner, and, at the same time, to determine the values of Aji
and Bji. We call this algorithm the distributed snapshot traceback algorithm
(snapshot algorithm for short).
There are three main components in this algorithm, namely (1) the marker,
(2) the local state of a participating router or the victim site, and (3) the
channel state. These three components have the following functionalities under
our DDoS application.
1. Marker: The marker is a special packet with a special header or a spe-
cial header entry. The marker is initially sent by V to all its neighboring
routers. The functionality of the marker is to facilitate all participat-
ing routers to record their local states and to derive the corresponding
channel states.
2. Local state: The routers as well as the victim site have their own
local states. For a participating router, say Ri, the local state at time t
corresponds to the value of its outgoing traffic counter Ci(t). However,
the victim site V does not have an outgoing traffic counter. Instead,
the local state of the victim site V refers to the accumulative number
of packets that V has received by time t, i.e., the aggregated traffic
destined for V from the domains of all participating routers. We denote
this accumulative incoming traffic to V at time t as TV(t). To find the
aggregated incoming traffic sent to V within the interval [t1, t2], one can
perform the following calculation:
Incoming Traffic of V

    IV(t1, t2) = TV(t2) − TV(t1).          (2.3)
3. Channel state: This corresponds to the number of packets that are
received by a router after the router records its own local state but before
that router receives the marker along that link. Its role will be thoroughly
discussed later.
The snapshot algorithm in [34] assumes that packets are delivered in the order
sent (i.e., the channels are FIFO). As two adjacent routers are connected by a
communication link or reside in the same LAN, the delivery order of the packets
can be preserved under this kind of physical connection. On the other hand,
when a router or the victim is measuring a channel state, it has to distinguish
on which channel an incoming packet arrives. One cannot depend on the
source address of the incoming packet, as it may be spoofed. We suggest that
the router refer to the level-two address (e.g., the MAC address in Ethernet) of
a packet in order to distinguish from which channel the packet comes. This
method does not bear the risk posed by attackers sending spoofed packets,
because we are interested only in the upstream router from which a packet
comes. Also, it is futile for the attacker to spoof the level-two source address
of a packet: when the packet is routed through a routing device, the
level-two address is altered and set to the hardware address of that
routing device.
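As a small illustration of this idea (ours; the MAC addresses and the dictionary are made up), a router can resolve the incoming channel of a packet from its level-two source address while ignoring the spoofable IP source:

```python
# Map the MAC address of each upstream router's interface to its link.
mac_to_link = {'00:1a:2b:3c:4d:5e': 'Link(R5, R4)',
               '00:1a:2b:3c:4d:5f': 'Link(R3, R4)'}

def channel_of(packet):
    """Classify a packet by its level-two source; unknown MACs are local hosts."""
    return mac_to_link.get(packet['src_mac'], 'local')

# The (possibly spoofed) IP source is irrelevant to the classification.
pkt = {'src_ip': '10.0.0.1', 'src_mac': '00:1a:2b:3c:4d:5e'}
assert channel_of(pkt) == 'Link(R5, R4)'
```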
2.2.4 Pseudocode and execution of snapshot algorithm

In the following, we present the flow of the snapshot algorithm. First, we
define the following notations. Let Ri and Rj be two adjacent routers
connected by two uni-directional links, namely Linki,j and Linkj,i.
Linki,j carries traffic from Ri to Rj while Linkj,i carries traffic from Rj
to Ri. Let the time instant at which Ri records its local state be ti,k, and
let the time instant at which Ri receives a marker from Linkj,i after it has
recorded its local state be ti,k^j (if the marker arrives before Ri records
its local state, then that time instant is ti,k; see the following algorithm for
details). Lastly, let Hji(ti,k, ti,k^j) be the channel state of Linkj,i, which
is the number of packets received by Ri from Rj after ti,k and before ti,k^j.
The following pseudocode shows the outline of the snapshot algorithm.
Distributed Snapshot Traceback Algorithm

Algorithm initialization

    V records the value of its incoming traffic at time t as TV(t);
    For (each link Linkv,k that connects V to a neighboring router Rk) {
        V sends a marker along Linkv,k and starts recording the number of
        packets received from Linkk,v;
    }

Marker-sending and marker-receiving rules

For the victim site V:

    If (V has received a marker from a link Linkk,v at time t') {
        V stops recording the number of packets received from Linkk,v and
        stores the value as the channel state Hkv(t, t');
    }

For each participating router Ri:

    If (Ri has received a marker from Linkj,i at time ti,k and Ri has not
        recorded its local state) {
        Ri records the value of its outgoing traffic counter as Ci(ti,k);
        Ri sets the channel state Hji(ti,k, ti,k^j) to zero;
        For (each link Linki,k that connects Ri to a neighboring router Rk) {
            Ri sends a marker along Linki,k;
        }
        Ri starts recording the number of packets received from each incoming
        link Linkk,i, except Linkj,i;
    }
    If (Ri has received a marker from Linkj,i at time ti,k^j and Ri has already
        recorded its local state) {
        Ri stops recording the number of packets from Linkj,i and stores the
        value as Hji(ti,k, ti,k^j);
    }

Termination

For each participating router Ri in the network topology:

    If (Ri has recorded its local state and has finished recording the channel
        states of all its incoming links) {
        Ri sends its snapshot data (i.e., its local state and all its channel
        states) to V;
        Ri terminates;
    }

For the victim site V:

    If (V has recorded its incoming traffic, has finished recording the channel
        states of all its incoming links, and has received the snapshot data
        sent from all participating routers) {
        V terminates;
    }
In Figure 2.4, we depict the execution of the snapshot algorithm according
to the above pseudocode. The first step is the invocation of the snapshot
algorithm, which occurs when the victim site V acknowledges that it is
under a DDoS attack (for instance, V finds that the amount of incoming traffic
has exceeded a pre-defined threshold). In the beginning of the algorithm, V
sends markers to its neighboring routers along its outgoing links (Step 1.1 in
the figure). At the same time, the victim measures the number of packets
arriving on its incoming links, and these are the channel states (Step 1.2 in
the figure).

A router, upon the arrival of a marker from the victim, returns a marker
back to the victim (Step 2.1) and keeps propagating the markers to routers
farther from the victim (Step 2.2). The victim stops measuring the
corresponding channel state when the returned marker arrives (Step 3).
Eventually, a router
[Figure 2.4 illustrates the execution on the topology of Figure 2.1. Step 1.1: markers are sent from the victim to its neighboring routers. Step 1.2: the victim starts measuring channel states. Step 2.1: upon receipt of the marker from the victim, the router returns a marker back to the victim. Step 2.2: the router keeps propagating the markers to other routers. Step 3: markers return to the victim and it stops measuring the channel states. Step 4: the local state of a router is sent back to the victim (routers R2 and R4).]

Figure 2.4: An example execution of the snapshot algorithm.
sends its local state to the victim when it has stopped measuring all the channel
states (router R4, Step 4) or when it has no channel states to measure
(router R2, Step 4).
The properties of this series of coordinated actions among the routers,
achieved by the sending and receiving of markers, are as follows:

1. to guarantee that Bji = 0, and

2. to ensure that the measured value Hji(ti,k, ti,k^j) is equivalent to Aji.
After a router has finished recording its local state and the channel states of
all its incoming links, it sends this information to V. The algorithm terminates
after V has finished recording its local state and channel states, and has
received the local states and the channel states from all participating routers.
In addition, Lemma 2.1 proves the above-mentioned properties of the algorithm.
Lemma 2.1 For any two adjacent routers Ri and Rj which are connected by
the link Linkj,i, the snapshot algorithm guarantees that Bji = 0, and correctly
measures Aji as the channel state of Linkj,i.

Proof. We first prove that Bji = 0, and then prove that Aji is the channel
state Hji(ti,k, ti,k^j) of Linkj,i.
To illustrate, we depict the ideas of the proof in Figures 2.5 and 2.6. In
the figures, the black rectangle represents the time instant at which the value
of the outgoing traffic counter of a router is recorded. The shaded rectangle
represents the time instant at which a marker arrives at Ri after the value of the
outgoing traffic counter of Ri is recorded. The dotted line is the transmission of
the sequence of packets Aji or Bji from Rj to Ri while the solid line represents
the transmission of the marker from Rj to Ri.
For both figures, case 1 corresponds to the scenario that Ri records its local
state Ci(ti,k) because it receives the marker from Rj along Linkj,i. For case 2,
[Figure 2.5 shows the two cases on the timelines of Rj and Ri: in case 1, Ri records Ci(ti,k) upon receiving the marker from Rj; in case 2, Ri records Ci(ti,k) before the marker arrives. In both cases Bji = 0.]

Figure 2.5: Bji = 0 under all circumstances.
[Figure 2.6 shows the corresponding two cases for Aji: in case 1, Aji = 0; in case 2, Aji equals the channel state Hji(ti,k, ti,k^j).]

Figure 2.6: Aji is the channel state.
Ri has already recorded its local state Ci(ti,k) before the arrival of the marker
from Rj along Linkj,i.
In Figure 2.5, we show that Bji = 0 under all circumstances. When the
router Rj records its local state Cj(tj,k), it sends markers along all its outgoing
channels. Since Linkj,i is FIFO, no packet of Bji can reach Ri before the
marker arrives at Ri. Therefore, Bji is equal to zero in both cases.
We now prove that Aji is equivalent to the channel state of Linkj,i. Recall
that Aji represents a sequence of packets that are sent to V by Rj before tj,k
but are received by Ri after ti,k. When the router Rj records its local state
Cj(tj,k), it sends markers to all its outgoing channels. There are two cases to
consider:

(A) In case 1 of Figure 2.6, no packet of Aji can reach Ri after ti,k because
Linkj,i is FIFO: every packet of Aji is sent before the marker, which itself
arrives at ti,k. Therefore, Aji = 0.

(B) In case 2 of Figure 2.6, all packets of Aji reach Ri before ti,k^j due to
the FIFO property of Linkj,i, since they are sent before the marker. Ri
records its local state Ci(ti,k) at time ti,k and starts counting the packets
arriving along Linkj,i; it stops counting when it receives the marker from
Rj at ti,k^j. The packets that arrive within [ti,k, ti,k^j] constitute the
channel state Hji(ti,k, ti,k^j) of Linkj,i. Since any packet sent by Rj
after tj,k arrives only after the marker, every packet counted in this
interval belongs to Aji, and, by the definition of Aji, every packet of Aji
arrives in this interval. Thus, Aji is equal to the channel state
Hji(ti,k, ti,k^j) of Linkj,i.
Finally, by Lemma 2.1, the equation for calculating the accumulative local
traffic Ni(ti,k) is as follows:

    Ni(ti,k) = Ci(ti,k) − Σ_{Rj ∈ U(Ri)} ( Cj(tj,k) − Hji(ti,k, ti,k^j) ).          (2.4)
To summarize, the aim of the traceback algorithm is to measure the local
traffic of every router within the time interval [t1, t2]. The victim site V initiates
the snapshot algorithm twice: once at time t1 and another at time t2. Based on
the local states and the channel states received from all routers, the victim site
calculates the consistent local traffic counters by applying Equation (2.4). In
turn, the victim calculates the local traffic intensity of each router Ri by
Equation (2.2).
2.2.5 Example in calculating the traceback result
In this subsection, we illustrate the DDoS traceback algorithm through an
example by using the network topology shown in Figure 2.7. For simplicity,
the LANs of the routers are not shown and we assume that all the routers
[Figure 2.7 shows a network topology rooted at the victim ν with routers R1, R2, R3, and R4; attackers reside in the local domains of R3 and R4.]

Figure 2.7: A network topology with two attackers who reside in the local domains of R3 and R4.
have their own local domains. The attackers are located in the domains of the
routers R3 and R4 as indicated in the figure.
Figure 2.8 illustrates how the DDoS traceback algorithm works, we first
explain the symbols shown the figure. A black rectangle represents the time
instant that a router Ri records the value of its outgoing traffic counter, and
we denote this time instant as ti,k where k represents the kth instance of the
snapshot algorithm, k ∈ [1, 2]. In addition, the corresponding value of the
outgoing traffic counter is shown besides the black rectangle. For the victim
site V, the black rectangle represents the time instant that the victim site
V records the number of incoming packets. We denote this time instant as
tk where k ∈ [1, 2]. For the simplicity of illustration, we assume that the
initial values of the outgoing traffic counters of all routers and the initial value
of the accumulative incoming traffic of the victim site V are zero. On the
other hand, a shaded rectangle represents the time instant that a router or
the victim stops recording a channel state, and we denote this time instant as
ti,k^j (the superscript j means that the marker comes from Linkj,i). Similar to the presentation of the local state, the value of the channel state is shown beside the shaded rectangle. This figure also shows the time instant at which the
Figure 2.8: A timing diagram that shows the progress of the distributed snapshot traceback algorithm.
domain of a router transmits packets to V. A sequence of normal packets is
represented by a white circle in the figure, and each white circle represents
10 packets (shown as na, nb, nc, and nd in the figure). On the other hand,
a sequence of malicious packets from the attackers is represented by a black
circle, and each black circle represents 100 packets (ma and mb in the figure).
Based on the values shown in Figure 2.8, one can apply Equation (2.4) to
calculate the accumulative local traffic of the routers and the victim site, and
the corresponding results are shown in Table 2.3. In order to have a clearer
illustration, we show the calculation of the accumulative local traffic N2(t2,1)
of R2 as an example. Referring to Figure 2.7, the immediate upstream routers
of R2 are R1 and R3. Then, we apply Equation (2.4) as follows:
N2(t2,1) = C2(t2,1) − (C1(t1,1) − H12(t2,1, t2,1^1)) − (C3(t3,1) − H32(t2,1, t2,1^3))
         = 0 − (10 − 10) − (0 − 0) = 0.
Similarly, one can follow the above procedure to calculate the accumulative
local traffic of R1, R3, and R4 in both instances of the snapshot algorithm.
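The worked calculation above translates directly into code. The following Python fragment is our own illustration (the function name is ours; the counter and channel-state values come from the first snapshot instance of the example); it applies Equation (2.4) to recover a router's consistent accumulative local traffic from its own counter, its upstream routers' counters, and the recorded channel states:

```python
def accumulative_local_traffic(C_i, upstream_counters, channel_states):
    # Equation (2.4): N_i = C_i - sum over upstream routers R_j of (C_j - H_ji)
    return C_i - sum(C_j - H_ji
                     for C_j, H_ji in zip(upstream_counters, channel_states))

# Router R2 in the first snapshot instance (upstream routers R1 and R3):
# C2 = 0; C1 = 10 with channel state H12 = 10; C3 = 0 with H32 = 0.
N2 = accumulative_local_traffic(0, [10, 0], [10, 0])
print(N2)  # 0, matching the worked example
```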
Accumulative local traffic at time t

Time        N1(t)   N2(t)   N3(t)   N4(t)   TV(t)
t = ti,1      10       0       0     100       0
t = ti,2      20      10     100     110     230

Table 2.3: Computation of accumulative local traffic at times ti,1 and ti,2.
Local traffic from ti,1 to ti,2

                  R1    R2    R3    R4
Li(ti,1, ti,2)    10    10   100    10

Table 2.4: The local traffic intensity counts only the packets in between the two instances of the snapshot algorithm.
After calculating all accumulative local traffic for both instances of the
snapshot algorithm, one can apply Equation (2.2) to obtain the local traffic
intensity of every router. The result is shown in Table 2.4. The most significant
property of the DDoS traceback algorithm is that only the packets that are
sent within the two instances of the snapshot algorithm will be recorded. We
refer to Figure 2.8 again to illustrate this property. The figure shows that a total of 240 packets (packet sequences na, nb, nc, nd, ma, and mb) have been sent towards the victim site V. However,
the sequences of packets na and mb are sent before the routers participate in
the traceback algorithm and, thus, these packets are not recorded. Therefore,
one can conclude that if an attacker at Ri has sent a massive number of packets within the two instances ti,1 and ti,2 of the snapshot algorithm, he/she will be discovered by the traceback methodology.
To conclude this section, we have presented the DDoS traceback algorithm, which records the local traffic of the routers correctly without requiring a global clock. A proof has also been given to show the correctness of the algorithm. Moreover, one can gain a clear understanding of how the snapshot algorithm
works through the presented example.
2.3 Interpreting the Traceback Result
In the previous section, we presented the DDoS traceback algorithm which
enables the victim site V to correctly compute the local traffic of every partic-
ipating router. Referring back to the example in Figure 2.8, one can observe that the local administrative domain of router R3 is the location of an attacker by comparing the local traffic intensities in Table 2.4. Since the attacker in R3
sent the sequence of malicious packets ma to the victim site V, the calculated
local traffic L3(t3,1, t3,2) of R3 is significantly larger than those of other routers.
Nevertheless, from Figure 2.8, one can notice that another attacker in R4 has also sent a sequence of malicious packets mb to V before t4,1, but these malicious packets are not revealed in Table 2.4, so one cannot determine that R4 is another location of an attacker. The reason is that the sequence of packets mb is not sent to V within [t4,1, t4,2]. Therefore, the malicious packets mb, which are sent by R4 before t4,1 and are received by V within [t1, t2], are not recorded as the local traffic of R4. This leads to a problem relating the local traffic of a router Ri to the incoming traffic of the victim site V.
According to Table 2.4, the sum of all local traffic is 10 + 10 + 100 + 10 = 130. However, the incoming traffic received by V within [t1, t2] is:
IV(t1, t2) = TV(t2) − TV(t1) = 230 − 0 = 230.
The total number of packets generated by all routers within the two instances
of the snapshot algorithm is not equal to the number of packets received by
V within [t1, t2]. We call this the traffic inequality. The traffic inequality suggests that the local traffic of a router may not arrive at V within [t1, t2], and thus one should not rely only on the local traffic of each router to determine the locations of the attackers. In the following subsections, we
will investigate this problem and we will illustrate a methodology to locate all
potential attackers.
2.3.1 Investigation of the traffic inequality
In this subsection, we present a detailed analysis of the traffic inequality. To
start our analysis, we first classify the packets that are sent from the domain of a router Ri to its downstream routers into three categories, based on the time at which Ri records its local state: (1) pre-monitoring, (2) monitoring, and (3) post-monitoring packets. They are formally defined in Definition 2.3.
Definition 2.3 We define pre-monitoring, monitoring and post-monitoring
packets with respect to the time that the router Ri records its local state:
1. A packet sent from the local administrative domain of Ri is called a pre-
monitoring packet if and only if the packet is sent before Ri records its
local state in the first instance of the snapshot algorithm (ti,1).
2. A packet sent from the local administrative domain of Ri is called a
monitoring packet if and only if the packet is sent after Ri records its
local state in the first instance of the snapshot algorithm (ti,1), and before
Ri records its local state in the second instance of the snapshot algorithm
(ti,2).
3. A packet sent from the local administrative domain of Ri is called a
post-monitoring packet if and only if the packet is sent after Ri records
its local state in the second instance of the snapshot algorithm (ti,2).
Figure 2.9, which shows a timing diagram of a router Ri, illustrates the
three categories of traffics. All packets which are sent before the first instance
Figure 2.9: Classification of pre-monitoring, monitoring and post-monitoring packets.
of the snapshot algorithm are pre-monitoring packets. The packets which are
sent between two snapshots are the monitoring packets, and these packets are
actually the local traffic of Ri. The packets which are sent after the second
instance of the snapshot algorithm are the post-monitoring packets.
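Definition 2.3 amounts to a simple comparison against the two recording instants. The sketch below is our own illustration (the function name and timestamps are hypothetical):

```python
def classify_packet(send_time, t_i1, t_i2):
    # Definition 2.3: classify a packet from R_i's domain by its send time
    # relative to the instants t_i1 and t_i2 at which R_i records its local
    # state in the two instances of the snapshot algorithm.
    if send_time < t_i1:
        return "pre-monitoring"
    if send_time < t_i2:
        return "monitoring"
    return "post-monitoring"

print(classify_packet(0.5, 1.0, 2.0))  # pre-monitoring
print(classify_packet(1.5, 1.0, 2.0))  # monitoring
print(classify_packet(2.5, 1.0, 2.0))  # post-monitoring
```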
Based on the above classification, one can gain a better understanding of the snapshot algorithm. Let Rj be a downstream router of Ri. The channel state recorded by Rj in the first snapshot consists of the pre-monitoring packets from Ri entering the monitoring region of Rj, e.g., na and mb in Figure 2.8.² Similarly, the channel state recorded by Rj in the second snapshot consists of the monitoring packets from Ri entering the post-monitoring region of Rj, e.g., nd in Figure 2.8.
We analyze the effect of the channel states recorded by the routers from the
point of view of the victim site. We denote the aggregated channel states as the
sum of all channel states recorded in an instance of the snapshot algorithm.
Let δ(k) be the number of packets in the aggregated channel states of the kth
²Note that these pre-monitoring packets may also enter the post-monitoring region of Rj, but they will not affect our analysis because they are canceled out in the calculation of the local traffic.
instance of the snapshot algorithm, i.e.,

δ(k) = Σ_{Ri,Rj∈G} Hji(ti,k, ti,k^j), where k ∈ [1, 2]. (2.5)
During the first instance of the snapshot algorithm, δ(1) represents all pre-
monitoring packets that are received in the monitoring region of the victim site.
Similarly, during the second instance of the snapshot algorithm, δ(2) represents
all monitoring packets which are received in the post-monitoring region of the
victim site. Referring to the example in Figure 2.8, δ(1) = na + mb = 110 and
δ(2) = nd = 10.
As a matter of fact, the monitoring packets sent from router Ri are the
local traffic of Ri. If these packets are received only in the monitoring region,
i.e., within [t1, t2], of the victim site V, the traffic inequality problem will not
exist. However, the pre-monitoring packets of the aggregated channel states in
the first instance of the snapshot algorithm arrive at V within [t1, t2]; therefore,
the victim site actually receives both the monitoring and the pre-monitoring
packets from all routers within [t1, t2]. Also, the monitoring packets of the
aggregated channel states in the second instance of the snapshot algorithm
arrive at V after t2. Hence, the victim site does not receive all monitoring
packets from the routers.
Let the local traffic of Ri be Li(ti,1, ti,2), where i ∈ [1, . . . , n], and let
IV(t1, t2) be the incoming traffic of the victim site V within [t1, t2]. According
to the above observation, we have the following equation relating IV(t1, t2),
Li(ti,1, ti,2) of Ri, δ(1), and δ(2):
IV(t1, t2) = Σ_{Ri∈G} Li(ti,1, ti,2) + δ(1) − δ(2). (2.6)
The interpretation of Equation (2.6) is as follows. IV(t1, t2) is composed of the
monitoring packets from all the routers and the pre-monitoring packets of the
aggregated channel states δ(1). Thus, IV(t1, t2) is the sum of Σ_{Ri∈G} Li(ti,1, ti,2) and δ(1). However, the monitoring packets in the aggregated channel states
δ(2) are received by the victim site V after t2. Therefore, δ(2) is subtracted from
the above sum. Referring to the example in Figure 2.8, we have the following
result by using Equation (2.6):
Σ_{Ri∈G} Li(ti,1, ti,2) + δ(1) − δ(2) = 130 + 110 − 10 = 230.
The above value is exactly equal to the incoming traffic IV(t1, t2) of V. To summarize, the traffic inequality is compensated for by Equation (2.6).
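This compensation can be checked numerically. The short sketch below is our own illustration; the values are taken from Figure 2.8 and Table 2.4:

```python
# Local traffic of the routers between the two snapshot instances (Table 2.4).
local_traffic = {"R1": 10, "R2": 10, "R3": 100, "R4": 10}
delta_1 = 110  # aggregated channel states, first instance (na + mb)
delta_2 = 10   # aggregated channel states, second instance (nd)

# Equation (2.6): incoming traffic of the victim site within [t1, t2].
I_V = sum(local_traffic.values()) + delta_1 - delta_2
print(I_V)  # 230, equal to TV(t2) - TV(t1)
```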
2.3.2 Calculating bounds for the number of packets arriving at the victim site
Recall from the previous subsection that we have two important observations:

1. The packets arriving at the victim site within [t1, t2] not only include the monitoring packets from the routers, but also include the pre-monitoring packets of the aggregated channel state δ(1), and
2. The monitoring packets of the aggregated channel state δ(2) arrive at the
victim site after t2.
These observations imply that one cannot directly use the local traffic Li(ti,1, ti,2) of the participating routers to find the locations of the attackers.
To overcome this problem, one has to identify the originating domains of the
pre-monitoring packets as well as the monitoring packets in the channel states.
Consider the illustration in Figure 2.10 which contains the same topology
as in Figure 2.7 as well as a timing diagram that shows the time-lines of R2,
R3, and R4. The channel state of Link3,2 is measured by the router R2, and it
contains the sequence of packets my. However, the packet sequence mx may
also be included in the channel state of Link3,2 since mx may arrive at router
R3 before R3 records its local state. Thus, the packets in the channel state of
Link3,2 may be sent from R2’s upstream routers R3 and R4. In summary, let
Figure 2.10: The channel state of Link3,2 contains pre-monitoring (monitoring) packets from both R3 and R4 in the first (second) instance of the snapshot algorithm.
Rj ∈ U(Ri), and let Ri and Rj be connected by Linkj,i. A non-empty channel state Hji(ti,k, ti,k^j) of Linkj,i is composed of the pre-monitoring (monitoring) packets from the upstream routers of Ri in the first (second) instance of the snapshot algorithm, and these packets come through Linkj,i.
Based on the above observation, one cannot determine the originating do-
mains of the packets in the channel states. This implies that one cannot cal-
culate an exact number of packets that have arrived at the victim site within
[t1, t2] from each participating router. However, we provide a methodology to
determine upper and lower bounds for these packets. Let Rr be an upstream router of Rq, and let Hpq(tq,k, tq,k^p)|Rr represent the exact number of packets which are sent from the domain of Rr and contribute to the channel state Hpq(tq,k, tq,k^p) of Linkp,q. Hence, the channel state of Linkp,q is the sum of Hpq(tq,k, tq,k^p)|Rr over all upstream routers Rr of Rq, and the corresponding equation is as follows:

Hpq(tq,k, tq,k^p) = Σ_{Rr∈U(Rq)} Hpq(tq,k, tq,k^p)|Rr. (2.7)
Also, recall from Section 2.1.2 that D(Ri) represents the set of downstream routers of Ri. The number of monitoring packets that are generated by the
domain of Ri and arrive at V after t2 is:

Σ_{Rp,Rq∈D(Ri)} Hpq(tq,2, tq,2^p)|Ri. (2.8)
Equation (2.8) represents the number of monitoring packets which are sent
from the domain of Ri and are recorded as the channel states of the downstream
router of Ri in the second instance of the snapshot algorithm. Li(ti,1, ti,2)
represents the number of monitoring packets sent by Ri within the snapshot
interval. Since the packets in Equation (2.8) are the monitoring packets that
are not received at V within [t1, t2], the number of monitoring packets which
are sent from Ri and are received by V in [t1, t2] is:
Li(ti,1, ti,2) − Σ_{Rp,Rq∈D(Ri)} Hpq(tq,2, tq,2^p)|Ri. (2.9)
Let L∗i(ti,1, ti,2) be the number of packets which are sent from Ri and are received by the victim site V within [t1, t2] (the real local traffic). These packets are composed of two components: (i) the monitoring packets sent from Ri that arrive at V within [t1, t2], which are given by Equation (2.9), and (ii) the pre-monitoring packets sent from Ri that arrive at V within [t1, t2], which are given as follows:
Σ_{Rp,Rq∈D(Ri)} Hpq(tq,1, tq,1^p)|Ri. (2.10)
Thus, by Equations (2.9) and (2.10), the real local traffic L∗i(ti,1, ti,2) of Ri is represented as follows:

L∗i(ti,1, ti,2) = Li(ti,1, ti,2) − Σ_{Rp,Rq∈D(Ri)} Hpq(tq,2, tq,2^p)|Ri + Σ_{Rp,Rq∈D(Ri)} Hpq(tq,1, tq,1^p)|Ri. (2.11)
Note that it is possible for some pre-monitoring packets to arrive at V after t2, for example when the interval [t1, t2] is not long enough. However, those
packets will not affect the correctness of the calculation of the real local traffic L∗i(ti,1, ti,2), because they are recorded in both Equations (2.8) and (2.10) and, as shown in Equation (2.11), they cancel out.
Let Upper(L∗i) and Lower(L∗i) be the upper and lower bounds of the real local traffic L∗i(ti,1, ti,2) respectively. To find the bounds of L∗i(ti,1, ti,2) in Equation (2.11), one can observe that

Hpq(tq,k, tq,k^p) ≥ Hpq(tq,k, tq,k^p)|Ri.

Therefore, Upper(L∗i) and Lower(L∗i) are:

L∗i(ti,1, ti,2) ≤ Li(ti,1, ti,2) + Σ_{Rp,Rq∈D(Ri)} Hpq(tq,1, tq,1^p), and (2.12)

L∗i(ti,1, ti,2) ≥ Li(ti,1, ti,2) − Σ_{Rp,Rq∈D(Ri)} Hpq(tq,2, tq,2^p). (2.13)
Referring to Figure 2.8 on Page 44, Upper(L∗4) and Lower(L∗4) are:

Upper(L∗4) = L4(t4,1, t4,2) + H43(t3,1, t3,1^4) = 110, and
Lower(L∗4) = L4(t4,1, t4,2) − H43(t3,2, t3,2^4) = 0.
Since the attacker in R4 sends the sequence of malicious packets mb to V,
Upper(L∗4) is significantly higher than the others. This suggests that the do-
main of R4 is also a possible location of the attackers. In the next section,
we present the simulation results that show the effectiveness of our distributed
methodology.
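The bound computation above can be sketched as follows. This is our own illustration (the function name is ours); the values reproduce the example of R4 from Figure 2.8, where the channel state of Link4,3 contains the 100 packets of mb in the first instance and the 10 packets of nd in the second:

```python
def real_local_traffic_bounds(L_i, H_downstream_first, H_downstream_second):
    # Equations (2.12) and (2.13): bound the real local traffic L*_i using
    # the channel states recorded at R_i's downstream routers in the first
    # and second instances of the snapshot algorithm.
    upper = L_i + sum(H_downstream_first)
    lower = L_i - sum(H_downstream_second)
    return upper, lower

upper, lower = real_local_traffic_bounds(10, [100], [10])
print(upper, lower)  # 110 0
```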
2.4 Performance Evaluations
In the previous sections, we presented the DDoS traceback algorithm to deter-
mine the intensity of local traffic for each participating router. In this section,
we carry out two different sets of simulations to demonstrate the effectiveness
of our proposed methodology. In the first set of simulations (Simulation A),
Figure 2.11: (a) Network topology and (b) legend (upper bound of local traffic, local traffic, lower bound of local traffic) for Simulations A and B.
we use a simple network topology, as depicted in Figure 2.11(a), to illustrate the correctness and robustness of our algorithm under various factors (e.g., different traffic-generation processes, different distributions of the attackers' locations, etc.). For the second set of simulations (Simulation B), we extend the performance study to a large-scale, realistic Internet topology.
Simulation A (Correctness and robustness of DDoS traceback al-
gorithm): This set of simulations evaluates the correctness of the proposed
DDoS traceback algorithm. For this set of simulations, we use a network topol-
ogy in Figure 2.11(a), which contains six routers. The packets are generated by two methods: (1) a constant rate (e.g., an average rate of 100 pkts/sec implies that every 0.01 second, a router generates a new packet to the victim site V), and (2) an exponential on/off process (i.e., packets are sent at a fixed rate during the "on" periods, and no packets are sent during the "off" periods). Both the on and off periods are drawn from an exponential distribution. The average durations of the on and off periods are set to 100ms in this set of simulations. The bandwidth and the delay of each link are set to 100Mbps and 50ms respectively.
Simulation A.1 (Bounds for the local traffic): In this simulation, there is one
attacker, who is located in the domain of R3. The attack traffic rate of R3 is set to a constant 500 pkts/sec, while the normal traffic rates of all other routers are set to a constant 100 pkts/sec. The victim site V initiates the
DDoS traceback algorithm to determine the location of the attackers. Figure
2.11(b) shows the legend for the various graphs in our simulations. Figure 2.12 shows the upper bound of the real local traffic Upper(L∗i), the lower bound of the real local traffic Lower(L∗i), and the real local traffic L∗i for all six routers under four different measurement intervals. The snapshot time intervals of the four cases are 1, 2, 3, and 4 seconds respectively. The upper and lower bounds of the real local traffic are computed based on Equations (2.12) and (2.13).
The real local traffic L∗i is the number of packets sent from router Ri and received by the victim site V within the snapshot time interval. Note that the real local traffic L∗i is available only in the simulation environment. In Figure
2.12, one can observe that:
1. The real local traffic L∗i lies between Upper(L∗i) and Lower(L∗i), which
means that our DDoS traceback algorithm can successfully bound the
exact number of packets sent from router Ri and received by the victim
site V in the snapshot time interval.
2. The difference between the bounds of the real local traffic decreases if
we increase the duration of the measurement interval.
3. The lower bound of real local traffic of the attack domain R3 is signifi-
cantly higher than the upper bound of real local traffic of other routers.
This implies that we can locate the source of the attack traffic.
4. Lastly, we observe that the measurement interval can be very short (e.g., 4 seconds), and one can quickly determine that the domain of R3 is the location of the attacker.
Figure 2.12: Simulation A.1. Bounds for the real local traffic under constant traffic rate (snapshot time intervals of 1, 2, 3, and 4 seconds).
Figure 2.13: Simulation A.2. The real local traffic under an exponential on/off process (snapshot time intervals of 1, 2, 3, and 4 seconds).
Figure 2.14: Simulation A.3. Effect of multiple attackers on the real local traffic bounds (snapshot time intervals of 1, 2, 3, and 4 seconds).
Simulation A.2 (Exponential on/off process for packet generation): In this
simulation, we consider a packet generation process that is based on an exponential on/off process. We use the same network topology as in Figure 2.11(a) and repeat a simulation similar to Simulation A.1. The average durations of the on and off periods are set to 100ms in this simulation. Figure 2.13 illustrates the simulation results. We observe that even if the
packet generation process is governed by an on/off process, the algorithm is
robust enough to accurately determine the local traffic intensities of all partic-
ipating routers. The same conclusions as in Simulation A.1 can be drawn for this simulation.
Simulation A.3 (Multiple attackers): In this simulation, there are two attack domains, located in R3 and R5. We repeat a simulation similar to Simulation A.2. The average on and off periods are set to 100ms. The local traffic rate of each attack domain is set to 500 pkts/sec, while the local
Figure 2.15: Simulation A.4. Effect of new attackers' locations (snapshot time intervals of 1, 2, 3, and 4 seconds).
traffic of each normal domain is 100 pkts/sec. In Figure 2.14, we observe that the lower bounds of the real local traffic of the domains of R3 and R5 are significantly higher than the upper bounds of the real local traffic of the other domains in all cases. Therefore, our DDoS traceback algorithm can effectively and quickly determine the locations of multiple attackers.
Simulation A.4 (Varying the attackers' locations): In this simulation, we consider different locations for the two attackers and analyze the effect. We repeat the same simulation as in Simulation A.3, but the two attackers are in routers R2 and R4. In Figure 2.15, one can observe that the lower bounds of the real local traffic of the domains of R2 and R4 are significantly higher than the upper bounds of the real local traffic of the other domains in all cases. We conclude that our methodology is robust and is not sensitive to the location distributions of the attackers.
Figure 2.16: Simulation A.5. On different attack traffic rates (measured percentage of attack traffic against the attack traffic rate, for snapshot intervals of 1, 2, 5, 10, and 100 seconds).
Simulation A.5 (Different attack traffic rates): In this simulation, we investigate the effect of different attack traffic rates on the traceback result. As in Simulation A.1, the normal traffic is sent at a rate of 100 pkt/sec, and the attacker, in the domain of R3, sends out packets at a constant bit rate. Now, however, we carry out the simulation with attack traffic rates ranging from 50 pkt/sec to 1000 pkt/sec in steps of 25 pkt/sec. Our aim is to investigate (1) what percentage of the aggregated traffic received by the victim the attack traffic contributes, and (2) over what range of attack traffic rates the traceback methodology is effective in locating the attackers.
In Figure 2.16, there are five different plots of the measured attack traffic
percentage against the attack traffic rate at five different snapshot intervals:
one, two, five, ten, and a hundred seconds. The attack traffic percentage is
calculated by dividing the local traffic of R3 by the total number of received
packets. Firstly, according to the figure, the percentage of the measured attack
traffic decreases as the snapshot interval increases. Nevertheless, the decreasing percentage eventually converges to a certain value, as shown in the plot for the 100-second snapshot interval.
We now describe the way in which the victim locates the attackers. We define a filtering threshold: if the percentage that a domain's traffic contributes to the aggregated traffic exceeds this threshold, that domain is considered an attacking domain. Eventually, the corresponding network administrator will be notified and will start filtering the large flow. If one sets the threshold to 50%, then, referring to the figure, one can find only the attackers with traffic rates greater than or equal to 500 pkt/sec, as labeled by the coordinates (500, 50.51). As another example, if the threshold is set to 30%, then the victim can find attackers with traffic rates greater than 225 pkt/sec.
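The filtering-threshold test described above can be sketched as follows (our own illustration; the domain names and traffic rates are hypothetical):

```python
def attacking_domains(traffic_by_domain, threshold_pct):
    # Flag the domains whose share of the aggregated traffic exceeds the
    # filtering threshold (given in percent).
    total = sum(traffic_by_domain.values())
    return [domain for domain, pkts in traffic_by_domain.items()
            if 100.0 * pkts / total > threshold_pct]

# Five normal domains at 100 pkt/sec and one attack domain at 600 pkt/sec.
rates = {"R1": 100, "R2": 100, "R3": 600, "R4": 100, "R5": 100, "R6": 100}
print(attacking_domains(rates, 50))  # ['R3'], which contributes about 54.5%
```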
According to this simulation, two main factors affect the effectiveness of locating the attack traffic:
1. The ratio of the attack traffic to the normal traffic. According to the simulation
results, the methodology may fail to detect the attacker if the attack
traffic rate is not large enough.
2. The total number of domains. If the total number of domains that the traceback methodology is monitoring is large, then even if every innocent domain sends only a small amount of traffic, the attack flow will not dominate significantly, which lowers the percentage of the attack flow. In this situation, the administrator may need to lower the filtering threshold. When the total number of domains is small, a similar analysis applies, and the threshold should be set to a large value so that innocent domains are not misclassified as attacking domains.
Figure 2.17: Simulation B. Simulation for a large-scale Internet topology (average number of packets for attack Classes 1-5 and the normal domains, at snapshot time intervals of 5, 10, 15, and 20 seconds).
In conclusion, the set of results in Simulation A shows the following find-
ings.
1. As the snapshot interval becomes longer, the upper bound and the lower
bound of the local traffic become closer to the measured local traffic.
2. The methodology can quickly locate the attackers with dominating large flows; however, its effectiveness is subject to the magnitude of the attack flow and the total number of domains in the network.
Simulation B (Simulations on a large-scale realistic Internet topology): To validate the correctness of the theoretical bounds of the local traffic, we extend the performance study to a large-scale, realistic Internet topology. We use the Internet topology from [62]. The testing dataset in our simulations contains 1,000 distinct routers. The source of the traceroute is considered as the victim site V, and the traceroute dataset is considered as the map of the upstream routers. We use this dataset to construct a network simulation
test-bed based on NS-2. There are five classes of attack traffic rates. The at-
tack traffic rates of Classes 1, 2, 3, 4, and 5 are 150 pkts/sec, 175 pkts/sec, 200
pkts/sec, 225 pkts/sec, and 250 pkts/sec respectively. There are 10 attackers
in each class and the attackers are evenly distributed in the different domains
of the network. The local traffic rate of a normal domain is 10 pkts/sec. Packets are generated according to an exponential on/off process. Both the average on and off periods are set to 100ms in this simulation. The bandwidth and the
delay of each link are set to 100Mbps and 20ms respectively.
The victim site V initiates the DDoS traceback algorithm to determine the
location of the attackers. Figure 2.17 shows the upper bound of the real local traffic Upper(L∗i), the lower bound of the real local traffic Lower(L∗i), and the real local traffic L∗i for the five classes of attackers as well as for the normal routers. In this simulation, we have four different measurement intervals, with durations of 5, 10, 15, and 20 seconds respectively. The attack domain that has the largest upper bound of real local traffic within its class is selected, and its traffic rates are plotted in the figure. From Figure 2.17, one can observe that
L∗i lies between Upper(L∗i) and Lower(L∗i) for all classes, which means that our algorithm can successfully bound the real local traffic of all participating routers. When the snapshot time interval increases from 5s to 20s, we observe that the spread of the bounds of the local traffic tends to decrease. Therefore, the estimation of the real local traffic L∗i becomes more accurate for a longer snapshot time interval. On the other hand, we can see that the lower bound of the real local traffic of each of the five attack domains is significantly higher than the upper bound of the real local traffic of the normal domains. This implies that we can quickly (e.g., within 20 seconds) and accurately (based on the differences of the bounds) determine the locations of the attackers.
2.5 Implementation Issues
In previous sections, we have shown that our DDoS traceback algorithm is
effective in locating potential attackers and filtering attack packets. The distributed
traceback algorithm relies on the assumption that the victim site V has a map of its upstream routers. In this section, we illustrate that this assumption
is reasonable and practical. We also show that our proposed distributed
traceback algorithm leverages existing traceback technologies and can
complement existing infrastructure such as the ICMP traceback technique [63].
However, we cannot show that the overhead of the proposed method is small
on high-end routers; we can show only that the overhead of the traceback
service is not significant on a set of experimental Linux routers.
2.5.1 Topology construction
There are several ways to obtain a map of upstream routers for a given victim site.
Many network management tools exist for this mapping, for example a tool based
on traceroute from Lucent Bell Labs [62] and a tool based on ICMP echo
requests from CAIDA [64]. In these techniques, the victim site V sends packets
to probe hosts that are k ≥ 1 hops away. Each packet contains a TTL field
which is decremented by one for each traversed link. When the TTL reaches
zero, the router sends a reply back to V. This form of probing provides the
router adjacency information, which helps V build a map of upstream
routers.
Another efficient method to obtain an upstream map is to store the router
adjacency, or edge, information in the packets themselves. Approaches like probabilistic
packet marking [30, 48] encode the router adjacency information into the
packet header. Other approaches like itrace [63, 65, 66] generate separate ICMP
packets with router adjacency information and send them to a victim site V. When V receives
these packets, it extracts the router adjacency information to build a map of
upstream routers.
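The assembly of such edge samples into an upstream map can be sketched as follows. This is an illustrative sketch, not the actual tool from [62] or [64]; the router names and the sample format are assumptions.

```python
from collections import defaultdict

def build_upstream_map(edge_samples):
    """Collect (upstream_router, downstream_router) adjacency samples,
    e.g. decoded from marked packets or ICMP traceback messages, into a
    map: router -> set of immediate upstream routers."""
    upstream = defaultdict(set)
    for up, down in edge_samples:
        upstream[down].add(up)   # duplicates collapse into the set
    return dict(upstream)

# Hypothetical samples: edges toward the victim V.
samples = [("R1", "V"), ("R2", "R1"), ("R3", "R1"), ("R2", "R1")]
print(sorted(build_upstream_map(samples)["R1"]))  # -> ['R2', 'R3']
```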
Note that when one invokes our proposed distributed traceback methodology,
the traceback occurs only within the map of upstream routers. In
other words, only routers within the map need to participate in the distributed
traceback. It is possible that some leaf routers are at the edge of the
map and, at the same time, are connected to some routers outside the map.
In this case, all transit and local traffic of such a leaf router will be considered
as the leaf router's local traffic. One can progressively apply
the distributed traceback algorithm to this type of leaf router to determine
the source of the attack. For example, if the local traffic of a leaf router is very
high, then one can treat this leaf router as a victim site and initiate the
distributed traceback algorithm again.
However, storing the map is a heavy burden for an ordinary host because the map is huge.
Nevertheless, techniques like the Sink Hole [28] can help forward traffic
to a data processing center hosted by the ISP. The data collection process and
the network map storage can then be done in that dedicated host. The only
weakness of this approach is that if many hosts request the
traceback service, the load on the data processing center will become heavy.
2.5.2 System overhead
The proposed traceback methodology runs on the victim site and the
participating routers. For most of the execution time, each router has to
update its traceback data whenever a packet passes through it. The processing
of the outgoing counters, the markers, and the channel states incurs
an inevitable overhead on the router. However, we cannot provide any
solid data about this issue: nowadays high-end routers,
which are deployed world-wide, do not provide any programmable feature for
us to modify the router and measure the overhead of our proposed traceback
methodology. Nevertheless, on the low-end side, the Linux-based router provides
us a possible choice of programmable router.
We have implemented a programmable router prototype together with our
proposed DDoS traceback algorithm, named OPERA [67], by introducing
new modules into the netfilter [68] framework of the Linux system. Although the system
overhead on high-end routers remains a subject of future research, we
provide a system overhead analysis on low-end routers to show, firstly, that the
proposed methodology is implementable and deployable, and, secondly, that the
proposed methodology does not involve complex computation and hence incurs
only a small overhead on low-end routers.
Each router has to install and load the modules provided in OPERA. Although
it is difficult to carry out experiments to measure the overhead directly,
the work done by Harris et al. [69] supports our claim that the
system overhead for a Linux router is not expensive.
Harris et al. [69] carried out experiments to test the firewall in Linux
machines, i.e., iptables. We focus on their latency test,
which shows that the performance degrades as the number of filter rules
increases. The experiments in [69] show that when filtering on IP
addresses, TCP/UDP ports, and MAC addresses, the per-packet latency increases
linearly with the number of rules, at approximately 0.12, 0.66, and 0.68 µs/rule,
respectively.
In the OPERA project, we utilize iptables, but we also introduce new
functionality by inserting routines into the hook points of the netfilter. In our
implementation of the snapshot algorithm, the inserted routine handles only two
events: 1) it updates a variable whenever a packet comes in, and 2) it responds
instantly to incoming markers. The first event can be handled efficiently as it
is just a variable update, and the second event requires only matching
the source of an incoming marker and injecting a new marker.
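A minimal user-space model of the inserted routine, covering only the two events above (the real implementation is a kernel module attached to netfilter hooks; all names here are hypothetical):

```python
class SnapshotHook:
    """Models the routine inserted at a netfilter hook point:
    (1) bump the accumulative outgoing counter for each forwarded packet;
    (2) on a marker, record the counter once and re-inject a marker
    upstream (modeled here by returning its identifier)."""
    def __init__(self):
        self.counter = 0      # accumulative outgoing traffic counter
        self.recorded = None  # counter value recorded at marker arrival

    def on_packet(self):
        self.counter += 1     # event 1: a single variable update

    def on_marker(self, source):
        if self.recorded is None:   # event 2: record state once
            self.recorded = self.counter
        return ("marker", source)   # forward a new marker upstream

hook = SnapshotHook()
for _ in range(5):
    hook.on_packet()
hook.on_marker("victim")
print(hook.recorded)  # -> 5
```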
These two events are analogous to a filtering routine. Thus, the introduced
system overhead will be as light as using iptables, which is
widely used in Linux systems nowadays, with only two filtering rules introduced.
On the other hand, for each victim site, OPERA only needs to allocate a
set of memory for each outgoing interface and another set of memory for each
channel state. Hence, this involves only a small amount of memory3, and it is scalable
to several hundred registered victim sites4. We argue that this traceback
service is a privileged service: there will not be many sites paying for this
service except those with high popularity. It is unusual for a router to handle
several thousand victim sites simultaneously. If this does happen, it is
a sign of another level of DDoS attack in which the attack target is the traceback
mechanism itself. Handling this requires a distributed authentication protocol among
the routers and the victim sites, which is beyond the scope of this thesis and is
considered future work.
Further, if compromised hosts send requests to trace DDoS
attacks that are not really happening, our system can be overwhelmed by the
malicious hosts. If a host is compromised, the most important issue is
the ability of a router, a victim, or a third party (e.g., a Certificate Authority,
CA) to discover its malicious identity. In most cases, there is no way for any
entity to distinguish whether a request is coming from a compromised host or
not: the compromiser most likely has the private
information of that host, such as the encryption keys and the authentication
secrets, and she is free to invoke the traceback service even if an authentication
protocol is implemented between the routers and the clients. Hence,
the method to discover the malicious identity of a compromised host is beyond
the scope of our traceback system.
3 For example, a router with three incoming interfaces and one outgoing interface only needs four variables, and each variable needs four bytes (an unsigned long integer) for one set of snapshot data. The total memory usage is only 16 bytes.
4 There are several hundred Kbytes of memory available in the kernel of Linux.
2.5.3 Implementation issue based on ICMP traceback
Our proposed distributed traceback methodology can complement and leverage
the current ICMP traceback [63]. The main idea of ICMP traceback is that
each router samples the packets it forwards with a low probability (e.g.,
1/20000), generates an authenticated ICMP traceback message for each
sampled packet, and forwards the message to the victim site. The ICMP traceback message
carries information about the router: on which link interfaces packets arrive and
depart, as well as information about its previous and next routers. During
a DoS attack, a victim can use these ICMP traceback messages to reconstruct
an attack path.
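The sampling step can be sketched as follows. This is an illustrative model, not the itrace implementation; the message fields and router names are assumptions.

```python
import random

def forward(packets, p_sample, rng):
    """Forward packets; with probability p_sample, emit an ICMP
    traceback message describing the router's adjacency (sketch)."""
    messages = []
    for pkt in packets:
        if rng.random() < p_sample:
            messages.append({"sampled": pkt, "router": "R",
                             "prev": "R_up", "next": "R_down"})
    return messages

# With p_sample = 1/20000, on average only one message is generated per
# 20000 forwarded packets, which is why ICMP traceback needs a large
# attack volume before the victim can rebuild a path.
msgs = forward(range(20000), 1 / 20000, random.Random(1))
print(len(msgs))
```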
We view our proposed distributed traceback methodology and the
ICMP traceback infrastructure as complementary. The ICMP traceback approach
encodes a map of upstream routers in the ICMP traceback messages;
therefore, a victim site can use the information in ICMP traceback messages
to build a map of upstream routers. A router running the ICMP traceback
service has the capability to associate a packet with the input port or MAC
address on which the packet arrived. This capability can help the routers count
the incoming and outgoing traffic in our distributed traceback algorithm. One
can also use the existing ICMP traceback infrastructure so that routers can
send local states and channel states back to the victim site. Another important
point is that the ICMP traceback provides an authentication service.
This can also be applied to authenticate the victim site and the senders of the
marker, local state, and channel state information. The main disadvantage
of the existing ICMP traceback is that, due to the low probability of generating
ICMP messages, it requires many attack packets to pass through a router
before the locations of the attackers can be identified. On the contrary, our distributed
traceback algorithm can trace the locations of the attackers in a short period
of time. Hence, one can achieve a more effective traceback by using
our proposed distributed traceback algorithm in conjunction with the ICMP
traceback service.
2.5.4 An alternative to aggregate congestion control and
push-back
Our proposed methodology can be treated as an alternative to aggregate
congestion control [24] (ACC for short; a brief survey is included in Section
1.4.2). The ACC, like our proposed approach, also requires modifications
of the routers and introduces inevitable overheads. The reasons why our approach
can be an alternative to the ACC mechanism are as follows:
• The modification of the routers required by the ACC approach is much more complex
than the modification brought about by our approach, and this implies
a heavier overhead for the routers.
• The classification of the aggregates is not a light burden for the router.
E.g., the router may have to match the "characteristics" of every incoming
packet against every definition of the known aggregates. As suggested
in [24], the classification of aggregates depends on the rules known to the
routers. If the number of rules is large, which is quite certain to be true
in order to have effective aggregate detection, the burden will be
large. As the overhead of our approach grows only with the number of victims
while the ACC approach has an inevitably large overhead, our approach
can be a better choice.
2.5.5 Special deployment - acyclic network
Through the example presented in Section 2.2.5 on Page 42 as well as the simulations presented in Section 2.4 on Page 53, we have shown how the distributed
Figure 2.18: (a) An acyclic network with one attacker who resides in the local domain of R3. (b) R3 maintains two accumulative outgoing traffic counters C3,1(t) and C3,2(t) for the links Link3,1 and Link3,2, respectively.
snapshot traceback algorithm works in a tree topology. We now consider the
case that the network is an acyclic one.
A tree is obviously an acyclic network. However, a network in which a router
has multiple outgoing links, without forming any cycle, is also
an acyclic network. The Chandy-Lamport distributed snapshot
algorithm can be applied to acyclic networks. In the following, we repeat the
procedure taken in Section 2.2.5 (on Page 42) to demonstrate that the
distributed snapshot traceback algorithm works on acyclic networks as well.
Figure 2.18(a) shows an example acyclic network. The attacker is inside
the domain of router R3. Unlike the other routers in the example
network in Figure 2.7 on Page 43, R3 has two outgoing links. Since R3 has two
outgoing links, it is natural that there should be two corresponding accumulative
outgoing traffic counters for R3: one for Link3,1 and one for Link3,2. However,
the algorithm presented in Section 2.2.4 (on Page 36) supports only
one outgoing traffic counter. To remedy this problem, we make the following
amendments to the distributed snapshot traceback algorithm.
Supporting multiple outgoing links
Instead of having one accumulative outgoing traffic counter for each router,
we set the number of accumulative outgoing traffic counters to the number
of outgoing links. Denote C_{i,j}(t) as the value of the accumulative outgoing
traffic counter for Link_{i,j} at time instant t. During the kth instance of the
snapshot algorithm, denote the time instant when R_i receives a marker from
Link_{j,i} and records the value of the counter C_{i,j}(t) as t_{(i,j),k}, and denote the
time instant when R_i receives a marker from Link_{p,i} after it has recorded the
value of any one of its accumulative outgoing traffic counters as t^p_{(i,j),k}. Lastly,
we abuse the notation t_{i,k} and use it to denote the logical time at which R_i
calculates its accumulative local traffic.
Then the accumulative local traffic, instead of using Equation (2.4) on
Page 42, is computed as follows:

N_i(t_{i,k}) = C_i(t_{i,k}) − Σ_{R_j ∈ U(R_i)} ( C_{j,i}(t_{j,k}) − H_{ji}(t_{(i,j),k}, t^j_{(i,j),k}) ),   (2.14)

where C_i(t) = Σ_{R_j ∈ D(R_i)} C_{i,j}(t) and D(R_i) is the set of immediate downstream
routers of R_i.
Figure 2.19 shows the timing diagram for a traceback executed on the
acyclic network shown in Figure 2.18(a). Note that, in the figure, we also depict
the routes of the packets after they have passed through R3 because there are
two downstream routers for the packets to choose from.
With the above change in the calculation of the accumulative local traffic,
we now verify that the correct local traffic values are calculated.
Figure 2.19: Another timing diagram that shows the progress of the distributed snapshot traceback algorithm. (The figure marks the instants of recording the accumulative outgoing counters, the instants of stopping the channel-state recording, and the sending of 100 malicious and 10 normal packets.)
It is expected that the values of the local traffic of R3 and R4 are 100 and 10,
respectively. Since the calculation for R4 is obvious, we present the calculation
of the local traffic of R3 as follows.
N_3(t_{3,1}) = (C_{3,2}(t_{(2,3),1}) + C_{3,1}(t_{(1,3),1})) − (C_{4,3}(t_{(3,4),1}) − H_{43}(t_{(2,3),1}, t^4_{(2,3),1}))
           = (10 + 0) − (10 − 0) = 0.

N_3(t_{3,2}) = (C_{3,2}(t_{(2,3),2}) + C_{3,1}(t_{(1,3),2})) − (C_{4,3}(t_{(3,4),2}) − H_{43}(t_{(2,3),2}, t^4_{(2,3),2}))
           = (100 + 20) − (20 − 0) = 100.

∴ L_3(t_{3,1}, t_{3,2}) = 100.
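The calculation above can be checked numerically with a short sketch of Equation (2.14); the function name is hypothetical, and the counter readings are those of Figure 2.19.

```python
def accumulative_local_traffic(outgoing_counters, upstream_terms):
    """N_i = sum of the outgoing counters, minus, for each upstream
    router, its counter reading less the recorded channel state,
    following Equation (2.14)."""
    return sum(outgoing_counters) - sum(c - h for c, h in upstream_terms)

# R3's readings at the two snapshot instances (Figure 2.19):
n3_first = accumulative_local_traffic([10, 0], [(10, 0)])     # = 0
n3_second = accumulative_local_traffic([100, 20], [(20, 0)])  # = 100
print(n3_second - n3_first)  # -> 100, the local traffic L3
```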
In summary, to support a network graph with routers that have more than
one outgoing edge, such routers have to maintain more than one accumulative
outgoing traffic counter, and the calculation of the accumulative local traffic
is changed accordingly, as shown in Equation (2.14).
2.5.6 Partial deployment
Our traceback scheme has been shown to provide a meaningful traceback result
by the previous analysis and simulations. However, we have assumed
that all the routers involved in the traceback are equipped with the traceback
ability. In this subsection, we discuss the possibility for our scheme to provide
a meaningful traceback result under a partial deployment environment, in which
not all involved routers know the traceback protocol. First, we
introduce the following terms: we call a router with the traceback scheme deployed
a deployed router, and a router without the traceback scheme deployed
an undeployed router.
Our idea for supporting partial deployment is to treat the local traffic
generated from an undeployed router as the local traffic of its nearest downstream
deployed router. Thus, if an attacker is located in the domain of an
undeployed router, then its downstream deployed router will report a high level
of local traffic. This suggests that an attacker is hiding in either the deployed
router's domain or one of its undeployed upstream routers.
However, the tradeoff in providing the partial deployment is the introduc-
tion of a set of strict conditions. The conditions are as follows.
1. The last mile router of the victim must be a deployed router. If
the last mile router of the victim is an undeployed router, then the local
traffic of the last mile router will become the local traffic of the victim,
which is not a reasonable result because the victim should not generate
any local traffic.
2. An undeployed router neither processes nor drops the marker
packet. The undeployed router is transparent to the traceback protocol;
thus the marker is simply forwarded by the undeployed router.
3. Each deployed router knows whether a router in the Internet
Figure 2.20: (a) The same example network as Figure 2.7 with attacking domains R3 and R4, but the router R3 is an undeployed router. (b) Logically, a virtual link between the routers R2 and R4 is formed.
map is a deployed router or an undeployed router. From the
viewpoint of a deployed router, in order to send markers to its nearest
upstream deployed routers, the deployed router needs to know the locations
of those routers. In a partial deployment environment, the nearest
upstream deployed router may not be the immediate neighboring upstream
router. Therefore, for practical reasons, each deployed router is required
to know the locations of all the deployed and the undeployed routers.
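Assuming each deployed router holds the upstream map and the deployment status of every router, its nearest upstream deployed routers could be computed as in the following sketch (names hypothetical; the network is assumed acyclic, so the walk terminates):

```python
def nearest_deployed_upstream(router, upstream_map, deployed):
    """Walk upstream from `router`, passing through undeployed routers
    transparently, and return the nearest deployed router on each path."""
    result, stack = set(), list(upstream_map.get(router, ()))
    while stack:
        r = stack.pop()
        if r in deployed:
            result.add(r)                          # a marker is sent to r
        else:
            stack.extend(upstream_map.get(r, ()))  # skip undeployed r
    return result

# Topology of Figure 2.20: V <- R2 <- {R1, R3}, R3 <- R4; R3 undeployed.
upstream_map = {"V": ["R2"], "R2": ["R1", "R3"], "R3": ["R4"]}
print(sorted(nearest_deployed_upstream("R2", upstream_map,
                                       {"R1", "R2", "R4"})))
# -> ['R1', 'R4']: R2's markers go to R1 and, past the undeployed R3, to R4
```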
Deployment illustration
To illustrate how the partial deployment works, we revisit the example in
Figures 2.7 and 2.8 in Section 2.2.5. This time, we change R3 to an
undeployed router, as shown in Figure 2.20(a).
When the traceback starts, the victim sends a marker to the router R2.
When R2 receives the marker, it determines the set of routers to which markers
should be sent. According to the third condition (a deployed router knows
where its nearest upstream deployed routers are), the router R2 finds
that its nearest upstream deployed routers are R1 and R4. Then, R2 sends
Figure 2.21: A timing diagram that shows the progress of the DDoS traceback algorithm under the partial deployment environment.
two markers destined for R1 and R4 accordingly. As the undeployed router
R3 only forwards the marker packets, eventually all deployed routers will be
invoked. Meanwhile, the router R2 is instructed to record the channel states
until the markers from routers R1 and R4 arrive. From the viewpoint of router
R2, when it records the channel state, it no longer records the channel state
of the physical link Link3,2. Instead, a virtual link Link4,2 is established, as
shown in Figure 2.20(b), and the router R2 monitors this virtual link.
Figure 2.21 shows the timing diagram of the traceback under the
partial deployment environment. It shows the same scenario as Figure
2.8 in Section 2.2.5 except that the router R3 is an undeployed router; thus there
is no reading recorded by R3. Also, the timing diagram depicts that the router
R2 is recording the channel state of the virtual link Link4,2. We now calculate
the accumulative local traffic of R2 by using Equation (2.4).
N_2(t_{2,1}) = C_2(t_{2,1}) − (C_1(t_{1,1}) − H_{12}(t_{2,1}, t^1_{2,1})) − (C_4(t_{4,1}) − H_{42}(t_{2,1}, t^4_{2,1}))
           = 0 − (10 − 10) − (100 − 100) = 0.

N_2(t_{2,2}) = C_2(t_{2,2}) − (C_1(t_{1,2}) − H_{12}(t_{2,2}, t^1_{2,2})) − (C_4(t_{4,2}) − H_{42}(t_{2,2}, t^4_{2,2}))
           = 230 − (20 − 0) − (110 − 10) = 110.
The local traffic of R2 is 110, which is the sum of the local traffic of R2 and R3
in the full deployment environment (see Table 2.4). Hence, the local traffic of
the undeployed router R3 is attributed to the downstream deployed router R2.
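This calculation can be checked numerically with a short sketch of Equation (2.4); the function name is hypothetical, and the counter readings are those of Figure 2.21, with R4's terms reaching R2 over the virtual link Link4,2.

```python
def local_traffic_partial(own_counter, upstream_terms):
    """N_2 = C_2 minus, for each (deployed) upstream router, its counter
    reading less the recorded channel state, following Equation (2.4)."""
    return own_counter - sum(c - h for c, h in upstream_terms)

# R2's readings at the two snapshot instances (Figure 2.21):
n2_first = local_traffic_partial(0, [(10, 10), (100, 100)])   # = 0
n2_second = local_traffic_partial(230, [(20, 0), (110, 10)])  # = 110
print(n2_second - n2_first)  # -> 110 = R2's own 10 + undeployed R3's 100
```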
Problem in measuring channel states
However, there is one problem with the introduction of the virtual link. To
illustrate, consider the scenario in Figure 2.22. According to the snapshot
algorithm, the router R1 should record the channel states of the virtual links
Link3,1 and Link4,1. But what R1 actually records is the channel state of the physical
link Link2,1. As R2 routes packets from both R3 and R4, the channel state of the physical link Link2,1
is a mixture of the channel states of Link3,1 and Link4,1, and as a result, R1 is not
able to distinguish the two virtual links. This problem is illustrated by the
zoomed-in part of Figure 2.23. After the marker from R3 arrives, the traffic on the physical
link Link2,1 belongs only to the virtual link Link4,1.
To remedy the problem, the following approximation is applied: distribute
the mixed channel state into two shares proportional to the measured
local traffics of R3 and R4. Mathematically, we have the following. Denote
the mixed channel state measured by R1 on the physical link Link2,1 as
H_{21}(t_{1,1}, t^3_{1,1}), and denote the channel state measured by R1 on the physical
Figure 2.22: (a) In this example network, the router R2 is an undeployed router while the others are deployed routers. (b) As the undeployed router is transparent to the traceback protocol, the router R1 records the channel states of the virtual links Link3,1 and Link4,1.
Figure 2.23: The timing diagram under a partial deployment environment. A drawback is that the channel states of the virtual links Link3,1 and Link4,1 become indistinguishable at router R1.
link after R1 receives a marker from R3 as H_{21}(t^3_{1,1}, t^4_{1,1}). Also, denote the local
traffic measured at R3 and R4 as L_3(t_{3,1}, t_{3,2}) and L_4(t_{4,1}, t_{4,2}), respectively.
Then, the channel states H_{31}(t_{1,1}, t^3_{1,1}) and H_{41}(t_{1,1}, t^4_{1,1}) are given as follows:

H_{31}(t_{1,1}, t^3_{1,1}) = H_{21}(t_{1,1}, t^3_{1,1}) × L_3(t_{3,1}, t_{3,2}) / (L_3(t_{3,1}, t_{3,2}) + L_4(t_{4,1}, t_{4,2}));   (2.15)

H_{41}(t_{1,1}, t^4_{1,1}) = H_{21}(t_{1,1}, t^3_{1,1}) × L_4(t_{4,1}, t_{4,2}) / (L_3(t_{3,1}, t_{3,2}) + L_4(t_{4,1}, t_{4,2})) + H_{21}(t^3_{1,1}, t^4_{1,1}).   (2.16)
The rationale of this solution is based on the assumption that if the ratio
of the local traffic of R3 to the local traffic of R4 is a certain value (say
x) within the snapshot interval, then, during the time shortly before and after the
snapshot interval, the ratio will very likely remain around x. Therefore,
we distribute the mixed channel state into two shares according to the ratio x.
Note that this solution is scalable: even if the undeployed routers form a
sub-network, the scheme still works, provided that the sub-network routes
packets in a FIFO manner.
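A sketch of the proportional split, in which each virtual link receives a share of the mixed channel state proportional to its own measured local traffic and the tail after R3's marker belongs wholly to Link4,1; the function name and the sample values are hypothetical.

```python
def split_mixed_channel_state(h_mixed, h_tail, l3, l4):
    """Split the mixed channel state H21(t_{1,1}, t^3_{1,1}) between the
    two virtual links in proportion to the measured local traffics L3
    and L4; the tail H21(t^3_{1,1}, t^4_{1,1}) goes wholly to Link4,1."""
    h31 = h_mixed * l3 / (l3 + l4)            # share for virtual Link3,1
    h41 = h_mixed * l4 / (l3 + l4) + h_tail   # share for virtual Link4,1
    return h31, h41

# Hypothetical readings: 30 mixed packets, 5 tail packets, L3:L4 = 1:2.
h31, h41 = split_mixed_channel_state(h_mixed=30.0, h_tail=5.0,
                                     l3=100.0, l4=200.0)
print(h31, h41)  # -> 10.0 25.0; together they account for all 35 packets
```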
2.6 Chapter Summary
In this chapter, we propose a distributed traceback methodology for DDoS
attacks such that a victim site can locate attackers on the fly. During the
execution of the algorithm, a router only has to perform a light-weight procedure
that keeps track of (i) the number of packets forwarded to a victim
site and (ii) the number of transit packets on all its incoming links during the
recording of the router's local state. With these two pieces of information,
a victim site can accurately determine the intensity of a router's local
traffic. We also present an efficient algorithm so that a victim site can accurately
determine the number of packets (with upper and lower bounds) from
each router during the victim's measurement interval. This set of information
can help the victim determine the locations of the attackers within a short
measurement interval, whether or not the attack packets are spoofed. We carry
out simulations to illustrate that our methodology is effective independent of
the attack traffic volume, the attack traffic patterns, and the location
distribution of the attackers. We believe that the proposed distributed traceback
methodology can complement and leverage the existing ICMP traceback
so that a more efficient and accurate traceback can be obtained.
One drawback of the distributed snapshot traceback algorithm is that it is
not sensitive to flows that are not significantly dominating. According to the results
obtained from the simulations, our methodology can only rank the sizes of the
flows so that one can pinpoint the dominating flows. If all of the flows are of
similar volumes, no conclusion can be made from the traceback result; or, the
only conclusion is that every domain is launching a distributed denial-of-service
attack simultaneously, which implies a global-scale attack.
Chapter 3
Probabilistic Packet Marking
Algorithm
The distributed snapshot traceback algorithm introduced in the previous chapter
provides an efficient algorithm for measuring the traffic generated
by domains or ISPs. As suggested in Chapter 1, after the distributed
snapshot traceback algorithm has pinpointed the ISPs that send the dominating
flows, the next step is to focus on the traceback within the targeted ISPs, using
a microscopic traceback algorithm. Because of the different structure of the
network inside an ISP, the distributed snapshot traceback algorithm may not
be the best traceback algorithm there. However, rather than reinventing the
wheel, we choose a widely accepted algorithm, the probabilistic
packet marking algorithm, as the microscopic traceback algorithm.
The probabilistic packet marking algorithm (PPM algorithm for short)
proposed by Savage et al. [30] has attracted the most attention for contributing the
idea of IP traceback. The most interesting point of this IP traceback approach
is that it allows routers to encode certain information in the attack packets
with a pre-determined probability. Upon receiving a sufficient number of
marked packets, the victim can construct the set of paths the attack packets
traversed, and hence the victim can obtain the location(s) of the attacker(s).
In our case, the victim is not the true victim but the micro-traceback
processing node. Nonetheless, we still call the place where the attack traffic
concentrates the victim, and we treat the victim as a normal client in the
remainder of the thesis.
3.1 Structure of This Chapter
In this chapter, we present an in-depth introduction to the PPM algorithm. In
Section 3.2, we describe the goal as well as the structure of the PPM algorithm.
In Section 3.3, we present the assumptions of the PPM algorithm. Section 3.4
illustrates the PPM algorithm so that readers can become familiar
with its execution. Lastly, Section 3.5 concludes this
chapter.
3.2 Goal and Structure of the PPM Algorithm
The goal of the PPM algorithm is to obtain a constructed graph that
contains the attack graph, where the attack graph is the
set of paths the attack packets traversed, and the constructed graph is the graph
returned by the PPM algorithm.
3.2.1 Global network and attack graph
We depict the meaning of an attack graph through the example in Figure 3.1.
The figure shows a simple network with both legitimate users and
attackers attached; we name this network the global network. Since the attack
graph is defined as the set of paths traversed by the attack packets, not all of
the global network belongs to the attack graph: the attack graph contains only
the affected routers and edges, as depicted in Figure 3.2.
Figure 3.1: A typical case of a DDoS attack toward the victim V.
Nevertheless, it is always hard to decide whether a packet is legitimate or
not. Consequently, there may be cases where the constructed attack graph contains more
nodes and edges than the actual attack graph. As depicted in Figure
3.1, the legitimate traffic is mixed with the attack traffic (at router R1). As it
is not easy to make a fast and accurate decision about the legitimacy of a
packet (because the source address of the packet may be spoofed), an attack
graph that includes routers and edges not traversed by the attack
packets is also accepted, and we call such a graph a relaxed attack graph.
3.2.2 Constructed graph
To fulfill the goal of obtaining the attack graph, [30] suggested a method to encode
the information of the edges of the attack graph into the attack packets through
cooperation between the routers and the victim site. After collecting enough
encoded packets, the victim builds a constructed graph based on the encoded
information. Thus, a constructed graph is the result returned by the PPM
algorithm.
We now define the correctness of the constructed graph. A constructed
Figure 3.2: The illustration of an attack graph: (a) an attack graph is not the entire network; the attack graph is the set of paths traversed by attack packets; (b) the attack graph may become larger than the actual one due to the uncertain legitimacy of the packets.
graph must contain the attack graph as its sub-graph. When the PPM algorithm
stops and returns such a graph, the PPM algorithm returns
a correct result; otherwise, the constructed graph is an incorrect one. We
formally define the correctness of the constructed graph in Definition 3.1.
Definition 3.1 A constructed graph returned by the PPM algorithm is correct
if and only if the constructed graph contains the attack graph as a sub-graph.
It is important to note that Definition 3.1 includes both the case when the constructed
graph is the same as the attack graph and the case when the constructed
graph is a relaxed attack graph.
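Since both the attack graph and the constructed graph can be represented by their edge sets, Definition 3.1 reduces to an edge-set containment test, sketched below with hypothetical edges.

```python
def is_correct(constructed_edges, attack_edges):
    """Definition 3.1: the constructed graph is correct iff it contains
    every edge of the attack graph (edge-set containment)."""
    return set(attack_edges) <= set(constructed_edges)

# Hypothetical attack graph toward the victim V:
attack = {("R7", "R4"), ("R4", "R1"), ("R1", "V")}
relaxed = attack | {("R3", "R1")}  # one extra edge: a relaxed attack graph
print(is_correct(relaxed, attack),
      is_correct(attack - {("R1", "V")}, attack))  # -> True False
```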
3.2.3 Structure of the PPM algorithm
The PPM algorithm is made up of two separate procedures: the
packet marking procedure, which is executed on the router side, and the path
reconstruction procedure, which is executed on the victim side.
The packet marking procedure is designed to randomly encode edge information
on the packets arriving at the routers. By using this information, the
victim then executes the path reconstruction procedure to construct the attack
graph. We first briefly review the packet marking procedure so that readers
can become familiar with how the routers mark information on the packets.
A brief review of the packet marking procedure
The packet marking procedure aims to encode every edge of the attack graph,
and the routers encode the information in the following three marking fields of
an attack packet: the start, the end, and the distance fields (wherein [30] has
discussed the design of the encoding of the marking fields). In the following,
we describe how a packet stores the information about an edge in the attack
graph, and the pseudocode of the procedure from [30] is given in Figure 3.3
for reference.
When a packet arrives at a router, the router determines how to process
the packet based on a random number x (line #1 in the pseudocode). If x
is smaller than the pre-defined marking probability pm, the router chooses to
start encoding an edge. In other words, the probability that the router starts
encoding an edge is pm. The router sets the start field of the incoming packet
to the router’s address, and resets the distance field of that packet to zero.
Then, the router forwards the packet to the next router.
When the packet arrives at the next router, that router again chooses whether
it should start encoding another edge. Suppose that, this time, the router
chooses not to start encoding a new edge. The router then finds that the previous
router has started marking an edge, because the distance field of the packet
is zero. Accordingly, the router sets the end field of the packet to its own
address. In addition, the router increments the distance field of the packet
by one so as to mark the end of the encoding. Now, the start and the end
fields together encode an edge of the attack graph. For this encoded edge to be
received by the victim, every successive router should choose not to start encoding
an edge, i.e., the case x ≥ pm in the pseudocode, because a packet can encode
Packet Marking Procedure(Packet w)

1. Let x be a random number in [0 . . . 1)
2. If x < pm, then
3.     write router's address into w.start and 0 into w.distance
4. else
5.     If w.distance = 0 then
6.         write router's address into w.end
7.     end If
8.     increment w.distance by one
9. end If
Figure 3.3: The pseudocode of the packet marking procedure of the PPM algorithm.
only one edge. Further, every successive router will increment the distance
field by one so that the victim will know the distance of the encoded edge.
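As a concrete illustration, the marking logic of Figure 3.3 can be sketched in Python as follows. The Packet class and the string router addresses are illustrative assumptions of this sketch, not part of the original design in [30]:

```python
import random

class Packet:
    """A packet carrying the three PPM marking fields."""
    def __init__(self):
        self.start = None      # router that started encoding an edge
        self.end = None        # router that completed the edge encoding
        self.distance = None   # hops travelled since the encoding started

def mark(packet, router_addr, pm):
    """One execution of the packet marking procedure (Figure 3.3)."""
    x = random.random()                # line #1: x is uniform in [0, 1)
    if x < pm:                         # start encoding a new edge
        packet.start = router_addr
        packet.distance = 0
    else:
        if packet.distance == 0:       # the upstream router just started an edge
            packet.end = router_addr   # complete the edge (start, end)
        if packet.distance is not None:
            packet.distance += 1       # later routers increment the distance

# A packet travelling along the path R7 -> R4 -> R1 towards the victim:
pkt = Packet()
for router in ("R7", "R4", "R1"):
    mark(pkt, router, 0.04)
```

With a small pm such as 0.04, most packets arrive unmarked, which is why the victim must collect many packets before every edge of the attack graph has been encoded at least once.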
Path reconstruction procedure
The path reconstruction procedure is the final step to build the constructed
graph. The procedure works with the encoded packets, and it extracts the
edge information from every packet. Note that, to prevent attackers from spoofing
the packets, the victim has to know the global network (not just the attack graph),
and the procedure will eliminate the abnormal edge information (line #8 in
Figure 3.4). A subtle note is that, as the name of the procedure suggests,
this procedure works only with paths. However, this does not prevent the procedure
from handling multiple paths at once.
Path reconstruction procedure

1. Let G be a tree with root v, where v is the victim.
2. Let every edge in G be (start, end, distance).
3. For each packet w from attacker
4.     if w.distance == 0 then
5.         insert edge (w.start, v, 0) into G;
6.     else
7.         insert edge (w.start, w.end, w.distance) into G;
8.     remove any edge (x, y, d) with d ≠ distance from x to v in G;
9. extract path (Ri . . . Rj) by enumerating acyclic paths in G;
Figure 3.4: The pseudocode of the path reconstruction procedure of the PPM algorithm.
3.3 Assumptions
The PPM algorithm is not a versatile methodology that can tackle all kinds of
distributed denial-of-service attacks such as the TCP-SYN flood attack and the
reflector attack. There are assumptions imposed on the execution environment
and the algorithm itself.
3.3.1 Marked packets and PPM markings
A packet received by the victim may contain information that is needed to
reconstruct the attack graph. According to [30], the marked information is
sliced and distributed over the encoded packets. The victim can then reconstruct
the information by collecting enough fragments.
In our context, the PPM markings are the information reconstructed by
collected packets. To simplify the discussion, we do not focus on
the reconstruction of the sliced-edge information. Rather, we use the term
marked packet to denote a virtual packet that contains a set of reconstructed marking
information.
Assumption 3.1 It is assumed that a marked packet can always be recon-
structed from slices of encoded information.
3.3.2 Router
We assume that every router in the network is willing to participate in the PPM
algorithm when requested. For each router, we assume that it is equipped with
the ability to mark packets following the packet marking procedure.
Functionally speaking, a router can either be a transit router or a leaf
router: a transit router forwards traffic from upstream routers to its down-
stream routers (or the victim), while a leaf router connects to the upstream
client computers (not routers) and forwards the clients' traffic to its down-
stream routers (or the victim).
3.3.3 Packet marking probability
One of the characteristics of the PPM algorithm is to mark packets randomly.
The randomness is controlled by a variable called the marking probability, pm,
and every (participating) router owns a copy of this variable. There is no
restriction on the packet marking probability pm. However, allowing every router
to have a different value of pm would complicate our discussion, so we make
the following assumption.
Assumption 3.2 It is assumed that every participating router uses the same
value of the packet marking probability pm throughout the execution of the
PPM algorithm.
Though this assumption sounds impractical when the PPM algorithm is
deployed on a worldwide scale, a fixed marking probability becomes natural
when the PPM algorithm is deployed within an ISP. In addition, there is
work in the literature which claims that, by varying the marking
probability of each router, the number of packets required to construct the
correct constructed graph can be minimized [70].
3.3.4 Attack source and attack pattern
An attack source is the end-host that sends packets to the victim (not neces-
sarily a high volume of traffic). Usually, the number of attack sources can be
on the order of thousands, and the aggregated volume is therefore overwhelming.
Though such an attack source may not be the attacker (the attack source
may only be a zombie), it is necessary to stop such an overwhelming flow by
locating the sources. We therefore treat every attack source as an attacker.
Assumption 3.3 An attacker is the end-host (the leaf router of an attack
graph) that sends an attacking flow toward the victim.
A flood-based DDoS attack, according to its name, attacks the victim
by flooding the victim with packets, loads the victim with an extraordinary
amount of traffic, and hence disables or degrades the service provided by the
victim. However, there is no defined pattern by which the attackers bombard
the victim. It can be a continuous flow, a bursty periodic strike, etc. For
simplicity, we make an assumption about the attack pattern as follows.
Assumption 3.4 Every attacker sends out a continuous flow of packets. Also,
every attacker sends approximately the same number of packets toward the
victim.
Note that if this assumption does not hold, say the attack pattern is actually
bursty, then the obtained attack graph may not cover all the attackers.
3.3.5 Attack graph and packet routing
The attack graph generated by the PPM algorithm has a very strong depen-
dence on the routings inside the global network graph since the attack graph is
formed by the traversals of the packets. Nevertheless, due to the autonomous
property of the network routers, the routings inside the Internet graph may
be changed under abnormal situations. Unfortunately, a flood-based DDoS
attack is one of the abnormal situations. A high volume of flows generated by
the DDoS attack creates a congested environment within the Internet graph.
This may drive the routers to change their routings so as to cope with such a
change (though such adjustments are usually futile).
Eventually, the attack graph may change because of the changes in
the routings inside the Internet graph, and the topology of the attack graph is,
therefore, short-lived. Nevertheless, our goal is to locate the attackers. The
short-lived property of the attack graph does not hinder us from achieving our
goal, on the condition that the attack graph, from time to time, pinpoints
the locations of the attackers.
Thus, the target of the PPM algorithm should not be to find a consistent
attack graph. Rather, the target of the PPM algorithm is to locate the
attackers through the construction of the attack graph. We make the following
strong assumption.
Assumption 3.5 During the time that the PPM algorithm is executing, the
routings inside the global network graph should not change.
We now illustrate how the PPM algorithm reacts to a change
of the routings. In Figure 3.5(a), we have a network showing all the network
links and the current routing in the network. When one of the routers, R1, goes
down, the failure causes the routing table of every router to change completely,
as shown in Figure 3.5(b).
Figure 3.5: The failure of the router R1 causes the route tables of R2, R3, and R4 to change: (a) the routing before the change; (b) the routing after the change; (c) a possible constructed graph. This results in a constructed graph with routers having multiple outgoing edges.
Under such a scenario, the set of collected packets may include encoded
packets from the routing configurations in both Figures 3.5(a) and 3.5(b).
Therefore, the constructed attack graph may become the one shown in Figure
3.5(c).
We argue that this result is not an undesirable one as long as the definition
of a correct attack graph construction (Definition 3.1 on Page 82) still holds
because the new attack graph is indeed composed of all the edges traversed by
the packets. In the remainder of this thesis, we stay with this assumption, and
we will discuss the scenario in which this assumption is relaxed in Section 5.7.
On the other hand, modern routing protocols [71, 72] currently used by
routers favor the formation of a routing tree rather than a routing graph. The
difference between a routing tree and a routing graph lies in the number of
outgoing routes to a particular address. Usually, there is only one route for
one destination address. Therefore, the corresponding attack graph should be
a tree instead of a graph, unless the routings within the attack graph have been
changed.
Pkt # | Src | Hop#1      | Hop#2       | Hop#3       | Hop#4       | New
------+-----+------------+-------------+-------------+-------------+----
  1   | R7  | (φ, φ, φ)  | (φ, φ, φ)   | (φ, φ, φ)   | (φ, φ, φ)   | −
  2   | R7  | (R7, φ, 0) | (R7, R4, 1) | (R7, R4, 2) | (R7, R4, 3) | √
  3   | R8  | (R8, φ, 0) | (R5, φ, 0)  | (R2, φ, 0)  | (R2, V, 1)  | √
  4   | R7  | (R7, φ, 0) | (R7, R4, 1) | (R7, R4, 2) | (R7, R4, 3) | ×
  5   | R7  | (φ, φ, φ)  | (R4, φ, 0)  | (R4, R1, 1) | (R4, R1, 2) | √
  6   | R8  | (φ, φ, φ)  | (φ, φ, φ)   | (R2, φ, 0)  | (R2, V, 1)  | ×
  7   | R7  | (φ, φ, φ)  | (φ, φ, φ)   | (R1, φ, 0)  | (R1, V, 1)  | √
  8   | R8  | (φ, φ, φ)  | (R5, φ, 0)  | (R5, R2, 1) | (R5, R2, 2) | √
  9   | R8  | (R8, φ, 0) | (R8, R5, 1) | (R8, R5, 2) | (R8, R5, 3) | √
 ...  | ... | ...        | ...         | ...         | ...         | ...

Table 3.1: A sequence of packets collected by the victim.
Assumption 3.6 Every participating router has only one outgoing route to-
ward the victim.
For the ease of presentation, we call the “outgoing route toward the victim”
the victim route throughout this thesis.
3.4 Graph Reconstruction Example
After the descriptions of the PPM algorithm from the previous sections, we
illustrate a traceback example by using the network graph in Figure 3.1 (on
Page 81). Table 3.1 shows a set of packets that are received by the victim.
3.4.1 Packet marking
Table 3.1 not only shows the sequence of the arriving packets, but also displays
the way that the participating routers mark the packets.
For each row in the table, it displays the source of the concerned packet.
According to Figure 3.1, there are only two sources of attack packets, and they
are the routers R7 and R8. Also, the length of the path between R7 and the
victim as well as the length of the path between R8 and the victim are the same,
which is three. Then, each row of the table displays a step-by-step illustration
of the packet marking procedure, with the column Hop#1 corresponding to
the first router, . . . , and the column Hop#4 corresponding to the destination
victim. For example, if the source of a packet is R7, then the four hops are
R7, R4, R1, and the victim V. Lastly, the table also shows whether a packet contains
a new kind of marking combination when it arrives at the victim, and the
column New indicates this information.
Table 3.1 displays all kinds of marked packets and the corresponding
packet marking histories. In addition, the table also includes an
unmarked packet.
3.4.2 Attack graph reconstruction
By using the sequence of packets displayed in Table 3.1, the victim can re-
construct the attack graph, and the steps are shown in Figure 3.6. For each
arrived packet, Figure 3.6 displays the corresponding constructed graph. There
are cases in which the victim receives an unmarked packet, or the victim receives
a marked packet that has been received before. In such cases, the constructed
graph does not grow in size. Otherwise, one edge is added for each newly
arrived marked packet. Eventually, the ninth packet
completes the attack graph reconstruction example, and the constructed
graph is correct according to Definition 3.1 on Page 82.
3.5 Chapter Summary
In this chapter, we introduced the probabilistic packet marking algorithm
(PPM algorithm for short). The algorithm is distributed in nature, and it
requires cooperation between the victim and the participating routers. The
Figure 3.6: A step-by-step illustration of the reconstruction of the attack graph based on the incoming packet sequence in Table 3.1. Packets that carry no new marking (packets 1, 4, and 6) leave the constructed graph unchanged.
goal of the PPM algorithm is to discover the locations of the attackers by
discovering the paths traversed by the attack packets. We call the union of these
paths the attack graph. The merit of the PPM algorithm is its probabilistic
way of encoding edge information on the attack packets. This lightweight,
efficient, and yet effective way of encoding information on packets sheds light on
how to deploy traceback algorithms on production routers, and this is why
the PPM algorithm is one of the best candidates for the microscopic traceback
algorithm.
However, the PPM algorithm has a vital defect when it is deployed. The
example in Section 3.4 shows how the victim constructs the attack graph,
and it also shows when the victim stops the attack graph reconstruction: when
the constructed graph is the same as the attack graph. Yet, in that example,
the victim knows when to stop only because the attack graph is already known
(we know that Figure 3.1 on Page 81 is the attack graph). In practice, when an
ISP wants to discover the attack graph, the attack graph is unknown to the ISP,
and the ISP therefore does not know when to terminate the PPM algorithm.
In the next chapter, we address the termination condition of the PPM algorithm.
Chapter 4
Termination Condition of PPM
Algorithm
The probabilistic packet marking algorithm has gained much attention, and
it has been enhanced by much work in the literature. Yet, it is important to
ensure that the PPM algorithm can be deployed successfully.
For a traceback algorithm to be successfully deployed, several issues have
to be addressed. First, the initialization of the algorithm is vital. For the PPM
algorithm to be deployed as a microscopic traceback algorithm at the ISP level,
the routers on the ISP must be synchronized under centralized control. Then,
the initialization among the routers can be done seamlessly under the control
of the ISP. Next, the collection and the processing of the traceback data are
also vital. In our case, the victim may not have the ability to collect or to
process the data. Nevertheless, the ISP can set up special routing support so
as to divert the malicious traffic to the micro-traceback processing node (as
described in Section 1.2.2 on page 13).
Lastly, a traceback algorithm should know when the algorithm should stop.
For the PPM algorithm, however, the termination is not thoroughly discussed
in the literature. It turns out that the termination condition is important
because it determines the correctness of the constructed graph: if the algorithm
stops too early, the constructed graph will not contain enough edges of the
Chapter 4 Termination Condition of PPM Algorithm 95
attack graph, and, thus, fail to fulfill the traceback purpose. On the other
hand, it is not correct to allow the algorithm to run for a long period before
the victim starts the graph reconstruction procedure. The reason is obvious:
the victim would never know how much time is long enough.
In [30], the authors provided an estimation of the number of marked packets
required before the victim can have a constructed graph that is the same as the
attack graph in a single-attacker environment. Let X be the number of marked
packets required for the victim to reconstruct a path, and we name this number
the sufficient packet number. Let d be the length of the reconstructed path.
Also, let pm be the marking probability of every router in the path. Equation
(4.1) given by [30] defines the upper bound on the expected sufficient packet
number E[X], and we name this equation the upper-bound packet number
throughout this thesis.
E[X] < ln(d) / (pm (1 − pm)^(d − 1)).          (4.1)
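Equation (4.1) is straightforward to evaluate; a minimal sketch follows, where the example values of d and pm are chosen for illustration only:

```python
import math

def upper_bound_packets(d, pm):
    """Upper bound on the expected sufficient packet number E[X]
    from Equation (4.1): ln(d) / (pm * (1 - pm)^(d - 1))."""
    return math.log(d) / (pm * (1.0 - pm) ** (d - 1))

# For a path of length d = 10 and a marking probability pm = 1/25,
# the bound evaluates to roughly 83 marked packets.
bound = upper_bound_packets(10, 1.0 / 25.0)
```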
4.1 Using the Upper-Bound Packet Number
as the Termination Condition
Although there is no explicit definition of the termination condition of the PPM
algorithm in [30], it is well-accepted that Equation (4.1) is the termination
condition under the single-attacker environment. The authors also claimed
that, in a multiple-attacker environment,
“the number of packets needed to reconstruct each path is indepen-
dent, so the number of packets needed to reconstruct all paths is a
linear function of the number of attackers”.
Nevertheless, we have found that it is not the case in general. More specifically,
Equation (4.1) should not be treated as the termination condition of the PPM
algorithm.
Figure 4.1: A six-router binary tree network: the upper-bound equation cannot be applied under this multiple-attacker environment.
4.1.1 Failure under the multiple-attacker environment
Firstly, one cannot apply the termination condition to complex networks such
that the reconstruction of one path is dependent on another. This scenario can
be explained by Figure 4.1, a binary-tree network with six routers. The leaf
routers, from R3 to R6, are connected to a pool of attackers. These attackers
send out attack traffic towards to the victim V, and this presents a multiple-
-attacker environment. In this graph, the attack packets traversed through
four paths which are identical in structure. However, there are “shared” edges
among these paths. This implies that the reconstruction of one path is depen-
dent on another. Therefore, one cannot treat Equation (4.1) as the termination
condition under this scenario, and this restricts the application of the PPM
algorithm.
Secondly, even when every path in a given network is independent, we have
found that the number of marked packets needed to reconstruct the network
graph does not have a linear relationship with the number of paths, i.e., the
claim made by [30] is not correct. We have carried out a set of simulations to
show our finding, and we start the description of our simulation setup from
the network depicted in Figure 4.2. The network contains four paths that
are identical in structure, and, more importantly, there are no shared edges
between any two paths. We name these paths the independent paths. Also, we
Figure 4.2: An eight-router tree network with four independent linear paths: another multiple-attacker environment.
assume that one independent path connects to one attacker, and every attacker
sends out a similar amount of attack traffic towards the victim.
4.1.2 Simulation findings
If the claim of the linear property in [30] is right, then there will be a linear
relationship between the number of the independent paths and the number of
packets required to construct a correct constructed graph, where the correctness
of the constructed graph is given by Definition 3.1 on Page 82.
To show whether the claim is right or not, we carry out simulations. Given a
network graph, such as the one shown in Figure 4.2, a simulation measures
the number of packets needed by the PPM algorithm to return a constructed
graph that is exactly the same as the given network graph. Then, for a specific
network graph, we repeat such a simulation 10,000 times so that we can
obtain an average value of the number of packets required. Eventually, we
performed a series of simulations for input network graphs having one to 50
independent paths.
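Our simulation setup can be sketched as follows. The sketch models each independent path as a line of path_len routers and, for each packet, lets the last router that decides to mark determine the surviving encoded edge; the function names and parameters are our own for this sketch:

```python
import random

def packets_until_complete(num_paths, path_len, pm, rng):
    """Count the packets the victim needs before every edge of a network
    of `num_paths` independent linear paths has been encoded at least once."""
    needed = num_paths * path_len          # distinct edges to collect
    seen = set()
    count = 0
    while len(seen) < needed:
        count += 1
        path = rng.randrange(num_paths)    # every attacker sends equally often
        surviving = None
        for hop in range(path_len):        # routers between attacker and victim
            if rng.random() < pm:          # this router restarts the encoding,
                surviving = hop            # overwriting any earlier marking
        if surviving is not None:
            seen.add((path, surviving))
    return count

def average(num_paths, trials=1000, path_len=3, pm=0.5, seed=42):
    """Average over repeated simulation runs, as in our experiments."""
    rng = random.Random(seed)
    return sum(packets_until_complete(num_paths, path_len, pm, rng)
               for _ in range(trials)) / trials
```

Comparing average(1) with average(4) already shows that four independent paths need more than four times the packets of a single path, hinting at the super-linear growth discussed below.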
Figure 4.3 shows the result of this set of simulations. One can observe
that the average number of marked packets required to construct a correct
constructed graph increases as the number of independent paths increases.
In order to show whether the number of marked packets required increases
linearly with the number of paths or not, we plot the rate of change of the
number of marked packets required in Figure 4.4. Surprisingly, the graph shows
an increasing trend of the rate of change of the number of marked packets
required. The claim about the multiple-attacker environment made in [30] is
therefore wrong. Later in this chapter, we will provide a formal calculation of
the number of marked packets required, and we will formally disprove
the use of the upper-bound equation as the termination condition of the PPM
algorithm.
Theoretically, the packet collecting problem can be transformed into the
coupon-collecting problem with unequal probabilities [73]. The fault made by
[30] is to treat the probability that every encoded edge arrives at the victim
as identical, which is wrong (we will discuss this in Section 4.2). The solution to
the coupon-collecting problem with unequal probabilities is very complex, and it
does not show a linear relationship with the number of independent paths.
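The exact expectation for the unequal-probability coupon-collecting problem can be computed with the standard inclusion-exclusion formula, E[X] = Σ over non-empty subsets S of (−1)^(|S|+1) / Σ_{i∈S} p_i. A sketch follows; it is practical only for small edge sets, since it enumerates all subsets:

```python
from itertools import combinations

def expected_collection_time(probs):
    """Expected number of draws until every coupon has appeared at least
    once, for unequal coupon probabilities, via inclusion-exclusion."""
    total = 0.0
    for k in range(1, len(probs) + 1):
        for subset in combinations(probs, k):
            total += (-1) ** (k + 1) / sum(subset)
    return total

# Marked packet-type probabilities of the single path Ga with pm = 0.5
# (the values of Table 4.1 normalised by 1 - P(unmarked) = 0.875):
probs = [0.5 / 0.875, 0.25 / 0.875, 0.125 / 0.875]
expected_marked = expected_collection_time(probs)   # = 8.35 marked packets
```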
In summary, the problem of using the upper-bound packet number as the
termination condition is that the relationship between the number of the attack
paths and the expected sufficient packet number E[X] is not known. Therefore,
the PPM algorithm cannot guarantee the correctness in the multiple-attacker
environment.
4.1.3 Chapter structure
In this chapter, we are going to derive a formal way to calculate the expected
number of marked packets required to construct a correct constructed graph.
First, different kinds of marked packets make up the constructed graph, and
the modeling of the packet marking process then becomes important in order
Figure 4.3: Simulation result: the number of marked packets required versus the number of independent paths.
Figure 4.4: An increasing yet chaotic trend of the rate of change of the number of marked packets required.
to understand the pattern of the arriving packets. Such a model will be introduced
in Section 4.2, and the model is called the packet-type model. By using the
packet-type model, we aim to find the sufficient packet number X as well as
its expectation E[X]. In Section 4.3, we introduce the discrete-time Markov
chain model to find E[X]. In Section 4.4, we will compare the simulation results
and the theoretical results, and the consistent results together disprove the
upper-bound packet number as the termination condition. Lastly, Section 4.5
concludes this chapter.
4.2 Packet-Type Model
The packet-type model is actually a model of the packet marking procedure
(Figure 3.3 on Page 84). This model aims to describe all possible types of the
marked packets. We first define the packet-type random variable.
Definition 4.1 Define T (G) as the packet-type random variable. T (G) = e
represents that a packet encoding the edge e arrives at the victim, where e is
in the set of edges of the attack graph G. Also, define T (G) = φ if the packet
arrived at the victim is unmarked.
The encoding of every packet arrived at the victim can be represented by
the random variable T (G). In addition, the distance between an edge and the
victim also plays a vital role. We define the edge distance function as follows.
Edge Distance Function

d((Ri, Rj), V, path) =
    1,                            if Rj = V;
    d((Rj, Rk), V, path) + 1,     otherwise,          (4.2)
where path = (Ri, Rj, Rk, . . . , V) is the path from Ri to the victim V. Note
that, according to Assumption 3.6 (every router has only one victim route),
the edge distance function returns a unique value for every edge.
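Since every router has a single victim route, the recursion of Equation (4.2) can equivalently be computed by counting the edges that remain between Ri and the victim. A small sketch, with paths represented as tuples of router names ending at the victim (a representation we adopt only for illustration):

```python
def edge_distance(path, edge):
    """Equation (4.2): the last-mile edge has distance 1, and the distance
    grows by one per hop upstream.  `path` is (Ri, ..., V); `edge` is a
    pair of consecutive routers (Ri, Rj) on that path."""
    i = path.index(edge[0])        # position of Ri on the path
    return len(path) - 1 - i       # edges left between Ri and the victim

path = ("R3", "R2", "R1", "V")
d_last = edge_distance(path, ("R1", "V"))    # 1, the last-mile edge
d_far = edge_distance(path, ("R3", "R2"))    # 3
```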
4.2.1 Packet-Type probability
For every value of the packet-type random variable, there is a corresponding
probability for its occurrence. We name this probability the packet-type prob-
ability. In the following derivation of the packet-type probability, we let the
attack graph be G = (V, E). Also, let Ri, Rj ∈ V , and let (Ri, Rj) ∈ E.
For a packet encoding (Ri, Rj) to arrive at the victim V, the packet has to
first pass through the edge (Ri, Rj), and then the packet has to be marked by
the router Ri but not the successive routers (meaning that the packet encodes
the edge (Ri, Rj)). Based on the above statement, we have:
P(T(G) = (Ri, Rj)) = P("a packet passes through (Ri, Rj)" and
                       "a packet encodes (Ri, Rj)")
                   = P("a packet passes through (Ri, Rj)") ×
                     P("a packet encodes (Ri, Rj)" |
                       "a packet passes through (Ri, Rj)").
Hence, we first derive the probability that a packet passes through
(Ri, Rj). Then, we derive the probability that a packet encodes (Ri, Rj) conditioned
on the event that the packet passes through (Ri, Rj). For ease of presentation,
we name the former probability the via probability, Pv((Ri, Rj), G), and the
latter the conditional encoding probability, Pc((Ri, Rj), G). Note also that,
without loss of generality, our derivation can also deal with edges
of the form (Ri, V), where V is the victim site.
Via probability
By Assumptions 3.3 and 3.4 (on Page 87), the probability that a packet comes
from a particular leaf router is the same as the probability that it comes from
another leaf router. On the other hand, Assumption 3.6 (on Page 90) implies
that there is only one path leading from a leaf router to the victim.
Through the above analysis, the via probability equals the fraction of leaf
routers whose paths towards the victim traverse the edge (Ri, Rj). In the following, we
establish a method to calculate the via probability. Let L(G) be the set of leaf
routers in the network graph G and let |L(G)| be the number of leaf routers
in the set L(G). Let Path(R,V) be the path leading from the router R to the
victim V.
For every leaf router Rl ∈ L(G), there is only one path Path(Rl,V) leading
to the victim. Whether this path contains the target edge (Ri, Rj) or not is
another matter. We provide a function δ(p, e), where p is a path and e is an
edge, and we name it the edge testing function. If p contains e, then the edge
testing function returns one. Otherwise, the edge testing function returns zero.
Equation 4.3 gives the formal presentation of the edge testing function.
Edge Testing Function

δ(p, e) =
    1,   if e ∈ p;
    0,   if e ∉ p.          (4.3)
Through the introduction of the edge testing function, one can determine
if a path contains a particular edge. Then, one can count the number of
leaf routers whose path towards the victim contain the target edge (Ri, Rj).
Equation 4.4 counts these leaf routers and normalizes the count by |L(G)|;
the result is the via probability.
Via Probability

Pv((Ri, Rj)) = (1 / |L(G)|) × Σ_{Rl ∈ L(G)} δ(Path(Rl, V), (Ri, Rj)).          (4.4)
Note that such a derivation of the via probability becomes
inapplicable if Assumption 3.6 (the single victim route assumption) is void.
Conditional encoding probability
The conditional encoding probability concerns how a packet's marking can
reach the victim without being overwritten. The formulation of this probability
relies on the edge distance function introduced earlier in Section 4.2.
The edge distance function d ((Ri, Rj),V, Path(Ri,V)) gives the distance
from Ri to V. For the encoding of (Ri, Rj) to be successful, all the routers
from Rj up to the last-mile router of V have to choose not to encode a new edge.
Hence, the following equation defines the conditional encoding probability.
Conditional Encoding Probability

Pc((Ri, Rj)) = pm × (1 − pm)^(d((Ri, Rj), V, Path(Ri, V)) − 1).          (4.5)
Finally, we have the packet-type probability of (Ri, Rj) as follows.
Packet-type Probability

P(T(G) = (Ri, Rj)) = (1 / |L(G)|) × Σ_{Rl ∈ L(G)} δ(Path(Rl, V), (Ri, Rj)) ×
                     pm × (1 − pm)^(d((Ri, Rj), V, Path(Ri, V)) − 1).          (4.6)
In addition, the packet-type probability of an unmarked packet is as follows:
Unmarked Packet Probability

P(T(G) = φ) = 1 − Σ_{e ∈ E} P(T(G) = e).          (4.7)
Note that the above derivation of the packet-type probability includes the
presence of the unmarked packets. If the victim considers only the marked
packets, a suitable normalization should be applied as follows. Denote Tm(G)
as the marked packet-type random variable which is the same as the packet-
type random variable T (G) except that Tm(G) takes only on values of the edge
set E of the graph G. Then, the marked packet-type probability is given by
Equation (4.8).
Marked Packet-Type Probability

P(Tm(G) = e) = P(T(G) = e) / (1 − P(T(G) = φ)),   ∀e ∈ E.          (4.8)
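Equations (4.7) and (4.8) translate directly into code; a sketch assuming the packet-type probabilities are stored in a dictionary keyed by edge (our own representation):

```python
def marked_packet_type_probability(edge_probs):
    """Equations (4.7) and (4.8): derive the unmarked-packet probability
    and normalise the edge probabilities to marked packets only."""
    p_unmarked = 1.0 - sum(edge_probs.values())          # Equation (4.7)
    return {e: p / (1.0 - p_unmarked)                    # Equation (4.8)
            for e, p in edge_probs.items()}

# The single path Ga with pm = 0.5 (cf. Table 4.1):
probs = {("R1", "v"): 0.5, ("R2", "R1"): 0.25, ("R3", "R2"): 0.125}
marked = marked_packet_type_probability(probs)   # values now sum to 1
```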
Packet type probability(Graph G)
/* The size of the “result” array is equal to the number of edges of thinput graph “G”*/1. result := allocate memory(G.edge) ;2. For (i := 0; i < G.edge; i := i + 1)3. result[i] := 0 ;4. end For5. Foreach leaf in G ; do/* “search path” finds the path from leaf to victim */6. path := search path(leaf, victim) ;8. Foreach edge in path ; do9. length := edge distance function(path, edge) ;10. result[edge] := result[edge] + 1/G.leaf num × pm × (1−pm)length−1;12. end Foreach14. end Foreach15. return result ;
Figure 4.5: The pseudocode of the packet-type probability calculation – it calculates the packet-type probability of every edge in the graph G.
4.2.2 Pseudocode of the calculation of the packet-type probabilities
In Figure 4.5, we provide an algorithm to calculate the packet-type probability
of every edge of an input graph. The algorithm first constructs the path leading
from every leaf router to the victim. Then, for each path, it calculates and
accumulates the packet-type probability by Equation (4.6) for every edge in
the path. Eventually, it returns the packet-type probabilities of all edges of the
input graph. Note that the calculations of the packet-type probability for an
unmarked packet and the marked packet-type probabilities are not included in
the pseudocode, but one can calculate them by Equations (4.7) and (4.8) together
with the results obtained by the algorithm.
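For illustration, the calculation of Figure 4.5 can be sketched in runnable Python. The graph encoding below (a list of leaf-to-victim edge paths) is our own illustrative choice, not part of the thesis:

```python
# Sketch of the packet-type probability calculation of Figure 4.5 / Equation (4.6).
# A graph is represented here as a list of paths; each path is the ordered list
# of edges from a leaf router down to the victim (an illustrative encoding).

def packet_type_probabilities(paths, pm):
    """Return a dict mapping each edge to P(T(G) = edge)."""
    result = {}
    for path in paths:
        hops = len(path)                 # distance from the leaf to the victim
        for k, edge in enumerate(path):
            d = hops - k                 # edge distance: hops from the edge's upstream router to the victim
            p = pm * (1 - pm) ** (d - 1) # conditional encoding probability, Equation (4.5)
            result[edge] = result.get(edge, 0.0) + p / len(paths)
    return result

pm = 0.5
Ga = [[("R3", "R2"), ("R2", "R1"), ("R1", "v")]]
Gb = [[("R3", "R2"), ("R2", "R1"), ("R1", "v")],
      [("R4", "R1"), ("R1", "v")]]

pa = packet_type_probabilities(Ga, pm)   # cf. Table 4.1 with pm = 1/2
pb = packet_type_probabilities(Gb, pm)   # cf. Table 4.3 with pm = 1/2
unmarked = 1 - sum(pb.values())          # Equation (4.7)
```

With pm = 1/2, `pa` reproduces Table 4.1 (1/2, 1/4, 1/8) and `pb` reproduces Table 4.3; the unmarked-packet probability of Gb comes out as 3/16, and dividing each entry of `pb` by 1 − 3/16 gives the marked packet-type probabilities of Equation (4.8).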
4.2.3 Illustration of the calculation of the packet-type probability
In this subsection, we use two example graphs in Figure 4.6 to demonstrate the
calculation of the packet-type probabilities under single-path and multi-path
environments, respectively.
The single-path environment
The graph Ga in Figure 4.6 contains one leaf router and has only one path
(R3, R2, R1, v) from leaf router R3 to the victim v. As there is only one leaf
and one path in Ga, the probability that a packet passes through any edge is
one. Thus, the packet-type probability is calculated as shown in Table 4.1.
The multiple-path environment
In graph Gb, there are two paths: (R3, R2, R1, v) and (R4, R1, v). Both paths
contain the edge (R1, v), and this implies that the packet-type probability
[Diagram: (a) the linear network R3 – R2 – R1 – V; (b) the network with paths R3 – R2 – R1 – V and R4 – R1 – V.]
Figure 4.6: (a) Ga: A simple example linear network with three edges. (b)Gb: An example network with multiple paths leading from R3 and R4 to thevictim.
Edge (e)         (R1, v)    (R2, R1)       (R3, R2)
P(T(Ga) = e)     pm         pm(1 − pm)     pm(1 − pm)^2

Table 4.1: Packet-type probabilities for Ga in Figure 4.6.
of (R1, v) is accumulated from two paths. For the path (R3, R2, R1, v), the
contributed packet-type probabilities are shown in Table 4.2. Then, for the
path (R4, R1, v), the accumulated packet-type probabilities are shown in Table
4.3.
Edge (e)         (R1, v)      (R2, R1)            (R3, R2)              (R4, R1)
P(T(Gb) = e)     (1/2)pm      (1/2)pm(1 − pm)     (1/2)pm(1 − pm)^2     0

Table 4.2: Packet-type probabilities for Gb in Figure 4.6: after the path (R3, R2, R1, v) of Gb is considered.
Edge (e)         (R1, v)    (R2, R1)            (R3, R2)              (R4, R1)
P(T(Gb) = e)     pm         (1/2)pm(1 − pm)     (1/2)pm(1 − pm)^2     (1/2)pm(1 − pm)

Table 4.3: Packet-type probabilities for Gb in Figure 4.6: after both paths (R3, R2, R1, v) and (R4, R1, v) of Gb are considered.
4.3 Using Markov Chain Model to Find the Sufficient Packet Number
In the previous section, the packet-type model is defined. We employ such a
model in this section, and transform the packet collection process of the PPM
algorithm into a discrete-time Markov chain model. Using well-established
mathematical techniques, we introduce the methodology to calculate the distribution
of the sufficient packet number X and the expected sufficient packet number E[X].
4.3.1 The Markov process
One of the main tasks performed by the PPM algorithm is to collect the marked
packets, and this task terminates when there is at least one marked packet
from each packet type. Let us now define the underlying Markov process M.
In our model, each Markov state represents a combination of the collected
marked packets. Let G = (V, E) be a network graph. The state space SG of
the Markov process M is the power set of the edge set E and is stated as
follows:
SG = {Es | Es ⊆ E}.
For ease of presentation, we make use of an example network G1 depicted
in Figure 4.7. For the example network G1, the state space SG1 is as follows:

SG1 = { φ, (e1), (e2), (e3), (e1, e2), (e1, e3), (e2, e3), (e1, e2, e3) },
[Diagram: the linear network R3 –e3– R2 –e2– R1 –e1– ν.]
Figure 4.7: Example network G1: it is a linear network with three routers andone victim.
where, without loss of generality, (ei, ej) represents that the victim has collected
two types of marked packets: (i) packets encoding ei and (ii) packets encoding
ej . On the other hand, φ represents that the victim has not collected any
marked packets yet.
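The power-set state space is easy to enumerate programmatically; the short sketch below (our own illustration, not part of the thesis) lists the eight states of SG1:

```python
# Enumerate the Markov state space S_G = { Es | Es ⊆ E } as the power set of
# the edge set E, smallest subsets first (φ first, the full edge set last).
from itertools import combinations

def state_space(edges):
    return [frozenset(c) for r in range(len(edges) + 1)
                         for c in combinations(edges, r)]

S = state_space(["e1", "e2", "e3"])
# 2^3 = 8 states, from the empty state φ up to the full set {e1, e2, e3}
```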
Markov states
The physical meanings of the Markov states in M are as follows. The Markov
process starts with the victim having collected no marked packets, i.e., in state φ. The
Markov process stops when the victim has collected the complete set of marked
packets, and the process is then in the absorbing state. The PPM algorithm
constructs the attack graph when such a state is reached. On the other hand,
the remaining states of the Markov process represent the intermediate states
of the PPM algorithm.
According to the above descriptions, if there are m edges in the
attack graph in total, then there will be 2^m Markov states. As the network
size grows and the number of edges increases, the Markov chain may become
computationally expensive to model. Nevertheless, one can employ efficient
techniques such as aggregation or dis-aggregation [74] and stochastic comple-
mentation [75] to reduce the state space of the Markov chain model.
Transitions
Besides the Markov states, a discrete-time Markov chain also includes its one-
step transition probability matrix, and, in our Markov model, a transition
implies an arrival of a packet (not necessarily a marked packet). A transition
occurs if one of the following two situations happens:
1. an arrival of an unmarked packet, or an arrival of a marked packet but
the victim has received this type of packets already. Under these two
cases, the process stays in the same state; or
2. an arrival of a marked packet which was never received by the victim
before. In this case, the process advances to another Markov state.
For example, considering the graph G1 again, suppose that the current state of
M is (e1), and the victim receives a packet encoding e1 again, then the Markov
process stays in state (e1). For another example, while the process is still in
state (e1), the victim receives a packet encoding e2. Then, the process makes
a transition to state (e1, e2).
Next, we define the transition structure of the Markov process. Let there
be m edges named e1, e2, . . . , em. The transition structure in Equation (4.9)
formally defines all the possible transitions of the Markov process.
Transition structure

φ −→ (ei1) ;
(ei1) −→ (ei1, ei2),                where ei2 ≠ ei1 ;
(ei1, ei2) −→ (ei1, ei2, ei3),      where ei3 ≠ ei1 and ei3 ≠ ei2 ;
    ...
(ei1, ei2, . . . , eim−1) −→ (ei1, ei2, . . . , eim−1, eim) ;    (4.9)

where i1, i2, . . . , im ∈ [1, m].
Transition probabilities
Lastly, every transition is associated with a transition probability. In turn,
the transition probability involves the packet-type probability. Back to the
example of network G1 in Figure 4.7, from state (e1) to state (e1, e2), one
requires the arrival of the packet encoding the edge e2, and, therefore, the
transition probability is the packet-type probability P (T (G1)=e2).
We now show the formulation of the transition probability matrix of the
PPM algorithm. Denote the transition probability matrix of the Markov chain
as P, and denote an entry P[i, j] as the probability that the Markov process
M makes a transition from state i to state j. Then, the transition probabil-
ity matrix P is formulated in Equation (4.10) according to the Markov state
transitions defined in Equation (4.9).
Transition probability structure

P[φ, φ] = P(Tm(G) = φ) ;
P[φ, (ei1)] = P(Tm(G) = ei1) ;
P[(ei1), (ei1)] = P(Tm(G) = ei1) ;
P[(ei1), (ei1, ei2)] = P(Tm(G) = ei2) ;
    ...
P[(ei1, . . . , eir), (ei1, . . . , eir)] = Σ_{k=1}^{r} P(Tm(G) = eik) ;
P[(ei1, . . . , eir), (ei1, . . . , eir, eir+1)] = P(Tm(G) = eir+1) ;
    ...
P[(ei1, . . . , eim−1), (ei1, . . . , eim−1)] = Σ_{k=1}^{m−1} P(Tm(G) = eik) ;
P[(ei1, . . . , eim−1), (ei1, . . . , eim−1, eim)] = P(Tm(G) = eim) ;
P[(ei1, . . . , eim), (ei1, . . . , eim)] = 1 .    (4.10)

where G is the attack graph and Tm(G) is the marked packet-type random variable of Equation (4.8).
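The structure of Equation (4.10) can be sketched as a small matrix builder (our own illustration); feeding it the marked packet-type probabilities 4/7, 2/7, 1/7 of the example in Section 4.3.2 reproduces the matrix of Figure 4.9:

```python
# Sketch: build the one-step transition matrix of Equation (4.10) over the
# power-set state space, given P(Tm(G) = e) for every edge e.
from fractions import Fraction
from itertools import combinations

def transition_matrix(edge_probs):
    edges = list(edge_probs)
    states = [frozenset(c) for r in range(len(edges) + 1)
                           for c in combinations(edges, r)]
    index = {s: i for i, s in enumerate(states)}
    full = frozenset(edges)
    P = [[Fraction(0)] * len(states) for _ in states]
    for s in states:
        i = index[s]
        if s == full:
            P[i][i] = Fraction(1)      # absorbing state: stays put forever
            continue
        for e, p in edge_probs.items():
            # an already-collected edge keeps the process in state s (j == i);
            # a new edge advances it to s ∪ {e}
            P[i][index[s | {e}]] += p
    return states, P

probs = {"e1": Fraction(4, 7), "e2": Fraction(2, 7), "e3": Fraction(1, 7)}
states, P = transition_matrix(probs)   # an 8 × 8 matrix; cf. Figure 4.9
```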
4.3.2 Example on discrete-time Markov chain modeling
We present an example to model the traceback process of the example network
G1 into a discrete-time Markov chain. When one follows the modeling rules
described above, one can construct the Markov chain as shown in Figure 4.8.
We describe the Markov chain in Figure 4.8 in the following. Every state
except the start state, labeled state 1, has a set of boxes alongside it. For
[Diagram: eight states labeled 1–8. State 1 (no packets) has self-loop P(T(G)=φ); states 2–4 each hold one of e1, e2, e3, with self-loops P(T(G)=φ) + P(T(G)=ei); states 5–7 each hold two edge types; state 8 holds e1, e2, e3 and has self-loop probability one. Each arrow to a larger state is labeled with the packet-type probability P(T(G)=e) of the newly collected edge e.]
Figure 4.8: Illustration of the Markov chain model of the PPM algorithm withnetwork G1 in Figure 4.7.
example, state 2 has ‘e1’ in its box. These boxes indicate which types of
packets the victim has received. Hence, the box of state 2 indicates that
the marked packets encoding edge e1 have been received. The same rationale
applies to the other states. Specifically, for state 8, all types of marked packets are
collected, as its three boxes show, and we name this state the absorbing
state. On the other hand, the transitions of the Markov model are represented
as arrows. The probabilities of these transitions are derived from the transition
probability structure in Equation (4.10). Note importantly that, in state 8,
the self-transition probability is one, and this implies that further packet
arrivals will not change the state of the process.
If one sets the marking probability to be pm = 1/2, then the marked packet-
type probabilities of graph G1 are as follows:

P(Tm(G1) = e) =
    4/7  if e = e1 ;
    2/7  if e = e2 ;
    1/7  if e = e3 ;
    0    if e = φ .          (4.11)
By employing the Markov chain (Figure 4.8) and the calculated marked packet-
type probability (Equation (4.11)), one can construct the transition probability
matrix as shown in Figure 4.9.
P =
    |  0    4/7   2/7   1/7   0     0     0     0   |
    |  0    4/7   0     0     2/7   1/7   0     0   |
    |  0    0     2/7   0     4/7   0     1/7   0   |
    |  0    0     0     1/7   0     4/7   2/7   0   |
    |  0    0     0     0     6/7   0     0     1/7 |
    |  0    0     0     0     0     5/7   0     2/7 |
    |  0    0     0     0     0     0     3/7   4/7 |
    |  0    0     0     0     0     0     0     1   |

Figure 4.9: The transition probability matrix constructed for the Markov chain shown in Figure 4.8.
Importance of the transition probability matrix
The transition probability matrix is a rich source of information about the
PPM algorithm. Let the transition probability matrix be P. If we raise
the matrix to a certain power, say k, then P^k represents the system's states
after k packets have arrived at the victim, and the entry P^k[1, 8] in P^k
represents the probability that the system makes a transition from state 1 to
state 8 within k packet arrivals. In other words, P^k[1, 8]
represents the cumulative probability that k marked packets are enough to
construct an attack graph. Mathematically, we have the following:

P^k[1, 8] = P(X ≤ k) = Σ_{i=0}^{k} P(X = i) .
Hence, one can obtain the probability that the sufficient packet number is i,
P (X=i), as follows:
P(X = i) = P^i[1, 8] − P^(i−1)[1, 8] .    (4.12)
Further, by Equation (4.12), one can construct the probability density function
of X, P(X = k), as shown in Figure 4.10. Note that this figure shows a close result
between the simulation data (the distribution of the number of required marked
packets generated by 100,000 individual simulation samples) and the density
function.
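A brute-force check of Equation (4.12) (our own sketch, reusing the matrix of Figure 4.9) recovers the same density function:

```python
# Sketch: P(X = i) = P^i[1, 8] − P^(i−1)[1, 8] via exact matrix powers
# (Equation 4.12), using the transition probability matrix of Figure 4.9.
from fractions import Fraction

F = Fraction
P = [
    [0, F(4,7), F(2,7), F(1,7), 0,      0,      0,      0     ],
    [0, F(4,7), 0,      0,      F(2,7), F(1,7), 0,      0     ],
    [0, 0,      F(2,7), 0,      F(4,7), 0,      F(1,7), 0     ],
    [0, 0,      0,      F(1,7), 0,      F(4,7), F(2,7), 0     ],
    [0, 0,      0,      0,      F(6,7), 0,      0,      F(1,7)],
    [0, 0,      0,      0,      0,      F(5,7), 0,      F(2,7)],
    [0, 0,      0,      0,      0,      0,      F(3,7), F(4,7)],
    [0, 0,      0,      0,      0,      0,      0,      1     ],
]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

pmf, Pk, cdf_prev = {}, P, Fraction(0)
for k in range(1, 151):
    if k > 1:
        Pk = matmul(Pk, P)           # Pk = P^k
    pmf[k] = Pk[0][7] - cdf_prev     # Equation (4.12)
    cdf_prev = Pk[0][7]

# the truncated mean closely approximates E[X] = 8.35 (the tail beyond
# k = 150 is negligible)
mean = float(sum(k * p for k, p in pmf.items()))
```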
As another example, one can find the probability distribution
function of the sufficient packet number of a more complex network graph, G2
in Figure 4.11. Figure 4.12 shows the probability distribution of the sufficient
packet number as well as the corresponding simulation result. The two results
are close, and this supports the correctness of the Markov chain model.
4.3.3 Fundamental matrix
Not only can the probability density function of the sufficient packet number
be found; the Markov chain also helps to find the expected sufficient packet
[Graph: probability P[X = i] against the number of marked packets i, overlaying the Markov-chain model and the simulation result (mean = 8.3252).]
Figure 4.10: Simulation result versus theoretical result: for network G1 in Fig-ure 4.7, we obtain two close sets of results for the distribution of the sufficientpacket number X.
[Diagram: a binary tree of routers R1–R14 rooted at the victim ν; the leaf routers R7–R14 connect to the pool of attack sources.]
Figure 4.11: Example network G2: totally 16,384 Markov states.
[Graph: probability P[X = i] against the number of marked packets i for G2, overlaying the theoretical result and the simulation result (mean = 151.66).]
Figure 4.12: Probability distribution of the sufficient packet number on the14-router binary-tree network, G2.
number E[X] using the theory of the fundamental matrix [76]. In brief, a
fundamental matrix is an (n − 1) × (n − 1) matrix derived from the
transition probability matrix. Here, we describe the steps involved in
calculating the fundamental matrix and E[X].
To calculate the fundamental matrix, one has to first partition the state
space S of the Markov chain into two mutually exclusive and exhaustive
partitions: St is the partition for all transient states, while Sa is the partition
for the absorbing states (i.e., the state wherein all marked packets have been
received by the victim). After the partition, the one-step transition probability
matrix can be represented as:

P = | Q  C |
    | 0  1 | .
In other words, in the case that there is only one absorbing state, and if
P is of size n × n, then Q is of size (n−1) × (n−1), wherein Q is the transition
sub-matrix for all transient states in St (i.e., the states in which the victim
has not yet received every type of marked packet). The sub-matrix Q is used
to calculate the
fundamental matrix M as follows:
Fundamental Matrix

M = (I − Q)^(−1) = Σ_{k=0}^{∞} Q^k .    (4.13)
Theorem 4.1 Let M be the fundamental matrix of the underlying Markov
chain described by the one-step transition probability matrix P. The (i, j)th
entry of M, denoted M[i, j], represents the expected number of visits to
the transient state j, starting from the transient state i, before entering the absorbing state.
Proof. The proof is given in [77]. In the following, we repeat the proof
in terms of our application.
Let the number of visits from the transient state i to the transient state j
before entering the absorbing state be Xij .
Suppose that the PPM algorithm, i.e., the Markov process, is in the tran-
sient state si. In one step, it may enter the absorbing state sn with probability
P[i, n]. The corresponding number of visits to state sj is then zero unless
i = j. Define:

δij = 1 if i = j, and 0 otherwise.
Thus, Xij = δij with probability P[i, n]. Alternatively, the process may go
to a transient state sk at the first step with probability P[i, k]. The subsequent
number of visits to state sj is given by Xkj. If i = j, the total number of visits,
Xij, will be Xkj + 1. Otherwise, it will be Xkj. Therefore,

Xij = δij             with probability P[i, n] ;
Xij = Xkj + δij       with probability P[i, k], 1 ≤ k < n .
Let the random variable Y denote the state of the process after the first step
(given the initial state i). We can summarize as follows:

E[Xij | Y = n] = δij ,
E[Xij | Y = k] = E[Xkj + δij] = E[Xkj] + δij ,    1 ≤ k < n .
Now, since the pmf of Y is easily derived as P(Y = k) = P[i, k], 1 ≤ k ≤ n,
we can use the theorem of total expectation to obtain

E[Xij] = Σ_k E[Xij | Y = k] × P(Y = k)
       = P[i, n] × δij + Σ_{k=1}^{n−1} P[i, k] × (E[Xkj] + δij)
       = Σ_{k=1}^{n} P[i, k] × δij + Σ_{k=1}^{n−1} P[i, k] × E[Xkj]
       = δij + Σ_{k=1}^{n−1} P[i, k] × E[Xkj] .

Forming the (n − 1) × (n − 1) matrix consisting of the elements E[Xij], we have

[E[Xij]] = I + Q × [E[Xij]]  ⇒  [E[Xij]] = (I − Q)^(−1) = M.
Based on the theorem above, one can calculate the expected number of
visits from the start state to every transient state before entering the
absorbing state; their sum gives E[X] in our application. Thus, E[X] can be expressed as:

Expected sufficient packet number

E[X] = Σ_{i=1}^{n−1} M[1, i] ,    (4.14)
where state 1 is the start state, i.e., the victim has not received any marked
packets.
4.3.4 Example on calculating E[X]
We continue our example on network graph G1 and calculate E[X]. By fol-
lowing the formulation of the fundamental matrix specified in Equation (4.13),
M =
    |  1    4/3   2/5   1/6   64/15   1     11/60 |
    |  0    7/3   0     0     14/3    7/6   0     |
    |  0    0     7/5   0     28/5    0     7/20  |
    |  0    0     0     7/6   0       7/3   7/12  |
    |  0    0     0     0     7       0     0     |
    |  0    0     0     0     0       7/2   0     |
    |  0    0     0     0     0       0     7/4   |

Figure 4.13: The fundamental matrix calculated by Equation (4.13) with the transition probability matrix P shown in Figure 4.9.
one can calculate the fundamental matrix M as shown in Figure 4.13. Lastly,
by using Equation (4.14), the value of E[X] is given as follows:
E[X] = 1 + 4/3 + 2/5 + 1/6 + 64/15 + 1 + 11/60 = 8.3500,
which is quite close to the simulation result shown in Figure 4.10 (on Page
114).
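The same value can be cross-checked numerically: the row sums t = (I − Q)^(−1) 1 of the fundamental matrix satisfy the linear system (I − Q) t = 1, which the sketch below (our own illustration) solves exactly with rational arithmetic:

```python
# Sketch: exact E[X] for G1 by solving (I − Q) t = 1, where t[0] equals the
# row sum Σ_i M[1, i] of the fundamental matrix (Equation 4.14).
from fractions import Fraction

F = Fraction
Q = [  # transient part of the matrix in Figure 4.9 (states 1–7)
    [0, F(4,7), F(2,7), F(1,7), 0,      0,      0     ],
    [0, F(4,7), 0,      0,      F(2,7), F(1,7), 0     ],
    [0, 0,      F(2,7), 0,      F(4,7), 0,      F(1,7)],
    [0, 0,      0,      F(1,7), 0,      F(4,7), F(2,7)],
    [0, 0,      0,      0,      F(6,7), 0,      0     ],
    [0, 0,      0,      0,      0,      F(5,7), 0     ],
    [0, 0,      0,      0,      0,      0,      F(3,7)],
]
n = len(Q)
# augmented matrix for (I − Q) t = [1, ..., 1]^T
A = [[F(int(i == j)) - Q[i][j] for j in range(n)] + [F(1)] for i in range(n)]

for col in range(n):                   # Gauss–Jordan elimination, exact
    piv = next(r for r in range(col, n) if A[r][col] != 0)
    A[col], A[piv] = A[piv], A[col]
    A[col] = [x / A[col][col] for x in A[col]]
    for r in range(n):
        if r != col and A[r][col] != 0:
            A[r] = [x - A[r][col] * y for x, y in zip(A[r], A[col])]

t = [row[n] for row in A]
# t[0] = E[X] = 167/20 = 8.35, matching the hand calculation above
```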
For the example network G2 in Figure 4.11, the calculated expected sufficient
packet number is 151.77 while the simulation result is 151.66 according
to Figure 4.12 (on Page 115). This again shows a close result between the
simulation and the calculation from the theoretical model.
4.4 Disproving the Upper-Bound Packet Number as the Termination Condition
With the discrete-time Markov chain model derived, one can show the relationship
between the number of independent paths and the number of marked
packets required more clearly (described earlier in Section 4.1.2
on Page 97). Figure 4.14 shows a comparison between the simulation on the
independent path analysis and the theoretical result using the Markov chain
model.
According to the results, one can observe that the two sets of data are
[Graph: rate of change of the number of marked packets required against the number of independent paths, overlaying the theoretical and simulation results.]
Figure 4.14: The comparison between the simulation and the theoretical re-sults: both results disprove the linear property proposed by previous work.
consistent. This clears the doubt that the increasing trend in the rate of
change of the average number of marked packets needed resulted from statistical
errors in the simulations. More importantly, the simulation and the theoretical
results together disprove the use of the upper-bound equation (Equation (4.1)
on Page 95) as the termination condition in a multiple-attacker environment.
The relationship between the number of independent attack paths and the
expected sufficient packet number is proved to be non-linear using the discrete-
time Markov chain model.
4.5 Chapter Summary
In this chapter, we studied the way that the probabilistic packet marking
algorithm (PPM algorithm for short) should terminate its execution. The
number of marked packets collected by the victim is believed to be a natural
choice of the termination condition, and we called the number of the marked
packets that are required to reconstruct the attack graph the sufficient packet
number.
We first developed an in-depth understanding of the PPM algorithm. Through
simulation results, we learned that the relationship between the number of attack
paths and the expected sufficient packet number E[X] is not linear, and the
exact relationship remained an open question. In quest of the answer, we first provided a
probabilistic model on the packet marking procedure of the PPM algorithm
(Figure 3.3 on Page 84), and we called such a model the packet-type model.
Then, we devised a discrete-time Markov chain model of the PPM algorithm.
The Markov chain model can give an accurate calculation of the probability
distribution function of the sufficient packet number as well as the expected
sufficient packet number.
Nevertheless, no matter how accurate and how efficient the calculation of
the expectation E[X] is, it is suggested that the expected number of required
marked packets E[X] should not be treated as the termination condition. De-
pending on the underlying probability distribution of the random variable X,
when the mean is reached, there is still a non-zero probability that the
constructed graph is incorrect. For instance, if the probability distribution
of X were uniform, then the probability that a correct attack
graph is constructed is just 0.5. As a matter of fact, one does not have such a
probability distribution unless one knows the attack graph in advance.
This contradiction motivates us to devise a new termination condition
for the PPM algorithm, and this idea eventually leads to a new form of the
algorithm. We introduce a new traceback algorithm, the rectified probabilistic
packet marking algorithm, in the next chapter.
Chapter 5

Rectified Probabilistic Packet Marking Algorithm
According to the understanding of the probabilistic packet marking (PPM)
algorithm accumulated from the previous chapters, we conclude the following
points:
• the termination condition of the PPM algorithm relies on the number of
marked packets collected by the victim;
• the number of marked packets required differs by the size and the struc-
ture of the attack graph; and
• the calculations of the probability density function and the expectation
of the number of marked packets required to reconstruct the attack graph
demand complete knowledge of the attack graph.
The last of the above points is the fatal weakness of the current termination
condition of the PPM algorithm because if one already knows the attack graph,
why would one need the PPM algorithm to find the attack graph? This leads
to an obvious choice: to abandon the current termination condition and devise
a new one.
In this chapter, we devise a new version of the PPM algorithm, and the new
algorithm is called the rectified probabilistic packet marking algorithm (RPPM
algorithm for short). The new PPM algorithm has the following attractive
features:
• the RPPM algorithm reconstructs the attack graph without any knowl-
edge of the attack graph;
• the user of the RPPM algorithm is free to determine the correctness of
the constructed graph; and
• when the RPPM algorithm terminates, the constructed graph is guar-
anteed to reach the correctness assigned by the user independent of the
marking probability and the structure of the underlying network graph.
5.1 Structure of This Chapter
In this chapter, we present the RPPM algorithm in details. First, an overview
of the RPPM algorithm will be given in Section 5.2. Next, we show the
dynamics of the execution of the RPPM algorithm in Section 5.3, through the
introduction of the execution diagram. The execution diagram has provided
the foundation of the way that the RPPM algorithm terminates itself. In
Section 5.4, the derivation of the termination condition of the RPPM algorithm
will be presented. Then, we illustrate the execution of the RPPM algorithm
through a graph reconstruction example in Section 5.5. Simulation results of
the RPPM algorithm are presented in Section 5.6, and the results will display
the robustness of the RPPM algorithm. Last but not least, some deployment
issues of the RPPM algorithm will be disclosed in Section 5.8, and Section 5.9
concludes this chapter.
[Diagram: the marked packet stream and a user-selected correctness P∗ enter the RPPM algorithm, which outputs a correct constructed graph with probability > P∗ and an incorrect one with probability < 1 − P∗.]
Figure 5.1: The design goal of the RPPM algorithm: to have a correct con-structed graph with probability greater than P ∗.
5.2 Overview of the RPPM Algorithm
To achieve the goal mentioned above, we devise a probabilistic computation that
guarantees that the constructed graph is the same as the attack graph with
probability greater than P∗, where we call P∗ the traceback confidence level (it is
analogous to the level of confidence that the algorithm wants to achieve). To
accomplish this goal, the graph reconstruction procedure of the original PPM
algorithm is completely replaced, and we call the new procedure the rectified
graph reconstruction procedure. On the other hand, we preserve the packet
marking procedure so that every router deployed with the PPM algorithm
does not have to be changed.
5.2.1 Working principle
According to the original working principle of the PPM algorithm, one can
observe that when more marked packets arrive at the victim, the constructed
graph grows larger until it becomes the same as the attack graph. Then, the
following important point can be concluded from the above observation:
If there is a marked packet arriving at the victim but the con-
structed graph does not change accordingly, then the probability
that the current constructed graph is the same as the attack graph
will be increased.
In the meantime, we name the period that the constructed graph is not
updated the idle time. Accordingly, the rectified graph reconstruction proce-
dure should then monitor the status of the constructed graph as well as the
idle time. The longer the idle time, the more certain the estimation that the
constructed graph is the attack graph. Nevertheless, we are not going to mea-
sure the idle time in terms of “execution time.” Instead, we measure the idle
time in terms of the number of marked packets received.
Specifically, the rectified graph reconstruction procedure calculates the ter-
mination packet number (TPN for short) whenever the constructed graph is
updated. The TPN is used so that when the number of marked packets re-
ceived by the victim is larger than the TPN but the constructed graph is not
updated, then the RPPM algorithm should stop, and the algorithm should
claim that the probability that the constructed graph is the same as the at-
tack graph is at least P ∗.
5.2.2 Flow of rectified graph reconstruction procedure
Based on the above working principle, we give the pseudocode of the rectified
graph reconstruction procedure in Figure 5.2, and this procedure should
be started as soon as the victim has collected the first marked packet.
When a marked packet arrives at the victim, the procedure first checks if
this packet encodes a new edge. If so, the procedure updates the constructed
graph Gc accordingly. Next, if the constructed graph is connected, where
connected means every router can reach the victim, the procedure calculates
the termination packet number (TPN). We will come back and discuss the case
when the graph is disconnected. The procedure then resets the counter for the
incoming packets to zero, and starts counting the number of incoming packets.
In the meantime, the procedure checks if the number of collected packets is
Rectified Graph Reconstruction Procedure (P∗)

/* Initially, Gc contains the “victim” node only, and pkt_count = 0. */
Foreach incoming packet pkt ; do
    pkt_count := pkt_count + 1 ;
    If the incoming packet pkt contains an edge e that is not included in Gc ; then
        Construct the new attack graph Gc by inserting the edge e ;
        If Gc is a connected graph ; then
            TPN := TPN_subroutine(Gc, P∗) ;
            pkt_count := 0 ;
        end If
    end If
    If Gc is a connected graph ; then
        If pkt_count > TPN ; then
            Return Gc as the constructed attack graph ;
        end If
    end If
end Foreach
Figure 5.2: The pseudocode of the rectified graph reconstruction procedure ofthe RPPM algorithm.
larger than the TPN. If so, the procedure claims that the constructed graph
Gc is the attack graph with probability P ∗. Otherwise, the victim receives
a packet encoding a new edge. Then, the procedure updates the constructed
graph, revisits the TPN calculation subroutine, resets the counter for incoming
packets, and waits until a packet encoding new edge arrives or the number of
incoming packets is larger than the new TPN.
In the case that the constructed graph is disconnected, the procedure should
not calculate the TPN nor terminate its execution. The reason is that an attack
graph must be connected. Then, it would be wrong to return a disconnected
graph as a correct constructed graph. Therefore, the procedure should wait
until the constructed graph becomes connected again.
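The control flow of Figure 5.2 can be sketched as follows. Note that the TPN subroutine is only stubbed out here with a placeholder constant, since its actual derivation is the subject of Section 5.4, and the packet-stream and graph encodings are our own illustrative choices:

```python
# Sketch of the rectified graph reconstruction procedure (Figure 5.2).
# A marked packet is modeled as an edge (upstream, downstream); None models
# a packet whose marking fields are empty.

def tpn_subroutine(gc, p_star):
    return 25        # placeholder only: the real TPN depends on Gc and P* (Section 5.4)

def is_connected(gc, victim="v"):
    """True iff every router in the constructed graph can reach the victim."""
    for start in gc:
        node, hops = start, 0
        while node != victim and node in gc and hops <= len(gc):
            node, hops = gc[node], hops + 1
        if node != victim:
            return False
    return True

def rectified_reconstruction(packets, p_star):
    gc = {}          # constructed graph Gc: upstream router -> downstream node
    pkt_count, tpn = 0, None
    for pkt in packets:
        pkt_count += 1
        if pkt is not None and pkt not in gc.items():
            u, w = pkt
            gc[u] = w                      # insert the new edge into Gc
            if is_connected(gc):
                tpn = tpn_subroutine(gc, p_star)
                pkt_count = 0              # restart the idle-time counter
        if tpn is not None and is_connected(gc) and pkt_count > tpn:
            return set(gc.items())         # claim Gc correct with probability >= P*
    return None                            # stream ended before termination

# a toy run: the three edges of Ga arrive, followed by repeats of (R1, v)
stream = [("R1", "v"), ("R2", "R1"), ("R3", "R2")] + [("R1", "v")] * 30
```

Once the last new edge arrives and the counter of repeated (already-seen) packets exceeds the TPN, the procedure returns the full constructed graph; a stream that leaves the graph disconnected never triggers termination.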
As a result, the termination condition of the RPPM algorithm is “the
counter for the incoming marked packets is larger than the termination packet
number (TPN)”. This shows that “the calculation of the TPN during each
update of the constructed graph” is the core of the RPPM algorithm. In the
next step, we provide a deeper understanding of the RPPM algorithm through
the introduction of the execution diagram.
5.3 Execution Diagram of the RPPM Algorithm
According to the previous section, it is observed that the TPN, the constructed
graph, and the execution of the rectified graph reconstruction procedure are
closely related. Such a relationship can be visualized by the construction of the
execution diagram, as shown in Figure 5.3. The execution diagram presents
the dynamics of the execution of the rectified graph reconstruction procedure.
5.3.1 Types of states
There are two types of states in the diagram, the execution state and the
termination state. When the procedure is running, we say that “the rectified
graph reconstruction procedure is in an execution state”. Otherwise, we say
that “the rectified graph reconstruction procedure is in the termination state”.
The execution states also tell us the state of the constructed graph.
1. When the procedure is in the start state, labeled by “0”, it means the
procedure has started running and there are no edges in the constructed
graph.
2. When the procedure is in a connected state, it means the constructed
graph is connected. A connected state labeled by Ci means the con-
structed graph is connected and contains i edges.
[Diagram: the start state 0 leads via growth transitions through connected states C1, C2, …, Cn and disconnected states D1, D2, …, Dn−1; every connected state also has a termination transition to the STOP state.]
Figure 5.3: An execution diagram of the rectified graph reconstruction proce-dure of the RPPM algorithm constructing a graph with n edges.
3. When the procedure is in a disconnected state, the constructed graph is
disconnected. A disconnected state labeled by Di means the constructed
graph is disconnected and contains i edges.
Note that both the connected and disconnected states, say Ci and Di, respec-
tively, refer to all the possible graphs that have i edges. Last but not least,
when the procedure is in the termination state, it means the procedure has
stopped.
5.3.2 Types of transitions
There are two types of transitions in the execution diagram. When the proce-
dure takes a growth transition, it means a new edge is added to the constructed
graph. When the procedure takes a termination transition, it means the pro-
cedure is going to stop running.
The transition structure in Figure 5.3 is derived from the pseudocode of
the rectified graph reconstruction procedure in Figure 5.2. We briefly describe
the transition structure as follows.
1. If a packet encoding a new edge arrives before the number of received
packets is larger than the TPN, then the procedure takes a growth tran-
sition and proceeds to either a connected state or a disconnected state,
depending on the connectivity of the updated constructed graph.
2. If the number of received packets is larger than the TPN, then the pro-
cedure takes the termination transition and proceeds to the termination
state.
3. If the procedure is in one of the disconnected states, then it is meaningless
to return such a graph as the correct constructed graph; hence, there
is no transition connecting the disconnected states to the termination
state. The procedure keeps collecting packets until it proceeds to a
connected state.
5.3.3 Worst-case, average-case, and best-case scenarios
According to the execution diagram, one can classify three kinds of execution
scenarios of the RPPM algorithm: the worst-case, the average-case,
and the best-case scenarios. This classification is based on the probability that
the RPPM algorithm returns a correct graph.
If one assumes that the constructed graph is always connected, then, at
every state, the victim has to calculate the TPN and wait until the rectified
graph reconstruction procedure makes a transition to the next connected
state or to the termination state. In other words, the procedure is vulnerable to
returning an incorrect result because there is always a non-zero probability that
the procedure terminates prematurely. We name this scenario the worst-case scenario.
On the other hand, if the constructed graph is allowed to enter a disconnected
state, then the procedure does not always have the possibility of entering the
termination state. We name this scenario the average-case scenario.
In addition, there is a possibility that the rectified graph reconstruction
procedure is always in the disconnected states (except when the
constructed graph becomes the attack graph). Then, there is no chance for
the procedure to return an incorrect result. We name this scenario the
best-case scenario. Note that the best-case scenario always results in a successful
graph reconstruction.
5.3.4 Role of the execution diagram
The execution diagram provides a thorough understanding about the relation-
ship among the execution of the rectified graph reconstruction procedure, the
constructed graph, and the TPN. Through the analysis of the execution dia-
gram, it can be observed that different execution scenarios of the procedure
would affect the probability that the procedure returns a correct constructed
graph.
It is observed that the worst-case scenario would be the hardest case for
the rectified graph reconstruction procedure to return a correct graph. There-
fore, it is an ideal point for us to derive the calculation of the TPN. Sup-
pose one could successfully provide a guarantee of the correctness of the con-
structed graph under the worst-case scenario, then such a guarantee can also
be provided in the average-case scenario. Moreover, it is expected that the
average-case scenario should outperform the worst-case scenario in terms of
the successful rate of returning a correct constructed graph. In the next step,
we move on to the derivation of the termination packet number.
5.4 Derivation of Termination Packet Number
In this section, we discuss the derivation of the TPN at each connected state so
that the RPPM algorithm returns a correct constructed graph with probability
larger than P ∗.
5.4.1 Technique
The technique of the TPN calculation is similar to hypothesis testing.
First, say the rectified graph reconstruction procedure is currently
in state Ci. Then, we make the following hypothesis.
Hypothesis: the attack graph has more than i edges.
As more marked packets arrive at the victim while the constructed
graph remains unchanged, the procedure gains increasing confidence
to reject the hypothesis.
The rectified graph reconstruction procedure should express such a confi-
dence in terms of the number of marked packets arriving at the victim. Then,
the TPN is actually a threshold indicating that the procedure has a confidence
larger than P∗ to reject the hypothesis. Therefore, when the TPN is reached, the
constructed graph is the same as the attack graph with probability larger than P∗.
In the following, we introduce the state-change probability, and this is the
first step of the derivation of the TPN. As mentioned at the end of Section
5.3, we consider only the worst-case scenario, i.e., the constructed graph
is always connected.
5.4.2 State-change probability
We denote Pτi(Ci → Ci+1) as the probability that the rectified graph recon-
struction procedure proceeds from state Ci to state Ci+1 with TPN set to τi,
and we name this probability the state-change probability from Ci to Ci+1. In
other words, it is the probability that the victim receives a new edge before
the number of collected marked packets is larger than the TPN τi. Note that
we are not referring to any specific constructed graphs. Instead, as mentioned
in Section 5.3.1, Ci represents all the possible connected graphs with i edges.
Since the probability that the RPPM algorithm returns a correct constructed
graph is equivalent to the probability that the RPPM algorithm makes
a transition of n − 1 steps from state C1 to state Cn, mathematically, we have
the following equation:

P(constructed graph is correct) = ∏_{j=1}^{n−1} P_{τ_j}(C_j → C_{j+1}) .
Then, our claim is correct given that, at every state Ci, the product of the
state-change probabilities from state C1 up to the current state is greater
than P∗:

∏_{j=1}^{i} P_{τ_j}(C_j → C_{j+1}) > P∗ .
For the sake of further presentation, we transform the above equation into
Equation (5.1):

P_{τ_i}(C_i → C_{i+1}) > P∗ / X_{i−1} , where X_{i−1} = ∏_{j=1}^{i−1} P_{τ_j}(C_j → C_{j+1}) . (5.1)
Note that X_{i−1} in Equation (5.1) is the product of the state-change probabilities
of the past states of the rectified graph reconstruction procedure, and we
name it the accumulated state-change probability at state Ci. We will discuss
how to calculate the accumulated state-change probability in Section 5.4.3.
5.4.3 TPN derivation
According to the previous subsection, we know that the TPN at each connected
state can be found by Equation (5.1), which is expressed in terms of the
state-change probability. In this subsection, we derive the TPN by deriving the
state-change probability with the following steps:
1. To recall, the state-change probability is the probability that the con-
structed graph of state Ci evolves into the constructed graph of state
Ci+1. Hence, the first step to calculate the state-change probability is
to find all the graphs that could possibly be the next constructed graph,
and we name this set of graphs the extended graphs.
2. In the second step, for each extended graph Ge, we find the probability
that the current constructed graph becomes the extended graph Ge. As a
matter of fact, the above probability is the state-change probability from
Ci to Ci+1 conditioned that the extended graph Ge is the next constructed
graph, and we name this the conditional state-change probability.
3. From the conditional state-change probability, one can find the
state-change probability (and thus the TPN) through the definition of
conditional probability. Nevertheless, because the calculation of the exact
TPN violates basic assumptions of the traceback problem, the
upper-bounded TPN is derived instead, and the relationship between
the exact TPN and the upper-bounded TPN will be presented.
Extended graphs
The extended graphs are the predictions of the future constructed graph based
on the current graph. Denote the constructed graph in state Ci of the rectified
graph reconstruction procedure as Gi where i ≥ 1. By the assumption that
every router has only one victim route (Assumption 3.6 on Page 90) and the
assumption that every constructed graph is connected (which was made earlier
in this section), when the constructed graph evolves from Gi to Gi+1, exactly
one new edge and one new node are inserted into Gi.
The example in Figure 5.4 helps illustrate the above point. On the left
side of the figure, there is a constructed graph with one edge connecting two
nodes, and the victim and the router are labeled by v and R1, respectively.
Figure 5.4: Extended graph example: a constructed graph and its set of extended graphs.
On the right side of the figure, a new edge is inserted into the constructed graph
at two possible locations: one extended graph has the new edge (R2, R1),
and the other has the new edge (R2, v). We name the introduced edges the
extended edges. Formally, we define the extended graphs of Gi in Definition
5.1, and we define G(Gi) as the set of extended graphs.
Definition 5.1 Let G(Gi) be the set of extended graphs of the constructed
graph Gi = (Vi, Ei) in state Ci of the rectified graph reconstruction procedure.
G(Gi) = { Ge = (Ve, Ee) | ∃ (u, t) ∉ Ei with u ∉ Vi and t ∈ Vi,
such that Ve = Vi ∪ {u} and Ee = Ei ∪ {(u, t)} } .
By the assumption that every constructed graph is connected in this sec-
tion, G(Gi) has already included all the possible candidates for the next con-
structed graph Gi+1. So, in the next step, we assume that an extended graph
Ge is the next constructed graph Gi+1. Then, we calculate the state-change
probability conditioned on Gi+1 = Ge, and we call it the conditional state-change
probability. Lastly, by using the definition of conditional probability:

P_{τ_i}(C_i → C_{i+1}) = Σ_{Ge ∈ G(Gi)} P_{τ_i}(C_i → C_{i+1} | G_{i+1} = Ge) × P(G_{i+1} = Ge) ,

we have the state-change probability.
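As an illustration, Definition 5.1 can be turned into a small enumeration routine. The sketch below is not part of the thesis; the graph representation (a node set, an edge set, and a hypothetical pool of candidate routers not yet in the graph) is chosen only for demonstration.

```python
def extended_graphs(nodes, edges, candidate_routers):
    """Enumerate G(Gi) per Definition 5.1: every extended graph adds one
    new node u (not yet in the graph) and one new edge (u, t) with t in Vi."""
    result = []
    for u in candidate_routers:
        if u in nodes:
            continue  # u must be a node outside Vi
        for t in nodes:
            result.append((nodes | {u}, edges | {(u, t)}))
    return result

# The example of Figure 5.4: G1 has nodes {v, R1} and the single edge (R1, v).
# With one candidate router R2, there are exactly two extended graphs,
# one with the extended edge (R2, R1) and one with (R2, v).
g_ext = extended_graphs({"v", "R1"}, {("R1", "v")}, ["R2"])
```

This reproduces the two extended graphs of Figure 5.4, each obtained by attaching the new router R2 to one of the two existing nodes.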
The conditional state-change probability
The conditional state-change probability is calculated according to the follow-
ing rationale. If one assumes that Gi+1 = Ge, then one knows the topology of
the next constructed graph, and also knows where the extended edge is. Then,
the state-change probability is equivalent to the probability that a packet en-
coding the extended edge arrives at the victim before the number of collected
packets is larger than the TPN.
The probability that a packet encoding the extended edge e′ arrives at the victim is exactly the
packet-type probability P(T(Ge) = e′). Because the marking process of each
packet is independent, the state-change probability conditioned on Gi+1 = Ge
is therefore given by the following equation:

P_{τ_i}(C_i → C_{i+1} | G_{i+1} = Ge) = 1 − (1 − P(T(Ge) = e′))^{τ_i} . (5.2)
Note that Equation (5.2) is an increasing function with respect to τ_i because:

d/dx [ 1 − (1 − P(T(Ge) = e′))^x ] = −(1 − P(T(Ge) = e′))^x log(1 − P(T(Ge) = e′)) > 0 ,

where x > 0 and P(T(Ge) = e′) ∈ (0, 1).
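The increasing behaviour of Equation (5.2) in τ_i can also be checked numerically. A small sketch (the value p = 1/3 is an arbitrary packet-type probability chosen for illustration, not a value fixed by the derivation):

```python
def cond_state_change_prob(p, tau):
    # Eq. (5.2): 1 - (1 - P(T(Ge) = e'))^tau for an extended edge with
    # packet-type probability p, over tau independently marked packets.
    return 1.0 - (1.0 - p) ** tau

# Strictly increasing in tau for any p in (0, 1), as the derivative argument shows.
probs = [cond_state_change_prob(1/3, tau) for tau in range(1, 8)]
```

Each successive value of `probs` is strictly larger than the previous one, matching the sign of the derivative above.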
To continue with the calculation of the state-change probability, the prob-
ability P (Gi+1 = Ge) has to be known. However, this is prohibited by the
assumption that the victim does not have any information about the attack
graph. As an alternative, the upper-bounded TPN will be derived instead.
Upper-bounded TPN
Since the conditional state-change probability is increasing with respect to τ_i
(stated in the note of Equation (5.2)), one can always find a sufficiently large
integer, τ∗_i, such that:

P_{τ∗_i}(C_i → C_{i+1} | G_{i+1} = Ge) > P∗ / X_{i−1} , ∀ Ge ∈ G(Gi) . (5.3)
By the above idea, we have:

P_{τ∗_i}(C_i → C_{i+1}) = Σ_{Ge ∈ G(Gi)} P_{τ∗_i}(C_i → C_{i+1} | G_{i+1} = Ge) × P(G_{i+1} = Ge)
                        > Σ_{Ge ∈ G(Gi)} (P∗ / X_{i−1}) × P(G_{i+1} = Ge) . (By Eq. (5.3).)

Since Σ_{Ge ∈ G(Gi)} P(G_{i+1} = Ge) = 1, we have P_{τ∗_i}(C_i → C_{i+1}) > P∗ / X_{i−1}.
Hence, this shows that τ∗_i can also be a TPN of state Ci because Equation (5.1)
is satisfied. By the above arguments, it remains to confirm the existence
of an integer τ∗_i large enough to satisfy Equation (5.3). From Equation
(5.3), we have:

P_{τ∗_i}(C_i → C_{i+1} | G_{i+1} = Ge) > P∗ / X_{i−1}
⇒ 1 − (1 − P(T(Ge) = e′))^{τ∗_i} > P∗ / X_{i−1} (By Eq. (5.2).)
⇒ τ∗_i > log(1 − P∗ / X_{i−1}) / log(1 − P(T(Ge) = e′)) .
Since the TPN is an integer, we have:

τ∗_i = ⌊ Y_i(Ge) + 1 ⌋ , where Y_i(Ge) = log(1 − P∗ / X_{i−1}) / log(1 − P(T(Ge) = e′)) .
Further, by the monotonically increasing property of the logarithmic function,
Y_i(Ge) is monotonically decreasing with respect to P(T(Ge) = e′). Thus, by
finding the value min_{Ge ∈ G(Gi)} P(T(Ge) = e′), the maximum value of τ∗_i over the
set of extended graphs G(Gi) can be found. Therefore,
Upper-bounded TPN

τ∗_i = ⌊ log(1 − P∗ / X_{i−1}) / log(1 − p_min) + 1 ⌋ ,
where p_min = min_{Ge ∈ G(Gi)} P(T(Ge) = e′) . (5.4)
Remark. The upper-bounded TPN derived in Equation (5.4) may not be the
exact value of the TPN: if the extended graph corresponding to p_min
in Equation (5.4) is not the next constructed graph Gi+1, then the true TPN
is smaller (by the decreasing property of Y_i(Ge) in the proof). That is
why we name τ∗_i the upper-bounded TPN.
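Equation (5.4) translates into a one-line computation. A sketch, assuming p_min and the accumulated state-change probability X_{i−1} have already been obtained:

```python
import math

def upper_bounded_tpn(p_star, x_prev, p_min):
    """Eq. (5.4): the upper-bounded TPN tau*_i.

    p_star -- traceback confidence level P*
    x_prev -- accumulated state-change probability X_{i-1}
    p_min  -- minimum packet-type probability over the extended edges
    """
    return math.floor(math.log(1.0 - p_star / x_prev)
                      / math.log(1.0 - p_min) + 1.0)
```

For example, with P∗ = 0.5, X_0 = 1, and p_min = 1/3 (the state C1 configuration of the worked example in Section 5.5), the function returns 2.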
Calculation of the accumulated state-change probability
According to Equation (5.1), the accumulated state-change probability is given
by:

X_{i−1} = ∏_{j=1}^{i−1} P_{τ∗_j}(C_j → C_{j+1})
        = X_{i−2} × P_{τ∗_{i−1}}(C_{i−1} → C_i) , if i > 1 ;
          1 , if i = 1 .
Since the exact state-change probability is not derived, we opt to calculate the
accumulated state-change probability after the state of the rectified graph
reconstruction procedure has changed.

Let us consider the scenario in which the constructed graph changes from
Gi−1 to Gi. After the state has changed, the probability P(Gi = Ge)
becomes either one or zero for every extended graph Ge, that is:

P(Gi = Ge) = 0 , if Ge ∈ G(Gi−1) − {Gi} ;
P(Gi = Ge) = 1 , if Ge = Gi .
Then, the state-change probability P_{τ∗_{i−1}}(C_{i−1} → C_i) becomes:

P_{τ∗_{i−1}}(C_{i−1} → C_i)
  = Σ_{Ge ∈ G(Gi−1)} P_{τ∗_{i−1}}(C_{i−1} → C_i | Gi = Ge) × P(Gi = Ge)
  = P_{τ∗_{i−1}}(C_{i−1} → C_i | Gi = Gi) × P(Gi = Gi)
  = 1 − (1 − P(T(Gi) = e_i))^{τ∗_{i−1}} , (By Eq. (5.2).)

where e_i is the new edge added to Gi.
Hence, the accumulated state-change probability Xi−1 can be obtained after
the rectified graph reconstruction procedure has proceeded from state Ci−1 to
state Ci. Equation (5.5) presents the calculation of the accumulated state-
change probability.
Accumulated State-Change Probability

X_{i−1} = X_{i−2} × ( 1 − (1 − P(T(Gi) = e_i))^{τ∗_{i−1}} ) , if i > 1 ;
X_{i−1} = 1 , if i = 1 . (5.5)
The accumulated state-change probability for a disconnected state
We now consider the case when the assumption that the constructed graph is
always connected is removed, i.e., a normal execution of the RPPM algorithm.
Suppose that the rectified graph reconstruction procedure enters the disconnected
state Di+1 from the connected state Ci; then, the update of the accumulated
state-change probability has to be changed.
According to the previous discussion, the accumulated state-change probability
depends on the constructed graph in state Di+1, which is disconnected.
Nevertheless, because the graph Gi+1 is disconnected, the packet-type
probability P(T(Gi+1) = ei+1) cannot be found. As an alternative, we choose
min_{Ge ∈ G(Gi)} P(T(Ge) = e′) in Equation (5.4) as the value of P(T(Gi+1) = ei+1)
in Equation (5.5). The reason for this choice is as follows:
τ∗_i > log(1 − P∗ / X_{i−1}) / log(1 − p_min) ⇒ X_{i−1} × (1 − (1 − p_min)^{τ∗_i}) > P∗ ,

where p_min = min_{Ge ∈ G(Gi)} P(T(Ge) = e′).
Hence, the accumulated state-change probability is still larger than the
traceback confidence level P ∗ by choosing minGe∈G(Gi) P (T (Ge) = e′) as the
value of P (T (Gi+1) = ei+1) in Equation (5.5). In the next subsection, we
conclude this section and provide the pseudocode of the TPN calculation sub-
routine.
TPN subroutine(Graph G, Traceback Confidence Level P∗)
/* Let the variables τ, X, and p_min be static variables, which means
the values of these variables are not erased after exiting the subroutine. */
1.  If G is not connected & G.edge > 0 ; then
2.    If the previous state is a connected state ; then
3.      X := X × (1 − (1 − p_min)^τ) ;
4.    end If
5.    exit the subroutine ;
6.  end If
7.  If the previous state is a connected state & G.edge > 0 ; then
8.    p := the packet-type probability of the new edge of the constructed graph ;
9.    X := X × (1 − (1 − p)^τ) ;
10. end If
11. p_min := 1 ;
12. Foreach extended graph Ge in G(G) ; do
13.   p := the packet-type probability of the extended edge of Ge ;
14.   p_min := min(p_min, p) ;
15. end Foreach
16. τ := ⌊log(1 − P∗/X) / log(1 − p_min) + 1⌋ ;
17. return τ ;

Figure 5.5: The pseudocode of the termination packet number (TPN) calculation subroutine.
5.4.4 Section summary and TPN calculation subroutine
To summarize, we have presented how one can calculate the TPN at every con-
nected state of the graph construction procedure so that the RPPM algorithm
returns a correct constructed graph with a specified probability P ∗.
Figure 5.5 shows the subroutine that calculates the TPN, and it is ex-
ecuted whenever the rectified graph reconstruction procedure enters a new
state. When the routine is visited for the first time, the variable “X” that is
used to store the accumulated state-change probability is initialized to one.
Next, based on the connectivity of the current constructed graph, the variable
“X” is updated in different ways: 1) if the current constructed graph is con-
nected, the subroutine calculates the packet-type probability of the new edge
and then updates the variable “X”; and 2) if the current constructed graph
is disconnected, the subroutine uses the minimum packet-type probability of
the extended edge that was chosen from the extended graphs of the previous
constructed graph, i.e., "p min" in the pseudocode in Figure 5.5. Next, if the
current constructed graph is disconnected, the TPN subroutine does not calculate
the TPN and simply exits. Otherwise, the subroutine
calculates the TPN based on Equation (5.4). Finally, the subroutine returns
the calculated TPN.
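The pseudocode of Figure 5.5 can be sketched in Python as follows. The two callbacks stand in for machinery defined elsewhere in the chapter and are therefore assumptions of this sketch; the static variables τ, X, and p_min of the pseudocode become instance attributes.

```python
import math

class TPNCalculator:
    """A sketch of the TPN calculation subroutine of Figure 5.5.

    Assumed callbacks (not defined in this excerpt):
      p_new_edge(g)          -- packet-type probability of the edge just
                                added to the constructed graph g
      extended_edge_probs(g) -- packet-type probabilities of the extended
                                edges of all extended graphs of g
    """

    def __init__(self, p_star, p_new_edge, extended_edge_probs):
        self.p_star = p_star
        self.p_new_edge = p_new_edge
        self.extended_edge_probs = extended_edge_probs
        self.tau = 0          # static variable tau
        self.x = 1.0          # static variable X
        self.p_min = 1.0      # static variable p_min
        self.prev_connected = False

    def __call__(self, g, connected, num_edges):
        # Lines 1-6: in a disconnected state, only fold p_min into X.
        if not connected and num_edges > 0:
            if self.prev_connected:
                self.x *= 1.0 - (1.0 - self.p_min) ** self.tau
            self.prev_connected = False
            return None  # no TPN is computed in a disconnected state
        # Lines 7-10: fold the realized state-change probability into X.
        if self.prev_connected and num_edges > 0:
            p = self.p_new_edge(g)
            self.x *= 1.0 - (1.0 - p) ** self.tau
        # Lines 11-15: minimum packet-type probability over extended edges.
        self.p_min = min(self.extended_edge_probs(g))
        # Line 16: the upper-bounded TPN of Eq. (5.4).
        self.tau = math.floor(math.log(1.0 - self.p_star / self.x)
                              / math.log(1.0 - self.p_min) + 1.0)
        self.prev_connected = True
        return self.tau

# Worst-case walk of the 3-router linear example of Section 5.5; the
# packet-type probabilities below are copied from Tables 5.1 and 5.2.
calc = TPNCalculator(
    p_star=0.5,
    p_new_edge=lambda g: 1/3,  # edge (R2, R1) added when entering C2
    extended_edge_probs=lambda g: [1/3, 1/2] if g == "G1" else [1/7, 1/6, 2/5],
)
tau1 = calc("G1", connected=True, num_edges=1)  # TPN of state C1
tau2 = calc("G2", connected=True, num_edges=2)  # TPN of state C2
```

Consistent with the worked example in Section 5.5, this walk yields τ1 = 2 and τ2 = 15, and X becomes 5/9 upon entering state C2.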
Having derived the calculation of the TPN, we next illustrate the
interactions between the execution of the rectified graph reconstruction
procedure, the growth of the constructed graph, and the calculation of the
TPN in the next section.
5.5 Graph Reconstruction Example
In this section, we illustrate the entire graph reconstruction process of the
RPPM algorithm through an example. The example follows the pseudocodes
of the packet-type probability calculation, the rectified graph reconstruction
Figure 5.6: State C1: a constructed graph with one edge, and its extended graphs.
procedure, and the TPN calculation subroutine in Figures 4.5 (on Page 104),
5.2 (on Page 125), and 5.5 (on Page 138), respectively.
We have the following configuration in the example: the attack graph is
the linear network with three routers shown in Figure 4.7 (on Page 108), the
marking probability pm is 0.5, and the traceback confidence level P∗ is 0.5.
Again, we present the example with the assumption that the constructed graph
is always connected (the worst-case scenario). Moreover, we assume that the
victim counts only the marked packets.
5.5.1 State C1
According to the assumption that every constructed graph is connected, the
victim first receives the edge (R1,V) and the rectified graph reconstruction
procedure enters state C1.
As the constructed graph G1 is connected, the TPN calculation subroutine
is executed. Firstly, one has to construct the extended graphs of G1. Since
G1 has two nodes, there will be two extended graphs, namely G1,1 and G1,2, as
shown in Figure 5.6. Secondly, one has to calculate the marked packet-type
probabilities for both extended graphs, and the results are listed in Table 5.1.
Edges (e) of G1,1:     (R1, V)   (R2, R1)
P(Tm(G1,1) = e):        2/3       1/3

Edges (e) of G1,2:     (R1, V)   (R2, V)
P(Tm(G1,2) = e):        1/2       1/2

Table 5.1: The marked packet-type probabilities of the extended graphs G1,1 and G1,2.
According to the TPN calculation subroutine, one has to find the minimum
marked packet-type probability of the extended edges of the two extended
graphs. Referring to Table 5.1, the minimum value is 1/3. Then, the TPN of C1,
τ1, is calculated as follows:

τ1 = ⌊ log(1 − 0.5/1) / log(1 − 1/3) + 1 ⌋ = ⌊2.7095⌋ = 2 .

Note that the variable X in the TPN calculation subroutine is initialized to
one.
Therefore, the victim should wait for two marked packets before it stops the
rectified graph reconstruction procedure. Suppose that a packet encoding the
edge (R2, R1) arrives before the victim exits the rectified graph reconstruction
procedure. Then, the procedure enters state C2.
5.5.2 State C2
Since the constructed graph C2 is again connected, the TPN calculation sub-
routine is executed. Before the subroutine starts calculating the TPN, the
accumulative state-change probability, i.e, the variable X in the TPN calcula-
tion subroutine, should be updated as follows:
X = 1 × (1 − (1 − 1/3)^2) = 5/9 .
Figure 5.7: State C2: a constructed graph with two edges, and its extended graphs.
The current constructed graph G2 and the extended graphs of G2, namely G2,1,
G2,2, and G2,3, are shown in Figure 5.7. The marked packet-type probabilities
for the three extended graphs are listed in Table 5.2, and the minimum marked
packet-type probability of the extended edges is 1/7, from the extended graph G2,1.
Then, the TPN τ2 is calculated as follows:

τ2 = ⌊ log(1 − 0.5/(5/9)) / log(1 − 1/7) + 1 ⌋ = ⌊15.9372⌋ = 15 .
Hence, the victim should wait for 15 marked packets before it stops the
RPPM algorithm.
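As a sanity check, the two TPN values of this worked example follow directly from Equations (5.4) and (5.5). A short sketch (plain Python, not part of the thesis):

```python
import math

def tpn(p_star, x, p_min):
    # Upper-bounded TPN of Eq. (5.4)
    return math.floor(math.log(1 - p_star / x) / math.log(1 - p_min) + 1)

x1 = 1.0                             # X is initialized to one in state C1
tau1 = tpn(0.5, x1, 1/3)             # state C1: p_min = 1/3
x2 = x1 * (1 - (1 - 1/3) ** tau1)    # Eq. (5.5) update on entering C2
tau2 = tpn(0.5, x2, 1/7)             # state C2: p_min = 1/7
```

Running this yields tau1 = 2, x2 = 5/9, and tau2 = 15, matching the calculations above.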
5.5.3 State C3
Suppose that a packet encoding the new edge (R3, R2) arrives before the number of collected marked
packets exceeds the TPN of state C2. Now, the constructed graph is
exactly the same as the attack graph and the rectified graph reconstruction
procedure enters the state C3. Nevertheless, the procedure has not yet stopped
as the victim does not know the attack graph. By similar steps, one can find
Edges (e) of G2,1:     (R1, V)   (R2, R1)   (R3, R2)
P(Tm(G2,1) = e):        4/7       2/7        1/7

Edges (e) of G2,2:     (R1, V)   (R2, R1)   (R3, R1)
P(Tm(G2,2) = e):        2/3       1/6        1/6

Edges (e) of G2,3:     (R1, V)   (R2, R1)   (R3, V)
P(Tm(G2,3) = e):        2/5       1/5        2/5

Table 5.2: The marked packet-type probabilities of the extended graphs G2,1, G2,2, and G2,3.
that the victim has to wait for 100 marked packets before it stops the RPPM
algorithm. Since no new types of marked packets arrive, the procedure
eventually enters the termination state.
In the next section, we present the simulation results of the RPPM algo-
rithm which show the correctness and robustness of the RPPM algorithm.
5.6 Simulation Result
In this section, we present the simulation results to show that the RPPM algo-
rithm is able to guarantee the correctness of the constructed graph independent
of the marking probability and the structure of the attack graph. First, we
describe the simulation environment.
5.6.1 Simulation environment
Every simulation of the RPPM algorithm starts with a testing network rooted
at the victim, i.e., the attack graph. The configuration of the network follows
the assumptions stated in Section 3.3 (on Page 85). In addition, the network
has at least one leaf router, i.e., a router with zero incoming edges. Each edge
between two routers is directed and is assumed to have infinite capacity; thus,
no packet is lost in this environment.
Next, we describe the properties of the simulated packets. All packets are
homogeneous in terms of type, size, etc. Every packet’s destination is set to
the victim, and every packet starts its itinerary at one of the leaf routers of
the testing network, chosen at random. Further, the paths traversed by the
packets are chosen at random.
5.6.2 Simulation: different values of the marking prob-
ability
In this set of simulations, the impact of the marking probability on the success-
ful rate of the RPPM algorithm will be studied. As presented in Section 4.2
(on Page 100), the marking probability is one of the factors that determines
the packet-type probability and also the termination packet number. As a
matter of fact, the marking probability is closely related to the occurrences of
the different execution scenarios described in Section 5.3.3.
A high value of the marking probability is analogous to the worst-case
scenario. If the value of the marking probability is high, most of the arriving
packets encode edges that are close to the victim. Then, the constructed
graph is connected with a very high probability at all times, and thus, this case is
analogous to the worst-case scenario. On the contrary, the execution of the
RPPM algorithm is close to the best-case scenario when the value of the
marking probability is very low.
We have conducted a set of simulations to verify the above claims. In this
set of simulations, the testing network is the linear network depicted in Figure
4.7 (on Page 108). The simulations are performed at three different values of
the marking probability: 0.1, 0.5, and 0.9.
Figure 5.8: The simulations show that the larger the marking probability is, the closer the simulation result is to the worst-case execution.
Calculation of the successful rate
The results of the simulations are shown in Figure 5.8. Each data point in the
figure is obtained in the following way. For instance, we are computing the
data point for the plot with marking probability set to 0.1 and the traceback
confidence level set to zero. The RPPM algorithm is executed 10,000 times
with the marking probability set to 0.1 and the traceback
confidence level set to zero. For each instance of the RPPM algorithm, the
returned constructed graph is compared with the input network graph, i.e., the
3-router linear network in this case. If the constructed graph is the same as the
input network graph, then it is considered to be a successful execution. Otherwise,
it is considered to be a failed execution. Hence, we are employing a stricter
definition of correctness than the one defined in Definition 3.1 on Page 82.
Lastly, the successful rate, i.e., the y-axis of the simulation result, is obtained
as follows:

Successful rate = (Number of successful executions) / (Number of executions) .

Therefore, the successful rate is the value showing how well the RPPM algorithm
performed. Note that the above calculation of the successful rate applies to
every simulation within this chapter.
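The successful-rate computation amounts to a fraction of matching runs. A minimal sketch (the run outcomes here are hypothetical booleans; in the actual simulations each outcome is the comparison of the returned constructed graph against the input network):

```python
def successful_rate(outcomes):
    # One boolean per execution of the RPPM algorithm:
    # True iff the returned constructed graph equals the input network graph.
    outcomes = list(outcomes)
    return sum(outcomes) / len(outcomes)

# Hypothetical batch of four executions, three of which succeeded.
rate = successful_rate([True, True, False, True])
```

For the hypothetical batch above, the rate is 0.75; each data point in Figure 5.8 is such a fraction over 10,000 executions.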
Definition of the bottom line
Back to Figure 5.8: in addition to the simulation results, there is an extra plot
in the figure named the "bottom line", representing the function y = x. If
a data point is above the bottom line, i.e., the successful rate is higher than
the traceback confidence level, then the RPPM algorithm can
guarantee the correctness of the constructed graph, and vice versa. Therefore,
we expect that no data point would appear below the bottom line. Note
that the above definition of the bottom line applies to every simulation within
this chapter.
Simulation result
We now analyze the simulation result. Firstly, all the data points are above
the bottom line, and this shows that the RPPM algorithm can guarantee the
correctness of the constructed graph under different values of the marking
probability.
Secondly, one can observe that as the marking probability increases, the
rate at which the RPPM algorithm returns a correct graph decreases. With
pm = 0.9, the plot is very close to the bottom line. According to the above
definition of the bottom line, this simulation result means the successful rate
is very close to the traceback confidence level, which implies the worst-case
scenario.
Figure 5.9: RPPM algorithm simulation: 15-node linear network with random marking probability.
Through this set of simulations, we showed that the RPPM algorithm can
guarantee the correctness of the constructed graph under different values of
the marking probability.
5.6.3 Simulation: different graph structures
The second set of simulations tests whether the RPPM algorithm can guarantee
the promised successful rate under different graph structures. In this set of
simulations, we execute the simulations under both the worst-case and the
average-case scenarios. The worst-case scenario is forced to happen by
restricting the packet generation process, while the average-case scenario is a
normal execution of the RPPM algorithm without any constraints. Also, for
each execution of the RPPM algorithm, the marking probability is set to a
random number from 0.1 to 0.9 inclusive.
The simulation results for the linear network, the binary-tree network, and
Figure 5.10: RPPM algorithm simulation: 14-router binary-tree network with random marking probability.
Figure 5.11: RPPM algorithm simulation: 14-router random-tree network with random marking probability.
Figure 5.12: RPPM algorithm simulation: 100-router random-tree network with marking probability = 0.1.
Figure 5.13: RPPM algorithm simulation: 500-router random-tree network with marking probability = 0.1.
Figure 5.14: RPPM algorithm simulation: 1,000-router random-tree network with marking probability = 0.1.
the random-tree network containing 14 routers and one victim are shown in Figures 5.9, 5.10, and 5.11, respectively. The topologies of the linear and the binary-tree networks are self-explanatory, and a random-tree network means the nodes are randomly connected under the following constraints: 1) every router can reach the victim in a non-zero number of hops; 2) there must be no cycles in the graph; 3) the victim must not have any outgoing edges; and 4) every router can have only one outgoing edge. Also, as [78] suggests that the longest router path in the Internet is 32 hops, the maximum length of the paths in the testing network is therefore 32.
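The four constraints amount to growing a directed tree rooted at the victim, with each new router attached below an already-placed node. A minimal sketch of such a generator (our own construction and naming, not the simulator used in the thesis):

```python
import random

def random_tree_network(num_routers, max_path_len=32, seed=None):
    """Generate a random-tree network as a map: router -> next hop.

    Node 0 is the victim; routers are numbered 1..num_routers.  Each
    router is attached under one already-placed node, so every router
    has exactly one outgoing edge, the graph has no cycles, the victim
    has no outgoing edge, and every router reaches the victim.  Nodes
    already at depth max_path_len are excluded as attachment points,
    capping the longest router-to-victim path.
    """
    rng = random.Random(seed)
    parent = {}        # router -> next hop towards the victim
    depth = {0: 0}     # hop count to the victim
    for r in range(1, num_routers + 1):
        p = rng.choice([n for n in depth if depth[n] < max_path_len])
        parent[r] = p
        depth[r] = depth[p] + 1
    return parent, depth
```

Because every router picks its single outgoing edge from nodes that already reach the victim, all four constraints hold by construction.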
All three results show that, no matter what the network is, all the data points are above the bottom line. Hence, the RPPM algorithm guarantees the correctness of the constructed graph independent of the structure of the real network graph. Also, the simulation results support the claim that the average-case scenario outperforms the worst-case scenario in terms of the successful rate. Further, we extend the simulations on the
random-tree network to larger network scales with 100, 500, and 1000 routers,
and the results are shown in Figures 5.12, 5.13, and 5.14, respectively. Accord-
ing to the results, the increasing network scale does not affect the guarantee
provided by the RPPM algorithm.
5.6.4 Section summary
In conclusion, the simulation results showed that the RPPM algorithm guar-
antees the correctness of the constructed graph independent of the marking
probability and the structure of the attack graph.
5.7 Supporting Routers with Multiple Victim Routes
In this section, we relax the assumption that every router has only one outgoing route towards the victim, i.e., Assumption 3.6 on Page 90. This change may cause the attack packets to take more than one path towards the victim, and the routers in the constructed graph may have more than one outgoing edge.
In the following, we first discuss the problem that emerges when the RPPM algorithm is applied to routers having multiple victim routes, and a set of simulations is performed to illustrate the severity of the problem. Second, we present the solution to the problem caused by the relaxed assumption: the method is to introduce an extra set of extended graphs. Lastly, we perform simulations based on this solution and compare the results with and without the support of multiple victim routes.
5.7.1 Problem of multiple victim routes
Originally, without considering routers having multiple victim routes, the arrival of a new encoded edge adds only a new node and a new edge to the constructed graph (and note that this is the worst-case execution scenario).
However, when we allow a router to have multiple victim routes, the arrival of
a marked packet encoding a new edge can result in two different scenarios: (i)
a new node is added, i.e., one node plus one edge; or (ii) no new node is added,
which means the new edge connects two existing nodes. Since the latter case is
not considered by the RPPM algorithm, one may then doubt the guarantee of
the successful rate of the RPPM algorithm. The following simulation supports
this doubt.
The simulation environment
The testing network is a random-tree network with 10 nodes: one victim plus
nine routers. But this time, we allow the routers in the testing network to
have more than one victim route. Again, the marking probability is set to a
random number in [0.1 : 0.9], and the values are the same for all routers.
The simulation result
Figure 5.15 shows the simulation results for both the average-case and the
worst-case executions. For small values of the traceback confidence level, the
successful rates of both execution modes are still over the bottom line. How-
ever, the successful rate of the worst-case execution falls below the bottom line
when the traceback confidence level goes beyond 0.54 while the successful rate
of the average-case execution falls below the bottom line when the traceback
confidence level goes beyond 0.59.
One can conclude that the RPPM algorithm cannot provide a guarantee of
the successful rate in reconstructing the attack graph when the routers have
multiple outgoing routes towards the victim.

Figure 5.15: When the routers have more than one victim route, the RPPM algorithm cannot guarantee the correctness of the constructed graph when the confidence level is larger than 0.59. [Plot: successful rate against traceback confidence level; average-case and worst-case executions with a single victim route, and the bottom line.]
5.7.2 Formulating an extra set of extended graphs
To solve the problem, we suggest introducing an extra set of extended graphs.
The new set of extended graphs is defined in Definition 5.2.
Definition 5.2 Let G′(Gi) be the set of extended graphs of the constructed graph Gi = (Vi, Ei) that supports multiple outgoing routes towards the victim:

G′(Gi) = { G′e = (Vi, E′e) | ∃ (u, v) ∉ Ei and u, v ∈ Vi such that E′e = Ei ∪ {(u, v)} },

and all graphs in G′(Gi) must not have any cycles.
According to Definition 5.2, an extended graph in G′(Gi) introduces an extra edge to the constructed graph without an extra node. The edge connects any two existing nodes with two restrictions: (1) no cycles and (2) a multigraph should not be formed. Then, this definition creates a family of extended graphs with routers having multiple victim routes.

Figure 5.16: An illustration of the extended graph with the support of multiple victim routes. [Diagram: a constructed graph with routers R1, R2 and the victim v (top), and the new extended graph with the extra edge from R2 to v (bottom).]
We illustrate the definition of the new set of extended graphs through an example in Figure 5.16. The upper part of the figure shows a constructed graph with two routers, R1 and R2, and the victim v, and the lower part of the figure shows the new extended graph. For this example, there can only be one extra edge (R2, v) according to Definition 5.2.
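Definition 5.2 can be checked mechanically. The sketch below (helper names are our own) enumerates the admissible extra edges for a given constructed graph, rejecting duplicate edges, edges leaving the victim, and edges that would close a cycle:

```python
from itertools import permutations

def extended_graphs(nodes, edges, victim):
    """Enumerate the extra edges defining G'(G_i) in Definition 5.2.

    Each candidate edge (u, v) joins two existing nodes, duplicates no
    existing edge (no multigraph), leaves the victim without outgoing
    edges, and must not close a cycle.  Each extended graph is the
    original edge set plus one returned edge.
    """
    def reachable(src, dst, edge_set):
        # DFS over the directed edges to test whether src can reach dst.
        stack, seen = [src], set()
        while stack:
            n = stack.pop()
            if n == dst:
                return True
            if n in seen:
                continue
            seen.add(n)
            stack.extend(w for (x, w) in edge_set if x == n)
        return False

    extras = []
    for u, v in permutations(nodes, 2):
        if u == victim or (u, v) in edges:
            continue
        # Adding (u, v) closes a cycle exactly when v already reaches u.
        if not reachable(v, u, edges):
            extras.append((u, v))
    return extras
```

For the graph of Figure 5.16, with edges (R1, v) and (R2, R1), the only admissible extra edge is (R2, v), matching the example.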
5.7.3 Reformulation of packet-type probability
As Assumption 3.6 is violated, the calculation of the packet-type probability, which was formulated based on this assumption, is invalidated too. A new formulation of the packet-type probability is required. Here, we repeat some parts of the derivation of the packet-type probability in Section 4.2.1 on Page 101.
Via probability
In the original calculation of the via probability presented in Equation (4.4) on Page 102, the calculation considers only one path from a leaf router Rl to the victim V. In the current scenario, there can be more than one path between these two endpoints. Hence, each path between the endpoints should be considered, and we assume that every path in Path(Rl, V) is equally likely to be chosen by a packet. Therefore, the new via probability is given as follows.
Via Probability for Multiple Victim Routes

Pv((Ri, Rj)) = (1/|L(G)|) × Σ_{Rl ∈ L(G)} Σ_{r ∈ Path(Rl, V)} δ(r, (Ri, Rj)) × (1/|Path(Rl, V)|) .    (5.6)
Conditional encoding probability
Since the conditional encoding probability does not concern how the paths are selected, but only which edge is encoded given the selected path, the conditional encoding probability does not need to change, and it is given by Equation (4.5) on Page 103.
Reformulated packet-type probability
Finally, we have the reformulated packet-type probability as follows.

Packet-type Probability for Multiple Victim Routes

P(T(G) = (Ri, Rj)) = (1/|L(G)|) × Σ_{Rl ∈ L(G)} Σ_{r ∈ Path(Rl, V)} δ(r, (Ri, Rj)) × (1/|Path(Rl, V)|) × pm × (1 − pm)^(d((Ri, Rj), V, r) − 1) .    (5.7)
Packet type(Graph G)
/* The size of the "result" array is equal to the number of edges of the input graph "G". */
1.  result := allocate memory(G.edge) ;
2.  For (i := 0; i < G.edge; i := i + 1)
3.      result[i] := 0 ;
4.  end For
5.  Foreach leaf in G ; do
        /* "search path" finds all the paths from leaf to victim */
6.      path set := search path(leaf, victim) ;
7.      Foreach path in path set ; do
8.          Foreach edge in path ; do
9.              length := edge distance function(edge, victim, path) ;
10.             result[edge] := result[edge] + 1.0/G.leaf num × 1.0/path.num ×
11.                             pm × (1 − pm)^(length−1) ;
12.         end Foreach
13.     end Foreach
14. end Foreach
15. return result ;

Figure 5.17: The pseudocode of the packet-type probability calculation subroutine which supports multiple victim routes.
Reformulated packet-type probability calculation pseudocode
Lastly, the pseudocode for calculating the reformulated packet-type probabilities of a given graph differs from the one given in Figure 4.5 on Page 104.

The new pseudocode is displayed in Figure 5.17. The difference between the new and the old pseudocode is the "for loop" starting from line 7 in Figure 5.17. This loop iterates over every path leading from a leaf router (leaf in line 5) towards the victim.
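A runnable sketch of the subroutine in Figure 5.17 (the function name and the input format, a map from each leaf to its list of edge paths, are our own; the thesis's "search path" step is assumed to have been done by the caller):

```python
def packet_type_probabilities(paths_by_leaf, pm):
    """Probability that a marked packet encodes each edge (Equation 5.7).

    `paths_by_leaf` maps each leaf router to the list of its paths to
    the victim, each path given as an ordered list of edges starting at
    the leaf.  Leaves, and the paths of a leaf, are taken as equally
    likely, matching the assumptions behind Equation (5.7).
    """
    prob = {}
    num_leaves = len(paths_by_leaf)
    for paths in paths_by_leaf.values():
        for path in paths:
            for i, edge in enumerate(path):
                # d(edge, victim, path): hops from the edge to the victim.
                d = len(path) - i
                prob[edge] = prob.get(edge, 0.0) + (
                    (1.0 / num_leaves) * (1.0 / len(paths))
                    * pm * (1 - pm) ** (d - 1)
                )
    return prob
```

For a single path R3 → R2 → R1 → v with pm = 0.5, the edge probabilities come out as 0.125, 0.25, and 0.5 respectively, i.e., Equation (5.7) with one leaf and one path.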
Figure 5.18: With the support for multiple victim routes, the RPPM algorithm can provide the guarantee of the correctness of the constructed graph. [Plot: successful rate against traceback confidence level; average-case and worst-case curves with single and multiple victim routes, and the bottom line.]
5.7.4 Simulation: support for multiple victim routes
Definitions 5.1 and 5.2 together form an expanded set of extended graphs, and the pseudocode in Figure 5.17 provides the way to calculate the packet-type probabilities of network graphs having multiple victim routes. Next, we conduct the simulation again using the expanded set of extended graphs, and the results are shown in Figure 5.18. As the figure shows, the RPPM algorithm can again guarantee the correctness of the constructed graph with the support of multiple victim routes. Technically speaking, the introduction of the extra set of extended graphs increases the value of the termination packet number (TPN). As the TPN increases, the successful rate increases accordingly.
5.7.5 Section summary
In conclusion, we provided support for routers having multiple victim routes. Such support is achieved through an expansion of the set of extended graphs. We performed simulations to contrast the performance of the RPPM algorithm with and without such support.

The drawback of this support is computation. Let n be the number of nodes and m be the number of edges of the constructed graph. Originally, the number of extended graphs is of order O(n). With the mentioned support, the order of the number of extended graphs becomes O(nm). Hence, more time is spent on calculating the TPN at each connected state of the rectified graph reconstruction procedure. This shows the tradeoff in handling routers with multiple victim routes.
5.8 Deployment Issues of the RPPM Algorithm
In previous sections, we discussed the RPPM algorithm in a theoretical sense.
In this section, we discuss the issues in deploying the RPPM algorithm.
5.8.1 Choice of the marking probability
It is not desirable to have a high value of the marking probability. Firstly, a high value of the marking probability means low values for the packet-type probabilities of the majority of packet types. Hence, a large number of marked packets is needed before the RPPM algorithm stops. This also implies a long execution time of the RPPM algorithm.
Let us take a linear network with three routers and one victim (shown in Figure 4.7 on Page 108) as an example to illustrate the relationship between the marking probability and the number of packets required. Figure 5.19 shows the result of a simulation that aims to count the average number
of marked packets required for a correct graph reconstruction with different values of the marking probability. The result shows that, for small values of the marking probability, the number of required packets is small. Nevertheless, the number of required packets increases dramatically for large values of the marking probability.

Figure 5.19: Average number of marked packets required for a correct graph reconstruction against different values of the marking probability. [Plot: average number of marked packets against marking probability; PPM algorithm simulation on a 4-node linear network.]
Apart from the above reason, according to Section 5.6.2 (on Page 144), a high value of the marking probability implies the presence of the worst-case scenario of the RPPM algorithm. Although the worst-case scenario can still guarantee the successful rate, it is more beneficial to set the marking probability to a lower value so as to gain a larger successful rate than expected.

According to the above analysis, one should choose a small value for the marking probability for a faster and more reliable graph reconstruction. However, what happens if a too-small value of the marking probability is chosen? Figure 5.20 illustrates the result.
Figure 5.20 displays two plots: the plot labelled "w/o unmarked packets" is the original plot taken from Figure 5.19, while the remaining plot counts the average number of marked packets plus unmarked packets. One can observe that if a too-small value of the marking probability is chosen, more packets, mainly unmarked packets, are received before the PPM algorithm can construct a correct constructed graph. This implies a longer execution time. In addition, the two plots eventually merge because the number of unmarked packets received by the victim diminishes as the marking probability increases.

Figure 5.20: Average number of total packets (marked packets plus unmarked packets) required for a correct graph reconstruction against different values of the marking probability. [Plot: the "w/ unmarked packets" and "w/o unmarked packets" curves for a 3-router linear network.]
One may be interested in finding the value of the marking probability that minimizes the number of marked packets required. However, the calculation of such a value requires prior knowledge about the attack graph. Therefore, the determination of the best marking probability cannot be accomplished at the current stage of research.
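The tradeoff behind Figures 5.19 and 5.20 can be explored with a small Monte-Carlo sketch (a simplification of the simulator, with illustrative names: each packet independently encodes an edge with its packet-type probability, the remaining probability mass being unmarked packets, and we count packets until every edge has been seen at least once):

```python
import random

def packets_until_complete(edge_probs, rng, count_unmarked=False):
    """Simulate packets until every edge has been seen at least once.

    `edge_probs[e]` is the packet-type probability of edge e; the
    probabilities may sum to less than 1, the remainder corresponding
    to unmarked packets.  Returns the number of marked packets (or of
    all packets, if `count_unmarked` is True) the victim receives.
    """
    edges = list(edge_probs)
    seen, marked, total = set(), 0, 0
    while len(seen) < len(edges):
        total += 1
        u = rng.random()
        acc = 0.0
        for e in edges:
            acc += edge_probs[e]
            if u < acc:          # this packet encodes edge e
                seen.add(e)
                marked += 1
                break
    return total if count_unmarked else marked
```

Averaging this count over many trials for a given marking probability reproduces the qualitative shape of the two curves: small pm inflates the total packet count via unmarked packets, large pm inflates the marked-packet count via the tiny probabilities of the far edges.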
Chapter 5 Rectified Probabilistic Packet Marking Algorithm 161
5.8.2 Execution time comparison between the PPM and the RPPM algorithms
In order to guarantee the correctness of the constructed graph, the RPPM algorithm has to collect extra packets to attain such a guarantee. Technically speaking, before the moment that the constructed graph becomes the same as the attack graph, the number of marked packets collected is the same for both the PPM and RPPM algorithms. After the constructed graph has become the attack graph, the RPPM algorithm has to wait until the number of collected packets is larger than the termination packet number (TPN). In other words, that extra number of packets is the tradeoff in deploying the RPPM algorithm instead of the PPM algorithm.
However, it is difficult to determine a theoretical value or bound for the TPN because the TPN calculation depends on the construction process of the constructed graph. The construction process, in turn, depends on the sequence of arrivals of the marked packets, which is randomized. Alternatively, we conduct an empirical study on the tradeoff of the RPPM algorithm.
We first conduct simulations similar to those presented in Section 5.6; the simulations are executed on random-tree networks with 14, 50, 100, 500, and 1000 routers, with the marking probability set to 0.1. Figures 5.21, 5.22, and 5.23 show the number of marked packets recorded from the simulations at different traceback confidence levels (the plots are separated into three figures because of the different scales of the y-axis). In addition, these simulations are operated under the average-case scenario.
In Figure 5.24, we present the percentage increase in the number of marked packets when one compares the packets collected by the RPPM algorithm to those collected by the PPM algorithm. Note that the average number of marked packets that is just enough to construct the attack graph is obtained by instructing the PPM algorithm to stop when the constructed graph just becomes
Figure 5.21: The number of marked packets recorded for the set of RPPM algorithm simulations carried out on a random-tree network with 14 routers. [Plot: average number of packets measured against traceback confidence level.]
Figure 5.22: The number of marked packets recorded for the set of RPPM algorithm simulations carried out on random-tree networks with 50 routers and 100 routers. [Plot: average number of packets measured against traceback confidence level.]
Figure 5.23: The number of marked packets recorded for the set of RPPM algorithm simulations carried out on random-tree networks with 500 routers and 1000 routers. [Plot: average number of packets measured against traceback confidence level.]
Figure 5.24: The percentage increase in the number of marked packets when comparing the RPPM algorithm to the PPM algorithm at different network scales. [Plot: percentage increase against traceback confidence level for 14, 50, 100, 500, and 1000 routers.]
Number of Routers                                14      50     100     500    1000
Average number of marked packets
  required for the PPM algorithm                 92   1,170   1,550  15,222  40,792
Average time required in 100BaseT
  Ethernet (in seconds)                       0.011   0.140   0.180   1.820   4.910

Table 5.3: The average number of packets and time required to form a correct constructed graph in a 100BaseT Ethernet.
the attack graph. Hence, these plots show the tradeoff of the RPPM algorithm at different network sizes and different traceback confidence levels.
Three main observations can be drawn from Figure 5.24. Firstly, as the traceback confidence level increases, the tradeoff of the RPPM algorithm increases. Secondly, the number of packets collected by the RPPM algorithm is larger than that collected by the PPM algorithm by several times over the lower range of the traceback confidence level (two to five times for confidence levels below 0.8), and the increase reaches 10 times for high values of the traceback confidence level.
Lastly, an interesting observation is that the tradeoffs for small networks are more significant than those for large networks. This can be explained by the probability of forming a disconnected graph. For a large network, such a probability is much higher than that of a small network. When a disconnected graph is formed, the TPN calculation is skipped until the graph becomes connected. Hence, this keeps the value of the TPN small (because of the reduced value of the accumulated state-change probability) during the ending states of the RPPM algorithm.
On the other hand, according to Table 5.3, one can observe that the time for
the PPM algorithm to collect enough packets is in the order of a few seconds
in a 100BaseT Ethernet [1]. Therefore, although the tradeoff of the RPPM algorithm could reach a multiple of 10, such a tradeoff is acceptable.
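The times in Table 5.3 follow directly from the footnoted packet rate of a 100BaseT Ethernet. A minimal sketch (the remaining small differences from the table appear to come from rounding in the thesis):

```python
# Approximate packet rate of 100BaseT Ethernet with 1,500-byte packets:
# 100 Mb/s / (1,500 B x 8 b/B) ~= 8,333 packets per second.
PACKETS_PER_SECOND = 100_000_000 / (1_500 * 8)

def traceback_time_seconds(num_marked_packets):
    """Lower-bound traceback time, assuming the attack traffic saturates
    a 100BaseT link and every packet is 1,500 bytes long."""
    return num_marked_packets / PACKETS_PER_SECOND
```

For example, the 92 packets of the 14-router case take about 0.011 s, and the 15,222 packets of the 500-router case take about 1.83 s.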
5.8.3 Scalability issue in PPM algorithm
Scalability is one of the weaknesses of the PPM algorithm. One can observe that as the path length between the victim and the leaf router becomes longer, it becomes more difficult to collect a complete set of the marked packets. Not only does the path length affect the traceback time, but the size of the attack graph also matters. In Figure 5.25, one can observe that the number of marked packets required to build the constructed graph increases with the size of the graph, and the trend is not subsiding. This shows that the PPM algorithm itself has a scalability problem. As the RPPM algorithm inherits the packet marking procedure from the PPM algorithm, it suffers from the same scalability problem.
As suggested in the previous subsection, for small networks, the traceback process takes only a few seconds to complete. For an ISP network, the total number of routers, including the backbone routers as well as the customer routers, is, on average, in the order of thousands (the minimum number is 131 while the maximum number is 8,751) [79]. Therefore, the PPM algorithm can handle the traceback problem on ISP networks. However, for a network as large as the one provided by [80], which contains the routing map of the whole Internet (with nearly 200,000 routers and more than 600,000 directed links), the PPM algorithm appears to be powerless, and it is believed that a traceback process may take days to finish.
Solving the scalability problem of the PPM algorithm is beyond the scope of this thesis. Rather, we find a suitable application in which the PPM algorithm can be deployed effectively. As the PPM algorithm can perform traceback on an ISP network in a reasonable time, this justifies our choice of the PPM algorithm as the microscopic traceback algorithm.

[1] Under a 100BaseT Ethernet, one can transmit at most 8,333 packets (each with 1,500 bytes) in one second.

Figure 5.25: Scalability analysis: average number of marked packets collected by the PPM algorithm versus the size of the attack graph. [Plot: average number of marked packets against the number of routers, up to 1,000.]
5.8.4 Precision problem
The last deployment issue concerns the precision in the TPN calculation sub-
routine. The worst consequence of this problem is a reduced guarantee on the
correctness of the RPPM algorithm.
Cause of the problem
In the TPN calculation subroutine, the accumulative state-change probability Xi−1 in Equation (5.4) is updated whenever a new edge is added to the constructed graph. Theoretically speaking, Xi−1 > P∗ is always true because

∵ τ∗ ∈ Z+ and 0 < P(T(Ge) = e′) < 1,
∴ 1 − (1 − P(T(Ge) = e′))^τ∗ > P∗/Xi−1  ⇒  Xi−1 > P∗.
Though Xi−1 > P∗ is always true, Xi−1 approaches P∗ after each update: at each update, Xi−1 is multiplied by a value less than one.

Specifically, if P∗ is very close to Xi−1, a precision problem results: there may not be enough precision for the calculation of the expression log(1 − P∗/Xi−1) in Equation (5.4). In the worst case, the expression log(1 − P∗/Xi−1) may evaluate to log(0), which raises a floating-point exception.

The floating-point exception must be avoided, and, to avoid the problem, the TPN calculation subroutine should stop updating the accumulated state-change probability Xi−1.
Denote Xlimit as a real number between 0 and 1. When the difference
between P ∗ and Xi−1 is smaller than Xlimit, the TPN calculation algorithm
stops updating the accumulated state-change probability Xi−1. Then, Equa-
tion (5.5) (on Page 137), which originally updates Xi−1, is changed as follows.
Xi−1 =
    Xi−2 × (1 − (1 − P(T(Gi) = ei))^τ∗i−1),   if i > 1 and |Xi−2 − P∗| > Xlimit;
    Xi−2,                                      if i > 1 and |Xi−2 − P∗| ≤ Xlimit;
    1,                                         if i = 1.                        (5.8)
Nevertheless, Equation (5.8) would lead to the failure of the guarantee on the correctness of the constructed graph, although the equation effectively prevents the ratio P∗/Xi−1 from becoming one.
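The guarded update of Equation (5.8) and the TPN formula can be sketched as follows (a simplified sketch with our own helper names; the caller is assumed to keep X above P∗ so that the logarithm stays defined):

```python
import math

def update_accumulated_probability(x_prev, p_edge, tau, p_star, x_limit):
    """Guarded update following Equation (5.8): skip the multiplicative
    update once X_{i-1} is within x_limit of the confidence level P*,
    so that log(1 - P*/X_{i-1}) in the TPN formula never degenerates
    to log(0)."""
    if abs(x_prev - p_star) <= x_limit:
        return x_prev                          # freeze: no further update
    return x_prev * (1 - (1 - p_edge) ** tau)

def upper_bound_tpn(x, p_star, p_min):
    """Upper-bounded TPN computed from the (possibly frozen) X, i.e.
    floor(log(1 - P*/X) / log(1 - p_min) + 1)."""
    return math.floor(math.log(1 - p_star / x) / math.log(1 - p_min)) + 1
```

With X frozen at 1, P∗ = 0.5 and pmin = 1/7, the helper yields a TPN of 5, the value derived in the example below.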
Reduced correctness
We show why the guarantee is void. Let X∗i−1 be the value of the accumu-
lative state-change probability obtained by Equation (5.5), and let Xi−1 be
the bounded value of the accumulative state-change probability obtained by
Equation (5.8). Originally, the upper-bounded TPN τ∗i is obtained as follows:

τ∗i = ⌊ log(1 − P∗/X∗i−1) / log(1 − pmin) + 1 ⌋ .
Since X∗i−1 ≤ Xi−1, then

∵ log(x) is decreasing when 0 < x < 1,
⇒ log(1 − P∗/X∗i−1) < log(1 − P∗/Xi−1) ;
∵ log(x) is negative when 0 < x < 1,
⇒ log(1 − P∗/X∗i−1) / log(1 − pmin) > log(1 − P∗/Xi−1) / log(1 − pmin) .
Therefore, the TPN obtained from the bounded accumulated state-change probability Xi−1 is smaller than that obtained from the original accumulated state-change probability X∗i−1.
The above finding implies a very undesirable consequence: the RPPM algorithm terminates before it has truly reached the guaranteed correctness.

In the following, we introduce the runtime probability as a tool to understand how the correctness guarantee of the RPPM algorithm becomes void.
Runtime probability and graph reconstruction example
The runtime probability is the probability, computed while the RPPM algorithm is running, that the constructed graph is the same as the attack graph. By definition, the accumulated state-change probability is already the runtime probability; the runtime probability is merely a more meaningful alias.
We take another look at the graph reconstruction example in Section 5.5 (on Page 139) to show how the runtime probability helps in understanding why the RPPM algorithm cannot provide the said guarantee. In the example, we assume that Xlimit = 1, and thus the accumulative state-change probability is never updated when a new edge is added. Still, we assume that the constructed graph is always connected. The marking probability and the traceback confidence level P∗ are both 0.5.
When the first marked packet arrives, it should encode the edge (R1, v) in Figure 4.7. But, this time, X1 is not updated and remains equal to 1. Then, when the edge (R2, R1) is added to the constructed graph, the TPN at state C2 becomes:

τ2 = ⌊ log(1 − 0.5/1) / log(1 − 1/7) + 1 ⌋ = ⌊5.4966⌋ = 5 .
If the number of marked packets arriving at the victim exceeds τ2 before the third edge arrives, the runtime probability is:

(1 − (1 − 1/3)^2) × (1 − (1 − 1/7)^5) = 0.2985 ,
while, according to Section 5.5 (on Page 139), the original runtime probability is:

(1 − (1 − 1/3)^2) × (1 − (1 − 1/7)^15) = 0.5005 .
The above example shows that the runtime probability is 0.2985, which means the RPPM algorithm could guarantee the correctness of the constructed graph only with probability 0.2985, although the required guarantee is 0.5.
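The two runtime probabilities in this example can be checked numerically. In the sketch below (the helper is our own), each pair (p_edge, τ) is one state-change factor 1 − (1 − p_edge)^τ:

```python
def runtime_probability(updates):
    """Product of the state-change factors 1 - (1 - p_edge) ** tau,
    i.e. the accumulated state-change probability alias."""
    prob = 1.0
    for p_edge, tau in updates:
        prob *= 1 - (1 - p_edge) ** tau
    return prob

# With X_limit = 1, the second factor uses the frozen TPN tau_2 = 5:
frozen = runtime_probability([(1 / 3, 2), (1 / 7, 5)])      # ~0.2985
# Without freezing, the original TPN of 15 would be used instead:
original = runtime_probability([(1 / 3, 2), (1 / 7, 15)])   # ~0.5005
```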
Repeated executions
We propose that the RPPM algorithm should be executed more than once in order to restore the promised correctness. The trick of the repeated-execution method is to treat the constructed graph returned from the previous execution instance as the input of the new execution instance.
Intuitively, this method works as follows. Say, at the first execution instance of the RPPM algorithm, the constructed graph is not yet the attack graph. Then, continuing with a second execution instance gives the constructed graph a chance to continue to evolve.
Mathematically, denote Prun,i as the runtime probability of the ith instance
of the RPPM algorithm. Then, the probability that the constructed graph is
Repeated RPPM Algorithm(Traceback Confidence Level P∗)
1. Execute the RPPM algorithm at traceback confidence level P∗ and with an empty constructed graph;
2. Obtain the runtime probability Prun;
3. Obtain the constructed graph Gc;
4. While Prun < P∗ ; do
5.     Execute the RPPM algorithm at traceback confidence level P∗ and with the constructed graph Gc;
6.     Obtain the runtime probability Prun′;
7.     Obtain the constructed graph Gc;
8.     Prun := 1 − ((1 − Prun) × (1 − Prun′));
9. Done

Figure 5.26: The pseudocode of repeating the RPPM algorithm to increase the runtime probability.
correct after n consecutive executions of the RPPM algorithm, P(repeat n times), is given by:

P(repeat n times) = 1 − ∏_{i=1}^{n} (1 − Prun,i) .    (5.9)
Since Prun,i > 0, as n increases, P (repeat n times) also increases. There-
fore, one can keep repeating the execution of the RPPM algorithm until the
guaranteed correctness is reached.
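Equation (5.9) in a minimal sketch (with illustrative numbers: three runs that each reach a runtime probability of 0.3 combine to about 0.657):

```python
def combined_correctness(runtime_probs):
    """Equation (5.9): probability that the constructed graph is
    correct after one RPPM execution per listed runtime probability,
    i.e. one minus the product of the per-run failure probabilities."""
    failure = 1.0
    for p in runtime_probs:
        failure *= 1 - p
    return 1 - failure
```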
Pseudocode of repeated execution
Figure 5.26 shows the pseudocode of repeating the executions of the RPPM algorithm until the probability that the constructed graph is the same as the attack graph is larger than the traceback confidence level P∗.
The pseudocode works as follows. After the RPPM algorithm has been executed for the first time, the runtime probability as well as the constructed graph are obtained. If the runtime probability Prun is less than the traceback confidence level P∗, then the RPPM algorithm is executed repeatedly until Prun is larger than P∗.
To summarize, the RPPM algorithm has a precision problem when it is deployed. We observed that the precision problem causes the RPPM algorithm to fail to guarantee the correctness of the constructed graph. We propose executing the RPPM algorithm repeatedly so as to increase the correctness until the traceback confidence level P∗ is reached.
5.9 Chapter Summary
Based on the termination condition analysis, one can conclude that the ex-
pected sufficient packet number (described and derived in Chapter 4) is not a
desirable termination condition of the PPM algorithm. Yet, there is a need for
the PPM algorithm to have a guarantee of the correctness of the constructed
graph.
In this chapter, we have suggested a new termination condition of the PPM
algorithm. We devised the rectified graph reconstruction procedure that gives
a precise termination condition for the PPM algorithm, and we called the
new traceback approach the rectified probabilistic packet marking algorithm
(RPPM algorithm for short). The RPPM algorithm, on the one hand, does not require any prior knowledge about the network graph and, on the other hand, guarantees that the constructed graph is a correct one with a specified probability, and such a probability is an input parameter of the algorithm.
We have carried out a series of simulations to show the correctness and
robustness of the RPPM algorithm. The simulation results show that the
RPPM algorithm can always satisfy our claim that the constructed graph is
correct with a given probability. Also, the algorithm is robust under different
values of the marking probability and different structures of the attack graphs.
Moreover, we have addressed the issues that arise when the RPPM algorithm is deployed. To conclude, the RPPM algorithm is an effective means of improving the reliability of the original PPM algorithm.
Conclusion and Future Work
In this thesis, we focus on defense mechanisms against the distributed
denial-of-service attack (DDoS attack for short). Specifically, we target the
traceback of the locations of the attackers who are launching a flood-based
DDoS attack. We narrow down our scope and consider only the sources
that are sending out attack traffic as the attackers.
We have proposed a revolutionary, divide-and-conquer traceback method-
ology, and the methodology is twofold. When a global-scale attack happens,
the first step of the traceback process is to locate the Internet service
providers (ISPs for short) that are contributing overwhelming traffic,
through a macroscopic traceback algorithm. Once the problematic ISPs are
uncovered, in the next step, each concerned ISP should locate the attackers
within its administrative domain using a microscopic traceback algorithm.
Such a divide-and-conquer approach has two merits. First, it provides a
fast way to confine the domain of the DDoS attack. Second, if the scale of
the DDoS attack is large, this approach divides the traceback problem into
several sub-problems, and conquers them in a parallel manner. In the thesis,
we first devised a macroscopic traceback algorithm called the distributed
snapshot traceback algorithm. Then, we employed and enhanced the well-known
probabilistic packet marking algorithm, which satisfies the requirements of a
microscopic traceback algorithm.
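The two-step methodology could be skeletonized as follows. All names here are hypothetical stubs, including `macroscopic_traceback`, `microscopic_traceback`, and the traffic threshold; the point is only the divide-and-conquer structure with parallel per-ISP conquering.

```python
from concurrent.futures import ThreadPoolExecutor

def macroscopic_traceback(traffic_stats):
    """Hypothetical macroscopic step: pick the ISPs whose border
    routers forward an overwhelming share of the attack traffic."""
    return [isp for isp, volume in traffic_stats.items() if volume > 1000]

def microscopic_traceback(isp):
    """Hypothetical microscopic step: within one ISP's administrative
    domain, locate the attacker sources (stubbed here)."""
    return {f"{isp}:attacker"}

def divide_and_conquer(traffic_stats):
    suspect_isps = macroscopic_traceback(traffic_stats)
    if not suspect_isps:
        return set()
    # Conquer the sub-problems in parallel, one per suspect ISP.
    with ThreadPoolExecutor() as pool:
        results = pool.map(microscopic_traceback, suspect_isps)
    return set().union(*results)

attackers = divide_and_conquer({"isp-a": 5000, "isp-b": 10, "isp-c": 2000})
print(sorted(attackers))  # ['isp-a:attacker', 'isp-c:attacker']
```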
The distributed snapshot traceback algorithm (snapshot algorithm for short)
is the first traceback algorithm of its kind. Leveraging the well-known Chandy-
Lamport distributed snapshot algorithm, the snapshot algorithm coordinates
the border routers of the ISPs, and collects statistics from the border routers
in a distributed manner. Given the collected data, the victim can determine the
ISPs that contain the possible locations of the attackers. The proof has justified
the correctness of the algorithm, and the simulation results have demonstrated
the robustness of the algorithm.
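As a rough illustration of the coordination idea only (not the thesis's actual protocol), the following sketch shows a simplified Chandy-Lamport-style marker flood in which each border router records its local traffic counter on first receipt of the marker and relays the marker onward; channel-state recording is omitted.

```python
from collections import deque

def distributed_snapshot(graph, counters, initiator):
    """Simplified Chandy-Lamport-style snapshot: on the first receipt
    of a marker, a router records its local traffic counter and relays
    the marker to its neighbours; duplicate markers are ignored."""
    recorded = {}
    queue = deque([initiator])
    while queue:
        router = queue.popleft()
        if router in recorded:
            continue                          # marker already seen
        recorded[router] = counters[router]   # record local state
        queue.extend(graph[router])           # relay the marker
    return recorded

# Toy topology: victim "v" initiates; r1 and r2 are border routers.
graph = {"v": ["r1", "r2"], "r1": ["r2"], "r2": []}
counters = {"v": 0, "r1": 120, "r2": 3}
snap = distributed_snapshot(graph, counters, "v")
print(snap == counters)  # True: every reachable router is recorded once
```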
The probabilistic packet marking algorithm (PPM algorithm for short) is a
prized traceback approach in terms of simplicity and effectiveness, and it is one
of the best candidates for a microscopic traceback algorithm. Although it
is a renowned traceback algorithm, the termination condition of the PPM
algorithm is seldom studied in the literature. Our findings have shown that
the well-accepted termination condition of the PPM algorithm is not correct
in general cases. Worse, the defective termination condition can lead to
incorrect traceback results.
Knowing that the traditional termination condition is defective, we
provided a discrete-time Markov chain model that corrects the faults in the
calculation of the traditional termination condition of the PPM algorithm.
However, the effort spent on correcting such a calculation is in vain: to
calculate the traditional termination condition precisely, one has to know
the paths taken by the attack traffic in advance, yet these paths are precisely
the results that the PPM algorithm aims to produce. This contradiction led
us to abandon the traditional termination condition of the PPM algorithm.
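To see why the traditional condition needs the attack paths in advance, consider the coupon-collector-style bound from Savage et al.'s analysis: the expected number of marked packets needed to rebuild a path of length d under marking probability p is roughly ln(d)/(p(1-p)^(d-1)), which cannot be evaluated without knowing d. A quick numerical sketch (the values of p and d are illustrative only):

```python
import math

def expected_packets_bound(p, d):
    """Coupon-collector-style bound (after Savage et al.) on the
    expected number of marked packets needed to reconstruct an
    attack path of length d with marking probability p."""
    return math.log(d) / (p * (1 - p) ** (d - 1))

# The bound depends on the path length d -- exactly the quantity
# that the PPM algorithm is trying to discover.
for d in (10, 20, 30):
    print(d, round(expected_packets_bound(0.04, d)))
```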
Instead, we introduced a new termination condition for the PPM
algorithm, and based on the new termination condition, we devised the rectified
probabilistic packet marking algorithm (RPPM algorithm for short). The most
significant contribution of the RPPM algorithm is that, upon termination,
the algorithm guarantees the traceback result with a specified level
of confidence. Our findings showed that the RPPM algorithm can provide such
a guarantee under different deployment scenarios. In conclusion, the RPPM
algorithm provides an autonomous way for the original PPM algorithm to
determine its termination, and it is a promising means to enhance the reliability
of the PPM algorithm.
Though the proposed solutions in the thesis are self-contained, there is
room for future research. Both the distributed snapshot traceback algorithm
and the RPPM (PPM) algorithm are prone to attacks caused by compromised
border routers or forged request (or marker) packets. A tailor-made
authentication protocol could be designed to resist such attacks in a
best-effort manner.
For the RPPM algorithm, the scalability problem is worth noting. As
mentioned in Section 5.8.3, when the scale of the real attack graph increases,
the number of marked packets required to obtain a correct constructed
graph also increases. This prohibits the PPM algorithm from being deployed
on a worldwide scale. One possible research direction is to devise a
methodology to adaptively change the marking probability, which can
minimize the number of packets required. Though it is believed that this
approach would work mathematically, the protocol for autonomously changing
the marking probability may be difficult to formulate, and further research
effort is required.
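One way such an adaptive scheme might work: the probability that a packet arrives carrying the mark of a router d hops away is p(1-p)^(d-1), which is maximized at p = 1/d, so a router able to estimate d could tune p accordingly. The brute-force sketch below merely verifies that optimum numerically; it is illustrative only, not a protocol from the thesis.

```python
def arrival_probability(p, d):
    """Probability that a packet arrives carrying the mark of a
    router d hops away, under marking probability p."""
    return p * (1 - p) ** (d - 1)

def best_marking_probability(d, grid=10000):
    """Find, by brute-force grid search, the marking probability p
    that maximizes the farthest router's mark-arrival probability.
    Analytically the maximum sits at p = 1/d."""
    candidates = (i / grid for i in range(1, grid))
    return max(candidates, key=lambda p: arrival_probability(p, d))

d = 25
p_opt = best_marking_probability(d)
print(abs(p_opt - 1 / d) < 1e-3)  # True: the search recovers p = 1/d
```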
Bibliography
[1] F. Lau, S. H. Rubin, M. H. Smith, and L. Trajkovic, “Distributed Denial
of Service Attacks,” in IEEE International Conference on Systems, Man,
and Cybernetics, pp. 2275–2280, 2000.
[2] “Computer Emergency Response Team, CERT Advisory CA-2000-
01: Denial-of-Service Developments, http://www.cert.org/advisories/CA-
2000-01.html.”
[3] “Computer Emergency Response Team, CERT Advisory CA-1996-21:
TCP SYN Flooding and IP Spoofing Attacks, http://www.cert.org/-
advisories/CA-1996-21.html.”
[4] “DARPA Internet Program. RFC 793: Transmission Control Protocol,”
Sept. 1981.
[5] J. Lemon, “Resisting SYN Flood DoS Attacks with a SYN Cache,” in
Proceedings of BSDCON 2002, pp. 89–98, 2002.
[6] A. Kuzmanovic and E. W. Knightly, “Low-rate TCP-Targeted Denial of
Service Attacks: the Shrew vs. the Mice and Elephants,” in Proceedings
of ACM SIGCOMM 2003, pp. 75–86, 2003.
[7] H. Sun, J. C. S. Lui, and D. K. Y. Yau, “Distributed Mechanism in
Detecting and Defending Against the Low-rate TCP Attack,” Computer
Networks Journal, vol. 50, Sep 2006.
[8] H. Sun, J. C. S. Lui, and D. K. Y. Yau, “Defending Against Low-rate
TCP Attack: Dynamic Detection and Protection,” in IEEE International
Conference on Network Protocols (ICNP), Berlin, Germany, 2004.
[9] A. Shevtekar, K. Anantharam, and N. Ansari, “Low Rate TCP Denial-
of-Service Attack Detection at Edge Routers,” IEEE Communications
Letters, vol. 9, pp. 262–265, April 2005.
[10] J. Elliott, “Distributed Denial of Service Attacks and the Zombie Ant
Effect,” IT Professional, vol. 2, no. 2, pp. 55–57, 2000.
[11] R. Chang, “Defending against Flooding-based Distributed Denial-of-
Service Attacks: a Tutorial,” IEEE Communications Magazine, vol. 40,
no. 10, pp. 42–51, 2002.
[12] S. Dietrich, N. Long, and D. Dittrich, “Analyzing Distributed Denial of
Service Tools: The Shaft Case,” in Proceedings of the 14th System Ad-
ministration Conference, pp. 329–339, 2000.
[13] W. Lee and S. J. Stolfo, “A Framework for Constructing Features and
Models for Intrusion Detection Systems,” ACM Transactions on Infor-
mation and System Security (TISSEC), vol. 3, no. 4, pp. 227–261, 2000.
[14] J. Beale, Snort 2.1 Intrusion Detection, Second Edition. Syngress, 2 ed.,
May 2004.
[15] V. Paxson, “An Analysis of Using Reflectors for Distributed Denial-of-
Service Attacks,” ACM SIGCOMM Computer Communication Review,
vol. 31, no. 3, pp. 38 – 47, 2001.
[16] N. Naoumov and K. Ross, “Exploiting P2P Systems for DDoS Attacks,”
in Proceedings of the 1st International Conference on Scalable Information
Systems. Article Number 47, 2006.
[17] J. Barlow and W. Thrower, “TFN2K - An Analysis. The AXENT Secu-
rity Team. http://www.symantec.com/avcenter/security/Content/2000-
02 10 a.html,” 2000.
[18] D. Dittrich, “The DoS Project’s “Trinoo” Distributed Denial of Service
Attack Tool. http://staff.washington.edu/dittrich/misc/trinoo.analysis,”
1999.
[19] D. Dittrich, “The “Stacheldraht” Distributed Denial of Service Attack
Tool. http://staff.washington.edu/dittrich/misc/stacheldraht.analysis,”
1999.
[20] C. C. Zou, W. Gong, and D. Towsley, “Code Red Worm Propagation
Modeling and Analysis,” in Proceedings of the 9th ACM Conference on
Computer and Communications Security, pp. 138–147, 2002.
[21] “Netcraft: Web Server Survey Archive. http://www.netcraft.com.”
[22] D. Moore, V. Paxson, S. Savage, C. Shannon, S. Staniford, and N. Weaver,
“Inside the slammer worm,” IEEE Security and Privacy, vol. 1, no. 4,
pp. 33–39, 2003.
[23] S. Adler, “The Slashdot Effect: an Analysis of Three Internet Publica-
tions. http://ssadler.phy.bnl.gov/adler/SDE/SlashDotEffect.html,” 1999.
[24] R. Mahajan, S. M. Bellovin, S. Floyd, J. Ioannidis, V. Paxson, and
S. Shenker, “Controlling High Bandwidth Aggregates in the Network,”
ACM SIGCOMM Computer Communication Review, vol. 32, pp. 62–73,
Jul 2002.
[25] X. Chen and J. Heidemann, “Flash Crowd Mitigation via Adaptive Ad-
mission Control based on Application-Level Observations,” ACM Trans-
actions on Internet Technology (TOIT), vol. 5, no. 3, pp. 532–569, 2005.
[26] J. Mirkovic and P. Reiher, “A Taxonomy of DDoS Attack and DDoS De-
fense Mechanisms,” ACM SIGCOMM Computer Communication Review,
vol. 34, no. 2, pp. 39 – 53, 2004.
[27] A. Hussain, J. Heidemann, and C. Papadopoulos, “A Framework for Clas-
sifying Denial of Service Attacks,” in Proceedings of the 2003 conference
on Applications, Technologies, Architectures, and Protocols for Computer
Communications (SIGCOMM), pp. 99 – 110, 2003.
[28] D. Barry, “Proactive Protection: New techniques and best practices help
service providers counter increase in cyber attacks,” Packet: Cisco Sys-
tems Users Magazine, vol. 16, no. 1, pp. 64–68, 2004.
[29] T. Y. Wong, K. T. Law, J. C. S. Lui, and M. H. Wong, “An Efficient Dis-
tributed Algorithm to Identify and Traceback DDoS Traffic,” The Com-
puter Journal, vol. 49, no. 4, pp. 418–442, 2006.
[30] S. Savage, D. Wetherall, A. Karlin, and T. Anderson, “Practical Network
Support for IP Traceback,” in Proceedings of the 2000 ACM SIGCOMM
Conference, pp. 295–306, 2000.
[31] T. Y. Wong, J. C. S. Lui, and M. H. Wong, “Markov Chain Modeling
of the Probabilistic Packet Marking Algorithm,” International Journal of
Network Security, vol. 5, no. 1, pp. 32–40, 2007.
[32] T. Y. Wong, M. H. Wong, and J. C. S. Lui, “A Precise Termination
Condition of the Probabilistic Packet Marking Algorithm,” Accepted by
IEEE Transactions on Dependable and Secure Computing, August 2007.
[33] E. Dijkstra and C. Scholten, “Termination Detection for Diffusing Com-
putations,” Information Processing Letters, vol. 11, pp. 1–4, Aug 1980.
[34] K. Chandy and L. Lamport, “Distributed Snapshots: Determining Global
States of Distributed Systems,” ACM Transactions on Computer Systems,
vol. 3, pp. 63–75, Feb 1985.
[35] L. Lamport, “Time, Clocks, and the Ordering of Events in a Distributed
System,” Communications of the ACM, vol. 21, pp. 558–565, Jul 1978.
[36] M. J. Fischer, N. D. Griffeth, and N. A. Lynch, “Global States of a Dis-
tributed System,” IEEE Transactions on Software Engineering, vol. 8,
no. 3, pp. 198–202, 1982.
[37] R. Koo and S. Toueg, “Checkpointing and Rollback-Recovery for Dis-
tributed Systems,” IEEE Transactions on Software Engineering, vol. 13,
no. 1, pp. 23–31, 1987.
[38] E. N. Elnozahy, L. Alvisi, Y. M. Wang, and D. B. Johnson, “A Survey of
Rollback-Recovery Protocols in Message-Passing Systems,” ACM Com-
puting Surveys, vol. 34, no. 3, pp. 375–408, 2002.
[39] P. A. Bernstein, V. Hadzilacos, and N. Goodman, Concurrency Control
and Recovery in Database Systems. Addison-Wesley, 1987.
[40] S. Bellovin, “Security Problems in the TCP/IP Protocol Suite,” ACM
Computer Communications Review, vol. 19, no. 2, pp. 32 – 48, 1989.
[41] P. Ferguson and D. Senie, “RFC 2267: Network Ingress Filtering: Defeat-
ing Denial of Service Attacks which Employ IP Source Address Spoofing,”
The Internet Society, January 1998.
[42] “Egress filtering v 0.2, global incident analysis center. http://-
www.sans.org/y2k/egress.htm.”
[43] K. Park and H. Lee., “On the Effectiveness of Route-Based Packet Filter-
ing for Distributed DoS Attack Prevention in Power-Law Internets,” in
Proceedings of ACM SIGCOMM 2001, pp. 15 – 26, 2001.
[44] D. K. Y. Yau, J. C. S. Lui, F. Liang, and Y. Yam, “Defending Against
Distributed Denial-of-service Attacks with Max-min Fair Server-centric
Router Throttles,” IEEE/ACM Transactions on Networking, vol. 13,
no. 1, pp. 29–42, 2005.
[45] S. Chen and Q. Song, “Perimeter-Based Defense against High Bandwidth
DDoS Attacks,” IEEE Transactions on Parallel and Distributed Systems,
vol. 16, no. 6, pp. 526–537, 2005.
[46] J. Xu and W. Lee, “Sustaining Availability of Web Services under Dis-
tributed Denial of Service Attacks,” IEEE Transactions on Computers,
vol. 52, no. 2, 2003.
[47] K. T. Law, J. C. S. Lui, and D. K. Y. Yau, “You Can Run, But You Can’t
Hide: An Effective Methodology to Traceback DDoS Attackers,” IEEE
Transactions on Parallel and Distributed Systems, vol. 15, no. 9, pp. 799
– 813, 2005.
[48] D. X. Song and A. Perrig, “Advanced and Authenticated Marking
Schemes for IP Traceback,” in Proceedings of IEEE INFOCOM ’01,
pp. 878–886, April 2001.
[49] K. Park and H. Lee., “On the Effectiveness of Probabilistic Packet Mark-
ing for IP Traceback under Denial of Service Attack,” in Proceedings of
IEEE INFOCOM ’01, pp. 338 – 347, 2001.
[50] D. Dean, M. Franklin, and A. Stubblefield, “An Algebraic Approach to
IP Traceback,” in Proceedings of Network and Distributed System Security
Symposium, NDSS ’01, February 2001.
[51] D. Dean, M. Franklin, and A. Stubblefield, “An Algebraic Approach to
IP Traceback,” ACM Transactions on Information and System Security
(TISSEC), vol. 5, no. 2, pp. 119–137, 2002.
[52] A. C. Snoeren, C. Partridge, L. A. Sanchez, C. E. Jones, F. Tchakountio,
S. T. Kent, and W. T. Strayer, “Hash-Based IP Traceback,” in Proceedings
of the ACM SIGCOMM 2001 Conference on Applications, Technologies,
Architectures, and Protocols for Computer Communication, pp. 3–14, Au-
gust 2001.
[53] M. Adler, “Trade-Offs in Probabilistic Packet Marking for IP Traceback,”
Journal of the ACM, vol. 52, pp. 217–244, March 2005.
[54] M. Sung and J. Xu, “IP Traceback-based Intelligent Packet Filtering: A
Novel Technique for Defending Against Internet DDoS Attacks,” IEEE
Transactions on Parallel and Distributed Systems, vol. 14, no. 9, pp. 861–
872, 2003.
[55] A. Belenky and N. Ansari, “IP Traceback with Deterministic Packet
Marking,” IEEE Communications Letters, vol. 7, pp. 162–164, April 2003.
[56] A. Belenky and N. Ansari, “On IP Traceback,” IEEE Communications
Magazine, vol. 41, pp. 142–153, July 2003.
[57] Z. Gao and N. Ansari, “Tracing Cyber Attacks from the Practical Per-
spective,” IEEE Communications Magazine, vol. 43, pp. 123–131, May
2005.
[58] D. E. Taylor, J. W. Lockwood, T. S. Sproull, J. S. Turner, and D. B.
Parlour, “Scalable IP Lookup for Programmable Routers,” in Proceedings
of IEEE Infocom, 2002.
[59] L. L. Peterson, S. Karlin, and K. Li, “OS Support for General-Purpose
Routers,” in Workshop on Hot Topics in Operating Systems, pp. 38–43,
1999.
[60] X. Qie, A. Bavier, L. Peterson, and S. Karlin, “Scheduling Computa-
tions on a Software-Based Router,” in Proceedings of ACM SIGMET-
RICS, June 2001.
[61] D. K. Y. Yau and X. Chen, “Resource Management in Software-
Programmable Router Operating Systems,” IEEE Journal on Selected
Areas in Communications (JSAC), vol. 19, March 2001.
[62] “Internet Mapping Project, http://research.lumeta.com/ches/map/-
index.html,” 1999.
[63] S. M. Bellovin, M. Leech, and T. Taylor, ICMP Traceback Messages,
Internet Draft: draft-bellovin-itrace-04.txt, Feb 2003.
[64] “Cooperative Association for Internet Data Analysis, http://-
www.caida.org/.”
[65] S. F. Wu, L. Zhang, D. Massey, and A. Mankin, Intention-Driven ICMP
Trace-Back, Internet Draft: draft-wu-itrace-intention-00.txt. submission
date Feb. 2001, expiration date Aug. 2001.
[66] A. Mankin, D. Massey, C.-L. Wu, S. F. Wu, and L. Zhang, “On Design
and Evaluation of Intention-Driven ICMP Traceback,” in Proceedings of
IEEE Int. Conference on Computer Communications and Networks, 2001.
[67] B. C. Chan, J. C. Lau, and J. C. Lui, “OPERA: An Open-source Extensi-
ble Router Architecture for Adding New Network Services and Protocols,”
Journal of Systems and Software, vol. 78, no. 1, pp. 24–36, 2005.
[68] “The Netfilter/iptables Project. http://www.netfilter.org.”
[69] J. Harris and A. J. Melara, “Performance analysis of the linux fire-
wall in a host,” in CiNIC - Calpoly intelligent NIC Project, http://-
www.ee.calpoly.edu/3comproject/, 2002.
[70] T. Peng, C. Leckie, and K. Ramamohanarao, “Adjusted Probabilistic
Packet Marking for IP Traceback,” in NETWORKING 2002, pp. 697–708,
2002.
[71] C. Hedrick, “RFC 1058: Routing Information Protocol,” The Internet
Society, June 1988.
[72] J. Moy, “RFC 2328: Open Shortest Path First (OSPF) Version 2,” The
Internet Society, April 1998.
[73] H. von Schelling, “Coupon Collecting for Unequal Probabilities,” Amer-
ican Mathematical Monthly, vol. 61, pp. 306–311, 1954.
[74] P. J. Courtois, Decomposability: Queueing and computer system applica-
tions. Academic Press, 1977.
[75] L. Golubchik and J. C. Lui, “Bounding of Performance Measures for
Threshold-based Queueing Systems: Theory and Application to Dynamic
Resource Management in Video-on-Demand Servers,” IEEE Transactions
of Computers, vol. 51, pp. 353–372, Apr. 2002.
[76] K. S. Trivedi, Probability and Statistics with Reliability, Queuing and
Computer Science Applications. Wiley-Interscience, 2002.
[77] U. Bhat, Elements of Applied Stochastic Processes. New York: Wiley,
1984.
[78] V. Paxson, “End-to-end Routing Behavior in the Internet,” IEEE/ACM
Transactions on Networking, vol. 5, pp. 601–615, Oct. 1997.
[79] N. Spring, R. Mahajan, D. Wetherall, and T. Anderson, “Measuring ISP
Topologies with Rocketfuel,” IEEE/ACM Transactions on Networking,
vol. 12, no. 1, pp. 2–16, 2004.
[80] Cooperative Association for Internet Data Analysis, CAIDA, “CAIDA’s
Router-Level Topology Measurements, http://www.caida.org/tools/-
measurement/skitter/router topology/.”