On Tracing Attackers of Distributed Denial-of-Service Attack through
Distributed Approaches
WONG, Tsz Yeung
A Thesis Submitted in Partial Fulfillment
of the Requirements for the Degree of
Doctor of Philosophy
in
Computer Science and Engineering
© The Chinese University of Hong Kong
September 2007
The Chinese University of Hong Kong holds the copyright of this thesis. Any
person(s) intending to use a part or the whole of the materials in the thesis
in a proposed publication must seek copyright release from the Dean of the
Graduate School.
Thesis/Assessment Committee
Professor YOUNG Fung Yu (Chair)
Professor WONG Man Hon (Thesis Supervisor)
Professor LEE Moon Chuen (Committee Member)
Professor LEONG Hong Va (External Examiner)
Abstract
The denial-of-service attack has been a pressing problem in recent years, and denial-of-service defense research has blossomed into one of the main streams of network security. Techniques such as the pushback message, ICMP traceback, and packet filtering are remarkable results from this active field of research.
The focus of this thesis is to study and devise efficient and practical algorithms to tackle flood-based distributed denial-of-service attacks (flood-based DDoS attacks for short), and we aim to trace every location of the attackers. In this thesis, we propose a novel, divide-and-conquer traceback methodology. Tracing back the attackers on a global scale is always a difficult and tedious task. Instead, we suggest that one should first identify the Internet service providers (ISPs) that contribute to the flood-based DDoS attack by using a macroscopic traceback approach. After the concerned ISPs have been found, the traceback problem is narrowed down, and the attackers can be located by using a microscopic traceback approach.
For the macroscopic traceback problem, we propose an algorithm that leverages the well-known Chandy-Lamport distributed snapshot algorithm so that a set of border routers of the ISPs can correctly gather statistics in a coordinated fashion. The victim site can then deduce the local traffic intensities of all the participating routers. Given the collected statistics, we provide a method for the victim site to locate the attackers who sent out the dominating flows of packets. Our findings show that the proposed methodology can pinpoint the locations of the attackers in a short period of time.
In the second part of the thesis, we study a well-known technique for the microscopic traceback problem. The probabilistic packet marking (PPM for short) algorithm by Savage et al. has attracted the most attention for contributing the idea of IP traceback. The most interesting point of this approach is that it allows routers to encode certain information in the attack packets with a predetermined probability. Upon receiving a sufficient number of marked packets, the victim (or a data collection node) can construct the set of paths the attack packets traversed (i.e., the attack graph), and hence the victim can obtain the locations of the attackers. In this thesis, we present a discrete-time Markov chain model that calculates the precise number of marked packets required to construct the attack graph.
Though the PPM algorithm is a desirable approach to the microscopic traceback problem, it is not perfect: its termination condition is not well defined in the literature. More importantly, without a proper termination condition, the traceback results could be wrong. In this thesis, we provide a precise termination condition for the PPM algorithm. Based on this condition, we devise a new algorithm named the rectified probabilistic packet marking algorithm (RPPM algorithm for short). The most significant merit of the RPPM algorithm is that, when it terminates, the constructed attack graph is guaranteed to be correct with a specified level of confidence. Our findings show that the RPPM algorithm guarantees the correctness of the constructed attack graph under different probabilities that the routers mark the attack packets and under different structures of the network graphs. The RPPM algorithm provides an autonomous way for the original PPM algorithm to determine its termination, and it is a promising means of enhancing the reliability of the PPM algorithm.
Abstract (in Chinese)

In recent years, the distributed denial-of-service attack has become a pressing problem, and research on defending against it has become a major topic in network security. This active field of research has produced many remarkable results, such as the pushback message technique, the ICMP traceback technique, and packet filtering techniques.

This thesis studies defenses against flood-based denial-of-service attacks ("flood attacks" for short) and designs practical, efficient algorithms against them. The main direction of this thesis is to find the launching points of flood attacks. We propose a novel, divide-and-conquer traceback technique for tracing where a flood attack is launched. Since flood attacks are often global in scale, tracing their launching points is often difficult and tedious. We therefore propose a two-step traceback scheme. In the first step, all Internet service providers cooperate to find out which providers contain the launching points of the flood attack; we call this step the macroscopic traceback approach. In the next step, once those providers have been identified, the concerned providers adopt the microscopic traceback approach to trace all the launching points of the flood attack inside their networks.

This thesis proposes a macroscopic traceback algorithm. The algorithm is built upon the well-known Chandy-Lamport distributed snapshot algorithm to perform the traceback in a distributed fashion, and we name it the snapshot traceback algorithm. The snapshot traceback algorithm runs on the border routers of the providers; following the algorithm, these routers cooperatively collect statistics and send them to the victim site of the flood attack. From the routers' statistics, the victim site can rank the attack traffic emitted by each provider and thereby identify the likely launching points of the attack. Our findings show that the snapshot traceback algorithm is efficient and can locate the launching points in a short time.

This thesis then investigates the microscopic traceback algorithm. The probabilistic packet marking algorithm (PPM algorithm for short) is a well-known IP traceback algorithm and is suitable as a microscopic traceback algorithm. A notable feature of the PPM algorithm is that, according to a predetermined probability called the marking probability, the routers inside a provider's network selectively encode information into packets. Once the victim site of the flood attack has received enough marked packets, the PPM algorithm can compute the paths traversed by the attack packets and hence locate the launching points of the flood attack. In this thesis, we derive a Markov chain model that lets the victim site calculate precisely the number of marked packets needed to reconstruct the attack paths correctly.

Although the PPM algorithm is an excellent microscopic traceback algorithm, it is unfortunately not a perfect one, because no existing work gives a precise definition of its termination condition. More importantly, if the termination condition of the PPM algorithm is wrong, its traceback result (i.e., the paths traversed by the attack packets) will be wrong. This thesis derives a precise termination condition for the PPM algorithm. Since the new termination condition changes the algorithm, we name the new algorithm the rectified probabilistic packet marking algorithm (RPPM algorithm for short). The most important merit of the RPPM algorithm is that it guarantees its traceback result is correct above a specified level of confidence. Our findings show that, under different marking probabilities and network structures, the RPPM algorithm guarantees that the traceback result meets the specified confidence level. In summary, the merit of the RPPM algorithm is that it brings an automated termination condition to the PPM algorithm, thereby improving the reliability of the PPM algorithm.
Acknowledgement
In completing this thesis, I am most grateful to my thesis advisor, Dr. Man-
hon Wong, and my former thesis advisor, Dr. John Chi-shing Lui, who have
been giving continuous support and guidance to me throughout the past five
years.
I am also glad to have my colleagues in the Department of Computer Science and Engineering, especially Mr. C. M. Lee, Mr. Ray Lam, Mr. Y. T. Ma, Mr. T. B. Ma, Mr. Y. K. Liu, Mr. Y. K. Hui, Ms. Catherine Zhou, and Dr. L. C. Lau. They have given me invaluable advice and support throughout my years of research life.
Last but not least, I am most glad to have Ms. Elaine Chan who has been
giving me unconditional love and the strength to get through the difficulties I
encountered.
Contents
1 Defense Against Denial-of-Service Attack
1.1 Overview of Attack Methodology
1.1.1 Vulnerability-based attack
1.1.2 Flood-based attack
1.1.3 Worm attack
1.1.4 Flash crowd
1.2 Scope of the Thesis
1.2.1 General assumptions
1.2.2 A divide-and-conquer traceback approach
1.3 Structure and Contribution of the Thesis
1.4 Related Work
1.4.1 Distributed Snapshot Algorithm
1.4.2 DDoS Defense Mechanisms

2 Distributed Snapshot Traceback Algorithm
2.1 Overview and Problem Definition
2.1.1 Overview
2.1.2 Problem definition
2.1.3 Traceback methodology
2.1.4 How to perform the traceback
2.1.5 Difficulties of a distributed traffic measurement
2.2 Distributed Algorithm
2.2.1 Reasons for incorrect traceback result
2.2.2 Measuring the correct local traffic
2.2.3 The distributed snapshot algorithm
2.2.4 Pseudocode and execution of snapshot algorithm
2.2.5 Example in calculating the traceback result
2.3 Interpreting the Traceback Result
2.3.1 Investigation of the traffic inequality
2.3.2 Calculating bounds for the number of packets arrived at the victim site
2.4 Performance Evaluations
2.5 Implementation Issues
2.5.1 Topology construction
2.5.2 System overhead
2.5.3 Implementation issue based on ICMP traceback
2.5.4 An alternative to aggregate congestion control and pushback
2.5.5 Special deployment - acyclic network
2.5.6 Partial deployment
2.6 Chapter Summary

3 Probabilistic Packet Marking Algorithm
3.1 Structure of This Chapter
3.2 Goal and Structure of the PPM Algorithm
3.2.1 Global network and attack graph
3.2.2 Constructed graph
3.2.3 Structure of the PPM algorithm
3.3 Assumptions
3.3.1 Marked packets and PPM markings
3.3.2 Router
3.3.3 Packet marking probability
3.3.4 Attack source and attack pattern
3.3.5 Attack graph and packet routing
3.4 Graph Reconstruction Example
3.4.1 Packet marking
3.4.2 Attack graph reconstruction
3.5 Chapter Summary

4 Termination Condition of PPM Algorithm
4.1 Using the Upper-Bound Packet Number as the Termination Condition
4.1.1 Failure under the multiple-attacker environment
4.1.2 Simulation findings
4.1.3 Chapter structure
4.2 Packet-Type Model
4.2.1 Packet-type probability
4.2.2 Pseudocode of the calculation of the packet-type probabilities
4.2.3 Illustration of the calculation of the packet-type probability
4.3 Using Markov Chain Model to Find the Sufficient Packet Number
4.3.1 The Markov process
4.3.2 Example on discrete-time Markov chain modeling
4.3.3 Fundamental matrix
4.3.4 Example on calculating E[X]
4.4 Disproving the Upper-Bound Packet Number as the Termination Condition
4.5 Chapter Summary

5 Rectified Probabilistic Packet Marking Algorithm
5.1 Structure of This Chapter
5.2 Overview of the RPPM Algorithm
5.2.1 Working principle
5.2.2 Flow of rectified graph reconstruction procedure
5.3 Execution Diagram of the RPPM Algorithm
5.3.1 Types of states
5.3.2 Types of transitions
5.3.3 Worst-case, average-case, and best-case scenarios
5.3.4 Role of the execution diagram
5.4 Derivation of Termination Packet Number
5.4.1 Technique
5.4.2 State-change probability
5.4.3 TPN derivation
5.4.4 Section summary and TPN calculation subroutine
5.5 Graph Reconstruction Example
5.5.1 State C1
5.5.2 State C2
5.5.3 State C3
5.6 Simulation Result
5.6.1 Simulation environment
5.6.2 Simulation: different values of the marking probability
5.6.3 Simulation: different graph structures
5.6.4 Section summary
5.7 Supporting Routers with Multiple Victim Routes
5.7.1 Problem of multiple victim routes
5.7.2 Formulating an extra set of extended graphs
5.7.3 Reformulation of packet-type probability
5.7.4 Simulation: support for multiple victim routes
5.7.5 Section summary
5.8 Deployment Issues of the RPPM Algorithm
5.8.1 Choice of the marking probability
5.8.2 Execution time comparison between the PPM and the RPPM algorithms
5.8.3 Scalability issue in PPM algorithm
5.8.4 Precision problem
5.9 Chapter Summary

Bibliography
List of Figures
1.1 The architecture of a typical flood-based DDoS attack.
1.2 The architecture of a reflector attack.
1.3 The overview of the divide-and-conquer traceback approach.
2.1 An example network topology.
2.2 Asynchronous reading of outgoing traffic counters in Example B.
2.3 Correct accumulative local traffic without clock synchronization.
2.4 An example execution of the snapshot algorithm.
2.5 Bji = 0 under all circumstances.
2.6 Aji is the channel state.
2.7 A network topology with two attackers who reside in the local domains of R3 and R4.
2.8 A timing diagram that shows the progress of the distributed snapshot traceback algorithm.
2.9 Classification of pre-monitoring, monitoring, and post-monitoring packets.
2.10 The channel state of Link3,2 contains pre-monitoring (monitoring) packets from both R3 and R4 in the first (second) instance of the snapshot algorithm.
2.11 (a) Network topology and (b) legend for Simulations A and B.
2.12 Simulation A.1: bounds for the real local traffic under a constant traffic rate.
2.13 Simulation A.2: the real local traffic under an exponential on/off process.
2.14 Simulation A.3: effect of multiple attackers on the real local traffic bounds.
2.15 Simulation A.4: effect of new attackers' locations.
2.16 Simulation A.5: on different attack traffic rates.
2.17 Simulation B: simulation for a large-scale Internet topology.
2.18 (a) An acyclic network with one attacker who resides in the local domain of R3. (b) R3 maintains two accumulative outgoing traffic counters C3,1(t) and C3,2(t) for the links Link3,1 and Link3,2, respectively.
2.19 Another timing diagram that shows the progress of the distributed snapshot traceback algorithm.
2.20 (a) The same example network as Figure 2.7 with attacking domains R3 and R4, but the router R3 is an undeployed router. (b) Logically, a virtual link between the routers R2 and R4 is formed.
2.21 A timing diagram that shows the progress of the DDoS traceback algorithm under the partial deployment environment.
2.22 (a) In this example network, the router R2 is an undeployed router while the others are deployed routers. (b) As the undeployed router is transparent to the traceback protocol, the router R1 records the channel states of the virtual links Link3,1 and Link4,1.
2.23 The timing diagram under a partial deployment environment. A drawback is that the channel states of the virtual links Link3,1 and Link4,1 become indistinguishable at router R1.
3.1 A typical case of a DDoS attack toward the victim V.
3.2 The illustration of an attack graph: (a) an attack graph is not the entire network; the attack graph is the paths traversed by attack packets; (b) the attack graph may become larger than the actual one due to the lack of legitimacy of the packets.
3.3 The pseudocode of the packet marking procedure of the PPM algorithm.
3.4 The pseudocode of the path reconstruction procedure of the PPM algorithm.
3.5 The failure of the router R1 causes the route tables of R2, R3, and R4 to change. This results in a constructed graph with routers having multiple outgoing edges.
3.6 A step-by-step illustration of the reconstruction of the attack graph based on the incoming packet sequence in Table 3.1.
4.1 A six-router binary-tree network: the upper-bound equation cannot be applied under this multiple-attacker environment.
4.2 An eight-router tree network with four independent linear paths: another multiple-attacker environment.
4.3 Simulation result: number of marked packets required versus number of independent paths.
4.4 An increasing yet chaotic trend of the rate of change of the number of marked packets required.
4.5 The pseudocode of the packet-type probability calculation: it calculates the packet-type probability of every edge in the graph G.
4.6 (a) Ga: a simple example linear network with three edges. (b) Gb: an example network with multiple paths leading from R3 and R4 to the victim.
4.7 Example network G1: a linear network with three routers and one victim.
4.8 Illustration of the Markov chain model of the PPM algorithm with network G1 in Figure 4.7.
4.9 The transition probability matrix of the Markov chain shown in Figure 4.8.
4.10 Simulation result versus theoretical result: for network G1 in Figure 4.7, we obtain two close sets of results for the distribution of the sufficient packet number X.
4.11 Example network G2: totally 16,384 Markov states.
4.12 Probability distribution of the sufficient packet number on the 14-router binary-tree network G2.
4.13 Fundamental matrix calculated by Equation (4.13) with the transition probability matrix P shown in Figure 4.9.
4.14 The comparison between the simulation and the theoretical results: both results disprove the linear property proposed by previous work.
5.1 The design goal of the RPPM algorithm: to have a correct constructed graph with probability greater than P*.
5.2 The pseudocode of the rectified graph reconstruction procedure of the RPPM algorithm.
5.3 An execution diagram of the rectified graph reconstruction procedure of the RPPM algorithm constructing a graph with n edges.
5.4 Extended graph example: a constructed graph and its set of extended graphs.
5.5 The pseudocode of the termination packet number (TPN) calculation subroutine.
5.6 State C1: a constructed graph with one edge, and its extended graphs.
5.7 State C2: a constructed graph with two edges, and its extended graphs.
5.8 The simulations show that the larger the marking probability is, the closer the simulation result is to the worst-case execution.
5.9 RPPM algorithm simulation: 15-node linear network with random marking probability.
5.10 RPPM algorithm simulation: 14-router binary-tree network with random marking probability.
5.11 RPPM algorithm simulation: 14-router random-tree network with random marking probability.
5.12 RPPM algorithm simulation: 100-router random-tree network with marking probability 0.1.
5.13 RPPM algorithm simulation: 500-router random-tree network with marking probability 0.1.
5.14 RPPM algorithm simulation: 1,000-router random-tree network with marking probability 0.1.
5.15 When the routers have more than one victim route, the RPPM algorithm cannot guarantee the correctness of the constructed graph when the confidence level is larger than 0.59.
5.16 An illustration of the extended graph with the support of multiple victim routes.
5.17 The pseudocode of the packet-type probability calculation subroutine that supports multiple victim routes.
5.18 With the support for multiple victim routes, the RPPM algorithm can provide the guarantee of the correctness of the constructed graph.
5.19 Average number of marked packets required for a correct graph reconstruction against different values of the marking probability.
5.20 Average number of total packets (marked plus unmarked) required for a correct graph reconstruction against different values of the marking probability.
5.21 The number of marked packets recorded for the set of RPPM algorithm simulations carried out on a random-tree network with 14 routers.
5.22 The number of marked packets recorded for the set of RPPM algorithm simulations carried out on random-tree networks with 50 routers and 100 routers.
5.23 The number of marked packets recorded for the set of RPPM algorithm simulations carried out on random-tree networks with 500 routers and 1,000 routers.
5.24 The percentage increase in the number of marked packets when comparing the RPPM algorithm to the PPM algorithm at different network scales.
5.25 Scalability analysis: average number of marked packets collected by the PPM algorithm versus the size of the attack graph.
5.26 The pseudocode of repeating the RPPM algorithm to increase the runtime probability.
List of Tables
2.1 Computation of the accumulative local traffic in Example A by using Equation (2.1).
2.2 Computation of Li(t1, t2), the local traffic within [t1, t2], in Example A by using Equation (2.2).
2.3 Computation of accumulative local traffic at times ti,1 and ti,2.
2.4 The local traffic intensity counts only the packets in between the two instances of the snapshot algorithm.
3.1 A sequence of packets collected by the victim.
4.1 Packet-type probabilities for Ga in Figure 4.6.
4.2 Packet-type probabilities for Gb in Figure 4.6: after the path (R3, R2, R1, v) of Gb is considered.
4.3 Packet-type probabilities for Gb in Figure 4.6: after both paths (R3, R2, R1, v) and (R4, R1, v) of Gb are considered.
5.1 The marked packet-type probabilities of the extended graphs G1,1 and G1,2.
5.2 The marked packet-type probabilities of the extended graphs G2,1, G2,2, and G2,3.
5.3 The average number of packets and time required to form a correct constructed graph in a 100BaseT Ethernet.
Chapter 1
Defense Against
Denial-of-Service Attack
“If you know your enemies and know yourself, you will win a hundred times in a hundred battles.” — The Art of War, Sun Tzu.
The emergence of the Internet as a pervasive form of communication has led to the recent enormous deployment of E-business and information distribution services. However, the success of the Internet also attracts malicious attackers who abuse system resources and expose the inherent security problems of the Internet. The distributed denial-of-service (DDoS) attack is one of the most pressing problems on the Internet. Well-known commercial sites such as Yahoo!, Amazon, and eBay were attacked and were out of service for many hours during a series of DDoS attacks in February 2000 [1, 2]. Since then, DDoS attacks have increased in size, frequency, sophistication, and severity.
In this chapter, we examine what a distributed denial-of-service attack is. We dissect the methodologies of common DDoS attacks in Section 1.1. After we are familiar with the nature of DDoS attacks, we define the scope of this thesis in Section 1.2: to trace the locations of the attackers of a DDoS attack. In the same section, we suggest our approach against DDoS attacks on a worldwide scale, which we name the divide-and-conquer traceback approach. In Section 1.4, we introduce previous work related to this thesis. Roughly speaking, this covers the methodologies that will be introduced in later chapters, including the distributed snapshot algorithm, the packet filtering technique, and the IP traceback technique.
1.1 Overview of Attack Methodology
The goal of a DDoS attack is to degrade or even disable the service(s) provided by the target. In the example attack case in [1], the targeted services are the web services provided by Yahoo!, CNN, and Amazon. We classify DDoS attacks in terms of the attack methodology. A denial-of-service attack can be realized by either of two techniques:
1. exploiting vulnerabilities in network protocols and software; and
2. leveraging a high volume of address-spoofed, bogus traffic.
We name the former type of attack the vulnerability-based attack and the latter type the flood-based attack. These two kinds of attacks are usually mixed together in order to bring about a large amount of damage.
Note that an attacker always wants to disguise himself or herself as a set of legitimate users. There is a loophole in the TCP/IP protocol suite: no component, device, or authority on the Internet can check the identity of any packet sent. Say the attacker is sending a packet from a machine with address A; he or she can easily change the source address of the packet to address B without anyone noticing. We name this kind of packet a spoofed packet, since its source address is spoofed.
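To make this loophole concrete, the following sketch forges a minimal IPv4 header whose source field carries an arbitrary address. The addresses and helper names are illustrative choices of ours, not part of the thesis; the point is simply that no field in the header binds the "source" address to the machine that actually sends the packet, and a receiver recomputing the checksum still sees a perfectly valid header.

```python
import socket
import struct

def ipv4_checksum(header: bytes) -> int:
    """One's-complement sum over 16-bit words, per RFC 791."""
    if len(header) % 2:
        header += b"\x00"
    total = sum(struct.unpack("!%dH" % (len(header) // 2), header))
    while total >> 16:                      # fold carries back into 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def build_ipv4_header(src: str, dst: str, payload_len: int = 0) -> bytes:
    """Build a minimal 20-byte IPv4 header. `src` can be ANY address:
    nothing in the header ties it to the sending machine."""
    header = struct.pack(
        "!BBHHHBBH4s4s",
        (4 << 4) | 5,            # version 4, header length = 5 words
        0,                       # type of service
        20 + payload_len,        # total length
        0, 0,                    # identification, flags/fragment offset
        64,                      # TTL
        6,                       # protocol: TCP
        0,                       # checksum placeholder
        socket.inet_aton(src),   # spoofed source address
        socket.inet_aton(dst),   # victim's address
    )
    checksum = ipv4_checksum(header)
    return header[:10] + struct.pack("!H", checksum) + header[12:]

# The real sender may be 10.0.0.1, yet this header claims to be 192.0.2.7.
spoofed = build_ipv4_header("192.0.2.7", "203.0.113.5")
```

A header built this way verifies correctly at any receiver (the checksum over all twenty bytes folds to zero), which is precisely why the source address alone cannot be trusted for traceback.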
The advantage of sending spoofed packets is that the attacker keeps his or her location secret, so DDoS countermeasures cannot target him or her so easily. Though this exploits a vulnerability of the TCP/IP protocol suite, we choose not to classify attacks using spoofed packets as vulnerability-based attacks, because every attack uses spoofed packets. Henceforth, throughout the text, we always assume that every attacker sends spoofed packets.
1.1.1 Vulnerability-based attack
In the following sections, we introduce two severe kinds of vulnerability-based attacks. This kind of attack leverages flaws in protocol designs and defects in software. Once such vulnerabilities are exploited, the service provided by the victim is shut down or degraded.
TCP-SYN flood attack
The TCP-SYN flood attack [3] (or SYN attack) is an infamous vulnerability-based attack. Though the attack carries the word "flood," what it does is exploit a vulnerability in the implementation of SYN packet handling in the TCP/IP protocol. In a nutshell, this attack targets the three-way handshake of the TCP protocol [4]. The attack brings down a host by flooding it with enough spoofed SYN packets that they occupy all the available connections of the host. Eventually, no resources are left for further connections. The countermeasure to this threat is SYN cookies, introduced in [5]. Nowadays, most operating systems already have SYN cookies implemented inside the kernel.
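A minimal sketch of the SYN-cookie idea may help: instead of storing per-connection state for half-open connections, the server folds the connection identifiers into the initial sequence number it sends back, and validates the final ACK against a recomputation. The field layout, hash, and MSS table below are our own simplified assumptions, not the exact scheme of [5] or of any real kernel.

```python
import hashlib
import socket
import struct

SECRET = b"per-boot random secret"          # assumed: any per-boot secret key
MSS_TABLE = [536, 1220, 1440, 1460]         # coarse MSS values, 2 bits to pick one

def _cookie_hash(saddr, daddr, sport, dport, counter, mss_index):
    """Keyed hash over the connection 4-tuple, a slow counter, and the MSS."""
    data = struct.pack("!4s4sHHQB", saddr, daddr, sport, dport, counter, mss_index)
    return struct.unpack("!I", hashlib.sha256(data + SECRET).digest()[:4])[0]

def make_syn_cookie(saddr, daddr, sport, dport, mss_index, counter):
    """Encode connection state into a 32-bit initial sequence number:
    30 bits of keyed hash, 2 bits of MSS-table index. The server thus
    holds no memory for the half-open connection."""
    h = _cookie_hash(saddr, daddr, sport, dport, counter, mss_index)
    return ((h & 0x3FFFFFFF) << 2) | mss_index

def check_syn_cookie(cookie, saddr, daddr, sport, dport, counter):
    """On the final ACK of the handshake, recompute the cookie; a match
    recovers the negotiated MSS with no stored state. Returns the MSS,
    or None if the cookie is forged or stale."""
    mss_index = cookie & 0x3
    if make_syn_cookie(saddr, daddr, sport, dport, mss_index, counter) == cookie:
        return MSS_TABLE[mss_index]
    return None

client = socket.inet_aton("192.0.2.7")
server = socket.inet_aton("203.0.113.5")
cookie = make_syn_cookie(client, server, 40000, 80, 2, counter=17)
```

Spoofed SYN packets then cost the server nothing: a half-open entry is materialized only when a final ACK arrives carrying a cookie that validates, which an attacker who never sees the SYN-ACK cannot produce.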
Low-rate TCP attack
In [6], the authors proposed and realized a new form of attack that targets the congestion control mechanism of the TCP protocol. The attacker carefully orchestrates periodic attack bursts to exploit the fixed minimum TCP retransmission timeout so as to shut off most, if not all, legitimate TCP flows. Though there are no incident reports on the low-rate TCP attack yet, solutions [7, 8, 9] have already been proposed in the literature.
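The timing argument can be made concrete with a toy simulation. The constants below (a 1 s minimum RTO, 150 ms bursts, 50 ms round-trip pacing) are illustrative assumptions, and the model is our own sketch of the effect rather than the analysis in [6]: when the burst period matches the minimum RTO, every retransmission wakes up inside the next burst, while the same bursts on a mismatched period barely dent the flow.

```python
MIN_RTO = 1.0     # seconds: TCP's fixed minimum retransmission timeout (assumed)
BURST_LEN = 0.15  # the attacker floods the bottleneck for 150 ms per burst

def delivered_fraction(burst_period, duration=60.0):
    """Simulate one TCP flow competing with square-wave attack bursts that
    repeat every `burst_period` seconds. An attempt landing inside a burst
    is lost and the flow backs off by MIN_RTO; otherwise it proceeds at
    normal RTT pacing. Returns the fraction of successful attempts."""
    t, sent, ok = 0.0, 0, 0
    while t < duration:
        sent += 1
        if (t % burst_period) < BURST_LEN:  # attempt falls inside a burst
            t += MIN_RTO                    # loss: retry one minimum RTO later
        else:
            ok += 1
            t += 0.05                       # success: continue after one RTT
    return ok / sent

# Bursts synchronized with MIN_RTO starve the flow; a mistimed period does not.
synced = delivered_fraction(1.0)
mistimed = delivered_fraction(3.7)
```

Because the flow's retry timer always snaps to the minimum RTO, a burst train with exactly that period keeps the flow's throughput at zero while the attacker stays silent about 85% of the time, which is what makes the attack "low-rate."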
1.1.2 Flood-based attack
The flood-based attack aims to disable a victim host by leveraging a high volume of spoofed traffic. Once this type of DDoS attack is launched, the victim experiences an increasing load. The service is usually impacted significantly, and there are cases in which victims have broken down entirely. To realize such an attack, on the order of tens of thousands of computers are needed to place a significantly large burden on the victim. In reality, the attacker cannot own resources at such a scale and therefore steals them.

The attacker usually obtains computing resources by compromising a large number of computers in order to launch a large-scale flood-based attack. This can be realized by exploiting known vulnerabilities in widespread operating systems such as Microsoft Windows. When such an exploitation is done, the attacker usually gains the highest privilege on the compromised computer and can perform whatever acts he or she likes. We call those compromised computers zombies [10, 11]. Although the attack involves the technique of exploiting vulnerabilities, that technique is not the payload of the DDoS attack; the exploitation neither brings the zombies down nor degrades their computing performance.
Zombie attack
Once a large group of zombies has been gathered, the attacker loads attack
programs to the zombies. The zombies are then turned into unwitting attack-
ers, and the DDoS attack is then launched. Figure 1.1 shows the deployment
scenario and the entities involved in a DDoS attack using zombies [12]. The
attacker seated in front of his or her own computer controls a set of handlers
that are, again, obtained by exploiting vulnerabilities. These handlers are used
Figure 1.1: The architecture of a typical flood-based DDoS attack.
to control the zombies so that the attacker can become stealthy during the at-
tack. The zombies are the ones that are sending spoofed traffic to the victim.
There are occasions when, during an outbreak, the Internet becomes
paralyzed because this kind of attack usually targets widespread software.
It is always difficult to hunt down the attacker of a zombie attack. The
attacker always protects the communication between the handlers and the
zombies by encrypting the communication channels [12]. What we can do is
to ask the Internet service providers (ISPs) to help locate and filter the attack
traffic so as to ease the pain of the victim. Also, replacing legacy and buggy
software is a crucial step to reduce the number of handlers and zombies that
can be obtained by attackers. Moreover, intrusion detection systems (IDS)
[13, 14] should always be installed in order to detect and stop intrusions by
attackers promptly and effectively.
Reflector attack
There is another kind of automated attack using a similar architecture called
the reflector attack [15]. As shown in Figure 1.2, the main feature of this attack
is that the zombies are not attacking the victim directly but through a set of
Figure 1.2: The architecture of a reflector attack.
reflectors.
The zombies send spoofed packets with the source addresses set to the
victim’s address and the destination addresses set to the reflectors’ addresses.
The reflectors are usually some public servers, such as domain name servers,
and the content of the spoofed packets is usually a request for service from the
reflectors. The reflectors will then generate replies without knowing that the
requests are frauds. As a result, the reflectors send the replies to the victim as
the source addresses of the requests are set to the victim’s address.
The reflector attack is, therefore, by its nature, more detrimental than
using the zombie attack model alone because:
1. it amplifies the effect of the DDoS attack. Let us imagine that the at-
tacker has only one zombie. By sending spoofed packets to different
reflectors, one zombie is already enough to attack the victim in a dis-
tributed way;
2. it also degrades the services provided by the reflectors. During the re-
flector attack, the reflectors are loaded by the requests from the zombies,
and this degrades the services provided by the reflectors; and
3. it is more difficult to trace. Since the reflected flows come from
innocent hosts (given that the reflectors are not compromised), a traceback
can readily be carried out, but it only reveals the reflectors, not the
attack sources.
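The reflection mechanism itself can be sketched in a few lines; the addresses below are illustrative, and the dictionaries merely stand in for real IP packets.

```python
VICTIM = "203.0.113.7"                                # illustrative addresses
REFLECTORS = ["198.51.100.%d" % i for i in range(1, 5)]


def make_request(spoofed_src, reflector):
    # The zombie forges the source address to be the victim's.
    return {"src": spoofed_src, "dst": reflector, "type": "request"}


def reflect(packet):
    # A reflector answers any request it receives; the reply goes to
    # the packet's (forged) source address.
    return {"src": packet["dst"], "dst": packet["src"], "type": "reply"}


replies = [reflect(make_request(VICTIM, r)) for r in REFLECTORS]
# Every reply converges on the victim, although the zombie never
# addressed a single packet to the victim directly.
```

A single zombie spraying such requests over many reflectors already yields a distributed flood at the victim, which is precisely the amplification property listed above.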
Peer-to-peer attack
This is an emerging type of attack mechanism. The peer-to-peer (P2P for
short) DDoS attack does not attack the P2P file sharing network but makes
use of the P2P network to launch a DDoS attack [16]. A P2P file transfer
network usually has tens of thousands of clients joining it. One type of P2P
attack is to poison the file records shared by the clients. The attacker
writes a bogus entry claiming that a certain location, which is in fact the
victim, is providing a certain set of files (usually, the victim is not even a
member of the P2P network). When an innocent client follows the bogus entry
for a file-sharing service, the attempt ends in an error; meanwhile, the
victim is bombarded with tens of thousands of irrelevant file requests.
Automatic attack tools
Several well-known DDoS attack tools adopt the above attack architectures.
These tools are designed to be versatile so that they can mount different types
of attack payloads to the zombies. Several famous tools include the Tribe
Flood Network 2000 (TFN2K for short) [17], the Trinoo [18], and the Stachel-
draht [19]. These automatic attack tools are well designed and are effective in
launching DDoS attacks.
1.1.3 Worm attack
The worm attack is another form of automated attack. By definition, a worm
is a piece of software that runs on a computer without the computer owner's
consent. The worm has the ability to duplicate itself, and
has the duplicated copies infect other computers. From a functional point of
view, a worm infects a computer by exploiting vulnerabilities of the software
used on the target computer. A worm also carries a payload: some payloads
merely infect other computers, some harm the hosting computer, and some
attack target sites in a cooperative manner.
Code red
The Code Red [20] is a famous worm that roamed the Internet during the
summer of 2001. The worm exploited a vulnerability in the Microsoft IIS
server, which is widely deployed around the globe (around 20% market share as
of 2001 [21]). The payload of this worm was twofold: first, the worm tried to infect
as many IIS servers as possible, and then all the worms were coordinated to
launch a DDoS attack toward several victims such as the web server of the
U.S. White House.
In response, Microsoft announced the vulnerabilities with the correspond-
ing software patches provided. The attack ceased when the vulnerabilities were
fixed, and at the same time, the ISPs filtered the payload of the worm.
Slammer
Again, Microsoft was the target of another famous worm attack. The worm
named Slammer demonstrated a severe attack on the Internet in 2003 [22] by
using a vulnerability of the Microsoft SQL server. The Slammer is actually an
interesting worm attack incident: the only payload it carried was to
propagate itself with a blitz tactic. Once the worm infected a vulnerable
Microsoft SQL server, it immediately probed the network for other vulnerable
Microsoft SQL servers by rapidly firing malicious traffic at random IP
addresses. The
malicious traffic brought down many routers and then initiated a wave of rout-
ing table updates. When the failed routers were fixed and were online again,
the worm started another wave of routing table updates. The bombardment
of the malicious traffic, the failures of routers, and the changes of the routing
tables together shut the Internet down partially. This was, as a matter of fact,
a DDoS attack that targeted the Internet infrastructure.
1.1.4 Flash crowd
Despite the mentioned explicit attacks, there are scenarios in which the ser-
vices provided by the victim are degraded because of legitimate traffic. The
flash crowd happens when many users simultaneously send requests to one
Web site, usually because of special events attracting the interest of the mass
population. These events could be scheduled ones such as broadcasts of World
Cup matches, unpredictable events such as earthquakes, or links from popular
Web sites (see [23] for details).
In our context, the flash crowd is certainly not a DDoS attack. Neverthe-
less, the flash crowd behaves similarly to a DDoS attack. The victim and the
network itself can be overloaded by a flash crowd event, and the aggregated
volume of the legitimate traffic is comparable to that of a DDoS attack. In
the literature, publications have addressed this problem, and solutions have
been suggested [24, 25].
To conclude, the DDoS attack may take different attack forms, strategies,
and patterns. Interested readers can refer to survey articles [26, 27] for more
details.
1.2 Scope of the Thesis
In this thesis, we target the flood-based attack, and we aim to stop such an
attack when one can detect it. According to industrial practices against DDoS
attacks [28], one should do the following steps in response to a DDoS attack:
1. Preparation. Service providers have a high chance of successful defense
against a DDoS attack if they have laid the groundwork against it.
2. Detection. The ability to quickly identify an attack is critical to minimizing
the damage that the attack can cause.
3. Traceback. Once a service provider has detected an attack, the next
step is traceback: determining the source of the attack so that the
service provider can apply mitigation techniques or, if the source of
the attack lies in another network, inform the corresponding peer.
4. Containment. When an organization knows where an attack is coming
from, the organization should apply containment and filtering mecha-
nisms to stop the malicious traffic.
5. Postmortem. After a security incident, it is important for the orga-
nization to review what was most effective during an attack and what
could be improved.
The target of this thesis is to trace back: to locate the sources of the attack
flows that are contributing to the DDoS attack. In the following section, we
present some general assumptions.
1.2.1 General assumptions
We aim to locate the sources of the attack flows. Hence, if the attacker(s) are
using the attacking architecture mentioned in Section 1.1.2, we are concerned
only with the locations of the zombies or the locations of the reflectors in the
reflector attack.
We assume that the victim has the ability to detect that the providing
service is being degraded by overwhelming traffic. We also assume that the
victim is allowed to report the incident to the victim’s ISP, and the ISP will
then handle the incident.
We are not interested in discriminating between a legitimate flow and an
attack flow. We are also not interested in distinguishing between a flash crowd
or a DDoS attack. What we are concerned with is identifying flows that
degrade the service provided by the victim.
Last but not least, since we are concerned with the flood-based attack only,
we are not going to provide solutions that remedy vulnerability-based attacks
such as the low-rate TCP attack.
1.2.2 A divide-and-conquer traceback approach
As DDoS attacks are becoming more violent and the attack scale is enlarg-
ing, tracking down attackers across the globe is becoming more difficult and
more tedious. To provide relief from such an adverse reality, we propose a
divide-and-conquer approach so that the global-scale traceback problem can be
divided into tractable sub-problems.
Overview
From a technical point of view, in the case of launching a global-scale attack,
attack sources are spread across different Internet service providers (ISPs for
short), and these sources send attack traffic toward the ISP where the victim
resides. As shown in Figure 1.3, attackers located in ISPs C, D, and E send
traffic toward ISP A, where the victim resides.
We propose that the ISPs should coordinate and together discover
the ISPs that are contributing overwhelming traffic; we call this problem
the macroscopic traceback problem. After the problematic ISPs have been
identified, in the next step, each ISP should trace the location of attackers
within its administrative domain, and we call this problem the microscopic
traceback problem.
Specifically, a macroscopic traceback algorithm should be deployed within
Figure 1.3: The overview of the divide-and-conquer traceback approach.
the inter-ISP network. Referring to Figure 1.3, the border routers and the cou-
pling links between the border routers together form the inter-ISP network. To
facilitate the deployment of the macroscopic traceback algorithm, every border
router is connected to a macro-traceback processing node, which executes the
macroscopic traceback algorithm. On the other hand, a microscopic traceback
algorithm should be deployed within the intra-ISP network, and the intra-ISP
network is constructed by a network of backbone routers of an ISP. Again, a
processing node, namely the micro-traceback processing node, is added to help
trace the attackers within the network inside an ISP.
An example divide-and-conquer traceback execution
In addition to the architecture of the divide-and-conquer traceback approach,
Figure 1.3 also sets up an attack scenario. In the figure, we have five attacking
sources with the following distribution: ISP C contains three attackers while
ISPs D and E each contain one.
In the beginning, at the moment that a DDoS attack is detected, the victim,
which resides in ISP A, calls for the DDoS defense service from its ISP. In turn,
the border router of ISP A diverts the traffic sent toward the victim to the
macro-traceback processing node, and the macro-traceback processing node
initiates a macroscopic traceback algorithm. The macro-traceback processing
nodes of the remaining ISPs join the algorithm accordingly. The traceback
result of the macroscopic traceback algorithm should discover that ISPs C, D,
and E contain the sources of the attack.
Next, ISP A would inform ISPs C, D, and E about the traceback result.
In response, each border router of the concerned ISPs diverts all the outgoing
traffic sent toward the victim to the micro-traceback processing node. Each
processing node, running the microscopic traceback algorithm, aims to locate
the attack sources, which are sending traffic toward it. Once the traceback
result is ready, the concerned ISP can discover the locations of the attackers,
and follow-up actions, such as packet filtering, will then be carried out.
Justification
First of all, it will be attractive to the ISPs if the traceback algorithms
are deployed only within their administrative domains. To justify this claim,
consider the ISPs' point of view: they do not want to disclose any
information about their networks. The reason is simple: their peers are
actually competitors, not partners. Thus, any algorithm that executes across
multiple ISPs has difficulties
in deployment, and this is the reason for confining the microscopic traceback
algorithm within the intra-ISP network.
On the other hand, the divide-and-conquer approach not only narrows
down the traceback scope by using the macroscopic traceback algorithm but
also speeds up the traceback process by having multiple execution instances
of the microscopic traceback algorithm concurrently at different ISPs.
We believe that there is no silver bullet that can handle every kind of
flood-based DDoS attack; rather, one should use the right tool for the
right problem, the right model, and the right scenario. Therefore, in this
thesis, we choose to investigate the DDoS attack defense mechanism from two
different angles.
1.3 Structure and Contribution of the Thesis
In Chapter 2, we devise a macroscopic traceback algorithm. Leveraging the
well-known Chandy-Lamport’s distributed snapshot algorithm, we propose a
distributed algorithm that can correctly collect statistics (in a distributed
sense) from programmable routers in a coordinated fashion [29]. Then, by
analyzing the collected data, a victim can deduce the intensity of the traffic
generated by the network that is attached to every participating router. The
contribution of the algorithm is twofold. Firstly, this is the first piece of work
that applies a classical distributed algorithm in a DDoS attack defense mech-
anism effectively. Secondly, this work also provides a theoretical foundation to
measure Internet traffic in a distributed sense.
In Chapter 3, we analyze a promising microscopic traceback algorithm.
The probabilistic packet marking algorithm (PPM algorithm for short) by Sav-
age et al. [30] is an effective way to locate attackers using flood-based DDoS
attacks. In this chapter, we present an overview of the PPM algorithm. Yet,
the PPM algorithm is not perfect as its termination condition is not well-
defined in the literature. More importantly, it is found that, without a proper
termination condition, the attack graph constructed by the PPM algorithm
would be wrong. In Chapter 4, we study the termination condition of the
PPM algorithm. This is the first piece of work in the literature that studies
the termination condition of the PPM algorithm [31]. We present a discrete-
time Markov chain model that provides a precise calculation for the termina-
tion condition for the PPM algorithm. Nevertheless, the mechanism requires
knowledge of the attack graph in advance. This contradicts the purpose of the
traceback algorithm, which is designed to find the attack graph. This leads
us to abandon the current termination condition of the PPM algorithm.
To improve the termination condition of the PPM algorithm, we present
a new algorithm, the rectified probabilistic packet marking algorithm (RPPM
algorithm for short) in Chapter 5 [32]. The most significant merit of the RPPM
algorithm is that when the algorithm terminates, the algorithm guarantees
the correctness of the traceback result with a specified level of confidence.
Our findings show that the RPPM algorithm can guarantee such a correctness
under different deployment scenarios. As one of the major contributions of this
thesis, the RPPM algorithm provides an autonomous way, which is missing in
the original PPM algorithm, to determine its termination, and it is a promising
means to enhance the reliability of the PPM algorithm.
1.4 Related Work
The macroscopic snapshot algorithm that will be introduced in Chapter 2
leverages the well-known Chandy-Lamport distributed snapshot algorithm. In this
section, we first introduce the importance of this distributed snapshot algo-
rithm. Then, we introduce the development of the techniques against DDoS
attacks, mainly the packet filtering technique and the IP traceback technique.
1.4.1 Distributed Snapshot Algorithm
The very first distributed snapshot algorithm was proposed by Dijkstra and
Scholten [33]. Later, Chandy and Lamport proposed the consistent global
snapshot algorithm in [34], and the algorithm is derived from Lamport’s ear-
lier work on logical time [35]. Fischer et al. designed another algorithm for
consistent global snapshots, and this algorithm is tailored for transaction-based
systems [36].
The distributed snapshot algorithm has been applied to capture a consistent
global state of a distributed system. The primary use of the snapshot
algorithm is in checkpointing and rollback recovery [37]. The checkpointing
and recovery are vital properties that allow systems to make progress in the
presence of failures. In brief, checkpointing [38] is a technique to save the
states of an executing process. Processes achieve fault tolerance by saving
recovery information periodically during failure-free executions. Upon a failure,
a failed process uses the saved information to restart the computation from
an intermediate state, thereby reducing the amount of lost computation. The
recovery information includes the states of the participating processes, called
checkpoints.
In a distributed system, a global checkpointing scheme requires a coor-
dinated checkpointing of the participating processes. The Chandy-Lamport
distributed snapshot algorithm provides a provably consistent global state with
all the processes logically synchronized, without a global clock. Therefore, the
recovery of the distributed system is made possible with the checkpointing of
the global system. It is common practice for the distributed snapshot
algorithm to save the checkpointing data in a database system [39].
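To make the marker mechanism concrete, the following is a self-contained sketch of the Chandy-Lamport algorithm on two processes. The node names, the single in-flight message, and the round-robin delivery loop are all illustrative assumptions; only the marker rules themselves follow the algorithm.

```python
from collections import deque


class Node:
    # One participant in a marker-based snapshot. `state` is a counter
    # standing in for real application state; each incoming FIFO channel
    # is recorded from the moment the local state is saved until that
    # channel's marker arrives.
    def __init__(self, name, incoming):
        self.name, self.incoming = name, incoming
        self.state = 0
        self.snapshot = None      # recorded local state
        self.open = {}            # channels still being recorded
        self.recorded = {}        # src -> messages in flight on src->self

    def record_local(self, send_markers):
        self.snapshot = self.state
        self.open = {src: [] for src in self.incoming}
        send_markers(self.name)   # marker out on every outgoing channel

    def on_message(self, src, msg, send_markers):
        if msg == "MARKER":
            if self.snapshot is None:         # first marker seen
                self.record_local(send_markers)
            self.recorded[src] = self.open.pop(src, [])
        else:
            self.state += 1                   # apply the message
            if src in self.open:              # channel still recording
                self.open[src].append(msg)


# Two nodes, one FIFO channel in each direction.
channels = {("A", "B"): deque(), ("B", "A"): deque()}
nodes = {n: Node(n, [m]) for n, m in (("A", "B"), ("B", "A"))}


def send_markers(src):
    for (s, d), q in channels.items():
        if s == src:
            q.append("MARKER")


channels[("B", "A")].append("payload")   # a message already in flight
nodes["A"].record_local(send_markers)    # A initiates the snapshot

while any(channels.values()):            # deliver in FIFO order
    for (s, d), q in channels.items():
        if q:
            nodes[d].on_message(s, q.popleft(), send_markers)
```

In this run both nodes record local state 0 and the in-flight `payload` message is captured on the channel from B to A, so the recorded cut accounts for every message even though no global clock exists.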
1.4.2 DDoS Defense Mechanisms
One major security problem of the IP protocol is that the source address can
be filled by any user [40]. In a DDoS attack, attackers exploit this vulnerability
in order to hide their existences as well as to hinder authorities in tracing the
attack origins. In the literature, work has been done to mitigate the effects of
the DDoS attack by filtering the malicious packets before they can reach the
victim. On the other hand, there is also research work going on in order to
trace back the sources of the attack in the presence of the spoofed packets.
Packet filtering
One possible way to stop the attackers from spoofing the source addresses of
the malicious packets is ingress filtering [41]. Under such a filtering mechanism,
a router is configured to drop packets that arrive with illegitimate addresses.
This requires the participating routers to have the ability to examine every
packet that passes through as well as sufficient knowledge to distinguish be-
tween the legitimate packets and the illegitimate packets. The best way to
deploy ingress filtering is at the border routers of an AS1/ISP because it is
rather easy for the border routers of the ASes and the ISPs to acquire the
range of legitimate addresses.
However, the fatal problem of ingress filtering is that it requires widespread
deployment before the mechanism can efficiently remove most malicious
packets. Unfortunately, a significant fraction of the ISPs do not implement
1AS stands for autonomous system.
this approach. Moreover, even if ingress filtering were deployed globally, an
attacker could still launch an attack by setting the spoofed address to be a
member of the legitimate address range of the AS. On the other hand, a router
deployed with egress filtering [42] is commanded to drop packets that leave
the router with illegitimate addresses. However, one can notice that this
mechanism bears the same defects as ingress filtering.
Park et al. have proposed the route-based distributed packet filtering scheme
in [43]. For example, let AS 1, AS 2, and AS 3 be three distinct autonomous
systems. Under a normal situation, AS 2 receives and routes the packets from
AS 1 at its incoming interface. If an attacker at AS 1 sends a spoofed packet
with the source address that belongs to AS 3, based on the routing information
of AS 2, this packet is an abnormal packet, and it will then be dropped. The
authors analyze the distributed packet filtering scheme on the power-law-based
Internet model. The performance result shows that the main advantage of the
proposed scheme is that it does not require global deployment and can still
filter a significant fraction of the malicious packets.
Yau et al. [44] proposed a feedback-based mechanism to throttle the rate
at which the routers send packets toward the victim. The scheme is to have
programmable routers deployed on the network. When these routers receive
throttling signals sent from the victim, the routers restrict the flows sent toward
the victim. The contribution of this work is to guarantee max-min fairness
when throttling the flows, so that large flows bear the brunt of the
throttling while small flows can survive it.
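The max-min fairness idea behind such throttling can be sketched as a water-filling computation. This is an illustrative sketch, not Yau et al.'s actual router algorithm, and the flow rates and capacity below are assumptions.

```python
def maxmin_throttle(rates, capacity):
    # Water-filling sketch of a max-min fair rate limit: find the
    # largest per-flow cap r* such that sum(min(rate, r*)) <= capacity.
    # Small flows survive untouched; large flows absorb the cuts.
    flows = sorted(rates)
    remaining = float(capacity)
    for i, r in enumerate(flows):
        fair_share = remaining / (len(flows) - i)
        if r >= fair_share:          # every remaining flow is capped here
            return fair_share
        remaining -= r               # this flow fits under its fair share
    return float("inf")              # capacity not exhausted: no throttling


cap = maxmin_throttle([1, 2, 30, 40], capacity=13)
# cap == 5.0: the flows of rate 1 and 2 pass intact, while the two
# large flows (likely the attack flows) are each cut down to 5.
```

The throttled rates sum exactly to the capacity (1 + 2 + 5 + 5 = 13), illustrating how large flows take a large impact while small flows survive.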
Mahajan et al. proposed the aggregate congestion control in [24]. The sug-
gested method modifies the router’s congestion control algorithm, and the
method is twofold. Firstly, every router is equipped with the local ACC, which
can (i) identify congestion, (ii) classify and identify "bad" traffic aggregates
from the input queue of the router, (iii) rate-limit the arrivals in order to ease
the congestion, and (iv) send push-back messages to upstream routers when
the congestion cannot be eased by the local ACC alone. Secondly, the push-
back mechanism is rather passive compared to the active local ACC measure.
Since congestion can only be detected at downstream routers, upstream
routers are invoked to launch the local ACC and rate-limit the aggregates
specified by the push-back messages.
Chen et al. applied the ACC (aggregate congestion control) to mitigate
a DDoS attack [45]. The core of the DDoS attack defense mechanism is the
ability to detect the high-bandwidth aggregate, which is achieved by leveraging the ACC
technique. The authors suggest that the defense should take place on the edge
routers of an ISP, and the edge routers together form a defense perimeter.
The perimeter is then responsible for locating the packets belonging to the
high bandwidth aggregate. Once those packets are found, the edge routers
that discover the packets accordingly install a rate-limit filter, which drops
the packets according to the acceptance rate. The authors have proposed two
solutions for locating the edge router that admits the problematic aggregate:
one is done by multicasting, and the other by IP traceback.
Xu et al. suggested a methodology to sustain the availability of web services
under a DDoS attack [46]. The goal of the defense system is to, first, defend
against attacks using spoofed addresses, and second, minimize the system
resources consumed by adversaries who use legitimate addresses. To get rid
of the spoofed-address traffic, the authors suggest using the HTTP redirect
message. To mitigate the damage brought about by the adversary traffic using
legitimate addresses, the system is modeled as a minimax game. The goal is
to maximize the small traffic and to penalize the large traffic by suspending
the concerned connections.
IP traceback
Savage et al. proposed the probabilistic packet marking scheme in [30]. Every
router participating in this scheme marks the IP header of every packet passing
through it based on a pre-defined probability. At the victim site, the victim
can reconstruct the packet path (or the attack path) by collecting a sufficient
number of marked packets from the routers. Detailed analysis of the PPM
algorithm will be given in Chapter 3.
Following the work of Savage, many pieces of work researching the field of
IP traceback have emerged. In [47], the authors analyzed the time as well as
the number of packets that are sufficient to construct the attack graph with
a certain confidence level. In [48], the authors proposed an authentication
scheme based on the approach suggested by Savage. The aim of this work is
to hinder malicious parties from altering the marked field in the IP header
of the packets. Also, the authors have mentioned that if the victim site knows
the map of its upstream routers, the mechanism does not need to encode
the full IP address in the packet marking. They improved Savage’s marking
approach by hashing so as to achieve a lower false positive rate as well as a
lower computation overhead. On the other hand, Park et al. analyzed the work
of Savage, and pointed out that the spoofing of the marking field may impede
traceback by the victim site [49]. Attackers may be able to choose the spoofed
marking value and the source address in order to hide themselves.
Besides the IP traceback approach proposed by Savage, Dean et al. formulated
the traceback problem as a polynomial reconstruction problem [50, 51].
They use algebraic coding theory to encode traceback information in the
packet, similar to Savage’s approach. On the other hand, Snoeren et al. pro-
posed an efficient hash-based approach to trace back the attackers [52]. Every
packet that passes through a router is hashed into the storage device associ-
ated with the router. By tracking the storage device of every router, one can
derive the traversed path of a single packet.
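The hash-based idea can be sketched with a Bloom-filter-like digest table per router. The table size and the choice of three hash positions are assumptions made for illustration, not the actual parameters of the scheme in [52].

```python
import hashlib


class RouterDigest:
    # Sketch in the spirit of hash-based traceback [52]: each router
    # stores a digest of every forwarded packet in a small bit array,
    # and an investigator can later ask whether a given packet was seen.
    def __init__(self, bits=1 << 16):
        self.bits = bits
        self.table = bytearray(bits // 8)

    def _positions(self, packet):
        d = hashlib.sha256(packet).digest()
        # three positions derived from disjoint slices of the digest
        return [int.from_bytes(d[i:i + 4], "big") % self.bits
                for i in (0, 4, 8)]

    def record(self, packet):
        for pos in self._positions(packet):
            self.table[pos // 8] |= 1 << (pos % 8)

    def seen(self, packet):
        return all(self.table[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(packet))


r1, r2 = RouterDigest(), RouterDigest()
r1.record(b"attack-packet")      # the packet traversed r1 but not r2
```

Querying each router's digest for a captured attack packet reveals the routers it traversed; like any Bloom filter, the table admits false positives as it fills, which the real scheme controls by sizing and periodically paging out the tables.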
Adler formulated a new IP traceback scheme that is capable of tracing the
attacker with an arbitrary number of encoding bits in the attack packets [53].
According to the analysis, one can apply an IP traceback scheme that uses one
encoding bit per packet in a single-attacker environment. However, the lower
bound of the number of bits required is greater than one in a multiple-attacker
environment.
Sung et al.'s work is the first that combines an IP traceback approach with an
automatic packet filtering approach [54]. The scheme employs the IP traceback
approach by Savage to discover the attack path. Then, by setting up a defense
perimeter, the attack packets are filtered preferentially at the routers that are
far from the victim. By using this scheme, the attack packets can be filtered
at a far distance from the victim while the legitimate packets can reach the
victim instead of being dropped.
Unlike the probabilistic packet marking algorithm that marks packets ran-
domly, Belenky et al. suggested marking every packet that passes through the
edge routers of an ISP [55]. Then, by collecting the marked packets, one can
know from which edge router the attack traffic is coming.
For details on other IP traceback schemes, readers may refer to the detailed
survey articles in [56, 57].
Chapter 2
Distributed Snapshot Traceback
Algorithm
In this chapter, we present a distributed approach to effectively trace back the
location of potential flood-based attack sources. To begin with, we focus on the
macroscopic traceback problem introduced in the previous chapter. To the
best of our knowledge, this is the first work that considers the network model
of the macroscopic traceback problem. In general, this approach can also be
applied to the micro-network model without affecting the algorithm's
effectiveness.
Technically, our proposed approach is twofold. Firstly, our approach is
grounded in the programmable router architecture [58, 59, 60, 61] wherein
the participating routers can be programmed so that they can collaboratively
collect traffic statistics for a victim site, or the macro-traceback processing
node in the divide-and-conquer approach1. After the statistical information
has been forwarded to a victim site, the victim site can then do the following:
1. construct the attack graph with the network paths taken by all received
packets at the victim, and
1To be consistent, we stay with the name victim when we are addressing the macro-traceback processing node.
2. accurately determine the magnitudes or intensities of the traffic gener-
ated from the local network of each participating router, and we name
this traffic the local traffic.
Secondly, upon determining the intensity of the measured traffic from each par-
ticipating router, the victim site can determine a subset of attacking (border)
routers whose workload consumes a large percentage of the victim’s resource.
The contributions of this chapter are as follows:
• to provide an effective distributed traceback methodology to determine
the local traffic of participating routers;
• to measure the local traffic of all routers at the same logical time,
without requiring any global clock or global synchronization at each
participating router; and

• to assist the victim in efficiently locating the attackers who contribute
large attack flows.
The rest of the chapter is organized as follows. In Section 2.1, we formally
define the traceback problem and present the network setting wherein we
perform the distributed traceback. We also give an example to illustrate the
reason that one needs a distributed algorithm to carefully record the local
state of each participating router so as to achieve a correct traceback result.
Moreover, definitions and notations used throughout the chapter will be given.
In Section 2.2, we present the distributed algorithm to correctly record the
state of each participating router. In Section 2.3, we present the method to
correctly interpret the traceback results obtained by the distributed algorithm.
In Section 2.4, we carry out NS-2 simulations to illustrate the effectiveness of
our distributed traceback methodology. Implementation issues are discussed
in Section 2.5. Lastly, the conclusion of this chapter will be given in Section
2.6.
2.1 Overview and Problem Definition
In this section, we first present the overview of our approach, then we present
a network model. We also illustrate why one needs a distributed algorithm to
correctly perform the traceback under a DDoS attack.
2.1.1 Overview
To eliminate the detrimental effect of the flood-based DDoS attack, tracing the
location of the attacker and filtering out all the malicious packets are essential
steps. Since an attacker sends a huge number of packets compared with a
normal user, one can easily notice the large portion of traffic from the attacker
on the victim side through a traffic-intensity measuring mechanism.

However, this approach is not straightforward because the attackers usually
spoof the source addresses of the malicious packets. One can hardly measure
the traffic intensity of a particular host based on the source addresses of the
outgoing packets. Alternatively, we suggest measuring the intensity of the
outgoing traffic towards the victim on the routers. Certainly, this scheme
neither measures the traffic intensity of an individual user nor traces back
to a particular attacker. Nevertheless, it aims to identify the routers that
have a high volume of outgoing traffic towards the victim site, which indicates
that the origins of the attack lie in the domains of those routers.
In order to measure and collect the traffic intensities from the participating
routers, we propose a novel approach by applying the snapshot algorithm sug-
gested in [34], and we name the algorithm the distributed snapshot traceback
algorithm (snapshot algorithm for short). The snapshot algorithm provides a
means to coordinate all the participating routers in the traffic measurement
and the data collecting procedures. The algorithm also provides a way to
measure the traffic intensity correctly. The advantages of our approach are:
[Figure 2.1 shows an inverted, directed acyclic graph rooted at the victim ν: routers R1–R5, each with the LANs (LAN0–LAN5) of its local administrative domain, containing normal clients (C) and an attacker (A); traffic flows towards ν.]

Figure 2.1: An example network topology.
• easy to implement without large modifications to the routers, and

• fast; the approach requires only a few seconds to measure the traffic
intensities of the routers.
2.1.2 Problem definition
Let us first define our network model, which is well suited to the macroscopic
traceback problem.
Network components
An example network is shown in Figure 2.1. In the figure, an inverted, di-
rected acyclic graph rooted at V represents the network topology, and the root
node V represents a victim site. The graph is composed of the routers and
the local administrative domains (LANs) of the routers. For the simplicity of
illustration, the graph shows only the network components that participate
in transmitting and forwarding traffic to the victim site V. Let Ri be an
upstream router of V; the graph is thus a map of all routers that forward
traffic to V.
A LAN contains a number of end hosts, including some legitimate clients
of V and possibly some attackers of V. The traffic generated by the clients
and the attackers is forwarded by routers. For example, in Figure
2.1, router R1 serves as a gateway of LAN0 and LAN1, and these two LANs
are regarded as the local administrative domain (domain for short) of R1. A
router is responsible for sending traffic generated from its domain, and it is also
responsible for forwarding traffic generated from the domains of its upstream
routers.
For example, in Figure 2.1, routers R3, R4, and R5 are considered as the
upstream routers of R1. Particularly, routers R3 and R4 are regarded as the
immediate upstream routers of R1. We say that a router is a leaf router if it is
not connected to any upstream routers, such as R2, R3, and R5 in Figure 2.1.
Other routers are then called the transit routers. Throughout this work, we
let U(Ri) denote the set of immediate upstream routers of Ri and D(Ri) the
set of immediate downstream routers of Ri; the sets of all upstream and all
downstream routers of Ri are the respective transitive closures of these relations.
Note that, according to the divide-and-conquer approach described in Sec-
tion 1.2.2 (on Page 11), a router in this model represents the border router of
an ISP while the corresponding LAN represents the administrative domain of
the ISP.
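For concreteness, the upstream relations of Figure 2.1 can be sketched in a few lines of Python (our own illustration, not part of the thesis; the adjacency dictionary below is read off the figure):

```python
# Immediate-upstream relation U(Ri) of Figure 2.1, read off the topology.
imm_up = {'V': ['R1', 'R2'], 'R1': ['R3', 'R4'], 'R4': ['R5'],
          'R2': [], 'R3': [], 'R5': []}

def upstream(r):
    """All upstream routers of r: the transitive closure of imm_up."""
    found, stack = set(), list(imm_up[r])
    while stack:
        j = stack.pop()
        if j not in found:
            found.add(j)
            stack.extend(imm_up[j])
    return found

assert upstream('R1') == {'R3', 'R4', 'R5'}   # as stated in the text
# Leaf routers are those with no upstream routers.
assert {r for r in imm_up if r != 'V' and not imm_up[r]} == {'R2', 'R3', 'R5'}
```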
Assumptions
After we have introduced the network components, we discuss the assumptions
imposed on them.
We assume that the victim is equipped with the distributed snapshot trace-
back algorithm. This implies the victim has the ability and the computing
resources to execute the proposed traceback algorithm and to process the
traceback results. In reality, it is not always practical for the victim to install
the required software and hardware. Nevertheless, as mentioned in Section
1.2.2, the macro-traceback processing node will perform the traceback in the
victim’s place. Though the macro-traceback processing node is not the true
victim, we name the node that initiates the traceback algorithm the victim.
We assume that every router in the network (we mean the inter-ISP net-
work) is equipped with the traceback algorithm and has the computing re-
sources to run the traceback algorithm. Again, it is the macro-traceback pro-
cessing node which executes the traceback algorithm, instead of the router
itself. Nevertheless, we will discuss the feasibility of the router to execute the
proposed traceback algorithm in Section 2.5.2. More importantly, we assume
that every router in the network will not be compromised by any malicious
parties. A compromised router may disrupt the executions of the traceback
algorithm, or, even worse, report fake traceback results. This assumption
protects the execution of the traceback algorithm from being harmed by com-
promised routers.
Traffic classification
In our work, we classify traffic into three types: the transit traffic, the
local traffic, and the outgoing traffic. The transit traffic of Ri is the traffic
forwarded from the immediate upstream routers of Ri while the local traffic of
Ri represents the traffic generated from the local administrative domain of Ri.
Eventually, the outgoing traffic of Ri is the sum of the transit traffic and the
local traffic of Ri. To illustrate, let us consider the following example using
Figure 2.1. Part of the traffic to V was generated in LAN5, and this traffic
has to pass through routers R5, R4, and R1 before reaching V. Therefore,
the traffic from LAN5 is considered as the transit traffic of router R4. On the
other hand, the clients in LAN4 also generate traffic to V, and this traffic is
considered as the local traffic of R4. The union of these two streams of traffic
generated in LAN4 and LAN5 is considered as the outgoing traffic of R4.
We assume that each router maintains a counter which records the
accumulative volume of the outgoing traffic towards the victim site V, counted
in terms of the number of packets (for ease of presentation, we ignore the
counter wraparound problem).
Lastly, we define an attacker and his/her behavior as follows. An attacker is
a host which sends a high volume of traffic towards the victim site within a
short period of time (usually within seconds), and thereby consumes a large
portion of the victim's resources. An attacker may generate any kind of packets
with spoofed source addresses.
2.1.3 Traceback methodology
Before elaborating on the distributed algorithm, we formally define the following
concepts.
Definition 2.1 The accumulative outgoing traffic counter of the router Ri at
time t records the accumulative number of packets which are destined for the
victim site V up to time t. We denote the value of the accumulative outgoing
traffic counter of Ri at time t for the victim V as Ci(t).
Definition 2.2 The local traffic of the router Ri is the number of packets
which are destined for the victim site V generated within the local adminis-
trative domain of Ri in the time interval [t1, t2]. We denote the local traffic of
Ri in the time interval [t1, t2] as Li(t1, t2).
Formally, we let Ci(t) be the counter value of the accumulative outgoing
traffic of router Ri at time t and let U(Ri) be a set of immediate upstream
routers of Ri. Note that, in the case that the router Ri is serving more than
one victim, there will be different copies of the counter Ci(t) with different
values. The accumulative local traffic Ni(t) of router Ri at time t is given as
follows.
Accumulative Local Traffic

    Ni(t) = Ci(t),                                  if Ri is a leaf;
    Ni(t) = Ci(t) − Σ_{Rj ∈ U(Ri)} Cj(t),           otherwise.          (2.1)
Let Li(t1, t2) represent the local traffic generated by the router Ri within the
time interval [t1, t2]:

Local Traffic

    Li(t1, t2) = Ni(t2) − Ni(t1).          (2.2)
The implication of Equations (2.1) and (2.2) is that, by using the outgoing
traffic counters, one can deduce the accumulative local traffic to the victim
site V for every router by Equation (2.1). Then, by taking the difference of
these accumulative local traffic values between two time instants, as shown
in Equation (2.2), one can obtain the local traffic to the victim V for every
router within the measurement interval [t1, t2].
2.1.4 How to perform the traceback
We describe the steps of the traceback process as follows. When V receives a
huge amount of traffic that exceeds its pre-defined threshold of traffic loading,
V declares that it is under a DDoS attack, and the traceback procedure is
started. V signals all routers to read their outgoing traffic counters. In order
to determine the local traffic within a time interval [t1, t2], V needs to send the
counter reading signals to all participating routers twice: one at time t1 and
the other at time t2. Then, every router takes the counter value of its outgoing
traffic counter, and sends the counter statistics back to the victim accordingly.
Eventually, after V has collected these two sets of data from the participating
routers, V computes the local traffic generated from each domain within [t1, t2]
    Outgoing traffic counter at time t
    Time     C5(t)   C4(t)   C3(t)   C2(t)   C1(t)
    t = t1   20000   35000    5000   10000   65000
    t = t2   50000   66000    6000   11000   99000

Table 2.1: Computation of the accumulative local traffic in Example A by using Equation (2.1).
    Local traffic from t1 to t2
                   R5      R4     R3     R2     R1
    Li(t1, t2)  30000    1000   1000   1000   2000

Table 2.2: Computation of Li(t1, t2): the local traffic within [t1, t2] in Example A by using Equation (2.2).
by Equations (2.1) and (2.2). By comparing the intensities of the local traffic
of the participating routers, one can determine the locations of the attackers.
Traceback example
To illustrate this traceback process, let us consider the following simple but
illustrative example using the network topology of Figure 2.1 (on Page 25), and
we call this example Example A. There is only one attacker located in LAN5.
The attacker launches a DoS attack to the victim site V, and V initiates the
traceback procedure to determine the location of the attacker. For simplicity,
we assume that the initial values of all outgoing traffic counters of the routers
are zero (i.e., Ci(0) = 0 for all routers Ri in the network topology). The
counter values of all five routers’ outgoing traffic are taken at time instants t1
and t2. Table 2.1 depicts the outgoing traffic Ci(t) at time instants t1 and t2
for all routers, and the values of all routers’ accumulative local traffic Ni(t) at
time instants t1 and t2.
The local traffic Li(t1, t2) of each router generated within [t1, t2] are shown
in Table 2.2 wherein the computations are performed based on Equation (2.2).
[Figure 2.2 shows the timelines of R5 and R4: R5's counter is read at t1 and t2, while R4's counter is read at t1 and at a later instant t2'; a block of 20,000 packets leaves R5 after t2 but reaches R4 before t2'.]

Figure 2.2: Asynchronous reading of outgoing traffic counters in Example B.
Comparing the intensities of the local traffics of these five routers within the
interval [t1, t2], one can deduce that the domain of router R5 is the location of
the attacker.
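The numbers in Tables 2.1 and 2.2 can be replayed mechanically. The sketch below (ours, not from the thesis) applies Equations (2.1) and (2.2) to the Example A counter readings:

```python
# Immediate upstream routers in Figure 2.1; leaf routers map to empty lists.
imm_up = {'R1': ['R3', 'R4'], 'R4': ['R5'], 'R2': [], 'R3': [], 'R5': []}

# Outgoing traffic counters at t1 and t2 (Table 2.1).
C_t1 = {'R5': 20000, 'R4': 35000, 'R3': 5000, 'R2': 10000, 'R1': 65000}
C_t2 = {'R5': 50000, 'R4': 66000, 'R3': 6000, 'R2': 11000, 'R1': 99000}

def N(C):  # Equation (2.1): accumulative local traffic
    return {r: C[r] - sum(C[j] for j in imm_up[r]) for r in C}

# Equation (2.2): local traffic within [t1, t2].
L = {r: N(C_t2)[r] - N(C_t1)[r] for r in C_t1}
assert L == {'R5': 30000, 'R4': 1000, 'R3': 1000, 'R2': 1000, 'R1': 2000}
assert max(L, key=L.get) == 'R5'   # R5's domain hosts the attacker (Table 2.2)
```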
2.1.5 Difficulties of a distributed traffic measurement
By using this traffic measurement, one can easily jump to the conclusion that
a DDoS traceback is an easy task. However, we will show that there are some
deficiencies in this distributed counter reading approach. The major problem is
that Equations (2.1) and (2.2) are only correct if the network has a global clock
and all routers can perform synchronous reading of their respective outgoing
traffic counters. Let us consider Example A again but with asynchronous
reading of the counters; we call this Example B, and it is depicted in Figure 2.2.

A black rectangle in the figure represents the time instant at which the
outgoing traffic counter of a router is read. Since the second outgoing traffic
counters of R4 and R5 are not read simultaneously, some packets sent from R5
to V are not recorded by R5, but are recorded by R4. To illustrate
this numerically, C5(t2) becomes 30,000 instead of 50,000. Thus, L5(t1, t2)
becomes 10,000 while L4(t1, t′2) becomes 21,000.
This shows that asynchronous reading of the counters will mislead the
victim site to conclude that the domain of router R4 is the location of the
attacker. In the next section, we present a complete distributed algorithm
to precisely measure the local traffic of all routers in a synchronized manner
without a global clock among the routers.
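The mis-attribution in Example B is easy to reproduce numerically. In this sketch (ours), the naive Equations (2.1) and (2.2) are applied to the asynchronously read counters of R4 and R5:

```python
# Counter readings from Figure 2.2; R5 is R4's only immediate upstream router.
C_t1 = {'R5': 20000, 'R4': 35000}
C_t2 = {'R5': 30000, 'R4': 66000}   # R4's counter is read later, at t2'

L5 = C_t2['R5'] - C_t1['R5']                                 # R5 is a leaf
L4 = (C_t2['R4'] - C_t2['R5']) - (C_t1['R4'] - C_t1['R5'])   # naive Eq. (2.1)
assert (L5, L4) == (10000, 21000)   # R4's domain is wrongly implicated
```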
2.2 Distributed Algorithm
In this section, we present the complete distributed algorithm to measure the
local traffic of every router. We first define the notion of “correctness” for
measuring the local traffic of each participating router and demonstrate how
one can effectively achieve the required correctness. Besides the general proof
for the correctness of the proposed distributed algorithm, we also present an
example to illustrate the effectiveness of this distributed algorithm.
In the previous section, we used an example to illustrate that a straightforward
manner of reading the outgoing traffic counters can lead to an erroneous
conclusion (i.e., a wrong location of the attacker). The reason for this erroneous
conclusion is that the outgoing traffic counters of the immediate upstream
routers of R4 are not recorded correctly.
[Figure 2.3 shows the timelines of routers Rj and Ri, whose counters Cj(tj,k) and Ci(ti,k) are read at tj,k and ti,k respectively; the packet blocks Aji and Bji straddle the two reading instants.]

Figure 2.3: Correct accumulative local traffic without clock synchronization.
2.2.1 Reasons for incorrect traceback result
Figure 2.3 illustrates a timing diagram with two routers Ri and Rj , where
Rj is an immediate upstream router of Ri. The black rectangle in the figure
represents the time instant at which the outgoing traffic counter of a router Ri
is read, and we let this time instant be ti,k, where k ∈ {1, 2} indicates whether
the reading is taken for the first or the second time. We assume that Figure 2.3
illustrates the reading of the outgoing traffic counter for the kth time. Let
Cj(tj,k) be the value of Rj’s outgoing traffic counter at time tj,k and let Ci(ti,k)
be the value of Ri’s outgoing traffic counter at time ti,k. The Aji block in the
figure represents a sequence of packets that are sent to V through Rj before
tj,k but are received by Ri after ti,k. Correspondingly, the Bji block represents
a sequence of packets that are sent to V by Rj after tj,k but are received by
Ri before time ti,k. When Ri records the traffic counter Ci(ti,k), there may be
chances that
• the amount of traffic represented by Aji should be included in Ci(ti,k),
but it is in fact not considered; or
• the amount of traffic represented by Bji should not be included in Ci(ti,k),
but it is in fact counted.
These are the scenarios in which the mis-counting of packets happens, and
they will certainly lead to an erroneous conclusion.
2.2.2 Measuring the correct local traffic
Based on the above findings, we reconsider the formulation of the local traffic
calculation. Let the accumulative local traffic of router Ri at time ti,k
be Ni(ti,k). The accumulative local traffic of Ri at time ti,k is the difference
between the outgoing traffic of Ri at ti,k and the transit traffic received by Ri
at ti,k. Hence:
Ni(ti,k) = Ci(ti,k) − ( transit traffic received by Ri at ti,k ) .
The transit traffic received by Ri at time ti,k is the outgoing traffic sent from
all immediate upstream routers of Ri. On one hand, since packets in Aji are
not received by Ri at ti,k, but are recorded by Rj at tj,k, one needs to reduce
this traffic workload from Cj(tj,k). On the other hand, packets in Bji are
received by Ri at ti,k, but are not recorded by Rj at tj,k. Thus, one needs to
include this traffic workload in Cj(tj,k). Therefore, the correct accounting of
the accumulative local traffic of router Ri is defined as follows:

    Ni(ti,k) = Ci(ti,k) − Σ_{Rj ∈ U(Ri)} ( Cj(tj,k) − Aji + Bji ).
Let us apply the above equation back to Example B, illustrated in Figure 2.2.
The accumulative local traffic N4(t4,2) becomes:

    N4(t4,2) = C4(t4,2) − (C5(t5,2) − A54 + B54)
             = 66000 − (30000 − 0 + 20000)
             = 16000.

    ∴ L4(t4,1, t4,2) = 16000 − 15000 = 1000.
Hence, one can conclude that the attacker is in the domain of R5. In the fol-
lowing subsection, we present an efficient distributed algorithm to measure the
two sequences of packets Aji and Bji correctly so as to satisfy the correctness
criteria stated above.
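Numerically, the corrected accounting recovers R4's true local traffic in Example B (our sketch; the A54 and B54 values are read off Figure 2.2):

```python
# Second readings from Example B, with the correction terms discussed above.
C4_t2, C5_t2 = 66000, 30000
A54, B54 = 0, 20000            # 20,000 packets sent by R5 after t2 arrive before t2'

N4_t2 = C4_t2 - (C5_t2 - A54 + B54)
N4_t1 = 35000 - 20000          # the first readings were taken consistently
assert N4_t2 == 16000
assert N4_t2 - N4_t1 == 1000   # R4's domain generated only 1,000 packets
```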
2.2.3 The distributed snapshot algorithm
We make use of the result in [34] to collect all outgoing traffic counters in a
coordinated manner, and, at the same time, to determine the values of Aji
and Bji. We call this algorithm the distributed snapshot traceback algorithm
(snapshot algorithm for short).
There are three main components in this algorithm, namely (1) the marker,
(2) the local state of a participating router or the victim site, and (3) the
channel state. These three components have the following functionalities under
our DDoS application.
1. Marker: The marker is a special packet with a special header or a spe-
cial header entry. The marker is initially sent by V to all its neighboring
routers. The functionality of the marker is to facilitate all participat-
ing routers to record their local states and to derive the corresponding
channel states.
2. Local state: The routers as well as the victim site have their own
local states. For a participating router, say Ri, the local state at time t
corresponds to the value of its outgoing traffic counter Ci(t). However,
the victim site V does not have an outgoing traffic counter. Instead,
the local state of the victim site V refers to the accumulative number
of packets that V has received by time t, i.e., the aggregated traffic
destined for V from the domains of all participating routers. We denote
this accumulative incoming traffic to V at time t as TV(t). To find the
aggregated incoming traffic sent to V within the interval [t1, t2], one can
perform the following calculation:
Incoming Traffic of V

    IV(t1, t2) = TV(t2) − TV(t1).          (2.3)
3. Channel state: This corresponds to the number of packets that are
received by a router after the router records its own local state but before
that router receives the marker along that link. Its role will be thoroughly
discussed later.
The snapshot algorithm in [34] assumes that packets are delivered in the order
sent (i.e., the channels are FIFO). As two adjacent routers are connected by a
communication link or reside in the same LAN, the delivery order of the packets
can be preserved under this kind of physical connection. On the other hand,
when a router or the victim is measuring a channel state, it has to distinguish
on which channel an incoming packet arrives. One cannot depend on the
source address of the incoming packet, as it may be spoofed. We suggest that
the router refer to the level-two address (e.g., the MAC address in Ethernet) of
a packet in order to distinguish from which channel the packet comes. This
method does not bear the risk posed by attackers sending spoofed packets,
because we are interested only in the upstream router from which a packet
comes. Also, it is futile for the attacker to spoof the level-two source address
of a packet: when the packet is routed through a routing device, the
level-two address is altered and set to the hardware address of that
routing device.
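As a small illustration of this idea (ours; the MAC addresses and the dictionary are made up), a router can resolve the incoming channel of a packet from its level-two source address while ignoring the spoofable IP source:

```python
# Map the MAC address of each upstream router's interface to its link.
mac_to_link = {'00:1a:2b:3c:4d:5e': 'Link(R5, R4)',
               '00:1a:2b:3c:4d:5f': 'Link(R3, R4)'}

def channel_of(packet):
    """Classify a packet by its level-two source; unknown MACs are local hosts."""
    return mac_to_link.get(packet['src_mac'], 'local')

# The (possibly spoofed) IP source is irrelevant to the classification.
pkt = {'src_ip': '10.0.0.1', 'src_mac': '00:1a:2b:3c:4d:5e'}
assert channel_of(pkt) == 'Link(R5, R4)'
```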
2.2.4 Pseudocode and execution of snapshot algorithm

In the following, we present the flow of the snapshot algorithm. First, we
define the following notations. Let Ri and Rj be two adjacent routers
connected by two uni-directional links, namely Linki,j and Linkj,i.
Linki,j carries traffic from Ri to Rj while Linkj,i carries traffic from Rj
to Ri. Let the time instant at which Ri records its local state be ti,k, and
let the time instant at which Ri receives a marker from Linkj,i after it has
recorded its local state be ti,k^j (if the marker arrives before Ri records
its local state, then that time instant is ti,k; see the following algorithm for
details). Lastly, let Hji(ti,k, ti,k^j) be the channel state of Linkj,i, which
is the number of packets received by Ri from Rj after ti,k and before ti,k^j.
The following pseudocode shows the outline of the snapshot algorithm.
Distributed Snapshot Traceback Algorithm

Algorithm initialization

    V records the value of its incoming traffic at time t as TV(t);
    For (each link Linkv,k that connects V to a neighboring router Rk) {
        V sends a marker along Linkv,k and starts recording the number of
        packets received from Linkk,v;
    }

Marker-sending and marker-receiving rules

For the victim site V:

    If (V has received a marker from a link Linkk,v at time t') {
        V stops recording the number of packets received from Linkk,v and
        stores the value as the channel state Hkv(t, t');
    }

For each participating router Ri:

    If (Ri has received a marker from Linkj,i at time ti,k and Ri has not
        recorded its local state) {
        Ri records the value of its outgoing traffic counter as Ci(ti,k);
        Ri sets the channel state Hji(ti,k, ti,k^j) to zero;
        For (each link Linki,k that connects Ri to a neighboring router Rk) {
            Ri sends a marker along Linki,k;
        }
        Ri starts recording the number of packets received from each incoming
        link Linkk,i, except Linkj,i;
    }
    If (Ri has received a marker from Linkj,i at time ti,k^j and Ri has already
        recorded its local state) {
        Ri stops recording the number of packets from Linkj,i and stores the
        value as Hji(ti,k, ti,k^j);
    }

Termination

For each participating router Ri in the network topology:

    If (Ri has recorded its local state and has finished recording the channel
        states of all its incoming links) {
        Ri sends its snapshot data (i.e., its local state and all its channel
        states) to V;
        Ri terminates;
    }

For the victim site V:

    If (V has recorded its incoming traffic, has finished recording the channel
        states of all its incoming links, and has received the snapshot data
        sent from all participating routers) {
        V terminates;
    }
In Figure 2.4, we depict the execution of the snapshot algorithm according
to the above pseudocode. The first step is the invocation of the snapshot
algorithm, which occurs when the victim site V acknowledges that it is
under a DDoS attack (for instance, V finds that the amount of incoming traffic
has exceeded a pre-defined threshold). In the beginning of the algorithm, V
sends markers to its neighboring routers along its outgoing links (Step 1.1 in
the figure). At the same time, the victim measures the number of packets
arriving on its incoming links, and these are the channel states (Step 1.2 in
the figure).

A router, upon the arrival of a marker from the victim, returns a marker
back to the victim (Step 2.1) and keeps propagating the markers to routers
farther from the victim (Step 2.2). The victim stops measuring the
corresponding channel state when the returned marker arrives (Step 3).
Eventually, a router
[Figure 2.4 illustrates the execution on the topology of Figure 2.1. Step 1.1: markers are sent from the victim to its neighboring routers. Step 1.2: the victim starts measuring channel states. Step 2.1: upon receipt of the marker from the victim, the router returns a marker back to the victim. Step 2.2: the router keeps propagating the markers to other routers. Step 3: markers return to the victim and it stops measuring the channel states. Step 4: the local state of a router is sent back to the victim (routers R2 and R4).]

Figure 2.4: An example execution of the snapshot algorithm.
sends its local state to the victim when it has stopped measuring all the channel
states (router R4, Step 4) or when it has no channel states to measure
(router R2, Step 4).
The properties of this series of coordinated actions among the routers,
achieved by the sending and receiving of markers, are as follows:

1. to guarantee that Bji = 0, and

2. to ensure that the measured value Hji(ti,k, ti,k^j) is equivalent to Aji.
After a router has finished recording its local state and the channel states of
all its incoming links, it sends this information to V. The algorithm terminates
after V has finished recording its local state and channel states, and has
received the local states and the channel states from all participating routers.
In addition, Lemma 2.1 proves the above-mentioned properties of the algorithm.
Lemma 2.1 For any two adjacent routers Ri and Rj which are connected by
the link Linkj,i, the snapshot algorithm guarantees that Bji = 0, and correctly
measures Aji as the channel state of Linkj,i.

Proof. We first prove that Bji = 0, and then prove that Aji is the channel
state Hji(ti,k, ti,k^j) of Linkj,i.
To illustrate, we depict the ideas of the proof in Figures 2.5 and 2.6. In
the figures, the black rectangle represents the time instant at which the value
of the outgoing traffic counter of a router is recorded. The shaded rectangle
represents the time instant at which a marker arrives at Ri after the value of the
outgoing traffic counter of Ri is recorded. The dotted line is the transmission of
the sequence of packets Aji or Bji from Rj to Ri while the solid line represents
the transmission of the marker from Rj to Ri.
For both figures, case 1 corresponds to the scenario that Ri records its local
state Ci(ti,k) because it receives the marker from Rj along Linkj,i. For case 2,
[Figure 2.5 shows the two cases on the timelines of Rj and Ri: in case 1, Ri records Ci(ti,k) upon receiving the marker from Rj; in case 2, Ri records Ci(ti,k) before the marker arrives. In both cases Bji = 0.]

Figure 2.5: Bji = 0 under all circumstances.
[Figure 2.6 shows the corresponding two cases for Aji: in case 1, Aji = 0; in case 2, Aji equals the channel state Hji(ti,k, ti,k^j).]

Figure 2.6: Aji is the channel state.
Ri has already recorded its local state Ci(ti,k) before the arrival of the marker
from Rj along Linkj,i.
In Figure 2.5, we show that Bji = 0 under all circumstances. When the
router Rj records its local state Cj(tj,k), it sends markers along all its outgoing
channels. Since Linkj,i is FIFO, no packet of Bji can reach Ri before the
marker arrives at Ri. Therefore, Bji is equal to zero in both cases.
We now prove that Aji is equivalent to the channel state of Linkj,i. Recall
that Aji represents a sequence of packets that are sent to V by Rj before tj,k
but are received by Ri after ti,k. When the router Rj records its local state
Cj(tj,k), it sends markers to all its outgoing channels. There are two cases to
consider:

(A) In case 1 of Figure 2.6, no packet of Aji can reach Ri after ti,k because
Linkj,i is FIFO: every packet of Aji is sent before the marker, which itself
arrives at ti,k. Therefore, Aji = 0.

(B) In case 2 of Figure 2.6, all packets of Aji reach Ri before ti,k^j due to
the FIFO property of Linkj,i, since they are sent before the marker. Ri
records its local state Ci(ti,k) at time ti,k and starts counting the packets
arriving along Linkj,i; it stops counting when it receives the marker from
Rj at ti,k^j. The packets that arrive within [ti,k, ti,k^j] constitute the
channel state Hji(ti,k, ti,k^j) of Linkj,i. Since any packet sent by Rj
after tj,k arrives only after the marker, every packet counted in this
interval belongs to Aji, and, by the definition of Aji, every packet of Aji
arrives in this interval. Thus, Aji is equal to the channel state
Hji(ti,k, ti,k^j) of Linkj,i.
Finally, by Lemma 2.1, the equation for calculating the accumulative local
traffic Ni(ti,k) is as follows:

    Ni(ti,k) = Ci(ti,k) − Σ_{Rj ∈ U(Ri)} ( Cj(tj,k) − Hji(ti,k, ti,k^j) ).          (2.4)
To summarize, the aim of the traceback algorithm is to measure the local
traffic of every router within the time interval [t1, t2]. The victim site V initiates
the snapshot algorithm twice: once at time t1 and another at time t2. Based on
the local states and the channel states received from all routers, the victim site
calculates the consistent local traffic counters by applying Equation (2.4). In
turn, the victim calculates the local traffic intensity of each router Ri by
Equation (2.2).
2.2.5 Example in calculating the traceback result
In this subsection, we illustrate the DDoS traceback algorithm through an
example by using the network topology shown in Figure 2.7. For simplicity,
the LANs of the routers are not shown and we assume that all the routers
[Figure 2.7 shows a network topology rooted at the victim ν with routers R1, R2, R3, and R4; attackers reside in the local domains of R3 and R4.]

Figure 2.7: A network topology with two attackers who reside in the local domains of R3 and R4.
have their own local domains. The attackers are located in the domains of the
routers R3 and R4 as indicated in the figure.
Figure 2.8 illustrates how the DDoS traceback algorithm works, we first
explain the symbols shown the figure. A black rectangle represents the time
instant that a router Ri records the value of its outgoing traffic counter, and
we denote this time instant as ti,k where k represents the kth instance of the
snapshot algorithm, k ∈ [1, 2]. In addition, the corresponding value of the
outgoing traffic counter is shown besides the black rectangle. For the victim
site V, the black rectangle represents the time instant that the victim site
V records the number of incoming packets. We denote this time instant as
tk where k ∈ [1, 2]. For the simplicity of illustration, we assume that the
initial values of the outgoing traffic counters of all routers and the initial value
of the accumulative incoming traffic of the victim site V are zero. On the
other hand, a shaded rectangle represents the time instant that a router or
the victim stops recording a channel state, and we denote this time instant as
ti,k^j (the superscript j means that the marker comes from Linkj,i). Similar to the presentation of the local state, the value of the channel state is shown beside the shaded rectangle. This figure also shows the time instant at which the
Figure 2.8: A timing diagram that shows the progress of the distributed snapshot traceback algorithm.
domain of a router transmits packets to V. A sequence of normal packets is
represented by a white circle in the figure, and each white circle represents
10 packets (shown as na, nb, nc, and nd in the figure). On the other hand,
a sequence of malicious packets from the attackers is represented by a black
circle, and each black circle represents 100 packets (ma and mb in the figure).
Based on the values shown in Figure 2.8, one can apply Equation (2.4) to
calculate the accumulative local traffic of the routers and the victim site, and
the corresponding results are shown in Table 2.3. In order to have a clearer
illustration, we show the calculation of the accumulative local traffic N2(t2,1)
of R2 as an example. Referring to Figure 2.7, the immediate upstream routers
of R2 are R1 and R3. Then, we apply Equation (2.4) as follows:
N2(t2,1) = C2(t2,1) − (C1(t1,1) − H12(t2,1, t2,1^1)) − (C3(t3,1) − H32(t2,1, t2,1^3))
         = 0 − (10 − 10) − (0 − 0) = 0.
Similarly, one can follow the above procedure to calculate the accumulative
local traffic of R1, R3, and R4 in both instances of the snapshot algorithm.
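The worked calculation above translates directly into code. The following Python fragment is our own illustration (the function name is ours; the counter and channel-state values come from the first snapshot instance of the example); it applies Equation (2.4) to recover a router's consistent accumulative local traffic from its own counter, its upstream routers' counters, and the recorded channel states:

```python
def accumulative_local_traffic(C_i, upstream_counters, channel_states):
    # Equation (2.4): N_i = C_i - sum over upstream routers R_j of (C_j - H_ji)
    return C_i - sum(C_j - H_ji
                     for C_j, H_ji in zip(upstream_counters, channel_states))

# Router R2 in the first snapshot instance (upstream routers R1 and R3):
# C2 = 0; C1 = 10 with channel state H12 = 10; C3 = 0 with H32 = 0.
N2 = accumulative_local_traffic(0, [10, 0], [10, 0])
print(N2)  # 0, matching the worked example
```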
Accumulative local traffic at time t

Time        N1(t)   N2(t)   N3(t)   N4(t)   TV(t)
t = ti,1      10       0       0     100       0
t = ti,2      20      10     100     110     230

Table 2.3: Computation of accumulative local traffic at times ti,1 and ti,2.
Local traffic from ti,1 to ti,2

                  R1    R2    R3    R4
Li(ti,1, ti,2)    10    10   100    10

Table 2.4: The local traffic intensity counts only the packets in between the two instances of the snapshot algorithm.
After calculating all accumulative local traffic for both instances of the
snapshot algorithm, one can apply Equation (2.2) to obtain the local traffic
intensity of every router. The result is shown in Table 2.4. The most significant
property of the DDoS traceback algorithm is that only the packets that are
sent within the two instances of the snapshot algorithm will be recorded. We
refer to Figure 2.8 again to illustrate this property. The figure shows that a total of 240 packets (packet sequences na, nb, nc, nd, ma, and mb) have been sent towards the victim site V. However,
the sequences of packets na and mb are sent before the routers participate in
the traceback algorithm and, thus, these packets are not recorded. Therefore,
one can conclude that if an attacker at Ri has sent a massive number of packets within the two instances ti,1 and ti,2 of the snapshot algorithm, he/she will be discovered by the traceback methodology.
To conclude this section, we have presented the DDoS traceback algorithm, which records the local traffic of the routers correctly without requiring a global clock. A proof has also been given to show the correctness of the algorithm. Moreover, one can gain a clear understanding of how the snapshot algorithm
works through the presented example.
2.3 Interpreting the Traceback Result
In the previous section, we presented the DDoS traceback algorithm which
enables the victim site V to correctly compute the local traffic of every partic-
ipating router. Referring back to the example in Figure 2.8, one can observe that the local administrative domain of router R3 is the location of an attacker by comparing the local traffic intensities in Table 2.4. Since the attacker in R3
sent the sequence of malicious packets ma to the victim site V, the calculated
local traffic L3(t3,1, t3,2) of R3 is significantly larger than those of other routers.
Nevertheless, from Figure 2.8, one can notice that another attacker in R4 has also sent a sequence of malicious packets mb to V before t4,1, but these malicious packets are not revealed in Table 2.4, so one cannot determine that R4 is another location of an attacker. The reason is that the sequence of packets mb is not sent to V within [t4,1, t4,2]. Therefore, the malicious packets mb, which are sent by R4 before t4,1 and are received by V within [t1, t2], are not recorded as the local traffic of R4. This leads to a problem relating the local traffic of a router Ri to the incoming traffic of the victim site V.
According to Table 2.4, the sum of all local traffic is 10 + 10 + 100 + 10 = 130. However, the incoming traffic received by V within [t1, t2] is:
IV(t1, t2) = TV(t2) − TV(t1) = 230 − 0 = 230.
The total number of packets generated by all routers within the two instances
of the snapshot algorithm is not equal to the number of packets received by
V within [t1, t2]. We call this the traffic inequality. The traffic inequality suggests that the local traffic of a router may not arrive at V within [t1, t2], and thus one should not rely only on the local traffic of each router to determine the locations of the attackers. In the following subsections, we
will investigate this problem and we will illustrate a methodology to locate all
potential attackers.
2.3.1 Investigation of the traffic inequality
In this subsection, we present a detailed analysis of the traffic inequality. To
start our analysis, we first classify the packets that are sent from the domain of a router Ri to its downstream routers into three categories, based on the time at which Ri records its local state: (1) pre-monitoring, (2) monitoring, and (3) post-monitoring packets. They are formally defined in Definition 2.3.
Definition 2.3 We define pre-monitoring, monitoring and post-monitoring
packets with respect to the time that the router Ri records its local state:
1. A packet sent from the local administrative domain of Ri is called a pre-
monitoring packet if and only if the packet is sent before Ri records its
local state in the first instance of the snapshot algorithm (ti,1).
2. A packet sent from the local administrative domain of Ri is called a
monitoring packet if and only if the packet is sent after Ri records its
local state in the first instance of the snapshot algorithm (ti,1), and before
Ri records its local state in the second instance of the snapshot algorithm
(ti,2).
3. A packet sent from the local administrative domain of Ri is called a
post-monitoring packet if and only if the packet is sent after Ri records
its local state in the second instance of the snapshot algorithm (ti,2).
Figure 2.9, which shows a timing diagram of a router Ri, illustrates the
three categories of traffics. All packets which are sent before the first instance
Figure 2.9: Classification of pre-monitoring, monitoring and post-monitoring packets.
of the snapshot algorithm are pre-monitoring packets. The packets which are
sent between two snapshots are the monitoring packets, and these packets are
actually the local traffic of Ri. The packets which are sent after the second
instance of the snapshot algorithm are the post-monitoring packets.
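Definition 2.3 amounts to a simple comparison against the two recording instants. The sketch below is our own illustration (the function name and timestamps are hypothetical):

```python
def classify_packet(send_time, t_i1, t_i2):
    # Definition 2.3: classify a packet from R_i's domain by its send time
    # relative to the instants t_i1 and t_i2 at which R_i records its local
    # state in the two instances of the snapshot algorithm.
    if send_time < t_i1:
        return "pre-monitoring"
    if send_time < t_i2:
        return "monitoring"
    return "post-monitoring"

print(classify_packet(0.5, 1.0, 2.0))  # pre-monitoring
print(classify_packet(1.5, 1.0, 2.0))  # monitoring
print(classify_packet(2.5, 1.0, 2.0))  # post-monitoring
```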
Based on the above classification, one can gain a better understanding of the snapshot algorithm. Let Rj be a downstream router of Ri. The channel state recorded by Rj in the first snapshot consists of the pre-monitoring packets from Ri entering the monitoring region of Rj, e.g., na and mb in Figure 2.8.² Similarly, the channel state recorded by Rj in the second snapshot consists of the monitoring packets from Ri entering the post-monitoring region of Rj, e.g., nd in Figure 2.8.
We analyze the effect of the channel states recorded by the routers from the
point of view of the victim site. We denote the aggregated channel states as the
sum of all channel states recorded in an instance of the snapshot algorithm.
Let δ(k) be the number of packets in the aggregated channel states of the kth
²Note that these pre-monitoring packets may also enter the post-monitoring region of Rj, but they will not affect our analysis because they are canceled out in the calculation of the local traffic.
instance of the snapshot algorithm, i.e.,

δ(k) = Σ_{Ri,Rj∈G} Hji(ti,k, ti,k^j), where k ∈ [1, 2]. (2.5)
During the first instance of the snapshot algorithm, δ(1) represents all pre-
monitoring packets that are received in the monitoring region of the victim site.
Similarly, during the second instance of the snapshot algorithm, δ(2) represents
all monitoring packets which are received in the post-monitoring region of the
victim site. Referring to the example in Figure 2.8, δ(1) = na + mb = 110 and
δ(2) = nd = 10.
As a matter of fact, the monitoring packets sent from router Ri are the
local traffic of Ri. If these packets are received only in the monitoring region,
i.e., within [t1, t2], of the victim site V, the traffic inequality problem will not
exist. However, the pre-monitoring packets of the aggregated channel states in
the first instance of the snapshot algorithm arrive at V within [t1, t2]; therefore,
the victim site actually receives both the monitoring and the pre-monitoring
packets from all routers within [t1, t2]. Also, the monitoring packets of the
aggregated channel states in the second instance of the snapshot algorithm
arrive at V after t2. Hence, the victim site does not receive all monitoring
packets from the routers.
Let the local traffic of Ri be Li(ti,1, ti,2), where i ∈ [1, . . . , n], and let
IV(t1, t2) be the incoming traffic of the victim site V within [t1, t2]. According
to the above observation, we have the following equation relating IV(t1, t2),
Li(ti,1, ti,2) of Ri, δ(1), and δ(2):
IV(t1, t2) = Σ_{Ri∈G} Li(ti,1, ti,2) + δ(1) − δ(2). (2.6)
The interpretation of Equation (2.6) is as follows. IV(t1, t2) is composed of the
monitoring packets from all the routers and the pre-monitoring packets of the
aggregated channel states δ(1). Thus, IV(t1, t2) is the sum of Σ_{Ri∈G} Li(ti,1, ti,2) and δ(1). However, the monitoring packets in the aggregated channel states
δ(2) are received by the victim site V after t2. Therefore, δ(2) is subtracted from
the above sum. Referring to the example in Figure 2.8, we have the following
result by using Equation (2.6):
Σ_{Ri∈G} Li(ti,1, ti,2) + δ(1) − δ(2) = 130 + 110 − 10 = 230.
The above value is exactly equal to the incoming traffic IV(t1, t2) of V. To summarize, the traffic inequality is compensated for by Equation (2.6).
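This compensation can be checked numerically. The short sketch below is our own illustration; the values are taken from Figure 2.8 and Table 2.4:

```python
# Local traffic of the routers between the two snapshot instances (Table 2.4).
local_traffic = {"R1": 10, "R2": 10, "R3": 100, "R4": 10}
delta_1 = 110  # aggregated channel states, first instance (na + mb)
delta_2 = 10   # aggregated channel states, second instance (nd)

# Equation (2.6): incoming traffic of the victim site within [t1, t2].
I_V = sum(local_traffic.values()) + delta_1 - delta_2
print(I_V)  # 230, equal to TV(t2) - TV(t1)
```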
2.3.2 Calculating bounds for the number of packets arriving at the victim site
Recall from the previous subsection that we have two important observations:

1. The packets arriving at the victim site within [t1, t2] not only include the monitoring packets from the routers, but also include the pre-monitoring packets of the aggregated channel state δ(1), and
2. The monitoring packets of the aggregated channel state δ(2) arrive at the
victim site after t2.
These observations imply that one cannot directly use the local traffic Li(ti,1, ti,2) of the participating routers to find the locations of the attackers.
To overcome this problem, one has to identify the originating domains of the
pre-monitoring packets as well as the monitoring packets in the channel states.
Consider the illustration in Figure 2.10 which contains the same topology
as in Figure 2.7 as well as a timing diagram that shows the time-lines of R2,
R3, and R4. The channel state of Link3,2 is measured by the router R2, and it
contains the sequence of packets my. However, the packet sequence mx may
also be included in the channel state of Link3,2 since mx may arrive at router
R3 before R3 records its local state. Thus, the packets in the channel state of
Link3,2 may be sent from R2’s upstream routers R3 and R4. In summary, let
Figure 2.10: The channel state of Link3,2 contains pre-monitoring (monitoring) packets from both R3 and R4 in the first (second) instance of the snapshot algorithm.
Rj ∈ U(Ri), and let Ri and Rj be connected by Linkj,i. A non-empty channel state Hji(ti,k, ti,k^j) of Linkj,i is composed of the pre-monitoring (monitoring) packets from the upstream routers of Ri in the first (second) instance of the snapshot algorithm, and these packets come through Linkj,i.
Based on the above observation, one cannot determine the originating do-
mains of the packets in the channel states. This implies that one cannot cal-
culate an exact number of packets that have arrived at the victim site within
[t1, t2] from each participating router. However, we provide a methodology to
determine upper and lower bounds for these packets. Let Rr be an upstream router of Rq, and let Hpq(tq,k, tq,k^p)|Rr represent the exact number of packets which are sent from the domain of Rr and contribute to the channel state Hpq(tq,k, tq,k^p) of Linkp,q. Hence, the channel state of Linkp,q is the sum of Hpq(tq,k, tq,k^p)|Rr over all upstream routers Rr of Rq, and the corresponding equation is as follows:

Hpq(tq,k, tq,k^p) = Σ_{Rr∈U(Rq)} Hpq(tq,k, tq,k^p)|Rr. (2.7)
Also, recall from Section 2.1.2 that D(Ri) represents the set of downstream routers of Ri. The number of monitoring packets that are generated by the
domain of Ri and arrive at V after t2 is:

Σ_{Rp,Rq∈D(Ri)} Hpq(tq,2, tq,2^p)|Ri. (2.8)
Equation (2.8) represents the number of monitoring packets which are sent
from the domain of Ri and are recorded as the channel states of the downstream
router of Ri in the second instance of the snapshot algorithm. Li(ti,1, ti,2)
represents the number of monitoring packets sent by Ri within the snapshot
interval. Since the packets in Equation (2.8) are the monitoring packets that
are not received at V within [t1, t2], the number of monitoring packets which
are sent from Ri and are received by V in [t1, t2] is:
Li(ti,1, ti,2) − Σ_{Rp,Rq∈D(Ri)} Hpq(tq,2, tq,2^p)|Ri. (2.9)
Let L∗i(ti,1, ti,2) be the number of packets which are sent from Ri and are received by the victim site V within [t1, t2] (the real local traffic). These packets are composed of two components: (i) the monitoring packets sent from Ri that arrive at V within [t1, t2], which are given by Equation (2.9), and (ii) the pre-monitoring packets sent from Ri that arrive at V within [t1, t2], which are given as follows:
Σ_{Rp,Rq∈D(Ri)} Hpq(tq,1, tq,1^p)|Ri. (2.10)
Thus, by Equations (2.9) and (2.10), the real local traffic L∗i(ti,1, ti,2) of Ri is represented as follows:

L∗i(ti,1, ti,2) = Li(ti,1, ti,2) − Σ_{Rp,Rq∈D(Ri)} Hpq(tq,2, tq,2^p)|Ri + Σ_{Rp,Rq∈D(Ri)} Hpq(tq,1, tq,1^p)|Ri. (2.11)
Note that it is possible for some pre-monitoring packets to arrive at V after t2, for example when the interval [t1, t2] is not long enough. However, those
packets will not affect the correctness of the calculation of the real local traffic L∗i(ti,1, ti,2), because they are recorded in both Equations (2.8) and (2.10) and, as shown in Equation (2.11), they cancel out.
Let Upper(L∗i) and Lower(L∗i) be the upper and lower bounds of the real local traffic L∗i(ti,1, ti,2) respectively. To find the bounds of L∗i(ti,1, ti,2) in Equation (2.11), one can observe that

Hpq(tq,k, tq,k^p) ≥ Hpq(tq,k, tq,k^p)|Ri.

Therefore, Upper(L∗i) and Lower(L∗i) are:

L∗i(ti,1, ti,2) ≤ Li(ti,1, ti,2) + Σ_{Rp,Rq∈D(Ri)} Hpq(tq,1, tq,1^p), and (2.12)

L∗i(ti,1, ti,2) ≥ Li(ti,1, ti,2) − Σ_{Rp,Rq∈D(Ri)} Hpq(tq,2, tq,2^p). (2.13)
Referring to Figure 2.8 on Page 44, Upper(L∗4) and Lower(L∗4) are:

Upper(L∗4) = L4(t4,1, t4,2) + H43(t3,1, t3,1^4) = 110, and
Lower(L∗4) = L4(t4,1, t4,2) − H43(t3,2, t3,2^4) = 0.
Since the attacker in R4 sends the sequence of malicious packets mb to V,
Upper(L∗4) is significantly higher than the others. This suggests that the do-
main of R4 is also a possible location of the attackers. In the next section,
we present the simulation results that show the effectiveness of our distributed
methodology.
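The bound computation above can be sketched as follows. This is our own illustration (the function name is ours); the values reproduce the example of R4 from Figure 2.8, where the channel state of Link4,3 contains the 100 packets of mb in the first instance and the 10 packets of nd in the second:

```python
def real_local_traffic_bounds(L_i, H_downstream_first, H_downstream_second):
    # Equations (2.12) and (2.13): bound the real local traffic L*_i using
    # the channel states recorded at R_i's downstream routers in the first
    # and second instances of the snapshot algorithm.
    upper = L_i + sum(H_downstream_first)
    lower = L_i - sum(H_downstream_second)
    return upper, lower

upper, lower = real_local_traffic_bounds(10, [100], [10])
print(upper, lower)  # 110 0
```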
2.4 Performance Evaluations
In the previous sections, we presented the DDoS traceback algorithm to deter-
mine the intensity of local traffic for each participating router. In this section,
we carry out two different sets of simulations to demonstrate the effectiveness
of our proposed methodology. In the first set of simulations (Simulation A),
Figure 2.11: (a) Network topology and (b) legend (upper bound of local traffic, local traffic, lower bound of local traffic) for Simulations A and B.
we use a simple network topology, as depicted in Figure 2.11(a), to illustrate the correctness and robustness of our algorithm under various factors (e.g., different traffic-generation processes, different distributions of the attackers' locations, etc.). For the second set of simulations (Simulation B), we extend the performance study to a large-scale, realistic Internet topology.
Simulation A (Correctness and robustness of DDoS traceback al-
gorithm): This set of simulations evaluates the correctness of the proposed
DDoS traceback algorithm. For this set of simulations, we use a network topol-
ogy in Figure 2.11(a), which contains six routers. The packets are generated by two methods: (1) a constant rate (e.g., an average rate of 100 pkts/sec implies that every 0.01 second, a router generates a new packet to the victim site V), and (2) an exponential on/off process (i.e., packets are sent at a fixed rate during the "on" periods, and no packets are sent during the "off" periods). Both the on and off periods are drawn from an exponential distribution. The average durations of the on and off periods are set to 100ms in this set of simulations. The bandwidth and the delay of each link are set to 100Mbps and 50ms respectively.
Simulation A.1 (Bounds for the local traffic): In this simulation, there is one
attacker, who is located in the domain of R3. The attack traffic rate of R3 is set to a constant 500 pkts/sec, while the normal traffic rates of all other routers are set to a constant 100 pkts/sec. The victim site V initiates the
DDoS traceback algorithm to determine the location of the attackers. Figure
2.11(b) shows the legend for the various graphs in our simulations. Figure 2.12 shows the upper bound of the real local traffic Upper(L∗i), the lower bound of the real local traffic Lower(L∗i), and the real local traffic L∗i for all six routers under four different measurement intervals. The snapshot time intervals of the four cases are 1, 2, 3, and 4 seconds respectively. The upper and lower bounds of the real local traffic are computed based on Equations (2.12) and (2.13).
The real local traffic L∗i is the number of packets sent from router Ri and received by the victim site V within the snapshot time interval. Note that the real local traffic L∗i is available only in the simulation environment. In Figure
2.12, one can observe that:
1. The real local traffic L∗i lies between Upper(L∗i) and Lower(L∗i), which
means that our DDoS traceback algorithm can successfully bound the
exact number of packets sent from router Ri and received by the victim
site V in the snapshot time interval.
2. The difference between the bounds of the real local traffic decreases if
we increase the duration of the measurement interval.
3. The lower bound of real local traffic of the attack domain R3 is signifi-
cantly higher than the upper bound of real local traffic of other routers.
This implies that we can locate the source of the attack traffic.
4. Lastly, we observe that the measurement interval can be very short (e.g., 4 seconds), and one can quickly determine that the domain of R3 is the location of the attacker.
Figure 2.12: Simulation A.1. Bounds for the real local traffic under constant traffic rate (snapshot time intervals of 1, 2, 3, and 4 seconds).
Figure 2.13: Simulation A.2. The real local traffic under an exponential on/off process (snapshot time intervals of 1, 2, 3, and 4 seconds).
Figure 2.14: Simulation A.3. Effect of multiple attackers on the real local traffic bounds (snapshot time intervals of 1, 2, 3, and 4 seconds).
Simulation A.2 (Exponential on/off process for packet generation): In this
simulation, we consider a packet generation process that is based on an exponential on/off process. We use the same network topology as in Figure 2.11(a) and repeat a simulation similar to Simulation A.1. The average durations of the on and off periods are set to 100ms in this simulation. Figure 2.13 illustrates the simulation results. We observe that even if the
packet generation process is governed by an on/off process, the algorithm is
robust enough to accurately determine the local traffic intensities of all partic-
ipating routers. The same conclusions as in Simulation A.1 can be drawn for this simulation.
Simulation A.3 (Multiple attackers): In this simulation, there are two attack domains, located in R3 and R5. We repeat a simulation similar to Simulation A.2. The average on and off periods are set to 100ms. The local traffic rate of each attack domain is set to 500 pkts/sec, while the local
Figure 2.15: Simulation A.4. Effect of new attackers' locations (snapshot time intervals of 1, 2, 3, and 4 seconds).
traffic of each normal domain is 100 pkts/sec. In Figure 2.14, we observe that the lower bounds of the real local traffic of the domains of R3 and R5 are significantly higher than the upper bounds of the real local traffic of the other domains in all cases. Therefore, our DDoS traceback algorithm can effectively and quickly determine the locations of multiple attackers.
Simulation A.4 (Varying the attackers' locations): In this simulation, we consider different locations for the two attackers and analyze the effect. We repeat the same simulation as in Simulation A.3, but the two attackers are in routers R2 and R4. In Figure 2.15, one can observe that the lower bounds of the real local traffic of the domains of R2 and R4 are significantly higher than the upper bounds of the real local traffic of the other domains in all cases. We conclude that our methodology is robust and is not sensitive to the location distributions of the attackers.
Figure 2.16: Simulation A.5. On different attack traffic rates (measured percentage of attack traffic against the attack traffic rate, for snapshot intervals of 1, 2, 5, 10, and 100 seconds).
Simulation A.5 (Different attack traffic rates): In this simulation, we investigate the effect of different attack traffic rates on the traceback result. As in Simulation A.1, the normal traffic is sent at a rate of 100 pkt/sec, and the attacker, in the domain of R3, sends out packets at a constant bit rate. Now, however, we carry out the simulation with attack traffic rates ranging from 50 pkt/sec to 1000 pkt/sec in steps of 25 pkt/sec. Our aim is to investigate (1) what percentage of the aggregated traffic received by the victim the attack traffic contributes, and (2) over what range of attack traffic rates the traceback methodology is effective in locating the attackers.
In Figure 2.16, there are five different plots of the measured attack traffic
percentage against the attack traffic rate at five different snapshot intervals:
one, two, five, ten, and a hundred seconds. The attack traffic percentage is
calculated by dividing the local traffic of R3 by the total number of received
packets. Firstly, according to the figure, the percentage of the measured attack
traffic decreases as the snapshot interval increases. Nevertheless, the decreasing percentage eventually converges to a certain value, as shown in the plot for the 100-second snapshot interval.
We now describe the way in which the victim locates the attackers. We define a filtering threshold: if the percentage that a domain's traffic contributes to the aggregated traffic exceeds this threshold, that domain is considered an attacking domain. Eventually, the corresponding network administrator will be notified and will start filtering the large flow. If one sets the threshold to 50%, then, referring to the figure, one can find only the attackers with traffic rates greater than or equal to 500 pkt/sec, as labeled by the coordinates (500, 50.51). As another example, if the threshold is set to 30%, then the victim can find attackers with traffic rates greater than 225 pkt/sec.
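The filtering-threshold test described above can be sketched as follows (our own illustration; the domain names and traffic rates are hypothetical):

```python
def attacking_domains(traffic_by_domain, threshold_pct):
    # Flag the domains whose share of the aggregated traffic exceeds the
    # filtering threshold (given in percent).
    total = sum(traffic_by_domain.values())
    return [domain for domain, pkts in traffic_by_domain.items()
            if 100.0 * pkts / total > threshold_pct]

# Five normal domains at 100 pkt/sec and one attack domain at 600 pkt/sec.
rates = {"R1": 100, "R2": 100, "R3": 600, "R4": 100, "R5": 100, "R6": 100}
print(attacking_domains(rates, 50))  # ['R3'], which contributes about 54.5%
```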
According to this simulation, two main factors affect the effectiveness of locating the attack traffic:
1. The ratio of the attack traffic to the normal traffic. According to the simulation
results, the methodology may fail to detect the attacker if the attack
traffic rate is not large enough.
2. The total number of domains. If the total number of domains that the traceback methodology is monitoring is large, then even if every innocent domain sends only a small amount of traffic, the attack flow will not dominate significantly, which lowers the percentage of the attack flow. In this situation, the administrator may need to lower the filtering threshold. When the total number of domains is small, a similar analysis applies, and the threshold should be set to a large value so that innocent domains are not misclassified as attacking domains.
Figure 2.17: Simulation B. Simulation for a large-scale Internet topology (average number of packets for attack Classes 1-5 and the normal domains, at snapshot time intervals of 5, 10, 15, and 20 seconds).
In conclusion, the set of results in Simulation A shows the following find-
ings.
1. As the snapshot interval becomes longer, the upper bound and the lower
bound of the local traffic become closer to the measured local traffic.
2. The methodology can quickly locate the attackers with dominating large flows; however, its effectiveness is subject to the magnitude of the attack flow and the total number of domains in the network.
Simulation B (Simulations on a large-scale realistic Internet topology): To validate the correctness of the theoretical bounds of the local traffic, we extend the performance study to a large-scale, realistic Internet topology. We use the Internet topology from [62]. The testing dataset in our simulations contains 1,000 distinct routers. The source of the traceroute is considered as the victim site V, and the traceroute dataset is considered as the map of the upstream routers. We use this dataset to construct a network simulation
test-bed based on NS-2. There are five classes of attack traffic rates. The at-
tack traffic rates of Classes 1, 2, 3, 4, and 5 are 150 pkts/sec, 175 pkts/sec, 200
pkts/sec, 225 pkts/sec, and 250 pkts/sec respectively. There are 10 attackers
in each class and the attackers are evenly distributed in the different domains
of the network. The local traffic rate of a normal domain is 10 pkts/sec. Packets are generated according to an exponential on/off process. Both the average on and off periods are set to 100ms in this simulation. The bandwidth and the
delay of each link are set to 100Mbps and 20ms respectively.
The victim site V initiates the DDoS traceback algorithm to determine the
location of the attackers. Figure 2.17 shows the upper bound of the real local traffic Upper(L∗i), the lower bound of the real local traffic Lower(L∗i), and the real local traffic L∗i for the five classes of attackers as well as for the normal routers. In this simulation, we have four different measurement intervals, with durations of 5, 10, 15, and 20 seconds respectively. The attack domain that has the largest upper bound of real local traffic within its class is selected, and its traffic rates are plotted in the figure. From Figure 2.17, one can observe that
L∗i lies between Upper(L∗i) and Lower(L∗i) for all classes, which means that our algorithm can successfully bound the real local traffic of all participating routers. When the snapshot time interval increases from 5s to 20s, we observe that the spread of the bounds of the local traffic tends to decrease. Therefore, the estimation of the real local traffic L∗i becomes more accurate for a longer snapshot time interval. On the other hand, we can see that the lower bound of the real local traffic of each of the five attack domains is significantly higher than the upper bound of the real local traffic of the normal domains. This implies that we can quickly (e.g., within 20 seconds) and accurately (based on the differences of the bounds) determine the locations of the attackers.
2.5 Implementation Issues
In previous sections, we have shown that our DDoS traceback algorithm is
effective in locating potential attackers and filtering attack packets. The distributed
traceback algorithm relies on the assumption that the victim site V has a map of its upstream routers. In this section, we illustrate that this assumption
is reasonable and practical. We also show that our proposed distributed
traceback algorithm leverages existing traceback technologies and can
complement existing infrastructure such as the ICMP traceback technique [63].
However, we cannot show that the overhead of the proposed method is small
on high-end routers; we can show only that the overhead of the traceback
service is not significant on a set of experimental Linux routers.
2.5.1 Topology construction
There are several ways to obtain a map of upstream routers for a given victim site.
Many network management tools exist for this mapping, for example a tool based
on traceroute from Lucent Bell Labs [62] and a tool based on ICMP echo
requests from CAIDA [64]. In these techniques, the victim site V sends packets
to probe hosts that are k ≥ 1 hops away. Each packet contains a TTL field
which is decremented by one for each traversed link. When the TTL reaches
zero, the router sends a reply back to V. This form of probing provides the
router adjacency information, which helps V build a map of upstream
routers.
Another efficient method to obtain an upstream map is to store the router
adjacency, or edge, information in the packets themselves. Approaches like probabilistic
packet marking [30, 48] encode the router adjacency information into the
packet header. Other approaches like itrace [63, 65, 66] generate separate ICMP
packets with router adjacency information and send them to a victim site V. When V receives
these packets, it extracts the router adjacency information to build a map of
upstream routers.
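The assembly of such edge samples into an upstream map can be sketched as follows. This is an illustrative sketch, not the actual tool from [62] or [64]; the router names and the sample format are assumptions.

```python
from collections import defaultdict

def build_upstream_map(edge_samples):
    """Collect (upstream_router, downstream_router) adjacency samples,
    e.g. decoded from marked packets or ICMP traceback messages, into a
    map: router -> set of immediate upstream routers."""
    upstream = defaultdict(set)
    for up, down in edge_samples:
        upstream[down].add(up)   # duplicates collapse into the set
    return dict(upstream)

# Hypothetical samples: edges toward the victim V.
samples = [("R1", "V"), ("R2", "R1"), ("R3", "R1"), ("R2", "R1")]
print(sorted(build_upstream_map(samples)["R1"]))  # -> ['R2', 'R3']
```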
Note that when one invokes our proposed distributed traceback methodology,
the traceback occurs only within the map of upstream routers. In
other words, only routers within the map need to participate in the distributed
traceback. It is possible that some leaf routers are at the edge of the
map and, at the same time, are connected to some routers outside the map.
In this case, all transit and local traffic of such a leaf router will be considered
as the leaf router's local traffic. One can progressively apply
the distributed traceback algorithm to this type of leaf router to determine
the source of the attack. For example, if the local traffic of a leaf router is very
high, then one can treat this leaf router as a victim site and initiate the
distributed traceback algorithm again.
However, storing the map is a heavy burden for an ordinary host because the map is huge.
Nevertheless, techniques like the Sink Hole [28] can help forward traffic
to a data processing center hosted by the ISP. The data collection process and
the network map storage can then be done in that dedicated host. The only
weakness of this approach is that if many hosts request the
traceback service, the load on the data processing center will become heavy.
2.5.2 System overhead
The proposed traceback methodology runs on the victim site and the
participating routers. For most of the execution time, each router has to
update its traceback data whenever a packet passes through it. The processing
of the outgoing counters, the markers, and the channel states incurs
an inevitable overhead on the router. However, we cannot provide any
solid data about this issue: nowadays high-end routers,
which are deployed world-wide, do not provide any programmable feature for
us to modify the router and measure the overhead of our proposed traceback
methodology. Nevertheless, on the low-end side, the Linux-based router provides
us a possible choice of programmable router.
We have implemented a programmable router prototype together with our
proposed DDoS traceback algorithm, named OPERA [67], by introducing
new modules into the netfilter [68] framework of the Linux system. Although the system
overhead on high-end routers remains a subject of future research, we
provide a system overhead analysis on low-end routers to show, firstly, that the
proposed methodology is implementable and deployable, and, secondly, that the
proposed methodology does not involve complex computation and hence incurs
only a small overhead on low-end routers.
Each router has to install and load the modules provided in OPERA. Although
it is difficult to carry out experiments to measure the overhead directly,
the work done by Harris et al. [69] supports our claim that the
system overhead for a Linux router is not expensive.
Harris et al. [69] carried out experiments to test the firewall in Linux
machines, i.e., iptables. We focus on their latency test,
which shows that the performance degrades as the number of filter rules
increases. The experiments in [69] show that when filtering on IP
addresses, TCP/UDP ports, and MAC addresses, the per-packet latency increases
linearly with the number of rules, at approximately 0.12, 0.66, and 0.68 µs/rule,
respectively.
In the OPERA project, we utilize iptables, but we also introduce new
functionality by inserting routines into the hook points of the netfilter. In our
implementation of the snapshot algorithm, the inserted routine handles only two
events: 1) it updates a variable whenever a packet comes in, and 2) it responds
instantly to incoming markers. The first event can be handled efficiently as it
is just a variable update, and the second event requires only matching
the source of an incoming marker and injecting a new marker.
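A minimal user-space model of the inserted routine, covering only the two events above (the real implementation is a kernel module attached to netfilter hooks; all names here are hypothetical):

```python
class SnapshotHook:
    """Models the routine inserted at a netfilter hook point:
    (1) bump the accumulative outgoing counter for each forwarded packet;
    (2) on a marker, record the counter once and re-inject a marker
    upstream (modeled here by returning its identifier)."""
    def __init__(self):
        self.counter = 0      # accumulative outgoing traffic counter
        self.recorded = None  # counter value recorded at marker arrival

    def on_packet(self):
        self.counter += 1     # event 1: a single variable update

    def on_marker(self, source):
        if self.recorded is None:   # event 2: record state once
            self.recorded = self.counter
        return ("marker", source)   # forward a new marker upstream

hook = SnapshotHook()
for _ in range(5):
    hook.on_packet()
hook.on_marker("victim")
print(hook.recorded)  # -> 5
```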
These two events are analogous to a filtering routine. Thus, the introduced
system overhead will be as light as using iptables, which is
widely used in Linux systems nowadays, with only two filtering rules introduced.
On the other hand, for each victim site, OPERA only needs to allocate a
set of memory for each outgoing interface and another set of memory for each
channel state. Hence, this involves only a small amount of memory3, and it is scalable
to several hundred registered victim sites4. We argue that this traceback
service is a privileged service: there will not be many sites paying for this
service except those with high popularity. It is unusual for a router to handle
several thousand victim sites simultaneously. If this does happen, it is
a sign of another level of DDoS attack in which the attack target is the traceback
mechanism itself. Handling this requires a distributed authentication protocol among
the routers and the victim sites, which is beyond the scope of this thesis and is
considered future work.
Further, if compromised hosts send requests to trace DDoS
attacks that are not really happening, our system can be overwhelmed by the
malicious hosts. If a host is compromised, the most important issue is
the ability of a router, a victim, or a third party (e.g., a Certificate Authority,
CA) to discover its malicious identity. In most cases, there is no way for any
entity to distinguish whether a request is coming from a compromised host or
not: the compromiser most likely has the private
information of that host, such as the encryption keys and the authentication
secrets, and she is free to invoke the traceback service even if an authentication
protocol is implemented between the routers and the clients. Hence,
the method to discover the malicious identity of a compromised host is beyond
the scope of our traceback system.
3 For example, a router with three incoming interfaces and one outgoing interface only needs four variables, and each variable needs four bytes (an unsigned long integer) for one set of snapshot data. The total memory usage is only 16 bytes.
4 There are several hundred Kbytes of memory available in the kernel of Linux.
2.5.3 Implementation issue based on ICMP traceback
Our proposed distributed traceback methodology can complement and leverage
the current ICMP traceback [63]. The main idea of ICMP traceback is that
each router samples the packets it forwards with a low probability (e.g.,
1/20000), generates an authenticated ICMP traceback message for each
sampled packet, and forwards the message to the victim site. The ICMP traceback message
carries information about the router: on which link interfaces packets arrive and
depart, as well as information about its previous and next routers. During
a DoS attack, a victim can use these ICMP traceback messages to reconstruct
an attack path.
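The sampling step can be sketched as follows. This is an illustrative model, not the itrace implementation; the message fields and router names are assumptions.

```python
import random

def forward(packets, p_sample, rng):
    """Forward packets; with probability p_sample, emit an ICMP
    traceback message describing the router's adjacency (sketch)."""
    messages = []
    for pkt in packets:
        if rng.random() < p_sample:
            messages.append({"sampled": pkt, "router": "R",
                             "prev": "R_up", "next": "R_down"})
    return messages

# With p_sample = 1/20000, on average only one message is generated per
# 20000 forwarded packets, which is why ICMP traceback needs a large
# attack volume before the victim can rebuild a path.
msgs = forward(range(20000), 1 / 20000, random.Random(1))
print(len(msgs))
```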
We view our proposed distributed traceback methodology and the
ICMP traceback infrastructure as complementary. The ICMP traceback approach
encodes a map of upstream routers in the ICMP traceback messages;
therefore, a victim site can use the information in ICMP traceback messages
to build a map of upstream routers. A router running the ICMP traceback
service has the capability to associate a packet with the input port or MAC
address on which the packet arrived. This capability can help the routers count
the incoming and outgoing traffic in our distributed traceback algorithm. One
can also use the existing ICMP traceback infrastructure so that routers can
send local states and channel states back to the victim site. Another important
point is that the ICMP traceback provides an authentication service.
This can also be applied to authenticate the victim site and the senders of the
marker, local state, and channel state information. The main disadvantage
of the existing ICMP traceback is that, due to the low probability of generating
ICMP messages, it requires many attack packets to pass through a router
before the locations of the attackers can be identified. On the contrary, our distributed
traceback algorithm can trace the locations of the attackers in a short period
of time. Hence, one can achieve a more effective traceback by using
our proposed distributed traceback algorithm in conjunction with the ICMP
traceback service.
2.5.4 An alternative to aggregate congestion control and
push-back
Our proposed methodology can be treated as an alternative to aggregate
congestion control [24] (ACC for short; a brief survey is included in Section
1.4.2). The ACC, like our proposed approach, also requires modifications
of the routers and introduces inevitable overheads. The reasons why our approach
can be an alternative to the ACC mechanism are as follows:
• The modification of the routers required by the ACC approach is much more complex
than the modification brought about by our approach, and this implies
a heavier overhead for the routers.
• The classification of the aggregates is not a light burden for the router.
E.g., the router may have to match the "characteristics" of every incoming
packet against every definition of the known aggregates. As suggested
in [24], the classification of aggregates depends on the rules known to the
routers. If the number of rules is large, which is quite certain to be true
in order to have effective aggregate detection, the burden will be
large. As the overhead of our approach grows only with the number of victims
while the ACC approach has an inevitably large overhead, our approach
can be a better choice.
2.5.5 Special deployment - acyclic network
Through the example presented in Section 2.2.5 on Page 42 as well as the simulations presented in Section 2.4 on Page 53, we have shown how the distributed
Figure 2.18: (a) An acyclic network with one attacker who resides in the local domain of R3. (b) R3 maintains two accumulative outgoing traffic counters C3,1(t) and C3,2(t) for the links Link3,1 and Link3,2, respectively.
snapshot traceback algorithm works in a tree topology. We now consider the
case that the network is an acyclic one.
A tree is obviously an acyclic network. However, a network in which a router
has multiple outgoing links, without forming any cycle, is also
an acyclic network. The Chandy-Lamport distributed snapshot
algorithm can be applied to acyclic networks. In the following, we repeat the
procedure taken in Section 2.2.5 (on Page 42) to demonstrate that the
distributed snapshot traceback algorithm works on acyclic networks as well.
Figure 2.18(a) shows an example acyclic network. The attacker is inside
the domain of router R3. Unlike the other routers in the example
network in Figure 2.7 on Page 43, R3 has two outgoing links. Since R3 has two
outgoing links, it is natural that there should be two corresponding accumulative
outgoing traffic counters for R3: one for Link3,1 and one for Link3,2. However,
the algorithm presented in Section 2.2.4 (on Page 36) supports only
one outgoing traffic counter. To remedy this problem, we make the following
amendments to the distributed snapshot traceback algorithm.
Supporting multiple outgoing links
Instead of having one accumulative outgoing traffic counter for each router,
we set the number of accumulative outgoing traffic counters to the number
of outgoing links. Denote C_{i,j}(t) as the value of the accumulative outgoing
traffic counter for Link_{i,j} at time instant t. During the kth instance of the
snapshot algorithm, denote the time instant when R_i receives a marker from
Link_{j,i} and records the value of the counter C_{i,j}(t) as t_{(i,j),k}, and denote the
time instant when R_i receives a marker from Link_{p,i} after it has recorded the
value of any one of its accumulative outgoing traffic counters as t^p_{(i,j),k}. Lastly,
we abuse the notation t_{i,k} and use it to denote the logical time at which R_i
calculates its accumulative local traffic.
Then the accumulative local traffic, instead of using Equation (2.4) on
Page 42, is computed as follows:

N_i(t_{i,k}) = C_i(t_{i,k}) − Σ_{R_j ∈ U(R_i)} ( C_{j,i}(t_{j,k}) − H_{ji}(t_{(i,j),k}, t^j_{(i,j),k}) ),   (2.14)

where C_i(t) = Σ_{R_j ∈ D(R_i)} C_{i,j}(t) and D(R_i) is the set of immediate downstream
routers of R_i.
Figure 2.19 shows the timing diagram for a traceback executed on the
acyclic network shown in Figure 2.18(a). Note that, in the figure, we also depict
the routes of the packets after they have passed through R3 because there are
two downstream routers for the packets to choose from.
With the above change in the calculation of the accumulative local traffic,
we now verify that the correct local traffic values are calculated.
Figure 2.19: Another timing diagram that shows the progress of the distributed snapshot traceback algorithm. (The figure marks the instants of recording the accumulative outgoing counters, the instants of stopping the channel-state recording, and the sending of 100 malicious and 10 normal packets.)
It is expected that the values of the local traffic of R3 and R4 are 100 and 10,
respectively. Since the calculation for R4 is obvious, we present the calculation
of the local traffic of R3 as follows.
N_3(t_{3,1}) = (C_{3,2}(t_{(2,3),1}) + C_{3,1}(t_{(1,3),1})) − (C_{4,3}(t_{(3,4),1}) − H_{43}(t_{(2,3),1}, t^4_{(2,3),1}))
           = (10 + 0) − (10 − 0) = 0.

N_3(t_{3,2}) = (C_{3,2}(t_{(2,3),2}) + C_{3,1}(t_{(1,3),2})) − (C_{4,3}(t_{(3,4),2}) − H_{43}(t_{(2,3),2}, t^4_{(2,3),2}))
           = (100 + 20) − (20 − 0) = 100.

∴ L_3(t_{3,1}, t_{3,2}) = 100.
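The calculation above can be checked numerically with a short sketch of Equation (2.14); the function name is hypothetical, and the counter readings are those of Figure 2.19.

```python
def accumulative_local_traffic(outgoing_counters, upstream_terms):
    """N_i = sum of the outgoing counters, minus, for each upstream
    router, its counter reading less the recorded channel state,
    following Equation (2.14)."""
    return sum(outgoing_counters) - sum(c - h for c, h in upstream_terms)

# R3's readings at the two snapshot instances (Figure 2.19):
n3_first = accumulative_local_traffic([10, 0], [(10, 0)])     # = 0
n3_second = accumulative_local_traffic([100, 20], [(20, 0)])  # = 100
print(n3_second - n3_first)  # -> 100, the local traffic L3
```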
In summary, to support a network graph with routers that have more than
one outgoing edge, such routers have to maintain more than one accumulative
outgoing traffic counter, and the calculation of the accumulative local traffic
is changed accordingly, as shown in Equation (2.14).
2.5.6 Partial deployment
Our traceback scheme has been shown to provide a meaningful traceback result
by the previous analysis and simulations. However, we have assumed
that all the routers involved in the traceback are equipped with the traceback
ability. In this subsection, we discuss the possibility for our scheme to provide
a meaningful traceback result under a partial deployment environment, in which
not all involved routers know the traceback protocol. First, we
introduce the following terms: we call a router with the traceback scheme deployed
a deployed router, and a router without the traceback scheme deployed
an undeployed router.
Our idea for supporting partial deployment is to treat the local traffic
generated from an undeployed router as the local traffic of its nearest downstream
deployed router. Thus, if an attacker is located in the domain of an
undeployed router, then its downstream deployed router will report a high level
of local traffic. This suggests that an attacker is hiding in either the deployed
router's domain or one of its undeployed upstream routers.
However, the tradeoff in providing the partial deployment is the introduc-
tion of a set of strict conditions. The conditions are as follows.
1. The last mile router of the victim must be a deployed router. If
the last mile router of the victim is an undeployed router, then the local
traffic of the last mile router will become the local traffic of the victim,
which is not a reasonable result because the victim should not generate
any local traffic.
2. An undeployed router neither processes nor drops the marker
packet. The undeployed router is transparent to the traceback protocol;
thus the marker is simply forwarded by the undeployed router.
3. Each deployed router knows whether a router in the Internet
Figure 2.20: (a) The same example network as Figure 2.7 with attacking domains R3 and R4, but the router R3 is an undeployed router. (b) Logically, a virtual link between the routers R2 and R4 is formed.
map is a deployed router or an undeployed router. From the
viewpoint of a deployed router, in order to send markers to its nearest
upstream deployed routers, the deployed router needs to know the locations
of those routers. In a partial deployment environment, the nearest
upstream deployed router may not be the immediate neighboring upstream
router. Therefore, for practical reasons, each deployed router is required
to know the locations of all the deployed and the undeployed routers.
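Assuming each deployed router holds the upstream map and the deployment status of every router, its nearest upstream deployed routers could be computed as in the following sketch (names hypothetical; the network is assumed acyclic, so the walk terminates):

```python
def nearest_deployed_upstream(router, upstream_map, deployed):
    """Walk upstream from `router`, passing through undeployed routers
    transparently, and return the nearest deployed router on each path."""
    result, stack = set(), list(upstream_map.get(router, ()))
    while stack:
        r = stack.pop()
        if r in deployed:
            result.add(r)                          # a marker is sent to r
        else:
            stack.extend(upstream_map.get(r, ()))  # skip undeployed r
    return result

# Topology of Figure 2.20: V <- R2 <- {R1, R3}, R3 <- R4; R3 undeployed.
upstream_map = {"V": ["R2"], "R2": ["R1", "R3"], "R3": ["R4"]}
print(sorted(nearest_deployed_upstream("R2", upstream_map,
                                       {"R1", "R2", "R4"})))
# -> ['R1', 'R4']: R2's markers go to R1 and, past the undeployed R3, to R4
```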
Deployment illustration
To illustrate how the partial deployment works, we revisit the example in
Figures 2.7 and 2.8 in Section 2.2.5. This time, we change R3 to an
undeployed router, as shown in Figure 2.20(a).
When the traceback starts, the victim sends a marker to the router R2.
When R2 receives the marker, it determines the set of routers to which markers
should be sent. According to the third condition (a deployed router knows
where its nearest upstream deployed routers are), the router R2 finds
that its nearest upstream deployed routers are R1 and R4. Then, R2 sends
Figure 2.21: A timing diagram that shows the progress of the DDoS traceback algorithm under the partial deployment environment.
two markers destined for R1 and R4 accordingly. As the undeployed router
R3 only forwards the marker packets, eventually all deployed routers will be
invoked. Meanwhile, the router R2 is instructed to record the channel states
until the markers from routers R1 and R4 arrive. From the viewpoint of router
R2, when it records the channel state, it no longer records the channel state
of the physical link Link3,2. Instead, a virtual link Link4,2 is established, as
shown in Figure 2.20(b), and the router R2 monitors this virtual link.
Figure 2.21 shows the timing diagram of the traceback under the
partial deployment environment. It shows the same scenario as Figure
2.8 in Section 2.2.5 except that the router R3 is an undeployed router; thus there
is no reading recorded by R3. Also, the timing diagram depicts that the router
R2 is recording the channel state of the virtual link Link4,2. We now calculate
the accumulative local traffic of R2 by using Equation (2.4).
N_2(t_{2,1}) = C_2(t_{2,1}) − (C_1(t_{1,1}) − H_{12}(t_{2,1}, t^1_{2,1})) − (C_4(t_{4,1}) − H_{42}(t_{2,1}, t^4_{2,1}))
           = 0 − (10 − 10) − (100 − 100) = 0.

N_2(t_{2,2}) = C_2(t_{2,2}) − (C_1(t_{1,2}) − H_{12}(t_{2,2}, t^1_{2,2})) − (C_4(t_{4,2}) − H_{42}(t_{2,2}, t^4_{2,2}))
           = 230 − (20 − 0) − (110 − 10) = 110.
The local traffic of R2 is 110, which is the sum of the local traffic of R2 and R3
in the full deployment environment (see Table 2.4). Hence, the local traffic of
the undeployed router R3 is attributed to the downstream deployed router R2.
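This calculation can be checked numerically with a short sketch of Equation (2.4); the function name is hypothetical, and the counter readings are those of Figure 2.21, with R4's terms reaching R2 over the virtual link Link4,2.

```python
def local_traffic_partial(own_counter, upstream_terms):
    """N_2 = C_2 minus, for each (deployed) upstream router, its counter
    reading less the recorded channel state, following Equation (2.4)."""
    return own_counter - sum(c - h for c, h in upstream_terms)

# R2's readings at the two snapshot instances (Figure 2.21):
n2_first = local_traffic_partial(0, [(10, 10), (100, 100)])   # = 0
n2_second = local_traffic_partial(230, [(20, 0), (110, 10)])  # = 110
print(n2_second - n2_first)  # -> 110 = R2's own 10 + undeployed R3's 100
```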
Problem in measuring channel states
However, there is one problem with the introduction of the virtual link. To
illustrate, consider the scenario in Figure 2.22. According to the snapshot
algorithm, the router R1 should record the channel states of the virtual links
Link3,1 and Link4,1. But what R1 actually records is the channel state of the physical
link Link2,1. As R2 routes packets from both R3 and R4, the channel state of the physical link Link2,1
is a mixture of the channel states of Link3,1 and Link4,1, and as a result, R1 is not
able to distinguish the two virtual links. This problem is illustrated by the
zoomed-in part of Figure 2.23. After the marker from R3 arrives, the traffic on the physical
link Link2,1 belongs only to the virtual link Link4,1.
To remedy the problem, the following approximation is applied: distribute
the mixed channel state into two shares proportional to the measured
local traffics of R3 and R4. Mathematically, we have the following. Denote
the mixed channel state measured by R1 on the physical link Link2,1 as
H_{21}(t_{1,1}, t^3_{1,1}), and denote the channel state measured by R1 on the physical
Figure 2.22: (a) In this example network, the router R2 is an undeployed router while the others are deployed routers. (b) As the undeployed router is transparent to the traceback protocol, the router R1 records the channel states of the virtual links Link3,1 and Link4,1.
Figure 2.23: The timing diagram under a partial deployment environment. A drawback is that the channel states of the virtual links Link3,1 and Link4,1 become indistinguishable at router R1.
link after R1 receives a marker from R3 as H_{21}(t^3_{1,1}, t^4_{1,1}). Also, denote the local
traffic measured at R3 and R4 as L_3(t_{3,1}, t_{3,2}) and L_4(t_{4,1}, t_{4,2}), respectively.
Then, the channel states H_{31}(t_{1,1}, t^3_{1,1}) and H_{41}(t_{1,1}, t^4_{1,1}) are given as follows:

H_{31}(t_{1,1}, t^3_{1,1}) = H_{21}(t_{1,1}, t^3_{1,1}) × L_3(t_{3,1}, t_{3,2}) / (L_3(t_{3,1}, t_{3,2}) + L_4(t_{4,1}, t_{4,2}));   (2.15)

H_{41}(t_{1,1}, t^4_{1,1}) = H_{21}(t_{1,1}, t^3_{1,1}) × L_4(t_{4,1}, t_{4,2}) / (L_3(t_{3,1}, t_{3,2}) + L_4(t_{4,1}, t_{4,2})) + H_{21}(t^3_{1,1}, t^4_{1,1}).   (2.16)
The rationale of this solution is based on the assumption that if the ratio
of the local traffic of R3 to the local traffic of R4 is a certain value (say
x) within the snapshot interval, then, during the time shortly before and after the
snapshot interval, the ratio will very likely remain around x. Therefore,
we distribute the mixed channel state into two shares according to the ratio x.
Note that this solution is scalable: even if the undeployed routers form a
sub-network, the scheme still works, provided that the sub-network routes
packets in a FIFO manner.
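A sketch of the proportional split, in which each virtual link receives a share of the mixed channel state proportional to its own measured local traffic and the tail after R3's marker belongs wholly to Link4,1; the function name and the sample values are hypothetical.

```python
def split_mixed_channel_state(h_mixed, h_tail, l3, l4):
    """Split the mixed channel state H21(t_{1,1}, t^3_{1,1}) between the
    two virtual links in proportion to the measured local traffics L3
    and L4; the tail H21(t^3_{1,1}, t^4_{1,1}) goes wholly to Link4,1."""
    h31 = h_mixed * l3 / (l3 + l4)            # share for virtual Link3,1
    h41 = h_mixed * l4 / (l3 + l4) + h_tail   # share for virtual Link4,1
    return h31, h41

# Hypothetical readings: 30 mixed packets, 5 tail packets, L3:L4 = 1:2.
h31, h41 = split_mixed_channel_state(h_mixed=30.0, h_tail=5.0,
                                     l3=100.0, l4=200.0)
print(h31, h41)  # -> 10.0 25.0; together they account for all 35 packets
```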
2.6 Chapter Summary
In this chapter, we propose a distributed traceback methodology for DDoS
attacks such that a victim site can locate attackers on the fly. During the
execution of the algorithm, a router only has to perform a light-weight procedure
that keeps track of (i) the number of packets forwarded to a victim
site and (ii) the number of transit packets on all its incoming links during the
recording of the router's local state. With these two pieces of information,
a victim site can accurately determine the intensity of a router's local
traffic. We also present an efficient algorithm so that a victim site can accurately
determine the number of packets (with upper and lower bounds) from
each router during the victim's measurement interval. This set of information
can help the victim determine the locations of the attackers within a short
measurement interval, whether or not the attack packets are spoofed. We carry
out simulations to illustrate that our methodology is effective independent of
the attack traffic volume, the attack traffic patterns, and the location
distribution of the attackers. We believe that the proposed distributed traceback
methodology can complement and leverage the existing ICMP traceback
so that a more efficient and accurate traceback can be obtained.
One drawback of the distributed snapshot traceback algorithm is that it is
not sensitive to flows that are not significantly dominating. According to the results
obtained from the simulations, our methodology can only rank the sizes of the
flows so that one can pinpoint the dominating flows. If all of the flows are of
similar volumes, no conclusion can be made from the traceback result; or, the
only conclusion is that every domain is launching a distributed denial-of-service
attack simultaneously, which implies a global-scale attack.
Chapter 3
Probabilistic Packet Marking
Algorithm
The distributed snapshot traceback algorithm introduced in the previous chapter
provides an efficient algorithm for measuring the traffic generated
by domains or ISPs. As suggested in Chapter 1, after the distributed
snapshot traceback algorithm has pinpointed the ISPs that send the dominating
flows, the next step is to focus on the traceback within the targeted ISPs, using
a microscopic traceback algorithm. Because of the different structure of the
network inside an ISP, the distributed snapshot traceback algorithm may not
be the best traceback algorithm there. However, rather than reinventing the
wheel, we choose a widely accepted algorithm, the probabilistic
packet marking algorithm, as the microscopic traceback algorithm.
The probabilistic packet marking algorithm (PPM algorithm for short)
proposed by Savage et al. [30] has attracted the most attention for contributing the
idea of IP traceback. The most interesting point of this IP traceback approach
is that it allows routers to encode certain information in the attack packets
with a pre-determined probability. Upon receiving a sufficient number of
marked packets, the victim can construct the set of paths the attack packets
traversed, and hence the victim can obtain the location(s) of the attacker(s).
In our case, the victim is not the true victim but the micro-traceback
processing node. Nonetheless, we still call the place where the attack traffic
concentrates the victim, and we treat the victim as a normal client in the
remainder of the thesis.
3.1 Structure of This Chapter
In this chapter, we present an in-depth introduction to the PPM algorithm. In
Section 3.2, we describe the goal as well as the structure of the PPM algorithm.
In Section 3.3, we present the assumptions of the PPM algorithm. Section 3.4
illustrates the PPM algorithm so that readers can become familiar
with its execution. Lastly, Section 3.5 concludes this
chapter.
3.2 Goal and Structure of the PPM Algorithm
The goal of the PPM algorithm is to obtain a constructed graph that
contains the attack graph, where the attack graph is the
set of paths the attack packets traversed, and the constructed graph is the graph
returned by the PPM algorithm.
3.2.1 Global network and attack graph
We depict the meaning of an attack graph through the example in Figure 3.1.
The figure shows a simple network with both legitimate users and
attackers attached; we name this network the global network. Since the attack
graph is defined as the set of paths traversed by the attack packets, not all of
the global network belongs to the attack graph: the attack graph contains only
the affected routers and edges, as depicted in Figure 3.2.
Figure 3.1: A typical case of a DDoS attack toward the victim V.
Nevertheless, it is always hard to decide whether a packet is legitimate or
not. Consequently, there may be cases where the constructed attack graph contains more
nodes and edges than the actual attack graph. As depicted in Figure
3.1, the legitimate traffic is mixed with the attack traffic (at router R1). As it
is not easy to make a fast and accurate decision about the legitimacy of a
packet (because the source address of the packet may be spoofed), an attack
graph that includes routers and edges not traversed by the attack
packets is also accepted, and we call such a graph a relaxed attack graph.
3.2.2 Constructed graph
To fulfill the goal of obtaining the attack graph, [30] suggested a method to encode
the information of the edges of the attack graph into the attack packets through
cooperation between the routers and the victim site. After collecting enough
encoded packets, the victim builds a constructed graph based on the encoded
information. Thus, a constructed graph is the result returned by the PPM
algorithm.
We now define the correctness of the constructed graph. A constructed
Figure 3.2: The illustration of an attack graph: (a) an attack graph is not the entire network; the attack graph is the set of paths traversed by attack packets; (b) the attack graph may become larger than the actual one due to the uncertain legitimacy of the packets.
graph must contain the attack graph as its sub-graph. When the PPM algorithm
stops and returns such a graph, the PPM algorithm returns
a correct result; otherwise, the constructed graph is an incorrect one. We
formally define the correctness of the constructed graph in Definition 3.1.
Definition 3.1 A constructed graph returned by the PPM algorithm is correct
if and only if the constructed graph contains the attack graph as a sub-graph.
It is important to note that Definition 3.1 includes both the case when the constructed
graph is the same as the attack graph and the case when the constructed
graph is a relaxed attack graph.
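Since both the attack graph and the constructed graph can be represented by their edge sets, Definition 3.1 reduces to an edge-set containment test, sketched below with hypothetical edges.

```python
def is_correct(constructed_edges, attack_edges):
    """Definition 3.1: the constructed graph is correct iff it contains
    every edge of the attack graph (edge-set containment)."""
    return set(attack_edges) <= set(constructed_edges)

# Hypothetical attack graph toward the victim V:
attack = {("R7", "R4"), ("R4", "R1"), ("R1", "V")}
relaxed = attack | {("R3", "R1")}  # one extra edge: a relaxed attack graph
print(is_correct(relaxed, attack),
      is_correct(attack - {("R1", "V")}, attack))  # -> True False
```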
3.2.3 Structure of the PPM algorithm
The PPM algorithm is made up of two separate procedures: the
packet marking procedure, which is executed on the router side, and the path
reconstruction procedure, which is executed on the victim side.
The packet marking procedure is designed to randomly encode edge information
on the packets arriving at the routers. By using this information, the
victim then executes the path reconstruction procedure to construct the attack
graph. We first briefly review the packet marking procedure so that readers
can become familiar with how the routers mark information on the packets.
A brief review of the packet marking procedure
The packet marking procedure aims to encode every edge of the attack graph,
and the routers encode the information in the following three marking fields of
an attack packet: the start, the end, and the distance fields (wherein [30] has
discussed the design of the encoding of the marking fields). In the following,
we describe how a packet stores the information about an edge in the attack
graph, and the pseudocode of the procedure from [30] is given in Figure 3.3
for reference.
When a packet arrives at a router, the router determines how to process
the packet based on a random number x (line #1 in the pseudocode). If x
is smaller than the pre-defined marking probability pm, the router chooses to
start encoding an edge. In other words, the probability that the router starts
encoding an edge is pm. The router sets the start field of the incoming packet
to the router’s address, and resets the distance field of that packet to zero.
Then, the router forwards the packet to the next router.
When the packet arrives at the next router, that router again chooses whether
it should start encoding another edge. Suppose that, this time, the router
chooses not to start encoding a new edge. The router then finds that the previous
router has started marking an edge, because the distance field of the packet
is zero. Accordingly, the router sets the end field of the packet to its own
address. In addition, the router increments the distance field of the packet
by one so as to mark the end of the encoding. Now, the start and the end
fields together encode an edge of the attack graph. For this encoded edge to be
received by the victim, every successive router should choose not to start encoding
an edge, i.e., the case x ≥ pm in the pseudocode, because a packet can encode
Packet Marking Procedure(Packet w)

1. Let x be a random number in [0 . . . 1)
2. If x < pm, then
3.     write router's address into w.start and 0 into w.distance
4. else
5.     If w.distance = 0 then
6.         write router's address into w.end
7.     end If
8.     increment w.distance by one
9. end If
Figure 3.3: The pseudocode of the packet marking procedure of the PPM algorithm.
only one edge. Further, every successive router will increment the distance
field by one so that the victim will know the distance of the encoded edge.
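As a concrete illustration, the marking logic of Figure 3.3 can be sketched in Python as follows. The Packet class and the string router addresses are illustrative assumptions of this sketch, not part of the original design in [30]:

```python
import random

class Packet:
    """A packet carrying the three PPM marking fields."""
    def __init__(self):
        self.start = None      # router that started encoding an edge
        self.end = None        # router that completed the edge encoding
        self.distance = None   # hops travelled since the encoding started

def mark(packet, router_addr, pm):
    """One execution of the packet marking procedure (Figure 3.3)."""
    x = random.random()                # line #1: x is uniform in [0, 1)
    if x < pm:                         # start encoding a new edge
        packet.start = router_addr
        packet.distance = 0
    else:
        if packet.distance == 0:       # the upstream router just started an edge
            packet.end = router_addr   # complete the edge (start, end)
        if packet.distance is not None:
            packet.distance += 1       # later routers increment the distance

# A packet travelling along the path R7 -> R4 -> R1 towards the victim:
pkt = Packet()
for router in ("R7", "R4", "R1"):
    mark(pkt, router, 0.04)
```

With a small pm such as 0.04, most packets arrive unmarked, which is why the victim must collect many packets before every edge of the attack graph has been encoded at least once.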
Path reconstruction procedure
The path reconstruction procedure is the final step to build the constructed
graph. The procedure works with the encoded packets, and it extracts the
edge information from every packet. Note that, to prevent attackers from spoofing
the packets, the victim has to know the global network (not just the attack graph),
and the procedure will eliminate the abnormal edge information (line #8 in
Figure 3.4). A subtle note is that, as the name of the procedure suggests,
this procedure works only with paths. However, this does not prevent the procedure
from handling multiple paths at once.
Path reconstruction procedure

1. Let G be a tree with root v, where v is the victim.
2. Let every edge in G be (start, end, distance).
3. For each packet w from attacker
4.     if w.distance == 0 then
5.         insert edge (w.start, v, 0) into G;
6.     else
7.         insert edge (w.start, w.end, w.distance) into G;
8.     remove any edge (x, y, d) with d ≠ distance from x to v in G;
9. extract path (Ri . . . Rj) by enumerating acyclic paths in G;
Figure 3.4: The pseudocode of the path reconstruction procedure of the PPM algorithm.
3.3 Assumptions
The PPM algorithm is not a versatile methodology that can tackle all kinds of
distributed denial-of-service attacks such as the TCP-SYN flood attack and the
reflector attack. There are assumptions imposed on the execution environment
and the algorithm itself.
3.3.1 Marked packets and PPM markings
A packet received by the victim may contain information that is needed to
reconstruct the attack graph. According to [30], the marked information is
sliced and distributed over the encoded packets. The victim can then reconstruct
the information by collecting enough fragments.
In our context, the PPM markings are the information reconstructed by
collected packets. To simplify the discussion, we do not focus on
the reconstruction of the sliced-edge information. Rather, we use the term
marked packet to denote a virtual packet that contains a set of reconstructed marking
information.
Assumption 3.1 It is assumed that a marked packet can always be recon-
structed from slices of encoded information.
3.3.2 Router
We assume that every router in the network is willing to participate in the PPM
algorithm when requested. For each router, we assume that it is equipped with
the ability to mark packets following the packet marking procedure.
Functionally speaking, a router can either be a transit router or a leaf
router: a transit router forwards traffic from upstream routers to its down-
stream routers (or the victim), while a leaf router connects to the upstream
client computers (not routers) and forwards the clients' traffic to its down-
stream routers (or the victim).
3.3.3 Packet marking probability
One of the characteristics of the PPM algorithm is to mark packets randomly.
The randomness is controlled by a variable called the marking probability, pm,
and every (participating) router owns a copy of this variable. There is no
restriction on the packet marking probability pm. However, allowing every router
to have a different value of pm would complicate our discussion, so we make
the following assumption.
Assumption 3.2 It is assumed that every participating router uses the same
value of the packet marking probability pm throughout the execution of the
PPM algorithm.
Though this assumption sounds impractical when the PPM algorithm is
deployed on a worldwide scale, a fixed marking probability becomes natural
when the PPM algorithm is deployed within an ISP. In addition, there is
work in the literature which claims that, by varying the marking
probability of each router, the number of packets required to construct the
correct constructed graph can be minimized [70].
3.3.4 Attack source and attack pattern
An attack source is the end-host that sends packets to the victim (not neces-
sarily a high volume of traffic). Usually, the number of attack sources can be
on the order of thousands, and the aggregated volume is therefore overwhelming.
Though such an attack source may not be the attacker (the attack source
may only be a zombie), it is necessary to stop such an overwhelming flow by
locating the sources. We therefore treat every attack source as an attacker.
Assumption 3.3 An attacker is the end-host (the leaf router of an attack
graph) that sends an attacking flow toward the victim.
A flood-based DDoS attack, according to its name, attacks the victim
by flooding the victim with packets, loads the victim with an extraordinary
amount of traffic, and hence disables or degrades the service provided by the
victim. However, there is no defined pattern by which the attackers bombard
the victim. It can be a continuous flow, a bursty periodic strike, etc. For
simplicity, we make an assumption about the attack pattern as follows.
Assumption 3.4 Every attacker sends out a continuous flow of packets. Also,
every attacker sends approximately the same number of packets toward the
victim.
Note that if this assumption does not hold, say the attack pattern is actually
bursty, then the obtained attack graph may not cover all the attackers.
3.3.5 Attack graph and packet routing
The attack graph generated by the PPM algorithm has a very strong depen-
dence on the routings inside the global network graph since the attack graph is
formed by the traversals of the packets. Nevertheless, due to the autonomous
property of the network routers, the routings inside the Internet graph may
be changed under abnormal situations. Unfortunately, a flood-based DDoS
attack is one of the abnormal situations. A high volume of flows generated by
the DDoS attack creates a congested environment within the Internet graph.
This may drive the routers to change their routings so as to cope with such a
change (though such adjustments are usually futile).
Eventually, the attack graph may change because of the changes in
the routings inside the Internet graph, and the topology of the attack graph is,
therefore, short-lived. Nevertheless, our goal is to locate the attackers. The
short-lived property of the attack graph does not hinder us from achieving our
goal, on the condition that the attack graph, from time to time, pinpoints
the locations of the attackers.
Thus, the target of the PPM algorithm should not be to find a consistent
attack graph. Rather, the target of the PPM algorithm is to locate the
attackers through the construction of the attack graph. We make the following
strong assumption.
Assumption 3.5 During the time that the PPM algorithm is executing, the
routings inside the global network graph should not change.
We now illustrate how the PPM algorithm reacts to a change
of the routings. In Figure 3.5(a), we have a network showing all the network
links and the current routing in the network. When one of the routers, R1, goes
down, the failure causes the routing table of every router to change completely,
as shown in Figure 3.5(b).
Figure 3.5: The failure of the router R1 causes the route tables of R2, R3, and R4 to change: (a) the routing before the change; (b) the routing after the change; (c) a possible constructed graph. This results in a constructed graph with routers having multiple outgoing edges.
Under such a scenario, the set of collected packets may include encoded
packets from the routing configurations in both Figures 3.5(a) and 3.5(b).
Therefore, the constructed attack graph may become the one shown in Figure
3.5(c).
We argue that this result is not an undesirable one as long as the definition
of a correct attack graph construction (Definition 3.1 on Page 82) still holds
because the new attack graph is indeed composed of all the edges traversed by
the packets. In the remainder of this thesis, we stay with this assumption, and
we will discuss the scenario in which this assumption is relaxed in Section 5.7.
On the other hand, modern routing protocols [71, 72] currently used by
routers favor the formation of a routing tree rather than a routing graph. The
difference between a routing tree and a routing graph lies in the number of
outgoing routes to a particular address. Usually, there is only one route for
one destination address. Therefore, the corresponding attack graph should be
a tree instead of a graph, unless the routings within the attack graph have been
changed.
Pkt # | Src | Hop#1      | Hop#2       | Hop#3       | Hop#4       | New
------+-----+------------+-------------+-------------+-------------+----
  1   | R7  | (φ, φ, φ)  | (φ, φ, φ)   | (φ, φ, φ)   | (φ, φ, φ)   | −
  2   | R7  | (R7, φ, 0) | (R7, R4, 1) | (R7, R4, 2) | (R7, R4, 3) | √
  3   | R8  | (R8, φ, 0) | (R5, φ, 0)  | (R2, φ, 0)  | (R2, V, 1)  | √
  4   | R7  | (R7, φ, 0) | (R7, R4, 1) | (R7, R4, 2) | (R7, R4, 3) | ×
  5   | R7  | (φ, φ, φ)  | (R4, φ, 0)  | (R4, R1, 1) | (R4, R1, 2) | √
  6   | R8  | (φ, φ, φ)  | (φ, φ, φ)   | (R2, φ, 0)  | (R2, V, 1)  | ×
  7   | R7  | (φ, φ, φ)  | (φ, φ, φ)   | (R1, φ, 0)  | (R1, V, 1)  | √
  8   | R8  | (φ, φ, φ)  | (R5, φ, 0)  | (R5, R2, 1) | (R5, R2, 2) | √
  9   | R8  | (R8, φ, 0) | (R8, R5, 1) | (R8, R5, 2) | (R8, R5, 3) | √
 ...  | ... | ...        | ...         | ...         | ...         | ...

Table 3.1: A sequence of packets collected by the victim.
Assumption 3.6 Every participating router has only one outgoing route to-
ward the victim.
For the ease of presentation, we call the “outgoing route toward the victim”
the victim route throughout this thesis.
3.4 Graph Reconstruction Example
After the descriptions of the PPM algorithm from the previous sections, we
illustrate a traceback example by using the network graph in Figure 3.1 (on
Page 81). Table 3.1 shows a set of packets that are received by the victim.
3.4.1 Packet marking
Table 3.1 not only shows the sequence of the arriving packets, but also displays
the way that the participating routers mark the packets.
For each row in the table, it displays the source of the concerned packet.
According to Figure 3.1, there are only two sources of attack packets, and they
are the routers R7 and R8. Also, the length of the path between R7 and the
victim as well as the length of the path between R8 and the victim are the same,
which is three. Then, each row of the table displays a step-by-step illustration
of the packet marking procedure, with the column Hop#1 corresponding to
the first router, . . . , and the column Hop#4 corresponding to the destination
victim. For example, if the source of a packet is R7, then the four hops are
R7, R4, R1, and the victim V. Lastly, the table also shows whether a packet contains
a new kind of marking combination when it arrives at the victim, and the
column New indicates this information.
Table 3.1 displays all kinds of marked packets and the corresponding
packet marking histories. In addition, the table also includes an
unmarked packet.
3.4.2 Attack graph reconstruction
By using the sequence of packets displayed in Table 3.1, the victim can re-
construct the attack graph, and the steps are shown in Figure 3.6. For each
arrived packet, Figure 3.6 displays the corresponding constructed graph. There
are cases in which the victim receives an unmarked packet, or the victim receives
a marked packet that has been received before. In such cases, the constructed
graph does not grow in size. Otherwise, one edge is added for each newly
arrived marked packet. Eventually, the ninth packet
completes the attack graph reconstruction example, and the constructed
graph is correct according to Definition 3.1 on Page 82.
3.5 Chapter Summary
In this chapter, we introduced the probabilistic packet marking algorithm
(PPM algorithm for short). The algorithm is distributed in nature, and it
requires cooperation between the victim and the participating routers. The
Figure 3.6: A step-by-step illustration of the reconstruction of the attack graph based on the incoming packet sequence in Table 3.1. Packets that carry no new marking (packets 1, 4, and 6) leave the constructed graph unchanged.
goal of the PPM algorithm is to discover the locations of the attackers by
discovering the paths traversed by the attack packets. We call the union of these
paths the attack graph. The merit of the PPM algorithm is its probabilistic
way of encoding edge information on the attack packets. This lightweight,
efficient, and yet effective way of encoding information on packets sheds light on
how to deploy traceback algorithms on production routers, and this is why
the PPM algorithm is one of the best candidates for the microscopic traceback
algorithm.
However, the PPM algorithm has a vital defect when it is deployed. The
example in Section 3.4 shows how the victim constructs the attack graph,
and it also shows when the victim stops the attack graph reconstruction: when
the constructed graph is the same as the attack graph. Yet, in that example,
the victim knows when to stop only because the attack graph is already known
(we know that Figure 3.1 on Page 81 is the attack graph). In practice, when an
ISP wants to discover the attack graph, the attack graph is unknown to the ISP,
and the ISP therefore does not know when to terminate the PPM algorithm.
In the next chapter, we address the termination condition of the PPM algorithm.
Chapter 4
Termination Condition of PPM
Algorithm
The probabilistic packet marking algorithm has gained much attention, and
it has been enhanced by much work in the literature. Yet, it is important to
ensure that the PPM algorithm can be deployed successfully.
For a traceback algorithm to be successfully deployed, several issues have
to be addressed. First, the initialization of the algorithm is vital. For the PPM
algorithm to be deployed as a microscopic traceback algorithm at the ISP level,
the routers on the ISP must be synchronized under centralized control. Then,
the initialization among the routers can be done seamlessly under the control
of the ISP. Next, the collection and the processing of the traceback data are
also vital. In our case, the victim may not have the ability to collect or to
process the data. Nevertheless, the ISP can set up special routing support so
as to divert the malicious traffic to the micro-traceback processing node (as
described in Section 1.2.2 on page 13).
Lastly, a traceback algorithm should know when the algorithm should stop.
For the PPM algorithm, however, the termination is not thoroughly discussed
in the literature. It turns out that the termination condition is important
because it determines the correctness of the constructed graph: if the algorithm
stops too early, the constructed graph will not contain enough edges of the
Chapter 4 Termination Condition of PPM Algorithm 95
attack graph, and, thus, fail to fulfill the traceback purpose. On the other
hand, it is not correct to allow the algorithm to run for a long period before
the victim starts the graph reconstruction procedure. The reason is obvious:
the victim would never know how much time is long enough.
In [30], the authors provided an estimation of the number of marked packets
required before the victim can have a constructed graph that is the same as the
attack graph in a single-attacker environment. Let X be the number of marked
packets required for the victim to reconstruct a path, and we name this number
the sufficient packet number. Let d be the length of the reconstructed path.
Also, let pm be the marking probability of every router in the path. Equation
(4.1) given by [30] defines the upper bound on the expected sufficient packet
number E[X], and we name this equation the upper-bound packet number
throughout this thesis.
E[X] < ln(d) / (pm (1 − pm)^(d − 1)).          (4.1)
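Equation (4.1) is straightforward to evaluate; a minimal sketch follows, where the example values of d and pm are chosen for illustration only:

```python
import math

def upper_bound_packets(d, pm):
    """Upper bound on the expected sufficient packet number E[X]
    from Equation (4.1): ln(d) / (pm * (1 - pm)^(d - 1))."""
    return math.log(d) / (pm * (1.0 - pm) ** (d - 1))

# For a path of length d = 10 and a marking probability pm = 1/25,
# the bound evaluates to roughly 83 marked packets.
bound = upper_bound_packets(10, 1.0 / 25.0)
```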
4.1 Using the Upper-Bound Packet Number
as the Termination Condition
Although there is no explicit definition of the termination condition of the PPM
algorithm in [30], it is well-accepted that Equation (4.1) is the termination
condition under the single-attacker environment. The authors also claimed
that, in a multiple-attacker environment,
“the number of packets needed to reconstruct each path is indepen-
dent, so the number of packets needed to reconstruct all paths is a
linear function of the number of attackers”.
Nevertheless, we have found that it is not the case in general. More specifically,
Equation (4.1) should not be treated as the termination condition of the PPM
algorithm.
Figure 4.1: A six-router binary tree network: the upper-bound equation cannot be applied under this multiple-attacker environment.
4.1.1 Failure under the multiple-attacker environment
Firstly, one cannot apply the termination condition to complex networks such
that the reconstruction of one path is dependent on another. This scenario can
be explained by Figure 4.1, a binary-tree network with six routers. The leaf
routers, from R3 to R6, are connected to a pool of attackers. These attackers
send out attack traffic towards to the victim V, and this presents a multiple-
-attacker environment. In this graph, the attack packets traversed through
four paths which are identical in structure. However, there are “shared” edges
among these paths. This implies that the reconstruction of one path is depen-
dent on another. Therefore, one cannot treat Equation (4.1) as the termination
condition under this scenario, and this restricts the application of the PPM
algorithm.
Secondly, even when every path in a given network is independent, we have
found that the number of marked packets needed to reconstruct the network
graph does not have a linear relationship with the number of paths, i.e., the
claim made by [30] is not correct. We have carried out a set of simulations to
show our finding, and we start the description of our simulation setup from
the network depicted in Figure 4.2. The network contains four paths that
are identical in structure, and, more importantly, there are no shared edges
between any two paths. We name these paths the independent paths. Also, we
Figure 4.2: An eight-router tree network with four independent linear paths: another multiple-attacker environment.
assume that one independent path connects to one attacker, and every attacker
sends out a similar amount of attack traffic towards the victim.
4.1.2 Simulation findings
If the claim of the linear property in [30] is right, then there will be a linear
relationship between the number of the independent paths and the number of
packets required to construct a correct constructed graph, where the correctness
of the constructed graph is given by Definition 3.1 on Page 82.
To show whether the claim is right or not, we carry out simulations. Given a
network graph, such as the one shown in Figure 4.2, a simulation measures
the number of packets needed by the PPM algorithm to return a constructed
graph that is exactly the same as the given network graph. Then, for a specific
network graph, we repeat such a simulation 10,000 times so that we can
obtain an average value of the number of packets required. Eventually, we
performed a series of simulations for input network graphs having one to 50
independent paths.
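Our simulation setup can be sketched as follows. The sketch models each independent path as a line of path_len routers and, for each packet, lets the last router that decides to mark determine the surviving encoded edge; the function names and parameters are our own for this sketch:

```python
import random

def packets_until_complete(num_paths, path_len, pm, rng):
    """Count the packets the victim needs before every edge of a network
    of `num_paths` independent linear paths has been encoded at least once."""
    needed = num_paths * path_len          # distinct edges to collect
    seen = set()
    count = 0
    while len(seen) < needed:
        count += 1
        path = rng.randrange(num_paths)    # every attacker sends equally often
        surviving = None
        for hop in range(path_len):        # routers between attacker and victim
            if rng.random() < pm:          # this router restarts the encoding,
                surviving = hop            # overwriting any earlier marking
        if surviving is not None:
            seen.add((path, surviving))
    return count

def average(num_paths, trials=1000, path_len=3, pm=0.5, seed=42):
    """Average over repeated simulation runs, as in our experiments."""
    rng = random.Random(seed)
    return sum(packets_until_complete(num_paths, path_len, pm, rng)
               for _ in range(trials)) / trials
```

Comparing average(1) with average(4) already shows that four independent paths need more than four times the packets of a single path, hinting at the super-linear growth discussed below.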
Figure 4.3 shows the result of this set of simulations. One can observe
that the average number of marked packets required to construct a correct
constructed graph increases as the number of independent paths increases.
In order to show whether the number of marked packets required increases
linearly with the number of paths or not, we plot the rate of change of the
number of marked packets required in Figure 4.4. Surprisingly, the graph shows
an increasing trend of the rate of change of the number of marked packets
required. The claim about the multiple-attacker environment made in [30] is
therefore wrong. Later in this chapter, we will provide a formal calculation of
the number of marked packets required, and we will formally disprove
the use of the upper-bound equation as the termination condition of the PPM
algorithm.
Theoretically, the packet collecting problem can be transformed into the
coupon-collecting problem with unequal probabilities [73]. The fault made by
[30] is to treat the probability that every encoded edge arrives at the victim
as identical, which is wrong (we will discuss this in Section 4.2). The solution to
the coupon-collecting problem with unequal probabilities is very complex, and it
does not show a linear relationship with the number of independent paths.
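The exact expectation for the unequal-probability coupon-collecting problem can be computed with the standard inclusion-exclusion formula, E[X] = Σ over non-empty subsets S of (−1)^(|S|+1) / Σ_{i∈S} p_i. A sketch follows; it is practical only for small edge sets, since it enumerates all subsets:

```python
from itertools import combinations

def expected_collection_time(probs):
    """Expected number of draws until every coupon has appeared at least
    once, for unequal coupon probabilities, via inclusion-exclusion."""
    total = 0.0
    for k in range(1, len(probs) + 1):
        for subset in combinations(probs, k):
            total += (-1) ** (k + 1) / sum(subset)
    return total

# Marked packet-type probabilities of the single path Ga with pm = 0.5
# (the values of Table 4.1 normalised by 1 - P(unmarked) = 0.875):
probs = [0.5 / 0.875, 0.25 / 0.875, 0.125 / 0.875]
expected_marked = expected_collection_time(probs)   # = 8.35 marked packets
```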
In summary, the problem of using the upper-bound packet number as the
termination condition is that the relationship between the number of the attack
paths and the expected sufficient packet number E[X] is not known. Therefore,
the PPM algorithm cannot guarantee the correctness in the multiple-attacker
environment.
4.1.3 Chapter structure
In this chapter, we are going to derive a formal way to calculate the expected
number of marked packets required to construct a correct constructed graph.
First, different kinds of marked packets make up the constructed graph, and
the modeling of the packet marking process then becomes important in order
Figure 4.3: Simulation result: the number of marked packets required versus the number of independent paths.
Figure 4.4: An increasing yet chaotic trend of the rate of change of the number of marked packets required.
to understand the pattern of the arriving packets. Such a model will be introduced
in Section 4.2, and the model is called the packet-type model. By using the
packet-type model, we aim to find the sufficient packet number X as well as
its expectation E[X]. In Section 4.3, we introduce the discrete-time Markov
chain model to find E[X]. In Section 4.4, we will compare the simulation results
and the theoretical results, and the consistent results together disprove the
upper-bound packet number as the termination condition. Lastly, Section 4.5
concludes this chapter.
4.2 Packet-Type Model
The packet-type model is actually a model of the packet marking procedure
(Figure 3.3 on Page 84). This model aims to describe all possible types of the
marked packets. We first define the packet-type random variable.
Definition 4.1 Define T (G) as the packet-type random variable. T (G) = e
represents that a packet encoding the edge e arrives at the victim, where e is
in the set of edges of the attack graph G. Also, define T (G) = φ if the packet
arrived at the victim is unmarked.
The encoding of every packet arrived at the victim can be represented by
the random variable T (G). In addition, the distance between an edge and the
victim also plays a vital role. We define the edge distance function as follows.
Edge Distance Function

d((Ri, Rj), V, path) =
    1,                            if Rj = V;
    d((Rj, Rk), V, path) + 1,     otherwise,          (4.2)
where path = (Ri, Rj, Rk, . . . , V) is the path from Ri to the victim V. Note
that, according to Assumption 3.6 (every router has only one victim route),
the edge distance function returns a unique value for every edge.
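Since every router has a single victim route, the recursion of Equation (4.2) can equivalently be computed by counting the edges that remain between Ri and the victim. A small sketch, with paths represented as tuples of router names ending at the victim (a representation we adopt only for illustration):

```python
def edge_distance(path, edge):
    """Equation (4.2): the last-mile edge has distance 1, and the distance
    grows by one per hop upstream.  `path` is (Ri, ..., V); `edge` is a
    pair of consecutive routers (Ri, Rj) on that path."""
    i = path.index(edge[0])        # position of Ri on the path
    return len(path) - 1 - i       # edges left between Ri and the victim

path = ("R3", "R2", "R1", "V")
d_last = edge_distance(path, ("R1", "V"))    # 1, the last-mile edge
d_far = edge_distance(path, ("R3", "R2"))    # 3
```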
4.2.1 Packet-Type probability
For every value of the packet-type random variable, there is a corresponding
probability for its occurrence. We name this probability the packet-type prob-
ability. In the following derivation of the packet-type probability, we let the
attack graph be G = (V, E). Also, let Ri, Rj ∈ V , and let (Ri, Rj) ∈ E.
For a packet encoding (Ri, Rj) to arrive at the victim V, the packet has to
first pass through the edge (Ri, Rj), and then the packet has to be marked by
the router Ri but not the successive routers (meaning that the packet encodes
the edge (Ri, Rj)). Based on the above statement, we have:
P(T(G) = (Ri, Rj)) = P("a packet passes through (Ri, Rj)" and
                       "a packet encodes (Ri, Rj)")
                   = P("a packet passes through (Ri, Rj)") ×
                     P("a packet encodes (Ri, Rj)" |
                       "a packet passes through (Ri, Rj)").
Hence, we first derive the probability that a packet passes through
(Ri, Rj). Then, we derive the probability that a packet encodes (Ri, Rj) conditioned
on the event that the packet passes through (Ri, Rj). For ease of presentation,
we name the former probability the via probability, Pv((Ri, Rj), G), and the
latter the conditional encoding probability, Pc((Ri, Rj), G). Note also that,
without loss of generality, our derivation can also deal with edges
of the form (Ri, V), where V is the victim site.
Via probability
By Assumptions 3.3 and 3.4 (on Page 87), the probability that a packet comes
from a particular leaf router is the same as the probability that it comes from
another leaf router. On the other hand, Assumption 3.6 (on Page 90) implies
that there is only one path leading from a leaf router to the victim.
Through the above analysis, the via probability equals the fraction of leaf
routers whose paths towards the victim traverse the edge (Ri, Rj). In the following, we
establish a method to calculate the via probability. Let L(G) be the set of leaf
routers in the network graph G and let |L(G)| be the number of leaf routers
in the set L(G). Let Path(R,V) be the path leading from the router R to the
victim V.
For every leaf router Rl ∈ L(G), there is only one path Path(Rl,V) leading
to the victim. Whether this path contains the target edge (Ri, Rj) or not is
another matter. We provide a function δ(p, e), where p is a path and e is an
edge, and we name it the edge testing function. If p contains e, then the edge
testing function returns one. Otherwise, the edge testing function returns zero.
Equation 4.3 gives the formal presentation of the edge testing function.
Edge Testing Function

δ(p, e) =
    1,   if e ∈ p;
    0,   if e ∉ p.          (4.3)
Through the introduction of the edge testing function, one can determine
if a path contains a particular edge. Then, one can count the number of
leaf routers whose path towards the victim contain the target edge (Ri, Rj).
Equation 4.4 counts these leaf routers and normalizes the count by |L(G)|;
the result is the via probability.
Via Probability

Pv((Ri, Rj)) = (1 / |L(G)|) × Σ_{Rl ∈ L(G)} δ(Path(Rl, V), (Ri, Rj)).          (4.4)
Note that such a derivation of the via probability becomes
inapplicable if Assumption 3.6 (the single victim route assumption) is void.
Conditional encoding probability
The conditional encoding probability concerns how a packet's marking can
reach the victim without being overwritten. The formulation of this probability
relies on the edge distance function introduced earlier in Section 4.2.
The edge distance function d ((Ri, Rj),V, Path(Ri,V)) gives the distance
from Ri to V. For the encoding of (Ri, Rj) to be successful, all the routers
from Rj up to the last-mile router of V have to choose not to encode a new edge.
Hence, the following equation defines the conditional encoding probability.
Conditional Encoding Probability

Pc((Ri, Rj)) = pm × (1 − pm)^(d((Ri, Rj), V, Path(Ri, V)) − 1).          (4.5)
Finally, we have the packet-type probability of (Ri, Rj) as follows.
Packet-type Probability

P(T(G) = (Ri, Rj)) = (1 / |L(G)|) × Σ_{Rl ∈ L(G)} δ(Path(Rl, V), (Ri, Rj)) ×
                     pm × (1 − pm)^(d((Ri, Rj), V, Path(Ri, V)) − 1).          (4.6)
In addition, the packet-type probability of an unmarked packet is as follows:
Unmarked Packet Probability

P(T(G) = φ) = 1 − Σ_{e ∈ E} P(T(G) = e).          (4.7)
Note that the above derivation of the packet-type probability includes the
presence of the unmarked packets. If the victim considers only the marked
packets, a suitable normalization should be applied as follows. Denote Tm(G)
as the marked packet-type random variable which is the same as the packet-
type random variable T (G) except that Tm(G) takes only on values of the edge
set E of the graph G. Then, the marked packet-type probability is given by
Equation (4.8).
Marked Packet-Type Probability

P(Tm(G) = e) = P(T(G) = e) / (1 − P(T(G) = φ)),   ∀e ∈ E.          (4.8)
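Equations (4.7) and (4.8) translate directly into code; a sketch assuming the packet-type probabilities are stored in a dictionary keyed by edge (our own representation):

```python
def marked_packet_type_probability(edge_probs):
    """Equations (4.7) and (4.8): derive the unmarked-packet probability
    and normalise the edge probabilities to marked packets only."""
    p_unmarked = 1.0 - sum(edge_probs.values())          # Equation (4.7)
    return {e: p / (1.0 - p_unmarked)                    # Equation (4.8)
            for e, p in edge_probs.items()}

# The single path Ga with pm = 0.5 (cf. Table 4.1):
probs = {("R1", "v"): 0.5, ("R2", "R1"): 0.25, ("R3", "R2"): 0.125}
marked = marked_packet_type_probability(probs)   # values now sum to 1
```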
Packet type probability(Graph G)
/* The size of the “result” array is equal to the number of edges of thinput graph “G”*/1. result := allocate memory(G.edge) ;2. For (i := 0; i < G.edge; i := i + 1)3. result[i] := 0 ;4. end For5. Foreach leaf in G ; do/* “search path” finds the path from leaf to victim */6. path := search path(leaf, victim) ;8. Foreach edge in path ; do9. length := edge distance function(path, edge) ;10. result[edge] := result[edge] + 1/G.leaf num × pm × (1−pm)length−1;12. end Foreach14. end Foreach15. return result ;
Figure 4.5: The pseudocode of the packet-type probability calculation – it calculates the packet-type probability of every edge in the graph G.
4.2.2 Pseudocode of the calculation of the packet-type probabilities
In Figure 4.5, we provide an algorithm to calculate the packet-type probability
of every edge of an input graph. The algorithm first constructs the path leading
from every leaf router to the victim. Then, for each path, it calculates and
accumulates the packet-type probability by Equation (4.6) for every edge in
the path. Eventually, it returns the packet-type probabilities of all edges of the
input graph. Note that the calculations of the packet-type probability for an
unmarked packet and the marked packet-type probabilities are not included in
the pseudocode, but one can calculate them by Equations (4.7) and (4.8) together
with the results obtained by the algorithm.
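For illustration, the calculation of Figure 4.5 can be sketched in runnable Python. The graph encoding below (a list of leaf-to-victim edge paths) is our own illustrative choice, not part of the thesis:

```python
# Sketch of the packet-type probability calculation of Figure 4.5 / Equation (4.6).
# A graph is represented here as a list of paths; each path is the ordered list
# of edges from a leaf router down to the victim (an illustrative encoding).

def packet_type_probabilities(paths, pm):
    """Return a dict mapping each edge to P(T(G) = edge)."""
    result = {}
    for path in paths:
        hops = len(path)                 # distance from the leaf to the victim
        for k, edge in enumerate(path):
            d = hops - k                 # edge distance: hops from the edge's upstream router to the victim
            p = pm * (1 - pm) ** (d - 1) # conditional encoding probability, Equation (4.5)
            result[edge] = result.get(edge, 0.0) + p / len(paths)
    return result

pm = 0.5
Ga = [[("R3", "R2"), ("R2", "R1"), ("R1", "v")]]
Gb = [[("R3", "R2"), ("R2", "R1"), ("R1", "v")],
      [("R4", "R1"), ("R1", "v")]]

pa = packet_type_probabilities(Ga, pm)   # cf. Table 4.1 with pm = 1/2
pb = packet_type_probabilities(Gb, pm)   # cf. Table 4.3 with pm = 1/2
unmarked = 1 - sum(pb.values())          # Equation (4.7)
```

With pm = 1/2, `pa` reproduces Table 4.1 (1/2, 1/4, 1/8) and `pb` reproduces Table 4.3; the unmarked-packet probability of Gb comes out as 3/16, and dividing each entry of `pb` by 1 − 3/16 gives the marked packet-type probabilities of Equation (4.8).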
4.2.3 Illustration of the calculation of the packet-type probability
In this subsection, we use two example graphs in Figure 4.6 to demonstrate the
calculation of the packet-type probabilities under single-path and multi-path
environments, respectively.
The single-path environment
The graph Ga in Figure 4.6 contains one leaf router and has only one path
(R3, R2, R1, v) from leaf router R3 to the victim v. As there is only one leaf
and one path in Ga, the probability that a packet passes through any edge is
one. Thus, the packet-type probability is calculated as shown in Table 4.1.
The multiple-path environment
In graph Gb, there are two paths: (R3, R2, R1, v) and (R4, R1, v). Both paths
contain the edge (R1, v), and this implies that the packet-type probability
[Diagram: (a) the linear network R3 – R2 – R1 – V; (b) the network with paths R3 – R2 – R1 – V and R4 – R1 – V.]
Figure 4.6: (a) Ga: A simple example linear network with three edges. (b)Gb: An example network with multiple paths leading from R3 and R4 to thevictim.
Edge (e)         (R1, v)    (R2, R1)       (R3, R2)
P(T(Ga) = e)     pm         pm(1 − pm)     pm(1 − pm)^2

Table 4.1: Packet-type probabilities for Ga in Figure 4.6.
of (R1, v) is accumulated from two paths. For the path (R3, R2, R1, v), the
contributed packet-type probabilities are shown in Table 4.2. Then, for the
path (R4, R1, v), the accumulated packet-type probabilities are shown in Table
4.3.
Edge (e)         (R1, v)      (R2, R1)            (R3, R2)              (R4, R1)
P(T(Gb) = e)     (1/2)pm      (1/2)pm(1 − pm)     (1/2)pm(1 − pm)^2     0

Table 4.2: Packet-type probabilities for Gb in Figure 4.6: after the path (R3, R2, R1, v) of Gb is considered.
Edge (e)         (R1, v)    (R2, R1)            (R3, R2)              (R4, R1)
P(T(Gb) = e)     pm         (1/2)pm(1 − pm)     (1/2)pm(1 − pm)^2     (1/2)pm(1 − pm)

Table 4.3: Packet-type probabilities for Gb in Figure 4.6: after both paths (R3, R2, R1, v) and (R4, R1, v) of Gb are considered.
4.3 Using Markov Chain Model to Find the Sufficient Packet Number
In the previous section, the packet-type model is defined. We employ such a
model in this section, and transform the packet collection process of the PPM
algorithm into a discrete-time Markov chain model. Using well-established
mathematical techniques, we introduce the methodology to calculate the distribution
of the sufficient packet number X and the expected sufficient packet number E[X].
4.3.1 The Markov process
One of the main tasks performed by the PPM algorithm is to collect the marked
packets, and this task terminates when there is at least one marked packet
from each packet type. Let us now define the underlying Markov process M.
In our model, each Markov state represents a combination of the collected
marked packets. Let G = (V, E) be a network graph. The state space SG of
the Markov process M is the power set of the edge set E and is stated as
follows:
SG = {Es | Es ⊆ E}.
For ease of presentation, we make use of an example network G1 depicted
in Figure 4.7. For the example network G1, the state space SG1 is as follows:

SG1 = { φ, (e1), (e2), (e3), (e1, e2), (e1, e3), (e2, e3), (e1, e2, e3) },
[Diagram: the linear network R3 –e3– R2 –e2– R1 –e1– ν.]
Figure 4.7: Example network G1: it is a linear network with three routers andone victim.
where, without loss of generality, (ei, ej) represents that the victim has collected
two types of marked packets: (i) packets encoding ei and (ii) packets encoding
ej . On the other hand, φ represents that the victim has not collected any
marked packets yet.
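The power-set state space is easy to enumerate programmatically; the short sketch below (our own illustration, not part of the thesis) lists the eight states of SG1:

```python
# Enumerate the Markov state space S_G = { Es | Es ⊆ E } as the power set of
# the edge set E, smallest subsets first (φ first, the full edge set last).
from itertools import combinations

def state_space(edges):
    return [frozenset(c) for r in range(len(edges) + 1)
                         for c in combinations(edges, r)]

S = state_space(["e1", "e2", "e3"])
# 2^3 = 8 states, from the empty state φ up to the full set {e1, e2, e3}
```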
Markov states
The physical meanings of the Markov states in M are as follows. The Markov
process starts with the victim having collected no marked packets, i.e., in state φ. The
Markov process stops when the victim has collected the complete set of marked
packets, and the process is then in the absorbing state. The PPM algorithm
constructs the attack graph when such a state is reached. On the other hand,
the remaining states of the Markov process represent the intermediate states
of the PPM algorithm.
According to the above descriptions, if there are m edges in the
attack graph in total, then there will be 2^m Markov states. As the network
size grows and the number of edges increases, the Markov chain may become
computationally expensive to model. Nevertheless, one can employ efficient
techniques such as aggregation or dis-aggregation [74] and stochastic comple-
mentation [75] to reduce the state space of the Markov chain model.
Transitions
Besides the Markov states, a discrete-time Markov chain also includes its one-
step transition probability matrix, and, in our Markov model, a transition
implies an arrival of a packet (not necessarily a marked packet). A transition
occurs if one of the following two situations happens:
1. an arrival of an unmarked packet, or an arrival of a marked packet but
the victim has received this type of packets already. Under these two
cases, the process stays in the same state; or
2. an arrival of a marked packet which was never received by the victim
before. In this case, the process advances to another Markov state.
For example, considering the graph G1 again, suppose that the current state of
M is (e1), and the victim receives a packet encoding e1 again, then the Markov
process stays in state (e1). For another example, while the process is still in
state (e1), the victim receives a packet encoding e2. Then, the process makes
a transition to state (e1, e2).
Next, we define the transition structure of the Markov process. Let there
be m edges named e1, e2, . . . , em. The transition structure in Equation (4.9)
formally defines all the possible transitions of the Markov process.
Transition structure

φ −→ (ei1) ;
(ei1) −→ (ei1, ei2),                where ei2 ≠ ei1 ;
(ei1, ei2) −→ (ei1, ei2, ei3),      where ei3 ≠ ei1 and ei3 ≠ ei2 ;
    ...
(ei1, ei2, . . . , eim−1) −→ (ei1, ei2, . . . , eim−1, eim) ;    (4.9)

where i1, i2, . . . , im ∈ [1, m].
Transition probabilities
Lastly, every transition is associated with a transition probability. In turn,
the transition probability involves the packet-type probability. Back to the
example of network G1 in Figure 4.7, from state (e1) to state (e1, e2), one
requires the arrival of the packet encoding the edge e2, and, therefore, the
transition probability is the packet-type probability P (T (G1)=e2).
We now show the formulation of the transition probability matrix of the
PPM algorithm. Denote the transition probability matrix of the Markov chain
as P, and denote an entry P[i, j] as the probability that the Markov process
M makes a transition from state i to state j. Then, the transition probabil-
ity matrix P is formulated in Equation (4.10) according to the Markov state
transitions defined in Equation (4.9).
Transition probability structure

P[φ, φ] = P(Tm(G) = φ) ;
P[φ, (ei1)] = P(Tm(G) = ei1) ;
P[(ei1), (ei1)] = P(Tm(G) = ei1) ;
P[(ei1), (ei1, ei2)] = P(Tm(G) = ei2) ;
    ...
P[(ei1, . . . , eir), (ei1, . . . , eir)] = Σ_{k=1}^{r} P(Tm(G) = eik) ;
P[(ei1, . . . , eir), (ei1, . . . , eir, eir+1)] = P(Tm(G) = eir+1) ;
    ...
P[(ei1, . . . , eim−1), (ei1, . . . , eim−1)] = Σ_{k=1}^{m−1} P(Tm(G) = eik) ;
P[(ei1, . . . , eim−1), (ei1, . . . , eim−1, eim)] = P(Tm(G) = eim) ;
P[(ei1, . . . , eim), (ei1, . . . , eim)] = 1 .    (4.10)

where G is the attack graph and Tm(G) is the marked packet-type random variable of Equation (4.8).
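The structure of Equation (4.10) can be sketched as a small matrix builder (our own illustration); feeding it the marked packet-type probabilities 4/7, 2/7, 1/7 of the example in Section 4.3.2 reproduces the matrix of Figure 4.9:

```python
# Sketch: build the one-step transition matrix of Equation (4.10) over the
# power-set state space, given P(Tm(G) = e) for every edge e.
from fractions import Fraction
from itertools import combinations

def transition_matrix(edge_probs):
    edges = list(edge_probs)
    states = [frozenset(c) for r in range(len(edges) + 1)
                           for c in combinations(edges, r)]
    index = {s: i for i, s in enumerate(states)}
    full = frozenset(edges)
    P = [[Fraction(0)] * len(states) for _ in states]
    for s in states:
        i = index[s]
        if s == full:
            P[i][i] = Fraction(1)      # absorbing state: stays put forever
            continue
        for e, p in edge_probs.items():
            # an already-collected edge keeps the process in state s (j == i);
            # a new edge advances it to s ∪ {e}
            P[i][index[s | {e}]] += p
    return states, P

probs = {"e1": Fraction(4, 7), "e2": Fraction(2, 7), "e3": Fraction(1, 7)}
states, P = transition_matrix(probs)   # an 8 × 8 matrix; cf. Figure 4.9
```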
4.3.2 Example on discrete-time Markov chain modeling
We present an example to model the traceback process of the example network
G1 into a discrete-time Markov chain. When one follows the modeling rules
described above, one can construct the Markov chain as shown in Figure 4.8.
We describe the Markov chain in Figure 4.8 in the following. Every state
except the start state, labeled state 1, has a set of boxes alongside it. For
[Diagram: eight states labeled 1–8. State 1 (no packets) has self-loop P(T(G)=φ); states 2–4 each hold one of e1, e2, e3, with self-loops P(T(G)=φ) + P(T(G)=ei); states 5–7 each hold two edge types; state 8 holds e1, e2, e3 and has self-loop probability one. Each arrow to a larger state is labeled with the packet-type probability P(T(G)=e) of the newly collected edge e.]
Figure 4.8: Illustration of the Markov chain model of the PPM algorithm withnetwork G1 in Figure 4.7.
example, state 2 has ‘e1’ in its box. These boxes indicate which types of
packets the victim has received. Hence, the box of state 2 indicates that
the marked packets encoding edge e1 have been received. The same rationale
applies to the other states. Specifically, for state 8, all types of marked packets are
collected, as its three boxes show, and we name this state the absorbing
state. On the other hand, the transitions of the Markov model are represented
as arrows. The probabilities of these transitions are derived from the transition
probability structure in Equation (4.10). Note importantly that, in state 8,
the self-transition probability is one, and this implies that further packet
arrivals will not change the state of the process.
If one sets the marking probability to be pm = 1/2, then the marked packet-
type probabilities of graph G1 are as follows:

P(Tm(G1) = e) =
    4/7  if e = e1 ;
    2/7  if e = e2 ;
    1/7  if e = e3 ;
    0    if e = φ .          (4.11)
By employing the Markov chain (Figure 4.8) and the calculated marked packet-
type probability (Equation (4.11)), one can construct the transition probability
matrix as shown in Figure 4.9.
P =
    |  0    4/7   2/7   1/7   0     0     0     0   |
    |  0    4/7   0     0     2/7   1/7   0     0   |
    |  0    0     2/7   0     4/7   0     1/7   0   |
    |  0    0     0     1/7   0     4/7   2/7   0   |
    |  0    0     0     0     6/7   0     0     1/7 |
    |  0    0     0     0     0     5/7   0     2/7 |
    |  0    0     0     0     0     0     3/7   4/7 |
    |  0    0     0     0     0     0     0     1   |

Figure 4.9: The transition probability matrix constructed for the Markov chain shown in Figure 4.8.
Importance of the transition probability matrix
The transition probability matrix is a rich source of information about the
PPM algorithm. Let the transition probability matrix be P. If we raise
the matrix to a certain power, say k, then P^k represents the system's states
after k packets have arrived at the victim, and the entry P^k[1, 8] in P^k
represents the probability that the system makes a transition from state 1 to
state 8 within k packet arrivals. In other words, P^k[1, 8]
represents the cumulative probability that k marked packets are enough to
construct an attack graph. Mathematically, we have the following:

P^k[1, 8] = P(X ≤ k) = Σ_{i=0}^{k} P(X = i) .
Hence, one can obtain the probability that the sufficient packet number is i,
P (X=i), as follows:
P(X = i) = P^i[1, 8] − P^(i−1)[1, 8] .    (4.12)
Further, by Equation (4.12), one can construct the probability density function
of X, P(X = k), as shown in Figure 4.10. Note that this figure shows a close result
between the simulation data (the distribution of the number of required marked
packets generated by 100,000 individual simulation samples) and the density
function.
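A brute-force check of Equation (4.12) (our own sketch, reusing the matrix of Figure 4.9) recovers the same density function:

```python
# Sketch: P(X = i) = P^i[1, 8] − P^(i−1)[1, 8] via exact matrix powers
# (Equation 4.12), using the transition probability matrix of Figure 4.9.
from fractions import Fraction

F = Fraction
P = [
    [0, F(4,7), F(2,7), F(1,7), 0,      0,      0,      0     ],
    [0, F(4,7), 0,      0,      F(2,7), F(1,7), 0,      0     ],
    [0, 0,      F(2,7), 0,      F(4,7), 0,      F(1,7), 0     ],
    [0, 0,      0,      F(1,7), 0,      F(4,7), F(2,7), 0     ],
    [0, 0,      0,      0,      F(6,7), 0,      0,      F(1,7)],
    [0, 0,      0,      0,      0,      F(5,7), 0,      F(2,7)],
    [0, 0,      0,      0,      0,      0,      F(3,7), F(4,7)],
    [0, 0,      0,      0,      0,      0,      0,      1     ],
]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

pmf, Pk, cdf_prev = {}, P, Fraction(0)
for k in range(1, 151):
    if k > 1:
        Pk = matmul(Pk, P)           # Pk = P^k
    pmf[k] = Pk[0][7] - cdf_prev     # Equation (4.12)
    cdf_prev = Pk[0][7]

# the truncated mean closely approximates E[X] = 8.35 (the tail beyond
# k = 150 is negligible)
mean = float(sum(k * p for k, p in pmf.items()))
```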
As another example, one can find the probability distribution
function of the sufficient packet number of a more complex network graph, G2
in Figure 4.11. Figure 4.12 shows the probability distribution of the sufficient
packet number as well as the corresponding simulation result. The two results
are close, and this supports the correctness of the Markov chain model.
4.3.3 Fundamental matrix
Not only can the probability density function of the sufficient packet number
be found; the Markov chain also helps to find the expected sufficient packet
[Graph: probability P[X = i] against the number of marked packets i, overlaying the Markov-chain model and the simulation result (mean = 8.3252).]
Figure 4.10: Simulation result versus theoretical result: for network G1 in Fig-ure 4.7, we obtain two close sets of results for the distribution of the sufficientpacket number X.
[Diagram: a binary tree of routers R1–R14 rooted at the victim ν; the leaf routers R7–R14 connect to the pool of attack sources.]
Figure 4.11: Example network G2: totally 16,384 Markov states.
[Graph: probability P[X = i] against the number of marked packets i for G2, overlaying the theoretical result and the simulation result (mean = 151.66).]
Figure 4.12: Probability distribution of the sufficient packet number on the14-router binary-tree network, G2.
number E[X] using the theory of the fundamental matrix [76]. In brief, a
fundamental matrix is an (n − 1) × (n − 1) matrix derived from the
transition probability matrix. Here, we describe the steps involved in
calculating the fundamental matrix and E[X].
To calculate the fundamental matrix, one has to first partition the state
space S of the Markov chain into two mutually exclusive and exhaustive
partitions: St is the partition for all transient states, while Sa is the partition
for the absorbing states (i.e., the state wherein all marked packets have been
received by the victim). After the partition, the one-step transition probability
matrix can be represented as:

P = | Q  C |
    | 0  1 | .
In other words, in the case that there is only one absorbing state, and if
P is of size n × n, then Q is of size (n−1) × (n−1), wherein Q is the transition
sub-matrix for all transient states in St (i.e., the states in which the victim
has not yet received every type of marked packet). The sub-matrix Q is used
to calculate the
fundamental matrix M as follows:
Fundamental Matrix

M = (I − Q)^(−1) = Σ_{k=0}^{∞} Q^k .    (4.13)
Theorem 4.1 Let M be the fundamental matrix of the underlying Markov
chain described by the one-step transition probability matrix P. The (i, j)th
entry of M, denoted M[i, j], represents the expected number of visits to
the transient state j, starting from the transient state i, before entering the absorbing state.
Proof. The proof is given in [77]. In the following, we repeat the proof
in terms of our application.
Let the number of visits from the transient state i to the transient state j
before entering the absorbing state be Xij .
Suppose that the PPM algorithm, i.e., the Markov process, is in the tran-
sient state si. In one step, it may enter the absorbing state sn with probability
P[i, n]. The corresponding number of visits to state sj is then zero unless
i = j. Define:

δij = 1 if i = j, and 0 otherwise.
Thus, Xij = δij with probability P[i, n]. Alternatively, the process may go
to a transient state sk at the first step with probability P[i, k]. The subsequent
number of visits to state sj is given by Xkj. If i = j, the total number of visits,
Xij, will be Xkj + 1. Otherwise, it will be Xkj. Therefore,

Xij = δij             with probability P[i, n] ;
Xij = Xkj + δij       with probability P[i, k], 1 ≤ k < n .
Let the random variable Y denote the state of the process after the first step
(given the initial state i). We can summarize as follows:

E[Xij | Y = n] = δij ,
E[Xij | Y = k] = E[Xkj + δij] = E[Xkj] + δij ,    1 ≤ k < n .
Now, since the pmf of Y is easily derived as P(Y = k) = P[i, k], 1 ≤ k ≤ n,
we can use the theorem of total expectation to obtain

E[Xij] = Σ_k E[Xij | Y = k] × P(Y = k)
       = P[i, n] × δij + Σ_{k=1}^{n−1} P[i, k] × (E[Xkj] + δij)
       = Σ_{k=1}^{n} P[i, k] × δij + Σ_{k=1}^{n−1} P[i, k] × E[Xkj]
       = δij + Σ_{k=1}^{n−1} P[i, k] × E[Xkj] .

Forming the (n − 1) × (n − 1) matrix consisting of the elements E[Xij], we have

[E[Xij]] = I + Q × [E[Xij]]  ⇒  [E[Xij]] = (I − Q)^(−1) = M.
Based on the theorem above, one can calculate the expected number of
visits from the start state to every transient state before entering the
absorbing state; their sum gives E[X] in our application. Thus, E[X] can be expressed as:

Expected sufficient packet number

E[X] = Σ_{i=1}^{n−1} M[1, i] ,    (4.14)
where state 1 is the start state, i.e., the victim has not received any marked
packets.
4.3.4 Example on calculating E[X]
We continue our example on network graph G1 and calculate E[X]. By fol-
lowing the formulation of the fundamental matrix specified in Equation (4.13),
M =
    |  1    4/3   2/5   1/6   64/15   1     11/60 |
    |  0    7/3   0     0     14/3    7/6   0     |
    |  0    0     7/5   0     28/5    0     7/20  |
    |  0    0     0     7/6   0       7/3   7/12  |
    |  0    0     0     0     7       0     0     |
    |  0    0     0     0     0       7/2   0     |
    |  0    0     0     0     0       0     7/4   |

Figure 4.13: The fundamental matrix calculated by Equation (4.13) with the transition probability matrix P shown in Figure 4.9.
one can calculate the fundamental matrix M as shown in Figure 4.13. Lastly,
by using Equation (4.14), the value of E[X] is given as follows:
E[X] = 1 + 4/3 + 2/5 + 1/6 + 64/15 + 1 + 11/60 = 8.3500,
which is quite close to the simulation result shown in Figure 4.10 (on Page
114).
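The same value can be cross-checked numerically: the row sums t = (I − Q)^(−1) 1 of the fundamental matrix satisfy the linear system (I − Q) t = 1, which the sketch below (our own illustration) solves exactly with rational arithmetic:

```python
# Sketch: exact E[X] for G1 by solving (I − Q) t = 1, where t[0] equals the
# row sum Σ_i M[1, i] of the fundamental matrix (Equation 4.14).
from fractions import Fraction

F = Fraction
Q = [  # transient part of the matrix in Figure 4.9 (states 1–7)
    [0, F(4,7), F(2,7), F(1,7), 0,      0,      0     ],
    [0, F(4,7), 0,      0,      F(2,7), F(1,7), 0     ],
    [0, 0,      F(2,7), 0,      F(4,7), 0,      F(1,7)],
    [0, 0,      0,      F(1,7), 0,      F(4,7), F(2,7)],
    [0, 0,      0,      0,      F(6,7), 0,      0     ],
    [0, 0,      0,      0,      0,      F(5,7), 0     ],
    [0, 0,      0,      0,      0,      0,      F(3,7)],
]
n = len(Q)
# augmented matrix for (I − Q) t = [1, ..., 1]^T
A = [[F(int(i == j)) - Q[i][j] for j in range(n)] + [F(1)] for i in range(n)]

for col in range(n):                   # Gauss–Jordan elimination, exact
    piv = next(r for r in range(col, n) if A[r][col] != 0)
    A[col], A[piv] = A[piv], A[col]
    A[col] = [x / A[col][col] for x in A[col]]
    for r in range(n):
        if r != col and A[r][col] != 0:
            A[r] = [x - A[r][col] * y for x, y in zip(A[r], A[col])]

t = [row[n] for row in A]
# t[0] = E[X] = 167/20 = 8.35, matching the hand calculation above
```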
For the example network G2 in Figure 4.11, the calculated expected sufficient
packet number is 151.77 while the simulation result is 151.66 according
to Figure 4.12 (on Page 115). This again shows a close result between the
simulation and the calculation from the theoretical model.
4.4 Disproving the Upper-Bound Packet Number as the Termination Condition
With the discrete-time Markov chain model derived, one can show the relationship
between the number of independent paths and the number of marked
packets required more clearly (described earlier in Section 4.1.2
on Page 97). Figure 4.14 shows a comparison between the simulation on the
independent path analysis and the theoretical result using the Markov chain
model.
According to the results, one can observe that the two sets of data are
[Graph: rate of change of the number of marked packets required against the number of independent paths, overlaying the theoretical and simulation results.]
Figure 4.14: The comparison between the simulation and the theoretical re-sults: both results disprove the linear property proposed by previous work.
consistent. This clears the doubt that the increasing trend in the rate of
change of the average number of marked packets needed resulted from statistical
errors in the simulations. More importantly, the simulation and the theoretical
results together disprove the use of the upper-bound equation (Equation (4.1)
on Page 95) as the termination condition in a multiple-attacker environment.
The relationship between the number of independent attack paths and the
expected sufficient packet number is proved to be non-linear using the discrete-
time Markov chain model.
4.5 Chapter Summary
In this chapter, we studied the way that the probabilistic packet marking
algorithm (PPM algorithm for short) should terminate its execution. The
number of marked packets collected by the victim is believed to be a natural
choice of the termination condition, and we called the number of the marked
packets that are required to reconstruct the attack graph the sufficient packet
number.
We first developed an in-depth understanding of the PPM algorithm. Through
simulation results, we learned that the relationship between the number of attack
paths and the expected sufficient packet number E[X] is not linear, and the
exact relationship remained an open question. In quest of the answer, we first provided a
probabilistic model on the packet marking procedure of the PPM algorithm
(Figure 3.3 on Page 84), and we called such a model the packet-type model.
Then, we devised a discrete-time Markov chain model of the PPM algorithm.
The Markov chain model can give an accurate calculation of the probability
distribution function of the sufficient packet number as well as the expected
sufficient packet number.
Nevertheless, no matter how accurate and how efficient the calculation of
the expectation E[X] is, it is suggested that the expected number of required
marked packets E[X] should not be treated as the termination condition. De-
pending on the underlying probability distribution of the random variable X,
when the mean is reached, there is still a non-zero probability that the
constructed graph is incorrect. For instance, if the probability distribution
of X were uniform, then the probability that a correct attack
graph is constructed is just 0.5. As a matter of fact, one does not have such a
probability distribution unless one knows the attack graph in advance.
This contradiction motivates us to devise a new termination condition
for the PPM algorithm, and this idea eventually leads to a new form of the
algorithm. We introduce a new traceback algorithm, the rectified probabilistic
packet marking algorithm, in the next chapter.
Chapter 5

Rectified Probabilistic Packet Marking Algorithm
According to the understanding of the probabilistic packet marking (PPM)
algorithm accumulated from the previous chapters, we conclude the following
points:
• the termination condition of the PPM algorithm relies on the number of
marked packets collected by the victim;
• the number of marked packets required differs by the size and the struc-
ture of the attack graph; and
• the calculations of the probability density function and the expectation
of the number of marked packets required to reconstruct the attack graph
demand complete knowledge of the attack graph.
The last of the above points is the fatal weakness of the current termination
condition of the PPM algorithm because if one already knows the attack graph,
why would one need the PPM algorithm to find the attack graph? This leads
to an obvious choice: to abandon the current termination condition and devise
a new one.
In this chapter, we devise a new version of the PPM algorithm, and the new
algorithm is called the rectified probabilistic packet marking algorithm (RPPM
algorithm for short). The new PPM algorithm has the following attractive
features:
• the RPPM algorithm reconstructs the attack graph without any knowl-
edge of the attack graph;
• the user of the RPPM algorithm is free to determine the correctness of
the constructed graph; and
• when the RPPM algorithm terminates, the constructed graph is guar-
anteed to reach the correctness assigned by the user independent of the
marking probability and the structure of the underlying network graph.
5.1 Structure of This Chapter
In this chapter, we present the RPPM algorithm in details. First, an overview
of the RPPM algorithm will be given in Section 5.2. Next, we show the
dynamics of the execution of the RPPM algorithm in Section 5.3, through the
introduction of the execution diagram. The execution diagram has provided
the foundation of the way that the RPPM algorithm terminates itself. In
Section 5.4, the derivation of the termination condition of the RPPM algorithm
will be presented. Then, we illustrate the execution of the RPPM algorithm
through a graph reconstruction example in Section 5.5. Simulation results of
the RPPM algorithm are presented in Section 5.6, and the results will display
the robustness of the RPPM algorithm. Last but not least, some deployment
issues of the RPPM algorithm will be disclosed in Section 5.8, and Section 5.9
concludes this chapter.
[Diagram: the marked packet stream and a user-selected correctness P∗ enter the RPPM algorithm, which outputs a correct constructed graph with probability > P∗ and an incorrect one with probability < 1 − P∗.]
Figure 5.1: The design goal of the RPPM algorithm: to have a correct con-structed graph with probability greater than P ∗.
5.2 Overview of the RPPM Algorithm
To achieve the goal mentioned above, we devise a probabilistic computation that
guarantees that the constructed graph is the same as the attack graph with
probability greater than P∗, where we call P∗ the traceback confidence level (it is
analogous to the level of confidence that the algorithm wants to achieve). To
accomplish this goal, the graph reconstruction procedure of the original PPM
algorithm is completely replaced, and we call the new procedure the rectified
graph reconstruction procedure. On the other hand, we preserve the packet
marking procedure so that every router deployed with the PPM algorithm
does not have to be changed.
5.2.1 Working principle
According to the original working principle of the PPM algorithm, one can
observe that when more marked packets arrive at the victim, the constructed
graph grows larger until it becomes the same as the attack graph. Then, the
following important point can be concluded from the above observation:
If there is a marked packet arriving at the victim but the con-
structed graph does not change accordingly, then the probability
that the current constructed graph is the same as the attack graph
will be increased.
In the meantime, we name the period that the constructed graph is not
updated the idle time. Accordingly, the rectified graph reconstruction proce-
dure should then monitor the status of the constructed graph as well as the
idle time. The longer the idle time, the more certain the estimation that the
constructed graph is the attack graph. Nevertheless, we are not going to mea-
sure the idle time in terms of “execution time.” Instead, we measure the idle
time in terms of the number of marked packets received.
Specifically, the rectified graph reconstruction procedure calculates the ter-
mination packet number (TPN for short) whenever the constructed graph is
updated. The TPN is used so that when the number of marked packets re-
ceived by the victim is larger than the TPN but the constructed graph is not
updated, then the RPPM algorithm should stop, and the algorithm should
claim that the probability that the constructed graph is the same as the at-
tack graph is at least P ∗.
5.2.2 Flow of rectified graph reconstruction procedure
Based on the above working principle, we give the pseudocode of the rectified
graph reconstruction procedure in Figure 5.2, and this procedure should
be started as soon as the victim has collected the first marked packet.
When a marked packet arrives at the victim, the procedure first checks if
this packet encodes a new edge. If so, the procedure updates the constructed
graph Gc accordingly. Next, if the constructed graph is connected, where
connected means every router can reach the victim, the procedure calculates
the termination packet number (TPN). We will come back and discuss the case
when the graph is disconnected. The procedure then resets the counter for the
incoming packets to zero, and starts counting the number of incoming packets.
In the meantime, the procedure checks if the number of collected packets is
Rectified Graph Reconstruction Procedure (P∗)

/* Initially, Gc contains the “victim” node only, and pkt_count = 0. */
Foreach incoming packet pkt ; do
    pkt_count := pkt_count + 1 ;
    If the incoming packet pkt contains an edge e that is not included in Gc ; then
        Construct the new attack graph Gc by inserting the edge e ;
        If Gc is a connected graph ; then
            TPN := TPN_subroutine(Gc, P∗) ;
            pkt_count := 0 ;
        end If
    end If
    If Gc is a connected graph ; then
        If pkt_count > TPN ; then
            Return Gc as the constructed attack graph ;
        end If
    end If
end Foreach
Figure 5.2: The pseudocode of the rectified graph reconstruction procedure ofthe RPPM algorithm.
larger than the TPN. If so, the procedure claims that the constructed graph
Gc is the attack graph with probability P ∗. Otherwise, the victim receives
a packet encoding a new edge. Then, the procedure updates the constructed
graph, revisits the TPN calculation subroutine, resets the counter for incoming
packets, and waits until a packet encoding new edge arrives or the number of
incoming packets is larger than the new TPN.
In the case that the constructed graph is disconnected, the procedure should
not calculate the TPN nor terminate its execution. The reason is that an attack
graph must be connected. Then, it would be wrong to return a disconnected
graph as a correct constructed graph. Therefore, the procedure should wait
until the constructed graph becomes connected again.
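The control flow of Figure 5.2 can be sketched as follows. Note that the TPN subroutine is only stubbed out here with a placeholder constant, since its actual derivation is the subject of Section 5.4, and the packet-stream and graph encodings are our own illustrative choices:

```python
# Sketch of the rectified graph reconstruction procedure (Figure 5.2).
# A marked packet is modeled as an edge (upstream, downstream); None models
# a packet whose marking fields are empty.

def tpn_subroutine(gc, p_star):
    return 25        # placeholder only: the real TPN depends on Gc and P* (Section 5.4)

def is_connected(gc, victim="v"):
    """True iff every router in the constructed graph can reach the victim."""
    for start in gc:
        node, hops = start, 0
        while node != victim and node in gc and hops <= len(gc):
            node, hops = gc[node], hops + 1
        if node != victim:
            return False
    return True

def rectified_reconstruction(packets, p_star):
    gc = {}          # constructed graph Gc: upstream router -> downstream node
    pkt_count, tpn = 0, None
    for pkt in packets:
        pkt_count += 1
        if pkt is not None and pkt not in gc.items():
            u, w = pkt
            gc[u] = w                      # insert the new edge into Gc
            if is_connected(gc):
                tpn = tpn_subroutine(gc, p_star)
                pkt_count = 0              # restart the idle-time counter
        if tpn is not None and is_connected(gc) and pkt_count > tpn:
            return set(gc.items())         # claim Gc correct with probability >= P*
    return None                            # stream ended before termination

# a toy run: the three edges of Ga arrive, followed by repeats of (R1, v)
stream = [("R1", "v"), ("R2", "R1"), ("R3", "R2")] + [("R1", "v")] * 30
```

Once the last new edge arrives and the counter of repeated (already-seen) packets exceeds the TPN, the procedure returns the full constructed graph; a stream that leaves the graph disconnected never triggers termination.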
As a result, the termination condition of the RPPM algorithm is “the
counter for the incoming marked packets is larger than the termination packet
number (TPN)”. This shows that “the calculation of the TPN during each
update of the constructed graph” is the core of the RPPM algorithm. In the
next step, we provide a deeper understanding of the RPPM algorithm through
the introduction of the execution diagram.
5.3 Execution Diagram of the RPPM Algorithm
According to the previous section, it is observed that the TPN, the constructed
graph, and the execution of the rectified graph reconstruction procedure are
closely related. Such a relationship can be visualized by the construction of the
execution diagram, as shown in Figure 5.3. The execution diagram presents
the dynamics of the execution of the rectified graph reconstruction procedure.
5.3.1 Types of states
There are two types of states in the diagram, the execution state and the
termination state. When the procedure is running, we say that “the rectified
graph reconstruction procedure is in an execution state”. Otherwise, we say
that “the rectified graph reconstruction procedure is in the termination state”.
The execution states also tell us the state of the constructed graph.
1. When the procedure is in the start state, labeled by “0”, it means the
procedure has started running and there are no edges in the constructed
graph.
2. When the procedure is in a connected state, it means the constructed
graph is connected. A connected state labeled by Ci means the con-
structed graph is connected and contains i edges.
[Diagram: the start state 0 leads via growth transitions through connected states C1, C2, …, Cn and disconnected states D1, D2, …, Dn−1; every connected state also has a termination transition to the STOP state.]
Figure 5.3: An execution diagram of the rectified graph reconstruction proce-dure of the RPPM algorithm constructing a graph with n edges.
3. When the procedure is in a disconnected state, the constructed graph is
disconnected. A disconnected state labeled by Di means the constructed
graph is disconnected and contains i edges.
Note that both the connected and disconnected states, say Ci and Di, respec-
tively, refer to all the possible graphs that have i edges. Last but not least,
when the procedure is in the termination state, it means the procedure has
stopped.
5.3.2 Types of transitions
There are two types of transitions in the execution diagram. When the proce-
dure takes a growth transition, it means a new edge is added to the constructed
graph. When the procedure takes a termination transition, it means the pro-
cedure is going to stop running.
The transition structure in Figure 5.3 is derived from the pseudocode of
the rectified graph reconstruction procedure in Figure 5.2. We briefly describe
the transition structure as follows.
1. If a packet encoding a new edge arrives before the number of received
packets is larger than the TPN, then the procedure takes a growth tran-
sition and proceeds to either a connected state or a disconnected state,
depending on the connectivity of the updated constructed graph.
2. If the number of received packets is larger than the TPN, then the pro-
cedure takes the termination transition and proceeds to the termination
state.
3. If the procedure is in one of the disconnected states, then it is meaningless
to return such a graph as the correct constructed graph; hence, there
is no transition connecting the disconnected states to the termination
state. The procedure keeps collecting packets until it proceeds to a
connected state.
5.3.3 Worst-case, average-case, and best-case scenarios
According to the execution diagram, one can classify three kinds of execution
scenarios of the RPPM algorithm: the worst-case, the average-case,
and the best-case scenarios. This classification is based on the probability that
the RPPM algorithm returns a correct graph.
If one assumes that the constructed graph is always connected, then, at
every state, the victim has to calculate the TPN and wait until the rectified
graph reconstruction procedure makes a transition to the next connected
state or to the termination state. In other words, the procedure is vulnerable to
returning an incorrect result because there is always a non-zero probability that
the procedure terminates prematurely. We name this scenario the worst-case scenario.
On the other hand, if the constructed graph is allowed to enter a disconnected
state, then the procedure does not always have the possibility of entering the
termination state. We name this scenario the average-case scenario.
In addition, there is a possibility that the rectified graph reconstruction
procedure is always in the disconnected states (except when the
constructed graph becomes the attack graph). Then, there is no chance for
the procedure to return an incorrect result. We name this scenario the
best-case scenario. Note that the best-case scenario always results in a successful
graph reconstruction.
5.3.4 Role of the execution diagram
The execution diagram provides a thorough understanding about the relation-
ship among the execution of the rectified graph reconstruction procedure, the
constructed graph, and the TPN. Through the analysis of the execution dia-
gram, it can be observed that different execution scenarios of the procedure
would affect the probability that the procedure returns a correct constructed
graph.
It is observed that the worst-case scenario would be the hardest case for
the rectified graph reconstruction procedure to return a correct graph. There-
fore, it is an ideal point for us to derive the calculation of the TPN. Sup-
pose one could successfully provide a guarantee of the correctness of the con-
structed graph under the worst-case scenario, then such a guarantee can also
be provided in the average-case scenario. Moreover, it is expected that the
average-case scenario should outperform the worst-case scenario in terms of
the successful rate of returning a correct constructed graph. In the next step,
we move on to the derivation of the termination packet number.
5.4 Derivation of Termination Packet Number
In this section, we discuss the derivation of the TPN at each connected state so
that the RPPM algorithm returns a correct constructed graph with probability
larger than P ∗.
5.4.1 Technique
The technique of the TPN calculation is similar to hypothesis testing.
First, say the rectified graph reconstruction procedure is currently
in state Ci. Then, we make the following hypothesis.
Hypothesis: the attack graph has more than i edges.
As more marked packets arrive at the victim while the constructed
graph remains unchanged, the procedure gains increasing confidence
to reject the hypothesis.
The rectified graph reconstruction procedure should express such a confi-
dence in terms of the number of marked packets arriving at the victim. Then,
the TPN is actually a threshold indicating that the procedure has a confidence
larger than P∗ to reject the hypothesis. Therefore, when the TPN is reached, the
constructed graph is the same as the attack graph with probability larger than P∗.
In the following, we introduce the state-change probability, and this is the
first step of the derivation of the TPN. As mentioned at the end of Section
5.3, we consider only the worst-case scenario, i.e., the constructed graph
is always connected.
5.4.2 State-change probability
We denote Pτi(Ci → Ci+1) as the probability that the rectified graph recon-
struction procedure proceeds from state Ci to state Ci+1 with TPN set to τi,
and we name this probability the state-change probability from Ci to Ci+1. In
other words, it is the probability that the victim receives a new edge before
the number of collected marked packets is larger than the TPN τi. Note that
we are not referring to any specific constructed graphs. Instead, as mentioned
in Section 5.3.1, Ci represents all the possible connected graphs with i edges.
Since the probability that the RPPM algorithm returns a correct constructed
graph is equivalent to the probability that the RPPM algorithm makes
a transition of n − 1 steps from state C1 to state Cn, mathematically, we have
the following equation:

P(constructed graph is correct) = ∏_{j=1}^{n−1} P_{τ_j}(C_j → C_{j+1}) .
Then, our claim is correct given that, at every state Ci, the product of the
state-change probabilities from state C1 up to the current state is greater
than P∗:

∏_{j=1}^{i} P_{τ_j}(C_j → C_{j+1}) > P∗ .
For the sake of further presentation, we transform the above equation into
Equation (5.1):

P_{τ_i}(C_i → C_{i+1}) > P∗ / X_{i−1} , where X_{i−1} = ∏_{j=1}^{i−1} P_{τ_j}(C_j → C_{j+1}) . (5.1)
Note that X_{i−1} in Equation (5.1) is the product of the state-change probabilities
of the past states of the rectified graph reconstruction procedure, and we
name it the accumulated state-change probability at state Ci. We will discuss
how to calculate the accumulated state-change probability in Section 5.4.3.
5.4.3 TPN derivation
According to the previous subsection, we know that the TPN at each connected
state can be found by Equation (5.1), which is expressed in terms of the
state-change probability. In this subsection, we derive the TPN by deriving the
state-change probability with the following steps:
1. To recall, the state-change probability is the probability that the con-
structed graph of state Ci evolves into the constructed graph of state
Ci+1. Hence, the first step to calculate the state-change probability is
to find all the graphs that could possibly be the next constructed graph,
and we name this set of graphs the extended graphs.
2. In the second step, for each extended graph Ge, we find the probability
that the current constructed graph becomes the extended graph Ge. As a
matter of fact, the above probability is the state-change probability from
Ci to Ci+1 conditioned that the extended graph Ge is the next constructed
graph, and we name this the conditional state-change probability.
3. From the conditional state-change probability, one can find the
state-change probability (and thus the TPN) through the definition of
conditional probability. Nevertheless, because the calculation of the exact
TPN violates basic assumptions of the traceback problem, the
upper-bounded TPN is derived instead, and the relationship between
the exact TPN and the upper-bounded TPN will be presented.
Extended graphs
The extended graphs are the predictions of the future constructed graph based
on the current graph. Denote the constructed graph in state Ci of the rectified
graph reconstruction procedure as Gi where i ≥ 1. By the assumption that
every router has only one victim route (Assumption 3.6 on Page 90) and the
assumption that every constructed graph is connected (which was made earlier
in this section), when the constructed graph evolves from Gi to Gi+1, exactly
one new edge and one new node are inserted into Gi.
The example in Figure 5.4 helps illustrate the above point. On the left
side of the figure, there is a constructed graph with one edge connecting two
nodes, and the victim and the router are labeled by v and R1, respectively.
Figure 5.4: Extended graph example: a constructed graph and its set of extended graphs.
On the right side of the figure, a new edge is inserted into the constructed graph
at two possible locations: one extended graph has the new edge (R2, R1),
and the other has the new edge (R2, v). We name the introduced edges the
extended edges. Formally, we define the extended graphs of Gi in Definition
5.1, and we define G(Gi) as the set of extended graphs.
Definition 5.1 Let G(Gi) be the set of extended graphs of the constructed
graph Gi = (Vi, Ei) in state Ci of the rectified graph reconstruction procedure.
G(Gi) = { Ge = (Ve, Ee) | ∃ (u, t) ∉ Ei with u ∉ Vi and t ∈ Vi,
such that Ve = Vi ∪ {u} and Ee = Ei ∪ {(u, t)} } .
By the assumption that every constructed graph is connected in this sec-
tion, G(Gi) has already included all the possible candidates for the next con-
structed graph Gi+1. So, in the next step, we assume that an extended graph
Ge is the next constructed graph Gi+1. Then, we calculate the state-change
probability conditioned on Gi+1 = Ge, and we call it the conditional state-change
probability. Lastly, by using the definition of conditional probability:

P_{τ_i}(C_i → C_{i+1}) = Σ_{Ge ∈ G(Gi)} P_{τ_i}(C_i → C_{i+1} | G_{i+1} = Ge) × P(G_{i+1} = Ge) ,

we have the state-change probability.
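As an illustration, Definition 5.1 can be turned into a small enumeration routine. The sketch below is not part of the thesis; the graph representation (a node set, an edge set, and a hypothetical pool of candidate routers not yet in the graph) is chosen only for demonstration.

```python
def extended_graphs(nodes, edges, candidate_routers):
    """Enumerate G(Gi) per Definition 5.1: every extended graph adds one
    new node u (not yet in the graph) and one new edge (u, t) with t in Vi."""
    result = []
    for u in candidate_routers:
        if u in nodes:
            continue  # u must be a node outside Vi
        for t in nodes:
            result.append((nodes | {u}, edges | {(u, t)}))
    return result

# The example of Figure 5.4: G1 has nodes {v, R1} and the single edge (R1, v).
# With one candidate router R2, there are exactly two extended graphs,
# one with the extended edge (R2, R1) and one with (R2, v).
g_ext = extended_graphs({"v", "R1"}, {("R1", "v")}, ["R2"])
```

This reproduces the two extended graphs of Figure 5.4, each obtained by attaching the new router R2 to one of the two existing nodes.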
The conditional state-change probability
The conditional state-change probability is calculated according to the follow-
ing rationale. If one assumes that Gi+1 = Ge, then one knows the topology of
the next constructed graph, and also knows where the extended edge is. Then,
the state-change probability is equivalent to the probability that a packet en-
coding the extended edge arrives at the victim before the number of collected
packets is larger than the TPN.
The probability that a packet encoding the extended edge e′ arrives at the victim is exactly the
packet-type probability P(T(Ge) = e′). Because the marking process of each
packet is independent, the state-change probability conditioned on Gi+1 = Ge
is therefore given by the following equation:

P_{τ_i}(C_i → C_{i+1} | G_{i+1} = Ge) = 1 − (1 − P(T(Ge) = e′))^{τ_i} . (5.2)
Note that Equation (5.2) is an increasing function with respect to τ_i because:

d/dx [ 1 − (1 − P(T(Ge) = e′))^x ] = −(1 − P(T(Ge) = e′))^x log(1 − P(T(Ge) = e′)) > 0 ,

where x > 0 and P(T(Ge) = e′) ∈ (0, 1).
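The increasing behaviour of Equation (5.2) in τ_i can also be checked numerically. A small sketch (the value p = 1/3 is an arbitrary packet-type probability chosen for illustration, not a value fixed by the derivation):

```python
def cond_state_change_prob(p, tau):
    # Eq. (5.2): 1 - (1 - P(T(Ge) = e'))^tau for an extended edge with
    # packet-type probability p, over tau independently marked packets.
    return 1.0 - (1.0 - p) ** tau

# Strictly increasing in tau for any p in (0, 1), as the derivative argument shows.
probs = [cond_state_change_prob(1/3, tau) for tau in range(1, 8)]
```

Each successive value of `probs` is strictly larger than the previous one, matching the sign of the derivative above.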
To continue with the calculation of the state-change probability, the prob-
ability P (Gi+1 = Ge) has to be known. However, this is prohibited by the
assumption that the victim does not have any information about the attack
graph. As an alternative, the upper-bounded TPN will be derived instead.
Upper-bounded TPN
Since the conditional state-change probability is increasing with respect to τ_i
(stated in the note of Equation (5.2)), one can always find a sufficiently large
integer, τ∗_i, such that:

P_{τ∗_i}(C_i → C_{i+1} | G_{i+1} = Ge) > P∗ / X_{i−1} , ∀ Ge ∈ G(Gi) . (5.3)
By the above idea, we have:

P_{τ∗_i}(C_i → C_{i+1}) = Σ_{Ge ∈ G(Gi)} P_{τ∗_i}(C_i → C_{i+1} | G_{i+1} = Ge) × P(G_{i+1} = Ge)
                        > Σ_{Ge ∈ G(Gi)} (P∗ / X_{i−1}) × P(G_{i+1} = Ge) . (By Eq. (5.3).)

Since Σ_{Ge ∈ G(Gi)} P(G_{i+1} = Ge) = 1, we have P_{τ∗_i}(C_i → C_{i+1}) > P∗ / X_{i−1}.
Hence, this shows that τ∗_i can also be a TPN of state Ci because Equation (5.1)
is satisfied. By the above arguments, it remains to confirm the existence
of an integer τ∗_i large enough to satisfy Equation (5.3). From Equation
(5.3), we have:

P_{τ∗_i}(C_i → C_{i+1} | G_{i+1} = Ge) > P∗ / X_{i−1}
⇒ 1 − (1 − P(T(Ge) = e′))^{τ∗_i} > P∗ / X_{i−1} (By Eq. (5.2).)
⇒ τ∗_i > log(1 − P∗ / X_{i−1}) / log(1 − P(T(Ge) = e′)) .
Since the TPN is an integer, we have:

τ∗_i = ⌊ Y_i(Ge) + 1 ⌋ , where Y_i(Ge) = log(1 − P∗ / X_{i−1}) / log(1 − P(T(Ge) = e′)) .
Further, by the monotonically increasing property of the logarithmic function,
Y_i(Ge) is monotonically decreasing with respect to P(T(Ge) = e′). Thus, by
finding the value min_{Ge ∈ G(Gi)} P(T(Ge) = e′), the maximum value of τ∗_i over the
set of extended graphs G(Gi) can be found. Therefore,
Upper-bounded TPN

τ∗_i = ⌊ log(1 − P∗ / X_{i−1}) / log(1 − p_min) + 1 ⌋ ,
where p_min = min_{Ge ∈ G(Gi)} P(T(Ge) = e′) . (5.4)
Remark. The upper-bounded TPN derived in Equation (5.4) may not be the
exact value of the TPN: if the extended graph corresponding to p_min
in Equation (5.4) is not the next constructed graph Gi+1, then the true TPN
is smaller (by the decreasing property of Y_i(Ge) in the proof). That is
why we name τ∗_i the upper-bounded TPN.
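Equation (5.4) translates into a one-line computation. A sketch, assuming p_min and the accumulated state-change probability X_{i−1} have already been obtained:

```python
import math

def upper_bounded_tpn(p_star, x_prev, p_min):
    """Eq. (5.4): the upper-bounded TPN tau*_i.

    p_star -- traceback confidence level P*
    x_prev -- accumulated state-change probability X_{i-1}
    p_min  -- minimum packet-type probability over the extended edges
    """
    return math.floor(math.log(1.0 - p_star / x_prev)
                      / math.log(1.0 - p_min) + 1.0)
```

For example, with P∗ = 0.5, X_0 = 1, and p_min = 1/3 (the state C1 configuration of the worked example in Section 5.5), the function returns 2.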
Calculation of the accumulated state-change probability
According to Equation (5.1), the accumulated state-change probability is given
by:

X_{i−1} = ∏_{j=1}^{i−1} P_{τ∗_j}(C_j → C_{j+1})
        = X_{i−2} × P_{τ∗_{i−1}}(C_{i−1} → C_i) , if i > 1 ;
          1 , if i = 1 .
Since the exact state-change probability is not derived, we opt to calculate the
accumulated state-change probability after the state of the rectified graph
reconstruction procedure has changed.

Let us consider the scenario in which the constructed graph changes from
Gi−1 to Gi. After the state has changed, the probability P(Gi = Ge)
becomes either one or zero for every extended graph Ge, that is:

P(Gi = Ge) = 0 , if Ge ∈ G(Gi−1) − {Gi} ;
P(Gi = Ge) = 1 , if Ge = Gi .
Then, the state-change probability P_{τ∗_{i−1}}(C_{i−1} → C_i) becomes:

P_{τ∗_{i−1}}(C_{i−1} → C_i)
  = Σ_{Ge ∈ G(Gi−1)} P_{τ∗_{i−1}}(C_{i−1} → C_i | Gi = Ge) × P(Gi = Ge)
  = P_{τ∗_{i−1}}(C_{i−1} → C_i | Gi = Gi) × P(Gi = Gi)
  = 1 − (1 − P(T(Gi) = e_i))^{τ∗_{i−1}} , (By Eq. (5.2).)

where e_i is the new edge added to Gi.
Hence, the accumulated state-change probability Xi−1 can be obtained after
the rectified graph reconstruction procedure has proceeded from state Ci−1 to
state Ci. Equation (5.5) presents the calculation of the accumulated state-
change probability.
Accumulated State-Change Probability

X_{i−1} = X_{i−2} × ( 1 − (1 − P(T(Gi) = e_i))^{τ∗_{i−1}} ) , if i > 1 ;
X_{i−1} = 1 , if i = 1 . (5.5)
The accumulated state-change probability for a disconnected state
We now consider the case when the assumption that the constructed graph is
always connected is removed, i.e., a normal execution of the RPPM algorithm.
Suppose that the rectified graph reconstruction procedure enters the disconnected
state Di+1 from the connected state Ci; then, the update of the accumulated
state-change probability has to be changed.
According to the previous discussion, the accumulated state-change probability
depends on the constructed graph in state Di+1, which is disconnected.
Nevertheless, because the graph Gi+1 is disconnected, the packet-type
probability P(T(Gi+1) = ei+1) cannot be found. As an alternative, we choose
min_{Ge ∈ G(Gi)} P(T(Ge) = e′) in Equation (5.4) as the value of P(T(Gi+1) = ei+1)
in Equation (5.5). The reason for this choice is as follows:
τ∗_i > log(1 − P∗ / X_{i−1}) / log(1 − p_min) ⇒ X_{i−1} × (1 − (1 − p_min)^{τ∗_i}) > P∗ ,

where p_min = min_{Ge ∈ G(Gi)} P(T(Ge) = e′).
Hence, the accumulated state-change probability is still larger than the
traceback confidence level P ∗ by choosing minGe∈G(Gi) P (T (Ge) = e′) as the
value of P (T (Gi+1) = ei+1) in Equation (5.5). In the next subsection, we
conclude this section and provide the pseudocode of the TPN calculation sub-
routine.
TPN subroutine(Graph G, Traceback Confidence Level P∗)
/* Let the variables τ, X, and p_min be static variables, which means
the values of these variables are not erased after exiting the subroutine. */
1.  If G is not connected & G.edge > 0 ; then
2.    If the previous state is a connected state ; then
3.      X := X × (1 − (1 − p_min)^τ) ;
4.    end If
5.    exit the subroutine ;
6.  end If
7.  If the previous state is a connected state & G.edge > 0 ; then
8.    p := the packet-type probability of the new edge of the constructed graph ;
9.    X := X × (1 − (1 − p)^τ) ;
10. end If
11. p_min := 1 ;
12. Foreach extended graph Ge in G(G) ; do
13.   p := the packet-type probability of the extended edge of Ge ;
14.   p_min := min(p_min, p) ;
15. end Foreach
16. τ := ⌊log(1 − P∗/X) / log(1 − p_min) + 1⌋ ;
17. return τ ;

Figure 5.5: The pseudocode of the termination packet number (TPN) calculation subroutine.
5.4.4 Section summary and TPN calculation subroutine
To summarize, we have presented how one can calculate the TPN at every con-
nected state of the graph construction procedure so that the RPPM algorithm
returns a correct constructed graph with a specified probability P ∗.
Figure 5.5 shows the subroutine that calculates the TPN, and it is ex-
ecuted whenever the rectified graph reconstruction procedure enters a new
state. When the routine is visited for the first time, the variable “X” that is
used to store the accumulated state-change probability is initialized to one.
Next, based on the connectivity of the current constructed graph, the variable
“X” is updated in different ways: 1) if the current constructed graph is con-
nected, the subroutine calculates the packet-type probability of the new edge
and then updates the variable “X”; and 2) if the current constructed graph
is disconnected, the subroutine uses the minimum packet-type probability of
the extended edge that was chosen from the extended graphs of the previous
constructed graph, i.e., "p min" in the pseudocode in Figure 5.5. Next, if the
current constructed graph is disconnected, the TPN subroutine does not calculate
the TPN and simply exits. Otherwise, the subroutine
calculates the TPN based on Equation (5.4). Finally, the subroutine returns
the calculated TPN.
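The pseudocode of Figure 5.5 can be sketched in Python as follows. The two callbacks stand in for machinery defined elsewhere in the chapter and are therefore assumptions of this sketch; the static variables τ, X, and p_min of the pseudocode become instance attributes.

```python
import math

class TPNCalculator:
    """A sketch of the TPN calculation subroutine of Figure 5.5.

    Assumed callbacks (not defined in this excerpt):
      p_new_edge(g)          -- packet-type probability of the edge just
                                added to the constructed graph g
      extended_edge_probs(g) -- packet-type probabilities of the extended
                                edges of all extended graphs of g
    """

    def __init__(self, p_star, p_new_edge, extended_edge_probs):
        self.p_star = p_star
        self.p_new_edge = p_new_edge
        self.extended_edge_probs = extended_edge_probs
        self.tau = 0          # static variable tau
        self.x = 1.0          # static variable X
        self.p_min = 1.0      # static variable p_min
        self.prev_connected = False

    def __call__(self, g, connected, num_edges):
        # Lines 1-6: in a disconnected state, only fold p_min into X.
        if not connected and num_edges > 0:
            if self.prev_connected:
                self.x *= 1.0 - (1.0 - self.p_min) ** self.tau
            self.prev_connected = False
            return None  # no TPN is computed in a disconnected state
        # Lines 7-10: fold the realized state-change probability into X.
        if self.prev_connected and num_edges > 0:
            p = self.p_new_edge(g)
            self.x *= 1.0 - (1.0 - p) ** self.tau
        # Lines 11-15: minimum packet-type probability over extended edges.
        self.p_min = min(self.extended_edge_probs(g))
        # Line 16: the upper-bounded TPN of Eq. (5.4).
        self.tau = math.floor(math.log(1.0 - self.p_star / self.x)
                              / math.log(1.0 - self.p_min) + 1.0)
        self.prev_connected = True
        return self.tau

# Worst-case walk of the 3-router linear example of Section 5.5; the
# packet-type probabilities below are copied from Tables 5.1 and 5.2.
calc = TPNCalculator(
    p_star=0.5,
    p_new_edge=lambda g: 1/3,  # edge (R2, R1) added when entering C2
    extended_edge_probs=lambda g: [1/3, 1/2] if g == "G1" else [1/7, 1/6, 2/5],
)
tau1 = calc("G1", connected=True, num_edges=1)  # TPN of state C1
tau2 = calc("G2", connected=True, num_edges=2)  # TPN of state C2
```

Consistent with the worked example in Section 5.5, this walk yields τ1 = 2 and τ2 = 15, and X becomes 5/9 upon entering state C2.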
Having derived the calculation of the TPN, we next illustrate the
interactions between the execution of the rectified graph reconstruction
procedure, the growth of the constructed graph, and the calculation of the
TPN in the next section.
5.5 Graph Reconstruction Example
In this section, we illustrate the entire graph reconstruction process of the
RPPM algorithm through an example. The example follows the pseudocodes
of the packet-type probability calculation, the rectified graph reconstruction
Figure 5.6: State C1: a constructed graph with one edge, and its extended graphs.
procedure, and the TPN calculation subroutine in Figures 4.5 (on Page 104),
5.2 (on Page 125), and 5.5 (on Page 138), respectively.
We have the following configuration in the example: the attack graph is
the linear network with three routers shown in Figure 4.7 (on Page 108), the
marking probability pm is 0.5, and the traceback confidence level P∗ is 0.5.
Again, we present the example with the assumption that the constructed graph
is always connected (the worst-case scenario). Moreover, we assume that the
victim counts only the marked packets.
5.5.1 State C1
According to the assumption that every constructed graph is connected, the
victim first receives the edge (R1,V) and the rectified graph reconstruction
procedure enters state C1.
As the constructed graph G1 is connected, the TPN calculation subroutine
is executed. Firstly, one has to construct the extended graphs of G1. Since
G1 has two nodes, there will be two extended graphs, namely G1,1 and G1,2, as
shown in Figure 5.6. Secondly, one has to calculate the marked packet-type
probabilities for both extended graphs, and the results are listed in Table 5.1.
Edges (e) of G1,1:     (R1, V)   (R2, R1)
P(Tm(G1,1) = e):        2/3       1/3

Edges (e) of G1,2:     (R1, V)   (R2, V)
P(Tm(G1,2) = e):        1/2       1/2

Table 5.1: The marked packet-type probabilities of the extended graphs G1,1 and G1,2.
According to the TPN calculation subroutine, one has to find the minimum
marked packet-type probability of the extended edges of the two extended
graphs. Referring to Table 5.1, the minimum value is 1/3. Then, the TPN of C1,
τ1, is calculated as follows:

τ1 = ⌊ log(1 − 0.5/1) / log(1 − 1/3) + 1 ⌋ = ⌊2.7095⌋ = 2 .

Note that the variable X in the TPN calculation subroutine is initialized to
one.
Therefore, the victim should wait for two marked packets before it stops the
rectified graph reconstruction procedure. Suppose that a packet encoding the
edge (R2, R1) arrives before the victim exits the rectified graph reconstruction
procedure. Then, the procedure enters state C2.
5.5.2 State C2
Since the constructed graph C2 is again connected, the TPN calculation sub-
routine is executed. Before the subroutine starts calculating the TPN, the
accumulative state-change probability, i.e, the variable X in the TPN calcula-
tion subroutine, should be updated as follows:
X = 1 × (1 − (1 − 1/3)^2) = 5/9 .
Figure 5.7: State C2: a constructed graph with two edges, and its extended graphs.
The current constructed graph G2 and the extended graphs of G2, namely G2,1,
G2,2, and G2,3, are shown in Figure 5.7. The marked packet-type probabilities
for the three extended graphs are listed in Table 5.2, and the minimum marked
packet-type probability of the extended edges is 1/7, from the extended graph G2,1.
Then, the TPN τ2 is calculated as follows:

τ2 = ⌊ log(1 − 0.5/(5/9)) / log(1 − 1/7) + 1 ⌋ = ⌊15.9372⌋ = 15 .
Hence, the victim should wait for 15 marked packets before it stops the
RPPM algorithm.
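As a sanity check, the two TPN values of this worked example follow directly from Equations (5.4) and (5.5). A short sketch (plain Python, not part of the thesis):

```python
import math

def tpn(p_star, x, p_min):
    # Upper-bounded TPN of Eq. (5.4)
    return math.floor(math.log(1 - p_star / x) / math.log(1 - p_min) + 1)

x1 = 1.0                             # X is initialized to one in state C1
tau1 = tpn(0.5, x1, 1/3)             # state C1: p_min = 1/3
x2 = x1 * (1 - (1 - 1/3) ** tau1)    # Eq. (5.5) update on entering C2
tau2 = tpn(0.5, x2, 1/7)             # state C2: p_min = 1/7
```

Running this yields tau1 = 2, x2 = 5/9, and tau2 = 15, matching the calculations above.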
5.5.3 State C3
Suppose that a packet encoding the new edge (R3, R2) arrives before the number of collected marked
packets exceeds the TPN of state C2. Now, the constructed graph is
exactly the same as the attack graph and the rectified graph reconstruction
procedure enters the state C3. Nevertheless, the procedure has not yet stopped
as the victim does not know the attack graph. By similar steps, one can find
Edges (e) of G2,1:     (R1, V)   (R2, R1)   (R3, R2)
P(Tm(G2,1) = e):        4/7       2/7        1/7

Edges (e) of G2,2:     (R1, V)   (R2, R1)   (R3, R1)
P(Tm(G2,2) = e):        2/3       1/6        1/6

Edges (e) of G2,3:     (R1, V)   (R2, R1)   (R3, V)
P(Tm(G2,3) = e):        2/5       1/5        2/5

Table 5.2: The marked packet-type probabilities of the extended graphs G2,1, G2,2, and G2,3.
that the victim has to wait for 100 marked packets before it stops the RPPM
algorithm. Since no new types of marked packets arrive, the procedure
eventually enters the termination state.
In the next section, we present the simulation results of the RPPM algo-
rithm which show the correctness and robustness of the RPPM algorithm.
5.6 Simulation Result
In this section, we present the simulation results to show that the RPPM algo-
rithm is able to guarantee the correctness of the constructed graph independent
of the marking probability and the structure of the attack graph. First, we
describe the simulation environment.
5.6.1 Simulation environment
Every simulation of the RPPM algorithm starts with a testing network rooted
at the victim, i.e., the attack graph. The configuration of the network follows
the assumptions stated in Section 3.3 (on Page 85). In addition, the network
has at least one leaf router, i.e., a router with zero incoming edges. Each edge
between two routers is directed and is assumed to have infinite capacity; thus,
no packet is lost in this environment.
Next, we describe the properties of the simulated packets. All packets are
homogeneous in terms of type, size, etc. Every packet’s destination is set to
the victim, and every packet starts its itinerary at one of the leaf routers of
the testing network, chosen at random. Further, the paths traversed by the
packets are chosen at random.
5.6.2 Simulation: different values of the marking prob-
ability
In this set of simulations, the impact of the marking probability on the success-
ful rate of the RPPM algorithm will be studied. As presented in Section 4.2
(on Page 100), the marking probability is one of the factors that determines
the packet-type probability and also the termination packet number. As a
matter of fact, the marking probability is closely related to the occurrences of
the different execution scenarios described in Section 5.3.3.
A high value of the marking probability is analogous to the worst-case
scenario. If the value of the marking probability is high, most of the arriving
packets encode edges that are close to the victim. Then, the constructed
graph is connected with a very high probability at all times, and thus, this case is
analogous to the worst-case scenario. On the contrary, the execution of the
RPPM algorithm is close to the best-case scenario when the value of the
marking probability is very low.
We have conducted a set of simulations to verify the above claims. In this
set of simulations, the testing network is the linear network depicted in Figure
4.7 (on Page 108). The simulations are performed at three different values of
the marking probability: 0.1, 0.5, and 0.9.
Figure 5.8: The simulations show that the larger the marking probability is, the closer the simulation result is to the worst-case execution.
Calculation of the successful rate
The results of the simulations are shown in Figure 5.8. Each data point in the
figure is obtained in the following way. For instance, we are computing the
data point for the plot with marking probability set to 0.1 and the traceback
confidence level set to zero. The RPPM algorithm is executed 10,000 times
with the marking probability set to 0.1 and the traceback
confidence level set to zero. For each instance of the RPPM algorithm, the
returned constructed graph is compared with the input network graph, i.e., the
3-router linear network in this case. If the constructed graph is the same as the
input network graph, then it is considered to be a successful execution. Otherwise,
it is considered to be a failed execution. Hence, we are employing a stricter
definition of correctness than the one defined in Definition 3.1 on Page 82.
Lastly, the successful rate, i.e., the y-axis of the simulation result, is obtained
as follows:

Successful rate = (Number of successful executions) / (Number of executions) .

Therefore, the successful rate is the value showing how well the RPPM algorithm
performed. Note that the above calculation of the successful rate applies to
every simulation within this chapter.
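The successful-rate computation amounts to a fraction of matching runs. A minimal sketch (the run outcomes here are hypothetical booleans; in the actual simulations each outcome is the comparison of the returned constructed graph against the input network):

```python
def successful_rate(outcomes):
    # One boolean per execution of the RPPM algorithm:
    # True iff the returned constructed graph equals the input network graph.
    outcomes = list(outcomes)
    return sum(outcomes) / len(outcomes)

# Hypothetical batch of four executions, three of which succeeded.
rate = successful_rate([True, True, False, True])
```

For the hypothetical batch above, the rate is 0.75; each data point in Figure 5.8 is such a fraction over 10,000 executions.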
Definition of the bottom line
Back to Figure 5.8: in addition to the simulation results, there is an extra plot
in the figure named the "bottom line", representing the function y = x. If
a data point is above the bottom line, i.e., the successful rate is higher than
the traceback confidence level, then the RPPM algorithm can
guarantee the correctness of the constructed graph, and vice versa. Therefore,
we expect that no data point would appear below the bottom line. Note
that the above definition of the bottom line applies to every simulation within
this chapter.
Simulation result
We now analyze the simulation result. Firstly, all the data points are above
the bottom line, and this shows that the RPPM algorithm can guarantee the
correctness of the constructed graph under different values of the marking
probability.
Secondly, one can observe that as the marking probability increases, the
rate at which the RPPM algorithm returns a correct graph decreases. With
pm = 0.9, the plot is very close to the bottom line. According to the above
definition of the bottom line, this simulation result means the successful rate
is very close to the traceback confidence level, which implies the worst-case
scenario.
Figure 5.9: RPPM algorithm simulation: 15-node linear network with random marking probability.
Through this set of simulations, we showed that the RPPM algorithm can
guarantee the correctness of the constructed graph under different values of
the marking probability.
5.6.3 Simulation: different graph structures
The second set of simulations tests whether the RPPM algorithm can guarantee
the promised successful rate under different graph structures. In this set of
simulations, we execute the simulations under both the worst-case and the
average-case scenarios. The worst-case scenario is forced to happen by
restricting the packet generation process, while the average-case scenario is a
normal execution of the RPPM algorithm without any constraints. Also, for
each execution of the RPPM algorithm, the marking probability is set to a
random number from 0.1 to 0.9 inclusive.
The simulation results for the linear network, the binary-tree network, and
Figure 5.10: RPPM algorithm simulation: 14-router binary-tree network with random marking probability.
Figure 5.11: RPPM algorithm simulation: 14-router random-tree network with random marking probability.
Figure 5.12: RPPM algorithm simulation: 100-router random-tree network with marking probability = 0.1.
Figure 5.13: RPPM algorithm simulation: 500-router random-tree network with marking probability = 0.1.
Figure 5.14: RPPM algorithm simulation: 1,000-router random-tree network with marking probability = 0.1.
the random-tree network containing 14 routers and one victim are shown in Figures 5.9, 5.10, and 5.11, respectively. The topologies of the linear and the binary-tree networks are self-explanatory, and a random-tree network means the nodes are randomly connected under the following constraints: 1) every router can reach the victim in a non-zero number of hops; 2) there must be no cycles in the graph; 3) the victim must not have any outgoing edges; and 4) every router can have only one outgoing edge. Also, as [78] suggests that the longest router path in the Internet is 32 hops, the maximum length of the paths in the testing network is therefore 32.
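The four constraints amount to growing a directed tree rooted at the victim, with each new router attached below an already-placed node. A minimal sketch of such a generator (our own construction and naming, not the simulator used in the thesis):

```python
import random

def random_tree_network(num_routers, max_path_len=32, seed=None):
    """Generate a random-tree network as a map: router -> next hop.

    Node 0 is the victim; routers are numbered 1..num_routers.  Each
    router is attached under one already-placed node, so every router
    has exactly one outgoing edge, the graph has no cycles, the victim
    has no outgoing edge, and every router reaches the victim.  Nodes
    already at depth max_path_len are excluded as attachment points,
    capping the longest router-to-victim path.
    """
    rng = random.Random(seed)
    parent = {}        # router -> next hop towards the victim
    depth = {0: 0}     # hop count to the victim
    for r in range(1, num_routers + 1):
        p = rng.choice([n for n in depth if depth[n] < max_path_len])
        parent[r] = p
        depth[r] = depth[p] + 1
    return parent, depth
```

Because every router picks its single outgoing edge from nodes that already reach the victim, all four constraints hold by construction.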
All three results show that, no matter what the network is, all the data points are above the bottom line. Hence, the RPPM algorithm guarantees the correctness of the constructed graph independent of the structure of the real network graph. Also, the simulation results support the claim that the average-case scenario outperforms the worst-case scenario in terms of the successful rate. Further, we extend the simulations on the
random-tree network to larger network scales with 100, 500, and 1000 routers,
and the results are shown in Figures 5.12, 5.13, and 5.14, respectively. Accord-
ing to the results, the increasing network scale does not affect the guarantee
provided by the RPPM algorithm.
5.6.4 Section summary
In conclusion, the simulation results showed that the RPPM algorithm guar-
antees the correctness of the constructed graph independent of the marking
probability and the structure of the attack graph.
5.7 Supporting Routers with Multiple Victim Routes
In this section, we relax the assumption that every router has only one outgoing route towards the victim, i.e., Assumption 3.6 on Page 90. This change may cause the attack packets to take more than one path towards the victim, and the routers in the constructed graph may have more than one outgoing edge.
In the following, we first discuss the problem that emerges when the RPPM algorithm is applied to routers having multiple victim routes, and a set of simulations is performed to illustrate the severity of the problem. Second, we present the solution to the problem caused by the relaxed assumption: the method is to introduce an extra set of extended graphs. Lastly, we perform simulations based on this solution and compare the results with and without the support of multiple victim routes.
5.7.1 Problem of multiple victim routes
Originally, without considering routers having multiple victim routes, the arrival of a new encoded edge adds only a new node and a new edge to the constructed graph (and note that this is the worst-case execution scenario).
However, when we allow a router to have multiple victim routes, the arrival of
a marked packet encoding a new edge can result in two different scenarios: (i)
a new node is added, i.e., one node plus one edge; or (ii) no new node is added,
which means the new edge connects two existing nodes. Since the latter case is
not considered by the RPPM algorithm, one may then doubt the guarantee of
the successful rate of the RPPM algorithm. The following simulation supports
this doubt.
The simulation environment
The testing network is a random-tree network with 10 nodes: one victim plus
nine routers. But this time, we allow the routers in the testing network to
have more than one victim route. Again, the marking probability is set to a
random number in [0.1 : 0.9], and the values are the same for all routers.
The simulation result
Figure 5.15 shows the simulation results for both the average-case and the
worst-case executions. For small values of the traceback confidence level, the
successful rates of both execution modes are still over the bottom line. How-
ever, the successful rate of the worst-case execution falls below the bottom line
when the traceback confidence level goes beyond 0.54 while the successful rate
of the average-case execution falls below the bottom line when the traceback
confidence level goes beyond 0.59.
One can conclude that the RPPM algorithm cannot provide a guarantee of
the successful rate in reconstructing the attack graph when the routers have
multiple outgoing routes towards the victim.

Figure 5.15: When the routers have more than one victim route, the RPPM algorithm cannot guarantee the correctness of the constructed graph when the confidence level is larger than 0.59. [Plot: successful rate against traceback confidence level; average-case and worst-case executions with a single victim route, and the bottom line.]
5.7.2 Formulating an extra set of extended graphs
To solve the problem, we suggest introducing an extra set of extended graphs.
The new set of extended graphs is defined in Definition 5.2.
Definition 5.2 Let G′(Gi) be the set of extended graphs of the constructed graph Gi = (Vi, Ei) that supports multiple outgoing routes towards the victim:

G′(Gi) = { G′e = (Vi, E′e) | ∃ (u, v) ∉ Ei and u, v ∈ Vi such that E′e = Ei ∪ {(u, v)} },

and all graphs in G′(Gi) must not have any cycles.
According to Definition 5.2, an extended graph in G′(Gi) introduces an extra edge to the constructed graph without an extra node. The edge connects any two existing nodes with two restrictions: (1) no cycles and (2) a multigraph should not be formed. Then, this definition creates a family of extended graphs with routers having multiple victim routes.

Figure 5.16: An illustration of the extended graph with the support of multiple victim routes. [Diagram: a constructed graph with routers R1, R2 and the victim v (top), and the new extended graph with the extra edge from R2 to v (bottom).]
We illustrate the definition of the new set of extended graphs through an example in Figure 5.16. The upper part of the figure shows a constructed graph with two routers, R1 and R2, and the victim v, and the lower part of the figure shows the new extended graph. For this example, there can only be one extra edge (R2, v) according to Definition 5.2.
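Definition 5.2 can be checked mechanically. The sketch below (helper names are our own) enumerates the admissible extra edges for a given constructed graph, rejecting duplicate edges, edges leaving the victim, and edges that would close a cycle:

```python
from itertools import permutations

def extended_graphs(nodes, edges, victim):
    """Enumerate the extra edges defining G'(G_i) in Definition 5.2.

    Each candidate edge (u, v) joins two existing nodes, duplicates no
    existing edge (no multigraph), leaves the victim without outgoing
    edges, and must not close a cycle.  Each extended graph is the
    original edge set plus one returned edge.
    """
    def reachable(src, dst, edge_set):
        # DFS over the directed edges to test whether src can reach dst.
        stack, seen = [src], set()
        while stack:
            n = stack.pop()
            if n == dst:
                return True
            if n in seen:
                continue
            seen.add(n)
            stack.extend(w for (x, w) in edge_set if x == n)
        return False

    extras = []
    for u, v in permutations(nodes, 2):
        if u == victim or (u, v) in edges:
            continue
        # Adding (u, v) closes a cycle exactly when v already reaches u.
        if not reachable(v, u, edges):
            extras.append((u, v))
    return extras
```

For the graph of Figure 5.16, with edges (R1, v) and (R2, R1), the only admissible extra edge is (R2, v), matching the example.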
5.7.3 Reformulation of packet-type probability
As Assumption 3.6 is violated, the calculation of the packet-type probability, which was formulated based on this assumption, is invalidated too. A new formulation of the packet-type probability is required. Here, we repeat some parts of the derivation of the packet-type probability in Section 4.2.1 on Page 101.
Via probability
In the original calculation of the via probability presented in Equation (4.4) on Page 102, the calculation considers only one path from a leaf router Rl to the victim V. In the current scenario, there can be more than one path between these two endpoints. Hence, each path between the endpoints should be considered, and we assume that every path in Path(Rl, V) is equally likely to be chosen by a packet. Therefore, the new via probability is given as follows.
Via Probability for Multiple Victim Routes

Pv((Ri, Rj)) = (1/|L(G)|) × Σ_{Rl ∈ L(G)} Σ_{r ∈ Path(Rl, V)} δ(r, (Ri, Rj)) × (1/|Path(Rl, V)|) .    (5.6)
Conditional encoding probability
Since the conditional encoding probability does not concern how the paths are selected, but only which edge is encoded given the selected path, the conditional encoding probability does not need to change, and it is given by Equation (4.5) on Page 103.
Reformulated packet-type probability
Finally, we have the reformulated packet-type probability as follows.

Packet-type Probability for Multiple Victim Routes

P(T(G) = (Ri, Rj)) = (1/|L(G)|) × Σ_{Rl ∈ L(G)} Σ_{r ∈ Path(Rl, V)} δ(r, (Ri, Rj)) × (1/|Path(Rl, V)|) × pm × (1 − pm)^(d((Ri, Rj), V, r) − 1) .    (5.7)
Packet type(Graph G)
/* The size of the "result" array is equal to the number of edges of the input graph "G". */
1.  result := allocate memory(G.edge) ;
2.  For (i := 0; i < G.edge; i := i + 1)
3.      result[i] := 0 ;
4.  end For
5.  Foreach leaf in G ; do
        /* "search path" finds all the paths from leaf to victim */
6.      path set := search path(leaf, victim) ;
7.      Foreach path in path set ; do
8.          Foreach edge in path ; do
9.              length := edge distance function(edge, victim, path) ;
10.             result[edge] := result[edge] + 1.0/G.leaf num × 1.0/path.num ×
11.                             pm × (1 − pm)^(length−1) ;
12.         end Foreach
13.     end Foreach
14. end Foreach
15. return result ;

Figure 5.17: The pseudocode of the packet-type probability calculation subroutine which supports multiple victim routes.
Reformulated packet-type probability calculation pseudocode
Lastly, the pseudocode for calculating the reformulated packet-type probabilities of a given graph differs from the one given in Figure 4.5 on Page 104.

The new pseudocode is displayed in Figure 5.17. The difference between the new and the old pseudocode is the "for loop" starting from line 7 in Figure 5.17. This loop iterates over every path leading from a leaf router (leaf in line 5) towards the victim.
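A runnable sketch of the subroutine in Figure 5.17 (the function name and the input format, a map from each leaf to its list of edge paths, are our own; the thesis's "search path" step is assumed to have been done by the caller):

```python
def packet_type_probabilities(paths_by_leaf, pm):
    """Probability that a marked packet encodes each edge (Equation 5.7).

    `paths_by_leaf` maps each leaf router to the list of its paths to
    the victim, each path given as an ordered list of edges starting at
    the leaf.  Leaves, and the paths of a leaf, are taken as equally
    likely, matching the assumptions behind Equation (5.7).
    """
    prob = {}
    num_leaves = len(paths_by_leaf)
    for paths in paths_by_leaf.values():
        for path in paths:
            for i, edge in enumerate(path):
                # d(edge, victim, path): hops from the edge to the victim.
                d = len(path) - i
                prob[edge] = prob.get(edge, 0.0) + (
                    (1.0 / num_leaves) * (1.0 / len(paths))
                    * pm * (1 - pm) ** (d - 1)
                )
    return prob
```

For a single path R3 → R2 → R1 → v with pm = 0.5, the edge probabilities come out as 0.125, 0.25, and 0.5 respectively, i.e., Equation (5.7) with one leaf and one path.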
Figure 5.18: With the support for multiple victim routes, the RPPM algorithm can provide the guarantee of the correctness of the constructed graph. [Plot: successful rate against traceback confidence level; average-case and worst-case curves with single and multiple victim routes, and the bottom line.]
5.7.4 Simulation: support for multiple victim routes
Definitions 5.1 and 5.2 together form an expanded set of extended graphs, and the pseudocode in Figure 5.17 provides the way to calculate the packet-type probabilities of network graphs having multiple victim routes. Next, we conduct the simulation again using the expanded set of extended graphs, and the results are shown in Figure 5.18. As the figure shows, the RPPM algorithm can again guarantee the correctness of the constructed graph with the support of multiple victim routes. Technically speaking, the introduction of the extra set of extended graphs increases the value of the termination packet number (TPN). As the TPN increases, the successful rate increases accordingly.
5.7.5 Section summary
In conclusion, we provided support for routers having multiple victim routes. Such support is achieved through an expansion of the set of extended graphs. We performed simulations to contrast the performance of the RPPM algorithm with and without such support.

The drawback of this support is computation. Let n be the number of nodes and m be the number of edges of the constructed graph. Originally, the number of extended graphs is of order O(n). With the mentioned support, the order of the number of extended graphs becomes O(nm). Hence, more time is spent on calculating the TPN at each connected state of the rectified graph reconstruction procedure. This shows the tradeoff in handling routers with multiple victim routes.
5.8 Deployment Issues of the RPPM Algorithm
In previous sections, we discussed the RPPM algorithm in a theoretical sense.
In this section, we discuss the issues in deploying the RPPM algorithm.
5.8.1 Choice of the marking probability
It is not desirable to have a high value of the marking probability. Firstly, a high value of the marking probability means low values for the packet-type probabilities of the majority of packet types. Hence, a large number of marked packets is needed before the RPPM algorithm stops. This also implies a long execution time of the RPPM algorithm.
Let us take a linear network with three routers and one victim (shown in Figure 4.7 on Page 108) as an example to illustrate the relationship between the marking probability and the number of packets required. Figure 5.19 shows the result of a simulation that aims to count the average number
of marked packets required for a correct graph reconstruction with different values of the marking probability. The result shows that, for small values of the marking probability, the number of required packets is small. Nevertheless, the number of required packets increases dramatically for large values of the marking probability.

Figure 5.19: Average number of marked packets required for a correct graph reconstruction against different values of the marking probability. [Plot: average number of marked packets against marking probability; PPM algorithm simulation on a 4-node linear network.]
Apart from the above reason, according to Section 5.6.2 (on Page 144), a high value of the marking probability implies the presence of the worst-case scenario of the RPPM algorithm. Although the worst-case scenario can still guarantee the successful rate, it is more beneficial to set the marking probability to a lower value so as to gain a larger successful rate than expected.

According to the above analysis, one should choose a small value for the marking probability for a faster and more reliable graph reconstruction. However, what happens if a too-small value of the marking probability is chosen? Figure 5.20 illustrates the result.
Figure 5.20 displays two plots: the plot labelled "w/o unmarked packets" is the original plot taken from Figure 5.19, while the remaining plot counts the average number of marked packets plus unmarked packets. One can observe that if a too-small value of the marking probability is chosen, more packets, mainly unmarked packets, are received before the PPM algorithm can construct a correct constructed graph. This implies a longer execution time. In addition, the two plots eventually merge because the number of unmarked packets received by the victim diminishes as the marking probability increases.

Figure 5.20: Average number of total packets (marked packets plus unmarked packets) required for a correct graph reconstruction against different values of the marking probability. [Plot: the "w/ unmarked packets" and "w/o unmarked packets" curves for a 3-router linear network.]
One may be interested in finding the value of the marking probability that minimizes the number of marked packets required. However, the calculation of such a value requires prior knowledge about the attack graph. Therefore, the determination of the best marking probability cannot be accomplished at the current stage of research.
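The tradeoff behind Figures 5.19 and 5.20 can be explored with a small Monte-Carlo sketch (a simplification of the simulator, with illustrative names: each packet independently encodes an edge with its packet-type probability, the remaining probability mass being unmarked packets, and we count packets until every edge has been seen at least once):

```python
import random

def packets_until_complete(edge_probs, rng, count_unmarked=False):
    """Simulate packets until every edge has been seen at least once.

    `edge_probs[e]` is the packet-type probability of edge e; the
    probabilities may sum to less than 1, the remainder corresponding
    to unmarked packets.  Returns the number of marked packets (or of
    all packets, if `count_unmarked` is True) the victim receives.
    """
    edges = list(edge_probs)
    seen, marked, total = set(), 0, 0
    while len(seen) < len(edges):
        total += 1
        u = rng.random()
        acc = 0.0
        for e in edges:
            acc += edge_probs[e]
            if u < acc:          # this packet encodes edge e
                seen.add(e)
                marked += 1
                break
    return total if count_unmarked else marked
```

Averaging this count over many trials for a given marking probability reproduces the qualitative shape of the two curves: small pm inflates the total packet count via unmarked packets, large pm inflates the marked-packet count via the tiny probabilities of the far edges.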
Chapter 5 Rectified Probabilistic Packet Marking Algorithm 161
5.8.2 Execution time comparison between the PPM and the RPPM algorithms
In order to guarantee the correctness of the constructed graph, the RPPM algorithm has to collect extra packets to attain such a guarantee. Technically speaking, before the moment that the constructed graph becomes the same as the attack graph, the number of marked packets collected is the same for both the PPM and RPPM algorithms. After the constructed graph has become the attack graph, the RPPM algorithm has to wait until the number of collected packets is larger than the termination packet number (TPN). In other words, that extra number of packets is the tradeoff in deploying the RPPM algorithm instead of the PPM algorithm.
However, it is difficult to determine a theoretical value or bound for the TPN because the TPN calculation depends on the construction process of the constructed graph. The construction process, in turn, depends on the sequence of arrivals of the marked packets, which is randomized. Alternatively, we conduct an empirical study on the tradeoff of the RPPM algorithm.
We first conduct simulations similar to those presented in Section 5.6; the simulations are executed on random-tree networks with 14, 50, 100, 500, and 1000 routers, with the marking probability set to 0.1. Figures 5.21, 5.22, and 5.23 show the number of marked packets recorded from the simulations at different traceback confidence levels (the plots are separated into three figures because of the different scales of the y-axis). In addition, these simulations are operated under the average-case scenario.
In Figure 5.24, we present the percentage increase in the number of marked packets when one compares the packets collected by the RPPM algorithm to those collected by the PPM algorithm. Note that the average number of marked packets that is just enough to construct the attack graph is obtained by instructing the PPM algorithm to stop when the constructed graph just becomes
Figure 5.21: The number of marked packets recorded for the set of RPPM algorithm simulations carried out on a random-tree network with 14 routers. [Plot: average number of packets measured against traceback confidence level.]
Figure 5.22: The number of marked packets recorded for the set of RPPM algorithm simulations carried out on random-tree networks with 50 routers and 100 routers. [Plot: average number of packets measured against traceback confidence level.]
Figure 5.23: The number of marked packets recorded for the set of RPPM algorithm simulations carried out on random-tree networks with 500 routers and 1000 routers. [Plot: average number of packets measured against traceback confidence level.]
Figure 5.24: The percentage increase in the number of marked packets when comparing the RPPM algorithm to the PPM algorithm at different network scales. [Plot: percentage increase against traceback confidence level for 14, 50, 100, 500, and 1000 routers.]
Number of Routers                                14      50     100     500    1000
Average number of marked packets
  required for the PPM algorithm                 92   1,170   1,550  15,222  40,792
Average time required in 100BaseT
  Ethernet (in seconds)                       0.011   0.140   0.180   1.820   4.910

Table 5.3: The average number of packets and time required to form a correct constructed graph in a 100BaseT Ethernet.
the attack graph. Hence, these plots show the tradeoff of the RPPM algorithm at different network sizes and different traceback confidence levels.
Three main observations can be drawn from Figure 5.24. Firstly, as the traceback confidence level increases, the tradeoff of the RPPM algorithm increases. Secondly, the number of packets collected by the RPPM algorithm is larger than that collected by the PPM algorithm by several times over the lower range of the traceback confidence level (two to five times for confidence levels below 0.8), and the increase reaches 10 times for high values of the traceback confidence level.
Lastly, an interesting observation is that the tradeoffs for small networks are more significant than those for large networks. This can be explained by the probability of forming a disconnected graph. For a large network, such a probability is much higher than that of a small network. When a disconnected graph is formed, the TPN calculation is skipped until the graph becomes connected. Hence, this keeps the value of the TPN small (because of the reduced value of the accumulated state-change probability) during the ending states of the RPPM algorithm.
On the other hand, according to Table 5.3, one can observe that the time for
the PPM algorithm to collect enough packets is in the order of a few seconds
in a 100BaseT Ethernet [1]. Therefore, although the tradeoff of the RPPM algorithm could reach a multiple of 10, such a tradeoff is acceptable.
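The times in Table 5.3 follow directly from the footnoted packet rate of a 100BaseT Ethernet. A minimal sketch (the remaining small differences from the table appear to come from rounding in the thesis):

```python
# Approximate packet rate of 100BaseT Ethernet with 1,500-byte packets:
# 100 Mb/s / (1,500 B x 8 b/B) ~= 8,333 packets per second.
PACKETS_PER_SECOND = 100_000_000 / (1_500 * 8)

def traceback_time_seconds(num_marked_packets):
    """Lower-bound traceback time, assuming the attack traffic saturates
    a 100BaseT link and every packet is 1,500 bytes long."""
    return num_marked_packets / PACKETS_PER_SECOND
```

For example, the 92 packets of the 14-router case take about 0.011 s, and the 15,222 packets of the 500-router case take about 1.83 s.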
5.8.3 Scalability issue in PPM algorithm
Scalability is one of the weaknesses of the PPM algorithm. One can observe that as the path length between the victim and the leaf router becomes longer, it becomes more difficult to collect a complete set of the marked packets. Not only does the path length affect the traceback time, but the size of the attack graph also matters. In Figure 5.25, one can observe that the number of marked packets required to build the constructed graph increases with the size of the graph, and the trend is not subsiding. This shows that the PPM algorithm itself has a scalability problem. As the RPPM algorithm inherits the packet marking procedure from the PPM algorithm, it suffers from the same scalability problem.
As suggested in the previous subsection, for small networks, the traceback process takes only a few seconds to complete. For an ISP network, the total number of routers, including the backbone routers as well as the customer routers, is, on average, in the order of thousands (the minimum number is 131 while the maximum number is 8,751) [79]. Therefore, the PPM algorithm can handle the traceback problem on ISP networks. However, for a network as large as the one provided by [80], which contains the routing map of the whole Internet (with nearly 200,000 routers and more than 600,000 directed links), the PPM algorithm appears to be powerless, and it is believed that a traceback process may take days to finish.
Solving the scalability problem of the PPM algorithm is beyond the scope of this thesis. Rather, we find a suitable application in which the PPM algorithm can be deployed effectively. As the PPM algorithm can perform traceback on an ISP network in a reasonable time, this justifies our choice of the PPM algorithm as the microscopic traceback algorithm.

[1] Under a 100BaseT Ethernet, one can transmit at most 8,333 packets (each with 1,500 bytes) in one second.

Figure 5.25: Scalability analysis: average number of marked packets collected by the PPM algorithm versus the size of the attack graph. [Plot: average number of marked packets against the number of routers, up to 1,000.]
5.8.4 Precision problem
The last deployment issue concerns the precision in the TPN calculation sub-
routine. The worst consequence of this problem is a reduced guarantee on the
correctness of the RPPM algorithm.
Cause of the problem
In the TPN calculation subroutine, the accumulative state-change probability Xi−1 in Equation (5.4) is updated whenever a new edge is added to the constructed graph. Theoretically speaking, Xi−1 > P∗ is always true because

∵ τ∗ ∈ Z+ and 0 < P(T(Ge) = e′) < 1,
∴ 1 − (1 − P(T(Ge) = e′))^τ∗ > P∗/Xi−1  ⇒  Xi−1 > P∗.
Though Xi−1 > P∗ is always true, Xi−1 approaches P∗ after each update: at each update, Xi−1 is multiplied by a value less than one.

Specifically, if P∗ is very close to Xi−1, a precision problem results: there may not be enough precision for the calculation of the expression log(1 − P∗/Xi−1) in Equation (5.4). In the worst case, the expression log(1 − P∗/Xi−1) may evaluate to log(0), which raises a floating-point exception.

The floating-point exception must be avoided, and, to avoid the problem, the TPN calculation subroutine should stop updating the accumulated state-change probability Xi−1.
Denote Xlimit as a real number between 0 and 1. When the difference
between P ∗ and Xi−1 is smaller than Xlimit, the TPN calculation algorithm
stops updating the accumulated state-change probability Xi−1. Then, Equa-
tion (5.5) (on Page 137), which originally updates Xi−1, is changed as follows.
Xi−1 =
    Xi−2 × (1 − (1 − P(T(Gi) = ei))^τ∗i−1),   if i > 1 and |Xi−2 − P∗| > Xlimit;
    Xi−2,                                      if i > 1 and |Xi−2 − P∗| ≤ Xlimit;
    1,                                         if i = 1.                        (5.8)
Nevertheless, Equation (5.8) would lead to the failure of the guarantee on the correctness of the constructed graph, although the equation effectively prevents the ratio P∗/Xi−1 from becoming one.
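The guarded update of Equation (5.8) and the TPN formula can be sketched as follows (a simplified sketch with our own helper names; the caller is assumed to keep X above P∗ so that the logarithm stays defined):

```python
import math

def update_accumulated_probability(x_prev, p_edge, tau, p_star, x_limit):
    """Guarded update following Equation (5.8): skip the multiplicative
    update once X_{i-1} is within x_limit of the confidence level P*,
    so that log(1 - P*/X_{i-1}) in the TPN formula never degenerates
    to log(0)."""
    if abs(x_prev - p_star) <= x_limit:
        return x_prev                          # freeze: no further update
    return x_prev * (1 - (1 - p_edge) ** tau)

def upper_bound_tpn(x, p_star, p_min):
    """Upper-bounded TPN computed from the (possibly frozen) X, i.e.
    floor(log(1 - P*/X) / log(1 - p_min) + 1)."""
    return math.floor(math.log(1 - p_star / x) / math.log(1 - p_min)) + 1
```

With X frozen at 1, P∗ = 0.5 and pmin = 1/7, the helper yields a TPN of 5, the value derived in the example below.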
Reduced correctness
We show why the guarantee is void. Let X∗i−1 be the value of the accumu-
lative state-change probability obtained by Equation (5.5), and let Xi−1 be
the bounded value of the accumulative state-change probability obtained by
Equation (5.8). Originally, the upper-bounded TPN τ∗i is obtained as follows:

τ∗i = ⌊ log(1 − P∗/X∗i−1) / log(1 − pmin) + 1 ⌋ .
Since X∗i−1 ≤ Xi−1, then

∵ log(x) is decreasing when 0 < x < 1,
⇒ log(1 − P∗/X∗i−1) < log(1 − P∗/Xi−1) ;
∵ log(x) is negative when 0 < x < 1,
⇒ log(1 − P∗/X∗i−1) / log(1 − pmin) > log(1 − P∗/Xi−1) / log(1 − pmin) .
Therefore, the TPN obtained from the bounded accumulated state-change probability Xi−1 is smaller than that obtained from the original accumulated state-change probability X∗i−1.
The above finding implies a very undesirable consequence: the RPPM algorithm terminates before it has truly reached the guaranteed correctness.

In the following, we introduce the runtime probability as a tool to understand how the correctness guarantee of the RPPM algorithm becomes void.
Runtime probability and graph reconstruction example
The runtime probability is the probability, computed while the RPPM algorithm is running, that the constructed graph is the same as the attack graph. By definition, the accumulated state-change probability is already the runtime probability; the runtime probability is merely a more meaningful alias.
We take another look at the graph reconstruction example in Section 5.5 (on Page 139) to show how the runtime probability helps in understanding why the RPPM algorithm cannot provide the said guarantee. In the example, we assume that Xlimit = 1, and thus the accumulative state-change probability is never updated when a new edge is added. Still, we assume that the constructed graph is always connected. The marking probability and the traceback confidence level P∗ are both 0.5.
When the first marked packet arrives, it should encode the edge (R1, v) in Figure 4.7. But, this time, X1 is not updated and remains equal to 1. Then, when the edge (R2, R1) is added to the constructed graph, the TPN at state C2 becomes:

τ2 = ⌊ log(1 − 0.5/1) / log(1 − 1/7) + 1 ⌋ = ⌊5.4966⌋ = 5 .
If the number of marked packets arriving at the victim exceeds τ2 before the third edge arrives, the runtime probability is:

(1 − (1 − 1/3)^2) × (1 − (1 − 1/7)^5) = 0.2985 ,
while, according to Section 5.5 (on Page 139), the original runtime probability is:

(1 − (1 − 1/3)^2) × (1 − (1 − 1/7)^15) = 0.5005 .
The above example shows that the runtime probability is 0.2985, which means the RPPM algorithm could guarantee the correctness of the constructed graph only with probability 0.2985, although the required guarantee is 0.5.
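The two runtime probabilities in this example can be checked numerically. In the sketch below (the helper is our own), each pair (p_edge, τ) is one state-change factor 1 − (1 − p_edge)^τ:

```python
def runtime_probability(updates):
    """Product of the state-change factors 1 - (1 - p_edge) ** tau,
    i.e. the accumulated state-change probability alias."""
    prob = 1.0
    for p_edge, tau in updates:
        prob *= 1 - (1 - p_edge) ** tau
    return prob

# With X_limit = 1, the second factor uses the frozen TPN tau_2 = 5:
frozen = runtime_probability([(1 / 3, 2), (1 / 7, 5)])      # ~0.2985
# Without freezing, the original TPN of 15 would be used instead:
original = runtime_probability([(1 / 3, 2), (1 / 7, 15)])   # ~0.5005
```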
Repeated executions
We propose that the RPPM algorithm should be executed more than once in order to restore the promised correctness. The trick of the repeated-execution method is to treat the constructed graph returned from the previous execution instance as the input of the new execution instance.
Intuitively, this method works as follows. Say, at the first execution instance of the RPPM algorithm, the constructed graph is not yet the attack graph. Then, continuing with a second execution instance gives the constructed graph a chance to continue to evolve.
Mathematically, denote Prun,i as the runtime probability of the ith instance
of the RPPM algorithm. Then, the probability that the constructed graph is
Repeated RPPM Algorithm(Traceback Confidence Level P∗)
1. Execute the RPPM algorithm at traceback confidence level P∗ and with an empty constructed graph;
2. Obtain the runtime probability Prun;
3. Obtain the constructed graph Gc;
4. While Prun < P∗ ; do
5.     Execute the RPPM algorithm at traceback confidence level P∗ and with the constructed graph Gc;
6.     Obtain the runtime probability Prun′;
7.     Obtain the constructed graph Gc;
8.     Prun := 1 − ((1 − Prun) × (1 − Prun′));
9. Done

Figure 5.26: The pseudocode of repeating the RPPM algorithm to increase the runtime probability.
correct after n consecutive executions of the RPPM algorithm, P(repeat n times), is given by:

P(repeat n times) = 1 − ∏_{i=1}^{n} (1 − Prun,i) .    (5.9)
Since Prun,i > 0, as n increases, P (repeat n times) also increases. There-
fore, one can keep repeating the execution of the RPPM algorithm until the
guaranteed correctness is reached.
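Equation (5.9) in a minimal sketch (with illustrative numbers: three runs that each reach a runtime probability of 0.3 combine to about 0.657):

```python
def combined_correctness(runtime_probs):
    """Equation (5.9): probability that the constructed graph is
    correct after one RPPM execution per listed runtime probability,
    i.e. one minus the product of the per-run failure probabilities."""
    failure = 1.0
    for p in runtime_probs:
        failure *= 1 - p
    return 1 - failure
```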
Pseudocode of repeated execution
Figure 5.26 shows the pseudocode of repeating the executions of the RPPM algorithm until the probability that the constructed graph is the same as the attack graph is larger than the traceback confidence level P∗.
The pseudocode works as follows. After the RPPM algorithm has been executed for the first time, the runtime probability as well as the constructed graph are obtained. If the runtime probability Prun is less than the traceback confidence level P∗, then the RPPM algorithm is executed repeatedly until Prun is larger than P∗.
To summarize, the RPPM algorithm has a precision problem when it is deployed. We observed that the precision problem causes the RPPM algorithm to fail to guarantee the correctness of the constructed graph. We propose executing the RPPM algorithm repeatedly so as to increase the correctness until the traceback confidence level P∗ is reached.
5.9 Chapter Summary
Based on the termination condition analysis, one can conclude that the ex-
pected sufficient packet number (described and derived in Chapter 4) is not a
desirable termination condition of the PPM algorithm. Yet, there is a need for
the PPM algorithm to have a guarantee of the correctness of the constructed
graph.
In this chapter, we have suggested a new termination condition of the PPM
algorithm. We devised the rectified graph reconstruction procedure that gives
a precise termination condition for the PPM algorithm, and we called the
new traceback approach the rectified probabilistic packet marking algorithm
(RPPM algorithm for short). The RPPM algorithm, on the one hand, does not require any prior knowledge about the network graph and, on the other hand, guarantees that the constructed graph is a correct one with a specified probability, and such a probability is an input parameter of the algorithm.
We have carried out a series of simulations to show the correctness and
robustness of the RPPM algorithm. The simulation results show that the
RPPM algorithm can always satisfy our claim that the constructed graph is
correct with a given probability. Also, the algorithm is robust under different
values of the marking probability and different structures of the attack graphs.
Moreover, we have addressed the issues that arise when the RPPM algorithm is deployed. To conclude, the RPPM algorithm is an effective means of improving the reliability of the original PPM algorithm.
Conclusion and Future Work
In this thesis, we focus on defense mechanisms against the distributed
denial-of-service attack (DDoS attack for short). Specifically, we target the
traceback of the locations of the attackers who are launching a flood-based
DDoS attack. We narrow down our scope and consider only the sources
that are sending out attack traffic as the attackers.
We have proposed a revolutionary, divide-and-conquer traceback method-
ology, and the methodology is twofold. When a global-scale attack happens,
the first step of the traceback process is to locate the Internet service
providers (ISPs for short) that are contributing overwhelming traffic,
through a macroscopic traceback algorithm. Once the problematic ISPs are
uncovered, in the next step, each concerned ISP should locate the attackers
within its administrative domain using a microscopic traceback algorithm.
Such a divide-and-conquer approach has two merits. First, it provides a
fast way to confine the domain of the DDoS attack. Second, if the scale of
the DDoS attack is large, this approach divides the traceback problem into
several sub-problems, and conquers them in a parallel manner. In the thesis,
we first devised a macroscopic traceback algorithm called the distributed
snapshot traceback algorithm. Then, we employed and enhanced the well-known
probabilistic packet marking algorithm, which satisfies the requirements of a
microscopic traceback algorithm.
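The two-step methodology could be skeletonized as follows. All names here are hypothetical stubs, including `macroscopic_traceback`, `microscopic_traceback`, and the traffic threshold; the point is only the divide-and-conquer structure with parallel per-ISP conquering.

```python
from concurrent.futures import ThreadPoolExecutor

def macroscopic_traceback(traffic_stats):
    """Hypothetical macroscopic step: pick the ISPs whose border
    routers forward an overwhelming share of the attack traffic."""
    return [isp for isp, volume in traffic_stats.items() if volume > 1000]

def microscopic_traceback(isp):
    """Hypothetical microscopic step: within one ISP's administrative
    domain, locate the attacker sources (stubbed here)."""
    return {f"{isp}:attacker"}

def divide_and_conquer(traffic_stats):
    suspect_isps = macroscopic_traceback(traffic_stats)
    if not suspect_isps:
        return set()
    # Conquer the sub-problems in parallel, one per suspect ISP.
    with ThreadPoolExecutor() as pool:
        results = pool.map(microscopic_traceback, suspect_isps)
    return set().union(*results)

attackers = divide_and_conquer({"isp-a": 5000, "isp-b": 10, "isp-c": 2000})
print(sorted(attackers))  # ['isp-a:attacker', 'isp-c:attacker']
```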
The distributed snapshot traceback algorithm (snapshot algorithm for short)
is the first traceback algorithm of its kind. Leveraging the well-known Chandy-
Lamport distributed snapshot algorithm, the snapshot algorithm coordinates
the border routers of the ISPs, and collects statistics from the border routers
in a distributed manner. Given the collected data, the victim can determine the
ISPs that contain the possible locations of the attackers. The proof has justified
the correctness of the algorithm, and the simulation results have demonstrated
the robustness of the algorithm.
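As a rough illustration of the coordination idea only (not the thesis's actual protocol), the following sketch shows a simplified Chandy-Lamport-style marker flood in which each border router records its local traffic counter on first receipt of the marker and relays the marker onward; channel-state recording is omitted.

```python
from collections import deque

def distributed_snapshot(graph, counters, initiator):
    """Simplified Chandy-Lamport-style snapshot: on the first receipt
    of a marker, a router records its local traffic counter and relays
    the marker to its neighbours; duplicate markers are ignored."""
    recorded = {}
    queue = deque([initiator])
    while queue:
        router = queue.popleft()
        if router in recorded:
            continue                          # marker already seen
        recorded[router] = counters[router]   # record local state
        queue.extend(graph[router])           # relay the marker
    return recorded

# Toy topology: victim "v" initiates; r1 and r2 are border routers.
graph = {"v": ["r1", "r2"], "r1": ["r2"], "r2": []}
counters = {"v": 0, "r1": 120, "r2": 3}
snap = distributed_snapshot(graph, counters, "v")
print(snap == counters)  # True: every reachable router is recorded once
```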
The probabilistic packet marking algorithm (PPM algorithm for short) is a
prized traceback approach in terms of simplicity and effectiveness, and it is one
of the best candidates for a microscopic traceback algorithm. Although it
is a renowned traceback algorithm, the termination condition of the PPM
algorithm is seldom studied in the literature. Our findings have shown that
the well-accepted termination condition of the PPM algorithm is not correct
in general cases. Worse, the defective termination condition can lead to
incorrect traceback results.
Knowing that the traditional termination condition is defective, we
provided a discrete-time Markov chain model that corrects the faults in the
calculation of the traditional termination condition of the PPM algorithm.
However, the effort spent on correcting such a calculation is in vain: to
calculate the traditional termination condition precisely, one has to know
the paths taken by the attack traffic in advance, yet these paths are precisely
the results that the PPM algorithm aims to produce. This contradiction led
us to abandon the traditional termination condition of the PPM algorithm.
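To see why the traditional condition needs the attack paths in advance, consider the coupon-collector-style bound from Savage et al.'s analysis: the expected number of marked packets needed to rebuild a path of length d under marking probability p is roughly ln(d)/(p(1-p)^(d-1)), which cannot be evaluated without knowing d. A quick numerical sketch (the values of p and d are illustrative only):

```python
import math

def expected_packets_bound(p, d):
    """Coupon-collector-style bound (after Savage et al.) on the
    expected number of marked packets needed to reconstruct an
    attack path of length d with marking probability p."""
    return math.log(d) / (p * (1 - p) ** (d - 1))

# The bound depends on the path length d -- exactly the quantity
# that the PPM algorithm is trying to discover.
for d in (10, 20, 30):
    print(d, round(expected_packets_bound(0.04, d)))
```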
Instead, we introduced a new termination condition for the PPM
algorithm, and based on the new termination condition, we devised the rectified
probabilistic packet marking algorithm (RPPM algorithm for short). The most
significant contribution of the RPPM algorithm is that, upon termination,
the algorithm guarantees the traceback result with a specified level
of confidence. Our findings showed that the RPPM algorithm can provide such
a guarantee under different deployment scenarios. In conclusion, the RPPM
algorithm provides an autonomous way for the original PPM algorithm to
determine its termination, and it is a promising means to enhance the reliability
of the PPM algorithm.
Though the proposed solutions in the thesis are self-contained, there is
room for future research. Both the distributed snapshot traceback algorithm
and the RPPM (PPM) algorithm are prone to attacks caused by compromised
border routers or forged request (or marker) packets. A tailor-made
authentication protocol could be designed to resist such attacks in a
best-effort manner.
For the RPPM algorithm, the scalability problem is worth noting. As
mentioned in Section 5.8.3, when the scale of the real attack graph increases,
the number of marked packets required to obtain a correct constructed
graph also increases. This prohibits the PPM algorithm from being deployed
on a worldwide scale. One possible research direction is to devise a
methodology to adaptively change the marking probability, which can
minimize the number of packets required. Though it is believed that this
approach would work mathematically, the protocol for autonomously changing
the marking probability may be difficult to formulate, and further research
effort is required.
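One way such an adaptive scheme might work: the probability that a packet arrives carrying the mark of a router d hops away is p(1-p)^(d-1), which is maximized at p = 1/d, so a router able to estimate d could tune p accordingly. The brute-force sketch below merely verifies that optimum numerically; it is illustrative only, not a protocol from the thesis.

```python
def arrival_probability(p, d):
    """Probability that a packet arrives carrying the mark of a
    router d hops away, under marking probability p."""
    return p * (1 - p) ** (d - 1)

def best_marking_probability(d, grid=10000):
    """Find, by brute-force grid search, the marking probability p
    that maximizes the farthest router's mark-arrival probability.
    Analytically the maximum sits at p = 1/d."""
    candidates = (i / grid for i in range(1, grid))
    return max(candidates, key=lambda p: arrival_probability(p, d))

d = 25
p_opt = best_marking_probability(d)
print(abs(p_opt - 1 / d) < 1e-3)  # True: the search recovers p = 1/d
```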
Bibliography
[1] F. Lau, S. H. Rubin, M. H. Smith, and L. Trajkovic, “Distributed Denial
of Service Attacks,” in IEEE International Conference on Systems, Man,
and Cybernetics, pp. 2275–2280, 2000.
[2] “Computer Emergency Response Team, CERT Advisory CA-2000-
01: Denial-of-Service Developments, http://www.cert.org/advisories/CA-
2000-01.html.”
[3] “Computer Emergency Response Team, CERT Advisory CA-1996-21:
TCP SYN Flooding and IP Spoofing Attacks, http://www.cert.org/-
advisories/CA-1996-21.html.”
[4] “DARPA Internet Program. RFC 793: Transmission Control Protocol,”
Sept. 1981.
[5] J. Lemon, “Resisting SYN Flood DoS Attacks with a SYN Cache,” in
Proceedings of BSDCON 2002, pp. 89–98, 2002.
[6] A. Kuzmanovic and E. W. Knightly, “Low-rate TCP-Targeted Denial of
Service Attacks: the Shrew vs. the Mice and Elephants,” in Proceedings
of ACM SIGCOMM 2003, pp. 75–86, 2003.
[7] H. Sun, J. C. S. Lui, and D. K. Y. Yau, “Distributed Mechanism in
Detecting and Defending Against the Low-rate TCP Attack,” Computer
Networks Journal, vol. 50, Sep 2006.
[8] H. Sun, J. C. S. Lui, and D. K. Y. Yau, “Defending Against Low-rate
TCP Attack: Dynamic Detection and Protection,” in IEEE International
Conference on Network Protocols (ICNP), Berlin, Germany, 2004.
[9] A. Shevtekar, K. Anantharam, and N. Ansari, “Low Rate TCP Denial-
of-Service Attack Detection at Edge Routers,” IEEE Communications
Letters, vol. 9, pp. 262–265, April 2005.
[10] J. Elliott, “Distributed Denial of Service Attacks and the Zombie Ant
Effect,” IT Professional, vol. 2, no. 2, pp. 55–57, 2000.
[11] R. Chang, “Defending against Flooding-based Distributed Denial-of-
Service Attacks: a Tutorial,” IEEE Communications Magazine, vol. 40,
no. 10, pp. 42–51, 2002.
[12] S. Dietrich, N. Long, and D. Dittrich, “Analyzing Distributed Denial of
Service Tools: The Shaft Case,” in Proceedings of the 14th System Ad-
ministration Conference, pp. 329–339, 2000.
[13] W. Lee and S. J. Stolfo, “A Framework for Constructing Features and
Models for Intrusion Detection Systems,” ACM Transactions on Infor-
mation and System Security (TISSEC), vol. 3, no. 4, pp. 227–261, 2000.
[14] J. Beale, Snort 2.1 Intrusion Detection, Second Edition. Syngress, 2 ed.,
May 2004.
[15] V. Paxson, “An Analysis of Using Reflectors for Distributed Denial-of-
Service Attacks,” ACM SIGCOMM Computer Communication Review,
vol. 31, no. 3, pp. 38 – 47, 2001.
[16] N. Naoumov and K. Ross, “Exploiting P2P Systems for DDoS Attacks,”
in Proceedings of the 1st International Conference on Scalable Information
Systems. Article Number 47, 2006.
[17] J. Barlow and W. Thrower, “TFN2K - An Analysis. The AXENT Secu-
rity Team. http://www.symantec.com/avcenter/security/Content/2000-
02 10 a.html,” 2000.
[18] D. Dittrich, “The DoS Project’s “Trinoo” Distributed Denial of Service
Attack Tool. http://staff.washington.edu/dittrich/misc/trinoo.analysis,”
1999.
[19] D. Dittrich, “The “Stacheldraht” Distributed Denial of Service Attack
Tool. http://staff.washington.edu/dittrich/misc/stacheldraht.analysis,”
1999.
[20] C. C. Zou, W. Gong, and D. Towsley, “Code Red Worm Propagation
Modeling and Analysis,” in Proceedings of the 9th ACM Conference on
Computer and Communications Security, pp. 138–147, 2002.
[21] “Netcraft: Web Server Survey Archive. http://www.netcraft.com.”
[22] D. Moore, V. Paxson, S. Savage, C. Shannon, S. Staniford, and N. Weaver,
“Inside the slammer worm,” IEEE Security and Privacy, vol. 1, no. 4,
pp. 33–39, 2003.
[23] S. Adler, “The Slashdot Effect: an Analysis of Three Internet Publica-
tions. http://ssadler.phy.bnl.gov/adler/SDE/SlashDotEffect.html,” 1999.
[24] R. Mahajan, S. M. Bellovin, S. Floyd, J. Ioannidis, V. Paxson, and
S. Shenker, “Controlling High Bandwidth Aggregates in the Network,”
ACM SIGCOMM Computer Communication Review, vol. 32, pp. 62–73,
Jul 2002.
[25] X. Chen and J. Heidemann, “Flash Crowd Mitigation via Adaptive Ad-
mission Control based on Application-Level Observations,” ACM Trans-
actions on Internet Technology (TOIT), vol. 5, no. 3, pp. 532–569, 2005.
[26] J. Mirkovic and P. Reiher, “A Taxonomy of DDoS Attack and DDoS De-
fense Mechanisms,” ACM SIGCOMM Computer Communication Review,
vol. 34, no. 2, pp. 39 – 53, 2004.
[27] A. Hussain, J. Heidemann, and C. Papadopoulos, “A Framework for Clas-
sifying Denial of Service Attacks,” in Proceedings of the 2003 conference
on Applications, Technologies, Architectures, and Protocols for Computer
Communications (SIGCOMM), pp. 99 – 110, 2003.
[28] D. Barry, “Proactive Protection: New techniques and best practices help
service providers counter increase in cyber attacks,” Packet: Cisco Sys-
tems Users Magazine, vol. 16, no. 1, pp. 64–68, 2004.
[29] T. Y. Wong, K. T. Law, J. C. S. Lui, and M. H. Wong, “An Efficient Dis-
tributed Algorithm to Identify and Traceback DDoS Traffic,” The Com-
puter Journal, vol. 49, no. 4, pp. 418–442, 2006.
[30] S. Savage, D. Wetherall, A. Karlin, and T. Anderson, “Practical Network
Support for IP Traceback,” in Proceedings of the 2000 ACM SIGCOMM
Conference, pp. 295–306, 2000.
[31] T. Y. Wong, J. C. S. Lui, and M. H. Wong, “Markov Chain Modeling
of the Probabilistic Packet Marking Algorithm,” International Journal of
Network Security, vol. 5, no. 1, pp. 32–40, 2007.
[32] T. Y. Wong, M. H. Wong, and J. C. S. Lui, “A Precise Termination
Condition of the Probabilistic Packet Marking Algorithm,” Accepted by
IEEE Transactions on Dependable and Secure Computing, August 2007.
[33] E. Dijkstra and C. Scholten, “Termination Detection for Diffusing Com-
putations,” Information Processing Letters, vol. 11, pp. 1–4, Aug 1980.
[34] K. Chandy and L. Lamport, “Distributed Snapshots: Determining Global
States of Distributed Systems,” ACM Transactions on Computer Systems,
vol. 3, pp. 63–75, Feb 1985.
[35] L. Lamport, “Time, Clocks, and the Ordering of Events in a Distributed
System,” Communications of the ACM, vol. 21, pp. 558–565, Jul 1978.
[36] M. J. Fischer, N. D. Griffeth, and N. A. Lynch, “Global States of a Dis-
tributed System,” IEEE Transactions on Software Engineering, vol. 8,
no. 3, pp. 198–202, 1982.
[37] R. Koo and S. Toueg, “Checkpointing and Rollback-Recovery for Dis-
tributed Systems,” IEEE Transactions on Software Engineering, vol. 13,
no. 1, pp. 23–31, 1987.
[38] E. N. Elnozahy, L. Alvisi, Y. M. Wang, and D. B. Johnson, “A Survey of
Rollback-Recovery Protocols in Message-Passing Systems,” ACM Com-
puting Surveys, vol. 34, no. 3, pp. 375–408, 2002.
[39] P. A. Bernstein, V. Hadzilacos, and N. Goodman, Concurrency Control
and Recovery in Database Systems. Addison-Wesley, 1987.
[40] S. Bellovin, “Security Problems in the TCP/IP Protocol Suite,” ACM
Computer Communications Review, vol. 19, no. 2, pp. 32 – 48, 1989.
[41] P. Ferguson and D. Senie, “RFC 2267: Network Ingress Filtering: Defeat-
ing Denial of Service Attacks which Employ IP Source Address Spoofing,”
The Internet Society, January 1998.
[42] “Egress filtering v 0.2, global incident analysis center. http://-
www.sans.org/y2k/egress.htm.”
[43] K. Park and H. Lee., “On the Effectiveness of Route-Based Packet Filter-
ing for Distributed DoS Attack Prevention in Power-Law Internets,” in
Proceedings of ACM SIGCOMM 2001, pp. 15 – 26, 2001.
[44] D. K. Y. Yau, J. C. S. Lui, F. Liang, and Y. Yam, “Defending Against
Distributed Denial-of-service Attacks with Max-min Fair Server-centric
Router Throttles,” IEEE/ACM Transactions on Networking, vol. 13,
no. 1, pp. 29–42, 2005.
[45] S. Chen and Q. Song, “Perimeter-Based Defense against High Bandwidth
DDoS Attacks,” IEEE Transactions on Parallel and Distributed Systems,
vol. 16, no. 6, pp. 526–537, 2005.
[46] J. Xu and W. Lee, “Sustaining Availability of Web Services under Dis-
tributed Denial of Service Attacks,” IEEE Transactions on Computers,
vol. 52, no. 2, 2003.
[47] K. T. Law, J. C. S. Lui, and D. K. Y. Yau, “You Can Run, But You Can’t
Hide: An Effective Methodology to Traceback DDoS Attackers,” IEEE
Transactions on Parallel and Distributed Systems, vol. 15, no. 9, pp. 799
– 813, 2005.
[48] D. X. Song and A. Perrig, “Advanced and Authenticated Marking
Schemes for IP Traceback,” in Proceedings of IEEE INFOCOM ’01,
pp. 878–886, April 2001.
[49] K. Park and H. Lee., “On the Effectiveness of Probabilistic Packet Mark-
ing for IP Traceback under Denial of Service Attack,” in Proceedings of
IEEE INFOCOM ’01, pp. 338 – 347, 2001.
[50] D. Dean, M. Franklin, and A. Stubblefield, “An Algebraic Approach to
IP Traceback,” in Proceedings of Network and Distributed System Security
Symposium, NDSS ’01, February 2001.
[51] D. Dean, M. Franklin, and A. Stubblefield, “An Algebraic Approach to
IP Traceback,” ACM Transactions on Information and System Security
(TISSEC), vol. 5, no. 2, pp. 119–137, 2002.
[52] A. C. Snoeren, C. Partridge, L. A. Sanchez, C. E. Jones, F. Tchakountio,
S. T. Kent, and W. T. Strayer, “Hash-Based IP Traceback,” in Proceedings
of the ACM SIGCOMM 2001 Conference on Applications, Technologies,
Architectures, and Protocols for Computer Communication, pp. 3–14, Au-
gust 2001.
[53] M. Adler, “Trade-Offs in Probabilistic Packet Marking for IP Traceback,”
Journal of the ACM, vol. 52, pp. 217–244, March 2005.
[54] M. Sung and J. Xu, “IP Traceback-based Intelligent Packet Filtering: A
Novel Technique for Defending Against Internet DDoS Attacks,” IEEE
Transactions on Parallel and Distributed Systems, vol. 14, no. 9, pp. 861–
872, 2003.
[55] A. Belenky and N. Ansari, “IP Traceback with Deterministic Packet
Marking,” IEEE Communications Letters, vol. 7, pp. 162–164, April 2003.
[56] A. Belenky and N. Ansari, “On IP Traceback,” IEEE Communications
Magazine, vol. 41, pp. 142–153, July 2003.
[57] Z. Gao and N. Ansari, “Tracing Cyber Attacks from the Practical Per-
spective,” IEEE Communications Magazine, vol. 43, pp. 123–131, May
2005.
[58] D. E. Taylor, J. W. Lockwood, T. S. Sproull, J. S. Turner, and D. B.
Parlour, “Scalable IP Lookup for Programmable Routers,” in Proceedings
of IEEE Infocom, 2002.
[59] L. L. Peterson, S. Karlin, and K. Li, “OS Support for General-Purpose
Routers,” in Workshop on Hot Topics in Operating Systems, pp. 38–43,
1999.
[60] X. Qie, A. Bavier, L. Peterson, and S. Karlin, “Scheduling Computa-
tions on a Software-Based Router,” in Proceedings of ACM SIGMET-
RICS, June 2001.
[61] D. K. Y. Yau and X. Chen, “Resource Management in Software-
Programmable Router Operating Systems,” IEEE Journal on Selected
Areas in Communications (JSAC), vol. 19, March 2001.
[62] “Internet Mapping Project, http://research.lumeta.com/ches/map/-
index.html,” 1999.
[63] S. M. Bellovin, M. Leech, and T. Taylor, ICMP Traceback Messages,
Internet Draft: draft-bellovin-itrace-04.txt, Feb 2003.
[64] “Cooperative Association for Internet Data Analysis, http://-
www.caida.org/.”
[65] S. F. Wu, L. Zhang, D. Massey, and A. Mankin, Intention-Driven ICMP
Trace-Back, Internet Draft: draft-wu-itrace-intention-00.txt. submission
date Feb. 2001, expiration date Aug. 2001.
[66] A. Mankin, D. Massey, C.-L. Wu, S. F. Wu, and L. Zhang, “On Design
and Evaluation of Intention-Driven ICMP Traceback,” in Proceedings of
IEEE Int. Conference on Computer Communications and Networks, 2001.
[67] B. C. Chan, J. C. Lau, and J. C. Lui, “OPERA: An Open-source Extensi-
ble Router Architecture for Adding New Network Services and Protocols,”
Journal of Systems and Software, vol. 78, no. 1, pp. 24–36, 2005.
[68] “The Netfilter/iptables Project. http://www.netfilter.org.”
[69] J. Harris and A. J. Melara, “Performance analysis of the linux fire-
wall in a host,” in CiNIC - Calpoly intelligent NIC Project, http://-
www.ee.calpoly.edu/3comproject/, 2002.
[70] T. Peng, C. Leckie, and K. Ramamohanarao, “Adjusted Probabilistic
Packet Marking for IP Traceback,” in NETWORKING 2002, pp. 697–708,
2002.
[71] C. Hedrick, “RFC 1058: Routing Information Protocol,” The Internet
Society, June 1988.
[72] J. Moy, “RFC 2328: Open Shortest Path First (OSPF) Version 2,” The
Internet Society, April 1998.
[73] H. von Schelling, “Coupon Collecting for Unequal Probabilities,” Amer-
ican Mathematical Monthly, vol. 61, pp. 306–311, 1954.
[74] P. J. Courtois, Decomposability: Queueing and computer system applica-
tions. Academic Press, 1977.
[75] L. Golubchik and J. C. Lui, “Bounding of Performance Measures for
Threshold-based Queueing Systems: Theory and Application to Dynamic
Resource Management in Video-on-Demand Servers,” IEEE Transactions
of Computers, vol. 51, pp. 353–372, Apr. 2002.
[76] K. S. Trivedi, Probability and Statistics with Reliability, Queuing and
Computer Science Applications. Wiley-Interscience, 2002.
[77] U. Bhat, Elements of Applied Stochastic Processes. New York: Wiley,
1984.
[78] V. Paxson, “End-to-end Routing Behavior in the Internet,” IEEE/ACM
Transactions on Networking, vol. 5, pp. 601–615, Oct. 1997.
[79] N. Spring, R. Mahajan, D. Wetherall, and T. Anderson, “Measuring ISP
Topologies with Rocketfuel,” IEEE/ACM Transactions on Networking,
vol. 12, no. 1, pp. 2–16, 2004.
[80] Cooperative Association for Internet Data Analysis, CAIDA, “CAIDA’s
Router-Level Topology Measurements, http://www.caida.org/tools/-
measurement/skitter/router topology/.”