16
Robust Correlation of Encrypted Attack Traffic through Stepping Stones by Flow Watermarking Xinyuan Wang, Member, IEEE, and Douglas S. Reeves, Member, IEEE Abstract—Network-based intruders seldom attack their victims directly from their own computer. Often, they stage their attacks through intermediate “stepping stones” in order to conceal their identity and origin. To identify the source of the attack behind the stepping stone(s), it is necessary to correlate the incoming and outgoing flows or connections of a stepping stone. To resist attempts at correlation, the attacker may encrypt or otherwise manipulate the connection traffic. Timing-based correlation approaches have been shown to be quite effective in correlating encrypted connections. However, timing-based correlation approaches are subject to timing perturbations that may be deliberately introduced by the attacker at stepping stones. In this paper, we propose a novel watermark- based-correlation scheme that is designed specifically to be robust against timing perturbations. Unlike most previous timing-based correlation approaches, our watermark-based approach is “active” in that it embeds a unique watermark into the encrypted flows by slightly adjusting the timing of selected packets. The unique watermark that is embedded in the encrypted flow gives us a number of advantages over passive timing-based correlation in resisting timing perturbations by the attacker. In contrast to the existing passive correlation approaches, our active watermark-based correlation does not make any limiting assumptions about the distribution or random process of the original interpacket timing of the packet flow. In theory, our watermark-based correlation can achieve arbitrarily close to 100 percent correlation true positive rate (TPR), and arbitrarily close to 0 percent false positive rate (FPR) at the same time for sufficiently long flows, despite arbitrarily large (but bounded) timing perturbations of any distribution by the attacker. Our paper is the first that identifies 1) accurate quantitative tradeoffs between the achievable correlation effectiveness and the defining characteristics of the timing perturbation; and 2) a provable upper bound on the number of packets needed to achieve a desired correlation effectiveness, given the amount of timing perturbation. Experimental results show that our active watermark-based correlation performs better and requires fewer packets than existing, passive timing-based correlation methods in the presence of random timing perturbations. Index Terms—Network-level security and protection, intrusion tracing, correlation, stepping stone. Ç 1 INTRODUCTION N ETWORK-BASED attacks have become a serious threat to the critical information infrastructure on which we depend. To stop or repel network-based attacks, it is critical to be able to identify the source of the attack. Attackers, however, go to some lengths to conceal their identities and origin, using a variety of countermeasures. As an example, they may spoof the IP source address of the attack traffic. Methods of tracing spoofed traffic, generally known as IP traceback [23], [26], [9], [14] have been developed to address this countermeasure. Another common and effective countermeasure used by network-based intruders to hide their identity is to connect through a sequence of intermediate hosts, or stepping stones, before attacking the final target. For example, an attacker at host A may telnet or SSH into host B, and from there, launch an attack on host C. In effect, the incoming packets of an attack connection from A to B are forwarded by B, and become outgoing packets of a connection from B to C. The two connections or flows are related in such a case. The victim host C can use IP traceback to determine the second flow originated from host B, but traceback will not be able to correlate this with the attack flow originating from host A. To trace attacks through a stepping stone, it is necessary to correlate the incoming traffic with the outgoing traffic at the stepping stone. This would allow the attack to be traced back to host A in the example. The earliest work on connection correlation was based on tracking user’s login activities at different hosts [11], [25]. Later work relied on comparing the packet contents, or payloads of the connections to be correlated [27], [33]. Most recent work has focused on the timing characteristics [36], [35], [32], [8], [1] of connections, in order to correlate encrypted connections (i.e., traffic encrypted using IPSEC [12] or SSH [18], [34]). Timing-based correlation approaches, however, are sensitive to the use of countermeasures by the attacker, or adversary. In particular, the attacker can perturb the timing characteristics of a connection by selectively or randomly introducing extra delays when forwarding packets at the stepping stones. This kind of timing perturbation will adversely affect the effectiveness of any timing-based correlation. Timing perturbation can either make unrelated flows have similar timing characteristics, or make related flows exhibit different timing characteristics. This will 434 IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING, VOL. 8, NO. 3, MAY/JUNE 2011 . X. Wang is with the Department of Computer Science, George Mason University, University Drive, MSN 4A5, Fairfax, VA 22030. E-mail: [email protected]. . D.S. Reeves is with the Department of Computer Science and the Department of Electrical, and Computer Engineering, Campus Box 8206, North Carolina State University, Raleigh, NC 27695-8206. E-mail: [email protected]. Manuscript received 20 Nov. 2005; revised 9 June 2007; accepted 7 Dec. 2009; published online 6 Aug. 2010. Recommended for acceptance by V. Gligor. For information on obtaining reprints of this article, please send e-mail to: [email protected], and reference IEEECS Log Number TDSC-0156-1105. Digital Object Identifier no. 10.1109/TDSC.2010.35. 1545-5971/11/$26.00 ß 2011 IEEE Published by the IEEE Computer Society

434 IEEE TRANSACTIONS ON DEPENDABLE AND SECURE … · 2014-01-31 · IP traceback [23], [26], [9], [14] have been developed to address this countermeasure. ... While passive approaches

  • Upload
    others

  • View
    0

  • Download
    0

Embed Size (px)

Citation preview

Page 1: 434 IEEE TRANSACTIONS ON DEPENDABLE AND SECURE … · 2014-01-31 · IP traceback [23], [26], [9], [14] have been developed to address this countermeasure. ... While passive approaches

Robust Correlation of Encrypted Attack Trafficthrough Stepping Stones by Flow Watermarking

Xinyuan Wang, Member, IEEE, and Douglas S. Reeves, Member, IEEE

Abstract—Network-based intruders seldom attack their victims directly from their own computer. Often, they stage their attacks

through intermediate “stepping stones” in order to conceal their identity and origin. To identify the source of the attack behind the

stepping stone(s), it is necessary to correlate the incoming and outgoing flows or connections of a stepping stone. To resist attempts at

correlation, the attacker may encrypt or otherwise manipulate the connection traffic. Timing-based correlation approaches have been

shown to be quite effective in correlating encrypted connections. However, timing-based correlation approaches are subject to timing

perturbations that may be deliberately introduced by the attacker at stepping stones. In this paper, we propose a novel watermark-

based-correlation scheme that is designed specifically to be robust against timing perturbations. Unlike most previous timing-based

correlation approaches, our watermark-based approach is “active” in that it embeds a unique watermark into the encrypted flows by

slightly adjusting the timing of selected packets. The unique watermark that is embedded in the encrypted flow gives us a number of

advantages over passive timing-based correlation in resisting timing perturbations by the attacker. In contrast to the existing passive

correlation approaches, our active watermark-based correlation does not make any limiting assumptions about the distribution or

random process of the original interpacket timing of the packet flow. In theory, our watermark-based correlation can achieve arbitrarily

close to 100 percent correlation true positive rate (TPR), and arbitrarily close to 0 percent false positive rate (FPR) at the same time for

sufficiently long flows, despite arbitrarily large (but bounded) timing perturbations of any distribution by the attacker. Our paper is the

first that identifies 1) accurate quantitative tradeoffs between the achievable correlation effectiveness and the defining characteristics

of the timing perturbation; and 2) a provable upper bound on the number of packets needed to achieve a desired correlation

effectiveness, given the amount of timing perturbation. Experimental results show that our active watermark-based correlation

performs better and requires fewer packets than existing, passive timing-based correlation methods in the presence of random timing

perturbations.

Index Terms—Network-level security and protection, intrusion tracing, correlation, stepping stone.

Ç

1 INTRODUCTION

NETWORK-BASED attacks have become a serious threat tothe critical information infrastructure on which we

depend. To stop or repel network-based attacks, it is criticalto be able to identify the source of the attack. Attackers,however, go to some lengths to conceal their identities andorigin, using a variety of countermeasures. As an example,they may spoof the IP source address of the attack traffic.Methods of tracing spoofed traffic, generally known asIP traceback [23], [26], [9], [14] have been developed toaddress this countermeasure.

Another common and effective countermeasure used by

network-based intruders to hide their identity is to connect

through a sequence of intermediate hosts, or stepping stones,

before attacking the final target. For example, an attacker at

host A may telnet or SSH into host B, and from there,

launch an attack on host C. In effect, the incoming packets

of an attack connection from A to B are forwarded by B, andbecome outgoing packets of a connection from B to C. Thetwo connections or flows are related in such a case. Thevictim host C can use IP traceback to determine the secondflow originated from host B, but traceback will not be ableto correlate this with the attack flow originating fromhost A. To trace attacks through a stepping stone, it isnecessary to correlate the incoming traffic with the outgoingtraffic at the stepping stone. This would allow the attack tobe traced back to host A in the example.

The earliest work on connection correlation was based on

tracking user’s login activities at different hosts [11], [25].

Later work relied on comparing the packet contents, or

payloads of the connections to be correlated [27], [33]. Most

recent work has focused on the timing characteristics [36],

[35], [32], [8], [1] of connections, in order to correlate

encrypted connections (i.e., traffic encrypted using IPSEC

[12] or SSH [18], [34]).Timing-based correlation approaches, however, are

sensitive to the use of countermeasures by the attacker, oradversary. In particular, the attacker can perturb the timingcharacteristics of a connection by selectively or randomlyintroducing extra delays when forwarding packets at thestepping stones. This kind of timing perturbation willadversely affect the effectiveness of any timing-basedcorrelation. Timing perturbation can either make unrelatedflows have similar timing characteristics, or make relatedflows exhibit different timing characteristics. This will

434 IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING, VOL. 8, NO. 3, MAY/JUNE 2011

. X. Wang is with the Department of Computer Science, George MasonUniversity, University Drive, MSN 4A5, Fairfax, VA 22030.E-mail: [email protected].

. D.S. Reeves is with the Department of Computer Science and theDepartment of Electrical, and Computer Engineering, Campus Box 8206,North Carolina State University, Raleigh, NC 27695-8206.E-mail: [email protected].

Manuscript received 20 Nov. 2005; revised 9 June 2007; accepted 7 Dec. 2009;published online 6 Aug. 2010.Recommended for acceptance by V. Gligor.For information on obtaining reprints of this article, please send e-mail to:[email protected], and reference IEEECS Log Number TDSC-0156-1105.Digital Object Identifier no. 10.1109/TDSC.2010.35.

1545-5971/11/$26.00 � 2011 IEEE Published by the IEEE Computer Society

Page 2: 434 IEEE TRANSACTIONS ON DEPENDABLE AND SECURE … · 2014-01-31 · IP traceback [23], [26], [9], [14] have been developed to address this countermeasure. ... While passive approaches

increase the correlation false positive rate, or decrease thecorrelation true positive rate, respectively.

Previous timing-based correlation approaches are pas-sive in that they simply examine (but do not manipulate)the traffic timing characteristics for correlation purposes.While passive approaches are simple and easy to imple-ment, they may be vulnerable to active countermeasures bythe attacker, and/or require a large number of packets inorder to correlate timing-perturbed flows.

In this paper, we address the random timing-perturba-tion problem in correlating encrypted connections throughstepping stones. Our goal is to develop an efficientcorrelation scheme that is probabilistically robust againstrandom timing perturbation, and to answer fundamentalquestions concerning the effectiveness of such techniquesand the tradeoffs involved in implementing them.

We propose a novel watermark-based-correlation schemethat is designed specifically to be robust against timingperturbations by the adversary. Unlike most previouscorrelation approaches, our watermark-based approach isactive; i.e., it embeds a unique watermark into the encryptedflows by slightly adjusting the timing of selected packets.The unique watermark that is embedded in the encryptedflow gives us a number of advantages over passive timing-based correlation in overcoming timing perturbations by theadversary. First, our active watermark-based correlationdoes not make any limiting assumptions about the distribu-tion or random process of the original interpacket timing ofthe packet flow, or the distribution of random delays anadversary can add. This is in contrast to existing passivetiming-based correlation approaches. Second, our methodrequires substantially fewer packets in the flow to achievethe same level of correlation effectiveness as existing passivetiming-based correlation. In theory, our watermark-basedcorrelation can achieve arbitrarily close to 100 percentcorrelation true positive rate, and arbitrarily close to0 percent false positive rate at the same time for sufficientlylong flows, despite arbitrarily large (but bounded) timingperturbation of arbitrary distribution by the adversary. Tothe best of our knowledge, our paper is the first thatidentifies 1) the accurate quantitative tradeoffs between theachievable correlation effectiveness and the defining char-acteristics of the timing perturbation; and 2) a provableupper bound on the number of packets needed to achieve adesired correlation effectiveness, given a bound on theamount of timing perturbation.

We also investigate the maximum negative impact on theembedded watermark an adversary can have, and theminimum effort needed to achieve that impact. Underthe condition that the watermark-embedding parametersare unknown to the adversary, we determine the minimumdistortion required for the adversary to completely elim-inate any embedded watermark from the interpacket timing,and the optimal strategy for doing so. We further investigatethe implications of the constraints of real-time communica-tion and bounded delay for the adversary’s ability to removethe embedded watermark. While there exist ways tocompletely eliminate hidden information from any signaloffline, we show that (without knowledge of the watermark-embedding parameters) it is generally infeasible for theadversary to completely eliminate the embedded watermark

from the packet timing in real time, even if he can introducearbitrarily large (but bounded) distortion to the packettiming of normal network traffic. This result ensures that ourwatermark-based correlation is able to withstand arbitrarilylarge timing perturbations in real time, provided that thereare enough packets in the flows to be correlated.

The remainder of this paper is organized as follows:Section 2 summarizes related work. Section 3 gives anoverview of watermark-based correlation. Section 4 de-scribes the embedding of a single watermark bit in a flow.Section 5 analyzes the watermark bit robustness, tradeoffs,and the overall watermark detection and collision rates.Section 6 analyzes the minimum distortion required tocompletely remove an arbitrary embedded watermark, theoptimal strategy for doing so, and the implications of real-time constraints. Section 7 describes the implementation ofour method in the Linux kernel, and evaluates theeffectiveness of out method empirically. Section 8 concludesthe paper, and points out some future research directions.

2 RELATED WORK

Existing connection correlation approaches are based onthree different characteristics: 1) host activity; 2) connectioncontent (i.e., packet payload); and 3) interpacket timingcharacteristics. The host-activity-based approach (e.g., DIDS[25] and CIS [11]) collects and tracks users’ login activity ateach stepping stone. The major drawback of host-activity-based methods is that the host activity collected from eachstepping stone is generally not trustworthy. Since, theattacker is assumed to have full control over each steppingstone, he/she can easily modify, delete, or forge user-logininformation. This defeats the ability to correlate based onhost activity.

Content-based correlation approaches (e.g., thumbprint-ing [27] and SWT [33]) require that the payload of packetsremains invariant across (i.e., is unchanged by) steppingstones. Since the attacker can easily transform the connec-tion content by encryption at the application layer, theseapproaches are suitable only for unencrypted connections.

To correlate encrypted traffic, timing-based approaches(e.g., ON/OFF-based [36], deviation-based [35] and IPD-based [32]) have been proposed. These methods passivelymonitor the arrival and/or departure times of packets, anduse this information to correlate incoming and outgoingflows of a stepping stone. For instance, IPD-based correla-tion [32] has shown that 1) the important interpacket timingcharacteristics of connections are preserved during transitacross many routers and stepping stones, and 2) the timingcharacteristics of interactive flows (e.g., telnet and SSHconnections) are almost always unique enough to differ-entiate related flows from unrelated flows.

While the first generation of timing-based correlationapproaches have proved to be effective in correlatingencrypted connections, they are vulnerable to the attacker’suse of active timing perturbation. Donoho et al. [8] firstinvestigated the theoretical limits on the attacker’s ability todisguise his traffic through timing perturbation and bogus(padding, or chaff) packet injection. By using multiple-timescale-analysis techniques, they show that correlationbased on long-term behavior (of sufficiently long flows) is

WANG AND REEVES: ROBUST CORRELATION OF ENCRYPTED ATTACK TRAFFIC THROUGH STEPPING STONES BY FLOW... 435

Page 3: 434 IEEE TRANSACTIONS ON DEPENDABLE AND SECURE … · 2014-01-31 · IP traceback [23], [26], [9], [14] have been developed to address this countermeasure. ... While passive approaches

still possible despite certain timing perturbations by theattackers. However, they do not present any tradeoffsbetween the magnitude of the timing perturbation, thedesired correlation effectiveness, and the number of packetsneeded. Another important issue not addressed by [8] is thecorrelation false positive rate. While coarse timescaleanalysis for long term behavior may not be affected bypacket jitter (timing perturbations) introduced by theattackers, it may also be insensitive to the unique detailsof each flow’s timing. Therefore, coarse-scale analysis tendsto increase the correlation false positive rate, whileincreasing the correlation true positive rate of timing-perturbed flows. Nevertheless, the work by Donoho [8]represents an important first step toward understanding theinherent limitations of timing perturbations by the adver-sary on timing-based correlation. Not addressed in thiswork were questions about correlation effectiveness forflows of arbitrary (rather than Poission or Pareto) distribu-tion in packet timing, and the achievable tradeoff of falseand true positive rates given the magnitude of the timingperturbation and the number of packets available.

After the initial publication of our active watermarking-based correlation [31], Blum et al. [1] developed anotherpassive, timing-based correlation method that considersboth correlation true positive and false positive at the sametime. Based on the assumption that the interpacket timingof flows can be modeled as a sequence of Poisson processesof different rates, they derived upper bounds on thenumber of packets needed to achieve a specified falsepositive rate and a 100 percent true positive rate. They alsoderived the lower bounds on the amount of chaff needed todefeat their passive method of correlation. However, theirwork did not present any experimental results, nor did itaddress such practical issues as how to derive modelparameters in real time, or how many packets are needed inpractice for real flows and realistic timing perturbations.Zhang et al. [37] and He and Tong [10] recently proposedseveral new passive timing-based correlation methodsbased on similar assumptions used by Blum et al. [1], andthey showed that their methods have better performancethan what proposed by Blum et al. [1].

Chakinala et al. [2] formally analyzed the packetreordering channels as information-theoretic game. Penget al. [19] recently studied the secrecy of the watermark-based correlation [31] and proposed an offline statisticalmethod for detecting the existence of watermark from apacket flow. However, their method assumes that thewatermark embedding follows some simple and fixedpatterns, and requires access to both the unwatermarkedand watermarked flows to be effective. As we will show inthe watermark-tracking model, we can make the unwater-marked flow unavailable to the adversary by watermarkingthe packet flow from its source.

In summary, existing timing-based-correlation ap-proaches passively measure and use possibly perturbed-timing information for correlation. They do not attempt tomake the interpacket timing characteristics more amenableto effective correlation. Passive approaches are simple toimplement and undetectable by the attacker. However, theygenerally make more limiting assumptions about theinterpacket timing characteristics, and require more packets

than an active approach to effectively correlate timing-

perturbed flows, as we will show.In this paper, we describe a novel active timing-based

correlation approach that

1. makes no assumption about the distribution ofinterpacket timing intervals,

2. does not require the timing perturbation to followany specific distribution or random process,

3. is provably effective against certain correlatedrandom timing perturbation, and

4. requires substantially fewer packets than passiveapproaches to achieve the same level of correlationeffectiveness.

3 OVERVIEW OF WATERMARK-BASED

CORRELATION

3.1 Overall Watermark-Tracing Model

The watermark-tracing approach exploits the observationthat interactive connections (i.e., telnet, SSH) are bidirec-tional. The idea is to watermark the backward traffic (fromvictim back to the attacker) of the bidirectional attackconnections by slightly adjusting the timing of selectedpackets. If the embedded watermark is both robust andunique, the watermarked back traffic can be effectivelycorrelated and traced across stepping stones, from thevictim all the way back to the attacker. As shown in Fig. 1,the attacker may connect through a number of hosts(H1; . . . ; Hn) before attacking the final target. Assumingthe attacker has not gained full control on the attack target,the attack target will initiate the attack tracing after it hasdetected the attack. Specifically, the attack target willwatermark the backward traffic of the attack connection,and inform sensors across the network about the water-mark. The sensors across the network will scan all traffic forthe presence of the indicated watermark, and report to thetarget if any occurrences of the watermark are detected.

Gateway, firewall, and edge router are good places to

deploy sensors. However, how many sensors can be

deployed, depend on not only the resources available, but

also the administrative privilege. How to optimally deploy

limited number of sensors over particular network, is an

open research problem [22]. Due to space limitation, we

leave aside the sensor-deployment issues, and instead focus

on the watermark-tracing approach itself.Since the backward traffic is watermarked at its very

source—the attack target, which is not controlled by the

attacker, the attacker will not have access to an unwater-

marked version of the traffic. This makes it difficult for the

attacker to determine which packets have been delayed by

the watermarking process, running at the target.

436 IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING, VOL. 8, NO. 3, MAY/JUNE 2011

Fig. 1. Overall watermark-tracing model.

Page 4: 434 IEEE TRANSACTIONS ON DEPENDABLE AND SECURE … · 2014-01-31 · IP traceback [23], [26], [9], [14] have been developed to address this countermeasure. ... While passive approaches

The objective of watermark-based correlation is to makethe correlation of encrypted connections probabilisticallyrobust against random timing perturbations by the adver-sary. Unlike existing timing-based correlation schemes, ourwatermark-based correlation is active in that it embeds aunique watermark into the encrypted flows by slightlyadjusting the timing of selected packets. If the embeddedwatermark is both unique and robust, the watermarkedflows can be effectively identified, and thus, correlated ateach stepping stone.

In contrast to most previous passive correlation ap-proaches, our watermark-based correlation makes no limit-ing assumption about the distribution or random process ofthe original inter-packet timing characteristics of the flowsto be correlated.

We assume the following about the random timingperturbations introduced by the adversary:

1. While the attacker can add extra delay to any or allpackets of an outgoing flow at the stepping stone,the maximum delay he or she can introduce isbounded.

2. All packets in the original flow are kept. No packetsare dropped from or added to the flow by thestepping stone.

3. While the watermarking scheme is public knowl-edge, the watermarking embedding and decodingparameters are secrets known only to the watermarkembedder and the watermark detector(s).

Here, we do not require that the packet order of twoflows be the same, as long as the total number of packets isnot modified. As shown in works [20], [21], and [30], ourwatermark-based approach is able to correlate encryptedflows even if chaff and timing perturbation are applied atthe same time. Due to space limitation, we only considertiming perturbation in this paper.

In contrast to all previous passive approaches, ourcorrelation method does not require the random timingperturbation introduced by the attacker to follow anyparticular distribution or random process to be effective.The only assumption about timing perturbations is thatthey follow some distribution of finite variance, and theyhave the same covariance among each other.

3.2 Watermarking Model and Concept

Generally, digital watermarking [4] involves the selectionof a watermark carrier, and the design of two complemen-tary processes: embedding and decoding. The watermark-embedding process inserts the watermark information intothe carrier signal by a slight modification of some propertyof the carrier. The watermark-decoding process detectsand extracts the watermark (equivalently, determines theexistence of a given watermark) from the carrier signal. Tocorrelate encrypted connections, we propose to use theinterpacket timing as the watermark carrier property ofinterest.

For a unidirectional flow of n > 1 packets, we use ti andt0i to represent the arrival and departure times, respectively,of the ith packet Pi of a flow incoming to and outgoing fromsome stepping stone. (Given a bidirectional connection, wecan split it into two unidirectional flows and process eachindependently).

Assume without loss of generality that the normalprocessing and queuing delay added by the stepping stoneis a constant c > 0, and that the attacker introduces extradelay di to packet Pi at the stepping stone; then, we havet0i ¼ ti þ cþ di.

We define the arrival interpacket delay (AIPD) between Piand Pj as follows:

ipdi;j ¼ tj � ti ð1Þ

and the departure interpacket delay (DIPD) between Pi and Pjas follows:

ipd0i;j ¼ t0j � t0i: ð2Þ

We will use IPD to denote either AIPD or DIPD when itis clear in the context. We further define the impact orperturbation on ipdi;j by the attacker, as the differencebetween ipd0i;j and ipdi;j : ipd0i;j � ipdi;j ¼ dj � di. Note, weuse the timestamp of the ith and the jth packets to calculateipdi;j or ipd0i;j, even if there might be some packets reorderedin the packet flow. Since, we only use the timestamp ofselected packets, the negative impact of using the “wrong”packet due to packet reorder is equivalent to some randomtiming perturbation over the IPD.

Assume D > 0 is the maximum delay that the attackercan add to Piði ¼ 1; . . . ; nÞ, then the impact or perturbationon ipdi;j is dj � di 2 ½�D;D�. Accordingly, range ½�D;D� iscalled the perturbation range of the attacker.

To make our method robust against timing perturbationsby the adversary, we choose to embed the watermark usingIPDs from randomly and independently selected packets.

Given a flow containing the packet sequence P1; . . . ; Pnwith time stamps t1; . . . ; tn, respectively (ti < tj for 1 �i < j � n), we can independently and probabilisticallychoose 2m < n packets through the following process:1) sequentially consider each of the n packets; and2) independently and randomly determine if the currentpacket will be chosen for watermarking purposes, withprobability p ¼ 2m

n ð0 < m < n2Þ. By this method, the selec-

tion of one packet for watermarking purposes is indepen-dent from the selection of any other packet. Therefore, wecan expect to have 2m distinct packets independently andrandomly selected from a packet stream of n packets.

4 BASIC AND PROBABILISTIC WATERMARKING

4.1 Basic Watermark Bit Embedding and Decoding

As an IPD is conceptually a continuous value, we will firstquantize the IPD before embedding the watermark bit.Given any IPD ipd > 0, we define the quantization of ipd withuniform quantization step size s > 0 as the function

qðipd; sÞ ¼ roundðipd=sÞ; ð3Þ

where round(x) is the function that rounds off realnumber x to its nearest integer (i.e., roundðxÞ ¼ i for anyx 2 ð½i� 0:5; iþ 0:5Þ).

Fig. 2 illustrates the quantization for scalar x. It is easy tosee that qðk� s; sÞ ¼ qðk� sþ y; sÞ for any integer k and anyy 2 ½�s=2; s=2Þ.

Let ipd denote the original IPD before watermark bit w isembedded, and ipdw denote the IPD after watermark bit w is

WANG AND REEVES: ROBUST CORRELATION OF ENCRYPTED ATTACK TRAFFIC THROUGH STEPPING STONES BY FLOW... 437

Page 5: 434 IEEE TRANSACTIONS ON DEPENDABLE AND SECURE … · 2014-01-31 · IP traceback [23], [26], [9], [14] have been developed to address this countermeasure. ... While passive approaches

embedded. To embed a binary digit or bit w into an IPD, weslightly adjust that IPD, such that the quantization ofthe adjusted IPD will have w as the remainder when themodulus 2 is taken.

Given any ipd > 0; s > 0 and binary digit w, the water-mark bit embedding is defined as function

eðipd; w; sÞ ¼ ½qðipdþ s=2; sÞ þ�� � s; ð4Þ

where � ¼ ðw� ðqðipdþ s=2; sÞ mod 2Þ þ 2Þmod 2.The embedding of one watermark bit w into scalar ipd is

done through increasing the quantization of ipdþs=2 by thenormalized difference between w and modulo 2 of thequantization of ipdþs=2, so that the quantization ofresulting ipdw will have w as the remainder when modulus 2is taken. The reason to quantize ipdþ s=2 rather than ipdhere is to make sure that the resulting eðipd; w; sÞ is no lessthan ipd, i.e., packets can be delayed, but cannot be outputearlier than they arrive. Fig. 3 illustrates the embedding ofwatermark bit w by mapping ranges of unwatermarked ipdto the corresponding watermark ipdw.

The watermark-bit-decoding function is defined asfollows:

dðipdw; sÞ ¼ qðipdw; sÞmod 2: ð5Þ

The correctness of watermark embedding and decodingis guaranteed by the following theorems, whose proofs areomitted due to space limitation:

Theorem 1. For any ipd > 0; s > 0 and binary bit w, dðeðipd;w; sÞ; sÞ ¼ w.

Theorem 2. For any ipd > 0; s > 0 and binary bit w, 0 � eðipd;w; sÞ � ipd < 2s.

4.2 Maximum Tolerable Perturbation

Given any ipd > 0; s > 0, we define the maximum tolerable

perturbation �max of dðipd; sÞ as the upper bound of theperturbation over ipd such that 8x > 0ðx < �max ) dðipd�x; sÞ ¼ dðipd; sÞ and either dðipdþ�max; sÞ 6¼ dðipd; sÞ ordðipd��max; sÞ 6¼ dðipd; s; Þ i.e., any perturbation smallerthan �max to ipd will not change the result of watermarkdecoding (dðipd; sÞ), while a perturbation of �max or greaterto ipd may change the watermark-decoding result.

We define the tolerable perturbation range as the portion ofthe perturbation range ½�D;D� within which any perturba-tion on ipd is guaranteed not to change dðipd; sÞ, and thevulnerable perturbation range as the range of perturbationvalues outside the tolerable perturbation range.

Given any ipd > 0; s > 0 and binary watermark bit w, bydefinition of quantization function qðipd; sÞ in (3) andwatermark-decoding function dðipdw; sÞ in (5), it is easy tosee that when x 2 ½�s=2; s=2Þ; dðeðipd; w; sÞ þ x; sÞ ¼ dðeðipd;w; sÞ; sÞ and dðeðipd; w; sÞ þ s=2; sÞ 6¼ dðeðipd; w; sÞ; sÞ.

This indicates that the maximum tolerable perturbation,the tolerable perturbation range, and the vulnerableperturbation range of dðeðipd; w; sÞ; sÞ are s=2; ½�s=2; s=2Þand ð�D;�s=2Þ [ ½s=2; DÞ, respectively.

In summary, if the perturbation of an IPD is within thetolerable perturbation range ½�s=2; s=2Þ, the embeddedwatermark bit is guaranteed to be not corrupted by thetiming perturbation. If the perturbation of the IPD is outsidethis range, the embedded watermark bit may be altered bythe attacker. Therefore, the larger the value of s (equiva-lently, the larger the tolerable perturbation range), the morerobust the embedded watermark bit will be. However, alarger value of s may disturb the timing the watermarkedflow more, as the watermark bit embedding itself may addup to 2s delay to selected packets.

It is desirable to have a watermark-embedding schemethat 1) disturbs the timing of watermarked flows as little aspossible, so that the watermark embedding is less notice-able; and 2) ensures the embedded watermark bit is robust,with high probability, against timing perturbations that areoutside the tolerable perturbation range ½�s=2; s=2Þ.

In the remainder of this section, we address the casewhen the maximum delay D > 0 added by the attacker isbigger than the maximum tolerable perturbation s=2. Byutilizing redundancy techniques, we develop a frameworkthat can make the embedded watermark bit robust, witharbitrarily high probability, against arbitrarily large (andyet bounded) random timing perturbation by the attacker,as long as the flow to be watermarked contains a sufficientnumber of packets.

4.3 Embedding a Single Watermark Bit over theAverage of Multiple IPDs

To make the embedded watermark bit probabilisticallyrobust against larger random delays than s=2, the key is tocontain and minimize the impact of the random delays onthe watermark-bearing IPDs so that the impact of therandom delays will fall, with high probability, withinthe tolerable perturbation range ½�s=2; s=2Þ. We exploitthe assumption that the attacker does not know whichpackets are randomly selected and which IPDs will be usedfor embedding the watermark.

438 IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING, VOL. 8, NO. 3, MAY/JUNE 2011

Fig. 3. Mapping between unwatermarked ipd and watermarked ipdw to

embed watermark bit w.

Fig. 2. Quantization of scalar value x.

Page 6: 434 IEEE TRANSACTIONS ON DEPENDABLE AND SECURE … · 2014-01-31 · IP traceback [23], [26], [9], [14] have been developed to address this countermeasure. ... While passive approaches

We apply two strategies to contain and minimize the

impact of random delays over the watermark-bearing IPDs.

The first strategy is to distribute watermark-bearing IPDs

over a longer duration of the flow. The second is to embed a

watermark bit in the average of multiple IPDs. The rationale

behind these strategies is as follows: while the attacker may

add a large delay to a single IPD, it is impossible to add large

delays to all IPDs. In fact, random delays tend to increase

some IPDs and decrease others. Therefore, the impact on

the average of multiple IPDs is more likely to be within the

tolerable perturbation range ½�s=2; s=2Þ, even when the

perturbation range ½�D;D� is much larger than ½�s=2; s=2Þ.Instead of embedding one watermark bit in one IPD, we

embed the watermark bit into the average ofm � 1 randomly

selected IPDs. Here, we call m the redundancy number.Given a packet stream P1; . . . ; Pn with time stamps

t1; . . . ; tn, respectively (ti < tj for 1 � i < j � n), we first

independently and randomly choose 2m (0 < m < n2 )

distinct packets: Px1; . . . ; Px2m

(1 � xk � n for 1 � k � 2m).

We then randomly divide the 2m packets into two groups of

m packets {Py1; . . . ; Pym } and {Pz1

; . . . ; Pzm } ðyk < zk and

yk; zk 2 fx1; . . . ; x2mgÞ, and randomly form m packet pairs

f< Py1; Pz1

>; . . . ; < Pym; Pzm >g.Let <Pyk ; Pzk> be the k-th pair of packets randomly

selected to embed the watermark bit, whose timestamps are

tyk and tzk , respectively. Then, we have m IPDs: ipdk ¼tzk � tykðk ¼ 1; . . . ;mÞ. We represent the average of these

m IPDs as

ipdavg ¼1

m

Xmk¼1

ipdk: ð6Þ

Given any desired ipdavg > 0, and the values for s and w,

we can embed w into ipdavg by applying the embedding

function defined in (4) to ipdavg. Specifically, the timing of

packets Pzkðk ¼ 1; . . . ;mÞ are all delayed, so that ipdavg is

adjusted by �, as defined in (4). To decode the watermark

bit, we first collect them IPDs (denoted as ipdwk ; k ¼ 1; . . . ;m)

from the same m pairs of randomly selected packets, and

from them, compute the average ipdwavg of ipdw1 ; . . . ; ipdwm.

Then, we can apply the decoding function defined in (5) to

ipdwavg to decode the watermark bit.Because watermark embedding is now applied to the

average of m IPDs, the watermark-embedding process

needs to know the exact values of those IPDs to be averaged

in order to achieve a perfectly even time adjustment. In real-

time communication, packets arrive and are forwarded one

by one, and incoming packets should not be buffered for too

long before they are sent out. This means that the water-

mark-embedding process may need to adjust the timing of

some packets and send them out before knowing the

average of all the m selected IPDs. In this case, embedding

the watermark bit over the average of multiple IPDs of real-

time flows may lead to an uneven time adjustment over

those selected packets.

5 ANALYSIS OF PROBABILISTIC WATERMARKING IN

THE PRESENCE OF TIMING PERTURBATIONS

We now consider the probabilistic watermark decoding inthe presence of active timing perturbation. Base on verymoderate assumptions about the random timing perturba-tions, we first establish an upper bound of the watermark-bit-decoding error probability, and then, derive an approx-imation to the watermark-bit-decoding error probability.

5.1 Upper bound of the Watermark-Bit-DecodingError Probability

Let Diði ¼ 1; . . . ; nÞ represent the random delays added topackets Piði ¼ 1; . . . ; nÞ by the adversary, and let D > 0 bethe maximum delay the adversary can add to any packet.Here, we do not require the random delays added by theadversary to follow any particular distribution, except thatthe random delay follows some distribution of finitevariance. For example, the delay distribution may bedeterministic, bimodal, self-similar, or any other distribu-tion. Furthermore, we do not require the random delay Di’sto be independent from each other, and we only assumethat the covariance between different Di’s are the same.

Given the assumption that the adversary does not knowhow and which packets are selected by the watermarkembedder, the selection of watermark-embedding packetPxk ðk ¼ 1; . . . ; 2mÞ is independent from any random delaysDi that the adversary may add. Therefore, the impact of thedelays by the adversary over randomly selected Pxk ’s isequivalent to randomly choosing one from the randomvariable list D1; . . . ; Dn. Let dk ðk ¼ 1; . . . ; 2mÞ represent theimpact of the random delays by the adversary over thekth randomly selected packet Pxk . Since the random delaysadded by the adversary follow some fixed distribution, thedk’s (k ¼ 1; . . . ; 2m) are identically distributed.

Let dyk and dzk be the random variables that denote therandom delays added by the attacker to packets Pyk and Pzk ,respectively, for k ¼ 1; . . . ;m. Let Xk ¼ dzk � dyk be therandom variable that denotes the impact of these randomdelays on ipdk ¼ tzk � tyk , and Xm be the random variablethat denotes the overall impact of random delay on ipdavg.Therefore, EðXkÞ ¼ 0. From (6), we have

Xm ¼1

m

Xmk¼1

ðdzk � dykÞ ¼1

m

Xmk¼1

Xk: ð7Þ

Therefore, the impact of the random delay by theattacker over ipdavg equals the sample mean of X1; . . . ; Xm.

We define the probability that the impact of the timingperturbation by the attacker is out of the tolerable perturba-tion range ð�s=2; s=2� as the watermark bit vulnerability,which can be quantitatively expressed as PrðjXmj � s=2Þ.

Let � and �2 be the mean and the variance of the randomdelay added by the attacker. Because the maximum delaythat may be added by the attacker is assumed to bebounded, �2 is finite.

Given Covðu; vÞ ¼ EðuvÞ � EðuÞEðvÞ, EðdziÞ ¼ EðdzjÞ ¼EðdyiÞ ¼ EðdyjÞ, and CovðDi;DjÞði 6¼ jÞ is constant, we have

EðdzidzjÞ ¼ EðdyidyjÞ ¼ EðdzidyjÞ ¼ EðdzjdyiÞ: ð8Þ

WANG AND REEVES: ROBUST CORRELATION OF ENCRYPTED ATTACK TRAFFIC THROUGH STEPPING STONES BY FLOW... 439

Page 7: 434 IEEE TRANSACTIONS ON DEPENDABLE AND SECURE … · 2014-01-31 · IP traceback [23], [26], [9], [14] have been developed to address this countermeasure. ... While passive approaches

Then

CovðXi;XjÞ ¼ EðXiXjÞ¼ Eððdzi � dyiÞðdzj � dyjÞÞ¼ EðdzidzjÞ þ EðdyidyjÞ � EðdzidyjÞ � EðdzjdyiÞ¼ 0:

ð9Þ

Therefore,

VarðXmÞ ¼1

m2Var

Xmk¼1

Xk

!

¼ 1

m2

Xmk¼1

VarðXkÞ þ 2CovðXi;XjÞ" #

1 � i < j � m

¼ 1

mVarðXkÞ

� 4�2

m:

ð10Þ

According to the Chebyshev inequality in statistics [7],

for any random variable X with finite variance VarðXÞ and

for any t > 0, PrðjX � EðXÞj � tÞ � VarðXÞ=t2. This means

that the probability that a random variable deviates from its

mean by more than t is bounded by VarðXÞ=t2. By applying

the Chebyshev inequality to Xm with t ¼ s=2, we have

Pr jXmj �s

2

� �� 16�2

ms2: ð11Þ

This means that the probability that the overall impact of

random delays on ipdavg is outside the tolerable perturba-

tion range ð�s=2; s=2�, is bounded. In addition, that

probability can be reduced to be arbitrarily close 0 by

increasing m, the number of redundant IPDs averaged

together before embedding the watermark bit. Since, the

watermark-bit-decoding error probability is less than

PrðjXmj � s2Þ, the derived upper bound is conservative

and it holds true regardless of the distribution, mean, or

the variance of the random delays added by the attacker, or

of the maximum quantization allowed for watermarking

embedding. Furthermore, the upper bound of the error

probability holds true even if the random delays on

different packets are correlated.

5.2 Approximation to the Watermark BitRobustness

In this section, we assume that the random delays added by

the adversary are independent and identically distributed

(iid), and we derive an accurate approximation to the

watermark bit robustness PrðjXmj < s=2Þ via the well-

known central limit theorem of statistics [7]. Although, the

approximation model assumes the random delays are iid,

our experiments (as shown in Fig. 10) demonstrate that the

derived approximation model can accurately model non-iid

(e.g., batch-releasing) random delays.

Central Limit Theorem. If the random variables X1; . . . ; Xn

form a random sample of size n from a given distributionXwithmean � and finite variance �2, then for any fixed number x

limn!1

Pr

ffiffiffinpðXn � �Þ�

� x� �

¼ �ðxÞ; ð12Þ

where �ðxÞ ¼R x�1

1ffiffiffiffi2�p e�

u2

2 du.

The theorem indicates that whenever a random sampleof size n is taken from any distribution with mean � andfinite variance �2, the sample mean Xn will be approxi-mately normally distributed with mean � and variance�2=n, or equivalently the distribution of random variableffiffiffinp ðXn � �Þ=� will be approximately a standard normaldistribution.

Let �2 denote the variance of the distribution of therandom delays added by the attacker (i.e., let VarðdykÞ ¼VarðdzkÞ ¼ �2). Applying the central limit theorem to randomsampleX1 ¼ dz1

� dy1; . . . ; Xm ¼ dzm � dym , where VarðXkÞ ¼

VarðdzkÞ þVarðdykÞ ¼ 2�2 and EðXkÞ ¼ EðdzkÞ � EðdykÞ ¼ 0,we have

Pr

ffiffiffiffiffimp ðXm �EðXiÞÞffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi

VarðXiÞp < x

" #¼ Pr

ffiffiffiffiffimp

Xmffiffiffi2p

�< x

� �� �ðxÞ:

ð13Þ

Because of the symmetry of �ðxÞ

Pr

ffiffiffiffiffimp

Xmffiffiffi2p

�������� < x

� �� 2�ðxÞ � 1: ð14Þ

Therefore,

p ¼ Pr jXmj <s

2

h i

¼ Pr

ffiffiffiffiffimp

Xmffiffiffi2p

�������� < s

ffiffiffiffiffimp

2ffiffiffi2p

� �� 2�

sffiffiffiffiffimp

2ffiffiffi2p

� � 1:

ð15Þ

This means that the impact of the timing perturbation onthe watermark bit is approximately normally distributedwith zero mean and variance 2�2=m. Here, p represents theprobability that the impact of the timing perturbation fallswithin range ð� s

2 ;s2Þ. While the encoded watermark bit

could be decoded correctly even when the timing perturba-tion falls outside ð� s

2 ;s2Þ, such a probability is small when p

is close to 1. In the rest of this paper, we will use p as aconservative approximation to the probability that thewatermark bit will survive the timing perturbation.

Equation (15) confirms the result of (11). Fig. 4 illustrateshow the distribution of the impact of random timingperturbations by the attacker can be “squeezed” into thetolerable perturbation range by increasing the number ofredundant IPDs averaged.

Equation (15) also gives us an accurate estimate of thewatermark bit robustness. For example, assume the max-imum delay by the attacker is normalized to 1 time unit, therandom delays added by the attacker are uniformlydistributed over [0, 1] (whose variance �2 is 1/12), s ¼ 0:4,andm ¼ 12, then, Pr½jX12j < 0:2� � 2�ð1:2�

ffiffiffi2pÞ � 1 � 91%.

In other words, the impact of random timing perturbationson the average of 12 IPDs, with about 91 percent probability,will fall within the range [�0:2; 0:2]. Table 1 shows theestimation and simulation results of watermark bit robust-ness with uniformly distributed random delays over [0, 1],s ¼ 0:4, and various values form. This demonstrates that thecentral limit theorem can give us a very accurate estimate fora sample size as small as m ¼ 7.

440 IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING, VOL. 8, NO. 3, MAY/JUNE 2011

Page 8: 434 IEEE TRANSACTIONS ON DEPENDABLE AND SECURE … · 2014-01-31 · IP traceback [23], [26], [9], [14] have been developed to address this countermeasure. ... While passive approaches

From (15), it can be seen that it is easier to achieve atarget level of robustness by increasing s than by increasingm. For example, the effect of increasing s by a factor of 2 isthe same as that of increasing m by a factor of 4.

5.3 Analysis of Watermark Detectability

Watermark detection refers to the process of determining ifa given watermark is embedded into the IPDs of a specificconnection or flow. Let the secret information sharedbetween the watermark embedder and decoder be repre-sented as <S;m; l; s; w>, where S is the packet selectionfunction that returns ðlþ 1Þ �m packets, m � 1 is thenumber of redundant pairs of packets in which to embedone watermark bit, l > 0 is the length of the watermark inbits, s > 0 is the quantization step size, and w is thel-bit watermark to be detected. Let f denote the flow to beexamined, and wf denote the decoded l bits from flow f .

The watermark detector works as follows:

1. Decode the l-bit wf from flow f .2. Compare the decoded wf with w.3. Report that watermark w is detected in flow f if the

Hamming distance between wf and w, representedas Hðwf; wÞ, is less than or equal to h, where h is athreshold parameter determined by the user, and0 � h < l.

The rationale behind using the Hamming distance ratherthan requiring an exact match to detect the presence of w isto increase the robustness of the watermark detector againsttiming perturbations by the attacker. Given any quantizationstep size s, there is a nonzero probability that an embeddedwatermark bit will be corrupted by timing perturbations, nomatter how much redundancy is used. Let 0 < p < 1 be theprobability that each embedded watermark bit will survive

the timing perturbation by the attacker. Then, the prob-ability that all l bits survive the timing perturbation by theattacker will be pl. When l is reasonably large, pl will tend tobe small unless p is very close to 1.

By using the Hamming distance h to detect watermarkwf ,the expected watermark detection rate will be

Xhi¼0

li

� pl�ið1� pÞi: ð16Þ

For example, for the value p ¼ 0:9102; l ¼ 24; and h ¼ 5,the expected watermark detection rate with exact bit matchwould be pl ¼ 10:45%. For the same values of p, l, and h, theexpected watermark detection rate using a Hammingdistance h ¼ 5 would be 98.29 percent.

It is possible that an unwatermarked flow happens tohave the watermark to be naturally detected. In this case, thewatermark detector would report the unwatermarked flowas having the watermark. It is termed a collision between wand f , if Hðwf; wÞ � h for an unwatermarked flow f .Collisions are obviously undesirable, as they may lead tofalse conclusions about who the source of an attack is.

Assuming that the l-bit wf extracted from a flow f isuniformly distributed, then the expected watermark colli-sion probability between any particular watermark w and arandom flow f will be

Xhi¼0

li

� 1

2

� l: ð17Þ

Fig. 5 shows the derived probability distribution of theexpected watermark detection and collision rates for l ¼ 24and p ¼ 0:9102. Given any watermark bit number l > 1 andany watermark bit robustness 0 < p < 1, the larger theHamming distance threshold h is, the higher the expected

WANG AND REEVES: ROBUST CORRELATION OF ENCRYPTED ATTACK TRAFFIC THROUGH STEPPING STONES BY FLOW... 441

TABLE 1Watermark Bit Robustness Simulation for Uniformly Distributed Random Delays over [0, 1], s ¼ 0:4

Fig. 5. Effect of the threshold h on the detection and collision rates of thewatermarking method.Fig. 4. Impact of random timing perturbations for different values of the

redundancy number m.

Page 9: 434 IEEE TRANSACTIONS ON DEPENDABLE AND SECURE … · 2014-01-31 · IP traceback [23], [26], [9], [14] have been developed to address this countermeasure. ... While passive approaches

detection rate will be. However, a larger Hamming distancethreshold tends to increase the collision (false positive) rateof the watermark detection at the same time. An optimalHamming distance threshold h would be the one that giveshigh expected detection rate, while keeping the falsepositive rate low. However, the optimal h and l dependon 1) the number of packets available, 2) the definingcharacteristics of the timing perturbation, and 3) the desiredlevel of effectiveness. Due to space limitation, we leave thederivation of optimal h and l as a future work. We showthat our flow watermarking scheme can be effective evenwith potentially suboptimal h and l.

Given any quantization step size s > 0, and desiredwatermark collision probability Pc > 0, and any desiredwatermark detection rate 0 < Pd < 1, we can determine theappropriate Hamming distance threshold 0 � h < l. As-suming that h is chosen such that h < l=2, then, we have

Xhi¼0

li

� 1

2

� l�Xhi¼0

lh

� 1

2

� l� ðhþ 1Þ l

h

2l: ð18Þ

Because liml!1lh

2l¼ 0, we can always make the ex-

pected watermark collision probabilityPh

i¼0 ðliÞð12Þl < Pc

by having sufficiently large watermark bit number l.Since

Phi¼0 ðliÞpl�ið1� pÞ

i � pl, we can always make theexpected detection rate

Phi¼0 ðliÞpl�ið1� pÞ

i > Pd by mak-ing p sufficiently close to 1. From inequality (11), this canbe accomplished by increasing the redundancy number mgiven any fixed values of s and �.

Therefore, in theory, our watermark-based-correlationmethod can, with arbitrarily small averaged adjustment ofinterpacket timing, achieve arbitrarily close to 100 percentwatermark-detection rate, and arbitrarily close to 0 percentwatermark-collision probability at the same time againstarbitrarily large (but bounded) random timing perturbationof arbitrary distribution, as long as there are enough packetsin the flow to be watermarked.

In practice, the number of packets available is thefundamental limiting factor to the achievable effectivenessof our watermark-based correlation. Our experiences showthat our watermark-based correlation can be effective withas few as several hundred packets. For example, theexperiments in Sections 7.1 and 7.2 show that the water-marking only requires less than 300 packets to achieve avirtually 100 percent decoding rate against up to 1,000 msrandom timing perturbation, and less than 0.35 percentfalse positive rate.

Although, the watermark-detection true positive rate andfalse positive rate can be made arbitrarily close to 100 percentand 0 percent at the same time by introducing enoughredundancy, there is always a nonzero probability that anembedded watermark is not detected. In fact, there existsome special case of timing perturbation that could poten-tially and completely remove the embedded watermark. Forexample, with sufficiently large delay, the timing of thewatermarked packet flow could be perturbed such that theinterpacket arrival time is constant (or equivalently, the IPDsbetween adjacent packets equal to the average IPDs). In thiscase, the watermark decoding of the perturbed flow wouldbe fixed no matter what and how the watermark has beenembedded. However, achieving such a complete elimination

of embedded watermark requires knowing the exact timingcharacteristics of the traffic in advance. In the followingsection, we analyze the negative impacts of the adversary’stiming perturbations, and their limits under the constraintsof real-time communication.

6 LIMITS OF ADVERSARY’S TIMING PERTURBATION

We have shown that there exist special cases of brute forcetiming perturbation that could completely remove anyembedded watermark from any distribution of interpackettiming.

In this section, we analyze the limitations on the negativeimpact of the adversary’s timing perturbations. We assumethat the key parameters of the watermark-embeddingmethod are unknown to the adversary. We first identify theminimum distortion required for the adversary to comple-tely remove the embedded watermark and the optimalstrategy for doing so. We then analyze the additionalconstraints imposed by real-time communication, and theirimplications for the adversary’s ability to interfere with ordistort the watermark. We show that it is generally infeasiblefor the adversary to completely eliminate the embeddedwatermark from a flow of packets in real time.

6.1 Minimum Brute Force Perturbation Needed toCompletely Remove Watermark

Let the n-dimension vector SN ¼ <S1; . . . ; SN> (where Si 2Rþ is a random variable) be the host signal or carrier in whichthe watermark is to be embedded, M be the message to beembedded and transferred, andK be the key information forcorrect information embedding and decoding. Let XN ¼<X1; . . . ; XN> (where Xi 2 Rþ is a random variable) be thesignal afterM is embedded in SN . The adversary distortsXN

into another vector Y N ¼ <Y1; . . . ; YN> (where Yi 2 Rþ isanother random variable). We can view Yi as an estimator ofXi and use the mean squared error MSEðXi; YiÞ ¼ E½ðXi � YiÞ2�to measure the distortion between random variables Xi andYi. We useDðXN; Y NÞ ¼ 1

N

PNi¼1 MSEðXi; YiÞ to measure the

overall distortion between XN and Y N .We first, consider the distortion between single random

variables Xi and Yi. From an information-theoretic point ofview, to eliminate all the hidden information in Xi from Yivia brute force is to make the mutual information betweenXi and Yi IðXi;YiÞ be 0, i.e.,

IðXi;YiÞ ¼ HðXiÞ �HðXijYiÞ ¼ 0 ð19Þ

or

HðXiÞ ¼ HðXijYiÞ ¼ HðXi; YiÞ �HðYiÞ: ð20Þ

Then, we have

HðXi; YiÞ ¼ HðXiÞ þHðYiÞ: ð21Þ

Therefore, Xi and Yi are independent from each other. Thismeans that the adversary needs to distort Xi into anotherindependent random variable Yi in order to completelyremove any hidden information from random variable Xi

via brute force. The following theorem1 determines the MSEto achieve this:

442 IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING, VOL. 8, NO. 3, MAY/JUNE 2011

1. We omit the proof due to space limitation.

Page 10: 434 IEEE TRANSACTIONS ON DEPENDABLE AND SECURE … · 2014-01-31 · IP traceback [23], [26], [9], [14] have been developed to address this countermeasure. ... While passive approaches

Theorem 3. The mean square error between two independentrandom variables X and Y is

MSEðX;Y Þ ¼ V arðXÞ þ V arðY Þ þ ððEðXÞ � EðY ÞÞ2:

Both VarðY Þ and ðEðXÞ � EðY ÞÞ2 are nonnegative, andthey will be 0 only when Y equals the constant value EðXÞ.Corollary 1. The minimum mean square error needed to convert

one random variable X into another independent randomvariable Y is V arðXÞ, and it only occurs when Y equals toconstant value EðXÞ.Therefore, the minimum distortion required to completely

eliminate hidden information from any particular randomvariableX via brute force is Var(X), and the optimal strategyto do so is to convert X into constant value EðXÞ.

Now, we consider the distortion between two n-dimen-sional vectors XN and Y N . The overall distortion betweenXN and Y N (DðXN; Y NÞ) will reach its minimum when eachMSEðXi; YiÞ reaches its minimum. Therefore, the minimumoverall distortion DðXN; Y NÞ required to completely elim-inate hidden information from XN is 1

N

PNi¼1 V arðXiÞ, and

the optimal strategy to achieve this is to convert XN ¼<X1; . . . ; XN> into Y N ¼ <EðX1Þ; . . . ; EðXNÞ>. Given anyfixed EðX1Þ; . . . ; EðXNÞ; Y N is fixed regardless of the exactvalues of XN . Therefore, the mutual information betweenXN and Y N is zero in this case. This result holds trueregardless of the distribution of each Xi in XN .

This result is consistent with Moulin’s work [16], [15]regarding the achievable capacity of an information hidingscheme in the presence of distortion by an adversary.Moulin showed that for a normally distributed host signal,the information hiding capacity is 0 when the distortion bythe adversary is equal to or greater than the variance of thehost signal. Here, we have shown that the adversary couldremove any hidden information from the host signal of anydistribution via distortion equal to or greater than thevariance of the host signal, and we have described anoptimal strategy to eliminate any hidden information viabrute force distortion.

6.2 Constraints of Real-Time Communication andTheir Implications

This section considers the additional constraints imposedby real-time communication and their implications for theadversary’s capability to completely remove any hiddeninformation from a real-time packet flow.

For a real-time packet flow P1; . . . ; Pn with time stampst1; . . . ; tn, respectively, let the host signal in which informa-tion will be embedded be SN ¼ <t1; . . . ; tn>. The require-ments of real-time communication impose the followingconstraints on any distortions over the packet timing:

1. Each packet can only be delayed.2. The delay to any packet is bounded (finite),

otherwise the real-time communication is broken.3. The delay to packet Pkðk < nÞ has to be determined

and performed before all n packets are received,otherwise the delay to Pk is unbounded whenn!1.

Let �i be the delay added to packet Pi, and t0i be thedistorted time stamp of packet Pi, then t0i ¼ ti þ �i. The

original and distorted interpacket delays between Piþ1 and

Pi are Ii ¼ tiþ1 � ti and I 0i ¼ t0iþ1 � t0i, respectively. Therefore,

�k ¼ t0k � tk ¼ �1 þXk�1

i¼1

ðI 0i � IiÞ: ð22Þ

The original and the perturbed interpacket timing

characteristics of packet flow P1; . . . ; Pn can be represented

by <t1; I1; . . . ; In�1> and <t01; I01; . . . ; I 0n�1>, respectively. In

particular, <I 01; . . . ; I 0n�1> represents the distortion pattern

over the original interpacket timing characteristics.According to results from Section 6.1, in order to

completely remove any hidden information from the

original interpacket timing characteristics, the adversary

needs to disturb <t1; I1; . . . ; In�1> into an independent one.

This means that <I 01; . . . ; I 0n�1> needs to be independent

from <I1; . . . ; In�1>. Therefore, the distortion pattern

<I 01; . . . ; I 0n�1> can be thought to be predetermined before

the original interpacket timing characteristics <t1; I1; . . . ;

In�1> is ever known.Let Imin and Imax denote the minimum and the

maximum of all Ii, respectively, and let D > 0 represent

the arbitrarily large maximum delay that the adversary

could add to any packet. To satisfy the real-time constraints,

the adversary could buffer at most DImin

packets before the

delay �1 is determined and applied to packet P1.Assume the adversary buffers b (0 < b < n) packets:

P1; . . . ; Pb before decides �1. Since the adversary knows

<t1; I1; . . . ; Ib�1> and all I 0k, he can find appropriate value

of �1 to make sure that �1; . . . ; �b are nonnegative according

to (22).Now, we show that when Imin < Imax, it is generally

infeasible for the adversary to bound �n within range [0; D]

without prior knowledge of the exact values of Ib; . . . ; In�1.

From (22), it is easy to get

�n ¼ �b þXn�1

i¼bðI 0i � IiÞ: ð23Þ

Therefore, the real-time constraint xn 2 ½0; D� is equiva-

lent to

� �bn� 1

n

Xn�1

i¼bI 0i �

1

n

Xn�1

i¼bIi �

D� �bn

: ð24Þ

Since both �b and D are fixed and finite, we have

limn!1

1

n

Xn�1

i¼bI 0i ¼ lim

n!1

1

n

Xn�1

i¼bIi: ð25Þ

This means that the average interarrival time (or

equivalently the average packet rate) of the perturbed

packet flow must be arbitrarily close to that of the original

packet flow given sufficiently large n. In order to

completely remove any hidden information from the

interpacket timing domain of a packet flow, <I 0b; . . . ; I 0n�1>

must be independent from <Ib; . . . ; In�1>. This means

<I 0b; . . . ; I 0n�1> must be determined before <Ib; . . . ; In�1>

is ever known. However, when Imin < Imax and n is large, it

is infeasible for the adversary to determine <I 0b; . . . ; I 0n�1>

WANG AND REEVES: ROBUST CORRELATION OF ENCRYPTED ATTACK TRAFFIC THROUGH STEPPING STONES BY FLOW... 443

Page 11: 434 IEEE TRANSACTIONS ON DEPENDABLE AND SECURE … · 2014-01-31 · IP traceback [23], [26], [9], [14] have been developed to address this countermeasure. ... While passive approaches

whose average interarrival time is arbitrarily close to anunknown value 1

n

Pn�1i¼b Ii 2 ½Imin; Imax�.

In other words, when Imax > Imin, the adversary is not ableto meet all the real-time constraints when he tries tocompletely remove all the hidden information from asufficiently long real-time packet flow. Therefore, it isgenerally infeasible for the adversary to completely eliminateall the hidden information in real time from a sufficientlylong packet flow even with arbitrarily large delays.

7 IMPLEMENTATION AND EXPERIMENTS

We have implemented both online and offline versions ofour watermarking embedding and decoding scheme. Theonline version runs as a Linux kernel module in Linuxkernel 2.4.22, and it embeds the specified watermark tospecified IP flow at real time. The online version usesLinux netfilter and iptable to communicate the water-marking parameters from user space to kernel space, and ithas 10 ms precision in adjusting the timing of selectedpackets. The offline version is a user-space application thatreads and manipulates the flow trace files rather than real-time flows. The watermark-embedding part reads a flowtrace file of pcap format, and outputs a new flow trace filein which the watermark is embedded in the packet timing.The watermark-detecting part reads the watermarked-flowtrace file and reports if the specified watermark is detected.Our experience has shown that the online version andoffline version of our implementations consistently givealmost identical results on correlation true positive andfalse positive rates.

In this section, we empirically validate our activewatermark-based-correlation scheme in the presence ofrandom timing perturbations. In particular, we seek toanswer the following questions:

1. How vulnerable are existing (passive) timing-basedcorrelation schemes to random timing perturbations?

2. How robust is the active watermark-based correla-tion against random timing perturbation?

3. How effective is watermark-based correlation incorrelating the encrypted flows in the presence ofboth iid and non-iid random timing perturbations?

4. How accurate are our quantitative tradeoff modelsof watermark-bit robustness, watermark-detectionrate, and watermark-collision rate in predicting theactual values?

We have used three flow sets, labeled FS1, FS1-Int, andFS2, in our experiments. FS1 is derived from over 49 millionpacket headers of the Bell Labs-1 traces of NLANR [17]. Itcontains 121 SSH flows that have at least 600 packets and thatare at least 300 seconds long. FS2 contains 1,000 synthetictelnet flows generated from an empirically derived distribu-tion [6] of telnet packet interarrival times, using the TCPLIB[5] tool.

We also wish to examine the performance of our methodfor purely interactive traffic. Because SSH flows may containnoninteractive traffic, such as those of bulk data transfer andX-windows management, it is desirable to remove suchnoninteractive traffic from the SSH flows to get more accurateevaluation of watermark-based correlation of interactive

flows. We examine the adjacent IPD of those SSH flows inFS1, and filter out those flows whose adjacent IPDs are tooshort to be generated by humans. Since it is very unlikely for aperson to type more than 14 keystrokes per second, we chose70 ms as the threshold to tell whether the adjacent IPD isgenerated by human typing or not. We have found 33 out of121 flows in FS1 satisfy the following conditions:

1. Flow has >40% adjacent IPDs shorter than 70 ms.2. Flow has >10% 10-consecutive adjacent IPDs all of

which are shorter than 70 ms.

By removing 33 noninteractive flows from FS1, weobtained a new flow set FS1-Int following the abovedefinition.

We considered three types of timing perturbations in our

experiments. The first is the uniformly distributed random

perturbation, in which the attacker at a stepping stone adds

to each packet a random delay evenly distributed between 0

and the maximum delay (chosen by the attacker). The

second is a self-similar perturbation, in which the attacker at a

stepping stone adds to each packet a self-similar random

delay. The third is the batch-releasing perturbation, in which

the attacker at a stepping stone periodically buffers and

holds all packets received within a certain time window,

and then forwards all the buffered packets at line speed

once the time window has expired.

The first type of perturbation is iid, while the second and

third types of perturbation are non-iid. In fact, the batch-

releasing perturbation is neither independent nor identi-

cally distributed, since the impact over any packet is closely

correlated to the time of that packet. In addition, batch-

releasing drastically changes the original timing character-

istics of any flow to a pattern of periodic bursts (as shown in

Fig. 6), which represents a challenging case for any timing-

based correlation.

7.1 Correlation True Positive Experiments

This set of experiments aim to compare and evaluate the

correlation effectiveness of our proposed active watermark-

based correlation and previous passive timing-based corre-

lation under various timing perturbations. We used two

methods of correlation. First, we used an existing, passive

444 IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING, VOL. 8, NO. 3, MAY/JUNE 2011

Fig. 6. Effect of 10 seconds batch-releasing timing perturbation.

Page 12: 434 IEEE TRANSACTIONS ON DEPENDABLE AND SECURE … · 2014-01-31 · IP traceback [23], [26], [9], [14] have been developed to address this countermeasure. ... While passive approaches

timing-based correlation method called IPD-based correla-

tion [32] to correlate each flow in FS1 with the same flow, after

it is perturbed by various levels of uniformly distributed

random timing perturbations. If the flow and the perturbed

flow are reported correlated, it is considered as a true positive

(“IPDCorr TP”) of the correlation in the presence of timing

perturbation. Second, we embedded a random 24-bit water-

mark into each flow of FS1 and FS2, with redundancy

number m ¼ 12, and quantization step size s ¼ 400 ms for

each watermark bit. The embedding of the 24-bit watermark

requires 300 packets to be selected, of which 288 were

delayed. Fig. 7 shows the effect of the watermark embedding

over the interpacket timing, and illustrates that the water-

mark embedding is far from obvious. Third, we randomly

perturbed the packet timing of the watermarked flows of FS1

and FS2 with different types of timing perturbations. It is

considered a true positive (“IPDWMCorr TP”) of watermark-

based correlation if the embedded watermark can be

detected from the timing-perturbed watermarked flows,

with a Hamming distance threshold h ¼ 5. Finally, we

calculated the expected watermark-detection rate from (15)

and (16) under various maximum delays of uniformly

distributed random timing perturbations.

7.1.1 Correlation under Uniformly Distributed Random

Timing Perturbation

Fig. 8 shows the measured (as well as expected) truepositive rates of IPD-based correlation and watermark-based correlation on FS1 and FS2 under various levels ofuniformly distributed random timing perturbations. Theresults clearly indicate that IPD-based correlation isvulnerable to even moderate random timing perturbations.Without timing perturbation, IPD-based correlation is ableto successfully correlate 93 percent of the SSH flows of FS1.However, with a maximum 100 ms random timingperturbation, the true positive rate of IPD-based correlationdrops to 45.5 percent, and with a 200 ms maximum delay,the rate drops to 21.5 percent.

In contrast, the proposed watermark-based correlation ofthe flows in FS1, FS1-Int, and FS2 is able to achievevirtually a 100 percent true positive rate, with up to amaximum 600 ms random timing perturbation. With a

maximum 1,000 ms timing perturbation, the true positiverates of watermark-based correlation for FS1, FS1-Int, andFS2 are 84.2 percent, 89.85 percent, and 97.32 percent,respectively. It can be seen that the measured watermark-based correlation true positive rates are well approximatedby the estimated values, based on the watermark detectionrate model (15) and (16). In particular, the true positive ratemeasurements of FS2 are very close to the predicted valuesat all perturbation levels.

7.1.2 Correlation under Self-Similar Distributed Random

Timing Perturbation

To see how our watermark-based correlation works whenthe random timing perturbation is non-iid, we first investi-gated correlation under self-similar timing perturbations.We used Kramers implementation [13] of Taqqu et al.’ [29]self-similar synthetic traffic-generating method to generatethe self-similar timing perturbation with 128 sources of ON/OFF periods aggregated with 30 percent cumulative load.And, we have bounded the timing perturbation throughmodulo operation. In particular, we have used 128 aggre-gating sources of ON/OFF periods to generate self-similardelays in units of milliseconds.

WANG AND REEVES: ROBUST CORRELATION OF ENCRYPTED ATTACK TRAFFIC THROUGH STEPPING STONES BY FLOW... 445

Fig. 7. Comparison of 288 selected IPDs before and after watermark embedding.

Fig. 8. Correlation true positive rates under uniformly distributed randomtiming perturbations.

Page 13: 434 IEEE TRANSACTIONS ON DEPENDABLE AND SECURE … · 2014-01-31 · IP traceback [23], [26], [9], [14] have been developed to address this countermeasure. ... While passive approaches

Fig. 9 shows the measured watermark-correlation truepositive rates under various bounded self-similar timingperturbation, and expected watermark correlation truepositive rates of uniformly distributed random delays, withh ¼ 5; l ¼ 24; m ¼ 12; and s ¼ 400 ms. It shows that thebounded self-similar perturbation yields much higherwatermark-correlation true positive rates than the expectedtrue positive rates of uniformly distributed random delayperturbation with the same delay upper bound (1;200 2;400 ms). This indicates that the bounded self-similartiming perturbation has less negative impact than theuniformly distributed random timing perturbation of sameupper bound.

We calculated the variance of 10,000 self-similar delaysthat are bounded by 1,000 ms, and obtained �2

selfsim ¼ 46;786.However, the variance of a uniformly distributed randomvariable of range [0, 1,000] �2

uniform is 83,333. According to(15), the smaller the variance �2 is, the higher is thewatermark bit robustness (equivalently, the higher thewatermark true positive rate should be). This explains whythe self-similar type of random perturbation has less negativeimpact than the uniformly distributed random delays ofsame upper bound.

7.1.3 Correlation under Batch-Releasing Random

Timing Perturbation

The second type of non-iid random timing perturbation, weinvestigated, is the batch-release timing perturbation. In thismodel, incoming packets are buffered until expiration of thenext timer period, at which point all buffered packets areoutput at line speed, in a burst. With the batch-releaseperturbation, the actual delay of any packet depends onwhere it falls within the timer interval, or window.Assuming that the packet arrival times are uniformlydistributed within the batch-release window, we are ableto calculate the variance of the delays over all packets, giventhe window size or duration.

Fig. 10 shows the measured and estimated correlationtrue positive rates of our watermark-based correlation onflow set FS2, under various levels of batch-release timingperturbations. We used 24-bit watermarks with quantization

step size s ¼ 400 ms, redundancy number m ¼ 12, and

Hamming distance threshold h ¼ 5. The expected true

positive rates are calculated by (15) and (16). The measured

values are close to the expected values, which demonstrates

that our analytical model is applicable to non-iid timing

perturbations.

7.2 Correlation False Positive Experiments

As mentioned above, there is a nonzero probability that an

unwatermarked flow happens to exhibit the randomly

chosen watermark. This case is considered a correlation

collision, or false positive. According to our correlation

collision model (17), the collision rate is determined by the

number of watermark bits l and the Hamming distance

threshold h.We experimentally investigated the following false

positive rates for varying values of the Hamming distance

threshold h and the number of watermark bits l:

1. Collision rates between a given flow and 10,0001,000,000 randomly generated 24-bit watermarkswith varying Hamming distance threshold h.

2. Collision rates between a given 24-bit watermarkand 10,000 1,000,000 randomly generated (usingTCPLIB) synthetic telnet flows with varying Ham-ming distance threshold h.

3. Collision rates between a given flow and 100,000randomly generated watermarks of various lengthswith Hamming distance threshold h ¼ 5.

4. Collision rates between a given watermark ofvarious lengths and 10,000 1,000,000 randomlygenerated (using tcplib) synthetic telnet flows withHamming distance threshold h ¼ 5.

Fig. 11 shows the measured and estimated correlationfalse positive rates. The left subfigure shows the falsepositive rates for varying Hamming distance thresholdsand fixed length (24-bit) watermarks, and the right subfigureshows the false positive rates for varying watermark lengthsand a fixed Hamming distance threshold h ¼ 5. Themeasured values are the average of 100 separate experi-ments, and they are very close to the estimated values

446 IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING, VOL. 8, NO. 3, MAY/JUNE 2011

Fig. 9. Correlation true positive rates under self-similar random timingperturbations.

Fig. 10. Correlation true positive rates under batch-releasing randomtiming perturbations.

Page 14: 434 IEEE TRANSACTIONS ON DEPENDABLE AND SECURE … · 2014-01-31 · IP traceback [23], [26], [9], [14] have been developed to address this countermeasure. ... While passive approaches

calculated from (17). Thus, the experimental results validate

our model of the collision probability.

7.3 Watermark Detection Tradeoff Experiments

Equation (15) gives us the quantitative tradeoff between the

expected watermark bit robustness, the redundancy num-

ber m, and the defining characteristics of the random timing

perturbation �2. With a given watermark bit robustness p,

(16) gives us the expected watermark detection rate.To verify the validity and accuracy of our tradeoff

models of watermark bit robustness and watermark

detection rate, we did the following experiments:

1. We embedded a random 24-bit watermark into eachflow in FS1, FS1-Int, and FS2, with quantization steps ¼ 400 ms, and varying redundancy numbersm ¼ 7; 8; 9; 10; 11; 12. After we perturbed the water-marked flows with 1,000 ms maximum randomdelays, we measured the watermark detection rate ofthe perturbed, watermarked flows with Hammingdistance threshold h ¼ 5.

2. We also embedded a random 24-bit watermarkinto each flow in FS1, FS1-Int, and FS2, withquantization step s ¼ 400 ms, redundancy numberm ¼ 12. After perturbing, the watermarked flows

with 1,000 ms maximum random delays, wemeasured the watermark-detection rate of theperturbed, watermarked flows for varying Ham-ming distance thresholds of h ¼ 2; 3; 4; 5; 6; 7; 8.

3. In a separate experiment, we embedded a randomwatermark of varying lengths l ¼ 18, 19, 20, 21, 22,23, 24 into each flow in FS1, FS1-Int, and FS2, withquantization step s ¼ 400 ms and redundancynumber m ¼ 12. After perturbing, the watermarkedflows with 1,000 ms maximum uniform randomdelays, we measured the watermark-detection rateof the perturbed, watermarked flows with differentHamming distance threshold h ¼ 3.

Fig. 12 shows the average of 100 experiments for the

measured watermark detection rates of FS1 and FS1-Int,

and the average of 10 experiments for the measured

watermark-detection rates of FS2, as well as the expected

detection rate derived from (15) and (16). In all cases, the

measured detection rates of FS2 are almost identical to the

expected values, and detection rates of FS1 are similar to

but lower than the expected values. The detection rates of

FS1-Int are always between that of FS1 and FS2. These

results validate our quantitative tradeoff models of water-

mark bit robustness and watermark-detection rate.

WANG AND REEVES: ROBUST CORRELATION OF ENCRYPTED ATTACK TRAFFIC THROUGH STEPPING STONES BY FLOW... 447

Fig. 11. Correlation false positive (collision) rates versus hamming distance threshold h and number of watermark bits l.

Fig. 12. Watermark detection rate tradeoff with redundancy number m, Hamming distance threshold h and number of watermark bits l.

Page 15: 434 IEEE TRANSACTIONS ON DEPENDABLE AND SECURE … · 2014-01-31 · IP traceback [23], [26], [9], [14] have been developed to address this countermeasure. ... While passive approaches

7.4 Comparison with Other Representative PassiveApproach

We have also compared the effectiveness and the numbersof packets needed by our active watermark-correlationapproach, and a representative passive correlation ap-proach [1], under identical levels of timing perturbationon the same sets of traces.

For Poisson arrivals, paper [1] claimed that its detectionalgorithm DETECT-ATTACKS(�, p�) is guaranteed toachieve a 100 percent detection rate and no more than � falsepositive rate given sufficient number of packets. However, itdid not include any experimental results that demonstratedthe claimed effectiveness. To empirically compare theeffectiveness of our active approach and that of the passivecorrelation method of [1], we implemented its detectionalgorithm DETECT-ATTACKS(�, p�). After identifying themaximum number of packets p� in any time interval� ¼ 800 ms from each of the 1,000 flows in FS2, we applieddetection algorithm DETECT-ATTACKS(�, p�), with� ¼ 0:3%, to correlate flows in FS2 and the correspondingperturbed flows with maximum 800 ms uniformly distrib-uted perturbation. Surprisingly, the detection algorithmDETECT-ATTACKS(�, p�) in this test only achieved a79.5 percent detection rate, while experiencing no falsepositives.

As shown in Section 7.1, under a maximum 800 msuniformly distributed timing perturbation, our watermark-based-correlation method achieved at least a 99.9 percenttrue positive rate and about a 0.3 percent false positive rate,using parameter values of h ¼ 5, l ¼ 24, s ¼ 400 ms, andm ¼ 12 on flow set FS2.

We have further compared the upper bounds of thenumber of packets needed by our active correlationapproach, and by the method of [1]. As an example,consider the requirements to achieve a guaranteed correla-tion effectiveness of at least a 99.9 percent true positive rateand a 0.35 percent false positive rate, for a maximum 600 msuniformly distributed timing perturbation. Given theparameter values m ¼ 36, s ¼ 400 ms, and D ¼ 600 ms,inequality (11) guarantees that the upper bound of the biterror probability of our active approach will be less than0.0834 (i.e., p > 0:9166). Choosing l ¼ 32, h ¼ 8, andp ¼ 0:9166, the expected watermark-detection rate from(16) is >99:9 percent, and the expected watermark-collisionrate from (17) is 0.35 percent. Therefore, in order to achievea 99.9 percent TPR and a 0.35 percent FPR with a maximum600ms timing perturbation, our active approach needs toadjust the timing of no more than 32� 36 ¼ 1;152 packets.For the approach of [1], letting � ¼ 600 ms, � ¼ 0:35%, andusing the value of p� measured from the flows in FS1,detection algorithm DETECT-ATTACKS(�, p�) requires atmost 16,698 packets to achieve a 100 percent TPR and a0.35 percent FPR. For this target, the upper bound of thenumber of packets needed by the passive approach of [1] toachieve a comparable correlation effectiveness is at least anorder of magnitude more than that of the active approach.

8 CONCLUSION AND FUTURE WORKS

Tracing attackers’ traffic through stepping stones is achallenging problem, especially when the attack traffic is

encrypted, and its timing is manipulated (perturbed) tointerfere with traffic analysis. The random timing perturba-tion by the adversary can greatly reduce the effectiveness ofpassive, timing-based correlation techniques.

We presented a novel active timing-based-correlationapproach to deal with random timing perturbations. Byembedding a unique watermark into the interpacket timing,with sufficient redundancy, we can make the correlation ofencrypted flows substantially more robust against randomtiming perturbations. Our analysis and our experimentalresults confirm these assertions.

Our watermark-based correlation is provably effectiveagainst correlated random timing perturbation as long asthe covariance of the timing perturbations on differentpackets is fixed. Specifically, the proposed watermark-basedcorrelation can, with arbitrarily small average time adjustment,achieve arbitrarily close to 100 percent watermark-detection(correlation true positive) rate and arbitrarily close to 0 percentcollision (correlation false positive) probability at the same timeagainst arbitrarily large (but bounded) random timing perturba-tion of arbitrary distribution (or process), as long as there areenough packets in the flow to be watermarked.

Compared with previous passive-correlation approaches,our active watermark-based correlation has several advan-tages:

. Our active watermark-based correlation makes noassumptions about the original distribution of theinterpacket timing of the original packet flow, and itdoes not require the adversary’s timing perturbationto follow any specific distribution or random processto be effective. This is in contrast to existing passivetiming-based correlation methods, all of whichassumed the interpacket timing of the original packetflow to follow some specific distribution or randomprocess (e.g., Poisson) when deriving their bounds.

. Our active watermark-based correlation was shownto require substantially fewer packets than arepresentative passive timing-based correlationmethod to achieve a given level of robustness.

. The effectiveness of our active watermark-basedcorrelation can be modeled more accurately. Besidesidentifying the provable upper bounds on thenumber of packets needed to achieve desiredcorrelation effectiveness under any given level ofperturbation, we have also identified the quantita-tive tradeoff models between the number of packetsneeded to achieve any desired correlation effective-ness under any given level of perturbation. Ourexperimental results validate the accuracy of thesetradeoff models. Thus, our tradeoff models are ofpractical value in optimizing the overall effective-ness of watermark-based correlation in real-worldsituations.

We have experimentally investigated the watermark-based correlation under both iid and non-iid timing perturba-tions, and the experimental results confirmed our analyticalconclusion that our watermark-based correlation is effectivefor both iid and non-iid random timing perturbations.

While our flow watermarking approach has been shownto be promising in correlating encrypted traffic in the

448 IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING, VOL. 8, NO. 3, MAY/JUNE 2011

Page 16: 434 IEEE TRANSACTIONS ON DEPENDABLE AND SECURE … · 2014-01-31 · IP traceback [23], [26], [9], [14] have been developed to address this countermeasure. ... While passive approaches

presence of timing perturbation, it is not optimal from

coding theoretic perspective. One interesting area of future

work is to investigate how to make the flow watermarking

more robust with fewer packets.

ACKNOWLEDGMENTS

This work was partially supported by the US National

Science Foundation (NSF) Grants CNS-0524286 and CCF-

0728771. Some preliminary results of this work have been

presented at the 10th ACM Conference on Computer and

Communication Security (CCS ’03), October, 2003 [31].

REFERENCES

[1] A. Blum, D. Song, and S. Venkataraman, “Detection ofInteractive Stepping Stones: Algorithms and ConfidenceBounds,” Proc. Seventh Int’l Symp. Recent Advances in IntrusionDetection (RAID ’04), Oct. 2004.

[2] R.C. Chakinala, A. Kumarasubramanian, R. Manokaran, G.Noubir, C. Pandu Rangan, and R. Sundaram, “SteganographicCommunication in Ordered Channels,” Proc. Eighth InformationHiding Int’l Conf. (IH ’06), 2006.

[3] T.M. Cover and J.A. Thomas, Elements of Information Theory.John Wiley & Sons, Inc., 1991.

[4] I. Cox, M. Miller, and J. Bloom, Digital Watermarking.Morgan-Kaufmann Publishers, 2002.

[5] P. Danzig and S. Jamin, “Tcplib: A Library of TCP InternetworkTraffic Characteristics,” Technical Report USC-CS-91-495, Univ. ofSouthern California, 1991.

[6] P. Danzig, S. Jamin, R. Cacerest, D. Mitzel, and E. Estrin, “AnEmpirical Workload Model for Driving Wide-Area TCP/IPNetwork Simulations,” J. Internetworking, vol. 3, no. 1, pp. 1-26,Mar. 1992.

[7] M. DeGroot, Probability and Statistics. Addison-Wesley PublishingCompany, 1989.

[8] D. Donoho et al, “Multiscale Stepping Stone Detection: DetectingPairs of Jittered Interactive Streams by Exploiting MaximumTolerable Delay,” Proc. Fifth Int’l Symp. Recent Advances in IntrusionDetection (RAID ’02), pp. 17-35, Oct. 2002.

[9] M.T. Goodrich, “Efficient Packet Marking for Large-Scale IPTraceback,” Proc. Ninth ACM Conf. Computer and Comm. Security(CCS ’02), pp. 117-126, Oct. 2002.

[10] T. He and L. Tong, “Detecting Encrypted Stepping-StoneConnections” IEEE Trans. Signal Processing, vol. 55, no. 5,pp. 1612-1623, May 2006.

[11] H. Jung et al., “Caller Identification System in the InternetEnvironment,” Proc. Fourth USENIX Security Symp., 1993.

[12] S. Kent and R. Atkinson RFC 2401: Security Architecture for theInternet Protocol, IETF, Sept. 1998.

[13] G. Kramer, “Generator of Self-Similar Network Traffic,” http://wwwcsif.cs.ucdavis.edu/kramer/code/trf_gen2.html, 2005.

[14] J. Li, M. Sung, J. Xu, and L. Li, “Large Scale IP Traceback in High-Speed Internet: Practical Techniques and Theoretical Foundation,”Proc. IEEE Symp. Security and Privacy, 2004.

[15] P. Moulin, “Information-Hiding Games,” Proc. Int’l WorkshopDigital Watermarking (IWDW ’03), May 2003.

[16] P. Moulin and J.A. O’sullivan, “Information-Theoretic Analysis ofInformation Hiding,” IEEE Trans. Information Theory, vol. 49, no. 3,pp. 563-593, Mar. 2003.

[17] NLANR Trace Archive, http://pma.nlanr.net/Traces/long/,2005.

[18] OpenSSH. URL. http://www.openssh.com, 2010.[19] P. Peng, P. Ning, and D.S. Reeves, “On the Secrecy of Timing-

Based Active Watermarking Trace-Back Techniques,” Proc. IEEESymp. Security and Privacy (SP ’06), May 2006.

[20] P. Peng, P. Ning, D. Reeves, and X. Wang, “Active Timing-BasedCorrelation of Perturbed Traffic Flows with Chaff Packets,” Proc.Second Int’l Workshop Security in Distributed Computing Systems(SDCS ’06), June 2005.

[21] Y.J. Pyun, Y.H. Park, X. Wang, D.S. Reeves, and P. Ning, “TracingTraffic through Intermediate Hosts that Repacketize Flows,” Proc.IEEE INFOCOM ’07, May 2007.

[22] Y.J. Pyun and D.S. Reeves, “Deployment of Network Monitors forAttack Attribution,” Proc. Fourth Int’l Conf. Broadband Comm.,Networks, and Systems (Broadnets ’07), pp. 525-534, Sept 2007.

[23] S. Savage, D. Wetherall, A. Karlin, and T. Anderson, “PracticalNetwork Support for IP Traceback,” Proc. ACM SIGCOMM ’00,pp. 295-306, Sept. 2000.

[24] C.E. Shannon, “A Mathematical Theory of Communication” BellSystem Technical J., vol. 27, pp. 379-423, 623-656, July/Oct. 1948.

[25] S. Snapp et al., “DIDS (Distributed Intrusion Detection System)—-Motivation, Architecture, and Early Prototype,” Proc. 14th Nat’lComputer Security Conf., pp. 167-176, 1991.

[26] A. Snoeren and C. Patridge et al., “Hash-Based IP Traceback,”Proc. ACM SIGCOMM ’01, pp. 3-14, Sept. 2001.

[27] S. Staniford-Chen and L. Heberlein, “Holding Intruders Accoun-table on the Internet,” Proc. IEEE Symp. Security and Privacy,pp. 39-49, 1995.

[28] C. Stoll, The Cuckoo’s Egg: Tracking a Spy through the Maze ofComputer Espionage. Pocket Books, 2000.

[29] M.S. Taqqu, W. Willinger, and R. Sherman, “Proof of aFundamental Result in Self-Similar Traffic Modeling,” ACMComputer Comm. Rev., vol. 27, pp. 5-23, 1997.

[30] X. Wang, S. Chen, and S. Jajodia, “Network Flow WatermarkingAttack on Low-Latency Anonymous Communication Systems,”Proc. IEEE Symp. Security and Privacy (SP ’07), May 2007.

[31] X. Wang and D. Reeves, “Robust Correlation of EncryptedAttack Traffic through Stepping Stones by Manipulation ofInterpacket Delays,” Proc. 10th ACM Conf. Computer and Comm.Security (CCS ’03), pp. 20-29, Oct. 2003.

[32] X. Wang, D. Reeves, and S.F. Wu, “Inter-Packet Delay BasedCorrelation for Tracing Encrypted Connections through SteppingStones,” Proc. Seventh European Symp. Research in Computer Security(ESORICS ’02), pp. 244-263, Oct. 2002.

[33] X. Wang, D. Reeves, S.F. Wu, and J. Yuill, “Sleepy WatermarkTracing: An Active Network-Based Intrusion Response Frame-work,” Proc. 16th Int’l Conf. Information Security (IFIP/Sec ’01),pp. 369-384, June 2001.

[34] T. Ylonen and C. Lonvick, IETF Internet Draft: SSH ProtocolArchitecture, IETF, draft-ietf-secsh-architecture-16.txt, Work inProgress, June 2004.

[35] K. Yoda and H. Etoh, “Finding a Connection Chain for TracingIntruders,” Proc. Sixth European Symp. Research in ComputerSecurity (ESORICS ’00), pp. 191-205, Oct. 2002.

[36] Y. Zhang and V. Paxson, “Detecting Stepping Stones,” Proc. NinthUSENIX Security Symp., pp. 171-184, 2000.

[37] L. Zhang, A.G. Persaud, A. Johnson, and Y. Guan, “Detection ofStepping Stone Attack under Delay and Chaff Perturbations,”Proc. 25th IEEE Int’l Performance Computing and Comm. Conf.(IPCCC ’06), Apr. 2006.

Xinyuan Wang received the PhD degree incomputer science from North Carolina StateUniversity in 2004 after years of professionalexperience in the networking industry. He iscurrently an associate professor of computerscience at George Mason University. His mainresearch interests are around computer net-works and systems security including malwareanalysis and defense, attack attribution, anon-ymity and privacy, VoIP security, and digital

forensics. He is a recipient of the US National Science Foundation(NSF) Faculty Early Career Development (CAREER) Award. He is amember of the IEEE.

Douglas S. Reeves received the PhD degree incomputer science from Pennsylvania StateUniversity in 1987. In the same year, he joinedthe faculty of North Carolina State University,where he is currently a professor of computerscience. His research interests include computersystems, security, and peer-to-peer computing.He is a member of the IEEE.

. For more information on this or any other computing topic,please visit our Digital Library at www.computer.org/publications/dlib.

WANG AND REEVES: ROBUST CORRELATION OF ENCRYPTED ATTACK TRAFFIC THROUGH STEPPING STONES BY FLOW... 449