Robust Correlation of Encrypted Attack Traffic through Stepping Stones by Flow Watermarking
Xinyuan Wang, Member, IEEE, and Douglas S. Reeves, Member, IEEE
Abstract—Network-based intruders seldom attack their victims directly from their own computer. Often, they stage their attacks
through intermediate “stepping stones” in order to conceal their identity and origin. To identify the source of the attack behind the
stepping stone(s), it is necessary to correlate the incoming and outgoing flows or connections of a stepping stone. To resist attempts at
correlation, the attacker may encrypt or otherwise manipulate the connection traffic. Timing-based correlation approaches have been
shown to be quite effective in correlating encrypted connections. However, timing-based correlation approaches are subject to timing
perturbations that may be deliberately introduced by the attacker at stepping stones. In this paper, we propose a novel watermark-
based-correlation scheme that is designed specifically to be robust against timing perturbations. Unlike most previous timing-based
correlation approaches, our watermark-based approach is “active” in that it embeds a unique watermark into the encrypted flows by
slightly adjusting the timing of selected packets. The unique watermark that is embedded in the encrypted flow gives us a number of
advantages over passive timing-based correlation in resisting timing perturbations by the attacker. In contrast to the existing passive
correlation approaches, our active watermark-based correlation does not make any limiting assumptions about the distribution or
random process of the original interpacket timing of the packet flow. In theory, our watermark-based correlation can achieve arbitrarily
close to 100 percent correlation true positive rate (TPR), and arbitrarily close to 0 percent false positive rate (FPR) at the same time for
sufficiently long flows, despite arbitrarily large (but bounded) timing perturbations of any distribution by the attacker. Our paper is the
first that identifies 1) accurate quantitative tradeoffs between the achievable correlation effectiveness and the defining characteristics
of the timing perturbation; and 2) a provable upper bound on the number of packets needed to achieve a desired correlation
effectiveness, given the amount of timing perturbation. Experimental results show that our active watermark-based correlation
performs better and requires fewer packets than existing, passive timing-based correlation methods in the presence of random timing
perturbations.
Index Terms—Network-level security and protection, intrusion tracing, correlation, stepping stone.
1 INTRODUCTION
NETWORK-BASED attacks have become a serious threat to the critical information infrastructure on which we depend. To stop or repel network-based attacks, it is critical to be able to identify the source of the attack. Attackers, however, go to some lengths to conceal their identities and origin, using a variety of countermeasures. As an example, they may spoof the IP source address of the attack traffic. Methods of tracing spoofed traffic, generally known as IP traceback [23], [26], [9], [14], have been developed to address this countermeasure.
Another common and effective countermeasure used by
network-based intruders to hide their identity is to connect
through a sequence of intermediate hosts, or stepping stones,
before attacking the final target. For example, an attacker at
host A may telnet or SSH into host B, and from there,
launch an attack on host C. In effect, the incoming packets
of an attack connection from A to B are forwarded by B, and become outgoing packets of a connection from B to C. The two connections or flows are related in such a case. The victim host C can use IP traceback to determine the second flow originated from host B, but traceback will not be able to correlate this with the attack flow originating from host A. To trace attacks through a stepping stone, it is necessary to correlate the incoming traffic with the outgoing traffic at the stepping stone. This would allow the attack to be traced back to host A in the example.
The earliest work on connection correlation was based on
tracking users’ login activities at different hosts [11], [25].
Later work relied on comparing the packet contents, or
payloads of the connections to be correlated [27], [33]. Most
recent work has focused on the timing characteristics [36],
[35], [32], [8], [1] of connections, in order to correlate
encrypted connections (i.e., traffic encrypted using IPSEC
[12] or SSH [18], [34]).
Timing-based correlation approaches, however, are sensitive to the use of countermeasures by the attacker, or adversary. In particular, the attacker can perturb the timing characteristics of a connection by selectively or randomly introducing extra delays when forwarding packets at the stepping stones. This kind of timing perturbation will adversely affect the effectiveness of any timing-based correlation. Timing perturbation can either make unrelated flows have similar timing characteristics, or make related flows exhibit different timing characteristics. This will increase the correlation false positive rate, or decrease the correlation true positive rate, respectively.
434 IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING, VOL. 8, NO. 3, MAY/JUNE 2011
• X. Wang is with the Department of Computer Science, George Mason University, University Drive, MSN 4A5, Fairfax, VA 22030. E-mail: [email protected].
• D.S. Reeves is with the Department of Computer Science and the Department of Electrical and Computer Engineering, Campus Box 8206, North Carolina State University, Raleigh, NC 27695-8206. E-mail: [email protected].
Manuscript received 20 Nov. 2005; revised 9 June 2007; accepted 7 Dec. 2009; published online 6 Aug. 2010. Recommended for acceptance by V. Gligor. For information on obtaining reprints of this article, please send e-mail to: [email protected], and reference IEEECS Log Number TDSC-0156-1105. Digital Object Identifier no. 10.1109/TDSC.2010.35.
1545-5971/11/$26.00 © 2011 IEEE Published by the IEEE Computer Society
Previous timing-based correlation approaches are passive in that they simply examine (but do not manipulate) the traffic timing characteristics for correlation purposes. While passive approaches are simple and easy to implement, they may be vulnerable to active countermeasures by the attacker, and/or require a large number of packets in order to correlate timing-perturbed flows.
In this paper, we address the random timing-perturbation problem in correlating encrypted connections through stepping stones. Our goal is to develop an efficient correlation scheme that is probabilistically robust against random timing perturbation, and to answer fundamental questions concerning the effectiveness of such techniques and the tradeoffs involved in implementing them.
We propose a novel watermark-based-correlation scheme that is designed specifically to be robust against timing perturbations by the adversary. Unlike most previous correlation approaches, our watermark-based approach is active; i.e., it embeds a unique watermark into the encrypted flows by slightly adjusting the timing of selected packets. The unique watermark that is embedded in the encrypted flow gives us a number of advantages over passive timing-based correlation in overcoming timing perturbations by the adversary. First, our active watermark-based correlation does not make any limiting assumptions about the distribution or random process of the original interpacket timing of the packet flow, or the distribution of random delays an adversary can add. This is in contrast to existing passive timing-based correlation approaches. Second, our method requires substantially fewer packets in the flow to achieve the same level of correlation effectiveness as existing passive timing-based correlation. In theory, our watermark-based correlation can achieve arbitrarily close to 100 percent correlation true positive rate, and arbitrarily close to 0 percent false positive rate at the same time for sufficiently long flows, despite arbitrarily large (but bounded) timing perturbation of arbitrary distribution by the adversary. To the best of our knowledge, our paper is the first that identifies 1) the accurate quantitative tradeoffs between the achievable correlation effectiveness and the defining characteristics of the timing perturbation; and 2) a provable upper bound on the number of packets needed to achieve a desired correlation effectiveness, given a bound on the amount of timing perturbation.
We also investigate the maximum negative impact on the embedded watermark an adversary can have, and the minimum effort needed to achieve that impact. Under the condition that the watermark-embedding parameters are unknown to the adversary, we determine the minimum distortion required for the adversary to completely eliminate any embedded watermark from the interpacket timing, and the optimal strategy for doing so. We further investigate the implications of the constraints of real-time communication and bounded delay for the adversary’s ability to remove the embedded watermark. While there exist ways to completely eliminate hidden information from any signal offline, we show that (without knowledge of the watermark-embedding parameters) it is generally infeasible for the adversary to completely eliminate the embedded watermark from the packet timing in real time, even if he can introduce arbitrarily large (but bounded) distortion to the packet timing of normal network traffic. This result ensures that our watermark-based correlation is able to withstand arbitrarily large timing perturbations in real time, provided that there are enough packets in the flows to be correlated.
The remainder of this paper is organized as follows: Section 2 summarizes related work. Section 3 gives an overview of watermark-based correlation. Section 4 describes the embedding of a single watermark bit in a flow. Section 5 analyzes the watermark bit robustness, tradeoffs, and the overall watermark detection and collision rates. Section 6 analyzes the minimum distortion required to completely remove an arbitrary embedded watermark, the optimal strategy for doing so, and the implications of real-time constraints. Section 7 describes the implementation of our method in the Linux kernel, and evaluates the effectiveness of our method empirically. Section 8 concludes the paper, and points out some future research directions.
2 RELATED WORK
Existing connection correlation approaches are based on three different characteristics: 1) host activity; 2) connection content (i.e., packet payload); and 3) interpacket timing characteristics. The host-activity-based approach (e.g., DIDS [25] and CIS [11]) collects and tracks users’ login activity at each stepping stone. The major drawback of host-activity-based methods is that the host activity collected from each stepping stone is generally not trustworthy. Since the attacker is assumed to have full control over each stepping stone, he/she can easily modify, delete, or forge user-login information. This defeats the ability to correlate based on host activity.
Content-based correlation approaches (e.g., thumbprinting [27] and SWT [33]) require that the payload of packets remains invariant across (i.e., is unchanged by) stepping stones. Since the attacker can easily transform the connection content by encryption at the application layer, these approaches are suitable only for unencrypted connections.
To correlate encrypted traffic, timing-based approaches (e.g., ON/OFF-based [36], deviation-based [35] and IPD-based [32]) have been proposed. These methods passively monitor the arrival and/or departure times of packets, and use this information to correlate incoming and outgoing flows of a stepping stone. For instance, IPD-based correlation [32] has shown that 1) the important interpacket timing characteristics of connections are preserved during transit across many routers and stepping stones, and 2) the timing characteristics of interactive flows (e.g., telnet and SSH connections) are almost always unique enough to differentiate related flows from unrelated flows.
While the first generation of timing-based correlation approaches have proved to be effective in correlating encrypted connections, they are vulnerable to the attacker’s use of active timing perturbation. Donoho et al. [8] first investigated the theoretical limits on the attacker’s ability to disguise his traffic through timing perturbation and bogus (padding, or chaff) packet injection. By using multiple-timescale-analysis techniques, they show that correlation based on long-term behavior (of sufficiently long flows) is still possible despite certain timing perturbations by the attackers. However, they do not present any tradeoffs between the magnitude of the timing perturbation, the desired correlation effectiveness, and the number of packets needed. Another important issue not addressed by [8] is the correlation false positive rate. While coarse timescale analysis for long-term behavior may not be affected by packet jitter (timing perturbations) introduced by the attackers, it may also be insensitive to the unique details of each flow’s timing. Therefore, coarse-scale analysis tends to increase the correlation false positive rate, while increasing the correlation true positive rate of timing-perturbed flows. Nevertheless, the work by Donoho et al. [8] represents an important first step toward understanding the inherent limitations of timing perturbations by the adversary on timing-based correlation. Not addressed in this work were questions about correlation effectiveness for flows of arbitrary (rather than Poisson or Pareto) distribution in packet timing, and the achievable tradeoff of false and true positive rates given the magnitude of the timing perturbation and the number of packets available.
After the initial publication of our active watermarking-based correlation [31], Blum et al. [1] developed another passive, timing-based correlation method that considers both the correlation true positive rate and the false positive rate at the same time. Based on the assumption that the interpacket timing of flows can be modeled as a sequence of Poisson processes of different rates, they derived upper bounds on the number of packets needed to achieve a specified false positive rate and a 100 percent true positive rate. They also derived the lower bounds on the amount of chaff needed to defeat their passive method of correlation. However, their work did not present any experimental results, nor did it address such practical issues as how to derive model parameters in real time, or how many packets are needed in practice for real flows and realistic timing perturbations. Zhang et al. [37] and He and Tong [10] recently proposed several new passive timing-based correlation methods based on assumptions similar to those used by Blum et al. [1], and they showed that their methods perform better than that proposed by Blum et al. [1].
Chakinala et al. [2] formally analyzed packet reordering channels as an information-theoretic game. Peng et al. [19] recently studied the secrecy of the watermark-based correlation [31] and proposed an offline statistical method for detecting the existence of a watermark in a packet flow. However, their method assumes that the watermark embedding follows some simple and fixed patterns, and requires access to both the unwatermarked and watermarked flows to be effective. As we will show in the watermark-tracing model, we can make the unwatermarked flow unavailable to the adversary by watermarking the packet flow from its source.
In summary, existing timing-based-correlation approaches passively measure and use possibly perturbed-timing information for correlation. They do not attempt to make the interpacket timing characteristics more amenable to effective correlation. Passive approaches are simple to implement and undetectable by the attacker. However, they generally make more limiting assumptions about the interpacket timing characteristics, and require more packets than an active approach to effectively correlate timing-perturbed flows, as we will show.
In this paper, we describe a novel active timing-based correlation approach that
1. makes no assumption about the distribution of interpacket timing intervals,
2. does not require the timing perturbation to follow any specific distribution or random process,
3. is provably effective against certain correlated random timing perturbation, and
4. requires substantially fewer packets than passive approaches to achieve the same level of correlation effectiveness.
3 OVERVIEW OF WATERMARK-BASED CORRELATION
3.1 Overall Watermark-Tracing Model
The watermark-tracing approach exploits the observation that interactive connections (e.g., telnet, SSH) are bidirectional. The idea is to watermark the backward traffic (from victim back to the attacker) of the bidirectional attack connections by slightly adjusting the timing of selected packets. If the embedded watermark is both robust and unique, the watermarked back traffic can be effectively correlated and traced across stepping stones, from the victim all the way back to the attacker. As shown in Fig. 1, the attacker may connect through a number of hosts ($H_1, \ldots, H_n$) before attacking the final target. Assuming the attacker has not gained full control of the attack target, the attack target will initiate the attack tracing after it has detected the attack. Specifically, the attack target will watermark the backward traffic of the attack connection, and inform sensors across the network about the watermark. The sensors across the network will scan all traffic for the presence of the indicated watermark, and report to the target if any occurrences of the watermark are detected.
Gateways, firewalls, and edge routers are good places to deploy sensors. However, how many sensors can be deployed depends not only on the resources available, but also on administrative privilege. How to optimally deploy a limited number of sensors over a particular network is an open research problem [22]. Due to space limitations, we leave aside the sensor-deployment issues, and instead focus on the watermark-tracing approach itself.
Since the backward traffic is watermarked at its very source (the attack target, which is not controlled by the attacker), the attacker will not have access to an unwatermarked version of the traffic. This makes it difficult for the attacker to determine which packets have been delayed by the watermarking process running at the target.
Fig. 1. Overall watermark-tracing model.
The objective of watermark-based correlation is to make the correlation of encrypted connections probabilistically robust against random timing perturbations by the adversary. Unlike existing timing-based correlation schemes, our watermark-based correlation is active in that it embeds a unique watermark into the encrypted flows by slightly adjusting the timing of selected packets. If the embedded watermark is both unique and robust, the watermarked flows can be effectively identified, and thus, correlated at each stepping stone.
In contrast to most previous passive correlation approaches, our watermark-based correlation makes no limiting assumption about the distribution or random process of the original interpacket timing characteristics of the flows to be correlated.
We assume the following about the random timing perturbations introduced by the adversary:
1. While the attacker can add extra delay to any or all packets of an outgoing flow at the stepping stone, the maximum delay he or she can introduce is bounded.
2. All packets in the original flow are kept. No packets are dropped from or added to the flow by the stepping stone.
3. While the watermarking scheme is public knowledge, the watermarking embedding and decoding parameters are secrets known only to the watermark embedder and the watermark detector(s).
Here, we do not require that the packet order of two flows be the same, as long as the total number of packets is not modified. As shown in works [20], [21], and [30], our watermark-based approach is able to correlate encrypted flows even if chaff and timing perturbation are applied at the same time. Due to space limitations, we only consider timing perturbation in this paper.
In contrast to all previous passive approaches, our correlation method does not require the random timing perturbation introduced by the attacker to follow any particular distribution or random process to be effective. The only assumption about timing perturbations is that they follow some distribution of finite variance, and that every pair of perturbations has the same covariance.
3.2 Watermarking Model and Concept
Generally, digital watermarking [4] involves the selection of a watermark carrier, and the design of two complementary processes: embedding and decoding. The watermark-embedding process inserts the watermark information into the carrier signal by a slight modification of some property of the carrier. The watermark-decoding process detects and extracts the watermark (equivalently, determines the existence of a given watermark) from the carrier signal. To correlate encrypted connections, we propose to use the interpacket timing as the watermark carrier property of interest.
For a unidirectional flow of $n > 1$ packets, we use $t_i$ and $t'_i$ to represent the arrival and departure times, respectively, of the $i$th packet $P_i$ of a flow incoming to and outgoing from some stepping stone. (Given a bidirectional connection, we can split it into two unidirectional flows and process each independently.)
Assume without loss of generality that the normal processing and queuing delay added by the stepping stone is a constant $c > 0$, and that the attacker introduces extra delay $d_i$ to packet $P_i$ at the stepping stone; then, we have $t'_i = t_i + c + d_i$.
We define the arrival interpacket delay (AIPD) between $P_i$ and $P_j$ as follows:
$$ipd_{i,j} = t_j - t_i \tag{1}$$
and the departure interpacket delay (DIPD) between $P_i$ and $P_j$ as follows:
$$ipd'_{i,j} = t'_j - t'_i. \tag{2}$$
We will use IPD to denote either AIPD or DIPD when it is clear from the context. We further define the impact or perturbation on $ipd_{i,j}$ by the attacker as the difference between $ipd'_{i,j}$ and $ipd_{i,j}$: $ipd'_{i,j} - ipd_{i,j} = d_j - d_i$. Note that we use the timestamps of the $i$th and the $j$th packets to calculate $ipd_{i,j}$ or $ipd'_{i,j}$, even if some packets in the flow might be reordered. Since we only use the timestamps of selected packets, the negative impact of using the “wrong” packet due to packet reordering is equivalent to some random timing perturbation over the IPD.
Assume $D > 0$ is the maximum delay that the attacker can add to $P_i$ ($i = 1, \ldots, n$); then the impact or perturbation on $ipd_{i,j}$ is $d_j - d_i \in [-D, D]$. Accordingly, the range $[-D, D]$ is called the perturbation range of the attacker.
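The relation between per-packet delays and the resulting IPD perturbation can be checked numerically. The following Python sketch is illustrative only: the function names and the sample values chosen for the constant delay, the maximum delay, and the arrival times are assumptions made for the example, not values from our experiments.

```python
import random

def departure_times(arrivals, c, delays):
    """Departure time of each packet: t'_i = t_i + c + d_i."""
    return [t + c + d for t, d in zip(arrivals, delays)]

def ipd(times, i, j):
    """Interpacket delay between the i-th and j-th packets (0-based here)."""
    return times[j] - times[i]

def ipd_perturbation(arrivals, c, delays, i, j):
    """Difference between the departure IPD and the arrival IPD."""
    departures = departure_times(arrivals, c, delays)
    return ipd(departures, i, j) - ipd(arrivals, i, j)

# Hypothetical example values (seconds).
rng = random.Random(1)
D = 0.5                       # assumed maximum delay the attacker can add
c = 0.01                      # assumed constant processing/queuing delay
arrivals = [0.0, 0.3, 0.9, 1.4, 2.2]
delays = [rng.uniform(0, D) for _ in arrivals]

for i in range(len(arrivals)):
    for j in range(i + 1, len(arrivals)):
        p = ipd_perturbation(arrivals, c, delays, i, j)
        # The constant c cancels; only d_j - d_i remains, which lies in [-D, D].
        print(i, j, round(p, 4), round(delays[j] - delays[i], 4))
```

The constant delay cancels in the difference, which is why only the attacker's bound on the added delay enters the perturbation range.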
To make our method robust against timing perturbations by the adversary, we choose to embed the watermark using IPDs from randomly and independently selected packets.
Given a flow containing the packet sequence $P_1, \ldots, P_n$ with time stamps $t_1, \ldots, t_n$, respectively ($t_i < t_j$ for $1 \le i < j \le n$), we can independently and probabilistically choose $2m < n$ packets through the following process: 1) sequentially consider each of the $n$ packets; and 2) independently and randomly determine if the current packet will be chosen for watermarking purposes, with probability $p = \frac{2m}{n}$ ($0 < m < \frac{n}{2}$). By this method, the selection of one packet for watermarking purposes is independent from the selection of any other packet. Therefore, we can expect to have $2m$ distinct packets independently and randomly selected from a packet stream of $n$ packets.
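The two-step selection process above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function name select_packets and the values of n and m are hypothetical, and the number of packets selected in a single run equals 2m only in expectation.

```python
import random

def select_packets(n, m, rng):
    """Visit packets 1..n in order and keep each one independently
    with probability p = 2m/n. Returns the selected packet indices."""
    if not 0 < m < n / 2:
        raise ValueError("need 0 < m < n/2")
    p = 2 * m / n
    return [i for i in range(1, n + 1) if rng.random() < p]

rng = random.Random(0)          # fixed seed only to make the run repeatable
n, m = 1000, 50                 # hypothetical flow length and redundancy
selected = select_packets(n, m, rng)
print(len(selected), "packets selected; expected about", 2 * m)
```

Because each decision is made without looking at the others, an observer who sees some selected packets learns nothing about which other packets were selected.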
4 BASIC AND PROBABILISTIC WATERMARKING
4.1 Basic Watermark Bit Embedding and Decoding
As an IPD is conceptually a continuous value, we will first quantize the IPD before embedding the watermark bit. Given any IPD $ipd > 0$, we define the quantization of $ipd$ with uniform quantization step size $s > 0$ as the function
$$q(ipd, s) = \mathrm{round}(ipd/s), \tag{3}$$
where $\mathrm{round}(x)$ is the function that rounds off real number $x$ to its nearest integer (i.e., $\mathrm{round}(x) = i$ for any $x \in [i - 0.5, i + 0.5)$).
Fig. 2 illustrates the quantization for scalar $x$. It is easy to see that $q(k \times s, s) = q(k \times s + y, s)$ for any integer $k$ and any $y \in [-s/2, s/2)$.
Let $ipd$ denote the original IPD before watermark bit $w$ is embedded, and $ipd^w$ denote the IPD after watermark bit $w$ is embedded. To embed a binary digit or bit $w$ into an IPD, we slightly adjust that IPD, such that the quantization of the adjusted IPD will have $w$ as the remainder when the modulus 2 is taken.
Given any $ipd > 0$, $s > 0$ and binary digit $w$, the watermark bit embedding is defined as function
$$e(ipd, w, s) = [q(ipd + s/2, s) + \Delta] \times s, \tag{4}$$
where $\Delta = (w - (q(ipd + s/2, s) \bmod 2) + 2) \bmod 2$.
The embedding of one watermark bit $w$ into scalar $ipd$ is done through increasing the quantization of $ipd + s/2$ by the normalized difference between $w$ and modulo 2 of the quantization of $ipd + s/2$, so that the quantization of resulting $ipd^w$ will have $w$ as the remainder when modulus 2 is taken. The reason to quantize $ipd + s/2$ rather than $ipd$ here is to make sure that the resulting $e(ipd, w, s)$ is no less than $ipd$, i.e., packets can be delayed, but cannot be output earlier than they arrive. Fig. 3 illustrates the embedding of watermark bit $w$ by mapping ranges of unwatermarked $ipd$ to the corresponding watermarked $ipd^w$.
The watermark-bit-decoding function is defined as follows:
$$d(ipd^w, s) = q(ipd^w, s) \bmod 2. \tag{5}$$
The correctness of watermark embedding and decoding is guaranteed by the following theorems, whose proofs are omitted due to space limitations:
Theorem 1. For any $ipd > 0$, $s > 0$ and binary bit $w$, $d(e(ipd, w, s), s) = w$.
Theorem 2. For any $ipd > 0$, $s > 0$ and binary bit $w$, $0 \le e(ipd, w, s) - ipd < 2s$.
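Functions (3), (4), and (5) translate directly into code. The following Python sketch is an illustration rather than our kernel implementation: the names q, e, and d mirror the equations, rounding is written as floor(x + 0.5) to match the half-open interval in (3), and the step size and sample IPDs are hypothetical values.

```python
import math

def q(ipd, s):
    """Quantization (3): round(ipd / s), with round(x) = i for x in [i-0.5, i+0.5)."""
    return math.floor(ipd / s + 0.5)

def e(ipd, w, s):
    """Embedding (4): move ipd up to a multiple of s whose multiplier has parity w."""
    base = q(ipd + s / 2, s)
    delta = (w - (base % 2) + 2) % 2      # 0 if parity already matches w, else 1
    return (base + delta) * s

def d(ipd_w, s):
    """Decoding (5): parity of the quantized IPD."""
    return q(ipd_w, s) % 2

s = 0.4                                   # hypothetical quantization step (seconds)
for ipd in (0.137, 0.25, 1.03, 2.71):     # hypothetical IPDs (seconds)
    for w in (0, 1):
        marked = e(ipd, w, s)
        # Theorem 1: the embedded bit decodes correctly.
        # Theorem 2: the IPD is only ever increased, by less than 2s.
        print(ipd, w, round(marked, 3), d(marked, s), round(marked - ipd, 3))
```

For example, with s = 0.4 an IPD of 0.137 is mapped to 0.4 to carry bit 1 and to 0.8 to carry bit 0, so the added delay is 0.263 or 0.663, both below 2s = 0.8.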
4.2 Maximum Tolerable Perturbation
Given any $ipd > 0$, $s > 0$, we define the maximum tolerable perturbation $\Delta_{\max}$ of $d(ipd, s)$ as the upper bound of the perturbation over $ipd$ such that $\forall x > 0\ (x < \Delta_{\max} \Rightarrow d(ipd \pm x, s) = d(ipd, s))$ and either $d(ipd + \Delta_{\max}, s) \ne d(ipd, s)$ or $d(ipd - \Delta_{\max}, s) \ne d(ipd, s)$; i.e., any perturbation smaller than $\Delta_{\max}$ to $ipd$ will not change the result of watermark decoding ($d(ipd, s)$), while a perturbation of $\Delta_{\max}$ or greater to $ipd$ may change the watermark-decoding result.
We define the tolerable perturbation range as the portion of the perturbation range $[-D, D]$ within which any perturbation on $ipd$ is guaranteed not to change $d(ipd, s)$, and the vulnerable perturbation range as the range of perturbation values outside the tolerable perturbation range.
Given any $ipd > 0$, $s > 0$ and binary watermark bit $w$, by definition of quantization function $q(ipd, s)$ in (3) and watermark-decoding function $d(ipd^w, s)$ in (5), it is easy to see that when $x \in [-s/2, s/2)$, $d(e(ipd, w, s) + x, s) = d(e(ipd, w, s), s)$ and $d(e(ipd, w, s) + s/2, s) \ne d(e(ipd, w, s), s)$.
This indicates that the maximum tolerable perturbation, the tolerable perturbation range, and the vulnerable perturbation range of $d(e(ipd, w, s), s)$ are $s/2$, $[-s/2, s/2)$ and $(-D, -s/2) \cup [s/2, D)$, respectively.
In summary, if the perturbation of an IPD is within the tolerable perturbation range $[-s/2, s/2)$, the embedded watermark bit is guaranteed not to be corrupted by the timing perturbation. If the perturbation of the IPD is outside this range, the embedded watermark bit may be altered by the attacker. Therefore, the larger the value of $s$ (equivalently, the larger the tolerable perturbation range), the more robust the embedded watermark bit will be. However, a larger value of $s$ may disturb the timing of the watermarked flow more, as the watermark bit embedding itself may add up to $2s$ delay to selected packets.
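The tolerable and vulnerable perturbation ranges can be observed directly on a single watermarked IPD. The Python sketch below restates the quantization, embedding, and decoding functions so that it is self-contained; the helper name survives and the sample values are assumptions for illustration. Perturbations exactly at the interval endpoints are avoided because floating-point rounding makes them unreliable.

```python
import math

def q(ipd, s):
    return math.floor(ipd / s + 0.5)          # quantization (3)

def e(ipd, w, s):
    base = q(ipd + s / 2, s)
    return (base + (w - (base % 2) + 2) % 2) * s   # embedding (4)

def d(ipd_w, s):
    return q(ipd_w, s) % 2                    # decoding (5)

def survives(ipd, w, s, x):
    """True if bit w still decodes after the watermarked IPD is perturbed by x."""
    return d(e(ipd, w, s) + x, s) == w

s = 0.4                                       # hypothetical step size (seconds)
ipd, w = 0.137, 0                             # hypothetical IPD and bit
for frac in (-0.6, -0.49, -0.25, 0.0, 0.25, 0.49, 0.6):
    # |x| < s/2 is tolerated; the two values beyond s/2 flip the decoded bit.
    print(frac, survives(ipd, w, s, frac * s))
```

The sketch also makes the tradeoff visible: doubling s doubles the width of the tolerated interval, at the cost of up to twice the embedding delay.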
It is desirable to have a watermark-embedding scheme that 1) disturbs the timing of watermarked flows as little as possible, so that the watermark embedding is less noticeable; and 2) ensures the embedded watermark bit is robust, with high probability, against timing perturbations that are outside the tolerable perturbation range $[-s/2, s/2)$.
In the remainder of this section, we address the case when the maximum delay $D > 0$ added by the attacker is bigger than the maximum tolerable perturbation $s/2$. By utilizing redundancy techniques, we develop a framework that can make the embedded watermark bit robust, with arbitrarily high probability, against arbitrarily large (and yet bounded) random timing perturbation by the attacker, as long as the flow to be watermarked contains a sufficient number of packets.
4.3 Embedding a Single Watermark Bit over the Average of Multiple IPDs
To make the embedded watermark bit probabilistically robust against larger random delays than $s/2$, the key is to contain and minimize the impact of the random delays on the watermark-bearing IPDs so that the impact of the random delays will fall, with high probability, within the tolerable perturbation range $[-s/2, s/2)$. We exploit the assumption that the attacker does not know which packets are randomly selected and which IPDs will be used for embedding the watermark.
Fig. 3. Mapping between unwatermarked $ipd$ and watermarked $ipd^w$ to embed watermark bit $w$.
Fig. 2. Quantization of scalar value x.
We apply two strategies to contain and minimize the
impact of random delays over the watermark-bearing IPDs.
The first strategy is to distribute watermark-bearing IPDs
over a longer duration of the flow. The second is to embed a
watermark bit in the average of multiple IPDs. The rationale
behind these strategies is as follows: while the attacker may
add a large delay to a single IPD, it is impossible to add large
delays to all IPDs. In fact, random delays tend to increase
some IPDs and decrease others. Therefore, the impact on
the average of multiple IPDs is more likely to be within the
tolerable perturbation range $[-s/2, s/2)$, even when the perturbation range $[-D, D]$ is much larger than $[-s/2, s/2)$.
Instead of embedding one watermark bit in one IPD, we embed the watermark bit into the average of $m \ge 1$ randomly selected IPDs. Here, we call $m$ the redundancy number.
Given a packet stream $P_1, \ldots, P_n$ with time stamps $t_1, \ldots, t_n$, respectively ($t_i < t_j$ for $1 \le i < j \le n$), we first independently and randomly choose $2m$ ($0 < m < \frac{n}{2}$) distinct packets: $P_{x_1}, \ldots, P_{x_{2m}}$ ($1 \le x_k \le n$ for $1 \le k \le 2m$). We then randomly divide the $2m$ packets into two groups of $m$ packets $\{P_{y_1}, \ldots, P_{y_m}\}$ and $\{P_{z_1}, \ldots, P_{z_m}\}$ ($y_k < z_k$ and $y_k, z_k \in \{x_1, \ldots, x_{2m}\}$), and randomly form $m$ packet pairs $\{\langle P_{y_1}, P_{z_1} \rangle, \ldots, \langle P_{y_m}, P_{z_m} \rangle\}$.
Let $\langle P_{y_k}, P_{z_k} \rangle$ be the $k$th pair of packets randomly selected to embed the watermark bit, whose timestamps are $t_{y_k}$ and $t_{z_k}$, respectively. Then, we have $m$ IPDs: $ipd_k = t_{z_k} - t_{y_k}$ ($k = 1, \ldots, m$). We represent the average of these $m$ IPDs as
$$ipd_{avg} = \frac{1}{m} \sum_{k=1}^{m} ipd_k. \tag{6}$$
Given any desired $ipd_{avg} > 0$, and the values for $s$ and $w$, we can embed $w$ into $ipd_{avg}$ by applying the embedding function defined in (4) to $ipd_{avg}$. Specifically, the packets $P_{z_k}$ ($k = 1, \ldots, m$) are all delayed, so that $ipd_{avg}$ is adjusted by $\Delta$, as defined in (4). To decode the watermark bit, we first collect the $m$ IPDs (denoted as $ipd^w_k$, $k = 1, \ldots, m$) from the same $m$ pairs of randomly selected packets, and from them, compute the average $ipd^w_{avg}$ of $ipd^w_1, \ldots, ipd^w_m$. Then, we can apply the decoding function defined in (5) to $ipd^w_{avg}$ to decode the watermark bit.
Because watermark embedding is now applied to the
average of m IPDs, the watermark-embedding process
needs to know the exact values of those IPDs to be averaged
in order to achieve a perfectly even time adjustment. In real-
time communication, packets arrive and are forwarded one
by one, and incoming packets should not be buffered for too
long before they are sent out. This means that the water-
mark-embedding process may need to adjust the timing of
some packets and send them out before knowing the
average of all the m selected IPDs. In this case, embedding
the watermark bit over the average of multiple IPDs of real-
time flows may lead to an uneven time adjustment over
those selected packets.
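The offline (even-adjustment) case can be sketched in Python as follows. This is a minimal illustration, not the paper's implementation: the functions `quantize`, `embed_target`, and `decode_bit` are assumed stand-ins for a quantization-based embedding and decoding rule in the spirit of (4) and (5), which are defined outside this section, and all names are hypothetical. The sketch delays every second packet of each pair by the same amount and ignores any packet reordering this could cause in a real flow.

```python
import math


def quantize(value, s):
    """Index of the multiple of s nearest to value (assumed quantizer)."""
    return math.floor(value / s + 0.5)


def embed_target(ipd_avg, w, s):
    """Smallest multiple of s above ipd_avg whose index has parity w.

    Assumed stand-in for the embedding function referred to as (4).
    """
    k = quantize(ipd_avg + s / 2, s)      # first multiple strictly above ipd_avg
    delta = (w - k % 2 + 2) % 2           # 0 or 1 extra step to fix the parity
    return (k + delta) * s


def decode_bit(ipd_avg, s):
    """Assumed stand-in for the decoding function referred to as (5)."""
    return quantize(ipd_avg, s) % 2


def average_ipd(timestamps, pairs):
    """Equation (6): mean of t[z] - t[y] over the selected pairs."""
    return sum(timestamps[z] - timestamps[y] for y, z in pairs) / len(pairs)


def embed_over_pairs(timestamps, pairs, w, s):
    """Delay every second packet of each pair by the same shift."""
    ipd_avg = average_ipd(timestamps, pairs)
    shift = embed_target(ipd_avg, w, s) - ipd_avg   # always >= 0: delay only
    out = list(timestamps)
    for _, z in pairs:
        out[z] += shift
    return out


def decode_over_pairs(timestamps, pairs, s):
    """Decode one bit from the average IPD of the same pairs."""
    return decode_bit(average_ipd(timestamps, pairs), s)
```

Because every selected IPD grows by the same shift, the average lands on the centre of a quantization cell, so an average perturbation smaller than $s/2$ in magnitude leaves the decoded bit unchanged.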
5 ANALYSIS OF PROBABILISTIC WATERMARKING IN
THE PRESENCE OF TIMING PERTURBATIONS
We now consider probabilistic watermark decoding in the presence of active timing perturbation. Based on very moderate assumptions about the random timing perturbations, we first establish an upper bound on the watermark-bit-decoding error probability, and then derive an approximation to the watermark-bit-decoding error probability.
5.1 Upper Bound of the Watermark-Bit-Decoding Error Probability
Let $D_i$ ($i = 1, \ldots, n$) represent the random delays added to packets $P_i$ ($i = 1, \ldots, n$) by the adversary, and let $D > 0$ be the maximum delay the adversary can add to any packet. Here, we do not require the random delays added by the adversary to follow any particular distribution, except that the random delay follows some distribution of finite variance. For example, the delay distribution may be deterministic, bimodal, self-similar, or any other distribution. Furthermore, we do not require the random delays $D_i$ to be independent of each other; we only assume that the covariance between any two different $D_i$'s is the same.

Given the assumption that the adversary does not know how and which packets are selected by the watermark embedder, the selection of watermark-embedding packets $P_{x_k}$ ($k = 1, \ldots, 2m$) is independent of any random delays $D_i$ that the adversary may add. Therefore, the impact of the delays by the adversary over randomly selected $P_{x_k}$'s is equivalent to randomly choosing one from the random variable list $D_1, \ldots, D_n$. Let $d_k$ ($k = 1, \ldots, 2m$) represent the impact of the random delays by the adversary over the $k$th randomly selected packet $P_{x_k}$. Since the random delays added by the adversary follow some fixed distribution, the $d_k$'s ($k = 1, \ldots, 2m$) are identically distributed.
Let $d_{y_k}$ and $d_{z_k}$ be the random variables that denote the random delays added by the attacker to packets $P_{y_k}$ and $P_{z_k}$, respectively, for $k = 1, \ldots, m$. Let $X_k = d_{z_k} - d_{y_k}$ be the random variable that denotes the impact of these random delays on $ipd_k = t_{z_k} - t_{y_k}$, and $\bar{X}_m$ be the random variable that denotes the overall impact of random delay on $ipd_{avg}$. Since $d_{z_k}$ and $d_{y_k}$ are identically distributed, $E(X_k) = 0$. From (6), we have

$$\bar{X}_m = \frac{1}{m} \sum_{k=1}^{m} (d_{z_k} - d_{y_k}) = \frac{1}{m} \sum_{k=1}^{m} X_k. \tag{7}$$
Therefore, the impact of the random delay by the attacker over $ipd_{avg}$ equals the sample mean of $X_1, \ldots, X_m$.

We define the probability that the impact of the timing perturbation by the attacker is out of the tolerable perturbation range $(-s/2, s/2]$ as the watermark bit vulnerability, which can be quantitatively expressed as $\Pr(|\bar{X}_m| \ge s/2)$.

Let $\mu$ and $\sigma^2$ be the mean and the variance of the random delay added by the attacker. Because the maximum delay that may be added by the attacker is assumed to be bounded, $\sigma^2$ is finite.

Given $\mathrm{Cov}(u, v) = E(uv) - E(u)E(v)$, $E(d_{z_i}) = E(d_{z_j}) = E(d_{y_i}) = E(d_{y_j})$, and that $\mathrm{Cov}(D_i, D_j)$ ($i \ne j$) is constant, we have

$$E(d_{z_i} d_{z_j}) = E(d_{y_i} d_{y_j}) = E(d_{z_i} d_{y_j}) = E(d_{z_j} d_{y_i}). \tag{8}$$
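Then

$$\mathrm{Cov}(X_i, X_j) = E(X_i X_j) = E\big((d_{z_i} - d_{y_i})(d_{z_j} - d_{y_j})\big) = E(d_{z_i} d_{z_j}) + E(d_{y_i} d_{y_j}) - E(d_{z_i} d_{y_j}) - E(d_{z_j} d_{y_i}) = 0. \tag{9}$$

Therefore,

$$\mathrm{Var}(\bar{X}_m) = \frac{1}{m^2} \mathrm{Var}\left(\sum_{k=1}^{m} X_k\right) = \frac{1}{m^2} \left[\sum_{k=1}^{m} \mathrm{Var}(X_k) + 2 \sum_{1 \le i < j \le m} \mathrm{Cov}(X_i, X_j)\right] = \frac{1}{m} \mathrm{Var}(X_k) \le \frac{4\sigma^2}{m}. \tag{10}$$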
According to the Chebyshev inequality in statistics [7], for any random variable $X$ with finite variance $\mathrm{Var}(X)$ and for any $t > 0$, $\Pr(|X - E(X)| \ge t) \le \mathrm{Var}(X)/t^2$. This means that the probability that a random variable deviates from its mean by at least $t$ is bounded by $\mathrm{Var}(X)/t^2$. By applying the Chebyshev inequality to $\bar{X}_m$ with $t = s/2$, we have

$$\Pr\left(|\bar{X}_m| \ge \frac{s}{2}\right) \le \frac{16\sigma^2}{m s^2}. \tag{11}$$

This means that the probability that the overall impact of random delays on $ipd_{avg}$ is outside the tolerable perturbation range $(-s/2, s/2]$ is bounded. In addition, that probability can be made arbitrarily close to 0 by increasing $m$, the number of redundant IPDs averaged together before embedding the watermark bit. Since the watermark-bit-decoding error probability is less than $\Pr(|\bar{X}_m| \ge s/2)$, the derived upper bound is conservative, and it holds true regardless of the distribution, mean, or variance of the random delays added by the attacker, or of the maximum quantization allowed for watermark embedding. Furthermore, the upper bound of the error probability holds true even if the random delays on different packets are correlated.
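To get a feel for how conservative the bound in (11) is, the following Python sketch compares it with a Monte Carlo estimate. It is illustrative only: it assumes independent delays drawn uniformly from [0, `max_delay`], and the function names, trial count, and seed are arbitrary choices rather than anything specified in the paper.

```python
import random


def chebyshev_vulnerability_bound(sigma2, m, s):
    """Right-hand side of (11), capped at 1 because it bounds a probability."""
    return min(1.0, 16.0 * sigma2 / (m * s * s))


def simulated_vulnerability(m, s, max_delay=1.0, trials=20000, seed=1):
    """Estimate Pr(|X_bar_m| >= s/2) for iid uniform delays (an assumption)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        total = 0.0
        for _ in range(m):
            # X_k = d_zk - d_yk: difference of the delays on the two packets
            total += rng.uniform(0, max_delay) - rng.uniform(0, max_delay)
        if abs(total / m) >= s / 2:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    sigma2 = 1.0 / 12.0  # variance of a uniform delay on [0, 1]
    print("bound    :", chebyshev_vulnerability_bound(sigma2, m=12, s=0.4))
    print("simulated:", simulated_vulnerability(m=12, s=0.4))
```

The simulated vulnerability sits well below the Chebyshev bound for these parameters, which is the expected behaviour of a distribution-free bound.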
5.2 Approximation to the Watermark Bit Robustness
In this section, we assume that the random delays added by the adversary are independent and identically distributed (iid), and we derive an accurate approximation to the watermark bit robustness $\Pr(|\bar{X}_m| < s/2)$ via the well-known central limit theorem of statistics [7]. Although the approximation model assumes the random delays are iid, our experiments (as shown in Fig. 10) demonstrate that the derived approximation model can accurately model non-iid (e.g., batch-releasing) random delays.
Central Limit Theorem. If the random variables $X_1, \ldots, X_n$ form a random sample of size $n$ from a given distribution $X$ with mean $\mu$ and finite variance $\sigma^2$, then for any fixed number $x$

$$\lim_{n \to \infty} \Pr\left[\frac{\sqrt{n}\,(\bar{X}_n - \mu)}{\sigma} \le x\right] = \Phi(x), \tag{12}$$

where $\Phi(x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}} e^{-u^2/2}\, du$.
The theorem indicates that whenever a random sample of size $n$ is taken from any distribution with mean $\mu$ and finite variance $\sigma^2$, the sample mean $\bar{X}_n$ will be approximately normally distributed with mean $\mu$ and variance $\sigma^2/n$, or equivalently the distribution of the random variable $\sqrt{n}\,(\bar{X}_n - \mu)/\sigma$ will be approximately a standard normal distribution.

Let $\sigma^2$ denote the variance of the distribution of the random delays added by the attacker (i.e., let $\mathrm{Var}(d_{y_k}) = \mathrm{Var}(d_{z_k}) = \sigma^2$). Applying the central limit theorem to the random sample $X_1 = d_{z_1} - d_{y_1}, \ldots, X_m = d_{z_m} - d_{y_m}$, where $\mathrm{Var}(X_k) = \mathrm{Var}(d_{z_k}) + \mathrm{Var}(d_{y_k}) = 2\sigma^2$ and $E(X_k) = E(d_{z_k}) - E(d_{y_k}) = 0$, we have

$$\Pr\left[\frac{\sqrt{m}\,(\bar{X}_m - E(X_i))}{\sqrt{\mathrm{Var}(X_i)}} < x\right] = \Pr\left[\frac{\sqrt{m}\,\bar{X}_m}{\sqrt{2}\,\sigma} < x\right] \approx \Phi(x). \tag{13}$$
Because of the symmetry of $\Phi(x)$,

$$\Pr\left[\left|\frac{\sqrt{m}\,\bar{X}_m}{\sqrt{2}\,\sigma}\right| < x\right] \approx 2\Phi(x) - 1. \tag{14}$$

Therefore,

$$p = \Pr\left[|\bar{X}_m| < \frac{s}{2}\right] = \Pr\left[\left|\frac{\sqrt{m}\,\bar{X}_m}{\sqrt{2}\,\sigma}\right| < \frac{s\sqrt{m}}{2\sqrt{2}\,\sigma}\right] \approx 2\Phi\left(\frac{s\sqrt{m}}{2\sqrt{2}\,\sigma}\right) - 1. \tag{15}$$
This means that the impact of the timing perturbation on the watermark bit is approximately normally distributed with zero mean and variance $2\sigma^2/m$. Here, $p$ represents the probability that the impact of the timing perturbation falls within the range $(-s/2, s/2)$. While the encoded watermark bit could be decoded correctly even when the timing perturbation falls outside $(-s/2, s/2)$, such a probability is small when $p$ is close to 1. In the rest of this paper, we will use $p$ as a conservative approximation to the probability that the watermark bit will survive the timing perturbation.

Equation (15) confirms the result of (11). Fig. 4 illustrates how the distribution of the impact of random timing perturbations by the attacker can be "squeezed" into the tolerable perturbation range by increasing the number of redundant IPDs averaged.
Equation (15) also gives us an accurate estimate of the watermark bit robustness. For example, assume the maximum delay by the attacker is normalized to 1 time unit, the random delays added by the attacker are uniformly distributed over [0, 1] (whose variance $\sigma^2$ is 1/12), $s = 0.4$, and $m = 12$. Then $\Pr[|\bar{X}_{12}| < 0.2] \approx 2\Phi(1.2 \times \sqrt{2}) - 1 \approx 91\%$. In other words, the impact of random timing perturbations on the average of 12 IPDs will, with about 91 percent probability, fall within the range $[-0.2, 0.2]$. Table 1 shows the estimation and simulation results of watermark bit robustness with uniformly distributed random delays over [0, 1], $s = 0.4$, and various values for $m$. This demonstrates that the central limit theorem can give us a very accurate estimate for a sample size as small as $m = 7$.
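The approximation in (15) is easy to evaluate numerically. The sketch below is a minimal illustration (the function name `bit_robustness` is ours); it uses the identity $2\Phi(x) - 1 = \mathrm{erf}(x/\sqrt{2})$ so that only the Python standard library is needed.

```python
import math


def bit_robustness(s, m, sigma):
    """Approximation (15): p = 2*Phi(s*sqrt(m) / (2*sqrt(2)*sigma)) - 1."""
    x = s * math.sqrt(m) / (2.0 * math.sqrt(2.0) * sigma)
    # For the standard normal CDF Phi, 2*Phi(x) - 1 equals erf(x / sqrt(2)).
    return math.erf(x / math.sqrt(2.0))


if __name__ == "__main__":
    sigma = math.sqrt(1.0 / 12.0)  # uniform delays on [0, 1]
    print(bit_robustness(s=0.4, m=12, sigma=sigma))
```

With the parameters of the example above ($s = 0.4$, $m = 12$, $\sigma^2 = 1/12$) this evaluates to roughly 0.91, in line with the 91 percent figure.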
440 IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING, VOL. 8, NO. 3, MAY/JUNE 2011
From (15), it can be seen that it is easier to achieve a target level of robustness by increasing $s$ than by increasing $m$. For example, the effect of increasing $s$ by a factor of 2 is the same as that of increasing $m$ by a factor of 4.
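One way to see the factor-of-4 relationship is to look only at the argument of $\Phi$ in (15), which depends on $s$ and $m$ through the product $s\sqrt{m}$:

$$\frac{(2s)\sqrt{m}}{2\sqrt{2}\,\sigma} = \frac{s\sqrt{4m}}{2\sqrt{2}\,\sigma}.$$

Doubling $s$ with $m$ fixed and quadrupling $m$ with $s$ fixed therefore give the same approximate robustness $p$.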
5.3 Analysis of Watermark Detectability
Watermark detection refers to the process of determining if a given watermark is embedded into the IPDs of a specific connection or flow. Let the secret information shared between the watermark embedder and decoder be represented as $\langle S, m, l, s, w \rangle$, where $S$ is the packet selection function that returns $(l + 1) \times m$ packets, $m \ge 1$ is the number of redundant pairs of packets in which to embed one watermark bit, $l > 0$ is the length of the watermark in bits, $s > 0$ is the quantization step size, and $w$ is the $l$-bit watermark to be detected. Let $f$ denote the flow to be examined, and $w_f$ denote the decoded $l$ bits from flow $f$.

The watermark detector works as follows:

1. Decode the $l$-bit $w_f$ from flow $f$.
2. Compare the decoded $w_f$ with $w$.
3. Report that watermark $w$ is detected in flow $f$ if the Hamming distance between $w_f$ and $w$, represented as $H(w_f, w)$, is less than or equal to $h$, where $h$ is a threshold parameter determined by the user, and $0 \le h < l$.
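The comparison in steps 2 and 3 can be sketched as follows. This is a minimal illustration in Python that assumes the decoded bits are already available as a sequence of 0s and 1s; the function names are hypothetical.

```python
def hamming_distance(a, b):
    """Number of positions at which two equal-length bit sequences differ."""
    if len(a) != len(b):
        raise ValueError("bit sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def watermark_detected(decoded_bits, watermark, h):
    """Step 3 of the detector: report detection iff H(w_f, w) <= h."""
    if not 0 <= h < len(watermark):
        raise ValueError("threshold h must satisfy 0 <= h < l")
    return hamming_distance(decoded_bits, watermark) <= h
```

For instance, a decoded sequence that differs from an 8-bit watermark in two positions is reported as detected with `h = 2` but not with `h = 1`.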
The rationale behind using the Hamming distance rather than requiring an exact match to detect the presence of $w$ is to increase the robustness of the watermark detector against timing perturbations by the attacker. Given any quantization step size $s$, there is a nonzero probability that an embedded watermark bit will be corrupted by timing perturbations, no matter how much redundancy is used. Let $0 < p < 1$ be the probability that each embedded watermark bit will survive the timing perturbation by the attacker. Then, the probability that all $l$ bits survive the timing perturbation by the attacker will be $p^l$. When $l$ is reasonably large, $p^l$ will tend to be small unless $p$ is very close to 1.

By using the Hamming distance threshold $h$ to detect watermark $w_f$, the expected watermark detection rate will be

$$\sum_{i=0}^{h} \binom{l}{i} p^{l-i} (1 - p)^i. \tag{16}$$
For example, for the values $p = 0.9102$ and $l = 24$, the expected watermark detection rate with exact bit match would be $p^l = 10.45\%$. For the same values of $p$ and $l$, the expected watermark detection rate using a Hamming distance threshold $h = 5$ would be 98.29 percent.
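These figures follow directly from (16). A minimal Python sketch (the function name is ours) that evaluates the sum:

```python
from math import comb


def expected_detection_rate(p, l, h):
    """Equation (16): probability that at most h of l bits are corrupted,
    when each bit independently survives with probability p."""
    return sum(comb(l, i) * p ** (l - i) * (1 - p) ** i for i in range(h + 1))


if __name__ == "__main__":
    print(expected_detection_rate(0.9102, 24, 0))  # exact match, i.e. p ** l
    print(expected_detection_rate(0.9102, 24, 5))  # Hamming threshold h = 5
```

For $p = 0.9102$ and $l = 24$ the two calls give approximately 0.1045 and 0.9829, matching the percentages quoted above.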
It is possible that the bits decoded from an unwatermarked flow happen to match the watermark closely enough to be detected. In this case, the watermark detector would report the unwatermarked flow as having the watermark. We call it a collision between $w$ and $f$ if $H(w_f, w) \le h$ for an unwatermarked flow $f$. Collisions are obviously undesirable, as they may lead to false conclusions about who the source of an attack is.

Assuming that the $l$-bit $w_f$ extracted from a flow $f$ is uniformly distributed, the expected watermark collision probability between any particular watermark $w$ and a random flow $f$ will be

$$\sum_{i=0}^{h} \binom{l}{i} \left(\frac{1}{2}\right)^{l}. \tag{17}$$
Fig. 5 shows the derived probability distribution of the expected watermark detection and collision rates for $l = 24$ and $p = 0.9102$. Given any watermark bit number $l > 1$ and any watermark bit robustness $0 < p < 1$, the larger the Hamming distance threshold $h$ is, the higher the expected detection rate will be. However, a larger Hamming distance threshold tends to increase the collision (false positive) rate of the watermark detection at the same time. An optimal Hamming distance threshold $h$ would be the one that gives a high expected detection rate while keeping the false positive rate low. However, the optimal $h$ and $l$ depend on 1) the number of packets available, 2) the defining characteristics of the timing perturbation, and 3) the desired level of effectiveness. Due to space limitations, we leave the derivation of the optimal $h$ and $l$ as future work. We show that our flow watermarking scheme can be effective even with potentially suboptimal $h$ and $l$.

TABLE 1. Watermark Bit Robustness Simulation for Uniformly Distributed Random Delays over [0, 1], $s = 0.4$

Fig. 4. Impact of random timing perturbations for different values of the redundancy number $m$.

Fig. 5. Effect of the threshold $h$ on the detection and collision rates of the watermarking method.
Given any quantization step size $s > 0$, any desired watermark collision probability $P_c > 0$, and any desired watermark detection rate $0 < P_d < 1$, we can determine the appropriate Hamming distance threshold $0 \le h < l$. Assuming that $h$ is chosen such that $h < l/2$, we have

$$\sum_{i=0}^{h} \binom{l}{i} \left(\frac{1}{2}\right)^{l} \le \sum_{i=0}^{h} \binom{l}{h} \left(\frac{1}{2}\right)^{l} \le \frac{(h + 1)\, l^{h}}{2^{l}}. \tag{18}$$
Because $\lim_{l \to \infty} l^{h} / 2^{l} = 0$, we can always make the expected watermark collision probability $\sum_{i=0}^{h} \binom{l}{i} (\frac{1}{2})^{l} < P_c$ by having a sufficiently large watermark bit number $l$.

Since $\sum_{i=0}^{h} \binom{l}{i} p^{l-i} (1 - p)^{i} \ge p^{l}$, we can always make the expected detection rate $\sum_{i=0}^{h} \binom{l}{i} p^{l-i} (1 - p)^{i} > P_d$ by making $p$ sufficiently close to 1. From inequality (11), this can be accomplished by increasing the redundancy number $m$ given any fixed values of $s$ and $\sigma$.
Therefore, in theory, our watermark-based-correlation method can, with arbitrarily small averaged adjustment of interpacket timing, achieve arbitrarily close to 100 percent watermark-detection rate and arbitrarily close to 0 percent watermark-collision probability at the same time against arbitrarily large (but bounded) random timing perturbation of arbitrary distribution, as long as there are enough packets in the flow to be watermarked.

In practice, the number of packets available is the fundamental limiting factor to the achievable effectiveness of our watermark-based correlation. Our experience shows that our watermark-based correlation can be effective with as few as several hundred packets. For example, the experiments in Sections 7.1 and 7.2 show that the watermarking requires fewer than 300 packets to achieve a virtually 100 percent decoding rate against up to 1,000 ms random timing perturbation, and less than 0.35 percent false positive rate.

Although the watermark-detection true positive rate and false positive rate can be made arbitrarily close to 100 percent and 0 percent at the same time by introducing enough redundancy, there is always a nonzero probability that an embedded watermark is not detected. In fact, there exist special cases of timing perturbation that could completely remove the embedded watermark. For example, with sufficiently large delay, the timing of the watermarked packet flow could be perturbed such that the interpacket arrival time is constant (or equivalently, the IPDs between adjacent packets all equal the average IPD). In this case, the watermark decoded from the perturbed flow would be fixed regardless of what watermark has been embedded and how. However, achieving such a complete elimination of the embedded watermark requires knowing the exact timing characteristics of the traffic in advance. In the following section, we analyze the negative impacts of the adversary's timing perturbations, and their limits under the constraints of real-time communication.
6 LIMITS OF ADVERSARY’S TIMING PERTURBATION
We have shown that there exist special cases of brute force timing perturbation that could completely remove any embedded watermark from any distribution of interpacket timing.

In this section, we analyze the limitations on the negative impact of the adversary's timing perturbations. We assume that the key parameters of the watermark-embedding method are unknown to the adversary. We first identify the minimum distortion required for the adversary to completely remove the embedded watermark and the optimal strategy for doing so. We then analyze the additional constraints imposed by real-time communication, and their implications for the adversary's ability to interfere with or distort the watermark. We show that it is generally infeasible for the adversary to completely eliminate the embedded watermark from a flow of packets in real time.
6.1 Minimum Brute Force Perturbation Needed to Completely Remove Watermark
Let the $N$-dimensional vector $S^N = \langle S_1, \ldots, S_N \rangle$ (where $S_i \in R^+$ is a random variable) be the host signal or carrier in which the watermark is to be embedded, $M$ be the message to be embedded and transferred, and $K$ be the key information for correct information embedding and decoding. Let $X^N = \langle X_1, \ldots, X_N \rangle$ (where $X_i \in R^+$ is a random variable) be the signal after $M$ is embedded in $S^N$. The adversary distorts $X^N$ into another vector $Y^N = \langle Y_1, \ldots, Y_N \rangle$ (where $Y_i \in R^+$ is another random variable). We can view $Y_i$ as an estimator of $X_i$ and use the mean squared error $\mathrm{MSE}(X_i, Y_i) = E[(X_i - Y_i)^2]$ to measure the distortion between random variables $X_i$ and $Y_i$. We use $D(X^N, Y^N) = \frac{1}{N} \sum_{i=1}^{N} \mathrm{MSE}(X_i, Y_i)$ to measure the overall distortion between $X^N$ and $Y^N$.

We first consider the distortion between single random variables $X_i$ and $Y_i$. From an information-theoretic point of view, to eliminate all the hidden information in $X_i$ from $Y_i$ via brute force is to make the mutual information $I(X_i; Y_i)$ between $X_i$ and $Y_i$ be 0, i.e.,

$$I(X_i; Y_i) = H(X_i) - H(X_i \mid Y_i) = 0 \tag{19}$$

or

$$H(X_i) = H(X_i \mid Y_i) = H(X_i, Y_i) - H(Y_i). \tag{20}$$

Then, we have

$$H(X_i, Y_i) = H(X_i) + H(Y_i). \tag{21}$$
Therefore, $X_i$ and $Y_i$ are independent of each other. This means that the adversary needs to distort $X_i$ into another independent random variable $Y_i$ in order to completely remove any hidden information from random variable $X_i$ via brute force. The following theorem determines the MSE needed to achieve this (we omit the proof due to space limitations):
Theorem 3. The mean square error between two independent random variables $X$ and $Y$ is

$$\mathrm{MSE}(X, Y) = \mathrm{Var}(X) + \mathrm{Var}(Y) + (E(X) - E(Y))^2.$$

Both $\mathrm{Var}(Y)$ and $(E(X) - E(Y))^2$ are nonnegative, and they will both be 0 only when $Y$ equals the constant value $E(X)$.

Corollary 1. The minimum mean square error needed to convert one random variable $X$ into another independent random variable $Y$ is $\mathrm{Var}(X)$, and it only occurs when $Y$ equals the constant value $E(X)$.

Therefore, the minimum distortion required to completely eliminate hidden information from any particular random variable $X$ via brute force is $\mathrm{Var}(X)$, and the optimal strategy to do so is to convert $X$ into the constant value $E(X)$.
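Theorem 3 can be checked numerically. The Python sketch below is only an illustration under assumed distributions (an exponential $X$ with mean 2 and a uniform $Y$ on [0, 1], sampled independently); the function names, sample size, and seed are arbitrary.

```python
import random


def mean(values):
    return sum(values) / len(values)


def variance(values):
    mu = mean(values)
    return sum((v - mu) ** 2 for v in values) / len(values)


def mse_versus_prediction(n=100000, seed=7):
    """Compare the empirical MSE of independent X, Y with
    Var(X) + Var(Y) + (E(X) - E(Y))^2 computed from the same samples."""
    rng = random.Random(seed)
    x = [rng.expovariate(0.5) for _ in range(n)]   # assumed: mean 2, variance 4
    y = [rng.uniform(0.0, 1.0) for _ in range(n)]  # assumed: independent of x
    mse = mean([(a - b) ** 2 for a, b in zip(x, y)])
    predicted = variance(x) + variance(y) + (mean(x) - mean(y)) ** 2
    return mse, predicted


if __name__ == "__main__":
    print(mse_versus_prediction())
```

The two returned values differ only by twice the sample covariance of the two independent samples, which shrinks as the sample size grows.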
Now, we consider the distortion between two $N$-dimensional vectors $X^N$ and $Y^N$. The overall distortion $D(X^N, Y^N)$ between $X^N$ and $Y^N$ will reach its minimum when each $\mathrm{MSE}(X_i, Y_i)$ reaches its minimum. Therefore, the minimum overall distortion $D(X^N, Y^N)$ required to completely eliminate hidden information from $X^N$ is $\frac{1}{N} \sum_{i=1}^{N} \mathrm{Var}(X_i)$, and the optimal strategy to achieve this is to convert $X^N = \langle X_1, \ldots, X_N \rangle$ into $Y^N = \langle E(X_1), \ldots, E(X_N) \rangle$. Given any fixed $E(X_1), \ldots, E(X_N)$, $Y^N$ is fixed regardless of the exact values of $X^N$. Therefore, the mutual information between $X^N$ and $Y^N$ is zero in this case. This result holds true regardless of the distribution of each $X_i$ in $X^N$.

This result is consistent with Moulin's work [16], [15] regarding the achievable capacity of an information hiding scheme in the presence of distortion by an adversary. Moulin showed that for a normally distributed host signal, the information hiding capacity is 0 when the distortion by the adversary is equal to or greater than the variance of the host signal. Here, we have shown that the adversary could remove any hidden information from a host signal of any distribution via distortion equal to or greater than the variance of the host signal, and we have described an optimal strategy to eliminate any hidden information via brute force distortion.
6.2 Constraints of Real-Time Communication and Their Implications
This section considers the additional constraints imposed by real-time communication and their implications for the adversary's capability to completely remove any hidden information from a real-time packet flow.

For a real-time packet flow $P_1, \ldots, P_n$ with time stamps $t_1, \ldots, t_n$, respectively, let the host signal in which information will be embedded be $S^N = \langle t_1, \ldots, t_n \rangle$. The requirements of real-time communication impose the following constraints on any distortions over the packet timing:

1. Each packet can only be delayed.
2. The delay to any packet is bounded (finite), otherwise the real-time communication is broken.
3. The delay to packet $P_k$ ($k < n$) has to be determined and performed before all $n$ packets are received, otherwise the delay to $P_k$ is unbounded when $n \to \infty$.

Let $\delta_i$ be the delay added to packet $P_i$, and $t'_i$ be the distorted time stamp of packet $P_i$; then $t'_i = t_i + \delta_i$. The original and distorted interpacket delays between $P_{i+1}$ and $P_i$ are $I_i = t_{i+1} - t_i$ and $I'_i = t'_{i+1} - t'_i$, respectively. Therefore,

$$\delta_k = t'_k - t_k = \delta_1 + \sum_{i=1}^{k-1} (I'_i - I_i). \tag{22}$$
The original and the perturbed interpacket timing characteristics of packet flow $P_1, \ldots, P_n$ can be represented by $\langle t_1, I_1, \ldots, I_{n-1} \rangle$ and $\langle t'_1, I'_1, \ldots, I'_{n-1} \rangle$, respectively. In particular, $\langle I'_1, \ldots, I'_{n-1} \rangle$ represents the distortion pattern over the original interpacket timing characteristics.

According to the results from Section 6.1, in order to completely remove any hidden information from the original interpacket timing characteristics, the adversary needs to distort $\langle t_1, I_1, \ldots, I_{n-1} \rangle$ into an independent one. This means that $\langle I'_1, \ldots, I'_{n-1} \rangle$ needs to be independent of $\langle I_1, \ldots, I_{n-1} \rangle$. Therefore, the distortion pattern $\langle I'_1, \ldots, I'_{n-1} \rangle$ can be thought of as predetermined before the original interpacket timing characteristics $\langle t_1, I_1, \ldots, I_{n-1} \rangle$ are ever known.

Let $I_{min}$ and $I_{max}$ denote the minimum and the maximum of all $I_i$, respectively, and let $D > 0$ represent the arbitrarily large maximum delay that the adversary could add to any packet. To satisfy the real-time constraints, the adversary could buffer at most $D / I_{min}$ packets before the delay $\delta_1$ is determined and applied to packet $P_1$.

Assume the adversary buffers $b$ ($0 < b < n$) packets $P_1, \ldots, P_b$ before deciding $\delta_1$. Since the adversary knows $\langle t_1, I_1, \ldots, I_{b-1} \rangle$ and all $I'_k$, he can find an appropriate value of $\delta_1$ to make sure that $\delta_1, \ldots, \delta_b$ are nonnegative according to (22).

Now, we show that when $I_{min} < I_{max}$, it is generally infeasible for the adversary to bound $\delta_n$ within the range $[0, D]$ without prior knowledge of the exact values of $I_b, \ldots, I_{n-1}$. From (22), it is easy to get

$$\delta_n = \delta_b + \sum_{i=b}^{n-1} (I'_i - I_i). \tag{23}$$
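Therefore, the real-time constraint $\delta_n \in [0, D]$ is equivalent to

$$-\frac{\delta_b}{n} \le \frac{1}{n} \sum_{i=b}^{n-1} I'_i - \frac{1}{n} \sum_{i=b}^{n-1} I_i \le \frac{D - \delta_b}{n}. \tag{24}$$

Since both $\delta_b$ and $D$ are fixed and finite, we have

$$\lim_{n \to \infty} \frac{1}{n} \sum_{i=b}^{n-1} I'_i = \lim_{n \to \infty} \frac{1}{n} \sum_{i=b}^{n-1} I_i. \tag{25}$$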
This means that the average interarrival time (or equivalently the average packet rate) of the perturbed packet flow must be arbitrarily close to that of the original packet flow given sufficiently large $n$. In order to completely remove any hidden information from the interpacket timing domain of a packet flow, $\langle I'_b, \ldots, I'_{n-1} \rangle$ must be independent of $\langle I_b, \ldots, I_{n-1} \rangle$. This means $\langle I'_b, \ldots, I'_{n-1} \rangle$ must be determined before $\langle I_b, \ldots, I_{n-1} \rangle$ is ever known. However, when $I_{min} < I_{max}$ and $n$ is large, it is infeasible for the adversary to determine a pattern $\langle I'_b, \ldots, I'_{n-1} \rangle$ whose average interarrival time is arbitrarily close to an unknown value $\frac{1}{n} \sum_{i=b}^{n-1} I_i \in [I_{min}, I_{max}]$.

In other words, when $I_{max} > I_{min}$, the adversary is not able to meet all the real-time constraints when he tries to completely remove all the hidden information from a sufficiently long real-time packet flow. Therefore, it is generally infeasible for the adversary to completely eliminate all the hidden information in real time from a sufficiently long packet flow, even with arbitrarily large delays.
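The drift implied by (23) to (25) can be illustrated with a short simulation. The Python sketch below is a hypothetical example: it assumes original IPDs drawn uniformly from [0.1, 0.9] and an adversary who commits in advance to a constant output IPD, and it reports the first packet at which the required delay leaves $[0, D]$. All names and parameter values are illustrative.

```python
import random


def delay_trajectory(orig_ipds, out_ipds, delta_start):
    """Accumulate (23): each step adds (I'_i - I_i) to the running delay."""
    deltas = [delta_start]
    for i_orig, i_out in zip(orig_ipds, out_ipds):
        deltas.append(deltas[-1] + (i_out - i_orig))
    return deltas


def first_violation(deltas, max_delay):
    """Index of the first delay outside [0, max_delay], or None if none."""
    for k, d in enumerate(deltas):
        if d < 0 or d > max_delay:
            return k
    return None


if __name__ == "__main__":
    rng = random.Random(11)
    orig = [rng.uniform(0.1, 0.9) for _ in range(5000)]  # true mean IPD is 0.5
    for guess in (0.45, 0.55):  # predetermined constant output IPD
        deltas = delay_trajectory(orig, [guess] * len(orig), delta_start=5.0)
        print(guess, first_violation(deltas, max_delay=10.0))
```

A guess below the true mean eventually requires a negative delay (sending a packet before it arrives), while a guess above it eventually exceeds the delay bound $D$, which is the infeasibility argued above.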
7 IMPLEMENTATION AND EXPERIMENTS
We have implemented both online and offline versions of our watermark embedding and decoding scheme. The online version runs as a Linux kernel module in Linux kernel 2.4.22, and it embeds the specified watermark into the specified IP flow in real time. The online version uses Linux netfilter and iptables to communicate the watermarking parameters from user space to kernel space, and it has 10 ms precision in adjusting the timing of selected packets. The offline version is a user-space application that reads and manipulates flow trace files rather than real-time flows. The watermark-embedding part reads a flow trace file in pcap format, and outputs a new flow trace file in which the watermark is embedded in the packet timing. The watermark-detecting part reads the watermarked-flow trace file and reports if the specified watermark is detected. Our experience has shown that the online and offline versions of our implementation consistently give almost identical results on correlation true positive and false positive rates.

In this section, we empirically validate our active watermark-based-correlation scheme in the presence of random timing perturbations. In particular, we seek to answer the following questions:

1. How vulnerable are existing (passive) timing-based correlation schemes to random timing perturbations?
2. How robust is the active watermark-based correlation against random timing perturbation?
3. How effective is watermark-based correlation in correlating the encrypted flows in the presence of both iid and non-iid random timing perturbations?
4. How accurate are our quantitative tradeoff models of watermark-bit robustness, watermark-detection rate, and watermark-collision rate in predicting the actual values?
We have used three flow sets, labeled FS1, FS1-Int, and FS2, in our experiments. FS1 is derived from over 49 million packet headers of the Bell Labs-1 traces of NLANR [17]. It contains 121 SSH flows that have at least 600 packets and that are at least 300 seconds long. FS2 contains 1,000 synthetic telnet flows generated from an empirically derived distribution [6] of telnet packet interarrival times, using the TCPLIB [5] tool.

We also wish to examine the performance of our method for purely interactive traffic. Because SSH flows may contain noninteractive traffic, such as bulk data transfer and X-windows management, it is desirable to remove such noninteractive traffic from the SSH flows to get a more accurate evaluation of watermark-based correlation of interactive flows. We examine the adjacent IPDs of the SSH flows in FS1, and filter out those flows whose adjacent IPDs are too short to be generated by humans. Since it is very unlikely for a person to type more than 14 keystrokes per second, we chose 70 ms as the threshold to tell whether an adjacent IPD is generated by human typing or not. We have found that 33 out of the 121 flows in FS1 satisfy the following conditions:

1. The flow has >40% adjacent IPDs shorter than 70 ms.
2. The flow has >10% 10-consecutive adjacent IPDs all of which are shorter than 70 ms.

By removing these 33 noninteractive flows from FS1, we obtained a new flow set, FS1-Int.
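The filter can be sketched as follows. This Python example rests on two interpretive assumptions that the text does not settle: that a flow must satisfy both conditions to be classed as noninteractive, and that the second condition measures the fraction of sliding windows of 10 consecutive adjacent IPDs in which every IPD is below the threshold. The function names are hypothetical and IPDs are taken to be in seconds.

```python
def short_ipd_fraction(ipds, threshold=0.070):
    """Fraction of adjacent IPDs shorter than the threshold (seconds)."""
    if not ipds:
        return 0.0
    return sum(1 for d in ipds if d < threshold) / len(ipds)


def short_run_fraction(ipds, run=10, threshold=0.070):
    """Fraction of sliding windows of `run` consecutive IPDs that are all
    shorter than the threshold (assumed reading of condition 2)."""
    windows = len(ipds) - run + 1
    if windows <= 0:
        return 0.0
    hits = sum(
        all(d < threshold for d in ipds[i:i + run]) for i in range(windows)
    )
    return hits / windows


def looks_noninteractive(ipds):
    """Assumed combination: both conditions must hold."""
    return short_ipd_fraction(ipds) > 0.40 and short_run_fraction(ipds) > 0.10
```

A flow of 5 ms IPDs throughout is flagged by this sketch, while a flow of 200 ms IPDs (typical of typing) is not.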
We considered three types of timing perturbations in our experiments. The first is the uniformly distributed random perturbation, in which the attacker at a stepping stone adds to each packet a random delay evenly distributed between 0 and the maximum delay (chosen by the attacker). The second is a self-similar perturbation, in which the attacker at a stepping stone adds to each packet a self-similar random delay. The third is the batch-releasing perturbation, in which the attacker at a stepping stone periodically buffers and holds all packets received within a certain time window, and then forwards all the buffered packets at line speed once the time window has expired.

The first type of perturbation is iid, while the second and third types of perturbation are non-iid. In fact, the batch-releasing perturbation is neither independent nor identically distributed, since the impact over any packet is closely correlated to the time of that packet. In addition, batch-releasing drastically changes the original timing characteristics of any flow to a pattern of periodic bursts (as shown in Fig. 6), which represents a challenging case for any timing-based correlation.
7.1 Correlation True Positive Experiments
This set of experiments aims to compare and evaluate the correlation effectiveness of our proposed active watermark-based correlation and previous passive timing-based correlation under various timing perturbations. We used two methods of correlation. First, we used an existing, passive timing-based correlation method called IPD-based correlation [32] to correlate each flow in FS1 with the same flow after it is perturbed by various levels of uniformly distributed random timing perturbations. If the flow and the perturbed flow are reported correlated, it is considered a true positive ("IPDCorr TP") of the correlation in the presence of timing perturbation. Second, we embedded a random 24-bit watermark into each flow of FS1 and FS2, with redundancy number $m = 12$ and quantization step size $s = 400$ ms for each watermark bit. The embedding of the 24-bit watermark requires 300 packets to be selected, of which 288 were delayed. Fig. 7 shows the effect of the watermark embedding over the interpacket timing, and illustrates that the watermark embedding is far from obvious. Third, we randomly perturbed the packet timing of the watermarked flows of FS1 and FS2 with different types of timing perturbations. It is considered a true positive ("IPDWMCorr TP") of watermark-based correlation if the embedded watermark can be detected from the timing-perturbed watermarked flows with a Hamming distance threshold $h = 5$. Finally, we calculated the expected watermark-detection rate from (15) and (16) under various maximum delays of uniformly distributed random timing perturbations.

Fig. 6. Effect of 10 seconds batch-releasing timing perturbation.
7.1.1 Correlation under Uniformly Distributed Random
Timing Perturbation
Fig. 8 shows the measured (as well as expected) true positive rates of IPD-based correlation and watermark-based correlation on FS1 and FS2 under various levels of uniformly distributed random timing perturbations. The results clearly indicate that IPD-based correlation is vulnerable to even moderate random timing perturbations. Without timing perturbation, IPD-based correlation is able to successfully correlate 93 percent of the SSH flows of FS1. However, with a maximum 100 ms random timing perturbation, the true positive rate of IPD-based correlation drops to 45.5 percent, and with a 200 ms maximum delay, the rate drops to 21.5 percent.

In contrast, the proposed watermark-based correlation of the flows in FS1, FS1-Int, and FS2 is able to achieve virtually a 100 percent true positive rate with up to a maximum 600 ms random timing perturbation. With a maximum 1,000 ms timing perturbation, the true positive rates of watermark-based correlation for FS1, FS1-Int, and FS2 are 84.2 percent, 89.85 percent, and 97.32 percent, respectively. It can be seen that the measured watermark-based correlation true positive rates are well approximated by the estimated values, based on the watermark detection rate model (15) and (16). In particular, the true positive rate measurements of FS2 are very close to the predicted values at all perturbation levels.
7.1.2 Correlation under Self-Similar Distributed Random
Timing Perturbation
To see how our watermark-based correlation works when the random timing perturbation is non-iid, we first investigated correlation under self-similar timing perturbations. We used Kramer's implementation [13] of the self-similar synthetic traffic-generating method of Taqqu et al. [29] to generate the self-similar timing perturbation, aggregating 128 sources of ON/OFF periods with 30 percent cumulative load to produce self-similar delays in units of milliseconds. We bounded the timing perturbation through a modulo operation.
WANG AND REEVES: ROBUST CORRELATION OF ENCRYPTED ATTACK TRAFFIC THROUGH STEPPING STONES BY FLOW... 445
Fig. 7. Comparison of 288 selected IPDs before and after watermark embedding.
Fig. 8. Correlation true positive rates under uniformly distributed random timing perturbations.
Fig. 9 shows the measured watermark-correlation true positive rates under various bounded self-similar timing perturbations, and the expected watermark-correlation true positive rates under uniformly distributed random delays, with $h = 5$, $l = 24$, $m = 12$, and $s = 400$ ms. It shows that the bounded self-similar perturbation yields much higher watermark-correlation true positive rates than the expected true positive rates of uniformly distributed random delay perturbation with the same delay upper bound (1,200 to 2,400 ms). This indicates that the bounded self-similar timing perturbation has less negative impact than the uniformly distributed random timing perturbation of the same upper bound.
We calculated the variance of 10,000 self-similar delays that are bounded by 1,000 ms, and obtained $\sigma^2_{\text{selfsim}}$ = 46,786. However, the variance of a uniformly distributed random variable of range [0, 1,000], $\sigma^2_{\text{uniform}}$, is 83,333. According to (15), the smaller the variance $\sigma^2$ is, the higher the watermark bit robustness (equivalently, the higher the watermark true positive rate should be). This explains why the self-similar type of random perturbation has less negative impact than the uniformly distributed random delays of the same upper bound.
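The 83,333 figure follows from the closed-form variance of a continuous uniform distribution on [a, b], which is (b - a)^2 / 12. The short Python check below reproduces it and compares it with the sample variance reported above; the function name is illustrative and does not come from the paper.

```python
def uniform_variance(low: float, high: float) -> float:
    """Variance of a continuous uniform distribution on [low, high]."""
    return (high - low) ** 2 / 12.0


# Delay bound used in the text, in milliseconds.
sigma2_uniform = uniform_variance(0.0, 1000.0)

# Sample variance reported in the text for the bounded self-similar delays.
sigma2_selfsim = 46_786.0

print(round(sigma2_uniform))            # 83333
print(sigma2_selfsim / sigma2_uniform)  # a little over one half
```

The self-similar delays therefore carry a bit more than half the variance of uniform delays with the same bound, which is the quantity that (15) ties to bit robustness.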
7.1.3 Correlation under Batch-Releasing Random Timing Perturbation
The second type of non-iid random timing perturbation we investigated is the batch-release timing perturbation. In this model, incoming packets are buffered until expiration of the next timer period, at which point all buffered packets are output at line speed, in a burst. With the batch-release perturbation, the actual delay of any packet depends on where it falls within the timer interval, or window. Assuming that the packet arrival times are uniformly distributed within the batch-release window, we are able to calculate the variance of the delays over all packets, given the window size or duration.
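The batch-release model can be illustrated with a small simulation. The Python sketch below is not the authors' experimental code: the function name, the window size of 1,000 ms, and the synthetic uniform arrival times are assumptions chosen for illustration. Under the uniform-arrival assumption stated above, each delay is uniform on [0, W] for window size W, so the delay variance should be close to W^2 / 12.

```python
import math
import random
import statistics


def batch_release_delays(arrivals, window):
    """Hold each packet until the end of the timer window it arrives in."""
    delays = []
    for t in arrivals:
        release = math.ceil(t / window) * window  # next timer expiration
        delays.append(release - t)
    return delays


random.seed(1)
window_ms = 1000.0  # hypothetical batch-release window

# Assumption: arrival times are spread uniformly over many windows.
arrivals = [random.uniform(0.0, 10_000 * window_ms) for _ in range(200_000)]
delays = batch_release_delays(arrivals, window_ms)

measured = statistics.pvariance(delays)
expected = window_ms ** 2 / 12.0  # variance of Uniform(0, window)
print(measured, expected)
```

Real interactive traffic is bursty rather than uniform, so the measured variance for actual flows may deviate from this idealized value.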
Fig. 10 shows the measured and estimated correlation true positive rates of our watermark-based correlation on flow set FS2, under various levels of batch-release timing perturbations. We used 24-bit watermarks with quantization step size $s = 400$ ms, redundancy number $m = 12$, and Hamming distance threshold $h = 5$. The expected true positive rates are calculated by (15) and (16). The measured values are close to the expected values, which demonstrates that our analytical model is applicable to non-iid timing perturbations.
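The role of the Hamming distance threshold in these experiments can be sketched as follows. The code assumes that a flow is reported as watermarked when the decoded bits differ from the embedded watermark in at most h positions; the function names and the example bit pattern are hypothetical and are not taken from the paper.

```python
def hamming_distance(bits_a, bits_b):
    """Number of positions at which two equal-length bit sequences differ."""
    if len(bits_a) != len(bits_b):
        raise ValueError("bit sequences must have the same length")
    return sum(a != b for a, b in zip(bits_a, bits_b))


def watermark_detected(decoded, watermark, h):
    """Assumed decision rule: accept when at most h bits are in error."""
    return hamming_distance(decoded, watermark) <= h


watermark = [1, 0, 1, 1, 0, 0, 1, 0] * 3  # illustrative 24-bit watermark

decoded = list(watermark)
for i in (0, 5, 11, 17):  # timing perturbation flips four bits
    decoded[i] ^= 1
print(watermark_detected(decoded, watermark, h=5))  # True

for i in (20, 23):  # two more bit errors, six in total
    decoded[i] ^= 1
print(watermark_detected(decoded, watermark, h=5))  # False
```

A larger h tolerates more bit errors and so raises the true positive rate, at the cost of more collisions with unwatermarked flows.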
7.2 Correlation False Positive Experiments
As mentioned above, there is a nonzero probability that an unwatermarked flow happens to exhibit the randomly chosen watermark. This case is considered a correlation collision, or false positive. According to our correlation collision model (17), the collision rate is determined by the number of watermark bits $l$ and the Hamming distance threshold $h$.
We experimentally investigated the following false positive rates for varying values of the Hamming distance threshold $h$ and the number of watermark bits $l$:
1. Collision rates between a given flow and 10,000 to 1,000,000 randomly generated 24-bit watermarks with varying Hamming distance threshold $h$.
2. Collision rates between a given 24-bit watermark and 10,000 to 1,000,000 randomly generated (using tcplib) synthetic telnet flows with varying Hamming distance threshold $h$.
3. Collision rates between a given flow and 100,000 randomly generated watermarks of various lengths with Hamming distance threshold $h = 5$.
4. Collision rates between a given watermark of various lengths and 10,000 to 1,000,000 randomly generated (using tcplib) synthetic telnet flows with Hamming distance threshold $h = 5$.
446 IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING, VOL. 8, NO. 3, MAY/JUNE 2011
Fig. 9. Correlation true positive rates under self-similar random timing perturbations.
Fig. 10. Correlation true positive rates under batch-releasing random timing perturbations.
Fig. 11 shows the measured and estimated correlation false positive rates. The left subfigure shows the false positive rates for varying Hamming distance thresholds and fixed length (24-bit) watermarks, and the right subfigure shows the false positive rates for varying watermark lengths and a fixed Hamming distance threshold $h = 5$. The measured values are the average of 100 separate experiments, and they are very close to the estimated values calculated from (17). Thus, the experimental results validate our model of the collision probability.
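Equation (17) is not reproduced in this section. The Python sketch below assumes a binomial form for it: each decoded bit of an unwatermarked flow matches the corresponding watermark bit independently with probability 1/2, so a collision occurs when at most h of the l bits mismatch. The function name is illustrative. Under that assumption the value for $l = 24$, $h = 5$ comes out at about 0.33 percent, in line with the roughly 0.3 percent false positive rate quoted in Section 7.4.

```python
from math import comb


def collision_rate(l: int, h: int) -> float:
    """Assumed model: P(at most h mismatches among l fair random bits)."""
    return sum(comb(l, i) for i in range(h + 1)) / 2 ** l


# Fixed 24-bit watermark, varying Hamming distance threshold.
for h in range(2, 9):
    print(f"l=24 h={h}: {collision_rate(24, h):.6f}")

# Fixed threshold h = 5, varying watermark length.
for l in range(18, 25):
    print(f"l={l} h=5: {collision_rate(l, 5):.6f}")
```

The two loops mirror the two kinds of sweep described above: the collision rate grows with h and shrinks as l increases.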
7.3 Watermark Detection Tradeoff Experiments
Equation (15) gives us the quantitative tradeoff between the expected watermark bit robustness, the redundancy number $m$, and the defining characteristics of the random timing perturbation $\sigma^2$. With a given watermark bit robustness $p$, (16) gives us the expected watermark detection rate.
To verify the validity and accuracy of our tradeoff models of watermark bit robustness and watermark detection rate, we did the following experiments:
1. We embedded a random 24-bit watermark into each flow in FS1, FS1-Int, and FS2, with quantization step $s = 400$ ms and varying redundancy numbers $m = 7, 8, 9, 10, 11, 12$. After we perturbed the watermarked flows with 1,000 ms maximum random delays, we measured the watermark detection rate of the perturbed, watermarked flows with Hamming distance threshold $h = 5$.
2. We also embedded a random 24-bit watermark into each flow in FS1, FS1-Int, and FS2, with quantization step $s = 400$ ms and redundancy number $m = 12$. After perturbing the watermarked flows with 1,000 ms maximum random delays, we measured the watermark detection rate of the perturbed, watermarked flows for varying Hamming distance thresholds of $h = 2, 3, 4, 5, 6, 7, 8$.
3. In a separate experiment, we embedded a random watermark of varying lengths $l = 18, 19, 20, 21, 22, 23, 24$ into each flow in FS1, FS1-Int, and FS2, with quantization step $s = 400$ ms and redundancy number $m = 12$. After perturbing the watermarked flows with 1,000 ms maximum uniform random delays, we measured the watermark detection rate of the perturbed, watermarked flows with a Hamming distance threshold of $h = 3$.
Fig. 12 shows the average of 100 experiments for the measured watermark detection rates of FS1 and FS1-Int, and the average of 10 experiments for the measured watermark detection rates of FS2, as well as the expected detection rate derived from (15) and (16). In all cases, the measured detection rates of FS2 are almost identical to the expected values, and the detection rates of FS1 are similar to but lower than the expected values. The detection rates of FS1-Int are always between those of FS1 and FS2. These results validate our quantitative tradeoff models of watermark bit robustness and watermark detection rate.
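The dependence of the detection rate on $h$ and $l$ can be explored numerically. The Python sketch below assumes that (16) has a binomial form: each of the l watermark bits survives perturbation independently with probability p, and the watermark is detected when at most h bits are wrong. The function name and the value p = 0.85 are hypothetical; in the paper p would come from (15), which is not reproduced in this section.

```python
from math import comb


def detection_rate(l: int, h: int, p: float) -> float:
    """Assumed model: P(at most h bit errors among l bits), each correct w.p. p."""
    return sum(comb(l, i) * p ** (l - i) * (1 - p) ** i for i in range(h + 1))


p = 0.85  # hypothetical per-bit robustness, not a value from the paper

# Experiment 2 style sweep: 24-bit watermark, varying threshold h.
for h in range(2, 9):
    print(f"l=24 h={h}: {detection_rate(24, h, p):.4f}")

# Experiment 3 style sweep: threshold h = 3, varying watermark length l.
for l in range(18, 25):
    print(f"l={l} h=3: {detection_rate(l, 3, p):.4f}")
```

Under this model, raising h increases the detection rate, while lengthening the watermark at a fixed h and p lowers it, because more bits must survive with the same error budget.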
Fig. 11. Correlation false positive (collision) rates versus Hamming distance threshold h and number of watermark bits l.
Fig. 12. Watermark detection rate tradeoff with redundancy number m, Hamming distance threshold h and number of watermark bits l.
7.4 Comparison with a Representative Passive Approach
We have also compared the effectiveness and the numbers of packets needed by our active watermark-correlation approach, and a representative passive correlation approach [1], under identical levels of timing perturbation on the same sets of traces.
For Poisson arrivals, paper [1] claimed that its detection algorithm DETECT-ATTACKS($\delta$, $p_\Delta$) is guaranteed to achieve a 100 percent detection rate and a false positive rate of no more than $\delta$, given a sufficient number of packets. However, it did not include any experimental results that demonstrated the claimed effectiveness. To empirically compare the effectiveness of our active approach and that of the passive correlation method of [1], we implemented its detection algorithm DETECT-ATTACKS($\delta$, $p_\Delta$). After identifying the maximum number of packets $p_\Delta$ in any time interval $\Delta = 800$ ms from each of the 1,000 flows in FS2, we applied detection algorithm DETECT-ATTACKS($\delta$, $p_\Delta$), with $\delta = 0.3\%$, to correlate flows in FS2 and the corresponding perturbed flows with maximum 800 ms uniformly distributed perturbation. Surprisingly, the detection algorithm DETECT-ATTACKS($\delta$, $p_\Delta$) in this test only achieved a 79.5 percent detection rate, while experiencing no false positives.
As shown in Section 7.1, under a maximum 800 ms uniformly distributed timing perturbation, our watermark-based-correlation method achieved at least a 99.9 percent true positive rate and about a 0.3 percent false positive rate, using parameter values of $h = 5$, $l = 24$, $s = 400$ ms, and $m = 12$ on flow set FS2.
We have further compared the upper bounds of the number of packets needed by our active correlation approach, and by the method of [1]. As an example, consider the requirements to achieve a guaranteed correlation effectiveness of at least a 99.9 percent true positive rate and a 0.35 percent false positive rate, for a maximum 600 ms uniformly distributed timing perturbation. Given the parameter values $m = 36$, $s = 400$ ms, and $D = 600$ ms, inequality (11) guarantees that the upper bound of the bit error probability of our active approach will be less than 0.0834 (i.e., $p > 0.9166$). Choosing $l = 32$, $h = 8$, and $p = 0.9166$, the expected watermark-detection rate from (16) is greater than 99.9 percent, and the expected watermark-collision rate from (17) is 0.35 percent. Therefore, in order to achieve a 99.9 percent TPR and a 0.35 percent FPR with a maximum 600 ms timing perturbation, our active approach needs to adjust the timing of no more than 32 × 36 = 1,152 packets. For the approach of [1], letting $\Delta = 600$ ms, $\delta = 0.35\%$, and using the value of $p_\Delta$ measured from the flows in FS1, detection algorithm DETECT-ATTACKS($\delta$, $p_\Delta$) requires at most 16,698 packets to achieve a 100 percent TPR and a 0.35 percent FPR. For this target, the upper bound of the number of packets needed by the passive approach of [1] to achieve a comparable correlation effectiveness is at least an order of magnitude more than that of the active approach.
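The arithmetic in this comparison can be checked in a few lines of Python. The packet counts and parameter values are taken from the text; the binomial expressions used for the detection and collision rates are assumed forms of (16) and (17), which are not reproduced in this section, and the variable names are illustrative.

```python
from math import comb

l, m, h = 32, 36, 8      # watermark bits, redundancy, Hamming threshold
p = 0.9166               # lower bound on per-bit robustness from the text

active_packets = l * m   # packets whose timing the active approach adjusts
passive_packets = 16_698 # upper bound quoted for the passive approach of [1]

print(active_packets)                    # 1152
print(passive_packets / active_packets)  # about 14.5

# Assumed binomial forms: at most h bit errors out of l bits.
detection = sum(comb(l, i) * p ** (l - i) * (1 - p) ** i for i in range(h + 1))
collision = sum(comb(l, i) for i in range(h + 1)) / 2 ** l

print(detection > 0.999)          # True
print(round(collision * 100, 2))  # 0.35 (percent)
```

The ratio of roughly 14.5 is what supports the "at least an order of magnitude" statement for this particular target.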
8 CONCLUSION AND FUTURE WORKS
Tracing attackers’ traffic through stepping stones is a challenging problem, especially when the attack traffic is encrypted, and its timing is manipulated (perturbed) to interfere with traffic analysis. The random timing perturbation by the adversary can greatly reduce the effectiveness of passive, timing-based correlation techniques.
We presented a novel active timing-based-correlation approach to deal with random timing perturbations. By embedding a unique watermark into the interpacket timing, with sufficient redundancy, we can make the correlation of encrypted flows substantially more robust against random timing perturbations. Our analysis and our experimental results confirm these assertions.
Our watermark-based correlation is provably effective against correlated random timing perturbation as long as the covariance of the timing perturbations on different packets is fixed. Specifically, the proposed watermark-based correlation can, with arbitrarily small average time adjustment, achieve arbitrarily close to 100 percent watermark-detection (correlation true positive) rate and arbitrarily close to 0 percent collision (correlation false positive) probability at the same time against arbitrarily large (but bounded) random timing perturbation of arbitrary distribution (or process), as long as there are enough packets in the flow to be watermarked.
Compared with previous passive-correlation approaches, our active watermark-based correlation has several advantages:
• Our active watermark-based correlation makes no assumptions about the original distribution of the interpacket timing of the original packet flow, and it does not require the adversary’s timing perturbation to follow any specific distribution or random process to be effective. This is in contrast to existing passive timing-based correlation methods, all of which assumed the interpacket timing of the original packet flow to follow some specific distribution or random process (e.g., Poisson) when deriving their bounds.
• Our active watermark-based correlation was shown to require substantially fewer packets than a representative passive timing-based correlation method to achieve a given level of robustness.
• The effectiveness of our active watermark-based correlation can be modeled more accurately. Besides identifying the provable upper bounds on the number of packets needed to achieve desired correlation effectiveness under any given level of perturbation, we have also identified the quantitative tradeoff models between the number of packets needed to achieve any desired correlation effectiveness under any given level of perturbation. Our experimental results validate the accuracy of these tradeoff models. Thus, our tradeoff models are of practical value in optimizing the overall effectiveness of watermark-based correlation in real-world situations.
We have experimentally investigated the watermark-based correlation under both iid and non-iid timing perturbations, and the experimental results confirmed our analytical conclusion that our watermark-based correlation is effective for both iid and non-iid random timing perturbations.
While our flow watermarking approach has been shown to be promising in correlating encrypted traffic in the presence of timing perturbation, it is not optimal from a coding-theoretic perspective. One interesting area of future work is to investigate how to make the flow watermarking more robust with fewer packets.
ACKNOWLEDGMENTS
This work was partially supported by the US National
Science Foundation (NSF) Grants CNS-0524286 and CCF-
0728771. Some preliminary results of this work have been
presented at the 10th ACM Conference on Computer and
Communication Security (CCS ’03), October, 2003 [31].
REFERENCES
[1] A. Blum, D. Song, and S. Venkataraman, “Detection of Interactive Stepping Stones: Algorithms and Confidence Bounds,” Proc. Seventh Int’l Symp. Recent Advances in Intrusion Detection (RAID ’04), Oct. 2004.
[2] R.C. Chakinala, A. Kumarasubramanian, R. Manokaran, G. Noubir, C. Pandu Rangan, and R. Sundaram, “Steganographic Communication in Ordered Channels,” Proc. Eighth Information Hiding Int’l Conf. (IH ’06), 2006.
[3] T.M. Cover and J.A. Thomas, Elements of Information Theory. John Wiley & Sons, Inc., 1991.
[4] I. Cox, M. Miller, and J. Bloom, Digital Watermarking. Morgan-Kaufmann Publishers, 2002.
[5] P. Danzig and S. Jamin, “Tcplib: A Library of TCP Internetwork Traffic Characteristics,” Technical Report USC-CS-91-495, Univ. of Southern California, 1991.
[6] P. Danzig, S. Jamin, R. Cacerest, D. Mitzel, and E. Estrin, “An Empirical Workload Model for Driving Wide-Area TCP/IP Network Simulations,” J. Internetworking, vol. 3, no. 1, pp. 1-26, Mar. 1992.
[7] M. DeGroot, Probability and Statistics. Addison-Wesley Publishing Company, 1989.
[8] D. Donoho et al., “Multiscale Stepping Stone Detection: Detecting Pairs of Jittered Interactive Streams by Exploiting Maximum Tolerable Delay,” Proc. Fifth Int’l Symp. Recent Advances in Intrusion Detection (RAID ’02), pp. 17-35, Oct. 2002.
[9] M.T. Goodrich, “Efficient Packet Marking for Large-Scale IP Traceback,” Proc. Ninth ACM Conf. Computer and Comm. Security (CCS ’02), pp. 117-126, Oct. 2002.
[10] T. He and L. Tong, “Detecting Encrypted Stepping-Stone Connections,” IEEE Trans. Signal Processing, vol. 55, no. 5, pp. 1612-1623, May 2006.
[11] H. Jung et al., “Caller Identification System in the Internet Environment,” Proc. Fourth USENIX Security Symp., 1993.
[12] S. Kent and R. Atkinson, RFC 2401: Security Architecture for the Internet Protocol, IETF, Sept. 1998.
[13] G. Kramer, “Generator of Self-Similar Network Traffic,” http://wwwcsif.cs.ucdavis.edu/kramer/code/trf_gen2.html, 2005.
[14] J. Li, M. Sung, J. Xu, and L. Li, “Large Scale IP Traceback in High-Speed Internet: Practical Techniques and Theoretical Foundation,” Proc. IEEE Symp. Security and Privacy, 2004.
[15] P. Moulin, “Information-Hiding Games,” Proc. Int’l Workshop Digital Watermarking (IWDW ’03), May 2003.
[16] P. Moulin and J.A. O’Sullivan, “Information-Theoretic Analysis of Information Hiding,” IEEE Trans. Information Theory, vol. 49, no. 3, pp. 563-593, Mar. 2003.
[17] NLANR Trace Archive, http://pma.nlanr.net/Traces/long/, 2005.
[18] OpenSSH, http://www.openssh.com, 2010.
[19] P. Peng, P. Ning, and D.S. Reeves, “On the Secrecy of Timing-Based Active Watermarking Trace-Back Techniques,” Proc. IEEE Symp. Security and Privacy (SP ’06), May 2006.
[20] P. Peng, P. Ning, D. Reeves, and X. Wang, “Active Timing-Based Correlation of Perturbed Traffic Flows with Chaff Packets,” Proc. Second Int’l Workshop Security in Distributed Computing Systems (SDCS ’06), June 2005.
[21] Y.J. Pyun, Y.H. Park, X. Wang, D.S. Reeves, and P. Ning, “Tracing Traffic through Intermediate Hosts that Repacketize Flows,” Proc. IEEE INFOCOM ’07, May 2007.
[22] Y.J. Pyun and D.S. Reeves, “Deployment of Network Monitors for Attack Attribution,” Proc. Fourth Int’l Conf. Broadband Comm., Networks, and Systems (Broadnets ’07), pp. 525-534, Sept. 2007.
[23] S. Savage, D. Wetherall, A. Karlin, and T. Anderson, “Practical Network Support for IP Traceback,” Proc. ACM SIGCOMM ’00, pp. 295-306, Sept. 2000.
[24] C.E. Shannon, “A Mathematical Theory of Communication,” Bell System Technical J., vol. 27, pp. 379-423, 623-656, July/Oct. 1948.
[25] S. Snapp et al., “DIDS (Distributed Intrusion Detection System) - Motivation, Architecture, and Early Prototype,” Proc. 14th Nat’l Computer Security Conf., pp. 167-176, 1991.
[26] A. Snoeren and C. Patridge et al., “Hash-Based IP Traceback,” Proc. ACM SIGCOMM ’01, pp. 3-14, Sept. 2001.
[27] S. Staniford-Chen and L. Heberlein, “Holding Intruders Accountable on the Internet,” Proc. IEEE Symp. Security and Privacy, pp. 39-49, 1995.
[28] C. Stoll, The Cuckoo’s Egg: Tracking a Spy through the Maze of Computer Espionage. Pocket Books, 2000.
[29] M.S. Taqqu, W. Willinger, and R. Sherman, “Proof of a Fundamental Result in Self-Similar Traffic Modeling,” ACM Computer Comm. Rev., vol. 27, pp. 5-23, 1997.
[30] X. Wang, S. Chen, and S. Jajodia, “Network Flow Watermarking Attack on Low-Latency Anonymous Communication Systems,” Proc. IEEE Symp. Security and Privacy (SP ’07), May 2007.
[31] X. Wang and D. Reeves, “Robust Correlation of Encrypted Attack Traffic through Stepping Stones by Manipulation of Interpacket Delays,” Proc. 10th ACM Conf. Computer and Comm. Security (CCS ’03), pp. 20-29, Oct. 2003.
[32] X. Wang, D. Reeves, and S.F. Wu, “Inter-Packet Delay Based Correlation for Tracing Encrypted Connections through Stepping Stones,” Proc. Seventh European Symp. Research in Computer Security (ESORICS ’02), pp. 244-263, Oct. 2002.
[33] X. Wang, D. Reeves, S.F. Wu, and J. Yuill, “Sleepy Watermark Tracing: An Active Network-Based Intrusion Response Framework,” Proc. 16th Int’l Conf. Information Security (IFIP/Sec ’01), pp. 369-384, June 2001.
[34] T. Ylonen and C. Lonvick, IETF Internet Draft: SSH Protocol Architecture, IETF, draft-ietf-secsh-architecture-16.txt, Work in Progress, June 2004.
[35] K. Yoda and H. Etoh, “Finding a Connection Chain for Tracing Intruders,” Proc. Sixth European Symp. Research in Computer Security (ESORICS ’00), pp. 191-205, Oct. 2002.
[36] Y. Zhang and V. Paxson, “Detecting Stepping Stones,” Proc. Ninth USENIX Security Symp., pp. 171-184, 2000.
[37] L. Zhang, A.G. Persaud, A. Johnson, and Y. Guan, “Detection of Stepping Stone Attack under Delay and Chaff Perturbations,” Proc. 25th IEEE Int’l Performance Computing and Comm. Conf. (IPCCC ’06), Apr. 2006.
Xinyuan Wang received the PhD degree in computer science from North Carolina State University in 2004 after years of professional experience in the networking industry. He is currently an associate professor of computer science at George Mason University. His main research interests are around computer networks and systems security including malware analysis and defense, attack attribution, anonymity and privacy, VoIP security, and digital forensics. He is a recipient of the US National Science Foundation (NSF) Faculty Early Career Development (CAREER) Award. He is a member of the IEEE.
Douglas S. Reeves received the PhD degree in computer science from Pennsylvania State University in 1987. In the same year, he joined the faculty of North Carolina State University, where he is currently a professor of computer science. His research interests include computer systems, security, and peer-to-peer computing. He is a member of the IEEE.