IEEE_ACM Transactions on Networking Volume 22 Issue 4 2014 [Doi 10.1109_TNET.2013.2272740] Houmansadr, Amir; Kiyavash, Negar; Borisov, Nikita -- Non-Blind Watermarking of Network Flows


1232 IEEE/ACM TRANSACTIONS ON NETWORKING, VOL. 22, NO. 4, AUGUST 2014

Non-Blind Watermarking of Network Flows

Amir Houmansadr, Member, IEEE, Negar Kiyavash, Senior Member, IEEE, and Nikita Borisov, Member, IEEE

Abstract—Linking network flows is an important problem in intrusion detection as well as anonymity. Passive traffic analysis can link flows, but requires long periods of observation to reduce errors. Active traffic analysis, also known as flow watermarking, allows for better precision and is more scalable. Previous flow watermarks introduce significant delays to the traffic flow as a side effect of using a blind detection scheme; this enables attacks that detect and remove the watermark, while at the same time slowing down legitimate traffic. We propose the first non-blind approach for flow watermarking, called RAINBOW, that improves watermark invisibility by inserting delays hundreds of times smaller than previous blind watermarks, and hence reduces the watermark interference on network flows. We derive and analyze the optimum detectors for RAINBOW as well as for passive traffic analysis under different traffic models by using hypothesis testing. Comparing the detection performance of RAINBOW and the passive approach, we observe that RAINBOW and passive traffic analysis perform similarly well in the case of uncorrelated traffic; however, the RAINBOW detector drastically outperforms the optimum passive detector in the case of correlated network flows. This justifies the use of non-blind watermarks over passive traffic analysis even though both approaches have similar scalability constraints. We confirm our analysis by simulating the detectors and testing them against large traces of real network flows.

Index Terms—Flow watermarking, hypothesis testing, non-blind watermarking, traffic analysis.

I. INTRODUCTION

INTERNET attackers commonly relay their traffic through a number of (usually compromised) hosts in order to hide their identity. Detecting such hosts, called stepping stones, is therefore an important problem in computer security. The detection proceeds by finding correlated flows entering and leaving the network. Traditional approaches have used patterns inherent in traffic flows, such as packet timings, sizes, and counts, to link an incoming flow to an outgoing one [1]–[5]. More recently, an active approach called watermarking has been considered [6]–[12]. In this approach, traffic characteristics of an incoming flow are actively perturbed as they traverse some router to create a distinct pattern, which can later be recognized in outgoing flows. These techniques also have relevance to anonymous communication, as linking two flows can be used to break

Manuscript received March 12, 2012; revised October 06, 2012 and May 28, 2013; accepted June 26, 2013; approved by IEEE/ACM TRANSACTIONS ON NETWORKING Editor S. Kasera. Date of publication July 31, 2013; date of current version August 14, 2014. This work was supported in part by the National Science Foundation under Grants CNS 0831488, CCF 10-54937 CAR, and CCF 10-65022, the Boeing Trusted Software Center, Information Trust Institute, University of Illinois, and the AFOSR under Grants FA9550-11-1-0016 and FA9550-10-1-0573.

A. Houmansadr is with the University of Texas at Austin, Austin, TX 78701 USA (e-mail: [email protected]).

N. Kiyavash and N. Borisov are with the University of Illinois at Urbana–Champaign, Urbana, IL 61801 USA (e-mail: [email protected]; [email protected]).

Digital Object Identifier 10.1109/TNET.2013.2272740

anonymity, and both passive traffic analysis [13], [14] and active watermarking [8], [9], [12], [15] have been studied in that domain as well.

The choice between passive and active techniques for traffic analysis exhibits a tradeoff. Passive approaches require observing relatively long-lived network flows and storing or transmitting large amounts of traffic characteristics. Watermarking approaches are more efficient, with shorter observation periods necessary. They are also blind: Rather than storing or communicating traffic patterns, all the necessary information is embedded in the flow itself. This, however, comes at a cost: To ensure robustness, the watermarks introduce large delays (hundreds of milliseconds) to the flows, interfering with the activity of benign users and making them subject to attacks [16], [17].

Motivated by this, we propose a new category of network flow watermarks, the non-blind flow watermarks. Non-blind watermarking lies between passive techniques and (blind) watermarking techniques: Similar to passive techniques (and unlike blind watermarks), non-blind watermarks record the traffic patterns of incoming flows and correlate them with outgoing flows. On the other hand, similar to blind watermarks (and unlike passive techniques), non-blind watermarking aids traffic analysis by applying some modifications to the communication patterns of the intercepted flows. We develop and prototype the first non-blind flow watermark, called RAINBOW. RAINBOW records the timing pattern of incoming flows and correlates it with the timing pattern of the outgoing flows. On each incoming flow, RAINBOW also inserts a watermark by delaying some packets, after recording the received timings. As such a watermark is generated independently of the flows, this diminishes the effect of natural similarities between two unrelated flows and allows a flow linking decision to be made over a much shorter time period. RAINBOW uses spread-spectrum techniques to make the delays much smaller than in previous work. RAINBOW uses delays that are on the order of only a few milliseconds; this means that RAINBOW watermarks not only do not interfere with the traffic patterns of normal users, but are also virtually invisible since the delays are of the same magnitude as natural network jitter (watermark invisibility, as studied in previous work [10], [18], [19], implies a very low probability of detection through statistical analysis). In [10], we use different information theoretical tools to verify the invisibility of RAINBOW and demonstrate its high performance in linking network flows through a prototype implementation over the PlanetLab [20] infrastructure.

In this paper, we thoroughly analyze the detection performance of the RAINBOW non-blind watermark and compare it to that of passive traffic analysis schemes. By using hypothesis testing mechanisms from detection and estimation theory [21], we find the optimum detection schemes for RAINBOW as well as the optimum passive detectors under different models for network traffic. Modeling real-world network traffic is a complicated problem as it depends on many

U.S. Government work not protected by U.S. copyright.



different parameters; as a result, we only consider two extreme models of the network traffic: 1) independent flows where each flow is modeled as a Poisson process (traffic model A), and 2) completely correlated flows where all flows are considered to have similar timing patterns (traffic model B). We assume that any real-world traffic model lies between these two extreme models. Our analysis leads to the following important conclusions.

1) Non-blind watermarking always achieves better detection than passive traffic analysis. This is an essential result in motivating the use of non-blind watermarks over passive traffic analysis since both have similar scalability constraints, i.e., both approaches have comparable communication and computation overheads [10]. Note that this point is not necessary (nor always true) to motivate the use of traditional (blind) watermarks over passive traffic analysis, since blind watermarks provide much better scalability (i.e., lower communication and computation overheads [10]).

2) Our analysis shows that the performance advantage of non-blind watermarking (over passive schemes) is only marginal for uncorrelated network traffic, while it is very significant for correlated network traffic. This knowledge can be used to decide the best traffic analysis approach in various applications. We validate our analysis through simulating the detection schemes on real network traces. In particular, we show that for highly correlated traffic, e.g., same Web page downloads, passive traffic analysis performs very poorly, while a RAINBOW watermark is highly effective.

3) We also show (through both analysis and experiments) that the optimum watermark detector derived for correlated traffic (namely SLCorr) also performs very well for uncorrelated traffic (while the optimum watermark detector for uncorrelated traffic does not do well for correlated traffic). This allows one to use SLCorr as the sole watermark detector regardless of the type of traffic being observed. This is especially useful in real-world applications where the observed traffic is a mixture of different flow types.

Note that in this paper we do not discuss the performance advantage of non-blind watermarks over traditional blind watermarks, as this has been justified in [10].

The rest of this paper is organized as follows. We review the problem of stepping stone detection and existing schemes in Section II. Our RAINBOW scheme is presented in Section III. In Section IV, we use hypothesis testing to find and analyze the optimum likelihood ratio detectors for passive and non-blind active (watermark) approaches under different traffic models and analyze their false error rates. In Section V, we validate the analysis results through simulation of the detection schemes over real network traces. We review several properties of RAINBOW in Section VI. Finally, the paper is concluded in Section VII.

II. BACKGROUND

Linking network flows is an important problem in different networking applications, e.g., stepping stone detection [1], [2]. In such applications, network flows are relayed through intermediate nodes that disguise the relation between the original and the relayed flows by encrypting packet contents and modifying packet headers.

Traffic analysis is suggested as an effective tool for linking network flows in such scenarios since the intermediate nodes do not significantly modify the traffic patterns of the relayed flows. The common patterns used for traffic analysis are the packet counts, packet timings, and packet sizes.

A. Passive Traffic Analysis

In general, passive traffic analysis techniques operate by recording characteristics of incoming streams and then correlating them with the outgoing ones. The right place to do this is often at the border router of an enterprise, so the overhead of this technique is the space used to store the stream characteristics long enough to check against correlated relayed streams, and the CPU time needed to perform the correlations. In a complex enterprise with many interconnected networks, a connection relayed through a stepping stone may enter and leave the enterprise through different points; in such cases, there is additional communications overhead for transmitting traffic statistics between border routers.

The passive schemes have explored using various characteristics for correlating streams. Zhang and Paxson [2] model interactive flows as on–off processes and detect linked flows by matching up their on–off behavior. Wang et al. [3] use the interpacket delays and devise several metrics for correlating stepping stones. Blum et al. find upper bounds of evading passive analysis for an adversary with limited perturbation freedom. He and Tong correlate packet counts to detect stepping stones [22]. Coskun et al. [23] use flow sketches, short representations of network flows, for a fast but not highly effective correlation of network flows. Hu et al. [5] use neural networks for the detection of stepping stones by making the assumption that legitimate traffic does not go through more than two relays.

B. Watermarks

To address some of the efficiency concerns of passive traffic analysis, Wang et al. proposed the use of watermarks [6]. In this scenario, a border router will modify the traffic timings of the incoming flows to contain a particular pattern, the watermark. If the same pattern is present in an outgoing flow, a stepping stone is detected.

Watermarks improve upon passive traffic analysis in two ways. First, by inserting a pattern that is uncorrelated with any other flows, they can improve the detection efficiency, requiring smaller numbers of packets to be observed (hundreds instead of thousands) and providing lower false-positive rates (10 or lower, as compared to 10 with passive watermarks). Second, they can operate in a blind fashion: After an incoming flow is watermarked, there is no need to record or communicate the flow characteristics since the presence of a watermark can be detected independently. The detection is also potentially faster, as there is no need to compare each outgoing flow to all the incoming flows within the same time frame.

Wang et al. [6] were the first to propose the use of flow watermarks for detecting stepping stones. They later suggest the use of similar techniques for linking encrypted VoIP communications [15]. Several watermark suggestions [7]–[9] embed watermark modifications in intervals of network flows to resist packet-level modifications due to the noisy communication network. Such suggestions, known as interval-based watermarks, are susceptible to an attack that intercepts multiple watermarked flows [17]. SWIRL [11] makes the watermark modifications dependent on the host flow in order to resist this multiflow attack.

An effective watermark should possess some desirable properties. First of all, a watermark should be robust to modifications of the traffic characteristics that will occur inside an enterprise network, such as jitter. The watermarks should also introduce little distortion, in that they should not significantly impact the performance of the flows. This is important because in a stepping-stone scenario, most watermarked flows will be benign. Finally, watermarks should be invisible even to attackers who specifically try to test for their presence.

III. RAINBOW WATERMARK

We next present the design of a new watermark scheme we call RAINBOW, for Robust And Invisible Non-Blind Watermark. Our scheme is robust (to passive interference) and invisible. However, to achieve invisibility while maintaining detection efficiency, we make the scheme non-blind; that is, incoming flow timings are recorded and compared with the timings of outgoing flows. This allows us to make a robust watermark test even with low-amplitude watermarks.

Suppose that a flow with the packet timing information $\{t_i^u \mid i = 1, \ldots, n+1\}$ enters the border router where it is to be watermarked (we use the superscript $u$ to denote an "unwatermarked" flow). Before embedding the watermark, the interpacket delays (IPDs) of the flow, $\tau_i^u = t_{i+1}^u - t_i^u$, are recorded in an IPD database, which is accessible by the watermark detector. The watermark is subsequently embedded by delaying the packets by an amount such that the IPD of the $i$th watermarked packet is $\tau_i^w = \tau_i^u + w_i$. The watermark components $\{w_i\}_{i=1}^{n}$ take values $\pm a$ with equal probability (the watermarker excludes any IPD smaller than $a$ from watermarking in order to avoid negative delays). The value $a$ is chosen to be small enough so that the artificial jitter caused by watermark embedding is invisible to ordinary users and attackers.1

In order to apply watermark delays on the flow, output packet $i$ is delayed by $w_0 + \sum_{j=1}^{i-1} w_j$, where $w_0$ is the initial delay applied to the first packet. This results in $\tau_i^w = \tau_i^u + w_i$, as desired. Since we cannot delay a packet for a negative amount of time, $w_0$ must be chosen large enough to prevent this from happening. Since the sequence $\{w_i\}$ is generated from a random seed, the watermarker can calculate all of the partial sums $\sum_{j=1}^{i-1} w_j$ in advance and adjust $w_0$ accordingly. If a particular random seed requires a very large initial delay $w_0$, a different seed can be chosen.
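The embedding step described above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function name `embed_watermark`, its parameters, and the use of Python's `random` module as the seeded generator are assumptions made for the example, and the exclusion of IPDs smaller than the watermark amplitude is not modeled.

```python
import random
from itertools import accumulate


def embed_watermark(timestamps, amplitude, seed):
    """Delay packets so that each IPD changes by +/- amplitude.

    Hypothetical helper: returns (watermark, delayed_timestamps).
    """
    rng = random.Random(seed)
    n = len(timestamps) - 1  # number of interpacket delays
    # One component per IPD, +amplitude or -amplitude with equal probability.
    watermark = [rng.choice((-amplitude, amplitude)) for _ in range(n)]
    # Partial sums of the watermark; the first packet gets the empty sum.
    partial = [0.0] + list(accumulate(watermark))
    # Smallest initial delay that keeps every per-packet delay non-negative.
    initial_delay = max(0.0, -min(partial))
    delayed = [t + initial_delay + p for t, p in zip(timestamps, partial)]
    return watermark, delayed
```

Because the whole watermark sequence is derived from the seed, the initial delay can be computed before the first packet is forwarded; a caller that finds the returned initial delay too large could simply retry with another seed.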

As the flow traverses the network, it accumulates extra delays. Let $d_i$ be the delay that the $i$th packet accumulates by the time it reaches the watermark detector; i.e., the packet is received at the detector at time $t_i^r = t_i^w + d_i$. The IPD values at the detector are then

$\tau_i^r = t_{i+1}^r - t_i^r = \tau_i^u + w_i + \delta_i$ (1)

where $\delta_i = d_{i+1} - d_i$ is the jitter present in the network.

1 Throughout this paper, by attacker we mean the attacker to the watermarking scheme.

As mentioned before, the RAINBOW scheme is non-blind, and therefore the detector has access to the IPD database where the unwatermarked flows are recorded. Given the IPDs of a flow observed at the detector and the IPDs of a previously recorded flow, the detector must decide whether the two flows are linked or not. In Section IV, we derive the optimum detectors for the RAINBOW watermarks according to the LRT rules. We also derive the optimum passive detectors, showing that the RAINBOW watermark performs significantly better than passive traffic analysis for correlated network flows.

IV. DETECTION APPROACHES

RAINBOW is the first non-blind flow watermarking scheme. Non-blind watermarking inherits similar scalability issues from passive traffic analysis. In this section, we show how non-blind watermarking improves the traffic analysis performance as compared to traditional passive traffic analysis.

We derive optimum likelihood ratio test (LRT) detectors for the RAINBOW watermarking scheme for different traffic models and compare its detection performance to those of optimum passive detectors. We show that RAINBOW outperforms passive traffic analysis for different traffic models; this confirms what we expect intuitively from information theory, as a non-blind watermark detector has access to more information (the watermark and the IPDs), compared to a passive detector that only has access to the IPDs. We also show that the RAINBOW detector is reliable in different models, while the optimum passive detector fails in some scenarios.

As the extreme models, we perform our detection analysis for two traffic models:

• traffic model A: independent flows with i.i.d. interpacket delays;

• traffic model B: completely correlated flows.

As it is infeasible to evaluate the detection performance for all different traffic models, we discuss the detection performance for these two traffic models, and consider any real-world network flow to lie between these two extreme models. We show that an active detector, i.e., RAINBOW, is reliable for different models, while a passive detector fails for certain traffic models.

A. Detection Primitives

We use hypothesis testing [21] to analyze the detection performance of active and passive detectors. For an active detector, we aim to distinguish between the following two hypotheses.

• $H_0$ (null hypothesis): The received flow is a new, unwatermarked flow, unlinked to the recorded flow.

• $H_1$: The received flow is the result of the recorded flow being watermarked and passed through the network.2

Also, for a passive detector, we consider the following hypothesis testing problem.

• $H_0$ (null hypothesis): The received flow is a new flow, unlinked to the other received flow under comparison.

• $H_1$: The received flow is the result of the other flow passing through the network.

2 Note that there is another possibility, namely that the received flow is a watermarked flow, but not one corresponding to the recorded flow. However, we ignore this case because errors in this scenario do not matter: If the flow is said to be watermarked, then the detection algorithm is correct, and if it is said to be unwatermarked, it will later be tested against the correct recorded flow.



Fig. 1. Comparison of observed jitter and a fitted Laplace distribution.

We find the optimum LRTs of these hypothesis testing problems. For any received flow, an LRT evaluates a test metric over the flow's IPDs and compares it to a detection threshold; if the metric exceeds the threshold, the received flow is said to be linked to the one in the detector's database. We can therefore express the false positive and false negative rates of the detector as

(2)

(3)
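The two error rates can also be estimated empirically from simulated metric values, which is a convenient cross-check for the analytical bounds derived later. The sketch below assumes that a flow is declared linked when its test metric exceeds the threshold; the function name `error_rates` and its arguments are hypothetical.

```python
def error_rates(h0_metrics, h1_metrics, threshold):
    """Empirical (false positive, false negative) rates of a threshold test.

    h0_metrics: metric values computed for unlinked flow pairs.
    h1_metrics: metric values computed for linked flow pairs.
    """
    # A false positive is an unlinked pair whose metric exceeds the threshold.
    false_pos = sum(m > threshold for m in h0_metrics) / len(h0_metrics)
    # A false negative is a linked pair whose metric does not exceed it.
    false_neg = sum(m <= threshold for m in h1_metrics) / len(h1_metrics)
    return false_pos, false_neg
```

Sweeping `threshold` over a range of values traces the tradeoff between the two errors; the point where they coincide corresponds to the crossover error rate used later in this section.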

B. Network Jitter Model

We will model network delays as i.i.d. exponential, which implies that the jitter (the difference of two delays) is i.i.d. according to a zero-mean Laplace distribution, parameterized by the variance of the jitter. Of course, in a real network, delays will have some correlation; we compare the probability density function (pdf) of real observed jitter on a connection over PlanetLab [20] to a best-fit Laplace distribution in Fig. 1. We can see that the real pdf has greater mass at 0, and the Laplace distribution has a heavier tail. This means that our analysis of error rates will be conservative since zero jitter results in no error for our detection scheme. We have also conducted similar experiments, with the same results, on the Tor anonymous network [24] to consider the other application of watermarking.
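The link between exponential delays and Laplace jitter can be checked numerically. The following sketch draws pairs of i.i.d. exponential delays and returns their differences; for a mean delay $b$, the difference follows a zero-mean Laplace distribution with scale $b$, whose variance is $2b^2$ and whose mean absolute value is $b$. The function names and the sample mean delay are assumptions made for the example.

```python
import random
import statistics


def sample_jitter(mean_delay, n, seed=0):
    """Return n jitter samples, each the difference of two i.i.d.
    exponential delays with the given mean (hypothetical helper)."""
    rng = random.Random(seed)
    rate = 1.0 / mean_delay
    return [rng.expovariate(rate) - rng.expovariate(rate) for _ in range(n)]


def laplace_variance(scale):
    """Variance of a zero-mean Laplace distribution with the given scale."""
    return 2.0 * scale ** 2


if __name__ == "__main__":
    samples = sample_jitter(mean_delay=0.01, n=100_000, seed=1)
    print("sample mean:    ", statistics.fmean(samples))
    print("sample variance:", statistics.pvariance(samples))
    print("Laplace variance:", laplace_variance(0.01))
```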

C. Traffic Model A: Independent Flows, i.i.d. IPDs

In this model, we assume that the candidate flows are independent. Also, each flow has i.i.d. IPDs, i.e., the flow is modeled with a Poisson process. This represents a good model for noninteractive network flows.

1) Passive Detection (PASSV Scheme): In this section, we find the optimum likelihood ratio test (LRT) passive detector for traffic model A. Suppose that the flow with IPDs is known to the detector. The detector will need to check if it is correlated with some received flow , where and are independent. Hence, in this case, the hypothesis testing problem is

(4)

where and represent the network jitter. Based on our measurements over PlanetLab, we model the network jitter with an i.i.d. Laplacian distribution (see Section IV-B).

In order to find the optimum LRT detector, we first need to

find the pdf of in different hypotheses, i.e., for hypoth-

esis . As the model A suggests, we model the IPDs as

i.i.d. exponential distribution. So, in hypothesis the received

signal is the summation of a Laplace and an exponential

random variable. We have that the summation of an exponen-

tial random variable and a Laplace distribution

, i.e., , is given by [25]

(5)

We use this to find

(6)

In the case of , since the is known to the detector, we

can model as a Laplacian distribution with mean . Hence

(7)

Note that even though the real-world IPDs can never be negative, the densities and return a nonzero density for negative values of the IPDs. In fact, this is due to the approximation we make in modeling the network jitter as a two-sided Laplacian distribution, and its effect is very small for ordinary network flows based on our simulations [10].

Having the densities and , we derive the optimum de-

tector based on the likelihood ratio test to be

(8)

where is the LRT detection threshold and

(9)

(10)

We define as the normalized detection threshold . A

value of of results in a MiniMax detector.

Detection Performance: Let us consider the case where the detector uses the PASSV detection scheme in order to link a received flow with IPDs to a known flow with IPDs , i.e., a registered flow. Considering the assumptions made in traffic model A, i.e., the IPDs being i.i.d., we use Lemma 1 (part b) in the Appendix to find the false positive ( ) and false negative ( ) error rates of the PASSV detector

(11)

(12)

where and

(13)

The error probabilities of and correspond to a fixed

known IPDs sequence, . The overall false errors are evaluated

by averaging and with respect to

(14)




(15)

(16)

(17)

(18)

(19)

We can represent the upper bounds of these false errors as

(20)

(21)

where

(22)

(23)

For each detection threshold , we find the tightest exponent

bounds and such that

(24)

(25)
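The optimization over the Chernoff parameter can be illustrated on a simpler stand-in than the densities behind (20) to (25). The sketch below bounds the probability that the mean of $n$ i.i.d. zero-mean Laplace variables exceeds a level $\eta$ by $e^{-n E(\eta)}$, with $E(\eta) = \max_s \{ s\eta - \log M(s) \}$ and $M(s) = 1/(1 - b^2 s^2)$ the Laplace moment generating function. The grid search and the function names are assumptions made for the example; they are not the Mathematica procedure used in the paper.

```python
import math


def laplace_log_mgf(s, scale):
    """log E[exp(s*X)] for X ~ Laplace(0, scale); finite for |s| < 1/scale."""
    return -math.log(1.0 - (scale * s) ** 2)


def chernoff_exponent(eta, scale, grid=10000):
    """Tightest Chernoff exponent found on a grid of s in [0, 1/scale)."""
    best = 0.0
    for k in range(grid):
        s = k / (grid * scale)  # stays strictly inside the convergence region
        best = max(best, s * eta - laplace_log_mgf(s, scale))
    return best


def chernoff_exponent_closed_form(eta, scale):
    """Same exponent from the stationary point of s*eta - log M(s), eta > 0."""
    s = (math.sqrt(scale ** 2 + eta ** 2) - scale) / (eta * scale)
    return s * eta - laplace_log_mgf(s, scale)
```

A larger exponent means a faster decay of the error bound with the number of samples, which is why each error exponent is maximized separately for every detection threshold.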

Analysis Results: We use Mathematica 7.0 to evaluate the

false error exponents of (24) and (25). The parameters used for

the simulations are s and pps, borrowed

from [10]. Fig. 2 plots the tightest bounds for the error expo-

nents of and for different thresholds of .

Note that the optimum varies with the decision threshold. For

, the false positive and false negative errors are equal;

we name this error rate the crossover error rate (COER). For

the mentioned setting of the variables, the COER exponent of

the PASSV detector is equal to 1.06396.

2) Active Detection (ACTV Scheme): In this section, we find the optimum LRT detector for the RAINBOW non-blind watermark for traffic model A. We have the following hypothesis testing problem:

(26)

where ’s are the IPDs registered in the IPD database, and ’s

are the IPDs of an independent flow. As before, in order to find

the optimum LRT detector, we need to find the distribution of

in different hypotheses. Using (5), we find the corresponding

pdf under as

(27)

Fig. 2. Analytical false positive and false negative error exponents of the PASSV detection scheme for different detection thresholds (traffic model A).

Since and are known to the detector, we find the pdf in

hypothesis as the following:

(28)

Thus, the optimum detector based on the likelihood ratio test

is

(29)

where is the LRT detection threshold and

(30)

(31)

Detection Performance: As before, considering the in-

dependence of the IPDs and also the watermark bits, we use

Lemma 1 (part b) in the Appendix to find the error probabilities

of the ACTV detector for a given and

(32)

(33)

where , and

(34)

As and correspond to a fixed IPDs sequence

and the watermark , we evaluate the overall false errors by

averaging and with respect to and

(35)




(36)

(37)

(38)

(39)

(40)

The approximated upper bounds can be formulated as

(41)

(42)

where

(43)

(44)

Finally, the tightest bounds for each are found by maxi-

mizing the error exponents with respect to the parameter

(45)

(46)

Analysis Results: Using Mathematica 7.0, we evaluate the false error exponents of (45) and (46). As before, we use the parameters s, s, and pps for the simulations. Fig. 3 plots the tightest bounds for the error exponents of and for different thresholds of . The COER exponent occurs for and is equal to 1.06828, which is slightly better than that of the PASSV detector evaluated before, i.e., 1.06396.

D. Traffic Model B: Correlated Flows, Correlated IPDs

As the other extreme of traffic models, we investigate the traffic model with correlated IPDs. We consider the case where all of the network flows have the same IPDs, i.e., any two flows have identical IPD sequences. This model captures the behavior of a number of widely used types of traffic, including bulk file transfers, browsing the same or similar Web sites, VoIP voice/video calls, video streaming, etc. In fact, as we demonstrate in this paper through analysis and simulations, passive traffic analysis is highly inefficient in linking this kind of traffic, while watermarking provides promising detection.

Fig. 3. Analytical false positive and false negative error exponents of the ACTV detection scheme for different detection thresholds (traffic model A).

1) Passive Detection: In this model, a passive detection faces

the following hypothesis testing problem:

(47)

where . The optimum LRT detector for this problem is the random guessing

(48)

where is a uniform random variable. The detection rule

is

(49)

Detection Performance: Since the detector is based on

random guessing, the false errors are as follows:

(50)

(51)

where is determined by the choice of .

2) Active Detection (SLCorr Scheme): In this case, we have

the following hypothesis testing problem:

(52)

Since , this can be reduced to the following

hypothesis testing:

(53)




Fig. 4. Block diagram of the SLCorr detection scheme.

where . The optimum LRT detector for this

problem can be found considering the distribution of in

different hypotheses

(54)

(55)

Thus, we can derive the LRT detection metric as

(56)

which can be expressed as:

(57)

(58)

is a soft-limiter with breakpoints at and ( is

the watermark amplitude as defined before)

(59)

We can reformulate the optimum detection rule as

(60)

where

(61)

and

(62)

We call this detector SLCorr, as it is composed of a soft limiter followed by a correlation block. From a communications point of view, the soft limiter is useful in reducing the signal detection noise in channels with Laplacian-distributed noise. We will use this as the detection scheme for the RAINBOW watermark, as will be discussed later. Fig. 4 shows the block diagram of the SLCorr detector. SLCorr is a MiniMax detector for a detection threshold of .
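To make the soft-limiter structure concrete, the sketch below evaluates the log-likelihood ratio that follows directly from the stated model: the residual (observed IPD minus recorded IPD) is Laplace jitter alone under the null hypothesis, and jitter plus a watermark component of amplitude $a$ otherwise. For Laplace noise, each per-sample term is proportional to $|z| - |z - w|$, which is a clipped linear function of the residual $z$. This is an illustrative derivation under those assumptions and may differ from the exact form of (56) to (62) by a rescaling; the function names are hypothetical.

```python
import random


def soft_limit(x, low, high):
    """Clip x to the interval [low, high]."""
    return min(max(x, low), high)


def laplace_llr_metric(observed_ipds, recorded_ipds, watermark, amplitude):
    """Average per-sample log-likelihood ratio, up to a positive constant.

    Positive values favor "linked and watermarked"; negative values favor
    "unlinked". The 1/scale factor of the Laplace density is omitted because
    it only rescales the detection threshold.
    """
    total = 0.0
    for y, tau, w in zip(observed_ipds, recorded_ipds, watermark):
        z = y - tau  # residual: jitter alone, or jitter plus watermark
        sign = 1.0 if w > 0 else -1.0
        # |z| - |z - w| == 2 * clip(sign * z, 0, a) - a for w = sign * a
        total += 2.0 * soft_limit(sign * z, 0.0, amplitude) - amplitude
    return total / len(watermark)
```

The clipping step bounds the contribution of any single residual, so that an occasional large jitter value cannot dominate the sum; this is the role the text attributes to the soft limiter in Laplacian noise.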

Detection Performance: The SLCorr test metric is given

in (60) to (62). Let us define and as the pdf of

in hypothesis and , respectively. We have that

(63)

(64)

Based on these, we can evaluate and , namely the

pdf of under hypothesis and , respectively

(65)

(66)

Considering that the distributions and are

i.i.d. with , we use the Chernoff bound [part (c) of Lemma 1

in the Appendix] to find the error probabilities of the SLCorr

detector

(67)

(68)

where is the normalized detection threshold. We have that

(69)

and

(70)

We can express the above and false errors as

(71)

(72)

where

(73)

(74)


Finally, the tightest bounds are found by maximizing the error exponents with respect to the parameter

(75)

(76)


Analysis Results: Using Mathematica 7.0, we evaluate the false error exponents of (75) and (76). The parameters used for the simulations are s and s. Fig. 5 plots the tightest bounds for the error exponents of and for different thresholds of . The COER exponent occurs for and is equal to 0.0945.

Fig. 5. Analytical false positive and false negative error exponents of SLCorr for different detection thresholds (traffic model B).

E. Discussion

Above, we derived the optimum passive and active detectors for the traffic analysis problem and evaluated their performance by finding the Chernoff upper bounds of their false error rates. In this section, we use the asymptotic relative efficiency (ARE) as a tool to compare their detection performances.

The ARE is a measure for comparing two discrete-time detection schemes. For two discrete detection schemes $D_1$ and $D_2$, the ARE metric is defined as $\mathrm{ARE}_{1,2} = \lim_{n_1 \to \infty} n_2 / n_1$, where $n_1$ is the number of $D_1$'s samples. The parameter $n_2$ is the smallest number of samples that results in $D_2$'s error rate being smaller than or equal to the error rate of $D_1$ (with $n_1$ samples). An ARE metric of $\mathrm{ARE}_{1,2} > 1$ indicates that $D_1$ is asymptotically more efficient than $D_2$. Chernoff [26] finds the ARE metric of two detectors $D_1$ and $D_2$ using their Chernoff error upper bounds as

(77)

where and are the error exponents of the Chernoff upper

bounds for and detectors, respectively.

Using the analysis results from Sections IV-C and IV-D, we can derive the ARE metric of the optimum passive and active detectors for the two traffic models as

(78)

(79)

This asserts that the optimum active detector outperforms the optimum passive detector in both traffic models A and B (which is intuitively expected from information theory). Importantly, the active detector's advantage is very small for traffic model A, whereas it significantly outperforms the optimum passive detector in traffic model B, i.e., the correlated traffic. In other words, the active detector provides very good detection performance for different traffic models, whereas passive detection performs very poorly for more correlated network traffic.
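The comparison can be reproduced from the COER exponents reported above. The sketch assumes that the ARE of two detectors is the ratio of their Chernoff error exponents, which is how Chernoff's efficiency measure is usually stated, and it treats the random-guessing passive detector of traffic model B as having a zero exponent because its error rates do not decay with the number of samples. The function name is hypothetical.

```python
import math


def are_from_exponents(exponent_1, exponent_2):
    """ARE of detector 1 relative to detector 2 from their error exponents.

    Assumes ARE is the ratio exponent_1 / exponent_2. A zero exponent for
    detector 2 (errors that never decay) yields an infinite ARE.
    """
    if exponent_2 == 0:
        return math.inf if exponent_1 > 0 else math.nan
    return exponent_1 / exponent_2


if __name__ == "__main__":
    # Traffic model A: ACTV (1.06828) versus PASSV (1.06396).
    print(are_from_exponents(1.06828, 1.06396))
    # Traffic model B: SLCorr (0.0945) versus random guessing (0).
    print(are_from_exponents(0.0945, 0.0))
```

Under these assumptions the ratio for traffic model A is only slightly above 1, while for traffic model B it is unbounded, which matches the qualitative conclusion drawn in the text.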

In the rest of this section, we analyze the performance of the SLCorr scheme under traffic model A, showing that even though SLCorr is not the optimum detector for traffic model A, it provides very good detection performance under this model. Based on this, we choose SLCorr as the sole detector for RAINBOW, regardless of the behavior of the network flows. This simplifies the watermark detection, as real-world traffic is a combination of models A and B, and the detection can be performed regardless of the type of the received traffic. We also analyze the performance of the PASSV and ACTV detectors under traffic model B, showing their inefficiency in this model.

1) SLCorr Detection Performance for Traffic Model A: The SLCorr scheme is the optimum active detector for traffic model B, but not for traffic model A. In this section, we show that SLCorr achieves good detection performance even under traffic model A, allowing a system designer to use it as the sole detection scheme regardless of the type of traffic. SLCorr faces the following hypothesis testing problem under traffic model A:

(80)

Considering SLCorr’s detection metric, given in (60) to (62),

one can rewrite the hypothesis testing problem as

(81)

where . Let us assume and as the pdfs

of and , respectively. We have that

(82)

(83)

Also, based on the summation of two Laplace distributions

given in [25], we have that

(84)

(85)

(86)

Now, let us define and as the pdfs of under

hypotheses and , respectively. We derive as

(87)

Also, using (83), we derive as

(88)




Based on these distributions and using the Chernoff bounds for signal detection (part c of Lemma 1 in the Appendix), we find the error probabilities of the detector to be

(89)

(90)

where we have

(91)

(92)

and

(93)

(94)

As before, we can express the above false error rates as

(95)

(96)

where

(97)

(98)


mizing the error exponents with respect to the parameter

(99)

(100)


false error exponents of (99) and (100). The parameters used for the simulations are s, pps, and s. Fig. 6 plots the tightest bounds for the false-positive and false-negative error exponents for different thresholds. The COER exponent occurs for s and is equal to 0.0228. Also, Fig. 7 shows the COER exponent with respect to different values of the watermark amplitude. As we can see, increasing the watermark amplitude improves the detection performance (but reduces the watermark invisibility, as discussed in [10]).

Fig. 6. Analytical false-positive and false-negative error exponents of SLCorr for different threshold values (traffic model A).

Fig. 7. COER error exponent of SLCorr in traffic model A for different watermark amplitudes.

2) Detection Performance of PASSV and ACTV Schemes for Traffic Model B: As derived before, the PASSV and ACTV schemes are the optimum passive and active detectors for traffic model A. We show that PASSV and ACTV perform very poorly under traffic model B, i.e., the correlated traffic. This is unlike the SLCorr detector, which works well for both traffic models.

Under traffic model B, the PASSV detector faces the hypothesis testing problem of (47) with . One can see that in this case the PASSV detection rule described in Section IV-C.1 is exactly the same under both hypotheses. This means that the false positive error rate of the PASSV scheme for correlated flows is equal to its true positive rate, which makes the PASSV scheme equivalent to a random guessing detector. Similarly, for traffic model B the ACTV scheme deals with the hypothesis testing problem of (52) with . Our analysis and simulations in Mathematica confirm that the ACTV detection metric results in very close values under the two hypotheses, rendering the ACTV detection scheme ineffective for network flows in traffic model B (we skip the details due to space constraints).




TABLE I

FALSE POSITIVE RATE OF DIFFERENT DETECTION SCHEMES FOR PORT 443 NETWORK FLOWS. EACH EXPERIMENT IS RUN FOR 10 000 DIFFERENT PAIRS OF FLOWS

TABLE II

FALSE POSITIVE RATE OF DIFFERENT DETECTION SCHEMES FOR PORT 25 NETWORK FLOWS. EACH EXPERIMENT IS RUN FOR 10 000 DIFFERENT PAIRS OF FLOWS

V. SIMULATION RESULTS

In this section, we evaluate the performance of the three detection schemes introduced before, i.e., SLCorr, ACTV, and PASSV, by simulating them over real-world traffic. We show that SLCorr outperforms the other detectors on real-world network flows due to the intrinsic correlations among such flows. We use the CAIDA network traces gathered in January 2009 [27] for our simulations, and we have implemented the detection schemes in C++. From the CAIDA traces, we extract three types of network flows: TCP ports 443 (HTTPS), 25 (SMTP), and 22 (SSH). We only select flows with rates lower than 30 pps (this is because the parameters of the optimum detectors depend on the rate of the flows). In all of the simulations, the detectors use the detection thresholds derived through analysis in the previous sections, i.e., 0.001 for SLCorr, 0 for ACTV, and 0 for PASSV.

TABLE III

FALSE POSITIVE RATE OF DIFFERENT DETECTION SCHEMES FOR PORT 22 NETWORK FLOWS. EACH EXPERIMENT IS RUN FOR 10 000 DIFFERENT PAIRS OF FLOWS

In the first set of our simulations, we evaluate the false positive error rate of the three detection schemes for the network flows mentioned above. For each detection scheme, we run the detection algorithm for 10 000 different pairs of network flows. In order to show the effect of the number of packets on the detection performance, we run the experiments for four different values of this parameter, i.e., 25, 50, 100, and 200. Tables I–III show the false positive rates of the experiments along with some statistics on the detection metrics for the three TCP ports 443, 25, and 22, respectively. The results show that, in most cases, the SLCorr scheme yields smaller false positive errors than the ACTV and PASSV schemes. This is because real network flows deviate from the Poisson traffic model, for which ACTV and PASSV are optimized, due to the intrinsic dependencies among the packets of real network flows. The SLCorr detector, on the other hand, is the optimum detector for correlated network flows, and it also provides reasonable detection performance for Poisson-modeled network flows. Comparing the results for the three different traffic types (Tables I–III), we observe that the ACTV and PASSV schemes perform the worst for the SSH traffic (TCP port 22); we explain this by the fact that SSH flows are more correlated than HTTPS and SMTP flows, as they are driven by the typing behavior of human users. Another general observation from the simulations is that the detection performance improves as the number of packets increases.
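The false positive experiment can be sketched in a few lines of Python. This is only a sketch under stated assumptions: the flows are synthetic Poisson flows rather than CAIDA traces, `correlation_metric` is a plain correlation stand-in and not the SLCorr, ACTV, or PASSV metric derived in the paper, and the threshold used in `demo` is a hypothetical choice rather than one of the analytically derived thresholds. All function names are hypothetical.

```python
import random
from typing import Callable, List, Sequence, Tuple

Flow = Sequence[float]  # inter-packet delays (IPDs) in seconds


def poisson_flow(rng: random.Random, rate_pps: float, n: int) -> List[float]:
    """IPDs of a synthetic Poisson flow: i.i.d. exponential, mean 1/rate."""
    return [rng.expovariate(rate_pps) for _ in range(n)]


def correlation_metric(original: Flow, candidate: Flow,
                       watermark: Sequence[float]) -> float:
    """Stand-in metric (an assumption, not SLCorr):
    mean of w_i * (candidate IPD_i - original IPD_i)."""
    n = len(watermark)
    return sum(w * (c - o)
               for w, c, o in zip(watermark, candidate, original)) / n


def false_positive_rate(pairs: Sequence[Tuple[Flow, Flow]],
                        watermark: Sequence[float],
                        threshold: float,
                        metric: Callable[..., float] = correlation_metric
                        ) -> float:
    """Fraction of unrelated (original, candidate) pairs declared linked."""
    if not pairs:
        raise ValueError("need at least one pair of flows")
    hits = sum(1 for o, c in pairs if metric(o, c, watermark) > threshold)
    return hits / len(pairs)


def demo(seed: int = 1, n: int = 200, n_pairs: int = 1000,
         rate_pps: float = 10.0, amplitude: float = 0.01) -> float:
    """Empirical false positive rate over independent synthetic flows."""
    rng = random.Random(seed)
    watermark = [rng.choice((-amplitude, amplitude)) for _ in range(n)]
    pairs = [(poisson_flow(rng, rate_pps, n), poisson_flow(rng, rate_pps, n))
             for _ in range(n_pairs)]
    threshold = 0.5 * amplitude ** 2  # hypothetical threshold for this sketch
    return false_positive_rate(pairs, watermark, threshold)
```

Calling `demo()` with different values of `n` gives a feel for how the number of packets affects the false positive rate in this simplified setting; the numbers it produces are not comparable to those in Tables I–III.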

TABLE IV

FALSE NEGATIVE RATE OF DIFFERENT DETECTION SCHEMES FOR PORT 443 NETWORK FLOWS. EACH EXPERIMENT IS RUN FOR 10 000 DIFFERENT PAIRS OF FLOWS

TABLE V

FALSE NEGATIVE RATE OF DIFFERENT DETECTION SCHEMES FOR PORT 25 NETWORK FLOWS. EACH EXPERIMENT IS RUN FOR 10 000 DIFFERENT PAIRS OF FLOWS

TABLE VI

FALSE NEGATIVE RATE OF DIFFERENT DETECTION SCHEMES FOR PORT 22 NETWORK FLOWS. EACH EXPERIMENT IS RUN FOR 10 000 DIFFERENT PAIRS OF FLOWS

In the second set of experiments, we run the simulated detection schemes to measure the false negative error rates. Again, we use the detection thresholds derived through the analysis in previous sections. In each simulation of the SLCorr and ACTV schemes, the candidate network flow is watermarked using the RAINBOW scheme (Section III), and then a network delay is randomly selected from a large pool of network delays measured over the PlanetLab infrastructure [20] and applied to that flow (the average standard deviation of the network delay is around 10 ms). Likewise, for the PASSV simulations, the candidate network flow is delayed in the same way to simulate the network interference. The delayed flow is then correlated with the original flow (nondelayed and nonwatermarked) using each of the detection schemes. Tables IV–VI show the false negative rates of the experiments for the three different detection schemes, evaluated for three different TCP ports. For the watermark detection schemes SLCorr and ACTV, the experiments are repeated for four different values of the watermark amplitude. Also, all of the simulations are run for different values of the watermark length. The results show that by choosing reasonable parameters for the RAINBOW watermark, the SLCorr and ACTV detection schemes achieve very small false negative rates, comparable to those of the passive detection. Again, we see that increasing the watermark length improves the detection performance.
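The watermark, delay, and correlate pipeline of this experiment can be sketched as follows. The sketch rests on several assumptions that are not taken from the paper: the watermark is modeled as a random ±amplitude change added to each inter-packet delay, the network jitter is drawn from a synthetic Laplace distribution instead of the PlanetLab measurements, the detector is a normalized correlation stand-in rather than SLCorr or ACTV, and the threshold of 0.5 is hypothetical. All function names are hypothetical.

```python
import random
from typing import List, Sequence


def embed_watermark(ipds: Sequence[float],
                    watermark: Sequence[float]) -> List[float]:
    """Assumed embedding: shift each inter-packet delay by w_i."""
    return [t + w for t, w in zip(ipds, watermark)]


def add_laplace_jitter(ipds: Sequence[float], rng: random.Random,
                       scale: float) -> List[float]:
    """Add Laplace(0, scale) noise to every IPD.

    The difference of two i.i.d. exponentials with mean `scale` is
    Laplace(0, scale). Negative IPDs are not clamped in this sketch.
    """
    return [t + rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
            for t in ipds]


def normalized_metric(original: Sequence[float], received: Sequence[float],
                      watermark: Sequence[float], amplitude: float) -> float:
    """Stand-in detector: about 1 if watermarked, about 0 otherwise."""
    n = len(watermark)
    total = sum(w * (r - o)
                for w, r, o in zip(watermark, received, original))
    return total / (n * amplitude ** 2)


def false_negative_rate(n: int, amplitude: float, jitter_scale: float,
                        trials: int = 200, threshold: float = 0.5,
                        rate_pps: float = 10.0, seed: int = 7) -> float:
    """Fraction of watermarked, jittered flows that the detector misses."""
    rng = random.Random(seed)
    misses = 0
    for _ in range(trials):
        original = [rng.expovariate(rate_pps) for _ in range(n)]
        watermark = [rng.choice((-amplitude, amplitude)) for _ in range(n)]
        received = add_laplace_jitter(
            embed_watermark(original, watermark), rng, jitter_scale)
        metric = normalized_metric(original, received, watermark, amplitude)
        if metric <= threshold:
            misses += 1
    return misses / trials
```

Because the detector subtracts the recorded original delays before correlating, the statistics of the original flow cancel out in this sketch, and only the ratio between the watermark amplitude and the jitter (together with the watermark length) determines the miss rate.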

In the third set of experiments, we evaluate the false positive error rate of the three detection schemes over highly correlated network flows. More specifically, we use flow traces corresponding to Web browsing activities of human users that target the same destination Web sites at different times and from different network locations.3 Table VII shows the false positive error rates of the different detection schemes for different Web sites and for different values of (each simulation is averaged over 100 runs). As can be seen, in most cases the ACTV and PASSV detection schemes result in very high false positive rates, while the SLCorr scheme results in no false positive errors in any of the cases. This confirms what we expect intuitively: the PASSV and ACTV schemes are the optimum passive and active detection schemes for independent network traffic models, but they perform poorly as the network flows become more correlated. The SLCorr scheme, however, is the optimum detection scheme for correlated network flows, and it also performs well enough in the case of independent network flows.

3The traces were generated and provided to us by X. Gong from the University of Illinois at Urbana–Champaign, Urbana, IL, USA.

VI. OTHER WATERMARK PROPERTIES

A. Invisibility

The pioneering designs for flow watermarking [6]–[9] fail to provide invisibility due to their use of large packet delays. Examples of attacks against these schemes include [16] and [17]. This motivated the design of a new generation of schemes such as RAINBOW [10], SWIRL [11], and [28], which work by inserting smaller delay values. In particular, RAINBOW’s invisibility was studied in several recent works [10], [18], [19]. More specifically, Houmansadr et al. [10] use several statistical tools to analyze RAINBOW’s invisibility for different values of the watermarking parameters. More recently, Lin and Hopper [19] analyzed watermark invisibility for several flow watermarking schemes, including RAINBOW; they showed that an improper choice of watermarking parameters, e.g., large watermark amplitudes, can give away the presence of the watermark. Another analysis was provided by Luo et al. [18], where the scheme was tested against BACKLIT. The authors show that when a watermark is applied only on one side of a TCP connection, it can be detected. To fix this, a watermark should be applied on both sides of a connection if the carrying transport protocol is TCP.

B. Robustness to Packet Modifications

A practical watermark detector should withstand packet additions and removals. In [10], we showed that RAINBOW resists packet additions/removals of up to 20% of the flow length. This is achieved by adding a preprocessing step at the decoder, known as the matching step. We refer the interested reader to [10] for more details.

C. Robustness to Active Attacks

We note that active robustness and invisibility are likely to be impossible to achieve simultaneously. This is because, to be invisible, a watermarking scheme must introduce only small changes to the packet stream. In particular, it cannot introduce jitter exceeding a few milliseconds, as otherwise it would stand apart from the natural network jitter. On the other hand, an active attacker may be willing to introduce large delays, for example 500 ms as suggested in previous work, hence practically wiping out the watermark. Furthermore, it is easy to imagine an attacker determined to hide his tracks using even more drastic measures, such as inserting dummy packets to generate a completely independent Poisson process [4], which will render any linking technique ineffective. As such, RAINBOW is designed to detect stepping stones when the attacker is unwilling (or unable) to actively distort his stream as it crosses a stepping stone. Furthermore, as the watermark is invisible, the attacker will not be able to tell that he is being traced and, thus, will be less likely to apply costly watermark countermeasures.

TABLE VII

FALSE POSITIVE ERROR RATE OF DIFFERENT DETECTION SCHEMES FOR NETWORK FLOWS GENERATED BY BROWSING THE SAME WEB SITES

VII. CONCLUSION

In this paper, we introduce the first non-blind active traffic analysis scheme, RAINBOW. Using tools from detection and estimation theory, we find the optimum passive and (non-blind) active traffic analysis schemes for different types of network flows. We show that, for different traffic models, the optimum active detectors outperform the optimum passive detectors. This advantage is more significant for more correlated network traffic, e.g., Web browsing traffic. Considering that both passive and non-blind active approaches to traffic analysis are constrained by similar scalability issues, this finding motivates the use of non-blind active approaches over passive approaches.

APPENDIX

CHERNOFF BOUNDS

Lemma 1 (Chernoff Bound for Signal Detection): Consider

the following binary hypothesis testing for signal detection:

(101)

For this hypothesis testing, consider a detection scheme with

rule

such that . We are interested in finding the false positive rate and the false negative rate of this detector in different cases. We have that [21]:

a) General case:

(102)

(103)

where is the cumulant generating function (CGF)

of under hypothesis .

b) Independent ’s: We have that

where corresponds to hypothesis . This results in the

error rates to be

(104)

(105)

For , this reduces to

(106)

(107)

where

(108)

c) i.i.d. ’s: For any and , we have that

, which reduces the false error rates to

(109)

(110)

For , this reduces to

(111)

(112)

where

(113)
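For the i.i.d. case, the standard Chernoff bound states that the probability that a sum of n i.i.d. terms reaches n·τ is at most exp(-n · sup_{s ≥ 0} [s·τ − μ(s)]), where μ is the cumulant generating function of one term; the notation here may differ from that of (109)–(113). The following Python sketch evaluates this exponent numerically by a ternary search over s. The function names are hypothetical, and the Gaussian and Laplace CGFs are included only as sample inputs with known closed forms.

```python
import math
from typing import Callable


def gaussian_cgf(mean: float, std: float) -> Callable[[float], float]:
    """CGF of a Gaussian: mu(s) = mean * s + (std * s)^2 / 2."""
    return lambda s: mean * s + 0.5 * (std * s) ** 2


def laplace_cgf(scale: float) -> Callable[[float], float]:
    """CGF of a zero-mean Laplace: mu(s) = -log(1 - (scale * s)^2).

    Only valid for |s| < 1 / scale, so choose s_max below that limit.
    """
    return lambda s: -math.log(1.0 - (scale * s) ** 2)


def chernoff_exponent(cgf: Callable[[float], float], tau: float,
                      s_max: float, iters: int = 200) -> float:
    """Maximize s * tau - cgf(s) over s in [0, s_max] by ternary search.

    The objective is concave in s because a CGF is convex.
    """
    lo, hi = 0.0, s_max
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if m1 * tau - cgf(m1) < m2 * tau - cgf(m2):
            lo = m1
        else:
            hi = m2
    s = 0.5 * (lo + hi)
    return max(0.0, s * tau - cgf(s))


def chernoff_bound(cgf: Callable[[float], float], tau: float,
                   n: int, s_max: float) -> float:
    """Upper bound exp(-n * exponent) on P(sum of n terms >= n * tau)."""
    return math.exp(-n * chernoff_exponent(cgf, tau, s_max))
```

For a standard Gaussian and τ = 0.5, the exponent has the closed form τ²/2 = 0.125, which gives a simple sanity check for the search.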

REFERENCES

[1] S. Staniford-Chen and L. T. Heberlein, “Holding intruders accountable on the Internet,” in Proc. IEEE S&P, 1995, pp. 39–49.

[2] Y. Zhang and V. Paxson, “Detecting stepping stones,” in Proc. USENIX Security, 2000, vol. 9, p. 13.

[3] X. Wang, D. Reeves, and S. F. Wu, “Inter-packet delay based correlation for tracing encrypted connections through stepping stones,” in Proc. ESORICS, 2002, pp. 244–263.

[4] A. Blum, D. X. Song, and S. Venkataraman, “Detection of interactive stepping stones: Algorithms and confidence bounds,” in Proc. RAID, 2004, pp. 258–277.

[5] H.-C. Wu and S.-H. S. Huang, “Neural networks-based detection of stepping-stone intrusion,” Expert Syst. Appl., vol. 37, no. 2, pp. 1431–1437, Mar. 2010.

[6] X. Wang and D. S. Reeves, “Robust correlation of encrypted attack traffic through stepping stones by manipulation of interpacket delays,” in Proc. ACM CCS, 2003, pp. 20–29.

[7] Y. Pyun, Y. Park, X. Wang, D. S. Reeves, and P. Ning, “Tracing traffic through intermediate hosts that repacketize flows,” in Proc. IEEE INFOCOM, 2007, pp. 634–642.

[8] X. Wang, S. Chen, and S. Jajodia, “Network flow watermarking attack on low-latency anonymous communication systems,” in Proc. IEEE S&P, 2007, pp. 116–130.

[9] W. Yu, X. Fu, S. Graham, D. Xuan, and W. Zhao, “DSSS-based flow marking technique for invisible traceback,” in Proc. IEEE S&P, 2007, pp. 18–32.

[10] A. Houmansadr, N. Kiyavash, and N. Borisov, “RAINBOW: A robust and invisible non-blind watermark for network flows,” in Proc. NDSS, 2009.

[11] A. Houmansadr and N. Borisov, “SWIRL: A scalable watermark to detect correlated network flows,” in Proc. NDSS, 2011.

[12] A. Houmansadr, “Design, analysis, and implementation of effective network flow watermarking schemes,” Ph.D. dissertation, Dept. ECE, Univ. Illinois at Urbana–Champaign, Urbana, IL, USA, 2012.

[13] B. N. Levine, M. K. Reiter, C. Wang, and M. Wright, “Timing attacks in low-latency mix systems,” in Proc. FC, 2004, pp. 251–265.

[14] G. Danezis, “The traffic analysis of continuous-time mixes,” in PETS, 2004, pp. 35–50.

[15] X. Wang, S. Chen, and S. Jajodia, “Tracking anonymous peer-to-peer VoIP calls on the internet,” in Proc. ACM CCS, 2005, pp. 81–91.

[16] P. Peng, P. Ning, and D. S. Reeves, “On the secrecy of timing-based active watermarking trace-back techniques,” in Proc. IEEE S&P, 2006, pp. 335–349.

[17] N. Kiyavash, A. Houmansadr, and N. Borisov, “Multi-flow attacks against network flow watermarking schemes,” in Proc. USENIX Security, 2008, pp. 307–320.

[18] X. Luo, P. Zhou, J. Zhang, R. Perdisci, W. Lee, and R. K. C. Chang, “Exposing invisible timing-based traffic watermarks with BACKLIT,” in Proc. ACSAC, 2011, pp. 197–206.

[19] Z. Lin and N. Hopper, “New attacks on timing-based network flow watermarks,” in Proc. USENIX Security, 2012, p. 20.

[20] A. Bavier, M. Bowman, B. Chun, D. Culler, S. Karlin, S. Muir, L. Peterson, T. Roscoe, T. Spalink, and M. Wawrzoniak, “Operating systems support for planetary-scale network services,” in Proc. NSDI, 2004, vol. 1, p. 19.

[21] H. V. Poor, An Introduction to Signal Detection and Estimation. New York, NY, USA: Springer-Verlag, 1998.

[22] T. He and L. Tong, “Detecting encrypted stepping-stone connections,” IEEE Trans. Signal Process., vol. 55, no. 5, pp. 1612–1623, May 2007.

[23] B. Coskun and N. Memon, “Online sketching of network flows for real-time stepping-stone detection,” in Proc. ACSAC, 2009, pp. 473–483.

[24] R. Dingledine, N. Mathewson, and P. Syverson, “Tor: The second-generation onion router,” in Proc. USENIX Security, 2004, vol. 13, p. 21.

[25] S. Kotz, T. Kozubowski, and K. Podgórski, The Laplace Distribution and Generalizations: A Revisit with Applications to Communications, Economics, Engineering, and Finance, ser. Progress in Mathematics. Cambridge, MA, USA: Birkhäuser, 2001.

[26] H. Chernoff, “A measure of asymptotic efficiency for tests of a hypothesis based on the sum of observations,” Ann. Math. Statist., vol. 23, pp. 493–507, 1952.

[27] C. Walsworth, E. Aben, K. C. Claffy, and D. Andersen, “The CAIDA anonymized 2009 Internet traces—January,” 2009 [Online]. Available: http://www.caida.org/data/passive/passive_2009_dataset.xml

[28] X. Gong, M. Rodrigues, and N. Kiyavash, “Invisible flow watermarks for channels with dependent substitution and deletion errors,” in Proc. IEEE ICASSP, 2012, pp. 1773–1776.

Amir Houmansadr (S’09–M’13) received the Ph.D. degree in electrical and computer engineering from the University of Illinois at Urbana–Champaign, Urbana, IL, USA, in 2012.

He is currently a Postdoctoral Researcher with the Computer Science Department, University of Texas at Austin, Austin, TX, USA. His research revolves around network security and privacy, particularly network traffic analysis, anonymous communications, censorship circumvention, and covert channels.

Dr. Houmansadr has received several awards, including the Best Practical Paper Award at the IEEE Symposium on Security and Privacy 2013.

Negar Kiyavash (S’06–M’06–SM’13) received the B.S. degree from the Sharif University of Technology, Tehran, Iran, in 1999, and the M.S. and Ph.D. degrees from the University of Illinois at Urbana–Champaign, Urbana, IL, USA, in 2003 and 2006, respectively, all in electrical and computer engineering.

She is an Assistant Professor with the Department of Industrial and Enterprise Systems Engineering (ISE), University of Illinois at Urbana–Champaign. Her research interests are in information theory and statistical signal processing with applications to computer, communication, and multimedia security.

Dr. Kiyavash is a recipient of the NSF CAREER and AFOSR YIP awards.

Nikita Borisov (M’06) received the Ph.D. degree in computer science from the University of California, Berkeley, CA, USA, in 2005.

He is an Associate Professor with the Department of Electrical and Computer Engineering, University of Illinois at Urbana–Champaign, Urbana, IL, USA. His research interests include anonymity, network security, and privacy.

Dr. Borisov is a recipient of the NSF CAREER Award.
