11
Predictable Design of Network-Based Covert Communication Systems Ronald W. Smith, and G. Scott Knight Computer Security Laboratory Royal Military College Kingston, Ontario, Canada [email protected] [email protected] Abstract This paper presents a predictable and quantifiable ap- proach to designing a covert communication system capa- ble of effectively exploiting covert channels found in the var- ious layers of network protocols. Two metrics are developed that characterize the overall system. A measure of proba- bility of detection is derived using statistical inference tech- niques. A measure of reliability is developed as the bit er- ror rate of the combined noisy channel and an appropriate error-correcting code. To support reliable communication, a family of error-correcting codes are developed that handle the high symbol insertion rates found in these covert chan- nels. The system metrics are each shown to be a function of the covert channel signal-to-noise ratio, and as such the two can be used to perform system level design trade-offs. Validation of the system design methodology is provided by means of an experiment using real network traffic data. 1. Introduction Covert channels present an interesting problem in se- cure system design. Traditional covert channels and asso- ciated research focus primarily on multi-level secure sys- tems [6, 15, 19]. Recognizing that complete elimination of these hidden channels is impossible [20], the classic de- fense involves some means of ensuring that the bandwidth (throughput) is reduced to some arbitrarily low number [7]. Network-based covert channels differ from classical covert channels in that they exploit properties of computer net- work communication protocols. In 1987 Girling [11] is perhaps the first to report on network-based covert chan- nels. Over the next two decades several varieties of exploits within the various network protocols are revealed. Exam- ples include [1, 12, 28, 30, 37]; [13, 26] summarize sev- eral typical exploits. A rigorous analysis of network-based covert channels has not been pursued to the same extent as the classic covert channels, and defense against these newer covert channels is an open area of research. Notable excep- tions to this last statement are revealed in recent publica- tions from the US Naval Research Laboratory on channel capacity analysis [21, 22] and bandwidth restrictions [14]. It has been amply demonstrated that basic storage and timing covert channels exist that will support a low band- width one-way communication system in network commu- nications. However, none of the published research has offered a rigorous or unified approach to the design of a predictably quantifiable covert communication system. De- tectability of the channel is only expressed in qualitative terms. Informal capacity estimates are often cited, but an information theoretic analysis of the channel capacity and channel error types is typically omitted. The reliability of communications across the channel is rarely expressed or considered, and the application of coding theory has been ignored or ad hoc. Interesting system designs are presented in [1, 3, 4, 10, 31]; however, each suffers from one or more of the above criticisms, and each is based on a single spe- cific type of exploit. In this paper we present a general purpose methodology for network-based covert communication system design. Specifically we investigate highly stealthy, low-bandwidth applications. We demonstrate that a predictably measurable covert communication system can be designed that exploits certain predictable properties of network communications. We provide a sound engineering foundation for the design of undetectable and reliable communication systems hid- den within existing Internet traffic. A low-bandwidth covert channel is formed through exploitation of well-chosen net- work properties. Selection and design of an exploit is guided by the criteria to minimize probability of detection. Given an exploit design, the channel is characterized for both capacity and noise. Based upon the exploit parameters and the noise characteristics of the channel, an appropriate coding scheme is devised. The combination of these ac- tivities yields a covert communication system designed to 2008 IEEE Symposium on Security and Privacy 978-0-7695-3168-7 /08 $25.00 © 2008 Crown Copyright DOI 10.1109/SP.2008.26 311

[IEEE 2008 IEEE Symposium on Security and Privacy (sp 2008) - Oakland, CA, USA (2008.05.18-2008.05.22)] 2008 IEEE Symposium on Security and Privacy (sp 2008) - Predictable Design of

Embed Size (px)

Citation preview

Page 1: [IEEE 2008 IEEE Symposium on Security and Privacy (sp 2008) - Oakland, CA, USA (2008.05.18-2008.05.22)] 2008 IEEE Symposium on Security and Privacy (sp 2008) - Predictable Design of

Predictable Design of Network-Based Covert Communication Systems

Ronald W. Smith, and G. Scott KnightComputer Security Laboratory

Royal Military CollegeKingston, Ontario, Canada

[email protected]@rmc.ca

Abstract

This paper presents a predictable and quantifiable ap-proach to designing a covert communication system capa-ble of effectively exploiting covert channels found in the var-ious layers of network protocols. Two metrics are developedthat characterize the overall system. A measure of proba-bility of detection is derived using statistical inference tech-niques. A measure of reliability is developed as the bit er-ror rate of the combined noisy channel and an appropriateerror-correcting code. To support reliable communication,a family of error-correcting codes are developed that handlethe high symbol insertion rates found in these covert chan-nels. The system metrics are each shown to be a functionof the covert channel signal-to-noise ratio, and as such thetwo can be used to perform system level design trade-offs.Validation of the system design methodology is provided bymeans of an experiment using real network traffic data.

1. Introduction

Covert channels present an interesting problem in se-cure system design. Traditional covert channels and asso-ciated research focus primarily on multi-level secure sys-tems [6, 15, 19]. Recognizing that complete eliminationof these hidden channels is impossible [20], the classic de-fense involves some means of ensuring that the bandwidth(throughput) is reduced to some arbitrarily low number [7].Network-based covert channels differ from classical covertchannels in that they exploit properties of computer net-work communication protocols. In 1987 Girling [11] isperhaps the first to report on network-based covert chan-nels. Over the next two decades several varieties of exploitswithin the various network protocols are revealed. Exam-ples include [1, 12, 28, 30, 37]; [13, 26] summarize sev-eral typical exploits. A rigorous analysis of network-basedcovert channels has not been pursued to the same extent as

the classic covert channels, and defense against these newercovert channels is an open area of research. Notable excep-tions to this last statement are revealed in recent publica-tions from the US Naval Research Laboratory on channelcapacity analysis [21, 22] and bandwidth restrictions [14].

It has been amply demonstrated that basic storage andtiming covert channels exist that will support a low band-width one-way communication system in network commu-nications. However, none of the published research hasoffered a rigorous or unified approach to the design of apredictably quantifiable covert communication system. De-tectability of the channel is only expressed in qualitativeterms. Informal capacity estimates are often cited, but aninformation theoretic analysis of the channel capacity andchannel error types is typically omitted. The reliability ofcommunications across the channel is rarely expressed orconsidered, and the application of coding theory has beenignored or ad hoc. Interesting system designs are presentedin [1, 3, 4, 10, 31]; however, each suffers from one or moreof the above criticisms, and each is based on a single spe-cific type of exploit.

In this paper we present a general purpose methodologyfor network-based covert communication system design.Specifically we investigate highly stealthy, low-bandwidthapplications. We demonstrate that a predictably measurablecovert communication system can be designed that exploitscertain predictable properties of network communications.We provide a sound engineering foundation for the designof undetectable and reliable communication systems hid-den within existing Internet traffic. A low-bandwidth covertchannel is formed through exploitation of well-chosen net-work properties. Selection and design of an exploit isguided by the criteria to minimize probability of detection.Given an exploit design, the channel is characterized forboth capacity and noise. Based upon the exploit parametersand the noise characteristics of the channel, an appropriatecoding scheme is devised. The combination of these ac-tivities yields a covert communication system designed to

2008 IEEE Symposium on Security and Privacy

978-0-7695-3168-7 /08 $25.00 © 2008 Crown CopyrightDOI 10.1109/SP.2008.26

311

Page 2: [IEEE 2008 IEEE Symposium on Security and Privacy (sp 2008) - Oakland, CA, USA (2008.05.18-2008.05.22)] 2008 IEEE Symposium on Security and Privacy (sp 2008) - Predictable Design of

achieve predictable levels of both probability of detectionand reliability.

Section II describes a general scenario in which such asystem may be employed, and introduces the concept ofexploit signal-to-noise ratio. Section III presents the themetrics against which the system design can be measured.Mathematical representations for system detectability andreliability are derived. A unique family of trellis codes ispresented as a means of handling the high error rates ofthese channels. Section IV introduces a system validationexperiment that demonstrates the effectiveness of the designmethodology. Section V concludes the paper.

2. Background

2.1. Applications

Perhaps the most widely held view of a covert channelis of a guarded secret being “leaked” to an unintended au-dience. This view is consistent with the traditional exampleof a covert channel in a multi-level secure system where aTrojan Horse surreptitiously modulates some parameter ofthe system visible to some lower-level process.

The introduction of network-based covert channels in-creases the scope of vulnerable computing systems. Leak-ing sensitive information is still a primary goal of mostcovert channels; however it is not the only application.Traffic analysis is a potentially new application affordedby network-based covet channels; covert channels exposedwithin assumed secure and anonymous network commu-nications afford the opportunity for an observer to defeatsome level of the anonymity [21]. Covert channels openedin one layer of a network may also provide the vehicle forinfiltrating higher-level networks. Network-based covertchannels have also been demonstrated to provide new at-tacker tracing techniques [36, 9].

2.2. Exploits, Channels and Systems

In much of the literature a covert channel refers to boththe technique used to signal hidden messages as well as thechannel formed by such a technique. For the purposes ofthis research it is helpful to use separate terms for theseconcepts. The term covert exploit, or simply exploit, is usedto refer to the specific technique used to inject hidden datainto the network packet stream. The term covert channel,or simply channel, is used to refer to the type of theoreticalcommunication channel formed by a given technique. Toillustrate the utility of such a naming convention, considerthe following example. Manipulation of the lower order bitof the Transport Control Protocol timestamp field or the ur-gent flag of the control bits field are two potential covertchannel exploits, each with its own merits in terms of ease

Eve’s “tapped” Packet Stream

X→YX→Y

A→BX→Y

X→YX→Y

X→YX→Y

X→Y

X→YX→Y

X→YX→Y

X→YX→Y

X→YX→Y

X→Y

X→YX→Y

X→YX→Y

X→Y

X→YX→Y

X→YX→Y

X→Y

X→YX→Y

X→YX→Y

X→YX→Y X→Y

X→Y X→Y

X→YX→Y

X→YX→Y X→Y

X→Y

X→YX→YX→YX→YX→Y

X→YX→YX→YX→Y

X→YX→Y X→YX→Y

A→BX→Y

X→Y X→Y

X→YA→B

A→BX→Y

X→Y

VPN “anonymizers”

“Hi Bob”

Alice Bob

Enclave A Enclave B

Gateway A Gateway B

“Trojan Horse” packet modulator

Alice sends message, “Hi Bob”

The Internet

Warden Warden

Figure 1. A Representative Network-BasedCovert Channel

of implementation and detectability. However, both mightyield a very similar covert storage channel in terms of thechannel’s information theoretic properties.

A covert communication system is another term intro-duced within this research. It refers to the collective set ofthe covert exploit(s) and channel, the coding scheme, andany other modules necessary to effectively use the chan-nel for communications. The design of covert communica-tion systems is a central theme of this paper, and it is thesystems-based approach that, in part, makes this researchunique.

2.3. The Attack Scenario

An illustration of an Internet-based covert channel is pro-vided in Figure 1. Alice, from Enclave A, communicateslegitimately with Bob, from Enclave B. The enclaves usethe Internet as a cost effective communication link and addsome measure of security and anonymity, perhaps a virtualprivate network layer. Assume that a Trojan Horse is some-how installed on Alice’s computer. This malicious softwarenow modulates the stream of packets sent by Alice in sucha way as to allow an eavesdropper, Eve, to identify Alice’stransmissions from those of other hosts within enclave A. Ahidden message is then encoded in the modulation schemethus allowing secret information to be exfiltrated from theenclave. Further, assume that the enclaves are monitoredfor suspicious and malicious use of the secured network, bya warden. The warden is an abstraction of the monitoringagency whose job it is to ensure that the security of the en-clave(s) is not jeopardized. In the context of this research,the warden is assumed to be a powerful monitoring agency,one with significant computing resources; the specific as-sumptions are covered in Section III.

312

Page 3: [IEEE 2008 IEEE Symposium on Security and Privacy (sp 2008) - Oakland, CA, USA (2008.05.18-2008.05.22)] 2008 IEEE Symposium on Security and Privacy (sp 2008) - Predictable Design of

2.4. Signal-to-Noise Ratio

Consider the following generic storage-based exploit.The designer of the system chooses a protocol field for ex-ploit that contains NF total field values. From this fielda subset of NS distinct exploit symbols is chosen whereineach symbol is represented by some mapping to the exploitfield values. Covert signaling is then achieved by injectingexploit symbols into the network traffic according to an en-coding scheme. Natural occurrences of the exploit symbolsare assumed to exist in the traffic at the point in the net-work at which the eavesdropper is listening, and therefore,all observed exploit symbols are either natural or injected(signal). It is important that each exploit symbol have anon-zero occurrence in the natural traffic, otherwise detec-tion by the warden would be trivial once unnatural exploitsymbols appeared.

Let νs denote the natural proportion of exploit symbolsin normal traffic; ie, the proportion without any covert sig-naling present. Let ν∗s denote the proportion of covert (in-jected) symbols within the exploited network traffic. Dur-ing a period of covert signaling, the observable proportionof exploit symbols is νs + ν∗s , and the ratio ν∗s/νs becomesa key system design parameter; call this parameter the ex-ploit signal-to-noise ratio (SNR). A small SNR implies thatfew covert symbols are added to the network traffic, thusdetection of the signaling by the warden is difficult. How-ever, a small SNR also implies a slow transmission rate andpoor reliability of the received message by the eavesdrop-per since Eve will have difficulty discriminating the exploitsymbols from the naturally occurring ones. Conversely, avery large ν∗s/νs increases the transmission rate and the re-liability, but now at the expense of detectability.

3. Covert Communication System Design Pa-rameters

3.1. Detecting a Covert Channel

In this subsection a set of underlying assumptions aremade about the warden. Based upon these assumptions ameasure of the probability of detection is proposed that is afunction of the exploit signal-to-noise ratio.

It is impossible to know the extent to which any war-den will go to protect an enclave from covert channels. Acommon assumption under this scenario is that the wardenis no more suspicious of one host than another [10]. Theimplication being that no host within the enclave, includingthe covert sender, is subject to forensic investigation or isdeemed untrustworthy. Without this assumption no exploitis possible, as the sender Trojan may likely be found. A sec-ond assumption is that the warden attempts to detect covertmessaging by monitoring the traffic across the network. The

type of monitoring ranges from simple signature based de-tection schemes to statistical anomaly detection. Signaturebased detection techniques are relatively easy to implementand therefore must be assumed to exist on any network ofinterest to this study. Therefore, any exploit design must besignature-free. A wise warden will also realize this fact, andthus a further assumption is that any meaningful monitoringby the warden must involve anomaly detection techniques.Specifically this implies that the modulation of the exploitfield values must not produce a signature and must not ap-pear anomalous in order to be undetectable. It is arguedhere that traditional anomaly detection techniques will notdetect the covert channels proposed herein since by theirvery design they contain no signature and violate no proto-col usage. It is not sufficient however to suggest that theyare undetectable. Instead a measure of detectability is pro-posed under the assumption that if a warden suspected suchcovert channels, they would employ the most appropriateanomaly detection; even if that is not currently the practice.The detection of an anomaly is by definition a probabilisticevent, and it is with this in mind that detectability is ex-plored.

Application of traditional hypothesis testing techniquesto covert (steganographic) channels is not new [5]. A com-mon technique within statistical quality control involvescomparing some attribute(s) within a known process to theattribute in other instances of the process. This technique al-lows for subtle changes in the process to be detectable. Re-call that in the attack scenario above, the warden is assumedto possess powerful resources, and therefore it is assumedthat the monitoring techniques employed by the warden in-volve inspection of every field of every packet at every pro-tocol layer, and every value within each packet is treated asan inspected attribute. This assumption is extreme, but itprovides a good basis for asserting a near worst-case esti-mate of probability of detection.

Returning to statistical quality control, an operatingcharacteristic (OC) curve is used to describe the ability ofan inspection scheme to detect attribute shifts [35]. The OCcurve is a plot of the probability of accepting a hypothesisconcerning some known attribute versus the true measureof the attribute. To illustrate, consider the operating char-acteristic curves depicted in Figure 2. The graph illustratesa fraction defect control chart. The underlying process isassumed to have a known fraction defect of value p0, fromwhich an upper control limit is established; meaning thatif a sample is found to contain greater than the number ofdefects defined by the upper control limit, the process is as-sumed to be out of control. For this type of chart, the prob-ability that a sample is within the control limit is defined bythe binomial distribution [35], and therefore the probabilityof accepting a sample for some arbitrary value, p, given thetrue fraction defect, p0, and sample size, N , is given by

313

Page 4: [IEEE 2008 IEEE Symposium on Security and Privacy (sp 2008) - Oakland, CA, USA (2008.05.18-2008.05.22)] 2008 IEEE Symposium on Security and Privacy (sp 2008) - Predictable Design of

Figure 2. OC Curves for Different Values ofSample Size

pAccept(p) =UCL∑x=0

[Nx

]px(1− p)N−x, (1)

where

UCL =⌊N(p0 + κ

√(p0(1− p0)/N))

⌋.

From the operating characteristic curves several obser-vations are made. For a fixed underlying fraction defect,the probability of detecting an out of control process in-creases as sample size increases. The likelihood of falsenegatives, or the probability that an out of control process isaccepted, is given by the probability pAccept(p). The likeli-hood of false positives, or the probability that an in-controlprocess is found to be out of control, is given by the proba-bility 1−pAccept(p = p0). In the example the upper controllimit is based upon a fairly standard usage of 3-Sigma (κ=3)in control charting. Increasing the control limit lowers thefalse positives, but at the expense of decreasing the sensitiv-ity of the technique to detect changes in the fraction defect.

Making the parallel between the attack scenario and thestatistical quality control technique above, the known frac-tion defect in the context of the exploit is the natural fractionof each exploit symbol expected in normal traffic, νs/NS .The true fraction defect for each symbol during periods ofcovert transmission is νtrue, or (νs + ν∗s )/NS . This as-sumes that the exploit and message symbols are roughlyuniformly distributed. The probability that the warden de-tects the covert signaling, based upon a sample of the traf-fic during which time the covert signaling is fully present,is equivalent to the probability of rejecting the sample, or(1 − pAccept(νtrue)). Therefore the following expressionfor the probability of detecting an arbitrary exploit symbolis proposed,

pSymbolDetect(νtrue) =

1−UCL∑x=0

[Nx

](νtrue)x(1− νtrue)N−x, (2)

whereνtrue = (νs + ν∗s )/NS , and

UCL =⌊N

(νs

NS+ κ√νs/NS(1− νs/NS)/N

)⌋.

Prior to postulating a general expression for probabilityof channel detection, three considerations remain. A dis-tinction must be made between the probability of detectingan “out-of-control” symbol and the probability of detectingthe channel. A criteria for the selection of sample size mustbe determined. Finally, it is useful for system design if theprobability is expressed as a function of the exploit SNR.

Equation 2 represents the probability of detecting occur-rences of one type of exploit symbol. Since the covert usageof the channel involves a total of NS unique exploit sym-bols, the probability of channel detection is given by

pChannelDetect(νtrue) =

1− (1− pSymbolDetect(νtrue))NS . (3)

In the context of quality control, the inspection samplesize is chosen so as to maximize the likelihood of detectinga change in an observed process constrained by the classicconsumers and producers risks, and by the practicality ofthe physical inspection. In the case of the network exploit,it is assumed that the traffic is only out of control intermit-tently; in other words covert signaling only occurs whena message is being sent. Under this assumption, the war-den would want to choose sample size to coincide exactlywith the duration of the covert signaling. Sampling a largerset would dilute the increased fraction of exploit symbolswithin the normal traffic, and sampling a smaller set woulddecrease the probability of detecting the change. The war-den has no practical means of knowing such a size or whento begin the sampling window. However, for the purposesof maintaining a worst-case scenario, assume that the war-den does in fact sample at this size and time. Let No de-note this optimal sample size and let L denote the size ofthe message in number of symbols. Assuming the messagesymbols are roughly uniformly distributed, the fraction ofall signal symbols within a transmission window of the op-timal sample size is given by

ν∗s = L/No. (4)

Combining equations 3 and 4 with 2 and recalling by def-inition that ν∗s = νsSNR, yields an expression for proba-bility of detection for a given exploit as a function of signal-to-noise ratio for a given message size, and the number of

314

Page 5: [IEEE 2008 IEEE Symposium on Security and Privacy (sp 2008) - Oakland, CA, USA (2008.05.18-2008.05.22)] 2008 IEEE Symposium on Security and Privacy (sp 2008) - Predictable Design of

unique exploit symbols,

pChannelDetect(SNR) =

1− (1− pSymbolDetect(SNR))NS , (5)where

pSymbolDetect(SNR) = 1−UCL∑x=0

[Nox

](νs(1+SNR)NS

)x(1−νs(1+SNR)NS

)No−x

UCL =⌊νs + κ

√νs(1−νs)/No

⌋, and

No = bL/(νsSNR)c .

In practice, for any chosen exploit field there will besome limited number of values suitable for exploitation,NS , representing some total natural exploit symbol propor-tion, νs. The probability of detection becomes strictly afunction of the ratio of injected symbols to natural symbols(SNR) and the message size.

3.2. Symbol Insertion Error-CorrectingCodes

Define channel noise as “any unwanted signal or effectin addition to the desired signal” [33]. Noise sources arevaried and depend wholly upon the medium of the chan-nel. Considering the typical exploit scenario described inSection 2, two types of noise may be present on the covertchannel. Foremost, there will be symbol insertions since allsuggested exploits use naturally occurring exploit field val-ues as potential signals. Given the intent to use the channelwith low probability of being detected, symbols arriving atthe receiver will therefore be some mix of signal and noise;the greater the stealth, the greater the probability of symbol-insertion noise events.

The other type of noise occurs as a result of packet loss inthe underlying (intended) channel. A dropped packet man-ifests itself as a symbol deletion. Packet reordering withinthe underlying channel will similarly yield symbol reorder-ing noise in some covert channels. Note that symbol inser-tion and symbol deletion are sufficient to describe all noisetypes on this channel. Symbol reordering is not needed asa separate category since it can always be described as acombination of deletions and insertions.

The rates of packet loss and packet reordering are fairlywell studied [2, 23, 38]. Typical packet loss values rangefrom 0.1 to 1% while packet reordering typically rangesfrom 0.6 to 2%. To some extent the effects of these types ofnoise are controllable by the selection and design of the ex-ploit. However, in order to achieve a high level of stealthytransmission, the injected symbols must remain sparse ascompared to those occurring naturally. As a result the sym-bol insertion rate will be orders of magnitude larger than the

deletion and reorder rates above, and thus the overwhelm-ing majority of all noise in the channel will be symbol in-sertions.

Analyzing the attack scenario, it is clear that no com-munication is permitted from the eavesdropper back to thesender. This dictates that a forward error correcting (FEC)scheme is required. [8, 16, 24, 25, 32] offer various bit in-sertion block coding schemes, but are not appropriate forhigh rates of symbol insertion. Further, trellis codes, ofwhich convolutional codes are the most prevalent, offer sev-eral advantages over block coding. Trellis codes employthe use of memory to improve the error-correcting capabil-ity of codes [29] over that of block codes. The use of atrellis code allows for a natural mapping of the encoder out-put bits (code words) onto the exploit symbols, and thus theoutput of the channel (and input to the decoder) can be in-terpreted as a stream of symbols, not bits. Finally, trelliscodes have an inherent ability to self-synchronize; that is tosay that a matched decoder can correctly decode a stream ofreceived blocks without knowledge of the beginning state ofthe code [34].

Combining the natural symbol usage with the inherentability to self-synchronize, a convolutional coding schemecan be expected to yield a design that allows for decodingto automatically synchronize anytime the received streamof symbols is error free for at least the constraint length ofthe code. This is a major advantage for a system where thereceiver has no knowledge of when the sender may senda message, and one where the noise is predominantly in-sertions. The remaining problem is to find a set of goodconvolutional codes.

Convolutional codes are a well studied category of trelliscodes. A careful study of known good convolutional codeswill reveal that without significant modification, they arenot well suited for symbol insertion correction despite theadvantages listed above. Refer to the state machine view ofthe representative known good convolutional code [17] inFigure 3. This code is said to be a (n = 4, k = 1,m = 3)code; by convention the code has k input bits, n output bits,and 2m states. Three undetectable symbol insertion errormodes are identified for codes of this construction.

a ) Memory-loss errors occur when symbol insertionscause the code to lose memory as a result of a shortpath-to-self. From an arbitrary state, once a codereturns to that state, any memory and thus any er-ror correction ability, is lost. For example, fromsome arbitrary state of the decoder, the right seriesof symbol insertions can cause the decoder to movethrough transitions that return it to that same state.The shorter these paths-to-self, the higher the proba-bility that this associated series of symbol insertionsmay occur. Memory-loss errors cause a loss of syn-chronization within the decoded message as indis-

315

Page 6: [IEEE 2008 IEEE Symposium on Security and Privacy (sp 2008) - Oakland, CA, USA (2008.05.18-2008.05.22)] 2008 IEEE Symposium on Security and Privacy (sp 2008) - Predictable Design of

1/1001

1/11111/1110

1/0001

1/0111

1/1000

1/0000

1/0110

0/1111

0/0000

0/1001

0/0110

0/0111

0/1000

0/1110

0/0001

S0

S1

S2

S3

S4

S5

S6

S7

Figure 3. Representation of a Known Good(4,1,3) Code - Finite State Machine

tinguishable “extra symbols” are added to the de-code sequence. The code in Figure 3 is very proneto memory-loss errors as seen by the self-transitions(path-to-self of length 1) in states S0 and S7.

b ) Equal-alternate-path errors occur when symbol in-sertions introduce an alternate but equal length pathcausing an indistinguishable alternate decode se-quence. Equal-alternate-path errors cause a finitenumber of message bit errors, but do not cause a lossof synchronization within the decoded message asthere are no extra symbols added to the decode se-quence.

c ) Unequal-alternate-path errors occur when symbolinsertions introduce an alternate but unequal lengthpath causing an indistinguishable alternate decodesequence. Unequal-alternate-path errors do cause aloss of synchronization within the decoded messageas there are either extra or fewer symbols added tothe decode sequence.

This categorization of undetectable error modes is a keyconsideration of code construction and theoretical reliabil-ity prediction. A family of trellis codes is presented nextthat are capable of handling the three symbol insertion er-ror modes above. For brevity, the full development of thesecodes and the derivation of the reliability expression willnot be explored here. However, evidence of the capabilities

S0

S3 S2

S1

S4

S7 S6

S5

S8

S11 S10

S9

S12

S15 S14

S13

S16

S19 S18

S17

S20

S23 S22

S21

S24

S27 S26

S25

S28

S31 S30

S29

-1 column

+1 column

R0

R1

R2

R3

R4

R5

R6

R7

Figure 4. Toroid of Squares Code / (6,1,5)

of the codes and the validity of their theoretical reliability isoffered in the next section.

The construction of the state machines characterizing thesymbol insertion error correcting codes begins with the cre-ation of a ring of states of size r, defined as a clockwisezero-path ring. The code is then constructed of some num-ber of such rings joined together by one-paths. This impliesthat any zero input bit results in a transition of the code tostates around a ring, and any one input bit results in a tran-sition of the code to another ring. Toroidal structures canbe formed by linking the rings in such a way that the one-paths wrap to complete the surface of the structure. Thetotal number of states of such a code is restricted to be anexact multiple of r.

The code depicted in Figure 4 is the first of a family ofcodes constructed as toroidal structures and is referred toas a Toroid of Squares code; the name emphasizes its ringshape and size, as well as the overall code structure. Notethat the code is a trellis code, and not a convolutional code.Also note that standard code size notation (n, k,m) is con-tinued for convenience, even though a trellis code need notnecessarily use all 2m states. Let the ring (square) in theupper left corner be denoted as R0, the one below it R1,the ring to its right R2, and so on. By construction, ev-ery zero-path ring forms a path-to-self of exactly 4 transi-tions. Table 1 enumerates the one-paths originating fromeach ring R0 state. Every one-path also forms a path-to-selfof exactly 4 transitions. Note the diversity of rings traversedfor each one-path. It can be shown that from any state theminimum path-to-self, equal-alternate-path, and unequal-alternate-path is no less than 4 transitions. Therefore, thiscode is not vulnerable to an undetectable error mode as aresult of any combination of 3 or less symbol insertions.

A family of toroidal codes can be constructed by varyingthe size (shape) of the rings and total state space. Intuitivelythe error-correcting capability of the codes increases withincreasing ring and state sizes. Several codes of varyingsizes have been developed by the authors. Two additional

316

Page 7: [IEEE 2008 IEEE Symposium on Security and Privacy (sp 2008) - Oakland, CA, USA (2008.05.18-2008.05.22)] 2008 IEEE Symposium on Security and Privacy (sp 2008) - Predictable Design of

Start One-Path Ring Traversal LengthS0 S0S31S22S25S0 R0R7R5R6R0 4S1 S1S8S7S30S1 R0R2R1R7R0 4S2 S2S5S12S11S2 R0R1R3R2R0 4S3 S3S26S29S4S3 R0R6R7R1R0 4

Table 1. Enumerated Ring0 One-Paths of theToroid of Squares Code

codes are presented in Appendix A.

3.3. Reliability

The capability of a code is generally defined as its abilityto detect and correct for errors [27, 17]. This capability ofthe code, along with an analysis of the channel, can be com-bined to provide an estimate of the reliability of the code,generally expressed as a bit error rate (BER) [29]. Note thatcode reliability is a function of the noisiness of the chan-nel and it provides a bound on the BER, below which nodecoder can achieve guaranteed error free decoding.

In the previous subsection a family of trellis codes wereintroduced as capable of handling increasing numbers ofsymbol insertions. The regularized construction of thecodes affords that a general expression for reliability is at-tainable. Specifically each code is designed to be capable ofdetecting and correcting errors resulting from I or less con-secutive insertions; beyond I consecutive insertions, unde-tectable errors may arise. A key factor in determining theBER of the code is therefore the probability that an unde-tectable error event will occur. Intuitively the probability ofsuch events is driven by the “size” of the code, and the SNRof the channel.

Assume that at any arbitrary time at the receiver, the cur-rent state of the decoder is known. The next symbol re-ceived can then be characterized as either valid or invalid.Valid refers to a received symbol that leads from the cur-rent state to a valid next state; invalid refers to a receivedsymbol that does not lead to a valid next state. The valid re-ceived symbols can be further subdivided into the one thatleads to the correct next state and those that lead to a validbut incorrect next state. The valid symbols can also be sub-divided into those that originated as signals and those thatoriginated as noise. Mapping the selected exploit symbolsonto the code there are 2n, or NS , total symbols, and leav-ing any state there are 2k valid symbols. Let ps and pvi

denote the probability that a received valid symbol is eithera signal or noise, respectively. Each probability can be ex-pressed in terms of the channel SNR:

ps = 1/(1 + 2k/(NS ∗ SNR)), (6)

and

pvi = 1/(1 + (NS ∗ SNR)/2k). (7)

Further, under the assumption that all sequences of morethan I incorrect valid symbols received before I or less sig-nals will cause an undetectable error event, the probabilityof an undetectable error event can be computed as follows:

p(U) =2I+1∑

j=I+1

[2I+1j

]pj

vi p2I+1−js . (8)

The expression in Equation 8 is independent of code con-struction beyond the size and its symbol insertion capabil-ity rating, I; in fact, it assumes that the code is capableof handling up to exactly I symbols insertions for everypath from every state. Therefore a means is desired wherean estimate of reliability can be provided that accounts forthe uniqueness of each specific code. Considering that allmemory-loss errors result in an undetectable error, and un-der the simplifying assumption that all equal- and unequal-alternate-paths lead to undetectable errors, a new estimatecan be provided for the overall probability of undetectableerrors:

p∗(U) = ρvulnerability

2I+1∑j=I+1

[2I+1j

]pj

vip2I+1−js , (9)

where ρvulnerability represents the fraction of paths oflength I + 1 that make the code vulnerable to undetectableerror modes.

In arriving at a conservative estimate of the bit error ratethree further assumptions are applied. The first assumptionis that, on average, undetectable errors will be considered tooccur at uniformly distributed locations in the received mes-sage. The second assumption is that all undetectable errorslead to loss of synchronization beyond the point in the mes-sage of the error. This assumption allows for an estimate tobe determined without enumerating all possible combina-tions of error modes when multiple undetectable errors dooccur. The third assumption recognizes the fact that whenan undetectable error does occur and the decoding is out ofsynchronization or otherwise faulty, there remains a randomchance that each bit is still decoded correctly. Under theseassumptions, the following estimate is provided for the biterror rate of the designed codes:

BER =1

2kL

L∑i=1

iL

i+ 1[Li

]p∗(U)i(1− p∗(U))L−i.

(10)

To better understand the equation, each major term isrevisited:

317

Page 8: [IEEE 2008 IEEE Symposium on Security and Privacy (sp 2008) - Oakland, CA, USA (2008.05.18-2008.05.22)] 2008 IEEE Symposium on Security and Privacy (sp 2008) - Predictable Design of

i) the inner[Li

]p∗(U)i(1 − p∗(U))L−i term repre-

sents the likelihood of having exactly i undetectableevents times the number of ways in which this exactnumber could occur in a message of size L;

ii) the (iL/(i + 1)) term estimates the number of bitsin the message which are subject to loss of synchro-nization under the assumption that the i error eventsare uniformly distributed throughout the message;

iii) 1/L normalizes to yield bit error rates; and

iv) 1/2k accounts for the fact that even under the as-sumption of lost synchronization, the decoder stillhas this random chance of getting the bit correct.

3.4 Representative Symbol Insertion Er-ror Correcting Codes with Results

The theoretically predicted and experimental BERs fora sample of the developed codes is presented next. Thefirst code illustrated in Figure 5 is referred to as a Toroidof Hexagons code. This binary (input) code consists of 10hexagons for a total of 60 states and 120 output symbols.Each hexagon (ring) defines the zero-paths, and thus theminimum length zero-path is 6. Again the one-paths pro-vide the transitions from hexagon to hexagon, with thesetraversals designed so as to yield long paths-to-self as wellas long equal- and unequal-alternate paths, again of length6 or longer. The predicted and empirical reliability of thiscode is provided in Figure 6.

-1 column

+1 column

+2 rows

-2 rows -2 rows

*Hexagon rings all rotate clockwise

*

Figure 5. Toroid of Hexagons / (7,1,6)

The next code presented in Figure 7 is the Square Toroidof Octagons code, a binary code consisting of a square gridof 16 octagons with a total of 128 states and 256 output

Figure 6. Theoretical and Empirical Reliability- Toroid of Hexagons

symbols. As the ring size and the total state size increases sotoo do the path lengths, yielding codes with ever-increasingsymbol insertion error correction capabilities. This is evi-denced in the reliability results presented in Figure 8.

*Octagon rings all rotate clockwise

Version 1: 0 column offset Version 2: -2 column offset

*

Version 1: 0 column offset Version 2: +2 column offset

Figure 7. Square Toroid of Octagons / (8,1,7)

4. A System Design with Experimental Results

This section will describe a system design and an ex-periment that illustrates the general steps to the design of acovert communication system and validates the theoreticalresults. In summary, the steps of the design process are: se-lection of the exploit field, selection of an appropriate setof exploit symbols from within the exploit field, characteri-zation of the exploit, characterization of the channel, selec-tion of an appropriate code, and selection of an appropriate

318

Page 9: [IEEE 2008 IEEE Symposium on Security and Privacy (sp 2008) - Oakland, CA, USA (2008.05.18-2008.05.22)] 2008 IEEE Symposium on Security and Privacy (sp 2008) - Predictable Design of

Figure 8. Theoretical and Empirical Reliability- Square Toroid of Octagons

802.11 - Observed Encrypted Data (WPA2-AES) Packet Traffic - Single PC Outbound

0

50

100

150

200

250

300

350

400

450

80 129

178

227

276

325

374

423

472

521

570

619

668

717

766

815

864

913

962

1011

1060

1109

1158

1207

1256

1305

1354

1403

1452

1501

1550

1599

Packet Length

# of

occ

uran

ces

Figure 9. Representative Packet Sizes inWireless Encrypted Traffic [18]

message length. The system is validated by transmitting ar-bitrary messages across the channel and observing the em-pirical results as compared to those predicted.

The exploit field chosen for the experiment is the 16-bit packet size field of a link layer protocol implementationfor encrypted wireless data. Figure 9 depicts the numberof occurrences of each packet size value for a period ofapproximately 40 minutes of traffic as recorded outboundfrom a single host. Approximately 35,000 total packetswere recorded.

An analysis of the packet size data reveals a set of about260 good candidate packet sizes for use as exploit symbols.Among the set of candidate symbols, the probability of oc-currence of each symbol ranges from 0.000266 to 0.000387with an average of 0.000328. The resultant covert channelis typical of that described above, namely that it is charac-terized as one where the predominant error type is that ofsymbol insertions.

The selection of an appropriate code is guided by thetotal available candidate exploit symbols. Ideally the code

Figure 10. Predicted System Results as aFunction of SNR

will use as many of the 260 exploit symbols as possible. A(8, 1, 7) code is the most obvious choice, since the maxi-mum number of output symbols is given by 28, or 256. Forthe experiment, the Square Toroid of Octagons from Ap-pendix A is a good choice as it uses the full 256 exploitsymbols and is capable of handling a high number of in-sertions, namely 7. The total number of exploit symbols,Ns, is known and the proportion of naturally occurring ex-ploit symbols, νs, is computed as the number of exploitsymbols times the average probability. Specifically, νs =0.0839. The last system design parameter to be selected isthe message length. A good heuristic for choosing the mes-sage length is to choose a value that is half the total numberof exploit symbols, or 2n−1. The message length is there-fore set at 128 bits.

From the design decisions above, the system metrics cannow be predicted. Figure 10 illustrates the predicted prob-ability of detection and reliability as a function of exploitsignal-to-noise ratio. The operational SNR is defined as thecross-over between detectability and reliability. From thegraph this is determined to be 0.045; where the probabilityof channel detection is estimated to be 0.81% and the biterror rate is estimated to be 0.36%.

With an operational signal-to-noise ratio of 0.045 thisimplies that the sending host injects exploit symbols at arate of 45 per 1000 naturally occurring exploit symbols. Therate of natural occurrence of exploit symbols is the overallexploit symbol proportion, νs, and from above is 8.39%.Therefore overall injection rate becomes 0.00378. Inter-preting this result, a symbol injector should release signals(symbols) into the background traffic at a rate of about 38per 10000 packets. Completing the analysis, it is knownfrom Equation 5 that the optimal sample size for the war-den is given by

No = bL/(νsSNRo)c ≈ 33, 900 symbols(packets).

319

Page 10: [IEEE 2008 IEEE Symposium on Security and Privacy (sp 2008) - Oakland, CA, USA (2008.05.18-2008.05.22)] 2008 IEEE Symposium on Security and Privacy (sp 2008) - Predictable Design of

Implementing the experiment involves transmitting anarbitrary 128-bit message across this representative chan-nel. A channel encoder uses the Square Toroid of Octagonscode to encode the message. A channel modulator mapsthe outputs of the encoder (code symbols) one-to-one ontothe chosen exploit symbols. A symbol injector releases onesignal for roughly every 263 packets of background traffic.At the receiver, a channel demodulator extracts from thereceived packets only those containing exploit symbols; inother words only those where packet size values are fromthe exploit symbol set. A channel decoder takes as inputthe sequence of exploit symbols and decodes until it findsa message of length 128; the message length is an assumedshared secret between sender and receiver.

The experiment was run 100 times. The empirical reli-ability was computed as the total number of bits decodedcorrectly over the total number of bits decoded. From thisthe observed BER was 0.25%. To confirm the probabilityof detection the observed number of each exploit symbolwas recorded and then compared to the upper control limit,UCL, from Equation 5. If any exploit symbol count ex-ceeded the UCL, it is considered an indicator of detection.For each trial, the observed number of each exploit symbolwithin the optimal sample size was compared to the UCL.Over the 100 trials, a total of 0 detections were observedacross all exploit symbols. For the 100 trials conducted, themaximum counts of any exploit symbol ranged from 17 to20, and in only a single instance were 20 exploit symbolscounted; 65% of the time the highest count was 18, wellbelow the UCL limit of 21.

5. Conclusion

A unique design methodology for network-based covertcommunication systems has been proposed. A quantita-tive measure of the probability of detection has been devel-oped for a generic storage-based network covert channel.A general methodology for the design of error-correctingcodes for high numbers of symbol insertion errors has beendemonstrated, and a family of error-correcting codes havebeen developed based upon the methodology. A generalexpression for the reliability of the family of codes hasbeen developed. Finally, a system design experiment hasdemonstrated the feasibility and predictability of the ap-proach; the BER was predicted to be 0.36% while observedto be 0.25%, and the detectability was predicted to be 0.81%while observed to be 0%.

An expression for the efficiency (or throughput) of thesystem has also been derived by the authors. The type ofchannel formed by the experimental design yields a lowsystem efficiency; at most one bit of message is transmittedin each packet sent. Designs of both higher efficiencyexploits and codes is currently being pursued. Extensions

to the trellis codes to support extremely low levels of SNRare being investigated. Techniques for detecting these typesof channels are also being explored.

Acknowledgment The authors would like to thankDr Paul Van Oorschot of Carleton University for hisinsightful comments on this work. This work was sup-ported, in part, by MITACS through its research projectentitled, Understanding and Mitigating Malicious Activityin Networked Computer.

References

[1] C. Abad. Ip checksum covert channels and se-lected hash collision. Web Site. http://www.gray-world.net/papers/ipccc.pdf, last visited April 2006.

[2] M. Allman and V. Paxson. On estimating end-to-end net-work path properties. In ACM SIGCOMM 99, Sept. 1999.

[3] K. Ashan. Covert channel analysis and data hiding inTCP/IP. Masters thesis, Department of Electrical and Com-puter Engineering, University of Toronto, 2002.

[4] S. Cabuk, C. Brodley, and C. Shields. IP covert timing chan-nels: Design and detection. CCS 2004, Oct. 2004.

[5] C. Cachin. An information-theoretic model for steganog-raphy. In D. Aucsmith, editor, Proceedings of 2nd Work-shop on Information Hiding, volume 1525, Portland, Ore-gon, USA, May 1998. Lecture Notes in Computer Science,Springer.

[6] T. N. C. S. Center. A guide to understanding covert channelanalysis of trusted systems. NCSC-TG-030, Library No.S-240,572, Version 1, Fort George G. Meade, Maryland, Nov.1993.

[7] N. Computer Security Center. Covert channel capac-ity limit policy. NCSC Interpretation of TCSEC0113.http://www.radium.ncsc.mil/tpep/library/interps/0113.html,last visited June 2004.

[8] M. Davey and D. MacKay. Reliable communication overchannels with insertions, deletions and substitutions. IEEETransactions on Information Theory, 47(2):687–692, Feb.2000.

[9] G. Drawson. Temporal interval modulation: An investiga-tion into the modulation of packet timings as a means ofgenerating a detectable watermark. Masters thesis, RoyalMilitary College of Canada, Apr. 2002.

[10] J. Giffin, R. Greenstadt, P. Litwack, and R. Tibbetts. Covertmessaging through TCP timestamps. In Workshop on Pri-vacy Enhancement Technologies (PET), Apr. 2002.

[11] C. Girling. Covert channels in lan”s. IEEE Transactions onsoftware Engineering, SE13(2):292–296, Feb. 1987.

[12] T. Handel and M. Sandford. Hiding data in the OSI networkmodel. In First International Workshop on Information Hid-ing, June 1996.

[13] D. Hintz. Channels in TCP and IP Headers. PowerPoint Pre-sentation. http://guh.nu/projects/cc/, last visited November2007.

[14] M. Kang, I. Moskowitz, and D. Lee. A network versionof the pump. In Proceedings of the IEEE Symposium onSecurity and Privacy, pages 144–154, 1995.

320

Page 11: [IEEE 2008 IEEE Symposium on Security and Privacy (sp 2008) - Oakland, CA, USA (2008.05.18-2008.05.22)] 2008 IEEE Symposium on Security and Privacy (sp 2008) - Predictable Design of

[15] B. Lampson. A note on the confinement problem. Commu-nications of the ACM, 1973.

[16] V. I. Levenshtein. Binary codes capable of correcting dele-tions, insertions, and reversals. Doklady Akademii NaukSSSR, 163(4):845–848, 1965. English translation in SovietPhysics Doklady, vol. 10 no. 8 pp:707710, 1966.

[17] S. Lin and D. C. Jr. Error Control Coding: Fundamentalsand Applications. Prentice-Hall, New Jersey, 1983.

[18] P. Martin. Covert channels in secure wireless networks.Master’s thesis (draft) not available as of March 2007, RoyalMilitary College of Canada, 2007.

[19] J. Millen. 20 years of covert channel modeling and analy-sis. In IEEE Symposium on Security and Privacy, Oakland,California, May 1999.

[20] I. Moskowitz and M. Kang. Covert channels - here to stay?In Proceedings COMPASS ’94, pages 235–243, Gaithers-burg, MD, 1994.

[21] I. Moskowitz, R. Newman, D. Crepeau, andA. Miller. Covert channels and anonymizing net-works. In Proceedings of the Workshop on Privacyin the Electronic Society (WPES 2003), Oct. 2003.http://chacs.nrl.navy.mil/publications/CHACS/2003/2003moskowitz-acm5.pdf,last visited April 2006.

[22] I. Moskowitz, R. Newman, and P. Syverson. Quasi-anonymous channels. In Proceedings CNIS 2003, Dec.2003. http://chacs.nrl.navy.mil/publications/CHACS/2003/2003moskowitz-5 3.pdf,last visited April 2006.

[23] V. Paxson. End-to-end internet packet dynamics.IEEE/ACM Transactions on Networking, 7(3):277–292,June 1999. An earlier version appeared in Proc. ACM SIG-COMM 9́7, September 1997, Cannes, France.

[24] E. Ratzer. Marker codes for channels with insertions anddeletions. to be presented at the 3rd International Sympo-sium on Turbo Codes and Applications, Munich, Germany,Apr. 2006.

[25] E. Ratzer and D. MacKay. Codes for channels with inser-tions, deletions and substitution. In In 2nd InternationalSymposium on Turbo Codes and Related Topics, pages 149–156, Apr. 2000.

[26] F. Raynal. Covert channels. PowerPoint Presen-tation. http://www.securitylabs.org/Download/lsm2003-Raynal.pdf.gz, last visited April 2006.

[27] S. Roman. Introduction to Information and Coding Theory.Springer Verlag, 1996.

[28] C. Rowland. Covert channels in the TCP/IP Protocol Suite.first Monday - a peer-reviewed journal on the Internet, 2(5),May 1997.

[29] R.Togneri and C. deSilva. Fundamentals of InformationTheory and Coding Design. Chapman & Hall, 2002. TdS02.

[30] M. V. S.D. Servetto. Communication using phantoms:Covert channels in the internet. In Proceedings of the IEEEInternational Symposium on Information Theory (ISIT),Washington, DC, June 2001.

[31] G. Shah, A. Molina, and M. Blaze. Keyboards and covertchannels. In Proceedings of the 15th USENIX Security Sym-posium, Vancouver, Cananda, Aug. 2006. ACM Press.

[32] T. Swart and H. Ferreira. A note on double insertion/deletioncorrecting codes. IEEE Transactions on Information Theory,49(1):269–273, Jan. 2003.

[33] R. Togneri and C. deSilva. Fundamentals of InformationTheory and Coding Design. Chapman & Hall, 2002.

[34] A. Viterbi and J. Omura. Principles of Digital Communica-tions and Coding. McGraw-Hill, New York, 1979.

[35] H. Wadsworth, K. Stevens, and A. Godfrey. Modern Meth-ods for Quality Control and Improvement. John Wiley &Sons, 1986.

[36] X. Wang, D. Reeves, S. Wu, and J. Yuill. Sleepy watermarktracing: An active network-based intrusion response frame-work. In Proceedings of the IFIP TC11 Sixteenth AnnualWorking Conference on Information Security, pages 369–384, June 2001.

[37] M. Wolf. Covert channels in LAN protocols. In Proceedingsof the Workshop on Local Area Network Security, pages 91–102, 1989.

[38] Y. Zhang, N. Duffield, V. Paxson, and S. Shenker. On theconstancy of internet path properties. In Proceedings ofthe ACM SIGCOMM Internet Measurement Workshop, Nov.2001.

321