Information Theory and Communications
CSM25 Secure Information Hiding

Dr Hans Georg Schaathun
University of Surrey
Spring 2007



Learning Outcomes

- Become familiar with fundamental concepts in communications:
  - Entropy and Redundancy
  - Error-control coding
  - Compression
- Be able to link communications fundamentals to steganography

Outline

1. Communications essentials
   - Communications and Redundancy
   - Digital Communications
   - Shannon Entropy
   - Security
   - Prediction
2. Compression
   - Recollection
   - Huffman Coding
   - Huffman Steganography
3. Grammars

Communications essentials: Communications and Redundancy

The communications problem

[Diagram: Alice's message m → Encoder → codeword c → Noisy channel → received word r → Decoder → estimate m̂, delivered to Bob]

Bob's problem: estimate m, given the (partly) random output r from the channel.

How much (un)certainty does Bob have about m? That question is the subject of information theory and Shannon entropy.


Redundancy of English

Fact: The English language is more than 50% redundant.

Even a heavily damaged text can be restored:

t** p*oce*s o**hid**g *ata**nsid* o*her**ata. For ex*****, a **xt f*le c**ld *** hid*** "in**de"****im*ge or***s**nd *ile* By look****at t*e im*g***or list***** to th**s**nd,*yo* w*u*d n*t *no**that***ere is *x*ra info******* *r*sent.

t*e p*oce*s o* hid**g *ata*insid* o*her*data. For ex*m***, a t*xt f*le c**ld b* hidd** "ins*de" a**im*ge or*a*s*und *ile* By look**g*at t*e im*g*,*or list**in* to th* s**nd,*yo* w*uld n*t *no**that *here is *x*ra info*****on *r*sent.

the process of hiding data inside other data. For example, a text file could be hidden "inside" an image or a sound file. By looking at the image, or listening to the sound, you would not know that there is extra information present.

(from http://www.cdt.org/crypto/glossary.shtml)

Even when part of the message is destroyed on the channel, redundancy allows Bob to determine the original m.

Benefits of redundancy

- Crossword puzzles
- Understanding foreigners with imperfect pronunciation
  (How much would you understand of a lecture without redundancy?)
- Hearing in a noisy environment
- Reading bad handwriting
  (How could I mark exam scripts without redundancy?)

Cryptanalysis? Steganalysis?


What if there were no redundancy?

- No use for steganography!
- Any text would be meaningful; in particular, ciphertext would be meaningful.
- Simple encryption would give a stegogramme indistinguishable from cover-text.

Problems in natural language

- Natural languages are arbitrary: some words/sentences have a lot of redundancy, others have very little.
- Unstructured: hard to automate correction.


Communications essentials: Digital Communications

Coding: Channel and source coding

Source coding (aka compression):
- Remove redundancy
- Make a compact representation

Channel coding (aka error-control coding):
- Add mathematically structured redundancy
- Computationally efficient error correction
- Optimised (low error rate, small space)

These are the two aspects of Information Theory.
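The slides contain no code, but a minimal sketch (my own illustration, not course material) makes the channel-coding idea concrete: a 3-fold repetition code adds structured redundancy, and majority voting gives computationally efficient error correction on a binary symmetric channel.

    import random

    def encode(bits):
        # Channel coding: add structured redundancy (3-fold repetition).
        return [b for b in bits for _ in range(3)]

    def channel(bits, p=0.1):
        # Binary symmetric channel: flip each bit with probability p.
        return [b ^ (random.random() < p) for b in bits]

    def decode(received):
        # Majority vote on each 3-bit block corrects any single flip.
        return [1 if sum(received[i:i + 3]) >= 2 else 0
                for i in range(0, len(received), 3)]

    m = [1, 0, 1, 1, 0]
    r = channel(encode(m))
    print(m, decode(r))  # decode(r) usually equals m when p is small

Real error-control codes achieve the same protection with far less added redundancy than tripling the message; the repetition code is only the simplest possible instance.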


Channel and Source Coding

[Diagram: Message → Compress (remove redundancy) → Encrypt (scramble) → Encode (add redundancy) → Channel → Decode → Decrypt → Decompress → Message]


Communications essentials: Shannon Entropy

Uncertainty (Shannon Entropy)

- m and r are stochastic variables (drawn at random from a distribution).
- How much uncertainty is there about the message m? Uncertainty is measured by entropy:
  - H(m) before any message is received;
  - H(m|r), the conditional entropy, after receipt of the message.
- Mutual information is derived from entropy:
  - I(m; r) = H(m) − H(m|r)
  - I(m; r) is the amount of information contained in r about m.
  - I(m; r) = I(r; m)
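These quantities can be computed numerically. A worked sketch of mine (not from the slides): a uniform binary message m is sent through a binary symmetric channel with flip probability 0.1, and H(m), H(m|r) and I(m; r) are evaluated from the joint distribution.

    from math import log2

    def H(dist):
        # Shannon entropy in bits of a distribution {outcome: probability}.
        return -sum(p * log2(p) for p in dist.values() if p > 0)

    p = 0.1  # channel flip probability
    joint = {(m, r): 0.5 * ((1 - p) if m == r else p)
             for m in (0, 1) for r in (0, 1)}
    marg_r = {r: joint[0, r] + joint[1, r] for r in (0, 1)}

    Hm = H({0: 0.5, 1: 0.5})           # H(m) = 1 bit
    Hm_given_r = H(joint) - H(marg_r)  # H(m|r) = H(m,r) - H(r), ~0.469 bits
    print(Hm - Hm_given_r)             # I(m; r) ~0.531 bits

With p = 0 the same computation gives I(m; r) = 1 bit, and with p = 0.5 it gives 0, matching the intuition that the channel output then says everything, or nothing, about m.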


Shannon entropy: Definition

For a random variable X taking values in a set 𝒳:

    H_q(X) = − Σ_{x ∈ 𝒳} Pr(X = x) · log_q Pr(X = x)

- Usually q = 2, giving entropy in bits; q = e (natural logarithm) gives entropy in nats.
- If Pr(X = x_i) = p_i for x_1, x_2, . . . ∈ 𝒳, we write H(X) = h(p_1, p_2, . . .).

Example: one question Q; yes/no with 50-50 probability:

    H(Q) = −2 · (1/2) · log_2 (1/2) = 1
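A direct transcription of the definition into code (a sketch of mine, not course material) reproduces the example:

    from math import e, log

    def entropy(probs, q=2):
        # H_q(X) = -sum over x of Pr(X=x) * log_q Pr(X=x); outcomes with
        # zero probability contribute nothing (0 * log 0 is taken as 0).
        return -sum(p * log(p, q) for p in probs if p > 0)

    print(entropy([0.5, 0.5]))       # 1.0: the 50-50 question Q, in bits
    print(entropy([0.9, 0.1]))       # ~0.47 bits: a more predictable question
    print(entropy([0.5, 0.5], q=e))  # ~0.693: the same H(Q) in nats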


Shannon entropy: Properties

1. Additivity: if X and Y are independent, then H(X, Y) = H(X) + H(Y).
   If you are uncertain about two completely different questions, the entropy is the sum of the uncertainty for each question.
2. If X is uniformly distributed, then H(X) increases when the size of 𝒳 increases.
   The more possibilities, the more uncertainty.
3. Continuity: h(p_1, p_2, . . .) is continuous in each p_i.

Shannon entropy is a measure in the mathematical sense.

What it tells us (Shannon entropy)

- Consider a message X of entropy k = H(X) (in bits).
- The average size of a file F describing X is at least k bits.
- If the size of F is exactly k bits on average, then we have found a perfect compression of X: each message bit contains one bit of information on average.


A banal example

A single bit may contain more than one bit of information. E.g. image compression:

- 0: Mona Lisa
- 10: Lenna
- 110: Baboon
- 11100: Peppers
- 11110: F-16
- 11101: Che Guevara
- 11111. . . : other images

However, on average, the maximum information in one bit is one bit (most of the time it is less).

The example is based on Huffman coding.


Communications essentials: Security

Cryptography

Alice encrypts m into the ciphertext c, which Bob decrypts back to m; Eve observes c.

- Eve seeks information about m, observing c.
- If I(m; c) > 0, then Eve succeeds in theory; likewise if I(k; c) > 0 (for the key k).
- If H(m|c) = H(m), then the system is absolutely secure.
- The above are strong statements: even if Eve has information, I(m; c) > 0, she may be unable to make sense of it.

Steganalysis

- Question: does Alice send secret information to Bob? Answer: X ∈ {yes, no}.
- What is the uncertainty H(X)?
- Eve intercepts a message S. Is there any information I(X; S)?
- If H(X|S) = H(X), then the system is absolutely secure.


Communications essentials: Prediction

Random sequences

- Text is a sequence of random samples (letters): (l_1, l_2, l_3, . . .), l_i ∈ A = {A, B, . . . , Z}.
- Each letter has a probability distribution P(l), l ∈ A.
- Statistical dependence (aka redundancy):
  - P(l_i | l_{i−1}) ≠ P(l_i)
  - H(l_i | l_{i−1}) < H(l_i): letter i − 1 contains information about l_i.
  - Use this information to guess l_i.
- The more letters l_{i−j}, . . . , l_{i−1} we have seen, the more reliably we can predict l_i.
- Wayner (Ch 6.1) gives examples of first-, second-, . . . , fifth-order prediction, using j = 0, 1, 2, 3, 4.
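A sketch of how such order-j statistics can be tabulated and used (my own minimal version, not Wayner's code): count the distribution of the next letter for every context of j preceding letters, then sample from it to generate order-(j+1) random text.

    import random
    from collections import Counter, defaultdict

    def build_model(text, j):
        # Tabulate P(l_i | l_{i-j} ... l_{i-1}) by counting j-letter contexts.
        model = defaultdict(Counter)
        for i in range(j, len(text)):
            model[text[i - j:i]][text[i]] += 1
        return model

    def generate(model, j, seed, n):
        # Sample one letter at a time, conditioned on the last j letters.
        out = seed
        for _ in range(n):
            counts = model[out[-j:]]
            if not counts:            # context never seen: stop early
                break
            letters, weights = zip(*counts.items())
            out += random.choices(letters, weights)[0]
        return out

    sample = "the process of hiding data inside other data " * 20
    model = build_model(sample, j=4)  # four letters of context: fifth order
    print(generate(model, 4, "the ", 40))

The higher the order, the more natural the output looks, but the table of contexts grows correspondingly large.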

[Figure slides: first-, second-, third- and fourth-order prediction examples from Wayner; the images are not preserved in this transcript.]

Compression: Recollection

Compression

F* is the set of binary strings of arbitrary length.

Definition: A compression system is a function c : F* → F* such that E(length(m)) > E(length(c(m))) when m is drawn from F*.

The compressed string is expected to be shorter than the original.

Definition: A compression c is perfect if all target strings are used, i.e. if for any m ∈ F*, c^{−1}(m) is a sensible file (cover-text).

Decompress a random string, and it makes sense!
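A quick, hedged check of the first definition with a real compressor (zlib; my example, not from the slides): redundant English-like text compresses to far below its original length, while random bytes do not.

    import os
    import zlib

    text = ("the process of hiding data inside other data " * 50).encode()
    rand = os.urandom(len(text))

    print(len(text), len(zlib.compress(text)))  # redundant text: much shorter
    print(len(rand), len(zlib.compress(rand)))  # random bytes: slightly longer

Note that zlib is not perfect in the second definition's sense: almost no random string is even a valid zlib stream, let alone decompresses to a sensible file. That gap is exactly what the construction on the next slide requires to be closed.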

Steganography by Perfect Compression (Anderson and Petitcolas 1998)

Requires:
- a perfect compression scheme, and
- a secure cipher.

[Diagram: Message → Encrypt (with Key) → C → Decompress → stego-text S → (on the channel) → Compress → C → Decrypt (with Key) → Message]

Steganography without data hiding.


Compression: Huffman Coding

Huffman Coding

- Short codewords for frequent symbols; long codewords for unusual symbols.
- Each code symbol (bit) should be equally probable.

[Diagram: a binary tree; branch 0 from the root leads to a 50% leaf (codeword 0), branch 1 leads to a node splitting into two 25% leaves (codewords 10 and 11).]
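A compact sketch of the standard Huffman construction (mine; the slide only shows the resulting tree): repeatedly merge the two least probable nodes. For the 50/25/25 distribution above it reproduces the codewords 0, 10 and 11.

    import heapq

    def huffman(probs):
        # Repeatedly merge the two least probable nodes; the first popped
        # subtree gets prefix '0', the second gets prefix '1'.
        heap = [(p, i, {sym: ""}) for i, (sym, p) in enumerate(probs.items())]
        heapq.heapify(heap)
        count = len(heap)  # tie-breaker so tuples never compare the dicts
        while len(heap) > 1:
            p0, _, c0 = heapq.heappop(heap)
            p1, _, c1 = heapq.heappop(heap)
            merged = {s: "0" + w for s, w in c0.items()}
            merged.update({s: "1" + w for s, w in c1.items()})
            heapq.heappush(heap, (p0 + p1, count, merged))
            count += 1
        return heap[0][2]

    print(huffman({"a": 0.5, "b": 0.25, "c": 0.25}))
    # {'a': '0', 'b': '10', 'c': '11'} -- the tree on this slide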

Example

[Diagram: a larger worked Huffman tree, partly garbled in this transcript; its leaves include probabilities 25%, 25%, 25% and 12½%, with 0/1 labels on every branch.]

Decoding

- Huffman codes are prefix free: no codeword is the prefix of another. This simplifies the decoding.
- This is expressed in the Huffman tree: follow edges for each coded bit; (only) a leaf node resolves to a message symbol.
- When a message symbol is recovered, start over for the next symbol.
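Because no codeword is a prefix of another, decoding needs no lookahead. A sketch (mine), assuming the code is given as a symbol-to-codeword dict like the one produced by huffman() above:

    def decode(bits, code):
        # Prefix-free decoding: grow the current word bit by bit; the moment
        # it matches a codeword we have reached a leaf, so emit the symbol
        # and start over for the next one.
        inverse = {w: s for s, w in code.items()}
        out, word = [], ""
        for b in bits:
            word += b
            if word in inverse:
                out.append(inverse[word])
                word = ""
        return "".join(out)

    print(decode("011010", {"a": "0", "b": "10", "c": "11"}))  # 'acab'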

Ideal Huffman code

- Each branch equally likely: P(b_i | b_{i−1}, b_{i−2}, . . .) = 1/2.
- Maximum entropy: H(B_i | B_{i−1}, B_{i−2}, . . .) = 1.
- A uniform distribution of compressed files implies perfect compression.
- In practice, the probabilities are rarely powers of 1/2, hence the Huffman code is imperfect.

Compression: Huffman Steganography

Reverse Huffman

Core Reading: Peter Wayner, Disappearing Cryptography, Ch. 6-7.

- Stego-encoder: Huffman decompression.
- Stego-decoder: Huffman compression.
- Is this similar to Anderson & Petitcolas' Steganography by Perfect Compression?
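A minimal sketch of the reverse-Huffman idea (my own simplification: one fixed letter code instead of Wayner's context-dependent fifth-order codes; the code table below is hypothetical). Embedding runs the message bits through Huffman decompression to produce letters; extraction compresses the letters back into bits.

    code = {"e": "0", "t": "10", "a": "110", "o": "111"}  # hypothetical code

    def embed(bits):
        # Stego-encoder = Huffman decompression: the message bits walk the
        # tree, and each leaf reached emits one letter of the stegogramme.
        inverse = {w: s for s, w in code.items()}
        out, word = [], ""
        for b in bits:
            word += b
            if word in inverse:
                out.append(inverse[word])
                word = ""
        return "".join(out)  # assumes the bits end on a codeword boundary

    def extract(stego):
        # Stego-decoder = Huffman compression: each letter maps to its bits.
        return "".join(code[letter] for letter in stego)

    s = embed("0101100111")
    print(s, extract(s))  # 'etaeo' and the original bit string

The message bits should look uniformly random (e.g. be encrypted first): as the previous slide noted, an ideal Huffman code assumes each bit is equally probable.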

The Stegogramme

- The stegogramme looks like random text:
  - use a probability distribution based on sample text;
  - higher-order statistics make it look natural.
- Fifth-order statistics is reasonable; higher order will look more natural.


Example: Fifth order

- For each 5-tuple of letters A_0, A_1, A_2, A_3, A_4: let l_{i−4}, . . . , l_i be consecutive letters in natural text, and tabulate P(l_i = A_0 | l_{i−j} = A_j, j = 1, 2, 3, 4).
- For each 4-tuple A_1, A_2, A_3, A_4, make an (approximate) Huffman code for A_0. We may omit some values of A_0, or have non-unique codewords.
- We encode a message by Huffman decompression, using the Huffman code determined by the last four stegogramme symbols, obtaining a fifth-order random text.

Example: Fifth order (continued)

Consider the four preceding letters "comp". The next letter may be:

letter:       r     e     l     a     o
probability:  40%   12%   22%   18%   8%
combined:     52% (r/e)    22% (l)    26% (a/o)
rounded:      50%          25%        25%

- Probabilities are rounded to powers of 1/2.
- Combining several letters reduces the rounding error.
- The example is arbitrary and fictitious.

Example: The Huffman code

Huffman code based on the fifth-order conditional probabilities:

[Diagram: branch 0 from the root leads to the group r/e (codeword 0); branch 1 leads to a node splitting into l (codeword 10) and the group a/o (codeword 11).]

- When two letters are possible, choose at random (according to the probability in natural text).
- Decoding (compression) is still unique; encoding (decompression) is not unique.
- This evens out the statistics in the stegogramme.

Is this practical? (Exercise)

To be discussed in groups of 2-4.

- How would you steganalyse a potential Huffman-based stegogramme?
- How practical is the steganalysis?
- How would you implement Huffman-based steganography?
- Which implementation issues/challenges do you foresee?

Grammars

Grammar

A grammar describes the structure of a language. A simple grammar:

sentence → noun verb
noun → Mr. Brown | Miss Scarlet
verb → eats | drinks

Each choice can map to a message symbol:

0: Mr. Brown, eats
1: Miss Scarlet, drinks

Two messages can be stego-encrypted. No cover-text is input.
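A sketch of the simple grammar as a stego-encoder (mine; it generalises the slide's mapping slightly, hiding one message bit in each grammar choice rather than one symbol in both):

    grammar = {
        "noun": ["Mr. Brown", "Miss Scarlet"],
        "verb": ["eats", "drinks"],
    }

    def embed(bits):
        # Each two-way grammar choice hides one message bit; no cover-text.
        noun_bit, verb_bit = bits
        return grammar["noun"][noun_bit] + " " + grammar["verb"][verb_bit] + "."

    def extract(sentence):
        # Parsing the sentence recovers the choices, and hence the bits.
        noun, verb = sentence.rstrip(".").rsplit(" ", 1)
        return [grammar["noun"].index(noun), grammar["verb"].index(verb)]

    s = embed([1, 0])
    print(s, extract(s))  # 'Miss Scarlet eats.' [1, 0]

A rule with k alternatives can hide log_2(k) bits per choice, which is why the larger grammar on the next slide has more capacity.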

More complex grammar

sentence → noun verb addition
noun → Mr. Brown | Miss Scarlet | . . . | Mrs. White
verb → eats | drinks | celebrates | . . . | cooks
addition → addition term | ∅
term → on Monday | in March | with Mr. Green | . . . | in Alaska | at home
general → sentence | question
question → Does noun verb addition?
xgeneral → general | sentence, because sentence


Discussion

- How practical is a grammar-based stego-system?
- Which implementation issues do you foresee?
- Can you visualise a grammar-variant for images?