Information Theory and Communications
CSM25 Secure Information Hiding

Dr Hans Georg Schaathun
University of Surrey
Spring 2007



Learning Outcomes

- Become familiar with fundamental concepts in communications:
  - Entropy and Redundancy
  - Error-control coding
  - Compression
- Be able to link communications fundamentals to steganography

Outline

1. Communications essentials
   - Communications and Redundancy
   - Digital Communications
   - Shannon Entropy
   - Security
   - Prediction
2. Compression
   - Recollection
   - Huffman Coding
   - Huffman Steganography
3. Grammars

Communications essentials: Communications and Redundancy

The communications problem

[Diagram: Alice's message m → Encoder → codeword c → Noisy channel → received word r → Decoder → estimate m̂, delivered to Bob]

Bob's problem: estimate m, given the (partly) random output r from the channel.

How much (un)certainty does Bob have about m? That question is the subject of information theory and Shannon entropy.


Redundancy of English

Fact: The English language is more than 50% redundant.

Even a heavily damaged text can be restored:

t** p*oce*s o**hid**g *ata**nsid* o*her**ata. For ex*****, a **xt f*le c**ld *** hid*** "in**de"****im*ge or***s**nd *ile* By look****at t*e im*g***or list***** to th**s**nd,*yo* w*u*d n*t *no**that***ere is *x*ra info******* *r*sent.

t*e p*oce*s o* hid**g *ata*insid* o*her*data. For ex*m***, a t*xt f*le c**ld b* hidd** "ins*de" a**im*ge or*a*s*und *ile* By look**g*at t*e im*g*,*or list**in* to th* s**nd,*yo* w*uld n*t *no**that *here is *x*ra info*****on *r*sent.

the process of hiding data inside other data. For example, a text file could be hidden "inside" an image or a sound file. By looking at the image, or listening to the sound, you would not know that there is extra information present.

(from http://www.cdt.org/crypto/glossary.shtml)

Even when part of the message is destroyed on the channel, redundancy allows Bob to determine the original m.

Benefits of redundancy

- Crossword puzzles
- Understanding foreigners with imperfect pronunciation
  (How much would you understand of a lecture without redundancy?)
- Hearing in a noisy environment
- Reading bad handwriting
  (How could I mark exam scripts without redundancy?)

Cryptanalysis? Steganalysis?


What if there were no redundancy?

- No use for steganography!
- Any text would be meaningful; in particular, ciphertext would be meaningful.
- Simple encryption would give a stegogramme indistinguishable from cover-text.

Problems in natural language

- Natural languages are arbitrary: some words/sentences have a lot of redundancy, others have very little.
- Unstructured: hard to automate correction.


Communications essentials: Digital Communications

Coding: Channel and source coding

Source coding (aka compression):
- Remove redundancy
- Make a compact representation

Channel coding (aka error-control coding):
- Add mathematically structured redundancy
- Computationally efficient error correction
- Optimised (low error rate, small space)

These are the two aspects of Information Theory.
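The slides contain no code, but a minimal sketch (my own illustration, not course material) makes the channel-coding idea concrete: a 3-fold repetition code adds structured redundancy, and majority voting gives computationally efficient error correction on a binary symmetric channel.

    import random

    def encode(bits):
        # Channel coding: add structured redundancy (3-fold repetition).
        return [b for b in bits for _ in range(3)]

    def channel(bits, p=0.1):
        # Binary symmetric channel: flip each bit with probability p.
        return [b ^ (random.random() < p) for b in bits]

    def decode(received):
        # Majority vote on each 3-bit block corrects any single flip.
        return [1 if sum(received[i:i + 3]) >= 2 else 0
                for i in range(0, len(received), 3)]

    m = [1, 0, 1, 1, 0]
    r = channel(encode(m))
    print(m, decode(r))  # decode(r) usually equals m when p is small

Real error-control codes achieve the same protection with far less added redundancy than tripling the message; the repetition code is only the simplest possible instance.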


Channel and Source Coding

[Diagram: Message → Compress (remove redundancy) → Encrypt (scramble) → Encode (add redundancy) → Channel → Decode → Decrypt → Decompress → Message]


Communications essentials: Shannon Entropy

Uncertainty (Shannon Entropy)

- m and r are stochastic variables (drawn at random from a distribution).
- How much uncertainty is there about the message m? Uncertainty is measured by entropy:
  - H(m) before any message is received;
  - H(m|r), the conditional entropy, after receipt of the message.
- Mutual information is derived from entropy:
  - I(m; r) = H(m) − H(m|r)
  - I(m; r) is the amount of information contained in r about m.
  - I(m; r) = I(r; m)
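These quantities can be computed numerically. A worked sketch of mine (not from the slides): a uniform binary message m is sent through a binary symmetric channel with flip probability 0.1, and H(m), H(m|r) and I(m; r) are evaluated from the joint distribution.

    from math import log2

    def H(dist):
        # Shannon entropy in bits of a distribution {outcome: probability}.
        return -sum(p * log2(p) for p in dist.values() if p > 0)

    p = 0.1  # channel flip probability
    joint = {(m, r): 0.5 * ((1 - p) if m == r else p)
             for m in (0, 1) for r in (0, 1)}
    marg_r = {r: joint[0, r] + joint[1, r] for r in (0, 1)}

    Hm = H({0: 0.5, 1: 0.5})           # H(m) = 1 bit
    Hm_given_r = H(joint) - H(marg_r)  # H(m|r) = H(m,r) - H(r), ~0.469 bits
    print(Hm - Hm_given_r)             # I(m; r) ~0.531 bits

With p = 0 the same computation gives I(m; r) = 1 bit, and with p = 0.5 it gives 0, matching the intuition that the channel output then says everything, or nothing, about m.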


Shannon entropy: Definition

For a random variable X taking values in a set 𝒳:

    H_q(X) = − Σ_{x ∈ 𝒳} Pr(X = x) · log_q Pr(X = x)

- Usually q = 2, giving entropy in bits; q = e (natural logarithm) gives entropy in nats.
- If Pr(X = x_i) = p_i for x_1, x_2, . . . ∈ 𝒳, we write H(X) = h(p_1, p_2, . . .).

Example: one question Q; yes/no with 50-50 probability:

    H(Q) = −2 · (1/2) · log_2 (1/2) = 1
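A direct transcription of the definition into code (a sketch of mine, not course material) reproduces the example:

    from math import e, log

    def entropy(probs, q=2):
        # H_q(X) = -sum over x of Pr(X=x) * log_q Pr(X=x); outcomes with
        # zero probability contribute nothing (0 * log 0 is taken as 0).
        return -sum(p * log(p, q) for p in probs if p > 0)

    print(entropy([0.5, 0.5]))       # 1.0: the 50-50 question Q, in bits
    print(entropy([0.9, 0.1]))       # ~0.47 bits: a more predictable question
    print(entropy([0.5, 0.5], q=e))  # ~0.693: the same H(Q) in nats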


Shannon entropy: Properties

1. Additivity: if X and Y are independent, then H(X, Y) = H(X) + H(Y).
   If you are uncertain about two completely different questions, the entropy is the sum of the uncertainty for each question.
2. If X is uniformly distributed, then H(X) increases when the size of 𝒳 increases.
   The more possibilities, the more uncertainty.
3. Continuity: h(p_1, p_2, . . .) is continuous in each p_i.

Shannon entropy is a measure in the mathematical sense.

What it tells us (Shannon entropy)

- Consider a message X of entropy k = H(X) (in bits).
- The average size of a file F describing X is at least k bits.
- If the size of F is exactly k bits on average, then we have found a perfect compression of X: each message bit contains one bit of information on average.


A banal example

A single bit may contain more than one bit of information. E.g. image compression:

- 0: Mona Lisa
- 10: Lenna
- 110: Baboon
- 11100: Peppers
- 11110: F-16
- 11101: Che Guevara
- 11111. . . : other images

However, on average, the maximum information in one bit is one bit (most of the time it is less).

The example is based on Huffman coding.


Communications essentials: Security

Cryptography

Alice encrypts m into the ciphertext c, which Bob decrypts back to m; Eve observes c.

- Eve seeks information about m, observing c.
- If I(m; c) > 0, then Eve succeeds in theory; likewise if I(k; c) > 0 (for the key k).
- If H(m|c) = H(m), then the system is absolutely secure.
- The above are strong statements: even if Eve has information, I(m; c) > 0, she may be unable to make sense of it.

Steganalysis

- Question: does Alice send secret information to Bob? Answer: X ∈ {yes, no}.
- What is the uncertainty H(X)?
- Eve intercepts a message S. Is there any information I(X; S)?
- If H(X|S) = H(X), then the system is absolutely secure.


Communications essentials: Prediction

Random sequences

- Text is a sequence of random samples (letters): (l_1, l_2, l_3, . . .), l_i ∈ A = {A, B, . . . , Z}.
- Each letter has a probability distribution P(l), l ∈ A.
- Statistical dependence (aka redundancy):
  - P(l_i | l_{i−1}) ≠ P(l_i)
  - H(l_i | l_{i−1}) < H(l_i): letter i − 1 contains information about l_i.
  - Use this information to guess l_i.
- The more letters l_{i−j}, . . . , l_{i−1} we have seen, the more reliably we can predict l_i.
- Wayner (Ch 6.1) gives examples of first-, second-, . . . , fifth-order prediction, using j = 0, 1, 2, 3, 4.
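A sketch of how such order-j statistics can be tabulated and used (my own minimal version, not Wayner's code): count the distribution of the next letter for every context of j preceding letters, then sample from it to generate order-(j+1) random text.

    import random
    from collections import Counter, defaultdict

    def build_model(text, j):
        # Tabulate P(l_i | l_{i-j} ... l_{i-1}) by counting j-letter contexts.
        model = defaultdict(Counter)
        for i in range(j, len(text)):
            model[text[i - j:i]][text[i]] += 1
        return model

    def generate(model, j, seed, n):
        # Sample one letter at a time, conditioned on the last j letters.
        out = seed
        for _ in range(n):
            counts = model[out[-j:]]
            if not counts:            # context never seen: stop early
                break
            letters, weights = zip(*counts.items())
            out += random.choices(letters, weights)[0]
        return out

    sample = "the process of hiding data inside other data " * 20
    model = build_model(sample, j=4)  # four letters of context: fifth order
    print(generate(model, 4, "the ", 40))

The higher the order, the more natural the output looks, but the table of contexts grows correspondingly large.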

[Figure slides: first-, second-, third- and fourth-order prediction examples from Wayner; the images are not preserved in this transcript.]

Compression: Recollection

Compression

F* is the set of binary strings of arbitrary length.

Definition: A compression system is a function c : F* → F* such that E(length(m)) > E(length(c(m))) when m is drawn from F*.

The compressed string is expected to be shorter than the original.

Definition: A compression c is perfect if all target strings are used, i.e. if for any m ∈ F*, c^{−1}(m) is a sensible file (cover-text).

Decompress a random string, and it makes sense!
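A quick, hedged check of the first definition with a real compressor (zlib; my example, not from the slides): redundant English-like text compresses to far below its original length, while random bytes do not.

    import os
    import zlib

    text = ("the process of hiding data inside other data " * 50).encode()
    rand = os.urandom(len(text))

    print(len(text), len(zlib.compress(text)))  # redundant text: much shorter
    print(len(rand), len(zlib.compress(rand)))  # random bytes: slightly longer

Note that zlib is not perfect in the second definition's sense: almost no random string is even a valid zlib stream, let alone decompresses to a sensible file. That gap is exactly what the construction on the next slide requires to be closed.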

Steganography by Perfect Compression (Anderson and Petitcolas 1998)

Requires:
- a perfect compression scheme, and
- a secure cipher.

[Diagram: Message → Encrypt (with Key) → C → Decompress → stego-text S → (on the channel) → Compress → C → Decrypt (with Key) → Message]

Steganography without data hiding.


Compression: Huffman Coding

Huffman Coding

- Short codewords for frequent symbols; long codewords for unusual symbols.
- Each code symbol (bit) should be equally probable.

[Diagram: a binary tree; branch 0 from the root leads to a 50% leaf (codeword 0), branch 1 leads to a node splitting into two 25% leaves (codewords 10 and 11).]
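A compact sketch of the standard Huffman construction (mine; the slide only shows the resulting tree): repeatedly merge the two least probable nodes. For the 50/25/25 distribution above it reproduces the codewords 0, 10 and 11.

    import heapq

    def huffman(probs):
        # Repeatedly merge the two least probable nodes; the first popped
        # subtree gets prefix '0', the second gets prefix '1'.
        heap = [(p, i, {sym: ""}) for i, (sym, p) in enumerate(probs.items())]
        heapq.heapify(heap)
        count = len(heap)  # tie-breaker so tuples never compare the dicts
        while len(heap) > 1:
            p0, _, c0 = heapq.heappop(heap)
            p1, _, c1 = heapq.heappop(heap)
            merged = {s: "0" + w for s, w in c0.items()}
            merged.update({s: "1" + w for s, w in c1.items()})
            heapq.heappush(heap, (p0 + p1, count, merged))
            count += 1
        return heap[0][2]

    print(huffman({"a": 0.5, "b": 0.25, "c": 0.25}))
    # {'a': '0', 'b': '10', 'c': '11'} -- the tree on this slide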

Example

[Diagram: a larger worked Huffman tree, partly garbled in this transcript; its leaves include probabilities 25%, 25%, 25% and 12½%, with 0/1 labels on every branch.]

Decoding

- Huffman codes are prefix free: no codeword is the prefix of another. This simplifies the decoding.
- This is expressed in the Huffman tree: follow edges for each coded bit; (only) a leaf node resolves to a message symbol.
- When a message symbol is recovered, start over for the next symbol.
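Because no codeword is a prefix of another, decoding needs no lookahead. A sketch (mine), assuming the code is given as a symbol-to-codeword dict like the one produced by huffman() above:

    def decode(bits, code):
        # Prefix-free decoding: grow the current word bit by bit; the moment
        # it matches a codeword we have reached a leaf, so emit the symbol
        # and start over for the next one.
        inverse = {w: s for s, w in code.items()}
        out, word = [], ""
        for b in bits:
            word += b
            if word in inverse:
                out.append(inverse[word])
                word = ""
        return "".join(out)

    print(decode("011010", {"a": "0", "b": "10", "c": "11"}))  # 'acab'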

Ideal Huffman code

- Each branch equally likely: P(b_i | b_{i−1}, b_{i−2}, . . .) = 1/2.
- Maximum entropy: H(B_i | B_{i−1}, B_{i−2}, . . .) = 1.
- A uniform distribution of compressed files implies perfect compression.
- In practice, the probabilities are rarely powers of 1/2, hence the Huffman code is imperfect.

Compression: Huffman Steganography

Reverse Huffman

Core Reading: Peter Wayner, Disappearing Cryptography, Ch. 6-7.

- Stego-encoder: Huffman decompression.
- Stego-decoder: Huffman compression.
- Is this similar to Anderson & Petitcolas' Steganography by Perfect Compression?
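A minimal sketch of the reverse-Huffman idea (my own simplification: one fixed letter code instead of Wayner's context-dependent fifth-order codes; the code table below is hypothetical). Embedding runs the message bits through Huffman decompression to produce letters; extraction compresses the letters back into bits.

    code = {"e": "0", "t": "10", "a": "110", "o": "111"}  # hypothetical code

    def embed(bits):
        # Stego-encoder = Huffman decompression: the message bits walk the
        # tree, and each leaf reached emits one letter of the stegogramme.
        inverse = {w: s for s, w in code.items()}
        out, word = [], ""
        for b in bits:
            word += b
            if word in inverse:
                out.append(inverse[word])
                word = ""
        return "".join(out)  # assumes the bits end on a codeword boundary

    def extract(stego):
        # Stego-decoder = Huffman compression: each letter maps to its bits.
        return "".join(code[letter] for letter in stego)

    s = embed("0101100111")
    print(s, extract(s))  # 'etaeo' and the original bit string

The message bits should look uniformly random (e.g. be encrypted first): as the previous slide noted, an ideal Huffman code assumes each bit is equally probable.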

The Stegogramme

- The stegogramme looks like random text:
  - use a probability distribution based on sample text;
  - higher-order statistics make it look natural.
- Fifth-order statistics is reasonable; higher order will look more natural.


Example: Fifth order

- For each 5-tuple of letters A_0, A_1, A_2, A_3, A_4: let l_{i−4}, . . . , l_i be consecutive letters in natural text, and tabulate P(l_i = A_0 | l_{i−j} = A_j, j = 1, 2, 3, 4).
- For each 4-tuple A_1, A_2, A_3, A_4, make an (approximate) Huffman code for A_0. We may omit some values of A_0, or have non-unique codewords.
- We encode a message by Huffman decompression, using the Huffman code determined by the last four stegogramme symbols, obtaining a fifth-order random text.

Example: Fifth order (continued)

Consider the four preceding letters "comp". The next letter may be:

letter:       r     e     l     a     o
probability:  40%   12%   22%   18%   8%
combined:     52% (r/e)    22% (l)    26% (a/o)
rounded:      50%          25%        25%

- Probabilities are rounded to powers of 1/2.
- Combining several letters reduces the rounding error.
- The example is arbitrary and fictitious.

Example: The Huffman code

Huffman code based on the fifth-order conditional probabilities:

[Diagram: branch 0 from the root leads to the group r/e (codeword 0); branch 1 leads to a node splitting into l (codeword 10) and the group a/o (codeword 11).]

- When two letters are possible, choose at random (according to the probability in natural text).
- Decoding (compression) is still unique; encoding (decompression) is not unique.
- This evens out the statistics in the stegogramme.

Is this practical? (Exercise)

To be discussed in groups of 2-4.

- How would you steganalyse a potential Huffman-based stegogramme?
- How practical is the steganalysis?
- How would you implement Huffman-based steganography?
- Which implementation issues/challenges do you foresee?

Grammars

Grammar

A grammar describes the structure of a language. A simple grammar:

sentence → noun verb
noun → Mr. Brown | Miss Scarlet
verb → eats | drinks

Each choice can map to a message symbol:

0: Mr. Brown, eats
1: Miss Scarlet, drinks

Two messages can be stego-encrypted. No cover-text is input.
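A sketch of the simple grammar as a stego-encoder (mine; it generalises the slide's mapping slightly, hiding one message bit in each grammar choice rather than one symbol in both):

    grammar = {
        "noun": ["Mr. Brown", "Miss Scarlet"],
        "verb": ["eats", "drinks"],
    }

    def embed(bits):
        # Each two-way grammar choice hides one message bit; no cover-text.
        noun_bit, verb_bit = bits
        return grammar["noun"][noun_bit] + " " + grammar["verb"][verb_bit] + "."

    def extract(sentence):
        # Parsing the sentence recovers the choices, and hence the bits.
        noun, verb = sentence.rstrip(".").rsplit(" ", 1)
        return [grammar["noun"].index(noun), grammar["verb"].index(verb)]

    s = embed([1, 0])
    print(s, extract(s))  # 'Miss Scarlet eats.' [1, 0]

A rule with k alternatives can hide log_2(k) bits per choice, which is why the larger grammar on the next slide has more capacity.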

More complex grammar

sentence → noun verb addition
noun → Mr. Brown | Miss Scarlet | . . . | Mrs. White
verb → eats | drinks | celebrates | . . . | cooks
addition → addition term | ∅
term → on Monday | in March | with Mr. Green | . . . | in Alaska | at home
general → sentence | question
question → Does noun verb addition?
xgeneral → general | sentence, because sentence


Discussion

- How practical is a grammar-based stego-system?
- Which implementation issues do you foresee?
- Can you visualise a grammar-variant for images?