
Page 1:

Information Theory

Nathanael Paul

Oct. 09, 2002

Page 2:

Claude Shannon: Father of Information Theory

“Communication Theory of Secrecy Systems” (1949): cryptography becomes a science

Why is information theory so important in cryptography?

Page 3:

Some Terms

A cryptosystem is a 5-tuple (P, C, K, E, D): plaintext space, ciphertext space, key space, encryption rules, decryption rules

Computational security: measured by the computational effort required to break the cryptosystem

Provable security: security is reduced to another, presumably difficult problem

Unconditional security: Oscar (the adversary) can do whatever he wants, as much as he wants

Page 4:

Applying probability to cryptography

Each plaintext x in P has a probability, and each key k in K has a probability. Given an x in P and a k in K, a y in C is uniquely determined. Given a k in K and a y in C, an x in P is uniquely determined.

These induce a probability distribution on the ciphertext space. For the equation below, y is fixed.
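The slide's equation is not reproduced here; in the standard formulation (following Stinson's treatment) the induced ciphertext distribution, for fixed y, is: pC(y) = sum over all k in K with y in C(k) of pK(k) * pP(dk(y)), where C(k) = { ek(x) : x in P } is the set of ciphertexts reachable under key k.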

Page 5:

Some probability theory…

Probability distribution on X

Joint probability

Conditional probability

Bayes’ Theorem

Page 6:

Probability Distribution of X

p(x): the probability function of X. X takes on a finite (or countably infinite) number of possible values x. Ex.: x is a letter in a substitution cipher, where X ranges over the plaintext space P. P(X = x) = p(x) >= 0

The sum below is over all possible values of x.
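The summed equation itself is not reproduced here; the standard normalization condition it refers to is: sum over all x of p(x) = 1.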

Page 7:

Joint Probability

Let X1 and X2 denote random variables

p(x1, x2) = P(X1 = x1, X2 = x2): “the probability that X1 will take on the value x1 and X2 will take on the value x2”

If X1 and X2 are independent, then p(x1,x2) = p(x1) * p(x2)

Page 8:

Conditional Probability

“What is the probability of x given y?”

p(x|y) = p(x,y)/p(y)

If p(X = x|Y = y) = p(X = x), then X and Y are independent.

Page 9:

Bayes’ Theorem

p(x, y) = p(x) * p(y | x) = p(y) * p(x | y)
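Dividing the identity above by p(y) (assuming p(y) > 0) gives the familiar form of Bayes' theorem, which is what lets us reason about plaintexts given ciphertexts: p(x | y) = p(x) * p(y | x) / p(y).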

Page 10:

Perfect Secrecy Defined

A cryptosystem (P,C,K,E,D) has perfect secrecy if

“ciphertext yields no information about plaintext”
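The defining equation is not reproduced here; the standard formalization of “yields no information” is: pP(x | y) = pP(x) for all x in P and y in C, i.e. the a posteriori probability of a plaintext given the ciphertext equals its a priori probability.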

Page 11:

Perfect Secrecy Defined

Suppose a cryptosystem (P,C,K,E,D) has |K| = |C| = |P|.

This cryptosystem has P.S. iff the following hold:

- Each key is chosen truly at random (each key with probability 1/|K|)
- For each x in P and y in C, there is a unique key k such that ek(x) = y.

Page 12:

Perfect Secrecy (P.S.) implies |P| <= |K| and |C| <= |K|

Claim: Perfect Secrecy (P.S.) implies |P| <= |K| and |C| <= |K|

Fix y in C with positive probability. Perfect secrecy gives pP(x | y) = pP(x) > 0 for every x in P. So for each x in P there exists a k in K with Ek(x) = y, since pP(x) > 0.
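The counting step is only implicit on the slide; a standard way to finish the argument: for the fixed y, two different plaintexts cannot use the same key (Ek(x) is a single value), so the map sending each x to a key kx with Ekx(x) = y is injective, and |P| <= |K|. Symmetrically, fix x with pP(x) > 0; by Bayes' theorem and perfect secrecy, p(y | x) = p(y) > 0 for every y in C, so each y equals Ek(x) for some key, distinct y's require distinct keys, and |C| <= |K|.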

Page 13:

Conclusion about Perfect Secrecy

“Key size should be at least as large as message size, and key size should be at least as large as ciphertext size.”

Page 14:

Perfect Secrecy Example

P = C = K = Z26 = {0,1,2,...,24,25}

Ek(x) = x + k mod 26, Dk(y) = y - k mod 26

p(k) = 1/26 for every key, and p(x) may be any given distribution. Note: the key must be truly random.
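A minimal Python sketch (mine, not from the slides) of this shift cipher, illustrating that under a uniformly random key every ciphertext value is about equally likely:

import secrets
from collections import Counter

def encrypt(x, k):
    # Ek(x) = x + k mod 26
    return (x + k) % 26

def decrypt(y, k):
    # Dk(y) = y - k mod 26
    return (y - k) % 26

# Decryption undoes encryption for every key.
assert all(decrypt(encrypt(7, k), k) == 7 for k in range(26))

# Encrypt a fixed plaintext under many uniformly random keys:
# each of the 26 possible ciphertexts appears roughly 1/26 of the time,
# so the ciphertext alone says nothing about the plaintext.
counts = Counter(encrypt(7, secrets.randbelow(26)) for _ in range(100000))
print(sorted(counts.values()))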

Page 15:

Entropy

Want to be able to measure the “uncertainty” or “information” of some random variable X.

Entropy: a measure of information. “How much information or uncertainty is in a cryptosystem?”

Page 16:

Entropy (cont.)

Given: X, a random variable taking a finite set of values, with probability distribution p1, ..., pn

Entropy is:
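The formula itself is not reproduced here; the standard definition, in the notation of the examples that follow, is: H(X) = -(p1 log2(p1) + p2 log2(p2) + ... + pn log2(pn)), where terms with pi = 0 are taken to contribute 0.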

Page 17:

Entropy examples

X: X1, X2

P: 1, 0

Entropy = 0, since there is no choice. X1 will happen 100% of the time. H(X) = 0.

X: X1, X2
P: ¾, ¼ (X1 is more likely than X2)
H(X) = -(¾ log2(¾) + ¼ log2(¼)) ≈ 0.81

Page 18:

Entropy examples (cont.)

X: X1, X2
P: ½, ½
H(X) = -(½ log2(½) + ½ log2(½)) = 1

X: X1, X2, ..., Xn
P: 1/n, 1/n, ..., 1/n
H(X) = -(n * (1/n) log2(1/n)) = log2(n)

Page 19:

Entropy examples (cont.)

If X is a random variable with n possible values: H(X) <= log2(n), with equality iff each value has equal probability (i.e. 1/n). By Jensen’s inequality, log2(n) provides an upper bound on H(X).

If X is the month of the year: H(X) = log2(12) ≈ 3.58 (about 4 bits needed to encode the month)
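A small Python sketch (mine, not from the slides) that reproduces the entropy numbers above and the log2(n) bound:

import math

def entropy(probs):
    # H(X) = -sum(pi * log2(pi)), skipping zero-probability outcomes
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy([1.0, 0.0]))      # 0.0, no uncertainty
print(entropy([0.75, 0.25]))    # ~0.81
print(entropy([0.5, 0.5]))      # 1.0, one fair coin flip
print(entropy([1/12] * 12))     # ~3.58 = log2(12), the months example
# Jensen's inequality: H(X) <= log2(n), with equality only for the uniform distribution.
assert entropy([0.9, 0.05, 0.05]) < math.log2(3)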

Page 20:

Unicity Distance

Assume that in a given cryptosystem a message is a string x1, x2, ..., xn, where each xi is in P (xi is a letter or block).

Encrypting each xi individually with the same key k,

yi = Ek(xi), 1 <= i <= n

How many ciphertext blocks, yi’s, do we need to determine k?

Page 21:

Unicity Distance (cont.)

Ciphertext-only attack with infinite computing power

Unicity distance: the smallest number n for which n ciphertexts (on average) uniquely determine the key. For the one-time pad it is infinite.

Page 22:

Defining a language

L: the set of all messages, n >= 1 ("the natural language")
P2: (x1, x2), with x1, x2 in P
Pn: (x1, x2, ..., xn), with each xi in P
Each Pn inherits a probability distribution from L (digrams, trigrams, ...), so H(Pn) makes sense.

Page 23:

Entropy and Redundancy of a language

What is the entropy of a language?

What is the redundancy of a language?

Page 24:

Application of Entropy and Redundancy

1 <= HL <= 1.5 for English; H(P) = 4.18, H(P2) = 3.90

RL = 1 - HL / log2(26), about 70% (the exact value depends on HL)

Page 25:

Unicity in substitution cipher

n0 = log2|K| / (RL * log2|P|)
|P| = 26
|K| = 26! (all permutations)

n0 = log2(26!) / (0.70 * log2(26)), which is about 26.8. Which means... on average, if one has about 27 letters of ciphertext from a substitution cipher, then one should have enough information to determine the key!
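A quick Python check of these numbers (a sketch that simply plugs the slide's RL = 0.70 into the formula; that constant is an estimate for English, not an exact value):

import math

key_bits = math.log2(math.factorial(26))   # log2|K| for all 26! permutations, ~88.4
letter_bits = math.log2(26)                # log2|P|, ~4.70
R_L = 0.70                                 # redundancy of English assumed on the slide
n0 = key_bits / (R_L * letter_bits)
print(n0)   # ~26.9, i.e. roughly 27 ciphertext letters suffice on average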

Page 26:

Ending notes...

Key equivocation: “How much information is revealed by the ciphertext about the key?” H(K|C) = H(K) + H(P) - H(C)
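A short derivation of this identity (standard, not spelled out on the slide): the ciphertext is determined by the key and plaintext, so H(K,P,C) = H(K,P); the plaintext is determined by the key and ciphertext, so H(K,P,C) = H(K,C); and the key and plaintext are chosen independently, so H(K,P) = H(K) + H(P). Therefore H(K|C) = H(K,C) - H(C) = H(K,P) - H(C) = H(K) + H(P) - H(C).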

Spurious keys: incorrect but possible keys

So reconsider our question:

“Why can’t cryptography and math be separated?”