
Page 1:

Information Theory

Nathanael Paul

Oct. 09, 2002

Page 2:

Claude Shannon: Father of Information Theory

“Communication Theory of Secrecy Systems” (1949): cryptography becomes a science

Why is information theory so important in cryptography?

Page 3:

Some Terms

A cryptosystem is a 5-tuple (P, C, K, E, D): plaintext space, ciphertext space, key space, encryption rules, decryption rules

Computational security: measured by the computational effort required to break the cryptosystem

Provable security: security is reduced to another, presumably difficult problem

Unconditional security: Oscar (the adversary) can do whatever he wants, as much as he wants

Page 4:

Applying probability to cryptography

Each plaintext x in P has a probability, and each key k in K has a probability. Given an x in P and a k in K, a y in C is uniquely determined. Given a k in K and a y in C, an x in P is uniquely determined.

These induce a probability distribution on the ciphertext space. For the equation below, y is fixed.
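The slide's equation is not reproduced here; in the standard formulation (following Stinson's treatment) the induced ciphertext distribution, for fixed y, is: pC(y) = sum over all k in K with y in C(k) of pK(k) * pP(dk(y)), where C(k) = { ek(x) : x in P } is the set of ciphertexts reachable under key k.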

Page 5:

Some probability theory…

Probability distribution on X

Joint probability

Conditional probability

Bayes’ Theorem

Page 6:

Probability Distribution of X

p(x): the probability function of X. X takes on a finite (or countably infinite) number of possible values x. Ex.: x is a letter in a substitution cipher, where X ranges over the plaintext space P. P(X = x) = p(x) >= 0

The sum below is over all possible values of x.
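The summed equation itself is not reproduced here; the standard normalization condition it refers to is: sum over all x of p(x) = 1.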

Page 7:

Joint Probability

Let X1 and X2 denote random variables

p(x1, x2) = P(X1 = x1, X2 = x2): “the probability that X1 will take on the value x1 and X2 will take on the value x2”

If X1 and X2 are independent, then p(x1,x2) = p(x1) * p(x2)

Page 8:

Conditional Probability

“What is the probability of x given y?”

p(x|y) = p(x,y)/p(y)

If p(X = x|Y = y) = p(X = x), then X and Y are independent.

Page 9:

Bayes’ Theorem

p(x, y) = p(x) * p(y | x) = p(y) * p(x | y)
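Dividing the identity above by p(y) (assuming p(y) > 0) gives the familiar form of Bayes' theorem, which is what lets us reason about plaintexts given ciphertexts: p(x | y) = p(x) * p(y | x) / p(y).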

Page 10:

Perfect Secrecy Defined

A cryptosystem (P,C,K,E,D) has perfect secrecy if

“ciphertext yields no information about plaintext”
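The defining equation is not reproduced here; the standard formalization of “yields no information” is: pP(x | y) = pP(x) for all x in P and y in C, i.e. the a posteriori probability of a plaintext given the ciphertext equals its a priori probability.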

Page 11:

Perfect Secrecy Defined

Suppose a cryptosystem (P,C,K,E,D) has |K| = |C| = |P|.

This cryptosystem has P.S. iff the following hold:

- Each key is chosen truly at random (each key with probability 1/|K|)
- For each x in P and y in C, there is a unique key k such that ek(x) = y.

Page 12:

Perfect Secrecy (P.S.) implies |P| <= |K| and |C| <= |K|

Claim: Perfect Secrecy (P.S.) implies |P| <= |K| and |C| <= |K|

Fix y in C with positive probability. Perfect secrecy gives pP(x | y) = pP(x) > 0 for every x in P. So for each x in P there exists a k in K with Ek(x) = y, since pP(x) > 0.
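The counting step is only implicit on the slide; a standard way to finish the argument: for the fixed y, two different plaintexts cannot use the same key (Ek(x) is a single value), so the map sending each x to a key kx with Ekx(x) = y is injective, and |P| <= |K|. Symmetrically, fix x with pP(x) > 0; by Bayes' theorem and perfect secrecy, p(y | x) = p(y) > 0 for every y in C, so each y equals Ek(x) for some key, distinct y's require distinct keys, and |C| <= |K|.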

Page 13:

Conclusion about Perfect Secrecy

“Key size should be at least as large as message size, and key size should be at least as large as ciphertext size.”

Page 14:

Perfect Secrecy Example

P = C = K = Z26 = {0,1,2,...,24,25}

Ek(x) = x + k mod 26, Dk(y) = y - k mod 26

p(k) = 1/26 for every key, and p(x) may be any given distribution. Note: the key must be truly random.
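A minimal Python sketch (mine, not from the slides) of this shift cipher, illustrating that under a uniformly random key every ciphertext value is about equally likely:

import secrets
from collections import Counter

def encrypt(x, k):
    # Ek(x) = x + k mod 26
    return (x + k) % 26

def decrypt(y, k):
    # Dk(y) = y - k mod 26
    return (y - k) % 26

# Decryption undoes encryption for every key.
assert all(decrypt(encrypt(7, k), k) == 7 for k in range(26))

# Encrypt a fixed plaintext under many uniformly random keys:
# each of the 26 possible ciphertexts appears roughly 1/26 of the time,
# so the ciphertext alone says nothing about the plaintext.
counts = Counter(encrypt(7, secrets.randbelow(26)) for _ in range(100000))
print(sorted(counts.values()))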

Page 15:

Entropy

Want to be able to measure the “uncertainty” or “information” of some random variable X.

Entropy: a measure of information. “How much information or uncertainty is in a cryptosystem?”

Page 16:

Entropy (cont.)

Given: X, a random variable taking a finite set of values, with probability distribution p1, ..., pn

Entropy is:
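The formula itself is not reproduced here; the standard definition, in the notation of the examples that follow, is: H(X) = -(p1 log2(p1) + p2 log2(p2) + ... + pn log2(pn)), where terms with pi = 0 are taken to contribute 0.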

Page 17:

Entropy examples

X: X1, X2

P: 1, 0

Entropy = 0, since there is no choice. X1 will happen 100% of the time. H(X) = 0.

X: X1, X2
P: ¾, ¼ (X1 is more likely than X2)
H(X) = -(¾ log2(¾) + ¼ log2(¼)) ≈ 0.81

Page 18:

Entropy examples (cont.)

X: X1, X2
P: ½, ½
H(X) = -(½ log2(½) + ½ log2(½)) = 1

X: X1, X2, ..., Xn
P: 1/n, 1/n, ..., 1/n
H(X) = -(n * (1/n) log2(1/n)) = log2(n)

Page 19:

Entropy examples (cont.)

If X is a random variable with n possible values: H(X) <= log2(n), with equality iff each value has equal probability (i.e. 1/n). By Jensen’s inequality, log2(n) provides an upper bound on H(X).

If X is the month of the year: H(X) = log2(12) ≈ 3.58 (about 4 bits needed to encode the month)
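A small Python sketch (mine, not from the slides) that reproduces the entropy numbers above and the log2(n) bound:

import math

def entropy(probs):
    # H(X) = -sum(pi * log2(pi)), skipping zero-probability outcomes
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy([1.0, 0.0]))      # 0.0, no uncertainty
print(entropy([0.75, 0.25]))    # ~0.81
print(entropy([0.5, 0.5]))      # 1.0, one fair coin flip
print(entropy([1/12] * 12))     # ~3.58 = log2(12), the months example
# Jensen's inequality: H(X) <= log2(n), with equality only for the uniform distribution.
assert entropy([0.9, 0.05, 0.05]) < math.log2(3)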

Page 20:

Unicity Distance

Assume that in a given cryptosystem a message is a string x1, x2, ..., xn, where each xi is in P (xi is a letter or block).

Encrypting each xi individually with the same key k,

yi = Ek(xi), 1 <= i <= n

How many ciphertext blocks, yi’s, do we need to determine k?

Page 21:

Unicity Distance (cont.)

Ciphertext-only attack with infinite computing power

Unicity distance: the smallest number n for which n ciphertexts (on average) uniquely determine the key. For the one-time pad it is infinite.

Page 22:

Defining a language

L: the set of all messages, n >= 1 ("the natural language")
P2: (x1, x2), with x1, x2 in P
Pn: (x1, x2, ..., xn), with each xi in P
Each Pn inherits a probability distribution from L (digrams, trigrams, ...), so H(Pn) makes sense.

Page 23:

Entropy and Redundancy of a language

What is the entropy of a language?

What is the redundancy of a language?

Page 24:

Application of Entropy and Redundancy

1 <= HL <= 1.5 for English; H(P) = 4.18, H(P2) = 3.90

RL = 1 - HL / log2(26), about 70% (the exact value depends on HL)

Page 25:

Unicity in substitution cipher

n0 = log2|K| / (RL * log2|P|)
|P| = 26
|K| = 26! (all permutations)

n0 = log2(26!) / (0.70 * log2(26)), which is about 26.8. Which means... on average, if one has about 27 letters of ciphertext from a substitution cipher, then one should have enough information to determine the key!
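A quick Python check of these numbers (a sketch that simply plugs the slide's RL = 0.70 into the formula; that constant is an estimate for English, not an exact value):

import math

key_bits = math.log2(math.factorial(26))   # log2|K| for all 26! permutations, ~88.4
letter_bits = math.log2(26)                # log2|P|, ~4.70
R_L = 0.70                                 # redundancy of English assumed on the slide
n0 = key_bits / (R_L * letter_bits)
print(n0)   # ~26.9, i.e. roughly 27 ciphertext letters suffice on average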

Page 26:

Ending notes...

Key equivocation: “How much information is revealed by the ciphertext about the key?” H(K|C) = H(K) + H(P) - H(C)
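A short derivation of this identity (standard, not spelled out on the slide): the ciphertext is determined by the key and plaintext, so H(K,P,C) = H(K,P); the plaintext is determined by the key and ciphertext, so H(K,P,C) = H(K,C); and the key and plaintext are chosen independently, so H(K,P) = H(K) + H(P). Therefore H(K|C) = H(K,C) - H(C) = H(K,P) - H(C) = H(K) + H(P) - H(C).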

Spurious keys: incorrect but possible keys

So reconsider our question:

“Why can’t cryptography and math be separated?”