DCSP-8: Minimal length coding II, Hamming distance, Encryption
Jianfeng Feng
[email protected]
http://www.dcs.warwick.ac.uk/~feng/dsp.html


Huffman coding

The code in Table 1, however, is an instantaneously parsable code.

It satisfies the prefix condition: no codeword is a prefix of any other codeword.
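As a rough illustration of the prefix condition, the Python sketch below checks whether a set of binary codewords is prefix-free. The function name `is_prefix_free` and the example code sets are our own assumptions; the code of Table 1 is not reproduced here.

```python
def is_prefix_free(codewords):
    """Return True if no codeword is a prefix of another (the prefix condition)."""
    words = sorted(codewords)
    # After lexicographic sorting, a word that is a prefix of any later word
    # is also a prefix of its immediate successor, so adjacent pairs suffice.
    for shorter, longer in zip(words, words[1:]):
        if longer.startswith(shorter):  # duplicates also fail this test
            return False
    return True


# Hypothetical examples (not the lecture's Table 1):
print(is_prefix_free(["0", "10", "110", "111"]))  # True: instantaneously parsable
print(is_prefix_free(["0", "01", "11"]))          # False: "0" is a prefix of "01"
```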

• Average codeword length: $\bar{L} = 0.729\times 1 + 3\times(0.081\times 3) + 3\times(0.009\times 5) + 0.001\times 5 = 1.598$

Decoding

1 1 1 0 1 1 0 1 0 1 1 0 0 0 0 0 0 0 0 0 1

Why do we require a code with the shortest average length?
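Decoding an instantaneously parsable code needs no look-ahead: read bits until they match a codeword, emit its symbol, and start again. The sketch below assumes a hypothetical code for A to H with lengths 1, 3, 3, 3, 5, 5, 5, 5 (the lengths in the average-length sum above, but not necessarily the codewords of Table 1), so it does not decode the bit stream on the slide.

```python
# Hypothetical prefix code for A..H (an assumption, not the lecture's Table 1).
CODE = {"A": "0", "B": "100", "C": "101", "D": "110",
        "E": "11100", "F": "11101", "G": "11110", "H": "11111"}


def encode(message, code):
    """Concatenate the codeword of each symbol."""
    return "".join(code[symbol] for symbol in message)


def decode(bits, code):
    """Decode a bit string symbol by symbol using a prefix code."""
    lookup = {word: symbol for symbol, word in code.items()}
    symbols, current = [], ""
    for bit in bits:
        current += bit
        if current in lookup:  # prefix condition: the first match is the codeword
            symbols.append(lookup[current])
            current = ""
    if current:
        raise ValueError(f"leftover bits {current!r} do not form a codeword")
    return "".join(symbols)


bits = encode("ABHA", CODE)
print(bits)                # 0100111110
print(decode(bits, CODE))  # ABHA
```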

The derivation of the Huffman code tree, and the tree itself, are shown in the accompanying figures.

In both figures, the letters A to H are used in place of the sequences in Table 2 to make them easier to read.
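The tree construction can be sketched in a few lines of Python: repeatedly merge the two least probable nodes, each merge adding one bit to every codeword beneath it. For illustration we assume that A has probability 0.729, B to D have 0.081 each, E to G have 0.009 each and H has 0.001, matching the terms of the average-length sum; the letter assignment and function name are our assumptions.

```python
import heapq
from itertools import count


def huffman_lengths(probs):
    """Return {symbol: codeword length} for a binary Huffman code."""
    tiebreak = count()  # unique integers so heapq never compares the dicts
    heap = [(p, next(tiebreak), {symbol: 0}) for symbol, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, group1 = heapq.heappop(heap)  # least probable node
        p2, _, group2 = heapq.heappop(heap)  # second least probable node
        # Merging puts every symbol under the new node one level deeper.
        merged = {s: depth + 1 for s, depth in {**group1, **group2}.items()}
        heapq.heappush(heap, (p1 + p2, next(tiebreak), merged))
    return heap[0][2]


# Assumed probabilities for the sequences labelled A..H.
probs = {"A": 0.729, "B": 0.081, "C": 0.081, "D": 0.081,
         "E": 0.009, "F": 0.009, "G": 0.009, "H": 0.001}
lengths = huffman_lengths(probs)
average = sum(p * lengths[s] for s, p in probs.items())
print(sorted(lengths.values()))  # [1, 3, 3, 3, 5, 5, 5, 5]
print(f"{average:.3f}")          # 1.598
```

Ties between equal probabilities may be broken differently, giving different trees, but every Huffman code for these probabilities has the same average length.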

Frequency of the letters of the alphabet

Turbo coding

• Uses Bayes' theorem to code and decode

• Bayes' theorem, in essence, says that we should exploit prior knowledge as much as possible

• Read about it yourself
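As a small illustration of Bayes' theorem in decoding (not a turbo decoder), the sketch below computes the posterior probability that a 1 was sent over a binary channel that flips each bit with probability `flip_prob`, given a prior probability `prior_one` that a 1 was sent. The function, parameter names and numbers are our assumptions.

```python
def posterior_one(prior_one, flip_prob, received):
    """P(sent = 1 | received bit) by Bayes' theorem, for a channel that flips bits with flip_prob."""
    like_one = 1 - flip_prob if received == 1 else flip_prob   # P(received | sent 1)
    like_zero = flip_prob if received == 1 else 1 - flip_prob  # P(received | sent 0)
    evidence = like_one * prior_one + like_zero * (1 - prior_one)
    return like_one * prior_one / evidence


# With no prior preference, the received bit is believed with probability 1 - flip_prob.
print(round(posterior_one(0.5, 0.1, 1), 3))   # 0.9
# A strong prior that 0 was sent outweighs a received 1.
print(round(posterior_one(0.05, 0.1, 1), 3))  # 0.321
```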

Channel coding; Hamming distance

The task of source coding is to represent the source information with the minimum of symbols.

When a code is transmitted over a channel in the presence of noise, errors will occur.

The task of channel coding is to represent the source information in a manner that minimises the error probability in decoding.

It is apparent that channel coding requires the use of redundancy.

If every possible output of the channel corresponds uniquely to a source input, there is no possibility of detecting errors in the transmission.

To detect, and possibly correct errors, the channel code sequence must be longer than the source sequence.

A good channel code is designed so that, if a few errors occur in transmission, the output can still be decoded to the correct input.

This is possible because, although incorrect, the output is sufficiently similar to the input to be recognisable.

The idea of similarity is made precise by the definition of the Hamming distance.

Let x and y be two binary sequences of the same length.

The Hamming distance between these two sequences is the number of positions in which their symbols disagree.

Two example distances: 0100->1001 has distance 3 (red path); 0110->1110 has distance 1 (blue path)

The Hamming distance between 1011101 and 1001001 is 2.

The Hamming distance between 2143896 and 2233796 is 3.

The Hamming distance between "toned" and "roses" is 3.
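The definition translates directly into code. This sketch (the function name is ours) counts disagreeing positions and reproduces the three examples above; it works for bit strings, digit strings and words alike.

```python
def hamming_distance(x, y):
    """Number of positions at which two equal-length sequences differ."""
    if len(x) != len(y):
        raise ValueError("sequences must have the same length")
    return sum(a != b for a, b in zip(x, y))


print(hamming_distance("1011101", "1001001"))  # 2
print(hamming_distance("2143896", "2233796"))  # 3
print(hamming_distance("toned", "roses"))      # 3
```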

Suppose the codeword x is transmitted over the channel.

Due to errors, y is received.

The decoder assigns to y the codeword x that minimises the Hamming distance between x and y.
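Minimum-distance decoding can be sketched as a search over the codebook. The four five-bit codewords below form a hypothetical code chosen for illustration (any two of them differ in at least 3 positions); the helper names are ours. Ties are resolved in favour of the codeword listed first.

```python
def hamming_distance(x, y):
    """Number of positions at which two equal-length sequences differ."""
    return sum(a != b for a, b in zip(x, y))


def nearest_codeword(received, codebook):
    """Return the codeword with the smallest Hamming distance to `received`."""
    return min(codebook, key=lambda c: hamming_distance(c, received))


codebook = ["00000", "01011", "10101", "11110"]  # hypothetical code
received = "01001"                               # "01011" with one bit flipped
print(nearest_codeword(received, codebook))      # 01011
```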

It can be shown that to detect n bit errors, a coding scheme requires the use of codewords with a Hamming distance of at least n+1.

It can also be shown that correcting n bit errors requires a coding scheme with a Hamming distance of at least 2n+1 between the codewords.

By designing a good code, we try to ensure that the Hamming distance between possible codewords x is larger than the Hamming distance arising from errors.
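Given a code's minimum Hamming distance d, the two results above guarantee detection of up to d - 1 bit errors and correction of up to (d - 1) // 2. The sketch below (names ours) applies this to a ten-bit repetition code {0000000000, 1111111111} and to a small hypothetical five-bit code.

```python
from itertools import combinations


def hamming_distance(x, y):
    """Number of positions at which two equal-length sequences differ."""
    return sum(a != b for a, b in zip(x, y))


def minimum_distance(codebook):
    """Smallest Hamming distance between any two distinct codewords."""
    return min(hamming_distance(a, b) for a, b in combinations(codebook, 2))


def error_capability(codebook):
    """(bit errors guaranteed detectable, bit errors guaranteed correctable)."""
    d = minimum_distance(codebook)
    # Detecting n errors needs d >= n + 1; correcting n errors needs d >= 2n + 1.
    return d - 1, (d - 1) // 2


repetition = ["0000000000", "1111111111"]
small_code = ["00000", "01011", "10101", "11110"]  # hypothetical code
print(error_capability(repetition))  # (9, 4)
print(error_capability(small_code))  # (2, 1)
```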

Example ten-bit words: 1111111111, 0000000000, 0000011111, 0100100100

Channel Capacity

One of the most famous of all results of information theory is Shannon's channel capacity theorem.

For a given channel there exists a code that permits transmission across the channel at any rate R < C, the channel capacity, with an arbitrarily small probability of error (loosely, error-free transmission).

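$C = B\log_2(1 + S/N)$ bits/s, where B is the channel bandwidth in hertz and S/N is the signal-to-noise power ratio (the Shannon-Hartley formula for a channel with additive white Gaussian noise).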
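A quick numerical sketch of the formula. The bandwidth and signal-to-noise values are illustrative assumptions, and a signal-to-noise ratio quoted in decibels must first be converted to a power ratio.

```python
import math


def db_to_ratio(snr_db):
    """Convert a signal-to-noise ratio in dB to a power ratio."""
    return 10 ** (snr_db / 10)


def shannon_capacity(bandwidth_hz, snr_ratio):
    """C = B log2(1 + S/N) in bits per second."""
    return bandwidth_hz * math.log2(1 + snr_ratio)


# Assumed example: a 3000 Hz channel with a 30 dB signal-to-noise ratio.
c = shannon_capacity(3000, db_to_ratio(30))
print(round(c))  # 29902
```

Note that capacity grows only logarithmically with S/N, but linearly with bandwidth.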

As we have already noted, the astonishing part of the theory is the existence of a channel capacity.

Shannon's theorem is both tantalizing and frustrating.

It offers error-free transmission, but it makes no statement as to what code is required.

In fact, all we may deduce from the proof of the theorem is that the code must be a long one.

No one has yet found a practical code that permits the use of a channel exactly at its capacity, although modern codes such as turbo codes come close.

However, Shannon has thrown down the gauntlet, inasmuch as he has proved that the code exists.

We shall not give a description of how the capacity is calculated.

However, an example is instructive.

The binary symmetric channel is a channel with a binary input and a binary output.

Associated with each output is a probability p that the output is in error, and a probability 1-p that it is correct.

For such a channel, the channel capacity turns out to be:

$$C = 1 + p\log_2 p + (1-p)\log_2(1-p)$$

Here, p is the bit error probability.

• If p=0, then C=1.

• If p=0.5, then C=0.

Thus, if a 1 and a 0 are equally likely to be received irrespective of the signal sent, the channel is completely unreliable and no message can be sent across it.
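The capacity formula is easy to evaluate, taking 0 log2 0 = 0 at the end points. A minimal sketch (function name ours) confirming the two limiting cases above and one intermediate value:

```python
import math


def bsc_capacity(p):
    """C = 1 + p log2 p + (1 - p) log2(1 - p), in bits per channel use (0 log 0 = 0)."""
    if not 0 <= p <= 1:
        raise ValueError("p must be a probability")

    def x_log2_x(x):
        return x * math.log2(x) if x > 0 else 0.0

    return 1 + x_log2_x(p) + x_log2_x(1 - p)


print(bsc_capacity(0.0))            # 1.0
print(bsc_capacity(0.5))            # 0.0
print(round(bsc_capacity(0.1), 3))  # 0.531
```

Even a 10% bit error probability still leaves just over half a bit of capacity per channel use.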

So defined, the channel capacity is a non-dimensional number.

We normally quote the capacity as a rate, in bits/second.

To do this we relate each output to a change in the signal.

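For the binary symmetric channel we have $C = B\,[1 + p\log_2 p + (1-p)\log_2(1-p)]$, where B is the bit rate (signal changes per second).

We note that $C \le B$, i.e. the capacity never exceeds the bit rate, with equality only when p = 0 or p = 1.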

Error detection coding

A very common code is the single parity check code.

This code appends to each block of K data bits an additional bit whose value is chosen to make the number of 1s in the (K+1)-bit word even or odd.

Such a choice is said to have even (odd) parity.

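With even (odd) parity, a single bit error will make the received word odd (even), so the error is detected. More generally, any odd number of bit errors is detected, while any even number goes undetected.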
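A minimal sketch of a single parity check on bit strings (function names ours). It shows a single error being caught and a double error slipping through.

```python
def add_parity(bits, even=True):
    """Append a parity bit so the number of 1s is even (or odd if even=False)."""
    ones = bits.count("1")
    parity = ones % 2 if even else 1 - ones % 2
    return bits + str(parity)


def parity_ok(word, even=True):
    """True if the received word still has the agreed parity."""
    return word.count("1") % 2 == (0 if even else 1)


def flip(word, *positions):
    """Flip the bits at the given positions (to simulate channel errors)."""
    bits = list(word)
    for i in positions:
        bits[i] = "1" if bits[i] == "0" else "0"
    return "".join(bits)


word = add_parity("1011001")        # four 1s, so even parity appends 0
print(word)                         # 10110010
print(parity_ok(flip(word, 2)))     # False: a single error is detected
print(parity_ok(flip(word, 0, 1)))  # True: a double error goes unnoticed
```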

To see how the addition of a parity bit can improve error performance, consider the following example.

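A common choice of block length is eight bits.

Suppose that the bit error rate is $p = 10^{-4}$. Then the probability that an 8-bit block contains at least one error is $1 - (1-p)^8 \approx 8p = 8\times 10^{-4}$.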

So, the probability that a block is received with an error is as above.

With the addition of a parity bit we can detect any single bit error.

As can be seen, the addition of a parity bit has reduced the undetected error rate by about three orders of magnitude.
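The orders-of-magnitude claim can be checked numerically. This sketch assumes independent bit errors with p = 1e-4 and 8 data bits plus 1 even-parity bit; a single parity check misses exactly the error patterns with an even, nonzero number of flipped bits. Function names are ours.

```python
from math import comb


def prob_block_error(n, p):
    """Probability that an n-bit block has at least one bit error."""
    return 1 - (1 - p) ** n


def prob_undetected(n, p):
    """Probability that an n-bit word with one parity bit has an undetected error
    (an even, nonzero number of bit errors)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(2, n + 1, 2))


p = 1e-4
without_parity = prob_block_error(8, p)  # about 8.0e-4
with_parity = prob_undetected(9, p)      # about 3.6e-7
print(f"{without_parity:.2e} {with_parity:.2e} ratio {without_parity / with_parity:.0f}")
```

The ratio is a little over 2000, i.e. roughly three orders of magnitude.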

Single parity bits are common in asynchronous transmission.

Where synchronous transmission is used, additional parity symbols are added that check not only the parity of each 8-bit row, but also the parity of each 8-bit column.

The columns are formed by listing each successive 8-bit word one beneath the other.

This type of parity checking is called block sum checking; it can correct any single-bit error and detect any two-bit error in the transmitted block of rows and columns.

However, there are some combinations of errors that will go undetected in such a scheme.
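A minimal sketch of block sum checking with even parity, assuming the row and column parity bits themselves arrive intact (a simplification); the function names are ours. A single flipped data bit is located where the failing row and failing column cross, while two flips in the same row are detected but cannot be located.

```python
def block_parity(rows):
    """Even-parity bits for each row and each column of a block of equal-length bit rows."""
    row_parity = [sum(row) % 2 for row in rows]
    col_parity = [sum(col) % 2 for col in zip(*rows)]
    return row_parity, col_parity


def check_block(rows, row_parity, col_parity):
    """Return None if all checks pass, or (row, col) of a single flipped data bit."""
    new_rows, new_cols = block_parity(rows)
    bad_rows = [i for i, (a, b) in enumerate(zip(new_rows, row_parity)) if a != b]
    bad_cols = [j for j, (a, b) in enumerate(zip(new_cols, col_parity)) if a != b]
    if not bad_rows and not bad_cols:
        return None
    if len(bad_rows) == 1 and len(bad_cols) == 1:
        return bad_rows[0], bad_cols[0]
    raise ValueError("error detected but not locatable as a single data-bit error")


data = [[1, 0, 1, 1, 0, 0, 1, 0],
        [0, 1, 1, 0, 1, 0, 0, 1],
        [1, 1, 0, 0, 0, 1, 1, 0]]
row_p, col_p = block_parity(data)

received = [row[:] for row in data]
received[1][5] ^= 1                 # one bit flipped in transit
r, c = check_block(received, row_p, col_p)
received[r][c] ^= 1                 # correct it
print((r, c), received == data)     # (1, 5) True
```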

Parity checking in this way provides good protection against single and multiple errors when errors occur independently.

However, in many circumstances, errors occur in groups, or bursts.

Parity checking of the kind just described then provides little protection.

In these circumstances, a polynomial code, also known as a cyclic redundancy check (CRC), is used.
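A polynomial code treats a block of bits as the coefficients of a polynomial with modulo-2 arithmetic and appends the remainder of dividing it by a generator polynomial. The sketch below uses the generator 10011 (x^4 + x + 1), a common textbook choice rather than one specified in this lecture; the function names are ours.

```python
def mod2_remainder(dividend, generator):
    """Remainder of bit-string polynomial division using modulo-2 (XOR) arithmetic."""
    r = len(generator) - 1
    work = [int(b) for b in dividend]
    gen = [int(b) for b in generator]
    for i in range(len(work) - r):
        if work[i]:  # leading term present: subtract (XOR) the shifted generator
            for j, g in enumerate(gen):
                work[i + j] ^= g
    return "".join(str(b) for b in work[-r:])


def crc_encode(message, generator):
    """Append r check bits, where r is the degree of the generator."""
    r = len(generator) - 1
    return message + mod2_remainder(message + "0" * r, generator)


G = "10011"                          # x^4 + x + 1 (assumed generator)
frame = crc_encode("1101011011", G)
print(frame)                         # 11010110111110
print(mod2_remainder(frame, G))      # 0000: no error detected

bits = list(frame)
for k in (3, 4, 5):                  # a burst of three flipped bits
    bits[k] = "1" if bits[k] == "0" else "0"
corrupted = "".join(bits)
print(mod2_remainder(corrupted, G))  # non-zero: the burst is detected
```

A generator of degree r detects every burst of length r or less, which is why polynomial codes suit bursty channels.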

Encryption

In all our discussion of coding, we have not mentioned what is popularly supposed to be the purpose of coding: security.

We have only considered coding as a mechanism for improving the integrity of the communication system in the presence of noise.

The use of coding for security has a different name: encryption.

Encryption is the process of obscuring information to make it unreadable without special knowledge.

The use of digital computers has made highly secure communication a normal occurrence.

Early examples:

Caesar cipher (here each letter is shifted six places forward in the alphabet):

Plain text: Yet it may be roundly asserted that human ingenuity cannot concoct a cipher which human ingenuity cannot resolve

Cipher: Ekz oz sge hk xuatjre gyykxzkj zngz nasgt otmktaoze igttuz iutiuiz g iovnkx cnoin nasgt otmktaoze igttuz xkyurbk
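The shift cipher is easy to reproduce. This sketch (function name ours) shifts letters by a given amount and leaves spaces unchanged; with a shift of 6 it turns the plain text above into cipher text, and a shift of -6 reverses it.

```python
def caesar(text, shift):
    """Shift each ASCII letter `shift` places along the alphabet, preserving case."""
    out = []
    for ch in text:
        if ch.isascii() and ch.isalpha():
            base = ord("A") if ch.isupper() else ord("a")
            out.append(chr((ord(ch) - base + shift) % 26 + base))
        else:
            out.append(ch)  # spaces and punctuation pass through unchanged
    return "".join(out)


print(caesar("Yet it may be", 6))   # Ekz oz sge hk
print(caesar("Ekz oz sge hk", -6))  # Yet it may be
```

With only 25 non-trivial shifts, simply trying them all breaks the cipher at once.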

Hill cipher

• Associate a number with each letter of the alphabet

• Use a matrix, rather than a single number, to encode the letters pairwise, e.g. by multiplying each pair of letter numbers by a 2 × 2 key matrix modulo 26
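A minimal Hill-cipher sketch, assuming letters are numbered A = 0, ..., Z = 25 and each pair is treated as a row vector multiplied by a 2 × 2 key matrix modulo 26. The key [[11, 8], [3, 7]] is an illustrative choice of ours; a key is usable only if its determinant has an inverse modulo 26.

```python
def _to_nums(text):
    return [ord(ch) - ord("A") for ch in text]


def _to_text(nums):
    return "".join(chr(n + ord("A")) for n in nums)


def hill_apply(text, key):
    """Multiply each letter pair (row vector) by the 2x2 key matrix, modulo 26.
    Assumes `text` contains letters only."""
    nums = _to_nums(text.upper())
    if len(nums) % 2:
        nums.append(ord("X") - ord("A"))  # pad odd-length text with 'X' (our choice)
    out = []
    for x1, x2 in zip(nums[0::2], nums[1::2]):
        out.append((x1 * key[0][0] + x2 * key[1][0]) % 26)
        out.append((x1 * key[0][1] + x2 * key[1][1]) % 26)
    return _to_text(out)


def inverse_key(key):
    """Inverse of a 2x2 matrix modulo 26 (raises ValueError if none exists)."""
    (a, b), (c, d) = key
    det_inv = pow((a * d - b * c) % 26, -1, 26)  # modular inverse, Python 3.8+
    return [[d * det_inv % 26, -b * det_inv % 26],
            [-c * det_inv % 26, a * det_inv % 26]]


key = [[11, 8], [3, 7]]
cipher = hill_apply("JULY", key)
print(cipher)                                # DELW
print(hill_apply(cipher, inverse_key(key)))  # JULY
```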

The basis for key-based encryption is that it is very much easier to encrypt with knowledge of the key than it is to decipher without knowledge of the key.

Secret key cryptography: uses a single secret key for both encryption and decryption.
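To make the single-shared-key idea concrete, here is a deliberately toy sketch: XOR with a repeating secret key, where the same function and the same key both encrypt and decrypt. It is our illustrative assumption and is not secure (a repeating key leaks patterns); real systems use vetted, standardised ciphers.

```python
import secrets


def xor_cipher(data: bytes, key: bytes) -> bytes:
    """Toy secret-key cipher: XOR each byte with the repeating key (NOT secure)."""
    if not key:
        raise ValueError("key must not be empty")
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


key = secrets.token_bytes(16)              # shared secret held by both parties
ciphertext = xor_cipher(b"attack at dawn", key)
print(xor_cipher(ciphertext, key))         # b'attack at dawn'
```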

• Public key cryptography, also known as asymmetric or matched-key cryptography, is a form of cryptography in which a user has a pair of cryptographic keys: a public key and a private key.

The private key is kept secret, while the public key may be widely distributed.

The keys are related mathematically, but the private key cannot be practically derived from the public key.

A message encrypted with the public key can only be decrypted with the corresponding private key.

The public key is used by the sender to encrypt the message.

This message is unintelligible to anyone not in possession of the second, private key.

In this way the private key need not be transferred.

The most famous such scheme is the RSA public-key mechanism, based on the work of Rivest, Shamir and Adleman.

It relies on multiplying extremely large numbers: the product is easy to compute, but recovering the factors is, with current technology, computationally very expensive.

It is based on a branch of mathematics: number theory.

With each symbol replaced by its number in the dictionary, (P, R) the public key and Q the private key:

$$\text{encoded symbol} = (\text{symbol})^{P} \bmod R$$

$$\text{decoded symbol} = (\text{encoded symbol})^{Q} \bmod R$$

RSA numbers are composite numbers having exactly two prime factors that have been listed in the Factoring Challenge of RSA Security® and have been particularly chosen to be difficult to factor.

While RSA numbers are much smaller than the largest known primes, their factorization is significant because of the curious property of numbers that proving or disproving a number to be prime ("primality testing") seems to be much easier than actually identifying the factors of a number ("prime factorization").

Thus, while it is trivial to multiply two large numbers together, it can be extremely difficult to determine the factors if only their product is given.

With some ingenuity, this property can be used to create practical and efficient encryption systems for electronic data.
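The asymmetry can be felt even at toy scale. Multiplying is a single operation, whereas the naive trial-division factoriser sketched below (our illustration, far slower than the sieve methods mentioned later) may need up to about √n divisions, which is hopeless for numbers with hundreds of digits.

```python
def trial_division(n):
    """Split a composite n >= 4 into (smallest prime factor, cofactor) by trial division."""
    if n < 4:
        raise ValueError("n must be a composite number >= 4")
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d, n // d
        d += 1
    raise ValueError(f"{n} is prime")


print(61 * 53)               # 3233: multiplying is immediate
print(trial_division(3233))  # (53, 61): found after about 50 trial divisions
```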

RSA Laboratories sponsors the RSA Factoring Challenge to encourage research into computational number theory and the practical difficulty of factoring large integers, and because it can help users of the RSA public-key encryption algorithm choose suitable key lengths for an appropriate level of security.

A cash prize is awarded to the first person to factor each challenge number.

RSA numbers were originally spaced at intervals of 10 decimal digits between 100 and 500 digits, and prizes were awarded according to a complicated formula.

These original numbers were named according to the number of decimal digits, so RSA-100 was a hundred-digit number.

As computers and algorithms became faster, the unfactored challenge numbers were removed from the prize list and replaced with a set of numbers with fixed cash prizes.

At this point, the naming convention was also changed so that the trailing number would indicate the number of digits in the binary representation of the number.

Hence, RSA-640 has 640 binary digits, which translates to 193 digits in decimal.

RSA numbers received widespread attention when a 129-digit number known as RSA-129 was used by R. Rivest, A. Shamir, and L. Adleman to publish one of the first public-key messages together with a $100 reward for the message's decryption (Gardner 1977).

Despite widespread belief at the time that the message encoded by RSA-129 would take millions of years to break, it was factored in 1994 using a distributed computation which harnessed networked computers spread around the globe performing a multiple polynomial quadratic sieve (Leutwyler 1994).

The corresponding factorization is into a 64-digit number and a 65-digit number.

RSA-129 is referred to in the Season 1 episode "Prime Suspect" of the television crime drama NUMB3RS.

On Feb. 2, 1999, a group led by H. te Riele completed factorization of RSA-140 into two 70-digit primes.

In a preprint dated April 16, 2004, Aoki et al. factored RSA-150 into two 75-digit primes.

On Aug. 22, 1999, a group led by H. te Riele completed factorization of RSA-155 into two 78-digit primes (te Riele 1999b, Peterson 1999).

• When the numbers are very large, no efficient, non-quantum integer factorization algorithm is known; an effort concluded in 2009 by several researchers factored a 232-digit number (RSA-768), utilizing hundreds of machines over a span of 2 years.

Kleinjung et al. (2010-02-18). Factorization of a 768-bit RSA modulus. International Association for Cryptologic Research.

On December 2, Jens Franke circulated an email announcing factorization of the smallest prize number RSA-576 (Weisstein 2003).

This factorization into two 87-digit factors was accomplished using a prime factorization algorithm known as the general number field sieve (GNFS).

On May 9, 2005, the group led by Franke announced factorization of RSA-200 into two 100-digit primes (Weisstein 2005a), and in November 2005, the same group announced the factorization of RSA-640 (Weisstein 2005b).

As the following table shows, RSA-704 to RSA-2048 remain open, each carrying a cash award for whoever is clever and persistent enough to track it down.

A list of the open Challenge numbers may be downloaded from the RSA homepage.

Number    Digits  Prize (US$)  Factored (references)
RSA-100   100                  Apr. 1991
RSA-110   110                  Apr. 1992
RSA-120   120                  Jun. 1993
RSA-129   129                  Apr. 1994 (Leutwyler 1994, Cipra 1995)
RSA-130   130                  Apr. 10, 1996
RSA-140   140                  Feb. 2, 1999 (te Riele 1999a)
RSA-150   150                  Apr. 6, 2004 (Aoki 2004)
RSA-155   155                  Aug. 22, 1999 (te Riele 1999b, Peterson 1999)
RSA-160   160                  Apr. 1, 2003 (Bahr et al. 2003)
RSA-200   200                  May 9, 2005 (see Weisstein 2005a)
RSA-576           10000        Dec. 3, 2003 (Franke 2003; see Weisstein 2003)
RSA-640           20000        Nov. 4, 2005 (see Weisstein 2005b)
RSA-704           30000        open
RSA-768           50000        open
RSA-896           75000        open
RSA-102           100000       open
RSA-153           150000       open
RSA-204           200000       open

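An example RSA number: R = 55 = 5 × 11. Euler's phi function gives φ(55) = (5 − 1)(11 − 1) = 40. Take P = 7 and Q = 23, since 7 × 23 = 161 ≡ 1 (mod 40). The public key is then (P, R) = (7, 55), and Q = 23 is the private key.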
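The key set-up for this toy example can be checked with Python's built-in modular inverse, `pow(P, -1, phi)` (available from Python 3.8). Variable names follow the P, Q, R notation of this example; real RSA moduli are hundreds of digits long.

```python
from math import gcd

p, q = 5, 11                   # the two primes
R = p * q                      # public modulus: 55
phi = (p - 1) * (q - 1)        # Euler's phi function of R: 40
P = 7                          # public exponent; must share no factor with phi
assert gcd(P, phi) == 1
Q = pow(P, -1, phi)            # private exponent with P * Q = 1 (mod phi)
print(R, phi, Q, P * Q % phi)  # 55 40 23 1
```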

So, we'll take what's left and create the following character set (dictionary):

Symbol:  A  B  C  D  E  F  G  H  I  J  K  L  M
Number:  2  3  4  6  7  8  9 12 13 14 16 17 18

Symbol:  N  O  P  Q  R  S  T  U  V  W  X  Y  Z
Number: 19 21 23 24 26 27 28 29 31 32 34 36 37

Symbol: sp  0  1  2  3  4  5  6  7  8  9  *
Number: 38 39 41 42 43 46 47 48 49 51 52 53

(sp denotes the space character.)

The message we will encrypt is "VENIO" (Latin for "I come"):

  V   E   N   I   O
 31   7  19  13  21

To encode it, we simply raise each number to the power P = 7 modulo R = 55.

V: 31^7 (mod 55) = 27512614111 (mod 55) = 26
E:  7^7 (mod 55) = 823543 (mod 55) = 28
N: 19^7 (mod 55) = 893871739 (mod 55) = 24
I: 13^7 (mod 55) = 62748517 (mod 55) = 7
O: 21^7 (mod 55) = 1801088541 (mod 55) = 21

So, our encrypted message is 26, 28, 24, 7, 21 -- or "RTQEO" in our personalized character set.
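The encryption step can be reproduced as follows. The dictionary is copied from the slide's character set; Python's three-argument `pow(m, P, R)` performs modular exponentiation without ever forming the large intermediate powers. The function name is ours.

```python
# Character set from the slide: symbol -> number.
SYMBOLS = list("ABCDEFGHIJKLMNOPQRSTUVWXYZ") + [" "] + list("0123456789*")
NUMBERS = [2, 3, 4, 6, 7, 8, 9, 12, 13, 14, 16, 17, 18,
           19, 21, 23, 24, 26, 27, 28, 29, 31, 32, 34, 36, 37,
           38, 39, 41, 42, 43, 46, 47, 48, 49, 51, 52, 53]
TO_NUM = dict(zip(SYMBOLS, NUMBERS))
TO_SYM = dict(zip(NUMBERS, SYMBOLS))


def rsa_encrypt(message, P=7, R=55):
    """Encrypt symbol by symbol: c = m^P mod R."""
    return [pow(TO_NUM[ch], P, R) for ch in message]


cipher = rsa_encrypt("VENIO")
print(cipher)                              # [26, 28, 24, 7, 21]
print("".join(TO_SYM[c] for c in cipher))  # RTQEO
```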

When the message "RTQEO" arrives at the other end of our insecure phone line, we can decrypt it simply by repeating the process, this time using our private key Q = 23 in place of P.

R: 26^23 (mod 55) = 350257144982200575261531309080576 (mod 55) = 31
T: 28^23 (mod 55) = 1925904380037276068854119113162752 (mod 55) = 7
Q: 24^23 (mod 55) = 55572324035428505185378394701824 (mod 55) = 19
E:  7^23 (mod 55) = 27368747340080916343 (mod 55) = 13
O: 21^23 (mod 55) = 2576580875108218291929075869661 (mod 55) = 21

The result is 31, 7, 19, 13, 21 -- or "VENIO", our original message.
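Decryption is the same operation with the private exponent. This numeric sketch uses the P = 7, Q = 23, R = 55 values of the example and also confirms that decryption undoes encryption for every number below 55, which holds here because 55 is a product of two distinct primes.

```python
P, Q, R = 7, 23, 55

cipher = [26, 28, 24, 7, 21]            # "RTQEO"
plain = [pow(c, Q, R) for c in cipher]  # m = c^Q mod R
print(plain)                            # [31, 7, 19, 13, 21]  ("VENIO")

# Decryption undoes encryption for every m in 0..54.
print(all(pow(pow(m, P, R), Q, R) == m for m in range(R)))  # True

# Same result as forming the full power first, just slower for big numbers.
print(26 ** 23 % 55)                    # 31
```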