



MAT 4830 Final Exam Part I: Take Home

Objectives To explain the necessity for error-correcting codes. To introduce the finite field F_2, and discuss the Hamming family of error-correcting codes.

Name:

Instructions

- Show details of your solutions to ALL problems and exercises. Some problems may require explanations and/or formal answers in full sentences.
- Do not reformat the pages. Resize your fonts or rescale your diagrams if necessary to fit into the space given to each problem.
- Do not use any other reference materials such as the internet, books, or earth's inhabitants.
- All simulations and programming are expected to be done in Maple. Make sure your programs are reasonably efficient.
  o Give explanations of your methodologies.
  o Discuss your results.
  o Further discussions are not mandated but are welcome.

Please report any typos. Email a pdf version of this file, along with the Maple worksheet, to Wai by the deadline: 5 min. before the in-class final exam starts.


Section1 Introduction: Digitizing, Detecting and Correcting

The transmission of information over long distances began very early in human history. The discovery of electromagnetism and its many applications allowed us to send messages through wires and electromagnetic waves in the second half of the nineteenth century. Whether the message is sent in spoken word (in any human language) or an encoded form (using Morse code (1836), for example), the utility of being able to rapidly detect and correct errors is obvious.

An early method for improving the fidelity of a transmission is of historic importance. When telephones were first invented (both wired and wireless), the quality of transmission left much to be desired. Thus, rather than speaking directly, it was quite usual to spell out words phonetically. For example, in order to say the word “error,” the caller would instead say “Echo, Romeo, Romeo, Oscar, Romeo.” The American and British armies had devised such “alphabets” by the First World War. This method of improving the reliability of transmission works by multiplying the information; the hope is that the receiver can extract the original message, “error,” from the code, “Echo, Romeo, Romeo, Oscar, Romeo,” even when reception quality is poor. This “multiplication of information” or redundancy is the idea underlying all error detectors and correctors.

Our second example is that of an error-detection code: it allows us to detect when an error has occurred in the transmission, but it does not let us correct it. In computer science it is normal to associate each character of our extended alphabet (a, b, c, . . . , A, B, C, . . . , 0, 1, 2, . . . , +, -, :, ;, . . . ) with a number between 0 and 127. In a binary representation, seven bits (a contraction of "binary digits") are required to represent each of the 128 possible characters. For example, suppose that the letter a is associated with the number 97. Because

97 = 1·2^6 + 1·2^5 + 0·2^4 + 0·2^3 + 0·2^2 + 0·2^1 + 1·2^0,

the letter a will be encoded as 1100001.
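This conversion is easy to sketch in a few lines of code. Python is used here for illustration only (the exam work itself is expected in Maple), and the helper name is my own:

```python
def to_bits(n, width=7):
    """Return the binary representation of n as a string of `width` bits."""
    bits = []
    for _ in range(width):
        bits.append(str(n % 2))  # read off the remainders base 2
        n //= 2
    return "".join(reversed(bits))

print(to_bits(97))  # the letter "a" -> 1100001
print(to_bits(65))  # the letter "A" -> 1000001
```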

Exercise 1.1 Find the binary representation of the decimal number 102.


The usual encoding is the ASCII "dictionary," which assigns, for example, the numbers 65 through 90 to A through Z, 97 through 122 to a through z, and 48 through 57 to the digits 0 through 9.

To detect errors we add an eighth bit to each character, called a parity bit. This bit is placed in the leftmost position, and is calculated such that the sum of all eight bits will always be even. For example, since the sum of the seven bits for "A" is 1+0+0+0+0+0+1 = 2, the parity bit is 0, and "A" will be represented by 01000001. Similarly, the sum of the seven bits of "a" is 1+1+0+0+0+0+1 = 3, and "a" will be represented by the eight bits 11100001. This parity bit is an error-detection code. It allows us to detect when a single error has occurred in the transmission, but it does not allow us to correct it, since we have no way of knowing which of the eight bits has been altered. However, once the receiver has determined that an error has occurred, he can simply ask for the affected character to be retransmitted. Note that this error detector assumes that at most one bit will be in error. This hypothesis is reasonable if the transmission is nearly perfect and there is a low probability that two of eight consecutive bits will be in error.
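The parity-bit scheme just described can be sketched in a few lines. Python is used here for illustration (the exam itself expects Maple); the function names are my own:

```python
def add_parity(bits7):
    """Prepend a parity bit so that the sum of all eight bits is even."""
    parity = sum(int(b) for b in bits7) % 2
    return str(parity) + bits7

def check_parity(bits8):
    """Return True if the eight received bits have even parity
    (i.e., no single-bit error is detected)."""
    return sum(int(b) for b in bits8) % 2 == 0

print(add_parity("1000001"))     # "A" -> 01000001
print(add_parity("1100001"))     # "a" -> 11100001
print(check_parity("11100101"))  # a corrupted byte -> False
```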

Exercise 1.2 Give an example to illustrate how the parity bit fails to detect transmission errors as described in the paragraph above.


Our third example presents a simple idea for constructing an error-correcting code. Such a code allows us both to detect and correct errors. It consists in simply sending the entire message several times. For example, we could simply repeat each character in a message twice. Thus, the word "error" could be transmitted as "eerrrroorr." As such, this is only an error-detection code, since we have no way of knowing where the error is if one is detected. Which is the correct message if we receive "aaglee": "age" or "ale"? In order to make this an error-correcting code, we simply need to repeat each letter three times. If it is reasonable to assume that no more than one in three letters will be received in error, then the correct letter can be determined by a simple majority. For example, the message "aaaglleee" would be decoded as "ale" and not "age." Such a simple error-correcting code is not used in practice, since it is very costly: it triples the cost of sending each message! The codes that we will present in this chapter are much more economical. Note that it is not impossible for two or even three errors to occur in a sequence of three characters; our hypothesis is only that this is very unlikely.
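The triple-repetition scheme and its majority-vote decoding can be sketched as follows (Python for illustration rather than Maple; function names are my own):

```python
def encode_repeat3(msg):
    """Repeat each character of the message three times."""
    return "".join(ch * 3 for ch in msg)

def decode_repeat3(received):
    """Decode by majority vote over each block of three characters."""
    out = []
    for i in range(0, len(received), 3):
        block = received[i:i + 3]
        out.append(max(set(block), key=block.count))  # most frequent character
    return "".join(out)

print(encode_repeat3("ale"))        # aaallleee
print(decode_repeat3("aaaglleee"))  # "ale": the majority in "gll" is "l"
```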

Exercise 1.3 Let p be the probability that a bit will be transmitted in error. (Hint: Be sure to define the appropriate random variables and use them to explain your solutions.)

(a) What is the probability of having more than one bit in error in a transmission of seven bits?

(b) Consider transmitting a bit by repeating it three times. We decode by choosing the bit that is in the majority. Calculate the probability of correctly decoding the sent bit.


Both error-detecting and error-correcting codes have existed for a long time. In the digital age these codes have become more necessary and easier to implement. Their usefulness can be understood better when one knows the size of usual picture and music files. Figure 1.1 shows a very small digitized photo of the peak of a tower at the Université de Montréal, in Montréal, Canada. At the left, the photo is shown at its intended resolution, while at the right, it has been enlarged eight times, allowing the individual pixels to be seen clearly. The image was divided into 72 × 72 pixels, each of which is represented by a number between 0 and 255, indicating the intensity of gray from black to white. Each pixel requires 8 bits, meaning that transmitting this tiny black-and-white image requires sending 72 × 72 × 8 = 41,472 bits. And this example is far from the current digital cameras, whose sensors capture more than 2,000 × 3,000 pixels in color!

Fig. 1.1. A digitized photo: the “original” photo is at the left, while the same image is seen eight-times enlarged at the right.

Exercise 1.4 File sizes are expressed in bytes (1 byte = 8 bits), kilobytes (1 KB = 1,000 bytes), and megabytes (1 MB = 10^6 bytes). Compute the size of a black-and-white image with 1280 × 1850 pixels in MB.


Sound, music in particular, is very often stored in digital form. In contrast to images, digitizing sound is harder to visualize. Sound is a type of wave. Waves in the ocean undulate along the surface of the water, light is a wave in the electromagnetic field, and sound is a wave in air density. If we measured the density of the air at a fixed location near a (well-tuned) piano, we would see that the density increases and decreases 440 times a second when the middle A is played. The variation is very small, but our ears are able to detect it and translate it to an electric wave that is then transmitted to and analyzed by our brain. Figure 1.2 shows a representation of this pressure wave. The horizontal axis indicates time, while the vertical axis indicates the amplitude of the

Fig. 1.2. A sound wave measured over a fraction of a second.

wave. When the value is positive, this indicates that the density of the air is higher than normal (air at rest), while negative values indicate a decreased density. This wave can be digitized by approximating it with a step function. Each short time period of Δ seconds is approximated by the average value of the wave over that time interval. If Δ is sufficiently small, the step function approximation to the wave is indistinguishable from the original as heard by the human ear. (Figure 1.3 shows another sound wave and a step function digitization of it.) This digitization having been accomplished, the wave may now be represented by a sequence of integers identifying the heights of the steps along some predefined scale.

Fig. 1.3. A sound wave and a step function approximation to it.
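The step-function digitization just described can be sketched in a few lines, assuming a 440 Hz sine wave as the sound signal (Python for illustration rather than Maple; the sampling choices and names are my own):

```python
import math

def step_average(f, a, b, n):
    """Average values of f on n equal subintervals of [a, b];
    each average is approximated by a midpoint Riemann sum."""
    dx = (b - a) / n
    m = 100  # sample points per subinterval
    averages = []
    for k in range(n):
        left = a + k * dx
        s = sum(f(left + (j + 0.5) * dx / m) for j in range(m)) / m
        averages.append(s)
    return averages

# A 440 Hz "middle A" over 1/100 of a second, cut into 20 steps:
heights = step_average(lambda t: math.sin(2 * math.pi * 440 * t), 0, 0.01, 20)
print([round(h, 3) for h in heights])
```

Each entry of `heights` is one step of the staircase approximation; plotting the step function against the original wave reproduces the picture in Figure 1.3.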


Exercise 1.5 The average value of a function f on [a, b] is given by

f_avg = (1/(b − a)) ∫_a^b f(x) dx.

Let f be defined on [a, b]. Write a Maple program to

(a) calculate the average values of f on subintervals of [a, b] with a given step size;

(b) produce a graph of f on [a, b] along with the step function approximation of f by the average values in (a).


On compact discs, the sound wave is cut into 44,100 samples per second (each sample being the equivalent of a pixel in a photo), and the intensity of each sample is represented by a 16-bit integer (2^16 = 65,536). Recalling that compact discs store sound in stereo, we see that each second of music requires 44,100 × 16 × 2 = 1,411,200 bits, and 70 minutes of audio requires 1,411,200 × 60 × 70 = 5,927,040,000 bits = 740,880,000 bytes ≈ 740 MB. Given such a large mass of data, it is desirable to be able to automatically detect and correct errors.
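The arithmetic above is easy to verify mechanically; a quick check (Python for illustration; variable names are my own):

```python
# Reproducing the compact-disc arithmetic from the paragraph above.
samples_per_second = 44_100
bits_per_sample = 16
channels = 2  # stereo

bits_per_second = samples_per_second * bits_per_sample * channels
print(bits_per_second)          # 1,411,200 bits for one second of music

bits_70_min = bits_per_second * 60 * 70
print(bits_70_min)              # 5,927,040,000 bits
print(bits_70_min // 8)         # 740,880,000 bytes
print(bits_70_min / 8 / 10**6)  # about 740.88 MB
```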

This module explores two classic families of error-correcting codes: those of Hamming and those of Reed and Solomon. The first of these was used by France-Telecom for the transmission of Minitel, a precursor to the modern Internet. Reed–Solomon codes are used in compact discs. The Consultative Committee for Space Data Systems, created in 1982 for standardizing the practices of different space agencies, recommended the use of Reed–Solomon codes for information transmitted over satellites.


Section 2 The Finite Field F_2

In order to discuss Hamming codes we must first be comfortable working with the finite field of two elements, F_2. A field is a collection of elements on which we can define two operations, called "addition" and "multiplication," which must each satisfy properties that are familiar from the rationals and the real numbers: associativity, commutativity, distributivity of multiplication with respect to addition, the existence of an identity element for each of addition and multiplication, the existence of an additive inverse, and the existence of a multiplicative inverse for all nonzero elements. The reader will surely recognize the rationals Q, the reals R, and maybe the complex numbers C as having these properties. These three sets, combined with the normal + and × operations, are fields. But there exist many others!

Although we will discuss the mathematical structure of fields in more generality in Section 5, we begin by providing rules for addition and multiplication over the set of binary digits {0, 1}. The addition and multiplication tables are given by

  + | 0  1        × | 0  1
  --+------       --+------
  0 | 0  1        0 | 0  0
  1 | 1  0        1 | 0  1

These operations satisfy the same rules that are satisfied by the fields Q, R, and C: associativity, commutativity, distributivity, and the existence of identity elements and inverses. For example, using both tables above we can verify that for all x, y, z ∈ F_2, distributivity is satisfied:

x · (y + z) = (x · y) + (x · z).

Since x, y, and z each take one of two values, this property can be fully proved by considering each of the eight possible combinations of the triplet (x, y, z). Here we show an explicit verification of the distributivity property for the triplet (x, y, z) = (1, 1, 1):

1 · (1 + 1) = 1 · 0 = 0 and (1 · 1) + (1 · 1) = 1 + 1 = 0.

As in Q, R, and C, 0 is the identity element for addition and 1 is the identity element for multiplication. Inspection shows that all elements have an additive inverse.

Exercise 2.1 An additive inverse of an element a is an element b such that a + b = 0. What is the additive inverse of 1? Why?


Similarly, each nonzero element of F_2 has a multiplicative inverse. Verifying this last property is very simple, since there is only one nonzero element in F_2, and its multiplicative inverse is itself, since 1 · 1 = 1.

Much as we define the vector spaces R^2, R^3, and R^n, it is entirely possible to consider three-dimensional vectors in which each of the entries is an element of F_2. It is possible to perform vector addition and scalar multiplication (with coefficients from F_2, obviously!) of these vectors in F_2^3 using the definition of addition and multiplication in F_2. For example,

(1, 0, 1) + (0, 1, 1) = (1, 1, 0) and 1 · (1, 0, 1) = (1, 0, 1).

Since the components must be in F_2 and only linear combinations with coefficients from F_2 are permitted, the number of vectors in F_2^3 (and in any F_2^n for finite n) is finite!

Caution: even though the dimension of R^3 is finite, the number of vectors in R^3 is infinite. On the other hand, there are only 2^3 = 8 vectors in the vector space F_2^3, given by

(0,0,0), (0,0,1), (0,1,0), (0,1,1), (1,0,0), (1,0,1), (1,1,0), (1,1,1).

Vector spaces over finite fields such as F_2 may seem a little daunting at first because most linear algebra courses do not discuss them, but many of the methods of linear algebra (matrix calculations, among others) apply to them.

Notation Alert: there are two sets of operations here, with common and different notations.

              Addition / Vector Addition    Multiplication / Scalar Multiplication
  In F_2:     +                             ·
  In F_2^n:   + (componentwise)             · (scalar taken from F_2)
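The arithmetic of F_2, and of vectors over it, can be checked directly in code. A minimal sketch, in Python for illustration (the exam expects Maple; function names are my own) — on bits, addition mod 2 is XOR and multiplication mod 2 is AND:

```python
def f2_add(x, y):
    return (x + y) % 2  # same as x ^ y on bits

def f2_mul(x, y):
    return (x * y) % 2  # same as x & y on bits

def vec_add(u, v):
    """Componentwise addition of two vectors over F_2."""
    return tuple(f2_add(a, b) for a, b in zip(u, v))

def scalar_mul(c, u):
    """Multiply every component of u by the scalar c from F_2."""
    return tuple(f2_mul(c, a) for a in u)

print(f2_add(1, 1))                   # 0: each element is its own additive inverse
print(vec_add((1, 0, 1), (0, 1, 1)))  # (1, 1, 0)
print(scalar_mul(1, (1, 0, 1)))       # (1, 0, 1)
```

Looping over all eight triplets (x, y, z) verifies distributivity exhaustively, exactly as described in the text.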

Exercise 2.2 Fill in the step-by-step details of the calculation


Section 3 The Hamming Code

Here is a first example of a modern error-correcting code. Rather than using the normal alphabet, it uses the elements of F_2. This is not really a restriction, since we have already seen ways of encoding the alphabet using only these binary digits. Moreover, we limit ourselves to transmitting "words" containing exactly four "letters" (u1, u2, u3, u4), where each ui ∈ F_2. Our vocabulary, or code, therefore contains only 2^4 = 16 "words" or elements. Rather than transmitting the four symbols u1, u2, u3, u4 to represent an element, we will instead transmit the seven symbols x1, . . . , x7 defined as follows:

x1 = u1, x2 = u2, x3 = u3, x4 = u4,
x5 = u1 + u2 + u4, x6 = u1 + u3 + u4, x7 = u2 + u3 + u4.

Thus, to transmit the element u = (1, 0, 1, 1) we send the message

x = (1, 0, 1, 1, 0, 1, 0),

since x5 = 1 + 0 + 1 = 0, x6 = 1 + 1 + 1 = 1, and x7 = 0 + 1 + 1 = 0.

Exercise 3.1 In the Hamming code, calculate the vector to be sent if we wish to transmit the given words.


Since the first four coefficients of x are precisely the four symbols u1, u2, u3, u4 we wish to transmit, what purpose do the other three symbols serve? These symbols are redundant and allow us to correct any single erroneous symbol. How can we accomplish this "miracle"?

We consider an example. The receiver receives the seven symbols

(x̃1, x̃2, x̃3, x̃4, x̃5, x̃6, x̃7) = (1, 0, 0, 1, 0, 1, 0).

We distinguish the received symbols x̃i from the sent symbols xi in case of an error in the transmission. Due to the quality of the transmission link, it is reasonable for us to assume that at most one symbol will be in error. The receiver then calculates

x̃1 + x̃2 + x̃4, x̃1 + x̃3 + x̃4, x̃2 + x̃3 + x̃4

and compares them with x̃5, x̃6, and x̃7, respectively. If there was no error in the transmission, these sums should coincide with the x̃5, x̃6, and x̃7 that were received. Here is the calculation:

x̃1 + x̃2 + x̃4 = 1 + 0 + 1 = 0 = x̃5,
x̃1 + x̃3 + x̃4 = 1 + 0 + 1 = 0 ≠ x̃6 = 1,
x̃2 + x̃3 + x̃4 = 0 + 0 + 1 = 1 ≠ x̃7 = 0.

The receiver realizes that an error has occurred, since two of these calculated values (the second and the third) are not in agreement with those received.

But where is the error? Is it in one of the four original symbols or in one of the three redundant ones? It is simple to exclude the possibility that one of x̃5, x̃6, and x̃7 is in error. By changing only one of these values, there will remain a second identity that is not satisfied. Thus one of the first four symbols must be in error.

Among the first four letters, which can we change that will simultaneously correct the two incorrect sums while preserving the correct value of the first? The answer is simple: we must correct x̃3. In fact, the first sum does not contain x̃3 and thus is the only one that will not be affected by changing it. The two other relations do contain x̃3, and they will both be "corrected" by the change. Thus, even though the first four symbols of the message were received as

(x̃1, x̃2, x̃3, x̃4) = (1, 0, 0, 1),

the receiver determines the correct message as

(u1, u2, u3, u4) = (1, 0, 1, 1).


Consider each of the possibilities. Suppose that the receiver received the symbols x̃1, . . . , x̃7. The only thing the receiver knows for sure (according to our hypothesis) is that these symbols correspond to the seven transmitted symbols x1, . . . , x7, with the exception of at most one error. Thus, there are eight possibilities:

(0) all of the symbols are correct,
(1) x̃1 is in error,
(2) x̃2 is in error,
(3) x̃3 is in error,
(4) x̃4 is in error,
(5) x̃5 is in error,
(6) x̃6 is in error,
(7) x̃7 is in error.

Using the redundant symbols, the receiver can determine which of these possibilities is correct. By calculating x̃1 + x̃2 + x̃4, x̃1 + x̃3 + x̃4, and x̃2 + x̃3 + x̃4, he can determine which of the eight possibilities holds with the help of the following table (in each case, any sum not listed agrees with the corresponding received symbol):

(0) if x̃1 + x̃2 + x̃4 = x̃5 and x̃1 + x̃3 + x̃4 = x̃6 and x̃2 + x̃3 + x̃4 = x̃7,
(1) if x̃1 + x̃2 + x̃4 ≠ x̃5 and x̃1 + x̃3 + x̃4 ≠ x̃6,
(2) if x̃1 + x̃2 + x̃4 ≠ x̃5 and x̃2 + x̃3 + x̃4 ≠ x̃7,
(3) if x̃1 + x̃3 + x̃4 ≠ x̃6 and x̃2 + x̃3 + x̃4 ≠ x̃7,
(4) if x̃1 + x̃2 + x̃4 ≠ x̃5 and x̃1 + x̃3 + x̃4 ≠ x̃6 and x̃2 + x̃3 + x̃4 ≠ x̃7,
(5) if x̃1 + x̃2 + x̃4 ≠ x̃5,
(6) if x̃1 + x̃3 + x̃4 ≠ x̃6,
(7) if x̃2 + x̃3 + x̃4 ≠ x̃7.

The hypothesis that at most one symbol is in error is crucial to this analysis. If two letters had been in error, then the receiver would not be able to distinguish, for example, between the cases "x̃5 is in error" and "x̃1 and x̃6 are both in error," and would therefore not be able to perform the appropriate correction. However, in the case of at most one error the receiver can always detect and correct the error. After having discarded the three extra symbols, the receiver is assured of having received the originally intended message. The process can be visualized as
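The encoding and decoding procedure described in this section can be sketched in code. This is an illustrative sketch in Python (the exam itself expects Maple), assuming the parity definitions x5 = u1 + u2 + u4, x6 = u1 + u3 + u4, x7 = u2 + u3 + u4 used above; the function names are my own:

```python
def encode(u):
    """Encode four data bits as a seven-bit Hamming codeword."""
    u1, u2, u3, u4 = u
    return [u1, u2, u3, u4,
            (u1 + u2 + u4) % 2,
            (u1 + u3 + u4) % 2,
            (u2 + u3 + u4) % 2]

def decode(x):
    """Correct at most one erroneous symbol and return the four data bits."""
    x = list(x)
    # Which of the three parity checks fail?
    c1 = (x[0] + x[1] + x[3]) % 2 != x[4]
    c2 = (x[0] + x[2] + x[3]) % 2 != x[5]
    c3 = (x[1] + x[2] + x[3]) % 2 != x[6]
    # The failure pattern identifies a single erroneous data bit, if any.
    pattern = {(True, True, False): 0,   # u1 is in error
               (True, False, True): 1,   # u2 is in error
               (False, True, True): 2,   # u3 is in error
               (True, True, True): 3}    # u4 is in error
    pos = pattern.get((c1, c2, c3))
    if pos is not None:
        x[pos] ^= 1      # flip the corrupted data bit
    return x[:4]         # an error in a parity bit leaves the data intact

sent = encode([1, 0, 1, 1])
received = sent.copy()
received[2] ^= 1          # corrupt the third symbol in transit
print(sent)               # [1, 0, 1, 1, 0, 1, 0]
print(decode(received))   # [1, 0, 1, 1]: the error is corrected
```

Flipping any single one of the seven positions of any codeword and decoding recovers the original four data bits, which is exactly the single-error guarantee discussed above.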


How does the Hamming code compare to other error-correcting codes? This question is a little too vague. In fact, the quality of a code can be judged only as a function of the needs: the error rate of the channel, the average length of messages to be sent, the processing power available for encoding and decoding, etc. Nonetheless, we can compare it to our simple method of repetition. Each of the symbols u1, u2, u3, u4 could be repeated until we attained sufficient confidence that the message will be correctly decoded. We again take the hypothesis that at most one bit error can occur every "few" bits (fewer than 15 bits). As we have already seen, if each symbol is sent twice, we are able only to detect an error. Thus, we must transmit each symbol at least 3 times, requiring a total of 12 bits to send this 4-bit message. The Hamming code is able to send the same message with the same confidence in only 7 bits, a significant improvement.

Exercise 3.2 Let u be a transmitted word and x̃ the corresponding received vector. Give an example to illustrate why the receiver cannot distinguish between the cases "x̃5 is in error" and "x̃1 and x̃6 are both in error."


Exercise 3.3 In the Hamming code, the receiver receives the words given in (a)-(c) below. What were the originally transmitted words?

(a)

(b)

(c)


Exercise 3.4 Design simulation experiment(s) in Maple to investigate the effectiveness of the Hamming code. (Note that the problem statement is intentionally simple, without a lot of instructions. It gives you the flexibility to showcase your ideas.)
