
Lecture Notes in Mathematics

Finite Fields and Error-Correcting Codes

Karl-Gustav Andersson

(Lund University)

(version 1.014 - 31 December 2015)

Translated from Swedish by Sigmundur Gudmundsson

Contents

Chapter 1. Finite Fields
1. Basic Definitions and Examples
2. Calculations with Congruences
3. Vector Spaces
4. Polynomial Rings
5. Finite Fields
6. The Existence and Uniqueness of $GF(p^n)$
7. The Möbius Inversion Formula

Chapter 2. Error-Correcting Codes
1. Introduction
2. Linear Codes and Generating Matrices
3. Control Matrices and Decoding
4. Some Special Codes
5. Vandermonde Matrices and Reed-Solomon Codes


CHAPTER 1

Finite Fields

1. Basic Definitions and Examples

In this introductory section we discuss the basic algebraic operations addition and multiplication from an abstract point of view. We consider a set $A$ equipped with two operations defined in such a way that to each pair of elements $a, b \in A$ there are associated two new elements $a + b$ and $a \cdot b$ in $A$ called the sum and the product of $a$ and $b$, respectively. We assume that the sum satisfies the following four axioms.

(A1) $a + (b + c) = (a + b) + c$

(A2) $a + b = b + a$

(A3) there exists an element $0 \in A$ such that $a + 0 = a$ for all $a \in A$

(A4) for every $a \in A$ there exists an element $-a \in A$ such that $a + (-a) = 0$.

These axioms guarantee that subtraction is well-defined in $A$. It is easily checked that (A1)–(A4) imply that the equation $a + x = b$ in $A$ has the unique solution $x = b + (-a)$. In what follows we will write $b - a$ for $b + (-a)$.

The corresponding axioms for the multiplication are

(M1) $a \cdot (b \cdot c) = (a \cdot b) \cdot c$

(M2) $a \cdot b = b \cdot a$

(M3) there exists an element $1 \in A$ such that $1 \cdot a = a \cdot 1 = a$ for all $a \in A$

(M4) for every $a \neq 0$ in $A$ there exists an element $a^{-1} \in A$ such that $a \cdot a^{-1} = 1$.

Sometimes we will only assume that some of these axioms for the multiplication are satisfied. If they all apply then, precisely as for the subtraction, a division is well-defined in $A$, i.e. the equation $ax = b$ with $a \neq 0$ has the unique solution $x = a^{-1} \cdot b$.

Finally, we always assume the distributive laws for $A$:

(D) $a \cdot (b + c) = a \cdot b + a \cdot c$ and $(a + b) \cdot c = a \cdot c + b \cdot c$

Definition 1.1. A ring $A$ is a set equipped with an addition and a multiplication such that all the rules (A1)–(A4) are satisfied and furthermore (M1) and (D). If $A$ also satisfies (M2) it is said to be a commutative ring and if (M3) is fulfilled we say that the ring has a unity. A ring that contains at least two elements and satisfies all the rules (M1)–(M4) for the multiplication is called a field.

Example 1.2. The rational numbers $\mathbb{Q}$, the reals $\mathbb{R}$ and the complex numbers $\mathbb{C}$ are important examples of fields, when equipped with their standard addition and multiplication. The integers $\mathbb{Z}$ form a commutative ring but are not a field since (M4) is not valid in $\mathbb{Z}$.

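Example 1.3. The set $M_2(\mathbb{R})$ of $2 \times 2$ real matrices forms a ring. Here 0 is the zero matrix and 1 is the unit matrix. In $M_2(\mathbb{R})$ the commutative law (M2) is not satisfied. The rule (M4) is not fulfilled either, since there exist non-zero matrices that are not invertible. For example we have

$$\begin{pmatrix} 1 & -2 \\ -2 & 4 \end{pmatrix} \begin{pmatrix} 4 & -2 \\ 2 & -1 \end{pmatrix} = \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}.$$

It follows from this relation that neither of the two matrices on the left-hand side is invertible: if one of them had an inverse, multiplying the relation by that inverse would force the other matrix to be zero.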
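The product in Example 1.3 can be checked mechanically. The following is a minimal Python sketch; the helper name `matmul2` is our own choice and does not come from the notes.

```python
def matmul2(x, y):
    """Multiply two 2x2 matrices given as nested lists of rows."""
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

a = [[1, -2], [-2, 4]]
b = [[4, -2], [2, -1]]
print(matmul2(a, b))  # [[0, 0], [0, 0]]
```

Both factors are non-zero, yet their product is the zero matrix.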

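Definition 1.4. Two elements $a \neq 0$ and $b \neq 0$ in a ring are called zero divisors if $a \cdot b = 0$.

Example 1.5. The two matrices

$$\begin{pmatrix} 1 & -2 \\ -2 & 4 \end{pmatrix} \quad\text{and}\quad \begin{pmatrix} 4 & -2 \\ 2 & -1 \end{pmatrix}$$

in Example 1.3 are zero divisors in the ring $M_2(\mathbb{R})$.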

We shall now discuss, in more detail, a family of rings that will play an important role in what follows. Let $n \geq 2$ be a given integer. We say that two integers $a$ and $b$ are congruent modulo $n$ if their difference $a - b$ is divisible by $n$. For this we simply write $a \equiv b \pmod{n}$. For example we have $13 \equiv 4 \pmod{3}$. Denote by $[a]$ the class of integers that are congruent to $a$ modulo $n$. We can then define an addition and a multiplication of such congruence classes by

$$[a] + [b] = [a + b] \quad\text{and}\quad [a] \cdot [b] = [a \cdot b].$$

Here we must verify that these definitions do not depend on the choice of representatives for each congruence class. So assume that $a \equiv a_1 \pmod{n}$ and $b \equiv b_1 \pmod{n}$. Then $a_1 = a + kn$ and $b_1 = b + ln$ for some integers $k$ and $l$. This implies that

$$a_1 + b_1 = a + b + (k + l)n \quad\text{and}\quad a_1 b_1 = ab + (al + bk + kln)n,$$

hence $a_1 + b_1$ is congruent to $a + b$ and $a_1 b_1$ to $ab$ modulo $n$. Denote by $\mathbb{Z}_n$ the set of congruence classes modulo $n$, i.e.

$$\mathbb{Z}_n = \{[0], [1], [2], \ldots, [n-1]\}.$$

It is easily checked that the addition and multiplication defined above turn $\mathbb{Z}_n$ into a commutative ring.
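The independence of the chosen representatives can also be observed numerically. The sketch below is only an illustration; the function names `add_mod` and `mul_mod` are our own, and each class $[a]$ is represented by its least non-negative element.

```python
def add_mod(a, b, n):
    """Representative in {0, ..., n-1} of the class [a] + [b] in Z_n."""
    return (a + b) % n

def mul_mod(a, b, n):
    """Representative in {0, ..., n-1} of the class [a] * [b] in Z_n."""
    return (a * b) % n

n = 7
a, b = 3, 5
a1, b1 = a + 2 * n, b - 3 * n      # other representatives of [a] and [b]
print(add_mod(a, b, n), add_mod(a1, b1, n))  # 1 1
print(mul_mod(a, b, n), mul_mod(a1, b1, n))  # 1 1
```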

Example 1.6. In the ring $\mathbb{Z}_{11}$ we have

$$[5] + [9] = [14] = [3] \quad\text{and}\quad [5] \cdot [9] = [45] = [1]$$

and in $\mathbb{Z}_{12}$ the following equalities hold

$$[4] + [9] = [13] = [1] \quad\text{and}\quad [4] \cdot [9] = [36] = [0].$$

As a direct consequence of the example we see that $[5]$ is the multiplicative inverse of $[9]$ in the ring $\mathbb{Z}_{11}$. The following result gives a criterion for an element of $\mathbb{Z}_n$ to have a multiplicative inverse.

Theorem 1.7. Let $[a]$ in $\mathbb{Z}_n$ be different from $[0]$. Then there exists an element $[b]$ in $\mathbb{Z}_n$ such that $[a][b] = [1]$ if and only if $a$ and $n$ are relatively prime, i.e. they do not have a non-trivial common divisor.

Proof. Let us first assume that $a$ and $n$ have a common divisor $d \geq 2$. Then $a = kd$ and $n = ld$ for some integers $k$ and $l$ with $0 < l < n$. This implies that $[l][a] = [lkd] = [kn] = [0]$. Hence there does not exist a multiplicative inverse $[b]$ to $[a]$, because in that case

$$[l] = [l][1] = [l][a][b] = [0][b] = [0],$$

which is impossible since $0 < l < n$. On the other hand, if $a$ and $n$ are relatively prime then it is a consequence of the Euclidean algorithm that there exist integers $b$ and $c$ such that $1 = ab + nc$. This gives $[1] = [a][b]$. □

Example 1.8. We will now use the Euclidean algorithm to determine whether or not $[235]$ has a multiplicative inverse in $\mathbb{Z}_{567}$.

$$\begin{aligned} 567 &= 2 \cdot 235 + 97 \\ 235 &= 2 \cdot 97 + 41 \\ 97 &= 2 \cdot 41 + 15 \\ 41 &= 3 \cdot 15 - 4 \\ 15 &= 4 \cdot 4 - 1 \end{aligned}$$

This shows that 567 and 235 are relatively prime, and by following the calculations backwards we see that

$$1 = 4 \cdot 4 - 15 = 4 \cdot (3 \cdot 15 - 41) - 15 = 11 \cdot 15 - 4 \cdot 41 = \cdots = 63 \cdot 567 - 152 \cdot 235.$$

Hence the multiplicative inverse of $[235]$ is $[-152] = [415]$.
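The back-substitution in Example 1.8 can be automated. The following Python sketch uses the extended Euclidean algorithm in its usual form with non-negative remainders, so its intermediate steps differ slightly from the hand calculation above, which allows negative remainders. The function names are our own.

```python
def extended_gcd(a, b):
    """Return (g, s, t) with g = gcd(a, b) and g == s*a + t*b."""
    old_r, r = a, b
    old_s, s = 1, 0
    old_t, t = 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_s, s = s, old_s - q * s
        old_t, t = t, old_t - q * t
    return old_r, old_s, old_t

def inverse_mod(a, n):
    """Return b in {0, ..., n-1} with a*b = 1 (mod n), or None if no inverse exists."""
    g, s, _ = extended_gcd(a % n, n)
    return s % n if g == 1 else None

print(inverse_mod(235, 567))  # 415
print(inverse_mod(4, 12))     # None, since 4 and 12 share the divisor 4
```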

If $n = p$ is a prime, then it is clear that none of the numbers $1, 2, \ldots, p-1$ has a non-trivial common divisor with $p$. This shows that all the classes $[1], [2], \ldots, [p-1]$ in $\mathbb{Z}_p$, different from $[0]$, have a multiplicative inverse, so $\mathbb{Z}_p$ is a field. If $n$ is not a prime, then $n = kl$ for some integers $k, l \geq 2$. By Theorem 1.7, neither of the two classes $[k]$ and $[l]$ has an inverse in $\mathbb{Z}_n$, so $\mathbb{Z}_n$ is not a field. We summarize:

Theorem 1.9. The ring $\mathbb{Z}_n$ is a field if and only if $n$ is a prime.
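For small $n$ the statement of Theorem 1.9 can be confirmed by brute force, using the invertibility criterion of Theorem 1.7. This is a sketch for illustration only; the name `is_field` is our own.

```python
from math import gcd

def is_field(n):
    """Z_n is a field exactly when every non-zero class is invertible (Theorem 1.7)."""
    return n >= 2 and all(gcd(a, n) == 1 for a in range(1, n))

print([n for n in range(2, 20) if is_field(n)])
# [2, 3, 5, 7, 11, 13, 17, 19]
```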

We conclude this section by defining the notion of an isomorphism between rings. Let $A_1$ and $A_2$ be two rings and assume that there exists a bijective map $f$ from $A_1$ to $A_2$ such that

$$f(a + b) = f(a) + f(b) \quad\text{and}\quad f(a \cdot b) = f(a) \cdot f(b)$$

for all elements $a$ and $b$ in $A_1$. In that case, we say that the rings $A_1$ and $A_2$ are isomorphic and that $f$ is an isomorphism from $A_1$ to $A_2$. Two rings that are isomorphic are actually just two different representations of the same ring. An isomorphism corresponds to just changing the names of the elements. All calculations in one of the rings correspond to exactly the same calculations in the other.

Example 1.10. Let $M$ be the ring of all $2 \times 2$ matrices of the form

$$\begin{pmatrix} a & -b \\ b & a \end{pmatrix}$$

where $a$ and $b$ are real numbers and the operations are the standard matrix addition and matrix multiplication. Then the map

$$M \ni \begin{pmatrix} a & -b \\ b & a \end{pmatrix} \mapsto a + ib \in \mathbb{C}$$

defines an isomorphism from $M$ to the ring $\mathbb{C}$ of complex numbers. The reader is encouraged to check this fact.

Exercises

Exercise 1.1. Show that the following rules are valid in any ring:

(1) $0 \cdot a = a \cdot 0 = 0$, (Hint: $0 \cdot a + 0 \cdot a = 0 \cdot a$.)
(2) $(-a)b = a(-b) = -ab$,
(3) $(-a)(-b) = ab$.

Exercise 1.2. Show that a field does not have any zero divisors.

Exercise 1.3. Show that if $a$ is not a zero divisor in the ring $A$ then the following cancellation law applies

$$ax = ay \Rightarrow x = y$$

for all $x$ and $y$ in $A$.

Exercise 1.4. Let $M$ be the set of all matrices

$$\begin{pmatrix} a & 2b \\ -b & a \end{pmatrix},$$

where $a$ and $b$ are integers. Show that, with the standard matrix addition and multiplication, $M$ forms a commutative ring with unity. Does $M$ have any zero divisors?

Exercise 1.5. Let $\mathbb{Q}[\sqrt{2}]$ be the set of all numbers of the form $a + b\sqrt{2}$, where $a$ and $b$ are rational. Show that the usual addition and multiplication of real numbers turn $\mathbb{Q}[\sqrt{2}]$ into a field.

Exercise 1.6. Let $\mathbb{Z}[i]$ be the set of Gaussian integers $a + ib$, where $a$ and $b$ are integers. Show that $\mathbb{Z}[i]$, with the usual addition and multiplication of complex numbers, is a commutative ring with unity. For which elements $u \in \mathbb{Z}[i]$ does there exist a multiplicative inverse $v$, i.e. an element $v$ such that $uv = 1$?

Exercise 1.7. Show that a ring $A$ is commutative if and only if

$$(a + b)^2 = a^2 + 2ab + b^2$$

for all $a$ and $b$ in $A$.

Exercise 1.8. Find out if the determinant

$$\begin{vmatrix} 325 & 131 & 340 \\ 142 & 177 & 875 \\ 214 & 122 & 961 \end{vmatrix}$$

is an odd number or an even one.

Exercise 1.9. Solve in $\mathbb{Z}_{23}$ the equations

$$[17] \cdot x = [5] \quad\text{and}\quad [12] \cdot x = [7].$$

Exercise 1.10. Determine if $[121]$ and $[212]$ are invertible in $\mathbb{Z}_{9999}$ or not. Find the inverses if they exist.

Exercise 1.11. Consider the elements $[39]$, $[41]$, $[46]$ and $[51]$ in $\mathbb{Z}_{221}$.

(1) Which of these are zero divisors?
(2) Which have a multiplicative inverse? Find the inverses if they exist.

Exercise 1.12. Solve the following systems of equations

$$\begin{cases} 4x + 7y \equiv 3 \pmod{11} \\ 8x + 5y \equiv 9 \pmod{11}, \end{cases} \qquad \begin{cases} 4x + 7y \equiv 5 \pmod{13} \\ 7x + 5y \equiv 8 \pmod{13}. \end{cases}$$

Exercise 1.13. Determine the digits $x$ and $y$ such that the following decimal numbers are divisible by 11

2x653874 , 37y64943252.

(Hint: $10^n \equiv (-1)^n \pmod{11}$.)

Exercise 1.14. Let $A$ be a finite commutative ring with a unity. Show that if $a \in A$ is not a zero divisor, then $a$ has a multiplicative inverse. (Hint: Consider the map $x \mapsto ax$, $x \in A$.)

Exercise 1.15. Let $a$ be a non-zero element in a field $A$.

(1) Show that if $a^{-1} = a$, then either $a = 1$ or $a = -1$.
(2) Prove Wilson's theorem stating that for every prime $p$ we have

$$(p - 1)! \equiv -1 \pmod{p}.$$

2. Calculations with Congruences

Let $F$ be a finite field with $q$ elements and $F^* = \{x \in F \,;\, x \neq 0\}$. We order the elements of $F^*$ in a sequence $x_1, x_2, \ldots, x_{q-1}$. Then for every fixed $a \in F^*$ the sequence $ax_1, ax_2, \ldots, ax_{q-1}$ contains exactly the same elements, i.e. those of $F^*$, since if $ax_i = ax_j$ then multiplication by $a^{-1}$ gives $x_i = x_j$. We have therefore shown that

$$\prod_{i=1}^{q-1} (a x_i) = \prod_{i=1}^{q-1} x_i .$$

By collecting $a$ from each of the different factors on the left-hand side and dividing by $\prod_{i=1}^{q-1} x_i$, we obtain $a^{q-1} = 1$ and have thereby proven the following result.

Theorem 2.1. Let $F$ be a finite field with $q$ elements and $a \neq 0$ be an element of $F$. Then

$$a^{q-1} = 1.$$

Specializing to the case when $F = \mathbb{Z}_p$, for some prime $p$, we obtain the following result due to Pierre de Fermat in 1640:

Theorem 2.2 (Fermat's little theorem). If $p$ is a prime number and $a$ is an integer not divisible by $p$, then

$$a^{p-1} \equiv 1 \pmod{p}.$$

Example 2.3. We now want to calculate the least positive remainder when dividing $3^{350}$ by 17. Since 17 is a prime, Fermat's theorem tells us that $3^{16} \equiv 1 \pmod{17}$. Hence

$$3^{350} = 3^{21 \cdot 16 + 14} \equiv 3^{14} \pmod{17}.$$

A continued calculation modulo 17 gives

$$3^{14} = 9^7 = 9 \cdot 81^3 \equiv 9 \cdot (-4)^3 = 9 \cdot (-4) \cdot 16 \equiv 9 \cdot (-4) \cdot (-1) = 36 \equiv 2.$$

The remainder that we are looking for is therefore 2.

Alternatively, one can show that $3^{14} \equiv 2$ by observing that $3^{14} \cdot 3^2 = 3^{16} \equiv 1$. This implies that $[3^{14}] = [9]^{-1} = [2]$, since $2 \cdot 9 = 18 \equiv 1$.
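The reduction of the exponent in Example 2.3 can be mirrored in a few lines of Python. This is a sketch under the assumptions of Fermat's little theorem ($p$ prime, $a$ not divisible by $p$); the helper name `power_mod_prime` is our own, while the three-argument `pow` is Python's built-in modular exponentiation.

```python
def power_mod_prime(a, e, p):
    """a**e mod p for a prime p not dividing a, reducing the exponent modulo p - 1."""
    return pow(a, e % (p - 1), p)

print(350 % 16)                     # 14, the reduced exponent
print(power_mod_prime(3, 350, 17))  # 2
print(pow(3, 350, 17))              # 2, computed directly for comparison
```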

The next result generalizes Fermat's little theorem.

Theorem 2.4. Let $p$ and $q$ be different prime numbers and $m$ be a positive integer. Then

$$a^{m(p-1)(q-1)+1} \equiv a \pmod{pq}$$

for every integer $a$.

Proof. If $p$ does not divide $a$, then it follows from Fermat's theorem that

$$a^{p-1} \equiv 1 \pmod{p}.$$

This implies that

$$a^{m(p-1)(q-1)} \equiv 1 \pmod{p}.$$

Multiplication by $a$ gives

$$a^{m(p-1)(q-1)+1} \equiv a \pmod{p}.$$

This congruence is of course also valid when $p$ divides $a$, since then $a \equiv 0 \pmod{p}$. In the same way, we see that

$$a^{m(p-1)(q-1)+1} \equiv a \pmod{q}.$$

Since both $p$ and $q$ divide the difference $a^{m(p-1)(q-1)+1} - a$, and $p$ and $q$ are different primes, so does the product $pq$ and the statement is proven. □

Example 2.5. Theorem 2.4 has an interesting application in cryptology. Assume that a receiver, for example a bank, receives messages from a large number of senders and does not want the content to be read by unauthorized individuals. Then the messages must be encrypted. This means that an encrypting key must be available to the sender. One way to achieve this is to use a system with a public key. Such systems are based on the idea that there exist functions that are easily computed but whose inverse is very difficult to compute without some additional information. The following method (the RSA-system) was suggested by Rivest, Shamir and Adleman in 1978.

Choose two large¹ different primes $p$ and $q$ and set $n = pq$. Then pick a large number $d$ relatively prime to $(p-1)(q-1)$. According to Theorem 1.7 of the last section, $d$ has a multiplicative inverse $e$ in the ring $\mathbb{Z}_{(p-1)(q-1)}$, which can be determined by the Euclidean algorithm. The numbers $n$ and $e$ are made public as well as necessary information on how they should be used for the encrypting. The numbers $p$, $q$ and $d$ are kept secret by the receiver.

Assume that all the messages are of the form of one or more integers between 1 and $n$. A sender interested in sending such a number $M$ will encrypt it by calculating $C \equiv M^e \pmod{n}$. After receiving $C$, the receiver calculates the unique number $D$ between 1 and $n$ satisfying $D \equiv C^d \pmod{n}$. According to Theorem 2.4 we have $D \equiv M \pmod{n}$. Indeed, since $e$ is the multiplicative inverse of $d$ in the ring $\mathbb{Z}_{(p-1)(q-1)}$, it follows that $ed = m(p-1)(q-1) + 1$ for some integer $m$, so

$$D \equiv C^d \equiv M^{ed} = M^{m(p-1)(q-1)+1} \equiv M \pmod{n}.$$

Now the interesting question is whether it is possible to use only the public information $e$ and $n$ to get hold of the content of the message sent. To do this within a reasonable amount of time one would, as far as is known, need the prime numbers $p$ and $q$. These can be determined by factorizing $n$, but for numbers of this size no feasible factorization method is known, even with modern computers.
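The scheme can be tried out on a toy scale. The sketch below uses the small primes 61 and 53 and the secret exponent 2753, all of which are our own illustrative choices and far too small to be secure. The inverse $e$ is found by brute-force search, which is only reasonable for a modulus this small.

```python
from math import gcd

p, q = 61, 53                 # toy primes (assumption for illustration only)
n = p * q                     # 3233, made public
phi = (p - 1) * (q - 1)       # 3120, kept secret
d = 2753                      # secret exponent, relatively prime to phi
assert gcd(d, phi) == 1

# Public exponent: the inverse of d in Z_phi, found here by exhaustive search.
e = next(k for k in range(1, phi) if (k * d) % phi == 1)

def encrypt(m):
    """Sender's side: C = M^e mod n."""
    return pow(m, e, n)

def decrypt(c):
    """Receiver's side: D = C^d mod n."""
    return pow(c, d, n)

message = 1234
print(e, decrypt(encrypt(message)) == message)  # 17 True
```

In line with Theorem 2.4, decryption recovers every message between 1 and $n - 1$, including those that are divisible by $p$ or $q$.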

In the next example we deal with the problem of finding a simultaneous solution to several different congruences.

Example 2.6. In a 2000-year-old book by the Chinese author Sun-Tsu one can read:

¹ By large numbers we here mean numbers with hundreds of digits.

"There exists an unknown number which divided by 3 leaves the remainder 2, by 5 the remainder 3 and by 7 the remainder 2. What is this number?"

In other words, one should find an integer $x$ that simultaneously solves the three congruences

$$\begin{aligned} x &\equiv 2 \pmod{3} \\ x &\equiv 3 \pmod{5} \\ x &\equiv 2 \pmod{7}. \end{aligned}$$

The method that Sun-Tsu presented for solving the problem gives the Chinese remainder theorem.

Theorem 2.7. Assume that the integers $n_1, n_2, \ldots, n_k$ are pairwise relatively prime. Then the system of congruences

$$\begin{aligned} x &\equiv a_1 \pmod{n_1} \\ x &\equiv a_2 \pmod{n_2} \\ &\;\;\vdots \\ x &\equiv a_k \pmod{n_k} \end{aligned}$$

has a unique solution $x$ modulo $n = n_1 n_2 \cdots n_k$.

Proof. Define

$$N_i = \frac{n}{n_i} = \prod_{j \neq i} n_j.$$

Then the numbers $N_i$ and $n_i$ are relatively prime for each $i$. Hence there exist integers $s_i$ and $t_i$ such that

$$s_i N_i + t_i n_i = 1.$$

Set

$$x = \sum_{j=1}^{k} a_j s_j N_j = a_1 s_1 N_1 + \cdots + a_k s_k N_k.$$

We have $s_i N_i \equiv 1 \pmod{n_i}$ and $N_j \equiv 0 \pmod{n_i}$ when $j \neq i$. This implies that

$$x \equiv a_i \pmod{n_i}, \quad i = 1, \ldots, k.$$

We still have to show that the solution $x$ is uniquely determined modulo $n$. Assume that $\tilde{x}$ is another solution. Then $\tilde{x} \equiv x \pmod{n_i}$ for all $i$, so every $n_i$ divides $\tilde{x} - x$. Since the numbers $n_i$ are pairwise relatively prime, their product $n$ also divides $\tilde{x} - x$. It follows that $\tilde{x} \equiv x \pmod{n}$ and the result follows. □

Example 2.8. In the last example we have $n_1 = 3$, $n_2 = 5$, $n_3 = 7$ and $N_1 = 35$, $N_2 = 21$, $N_3 = 15$. We find

$$\begin{aligned} 2 \cdot 35 - 23 \cdot 3 &= 1 \\ 1 \cdot 21 - 4 \cdot 5 &= 1 \\ 1 \cdot 15 - 2 \cdot 7 &= 1. \end{aligned}$$

So the above method gives the solution

$$x = 2 \cdot 2 \cdot 35 + 3 \cdot 1 \cdot 21 + 2 \cdot 1 \cdot 15 = 233.$$

The least positive solution is

$$233 - 2n = 233 - 210 = 23.$$
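The construction in the proof of Theorem 2.7 translates directly into code. The following Python sketch assumes that the moduli are pairwise relatively prime and does not check this. The function name `crt` is our own, and the call `pow(N_i, -1, n_i)` for a modular inverse requires Python 3.8 or later.

```python
from math import prod

def crt(residues, moduli):
    """Solve x = a_i (mod n_i) for pairwise relatively prime n_i; return x in [0, n)."""
    n = prod(moduli)
    x = 0
    for a_i, n_i in zip(residues, moduli):
        N_i = n // n_i
        s_i = pow(N_i, -1, n_i)   # s_i * N_i = 1 (mod n_i)
        x += a_i * s_i * N_i
    return x % n

print(crt([2, 3, 2], [3, 5, 7]))  # 23
```

For Sun-Tsu's problem the computed coefficients are $s_1 = 2$, $s_2 = 1$ and $s_3 = 1$, as in Example 2.8.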

The Chinese remainder theorem has another, a bit more abstract, formulation. If $A_1, \ldots, A_k$ are $k$ rings, then we can form a new ring denoted by $A_1 \times \cdots \times A_k$ consisting of all elements $(a_1, \ldots, a_k)$ where $a_i \in A_i$. The addition and the multiplication in the new ring are defined by

$$(a_1, \ldots, a_k) + (b_1, \ldots, b_k) = (a_1 + b_1, \ldots, a_k + b_k)$$

$$(a_1, \ldots, a_k) \cdot (b_1, \ldots, b_k) = (a_1 \cdot b_1, \ldots, a_k \cdot b_k).$$

Assume now that $n = n_1 n_2 \cdots n_k$ where the numbers $n_i$ are pairwise relatively prime. Then the Chinese remainder theorem states that for given integers $a_1, \ldots, a_k$ with $0 \leq a_i < n_i$, there exists precisely one integer $a$ with $0 \leq a < n$ such that

$$a \equiv a_i \pmod{n_i}, \quad i = 1, \ldots, k.$$

It is easily checked that the map that takes $a$ to $(a_1, \ldots, a_k)$ is an isomorphism between $\mathbb{Z}_n$ and $\mathbb{Z}_{n_1} \times \cdots \times \mathbb{Z}_{n_k}$.

Example 2.9. Let $n = 1001 = 7 \cdot 11 \cdot 13$ and consider the two elements $[778]$ and $[431]$ in $\mathbb{Z}_{1001}$. Then

$$\begin{aligned} 778 &\equiv 1 \pmod{7} & 431 &\equiv 4 \pmod{7} \\ 778 &\equiv 8 \pmod{11} & 431 &\equiv 2 \pmod{11} \\ 778 &\equiv 11 \pmod{13} & 431 &\equiv 2 \pmod{13}. \end{aligned}$$

Instead of calculating the product $778 \cdot 431$ modulo 1001, we can also calculate

$$(1, 8, 11) \cdot (4, 2, 2) = (4, 16, 22) \equiv (4, 5, 9)$$

in the ring $\mathbb{Z}_7 \times \mathbb{Z}_{11} \times \mathbb{Z}_{13}$ and then, as in the proof of the Chinese remainder theorem, determine the corresponding element in $\mathbb{Z}_{1001}$. This sort of arithmetic is sometimes useful when performing calculations with large numbers, since each component only involves small moduli.
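The componentwise calculation of Example 2.9 can be reproduced as follows. This is a minimal Python sketch with helper names of our own choosing. The element of $\mathbb{Z}_{1001}$ is recovered by exhaustive search, which is acceptable only because the modulus is small; in general one would use the construction from the proof of Theorem 2.7.

```python
moduli = (7, 11, 13)

def to_residues(a):
    """Image of a in Z_7 x Z_11 x Z_13."""
    return tuple(a % m for m in moduli)

def mul_residues(u, v):
    """Componentwise product in Z_7 x Z_11 x Z_13."""
    return tuple((x * y) % m for x, y, m in zip(u, v, moduli))

u, v = to_residues(778), to_residues(431)
w = mul_residues(u, v)
print(u, v, w)  # (1, 8, 11) (4, 2, 2) (4, 5, 9)

# Recover the corresponding element of Z_1001 by exhaustive search.
product = next(a for a in range(1001) if to_residues(a) == w)
print(product, (778 * 431) % 1001)  # 984 984
```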


Exercises

Exercise 2.1. Find the multiplicative inverse of $[45]$ in $\mathbb{Z}_{101}$. Then determine the integer $x$ between 1 and 100 such that

$$45^{99} \equiv x \pmod{101}.$$

Exercise 2.2. In each of the following cases, find the least non-negative integer $x$ satisfying

$$x \equiv 3^{5000} \pmod{13}, \quad x \equiv 3^{100} \pmod{101},$$

$$x \equiv 3^{40} \pmod{23}, \quad x \equiv 2^{1000} \pmod{7}.$$

Exercise 2.3. Show that if $p$ and $q$ are different primes, then

$$p^{q-1} + q^{p-1} \equiv 1 \pmod{pq}.$$

Exercise 2.4. Let $p_1, p_2, \ldots, p_k$ be different primes and $r$ be a positive integer divisible by $p_i - 1$ for all $i = 1, \ldots, k$. Show that

$$a^{r+1} \equiv a \pmod{p_1 \cdot p_2 \cdots p_k}$$

for all integers $a$.

Exercise 2.5. Show that all integers $n$ satisfy

(1) $n^7 \equiv n \pmod{42}$,
(2) $n^{13} \equiv n \pmod{2730}$.

(Hint: Use the result from Exercise 2.4.)

Exercise 2.6. Find the least positive integer $M$, such that

$$M^{49} \equiv 21 \pmod{209}.$$

Exercise 2.7. Show that if $p$ is a prime and $m$ is a positive integer, then

$$a^{(p-1)p^{m-1}} \equiv 1 \pmod{p^m}$$

for all integers $a$ not divisible by $p$. (Hint: Copy the proof of Theorem 2.1 with $F^*$ equal to the set of all invertible elements in $\mathbb{Z}_{p^m}$.)

Exercise 2.8. Show that all odd integers $k$ satisfy

(1) $k^4 \equiv 1 \pmod{16}$,
(2) $k^{2^n} \equiv 1 \pmod{2^{n+2}}$ where $n \geq 2$.

Exercise 2.9. Find all integers $x$ such that

$$\begin{aligned} x &\equiv 1 \pmod{3} \\ x &\equiv 3 \pmod{7} \\ x &\equiv 7 \pmod{16}. \end{aligned}$$

Exercise 2.10. Find the least positive integer $x$ satisfying

$$\begin{cases} 2x \equiv 9 \pmod{11} \\ 7x \equiv 2 \pmod{19}. \end{cases}$$

Exercise 2.11. Verify that

$$\begin{cases} 95 \equiv 3 \pmod{23} \\ 95 \equiv 2 \pmod{31} \end{cases}$$

and apply this to calculate $95^{36} \pmod{713}$.

3. Vector Spaces

Definition 3.1. A vector space (or a linear space) over a field $F$ is a set $V$, containing an element denoted by $\underline{0}$, and for each pair $u, v \in V$ and each $\alpha \in F$ having a well-defined sum $u + v \in V$ and a product $\alpha u \in V$ such that the following rules are satisfied

(i) $u + (v + w) = (u + v) + w$

(ii) $u + v = v + u$

(iii) $\alpha(\beta u) = (\alpha\beta)u$

(iv) $1u = u$

(v) $0u = \underline{0}$

(vi) $\alpha(u + v) = \alpha u + \alpha v$

(vii) $(\alpha + \beta)u = \alpha u + \beta u$.

Remark 3.2. It follows from these rules that all the axioms for addition, (A1)–(A4) from Section 1, are satisfied in a vector space. From (iv), (v) and (vii) we get

$$u + \underline{0} = 1u + 0u = (1 + 0)u = 1u = u$$

so (A3) applies. The axiom (A4) can be verified as follows

$$u + (-1)u = 1u + (-1)u = (1 + (-1))u = 0u = \underline{0}.$$

Remark 3.3. The elements of a vector space are often called vectors. In (v) we underlined the zero on the right-hand side to emphasize that it is a vector. In what follows, we will simply denote also the zero vector by 0.

The basic theory for vector spaces over a general field $F$ is the same as for the special case when $F = \mathbb{R}$. A number of vectors $u_1, \ldots, u_l$ in $V$ are said to be linearly dependent if there exist $\alpha_1, \ldots, \alpha_l \in F$, not all zero, such that

$$\alpha_1 u_1 + \cdots + \alpha_l u_l = 0.$$

We say that $u_1, \ldots, u_l$ are linearly independent if they are not linearly dependent. The vectors $u_1, \ldots, u_l$ generate the vector space $V$ if every vector $u \in V$ is a linear combination of $u_1, \ldots, u_l$, i.e. if

$$u = \alpha_1 u_1 + \cdots + \alpha_l u_l$$

for some $\alpha_1, \ldots, \alpha_l \in F$. A basis for $V$ is a collection of vectors $e_1, \ldots, e_n$ which are linearly independent and generate $V$. This is equivalent to the statement that every vector $u \in V$ can, in a unique way, be written as

$$u = \alpha_1 e_1 + \cdots + \alpha_n e_n,$$

where $\alpha_1, \ldots, \alpha_n \in F$. The coefficients $\alpha_1, \ldots, \alpha_n$ are called the coordinates of the vector $u$ in the basis $e_1, \ldots, e_n$. Two different bases for a given vector space always contain equally many elements and a vector space is said to have the dimension $n$ if it has a basis with $n$ vectors. If a vector space $V$ is generated by a finite number of vectors $v_1, \ldots, v_m$, then we can always pick a basis from these. If the vectors $v_1, \ldots, v_m$ are linearly independent then they form a basis. Otherwise, one of them, for example $v_m$, is a linear combination of the others. Then $V$ is generated by $v_1, \ldots, v_{m-1}$. In this way, we can continue until we obtain a collection of linearly independent vectors which generate $V$.

Example 3.4. For a given field $F$ the standard example of a vector space over $F$ is its $n$-fold product

$$F^n = \{(\alpha_1, \ldots, \alpha_n) \,;\, \alpha_i \in F\}$$

with addition and multiplication, by elements from $F$, in each component. Every vector space $V$ over $F$ of dimension $n$ can be identified with $F^n$ by choosing a basis in $V$.

Example 3.5. Let $f$ be a subfield of a larger field $F$. This means that $f$ is a subset of $F$ and that $f$ is itself a field with the same operations as defined in $F$. For this to be the case, it is necessary that $f$ contains at least two elements, that the operations addition and multiplication applied to two elements in $f$ again give an element in $f$, and that $-\alpha$ and $\alpha^{-1}$ also belong to $f$ for every $\alpha \neq 0$ in $f$. In this case, we can think of $F$ as a vector space over the subfield $f$. It follows from the rules for $F$ that the axioms (i)–(vii) for a vector space are satisfied. If $F$ is a finite field, then it is clear that, viewed as a vector space over $f$, it is generated by a finite number of vectors. In other words there exists a basis $e_1, \ldots, e_n$ of elements in $F$ such that every $u \in F$ can, in a unique way, be written as

$$u = \alpha_1 e_1 + \cdots + \alpha_n e_n$$

with $\alpha_1, \ldots, \alpha_n \in f$. Here the dimension of $F$ is $n$. If $p$ is the number of elements in the subfield $f$, then each coordinate $\alpha_i$ can be chosen in $p$ different ways, so $F$ has exactly $p^n$ elements.

In connection with error-correcting codes, we will later deepen our discussion on vector spaces over finite fields. Here we just show how Example 3.5 can be used to see that the number of elements of a finite field must be a power of a single prime.

Let $F$ be a finite field and as usual denote the unity in $F$ by 1. Consider the sums

$$1, \quad 1 + 1, \quad 1 + 1 + 1, \quad \ldots, \quad m1, \quad \ldots$$

where $m1$ means the sum of $m$ copies of the unity. Since $F$ is finite, there exist positive integers $r < s$ such that $r1 = s1$. If $m = s - r$, then $m1 = 0$. The least positive integer $p$ such that $p1 = 0$ is called the characteristic of the field $F$. The characteristic $p$ must be a prime, since if $p$ were the product of two integers $p_1$ and $p_2$ greater than 1 then

$$(p_1 1) \cdot (p_2 1) = p1 = 0$$

and hence, since a field has no zero divisors, $p_1 1 = 0$ or $p_2 1 = 0$. This contradicts the fact that $p$ is the least positive integer with $p1 = 0$. Now set

$$f = \{m1 \,;\, m \in \mathbb{Z}\} = \{0, \; 1, \; 1 + 1, \; \ldots, \; (p-1)1\}.$$

Then it is easily checked that $f$ is a subfield of $F$ and that the map $m \mapsto m1$ gives an isomorphism between $\mathbb{Z}_p$ and $f$. Because $f$ has $p$ elements, it follows from Example 3.5 that the field $F$ has $p^n$ elements for some positive integer $n$. We can now formulate our result as the following theorem.

Theorem 3.6. For every finite field $F$ there exist a prime number $p$ and a positive integer $n$ such that the number of elements in $F$ is $p^n$. The prime $p$ is the characteristic of the field.

Remark 3.7. The notion of a characteristic can also be defined for infinite fields, but here there are two cases. Either there exists a least positive integer $p$ such that $p1 = 0$, which we then call the characteristic, or the elements $m1$ are non-zero for all non-zero $m$. In the latter case we say that the characteristic is 0. As examples we have $\mathbb{Q}$, $\mathbb{R}$ and $\mathbb{C}$ which all are fields of characteristic 0.


Exercises

Exercise 3.1. Let $V$ be a vector space over a field $F$. A subset $U$ of $V$ is called a subspace of $V$ if

$$u, v \in U \Rightarrow \alpha u + \beta v \in U, \quad\text{for all } \alpha, \beta \in F.$$

Check that every subspace $U$ of $V$ is a vector space with the same operations as in $V$.

Let $F$ be the field $\mathbb{Z}_3$ and $U$ be the subspace of $F^4$ generated by the vectors $(0, 1, 2, 1)$, $(1, 0, 2, 2)$ and $(1, 2, 0, 1)$. Find a basis for $U$ and determine its dimension.

Exercise 3.2. Let $F$ be a field with characteristic $p \neq 0$.

(1) Show that $pa = 0$ for all $a \in F$.
(2) Show that

$$(a + b)^p = a^p + b^p$$

for all $a, b \in F$.

(Hint: Show first that for $0 < k < p$ the binomial coefficients $\binom{p}{k}$ are divisible by $p$.)

Exercise 3.3. (1) Show that for a field of characteristic $p \neq 0$

$$(a_1 + a_2 + \cdots + a_l)^p = a_1^p + a_2^p + \cdots + a_l^p.$$

(2) Prove Fermat's little theorem by choosing all $a_i = 1$ in (1).

4. Polynomial Rings

According to Theorem 3.6, any finite field must have $p^n$ elements, where $p$ is a prime number and $n$ is some positive integer. So far, we have only dealt with the fields $\mathbb{Z}_p$ for which $n = 1$. To be able to construct fields with $n > 1$, we need to discuss polynomials with coefficients in finite fields.

A polynomial with coefficients in a field $F$ is an expression of the form

(1) $f(x) = a_n x^n + a_{n-1} x^{n-1} + \cdots + a_1 x + a_0,$

where $a_i \in F$. Strictly speaking, a polynomial is just a finite sequence $a_0, a_1, \ldots, a_n$ of elements in $F$ and the letter $x$ should be seen as a formal symbol. The value $f(\alpha)$ of the polynomial $f$ at $\alpha \in F$ is

$$a_n \alpha^n + a_{n-1} \alpha^{n-1} + \cdots + a_1 \alpha + a_0 \in F.$$

Example 4.1. Consider the polynomials

$$f(x) = x^3 + 1 \quad\text{and}\quad g(x) = x^4 + x^2 + x + 1$$

with coefficients in $\mathbb{Z}_2$ (observe that we do not write out the terms with coefficient 0). Despite the fact that the values of $f$ and $g$ are equal for all $\alpha \in \mathbb{Z}_2 = \{0, 1\}$, the polynomials should be considered as different.

If $a_n \neq 0$ in equation (1), then we say that the polynomial $f(x)$ is of degree $n$ and $f(x)$ is said to be monic if $a_n = 1$. The set of all polynomials with coefficients in a field $F$ is denoted by $F[x]$. The addition and multiplication of polynomials are defined as usual when the coefficients lie in $\mathbb{R}$ or $\mathbb{C}$. The division algorithm, the factor theorem and the Euclidean algorithm can be proven, in the general case, in exactly the same way as when $F = \mathbb{R}$. The division algorithm tells us that if $f$ and $g$ are polynomials such that $\deg f \geq \deg g$, then there exist polynomials $q$ and $r$ such that

$$f(x) = q(x)g(x) + r(x),$$

where either $r(x)$ is the zero polynomial or $\deg r < \deg g$. If $r$ is the zero polynomial, then we say that $g$ divides $f$ and write $g \mid f$. The statement of the factor theorem is that $f(\alpha) = 0$ if and only if $(x - \alpha)$ divides $f(x)$. Finally, the Euclidean algorithm gives a method for finding a greatest common divisor of two polynomials $f$ and $g$. That $h$ is a greatest common divisor of $f$ and $g$ means that $h$ divides both $f$ and $g$, and furthermore that any other polynomial that divides both $f$ and $g$ must divide $h$. The greatest common divisor is not uniquely determined, but two different greatest common divisors $h_1$ and $h_2$ only differ by a constant multiple. This follows from the fact that $h_1$ divides $h_2$ and $h_2$ divides $h_1$, which is only possible if $h_1 = ah_2$ for some non-zero $a \in F$. If we demand that the greatest common divisor of $f$ and $g$ is a monic polynomial, then it is uniquely determined and is denoted by $(f, g)$.

Example 4.2. We will now illustrate the Euclidean algorithm by calculating the greatest common divisor of the following polynomials in $\mathbb{Z}_3[x]$:

$$f(x) = x^5 + 2x^3 + x^2 + 2, \quad g(x) = x^4 + 2x^3 + 2x^2 + 2x + 1.$$

Observe that since the coefficients are in $\mathbb{Z}_3$, we can apply identities such as $4 \equiv 1$ and $2 \equiv -1$. (In what follows, we will leave out the brackets around elements in $\mathbb{Z}_n$.)

$$\begin{aligned} x^5 + 2x^3 + x^2 + 2 &= (x + 1)(x^4 + 2x^3 + 2x^2 + 2x + 1) + (x^3 + 1) \\ x^4 + 2x^3 + 2x^2 + 2x + 1 &= (x + 2)(x^3 + 1) + (2x^2 + x + 2) \\ x^3 + 1 &= (2x + 2)(2x^2 + x + 2). \end{aligned}$$

The last non-vanishing remainder $2x^2 + x + 2$ is a greatest common divisor of $f$ and $g$. The corresponding monic polynomial is obtained by multiplying by $2^{-1} = 2$. This gives $(f, g) = x^2 + 2x + 1$.
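The steps of Example 4.2 can be carried out by a short program. The following Python sketch is one possible implementation and not the only one: a polynomial is stored as a list whose entry at index $i$ is the coefficient of $x^i$, the modulus $p$ is assumed to be a prime, and the inverse of a leading coefficient is computed as its $(p-2)$-th power, which is justified by Fermat's little theorem. All function names are our own.

```python
def trim(f):
    """Drop zero coefficients of highest degree (f[i] is the coefficient of x^i)."""
    while f and f[-1] == 0:
        f = f[:-1]
    return f

def poly_divmod(f, g, p):
    """Division with remainder in Z_p[x] for a prime p; g must be non-zero."""
    f, g = trim([c % p for c in f]), trim([c % p for c in g])
    inv_lead = pow(g[-1], p - 2, p)            # inverse of the leading coefficient
    q = [0] * max(len(f) - len(g) + 1, 1)
    while len(f) >= len(g):
        shift = len(f) - len(g)
        coeff = (f[-1] * inv_lead) % p
        q[shift] = coeff
        for i, c in enumerate(g):              # subtract coeff * x^shift * g
            f[shift + i] = (f[shift + i] - coeff * c) % p
        f = trim(f)
    return trim(q), f

def poly_gcd(f, g, p):
    """Monic greatest common divisor in Z_p[x]; f and g must not both be zero."""
    f, g = trim([c % p for c in f]), trim([c % p for c in g])
    while g:
        _, r = poly_divmod(f, g, p)
        f, g = g, r
    inv_lead = pow(f[-1], p - 2, p)
    return [(c * inv_lead) % p for c in f]

f = [2, 0, 1, 2, 0, 1]         # x^5 + 2x^3 + x^2 + 2
g = [1, 2, 2, 2, 1]            # x^4 + 2x^3 + 2x^2 + 2x + 1
print(poly_divmod(f, g, 3))    # ([1, 1], [1, 0, 0, 1]): quotient x + 1, remainder x^3 + 1
print(poly_gcd(f, g, 3))       # [1, 2, 1], i.e. x^2 + 2x + 1
```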

Definition 4.3. A polynomial $s(x)$ in $F[x]$ of degree $n \geq 1$ is said to be irreducible if it does not have a non-trivial divisor, i.e. if there does not exist a polynomial $g(x)$, with $1 \leq \deg g < n$, that divides $s(x)$. Irreducible polynomials are also called prime polynomials.

Example 4.4. The polynomial $f(x) = x^3 + 2x + 1$ is irreducible in $\mathbb{Z}_3[x]$. To check this, observe that if $f(x)$ were reducible then at least one of its factors would be of degree 1, since the degrees of the factors add up to 3. Then $f(x)$ would necessarily have a zero in $\mathbb{Z}_3$, but this is not the case since $f(0) = 1$, $f(1) = 1$ and $f(-1) = 1$.
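The three values in Example 4.4 can be computed with Horner's rule. The sketch below uses helper names of our own and stores a polynomial as a list with the coefficient of $x^i$ at index $i$. Note that the absence of zeros only proves irreducibility for polynomials of degree 2 or 3; a polynomial of higher degree can be reducible without having a zero.

```python
def evaluate(f, alpha, p):
    """Value of f at alpha in Z_p by Horner's rule; f[i] is the coefficient of x^i."""
    value = 0
    for c in reversed(f):
        value = (value * alpha + c) % p
    return value

def has_root(f, p):
    """True if f has a zero in Z_p."""
    return any(evaluate(f, alpha, p) == 0 for alpha in range(p))

f = [1, 2, 0, 1]                                  # x^3 + 2x + 1
print([evaluate(f, a, 3) for a in range(3)])      # [1, 1, 1]
print(has_root(f, 3))                             # False
```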

We will now prove that every monic polynomial in $F[x]$ can be written as a product of monic prime polynomials and that this product is unique up to the order of its factors. For this we need the following lemma.

Lemma 4.5. Assume that $f$, $g$ and $h$ are three polynomials in $F[x]$ such that $f(x)$ divides the product $g(x)h(x)$. If $f$ and $g$ are relatively prime, i.e. $(f, g) = 1$, then $f$ divides $h$.

Proof. It follows from the Euclidean algorithm that since $(f, g) = 1$ there exist two polynomials $c(x)$ and $d(x)$ such that

$$1 = c(x)f(x) + d(x)g(x).$$

Hence

$$h(x) = c(x)f(x)h(x) + d(x)g(x)h(x).$$

Both terms on the right-hand side are divisible by $f$ so $f$ must divide $h$. □

Theorem 4.6. Let $F$ be a field and $f(x)$ be a monic polynomial with coefficients in $F$. Then there exist a number of different monic prime polynomials $s_1(x), \ldots, s_l(x)$ in $F[x]$ and positive integers $m_1, \ldots, m_l$ such that

$$f(x) = s_1(x)^{m_1} \cdots s_l(x)^{m_l}.$$

The prime polynomials $s_i$ and the integers $m_i$ are, up to order, uniquely determined.

Proof. We prove by induction, over the degree of $f$, that $f$ can be written as a product of prime polynomials. When the degree of $f$ is 1 there is nothing to prove. Now assume that the degree of $f$ is $n$ and that the statement is correct for any polynomial of lower degree. If $f$ is a prime polynomial we are done. Otherwise, we can write $f(x) = g_1(x)g_2(x)$ for some monic polynomials $g_1$ and $g_2$, both of degree less than $n$. According to the induction hypothesis these can be written as products of prime polynomials. This proves that $f$ has a prime factorization.

What is left to prove is the uniqueness. Assume that we have two prime factorizations for $f(x)$

(2) $s_1(x)^{m_1} \cdots s_l(x)^{m_l} = t_1(x)^{n_1} \cdots t_j(x)^{n_j}.$

Let us first consider $t_1(x)$. We shall show that $t_1(x)$ is equal to one of the factors $s_i(x)$ on the left-hand side. Since $s_1$ and $t_1$ are monic prime polynomials, we know that either $s_1 = t_1$ or $s_1$ and $t_1$ are relatively prime. If $s_1 = t_1$ we are done. Otherwise $s_1(x)^{m_1}$ and $t_1(x)$ are relatively prime. According to Lemma 4.5, $t_1(x)$ must then divide the product

$$s_2(x)^{m_2} \cdots s_l(x)^{m_l}.$$

We can now continue the same procedure. Either $t_1 = s_2$ or else $t_1(x)$ divides the product

$$s_3(x)^{m_3} \cdots s_l(x)^{m_l}.$$

Sooner or later we end up with $t_1(x) = s_i(x)$ for some $i$. We can then divide both sides of equation (2) by $t_1(x)$ and repeat the procedure, now for $t_2(x)$. When we have, in this way, divided out all the factors $t_i(x)$ on the right-hand side, all the factors $s_i(x)$ on the left-hand side must have disappeared. Otherwise a product of such factors would be equal to 1, which is impossible. This proves the uniqueness of the prime factorization. □

For a given field $F$ the set $F[x]$, equipped with the polynomial addition and the polynomial multiplication, forms a ring. As we have seen above, there are great similarities between $F[x]$ and the ring $\mathbb{Z}$ of integers. For both $\mathbb{Z}$ and $F[x]$ we have the division algorithm, the Euclidean algorithm and furthermore a unique prime factorization. The prime numbers in $\mathbb{Z}$ correspond to the prime polynomials in $F[x]$. We shall now copy the construction of the rings $\mathbb{Z}_n$ from $\mathbb{Z}$ to $F[x]$. Let $s(x)$ be a given non-zero polynomial with coefficients in $F$. Two polynomials $f(x)$ and $g(x)$ in $F[x]$ are said to be congruent modulo $s(x)$ if their difference $f(x) - g(x)$ is divisible by $s(x)$. For this we simply write $f \equiv g \pmod{s}$. Denote by $[f(x)]$ the class of polynomials which are congruent to $f(x)$ modulo $s(x)$. Then we define an addition and a multiplication by

$$[f(x)] + [g(x)] = [f(x) + g(x)] \quad\text{and}\quad [f(x)] \cdot [g(x)] = [f(x)g(x)].$$

In the same way as for the integers, one can check that these definitions are independent of the choice of the representatives for the congruence classes. Denote by

$$F[x]/(s(x))$$

the set of congruence classes modulo $s(x)$. It is easily checked that $F[x]/(s(x))$, equipped with this addition and this multiplication, is a commutative ring.

Example 4.7. For the ring $\mathbb{Z}_5[x]/(x^3 + 1)$ we have

\[
\begin{aligned}
{[x^2 + 2x + 1]} \cdot [x^2 + x + 2] &= [x^4 + 3x^3 + 5x^2 + 5x + 2] \\
&= [x^4 + 3x^3 + 2] = [(x + 3)(x^3 + 1 - 1) + 2] \\
&= [(x + 3)(-1) + 2] = [-x - 1] = [4x + 4].
\end{aligned}
\]

Observe that $x^3$ can always be replaced by $-1$, since we are calculating modulo $x^3 + 1$.
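The reduction in Example 4.7 can be checked mechanically. The following is a minimal Python sketch and not part of the original notes: the function name `poly_mulmod` and the representation of a polynomial as a list of coefficients with the constant term first are assumptions made for illustration, and the modulus is assumed to be monic.

```python
def poly_mulmod(f, g, s, p):
    """Multiply f and g in Z_p[x] and reduce modulo the monic polynomial s.

    Polynomials are coefficient lists with the constant term first
    (an illustrative convention, not taken from the notes).
    """
    # Schoolbook product with coefficients reduced mod p.
    prod = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            prod[i + j] = (prod[i + j] + a * b) % p
    # Cancel the leading terms one at a time by subtracting multiples of s.
    d = len(s) - 1
    for k in range(len(prod) - 1, d - 1, -1):
        c = prod[k]
        if c:
            for i in range(d + 1):
                prod[k - d + i] = (prod[k - d + i] - c * s[i]) % p
    return prod[:d]


# Example 4.7: [x^2 + 2x + 1] * [x^2 + x + 2] in Z_5[x]/(x^3 + 1)
print(poly_mulmod([1, 2, 1], [2, 1, 1], [1, 0, 0, 1], 5))  # [4, 4, 0], i.e. 4 + 4x
```

The result list `[4, 4, 0]` stands for the class $[4x + 4]$ obtained above.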

In analogy with the rings $\mathbb{Z}_n$ one can show that $F[x]/(s(x))$ is a field if and only if $s(x)$ is a prime polynomial. If $s(x)$ is not a prime polynomial, then $s(x) = s_1(x)s_2(x)$ for some polynomials $s_1$ and $s_2$ of positive degree. Then $[s_1(x)][s_2(x)] = 0$ although neither factor is zero, so $F[x]/(s(x))$ has zero divisors and hence is not a field. If $s(x)$ is a prime polynomial, then $(f, s) = 1$ for every non-zero polynomial $f(x)$ of degree less than the degree of $s$. By the Euclidean algorithm there exist polynomials $c(x)$ and $d(x)$ such that

\[ 1 = c(x)f(x) + d(x)s(x). \]

This implies that $[1] = [c(x)][f(x)]$, so $[c(x)]$ is the inverse of $[f(x)]$. According to the division algorithm, every congruence class in $F[x]/(s(x))$ is represented by a polynomial of degree less than the degree of $s(x)$. This means that every non-zero element has an inverse, so $F[x]/(s(x))$ is a field.

Example 4.8. The polynomial $x^2 + 1$ is irreducible in the ring $\mathbb{R}[x]$ of polynomials with real coefficients. This means that

\[ \mathbb{R}[x]/(x^2 + 1) \]

is a field. Every congruence class is represented by a polynomial of degree at most one, and if we apply $[x^2 + 1] = 0$, i.e. $[x^2] = [-1]$, then we easily get

\[ [a + bx][c + dx] = [(ac - bd) + (ad + bc)x]. \]

With this we easily see that $\mathbb{R}[x]/(x^2 + 1)$ is isomorphic to the field $\mathbb{C}$ of complex numbers.

Exercises


Exercise 4.1. Let $f(x)$ be the polynomial $x^{214} + 3x^{152} + 2x^{47} + 2$ in $\mathbb{Z}_5[x]$. Find the value $f(3)$ in $\mathbb{Z}_5$.

Exercise 4.2. Show that if $f(x)$ is a polynomial of degree $n$ with coefficients in a field $F$, then $f$ has at most $n$ zeros in $F$.

Exercise 4.3. Determine the greatest common divisor $(f, g)$ of the following polynomials in $\mathbb{Z}_2[x]$:

(1) $f(x) = x^7 + 1$, $g(x) = x^5 + x^3 + x + 1$.
(2) $f(x) = x^5 + x + 1$, $g(x) = x^6 + x^5 + x^4 + x + 1$.

Exercise 4.4. Find the greatest common divisor $h = (f, g)$ of the polynomials $f(x) = x^{17} + 1$ and $g(x) = x^7 + 1$ in $\mathbb{Z}_2[x]$ and determine two polynomials $c(x)$ and $d(x)$ such that

\[ h(x) = c(x)f(x) + d(x)g(x). \]

Exercise 4.5. Show that there exists only one irreducible polynomial in $\mathbb{Z}_2[x]$ of degree two. Determine whether the polynomial $x^5 + x^4 + 1$ in $\mathbb{Z}_2[x]$ is irreducible or not.

Exercise 4.6. Determine all monic irreducible polynomials in $\mathbb{Z}_3[x]$ of degree 2.

Exercise 4.7. Find in $\mathbb{Z}_3[x]$ the prime factorization of the following polynomials:

(1) $x^5 + x^4 + x^3 + x - 1$
(2) $x^4 + 2x^2 + 2x + 2$
(3) $x^4 + 1$
(4) $x^8 + 2$.

Exercise 4.8. How many zero divisors are there in the ring $\mathbb{Z}_5[x]/(x^3 + 1)$?

Exercise 4.9. (1) Let $F$ be a finite field. Show that the product of all non-zero elements in $F$ is equal to $-1$. (Hint: Apply Theorem 2.1 and the relationship between zeros and coefficients.)

(2) Show that for every prime number $p$ we have

\[ (p - 1)! \equiv -1 \pmod{p}. \]

(Compare this result with Exercise 1.15.)

Exercise 4.10. Let $F$ be a field with $q$ elements, where $q = 2m + 1$ is odd. Show that $x \in F$ is the square of some non-zero element in $F$ if and only if $x^m = 1$. (Hint: Show first that $a^2 = b^2$ implies that $a = b$ or $a = -b$ and then use Exercise 4.2.)

Exercise 4.11. Show that for a field with an even number of elements, every element is the square of one and only one element.


5. Finite Fields

Example 5.1. We shall here determine all irreducible polynomials in $\mathbb{Z}_2[x]$ of degree less than or equal to 4. There exist only two polynomials of degree 1, namely

\[ x \quad\text{and}\quad x + 1. \]

These are trivially irreducible. A polynomial of degree 2 or 3 is irreducible if and only if it has no zeros in $\mathbb{Z}_2$. It is easily checked that such a polynomial has no zeros exactly when it has an odd number of terms and the constant term is 1. This shows that the irreducible polynomials of degree 2 and 3 are exactly the following:

\[ x^2 + x + 1 \]

\[ x^3 + x^2 + 1 \quad\text{and}\quad x^3 + x + 1. \]

If a polynomial of degree 4 is irreducible, then necessarily it does not have a factor of degree 1, i.e. it does not have a zero in $\mathbb{Z}_2$, and it is not a product of two irreducible factors of degree 2. The second condition only excludes $(x^2 + x + 1)^2 = x^4 + x^2 + 1$, since there exists only one prime polynomial of degree 2. The other polynomials in $\mathbb{Z}_2[x]$ of degree 4 that do not have a zero are

\[ x^4 + x^3 + 1, \quad x^4 + x + 1 \quad\text{and}\quad x^4 + x^3 + x^2 + x + 1. \]

These are all the prime polynomials in $\mathbb{Z}_2[x]$ of degree 4.
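The list in Example 5.1 can be confirmed by brute force. The sketch below is an illustration under stated assumptions, not the method of the notes: a polynomial over $\mathbb{Z}_2$ is stored as a Python integer whose bit $i$ is the coefficient of $x^i$, the names `gf2_mod` and `irreducibles_up_to` are hypothetical, and irreducibility is tested by trial division by the irreducible polynomials of at most half the degree.

```python
def gf2_mod(a, b):
    """Remainder of a divided by b in Z_2[x] (polynomials encoded as bit masks)."""
    while a and a.bit_length() >= b.bit_length():
        # Subtracting (= adding) a shifted copy of b cancels the leading term of a.
        a ^= b << (a.bit_length() - b.bit_length())
    return a


def irreducibles_up_to(max_degree):
    """All irreducible polynomials in Z_2[x] of degree 1..max_degree."""
    found = []
    for f in range(2, 1 << (max_degree + 1)):
        deg = f.bit_length() - 1
        # A reducible polynomial has an irreducible factor of degree <= deg / 2.
        if all(gf2_mod(f, g) for g in found if 2 * (g.bit_length() - 1) <= deg):
            found.append(f)
    return found


print([format(f, "b") for f in irreducibles_up_to(4)])
# ['10', '11', '111', '1011', '1101', '10011', '11001', '11111']
```

Read as coefficient strings, the eight entries are $x$, $x+1$, $x^2+x+1$, $x^3+x+1$, $x^3+x^2+1$, $x^4+x+1$, $x^4+x^3+1$ and $x^4+x^3+x^2+x+1$.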

If $s(x)$ is any of the irreducible polynomials of degree 4 mentioned above, then $\mathbb{Z}_2[x]/(s(x))$ is a field with $2^4 = 16$ elements. This follows from the fact that every congruence class is represented by a unique polynomial of degree at most 3, and each of its four coefficients can be chosen in exactly two ways, namely as 0 or 1. Any irreducible polynomial of degree 2 or 3 induces a field with $2^2 = 4$ or $2^3 = 8$ elements, respectively.

In the next section, we will show that for every prime number $p$ and every positive integer $n$ there exists an irreducible polynomial in $\mathbb{Z}_p[x]$ of degree $n$. As a direct consequence of this, there exists for each such $p$ and $n$ a field with $p^n$ elements. We shall also show that any two finite fields with the same number of elements are isomorphic. This means that up to isomorphism there exists, for each prime $p$ and each positive integer $n$, exactly one finite field with $p^n$ elements. These fields are denoted by $GF(p^n)$ and called Galois fields of order $p^n$ in honour of the French mathematician Évariste Galois (1811-1832). In this section we shall give examples of how to do calculations in finite fields.


Example 5.2. In order to find the multiplicative inverse of $[x^2 + 1]$ in the field $\mathbb{Z}_2[x]/(x^3 + x^2 + 1)$ we apply the Euclidean algorithm:

\[
\begin{aligned}
x^3 + x^2 + 1 &= (x + 1)(x^2 + 1) + x \\
x^2 + 1 &= x \cdot x + 1.
\end{aligned}
\]

This leads to (observe that $+ = -$ in $\mathbb{Z}_2$)

\[
\begin{aligned}
1 &= (x^2 + 1) + x \cdot x = (x^2 + 1) + x\big((x^3 + x^2 + 1) + (x + 1)(x^2 + 1)\big) \\
&= (x^2 + x + 1)(x^2 + 1) + x(x^3 + x^2 + 1).
\end{aligned}
\]

We end up with $[x^2 + 1]^{-1} = [x^2 + x + 1]$.
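The back-substitution in Example 5.2 is the extended Euclidean algorithm, which can be sketched in a few lines. The code below is only an illustration: polynomials over $\mathbb{Z}_2$ are encoded as integers (bit $i$ is the coefficient of $x^i$), and the helper names `gf2_divmod`, `gf2_mul` and `gf2_inverse` are hypothetical.

```python
def gf2_divmod(a, b):
    """Quotient and remainder of a divided by b in Z_2[x] (bit-mask encoding)."""
    if b == 0:
        raise ZeroDivisionError("division by the zero polynomial")
    q = 0
    while a and a.bit_length() >= b.bit_length():
        shift = a.bit_length() - b.bit_length()
        q ^= 1 << shift
        a ^= b << shift
    return q, a


def gf2_mul(a, b):
    """Product of a and b in Z_2[x] (no reduction)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def gf2_inverse(f, s):
    """Return c with [c][f] = [1] in Z_2[x]/(s), assuming (f, s) = 1."""
    r0, r1 = s, gf2_divmod(f, s)[1]
    c0, c1 = 0, 1  # invariant: r_i is congruent to c_i * f modulo s
    while r1 != 1:
        if r1 == 0:
            raise ValueError("f is not invertible modulo s")
        q, r = gf2_divmod(r0, r1)
        r0, r1 = r1, r
        c0, c1 = c1, c0 ^ gf2_mul(q, c1)
    return gf2_divmod(c1, s)[1]


# Example 5.2: inverse of x^2 + 1 modulo x^3 + x^2 + 1
print(format(gf2_inverse(0b101, 0b1101), "b"))  # 111, i.e. x^2 + x + 1
```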

We will now turn our attention to calculations concerning powers. If $a$ is a non-zero element of a finite field $F$ then some power of $a$ must be 1. We know for example from Theorem 2.1 that $a^{q-1} = 1$, where $q$ is the number of elements in $F$.

Definition 5.3. The order of a non-zero element $a$ in a finite field is the least positive integer $m$ such that $a^m = 1$. We denote the order of $a$ by $o(a)$.

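Example 5.4. Here we determine the order of $[10]$ in the field $\mathbb{Z}_{73}$:

\[
\begin{aligned}
10^2 &= 100 \equiv 27 \\
10^3 &\equiv 270 \equiv -22 \\
10^4 &\equiv -220 \equiv -1.
\end{aligned}
\]

This implies that $10^5 \equiv -10$, $10^6 \equiv -27$, $10^7 \equiv 22$ and $10^8 \equiv 1$. The order of $[10]$ is therefore 8.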
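Definition 5.3 translates directly into a loop over successive powers. The following sketch, with the hypothetical name `order_mod`, is restricted to the prime fields $\mathbb{Z}_p$ and simply multiplies by $a$ until the value 1 is reached.

```python
def order_mod(a, p):
    """Least m >= 1 with a^m = 1 in Z_p, for a prime p and a not divisible by p."""
    if a % p == 0:
        raise ValueError("the order is only defined for non-zero elements")
    m, power = 1, a % p
    while power != 1:
        power = (power * a) % p
        m += 1
    return m


print(order_mod(10, 73))  # 8, as in Example 5.4
```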

According to Fermat's little theorem, we know that for any non-zero element $a$ in the field $\mathbb{Z}_{73}$ we have $a^{72} = 1$. The following result shows that it is not a coincidence that the order 8 in Example 5.4 divides 72.

Lemma 5.5. Let $a$ be a non-zero element in a finite field. If $a^n = 1$ for some positive integer $n$, then the order of $a$ divides $n$.

Proof. Assume the contrary. If $m$ is the order of $a$, then there exist integers $q$ and $r$ with $0 < r < m$, such that

\[ n = qm + r. \]

From this it follows that

\[ 1 = a^n = (a^m)^q \cdot a^r = a^r. \]

This contradicts the fact that $m = o(a)$, since $0 < r < m$. □


The next result gives us a method for constructing elements of high order.

Lemma 5.6. Assume that the elements $a_1$ and $a_2$ in a finite field have the orders $m_1$ and $m_2$, respectively, and that $m_1$ and $m_2$ are relatively prime. Then $a = a_1a_2$ has the order $m_1m_2$.

Proof. Assume that $a^k = 1$. Then we have

\[ 1 = a^{km_1} = a_1^{km_1} \cdot a_2^{km_1} = a_2^{km_1}. \]

According to Lemma 5.5, $m_2$ must divide $km_1$. Since $(m_1, m_2) = 1$ the number $m_2$ must divide $k$. Using a similar argument, we see that $m_1$ divides $k$. This means that $k$ is divisible by $m_1m_2$, since $m_1$ and $m_2$ are relatively prime. The order of $a$ is therefore at least $m_1m_2$. That it is exactly $m_1m_2$ follows from

\[ a^{m_1m_2} = (a_1^{m_1})^{m_2} \cdot (a_2^{m_2})^{m_1} = 1. \]

□

Example 5.7. In the field $\mathbb{Z}_{73}$ we have

\[
\begin{aligned}
8^2 &= 64 \equiv -9 \\
8^3 &\equiv -72 \equiv 1
\end{aligned}
\]

so the order of $[8]$ is 3. According to Example 5.4 and Lemma 5.6 the order of $[80] = [7]$ is $8 \cdot 3 = 24$.

Before we can formulate the main result of this section we need the following lemma.

Lemma 5.8. Let $a$ and $b$ be elements of a finite field $F$ of order $m$ and $n$, respectively, and assume that $m$ does not divide $n$. Then there exists an element in $F$ of order greater than $n$.

Proof. If $m$ does not divide $n$, then there exists a prime power $p^k$ that divides $m$ but not $n$. Then $m = m'p^k$ and $n = n'p^l$, where $0 \le l < k$ and $n'$ is not divisible by $p$. The element $a^{m'}$ has order $p^k$, the element $b^{p^l}$ has order $n'$, and $(p^k, n') = 1$. According to Lemma 5.6, the order of $a^{m'} \cdot b^{p^l}$ is therefore $p^kn' > n$. □

Theorem 5.9. If $F$ is a finite field with $q$ elements, then there always exists an element in $F$ of order $q - 1$.

Proof. Let $b$ be a non-zero element in $F$ such that the order of $b$ is larger than or equal to the order of any other element of $F$. Set $n = o(b)$. According to Lemma 5.8 the order of any element in $F$ must divide $n$, since otherwise there would exist an element of order greater than $n$. This means that any non-zero element of $F$ must satisfy the equation

\[ x^n = 1. \]

The polynomial $x^n - 1$ therefore has $q - 1$ different zeros. By the factor theorem we therefore have $n \ge q - 1$. On the other hand Theorem 2.1 tells us that the order can never be greater than $q - 1$. Hence $n = q - 1$, so we have proven the result. □

Definition 5.10. Let $F$ be a field with $q$ elements. An element of order $q - 1$ in $F$ is said to be a primitive element.

Example 5.11. We shall show that $[3]$ is a primitive element for $\mathbb{Z}_{101}$. Since the order of $[3]$ must divide $100 = 2^2 \cdot 5^2$, it is enough to check the powers 2, 4, 5, 10, 20, 25 and 50:

\[
\begin{aligned}
3^2 &= 9 \\
3^4 &= 81 \equiv -20 \\
3^5 &\equiv -60 \\
3^{10} &\equiv 3600 \equiv -36 \\
3^{20} &\equiv 1296 \equiv -17 \\
3^{25} &\equiv 1020 \equiv 10 \\
3^{50} &\equiv 100 \equiv -1
\end{aligned}
\]

The least positive integer $m$ for which $3^m \equiv 1$ is therefore 100.
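A check of this kind is easy to automate for prime fields. The sketch below uses a common shortcut rather than the full list of divisors from Example 5.11: by Lemma 5.5, if the order of $a$ is a proper divisor of $q - 1$, then it divides $(q-1)/r$ for some prime $r$ dividing $q - 1$, so it suffices to test those exponents. The names `prime_factors` and `is_primitive` are hypothetical.

```python
def prime_factors(n):
    """Set of primes dividing n (trial division; adequate for small n)."""
    factors, d = set(), 2
    while d * d <= n:
        while n % d == 0:
            factors.add(d)
            n //= d
        d += 1
    if n > 1:
        factors.add(n)
    return factors


def is_primitive(a, p):
    """True if [a] has order p - 1 in Z_p, for a prime p."""
    if a % p == 0:
        return False
    # a is primitive unless a^((p-1)/r) = 1 for some prime r dividing p - 1.
    return all(pow(a, (p - 1) // r, p) != 1 for r in prime_factors(p - 1))


print(is_primitive(3, 101))  # True, in agreement with Example 5.11
print(is_primitive(10, 73))  # False, since [10] has order 8 by Example 5.4
```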

For a primitive element $a$ in a field $F$ with $q$ elements the powers

\[ a^0, a^1, a^2, \ldots, a^{q-2} \]

are all different. Otherwise we would have $a^j = a^k$ for some integers $j < k$ between 0 and $q - 2$. Then $a^{k-j} = 1$, which contradicts the fact that the order of $a$ is $q - 1$. Since $F$ has exactly $q - 1$ non-zero elements, for every non-zero $b$ in $F$ there exists a uniquely determined $j$ with $0 \le j \le q - 2$ such that $b = a^j$. We call $j$ the index of $b$ and write $j = \mathrm{ind}(b)$. The index is also called the discrete logarithm of $b$ with respect to the primitive element $a$. The index can be used to simplify calculations of products and quotients in finite fields. If the field has $q$ elements then we have

\[
\begin{aligned}
\mathrm{ind}(b_1 \cdot b_2) &\equiv \mathrm{ind}(b_1) + \mathrm{ind}(b_2) \pmod{q - 1} \\
\mathrm{ind}(b_1 \cdot b_2^{-1}) &\equiv \mathrm{ind}(b_1) - \mathrm{ind}(b_2) \pmod{q - 1}.
\end{aligned}
\]

Example 5.12. We have seen in Example 5.1 that the polynomial $x^4 + x^3 + 1$ is irreducible in $\mathbb{Z}_2[x]$. The field

\[ F = \mathbb{Z}_2[x]/(x^4 + x^3 + 1) \]

has $2^4 = 16$ elements. Each element in $F$ can be described by a string of four binary digits given by the coefficients of the polynomial of degree at most 3 representing the congruence class. As an example, the string 1011 denotes the class $[x^3 + x + 1]$. The class $[x]$ is a primitive element in $F$ and this induces a table containing each element in $F^*$:

index     0    1    2    3    4    5    6    7
element 0001 0010 0100 1000 1001 1011 1111 0111

index     8    9   10   11   12   13   14
element 1110 0101 1010 1101 0011 0110 1100

As an example, the calculation of the element of index 5 goes as follows:

\[
\begin{aligned}
{[x^5]} &= [x \cdot x^4] = [x \cdot (x^3 + 1)] = [x^4 + x] \\
&= [(x^3 + 1) + x] = [x^3 + x + 1].
\end{aligned}
\]

We illustrate how the table can be used by calculating

\[ (1111) \cdot (1101)^{-1}. \]

The index for this element is

\[ 6 - 11 = -5 \equiv 10 \pmod{15}. \]

Hence $(1111) \cdot (1101)^{-1} = (1010)$.
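The table of Example 5.12 can be regenerated by repeatedly multiplying by $[x]$ and reducing with $x^4 = x^3 + 1$. In the following sketch the bit-mask encoding (bit $i$ is the coefficient of $x^i$) and the names `power_table` and `divide` are illustrative assumptions.

```python
MODULUS = 0b11001  # x^4 + x^3 + 1
DEGREE = 4


def power_table(modulus, degree):
    """Powers x^0, x^1, ..., x^(2^degree - 2) in Z_2[x]/(modulus) as bit masks."""
    table, elem = [], 1
    for _ in range(2**degree - 1):
        table.append(elem)
        elem <<= 1                 # multiply by x
        if elem >> degree:         # a term x^degree appeared: reduce it
            elem ^= modulus
    return table


table = power_table(MODULUS, DEGREE)
index = {elem: j for j, elem in enumerate(table)}  # discrete logarithms


def divide(b1, b2):
    """b1 * b2^(-1) for non-zero b1, b2, computed by subtracting indices mod 15."""
    return table[(index[b1] - index[b2]) % (2**DEGREE - 1)]


print([format(e, "04b") for e in table])
print(format(divide(0b1111, 0b1101), "04b"))  # 1010
```

The fact that the fifteen table entries are pairwise different is exactly the statement that $[x]$ is primitive.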

Exercises

Exercise 5.1. Determine all irreducible polynomials of degree 5 in $\mathbb{Z}_2[x]$.

Exercise 5.2. Prove that $\mathbb{Z}_3[x]/(x^3 + x^2 + 2)$ is a field with 27 elements and determine the multiplicative inverse of $[x + 2]$.

Exercise 5.3. Prove that $\mathbb{Z}_{11}[x]/(x^2 + x + 4)$ is a field and determine the multiplicative inverse of $[3x + 2]$. How many elements does the field have?

Exercise 5.4. (1) Determine the order of the elements $[3]$ and $[4]$ in $\mathbb{Z}_{37}$. (2) Determine a primitive element in $\mathbb{Z}_{37}$.

Exercise 5.5. Determine a primitive element in $\mathbb{Z}_{73}$.

Exercise 5.6. (1) Show that $L = \mathbb{Z}_2[x]/(x^3 + x + 1)$ is a field. (2) Show that $[x]$ is a primitive element and calculate, as in Example 5.12, an index table for $L$. (3) Calculate $[x^2 + 1] \cdot [x^2 + x + 1]^{-1}$.

Exercise 5.7. Use the table in Example 5.12 to calculate the following:

(1) $(1001) \cdot ((1011)^2 + (0011)^{-2})$,
(2) $((1010)^2 + (0101)^3) \cdot ((0001) + (1101)^2)^{-1}$.

6. The Existence and Uniqueness of $GF(p^n)$

To show that there exists a field with $p^n$ elements we shall here prove that for each prime $p$ and every positive integer $n$ there exists an irreducible polynomial of degree $n$ in $\mathbb{Z}_p[x]$. We start by noticing that the total number of monic polynomials

\[ f(x) = x^n + a_{n-1}x^{n-1} + \cdots + a_1x + a_0 \]

with coefficients in $\mathbb{Z}_p$ is equal to $p^n$. According to Theorem 4.6, every such polynomial can, in a unique way, up to the order of the factors, be written as a product

\[ f(x) = s_1(x)^{m_1} \cdots s_l(x)^{m_l}, \tag{3} \]

where $s_1(x), \ldots, s_l(x)$ are monic prime polynomials in $\mathbb{Z}_p[x]$. If $d_i$ is the degree of $s_i(x)$ then

\[ n = m_1d_1 + \cdots + m_ld_l. \tag{4} \]

The number of monic polynomials of degree $n$ in $\mathbb{Z}_p[x]$ is equal to the number of ways, as in (3), to write monic polynomials of degree $n$ as a product of prime polynomials. If $I_d$ denotes the number of monic prime polynomials of degree $d$, then according to (4), the total number of monic polynomials of degree $n$ in $\mathbb{Z}_p[x]$ is equal to the coefficient of $t^n$ in the product

\[ (1 + t + t^2 + \cdots)^{I_1}(1 + t^2 + t^4 + \cdots)^{I_2}(1 + t^3 + t^6 + \cdots)^{I_3} \cdots. \]

Since we know that this coefficient is equal to $p^n$, we have

\[ \prod_d \left(\frac{1}{1 - t^d}\right)^{I_d} = \frac{1}{1 - pt}. \]

By taking logarithms on each side we obtain

\[ \sum_d -I_d \ln(1 - t^d) = -\ln(1 - pt) \]

and by Taylor expanding on both sides we get

\[
I_1\Big(t + \frac{t^2}{2} + \frac{t^3}{3} + \cdots\Big) + I_2\Big(t^2 + \frac{t^4}{2} + \frac{t^6}{3} + \cdots\Big) + I_3\Big(t^3 + \frac{t^6}{2} + \frac{t^9}{3} + \cdots\Big) + \cdots
= pt + \frac{p^2t^2}{2} + \frac{p^3t^3}{3} + \cdots.
\]

Comparing the coefficients of $t^n$ on each side gives

\[ \sum_{d|n} I_d \cdot \frac{d}{n} = \frac{p^n}{n}. \]

Observe that on the left-hand side we only have terms where $d$ divides $n$. Multiplying by $n$ gives the following result:

Theorem 6.1. If $I_d$ is the number of monic irreducible polynomials of degree $d$ in $\mathbb{Z}_p[x]$, then

\[ \sum_{d|n} dI_d = p^n. \]

Example 6.2. If $p = 2$ and $n = 6$ then we obtain

\[ I_1 + 2I_2 + 3I_3 + 6I_6 = 2^6 = 64. \]

According to Example 5.1 we have $I_1 = 2$, $I_2 = 1$ and $I_3 = 2$, so $I_6 = 9$.
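Solving the equation of Theorem 6.1 for the term with $d = n$ gives a recursion for $I_n$ in terms of the $I_d$ with $d$ a proper divisor of $n$. The following is a small sketch of that recursion; the function name `count_irreducible` is hypothetical.

```python
from functools import lru_cache


def count_irreducible(p, n):
    """Number I_n of monic irreducible polynomials of degree n in Z_p[x],
    obtained by solving the identity of Theorem 6.1 recursively."""

    @lru_cache(maxsize=None)
    def I(k):
        # Contribution of the proper divisors d of k to the sum of d * I_d.
        lower = sum(d * I(d) for d in range(1, k) if k % d == 0)
        return (p**k - lower) // k

    return I(n)


print([count_irreducible(2, n) for n in (1, 2, 3, 4, 6)])  # [2, 1, 2, 3, 9]
```

The values for $n \le 4$ agree with the lists in Example 5.1, and the last one with Example 6.2.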

By applying Theorem 6.1 repeatedly we can, in this way, determine the numbers $I_d$. But to do this in one go, we will make use of the Möbius inversion formula proven in the next section.

The Möbius function $\mu(n)$ is defined for positive integers $n$ and takes only the three values 0, 1 and $-1$. It is given by

\[
\mu(n) =
\begin{cases}
1 & \text{if } n = 1 \\
(-1)^k & \text{if } n \text{ is the product of } k \text{ different primes} \\
0 & \text{otherwise.}
\end{cases}
\]

If we apply the Möbius inversion formula to the equation in Theorem 6.1 then we get

\[ nI_n = \sum_{d|n} \mu(d)p^{n/d}. \]

The right-hand side contains a lowest power of $p$. If the lowest power is $p^m$, then

\[ \frac{nI_n}{p^m} = \pm 1 + (\text{a number of positive powers of } p \text{ with coefficients } \pm 1). \]

Hence

\[ \frac{nI_n}{p^m} \equiv \pm 1 \pmod{p} \]

and in particular $nI_n \ne 0$.


Theorem 6.3. For each prime number $p$ and each positive integer $n$ there exists an irreducible polynomial of degree $n$ in $\mathbb{Z}_p[x]$.
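The count behind Theorem 6.3 can be evaluated explicitly from the formula $nI_n = \sum_{d|n} \mu(d)p^{n/d}$. The sketch below is illustrative only; `mobius` and `irreducible_count` are hypothetical names, and the Möbius function is computed by plain trial division.

```python
def mobius(n):
    """Moebius function: 0 if n has a repeated prime factor,
    otherwise (-1)^k where k is the number of prime factors of n."""
    result, d = 1, 2
    while d * d <= n:
        if n % d == 0:
            n //= d
            if n % d == 0:      # d^2 divides the original n
                return 0
            result = -result
        d += 1
    if n > 1:                   # one remaining prime factor
        result = -result
    return result


def irreducible_count(p, n):
    """I_n = (1/n) * sum over d | n of mu(d) * p^(n/d)."""
    total = sum(mobius(d) * p**(n // d) for d in range(1, n + 1) if n % d == 0)
    return total // n


print(irreducible_count(2, 6))  # 9, as in Example 6.2
```

For $p = 2$ and $n = 6$ the sum is $64 - 8 - 4 + 2 = 54$, which gives $I_6 = 9$ again.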

It is a direct consequence of Theorem 6.3 that there exists a field with $p^n$ elements. We shall now focus our attention on proving that, up to isomorphism, there exists only one such field.

Let $F$ be an arbitrary finite field of characteristic $p$. Then $F$ contains the subfield

\[ f = \{\, 0, 1, \ldots, (p - 1)1 \,\} \]

which is isomorphic to $\mathbb{Z}_p$. If $m1 \in f$ and $\beta \in F$, then $(m1) \cdot \beta = m\beta$. We can therefore consider $F$ as a vector space over $\mathbb{Z}_p$. Since $F$ is finite, this vector space is finite dimensional. This implies that for every $\alpha \in F$ there exists a positive integer $d$ such that the powers

\[ \alpha^0, \alpha^1, \alpha^2, \ldots, \alpha^d \]

are linearly dependent, i.e. there exist $a_0, a_1, \ldots, a_d \in \mathbb{Z}_p$, not all zero, such that

\[ a_01 + a_1\alpha + a_2\alpha^2 + \cdots + a_d\alpha^d = 0. \]

Let $d$ be the smallest such integer and set $s(x) = a_0 + a_1x + \cdots + a_dx^d$. Then $s(x)$ has the lowest degree amongst the non-trivial polynomials in $\mathbb{Z}_p[x]$ having $\alpha$ as a zero. We can always choose $a_d = 1$, and then $s(x)$ is uniquely determined and called the minimal polynomial of $\alpha$. The minimal polynomial is irreducible in $\mathbb{Z}_p[x]$, because if $s(x)$ were a product $s_1(x)s_2(x)$ of factors of lower degree than $d$, then $s_1$ or $s_2$ would have $\alpha$ as a zero and this would contradict the fact that $s(x)$ is the minimal polynomial of $\alpha$.

Theorem 6.4. Let $F$ be a finite field of characteristic $p$ and let $\alpha$ be an element of $F$. If $L$ is the smallest subfield of $F$ containing $\alpha$ and if $s(x)$ is the minimal polynomial of $\alpha$, then $L$ is isomorphic to the field $\mathbb{Z}_p[x]/(s(x))$.

Proof. Set

\[ L = \{ f(\alpha) \,;\ f \in \mathbb{Z}_p[x] \}. \]

Every subfield of $F$ containing $\alpha$ must include $L$, since such a field contains all powers of $\alpha$ and all linear combinations of such powers. We shall show that $L$ is isomorphic to the field $\mathbb{Z}_p[x]/(s(x))$. It follows from this that $L$ itself is a field and hence the smallest subfield of $F$ containing $\alpha$. Consider the map

\[ \mathbb{Z}_p[x]/(s(x)) \ni [f(x)] \mapsto f(\alpha) \in L. \]

It is well-defined, since if $f$ and $g$ belong to the same congruence class, i.e. if $f(x) = g(x) + h(x)s(x)$ for some polynomial $h$, then

\[ f(\alpha) = g(\alpha) + h(\alpha)s(\alpha) = g(\alpha). \]

It immediately follows from the definition that $[f(x)] + [g(x)]$ is mapped to $f(\alpha) + g(\alpha)$ and $[f(x)] \cdot [g(x)]$ to $f(\alpha)g(\alpha)$. It remains to show that the map is bijective. It is clear that it is surjective. To show that it is injective, we first observe that if the minimal polynomial $s(x)$ has degree $d$, then it is enough to consider polynomials $f(x)$ of degree less than $d$. Every congruence class in $\mathbb{Z}_p[x]/(s(x))$ is represented by such a polynomial. Assume that $f(\alpha) = g(\alpha)$ for two different polynomials of degree less than $d$. Then $\alpha$ is a zero of the non-zero polynomial $f - g$ of degree less than $d$, which contradicts the fact that $s(x)$ is the minimal polynomial of $\alpha$. This shows that the map is injective and the statement is proven. □

Corollary 6.5. Let $F$ be a field with $p^n$ elements and let $s(x)$ be a monic prime polynomial in $\mathbb{Z}_p[x]$ with a zero $\alpha$ in $F$. Then $s(x)$ is the minimal polynomial of $\alpha$ and the degree of $s$ divides $n$.

Proof. The element $\alpha$ is a zero of both $s(x)$ and its minimal polynomial $t(x)$. Hence $\alpha$ is a zero of the greatest common divisor $(s, t)$. Since $s$ and $t$ are monic and irreducible, we must have $s = (s, t) = t$. If $s(x)$ has the degree $d$ and $L$ is the smallest subfield containing $\alpha$, then Theorem 6.4 tells us that $L$ has $p^d$ elements. Because $F$ can be seen as a vector space over $L$, we have

\[ |F| = |L|^m \]

for some positive integer $m$, where $|F|$ and $|L|$ denote the number of elements in $F$ and $L$, respectively. This means that

\[ p^n = p^{dm} \]

and from this it follows that $d$ divides $n$. □

We now have all the tools needed to prove that two finite fields with the same number of elements must be isomorphic. Let $F$ be an arbitrary field with $q = p^n$ elements. According to Theorem 2.1 every element in $F$ is a zero of the polynomial $x^q - x$. (We have multiplied the equation in the theorem by $x$ to include $x = 0$.) According to Theorem 4.6, $x^q - x$ can be written as a product of monic prime polynomials in $\mathbb{Z}_p[x]$:

\[ x^q - x = \prod_i s_i(x). \tag{5} \]

Here the sum of the degrees of the polynomials $s_i$ is equal to $q$. Since $x^q - x$ has $q$ different zeros in $F$, the prime polynomials on the right-hand side must all be different, and the degree of each polynomial $s_i$ must equal the number of its different zeros in $F$. The above corollary shows that the degree of each polynomial $s_i$ divides $n$.

Let us now consider the formula in Theorem 6.1. It shows that the sum of the degrees of all monic prime polynomials in $\mathbb{Z}_p[x]$ whose degree divides $n$ is equal to $p^n$. Since the degrees of the factors in (5) also add up to $q = p^n$, the product on the right-hand side of equation (5) must contain every monic prime polynomial the degree of which divides $n$. In particular, according to Theorem 6.3, the right-hand side of (5) must contain a prime polynomial $s(x)$ of degree $n$. This is the minimal polynomial of each of its $n$ zeros in $F$. Let $\alpha$ be such a zero. Then it follows from Theorem 6.4 that the smallest subfield of $F$ containing $\alpha$ is isomorphic to the field $\mathbb{Z}_p[x]/(s(x))$ and consequently contains $p^n$ elements. The field $F$ is therefore isomorphic to $\mathbb{Z}_p[x]/(s(x))$. We have hereby proven the following result.

Theorem 6.6. Let $s(x)$ be a prime polynomial of degree $n$ in $\mathbb{Z}_p[x]$. Then every field with $p^n$ elements is isomorphic to $\mathbb{Z}_p[x]/(s(x))$.

Remark 6.7. In particular, we have shown that if $s_1$ and $s_2$ are two different prime polynomials of degree $n$ in $\mathbb{Z}_p[x]$ then the fields $\mathbb{Z}_p[x]/(s_1(x))$ and $\mathbb{Z}_p[x]/(s_2(x))$ are isomorphic.

7. The Möbius Inversion Formula

Let us first recall that the Möbius function $\mu(n)$ is defined for positive integers $n$, as 0 if $n$ has a multiple prime factor and as $(-1)^k$ if $n$ is the product of $k$ different primes. As a special case we have $\mu(1) = 1$.

Lemma 7.1.

\[
\sum_{d|n} \mu(d) =
\begin{cases}
1 & \text{for } n = 1 \\
0 & \text{for } n > 1.
\end{cases}
\]

Proof. When $n = 1$ the sum is equal to $\mu(1) = 1$. If $n > 1$ and $n = p_1^{m_1} \cdots p_r^{m_r}$ is the prime factorization of $n$, we set $n^* = p_1 \cdots p_r$. Then

\[
\begin{aligned}
\sum_{d|n} \mu(d) &= \sum_{d|n^*} \mu(d) \\
&= 1 - r + \cdots + (-1)^k\binom{r}{k} + \cdots + (-1)^r\binom{r}{r} = (1 - 1)^r \\
&= 0.
\end{aligned}
\]

The binomial coefficients $\binom{r}{k}$ tell us how many different numbers $d$ are products of $k$ prime factors chosen amongst $p_1, \ldots, p_r$. □

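Theorem 7.2 (Möbius inversion formula). Let $f(n)$ and $g(n)$ be defined for positive integers $n$ and assume that

\[ f(n) = \sum_{d|n} g(d) \]

for all $n$. Then

\[ g(n) = \sum_{d|n} \mu(d)f\!\left(\frac{n}{d}\right). \]

Proof. It follows from

\[ f\!\left(\frac{n}{d}\right) = \sum_{d'|\frac{n}{d}} g(d') \]

that

\[
\sum_{d|n} \mu(d)f\!\left(\frac{n}{d}\right) = \sum_{d|n} \mu(d) \sum_{d'|\frac{n}{d}} g(d') = \sum_{d'|n} g(d') \sum_{d|\frac{n}{d'}} \mu(d) = g(n).
\]

For the last equality we have used Lemma 7.1, which gives

\[
\sum_{d|\frac{n}{d'}} \mu(d) =
\begin{cases}
1 & \text{if } d' = n \\
0 & \text{if } d' < n.
\end{cases}
\]

□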

CHAPTER 2

Error-Correcting Codes

1. Introduction

When transferring or storing information there is always a risk of errors occurring in the process. To increase the possibility of detecting and possibly correcting such errors, one can add a certain redundancy to the text carrying the information, for example in the form of control digits. We shall now give two simple examples.

Example 1.1. Assume that a sender transmits a text which is divided into a number of six-digit binary words. Each such word consists of six digits, each of which is either 0 or 1. To make it easier for a receiver to detect errors that might have occurred during the transfer, the sender can add a seventh binary digit to each word in such a way that each seven-digit word always contains an even number of ones. If the receiver registers a word with an odd number of ones, then he will know that an error has occurred and can possibly ask the sender to repeat the message.

Example 1.2. If the receiver in Example 1.1 does not have the opportunity to ask for a repetition, the sender can proceed in a different way. Instead of adding the seventh digit he can send every six-digit word three times in a row. If the three words are not identical when they reach the receiver, he will know that an error has occurred and could try to correct it at each place by choosing the digit that occurs at the corresponding places in at least two of the received words. He can of course not be completely sure that the erroneous word has been corrected, but if the probability for more than one error to occur is low, then the chances are good.

One disadvantage of the method in Example 1.2 is that, compared with the original text, the message with the error-correcting mechanism takes three times as long to send. Hence it seems a worthwhile exercise to find more effective methods, and this is the purpose of the theory of error-correcting codes. This was started off by the work of Shannon, Golay and Hamming at the end of the 1940s and has since evolved rapidly using ever more sophisticated mathematical methods. Here the theory of finite fields plays a particularly important role.

For writing a text we must have an alphabet. This is a finite set $F$ of symbols called letters. As is common in coding theory, we assume that $F$ is a finite field. When $F = \mathbb{Z}_2$, as in the above examples, the code is said to be binary. A word is a finite sequence $x_1x_2 \ldots x_m$ of letters. We shall here only deal with so-called block codes. This means that the words are all of the same length $m$ and can therefore be seen as elements in the vector space $F^m$. When appropriate, we write the words as vectors $x = (x_1, \ldots, x_m)$ in $F^m$. A coding function $E$ is an injective map

\[ E : F^m \to F^n \]

from $F^m$ into a vector space $F^n$ of higher dimension, i.e. $m < n$. The image $C = E(F^m)$ is what we call a code. To improve the possibility of detecting and correcting errors, it is useful that the elements of the code $C$ lie far apart from each other in $F^n$. This is to minimize the probability that a sent code word is received erroneously as a different code word.

Definition 1.3. The Hamming distance $d(x, y)$ between two vectors $x = (x_1, \ldots, x_n)$ and $y = (y_1, \ldots, y_n)$ in $F^n$ is defined as the number of coordinates $i$ where $x_i \ne y_i$.

Example 1.4. In the space $\mathbb{Z}_2^5$ the Hamming distance satisfies $d(10111, 11001) = 3$ and in $\mathbb{Z}_3^4$ we have $d(1122, 1220) = 2$.
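Definition 1.3 amounts to counting the positions at which two words differ. A minimal sketch, assuming that words are given as Python strings or sequences of equal length and using the hypothetical name `hamming_distance`:

```python
def hamming_distance(x, y):
    """Number of coordinates in which the words x and y differ."""
    if len(x) != len(y):
        raise ValueError("words must have the same length")
    return sum(a != b for a, b in zip(x, y))


print(hamming_distance("10111", "11001"))  # 3
print(hamming_distance("1122", "1220"))    # 2
```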

Remark 1.5. If it is equally likely that an erroneously received letter is any other letter from the alphabet, then the Hamming distance is a natural measure for how big the error is. In some situations, other measures are more appropriate, but here we will only deal with the Hamming distance.

Definition 1.6. Let $C$ be a code in $F^n$. Then we define its separation $d(C)$ as the least distance between two different words in the code, i.e.

\[ d(C) = \min\{ d(x, y) \,;\ x, y \in C,\ x \ne y \}. \]

Theorem 1.7. Let $C$ be a code with separation $d(C)$.

(1) If $d(C) \ge k + 1$ then $C$ can detect up to $k$ errors in each word.
(2) If $d(C) \ge 2k + 1$ then $C$ can correct up to $k$ errors in each word.

Remark 1.8. The consequence of (2) is that if $d(C) \ge 2k + 1$ then, for each word containing at most $k$ errors, there exists a uniquely determined closest code word. We assume that the erroneous word is corrected by picking instead the closest word in the code. For practical purposes, it is of great importance to find effective algorithms correcting errors, and the existence of such algorithms can be a strong argument for the choice of a particular code. In the following we will focus on how to construct codes with high separation and not on error-correcting algorithms.

Proof of Theorem 1.7. (1) If $d(C) \ge k + 1$, then any two code words differ in at least $k + 1$ places. A received word with at least one and at most $k$ letters wrong cannot be a code word and is therefore detected as erroneous.

To prove (2) we assume that $x$ is a received word that differs from a code word $y$ in at most $k$ places. If $z$ were another code word with this property, then the triangle inequality would give $d(y, z) \le d(y, x) + d(x, z) \le 2k$. This contradicts the assumption that $d(C) \ge 2k + 1$. This means that we can correct $x$ to $y$. □
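The correction rule of Remark 1.8, picking the closest code word, can be tried out on a very small code. In the sketch below the code consists of the two-digit binary words sent three times in a row, in the spirit of Example 1.2; this particular code and the names `separation` and `correct` are assumptions chosen for illustration, and the exhaustive search is not an efficient decoding algorithm.

```python
from itertools import combinations


def distance(x, y):
    """Hamming distance between two words of equal length."""
    return sum(a != b for a, b in zip(x, y))


def separation(code):
    """Least distance between two different code words."""
    return min(distance(x, y) for x, y in combinations(code, 2))


def correct(word, code):
    """Replace a received word by a closest code word (exhaustive search)."""
    return min(code, key=lambda c: distance(word, c))


# Each two-digit word ab is sent as ababab (illustrative toy code).
code = ["000000", "010101", "101010", "111111"]
print(separation(code))           # 3, so one error per word can be corrected
print(correct("010111", code))    # 010101
```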

If we are interested in constructing a code $C = E(F^m)$ in $F^n$ with a given separation $\sigma = d(C)$, then there is a natural limit for which $m$ we can choose. We shall now give a theoretical estimate of the largest possible value of $m$.

Definition 1.9. For every non-negative integer $r$ we define the sphere $S(x, r)$, with centre $x \in F^n$ and radius $r$, by

\[ S(x, r) = \{ y \in F^n \,;\ d(x, y) \le r \}. \]

Lemma 1.10. If $F$ has $q$ elements then the sphere $S(x, r)$ contains exactly

\[ \binom{n}{0} + \binom{n}{1}(q - 1) + \binom{n}{2}(q - 1)^2 + \cdots + \binom{n}{r}(q - 1)^r \]

words.

Proof. The result follows from the fact that if $0 \le j \le r$, then there exist $\binom{n}{j}(q - 1)^j$ words which have exactly $j$ coordinates different from $x$. □

Theorem 1.11. Assume that $F$ has $q$ elements, that the code $C$ in $F^n$ contains $M$ words and has separation $2k + 1$. Then

\[ M\left[\binom{n}{0} + \binom{n}{1}(q - 1) + \cdots + \binom{n}{k}(q - 1)^k\right] \le q^n. \tag{6} \]

Proof. The spheres of radius $k$ with centres in different code words in $C$ cannot intersect, since $d(C) = 2k + 1$. Because the number of elements in $F^n$ is $q^n$, the result then follows from Lemma 1.10. □

Remark 1.12. If $C = E(F^m)$ then $M = q^m$.

Remark 1.13. The inequality (6) is called the sphere packing bound or the Hamming bound. In case of equality, the corresponding code $C$ is said to be perfect. For such a code, every word $y$ in $F^n$ lies in exactly one sphere $S(x, k)$ with $x$ in $C$.
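Lemma 1.10 and the bound (6) are straightforward to evaluate numerically. The following sketch assumes Python 3.8 or later for `math.comb`; the function names are hypothetical, and the binary repetition code $\{000, 111\}$ used in the example is an illustration that does not appear in the notes.

```python
from math import comb


def sphere_size(n, r, q):
    """Number of words in a sphere of radius r in F^n, where |F| = q (Lemma 1.10)."""
    return sum(comb(n, j) * (q - 1)**j for j in range(r + 1))


def hamming_bound_holds(M, n, k, q):
    """Check inequality (6) for a code with M words and separation 2k + 1."""
    return M * sphere_size(n, k, q) <= q**n


def is_perfect(M, n, k, q):
    """True when (6) holds with equality."""
    return M * sphere_size(n, k, q) == q**n


# The binary code {000, 111} has M = 2 words, n = 3 and separation 3 = 2*1 + 1.
print(sphere_size(3, 1, 2))            # 4
print(hamming_bound_holds(2, 3, 1, 2)) # True
print(is_perfect(2, 3, 1, 2))          # True, since 2 * 4 = 2^3
```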

Exercises

Exercise 1.1. In Examples 1.1 and 1.2 we defined two coding functions from $\mathbb{Z}_2^6$ to $\mathbb{Z}_2^7$ and $\mathbb{Z}_2^{18}$, respectively. Determine the separation for the corresponding codes. Compare the result with Theorem 1.11.

Exercise 1.2. Let $\sigma > 0$ be an odd integer and $C$ be a code in $\mathbb{Z}_2^n$ with $M$ words and separation $\sigma$. Show that there exists a code $\tilde{C}$ in $\mathbb{Z}_2^{n+1}$ with $M$ words and separation $\sigma + 1$. (Hint: Compare with Example 1.1.)

Exercise 1.3. Construct a code in $\mathbb{Z}_2^8$ with 4 words and separation 5.

Exercise 1.4. Show that there does not exist a code in $\mathbb{Z}_2^{12}$ with $2^7$ words and separation 5.

2. Linear Codes and Generating Matrices

Definition 2.1. A code $C$ in $F^n$ is said to be linear if it is a linear subspace of $F^n$. If the dimension of $C$ is $m$ then it is called an $[n, m]$ code.

Remark 2.2. That $C$ is a linear subspace of $F^n$ means that every linear combination of vectors in $C$ is also contained in $C$. Then $C$ is itself a vector space with the same operations as $F^n$, so the dimension of $C$ is well-defined.

In practice, most error-correcting codes are linear or can be obtained from linear ones. A great advantage of linear codes is that it is much easier to determine their separation than in the general case.

Remark 2.3. By the weight $w(x)$ of a code word $x = (x_1, \ldots, x_n)$ in $F^n$ we mean the number of coordinates in $x$ that are different from zero. The weight $w(C)$ of a linear code $C$ in $F^n$ is defined by

\[ w(C) = \min\{ w(x) \,;\ x \in C,\ x \ne 0 \}. \]

Theorem 2.4. For a linear code $C$ the separation $d(C)$ is equal to its weight $w(C)$.

Proof. A linear code that contains the two words $x$ and $y$ also contains their difference $x - y$. The result follows from the fact that the Hamming distance $d(x, y)$ is equal to the weight $w(x - y)$. □

Remark 2.5. If we are interested in determining the separation for a general code containing $M$ words, then we must, in principle, determine $M(M - 1)/2$ different Hamming distances, one for each pair in the code. For a linear code, it is enough to calculate the weight of the $M - 1$ non-zero code words.
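Theorem 2.4 and Remark 2.5 can be illustrated by computing both quantities for a small linear code. In the sketch below the code is spanned by the rows of a matrix `G`; this binary matrix is a hypothetical example, its rows are assumed to be linearly independent, and the function names are chosen for illustration.

```python
from itertools import combinations, product


def codewords(G, p):
    """All linear combinations over Z_p of the rows of G (assumed independent)."""
    m, n = len(G), len(G[0])
    return [
        tuple(sum(x[i] * G[i][j] for i in range(m)) % p for j in range(n))
        for x in product(range(p), repeat=m)
    ]


def weight(x):
    """Number of non-zero coordinates of x."""
    return sum(c != 0 for c in x)


def distance(x, y):
    """Hamming distance between x and y."""
    return sum(a != b for a, b in zip(x, y))


G = [[1, 0, 1, 1, 0],
     [0, 1, 0, 1, 1]]          # hypothetical binary [5, 2] code
C = codewords(G, 2)

w = min(weight(x) for x in C if any(x))                  # M - 1 weights
d = min(distance(x, y) for x, y in combinations(C, 2))   # M(M - 1)/2 distances
print(w, d)  # 3 3
```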

Definition 2.6. A generator matrix for a linear $[n, m]$ code $C$ in $F^n$ is an $m \times n$ matrix $G$, with elements in $F$, such that its rows form a basis for $C$.

Example 2.7. Consider the following $3 \times 7$ matrix with elements in $F = \mathbb{Z}_3$:

\[
G = \begin{pmatrix}
1 & 1 & 1 & 1 & 1 & 1 & 1 \\
1 & 1 & 2 & 2 & 1 & 1 & 2 \\
2 & 1 & 2 & 1 & 2 & 1 & 2
\end{pmatrix}.
\]

By subtracting the first row from the second and adding the first to the third, we obtain the matrix

\[
\begin{pmatrix}
1 & 1 & 1 & 1 & 1 & 1 & 1 \\
0 & 0 & 1 & 1 & 0 & 0 & 1 \\
0 & 2 & 0 & 2 & 0 & 2 & 0
\end{pmatrix}.
\]

Multiplying the third row by 2 gives

\[
\begin{pmatrix}
1 & 1 & 1 & 1 & 1 & 1 & 1 \\
0 & 0 & 1 & 1 & 0 & 0 & 1 \\
0 & 1 & 0 & 1 & 0 & 1 & 0
\end{pmatrix}.
\]

Finally, subtracting both the second and the third row from the first yields

\[
\tilde{G} = \begin{pmatrix}
1 & 0 & 0 & 2 & 1 & 0 & 0 \\
0 & 0 & 1 & 1 & 0 & 0 & 1 \\
0 & 1 & 0 & 1 & 0 & 1 & 0
\end{pmatrix}.
\]

The rows of $\tilde{G}$ generate the same subspace of $F^7$ as the rows of $G$, because we can write the rows in one matrix as linear combinations of the rows of the other. The two matrices $G$ and $\tilde{G}$ are therefore generator matrices for the same code $C$ in $F^7$. We now observe that the first three columns of $\tilde{G}$ are columns in the identity matrix of order 3. If we interchange the second and the third columns of $\tilde{G}$ we get

\[
\begin{pmatrix}
1 & 0 & 0 & 2 & 1 & 0 & 0 \\
0 & 1 & 0 & 1 & 0 & 0 & 1 \\
0 & 0 & 1 & 1 & 0 & 1 & 0
\end{pmatrix}.
\]

This matrix generates a code $C'$ in $F^7$ that is obtained from $C$ by interchanging the letters in positions 2 and 3 for all words in $C$.
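The row operations of Example 2.7 can be carried out systematically by Gauss-Jordan elimination over $\mathbb{Z}_p$. The following sketch is one possible arrangement of the steps and not the procedure of the notes: it prefers row swaps and only swaps columns when a column contains no usable pivot, so for the matrix $G$ above it reaches a normal form without permuting columns and the result differs from the last matrix of the example by the order of two rows and two columns. The name `normal_form` is hypothetical, and `pow(x, -1, p)` requires Python 3.8 or later.

```python
def normal_form(G, p):
    """Transform G over Z_p (p prime) into [I_m | A].

    Returns the new matrix and the column permutation that was applied;
    perm[j] is the original position of the letter now in position j.
    """
    G = [[x % p for x in row] for row in G]
    m, n = len(G), len(G[0])
    perm = list(range(n))
    for i in range(m):
        # Search columns i, i+1, ... for a non-zero entry in rows i..m-1.
        pivot = next(((r, c) for c in range(i, n) for r in range(i, m) if G[r][c]), None)
        if pivot is None:
            raise ValueError("the rows of G are linearly dependent")
        r, c = pivot
        G[i], G[r] = G[r], G[i]                  # row swap
        for row in G:                            # column swap (if c != i)
            row[i], row[c] = row[c], row[i]
        perm[i], perm[c] = perm[c], perm[i]
        inv = pow(G[i][i], -1, p)                # scale the pivot to 1
        G[i] = [(inv * x) % p for x in G[i]]
        for k in range(m):                       # clear the rest of column i
            if k != i and G[k][i]:
                factor = G[k][i]
                G[k] = [(a - factor * b) % p for a, b in zip(G[k], G[i])]
    return G, perm


G = [[1, 1, 1, 1, 1, 1, 1],
     [1, 1, 2, 2, 1, 1, 2],
     [2, 1, 2, 1, 2, 1, 2]]
N, perm = normal_form(G, 3)
for row in N:
    print(row)
print(perm)
```

For this input the identity permutation is returned, so the printed matrix is a generator matrix of normal form for the code $C$ itself.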

Definition 2.8. Two codes C and C ′ in F n are said to be equivalentif there exists a permutation π of the numbers 1, . . . , n such that

C ′ = {xπ(1)xπ(2) . . . xπ(n) ; x1x2 . . . xn ∈ C} .

Remark 2.9. If two codes C and C ′ are equivalent then theirseparations are equal i.e. d(C) = d(C ′).

The ideas presented in Example 2.7 can be applied to prove the following theorem.

Theorem 2.10. Every linear [n, m] code C is equivalent to a code with a generator matrix of the form

\[ [I_m \mid A] \]

where $I_m$ is the identity matrix of order m and A is an m × (n − m) matrix.

Definition 2.11. When a generator matrix for a linear code takes the form given in Theorem 2.10 we say that it is of normal form.

Let $G = [I_m \mid A]$ be a generator matrix of normal form for a linear [n, m] code C in $F^n$. If the elements in $F^m$ and $F^n$ are seen as row matrices, then the map

\[ F^m \ni x \mapsto xG \in F^n \]

gives a natural linear coding function. The first m letters in the word xG are given by x in $F^m$ and the last n − m letters (control digits) by xA.
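As an illustration of the coding function x ↦ xG, here is a small Python sketch that uses the normal-form matrix obtained at the end of Example 2.7. The function name `encode` and the sample message (1, 2, 0) are assumptions made for this illustration only.

```python
P = 3  # the field Z_3 of Example 2.7

# Normal-form generator matrix [I_3 | A] from the end of Example 2.7.
G_NORMAL = [
    [1, 0, 0, 2, 1, 0, 0],
    [0, 1, 0, 1, 0, 0, 1],
    [0, 0, 1, 1, 0, 1, 0],
]

def encode(x, G, p):
    """Return the row vector xG with entries reduced modulo p."""
    n = len(G[0])
    return [sum(x[i] * G[i][j] for i in range(len(G))) % p for j in range(n)]

message = [1, 2, 0]
word = encode(message, G_NORMAL, P)
information_part = word[:3]   # equals the message itself
control_digits = word[3:]     # equals xA
```

Because the generator matrix is of normal form, the message can be read off directly from the first three letters of the code word.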

Exercises

Exercise 2.1. Construct generator matrices for the codes in Examples 1.1 and 1.2.

Exercise 2.2. Let C be a binary linear code with the generator matrix

\[ \begin{bmatrix} 1 & 0 & 0 & 1 & 1 & 0 & 1 \\ 0 & 1 & 0 & 1 & 0 & 1 & 1 \\ 0 & 0 & 1 & 0 & 1 & 1 & 1 \end{bmatrix}. \]

List all the code words in C and determine the separation for C.

Exercise 2.3. The matrix

\[ \begin{bmatrix} 1 & 0 & 1 & 1 \\ 0 & 1 & 1 & 2 \end{bmatrix} \]

is a generator matrix for a linear code C in $\mathbb{Z}_3^4$. Determine all the code words in C and the separation d(C). Then show that the code C is perfect.

Exercise 2.4. Let C be a binary linear code with generator matrix

\[ \begin{bmatrix} 1 & 1 & 1 & 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 1 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 & 0 & 1 & 1 \\ 0 & 1 & 0 & 1 & 0 & 1 & 0 \end{bmatrix}. \]

Find a generator matrix for C of normal form.

Exercise 2.5. Prove Theorem 2.10 by showing that every m × n matrix G, with elements in a field F and linearly independent rows, can be transformed into a matrix of the form $[I_m \mid A]$ by repeated use of the following operations:

(1) multiplication of a row with an element in F
(2) addition of a row to another one
(3) swapping two columns.

(Hint: Use induction over the number of rows in G)

3. Control Matrices and Decoding

Definition 3.1. The scalar product $\langle x, y \rangle$ of two vectors $x = (x_1, \dots, x_n)$ and $y = (y_1, \dots, y_n)$ in $F^n$ is defined by

\[ \langle x, y \rangle = x_1 y_1 + \dots + x_n y_n. \]

Definition 3.2. The dual code $C^\perp$ of a linear code C in $F^n$ is the linear code

\[ C^\perp = \{ y \in F^n \; ; \; \langle x, y \rangle = 0 \text{ for all } x \in C \}. \]


Remark 3.3. As for subspaces in $\mathbb{R}^n$, it is easy to show that if the code C in $F^n$ has dimension m, then the dual code $C^\perp$ is of dimension n − m. For vector spaces $F^n$ over a finite field F, it is not true in general that every vector in $F^n$ can, in a unique way, be written as the sum of a vector in C and a vector in $C^\perp$. It can even happen that $C^\perp = C$. In that case the code is said to be self-dual.

Example 3.4. For the matrix

\[ G = \begin{bmatrix} 1 & 0 & 1 & 1 \\ 0 & 1 & 1 & 2 \end{bmatrix} \]

the scalar product of the first row with itself is 3, the scalar product of the second row with itself is 6, and the scalar product of the two rows is 3. This means that each scalar product is 0 modulo 3. From this we see that the [4, 2] code over $\mathbb{Z}_3$ with generator matrix G is self-dual.
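The check in Example 3.4 can be sketched in a few lines of Python. The names `dot` and `gram` are illustrative. Orthogonality of the rows only gives $C \subseteq C^\perp$; equality then follows because both codes have dimension 2 by Remark 3.3, which the last line records under the assumption that the rows of G are linearly independent.

```python
P = 3  # the field Z_3

G = [
    [1, 0, 1, 1],
    [0, 1, 1, 2],
]

def dot(x, y, p):
    """Scalar product <x, y> reduced modulo p."""
    return sum(a * b for a, b in zip(x, y)) % p

# All pairwise scalar products of the rows of G, reduced modulo 3.
gram = [[dot(r, s, P) for s in G] for r in G]
rows_orthogonal = all(v == 0 for row in gram for v in row)

# dim C = m and dim C-perp = n - m must agree for a self-dual code.
m, n = len(G), len(G[0])
dimensions_match = (m == n - m)
```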

Definition 3.5. A generator matrix for the dual code $C^\perp$ of C is called a control matrix for C.

A word $x \in F^n$ is contained in the code C if and only if the scalar product of x and any row of a control matrix for C is zero. In this way we can easily check if a word belongs to the code or not.

If G is a generator matrix for an [n, m] code C and H is a control matrix for C, then G is an m × n matrix and H is an (n − m) × n matrix of rank n − m. The condition that H is a control matrix for C can be written as

\[ G \cdot H^t = 0, \tag{7} \]

where $H^t$ denotes the transpose of the matrix H. Equation (7) says precisely that the scalar product of every row of G with every row of H is zero.

Let us now assume that the generator matrix G is of normal form $[I_m \mid A]$, where A is an m × (n − m) matrix. If we then choose

\[ H = [-A^t \mid I_{n-m}], \]

then it is easily verified that condition (7) is satisfied. We now formulate this as the following theorem.

Theorem 3.6. If a linear [n, m] code C has the generator matrix $[I_m \mid A]$, then $[-A^t \mid I_{n-m}]$ is a control matrix for C.

Remark 3.7. If the field F is $\mathbb{Z}_2$, then $-A^t = A^t$ so we can take $[A^t \mid I_{n-m}]$ as a control matrix.
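The construction in Theorem 3.6 is easy to carry out by machine. The sketch below assumes the field is $\mathbb{Z}_p$ for a prime p and reuses the normal-form matrix from the end of Example 2.7; the function names `control_matrix` and `times_transpose` are hypothetical.

```python
def control_matrix(G, p):
    """Given G = [I_m | A] over Z_p, return H = [-A^t | I_(n-m)]."""
    m, n = len(G), len(G[0])
    A = [row[m:] for row in G]
    H = []
    for j in range(n - m):
        minus_at_row = [(-A[i][j]) % p for i in range(m)]        # row j of -A^t
        identity_row = [1 if k == j else 0 for k in range(n - m)]
        H.append(minus_at_row + identity_row)
    return H

def times_transpose(G, H, p):
    """Return the matrix G * H^t with entries reduced modulo p."""
    return [[sum(a * b for a, b in zip(g, h)) % p for h in H] for g in G]

# Normal-form generator matrix over Z_3 from Example 2.7.
G = [
    [1, 0, 0, 2, 1, 0, 0],
    [0, 1, 0, 1, 0, 0, 1],
    [0, 0, 1, 1, 0, 1, 0],
]
H = control_matrix(G, 3)
product = times_transpose(G, H, 3)   # condition (7): should be the zero matrix
```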

Example 3.8. The binary [5, 2] code which has the generator matrix

\[ G = \begin{bmatrix} 1 & 0 & 1 & 0 & 1 \\ 0 & 1 & 0 & 1 & 1 \end{bmatrix} \]

has as control matrix

\[ H = \begin{bmatrix} 1 & 0 & 1 & 0 & 0 \\ 0 & 1 & 0 & 1 & 0 \\ 1 & 1 & 0 & 0 & 1 \end{bmatrix}. \]

We shall now describe how a receiver can apply a control matrix H of a linear code C to correct errors that possibly have occurred during the transfer of information when using the code C. We start by checking if the received word $x \in F^n$ satisfies the condition $xH^t = 0$. If that is the case then x is orthogonal to all the rows of H and hence a code word. We then assume that no error has occurred and that x is the code word sent. On the other hand, if $xH^t \neq 0$ then an error has occurred. In order to correct it, we consider the set of all words y in $F^n$ such that $yH^t = xH^t$. We call this set the coset corresponding to the syndrome $xH^t$. In the coset corresponding to $xH^t$ we choose the word y with least weight, i.e. the least Hamming distance from the origin. The fact that $yH^t = xH^t$ means that the difference x − y is a code word, and there does not exist any other code word closer to x since y is of minimal weight. For this reason it is reasonable to correct x to x − y. The word y is called a coset leader corresponding to the syndrome $xH^t$.

Example 3.9. For the code in Example 3.8 we have the following table of coset leaders of the listed syndromes

coset leader   00000  10000  01000  00100  00010  00001  11000  10010
syndrome       000    101    011    100    010    001    110    111

The syndrome 000 corresponds to the coset of code words. The five following syndromes correspond to cosets consisting of words different from a code word at only one place. For those the coset leaders are uniquely determined since different words of weight one have different syndromes. This is a consequence of the fact that the columns of the control matrix H are all different. The syndrome of a word that has 1 at place j and 0 elsewhere is the j-th row in $H^t$. The two last coset leaders are not uniquely determined by their syndromes. For example, 01100 also gives the syndrome 111. Here the receiver can act in several ways. One possibility is that he decides to pick one of the coset leaders and uses that one for error-correcting. Other alternatives are that he asks the sender to repeat the message or simply ignores the word.

Let us now apply the above table to the three received words 11111, 01110 and 01101. The first word has the syndrome 001. The corresponding coset leader is 00001 and the corrected word becomes 11110. For 01110 the syndrome is 101 with coset leader 10000. Even in this case the corrected word is 01110 − 10000 = 11110. For the word 01101 the syndrome is 110 so at least two letters must be wrong. If the receiver picks the coset leader in the list above, then the corrected word becomes 10101.
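The decoding procedure of Example 3.9 can be sketched as follows. The table of coset leaders is built by scanning all words of $\mathbb{Z}_2^5$ in order of increasing weight. Where a coset has several words of minimal weight, the tie-breaking rule used here (prefer words whose ones come earliest) is an arbitrary choice made only so that the sketch reproduces the leaders 11000 and 10010 listed in the table; the names `syndrome`, `leaders` and `correct` are illustrative.

```python
from itertools import product

# Control matrix H of the binary [5, 2] code in Example 3.8.
H = [
    [1, 0, 1, 0, 0],
    [0, 1, 0, 1, 0],
    [1, 1, 0, 0, 1],
]

def syndrome(x):
    """Return x * H^t over Z_2 as a tuple."""
    return tuple(sum(a * b for a, b in zip(x, row)) % 2 for row in H)

# Visit words by increasing weight; ties are broken in favour of words
# whose ones come first (an assumption chosen to match the table).
ordered = sorted(product((0, 1), repeat=5),
                 key=lambda w: (sum(w), tuple(-b for b in w)))

leaders = {}
for w in ordered:
    leaders.setdefault(syndrome(w), w)   # first word seen is the coset leader

def correct(x):
    """Subtract the coset leader of the syndrome of x from x."""
    y = leaders[syndrome(x)]
    return tuple((a - b) % 2 for a, b in zip(x, y))

results = [correct(w) for w in [(1, 1, 1, 1, 1), (0, 1, 1, 1, 0), (0, 1, 1, 0, 1)]]
```

A receiver who prefers to reject ambiguous words could instead record, for each syndrome, how many words of minimal weight the coset contains and refuse to correct when there is more than one.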

We conclude this section with a theorem telling us how we can determine the separation of a code from its control matrix.

Theorem 3.10. A linear code C with the control matrix H has separation σ if and only if there exist σ columns in H that are linearly dependent and furthermore any σ − 1 of the columns in H are linearly independent.

Proof. That σ columns in H are linearly dependent means that there exists a non-zero word x of weight at most σ such that $xH^t = 0$. The weight of such a word can never be less than σ, since σ − 1 columns in H are always linearly independent. Hence w(C) = σ and the result follows from Theorem 2.4 of the last section. □

Exercises

Exercise 3.1. Construct a control matrix for the code in Example 1.1.

Exercise 3.2. Show that for a linear [n, m] code C the dual code $C^\perp$ has dimension n − m. (Hint: Use Theorem 2.10)

Exercise 3.3. The matrices

\[ \begin{bmatrix} 1 & 0 & 0 & 1 & 1 \\ 1 & 1 & 1 & 0 & 1 \\ 0 & 0 & 1 & 1 & 1 \end{bmatrix} \quad \text{and} \quad \begin{bmatrix} 1 & 2 & 4 & 0 & 3 \\ 0 & 2 & 1 & 4 & 1 \\ 2 & 0 & 3 & 1 & 4 \end{bmatrix} \]

are generator matrices for two linear codes $C_1$ and $C_2$ in $\mathbb{Z}_2^5$ and $\mathbb{Z}_5^5$, respectively. Construct control matrices for $C_1$ and $C_2$. What are the separations for $C_1$ and $C_2$?

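Exercise 3.4. Consider the linear code in $\mathbb{Z}_2^6$ with the generator matrix

\[ \begin{bmatrix} 1 & 0 & 0 & 1 & 1 & 1 \\ 0 & 1 & 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 0 & 1 & 1 \end{bmatrix}. \]

(1) Which of the following words are code words?

111001 , 010100 , 101100 , 110111 , 100001.

(2) Which of the words can be corrected? Correct those!

Exercise 3.5. Let C be a binary code with generator matrix

\[ \begin{bmatrix} 1 & 0 & 0 & 0 & 1 & 0 & 1 \\ 0 & 1 & 0 & 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 1 & 0 & 1 & 1 \end{bmatrix}. \]

Correct the following words in C if possible

1101011 , 0110111 , 0111000 .

Exercise 3.6. Determine the separation for the linear code in $\mathbb{Z}_3^8$ with control matrix

\[ \begin{bmatrix} 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\ 0 & 1 & 0 & 0 & 1 & 2 & 1 & 2 \\ 0 & 0 & 1 & 0 & 1 & 1 & 2 & 0 \\ 0 & 0 & 0 & 1 & 1 & 1 & 0 & 1 \end{bmatrix}. \]

Exercise 3.7. Let C be the code in $\mathbb{Z}_5^6$ with the generator matrix

\[ \begin{bmatrix} 1 & 0 & 0 & 1 & 1 & 1 \\ 0 & 1 & 0 & 1 & 2 & 3 \\ 0 & 0 & 1 & 1 & 3 & 4 \end{bmatrix}. \]

Show that d(C) = 4.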

4. Some Special Codes

Example 4.1. The matrix

\[ H = \begin{bmatrix} 1 & 1 & 1 & 0 & 1 & 0 & 0 \\ 1 & 1 & 0 & 1 & 0 & 1 & 0 \\ 1 & 0 & 1 & 1 & 0 & 0 & 1 \end{bmatrix} \]

is a control matrix for a binary [7, 4] code consisting of all the words x in $\mathbb{Z}_2^7$ such that $xH^t = 0$. The seven columns in H are all different and together they are all the non-zero elements in $\mathbb{Z}_2^3$. Therefore every non-zero syndrome in $\mathbb{Z}_2^3$ has a unique coset leader of weight 1. For example y = 0001000 has the syndrome $yH^t = 011$ corresponding to the fourth column in H. Every word x in $\mathbb{Z}_2^7$ which is not a code word can be corrected to a code word by only changing one digit in x. Which digit is to be changed is determined by which column in H corresponds to the syndrome $xH^t$.

Codes with the properties explained in the last example carry a special name.

Definition 4.2. A linear [n, m] code over $\mathbb{Z}_2$, with a control matrix such that its columns are all different and constitute all non-zero columns in $\mathbb{Z}_2^{n-m}$, is called a binary Hamming code.

Remark 4.3. Hamming codes can only occur for special values of the parameters m and n. If r = n − m then the number of non-zero vectors in $\mathbb{Z}_2^{n-m}$ is $2^r - 1$. This means that for a binary Hamming code we have $n = 2^r - 1$ and $m = n - r = 2^r - 1 - r$ for some positive integer r. In Example 4.1 we have r = 3.

Remark 4.4. In the same way as in Example 4.1, we see that for an arbitrary binary Hamming code, it follows that every word in $\mathbb{Z}_2^n$ is either a code word or has Hamming distance 1 from a uniquely determined code word. This implies that the spheres, of radius 1 and centre in a code word, cover $\mathbb{Z}_2^n$ and that two such spheres never intersect. This means that every binary Hamming code is perfect.
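The statement of Remark 4.4 can be cross-checked by counting. Each sphere of radius 1 contains its centre and the n words at distance 1 from it, and by Remark 4.3 there are $2^m$ code words with $n = 2^r - 1$ and $m = 2^r - 1 - r$. The disjoint spheres therefore contain exactly as many words as $\mathbb{Z}_2^n$:

```latex
\[
  2^{m}\,(1 + n) \;=\; 2^{\,2^r - 1 - r} \cdot 2^{r} \;=\; 2^{\,2^r - 1} \;=\; 2^{n}.
\]
% For r = 3 (Example 4.1): 2^4 \cdot 8 = 128 = 2^7.
```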

Example 4.5. Let C be the [10, 8] code over the field $\mathbb{Z}_{11}$ defined by $xH^t = 0$, where

\[ H = \begin{bmatrix} 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\ 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 \end{bmatrix}. \]

Observe that the control matrix H is not of the normal form $[-A^t \mid I_2]$. If we so wish, it is easy to transform it to normal form but for our purposes it is more useful as it is. Note that the calculations are taking place in $\mathbb{Z}_{11}$, so $x \in \mathbb{Z}_{11}^{10}$ is a code word if and only if

\[ \begin{cases} x_1 + x_2 + \dots + x_{10} = 0 \pmod{11} \\ x_1 + 2x_2 + \dots + 10x_{10} = 0 \pmod{11}. \end{cases} \]

Assume that when transferring a code word $z = (z_1, \dots, z_{10})$ exactly one error e has occurred at place k so that the received word is

\[ x = (z_1, \dots, z_k + e, \dots, z_{10}). \]

Then the syndrome $xH^t$ is equal to (e, ke). From this we can directly determine the error e and also at which place it occurred, by dividing the second component by the first. If for example x = 0610271355, then $xH^t = (8, 6)$. Since $8^{-1} = 7$ in $\mathbb{Z}_{11}$ we get $k = 6 \cdot 8^{-1} \equiv 42 \equiv 9 \pmod{11}$. If only one error has occurred in x, then this has happened at place 9 and the corresponding digit should be changed to 5 − 8 ≡ 8.
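The single-error correction just described can be sketched in Python. The function name `correct_single_error` is hypothetical, the inverse in $\mathbb{Z}_{11}$ is computed with Fermat's little theorem ($a^{-1} = a^{9}$), and the sketch deliberately rejects syndromes that are not of the form (e, ke) with e ≠ 0 rather than trying to interpret them.

```python
P = 11  # arithmetic in Z_11

def correct_single_error(digits):
    """Correct at most one error in a word of length 10 over Z_11.

    Uses the syndrome (e, k*e) of the control matrix in Example 4.5.
    """
    s1 = sum(digits) % P                                   # e
    s2 = sum((i + 1) * d for i, d in enumerate(digits)) % P  # k * e
    if s1 == 0 and s2 == 0:
        return list(digits)            # already a code word
    if s1 == 0 or s2 == 0:
        raise ValueError("syndrome is not of the form (e, ke) with e != 0")
    k = s2 * pow(s1, P - 2, P) % P     # place of the error, between 1 and 10
    fixed = list(digits)
    fixed[k - 1] = (fixed[k - 1] - s1) % P
    return fixed

received = [0, 6, 1, 0, 2, 7, 1, 3, 5, 5]
corrected = correct_single_error(received)
```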

If we do not want to use the "digit 10" in the code words we can simply remove all the words containing 10 from the code C. Employing the principle of inclusion-exclusion, one can see that we then still have 82,644,629 words left in the code. This means that we could issue that many ten-digit telephone numbers and guarantee that the correct person would be reached even if one digit had been pressed wrongly.

To prepare the next example we describe how two given codes can be used to produce a new one.

Theorem 4.6. Let F be a finite field and $C_1$, $C_2$ be two linear codes in $F^n$ of dimension $m_1$ and $m_2$, respectively. Then

\[ C = \{ (x, x + y) \in F^{2n} \; ; \; x \in C_1 \text{ and } y \in C_2 \} \]

is a linear $[2n, m_1 + m_2]$ code. If $\sigma_1$ is the separation of $C_1$ and $\sigma_2$ the separation of $C_2$, then the separation of C is

\[ \sigma = \min(2\sigma_1, \sigma_2). \]

Proof. We leave it to the reader to prove that C is a linear code of dimension $m_1 + m_2$. To determine the separation of C we must estimate the least possible weight of the non-zero words in C. If y = 0, then $w(x, x) = 2w(x) \geq 2\sigma_1$ and equality is obtained for some x ≠ 0 in $C_1$. If y ≠ 0, then $w(x, x + y) = w(x) + w(x + y) \geq w(y) \geq \sigma_2$ and equality holds for x = 0 and some $y \in C_2$. Hence the separation of C equals $\min(2\sigma_1, \sigma_2)$. □

Example 4.7. By repeatedly using Theorem 4.6, we shall construct a code which has, amongst other things, been used by Mariner 9 to send pictures of the planet Mars back to Earth.

Let $C_1$ be the binary [4, 3] code consisting of all the words $x = (x_1, x_2, x_3, x_4)$ in $\mathbb{Z}_2^4$ such that

\[ x_1 + x_2 + x_3 + x_4 = 0 \pmod{2}. \]

The code $C_1$ consists of those words that contain an even number of the digit 1. A non-zero word must therefore contain at least two ones, so the separation of $C_1$ is 2. As $C_2$ we take the code consisting of the two words 0000 and 1111. The code $C_2$ has dimension 1 and separation 4. If we now apply the construction of Theorem 4.6 to $C_1$ and $C_2$, then we obtain an [8, 3 + 1] code with separation 4. Call this code $C_1'$ and now choose $C_2'$ to be the code in $\mathbb{Z}_2^8$ containing the two elements whose digits are either all 0 or all 1. If we then apply Theorem 4.6 to $C_1'$ and $C_2'$ we get a [16, 5] code with separation 8. Call this $C_1''$ and take $C_2''$ to be the code in $\mathbb{Z}_2^{16}$ consisting of the two words with all digits equal. If we then yet again employ Theorem 4.6 we obtain a [32, 6] code with separation 16. This is the code that was used by Mariner 9. Since the separation is 16, Theorem 1.7 tells us that up to 15 errors are detected and that up to 7 errors can be corrected in each word consisting of 32 letters. For this 32 − 6 = 26 control digits are needed. The Mariner code belongs to a general class called Reed-Muller codes.
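The three applications of Theorem 4.6 in Example 4.7 can be replayed by brute force, since the final code has only 64 words. In the sketch below codes are stored as sets of tuples; the helper names `combine`, `repetition` and `separation` are illustrative, and the separation is computed as the least weight of a non-zero word, which is legitimate for linear codes by Theorem 2.4.

```python
from itertools import product

def combine(C1, C2):
    """The construction {(x, x + y) ; x in C1, y in C2} over Z_2."""
    return {x + tuple((a + b) % 2 for a, b in zip(x, y))
            for x in C1 for y in C2}

def repetition(n):
    """The binary code of length n consisting of 00...0 and 11...1."""
    return {(0,) * n, (1,) * n}

def separation(C):
    """Least weight of a non-zero word (valid for linear codes)."""
    return min(sum(w) for w in C if any(w))

# C1: all words of length 4 with an even number of ones.
C = {x for x in product((0, 1), repeat=4) if sum(x) % 2 == 0}

history = [(4, len(C), separation(C))]
for n in (4, 8, 16):
    C = combine(C, repetition(n))
    history.append((2 * n, len(C), separation(C)))
# Each entry of history is (word length, number of words, separation).
```

The number of words doubles at each step, which corresponds to the dimension growing from 3 to 6.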

The last example of this section is a classical code constructed by M. J. E. Golay in 1949.

Example 4.8. Let C be the [12, 6] code over $\mathbb{Z}_3$ with generator matrix

\[ G = [I_6 \mid A] = \left[ \begin{array}{cccccc|cccccc} 1 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 1 & 1 & 1 & 1 \\ 0 & 1 & 0 & 0 & 0 & 0 & 1 & 0 & 1 & 2 & 2 & 1 \\ 0 & 0 & 1 & 0 & 0 & 0 & 1 & 1 & 0 & 1 & 2 & 2 \\ 0 & 0 & 0 & 1 & 0 & 0 & 1 & 2 & 1 & 0 & 1 & 2 \\ 0 & 0 & 0 & 0 & 1 & 0 & 1 & 2 & 2 & 1 & 0 & 1 \\ 0 & 0 & 0 & 0 & 0 & 1 & 1 & 1 & 2 & 2 & 1 & 0 \end{array} \right]. \]

The five last digits in the five last rows are obtained by a cyclic permutation of the vector 01221. It is easily checked that the scalar products of the rows of G are zero (note that 2 = −1 in $\mathbb{Z}_3$). The code C is therefore self-dual. In particular, we have $\langle x, x \rangle = 0$ for every word x in C.

Since the letters in x are 0 or ±1, this implies that the weight w(x) must be divisible by 3. We will show that there does not exist a word in C of weight 3. Such a word must be of the type (3 | 0), (2 | 1), (1 | 2) or (0 | 3), where the digits to the left and to the right of | tell us how many of the first six and last six digits in the word are different from 0, respectively. Since the code is self-dual, the scalar product of any code word and any row of the generator matrix G must be zero. This is impossible for the words of the type (3 | 0) and (2 | 1). On the other hand, every code word is a linear combination of the rows of G. This is impossible for the types (1 | 2) and (0 | 3). This means that the lowest weight of a non-zero word in C is 6, which therefore is the separation of the code. If we now remove the first column of A in the generator matrix we obtain an [11, 6] code, called the Golay code over $\mathbb{Z}_3$ and denoted by $G_{11}$. By removing a letter from a word its weight is reduced by at most 1, so $G_{11}$ has the separation 5 and can therefore correct up to 2 errors.
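Since the code in Example 4.8 has only $3^6 = 729$ words, the weights claimed above can also be confirmed by exhaustive enumeration. The following Python sketch does this for C and for the shortened matrix that defines $G_{11}$; the helper names `codewords`, `weight` and `min_weight` are illustrative.

```python
from itertools import product

A = [
    [0, 1, 1, 1, 1, 1],
    [1, 0, 1, 2, 2, 1],
    [1, 1, 0, 1, 2, 2],
    [1, 2, 1, 0, 1, 2],
    [1, 2, 2, 1, 0, 1],
    [1, 1, 2, 2, 1, 0],
]
# G = [I_6 | A]
G = [[1 if i == j else 0 for j in range(6)] + A[i] for i in range(6)]

def codewords(G, p=3):
    """Yield every linear combination of the rows of G over Z_p."""
    n = len(G[0])
    for coeffs in product(range(p), repeat=len(G)):
        yield tuple(sum(c * row[j] for c, row in zip(coeffs, G)) % p
                    for j in range(n))

def weight(w):
    """Number of non-zero letters in the word w."""
    return sum(1 for d in w if d)

def min_weight(G):
    """Least weight of a non-zero code word."""
    return min(weight(w) for w in codewords(G) if any(w))

# Every weight in C is divisible by 3, as the self-duality argument predicts.
weights_divisible_by_3 = all(weight(w) % 3 == 0 for w in codewords(G))

# Remove the first column of A (column index 6 of G) to obtain G_11.
G11 = [row[:6] + row[7:] for row in G]
separation_C = min_weight(G)
separation_G11 = min_weight(G11)
```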


In fact $G_{11}$ is a perfect code. In order to check this one has to show that equality holds in (6) of Theorem 1.11. For $G_{11}$ we have $M = 3^6$, n = 11, k = 2 and q = 3, so we must verify

\[ 3^6 \cdot \left[ \binom{11}{0} + \binom{11}{1} \cdot 2 + \binom{11}{2} \cdot 2^2 \right] = 3^{11}. \]

This is left to the reader.
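For readers who wish to compare their own computation, the arithmetic can be laid out as follows; only the values of the binomial coefficients are used.

```latex
\[
  \binom{11}{0} + \binom{11}{1}\cdot 2 + \binom{11}{2}\cdot 2^2
  \;=\; 1 + 11\cdot 2 + 55\cdot 4
  \;=\; 1 + 22 + 220
  \;=\; 243 \;=\; 3^5,
\]
\[
  3^6 \cdot 3^5 \;=\; 3^{11}.
\]
```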

Remark 4.9. In 1949 Golay also constructed a perfect binary [23, 12] code with separation 7 denoted by $G_{23}$. One can show that Golay's codes are the only perfect codes over a finite field containing more than two words and correcting more than one error. To be more precise, every such code must be equivalent to either $G_{11}$ or $G_{23}$.

Exercises

Exercise 4.1. Construct a control matrix for a binary [15, 11] Hamming code.

Exercise 4.2. Let F be a finite field and C be an [n, m] code in $F^n$ with separation 3. If C has a control matrix H such that every vector in $F^{n-m}$ can be obtained by multiplying some column in H by an element in F, then C is called a Hamming code over F.

(1) Show that every such Hamming code is perfect.
(2) Construct a control matrix for an [8, 6] Hamming code over $\mathbb{Z}_7$.
(3) Construct a control matrix for a [13, 10] Hamming code over $\mathbb{Z}_3$.
(4) For which values of n and m does there exist an [n, m] Hamming code over $\mathbb{Z}_p$?

Exercise 4.3. Correct, with respect to the code in Example 4.5, the received word 0617960587 under the condition that it contains at most one error.

Exercise 4.4. Let H be the control matrix in Example 4.5. What conclusion can be drawn if one digit, but not both, is zero in the syndrome $xH^t$ for the received word x?

Exercise 4.5. Describe a generator matrix for the code C in Theorem 4.6, if $G_1$ and $G_2$ are generator matrices for the codes $C_1$ and $C_2$. Also construct a generator matrix for the code $C_1'$ in Example 4.7.

Exercise 4.6. Show that in a binary self-dual code the weight of any element must be an even number.

Exercise 4.7. Let C be a binary code with generator matrix

\[ \begin{bmatrix} 1 & 1 & 0 & 0 & 0 & 1 & 1 & 0 \\ 0 & 1 & 1 & 1 & 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 & 1 & 1 & 1 & 0 \\ 0 & 0 & 1 & 1 & 1 & 0 & 0 & 1 \end{bmatrix}. \]

(1) Show that C is self-dual.
(2) Use the result of Exercise 4.6 to calculate the separation d(C).

5. Vandermonde Matrices and Reed-Solomon Codes

In this last section we describe a particular type of codes with a high error-correcting capacity. They have, amongst other things, been important for the development of modern CD-technology.

According to Theorem 3.10, a linear code with control matrix H has separation at least σ if every collection of σ − 1 columns in H is linearly independent. We start by showing how to easily construct matrices in which every collection of a prescribed number of columns is linearly independent.

Let F be a finite field and $\beta_0, \beta_1, \dots, \beta_d$ be different elements of F. Then the factor theorem tells us that a polynomial c(x) in F[x] of degree at most d with zeros $\beta_0, \beta_1, \dots, \beta_d$ must be the zero polynomial. If

\[ c(x) = c_0 + c_1 x + \dots + c_d x^d, \]

then this implies that the system of equations

\[ \begin{bmatrix} 1 & \beta_0 & \beta_0^2 & \dots & \beta_0^d \\ 1 & \beta_1 & \beta_1^2 & \dots & \beta_1^d \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & \beta_d & \beta_d^2 & \dots & \beta_d^d \end{bmatrix} \begin{bmatrix} c_0 \\ c_1 \\ \vdots \\ c_d \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix} \]

only has the trivial solution $c_0 = c_1 = \dots = c_d = 0$. This means that the coefficient matrix is invertible, so the columns of the transposed matrix

\[ \begin{bmatrix} 1 & 1 & \dots & 1 \\ \beta_0 & \beta_1 & \dots & \beta_d \\ \beta_0^2 & \beta_1^2 & \dots & \beta_d^2 \\ \vdots & \vdots & & \vdots \\ \beta_0^d & \beta_1^d & \dots & \beta_d^d \end{bmatrix} \tag{8} \]

are linearly independent. A matrix of this form is called a Vandermonde matrix.

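Now let n be an integer greater than d and let $\alpha_0, \alpha_1, \dots, \alpha_n$ be different elements of the field F. Then every collection of d + 1 columns from the matrix

\[ \begin{bmatrix} 1 & 1 & \dots & \dots & 1 \\ \alpha_0 & \alpha_1 & \dots & \dots & \alpha_n \\ \alpha_0^2 & \alpha_1^2 & \dots & \dots & \alpha_n^2 \\ \vdots & \vdots & & & \vdots \\ \alpha_0^d & \alpha_1^d & \dots & \dots & \alpha_n^d \end{bmatrix} \tag{9} \]

is linearly independent. This is because the chosen columns form a Vandermonde matrix. On the other hand, any d + 2 columns are linearly dependent, since they lie in a vector space of dimension d + 1. According to Theorem 3.10 every matrix of the form (9) is therefore a control matrix of a linear code in $F^{n+1}$ with separation d + 2.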

Example 5.1. Consider the linear [10, 6] code over $\mathbb{Z}_{11}$ defined by the control matrix

\[ H = \begin{bmatrix} 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\ 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 \\ 1 & 2^2 & 3^2 & 4^2 & 5^2 & 6^2 & 7^2 & 8^2 & 9^2 & 10^2 \\ 1 & 2^3 & 3^3 & 4^3 & 5^3 & 6^3 & 7^3 & 8^3 & 9^3 & 10^3 \end{bmatrix} \]

where we calculate the powers in $\mathbb{Z}_{11}$. According to what we have just proven, four arbitrarily chosen columns in H are always linearly independent so the corresponding code has separation 5. Observe that the columns are contained in a four-dimensional vector space, so more than four vectors are always linearly dependent.

Since the separation is 5, it follows from Theorem 1.7 that the code corrects two errors. This is an improvement compared with the code in Example 4.5, which only corrects one error. The price for this is that the number of code words in $\mathbb{Z}_{11}^{10}$ is now only $11^6$ compared with $11^8$ in Example 4.5.
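The independence claim for the columns of H in Example 5.1 can be checked directly, since there are only 210 ways to choose four of the ten columns. The sketch below computes ranks over $\mathbb{Z}_{11}$ by Gaussian elimination; the function names `rank_mod_p` and `columns` are illustrative, and inverses are again obtained from Fermat's little theorem.

```python
from itertools import combinations

P = 11

# Control matrix of Example 5.1: row e holds the e-th powers of 1, ..., 10 in Z_11.
H = [[pow(a, e, P) for a in range(1, 11)] for e in range(4)]

def rank_mod_p(matrix, p):
    """Rank of a matrix over Z_p (p prime), by Gaussian elimination."""
    rows = [r[:] for r in matrix]
    rank = 0
    for col in range(len(rows[0])):
        pivot = next((i for i in range(rank, len(rows)) if rows[i][col] % p), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        inv = pow(rows[rank][col], p - 2, p)          # inverse of the pivot
        rows[rank] = [(v * inv) % p for v in rows[rank]]
        for i in range(len(rows)):
            if i != rank and rows[i][col] % p:
                f = rows[i][col]
                rows[i] = [(a - f * b) % p for a, b in zip(rows[i], rows[rank])]
        rank += 1
    return rank

def columns(matrix, idx):
    """Submatrix consisting of the columns with the given indices."""
    return [[row[j] for j in idx] for row in matrix]

four_independent = all(rank_mod_p(columns(H, idx), P) == 4
                       for idx in combinations(range(10), 4))
five_dependent = all(rank_mod_p(columns(H, idx), P) < 5
                     for idx in combinations(range(10), 5))
```

Together with Theorem 3.10, the two flags correspond to the statement that the separation is exactly 5.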

The code in Example 5.1 is a so-called Reed-Solomon code. In general this name is given to every code over a finite field F with a control matrix of the form (9) where $\alpha_0, \alpha_1, \dots, \alpha_n$ are all the non-zero elements of F. If F has q elements then n = q − 2. Usually, we then list the elements $\alpha_0, \alpha_1, \dots, \alpha_n$ by choosing a primitive element $\alpha \in F$ and put $\alpha_i = \alpha^i$. Then the control matrix (9) takes the form

\[ \begin{bmatrix} 1 & 1 & 1 & \dots & 1 \\ 1 & \alpha & \alpha^2 & \dots & \alpha^{q-2} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & \alpha^d & \alpha^{2d} & \dots & \alpha^{(q-2)d} \end{bmatrix}. \]

Since $\alpha^{q-1} = 1$ in F, it is of course sufficient to calculate the exponents modulo q − 1.

Remark 5.2. If d = 2k − 1 in the control matrix (9), then the separation is 2k + 1 and the corresponding code corrects k errors. In most applications we have $F = GF(2^m)$ and each "letter" in F can then be written as a string of m binary symbols, 0 or 1. If one considers a sequence of (k − 1)m + 1 consecutive binary symbols in one word, then they cannot influence more than k letters in $GF(2^m)$. This means that a single "cascade" of binary errors of length ≤ (k − 1)m + 1 can be corrected. This is the reason why Reed-Solomon codes are used in today's CD-technology, where they serve to eliminate noise caused by dust, fingerprints, small scratches, etc. when playing a disc.

Example 5.3. Let us consider the case when $F = GF(2^6)$ and k = 5. Since F has 64 elements, every word in a Reed-Solomon code over F has length 63 if the letters are elements in F. This corresponds to binary words of length 6 · 63 = 378. When k = 5 we can correct single cascades of binary errors up to length (k − 1)m + 1 = 25. In this case the control matrix (9) has d + 1 = 2k = 10 rows, so the code has dimension 63 − 10 = 53, as a vector space over F. This means that it contains $(2^6)^{53} = 2^{318}$ words.

Exercises

Exercise 5.1. Construct a linear [8, 4] code over $\mathbb{Z}_{17}$ with separation 5.

Exercise 5.2. Construct a control matrix for a Reed-Solomon code over $F = GF(2^3)$ that corrects 2 errors in F.
