
Finite Mathematics

Nik Ruskuc and Colva M. Roney-Dougal

September 19, 2011

Contents

1 Introduction
    1 About the course
    2 A review of some algebraic structures

2 Coding theory
    1 Motivation: transmission of messages
    2 Hamming distance
    3 Linear codes
    4 Some more group theory: subgroups and cosets
    5 Decoding with coset leaders and syndromes
    6 Perfect codes

3 Latin squares
    1 Definition and existence
    2 Counting Latin squares
    3 Orthogonality
    4 Latin squares from finite fields
    5 Direct products

4 Finite geometries
    1 Finite affine planes
    2 Finite fields and affine planes
    3 Affine planes and Latin squares
    4 Projective planes

5 Designs and Steiner triple systems
    1 Designs
    2 Steiner triple systems
    3 Subsystems and a recursive construction
    4 Existence
    5 Packings and coverings
    6 Designs from perfect codes


Chapter 1

Introduction

1. About the course

Finite mathematics is a very broad and heterogeneous area of mathematics, studying finite sets and configurations. The typical general problems it considers are the existence of such configurations with certain properties, their number and characterisation.

Finite mathematics is related to almost all other areas of mathematics, and it also has a wide range of applications. These connections will be illustrated in the course. One underlying theme throughout will be applications of abstract algebra. The definitions and some examples of basic algebraic structures are given in Section 2.

The course will not be based solely on a single book. Therefore, the best study source will be the lecture notes. Some useful texts are:

1. P.J. Cameron, Combinatorics: Topics, Techniques, Algorithms, Cambridge University Press, Cambridge, 1994.

2. A.P. Street and W.D. Wallis, Combinatorial Theory: an Introduction, CBRC, Manitoba, 1977.

3. J.H. van Lint and R.M. Wilson, A Course in Combinatorics, Cambridge University Press, Cambridge, 1992.

All these books, as well as all tutorial sheets and solutions, will be available in the Mathematics/Physics library on short loan. Also, any other book whose title contains words such as ‘finite mathematics’, ‘discrete mathematics’ or ‘combinatorics’ is likely to contain material relevant to the course.

2. A review of some algebraic structures

In this section we recall definitions and some important examples of groups, fields and vector spaces.

Definition 2.1. A group is a non-empty set G with a binary operation ·, satisfying the following axioms.

(G1) xy ∈ G for all x, y ∈ G (closure).

(G2) x(yz) = (xy)z for all x, y, z ∈ G (associativity).

(G3) There exists an element e ∈ G (called the identity element) such that xe = ex = x for all x ∈ G.


(G4) For each x ∈ G there exists an element x^{-1} ∈ G (called the inverse of x) such that xx^{-1} = x^{-1}x = e.

If, in addition, G satisfies xy = yx for all x, y ∈ G then G is said to be an abelian group, and the operation · is said to be commutative.

Example 2.2. For every n ≥ 1, the set Z_n = {0, 1, . . . , n − 1} with addition modulo n is an abelian group of order n. The set S_n of all permutations of the set {1, 2, . . . , n} with the composition of mappings is a group of order n!, which is non-abelian for n ≥ 3.

For abelian groups it is customary to use additive notation, with + denoting the operation, 0 denoting the identity element, and −x denoting the (additive) inverse of x.

One of the main tasks of group theory is to describe all finite groups, but this does not seem to be attainable.

Definition 2.3. A field is a set F with two binary operations + and · and two distinguished elements 0 and 1, such that the following axioms are satisfied.

(F1) F with the operation + is an abelian group, with identity element 0.

(F2) F\{0} with the operation · is an abelian group, with identity element 1.

(F3) x(y + z) = xy + xz for all x, y, z ∈ F (distributivity).

Example 2.4. The number fields Q, R and C are the main examples of fields. Also, if p is a prime then Z_p, with addition and multiplication modulo p, is a field.

Unlike groups, one can describe all finite fields relatively easily.

Theorem 2.5 (The Fundamental Theorem for Finite Fields). If F is a finite field then its order is a power of a prime. Conversely, if n is a power of a prime, then there exists a unique (up to isomorphism) field of order n.

For a prime power n, we denote the unique finite field of order n by GF(n). For prime n we often write Z_n instead of GF(n).

In the following example we show how to construct GF(4) = GF(2^2).

Example 2.6. Consider the set F = {0, 1, x, x + 1} of all constant and linear polynomials over the field Z_2. Let the addition in F be the ordinary addition of polynomials, and let the multiplication be the ordinary multiplication of polynomials, with the additional condition that x^2 = x + 1. We can construct tables for these two operations:

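\[
\begin{array}{c|cccc}
+ & 0 & 1 & x & x+1 \\ \hline
0 & 0 & 1 & x & x+1 \\
1 & 1 & 0 & x+1 & x \\
x & x & x+1 & 0 & 1 \\
x+1 & x+1 & x & 1 & 0
\end{array}
\qquad
\begin{array}{c|cccc}
\cdot & 0 & 1 & x & x+1 \\ \hline
0 & 0 & 0 & 0 & 0 \\
1 & 0 & 1 & x & x+1 \\
x & 0 & x & x+1 & 1 \\
x+1 & 0 & x+1 & 1 & x
\end{array}
\]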

Clearly, F with +, and F\{0} with · are abelian groups. The multiplication of polynomials is distributive over addition, so F is a field.

In fact all finite fields can be constructed in a similar way. To construct a field with p^n elements (p prime) one considers all polynomials of degree less than n over the field Z_p, and uses a rule of the form f(x) = 0, where f is an irreducible polynomial of degree n, to simplify polynomials of higher degrees.
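The arithmetic of GF(4) from Example 2.6 can be checked mechanically. The following Python sketch is only an illustration: it stores a polynomial a + bx over Z_2 as the pair (a, b), and the names ELEMENTS, add and mul are chosen here for convenience and do not come from the notes.

```python
from itertools import product

# Elements a + b*x of GF(4), stored as pairs (a, b) with a, b in Z_2.
ELEMENTS = [(0, 0), (1, 0), (0, 1), (1, 1)]  # 0, 1, x, x+1
ONE = (1, 0)

def add(p, q):
    """Add two polynomials coefficient-wise modulo 2."""
    return ((p[0] + q[0]) % 2, (p[1] + q[1]) % 2)

def mul(p, q):
    """Multiply two polynomials and reduce with the rule x^2 = x + 1."""
    a, b = p
    c, d = q
    # (a + bx)(c + dx) = ac + (ad + bc)x + bd*x^2, and x^2 = x + 1.
    const = (a * c + b * d) % 2
    lin = (a * d + b * c + b * d) % 2
    return (const, lin)

# Every non-zero element should have a multiplicative inverse.
inverses = {p: q for p, q in product(ELEMENTS[1:], repeat=2) if mul(p, q) == ONE}

# Distributivity x(y + z) = xy + xz, checked over all triples.
distributive = all(
    mul(p, add(q, r)) == add(mul(p, q), mul(p, r))
    for p, q, r in product(ELEMENTS, repeat=3)
)
```

Replacing the reduction rule inside mul by one coming from a reducible polynomial (for instance x^2 = 1) would leave some non-zero element without an inverse, which is why f is required to be irreducible.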


Definition 2.7. Let F be a field, let V be an abelian group, and let there be an external multiplication of elements from V by elements from F. Then V is said to be a vector space over F if the following axioms are satisfied:

(V1) (α+ β)x = αx+ βx;

(V2) α(x+ y) = αx+ αy;

(V3) (αβ)x = α(βx);

(V4) 1x = x;

for all α, β ∈ F and all x, y ∈ V .

We shall assume familiarity with the elementary theory of vector spaces. In particular we shall consider as known the following concepts: subspaces, linear independence, basis, dimension, isomorphism.

Example 2.8. Let F be a field. Then the set V = F^d = {(x_1, . . . , x_d) : x_i ∈ F} is a vector space over F with respect to the component-wise addition and scalar multiplication. The dimension of this space is d.

Actually, the above example is generic, as the following theorem shows.

Theorem 2.9 (The Fundamental Theorem for Finite-dimensional Vector Spaces). If V is a d-dimensional vector space over a field F, then V is isomorphic to F^d.

Example 2.10. Let V be the (unique) 3-dimensional vector space over GF(2) = Z_2. By the fundamental theorem for finite-dimensional vector spaces, V is isomorphic to Z_2^3. The elements of V are

(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 0), (1, 0, 1), (0, 1, 1), (1, 1, 1)

and there are 2^3 = 8 of them. To see that Z_2^3 is a vector space, you should note that Z_2 is a field, that (V, +) forms an abelian group (with identity (0, 0, 0) and such that the inverse of (a, b, c) is (a, b, c)), and that axioms V1, V2, V3 and V4 all hold.

Chapter 2

Coding theory

1. Motivation: transmission of messages

Let us consider the following situation. Person A is in a space-craft somewhere in space. They navigate the space-craft according to the instructions that are received from person B who is on Earth. For simplicity let us assume that there are four possible instructions: go left, go right, go up and go down. These instructions are transmitted as a binary radio-signal; in other words B can transmit either of two types of signals, which we denote by 0 and 1.

So we have to encode four ‘messages’ into ‘words’ of 0s and 1s. The first, most obvious way is to do something like this:

left=00, right=01, up=10, down=11.

Once transmitted, the signal is affected by various factors, such as other radio-signals, cosmic rays etc. Thus there is a (small) chance that an emitted 0 will be received as 1, or that an emitted 1 will be received as 0. We assume three things about the possibility that an error occurs:

(BSC1) The probability that a 0 is turned into a 1 is the same as the probability that a 1 is turned into a 0.

(BSC2) This probability p is the same for each digit, and is less than 0.5.

(BSC3) An error occurring in one digit does not affect the probability that an error occurs in another digit.

Any communication channel satisfying BSC1–3 is called a Binary Symmetric Channel (BSC). In this course we will always assume that we have a BSC.

So assume that B sends the signal 00 (left), but the first 0 is changed into 1, so that A receives 10. Clearly A has no indication that an error has occurred, as 10 is also a valid instruction, and so A will go up, rather than left.
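The behaviour of a binary symmetric channel can be imitated in a few lines of Python. This is a rough sketch under assumptions made here, not part of the course material: words are strings of '0' and '1', the function name transmit is invented, and the error probability 0.1 and the seed are arbitrary.

```python
import random

def transmit(word, p, rng):
    """Send a binary word through a simulated binary symmetric channel.

    Each digit is flipped independently with the same probability p,
    in line with BSC1, BSC2 and BSC3.
    """
    out = []
    for digit in word:
        if rng.random() < p:
            out.append('1' if digit == '0' else '0')  # an error occurs
        else:
            out.append(digit)
    return ''.join(out)

rng = random.Random(2011)  # fixed seed so that runs are repeatable
received = [transmit('00', 0.1, rng) for _ in range(1000)]

# With the encoding left=00, right=01, up=10, down=11 every corrupted
# word is again a valid instruction, so each of these goes unnoticed.
misread = sum(1 for r in received if r != '00')
```

With p = 0.1 one would expect roughly 1 − 0.9^2 = 19% of the two-digit words to be misread, which is the weakness that the longer encodings discussed next are meant to address.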

Consider another way of encoding our instructions:


Suppose that 000 (left) is sent, and a single error occurs, say changing the first 0 into 1. This time A will be able to detect it, as 100 is not a valid message. However, A will not know what the original message was, even if A knows that only one error has occurred. For 100 might have been obtained from 110, as well as from 000, by changing one symbol.

Consider a further example:




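[Figure 2.1: Transmission of messages. A message w ∈ Z_2^m is passed to the encoding function E : Z_2^m → Z_2^n, giving the encoded message u ∈ Z_2^n; u is sent through the BSC, giving the received message v ∈ Z_2^n; the decoding function D : Z_2^n → Z_2^m then produces the decoded message z ∈ Z_2^m.]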

This time messages are sufficiently different so as to allow detection and correction of a single error. Thus, if A receives the message 10000, and if it is assumed that only one error has occurred, then A will know that the original message was 00000.

So, in essence, we are considering a scheme shown in Figure 2.1.

Definition 1.1. An n-ary code over Z_2 is a subset C ⊆ Z_2^n. The elements of C are called code-words. Given a code C, an encoding function is any bijection E : Z_2^m → C. A decoding function is any function D : Z_2^n → Z_2^m such that for every u ∈ Z_2^m we have uED = u (that is, applying E and then D returns u).

In the ideal situation we would like the following to happen: we take an arbitrary word w ∈ Z_2^m, encode it, transmit it, then decode it, and we obtain the same word w. This is clearly impossible to guarantee, since we have no control over what will happen with w in the channel. So we want to ensure that we have a high chance of decoding the message correctly. In other words, we want to be sure that we will decode the received word correctly, under the assumption that only a few errors have occurred.

2. Hamming distance

Since we will be dealing with elements of Z_2^n throughout this chapter, let us recall that these elements are n-tuples of 0s and 1s. Usually, instead of (x_1, x_2, . . . , x_n) we shall simply write x_1x_2 . . . x_n. We shall also frequently refer to these elements as words, rather than vectors. We also recall the addition and multiplication in Z_2: 0 + 0 = 1 + 1 = 0, 0 + 1 = 1 + 0 = 1, 0 · 0 = 0 · 1 = 1 · 0 = 0, 1 · 1 = 1. Note that for every x we have x = −x, as x + x = 0 over Z_2.

The Hamming distance provides us with a means of measuring the difference between any two words from Z_2^n.

Definition 2.1. Let x = x_1x_2 . . . x_n and y = y_1y_2 . . . y_n be two words from Z_2^n. The Hamming distance d(x, y) between x and y is the number of places in which they differ:

d(x, y) = |{i : 1 ≤ i ≤ n, x_i ≠ y_i}|.

Closely related to this is the notion of weight.


Definition 2.2. The weight of a word x = x_1x_2 . . . x_n ∈ Z_2^n, denoted by wt(x), is the number of 1s in x:

wt(x) = |{i : 1 ≤ i ≤ n, x_i = 1}|.

The connection between distance and weight is as follows.

Theorem 2.3. For any two x, y ∈ Z_2^n we have

(i) d(x, y) = wt(x − y);

(ii) wt(x) = d(x, 0),

where 0 denotes the zero-vector in Z_2^n.

Proof. We have

d(x, y) = |{i : x_i ≠ y_i}| = |{i : x_i − y_i ≠ 0}| = |{i : x_i − y_i = 1}| = wt(x − y).

The proof of (ii) is similar. □
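Distance and weight are easy to compute directly from their definitions. The sketch below assumes that words are represented as Python strings of '0' and '1'; the function names are chosen here for illustration. It checks both parts of Theorem 2.3 on one pair of sample words.

```python
def distance(x, y):
    """Hamming distance between two binary words of equal length."""
    if len(x) != len(y):
        raise ValueError("words must have the same length")
    return sum(1 for a, b in zip(x, y) if a != b)

def weight(x):
    """Number of 1s in a binary word."""
    return x.count('1')

def add(x, y):
    """Component-wise sum of two words over Z_2 (the same as x - y)."""
    return ''.join('1' if a != b else '0' for a, b in zip(x, y))

x, y = '10110', '00101'
part_i = distance(x, y) == weight(add(x, y))     # Theorem 2.3 (i)
part_ii = weight(x) == distance(x, '0' * len(x))  # Theorem 2.3 (ii)
```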

Next we prove that the Hamming distance has the usual properties of a distance function:

Theorem 2.4. The set Z_2^n with the Hamming distance d is a metric space. In other words, d has the following properties:

(i) d(x, y) ≥ 0;

(ii) d(x, y) = 0 ⇐⇒ x = y;

(iii) d(x, y) = d(y, x);

(iv) d(x, z) ≤ d(x, y) + d(y, z) (the triangle inequality);

for all x, y, z ∈ Z_2^n.

Proof. Properties (i), (ii) and (iii) are obvious, and we leave the proofs as an exercise. For (iv) note that

{i : x_i ≠ z_i} ⊆ {i : x_i ≠ y_i or y_i ≠ z_i} = {i : x_i ≠ y_i} ∪ {i : y_i ≠ z_i},

so that

d(x, z) = |{i : x_i ≠ z_i}| ≤ |{i : x_i ≠ y_i}| + |{i : y_i ≠ z_i}| = d(x, y) + d(y, z),

as required. □

Definition 2.5. The minimum distance of a code C is the minimum distance between any two distinct codewords of C.

Now let us again consider a typical transmission process, where we have a code C ⊆ Z_2^n, and where a word u ∈ C has been transmitted through the channel. We know that errors may occur, and so, in general, the received word v will be distinct from u. If we let x = v − u (= v + u) we say that x is the error of transmission. From x = v − u we clearly have v = u + x, and so we say that the channel has added the error x to the transmitted word u.

Let X ⊆ Z_2^n be arbitrary. We think of X as the collection of errors which are more likely to occur than the others. We say that we can detect errors from X if for every code-word u ∈ C and every non-zero x ∈ X we have u + x ∉ C. Similarly, we say that we can correct errors from X if for all u, u_1 ∈ C and all x, x_1 ∈ X the equality u + x = u_1 + x_1 implies u = u_1 (and x = x_1). This means that no received word could have been produced by adding errors in X to two different code-words. There is a strong connection between the Hamming metric and the error-detecting and error-correcting capabilities of a code.


Theorem 2.6. Let C ⊆ Z_2^n be a code, and let k ≥ 1.

(i) We can detect every error of weight at most k if and only if C has minimum distance at least k + 1.

(ii) We can correct every error of weight at most k if and only if C has minimum distance at least 2k + 1.

Proof. (i) (⇒) Suppose that we can detect every error of weight at most k. Let u, v ∈ C, u ≠ v. Note that v = u + (v − u) ∈ C, so that we cannot detect the error v − u. Hence k < wt(v − u) = d(u, v), and the minimum distance is at least k + 1.

(⇐) Suppose that C has minimum distance at least k + 1. Let u ∈ C and let x be a non-zero word with wt(x) ≤ k. Consider the word u + x. We have d(u, u + x) = wt(u + x − u) = wt(x), and 0 < wt(x) ≤ k, so that u + x ∉ C, and we can detect x. Thus we can detect every error of weight at most k.

(ii) (⇒) Let us assume that we can correct every error of weight at most k, but that there are two distinct code-words u, v ∈ C such that d(u, v) ≤ 2k. If we let I = {i : u_i ≠ v_i}, then |I| ≤ 2k, and hence we can write I as the union of two disjoint subsets I_1 and I_2 of size at most k:

I = I_1 ∪ I_2, |I_1| ≤ k, |I_2| ≤ k, I_1 ∩ I_2 = ∅.

Define two words x = x_1x_2 . . . x_n and y = y_1y_2 . . . y_n in Z_2^n as follows:

x_i = 1 ⇔ i ∈ I_1,
y_i = 1 ⇔ i ∈ I_2.

Clearly we have wt(x) ≤ k, wt(y) ≤ k and u + x + y = v. The last equality can be written as u + x = v + y, and, because of the error correcting capability of C, we conclude that u = v, a contradiction. So the minimum distance of the code is at least 2k + 1.

(⇐) Suppose that C has minimum distance at least 2k + 1. Let u, v ∈ C and x, y ∈ Z_2^n be such that wt(x) ≤ k, wt(y) ≤ k and u + x = v + y. Then we have

d(u, v) = wt(u − v) = wt(y − x) = wt(x + y) ≤ wt(x) + wt(y) ≤ 2k.

Since the minimum distance is at least 2k + 1, we conclude that u = v, meaning that we can correct all errors of weight at most k. □

Example 2.7. Consider the code C = {00001, 01010, 10100, 11111} ⊆ Z_2^5. The distances between elements of C are respectively 3, 3, 4, 4, 3, 3, and so we can detect errors of weight at most 2, and correct errors of weight 1.
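The numbers in Example 2.7 can be reproduced by a brute-force search over all pairs of code-words. The following sketch assumes words are given as strings; minimum_distance is a name introduced here, and the last two lines simply restate Theorem 2.6 as arithmetic.

```python
from itertools import combinations

def distance(x, y):
    """Hamming distance between two binary words of equal length."""
    return sum(1 for a, b in zip(x, y) if a != b)

def minimum_distance(code):
    """Smallest Hamming distance between two distinct code-words."""
    return min(distance(u, v) for u, v in combinations(code, 2))

C = ['00001', '01010', '10100', '11111']
pairwise = [distance(u, v) for u, v in combinations(C, 2)]
d = minimum_distance(C)
detects = d - 1          # largest k with d >= k + 1
corrects = (d - 1) // 2  # largest k with d >= 2k + 1
```

Comparing all pairs takes a number of steps quadratic in |C|, which is one reason why the extra structure of linear codes, introduced next, is useful.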

3. Linear codes

We have seen that a code is simply a subset C ⊆ Z_2^n, that an encoding function is a bijection E : Z_2^m → C and a decoding function is a mapping D : Z_2^n → Z_2^m such that uED = u for all u ∈ Z_2^m. The problem with these general codes is that the encoding and decoding functions are not convenient for computing. For instance, for the encoding function one has to store a table containing all the elements of Z_2^m and the corresponding elements of C, and to look in this table whenever sending a message. This problem can be overcome by giving codes an algebraic structure, most often that of a vector space.

Definition 3.1. A linear code is any subspace of the vector space Z_2^n.

The first advantage of having a linear code, as opposed to an arbitrary code, is that it is easier to analyse its error-detecting and error-correcting capabilities.

Theorem 3.2. Let C ⊆ Z_2^n be a linear code. Then the minimum distance is equal to the minimal weight of a non-zero vector in C. In particular, we can detect (respectively, correct) every error of weight at most k if and only if this minimal weight is at least k + 1 (respectively, 2k + 1).

Proof. Let M be the minimum distance, with d(x, y) = M for some distinct x, y ∈ C, and let N be the minimal weight of a non-zero vector, with wt(z) = N for some z ∈ C. Since C is a subspace of Z_2^n we must have x − y ∈ C and also 0 ∈ C. But then we have

M = d(x, y) = wt(x − y) ≥ N = wt(z) = d(z, 0) ≥ M,

and so M = N as required. □

Assume that we want to encode elements of Z_2^m by means of a linear code C ⊆ Z_2^n. Then C is a subspace of Z_2^n with 2^m elements, and so dim(C) = m. Hence we can define C by listing a basis for C, which is a set of m linearly independent vectors from C:

a_i = a_{i1}a_{i2} . . . a_{in}, 1 ≤ i ≤ m.

If we take these vectors for the rows of a matrix G, we obtain what is called a generator matrix for C:

\[
G = \begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_m \end{pmatrix}
= \begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{pmatrix}.
\]

It is worth remarking that G is not unique, as C has several bases.

Example 3.3. Let C := {000, 110, 101, 011}. Then

\[
G_1 = \begin{bmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \end{bmatrix}
\]

is a generator matrix for C, but so is

\[
G_2 = \begin{bmatrix} 1 & 0 & 1 \\ 1 & 1 & 0 \end{bmatrix}.
\]

The generator matrix of a code can be used to define an easy encoding function.

Theorem 3.4. Let C ⊆ Z_2^n be a linear code of dimension m, and let G be a generator matrix for C. Then the function E : Z_2^m → Z_2^n defined by E(x) = xG is an encoding function.

Proof. We have to prove that E is a bijection from Z_2^m onto C. First note that for x = x_1x_2 . . . x_m ∈ Z_2^m we have

\[
E(x) = xG = [x_1 \; x_2 \; \ldots \; x_m]
\begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_m \end{pmatrix}
= x_1a_1 + x_2a_2 + \ldots + x_ma_m \in C,
\]

and so E maps Z_2^m onto C, since {a_1, . . . , a_m} is a basis for C. Also,

E(x) = E(y) ⇒ x_1a_1 + . . . + x_ma_m = y_1a_1 + . . . + y_ma_m ⇒ x_i = y_i (1 ≤ i ≤ m),

since the vectors a_1, . . . , a_m are linearly independent. Therefore E is indeed a bijection. □


Example 3.5. Let us consider the encoding function E : Z_2^3 → Z_2^6 given by the generator matrix

\[
G = \begin{pmatrix}
1 & 0 & 1 & 1 & 0 & 0 \\
0 & 1 & 1 & 0 & 1 & 1 \\
1 & 0 & 1 & 0 & 0 & 1
\end{pmatrix}.
\]

Then we have [1 0 1]G = [0 0 0 1 0 1] and so the word 101 is encoded as 000101. In this way one can calculate all the code-words, and obtain C = {000000, 101100, 011011, 101001, 110111, 000101, 110010, 011110}. The weights of the code-words are respectively 0, 3, 4, 3, 5, 2, 3, 4. So we can detect single errors, but cannot correct them.
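Encoding by a generator matrix is a single vector-matrix product over Z_2. The sketch below recomputes Example 3.5; it assumes matrices are stored as lists of rows and words as tuples of integers, and the function name encode is chosen here for illustration.

```python
from itertools import product

# Generator matrix of Example 3.5.
G = [
    [1, 0, 1, 1, 0, 0],
    [0, 1, 1, 0, 1, 1],
    [1, 0, 1, 0, 0, 1],
]

def encode(x, G):
    """Return the row vector xG, with all arithmetic done modulo 2."""
    n = len(G[0])
    return tuple(sum(x[i] * G[i][j] for i in range(len(G))) % 2 for j in range(n))

# All code-words, obtained by encoding every word of Z_2^3.
code = {encode(x, G) for x in product((0, 1), repeat=len(G))}

# By Theorem 3.2 the minimum distance is the minimal non-zero weight.
min_weight = min(sum(w) for w in code if any(w))
```

Only the m rows of G need to be stored, rather than a table of all 2^m message words and their code-words.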

Another way to define an m-dimensional linear code in Z_2^n is to give it as the null-space of an (n − m) × n matrix with linearly independent rows. This matrix is called the parity check matrix.

Example 3.6. Consider the matrix

\[
H = \begin{bmatrix} 1 & 1 & 1 \end{bmatrix}.
\]

A vector v = v_1v_2v_3 ∈ Z_2^3 is in the null-space of H if and only if v_1 + v_2 + v_3 = 0. Thus the null-space of H is {000, 110, 101, 011}, which is exactly the code of Example 3.3.
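For small n the null-space of a parity check matrix can be found by testing every word of Z_2^n. This is a minimal sketch using exhaustive search, which is not how one would proceed for long codes; null_space is a name introduced here.

```python
from itertools import product

def null_space(H):
    """All words w in Z_2^n with H w^T = 0, found by exhaustive search."""
    n = len(H[0])
    return [
        w for w in product((0, 1), repeat=n)
        if all(sum(h * x for h, x in zip(row, w)) % 2 == 0 for row in H)
    ]

H = [[1, 1, 1]]  # parity check matrix of Example 3.6
code = [''.join(map(str, w)) for w in null_space(H)]
```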

The question then arises about the connection between the generator matrix and the parity check matrix for the same code.

Let the code C be given by the generator matrix G = [a_{ij}]_{m×n}, and let w = w_1w_2 . . . w_n ∈ Z_2^n. Then w ∈ C if and only if there exists x = x_1x_2 . . . x_m ∈ Z_2^m such that xG = w. This is a system of n equations in the variables x_1, . . . , x_m and w_1, . . . , w_n. If we eliminate x_1, . . . , x_m from this system, we obtain a homogeneous system of n − m equations in the variables w_1, . . . , w_n. This system can be written as Hw^T = 0, where H is an (n − m) × n matrix whose (i, j)-entry is the coefficient of w_j in the i-th equation. So we have w ∈ C if and only if Hw^T = 0, and hence H is a parity check matrix for C.

Conversely, if we are given a parity check matrix H = [b_{ij}]_{(n−m)×n}, then for every w ∈ C we have

Hw^T = 0.

If w = w_1w_2 . . . w_n, then the above equality can be written as a system of n − m equations in n variables w_1, . . . , w_n. We can solve this system for w_1, . . . , w_n. Since the number of variables is greater than the number of equations, n − (n − m) = m parameters x_1, . . . , x_m will appear; in other words we obtain the solution in the form

w_j = a_{1j}x_1 + . . . + a_{mj}x_m (1 ≤ j ≤ n).

Thus, if we define G = [a_{ij}]_{m×n}, we have xG = w, and G is a generator matrix for C.

Example 3.7. Let us find a parity check matrix for the code given in Example 3.5. So we consider the system xG = w, where x = x_1x_2x_3 and w = w_1w_2w_3w_4w_5w_6. In expanded form this system is:

\[
\begin{aligned}
x_1 + x_3 &= w_1 \\
x_2 &= w_2 \\
x_1 + x_2 + x_3 &= w_3 \\
x_1 &= w_4 \\
x_2 &= w_5 \\
x_2 + x_3 &= w_6.
\end{aligned}
\]

Substituting any values for x_1, x_2 and x_3 would yield a codeword w. Instead we eliminate x_1, x_2, x_3 by using x_1 = w_4, x_2 = w_2, x_3 = w_1 + w_4 (remember, over Z_2 we have x_1 + x_1 = 0). This gives us a set of 3 equations that do not involve x_1, x_2 or x_3:

\[
\begin{aligned}
w_1 + w_2 + w_3 &= 0 \\
w_2 + w_5 &= 0 \\
w_1 + w_2 + w_4 + w_6 &= 0.
\end{aligned}
\]

Any word w_1w_2w_3w_4w_5w_6 whose entries satisfy these three equations is a code-word. We can write these three equations as Hw^T = 0, where

\[
H = \begin{pmatrix}
1 & 1 & 1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 1 & 0 \\
1 & 1 & 0 & 1 & 0 & 1
\end{pmatrix},
\]

and H is a parity check matrix for C.
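A quick way to gain confidence in a parity check matrix obtained by elimination is to compare the row space of G with the null-space of H. The sketch below does this for the matrices of Examples 3.5 and 3.7 by brute force; the helper name dot and the variable names are assumptions made for this illustration.

```python
from itertools import product

G = [[1, 0, 1, 1, 0, 0], [0, 1, 1, 0, 1, 1], [1, 0, 1, 0, 0, 1]]
H = [[1, 1, 1, 0, 0, 0], [0, 1, 0, 0, 1, 0], [1, 1, 0, 1, 0, 1]]

def dot(u, v):
    """Inner product of two words over Z_2."""
    return sum(a * b for a, b in zip(u, v)) % 2

# Every row of G must satisfy every parity check.
rows_ok = all(dot(h, g) == 0 for h in H for g in G)

# The code as the set of all xG, and as the null-space of H.
span = {
    tuple(sum(x[i] * G[i][j] for i in range(3)) % 2 for j in range(6))
    for x in product((0, 1), repeat=3)
}
null = {w for w in product((0, 1), repeat=6) if all(dot(h, w) == 0 for h in H)}
same_code = span == null
```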

Example 3.8. Let a code C ⊆ Z_2^6 be given by the parity check matrix

\[
H = \begin{bmatrix}
1 & 1 & 1 & 0 & 0 & 0 \\
1 & 0 & 1 & 1 & 1 & 1
\end{bmatrix}.
\]

We find the corresponding generator matrix as follows. We consider the system Hw^T = 0, which is equivalent to

\[
\begin{aligned}
w_1 + w_2 + w_3 &= 0 \\
w_1 + w_3 + w_4 + w_5 + w_6 &= 0.
\end{aligned}
\]

Any choice of w_1w_2w_3w_4w_5w_6 which satisfies these two equations is a codeword. Let us therefore solve the system for w_1, w_2, w_3, w_4, w_5, w_6, writing the parameters as x_1, x_2, x_3, x_4:

\[
\begin{aligned}
w_1 &= x_1 \\
w_2 &= x_2 \\
w_3 &= x_1 + x_2 \\
w_4 &= x_3 \\
w_5 &= x_4 \\
w_6 &= x_2 + x_3 + x_4.
\end{aligned}
\]

This solution can be written as w = xG, where

\[
G = \begin{pmatrix}
1 & 0 & 1 & 0 & 0 & 0 \\
0 & 1 & 1 & 0 & 0 & 1 \\
0 & 0 & 0 & 1 & 0 & 1 \\
0 & 0 & 0 & 0 & 1 & 1
\end{pmatrix},
\]

and G is a generator matrix for C.

Remark 3.9. If we start with a generator matrix G, find the corresponding parity check matrix H and then the corresponding generator matrix G_1, we need not have G = G_1. This is because a code may have several different generator matrices.

As one might expect, the parity check matrix also contains information about the error detecting and correcting capabilities of the code.

Theorem 3.10. Let C ⊆ Z_2^n be a linear code with parity check matrix H. Then the minimum distance is equal to the size of the smallest set of linearly dependent columns of H. In particular, we can detect (respectively, correct) all errors of weight up to k if and only if the size of the smallest set of linearly dependent columns of H is at least k + 1 (respectively, 2k + 1).

Proof. By Theorem 3.2, since C is linear, the minimum distance is equal to the minimum weight of a non-zero code-word. Write H = [c_1 . . . c_n], where the c_i are the columns of H. For a word w = a_1 . . . a_n ∈ Z_2^n we have w ∈ C if and only if Hw^T = 0, i.e. if and only if a_1c_1 + . . . + a_nc_n = 0. The word w has weight k if and only if exactly k of a_1, . . . , a_n are equal to 1. Therefore C contains a word of weight k if and only if some k columns of H sum to zero, and the smallest such k ≥ 1 is the size of the smallest set of linearly dependent columns of H. □
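Theorem 3.10 suggests a direct, if slow, way to read the minimum distance off a parity check matrix: search for the smallest set of columns whose sum over Z_2 is zero. The following sketch does this by trying all subsets in order of size; the function name is an assumption of this illustration, and the matrix is the one found in Example 3.7.

```python
from itertools import combinations

def smallest_dependent_set(H):
    """Size of the smallest set of columns of H that sums to zero over Z_2."""
    columns = list(zip(*H))
    for k in range(1, len(columns) + 1):
        for subset in combinations(columns, k):
            # Add the chosen columns entry by entry, modulo 2.
            if all(sum(entries) % 2 == 0 for entries in zip(*subset)):
                return k
    return None  # the columns are linearly independent

H = [
    [1, 1, 1, 0, 0, 0],
    [0, 1, 0, 0, 1, 0],
    [1, 1, 0, 1, 0, 1],
]
d = smallest_dependent_set(H)
```

Here the fourth and sixth columns of H coincide, so two columns already sum to zero; this agrees with the code-word 000101 of weight 2 in Example 3.5.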

4. Some more group theory: subgroups and cosets

In this section we recall some more elementary group theory that we will need in Section 5. In order to make the notation compatible with what follows, we shall use the additive notation for groups; in other words we shall denote the group operation by +, the identity element by 0, and the inverse of x by −x.

A non-empty subset H of a group G is a subgroup if it is a group itself under the same operation. For example, if G = Z_6 = {0, 1, 2, 3, 4, 5}, then H = {0, 2, 4} is a subgroup, while K = {0, 1, 2, 3} is not. Actually, it can be proved that H is a subgroup of G if and only if H is closed under + and under taking inverses.

Let G be a group, let H be a subgroup of G, and let a ∈ G. The coset of H determined by a is the set

a + H = {a + h : h ∈ H}.

The main properties of cosets are given in the following theorem.

Theorem 4.1. Let G be a group, and let H be a subgroup of G. The cosets of H satisfy the following properties.

(i) H = 0 + H is a coset of itself.

(ii) a ∈ a + H for every element a ∈ G.

(iii) |a + H| = |b + H| for all a, b ∈ G; in other words, any two cosets of H have the same number of elements.

(iv) Any two distinct cosets of H are disjoint (i.e. their intersection is the empty set).

(v) G = ⋃_{a∈G} (a + H); in other words, G is the union of all cosets of H.

Proof. Exercise. □

The above theorem can be summed up as follows: the cosets of a subgroup partition the group into blocks of equal size.
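The partition into cosets can be seen concretely for the subgroup H = {0, 2, 4} of Z_6 mentioned above. The sketch below is restricted to the groups Z_n for simplicity; the function name cosets is chosen here for illustration.

```python
def cosets(n, H):
    """Distinct cosets a + H of a subgroup H of Z_n, as sorted tuples."""
    found = []
    for a in range(n):
        coset = tuple(sorted((a + h) % n for h in H))
        if coset not in found:
            found.append(coset)
    return found

blocks = cosets(6, [0, 2, 4])

# The properties of Theorem 4.1, checked for this example.
equal_size = len({len(b) for b in blocks}) == 1
covers_group = sorted(x for b in blocks for x in b) == list(range(6))
```

The second check also confirms that distinct cosets are disjoint, since every element of Z_6 appears exactly once.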

5. Decoding with coset leaders and syndromes

Let C ⊆ Z_2^n be an m-dimensional linear code, let G be a generator matrix for C, and let H be a parity check matrix for C. We have seen that G yields an easy-to-compute encoding function E : Z_2^m → Z_2^n given by E(x) = xG. In this section we discuss decoding.

First of all we introduce a restriction on the generator matrix G: we require that G be in standard form, meaning that G is written as

\[
G = \begin{pmatrix}
1 & 0 & \cdots & 0 & b_{11} & b_{12} & \cdots & b_{1,n-m} \\
0 & 1 & \cdots & 0 & b_{21} & b_{22} & \cdots & b_{2,n-m} \\
\vdots & \vdots & \ddots & \vdots & \vdots & \vdots & & \vdots \\
0 & 0 & \cdots & 1 & b_{m1} & b_{m2} & \cdots & b_{m,n-m}
\end{pmatrix}.
\]

So G consists of the identity matrix I_m, followed by an m × (n − m) matrix B; we write briefly G = [I_m|B].

The reason for making this restriction is that it makes it easy to decode the code-words. Note that for every x ∈ Z_2^m we have

E(x) = xG = x[I_m|B] = [xI_m|xB] = [x|xB].

So every word x ∈ Z_2^m is encoded as a longer word beginning with x. Conversely, if a code-word w = w_1w_2 . . . w_n ∈ C is received, we ought to decode it as w_1w_2 . . . w_m.

Another advantage of G being in the standard form is that it is easy to find the corresponding parity check matrix.

Theorem 5.1. Let C ⊆ Z_2^n be an m-dimensional linear code. Then G = [I_m|B] is a generator matrix for C if and only if the matrix H = [B^T|I_{n−m}] is a parity check matrix for C.

Proof. We show that if G = [I_m|B] is a generator matrix for C, then H = [B^T|I_{n−m}] is a parity check matrix for C, and leave the converse as an exercise.

We consider the system xG = w. In an expanded form this system is:

\[
\begin{aligned}
x_1 &= w_1 \\
x_2 &= w_2 \\
&\;\;\vdots \\
x_m &= w_m \\
b_{11}x_1 + b_{21}x_2 + \ldots + b_{m1}x_m &= w_{m+1} \\
b_{12}x_1 + b_{22}x_2 + \ldots + b_{m2}x_m &= w_{m+2} \\
&\;\;\vdots \\
b_{1,n-m}x_1 + b_{2,n-m}x_2 + \ldots + b_{m,n-m}x_m &= w_n
\end{aligned}
\]

We eliminate x_1, . . . , x_m by substituting x_1 = w_1, . . . , x_m = w_m into the last n − m equations, yielding the system of n − m equations:

\[
\begin{aligned}
b_{11}w_1 + b_{21}w_2 + \ldots + b_{m1}w_m + w_{m+1} &= 0 \\
b_{12}w_1 + b_{22}w_2 + \ldots + b_{m2}w_m + w_{m+2} &= 0 \\
&\;\;\vdots \\
b_{1,n-m}w_1 + b_{2,n-m}w_2 + \ldots + b_{m,n-m}w_m + w_n &= 0
\end{aligned}
\]

This system of equations can be written as Hw^T = 0 where H = [B^T|I_{n−m}]. □
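Theorem 5.1 turns the computation of a parity check matrix into a transposition. The sketch below assumes that G is given as a list of rows over Z_2 and is already in standard form; the function name is introduced here, and the sample matrix is the one that appears in Example 5.6.

```python
def parity_check_from_standard(G):
    """Given G = [I_m | B] over Z_2, return H = [B^T | I_{n-m}]."""
    m, n = len(G), len(G[0])
    # Check the standard-form assumption: the first m columns form I_m.
    for i in range(m):
        for j in range(m):
            if G[i][j] != (1 if i == j else 0):
                raise ValueError("G is not in standard form")
    B = [row[m:] for row in G]
    r = n - m
    # Row j of H is column j of B followed by row j of the identity I_r.
    return [
        [B[i][j] for i in range(m)] + [1 if k == j else 0 for k in range(r)]
        for j in range(r)
    ]

G = [
    [1, 0, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 1],
    [0, 0, 1, 1, 0, 1],
]
H = parity_check_from_standard(G)
```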

The problem arises when we want to decode a word which is not a code-word. A reasonable approach to this is to find first the corresponding code-word, and then to decode this code-word, as explained above. This amounts to finding the error of transmission. However, since every word is a possible error, we cannot determine which error has occurred. Instead, we want to discover the error which is most likely to have occurred. Now remember that the probability of a single error is small, and certainly smaller than 0.5. This means that errors of small weight are likelier to occur than those of large weight. Consequently, for any received word we want to find the code-word closest to it (with respect to the Hamming distance).

Note that C, being a linear code, is certainly a subgroup of Z_2^n. Thus we may talk about cosets of C in Z_2^n. If C has dimension m, then |C| = 2^m, and so there are 2^n/2^m = 2^{n−m} cosets, say C = C_1, C_2, . . . , C_{2^{n−m}}.

Definition 5.2. Let C_i be a coset of C. A coset leader of C_i is any element a ∈ C_i of minimal weight; in other words a is a coset leader if for any other b ∈ C_i we have wt(a) ≤ wt(b).


The following theorem shows how coset leaders give us a method for decoding with the desired properties.

Theorem 5.3. Let C ⊆ Z_2^n be an m-dimensional linear code, let C = C_1, C_2, . . . , C_{2^{n−m}} be the cosets of C, and let a_1, a_2, . . . , a_{2^{n−m}} be respective coset leaders. If a word w ∈ Z_2^n belongs to the coset C_i then w + a_i is a code-word closest to w.

Proof. From w ∈ C_i = a_i + C it follows that w = a_i + v for some v ∈ C. But then w + a_i = w − a_i = v ∈ C is a code-word.

Let u ∈ C be any other code-word, and let b = w + u. We have

b = w + u = a_i + v + u ∈ a_i + C = C_i.

Since a_i is a coset leader for C_i, we have wt(a_i) ≤ wt(b), and so

d(w, u) = wt(w − u) = wt(b) ≥ wt(a_i) = wt(w − v) = d(w, v),

as required. □

This solution for the problem of decoding is still not entirely satisfactory: we have not avoided the need to store all the elements of Z_2^n. We solve this final problem by introducing the following new concept.

Definition 5.4. Let C ⊆ Z_2^n be a linear code of dimension m, and let H be its parity check matrix. For a word w ∈ Z_2^n, its syndrome is the word Hw^T ∈ Z_2^{n−m}.

The significance of syndromes for decoding is based on the following theorem.

Theorem 5.5. Let C ⊆ Z_2^n be a linear code, let H be its parity check matrix, and let w_1, w_2 ∈ Z_2^n. Then w_1 and w_2 belong to the same coset of C if and only if their syndromes are equal.

Proof. (⇒) Assume that w_1 and w_2 belong to the same coset a + C of C. This means that w_1 = a + u, w_2 = a + v for some u, v ∈ C. Since H is a parity check matrix for C we have Hu^T = Hv^T = 0, and so

Hw_1^T = H(a^T + u^T) = Ha^T = H(a^T + v^T) = Hw_2^T.

(⇐) Now assume that Hw_1^T = Hw_2^T. This implies that H(w_1^T − w_2^T) = 0, and hence w_1 − w_2 ∈ C. If we denote w_1 − w_2 by u, then we have w_1 = w_2 + u ∈ w_2 + C, and so w_1 and w_2 belong to the same coset of C. □

So assume that we know coset leaders for C (we show in Example 5.6 how to calculate them) and corresponding syndromes. Then we can find the coset leader for an arbitrary word just by computing its syndrome. Therefore, we have no need to store 2^n elements of Z_2^n, but instead store 2^{n−m} coset leaders, and the same number of syndromes.

If we combine all the results from this section we obtain the following method for decoding.

Decoding. Let C ⊆ Z_2^n be an m-dimensional linear code, with the generator matrix G in standard form, and the corresponding parity check matrix H. Prior to decoding do the following four steps:

1) calculate code-words of C;

2) find coset leaders for C;

3) for each coset leader calculate its syndrome;

4) store the table of coset leaders and their syndromes: all the other codewords may be discarded.

To decode an arbitrary word w ∈ Z_2^n do the following four steps:

1) calculate the syndrome of w;

2) find the coset leader a with the same syndrome;

3) let v = a + w;

4) decode w as the first m symbols of v.
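The decoding steps above can be sketched in Python as follows. Note one deliberate substitution: instead of listing the code-words and cosets first, this sketch finds a coset leader for each syndrome by visiting all words in order of increasing weight, which by Theorem 5.5 gives the same kind of table for small n. The function names and the tuple representation are assumptions of this illustration; H is the parity check matrix of the code with m = 3 used in Example 5.6.

```python
from itertools import product

def syndrome(H, w):
    """Compute H w^T over Z_2, returned as a tuple."""
    return tuple(sum(h * x for h, x in zip(row, w)) % 2 for row in H)

def leader_table(H):
    """Map each syndrome to a word of minimal weight with that syndrome."""
    n = len(H[0])
    table = {}
    # Words are visited in order of increasing weight, so the first word
    # seen with a given syndrome is a coset leader of its coset.
    for w in sorted(product((0, 1), repeat=n), key=sum):
        table.setdefault(syndrome(H, w), w)
    return table

def decode(H, table, w, m):
    """Add the coset leader to w, then keep the first m symbols."""
    a = table[syndrome(H, w)]
    v = tuple((x + y) % 2 for x, y in zip(w, a))
    return v[:m]

H = [
    [1, 0, 1, 1, 0, 0],
    [1, 1, 0, 0, 1, 0],
    [0, 1, 1, 0, 0, 1],
]
table = leader_table(H)
message = decode(H, table, (1, 1, 0, 1, 1, 1), 3)
```

The table has 2^{n−m} = 8 entries, one per coset. When a coset contains several words of minimal weight, which of them is stored depends on the order of the search, so the decoded message is then only one of several equally likely candidates.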

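Example 5.6. Let $C$ be the code with the generator matrix

\[ G = \begin{pmatrix} 1 & 0 & 0 & 1 & 1 & 0 \\ 0 & 1 & 0 & 0 & 1 & 1 \\ 0 & 0 & 1 & 1 & 0 & 1 \end{pmatrix}. \]

The corresponding parity check matrix is

\[ H = \begin{pmatrix} 1 & 0 & 1 & 1 & 0 & 0 \\ 1 & 1 & 0 & 0 & 1 & 0 \\ 0 & 1 & 1 & 0 & 0 & 1 \end{pmatrix}. \]

The elements of $C$ can be obtained as $xG$, $x \in \mathbb{Z}_2^3$. We obtain

\[ C = \{000000, 100110, 010011, 001101, 110101, 101011, 011110, 111000\}. \]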

Now we find coset leaders. We list the elements of $C$ in the first row of a table, and then we find an element of $\mathbb{Z}_2^n$ of minimal weight which is not listed. For example, we can take 100000 to be this element. We add this element to all the elements of $C$, and thus obtain the second row of our table. So after these two steps the table looks like this:

\[ \begin{array}{cccccccc}
000000 & 100110 & 010011 & 001101 & 110101 & 101011 & 011110 & 111000 \\
100000 & 000110 & 110011 & 101101 & 010101 & 001011 & 111110 & 011000
\end{array} \]

Next, we again find an element of minimal weight not already listed, and add it to all the elements of $C$, obtaining the third row of the table. We keep doing this until we exhaust all the elements of $\mathbb{Z}_2^n$. We obtain the following table:

\[ \begin{array}{cccccccc}
000000 & 100110 & 010011 & 001101 & 110101 & 101011 & 011110 & 111000 \\
100000 & 000110 & 110011 & 101101 & 010101 & 001011 & 111110 & 011000 \\
010000 & 110110 & 000011 & 011101 & 100101 & 111011 & 001110 & 101000 \\
001000 & 101110 & 011011 & 000101 & 111101 & 100011 & 010110 & 110000 \\
000100 & 100010 & 010111 & 001001 & 110001 & 101111 & 011010 & 111100 \\
000010 & 100100 & 010001 & 001111 & 110111 & 101001 & 011100 & 111010 \\
000001 & 100111 & 010010 & 001100 & 110100 & 101010 & 011111 & 111001 \\
100001 & 000111 & 110010 & 101100 & 010100 & 001010 & 111111 & 011001
\end{array} \]

It is easy to see that the rows of the table are cosets of $C$, and that the elements in the first column are coset leaders.

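Next, for each leader we calculate its syndrome. For example

\[ Ha_1^T = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}, \qquad Ha_2^T = \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix}, \]

where $a_1$ and $a_2$ are the first two coset leaders. The complete table of coset leaders and syndromes is as follows:

\[ \begin{array}{cc}
\text{coset leader} & \text{syndrome} \\ \hline
000000 & 000 \\
100000 & 110 \\
010000 & 011 \\
001000 & 101 \\
000100 & 100 \\
000010 & 010 \\
000001 & 001 \\
100001 & 111
\end{array} \]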

To decode, say, the word $w = 110111$ we first compute its syndrome:

\[ Hw^T = \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}. \]

The corresponding coset leader is 000010, and so the correct code-word is $110111 + 000010 = 110101$. Consequently, $w$ should be decoded as 110.

We give a second example, and decode the word $w = 110001$. Again, we first compute its syndrome:

\[ Hw^T = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}. \]

The corresponding coset leader is 000100, and so the correct code-word is $110001 + 000100 = 110101$. We decode $w$ as 110.
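The decoding method of this section can be sketched in a few lines of Python, using the matrices of Example 5.6. This is only an illustration: the function names are our own, and the leader table is built by scanning all words in order of increasing weight rather than by writing out the coset table. When a coset contains several words of minimal weight, the leader chosen here may differ from the one in the table above (this happens for the syndrome 111).

```python
from itertools import product

# Generator and parity check matrices from Example 5.6 (rows over Z_2).
G = [(1, 0, 0, 1, 1, 0),
     (0, 1, 0, 0, 1, 1),
     (0, 0, 1, 1, 0, 1)]
H = [(1, 0, 1, 1, 0, 0),
     (1, 1, 0, 0, 1, 0),
     (0, 1, 1, 0, 0, 1)]
n, m = 6, 3


def syndrome(w):
    """Return H w^T over Z_2 as a tuple of length n - m."""
    return tuple(sum(h * x for h, x in zip(row, w)) % 2 for row in H)


def build_leader_table():
    """Map each syndrome to a word of minimal weight having that syndrome."""
    table = {}
    # Words are visited in order of increasing weight, so the first word
    # seen with a given syndrome is a valid coset leader.
    for w in sorted(product((0, 1), repeat=n), key=sum):
        table.setdefault(syndrome(w), w)
    return table


LEADERS = build_leader_table()


def decode(word):
    """Decode a received word (a string of 0s and 1s) to its m message symbols."""
    w = tuple(int(ch) for ch in word)
    a = LEADERS[syndrome(w)]                       # steps 1) and 2)
    v = tuple((x + y) % 2 for x, y in zip(a, w))   # step 3)
    return "".join(str(x) for x in v[:m])          # step 4)
```

With these definitions, `decode("110111")` and `decode("110001")` reproduce the two decodings worked out in Example 5.6.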

6. Perfect codes

We have seen that a code is a device for transfer of information, potentially capable of detecting and correcting random errors in transmission. However, these two functions of a code are in conflict with one another. For example, any one-element code $C = \{u\}$, $u \in \mathbb{Z}_2^n$, can correct all errors, but cannot carry any information. At the other extreme, the full code $C = \mathbb{Z}_2^n$ can carry a lot of information, but cannot detect any errors, let alone correct them.

Recall that $\binom{n}{i} = \frac{n!}{i!(n-i)!}$ is the number of ways of choosing $i$ objects from a set of $n$ objects. The following theorem gives an upper bound for the number of code-words in a code of specified error correcting capabilities.

Theorem 6.1. For a code $C \subseteq \mathbb{Z}_2^n$ with minimum distance at least $2e + 1$ we have

\[ |C| \le 2^n \Big/ \sum_{i=0}^{e} \binom{n}{i}. \]

Proof. For each code-word $w \in C$ consider the 'ball' with centre $w$ and radius $e$:

\[ B(w, e) = \{u \in \mathbb{Z}_2^n : d(u, w) \le e\}. \]

We claim that for distinct $w_1, w_2 \in C$ we have

\[ B(w_1, e) \cap B(w_2, e) = \emptyset. \]

Indeed, $u \in B(w_1, e) \cap B(w_2, e)$ would imply

\[ d(w_1, w_2) \le d(w_1, u) + d(w_2, u) \le e + e < 2e + 1, \]

a contradiction.

Next note that

\[ B(w, e) = \bigcup_{i=0}^{e} \{u \in \mathbb{Z}_2^n : d(w, u) = i\}. \]

For each $i$, the set $\{u \in \mathbb{Z}_2^n : d(w, u) = i\}$ consists of all the words which differ from $w$ in exactly $i$ positions; clearly there are exactly $\binom{n}{i}$ such words. Hence

\[ |B(w, e)| = \sum_{i=0}^{e} \binom{n}{i}. \]

Since for all $w \in C$ the sets $B(w, e)$ are disjoint subsets of $\mathbb{Z}_2^n$ we conclude

\[ 2^n = |\mathbb{Z}_2^n| \ge \sum_{w \in C} |B(w, e)| = |C| \sum_{i=0}^{e} \binom{n}{i}, \]

and the desired inequality follows. □

Definition 6.2. A code $C \subseteq \mathbb{Z}_2^n$ is said to be perfect if it attains the equality in the previous theorem, i.e. if the minimum distance is $2e + 1$ and $|C| = 2^n \big/ \sum_{i=0}^{e} \binom{n}{i}$.
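The bound of Theorem 6.1 is easy to evaluate numerically. The following Python sketch (the function names are our own) computes the size of a ball and the resulting bound, and checks whether a given code meets the bound with equality. It does not verify the minimum distance, which has to be established separately.

```python
from math import comb


def ball_size(n, e):
    """Number of words of Z_2^n within Hamming distance e of a fixed word."""
    return sum(comb(n, i) for i in range(e + 1))


def hamming_bound(n, e):
    """Largest integer not exceeding 2^n / |B(w, e)| (Theorem 6.1)."""
    return 2 ** n // ball_size(n, e)


def meets_bound(code, e):
    """True if |C| * |B(w, e)| = 2^n; the distance condition is not checked."""
    n = len(next(iter(code)))
    return len(code) * ball_size(n, e) == 2 ** n
```

For instance, the code of Example 5.6 has $n = 6$, minimum distance 3 and 8 code-words, while `hamming_bound(6, 1)` is 9 and $2^6/7$ is not an integer, so that code is not perfect.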

We now show that perfect codes exist.

Example 6.3. Let $H$ be any $d \times (2^d - 1)$ matrix, the columns of which are all non-zero vectors from $\mathbb{Z}_2^d$, and let $C \subseteq \mathbb{Z}_2^{2^d - 1}$ be the code with parity check matrix $H$. It is obvious that every two columns of $H$ are linearly independent, and that one can find three linearly dependent columns. Therefore, by Theorem 3.10, the minimum distance is 3, and we can correct single errors. In the notation of Theorem 6.1 we have $n = 2^d - 1$ and $e = 1$. The generator matrix for $C$ has dimension $(n - d) \times n$, and so the number of code-words is

\[ 2^{n-d} = 2^n / 2^d = 2^n / (1 + n) = 2^n \Big/ \sum_{i=0}^{e} \binom{n}{i}. \]

Therefore, $C$ is a perfect code; it is called the $(2^d - 1, 2^d - d - 1)$ Hamming code.

Let us, for example, construct the $(7, 4)$ Hamming code, so that $d = 3$. We have the freedom of choice in which order to put the columns of $H$. With future use in mind, we opt for

\[ H = \begin{pmatrix} 0 & 1 & 1 & 1 & 1 & 0 & 0 \\ 1 & 1 & 0 & 1 & 0 & 1 & 0 \\ 1 & 0 & 1 & 1 & 0 & 0 & 1 \end{pmatrix}. \]

By Theorem 5.1 the corresponding generator matrix is

\[ G = \begin{pmatrix} 1 & 0 & 0 & 0 & 0 & 1 & 1 \\ 0 & 1 & 0 & 0 & 1 & 1 & 0 \\ 0 & 0 & 1 & 0 & 1 & 0 & 1 \\ 0 & 0 & 0 & 1 & 1 & 1 & 1 \end{pmatrix}, \]

and hence the code-words are

\[ \begin{aligned} C = \{ & 0000000, 1000011, 0100110, 0010101, 0001111, \\ & 1100101, 1010110, 1001100, 0110011, 0101001, 0011010, \\ & 1110000, 1101010, 1011001, 0111100, 1111111\}. \end{aligned} \]

The minimum weight of $C$ is 3, and hence the minimum distance is 3.

It is possible (and not too difficult) to show that Hamming codes are the only linear perfect codes with $e = 1$. However, there exist various non-linear perfect codes with $e = 1$.
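The claims made about the $(7, 4)$ Hamming code in Example 6.3 can be checked mechanically. The Python sketch below (helper names are our own) generates the code-words from $G$, confirms that each has zero syndrome with respect to $H$, and computes the minimum weight.

```python
from itertools import product

# Parity check and generator matrices of the (7, 4) Hamming code, Example 6.3.
H = [(0, 1, 1, 1, 1, 0, 0),
     (1, 1, 0, 1, 0, 1, 0),
     (1, 0, 1, 1, 0, 0, 1)]
G = [(1, 0, 0, 0, 0, 1, 1),
     (0, 1, 0, 0, 1, 1, 0),
     (0, 0, 1, 0, 1, 0, 1),
     (0, 0, 0, 1, 1, 1, 1)]


def encode(x):
    """Return the code-word xG over Z_2 for a message x of length 4."""
    return tuple(sum(xi * g for xi, g in zip(x, col)) % 2 for col in zip(*G))


code = {encode(x) for x in product((0, 1), repeat=4)}

# Every code-word must have zero syndrome H c^T.
zero_syndromes = all(sum(h * c for h, c in zip(row, word)) % 2 == 0
                     for word in code for row in H)

# Minimum weight of a non-zero code-word.
min_weight = min(sum(word) for word in code if any(word))
```

Since the code has 16 words and each ball of radius 1 in $\mathbb{Z}_2^7$ has $1 + 7 = 8$ elements, the equality $16 \cdot 8 = 2^7$ confirms that the bound of Theorem 6.1 is attained.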

For $e > 1$ the perfect codes are few and far between. For example, for $e = 2$, a perfect code $C \subseteq \mathbb{Z}_2^n$ must satisfy:

\[ |C| = 2^n \Big/ \sum_{i=0}^{2} \binom{n}{i} = 2^{n+1} / (n^2 + n + 2). \]

So we must have $n^2 + n + 2 = 2^a$ for some $a$. If we multiply both sides by 4 and set $x = 2n + 1$, $y = a + 2$, we obtain the equation

\[ x^2 + 7 = 2^y. \]

This equation is known as Nagell's equation, and its solutions are $(\pm 1, 3)$, $(\pm 3, 4)$, $(\pm 5, 5)$, $(\pm 11, 7)$ and $(\pm 181, 15)$ (this is non-trivial). Since $e = 2$ we must have $n \ge 2e + 1 = 5$, and so $2n + 1 = x \ge 11$. We see that the only possibilities are $n = 5$ and $n = 90$. For $n = 5$ we have the following
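A finite search cannot replace the proof that Nagell's equation has no further solutions, but it does confirm the listed ones. The following Python sketch (the function name and the search limit of 200 are our own choices) finds all positive solutions with $y \le 200$ and converts them into candidate lengths $n = (x - 1)/2$.

```python
from math import isqrt


def nagell_solutions(max_y):
    """Return all (x, y) with x > 0, 3 <= y <= max_y and x^2 + 7 = 2^y."""
    found = []
    for y in range(3, max_y + 1):      # 2^y - 7 is positive only for y >= 3
        target = 2 ** y - 7
        x = isqrt(target)
        if x * x == target:
            found.append((x, y))
    return found


solutions = nagell_solutions(200)

# x = 2n + 1, so each solution gives a candidate length n = (x - 1) / 2;
# only lengths n >= 2e + 1 = 5 are relevant for e = 2.
lengths = [(x - 1) // 2 for x, _ in solutions if (x - 1) // 2 >= 5]
```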

Example 6.4. Let $C = \{00000, 11111\} \subseteq \mathbb{Z}_2^5$. Here we have $|C| = 2$, $n = 5$, $e = 2$ and

\[ |C| = 2 = 32/16 = 2^5 / (1 + 5 + 10) = 2^5 \Big/ \left( \binom{5}{0} + \binom{5}{1} + \binom{5}{2} \right). \]

So $C$ is a (not very exciting) perfect code; it is called the 5-repetition code.

On the other hand there is no perfect code $C \subseteq \mathbb{Z}_2^{90}$ with $e = 2$; we will show this in Corollary 6.2 in Chapter 5.

Consider now the case $e = 3$; note that here we must have $n \ge 7$. This time we have

\[ |C| = 2^n \Big/ \sum_{i=0}^{3} \binom{n}{i} = 3 \cdot 2^{n+1} / (6 + 6n + 3n(n-1) + n(n-1)(n-2)). \]

If we put $m = n + 1$ we see that we must have

\[ m(m^2 - 3m + 8) = 3 \cdot 2^a \]

for some $a$. We have the following two cases.

Case 1: $m = 2^b$, $m^2 - 3m + 8 = 3 \cdot 2^c$ for some $b, c$. We have $n = 2^b - 1 \ge 7$, so we must have $b \ge 3$. If $b \ge 4$ we have $m^2 - 3m + 8 \equiv 8 \pmod{16}$, so that $c = 3$ and $m^2 - 3m + 8 = 24$. But this last equation has no integer solutions. For $b = 3$ we have $m = 8$, $c = 4$, $n = 7$ and $|C| = 2$. The 7-repetition code $\{0000000, 1111111\}$ is an example of this situation.

Case 2: $m = 3 \cdot 2^b$, $m^2 - 3m + 8 = 2^c$ for some $b, c$. We must have $b \ge 2$ as if $b = 1$ we get $n = 5$. The case $b \ge 4$ is eliminated as in Case 1. For $b = 2$ we have $m = 12$ and $m^2 - 3m + 8 = 116$, which is not a power of 2. For $b = 3$ we have $m = 24$, $n = 23$. An example of such a code is given below.
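The case analysis for $e = 3$ can be cross-checked numerically: a perfect code of length $n$ can exist only if $\sum_{i=0}^{3} \binom{n}{i}$ is a power of 2. The Python sketch below (the helper names and the search limit of 5000 are our own choices) lists the lengths $n \ge 7$ in that range which pass this test, and also checks the substitution $m = n + 1$ used above.

```python
from math import comb


def ball_size(n, e):
    """Number of words of Z_2^n within Hamming distance e of a fixed word."""
    return sum(comb(n, i) for i in range(e + 1))


def is_power_of_two(k):
    return k > 0 and k & (k - 1) == 0


# Lengths n >= 7 for which 2^n / |B(w, 3)| is an integer.
candidates = [n for n in range(7, 5001) if is_power_of_two(ball_size(n, 3))]

# The identity 6 * |B(w, 3)| = m (m^2 - 3m + 8) with m = n + 1.
identity_holds = all(
    6 * ball_size(n, 3) == (n + 1) * ((n + 1) ** 2 - 3 * (n + 1) + 8)
    for n in range(3, 200)
)
```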

Example 6.5. Consider the $(7, 4)$ Hamming code $H$, as defined in Example 6.3. Extend each code-word in $H$ by one component, which is equal to the sum of all the other components. Thus, for example, the code-word 1110000 is extended to 11100001, while the code-word 1101010 is extended to 11010100. The obtained code $\overline{H}$ has the minimum weight of a non-zero code-word equal to 4, as the minimum weight of $H$ is 3. Let $H^* \subseteq \mathbb{Z}_2^7$ be the code obtained from $H$ by reversing all the code-words, and let $\overline{H^*}$ be the code obtained from $H^*$ by adding to each code-word the extra component equal to the sum of all components. Finally, form a new code $\overline{C} \subseteq \mathbb{Z}_2^{24}$ as follows:

\[ \overline{C} = \{(a + x, b + x, a + b + x) : a, b \in \overline{H},\ x \in \overline{H^*}\}. \]

If $A$ and $X$ are bases for $\overline{H}$ and $\overline{H^*}$ respectively, then one may prove that the set

\[ \{(a, 0, a) : a \in A\} \cup \{(0, b, b) : b \in A\} \cup \{(x, x, x) : x \in X\} \]

is a basis for $\overline{C}$. In particular, $|\overline{C}| = 2^{12}$. Also, one may prove that the minimal weight of a non-zero code-word in $\overline{C}$ is equal to 8.

Now, delete the last component of every vector in $\overline{C}$ to obtain a new code $C \subseteq \mathbb{Z}_2^{23}$. It still has $2^{12}$ code-words, and the minimum weight of a non-zero code-word is 7, so that $C$ corrects up to three errors ($e = 3$). Now we have

\[ 2^n \Big/ \sum_{i=0}^{e} \binom{n}{i} = 2^{23} \Big/ \left( \binom{23}{0} + \binom{23}{1} + \binom{23}{2} + \binom{23}{3} \right) = 2^{23} / (1 + 23 + 253 + 1771) = 2^{23} / 2^{11} = 2^{12} = |C|, \]

and $C$ is perfect! The code $C$ is called the (binary) Golay code.
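The construction in Example 6.5 is small enough to carry out by brute force. The Python sketch below (all names are our own; `h_bar`, `h_star_bar` and `c_bar` stand for the extended Hamming code, the extended reversed code and the code of length 24) builds all code-words and computes the sizes and minimum weights that the example states without proof.

```python
from itertools import product

# Basis of the (7, 4) Hamming code from Example 6.3.
G = [(1, 0, 0, 0, 0, 1, 1),
     (0, 1, 0, 0, 1, 1, 0),
     (0, 0, 1, 0, 1, 0, 1),
     (0, 0, 0, 1, 1, 1, 1)]


def add(u, v):
    """Componentwise sum of two words over Z_2."""
    return tuple((a + b) % 2 for a, b in zip(u, v))


def span(basis):
    """All Z_2-linear combinations of the basis vectors."""
    words = {tuple([0] * len(basis[0]))}
    for b in basis:
        words |= {add(w, b) for w in words}
    return words


def extend(word):
    """Append one component equal to the sum of all the others."""
    return word + (sum(word) % 2,)


def min_weight(code):
    """Minimum weight of a non-zero code-word."""
    return min(sum(w) for w in code if any(w))


hamming = span(G)
h_bar = {extend(w) for w in hamming}               # extended Hamming code
h_star_bar = {extend(w[::-1]) for w in hamming}    # reversed, then extended

# Words (a + x, b + x, a + b + x) of length 24, then drop the last component.
c_bar = {add(a, x) + add(b, x) + add(add(a, b), x)
         for a, b, x in product(h_bar, h_bar, h_star_bar)}
golay = {w[:-1] for w in c_bar}
```

Evaluating `len(golay)` and `min_weight(golay)` gives the number of code-words and the minimum weight used in the final computation of the example.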

It actually turns out that for $e > 1$ the Golay code and the repetition codes are the only perfect codes that exist.

Chapter 3

Latin squares

1. Definition and existence

Let us consider the following problem posed by Euler in 1779.

The thirty six officers problem. There are six regiments, each having six officers, one of each of six possible ranks. Is it possible to parade these thirty six officers in a six by six pattern, so that every row and every column contain exactly one officer of each rank and exactly one member of each regiment?

Euler conjectured that the answer was negative. This was finally proved by Tarry in 1900 by a systematic examination of all possibilities. Today this can be done relatively easily using computers.

Let us consider the first condition in the problem: every row and every column should contain exactly one officer of each rank. Denote the ranks by 1, 2, 3, 4, 5, 6, and replace the officers by their ranks. We obtain a $6 \times 6$ array of numbers $\{1, 2, 3, 4, 5, 6\}$, such that every row and every column of the array contain each number exactly once.

Definition 1.1. A Latin square of order $n$ is an $n \times n$ array of numbers $\{1, 2, \dots, n\}$ (or some other $n$ symbols) in which every row and every column contains each number exactly once.

Do Latin squares exist? What are the possible orders of Latin squares? The following theorem answers these questions.

Theorem 1.2. Let $G = \{g_1, g_2, \dots, g_n\}$ be a finite group of order $n$. The multiplication table for $G$ is a Latin square. In particular, for each $n$ there exists a Latin square of order $n$.

Proof. We prove that an arbitrary row, corresponding to the element $g_i$, contains an arbitrary element $g_k$. The proof for columns is similar. Let $g_j = g_i^{-1} g_k$. Then the $(g_i, g_j)$ entry in the table is $g_i g_j = g_i g_i^{-1} g_k = g_k$. Since the row has $n$ entries and each of the $n$ elements of $G$ occurs in it, each element occurs exactly once.

The second statement follows from the first and the fact that for every $n$ there exists a group of order $n$ (e.g. $\mathbb{Z}_n$). □
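As an illustration of Theorem 1.2, the following Python sketch (function names are our own) builds the addition table of $\mathbb{Z}_n$ and checks the defining property of a Latin square directly. The symbols used are $0, 1, \dots, n - 1$ rather than $1, \dots, n$, which Definition 1.1 permits.

```python
def is_latin_square(square):
    """Check that every row and every column contains each symbol exactly once."""
    n = len(square)
    symbols = set(square[0])
    if len(symbols) != n:
        return False
    rows_ok = all(len(row) == n and set(row) == symbols for row in square)
    cols_ok = all(set(col) == symbols for col in zip(*square))
    return rows_ok and cols_ok


def cyclic_table(n):
    """Addition table of the group Z_n = {0, 1, ..., n - 1}."""
    return [[(i + j) % n for j in range(n)] for i in range(n)]
```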

2. Counting Latin squares

In this section we will prove that there are 'many' Latin squares of order $n$. To do this, we need to make a significant detour.

Let $A_1, \dots, A_n$ be sets. A system of distinct representatives (SDR for short) for these sets is an $n$-tuple $(x_1, \dots, x_n)$ of elements with the properties:



• $x_i \in A_i$ for $i = 1, \dots, n$ (so that $x_i$ is a representative of $A_i$);

• $x_i \ne x_j$ for $i \ne j$ (so that all representatives are distinct).

A system of distinct representatives therefore contains one element from each set $A_i$ with $1 \le i \le n$, and these elements are all different.

Example 2.1. Let $A_1 := \{1, 2, 3, 4\}$, $A_2 := \{2, 4, 7\}$, and $A_3 := \{3, 4, 7\}$. There are many different SDRs for these three sets. Some are:

(1, 2, 3), (1, 2, 4), (1, 2, 7), (1, 4, 3), (1, 4, 7), (1, 7, 3), (1, 7, 4).
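For small families, all SDRs can be listed by brute force. The Python sketch below (the function name is our own) runs through every choice of one element per set and keeps the choices with distinct entries; applied to Example 2.1 it shows that the seven SDRs listed are exactly those whose first representative is 1.

```python
from itertools import product


def sdrs(sets):
    """Yield every system of distinct representatives for the given sets."""
    for choice in product(*(sorted(s) for s in sets)):
        if len(set(choice)) == len(choice):   # all representatives distinct
            yield choice


A1, A2, A3 = {1, 2, 3, 4}, {2, 4, 7}, {3, 4, 7}
all_sdrs = list(sdrs([A1, A2, A3]))
starting_with_1 = [s for s in all_sdrs if s[0] == 1]
```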

Theorem 2.2. Let $(A_1, \dots, A_n)$ be finite sets of size at least $r$ satisfying

\[ \Bigl| \bigcup_{j \in J} A_j \Bigr| \ge |J| \quad \text{for all } J \subseteq \{1, \dots, n\}. \tag{$*$} \]

The number of SDRs for this family is at least

\[ \begin{cases} r! & \text{if } r \le n, \\ r(r-1) \cdots (r-n+1) & \text{if } r > n. \end{cases} \]

Proof. Omitted. This is a version of Hall's Marriage Theorem, which is proved in MT4514 Graph Theory. □

Theorem 2.3. Let $(A_1, \dots, A_n)$ be a family of subsets of $\{1, \dots, n\}$, and let $r$ be a positive integer. If each of the sets $A_i$ has size $r$ and if each element of $\{1, \dots, n\}$ is contained in exactly $r$ sets, then the family $(A_1, \dots, A_n)$ has at least $r!$ SDRs.

Proof. We prove that $(A_1, \dots, A_n)$ satisfies $(*)$; the result then follows from Theorem 2.2.

For an arbitrary $J \subseteq \{1, \dots, n\}$ we count in two different ways the number of pairs $(j, x)$ where $j \in J$ and $x \in A_j$. There are $|J|$ choices for $j$, and, having chosen $j$, there are $|A_j| = r$ choices for $x$. So there are precisely $r|J|$ such pairs. On the other hand, there are $|\bigcup_{j \in J} A_j|$ choices for $x$, and, having chosen $x$, there are at most $r$ possible choices for $j$, since $x$ lies in precisely $r$ sets. We conclude that $r|J| \le r|\bigcup_{j \in J} A_j|$, implying $(*)$, as required. □

Let us now return to our problem of counting Latin squares. The idea is to build a Latin square row by row, and to count how many choices for adding each new row we have. To this end we introduce a notion of a Latin rectangle: it is a $k \times n$ array (with $k \le n$) with entries from $\{1, \dots, n\}$ such that each entry occurs precisely once in each row and at most once in each column.

Lemma 2.4. Given a $k \times n$ Latin rectangle with $k < n$, there are at least $(n - k)!$ ways to add a row to form a $(k + 1) \times n$ Latin rectangle.

Proof. Let $A_i$ be the set of all entries not appearing in the $i$th column. We see that $(x_1, \dots, x_n)$ is a possible $(k + 1)$st row if and only if $x_i \in A_i$ and $x_i \ne x_j$ for $i \ne j$, i.e. if and only if $(x_1, \dots, x_n)$ is an SDR for $(A_1, \dots, A_n)$.

Now, clearly each set $A_i$ has size $n - k$. Also, a fixed entry $x$ appears precisely $k$ times (once in each row), and so it belongs to precisely $n - k$ sets. The conditions of Theorem 2.3 are fulfilled for $r = n - k$, and the lemma follows. □

Theorem 2.5. The number of Latin squares of order $n$ is at least $n!(n-1)! \cdots 2!\,1!$.

Proof. There are $n!$ choices for the first row; having chosen it, there are at least $(n - 1)!$ choices for the second row, etc. □
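For very small orders the row-by-row idea can be turned into an exact count, which gives a feeling for how far the bound of Theorem 2.5 is from the truth. The following Python sketch (function names are our own) extends Latin rectangles one row at a time, exactly as in the proof, and is practical only for small $n$.

```python
from itertools import permutations
from math import factorial


def count_latin_squares(n):
    """Count Latin squares of order n by adding one row at a time."""
    rows = list(permutations(range(n)))

    def extend(rectangle):
        if len(rectangle) == n:
            return 1
        total = 0
        for row in rows:
            # A new row is allowed if it repeats no earlier entry in any column.
            if all(row[c] != prev[c] for prev in rectangle for c in range(n)):
                total += extend(rectangle + [row])
        return total

    return extend([])


def lower_bound(n):
    """The bound n! (n-1)! ... 2! 1! of Theorem 2.5."""
    result = 1
    for k in range(1, n + 1):
        result *= factorial(k)
    return result
```

For $n = 3$ the bound is attained (12 squares), while for $n = 4$ the bound is 288 and the exact count is 576.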


3. Orthogonality

Let us analyse the thirty six officers problem in more detail. We have already considered the ranks of the officers. The second requirement is that every row and every column contain one officer from each regiment. So if we denote each regiment by 1, 2, 3, 4, 5, 6, and replace each officer by the number of its regiment we obtain another Latin square. Thus we have two Latin squares: $L_1$ representing the ranks and $L_2$ representing the regiments. These two Latin squares are related by the condition that every regiment has one officer of each rank.

Let us put the square $L_2$ over $L_1$, so that in each cell we can see a pair of numbers. Now, it cannot happen that a pair $(i, j)$ occurs twice, as it would mean that the regiment $j$ has two officers of rank $i$. Since there are 36 cells and 36 pairs of numbers $\{1, 2, 3, 4, 5, 6\}$, we conclude that each pair must occur exactly once.

Definition 3.1. Two Latin squares $A = (a_{ij})_{n \times n}$ and $B = (b_{ij})_{n \times n}$ are orthogonal if the set $\{(a_{ij}, b_{ij}) : 1 \le i, j \le n\}$ contains all possible pairs.

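Example 3.2. The following two Latin squares are orthogonal:

\[ \begin{array}{cccc} 1 & 2 & 3 & 4 \\ 2 & 1 & 4 & 3 \\ 3 & 4 & 1 & 2 \\ 4 & 3 & 2 & 1 \end{array} \qquad\qquad \begin{array}{cccc} 1 & 4 & 3 & 2 \\ 3 & 2 & 1 & 4 \\ 2 & 3 & 4 & 1 \\ 4 & 1 & 2 & 3 \end{array} \]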

One may ask for which values of $n$ there exist orthogonal squares of order $n$. It is clear that they do not exist for $n = 2$. Also, Tarry's solution to the thirty six officers problem means that there are no orthogonal Latin squares of order 6. On the other hand we shall prove in Section 5 that if $n \not\equiv 2 \pmod 4$ then there exist orthogonal Latin squares of order $n$. Euler conjectured that the converse was also true: if $n \equiv 2 \pmod 4$ then orthogonal squares of order $n$ do not exist. However, he was wrong: Bose, Shrikhande and Parker proved in 1960 that for every $n$, except for $n = 2$ and $n = 6$, orthogonal squares exist.

Another interesting question that one may ask is the following.

Question 3.3. What is the maximal number of mutually orthogonal Latin squares of order $n$? (Latin squares $A_1, A_2, \dots, A_k$ are mutually orthogonal if each pair $A_i$ and $A_j$, $i \ne j$, are orthogonal.)

The following notion will be useful in considering the above question.

Definition 3.4. A Latin square $A = (a_{ij})_{n \times n}$ is in standard form if its first row is $1\,2\,3 \dots n$.

It is clear that every Latin square $A$ can be standardised by reordering the symbols in it; we denote the resulting square by $A^*$.

Example 3.5. The first square $A$ in Example 3.2 is in standard form. The second square $B$ in the same example has standardisation

\[ B^* = \begin{array}{cccc} 1 & 2 & 3 & 4 \\ 3 & 4 & 1 & 2 \\ 4 & 3 & 2 & 1 \\ 2 & 1 & 4 & 3 \end{array} \]

Note that $A$ and $B^*$ are orthogonal.
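Both orthogonality and standardisation are easy to check by machine. In the Python sketch below (function names are our own) orthogonality is tested by collecting the $n^2$ superimposed pairs into a set, and standardisation renames the symbols according to the first row; the squares are those of Example 3.2.

```python
def are_orthogonal(a, b):
    """True if superimposing a and b yields every ordered pair exactly once."""
    n = len(a)
    pairs = {(a[i][j], b[i][j]) for i in range(n) for j in range(n)}
    return len(pairs) == n * n


def standardise(square):
    """Rename the symbols so that the first row becomes 1, 2, ..., n."""
    rename = {symbol: j + 1 for j, symbol in enumerate(square[0])}
    return [[rename[s] for s in row] for row in square]


A = [[1, 2, 3, 4], [2, 1, 4, 3], [3, 4, 1, 2], [4, 3, 2, 1]]
B = [[1, 4, 3, 2], [3, 2, 1, 4], [2, 3, 4, 1], [4, 1, 2, 3]]
B_star = standardise(B)
```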

Lemma 3.6. If $A$ and $B$ are orthogonal Latin squares, then so are $A^*$ and $B^*$.

Proof. Let $A = (a_{ij})_{n \times n}$ and $B = (b_{ij})_{n \times n}$. Standardisation of $A$ is achieved by means of a permutation $\sigma$ of the set $\{1, 2, \dots, n\}$, so that $A^* = (\sigma(a_{ij}))_{n \times n}$. Similarly, we have $B^* = (\tau(b_{ij}))_{n \times n}$ for some other permutation $\tau$. Assume that $A^*$ and $B^*$ are not orthogonal. This means that among the pairs $(\sigma(a_{ij}), \tau(b_{ij}))$ $(1 \le i, j \le n)$ at least one pair occurs twice. Thus we have

\[ (\sigma(a_{ij}), \tau(b_{ij})) = (\sigma(a_{kl}), \tau(b_{kl})), \]

for some $(i, j) \ne (k, l)$. Since $\sigma$ and $\tau$ are permutations, this implies $a_{ij} = a_{kl}$ and $b_{ij} = b_{kl}$, which contradicts the fact that $A$ and $B$ are orthogonal. □

The following theorem gives an upper bound for the maximal number of mutually orthogonal Latin squares of order $n$.

Theorem 3.7. If $A_1, A_2, \dots, A_m$ are mutually orthogonal Latin squares of order $n$ then $m \le n - 1$.

Proof. Let $A_k = (a^{(k)}_{ij})_{n \times n}$. By Lemma 3.6 we may assume that all $A_1, \dots, A_m$ are in standard form (otherwise we standardise them, without affecting orthogonality), i.e.

\[ a^{(k)}_{1j} = j. \]

Consider the set

\[ S = \{(i, j, k) : a^{(k)}_{ij} = 1\}. \]

Clearly, the number of elements of $S$ is equal to the total number of 1's in $A_1, \dots, A_m$, so that

\[ |S| = nm. \tag{3.1} \]

Consider a triple $(i, j, k) \in S$. Each of the squares has 1 in the position $(1, 1)$. Hence, if $i = j = 1$ then $k$ can be arbitrary. Also, no other entry in the position $(1, j)$ or $(i, 1)$ can be 1, so that we cannot have one of $i$ and $j$ being equal to 1 and the other one not. Finally, if $i \ne 1$ and $j \ne 1$ then, because of orthogonality, there may exist at most one $k$ such that $(i, j, k) \in S$. We conclude that

\[ |S| \le m + (n - 1)^2. \tag{3.2} \]

Combining (3.1) and (3.2) we obtain $m \le n - 1$, as required. □

4. Latin squares from finite fields

Theorem 3.7 gives no indication about the sharpness of the given bound. Here we show that for infinitely many $n$ there are sets of $n - 1$ mutually orthogonal Latin squares, namely whenever $n$ is a prime power. To do so we introduce a method of constructing orthogonal Latin squares from finite fields.

Theorem 4.1. If $n = p^t$, where $p$ is a prime and $t \ge 1$, then there exist $n - 1$ mutually orthogonal Latin squares of order $n$.

Proof. By the Fundamental Theorem for Finite Fields (Theorem 2.5 in Chapter 1) there exists a finite field $F = \{f_1, f_2, \dots, f_n = 0\}$ of order $n$. Define $n - 1$ arrays $A_k = (a^{(k)}_{ij})_{n \times n}$, $1 \le k \le n - 1$, with elements from $F$ by setting

\[ a^{(k)}_{ij} = f_i f_k + f_j. \]

First we prove that each $A_k$ is a Latin square. Assume that two elements $a^{(k)}_{ij_1}$ and $a^{(k)}_{ij_2}$ in the $i$th row are equal. This means that

\[ f_i f_k + f_{j_1} = f_i f_k + f_{j_2}, \]

so that $f_{j_1} = f_{j_2}$, and hence $j_1 = j_2$. Similarly, if $a^{(k)}_{i_1 j} = a^{(k)}_{i_2 j}$, we have

\[ f_{i_1} f_k + f_j = f_{i_2} f_k + f_j \Rightarrow f_{i_1} f_k = f_{i_2} f_k \Rightarrow f_{i_1} = f_{i_2} \Rightarrow i_1 = i_2, \]

since $F$ is a field and $f_k \ne 0$.

We complete the proof by showing that $A_k$ and $A_l$ are orthogonal for arbitrary $k, l$, with $k \ne l$. Assume that they are not; this means that

\[ (a^{(k)}_{i_1 j_1}, a^{(l)}_{i_1 j_1}) = (a^{(k)}_{i_2 j_2}, a^{(l)}_{i_2 j_2}), \]

for some $(i_1, j_1) \ne (i_2, j_2)$, yielding the system

\[ f_{i_1} f_k + f_{j_1} = f_{i_2} f_k + f_{j_2} \tag{3.3} \]
\[ f_{i_1} f_l + f_{j_1} = f_{i_2} f_l + f_{j_2}. \tag{3.4} \]

Subtracting (3.4) from (3.3) we obtain

\[ f_{i_1}(f_k - f_l) = f_{i_2}(f_k - f_l). \]

Since $k \ne l$, we have $f_k - f_l \ne 0$, so that $f_{i_1} = f_{i_2}$ and hence $i_1 = i_2$. Substituting $f_{i_1} = f_{i_2}$ back into (3.3) we obtain $f_{j_1} = f_{j_2}$, and hence $j_1 = j_2$, a contradiction. Therefore, $A_k$ and $A_l$ are orthogonal. □

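Example 4.2. Let us use the finite field $\mathbb{Z}_5$ to construct 4 mutually orthogonal Latin squares of order 5. First, we let

\[ f_1 = 1, \quad f_2 = 2, \quad f_3 = 3, \quad f_4 = 4, \quad f_5 = 0. \]

The first Latin square $A_1 = (a^{(1)}_{ij})_{5 \times 5}$ is given by $a^{(1)}_{ij} = f_i + f_j$:

\[ \begin{array}{cc|ccccc}
 & j & 1 & 2 & 3 & 4 & 5 \\
i & f_i \backslash f_j & 1 & 2 & 3 & 4 & 0 \\ \hline
1 & 1 & 2 & 3 & 4 & 0 & 1 \\
2 & 2 & 3 & 4 & 0 & 1 & 2 \\
3 & 3 & 4 & 0 & 1 & 2 & 3 \\
4 & 4 & 0 & 1 & 2 & 3 & 4 \\
5 & 0 & 1 & 2 & 3 & 4 & 0
\end{array} \]

Similarly, the second Latin square $A_2 = (a^{(2)}_{ij})_{5 \times 5}$ is given by $a^{(2)}_{ij} = 2f_i + f_j$:

\[ \begin{array}{ccc|ccccc}
 & & j & 1 & 2 & 3 & 4 & 5 \\
i & f_i & 2f_i \backslash f_j & 1 & 2 & 3 & 4 & 0 \\ \hline
1 & 1 & 2 & 3 & 4 & 0 & 1 & 2 \\
2 & 2 & 4 & 0 & 1 & 2 & 3 & 4 \\
3 & 3 & 1 & 2 & 3 & 4 & 0 & 1 \\
4 & 4 & 3 & 4 & 0 & 1 & 2 & 3 \\
5 & 0 & 0 & 1 & 2 & 3 & 4 & 0
\end{array} \]

Repeating similar calculations for $A_3$ and $A_4$ we obtain the squares:

\[ A_1 = \begin{array}{ccccc} 2 & 3 & 4 & 0 & 1 \\ 3 & 4 & 0 & 1 & 2 \\ 4 & 0 & 1 & 2 & 3 \\ 0 & 1 & 2 & 3 & 4 \\ 1 & 2 & 3 & 4 & 0 \end{array}, \qquad A_2 = \begin{array}{ccccc} 3 & 4 & 0 & 1 & 2 \\ 0 & 1 & 2 & 3 & 4 \\ 2 & 3 & 4 & 0 & 1 \\ 4 & 0 & 1 & 2 & 3 \\ 1 & 2 & 3 & 4 & 0 \end{array} \]

\[ A_3 = \begin{array}{ccccc} 4 & 0 & 1 & 2 & 3 \\ 2 & 3 & 4 & 0 & 1 \\ 0 & 1 & 2 & 3 & 4 \\ 3 & 4 & 0 & 1 & 2 \\ 1 & 2 & 3 & 4 & 0 \end{array}, \qquad A_4 = \begin{array}{ccccc} 0 & 1 & 2 & 3 & 4 \\ 4 & 0 & 1 & 2 & 3 \\ 3 & 4 & 0 & 1 & 2 \\ 2 & 3 & 4 & 0 & 1 \\ 1 & 2 & 3 & 4 & 0 \end{array} \]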
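The construction of Theorem 4.1 is straightforward to program when the field is $\mathbb{Z}_p$ for a prime $p$. The Python sketch below (function names are our own) uses the ordering $f_1 = 1, \dots, f_{p-1} = p - 1, f_p = 0$ of Example 4.2; it does not cover fields of order $p^t$ with $t > 1$, for which field arithmetic is no longer arithmetic modulo $n$.

```python
def field_squares(p):
    """Squares A_1, ..., A_{p-1} over Z_p (p prime), with a_ij = f_i f_k + f_j."""
    f = [i % p for i in range(1, p + 1)]     # f_1, ..., f_p with f_p = 0
    return [[[(f[i] * f[k] + f[j]) % p for j in range(p)] for i in range(p)]
            for k in range(p - 1)]


def are_orthogonal(a, b):
    """True if superimposing a and b yields every ordered pair exactly once."""
    n = len(a)
    return len({(a[i][j], b[i][j]) for i in range(n) for j in range(n)}) == n * n


squares = field_squares(5)
mutually_orthogonal = all(are_orthogonal(squares[k], squares[l])
                          for k in range(4) for l in range(k + 1, 4))
```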

At present it is not known whether the converse of Theorem 4.1 holds, i.e. whether from the existence of $n - 1$ mutually orthogonal squares of order $n$ it follows that $n$ is a prime power.

5. Direct products

Direct products are a means of constructing new Latin squares from existing ones.

Definition 5.1. Let $A = (a_{ij})_{m \times m}$ and $B = (b_{ij})_{n \times n}$ be two Latin squares. Their direct product $C = A \times B$ is an $mn \times mn$ array, indexed by the elements of $\{1, \dots, m\} \times \{1, \dots, n\}$ and with entries

\[ c_{(i,j),(k,l)} = (a_{ik}, b_{jl}). \]

Theorem 5.2. The direct product of two Latin squares is again a Latin square.

Proof. Let $A = (a_{ij})_{m \times m}$ and $B = (b_{ij})_{n \times n}$ be two Latin squares, and let $C$ be their direct product. Assume that in the row indexed by $(i, j)$ we have two identical entries:

\[ c_{(i,j),(k_1,l_1)} = c_{(i,j),(k_2,l_2)}. \]

This means that

\[ a_{ik_1} = a_{ik_2}, \qquad b_{jl_1} = b_{jl_2}. \]

Since $A$ and $B$ are Latin squares we have $k_1 = k_2$ and $l_1 = l_2$. The proof for the columns is analogous. □

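Example 5.3. Consider the following two Latin squares

\[ \begin{array}{cc} 1 & 2 \\ 2 & 1 \end{array} \qquad\qquad \begin{array}{ccc} 2 & 3 & 1 \\ 3 & 1 & 2 \\ 1 & 2 & 3 \end{array} \]

Their direct product, according to the definition, is

\[ \begin{array}{c|cccccc}
 & (1,1) & (1,2) & (1,3) & (2,1) & (2,2) & (2,3) \\ \hline
(1,1) & (1,2) & (1,3) & (1,1) & (2,2) & (2,3) & (2,1) \\
(1,2) & (1,3) & (1,1) & (1,2) & (2,3) & (2,1) & (2,2) \\
(1,3) & (1,1) & (1,2) & (1,3) & (2,1) & (2,2) & (2,3) \\
(2,1) & (2,2) & (2,3) & (2,1) & (1,2) & (1,3) & (1,1) \\
(2,2) & (2,3) & (2,1) & (2,2) & (1,3) & (1,1) & (1,2) \\
(2,3) & (2,1) & (2,2) & (2,3) & (1,1) & (1,2) & (1,3)
\end{array} \]

After renumbering this becomes

\[ \begin{array}{cccccc}
2 & 3 & 1 & 5 & 6 & 4 \\
3 & 1 & 2 & 6 & 4 & 5 \\
1 & 2 & 3 & 4 & 5 & 6 \\
5 & 6 & 4 & 2 & 3 & 1 \\
6 & 4 & 5 & 3 & 1 & 2 \\
4 & 5 & 6 & 1 & 2 & 3
\end{array} \]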

The above example suggests an alternative way of constructing the direct product of two Latin squares $A = (a_{ij})_{m \times m}$ and $B = (b_{ij})_{n \times n}$. We let $B_1, B_2, \dots, B_m$ be copies of $B$, each with its own set of symbols. Then we replace the entries of $A$ by the corresponding $B_i$'s. In other words we form the $m \times m$ array of blocks $(B_{a_{ij}})_{m \times m}$, and this array is the direct product of $A$ and $B$.
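Definition 5.1 translates directly into code. In the Python sketch below (function names are our own) rows and columns are indexed by the pairs $(i, j)$ in lexicographic order, and the renumbering replaces the pair $(s, t)$ by $n(s - 1) + t$, which is the renumbering used in Example 5.3.

```python
def direct_product(a, b):
    """Direct product of Latin squares a (order m) and b (order n).

    Rows and columns are indexed by pairs in lexicographic order; the entry
    in row (i, j) and column (k, l) is the pair (a[i][k], b[j][l]).
    """
    m, n = len(a), len(b)
    return [[(a[i][k], b[j][l]) for k in range(m) for l in range(n)]
            for i in range(m) for j in range(n)]


def renumber(square, n):
    """Replace each pair (s, t) by the single symbol n * (s - 1) + t."""
    return [[n * (s - 1) + t for s, t in row] for row in square]


A = [[1, 2], [2, 1]]
B = [[2, 3, 1], [3, 1, 2], [1, 2, 3]]
C = renumber(direct_product(A, B), 3)
```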

Next we prove that direct products preserve orthogonality.

Theorem 5.4. If $A$ and $B$ are orthogonal Latin squares of order $m$, and if $C$ and $D$ are orthogonal Latin squares of order $n$, then $A \times C$ and $B \times D$ are also orthogonal Latin squares.

Proof. Let $A = (a_{ij})_{m \times m}$, $B = (b_{ij})_{m \times m}$, $C = (c_{ij})_{n \times n}$, $D = (d_{ij})_{n \times n}$, $X = A \times C$, $Y = B \times D$. Assume that $X$ and $Y$ are not orthogonal. This means that

\[ x_{(i_1,j_1),(k_1,l_1)} = x_{(i_2,j_2),(k_2,l_2)}, \]

\[ y_{(i_1,j_1),(k_1,l_1)} = y_{(i_2,j_2),(k_2,l_2)}, \]

for two distinct positions $((i_1, j_1), (k_1, l_1))$ and $((i_2, j_2), (k_2, l_2))$. From the definition of the direct product we have

\[ a_{i_1 k_1} = a_{i_2 k_2} \tag{3.5} \]
\[ c_{j_1 l_1} = c_{j_2 l_2} \tag{3.6} \]
\[ b_{i_1 k_1} = b_{i_2 k_2} \tag{3.7} \]
\[ d_{j_1 l_1} = d_{j_2 l_2}. \tag{3.8} \]

From (3.5), (3.7) and the fact that $A$ and $B$ are orthogonal we deduce that $i_1 = i_2$ and $k_1 = k_2$. Similarly, from the other two equations and the orthogonality of $C$ and $D$ we deduce that $j_1 = j_2$ and $l_1 = l_2$. Thus the two positions coincide, a contradiction. □

Corollary 5.5. If $n \not\equiv 2 \pmod 4$ then there exists a pair of orthogonal Latin squares of order $n$.

Proof. Let $n = p_1^{\alpha_1} p_2^{\alpha_2} \cdots p_k^{\alpha_k}$ be the decomposition of $n$ into a product of primes, with $p_1 < p_2 < \dots < p_k$. Since $n \not\equiv 2 \pmod 4$ it follows that if $n$ is odd then $p_1^{\alpha_1} \ge 3$ and if $n$ is even then $p_1^{\alpha_1} \ge 4$. Thus in either case $p_1^{\alpha_1} > 2$, and so $p_i^{\alpha_i} > 2$ for every $i$. By Theorem 4.1, for each $i$ ($1 \le i \le k$) there exists a pair $A_i$, $B_i$ of orthogonal Latin squares of order $p_i^{\alpha_i}$. But then the Latin squares $A = A_1 \times \dots \times A_k$ and $B = B_1 \times \dots \times B_k$ are orthogonal by Theorem 5.4 and have order $n$. □
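The proof of Corollary 5.5 is constructive, and for $n = 15 = 3 \cdot 5$ it can be followed step by step. The Python sketch below (function names are our own) takes the pairs $i + j$ and $2i + j$ over $\mathbb{Z}_3$ and $\mathbb{Z}_5$, which is the construction of Theorem 4.1 with $k = 1, 2$ but with the field elements listed as $0, 1, \dots, p - 1$, and forms their direct products.

```python
def field_pair(p):
    """Two orthogonal Latin squares of prime order p > 2: i + j and 2i + j mod p."""
    first = [[(i + j) % p for j in range(p)] for i in range(p)]
    second = [[(2 * i + j) % p for j in range(p)] for i in range(p)]
    return first, second


def direct_product(a, b):
    """Direct product of Latin squares, indexed by pairs in lexicographic order."""
    m, n = len(a), len(b)
    return [[(a[i][k], b[j][l]) for k in range(m) for l in range(n)]
            for i in range(m) for j in range(n)]


def are_orthogonal(a, b):
    """True if superimposing a and b yields every ordered pair exactly once."""
    n = len(a)
    return len({(a[i][j], b[i][j]) for i in range(n) for j in range(n)}) == n * n


A1, B1 = field_pair(3)
A2, B2 = field_pair(5)
A, B = direct_product(A1, A2), direct_product(B1, B2)
```

Here `are_orthogonal(A, B)` checks the conclusion of Theorem 5.4 for this particular pair of order 15.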

Chapter 4

Finite geometries

1. Finite affine planes

In standard (euclidean) plane geometry one deals with a basic set $\mathcal{P}$, the elements of which are called points, and a set $\mathcal{L}$ of certain subsets of $\mathcal{P}$ called lines. The relationship between points and lines is specified by means of a series of axioms, which are a short set of rules that define the system. Certain axioms, such as the one requiring the existence of a point between any two distinct points, force every line to consist of infinitely many points.

One may consider systems which do not insist that all of the axioms hold, but instead require only a subset of them, or a different collection of axioms entirely. Here we consider the following such system:

(A1) Any two distinct points $A$ and $B$ of $\mathcal{P}$ are contained in one and only one line from $\mathcal{L}$. (This line is usually denoted by $l(A, B)$.)

(A2) There are four distinct points in $\mathcal{P}$, no three of which lie on the same line.

(A3) For any line $l \in \mathcal{L}$ and any point $P \in \mathcal{P}$ not belonging to $l$ there exists a unique line $p \in \mathcal{L}$ such that $P \in p$ and $p \cap l = \emptyset$.

Definition 1.1. If $\mathcal{P}$ is a set and $\mathcal{L}$ is a set of subsets of $\mathcal{P}$, and if axioms (A1), (A2), (A3) are satisfied, then the structure $(\mathcal{P}, \mathcal{L})$ is called an affine plane.

It turns out that $\mathcal{P}$ is not necessarily infinite. For example, if $\mathcal{P} = \{1, 2, 3, 4\}$ and $\mathcal{L} = \{\{1, 2\}, \{2, 3\}, \{3, 4\}, \{1, 4\}, \{1, 3\}, \{2, 4\}\}$, then $(\mathcal{P}, \mathcal{L})$ is an affine plane; it can be represented as in Figure 4.1.

Figure 4.1: An affine plane of order 4

Figure 4.2: Proof of Lemma 1.3

Definition 1.2. If $(\mathcal{P}, \mathcal{L})$ is an affine plane and $\mathcal{P}$ is finite then $(\mathcal{P}, \mathcal{L})$ is a finite affine plane. The number of elements of $\mathcal{P}$ is called the order of the plane.

We have the notion of parallelism in affine planes. We say that two lines $p$ and $q$ are parallel if $p = q$ or $p \cap q = \emptyset$. It is easy to show that this is an equivalence relation. Axiom (A3) can now be reformulated as: for each line $l$ and each point $P$ there exists a unique line $p$ which contains $P$ and is parallel to $l$.

For what follows, we will need the following easy lemma.

Lemma 1.3. Let $P$ be a point and $l$ be a line such that $P \notin l$. If the number of lines containing $P$ is $x$, and if the number of points in $l$ is $y$ then $x = y + 1$.

Proof. Of the $x$ lines containing $P$, exactly one is parallel to $l$ (by (A3)), and hence has the empty intersection with it. So there are $x - 1$ lines containing $P$ and crossing $l$. If $p_1$ and $p_2$ are any two such lines, and if we write $L_1 = p_1 \cap l$, $L_2 = p_2 \cap l$, then $L_1 \ne L_2$, for otherwise we would have $p_1 = l(P, L_1) = l(P, L_2) = p_2$ by (A1). Therefore $x - 1 \le y$.

Conversely, if $L_1, \dots, L_y$ are all the points of $l$, then clearly all the lines $l_i = l(P, L_i)$ are distinct and cross $l$. In addition there is one line containing $P$ which has a trivial intersection with $l$, and so $x \ge y + 1$. (The proof is illustrated in Figure 4.2.) □

As in the case of Latin squares one may ask the following question: for which numbers $n$ does there exist a finite affine plane of order $n$? The following theorem shows that $n$ certainly has to be a square.

Theorem 1.4. Let $(\mathcal{P}, \mathcal{L})$ be a finite affine plane. Then there exists a number $m$ such that the following statements hold:

(i) every point lies on exactly $m + 1$ lines;

(ii) every line contains exactly $m$ points;

(iii) there are exactly $m^2$ points;

(iv) there are exactly $m^2 + m$ lines;

(v) for any line there are $m$ lines parallel to it.


Proof. (i) We need only prove that the number of lines through a point is the same for all points, as then we can denote that number by $m + 1$ and the result will follow.

Let $P$ and $Q$ be any two distinct points, and let $l$ be a line not containing any of them; existence of $l$ is ensured by (A2). Then the number of lines passing through $P$ is one greater than the number of points on $l$ by Lemma 1.3. But, by the same lemma, so is the number of lines passing through $Q$. Therefore, the number of lines passing through a point is the same for all points, as required.

(ii) This part is a direct consequence of (i) and Lemma 1.3.

(iii) and (iv) Let $x$ be the total number of points, and let $y$ be the total number of lines. If we count the points on each of $y$ lines, we will count $ym$ points by (ii). On the other hand, every point lies on $m + 1$ lines by (i), and so has been counted $m + 1$ times. Thus we have

\[ \frac{ym}{m + 1} = x. \tag{4.1} \]

Let us now count the lines in the following way. Every line is uniquely determined by two points. There are $x(x - 1)/2$ pairs of points. However, every line contains $m$ points, and has been counted $m(m - 1)/2$ times. Thus we obtain

\[ \frac{x(x - 1)/2}{m(m - 1)/2} = y. \tag{4.2} \]

Solving (4.1) and (4.2) gives

\[ x = m^2, \qquad y = m^2 + m, \]

as required.

(v) Exercise. □

2. Finite fields and affine planes

We have seen that the order of a finite affine plane must be a square. However, it is not known for which squares there exist finite affine planes of those orders.

In this section we create examples based on finite fields. Let $F$ be any finite field, and let us define a set of points

\[ \mathcal{P} = F \times F = \{(a, b) : a, b \in F\}. \]

We define lines to be sets of solutions of linear equations in two variables. More precisely, for $a, b, c \in F$ with $a \ne 0$ or $b \ne 0$, we let

\[ l_{a,b,c} = \{(x, y) \in F \times F : ax + by + c = 0\}, \]

and then

\[ \mathcal{L} = \{l_{a,b,c} : a, b, c \in F,\ a \ne 0 \text{ or } b \ne 0\}. \]

(Note that one line can be defined by several equations. This is the reason why we define lines to be the sets of solutions of linear equations, rather than equations themselves.)

Now we show that $(\mathcal{P}, \mathcal{L})$ forms a finite affine plane.

Theorem 2.1. If $n = p^t$, where $p$ is a prime and $t \ge 1$, then there exists a finite affine plane of order $n^2$.


Proof. By the Fundamental Theorem for Finite Fields (Theorem 2.5 in Chapter 1) there exists a finite field $F$ with $n$ elements. Use $F$ to define $\mathcal{P}$ and $\mathcal{L}$, as above. Clearly, $|\mathcal{P}| = n^2$.

We now check that all three axioms for affine planes are satisfied.

(A1) Let $(\alpha, \beta), (\gamma, \delta) \in \mathcal{P}$ be two distinct points. Then either $\alpha \ne \gamma$ or $\beta \ne \delta$ and so the equation

\[ (\delta - \beta)x + (\alpha - \gamma)y + (\beta\gamma - \alpha\delta) = 0 \tag{4.3} \]

defines a line. It is easy to check that this line contains both given points.

For the uniqueness, assume that the equation

\[ ax + by + c = 0 \tag{4.4} \]

defines a line containing both $(\alpha, \beta)$ and $(\gamma, \delta)$. Then we have

\[ a\alpha + b\beta + c = 0 \tag{4.5} \]
\[ a\gamma + b\delta + c = 0. \tag{4.6} \]

Let us assume, without loss of generality, that $\beta \ne \delta$. Subtracting (4.6) from (4.5) we obtain

\[ a(\alpha - \gamma) + b(\beta - \delta) = 0. \tag{4.7} \]

We note that we must have $a \ne 0$, for otherwise we would have $b(\beta - \delta) = 0$, and so $b = 0$. Solving the system (4.5), (4.6) for $b$ and $c$ we obtain

\[ b = a(\beta - \delta)^{-1}(\gamma - \alpha), \]
\[ c = -a\alpha - a\beta(\beta - \delta)^{-1}(\gamma - \alpha). \]

If we substitute these solutions back into (4.4) and simplify, we obtain equation (4.3). Therefore, any equation defining a line which contains both $(\alpha, \beta)$ and $(\gamma, \delta)$ is equivalent to (4.3).

(A2) Consider the points $(0, 0)$, $(0, 1)$, $(1, 0)$, $(1, 1)$. The points $(0, 0)$ and $(0, 1)$ lie on the line $x = 0$, but neither $(1, 0)$ nor $(1, 1)$ satisfies $x = 0$. The points $(0, 0)$ and $(1, 0)$ lie on the line $y = 0$, but neither $(0, 1)$ nor $(1, 1)$ satisfies $y = 0$. The points $(0, 0)$ and $(1, 1)$ lie on the line $x - y = 0$, but neither $(0, 1)$ nor $(1, 0)$ satisfies this equation. The points $(0, 1)$ and $(1, 0)$ lie on the line $x + y - 1 = 0$, but neither $(0, 0)$ nor $(1, 1)$ satisfies this equation. The points $(0, 1)$ and $(1, 1)$ lie on $y - 1 = 0$, but $(0, 0)$ and $(1, 0)$ do not. Finally, the points $(1, 0)$ and $(1, 1)$ lie on $x - 1 = 0$, but $(0, 0)$ and $(0, 1)$ do not.

(A3) Let $l$ be the line defined by the equation $ax + by + c = 0$, and let $(\alpha, \beta)$ be any point not belonging to $l$. Consider the line defined by the equation

\[ ax + by + (-\alpha a - \beta b) = 0. \tag{4.8} \]

It is easy to check that this line contains $(\alpha, \beta)$ and has no intersection with $l$. For the uniqueness we show, similarly as for (A1), that any line which contains $(\alpha, \beta)$ and has the empty intersection with $l$ is defined by an equation equivalent to (4.8). □

It is not known whether the converse of Theorem 2.1 is also true, i.e. whether the existence of a finite affine plane of order $n^2$ implies that $n$ is a prime power.

Definition 2.2. If $F$ is a finite field, then the finite affine plane $(\mathcal{P}, \mathcal{L})$ constructed as above is denoted $AP(F)$.

Example 2.3. We construct the finite affine plane $AP(\mathbb{Z}_3)$. It has 9 points and 12 lines. The points are $\mathcal{P} = \{(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2), (2, 0), (2, 1), (2, 2)\}$. The lines are solutions of linear equations of the form $ax + by + c = 0$. If $a \ne 0$, then we can choose $a = 1$, and we obtain the following equations, listed together with corresponding lines:

\[ \begin{array}{ll}
x + 0 = 0, & l_1 = \{(0, 0), (0, 1), (0, 2)\} \\
x + 1 = 0, & l_2 = \{(2, 0), (2, 1), (2, 2)\} \\
x + 2 = 0, & l_3 = \{(1, 0), (1, 1), (1, 2)\} \\
x + y + 0 = 0, & l_4 = \{(0, 0), (1, 2), (2, 1)\} \\
x + y + 1 = 0, & l_5 = \{(0, 2), (1, 1), (2, 0)\} \\
x + y + 2 = 0, & l_6 = \{(0, 1), (1, 0), (2, 2)\} \\
x + 2y + 0 = 0, & l_7 = \{(0, 0), (1, 1), (2, 2)\} \\
x + 2y + 1 = 0, & l_8 = \{(0, 1), (1, 2), (2, 0)\} \\
x + 2y + 2 = 0, & l_9 = \{(0, 2), (1, 0), (2, 1)\}.
\end{array} \]

When $a = 0$, then we must have $b \ne 0$, and we can choose $b = 1$. We obtain a further three lines:

\[ \begin{array}{ll}
y + 0 = 0, & l_{10} = \{(0, 0), (1, 0), (2, 0)\} \\
y + 1 = 0, & l_{11} = \{(0, 2), (1, 2), (2, 2)\} \\
y + 2 = 0, & l_{12} = \{(0, 1), (1, 1), (2, 1)\}.
\end{array} \]

The obtained plane is sketched in Figure 4.3.
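The enumeration in Example 2.3 can be reproduced mechanically. The following is a minimal sketch, assuming the field is Z_p for a prime p (so that arithmetic is plain reduction mod p; a prime-power field would need genuine field arithmetic). The function name affine_plane is chosen here for illustration and is not part of the notes.

```python
from itertools import product

def affine_plane(p):
    """Points and lines of AP(Z_p) for a prime p (arithmetic is mod p)."""
    points = list(product(range(p), repeat=2))
    lines = set()
    # Normalised equations: x + by + c = 0 (a = 1), then y + c = 0 (a = 0, b = 1).
    for a, b in [(1, b) for b in range(p)] + [(0, 1)]:
        for c in range(p):
            line = frozenset((x, y) for x, y in points
                             if (a * x + b * y + c) % p == 0)
            lines.add(line)
    return points, lines

points, lines = affine_plane(3)
print(len(points), len(lines))  # 9 12
```

Normalising the leading non-zero coefficient to 1 is what keeps each line from being listed once per scalar multiple of its equation.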

3. Affine planes and Latin squares

If we compare the main result of the previous section with that of Section 4 of Chapter 3, we see an unexpected symmetry. When n is a prime power we can relatively easily construct a collection of n − 1 mutually orthogonal Latin squares of order n, and also a finite affine plane of order n^2. On the other hand it is not known whether either of these configurations exists when n is not a prime power. In this section we show that this symmetry is no coincidence.

First we show that one may use affine planes to construct sets of mutually orthogonal Latin squares.

Lemma 3.1. Let n ≥ 2 be any natural number. If there exists a finite affine plane of order n^2 then there exist n − 1 mutually orthogonal Latin squares of order n.

Proof. Let (P, L) be a finite affine plane of order n^2. Let O be an arbitrary point, and let x and y be two distinct lines through O, which we consider as fixed. Let z be any other line through O; by Theorem 1.4(i) there are n − 1 choices for this line. By Theorem 1.4(v) there are n lines parallel to x; let us denote these lines by x_1, ..., x_n. Similarly, let y_1, ..., y_n be all lines parallel to y, and let z_1, ..., z_n be all lines parallel to z.

We define an array A_z = (a^{(z)}_{ij})_{n×n} depending on z as follows:

a^{(z)}_{ij} = k ⇐⇒ x_i ∩ y_j ∩ z_k ≠ ∅.

(Note that the condition x_i ∩ y_j ∩ z_k ≠ ∅ means that x_i, y_j and z_k all pass through the same point.)

We prove that A_z is a Latin square. If a^{(z)}_{i j_1} = a^{(z)}_{i j_2} = k then this means that

x_i ∩ y_{j_1} ∩ z_k ≠ ∅, x_i ∩ y_{j_2} ∩ z_k ≠ ∅,


so that

x_i ∩ y_{j_1} ∩ z_k = x_i ∩ z_k = x_i ∩ y_{j_2} ∩ z_k.

If this point of intersection is denoted by P, then y_{j_1} and y_{j_2} are two lines which both contain P and are parallel to y. This contradicts (A3) unless j_1 = j_2. This proves that every row of A_z contains every entry exactly once. The proof for columns is similar, and so A_z is a Latin square.

Figure 4.3: The finite affine plane AP(Z_3)

We must prove that any two such squares A_z and A_t are orthogonal. Assume that they are not. Then for some pair of distinct cells (i, j) ≠ (k, l) we have

a^{(z)}_{ij} = a^{(z)}_{kl} = r, a^{(t)}_{ij} = a^{(t)}_{kl} = s,

which means that

x_i ∩ y_j ∩ z_r ≠ ∅, x_k ∩ y_l ∩ z_r ≠ ∅,
x_i ∩ y_j ∩ t_s ≠ ∅, x_k ∩ y_l ∩ t_s ≠ ∅.

So the line z_r contains two points x_i ∩ y_j and x_k ∩ y_l which are distinct (why?). Similarly, t_s contains these two points, and so z_r = t_s by (A1). But z_r is parallel to z and t_s is parallel to t. Since O = z ∩ t we have two parallels through O to the line z_r = t_s, which contradicts (A3).

Since there were (n + 1) − 2 = n − 1 choices for z, the result follows. □

Example 3.2. We consider the affine plane AP(Z_3), as given in Figure 4.3. We let the point O be (0, 0), the line x := {(0, 0), (0, 1), (0, 2)}, and the line y := {(0, 0), (1, 0), (2, 0)}. Then there are two possible choices for z: either the line {(0, 0), (1, 1), (2, 2)} or the line {(0, 0), (2, 1), (1, 2)}.

Let us first set z := {(0, 0), (1, 1), (2, 2)}. Then we have the following system of 9 lines:

x_1 := {(0, 0), (0, 1), (0, 2)}
x_2 := {(1, 0), (1, 1), (1, 2)}
x_3 := {(2, 0), (2, 1), (2, 2)}
y_1 := {(0, 0), (1, 0), (2, 0)}
y_2 := {(0, 1), (1, 1), (2, 1)}
y_3 := {(0, 2), (1, 2), (2, 2)}
z_1 := {(0, 0), (1, 1), (2, 2)}
z_2 := {(0, 2), (1, 0), (2, 1)}
z_3 := {(0, 1), (1, 2), (2, 0)}

We then define an array A = (a_{ij})_{3×3} by

a_{ij} = k ⇔ x_i ∩ y_j ∩ z_k ≠ ∅.

This produces the following Latin square:

A :=
1 3 2
2 1 3
3 2 1

Similarly, if we set z := {(0, 0), (2, 1), (1, 2)} then x_1, x_2, x_3, y_1, y_2, y_3 are as before, but we now have

z_1 := {(0, 0), (2, 1), (1, 2)}
z_2 := {(1, 0), (0, 1), (2, 2)}
z_3 := {(2, 0), (1, 1), (0, 2)}.

This gives the Latin square

B :=
1 2 3
2 3 1
3 1 2
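The two squares of Example 3.2 can be recomputed directly from the rule a_{ij} = k ⇔ x_i ∩ y_j ∩ z_k ≠ ∅. The sketch below takes the line lists exactly as in the example; the helper names square_from_lines and orthogonal are illustrative assumptions, not notation from the notes.

```python
from itertools import product

def square_from_lines(xs, ys, zs):
    """a[i][j] = k  iff  x_i, y_j and z_k share a point (k is 1-based)."""
    n = len(xs)
    square = [[None] * n for _ in range(n)]
    for i, j in product(range(n), repeat=2):
        for k, z in enumerate(zs, start=1):
            if xs[i] & ys[j] & z:
                square[i][j] = k
    return square

def orthogonal(a, b):
    """Two squares of order n are orthogonal iff all n^2 pairs of entries differ."""
    n = len(a)
    pairs = {(a[i][j], b[i][j]) for i in range(n) for j in range(n)}
    return len(pairs) == n * n

xs = [{(i, y) for y in range(3)} for i in range(3)]   # x_1, x_2, x_3
ys = [{(x, j) for x in range(3)} for j in range(3)]   # y_1, y_2, y_3
zs = [{(0, 0), (1, 1), (2, 2)}, {(0, 2), (1, 0), (2, 1)}, {(0, 1), (1, 2), (2, 0)}]
ts = [{(0, 0), (2, 1), (1, 2)}, {(1, 0), (0, 1), (2, 2)}, {(2, 0), (1, 1), (0, 2)}]

A = square_from_lines(xs, ys, zs)
B = square_from_lines(xs, ys, ts)
print(A)                 # [[1, 3, 2], [2, 1, 3], [3, 2, 1]]
print(B)                 # [[1, 2, 3], [2, 3, 1], [3, 1, 2]]
print(orthogonal(A, B))  # True
```

The final check confirms, for this one case, the orthogonality that Lemma 3.1 proves in general.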


Theorem 3.3. Let n ≥ 2 be any natural number. Then there exists a finite affine plane of order n^2 if and only if there exist n − 1 mutually orthogonal Latin squares of order n.

Proof. (⇒) This is immediate from Lemma 3.1.

(⇐) Let A_1, A_2, ..., A_{n−1} be mutually orthogonal Latin squares of order n. We build a finite affine plane as follows. The points are simply pairs of indices:

P = {(i, j) : 1 ≤ i, j ≤ n}.

There are three types of lines:

(i) ‘vertical lines’ of the form {(i, x) : x = 1, . . . , n} for fixed i;

(ii) ‘horizontal lines’ of the form {(x, j) : x = 1, . . . , n} for fixed j;

(iii) ‘sloped lines’ of the form {(i, j) : a^{(m)}_{ij} = k} for fixed m and k.

In proving that the obtained configuration is indeed a finite affine plane we will use the following facts, which should be checked as an exercise:

(1) there are n^2 points and n^2 + n lines;

(2) every line contains n points;

(3) each point is contained in exactly n + 1 lines;

(4) any two distinct lines have at most one point in common (use the fact that A_1, ..., A_{n−1} are mutually orthogonal Latin squares).

(A1) It follows directly from (4) that for any two distinct points there exists at most one line containing both of them. Now count all the pairs of points which lie on the same line. Since there are n^2 + n lines, each with n points, the number of such pairs is

(n^2 + n) · n(n − 1)/2 = n^2(n^2 − 1)/2.

The last number is clearly equal to the total number of pairs of points. Therefore, every pair of points lies on a line.

(A2) The points (1, 1), (1, 2), (2, 1), (2, 2) can be shown to satisfy this axiom.

(A3) Let l be a line, and let P be a point such that P ∉ l. Each point in l defines a line through P. There are n points in l (by (2)), so this defines n lines through P, which are distinct by (4). By (3), there are n + 1 lines through P, so exactly one of them contains no points of l. □

Example 3.4. We use the four mutually orthogonal Latin squares of order 5 from Example 4.2 of Chapter 3 to construct the affine plane AP(Z_5). The points are the set {(i, j) : 1 ≤ i, j ≤ 5}.

The vertical lines are:

{(1, 1), (1, 2), (1, 3), (1, 4), (1, 5)}
{(2, 1), (2, 2), (2, 3), (2, 4), (2, 5)}
{(3, 1), (3, 2), (3, 3), (3, 4), (3, 5)}
{(4, 1), (4, 2), (4, 3), (4, 4), (4, 5)}
{(5, 1), (5, 2), (5, 3), (5, 4), (5, 5)}

The horizontal lines are:

{(1, 1), (2, 1), (3, 1), (4, 1), (5, 1)}
{(1, 2), (2, 2), (3, 2), (4, 2), (5, 2)}
{(1, 3), (2, 3), (3, 3), (4, 3), (5, 3)}
{(1, 4), (2, 4), (3, 4), (4, 4), (5, 4)}
{(1, 5), (2, 5), (3, 5), (4, 5), (5, 5)}


Figure 4.4: A finite projective plane of order 7

The sloped lines arising from the Latin square A_1 are:

m = 1:
k = 0: {(1, 4), (2, 3), (3, 2), (4, 1), (5, 5)}
k = 1: {(1, 5), (2, 4), (3, 3), (4, 2), (5, 1)}
k = 2: {(1, 1), (2, 5), (3, 4), (4, 3), (5, 2)}
k = 3: {(1, 2), (2, 1), (3, 5), (4, 4), (5, 3)}
k = 4: {(1, 3), (2, 2), (3, 1), (4, 5), (5, 4)}

and similarly for m = 2, 3, 4.
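The construction in the proof of Theorem 3.3 can be tried out on this example. The squares of Example 4.2 of Chapter 3 are not reproduced in this section, so the sketch below assumes that the square A_m has entry (m·i + j) mod 5 in cell (i, j). This assumption agrees with the sloped lines listed above for m = 1, but it is otherwise a guess at the squares intended.

```python
from itertools import combinations

n = 5
points = [(i, j) for i in range(1, n + 1) for j in range(1, n + 1)]

def entry(m, i, j):
    """Assumed entry of the square A_m in cell (i, j)."""
    return (m * i + j) % n

# Vertical, horizontal and sloped lines, as in the proof of Theorem 3.3.
lines = [frozenset((i, x) for x in range(1, n + 1)) for i in range(1, n + 1)]
lines += [frozenset((x, j) for x in range(1, n + 1)) for j in range(1, n + 1)]
lines += [frozenset(p for p in points if entry(m, *p) == k)
          for m in range(1, n) for k in range(n)]

# (A1): every pair of distinct points lies on exactly one line.
a1 = all(sum(1 for l in lines if p in l and q in l) == 1
         for p, q in combinations(points, 2))
print(len(lines), a1)  # 30 True
```

The count 30 = 5^2 + 5 matches fact (1) in the proof, and the check of (A1) is the brute-force version of the counting argument given there.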

4. Projective planes

A system (P, L) of points and lines is called a projective plane if it satisfies axioms (A1), (A2) and

(P3) Any two lines have a point in common.

If P is finite then we have a finite projective plane. An example of this structure is given in Figure 4.4.

Theorem 4.1. Let (P, L) be a finite projective plane. Then there exists a number m such that the following conditions are satisfied:

(i) every point lies on exactly m + 1 lines;

(ii) every line has exactly m + 1 points;

(iii) there are m^2 + m + 1 points;

(iv) there are m^2 + m + 1 lines.

Proof. The proof is similar to the proof of Theorem 1.4, and is left as an exercise. □

The following theorem gives a connection between finite affine and projective planes.

Theorem 4.2. Let n ≥ 2 be a natural number. Then there exists a finite affine plane of order n^2 if and only if there exists a finite projective plane of order n^2 + n + 1.


Proof. The idea of the proof is to construct a projective plane from an affine plane, and, vice versa, to construct an affine plane from a projective plane. We describe the constructions, but leave the axiom-checking as an exercise.

(⇒) Let (P, L) be a finite affine plane of order n^2. Let O be any point, and let p_1, ..., p_{n+1} be the lines passing through this point. We construct a finite projective plane (P′, L′) as follows. We choose n + 1 new points X_1, ..., X_{n+1}, and let

P′ = P ∪ {X_1, ..., X_{n+1}}.

Lines in L′ have the form l ∪ {X_i} where l ∈ L and i is the unique number such that p_i is parallel to l. We also have one more line, namely {X_1, ..., X_{n+1}}.

(⇐) Now we start from a finite projective plane (P, L), and construct a finite affine plane (P′, L′) as follows. We choose an arbitrary line p ∈ L and denote its points by X_1, ..., X_{n+1}. The new set of points is

P′ = P \ {X_1, ..., X_{n+1}},

and the lines have the form l \ {X_1, ..., X_{n+1}}, with l ∈ L, l ≠ p. □

We refer to the two constructions described in the above proof as adding or removing the line at infinity.

Example 4.3. We construct the projective plane arising from the finite affine plane AP(Z_3) described in Example 2.3. The result is shown in Figure 4.5.

Example 4.4. Removing any line from the seven element projective plane shown in Figure 4.4 results in the four element affine plane shown in Figure 4.1.
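Adding the line at infinity to AP(Z_3), as in Example 4.3, can be sketched in a few lines. The proof of Theorem 4.2 indexes the parallel classes by the lines through a chosen point; the sketch below instead indexes them by the normalised coefficients (a, b) of ax + by + c = 0, which is an equivalent labelling chosen here for convenience. The tags ("inf", ...) for the new points are likewise illustrative.

```python
from itertools import combinations, product

p = 3
points = list(product(range(p), repeat=2))

# Group the lines of AP(Z_3) into parallel classes, keyed by (a, b).
classes = {}
for a, b in [(1, b) for b in range(p)] + [(0, 1)]:
    classes[(a, b)] = [frozenset(pt for pt in points
                                 if (a * pt[0] + b * pt[1] + c) % p == 0)
                       for c in range(p)]

# One new point per parallel class, added to every line of that class,
# plus the line at infinity consisting of all the new points.
proj_points = points + [("inf", key) for key in classes]
proj_lines = [line | {("inf", key)} for key, cls in classes.items() for line in cls]
proj_lines.append(frozenset(("inf", key) for key in classes))

print(len(proj_points), len(proj_lines))                              # 13 13
print(all(len(l & m) == 1 for l, m in combinations(proj_lines, 2)))   # True
```

The counts agree with Theorem 4.1 for m = 3 (m^2 + m + 1 = 13), and the last line checks axiom (P3) in the strong form that two distinct lines meet in exactly one point.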


Figure 4.5: The finite projective plane arising from AP (Z3)

Chapter 5

Designs and Steiner triple systems

1. Designs

In the previous chapter we have seen that a finite affine plane is a system (P, L), consisting of a set P and another set L of subsets of P, satisfying three axioms. The first of these axioms requires that every pair of elements of P belongs to a unique element of L. Also, from the axioms, it follows that there exists a number m such that every element of P belongs to exactly m + 1 elements of L, and that every element of L contains exactly m elements of P; see Theorem 1.4 of Chapter 4. Similarly, we have seen that in a finite projective plane there exists a number m such that every element of P belongs to exactly m + 1 elements of L and every element of L contains exactly m + 1 elements of P.

Finite affine planes and finite projective planes are special cases of the following general concept.

Definition 1.1. Let v, b, r, k, λ be natural numbers with k < v, r ≤ b, λ ≤ b. A (v, b, r, k, λ)-design is a pair (X, B), where X is a set and B is a set of subsets of X, with the following properties:

(D1) X has v elements;

(D2) B has b elements, each of size k;

(D3) every element of X is contained in exactly r elements of B;

(D4) every pair of elements of X is contained in exactly λ elements of B.

The elements of X are called points, while the elements of B are called blocks.

Example 1.2. A finite affine plane of order n^2 has n^2 points and n^2 + n lines. Every point is on n + 1 lines and every line contains n points. Every pair of points defines a unique line. Therefore a finite affine plane is an (n^2, n^2 + n, n + 1, n, 1)-design. A finite projective plane of order n^2 + n + 1 has n^2 + n + 1 points and n^2 + n + 1 lines. Every point is on n + 1 lines, and every line contains n + 1 points. Every pair of points defines a unique line. Therefore a finite projective plane is an (n^2 + n + 1, n^2 + n + 1, n + 1, n + 1, 1)-design.

The main general task of design theory is to determine for which values of the five parameters a (v, b, r, k, λ)-design exists. Of course, this is far too difficult a task, and it has not been completed even for some ‘tame’ types of designs, such as finite geometries; see the previous chapter. Here we give a theorem putting some constraints on the parameters.

Theorem 1.3. If there exists a (v, b, r, k, λ)-design then


(i) vr = bk;

(ii) λ(v − 1) = r(k − 1).

Proof. (i) If we count the points in each of b blocks, we find bk of them. In doing so we have counted every point exactly r times, so that vr = bk.

(ii) Let us count pairs of points belonging to each of b blocks; the result is clearly bk(k − 1)/2. On the other hand, there are v(v − 1)/2 pairs of points, and we have counted each pair exactly λ times, so that we obtain

λv(v − 1)/2 = bk(k − 1)/2.

If we use (i) and replace bk by vr we obtain λ(v − 1) = r(k − 1), as required. □
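The two identities of Theorem 1.3 are easy to test on a concrete design. The following sketch reads off the parameters of a block system by brute force; the function name design_parameters is an assumption made for illustration, and the test case is the seven-point projective plane with the triples that appear in Example 5.6.

```python
from itertools import combinations

def design_parameters(points, blocks):
    """Return (v, b, r, k, lam) if (points, blocks) is a design, else None."""
    v, b = len(points), len(blocks)
    ks = {len(blk) for blk in blocks}
    rs = {sum(1 for blk in blocks if x in blk) for x in points}
    lams = {sum(1 for blk in blocks if x in blk and y in blk)
            for x, y in combinations(points, 2)}
    if len(ks) == len(rs) == len(lams) == 1:
        return v, b, rs.pop(), ks.pop(), lams.pop()
    return None

fano = [{1, 2, 3}, {1, 4, 5}, {1, 6, 7}, {2, 4, 7}, {2, 5, 6}, {3, 4, 6}, {3, 5, 7}]
v, b, r, k, lam = design_parameters(range(1, 8), fano)
print((v, b, r, k, lam))                               # (7, 7, 3, 3, 1)
print(v * r == b * k, lam * (v - 1) == r * (k - 1))    # True True
```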

2. Steiner triple systems

Definition 2.1. A Steiner triple system is a pair (X, B), where X is a set and B is a set of 3-element subsets of X, such that every pair of elements of X belongs to a unique element of B. In other words, a Steiner triple system with more than three points is a (v, b, r, 3, 1)-design. The number v = |X| is called the order of the Steiner triple system.

Example 2.2. • The systems (∅, ∅), ({1}, ∅) and ({1, 2, 3}, {{1, 2, 3}}) are Steiner triple systems of orders 0, 1 and 3 respectively.

• The finite projective plane shown in Figure 4.4 has 7 points and 7 lines, such that each line contains 3 points and every pair of points defines a unique line. Therefore it is a Steiner triple system of order 7.

• The finite affine plane AP(Z_3) shown in Figure 4.3 has 9 points {(i, j) : 0 ≤ i, j ≤ 2} and 12 lines. Each line contains 3 points, and every pair of points lies on a unique line, so AP(Z_3) is a Steiner triple system of order 9.

• The system

({0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12},
{{0, 1, 4}, {1, 2, 5}, {2, 3, 6}, {3, 4, 7}, {4, 5, 8}, {5, 6, 9}, {6, 7, 10}, {7, 8, 11}, {8, 9, 12}, {9, 10, 0}, {10, 11, 1}, {11, 12, 2}, {12, 0, 3},
{0, 2, 8}, {1, 3, 9}, {2, 4, 10}, {3, 5, 11}, {4, 6, 12}, {5, 7, 0}, {6, 8, 1}, {7, 9, 2}, {8, 10, 3}, {9, 11, 4}, {10, 12, 5}, {11, 0, 6}, {12, 1, 7}})

is a Steiner triple system of order 13.
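The 26 triples of the last example are exactly the translates modulo 13 of the two triples {0, 1, 4} and {0, 2, 8}, which makes the system easy to regenerate and check. This is a small sketch; the names develop and is_steiner_triple_system are chosen here and do not come from the notes.

```python
from itertools import combinations

def develop(base_blocks, v):
    """All translates {x + s mod v} of the base blocks, for s = 0, ..., v-1."""
    return [frozenset((x + s) % v for x in blk)
            for blk in base_blocks for s in range(v)]

def is_steiner_triple_system(v, blocks):
    """Check that every pair from {0, ..., v-1} lies in exactly one triple."""
    if any(len(blk) != 3 for blk in blocks):
        return False
    return all(sum(1 for blk in blocks if x in blk and y in blk) == 1
               for x, y in combinations(range(v), 2))

blocks = develop([{0, 1, 4}, {0, 2, 8}], 13)
print(len(blocks), is_steiner_triple_system(13, blocks))  # 26 True
```

The number of triples agrees with b = v(v − 1)/6 = 13 · 12/6 = 26.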

Exercise 2.3. Which finite affine planes are Steiner triple systems? Which finite projective planes are Steiner triple systems?

We are able to precisely determine the possible orders of Steiner triple systems. The first half of this result is given by the following theorem.

Theorem 2.4. If there exists a Steiner triple system of order v > 0 then either v ≡ 1 (mod 6) or v ≡ 3 (mod 6).

Proof. It is clear that the result holds for v ≤ 3. For v > 3, a Steiner triple system of order v is a (v, b, r, 3, 1)-design, and so, by Theorem 1.3, we have

vr = 3b, v − 1 = 2r.

From these two equations we see that v is odd and that b = v(v − 1)/6, so that v(v − 1) must be divisible by 6. Therefore, either v is divisible by 3, in which case we have v ≡ 3 (mod 6), or v − 1 is divisible by 3, in which case v ≡ 1 (mod 6). □


The first appearance of Steiner triple systems was in the so-called Kirkman’s schoolgirls problem:

Is it possible to arrange 15 schoolgirls in parties of three for seven days’ walks so that every two of them walk together exactly once?

Kirkman, who was a vicar at the time, became interested in the problem after finding a more general problem in the Lady’s and Gentleman’s Diary in 1845:

Given integers l, m, n with l < m < n, what is the greatest number of m-element subsets of an n-element set with the property that any l-element subset lies in at most one of the chosen sets?

3. Subsystems and a recursive construction

Definition 3.1. Let (X, B) be a Steiner triple system, let Y ⊆ X and let C = {T ∈ B : T ⊆ Y}. If (Y, C) is a Steiner triple system then we say that Y (or, more precisely, (Y, C)) is a subsystem of (X, B).

Example 3.2. Let (X, B) be a Steiner triple system. The systems (∅, ∅) and (X, B) are subsystems. If X ≠ ∅, then for every element x ∈ X the system ({x}, ∅) is a subsystem. If |X| ≥ 3 and if T ∈ B is any triple then (T, {T}) is a subsystem.

Lemma 3.3. Let (Y, C) be a subsystem of a Steiner triple system (X, B), let a, b ∈ Y, and let T be the unique triple in B containing a and b. Then T ∈ C.

Proof. Since (Y, C) is a Steiner triple system, there must exist a triple T′ ∈ C containing a and b. On the other hand C ⊆ B and T is the unique triple in B containing a and b, so that T′ = T. □

Lemma 3.4. If Y is a non-empty subsystem of a Steiner triple system (X, B), then |X \ Y| is even.

Proof. We showed in the proof of Theorem 2.4 that a nontrivial Steiner triple system has odd order. Since both X and Y are nonempty, they both have odd order, so |X \ Y| = |X| − |Y| is even. □

Now we prove a theorem which shows how to build new Steiner triple systems out of the existing ones.

Theorem 3.5. Let (X, B) be a Steiner triple system of order n, let (Y, C) be a subsystem of (X, B) of order r, and let (Z, D) be a Steiner triple system of order m. Then the following hold:

1. There exists a Steiner triple system (U, E) of order r + (n − r)m.

2. If m > 0 then this system may be chosen so as to contain (X, B) as a subsystem.

3. If 0 < r < n and m > 1 then (U, E) can be chosen to contain a subsystem of order 7.

Proof. 1. The idea of the proof is as follows. We draw (X, B) on a sheet of paper, with (Y, C) drawn on the left hand half of the sheet; see Figure 5.1. Then we make m copies of this page, and bind them together, identifying left halves; see Figures 5.2 and 5.3. This forms the set U. For the triples E we choose all the existing triples, plus certain triples connecting elements on different pages.


Figure 5.1: A Steiner triple system X on a sheet of paper (regions labelled Y and X\Y)

Figure 5.2: Several copies of X

Figure 5.3: A new system


Let us now be more precise. We let t = n − r and write

Y = {a_1, ..., a_r},
X = {a_1, ..., a_r} ∪ {b_1, ..., b_t},
Z = {c_1, ..., c_m}.

Next we define the new set of points as

U = {a_1, ..., a_r} ∪ {d_{i,j} : 1 ≤ i ≤ m, 1 ≤ j ≤ t}.

The blocks are of the following three types:

(a) ‘Binding blocks’: these are simply the elements of C.

(b) ‘Single page blocks’: these have two forms. The first is {d_{i,j_1}, d_{i,j_2}, d_{i,j_3}} where 1 ≤ i ≤ m and {b_{j_1}, b_{j_2}, b_{j_3}} ∈ B (so all points are in X \ Y). The second type of single page block is {d_{i,j_1}, d_{i,j_2}, a_k} where 1 ≤ i ≤ m and {b_{j_1}, b_{j_2}, a_k} ∈ B (i.e. two of them are in X \ Y and one of them is in Y).

(c) ‘Three page blocks’: these have the form {d_{i_1,j_1}, d_{i_2,j_2}, d_{i_3,j_3}} where {c_{i_1}, c_{i_2}, c_{i_3}} ∈ D and j_1 + j_2 + j_3 ≡ 0 (mod t).

The set of all these triples is denoted by E.

To prove that (U, E) is a Steiner triple system we have to prove that every pair of points belongs to a unique triple. If two points lie on the same page (including the glued-together page) then they belong to a unique triple of type (a) or (b). If two points lie on different pages, then these points have the form d_{i_1,j_1} and d_{i_2,j_2} with i_1 ≠ i_2. Then clearly these two points belong to no triple of type (a) or (b), and do belong to a unique triple {d_{i_1,j_1}, d_{i_2,j_2}, d_{i_3,j_3}}, where i_3 is determined by the fact that {c_{i_1}, c_{i_2}, c_{i_3}} ∈ D and j_3 is determined by j_1 + j_2 + j_3 ≡ 0 (mod t).

2. If m > 0 then our ‘book’ has at least one ‘page’, and this ‘page’ is a copy of (X, B); therefore (U, E) contains (X, B) as a subsystem.

3. We assume that 0 < r < n (i.e. Y ≠ ∅ and Y ≠ X) and that m > 1 (which implies that |D| > 0), and prove that (U, E) contains a subsystem of order 7. First choose an arbitrary a ∈ Y. Next order the elements b_1, ..., b_t of X \ Y so that {a, b_{t/2}, b_t} ∈ B. (Recall that t is even by Lemma 3.4.) Finally let {c_{i_1}, c_{i_2}, c_{i_3}} ∈ D be any triple. It is an exercise to check that the set

{a, d_{i_1,t/2}, d_{i_2,t/2}, d_{i_3,t/2}, d_{i_1,t}, d_{i_2,t}, d_{i_3,t}}

is a subsystem of (U, E) of order 7. □

Example 3.6. Since there exists a Steiner triple system of order 3 (n = 3), with a subsystem of order 1 (r = 1), and a Steiner triple system of order 7 (m = 7), it follows that there exists a Steiner triple system of order

r + (n − r)m = 1 + (3 − 1) · 7 = 15.

Thus, there is a design with the correct parameters to satisfy Kirkman’s schoolgirls problem: to finally solve the problem you must convince yourself that for at least one such design there exists a partition of the blocks into seven sets of 5 triples, such that each set of 5 triples covers all 15 points.
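The 'book' construction of Theorem 3.5 is concrete enough to run. The sketch below builds (U, E) for the data of Example 3.6 and checks that the result is a Steiner triple system of order 15. The encoding of points as tagged tuples ("a", x) and ("d", i, j), the function names is_sts and glue, and the particular seven triples used for the system of order 7 (those of Example 5.6) are all choices made here for illustration.

```python
from itertools import combinations, product

def is_sts(points, blocks):
    """True iff all blocks are triples and every pair of points lies in exactly one."""
    seen = {frozenset(pair): 0 for pair in combinations(points, 2)}
    for blk in blocks:
        if len(blk) != 3:
            return False
        for pair in combinations(blk, 2):
            seen[frozenset(pair)] += 1
    return all(count == 1 for count in seen.values())

def glue(X_blocks, Y, rest, Z_blocks, m):
    """Y: points of the subsystem; rest: the points b_1, ..., b_t of X outside Y;
    Z_blocks: triples (as tuples) of a Steiner triple system on {1, ..., m}."""
    t = len(rest)
    index = {b: j for j, b in enumerate(rest, start=1)}
    def page(i, x):  # the copy of the point x on page i
        return ("a", x) if x in Y else ("d", i, index[x])
    U = [("a", y) for y in Y]
    U += [("d", i, j) for i in range(1, m + 1) for j in range(1, t + 1)]
    E = []
    for blk in X_blocks:
        if set(blk) <= set(Y):   # (a) binding blocks
            E.append(frozenset(("a", x) for x in blk))
        else:                    # (b) single page blocks, one copy per page
            E += [frozenset(page(i, x) for x in blk) for i in range(1, m + 1)]
    for i1, i2, i3 in Z_blocks:  # (c) three page blocks
        for j1, j2 in product(range(1, t + 1), repeat=2):
            j3 = (-(j1 + j2)) % t or t   # j1 + j2 + j3 = 0 (mod t), 1 <= j3 <= t
            E.append(frozenset({("d", i1, j1), ("d", i2, j2), ("d", i3, j3)}))
    return U, E

fano = [(1, 2, 3), (1, 4, 5), (1, 6, 7), (2, 4, 7), (2, 5, 6), (3, 4, 6), (3, 5, 7)]
U, E = glue([(1, 2, 3)], Y=[1], rest=[2, 3], Z_blocks=fano, m=7)
print(len(U), len(E), is_sts(U, E))  # 15 35 True
```

This only confirms that a system with the right parameters exists; it does not search for the partition into seven days that Kirkman's problem also requires.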


4. Existence

In this section we prove the following converse of Theorem 2.4.

Theorem 4.1. If n is a natural number such that n ≡ 1 (mod 6) or n ≡ 3 (mod 6) then there exists a Steiner triple system of order n.

Proof. Let A be the set of all natural numbers n > 0 for which there exists a Steiner triple system of order n. We want to show that A consists of all numbers congruent to 1 or 3 modulo 6. From Example 2.2 we know that 1, 3, 7, 9, 13 ∈ A so the result holds for n ≤ 13.

Let us also define B to be the set of all natural numbers n for which there exists a Steiner triple system of order n having a subsystem of order 7. Clearly, we have B ⊆ A. We shall prove that B contains all numbers which are congruent to 1 or 3 modulo 6 and are greater than 13.

In doing so we shall make use of Theorem 3.5. Let us denote by E the set of all triples (r, n, m) of non-negative integers such that there exists a Steiner triple system of order n with a subsystem of order r and also there exists a Steiner triple system of order m. Theorem 3.5 can be then restated as:

(r, n, m) ∈ E ⇒ r + (n − r)m ∈ A, (∗)
(r, n, m) ∈ E and 0 < r < n, m > 1 ⇒ r + (n − r)m ∈ B. (†)

As we noted in Example 3.2, every Steiner triple system has subsystems of orders 0 and 1, and every Steiner triple system with at least three elements has a subsystem of order 3.

Now note that the following implications hold. If a natural number n ∈ B then there exists a Steiner triple system (X, B) of order n with a subsystem of order 7 so n > 0. Also, the triple (0, n, 3) ∈ E, as there always exists a subsystem of order 0. By part 2 of Theorem 3.5 with |X| = n, Y = ∅ and |Z| = 3 the Steiner triple system (U, E) of order 3n contains a copy of (X, B). Since (X, B) contains a subsystem of order 7, so does (U, E). So 3n ∈ B. We summarise this as

∀n, n ∈ B ⇒ 3n ∈ B. (5.1)

Also, n ∈ A with n > 1 implies that (1, n, 3) ∈ E, since every Steiner triple system has a subsystem of order 1. This in turn implies (by †) that 3n − 2 ∈ B. Summarising,

∀n > 1, n ∈ A ⇒ 3n − 2 ∈ B. (5.2)

Similarly, n ∈ A and n ≥ 3 implies that (3, n, 3) ∈ E, since every Steiner triple system with at least three elements has a subsystem of order 3. Therefore by †, if n > 3 then 3n − 6 ∈ B. Therefore

∀n > 3, n ∈ A ⇒ 3n − 6 ∈ B. (5.3)

Finally, if n ∈ B then by definition there exists a Steiner triple system of order n with a subsystem of order 7, so (7, n, 3) ∈ E, and so by † if n > 7 then 3n − 14 ∈ B. Therefore

∀n > 7, n ∈ B ⇒ 3n − 14 ∈ B. (5.4)

Now we can use these equations to prove the following implications:

∀k > 0, 6k + 1 ∈ A ⇒ 3(6k + 1) − 2 = 18k + 1 ∈ B, by (5.2)
∀k > 0, 6k + 3 ∈ A ⇒ 3(6k + 3) − 6 = 18k + 3 ∈ B, by (5.3)
∀k ≥ 0, 6k + 3 ∈ A ⇒ 3(6k + 3) − 2 = 18k + 7 ∈ B, by (5.2)
∀k ≥ 0, 6k + 3 ∈ B ⇒ 3(6k + 3) = 18k + 9 ∈ B, by (5.1)
∀k ≥ 0, 6k + 9 ∈ B ⇒ 3(6k + 9) − 14 = 18k + 13 ∈ B, by (5.4)
∀k ≥ 0, 6k + 7 ∈ A ⇒ 3(6k + 7) − 6 = 18k + 15 ∈ B, by (5.3).    (5.5)


Now let S be the set of natural numbers that are congruent to 1 or 3 modulo 6, that are greater than 13, and that are not members of B. Let n be the smallest number in S. Then every number t with t < n and t ≡ 1 (mod 6) or t ≡ 3 (mod 6) belongs to A; for if t > 13 then t ∈ B ⊆ A, while for t ≤ 13 this is Example 2.2. We consider the following cases, depending on the remainder n gives when divided by 18.

Case 1: n = 18k + 1 with k ≥ 1. From 6k + 1 < n it follows that 6k + 1 ∈ A, and hence n = 18k + 1 ∈ B, a contradiction.



Case 4: n = 18k + 9 with k ≥ 1. If k ≥ 2 then 13 < 6k + 3 < n, and hence 6k + 3 ∈ B, implying n = 18k + 9 ∈ B, a contradiction. For k = 1, we have n = 27 and

(1, 3, 13) ∈ E ⇒ 1 + 2 · 13 = 27 ∈ B.

Case 5: n = 18k + 13 with k ≥ 1. From 15 ≤ 6k + 9 < n it follows that 6k + 9 ∈ B, and hence n = 18k + 13 ∈ B, a contradiction.


This completes the proof of the theorem. □
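The bookkeeping of this induction can be replayed numerically. The sketch below is only a sanity check of the arithmetic, not a substitute for the proof: it starts from the orders known from Example 2.2 and from 27 ∈ B (Case 4 with k = 1), and then applies the six implications (5.5) to each admissible n in increasing order. The function name orders_in_B and the search limit are arbitrary choices.

```python
def orders_in_B(limit):
    """Replay the implications (5.5) for all admissible n with 13 < n <= limit."""
    A = {1, 3, 7, 9, 13}   # orders known from Example 2.2
    B = {27}               # 27 = 1 + (3 - 1) * 13, from (1, 3, 13) via (†)
    A |= B
    for n in range(15, limit + 1):
        if n % 6 not in (1, 3):
            continue
        k, rem = divmod(n, 18)
        premise = {
            1: 6 * k + 1 in A,    # (5.2)
            3: 6 * k + 3 in A,    # (5.3)
            7: 6 * k + 3 in A,    # (5.2)
            9: 6 * k + 3 in B,    # (5.1)
            13: 6 * k + 9 in B,   # (5.4)
            15: 6 * k + 7 in A,   # (5.3)
        }[rem]
        if premise:
            A.add(n)
            B.add(n)
    return B

reached = orders_in_B(2000)
admissible = {n for n in range(14, 2001) if n % 6 in (1, 3)}
print(admissible <= reached)  # True
```

Removing 27 from the initial set B makes the check fail at n = 27, which is why that value needs the separate argument in Case 4.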

5. Packings and coverings

We now consider what happens for those n for which there does not exist a Steiner triple system on n points. We do this by examining some families of subsets that are almost, but not quite, Steiner triple systems.

Definition 5.1. A (2, 3)-packing is a pair (X, B) where X is a set, and B is a set of triples from X, such that any two points of X are contained in at most one member of B. A (2, 3)-covering is a pair (X, B) where X is a set, and B is a set of triples from X, such that any two points of X are contained in at least one member of B.

Example 5.2. Let X = {1, 2, 3, 4, 5}. Then {123, 345} is a (2, 3)-packing of X, and {123, 345, 145, 245} is a (2, 3)-covering of X. Note that there are no Steiner triple systems on 5 points.

If (X, B) is a (2, 3)-packing and C ⊆ B then (X, C) is a (2, 3)-packing. Similarly, if (X, B) is a (2, 3)-covering and B ⊆ C, where C is a set of triples of X, then (X, C) is a (2, 3)-covering. We define p(n) to be the size of the largest possible set of triples for a (2, 3)-packing on n points. We define c(n) to be the size of the smallest possible set of triples for a (2, 3)-covering on n points.

Lemma 5.3. 1. p(n) ≤ n(n− 1)/6.

2. c(n) ≥ n(n− 1)/6.

3. Equality holds in either bound if and only if there exists a Steiner triple system of order n.

Proof. 1. Let (X, B) be a (2, 3)-packing with |X| = n and |B| = p(n). We count the number of elements of

{(Y, T) : Y ⊂ T, |Y| = 2, T ∈ B}

in two different ways. There are n(n − 1)/2 choices for Y, and each Y is contained in at most one T. There are p(n) triples, and each contains 3 pairs of points. So n(n − 1)/2 ≥ 3p(n), and the result follows.

2. This is the same as part 1, except that each pair of points belongs to at least one triple, so the inequality is reversed.

3. If equality holds in either bound then every pair of points must belong to exactly one triple. That is, the (2, 3)-packing or (2, 3)-covering is a Steiner triple system. Conversely, a Steiner triple system is both a (2, 3)-packing and a (2, 3)-covering. □

By Theorems 2.4 and 4.1, if n ≡ 1 or 3 mod 6 then p(n) = c(n) = n(n − 1)/6. For other values, p(n) is smaller than this bound and c(n) is larger. We now prove a general theorem that improves these inequalities. Recall that for x ∈ R, we write ⌊x⌋ for x rounded down to the nearest integer, and we write ⌈x⌉ for x rounded up.

Theorem 5.4. 1. p(n) ≤ ⌊(n/3)⌊(n − 1)/2⌋⌋.

2. c(n) ≥ ⌈(n/3)⌈(n − 1)/2⌉⌉.

Proof. 1. Let (X, B) be a (2, 3)-packing, and let x ∈ X. Each triple containing x must contain two other elements of X. Since x occurs at most once with each other element, there are at most (n − 1)/2 triples in B that contain x. However, the number of triples containing x is an integer, so we can round down to ⌊(n − 1)/2⌋.

Let us count the set of pairs {(x, T) : x ∈ T, T ∈ B} in two different ways. Firstly, there are n choices for x, and by the previous paragraph there are at most ⌊(n − 1)/2⌋ choices for T, given x. Secondly there are p(n) choices for T, and given T there are 3 choices for x. So

n⌊(n − 1)/2⌋ ≥ 3p(n) ⇒ p(n) ≤ (n/3)⌊(n − 1)/2⌋

and once again we can round this number down, giving the result.

2. The proof for this is similar, except that we round up. □

In general, unless n ≡ 1, 3 mod 6 (so that there exists a Steiner triple system on n points) the values of p(n) and c(n) will not attain the bounds given in the above theorem. We now give another example of when they are met.

Theorem 5.5. If n ≡ 0, 2 mod 6 then p(n) = n(n − 2)/6 = ⌊(n/3)⌊(n − 1)/2⌋⌋.

Proof. Since n is even, (n − 1)/2 is a fraction and hence ⌊(n − 1)/2⌋ = (n − 2)/2. Then either n or n − 2 is divisible by 3, and so n(n − 2)/6 is an integer and hence is the upper bound for p(n) given in Theorem 5.4.

On the other hand, if n ≡ 0, 2 mod 6 then n + 1 is congruent to 1 or 3 mod 6, and so there exists a Steiner triple system S on n + 1 points. This Steiner triple system has (n + 1)n/6 triples, and each point lies in n/2 triples. Remove one point from S, and all triples containing it. This gives a (2, 3)-packing on n points with

(n + 1)n/6 − n/2 = n(n − 2)/6

triples, as required. □

Example 5.6. Let n = 6. The finite projective plane with 7 points is a Steiner triple system on n + 1 points. If the triples are {123, 145, 167, 247, 256, 346, 357}, then we can remove point number 7 and all triples containing it to get a (2, 3)-packing on 6 points with triples {123, 145, 256, 346}. There are (6 × 4)/6 = 4 triples, and each point occurs in ⌊(6 − 1)/2⌋ = 2 triples, as required.
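The puncturing step of Theorem 5.5 and the bound of Theorem 5.4 can both be checked on this example. In the sketch below, packing_bound evaluates ⌊(n/3)⌊(n − 1)/2⌋⌋ in integer arithmetic; the function names are assumptions made for illustration.

```python
from itertools import combinations

def packing_bound(n):
    """Upper bound on p(n) from Theorem 5.4, computed with integer division."""
    return (n * ((n - 1) // 2)) // 3

def is_packing(blocks):
    """True iff no pair of points lies in two different triples."""
    pairs = [frozenset(p) for blk in blocks for p in combinations(sorted(blk), 2)]
    return len(pairs) == len(set(pairs))

fano = [{1, 2, 3}, {1, 4, 5}, {1, 6, 7}, {2, 4, 7}, {2, 5, 6}, {3, 4, 6}, {3, 5, 7}]
punctured = [blk for blk in fano if 7 not in blk]   # delete point 7 and its triples

print(len(punctured), packing_bound(6), is_packing(punctured))  # 4 4 True
```

Since n times an integer divided by 3 and then rounded down is the same as integer division by 3, no floating-point rounding enters the bound.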


6. Designs from perfect codes

In this section we establish a surprising connection between designs and perfect codes, which, in particular, enables us to prove that there does not exist a perfect 2-error correcting code in Z_2^90 (see the comment after Example 6.4 in Chapter 2).

Recall that a perfect e-error correcting code C ⊆ Z_2^n has minimum distance 2e + 1 and satisfies

|C| = 2^n / \sum_{i=0}^{e} \binom{n}{i}.

For any word w = x_1 ... x_n ∈ Z_2^n we define its support to be the set of all positions in which w has 1, i.e.

supp(w) = {i : x_i = 1}.

Also note that without loss of generality we may assume that all codes contain the zero-vector. Indeed, if a code C does not contain the zero-vector, we can take an arbitrary w_0 ∈ C, and consider a new code C_0 = C + w_0. It is clear that 0 = w_0 + w_0 ∈ C_0, that |C_0| = |C| and that the distances between the code-words are the same in C and C_0.

Theorem 6.1. Let C ⊆ Z_2^n be an e-error correcting perfect code, let X = {1, ..., n}, and let B be the set of all supports of non-zero code-words of minimal weight 2e + 1:

B = {supp(w) : w ∈ C, wt(w) = 2e + 1}.

Then (X, B) is a (v, b, r, k, λ)-design with

v = n,
b = \binom{n}{e+1} / \binom{2e+1}{e+1},
r = \binom{n−1}{e} / \binom{2e}{e},
k = 2e + 1,
λ = \binom{n−2}{e−1} / \binom{2e−1}{e−1}.

Proof. That the minimal weight of non-zero code-words is 2e + 1 follows from the assumption that 0...0 ∈ C and that C corrects e errors.

Let us prove that every (e + 1)-element subset of X is contained in a unique element of B. Indeed, let A ⊆ X with |A| = e + 1, and let u ∈ Z_2^n be the unique word (not codeword!) with supp(u) = A. From the assumption that C is perfect, and the proof of Theorem 6.1 in Chapter 2, it is clear that the balls

B(w, e) = {x ∈ Z_2^n : d(w, x) ≤ e} (w ∈ C)

must be disjoint and must cover Z_2^n. So there is a unique ball B(w, e) (w ∈ C) such that u ∈ B(w, e). Clearly, w ≠ 0...0, since d(u, 0...0) = e + 1 > e, so that

e ≥ d(u, w) = |(supp(u) ∪ supp(w)) \ (supp(u) ∩ supp(w))|
= |A ∪ supp(w)| − |A ∩ supp(w)|
≥ |supp(w)| − |A|
≥ 2e + 1 − e − 1 = e.

We conclude that |supp(w)| = 2e + 1, i.e. supp(w) ∈ B, and A ⊆ supp(w). If there existed another code-word w_1 ∈ C of weight 2e + 1 such that A ⊆ supp(w_1), it would follow that

d(w, w_1) ≤ d(w, u) + d(w_1, u) ≤ e + e < 2e + 1,

contradicting the assumption that C corrects e errors.


Now, there are exactly \binom{n}{e+1} subsets of X of size e + 1. Each of them is contained in a unique set from B. Moreover, each set from B contains exactly \binom{2e+1}{e+1} subsets of size e + 1. Therefore, B has exactly

b = \binom{n}{e+1} / \binom{2e+1}{e+1}

elements.

Next let x ∈ X be arbitrary. Then x is contained in exactly \binom{n−1}{e} subsets of X of size e + 1. Each of these subsets is contained in a unique set from B. Moreover, each set from B contains exactly \binom{2e}{e} subsets of size e + 1 which contain a fixed point. Therefore, x is contained in exactly

r = \binom{n−1}{e} / \binom{2e}{e}

sets from B.

By repeating the same argument once more, this time for pairs of points, we conclude that each pair of points is contained in exactly

λ = \binom{n−2}{e−1} / \binom{2e−1}{e−1}

sets from B. This completes the proof. □

Corollary 6.2. There does not exist a perfect 2-error correcting code C ⊆ Z_2^90.

Proof. Assume that such a code exists. Note that, in the notation of the previous theorem, we have n = 90 and e = 2. By the theorem it follows that the supports of all code-words of weight 5 form a (v, b, r, k, λ)-design, where the values of the parameters are given above. In particular, we have

λ = \binom{n−2}{e−1} / \binom{2e−1}{e−1} = \binom{88}{1} / \binom{3}{1} = 88/3,

which is not an integer; a contradiction. □
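The parameter formulas of Theorem 6.1 are simple to evaluate exactly, which makes the integrality obstruction in Corollary 6.2 visible at a glance. The sketch below keeps the quotients as exact fractions; the function name design_params is illustrative. The comparison case n = 7, e = 1 corresponds to a perfect 1-error correcting code of length 7, which is not discussed in this section and is included here only as a plausibility check.

```python
from fractions import Fraction
from math import comb

def design_params(n, e):
    """(v, b, r, k, lam) from Theorem 6.1, with b, r, lam kept as exact fractions."""
    b = Fraction(comb(n, e + 1), comb(2 * e + 1, e + 1))
    r = Fraction(comb(n - 1, e), comb(2 * e, e))
    lam = Fraction(comb(n - 2, e - 1), comb(2 * e - 1, e - 1))
    return n, b, r, 2 * e + 1, lam

v, b, r, k, lam = design_params(7, 1)
print(b, r, k, lam)             # 7 3 3 1
print(design_params(90, 2)[4])  # 88/3
```

For n = 7 and e = 1 the parameters are those of a Steiner triple system of order 7, while for n = 90 and e = 2 the value of λ fails to be an integer, as in the proof above.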
