97
The extended Golay code N. E. Straathof July 6, 2014 Master thesis Mathematics Supervisor: Dr R. R. J. Bocklandt Korteweg-de Vries Instituut voor Wiskunde Faculteit der Natuurwetenschappen, Wiskunde en Informatica Universiteit van Amsterdam

The extended Golay code - UvA · the encoding and decoding methods will be explained. Sections 2.5 and 2.4 discuss two subjects that we will need later on, since they are properties

  • Upload
    others

  • View
    3

  • Download
    0

Embed Size (px)

Citation preview

  • The extended Golay code

    N. E. Straathof

    July 6, 2014

    Master thesisMathematics

    Supervisor: Dr R. R. J. Bocklandt

    Korteweg-de Vries Instituut voor Wiskunde

    Faculteit der Natuurwetenschappen, Wiskunde en Informatica

    Universiteit van Amsterdam

  • Abstract

    This thesis discusses a type of error-correcting codes: the extended Golay code G24. Withthe help of general coding theory the characteristic features of this code are explained.Special emphasis is laid on its automorphism group, the group that acts on all codewordsand leaves the code unaltered. It is called the Mathieu group M24, which is one of thefew sporadic groups in the classification of all finite simple groups. The properties ofG24 and M24 are visualised by four geometric objects: the icosahedron, dodecahedron,dodecadodecahedron, and the cubicuboctahedron. The dodecahedron also providesus with a means to visualise the encoding and decoding processes, and it opts for analternative way of discussing the code in coding theory or programming courses.

    Title: The extended Golay codeAuthors: N. E. Straathof, [email protected], 6187501Supervisor: Dr R. R. J. BocklandtSecond grader: Dr H. B. PosthumaDate: July 6, 2014

    Korteweg-de Vries Instituut voor WiskundeUniversiteit van AmsterdamScience Park 904, 1098 XH Amsterdamhttp://www.science.uva.nl/math

    2

    http://www.science.uva.nl/math

  • Contents

    1. Introduction 5

    2. Error-correcting codes 62.1. Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62.2. Encoding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102.3. Decoding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122.4. Cyclic codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 162.5. Steiner systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19

    3. The extended Golay code G24 223.1. The extended Golay code . . . . . . . . . . . . . . . . . . . . . . . . . . . 223.2. The Golay code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 343.3. Decoding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35

    4. The Mathieu Group M24 394.1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 394.2. Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 404.3. Construction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 444.4. Multiple transitivity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 484.5. Simplicity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50

    5. Geometric constructions 535.1. Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 535.2. Icosahedron . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 555.3. Dodecahedron . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57

    5.3.1. Encoding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 585.3.2. Decoding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59

    5.4. Dodecadodecahedron . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 685.5. Cubicuboctahedron . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 715.6. Links . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79

    3

  • 6. Discussion 82

    A. Decoding patterns 84

    B. GAP verifications 92

    C. Popular summary 95

    Bibliography 96

    4

  • Chapter 1

    Introduction

    Communication is important in our daily life. We use phones, satellites, computers andother devices to send messages through a channel to a receiver. Unfortunately, mosttypes of communication are subject to noise, which may cause errors in the messagesthat are being sent. Especially when sending messages is a difficult or expensive task,for example in satellite communication, it is important to find ways to diminish theoccurrance of errors as much as possible. This is the central idea in coding theory: whatmessage was being sent given what we have received? To make this problem as easyas possible we use error-correcting codes. The main idea is to add redundancy to themessages which enables us to both identify and correct the errors that may have occured.

    This thesis discusses a specific type of error-correcting codes, the extended Golay codeG24, named after the Swiss mathematician Marcel J.E. Golay (1902 - 1989). He usedmathematics to solve real-world problems, one of which was the question of how to sendmessages from satellites through space. The extended Golay code was used for sendingimages of Jupiter and Saturn from the Voyager 1 and 2.

    Along with the extended Golay code we discuss a specific Mathieu group, M24, as it ishighly linked to the code. This group is named after the French mathematician ÉmileLéonard Mathieu (1835 - 1890).

    The last part of this thesis describes four geometric figures with which we can visualiseproperties of G24 and M24.

    Upon reading this thesis the reader schould be familiar with group theory, linear algebraand (hyperbolic) geometry.

    5

  • Chapter 2

    Error-correcting codes

    The transfer of information in general comes down to three steps: a source sends, achannel transmits, and a receiver receives. However, in many cases the possibility existsthat the information is altered by noise in the channel. For example, messages whichare sent from satellites through space have a high chance of being altered along the way.The question of how to enlarge the probability of receiving information as was intendedinitiated the study of error-correcting codes : codes which enable us to correct errors thathave occured. Figure 2.1 shows the general idea: a message is encoded into a codeword,it is sent to the receiver through a channel, in this channel the possibility exists thaterrors occur, and the receiver tries to obtain the original message by decoding the word.

    messagem

    codewordx

    received information

    r

    messagem

    encoding sending decoding

    Figure 2.1.: Using error-correcting codes.

    In section 2.1 we start by explaning the basic definitions, and in section 2.2 and 2.3the encoding and decoding methods will be explained. Sections 2.5 and 2.4 discuss twosubjects that we will need later on, since they are properties of the extended Golay code.

    2.1. Definitions

    We use error-correcting codes to understand what was sent to us, even though what wereceived might differ from what was sent. It consists of codewords which are the originalmessages with some amount of redundancy added. In this section we will discuss themost important properties of such codes which allows us to give a detailed description of

    6

  • the extended Golay in chapter 3.

    Definition 1. A message m of length k is a sequence of k symbols out of some finitefield F , so m = (m1 . . .mk) ∈ Fk. Then an n-code C over a finite field F is a set ofvectors in Fn, where n ≤ k. If |F| = q, we call the code q-ary, and if |F| = 2, we call itbinary.

    Since we will be dealing with a binary code only, we will assume codes are binary fromnow on. This means that F = F2 = {0, 1}, and thus that |Fn| = |Fn2 | = 2n.

    Definition 2. The error probability p is the probability that 0 is received when 1 wassent, or 1 is received when 0 was sent. In general we assume that p ∈ [0, 1

    2).

    Figure 2.2 illustrates a binary channel with error probability p.

    Figure 2.2.: Binary channel with error probability p.

    Definition 3. The Hamming weight wt(v) of a vector v ∈ Fn is the number of itsnonzero elements: wt(v) = #{i | vi 6= 0, i = 1, . . . , n}.

    Definition 4. The Hamming distance dist(v,w) of two vectors v,w ∈ Fn is the numberof places where they differ: dist(v,w) = #{i | vi 6= wi, i = 1, . . . , n}.

    The idea is that an n-code C is a strict subset of Fn in which we want the Hammingdistance between any two vectors to be as large as possible. Therefore, the minimumHamming distance is an important characteristic of the code.

    Definition 5. The mimimum Hamming distance d of a code C is defined as d =min{dist(x,y) | x,y ∈ C}.

    Definition 6. An (n,M, d)-code C is a set of M vectors in Fn2 , such that dist(x,y) ≥ dfor all x,y ∈ C. Its block length is n and M is its dimension.

    In general we know little about (n,M, d)-codes. What helps us is to add some structureto the set of vectors of which it consists.

    Definition 7. A linear [n,k,d]-code C is a k-dimensional subspace of Fn2 , such thatdist(x,y) ≥ d for all x,y ∈ C.

    7

  • Remark 8. The notational distinction between linear and non-linear codes is given by thebrackets: an [n, k, d]-code is an (n, 2k, d)-code. If x and y are codewords in C, then forall a, b ∈ F2 also ax + by ∈ C, since C is a subspace of Fn2 . This justifies the term ’linear’.Since dim(C) = k, we have |C| = 2k, but we still call k the dimension of C.Another useful property of linear codes is the following:

    Theorem 9. For a binary linear code C it holds that the minimum Hamming distance isequal to the minimum weight of any non-zero codeword.

    Proof. If x,y ∈ C then also w := x− y ∈ C, since the code is linear. So:

    d = min{dist(x,y) | x,y ∈ C,x 6= y}= min{wt(x− y) | x,y ∈ C,x 6= y}= min{wt(w) | w ∈ C,w 6= 0}.

    For each codeword x we can look at the set of all other codewords y that lie within acertain distance:

    Definition 10. The Hamming sphere St(x) of radius t around a vector x is the set ofvectors y ∈ Fn2 such that dist(x,y) ≤ t.

    Note that the number of vectors y at discance i from x is(ni

    ), since we have to choose i

    out of n places where x and y differ. This means that the number of vectors in St(x) isequal to

    ∑ti=0

    (ni

    ).

    Definition 11. The error-correcting capability t of C is the largest radius of Hammingspheres around all codewords of C, such that for any two different codewords y and ztheir corresponding Hamming spheres St(y) and St(z) are disjoint.

    The error-correcting capability gives you the number of errors that the code is able tocorrect, i.e. the number of digits in which a sent and received vector differ. Note thatthis is not the same as the error-detecting capability, which says how many errors thecodes is able to detect, but not neccesarily to correct.

    Theorem 12. For the error-correcting capability t of an (n,M, d)-code C it holds thatt = b1

    2(d− 1)c 1.

    Proof. Firstly suppose that t > b12(d− 1)c. If we take two codewords x,y ∈ C for which

    dist(x,y) = d, then St(x) ∩ St(y) 6= ∅, a contradiction. So, we must have t ≤ b12(d− 1)c.Now let t < b1

    2(d− 1)c. Then for any two different codewords x,y ∈ C and any vector

    v ∈ St(x) ∩ St(y), we have by the triangle inequality:

    dist(x,y) ≤ dist(x,v) + dist(v,y)1bsc denotes the largest integer not greater than s, for any real valued number s.

    8

  • <

    ⌊1

    2(d− 1)

    ⌋+

    ⌊1

    2(d− 1)

    ⌋≤ d− 1,

    which contradicts the fact that the minimum Hamming distance is d. Hence if t =b12(d−1)c then it is the largest radius such that the Hamming spheres of any two different

    codewords are disjoint.

    Codewords are vectors in Fn2 , so we are able to obtain another vector from a codewordx by permuting its n digits. If we permute the position of the digits only, we say thatthe resulting codeword y is obtained from a positional permutation. If we permute thesymbols at specific places, we say that it is obtained from a symbolic permutation.

    Example 13. Suppose we have a [4, 2, 2]-code C = {(0000), (0011), (1100), (1111)}. Ifwe look at the following sets:

    C1 = {(0000), (1001), (0110), (1111)},C2 = {(0000), (0101), (1010), (1111)},C3 = {(1000), (1011), (0100), (0111)}, andC4 = {(0110), (0101), (1010), (1001)},

    then we see that C1 and C2 are obtained from C by performing a positional permutation:C1 from shifting all digits one place to the right (and thus the fourth digit moves toposition 1), C2 from interchanging the digits on positions 2 and 3. C3 and C4 are obtainedfrom C by performing a symbolic permutation: C3 from changing the first digit and C4from changing the last three digits.

    If we perform the same permutations on all codewords of a code C, the resulting setalso is a code. Luckily, performing either kind of permutations does not alter the setof Hamming distances of the codewords, so the parameters of the code are preserved.Therefore, we can view such codes as being ’the same’:

    Definition 14. Two codes are called equivalent if one can be obtained from the other byperforming a sequence of positional and/or symbolic permutations.

    In the case where C is a linear code we have to check whether the linearity is preserved.Obviously positional permutations do not cause any trouble, but in example 13 we seethat (1010) + (1001) = (0011) 6∈ C4, hence:

    Definition 15. Two codes are called linearly equivalent if one can be obtained from theother by performing a sequence of positional permutations.

    The next section describes how we can turn messages into codewords.

    9

  • 2.2. Encoding

    Preferably we want the construction of a code, i.e. the choice of redundancy that we addto messages, to be a simple function f : Fk2 → Fn2 . For linear codes this exactly is thecase: the function f sends any message m to a vector x ∈ Fn2 , where xH t = 0 for somen− k × n-binary matrix H. Obviously there are several vectors x for which this holds,but in order to obtain one vector x for each message m we set (x1 . . . xk) = (m1 . . .mk),for then:

    xH t = (m1 . . .mk xk+1 . . . xn)Ht

    = mH tk + (xk+1 . . . xn)Htn−k,

    where Hk consists of the first k columns of H and Hn−k consists of its last n− k columns.So:

    xH t = 0⇔mH tk = (xk+1 . . . xn)H tn−k, (2.1)

    which determines the last n − k digits of x uniquely. If we now let H be of the formH = (A | 1n−k), where A is some n− k × k matrix, then equation 2.1 becomes:

    xH t = 0⇔mAt = (xk+1 . . . xn). (2.2)

    The matrix H is called the parity check matrix of our code, and if it is of the form asin 2.2 then it is said to be in normal position.

    Example 16. For a linear code C with n = 6 and k = 3 let H = (A | 13) be the paritycheck matrix with:

    A =

    0 1 11 0 11 1 0

    .If the message m is (011), then we decode it into a codeword as follows: first we set(x1 x2 x3) equal to (011), and second we determine (x4 x5 x6) by putting xH

    t = 0. Asin equation 2.2 this gives us:

    mAt = (011)

    0 1 11 0 11 1 0

    = (011) = (x4 x5 x6).Hence, the codeword is (011011).

    Now we can strengthen our original definition of a linear code:

    Definition 17. An [n, k, d]-code C with parity check matrix H = (A | 1n−k) consists ofall vectors x ∈ Fn2 such that xH t = 0, where A is a some n− k × k binary matrix. Thevectors x are called codewords of C.

    10

  • For any codeword x of a linear code with parity check matrix H in normal position, wecall its first k digits message symbols, and its last n− k digits check symbols. The checksymbols form the earlier mentioned redundancy that is added to messages.

    Example 18. A repetition code C is an [n, 1, n]-code with parity check matrix H = (A |1n−1), where A = (1 . . . 1). If a message m = (0), then the equation (0)A

    t = (x2 . . . xn)gives us x = 0, and if m = (1) then (1)At = (x2 . . . xn) implies x = 1. Hence, C = {0,1}.In both codewords the message symbol is repeated n times, which explains the name ofthe code.

    An equivalent way of encoding messages into codewords of a linear code C is by using agenerator matrix G. Since all linear combinations of codewords in C are again codewords,we can find a basis for C. The rows of G then are the vectors of this basis. In example 16the set B = {(100011), (010101), (001110)} forms a basis: the three vectors are linearlyindependent and |B| = k = 3. For example the codeword (111000) is obtained by addingall three codewords in B. Hence, in this case:

    G =

    1 0 0 0 1 10 1 0 1 0 10 0 1 1 1 0

    .Definition 19. An [n, k, d]-code C with generator matrix G consists of all vectors x ∈ Fn2such that x = mG for all messages m ∈ Fk2.

    Remark 20. Note that from equation 2.2 it follows that x = m(1k | At), so the linearcode with parity check matrix H = (A | 1n−k) is equal to the linear code with generatormatrix G = (1k | At).Another way of viewing how G and H are related is by computing the dual code:

    Definition 21. If we have a linear code C, then the dual code C⊥ consist of all vectorsv ∈ Fn2 such that v · x = v1x1 + . . .+ vnxn = 0, for all x ∈ C.

    We can see that if y, z ∈ C⊥, then x · y = x · z = 0 for all x ∈ C. So, x · (ay + bz) = 0 forall a, b ∈ F2 which implies that also C⊥ is a linear code. In general, the dimension of C⊥is n− k, but its minimum Hamming distance has to be computed seperately for each C.If we now have generator matrices G of C and H of C⊥, then ofcourse HGt = GH t = 0.Hence, H is the parity check matrix of C and G is the parity check matrix of C⊥.

    Until now we assumed H and G are in normal position. Codes that admit such G andH are called systematic codes. However, as long as H has n columns and n− k linearlyindependent rows, it is a parity check matrix. Likewise, for G to be a generator matrix itsuffices to have n columns and k linearly independent rows. Moreover, permuting rowsin G will give the same code, and permuting the colums will give an equivalent code.Luckily, we can always work with systematic codes by the following theorem:

    Theorem 22. For every linear code C there exist matrices H and G that are in normalposition, such that the linear code C ′ with parity check matrix H and generator matrix Gis linearly equivalent to C.

    11

  • Proof. Suppose C is an [n, k, d]-code with generator matrix G (that is not necessarily innormal position). For any 1 ≤ i ≤ k, we can firstly interchange rows to ensure that theleftmost non-zero element starting from row i is in row i. Next we add row i to the rowsabove and below it in order to obtain zero’s in the column where the first non-zero entryappears. This means that we now have obtained a k × n-matrix G′ that is in reducedechelon form, so G′ = MG where M is an invertible k × k-matrix. Hence, G′ has ncolumns and k linearly independent rows, so it is a generator matrix for C. All that wehave left to do is permuting G′s columns so that it is in normal position. If pi is theposition of the first 1 in row i, then we shift the columns such that column pi becomescolumn i, and we are done.

    The reason why we want to encode in this manner, is that the redundancy we add to themessage by multiplying it with a generator matrix allows us to detect some errors thatmight occur, and even correct them. This will be explained in the next section.

    2.3. Decoding

    Noise in the channel through which we send our codeword x gives the possibility thatthe received vector r and x are different. The error vector e is then defined as e = r− x.If the error probability is p, then ofcourse ei = 1 with probability p for i = 1, . . . , n.

    By deciding what message was sent upon receiving r, we want to know what codewordx was sent, and thus what is the error vector e. We will see how the encoding methoddescribed in the previous section is preferable to solve this, by looking at the syndromeof a received vector r:

    Definition 23. The syndrome S(r) of a received vector r ∈ Fn2 is the vector rH t ∈ Fn−k2 ,where H is the parity check matrix of the used linear code C.

    By definition S(x) = 0 if and only if x ∈ C, so:

    S(r) = rH t

    = (x + e)H t

    = xH t + eH t

    = eH t.

    Hence, the syndrome of a received vector is equal to the syndrome of its error vector,and instead of looking at the received vector we can focus on its syndrome. Moreover:

    eH t =n∑i=1

    ei(Ht)i,

    where (H t)i is the ith row of H t, so the syndrome of a received vector is equal to the sum

    of the columns of H where the error occured. Finally, note that there is a one-to-one

    12

  • correspondence between syndromes and cosets of a linear code C, since two codewordsx and y are in the same coset of C if and only if x − y ∈ C, which in turn happens ifand only if (x− y)H t = 0, i.e. xH t = yH t. So, syndromes form a partition of the code.The decoding therefore includes making a list of all possible syndromes of a receivedvector. There might be several error vectors that belong to one syndrome, but since p < 1

    2

    the error vector with the smallest Hamming weight is the one which is most likely to occur.

    To summarize, the method for decoding a vector r when an [n, k, d]-code was used, iscommonly referred to as the standard array, and it goes as follows:

    Algorithm 24. The standard array.

    (i) Make a list of all possible messages and their corresponding codewords.

    (ii) Make a list of all 2n−k syndromes and their corresponding error vectors. If morethan one error vectors belongs to a syndrome, you pick the one(s) with leastHamming weight. You do this by using the fact that a syndrome of a vector is thesum of the columns in the parity check matrix H of C where the error(s) occured.

    (iii) For any received vector r you compute its syndrome, and check in the list whicherror vector e corresponds to it. Then the original codeword most likely wasx = r−e, and hence its first k digits are the message. If more than one error vectorbelongs to a syndrome then they are equally likely, so you have found codewordswhich are all equally likely to have been sent.

    Note that for all error vectors e with Hamming weight t = b12(d−1)c, this standard array

    will give you a solution. For larger Hamming weight of the error vector the possibilityexists that the received vector will be decoded incorrectly.

    Example 25. Let C be the [5, 2, 3]-code with parity check matrix H:

    H =

    1 0 1 0 01 1 0 1 00 1 0 0 1

    .We follow the standard array as in algorithm 24. First we make a list of all possiblemessages and their corresponding codewords:

    message codeword

    (00) (00000)(01) (01011)(10) (10110)(11) (11111)

    Then we compute the 25−2 = 8 syndromes and their corresponding error vectors with thesmallest Hamming weight:

    13

  • syndrome error vector

    (000) (00000)(100) (00100)(010) (00010)(001) (00001)(110) (10000)(101) (00101), (11000)(011) (01000)(111) (00111), (11010)

    For example the syndrom (100) corresponds to the third column of H, so our error vectoris (00100). Likewise, the syndrome (101) is both the sum of the third and the fifthcolumn of H, and the sum of the first and the second column of H. The error vectors(00101) and (11000) have Hamming weight 2, so we both include them in our list.Now suppose we have received r = (11011). Then its syndrome S(r) is:

    S(r) = rH t

    = (11011)

    1 1 00 1 11 0 00 1 00 0 1

    = (110),

    so in the above table we we find that e = (100000), so x = r− e = (01011). Hence, ouroriginal message most likely was m = (01).

    Next we will define the most desirable type of codes: the ones for which no ambiguityabout which error vector to choose can arise. Suppose that C is an [n, k, d]-code. Theneach Hamming sphere St(x) around a codeword x ∈ C contains

    ∑ti=0

    (ni

    )vectors, and

    if we take t = b12(d − 1)c, then by theorem 12 for different x,y ∈ C it holds that

    St(x) ∩ St(y) = ∅. Now since #{St(x) | x ∈ C} = #C = 2k, we have:

    #(∪x∈CSt(x)) = 2kt∑i=0

    (n

    i

    )≤ 2n,

    and we obtain the following inequality, which is called the sphere packing boundary :

    t∑i=0

    (n

    i

    )≤ 2n−k. (2.3)

    As mentioned before we want a code to contain those codewords such that the probabilitythat the receiver will be able to determine what message was sent is as large as possible.

    14

  • On the other hand, we also want the codewords not to be too large, since then thecommunication will take up much time. Hence, for an [n, k, d]-code we want to increaseboth the ratio d

    nand the rate of efficiency R = k

    n. The sphere packing boundary is one

    of the few inequalities that gives a relation between these parameters. Preferably, wewant it to be an equality, since then the disjunct Hamming spheres cover up the wholeof the code. If this is the case, we call the code perfect:

    Definition 26. An [n, k, d]-code C with t = b12(d− 1)c is perfect if and only if:

    t∑i=0

    (n

    i

    )= 2n−k. (2.4)

    Theorem 27. An [n, k, d]-code C is perfect if and only if for every possible syndromethere is a unique error vector e with wt(e) ≤ t = b1

    2(d− 1)c.

    Proof. For a codeword x we know that all error vectors inside St(x) have a differentsyndrome, since the error-correcting capability is t. The number of vectors inside St(x)is equal to

    ∑ti=0

    (ni

    ), and the number of syndromes is 2n−k. By the pigeonhole principle

    it holds that if #St(x) is equal to the number of different syndromes, then we must havethat every syndrome has a unique error vector e of Hamming weight less than or equalto t. Since the definition of a perfect code requires that

    ∑ti=0

    (ni

    )= 2n−k, we have the

    desired result.

    Note that linear codes with even d can never be perfect: if d is even then there arevectors in Fn2 which are exactly at distance

    12d from two different codewords, so the error

    vectors of different codewords cannot be unique.

    Example 28. The binary linear code C of example 18 is perfect if and only if n is odd:

    • If C is perfect, then d is odd by the above comment, and hence n is odd.

    • If n is odd, say n = 2m+ 1 for some m ∈ N ∪ {0}, then:

    t∑i=0

    (n

    i

    )=

    t∑i=0

    1

    2

    ((n

    i

    )+

    (n

    n− i

    ))

    =1

    2

    12(n−1)∑i=0

    ((n

    i

    )+

    (n

    n− i

    ))=

    1

    2

    n∑i=0

    (n

    i

    )=

    1

    22n

    = 2n−1,

    so by definition 26 C is perfect.

    15

  • Perfect codes are desirable but, unfortunately, there are not many of them. An importantexample of a perfect code is the [23, 11, 7] Golay code, which is closely related to the[24, 12, 8] extended Golay code. In the next sections we will discuss two more topics thatenable us to describe these codes in full detail in chapter 3.

    2.4. Cyclic codes

    Linearity imposes quite some restrictions on the parameters that we want to use for ourcodes. Cyclic codes form another type of codes that allow more flexibility and thereforeare widely used. As the Golay codes too are cyclic, this section discusses the propertiesof such codes.

    Definition 29. An [n, k, d]-code C is cyclic if and only if (x1 . . . xn) ∈ C ⇒ (xn x1 . . . xn−1) ∈C, for all x ∈ C.

    Note that (x1 . . . xn) · (y1 . . . yn) = (xn x1 . . . xn−1) · (yn y1 . . . yn−1), so if C is cyclic thenC⊥ is too. Obviously, if the rows of a generator matrix of a code are cyclic shifts of eachother, this code is cyclic.

    If we look at the polynomial ring Fn2 [X], we see that with every vector r ∈ Fn2 we canassociate a polynomial r(X) as follows:

    (r0 . . . rn−1) 7→ r(X) = r0 + r1X + . . .+ rn−1Xn−1.

    Cyclicity of a linear code C implies that (rn−1 r0 . . . rn−2) ∈ C too, but the associatedpolynomial then is r0X + . . . + rn−1X

    n, which we cannot associate with a codewordanymore. We solve this by working with the quotient ring R = Fn2 [X]/〈Xn − 1〉, sothat multiplying with X in R corresponds to a cyclic shift in Fn2 . The association of apolynomial to a vector then becomes the map f :

    f : Fn2 → Fn2 [X]/〈Xn − 1〉(r0 . . . rn−1) 7→ r(X) mod (Xn − 1).

    Definition 30. An ideal I of Fn2 [X]/〈Xn − 1〉 is a linear subspace of Fn2 [X]/〈Xn − 1〉such that if r(X) ∈ I, then so is r(X) ·X. If in addition I is generated by one polynomialof Fn2 [X]/〈Xn − 1〉, then I is called a principal ideal.

    We can easily see that the image of a cyclic [n, k, d]-code C under f is a subset ofFn2 [X]/〈Xn−1〉 that is closed under addition (linearity) and under multiplication with X

    (cyclicity). Hence, it is closed under multiplication with any polynomial modulo (Xn−1),and C is an ideal of Fn2 [X]/〈Xn − 1〉. If C is principal, then its generator polynomialg(X) plays a similar role as the generator matrix of a linear code, as we will see in thenext theorem.

    16

  • Theorem 31. Let C be any non-zero ideal in Fn2 [X]/〈Xn−1〉, i.e. a cyclic code of lengthn. Then:

    (i) C = 〈g(X)〉, where g(X) is a unique monic polynomial of minimal degree r in C,

    (ii) g(X) is a factor of Xn − 1,

    (iii) If g(X) = g0 + g1X + . . .+ grXr, then C has a generator matrix G which is given

    by:

    G =

    g(X)

    g(X) ·X. . .

    g(X) ·Xn−r−1

    .Proof. Let g(X) be a non-zero polynomial in Fn2 [X]/〈Xn − 1〉 with minimal degree r.

    (i) Suppose g’(X) too is a polynomial of degree r in C. Because an ideal is a linearsubspace of Fn2 [X]/〈Xn − 1〉, also g’(X) − g(X) ∈ C. However, g’(X) − g(X)has degree lower than r, a contradiction unless g’(X) = g(X), so g(X) is unique.If now f(X) is an arbitrary polynomial in Fn2 [X]/〈Xn − 1〉 then we can writef(X) = h(X)g(X) + r(X), where deg(r(X)) < r and h(X) ∈ Fn2 [X]/〈Xn − 1〉.However, then r(X) = f(X)− h(X)g(X) ∈ C, and since g(X) has minimal degreewe must conclude that r(X) = 0. Hence, any polynomial in C is divisible by g(X)so C = 〈g(X)〉. Since g(X) is only defined up to multiplication with a constant wemay assume that it is monic.

    (ii) Write Xn − 1 = q(X)g(X) + r(X) ∈ Fn2 [X], where deg(r(X)) < r. By thesame argument as in (i) we must conlcude that r(X) = 0 in Fn2 [X]/〈Xn − 1〉, soXn − 1 = q(X)g(X). Hence, g(X) is a factor of Xn − 1.

    (iii) From (i) we know that we can write any polynomial f(X) ∈ C as f(X) = h(X)g(X),and from (ii) we know that Xn − 1 = q(X)g(X). So in Fn2 [X]/〈Xn − 1〉:

    f(X) = h(X)g(X)

    = h(X)g(X) + p(X)(Xn − 1) for any p(X) ∈ C= h(X)g(X) + p(X)q(X)g(X)

    = (h(X) + p(X)q(X))g(X)

    = c(X)g(X),

    where deg(c(X)) ≤ n − r − 1. So any f(X) ∈ C can be written uniquely asf(X) = c(X)g(X). Since we have n − r linearly independent multiples of g(X),namely g(X),g(X) ·X, . . . ,g(X) ·Xn−r−1, and the dimension of C is n− r, we seethat C is generated as a subspace of Fn2 by the rows of G.

    17

  • Example 32. Suppose C is a cyclic [7, 4, 3]-code, with generator polynomial g(X) =1 +X +X3. Then n = 7 and r = 3, so n− r − 1 = 3 and we have:

    G =

    g(X)

    g(X) ·Xg(X) ·X2

    g(X) ·X3

    =

    1 +X +X3

    X +X2 +X4

    X2 +X3 +X5

    X3 +X4 +X6

    ,which can be expressed as the generator matrix for which each row has a 1 on thepositions that correspond to the powers of the polynomial in that row, e.g. 1 +X +X3

    gives the row (1101000). So:

    G =

    1 1 0 1 0 0 00 1 1 0 1 0 00 0 1 1 0 1 00 0 0 1 1 0 1

    .A specific type of cyclic codes are quadratic residue (QR-)codes. These are codes over Fpq ,where p is prime and q is a quadratic residue modulo p, i.e. q = i2 for some i = 1, . . . , p−1.If q = 2, the QR-code is called binary. If we let Q be the set of quadratic residues modulop and let N = Fpq\(Q ∪ {0}), then:

    Definition 33. A QR-code C over Fpq is the cyclic code with generator polynomialg(X) ∈ Fq[X]/〈Xp − 1〉, given by:

    g(X) =∏r∈Q

    (X − αr),

    where α is a primitive pth root of unity in the splitting field of Xp − 1.

    Note that:

    Xp − 1 = (X − 1)∏r∈Q

    (X − αr)∏r∈N

    (X − αr) (2.5)

    = (X − 1)g(X)∏r∈N

    (X − αr). (2.6)

    Finally, we move on to the our last subject that enables us to discuss the Golay codes:Steiner systems.

    18

  • 2.5. Steiner systems

    Definition 34. Let t, k, v be positive integers such that t ≤ k ≤ v. Let X be a set thatconsists of v points, where we call a subset of k points a block. Then a Steiner systemS(t, k, v) is a collection of disctinct blocks such that any subset of t points is contained inexactly one block.

    Example 35. Let X be the projective plane of order 2 that consists of 7 points, seefigure 2.3. We call a line through 3 points a block. Then X is a Steiner system S(2, 3, 7),since the total number of points is 7, a block is a subset of 3 points, and any subset of 2points is contained in exactly one block.

    Figure 2.3.: The projective plane X.

    Definition 36. Let p1, . . . , pk be the points in a block of a Steiner system S(t, k, v), andlet λij be the number of blocks which contain p1, . . . , pi but do not contain pi+1, . . . , pj for0 ≤ i ≤ j ≤ k. If λij is constant, i.e. does not depend on the choice of p1, . . . , pj, thenλij is called a block intersection number.If i = 0, then λij is the number of blocks that do not contain p1, . . . , pj, and if i = j thenλij is the number of blocks that do contain p1, . . . , pi.

    Theorem 37. Let t, k, v be integers for which a Steiner system S(t, k, v) exists. Then:

    (i) λij = λi,j+1 + λi+1,j+1 for all 0 ≤ i ≤ j ≤ k (the Pascal property),

    (ii) Let p1, . . . , pt be any t distinct points and let λi be the number of blocks that containp1, . . . , pi for i ≤ t, where λ0 is the total number of distinct blocks. Then λi isindepent of the choice of the i points out of {p1, . . . , pt}, and for all i ≤ t it holdsthat:

    λi =

    (v−it−i

    )(k−it−i

    ) ,(iii) λii = λi for all i < t, and λii = 1 for all i ≥ t.

    19

  • Proof. (i) Let 0 ≤ i ≤ j ≤ k. Then λij is equal to the number of blocks that containp1, . . . , pi but do not contain pi+2, . . . , pj+1 by definition of the block intersectionnumber. This set of blocks consists of blocks that contain pi+1 and those that donot: the number of blocks that do contain pi+1 is equal to λi+1,j+1 and the numberof blocks that do not is equal to λi,j+1. Hence, λij = λi,j+1 + λi+1,j+1.

    (ii) The proof of this step goes by induction on i. Firstly, let i = t. Then by definitionof a Steiner system λt is independent of the choice of p1, . . . , pt and it is equal to 1.Moreover: (

    v−tt−t

    )(k−tt−t

    ) = (v−t0 )(k−t0

    ) = 11

    = 1,

    so the statement holds for i = t. Now suppose it holds for some i + 1. For eachblock B that contains p1, . . . , pi and each point q that is different from p1, . . . , piwe define:

    φ(q, B) :=

    {1 if q ∈ B0 if q 6∈ B.

    Then by the induction hypothesis we have:∑q

    ∑B

    φ(q, B) = λi+1(v − i),

    since λi+1 is the number of blocks containing p1, . . . , pi and q, and v − i is thenumber of choices for q. Also:∑

    B

    ∑q

    φ(q, B) = λi(k − i),

    since λi is the number of blocks containing p1, . . . , pi and k − i is the number ofchoices to choose q ∈ B. Hence, λi is independent of the choice of p1, . . . , pi and:

    λi = λi+1v − ik − i

    =

    (v−i−1t−i−1

    )(k−i−1t−i−1

    ) v − ik − i

    =

    (v−i−1t−i−1

    )(k−i−1t−i−1

    ) v−it−ik−it−i

    =

    (v−it−i

    )(k−it−i

    ) .(iii) By definition λii = λi for all 0 ≤ i ≤ t (here λ00 is interpreted as the number of

    blocks that ’do not contain nothing’, so it equals the total number of blocks). In(ii) we have seen that if i = t then λt = 1, which immediately implies that λi = 1for all i > t as well.

    20

  • Remark 38. Suppose that from any Steiner system S(t, k, v) we pick a point p. Thenwe can divide the blocks into two sets: the λ01 blocks that do not contain p, and the λ1blocks that do contain p. If we now omit p from our system, then we can easily see thatthe λ1 sets of k − 1 points form the blocks of a Steiner system S(t− 1, k − 1, v − 1) withblock intersection numbers λi+1,j+1. Hence, if a Steiner system S(t, k, v) exists, then sodoes a Steiner system S(t− 1, k − 1, v − 1).By theorem 37(i) we can make a Pascal triangle of the block intersection numbers asfollows:

    λ0λ01 λ1

    λ02 λ12 λ2· · · · · · · · ·

    Example 39. The Pascal triangle of the block intersection numbers of figure 2.3 is:

    74 3

    2 2 10 2 0 1

    Now that we have explored error-correcting codes in general we can move on to the Golaycodes.

    21

  • Chapter 3

    The extended Golay code G24

    The extended Golay code was used for sending messages through space. This is becauseit is useful in situations where there is a high risk of noise in the channel, and whensending messages is difficult or expensive.

    We will discuss the most imporant features of the extended Golay code G24 in section 3.1.In section 3.2 we discuss another Golay code, G23, which is linked to G24 and which wewill need later on. Section 3.3 discusses how the decoding works for both codes.

    3.1. The extended Golay code

    Definition 40. The extended Golay code G24 is the binary linear code with generatormatrix G = (112 | B), where B is given by:

    B =

    1 1 0 1 1 1 0 0 0 1 0 11 0 1 1 1 0 0 0 1 0 1 10 1 1 1 0 0 0 1 0 1 1 11 1 1 0 0 0 1 0 1 1 0 11 1 0 0 0 1 0 1 1 0 1 11 0 0 0 1 0 1 1 0 1 1 10 0 0 1 0 1 1 0 1 1 1 10 0 1 0 1 1 0 1 1 1 0 10 1 0 1 1 0 1 1 1 0 0 11 0 1 1 0 1 1 1 0 0 0 10 1 1 0 1 1 1 0 0 0 1 11 1 1 1 1 1 1 1 1 1 1 0

    .

    Remark 41. In general G24 refers to any of the linear codes that are linearly equivalentto the one in definition 40. For example, we can find a generator matrix G for G24 whoserows all have Hamming weight 8.

    22

  • Firstly notice that B2 = 112 and Bt = B, which means that G is a parity check matrix

    itself and H is a generator matrix too. We can easily verify that any sum of two rowsof G has Hamming weight 8. If we multiply G with its transpose, we obtain GGt = 0,so r · r’ =

    ∑ni=1 rir

    ′i = 0 for each two rows r, r’ of G. This implies that x · y = 0 for

    all x,y ∈ G24, so G24 is contained in its dual. However, since H is obtained from G bypermuting columns, we have that G24 and G

    ⊥24 are linearly equivalent, so they have the

    same dimension. Hence, G24 = G⊥24. Moreover, we must have that #{i | xi = yi} is even

    for all x,y ∈ G24. Now if we assume both wt(x) and wt(y) are divisible by 4 then:

    wt(x + y) = #{i | xi + yi = 1}= #{i | (xi, yi) ∈ {(1, 0), (0, 1)}}= #({i | xi = 1}\{i | xi = yi = 1}) + #({i | yi = 1}\{i | yi = xi = 1})= wt(x) + wt(y)− 2wt(x · y)= wt(x) + wt(y),

    so wt(x + y) again is divisible by 4. We can easily check that each row of G has weight 8or 12. Combining all this immediately proves the following lemma:

    Lemma 42. The Hamming weight of each codeword of G24 is divisible by 4.

    Proof.

    We can permute the columns of G to obtain a linearly equivalent code, which by remark 41gives us ’another’ Golay code G24. We take G = (L | R), with:

    L =

    1 1 0 0 0 0 0 0 0 0 0 01 0 1 0 0 0 0 0 0 0 0 01 0 0 1 0 0 0 0 0 0 0 01 0 0 0 1 0 0 0 0 0 0 01 0 0 0 0 1 0 0 0 0 0 01 0 0 0 0 0 1 0 0 0 0 01 0 0 0 0 0 0 1 0 0 0 01 0 0 0 0 0 0 0 1 0 0 01 0 0 0 0 0 0 0 0 1 0 01 0 0 0 0 0 0 0 0 0 1 01 0 0 0 0 0 0 0 0 0 0 10 0 0 0 0 0 0 0 0 0 0 0

    , and

    23

  • R =

    0 1 1 0 1 1 1 0 0 0 1 00 1 0 1 1 1 0 0 0 1 0 10 0 1 1 1 0 0 0 1 0 1 10 1 1 1 0 0 0 1 0 1 1 00 1 1 0 0 0 1 0 1 1 0 10 1 0 0 0 1 0 1 1 0 1 10 0 0 0 1 0 1 1 0 1 1 10 0 0 1 0 1 1 0 1 1 1 00 0 1 0 1 1 0 1 1 1 0 00 1 0 1 1 0 1 1 1 0 0 00 0 1 1 0 1 1 1 0 0 0 11 1 1 1 1 1 1 1 1 1 1 1

    .

    We label the 12 columns of L from left to right as l∞, l1, . . . , l11, and the columns of Ras r∞, r1, . . . , r11. We see that the columns l1, . . . , l11, r∞ form the identity matrix 112,and columns r1, . . . , r11, l∞ form our original matrix B. This enables us to deduce thefollowing facts about G24:

    Theorem 43. G24 is invariant under the permutation τ = (l∞r∞)(l0r0)(l1r10)(l2r9) · · · (l10r1).

    Proof. If g0, . . . , g11 are the rows of G = (L | R), then:

    τ(g0) = τ(110000000000011011100010)

    = (010100011101110000000000)

    = g0 + g1 + g3 + g4 + g5 + g9 + g11,

    which is in G24 since it is a linear code. In the same manner we can show that τ(gi) is asum of certain rows of G24 for i = 1, . . . , 10. Finally:

    τ(g11) = τ(000000000000111111111111)

    = (111111111111000000000000)

    = g⊥11,

    which a row of G⊥24 which we have shown to be equal to G24.

    Corollary 44. If we denote a codeword x of G24 by (xL | xR), where xL and xR consistof the first and last twelve digits of x respectively, then theorem 43 states that alsox’ = (x’L | x’R) is in G24, where wt(xL) = wt(x’R) and wt(xR) = wt(x’L).

    For any message m we have a codeword x = mG = (xL | xR), where wt(xL) = wt(xR) ≡ 0mod 2, since the weight of each row of L and the weight of each row of R is even. Thisleads to the following result:

    Theorem 45. No codeword of G24 can have Hamming weight 4.

    Proof. Suppose x is a codeword of G24 of weight 4. Then we have three possibilities:wt(xL) = 0, 2 or 4. We use generator matrix G = (L | R) for G24 that is given inlemma 42 to check these options.

    24

  • (i) If wt(xL) = 0, then for the original message m it holds that m1 = . . . = m11 = 0,and m12 can either be 0 or 1. If m12 = 0, then wt(xR) = 0, and if m12 = 1, thenwt(xR) = 12. In both cases wt(x) 6= 4, a contradiction.

    (ii) If wt(xL) = 2, then for the original message m there are two cases. In case 1) itholds that mi = 1 for some i = 1, . . . , 11 where all other digits are 0 and m12 iseither 0 or 1. Both when m12 = 0 and m12 = 1 we can easily verify that wt(xL) = 6.In case 2) it holds that mi = mj = 1 for some i, j = 1, . . . , 11 i 6= j, where all otherdigits are zero, and m12 is either 0 or 1. Again for either m12 we have that wt(xR)= 6. So all possibilities give us wt(x) = 8, a contradiction.

    (iii) If wt(xL) = 4, then wt(xR) must be equal to 0 and by corollary 44 we havex’ = (x’L | x’R) ∈ G24, where wt(x’L) = 0. However, this is case (i) so again wehave a contradiction.

    Corollary 46. No codeword of G24 can have Hamming weight 20.

    Proof. If we take m = (1, . . . , 1) ∈ F122 , then mG = 1 ∈ G24. So if x is a codeword withwt(x) = 20, then x + 1 again is a codeword with wt(x + 1) = 4, but by corollary 45 thiscannot hold.

    Theorem 47. G24 is a [24, 12, 8]-code.

    Proof. We already know thatG24 is linear and that it has block length 24 and dimension 12.By lemma 42 the weight of each codeword of G24 is divisible by 4, by theorem 45 there areno codewords of Hamming weight 4, and we can easily check that there exist codewords ofweight 8. For example, take m = (1 0 . . . 0), then mG = (110000000000 | 011011100010)so wt(mG) = 8. Hence, the minimal Hamming weight of all codewords is 8. Since G24 islinear, by theorem 9 we know that the minimal Hamming weight is equal to the minimalHaming distance, so G24 is a [24, 12, 8]-code.

    The possible Hamming weights of codewords in G24 are 0, 8, 12, 16 and 24. Out of thetotal of 212 codewords in G24 it is interesting to see how many of them have weight i,the so-called weight distribution number Ai. Ofcourse we have A1 = A24 = 1, since 0and 1 are the only codewords of Hamming weight 0 and 24 respectively. Also, G24 isself-dual so for any codeword of Hamming weight 8 there is exactly one codeword ofHamming weight 16, namely its complement. Hence, A8 = A16. We will use the proof oftheorem 45 to derive A8.

    Suppose x is a codeword of Hamming weight 8. If wt(xL) = 0 then wt(xR) is either 0 or12, but both cases this means that wt(x) 6= 8. By corollary 44 then wt(xL) cannot beequal to 8 either. Hence, wt(xL) = 2, 4 or 6:

    (i) If wt(xL) = 2, then there are two cases for the original message m, as we have seenin the proof of theorem 45. For case 1), we have to choose 1 out of 11 digits to

    25

  • set equal to 1, which gives us 11 possibilities. For case 2), we have to choose 2 outof 11 digits to set equal to 1, which gives us

    (112

    )possibilities. Since in both cases

    m12 can either be 0 or 1, we have to multiply by 2. This gives us 2 · (11 +(112

    ))

    possibilities for wt(xL) = 2.

    (ii) If wt(xL) = 4, then wt(xR) must be equal to 4 as well. This means that for theoriginal message m there are two cases. 1) It holds that mi = mj = mk = 1 forsome i, j, k = 1, . . . , 11 that are distinct, and all other digits are zero. 2) It holdsthat mi = mj = mk = ml = 1 for some i, j, k, l = 1, . . . , 11 that are distinct, andall other digits are zero. In both cases m12 must be equal to 0, otherwise wt(xR) =8. This means that we have

    (113

    )+(114

    )possibilities for wt(xL) = 4.

    (iii) If wt(xL) = 6, then by corollary 44 there is a codeword x ∈ G24 for which itholds that wt(x’L) = 2. However, this is case (i), so again we have 2 · (11 +

    (112

    ))

    possibilities.

    Combining all this gives us:

    A8 = A16 = 2 ·(

    11 +

    (11

    2

    ))+

    ((11

    3

    )+

    (11

    4

    ))+ 2 ·

    (11 +

    (11

    2

    ))= 2 · 66 + (165 + 330) + 2 · 66= 759, and:

    A12 = 212 − A1 − A8 − A16 − A24

    = 4096− 1− 759− 759− 1= 2576.

    Table 3.1 gives an overview of the weight distribution numbers of G24.

    Hamming weight i 0 8 12 16 24Weight distribution number Ai 1 759 2576 759 1

    Table 3.1.: Weight distribution of G24.

    If we focus on the 759 codewords of Hamming weight 8, we will see that they form animportant subset of G24.

    Theorem 48. Any vector r ∈ F242 with wt(r) = 5 is covered1 by exactly one x ∈ G24 forwhich wt(x) = 8.

    Proof. Suppose we have a vector r with wt(r) = 5.

    (i) If it is covered by two codewords x,y ∈ G24 of Hamming weight 8, then we haverai = xai = yai for some ai ∈ {1, . . . , 24} and i = 1, . . . , 5. Since wt(x) = wt(y) =8, x and y can differ at at most 6 places, which means that dist(x,y) ≤ 6. However,this is a contradiction to the fact that d = 8.

    1If for two vectors r,w ∈ Fn2 it holds that r covers w, then ri = 1 implies that wi = 1 for all i = 1, . . . , n.

    26

  • (ii) If x is a codeword with wt(x) = 8, then the number of vectors r with wt(r) = 5that it covers is

    (85

    ), since there are 5 out of 8 digits that you can choose to set

    equal to 1.

    (iii) We have shown that there are 759 codewords of Hamming weight 8 in G24, and759 ·

    (85

    )=(245

    )which is exactly the number of vectors r with wt(r) = 5.

    Theorem 49. The codewords x ∈ G24 with wt(x) = 8, the so called octads, form theblocks of a Steiner system S(5, 8, 24).

    Proof. This follows immediately from theorem 48.

    Since we have a generator matrix for which any sum of two of its columns has Hammingweight 8, the octads generate G24. Hence, the extended Golay code is the vector spaceover F2 spanned by the octads of the Steiner system S(5, 8, 24).

    By theorem 37 we can make a Pascal triangle of the Steiner system S(5, 8, 24), which isgiven in table 3.2:

    759506 253

    330 176 77210 120 56 21

    130 80 40 16 578 52 28 12 4 1

    46 32 20 8 4 0 130 16 16 4 4 0 0 1

    30 0 16 0 4 0 0 0 1

    Table 3.2.: The Pascal triangle of S(5, 8, 24).

    From this triangle we can derive important properties of the blocks of a Steiner systemS(5, 8, 24). For example, it will help us in proving that this Steiner system S(5, 8, 24) isunique, which we will see in theorem 55. First we will need some preliminary results.

    Theorem 50. Let {λij}0≤i≤j≤8 be the block intersection numbers of G24 as in table 3.2.Then:

    (i) Two blocks can meet in either 0, 2 or 4 points,

    (ii) If two blocks meet in 4 points, then their sum again is a block,

    (iii) The blocks meeting any block in 4 points determine all other blocks.

    27

  • Proof. We denote the 24 points of S(5, 8, 24) by {a, b, . . . , x}. Since the octads of G24form the blocks of a Steiner system S(5, 8, 24), we can view addition of blocks in S(5, 8, 24)as taking their symmetric difference. E.g. {abcdefgh}+ {abcdijkl} = {efghijkl}.

    (i) In table 3.2 we see that the number of blocks in S(5, 8, 24) containing 5, 6, 7 or8 points are λ5 = . . . = λ8 = 1, so blocks cannot meet in 5 points or more. Theoctads of G24 form the blocks of a Steiner system S(5, 8, 24), and since the weightof all codewords in G24 is even, blocks can meet in 0, 2 or 4 points only.

    (ii) Let A and B be two blocks that meet in 4 points, say A = {abcdefgh} andB = {abcdijkl}. In S(5, 8, 24) each set of 5 points is covered by exactly one block.Suppose A+B = {efghijkl} is not a block. Then, if C is the block that contains{efghi}, by (i) it must contain at least one more point of B, say C = {efghijmn}.Likewise, the block D that contains {efghk} must contain at least one more pointof B, say D = {efghklop}. Now let E be the block that contains {efgik}. By (i)it holds that E must contain at least one more point of A. If this point is a, b, c ord, then we see that E must also contain at least one more point of B, etc. However,then E meets A and B in more than 4 points, a contradiction with (i). If it is h,then E meets C and D in 5 points, also a contradiction to (i). Hence, A+B mustbe a block.

    (iii) Suppose we have all the blocks in S(5, 8, 24) that meet any given block A in 4points. Then by (i) we must find all blocks that meet A in either 0 or 2 points. Bytable 3.2 we know that the number of blocks that meet A in 0 points is λ08 = 30,and the number of blocks that meet A in 2 points is λ28 = 16. We can writeA = {abcde . . .} since any five points of a block uniquely determine three others.The number of blocks that contain {abcd} is λ4 = 5, so next to A we have 4more which we denote by B1, B2, B3 and B4. Likewise, the 4 blocks other thanA that contain {abce} we denote by C1, C2, C3 and C4. From (ii) we know thatthe Bn + Cm are blocks for n,m = 1, . . . , 4. Since they are clearly distinct and#{Bn + Cm | n,m = 1, . . . , 4} = 16, they are the blocks that meet A in 2 points:in d and e. Likewise, the Bn + Bm are blocks for n,m = 1, . . . , 4. Obviously, theBn +Bm all meet A in 0 points and #{Bn +Bm | n,m = 1, . . . , 4} = 3! = 6. Sinceinstead of e we can also choose a, b, c or d, we have 5 choices in total. This gives usthe 5 · 6 = 30 blocks that meet A in 0 points.

    We obtain other useful properties of the blocks of S(5, 8, 24) by looking at sextets. Thesewill help us by proving uniqueness of the Steiner system S(5, 8, 24). Since for any fourpoints a fifth uniquely determines a block, we can divide the 24 points into 6 distinctsubsets of 4 points as follows: if we have {abcd}, then {e} gives us {fgh}, {i} gives us{jkl}, etc. We obtain the following partition of S(5, 8, 24):

    28

  • {abcd} {e} {fgh}{i} {jkl}{m} {nop}{q} {rst}{u} {vwx}

    Table 3.3.: Partition of S(5, 8, 24).

    We can thus say that this partition is defined by {abcd}.

    Definition 51. A sextet is a partition of S(5, 8, 24) into 6 disjunct subsets that is definedby a set of 4 points. These 6 subsets are called tetrads.

    Remark 52. Note that the definition of a sextet implies that any two tetrads togetherform a block of S(5, 8, 24).

    We can look at the number of points in which a block meet such sextets. We denote thisnumber by xk11 · · · xk66 , where xi is the number of points in which the block meets tetradi, and ki gives the amount of tetrads for which this number is the same. E.g. for thesextet in table 3.3 and block {abcdefgh}, this number is 42 · 04.

    Theorem 53. The number of points in which a block meets a sextet is either 42 · 04,24 · 02 or 31 · 15.

    Proof. Suppose we have a sextet where T is one of its tetrads. Let B = {abcdefgh} beany block. Then B can meet T in 4, 3, 2, 1 or 0 points.Firstly suppose B meets T in 4 points, say in {abcd}. Let S be any other tetrad. Then byremark 52 also {abcd} ∪ S is a block. If {efgh} meets S in 1, 2 or 3 points, then we arein the situation in which two blocks meet in 5, 6 or 7 points respectively, a contradictionto theorem 50 (i). Therefore, {efgh} must meet S in 0 or 4 points. If they meet in 0points, then there must be another tetrad S ′ that meets {efgh} in 4 points. Hence, if Bmeets a tetrad in 4 points, then xk11 · · ·xk66 = 42 · 04.Secondly suppose that B meets T in 3 points. Then by theorem 50 (ii) any other tetrad Smust meet B in precisely 1 point. Hence, the only possibility is that xk11 · · · xk66 = 31 · 15.Now suppose that B meets T in 2 points. Let S be any other tetrad. Since T ∪ S is ablock, by theorem 50(ii) B and S can only meet in 0 or 2 points. The only possibilitythus is xk11 · · ·xk66 = 24 · 02.Finally, when B meets T in 0 points, obviously the only possibilities are xk11 · · · xk66 = 42 ·04or 24 · 02.

    Definition 54. If we label the tetrads of any two sextets A and B as A1, . . . , A6 andB1, . . . , B6, then the ij

    th entry of the 6× 6 intersection matrix IAB of A and B is givenby the number of points that are both in Ai and Bj, for all i, j = 1, . . . , 6.

    29

  • Using remark 52 and theorem 53 we can easily verify that an intersextion matrix of anytwo different sextets must be of the following form:

    4 0 0 0 0 00 4 0 0 0 00 0 4 0 0 00 0 0 4 0 00 0 0 0 4 00 0 0 0 0 4

    if the sextets are equal, (3.1)

    2 2 0 0 0 02 2 0 0 0 00 0 2 2 0 00 0 2 2 0 00 0 0 0 2 20 0 0 0 2 2

    if two colums of a sextet meet the other sextet in 42 · 04 points,

    (3.2)2 0 0 0 1 10 2 0 0 1 10 0 2 0 1 10 0 0 2 1 11 1 1 1 0 01 1 1 1 0 0

    if two columns of a sextet meet the other sextet in 24 · 02 points,

    (3.3)3 0 0 0 0 11 0 0 0 0 30 1 1 1 1 00 1 1 1 1 00 1 1 1 1 00 1 1 1 1 0

    if two columns of a sextet meet the other sextet in 31 · 51 points.

    (3.4)

    We have now obtained enough tools to prove the following theorem.

    Theorem 55. The Steiner system S(5, 8, 24) is unique.

    Proof. We will show that an arbitraty block O of S(5, 8, 24) determines all others, whichmeans that S(5, 8, 24) is unique. For this, we will look at 7 sextets that are uniquelydetermined by O. Then we will look at their intersection matrices and pick the ones thatare of the same form as in 3.2. We will then show that with these intersection matrices wecan obtain all blocks that meet O in 4 points, and theorem 50 (iii) will complete the proof.

    Suppose we have a Steiner system S(5, 8, 24) where we denote its points as a, b, . . . , x,and let O be any block, say O = {abcdefgh}. Let i be a point that is not in O and letS1 be the sextet that is defined by {abcd}, say S1 is as in table 3.3. If we let the disjoint

    30

  • sets of 4 points be the columns of a 4× 6-matrix, then we can represent S1 as:

    S1 =

    a e i m q ub f j n r vc g k o s wd h l p t x

    ,where ofcourse permutations inside the columns are allowed. We will now construct sixmore sextets that are determined by O and call them S2, . . . , S7.

    Firstly, we look at the block {bcdei . . .}. By theorem 53 the number of points in whichthis block meets S1 can only be 3

    1 · 15. So, we can say that this block is {bcdeimqu},and that the sextet S2 that is defined by {bcde} is:

    S2 =

    b a i j k lc f m n o pd g q r s te h u v w x

    .If we now put the columns of S2 that do not meet O in a 4× 4-matrix B, we obtain:

    B =

    i j k lm n o pq r s tu v w x

    .From this matrix B we can make several new blocks by adding rows and columns:

    (i) The sums of any two rows of B give us 6 blocks,

    (ii) the sums of any two columns of B give us 6 more, and

    (iii) the sums of (i) and (ii) give us 6·62

    = 182.

    This gives us 6 + 6 + 18 = 30 blocks that are disjoint from O, and in table 3.2 we see thatλ08 = 30, which means that we actually obtained all blocks which are disjoint from O.We now write:

    S1 =

    1 2 3 4 5 61 2 3 4 5 61 2 3 4 5 61 2 3 4 5 6

    ,meaning we set S1 as a ’basis’ matrix. Since permutations inside columns are allowed,the digits in column i all get label i. Then S2 becomes:

    S2 =

    2 1 3 3 3 31 2 4 4 4 41 2 5 5 5 51 2 6 6 6 6

    .2Here we have to divide by two, since otherwise we count all octads twice. For example, rows 1 and 2with columns 1 and 2 yield the same octad as rows 3 and 4 with columns 3 and 4.

    31

  • For example the first column (2 1 1 1)t corresponds to the fact that a is in the secondcolumn in S2, and b, c and d are in its first.By theorem 53 the number of points in which the block {acdei . . .} meets both S1 andS2 can only be 3

    1 · 15, which means that this block is actually {acdeinsx}. The sextetS3 that is generated by {acde} can be of different forms by theorem 53, but if keep inmind that it cannot give rise to any of the 30 blocks as in (i), (ii) or (iii), it can only be:

S3 =

1 1 3 4 6 5
2 2 4 3 5 6
1 2 6 5 3 4
1 2 5 6 4 3

By taking a closer look at the three sextets S1, S2 and S3, we see that they are invariant under the following permutations of g, h, . . . , x:

    τ1 = (mqu)(nsx)(jkl)(rwp)(otv)

    τ2 = (mq)(pt)(jk)(vw)(ns)(or)

    τ3 = (gh).

The block {abcei . . .} meets the first columns of S1, S2 and S3 in three points, so the number of points in which it meets either one of these sextets can only be 3^1 · 1^5. This means that its last three digits cannot be {jkl}, {nsx} or {mqu}. What remains is {otv} or {prw}. We see that τ2({otv}) = {prw}, so we can choose the block {abcei . . .} to be {abceiprw}. Hence, the sextet S4 that is defined by {abce} is given by:

S4 =

1 1 3 4 5 6
1 2 5 6 3 4
1 2 6 5 4 3
2 2 4 3 6 5

In a similar manner we can see that the sextet S5 that is defined by {abde} is given by:

S5 =

1 1 3 4 5 6
1 2 6 5 4 3
2 2 4 3 6 5
1 2 5 6 3 4

Now by theorem 53 the number of points in which the block {abefi . . .} meets S1 is 2^4 · 0^2, and since τ1 leaves S1 unchanged we can assume the four sets of two points where they meet are in the first four columns of S1. Hence, this block is {abefijmn}, and this gives rise to the sextet S6:

S6 =

1 1 3 3 4 4
1 1 3 3 4 4
2 2 5 5 6 6
2 2 5 5 6 6


Finally, in a similar way we can deduce that the block {aceik . . .} actually is {aceigkmo} by using the fact that τ3 leaves S1 unchanged. This gives rise to the sextet S7:

S7 =

1 1 3 3 5 5
2 2 4 4 6 6
1 1 3 3 5 5
2 2 4 4 6 6

Now we will show that the sextets S1, . . . , S7, which are uniquely determined by O, actually determine all blocks that meet O in 4 points. For this, we firstly notice that the number of blocks containing four given points is λ4 = 5, and the number of ways to choose 4 points out of 8 is (8 choose 4) = 70, so the number of blocks meeting O in 4 points is (5 − 1) · 70 = 280.

    Hence, we will check if S1, . . . , S7 determine 280 different blocks that meet O in 4 points.

By theorem 50 (ii) the sum of two blocks that meet in 4 points is again a block. If we now add blocks from two sextets that have an intersection matrix of the form as in (3.2), we will obtain new blocks and hence new sextets. An easy verification shows that only the intersection matrices IS1S6, IS2S6, IS3S6, IS1S7, IS2S7, IS5S7 and IS6S7 are of this form. So we have 7 pairs of sextets from which we can obtain new blocks. For each of these pairs, the number of new blocks that we can make from them that meet O in 4 points is 8, but only four of them are different. For example, from the sextets S6 and S7 we can add the following blocks: S6,1 ∪ S6,3 + S7,1 ∪ S7,3, S6,1 ∪ S6,4 + S7,1 ∪ S7,4, S6,1 ∪ S6,5 + S7,1 ∪ S7,5 and S6,1 ∪ S6,6 + S7,1 ∪ S7,6, which are the same as S6,2 ∪ S6,3 + S7,2 ∪ S7,3, S6,2 ∪ S6,4 + S7,2 ∪ S7,4, S6,2 ∪ S6,5 + S7,2 ∪ S7,5 and S6,2 ∪ S6,6 + S7,2 ∪ S7,6. This means that in total we can obtain 7 · 4 = 28 new sextets that are defined by 4 points of O. We already had S1, . . . , S7, so in total we now have 35 such sextets. For any of these 35 sextets there are 8 possibilities of combining two of its tetrads in order to obtain a block that meets O in 4 points (namely, tetrads 1+3, 1+4, 1+5, 1+6, 2+3, 2+4, 2+5 or 2+6), so in total we have now found 35 · 8 = 280 blocks that meet O in 4 points.

To conclude, the arbitrary block O determines all blocks that meet O in 4 points, which by theorem 50 (iii) determine all blocks of S(5, 8, 24). So, S(5, 8, 24) is unique.

    Corollary 56. G24 is unique.

Proof. We know the octads generate G24 and that they form the blocks of the Steiner system S(5, 8, 24), which by theorem 55 is unique. Hence, G24 is unique.

Remark 57. Note that G24 is unique up to equivalence, since a permutation of the columns of a generator matrix G corresponds to a permutation of the digits in the codewords, and thus maps any octad to another octad. However, we already stated in remark 41 that in general G24 refers to any of the linear codes that are equivalent to the one in definition 40, so we can simply say that G24 is unique.


3.2. The Golay code

Now that we have defined the extended Golay code G24 and discussed its most important features, it is easy to construct the Golay code G23.

Definition 58. The Golay code is the linear code G23 which is obtained from G24 by omitting the last digit from each codeword x ∈ G24.

Remark 59. This definition is equivalent to stating that G23 is the linear code whose generator matrix is obtained from the generator matrix G of G24 by deleting its last column. Moreover, like the extended Golay code, G23 refers to any of the linear codes that are linearly equivalent to the one in definition 58, so in fact we can remove any ith digit of every codeword x ∈ G24 to obtain 'the' Golay code G23. This means that the block length of G23 is 23, its dimension remains 12 and the minimum Hamming distance is 7. Deleting one digit of course preserves linearity, so G23 is a [23, 12, 7]-code.

We can also construct G24 from G23 by adding a so-called overall parity check: to every codeword x ∈ G23 we add a 1 as a 24th digit if the Hamming weight of x is odd, and we add a 0 if the Hamming weight of x is even. This explains why G24 is called the extended Golay code.
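This extension step is easy to express in code; a minimal sketch (my own, with a hypothetical 23-bit example vector that is not necessarily a true codeword of G23):

    def extend_with_parity(c23):
        """Append an overall parity-check bit to a 23-bit word,
        so that the resulting 24-bit word has even Hamming weight."""
        return c23 + [sum(c23) % 2]

    # Hypothetical 23-bit vector of weight 7 (illustration only).
    c = [1, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1, 0] + [0] * 11
    assert len(c) == 23
    print(extend_with_parity(c))   # appended bit is 1, giving even weight 8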

    Theorem 60. G23 is a perfect code.

Proof. We calculate the sphere-packing bound for G23. Since d = 7 and t = ⌊(7 − 1)/2⌋ = 3, we obtain:

Σ_{i=0}^{3} (23 choose i) = 1 + (23 choose 1) + (23 choose 2) + (23 choose 3)
                          = 1 + 23 + 253 + 1771
                          = 2048
                          = 2^(23−12),

    and by definition 26 it follows that G23 is perfect.
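The computation in the proof can be verified in a couple of lines of Python (a sketch, not part of the thesis):

    from math import comb

    # Sphere-packing check for G23: the balls of radius t = 3 around the 2^12
    # codewords exactly fill F_2^23, i.e. 2^12 * sum_{i<=3} C(23, i) = 2^23.
    assert sum(comb(23, i) for i in range(4)) == 2 ** (23 - 12)      # 2048
    print(2 ** 12 * sum(comb(23, i) for i in range(4)) == 2 ** 23)   # True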

By remark 38 we know that if a Steiner system S(t, k, v) exists, then a Steiner system S(t − 1, k − 1, v − 1) exists as well. As the octads of G24 form the blocks of the Steiner system S(5, 8, 24), a Steiner system S(4, 7, 23) must exist as well. The codewords of G23 of Hamming weight 7 actually form its blocks.

Table 3.4 gives an overview of the properties of the two binary Golay codes, and table 3.5 gives their weight distributions:


                                 G24                      G23
[n, k, d]                        [24, 12, 8]              [23, 12, 7]
perfect                          no                       yes
error-correcting capability t    3                        3
self-dual                        yes                      no
Steiner system                   S(5, 8, 24)              S(4, 7, 23)
generators/blocks                codewords of weight 8    codewords of weight 7

    Table 3.4.: Properties of G24 and G23.

Hamming weight i    0    7    8     11    12    15    16    23   24
G24                 1    -    759   -     2576  -     759   -    1
G23                 1    253  506   1288  1288  506   253   1    -

    Table 3.5.: Weight distributions of G24 and G23.
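The weight distribution of G24 in table 3.5 can be checked by brute force: enumerate all 2^12 codewords spanned by a generator matrix and tally their Hamming weights. A sketch, assuming the 12 × 24 generator matrix of definition 40 is available as a list of rows called G24_GEN (not reproduced here):

    from collections import Counter
    from itertools import product

    def weight_distribution(gen):
        """Tally the Hamming weights of all 2^k codewords spanned by the rows of gen."""
        k, counts = len(gen), Counter()
        for message in product((0, 1), repeat=k):
            codeword = [sum(m * row[j] for m, row in zip(message, gen)) % 2
                        for j in range(len(gen[0]))]
            counts[sum(codeword)] += 1
        return dict(sorted(counts.items()))

    # weight_distribution(G24_GEN) should give
    # {0: 1, 8: 759, 12: 2576, 16: 759, 24: 1} for the extended Golay code.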

    3.3. Decoding

We use the Golay codes in communication because they give a small risk of misinterpreting a message that was sent through a noisy channel. In this section we describe how this interpreting, the decoding of received vectors, is done.

For G24 we know that d = 8 and for G23 that d = 7, so in both cases the error-correcting capability is t = ⌊(d − 1)/2⌋ = 3. This means we can correct all error vectors of Hamming weight 3 or less. Firstly we explain the decoding procedure for G24.

We know that the syndrome of a received vector is the same as the syndrome of its error vector, so we will look at the error vector only. Now suppose that e is an error vector of weight 3 or less, and write e = (eL | eR). Since the generator matrix G of definition 40 is a parity check matrix as well, there are two ways to compute the syndrome:

S1(e) = eH^t = (eL | eR)(B | 1_12)^t = eL B^t + eR, and

S2(e) = eG^t = (eL | eR)(1_12 | B)^t = eL + eR B^t = S1(e) B^t.

    Since wt(e) ≤ 3, either wt(eL) ≤ 1 or wt(eR) ≤ 1:


(i) If wt(eL) ≤ 1, then either:
    • wt(eL) = 0, in which case wt(eR) ≤ 3 and S1(e) = eR, or
    • wt(eL) = 1, in which case S1(e) = b + eR, where b is a column of B and wt(eR) ≤ 2.

(ii) If wt(eR) ≤ 1, then either:
    • wt(eR) = 0, in which case wt(eL) ≤ 3 and S2(e) = eL, or
    • wt(eR) = 1, in which case S2(e) = eL + b, where b is a column of B and wt(eL) ≤ 2.

Hence, one of the two syndromes either has weight at most 3, or is a column of B with at most two digits changed.

This allows us to construct the following algorithm to decode a given vector r when using the extended Golay code:

Algorithm 61. Decoding the extended Golay code G24. For a received vector r ∈ F_2^24:

(i) Compute the syndrome S1(r) = rH^t = r(B | 1_12)^t.

(ii) If wt(S1(r)) ≤ 3, then the error vector is e = (0 | S1(r)) and you can complete the decoding as in algorithm 24.

(iii) If wt(S1(r)) > 3, then compute wt(S1(r) + bi), where bi is the ith column of B, for all i = 1, . . . , 12. If wt(S1(r) + bi) ≤ 2 for some i, then the error vector is e = (δi | S1(r) + bi), where δi is the vector in F_2^12 with 1 on position i and 0 on all other positions. If wt(S1(r) + bi) ≤ 2 holds for more than one i, you choose the one with the smallest Hamming weight. You complete as in algorithm 24.

(iv) If wt(S1(r) + bi) > 2 for all i = 1, . . . , 12, then compute the syndrome S2(r) = S1(r)B^t.

(v) If wt(S2(r)) ≤ 3, then the error vector is e = (S2(r) | 0) and you complete as in algorithm 24.

(vi) If wt(S2(r)) > 3, then compute wt(S2(r) + bi) for all i = 1, . . . , 12. If wt(S2(r) + bi) ≤ 2 for some i, then the error vector is e = (S2(r) + bi | δi), and if this holds for more than one i, you choose the one with the smallest Hamming weight. You complete as in algorithm 24.
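Algorithm 61 translates directly into a short program. The sketch below is my own illustration, not the GAP routine of [14]; it assumes the 12 × 12 matrix B of definition 40 is supplied as a list of 12 rows of 12 bits, and it returns the error vector e (or None when more than 3 errors are detected).

    def wt(v):
        return sum(v)

    def add(u, v):
        return [(a + b) % 2 for a, b in zip(u, v)]

    def mat_vec(v, M):
        # v * M over F_2, with v a length-12 vector and M a 12x12 matrix.
        return [sum(v[i] * M[i][j] for i in range(12)) % 2 for j in range(12)]

    def decode_G24(r, B):
        """Return the error vector of a received 24-bit word r, following algorithm 61."""
        rL, rR = r[:12], r[12:]
        Bt = [[B[j][i] for j in range(12)] for i in range(12)]    # B transposed
        cols = [[B[i][j] for i in range(12)] for j in range(12)]  # columns b_1..b_12 of B

        s1 = add(mat_vec(rL, Bt), rR)        # S1(r) = r H^t = rL B^t + rR
        if wt(s1) <= 3:                      # step (ii): e = (0 | S1)
            return [0] * 12 + s1
        for i, b in enumerate(cols):         # step (iii): e = (delta_i | S1 + b_i)
            if wt(add(s1, b)) <= 2:
                return [int(j == i) for j in range(12)] + add(s1, b)

        s2 = mat_vec(s1, Bt)                 # step (iv): S2(r) = S1(r) B^t
        if wt(s2) <= 3:                      # step (v): e = (S2 | 0)
            return s2 + [0] * 12
        for i, b in enumerate(cols):         # step (vi): e = (S2 + b_i | delta_i)
            if wt(add(s2, b)) <= 2:
                return add(s2, b) + [int(j == i) for j in range(12)]
        return None                          # more than 3 errors: retransmission needed

Returning the first matching candidate is safe when at most 3 errors occurred: two distinct candidate error vectors of weight at most 3 with the same syndrome would differ by a nonzero codeword of weight at most 6, which cannot exist since d = 8.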

Example 62. Suppose we receive a vector r with syndrome S1(r) = (100011010010). We will decode it using algorithm 61.

    (i) S1(r) = (100011010010).

    (ii) wt(S1(r)) = 6 > 3.

    (iii) • wt(S1(r) + b1) = wt(010100010111) = 6 > 2,


• wt(S1(r) + b2) = wt(001101011001) = 6 > 2,
• wt(S1(r) + b3) = wt(111111000101) = 8 > 2,
• wt(S1(r) + b4) = wt(011011111111) = 10 > 2,
• wt(S1(r) + b5) = wt(010010001001) = 4 > 2,
• wt(S1(r) + b6) = wt(000001100101) = 4 > 2,
• wt(S1(r) + b7) = wt(100110111101) = 8 > 2,
• wt(S1(r) + b8) = wt(101000001111) = 6 > 2,
• wt(S1(r) + b9) = wt(110101101011) = 8 > 2,
• wt(S1(r) + b10) = wt(111000110001) = 6 > 2,
• wt(S1(r) + b11) = wt(111101000100) = 6 > 2,
• wt(S1(r) + b12) = wt(011001011001) = 6 > 2.

(iv) wt(S1(r) + bi) > 2 for all i = 1, . . . , 12, so we compute the syndrome S2(r) = S1(r)B^t = (100110100011).

(v) wt(S2(r)) = 6 > 3.

(vi) • wt(S2(r) + b1) = wt(010001100010) = 4 > 2,
• wt(S2(r) + b2) = wt(001000101100) = 4 > 2,
• wt(S2(r) + b3) = wt(111010110000) = 6 > 2,
• wt(S2(r) + b4) = wt(011110001010) = 6 > 2,
• wt(S2(r) + b5) = wt(010111111100) = 8 > 2,
• wt(S2(r) + b6) = wt(000100010000) = 2,
• wt(S2(r) + b7) = wt(100000001100) = 3 > 2,
• wt(S2(r) + b8) = wt(101101111110) = 9 > 2,
• wt(S2(r) + b9) = wt(110000011010) = 5 > 2,
• wt(S2(r) + b10) = wt(001011010010) = 5 > 2,
• wt(S2(r) + b11) = wt(111101000000) = 5 > 2,
• wt(S2(r) + b12) = wt(011001011101) = 7 > 2.

Our error vector is e = (S2(r) + b6 | δ6) = (000100010000 | 000001000000), the original codeword is x = r − e, and therefore the original message is m = xL.

For the decoding procedure of G23 we make use of the fact that G24 is obtained from G23 by adding an overall parity check to its codewords: if we add a 24th digit to a received vector r ∈ F_2^23, we can use the standard array.


Note that all codewords of G24 have even Hamming weight, so if a received vector in F_2^24 has odd Hamming weight then we know that an odd number of errors has occurred.

Since the error-correcting capability of G23 is also 3, we assume that at most 3 errors occurred in the 23 digits of r. Now we want to choose the 24th digit r24 in such a way that wt(r1 . . . r23 r24) is odd, for then we know that the Hamming weight of its error vector is at most 3, and we can use the standard array. Moreover, since G23 is perfect and t = ⌊(7 − 1)/2⌋ = 3, we can correct all error vectors of Hamming weight 3 or less uniquely.
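In code, this reduction of G23-decoding to G24-decoding can be sketched as follows (again my own illustration, reusing the hypothetical decode_G24 above rather than the standard array):

    def decode_G23(r23, B):
        """Decode a received 23-bit word by extending it and using the G24 decoder."""
        # Append the 24th digit so that the extended word has odd Hamming weight;
        # then the extended error vector also has weight at most 3.
        r24 = r23 + [(sum(r23) + 1) % 2]
        e24 = decode_G24(r24, B)
        if e24 is None:
            return None
        c24 = [(a + b) % 2 for a, b in zip(r24, e24)]   # corrected G24 codeword
        return c24[:23]                                 # drop the parity digit again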

    The algorithm for decoding both Golay codes by using GAP can be found in [14].

Now that we have explored the most important properties of the Golay codes, we will discuss a group which has an interesting connection to G24, the Mathieu group M24.


Chapter 4

    The Mathieu Group M24

    4.1. Introduction

The Mathieu groups are five groups which are commonly denoted by M11, M12, M22, M23 and M24. Their characterizing features are very rare, and as a result they do not belong to any of the larger families in the classification of all finite simple groups. Therefore, they are called sporadic. This classification is as follows:

    Any finite simple group belongs to either one of the following categories:

    • Cyclic groups Cp, where p is prime,

    • Alternating groups An, where n ≥ 5,

• Lie-type groups, including both:
  – Classical Lie groups, and
  – Exceptional and twisted groups of Lie-type,

    • Sporadic groups.

The last category consists of a collection of groups that do not have common characteristics with the first three categories, but cannot be characterized as a fourth family either. The Mathieu groups are multiply transitive permutation groups, but they are not alternating, which is the reason why they do not fall into any of the first three categories. These characteristics will be explained in detail in the remainder of this chapter.

Mathieu first discovered M12 in 1861, and already briefly mentioned M24. For a long time the existence of M24 was controversial, and in 1898 Miller even showed that this group could not exist. However, two years later he found a mistake in his proof. It was not until 1938 that Witt finally proved its existence, by showing that it is an automorphism group of the Steiner system S(5, 8, 24). Since the blocks of this Steiner system are the octads of G24, we will devote this chapter to M24.

In section 4.2 we will discuss some definitions and results that allow us to describe the construction of M24 in section 4.3, and to prove its multiple transitivity and simplicity in sections 4.4 and 4.5.

    4.2. Definitions

We will first discuss the most important definitions and results which we will need in the remainder of this chapter. To start with, as M24 is the automorphism group of the extended Golay code, we will define what an automorphism group of an error-correcting code is.

Definition 63. The automorphism group of a linear code C is the group of all permutations of the digits of the codewords that map C to itself. It is denoted by Aut(C).

Remark 64. We can view Aut(C) as a subgroup of Sn, where n is the block length of C: σ(x) = (xσ(1) . . . xσ(n)) for any σ ∈ Aut(C). The theorems and lemmas in this section will be convenient in the proofs of M24's order, multiple transitivity and simplicity. We use the following notation:

    For a group G of permutations on a set X it holds that:

• The point-wise stabilizer of an element x ∈ X is given by Stab(x) = {g ∈ G | g(x) = x}.

• The set-wise stabilizer of a subset Y of X is given by Stab(Y ) = {g ∈ G | g(y) ∈ Y for all y ∈ Y }.

• The orbit of a subset Y of X under G is given by Y^G = {g(Y ) | g ∈ G} = {{g(y) | y ∈ Y } | g ∈ G}.

• The normalizer of a subgroup H of G is given by NG(H) = {g ∈ G | gH = Hg}. Note that NG(H) always contains H and that H is a normal subgroup if and only if NG(H) = G.

Theorem 65 (Orbit-Stabilizer theorem). For a group G of permutations on a set X and any Y ⊂ X it holds that:

#G = #Stab(Y ) · #Y^G.

Proof. For any g, h ∈ G we have that gY = hY if and only if h^(−1)g ∈ Stab(Y ), so the size of Y^G is equal to the number of cosets of Stab(Y ) in G, i.e. #Y^G = [G : Stab(Y )] = #G / #Stab(Y ).

Definition 66. Let G be a group of permutations that acts on a set X. Then G is transitive on X if it has one orbit, i.e. if x^G = X for all x ∈ X.


One of the characterizing properties of the Mathieu groups is multiple transitivity, which is defined as follows:

Definition 67. Let k be any integer and let G be a group of permutations that acts on a set X with #X ≥ k. Then G is k-fold transitive if for any two ordered k-tuples (x1, . . . , xk) and (y1, . . . , yk) in X with xi ≠ xj and yi ≠ yj for all i ≠ j, there is a permutation σ ∈ G such that σ(xi) = yi for i = 1, . . . , k.

Lemma 68. A group of permutations G that acts on a set X with #X ≥ 3 is k-fold transitive for 2 ≤ k ≤ #X if and only if the stabilizer of any k − 1 points x1, . . . , xk−1 ∈ X is transitive on X\{x1, . . . , xk−1}.

Proof. "⇒" Assume that G is k-fold transitive. Then for any two ordered k-tuples (x1, . . . , xk−1, w) and (x1, . . . , xk−1, z) in X with w, z ≠ x1, . . . , xk−1 there is a permutation σ ∈ G such that σ(xi) = xi for all i = 1, . . . , k − 1 and σ(w) = z. Hence, Stab(x1, . . . , xk−1) acts transitively on X\{x1, . . . , xk−1}.
"⇐" Now assume that for any x1, . . . , xk−1 ∈ X with xi ≠ xj for all i ≠ j we have that Stab(x1, . . . , xk−1) acts transitively on X\{x1, . . . , xk−1}. Let (x1, . . . , xk) and (y1, . . . , yk) be two ordered k-tuples in X such that xi ≠ xj and yi ≠ yj for all i ≠ j. Firstly suppose that yk ≠ x1, . . . , xk−1. Then there are σ ∈ Stab(x1, . . . , xk−1) and τ ∈ Stab(yk) such that

    τ ◦ σ(x1, . . . , xk) = τ(x1, . . . , xk−1, yk) = (y1, . . . , yk).

Now if yk = xi for some i = 1, . . . , k, then we make an additional step: choose some d ∈ X such that d ≠ xi, y1, . . . , yk−1, which is possible since k ≤ #X. Then there are σ ∈ Stab(x1, . . . , xk−1), τ ∈ Stab(d) and υ ∈ Stab(y1, . . . , yk−1) such that:

    υ ◦ τ ◦ σ(x1, . . . , xk) = υ ◦ τ(x1, . . . , xk−1, d) = υ(y1, . . . , yk−1, d) = (y1, . . . , yk).

    So, G is k-fold transitive.

Lemma 69. Let G be a group of permutations on a set X with #X ≥ k, that is k-fold transitive for some k ≥ 4. If there is an x ∈ X such that Stab(x) is simple, then G is simple.

    Proof. See [19] p. 263.

Definition 70 (Sylow p-subgroup). Let p be a prime number and let n, k be integers such that p ∤ k. If G is a finite group with #G = kp^n, then a Sylow p-subgroup Sp is a subgroup of G such that #Sp = p^n. I.e. it is a subgroup whose order is the highest power of p that divides the order of G.

Theorem 71 (Sylow's theorem). Let p be any prime number and let n, k be integers such that p ∤ k. For any finite group G with #G = kp^n it holds that:

    (i) G has at least 1 Sylow p-subgroup,


(ii) The number of Sylow p-subgroups is congruent to 1 mod p,

(iii) All Sylow p-subgroups of G are conjugate, i.e. if Sp and S'p are Sylow p-subgroups then there is a g ∈ G such that g^(−1) Sp g = S'p.

    Proof. See [8], p. 99-102.

Since G24 is generated by the octads, M24 must be the set of permutations of S24 that map each octad to an octad. But what exactly do these permutations look like? A few of them are found in a projective special linear group.

Definition 72. The projective special linear group PSL2(23) is given by the set of 2 × 2-matrices with entries in F23 and determinant 1, that are factored out by the scalar matrices:

PSL2(23) = {(a b; c d) | a, b, c, d ∈ F23, ad − bc = 1} / 〈λ·1_2 | λ = 1, −1〉,

where (a b; c d) denotes the 2 × 2-matrix with rows (a b) and (c d). Let P(F23) be the projective line over F23, so P(F23) = {0, 1, . . . , 22, ∞}. Then PSL2(23) acts on P(F23) as follows:

(a b; c d) · x = (ax + b)/(cx + d),

for all x ∈ P(F23). Hence, PSL2(23) is a set of permutations on the 24 points in P(F23).

One of the elements of PSL2(23) is τ: by a simple relabeling of the columns of G = (L | R) in theorem 43, we can write:

    τ = (∞ 0)(1 22)(2 11)(3 15)(4 17)(5 9)(6 19)(7 13)(8 20)(10 16)(12 21)(14 18),

i.e. τ(i) = −i^(−1) mod 23, for i ∈ P(F23). In matrix notation this becomes:

τ = (0 1; −1 0),

and since det(τ) = −(1 · (−1)) = 1, we have that τ ∈ PSL2(23).
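The action on the projective line is easy to experiment with; a minimal Python sketch (my own, with INF standing for the point ∞):

    P = 23
    INF = "inf"                         # stands for the point at infinity

    def act(a, b, c, d, x):
        """Apply the matrix (a b; c d) to x via x -> (ax + b)/(cx + d)."""
        if x == INF:                    # image of infinity is a/c (or infinity if c = 0)
            return INF if c % P == 0 else (a * pow(c % P, -1, P)) % P
        num, den = (a * x + b) % P, (c * x + d) % P
        return INF if den == 0 else (num * pow(den, -1, P)) % P

    tau = lambda x: act(0, 1, -1, 0, x)        # tau: x -> -1/x
    assert tau(0) == INF and tau(INF) == 0
    assert [tau(i) for i in (1, 2, 3)] == [22, 11, 15]   # matches (1 22)(2 11)(3 15) above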

Theorem 73. PSL2(23) = {υ^i σ^j, υ^i σ^j τ σ^k | 0 ≤ i ≤ 10, 0 ≤ j, k < 23}, where:

σ = (1 1; 0 1),    υ = (ε 0; 0 ε^(−1)),    τ = (0 1; −1 0),

and ε is a primitive element of F23.


Proof. For any element (a b; c d) ∈ PSL2(23) and any x ∈ P(F23) we have:

(i) If c = 0 then d = 1/a, so (a b; 0 d) · x = a^2 x + ab,

(ii) If c ≠ 0 then b = ad/c − 1/c, so:

(a  ad/c − 1/c; c  d) · x = (a(cx + d) − 1) / (c(cx + d)) = a/c − 1/(c^2 x + cd).

This shows that PSL2(23) · x = {a^2 x + ab | a ∈ F23^*, b ∈ F23} ∪ {a/c − 1/(c^2 x + cd) | a, c, d ∈ F23}, for any x ∈ P(F23).

Actually, we can show that:

{a^2 x + ab | a ∈ F23^*, b ∈ F23} = {υ^i σ^j (x) | 0 ≤ i ≤ 10, 0 ≤ j < 23}, and      (4.1)

{a/c − 1/(c^2 x + cd) | a, c, d ∈ F23} = {υ^i σ^j τ σ^k (x) | 0 ≤ i ≤ 10, 0 ≤ j, k < 23},      (4.2)

which will prove the theorem. For this firstly note that:

{a^2 | a ∈ F23^*} = {2^i mod 23 | 0 ≤ i ≤ 10} = {ε^(2i) mod 23 | 0 ≤ i ≤ 10},

so for any x ∈ P(F23):

{a^2 x + ab | a ∈ F23^*, b ∈ F23} = {ε^(2i) x + ε^i b | 0 ≤ i ≤ 10, b ∈ F23}, and

{a/c − 1/(c^2 x + cd) | a, c, d ∈ F23} = {a/ε^i − 1/(ε^(2i) x + ε^i d) | 0 ≤ i ≤ 10, a, d ∈ F23}.

Now if we let a = ε^i for any 0 ≤ i ≤ 10, then:

υ^i σ^(ab) (x) = σ^(ab) (ε^(2i) x) = ε^(2i) x + ab = ε^(2i) x + ε^i b,

and since b ∈ F23 we have that 0 ≤ ab < 23, which proves (4.1). Also, if we let c = ε^i for any 0 ≤ i ≤ 10, then:

υ^i σ^(cd) τ σ^(a/c) (x) = σ^(cd) τ σ^(a/c) (ε^(2i) x)
                         = τ σ^(a/c) (ε^(2i) x + cd)
                         = σ^(a/c) (−1/(ε^(2i) x + cd))
                         = −1/(ε^(2i) x + cd) + a/c
                         = a/ε^i − 1/(ε^(2i) x + cd),

and since a, d ∈ F23 we have that 0 ≤ cd, a/c < 23, which proves (4.2).


4.3. Construction

We have now obtained the necessary definitions and results, so we can go on to construct the Mathieu group M24. First of all, we have seen that τ preserves G24, so it is an element of M24. The following theorem says that also the other two generators of PSL2(23), σ and υ, are elements of M24.

    Theorem 74. G24 is preserved by PSL2(23).

Proof. We know that PSL2(23) is generated by τ, σ and υ, and theorem 43 says that τ preserves G24. So what remains is to prove that σ and υ preserve G24 too. For this, we consider G23 as a QR-code with the digits of the codewords labeled as {0, 1, . . . , 22}. Then G24 is the code that is obtained by adding an overall parity check to the codewords of G23, which is labeled as ∞. Then for any µ ∈ PSL2(23) and x ∈ G24, the action of PSL2(23) on G24 is as follows:

    µ(x) = µ(x0 x1 · · ·x22 x∞) = (xµ(0) xµ(1) · · ·xµ(22) xµ(∞)).

We can easily see that σ is a cyclic shift that fixes ∞, so G24 is fixed by σ. For any polynomial f(X) ∈ F2[X]/〈X^23 − 1〉 it holds that:

υ(f(X)) = f(X^2) = f(X)^2,

so υ fixes G23. Also, υ(∞) = ε^2 · ∞ = ∞, so υ fixes G24.

We will show that there is actually one more permutation ω ∈ S24 that fixes G24 and, together with σ, υ and τ, generates M24. This would then prove that M24 fixes G24, so M24 ⊂ Aut(G24).

If we again consider G23 as a QR-code then we have:

Q = {1^2, . . . , 22^2 mod 23} = {1, 2, 3, 4, 6, 8, 9, 12, 13, 16, 18}, and

N = F23\(Q ∪ {0}) = {5, 7, 10, 11, 14, 15, 17, 19, 20, 21, 22}.

We define the permutation ω of P(F23) as:

ω : i ↦  ∞          if i = 0,
         −(i/2)^2    if i ∈ Q,
         (2i)^2      if i ∈ N,
         0           if i = ∞.

    An easy verification shows that ω(Q) = N and that ω(N) = Q.
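The "easy verification" can be delegated to a few lines of Python (a sketch, not part of the thesis; notation as above):

    P = 23
    Q = sorted({pow(i, 2, P) for i in range(1, P)})      # quadratic residues mod 23
    N = sorted(set(range(1, P)) - set(Q))                # non-residues

    def omega(i):
        if i == 0:
            return "inf"
        if i == "inf":
            return 0
        if i in Q:                                       # -(i/2)^2 mod 23
            return (-pow(i * pow(2, -1, P), 2, P)) % P
        return pow(2 * i, 2, P)                          # (2i)^2 mod 23

    assert sorted(omega(i) for i in Q) == N
    assert sorted(omega(i) for i in N) == Q
    print(Q, N)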


Theorem 75. G24 is invariant under ω.

    Proof. We will make use of a polynomial φ(X) that is defined as follows:

φ(X) := Σ_{i∈N} X^i.

If this polynomial generates G23, then we can write any polynomial in G24 as φ(X) · X^i + X^∞ for i = 0, . . . , 22. Hence, if we prove that ω(φ(X) · X^i + X^∞) ∈ G24 for all i = 0, . . . , 22, we are done. For this we notice that F23 = {2^i mod 23 | i = 1, . . . , 11} ∪ {2^i · 22 mod 23 | i = 1, . . . , 11}, and we make use of the fact that υω = ωυ^2, so for all 1 ≤ i ≤ 11 it holds that:

ω(φ(X) · X^(2i) + X^∞) = (υω)(φ(X) · X^i + X^∞) = (ωυ^2)(φ(X) · X^i + X^∞).

Since υ and therefore υ^2 fixes G24, this means it suffices to prove that ω(φ(X) · X^i + X^∞) ∈ G24 for i = 0, 1, 22. These 3 cases give:

ω(φ(X) + X^∞) = Σ_{i∈N} X^ω(i) + X^0
             = Σ_{i∈Q} X^i + X^0
             = Σ_{i∈Q} X^i + X^0 + Σ_{i∈N} X^i + Σ_{i∈N} X^i + X^∞ + X^∞
             = (Σ_{i∈N} X^i + X^∞) + (Σ_{i∈Q} X^i + Σ_{i∈N} X^i + X^0 + X^∞)
             = (φ(X) + X^∞) + Σ_{i∈P(F23)} X^i
             = (φ(X) + X^∞) + 1 ∈ G24,

ω(φ(X) · X + X^∞) = Σ_{i∈N} X^ω(i+1 mod 23) + X^0
             = X^ω(6) + X^ω(8) + X^ω(11) + X^ω(12) + X^ω(15) + X^ω(16) + X^ω(18) + X^ω(20) + X^ω(21) + X^ω(22) + X^ω(0) + X^0
             = X^14 + X^7 + X^1 + X^10 + X^3 + X^5 + X^11 + X^13 + X^16 + X^4 + X^∞ + X^0
             = φ(X) · X^2 + φ(X) · X^11 + φ(X) · X^20 + X^∞ ∈ G24, and

ω(φ(X) · X^22 + X^∞) = Σ_{i∈N} X^ω(i+22 mod 23) + X^0
             = X^ω(4) + X^ω(6) + X^ω(9) + X^ω(10) + X^ω(13) + X^ω(14) + X^ω(16) + X^ω(18) + X^ω(19) + X^ω(20) + X^ω(21) + X^0
             = X^19 + X^14 + X^20 + X^9 + X^21 + X^2 + X^5 + X^11 + X^18 + X^13 + X^16 + X^0
             = φ(X) + φ(X) · X + φ(X) · X^20 + φ(X) · X^22 ∈ G24.

Now what remains to prove is that φ(X) indeed generates G23. For this, notice that in F2[X]/〈X^23 − 1〉 we have that X^23 − 1 = X^23 + 1 = g(X)h(X), where:

g(X) = X^11 + X^10 + X^6 + X^5 + X^4 + X^2 + 1, and

h(X) = (X^11 + X^9 + X^7 + X^6 + X^5 + X + 1)(X + 1).
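This factorization is easy to verify by machine; a minimal sketch (not from the thesis), with F2-polynomials represented as sets of exponents having coefficient 1:

    def poly_mul(a, b):
        """Multiply two F_2 polynomials given as sets of exponents."""
        out = set()
        for i in a:
            for j in b:
                out ^= {i + j}          # symmetric difference: coefficients are mod 2
        return out

    g = {11, 10, 6, 5, 4, 2, 0}                      # X^11+X^10+X^6+X^5+X^4+X^2+1
    h = poly_mul({11, 9, 7, 6, 5, 1, 0}, {1, 0})     # (X^11+X^9+...+X+1)(X+1)
    assert poly_mul(g, h) == {23, 0}                 # X^23 + 1
    print("X^23 + 1 = g(X) h(X) over F_2: verified")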

This means that we can choose the generator polynomial to be either g(X) or X^11 + X^9 + X^7 + X^6 + X^5 + X + 1. We take G23 = 〈g(X)〉. Since g(X) and h(X) are relatively prime there are f(X), f'(X) such that:

    1 = f(X)g(X) + f ’(X)h(X).

    Moreover, we can choose f(X) such that:

    φ(X) = f(X)g(X).

Since g(X) and h(X) are relatively prime, also f(X) and h(X) are relatively prime, so there are p(X), p'(X) such that:

    1 = p(X)f(X) + p’(X)h(X), so

    g(X) = (p(X)f(X) + p’(X)h(X))g(X)

    = p(X)f(X)g(X) + p’(X)h(X)g(X)

    = p(X)f(X)g(X),

since p'(X)h(X)g(X) = p'(X)(X^23 + 1) = 0 in F2[X]/〈X^23 − 1〉. Hence, 〈g(X)〉 ⊂ 〈f(X)g(X)〉 = 〈φ(X)〉. Obviously, 〈φ(X)〉 ⊂ 〈g(X)〉, so G23 = 〈φ(X)〉, which completes the proof.

We now know that 〈σ, υ, τ, ω〉 ⊂ M24. What remains to prove is that there is actually no other permutation in S24 that fixes G24, so that we are sure that M24 = 〈σ, υ, τ, ω〉. For this, we will find the order of 〈σ, υ, τ, ω〉 and conclude that it must be equal to the size of M24.

    Theorem 76. #〈σ, υ, τ, ω〉 = 244823040.

Proof. Suppose Y is a block in the Steiner system S(5, 8, 24), say Y = {abcdefgh}, where a, b, c, d, e, f, g, h ∈ {0, 1, . . . , 22, ∞} and they are all distinct. Let H be the subset of Stab(Y ) that fixes an additional i ∈ {0, 1, . . . , 22, ∞}, so H = {g ∈ Stab(Y ) | g(i) = i}. By theorem 65 we have that #〈σ, υ, τ, ω〉 = #Stab(Y ) · #Y^〈σ,υ,τ,ω〉. Since Y^〈σ,υ,τ,ω〉 is the orbit of Y under 〈σ, υ, τ, ω〉 and Y is a block, we have that #Y^〈σ,υ,τ,ω〉 is equal to the number of blocks in the Steiner system S(5, 8, 24). From table 3.1 we know that this number is equal to 759, so:

    #〈σ, υ, τ, ω〉 = #Stab(Y ) · 759.

    We will find Stab(Y ) in four steps, where we follow the proof as in [13], p. 639-640:


(i) Firstly we prove that #Stab(Y ) = 16 · #H,

(ii) Then we show that #H ≤ 20160 by proving that H is isomorphic to a subgroup of GL4(F2),

(iii) Thirdly we look at the action of H on Y to prove that H contains a subgroup that is isomorphic to A8,

    (iv) We conclude by comparing the orders of the above mentioned groups.
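As a quick sanity check on the order claimed in theorem 76, the numbers produced in these steps can be multiplied out (a short sketch, anticipating #H = 20160 from steps (ii)-(iv)):

    from math import prod, factorial

    # #Stab(Y) = 16 * #H with #H = 20160 (= #GL4(F2) = #A8), and 759 blocks.
    order_GL4_F2 = prod(2 ** 4 - 2 ** i for i in range(4))   # 20160
    assert order_GL4_F2 == factorial(8) // 2                 # = #A8
    print(759 * 16 * order_GL4_F2)                           # 244823040 = #M24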

Step (i). 〈σ, υ, τ, ω〉 contains a permutation of cycle type 1 · 3 · 5 · 15, for example:

ω^(−2)σ^(11) = ((∞)(0)(3)(15)(1 18 4 2 6)(17 11 19 22 14)(5 21 20 10 7)(8 16 13 9 12)) · ((∞)(0 11 22 10 21 9 20 8 19 7 18 6 17 5 16 4 15 3 14 2 13 1 12))
             = (∞)(2 17 22)(4 13 20 21 8)(0 11 7 16 1 6 12 19 10 18 15 3 14 5 9).

Notice that for any two cycles τ1 and τ2, we have that τ1^(−1) τ2 τ1 is obtained from τ2 by applying τ1 to the symbols in τ2. For example, if τ1 = (a b c d) and τ2 = (a c)(e d b), then τ1^(−1) τ2 τ1 = (b d)(e a c). So, by conjugating, we may assume that the cycle of size 5 is (a b c d e) and the one of size 1 is (i). Since Y is a block of S(5, 8, 24) it must be fixed by this permutation, so the cycle of size 3 is (f g h). We call this permutation π, and note that it is contained in Stab(Y ). Now since:

ω^(−2)σ^5 = ((∞)(0)(3)(15)(1 18 4 2 6)(17 11 19 22 14)(5 21 20 10 7)(8 16 13 9 12)) · ((∞)(1 6 11 16 21 3 8 13 18 0 5 10 15 20 2 7 12 17 22 4 9 14 19))
          = (∞)(1)(10 15)(4 12 11 13)(2 5 7 8 9 17 14 22)(0 21 3 16 20 6 19 18),

we see that 〈σ, υ, τ, ω〉 also contains a permutation of cycle type 1^2 · 2 · 4 · 8^2. By conjugating we may assume that Y consists of the cycle of size 4 and a cycle of size 1. Then, since Y is a block, it must contain the other cycle of size 1 and the cycle of size 2 as well. Hence, Stab(Y ) contains a permutation that fixes Y set-wise and permutes all other points in two cycles of size 8. This means that Stab(Y ) is transitive on the remaining 16 points, so #{i}^Stab(Y ) = 16. H is the subgroup of Stab(Y ) that fixes {i}, so by theorem 65 #Stab(Y ) = 16 · #H.

Step (ii). If we look at the collection of codewords of G24 for which a = b = . . . = i = 0, then we obtain a linear code of length 15. (Obviously, these entries will remain 0 whenever we add such codewords or if we multiply them with a constant.) Since {abcdefgh} is a block and G24 is self-dual, the size of this code is 2^12/2^8 = 2^4. Since the codewords of G24 have Hamming weight 0, 8, 12, 16, or 24, this new code can only contain codewords of Hamming weight 0, 8 or 12. However, if it contains a codeword of Hamming weight 12, then G24 must contain a codeword x with wt(xL) = 4 of Hamming weight 20 as well,


which is a contradiction. So, this new code is a [15, 4, 8]-code that contains 15 codewords of Hamming weight 8. Now by [13] p. 639 this code has automorphism group GL4(F2), which has order Π_{i=1}^{4} (2^4 − 2^(i−1)) = 20160. If we now look at two permutations g and h in H that give the same per