Parallel decoding of the [23, 12, 7] binary Golay code


S.M. Dodunekov J.E.M. Nilsson

Indexing terms: Golay binary code, Parallel decoding, Decoding complexity

Abstract: This paper deals with the decoding of the [23,12,7] binary Golay code. Recently, S.W. Wei and C.H. Wei suggested a step-by-step decoding algorithm. We present an improvement in the comparison circuit of their algorithm and construct a very high-speed parallel decoder.

1 Introduction

The [23, 12, 7] binary Golay code is among the most important codes from both a theoretical and a practical point of view. It was described by Golay [1] and, as was shown in References 2 and 3, it is the unique multiple-error-correcting binary perfect code. Several applications of the code are listed in Reference 4.

Two step-by-step decoders of the binary Golay code have recently been developed [4-6], thus simplifying the earlier decoder suggested by Chien and Lum [8]. The algorithm in Reference 5 applies, as a first step, the BCH decoding algorithm [7] for two-error correction. The algorithm given in Reference 6 uses the algebraic decoder of Elia [9] as an indicator for the number of errors. See also the details in Reference 10.

In this paper we show that it is possible to avoid the most time- and space-consuming operations, the inversion and the cubic root computation, in the comparison circuit of the decoder of Wei and Wei [6]. This makes it possible to construct a simpler and faster sequential step-by-step decoder than the one described in Reference 6 and to present a high-speed parallel decoder of the [23, 12, 7] binary Golay code. Our idea is to invert all the information positions in parallel and check the results simultaneously. The complexity of the decoder is evaluated as in References 11 and 12. For all facts and notations from coding theory which are not introduced here, Reference 13 is taken as our guide. Part of the results was presented in Reference 15.

2 Preliminaries

The [23, 12, 7] binary Golay code $\mathcal{G}_{23}$ can be described as a quadratic-residue code [13, p. 482], i.e. a cyclic code

© IEE, 1994. Paper 1002E (C3, E7), first received 6th July 1992 and in revised form 27th September 1993. S.M. Dodunekov is at the Institute of Mathematics, Bulgarian Academy of Sciences, 1113 Sofia, Bulgaria. J.E.M. Nilsson is at the German Aerospace Research Establishment (DLR), Institute of Communications Technology, D-8031 Oberpfaffenhofen, Germany.

IEE Proc.-Comput. Digit. Tech., Vol. 141, No. 2, March 1994

of length 23, generated by the polynomial

$$g(x) = x^{11} + x^9 + x^7 + x^6 + x^5 + x + 1$$

which is irreducible over GF(2). Equivalently, as was proved in Reference 14, $\mathcal{G}_{23}$ can be generated by

$$g(x) = x^{11} + x^{10} + x^6 + x^5 + x^4 + x^2 + 1$$

Note that

$$x^{23} - 1 = (x - 1)(x^{11} + x^{10} + x^6 + x^5 + x^4 + x^2 + 1)(x^{11} + x^9 + x^7 + x^6 + x^5 + x + 1)$$
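The factorisation can be checked mechanically. The following is a minimal sketch, assuming polynomials over GF(2) are stored as Python integers (bit $i$ holds the coefficient of $x^i$); the helper names `poly_from_exponents` and `poly_mul` are illustrative and not part of the paper.

```python
# Polynomials over GF(2) stored as Python ints: bit i holds the coefficient of x^i.

def poly_from_exponents(exponents):
    """Build the bitmask of a GF(2) polynomial from its nonzero exponents."""
    mask = 0
    for e in exponents:
        mask |= 1 << e
    return mask

def poly_mul(a, b):
    """Carry-less (XOR) product of two GF(2) polynomials."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result

g1 = poly_from_exponents([11, 9, 7, 6, 5, 1, 0])    # x^11 + x^9 + x^7 + x^6 + x^5 + x + 1
g2 = poly_from_exponents([11, 10, 6, 5, 4, 2, 0])   # x^11 + x^10 + x^6 + x^5 + x^4 + x^2 + 1
x_plus_1 = poly_from_exponents([1, 0])              # x - 1 equals x + 1 over GF(2)

product = poly_mul(x_plus_1, poly_mul(g1, g2))
assert product == poly_from_exponents([23, 0])      # x^23 + 1
```

The two generator polynomials are reciprocals of each other, which is why either one generates a code equivalent to the other.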

Assume for simplicity that the code $\mathcal{G}_{23}$ is presented in a systematic way and let

$$i(x) = c_0 + c_1 x + \cdots + c_{11} x^{11}$$

and

$$p(x) = c_{12} x^{12} + \cdots + c_{22} x^{22}$$

be the information and the parity-check part of a codeword $c(x) = \sum_{i=0}^{22} c_i x^i$. Let $e(x) = \sum_{i=0}^{22} e_i x^i$ be the error polynomial. Then the received polynomial is

$$y(x) = c(x) + e(x)$$

Take $\alpha$ to be a primitive 23rd root of unity. Note that $\alpha \in GF(2^{11})$. Consider the syndromes $S_i = e(\alpha^i)$, $i \neq 0$. Since $g(\alpha^j) = 0$ for $j \in \{1, 2, 3, 4, 6, 8, 9, 12, 13, 16, 18\} = Q$, the syndromes $S_j$ can be calculated from the received vector for each $j \in Q$ by

$$S_j = e(\alpha^j) = e(\alpha^j) + c(\alpha^j) = y(\alpha^j)$$
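As an illustration, the syndromes $S_1$, $S_3$, $S_9$ used later can be computed in software as follows. This is a sketch under stated assumptions: the field GF($2^{11}$) is built in a polynomial basis with the primitive polynomial $x^{11} + x^2 + 1$ (the paper does not fix a field polynomial), words are integers with bit $i$ holding the coefficient of $x^i$, and all function names are hypothetical.

```python
M = 11
FIELD_POLY = (1 << 11) | (1 << 2) | 1   # x^11 + x^2 + 1 (assumed field polynomial)

def gf_mul(a, b):
    """Multiply two elements of GF(2^11) in polynomial basis."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a >> M:              # reduce as soon as degree 11 appears
            a ^= FIELD_POLY
    return r

def gf_pow(a, e):
    """Square-and-multiply exponentiation in GF(2^11)."""
    r = 1
    while e:
        if e & 1:
            r = gf_mul(r, a)
        a = gf_mul(a, a)
        e >>= 1
    return r

def poly_eval(bits, x):
    """Evaluate a binary polynomial (bit i = coefficient of x^i) at x by Horner's rule."""
    acc = 0
    for i in reversed(range(bits.bit_length())):
        acc = gf_mul(acc, x) ^ ((bits >> i) & 1)
    return acc

G = sum(1 << e for e in (11, 9, 7, 6, 5, 1, 0))   # generator polynomial g(x)

# The element 2 encodes the field element x; raising it to (2^11 - 1)/23 = 89
# gives an element of order 23. Pick alpha so that g(alpha) = 0.
beta = gf_pow(2, (2**M - 1) // 23)
alpha = beta if poly_eval(G, beta) == 0 else gf_pow(beta, 5)

def syndromes(y):
    """Return (S1, S3, S9) = (y(alpha), y(alpha^3), y(alpha^9))."""
    return tuple(poly_eval(y, gf_pow(alpha, j)) for j in (1, 3, 9))

codeword = G                       # g(x) itself is a codeword
error = (1 << 4) | (1 << 17)       # two bit errors at arbitrarily chosen positions
S1, S3, S9 = syndromes(codeword ^ error)
assert (S1, S3, S9) == syndromes(error)   # the codeword contributes nothing
```

If `beta` happens to be a root of the other generator polynomial, `beta**5` is a root of `G`, because 5 is a quadratic non-residue modulo 23.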

3 Computing the indicator

As was proved in Reference 9 (see also Reference 10), the error locator polynomial [13, p. 244] for $\mathcal{G}_{23}$ is

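$$\sigma(z) = 1 + \sigma_1 z + \sigma_2 z^2 + \sigma_3 z^3$$

where

$$\sigma_1 = S_1 \quad \text{and} \quad \sigma_2 = \sigma_3 = 0$$

if $S_3 = S_1^3$, and

$$\sigma_1 = S_1, \qquad \sigma_2 = S_1^2 + D^{1/3}, \qquad \sigma_3 = S_3 + S_1 D^{1/3}$$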

This work, initially presented at an international workshop on algebraic and combinatorial coding theory (Voneshta Voda, Bulgaria, 22nd-28th June 1992), was partially supported by the Bulgarian National Science Foundation under contract No. 35/1991. The authors would also like to thank the anonymous referees for their valuable comments.


otherwise. The value of $D$ is given by

$$D = [(S_1^3 + S_3)^3 + (S_1^9 + S_9)]/(S_1^3 + S_3)$$

Since the code is perfect, $S_3 \neq S_1^3$ indicates that at least two errors have occurred and $\sigma_3 = 0$ indicates that no more than two errors have occurred. Therefore, we can use the following indicators $I_j$ for the number $j$ of errors ($I_j = 1$ iff $j$ errors have occurred; otherwise $I_j = 0$) [8]:

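$$
\begin{aligned}
I_0 &= 1 \text{ iff } S_1 = 0 \\
I_1 &= 1 \text{ iff } I_0 = 0,\ S_3 = S_1^3 \\
I_2 &= 1 \text{ iff } I_0 = I_1 = 0,\ S_3 = S_1 D^{1/3} \\
I_3 &= 1 \text{ iff } I_0 = I_1 = I_2 = 0
\end{aligned}
\tag{1}
$$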

All computations are in the field GF($2^{11}$) [9, 10]. It is clear that the most time- and space-consuming part of the decoder is the computation of $I_2$ in eqns. 1, since inversion and computation of a cubic root in the field GF($2^{11}$) are needed. Since $a^3 = b^3$ iff $a = b$ for any $a, b \in GF(2^{11})$ (cubing is a bijection because $\gcd(3, 2^{11} - 1) = 1$), instead of the equality $S_3 = S_1 D^{1/3}$ in eqns. 1 we can equivalently consider $S_3^3 = S_1^3 D$. Therefore we can replace the conditions on $I_2$ in eqns. 1 by

$$I_2 = 1 \text{ iff } I_0 = I_1 = 0,\ N = 0$$

where

$$N = S_1^9 S_3 + S_1^6 S_3^2 + S_1^3 S_9 + S_3^4$$
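The step from $D$ to $N$ can be written out explicitly. The derivation below is a reconstruction from the definitions in this section, assuming $D = [(S_1^3 + S_3)^3 + (S_1^9 + S_9)]/(S_1^3 + S_3)$ and $S_1^3 + S_3 \neq 0$ (which holds whenever $I_0 = I_1 = 0$). All arithmetic is in characteristic 2, so $(a + b)^3 = a^3 + a^2 b + a b^2 + b^3$ and equal terms cancel.

```latex
\begin{aligned}
S_3^3 = S_1^3 D
 &\iff S_3^3 (S_1^3 + S_3) = S_1^3 \left[ (S_1^3 + S_3)^3 + S_1^9 + S_9 \right] \\
 &\iff S_1^3 S_3^3 + S_3^4 = S_1^3 \left[ S_1^6 S_3 + S_1^3 S_3^2 + S_3^3 + S_9 \right] \\
 &\iff S_1^9 S_3 + S_1^6 S_3^2 + S_1^3 S_9 + S_3^4 = 0 .
\end{aligned}
```

The left-hand side of the last line is exactly $N$, so the test $N = 0$ needs neither the division by $S_1^3 + S_3$ nor the cubic root.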

4 Parallel decoder

The idea of step-by-step decoding is to invert a position, calculate the indicators, and compare with the case when no position is inverted. By repeating this procedure for all information positions the error pattern can be found. Because the [23, 12, 7] binary Golay code is perfect, the distance of an arbitrary word $y$ to the unique closest codeword is 0, 1, 2 or 3. Hence, we can assume that no more than three errors occurred. Consequently, the distance from the received vector $y$ to the transmitted codeword for all possible error patterns follows from the indicators. Then, if an erroneous position is inverted, the indicators show that the number of errors is reduced. On the other hand, if a correct position is inverted, the indicators show that the number of errors is unchanged (if = 3 before) or increased (if < 3 before), see Reference 5. Clearly, if more than three errors have occurred the correct error pattern will not be found.

Let us denote by $I_j^{(k)}$ and $S_j^{(k)}$ the value of the indicator and the syndrome when position $k$ of the received vector $y$ is inverted. Recall that the code is presented in a systematic way and that the first 12 positions are information positions.

Algorithm

Step 1: Calculate the syndromes $S_j$, $S_j^{(k)}$, $j = 1, 3, 9$.

Step 2: Calculate $I_0, I_1, I_2, I_3$ and $I_0^{(k)}, I_1^{(k)}, I_2^{(k)}$, $1 \le k \le 12$, in 13 parallel branches.

Step 3: For each $k$, $1 \le k \le 12$, invert position $k$ iff

$$\exists\, 1 \le j \le 3 \ \left[\, I_j = 1 \wedge I_{j-1}^{(k)} = 1 \,\right].$$

Read out the information part $i(x)$ of $y(x)$. End.
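The three steps can be simulated in software. The sketch below is illustrative only and rests on assumptions not in the paper: GF($2^{11}$) is built with the field polynomial $x^{11} + x^2 + 1$, words are integers with bit $k$ holding position $k$, information positions are numbered 0 to 11 instead of 1 to 12, and the 13 parallel branches are replaced by a loop. The names `error_count`, `decode` and `LOC` are hypothetical.

```python
M, N_BITS, K = 11, 23, 12
FIELD_POLY = (1 << 11) | (1 << 2) | 1             # x^11 + x^2 + 1 (assumed)
G = sum(1 << e for e in (11, 9, 7, 6, 5, 1, 0))   # generator polynomial g(x)

def gf_mul(a, b):
    """Multiply two elements of GF(2^11) in polynomial basis."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a >> M:
            a ^= FIELD_POLY
    return r

def gf_pow(a, e):
    """Square-and-multiply exponentiation in GF(2^11)."""
    r = 1
    while e:
        if e & 1:
            r = gf_mul(r, a)
        a = gf_mul(a, a)
        e >>= 1
    return r

def poly_eval(bits, x):
    """Evaluate a binary polynomial of degree < 23 at x by Horner's rule."""
    acc = 0
    for i in reversed(range(N_BITS)):
        acc = gf_mul(acc, x) ^ ((bits >> i) & 1)
    return acc

beta = gf_pow(2, (2**M - 1) // N_BITS)             # element of order 23
ALPHA = beta if poly_eval(G, beta) == 0 else gf_pow(beta, 5)
# LOC[j][k] = alpha^(j*k): contribution of position k to syndrome S_j
LOC = {j: [gf_pow(ALPHA, (j * k) % N_BITS) for k in range(N_BITS)] for j in (1, 3, 9)}

def syndromes(y):
    """Step 1: return (S1, S3, S9) of the received word y."""
    out = []
    for j in (1, 3, 9):
        acc = 0
        for k in range(N_BITS):
            if (y >> k) & 1:
                acc ^= LOC[j][k]
        out.append(acc)
    return tuple(out)

def error_count(S1, S3, S9):
    """Step 2: return the j with I_j = 1, assuming at most three errors."""
    if S1 == 0:
        return 0
    c = gf_mul(gf_mul(S1, S1), S1)                 # S1^3
    if S3 == c:
        return 1
    s3_sq = gf_mul(S3, S3)
    # N = [(S1^3 S3 + S3^2) S1^3 + S9] S1^3 + S3^4
    n = gf_mul(gf_mul(gf_mul(c, S3) ^ s3_sq, c) ^ S9, c) ^ gf_mul(s3_sq, s3_sq)
    return 2 if n == 0 else 3

def decode(y):
    """Step 3: invert every information position that lowers the error count."""
    S1, S3, S9 = syndromes(y)
    j = error_count(S1, S3, S9)                    # branch with no position inverted
    corrected = y
    for k in range(K):                             # the 12 remaining branches
        jk = error_count(S1 ^ LOC[1][k], S3 ^ LOC[3][k], S9 ^ LOC[9][k])
        if j >= 1 and jk == j - 1:
            corrected ^= 1 << k
    return corrected & ((1 << K) - 1)              # information part i(x)

codeword = G << 3                                  # x^3 g(x) is a codeword
received = codeword ^ (1 << 2) ^ (1 << 9) ^ (1 << 20)
assert decode(received) == codeword & 0xFFF
```

Inverting position $k$ only adds $\alpha^{jk}$ to $S_j$, so each branch reuses the syndromes of the received word. Errors in parity positions are left uncorrected, which is harmless because only the information part is read out.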

Notice that the knowledge of the states of the indicators $I_j$, $0 \le j \le 3$, can be used to speed up the algorithm in the case of fewer than three errors. For example, if $I_0 = 1$ we can directly read out the information part. For an outline of the implementation of the algorithm see Fig. 2. Clearly, the extended Golay [24, 12, 8] code $\mathcal{G}_{24}$ can also be decoded by the algorithm by adding an additional check in the same way as in Reference 5.

5 Decoding complexity

The time complexity $T$ and the space complexity $S$ of some basic computational units in the fields GF($2^m$) are summarised in the Appendix (the subscripts show the unit whose complexity is considered). A two-input logic gate is used as the space unit and the propagation delay of a two-input logic gate as the time unit. From the Appendix we conclude that multipliers and inverters have the highest complexity. Element inversion in large finite fields is especially complicated and should be avoided if possible. When it is necessary to store intermediate results during the calculations (for example, when the same multiplier is used for two multiplications), a fast parallel memory can be used.

Let us assume that the syndromes $S_i$ are calculated online (i.e. while $y$ is received). This is essential for low decoding delay and high decoding speed. The time complexity of the decoder will be estimated as the time required for calculating $I_2$. This is the time required for finding the error pattern after $y$ is received. The space complexity will be roughly estimated as 13 times the hardware required for calculating $I_0, I_1, I_2, I_3$, plus the hardware required for calculating $S_i$. The space complexity of adders, parallel memories, comparison units and registers (to store $y$) is not included.

The time and space complexity of the parallel decoder depend on the option selected to compute $N$. One implementation, which is fast and straightforward, is shown in Fig. 1. The square units are very simple. In polynomial basis only a few XOR gates (exclusive-or gates) are required in GF($2^{11}$) [11]. In normal basis they can be replaced by a unit performing a cyclic shift. The comparison circuit of Fig. 1 is about two times as fast as the comparison circuit suggested in Reference 6. The longest computational path of Reference 6 is determined by three adders, one multiplier and three ROM circuits, while the longest computational path of Fig. 1 consists of three adders and two multipliers. Furthermore, our circuit has a much lower space complexity.

Fig. 1 Comparison circuit of [23, 12, 7] Golay code (legend: 11-bit path; parallel multiplier in GF($2^{11}$); parallel square unit; parallel adder in GF($2^{11}$); three zero-checkers)

In a parallel decoder it might be better to use the same multiplier for more than one multiplication and use a parallel memory. By such a modification the time complexity is increased slightly. However, a large reduction in space complexity of the decoder can be obtained since the comparison circuit is implemented 13 times.

Fig. 2 Outline of the parallel implementation for the algorithm (labels: buffer 1; buffer 2; corrected info. $i(x)$)

A multiplier network of depth two with three multipliers is sufficient if parallel memories are used to store intermediate results. In multiplier 1 we calculate first $S_1 \cdot S_1^2$ and thereafter $S_1^3 \cdot S_9$; in multiplier 2 first $S_1 S_3$ and thereafter $S_1^8 \cdot S_1 S_3$; and in multiplier 3 $S_1^3 \cdot S_3$ after $S_1^3$ is obtained from multiplier 1 (squaring this product gives the term $S_1^6 S_3^2$). However, if we express $N$ in the following form

$$N = [(S_3 S_1^3 + S_3^2) S_1^3 + S_9] S_1^3 + S_3^4$$

we can see that four multiplications are sufficient if they are performed in a serial fashion. Notice that one multiplication is required for $S_1^3$.
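The nested form can be compared against the term-by-term expression while counting general multiplications. This is a sketch, assuming a polynomial-basis GF($2^{11}$) with field polynomial $x^{11} + x^2 + 1$ and treating squarings as free (they are cheap units in the sense of Section 5); the names `counter`, `gf_sqr`, `n_expanded` and `n_nested` are illustrative.

```python
M = 11
FIELD_POLY = (1 << 11) | (1 << 2) | 1   # x^11 + x^2 + 1 (assumed field polynomial)
counter = {"mul": 0}                    # number of general multiplications performed

def _mul(a, b):
    """Uncounted GF(2^11) multiplication in polynomial basis."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a >> M:
            a ^= FIELD_POLY
    return r

def gf_mul(a, b):
    """General multiplication, counted as one multiplier use."""
    counter["mul"] += 1
    return _mul(a, b)

def gf_sqr(a):
    """Squaring, modelled as a separate cheap unit and therefore not counted."""
    return _mul(a, a)

def n_expanded(S1, S3, S9):
    """N = S1^9 S3 + S1^6 S3^2 + S1^3 S9 + S3^4, term by term (reference value)."""
    c = _mul(_mul(S1, S1), S1)          # S1^3
    c2 = _mul(c, c)                     # S1^6
    c3 = _mul(c2, c)                    # S1^9
    s3_sq = _mul(S3, S3)
    return _mul(c3, S3) ^ _mul(c2, s3_sq) ^ _mul(c, S9) ^ _mul(s3_sq, s3_sq)

def n_nested(S1, S3, S9):
    """N = [(S3 S1^3 + S3^2) S1^3 + S9] S1^3 + S3^4 with four serial multiplications."""
    s3_sq = gf_sqr(S3)
    c = gf_mul(gf_sqr(S1), S1)          # 1st multiplication: S1^3
    t = gf_mul(S3, c) ^ s3_sq           # 2nd multiplication
    t = gf_mul(t, c) ^ S9               # 3rd multiplication
    return gf_mul(t, c) ^ gf_sqr(s3_sq) # 4th multiplication

counter["mul"] = 0
value = n_nested(0x3A7, 0x51C, 0x0F3)   # arbitrary field elements
assert value == n_expanded(0x3A7, 0x51C, 0x0F3)
assert counter["mul"] == 4
```

The equality holds in any commutative ring of characteristic 2, so it does not depend on the particular field polynomial chosen here.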

For the parallel decoder our suggestions of how to calculate the indicators are:

First option: two serial multiplications in three multipliers.

Second option: four serial multiplications in one multiplier.

The decoding complexity of the first option is

$$T = 2T_{mul} + T_{pm} + T_{cmp} + T_{add} = 24$$

$$S \approx 3S_s + 13(3S_{mul}) = 3S_s + 39S_{mul} \approx 45S_{mul}$$

and

$$T = 4T_{mul} + 3T_{pm} + T_{cmp} + 4T_{add} = 47$$

$$S \approx 3S_s + 13S_{mul} \approx 19S_{mul}$$

of the second option. Hence, we have a very efficient decoder. The time complexity of the sequential step-by-step decoder is 13 times the time complexity of decoding one information bit. The comparison circuit of the decoder in Reference 6 can be simplified. If we evaluate its complexity in the same way as here, two multipliers, two ROM circuits (to calculate $(\cdot)^{-1}$ and $(\cdot)^{1/3}$) and three comparison circuits are required in one branch. The time complexity for decoding one bit is determined by three additions, one comparison, a parallel memory shift, two multiplications and two table-lookups (for computation in GF($2^{11}$): $S_{inv} \approx 93S_{mul}$).

Hence, the complexity is

$$T = 13(2T_{inv} + 2T_{mul} + T_{pm} + T_{cmp} + 3T_{add}) = 624$$

$$S \approx 3S_s + (2S_{inv} + 2S_{mul}) \approx 3S_s + 188S_{mul} \approx 194S_{mul}$$

Depending on the option selected, the parallel decoder is in the range of 13 to 26 times as fast and also simpler to implement in terms of space complexity than the decoder in Reference 6. Hence, by some simple modifications of the indicators used in Reference 6 we have obtained a major improvement in decoding efficiency. The Elia decoder [9] followed by root search is another efficient decoder. However, in terms of speed the parallel decoder is faster. In the Elia decoder the error locator polynomial, given in Section 3, has to be generated, which is more complicated than calculating the indicators in the way presented here. Furthermore, a complex circuit to search or solve for the roots of the error locator polynomial is required.
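The figures in this section can be reproduced from the unit complexities in the Appendix with $m = 11$. The short calculation below is a sketch under two assumptions that are inferred from the stated totals rather than given explicitly: $T_{cmp} \approx m/2$ is rounded down to 5, and the syndrome network and ROM are taken as exactly $2S_{mul}$ and $93S_{mul}$. The variable names are illustrative.

```python
m = 11
T_add, T_pm, T_mul, T_inv = 1, 2, 8, m
T_cmp = m // 2                 # m/2 rounded down to 5 (assumption matching the totals)

S_mul = 3 * m * m              # parallel normal-basis multiplier
S_syn = 2 * S_mul              # syndrome network, approximately 2 S_mul for m = 11, n = 23
S_inv = 93 * S_mul             # ROM for inversion or cubic root, approximately 93 S_mul

options = {
    "parallel, option 1": (2 * T_mul + T_pm + T_cmp + T_add,
                           3 * S_syn + 13 * 3 * S_mul),
    "parallel, option 2": (4 * T_mul + 3 * T_pm + T_cmp + 4 * T_add,
                           3 * S_syn + 13 * S_mul),
    "sequential, Ref. 6": (13 * (2 * T_inv + 2 * T_mul + T_pm + T_cmp + 3 * T_add),
                           3 * S_syn + 2 * S_inv + 2 * S_mul),
}

for name, (T, S) in options.items():
    print(f"{name}: T = {T}, S = {S // S_mul} S_mul")

speedups = [options["sequential, Ref. 6"][0] / options[k][0]
            for k in ("parallel, option 2", "parallel, option 1")]
```

With these values the two speed-up ratios are 624/47 and 624/24, which is where the range of 13 to 26 comes from.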

6 References

1 GOLAY, M.J.E.: ‘Notes on digital coding’, Proc. IEEE, 1949, 37, p. 657

2 TIETAVAINEN, A.: ‘On the nonexistence of perfect codes over finite fields’, SIAM J. Appl. Math., 1973, 24, pp. 88-96

3 ZINOVIEV, V.A., and LEONTIEV, V.K.: ‘The nonexistence of perfect codes over Galois fields’, Probl. Control Inf. Theory, 1973, 2, (2), pp. 123-132

4 REED, I.S., TRUONG, T.K., YIN, X.Y., and HOLMES, J.K.: ‘A simplified procedure for decoding (23, 12) and (24, 12) Golay codes’. TDA progress report 42-97, Jan.-Mar. 1989, Jet Propulsion Laboratory, Pasadena, CA

5 REED, I.S., TRUONG, T.K., YIN, X.Y., and HOLMES, J.K.: ‘Decoding the [24, 12, 8] Golay codes’, IEE Proc. E, 1990, 137, (3), pp. 202-206

6 WEI, S.W., and WEI, C.H.: ‘On high-speed decoding of the [23, 12, 7] Golay code’, IEEE Trans. Inf. Theory, 1990, IT-36, (3), pp. 692-695

7 BERLEKAMP, E.R.: ‘Algebraic coding theory’ (McGraw-Hill, New York, 1968)

8 CHIEN, R.T., and LUM, V.: ‘On Golay’s perfect codes and step-by-step decoding’, IEEE Trans. Info. Theory, 1966, IT-12, (2), pp. 403-404

9 ELIA, M.: ‘Algebraic decoding of the [23, 12, 7] Golay code’, IEEE Trans. Info. Theory, 1987, IT-33, (1), pp. 150-151

10 BOURS, P., JANSSEN, J.C.M., VAN ASPERDT, M., and VAN TILBORG, H.C.A.: ‘Algebraic decoding beyond $e_{BCH}$ of some binary cyclic codes, when $e > e_{BCH}$’, IEEE Trans. Info. Theory, 1990, IT-36, (1), pp. 214-222

11 MASTROVITO, E.D.: ‘VLSI architectures for computations in Galois fields’. PhD thesis (dissertation 242), Linköping Studies in Science and Technology, Linköping, 1991

12 YOUZHI, X.: ‘Contributions to the decoding of Reed-Solomon and related codes’. PhD thesis (dissertation 257), Linköping Studies in Science and Technology, Linköping, 1991

13 MacWILLIAMS, F.J., and SLOANE, N.J.A.: ‘The theory of error-correcting codes’ (North-Holland, New York, 1977)

14 PLESS, V.: ‘On the uniqueness of the Golay codes’, J. Comb. Theory, 1968, 5, pp. 215-228

15 DODUNEKOV, S.M., and NILSSON, J.E.M.: ‘Parallel decoding of the Golay [23, 12, 7] binary code’. Proceedings of International workshop on algebraic and combinatorial coding theory, Voneshta Voda, Bulgaria, 22nd-28th June 1992, pp. 59-63

7 Appendix: Complexity of some computational units in GF($2^m$)

The complexity of the decoder is evaluated as in Reference 12. We take a two-input logic gate as the space unit and the propagation delay of a two-input logic gate as the time unit. The time complexity is the number of time units required to process an input. The space complexity is the number of space units required in the design [12].


The field elements are represented in a normal basis. The time complexity $T$ and the space complexity $S$ of some basic computational units in the fields GF($2^m$) are summarised below [11, 12]. For the code, $m = 11$. The unit whose complexity is considered is shown as subscript.

7.1 Adder
$m$-parallel XOR gates:

$$T_{add} = 1 \qquad S_{add} = m$$

7.2 Parallel memory
$m$-parallel memory cells:

$$T_{pm} = 2 \qquad S_{pm} = 4m$$

7.3 Multiplier
When $3 < m < 16$, the complexity of a parallel multiplier (normal basis) can simply be estimated by (see Reference 12)

$$T_{mul} = 8 \qquad S_{mul} = 3m^2$$

7.4 Inverter
Table lookup (ROM):

$$T_{inv} = m \qquad S_{inv} = \tfrac{3}{2} m 2^m$$

7.5 Comparison
Compare if two field elements are equal:

$$T_{cmp} \approx m/2 \qquad S_{cmp} = 2m$$

7.6 Calculation of $S_i$
For online calculation during the receipt of $y$ and a very high data rate, a switching network can be used. The space complexity is

S, = {rnn (S , = 2SmUrl when m = 11 and n = 23)
