
Novel Interpolation and Polynomial Selection for Low-Complexity Chase Soft-Decision Reed-Solomon Decoding


1318 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 20, NO. 7, JULY 2012

    Transactions Briefs

    Novel Interpolation and Polynomial Selection for Low-Complexity Chase Soft-Decision Reed-Solomon Decoding

    Xinmiao Zhang, Yingquan Wu, Jiangli Zhu, and Yu Zheng

    Abstract—Algebraic soft-decision decoding (ASD) of Reed-Solomon (RS) codes can achieve substantial coding gain with polynomial complexity. Particularly, the low-complexity Chase (LCC) ASD decoding has a better performance-complexity tradeoff. In the LCC decoding, 2^η test vectors need to be interpolated over, and a polynomial selection scheme needs to be employed to select one interpolation output to send to the rest of the decoding steps. The interpolation and polynomial selection can account for a significant part of the LCC decoder area, especially in the case of long RS codes and large η. In this paper, simplifications are first proposed for a low-complexity polynomial selection scheme. Then a novel interpolation scheme is developed by making use of the simplified polynomial selection. Instead of interpolating over each vector, our scheme first generates the information necessary for the polynomial selection. Then only the selected vectors are interpolated over. The proposed interpolation and polynomial selection schemes can lead to 162% higher efficiency in terms of throughput-over-area ratio for an example LCC decoder for a (458, 410) RS code over GF(2^9).

    Index Terms—Algebraic soft-decision decoding (ASD), interpolation, polynomial selection, Reed-Solomon (RS) codes, VLSI design.

    I. INTRODUCTION

    Reed-Solomon (RS) codes are widely adopted in digital communication and storage systems. Algebraic soft-decision decoding (ASD) algorithms were proposed to incorporate the channel information into an interpolation-based decoding process. They can achieve significant coding gain with polynomial complexity. Among existing ASD algorithms, the low-complexity Chase (LCC) [1] decoding, which tests 2^η vectors of points with multiplicity one, can achieve a better performance-complexity tradeoff.

    ASD algorithms share two major steps: the interpolation and factorization. Applying the re-encoding and coordinate transformation [2] to the LCC decoding, the number of points to be interpolated in each test vector can be reduced from n to n−k for an (n, k) RS code. In addition, as proved in [3], the factorization can be eliminated from the re-encoded LCC decoding. Interpolation architectures based on Kötter's forward interpolation [4] were developed in [5]. Nevertheless, interpolating over each of the 2^η test vectors from the beginning leads to high complexity. A backward interpolation scheme was proposed in [6] to delete points from a given interpolation result. Accordingly, the interpolation result of a test vector with only one point different from the current vector can be derived by one backward and one Kötter's forward interpolation iteration. Moreover, these two interpolations can be unified in one single iteration [7]. To further reduce the latency, the parallel interpolator in [8] can be used to generate multiple outputs at a time. Processing all interpolation outputs would lead to a large computational requirement. Hence, polynomial selection schemes were developed in

    Manuscript received December 07, 2010; revised April 14, 2011; accepted April 26, 2011. Date of publication June 07, 2011; date of current version June 01, 2012. This work was supported by the National Science Foundation under Grant 0846331 and Grant 0802159.

    X. Zhang, J. Zhu, and Y. Zheng are with Case Western Reserve University, Cleveland, OH 44106 USA (e-mail: [email protected]; [email protected]; [email protected]).

    Y. Wu is with Link_A_Media, Santa Clara, CA 95051 USA.

    Digital Object Identifier 10.1109/TVLSI.2011.2150254

    [1], [7] to pick only one interpolation result to send to the remaining decoding steps. Both of them are based on root search over finite fields.

    Despite all available techniques, the interpolation and root-search-based polynomial selection still account for a significant proportion of the LCC decoder area, especially when the finite field is large and/or parallel processing needs to be employed to reduce the latency. One such case is magnetic recording, which usually requires an RS codeword length of 4 Kbits or longer and large η.

    In [9], a low-complexity polynomial selection scheme was proposed for the re-encoded LCC decoder. By presetting one message symbol in the encoding process, the polynomial selection only needs to test whether the evaluation value of a polynomial constructed from the interpolation output over the preset point is zero. Although the encoder needs to be modified and one message symbol is sacrificed, this polynomial selection leads to great complexity reduction.

    In this paper, the polynomial selection in [9] is further simplified so that the zero testing can be done directly on the evaluation value of the interpolation output. Then a novel interpolation scheme is proposed by making use of the simplified polynomial selection. Our scheme first derives the evaluation value needed for the polynomial selection without going through the entire interpolation process for each test vector. Then only the selected vectors are interpolated over. Since a single evaluation value, instead of the whole interpolation output polynomial, is all that is needed to tell whether the corresponding vector should be selected, our proposed scheme requires significantly lower complexity. In addition, efficient architectures are developed to implement the proposed schemes. Based on synthesis results, the proposed interpolation and polynomial selection architecture requires 75% less area and achieves 37% higher throughput compared to the previous best design [8] for a (458, 410) LCC RS decoder over GF(2^9). From hardware complexity analysis, the proposed design leads to 162% higher efficiency in the overall decoder in terms of throughput-over-area ratio. Furthermore, as η increases, the saving that can be brought by the proposed design becomes more significant.

    II. LCC DECODING ALGORITHM

    This paper considers an (n, k) RS code over a finite field GF(q). ASD algorithms consist of multiplicity assignment, interpolation, and factorization steps, and are only different in the first step. Among available ASD algorithms, the LCC decoding can achieve a better performance-complexity tradeoff. The LCC multiplicity assignment first selects the η least reliable code positions. For each unreliable code position, two points, (α_i, y_i) and (α_i, ỹ_i), are assigned. Here α_i is the field element used for the evaluation map encoding, and y_i and ỹ_i are the hard-decision and the second most likely symbols for the ith position. For each of the rest of the positions, only (α_i, y_i) is assigned. All these points have multiplicity one. 2^η test vectors are formed by choosing one point from each position, and decoding needs to be done for each vector.
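    The formation of the 2^η test vectors described above can be sketched as follows; the toy η and all symbol values are invented for illustration only.

```python
# Hypothetical toy example: form the 2**ETA test vectors by choosing, for
# each unreliable position, either the hard-decision symbol or the second
# most likely symbol. The symbol values below are made up.
from itertools import product

ETA = 3
hard = [5, 1, 7]    # hypothetical hard-decision symbols y_i
second = [2, 4, 3]  # hypothetical second most likely symbols

vectors = [tuple(s if b else h for h, s, b in zip(hard, second, bits))
           for bits in product((0, 1), repeat=ETA)]

assert len(vectors) == 2 ** ETA       # 8 distinct test vectors
assert len(set(vectors)) == 2 ** ETA  # no two choices coincide
```

    Each bit pattern corresponds to one path in the test-vector tree discussed later in the paper.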

    The interpolation finds a polynomial, Q(x, y), with minimum weighted degree that passes each point with its associated multiplicity. To reduce the complexity of the interpolation, the re-encoding and coordinate transformation [4] have to be employed. Fig. 1 shows the re-encoded LCC decoder. Denote the received word by r, and the set of the k most reliable code positions by R̄. The re-encoding is to generate a codeword c, such that r_i = c_i for i ∈ R̄. Then the errors can be found by decoding r − c. Since r_i − c_i = 0 for i ∈ R̄, the interpolation over the corresponding points can be eliminated by applying a coordinate transformation.

    1063-8210/$26.00 © 2011 IEEE


    Fig. 1. Block diagram of the re-encoded LCC decoder.

    Fig. 2. FERs of the LCC decoding for a (458, 410) RS code over an EPR4-equalized magnetic recording channel with 100% AWGN.

    Accordingly, only n−k points need to be interpolated over in each test vector. In the LCC decoding, each interpolation output is in the format Q(x, y) = q_0(x) + q_1(x)y. It was proved in [3] that q_1(x) and q_0(x) can be used as the locator and evaluator, respectively, to correct the errors in r − c. Hence, the factorization step can be removed. Moreover, the entire codeword can be efficiently recovered using the full Chien-search-based (FCSB) scheme in [10].

    The interpolation over each test vector can be carried out by Kötter's forward interpolation [4], which is listed in Algorithm A for the case of the LCC decoding. Passing all 2^η interpolation outputs to the following decoder blocks would cause very high computational complexity. Hence, polynomial selection schemes [1], [7] were developed to pick only one Q(x, y). Both of them are based on root search over finite fields, and are very hardware-demanding. A novel polynomial selection was proposed for the re-encoded LCC decoder in [9]. By presetting a message symbol in the encoder, the polynomial selection can be carried out by testing whether the evaluation value of a polynomial constructed from Q(x, y) over the preset point is zero. If it is zero, the corresponding Q(x, y) is selected. Both the constructed polynomial and its evaluation value can be computed with simple hardware units. Although the encoder needs to be modified and one message symbol is sacrificed, the proposed polynomial selection leads to great complexity reduction.

    Algorithm A: Kötter's Interpolation Algorithm

    Initialize: q^{(0)}(x, y) = 1, q^{(1)}(x, y) = y

    Start: for each interpolation point (x_j, y_j):

    A1: compute the discrepancy coefficients Δ_l = q^{(l)}(x_j, y_j) for l = 0, 1

    A2: among the candidates with Δ_l ≠ 0, let l* be the one whose q^{(l)}(x, y) has the minimum weighted degree

    A3: q^{(l)}(x, y) ← Δ_{l*} q^{(l)}(x, y) − Δ_l q^{(l*)}(x, y) for l ≠ l*

    A4: q^{(l*)}(x, y) ← (x − x_j) q^{(l*)}(x, y)

    Output: the candidate q^{(l)}(x, y) with the minimum weighted degree
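    Steps A1-A4 above can be sketched in software as follows. This is a minimal sketch only: the small field GF(2^4) with primitive polynomial x^4 + x + 1, a y-weight of one, and the interpolation points are all illustrative assumptions, not the parameters used in the paper.

```python
# Minimal sketch of Koetter's forward interpolation (Steps A1-A4) for
# multiplicity-one points, with candidates Q(x, y) = q0(x) + q1(x)*y.
PRIM = 0b10011  # x^4 + x + 1, defines GF(2^4)

def gf_mul(a, b):
    """Multiply two GF(2^4) elements (carryless multiply with reduction)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x10:
            a ^= PRIM
    return r

def poly_eval(p, x):
    """Horner evaluation of a univariate polynomial (p[i] = coeff of x^i)."""
    r = 0
    for c in reversed(p):
        r = gf_mul(r, x) ^ c
    return r

def poly_scale(p, s):
    return [gf_mul(c, s) for c in p]

def poly_add(p, q):  # addition is XOR in characteristic 2
    n = max(len(p), len(q))
    p = p + [0] * (n - len(p)); q = q + [0] * (n - len(q))
    return [a ^ b for a, b in zip(p, q)]

def mul_by_root(p, xj):
    """Multiply p(x) by (x - xj), which equals (x + xj) over GF(2^m)."""
    return poly_add([0] + p, poly_scale(p, xj))

def koetter_interpolate(points):
    # candidates held as (q0, q1, weighted_degree)
    cands = [([1], [0], 0), ([0], [1], 1)]  # Q = 1 and Q = y
    for xj, yj in points:
        # A1: discrepancy coefficients
        d = [poly_eval(q0, xj) ^ gf_mul(poly_eval(q1, xj), yj)
             for q0, q1, _ in cands]
        nz = [l for l in range(len(cands)) if d[l]]
        if not nz:
            continue
        # A2: nonzero-discrepancy candidate of minimum weighted degree
        ls = min(nz, key=lambda l: cands[l][2])
        q0s, q1s, _ = cands[ls]
        new = []
        for l, (q0, q1, wd) in enumerate(cands):
            if l == ls:
                # A4: multiply the chosen candidate by (x - xj)
                new.append((mul_by_root(q0, xj), mul_by_root(q1, xj), wd + 1))
            else:
                # A3: cancel the discrepancy using the chosen candidate
                new.append((poly_add(poly_scale(q0, d[ls]), poly_scale(q0s, d[l])),
                            poly_add(poly_scale(q1, d[ls]), poly_scale(q1s, d[l])),
                            wd))
        cands = new
    # Output: candidate with minimum weighted degree
    q0, q1, _ = min(cands, key=lambda c: c[2])
    return q0, q1

points = [(2, 7), (3, 1), (5, 9)]  # made-up interpolation points
q0, q1 = koetter_interpolate(points)
for xj, yj in points:  # Q vanishes at every interpolated point
    assert poly_eval(q0, xj) ^ gf_mul(poly_eval(q1, xj), yj) == 0
```

    The update in A3 works because Δ_{l*}Q_l − Δ_l Q_{l*} has a zero discrepancy at the new point while preserving the zeros at all previously interpolated points.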

    Fig. 2 shows the frame error rates (FERs) of the LCC decoding for a (458, 410) RS code over GF(2^9) when every vector is tested, and when the polynomial selection in [9] is adopted to pick the first two qualified Q(x, y) in the case that the vectors are tested in order of decreasing reliability. These simulations are carried out over an EPR4-equalized magnetic recording channel with 100% additive white Gaussian noise (AWGN), and the modification to the LCC multiplicity assignment [11] is employed to improve the performance. This modification adds common erasures to all test vectors and does not affect the rest of the decoding steps. Note that the code rate actually decreases to 409/458 when the polynomial selection in [9] is used. This loss has been taken into account in the SNR for fair comparison. As shown in Fig. 2, using the polynomial selection in [9] only leads to negligible performance degradation. For reference, the curves of the hard-decision decoding (HDD) and the Kötter-Vardy (KV) ASD decoding [12] with maximum multiplicity are also included in Fig. 2.

    III. REDUCED-COMPLEXITY POLYNOMIAL SELECTION AND INTERPOLATION

    Although efficient architectures [6]-[8] have been developed based on backward-forward LCC interpolation, the interpolation still occupies a significant proportion of the LCC decoder area, especially when the finite field is large and/or parallel processing needs to be employed to reduce the latency. In this section, after further simplifying the polynomial selection in [9], a novel scheme is developed to significantly reduce the interpolation complexity.

    As mentioned previously, the evaluation value of a polynomial constructed from the interpolation output is tested for polynomial selection in [9]. It can be derived that this value differs from the evaluation value of the interpolation output Q(x, y) itself over the preset point only by a nonzero factor. Hence, whether the constructed polynomial evaluates to zero can be told by testing the evaluation value of Q(x, y) instead. Accordingly, the polynomial selection can be done by picking the Q(x, y) whose evaluation value over the preset point is zero. This simplification does not affect the error-correcting performance, and can save a finite field inversion. More importantly, the evaluation value of Q(x, y) can be derived by tracing the polynomial updating during the interpolation without knowing the two separate evaluation values of q_0(x) and q_1(x) as needed in the original computation. This property greatly facilitates the interpolation scheme proposed next.

    Algorithm B: Proposed Interpolation Scheme

    Initialize: set the initial candidate polynomials as in Algorithm A; initialize their evaluation values over the preset point and over the points of the unreliable positions

    Start:

    B1: Interpolate over the points with code positions in S_1; update the initial evaluation values to follow the polynomial updating

    B2: Update the evaluation values to derive the test value for each vector; select the first two test vectors whose test value is zero

    B3: Interpolate over the rest of the points for each selected vector

    The simplified polynomial selection only needs to test a single evaluation value. Hence, if this value can be derived without first building Q(x, y), the polynomial selection can be applied to first pick the test vectors, and then the interpolation only needs to construct Q(x, y) for the selected vectors. Compared to interpolating over 2^η test vectors, this approach has the potential to substantially lower the complexity of the interpolation step in the LCC decoding.

    Algorithm B lists the proposed interpolation scheme for the re-encoded LCC decoder, assuming that the vectors are tested in the order of decreasing reliability. The n−k code positions outside R̄ are further divided into two sets: S_2 for the η most unreliable code positions and S_1 for the rest of the code positions. Without loss of generality, assume the points in S_1 are interpolated first. To better explain the proposed scheme, a subscript j is added when necessary to denote the updated polynomials derived in the jth interpolation iteration of Algorithm A. Accordingly, the polynomials after iteration j pass all points that have been interpolated in iterations 0 through j, and the evaluation value of the final output over the preset point is the value that needs to be tested for polynomial selection. Instead of first building Q(x, y) and then carrying out the evaluation, this value can be derived through updating the initial values by following the polynomial updating that should have been done for each interpolation


    Fig. 3. Test vector tree.

    Fig. 4. Evaluation values that need to be stored.

    iteration. The initial values can be derived easily as shown in Algorithm B. Since the points in S_1 are common to all test vectors, the interpolation over them is first done in Step B1 of Algorithm B using the same process as listed in Algorithm A. At the same time, the evaluation values over the preset point are updated according to Steps A3-A4 as

    e^{(l)} ← Δ_{l*} e^{(l)} − Δ_l e^{(l*)} for l ≠ l*,   e^{(l*)} ← (x_p − x_j) e^{(l*)}   (1)

    where e^{(l)} denotes the evaluation value of candidate q^{(l)}(x, y) over a point (x_p, y_p), using the discrepancy coefficients Δ_l computed from the interpolation process. At the end of Step B1, the polynomials that pass all points in S_1, and their evaluation values, are derived.

    In Step B2, the test value is computed for each test vector without carrying out any interpolation. It can be derived through iteratively updating the evaluation values using (1) if the discrepancy coefficients over the points of S_2 are available. Considering this, the evaluation values over (α_i, y_i) and (α_i, ỹ_i) for each position i ∈ S_2 are also initialized and updated in the same way as the evaluation value over the preset point during Step B1. Hence, they are also available at the end of Step B1. Since the evaluation values over the first point of S_2 are actually the discrepancy coefficients for interpolating that point, they can be used in an updating equation similar to (1) to derive the evaluation values over the remaining points. Likewise, the updated evaluation values over the second point of S_2 equal the corresponding discrepancy coefficients, and are used to update the other evaluation values again. This process can be applied iteratively to derive all the evaluation values, and accordingly the test value for each vector. After the first two test vectors are selected based on their test values, the interpolation is continued from the results of Step B1 to cover the points in S_2 for each selected vector in Step B3.
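    The lockstep updating of (1) — carrying the evaluation value at a fixed point through the interpolation updates instead of evaluating the finished polynomial — can be sketched as follows. A small prime field GF(13), the interpolation points, and the preset point are illustrative assumptions only.

```python
# Sketch of the idea behind (1): track the evaluation value e of each
# candidate Q(x, y) = q0(x) + q1(x)*y at a fixed "preset" point alongside
# the Koetter-style polynomial updates.
P = 13  # toy prime field

def ev(poly, x):  # Horner evaluation of a univariate polynomial mod P
    r = 0
    for c in reversed(poly):
        r = (r * x + c) % P
    return r

def comb(a, b, s, t):  # coefficientwise s*a + t*b mod P
    n = max(len(a), len(b))
    a = a + [0] * (n - len(a)); b = b + [0] * (n - len(b))
    return [(s * x + t * y) % P for x, y in zip(a, b)]

def mul_by_root(a, xj):  # multiply by (x - xj)
    return comb([0] + a, a, 1, -xj)

points = [(1, 5), (2, 3), (4, 7)]  # made-up interpolation points
xp, yp = 6, 2                      # hypothetical preset point

# Candidates with tracked value e = Q(xp, yp).
cands = [{"q0": [1], "q1": [0], "wd": 0, "e": 1},   # Q = 1
         {"q0": [0], "q1": [1], "wd": 1, "e": yp}]  # Q = y
for xj, yj in points:
    d = [(ev(c["q0"], xj) + ev(c["q1"], xj) * yj) % P for c in cands]
    nz = [l for l in range(len(cands)) if d[l]]
    if not nz:
        continue
    ls = min(nz, key=lambda l: cands[l]["wd"])
    star = {k: (list(v) if isinstance(v, list) else v)
            for k, v in cands[ls].items()}  # snapshot of the chosen candidate
    for l, c in enumerate(cands):
        if l == ls:
            # A4, and its evaluation-value counterpart in (1)
            c["q0"] = mul_by_root(c["q0"], xj)
            c["q1"] = mul_by_root(c["q1"], xj)
            c["wd"] += 1
            c["e"] = (xp - xj) * c["e"] % P
        else:
            # A3, and its evaluation-value counterpart in (1)
            c["q0"] = comb(c["q0"], star["q0"], d[ls], -d[l])
            c["q1"] = comb(c["q1"], star["q1"], d[ls], -d[l])
            c["e"] = (d[ls] * c["e"] - d[l] * star["e"]) % P

# The tracked value matches evaluating the final polynomials explicitly.
for c in cands:
    assert c["e"] == (ev(c["q0"], xp) + ev(c["q1"], xp) * yp) % P
```

    Because the evaluation map is linear, applying (1) to the stored scalars reproduces exactly what evaluating the updated polynomials would give, which is why a single tracked value per candidate suffices for the selection test.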

    For a code position i ∈ S_2, each test vector can take (α_i, y_i) or (α_i, ỹ_i). Hence, the test vectors can be mapped to a binary tree as shown in Fig. 3 [6]. In this figure, each path from the root to a leaf represents a test vector. In addition, 0 and 1 denote that (α_i, y_i) and (α_i, ỹ_i), respectively, is included in the vector. An edge starting from a node in level l represents using the evaluation values over the corresponding point of the lth unreliable position as the discrepancy coefficients to update other evaluation values. If the evaluation values of a node are stored, they can be shared in the evaluation value updating (EVU) corresponding to the edges going to its children nodes. Since the evaluation values on a point are no longer needed after they are used to update other evaluation values, fewer evaluation values need to be stored for a node in a deeper level, as shown in Fig. 4. However, there are 2^l nodes in level l, and hence using a breadth-first scheme to traverse the tree would lead to a large memory requirement. Instead, our design

    Fig. 5. Interpolator architecture. (a) PE unit. (b) PU unit.

    Fig. 6. Architecture of the EVU unit.

    adopts a depth-first scheme. When a node is reached, the updated values can replace the previously stored values of the node in the same level. Accordingly, evaluation values only need to be remembered for η nodes, one for each level.
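    The storage saving of the depth-first traversal can be sketched as follows; the EVU work itself is abstracted to a counter, and the tree depth is an arbitrary illustrative value.

```python
# Sketch of the depth-first tree traversal: one storage slot per level
# (ETA slots total) suffices, because a sibling overwrites the slot of a
# node that has already been fully explored.
ETA = 4
edges = 0
leaves = []
stored = [None] * ETA  # one evaluation-value set remembered per level

def dfs(level, vec):
    global edges
    if level == ETA:
        leaves.append(tuple(vec))
        return
    for branch in (0, 1):  # 0: (alpha_i, y_i), 1: (alpha_i, y~_i)
        edges += 1         # one EVU pass per tree edge
        stored[level] = branch
        dfs(level + 1, vec + [branch])

dfs(0, [])
assert len(set(leaves)) == 2 ** ETA   # every test vector visited once
assert edges == 2 ** (ETA + 1) - 2    # each edge passed exactly once
```

    A breadth-first traversal would instead have to hold up to 2^l node states alive in level l, which is the memory blow-up the text refers to.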

    The proposed interpolation scheme requires two interpolators and one EVU unit. The interpolation over the common points in Step B1 is carried out by Interpolator 1. In Step B3, each of the two selected test vectors has η points in S_2 to be interpolated over. To reduce the latency, Interpolators 1 and 2 are employed to interpolate over the two test vectors in parallel. Moreover, the EVU unit is used to update the evaluation values in Steps B1 and B2.

    Interpolator 1 can be implemented by using the architecture in Fig. 5. It mainly consists of the polynomial evaluation (PE) unit for Step A1 and the polynomial updating (PU) unit for Steps A3-A4. In the PE unit, Horner's rule is applied and the coefficients of each polynomial are input serially with the most significant one first. Two copies of the PE unit are employed to compute the two discrepancy coefficients simultaneously. The PU architecture shown in Fig. 5(b) takes care of the PU for one pair of coefficients, and two copies of the PU unit are used to update all polynomial coefficients with the same x-degree simultaneously. Since switching the two candidate polynomials in the memory does not affect the interpolation output, the updated coefficients are written back to fixed memory blocks to save multiplexors. Moreover, once a coefficient is updated, it is sent to the PE unit to calculate the evaluation value for the next iteration. Interpolator 2 is only used to interpolate over the last η points of the second selected vector. During Step B2, the evaluation values required in this interpolation process have already been computed and stored. Hence, they can be passed to Interpolator 2. Accordingly, Interpolator 2 only needs the PU units, and its area requirement is much smaller than that of Interpolator 1.

    Fig. 6 shows the EVU architecture, which carries out the computations in (1). This architecture is very similar to the PU unit in Fig. 5(b). However, the data are routed to the memory blocks differently. At the beginning of Step B1, the initial evaluation values on the points are loaded into RAM 2 in Fig. 6. During Step B1, the evaluation values are updated in the EVU unit by using the discrepancy coefficients from Interpolator 1. All the updated values are written back to RAM 2, except that in the last iteration, the evaluation values on the two points of the first unreliable position are stored into RAM 1. This is because these values will be used as the discrepancy coefficients during the first EVU iteration in Step B2. Similarly, during each iteration of Step B2, the updated evaluation values of the first two points are written into RAM 1 to be used as the discrepancy coefficients in the next iteration, while the rest are written to RAM 2. Hence, all the updated values in the solid circle of Fig. 4 are stored in RAM 1, and those in the dotted circle are stored in RAM 2.


    IV. HARDWARE COMPLEXITY ANALYSES AND COMPARISONS

    This section analyzes the hardware complexity of our polynomial selection and interpolation scheme and compares it with prior efforts for a (458, 410) RS code over GF(2^9). Then the complexity reduction of the LCC decoder achieved by using our design is investigated.

    A. Hardware Complexity Analyses and Comparisons of the Polynomial Selection and Interpolation

    Since Steps B1-B3 are carried out serially, the latency of the proposed interpolation architecture is the sum of those of the three steps. The interpolation in Step B1 over the common points in S_1 takes n−k−η iterations. In addition, the PU can be carried out concurrently with the PE of the next interpolation iteration. Hence, the number of clock cycles required by the interpolation in Step B1 is determined by the maximum x-degree of the polynomials in each iteration plus the interpolation pipelining latency. In the worst case, the maximum x-degree increases by one in every iteration, which sets the interpolation latency of Step B1.

    In Step B1, the EVU unit is also activated, and each updating iteration takes a fixed number of clock cycles if one EVU unit is employed. Since the maximum x-degree starts from one and increases at most by one in each iteration, the interpolation runs faster than the EVU in the first several iterations. To avoid delaying the interpolation, the discrepancy coefficients computed from the interpolator are buffered in RAM 1 of Fig. 6. If the EVU catches up later, it needs to wait until the corresponding coefficients are computed. Moreover, Step B2 can start right after the last EVU iteration is completed. Accordingly, the number of clock cycles required for Step B1 is decided by the slower of the interpolation and the EVU.

    During Step B2, each edge of the binary tree in Fig. 3 will be passed once in the worst case using our depth-first scheme. To reduce the latency, the binary tree can be divided into sub-trees and one EVU unit can be employed for each sub-tree to carry out the EVU in parallel. To balance the load on each EVU unit, the tree should be divided as symmetrically as possible. This can be done by splitting from the top node that has two children each time. The latency of Step B2 is decided by that of the tallest sub-tree. Assume that the entire binary tree is divided into 2^w sub-trees. Then the tallest sub-tree has exactly one edge between the nodes in levels l and l+1 for each l < w, and has a full binary tree starting from the node in level w. Since fewer evaluation values need to be updated for an edge ending at a deeper-level node, the latency of Step B2 can be derived by summing the updating costs over the edges of this tallest sub-tree. The last clock cycle is spent on testing the evaluation value for polynomial selection. When η is large, the latency of Step B2 is mainly decided by the term contributed by the full sub-tree. Accordingly, increasing w by one can reduce this latency by almost a half. However, it will also double the number of EVU engines. Hence, the speed-area tradeoff needs to be considered when choosing w.

    In Step B3, the last η points in the two selected vectors are interpolated simultaneously by the two interpolators, and this step consists of η interpolation iterations. Similarly, the maximum x-degree is considered in the worst case. However, in the last iteration, only the polynomial of lower weighted degree will be sent to the output, and thus only this polynomial needs to be updated. Since decoding a test vector can correct at most ⌊(n−k)/2⌋ errors, the x-degree of this output polynomial is bounded accordingly, and so is the number of clock cycles required in Step B3.

    Table I lists the hardware complexity of the proposed interpolation scheme for the LCC decoder of a (458, 410) RS code over GF(2^9). Since the proposed polynomial selection only needs to test whether a single evaluation value is zero, the corresponding hardware complexity is negligible and is not included in this table. After exploring different values, w is set to one, i.e., two EVU units are employed, to increase hardware efficiency. The resulting number of clock cycles taken by the entire interpolation process is listed in Table I. Among previous

    TABLE I
    HARDWARE REQUIREMENT OF INTERPOLATION AND POLYNOMIAL SELECTION FOR A (458, 410) RS CODE OVER GF(2^9)

    TABLE II
    SYNTHESIS RESULTS OF INTERPOLATION AND POLYNOMIAL SELECTION FOR A (458, 410) RS CODE OVER GF(2^9)

    interpolator designs, the one in [8] is the most efficient for large η. When 4-parallel processing is employed, it can finish the interpolation in 2828 clock cycles. To test four Q(x, y) simultaneously, four polynomial selection engines are needed. Previously, a highly parallel Chien search needed to be employed for each engine in order to match the interpolation speed. Both the proposed design and that in [8] have the same critical path. Hence the proposed interpolation and polynomial selection can achieve 37% higher throughput. To further evaluate our design, it is modeled using Verilog-HDL and synthesized using Synopsys Design Compiler with SMIC 0.18-μm CMOS technology at 1.8 V power supply and 150 MHz clock frequency. Moreover, Synopsys Power Compiler is used to estimate the power consumption, and the results are listed in Table II. In terms of throughput-over-area ratio, the proposed interpolator is 69% more efficient than the interpolator in [8]. When the polynomial selection is considered, the proposed design is 457% more efficient. Our memory compiler only generates memory with power-of-two depths, and each memory cell has eight transistors. Hence, much area is wasted on unused memory portions when the proposed interpolator is synthesized. If a more optimized memory compiler were available, the proposed interpolator would occupy even less area. Although the proposed design needs more memory than that in [8], the number of memory accesses is smaller, and the logic gate requirement is much lower. As a result, the proposed interpolator has much lower power consumption.

The proposed polynomial selection is achieved through testing a single evaluation value for each test vector. Apparently, its complexity is negligible compared to that of the parallel exhaustive root search in prior polynomial selection schemes. The complexity of the proposed interpolation is also significantly lower than those of the architectures employing backward-forward interpolation [6]-[8]. In previous schemes, the interpolation result of the first test vector is derived by forward interpolation. Then the result for each of the following vectors can be derived by one iteration of unified backward-forward interpolation. Since only one interpolation result needs to be stored, previous schemes have a small memory requirement. Nevertheless, they need many more logic gates because parallel processing has to be adopted. The latency of interpolating the first test vector in prior designs is about the same as the sum of the latencies of Steps B1 and B3 in our scheme. In the worst case, Step B2 needs twice as many EVU iterations as the unified backward-forward interpolation needs. However, the number of values that need to be updated, and hence the number of clock cycles required in each iteration, is much smaller. The number of evaluation values updated by an EVU unit in each iteration reduces level by level over the tree, down to one in the last level. On the other hand, the number of polynomial coefficients that need to be updated in each backward-forward iteration remains at about n - k.


TABLE III
HARDWARE REQUIREMENT OF LCC DECODERS WITH η = 8 FOR A (458, 410) RS CODE OVER GF(2^10)

TABLE IV
COMPARISONS OF LCC DECODERS FOR A (458, 410) RS CODE

Such a per-iteration update count is usually much smaller than n - k. Hence the proposed interpolation scheme has a much shorter latency. As can be seen from Table I, our design employing a 2-parallel EVU can achieve even higher speed than the 4-parallel unified backward-forward interpolator in [8]. Another advantage of our design is that parallel processing is less costly. Since the EVU engine only takes a small part of the interpolation area, duplicating this unit leads to a lower area overhead.
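As a rough illustration of this latency argument, the sketch below compares total update counts under our own simplifying assumptions: a binary test-vector tree with one EVU iteration per node, 2^(η-i) evaluation values updated at tree level i, and about n - k coefficient updates in each of the 2^η - 1 unified backward-forward iterations. It mirrors the scaling only, not the exact cycle counts of either architecture.

```python
# Toy cost model (our assumptions, not the papers' exact architectures):
# it only illustrates why the per-iteration work of the EVU-based scheme
# shrinks down the tree while backward-forward work stays near n - k.
def evu_updates(eta: int) -> int:
    """Total evaluation-value updates over an eta-level binary tree."""
    # level i has 2^(i-1) nodes, each assumed to update 2^(eta-i) values
    return sum(2 ** (i - 1) * 2 ** (eta - i) for i in range(1, eta + 1))

def bf_updates(eta: int, n_minus_k: int) -> int:
    """Coefficient updates for (2^eta - 1) backward-forward iterations."""
    return (2 ** eta - 1) * n_minus_k

eta, n_minus_k = 8, 48   # example parameters for a (458, 410) RS code
print(evu_updates(eta), bf_updates(eta, n_minus_k))
```

Under these assumptions the EVU-based scheme performs about an order of magnitude fewer updates, which is consistent with the latency advantage claimed above.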

B. Complexity Reduction in the Overall LCC Decoder

The LCC decoder can be pipelined to achieve higher throughput.

    To increase the hardware utilization efficiency, each pipelining stageshould be adjusted to have about the same latency. Based on this idea,pipelining can be applied according to the cutsets in Fig. 1.

The most efficient architectures for the re-encoder and codeword recovery can be found in [13] and [10], respectively. Their complexities are listed in Table III. The proposed interpolator has a shorter latency. Hence, when it is adopted in the LCC decoder, besides including extra units to compute the evaluation values needed by the polynomial selection, higher-level parallel processing needs to be used in the re-encoder. Moreover, our proposed scheme selects two interpolation outputs. Since the latency of the FCSB codeword recovery in [10] is less than half of the interpolation latency, running the codeword recovery twice does not affect the decoder throughput. Also, the number of roots of the selected polynomial can be counted during the FCSB codeword recovery. If it matches the polynomial degree, the corresponding output is sent as the final decoding output. Using the equivalent gate estimation explained in [9], it can be calculated from Table III that the LCC decoder with η = 8 for the (458, 410) RS code employing the proposed interpolation and polynomial selection requires 49261 XOR gates, and can decode a received word in 2067 clock cycles. Comparatively, the decoder employing the most efficient previous designs needs 102547 XOR gates and 2828 clock cycles to decode each word.

To apply the proposed polynomial selection, the systematic encoder needs to be modified [9]. The extra complexity needed for this modification is equivalent to 4316 XOR gates. For the purpose of fair comparison, this extra area requirement is added to the hardware complexity of the proposed design in Table IV. As shown in this table, for the decoder with η = 8, employing our proposed scheme can reduce the area requirement by 48% and increase the throughput by 37%. Hence, our design can lead to 162% higher efficiency in terms of speed-over-area ratio. The hardware complexities of the LCC decoders with other values of η (up to 10) are also listed in Table IV. From Table IV, the proposed scheme can increase the decoder efficiency by 53% for the smaller η, and by 491% when η = 10.
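These percentages can be reproduced directly from the gate and cycle counts given above: 102547 XOR gates and 2828 cycles for the best previous design, versus 49261 XOR gates plus the 4316-gate encoder modification and 2067 cycles for the proposed design.

```python
# Reproducing the comparison figures from the numbers quoted in the text.
gates_prev, cycles_prev = 102547, 2828        # best previous design
gates_prop, cycles_prop = 49261 + 4316, 2067  # proposed + encoder change

throughput_gain = cycles_prev / cycles_prop - 1   # ~37% higher throughput
area_reduction = 1 - gates_prop / gates_prev      # ~48% smaller area
eff_prev = 1 / (gates_prev * cycles_prev)         # speed-over-area ratio
eff_prop = 1 / (gates_prop * cycles_prop)
efficiency_gain = eff_prop / eff_prev - 1         # ~162% higher efficiency

print(f"{throughput_gain:.0%} {area_reduction:.0%} {efficiency_gain:.0%}")
```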

The error-correcting performance of the LCC decoder can be improved by using a larger η. As η increases, the efficiency improvement achieved by our proposed scheme becomes more significant. The interpolation latency is dominated by a term with 2^η. However, this term is multiplied by a larger factor in backward-forward-based interpolation. Hence the interpolation latency grows faster with η in previous schemes. Alternatively, higher-level parallel processing can be adopted to reduce the interpolation latency when η is larger. As aforementioned, the overhead for parallel processing in our proposed interpolation scheme is much lower. In the case that more interpolation results are generated at a time, more copies of the expensive parallel Chien search need to be employed for the root-search-based polynomial selection. This would increase the overall decoder area significantly. On the other hand, our polynomial selection only needs to test a single evaluation value for each test vector. Its complexity is negligible in the LCC decoder.

V. CONCLUSION

In this paper, a low-complexity polynomial selection scheme for the re-encoded LCC decoder was further optimized. Based on the optimized polynomial selection, a novel interpolation scheme was developed. Different from conventional designs, the test vectors are first selected, and then the interpolation is carried out only on the chosen vectors. In addition, efficient interpolation architectures were developed. Compared to previous efforts, our proposed scheme leads to a significant speedup and area reduction. Moreover, the saving brought by our design further increases with η. Future work will be directed to further improving the efficiency of the LCC decoder.

REFERENCES

[1] J. Bellorado and A. Kavcic, "Low-complexity soft-decoding algorithms for Reed-Solomon codes, Part I: An algebraic soft-in hard-out Chase decoder," IEEE Trans. Inf. Theory, vol. 56, no. 3, pp. 945-959, Mar. 2010.
[2] W. J. Gross, F. R. Kschischang, R. Koetter, and P. Gulak, "A VLSI architecture for interpolation in soft-decision decoding of Reed-Solomon codes," in Proc. SiPS, 2002, pp. 39-44.
[3] J. Zhu and X. Zhang, "Factorization-free low-complexity Chase soft-decision decoding of Reed-Solomon codes," presented at the ISCAS, Taipei, Taiwan, 2009.
[4] R. Koetter, "On algebraic decoding of algebraic-geometric and cyclic codes," Ph.D. dissertation, Dept. Elect. Eng., Linköping Univ., Linköping, Sweden, 1996.
[5] Z. Wang and J. Ma, "High-speed interpolation architecture for soft-decision decoding of Reed-Solomon codes," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 14, no. 9, pp. 937-950, Sep. 2006.
[6] J. Zhu, X. Zhang, and Z. Wang, "Backward interpolation architecture for algebraic soft-decision Reed-Solomon decoding," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 17, no. 11, pp. 1602-1615, Nov. 2009.
[7] X. Zhang and J. Zhu, "Algebraic soft-decision decoder architectures for long Reed-Solomon codes," IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 57, no. 10, pp. 787-792, Oct. 2010.
[8] X. Zhang and J. Zhu, "Reduced-complexity multi-interpolator algebraic soft-decision Reed-Solomon decoder," in Proc. SiPS, 2010, pp. 402-407.
[9] X. Zhang, Y. Wu, and J. Zhu, "A novel polynomial selection scheme for low-complexity Chase algebraic soft-decision Reed-Solomon decoding," presented at the ISCAS, Rio de Janeiro, Brazil, 2011.
[10] X. Zhang and Y. Zheng, "Efficient codeword recovery architecture for low-complexity Chase Reed-Solomon decoding," presented at the ITA Workshop, San Diego, CA, 2011.
[11] X. Zhang, J. Zhu, and W. Zhang, "Modified low-complexity Chase soft-decision decoder of Reed-Solomon codes," Springer J. Signal Process. Syst., to be published.
[12] R. Koetter and A. Vardy, "Algebraic soft-decision decoding of Reed-Solomon codes," IEEE Trans. Inf. Theory, vol. 49, no. 11, pp. 2809-2825, Nov. 2003.
[13] J. Zhu and X. Zhang, "High-speed re-encoder design for algebraic soft-decision Reed-Solomon decoding," presented at the ISCAS, Paris, France, May 2010.