
    CHAPTER 3

    PARTIALLY PARALLEL LDPC DECODER ARCHITECTURE WITH

    SCALABLE THROUGHPUT

    3.1 INTRODUCTION

A high-throughput, parallelism-enhanced quasi-cyclic LDPC decoder architecture is presented in this chapter. The decoder employs a partially parallel decoding scheme and supports the fixed code rate of 1/2 with 19 different code lengths of the IEEE 802.16e standard irregular LDPC codes. The principal constraints in implementing LDPC decoder architectures are the interconnect and the storage requirement. Fully parallel architectures achieve high throughput but suffer from routing congestion for longer codes, whereas partially parallel architectures reduce the routing complexity at the cost of throughput. In the proposed architecture the parallelism is enhanced through a parallel factor (PF), which is taken as 1, 2 and 4, so that the throughput scales with the chosen parallel factor. The throughput of the decoder increases with enhanced parallelism at the cost of additional hardware. The performance and hardware utilization are compared with similar architectures.

    3.2 ARCHITECTURE OF THE ENHANCED PARALLELISM DECODER

The overall architecture is explained in this section. First, the flow of the decoding process, i.e. the mapping of the decoding algorithm onto the hardware, is described. The decoding procedure involves two phases of operation: 1. check node computation in the first


phase and 2. variable node update in the second phase. The two phases together constitute one iteration. After the defined maximum number of iterations has been completed, the decoded code bits are obtained from the check sums. The processing flow is illustrated by the flow diagram shown in Figure 3.1.

Figure 3.1 Flow chart illustrating the two-phase decoding process

[Flow chart: INITIALIZE → read CRAM → check node process → write VRAM → read VRAM → variable node process → write CRAM → iteration-count check → END.]
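For reference, the schedule of Figure 3.1 can be written out in software. The following Python fragment is only a behavioural sketch of flooding (two-phase) min-sum decoding, the algorithm class used by the proposed decoder; the dense-matrix representation and variable names are illustrative and do not correspond to the RTL or to the CRAM/VRAM organization.

import numpy as np

def minsum_decode(H, llr, max_iter=10):
    # H: m x n binary parity-check matrix; llr: n intrinsic channel LLRs
    m, n = H.shape
    v2c = H * llr                          # variable-to-check messages, initialised with the intrinsic LLRs
    c2v = np.zeros((m, n))
    for _ in range(max_iter):
        # Phase 1: check node update (read CRAM -> CNP -> write VRAM)
        for i in range(m):
            cols = np.flatnonzero(H[i])
            msgs = v2c[i, cols]
            signs = np.where(msgs < 0, -1.0, 1.0)
            mags = np.abs(msgs)
            order = np.argsort(mags)       # order[0] = index of the minimum magnitude
            mn, sub = mags[order[0]], mags[order[1]]
            for k, j in enumerate(cols):
                mag = sub if k == order[0] else mn
                c2v[i, j] = np.prod(signs) * signs[k] * mag
        # Phase 2: variable node update (INDEXROM -> read VRAM -> VNP -> write CRAM)
        total = llr + c2v.sum(axis=0)      # intrinsic value plus all check messages per column
        for j in range(n):
            rows = np.flatnonzero(H[:, j])
            v2c[rows, j] = total[j] - c2v[rows, j]
    return (total < 0).astype(int)         # hard decision (assuming positive LLR means bit 0)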


The two phases of operation are carried out by the check node processors (CNPs) and the variable node processors (VNPs) respectively. The numbers of check node and variable node processors are decided by the parallel factor and the size of the base matrix. The base matrix size of the IEEE 802.16e standard for code rate 1/2 is 12 × 24. The number of check node processors is 12 × PF and the number of variable node processors is 24 × PF. A CNP computes z/PF rows sequentially; hence, 12 × PF CNPs compute 12z rows in z/PF clock cycles. Similarly, a VNP updates z/PF columns sequentially, and hence 24 × PF VNPs update 24z columns in z/PF cycles. The total number of clock cycles required to complete these two phases is 2(z/PF). The decoding throughput is given by Equation (3.1):

Throughput = (N × R × f) / ((N_CT + l) × N_it)                (3.1)

where N is the code length,
R is the code rate,
f is the synthesized clock frequency,
N_CT is the total number of clock cycles, given by 2(z/PF),
l is the latency due to pipelining, and
N_it is the number of iterations, which is set to 10 based on the MATLAB simulation results.
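As a quick illustration, Equation (3.1) can be evaluated directly. The small helper below is only a sketch; in the example call, the 54-cycle figure and the 226 MHz clock are the values reported later for the 576-bit code with pipelining, with the pipeline latency folded into the cycle count (so l is passed as 0).

def throughput_mbps(n, r, f_hz, n_ct, latency, n_it=10):
    # Equation (3.1): decoded information bits per second, in Mbps
    return n * r * f_hz / ((n_ct + latency) * n_it) / 1e6

print(throughput_mbps(576, 0.5, 226e6, 54, 0))   # ~120.5 Mbps, matching Table 3.3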

The overall architecture is shown in Figure 3.2. The architecture has two kinds of processing units, the CNPs and the VNPs, and three kinds of data storage units: the CRAMs, the VRAMs and the INDEXROMs. The messages for the CNPs are read from the CRAMs, computed in the CNPs and written into the VRAMs. The messages in the same row are read simultaneously and sent to the CNPs. The messages in a CRAM are accessed sequentially in row order and written back in the same order. The function of a VNP is to add the column messages together with the intrinsic message. Hence, to access the column-wise messages from the VRAMs, the addresses of the column messages are stored in the


INDEXROMs. The column addresses are read from the INDEXROMs and sent to the VRAM address lines to access the data for the VNPs. The updated messages from the VNPs, destined for the CNPs, are stored in the same address locations of the CRAMs. The processing units and data storage units are explained in detail in the following sections.

Figure 3.2 Architecture of the parallelism-enhanced quasi-cyclic LDPC decoder

    3.2.1 Check Node Processor

The incoming messages to the CNP are in sign-magnitude form. The most significant bit (MSB) is the sign bit, which is 1 for a negative number and 0 for a positive number; the remaining bits give the absolute value of the message. In the first iteration, the intrinsic values of the neighbouring variable nodes are taken as the inputs to the CNPs. The CNP computes the check node to variable node messages from the incoming neighbouring variable node messages as defined in Chapter 2, recalled here for reference as Equations (3.2) and (3.3):



sign(L(ci → vj)) = ∏ sign(L(vk → ci)),   vk ∈ N(ci)\{vj}        (3.2)

|L(ci → vj)| = min |L(vk → ci)|,   vk ∈ N(ci)\{vj}              (3.3)

As shown in Figure 3.3, the updated message from check node C1 to variable node V1 is computed from the messages from variable nodes V2 and V3.

Figure 3.3 Check-to-variable node update

The architecture of the CNP is shown in Figure 3.4. The CNP has three parts: 1. the comparator logic, which computes the minimum and sub-minimum values from the absolute values of the neighbouring variable node messages together with the index of the minimum value; 2. the XOR logic, which finds the product of the signs of all the incoming messages; and 3. the distributor, which distributes the computed messages to all the neighbouring variable nodes. The inputs to the distributor are the minimum and sub-minimum values, the index of the minimum value, the product of the signs of all the messages and the signs of all the incoming messages. The sub-minimum value is used as the magnitude of the updating message for a neighbouring variable node when the index of the minimum equals the index of that neighbouring node; all other neighbouring nodes are assigned the minimum value. The sign of an outgoing message is obtained by an XOR operation between the sign of the corresponding incoming message and the product of the signs of all the messages.
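The behaviour of one CNP can be summarised as follows. This Python sketch is only a software model of the comparator logic, XOR logic and distributor described above; the (sign, magnitude) tuple representation is illustrative, not the hardware message format.

def cnp_update(msgs):
    # msgs: list of (sign_bit, magnitude); sign_bit is 1 for negative, 0 for positive
    signs = [s for s, _ in msgs]
    mags = [m for _, m in msgs]
    # Comparator logic: minimum, sub-minimum and index of the minimum
    idx_min = min(range(len(mags)), key=lambda k: mags[k])
    minimum = mags[idx_min]
    sub_min = min(m for k, m in enumerate(mags) if k != idx_min)
    # XOR logic: product of the signs of all incoming messages
    total_sign = 0
    for s in signs:
        total_sign ^= s
    # Distributor: the sub-minimum goes back on the minimum's own edge, the minimum to all others;
    # each outgoing sign is the XOR of the incoming sign with the overall sign product
    return [(total_sign ^ signs[k], sub_min if k == idx_min else minimum)
            for k in range(len(msgs))]

A row of weight 6 would call this with six messages and a row of weight 7 with seven, mirroring the two CNP variants described below.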




The outgoing messages from the CNPs are to be processed by the VNPs, and each VNP adds the updated messages in a column. To facilitate this addition, the outgoing messages from the CNPs are converted into two's complement form.
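The conversion between the two number formats is straightforward. The pair of helpers below is a sketch assuming the 6-bit quantization stated in Section 3.2.3 (1 sign bit and 5 magnitude bits); the function names are illustrative.

def sm_to_twos(msg, width=6):
    # Sign-magnitude (MSB = 1 means negative) to the signed value used by the VNP adders
    sign = msg >> (width - 1)
    mag = msg & ((1 << (width - 1)) - 1)
    return -mag if sign else mag

def twos_to_sm(value, width=6):
    # Signed value back to sign-magnitude, as required at the CNP inputs
    sign = 1 if value < 0 else 0
    return (sign << (width - 1)) | abs(value)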

Figure 3.4a Minimum value finder with 6 inputs (LOW5 is the minimum of the 6 inputs)

Figure 3.4b Sub-minimum value finder with 6 inputs



Figure 3.4c XOR logic

Figure 3.4d Distributor

The comparator uses pre-computation logic. It has two compare-and-swap modules, CS-A and CS-B. CS-A compares and swaps the two inputs based on the MSBs of their absolute values, while CS-B compares and swaps the two inputs based on the remaining bits. An XOR operation is performed on the MSBs of the absolute values of the two messages, and its output is used to select either CS-A or CS-B. If the XOR output is 1,



it indicates that the MSBs of the two inputs differ, and CS-A is selected. If it is 0, the MSBs of the two inputs are the same and hence CS-B is selected. The compare-and-swap unit has three outputs: low, high and sel. If input 1 is greater than input 2, the sel output is high; otherwise it is low. The sel outputs are used to determine the index of the minimum value.
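Behaviourally, the pre-computation amounts to deciding the comparison from the MSBs when they differ and from the remaining bits when they agree. The sketch below models this for two magnitudes, assuming 5 magnitude bits; it returns the low value, the high value and the sel flag.

def compare_and_swap(a, b, width=5):
    msb_a, msb_b = a >> (width - 1), b >> (width - 1)
    rest_a, rest_b = a & ((1 << (width - 1)) - 1), b & ((1 << (width - 1)) - 1)
    if msb_a ^ msb_b:             # MSBs differ: the CS-A decision is valid
        sel = msb_a > msb_b
    else:                         # MSBs equal: the CS-B decision is valid
        sel = rest_a > rest_b
    low, high = (b, a) if sel else (a, b)
    return low, high, int(sel)    # sel = 1 when input 1 is greater than input 2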

The LDPC codes of the IEEE 802.16e standard are irregular codes; they have different row weights and column weights. Block rows 1, 4, 5, 7, 8, 10, 11 and 12 have 6 non-zero circulant matrices, so the weight of these rows is 6. Block rows 2, 3, 6 and 9 have 7 non-zero circulant matrices, so the weight of these rows is 7. The number of inputs to a CNP is therefore 6 if the row weight is 6 and 7 if the row weight is 7. Hence, two different CNPs are required, computing 6 and 7 messages respectively.
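The resulting processor counts follow directly from the base matrix and the parallel factor; the small helper below simply restates this bookkeeping (the names are illustrative).

def processor_counts(pf):
    weight6_block_rows = 8        # block rows 1, 4, 5, 7, 8, 10, 11, 12
    weight7_block_rows = 4        # block rows 2, 3, 6, 9
    return {"CNPs with 6 inputs": weight6_block_rows * pf,
            "CNPs with 7 inputs": weight7_block_rows * pf,
            "total CNPs": 12 * pf,
            "total VNPs": 24 * pf}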

The check node computation process is pipelined as shown in Figure 3.5 to improve the performance.

Figure 3.5 Pipelining in the check node computation phase

    3.2.2 Variable Node Processor

The variable node processor is essentially an adder. As shown in Figure 3.6, the updated message from variable node V1 to check node C1 is computed from the messages from check nodes C2 and C3.

[Pipeline diagram: CRAM reads of successive rows (READ 1, 2, 3, 4, ...) are overlapped with VRAM writes of the previously processed rows (WRITE 1, 2, ...).]


Figure 3.6 Variable-to-check node update

The VNP has two parts. The first part adds the intrinsic message of the particular variable node and the updated messages from its neighbouring check nodes. The second part distributes the updated messages to the neighbouring nodes by subtracting the original input message from the total sum obtained in the first part. The updated outgoing messages are converted into sign-magnitude form to be used by the CNPs. Since the column weights of the IEEE 802.16e standard irregular codes are 2, 3 and 6, three different VNPs are used, adding 3, 4 and 7 inputs respectively; they are denoted VNP3, VNP4 and VNP7. The architectures of the VNPs are shown in Figure 3.7. The numbers of adder stages between input and output of VNP3, VNP4 and VNP7 are 2, 2 and 4 respectively. To reduce the critical path delay in VNP7, pipeline registers are introduced.
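The VNP behaviour reduces to one sum and a set of subtractions. The sketch below is a software model only; in hardware the additions are performed in two's complement and the outputs are converted back to sign-magnitude.

def vnp_update(check_msgs, intrinsic_llr):
    # First part: total sum of the intrinsic message and all incoming check messages
    total = intrinsic_llr + sum(check_msgs)
    # Second part: each outgoing message excludes its own incoming message
    outputs = [total - m for m in check_msgs]
    return outputs, total           # total also provides the hard-decision value

VNP3, VNP4 and VNP7 differ only in the number of inputs (3, 4 and 7) and hence in the depth of the adder tree.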



    Figure 3.7 Variable Node Processor

The processing steps involved in the variable node processing phase are shown in Figure 3.8. The column address for the VRAM is read from the INDEXROM in the first cycle and sent as the address to the VRAM; the column messages are read from the VRAM and sent to the VNP in the second clock cycle; and the messages updated by the VNPs are written into the CRAM in the third cycle. These processes are overlapped by introducing pipeline stages, as shown in Figure 3.8, to improve the performance.

    Figure 3.8 Variable node processing phase



    3.2.3 Memory Organization

The messages updated by the CNPs and the VNPs are stored in the VRAMs and CRAMs respectively. In the first phase, data are read from the CRAMs and given to the CNPs for processing, and the processed data are then written into the VRAMs. After all the rows have been processed in the first phase, the decoder enters the second phase of decoding. In the second phase, the VRAM addresses are read from the INDEXROMs, the messages for the VNPs are accessed from the VRAMs and, after processing in the VNPs, written into the CRAMs. This forms one iteration, and the above processes are repeated until the maximum number of iterations is reached.

The number of memory blocks used to store the messages for the CNPs and VNPs is equal to the number of block rows. Hence, the decoder has 12 CRAMs, 12 VRAMs and 12 INDEXROMs. Each CRAM and VRAM has 6 or 7 memory banks according to the number of non-zero circulant matrices in the corresponding block row. The messages are quantized to 6 bits and each location stores a 6-bit message. The messages are stored row-wise in the memory locations, i.e. the message of the first row is stored in the first location, the message of the second row in the second location, and so on. A circulant matrix of size 12 × 12 is shown in Figure 3.8 and the corresponding memory organization for the different parallel factors is shown in Figure 3.9. The non-zero message of the first row is at the fifth column position, that of the second row at the sixth column position, and so on. With parallel factor 1, the messages are stored row-wise as shown in Figure 3.9. If the parallel factor is 2, the messages of rows 1 to 6 are stored in bank 1 and the messages of rows 7 to 12 are stored in bank 2, also as shown in Figure 3.9. If PF = 4, four messages are packed into a single word and stored in a single location. Hence, the numbers of locations with parallel factors 1, 2 and 4 are 12, 6 and 3 respectively.
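A compact way to describe this layout is a mapping from a row index to a bank (or packed-word lane) and a location. The sketch below is consistent with the layouts illustrated in Figure 3.9, but the exact bank numbering and word format of the RTL are assumptions; rows are 0-based here, whereas the figures count from 1.

def storage_slot(row, z, pf):
    # PF = 1: one bank, location = row.
    # PF = 2: rows 0..z/2-1 in bank 0, rows z/2..z-1 in bank 1.
    # PF = 4: location j packs rows j, j+z/4, j+z/2, j+3z/4 into one word (one row per lane).
    rows_per_bank = z // pf
    return row // rows_per_bank, row % rows_per_bank   # (bank or lane, location)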


Figure 3.8 Circulant matrix of size 12 × 12

Figure 3.9 Illustration of data storage with parallel factors 1, 2 and 4

The data are stored in the CRAM and the VRAM in the fashion illustrated in Figure 3.9. To access column-wise data from the VRAM, the column addresses are stored in the INDEXROM. In the given example, for PF = 1 the column 1 message is stored in location 10, so 10 is stored in location 1 of the INDEXROM, and so on.

PF = 1:
Memory location    Data
1                  4
2                  5
3                  6
4                  7
5                  8
6                  9
...                ...
11                 2
12                 3

PF = 2:
Memory location    Data in bank 1    Data in bank 2
1                  4                 10
2                  5                 11
3                  6                 12
4                  7                 1
5                  8                 2
6                  9                 3

PF = 4:
Memory location    Bank 1
1                  4, 7, 10, 1
2                  5, 8, 11, 2
3                  6, 9, 12, 3


If PF = 2, the column 1, column 2 and column 3 messages are stored in bank 2 and the column 4, column 5 and column 6 messages are stored in bank 1. Hence, a tag is attached to the column address to identify the bank.

The contents of the INDEXROM for PF = 1 and PF = 2 are given in Figure 3.10. In the case of PF = 1, the data are accessed in sequence from locations 10, 11, 12, 1, 2, ..., 9. With PF = 2, the most significant bit of each entry indicates the tag of the bank and the remaining bits give the location within bank 1 or bank 2. For the illustrated circulant matrix, with PF = 2, two VNPs are assigned to each block column: VNPA processes columns 1 to 6 and VNPB processes columns 7 to 12. If the tag is 1, the message from memory location 4 of bank 2 is sent to VNPA and the message from memory location 4 of bank 1 is sent to VNPB. If the tag is 0, the message from the given memory location of bank 1 is sent to VNPA and the message from the given memory location of bank 2 is sent to VNPB.

Figure 3.10 Contents of the INDEXROM with parallel factors 1 and 2

PF = 1:
Memory location    Data
1                  1010
2                  1011
3                  1100
4                  0001
5                  0010
6                  0011
...                ...
11                 1000
12                 1001

PF = 2 (MSB = bank tag):
Memory location    Data
1                  1100
2                  1101
3                  1110
4                  0001
5                  0010
6                  0011
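For a single circulant, the INDEXROM contents are just the inverse of the cyclic shift. The following generator reproduces the example of Figure 3.10 for a 12 × 12 circulant; the shift value, the 0-based indexing and the (tag, location) encoding are assumptions made for illustration and do not describe the actual ROM-generation flow.

def build_indexrom(shift, z, pf):
    # Row that holds the message of each block column: row = (column - shift) mod z
    row_of_col = [(c - shift) % z for c in range(z)]
    if pf == 1:
        return row_of_col                               # one VRAM address per column
    if pf == 2:
        half = z // 2
        rom = []
        for c in range(half):                           # entries for the VNPA columns
            r = row_of_col[c]
            rom.append((1, r - half) if r >= half else (0, r))   # (bank tag, location)
        return rom
    raise ValueError("sketch covers PF = 1 and PF = 2 only")

# build_indexrom(3, 12, 1) starts 9, 10, 11, 0, 1, 2, ...
# (locations 10, 11, 12, 1, 2, 3 when counted from 1, as in Figure 3.10)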


If PF is 4, four data words are merged and stored in one location, and these four data words from the CRAM are sent to four CNPs. Each CNP receives 6 or 7 messages from the 6 or 7 CRAM banks. After being processed by the CNPs, the data are stored in the same location of the VRAM. However, to allow column-wise access, the data from the CNPs are shuffled by a shuffle network before being stored in the VRAM, as shown in Figure 3.11.

    Figure 3.11 Contents of CRAM and VRAM

Of the shuffled data, data 1 is sent to VNPA, data 2 to VNPB, data 3 to VNPC and data 4 to VNPD. After being processed in the VNPs, the data are reshuffled and stored in the same location of the CRAM.
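Packing four 6-bit messages into a single memory word can be modelled as below. The word layout (message 1 in the least significant bits) is an assumption for illustration; the actual RTL word format is not specified here.

def pack4(msgs):
    # Pack four 6-bit sign-magnitude messages into one 24-bit word
    word = 0
    for k, m in enumerate(msgs):
        word |= (m & 0x3F) << (6 * k)
    return word

def unpack4(word):
    # Split the word back into the four messages routed to VNPA, VNPB, VNPC and VNPD
    return [(word >> (6 * k)) & 0x3F for k in range(4)]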

The total number of memory bits for the various code lengths is shown in Table 3.1.

Table 3.1 Total memory requirement for various code lengths

Code length         576                 1152                2304
                    PF = 1    PF = 2    PF = 1    PF = 2    PF = 1     PF = 2
CRAM (bits)         10,944    10,944    21,888    21,888    43,776     43,776
VRAM (bits)         10,944    10,944    21,888    21,888    43,776     43,776
INDEXROM (bits)     9,360     5,184     22,464    11,232    52,436     22,176
Total               32,248    27,072    66,240    55,008    139,968    109,728
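The CRAM and VRAM figures in Table 3.1 can be cross-checked from the code structure: the rate-1/2 base matrix has 8 block rows of weight 6 and 4 block rows of weight 7, i.e. 76 non-zero z × z circulants, each contributing z messages of 6 bits. The snippet below is only this back-of-the-envelope check.

def cram_bits(code_length, bits_per_msg=6):
    z = code_length // 24                 # expansion factor of the 12 x 24 base matrix
    circulants = 8 * 6 + 4 * 7            # 76 non-zero circulants
    return circulants * z * bits_per_msg

print(cram_bits(576), cram_bits(1152), cram_bits(2304))   # 10944, 21888, 43776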

Contents of Figure 3.11 (PF = 4):

CRAM
Memory location    Bank 1
1                  4, 7, 10, 1
2                  5, 8, 11, 2
3                  6, 9, 12, 3

VRAM (after shuffling)
Memory location    Bank 1
1                  1, 4, 7, 10
2                  2, 5, 8, 11
3                  3, 6, 9, 12


Table 3.2 Comparison of device utilization and throughput of architectures with and without pipelining (WP/WOP) for various code lengths

Code length          576                1152               2304
                     WOP      WP        WOP      WP        WOP      WP
Slice registers      13100    13925     24055    24891     45358    46194
Slices               1880     4318      1939     1983      1076     2140
LUTs                 8101     8183      12209    13440     20084    20901
BRAMs                16       12        18       14        56       56
Memory bits          32,248             66,240             139,988
Clock cycles         72       54        144      102       288      198
Clock (MHz)          126      226       126      226       180      226
Throughput (Mbps)    50.4     120.5     50.4     127.62    72       131.5

The hardware and throughput of the pipelined architecture are compared for the code lengths 576, 864, 1152, 1728 and 2304. The comparison is given in Table 3.3.

Table 3.3 Comparison of device utilization and throughput of architectures with pipelining for various code lengths

Code length          576       864       1152      1728       2304
Slice registers      13925     19490     24891     35850      46194
Slices               4318      1982      1983      2596       2140
LUTs                 8183      11463     13440     19158      20901
BRAMs                12        14        14        18         56
Memory bits          32,248    49,680    66,240    104,976    139,988
Clock cycles         54        78        102       150        198
Clock (MHz)          226       226       226       226        226
Throughput (Mbps)    120.5     125.17    127.62    130        131.5


The hardware and throughput of architectures using block RAMs and distributed RAMs in the FPGA are compared in Table 3.4. The architectures are designed for code length 2304 bits with code rate 0.5 and parallel factor 1. The slice utilization is higher and the block RAM usage is lower in the architecture designed with distributed RAM. The throughput is increased by 8% in the architecture using block RAM.

Table 3.4 Comparison of device utilization and throughput of architectures with block RAM and distributed RAM for code length 2304 bits / code rate 0.5

                     Distributed RAM    Block RAM
Slice registers      46194              3084
Slices               2140               -
LUTs                 20901              4603
BRAMs                56                 76
Memory bits          139,988
Clock cycles         198                199
Clock (MHz)          226                246
Throughput (Mbps)    131.5              142




The architectures with parallel factors 1, 2 and 4 designed to support code length 2304 bits / code rate 0.5 are compared. Table 3.5 gives the comparison of the hardware and throughput of the architectures with PF = 1, 2 and 4. As the parallel factor increases, the throughput increases along with the hardware.

Table 3.5 Comparison of device utilization and throughput of architectures with parallel factors 1, 2 and 4 designed to support code length 2304 bits / code rate 0.5

                     PF = 1    PF = 2    PF = 4
Slice registers      46194     39913     11493
Slices               2140      2238      -
LUTs                 20901     23826     18302
BRAMs                56        12        160
Memory bits          139,988
Clock cycles         198       102       54
Clock (MHz)          226       185       221
Throughput (Mbps)    131.5     209       472
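The cycle counts and throughputs in Table 3.5 are consistent with Equation (3.1) for z = 96: the reported cycle counts equal 2z/PF plus about 6 cycles, assumed here to be the pipeline latency. The short check below simply reproduces the table values under that assumption.

for pf, f_mhz, cycles, mbps in [(1, 226, 198, 131.5), (2, 185, 102, 209), (4, 221, 54, 472)]:
    assumed_cycles = 2 * 96 // pf + 6                  # 2z/PF plus ~6 cycles of latency (assumption)
    throughput = 2304 * 0.5 * f_mhz / (cycles * 10)    # Mbps, 10 iterations
    print(pf, assumed_cycles, round(throughput, 1), mbps)   # 198/102/54 cycles, ~131.5/208.9/471.5 Mbps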

The proposed architectures are compared with existing architectures in Table 3.6. The proposed architecture achieves very high throughput when compared with the existing architectures.


Table 3.6 Comparison of device utilization and throughput of the proposed architecture with existing architectures

                             Proposed                 Vikram (2013)            Karkoot (2008)
No. of CNPs and VNPs         48, 96                   48, 48                   48, 96                   81
No. of quantization bits     6                        4                        7                        7
Code length and code rate    2304                     2304                     1536, regular            648, 1296, 1944, regular and irregular
Application                  WiMAX IEEE 802.16e       WiMAX IEEE 802.16e       IEEE 802.11n
Decoding algorithm           TPMP-Minsum              TPMP-Modified Minsum     Modified Minsum          Layered MMS
Slice registers              11493                    2024                     3455                     12368
Slices                       -                        3141                     9881                     11328
LUTs                         18302                    9547                     18174                    17104
BRAMs                        160                      87                       66                       87
Memory (bits)                87532                    20736                    NA
Clock cycles                 54                       78                       -                        -
Clock (MHz)                  221                      144                      211
Throughput (Mbps)            472 (source data rate)   266 (source data rate)   397 (data rate)
FPGA                         Virtex V                 Virtex V                 Virtex 4

(For code lengths 1296 and 1944: 56 and 84.)