Transcript
Page 1: On using BS to improve the

J. M. Martins Ferreira - University of Porto (FEUP / DEEC) - Tallinn Technical University :: May 5th 2009

Tallinn Technical University :: May 5th 2009. This presentation is available at http://www.slideshare.net/josemmf

On using BS to improve the reliability and availability of reconfigurable hardware

J. M. Martins Ferreira

FEUP / DEEC - Rua Dr. Roberto Frias

4200-537 Porto - PORTUGAL

M. G. Gericota, G. R. Alves, M. Silva, J. M. Ferreira, “Reliability and Availability in Reconfigurable Computing: A Basis for a Common Solution,” IEEE Transactions on VLSI Systems, Vol. 16, No. 11, pp. 1545-1558, Nov. 2008.

Page 2: On using BS to improve the


Outline of this talk

1. Introduction

2. Concurrent replication of active CLBs

3. On-line structural concurrent test (better reliability)

4. Defragmentation (better availability)

5. Conclusion

Page 3: On using BS to improve the

Introduction

• Motivation

• Causes of failure in FPGAs

Page 4: On using BS to improve the

Motivation: An old problem becomes more important

• Dynamically reconfigurable FPGAs:

– Production tests cannot guarantee fault-free operation

– Application areas include mission-critical systems

– The cost / benefit of spatial redundancy is different from static implementations

Page 5: On using BS to improve the


Motivation: An old problem becomes more important

Page 6: On using BS to improve the

Causes of failure in FPGAs

• Post-production failure modes may be permanent or temporary. Examples:

– Electromigration phenomena may lead to permanent physical damage

– Single-event upsets (SEUs) may cause permanent malfunction if not mitigated (modification of SRAM contents changes design and data information)

Page 7: On using BS to improve the

Concurrent replication of active CLBs

• The principle

• How it works

• Resources required (time, space)

Page 8: On using BS to improve the

Concurrent replication of CLBs: The principle

• The basic idea underlying release-to-test strategies consists of replicating a given functional block in another area (non-intrusively), and making the original resources available for test

[Figure: rotation / test / relocation cycle, showing replication of functionality, rotation of free resources, and resources under test]

Page 9: On using BS to improve the

Concurrent replication of CLBs: The principle

• Concurrent fault detection based on release-to-test approaches must provide functional and state replication

• Replication at CLB-level

– Facilitates state transfer and requires a minimal amount of spare resources

– The relative position of the replicated CLB and its replica has an impact on propagation delay

[Figure: FPGA layout showing CLBs and IOBs]

Page 10: On using BS to improve the

Concurrent replication of CLBs: How it works

• General replication principle – phase one:

– Copy the internal configuration of the replicated CLB into the replica CLB and place the inputs of both CLBs in parallel

[Figure: replicated CLB and CLB replica, each with In and Out ports]

Page 11: On using BS to improve the

Concurrent replication of CLBs: How it works

• General replication principle – phase two:

– Place the outputs of both CLBs in parallel (the replicated CLB may then be disconnected and made available for testing)

[Figure: replicated CLB and CLB replica, each with In and Out ports, before and after the replicated CLB is disconnected]
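The two phases can be illustrated with a small Python sketch. It is a toy model under stated assumptions, not the authors' tooling: the `Clb` class, `circuit_output` and `replicate` are hypothetical names, only combinational behaviour is modelled (flip-flop state transfer is not), and the original CLB is disconnected outputs first, inputs last.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Clb:
    """Toy CLB (hypothetical model): a logic function plus port connection flags."""
    logic: Optional[Callable[[int], int]] = None
    inputs_connected: bool = False
    outputs_connected: bool = False

    def output(self, value: int) -> Optional[int]:
        # A CLB drives the circuit only when configured and fully connected.
        if self.logic is not None and self.inputs_connected and self.outputs_connected:
            return self.logic(value)
        return None


def circuit_output(clbs: List[Clb], value: int) -> int:
    """Value seen by the circuit; parallel drivers must agree."""
    driven = set()
    for clb in clbs:
        out = clb.output(value)
        if out is not None:
            driven.add(out)
    assert len(driven) == 1, "no driver or conflicting drivers"
    return driven.pop()


def replicate(original: Clb, replica: Clb, probe: int) -> None:
    """Apply both phases, checking that the circuit output never changes."""
    expected = circuit_output([original, replica], probe)

    # Phase one: copy the configuration and place the inputs in parallel.
    replica.logic = original.logic
    replica.inputs_connected = True
    assert circuit_output([original, replica], probe) == expected

    # Phase two: place the outputs in parallel...
    replica.outputs_connected = True
    assert circuit_output([original, replica], probe) == expected

    # ...then disconnect the original CLB, which becomes available for test.
    original.outputs_connected = False
    original.inputs_connected = False
    assert circuit_output([original, replica], probe) == expected


original = Clb(logic=lambda x: x ^ 1, inputs_connected=True, outputs_connected=True)
replica = Clb()
replicate(original, replica, probe=1)
print(circuit_output([original, replica], 1))
```

The assertions after each step mirror the non-intrusive requirement: the circuit always sees exactly one consistent value while the function moves from one CLB to the other.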

Page 12: On using BS to improve the

Concurrent replication of CLBs: Replication aid block

• Supports state transfer in synchronous gated-clock circuits

[Figure: replication aid block between the replicated cell and the replica cell (each with logic, a 0/1 multiplexer and a D flip-flop with CE and R inputs); signals shown: CC, BY_C, CE, CLK, RESET, LOGIC_OUT, FF_OUT; connections "from the circuit" and "to the circuit"]

Page 13: On using BS to improve the

Replication flow: Time & space needed

[Figure: flowchart of the nine steps listed below, with two wait loops ("> 2 CLK pulse", "> 1 CLK pulse", Y / N)]

No.  No. of bytes  Time (ms)  Step
1          11 289      9,705  Copy the internal logic functionality and place the input signals in parallel
2             441      0,379  BY_C=1 & CC=1
3             277      0,238  CC=0
4             277      0,238  BY_C=0
5           2 145      1,844  Connect the clock enable inputs of both CLBs
6           2 217      1,906  Disconnect all the auxiliary relocation circuit signals
7           4 129      3,550  Place the CLB outputs in parallel
8           1 333      1,146  Disconnect the original CLB outputs
9           3 986      3,438  Disconnect the original CLB inputs
Total      26 094     22,444
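As a quick sanity check on the table, the sketch below sums the per-step figures and derives the time spent per byte of reconfiguration data. The numbers are copied from the table (decimal commas written as points); the variable names are illustrative, and the per-byte figure is only an observation from these numbers, not a value stated in the talk.

```python
# (step, bytes, time in ms) copied from the replication flow table
STEPS = [
    ("Copy logic and place inputs in parallel", 11289, 9.705),
    ("BY_C=1 & CC=1", 441, 0.379),
    ("CC=0", 277, 0.238),
    ("BY_C=0", 277, 0.238),
    ("Connect clock enable inputs", 2145, 1.844),
    ("Disconnect auxiliary relocation signals", 2217, 1.906),
    ("Place CLB outputs in parallel", 4129, 3.550),
    ("Disconnect original CLB outputs", 1333, 1.146),
    ("Disconnect original CLB inputs", 3986, 3.438),
]

total_bytes = sum(size for _, size, _ in STEPS)
total_ms = sum(time_ms for _, _, time_ms in STEPS)

# Microseconds spent per byte of reconfiguration data, per step.
us_per_byte = [1000.0 * time_ms / size for _, size, time_ms in STEPS]

print(f"total: {total_bytes} bytes, {total_ms:.3f} ms")
print(f"per byte: {min(us_per_byte):.3f} to {max(us_per_byte):.3f} us")
```

The totals agree with the last row of the table, and every step costs roughly the same time per byte, which suggests that the replication time is dominated by shifting configuration data.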

Page 14: On using BS to improve the

On-line structural concurrent test

• Fault model, test configurations

• Test application

• Rotation and release for test strategy

• Fault detection latency

Page 15: On using BS to improve the

Fault model and test configurations

• A hybrid fault model (stuck-at / functional) was adopted and the two CLB slices (each with 13 inputs and 6 outputs) are tested in parallel

Configuration  No. of test vectors  No. of bytes  Time (ms)
1st                             16        18 392     15,813
2nd                             16         3 115      2,678
3rd                              2           623      0,536
4th                              2           634      0,545
5th                              2           613      0,527
6th                              2           512      0,440
Total                           40        23 889     20,539

Page 16: On using BS to improve the

Test application

• CLB testing via BS:

– Test vector application is done through a 13-bit user test data register

– Response capturing takes place through unused BS cells

[Figure: boundary-scan registers between TDI and TDO (instruction register, bypass register, configuration register, User Test Register, output MUX) and the CLBs under test with their IN and OUT ports]

Page 17: On using BS to improve the


Rotation strategy

• Vertical rotation has an advantage in the case of arithmetic circuits that use the dedicated carry interconnection between (vertically) adjacent CLBs

• In the general case, we should consider such factors as the number of circuits with high fanout and the shape / orientation of the implementation

Page 18: On using BS to improve the

Replicate and release-to-test in a 24-bit counter (example)

[Figure: four vertically adjacent CLB slices (CLB_R21C7.S0, CLB_R22C7.S0, CLB_R23C7.S0, CLB_R24C7.S0), each with CIN, COUT, BX and YB pins, linked by dedicated carry lines]

Page 19: On using BS to improve the

Replicate and release-to-test in a 24-bit counter (example)

[Figure: maximum frequency of operation (MHz, axis 0 to 160) versus number of relocations (0 to 12), for vertical rotation and horizontal rotation; inset shows the carry chain through CLB_R21C7.S0 to CLB_R24C7.S0 with delays Tbxcy and Tbyp and nets U1/C6/C12/C1/O, U1/C6/C14/C1/O, U1/C6/C16/C1/O]

Page 20: On using BS to improve the

Rotation strategy: ITC’99 benchmarks

Circuit logic (# PI, # PO, # gates, # FF) and carry logic (lines, segments):

Reference  # PI   # PO  # gates  # FF  Carry lines  Carry segments
B01        2+2       2       47     5            0               0
B02        1+2       1       29     4            0               0
B03        4+2       4      150    30            0               0
B04        11+2      8      606    66            4              14
B05        1+2      36      977    34            4              16
B06        2+2       6       61     9            0               0
B07        1+2       8      422    49            2               6
B08        9+2       4      168    21            0               0
B09        1+2       1      160    28            0               0
B10        11+2      6      190    17            0               0
B11        7+2       6      484    31            1               4
B12        5+2       6     1037   121            0               0
B13        10+2     10      343    53            1               4
B14        32+2     54     4787   245           11             150

Page 21: On using BS to improve the

Rotation strategy: ∆f and size for the ITC’99 circuits

Columns: maximum ∆f (%), vertical and horizontal; size of the reconfiguration files (bytes), vertical and horizontal; ratio of the size of the reconf. files by CLB (%), horizontal > vertical.

Ref.  ∆f vert. (%)  ∆f horiz. (%)  Size vert. (bytes)  Size horiz. (bytes)  Ratio (%)
B01           -5,5            0,0              48 350               56 102       16,0
B02            0,0            0,0               7 016               10 623       51,4
B03           -1,9           -4,9             120 705              138 484       14,7
B04           -6,1          -29,3             548 595              665 419       21,3
B05          -17,3          -36,9           1 130 985            1 286 031       13,7
B06           -2,7            0,0              45 291               53 503       18,1
B07          -23,6          -37,8             354 367              425 214       20,0
B08           -5,8           -5,8             150 093              178 339       18,8
B09           -1,8           -4,9             112 107              129 855       15,8
B10           -7,5           -7,6             195 571              245 455       25,5
B11          -10,5          -36,0             500 261              614 093       22,8
B12            0,0           -1,2           1 275 804            1 631 953       27,9
B13           -4,3          -42,8             258 827              332 954       28,6
B14          -13,5          -47,8           5 195 444            6 070 485       16,8
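The last column of the table appears to be the relative increase in reconfiguration file size when rotating horizontally instead of vertically. That reading is an assumption, but it reproduces the tabulated values for the rows checked below; the function name is illustrative.

```python
def size_increase_percent(vertical_bytes: int, horizontal_bytes: int) -> float:
    """Relative size increase (horizontal over vertical), in percent."""
    return 100.0 * (horizontal_bytes - vertical_bytes) / vertical_bytes


# (reference, vertical bytes, horizontal bytes) copied from the table
ROWS = [
    ("B01", 48350, 56102),
    ("B02", 7016, 10623),
    ("B04", 548595, 665419),
    ("B14", 5195444, 6070485),
]

for ref, vertical, horizontal in ROWS:
    print(ref, round(size_increase_percent(vertical, horizontal), 1))
```

Under this reading, horizontal rotation always needs larger reconfiguration files than vertical rotation for these benchmarks, in addition to the larger frequency penalty.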

Page 22: On using BS to improve the


Fault detection latency

• The duration of a complete rotation cycle depends on the device size and on the reconfiguration and test times

• The fault detection latency alternates between a minimum and a maximum value, according to the rotation direction:

– MAXFDL = [(#CLBROWS × #CLBCOLS) − 1] × 2 × (ΔRECONF + ΔTEST)

– MINFDL = 2 × (ΔRECONF + ΔTEST)
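The two bounds can be evaluated with a few lines of Python. This is a minimal sketch: the function name is illustrative, the 28 × 42 array corresponds to the 1176-CLB XCV200 matrix quoted later in the talk, and the ΔRECONF and ΔTEST values in the example are hypothetical placeholders, not measurements.

```python
from typing import Tuple


def fault_detection_latency(rows: int, cols: int,
                            delta_reconf_ms: float,
                            delta_test_ms: float) -> Tuple[float, float]:
    """Return (MINFDL, MAXFDL) in ms, following the two formulas above."""
    per_clb = 2 * (delta_reconf_ms + delta_test_ms)
    min_fdl = per_clb
    max_fdl = (rows * cols - 1) * per_clb
    return min_fdl, max_fdl


# Hypothetical per-CLB times, chosen only to illustrate the calculation.
min_fdl, max_fdl = fault_detection_latency(rows=28, cols=42,
                                           delta_reconf_ms=30.0,
                                           delta_test_ms=10.0)
print(f"MINFDL = {min_fdl} ms, MAXFDL = {max_fdl} ms")
```

The maximum grows linearly with the number of CLBs, so for a given device the only levers are the reconfiguration and test times per CLB.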

Page 23: On using BS to improve the

Fault detection latency

Synchronous circuits with clock enable [With the replication aid circuit] (time at 20 MHz TCK)

No. of bytes  Time (ms)  Step
      11 289      9,705  Copy logic functionality and parallel input signals
         441      0,379  BY_C=1 & CC=1
         277      0,238  CC=0
         277      0,238  BY_C=0
       2 145      1,844  Connect the clock enable inputs of both CLBs
       2 217      1,906  Disconnect all the auxiliary relocation circuit signals
       4 129      3,550  Place the CLB outputs in parallel
       1 333      1,146  Disconnect the original CLB outputs
      18 392     15,813  Disconnect the original CLB inputs and set up test configuration
      40 500     34,820  Total

Synchronous circuits with free-running clock and combinational circuits [Without the replication aid circuit] (time at 20 MHz TCK)

No. of bytes  Time (ms)  Step
      12 163     10,457  Copy the internal logic functionality and place the input signals in parallel
       3 993      3,433  Place the CLB outputs in parallel
       1 073      0,923  Disconnect the original CLB outputs
      18 392     15,813  Disconnect the original CLB inputs and set up test configuration
      35 621     30,625  Total

Page 24: On using BS to improve the

Worst-case fault detection latency (XCV200)

File size and reconfiguration time of the test configurations (time at 20 MHz TCK)

Configuration  No. of bytes  Time (ms)
2nd                   3 115      2,678
3rd                     623      0,536
4th                     634      0,545
5th                     613      0,527
6th                     512      0,440
Total                 5 497      4,726

Shifting time for test vector application (time at 20 MHz TCK)

No. of test vectors  Length (bits)  Total (bits)  Time (ms)
                 40             13           520      0,066

Shifting time for the test vector responses from a CLB under test (time at 20 MHz TCK)

No. of cells of the BS register in an XCV200  No. of test vectors  Time (ms)
                                       1 022                   40      4,088

Mean time for the test of a 1176-CLB matrix

Occupation type: 25% synchronous, 50% combinational, 25% empty

43 679,188 ms @ TCK = 20 MHz

26 472,235 ms @ TCK = 33 MHz

The mean time to test the full CLB matrix is also the worst-case fault detection latency
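The two mean times quoted for the full matrix are consistent with the test time scaling inversely with the TCK frequency. The sketch below checks that relation and the total number of stimulus bits; the function name is illustrative, and the inverse scaling is an assumption that happens to reproduce the 33 MHz figure.

```python
def rescale_time(time_ms: float, tck_from_mhz: float, tck_to_mhz: float) -> float:
    """Scale a TCK-bound time, assuming it is inversely proportional to TCK."""
    return time_ms * tck_from_mhz / tck_to_mhz


# Mean time for the full matrix, from the 20 MHz figure to 33 MHz.
mean_33mhz_ms = rescale_time(43679.188, 20.0, 33.0)
print(f"{mean_33mhz_ms:.3f} ms at 33 MHz")

# Stimulus bits shifted per CLB: number of test vectors times vector length.
total_stimulus_bits = 40 * 13
print(total_stimulus_bits, "bits")
```

Since the whole procedure is driven through the boundary-scan port, a faster TCK shortens the worst-case fault detection latency in the same proportion.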

Page 25: On using BS to improve the

Defragmentation

• The importance of floor planning

• Why (de)fragmentation?

• Can concurrent replication help?

Page 26: On using BS to improve the

Availability vs. floor planning performance

• Good dynamic floor planning management may enable the implementation of applications that in total would require more than 100% of the FPGA resources

[Figure: available resource space versus time, from the initial configuration, for the applications running in the FPGA: Appl. A (functions A1, A2), Appl. B (functions B1, B2) and Appl. C (functions C1 to C4); rt marks a reconfiguration interval; arrows mark data transfer between different functions]

Page 27: On using BS to improve the

Fragmentation: Why?

• The absence of faults does not guarantee acceptable availability, namely when function swapping / partial reconfiguration occurs frequently

• Insufficient contiguous resources will delay incoming functions

[Figure: resource allocation (2-D spatial, x and y axes) over successive reconfigurations (temporal dimension): initial config., 1st partial reconfig., 2nd partial reconfig., ..., nth partial reconfig.]

Page 28: On using BS to improve the

Can concurrent replication help?

• Concurrent replication of active CLBs may be used to defragment the FPGA and minimise the implementation delay of incoming functions

– Defragmentation is performed concurrently with all running functions (no need to halt their execution)

– Coherency of the register contents is guaranteed, preserving all state information
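The effect of defragmentation on contiguous free space can be shown with a deliberately simplified model. The sketch assumes a hypothetical 1-D row of CLB slots (a real FPGA is a 2-D array), uses illustrative function names, and treats each move as if it were one concurrent replication of an active CLB; it is not a placement strategy from the talk.

```python
from typing import List, Optional

Slots = List[Optional[str]]  # function name per slot, None when free


def largest_free_run(slots: Slots) -> int:
    """Length of the longest run of contiguous free slots."""
    best = current = 0
    for slot in slots:
        current = current + 1 if slot is None else 0
        best = max(best, current)
    return best


def defragment(slots: Slots) -> Slots:
    """Pack the occupied slots to one end, keeping their relative order."""
    used = [slot for slot in slots if slot is not None]
    return used + [None] * (len(slots) - len(used))


fragmented: Slots = ["A", None, "B", None, None, "C", None]
compacted = defragment(fragmented)

print(largest_free_run(fragmented), "->", largest_free_run(compacted))
```

In the fragmented row an incoming function needing three contiguous slots would have to wait, although four slots are free in total; after compaction it fits immediately.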

Page 29: On using BS to improve the

Conclusion

• Summary

• Research topics

Page 30: On using BS to improve the


Summary

• Concurrent replication offers a powerful and non-intrusive solution to improve reliability and availability of reconfigurable hardware

• Paralleling CLB inputs and outputs doesn’t create any problem

• Boundary-scan provides a valuable contribution to implement an on-line concurrent structural test strategy

Page 31: On using BS to improve the


Research topics

• Concurrent replication of active CLBs offers a powerful tool for defragmentation purposes, but the higher-level strategy is still missing

• All aspects of the proposed solutions were validated in practice (lab experimentation), but a software tool to fully automate the reconfiguration process is still missing

Page 32: On using BS to improve the


Thanks for your attention!
