Page 1: 3D-DRESD FT

POLITECNICO DI MILANO

Vincenzo Rana

Fault tolerance in FPGA-based systems

Outline

• Techniques:
  • Triple module redundancy
    • Throughput logic
    • State-machine logic
    • I/O logic
    • BRAM
  • Error detection and error correction
  • Partial reconfiguration
• Real approaches:
  • SEU mitigation through dynamic partial reconfiguration
  • Run-time fault reconfiguration
• Conclusions

Triple module redundancy

Triple module redundancy (voter)

The voter can be implemented:
• with Look-Up Tables (LUTs)
• with 3-state buffers (BUFTs)
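A minimal sketch of the 2-out-of-3 majority function that a TMR voter computes, modeled in Python. The names `majority`, `lut3_init` and `MAJORITY_INIT` are hypothetical, and the INIT encoding assumes the Xilinx-style convention where bit i of a 3-input LUT's truth table holds the output for input combination i = 4c + 2b + a; it is an illustration, not the implementation used in the slides.

```python
def majority(a: int, b: int, c: int) -> int:
    """Bitwise 2-out-of-3 vote over the outputs of three replicas."""
    return (a & b) | (b & c) | (a & c)


def lut3_init(fn) -> int:
    """Build the 8-bit truth table (INIT value) of a 3-input LUT for fn.

    Assumption: bit i of INIT holds fn(a, b, c) with i = 4*c + 2*b + a.
    """
    init = 0
    for i in range(8):
        a, b, c = i & 1, (i >> 1) & 1, (i >> 2) & 1
        init |= fn(a, b, c) << i
    return init


MAJORITY_INIT = lut3_init(majority)

if __name__ == "__main__":
    golden = 0b1010_1010
    upset = golden ^ 0b0000_0100  # one replica hit by a single-bit upset
    print(bin(majority(golden, upset, golden)))  # the upset is masked
    print(hex(MAJORITY_INIT))
```

Running it prints `0b10101010` (the voted word equals the golden one) and `0xe8`. Because the vote of one bit depends on exactly three inputs, a LUT-based voter needs one 3-input LUT per voted bit.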

Throughput logic

The system will include three copies of:
• the module itself
• the input signals
• the output signals

No voter is needed

No single-point-of-failure
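Why throughput logic needs no internal voter can be illustrated with a small simulation, under assumptions not stated in the slides: each of the three copies is modeled as a feed-forward register pipeline with a hypothetical depth and a placeholder stage function, and an upset flips one register in one copy. With no feedback, the corrupted value simply flows out of that copy, and the three copies agree again afterwards.

```python
from typing import List, Tuple


def simulate(cycles: int, depth: int, upset_cycle: int) -> List[Tuple[int, ...]]:
    """Return the outputs of three replicated pipelines, one tuple per cycle."""
    copies = [[0] * depth for _ in range(3)]  # pipeline registers of each copy
    history = []
    for t in range(cycles):
        x = t  # the same stimulus reaches all three replicated inputs
        if t == upset_cycle:
            copies[1][0] ^= 0xFF  # SEU in the first register of copy 1 only
        outs = []
        for regs in copies:
            outs.append(regs[-1])  # value leaving the last stage
            regs[1:] = regs[:-1]   # every register loads its predecessor
            regs[0] = 2 * x        # placeholder combinational stage
        history.append(tuple(outs))
    return history


if __name__ == "__main__":
    history = simulate(cycles=10, depth=3, upset_cycle=2)
    mismatches = [t for t, outs in enumerate(history) if len(set(outs)) > 1]
    print(mismatches)  # cycles in which the three copies disagree
```

The upset reaches only one of the three replicated outputs, for a single cycle, and then disappears without any correction inside the module.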

State-machine logic

State machines strictly depend on their current state, so the voter has to be implemented internally.

A voter has to be inserted in the system for:
• each state register
• each feedback path

This approach keeps every copy of the state machine in the correct state: a single corrupted state register is outvoted when the next state is computed.
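A minimal sketch contrasting voted and unvoted feedback, assuming a hypothetical 3-bit counter as the state machine and modeling one voter per replica in the feedback path. With voted feedback, an upset in one state register is outvoted at the next clock edge; without it, the corrupted replica stays out of step indefinitely.

```python
from typing import List, Tuple


def majority(a: int, b: int, c: int) -> int:
    """Bitwise 2-out-of-3 vote."""
    return (a & b) | (b & c) | (a & c)


def next_state(s: int) -> int:
    """Hypothetical state machine: a 3-bit counter."""
    return (s + 1) & 0b111


def run(cycles: int, upset_cycle: int, voted_feedback: bool) -> List[Tuple[int, ...]]:
    """Return the three replicas' state registers after each clock edge."""
    regs = [0, 0, 0]
    trace = []
    for t in range(cycles):
        if t == upset_cycle:
            regs[2] ^= 0b010  # SEU flips one bit of replica 2's state register
        if voted_feedback:
            # one voter per feedback path: each replica's next-state logic sees the voted state
            regs = [next_state(majority(*regs)) for _ in range(3)]
        else:
            regs = [next_state(s) for s in regs]
        trace.append(tuple(regs))
    return trace


if __name__ == "__main__":
    print(run(6, upset_cycle=2, voted_feedback=True))
    print(run(6, upset_cycle=2, voted_feedback=False))
```

The voted version recovers only if, within one clock cycle, the upset affects a single replica.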

I/O logic (Input)

Input pins have to be replicated to avoid single points of failure. If the number of required input pins exceeds the number of input pins available on the reconfigurable device:

• only a subset of the input pins can be replicated
• the system can be split across more than one FPGA
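The first option can be made concrete with a simple pin budget. This is a hypothetical model that counts only input pins (clock, output and configuration pins are ignored): triplicating k of n inputs costs 3k + (n - k) pins, so at most (pins - n) / 2 inputs can be triplicated.

```python
from typing import Optional, Tuple


def input_replication_plan(n_inputs: int, pins: int) -> Optional[Tuple[int, int]]:
    """Return (triplicated, single-copy) input counts that fit in `pins`.

    Returns None when not even one copy of every input fits, i.e. the
    design has to be split across more than one FPGA.
    """
    if pins < n_inputs:
        return None
    # 3k + (n - k) <= pins  ->  k <= (pins - n) // 2
    k = min(n_inputs, (pins - n_inputs) // 2)
    return k, n_inputs - k


if __name__ == "__main__":
    print(input_replication_plan(100, 250))  # 75 triplicated, 25 single
    print(input_replication_plan(100, 400))  # all 100 triplicated
    print(input_replication_plan(100, 90))   # None: split the system
```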

I/O logic (Output)

To avoid a single point of failure on the output pins, the following circuit has to be implemented.
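One common arrangement, assumed here rather than taken from the slide, gives each replica a 3-state output buffer whose enable is driven by a minority voter: a replica that disagrees with both others switches its buffer to high impedance, and the three buffers drive the same pin. The Python model below only sketches that logic; `minority_disable` and `pin_value` are hypothetical names.

```python
from typing import Sequence


def minority_disable(own: int, other1: int, other2: int) -> bool:
    """Minority voter: True turns this replica's 3-state buffer off (high-Z)."""
    return own != other1 and own != other2


def pin_value(outputs: Sequence[int]) -> int:
    """Resolve the pin driven by three 3-state buffers tied together."""
    drivers = []
    for i, own in enumerate(outputs):
        others = [o for j, o in enumerate(outputs) if j != i]
        if not minority_disable(own, *others):
            drivers.append(own)  # buffer enabled: this replica drives the pin
    if len(set(drivers)) != 1:
        raise ValueError("pin floating or in contention")
    return drivers[0]


if __name__ == "__main__":
    print(pin_value([1, 0, 1]))  # replica 1 is disabled, the pin stays at 1
```

With a single faulty replica, the two buffers that remain enabled drive the same value, so the pin sees no contention.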

BRAM

BRAMs are large blocks of static memory (4 Kbits each) that are true dual-port and fully synchronous. Techniques:

• Simple redundancy: replication of BRAMs
• Redundancy and refresh: replication of BRAMs, plus refresh through a voter
• Data encoding: Error Correction Code (ECC)
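The "redundancy and refresh" technique can be sketched as a scrubbing loop over three BRAM copies: every address is read from all copies, voted, and written back, so that an upset is removed before a second one can hit the same word in another copy. This is a hypothetical software model. The assumption that hardware would run the refresh through the second port of the dual-port BRAMs is likewise an assumption, not something the slide states.

```python
from typing import List


def majority(a: int, b: int, c: int) -> int:
    """Bitwise 2-out-of-3 vote."""
    return (a & b) | (b & c) | (a & c)


def refresh(banks: List[List[int]]) -> int:
    """Vote every word of three BRAM copies and write the result back.

    Returns the number of addresses where at least one copy was repaired.
    """
    a, b, c = banks
    repaired = 0
    for addr in range(len(a)):
        voted = majority(a[addr], b[addr], c[addr])
        if not (a[addr] == b[addr] == c[addr]):
            repaired += 1
        a[addr] = b[addr] = c[addr] = voted
    return repaired


if __name__ == "__main__":
    golden = [0x12, 0x34, 0x56, 0x78]
    banks = [golden.copy() for _ in range(3)]
    banks[1][2] ^= 0x08  # single-bit upset in copy 1, address 2
    print(refresh(banks), banks[1] == golden)
```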

Error detection and error correction

Correcting an error is more performance- and cost-effective than retransmitting the data.
• Parity (check) bits are added to the data bits (64 + 8 or 32 + 7)
• No memory replication is needed
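The 32 + 7 layout is consistent with a SEC-DED (single-error-correcting, double-error-detecting) extended Hamming code: 6 Hamming check bits plus one overall parity bit. The sketch below is a plain Python model of such a code. It does not reproduce the encoding of any particular BRAM primitive, and its bit layout is a hypothetical choice.

```python
from typing import Optional, Tuple

N = 38                           # Hamming positions 1..38 (6 check + 32 data bits)
CHECK_POS = [1, 2, 4, 8, 16, 32]


def is_data_pos(pos: int) -> bool:
    return (pos & (pos - 1)) != 0  # data bits sit at non-power-of-two positions


def encode(data: int) -> int:
    """Encode a 32-bit word into a 39-bit codeword (bit 0 = overall parity)."""
    code, bit = 0, 0
    for pos in range(1, N + 1):
        if is_data_pos(pos):
            code |= ((data >> bit) & 1) << pos
            bit += 1
    for p in CHECK_POS:
        parity = 0
        for pos in range(1, N + 1):
            if pos & p:  # positions covered by check bit p
                parity ^= (code >> pos) & 1
        code |= parity << p
    code |= bin(code).count("1") & 1  # overall parity: total number of ones is even
    return code


def decode(code: int) -> Tuple[Optional[int], str]:
    """Correct a single error or detect a double error; return (data, status)."""
    syndrome = 0
    for pos in range(1, N + 1):
        if (code >> pos) & 1:
            syndrome ^= pos
    odd = bin(code).count("1") & 1
    if syndrome and not odd:
        return None, "double error detected"
    if syndrome:
        code ^= 1 << syndrome  # single error at position `syndrome`
    elif odd:
        code ^= 1              # only the overall parity bit was flipped
    status = "corrected" if (syndrome or odd) else "ok"
    data, bit = 0, 0
    for pos in range(1, N + 1):
        if is_data_pos(pos):
            data |= ((code >> pos) & 1) << bit
            bit += 1
    return data, status


if __name__ == "__main__":
    word = 0xDEADBEEF
    cw = encode(word)
    print(decode(cw))                         # no error
    print(decode(cw ^ (1 << 17)))             # one flipped bit: corrected
    print(decode(cw ^ (1 << 3) ^ (1 << 20)))  # two flipped bits: detected only
```

The 64 + 8 variant follows the same scheme with 7 Hamming check bits (positions up to 71) plus the overall parity bit.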

Partial reconfiguration

Access to the configuration memory:
• Readback: post-configuration read operation
• Partial reconfiguration: post-configuration write operation

Techniques:
• SEU scrubbing: partial reconfiguration
• SEU detection: readback, with bit-for-bit comparison or CRC comparison
• SEU correction: readback and partial reconfiguration
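The detection and correction steps above can be sketched as a scrubbing loop over configuration frames. Everything here is a hypothetical software model: configuration memory is a list of byte frames, "readback" means reading a frame, "partial reconfiguration" means overwriting it from a golden copy, and `zlib.crc32` stands in for whatever CRC the real flow uses. Real readback data may also contain bits that legitimately change during operation and have to be masked, which this sketch ignores.

```python
import zlib
from typing import List


def flipped_bits(frame: bytes, golden: bytes) -> List[int]:
    """Bit-for-bit comparison: indices of bits that differ from the golden frame."""
    return [8 * i + b
            for i, (x, y) in enumerate(zip(frame, golden))
            for b in range(8)
            if ((x ^ y) >> b) & 1]


def scrub(config: List[bytearray], golden: List[bytes]) -> List[int]:
    """Detect upset frames via CRC comparison and rewrite them from the golden copy."""
    golden_crc = [zlib.crc32(g) for g in golden]  # would normally be precomputed
    repaired = []
    for i, frame in enumerate(config):
        if zlib.crc32(frame) != golden_crc[i]:  # readback + CRC comparison
            config[i][:] = golden[i]            # partial reconfiguration of frame i
            repaired.append(i)
    return repaired


if __name__ == "__main__":
    golden = [bytes([i] * 16) for i in range(4)]
    config = [bytearray(f) for f in golden]
    config[2][5] ^= 0x10  # SEU in frame 2
    print(flipped_bits(config[2], golden[2]))
    print(scrub(config, golden))
```

CRC comparison needs only one stored checksum per frame. Bit-for-bit comparison needs the full golden data, but it also identifies exactly which bits flipped.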

Dynamic partial reconfiguration

Dynamic partial reconfiguration can be used to trigger the reconfiguration of only the affected portion of the architecture:
• while the rest of the system keeps working
• without the need to perform a complete reconfiguration

It is particularly useful to reconfigure the smallest portion of the FPGA that contains the fault (this requires a good partitioning phase).

A solution space exploration has to be performed.

Dynamic partial reconfiguration (DWC)

• Fault detection and characterization: identification of a mismatch
• Fault localization: identification of the portion of the device where the fault is located

Several solutions are possible when applying DWC (duplication with comparison).
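A minimal sketch of duplication with comparison used for localization. It assumes the design is partitioned into a chain of blocks, each block duplicated, with a comparator at every block boundary; the block functions are hypothetical placeholders. A mismatch detects a fault but cannot tell which of the two copies is wrong.

```python
from typing import Callable, Optional, Sequence

Block = Callable[[int], int]


def dwc_localize(copy_a: Sequence[Block], copy_b: Sequence[Block], x: int) -> Optional[int]:
    """Return the index of the first block whose duplicated outputs differ, or None."""
    a = b = x
    for i, (fa, fb) in enumerate(zip(copy_a, copy_b)):
        a, b = fa(a), fb(b)
        if a != b:  # comparator at the output of block i
            return i
    return None


if __name__ == "__main__":
    good = [lambda v: v + 1, lambda v: v * 2, lambda v: v - 3]
    faulty = [good[0], lambda v: (v * 2) ^ 4, good[2]]  # fault in block 1 flips bit 2
    print(dwc_localize(good, good, 5))
    print(dwc_localize(good, faulty, 5))
```

A fault that the current input does not exercise produces no mismatch, so detection depends on the workload.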

Dynamic partial reconfiguration (ro-index)

ro-index: the ratio between the occupied area and its minimal placement constraint, both computed in slices

Occupied area in Slices: So

Placement constraint in Slices: Sc

ro-index = So / Sc
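A worked example of the formula with hypothetical numbers: a module occupying So = 300 slices inside a minimal rectangular placement constraint of 20 × 16 = 320 slices has ro-index = 300 / 320 = 0.9375. Since the occupied slices must fit inside the constraint, the index lies in (0, 1]. The hypothetical helper below only adds that sanity check.

```python
def ro_index(occupied_slices: int, constraint_slices: int) -> float:
    """ro-index = So / Sc, both measured in slices."""
    if not 0 < occupied_slices <= constraint_slices:
        raise ValueError("occupied area must be positive and fit in the constraint")
    return occupied_slices / constraint_slices


if __name__ == "__main__":
    print(ro_index(300, 20 * 16))
```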

Run-time fault reconfiguration

Recovery from permanent logic and interconnect faults:
• fine-grained physical design partitioning
• faults are localized to small partitioned blocks that have fixed interfaces to the surrounding portion of the device
• affected blocks are reconfigured with previously generated, functionally equivalent block instances that do not use the faulty resources
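The selection step can be sketched as a lookup over precomputed alternatives. The data layout is hypothetical: each partitioned block has an ordered set of functionally equivalent configurations, each annotated with the device resources it uses. When a fault is localized to a resource, the first alternative of the affected block that avoids every known faulty resource is chosen; updating `active` stands in for the partial reconfiguration itself.

```python
from typing import Dict, FrozenSet, Optional, Set

# Hypothetical library: block -> ordered alternatives (config id -> resources used)
Alternatives = Dict[str, Dict[str, FrozenSet[str]]]


def pick_alternative(block: str, library: Alternatives, faulty: Set[str]) -> Optional[str]:
    """Return a configuration of `block` that uses none of the faulty resources."""
    for config_id, resources in library[block].items():
        if not resources & faulty:
            return config_id
    return None  # no precomputed instance avoids the faults


def handle_fault(resource: str, owner: Dict[str, str], active: Dict[str, str],
                 library: Alternatives, faulty: Set[str]) -> Optional[str]:
    """Localize the fault to its block and load a fault-free alternative."""
    faulty.add(resource)
    block = owner[resource]  # fixed-interface block whose region contains the resource
    choice = pick_alternative(block, library, faulty)
    if choice is not None:
        active[block] = choice  # stands in for partial reconfiguration of that block
    return choice


if __name__ == "__main__":
    library = {
        "B0": {"B0_a": frozenset({"s0", "s1"}),
               "B0_b": frozenset({"s1", "s2"}),
               "B0_c": frozenset({"s0", "s2"})},
    }
    owner = {"s0": "B0", "s1": "B0", "s2": "B0"}
    active = {"B0": "B0_a"}
    faulty = set()
    print(handle_fault("s0", owner, active, library, faulty))  # B0_b
    print(handle_fault("s2", owner, active, library, faulty))  # None
```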

Run-time fault reconfiguration

Assumptions:
• Detection of a fault
• Localization of a fault
• Diagnosis of a fault (helpful, but not necessary)

Action:
• An alternate configuration of the design that does not use the faulty resources can be loaded

Advantages:
• Extremely low area overhead
• Very low timing overhead
• Run-time management of faults
• High flexibility

Disadvantages:
• Very complex design phase (and run-time management)

Conclusions

Reliable systems can be effectively implemented on FPGA devices

The techniques presented above can be combined to improve the overall reliability of the whole design

TMR combined with SEU correction through partial reconfiguration is a powerful and effective SEU mitigation strategy

3-state buffers can be used to implement fault-tolerance techniques without using LUTs, keeping the area overhead low

The end

• Thank you for your attention

• Do you have any questions?

