An Architecture for Fail-Silent Operation of FPGAs and Configurable SoCs

Lee W. Lerner and Charles E. Stroud

Based on presentation at International Conf. on Embedded Systems & Applications, June 2006


Page 2: Outline of Presentation

VLSI Design & Test Seminar, Spring 2007

  • Motivation and Background
    • Overview of Fail-Silent operation
    • Single Event Upsets (SEUs)
  • Fail-Silent Architecture
    • Fault isolation with Guard Bands
  • Experimental Implementations
    • Atmel AT94K series SoC
    • Xilinx Virtex-4 series FPGAs
    • Triple Modular Redundancy (TMR)
  • Summary
    • Future Work

Page 3: Motivation and Background

  • Fail-Silent operation
    • Halt all operation immediately upon occurrence of a fault
    • Reduces need for periodic off-line system testing
  • Single Event Upsets (SEUs)
    • Transient or soft radiation-induced errors in microelectronic devices
    • Known to occur in high-radiation environments such as space
    • Affect FPGA configuration memory

Page 4: Single Event Upsets (SEUs)

  • Energetic particles causing SEUs
    • Galactic cosmic rays
    • Cosmic solar particles influenced by solar flares
    • Trapped protons in radiation belts

Page 5: Single Event Upsets (SEUs)

  • SEU effects on CMOS technology
    • Change logic values of transistors

[Figure: CMOS inverter schematic (Vin, Vout, VDD, VSS) and cross-section (p-type substrate, n-well, n+ and p+ diffusions; source, gate, and drain terminals) struck by radiation (proton, ion, neutron, ...), with the + and - charges it generates. An upset occurs if the channel current is turned on; latchup occurs if a parasitic current loop is initiated. Modified from Tribble, A. C., The Space Environment – Implications for Spacecraft Design, 2nd Ed. (Princeton, NJ: Princeton University Press, 2003).]

Page 6: SEU Effects on an FPGA

[Figure: a configuration memory bit (RAM cell of coupled inverters, with word and BIT lines) controlling a Programmable Interconnect Point (PIP) between Wire A and Wire B; and the traditional TMR approach, in which a PIP connects the routing of multiple modules and only a deactivated PIP keeps the wire segments of Module 1 and Module 2 isolated.]
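
The effect pictured on this slide can be sketched in a few lines of Python. This is only an illustrative model, not the authors' tooling: the `Pip` class, its field names, and the net names are hypothetical, and a PIP is reduced to a single switch gated by one configuration bit.

```python
from dataclasses import dataclass


@dataclass
class Pip:
    """Hypothetical model of one PIP: a switch gated by one configuration bit."""
    wire_a: str
    wire_b: str
    config_bit: int = 0  # 0 = deactivated (wires isolated), 1 = activated

    def connected(self) -> bool:
        """True when the two wire segments are joined."""
        return self.config_bit == 1


def inject_seu(pip: Pip) -> None:
    """Flip the configuration bit, as an upset of the RAM cell would."""
    pip.config_bit ^= 1


# A deactivated PIP sits between nets that belong to two different modules.
pip = Pip("module1_net", "module2_net")
before = pip.connected()   # segments isolated
inject_seu(pip)
after = pip.connected()    # a single upset has joined the two modules' routing
```

In this model one flipped bit is enough to join routing that belongs to two different modules, which is the situation the guard band is meant to rule out.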

Page 7: Fail-Silent Architecture

  • Guard band region of isolation
    • Isolate multiple working circuits
    • No single fault can allow interaction between two working circuits

[Figure: Working Region #1 (input set #1, fail-silent output set #1) and Working Region #2 (input set #2, fail-silent output set #2), separated by a guard band with fault monitor circuit.]
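
A floorplan could be sanity-checked against this isolation rule with something like the sketch below. It assumes, purely for illustration, that each working region occupies a contiguous range of PLB columns and that the guard band is whatever lies between them; the function names and the column numbers are hypothetical. The width of 4 used in the example matches the 4 PLB wide guard band reported later in the deck for the AT94K.

```python
def guard_band_width(region1_cols: range, region2_cols: range) -> int:
    """Number of PLB columns lying strictly between two working regions.

    A negative result means the regions overlap.
    """
    left, right = sorted([region1_cols, region2_cols], key=lambda r: r.start)
    return right.start - left.stop


def is_isolated(region1_cols: range, region2_cols: range, min_width: int) -> bool:
    """True when the regions are separated by at least min_width columns."""
    return guard_band_width(region1_cols, region2_cols) >= min_width


# Region 1 in columns 0..9, region 2 in columns 14..23: columns 10..13 form the band.
width = guard_band_width(range(0, 10), range(14, 24))
ok = is_isolated(range(0, 10), range(14, 24), min_width=4)
```

The check only covers placement. Routing that crosses the band would need a separate check against the device's wire lengths.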

Page 8: Fail-Silent Architecture

  • Fault monitoring circuit
    • For each output of independent working regions
      • Pair-wise compare outputs of working regions
      • Tri-state output when any mismatch occurs
      • Initiate processor routine to reconfigure FPGA

[Figure: guard band with fault monitor circuit. Labels: output from region #1, output from region #2, PLBs for fault isolation, PLB, tri-state buffer, fail-silent output, to processor interrupt.]
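
The behaviour listed on this slide can be modelled as below. This is a behavioural sketch under stated assumptions rather than the circuit itself: `HIGH_Z`, `fault_monitor`, and the choice to return the outputs as a tuple are all hypothetical, and a tri-stated output is represented by `None`.

```python
from typing import Optional, Sequence, Tuple

HIGH_Z = None  # stand-in for a tri-stated (high-impedance) output


def fault_monitor(
    out1: Sequence[int], out2: Sequence[int]
) -> Tuple[Optional[Tuple[int, ...]], bool]:
    """Pair-wise compare the outputs of two working regions.

    Returns (fail_silent_output, interrupt). On any mismatch the outputs
    are tri-stated and the processor interrupt is raised.
    """
    if len(out1) != len(out2):
        raise ValueError("both regions must drive the same number of outputs")
    mismatch = any(a != b for a, b in zip(out1, out2))
    if mismatch:
        return HIGH_Z, True
    return tuple(out1), False


# Matching outputs pass through; a single differing bit silences the outputs.
good = fault_monitor([1, 0, 1], [1, 0, 1])
bad = fault_monitor([1, 0, 1], [1, 1, 1])
```

The interrupt flag is what would start the processor routine that reconfigures the FPGA.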

Page 9: Atmel AT94K Series Configurable SoC Architecture

[Figures: AT94K SoC architecture; our AT94K demo & development board.]

Page 10: Atmel AT94K Routing Architecture

[Figure: AT94K routing architecture in three panels. Local Routing: PLB with X and Y connections and local cross-point PIPs. Global Routing (1 PLB): local x4 and express x8 lines. Horizontal Repeaters in Global Routing: repeaters on the ×4 lines (4 PLBs) and ×8 lines (8 PLBs), with the guard band marked across the PLBs. Legend: marker = Programmable Interconnect Point (PIP).]

Page 11: Guard Band Implementation in AT94K

  • 80-bit LFSR system functions
  • 4 PLB wide guard band region
  • Fault monitor circuit in guard band region

[Figure: device layout with System Function and Fault Monitor labeled.]

Page 12: Guard Band Implementation in AT94K

[Figure: device layout with System Function and Fault Monitor labeled.]

Page 13: Basic Virtex-4 Architecture

  • PIPs and Routing resources
    • 4 types of PIPs
    • Double lines (x2 lines) span 2 PLBs
    • Hex lines (x6 lines) span 6 PLBs
    • Long lines span width and length of PLB array
  • Horizontal guard bands work best with Virtex-4 architecture

[Figure: Virtex-4 floorplan. Legend: PLBs (1,368–22,272), block RAMs (36–552), DSPs (32–512), PowerPCs (0–2).]

Page 14: Guard Band Implementation in Virtex-4

  • Xilinx ISE: constraints in PACE and routing in FPGA Editor
  • Two 5-bit LFSR system functions
  • 6 PLB wide guard band region with fault monitoring circuit

[Figure: two layout views, each showing two System Function regions, the Fault Monitor in the Guard Band, and an I/O buffer; one view labels a PLB w/ 4 slices.]
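
To make the duplicated system function concrete, here is a minimal Python sketch of two identical 5-bit LFSRs running in lock-step. Several things here are assumptions: the slides do not give the feedback polynomial, so the taps below (recurrence s[n+5] = s[n+2] XOR s[n], which gives the maximal period of 31) are chosen only for illustration, and the upset is applied to the register state for simplicity, whereas the deck is concerned with upsets in configuration memory. The function names are hypothetical.

```python
from typing import Optional


def lfsr5_step(state: int) -> int:
    """Advance a 5-bit Fibonacci LFSR by one clock.

    Assumed recurrence: s[n+5] = s[n+2] XOR s[n] (bits 0 and 2 feed bit 4).
    """
    feedback = (state ^ (state >> 2)) & 1
    return (state >> 1) | (feedback << 4)


def first_mismatch(seed: int, cycles: int,
                   upset_cycle: int = -1, upset_bit: int = 0) -> Optional[int]:
    """Run two copies in lock-step and report the first cycle where they differ.

    If upset_cycle is non-negative, one bit of copy B is flipped at that cycle.
    Returns None when the copies agree for the whole run.
    """
    a = b = seed
    for cycle in range(cycles):
        if cycle == upset_cycle:
            b ^= 1 << upset_bit
        if a != b:
            return cycle
        a, b = lfsr5_step(a), lfsr5_step(b)
    return None


clean_run = first_mismatch(seed=0b00001, cycles=100)
upset_run = first_mismatch(seed=0b00001, cycles=100, upset_cycle=7, upset_bit=3)
```

In this model the mismatch is visible in the same cycle as the upset, which is the property a comparator in the guard band relies on.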

Page 15: 74-bit LFSR Implementation

[Figure: two layout views, each showing two System Function regions, the Fault Monitor, the Guard Band, and an I/O buffer; one view labels a PLB w/ 4 slices.]

Page 16: Triple Modular Redundancy (TMR) Implementations in FPGAs

  • Traditional TMR SEU susceptibility problem
    • Wire segments from a PIP can access multiple modules
    • Therefore, 1 fault can destroy fault-tolerance
    • Special place and route algorithms needed to avoid problem
  • TMR fault isolation with guard band regions
    • Guard bands isolate module components and routing

[Figure: traditional TMR, with Module 1, Module 2, Module 3, and a Majority Voter, where a deactivated PIP separates isolated wire segments of different modules; and TMR with Guard Bands separating Module 1, Module 2, Module 3, and the Majority Voter.]
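
The reasoning on this slide can be checked with a small model of the voter. This is a sketch under simplifying assumptions: each module drives a single 1-bit output, a corrupted module is modelled as an inverted output, and the helper names are hypothetical.

```python
from typing import Iterable


def majority(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority vote."""
    return (a & b) | (a & c) | (b & c)


def voted(correct: int, corrupted: Iterable[int]) -> int:
    """Voter output when the 1-bit outputs of the listed modules (1..3) are inverted."""
    bad = set(corrupted)
    outputs = [correct ^ 1 if m in bad else correct for m in (1, 2, 3)]
    return majority(*outputs)


no_fault = voted(1, [])        # all three modules agree
one_module = voted(1, [1])     # a fault confined to one module is outvoted
two_modules = voted(1, [1, 2]) # a fault reaching two modules wins the vote
```

With mixed routing, a single PIP fault can reach two modules and so behaves like the last case. With guard bands, a single fault stays inside one module and behaves like the second.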

Page 17: Traditional TMR Implementation in AT94K

[Figure: device layout showing mixed routing of 3 different system functions.]

Page 18: TMR Implementation in AT94K

[Figure: device layout with System Function A, System Function B, System Function C, and the Majority Voter Circuit labeled.]

Page 19: Fault Injection Results

AVR Fault Injection

  • TMR Pass 1: No fault injection. Majority Voter Passes
  • TMR Pass 2: Module 1 injected with fault. Majority Voter Passes
  • TMR Pass 3: Modules 1 & 3 injected with faults. Majority Voter Fails

[Figure: one diagram per pass showing Module 1, Module 2, Module 3, and the Majority Voter separated by Guard Bands, with pass (√) and fail (×) marks.]

Page 20: Fault Injection Results

  • Guard Band:
    • Injected 240 faults at edge of guard band with no failure
    • Multiple specific faults required to cause failure

[Figure: AT94K routing at the guard band. Labels: PLBs, guard band, repeaters, local x4, express x8, local cross-point PIPs.]

Page 21: Summary

  • Guard Band regions for FPGAs
    • Isolate multiple working regions that contain functionally equivalent system functions
  • Fault monitoring circuits within guard bands
    • Monitor and compare working region outputs
    • Tri-state outputs when a mismatch occurs
  • Fail-Silent operation
    • Halt operation immediately upon occurrence of a fault
    • Area overhead only 2x that of non-fault-tolerant circuit
    • Use with TMR to achieve fault-tolerance
  • Single Event Upsets (SEUs)
    • Architecture provides immediate indication to initiate scrubbing of the configuration memory
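
The scrubbing step mentioned in the last bullet might look like the following in outline. This is a hedged sketch: it assumes scrubbing means rewriting configuration memory from a known-good copy, models that memory as a plain list of bits, and uses a hypothetical `scrub` function that is not part of any vendor API.

```python
from typing import List


def scrub(config: List[int], golden: List[int]) -> int:
    """Rewrite configuration memory in place from a known-good copy.

    Returns the number of bits that had to be corrected.
    """
    if len(config) != len(golden):
        raise ValueError("configuration and golden copy must be the same size")
    corrected = 0
    for i, good_bit in enumerate(golden):
        if config[i] != good_bit:
            config[i] = good_bit
            corrected += 1
    return corrected


golden = [1, 0, 1, 1, 0]
config = list(golden)
config[2] ^= 1                  # an upset flips one configuration bit
fixed = scrub(config, golden)   # triggered once the fault monitor raises its interrupt
```

Because the monitor flags a mismatch as soon as it occurs, a routine like this can run on demand rather than on a fixed schedule.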