Autonomous FPGA Fault Handling through Competitive Runtime Reconfiguration. Ronald F. DeMara and Kening Zhang, University of Central Florida. 1 July 2005.



Page 1

1 July 2005

Autonomous FPGA Fault Handling through Competitive Runtime Reconfiguration

Ronald F. DeMara and Kening Zhang
University of Central Florida

Page 2

Fault-Handling Techniques for SRAM-based FPGAs

Reprogrammable Device Failure
• Characteristics: Duration, Target, Detection, Isolation, Diagnosis, Recovery
• Duration: Transient: SEU; Permanent: SEL, Oxide Breakdown, Electron Migration, LPD
• Target: Device Configuration or Processing Datapath

Approaches and Methods
• TMR (conventional spatial redundancy): Majority Vote; Ignore Discrepancy
• Repetitive Readback [Wells00]: Bitwise Comparison; Invert Bit Value
• BIST, STARS [Abramovici01]: Supplementary Testbench; Cartesian Intersection; Worst-case Clock Period Dilation; Replicate in Spare Resource
• CED [McCluskey04]: Duplex Output Comparison; Fast Run-time Location; Select Spare Resource
• Evolutionary, Sussex [Vigander01]: Duplex Output Comparison; isolation and diagnosis (not addressed); Population-based GA using Extrinsic Fitness Evaluation
• Evolutionary, CRR [DeMara05]: isolation and diagnosis unnecessary; Evolutionary Algorithm using Intrinsic Fitness Evaluation

Page 3

Previous Work: Detection Characteristics of FPGA Fault-Handling Schemes

Approach | Fault Handling Method | Fault Detection Latency | Distinguish Transients | Coverage: Logic | Coverage: Interconnect | Coverage: Comparator | Fault Isolation Granularity
TMR | Spatial voting | Negligible | No | Yes | Yes | No | Voting element
[Vigander01] | Spatial voting & offline evolutionary regeneration | Negligible | No | Yes | No | No | Voting element
[Lohn, Larchev, DeMara03] | Offline evolutionary regeneration | Negligible | No | Yes | Yes | No | Unnecessary
[Lach98] | Static-capability tile reconfiguration | Relies on independent fault detection mechanism
STARS [Abramovici01] | Roving Test Area | Up to 8.5M erroneous outputs | Test pattern transients | Yes | Yes | No | LUT function
[Keymeulen, Stoica, Zebulum00] | Population-based fault insensitive design | Design-time prevention emphasis | No | Yes | Yes | No | Not addressed at runtime
Competitive Runtime Reconfiguration (CRR) | Competing configurations with temporal voting and online regeneration | Negligible | Transients are attenuated automatically | Yes | Yes | Yes | Unnecessary, but can isolate functional components

Strategies: 1) Evolve redundancy into design before anticipated failure; 2) Redesign after detection of failure; 3) Combine desirable aspects of both strategies 1) + 2) …

Page 4

CRR Arrangement in SRAM FPGA

Configurations in Population
• $C = C_L \cup C_R$
• $C_L$ = subset of left-half configurations
• $C_R$ = subset of right-half configurations
• $|C_L| = |C_R| = |C|/2$

Discrepancy Operator
• Baseline Discrepancy Operator is a dyadic operator with binary output:

$$\Delta(C_i^L, C_i^R) = \begin{cases} 0 & \text{if } Z(C_i^L) = Z(C_i^R) \\ 1 & \text{otherwise} \end{cases}$$

• $Z(C_i)$ is the FPGA data throughput output of configuration $C_i$
• Each half-configuration evaluates using an embedded checker (XNOR gate) within each individual
• Any fault in the checker lowers that individual's fitness, so that the individual is no longer preferred and eventually undergoes repair

[Figure: Reconfiguration Algorithm in an SRAM-based FPGA. The L and R half-configurations (Function Logic L and Function Logic R, each with its own Discrepancy Check) receive the same input data and drive the data output; a feedback and control path loads the configuration bit stream from an off-chip EEPROM. Note: a non-volatile memory is already required to boot any SRAM FPGA from cold start, so this is not an additional chip.]

Discrepancy operator variants for a pairing of $C_i^L$ and $C_j^R$:
• RS: $C_i^L \oplus C_j^R$ (Hamming Distance)
• WTA: $C_i^L \equiv C_j^R$ (Equivalence)
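The baseline operator and its two variants can be sketched behaviorally in Python, assuming each half-configuration output $Z(C_i)$ is available as an unsigned integer word. The function names, the 6-bit output width, and the injected fault value are illustrative assumptions; `hamming_distance` corresponds to the Hamming-distance (RS) variant, and `xnor_checker` models the embedded XNOR checker only at the behavioral level, as an all-bits-agree (equivalence, WTA) test.

```python
def discrepancy(z_left: int, z_right: int) -> int:
    """Baseline operator: 0 if Z(C_i^L) equals Z(C_i^R), 1 otherwise."""
    return 0 if z_left == z_right else 1


def xnor_checker(z_left: int, z_right: int, width: int) -> bool:
    """Bitwise XNOR of both outputs; True only if every one of `width` bits agrees."""
    mask = (1 << width) - 1
    return (~(z_left ^ z_right) & mask) == mask


def hamming_distance(z_left: int, z_right: int) -> int:
    """Number of differing output bits (Hamming-distance variant)."""
    return bin(z_left ^ z_right).count("1")


# Hypothetical 6-bit outputs of a 3x3 multiplier computing 5 * 6 = 30.
good = 0b011110
faulty = 0b011100  # assumed fault: one output bit stuck at 0 in the right half
print(discrepancy(good, good), discrepancy(good, faulty))  # 0 1
print(xnor_checker(good, faulty, 6), hamming_distance(good, faulty))  # False 1
```

The binary operator only reports whether the halves disagree, while the Hamming-distance variant also grades how many output bits differ.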

Page 5

Terminology and Characteristics

• Pristine Pool $C_P$: any $C_i \in C$ is a member of $C_P$ at generation $G$ if and only if $\sum_{K=1}^{G} \Delta(C_K^L, C_K^R) = 0$
• Suspect Pool $C_S$: any $C_i \in C$ is a member of $C_S$ at generation $G$ if and only if at least one of $\Delta(C_K^L, C_K^R) = 1$ holds for $0 < K \le G$
• Under Repair Pool $C_U$: any $C_i \in C$ is a member of $C_U$ at generation $G$ if and only if $\sum_{K=1}^{G} \Delta(C_K^L, C_K^R) \ge 1$
• Refurbished Pool $C_R$: after a Genetic Operator is applied, the newly generated individual is a member of $C_R$ at generation $G$ if and only if $\sum_{K=1}^{G} \Delta(C_K^L, C_K^R) = 0$

• $E_D$ is the Discrepancy Count of $C_i$ and $E_C$ is the Correctness Count of $C_i$
• Length of Evaluation Fitness Window: $W = E_D + E_C$
• Fitness Metric: $f(C_i) = E_C / W$
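A minimal sketch of the fitness bookkeeping, assuming each pairwise evaluation yields a 0/1 discrepancy result for the half-configuration. The class and field names are hypothetical, and treating an individual with no evaluations as fitness 1.0 is an assumption the slide does not state.

```python
from dataclasses import dataclass


@dataclass
class FitnessRecord:
    """Discrepancy and correctness counts for one half-configuration C_i."""
    e_d: int = 0  # E_D: discrepancy count
    e_c: int = 0  # E_C: correctness count

    def record(self, discrepancy: int) -> None:
        """Count one pairwise evaluation result (1 = discrepancy, 0 = agreement)."""
        if discrepancy:
            self.e_d += 1
        else:
            self.e_c += 1

    @property
    def window(self) -> int:
        """W = E_D + E_C."""
        return self.e_d + self.e_c

    def fitness(self) -> float:
        """f(C_i) = E_C / W; taken as 1.0 before any evaluation (an assumption)."""
        return 1.0 if self.window == 0 else self.e_c / self.window


rec = FitnessRecord()
for outcome in [0] * 594 + [1] * 6:  # 594 agreements and 6 discrepancies
    rec.record(outcome)
print(rec.window, rec.fitness())  # 600 0.99
```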

Page 6

Sketch of CRR Approach
Premise: Recovery Complexity << Design Complexity

1. Initialization
   • Population P of functionally-identical yet physically-distinct configurations
   • Partition P into sub-populations that use supersets of physically-distinct resources, e.g. size |P|/2 to designate physical FPGA left-half or right-half resource utilization
2. Fitness Assessment
   • Fitness assessment via pairwise discrepancy (temporal voting vs. spatial voting)
   • Discrepancy Operator is some function of bitwise agreement between each half's output
   • Four Fitness States are defined for configurations as {$C_P$, $C_S$, $C_U$, $C_R$}, with transitions, respectively: Pristine → Suspect → Under Repair → Refurbished
   • Fitness Evaluation Window W determines the comparison interval
3. Regeneration
   • Genetic Operators are used to recover from a fault based on the Reintroduction Rate
   • Operators are applied only once; the offspring is then returned to "service" without concern for increasing its fitness

Page 7

State Transitions during the Lifetime of the i-th Half-Configuration

[Figure: Configuration Health States. Five states (primordial, pristine, suspect, under repair, refurbished) are connected by eleven numbered transitions. Transitions are labeled by pairing outcomes (L = R, L ≠ R) and by fitness comparisons against the operational threshold $f_{OT}$ and the repair threshold $f_{RT}$ (e.g. $f_i < f_{RT}$, $f_i < f_{OT}$); "partial repair" and "complete repair" paths are also shown. The transitions are grouped into COMPETITION and EVOLUTION phases.]

Page 8

Procedural Flow under Competitive Runtime Reconfiguration

Flowchart (PRIMARY LOOP):
1. Initialization: population partitioned into functionally-identical yet physically-distinct half-configurations
2. Selection: choose FPGA configuration(s) labeled L and R
3. Detection: apply functional inputs to compute FPGA outputs using L, R (L = R gives discrepancy-free L, R results)
4. Fitness Adjustment: update fitness of only L and R based on detection results
5. Is either L's or R's fitness < Repair Threshold? YES: invoke Genetic Operators only once and only on L or R
6. Adjust Controls: detection mode, overlap interval, ...; return to Selection

• Integrates all fault-handling stages using an EC strategy
  – Detects faults by the occurrence of discrepancy
  – Isolates faults by accumulation of discrepancies
  – Failure-specific refurbishment using Genetic Operators: Intra-Module-Crossover, Inter-Module-Crossover, Intra-Module-Mutation
• Realizes online device refurbishment
  – Refurbished online without additional function or resource test vectors
  – Repair during the normal data throughput process

Page 9

Selection Process

1. Any Pristine individuals?
   YES: Select* one Pristine individual as L half-configuration.
   NO: Any Suspect individuals? YES: Select** one Suspect individual as L half-configuration. NO: Select*** one Refurbished individual as L half-configuration.
2. Choose a random number X on [0..1]. Is X > re-introduction rate r?
   YES: Select one Operational (Pristine*, Suspect**, or Refurbished***) individual as R half-configuration.
   NO: Select*** one Under Repair individual as R half-configuration.
3. Go to Detection process.

* = selection that favors inventory rotation
** = selection based on fitness ranking that favors correctness
*** = selection based on fitness ranking that favors correctness, with an optional second-order metric such as routing delay (to automatically evolve better throughput performance at no additional cost)
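The selection flowchart can be sketched in Python as follows. This is a minimal sketch under stated assumptions: "inventory rotation" is modeled as picking the least recently used Pristine individual via a hypothetical `last_used` counter, both fitness-ranking policies (** and ***) are modeled as highest fitness without the optional routing-delay metric, an operational R is chosen in the same Pristine, Suspect, Refurbished order as L, and an empty pool falls back to the other branch. The draw X is passed in so a selection can be replayed.

```python
import random
from dataclasses import dataclass

OPERATIONAL_ORDER = ("pristine", "suspect", "refurbished")


@dataclass
class Individual:
    name: str
    state: str            # "pristine", "suspect", "under_repair" or "refurbished"
    fitness: float = 1.0
    last_used: int = 0    # hypothetical timestamp used for inventory rotation


def pick(candidates, state):
    """Pick one individual in `state`, or None if that pool is empty."""
    pool = [c for c in candidates if c.state == state]
    if not pool:
        return None
    if state == "pristine":
        return min(pool, key=lambda c: c.last_used)  # *: least recently used
    return max(pool, key=lambda c: c.fitness)        # ** / ***: fitness ranking


def pick_operational(candidates):
    """Pristine first, then Suspect, then Refurbished."""
    for state in OPERATIONAL_ORDER:
        chosen = pick(candidates, state)
        if chosen is not None:
            return chosen
    return None


def select_pair(left_half, right_half, reintroduction_rate, x):
    """Return (L, R); x is the random number X drawn on [0, 1)."""
    left = pick_operational(left_half)
    if left is None:
        raise ValueError("no operational left-half configuration")
    if x > reintroduction_rate:
        right = pick_operational(right_half) or pick(right_half, "under_repair")
    else:
        right = pick(right_half, "under_repair") or pick_operational(right_half)
    return left, right


left = [Individual("L1", "suspect", 0.98), Individual("L2", "pristine", last_used=5),
        Individual("L3", "pristine", last_used=2)]
right = [Individual("R1", "under_repair", 0.90), Individual("R2", "refurbished", 0.995),
         Individual("R3", "suspect", 0.99)]
l, r = select_pair(left, right, 0.1, random.Random(7).random())
print(l.name, r.name)
```

With r = 0.1, an Under Repair individual is paired with an operational one in roughly 10% of selections, which is how regenerated offspring get re-evaluated during normal throughput.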

Page 10

Fitness Adjustment Procedure

1. Discrepancy?
   YES: Decrease L's and R's fitness according to the fitness down-adjustment process.
     • Is the individual Pristine? YES: mark individual as Suspect.
     • Is its fitness < Repair Threshold ($f_{L,R} < f_{RT}$)? YES: mark individual as Under Repair, then invoke Genetic Operators only once and only on L or R.
   NO: Increase L's and R's fitness according to the fitness up-adjustment process.
     • Is the individual Under Repair? YES: is its fitness > Operational Threshold ($f_{L,R} > f_{OT}$)? YES: mark individual as Refurbished.
2. Adjust controls and go to Selection process.
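One way to sketch the fitness adjustment procedure, assuming the up- and down-adjustment processes are the correctness and discrepancy counts behind $f(C_i) = E_C / W$ from Page 5. The window gate on threshold decisions follows the Page 11 definition of W; resetting the counts after regeneration and the numeric thresholds (window 10, $f_{RT}$ = 0.8, $f_{OT}$ = 0.9) are illustrative assumptions, not values from the slides.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class HalfConfig:
    name: str
    state: str = "pristine"
    e_d: int = 0  # discrepancy count E_D
    e_c: int = 0  # correctness count E_C

    @property
    def fitness(self) -> float:
        w = self.e_d + self.e_c
        return 1.0 if w == 0 else self.e_c / w


def adjust_fitness(ind: HalfConfig, discrepancy: bool, f_rt: float, f_ot: float,
                   window: int, regenerate: Callable[[HalfConfig], None]) -> None:
    """Apply one detection result to one half-configuration (called for both L and R)."""
    if discrepancy:
        ind.e_d += 1                              # down-adjustment
        if ind.state == "pristine":
            ind.state = "suspect"
        if ind.e_d + ind.e_c >= window and ind.fitness < f_rt:
            ind.state = "under_repair"
            regenerate(ind)                       # genetic operators applied once
            ind.e_d = ind.e_c = 0                 # assumption: offspring starts a fresh window
    else:
        ind.e_c += 1                              # up-adjustment
        if (ind.state == "under_repair" and ind.e_d + ind.e_c >= window
                and ind.fitness > f_ot):
            ind.state = "refurbished"


regenerated = []
history = []
ind = HalfConfig("L1")
adjust_fitness(ind, True, 0.8, 0.9, 10, regenerated.append)
history.append(ind.state)
for _ in range(9):
    adjust_fitness(ind, True, 0.8, 0.9, 10, regenerated.append)
history.append(ind.state)
for _ in range(10):
    adjust_fitness(ind, False, 0.8, 0.9, 10, regenerated.append)
history.append(ind.state)
print(history)  # ['suspect', 'under_repair', 'refurbished']
```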

Page 11

Fitness Evaluation Window

• Fitness Evaluation Window: W denotes the number of iterations used to evaluate fitness before the state of an individual is determined
• Determination of W for 3x3 multiplier
  – 6 input pins articulating $2^6 = 64$ possible inputs
  – W should be selected so that all possible inputs appear
  – More formally, let rand(X) return some $x_i \in X$ at random; seek W such that $\bigcup_{i=1}^{W} [\,\mathrm{rand}(X)\,] = X$ with high probability

$$x_1 = 1, \qquad x_K = K^D - \sum_{m=1}^{K-1} \binom{K}{m}\, x_m$$

$$P_K = \frac{x_K}{K^D} = 1 - \sum_{m=1}^{K-1} \binom{K}{m} \left(\frac{m}{K}\right)^{D} P_m$$

• $x_K$ = distinct orderings of K inputs showing in D trials
• if D is constant, $P_{K>1}$ can be calculated successively
• probability $P_K$ of all K inputs showing after D trials is the ratio $x_K / K^D$
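The successive calculation of $P_K$ can be sketched with exact integer arithmetic, which avoids the cancellation a floating-point version would suffer for large D. The function names and the binary search for the smallest window are illustrative additions; the search relies on coverage increasing with the number of trials.

```python
from math import comb


def coverage_probability(k: int, d: int) -> float:
    """P_K: probability that all k equiprobable inputs appear at least once in d trials.

    Uses the recurrence x_K = K^D - sum_{m<K} C(K, m) x_m with exact integers.
    """
    x = [0] * (k + 1)  # x[m]: length-d sequences over m given inputs that use all m
    for m in range(1, k + 1):
        x[m] = m ** d - sum(comb(m, j) * x[j] for j in range(1, m))
    return x[k] / k ** d


def smallest_window(k: int, target: float) -> int:
    """Smallest W with coverage_probability(k, W) >= target."""
    lo, hi = k, k
    while coverage_probability(k, hi) < target:
        hi *= 2
    while lo < hi:
        mid = (lo + hi) // 2
        if coverage_probability(k, mid) >= target:
            hi = mid
        else:
            lo = mid + 1
    return lo


print(round(coverage_probability(64, 600), 4))  # 0.995
print(smallest_window(64, 0.995))
```

For K = 64 and W = 600 the exact coverage is just under 0.995 (about 0.99497), so it rounds to the 99.5% figure used for the 3x3 multiplier, and a strict 0.995 target needs one more iteration.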

Page 12

W Determination

[Figure: W determination when K = 64.]

Page 13

Impact of Fault on Viable Individuals

• Existence of Positive Test Vector
  – Input $I_p$ comprises an articulating test iff $C_i(I_p) \oplus C_{j \ne i}(I_p) = 1$
  – So if a discrepancy is detected, then some $I_p$ exists which manifests the fault
• Minimal Case when $I_p$ is Unique
  – $I_p$ is unique if the fault is observable under exactly one input pattern
• Probability Mass Function for Encountering Minimal Case $I_p$
  – Consider W = 600, yielding 99.5% coverage for a module with input space |X| = 64
  – The number of input occurrences i, $0 \le i \le 600$, that randomly encounter $I_p$ to identify the fault is governed by the probability mass function:

$$p(i) = \binom{W}{i} \left(\frac{n}{|X|}\right)^{i} \left(1 - \frac{n}{|X|}\right)^{W-i}, \quad \text{where } W = 600,\ |X| = 64,\ n = 1,\ 0 \le i \le 600$$
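The probability mass function above is binomial, so it can be evaluated directly. This sketch reads n as the number of input patterns that articulate the fault (n = 1 is the minimal case); under that reading the mean $W n / |X|$ reproduces the expected values of 9.375 and 93.75 discrepancies used in the isolation experiments. Function names are illustrative.

```python
from math import comb


def encounter_pmf(i: int, w: int = 600, x: int = 64, n: int = 1) -> float:
    """p(i): probability that exactly i of w random inputs hit one of n articulating patterns."""
    p = n / x
    return comb(w, i) * p ** i * (1 - p) ** (w - i)


def expected_encounters(w: int = 600, x: int = 64, n: int = 1) -> float:
    """Mean of the p.m.f., computed numerically (analytically w * n / x)."""
    return sum(i * encounter_pmf(i, w, x, n) for i in range(w + 1))


print(expected_encounters())       # close to 600 / 64 = 9.375
print(expected_encounters(n=10))   # close to 6000 / 64 = 93.75
print(encounter_pmf(0))            # chance the unique I_p never appears in the window
```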

Page 14

Integer Multiplier Case Study

• 3bit x 3bit unsigned multiplier automated design
  – Building blocks: Half-Adder: 18 templates created; Full-Adder: 24 templates; Parallel-And: 1 template created
  – Randomly select templates for instantiation in modules
• GA operators: External-Module-Crossover, Internal-Module-Crossover, Internal-Module-Mutation
• GA parameters: Population size: 20 individuals; Crossover rate: 5%; Mutation rate: up to 80% per bit
• Experimental Evaluation: Xilinx Virtex II Pro on Avnet PCI board
  – Objective fitness function replaced by the Consensus-based Evaluation Approach and Relative Fitness
  – Elimination of additional test vectors
  – Temporal Assessment process

Experiments Demonstrate …

Page 15

Template Fault Coverage

[Figure: Half-Adder Template A and Half-Adder Template B.]

• Template A
  – Gate3 is an AND gate
  – Will lose correctness if a Stuck-At-Zero fault occurs in the second input line of Gate3, an AND gate
• Template B
  – Gate3 is a NOT gate and only uses the first input line
  – Will work correctly even if the second input line is stuck at Zero or One
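A small sketch of why Template A is fault-sensitive. It assumes, hypothetically, that Gate3 is the AND gate producing the carry and that the sum comes from an XOR path; the slides do not give the other gates of either template, so Template B is not modeled. The fault is injected on Gate3's second input and the number of failing input patterns is counted.

```python
from itertools import product
from typing import Optional


def half_adder_a(a: int, b: int, gate3_in2_stuck: Optional[int] = None):
    """Hypothetical gate-level view of Template A: Gate3 is the AND gate forming the carry."""
    s = a ^ b                                                    # sum output (assumed XOR path)
    b_seen = b if gate3_in2_stuck is None else gate3_in2_stuck  # fault on Gate3's 2nd input
    carry = a & b_seen                                           # Gate3 (AND)
    return s, carry


def fault_impact(circuit, **fault) -> int:
    """Count input patterns (out of 4) whose outputs differ from a correct half adder."""
    return sum(circuit(a, b, **fault) != (a ^ b, a & b)
               for a, b in product((0, 1), repeat=2))


print(fault_impact(half_adder_a))                     # 0
print(fault_impact(half_adder_a, gate3_in2_stuck=0))  # 1 (only a = b = 1 fails)
```

Because the stuck-at-zero fault only shows up for one of four input patterns, it is articulated intermittently, the same effect the k-out-of-64 impact figures describe at the multiplier level.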

Page 16

Regeneration Performance

Parameters: Discrepancy operator: Difference (vs. Hamming Distance); Evaluation Window $E_W = 600$; Suspect Threshold $S = 1 - 6/600 = 99\%$; Repair Threshold $R = 1 - 4/600 = 99.3\%$; Re-introduction rate $r = 0.1$

Repairs evolved in-situ, in real-time, without additional test vectors, while allowing the device to remain partially online.

Page 17

Discrepancy Mirror

Fault Coverage
• Mechanism for Checking-the-Checker ("golden element" problem)
• Makes the checker part of the configuration that competes for correctness [DeMara PDPTA-05]

Page 18

Discrepancy Mirror Circuit: Fault Coverage

Component | Fault in Function A | Fault in Function B | Fault in XNOR A | Fault in XNOR B | Fault-Free
Function Output A | Fault | Correct | Correct | Correct | Correct
Function Output B | Correct | Fault | Correct | Correct | Correct
XNOR A | Disagree (0) | Disagree (0) | Fault: Disagree (0) | Agree (1) | Agree (1)
XNOR B | Disagree (0) | Disagree (0) | Agree (1) | Fault: Disagree (0) | Agree (1)
Buffer A | 0 | 0 | High-Z | 0 | 1
Buffer B | 0 | 0 | 0 | High-Z | 1
Match Output | 0 | 0 | 0 | 0 | 1
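How the Buffer A and Buffer B rows of the fault-coverage table combine into the Match Output row can be sketched as follows, assuming both tri-state buffers drive a single shared Match line so that a disabled (High-Z) buffer simply yields to the other driver. The sketch does not model the XNOR gates or the buffer enables themselves; the names are illustrative.

```python
Z = "Z"  # high impedance (tri-state buffer disabled)


def resolve(*drivers):
    """Value on a line shared by tri-state drivers; disabled (Z) drivers are ignored."""
    driven = {d for d in drivers if d != Z}
    if not driven:
        return Z
    if len(driven) > 1:
        raise ValueError("bus contention: drivers disagree")
    return driven.pop()


# (Buffer A, Buffer B) values taken from the fault-coverage table.
BUFFERS = {
    "fault in Function A": (0, 0),
    "fault in Function B": (0, 0),
    "fault in XNOR A": (Z, 0),
    "fault in XNOR B": (0, Z),
    "fault-free": (1, 1),
}

match = {scenario: resolve(a, b) for scenario, (a, b) in BUFFERS.items()}
print(match)
```

Under this wiring assumption, a faulty checker can only float its own buffer, so the healthy checker still pulls Match to 0 and no single fault produces a false agreement.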

Page 19

Influence of LUT Utilization

[Figure panels: Perpetually Articulating Inputs with Equiprobable Distribution; Intermittently Articulating Inputs with Equiprobable Distribution]

• The expected number of pairings grows sub-linearly in the number of resources
• Utilization below 20% or above 80% implicates (or exonerates) a smaller subset of resources
• At 50% utilization, the expected numbers of pairings for 1,000, 10,000, and 100,000 resources are 11.1, 14.9, and 17.6, respectively
• At 90% utilization, a mean of 258 pairings is required to isolate the faulty resource

Page 20

Future Work: Development Board to Self-Contained FPGA

[Figure: Year 1 to Year 3 roadmap. Starting point: Avnet FPGA Development Board with PCI interface, Virtex-II Pro FPGA, and off-chip RAM, with control hosted on a PC that supplies the bit file and input data and receives the output. Target: CRR on a Chip (Xilinx Virtex-II Pro) with control via the on-chip PowerPC, configurations in on-chip RAM blocks, reconfiguration through ICAP, and functional CLBs; the figure also marks a Device Fault.]

• Qualitative Analysis of CRR model
  – Number of iterations and completeness of regeneration repair
  – Percentage of time the device remains online despite a physical resource fault (availability)
• Hardware Resource Management
  – Optimization of hardware profile for Xilinx Virtex II Pro
• Field Testing on SRAM-based FPGA in a Cubesat mission

Page 21

Backup Slides

• On following pages …

Page 22

Isolation: Block Dueling

• Algorithm based on group testing methods
• Successive intersection to assess the health of resources

• Each configuration k has a binary Usage Matrix $U_k[i,j]$, $1 \le i \le m$ and $1 \le j \le n$
  – m, n are the number of rows and columns of resources in the device
  – Elements $U_k[i,j] = 1$ are resources used in k
• A History Matrix $H[i,j]$, $1 \le i \le m$ and $1 \le j \le n$, initially all zero, exists in which:
  – entries represent the fitness of resources (i, j)
  – information regarding the fitness of resources over time is stored
• A discrepant output leads to an increase in the value of $H[i,j]$ for all $(i,j)$ with $U_k[i,j] = 1$, $k \in S$
  – All elements of H corresponding to resources used by a discrepant configuration are incremented by one
  – At any point in time, $H[i,j]$ is a record of the outcomes of competitions
• Successive intersections are performed until $|S| = 1$

Page 23

Dueling Example

$H[i,j]$ @ t = 0: all entries 0 (10 × 10)

$U_1$:
0 0 0 0 0 0 0 0 0 0
0 0 0 0 1 0 0 0 0 0
0 0 1 0 0 0 0 0 0 0
0 0 0 0 0 1 0 1 0 0
0 0 0 1 0 0 0 0 0 0
0 0 1 0 0 1 1 0 0 0
0 0 0 0 1 0 0 0 0 0
0 0 1 0 0 0 0 1 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0

$U_2$:
0 0 0 0 0 0 0 0 0 0
0 0 0 1 0 1 1 0 0 0
0 0 1 1 0 0 1 0 0 0
0 0 1 0 1 0 0 0 0 0
0 0 1 0 0 1 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0

$H[i,j]$ @ t = 2:
0 0 0 0 0 0 0 0 0 0
0 0 0 1 1 1 1 0 0 0
0 0 2 1 0 0 1 0 0 0
0 0 1 0 1 1 0 1 0 0
0 0 1 1 0 1 0 0 0 0
0 0 1 0 0 1 1 0 0 0
0 0 0 0 1 0 0 0 0 0
0 0 1 0 0 0 0 1 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0

• $H[i,j]$ changes after $C_1$ and $C_2$ are loaded; $U_1$ and $U_2$ are the corresponding Usage Matrices
• (3,3) is identified as the faulty resource

Fitness of configuration k
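The History Matrix update can be checked against the Dueling Example with a short sketch. It assumes, as the t = 2 matrix implies ($H = U_1 + U_2$), that both $C_1$ and $C_2$ produced discrepant outputs; taking the resources with the largest count stands in for the final intersection step. The helper names are illustrative.

```python
M, N = 10, 10  # rows and columns of resources in the example device


def usage_matrix(cells):
    """Binary usage matrix U_k from the 1-indexed (row, col) resources used by configuration k."""
    u = [[0] * N for _ in range(M)]
    for i, j in cells:
        u[i - 1][j - 1] = 1
    return u


def record_discrepancy(h, u):
    """Increment H[i,j] for every resource used by a discrepant configuration."""
    for i in range(M):
        for j in range(N):
            h[i][j] += u[i][j]


def most_suspect(h):
    """1-indexed resources with the largest history count."""
    top = max(max(row) for row in h)
    return [(i + 1, j + 1) for i in range(M) for j in range(N) if h[i][j] == top]


U1 = usage_matrix([(2, 5), (3, 3), (4, 6), (4, 8), (5, 4), (6, 3), (6, 6), (6, 7),
                   (7, 5), (8, 3), (8, 8)])
U2 = usage_matrix([(2, 4), (2, 6), (2, 7), (3, 3), (3, 4), (3, 7), (4, 3), (4, 5),
                   (5, 3), (5, 6)])
H = [[0] * N for _ in range(M)]  # H @ t = 0
for u in (U1, U2):               # assumption: both C1 and C2 produced discrepant outputs
    record_discrepancy(H, u)
print(most_suspect(H))           # [(3, 3)]
```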

Page 24

Isolation of a single faulty individual with 1-out-of-64 impact

• Outliers are identified after W iterations have elapsed
• E.V. = (1/64) × 600 = 9.375 discrepancies from the minimum-impact faulty individual
• The isolated individual's f differs from the average DV by 33 after 1 or more observation intervals of length W

Page 25

Isolation of a single faulty L individual with 10-out-of-64 impact

• Compare with 1-out-of-64 fault impact
  – E.V. of (10/64) × 600 = 93.75 discrepancies for the faulty configuration
  – An isolation will be complete approximately once in every 93.75/5 ≈ 19 Observation Intervals
• Fault Isolation demonstrated in 100% of cases

Page 26

Isolation of 8 faulty individuals (L4 & R4) with 1-out-of-64 impact

• Expected isolations do not occur approximately 40% of the time
  – The average discrepancy value of the population is higher
  – Outlier isolation is difficult
  – With multiple faulty individuals, discrepancies are scattered

Page 27

Online Dueling Evaluation

• Objective
  – Isolate faults by successive intersection between sets of FPGA resources used by configurations
  – Analyze complexity of the isolation process
• Variables
  – Total resources available: measured in number of LUTs
  – Number of Competing Configurations: number of initial "seed" designs in the CRR process
  – Degree of Articulation: some inputs may not manifest faults, even if the faulty resource is used by the individual
  – Resource Utilization Factor: percentage of FPGA resources required by the target application/design
  – Number of Iterations for Isolation: measure of the complexity and time involved in isolating a fault

Page 28

Isolation of Faulty Resource at the FPGA Resource (LUT) Granularity

• 50,625 LUTs, comparable to the number of LUTs on a Xilinx Virtex II Pro FPGA
  – Xilinx Virtex II Pro has approximately 67 columns and 78 rows of CLBs, 4 slices per CLB, and 2 LUTs per slice (67 × 78 × 4 × 2 = 41,808 LUTs)

Page 29

Isolation of Faulty Resource: Effect of Articulation

• No direct, uniform relation between % Articulation and Number of Isolations!
• Performance is best when Articulation = 50%
  – Each successive intersection provides maximal information
  – The greatest number of resources is intersected out of the "suspect" pool

Page 30

For further info … EH Website: http://cal.ucf.edu

Page 31

Fast Reconfiguration for Autonomously Reprogrammable Logic

• Motivation
  – Dynamic reconfiguration required by application
  – Exploit architectural & performance improvements fully
  – Reconfiguration delay: a major performance barrier
• Previous Work
• Methodology
  – Multilayer Runtime Reconfiguration Architecture (MRRA)
  – Spatial Management
• Prototype Development
  – Loosely-Coupled solution
  – Timing Analysis
  – System-On-Chip solution

Page 32

Reconfiguration Demand during CRR

• For a complete repair
  – Approximately 2,000 generations ($G_R$) may be required
  – For each generation, the number of evaluations ($O_{new}$) may be up to 100
  – Yielding a Cumulative Number of Reconfigurations (CNR) of up to $G_R \times O_{new} = 2{,}000 \times 100 = 200{,}000$
  – For each reconfiguration task: $L_i = T_{RT}(i) + T_{TA}(i) + T_{ED}(i)$
  – Therefore, the total delay is $L_{tot} = \sum_{i=1}^{CNR} L_i$
• Even if the reconfiguration delay alone is assumed to be on the order of 100 milliseconds, $L_{tot} \ge 200{,}000 \times 0.1\,\text{s} = 20{,}000\,\text{s} \approx 5.5$ hours
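The total-delay estimate is simple arithmetic; this sketch makes the dependence on the per-task latency explicit. The three terms of $L_i$ are taken from the slide without further definition; setting $T_{TA}$ and $T_{ED}$ to zero isolates the reconfiguration delay, and the uniform per-task latency is an assumption.

```python
def task_latency(t_rt: float, t_ta: float, t_ed: float) -> float:
    """L_i = T_RT(i) + T_TA(i) + T_ED(i) for one reconfiguration task, in seconds."""
    return t_rt + t_ta + t_ed


def total_delay_hours(generations: int, evaluations_per_generation: int,
                      latency_per_task: float) -> float:
    """L_tot = sum of L_i over CNR tasks, assuming every task has the same latency."""
    cnr = generations * evaluations_per_generation  # cumulative number of reconfigurations
    return cnr * latency_per_task / 3600.0


lower_bound = task_latency(0.1, 0.0, 0.0)          # reconfiguration delay alone: 100 ms (assumed)
print(total_delay_hours(2_000, 100, lower_bound))  # about 5.56 hours
print(total_delay_hours(2_000, 100, 0.01))         # about 0.56 hours at 10 ms per task
```

The bound scales linearly with per-task latency, which is why cutting reconfiguration overhead is the lever the fast-reconfiguration work targets.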

Page 33

Previous Work: Tool Level

Approach | FPGA Supported | On-chip System | Bit Stream Reuse | System Coupling Degree | Potential Limitations
Moraes, Mesquita, Palma, Moller | Virtex XCV300 devices | No | N | Loose | Lack of area relocation capability
Raghavan, Sutton | Xilinx Virtex devices | No | N | Loose | Cumbersome CAD flow
Blodget, McMillan | Virtex II devices | Partial | Y | Medium | Limited hardware speed and capacity; lack of information for bit stream reuse

Page 34

Previous Work: Algorithm Level

Approach | Method | Partial Reconfig | Spatial Relocation | Temporal Parallelism | Area Shape | Run-Time | Potential Limitations
Hauck, Li, Schwabe | Bit file compression | N/A | No | N/A | N/A | No | Full reconfiguration required
Shirazi, Luk, Cheung | Identifying common components | Yes | No | Yes | N/A | No | Design time work required
Mak, Young | Dynamic Partitioning | Yes | No | Yes | N/A | Yes | Only desirable for large designs
Ganesan, Vemuri | Pipelining | Yes | No | Yes | N/A | Yes | Limited pipeline depth
Compton, Li, Knol, Hauck | Relocation and Defragmentation with new FPGA architecture | Yes | Yes | No | Row-based | Yes | Special FPGA architecture required
Diessel, Middendorf, Schmeck, Schmidt | Task Remapped and Relocated | Yes | Yes | No | Rectangle | Yes | Overhead for remapping calculations
Herbert, Christoph, Marco | Partitioning and 2D Hashing | Yes | Yes | Yes | Rectangle | Yes | Rigid task modeling assumptions

Method categories: compression method, temporal method, spatial method

Page 35

Multilayer Runtime Reconfiguration Architecture (MRRA)

[Figure: MRRA components: Fault-Repair Genetic Algorithm, Reconfiguration Engine, Microprocessor, System Bus, Virtex-II Pro FPGA, RAM, Control System.]

• Develop the MRRA fast reconfiguration paradigm for the CRR approach
• Validate with a real hardware platform along with detailed performance analysis
• First general-purpose framework for a wide variety of applications requiring dynamic reconfiguration
• Extend existing theories on reconfiguration

Page 36

Loosely Coupled Solution

[Figure: Avnet FPGA Development Board with PCI interface, Virtex-II Pro FPGA, and off-chip RAM; control hosted on a PC supplies the bit file and input data and receives the FPGA output.]

The entire system operates on a 32-bit basis.

The Virtex-II Pro is mounted on a development board, which can then be interfaced with a workstation running Xilinx EDK and ISE.

Page 37

Result Assessment

• Establish full functional framework of both prototypes
• Communication overhead, throughput and overall speed-up analysis
  – Communication overhead for the SOC solution is decreased to microsecond or sub-microsecond order, vs. millisecond order for the Loosely Coupled solution
  – Up to 5-fold speedup is expected compared to the Loosely Coupled solution
• Translation Complexity Analysis
  – The quantity of information that needs to be translated to generate the reconfiguration bitstream
  – Simplification from file level to bit level is expected
• Storage Complexity Analysis
  – The memory space required for the run-time algorithms
  – Decreased memory requirement is expected due to the translation complexity improvement

Page 38

Project Milestones

Timeline (both schedules): Nov 2004 (Start), Jan 2005, Mar 2005, May 2005, Jul 2005, Sep 2005, Nov 2005, Jan 2006, Mar 2006, May 2006, Jul 2006, Sep 2006, Nov 2006, Jan 2007, Mar 2007, Jul 2007

HW Schedule:
• API & SEC circuit
• Scripts, GA representation for prototype 1
• Performance analysis for prototype 1 on 3*3 multiplier
• OS for the SOC
• ICAP circuit
• Reconfig. Performance Report
• SOC Final Report
• Performance analysis for prototype 1 on Quad Decoder circuit

SW Schedule:
• Evaluate CRR Parameters in 3x3 multiplier design
• Design GUI of 3X3 multiplier
• Build VHDL module and incorporate into the hardware prototype
• FPGA-resident CRR
• Implement the SEC circuit design
• Optimized Parameters for layered comb/seq designs
• Regen. Final Report
• Performance analysis for prototype 1 on Quad Decoder circuit

Page 39

Publications

Accepted Manuscripts
1. R. F. DeMara and K. Zhang, "Autonomous FPGA Fault Handling through Competitive Runtime Reconfiguration," to appear in NASA/DoD Conference on Evolvable Hardware (EH'05), Washington, D.C., U.S.A., June 29 – July 1, 2005.
2. H. Tan and R. F. DeMara, "A Device-Controlled Dynamic Configuration Framework Supporting Heterogeneous Resource Management," to appear in International Conference on Engineering of Reconfigurable Systems and Algorithms (ERSA'05), Las Vegas, Nevada, U.S.A., June 27 – 30, 2005.
3. R. F. DeMara and C. A. Sharma, "Self-Checking Fault Detection using Discrepancy Mirrors," to appear in International Conference on Parallel and Distributed Processing Techniques and Applications (PDPTA'05), Las Vegas, Nevada, U.S.A., June 27 – 30, 2005.

Submitted Manuscripts
1. R. F. DeMara and K. Zhang, "Populational Fault Tolerance Analysis Under CRR Approach," submitted to International Conference on Evolvable Systems (ICES'05), Barcelona, Sept. 12 – 14, 2005.
2. R. F. DeMara and C. A. Sharma, "FPGA Fault Isolation and Refurbishment using Iterative Pairing," submitted to IFIP VLSI-SOC Conference, Perth, W. Australia, October 17 – 19, 2005.

Manuscripts In Preparation
1. R. F. DeMara and K. Zhang, "Autonomous Fault Occlusion through Competitive Runtime Reconfiguration," submission planned to IEEE Transactions on Evolutionary Computation.
2. R. F. DeMara and C. A. Sharma, "Multilayer Dynamic Reconfiguration Supporting Heterogeneous FPGA Resource Management," submission planned to IEEE Design and Test of Computers.

Field Testing
• Implementation of CRR on-board SRAM-based FPGA in a Cubesat mission

Page 40

EHW Environments

• Evolvable Hardware (EHW) Environments enable experimental methods to research soft computing intelligent search techniques
• EHW operates by repetitive reprogramming of real-world physical devices using an iterative refinement process

[Figure: Two modes of Evolvable Hardware.
Extrinsic Evolution: Genetic Algorithm with simulation in the loop (software model); when done, build the device; device "design-time" refinement.
Intrinsic Evolution: Genetic Algorithm with hardware in the loop; device "run-time" refinement; a new approach to Autonomous Repair of failed devices.]

Application: Stardust Satellite
• >100 FPGAs onboard
• Hostile environment: radiation, thermal stress
• How to achieve reliability to avoid mission failure?

Page 41

Genetic Algorithms (GAs)

Mechanism coarsely modeled after neo-Darwinism (natural selection + genetics)

[Figure: GA cycle. Start with a population of candidate solutions; evaluate the fitness of individuals with the fitness function; select parents; crossover and mutation produce offspring; replacement updates the population; repeat until the goal is reached.]

Page 42

Genetic Mechanisms

• Guided trial-and-error search techniques using principles of Darwinian evolution
  – iterative selection, "survival of the fittest"
  – genetic operators: mutation, crossover, …
  – implementor must define the fitness function
• GAs frequently use strings of 1s and 0s to represent candidate solutions
  – if 100101 is better than 010001, it will have more chance to breed and influence the future population
• GAs "cast a net" over the entire solution space to find regions of high fitness
• Can invoke an Elitism Operator (E=1, E=2, …)
  – guarantees monotonically non-decreasing fitness of the best individual over all generations
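The elitism guarantee can be illustrated with a toy generational GA on the OneMax problem (fitness = number of 1 bits). The problem, tournament selection, and all parameters here are illustrative assumptions and not the operators or settings of the CRR experiments; the point is that copying the E best individuals unchanged keeps the best fitness from ever decreasing.

```python
import random


def onemax(bits):
    """Illustrative fitness: number of 1 bits."""
    return sum(bits)


def evolve(pop_size=20, length=16, generations=30, elite=1, mutation_rate=0.05, seed=1):
    """Generational GA with elitism; returns the best fitness in each generation."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    best_history = []
    for _ in range(generations):
        pop.sort(key=onemax, reverse=True)
        best_history.append(onemax(pop[0]))
        next_pop = [ind[:] for ind in pop[:elite]]  # elitism: copy the E best unchanged
        while len(next_pop) < pop_size:
            # binary tournament selection for each parent
            a = max(rng.sample(pop, 2), key=onemax)
            b = max(rng.sample(pop, 2), key=onemax)
            cut = rng.randrange(1, length)
            child = a[:cut] + b[cut:]                                        # single-point crossover
            child = [bit ^ (rng.random() < mutation_rate) for bit in child]  # bit-flip mutation
            next_pop.append(child)
        pop = next_pop
    return best_history


history = evolve()
print(history[0], history[-1])
```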

Page 43

GA Success Stories

Commercial Applications:
• Nextel: frequency allocation for cellular phone networks: $15M predicted savings in NY market
• Pratt & Whitney: turbine engine design: engineer 8 weeks; GA 2 days with 3x improvement
• International Truck: production scheduling improved by 90% in 5 plants
• NASA: superior Jupiter trajectory optimization, antennas, FPGAs
• Koza: 25 instances showing human-competitive performance, such as analog circuit design, amplifiers, filters

Page 44

Representing Candidate Solutions

[Figure: an Individual (Chromosome) made up of GENEs.]

Representation of an individual can use discrete values (binary, integer, or any other system with a discrete set of values).

Example of Binary DNA Encoding:

Page 45

Genetic Operators

[Figure: from generation t to t + 1 via selection, reproduction, recombination (crossover), and mutation.]

Page 46

Crossover Operator

Population: . . .
Parents: 1 1 1 1 1 1 1 and 0 0 0 0 0 0 0 (cut after the third bit)
Offspring: 1 1 1 0 0 0 0 and 0 0 0 1 1 1 1
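The single-point crossover on this slide can be reproduced in a few lines, representing each chromosome as a bit string; the function name and the string representation are illustrative choices.

```python
def one_point_crossover(parent1: str, parent2: str, cut: int):
    """Swap the tails of two equal-length bit strings after position `cut`."""
    return parent1[:cut] + parent2[cut:], parent2[:cut] + parent1[cut:]


print(one_point_crossover("1111111", "0000000", cut=3))  # ('1110000', '0001111')
```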