Using Implications for Online Error Detection

Nuno Alves, Jennifer Dworak, and R. Iris Bahar
Division of Engineering, Brown University, Providence, RI 02912

Kundan Nepal
Electrical Engineering Dept., Bucknell University, Lewisburg, PA 17837

International Test Conference, October 28-30, 2008

Motivation

Circuits are becoming more susceptible to transient errors…
– Soft errors, test escapes, noise, etc.

Some applications need a reduction in error rates.

[Figure: cost ($$$$) weighed against error detection]

Can we efficiently trade off error detection and cost?
– using logic implications

Outline

Common error detection techniques

Our approach: logic implications

Finding an implication set

Error coverage

Balancing error coverage and overhead

Conclusions

(Some) Previous Techniques in Online Error Detection

Redundancy in time, e.g., re-execution in a redundant thread

Logic duplication or triple modular redundancy (TMR)

Codes, e.g., parity, Berger, and Bose-Lin codes

Pre-computed test vectors and their expected responses (stored in hardware)

High-level functional assertions

Our Approach: Logic Implications

Error detection compares expected behavior to actual behavior

Implications within a logic block describe expected relationships between values at circuit sites.

Violation of an expected implication indicates the presence of an error.

Implications Naturally Occur in Circuits

[Figure: example circuit with nets n1-n8 and annotated logic values]

n5 = 1 → n8 = 0

Implication Violations Can Be Used to Detect Errors

[Figure: the example circuit (nets n1-n8) with an error marked "ERROR"]

n5 = 1 → n8 = 0

Appropriate checker logic can detect multiple errors with a single implication.

[Figure: the same circuit annotated with the stuck-at-1 (sa1) faults that can be detected through the implication n5 = 1 → n8 = 0]
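The idea on these slides can be tried on a toy circuit. The following Python sketch is a hypothetical illustration, not the circuit from the slides: it builds a four-gate netlist in which n5 = 1 → n8 = 0 holds, injects each single stuck-at fault, and checks over all input vectors whether that one implication is ever violated. The netlist, the `simulate` and `violated` helpers, and the fault model are assumptions made for illustration.

```python
import itertools

# Hypothetical toy netlist (NOT the circuit on the slides), in topological order.
# n5 = 1 forces a = 1, hence n6 = 0, n7 = 0, n8 = 0: so n5 = 1 -> n8 = 0 holds.
NETLIST = [
    ("n5", "AND", ("a", "b")),
    ("n6", "NOT", ("a",)),
    ("n7", "AND", ("n6", "c")),
    ("n8", "AND", ("n7", "b")),
]
INPUTS = ("a", "b", "c")
GATES = {
    "AND": lambda ins: int(all(ins)),
    "NOT": lambda ins: 1 - ins[0],
}

def simulate(vector, stuck_at=None):
    """Simulate one input vector; stuck_at=(net, value) forces that net's value."""
    values = dict(zip(INPUTS, vector))
    for net, gate, fanin in NETLIST:
        values[net] = GATES[gate]([values[f] for f in fanin])
        if stuck_at is not None and stuck_at[0] == net:
            values[net] = stuck_at[1]  # fault overrides the computed value
    return values

def violated(imp, values):
    """(src, src_val, dst, dst_val) is violated if src == src_val and dst != dst_val."""
    src, src_val, dst, dst_val = imp
    return values[src] == src_val and values[dst] != dst_val

IMP = ("n5", 1, "n8", 0)  # analogous to n5 = 1 -> n8 = 0 on the slides

# Which single stuck-at faults does this one implication ever flag?
detected = []
for net, _, _ in NETLIST:
    for value in (0, 1):
        fault = (net, value)
        vectors = itertools.product((0, 1), repeat=len(INPUTS))
        if any(violated(IMP, simulate(v, stuck_at=fault)) for v in vectors):
            detected.append(fault)

print(detected)  # [('n5', 1), ('n6', 1), ('n7', 1), ('n8', 1)]
```

In this toy circuit the single check flags every stuck-at-1 fault on n5 through n8 and no stuck-at-0 fault, because forcing any of these nets to 0 either makes the antecedent false (n5) or drives n8 to its implied value of 0.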

Identified Implications Determine Checker Hardware

Finding Implications

Gate-level implications can be identified automatically, without requiring functional knowledge of the circuit, in three steps:

Quickly identify potential implications:
– Choose potential sites of adequate distance
– Fast good-circuit simulation
– Look for missing logic value pairs

Validate implications
– SAT solver

Reduce implication set
– Structural and error detection analysis
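The first two steps can be illustrated on a toy netlist. The Python sketch below is a simplified illustration, not the authors' tool: it simulates random input vectors, records which value pairs occur at each pair of nets, and proposes x = a → y = b whenever x = a was seen but the pair (x = a, y = not b) never was. As a stand-in for the SAT solver, it validates candidates by exhaustive enumeration, which is only feasible for circuits with few inputs. The netlist and function names are hypothetical, and the distance filter is omitted.

```python
import itertools
import random

# Hypothetical toy netlist: net -> (gate type, fanin nets), in topological order.
NETLIST = {
    "n1": ("AND", ("a", "b")),
    "n2": ("NOT", ("n1",)),
    "n3": ("OR", ("n1", "c")),
    "n4": ("AND", ("n2", "c")),
}
INPUTS = ("a", "b", "c")
GATES = {
    "AND": lambda ins: int(all(ins)),
    "OR": lambda ins: int(any(ins)),
    "NOT": lambda ins: 1 - ins[0],
}

def simulate(vector):
    """Fault-free (good-circuit) simulation of one input vector."""
    values = dict(zip(INPUTS, vector))
    for net, (gate, fanin) in NETLIST.items():
        values[net] = GATES[gate]([values[f] for f in fanin])
    return values

def candidate_implications(num_vectors=64, seed=0):
    """Step 1: propose x=xv -> y=yv when (x=xv, y=1-yv) never occurs in simulation."""
    rng = random.Random(seed)
    nets = list(INPUTS) + list(NETLIST)
    seen = set()  # observed (x, x value, y, y value) tuples
    for _ in range(num_vectors):
        v = simulate([rng.randint(0, 1) for _ in INPUTS])
        for x, y in itertools.permutations(nets, 2):
            seen.add((x, v[x], y, v[y]))
    candidates = []
    for x, y in itertools.permutations(nets, 2):
        for xv in (0, 1):
            for yv in (0, 1):
                # x = xv was observed, but never together with y = not yv
                if (x, xv, y, yv) in seen and (x, xv, y, 1 - yv) not in seen:
                    candidates.append((x, xv, y, yv))
    return candidates

def holds_exhaustively(imp):
    """Step 2 stand-in for a SAT check: test the implication on every input vector."""
    x, xv, y, yv = imp
    for vector in itertools.product((0, 1), repeat=len(INPUTS)):
        v = simulate(vector)
        if v[x] == xv and v[y] != yv:
            return False
    return True

validated = [imp for imp in candidate_implications() if holds_exhaustively(imp)]
# Example of a distance-2 implication that should survive: n1 = 1 -> n4 = 0
print(("n1", 1, "n4", 0) in validated)
```

Random simulation only proposes candidates, since a missing pair may simply not have been exercised; the validation step is what makes the retained implications safe to use as checks.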

So… how many “natural” implications are there?

[Figure: Total Number of Implications With Distance 2 or Greater; x-axis: circuit; y-axis: number of implications (log scale, 1 to 100,000)]

Identifying “Subsumed” Implications

All the errors covered by a short-distance implication may sometimes also be covered by a long-distance implication…

[Figure: example circuit with nets n1-n13]


n10 = 0 → n13 = 0

n10 = 0 → n8 = 0

n4 = 1 → n8 = 0

n4 = 1 → n11 = 0

n4 = 1 → n13 = 0


Reducing the Implication List

Subsumed implications are detected through structural analysis:
– Implications fall on the same path with appropriate “implied values”
– No fanout branches along the path
– The implication with the longest “distance” between implication sites is retained.
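The retention rule can be sketched in Python under a strong simplification: the structural analysis is assumed to have already produced a fanout-free path together with the chain of values it implies. `PATH` below is hypothetical (it assumes n8 = 0 forces n11 = 0, which forces n13 = 0), chosen so that the implications listed on the earlier slides can serve as input; `remove_subsumed` is an illustrative name.

```python
# Hypothetical fanout-free path, from input side to output side. Each entry
# (net, value) is assumed to be forced by the previous entry's value, i.e.
# n8 = 0 -> n11 = 0 -> n13 = 0, through gates with no fanout branches.
PATH = [("n8", 0), ("n11", 0), ("n13", 0)]

# Implications as (src, src_val, dst, dst_val), taken from the slide example.
IMPLICATIONS = [
    ("n10", 0, "n13", 0),
    ("n10", 0, "n8", 0),
    ("n4", 1, "n8", 0),
    ("n4", 1, "n11", 0),
    ("n4", 1, "n13", 0),
]

def remove_subsumed(implications, path):
    """Keep, per antecedent, only the implication reaching farthest along the path.

    Only implications whose consequent lies on the path with the matching
    implied value are compared; all others are kept unchanged.
    """
    position = {(net, val): i for i, (net, val) in enumerate(path)}
    best = {}  # antecedent (src, src_val) -> (position, implication)
    kept = []
    for imp in implications:
        src, src_val, dst, dst_val = imp
        pos = position.get((dst, dst_val))
        if pos is None:
            kept.append(imp)  # not on this path: nothing subsumes it here
            continue
        key = (src, src_val)
        if key not in best or pos > best[key][0]:
            best[key] = (pos, imp)  # a longer-distance implication wins
    kept.extend(imp for _, imp in best.values())
    return kept

print(remove_subsumed(IMPLICATIONS, PATH))
# [('n10', 0, 'n13', 0), ('n4', 1, 'n13', 0)]
```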

So, how much does this reduce the size of our implication lists?

[Figure: Implication List Size after Removing Subsumed Implications; x-axis: circuit; y-axis: number of implications (0 to 35,000); series: before subsume, after subsume]

Compressing the Implication List While Maintaining Quality

Once subsumed implications are removed, the implication list may still be too long.

Evaluate the remaining implications for “implication quality”.

Implication quality is calculated for every implication/fault pair:

$$\text{implication quality} = 100 \times \left(1 - \frac{\#\,\text{fault missed}}{\#\,\text{input patterns}}\right)$$

Each fault’s “highest quality” implication is added to the list.
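The compression step can be sketched in Python using quality = 100 × (1 − missed / patterns) for each implication/fault pair. We assume `missed[imp][fault]` counts the input patterns for which an implication fails to flag a fault; that reading of "fault missed", the dictionary layout, and the miss counts are illustrative assumptions.

```python
def implication_quality(missed: int, num_patterns: int) -> float:
    """Quality of one implication/fault pair on a 0-100 scale (assumed form)."""
    return 100.0 * (num_patterns - missed) / num_patterns

def compress(missed, num_patterns):
    """Pick each fault's highest-quality implication; return the set of picks.

    `missed` maps implication -> {fault: number of input patterns missed}.
    """
    best = {}  # fault -> (quality, implication)
    for imp, per_fault in missed.items():
        for fault, count in per_fault.items():
            q = implication_quality(count, num_patterns)
            if fault not in best or q > best[fault][0]:
                best[fault] = (q, imp)
    return {imp for _, imp in best.values()}

# Hypothetical miss counts for 4 implications and 3 faults over 100 patterns.
MISSED = {
    "I1": {"f1": 10, "f2": 40},
    "I2": {"f1": 30, "f2": 5, "f3": 50},
    "I3": {"f3": 20},
    "I4": {"f1": 60, "f3": 70},
}
print(sorted(compress(MISSED, 100)))  # ['I1', 'I2', 'I3']
```

I4 is dropped because it is never the best implication for any fault, which is how this kind of per-fault selection shrinks the list.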

[Figure: Number of "High-Quality" Implications after Compression Step; x-axis: circuit; y-axis: number of compressed implications (0 to 1,000)]

[Figure: Compression Ratios for Compressed Implication Set; x-axis: circuit; y-axis: compression ratio (0 to 120); series: vs. initial, vs. subsumed]

Covering Faults with Implications

For each random input vector and each fault, the implication-based circuit operation falls into one of the following 4 categories:

Case 1 (true detection): the error propagates to an output and an implication is violated
Case 2 (false positive): the error does not propagate to an output, but an implication is violated
Case 3 (benign miss): the error does not propagate to an output and no implication is violated
Case 4 (true miss): the error propagates to an output, but no implication is violated
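The four cases amount to a classification over two observations per (input vector, fault) pair: whether the error reaches an output, and whether any implication in the checked set is violated. A minimal Python sketch; the outcome list is made up, and in practice each pair would come from fault simulation of one random vector with one fault injected.

```python
from collections import Counter

CASE_NAMES = {1: "true detection", 2: "false positive", 3: "benign miss", 4: "true miss"}

def classify(error_propagates: bool, implication_violated: bool) -> int:
    """Return the case number (1-4) for one (input vector, fault) outcome."""
    if implication_violated:
        return 1 if error_propagates else 2
    return 4 if error_propagates else 3

def tally(outcomes):
    """Count cases over (error_propagates, implication_violated) pairs."""
    return Counter(classify(p, v) for p, v in outcomes)

# Hypothetical outcomes for five (vector, fault) pairs.
outcomes = [(True, True), (True, True), (False, True), (False, False), (True, False)]
counts = tally(outcomes)
for case in sorted(counts):
    print(case, CASE_NAMES[case], counts[case])
```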

[Figure: Distribution of the 4 Cases for Random Input Patterns; x-axis: circuits rd73, z5xp1, clip, z9sym, b12, misex2, c432, c499, c880, c1355, c1908; y-axis: % of total (0 to 70); series: Case 1 (error propagated, implication violated), Case 2 (error not propagated, implication violated), Case 3 (error not propagated, implication not violated), Case 4 (error propagated, implication not violated)]

[Figure: Average Detection Rate for Errors that Propagate to an Output; x-axis: circuit; y-axis: case 1 / (case 1 + case 4), 0 to 100]
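The plotted metric follows directly from the case counts. A small Python sketch, assuming per-fault counts of case 1 and case 4 are available and that the average is an unweighted mean over faults (the slides do not say how the average is taken); the counts are illustrative only.

```python
def detection_rate(case1: int, case4: int) -> float:
    """Percentage of output-visible errors that are flagged: case1 / (case1 + case4)."""
    observable = case1 + case4
    if observable == 0:
        raise ValueError("no error reached an output, so the rate is undefined")
    return 100.0 * case1 / observable

def average_detection_rate(per_fault):
    """Unweighted mean over faults (assumed), skipping faults that never propagate.

    per_fault maps fault -> (case 1 count, case 4 count).
    """
    rates = [detection_rate(c1, c4) for c1, c4 in per_fault.values() if c1 + c4 > 0]
    return sum(rates) / len(rates)

# Hypothetical counts, for illustration only.
per_fault = {"f1": (90, 10), "f2": (60, 40), "f3": (0, 0)}
print(detection_rate(90, 10))             # 90.0
print(average_detection_rate(per_fault))  # 75.0
```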

What is the hardware overhead?

Include all implications remaining after compression

Used a simple implementation for each implication (an AND gate and up to 2 inverters)

Outputs of the AND gates are OR’ed together

A 180nm TSMC library and the Mentor Graphics toolset were used to generate the layout and calculate area overhead.
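The checker structure described on this slide can be modeled in a few lines of Python. The sketch assumes each implication is written as (src, src_val, dst, dst_val), that inverters are not shared between implications, and that the OR is built from 2-input gates; these are illustrative choices, and real area numbers come from layout, not from a count like this.

```python
def violation_term(imp, values):
    """Output of the 2-input AND for implication (src, src_val, dst, dst_val).

    The src literal needs an inverter when src_val == 0; the dst literal needs
    one when dst_val == 1, so each implication uses at most 2 inverters.
    """
    src, src_val, dst, dst_val = imp
    src_lit = values[src] if src_val == 1 else 1 - values[src]
    dst_lit = 1 - values[dst] if dst_val == 1 else values[dst]
    return src_lit & dst_lit

def error_flag(implications, values):
    """OR of all AND outputs: 1 if any implication is violated, else 0."""
    return int(any(violation_term(imp, values) for imp in implications))

def gate_count(implications):
    """Rough cell count: one AND2 per implication, unshared inverters,
    and n - 1 two-input ORs to combine n AND outputs (assumed OR tree)."""
    n = len(implications)
    inverters = sum((sv == 0) + (dv == 1) for _, sv, _, dv in implications)
    return {"AND2": n, "INV": inverters, "OR2": max(n - 1, 0)}

# Implications from the earlier slides, as (src, src_val, dst, dst_val).
IMPS = [("n5", 1, "n8", 0), ("n10", 0, "n13", 0), ("n4", 1, "n13", 0)]
good = {"n4": 1, "n5": 1, "n8": 0, "n10": 0, "n13": 0}  # satisfies all three
bad = dict(good, n8=1)                                   # n8 flipped by an error

print(error_flag(IMPS, good), error_flag(IMPS, bad))  # 0 1
print(gate_count(IMPS))  # {'AND2': 3, 'INV': 1, 'OR2': 2}
```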

[Figure: Area Overhead Comparison for all Compressed Implications; x-axis: circuit; y-axis: overhead (%), 0 to 180; series: duplication, parity, implications]

Trading off Area Overhead and Coverage

Coverage/area tradeoffs are intuitively easy with implications

A threshold is set for area overhead

Gate count is used to estimate the number of implications that can be included

Implications are chosen by:
– Coverage of all faults
– Coverage of “most important” faults (more likely to be missed by test, more likely to cause important errors, etc.)
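The slides do not spell out the selection algorithm, so the following is only one plausible approach: a greedy budgeted-coverage heuristic in Python that repeatedly picks the implication adding the most (optionally weighted) uncovered faults per gate, until the gate budget is spent. The function name, costs, coverage sets, and weights are hypothetical.

```python
def select_under_budget(coverage, cost, weight, budget):
    """Greedy budgeted coverage (a heuristic sketch, not the authors' algorithm).

    coverage: implication -> set of faults it detects
    cost:     implication -> estimated gate count for its checker logic
    weight:   fault -> importance; all 1.0 means "coverage of all faults"
    budget:   gate-count threshold derived from the area-overhead limit
    """
    chosen, covered, spent = [], set(), 0
    remaining = set(coverage)
    while True:
        best, best_ratio = None, 0.0
        for imp in sorted(remaining):  # sorted for deterministic tie-breaking
            if spent + cost[imp] > budget:
                continue  # would exceed the gate budget
            gain = sum(weight[f] for f in coverage[imp] - covered)
            ratio = gain / cost[imp]
            if ratio > best_ratio:
                best, best_ratio = imp, ratio
        if best is None:  # nothing affordable adds coverage
            return chosen, covered
        chosen.append(best)
        covered |= coverage[best]
        spent += cost[best]
        remaining.remove(best)

# Hypothetical example data.
COVERAGE = {"I1": {"f1", "f2", "f3"}, "I2": {"f3", "f4"}, "I3": {"f5"}, "I4": {"f1", "f4"}}
COST = {"I1": 3, "I2": 2, "I3": 2, "I4": 3}

uniform = {f: 1.0 for f in ("f1", "f2", "f3", "f4", "f5")}
important_f5 = dict(uniform, f5=3.0)  # f5 treated as a "most important" fault

print(select_under_budget(COVERAGE, COST, uniform, budget=5)[0])       # ['I1', 'I2']
print(select_under_budget(COVERAGE, COST, important_f5, budget=5)[0])  # ['I3', 'I1']
```

Raising the weight of f5 changes the selection under the same budget, which is one way to express "coverage of most important faults".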

[Figure: Variation in Probability of an Undetected Error for Fixed Area Thresholds; x-axis: circuit; y-axis: average probability of an undetected error (0 to 25); series: area thresholds of 10%, 20%, 30%, 40%, 50%]

Conclusions

Implications serve as gate-level “assertions” that can be automatically discovered without detailed functional knowledge of the circuit design

Many implications naturally exist within circuits

Good coverage of many faults (often almost 90%)

Ideally suited to cost/coverage tradeoffs, especially for applications that require a significant reduction in error rates rather than “zero” errors

With only a 10% area overhead, the probability of an error being both observable and undetected is reduced to ~12% on average (and the actual error rate will be much lower)
