Using Implications for Online Error Detection
Nuno Alves, Jennifer Dworak, and R. Iris Bahar
Division of Engineering, Brown University
Providence, RI 02912
Kundan Nepal
Electrical Engineering Dept., Bucknell University
Lewisburg, PA 17837
International Test Conference, October 28-30, 2008
Motivation
Circuits are becoming more susceptible to transient errors:
– Soft errors, test escapes, noise, etc.
Some applications need a reduction in error rates.
[Figure: cost ($$$$) versus error detection]
Can we efficiently trade off error detection and cost?
– Using logic implications
Outline
Common error detection techniques
Our approach: logic implications
Finding an implication set
Error coverage
Balancing error coverage and overhead
Conclusions
(Some) Previous Techniques in Online Error Detection
Redundancy in time, e.g. re-executing in a redundant thread
Logic duplication or TMR
Codes, e.g. parity, Berger, Bose-Lin
Pre-computed test vectors and their expected responses (stored in hardware)
High-level functional assertions
Our Approach: Logic Implications
Error detection compares expected behavior to actual behavior.
Implications within a logic block describe expected relationships between values at circuit sites.
Violation of an expected implication indicates the presence of an error.
Implications Naturally Occur in Circuits
[Figure: example circuit with nodes n1 through n8 and sample logic values]
n5 = 1 → n8 = 0
Implication Violations Can Be Used to Detect Errors
[Figure: the example circuit (nodes n1 through n8) with an error that violates n5 = 1 → n8 = 0]
Appropriate checker logic can detect multiple errors with a single implication.
[Figure: the same circuit annotated with the stuck-at-1 (sa1) faults that this single implication can detect]
Identified Implications Determine Checker Hardware
Finding Implications
Gate-level implications can be identified automatically, without requiring functional knowledge of the circuit, in three steps:
Quickly identify potential implications:
– Choose potential sites of adequate distance
– Fast good-circuit simulation
– Look for missing logic value pairs
Validate implications:
– SAT solver
Reduce implication set:
– Structural and error detection analysis
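The first two steps can be illustrated with a minimal Python sketch. The netlist, node names, and helper functions below are hypothetical and do not come from the slides. The site-distance filter is omitted, and the SAT-based validation is replaced by exhaustive enumeration, which is only practical for a toy circuit like this one.

```python
import itertools
import random

# Hypothetical toy netlist (not from the slides): node -> (gate, fan-in names).
# Entries are listed in topological order.
NETLIST = {
    "n1": ("AND", ["a", "b"]),
    "n2": ("OR", ["n1", "c"]),
    "n3": ("NOT", ["n2"]),
    "n4": ("AND", ["n3", "a"]),
}
INPUTS = ["a", "b", "c"]
GATES = {
    "AND": lambda v: int(all(v)),
    "OR": lambda v: int(any(v)),
    "NOT": lambda v: 1 - v[0],
}

def simulate(assignment):
    """Fault-free ("good circuit") simulation of the toy netlist."""
    values = dict(assignment)
    for node, (gate, fanin) in NETLIST.items():
        values[node] = GATES[gate]([values[f] for f in fanin])
    return values

def candidate_implications(num_vectors, seed=0):
    """Step 1: simulate random vectors and report missing value pairs."""
    rng = random.Random(seed)
    nodes = list(NETLIST)
    seen = set()
    for _ in range(num_vectors):
        vals = simulate({i: rng.randint(0, 1) for i in INPUTS})
        for x, y in itertools.permutations(nodes, 2):
            seen.add((x, vals[x], y, vals[y]))
    # A never-observed pair (x=vx, y=vy) suggests the implication x=vx -> y=1-vy.
    return {
        (x, vx, y, 1 - vy)
        for x, y in itertools.permutations(nodes, 2)
        for vx in (0, 1)
        for vy in (0, 1)
        if (x, vx, y, vy) not in seen
    }

def validate(implication):
    """Step 2 stand-in: exhaustive check in place of the SAT solver."""
    x, vx, y, vy = implication
    for bits in itertools.product((0, 1), repeat=len(INPUTS)):
        vals = simulate(dict(zip(INPUTS, bits)))
        if vals[x] == vx and vals[y] != vy:
            return False
    return True

candidates = candidate_implications(num_vectors=50)
validated = {c for c in candidates if validate(c)}
```

Simulation can only over-approximate the implication set: a true implication is never contradicted by a simulated vector, so it always appears among the candidates, while false candidates are removed by the validation step.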
So... how many “natural” implications are there?
[Figure: Total Number of Implications With Distance 2 or Greater. y-axis (log scale, 1 to 100000): Number of Implications; x-axis: Circuit]
Identifying “Subsumed” Implications
All the errors covered by a short-distance implication may sometimes also be covered by a long-distance implication.
[Figure: example circuit with nodes n1 through n13]
n10 = 0 → n13 = 0
n10 = 0 → n8 = 0
n4 = 1 → n8 = 0
n4 = 1 → n11 = 0
n4 = 1 → n13 = 0
n4 = 11 → n8 = 0
Reducing the Implication List
Subsumed implications are detected through structural analysis:
– Implications fall on the same path with appropriate “implied values”
– No fanout branches along the path
– The implication with the longest “distance” between implication sites is retained.
So, how much does this reduce the size of our implication lists?
[Figure: Implication List Size after Removing Subsumed Implications. y-axis (0 to 35000): Number of Implications; x-axis: circuit; series: before subsume, after subsume]
Compressing the Implication List While Maintaining Quality
Once subsumed implications are removed, the implication list may still be too long.
Evaluate the remaining implications for “implication quality”.
Implication quality is calculated for every implication/fault pair:
$$\text{implication quality} = 100 \times \left(1 - \frac{\text{missed}_{\text{fault}}}{\#\,\text{input patterns}}\right)$$
Each fault’s “highest quality” implication is added to the list.
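The compression step can be sketched as follows. This sketch assumes that implication quality for an implication/fault pair is 100 × (1 − missed / number of input patterns), where "missed" counts the input patterns on which the implication fails to flag the fault. That reading, the function names, and the sample counts are assumptions for illustration, not data from the slides.

```python
def implication_quality(missed, num_patterns):
    """Assumed quality metric: percentage of patterns on which the fault is not missed."""
    return 100.0 * (1.0 - missed / num_patterns)

def compress(missed_counts, num_patterns):
    """Keep, for every fault, the implication with the highest quality.

    missed_counts[implication][fault] = number of patterns on which that
    implication misses that fault (hypothetical bookkeeping format).
    """
    faults = {f for per_fault in missed_counts.values() for f in per_fault}
    kept = set()
    for fault in sorted(faults):
        covering = [imp for imp in missed_counts if fault in missed_counts[imp]]
        best = max(
            covering,
            key=lambda imp: implication_quality(missed_counts[imp][fault], num_patterns),
        )
        kept.add(best)
    return kept

# Made-up counts for three implications and two faults over 100 patterns.
example = {
    "I1": {"f1": 10, "f2": 90},
    "I2": {"f1": 40, "f2": 20},
    "I3": {"f1": 50, "f2": 60},
}
kept = compress(example, num_patterns=100)
```

In this made-up example I1 is the best implication for f1 and I2 is the best for f2, so I3 is dropped. The compressed list can never contain more implications than there are faults.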
[Figure: Number of "High-Quality" Implications after Compression Step. y-axis (0 to 1000): Number of Compressed Implications; x-axis: Circuit]
[Figure: Compression Ratios for Compressed Implication Set. y-axis (0 to 120): Compression Ratio; x-axis: Circuit; series: vs. initial, vs. subsumed]
Covering Faults with Implications
For each random input vector, and for each fault, the implications-based circuit operation can fall into one of the following 4 categories:
Case   Error Propagates To Output   An Implication is Violated   Outcome
1      yes                          yes                          True detection
2      no                           yes                          False positive
3      no                           no                           Benign miss
4      yes                          no                           True miss
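The four categories map directly onto two boolean observations per (input vector, fault) trial. The sketch below classifies trials and computes the share of output-visible errors that are caught, case 1 / (case 1 + case 4). The function names and the sample trials are illustrative assumptions, not results from the slides.

```python
from collections import Counter

def classify(error_at_output, implication_violated):
    """Return the case number (1 to 4) for one (input vector, fault) trial."""
    if implication_violated:
        return 1 if error_at_output else 2  # true detection / false positive
    return 4 if error_at_output else 3      # true miss / benign miss

def detection_rate(counts):
    """Percentage of errors reaching an output that also violate an implication."""
    observable = counts[1] + counts[4]
    return 100.0 * counts[1] / observable if observable else 0.0

# Made-up trials: (error propagates to output, an implication is violated).
trials = [(True, True)] * 3 + [(True, False)] + [(False, True)] * 2 + [(False, False)] * 4
counts = Counter(classify(err, viol) for err, viol in trials)
rate = detection_rate(counts)
```

Cases 2 and 3 do not enter the detection rate because no erroneous value reaches an output in either of them.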
[Figure: Distribution of the 4 Cases for Random Input Patterns. y-axis (0 to 70): % of Total; x-axis circuits: rd73, z5xp1, clip, z9sym, b12, misex2, c432, c499, c880, c1355, c1908]
Case 1: Error Propagated & Implication Violated
Case 2: Error NOT Propagated & Implication Violated
Case 3: Error NOT Propagated & Implication NOT Violated
Case 4: Error Propagated & Implication NOT Violated
[Figure: Average Detection Rate for Errors that Propagate to an Output. y-axis (0.0 to 100.0): case 1 / (case 1 + case 4); x-axis: circuit]
What is the hardware overhead?
Include all implications remaining after compression
Use a simple implementation for each implication (an AND gate and up to 2 inverters)
Outputs of the AND gates are OR’ed together
A 180nm TSMC library and the Mentor Graphics toolset were used to generate layout and calculate area overhead.
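A behavioral model of this checker can be sketched as follows. Each implication x = vx → y = vy is violated exactly when x equals vx and y differs from vy, which corresponds to one AND gate with an inverter on any input that must be 0. The tuple encoding, the function names, and the treatment of the final OR as a single wide gate are assumptions made for illustration.

```python
def checker_error(implications, values):
    """OR of the per-implication AND terms: 1 if any implication is violated."""
    return int(any(values[x] == vx and values[y] != vy
                   for x, vx, y, vy in implications))

def gate_estimate(implications):
    """Rough gate count for the checker (not an area figure)."""
    # An inverter is needed on x when the antecedent value is 0, and on y
    # when the implied value is 1 (the violation then occurs at y = 0).
    inverters = sum((vx == 0) + (vy == 1) for _, vx, _, vy in implications)
    return {
        "and": len(implications),
        "inv": inverters,
        "or": 1 if len(implications) > 1 else 0,  # assumed single wide OR
    }

# Encoded as (x, vx, y, vy): n10 = 0 -> n13 = 0 and n4 = 1 -> n8 = 0.
IMPLICATIONS = [("n10", 0, "n13", 0), ("n4", 1, "n8", 0)]

ok = checker_error(IMPLICATIONS, {"n4": 1, "n8": 0, "n10": 0, "n13": 0})
bad = checker_error(IMPLICATIONS, {"n4": 1, "n8": 1, "n10": 1, "n13": 0})
```

Here `ok` is 0 because both implications hold, while `bad` is 1 because n4 = 1 occurs together with n8 = 1.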
[Figure: Area Overhead Comparison for all Compressed Implications. y-axis (0 to 180): Overhead (%); x-axis: Circuit; series: DUPLICATION, PARITY, IMPLICATIONS]
Trading off Area Overhead and Coverage
Coverage/area tradeoffs are intuitively easy with implications
Threshold set for area overhead
Gate count used to estimate number of implications that can be included
Implications chosen by:
– Coverage of all faults
– Coverage of “most important” faults (more likely to be missed by test, more likely to cause important errors, etc.)
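One way to picture this selection is a greedy pass under a gate budget. The slides do not state which selection algorithm was used, so the greedy heuristic, the figure of 3 gates per implication, the function names, and the sample data below are all assumptions. Optional fault weights stand in for the “most important” faults.

```python
def max_implications(circuit_gates, overhead_percent, gates_per_implication=3):
    """Estimate how many implication checkers fit in the area threshold.

    Integer arithmetic on purpose; 3 gates per implication is an assumed
    worst case (one AND gate plus two inverters).
    """
    return circuit_gates * overhead_percent // 100 // gates_per_implication

def select(covers, budget, weights=None):
    """Greedy pick: repeatedly take the implication adding the most weighted coverage.

    covers[implication] = set of faults it detects (hypothetical format).
    """
    weights = weights or {}
    chosen, covered = [], set()
    remaining = dict(covers)

    def gain(imp):
        # Weighted count of faults this implication would newly cover.
        return sum(weights.get(f, 1.0) for f in remaining[imp] - covered)

    while remaining and len(chosen) < budget:
        best = max(sorted(remaining), key=gain)  # sorted() makes ties deterministic
        if gain(best) == 0:
            break  # nothing left to gain
        chosen.append(best)
        covered |= remaining.pop(best)
    return chosen, covered

# Made-up coverage data for a 60-gate circuit and a 10% threshold.
COVERS = {"I1": {"f1", "f2", "f3"}, "I2": {"f3", "f4"}, "I3": {"f4"}}
budget = max_implications(circuit_gates=60, overhead_percent=10)
chosen, covered = select(COVERS, budget)
```

Passing a `weights` dictionary that assigns a larger value to selected faults shifts the choice toward implications covering those faults, which corresponds to the second selection criterion.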
[Figure: Variation in Probability of an Undetected Error for Fixed Area Thresholds. y-axis (0 to 25): Average Probability of an Undetected Error; x-axis: Circuit; series: 10%, 20%, 30%, 40%, 50% area thresholds]
Conclusions
Implications serve as gate-level “assertions” that can be automatically discovered without detailed functional knowledge of the circuit design
Many implications naturally exist within circuits
Good coverage of many faults (often almost 90%)
Ideally suited to cost/coverage tradeoffs, especially for applications that require a significant reduction in error rates instead of “zero” errors
With only a 10% area overhead, the probability of an error being both observable and undetected is reduced to ~12% on average (and the actual error rate will be much less)