ELEC 7770: Advanced VLSI Design, Spring 2007
Soft Errors and Fault-Tolerant Design
Vishwani D. Agrawal, James J. Danaher Professor
ECE Department, Auburn University, Auburn, AL 36849
[email protected]
http://www.eng.auburn.edu/~vagrawal/COURSE/E7770_Spr07
Spring 07, Apr 17, 19. ELEC 7770: Advanced VLSI Design (Agrawal)

Soft Errors
- Soft errors are errors caused by the operating environment.
- They are not due to a permanent hardware fault.
- Soft errors are intermittent or random, which makes testing for them unreliable.
- One way to deal with soft errors is to make hardware robust:
  - Capable of detecting soft errors
  - Capable of correcting soft errors
  - Both measures are probabilistic

Some Early References
- J. von Neumann, “Probabilistic Logics and the Synthesis of Reliable Organisms from Unreliable Components,” pp. 329-378, 1959, in A. H. Taub, editor, John von Neumann: Collected Works, Volume V: Design of Computers, Theory of Automata and Numerical Analysis, Oxford University Press, 1963.
- M. A. Breuer, “Testing for Intermittent Faults in Digital Circuits,” IEEE Trans. Computers, vol. C-22, no. 3, pp. 241-246, March 1973.
- T. C. May and M. H. Woods, “Alpha-Particle-Induced Soft Errors in Dynamic Memories,” IEEE Trans. Electron Devices, vol. ED-26, no. 1, pp. 2-9, 1979.

Causes of Soft Errors
- Interconnect coupling (crosstalk).
- Power supply noise: IR-drop, delta-I.
- Effects generally attributed to alpha-particles:
  - Charged particles: electrons, protons, ions.
  - Radiation (photons): X-rays, gamma-rays, ultra-violet light.

Sources of Alpha-Particles
- Radioactive contamination in VLSI packaging material.
- Ionosphere, magnetosphere and solar radiation.
- Other electromagnetic radiation.

Alpha-Particle
- Helium nucleus: two protons and two neutrons, mass = $6.65 \times 10^{-27}$ kg, charge = +2e (e = $1.6 \times 10^{-19}$ C).
- Rest-mass energy ($mc^2$) = 3.73 GeV

Soft Error Rate (SER)
- Failures in time (FIT): one FIT is 1 error per billion hours of operation.
- An alternative unit is mean time between failures (MTBF).
- 1 year MTBF = $10^9/(365 \times 24)$ = 114,155 FIT
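
The FIT and MTBF conversion can be checked with a few lines of Python. This is only a sketch: the function names are made up for illustration, and a year is taken as 365 × 24 = 8760 hours, as in the slide's arithmetic.

```python
HOURS_PER_YEAR = 365 * 24   # 8760 hours, ignoring leap years as the slide does
HOURS_PER_FIT_UNIT = 1e9    # one FIT = one error per 10^9 device-hours


def mtbf_hours_to_fit(mtbf_hours: float) -> float:
    """Convert a mean time between failures (in hours) to a FIT rate."""
    return HOURS_PER_FIT_UNIT / mtbf_hours


def fit_to_mtbf_years(fit: float) -> float:
    """Convert a FIT rate back to an MTBF expressed in years."""
    return HOURS_PER_FIT_UNIT / fit / HOURS_PER_YEAR


one_year_fit = mtbf_hours_to_fit(HOURS_PER_YEAR)
print(round(one_year_fit))              # 114155
print(fit_to_mtbf_years(one_year_fit))  # back to (approximately) 1 year
```

The two units are reciprocal, so a part with a lower FIT rate has a proportionally longer MTBF.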

Particle Strike
[Figure: an ion or charged particle strikes an n region in a p-substrate, leaving + and - charges along its path.]

Induced Current
[Figure: plot of induced current versus time.]
$I(t) = I_0 (e^{-t/a} - e^{-t/b}), \quad a \gg b$

Voltage Induced at a Node
$V = Q/C$
where $Q = \int I(t)\,dt$ and $C$ = node capacitance.
Smaller node capacitance will result in larger voltage swing.
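
To make the dependence on capacitance concrete, the sketch below integrates the double-exponential pulse I(t) = I0(e^(-t/a) - e^(-t/b)) from the Induced Current slide. Integrating from 0 to infinity gives the closed form Q = I0(a - b), which the code checks numerically before applying V = Q/C. All numeric values (I0, a, b and the two capacitances) are hypothetical and chosen only for illustration, and the model ignores supply-rail clamping and any restoring current from the driving gate.

```python
import math


def induced_current(t, i0, a, b):
    """Double-exponential current pulse I(t) = I0 (exp(-t/a) - exp(-t/b))."""
    return i0 * (math.exp(-t / a) - math.exp(-t / b))


def collected_charge(i0, a, b):
    """Closed-form integral of I(t) from t = 0 to infinity: Q = I0 (a - b)."""
    return i0 * (a - b)


def numeric_charge(i0, a, b, steps=100_000):
    """Trapezoidal check of the same integral, truncated at t = 40a."""
    dt = 40 * a / steps
    total = 0.0
    previous = induced_current(0.0, i0, a, b)
    for k in range(1, steps + 1):
        current = induced_current(k * dt, i0, a, b)
        total += 0.5 * (previous + current) * dt
        previous = current
    return total


def node_voltage(charge, capacitance):
    """Voltage disturbance V = Q / C on an isolated node."""
    return charge / capacitance


# Hypothetical pulse: I0 = 100 uA, a = 200 ps, b = 50 ps
I0, A, B = 100e-6, 200e-12, 50e-12
q = collected_charge(I0, A, B)
for c in (50e-15, 10e-15):  # hypothetical node capacitances: 50 fF and 10 fF
    print(f"C = {c * 1e15:.0f} fF -> V = {node_voltage(q, c):.2f} V")
```

With these assumed numbers the collected charge is 15 fC, so shrinking the node from 50 fF to 10 fF raises the disturbance from about 0.3 V to about 1.5 V.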

Effect on Digital Circuit
[Figure: a clocked (CK) digital circuit with input IN, combinational logic and output OUT; charged particles are shown striking it at two points.]

An SRAM Cell
[Figure: SRAM cell with word line WL, supply VDD and the bit-line pair BL; the two storage nodes hold 0 and 1.]

SRAM Cell Struck by Alpha-Particle: Single-Event Upset (SEU)
[Figure: the SRAM cell struck by charged particles; the stored values flip, 0→1 and 1→0.]

D-Latch
[Figure: D-latch with input D, CK = 0 and output Q; the values 1 and 0 are marked on the latch nodes.]

SEU in D-Latch
[Figure: D-latch with CK = 0 struck by charged particles; the marked values flip, 1→0 at Q and 0→1.]

Single Event Transients in Combinational Logic
[Figure: combinational logic between clocked (CK) storage elements, annotated with logic values; charged particles strike the logic.]

Effects of Transients
- Error correcting effects
  - Transient pulse is filtered by gate inertia (electrical masking)
  - Transient is blocked by an unsensitized path (logical masking)
  - Transient is blocked by an inactive clock (latching-window masking)
- Error enhancing effects
  - Large number of gates can produce multiple pulses
  - Fanouts can multiply error pulses
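
The "unsensitized path" effect can be illustrated with a toy circuit. The sketch below is not from the lecture: it assumes a hypothetical circuit out = (a AND b) OR c and asks for which input vectors a transient that inverts the internal node n = a AND b actually reaches the output.

```python
from itertools import product


def circuit(a, b, c, flip_internal_node=False):
    """Hypothetical circuit: out = (a AND b) OR c, with optional upset on node n."""
    n = a & b
    if flip_internal_node:
        n ^= 1  # model a transient that inverts the internal node
    return n | c


def sensitized_vectors():
    """Input vectors for which the transient on n changes the output."""
    return [
        (a, b, c)
        for a, b, c in product((0, 1), repeat=3)
        if circuit(a, b, c) != circuit(a, b, c, flip_internal_node=True)
    ]


print(sensitized_vectors())
```

Only the four vectors with c = 0 propagate the transient; whenever c = 1 the OR gate blocks it, which is the path-blocking effect listed above.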

SEUs in FPGA
- Parts that can be affected
  - Look-up table (LUT)
  - Configuration memory cell
  - Flip-flop
  - Block RAM

LUT
[Figure: four-input look-up table with inputs F1, F2, F3, F4 and output out; the function is stored in a column of 16 memory cells.]

SEU in LUT
[Figure: the four-input LUT (inputs F1, F2, F3, F4, output out) struck by a charged particle; one memory cell is changed from 1 to 0.]
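
A LUT upset can be modeled as a single bit flip in a 16-entry table. In the sketch below, the stored function (out = F1 AND F2), the address ordering (F1 as the most significant bit) and the upset cell are all assumptions made for illustration; they are not the contents shown in the figure.

```python
from itertools import product


def lut_output(cells, f1, f2, f3, f4):
    """Read the memory cell selected by the four LUT inputs (F1 is the MSB)."""
    address = (f1 << 3) | (f2 << 2) | (f3 << 1) | f4
    return cells[address]


def inject_seu(cells, address):
    """Return a copy of the LUT contents with one memory cell inverted."""
    upset = list(cells)
    upset[address] ^= 1
    return upset


# Hypothetical configuration: out = F1 AND F2 (F3 and F4 are don't-cares)
golden = [f1 & f2 for f1, f2, f3, f4 in product((0, 1), repeat=4)]

# Upset the cell at address 1101, which stored a 1
faulty = inject_seu(golden, 0b1101)

changed = [
    inputs
    for inputs in product((0, 1), repeat=4)
    if lut_output(golden, *inputs) != lut_output(faulty, *inputs)
]
print(changed)  # the only input combination whose output is now wrong
```

The flip changes the implemented logic function rather than a stored data value, and the error persists until the cell is rewritten.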

Four Types of SEU in FPGA
[Figure: FPGA logic with a LUT (inputs F1 to F4), a flip-flop (FF), configuration memory cells (M) and a block RAM; the upsets labeled Type 1, Type 2, Type 3 and Type 4 are marked on these parts.]

SEU Detection Methods
- Hardware redundancy
- Time redundancy
- Error detection codes (EDC)
- Self-checker techniques
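
As a minimal illustration of an error detection code, the sketch below uses a single even-parity bit. The slide does not name a particular code, so parity is chosen here only because it is the simplest example; the function names are hypothetical.

```python
def add_parity(data_bits):
    """Append an even-parity bit so the stored word has an even number of 1s."""
    return list(data_bits) + [sum(data_bits) % 2]


def parity_error(word):
    """Return True if the stored word violates even parity."""
    return sum(word) % 2 == 1


stored = add_parity([1, 0, 1, 1])

single_upset = list(stored)
single_upset[2] ^= 1          # one SEU flips one bit

double_upset = list(single_upset)
double_upset[0] ^= 1          # a second flipped bit restores even parity

print(parity_error(stored))        # False: no error
print(parity_error(single_upset))  # True: single upset detected
print(parity_error(double_upset))  # False: double upset escapes detection
```

Parity detects any odd number of flipped bits but cannot locate or correct them, which is why correction requires stronger codes.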

SEU Mitigation Techniques
- Triple modular redundancy (TMR)
- Multiple redundancy with voting
- Error detection and correction codes (EDAC)
- Hardened memory cells
- FPGA-specific methods
  - Reconfiguration
  - Partial configuration
  - Rerouting design

Hardware Redundancy for Detection
[Figure: the inputs drive the combinational logic and a duplicated copy; one copy provides the output, and a comparison of the two copies gives logic 1 to indicate an error.]
- Hardware overhead is high, about 100%.
- Performance penalty is negligible.
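
Duplication with comparison can be sketched as follows. The logic function is a hypothetical stand-in, the comparator is modeled as an XOR of the two copy outputs (an assumption about the figure), and the `faulty_copy` argument is an illustrative way to inject a transient into one copy.

```python
from itertools import product


def logic(a, b, c):
    """Hypothetical combinational function standing in for the protected logic."""
    return (a & b) ^ c


def duplicated_logic(a, b, c, faulty_copy=None):
    """Evaluate two copies and compare them; returns (output, error_flag)."""
    out1 = logic(a, b, c)
    out2 = logic(a, b, c)
    if faulty_copy == 1:
        out1 ^= 1  # transient corrupts the first copy
    elif faulty_copy == 2:
        out2 ^= 1  # transient corrupts the duplicate
    error = out1 ^ out2  # logic 1 indicates an error
    return out1, error


for inputs in product((0, 1), repeat=3):
    assert duplicated_logic(*inputs) == (logic(*inputs), 0)
    assert duplicated_logic(*inputs, faulty_copy=1)[1] == 1
    assert duplicated_logic(*inputs, faulty_copy=2)[1] == 1
print("single-copy faults are flagged for all input vectors")
```

The scheme only detects the mismatch: the error flag does not say which copy is wrong, and an identical error in both copies would go unnoticed.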

Time Redundancy for Detection
[Figure: the output of the combinational logic is captured by two D flip-flops, one clocked by CK and the other by CK+d; a comparison of the two flip-flop outputs gives logic 1 to indicate an error.]
- Hardware overhead is low.
- Performance penalty (~d) = maximum detectable pulse width.
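
The relation between the delay d and the detectable pulse width can be seen in a small timing model. This is a sketch under simplifying assumptions: the transient is an ideal pulse that inverts the signal for its whole width, flip-flops sample instantaneously (no setup or hold time), and the time values are arbitrary units chosen for illustration.

```python
def sampled_value(correct, pulse_start, pulse_width, t):
    """Value seen at time t when a transient inverts the signal during the pulse."""
    in_pulse = pulse_start <= t < pulse_start + pulse_width
    return correct ^ 1 if in_pulse else correct


def time_redundant_capture(correct, pulse_start, pulse_width, t_ck, d):
    """Sample at CK and at CK+d; returns (q1, q2, error_flag)."""
    q1 = sampled_value(correct, pulse_start, pulse_width, t_ck)
    q2 = sampled_value(correct, pulse_start, pulse_width, t_ck + d)
    return q1, q2, q1 ^ q2  # logic 1 indicates an error


T_CK, D = 10.0, 2.0  # hypothetical clock edge and delay

# Pulse narrower than d: it can corrupt only one of the two samples
print(time_redundant_capture(1, pulse_start=9.5, pulse_width=1.0, t_ck=T_CK, d=D))

# Pulse wider than d: it covers both sampling instants and escapes detection
print(time_redundant_capture(1, pulse_start=9.5, pulse_width=3.0, t_ck=T_CK, d=D))
```

In the first case the samples disagree and the error flag is raised; in the second both samples are wrong and equal, so no error is reported. A larger d detects wider pulses but delays the result by about d.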

Repeat on Error Detection
[Figure: the combinational logic output is captured by two D flip-flops clocked by CK and CK+d, with logic 1 indicating an error; a C-element (C) drives the output.]
- Operation: if an error is detected, the output retains its previous value.
- Repeating the computation can produce the correct result.

Muller C-Element
[Figure: C-element symbol with inputs A and B, and an implementation built around an SR latch (S, R, Q).]
A  B  output
0  0  0
0  1  Old output
1  0  Old output
1  1  1
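
The truth table translates directly into a small behavioral model. The class name and the example input sequence are illustrative; the two inputs are taken to be the two flip-flop samples of the repeat-on-error scheme, which is an assumption about how the figure is wired.

```python
class CElement:
    """Behavioral model of a Muller C-element with inputs A and B."""

    def __init__(self, initial_output=0):
        self.output = initial_output

    def update(self, a, b):
        """Follow the inputs when they agree; otherwise hold the old output."""
        if a == b:
            self.output = a
        return self.output


c_element = CElement(initial_output=0)

# (A, B) pairs: agree on 1, disagree (error), agree on 0, disagree (error)
samples = [(1, 1), (0, 1), (0, 0), (1, 0)]
history = [c_element.update(a, b) for a, b in samples]
print(history)  # [1, 1, 0, 0]
```

Whenever the two samples disagree, the output keeps its previous value, so a corrupted sample never reaches the output and the computation can be repeated.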

Triple Modular Redundancy (TMR)
[Figure: the inputs drive three copies of the combinational logic (copy 1, copy 2, copy 3); a majority voter combines the three results into the output.]
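
A short fault-injection sketch shows what TMR can and cannot mask. The protected function is a hypothetical stand-in, and `faulty_copies` is an illustrative parameter naming which copies have their result inverted.

```python
from itertools import product


def majority(a, b, c):
    """Majority of three bits."""
    return (a & b) | (b & c) | (a & c)


def logic(x, y):
    """Hypothetical combinational function that is triplicated."""
    return x ^ y


def tmr(x, y, faulty_copies=()):
    """Evaluate three copies, invert the faulty ones, and vote."""
    results = []
    for copy in (1, 2, 3):
        value = logic(x, y)
        if copy in faulty_copies:
            value ^= 1  # soft error in this copy
        results.append(value)
    return majority(*results)


for x, y in product((0, 1), repeat=2):
    assert tmr(x, y, faulty_copies=(2,)) == logic(x, y)    # one bad copy is outvoted
    assert tmr(x, y, faulty_copies=(1, 3)) != logic(x, y)  # two bad copies win the vote
print("single faults masked; double faults are not")
```

Unlike duplication, TMR corrects a single faulty copy without repeating the computation, at the cost of roughly tripling the logic plus the voter.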

Majority Voter Circuit
[Figure: majority voter block with inputs A, B, C and one output, together with a gate-level implementation.]
A  B  C  output
0  0  0  0
0  0  1  0
0  1  0  0
0  1  1  1
1  0  0  0
1  0  1  1
1  1  0  1
1  1  1  1
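
The truth table corresponds to the Boolean expression output = AB + BC + AC. The sketch below checks that expression against the table and, as an illustrative extension not shown on the slide, applies it bitwise to whole words.

```python
from itertools import product

# Output column of the truth table, in the order ABC = 000, 001, ..., 111
TABLE_OUTPUTS = [0, 0, 0, 1, 0, 1, 1, 1]


def majority(a, b, c):
    """output = AB + BC + AC; works on single bits and bitwise on integers."""
    return (a & b) | (b & c) | (a & c)


computed = [majority(a, b, c) for a, b, c in product((0, 1), repeat=3)]
print(computed == TABLE_OUTPUTS)  # True

# Bitwise voting over three 8-bit copies, one of which has a flipped bit
good = 0b10110010
corrupted = good ^ 0b00001000
print(bin(majority(good, corrupted, good)))  # 0b10110010
```

Because the expression uses only AND and OR, the same function votes on every bit position of a word independently.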

Alternative Implementations of Voter
[Figure: two alternative voters with inputs A, B, C: a LUT holding the contents 00010111, and a circuit supplied from VDD.]
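
The LUT contents 00010111 can be checked against the majority function. In this sketch the address ordering (A as the most significant bit) is an assumption; since majority is symmetric in its inputs, any ordering of A, B and C gives the same result.

```python
from itertools import product

VOTER_LUT = "00010111"  # LUT contents shown in the figure


def lut_voter(a, b, c):
    """Look up the voted value; A is assumed to be the most significant address bit."""
    return int(VOTER_LUT[(a << 2) | (b << 1) | c])


def majority(a, b, c):
    """Gate-level reference: AB + BC + AC."""
    return (a & b) | (b & c) | (a & c)


matches = all(
    lut_voter(a, b, c) == majority(a, b, c)
    for a, b, c in product((0, 1), repeat=3)
)
print(matches)  # True
```

In an SRAM-based FPGA the LUT cells that hold these eight bits are themselves susceptible to upsets, so a voter built this way is not inherently protected.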

Triple Modular Redundancy (TMR)
[Figure: TMR variant in which a single combinational logic block (inputs to output) feeds D flip-flops clocked by CK, CK+d, CK+2d and CK+3d, with a majority voter producing the output.]

TMR for Memory Cells
[Figure: the combinational logic output is stored in three D flip-flops, all clocked by CK; a majority voter combines the three flip-flop outputs into the output.]
Problems:
1. Accumulation of errors in flip-flops.
2. Voter is not protected.

FF Refresh and TMR for Memory Cells
[Figure: three D flip-flops clocked by CK and four majority voters; the signals r1, r2 and r3 and the output are marked.]
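
The benefit of refreshing the flip-flops can be seen in a cycle-by-cycle sketch. It assumes that "refresh" means every flip-flop is reloaded with the voted value on each clock, and it simplifies the figure by using one ideal voter instead of several; the upset schedule is hypothetical.

```python
def majority(a, b, c):
    """Majority of three bits."""
    return (a & b) | (b & c) | (a & c)


def run_tmr_register(stored, upsets, refresh, cycles):
    """Simulate three flip-flops holding one bit; returns the voted output per cycle.

    upsets maps a cycle number to the index (0, 1 or 2) of the flip-flop hit by an SEU.
    """
    flip_flops = [stored] * 3
    outputs = []
    for cycle in range(cycles):
        if cycle in upsets:
            flip_flops[upsets[cycle]] ^= 1  # SEU flips one stored copy
        voted = majority(*flip_flops)
        outputs.append(voted)
        if refresh:
            flip_flops = [voted] * 3  # reload every flip-flop with the voted value
    return outputs


# Hypothetical schedule: flip-flop 0 is hit in cycle 1, flip-flop 1 in cycle 3
UPSETS = {1: 0, 3: 1}

print(run_tmr_register(1, UPSETS, refresh=False, cycles=5))  # [1, 1, 1, 0, 0]
print(run_tmr_register(1, UPSETS, refresh=True, cycles=5))   # [1, 1, 1, 1, 1]
```

Without refresh the first upset stays latent, so the second upset produces two wrong copies and a wrong vote; with refresh the first upset is overwritten before the second one arrives.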

A Resistor Hardened SRAM Cell
[Figure: SRAM cell with word line WL, supply VDD and the bit-line pair BL, hardened with resistors; the storage nodes hold 0 and 1.]

References
- F. L. Kastensmidt, L. Carro and R. Reis, Fault-Tolerant Techniques for SRAM-Based FPGAs, Springer, 2006.
- S. Mitra, N. Seifert, M. Zhang, Q. Shi, and K. S. Kim, “Robust System Design with Built-In Soft-Error Resilience,” Computer, vol. 38, no. 2, pp. 43-52, February 2005.

Summary of Topics Covered (1)
- Nanotechnology devices
- Moore’s law
- System level design for testability and test scheduling problem
- Verification
  - Logic equivalence
  - Binary decision diagrams
- Power consumption and low-power concepts
- Multi-core parallelism
- Microprocessors
- Memories

Summary of Topics Covered (2)
- Timing
  - Timing verification
    - Timing simulation
    - Static timing analysis
  - Timing optimization
    - Linear programming and clock constraints
    - Clock skew problem
    - Zero skew design
  - Retiming, constraint graph and performance optimization
- Soft errors and fault-tolerant design