EE 371: Debug Examples
J. Stinson © 2007
Intel [email protected]
Agenda
• Speedpath Failure
• Circuit Marginality: Noise
• Functional Failure
• Circuit Marginality: Multiple
• PowerUp Problems
Speedpath Failure
Speedpath Example: The Wall Shmoo
d004508A16588 -- #FF005 (92C)
3.2V |EEEEEEEE+++++++++++++
3.0V |EEEEEEEE+++++++++++++
2.8V |XEEEEEEE+++++++++++++
2.6V |XXEEEEEE+++++++++++++
2.4V |XXXXXEEE+++++++++++++
2.2V |XXXXXXXE+++++++++++++
2.0V |XXXXXXXXXX+++++++++++
     +^-----^-----^-----^--
      10.0  11.2  12.4  13.6
Legend: + = pass, E = wall fail, X = other fail
Axes: Voltage (vertical), Bus Period (horizontal). The + area is the passing region.
Slow Transistor Part
d004508A16588 -- #SS131 (92C)
3.2V |XXXXXEEE+++++++++++++
3.0V |XXXXXXEE+++++++++++++
2.8V |XXXXXXXE+++++++++++++
2.6V |XXXXXXXXX++++++++++++
2.4V |XXXXXXXXXXX++++++++++
2.2V |XXXXXXXXXXXXXXX++++++
2.0V |XXXXXXXXXXXXXXXXXXX++
     +^-----^-----^-----^--
      10.0  11.2  12.4  13.6

Fast Transistor Part
d004508A16588 -- #FF005 (92C)
3.2V |EEEEEEEE+++++++++++++
3.0V |EEEEEEEE+++++++++++++
2.8V |XEEEEEEE+++++++++++++
2.6V |XXEEEEEE+++++++++++++
2.4V |XXXXXEEE+++++++++++++
2.2V |XXXXXXXE+++++++++++++
2.0V |XXXXXXXXXX+++++++++++
     +^-----^-----^-----^--
      10.0  11.2  12.4  13.6

Legend: + = pass, E = wall fail, X = other fail
Skew-insensitive wall
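One way to make the "wall" visible in numbers is to find, for each voltage row, the first bus-period column that passes. The following is a minimal Python sketch, assuming the shmoo rows are available one voltage per line; `first_pass_column` is a hypothetical helper written for this illustration, not part of any tester tool. The rows are copied from the fast (#FF005) part.

```python
# Shmoo rows for the fast (#FF005) part; each column is one bus-period step.
FAST_PART = """\
3.2V |EEEEEEEE+++++++++++++
3.0V |EEEEEEEE+++++++++++++
2.8V |XEEEEEEE+++++++++++++
2.6V |XXEEEEEE+++++++++++++
2.4V |XXXXXEEE+++++++++++++
2.2V |XXXXXXXE+++++++++++++
2.0V |XXXXXXXXXX+++++++++++
"""


def first_pass_column(shmoo_text):
    """Map each voltage label to the index of its first passing ('+') column."""
    result = {}
    for line in shmoo_text.splitlines():
        label, cells = line.split("|")
        result[label.strip()] = cells.index("+")
    return result


edges = first_pass_column(FAST_PART)
for volts, col in edges.items():
    print(volts, "first pass at column", col)
```

A pass/fail edge that stays in the same column while the voltage changes is what the slides call a wall: raising the voltage does not buy any bus period.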
The Wall Debug
• Production test platform suspected
  – A timing setup problem
  – How could silicon act this way?
• However, the debug test platform confirmed the failure
  – Unlikely that two different platforms had the same timing error
  – Now we had to do the debug...
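Pattern Timeline
[Figure: timeline of the slow pattern starting at Clock 0, marking the pin failure, the first scan mismatch, and the speedpath clock. Debug process labels: Tester Shmoo (pin failure); Tester Shmoo, TAP (first scan mismatch); TAP, Clock Shrink, RTL Sim, Probing (speedpath clock).]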
Why was it a wall?
• Long Interconnect Line
  – RC Delay less sensitive to driver strength
  – Voltage/process only improve driver
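The effect can be illustrated with a toy delay model. This is a sketch under loud assumptions: the transistor portion of a path is taken to scale as 2.0/Vcc, the RC portion is taken to be independent of voltage, and all delay numbers are invented for illustration. None of these values come from the slides.

```python
def path_delay_ns(xtor_ns_at_2v, rc_ns, vcc):
    """Toy model: transistor delay scales as 2.0/vcc; RC delay ignores vcc."""
    return xtor_ns_at_2v * (2.0 / vcc) + rc_ns


# Two hypothetical paths with the same 10 ns total delay at 2.0 V.
paths = {
    "transistor-dominated": (9.0, 1.0),  # (transistor ns, RC ns) at 2.0 V
    "RC-dominated": (2.0, 8.0),
}

for name, (xtor, rc) in paths.items():
    d20 = path_delay_ns(xtor, rc, 2.0)
    d25 = path_delay_ns(xtor, rc, 2.5)
    print(f"{name}: {d20:.1f} ns at 2.0 V, {d25:.1f} ns at 2.5 V")
```

In this toy model the transistor-dominated path gains 1.8 ns from the voltage increase and the RC-dominated path gains only 0.4 ns, which is the qualitative behavior that produces a wall in the shmoo.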
Interconnect Effect
[Figure: data valid times of Path #1 and Path #2 against the clock edge they must be valid before, at Voltage = 2.0V. Each path's delay is split into transistor delay and RC delay.]
Interconnect Effect
[Figure: data valid times of Path #1 and Path #2 against the clock, at Voltage = 2.5V, split into transistor delay and RC delay. The transistor path shows big improvement; the RC path has little improvement.]
Why was it a wall?
Jam Sustainer
• Jam sustainer at end of the line
  – Fights transition of signal
  – Sustainer gets stronger with voltage/skew
  – Adds to “wall” characteristics
Wall Follow-up
• Two FIB experiments
  – Driver speedup: wall moved
  – Cut sustainer: wall “leaned”
Shmoo with Cut Sustainer
d004508A16588 -- #FF013 FIB
3.2V |EEEE+++++++++++++++++
3.0V |EEEEE++++++++++++++++
2.8V |XEEEE++++++++++++++++
2.6V |XXEEEE+++++++++++++++
2.4V |XXXXXEE++++++++++++++
2.2V |XXXXXXXE+++++++++++++
2.0V |XXXXXXXXXX+++++++++++
     +^-----^-----^-----^--
      10.0  11.2  12.4  13.6
Legend: + = pass, E = wall fail, X = other fail
Wall has leaned over
Circuit Marginality: Noise
Noise Example
• High Voltage Failure
  – Only one FAB showed signature
    • Second FAB seemed clean
  – Scan pointed to branch memory array

g196027*, part#ZC-106
3.3V |BCCCCCCCCCC
3.2V |BCBBCCCCCCC
3.1V |BBBBBBBBBBB
3.0V |BBBBBBBBBBB
2.9V |BBBBBBBBBBB
2.8V |BBBBBBBBBBB
2.7V |BBBBBBBBBBB
2.6V |BBBBBBBBBBB
2.5V |BBBBBBBBBBB
2.4V |+B+BBBBBBBB
2.3V |++++BB+++BB
2.2V |+++++++++++
2.1V |+++++++++++
2.0V |A++++++++++
     +^-----^----
      10.0  16.0
Legend: + = pass
Pass Low Voltage Only
Noise Debug
• EBeam confirmed branch array read
  – Visibility limited in array
• Bit 4 resolved later than other bits
  – Based on EBeam waveforms
• Signals on either side of read lines transitioned in opposite directions
  – Suspected coupling problem
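Coupling Schematics
[Figure: array read path with bitline pairs (Bit, Bit#), wordline (WL), sense amplifier enable (SAEN), and sense amplifier output (Out), with Bit/Bit# and SAEN waveforms.]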
Coupling Schematics
[Figure: array read path (Bit, Bit#, WL, SAEN, sense amplifier Out) with Bit/Bit# and SAEN waveforms. Coupling attacks the bitlines, and the sense amp senses the wrong value!]
BTB Coupling Debug
• Parametric data checked at problematic FAB
  – M2 CD’s wider than normal
  – ILD1 and ILD2 thicker than normal
  – More sensitive to coupling
• Audit of original design
  – Simulations ignored some coupling
  – New simulations showed failure
BTB Coupling Validation: FIB experiments
• Deposit extra capacitance on read line
  – Resists coupling from neighbors
• Extend sense amp pulse width
  – Gives more time for read to resolve
Coupling Schematics
[Figure: array read path (Bit, Bit#, WL, SAEN, sense amplifier Out) with Bit/Bit# and SAEN waveforms. The bitlines split correctly.]
Functionality Failure
Functionality Problem
• “Dash stepping” first silicon non-functional
  – Stepping was supposed to fix a min-delay race
• Suspected inadequate race fix
  – Scandiff confirmed same circuitry
  – EBeam also confirmed…
  – But visibility was limited
Functionality Debug
• Design team was confident in fix, so…
• Plan to strip back the entire block
  – Look for possible mask defect
  – Takes 4-10 days in FIB
However...
• Noticed a floating node in EBeam scope
Floating Node
Driven Metal Lines held at Vss
Electron Beam charges Floating Node
Floating Node
Floating Node Debug
• Node should NOT have been floating
• A0 and A1 layout compared
  – Via1 or M1 could cause error
• FIB strip back focused on this node
FIB Stripback Results
[Figure annotation: Should be 3 via1’s]
• FAB contacted
  – Accidentally used A0 via1 mask
• Problem fixed
  – New silicon arrived shortly
FIB Stripback Results
[Figure annotation: Good silicon has all 3 via1’s]
Fully functional with correct via1 mask
Functionality Summary
• Notice details
  – Focused stripback saved days of work
  – Very important during time critical debug
Circuit Marginality: Multiple Sources
Circuit Marginality
• Observed High Vcc failures
  – Frequency Insensitive
• TDO only failure
  – All signature mode tests were failing
  – Turning off signature mode allowed test to pass
High Vcc Shmoo
DQS Sigmode Shmoo (104C)
2.0V |XXXXXXXXXXXXXXXXXXXXX
     |XXXXXXXXXXXXXXXXXXXXX
1.8V |XXXXXXXXXXXXXXXXXXXXX
     |XXXXXXX++XXXXXX+++XXX
1.6V |AA+++++++++++++++++++
     |AA+++++++++++++++++++
1.4V |AAA++++++++++++++++++
     |AAAAAA+++++++++++++++
1.2V |AAAAAAAAAAAA+++++++++
     +^-----^-----^-----^--
      12.0  13.0  14.0  15.0
Legend: + = pass, X = High Vcc fail, A = other fail
Marginality Root Cause
• Scanout stopped working in failure region
  – Deduce scan chain itself was broken
• Probing was only way to root cause
  – Laser Voltage Probe was able to narrow failure down to Scan MSFF
  – Three different mechanisms observed
Scan MSFF Analysis: Pass
[Figure: scan MSFF schematic (Data, Clock, Clock#, Out, internal nodes N1 and N2) with Clock#, N1, and N2 waveforms for the passing case.]
Problem #1: Charge-Share
[Figure: scan MSFF schematic (Data, Clock, Clock#, Out, internal nodes N1 and N2) with Clock#, N1, and N2 waveforms. Charge-sharing causes a glitch on N1.]
MSFF Problem #2
[Figure: scan MSFF schematic (Data, Clock, Clock#, Out, internal nodes N1 and N2) with Clock#, N1, and N2 waveforms. Contention causes a larger glitch on N1.]
MSFF Problem #3
[Figure: scan MSFF schematic (Data, Clock, Clock#, Out, internal nodes N1 and N2) with Clock#, N1, and N2 waveforms. Cross coupling flips the state.]
Scan MSFF “Backwriting”
• Slave “backwrites” value into Master
  – Combination of three mechanisms to cause failure
• Re-simulated all standard cell MSFF’s
  – Two other cells flagged with same problem
• Circuit was a direct “shrink” from a previous process
  – Discovered same issue on prior process, but at a MUCH higher voltage
PowerUp and Initialization
PowerUp Issue
• Observed *some* systems wouldn’t boot
  – Toggling RESET always enabled boot
  – Toggling power did not guarantee boot
• Nasty problem to debug
  – System level issue (not seen on tester)
  – Intermittent failure (occurred 1 out of 100 times)
  – Debug tools not enabled (part hasn’t booted)
• Started with oscilloscope waveforms…
Oscope Waveforms
[Figure: oscilloscope traces of xxRESET#, xxPWRGOOD, and xxHIT#. xxHIT# should be tri-stated (pulled high); assertion of PWRGOOD enables tristate.]
Why is TriState determined by PWRGOOD?
• Discovered busclk dependency @
  – ACLOOP[1] directly controls I/O tristate signal
• Depends upon busclk for proper initialization
  – While !PWRGOOD, busclk is not generated
• Power-up initialization @ may generate a busclk: no issue
• Otherwise, must depend on power-up initialization of ACLOOP[1] ( )
• “Driven value” on I/O pins will depend on power-up initialization at
Why wasn’t the part booting?
• PWRGOOD will always clear the ACLOOP
  – Eventually the pins should tristate
  – So, why was the part still not booting?
• Further characterization: Power levels were very low
  – When the part failed to boot, the power was very low
  – Potentially indicated that the PLL wasn’t running
  – Discovered secondary effect of ACLOOP initialization problem
PLL Ratio Depends on PWRGOOD
PLL frequency ratio is determined at the assertion edge of PWRGOOD by sampling the address pins.
Final Root Cause
• System drives address pins at PWRGOOD assertion
  – Sets internal PLL frequency
  – Address pins are *supposed* to be tristated by the processor
• If ACLOOP powers up incorrectly, contention can occur
  – Processor is driving a ‘0’ on address pin; system is driving a ‘1’
  – The processor will always win
• PWRGOOD assertion tristates the address bus
  – Too late! It’s already been sampled by PWRGOOD assertion
  – Only “illegal” bus fractions will cause failure
    • Only 7 out of 32 possible bus fractions are “illegal”
• Failure requires a confluence of different events
  – ACLOOP powers up “on”
  – Bus clock does NOT glitch during power up
  – Address pins power up driving an “illegal” bus fraction
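The arithmetic behind the "confluence of events" can be sketched as follows. This is a back-of-envelope illustration only: it assumes the three events are independent and that all 32 bus fractions are equally likely at power-up, neither of which the slides state. The 1-in-100 failure rate and the 7-of-32 illegal fractions are taken from the slides.

```python
from fractions import Fraction

# From the slides: 7 of 32 bus fractions are "illegal"; boot failed ~1 in 100 times.
p_illegal_fraction = Fraction(7, 32)
p_observed_failure = Fraction(1, 100)

# Assumption: events are independent and bus fractions are uniformly likely, so
# P(fail) = P(ACLOOP on AND no busclk glitch) * P(illegal fraction).
p_acloop_on_and_no_glitch = p_observed_failure / p_illegal_fraction

print("Implied P(ACLOOP on and no busclk glitch):",
      p_acloop_on_and_no_glitch, "~", float(p_acloop_on_and_no_glitch))
```

Under these assumptions the first two events would have to coincide on only a few percent of power-ups, which is consistent with an intermittent failure that is hard to reproduce.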
2nd PowerUp Issue
• Observed *some* systems wouldn’t boot
  – Toggling RESET never enabled boot
  – Toggling power usually enabled boot
• Nasty problem to debug
  – Intermittent failure (occurred 1 out of 1000+ times)
• Some bright spots
  – Able to demonstrate on tester
    • Enabled “deterministic” behavior
    • Enabled debug tools (scan)
Vcc Shmoo (100x repeat)
NoBoot Shmoo (40C)
1.5V |+++++++++++++++++++++
     |+++++++++++++++++++++
1.4V |+++++++++++++++++++++
     |+++++++++++++++++++++
1.3V |A++++++++++++++++++++
     |AA+++++++X+++++++++++
1.2V |AAA++++++++++++++++++
     |AAAAAA+++++++++++++++
1.1V |AAAAAAAAAAAA+++++++++
     +^-----^-----^-----^--
      6.0   7.0   8.0   9.0
Legend: + = pass, X = fail, A = other fail
Vcc Shmoo (10000x repeat)
NoBoot Shmoo (40C)
1.5V |+++++++++++++++++++++
     |+++++++++++++++++++++
1.4V |XXXXXXXXXXXXXXXXXXXXX
     |XXXXXXXXXXXXXXXXXXXXX
1.3V |XXXXXXXXXXXXXXXXXXXXX
     |XXXXXXXXXXXXXXXXXXXXX
1.2V |XXXXXXXXXXXXXXXXXXXXX
     |XXXXXXXXXXXXXXXXXXXXX
1.1V |XXXXXXXXXXXXXXXXXXXXX
     +^-----^-----^-----^--
      6.0   7.0   8.0   9.0
Legend: + = pass, X = fail, A = other fail
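Why the repeat count matters so much for an intermittent failure can be sketched with a one-line probability model. This is an illustration, assuming each attempt fails independently with a fixed probability of 1 in 1000 (the slides say "1 out of 1000+", so the true rate may be lower); `detection_probability` is a hypothetical helper name.

```python
def detection_probability(p_fail, repeats):
    """Chance that at least one of `repeats` independent attempts fails."""
    return 1.0 - (1.0 - p_fail) ** repeats


P_FAIL = 1.0 / 1000.0  # assumed per-attempt failure rate

for repeats in (100, 10_000):
    prob = detection_probability(P_FAIL, repeats)
    print(f"{repeats} repeats per shmoo point: {prob:.4f} chance of seeing a failure")
```

With this assumed rate, a 100x repeat catches the failure at a given shmoo point less than one time in ten, while a 10000x repeat catches it almost every time. That matches the jump from a single stray X to a solid failing band.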
Temperature Shmoo (10000x repeat)
NoBoot Shmoo (40C)
80C |AA+++++++++++++++++++
    |+++++++++++++++++++++
60C |+++++++++++++++++++++
    |++X+XXX+++XXXX++XXX++
40C |XXXXXXXXXXXXXXXXXXXXX
    |++++XX+XXX++XXX+XX+XX
20C |+++++++++++++++++++++
    |+++++++++++++++++++++
0C  |+++++++++++++++++++++
    +^-----^-----^-----^--
     6.0   7.0   8.0   9.0
Legend: + = pass, X = fail, A = other fail
Scan Analysis
[Figure: Addr Decode block and two flip-flops (FF), labeled Scan Failure and Clean Scan.]
• Scan failure looked like OR of two entries
  – Common for multiple WL firing (dynamic read)
  – Uncommon for random logic
• Address decode was simple CMOS
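The "OR of two entries" signature can be checked mechanically once the array contents are known. The following is a minimal Python sketch with made-up 4-bit entries; `or_pairs` is a hypothetical helper, and the data values are invented for illustration rather than taken from the silicon.

```python
from itertools import combinations


def or_pairs(entries, observed):
    """Return index pairs (i, j) whose bitwise OR reproduces the observed value.

    Pairs where the observed value already equals one entry are skipped,
    because a correct single-wordline read would explain those.
    """
    return [
        (i, j)
        for (i, a), (j, b) in combinations(enumerate(entries), 2)
        if a | b == observed and observed not in (a, b)
    ]


# Hypothetical array contents and a hypothetical bad value captured by scan.
entries = [0b1010, 0b0101, 0b0011, 0b1100]
observed = 0b1110

print(or_pairs(entries, observed))
```

A hit from a check like this points toward two wordlines firing in the same dynamic read rather than toward a fault in random logic.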
Wordline Driver
• Used a fancy self-resetting mechanism
  – Self-reset WL prevented read→write min-delay
  – Pulsed WL read array for short period of time
[Figure: wordline driver schematic with Input, node n1, output rWL, a forward inverter, and a feedback inverter.]
Wordline Driver: Problem
• Self-reset sized differently than forward path
  – Initial state could flip forward inverter but not feedback (pseudo-metastable state)
• Resolving pseudo-metastable state
  – Access WL
  – High temp
  – Low temp
[Figure: wordline driver schematic (Input, n1, rWL, forward inverter) annotated with the values 6, 1, 1, 1.]
Summary
• Debug requires a lot of detective work
  – Review all the evidence
  – Develop experiments to eliminate possible problems
  – Develop theory of failure
  – Validate theory
• Can’t ignore ANY evidence
  – If something doesn’t fit, you’re missing something
• EVERY problem is different
  – Need to constantly think about alternative methods of validation
    • The Norwegian capacitor
    • The Kleveland voltmeter