Lehrstuhl Technische Informatik - Computer Engineering Brandenburgische Technische Universität Cottbus Transient and Permanent Faults in Nanoelectronic ICs: Compensation and Repair Problems, Solutions, Limitations H. T. Vierhaus BTU Cottbus Computer Engineering


Transient and Permanent Faults in Nanoelectronic ICs: Compensation and Repair

Problems, Solutions, Limitations

H. T. Vierhaus, BTU Cottbus

Computer Engineering


Outline

1. Introduction: Nanostructure Problems

2. Transient Faults

3. Repair of Permanent Faults

4. Bus Structures and NoCs

5. Diagnostic Test

6. A Lot of Things to do ...


1. Introduction

A bunch of new problems from nanostructures ...


Nanoelectronic Problems

Lithography:

The wavelength used to „map“ structural information from masks to wafers is larger (4 times or more at the smallest node) than the minimum structural features (193 nm versus 90 / 65 / 45 nm).

Adaptation of layouts for correction of mapping faults

Parameter variations:

The number of atoms in MOS transistor channels becomes so small that statistical variations of doping densities have an impact on device parameters such as threshold voltages.


Doping Fluctuations in MOS Transistors

[Figure: two MOS transistor cross-sections (p-substrate, n regions, poly-Si gate), each showing individual doping atoms]

Density and distribution of doping atoms cause shifts in transistor threshold voltages!


Nanostructure Problems

Individual device characteristics such as Vth are more dependent on statistical variations of underlying physical features such as doping profiles.

A significant share of basic devices will be „out of specs“ and need replacement by backup elements for yield improvement after production.

As smaller features mean higher stress (field strength, current density), early failures „in the field“ also become more likely and must be compensated.

Recognizing and compensating transient errors „in time“ is becoming a must, e.g. because charged particles can discharge circuit nodes.


Key Technologies

Fault tolerant computing

- An old technology that is already heavily used in everyday computing (e.g. memory interfaces with ECC check and correction).
- Is required to handle intermittent and transient fault effects, e.g. induced by radiation.
- Can handle only a limited number of permanent faults!

Built-in self test (BIST) and self-repair (BISR)

- Is required to handle permanent faults by self-repair using redundant elements.
- State-of-the-art for memories, not for logic.
- Can handle multiple faults (sequentially) until the resource of redundancy is exhausted.

Algorithms that are fully or partially „fault hard“

- Most DSP algorithms show an inherent „stability“ and work even under fault conditions with reduced precision. The effect can be „HW-enhanced“.


System-on-a-Chip (SoC)

[Figure: SoC block diagram with two DSPs and a RISC core, each with local memory, functional units FU 1 to FU 3, local bus, global bus, and bus coupler]

SoCs are heterogeneous systems that require test & repair strategies for:

- logic (also in processors)

- memory blocks

- interconnects

- analog and D/A components


Fault Tolerant Computing

Fault event, handled at one of three levels:

- Software-based fault detection & compensation: works only for transient faults!
- HW logic & RT-level detection & compensation: typically works for transient and permanent faults!
- Transistor- and switch-level compensation: typically works for specific types of transient faults only!

Scope labels in the diagram: specific, very specific, universal


2. Transient Fault Effects


Storage Nodes and Particles

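[Figure: charge Q in fC (logarithmic axis, 1 to 100) versus technology (0.35 / 0.25 / 0.18 / 0.09 µm), with the charge generated by an alpha particle marked for comparison]

A 1 MeV alpha particle generates 42 fC of charge!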
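
As a rough cross-check of the charge quoted above, the generated charge can be estimated from the particle energy. The sketch below assumes an average of 3.6 eV per electron-hole pair in silicon (a common textbook value, not taken from the slides) and that the full energy goes into pair creation; the function name is illustrative. Under these assumptions 1 MeV gives about 44.5 fC, the same order of magnitude as the 42 fC stated on the slide.

```python
ELEMENTARY_CHARGE_C = 1.602176634e-19  # coulombs per electron
EV_PER_PAIR_SILICON = 3.6              # assumed average energy per electron-hole pair


def deposited_charge_fc(energy_mev: float) -> float:
    """Charge in femtocoulombs if the whole energy creates electron-hole pairs."""
    pairs = energy_mev * 1e6 / EV_PER_PAIR_SILICON
    return pairs * ELEMENTARY_CHARGE_C * 1e15


if __name__ == "__main__":
    print(f"{deposited_charge_fc(1.0):.1f} fC")
```

The exact value depends on the assumed pair-creation energy and on how much of the charge is actually collected by the struck node.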


Contribution to Soft-Error Rates

Static combinational logic: 11 %

Sequential elements (FFs, Latches): 49 %

Unprotected SRAM: 40 %

Source: S. Mitra, N. Seifert, M. Zhang, Q. Shi, K. S. Kim, „Robust System Design with Built-In Soft Error Resilience“, IEEE Computer, Vol. 38, No. 2, Feb. 2005, pp. 43-52


Spikes and Clock Rates in Logic

[Figure: two timing diagrams (signal and clock over t) for a source pulse of 100 ps. In one case charge / status restoration is possible, in the other it is impossible.]

Fault probability in digital logic is roughly proportional to clock frequency!
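
The proportionality to clock frequency can be made plausible with a simple window model, which is an assumption and not taken from the slides: a transient pulse of width w arriving at a random time is captured only if a sampling clock edge falls inside it, so the capture probability per pulse is about w / T = w · f as long as w is shorter than the clock period T. Setup and hold windows and electrical masking are ignored.

```python
def capture_probability(pulse_width_s: float, clock_hz: float) -> float:
    """Probability that a randomly timed pulse overlaps a sampling clock edge."""
    period_s = 1.0 / clock_hz
    return min(1.0, pulse_width_s / period_s)


if __name__ == "__main__":
    for f_hz in (100e6, 1e9, 5e9):
        p = capture_probability(100e-12, f_hz)  # 100 ps pulse as on the slide
        print(f"{f_hz / 1e9:.1f} GHz: {p:.3f}")
```

In this model a 100 ps pulse is ten times more likely to be latched at 1 GHz than at 100 MHz.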


Logic Structures and Fault Events

[Figure: combinational logic between input FFs and output FFs, exposed to particle radiation]

Flip-flops need fault tolerance / fault hardening in the first place; logic close to the outputs comes next.


Muller-C-Element

[Figure: Muller C-element drawn with three AND gates (&) and one OR gate]

If both inputs are equal, the value is stored.

If the inputs are different, the previous value is kept.
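
The two rules above can be written down as a small behavioral model. The sketch below gives a rule-based version and a gate-level version using the standard majority form a·b + a·q + b·q, which is assumed here to correspond to the three AND gates and one OR gate of the figure; function names are illustrative.

```python
from itertools import product


def c_element(a: int, b: int, previous: int) -> int:
    """Follow the inputs when they agree, otherwise keep the previous output."""
    return a if a == b else previous


def c_element_gates(a: int, b: int, previous: int) -> int:
    """Same behavior from three AND terms and one OR, with output feedback."""
    return (a & b) | (a & previous) | (b & previous)


if __name__ == "__main__":
    for a, b, q in product((0, 1), repeat=3):
        print(a, b, q, "->", c_element(a, b, q), c_element_gates(a, b, q))
```

Both functions agree for all eight input combinations, which is why the element can act as a storage node that ignores a single disagreeing input.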


Fault-Tolerant Latch Design

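[Figure: the input „in“ feeds Latch 1 and Latch 2; their outputs outl1 and outl2 drive a Muller C-element that produces „out“ (load CL). Timing diagram v(t) with clock, outl1 and outl2: outl1 = in and outl2 = in in the transparent phases, outl1 and outl2 latched in between.]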

If clock is high: out = in


Fault Handling

Muller C-element:

If both inputs are equal: out = outl1 = outl2

If the inputs are not equal: out = previous value (of outl1, outl2)

Under local fault conditions on the latch outputs (one of the 2 latches false), the C-element preserves the output condition from the „charge“ phase of the latch.

[Figure: Latch 1 and Latch 2 (outputs outl1, outl2) feeding the Muller C-element between „in“ and „out“]

Essentially 3 latches!
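
A minimal behavioral sketch of this fault handling, assuming idealized latches and a single upset while the clock is low. The class and method names are hypothetical and only illustrate the principle; they are not a model of the actual circuit timing.

```python
class FaultTolerantLatch:
    """Two latches whose outputs feed a C-element that drives the output."""

    def __init__(self) -> None:
        self.outl1 = 0
        self.outl2 = 0
        self.out = 0

    def _c_element(self) -> None:
        # The output follows the latches only when both agree; otherwise it holds.
        if self.outl1 == self.outl2:
            self.out = self.outl1

    def clock_high(self, value: int) -> None:
        """Transparent phase: both latches load the input."""
        self.outl1 = value
        self.outl2 = value
        self._c_element()

    def upset(self, latch: int) -> None:
        """Flip one latch (1 or 2) to model a particle strike in the hold phase."""
        if latch == 1:
            self.outl1 ^= 1
        else:
            self.outl2 ^= 1
        self._c_element()


if __name__ == "__main__":
    cell = FaultTolerantLatch()
    cell.clock_high(1)
    cell.upset(1)
    print(cell.outl1, cell.outl2, cell.out)
```

After the upset the two latches disagree, so the output keeps the value captured during the transparent phase. A second upset on the other latch would make them agree on the wrong value, which this scheme cannot mask.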


Intel's Scan Path Element

[Figure: scan path cell with latches LA, LB, PH1 and PH2 and an OR gate. Signals: D, CLK, SI, SCA, SCB, Capture, Update, Q, SO.]


Intel's Scan Path Element plus Fault Compensation

[Figure: scan path cell with latches LA, LB, PH1 and PH2 (signals D, CLK, SI, SCA, SCB, Capture, Update, Test, Q, SO), extended by an AND gate, a C-element and a keeper latch]


TMR-Latch / Flip-Flop

[Figure: the input „in“ feeds FF1, FF2 and FF3 on a common clock; an XOR gate and a MUX with select signal cout form the output]

Out = L1out with cout = 1

Out = L2out with cout = 0

Can compensate static or dynamic faults in latches / FFs!

Works with latches or flip-flops.

FF1 is untestable (active redundancy)
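
The voting principle behind a TMR storage element can be sketched as follows. The first function is the generic two-out-of-three majority voter; the second is one possible XOR / MUX realization of the same function, which is an assumption and may differ in signal encoding from the circuit on the slide. All names are illustrative.

```python
from itertools import product


def majority_vote(a: int, b: int, c: int) -> int:
    """Classic two-out-of-three voter."""
    return (a & b) | (a & c) | (b & c)


def xor_mux_vote(a: int, b: int, c: int) -> int:
    """XOR compares two copies; the MUX falls back to the third on disagreement."""
    disagree = a ^ b
    return c if disagree else a


if __name__ == "__main__":
    for a, b, c in product((0, 1), repeat=3):
        print(a, b, c, "->", majority_vote(a, b, c), xor_mux_vote(a, b, c))
```

Both versions mask any single wrong copy, whether the error is transient or permanent, but neither reports that redundancy has been used up.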


TMR-Scan-Element

[Figure: TMR scan element with flip-flops ff1, ff2, ff3, multiplexers Mux1, Mux2, Mux3, an XOR gate and an AND gate. Signals: D, Scanin, SC, TMR, ic, clock, out.]

[Table: output source (ff1 or ff2) as a function of the control signals ic, TMR and SC at times t1 / t2, for the modes functional, scan-in (static / dynamic) and scan test (static / dynamic)]


TMR Scan-Element

Fault tolerant in functional mode

Fault tolerant in scan-mode

Optional support of test strategies that require a specific sequence of 2 input bits!


Fault tolerant Latches and FFs

No. of trans.

Contr. signals

fault tol. funct. dyn.

fault tol. scan dyn.

fault tol. FFs static

Latch with C-elem. [9]

20

0

yes

-

no

Scan-path cell + C-el. [9]

48

5 (2 clocks)

yes

no

no

TMR scan path elem.

Scan path cell [9]

34

4 (2 clocks)

2-pat. scan test

-

no

66

2 (1 clock)

yes

yes

yes

no no yes

TMR-latch

24

yes

yes

0

- -

no


Fault Compensation in Combinational Logic

[Figure: combinational logic fed by input FFs and exposed to particle radiation, with three DMC elements]


Fault Compensation in Combinational Logic

[Figure: three V(t) waveforms: fault-free signal, signal with glitch, signal with delayed glitch. Annotations: „MC capture“, „MC no capture / hold“, „MC capture“, „Latch close“, „Time left to capture!“]


3. Repair of Permanent Faults

Compensation of transient faults is not enough.

Some technologies for transient compensation can handle permanent faults, too, but not in the long run and not together with additional transient faults!


Memory Test & Repair

[Figure: memory array with lines and columns, line address, read / write lines, and a spare column]


Memory Test & Repair (2)

[Figure: memory array with lines and columns, line address, read / write lines and a spare column, extended by a memory BIST controller]

... is already state-of-the-art!


Logic Self Repair

[Figure: trade-off between three quantities]

- Size of replaced blocks (granularity)
- Repair procedure overhead
- Functioning elements lost


Granularity of Replacement

[Figure: axis „Granularity (transistors)“ from 10^0 to 10^6 with the markers trans., gate, macro, FPGA block, cores, CPU, plotted against „Expected fault density (1 out of ..)“. Labelled regions: „Hardly explored (logic)“, „Block-level replacement (e.g. FPGAs)“, „Core replacement (e.g. CPU)“.]


Levels of Repair

Transistors - Switch Level
- Replace transistors or transistor groups
- Losses by reconfiguration (switched-off „good“ devices): potentially small (20 to 50 %) for transistor faults
- Overhead for test and diagnosis: very high

Gate Level
- Replace gates or logic cells
- Losses by reconfiguration: medium (60 to 90 %) for single transistor faults
- Overhead for test and diagnosis: medium to high

Macro-Block Level
- Replace functional macros (ALU, FPU, CPU)
- Losses by reconfiguration: high, 99 % or more
- Overhead for test and diagnosis: low
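
The loss figures above can be made plausible with a simple count. The sketch assumes a single faulty transistor and that the whole replaced block is switched off, so the share of still-functional transistors lost is (N - 1) / N for a block of N transistors. The block sizes are illustrative assumptions, not values from the slides.

```python
def functional_loss(block_transistors: int, faulty: int = 1) -> float:
    """Fraction of a switched-off block that was still working."""
    return (block_transistors - faulty) / block_transistors


if __name__ == "__main__":
    for label, size in (("transistor pair", 2), ("4-transistor gate", 4),
                        ("10-transistor cell", 10), ("1000-transistor macro", 1000)):
        print(f"{label}: {functional_loss(size):.1%} of good devices lost")
```

With these sizes the model gives 50 % at the transistor level, 75 to 90 % at the gate level and 99.9 % for a macro, in line with the ranges quoted on the slide.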


Replacement in Regular Structures (e.g. for DSP)

[Figure: recursive digital filter with input x(n) and output y(n), built from delay elements (Z^-1), multipliers (x) with coefficients c0 ... cM and d1 ... dN, and adders (+); the delayed signals are x(n-1) ... x(n-M) and y(n-1) ... y(n-N). Annotations: „faulty“, „Macro-replacement“.]
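
The structure in the figure corresponds to a recursive (IIR) filter in direct form. Assuming the coefficient names c_0 ... c_M and d_1 ... d_N from the figure and a positive sign for the feedback terms (the sign convention is an assumption, since it cannot be read from the extracted figure), the difference equation reads:

```latex
y(n) = \sum_{k=0}^{M} c_k \, x(n-k) + \sum_{k=1}^{N} d_k \, y(n-k)
```

Every term uses the same pattern of one delay, one multiplier and one adder, which is what makes such a regular structure a natural candidate for macro-level replacement.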


Parallel Backup Transistors

[Figure: two-input CMOS gate (in1, in2, out, VDD, GND) shown twice: as a basic gate and as a gate with redundant transistors in parallel]


Redundancy by „Active“ Parallel Transistors

Active redundancy is not testable. Therefore there is no way to monitor the status of „available“ redundancy in a logic circuit.

Parallel transistors cannot compensate a fault of the „stuck-on“ type (transistor always conducting).

Faulty „backup“-transistors may produce additional faults that cannot be corrected!

Adding redundancy is not enough; fault isolation is a real problem!
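
These limits can be illustrated with a switch-level toy model, assuming ideal switches and only two fault types. A parallel pair conducts if either transistor conducts, so a stuck-off device is masked by its twin, while a stuck-on device (including a faulty backup) forces the pair to conduct permanently. All names are illustrative.

```python
from enum import Enum


class Fault(Enum):
    NONE = "none"
    STUCK_OFF = "stuck-off"
    STUCK_ON = "stuck-on"


def conducts(gate_on: bool, fault: Fault) -> bool:
    """Conduction state of one transistor under the given fault."""
    if fault is Fault.STUCK_OFF:
        return False
    if fault is Fault.STUCK_ON:
        return True
    return gate_on


def pair_conducts(gate_on: bool, main: Fault, backup: Fault = Fault.NONE) -> bool:
    """Two transistors in parallel, driven by the same gate signal."""
    return conducts(gate_on, main) or conducts(gate_on, backup)


def pair_behaves_correctly(main: Fault, backup: Fault = Fault.NONE) -> bool:
    """True if the pair still follows the gate signal for both input values."""
    return all(pair_conducts(g, main, backup) == g for g in (False, True))


if __name__ == "__main__":
    print(pair_behaves_correctly(Fault.STUCK_OFF))             # masked by the backup
    print(pair_behaves_correctly(Fault.STUCK_ON))              # not compensated
    print(pair_behaves_correctly(Fault.NONE, Fault.STUCK_ON))  # faulty backup breaks a good gate
```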


Configuration and Fault Isolation

[Figure: two-input gate (in1, in2, out, VDD, GND) with backup transistors and configuration switches controlled by An and Ap, shown for a stuck-on fault]


The Gate-Short-Problem

[Figure: a driver feeding Load 1 and Load 2, with a gate short]

GND shorts at gate inputs affect the whole fan-in network and make redundancy useless!


Gate Turn-off

[Figure: two-input gate (in1, in2, out, VDD, GND) with backup transistors, configuration switches (An, Ap) and additional input shut-off switches driven by gate_control]


Schematic Layout with VDD/GND Switches

[Figure: schematic layouts of a gate with parallel redundancy and of a gate with parallel redundancy and fault isolation (in1, in2, out, VDD, GND, switches Ap / An, pass-transistor gate separation, redundant stripes of p / n diffusion). Layer legend: n-diff., p-diff., poly-Si, metal 1, metal 2, contact, via.]


Transistor-Level Overhead

Redundancy                        parallel transistors   VDD / GND switches   separate gate poly lines
Overhead (cells only, estimates)  30-40 %                60-80 %              100-150 %
Stuck-off coverage                yes                    yes                  yes
Stuck-on coverage                 no                     yes                  yes
Gate shorts coverage              no                     no                   yes
Control lines                     none                   one wire             mult. wires


Duplicate Standard Cells

[Figure: two identical two-input gates, Gate 1 and Gate 2 (in1, in2, out, GND), supplied via VDD1 and VDD2 from a VDD switch with switch control]


Again: Fault Isolation

[Figure: duplicated gates Gate 1 and Gate 2 (in1, in2, out, GND) supplied via VDD1 and VDD2 from a VDD switch with switch control, annotated with the fault cases „Output VDD / GND short“ and „Gate input short“]


Administrated Duplicate Cells

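[Figure: Gate 1 and Gate 2 with separate power switches (VDD1, VDD2) and GND switches (GND1, GND2), controlled by Act 1 and Act 2, with gate in / gate out connections and a gate short marked. Logic values (0 / 1 / X) annotated at the switches show the configurations.]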


Features

- Use „normal“ cell designs.
- Four states of operation:
  Config. 1: Gate 1 active, Gate 2 isolated
  Config. 2: Gate 2 active, Gate 1 isolated
  Config. 3: Both gates active, operating in parallel
  Config. 4: Both gates isolated from VDD / GND
- Operations like „high / low power“ are possible.
- Cells can be put to temporary „sleep“ for stress relief.
- Permanent repair functions.
- The active cell output is connected only to „floating“ outputs of the other cell.
- If twin tubs are used and cell-internal tubs are also disconnected, gate input / GND shorts are prevented.


Bistable Switching Cell

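[Figure: Gate 1 and Gate 2 between VDD and GND, switched by a single control signal Act through a bistable cell with complementary values (0 / 1), including output separation]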


Cell Duplication and Power Switch

Possible for all types of cells (also flip-flops).

Granularity of partitioning for replacements (single gates, blocks) can be selected on demand.

Can be combined favorably with dynamic circuit optimization.

Good coverage potential for transistor faults.

Significant overhead (above 100 %), but most likely below Triple Modular Redundancy (TMR).

Redundancy may become exhausted and then requires a further level of redundancy!


Gate Replacement

[Figure: row of standard cells (gates) with a gate fault and a backup cell; insertion of the replacement cell]


Regular Logic Wiring

[Figure: regular wiring scheme in which a config block with feed / drive lines connects logic gates and a backup cell through links to the next cells]


Faults on Irregular Interconnects

[Figure: routing tree from signal source S to four sinks C, with a single fault (line break)]


Redundant Wiring

[Figure: routing tree with loops from signal source S to four sinks C; an extra wire (plus double vias!) covers a single fault (line break)]

Problem: classic delay calculation works well on trees only!


4. Bus Structures and „Networks on Chip“ (NoCs)

Technology forecasts predict that nano-wires may become the most vulnerable and unreliable circuit elements ...


Buses versus NoCs

[Figure: irregular bus structure (SoC) connecting five bus masters versus regular network structure (NoC) of nine NoC nodes]


Faults on Bus Structures

[Figure: bus connecting bus masters BM1 to BM6; a local defect affects the total network]


Bus Fault Conditions

A single permanent fault on a bus may affect the bus as a whole.

Fault detection and compensation by methods developed for transient faults (Hamming code, ECC checks) can handle static faults, but are relatively expensive.

Capabilities of handling transient faults on top of permanent faults are limited.

Technology forecasts predict a reliability problem with interconnects (nano-wires) in nano-technologies.
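
The limitation can be illustrated with a textbook Hamming(7,4) code, used here as a stand-in for whatever ECC a real bus would carry (the choice of code, bit ordering and function names are assumptions). A permanent stuck-at-0 on one line looks like a single-bit error and is corrected, but an additional transient on another line produces a double error that the code miscorrects.

```python
def hamming74_encode(data):
    """Encode 4 data bits as [p1, p2, d1, p3, d2, d3, d4] (positions 1..7)."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4
    p2 = d1 ^ d3 ^ d4
    p3 = d2 ^ d3 ^ d4
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_correct(word):
    """Return (corrected word, flagged position 1..7, or 0 if the syndrome is zero)."""
    w = list(word)
    s1 = w[0] ^ w[2] ^ w[4] ^ w[6]  # parity over positions 1, 3, 5, 7
    s2 = w[1] ^ w[2] ^ w[5] ^ w[6]  # parity over positions 2, 3, 6, 7
    s3 = w[3] ^ w[4] ^ w[5] ^ w[6]  # parity over positions 4, 5, 6, 7
    position = s1 + 2 * s2 + 4 * s3
    if position:
        w[position - 1] ^= 1
    return w, position


if __name__ == "__main__":
    sent = hamming74_encode([1, 0, 1, 1])

    stuck = list(sent)
    stuck[2] = 0                      # permanent stuck-at-0 on bus line 3
    print(hamming74_correct(stuck)[0] == sent)

    both = list(stuck)
    both[5] ^= 1                      # additional transient on bus line 6
    print(hamming74_correct(both)[0] == sent)
```

The first comparison succeeds and the second fails: once a permanent fault consumes the single-error correction capability, a further transient fault is no longer covered.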


Bus Segmentation

[Figure: bus connecting bus masters BM1 to BM6, divided into segments by segment couplers (SC)]

Structure the bus into segments that can be repaired individually!


The Switching Problem

[Figure: n bus lines switched onto n+k lines (including backup), switching stages 1 ... p]

n     k    p    switches    contr. states
8     1    1    16          9
16    1    1    32          33
32    2    2    128         65


Faults and Repair Actions

1. Line break: a section of a line is interrupted
   - use spare wire!

2. Line short to GND: a section of a line is connected to GND
   - use spare wire!

3. Dynamic coupling between adjacent lines:
   a. Re-allocate lines in bundle
   b. Insert grounded line for decoupling

4. Bridge between lines:
   a. Feed both lines with same signal
   b. Make one line „floating“


Single Line Replacement

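[Figure: bundle of signal lines (0 ... k-1) plus one backup line, with switches s0 ... s4 and segments b0 ... b2; one signal line is marked as faulty]

Overhead:

- 2k switches, (k+1) logic states for 1 backup line
- 2pk switches, p (k+1) logic states for p backup lines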
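
The overhead formulas above are easy to tabulate. The sketch assumes that k denotes the number of signal lines in the bundle and p the number of backup lines; the function name is illustrative.

```python
def backup_overhead(k: int, p: int = 1) -> tuple[int, int]:
    """Return (switches, logic states) as 2pk and p(k+1)."""
    return 2 * p * k, p * (k + 1)


if __name__ == "__main__":
    for k, p in ((8, 1), (16, 1), (32, 2)):
        switches, states = backup_overhead(k, p)
        print(f"k={k}, p={p}: {switches} switches, {states} logic states")
```

Both quantities grow linearly with the bundle width and with the number of backup lines, which is what makes this scheme expensive for wide buses.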


Inserting Lines for Decoupling

[Figure: bundle of signal lines (0 ... k-1) plus backup line, with switches s0 ... s4 and segments b0 ... b2; a coupling fault is marked]

Multiple line insertion for decoupling requires multiple shifts of lines, multiple switches and states!


Repair Mechanisms

Buses with „extra“ backup lines that need specific configuration for repair generate high cost in terms of switches and administration, due to the many „logic states“ of the bus section.

Such repair schemes are not suited to re-organizing neighborhood relations on buses for decoupling of lines.

Try to cover all relevant fault conditions by a small set of states using permutation of lines!

Page 60: Lehrstuhl Technische Informatik - Computer Engineering Brandenburgische Technische Universität Cottbus Transient and Permanent Faults in Nanoelectronic

Reconfiguration for De-Coupling

[Figure: bus lines s0 ... s5 routed through switch columns (SC), shown before and after the "reconfigure" step; labels ik]

… can help to minimize dynamic coupling faults!

2-way switches may be used!

Page 61: Lehrstuhl Technische Informatik - Computer Engineering Brandenburgische Technische Universität Cottbus Transient and Permanent Faults in Nanoelectronic

Characteristics of 6 / 8 Wire Bundles

Given a bundle of 6 or 8 bus lines:

Are there any permutations that create all-new neighbors for every single line in order to eliminate coupling faults?

6 lines:

NNP6: 0 - 2, 1 - 4, 2 - 0, 3 - 5, 4 - 1, 5 - 3

8 lines:

NNP81: 0 - 2, 1 - 6, 2 - 0, 3 - 5, 4 - 7, 5 - 3, 6 - 1, 7 - 4

NNP82: 0 - 3, 1 - 5, 2 - 7, 3 - 0, 4 - 6, 5 - 1, 6 - 4, 7 - 2

NNP83: 0 - 5, 1 - 7, 2 - 4, 3 - 6, 4 - 2, 5 - 0, 6 - 3, 7 - 1
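Whether a permutation really gives every line all-new neighbors can be checked mechanically. The sketch below is in Python and assumes that an entry "i - j" means line i is moved to position j, and that only directly adjacent positions couple; the function name is hypothetical.

```python
def creates_all_new_neighbors(perm):
    """True if no two originally adjacent lines are adjacent after permutation.

    perm[i] is the new position of line i (assumed reading of "i - j").
    """
    n = len(perm)
    if sorted(perm) != list(range(n)):
        raise ValueError("not a permutation of 0..n-1")
    # Lines i and i+1 were neighbors; they must not end up side by side.
    return all(abs(perm[i] - perm[i + 1]) != 1 for i in range(n - 1))


NNP6 = [2, 4, 0, 5, 1, 3]
NNP81 = [2, 6, 0, 5, 7, 3, 1, 4]
NNP82 = [3, 5, 7, 0, 6, 1, 4, 2]
NNP83 = [5, 7, 4, 6, 2, 0, 3, 1]

if __name__ == "__main__":
    for name, perm in [("NNP6", NNP6), ("NNP81", NNP81),
                       ("NNP82", NNP82), ("NNP83", NNP83)]:
        print(name, creates_all_new_neighbors(perm))
```

Under this reading all four listed permutations pass the check, while the identity mapping does not.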

Page 62: Lehrstuhl Technische Informatik - Computer Engineering Brandenburgische Technische Universität Cottbus Transient and Permanent Faults in Nanoelectronic

6 Wires: Permutations and Replacement

[Table: for each input wire 0 ... 5, the mapping through the 1st, 2nd and 3rd switching column (permutations NNP, PW2, PW3), the lines that can replace it, and the lines selected for backup]

Replacement possible by lines # (2 sw. col.):

Wire 0: 2, 5, 3

Wire 1: 4, 2, 0

Wire 2: 0, 1, 4

Wire 3: 5, 4, 1

Wire 4: 1, 3, 0

Wire 5: 3, 0, 2

Administration:

4 logic states for 2 sw.-columns (2 extra wires)

6 logic states for 3 sw.-columns (1 extra wire)

Page 63: Lehrstuhl Technische Informatik - Computer Engineering Brandenburgische Technische Universität Cottbus Transient and Permanent Faults in Nanoelectronic

Selection of Permutations

All single faults must be repairable by selecting a minimum set of permutations.

Those lines that can act as replacement for most of the others are selected as „backup lines“.

No permutation used for repair may map a functional line to a faulty line.

By permutation, also non-faulty functional lines are re-arranged.

Page 64: Lehrstuhl Technische Informatik - Computer Engineering Brandenburgische Technische Universität Cottbus Transient and Permanent Faults in Nanoelectronic

Permutations for 8-Wire-Bundles

New-neighborhood:

NNP1: 0 - 2, 1 - 6, 2 - 0, 3 - 5, 4 - 7, 5 - 3, 6 - 1, 7 - 4

NNP2: 0 - 3, 1 - 5, 2 - 7, 3 - 0, 4 - 6, 5 - 1, 6 - 4, 7 - 2

NNP3: 0 - 5, 1 - 7, 2 - 4, 3 - 6, 4 - 2, 5 - 0, 6 - 3, 7 - 1

Pair-wise symmetrical:

PW1: 0 - 1, 1 - 0, 2 - 3, 3 - 2, 4 - 5, 5 - 4, 6 - 7, 7 - 6

PW2: 0 - 6, 1 - 3, 2 - 4, 3 - 1, 4 - 2, 5 - 7, 6 - 0, 7 - 5

PW3: 0 - 4, 1 - 7, 2 - 5, 3 - 6, 4 - 0, 5 - 2, 6 - 3, 7 - 1

Page 65: Lehrstuhl Technische Informatik - Computer Engineering Brandenburgische Technische Universität Cottbus Transient and Permanent Faults in Nanoelectronic

8 Wires: Permutations and Replacement

[Table: for each bit 0 ... 7, the mapping through the switching columns (permutations NNP1, NNP2, PW3), the lines that can replace it, and the wires selected for backup]

Replacement possible by lines #:

Bit 0: 3, 2, 7

Bit 1: 5, 6, 4

Bit 2: 7, 0, 3

Bit 3: 0, 5, 1

Bit 4: 6, 7, 2

Bit 5: 1, 3, 0

Bit 6: 4, 1, 5

Bit 7: 2, 4, 6

2 lines selected for backup!
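The choice of backup lines can be explored with a short search. The sketch below is in Python and rests on assumptions that are not stated on the slides: two switching columns applying NNP1 and then NNP2 give four logic states (neither, NNP1 only, NNP2 only, both), and a backup line can stand in for any line it is mapped onto in one of these states. All function names are hypothetical.

```python
from itertools import combinations

# perm[i] = line that line i is mapped onto (8-wire permutations from the slides)
NNP1 = [2, 6, 0, 5, 7, 3, 1, 4]
NNP2 = [3, 5, 7, 0, 6, 1, 4, 2]


def compose(first, second):
    """Permutation obtained by applying `first`, then `second`."""
    return [second[first[i]] for i in range(len(first))]


IDENTITY = list(range(8))
# Four logic states of two switching columns (assumed model).
STATES = [IDENTITY, NNP1, NNP2, compose(NNP1, NNP2)]


def reachable(wire):
    """Lines that `wire` is mapped onto in at least one logic state."""
    return {state[wire] for state in STATES}


def covering_backup_sets(n_backup):
    """All sets of n_backup lines that together reach every line."""
    wires = range(8)
    return [combo for combo in combinations(wires, n_backup)
            if set().union(*(reachable(w) for w in combo)) == set(wires)]


if __name__ == "__main__":
    for w in range(8):
        print(w, sorted(reachable(w) - {w}))
    print(covering_backup_sets(2))
```

In this model the lines reachable from each bit match the replacement lists above, no single line reaches all eight, and several pairs of lines do, which is consistent with selecting 2 lines for backup.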

Page 66: Lehrstuhl Technische Informatik - Computer Engineering Brandenburgische Technische Universität Cottbus Transient and Permanent Faults in Nanoelectronic

8 Wires: Permutations and Replacement

[Table: same permutations (NNP1, NNP2, PW3) and replacement lines per bit as on the previous slide, with additional wires marked as backup]

4 lines selected for backup!

Page 67: Lehrstuhl Technische Informatik - Computer Engineering Brandenburgische Technische Universität Cottbus Transient and Permanent Faults in Nanoelectronic

Overhead / Coverage for 6-Line-Bundle

Spare lines / Switches    0 / 12    1 / 36    2 / 24

Single line fault           -         +         +

Dyn. coupl. faults          +         +         +

Double line faults          -         -         50%

Page 68: Lehrstuhl Technische Informatik - Computer Engineering Brandenburgische Technische Universität Cottbus Transient and Permanent Faults in Nanoelectronic

Overhead / Coverage for 8-Line-Bundle

Spare lines (out of 8) / Switches    0 / 16    1 / 48    2 / 32    3 / 32    4 / 32

Single line fault                      -         +         +         +         +

Dyn. coupl. faults                     +         +         ++        ++        ++

Double line faults                     -         -         20%       30%       100%

Note: The number of switches is reduced by a factor of 2 if full 2-way switches with 2 inputs / 2 outputs are used!

Page 69: Lehrstuhl Technische Informatik - Computer Engineering Brandenburgische Technische Universität Cottbus Transient and Permanent Faults in Nanoelectronic

Results

Bus segments can favorably be organized into bundles of 8 lines for reconfiguration. Wider bundles require even more columns of switches.

In a bundle of 8 lines, all single faults can be repaired either by one backup line and 3 columns of switches (6 logic states) or by two backup lines and 2 columns (4 logic states).

Two columns with 4 states also allow for two alternative modes of changing neighborhood relations for de-coupling. They also cover a fraction of double-line faults.

Full coverage of double-line faults requires 4 backup lines and 2 columns of switches, or 2 backup lines and 4 columns.

Page 70: Lehrstuhl Technische Informatik - Computer Engineering Brandenburgische Technische Universität Cottbus Transient and Permanent Faults in Nanoelectronic

Administration Scheme

[Figure: bus lines 0 ... 7 between two switch columns (SC) with sides A and B; each SC has switches C1 and C2 and a config logic block with decode of the config bits; matching of in / out lines 0 ... 7 and 0' ... 7']

Page 71: Lehrstuhl Technische Informatik - Computer Engineering Brandenburgische Technische Universität Cottbus Transient and Permanent Faults in Nanoelectronic

Processor-Based Bus Test

[Figure: test processor and several bus masters on a bus with a bus reflector; signals: clock, reflector select, invert control, data lines]

Page 72: Lehrstuhl Technische Informatik - Computer Engineering Brandenburgische Technische Universität Cottbus Transient and Permanent Faults in Nanoelectronic

Test and Fault Diagnosis

[Figure: test processor with segment status list; bus masters (BM) connected through bus segments with switch columns (SC)]

Page 73: Lehrstuhl Technische Informatik - Computer Engineering Brandenburgische Technische Universität Cottbus Transient and Permanent Faults in Nanoelectronic

Upcoming: Test Procedure & Fault Management

Test processor can „reset“ control of bus sections.

Test processor runs a diagnostic test to identify faulty lines.

In case of faults, a „trial and error“ test identifies the faulty line segment(s).

Test processor keeps a „fault list“ for redundancy management & supervision.
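The procedure above can be illustrated with a small simulation. This is only a sketch in Python under assumed conditions: faults are modeled as stuck-at values on a line within one segment, each segment ends in a reflector that returns the data to the test processor, and reflectors are selected from nearest to farthest. The fault model, `reflect` and `diagnose` are all hypothetical and do not describe the actual test hardware.

```python
from typing import Dict, Tuple

WIDTH = 8  # lines per bundle (assumed)

# Hypothetical fault model: (segment, line) -> stuck-at value (0 or 1)
Faults = Dict[Tuple[int, int], int]


def reflect(pattern: int, reflector: int, faults: Faults) -> int:
    """Simulate sending `pattern` out to reflector number `reflector` and back."""
    value = pattern
    for (segment, line), stuck in faults.items():
        if segment <= reflector:  # fault lies on the path to this reflector
            if stuck:
                value |= 1 << line
            else:
                value &= ~(1 << line)
    return value


def diagnose(n_segments: int, faults: Faults) -> Dict[int, int]:
    """Return a fault list: faulty line -> first segment at which it fails."""
    fault_list: Dict[int, int] = {}
    all_ones = (1 << WIDTH) - 1
    for reflector in range(n_segments):       # nearest segment first
        for pattern in (0, all_ones):         # exposes stuck-at-1 and stuck-at-0
            diff = reflect(pattern, reflector, faults) ^ pattern
            for line in range(WIDTH):
                if (diff >> line) & 1 and line not in fault_list:
                    fault_list[line] = reflector
    return fault_list


if __name__ == "__main__":
    print(diagnose(4, {(2, 5): 0, (1, 3): 1}))
```

Stepping through the reflectors in order is one way to realize the trial-and-error search: the first reflector at which a line fails bounds the faulty segment.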

Page 74: Lehrstuhl Technische Informatik - Computer Engineering Brandenburgische Technische Universität Cottbus Transient and Permanent Faults in Nanoelectronic

Summary

A simple scheme of re-arranging bus sections for repair of permanent faults.

Simple control scheme based on few logic states.

The number and the electrical effect of switches in complex bus systems may still cause problems.

Modular approach based on bundles of lines is scalable to cover wider buses. Should work well with NoCs.

Compatibility with regular schemes for bus test based on a dedicated test processor device.

Page 75: Lehrstuhl Technische Informatik - Computer Engineering Brandenburgische Technische Universität Cottbus Transient and Permanent Faults in Nanoelectronic

5. Diagnostic Tests

Fault diagnosis by diagnostic (self-) test is possibly the real bottleneck in logic BISR!

Page 76: Lehrstuhl Technische Informatik - Computer Engineering Brandenburgische Technische Universität Cottbus Transient and Permanent Faults in Nanoelectronic

Fault Diagnosis

Memory cells are easy to diagnose in case of faults affecting single cells. BIST is possible.

Diagnostic tests of buses that have to discover a single faulty line are straightforward. They can easily find which wires are affected, but not where the fault is.

Detecting a faulty gate or even transistor in a logic block is a much more challenging problem. Diagnosis must be compatible with methods of test response compaction used in scan testing.

Intelligent encoding for test responses! ... such as done by U. Potsdam and Infineon!

Page 77: Lehrstuhl Technische Informatik - Computer Engineering Brandenburgische Technische Universität Cottbus Transient and Permanent Faults in Nanoelectronic

Combinational Logic Fault Diagnosis

[Figure: combinational logic between input FFs and output FFs]

Faults can occur within specific gates, on interconnects, or in a „distributed“ manner. Identifying a specific faulty gate or line is difficult at best and sometimes close to impossible by logic testing.

Page 78: Lehrstuhl Technische Informatik - Computer Engineering Brandenburgische Technische Universität Cottbus Transient and Permanent Faults in Nanoelectronic

Logic Test

[Figure: combinational logic with (pseudo-) inputs and (pseudo-) outputs; an input vector is applied and an output vector is observed]

Page 79: Lehrstuhl Technische Informatik - Computer Engineering Brandenburgische Technische Universität Cottbus Transient and Permanent Faults in Nanoelectronic

Scan Path Technology

[Figure: combinational logic with (pseudo-) inputs and (pseudo-) outputs, input vector and output vector; flip-flops (ff) chained into a scan path from scan-in to scan-out]

Page 80: Lehrstuhl Technische Informatik - Computer Engineering Brandenburgische Technische Universität Cottbus Transient and Permanent Faults in Nanoelectronic

Scan-based Logic Test

[Figure: compacted / encoded test information, de-compactor, scan chains through combinational logic (CL), test response compactor, diagnosis coding]

Page 81: Lehrstuhl Technische Informatik - Computer Engineering Brandenburgische Technische Universität Cottbus Transient and Permanent Faults in Nanoelectronic

Fault Diagnosis on Compacted Output Data

[Figure: scan input generator (de-compactor); scan outputs pass through AND gates (&) with d-value storage d0 ... d6 into a MISR, which is compared against a reference MISR]

scan clock

MISR clock: k * scan-clock

*patented, U. Potsdam and Infineon Technologies AG
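For readers unfamiliar with signature compaction, the comparison of a MISR against a reference MISR can be sketched in a few lines of Python. This is a generic multiple-input signature register, not the patented diagnosis scheme referred to above; the register width, the feedback taps (0x1D, corresponding to x^8 + x^4 + x^3 + x^2 + 1) and the response values are assumptions chosen for illustration.

```python
def misr_step(state: int, inputs: int, taps: int, width: int) -> int:
    """One MISR clock: LFSR shift with feedback, then XOR of the parallel inputs."""
    mask = (1 << width) - 1
    msb = (state >> (width - 1)) & 1
    state = (state << 1) & mask
    if msb:
        state ^= taps  # apply feedback polynomial (Galois form)
    return state ^ (inputs & mask)


def signature(responses, taps: int = 0x1D, width: int = 8, seed: int = 0) -> int:
    """Compact a sequence of parallel test responses into one signature."""
    state = seed
    for response in responses:
        state = misr_step(state, response, taps, width)
    return state


if __name__ == "__main__":
    reference = [0x12, 0x34, 0x56, 0x78]   # fault-free responses (made up)
    observed = [0x12, 0x36, 0x56, 0x78]    # one flipped bit in the second word
    print("fault detected:", signature(observed) != signature(reference))
```

A mismatch between the two signatures shows that a fault exists, but the compaction discards the information about which scan cell captured it, which is why diagnosis on compacted data needs additional encoding.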

Page 82: Lehrstuhl Technische Informatik - Computer Engineering Brandenburgische Technische Universität Cottbus Transient and Permanent Faults in Nanoelectronic

6. A Lot of Work to Do

Logic fault diagnosis

Efficient logic self repair

Redundancy supervision and management

Resource management under fault conditions

Repair functions for interconnects

Overall system-level fault management