1
Department of Electrical and Computer Engineering, University of Virginia Xinfei Guo, Mircea R. Stan Enabling Wearout-Immune BEOL and FEOL with Active Rejuvenation {xg2dt, mircea}@virginia.edu BEOL & FEOL Wearout Wearout Issues Previous Solutions Tolerate - Design for the worst case (margins, upsizing) Compensate - Dynamically adapt to wearout Slow down the wearout - Reduce the stress Passive Recovery - Remove the stress, shut off the core Consider FEOL and BEOL separately! This Work Repair Both wearout completely by introducing the notion of Accelerated & Active Recovery Circadian Rhythms for FULL recovery Demonstrate both solutions experimentally Introduce Accelerated & Active Recovery as a new design knob for cross-layer resilience Wearout Accelerated & Active Recovery Interface Board Chip Data Sampling 16-b fref clk in Cout 16 En En 75 LUTs Circuit Under Test (CUT) rst Thermal Chamber Counter Accelerated & Active Recovery for 12 hours 24.5 25 25.5 26 Frequency(MHz) Accelerated Stress for 48 hours 72.4% An example where about 72.4% of wearout is recovered by accelerated self-healing techniques in only ¼ of stress time (measured). Accelerated self-healing space exploration (model prediction) Related Publications [Submitted] Enabling Wearout-Immune BEOL and FEOL with Active Recovery [INTEGRATION, the VLSI Journal] Implications of Accelerated Self-Healing as a Key Design Knob for Cross-Layer Resilience[ASPDAC ‘16] “Work hard, sleep well - Avoid irreversible IC wearout with proactive rejuvenation” [SELSE ‘15] “MCPENS: Multiple-Critical-Path Embeddable NBTI Sensors for Dynamic Wearout Management” [DAC ‘14] “Modeling and Experimental Demonstration of Accelerated Self -Healing Techniques” Sleep-when-getting-tired for FULL recovery Main Idea Schedule accelerated and active recovery for both wearout before the irreversible wearout kicks in and do this periodically Cross-Layer Implementations -V Active Recovery EN Circuit Wearout Sensors Architecture Dark Silicon Program Counters +1 System Virtual Sensors Load Balancer Heating Elements Accelerated Recovery EN Negative Voltage Generator Proactive Scheduler Redundant Resources Cross-layer Accelerated Self-Healing Core sensors Sensor Sensor Proactive Accelerated From To cores Apply to Sleep Cores Active Cores Accelerated router Scheduler & Active Recovery outputs sleep cores outputs Applications Scheduler & Active Recovery Blocks Core Allocation Load Balancer Heat for accelerated recovery A potential implementation of Cross-layer Accelerated Self-Healing in a NoC system 2016 IEEE/ACM Workshop on Variability Modeling and Characterization (VMC), Austin, TX BEOL FEOL Electromigration (EM) Negative/Positive-bias temperature instability (N/PBTI) ? System Circuit Device BTI |V th | increase EM → R increase Faults, Errors, Reduced MTTF Delay increase (e.g. 10% in 3 years), Timing violations Resistance Recording Thermal Chamber Constant Current Supply Device (Wire) under test BEOL Wearout (EM) Test Setup EM Test Chip Die Photo Metal wire under test Probe Pads FEOL Wearout (BTI) Test Setup On-chip Wire Information BTI Test Structure Vsg = 0, room Passive Recovery Vsg = 0, high Accelerated Recovery Activate the recovery Vsg = negative room temperature Active Recovery: Vsg = negative high temperature Accelerated & Active 1 2 3 4 Recovery temperature temperature I = 0, room Passive Recovery I = 0, high Accelerated Recovery Activate the recovery I = negative room temperature Active Recovery: I = negative high temperature Accelerated & Active 1 2 3 4 Recovery temperature temperature BTI (FEOL Wearout) active and accelerated recovery EM (BEOL Wearout) active and accelerated recovery Experimental Results FEOL Measurement Results BEOL Measurement Results 72.8 73 73.2 73.4 73.6 73.8 74 74.2 74.4 74.6 74.8 0 200 400 600 800 1000 1200 R () Time (min) Stress (230C, +100mA) Recovery (230C, -100mA) There is still a permanent part Summary BTI - 72.4% of the wearout is recovered within only 1/4 of the stress time through both high temperature and negative voltage EM - >75% of the wearout is recovered within 1/6 of the stress time with reverse current and high temperature In both cases, there is still an irreversible (permanent) component Fresh Turbo-boost Frequency Time Design Margin Average Frequency Negative Days Hours End of life Years Wearout No recovery Active Sleep FEOL recovery under different “circadian rhythms” Recovery under Different “circadian rhythms” 2X 1X Reduction of Design Margin >60X BEOL recovery right after void growth phase starts 76 77 78 79 80 0 200 400 600 800 R () Time (min) EM Stress (230C, +100mA) EM Accelerated & Active Recovery (230C, -100mA) Fully Recovered! Start Recovery EM induced by reverse current operation

Enabling Wearout-Immune BEOL and FEOL with Active ...people.virginia.edu/~xg2dt/papers/Xinfei Guo_VMC_Poster_48x36in.pdf · Enabling Wearout-Immune BEOL and FEOL with Active Rejuvenation

  • Upload
    doannhu

  • View
    222

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Enabling Wearout-Immune BEOL and FEOL with Active ...people.virginia.edu/~xg2dt/papers/Xinfei Guo_VMC_Poster_48x36in.pdf · Enabling Wearout-Immune BEOL and FEOL with Active Rejuvenation

Department of Electrical and Computer Engineering, University of Virginia

Xinfei Guo, Mircea R. Stan

Enabling Wearout-Immune BEOL and FEOL with Active Rejuvenation

{xg2dt, mircea}@virginia.edu

BEOL & FEOL Wearout

Wearout Issues

Previous SolutionsTolerate - Design for the worst case (margins, upsizing)

Compensate - Dynamically adapt to wearout

Slow down the wearout - Reduce the stress

Passive Recovery - Remove the stress, shut off the core

Consider FEOL and BEOL separately!

This WorkRepair Both wearout completely by introducing the notion

of Accelerated & Active Recovery

Circadian Rhythms for FULL recovery

Demonstrate both solutions experimentally

Introduce Accelerated & Active Recovery as a new design

knob for cross-layer resilience

Wearout

Accelerated & Active Recovery

Interface Board ChipData Sampling

16-b

frefclk

in

Cout

16

En En

75 LUTs

Circuit Under Test (CUT)

rst

(a) (b)

Thermal Chamber

Counter

Accelerated & Active

Recovery for 12 hours24.5

25

25.5

26

Fre

qu

ency

(MH

z)

Accelerated Stress for 48 hours

72.4%

An example where about 72.4% of wearout is

recovered by accelerated self-healing techniques in

only ¼ of stress time (measured).Accelerated self-healing space exploration (model prediction)

Related Publications• [Submitted] Enabling Wearout-Immune BEOL and FEOL with Active Recovery

• [INTEGRATION, the VLSI Journal] “Implications of Accelerated Self-Healing as a Key Design Knob

for Cross-Layer Resilience”• [ASPDAC ‘16] “Work hard, sleep well - Avoid irreversible IC wearout with proactive rejuvenation”

• [SELSE ‘15] “MCPENS: Multiple-Critical-Path Embeddable NBTI Sensors for Dynamic Wearout

Management”

• [DAC ‘14] “Modeling and Experimental Demonstration of Accelerated Self-Healing Techniques”

Sleep-when-getting-tired for FULL recovery

Main IdeaSchedule accelerated and active recovery for both wearout before

the irreversible wearout kicks in and do this periodically

Cross-Layer Implementations

- VActive Recovery

EN

CircuitWearout Sensors

Architecture

Dark SiliconProgramCounters

+1

SystemVirtualSensorsLoad Balancer

Heating Elements

AcceleratedRecovery

EN

Negative Voltage Generator

Proactive Scheduler

RedundantResources

Cross-layer Accelerated Self-Healing

Core

sensors

Sensor

Sensor

Proactive

Accelerated

From

To cores

Apply to

Sleep Cores Active Cores

Accelerated

rou

ter

Scheduler

& Active Recovery

outputs

sleep cores

outputs

Applications

Scheduler

& Active Recovery

Blocks

Core

Allocation

LoadBalancer

Heat for accelerated

recovery

A potential implementation of Cross-layer Accelerated

Self-Healing in a NoC system

2016 IEEE/ACM Workshop on Variability Modeling and Characterization (VMC), Austin, TX

BEOL

FEOL

Electromigration (EM)

Negative/Positive-bias temperature

instability (N/PBTI)

?

System

Circuit

Device BTI → |Vth| increase

EM → R increase

Faults, Errors,

Reduced MTTF

Delay increase (e.g. 10% in 3 years),

Timing violations

Resistance Recording

Thermal Chamber

Constant Current Supply Device (Wire) under test

BEOL Wearout (EM) Test Setup

EM Test Chip Die Photo

Metal wire under test

Probe

Pads

FEOL Wearout (BTI) Test Setup

On-chip Wire Information

BTI Test Structure

Vsg = 0, room

Passive Recovery

Vsg = 0, high

Accelerated RecoveryActivate the recovery

Vsg = negative room temperature

Active Recovery:

Vsg = negativehigh temperature

Accelerated & Active 1 2 3 4

Recovery

temperature temperature

I = 0, room

Passive Recovery

I = 0, high

Accelerated RecoveryActivate the recovery

I = negative room temperature

Active Recovery:

I = negativehigh temperature

Accelerated & Active 1 2 3 4

Recovery

temperature temperature

BTI (FEOL Wearout) active and accelerated recovery

EM (BEOL Wearout) active and accelerated recovery

Experimental Results

FEOL Measurement Results

BEOL Measurement Results

72.8

73

73.2

73.4

73.6

73.8

74

74.2

74.4

74.6

74.8

0 200 400 600 800 1000 1200

R (

)

Time (min)

Stress (230C, +100mA)

Recovery (230C, -100mA)

There is still a permanent part

Summary• BTI - 72.4% of the wearout is recovered within only 1/4 of the stress

time through both high temperature and negative voltage

• EM - >75% of the wearout is recovered within 1/6 of the stress time

with reverse current and high temperature

• In both cases, there is still an irreversible (permanent) component

Fresh Turbo-boost

Fre

qu

en

cy

Time

Desig

n M

arg

in

Average

Frequency

Negative

Days

Hours

End of life

Years

Wearout

No recoveryActive

Sleep

FEOL recovery under different “circadian rhythms”

Recovery under Different “circadian rhythms”

2X

1X

Reduction of Design Margin >60X

BEOL recovery right after void growth phase starts

76

77

78

79

80

0 200 400 600 800

R (

)

Time (min)

EM Stress (230C, +100mA)

EM Accelerated & Active

Recovery (230C, -100mA)

Fully Recovered!

Start Recovery

EM induced

by reverse

current

operation