40
Mark Ren, Miloni Mehta USING MACHINE LEARNING FOR VLSI TESTABILITY AND RELIABILITY

USING MACHINE LEARNING FOR VLSI TESTABILITY AND …...Machine learning can improve VLSI design testability beyond the existing solution Predictive power of ML model Graph based model

  • Upload
    others

  • View
    17

  • Download
    0

Embed Size (px)

Citation preview

Page 1: USING MACHINE LEARNING FOR VLSI TESTABILITY AND …...Machine learning can improve VLSI design testability beyond the existing solution Predictive power of ML model Graph based model

Mark Ren, Miloni Mehta

USING MACHINE LEARNING FOR VLSI TESTABILITY AND RELIABILITY

Page 2: USING MACHINE LEARNING FOR VLSI TESTABILITY AND …...Machine learning can improve VLSI design testability beyond the existing solution Predictive power of ML model Graph based model

2

TAKE-HOME MESSAGES

• Machine learning can improve approximate solutions for hard problems.

• Machine learning can accurately predict and replace brute force methods for computational expensive problems.

Page 3: USING MACHINE LEARNING FOR VLSI TESTABILITY AND …...Machine learning can improve VLSI design testability beyond the existing solution Predictive power of ML model Graph based model

3NVIDIA CONFIDENTIAL. DO NOT DISTRIBUTE.

VLSI TESTABILITY AND RELIABILITY

Design Manufacturing Wafer Chip

Testing

FailPassYears

TestabilityReliability

Page 4: USING MACHINE LEARNING FOR VLSI TESTABILITY AND …...Machine learning can improve VLSI design testability beyond the existing solution Predictive power of ML model Graph based model

4

PART 1

Testability Prediction and Test Point Insertion with Graph Convolutional Network (GCN)

Mark Ren, Brucek Khailany, Harbinder Sikka, Lijuan Luo, Karthikeyan Natarajan

Yuzhe Ma, Bei Yu

“High Performance Graph Convolutional Networks with Applications in Testability Analysis”, to appear in Proceedings of Design Automation Conference, 2019

Page 5: USING MACHINE LEARNING FOR VLSI TESTABILITY AND …...Machine learning can improve VLSI design testability beyond the existing solution Predictive power of ML model Graph based model

5

PART 2

Full Chip FinFET Self-heat Prediction using Machine Learning

Miloni Mehta, Chi Keung Lee, Chintan Shah, Kirk Twardowski

Page 6: USING MACHINE LEARNING FOR VLSI TESTABILITY AND …...Machine learning can improve VLSI design testability beyond the existing solution Predictive power of ML model Graph based model

6

PART 1 OUTLINE

Introduction

Learning model for testability analysis and enhancement

Practical issues

Scalability

Data imbalance

Page 7: USING MACHINE LEARNING FOR VLSI TESTABILITY AND …...Machine learning can improve VLSI design testability beyond the existing solution Predictive power of ML model Graph based model

7

HOW DO WE TEST A CHIP

100010

000101

100111

011101

010101

101111

001011

110101

010101

101111

001011

110101

Input patterns output patterns golden patterns

?

Stuck-at-0 fault

GND

010101

111111

001111

110101

Page 8: USING MACHINE LEARNING FOR VLSI TESTABILITY AND …...Machine learning can improve VLSI design testability beyond the existing solution Predictive power of ML model Graph based model

8

TESTABILITY PROBLEM

i0

O

Almost always 0

B’s faults unobservable → Difficult-to-test (DT)

i1

i2

i3

i4

i5

i6

i7

i8

i9i10

B

A

Almost always 0

B’s faults are observable with an inserted register

Test Point (TP)Stuck-at-0

fault

GND

Page 9: USING MACHINE LEARNING FOR VLSI TESTABILITY AND …...Machine learning can improve VLSI design testability beyond the existing solution Predictive power of ML model Graph based model

9

MOTIVATION

Test Point Insertion Problem:

Pick the smallest number of test points to achieve the largest testability enhancement

Number of test points → chip area cost

Number of test patterns → test time

Hard problem, only approximate solutions exist

Commercial solution: Synopsys TetraMax

Can we improve it with Machine Learning?

Predict testability

Select test points

Page 10: USING MACHINE LEARNING FOR VLSI TESTABILITY AND …...Machine learning can improve VLSI design testability beyond the existing solution Predictive power of ML model Graph based model

10

ML BASED TESTABILITY PREDICTION

Given a circuit, predict which gate outputs are difficult-to-test (DT)

Gate Features: [logic level, SCOAP_C0, SCOAP_C1, SCOAP_OB]

Gate Label: DT (0 or 1) generated by TetraMax

Input Features

N1: 0,0,1,1N2: 1,0,1,0N3: 2,0,1,1

.

.

.

Output classification

N1: 0N2: 1N3: 0

.

.

.

ML Model

Page 11: USING MACHINE LEARNING FOR VLSI TESTABILITY AND …...Machine learning can improve VLSI design testability beyond the existing solution Predictive power of ML model Graph based model

11

BASIC MACHINE LEARNING MODELINGDid not fully leverage the inductive bias of circuit structure

1a

3

42

5

6

7

8

9

10

11

12

13

F(a) = [Fa, F1, F2, F3, F4, F5, F6, F7, F8, F9, F10]

fanin fanout

ML ModelsLR

RF

SVM

MLP

a is DT

a is not DT

fanin fanout

Page 12: USING MACHINE LEARNING FOR VLSI TESTABILITY AND …...Machine learning can improve VLSI design testability beyond the existing solution Predictive power of ML model Graph based model

12

GRAPH CONVOLUTIONAL NETWORK (GCN)

9

2

5

4

6

7

31

8

Aggregation (mean, sum)

Encoding (R4 → R32,Relu)

Page 13: USING MACHINE LEARNING FOR VLSI TESTABILITY AND …...Machine learning can improve VLSI design testability beyond the existing solution Predictive power of ML model Graph based model

13

GCN BASED TESTABILITY PREDICTION

Weighted sum & Relu(R4

→ R32)Weighted sum & Relu(R32

→ R64)

1

0

1

0

00

Layer 1 Layer 2 Layer 3 Fully Connected Layers

Weighted sum & Relu(R64

→ R128)(64,64,128,2)

Page 14: USING MACHINE LEARNING FOR VLSI TESTABILITY AND …...Machine learning can improve VLSI design testability beyond the existing solution Predictive power of ML model Graph based model

14

ACCURACY IMPACT OF GCN LAYERS (K)

60

65

70

75

80

85

90

95

100

1 31 61 91 121 151 181 211 241 271

Epochs

Training Accuracy (%)

K=1

K=2

K=3

60

65

70

75

80

85

90

95

100

1 31 61 91 121 151 181 211 241 271

Epochs

Testing Accuracy(%)

Page 15: USING MACHINE LEARNING FOR VLSI TESTABILITY AND …...Machine learning can improve VLSI design testability beyond the existing solution Predictive power of ML model Graph based model

15

EMBEDDING VISUALIZATION

K=1 K=2 K=3

• Embeddings looks more discriminative as stage increase;

Page 16: USING MACHINE LEARNING FOR VLSI TESTABILITY AND …...Machine learning can improve VLSI design testability beyond the existing solution Predictive power of ML model Graph based model

16

MODEL COMPARISON ON BALANCED DATASET

Compare with basic ML modeling: LR, RF, MLP, SVM

N=500 nodes in fanin cone and 500 nodes in

fanout cone, a total of 1000 nodes

Compare to 3-layer GCN

Less than 1000 nodes influence each node,

comparable with the baseline

GCN has the best accuracy (93%).

00.10.20.30.40.50.60.70.80.9

1

Precision Recall F1 score Accuracy

Page 17: USING MACHINE LEARNING FOR VLSI TESTABILITY AND …...Machine learning can improve VLSI design testability beyond the existing solution Predictive power of ML model Graph based model

17

TEST POINT INSERTION WITH GCN MODEL

An iterative process to select TPs

enabled by GCN model

Select TP candidate based on

predicted impactNumber of reduced DTs in the

fanin cone of TP

GraphCircuit GCN Model

Graph Modification

GCN Model

Impact Estimation

Point Selection

Graph Modification

Done?N

Y

TP Candidates

new TP

new TP

Final TPs

Page 18: USING MACHINE LEARNING FOR VLSI TESTABILITY AND …...Machine learning can improve VLSI design testability beyond the existing solution Predictive power of ML model Graph based model

18

TEST POINT INSERTION RESULTS COMPARISON

11% less test points with 6% less test pattern under same coverage vs TetraMax.

Machine learning can improve approximate solutions for hard problems

-15.00%

-10.00%

-5.00%

0.00%

5.00%

10.00%

15.00%

20.00%

25.00%

30.00%

1 2 3 4

Test point reduction

Test pattern reduction

Page 19: USING MACHINE LEARNING FOR VLSI TESTABILITY AND …...Machine learning can improve VLSI design testability beyond the existing solution Predictive power of ML model Graph based model

19

MODEL SCALABILITY

Choices of model implementation

Batch processing: Recursion

Full graph: Sparse matrix multiplication

𝐸𝑘 = 𝑅𝑒𝐿𝑈((𝐴 ∗ 𝐸𝑘−1) ∗ 𝑊𝑘)

Tradeoff

Memory vs speed

1M nodes/second on Volta GPU

Page 20: USING MACHINE LEARNING FOR VLSI TESTABILITY AND …...Machine learning can improve VLSI design testability beyond the existing solution Predictive power of ML model Graph based model

20

MULTI GPU TRAINING

Training dataset has multiple million gates designs that can not fit on one GPU

Data parallelism, each GPU computes one design/graph

Replicate models across multiple GPUs

Leverage PyTorch DataParallel module

Trained with 4 Tesla V100 GPUs on DGX1

Shared modelGPU1

Shared modelGPU2

Graph1

Graph2

Shared modelGPU3

Shared modelGPU4

Graph3

Graph4

Δ

Page 21: USING MACHINE LEARNING FOR VLSI TESTABILITY AND …...Machine learning can improve VLSI design testability beyond the existing solution Predictive power of ML model Graph based model

21

IMBALANCE ISSUE

It is very common to have much more non-DTs (negative class) than DTs (positive class), imbalance ratio more than 100X

Predict: 0 Predict: 1

Fact: 0 133576 290

Fact: 1 3681 432

Classifier 1: ok precision, low recall

Predict: 0 Predict: 1

Fact: 0 100919 32927

Fact: 1 114 4069

Classifier 2: high recall, low precision

Recall: 10.5%Precision: 59.8%

Recall: 97.3%Precision: 11.0%

Page 22: USING MACHINE LEARNING FOR VLSI TESTABILITY AND …...Machine learning can improve VLSI design testability beyond the existing solution Predictive power of ML model Graph based model

22

MULTI-STAGE CLASSIFICATION

The networks on initial stages only filter out negative data points with high confidence

High recall, low precision

Positive predictions are sent to the network on the next stage

Network 1 Network 2 Network 3

-+

-+ - +

Page 23: USING MACHINE LEARNING FOR VLSI TESTABILITY AND …...Machine learning can improve VLSI design testability beyond the existing solution Predictive power of ML model Graph based model

23

MULTI-STAGE CLASSIFICATION RESULTBalanced Recall and Precision

Pred: 0 Pred: 1

Fact: 0 100919 32927

Fact: 1 114 4069

Pred: 0 Pred: 1

Fact: 0 26935 5992

Fact: 1 221 3848

Pred: 0 Pred: 1

Fact: 0 5207 785

Fact: 1 309 3539

Pred: 0 Pred: 1

Fact: 0 133061 785

Fact: 1 574 3539

Overall

Recall: 86.0%

Precision: 81.8%

Stage 1Recall: 97.3%Precision: 11.0%

Stage 2Recall:94.6%Precision: 39.1%

Stage 3Recall: 92.05Precision: 81.8%

Page 24: USING MACHINE LEARNING FOR VLSI TESTABILITY AND …...Machine learning can improve VLSI design testability beyond the existing solution Predictive power of ML model Graph based model

24

PART 1 - SUMMARY

Machine learning can improve VLSI design testability beyond the existing solution

Predictive power of ML model

Graph based model is suitable for VLSI problems

Practical issues such as scalability and data imbalance need to be dealt with

Page 25: USING MACHINE LEARNING FOR VLSI TESTABILITY AND …...Machine learning can improve VLSI design testability beyond the existing solution Predictive power of ML model Graph based model

25

PART 2

Full Chip FinFET Self-heat Prediction using Machine Learning

Miloni Mehta, Chi Keung Lee, Chintan Shah, Kirk Twardowski

Page 26: USING MACHINE LEARNING FOR VLSI TESTABILITY AND …...Machine learning can improve VLSI design testability beyond the existing solution Predictive power of ML model Graph based model

26NVIDIA CONFIDENTIAL. DO NOT DISTRIBUTE.

VLSI TESTABILITY AND RELIABILITY

Design Manufacturing Wafer Chip

Testing

FailPassYears

TestabilityReliability

Page 27: USING MACHINE LEARNING FOR VLSI TESTABILITY AND …...Machine learning can improve VLSI design testability beyond the existing solution Predictive power of ML model Graph based model

27

SEMICONDUCTOR RELIABILITY

Source: https://semiengineering.com/improving-automotive-reliability/

Page 28: USING MACHINE LEARNING FOR VLSI TESTABILITY AND …...Machine learning can improve VLSI design testability beyond the existing solution Predictive power of ML model Graph based model

28

RELIABILITY

Active power in transistors dissipated as heat to the surroundings

FinFETs are more sensitive to SH than planar devices

Why do we care?

Exacerbates Electro-migration (EM) on interconnects

Transistor threshold voltage (Vt) shifts

Time dependent dielectric breakdown (TDDB)

DEVICE SELF-HEAT (SH)

0

0.5

1

1.5

2

2.5

90 110 130 150 170

EM

rati

ng f

acto

r (I

max)

Temperature

16ff EM limit reduction vs Temperature

Page 29: USING MACHINE LEARNING FOR VLSI TESTABILITY AND …...Machine learning can improve VLSI design testability beyond the existing solution Predictive power of ML model Graph based model

29

SH METHODOLOGIES SO FAR

No sign-off tool that can handle full chip SH analysis

Limitations using Spice simulations

Impractical to run on billions of transistors

Teams review high power density cells

2D Look-up Table approach

Based on frequency and capacitive loading for different clock drivers

Reduced run time by more than 90% over full Spice simulations

Pessimistic wrt Spice

LUT

SPICE

2D LUT vs Spice comparison

Tem

pera

ture

(C)

Page 30: USING MACHINE LEARNING FOR VLSI TESTABILITY AND …...Machine learning can improve VLSI design testability beyond the existing solution Predictive power of ML model Graph based model

30

SELF-HEAT TRENDS

Frequency ∝ SH

Capacitive loading ∝ SH

Cell size ∝ 1/SH

Resistance ∝1/SH (non-linear)

Pre

dic

ted S

H

Frequency

R/C (1e12)

Norm

alise

d

SH

Page 31: USING MACHINE LEARNING FOR VLSI TESTABILITY AND …...Machine learning can improve VLSI design testability beyond the existing solution Predictive power of ML model Graph based model

31

MOTIVATION TO USE ML

Identify problematic cells in the design without exhaustive Spice simulations

Complex relationship between design and SH

Design database available for several projects

Reusability across projects

Focus

Clock inverters and buffers

Quick, easy, light-weight

Rank cells above certain SH threshold for thorough analysis

Page 32: USING MACHINE LEARNING FOR VLSI TESTABILITY AND …...Machine learning can improve VLSI design testability beyond the existing solution Predictive power of ML model Graph based model

32

MACHINE LEARNING MODEL

Select

Training

Data

Get Attributes

from

PrimeTime

Simulate in

HSPICE

Generate

ML Model

Equation:Y^ = ?

Ytraining

Xtraining

Select

Test Data

Get Attributes

from PrimeTime

Simulate in

HSPICE

Prediction on

Test Set

X

Y

No

Yes

(Predicted == Spice)?

Ready for Deployment

Validation

test

test

Ypred-test

Page 33: USING MACHINE LEARNING FOR VLSI TESTABILITY AND …...Machine learning can improve VLSI design testability beyond the existing solution Predictive power of ML model Graph based model

33

DATASET SELECTION

Cover a wide range of frequencies

Cover different types of standard cell sizes

Prevent duplication in training data due to replicated partitions/chiplets

Outliers in the design chosen

Labels obtained through Spice simulations (supported from foundry spice models)

TSMC 16nm FinFET training model used 4300 training samples with 9 features

Page 34: USING MACHINE LEARNING FOR VLSI TESTABILITY AND …...Machine learning can improve VLSI design testability beyond the existing solution Predictive power of ML model Graph based model

34

DNN REGRESSOR MODEL

Xn1

Xn2

Xn9

Input Layerhidden layer 1

hidden layer 2

hidden layer 3

Output layer

Predicted Self-Heat Yn^

X11 X12 . . . X19X21 X22 . . . X29

.

.

.Xn1 Xn2 . . . Xn9

Cost = Σ (Ypred- Y)2

Features:Output CapacitanceFrequencyCell sizeNet resistanceInput slewOutput slew# of output loadsInput Capacitance of loadsAvg transition on load

N

Page 35: USING MACHINE LEARNING FOR VLSI TESTABILITY AND …...Machine learning can improve VLSI design testability beyond the existing solution Predictive power of ML model Graph based model

35

MINIMIZING COST FUNCTION

Gradient descent

Adam optimizer which has adaptive learning rate

Exponential Linear Unit (ELU) used as activation function

300,000 training steps

Page 36: USING MACHINE LEARNING FOR VLSI TESTABILITY AND …...Machine learning can improve VLSI design testability beyond the existing solution Predictive power of ML model Graph based model

36

RESULTS

Xavier CPU 2000 validation samples

Good correlation between DNN prediction and Spice SH

Average err % wrt Spice = 6.5%

MSE = 0.05 Spice SH

Page 37: USING MACHINE LEARNING FOR VLSI TESTABILITY AND …...Machine learning can improve VLSI design testability beyond the existing solution Predictive power of ML model Graph based model

37

QUANTITATIVE BENEFITS

Trained model is deployed for inference on millions of clock cells

Training time: 37 minutes (DGX1 used)

Inference time: <1min

>99% cells filtered from Spice simulations!

Top 1000 prediction results simulated and verified

Found small clock tree cells had highest SH

Outlier detection improved inference by 2.65% in Turing

Page 38: USING MACHINE LEARNING FOR VLSI TESTABILITY AND …...Machine learning can improve VLSI design testability beyond the existing solution Predictive power of ML model Graph based model

38

COMPARISON TO PRIOR WORK

Instance #

Page 39: USING MACHINE LEARNING FOR VLSI TESTABILITY AND …...Machine learning can improve VLSI design testability beyond the existing solution Predictive power of ML model Graph based model

39

PART 2 - SUMMARY

FinFET Self-Heat is a growing reliability concern

Proposed supervised ML model using DNN

Accurately predict Self-heat

100x runtime improvement

Displayed techniques to select representative dataset for training

Model deployed for Xavier and Turing projects

Use ML techniques to improve productivity and solve challenging problems in VLSI

Page 40: USING MACHINE LEARNING FOR VLSI TESTABILITY AND …...Machine learning can improve VLSI design testability beyond the existing solution Predictive power of ML model Graph based model