25
Accelerating Microprocessor Silicon Validation by Exposing ISA Diversity Rong Xu Zixin Wang 1

Accelerating microprocessor silicon validation by exposing ... · Accelerating Microprocessor Silicon Validation by Exposing ISA Diversity Rong Xu Zixin Wang 1

Embed Size (px)

Citation preview

Accelerating Microprocessor Silicon Validation by Exposing ISA DiversityRong XuZixin Wang

1

Background

2

Aggressive technology scaling and extreme chip integration significantly increase the complexity of microprocessor• Infeasible to exhaustively test when simulation

Pressure on validation team to deliver correct design to market on time is higher than ever.• 50% of microprocessor chips require extra unplanned

tape-out

3

4

Effective post-silicon validation needed to eliminate bugs before volume production

Random instruction tests(RIT) contribute tremendously to the detection of design bugs

5

Prototype to Volume Production

Challenges

Expected Responses

Actual Responses co

mpa

rato

r

Bug detected

Host Machine

Prototype

Actual Responses RIT

Simulation Limitations

Post-Processing (Triaging)

Blocking Bugs

6

Simulator

ISA DiversityConcept: Operations of an ISA can be performed equivalently in more than one different way.

7

Benefit: Same operation in different ways produce identical results but activate different logic paths

Application: Enable bug detection by comparing results of equivalent instructions (self-checking)

Statistics: In major ISAs, more than 75% instructions can be replaced with equivalent instructions

Example: MIPS ISA diversity

Original Instruction Equivalent Sequence Description

lw RA, addr(RB)Load word

lhu RA, addr(RB)lhu RC, (addr+2)(RB)sll RC, RC, 16or RA, RA, RC

Execute 2 load halfword unsigned instructions and places the second halfword to upper bytes.

8

Statistics

8%

15%

77%

ARM

11%

10%

79%

MIPS

8%

14%

78%

POWERPC

20%

6%

74%

X86

9

Full Equivalence

Partial Equivalence

No Equivalence

New Validation Method

ISA Diversity Database

Enhanced RIT

Hardware Replay

Mechanism

Post-processing

(triage)

New Validation Method

10

Contain equivalent Instructions for each instruction

RIT + ERIT (Equivalent RIT)+ checking code

Replace mismatch Instructions in RIT with Counterparts in ERIT

then replay

Data provided by HRM help

clustering of failure modes

Framework

Test Scenario

RandomInstruction Test

RandomInstruction Test

Generator

Enhanced RandomInstruction Test

GeneratorPrototype Triage

Debug

ISA DiversityDatabase

Enhanced log file

HardwareReplay

Mechanism

11

12

Framework (contd)

RIT

ERIT

CheckerBug

Detected

Prototype

Host machine

RITQuestion1:Why replay?

Question2:What if equivalent instruction is the offending one?

13

Enhanced RIT (Check Point)

InstructionSection 1store[1]

InstructionSection 2store[2]

……

store[k]

InstructionSection k

EquivalentInstructionSection 1estore[1]

EquivalentInstructionSection 2estore[2]

……

estore[k]

Equivalent InstructionSection k

CheckingCode

RIT ERIT

Mismatch? Check Point

14

Hardware Support

store-addr: buffer to store PC of all k store instructions.

estore-addr: buffer to store PC of all k estore instructions.

mids-queue: store every mismatch id ( 0~k ).

store counter: counts the number of stores for each run.

bypass control: control PC to bypass buggy code.

Stor

e-ad

drst

ore-

addr

esto

re-a

ddr

esto

re-a

ddr

bypass controlbypass control

store counterstore counter

hit

10

mids-queue

Mids from checking code (from a register)

monitor

25

130

0

Mismatch ids

15

Bypass Control

Run RITSC:0 -> mid-1 PC point to

Next inst ofestore[mid-1]

Until estore[mid] finish PC point to

store[mid]

“buggy” code in RITis bypassed

Remaining test still useful

16

Test Flow

Execute RIT, ERIT,Checking code

Update mids-queue

Mid == 0? N Mid > 0?

sc=0

replay(RIT,ERIT)

Y

sc==mid-1?

Y

PC=estore[mid-1]+4

Execute equivalent ops

PC=store[mid]

Execute checking code

Update mids-queue

Test PassedY N

StartFor all RITs:

17

Post-processing (triage)

Our MethodMismatch identifiers

stores’ addressDebug Engineer

Debug faster

Log information

Bypass buggy instructions

More bugs detectedwith each RIT

Time Analysis

21

VS

Simulation time is too long!

• PTLsim simulator: superscalar, OoO, single core x86-compatible• Bugs injected: 802 logic, 225 electrical, 1025 total• Original RIT: 154 RITS, ~4K Inst each, 616K Inst total• Comparison: Reversi & QED• Test Size: 2.4M Instructions for Reversi

1.8M Instructions for QED3.7M Instructions for Proposed

Experimental Environment

17

18

Injected Bugs Distribution

30%

52%

14%

4%

BUGS DISTRIBUTIONFetch/Decode Issue/Execute Retire Instruction&Data

Detected Bugs

19

0

200

400

600

800

1000

1200

Traditional RIT Reversi QED Proposed

detected bugs number

90.54% 88.10%

20.49%

100%

20

Validation Times

Time (Sec) Trad. RIT Reversi QED ProposedGeneration 4.460 6.310 5.530 7.680Simulation 51.000 - - -Execution 0.027 0.110 0.071 0.176

Total 55.487 6.420 5.601 7.856

Upload time, Download time and Compare time are nearly zero so neglected, But should be included for large tests.

Conclusion

Enhanced RIT

Support major ISA Fast self-checking

Hardware based replay

Bypass buggy InstFully utilize bug detect capability of RITs

Log information

Exact information on offending InstHelp locate root cause

AcceleratePost-siliconValidation!

22

24

Q&A

Debate

23

1. Is it a good idea to apply this method to pre-silicon verification?

2. Can we replace ERIT instructions with RIT instructions when mismatch happens?