Two Case Studies in Formal Deployment on ARM CPUs

© 2017 Arm Limited

DVClub

May 15, 2018

Vaibhav Agrawal

CPU Validation, Austin

Two Case Studies in Formal Deployment on ARM CPUs:

Instruction-Fetch and Floating-Point datapath

Instruction-Fetch unit

Why formal on Instruction-Fetch unit?

[Figure: Instruction-Fetch unit block diagram. Labeled blocks: BTBs, Branch Predictor, RS, BX, FQ, iTag, iData, uTag, uData, iTLB, Snoop, Architectural registers, CMO, TMO, DAR, MMU, L2, ID, IQ, CT, DECODE]

Why formal on Instruction-Fetch unit? - 2

• Control heavy: many independent FSMs interacting with each other

• Uop$ (uop cache): new feature; critical for correct functionality

• Aggressive project timelines

• Simulation remains primary workhorse (constrained random unit TB)

• But no dearth of bugs

Why formal on Instruction-Fetch unit? - 3

[Figure: Instruction-Fetch unit block diagram, same labeled blocks as on the first slide of this section]

Why formal on Instruction-Fetch unit? - 4

• Any bug found by formal => one less for simulation

Making formal more efficient: complexity reduction

Definition of “efficiency”:

Improving formal reachability of the state space, both in terms of time and sequential depth

#  Technique                                            Effort  Return
1  Reduce table/hash sizes (caches/iTLB)                High    High
2  Reduce mop size                                      Low     High
3  Preloading / IVAs                                    High    Extremely High
4  Input VA/PA space reduction                          Low     High
5  Input data space reduction (mops and instructions)   Low     High
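The effect of shrinking table sizes and the input tag space can be illustrated with a toy explicit-state exploration. The sketch below is not the Arm setup: it assumes a hypothetical fully associative FIFO cache whose only input is a tag per cycle, and it counts reachable states and the sequential depth needed to see the first eviction. The function name `explore` and the model itself are illustrative assumptions.

```python
from collections import deque


def explore(num_entries, num_tags):
    """Breadth-first search over a toy FIFO cache model.

    State: tuple of resident tags, oldest first. Input per cycle: one tag.
    Returns (number of reachable states, depth of the first eviction).
    """
    start = ()
    seen = {start}
    queue = deque([(start, 0)])
    evict_depth = None
    while queue:
        state, depth = queue.popleft()
        for tag in range(num_tags):
            if tag in state:
                nxt = state                      # hit: contents unchanged
            elif len(state) < num_entries:
                nxt = state + (tag,)             # miss with a free entry: allocate
            else:
                nxt = state[1:] + (tag,)         # miss when full: evict the oldest
                if evict_depth is None:
                    evict_depth = depth + 1      # BFS order makes this the minimum
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, depth + 1))
    return len(seen), evict_depth


full_size = explore(num_entries=4, num_tags=5)   # (206, 5)
reduced = explore(num_entries=2, num_tags=3)     # (10, 3)
```

In this toy model, going from 4 entries and 5 tags to 2 entries and 3 tags cuts the reachable states from 206 to 10 and the depth to the first eviction from 5 to 3 cycles. Real caches are far larger, so the same kind of reduction matters correspondingly more.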

A sample e2e formal check: mcac ordering checker

• 2 tracked VAs (tr_VA1, tr_VA2); all other VAs are untracked

• Constraint: VA1→VA2

• Constraint: color mops at tr_VA{1,2}

• Constrain both preloading and fills

• Check on outputs: m[VA1]→m[VA2]

• Use oracles to manage conflicting constraints across checkers

[Figure: m[VA1] and m[VA2] highlighted among all other VAs]
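The structure of such an ordering checker can be sketched as a small scoreboard. This is a minimal illustration, not the actual checker: the `Mop` record, the function names, and the two toy fetch-unit models are hypothetical, and preloading, fills, and oracles are not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mop:
    va: int
    color: int  # 0 = untracked, 1 = from tracked VA1, 2 = from tracked VA2


def input_constraint_ok(fetch_vas, tr_va1, tr_va2):
    """Constraint VA1 -> VA2: each tracked VA is fetched once, VA1 first."""
    return (fetch_vas.count(tr_va1) == 1
            and fetch_vas.count(tr_va2) == 1
            and fetch_vas.index(tr_va1) < fetch_vas.index(tr_va2))


def color_mops(fetch_vas, tr_va1, tr_va2):
    """Tag mops fetched from the tracked VAs so they can be recognized at the output."""
    return [Mop(va, 1 if va == tr_va1 else 2 if va == tr_va2 else 0)
            for va in fetch_vas]


def ordering_check(output_mops):
    """Check m[VA1] -> m[VA2]: no VA2-colored mop appears before the VA1-colored one."""
    seen_va1 = False
    for mop in output_mops:
        if mop.color == 1:
            seen_va1 = True
        elif mop.color == 2 and not seen_va1:
            return False
    return True


def in_order_unit(mops):
    """Toy fetch unit that preserves program order."""
    return list(mops)


def buggy_unit(mops):
    """Toy fetch unit with an injected bug: it emits mops in reverse order."""
    return list(reversed(mops))


fetch = [0x100, 0x140, 0x180, 0x1C0]
assert input_constraint_ok(fetch, tr_va1=0x140, tr_va2=0x180)
colored = color_mops(fetch, tr_va1=0x140, tr_va2=0x180)
good = ordering_check(in_order_unit(colored))   # property holds
bad = ordering_check(buggy_unit(colored))       # property fails
```

Tracking only two symbolic VAs keeps the checker small while still covering every pair, because a formal tool is free to choose any two addresses that satisfy the input constraint.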

Presenting potential bugs to designers

• Bug reproducibility important for testing fix

• Formal can hit different counter-examples across runs

• Extract input stimuli from trace; create a new assertion to emulate a directed test

• Original end-to-end assertion fails after 4 hours:

• precond |-> consequent

• New assertion with fixed stimulus

• directed_stimulus_sequence ##0 precond |-> consequent
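The idea of pinning the stimulus can be shown on a toy model. The sketch below is an assumption-laden analogy, not a formal tool flow: a seeded random search stands in for the formal engine, a 3-bit counter stands in for the design, and the names `step`, `find_counterexample`, and `replay` are hypothetical. Different seeds may produce different failing traces, while replaying an extracted trace fails the same way every time.

```python
import random

INPUTS = ("inc", "clr", "nop")


def step(state, inp):
    """Toy 3-bit counter: 'inc' increments, 'clr' clears, 'nop' holds."""
    if inp == "inc":
        return (state + 1) % 8
    if inp == "clr":
        return 0
    return state


def prop_holds(state):
    """Toy end-to-end property: the counter never reaches 5."""
    return state != 5


def find_counterexample(seed, max_steps=100_000):
    """Randomized search from reset; returns the input trace up to the failure."""
    rng = random.Random(seed)
    state, trace = 0, []
    for _ in range(max_steps):
        inp = rng.choice(INPUTS)
        trace.append(inp)
        state = step(state, inp)
        if not prop_holds(state):
            return trace
    raise RuntimeError("no counterexample found within max_steps")


def replay(trace):
    """Directed test: drive the extracted stimulus from reset, return the final state."""
    state = 0
    for inp in trace:
        state = step(state, inp)
    return state


trace_a = find_counterexample(seed=1)
trace_b = find_counterexample(seed=2)
# Both traces end in a property failure, and each replay is deterministic.
final_a, final_b = replay(trace_a), replay(trace_b)
```

The replay plays the role of `directed_stimulus_sequence`: the inputs are fixed, so the designer sees the same failure before the fix and can confirm it is gone afterwards.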

Formal bug dissection

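[Figure: two pie charts of bugs found by formal. By property type: End to end, Embedded, Interface (slices of 42%, 38%, 20%). By RTL functionality: iTag, iData, mopc, Fetch Queue, PC-Queue write, Misc (slices of 19%, 37%, 34%, 2%, 4%, 4%). Values are listed in extraction order; the pairing of slices with labels is not preserved.]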

Formal can complement simulation

Formal vs Simulation? Or, formal and simulation?

Feature bring up by designer using formal

Early RTL clean up

Corner case bugs

Could simulation have found all of these bugs?

Skeptical?

• Limited company resources. Simulation or formal?

• Law of diminishing returns w.r.t. resources?

• How much are the 5 formal-only bugs worth?

• How much is the shift left worth?

• It's an investment; takes time to bear fruit

• Requires cooperation from design and simTB folks

• Ensure that value is provided back to them

• Requires management commitment

• Requires effort, commitment, and humility on the part of the formal verification engineer

Floating-Point datapath

Why formal on Floating-Point datapath?

• Cost of FDIV bug in 1995: $475m

• Cost of a FP bug today?

• FP bug unacceptable in industry today; high expectations for FP accuracy

• Simulation-based FP datapath validation: directed + random + exhaustive

• Took 4 months to hand-code known corner cases for FP verification on Cortex-A15

• Exhaustive sims for a single op with 2 half-precision inputs take ~100 days of CPU time

• High cost in terms of machine run time for sim-based FP validation

• Need an efficient, yet exhaustive method for FP datapath verification
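A back-of-the-envelope calculation puts the ~100-day figure in perspective. The sketch assumes a half-precision operand is 16 bits (IEEE 754 binary16), that "~100 days" is exactly 100 CPU-days, and that the simulation rate stays constant when extrapolating to single precision. The last two are simplifying assumptions, not numbers from the deck.

```python
SECONDS_PER_DAY = 24 * 3600


def input_vectors(bits_per_operand, operands):
    """Number of distinct input combinations for an exhaustive run."""
    return 2 ** (bits_per_operand * operands)


def implied_rate(vectors, cpu_days):
    """Simulated vectors per CPU-second implied by a total run time."""
    return vectors / (cpu_days * SECONDS_PER_DAY)


def cpu_days_at_rate(vectors, rate):
    """CPU-days needed to simulate `vectors` inputs at `rate` vectors per second."""
    return vectors / rate / SECONDS_PER_DAY


half_vectors = input_vectors(16, 2)             # 2**32 = 4,294,967,296
rate = implied_rate(half_vectors, 100)          # about 497 vectors per CPU-second
single_days = cpu_days_at_rate(input_vectors(32, 2), rate)

print(f"half precision, 2 operands: {half_vectors:,} vectors")
print(f"implied rate: {rate:.0f} vectors per CPU-second")
print(f"single precision at the same rate: {single_days:.2e} CPU-days")
```

At roughly 500 vectors per CPU-second, two single-precision operands would take 2^32 times as long as the half-precision run, which is why exhaustive simulation does not scale beyond the smallest formats.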

Sequential Equivalence Checking: An Example

Equivalence checking:

• Golden Reference == Given model

Boolean Equivalence Checking (EC)

Sequential Equivalence Checking (SEC)

Designs in Fig 1 and Fig 2 can be proven equivalent by SEC, but not by EC

[Figure 1 and Figure 2: two circuit diagrams, each with inputs a, c, d and output o]
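The difference between EC and SEC can be sketched with a retiming example. The two designs below are hypothetical and are not the circuits in Figures 1 and 2: one registers its inputs and then ANDs them, the other ANDs first and registers the result. They have different flop counts, so there is no one-to-one state mapping for a combinational check, yet a product-machine reachability search shows that their outputs agree in every reachable state. All names and the explicit-state search are illustrative assumptions.

```python
from collections import deque
from itertools import product


def design1_step(state, a, c):
    """Register the inputs, then AND. State is (reg_a, reg_c): two flops."""
    reg_a, reg_c = state
    return (a, c), reg_a & reg_c


def design2_step(state, a, c):
    """AND first, then register. State is a single flop."""
    return a & c, state


def design3_step(state, a, c):
    """Deliberately different design: OR first, then register."""
    return a | c, state


def sec_equivalent(step1, init1, step2, init2, n_inputs):
    """Explore the product machine from reset and compare outputs in every state."""
    seen = {(init1, init2)}
    queue = deque([(init1, init2)])
    while queue:
        s1, s2 = queue.popleft()
        for inputs in product((0, 1), repeat=n_inputs):
            next1, out1 = step1(s1, *inputs)
            next2, out2 = step2(s2, *inputs)
            if out1 != out2:
                return False          # a reachable state distinguishes the designs
            if (next1, next2) not in seen:
                seen.add((next1, next2))
                queue.append((next1, next2))
    return True


same = sec_equivalent(design1_step, (0, 0), design2_step, 0, n_inputs=2)
different = sec_equivalent(design1_step, (0, 0), design3_step, 0, n_inputs=2)
```

Production SEC tools work symbolically rather than enumerating states, but the question they answer is the same: do the two designs produce identical outputs over all input sequences from corresponding reset states?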

More about SEC for FP datapath

• Reference design source

• RTL of a validated and released product: RTL vs RTL (Cadence JasperGold)

• C model from floating-point library: C vs RTL (Mentor SLEC)

• Goal:

• Bug hunting

– Achieved by an end to end equivalence check, treating the design as black-box

– Very useful for shaking out the bugs

• Full proof

– May require internal map-points identification (proof decomposition)

FMUL, FMA, FDIV, FSQRT

Sample proof decomposition: radix-4 SRT FDIV

[Figure: proof decomposition for radix-4 SRT FDIV. The RTL model and the C model each take opA and opB through "norm, scaling + (other stuff)" and then through a chain of "Digit Selection, Q&R Update" stages, each producing a partial quotient and remainder. "Non-restoring to Restoring Transactor" blocks sit at the stage boundaries, where the partial results of the two models are mapped ("Maps") and compared ("Equal?"). Picture by Travis Pouarz @ Mentor.]
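The role of the transactor at each map point can be sketched on integers. This is a simplified illustration under stated assumptions, not the RTL or C model from the slide: it assumes the non-restoring side uses signed digits in {-2..2}, the reference side uses conventional restoring digits in {0..3}, the digit is selected by exact rounding rather than an SRT lookup table on truncated estimates, and the operands satisfy 0 <= 3x <= 2d. All function names are hypothetical.

```python
def select_digit(w, d):
    """Nearest digit to 4*w/d, clamped to {-2..2} (exact selection, no lookup table)."""
    q = (8 * w + d) // (2 * d)        # floor(4*w/d + 1/2) in integer arithmetic
    return max(-2, min(2, q))


def nonrestoring_steps(x, d, n):
    """Yield (Q, w) after each radix-4 step; x * 4**j == Q*d + w, w may be negative."""
    Q, w = 0, x
    for _ in range(n):
        q = select_digit(w, d)
        Q, w = 4 * Q + q, 4 * w - q * d
        yield Q, w


def restoring_steps(x, d, n):
    """Reference model: digits in {0..3}, remainder always in [0, d)."""
    Q, w = 0, x
    for _ in range(n):
        q = (4 * w) // d
        Q, w = 4 * Q + q, 4 * w - q * d
        yield Q, w


def transactor(Q, w, d):
    """Map a non-restoring (Q, w) pair to restoring form."""
    return (Q - 1, w + d) if w < 0 else (Q, w)


def check_map_points(x, d, n):
    """Compare the two models at every iteration, not only at the final result."""
    pairs = zip(nonrestoring_steps(x, d, n), restoring_steps(x, d, n))
    return all(transactor(Q, w, d) == ref for (Q, w), ref in pairs)


# Exhaustive check over a small operand range that satisfies the precondition.
all_match = all(check_map_points(x, d, 6)
                for d in range(1, 40)
                for x in range(d) if 3 * x <= 2 * d)
```

Checking equality after every iteration is what makes the decomposition useful: each proof obligation spans a single digit-recurrence step instead of the whole division.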

Alternate theorem proving based approach

• New methodology adopted at CPG Austin

• Develop a “lower-level C model” which captures the RTL datapath algorithm

• Correctness of datapath algorithm proved using a mechanical theorem prover (e.g. ACL2)

– The C model is automatically translated to a theorem prover friendly input syntax (Common Lisp)

• Prove C model equivalence to RTL using SEC

Bugs found by formal

• About 11 over the course of 2 projects so far

• 1 bug was a corner-case FMA catch

• 2 additional meaty bugs caught by formal

• Simulation remains primary bring up vehicle

• Formal is now an integral part of overall verification methodology

Designer endorsement

In addition to the elimination of FP bugs, we are experiencing other benefits of our expanding set of formal tools.

(1) It (formal) lets us design more boldly. I was limited to very simple dividers/square rooters up through <project_name>, mostly because we had no ability to validate them to my satisfaction. …

(2) It (formal) frees us up to iterate more quickly. Through <project_name>, I spent about half my time trying to make sure that the designs were correct. That’s now down under 20%, and I believe it will go much lower as our collection of golden models and proof techniques grow. …

The Arm trademarks featured in this presentation are registered trademarks or trademarks of Arm Limited (or its subsidiaries) in the US and/or elsewhere. All rights reserved. All other marks featured may be trademarks of their respective owners. www.arm.com/company/policies/trademarks

Thank you!
