© 2017 Arm Limited DVClub May 15, 2018 Vaibhav Agrawal CPU Validation, Austin Two Case Studies in Formal Deployment on ARM CPUs : Instruction-Fetch and Floating-Point datapath


© 2017 Arm Limited

DVClub

May 15, 2018

Vaibhav Agrawal

CPU Validation, Austin

Two Case Studies in Formal Deployment on ARM CPUs :

Instruction-Fetch and Floating-Point datapath


Instruction-Fetch unit


Why formal on Instruction-Fetch unit?

[Figure: Instruction-Fetch unit block diagram. Labels: BTBs, Branch Predictor, RS, BX, FQ, iTag, iData, uTag, uData, iTLB, Snoop, Architectural registers, CMO, TMO, DAR, MMU, L2, ID, IQ, CT, DECODE]


Why formal on Instruction-Fetch unit? - 2

• Control heavy: many independent FSMs interacting with each other

• Uop$ (micro-op cache): new feature; critical for correct functionality

• Aggressive project timelines

• Simulation remains the primary workhorse (constrained-random unit TB)

• But no dearth of bugs


Why formal on Instruction-Fetch unit? - 3

[Figure: Instruction-Fetch unit block diagram, same labels as on the first "Why formal on Instruction-Fetch unit?" slide]


Why formal on Instruction-Fetch unit? - 4

• Any bug found by formal => one less for simulation


Making formal more efficient: complexity reduction

Definition of “efficiency”:

Improving formal reachability of the state space, both in terms of time and sequential depth


#  Technique                                            Effort  Return
1  Reduce table/hash sizes (caches/iTLB)                High    High
2  Reduce mop size                                      Low     High
3  Preloading / IVAs                                    High    Extremely High
4  Input VA/PA space reduction                          Low     High
5  Input data space reduction (mops and instructions)   Low     High
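As a rough illustration of why rows 1, 2 and 4 pay off, the sketch below counts the state bits that a tagged table contributes to the formal model. Every size in it is hypothetical and chosen only to show the scaling; none is taken from an Arm design, and the one-valid-bit-per-entry layout is an assumption.

```python
def table_state_bits(entries, tag_bits, data_bits):
    """State bits held by a table, assuming one valid bit per entry."""
    return entries * (1 + tag_bits + data_bits)


# Hypothetical "full-size" configuration.
full_bits = table_state_bits(entries=256, tag_bits=36, data_bits=128)

# Hypothetical reduced configuration for formal:
# fewer entries (row 1), narrower payload (row 2), narrower tags (row 4).
reduced_bits = table_state_bits(entries=4, tag_bits=3, data_bits=8)

print(full_bits, reduced_bits)
```

Each state bit that is removed can halve the reachable state space in the worst case, which is why reductions like these tend to improve both proof time and reachable sequential depth.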


A sample e2e formal check: mcac ordering checker

• 2 tracked VAs

• Constraint: VA1→VA2

• Constraint: color mops at tr_VA{1,2}

• Check on outputs: m[VA1]→ m[VA2]

• Constrain both preloading and fills

• Use oracles to manage conflicting constraints across checkers

[Figure: mops m[VA1] and m[VA2] shown among the mops from all other VAs]
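The output-side check can be pictured with a small reference model. This is a minimal sketch, assuming "→" means "is observed before" and that each output mop carries a color tag; the tag strings and the function name are hypothetical and do not come from the actual checker.

```python
def ordering_holds(outputs, first="m[VA1]", second="m[VA2]"):
    """Return True if no `second`-colored mop is seen before a `first`-colored one.

    `outputs` is the observed output stream as a list of color tags;
    mops from all other VAs are uncolored and are ignored.
    """
    seen_first = False
    for tag in outputs:
        if tag == first:
            seen_first = True
        elif tag == second and not seen_first:
            return False  # m[VA2] left the unit before m[VA1]
    return True


in_order = ordering_holds(["other", "m[VA1]", "other", "m[VA2]"])
swapped = ordering_holds(["m[VA2]", "other", "m[VA1]"])
print(in_order, swapped)
```

Tracking only two symbolic VAs keeps the checker small, while the formal tool is still free to choose which two addresses they are.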


Presenting potential bugs to designers

• Bug reproducibility important for testing fix

• Formal can hit different counter-examples across runs

• Extract input stimuli from trace; create a new assertion to emulate a directed test

• Original end-to-end assertion fails after 4 hours:

  – precond |-> consequent

• New assertion with fixed stimulus:

  – directed_stimulus_sequence ##0 precond |-> consequent
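One way to build the fixed-stimulus sequence is to generate its text from the counter-example trace. The helper below is a hypothetical Python sketch, not part of any formal tool: the signal names, the widths, and the trace layout (one dict per cycle) are all assumptions made for illustration.

```python
def extract_inputs(trace, widths):
    """Keep only the driven inputs (the keys of `widths`) from each cycle."""
    return [{sig: cycle[sig] for sig in widths} for cycle in trace]


def sva_sequence(name, stimulus, widths):
    """Render per-cycle input values as the text of an SVA sequence declaration."""
    steps = []
    for cycle in stimulus:
        terms = [f"{sig} == {widths[sig]}'h{val:x}" for sig, val in cycle.items()]
        steps.append("(" + " && ".join(terms) + ")")
    return f"sequence {name};\n  " + " ##1 ".join(steps) + ";\nendsequence"


# Hypothetical two-cycle counter-example; mop_valid is an output and is dropped.
CEX_TRACE = [
    {"fetch_valid": 1, "fetch_va": 0x10, "mop_valid": 0},
    {"fetch_valid": 0, "fetch_va": 0x00, "mop_valid": 1},
]
INPUT_WIDTHS = {"fetch_valid": 1, "fetch_va": 8}

text = sva_sequence(
    "directed_stimulus_sequence",
    extract_inputs(CEX_TRACE, INPUT_WIDTHS),
    INPUT_WIDTHS,
)
print(text)
```

Because the generated sequence pins every input on every cycle, the tool has only one path to explore, so the failure should reproduce quickly and identically on each run, which is what a designer needs when testing a fix.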


Formal bug dissection

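[Figure: two pie charts of formal bugs. The pairing of values to slices is not recoverable from the extracted text.]

By RTL functionality: slices iTag, iData, mopc, Fetch Queue, PC-Queue write, Misc; values 19%, 37%, 34%, 2%, 4%, 4%

By property type: slices End to end, Embedded, Interface; values 42%, 38%, 20%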


Formal can complement simulation

Formal vs Simulation? Or, formal and simulation?

Feature bring up by designer using formal

Early RTL clean up

Corner case bugs

Could simulation have found all of these bugs?


Skeptical?

• Limited company resources. Simulation or formal?

• Law of diminishing returns w.r.t. resources?

• How much are the 5 formal-only bugs worth?

• How much is the shift left worth?

• It's an investment; takes time to bear fruit

• Requires cooperation from design and simTB folks

• Ensure that value is provided back to them

• Requires management commitment

• Requires effort, commitment, and humility on part of formal verification engineer


Floating-Point datapath


Why formal on floating-Point datapath?

• Cost of FDIV bug in 1995: $475m

• Cost of a FP bug today?

• FP bug unacceptable in industry today; High expectations for FP accuracy

• Simulation-based FP datapath validation: directed + random + exhaustive

• Took 4 months to hand-code known corner cases for FP verification on Cortex-A15

• Exhaustive sims for a single op with 2 half-precision inputs take ~100 days of CPU time

• High cost in terms of machine run time for sim-based FP validation

• Need an efficient, yet exhaustive method for FP datapath verification
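The scale of the problem follows from simple arithmetic. The sketch below takes the ~100 CPU-days figure for two half-precision inputs and extrapolates to two single-precision inputs. It assumes a constant cost per input vector and ignores rounding modes and other control inputs, so treat the result as an order-of-magnitude estimate only.

```python
SECONDS_PER_DAY = 86_400


def input_vectors(operand_bits, operands=2):
    """Number of distinct input combinations for an op with fixed-width operands."""
    return 2 ** (operand_bits * operands)


def seconds_per_vector(cpu_days, vectors):
    """Average CPU seconds spent per simulated input vector."""
    return cpu_days * SECONDS_PER_DAY / vectors


half_vectors = input_vectors(16)                    # two half-precision operands
per_vector = seconds_per_vector(100, half_vectors)  # from the ~100 CPU-days figure

# Same per-vector cost applied to two single-precision operands.
single_days = per_vector * input_vectors(32) / SECONDS_PER_DAY
single_years = single_days / 365.25

print(half_vectors, per_vector, single_years)
```

Under these assumptions each vector costs roughly 2 ms of CPU time, and the single-precision case lands at over a billion CPU-years, which is why exhaustive simulation stops being an option beyond half precision.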


Sequential Equivalence Checking: An Example

Equivalence checking:

• Golden Reference == Given model

Boolean Equivalence Checking (EC)

Sequential Equivalence Checking (SEC)

Designs in Fig 1 and Fig 2 can be proven equivalent by SEC, but not by EC

[Figure 1 and Figure 2: two circuit diagrams, each with signals a, c, d and output o]
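The two figures did not survive extraction, so the sketch below uses a stand-in pair of circuits rather than the ones on the slide: a hypothetical retiming example in which one design registers its inputs before an AND gate and the other registers the AND output. The pair has different flop counts, so there is no one-to-one register mapping of the kind Boolean EC relies on, yet the outputs agree on every cycle. The check is an exhaustive bounded simulation from a common reset state, which only illustrates the idea; an SEC tool would prove it for unbounded sequences.

```python
from itertools import product


def design_regs_at_inputs(a_seq, c_seq):
    """Two flops on the inputs, AND gate after them. Flops reset to 0."""
    reg_a = reg_c = 0
    outputs = []
    for a, c in zip(a_seq, c_seq):
        outputs.append(reg_a & reg_c)
        reg_a, reg_c = a, c
    return outputs


def design_reg_at_output(a_seq, c_seq):
    """AND gate first, one flop on its output. Flop resets to 0."""
    reg_o = 0
    outputs = []
    for a, c in zip(a_seq, c_seq):
        outputs.append(reg_o)
        reg_o = a & c
    return outputs


def bounded_equivalent(depth):
    """Compare both designs on every input sequence of the given length."""
    for bits in product((0, 1), repeat=2 * depth):
        a_seq, c_seq = bits[:depth], bits[depth:]
        if design_regs_at_inputs(a_seq, c_seq) != design_reg_at_output(a_seq, c_seq):
            return False
    return True


print(bounded_equivalent(6))
```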


More about SEC for FP datapath

• Reference design source

• RTL of a validated and released product: RTL vs RTL (Cadence JasperGold)

• C model from floating-point library: C vs RTL (Mentor SLEC)

• Goal:

• Bug hunting

– Achieved by an end to end equivalence check, treating the design as black-box

– Very useful for shaking out the bugs

• Full proof

– May require internal map-points identification (proof decomposition)

FMUL, FMA, FDIV, FSQRT


Sample proof decomposition: radix-4 SRT FDIV

[Figure: proof decomposition with an RTL model and a C model side by side. Labels in each model: opA, opB, "norm, scaling + (other stuff)", repeated "Digit Selection, Q&R Update" stages, and "Partial Quotient" / "Remainder" after each stage. Four "Non-restoring to Restoring Transactor" blocks connect the per-stage values, labeled "Maps", to an "Equal?" comparison.]

(Picture by Travis Pouarz @ Mentor)
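The role of a transactor at a map point can be illustrated with exact rational arithmetic. The sketch below is a simplified model under stated assumptions, not the RTL or C model from the figure: operands lie in [1, 2), the non-restoring side uses digits in {-2..2} chosen by exact rounding (a real SRT divider selects digits from a few truncated bits), and the restoring side uses digits in {0..3}. The function names are hypothetical. The transactor rewrites a negative partial remainder as a non-negative one by borrowing one unit in the last quotient place, after which the two models can be compared step by step.

```python
from fractions import Fraction


def nonrestoring_steps(x, d, n):
    """Radix-4 recurrence with signed digits; the remainder may go negative."""
    q, w = Fraction(0), Fraction(x) / 4
    steps = []
    for j in range(1, n + 1):
        digit = max(-2, min(2, round(4 * w / d)))
        w = 4 * w - digit * d
        q += Fraction(digit, 4 ** j)
        steps.append((q, w))
    return steps


def restoring_steps(x, d, n):
    """Radix-4 recurrence with digits 0..3; the remainder stays in [0, d)."""
    q, r = Fraction(0), Fraction(x) / 4
    steps = []
    for j in range(1, n + 1):
        digit = (4 * r) // d
        r = 4 * r - digit * d
        q += Fraction(digit, 4 ** j)
        steps.append((q, r))
    return steps


def to_restoring(q, w, d, j):
    """Transactor: map a step-j non-restoring pair to its restoring form."""
    if w < 0:
        return q - Fraction(1, 4 ** j), w + d
    return q, w


def maps_match(x, d, n):
    """Check every intermediate map point, not only the final result."""
    x, d = Fraction(x), Fraction(d)
    pairs = zip(nonrestoring_steps(x, d, n), restoring_steps(x, d, n))
    return all(
        to_restoring(q, w, d, j) == ref
        for j, ((q, w), ref) in enumerate(pairs, start=1)
    )


print(maps_match(Fraction(3, 2), Fraction(5, 4), 8))
```

Comparing at every iteration is what makes the decomposition useful: each equivalence obligation spans a single digit-selection stage rather than the whole divider.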


Alternate theorem proving based approach

• New methodology adopted at CPG Austin

• Develop a “lower-level C model” which captures the RTL datapath algorithm

• Correctness of datapath algorithm proved using a mechanical theorem prover (e.g. ACL2)

– The C model is automatically translated to a theorem prover friendly input syntax (Common Lisp)

• Prove C model equivalence to RTL using SEC


Bugs found by formal

• About 11 over the course of 2 projects so far

• 1 bug was a corner-case FMA catch

• 2 additional meaty bugs caught by formal

• Simulation remains primary bring up vehicle

• Formal is now an integral part of overall verification methodology


Designer endorsement

In addition to the elimination of FP bugs, we are experiencing other benefits of our expanding set of formal tools.

(1) It (formal) lets us design more boldly. I was limited to very simple dividers/square rooters up through <project_name>, mostly because we had no ability to validate them to my satisfaction. …

(2) It (formal) frees us up to iterate more quickly. Through <project_name>, I spent about half my time trying to make sure that the designs were correct. That’s now down under 20%, and I believe it will go much lower as our collection of golden models and proof techniques grow. …


The Arm trademarks featured in this presentation are registered trademarks or trademarks of Arm Limited (or its subsidiaries) in the US and/or elsewhere. All rights reserved. All other marks featured may be trademarks of their respective owners. www.arm.com/company/policies/trademarks

Thank you !