© 2017 Arm Limited
DVClub
May 15, 2018
Vaibhav Agrawal
CPU Validation, Austin
Two Case Studies in Formal Deployment on ARM CPUs:
Instruction-Fetch and Floating-Point datapath
Instruction-Fetch unit
Why formal on Instruction-Fetch unit?
[Figure: Instruction-Fetch unit block diagram. Labeled blocks: BTBs, Branch Predictor, RS, BX, FQ, iTag, iData, uTag, uData, iTLB, Snoop, Architectural registers, CMO, TMO, DAR, MMU, L2, ID, IQ, CT, DECODE]
Why formal on Instruction-Fetch unit? - 2
• Control heavy: many independent FSMs interacting with each other
• Uop$ (micro-op cache): new feature; critical for correct functionality
• Aggressive project timelines
• Simulation remains the primary workhorse (constrained-random unit TB)
• But no dearth of bugs
Why formal on Instruction-Fetch unit? - 3
[Figure: Instruction-Fetch unit block diagram. Labeled blocks: BTBs, Branch Predictor, RS, BX, FQ, iTag, iData, uTag, uData, iTLB, Snoop, Architectural registers, CMO, TMO, DAR, MMU, L2, ID, IQ, CT, DECODE]
Why formal on Instruction-Fetch unit? - 4
• Any bug found by formal => one less for simulation
Making formal more efficient: complexity reduction
Definition of “efficiency”:
Improving formal reachability of the state space, both in terms of time and sequential depth
#  Technique                                            Effort  Return
1  Reduce table/hash sizes (caches/iTLB)                High    High
2  Reduce mop size                                      Low     High
3  Preloading / IVAs                                    High    Extremely High
4  Input VA/PA space reduction                          Low     High
5  Input data space reduction (mops and instructions)   Low     High
A sample e2e formal check: mcac ordering checker
2 Tracked VAs
Constraint: VA1→VA2
Constraint: color mops at tr_VA{1,2}
Check on outputs: m[VA1]→ m[VA2]
Constrain both preloading and fills
m[VA1]
m[VA2]
All other VAs
Use oracles to manage conflicting constraints across checkers
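The ordering check above can be sketched as a small trace checker. This is only an illustrative model, not the checker used on the project: the function name `ordering_holds` and the representation of traces as lists of VAs are assumptions made for the example. It treats the input-side ordering (VA1 before VA2) as a precondition and checks that the colored mops leave in the same order.

```python
from typing import Hashable, Sequence


def ordering_holds(fetch_order: Sequence[Hashable],
                   mop_order: Sequence[Hashable],
                   va1: Hashable, va2: Hashable) -> bool:
    """Return False only if the tracked pair is observed out of order.

    fetch_order: VAs in the order they were presented on the input side.
    mop_order:   VAs of the colored mops in the order seen on the output.
    (Both representations are hypothetical, chosen for illustration.)
    """
    # Constraint side: the check applies only when VA1 is fetched before VA2.
    if va1 not in fetch_order or va2 not in fetch_order:
        return True
    if fetch_order.index(va1) > fetch_order.index(va2):
        return True
    # Bounded trace: if either colored mop has not appeared yet,
    # there is nothing to compare (liveness is out of scope here).
    if va1 not in mop_order or va2 not in mop_order:
        return True
    # Check side: m[VA1] must appear before m[VA2].
    return mop_order.index(va1) < mop_order.index(va2)
```

All other VAs are ignored by the check, which mirrors tracking only two symbolic addresses instead of the whole address space.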
Presenting potential bugs to designers
• Bug reproducibility important for testing fix
• Formal can hit different counter-examples across runs
• Extract input stimuli from trace; create a new assertion to emulate a directed test
• Original end-to-end assertion fails after 4 hours:
– precond |-> consequent
• New assertion with fixed stimulus:
– directed_stimulus_sequence ##0 precond |-> consequent
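One way to picture the "fixed stimulus" step is a small script that turns per-cycle input values taken from a counter-example trace into the text of a directed assertion. The sketch below is an assumption about how such a step could look, not the flow used in the talk: the trace format (one dict of signal values per cycle), the helper name `directed_assertion`, and the signal names in the test are all hypothetical, and a real flow would read the trace from the formal tool's own export.

```python
def directed_assertion(trace, precond="precond", consequent="consequent"):
    """Build 'directed_stimulus_sequence ##0 precond |-> consequent' as text.

    trace: list of dicts, one per clock cycle, mapping an input signal
           name to the value it held in the counter-example (hypothetical
           format chosen for this illustration).
    """
    if not trace:
        raise ValueError("trace must contain at least one cycle")
    steps = []
    for cycle in trace:
        # Pin every recorded input for this cycle to its observed value.
        terms = [f"({sig} == {val})" for sig, val in sorted(cycle.items())]
        steps.append("(" + " && ".join(terms) + ")")
    # Consecutive cycles are joined with a one-cycle delay.
    stimulus = " ##1 ".join(steps)
    return f"{stimulus} ##0 {precond} |-> {consequent}"
```

Because the stimulus is pinned cycle by cycle, the new property behaves like a directed test: the tool only has to confirm one path instead of searching for a failure again.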
Formal bug dissection
[Figure: two pie charts of bugs found by formal. By property type: End to end, Embedded, Interface; slice values shown: 42%, 38%, 20%. By RTL functionality: iTag, iData, mopc, Fetch Queue, PC-Queue write, Misc; slice values shown: 19%, 37%, 34%, 2%, 4%, 4%.]
Formal vs simulation? Or, formal and simulation?
Formal can complement simulation
Feature bring up by designer using formal
Early RTL clean up
Corner case bugs
Could simulation have found all of these bugs?
Skeptical?
• Limited company resources. Simulation or formal?
• Law of diminishing returns w.r.t. resources?
• How much are the 5 formal only bugs worth?
• How much is the shift left worth?
• It's an investment; it takes time to bear fruit
• Requires cooperation from design and simTB folks
• Ensure that value is provided back to them
• Requires management commitment
• Requires effort, commitment, and humility on the part of the formal verification engineer
Floating-Point datapath
Why formal on Floating-Point datapath?
• Cost of FDIV bug in 1995: $475m
• Cost of a FP bug today?
• FP bug unacceptable in industry today; high expectations for FP accuracy
• Simulation-based FP datapath validation: directed + random + exhaustive
• Took 4 months to hand code known corner cases for FP verification on Cortex-A15
• Exhaustive sims for a single op with 2 half-precision inputs take ~100 days of CPU time
• High cost in terms of machine run time for sim-based FP validation
• Need an efficient, yet exhaustive method for FP datapath verification
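A quick back-of-the-envelope calculation shows why exhaustive simulation does not scale. The sketch below assumes that "exhaustive" means every pair of 16-bit operand encodings for one operation, ignoring rounding modes and other control inputs, and it takes the ~100 days figure from the slide at face value; the derived throughput and the single-precision extrapolation are estimates, not numbers from the talk.

```python
HALF_BITS = 16      # width of one half-precision operand
SINGLE_BITS = 32    # width of one single-precision operand

# Every combination of two half-precision operand encodings.
half_pairs = 2 ** (2 * HALF_BITS)

# ~100 days of CPU time, as quoted for one op with two half-precision inputs.
cpu_seconds = 100 * 24 * 60 * 60

# Implied simulation throughput in operand pairs per CPU-second.
pairs_per_second = half_pairs / cpu_seconds

# Same throughput applied to two single-precision operands.
single_pairs = 2 ** (2 * SINGLE_BITS)
single_years = single_pairs / pairs_per_second / (365 * 24 * 60 * 60)

print(f"half-precision pairs: {half_pairs}")
print(f"implied throughput:   {pairs_per_second:.1f} pairs/s")
print(f"single-precision:     {single_years:.2e} CPU-years")
```

Under these assumptions the half-precision space is about 4.3 billion operand pairs, and the same approach for single precision would take on the order of a billion CPU-years, which is the gap an exhaustive formal method has to close.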
Sequential Equivalence Checking: An Example
Equivalence checking:
• Golden Reference == Given model
Boolean Equivalence Checking (EC)
Sequential Equivalence Checking (SEC)
Designs in Fig 1 and Fig 2 can be proven equivalent by SEC, but not by EC
[Figure 1 and Figure 2: two circuit diagrams with signals a, c, d, and o]
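The difference between EC and SEC can be illustrated with two small retimed designs. The circuits below are an assumed example and are not necessarily the ones drawn in Figures 1 and 2: one registers the result of an AND, the other registers the inputs and then ANDs them. They have different numbers of state elements, so there is no one-to-one flop mapping for a combinational check, yet their outputs match on every cycle. The exhaustive loop is a bounded stand-in for what an SEC tool proves for all depths.

```python
from itertools import product


class RegAfterAnd:
    """o is a registered copy of (a AND c): one state element."""

    def __init__(self):
        self.r = 0  # reset value

    def step(self, a, c):
        out = self.r
        self.r = a & c
        return out


class RegBeforeAnd:
    """a and c are registered first, then ANDed: two state elements."""

    def __init__(self):
        self.ra = 0  # reset values
        self.rc = 0

    def step(self, a, c):
        out = self.ra & self.rc
        self.ra, self.rc = a, c
        return out


def bounded_equivalent(depth):
    """Compare outputs cycle by cycle for every input sequence of 'depth' cycles."""
    for seq in product(product((0, 1), repeat=2), repeat=depth):
        x, y = RegAfterAnd(), RegBeforeAnd()
        for a, c in seq:
            if x.step(a, c) != y.step(a, c):
                return False
    return True
```

Calling `bounded_equivalent(4)` walks all 256 four-cycle input sequences and compares the outputs only, which is the sequential notion of equivalence: internal state may differ as long as the observable behavior does not.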
More about SEC for FP datapath
• Reference design source
• RTL of a validated and released product: RTL vs RTL (Cadence JasperGold)
• C model from floating-point library: C vs RTL (Mentor SLEC)
• Goal:
• Bug hunting
– Achieved by an end to end equivalence check, treating the design as black-box
– Very useful for shaking out the bugs
• Full proof
– May require internal map-points identification (proof decomposition)
FMUL, FMA, FDIV, FSQRT
Sample proof decomposition: radix-4 SRT FDIV
[Figure: proof decomposition for radix-4 SRT FDIV. The RTL model and the C model each take opA and opB through normalization and scaling (plus other stuff), followed by repeated "Digit Selection, Q&R Update" stages that each produce a partial quotient and remainder. "Non-restoring to Restoring Transactor" blocks connect the partial quotient and remainder values to the map points, where the two models are compared ("Equal?"). Picture by Travis Pouarz @ Mentor.]
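The role of a non-restoring to restoring transactor at a map point can be sketched with exact rational arithmetic. This is a simplified illustration under stated assumptions, not the RTL or C model from the talk: quotient digits are selected by exact rounding rather than by the truncated-estimate table a real SRT divider uses, operands are assumed to be already normalized to [1/2, 1), and the function names are hypothetical.

```python
from fractions import Fraction
from math import floor


def srt4_nonrestoring(x, d, iterations):
    """Yield (partial quotient, partial remainder) after each radix-4 step.

    Digits come from {-2, ..., 2}, so the remainder may be negative.
    Invariant after j steps: x * 4**(j - 1) == q_acc * d + w.
    """
    d = Fraction(d)
    w = Fraction(x) / 4          # pre-scaling keeps the first digit in range
    q_acc = 0
    for _ in range(iterations):
        digit = round(4 * w / d)  # exact selection (assumption, see lead-in)
        assert -2 <= digit <= 2
        w = 4 * w - digit * d
        q_acc = 4 * q_acc + digit
        yield q_acc, w


def to_restoring(q_acc, w, d):
    """Map a signed-digit (quotient, remainder) pair to restoring form."""
    if w < 0:
        return q_acc - 1, w + d
    return q_acc, w


def check_map_points(x, d, iterations):
    """Compare each converted pair against a directly computed reference."""
    x, d = Fraction(x), Fraction(d)
    for j, (q_acc, w) in enumerate(srt4_nonrestoring(x, d, iterations), 1):
        q_r, rem = to_restoring(q_acc, w, d)
        reference_q = floor(x * 4 ** (j - 1) / d)
        if q_r != reference_q or not (0 <= rem < d):
            return False
    return True
```

Checking every iteration, rather than only the final result, mirrors the idea of internal map points: each small equivalence is easier to establish than a single end-to-end comparison.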
Alternate theorem proving based approach
• New methodology adopted at CPG Austin
• Develop a “lower-level C model” which captures the RTL datapath algorithm
• Correctness of datapath algorithm proved using a mechanical theorem prover (e.g. ACL2)
– The C model is automatically translated to a theorem prover friendly input syntax (Common Lisp)
• Prove C model equivalence to RTL using SEC
Bugs found by formal
• About 11 over the course of 2 projects so far
• 1 bug was a corner-case FMA catch
• 2 additional meaty bugs caught by formal
• Simulation remains primary bring up vehicle
• Formal is now an integral part of overall verification methodology
Designer endorsement
In addition to the elimination of FP bugs, we are experiencing other benefits of our expanding set of formal tools.
(1) It (formal) lets us design more boldly. I was limited to very simple dividers/square rooters up through <project_name>, mostly because we had no ability to validate them to my satisfaction. …
(2) It (formal) frees us up to iterate more quickly. Through <project_name>, I spent about half my time trying to make sure that the designs were correct. That’s now down under 20%, and I believe it will go much lower as our collection of golden models and proof techniques grow. …
The Arm trademarks featured in this presentation are registered trademarks or trademarks of Arm Limited (or its subsidiaries) in the US and/or elsewhere. All rights reserved. All other marks featured may be trademarks of their respective owners. www.arm.com/company/policies/trademarks
Thank you!