Coverage Metrics and Error Models
Serdar Tasiran, Kurt Keutzer
Department of Electrical Engineering & Computer Sciences
University of California, Berkeley
Problem Statement
- Designers tape out IC designs but do not have a clear idea of
  • the number and severity of errors that remain in the design
  • the comprehensiveness of the functional verification that has been performed.
- In theory, formal verification conclusively verifies the IC design.
  • But in practice it is often impossible to perform formal verification.
- To improve the current situation practically, a broader variety of verification techniques is being used:
  • Formal verification: model checking, theorem proving
  • Simulation (using informal coverage metrics)
- This catches more bugs, but in the end designers still don't know
  • how comprehensive their verification is
  • whether a subset of the techniques applied could provide the same comprehensiveness.
The Need for Coverage Metrics
- A mechanism is needed to
  • quantify the degree of verification achieved by a combination of techniques
  • provide information about the unchecked aspects of the design.
- Good coverage metric(s) will enable us to
  • evaluate existing validation approaches (model checking; simulation given a set of test vectors; ...)
  • compare and correlate these approaches
  • assess whether a certain degree of verification comprehensiveness has been achieved
  • guide the generation of new test vectors to exercise unchecked parts of designs.
Future System Overview
[Figure: block diagram of a simulation-based verification loop — simulation driver (vectors), simulation engine, simulation model (HDL), and simulation monitor (yes/no), feeding coverage analysis, diagnosis of unverified portions, and vector generation; the label "OUR FOCUS NOW" marks part of this loop.]
Research Approach
- Survey existing coverage approaches from
  • protocol conformance testing
  • software testing
  • hardware verification (CAD)
Research Approach
For each domain:
1. Form of spec and implementation
2. Testing/verification goal
3. Error models
4. Coverage metrics
   4.1 Does it correlate well with errors? (intuitively, experimentally)
       • Effectiveness: likelihood of catching bugs given X% coverage
   4.2 Applicability: how easy is it to achieve coverage? Likelihood of catching bugs per unit computation time
   4.3 Relevance for hardware verification purposes
5. Comparison of metrics, theoretically and empirically: X% C1 coverage vs. Y% C2 coverage; computational cost, size of test set required, ...
Conclusions and Discussion
Executive Summary - 1
- Strictly formal/mathematical analysis of coverage metrics is unlikely to define one best or most comprehensive metric.
  • No such conclusion follows from existing work. Existing results have the form: "For testing scenario A, according to probabilistic measure P, metric C1 is more likely to detect errors than metric C2."
  • At the very least, many coverage metrics are likely to be incommensurable.
- Three directions for future research:
  • Look for new mathematical relationships (beyond "subsumes") that give a formal ordering of techniques.
  • Statistical/empirical data gathering.
    - The situation is analogous to fault models in manufacturing test.
    - Evaluate and compare metrics by arriving at good statistical data to support claims.
    - Engage with a customer, e.g. Intel.
  • Search for intuitively better coverage metrics.
Executive Summary - 2
- Software testing motivates a number of interesting new coverage metrics:
  • Dataflow-based coverage metrics: all definitions-uses associations, definition-use interactions for different variables, dependence coverage
  • Mutation coverage
- Mutation testing could lead to a new (interactive) diagnostic tool.
Protocol Conformance/FSM Testing (1,2)
- SPEC: a single deterministic (extended) FSM, written in ESTELLE, LOTOS, or SDL.
- IMPL: a single deterministic (extended) FSM running inside a bigger system.
- EFSM: the FSM describes control; extra state variables hold
  • data structures
  • context variables.
- Mostly a small control part (<100 states) and a bigger data part (50-100 input and output variables).
- VERIFICATION GOAL: IMPL is equivalent to or contained in SPEC.
  • Not only I/O behavior: states and transitions must match.
  • The state of IMPL is not visible: black-box testing of IMPL vs. SPEC.
[Figure: example EFSM with states sinit, s1-s4 and transitions labeled with input/output and variable updates, e.g. "i = 1 / o = 0, x := wait" and "i = 0, x = wait / o = 1, x := done". A code sketch of such an EFSM follows below.]
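As an illustration only, here is a minimal Python sketch of an EFSM with guarded transitions over a context variable x, using the labels from the figure. The wiring of states to transitions, the class shape, and the method names are assumptions made for this sketch, not taken from the slides.

```python
# Minimal EFSM sketch (illustrative; transition labels follow the figure,
# the state wiring and class design are assumptions).

class EFSM:
    def __init__(self):
        self.state = "sinit"
        self.x = None  # context variable

    def step(self, i):
        """Consume input i; return output o, or None if no transition fires."""
        if self.state == "sinit" and i == 1:
            self.x = "wait"          # i = 1 / o = 0, x := wait
            self.state = "s1"
            return 0
        if self.state == "s1" and i == 0 and self.x == "wait":
            self.x = "done"          # i = 0, x = wait / o = 1, x := done
            self.state = "s2"
            return 1
        return None                  # input not enabled in this state

m = EFSM()
print([m.step(i) for i in (1, 0)])   # -> [0, 1]
```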
Protocols, FSMs: Error Models (3)
- Assumption: faulty machines still satisfy correct interface behavior.
- Structural faults (some 20 "mutation operators" in Mothra):
  • Output fault: the output on a transition is incorrect
  • Transfer fault: a transition leads to the wrong state
  • Additional/missing transition on a given present state and input
  • Extra/missing state
  • Blocked or dropped input
- Sequencing faults:
  • Missing alternative path
  • Improper nesting of loops
  • WHILE instead of REPEAT
  • Wrong logical expression controlling a loop or conditional statement
- Arithmetic/manipulative errors:
  • Ignoring overflow
  • Wrong operator (e.g. >, ≥, *, +)
  • Wrong initial value
  • Reference to the wrong variable
Protocols, FSMs: Control Coverage (4)
- Cover each state and transition of the SPEC FSM:
  • transition tours (see the sketch below)
  • distinguishing sequences for states
  • unique I/O sequences for states
  • ...
- Verify that the outputs of IMPL match SPEC.
[Figure: from sinit, reach s1 and s2 and apply the distinguishing sequence for each state.]
- A different technique: fault functions.
  • Define sets of suspicious transitions.
  • Represent them as a non-deterministic FSM.
  • Cover all states and transitions of the NFSM.
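A sketch of the first item: the Python fragment below computes a transition tour (an input sequence exercising every transition at least once) for a small deterministic FSM by repeated BFS. The FSM encoding and the function name are illustrative assumptions.

```python
from collections import deque

def transition_tour(fsm, start):
    """Greedy tour: repeatedly BFS to the nearest unexercised transition.
    fsm: dict mapping (state, input) -> next_state."""
    uncovered = set(fsm)            # transitions = (state, input) pairs
    state, tour = start, []
    while uncovered:
        seen, q = {state}, deque([(state, [])])
        while q:
            s, path = q.popleft()
            hit = next(((t, i) for (t, i) in uncovered if t == s), None)
            if hit:
                tour += path + [hit[1]]   # drive to and take the transition
                uncovered.discard(hit)
                state = fsm[hit]
                break
            for (t, i), nxt in fsm.items():
                if t == s and nxt not in seen:
                    seen.add(nxt)
                    q.append((nxt, path + [i]))
        else:
            break  # remaining transitions unreachable from the current state
    return tour

fsm = {("s0", 0): "s0", ("s0", 1): "s1", ("s1", 0): "s0", ("s1", 1): "s1"}
print(transition_tour(fsm, "s0"))   # e.g. [0, 1, 1, 0]
```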
Protocols: Datapath Coverage (4)
- Exercise all branches; exercise all definition-use paths.
- Definition-Use Coverage: determine all definition-use pairs (sketched in code below).
  • There may be an exponential number of them in the number of branches.
- For each definition-use path:
  • get to its initial state
  • determine if the d-u path is traversable
    - constraint solving (SAT-like approach)
    - may need to traverse loops several times to make a path traversable.
- Control + datapath: repeat the same control test with different data values; all control states/branches + all definitions-uses.
  • Cover all d-u paths first, then cover the remaining transitions.
[Figure: the example EFSM again, with "x := wait" marked as a definition of x and "x = wait" marked as a use.]
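A toy sketch of definition-use pair extraction over a control-flow graph. The CFG encoding (per-node def/use sets, edge list) is an assumption made up for illustration.

```python
# Each CFG node: (defs, uses) sets of variable names; edges give control flow.
# A def-use pair (d, u, v) requires a def-clear path from d to u for v.

def du_pairs(nodes, edges):
    pairs = []
    for d, (defs, _) in nodes.items():
        for v in defs:
            # DFS for uses of v reachable from d without an intervening def
            stack, seen = [s for (p, s) in edges if p == d], set()
            while stack:
                n = stack.pop()
                if n in seen:
                    continue
                seen.add(n)
                if v in nodes[n][1]:
                    pairs.append((d, n, v))
                if v not in nodes[n][0]:        # stop at redefinitions
                    stack += [s for (p, s) in edges if p == n]
    return pairs

nodes = {1: ({"x"}, set()),      # x := 3
         2: (set(), {"x"}),      # if x > 2   (predicate use)
         3: ({"x"}, set()),      # x := 0
         4: (set(), {"x"})}      # output(x)  (computational use)
edges = [(1, 2), (2, 3), (2, 4), (3, 4)]
print(du_pairs(nodes, edges))    # [(1, 2, 'x'), (1, 4, 'x'), (3, 4, 'x')]
```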
Protocols: Domain Coverage (4)
- Sub-domain for a data variable: the range of values that make the program go through the same control path.
- Exercise all of the sub-domains of a variable:
  • exercise each sub-domain at its extremes and at a few intermediate values.
- BUT: protocols are reactive, not a single set of parameters. The criterion must be defined over time [Zhu, Vuong '97]:
  • Behavior of a protocol: a set of execution sequences (a1,r1), (a2,r2), (a3,r3), ...
    - ai: event i
    - ri: recursion depth of event i
  • This probably doesn't apply to hardware.
[Figure: input space partitioned into sub-domains D1-D5.]
Protocols: A Metric for Execution Sequences (4)
- [Zhu, Vuong '97]: the behavior space is infinite; we want to cover it with finite tests.
- Define a distance between sequences (sketched in code below):
  d(a1,b1) + (1/2) d(a2,b2) + (1/4) d(a3,b3) + ...,  with 0 ≤ d(ai,bi) ≤ 1
- This yields a metric space. Cover the space with finitely many spheres of radius ε.
  • Rationale: differences later in the sequence don't matter as much.
  • Can pick test sets of decreasing ε.
- Problem: if the metric says two sequences are close, are they really close in terms of bug-catching probability?
- Generalize to data parameters: (a1(v1,...,vn), r1), (a2(v1,...,vm), r2), ...
  • The distance metric depends on v1,...,vm as well.
- No results on applicability.
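A quick sketch of that geometrically decaying distance in Python. The weighting follows the slide; the per-event distance (0/1 mismatch) is an assumption.

```python
def seq_distance(a, b):
    """d(a,b) = sum_i 2**-(i-1) * d(a_i, b_i) for 1-indexed positions i,
    truncated at the shorter sequence; per-event distance is 0/1 mismatch."""
    return sum((ai != bi) * 0.5 ** i for i, (ai, bi) in enumerate(zip(a, b)))

print(seq_distance("abcd", "abxd"))  # 0.25: mismatch in 3rd event (weight 1/4)
print(seq_distance("xbcd", "abcd"))  # 1.0 : early differences weigh more
```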
Protocol Cov. Metrics: Mutation Adequacy (4)
- Effectiveness (fault coverage):
  fault coverage = (# of faulty machines caught by testing method) / (# of faulty machines in population)
- Impractical to look at all (faulty) machines; use mutation operators to generate a representative population.
- Mutation operators for protocols (first one sketched in code below):
  • alter the output/next state on one (two, three, ..., n) transitions
  • add/remove a state
  • local perturbations to the state transition graph or protocol description.
- Each mutant is checked for conformance/equivalence. Too many mutants:
  • limit the type and number of mutations
  • limit the number of states in the implementation.
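As a sketch of the first operator, this Python fragment enumerates single-transition mutants of a Mealy machine, altering either the next state or the output of exactly one transition. The FSM encoding is an illustrative assumption.

```python
import copy

def single_transition_mutants(fsm, states, outputs):
    """Yield mutants of fsm (dict: (state, input) -> (next_state, output))
    that change the next state or the output of exactly one transition."""
    for key, (nxt, out) in fsm.items():
        for s in states:                 # transfer faults
            if s != nxt:
                m = copy.deepcopy(fsm)
                m[key] = (s, out)
                yield m
        for o in outputs:                # output faults
            if o != out:
                m = copy.deepcopy(fsm)
                m[key] = (nxt, o)
                yield m

fsm = {("s0", 0): ("s0", 0), ("s0", 1): ("s1", 0),
       ("s1", 0): ("s0", 1), ("s1", 1): ("s1", 0)}
mutants = list(single_transition_mutants(fsm, ["s0", "s1"], [0, 1]))
print(len(mutants))   # 8: one transfer and one output fault per transition
```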
Protocol Metrics: Correlation w/ Errors (4.1)
- No systematic study to determine if the fault models are realistic.
  • Compound mutations may not be an accurate model for natural faults.
- [Frankl, Weiss, Hu '96]:
  • The effectiveness vs. coverage curve is flat, then increases sharply after 95%.
  • Error-exposing ability is good only at high coverage levels.
  • Weed out unexecutable d-u paths; require 100% d-u coverage.
- Coverage vs. effectiveness depends on the distribution of faulty programs over sub-domains of
  • data variables
  • mutations.
- Effectiveness is defined relative to a test generation strategy:
  • it will reflect the probability distribution of test sets better.
- Results are derived from protocols with few (actual) errors:
  • errors revealed by few tests; otherwise any large enough test set works.
Protocol Cov. Metrics: Applicability (4.2)
- Difficult to achieve high coverage for elaborate metrics; may require a large test set.
- Mutation testing is costly; coverage per unit resource is better for other metrics.
  • Must determine if a mutant is equivalent.
  • If cost is not a factor, mutation testing does well in catching bugs.
- 100% branch coverage is not effective enough for protocols.
  • For general software, 70% is considered excellent.
- Encouraging: all-uses exposes missing-path errors (the designer forgets to specify what to do in a certain case).
  • Interesting: structural testing is supposed to be poor in this case.
  • Some d-u associations would have taken the missing path.
- Test suites are much smaller if the machine has a reliable reset.
Protocol Cov. Metrics: Relevance (4.3)
- Protocol SPECs are described in terms of "abstract interfaces".
  • Correct-interface assumption: high-level actions are implemented correctly by the tester/protocol interface.
- Implication for hardware testing:
  • Can specify the FSM in terms of high-level actions (transactions), each implemented over several clock cycles, and determine how much of the high-level behavior is exercised.
- Useful to study effectiveness vs. coverage for various metrics, using actual and artificial faults.
- HW analogs of datapath metrics:
  • Definition: loading/initialization of a data register (via a certain path)
  • Predicate use: register fans out to the control part of the design
  • Computation use: register fans out to another data register
  • Compute, e.g., definition-use path coverage.
    - Like netlist coverage; can be achieved by tagging data registers.
Protocols: Comparison of Metrics (5)
- C1 subsumes C2 iff, for all programs P and test sets T: T gives 100% C1 coverage ⇒ T gives 100% C2 coverage.
- C1 subsumes C2 does not imply that test sets for C1 are more likely to uncover bugs than those for C2.
- Elaborate metrics are more effective at X% coverage partly because a large test set is needed to reach X% coverage.
- Branch coverage vs. random testing: not much improvement for comparable test set size.
- All definitions-uses coverage requires a lot of computation for high coverage. Example:
  • 40 d-u tests are as effective as 250 random tests, but the d-u tests require computation.
- If cost is not a factor, mutation testing is better at catching bugs.
- Random testing is not as good as path testing, but is a good enough cost-effective alternative.
Software Testing (1,2)
- SPEC: model-based specs (written in Z, VDM, ...)
  • State space (set of typed variables, invariant on the variables)
  • Required operations (dynamics of the system)
    - Predicates: pre-/post-condition pairs
  • Can be executable: FSMs, state-charts. Evaluate all predicates; the output evaluates to TRUE or FALSE.
- Property-oriented specs: axiomatic or algebraic specs.
  • Algebraic spec: a set of equations the program should satisfy.
  • A term represents either a sequence of calls to the program or a value; check if the two are equal.
- IMPL: a program written in an imperative programming language; may have procedure and function calls.
- Verification goal: do the outputs of IMPL satisfy or match SPEC?
SW Error Models (3)
- Mutation operators mimic likely errors.
- COMPUTATION ERRORS
  • Statement errors: wrong control keyword (WHILE instead of REPEAT, etc.)
  • Predicate errors: expression/evaluation errors in Boolean predicates; decision variable of a "case" statement; missing control path
  • Assignment errors: reference to the wrong variable, wrong expression
- DOMAIN ERRORS
  • Error in specifying domain boundaries
[Figure: input space partitioned into sub-domains D1-D5.]
SW Covg. Metrics (Adequacy Criteria) (4)
- Classification 1: program-based; SPEC-based; combined; interface-based (inputs and outputs adhere to a required format); random (statistical) testing.
  • Random testing: probability distribution over the input space.
- Classification 2: structural metrics; fault-based metrics; error-based metrics.
SW: Program-based Structural Testing (4)
- CONTROL-FLOW-BASED METRICS. Basics:
  1. Statement coverage
  2. Branch coverage
  3. Decision (multiple condition) coverage
  4. Path coverage
- 1, 2 and 3 are usually too weak: they miss errors. Also undecidable: code may be unreachable.
- 4 is ideal but impractical: too many paths. Choose a representative subset of paths:
  • Simple/elementary paths; length-n paths; level-i paths
    - Check all elementary paths; at the next level, check unexercised elementary subpaths and cycles.
  • Only check a linearly independent set of paths; there are e - n + (# of SCCs) of them (the cyclomatic number; a sketch of this computation follows below).
[Figure: level-1 and level-2 paths in an example graph.]
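Illustrative only: computing the cyclomatic number e - n + (# of SCCs) of a small control-flow graph in plain Python, with a Kosaraju-style SCC count. The graph encoding is an assumption.

```python
def scc_count(n, edges):
    """Number of strongly connected components (Kosaraju; fine for small CFGs)."""
    fwd, rev = {v: [] for v in range(n)}, {v: [] for v in range(n)}
    for a, b in edges:
        fwd[a].append(b)
        rev[b].append(a)

    order, seen = [], set()
    def finish(v):                       # 1st pass: record finish order
        seen.add(v)
        for w in fwd[v]:
            if w not in seen:
                finish(w)
        order.append(v)
    for v in range(n):
        if v not in seen:
            finish(v)

    seen, comps = set(), 0
    def sweep(v):                        # 2nd pass on the reversed graph
        seen.add(v)
        for w in rev[v]:
            if w not in seen:
                sweep(w)
    for v in reversed(order):
        if v not in seen:
            comps += 1
            sweep(v)
    return comps

# CFG with nodes 0..3 and a loop between 1 and 2: SCCs are {0}, {1,2}, {3}.
edges = [(0, 1), (1, 2), (2, 1), (1, 3)]
print(len(edges) - 4 + scc_count(4, edges))   # cyclomatic number: 4 - 4 + 3 = 3
```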
SW: Program-based Structural Testing (4)
- DATA-FLOW-BASED METRICS. Goal:
  • Don't fold data variables into the state space; find a meaningful and efficient way to exercise them.
- Definition occurrence: x := 3
- Use occurrence (global use):
  • Computational use: z := x + 1
  • Predicate use: if (x > 2) then ...
- All-definitions: each definition of a variable is exercised by some path.
- All-uses: each (feasible) use of a variable is covered by a path.
  • Computational uses; predicate uses; all c-uses, some p-uses; all p-uses, some c-uses; ...
- There may be many paths through which a definition reaches a use.
SW: Program-based Structural Testing (4)
- DATA-FLOW-BASED METRICS (cont'd)
- All definition-use pairs: for every definition of x and every (feasible) path q via which the definition reaches a use of x, there is a test path p such that q is a sub-path of p.
[Figure: definition x := 3 reaching the use x > 2 via paths q1, q2, q3.]
- Interactions between different variables must be exercised [Ntafos].
- k def-ref interactions: [d1(x1), u1(x1), d2(x2), u2(x2), d3(x3), ..., dk(xk), uk(xk)], where
  • di: definition of variable xi; ui: use of variable xi
  • ui and di+1 are at the same node
  • di reaches ui
  • the xi's and the nodes nj need not be distinct.
- Interaction path for a k-def-ref interaction: p = n1 n2 n3 ... nk.
- Required k-tuples criterion: for all j-def-ref interactions L with 1 ≤ j ≤ k, there is an interaction path for every feasible L.
SW: Program-based Structural Testing (4, 4.1, 4.2)
- DATA-FLOW-BASED METRICS: (ordered) context coverage:
  • For a path p = n1 n2 n3 ... n, where node n has uses of variables x1, x2, ..., xn:
    - each ni has a definition of xi that reaches n
    - the sub-path ni ... n is definition-free for xi.
- Dependence coverage:
  • Determine syntactically/semantically whether execution of one statement affects another; if so, exercise the path in between.
- APPLICATION ISSUES
  • Should structured data be considered as a single entity?
    - One may identify a def-use path for an array even though no array element satisfies def-use.
    - Treating arrays element by element is difficult if they are dynamically indexed.
  • Interprocedural data-flow dependencies: module instantiations, formal vs. actual variable uses.
SW: Spec-based Structural Testing (4)
- The SPEC can be used to
  • determine whether the outputs of IMPL are correct
  • provide information to select test cases
  • measure test set adequacy.
- Example: must exercise all feasible combinations of sub-expression values. A feasible combination of atomic predicates = a sub-domain of the input space.
- PARTITION TESTING
  • (Automatically if possible) identify categories for each input parameter or environment variable:
    - characteristics enumerated in the pre-condition of the spec
    - characteristics intrinsic to the variable (parameter type, etc.)
SW: Spec-based Structural Testing (4)
- PARTITION TESTING
  • Choice: a partition of the domain of one group of variables.
    - Example: A ∨ B partitioned into ¬(A ∨ B), A, and B.
    - Example: n ∈ 0 ... 63; predicates in the program: n > 0, 1 ≤ n ≤ 15. Partition: {0}, [1 ... 15], [16 ... 63].
  • All-combinations criterion: exercise every possible combination of choices.
  • Each-choice-used criterion: exercise each choice as part of some combination.
  • Base-choice criterion:
    - Base choice: the combination of parameters representing normal operation of the software.
    - Each choice is used in combination with the base choices for the rest of the variables.
  (A sketch of the three criteria follows below.)
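A small Python sketch contrasting the three criteria on a made-up parameter space; the parameters, values, and the assumed base choice are illustrative assumptions.

```python
from itertools import product

choices = {"n":    ["0", "1..15", "16..63"],   # the partition from the slide
           "mode": ["read", "write"]}          # an assumed second parameter

# All-combinations: every combination of choices.
all_comb = list(product(*choices.values()))

# Each-choice-used: greedy - every choice appears in at least one combination.
width = max(len(v) for v in choices.values())
each_choice = [tuple(v[i % len(v)] for v in choices.values())
               for i in range(width)]

# Base-choice: vary one parameter at a time away from a "normal" base.
base = {"n": "1..15", "mode": "read"}          # assumed normal operation
base_choice = [tuple({**base, k: alt}.values())
               for k, vals in choices.items() for alt in vals if alt != base[k]]

print(len(all_comb), len(each_choice), len(base_choice))   # 6 3 3
```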
SW: Fault-Based Adequacy Criteria (4)
- Structural testing
- Fault-based testing: adequacy of a test set = its ability to detect faults. Methods:
  • error seeding
  • mutation testing
    - program mutation testing
    - SPEC mutation testing
- Error-based testing
SW Fault-Based Metrics: Error Seeding (4, 4.1)
- Originally proposed to estimate the number of faults in SW.
- Introduce artificial faults at random, unknown to the tester.
  • Assumption: these faults are representative of the inherent faults.
- Test the SW, counting artificial and inherent faults separately:
  r = (# of artificial faults found) / (total # of artificial faults)  — a measure of test adequacy
  f = # of inherent faults found by testing
  Estimated # of inherent faults in the program = (1/r) * f  (computed in the snippet below)
- Advantage: can be used to evaluate any testing method.
- Drawbacks:
  • The measure depends on how the faults are introduced.
  • Error seeding is difficult to implement: often done manually, and artificial errors are much easier to find.
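The estimate in a few lines of Python; the counts are invented for illustration.

```python
seeded_total, seeded_found, inherent_found = 50, 40, 12   # assumed counts
r = seeded_found / seeded_total                           # test adequacy = 0.8
print(inherent_found / r)   # estimated inherent faults: 15.0
```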
SW Fault-Based Metrics: Mutation Cov. (4)
- Procedure for a program P:
  • Create a set of alternative programs: MUTANTS.
  • Construct a test set T.
  • For each mutant M, run the tests from T:
    - either P and M differ on a test → the mutant dies
    - or T is exhausted → the mutant lives.
- Live mutants provide valuable information. A mutant lives because
  • the test data is inadequate
    - If a large proportion of mutants live, the test data does not convince us that P is correct.
    - Live mutants point to "un-exercised" aspects of the program.
  • or the mutant is equivalent to the program.
    - Only a small fraction should be this way.
- Mutation adequacy = (# of dead mutants) / (# of non-equivalent mutants)  (driver sketched below)
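A minimal driver sketch in Python, with programs and mutants as callables and adequacy computed as on the slide; this representation is an assumption for illustration.

```python
def mutation_adequacy(p, mutants, tests, equivalent=()):
    """A mutant dies if it disagrees with p on some test.
    Returns (# dead) / (# non-equivalent), per the slide."""
    dead = sum(any(m(t) != p(t) for t in tests)
               for m in mutants if m not in equivalent)
    return dead / (len(mutants) - len(equivalent))

p  = lambda x: x * 2
m1 = lambda x: x + 2        # dies on any test with x != 2
m2 = lambda x: x * 2 + 0    # equivalent mutant
m3 = lambda x: abs(x) * 2   # lives unless a negative input is tested
print(mutation_adequacy(p, [m1, m2, m3], tests=[1, 2, 3], equivalent=[m2]))  # 0.5
```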
SW: Generating Mutants (4)
- Mutation operator: replace one syntactic structure with another; designed based on previous design experience.
- Statement analysis: make sure every line and branch is necessary.
  • Replace a statement with CONTINUE, TRAP
  • Replace logicals and relationals with TRUE or FALSE
  • Replace DO with FOR, etc.
- Predicate analysis: exercise predicate boundaries.
  • Alter limit sub-expressions by small amounts
  • Insert absolute-value operators into predicate sub-expressions
  • Alter relational operators (sketched in code below)
- Domain analysis:
  • Change constants and sub-expressions by small amounts
  • Insert absolute-value operators (if syntactically correct)
- Coincidental correctness analysis: change data references and operators to alternatives (wherever syntactically correct).
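As a sketch of the "alter relational operators" operator, a Python AST transformer; the class name is made up, `ast.unparse` needs Python 3.9+, and a real mutation tool would emit one mutant per mutation site rather than rewriting all sites at once.

```python
import ast

class RelationalFlip(ast.NodeTransformer):
    """Illustrative operator: weaken '<' to '<=' and '>' to '>='.
    Applies at every site; real mutation testing generates one mutant per site."""
    SWAP = {ast.Lt: ast.LtE, ast.Gt: ast.GtE}

    def visit_Compare(self, node):
        node.ops = [self.SWAP[type(op)]() if type(op) in self.SWAP else op
                    for op in node.ops]
        return node

src = "def clamp(x):\n    return 10 if x > 10 else x"
tree = RelationalFlip().visit(ast.parse(src))
print(ast.unparse(tree))   # 'x > 10' becomes 'x >= 10'
```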
SW: Mutant Testing (4)
- Mutant testing assumptions:
  • Competent programmer hypothesis: programs are near-perfect; errors are small deviations from the intended program.
  • Coupling effect hypothesis: simple and complex errors are coupled. If a test kills a simple non-equivalent mutant, it will also kill a complex non-equivalent mutant.
- Trying to validate the coupling effect:
  • Does a test set that kills mutants well also kill mutants of mutants well?
  • Second-order mutants = complex faults? Not clear; experiments say no.
- Local correctness:
  • For a program P, define a neighborhood of programs N(P). P is locally correct w.r.t. N iff for all q in N(P), either q is equivalent to P, or q fails on at least one test point in the test set.
  • Local correctness implies correctness if N(P) includes at least one correct program.
Mutation Analysis: Applicability (4.2)
- PROS
  • Easy automation: applying mutation operators, running mutants.
  • Interactive test environment: if a mutant doesn't fail while the original fails, it is easy to examine the mutation and determine whether there is an error.
  • Other testing methods are special cases of mutation testing. Example: statement and branch coverage.
- CONS
  • Expensive (in both time and space): an n-line program has O(n^2) mutants.
  • Human cost of examining live mutants (empirically ~10% of all mutants):
    - must decide if they're equivalent to the original program (HARD)
    - if not, must create a new test case to kill the mutant.
Mutation Testing: Improvements (4.2)
- Weak mutation testing: mutate and test components instead of the whole program.
  • Same number of mutants, but no need to run the whole program.
- Firm mutation testing: select a portion of the program and a subset of parameters to be mutated.
  • Sensitive to the selection; higher human cost.
- Constrained mutation testing: omit the few operators that cause the most dead mutants.
  • Quality of the test sets remains almost the same; the cost of mutation testing is reduced.
  • Empirical data: still quadratic cost, although reduced significantly.
Mutation Testing: Improvements (4.2)
- Ordered mutation testing: define an order < between mutants.
  • a < b implies that if a test t kills b, then it also kills a.
  • Check b first; check a only if b survives.
- Order the mutation operators. Example: replace = by ≠, <, >.
- A similar order can be defined on test data.
- Experiments are needed to determine practical effectiveness.
SW Fault-based: SPEC-Mutation Tstg. (4)
- Aimed at catching bugs due to
  • misinterpreting the SPEC
  • errors in the SPEC.
- Plant faults in the SPEC; check what fraction of the faulty SPECs are caught by the test.
  • New operators applied to pre-/post-condition pairs.
  • Some program mutations don't work well with SPECs: replacing clauses with TRUE or FALSE is useless or uninteresting.
- Two testing methods:
  • Non-executable SPEC: check if the program satisfies the mutated SPEC.
  • Executable SPEC: check if the mutant SPEC gives the same result as the original SPEC.
SW: Error-Based Adequacy Criteria (4)
- Structural testing
- Fault-based testing
- Error-based testing
  • IDEA: check programs on error-prone points.
  • Partition the input-output space so that behavior within a sub-domain is equivalent.
    - One test case is representative of all data in the sub-domain.
    - May want to pick a few more to increase confidence in the implementation.
SW Error-Based: Domain Analysis (4)
- SPEC-based input space partitioning: the SPEC requires the same function on the data.
  • Even when the SPEC is formal, there is no general mechanical method to partition the input space.
  • Idea: for a given set of pre-/post-condition pairs, put them into CNF:
    (P1(I) ⇒ Q1(I,O)) ∧ (P2(I) ⇒ Q2(I,O)) ∧ ...
    The input data that satisfies Pi makes up domain i.
- Program (IMPL)-based input space partitioning: two data points belong to the same sub-domain if they cause the same computation.
  • Same computation = same execution path.
- Combined program- and SPEC-based domain analysis:
  • Perform the two partitionings separately; find the partition that refines their intersection; for each partition, choose a sufficient test set.
[Figure: input space partitioned into sub-domains D1-D5.]
Domain Analysis: Test Case Selection (4)
- Recall the program error classes:
  • Domain errors: the program selects the boundaries of domains incorrectly.
  • Computation errors: the implementation of the computation is wrong.
- Boundary analysis for domain errors (on/off point selection sketched below):
  • N x 1 domain adequacy: for N sub-domains D1, D2, ..., DN with boundaries B1, B2, ..., BN, require at least N test cases on each Bi and one test case "off" Bi.
    - If Bi is part of Di, the "off" test case should be outside Di; otherwise it should be inside Di.
    - Detects parallel shifts of linear boundaries.
  • N x N domain adequacy: N test cases on each Bi, plus N linearly independent test cases just off Bi.
    - Detects parallel shifts and rotations of linear boundaries.
  • Using vertices improves the efficiency of boundary testing.
    - V x V domain adequacy: test each vertex and another point just off it.
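A hypothetical sketch of N x 1 test selection for one linear boundary a·x = b in 2-D: n on-points on the boundary and one off-point displaced along the normal. The function name, the displacement size, and the assumed domain orientation are all illustrative assumptions.

```python
def n_by_1_tests(a, b, n, eps=1e-3, boundary_in_domain=True):
    """Linear boundary {x : a.x = b} in 2-D: n points on it, one just off.
    If the boundary belongs to the domain, the off-point goes outside
    (assuming the domain is the half-plane a.x <= b)."""
    ax, ay = a
    norm = (ax * ax + ay * ay) ** 0.5
    p0 = (ax * b / norm**2, ay * b / norm**2)        # a point on the boundary
    on = [(p0[0] - ay * t, p0[1] + ax * t) for t in range(n)]  # along the line
    sign = 1 if boundary_in_domain else -1
    off = (p0[0] + sign * eps * ax / norm, p0[1] + sign * eps * ay / norm)
    return on, off

on, off = n_by_1_tests(a=(1.0, 1.0), b=2.0, n=3)
print(on)    # three points with x + y == 2
print(off)   # one point just off the boundary (x + y slightly > 2)
```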
Domain Analysis: Applicability (4.2)
- For certain classes of programs (linear functions, multinomials, etc.) one can choose a small subset of inputs that guarantees correctness.
- Major drawback: too complicated to apply to complex input spaces.
  • Example: process control software.
  • Difficult to come up with metrics for non-numeric inputs.
  • Difficult to partition the input domain for reactive programs (recall protocols).
- Computation and boundary analysis methods should be used in a complementary way.
Overall SW Test Applicability (4.2)
- Most metrics are effective only at high coverage levels.
- Complexity of data-flow testing:
  • Experiments correlate adequate test-set size with the number of decisions in the program.
  • Observation: a large proportion of paths are infeasible; weeding them out requires computation.
  • Just looking at the number of test cases is misleading: one must consider how much computation goes into producing them.
- Mutation testing is quadratic in the number of variables/variable references and the number of software units.
- Structural coverage metrics: linear-sized adequate test sets.
- Most SW metrics can be adapted to apply to HW.
- Mutation coverage may be costly and unreliable:
  • it is difficult to check the equivalence of a mutant and the original.
SW Test Adequacy Criteria: Comparison (5)
- Little experimental data to compare effectiveness; nothing conclusive.
- Duran & Ntafos: use simulation to compare
  • random testing
  • partition test cases (choose a given ni cases from partition pi).
  • Empirical result: 100 random cases = 50 partition cases; the extra effort of partition testing is not justified.
- Data from SW supports the data from protocols: for comparable test size,
  • a sophisticated metric is not more effective, and requires extra computation
  • sophisticated metrics are better at finding difficult bugs.
- Formal analysis of relationships between criteria:
  • The subsumes relation compares the severity of testing methods, not their effectiveness for a given % coverage.
  • Only statistical measures exist for "better bug detection ability".
["Subsumes" relation between criteria (5) — figure: a lattice over the criteria all-paths, ordered context, (un-ordered) context, required k-tuples, required pairs, all d-u paths, strong mutation, firm mutation, weak mutation, all-uses, all-c-uses/some-p-uses, all-p-uses/some-c-uses, all-p-uses, all-c-uses, all-definitions, all level-i paths, branch coverage, statement coverage, and cyclomatic adequacy, with arrows indicating "subsumes".]
Other Relationships between Criteria
- For sub-domain-based criteria, view each criterion as a multi-set of subsets of the input space:
  C1: { D1, D2, D3, ..., Dm }
  C2: { E1, E2, E3, ..., En }
- C1 narrows C2: for each Ei there is a Dj ⊆ Ei.
- C1 covers C2: each Ei is a union Dj1 ∪ Dj2 ∪ ... ∪ Djk of sets from C1.
- C1 partitions C2: each Ei is such a union with the Dj's disjoint.
- C1 properly covers C2: C1 covers C2, and the covers of the Ei's make up a proper subset of D1 ∪ ... ∪ Dm.
- C1 properly partitions C2: C1 partitions C2, and the covers of the Ei's make up a proper subset of D1 ∪ ... ∪ Dm.
- For random and statistical experiments, compare statistical measures of the fault-detecting ability of the various criteria.
  • Positive correlation; sometimes a clean theoretical proof of implication.
Implications between Relations (5)
[Figure: implication diagram among the relations — universally properly partition, universally partition, universally properly cover, universally cover, universally narrow, subsume.]
- "Universally": holds for all programs and specs.
Universally Properly Cover Relation (5)
[Figure: the "universally properly covers" ordering among — ordered-context coverage, (un-ordered) context coverage, required k-tuples, all uses, all p-uses, (limited) mutation coverage, multiple-condition coverage, decision condition coverage, decision coverage, atomic condition coverage.]
CAD for HW (1,2)
- SPEC: an English description; properties in some (temporal) logic; invariants.
- IMPL: an RTL netlist.
- Verification goal: IMPL "satisfies" SPEC.
- Problem: the SPEC is almost never "complete".
  • Berkeley designers' opinion: high coverage according to some metric is more convincing.
CAD: Error Models (3)
- Wrong connection in a gate-level netlist
- Perturbation to the state-transition graph (a la protocols)
- Timing errors:
  • a control pulse arrives too late or too early
  • a state is entered or exited too soon
- Computational errors:
  • error in control predicates
  • missed cases
  • control goes down the wrong path
  • assignment errors
CAD: Coverage Metrics (4, 4.1)
- CONTROL EVENT COVERAGE [Ho & Horowitz '96]: FSM coverage for the control variables controlling the datapath.
- Ge: control event graph. Project the control FSM onto the variables in the control-datapath interface.
  • No need to consider other control variables.
- Assumptions:
  • The design is already partitioned into datapath and control.
  • The datapath does not hold any control state.
  • Only the sequencing of datapath commands matters, not their timing.
- Automatically extracted from Verilog (with user annotations):
  • Comments in the Verilog highlight important control state variables.
  • Transitive set-of-support: capture the logic that controls these variables.
  • Derive a list of "independent" variables; the coverage tool will use this.
- Transition coverage = (# of control events taken) / (total reachable control events in Ge)
CAD: Coverage Metrics (4, 4.1)
- CONTROL EVENT COVERAGE [Ho & Horowitz '96] (cont'd):
  • Take the global state graph and project it onto the independent variables.
  • Take state & transition dumps from the simulators; check which states/transitions have been covered (bookkeeping sketched below).
- PROBLEM: state explosion. Heuristics:
  • Graph pruning using don't-cares: if a variable is written every cycle, "zero" it when it is not read (efficient to determine statically).
  • Approximating the state space: project out variables that are close to primary inputs (more likely to be close to non-deterministic).
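A toy sketch of the bookkeeping this implies: project simulator state dumps onto a chosen set of interface variables and score transition coverage against the reachable projected transitions. The dump format, variable names, and the reachable set are all assumptions for illustration.

```python
def transition_coverage(dump, reachable, keep):
    """dump: sequence of dict states from a simulator; reachable: set of
    projected (src, dst) control events; keep: interface variable names."""
    proj = [tuple(sorted((v, s[v]) for v in keep)) for s in dump]
    taken = {(a, b) for a, b in zip(proj, proj[1:])} & reachable
    return len(taken) / len(reachable)

keep = ["req", "gnt"]                       # control-datapath interface vars
dump = [{"req": 0, "gnt": 0, "cnt": 7},     # 'cnt' is datapath state: ignored
        {"req": 1, "gnt": 0, "cnt": 8},
        {"req": 1, "gnt": 1, "cnt": 9}]
s = lambda req, gnt: (("gnt", gnt), ("req", req))
reachable = {(s(0, 0), s(1, 0)), (s(1, 0), s(1, 1)),
             (s(1, 1), s(0, 0)), (s(1, 1), s(1, 1))}
print(transition_coverage(dump, reachable, keep))   # 0.5: 2 of 4 events taken
```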
Control Event Coverage: Applicability (4.2)
- CONTROL EVENT COVERAGE [Ho & Horowitz '96], empirical data:
  • In general, the same test set gives less control event coverage than full state/event coverage.
  • This highlights important tests that are missed.
- Full coverage analysis gives a huge number of untested scenarios; it is hard to use this data.
  • For conventional coverage, it may be useful to project these onto fewer variables.
- It is difficult to exercise an uncovered scenario in full coverage.
  • Using fewer variables and over-approximation, it is easier to incrementally construct a test scenario.
CAD Cov. Metrics: Tag Coverage (4)
- "Observability-based" coverage metric [Fallah, Devadas, Keutzer '96]
- Tags:
  • a mechanism to extend standard coverage metrics with observability requirements ['96]
  • capture assignment/computation errors ['98].
- DISCLAIMER: bugs do not always manifest themselves as an incorrect value of some HDL variable:
  • errors of omission
  • wrong global assumptions
  • program goes down the wrong control path.
- IDEA: tag a variable with +Δ or -Δ: a deviation from the intended value.
  • Optimistic assumption: the deviation is big enough to propagate in each case. Example: x > y + Δ.
- Run a set of simulation vectors, tagging one variable assignment at a time, using a tag calculus (a toy version is sketched below).
- Determine which tags propagate to the output; calculate the % propagated.
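A highly simplified sketch of a tag calculus over tagged values: signed tags (+1/-1/0 for +Δ/-Δ/none) injected at one assignment and propagated through operators under the optimistic "big enough deviation" assumption. The propagation rules here are assumptions for illustration, not the published calculus.

```python
# Values carry a tag in {+1, -1, 0}: a positive/negative deviation or none.

def add(a, b):
    """(value, tag) + (value, tag): opposite tags may cancel -> tag unknown (0)."""
    tag = a[1] if b[1] == 0 else b[1] if a[1] == 0 else (a[1] if a[1] == b[1] else 0)
    return (a[0] + b[0], tag)

def gt(a, b):
    """Tagged comparison: the tag propagates if it can flip the outcome."""
    flipped = (a[0] + a[1] > b[0] + b[1]) != (a[0] > b[0])
    return (a[0] > b[0], 1 if flipped else 0)

x = (5, +1)           # inject +Delta on the assignment to x
y = (4, 0)
z = add(x, y)         # tag survives addition: z = (9, +1)
print(gt(z, (9, 0)))  # (False, 1): the tag reaches an observable predicate
```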
CAD Tag Coverage (4.1, 4.2)
- "Observability-based" coverage metric [Fallah, Devadas, Keutzer '96]
- There is full observability of internal nodes during simulation, BUT this information may be incomprehensible even to the designer.
- Observability-based coverage gives more meaningful % numbers:
  • a random test may exercise a line but not propagate its effect to an output.
- According to the observability-based metric, user-given test cases yield much better coverage than random ones.
  • Not necessarily the case if observability isn't considered.
- The overhead for computing controllability is not too high: 1.5-4x simulation time.
- Captures most errors that can be caught by structural metrics, BUT produces more errors than need to be analyzed:
  • find an error model with fewer candidate errors.
CAD: Coverage Metrics (4)
- Reachability analysis with a coverage goal: bias simulation/search to achieve more coverage.
- Coverage-directed state space search [Aziz, et al.]:
  • Guard (decision) coverage is the metric.
  • Give low priority to states yielding few new guards.
- Guided search of the state space:
  • Guidepost coverage: set intermediate goals for reaching a designated state set; bias the search to maximize achievement of the intermediate goals.
  • Useful; captures intuition about how to reach a state inside a big state space.
  • Requires a lot of designer effort.
- Saturated simulation [Aziz, Kukula, Shiple]: pick a subset of transitions or next states to exercise all controller-pair states or transitions.
CAD: Coverage Metrics (4, 4.1)
- Netlist coverage: 0-In. Circuit structure.
  • Registers: loaded, loaded with unique values, read, initialized
    - like the definitions, definitions-uses, etc. criteria.
  • Counters: overflow, underflow
    - exercise domain boundaries.
  • Register-to-register paths: are all (feasible) ones exercised?
  • Line, branch, single-controller FSM, pair-arc coverage.
- Architectural coverage: coverage of the high-level, behavioral machine ("transaction level").
- Berkeley Wireless Center designers complain: "No metric relating to timing errors."
  • Pulse timing; a state is entered or exited too soon.
Commercial Coverage Tools
- SureFire (SureCov), Design Acceleration Inc. (Coverscan), Summit Design, TransEDA (HDLCover, VeriCov), interHDL (CoverIt), Veritools, ...
  • Blocks, arcs (branches), expressions, FSM states and transitions, sequences, pair-FSM coverage.
- Covermeter:
  • Statement, block, branch, condition coverage
  • Register and net toggle coverage
  • FSM coverage
  • Data transfer coverage (register transfers and buses) ??
  • Invariant coverage / assertion checking ??
- SureFire (SureSolve): functional verification suite and automatic testbench generation. "Exercises 90 to 100% of all reachable HDL constructs" !!
Conclusions I
- It is difficult to draw sharp conclusions from existing formal/mathematical relationships among metrics:
  • only statistical comparisons exist, for particular testing scenarios
  • formal relations between metrics (such as "subsumes") only indirectly correlate with "bug-detection ability".
- Metrics need to be compared intuitively and experimentally using
  • actual test sets
  • actual designs and errors.
- Existing experimental results comparing metrics are interesting but limited; they should not be taken as conclusive for HW.
- Factors to be considered when comparing metrics:
  • effectiveness at 100% coverage
  • test set size vs. X% coverage
  • cost of constructing this test set
  • coverage per unit resource
  • the type of bugs the metric is well-tailored to catch.
- A "design errors database" would be useful, perhaps indispensable, for studying the above.
Conclusions II
- System models and error models from all three domains can be used for HW.
- PROTOCOLS
  • Error models for the control part of protocols are too low-level: they may be useful for mutation coverage, but do not capture actual errors.
  • Control coverage methods are unlikely to be practical for HW: they require detailed coverage of the state transition graph and are based on black-box testing.
  • Datapath and domain coverage metrics are very similar to SW and can be useful.
- SOFTWARE
  • Most comprehensively studied, but the experimental data are not conclusive.
  • Data-flow and dependence-based metrics are likely to be useful for HW: they have RTL netlist analogs (example: definition-use paths).
    - The tag propagation approach can be applied.
  • Best to complement IMPL-based metrics with SPEC-based metrics.
Conclusions III
- SOFTWARE (cont'd)
  • Domain (partition) testing metrics are likely to be useful for HW.
    - A sub-domain = an assignment of values to the wires that go from datapath into control.
    - Domain coverage metrics (each choice used, base choice used, all combinations) can be applied to these assignments.
  • Mutation analysis/adequacy is likely to uncover interesting bugs.
    - Live mutants show what part of the design is not covered.
    - A good intuitive measure for "having simulated enough".
    - But computationally expensive: quadratic in the size of the description.
  • SPEC-mutation testing is more computationally viable and uncovers incompleteness in specs.
  • A good set of mutation operators must be derived for HW IMPLs and SPECs.
Conclusions IV
- CAD
  • Control event coverage: a meaningful subset of control variables to consider; there may be other useful subsets.
  • Tag coverage is a useful tool for computing other sorts of coverage:
    - all definitions-uses
    - different variable interactions.
  • For tag coverage, one can choose which circuit nodes are comprehensible to the designer:
    - declare these "observable"
    - compute which errors propagate to the observable nodes.
Conclusions V
- No work seems to exist in the HW domain for
  • datapath coverage metrics
  • metrics to cover timing errors
  • mutation testing.
- No experimental work on
  • effectiveness of metrics
  • comparison of metrics.
- This will be essential if no meaningful formal relationships can be derived (cf. manufacturing test: stuck-open vs. stuck-at vs. bridging faults).