Source: staff.unak.is/andy/StaticAnalysis0708/Lectures/SALec1and2.pdf

07/02/2008 Dr Andy Brooks 1

Hugbúnaðarverkefni 2 (Software Project 2) - Static Analysis

Lectures 1 & 2: Static analysis tools at Microsoft

These warning messages from PREfast are interesting.

warning messages/viðvaranir
static analysis/kyrrleg greining

http://www.microsoft.com/whdc/DevTools/tools/PREfast_steps.mspx


from Wikipedia

“Static code analysis is the analysis of computer software that is performed without actually executing programs built from that software (analysis performed on executing programs is known as dynamic analysis).”

“In most cases the analysis is performed on some version of the source code and in the other cases some form of the object code. The term is usually applied to the analysis performed by an automated tool, with human analysis being called program understanding or program comprehension.”

http://en.wikipedia.org/wiki/Static_code_analysis


from Wikipedia

“The sophistication of the analysis performed by tools varies from those that only consider the behavior of individual statements and declarations, to those that include the complete source code of a program in their analysis.”

“Uses of the information obtained from the analysis vary from highlighting possible coding errors (e.g., the lint tool) to formal methods that mathematically prove properties about a given program (e.g., its behavior matches that of its specification).”


from Wikipedia

“Some people consider software metrics and reverse engineering to be forms of static analysis.”

“A growing commercial use of static analysis is in the verification of properties of software used in safety-critical computer systems and locating potentially vulnerable code.”

number of lines of code, of comments/fjöldi forritunarlína, fjöldi skjölunarlína
reverse engineering/bakhönnun


Case Study/Dæmisaga

Reference: Static Analysis Tools as Early Indicators of Pre-Release Defect Density, N. Nagappan and T. Ball, ICSE’05, ©ACM, pp. 580-586

Microsoft case study: PREfix and PREfast static analysis tools


Static analysis

• Static analysis can find:
  – uninitialized variables
  – null pointer dereferences
  – buffer overflows
  – etc.

uninitialized/óframstillt
null pointer/tómabendir
dereference/tilhliðrun
overflow of buffer/yfirflæði biðminnis
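The three defect classes listed above can be illustrated with a short C sketch. It is not taken from the lecture or from PREfix/PREfast; the function names are hypothetical. Each function is written in its corrected form, and the comment describes the defective variant that a static analysis tool would typically flag.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Uninitialized variable: without the "= 0" initializer, n would be
   read uninitialized whenever flag is 0. */
int pick(int flag)
{
    int n = 0;
    if (flag != 0)
        n = 1;
    return n;
}

/* Null pointer dereference: without the NULL check, strlen(s) would
   dereference a null pointer when the caller passes NULL. */
size_t safe_length(const char *s)
{
    if (s == NULL)
        return 0;
    return strlen(s);
}

/* Buffer overflow: a plain strcpy(dst, src) writes past the end of dst
   when src is longer than the buffer. Copying at most dst_size - 1
   bytes and terminating explicitly keeps the write in bounds. */
void bounded_copy(char *dst, size_t dst_size, const char *src)
{
    assert(dst != NULL && src != NULL && dst_size > 0);
    strncpy(dst, src, dst_size - 1);
    dst[dst_size - 1] = '\0';
}
```

All three checks are local to one function, which is why such errors are sometimes called “shallow”: they can be found without knowing what the program is supposed to do.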


Static analysis

• Static analysis does not replace testing. (Kyrrleg greining kemur ekki í stað prófunar.)
  – Static analysis reveals “shallow” errors while testing finds “deep” functional and design errors.
• Static analysis finds different classes of errors.

design errors/hönnunarvillur


Problem concerning false warnings (Vandamál varðandi falskar viðvaranir)

• Static analysis tools produce false warnings (“false positives”).
• If the rate of false positives is too high (>50%), the output from a static analysis tool becomes less useful.
  – The developer spends too much time deciding if warnings are real or not.

real?/rétt?

rate/hlutfall


Pre-release defect density

• If pre-release defect density can be estimated accurately and quickly, this can inform decisions on:
  – testing/prófun
  – code inspections/forrits skoðanir
  – design revision/hönnunar endurskoðun
  – delaying the release/að fresta útgáfu
• Static analysis tools can be run on the developer's computer or during nightly builds.
  – quickly/fljótt

pre-release/forútgáfa
density of defects/þéttleiki villna
build/smíð


Definitions/Skilgreiningar

• static analysis defect density
  – “number of defects found by static analysis tools per KLOC (thousand lines of code)”
• pre-release defect density
  – “number of defects per KLOC found by other methods”

number of defects/fjöldi villna
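Both definitions are the same calculation applied to different defect counts. A minimal sketch in C, with a hypothetical function name and made-up numbers in the usage note:

```c
#include <assert.h>

/* Defects per thousand lines of code (KLOC). */
double defect_density(long defects, long lines_of_code)
{
    assert(defects >= 0 && lines_of_code > 0);
    return (double)defects / ((double)lines_of_code / 1000.0);
}
```

For an invented component with 50 recorded defects in 25 000 lines of code, defect_density(50, 25000) gives 2 defects per KLOC.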


Hypotheses/Tilgátur

• H1 “static analysis defect density can be used as an early indicator of pre-release defect density”
• H2 “static analysis defect density can be used to predict pre-release defect density at statistically significant levels”
• H3 “static analysis defect density can be used to discriminate between components of high and low quality (fault and not fault-prone components)”

to predict/að spá
statistically significant levels/tölfræðilega marktæk stig
to discriminate/að aðgreina
proness/tilhneiging
indicator/vísibreyta


PREfix and PREfast at Microsoft

• More than 12,5% of the defects fixed in Windows Server 2003 before release were found with the PREfix and PREfast tools.
• PREfix and PREfast represent “state-of-the-art in industrial static analysis tools”.


PREfix

• Symbolic execution of C/C++ code.
• Common low-level programming errors are found:
  – uninitialized variables, etc.
• “PREfix uses various heuristics to rule out infeasible execution paths.”
• PREfix is used on nightly builds.
• PREfix has been in use for 6 years and PREfix errors are “automatically entered into a defect database to be fixed by programmers”.

symbolic execution/táknræn inning
algebra/algebra
heuristic method/brjóstvitsaðferð
database/gagnasafn


PREfast

• Developers run PREfast.
• Running PREfast takes a “negligible percentage of compile time”.
• PREfast uses:
  – pattern matching to find simple programming errors
  – local data flow analyses to find use of uninitialized variables etc.

pattern matching/mynsturmátun
data flow analysis/gagnaflæðis greining

int i;          /* declared but never initialized */
int j = 10;
int k;
k = j / i;      /* i is used uninitialized here */


Figure 1 Software Development Process

©ACM


Component-based analysis (Íhlutabundin greining)

• Windows Server 2003
  – 22M LOC, 199 components
• PREfix and PREfast defects were extracted on a component basis from the defect database.
• Other pre-release defects in the defect database came from: “testing teams, integration teams, build results, external teams, third party testers, etc.”

integration test/samþættingarprófun


Table 1: Spearman rank correlations

“Spearman...can be applied even when the association between elements is non-linear”

©ACM

rank-order correlation coefficient/raðfylgnistuðull

H1


Pearson correlations: example (dæmi)

[Scatter plot of example data, annotated with Pearson r = 0,816 and Pearson r = 0,65]


H2

• H2 “static analysis defect density can be used to predict pre-release defect density at statistically significant levels.”

statistically significant levels/tölfræðilega marktæk stig

X-axis (independent variable): static analysis defect density

Y-axis (dependent variable): pre-release defect density


Various regression models tried: LINEAR, LOGARITHMIC, EXPONENTIAL, QUADRATIC, CUBIC, ...

[Four scatter plots of example data, one per fitted model: LINEAR, LOGARITHMIC, EXPONENTIAL, QUADRATIC]


Always draw a scatter plot (Alltaf teikna punktarit)

[Two scatter plots of example data, each annotated with Pearson r = 0,816]


Table 2 Regression Fits

• R² “measures the variance in the dependant variable that is accounted for by the model built using the predictors.”
• “We do not present the regression equations in order to protect proprietary data.”

©ACM

regression equation/jafna bestu línu
proprietary data/séreignargögn
variability/breytileiki

R²: coefficient of determination
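Since the paper's regression equations are not published, the following C sketch only illustrates how R² is obtained for the simplest case, a straight-line fit with one predictor. The function name is hypothetical, and the single-predictor linear form is an assumption made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Fit y = a + b*x by ordinary least squares and return R^2, the
   fraction of the variance in y accounted for by the fitted line.
   The intercept and slope are returned through a and b. */
double linear_r_squared(const double *x, const double *y, size_t n,
                        double *a, double *b)
{
    double mx = 0.0, my = 0.0, sxx = 0.0, sxy = 0.0;
    double ss_res = 0.0, ss_tot = 0.0;
    size_t i;

    assert(n > 2 && a != NULL && b != NULL);
    for (i = 0; i < n; i++) {
        mx += x[i];
        my += y[i];
    }
    mx /= (double)n;
    my /= (double)n;
    for (i = 0; i < n; i++) {
        sxx += (x[i] - mx) * (x[i] - mx);
        sxy += (x[i] - mx) * (y[i] - my);
    }
    assert(sxx > 0.0);                /* x must not be constant */
    *b = sxy / sxx;
    *a = my - *b * mx;
    for (i = 0; i < n; i++) {
        double fitted = *a + *b * x[i];
        ss_res += (y[i] - fitted) * (y[i] - fitted);  /* unexplained */
        ss_tot += (y[i] - my) * (y[i] - my);          /* total */
    }
    assert(ss_tot > 0.0);             /* y must not be constant */
    return 1.0 - ss_res / ss_tot;
}
```

For the invented data x = 1, 2, 3, 4 and y = 1, 3, 2, 4 the fitted line is y = 0,5 + 0,8x and R² is 0,64, which for a straight-line fit equals the square of the Pearson correlation (0,8).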


Data Splitting

• A random sample of 2/3rds of the components (132) was used to build a multiple regression model and the remaining 1/3rd (67) were used to check the predictive ability of the built model.
  – multiple means using both PREfast and PREfix
    • unclear if multiple linear or non-linear
• The R² for the built equation was 0,806.
  – 80% of the variance in pre-release defect density explained

random sample/slembiúrtak
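The paper does not say how the random sample was drawn. One common way to make such a split, sketched here in C under that caveat, is to shuffle the component indices and take the first two thirds for model building. The function name and the use of rand() are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Fill idx[0..n-1] with a random permutation of 0..n-1 (Fisher-Yates
   shuffle) and return the size of the model-building part, two thirds
   of n rounded down. idx[0..fit-1] is then the build set and
   idx[fit..n-1] the hold-out set used to check predictions. */
size_t split_two_thirds(size_t *idx, size_t n, unsigned seed)
{
    size_t i;

    assert(idx != NULL && n > 0);
    srand(seed);
    for (i = 0; i < n; i++)
        idx[i] = i;
    for (i = n - 1; i > 0; i--) {
        /* rand() % k has a slight bias; acceptable for a sketch */
        size_t j = (size_t)rand() % (i + 1);
        size_t tmp = idx[i];
        idx[i] = idx[j];
        idx[j] = tmp;
    }
    return (2 * n) / 3;
}
```

With n = 199 the function returns 132, leaving 67 components for checking, which matches the sizes on the slide. Calling it with different seeds gives different random samples.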


Figure 2: Actual vs. estimated pre-release defect density.

©ACM

67 components

Actual values are not shown for proprietary reasons.

outlier/einfari


Correlation analysis

• Spearman rank correlation between the actual and estimated defect densities was 0,564 (p<0,0005).
  – Pearson correlation coefficient was 0,669.
    • R² is about 45% (0,669 × 0,669 ≈ 0,448)


Taking one random sample is not enough

• Building one regression model on the basis of one random sample of 132 components is not enough.
  – Positive results may have been by chance.
• The results of three other random samples of 132 components are shown in Figure 3.

by chance/af tilviljun


Figure 3: Actual vs. estimated pre-release defect density.

©ACM

3 components are not tracked well by the regression modelling. (Unclear if 3 separate components.)


Table 3: Fit and correlation results of random model splitting

©ACM

Model / Predictive ability

0,536 × 0,536 ≈ 0,29 (29%)


H3

• “static analysis defect density can be used to discriminate between components of high and low quality (fault and not fault-prone components)”

to discriminate/að aðgreina


Discriminant analysis

• “The overall classification obtained by discriminant analysis is 82,91% (165 of the 199 components are correctly identified as fault or not fault-prone).”
  – Andy says: We do not know the actual defect density distribution nor the cut-off used.
• “The type I and type II misclassifications are not separately recorded to protect proprietary information.”
  – Andy says: We do not know if 34 fault-prone modules were misclassified as non fault-prone or if 34 non fault-prone were misclassified as fault-prone, or ....
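Andy's point can be made concrete with a small C sketch: the overall classification rate depends only on the total number of misclassifications, not on how they divide into the two kinds. The struct and function names are hypothetical, and treating type I as a false alarm and type II as a miss is an assumption here, since the slide does not define them.

```c
#include <assert.h>

/* Counts from a two-class (fault-prone / not fault-prone) classification. */
struct confusion {
    int true_positive;   /* fault-prone, classified as fault-prone */
    int true_negative;   /* not fault-prone, classified as not fault-prone */
    int false_positive;  /* assumed type I: false alarm */
    int false_negative;  /* assumed type II: fault-prone component missed */
};

/* Fraction of components that were classified correctly. */
double accuracy(struct confusion c)
{
    int total = c.true_positive + c.true_negative
              + c.false_positive + c.false_negative;

    assert(total > 0);
    return (double)(c.true_positive + c.true_negative) / (double)total;
}
```

With invented correct counts of 100 and 65 (165 in total), both {100, 65, 34, 0} and {100, 65, 0, 34} give 165/199, about 82,91%, although the first misses no fault-prone component and the second misses 34.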


Study limitations

• Some of the PREfix and PREfast defects entered into the defect database might have been false positives.
  – Andy asks: What are the false positive rates for these two tools?
• The results depend on the quality of the PREfix and PREfast static analysis tools and may not be repeatable with other static analysis tools.


Lessons learned

• “Static analysis defect density can be used as early indicators of pre-release defect density”
• “Static analysis defect density can be used to predict pre-release defect density at statistically significant levels”
• “Static analysis defect density can be used to discriminate between components of high and low quality”


Abstract

This paper presents the results of a study in which we empirically investigated the suite of object-oriented (OO) design metrics introduced in (Chidamber and Kemerer, 1994)...

To perform our validation accurately, we collected data on the development of eight medium-sized information management systems based on identical requirements...

Based on empirical and quantitative analysis, the advantages and drawbacks of these OO metrics are discussed. Several of Chidamber and Kemerer's OO metrics appear to be useful to predict class fault-proneness during the early phases of the life-cycle. Also, on our data set, they are better predictors than “traditional” code metrics, which can only be collected at a later phase of the software development processes.

A validation of object-oriented design metrics as quality indicators. Basili, V.R., Briand, L.C., Melo, W.L. IEEE Transactions on Software Engineering, Oct 1996, Volume 22, Issue 10, pages 751-761


Sample defect #1: compiler warning

int f(int i)
{
    int n;
    if (i == 0)
        n = 1;
    return n;
}

uwmsrsi1.c(6) : warning C4701: local variable 'n' may be used without having been initialized

Slide taken from a presentation by Jon Pincus, PPRC, Reliability Tools, Microsoft Research, in August 2001.


Sample defect #1: PREfix warning

int f(int i)
{
    int n;
    if (i == 0)
        n = 1;
    return n;
}

uwmsrsi1.c(6) : warning 1: using uninitialized memory 'n'
    uwmsrsi1.c(3) : stack variable declared here
    Problem occurs when the following condition is true:
    uwmsrsi1.c(4) : when 'i != 0' here
    Path includes 2 statements on the following lines:
    4 6

Slide taken from a presentation by Jon Pincus, PPRC, Reliability Tools, Microsoft Research, in August 2001.


From PREfast Step-by-Step, Updated for PREfast Version 2.1 © Microsoft