07/02/2008 Dr Andy Brooks 1
Hugbúnaðarverkefni 2 (Software Project 2) - Static Analysis
Fyrirlestrar 1 & 2 (Lectures 1 & 2): Static analysis tools at Microsoft
These warning messages from PREfast are interesting.
warning messages/viðvaranir; static analysis/kyrrleg greining
http://www.microsoft.com/whdc/DevTools/tools/PREfast_steps.mspx
from Wikipedia
“Static code analysis is the analysis of computer software that is performed without actually executing programs built from that software (analysis performed on executing programs is known as dynamic analysis).”
“In most cases the analysis is performed on some version of the source code and in the other cases some form of the object code. The term is usually applied to the analysis performed by an automated tool, with human analysis being called program understanding or program comprehension.”
http://en.wikipedia.org/wiki/Static_code_analysis
from Wikipedia
“The sophistication of the analysis performed by tools varies from those that only consider the behavior of individual statements and declarations, to those that include the complete source code of a program in their analysis.”
“Uses of the information obtained from the analysis vary from highlighting possible coding errors (e.g., the lint tool) to formal methods that mathematically prove properties about a given program (e.g., its behavior matches that of its specification).”
from Wikipedia
“Some people consider software metrics and reverse engineering to be forms of static analysis.”
“A growing commercial use of static analysis is in the verification of properties of software used in safety-critical computer systems and locating potentially vulnerable code.”
number of lines of code, of comments/fjöldi forritunarlína, fjöldi skjölunarlína; reverse engineering/bakhönnun
Case Study/Dæmisaga
Reference: Static Analysis Tools as Early Indicators of Pre-Release Defect Density, N. Nagappan and T. Ball, ICSE’05, ©ACM, pp. 580-586
Microsoft case study: PREfix and PREfast static analysis tools
Static analysis
• Static analysis can find:
– uninitialized variables
– null pointer dereferences
– buffer overflows
– etc.
uninitialised/óframstillt; null pointer/tómabendir; dereference/tilhliðrun; overflow of buffer/yfirflæði biðminnis
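The three defect classes listed above can be illustrated with a small C sketch. The function names are hypothetical and not taken from the slides. Each function shows the guarded form, and its comment describes the unguarded variant that a static analysis tool would be expected to flag.

```c
#include <stddef.h>
#include <string.h>

/* Uninitialized variable: without the "= 0" initializer, n would be
   read uninitialized whenever flag == 0. */
int pick_value(int flag)
{
    int n = 0;
    if (flag != 0)
        n = 1;
    return n;
}

/* Null pointer dereference: without the NULL check, *p would be
   dereferenced even when the caller passes NULL. */
int read_or_default(const int *p, int fallback)
{
    if (p == NULL)
        return fallback;
    return *p;
}

/* Buffer overflow: an unbounded strcpy(dst, src) would write past the
   end of dst whenever src is longer than the destination buffer. */
void copy_bounded(char *dst, size_t dst_size, const char *src)
{
    if (dst_size == 0)
        return;
    strncpy(dst, src, dst_size - 1);
    dst[dst_size - 1] = '\0';
}
```

All three unguarded variants compile without error, which is why a separate analysis step is useful.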
Static analysis
• Static analysis does not replace testing.
(Kyrrleg greining kemur ekki í stað prófunar.)
– Static analysis reveals “shallow” errors while testing finds “deep” functional and design errors.
• Static analysis finds different classes of errors.
design errors/hönnunarvillur
Problem concerning false warnings/Vandamál varðandi falskar viðvaranir
• Static analysis tools produce false warnings (“false positives”).
• If the rate of false positives is too high (>50%), the output from a static analysis tool becomes less useful.
– The developer spends too much time deciding if warnings are real or not.
real?/rétt?; rate/hlutfall
Pre-release defect density
• If pre-release defect density can be estimated accurately and quickly, this can inform decisions on:
– testing/prófun
– code inspections/forrits skoðanir
– design revision/hönnunar endurskoðun
– delaying the release/að fresta útgáfu
• Static analysis tools can be run on the developer's computer or during nightly builds.
– quickly/fljótt
pre-release/forútgáfa; density of defects/þéttleiki villna; build/smíð
Definitions/Skilgreiningar
• static analysis defect density
– “number of defects found by static analysis tools per KLOC (thousand lines of code)”
• pre-release defect density
– “number of defects per KLOC found by other methods”
number of defects/fjöldi villna
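Both definitions use the same arithmetic: a defect count divided by component size in thousands of lines. A minimal C sketch follows; the function name and the error convention are assumptions made for illustration.

```c
/* Defects per thousand lines of code (KLOC).
   Returns -1.0 when lines_of_code is not positive. */
double defect_density(long defects, long lines_of_code)
{
    if (lines_of_code <= 0)
        return -1.0;
    return (double)defects / ((double)lines_of_code / 1000.0);
}
```

For a hypothetical component with 45 defects in 30 000 lines of code, defect_density(45, 30000) gives 1.5 defects per KLOC.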
Hypotheses/Tilgátur
• H1 “static analysis defect density can be used as an early indicator of pre-release defect density”
• H2 “static analysis defect density can be used to predict pre-release defect density at statistically significant levels”
• H3 “static analysis defect density can be used to discriminate between components of high and low quality (fault and not fault-prone components)”
to predict/að spá; statistically significant levels/tölfræðilega marktæk stig; to discriminate/að aðgreina; proneness/tilhneiging; indicator/vísibreyta
PREfix and PREfast at Microsoft
• More than 12,5% of the defects fixed in Windows Server 2003 before release were found with the PREfix and PREfast tools.
• PREfix and PREfast represent “state-of-the-art in industrial static analysis tools”.
PREfix
• Symbolic execution of C/C++ code.
• Common low-level programming errors are found:
– uninitialized variables, etc.
• “PREfix uses various heuristics to rule out infeasible execution paths.”
• PREfix is used on nightly builds.
• PREfix has been in use for 6 years and PREfix errors are “automatically entered into a defect database to be fixed by programmers”.
symbolic execution/táknræn inning; algebra/algebra; heuristic method/brjóstvitsaðferð; database/gagnasafn
PREfast
• Developers run PREfast.
• Running PREfast takes a “negligible percentage of compile time”.
• PREfast uses:
– pattern matching to find simple programming errors
– local data flow analyses to find use of uninitialized variables etc.
pattern matching/mynsturmátun; data flow analysis/gagnaflæðis greining
int i;
int j = 10;
int k;
k = j / i;  /* i is used before it has been initialized */
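The idea behind a local data flow analysis for uninitialized variables can be sketched as a single forward pass over straight-line code. This is a toy model for illustration only. The struct stmt representation and the function name are hypothetical, and nothing here describes how PREfast is actually implemented.

```c
#include <stddef.h>

#define NO_VAR (-1)

/* One straight-line statement of the form "def = use1 op use2".
   Variables are numbered 0..31; NO_VAR means "no variable". */
struct stmt {
    int def;
    int use1;
    int use2;
};

/* Forward pass: "initialized" is a bit set of variables known to hold
   a value. Returns the index of the first statement that reads a
   variable before any assignment to it, or -1 if there is none. */
int first_uninitialized_use(const struct stmt *code, size_t n)
{
    unsigned long initialized = 0;
    for (size_t s = 0; s < n; s++) {
        int uses[2] = { code[s].use1, code[s].use2 };
        for (int u = 0; u < 2; u++) {
            if (uses[u] != NO_VAR && !(initialized & (1UL << uses[u])))
                return (int)s;
        }
        if (code[s].def != NO_VAR)
            initialized |= 1UL << code[s].def;
    }
    return -1;
}
```

With i, j and k numbered 0, 1 and 2, the snippet above becomes two statements: { 1, NO_VAR, NO_VAR } for "j = 10" and { 2, 1, 0 } for "k = j / i". The pass reports the second statement (index 1), because i is read without ever being assigned.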
Figure 1 Software Development Process
©ACM
Component-based analysis/Íhlutabundin greining
• Windows Server 2003
– 22M LOC, 199 components
• PREfix and PREfast defects were extracted on a component basis from the defect database.
• Other pre-release defects in the defect database came from: “testing teams, integration teams, build results, external teams, third party testers, etc.”
integration test/samþættingarprófun
Table 1 Spearman rank correlations
“Spearman...can be applied even when the association between elements is non-linear”
©ACM
rank-order correlation coefficient/raðfylgnistuðull
H1
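Because Spearman's coefficient is computed on ranks rather than raw values, it stays at 1 for any strictly increasing relationship, linear or not. The C sketch below uses the textbook formula for data without ties. The function names are hypothetical, and real defect data with tied values would need average ranks instead.

```c
#include <stddef.h>

/* 1-based rank of v[i] among v[0..n-1], assuming all values are distinct. */
static double rank_of(const double *v, size_t n, size_t i)
{
    size_t smaller = 0;
    for (size_t j = 0; j < n; j++)
        if (v[j] < v[i])
            smaller++;
    return (double)(smaller + 1);
}

/* Spearman rank correlation for data without ties:
   rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)),
   where d is the difference between the two ranks of one element. */
double spearman_rho(const double *x, const double *y, size_t n)
{
    double sum_d2 = 0.0;
    if (n < 2)
        return 0.0;
    for (size_t i = 0; i < n; i++) {
        double d = rank_of(x, n, i) - rank_of(y, n, i);
        sum_d2 += d * d;
    }
    return 1.0 - 6.0 * sum_d2 / ((double)n * ((double)n * (double)n - 1.0));
}
```

For x = {1, 2, 3, 4, 5} and the non-linear y = {1, 4, 9, 16, 25} the result is 1.0. Swapping neighbours so that y = {2, 1, 4, 3, 5} gives 0.8.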
Pearson correlations: example (dæmi)
[Figure: example scatter plots, labelled Pearson r = 0,816 and Pearson r = 0,65]
H2
• H2 “static analysis defect density can be used to predict pre-release defect density at statistically significant levels.”
statistically significant levels/tölfræðilega marktæk stig
X-axis (X-ás), independent variable (óháð breyta): static analysis defect density
Y-axis (Y-ás), dependent variable (háð breyta): pre-release defect density
Various regression models tried
LINEAR, LOGARITHMIC, EXPONENTIAL, QUADRATIC, CUBIC, ...
[Figure: four example plots, panels labelled LINEAR, LOGARITHMIC, EXPONENTIAL and QUADRATIC]
Always draw a scatter plot/Alltaf teikna punktarit
[Figure: two scatter plots, both labelled Pearson r = 0,816]
Table 2 Regression Fits
• R2 “measures the variance in the dependant variable that is accounted for by the model built using the predictors.”
• “We do not present the regression equations in order to protect proprietary data.”
©ACM
regression equation/jafna bestu línu; proprietary data/séreignargögn; variability/breytileiki
R2: coefficient of determination
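Since the paper withholds its regression equations, the C sketch below only illustrates what R2 measures, using the simplest case: one predictor and a least-squares line. The struct, function names and sample numbers are made up for illustration.

```c
#include <stddef.h>

/* Least-squares line y = intercept + slope * x. */
struct line {
    double slope;
    double intercept;
};

struct line fit_line(const double *x, const double *y, size_t n)
{
    double mean_x = 0.0, mean_y = 0.0, sxy = 0.0, sxx = 0.0;
    struct line l;
    for (size_t i = 0; i < n; i++) {
        mean_x += x[i];
        mean_y += y[i];
    }
    mean_x /= (double)n;
    mean_y /= (double)n;
    for (size_t i = 0; i < n; i++) {
        sxy += (x[i] - mean_x) * (y[i] - mean_y);
        sxx += (x[i] - mean_x) * (x[i] - mean_x);
    }
    l.slope = sxy / sxx;
    l.intercept = mean_y - l.slope * mean_x;
    return l;
}

/* R^2 = 1 - SS_residual / SS_total: the share of the variance in y
   that the fitted line accounts for. */
double r_squared(const double *x, const double *y, size_t n, struct line l)
{
    double mean_y = 0.0, ss_res = 0.0, ss_tot = 0.0;
    for (size_t i = 0; i < n; i++)
        mean_y += y[i];
    mean_y /= (double)n;
    for (size_t i = 0; i < n; i++) {
        double predicted = l.intercept + l.slope * x[i];
        ss_res += (y[i] - predicted) * (y[i] - predicted);
        ss_tot += (y[i] - mean_y) * (y[i] - mean_y);
    }
    return 1.0 - ss_res / ss_tot;
}
```

For x = {1, 2, 3} and y = {1, 3, 2} the fitted line has slope 0.5 and intercept 1, and R2 is 0.25. For a one-predictor linear fit this equals the square of the Pearson coefficient (0.5 here).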
Data Splitting
• A random sample of 2/3rds of the components (132) was used to build a multiple regression model and the remaining 1/3rd (67) were used to check the predictive ability of the built model.
– multiple means using both PREfast and PREfix (unclear if multiple linear or non-linear)
• The R2 for the built equation was 0,806.
– 80% of the variance in pre-release defect density explained
random sample/slembiúrtak
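A random 2/3 versus 1/3 split can be sketched in C by shuffling component indices. The paper does not describe its sampling procedure, so the Fisher-Yates shuffle, the function name and the seed parameter are all assumptions.

```c
#include <stddef.h>
#include <stdlib.h>

/* Fill idx with 0..n-1 and shuffle it in place (Fisher-Yates).
   Afterwards idx[0..n_build-1] can serve as the model-building sample
   and the remaining entries as the hold-out sample.
   rand() is used for brevity; its modulo bias is negligible for small n. */
void shuffled_indices(size_t *idx, size_t n, unsigned seed)
{
    srand(seed);
    for (size_t i = 0; i < n; i++)
        idx[i] = i;
    for (size_t i = n; i > 1; i--) {
        size_t j = (size_t)rand() % i;   /* 0 <= j < i */
        size_t tmp = idx[i - 1];
        idx[i - 1] = idx[j];
        idx[j] = tmp;
    }
}
```

With n = 199, the first 132 shuffled indices would pick the components used to build the model, and the last 67 the components used to check it. Calling the function with different seeds gives different random samples.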
Figure 2 Actual vs. Estimated pre-release defect density.
©ACM
67 components
Actual values are not shown for proprietary reasons.
outlier/einfari
Correlation analysis
• Spearman rank correlation between the actual and estimated defect densities was 0,564 (p<0,0005).
– Pearson correlation coefficient was 0,669.
• R2 is about 45% (0,669 × 0,669 ≈ 0,448)
Taking one random sample is not enough
• Building one regression model on the basis of one random sample of 132 components is not enough.
– Positive results may have been by chance.
• The results of three other random samples of 132 components are shown in Figure 3.
by chance/af tilviljun
Figure 3 Actual vs. Estimated pre-release defect density.
©ACM
3 components are not tracked well by the regression modelling. (Unclear if 3 separate components.)
Table 3 Fit and Correlation results of random model splitting
©ACM
Model Predictive ability
0,536 × 0,536 ≈ 0,29 (29%)
H3
• “static analysis defect density can be used to discriminate between components of high and low quality (fault and not fault-prone components)”
to discriminate/að aðgreina
Discriminant analysis
• “The overall classification obtained by discriminant analysis is 82,91% (165 of the 199 components are correctly identified as fault or not fault-prone).”
– Andy says: We do not know the actual defect density distribution nor the cut-off used.
• “The type I and type II misclassifications are not separately recorded to protect proprietary information.”
– Andy says: We do not know if 34 fault-prone modules were misclassified as non fault-prone or if 34 non fault-prone were misclassified as fault-prone, or ....
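The point about the unreported misclassifications can be made concrete with a small C sketch: overall accuracy depends only on how many components were classified correctly, not on which kind of error the other 34 were. The struct and function names are hypothetical. The 100/65 split of the 165 correct classifications is made up, since the paper reports only the total.

```c
/* Counts from a two-class (fault-prone / not fault-prone) classification. */
struct confusion {
    int true_pos;   /* fault-prone, classified as fault-prone */
    int true_neg;   /* not fault-prone, classified as not fault-prone */
    int false_pos;  /* not fault-prone, classified as fault-prone */
    int false_neg;  /* fault-prone, classified as not fault-prone */
};

/* Percentage of components classified correctly. */
double accuracy_percent(struct confusion c)
{
    int total = c.true_pos + c.true_neg + c.false_pos + c.false_neg;
    if (total == 0)
        return 0.0;
    return 100.0 * (double)(c.true_pos + c.true_neg) / (double)total;
}
```

Both { 100, 65, 34, 0 } and { 100, 65, 0, 34 } give 165 / 199 ≈ 82.91%, although the first misses no fault-prone component and the second misses 34 of them.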
Study limitations
• Some of the PREfix and PREfast defects entered into the defect database might have been false positives.
– Andy asks: What are the false positive rates for these two tools?
• The results depend on the quality of the PREfix and PREfast static analysis tools and may not be repeatable with other static analysis tools.
Lessons learned
• “Static analysis defect density can be used as early indicators of pre-release defect density”
• “Static analysis defect density can be used to predict pre-release defect density at statistically significant levels”
• “Static analysis defect density can be used to discriminate between components of high and low quality”
Abstract
This paper presents the results of a study in which we empirically investigated the suite of object-oriented (OO) design metrics introduced in (Chidamber and Kemerer, 1994)...
To perform our validation accurately, we collected data on the development of eight medium-sized information management systems based on identical requirements...
Based on empirical and quantitative analysis, the advantages and drawbacks of these OO metrics are discussed. Several of Chidamber and Kemerer's OO metrics appear to be useful to predict class fault-proneness during the early phases of the life-cycle. Also, on our data set, they are better predictors than “traditional” code metrics, which can only be collected at a later phase of the software development processes.
A validation of object-oriented design metrics as quality indicators
Basili, V.R., Briand, L.C., Melo, W.L.
IEEE Transactions on Software Engineering, Oct 1996, Volume: 22, Issue: 10, page(s): 751-761
Sample defect #1: compiler warning
int f(int i)
{
    int n;
    if (i == 0)
        n = 1;
    return n;
}
uwmsrsi1.c(6) : warning C4701: local variable 'n' may be used without having been initialized
Slide taken from a presentation by Jon Pincus, PPRC, Reliability Tools, Microsoft Research in August, 2001.
Sample defect #1: PREfix warning
int f(int i)
{
    int n;
    if (i == 0)
        n = 1;
    return n;
}
uwmsrsi1.c(6): warning 1: using uninitialized memory 'n'
    uwmsrsi1.c(3) : stack variable declared here
Problem occurs when the following condition is true:
    uwmsrsi1.c(4) : when 'i != 0' here
Path includes 2 statements on the following lines:
    4 6
Slide taken from a presentation by Jon Pincus, PPRC, Reliability Tools, Microsoft Research in August, 2001.
From PREfast Step-by-Step, Updated for PREfast Version 2.1 © Microsoft