Generating Analyses for Detecting Faults in Path Segments
Wei Le* and Mary Lou Soffa, University of Virginia
*currently with Rochester Institute of Technology
Motivation
• Static analysis: an integral part of fault detection
– High code coverage
– No executables required
– Find faults early, so cheaper to fix
Challenges of Current Static Analysis
Precision: many false positives and little support for diagnosis
Scalability: manual annotations sometimes required
Generality: hard-coded heuristics, new tools for different types of faults
Important to achieve all three
Precision: Path-Sensitive Analyses
Heuristics based: ESP[das02] (based on an assumption of typestate fault)
Summary based: Saturn[xie07] (lacks interprocedural path-sensitivity)
Partially exploring the state space: Prefix[bush00]
exhaustive analysis based on the structure of a program
Framework: Athena
Automatically generate analyses from specifications:
• precise: low false positives and rich diagnostic info
– interprocedural path-sensitive analysis
– reports path-segments of a fault
• scalable: only covers code relevant to the fault
– demand-driven analysis
• general: data- and control-centric, liveness and safety
– a specification technique and a generation algorithm
Faults
• Commonality of the faults - Generality
– The violations are always observable at certain statements
– We are able to construct constraints to express violations
• Locality of a fault - Scalability
– Only the segments along the paths that are relevant to the fault
– Only a limited number of statements on the paths that contribute to the fault
– Fault locality holds for a variety of the faults
Athena: Components
[Figure: Athena components: Specification Language, Specification Repository, Parser, Analyzer Generator, Path-Sensitive Demand-Driven Template (precision and scalability of the analyses), syntax trees and code modules (generate analyses).]
Athena: Workflow
Step 1: Specifying Faults
Step 2: Generating Analysis
Step 3: Analyzing programs with Generated Analysis
[Figure: workflow. Spec → Parser → Analyzer Generator (with the Demand-Driven Template) → Analyzer for the Spec; Program → Generated Analysis → Path Classification of each Path Segment.]
Path Classification of a Path Segment:
• Infeasible
• Safe
• Faulty (severity, root cause)
• Don't-know
Components I: Specification and Language
[Figure: Athena components diagram. Callouts: Definition of a Fault; Information for Detecting the Fault.]
• Spec: <program point, constraints> <program point, actions>
• Language: attributes and operators on attributes
• Attributes – abstractions on program objects, e.g. len(s)
• Operators – comparison (>,<), computation (+, -), command (:=)
Grammar of the Language
Specification → Vars VarList DefineFault FaultSigList DetectFault DetectSigList
VarList → Var*
Var → VarType namelist;
VarType → Vbuffer | Vint | Vany | Vptr | ...
FaultSigList → FaultSigItem <or FaultSigItem>*
DetectSigList → DetectSigItem <or DetectSigItem>* | # include < ExistentSpec >
FaultSigItem → CodeSignature ProgramPoint S-Constraint Condition | CodeSignature ProgramPoint L-Constraint Condition
DetectSigItem → CodeSignature ProgramPoint Update Action
ProgramPoint → $LangSyntax$ | Condition | $LangSyntax$&&Condition
Condition → Attribute Comparator Attribute | !Condition | [Condition] | Condition&&Condition | Condition || Condition
Action → Attribute:=Attribute | ^ Condition | [Action] | Action&&Action | Action || Action | Condition → Action
Attribute → PrimitiveAttribute(var, ...) | Constant | !Attribute | ¬Attribute | [Attribute] | Attribute ° Attribute | Attribute Op Attribute | min(Attribute,Attribute) | [Attribute,Attribute]
PrimitiveAttribute → Size | Len | Value | MatchOperand | TMax | TMin | ...
Constant → 0 | true | false | ...
Comparator → = | > | < | ...
Op → + | − | * | ...
Buffer Overflow Specification
Vars Vbuffer a, b; Vint d; Vany e;
DefineFault
  CodeSignature $strcpy(a,b)$
  S_Constraint Len(b) <= Size(a)
or
  CodeSignature $memcpy(a,b,d)$
  S_Constraint Min(Len(b),Value(d)) <= Size(a)
DetectFault
  CodeSignature $strcpy(a,b)$
  Update Len(a):=Len(b)
or
  CodeSignature $d=strlen(b)$
  Update Value(d):=Len(b)
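One way to read the buffer overflow specification is as plain data plus a predicate. The sketch below is a hypothetical Python encoding, not Athena's implementation: `FaultSignature`, `violates`, and the attribute dictionary keys are names invented for illustration. It also assumes that the comparison in each S_Constraint is "<=", in line with the query size(buf) >= len(str) used elsewhere in the slides, and that the memcpy rule bounds the copy by Value(d).

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Hypothetical encoding of one fault signature: a code signature (the call
# name to match) plus a safety constraint over abstract attributes.
@dataclass(frozen=True)
class FaultSignature:
    call: str
    safe: Callable[[Dict[str, int]], bool]

# Assumed reading of the strcpy rule: the copy is safe when Len(b) <= Size(a).
STRCPY = FaultSignature(
    "strcpy",
    lambda attrs: attrs["Len(b)"] <= attrs["Size(a)"],
)

# Assumed reading of the memcpy rule: Min(Len(b), Value(d)) <= Size(a).
MEMCPY = FaultSignature(
    "memcpy",
    lambda attrs: min(attrs["Len(b)"], attrs["Value(d)"]) <= attrs["Size(a)"],
)

def violates(sig: FaultSignature, call: str, attrs: Dict[str, int]) -> bool:
    """Return True when the call matches the signature and the constraint fails."""
    return call == sig.call and not sig.safe(attrs)
```

For example, `violates(STRCPY, "strcpy", {"Len(b)": 12, "Size(a)": 10})` flags the copy, while the same call with `Len(b)` equal to 8 does not.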
Component II: Demand-Driven Template
[Figure: Athena components diagram.]
• Formulate fault detection problems into queries about program facts, e.g., variable relations
• Scalable: Buffer overflow detection [le08]
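[Figure: resolving a buffer overflow query on an example program. Statements: x[10] = '0'; bar(); s = (char*)malloc(80); branch strlen(t) < 8 (yes/no); strcpy(s,t); strcat(x,t).]
Query: buffer overflow at a buffer access, size(buf) >= len(str)
Resolution:
• at strcpy(s,t): size(s) >= len(t)
• at the branch strlen(t) < 8: size(s) >= len(t), len(t) < 8
• at s = (char*)malloc(80): 80/8 >= len(t), len(t) < 8: safe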
Demand-Driven Template
[Figure: flowchart of the template applied to a Program: Raise Queries, Propagate Queries, Update Queries, Evaluate Queries, with a yes/no decision.]
• Rules for Propagating Query
– Interprocedural, path-sensitive, context-sensitive
– Branch, loop, call, infeasible path
• Evaluating Queries (integer constraints)
– Algebra rules, inequalities
– Integer constraint solver
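The raise, propagate, update, evaluate cycle of the template can be sketched on a single path. The following is a minimal illustration under strong assumptions, not the template itself: statements are hypothetical tuples such as `("malloc", "s", 80)` and `("assume_len_lt", "t", 8)`, only one path is walked (no branches, loops, or calls), and the "solver" is a single integer comparison.

```python
from typing import List, Optional, Tuple

# Hypothetical statement encoding, for illustration only:
#   ("malloc", buf, n)          buf = malloc(n)
#   ("assume_len_lt", str, k)   branch condition strlen(str) < k was taken
#   ("strcpy", dst, src)        the potentially faulty statement
Stmt = Tuple

def resolve_query(path: List[Stmt]) -> str:
    """Classify the strcpy that ends the path as safe, faulty, or dont-know."""
    op, dst, src = path[-1]
    assert op == "strcpy"
    # Raise: the query asks whether Size(dst) >= Len(src) holds at the copy.
    size: Optional[int] = None      # known Size(dst), if any
    len_max: Optional[int] = None   # known upper bound on Len(src), if any

    # Propagate: walk backward from the copy toward the path entry.
    for stmt in reversed(path[:-1]):
        # Update: only statements that define the queried attributes matter;
        # the nearest definition wins.
        if stmt[0] == "malloc" and stmt[1] == dst and size is None:
            size = stmt[2]
        elif stmt[0] == "assume_len_lt" and stmt[1] == src and len_max is None:
            len_max = stmt[2] - 1   # Len(src) < k means Len(src) <= k - 1
        # Evaluate: stop as soon as both sides of the inequality are known.
        if size is not None and len_max is not None:
            # If the bound admits a length larger than the buffer, report a fault.
            return "safe" if size >= len_max else "faulty"
    # The path entry was reached without resolving the query.
    return "dont-know"
```

Because the walk stops once the query is resolved, statements earlier on the path are never visited, which is the locality the slides rely on for scalability.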
Components III: Parser and Code Generator
[Figure: Athena components diagram.]
Parsing Specification (YACC)
CodeSignature $strcpy(a,b)$
S_Constraint Len(b) <= Size(a)
Parsed form:
CodeSignature: GetOp(s) = strcpy
S_Constraint: Size(Src1(s)) >= Len(Src2(s))
[Figure: syntax trees. Tree A (CodeSignature): "=" over GetOp and strcpy. Tree B (S_Constraint): a comparison over Size º Src1 and Len º Src2.]
Leaf: attribute
Non-leaf: operator
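The parsing step turns a constraint into a tree whose leaves are attributes and whose inner node is an operator. The sketch below uses a regular expression rather than YACC, so it is a stand-in for the idea and not the parser described on the slide. The concrete syntax it accepts (one `Attribute(var)` on each side, ASCII comparators) and the function name `parse_condition` are assumptions.

```python
import re
from typing import Tuple

# Assumed concrete syntax: Attribute(var) Comparator Attribute(var),
# with comparators written in ASCII (<=, >=, <, >, =).
_CONDITION = re.compile(
    r"\s*(\w+)\((\w+)\)\s*(<=|>=|<|>|=)\s*(\w+)\((\w+)\)\s*$"
)

def parse_condition(text: str) -> Tuple:
    """Parse one condition into (comparator, (attr, var), (attr, var))."""
    match = _CONDITION.match(text)
    if match is None:
        raise ValueError(f"not a simple condition: {text!r}")
    attr1, var1, op, attr2, var2 = match.groups()
    # Non-leaf node: the comparator; leaves: attributes applied to variables.
    return (op, (attr1, var1), (attr2, var2))
```

Calling `parse_condition("Len(b) <= Size(a)")` yields a three-element tuple with the comparator at the root and the two attribute leaves as children.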
Code Generation
[Figure: syntax tree of the Code Signature: "=" over GetOp and strcpy.]
• Find the function that implements the semantics of leaf attributes
int GetOp(statement t) { C_Syntax(t); return t.opcode; }
• Construct a function that implements the semantics of the tree based on the semantics of operators
bool IsStrcpy(statement t) { if (GetOp(t) == "strcpy") return true; else return false; }
• Create the instance of the call
IsStrcpy(n)
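The code generation steps can be mimicked by composing closures over a syntax tree. This is a hedged sketch rather than Athena's generator: statements are modeled as dictionaries with an assumed `"opcode"` key, the tree is a nested tuple, only the "=" operator is supported, and `ATTRIBUTES` and `generate` are illustrative names.

```python
from typing import Callable, Dict, Tuple, Union

# A statement is modeled as a dict; the "opcode" key is an assumption.
Statement = Dict[str, str]
# Leaf: an attribute name or a constant; node: (operator, left, right).
Tree = Union[str, Tuple]

# Leaf attributes map to hand-written functions, as GetOp does on the slide.
ATTRIBUTES: Dict[str, Callable[[Statement], str]] = {
    "GetOp": lambda stmt: stmt["opcode"],
}

def generate(tree: Tree) -> Callable[[Statement], object]:
    """Turn a syntax tree into a function over statements."""
    if isinstance(tree, str):
        if tree in ATTRIBUTES:
            return ATTRIBUTES[tree]
        # Any other leaf is treated as a constant.
        return lambda stmt, const=tree: const
    op, left, right = tree
    lhs, rhs = generate(left), generate(right)
    if op == "=":
        # The operator's semantics combine the functions built for the children.
        return lambda stmt: lhs(stmt) == rhs(stmt)
    raise ValueError(f"unsupported operator: {op}")

# Counterpart of IsStrcpy: generated from the code signature tree.
is_strcpy = generate(("=", "GetOp", "strcpy"))
```

Applying `is_strcpy` to a statement plays the role of the `IsStrcpy(n)` call instance: it reports whether the statement matches the code signature.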
Generating Analysis
[Figure: the Parser and Analyzer Generator turn the Spec into syntax trees and code modules, which are plugged into the Demand-Driven Template (Raise Queries, Propagate Queries, Update Queries, Evaluate Queries, with a yes/no decision) to produce the Analyzer for the Spec.]
Code Module Generated:
• CodeSignature $strcpy(a,b)$, S_Constraint Len(b) <= Size(a): if(isnode(s)) q = raiseQ(s)
• CodeSignature $strcpy(a,b)$, Update Len(a):=Len(b): if(isnode(s)) updateQ(q)
Experimental Setup
Athena (analyzes C/C++/C#): YACC, Phoenix and Disolver
Research question: Can we generate analyses for detecting different faults?
– Experiments: buffer overflow, integer fault, null-pointer dereference, memory leak
– Benchmarks: bugbench, ffmpeg, putty, apache
Research question: Comparable with manually customized detectors?
– Experiments: memory leak detectors, Saturn
– Benchmarks: SPEC CPU-INT 2000
Evaluation metrics: detection rate, false positives, false negatives, diagnostic info, scalability
• Detection: 84 faults of four types from 9 benchmarks, 68 new
• False positive/negative: 18 false positives, missed 3
• Path segments: generally relevant to 1-4 procedures; maximum 35 procedures
• Scalability: apache (268.9 k) – 4 hours and ffmpeg (48.1 k) – 2.3 hours
Can We Generate Analyses for Different Faults?
New faults: many located along the same paths; dynamic tools would halt
Main source of imprecision: infeasible paths and pointers
Locality helped achieve the scalability; without guidance, manual inspection is hard
Code complexity matters; generality does compromise scalability, but the approach is still scalable
Comparable with Manually Customized Detectors?
Heuristics designed for suppressing false positives may hurt the detection rate
Tool                      Leak   FP
Athena                    53     6
NoPaths [Orlovich06]      3      29
ValueGraph [Jeffery07]    38     6

Tool                      Null-p   FP   Finish
Athena                    9        3    9/12
Saturn [xie07]            7        44   5/12
• Lack interprocedural path-sensitivity
• Heuristics of applying consistency rules
Related Work
• Static fault detection: type based, model checking, data flow analysis
• Path-sensitive fault detection: Prefix, Metal, ESP, Archer, Saturn, Calysto – exhaustive static analysis
• Athena is demand-driven, more precise, scalable and general
• Slicing and other demand-driven analyses
• Athena is the first to use it for computing path segments of faults
Conclusions
Athena generates demand-driven, path-based, symbolic analyses for detecting specified faults:
• Faults develop along paths but exhibit locality, so a demand-driven, path-based analysis is more precise and scalable
• Specification provides a way of mapping fault detection problems to constraints on program objects at program points
• Specifying different faults requires only a limited set of attributes; the expressive power comes from composing the attributes
Thank you and Questions?
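[Figure: fault detection combined with branch analysis on an example program. Statements: p[10]; scanf("%s", t); i = strlen(t); branch i < 10 (yes); strcpy(p,t).]
Branch Analysis: i < 10 with i = strlen(t) gives Value(i) < 10, then Len(t) < 10; the branch is feasible.
Fault Detection: the query at strcpy(p,t) evolves as
• Size(p) >= Len(t)
• Len(t) < 10 && Size(p) >= Len(t)
• Len(t) < 10 && IsEntry(t) && Size(p) >= Len(t)
• Len(t) < 10 && IsEntry(t) && 10 >= Len(t) [Safe]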