“A System and Language for Building System-Specific, Static Analyses”
CMSC 631 – Fall 2003
Seth Hallem, Benjamin Chelf, Yichen Xie, and Dawson Engler
(presented by Mujtaba Ali)
Motivation
• Goal: Find as many bugs as possible
• Applications:
  – Free checker
    • Detect double frees and dereference of freed pointers
  – Lock checker
    • Warn if locks released without being acquired, double acquired, or not released at all
  – Statistical analysis to infer checking rules
    • Infer whether routines a and b must be paired
State Machine Transitions
• Analyses modeled as state machine transitions
• State machines are:
  – Simple enough for programmers to understand
  – Expressive enough to specify lots of analyses
[State diagram: unknown --kfree(v)--> freed; freed --kfree(v)--> stop; freed --*v--> stop]
Note: stop state does not always imply an error
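The unknown/freed/stop machine on this slide can be sketched in plain C. This is only an illustration of the transition table, not how metal or xgcc represent it; the names `ptr_state`, `event`, and `step` are made up for the sketch.

```c
#include <stddef.h>

/* Per-pointer states from the slide. */
enum ptr_state { UNKNOWN, FREED, STOP };

/* Events the checker reacts to (hypothetical names for this sketch). */
enum event { EV_KFREE, EV_DEREF, EV_OTHER };

/* Apply one event to a pointer's state.
 * *error is set to a message when the transition reports a rule
 * violation, and to NULL otherwise. */
enum ptr_state step(enum ptr_state s, enum event e, const char **error)
{
    *error = NULL;
    switch (s) {
    case UNKNOWN:
        /* Only kfree(v) starts tracking the pointer. */
        return e == EV_KFREE ? FREED : UNKNOWN;
    case FREED:
        if (e == EV_DEREF) { *error = "use after free"; return STOP; }
        if (e == EV_KFREE) { *error = "double free";    return STOP; }
        return FREED;
    case STOP:
    default:
        /* stop is terminal: the pointer is no longer tracked. */
        return STOP;
    }
}
```

Here both edges into stop carry an error, but stop itself is just "no longer tracked", which is why reaching it does not always imply an error.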
Free Checker Example
int contrived(int *p, int *w, int x) {
    int *q;
    if(x) {
        kfree(w);
        q = p;
        p = 0;
    }
    if(!x)
        return *w; // safe
    return *q; // using 'q' after free!
}
int contrived_caller (int *w, int x, int *p) {
    kfree (p);
    contrived (p, w, x);
    return *w; // using 'w' after free!
}
(p→freed)  (p→freed)  Assume x != 0
(p,w→freed)  (p,w,q→freed)  (w,q→freed, p→stop)
Prune true branch
(w→freed, q→stop)
(p→freed)
Free Checker Example
int contrived(int *p, int *w, int x) {
    int *q;
    if(x) {
        kfree(w);
        q = p;
        p = 0;
    }
    if(!x)
        return *w; // safe
    return *q; // using 'q' after free!
}
int contrived_caller (int *w, int x, int *p) {
    kfree (p);
    contrived (p, w, x);
    return *w; // using 'w' after free!
}
(p→freed)  (p→freed)  Assume x == 0
Prune false branch
(p→freed)  (w→freed, q→stop)
(p→freed)
Free Checker Example
int contrived(int *p, int *w, int x) {
    int *q;
    if(x) {
        kfree(w);
        q = p;
        p = 0;
    }
    if(!x)
        return *w; // safe
    return *q; // using 'q' after free!
}
int contrived_caller (int *w, int x, int *p) {
    kfree (p);
    contrived (p, w, x);
    return *w; // using 'w' after free!
}
(p→freed)  (w→freed, q→stop)
(p→freed)  (p,w→freed)  (p→freed, w→stop)
union
A Unified Framework
• Two components:
  – metal
    • Language used for expressing custom analyses
    • I.e., for expressing state machines
  – xgcc
    • Analysis engine that executes metal specifications
metal
• Language for specifying state machines
• metal specification is called an “extension”
• For programmers, not compiler writers
  – Many rules known only to programmers
• Flexibility allows for different kinds of analyses, e.g.:
  – Find violations of known correctness rules
  – Automatically infer such rules from source
Example Extension: Free Checker
– Extensions feature ML-like pattern matching
state decl any_pointer v;
start:
    { kfree(v) } ==> v.freed;
v.freed:
    { *v } ==> v.stop,
        { err("using %s after free!", mc_identifier(v)); }
  | { kfree(v) } ==> v.stop,
        { err("double free of %s!", mc_identifier(v)); }
  ;
metal Extension Terminology
– Global state variable (with exactly one instance) is implied
– Instances of variable-specific state variables come and go
state decl any_pointer v;
start:
    { kfree(v) } ==> v.freed;
v.freed:
    { *v } ==> v.stop,
        { err(...); }
  | { kfree(v) } ==> v.stop,
        { err(...); }
  ;
Callouts:
• v: variable-specific state variable
• v.freed, v.stop: variable-specific state values
• start: global state value
metal Extensions and SMs
• Extension composed of one or more SMs
  – Extension state = the state of these SMs
• State machine state is a state tuple:
  – Value of global instance
  – Value of one of variable-specific instances
• State tuple notation: (start,v:p→freed)
• So, extension state = set of state tuples, e.g. {(start,v:p→freed),(start,v:w→freed)}
xgcc
• Executes metal extensions
  – Context-sensitive, interprocedural analysis
• Places no restrictions on metal extensions beyond determinism
• Scalability a primary design requirement
  – More rules + more code = more bugs found
xgcc Algorithm Overview
• Applies extension to CFG for a function in depth-first order
• At each program point, looks for executable transition in all state machines
• Provides additional enhancements:
  – Prunes non-executable paths
  – Follows simple value flow
  – Deletes state attached to redefined expressions
Intraprocedural Heuristics
• Basic block-level state caching
• Motivation: Exploit determinism of extension
  – Applying extension to same program point in same state always gives same result
• Algorithm:
  – Before traversal, record extension state in each basic block (a “block summary”)
  – Subsequent traversals abort if their extension state is a subset of the block summary
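The subset test behind block summaries can be sketched with a bitset, one bit per state tuple. This is a simplified illustration, not xgcc's data structure: the `tuple_set` type, the `block` struct, and the assignment of bits to tuples are all assumptions of the sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/* One bit per state tuple, e.g. bit 0 = (start,v:p->freed).
 * The bit assignment is an assumption of this sketch. */
typedef uint64_t tuple_set;

struct block {
    tuple_set summary;  /* union of extension states seen at block entry */
    bool visited;       /* false until the first traversal reaches the block */
};

/* Returns true if the traversal should continue into the block,
 * false if every incoming tuple has already been analyzed here. */
bool enter_block(struct block *b, tuple_set incoming)
{
    if (b->visited && (incoming & ~b->summary) == 0)
        return false;           /* subset of the summary: abort this path */
    b->visited = true;
    b->summary |= incoming;     /* record the tuples for later traversals */
    return true;
}
```

Aborting is safe only because extensions are deterministic: the same block entered in the same state would produce the same transitions and the same errors again.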
Block Summary
int contrived(int *p, int *w, int x) {
    int *q;
    if(x) {
        kfree(w);
        q = p;
        p = 0;
    }
    if(!x)
        return *w; // safe
    return *q; // using 'q' after free!
}
int contrived_caller (int *w, int x, int *p) {
    kfree (p);
    contrived (p, w, x);
    return *w; // using 'w' after free!
}
(p→freed)  (p→freed)  Assume x != 0
(p,w→freed)  (p,w,q→freed)  (w,q→freed, p→stop)
Prune true branch
(w→freed, q→stop)
(p→freed)  (start,v:w→freed)  (start,v:q→freed)
multi-line basic blocks
Interprocedural Heuristics
• Require additional cache information
• Block summary is now a union of:
  – Transition edges: (s,v:t→vs) → (s’,v:t→vs’)
  – Add edges: (s,v:t→unknown) → (s’,v:t→vs’)
    • When new instances created inside basic block
• Suffix summary
  – Edges starting at a basic block and ending at function’s exit point
  – Function summary = entry block’s suffix summary
  – Built backwards (in contrast to block summaries)
Block and Suffix Summaries
int contrived(int *p, int *w, int x) {
    int *q;
    if(x) {
        kfree(w);
        q = p;
        p = 0;
    }
    if(!x)
        return *w; // safe
    return *q; // using 'q' after free!
}
int contrived_caller (int *w, int x, int *p) {
    kfree (p);
    contrived (p, w, x);
    return *w; // using 'w' after free!
}
(p→freed)  (p→freed)  Assume x != 0
(p,w→freed)  (p,w,q→freed)  (w,q→freed, p→stop)
Prune true branch
(w→freed, q→stop)
(p→freed)  (start,v:w→freed)  (start,v:w→freed)  (start,v:q→freed)  (start,v:q→stop)  (start,v:w→freed)  (start,v:w→freed)
(start,v:p→freed)  (start,v:p→freed)  (start,v:w→unknown)  (start,v:w→freed)
(start,v:p→freed)  (start,v:p→freed)
Unsoundness
• xgcc’s interprocedural analysis is unsound
  – But that’s OK (Jim Larus agrees)
  – If it can catch some errors, it’s still useful
• Unsound analyses can catch some errors that sound analyses can’t
  – Some analyses (e.g., inferring which routines must be paired) cannot be expressed soundly
• Focus is on executing extensions efficiently
Reducing False Positives
• Killing variables and expressions
  – Remove state machine when variable is defined
• Synonyms
• False path pruning
• Targeted suppression
  – i.e., xgcc hacks
p = q = kmalloc(...);
if(!p) return 0;
*q; /* safe dereference: q = p = not null */
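Killing and synonyms can be sketched as two kinds of assignment over a small table of tracked variables. This is a simplification under stated assumptions: the `env` table and all function names are invented for the sketch, and a copy here only inherits the source's current state, whereas a real synonym keeps the two names in step afterwards.

```c
#include <stddef.h>
#include <string.h>

enum ptr_state { UNKNOWN, FREED, STOP };

#define MAX_VARS 8

/* Tiny table mapping variable names to checker states (sketch only). */
struct env {
    const char *name[MAX_VARS];
    enum ptr_state state[MAX_VARS];
    int count;
};

/* Find the slot for a variable, creating it in UNKNOWN if needed.
 * Returns -1 when the table is full. */
static int slot(struct env *e, const char *var)
{
    for (int i = 0; i < e->count; i++)
        if (strcmp(e->name[i], var) == 0)
            return i;
    if (e->count == MAX_VARS)
        return -1;
    e->name[e->count] = var;
    e->state[e->count] = UNKNOWN;
    return e->count++;
}

enum ptr_state get_state(struct env *e, const char *var)
{
    int i = slot(e, var);
    return i < 0 ? UNKNOWN : e->state[i];
}

void set_state(struct env *e, const char *var, enum ptr_state s)
{
    int i = slot(e, var);
    if (i >= 0)
        e->state[i] = s;
}

/* dst = src: dst becomes a synonym of src and inherits its state. */
void assign_copy(struct env *e, const char *dst, const char *src)
{
    set_state(e, dst, get_state(e, src));
}

/* dst = <untracked value>: the old state no longer describes dst, so kill it. */
void assign_other(struct env *e, const char *dst)
{
    set_state(e, dst, STOP);
}
```

Run on the statements `q = p; p = 0;` from the contrived example with p already freed, this leaves q in freed and p in stop.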
Free Checker Example
int contrived(int *p, int *w, int x) {
    int *q;
    if(x) {
        kfree(w);
        q = p;
        p = 0;
    }
    if(!x)
        return *w; // safe
    return *q; // using 'q' after free!
}
int contrived_caller (int *w, int x, int *p) {
    kfree (p);
    contrived (p, w, x);
    return *w; // using 'w' after free!
}
(p→freed)  (p→freed)  Assume x != 0
(p,w→freed)  (p,w,q→freed)  (w,q→freed, p→stop)
Prune true branch
(w→freed, q→stop)
(p→freed)
On a write, if there is a state machine for p, we “kill” it.
Ranking of Errors
• Impossible to eliminate all false positives
• xgcc ranks errors
  – Generic ranking: distance
  – Path-specific ranking by annotating extensions
  – Statistical ranking (z-ranking)
• Ranking can distinguish different uses
  – Linux semaphore routines up and down used as both counters and locks
  – Interprocedural analysis cannot handle this case
Extending metal Extensions
• Extend state space using general purpose code
• Path specific transitions
  – Different destination state for when analysis follows true branch or false branch
• C Code actions
  – Can manipulate extension’s state using xgcc’s interface
start:
    { trylock(l) != 0 } ==> true=l.locked, false=l.stop
  | { trylock(l) == 0 } ==> true=l.stop, false=l.locked
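The two trylock rules collapse to one question: is the result of trylock(l) nonzero on the edge being followed? A minimal C sketch of that choice follows. The enum and function names are invented here and are not part of metal or xgcc.

```c
enum lock_state { LOCK_UNKNOWN, LOCKED, LOCK_STOP };

/* Which comparison the branch condition applies to trylock(l). */
enum cmp { CMP_NE_ZERO, CMP_EQ_ZERO };

/* State of l on the followed edge.
 * branch_taken is 1 for the true branch, 0 for the false branch. */
enum lock_state trylock_transition(enum cmp c, int branch_taken)
{
    /* The result is nonzero on the true edge of "!= 0"
     * and on the false edge of "== 0". */
    int nonzero = (c == CMP_NE_ZERO) ? branch_taken : !branch_taken;
    return nonzero ? LOCKED : LOCK_STOP;
}
```

The four combinations reproduce the table in the extension: locked exactly where the result is known to be nonzero, stop elsewhere.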
Example Extension: Free Checker
state decl any_pointer v;
start:
    { kfree(v) } ==> v.freed;
v.freed:
    { *v } ==> v.stop,
        { err("using %s after free!", mc_identifier(v)); }
  | { kfree(v) } ==> v.stop,
        { err("double free of %s!", mc_identifier(v)); }
  ;
Callout: the { err(...); } blocks are C code actions
The Good
• Unsoundness presents new opportunities
• Designed for use by “everyday” programmers
• Heuristics to speed up execution
• Heuristics to reduce false positives
• Ranking to help sift through false positives
• Tested on systems code (Linux, OpenBSD)
• Paper is very clearly written!
The Bad
• Unsoundness is unsound
  – Jim Larus says eventually programmers will want to move to sound tools
• Designed for use by “everyday” programmers
  – Advanced features require analysis knowledge
    • Path-specific state machine transitions
    • Path-specific error ranking
• xgcc/metal is now commercial
  – Boooo!
Related Work
• ESP
  – Sound
  – Uses state machine language like metal
  – More likely to scale in the interprocedural case
• SLAM
  – Model-checking approach
  – Verification tool intended for smaller code bases
• PREfix
  – Unsound, more expensive analysis
  – Fixed set of error types and analyses
Related Work (cont’d)
• ESC/Java
  – Uses theorem prover
  – High annotation burden (1 annotation per 3 lines of code)
    • Recent efforts to infer annotations
• Cqual
  – Interprocedural, sound analysis
  – Annotations to express program properties and to suppress false positives