Static Code Checking: Security and Concurrency
Ben Watson
The George Washington University
CS 297 Security and Programming Languages
June 9, 2005
The Video
The Problem
- How to discover errors in code without running it
- Code can run for weeks or months without displaying the error
- Many errors are caused by pieces of code that are very difficult to test
  - Device drivers: manufacturers aren't always good at this, and one OS company can't possibly test all the tens of thousands of devices out there
    - The Windows 98 crash was caused by a bad scanner driver
  - Concurrent code: debugging complicated concurrency problems is a nightmare x n
The Scope
Lines of Code (estimated)
System                Lines of code
Windows 3.1               3,000,000
Windows NT 3.51           4,000,000
Windows 95               15,000,000
RedHat Linux 7.1         30,000,000
Windows 2000             35,000,000
Windows XP               40,000,000
Debian Linux 2.2         56,000,000
Debian Linux 3.1        213,000,000
The Real Problem
- We're only human
  - No person, no group of people can possibly manually debug anything as complicated as an OS and its related pieces
- Good tools are not enough
- Can't rely on thorough annotations of the entire code base
- Can't rely on manual directions: the more automated the better
The Solutions
- MC security checking system
- RacerX: race condition and deadlock detection
- General rule inference from source code
MECA: Statically Checking Security Properties
- Checks low-level properties (pointer safety, etc.)
- Relies on annotations that propagate through the analysis
- Goals
  - Expressiveness
  - Low manual overhead: programmers only have to type in relatively few annotations
  - Few false positives
How MC Works
- Uses a modified GCC compiler
- Parses source along with the abstract syntax tree (AST) generated by the compiler
- AST is used to build a control-flow graph (CFG)
- Annotation propagator uses the CFG to propagate annotations through the entire graph
- Checkers are run on the completed graph
- Results are ranked and filtered
An example
- Rule: the OS kernel may not access a user pointer directly (there are "paranoid" functions to access the data pointed to by a user pointer)
  - Such pointers are referred to as "tainted" pointers
- Annotate:
  - Tainted variables, parameters, and fields
  - Functions that produce tainted values
Source annotations
    struct myStruct {
        /*@ tainted */ int *p;
    };
    /*@ tainted */ int *foo(/*@ tainted */ int *p);
    void memcpy(/*@ !tainted */ void *dst, /*@ !tainted */ void *src, unsigned nbytes);
Source annotations
    // Binding:
    /*@ set_length($ret, sz) */
    void *malloc(unsigned sz);
    // Global: all sys_* calls are tainted
    /*@ global $param ${!strncmp(current_fn, "sys_", 4)} ==> tainted */
Propagation
    void bar(/*@ tainted */ void *p);
    struct S { char *buf; };
    // Before analysis
    void foo(char **p, struct S *s)
    {
        char *r;
        struct S *ss;
        r = *p;
        bar(r);        // taints r and *p
        ss = s;
        bar(ss->buf);  // taints ss->buf and s->buf
    }
    // At the end of analysis:
    void foo(/*@ tainted (*p) */ char **p, /*@ tainted (s->buf) */ struct S *s);
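The propagation step above can be mimicked with a toy model. The sketch below is not MECA's algorithm. It assumes each pointer value in foo() has a small integer id (the V_* names are hypothetical), uses a union-find structure to record that two expressions name the same value, and spreads a taint mark across the whole alias group.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical ids for the pointer values that appear in foo(). */
enum { V_STAR_P, V_R, V_S_BUF, V_SS_BUF, V_COUNT };

static int parent[V_COUNT];    /* union-find forest over value ids */
static bool tainted[V_COUNT];  /* taint flag, meaningful at each root */

/* Reset the model: every value is its own group and untainted. */
void taint_init(void)
{
    for (int i = 0; i < V_COUNT; i++) {
        parent[i] = i;
        tainted[i] = false;
    }
}

/* Return the representative of v's alias group. */
static int taint_find(int v)
{
    assert(v >= 0 && v < V_COUNT);
    while (parent[v] != v) {
        parent[v] = parent[parent[v]];  /* path halving */
        v = parent[v];
    }
    return v;
}

/* Record an assignment such as "r = *p": both now name one value. */
void taint_alias(int a, int b)
{
    int ra = taint_find(a), rb = taint_find(b);
    if (ra == rb)
        return;
    parent[ra] = rb;
    tainted[rb] = tainted[rb] || tainted[ra];
}

/* Record that v was passed to a parameter annotated as tainted. */
void taint_mark(int v) { tainted[taint_find(v)] = true; }

/* Query whether v, or anything aliased to it, has been tainted. */
bool taint_is_set(int v) { return tainted[taint_find(v)]; }
```

Replaying foo() as taint_alias(V_R, V_STAR_P) followed by taint_mark(V_R) leaves *p tainted, which corresponds to the annotation shown at the end of the analysis. The model treats ss->buf and s->buf as two ids that are aliased directly, a simplification of what "ss = s" implies.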
MECA results
- On average, one manual annotation led to 682 checks
- Linux 2.5.63 bugs:
Type               Warnings   Fixed
Arbitrary write          11      11
Arbitrary read            8       8
Fault at will            19      17
Always fail               6       3
Total                    44      39
- False positives: 8
RacerX
- Static detection of race conditions and deadlocks
- Designed to find errors in large, multi-threaded systems
- Sorts errors by severity (the hard part)
- The authors checked Linux, FreeBSD, and a mystery OS that has only 500,000 lines of code
Deadlock
- Thread 1 has locked resource A
- Thread 2 has locked resource B
- Thread 1 needs resource B to complete
- Thread 2 needs resource A to complete
- Neither can proceed: these threads are deadlocked
Race condition
- Multiple threads access the same memory
- If memory is unprotected:
  - Two threads can simultaneously write to the same memory (bad)
  - One thread can read while another writes simultaneously (bad)
  - Two threads can simultaneously read from the same memory (probably ok)
- It's a race because the final value is non-deterministically chosen by who gets there first.
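To make the "who gets there first" point concrete, here is a small deterministic sketch. It does not use real threads. Each simulated thread performs counter = counter + 1 as a separate read step and write step, and the two functions replay two fixed interleavings. All names are hypothetical.

```c
#include <assert.h>

/* One simulated thread: tmp holds the value it read from shared memory. */
struct sim_thread { int tmp; };

/* First half of "counter = counter + 1": load the shared value. */
static void step_read(struct sim_thread *t, const int *shared)
{
    assert(t && shared);
    t->tmp = *shared;
}

/* Second half: store the incremented copy back. */
static void step_write(const struct sim_thread *t, int *shared)
{
    assert(t && shared);
    *shared = t->tmp + 1;
}

/* Thread A finishes before thread B starts: both increments survive. */
int run_serialized(void)
{
    int counter = 0;
    struct sim_thread a, b;

    step_read(&a, &counter);
    step_write(&a, &counter);
    step_read(&b, &counter);
    step_write(&b, &counter);
    return counter;
}

/* Both threads read before either writes: one increment is lost. */
int run_interleaved(void)
{
    int counter = 0;
    struct sim_thread a, b;

    step_read(&a, &counter);
    step_read(&b, &counter);
    step_write(&a, &counter);
    step_write(&b, &counter);
    return counter;
}
```

run_serialized() yields 2 and run_interleaved() yields 1. With real threads and no lock, the scheduler decides which of these orderings happens.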
Avoiding the Problem
- If data is never accessed by more than one thread, you don't have to worry about concurrency
- If program logic ensures that only one thread accesses data, you don't need to worry about locking the data
- If you're writing a shared component, you almost always have to worry about concurrency
Algorithm
- "Lockset" algorithm detects both types of problems
- Lockset: the set of locks currently held, where a lock is any acquire/release pair such as:
  - Lock()/Unlock()
  - InterruptDisable()/InterruptEnable()
  - Etc.
Algorithm
- Top-down analysis of the control-flow graph
- Add/remove locks as needed
- Check for race/deadlock on each statement
- Cache results to cope with the exponential number of paths through the graph
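The bookkeeping in these steps can be sketched on a single path. The example below is a simplification and not RacerX's implementation: it walks one straight-line sequence of operations instead of a full control-flow graph, does no caching, and represents the lockset as a 32-bit mask. The op encoding and function name are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical encoding of the statements on one path. */
enum op_kind { OP_LOCK, OP_UNLOCK, OP_ACCESS };

struct op {
    enum op_kind kind;
    unsigned id;   /* lock number for OP_LOCK/OP_UNLOCK, variable id for OP_ACCESS */
};

/* Walk the path, updating the lockset, and count accesses made
 * while no lock at all is held. */
size_t count_unprotected(const struct op *path, size_t n)
{
    uint32_t lockset = 0;     /* bit i set means lock i is held */
    size_t unprotected = 0;

    for (size_t i = 0; i < n; i++) {
        switch (path[i].kind) {
        case OP_LOCK:
            assert(path[i].id < 32);
            lockset |= UINT32_C(1) << path[i].id;
            break;
        case OP_UNLOCK:
            assert(path[i].id < 32);
            lockset &= ~(UINT32_C(1) << path[i].id);
            break;
        case OP_ACCESS:
            if (lockset == 0)
                unprotected++;
            break;
        }
    }
    return unprotected;
}
```

For the path lock(0), access(7), unlock(0), access(7) the function reports one unprotected access, the second one. A real checker would also have to decide whether that access matters, which is the hard part discussed under race checking.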
Deadlock Check
- Basically, finds whether there are cycles in the lockset dependencies
  - If lock a is obtained, then lock b, we have: a → b
  - Following this line of reasoning, we can discover cases that look like this: a → b → c → a
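The cycle check can be illustrated with an ordinary depth-first search over a lock-order graph. This is a sketch under assumptions, not the tool's code: locks are small integers, the graph is a fixed-size adjacency matrix, and MAX_LOCKS and the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_LOCKS 8

/* edge[a][b] is true when lock b was acquired while lock a was held. */
struct lock_graph { bool edge[MAX_LOCKS][MAX_LOCKS]; };

/* Start with no ordering constraints. */
void graph_init(struct lock_graph *g) { memset(g, 0, sizeof *g); }

/* Record the dependency held -> acquired. */
void add_order(struct lock_graph *g, int held, int acquired)
{
    assert(held >= 0 && held < MAX_LOCKS);
    assert(acquired >= 0 && acquired < MAX_LOCKS);
    g->edge[held][acquired] = true;
}

/* state: 0 = unvisited, 1 = on the current DFS path, 2 = finished. */
static bool dfs(const struct lock_graph *g, int v, int *state)
{
    state[v] = 1;
    for (int w = 0; w < MAX_LOCKS; w++) {
        if (!g->edge[v][w])
            continue;
        if (state[w] == 1)
            return true;               /* back edge closes a cycle */
        if (state[w] == 0 && dfs(g, w, state))
            return true;
    }
    state[v] = 2;
    return false;
}

/* True when the recorded lock orders contain a cycle such as a -> b -> c -> a. */
bool has_cycle(const struct lock_graph *g)
{
    int state[MAX_LOCKS] = { 0 };

    for (int v = 0; v < MAX_LOCKS; v++)
        if (state[v] == 0 && dfs(g, v, state))
            return true;
    return false;
}
```

Adding the orders 0 -> 1 and 1 -> 2 leaves the graph acyclic. Adding 2 -> 0 makes has_cycle() return true, which is the a → b → c → a case from the slide.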
Deadlock Check
- Ranking the importance of a cycle is non-trivial.
- Basically, rank higher according to:
  - Global locks vs. local locks
  - Small depth difference vs. big depth difference
  - Fewer threads vs. more threads
Race Checking
- This is even harder than deadlock detection
- Must answer:
  - Is the lockset valid? (If not, you will have LOTS of false positives)
  - Can the unprotected memory be accessed by more than one thread?
  - Does the access need to be protected?
    - Two reads do not a wrong make
- Must annotate API functions that require locks
Race Checking
- Deciding if code is multithreaded:
  - Inferred from "programmer belief": if a piece of code contains concurrency-related statements, the code is probably multi-threaded
  - Annotations: designate API functions as requiring locks
Race Checking
- Does memory need to be protected?
  - If it's never written to, no.
  - If it's only written on initialization, no.
  - On a certain code path, if there are a high number of variables that are potentially written to concurrently, probably.
  - Anything that can't be written atomically, yes. (Although this is pretty much anything, especially if you have more than 1 CPU.)
  - If a variable is statistically likely to be protected by locking code ("Programmer Belief")
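The last heuristic can be sketched as a simple counting rule. The example assumes an earlier pass has already counted, per variable, how many accesses happened with some lock held. A plain percentage threshold stands in here for whatever statistical ranking the real tools apply, and the struct, function name, and threshold are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Per-variable counts gathered by some earlier pass (hypothetical). */
struct access_stats {
    unsigned locked;   /* accesses made while a lock was held */
    unsigned total;    /* all accesses */
};

/* If the code usually locks around this variable, the remaining
 * unlocked accesses deviate from the programmer's apparent belief.
 * Counts are assumed small enough that the multiplication cannot overflow. */
bool unlocked_access_is_suspicious(struct access_stats s, unsigned threshold_pct)
{
    assert(s.locked <= s.total);
    if (s.total == 0 || s.locked == s.total)
        return false;                      /* nothing deviates */
    return s.locked * 100u >= threshold_pct * s.total;
}
```

With a threshold of 80, a variable locked on 9 of 10 accesses is flagged, while one locked on 1 of 10 is not: there the evidence suggests the programmer never meant to protect it.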
RacerX: Results
                 Confirmed  Unconfirmed  Benign  False
Deadlock
  System X               2            3       -      7
  Linux 2.5.62           4            8       -      6
  FreeBSD                2            3       -      6
Race
  System X               7            4      13     14
  Linux 2.5.62           3            2       2      6
Pop Quiz – Question 1
If you have read the 3rd paper, you may not answer this question.
Find the bug:
    if (card == NULL) {
        printk(KERN_ERR "capidrv-%d: … %d!\n",
               card->contrnr, id);
    }
Pop Quiz – Answer 1
    if (card == NULL) {                  /* card is known to be NULL here... */
        printk(KERN_ERR "capidrv-%d: … %d!\n",
               card->contrnr, id);       /* ...so this dereferences a null pointer */
    }
Pop Quiz – Question 2
If you have read the 3rd paper, you may not answer this question.
Find the bug:
    struct mxser_struct *info = tty->driver_data;
    unsigned long flags;
    if (!tty || !info->xmit_buf)
        return 0;
Pop Quiz – Answer 2
    struct mxser_struct *info = tty->driver_data;   /* tty is dereferenced here... */
    unsigned long flags;
    if (!tty || !info->xmit_buf)                    /* ...but only checked for NULL here */
        return 0;
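One possible repair is to do the NULL check before the first dereference. The sketch below is illustrative only and is not the actual kernel fix. The two structs are minimal hypothetical stand-ins for the kernel types, the function name is invented, and the extra check on info goes beyond what the original code tested.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical minimal stand-ins for the kernel types in the quiz. */
struct mxser_struct { char *xmit_buf; };
struct tty_struct { void *driver_data; };

/* Returns 1 when info->xmit_buf may be used, 0 otherwise. */
int xmit_buf_usable(struct tty_struct *tty)
{
    struct mxser_struct *info;

    if (!tty)                       /* check first ... */
        return 0;
    info = tty->driver_data;        /* ... dereference second */
    if (!info || !info->xmit_buf)
        return 0;
    return 1;
}

/* Usage: the NULL tty that would fault with the original ordering
 * is now rejected cleanly. */
void xmit_buf_usable_example(void)
{
    char buf[4];
    struct mxser_struct info = { buf };
    struct tty_struct tty = { &info };

    assert(xmit_buf_usable(NULL) == 0);
    assert(xmit_buf_usable(&tty) == 1);
}
```

The point for a static checker is that the original code was internally inconsistent: the check implies the author believed tty could be NULL, while the earlier dereference implies the opposite.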
General Methodology
- Take advantage of programmer beliefs
- Statistics are our friend
- If something is usually done a certain way, then instances that violate that should be examined
- Check internal consistency
  - Discover rules that are built in to the code
  - Minimal to no annotation
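An internal-consistency check of the kind shown in Pop Quiz 2 can be sketched as a scan over the events on one path. This is a toy, not the checker from the papers: the event encoding and function name are hypothetical, and it covers only the "checked after use" pattern.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical events extracted from one code path. */
enum ev_kind { EV_DEREF, EV_NULL_CHECK };

struct event {
    enum ev_kind kind;
    int ptr;          /* id of the pointer variable involved */
};

/* Return the index of the first NULL check on a pointer that was
 * already dereferenced earlier on the path, or -1 if there is none.
 * Such a check contradicts the belief implied by the dereference. */
int find_check_after_use(const struct event *path, int n)
{
    assert(path != NULL || n == 0);
    for (int i = 0; i < n; i++) {
        if (path[i].kind != EV_NULL_CHECK)
            continue;
        for (int j = 0; j < i; j++) {
            if (path[j].kind == EV_DEREF && path[j].ptr == path[i].ptr)
                return i;
        }
    }
    return -1;
}
```

No annotation is needed: the rule "a pointer that gets checked may be NULL" is inferred from the code itself, and the contradiction is reported wherever the code violates its own rule.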
Conclusion
- The methods presented tonight provide some of the best ways to find errors:
  - Millions of lines of code can be checked with at most hundreds of lines of annotations
- The bugs these methods find are fairly specific in nature (they revolve around well-structured code constructs)
References
1. Junfeng Yang, Ted Kremenek, Yichen Xie, and Dawson Engler. MECA: an Extensible, Expressive System and Language for Statically Checking Security Properties. ACM CCS, 2003.
2. Dawson Engler and Ken Ashcraft. RacerX: Effective, Static Detection of Race Conditions and Deadlocks. SOSP 2003.
3. Dawson Engler, David Yu Chen, Seth Hallem, Andy Chou, and Benjamin Chelf. Bugs as Deviant Behavior: A General Approach to Inferring Errors in Systems Code. SOSP 2001.
4. Source Lines of Code, http://www.answers.com/topic/source-lines-of-code
5. Concurrency – Part 2: Avoiding the Problem, http://blogs.msdn.com/larryosterman/archive/2005/02/15/373460.aspx