40
Surinder Kumar Jain School of IT Supervisor : Bernhard Scholz University of Sydney, Australia

Symbolic Analysis for Buffer Overflow

  • Upload
    katima

  • View
    51

  • Download
    1

Embed Size (px)

DESCRIPTION

Symbolic Analysis for Buffer Overflow. Surinder Kumar Jain School of IT. Supervisor : Bernhard Scholz University of Sydney, Australia. Buffer overflow threats. Vulnerability Note VU#180513. Buffer Overflow. Update/Read beyond bounds of buffer Results in Erratic program behaviour - PowerPoint PPT Presentation

Citation preview

Page 1: Symbolic Analysis for Buffer Overflow

Surinder Kumar Jain

School of IT

Supervisor : Bernhard ScholzUniversity of Sydney, Australia

Page 2: Symbolic Analysis for Buffer Overflow

Buffer overflow threats

Surinder Kumar Jain - [email protected] 2

Page 3: Symbolic Analysis for Buffer Overflow

Vulnerability Note VU#180513

Surinder Kumar Jain - [email protected] 3

Page 4: Symbolic Analysis for Buffer Overflow

Buffer OverflowUpdate/Read beyond bounds of buffer

Results in Erratic program behaviourProgram crashesSecurity breaches

Caused by Array access outside array limitsPointer reference errorsArray indicies errors

4Surinder Kumar Jain - [email protected]

Page 5: Symbolic Analysis for Buffer Overflow

Array Access ErrorsVariable array indexModified in a loop

Buffer overflow during (last iteration of the loop.

i=0array a[b-1]while c < b and i>m and i<n i=i+1 j=j+b d=2*d c=c+1 a[c]=0 /*buffer overflow */

if e > 0 f=f+2 else e=g*e

5Surinder Kumar Jain - [email protected]

Page 6: Symbolic Analysis for Buffer Overflow

Static Program Analysis

Data flow analysisAbstract InterpretationModel checkingSymbolic Analysis

Analyse Program behaviourWithout running the program

Techniques

6Surinder Kumar Jain - [email protected]

Page 7: Symbolic Analysis for Buffer Overflow

Symbolic Analysis & Execution

Symbolic execution of the programExecute a program with symbolic valuesSymbolic domains, predicates, semanticsRelate symbolic results to concrete interpretation

7Surinder Kumar Jain - [email protected]

Page 8: Symbolic Analysis for Buffer Overflow

Array bounds violation Analysis

Enumerate program paths in a loopFor each program path, do

Symbolic execution

Compare array indices with array bounds

8Surinder Kumar Jain - [email protected]

Page 9: Symbolic Analysis for Buffer Overflow

Example Number of Loop iterations

= min(b-c0,m-n-2)

Value of c during i’th iteration (closed form of c) at line a[c]=0 is

ci = c0+i

Value of c in final iteration isc = c0 +(b – c0)

= b

Hence statement a[c]=0 when c=b causes buffer overflow

in program path where m-n-2>=b-c0

c0 is value of c at the start of the program

i=0array a[b-1]while c < b and i>m and i<n i=i+1 j=j+b d=2*d c=c+1 a[c]=0 /*buffer overflow*/

if e > 0 f=f+2 else e=g*e

9Surinder Kumar Jain - [email protected]

Page 10: Symbolic Analysis for Buffer Overflow

Problem in General UndecidableEnumerate program paths

State explosion problem with loopsHow to do it for general programs with GoTos

Symbolic execution of a LoopUnknown number of repetitions Conditional assignments inside the loop

10Surinder Kumar Jain - [email protected]

Page 11: Symbolic Analysis for Buffer Overflow

Enumerating Program PathsGulwani et al.

non-deterministic semantics - no GoTosBurgstaller et al.

Path expressions algebra – with GoTosLoops as black boxes

Extend non-deterministic semantics to control flow graphs

Loop paths analysisAlgorithm to

Enumerate disjoint acyclic program paths for any program

11Surinder Kumar Jain - [email protected]

Page 12: Symbolic Analysis for Buffer Overflow

Symbolic ExecutionAlgorithm to

Do symbolic executionCompute path condition

Eliminate invalid pathsFor each loop in a program path, solve

Closed form of loop induction variablesSymbolic loop counterNumber of loop iterations

12Surinder Kumar Jain - [email protected]

Page 13: Symbolic Analysis for Buffer Overflow

Solving Program Loops

Recurrence System (for j) :

j(i+1)=j(i)+b

Solution is :

j(i)= j(0)+i*b

i=0array a[b-1]while c<b and i>m and i<n i=i+1 j=j+b d=2*d c=c+1 a[c]=0 /*buffer overflow*/

if e>0 f=f+2 else e=g*e

13Surinder Kumar Jain - [email protected]

Page 14: Symbolic Analysis for Buffer Overflow

Solving Program Loops ….

Loop continue condition :1.i > m :: i is in range [m+1,infinity]2.i < n :: i is in range [-infinity,n-1]And’ing the two we get : i is in range [n-1,m+1]

Loop non-entry condition is : m+1<n-1Loop entry condition is : m+1>=n-1Loop counter is : (m+1)-(n-1)=m-n-2

i=0array a[b-1]while c<b and i>m and i<n i=i+1 j=j+b d=2*d c=c+1 a[c]=0 if e>0 f=f+2 else e=g*e

Values at end of loop :

j = j0 + (m-n-2)*b

d = d0 * 2(m-n-2)

14Surinder Kumar Jain - [email protected]

Page 15: Symbolic Analysis for Buffer Overflow

Solving Program Loops ….Computer Algebra algorithms to

Solve recurrence systemsSolve loop exit conditionSolve loop counters as symbolic expressionsSolve number of loop iterations as symbolic

expressionsUse skolmisation techniques for unsolvable

cases

15Surinder Kumar Jain - [email protected]

Page 16: Symbolic Analysis for Buffer Overflow

Path State ExplosionExtend the notion of Burgstaller’s path expressions Extend Gulwani’s semanticsCombine them together defining new

Group of paths as non-deterministic domains

16Surinder Kumar Jain - [email protected]

Page 17: Symbolic Analysis for Buffer Overflow

Non-Determinism and PathsIf c then s10 else s11 s2If d then s3

Surinder Kumar Jain - [email protected] 17

choose( {assume(c);s10;s2;assume(d);s3 assume(-c);s11;s2;assume(d);s3 assume(c);s10;s2;assume(-d) assume(-c);s11;s2;assume(-d) } )

Path Condition Path statements

c & [[s2]][[s10]]d s10;s2;s3-c & [[s11]][[s2]]d s11;s2;s3 c & [[s2]][[s10]](-d) s10;s2-c & [[s2]][[s10]](-d) s11;s2

Page 18: Symbolic Analysis for Buffer Overflow

Path Enumeration Loops…Deterministic program Paths : nm

(m is the number of loop iterations)

Non-deterministic program paths : n+1(independent of loop iterations)

Reduction from exponential to polynomial of degree nested loop depth

while c if a1 then A1

else if a2 then A2

… … … else if an then An

18Surinder Kumar Jain - [email protected]

Page 19: Symbolic Analysis for Buffer Overflow

Initial ResultsContribution Status

Non-deterministic semantics for general control flow graphs

In progress Expected completion Dec 09

Algorithm to enumerate disjoint program paths in LISP for a simple language

Completed

Conditional recurrence systems as nested recurrence systems

Completed Proof for some simple cases

Non-Deterministic Symbolic domains Informal definition completed Formal definition expected by June 10

Loop path State explosion problem Reduced from exponential to polynomial of loop nesting depth degree.

Surinder Kumar Jain - [email protected] 19

Page 20: Symbolic Analysis for Buffer Overflow

Conditional recurrenceswhile p A if c then B else C endif Dendwhile

Surinder Kumar Jain - [email protected] 20

while p while p & [[A]]c A B D endwhile while p & -[[A]]c A C D endwhileendwhile

Page 21: Symbolic Analysis for Buffer Overflow

Experimental WorkWe develop program to Enumerate program pathspath conditions Interface with a Computer Algebra System to

Obtain closed form for loop induction variablesSolve loop exit conditionSolve number of loop iterations

Perform symbolic executionArray bound comparison Reporting array bound violation (Static

analysis)21Surinder Kumar Jain - [email protected]

Page 22: Symbolic Analysis for Buffer Overflow

Experimental Work ….Analyse a small program

for all possible pathsto crash it

OrAnalyse an open source system

for Buffer Overflow reportingPrioritised as

– Definite – May be

With full set of counter examples22Surinder Kumar Jain - [email protected]

Page 23: Symbolic Analysis for Buffer Overflow

SummaryArray access in loops causes buffer overflow

errorEnumerate disjoint program paths & conditionsNon-deterministic semantics for path expressionsSymbolic execution of program paths to identify

buffer overflowsProgram loops cause path enumeration state

explosionNon-deterministic domains to reduce state

explosion

23Surinder Kumar Jain - [email protected]

Page 24: Symbolic Analysis for Buffer Overflow

Time TableItem Duration Start Finish Comme

nts

Continuing survey ongoing

Extend non-deterministic semantics to general control flow graphs

Dec 2009 in progress

Formal definition of non-deterministic symbolic domains 5 months Jan 2010 May 2010

Algorithm to enumerate input disjoint program paths 3 months June 2010 Aug 2010

Algorithm to do symbolic execution of a program path 1 month Sep 2010 Sep 2010

Interface with a computer algebra system 1 month Oct 2010 Oct 2010

Automate solving of recurrence systems 1 month Nov 2010 Nov 2010

Automate solving of path exit conditions 1 month Dec 2010 Dec 2010

Experiment tool on a bench mark of programs 6 months Jan 2011 June 2011

Compile results 3 months July 2011 Sep 2011

Produce thesis 2 months Oct 2011 Nov 2011

Surinder Kumar Jain - [email protected] 24

Page 25: Symbolic Analysis for Buffer Overflow

ReferencesReferences[1] Johann Blieberger Bernd Burgstaller, Bernhard Scholz. Symbolic Analysis

An Algebra-based approach.[2] G. Canfora, A. Cimitile, and A. De Lucia. Conditioned program slicing.

Information and Software Technology, 40(11-12):595607, 1998.[3] T. Fahringer and B. Scholz. Advanced symbolic analysis for compilers.

Springer-Verlag New York, Inc. Secaucus, NJ, USA, 2003.[4] Michael P. Gerlek, Eric Stoltz, and Michael Wolfe. Beyond induction

variables: detecting and classifying sequences using a demand-driven ssa form. ACM Trans. Program. Lang. Syst., 17(1):85122, 1995.

[5] S. Gulwani, S. Jain, and E. Koskinen. Control-refinement and progress invariants for bound analysis. PLDI, 2009.

[6] M.R. Haghighat and C.D. Polychronopoulos. Symbolic analysis for parallelizing compilers. ACM Transactions on Programming Languages and Systems (TOPLAS), 18(4):477518, 1996.

25Surinder Kumar Jain - [email protected]

Page 26: Symbolic Analysis for Buffer Overflow

Conferences1. SCAM 2010 TConference on Source Code Analysis and Manipulation,

12th-13th September 2010, Timişoara, Romania, Deadline : 23rd April, 2010

2. FSE-18 2010 - ACM SIGSOFT / FSE ICSE - ideas, innovations, trends, and experiences in the field of software engineering, deadline 5 March 2010.

3. POPL 2010 - ACM SIGPLAN Workshop on Partial Evaluation and Program Manipulation (PEPM'2010) Jan 2010 (Closed)

4. PLDI 2010 – ACM SIGPLAN 2010 Conference on Programming Language Design and Implementation deadline: November 13th (closed)

5. ESOP 2010: devoted to fundamental issues in the specification, analysis, and implementation of programming languages and systems (closed)

Surinder Kumar Jain - [email protected] 26

Page 27: Symbolic Analysis for Buffer Overflow

?Surinder Kumar Jain - [email protected] 27

Page 28: Symbolic Analysis for Buffer Overflow

Canonical formProgram = Choose (path1, path2, … , pathn)

Where each pathi = (ci, Si) is a program path.

Path conditions are in semi-canonical form ifFor all i & j, ci&cj=Falsei.e. only one of the paths can be taken as only one of

the path conditions can be true at any time.Path conditions are in canonical form ifThey are in semi-canonical form andc1 or c2 or … or cn=True i.e. one of the path will always be taken

Page 29: Symbolic Analysis for Buffer Overflow

Enumerating Canonical formProgram = Choose ((a,A),(b,B))Its canonical form is :Choose((a&b,A;B),(a&-b,A),(-a&b,B),(-a&-

b,SKIP))

A program with n program paths has at most 2n program paths in its canonical form.

Page 30: Symbolic Analysis for Buffer Overflow

Assume and ChooseAssume is a path conditionChoose permits choice of execution between program

pathsA general control flow graph can be represented as a

choice of program paths*

E.g.

Program = Choose (path1, path2, … , pathn)

Where each pathi = (ci, Si) is a program path.

*Burgstaller et al & Gulwani et al

Page 31: Symbolic Analysis for Buffer Overflow

Canonical formProgram = Choose (path1, path2, … , pathn)

Where each pathi = (ci, Si) is a program path.

Path conditions are in semi-canonical form ifFor all i & j, ci&cj=Falsei.e. only one of the paths can be taken as only one of

the path conditions can be true at any time.Path conditions are in canonical form ifThey are in semi-canonical form andc1 or c2 or … or cn=True i.e. one of the path will always be taken

Page 32: Symbolic Analysis for Buffer Overflow

Enumerating Canonical formProgram = Choose ((a,A),(b,B))Its canonical form is :Choose((a&b,A;B),(a&-b,A),(-a&b,B),(-a&-

b,SKIP))

A program with n program paths has at most 2n program paths in its canonical form.

Page 33: Symbolic Analysis for Buffer Overflow

Enumerating UnionP1 = Choose ((a1,A1), (a2,A2), … , (an,An))

P2 =Choose ((b1,B1), (b2,B2), … , (bn,Bm))

P1 union P2 = Choose ((a1,A1), (a2,A2), … , (an,An)(b1,B1), (b2,B2), … , (bn,Bm))

has m+n paths with 2m+n maximum canonical paths.

If P1 and P2 were in canonical form then maximum number of canonical paths of P1 union P2 is m*n.

Page 34: Symbolic Analysis for Buffer Overflow

Enumerating SequenceP1 = Choose ((a1,A1), (a2,A2), … , (an,An))

P2 =Choose ((b1,B1), (b2,B2), … , (bn,Bm))

P1;P2 is new program which executes P1 followed by execution of P2.

If P1 and P2 were in canonical form then maximum number of canonical paths of P1;P2 is m*n.

Page 35: Symbolic Analysis for Buffer Overflow

Enumerating LoopsConsider the following program While c Do S

This program can be split into two program paths ;

Choose((-co,Skip),(c0,Smu))

Where c0 is the initial value of c and mu is a function that returns an integer greater than zero, the number of times loop is executed. (mu can be infinite)

If program uses symbolic expressions as program variable values then mu is a function over symbolic expressions.

If loop has a closed form then mu can be expressed as a symbolic expression too.

Page 36: Symbolic Analysis for Buffer Overflow

Enumerating 2-path LoopConsider the following program While p Do Choose ((a,A),(-a,B))

Unrolling the loop gives following program path sequences : (-p0,skip) (p0&a0,A) (p0&-a0,B) (p0&p1&a0&a1,A;A) (p0&p1&a0&-a1,A;B) …. (p0&p1&…&pi&x0&x1&…&xi,X1;X2;…;Xi)

Where pk is value of predicate p at the start of k’th iteration of the loop ak is value of the predicate a at the start of k’th iteration of the loop xk refers to ak if ak evaluates to True otherwise it refers to –ak Xk refers to A if ak evaluates to True otherwise it refers to B All values are in symbolic expressions

Number of Program Paths is : 2mu+2mu-1+2mu-2+….+20

Page 37: Symbolic Analysis for Buffer Overflow

Enumerating multi-path LoopConsider the following program While p Do Choose ((a1,A1),(a2,A2),…,(an,An))

The path condition at the start of i’th iteration is given by :

p0&p1&…&pi&x0&x1&…&xi Where pk is value of predicate p at the start of k’th iteration of the loop xk refers to ajk for some value of j, 1<=j<=n such that aj during k’th

iteration evaluates to True. (j is unique if Choose is in canonical form)

Path statement is :

X1;X2;…;Xi Where Xk refers to Ak when xk evaluates to True otherwise Xk is null

statement. All values are in symbolic expressions

Number of Symbolic Program Paths is : nmu+nmu-1+nmu-2+….+n0

Page 38: Symbolic Analysis for Buffer Overflow

Enumerating multi-path LoopThe program While p Do Choose ((a1,A1),(a2,A2),…,(an,An))is same as Choose(-p0,Skip); Choose ((p0&p1&…&pi&x0&x1&…&xi,X1;X2;…;Xi)

where i varies from 1 to mu)

Number of Choices in the second choose statement is :

nmu+nmu-1+nmu-2+….+n1

Page 39: Symbolic Analysis for Buffer Overflow

Enumerating multi-path LoopChoose ((p0&p1&…&pi&x0&x1&…&xi,X1;X2;…;Xi) where i

varies from 1 to mu)Number of Choices is : nmu+nmu-1+nmu-2+….+n1

We will extend Burgstaller et al’s Symbolic domain (Deterministic) to a Non-deterministic domain to reduce this large number of paths to a single non-deterministic path in Non-deterministic domain.

Non-deterministic domain will group a large number of deterministic symbolic values.

We will extend definitions of operations over non-deterministic symbolic expressions

We will extend boolean operations over non-deterministic boolean predicates

We will define program paths over non-deterministic symbolic expressions

Page 40: Symbolic Analysis for Buffer Overflow

Extending to non-deterministic Symbolic DomainExtend Burgstaller et al’s Symbolic domain (Deterministic)

to a Non-deterministic domain to reduce this large number of paths to a single non-deterministic path in Non-deterministic domain.

Non-deterministic domain will group a large number of deterministic symbolic values. (Boolean : All(True), None(False), Any)

Extend definitions of operations over non-deterministic symbolic expressions

Extend boolean operations over non-deterministic boolean predicates

Define program paths over non-deterministic symbolic expressions

Prune and simplify non-deterministic paths and determine upper and lower bounds on these values