Symbolic Path Simulation in Path-Sensitive Dataflow Analysis

Hari Hampapuram

Jason Yue Yang

Manuvir Das

Center for Software Excellence (CSE), Microsoft Corporation

PASTE'05, Jason Yang, Microsoft

Gist of Results

Symbolic path simulation engine supporting:

1. Merge
   – For merge-based path-sensitive analysis

2. Function summaries
   – For scalable global analysis

3. Pointers
   – Our main client is Windows


Infeasible Path False Positive

extern int a, b;

void Process(int handle) {
    int x, y;
    if (a > 0) {
        CloseHandle(handle);
        x = 1;
    } else
        x = 2;

    if (b > 0)
        y = 1;
    else
        y = 2;

    if (x != 1)
        UseHandle(handle);
}

[Figure: handle property state machine. States: START, OPEN, CLOSE, ERROR. Transition labels: OpenHandle, UseHandle, CloseHandle, UseHandle.]
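The state machine can be sketched in C to show where the false positive comes from. This is only an illustration: the edge layout is an assumption read off the figure labels (START to OPEN on OpenHandle, OPEN to OPEN on UseHandle, OPEN to CLOSE on CloseHandle, CLOSE to ERROR on UseHandle), the `PS_` names and both functions are hypothetical, and the handle is assumed to be OPEN on entry to Process.

```c
#include <stddef.h>
#include <string.h>

/* Property states from the figure (names prefixed to avoid macro clashes). */
typedef enum { PS_START, PS_OPEN, PS_CLOSE, PS_ERROR } PropState;

/* One transition. The edges are an assumed reading of the figure labels;
   any event not listed is treated as an error, which is also an assumption. */
PropState step(PropState s, const char *event) {
    if (s == PS_START && strcmp(event, "OpenHandle") == 0) return PS_OPEN;
    if (s == PS_OPEN && strcmp(event, "UseHandle") == 0) return PS_OPEN;
    if (s == PS_OPEN && strcmp(event, "CloseHandle") == 0) return PS_CLOSE;
    if (s == PS_CLOSE && strcmp(event, "UseHandle") == 0) return PS_ERROR;
    return PS_ERROR;
}

/* Replays the events seen along one control-flow path. */
PropState run_path(PropState s, const char *const *events, size_t n) {
    for (size_t i = 0; i < n; i++) s = step(s, events[i]);
    return s;
}
```

Replaying CloseHandle followed by UseHandle from PS_OPEN ends in PS_ERROR. In Process that path requires both a > 0 (so x == 1) and x != 1, so it is infeasible, and an analysis that ignores branch conditions reports it anyway.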


Need for Merge

The “knob” for scalability vs. precision tradeoff
– Always merge (traditional dataflow): false errors
– Always separate: exponential blow-up

Driven by client analyses


Merge Criterion for ESP

Selective merging based on property states
– Partition symbolic states into property states and everything else
– If the incoming paths differ in property states, track them separately; otherwise, merge them.
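A minimal sketch of this criterion, assuming a symbolic state is just a property state plus the values of two variables. The `SymState` layout, the `UNKNOWN` marker, and `merge_at_join` are hypothetical names chosen for illustration, not the tool's data structures.

```c
#include <stddef.h>

#define UNKNOWN (-1)   /* hypothetical marker for "no constraint on this value" */

typedef struct {
    int prop;  /* property state (hypothetical encoding: 0 = OPEN, 1 = CLOSE) */
    int x, y;  /* "everything else": values of x and y, or UNKNOWN */
} SymState;

/* Joins the non-property part: keep a value only if both paths agree on it. */
static int join_val(int a, int b) { return a == b ? a : UNKNOWN; }

/* Merges the states arriving at a join point, in place.
   States with different property states are kept apart; states with the
   same property state are merged. Returns the number of states kept. */
size_t merge_at_join(SymState *s, size_t n) {
    size_t kept = 0;
    for (size_t i = 0; i < n; i++) {
        size_t j = 0;
        while (j < kept && s[j].prop != s[i].prop) j++;
        if (j == kept) {
            s[kept++] = s[i];                    /* new property state: track separately */
        } else {
            s[j].x = join_val(s[j].x, s[i].x);   /* same property state: merge */
            s[j].y = join_val(s[j].y, s[i].y);
        }
    }
    return kept;
}
```

With this rule the number of states at any join is bounded by the number of property states rather than by the number of paths.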


Merge Criterion for ESP Example

extern int a, b;

void Process(int handle) {
    int x, y;
    if (a > 0) {
        CloseHandle(handle);
        x = 1;
    } else
        x = 2;

    if (b > 0)
        y = 1;
    else
        y = 2;

    if (x != 1)
        UseHandle(handle);
}

After the first branch (a > 0): property states change along paths. Do not merge.

After the second branch (b > 0): property states are the same. Merge.

Still maintains the needed fact: “If CloseHandle is called, branch should fail.”


Need for Function Summaries

extern int a, b;

void Process(int handle) {
    int x, y;
    if (a > 0) {
        CloseHandle(handle);
        x = 1;
    } else
        x = 2;

    if (b > 0)
        y = Foo(b);
    else
        y = 2;

    if (x != 1)
        UseHandle(handle);
}

Partial transfer functions
Computed on-demand
Enforced by “into-binding” and “back-binding”
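One way to picture a partial transfer function computed on demand is a table that maps each entry state actually seen to its exit state, and analyzes the callee body only on a miss. The sketch below assumes the state is a single small integer (a property state); `Summary`, `apply_summary`, and `analyze_foo` are hypothetical, with `analyze_foo` standing in for analyzing a callee such as Foo that does not touch the handle.

```c
#include <stdbool.h>

enum { N_STATES = 4 };  /* hypothetical number of distinct entry states */

typedef struct {
    bool known[N_STATES];       /* entry states analyzed so far */
    int  exit_state[N_STATES];  /* cached exit state per entry state */
    int  analyses;              /* how many times the body was analyzed */
} Summary;

/* Callback that analyzes the callee body for one entry state. */
typedef int (*AnalyzeBody)(int entry_state);

/* Stand-in for analyzing a callee that leaves the property state unchanged. */
int analyze_foo(int entry_state) { return entry_state; }

/* Applies the summary at a call site, extending it only when needed.
   The function is "partial": entries never requested are never computed. */
int apply_summary(Summary *s, int entry, AnalyzeBody analyze) {
    if (!s->known[entry]) {
        s->exit_state[entry] = analyze(entry);
        s->known[entry] = true;
        s->analyses++;
    }
    return s->exit_state[entry];
}
```

A second call with the same entry state reuses the cached result, which is what makes the global analysis scale across many call sites.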


Support for Language Features

Pointers
Field-based objects
Operator expressions
…


Symbolic Simulator Architecture

[Figure: layered architecture, top to bottom]

Client Application: defect detection, core dump analysis, test generation, code review, ...
Simulation Interface (SI): “semantic translator”
Simulation State Manager (SSM): “theorem prover”


Semantic Domains

Environment
– ProgramSymbol → Loc
– Managed by Simulation Interface

Store
– Loc → Val
– Managed by Simulation State Manager

Region-based model for symbolic store
– region corresponds to Loc
– value corresponds to Val


Simulation State Manager (SSM)

Tracking symbolic simulation states to answer queries about path feasibility

What should be tracked?
– Mapping of store: region → value
– Constraints on values


Regions

Variable regions vs. deref regions
– Important for pointer dereference
– Important for supporting merge and binding

void Process(int *p, int *q) {
    int x = *p;
    int y = *q;
    if (p != q)
        return;
    if (*p != *q)
        … // Not reachable
}

Variable regions: R(p), R(q), R(x), R(y)

Deref regions: R(*p), R(*q)


Values

Constant values (integers, floats, …)
Operator values (arithmetic, bitwise, relational)
Symbolic values (general constraint variables)
Region-initial values (constraint variables for initial values)
Pointer values (for points-to relationship)
Field-based values (for compound types)


Need for Region-Initial Values

Important for function summary
– Pre-condition: simulation state at Entry node
– Post-condition: simulation state at Exit node
– Input values vs. current values

To support lazy initialization for input values
– An input region gets region-initial values by default, unless it has been killed
– Need to maintain a kill set
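The lazy scheme can be sketched as a store that keeps a kill set next to the current values. Reading a region that has not been killed yields its region-initial value without anything having been stored for it. The types and function names below are hypothetical, and regions are assumed to be small integer ids.

```c
#include <stdbool.h>

enum { MAX_REGIONS = 8 };  /* hypothetical bound on region ids */

typedef struct {
    bool is_initial;  /* true: the symbolic initial value of `region` */
    int  region;      /* region whose initial value this is (if is_initial) */
    int  constant;    /* concrete value (if !is_initial) */
} Value;

typedef struct {
    bool  killed[MAX_REGIONS];   /* the kill set */
    Value current[MAX_REGIONS];  /* meaningful only for killed regions */
} Store;

/* Reads a region. A region that was never written still holds its
   region-initial value, so none is materialized up front. */
Value store_read(const Store *st, int r) {
    if (!st->killed[r]) {
        Value v = { true, r, 0 };
        return v;
    }
    return st->current[r];
}

/* Writes a constant to a region and records the kill. */
void store_write_const(Store *st, int r, int c) {
    Value v = { false, r, c };
    st->killed[r] = true;
    st->current[r] = v;
}
```

At the Exit node, the killed regions are the ones whose current values differ from their inputs, which is the information a post-condition needs.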


Decision Procedures

Current implementation:
– Equality (e.g. a == b): equivalence classes
– Disequality (e.g. a != b): multi-maps between equivalence classes
– Inequality (e.g. a < b): a graph (nodes are equivalence classes and edges are inequality relations)

Can plug in other theorem provers if needed
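The equality and disequality parts can be sketched with a union-find structure for the equivalence classes and a plain list of disequal pairs, used here in place of the multi-map for brevity. The inequality graph is left out. All names and the fixed table sizes are assumptions made for this illustration.

```c
#include <stdbool.h>

enum { MAX_VALS = 16, MAX_DISEQ = 16 };  /* hypothetical capacities */

typedef struct {
    int parent[MAX_VALS];      /* union-find forest over value ids */
    int diseq[MAX_DISEQ][2];   /* recorded a != b facts */
    int n_diseq;
} Constraints;

void cs_init(Constraints *c) {
    for (int i = 0; i < MAX_VALS; i++) c->parent[i] = i;
    c->n_diseq = 0;
}

/* Returns the representative of v's equivalence class (with path halving). */
int cs_find(Constraints *c, int v) {
    while (c->parent[v] != v) {
        c->parent[v] = c->parent[c->parent[v]];
        v = c->parent[v];
    }
    return v;
}

/* Is a disequality already recorded between the classes of a and b? */
static bool cs_known_diseq(Constraints *c, int a, int b) {
    int ra = cs_find(c, a), rb = cs_find(c, b);
    for (int i = 0; i < c->n_diseq; i++) {
        int x = cs_find(c, c->diseq[i][0]);
        int y = cs_find(c, c->diseq[i][1]);
        if ((x == ra && y == rb) || (x == rb && y == ra)) return true;
    }
    return false;
}

/* Assumes a == b. Returns false if that makes the path infeasible. */
bool cs_assume_eq(Constraints *c, int a, int b) {
    if (cs_known_diseq(c, a, b)) return false;
    int ra = cs_find(c, a), rb = cs_find(c, b);
    c->parent[ra] = rb;
    return true;
}

/* Assumes a != b. Returns false if that makes the path infeasible. */
bool cs_assume_ne(Constraints *c, int a, int b) {
    if (cs_find(c, a) == cs_find(c, b)) return false;
    if (c->n_diseq == MAX_DISEQ) return true;  /* table full: drop the fact, losing precision only */
    c->diseq[c->n_diseq][0] = a;
    c->diseq[c->n_diseq][1] = b;
    c->n_diseq++;
    return true;
}
```

A branch such as a == b would call `cs_assume_eq` on the true edge and `cs_assume_ne` on the false edge; a `false` result means the edge can be pruned.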


Merge

Moves symbolic states upwards in the lattice
– Fewer constraints on path feasibility after merge

Maps the memory graphs and the associated constraints on values

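[Figure: three memory graphs around a JOIN, with regions R1/R2, R1’/R2’, and R1’’/R2’’. Each graph shows the address 0xEFD0 and a symbolic value ($1, $2, $3), constrained by $1 > 0, $2 > 0, and $3 > 0.]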
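A value-level join consistent with the figure can be sketched as follows, under the assumption that $1 and $2 are the incoming values and $3 is the merged one. A constant that is identical on both sides (such as 0xEFD0) survives; differing values are replaced by a fresh symbol, and a constraint is kept only if it holds on both paths. The `AbsVal` type, the single tracked constraint "> 0", and `join_values` are hypothetical.

```c
#include <stdbool.h>

typedef struct {
    bool is_const;
    long constant;   /* valid if is_const, e.g. 0xEFD0 */
    int  symbol;     /* symbol id if !is_const, e.g. 1 for $1 */
    bool positive;   /* constraint "value > 0" known on this path */
} AbsVal;

/* Does "value > 0" hold for v on its path? */
static bool is_positive(AbsVal v) {
    return v.is_const ? v.constant > 0 : v.positive;
}

/* Joins the values held by corresponding regions of two memory graphs.
   next_symbol supplies ids for fresh symbols. */
AbsVal join_values(AbsVal a, AbsVal b, int *next_symbol) {
    if (a.is_const && b.is_const && a.constant == b.constant)
        return a;                                   /* same constant on both paths */
    AbsVal r = { false, 0, 0, is_positive(a) && is_positive(b) };
    if (!a.is_const && !b.is_const && a.symbol == b.symbol)
        r.symbol = a.symbol;                        /* same symbol: only constraints may weaken */
    else
        r.symbol = (*next_symbol)++;                /* differing values: fresh symbol */
    return r;
}
```

Joining $1 > 0 with $2 > 0 gives a fresh symbol that is still known to be positive; if either side lacked the constraint, the merged value would lose it, which is the upward move in the lattice.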


Example Client Analysis: ESP

Path-sensitive, context-sensitive, inter-procedural defect detection tool for large C/C++ programs


Simulation Interface (SI)

Fetching regions and values
Assignments
– E.g., x = 1;
Branches
– E.g., a == b;
Procedure call (into-binding)
Call back (back-binding)


Into-Binding

Two approaches:
– Binding precise calling context into callee
    Less demand on reasoning power to refute infeasible paths
    More suitable for top-down analysis
– Binding no constraints (TOP) into callee
    More demand on reasoning power to refute infeasible paths
    More suitable for bottom-up analysis

Binding from caller Call node to callee Entry node
– Bind parameters
– Bind global variables
– Bind constraints


Back-Binding

Binding from callee Exit node to caller Return node
– Bind the region-initial values of input regions
– Bind values of output regions
– Bind constraints


Experiences

Security properties for future version of Windows
Difficult to check with other tools
Scalability
– E.g., for all device drivers, found ~500 errors in 20 hours
Precision
– E.g., for Windows kernel (216,000 LOC, 9755 functions)

                          Bugs   False Positives   Time (sec)
With Path Simulation      2      0                 1098
Without Path Simulation   2      12                1037


Summary

Critical for improving precision
Scalable enough for industrial programs
Other client analyses
– PSE
– Iterative refinement for ESP

Beneficial to have built-in support for merge, function summaries, and pointers


Thank You!

For more information, please visit http://www.microsoft.com/windows/cse/pa