The Yogi Project Software property checking via verification and testing

The Yogi ProjectSoftware property checking via verification and testing

Aditya V. Nori, Sriram K. Rajamani

Programming Languages and ToolsMicrosoft Research India

The plan

Part 1: Our philosophy, the basic algorithm, related work coffee break

Part 2: Handling programs with pointers and procedures coffee break

Part 3: The engineering

Download: http://research.microsoft.com/yogi

http://research.microsoft.com/yogi

Thanks to our collaborators

Aws Albarghouthi Nels Beckman Patrice Godefroid Bhargav Gulavani Tom Henzinger

Yamini Kannan Rahul Kumar Robert Simmons Sai Deep Tetali Aditya Thakur

Property checking

void f(int *p, int *q){0: *p = 4;1: *q = 5;2: if ()3: error();}

QuestionIs the error unreachable for all possible inputs?

More generally, we are interested in the query

Possible solution: Testing

The “old-fashioned” and practical method of validating software

Generate test inputs and see if we can find a test that violates the assertion

What’s wrong with testing?

If we view testing as a “black-box” activity, Dijkstra is right!After executing many tests, we still don’t know if there is another test that can violatethe assertion

If we view testing as a “white-box” activity, and “observe” what happens inside the program, we can do interesting things:A. For a buggy program, we can generate

tests in a directed manner to find the bug

B. For a correct program, we can prove that the assertion holds for all inputs!

Our hypothesis

Tests: presence of bugs

void foo(int a){0: i = 0;1: c = 0;2: while (i < 1000) {3: c = c + i;4: i = i + 1;

}5: if (a <= 0)6: error();}

𝑎↦−5

Tests: presence of bugs𝑎↦−5

…

void foo(int a){0: i = 0;1: c = 0;2: while (i < 1000) {3: c = c + i;4: i = i + 1;

}5: if (a <= 0)6: error();}

Proofs: absence of bugs

void foo(int y1, int y2){0: state = 1;1: if (y1) {2: x0 = x0 + 1;

} else {3: x0 = x0 – 1;

}4: if (y2) {5: x1 = x1 + 1;

} else {6: x1 = x1 – 1;

}7: if(state != 1)8: error();}

Exponential number of tests required!

assume(y1)


0

1state = 1∃𝒔1

∃𝒔 2

2

overapproximation


} else {3: x0 = x0 – 1;

}4: if (y2) {5: x1 = x1 + 1;

} else {6: x1 = x1 – 1;



0

1

2

8:error

3

4

5 6

7

state = 1assume(y1) assume(!y1)x0=x0+1 x0=x0-1assume(y2) assume(!y2)

x1=x1+1 x1=x1-1assume(state != 1)9

assume(state == 1)


} else {3: x0 = x0 – 1;

}4: if (y2) {5: x1 = x1 + 1;

} else {6: x1 = x1 – 1;



linear proof exists!

0:state=11:state=1

2:state=1

8:error

3:state=14:state=1

5:state=1 6:state=17:state=1


} else {3: x0 = x0 – 1;

}4: if (y2) {5: x1 = x1 + 1;

} else {6: x1 = x1 – 1;


Algorithm uses only test case generation operations

Maintains two data structures:▪ A forest of reachable concrete states (tests)▪ Under-approximates executions of the program

Yogi: Proofs from Tests

… … …

Algorithm uses only test case generation operations

Maintains two data structures:▪ A forest of reachable concrete states (tests)▪ Under-approximates executions of the program

▪ A region graph (an abstraction)▪ Over-approximates all executions of the program


0

1

2

8:error

3

4

5 6

7

state = 1

assume(y1) assume(!y1)

x0=x0+1 x0=x0-1

assume(y2) assume(!y2)

x1=x1+1 x1=x1-1

assume(state != 1)9

assume(state == 1)

Algorithm uses only test case generation operations Maintains two data structures:▪ A forest of reachable concrete states (tests)▪ Under-approximates executions of the program

▪ A region graph (an abstraction)▪ Over-approximates all executions of the program

Our goal: bug finding and proving▪ If a test reaches an error, we have found bug▪ If we refine the abstraction so that there is *no* path from

the initial region to error region, we have a proof Key ideas▪ Frontier: Boundary between tested regions and untested

regions▪ uses only aliases α that are present along concrete tests

that are executed


Key ideas

Frontier = edge that crosses from tested to untested region

Step 1: Try to generate a test that crosses the frontier

Perform symbolic simulation on the path until the frontier and generate a constraint

Conjoin with the condition

needed to cross frontier Is satisfiable?

frontier

0

1

2

3

4

7

8

9

5

6

10

𝜑2

𝜑1

Key ideas



Conjoin with the condition needed to cross frontier

Is satisfiable? [YES]

Step 2: run the test and extend the frontier

0

1

2

3

4

7

8

9

5

6

10

frontier

Key ideas



Conjoin with the condition needed to cross frontier

Is satisfiable? [NO]

Step 2: use to refine so that the frontier moves back!

frontier

0

1

2

3

4:

7

8

9

5

6

10

4:

The Yogi algorithm

Can extend test beyond frontier?

Refine abstraction

Construct initial abstraction

Construct random tests

Test succeeded?

Bug!

Abstraction succeeded?

τ = error path in abstraction f = frontier of error path

yes

no

yes

no

Proof!

yes

no

Input:Program Property

The Yogi algorithm


Refine abstraction



Test succeeded?

Bug!



yes

no

yes

no

Proof!

yes

no


void f(int y){0: int lock=0, x;1: do {2: lock = 1;3: x = y;4: if (*) {5: lock = 0;6: y = y+1; }7: } while (x != y)8: if (lock != 1)9: error();10:}

0 1234

56

789

×

10

𝑦=1

)fronti

er

Symbolic execution +

Theorem proving

Symbolic execution


ylockxsymbol table

𝜏=(0,1,2,3,4,7,8,9)

Symbolic execution


ylockxsymbol table

𝜏=(0,1,2,3,4,7,8,9)

Symbolic execution


ylockxsymbol table

𝜏=(0,1,2,3,4,7,8,9)

Symbolic execution


ylockxsymbol table

𝜏=(0,1,2,3,4,7,8,9)

Symbolic execution


ylockxsymbol table

𝜏=(0,1,2,3,4,7,8,9)

constraints

Symbolic execution


ylockxsymbol table

𝜏=(0,1,2,3,4,7,8,9)

constraints

The Yogi algorithm


Refine abstraction



Test succeeded?

Bug!



yes

no

yes

no

Proof!

yes

no



0 1234

56

789

×

10)

frontier

NO

Refinement

89

8:¬ρ 8:ρ9

refine

𝑊𝑃 ¿

0 1234

56

789

10

assume(lock != 1)

Candidates for refinement predicate

A. Strongest postcondition (SP) along the path up to the frontier

B. Weakest precondition (WP): C. Any formula separating the two above

(e.g., an interpolant)

Refinement0 1

234

56

7

910

8: 8:p

89

8:¬ρ 8:ρ9

refine

𝜌=( 𝑙𝑜𝑐𝑘 !=1)

assume(lock != 1)

Next iteration


Refine abstraction



Test succeeded?

Bug!

Abstractionsucceeded?


yes

no

yes

no

Proof! yes

no



×

0 1234

56

7

910

8: 8:p

𝜏=(0,1,2,3,4,7 ,<8 ,𝑝>,9) frontier

Correct, the program is …


01

2

34⋀¬s5⋀¬s6⋀¬r

9

7⋀¬q8⋀¬p

4⋀s5⋀s6⋀r7⋀q

8⋀p

10

Soundness and Termination

• Theorem: Suppose we run Yogi on any program and property – If Yogi returns , then the abstract

program simulates , and thus is a proof that does not reach

– If Yogi returns , then is an error trace

35

prog. P’prog. P

SLIC rule

SLAM and CEGAR

boolean program

pathpredicates

slic

c2bp

bebop

newton

CEGAR (SLAM) on an example

void foo(int a){0: i = 0;1: c = 0;2: while (i < 1000) {3: c = c + i;4: i = i + 1; }5: if(a <= 0)6: error();}

CEGAR will need to refine the loop 1000 times!

01

2

3

456

Yogi on the same example

𝜏=(0,1,2 ,(3,4,2)1000 ,5,6)

Symbolic execution produces the constraint

frontier

01

2

3

456

void foo(int a){0: i = 0;1: c = 0;2: while (i < 1000) {3: c = c + i;4: i = i + 1; }5: if(a <= 0)6: error();}

Directed testing with DARTvoid test_me(int x, int y){1: int z = 2*x;2: if (z==y) {3: if (y == x+10)4: error(); }5: return; }

Step 1:• Try random test:

DART: Directed Automated Random Testing

Patrice Godefroid, Nils Klarlund, Koushik Sen, PLDI 2005

Directed testing with DART

Step 1:• Try random test:

Step 2:• Do a “parallel” symbolic

execution• Collect constraints and negate

the last branch • Solve for the constraint: • Try next test

void test_me(int x, int y){1: int z = 2*x;2: if (z==y) {3: if (y == x+10)4: error(); }5: return; }



Directed testing with DART

Step 1:• Try test:

Step 2:• Do a “parallel” symbolic

execution• Collect constraints and negate

the last branch • Solve for constraint: • Try next test • Find error!

void test_me(int x, int y){1: int z = 2*x;2: if (z==y) {3: if (y == x+10)4: abort(); }5: return; }



…

Path explosion with DARTvoid foo(int x1, int x2,…, int xn){ lock = 1 if (x1) g = g + 1; else g = g – 1; if (x2) g = g + 1; else g = g – 1; ... if (xn) g = g + 1; else g = g – 1; if (lock != 1) error();}

• Full path coverage expensive

• Yogi uses abstraction to ignore irrelevant states and efficiently computes a proof

Partition refinement, Bisimulations and the Lee-Yannakakis algorithm

Init Error

Bisimulations

• Partition is stable if for every region we have that doesn’t split any other region in a non-trivial manner

• Bisimulation = coarsest stable partition, precise abstraction for property checking

Init Error

Between regions

Between concrete

states

Partition refinement

Partition refinement:• For every region compute

and split every partition as needed

• Until no splits can be generated

𝜎 1 𝜎 2𝜎 3

Partition refinement

• On termination generates a bisimulation:

• For every pair of regions and we have that iff there is some state and some state such that

𝜎 1 𝑝𝑟𝑒 (𝜎 ¿¿3)¿ 𝜎 3On termination Yogi generates a simulation:For every pair of regions and , we have if there is some state and some state such that then

Partition refinement:• For every region compute

and split every partition as needed

• Until no splits can be generated

Lee-Yannakakis 92 andPasaraneu-Palanek-Visser 05

• Compute reachable bisimulation quotient

• Do splitting only for reachable portion of the state space

• Generate witnesses for reachability of regions and split only regions which have reachable witnesses

• On termination reachable partitions are a bisimulation:

• For every pair of regions and we have that iff there is some state and some state such that

ErrorInit

Can split

Can’t split

Comparison with Yogi

void foo(int y){0: while(y>0){1: y = y -1; }2: if(false)3: error();}

0

1

2

• Yogi terminates with this abstraction shown

• In contrast, Lee-Yannakakis will refine regions and with predicates forever

• Reachable bisumulation quotient for this program is infinite

• But bisimulation quotient is not necessary to prove the property. Simulation is sufficient, and Yogi is able to find such a simulation

3

Summary so far …

Yogi combines testing and verification Maintains a partition (region graph) just like SLAM Generates witnesses using test case generation like DART Core operation:▪ If frontier can be extended by a test case, extend▪ If frontier cannot be extended, refine the pre-state region

Combines advantages of SLAM and DART Uses regions to avoid exploring exponential number of paths (like

SLAM) Uses tests to explore complicated code (like DART)

Computes simulations that can do proofs

Terminates in more cases than algorithms that compute bisimulation quotients, such as Lee-Yannakakis

Documents

The Yogi Project Software property checking via verification and testing