In Defense of Probabilistic Static Analysis BEN LIVSHITS SHUVENDU LAHIRI MICROSOFT RESEARCH




FROM THE PEOPLE WHO BROUGHT YOU SOUNDINESS.ORG…

STATIC ANALYSIS: UNEASY TRADEOFFS

too imprecise

useless results

may not scale

does not scale

overkill for some things

possibly still too imprecise for others

WHAT IS MISSING IS

ANALYSIS ELASTICITY

OUR APPROACH IS PROBABILISTIC TREATMENT

Points-to(p, v, h)

• MANY INTERPRETATIONS ARE POSSIBLE

• OUR CERTAINTY IN THE FACT BASED ON STATIC EVIDENCE SUCH AS PROGRAM STRUCTURE

• OUR CERTAINTY BASED ON RUNTIME OBSERVATIONS

• OUR CERTAINTY BASED ON PRIORS OBTAINED FROM THIS OR OTHER PROGRAMS

Object x = new Object();
try {
} catch(...){
    x = null;
}

if(...){ // branch direction info
    x = new Object();
}else{
    x = null;
}

$('mydiv').css('color', 'red');
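One way to picture how runtime observations and priors could combine into a certainty for a fact such as "x is null after the branch" is a Beta-Bernoulli update. This is only a sketch under assumptions that are not in the slides: the function name, the pseudo-count parameters, and the choice of a Beta prior are all hypothetical.

```python
def null_certainty(null_hits, nonnull_hits, prior_null=1.0, prior_nonnull=1.0):
    """Posterior mean of P(x is null) under a Beta(prior_null, prior_nonnull) prior.

    null_hits / nonnull_hits: hypothetical runtime counts of how often the
    branch that assigns null (or a fresh object) was observed to execute.
    prior_null / prior_nonnull: pseudo-counts standing in for priors obtained
    from this or other programs (an assumption of this sketch).
    """
    if null_hits < 0 or nonnull_hits < 0:
        raise ValueError("observation counts must be non-negative")
    if prior_null <= 0 or prior_nonnull <= 0:
        raise ValueError("prior pseudo-counts must be positive")
    total = prior_null + prior_nonnull + null_hits + nonnull_hits
    return (prior_null + null_hits) / total


# No runtime data yet: the certainty is just the prior mean.
no_data = null_certainty(0, 0)

# The null-assigning branch was taken in 2 of 10 observed runs.
with_data = null_certainty(2, 8)
```

With more observations the prior matters less, which matches the idea that static evidence, runtime evidence, and priors all feed one certainty value.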

BENEFITS

RESULT PRIORITIZATION

• STATIC ANALYSIS RESULTS CAN BE NATURALLY RANKED OR PRIORITIZED IN TERMS OF CERTAINTY, NEARLY A REQUIREMENT IN A SITUATION WHERE ANALYSIS USERS ARE FREQUENTLY FLOODED WITH RESULTS

ANALYSIS DEBUGGING

• PROGRAM POINTS OR EVEN STATIC ANALYSIS INFERENCE RULES AND FACTS LEADING TO IMPRECISION CAN BE IDENTIFIED WITH THE HELP OF BACKWARD PROPAGATION
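The result-prioritization point above can be sketched as a simple sort over findings that carry a certainty value. The `Finding` type, its fields, the file locations, and the threshold are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    location: str     # hypothetical "file:line" label
    message: str
    certainty: float  # probability attached to the finding by the analysis


def prioritize(findings, threshold=0.0, limit=None):
    """Drop findings below the threshold and rank the rest, most certain first."""
    kept = [f for f in findings if f.certainty >= threshold]
    kept.sort(key=lambda f: f.certainty, reverse=True)
    return kept if limit is None else kept[:limit]


findings = [
    Finding("a.java:10", "possible null dereference", 0.56),
    Finding("b.java:42", "possible null dereference", 0.62),
    Finding("c.java:7", "possible null dereference", 0.12),
]

# Keep only findings the analysis is at least 50% sure about.
top = prioritize(findings, threshold=0.5)
```

A user flooded with results can then read the list from the top and stop wherever the certainty drops below what they are willing to triage.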

MORE BENEFITS

HARD AND SOFT RULES

• IN AN EFFORT TO MAKE THEIR ANALYSIS FULLY SOUND, ANALYSIS DESIGNERS OFTEN COMBINE CERTAIN INFERENCE RULES WITH THOSE THAT COVER GENERALLY UNLIKELY CASES TO MAINTAIN SOUNDNESS

• NATURALLY BLENDING SUCH INFERENCE RULES TOGETHER, BY GIVING HIGH PROBABILITIES TO THE FORMER AND LOW PROBABILITIES TO THE LATTER, ALLOWS US TO BALANCE SOUNDNESS AND UTILITY CONSIDERATIONS

INFUSING WITH PRIORS

• END-QUALITY OF ANALYSIS RESULTS CAN OFTEN BE IMPROVED BY DOMAIN KNOWLEDGE SUCH AS INFORMATION ABOUT VARIABLE NAMING, CHECK-IN INFORMATION FROM SOURCE CONTROL REPOSITORIES, BUG FIX DATA FROM BUG REPOSITORIES, ETC.

SIMPLE ANALYSIS IN DATALOG

x=3;
y=null;
z=null;
z=x;
if(...){
    z=null;
    y=5;
}
w=*z; // dereference of z

// transitive flow propagation
FLOW(x,z) :- FLOW(x,y), ASSIGN(y,z).
FLOW(a,c) :- FLOW(a,b), ASSIGNCOND(b,c).
FLOW(x,x).

// nullable variables
NULLABLE(x) :- FLOW(x,y), ISNULL(y).

// error detection
ERROR(a) :- ISNULL(a), DEREF(a).
ERROR(a) :- !ISNULL(a), NULLABLE(a), DEREF(a).
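To see what these rules derive on the example program, the following sketch evaluates them to a fixpoint in plain Python. The fact encoding is an assumption of this sketch, not something the slides specify: each definition gets a hypothetical versioned name (`z1`, `z2`, ...), `ASSIGN(dst, src)` is read as "dst receives the value of src", and the value of `z` at the dereference is modeled as a merged variable `z4`.

```python
def analyze(assign, assign_cond, is_null, deref):
    """Naive bottom-up evaluation of the FLOW / NULLABLE / ERROR rules."""
    edges = assign | assign_cond  # both rule 1 and rule 2 extend FLOW the same way
    variables = set(is_null) | set(deref)
    for dst, src in edges:
        variables |= {dst, src}

    flow = {(v, v) for v in variables}  # FLOW(x,x).
    changed = True
    while changed:                      # iterate until no new FLOW fact appears
        changed = False
        for x, y in list(flow):
            for dst, src in edges:
                if dst == y and (x, src) not in flow:
                    flow.add((x, src))  # FLOW(x,z) :- FLOW(x,y), ASSIGN(y,z).
                    changed = True

    # NULLABLE(x) :- FLOW(x,y), ISNULL(y).
    nullable = {x for x, y in flow if y in is_null}
    # ERROR(a) :- ISNULL(a), DEREF(a).
    # ERROR(a) :- !ISNULL(a), NULLABLE(a), DEREF(a).
    error = {a for a in deref if a in is_null or a in nullable}
    return flow, nullable, error


# Hypothetical facts for the example program (versioned names are assumptions).
assign = {("z2", "x1")}                       # z = x
assign_cond = {("z4", "z2"), ("z4", "z3")}    # z at the dereference: either definition
is_null = {"y1", "z1", "z3"}                  # y = null, z = null, z = null in the branch
deref = {"z4"}                                # w = *z

flow, nullable, error = analyze(assign, assign_cond, is_null, deref)
```

Under this encoding the dereference is flagged because the null assignment inside the branch may reach it, even though the other incoming definition is non-null. This is the all-or-nothing behavior that a probabilistic treatment aims to soften.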

RELAXING THE RULES

// transitive flow propagation
FLOW(x,y) ^ ASSIGN(y,z) => FLOW(x,z).
1 FLOW(a,b) ^ ASSIGNCOND(b,c) => FLOW(a,c)
FLOW(x,x).

// transitive flow propagation
FLOW(x,z) :- FLOW(x,y), ASSIGN(y,z).
FLOW(a,c) :- FLOW(a,b), ASSIGNCOND(b,c).
FLOW(x,x).

// nullable variables
FLOW(x,y) ^ ISNULL(y) => NULLABLE(x).

// nullable variables
NULLABLE(x) :- FLOW(x,y), ISNULL(y).

// error detection
ISNULL(a) ^ DEREF(a) => ERROR(a).
0.5 !ISNULL(a) ^ NULLABLE(a) ^ DEREF(a) => ERROR(a).

// error detection
ERROR(a) :- ISNULL(a), DEREF(a).
ERROR(a) :- !ISNULL(a), NULLABLE(a), DEREF(a).

// priors and shaping distributions
3 !FLOW(x,y).
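Assuming the weighted rules are read as a Markov logic network, a world's probability is proportional to the exponential of the summed weights of the ground formulas it satisfies, and hard rules exclude any world that violates them. The sketch below computes marginals by brute-force enumeration over a handful of ground atoms. It is meant only to make the semantics concrete and is not how Alchemy itself performs inference. The helper names and the single-variable grounding are hypothetical.

```python
import itertools
import math


def implies(p, q):
    return (not p) or q


def marginals(atoms, soft, hard, evidence):
    """Exact marginals for a tiny ground network by enumerating all worlds.

    soft: list of (weight, formula) pairs; hard: list of formulas.
    A formula is a function from a world (dict atom -> bool) to bool.
    """
    free = [a for a in atoms if a not in evidence]
    z = 0.0
    mass = {a: 0.0 for a in free}
    for values in itertools.product([False, True], repeat=len(free)):
        world = dict(evidence)
        world.update(zip(free, values))
        if not all(f(world) for f in hard):
            continue  # hard rules behave like infinite weights
        score = math.exp(sum(w for w, f in soft if f(world)))
        z += score
        for a in free:
            if world[a]:
                mass[a] += score
    if z == 0.0:
        raise ValueError("hard rules are unsatisfiable under the evidence")
    return {a: mass[a] / z for a in free}


# Hypothetical grounding for one variable z that is dereferenced,
# is not definitely null, but is nullable.
atoms = ["ISNULL(z)", "NULLABLE(z)", "DEREF(z)", "ERROR(z)"]
evidence = {"ISNULL(z)": False, "NULLABLE(z)": True, "DEREF(z)": True}
hard = [lambda w: implies(w["ISNULL(z)"] and w["DEREF(z)"], w["ERROR(z)"])]
soft = [(0.5, lambda w: implies(
    (not w["ISNULL(z)"]) and w["NULLABLE(z)"] and w["DEREF(z)"],
    w["ERROR(z)"]))]

result = marginals(atoms, soft, hard, evidence)
```

In this single-rule setting the marginal of `ERROR(z)` reduces to the logistic function of the weight, so raising or lowering the 0.5 weight directly tunes how strongly the analysis believes the soft rule.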

PROBABILISTIC INFERENCE WITH ALCHEMY

• TUNING THE RULES

• TUNING THE WEIGHTS

• SEMANTICS ARE NOT AS OBVIOUS

• SHAPING PRIORS IS NON-TRIVIAL, BUT FRUITFUL

[Figure: graph with nodes X1, U1, W1, Z1, Z2, W4, Z3, W3, Y1, W5, W6, W7, W8, W9, W10, W11; probabilities shown: 0.616988, 0.614989, 0.567993, 0.560994, 0.544996]

CHALLENGES

• LEARNING THE WEIGHTS

• EXPERT USERS

• LEARNING (NEED LABELED DATASET)

• WHAT CLASS OF STATIC ANALYSIS CAN BE MADE ELASTIC?

• DATALOG

• ABSTRACT INTERPRETATION

• DECISION PROCEDURE (SMT)-BASED