

Page 1: Blended Program Analysis for Improving Reliability of Real

APSEC Keynote, Dec 2012, BG Ryder 1

Blended Program Analysis for Improving Reliability of Real-world Applications

Collaborators: Bruno Dufour (Rutgers), Gary Sevitsky (IBM Research), Marc Fisher II (VT), Ben Wiedermann (VT), Shiyi Wei (VT), S. Basu (ug-Lafayette), Luke Marrs (ug-VT); Funded by IBM Open Collaborative Research Program and NSF 08-0811518

Dr. Barbara G. Ryder J. Byron Maupin Professor of Engineering

Virginia Tech

Page 2: Blended Program Analysis for Improving Reliability of Real

Outline
§ What is program analysis?
§ Challenge of modern software applications
§ What is blended program analysis?
§ Examples of blended analysis
  § Performance diagnosis for framework-based applications (Java)
  § Data integrity & program understanding for webpage codes (JavaScript)
§ Summary

Page 3: Blended Program Analysis for Improving Reliability of Real

What is Program Analysis?
§ Historically,
  § Static program analysis was used for code optimization during compilation
    § Gave a safe approximation of all possible program behaviors, without running the program
    § Allowed checking of specific program properties
  § Dynamic program analysis was used for tracking program behavior at runtime
    § Gathered information about the program on one execution

Page 4: Blended Program Analysis for Improving Reliability of Real

What is Program Analysis?
§ Static Analysis
  § Input: code
  § Output: model of semantics of program
  § When: At compile time
  § Cost: High
  § Goal: code optimization, property validation, program understanding, test case generation...
§ Dynamic Analysis
  § Input: trace of execution(s)
  § Output: an observed property
  § When: At runtime
  § Cost: Low
  § Goal: Debugging, uncovering code dependences, test coverage...

Page 5: Blended Program Analysis for Improving Reliability of Real

Challenge: Tools for Modern SW Applications
§ Framework-based software systems (Java)
  § Transactional, complex, many libraries/components
  § E.g., personal financial managers, e-commerce applications, information records managers
§ Performance problems difficult to isolate
  § Need understanding of business logic as well as run-time behavior

Page 6: Blended Program Analysis for Improving Reliability of Real

Dynamic Language Constructs
§ Java
  § Reflection – values from run-time environment influence program behavior (e.g., in dynamic class loading)

    // read name of class to be dynamically loaded
    // from command line*
    Class a = Class.forName(args[0]);
    …
    Method mainMethod = findMain(a);
    mainMethod.invoke( … );

*http://media.techtarget.com/tss/static/articles/content/dm_classForname/DynLoad.pdf
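A minimal runnable sketch of the same idea, assuming the class name arrives as a string (from the command line, as on the slide). The class `DynLoadSketch`, the helper `callStatic`, and the default target `java.lang.System.lineSeparator` are illustrative choices, not part of the talk.

```java
import java.lang.reflect.Method;

public class DynLoadSketch {
    // Loads a class by name at run time and invokes one of its public static,
    // no-argument methods. Both names are plain strings, so the call target
    // is not visible in the source code.
    static Object callStatic(String className, String methodName) {
        try {
            Class<?> c = Class.forName(className);
            Method m = c.getMethod(methodName);
            return m.invoke(null); // null receiver because the method is static
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("reflective call failed", e);
        }
    }

    public static void main(String[] args) {
        // The class name comes from the command line; the default is only
        // a placeholder so the sketch runs without arguments.
        String className = args.length > 0 ? args[0] : "java.lang.System";
        Object result = callStatic(className, "lineSeparator");
        System.out.println(System.lineSeparator().equals(result));
    }
}
```

A purely static analysis sees only `Class.forName(String)` here, which is why the blended approach records the classes actually loaded during execution.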

Page 7: Blended Program Analysis for Improving Reliability of Real

Challenge: Tools for Modern SW Applications
§ Websites (JavaScript)
  § Constructed from combination of static and dynamically loaded/generated code
  § Code can be constructed at runtime and then executed
§ Program understanding (for maintenance), testing, and validating data integrity are hard

Page 8: Blended Program Analysis for Improving Reliability of Real

Dynamic language constructs
§ JavaScript
  § Ability to construct a URL at runtime and then load that webpage
  § Functions with variable numbers of arguments
  § Ability to write code at runtime and execute it with eval

    // xmlhttp is an instance of XMLHttpRequest
    eval(xmlhttp.responseText);

Page 9: Blended Program Analysis for Improving Reliability of Real

What is Blended Program Analysis? (ISSTA07, FSE08, ICSM10, CS@VT TR-12-18 (2012))
§ A way to integrate dynamic and static analyses to obtain scalability and precision for specific problems
  § Use of a dynamic program representation
    § Reflects call structure of execution(s)
  § Use of light-weight dynamic information
    § To prune (some) infeasible paths
    § To tie static analysis information to actual run-time objects
    § To deal with dynamic program constructs

Page 10: Blended Program Analysis for Improving Reliability of Real

Outline

§  Blended analysis for performance diagnosis in framework-intensive applications (Java)


Page 11: Blended Program Analysis for Improving Reliability of Real

Framework-based Applications
§ Application is an iceberg
  § Bulk of the code in libraries and frameworks
  § Genre not commonly addressed by research community
  § E.g., financial planning services, e-commerce sites, online reservation systems, Tomcat-based systems software
§ Programs are not just large, but are more complex in interactions between frameworks
§ Performance problems span multiple layers

[Figure: layered diagram with labels App, Middleware, Libraries and Frameworks]

Page 12: Blended Program Analysis for Improving Reliability of Real

Framework-based Applications
§ Software characteristics
  § Not amenable to static analyses
    § Not scalable -- too complex
  § Not amenable to dynamic analysis
    § Too intrusive into execution for production codes
  § Application’s main function often is data transformation
§ Goal: design analyses for performance diagnosis of these systems

[Figure: layered diagram with labels App, Middleware, Libraries and Frameworks]

Page 13: Blended Program Analysis for Improving Reliability of Real

Goal: Find Object Churn
§ Identify execution contexts with excessive use of temporaries
  § Based on total number of instances
  § Not the same as finding often-executed allocation sites
  § Need to identify temporary objects and to approximate “object lifetime”
§ Elimination strategies
  § Optimize the frequent use of frameworks and libraries together
  § Introduce caching for temporary data structures
  § Code specialization

Page 14: Blended Program Analysis for Improving Reliability of Real

Current Practice: Jinsight Trace of HoldingDataBean_Ser.serialize()
§ Tens of thousands of calls
§ How to find churn locality?

Page 15: Blended Program Analysis for Improving Reliability of Real

Blended Analysis - Scalability

Trade6: getHoldings()
[Figure: bar chart with log-scale y-axis (1 to 100000) labeled "Calling contexts"; one group of bars per benchmark]

Benchmark   Looking at the   Approximating contexts   Identifying contexts that
            entire trace     that use temporaries     truly use temporaries
Dct-Std      1473              348                      17
Dct-WS      18267             3919                     322
EJB-Std      8089             2223                      71
EJB-WS      25012             5848                     373

2 orders of magnitude!

Page 16: Blended Program Analysis for Improving Reliability of Real

Blended Analysis Paradigm
[Figure: pipeline diagram with labels Java Application; Profile; Loaded Classes; Reflection Specification + Templates; Dynamic Calling Structure; Static Analysis; Models of methods]

Page 17: Blended Program Analysis for Improving Reliability of Real

Pruning Code in Methods (FSE’08)
[Figure: control-flow graph of a method from Entry to Exit, with nodes x = new B(), y = D.m(), z = C.m(), w = new A()]
Allocated types: {B}
Observed targets: {D.m}
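A rough sketch of the pruning step, assuming (as the slide suggests) that a statement is dropped when the type it allocates was never allocated, or the method it calls was never observed, during the profiled execution. The flat statement list stands in for the control-flow graph, and the `Stmt` model and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class PruneSketch {
    // Hypothetical statement model: at most one allocated type or call target.
    static final class Stmt {
        final String text;
        final String allocType;  // null if the statement allocates nothing
        final String callTarget; // null if the statement calls nothing

        Stmt(String text, String allocType, String callTarget) {
            this.text = text;
            this.allocType = allocType;
            this.callTarget = callTarget;
        }
    }

    // Keeps only statements consistent with the dynamic observations.
    static List<String> keep(List<Stmt> stmts, Set<String> allocatedTypes,
                             Set<String> observedTargets) {
        List<String> kept = new ArrayList<>();
        for (Stmt s : stmts) {
            boolean allocSeen = s.allocType == null || allocatedTypes.contains(s.allocType);
            boolean callSeen = s.callTarget == null || observedTargets.contains(s.callTarget);
            if (allocSeen && callSeen) {
                kept.add(s.text);
            }
        }
        return kept;
    }

    // The four statements and the two observation sets shown on the slide.
    static List<String> slideExample() {
        List<Stmt> method = Arrays.asList(
            new Stmt("x = new B()", "B", null),
            new Stmt("y = D.m()", null, "D.m"),
            new Stmt("z = C.m()", null, "C.m"),
            new Stmt("w = new A()", "A", null));
        return keep(method,
                    new HashSet<>(Arrays.asList("B")),
                    new HashSet<>(Arrays.asList("D.m")));
    }

    public static void main(String[] args) {
        System.out.println(slideExample());
    }
}
```

With the slide's observations, only the allocation of B and the call to D.m survive, so the static analysis has less code to model.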

Page 18: Blended Program Analysis for Improving Reliability of Real

Escape Analysis, by Example

    void bar() { A a = new A(); a.x = new B(); }
    C baz() { C c = new C(); c.y = new D(); c.z = new E(); return c; }
    void foo(F f) { C c = baz(); f.w = c.z; }
    void zag() { F f = new F(); foo(f); G.global = f; }

[Figure: call structure over zag(), foo(), bar(), baz(), annotated with connection graphs over objects A, B, C, D, E, F, G. Legend: Captured, Arg-escaping, Globally escaping. Bottom row: Disposition]
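The four methods from the slide can be made runnable by adding the classes they mention. The class and field declarations below are assumptions filled in from how the slide's code uses them, and the disposition comments are one reading of the example under the usual escape-analysis definitions, not a transcription of the slide's figure.

```java
public class EscapeExample {
    // Minimal class declarations inferred from the field accesses on the slide.
    static class A { B x; }
    static class B { }
    static class C { D y; E z; }
    static class D { }
    static class E { }
    static class F { E w; }
    static class G { static F global; }

    // A and B are never visible outside bar: captured in bar.
    static void bar() { A a = new A(); a.x = new B(); }

    // C is returned, so C (and D, E reachable from it) escape baz to its caller.
    static C baz() { C c = new C(); c.y = new D(); c.z = new E(); return c; }

    // C and D stay local to foo: captured in foo.
    // E is stored into the parameter f: it escapes foo through the argument.
    static void foo(F f) { C c = baz(); f.w = c.z; }

    // F is stored in a static field, so F and the E it references escape globally.
    static void zag() { F f = new F(); foo(f); G.global = f; }

    public static void main(String[] args) {
        zag();
        System.out.println(G.global != null && G.global.w != null);
    }
}
```

After `zag()` returns, only the F and E instances are still reachable (through `G.global`); the C and D instances created along the way were temporaries.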

Page 19: Blended Program Analysis for Improving Reliability of Real

Invocation tree for hasConnectAccess

[Figure: invocation tree; nodes listed below as Class.method: counts]
§ LittleEndianUtil.readInt: 40 captured, 40 new
§ LittleEndianUtil.readInt: 80 captured, 80 new
§ SecurityServer.hasConnectAccess: 180 captured
§ SecurityDescriptor.getEffectiveAccess: 60 captured, 40 new
§ LittleEndianUtil.readIntFromBytes: 80 captured
§ GCDObject.getAttribute: 20 captured, 20 new
§ SecurityDescriptor.deserialize: 60 new
§ PermissionSource.getInstanceFromInt: 40 captured, 40 new
§ AceAccess.getInstanceFromInt: 40 captured, 40 new
§ LittleEndianUtil.readShort: 40 captured, 20 new
§ SecurityDescriptor.loadFromBytes: 100 captured, 40 new
§ AccessControlList.deserialize: 80 new
§ AccessControlEntry.deserialize: 120 new
§ LittleEndianUtil.readIntFromBytes: 40 captured
§ LittleEndianUtil.readShort: 80 captured, 40 new
§ SecurityContext.isSidPresent: 80 captured

Optimized trace; only shows allocating and capturing methods
1000 instances created but 880 not escaping

Each invocation of hasConnectAccess has a unique SecurityDescriptor for an ID instance; could cache within the ID

Page 20: Blended Program Analysis for Improving Reliability of Real

Metrics

Designed new metrics for blended escape analysis
§ Measure effectiveness of pruning
  § Scalability of analysis – % of blocks in methods pruned

Page 21: Blended Program Analysis for Improving Reliability of Real

Scalability

Benchmark    Analysis time (h:m:s)     Speedup   %Pruned
             No pruning   Pruned
Direct-Std   00:00:18     00:00:17      1.1      45%
Direct-WS    01:34:01     00:04:41     20.0      40%
EJB-Std      00:04:24     00:01:46      2.5      43%
EJB-WS       N/A          29:23:16      N/A      43%
Eclipse      24:37:12     06:37:22      3.7      29%
Jazz         02:49:55     00:39:06      4.3      27%
IBM Appln    00:04:35     00:02:05      2.2      39%
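The Speedup column appears to be the unpruned analysis time divided by the pruned time. A small sketch of that arithmetic, with `SpeedupSketch`, `toSeconds`, and `speedup` as illustrative names:

```java
public class SpeedupSketch {
    // Converts an h:m:s string such as "01:34:01" to seconds.
    static long toSeconds(String hms) {
        String[] parts = hms.split(":");
        return Long.parseLong(parts[0]) * 3600
             + Long.parseLong(parts[1]) * 60
             + Long.parseLong(parts[2]);
    }

    // Speedup = time without pruning / time with pruning.
    static double speedup(String noPruning, String pruned) {
        return (double) toSeconds(noPruning) / toSeconds(pruned);
    }

    public static void main(String[] args) {
        // Rows taken from the table; EJB-WS is omitted because its
        // unpruned time is listed as N/A.
        String[][] rows = {
            {"Direct-Std", "00:00:18", "00:00:17"},
            {"Direct-WS", "01:34:01", "00:04:41"},
            {"EJB-Std", "00:04:24", "00:01:46"},
            {"Eclipse", "24:37:12", "06:37:22"},
            {"Jazz", "02:49:55", "00:39:06"},
            {"IBM Appln", "00:04:35", "00:02:05"},
        };
        for (String[] r : rows) {
            System.out.printf("%s %.2f%n", r[0], speedup(r[1], r[2]));
        }
    }
}
```

The computed ratios agree with the table's Speedup column to within rounding.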

Page 22: Blended Program Analysis for Improving Reliability of Real

Metrics
§ Measure usage of temporaries
  § Disposition: categorizes instances globally as escaping, captured, or mixed

Page 23: Blended Program Analysis for Improving Reliability of Real

Disposition of Instances
[Figure: bar chart. y-axis: % of dynamic object instances (0% to 100%); x-axis: Direct/Std, Direct/WS, EJB/Std, EJB/WS, Eclipse, Jazz, CDMS; series: Captured, Mixed, Escaping; annotation: IBM App. Data labels as extracted, in order: 33.9%, 51.6%, 32.3%, 42.1%, 26.3%, 74.3%, 56.9%]
§ ~50% instances captured
§ Most instances exhibit only 1 behavior

Page 24: Blended Program Analysis for Improving Reliability of Real

Metrics
§ Measure usage of temporaries
  § Concentration: measures locality of temporary usage

Page 25: Blended Program Analysis for Improving Reliability of Real

Concentration of Captured Instances
[Figure: bar chart. y-axis: % of captured instances (0% to 100%); x-axis: Direct/Std, Direct/WS, EJB/Std, EJB/WS, Eclipse, Jazz, CDMS; series: 5%, 10%, 20%; annotation: IBM App. Data labels as extracted, in order: 29.0%, 54.3%, 42.0%, 51.2%, 88.4%, 62.2%, 66.8%]

Page 26: Blended Program Analysis for Improving Reliability of Real

Outline
§ Blended analysis for capturing effects of dynamically generated or dynamically loaded code (JavaScript)

Page 27: Blended Program Analysis for Improving Reliability of Real

JavaScript – Website Glue Code
§ JavaScript is the lingua franca of client-side applications
  § 98 of the 100 most popular websites use JavaScript (Guarnieri et al, ISSTA11)
  § Use of dynamic features is evident in websites (Richards et al, ECOOP11, PLDI10; Zorn et al, WebApps10)

Page 28: Blended Program Analysis for Improving Reliability of Real

Blended Analysis Paradigm for JavaScript
[Figure: pipeline diagram with labels Application; Profile; Gather dynamically generated/loaded code; Dynamic Calling Structure; Static Analysis; Pruned models of methods]

Page 29: Blended Program Analysis for Improving Reliability of Real

How to Profile a Website for Analysis?
§ Run several executions of the website, to explore its behaviors
§ Each execution comprises a set of page traces
§ Gather all page traces for the same webpage together and consider them a single JavaScript program for analysis
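The gathering step can be sketched as a group-by over page traces, assuming each trace records the page it came from and the JavaScript code observed there. `PageTrace`, its fields, and `groupByPage` are hypothetical names; the talk does not describe the actual data structures.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TraceGrouping {
    // Hypothetical record of one page visit within one execution.
    static final class PageTrace {
        final String pageUrl;
        final String jsCode;

        PageTrace(String pageUrl, String jsCode) {
            this.pageUrl = pageUrl;
            this.jsCode = jsCode;
        }
    }

    // Collects the code of all traces of the same page, across executions,
    // so each page can be analyzed as a single JavaScript program.
    static Map<String, List<String>> groupByPage(List<PageTrace> traces) {
        Map<String, List<String>> byPage = new LinkedHashMap<>();
        for (PageTrace t : traces) {
            byPage.computeIfAbsent(t.pageUrl, k -> new ArrayList<>()).add(t.jsCode);
        }
        return byPage;
    }

    public static void main(String[] args) {
        List<PageTrace> traces = Arrays.asList(
            new PageTrace("/home", "init()"),
            new PageTrace("/search", "query()"),
            new PageTrace("/home", "loadAds()"));
        System.out.println(groupByPage(traces).keySet());
    }
}
```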

Page 30: Blended Program Analysis for Improving Reliability of Real

JavaScript Analysis Framework
[Figure: framework diagram. Dynamic analysis: Execution collector, Trace selector. Static analysis: Static analyzer, Solution integrator. Other labels: JS Program, Solution]

Page 31: Blended Program Analysis for Improving Reliability of Real

Tainted Input Analysis
§ Integrity violation – allowing user input to reach a sensitive operation
  § Tainted source: user has control of its value
  § Sensitive operation: can affect behavior of website or browser
  § Source-sink pair reported if there is a possible dataflow between them (ignoring sanitizers)
§ Hypothesis: blended tainted input analysis will report fewer false alarms and more true positives than pure static tainted input analysis
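Reporting a source-sink pair whenever a dataflow path exists can be sketched as graph reachability. This is a simplified illustration, not the analysis used in the talk: the dataflow graph is given directly as an adjacency map, and the node names (`location.hash`, `tmp`, `eval`, `document.write`) are made up for the demo.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class TaintSketch {
    // Returns "source -> sink" for every sink reachable from a tainted source
    // along dataflow edges. Sanitizers are ignored, as on the slide.
    static List<String> report(Map<String, List<String>> flows,
                               List<String> sources, Set<String> sinks) {
        List<String> pairs = new ArrayList<>();
        for (String src : sources) {
            Set<String> seen = new HashSet<>();
            Deque<String> work = new ArrayDeque<>();
            seen.add(src);
            work.add(src);
            while (!work.isEmpty()) {
                String node = work.remove();
                if (sinks.contains(node)) {
                    pairs.add(src + " -> " + node);
                }
                for (String next : flows.getOrDefault(node, Collections.emptyList())) {
                    if (seen.add(next)) { // add() is false if already visited
                        work.add(next);
                    }
                }
            }
        }
        return pairs;
    }

    // Hypothetical graph: user-controlled location.hash flows to eval via tmp;
    // document.write only receives a constant.
    static List<String> demo() {
        Map<String, List<String>> flows = new HashMap<>();
        flows.put("location.hash", Arrays.asList("tmp"));
        flows.put("tmp", Arrays.asList("eval"));
        flows.put("constant", Arrays.asList("document.write"));
        return report(flows, Arrays.asList("location.hash"),
                      new HashSet<>(Arrays.asList("eval", "document.write")));
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

In a blended setting the edges would come from code actually observed at run time, including code produced by eval, which a pure static analysis of the page source cannot see.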

Page 32: Blended Program Analysis for Improving Reliability of Real

Benchmarks 1

Website              Page count   Trace count
bing.com             19           30
twitter.com          14           30
linkedin.com         12           30
qq.com               18           30
wordpress.com        23           30
sina.com.cn          16           30
163.com              22           30
cnn.com              18           30
msn.com              18           30
conduit.com          10           16
imdb.com             10           18
myspace.com          10           24
sohu.com             10           19
xing.com             10           18
xunlei.com           10           22
zedo.com             10           16
washingtonpost.com   10           27
pconline.com.cn      10           21
Average              14           25

Page 33: Blended Program Analysis for Improving Reliability of Real

Blended vs Pure Static Analysis
[Figure: Venn diagram of two overlapping sets, Pure static solution and Blended solution]
§ Part of static solution not found by blended
§ Part of solution from dynamically loaded or generated code
§ Shaded area is part of solution found by pure static and blended

Page 34: Blended Program Analysis for Improving Reliability of Real

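Tainted Input Results

Columns: Pure Static true soln, Pure Static false alarm, Blended true soln, Blended false alarm
[Table: blank cells were lost in extraction, so the values in each row are listed left to right without column positions]

Website        Values
live.com       1
youtube.com    1, 1
myspace.com    1
sohu.com       2, 1, 2
xunlei.com     3, 3
msn.com        1
bing.com       1
totals         6, 2, 9

[Figure: Venn diagram with regions a, b, c; annotations: false positives, false negatives]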

Page 35: Blended Program Analysis for Improving Reliability of Real

Statement-level Side-effects (ST-MOD) For Program Understanding
§ Want to know the number of objects whose f field value may be changed at statement: x.f=
  § Helpful in navigating unfamiliar code
§ How to solve?
  § Find which objects o x can point to
  § Find which objects o.f can point to
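Given a points-to solution, the count described above reduces to the size of the points-to set of the base variable. The sketch below assumes the points-to sets are already computed and stored in a map from variable names to abstract object names; `StModSketch`, `stMod`, and the object names are hypothetical.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class StModSketch {
    // For a statement "base.f = ...", the f field of every object that base
    // may point to may be changed, so the ST-MOD count is the set's size.
    static int stMod(Map<String, Set<String>> pointsTo, String base) {
        return pointsTo.getOrDefault(base, Collections.emptySet()).size();
    }

    // Hypothetical points-to solution: x may reference three objects.
    static int demo() {
        Map<String, Set<String>> pointsTo = new HashMap<>();
        pointsTo.put("x", new HashSet<>(Arrays.asList("o1", "o2", "o3")));
        pointsTo.put("y", new HashSet<>(Arrays.asList("o2")));
        return stMod(pointsTo, "x");
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

A smaller count means a more precise answer for someone reading the code, since fewer objects need to be considered at that statement.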

Page 36: Blended Program Analysis for Improving Reliability of Real

Benchmarks 2

Website         Page count   Trace count   eval page   variadic function
google.com      203          2104          52          177
facebook.com    138          1098          23          65
youtube.com     122          579           19          29
yahoo.com       52           265           21          13
baidu.com       49           147           6           16
wikipedia.org   67           130           0           3
live.com        54           226           10          44
blogger.com     24           146           6           7
totals          709          4695          137         354

Page 37: Blended Program Analysis for Improving Reliability of Real

Comparison: Pure Static to Blended Solutions (points-to)

Website         % Coverage of pure static solution   % Additional results
google.com      89.7                                 5.9
facebook.com    85.3                                 7.5
youtube.com     89.1                                 9.9
yahoo.com       78                                   9.8
baidu.com       93                                   6.7
wikipedia.org   92.1                                 none
live.com        81.8                                 7.5
blogger.com     83.8                                 1.4
mean            86.6                                 7.0

[Figure: Venn diagram with regions a, b, c]

Page 38: Blended Program Analysis for Improving Reliability of Real

ST-MOD Solutions

Websites       Pure Static:          Blended:              Blended:
               average number of     average number of     average number of
               objects in static     objects in static     objects in dynamic
               code                  code                  code
google.com     5.8                   2.4                   2.1
facebook.com   7.7                   4.1                   3.6
youtube.com    5.9                   3.5                   2.4
yahoo.com      5.2                   2.5                   2.8
baidu.com      2.6                   1.4                   1.8
live.com       2.9                   1.6                   2.2
blogger.com    4.5                   2.8                   2.3
average        4.9                   2.6                   2.5

Page 39: Blended Program Analysis for Improving Reliability of Real

Summary
§ Modern SW applications require new approaches to program analysis - the engine behind SW tools
§ New blended analysis paradigm seems promising for handling the more dynamic programming constructs
  § Performance diagnosis of framework-based apps
  § Understanding of website codes for enhancement
§ More experimentation and investigation needed to select best analysis for specific use

Page 40: Blended Program Analysis for Improving Reliability of Real

Challenges
§ Expect more combinations of static and dynamic analyses for web & mobile apps
§ Challenge: analyzing 3rd party apps (executables) in combination with known codes
§ Challenge: analyzing event-driven codes
§ Challenge: increased use of explicit concurrency will require new tools and supporting analyses

Page 41: Blended Program Analysis for Improving Reliability of Real

Questions?

Thank You
