Query-Based Debugging Raimondas Lencevicius Department of Computer Science, UCSB

Query-Based Debugging

Raimondas Lencevicius

Department of Computer Science, UCSB

2

Debugging of OO Programs

• Symbolic debugging
  – Control flow debugging
  – Object state monitoring
  – Data breakpoints
  – Conditional breakpoints

• Debugging of abstract relationships?
  – Complex object relationships

3

Debugging Object Relationships

• Programmers need to find objects violating relationships
  – “Are there any windows that do not reference some child widget?”

• Current debuggers provide only low-level views

• Programmers have to write special testing code

4

Goals of Query-Based Debugging

• Make debugging of data structures easier by answering questions about object relationships

• Explore unfamiliar programs

• Find data structure errors as soon as they occur

5

Query-Based Debugging

• Ask common questions about program state

• Quickly access sets of interesting objects

• Check properties of large groups of objects using single query

• Answer queries while program is running

• Provide functionality efficiently

6

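Windows and Widgets

[Figure: a graphical user interface (a window containing widgets) shown next to the corresponding program objects: a window object with a widget collection that holds widget1 and widget2, and widgets that refer back to their parent window.]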

7

Query Example

• “Are there any windows that do not reference some child widget?”

[Figure: a window object with its widget collection, and widget1 with a parent window reference.]

8

Talk Overview

• Query case study

• Query model

• Implementation of debugger

• Dynamic queries

• Experimental results

• Future work

• Conclusions

9

Java Compiler - Case Study

• Goal: understand and debug Java subset compiler written for UCSB compiler course

• Variety of queries
  – “Can the current lexer token refer to an uninitialized token?”
  – “Can identifiers declared in the same scope have the same name and type?”
  – “Can methods have the same name?”

10

Java Compiler - Case Study

• “Can methods have the same name?”

• Experiment with input file containing such methods:

  …
  static int isOne(int c) { return 0; }
  static int isOne(int c) { return 1; }

11

Java Compiler - Case Study

• “Can methods have the same name?”

• Debugger gives positive answer

  MethodDeclaration
    public Id name >> “isOne” …
    Code >> … (ReturnStmt, Num "0") ...

  MethodDeclaration
    public Id name >> “isOne” …
    Code >> … (ReturnStmt, Num "1") ...

• But not a program error
  – Compiler finds duplicate methods in later phase

  SemanticException: The name `isOne' at line 27 chars 14 to 20 was already declared.

12

Java Compiler Example Summary

• Explore unfamiliar program

• Find a possible error
  – Further program investigation shows that there is no error

• Use query as invariant to verify program’s execution
  – Dynamic query

14

Query Model

• Widget wid; Window win. (wid.window == win) && (! win.widgetCollection.contains(wid))
  – Search domain: Widget wid; Window win.
  – Constraint expression in conjunctive form: (wid.window == win) && (! win.widgetCollection.contains(wid))

• Arbitrary boolean constraint expression

• Assumption: side-effect free methods

• Selection and join queries
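
The meaning of such a query can be sketched as a plain nested loop over the search domain. The following Java example is only an illustration of the semantics, not the debugger's implementation: the `Window` and `Widget` classes, the `evaluate` and `demo` methods, and the sample object graph are assumptions that mirror the field names used in the query.

```java
import java.util.ArrayList;
import java.util.List;

public class WidgetWindowQuery {
    // Hypothetical classes mirroring the field names used in the query.
    static class Window {
        final String name;
        final List<Widget> widgetCollection = new ArrayList<>();
        Window(String name) { this.name = name; }
    }

    static class Widget {
        final String name;
        final Window window; // parent window reference
        Widget(String name, Window window) { this.name = name; this.window = window; }
    }

    // Naive evaluation: enumerate the search domain (all widget/window pairs)
    // and keep the pairs that satisfy the constraint expression.
    static List<String> evaluate(List<Widget> widgets, List<Window> windows) {
        List<String> result = new ArrayList<>();
        for (Widget wid : widgets) {
            for (Window win : windows) {
                if (wid.window == win && !win.widgetCollection.contains(wid)) {
                    result.add(win.name + " does not reference " + wid.name);
                }
            }
        }
        return result;
    }

    // Builds a small object graph in which widget2 is missing from the collection.
    static List<String> demo() {
        Window win = new Window("win");
        Widget w1 = new Widget("widget1", win);
        Widget w2 = new Widget("widget2", win);
        win.widgetCollection.add(w1); // widget2 is deliberately left out
        return evaluate(List.of(w1, w2), List.of(win));
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```

The nested loop visits the full Cartesian product of the two domains, which is exactly the cost that join ordering and hash joins try to avoid.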

15

Java Compiler Example

• “Can methods have the same name?”

  MethodDecl x y. (x.name.spelling == y.name.spelling) && (x != y)

17

Static Query Implementation

[Figure: static query pipeline. Components: GUI (user input), Parser, Optimizer, Domain collector, Code generator, Execution module, GUI (output). Data passed between them: query string, intermediate form, optimized form, generated code, variable types, domain sizes, domain collections.]

18

Overview of Implementation

• Enumeration primitive: finds all instances of domain

• Join ordering: finds good order to evaluate query

• Hash joins: speed up equality constraints

• Incremental delivery: shows first result early

19

Query Execution

“Find all declared methods returning integers and called at least once”

Declaration d; Method m; CallExpression ce. (d.contains(m)) && (ce.decl == m) && (m.typeName != “int”)

[Figure: query execution plan. Declaration d and Method m are joined by (d.contains(m))?; the resulting tuples, such as (d1, m2), are joined with CallExpression ce by (ce.decl == m)?, producing tuples such as (ce1, d1, m1).]

20

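Join Ordering

[Figure: inefficient versus efficient ordering of two joins over domains of 200, 10, and 100 objects with join selectivities of 10% and 1%. Sizes shown for the inefficient ordering: 2000 and 200. Sizes shown for the efficient ordering: 10 and 200.]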

21

Join Ordering

• Join execution order significantly influences performance

  cecil_method a b; cecil_formal c d. (a.formals.includes(c)) && (b.formals.includes(d)) && (c.name == d.name) && (a != c) && (b != d)

  – Naïve evaluation of Cartesian product is slow
  – Straightforward order takes 37 seconds
  – Optimized order takes 6 seconds

• Problem is NP-complete

• System uses heuristics
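
Why the order matters can be seen from a small cost estimate. The Java sketch below is not the debugger's heuristic. It only counts constraint checks for two nested-loop plans, and the domain sizes (200, 100, 10) and selectivities (10% and 1%) are assumed values chosen for illustration, as are the names `joinSize` and `planCost`.

```java
public class JoinOrderCost {
    // Estimated size of a join result: |left| * |right| * selectivity.
    static long joinSize(long left, long right, int selectivityPercent) {
        return left * right * selectivityPercent / 100;
    }

    // Constraint checks for a two-step nested-loop plan: join `first` with
    // `second`, then join the intermediate result with `third`.
    static long planCost(long first, long second, int firstSelectivityPercent, long third) {
        long intermediate = joinSize(first, second, firstSelectivityPercent);
        return first * second + intermediate * third;
    }

    public static void main(String[] args) {
        long a = 200, b = 100, c = 10; // assumed domain sizes
        int selAB = 10, selBC = 1;     // assumed selectivities in percent

        // (A join B) first: 20,000 checks, then 2,000 intermediate tuples times 10.
        System.out.println("(A join B) join C: " + planCost(a, b, selAB, c) + " checks");
        // (B join C) first: 1,000 checks, then 10 intermediate tuples times 200.
        System.out.println("(B join C) join A: " + planCost(b, c, selBC, a) + " checks");
    }
}
```

Under these assumptions the second plan needs 3,000 checks instead of 40,000, because the most selective join runs first and keeps the intermediate result small.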

22

Hash Joins

• Nested-loop join of X = Y over domains of 200 and 100 objects: 20,000 operations

• Hash join of X = Y over the same domains of 100 and 200 objects: 300 operations
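
The operation counts can be reproduced with a minimal sketch. The Java code below is an illustration using integer keys, not the debugger's generated code; the class name, the `operations` counter, and the sample domains are assumptions. It counts one operation per comparison in the nested-loop join and one per hash insertion or probe in the hash join.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class HashJoinSketch {
    static long operations; // comparisons, hash insertions, and hash probes

    // Nested-loop join: compares every x with every y.
    static List<int[]> nestedLoopJoin(int[] xs, int[] ys) {
        List<int[]> result = new ArrayList<>();
        for (int x : xs) {
            for (int y : ys) {
                operations++;
                if (x == y) result.add(new int[] {x, y});
            }
        }
        return result;
    }

    // Hash join: build a hash table on one input, then probe it with the other.
    static List<int[]> hashJoin(int[] xs, int[] ys) {
        Map<Integer, List<Integer>> table = new HashMap<>();
        for (int y : ys) {
            operations++;
            table.computeIfAbsent(y, k -> new ArrayList<>()).add(y);
        }
        List<int[]> result = new ArrayList<>();
        for (int x : xs) {
            operations++;
            for (int y : table.getOrDefault(x, List.of())) {
                result.add(new int[] {x, y});
            }
        }
        return result;
    }

    static int[] range(int n) {
        int[] values = new int[n];
        for (int i = 0; i < n; i++) values[i] = i;
        return values;
    }

    // Returns {nested-loop operations, hash-join operations,
    //          nested-loop matches, hash-join matches}.
    static long[] demo() {
        int[] xs = range(200), ys = range(100);
        operations = 0;
        int nestedMatches = nestedLoopJoin(xs, ys).size();
        long nestedOps = operations;
        operations = 0;
        int hashMatches = hashJoin(xs, ys).size();
        long hashOps = operations;
        return new long[] {nestedOps, hashOps, nestedMatches, hashMatches};
    }

    public static void main(String[] args) {
        long[] r = demo();
        System.out.println("nested loop: " + r[0] + " operations, " + r[2] + " matches");
        System.out.println("hash join:   " + r[1] + " operations, " + r[3] + " matches");
    }
}
```

Hash joins apply only to equality constraints, which is why the slides list them as a speed-up for that case rather than as a general replacement for nested loops.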

23

Incremental Delivery

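• Show first result early by pushing intermediate results through pipeline

[Figure: join pipeline. Declaration d and Method m are joined by (d.contains(m))?, then joined with CallExpression ce by (ce.decl == m)?; individual tuples such as (d1, m2) and (ce1, d1, m1) move through the pipeline as they are produced.]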

24

Incremental Delivery

• Goal: fast response for most queries

• Pipelining
  – Joins are separate threads connected in pipeline by limited-size buffers
  – Thread blocks on empty input or full output
  – Scheduler prefers threads closer to the end of pipeline

• Time-slicing
  – Interrupt “slow” threads and reschedule
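
The pipelining idea can be sketched with standard Java threads and bounded queues. This is an assumption-laden illustration rather than the debugger's code (the static query system described in this talk ran on Self): the integer domains, the two constraints, the buffer size of 4, and the `END` marker are all made up for the example, and the custom scheduler and time-slicing are not modeled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class PipelineSketch {
    private static final int[] END = new int[0]; // end-of-stream marker

    // Runs a two-join pipeline and returns the result tuples in arrival order.
    static List<int[]> run() {
        // Limited-size buffers: put() blocks when full, take() blocks when empty.
        BlockingQueue<int[]> buffer = new ArrayBlockingQueue<>(4);
        BlockingQueue<int[]> output = new ArrayBlockingQueue<>(4);

        // First join: pairs (a, b) with a == b.
        Thread firstJoin = new Thread(() -> {
            try {
                for (int a = 0; a < 20; a++) {
                    for (int b = 0; b < 20; b++) {
                        if (a == b) buffer.put(new int[] {a, b});
                    }
                }
                buffer.put(END);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        // Second join: extends each intermediate tuple with c where 2 * b == c.
        Thread secondJoin = new Thread(() -> {
            try {
                for (int[] t = buffer.take(); t != END; t = buffer.take()) {
                    for (int c = 0; c < 20; c++) {
                        if (t[1] * 2 == c) output.put(new int[] {t[0], t[1], c});
                    }
                }
                output.put(END);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        firstJoin.start();
        secondJoin.start();
        List<int[]> results = new ArrayList<>();
        try {
            // The consumer sees the first result before the first join has finished.
            for (int[] t = output.take(); t != END; t = output.take()) {
                results.add(t);
            }
            firstJoin.join();
            secondJoin.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return results;
    }

    public static void main(String[] args) {
        for (int[] t : run()) {
            System.out.println(t[0] + ", " + t[1] + ", " + t[2]);
        }
    }
}
```

Because the buffers are small, the first join cannot run far ahead of the second, so early tuples reach the output while later ones are still being computed.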

26

Gas Tank - Case Study

• Goal: to debug a gas tank simulation applet

• Inter-object constraints
  – Molecules should stay inside the gas tank
  – Molecules should not occupy the same position

27

Gas Tank - Case Study

• Detecting an error is not enough

• What code led to this error?

• Need dynamic queries!

[Figure: blue molecule at x = 20, y = 25 and red molecule at x = 20, y = 25.]

28

Gas Tank - Case Study

• Dynamic query finds error in move method

  public void move() {
    …
    x += (int)(v*Math.cos(dtor(dir)));
    y += (int)(v*Math.sin(dtor(dir)));
    …

• Fix the error

  y += (int)(v*Math.sin(dtor(dir)));
  if (collided()) handleCollision();

• But debugger still shows an error

• Exclude “atomic” regions

29

Motivation of Dynamic Queries

• Close cause-effect gap between error and its discovery
  – Errors are reported as soon as they occur

• Display dynamics of objects’ relationships - visualization

• Perform continuous invariant or assertion checks

30

Dynamic Query Implementation

[Figure: dynamic query architecture. Inputs: the Java program, and the query string with its change set. Components: custom class loader, instrumented Java program, custom debugger code, debugger library code, standard Java Virtual Machine. Output: query results.]

31

Implementation of Dynamic Queries

• Monitor changes that affect query result

• Invoke debugger when change occurs

• Reevaluate query efficiently - incrementally

32

Change Monitoring

Molecule m1, m2. (m1.x == m2.x) && (m1.y == m2.y) && (m1 != m2)

• When to reevaluate?
  – What to monitor?

• Change set: objects and fields affecting result of query
  – Domain objects
  – Referenced fields
  – Objects and fields referenced in methods

  Example change set: Molecule <init>, x, y

33

Instrumentation

Molecule m1, m2. (m1.x == m2.x) && (m1.y == m2.y) && (m1 != m2)

Source: … x += … ; …

Compiled bytecode:

  22: iadd
  23: putfield 37
  26: aload_0

Bytecode after load and instrument:

  22: iadd
  23: invokestatic debug
  26: aload_0

public final class DebuggingCode implements RunTimeCode {
  public static void debug(Molecule updatedObject, int newValue) {
    …
    updatedObject.x = newValue;        // replaces putfield 37
    QueryTool.runTool(updatedObject);  // invokes query evaluator
  }
}

34

Implementation of Monitoring

• Java bytecode instrumented during load time
  – Custom class loader
  – Uses modified class file handling tools from BCA library

• Creation and deletion of domain objects
  – Creation monitored by instrumenting constructors
  – Deletion handled by GC (not implemented yet)

• Modification of change set fields
  – Instrumentation of field assignments

35

Efficient Query Reevaluation

• Same techniques as static queries
  – Join ordering
  – Hash joins

• Incremental reevaluation

• Custom code generation for selection queries

36

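Incremental Reevaluation

[Figure: original query A * B * C evaluated over full domains (sizes 200, 10, and 100, join selectivities 10% and 1%) compared with the incremental query A * B * C, in which the changed domain is reduced to the single updated object (size 1) and the newly computed tuples are combined with the old results.]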
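
A minimal sketch of the idea, using the molecule query from the change monitoring slide: after one molecule changes, only tuples involving that molecule need to be recomputed. The Java code below is an assumption-based illustration, not the debugger's evaluator. The `Molecule` class, the `checks` counter, and the 200-molecule domain are made up, and a complete implementation would also evaluate the changed object in the m2 role and discard old results that no longer hold.

```java
import java.util.ArrayList;
import java.util.List;

public class IncrementalReevaluation {
    // Hypothetical domain class with the fields used by the query.
    static class Molecule {
        final String name;
        int x, y;
        Molecule(String name, int x, int y) { this.name = name; this.x = x; this.y = y; }
    }

    static long checks; // number of constraint evaluations

    // Tuples (m1, m2) satisfying the query for one fixed m1.
    static List<String> matchesFor(Molecule m1, List<Molecule> domain) {
        List<String> result = new ArrayList<>();
        for (Molecule m2 : domain) {
            checks++;
            if (m1.x == m2.x && m1.y == m2.y && m1 != m2) {
                result.add(m1.name + "," + m2.name);
            }
        }
        return result;
    }

    // Full reevaluation: every molecule plays the m1 role.
    static List<String> evaluateAll(List<Molecule> domain) {
        List<String> result = new ArrayList<>();
        for (Molecule m1 : domain) result.addAll(matchesFor(m1, domain));
        return result;
    }

    // Returns {full checks, incremental checks, full matches, incremental matches}.
    static long[] demo() {
        List<Molecule> domain = new ArrayList<>();
        for (int i = 0; i < 200; i++) domain.add(new Molecule("m" + i, i, i));
        Molecule changed = domain.get(0);
        changed.x = 5;
        changed.y = 5; // m0 moves onto the position of m5

        checks = 0;
        int fullMatches = evaluateAll(domain).size();
        long fullChecks = checks;

        checks = 0;
        int incrementalMatches = matchesFor(changed, domain).size();
        long incrementalChecks = checks;
        return new long[] {fullChecks, incrementalChecks, fullMatches, incrementalMatches};
    }

    public static void main(String[] args) {
        long[] r = demo();
        System.out.println("full: " + r[0] + " checks, incremental: " + r[1] + " checks");
    }
}
```

With 200 molecules the full evaluation performs 40,000 checks, while the incremental step performs 200 and still finds the new collision.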

37

Query Reevaluation Optimizations

Molecule m1, m2. (m1.x == m2.x) && (m1.y == m2.y) && (m1 != m2)

• Same value assignments
  – Do not change result, so no reevaluation is required

• Fast selection queries
  – Lean custom code

[Figure: the assignment x = 5 applied to a Molecule m whose field x is already 5.]
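
The same-value optimization amounts to a comparison before the evaluator is invoked. The following Java sketch assumes a hypothetical hook `setX` standing in for the instrumented field assignment and a counter standing in for the call to the query evaluator; neither name comes from the debugger itself.

```java
public class SameValueFilter {
    // Hypothetical domain class with one monitored field.
    static class Molecule {
        int x;
    }

    static int evaluations; // stand-in for invocations of the query evaluator

    // Hypothetical hook that replaces a field assignment in instrumented code.
    static void setX(Molecule m, int newValue) {
        if (m.x == newValue) {
            return; // same value: the query result cannot change
        }
        m.x = newValue;
        evaluations++; // here the real system would reevaluate the query
    }

    // Performs three assignments, one of which repeats the current value.
    static int demo() {
        evaluations = 0;
        Molecule m = new Molecule();
        setX(m, 5); // value changes from 0 to 5
        setX(m, 5); // same value, skipped
        setX(m, 6); // value changes from 5 to 6
        return evaluations;
    }

    public static void main(String[] args) {
        System.out.println("evaluations: " + demo());
    }
}
```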

39

Static Query Experiments

• Setup: Sun Ultra 2/200 (200 MHz UltraSparc) running modified Self 4.0

• Queries
  – Self GUI
  – Cecil compiler
  – Synthetic stress tests

• Different query structures

40

Static Query Evaluation Time

[Figure: bar chart of time in seconds (axis 0 to 3.5) versus query number (1 to 19), showing completion time, response time, translation time, and primitive time for three query groups: Self GUI, Cecil compiler, and points and rectangles. Two off-scale bars are labeled 20.7 and 5.9. Annotations on individual bars: 12 x 146 x 370; 11K x 4.5K hash join; 4.5K x 4.5K; 1804 join; costly selection.]

41

Discussion of Static Query Experiments

• Most queries take less than a second to execute

• Join ordering heuristic performs well

• Hash joins can speed up execution

• Incremental delivery decreases response time

42

Discussion of Results

• Query 17
  – 5,000 x 5,000 = 25,000,000 checks

• Query 18
  – Complex, large intermediate results

43

Dynamic Query Experiments

• Implemented in fully portable Java 1.2

• Setup: Sun Ultra 2/2300 (300 MHz UltraSparc II) running Sun Solaris Java 1.2 with JIT compiler

• Queries
  – Gas tank
  – Decaf compiler
  – SPECjvm98 applications:
    • Jess expert system
    • compress
    • Ray tracer
  – Synthetic stress test microbenchmarks

44

Program Slowdown - Selections

• Overhead does not depend on domain size

• Query 4: z.OutCnt < 0

• Queries 5-6: z.count() < 0

• Query 7: z.costlyMathCount(0)

• Query 12: point.radialDistanceGreaterThan(100M)

[Figure: bar chart of slowdown (axis 0 to 3.5) versus query number (1 to 12) for the Decaf, Gas tank, Jess, Compress, and Ray tracer queries. One off-scale bar is labeled 5.83. Invocation frequency annotations: 1.9M/s and 2.3M/s.]

45

Program Slowdown - Joins

• Practical for infrequent invocations

Program          Size                  Slowdown   Invocation frequency
Gas tank         33x33 hash join       2.13       54K
Decaf            120Kx600 hash join    3.43       25K
Ray tracer       85Kx8K hash join      229        350K
Compress         1x1 hash join         157        1.5M
Compress         1x1 join              77         2.6M
Microbenchmark   1x20 hash join        228        40M
Microbenchmark   1x20 join             930        42M

46

Discussion of Dynamic Query Experiments

• Selections are efficient

• Join queries practical for infrequent evaluations and small query domains

• Can we predict debugger performance for wide class of queries?
  – Query execution model

47

Performance Model

Tinstrumented = Toriginal × (1 + Tevaluate × Fevaluate)

• Slowdown depends on
  – Frequency of debugger invocations
  – Selections: Tevaluate = 131 ns to 4.26 µs
  – Joins: Tevaluate = 5.7 µs to 546 µs
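
The model can be turned into a small calculation. The Java sketch below assumes that the evaluation times are measured in nanoseconds and microseconds and that the invocation frequency is given per second of the original program's run time; the method name `slowdown` and the sample inputs are chosen for illustration.

```java
public class SlowdownModel {
    // Slowdown factor Tinstrumented / Toriginal = 1 + Tevaluate * Fevaluate,
    // with Tevaluate in seconds and Fevaluate in evaluations per second.
    static double slowdown(double evaluateSeconds, double evaluationsPerSecond) {
        return 1.0 + evaluateSeconds * evaluationsPerSecond;
    }

    public static void main(String[] args) {
        // Cheap selection (130 ns) invoked 500K times per second.
        System.out.printf("%.3f%n", slowdown(130e-9, 500_000));
        // Costly selection (4.26 microseconds) invoked 100K times per second.
        System.out.printf("%.3f%n", slowdown(4.26e-6, 100_000));
        // Costly selection (4.26 microseconds) invoked 500K times per second.
        System.out.printf("%.3f%n", slowdown(4.26e-6, 500_000));
    }
}
```

The overhead is the slowdown factor minus one, so a factor of 1.065 corresponds to 6.5% overhead and a factor of 1.426 to about 43%.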

48

Field Assignment Frequencies

• Microbenchmark: 40M assignments per second

• SPECjvm98 suite
  – Max frequency: 1.9M assignments per second in compress
  – 95% of fields have < 100K assignments per second

[Figure: cumulative percentage of fields (axis 0 to 100) versus field assignment frequency (axis 0.1 to 2M).]

[Figure: number of fields (axis 0 to 250) versus field assignment frequency (axis 0.1 to 2M).]

49

Selection Slowdown Estimates

• 500K assignments per second
  – 6.5% overhead for Tevaluate = 130 ns
  – 213% overhead (slowdown factor of 3.13) for Tevaluate = 4.26 µs

• 95% of fields have < 100K assignments per second
  – 43% overhead for 4.26 µs selection constraints

[Figure: estimated slowdown (axis 0 to 10) versus field assignment frequency (axis 0.1 to 2M), with one curve for low-cost and one for high-cost selection constraints.]

50

Summary of Dynamic Queries

• Selection queries are efficient
  – Less than factor 2 slowdown in experiments including stress tests
  – Projected less than 43% overhead for most selection queries

• Join queries are efficient for infrequent evaluations
  – Slowdown factor of 2 to 930 on join queries

51

Related Work

• Extensions to symbolic debuggers
  – Limited queries on objects [Sefika et al., Hart et al.]
  – Script based visualization of data structures [Duel]
  – Data structure animation [HotWire]
  – Instance filtering and reference visualization [Look!, DDD]
  – Method call visualization [Program Explorer, Object Visualizer]

• Rule-based extensions of OO languages [R++]

• Software visualization [Balsa-Zeus, Tango-Polka, Pavane]

• Database query optimization [Ibaraki and Kameda, Krishnamurthy et al., Swami and Iyer]

52

Future Work

• Functionality extensions
  – Support for projection, arbitrary computations
  – Supporting on-the-fly debugging
  – Distributed query-based debugging
  – Safe update points

• Execution optimizations
  – Delaying monotonic updates
  – Lookup caches

53

Conclusions

• New approach to debugging
  – Quick access to sets of interesting objects
  – Efficient way to check properties of large groups of objects using single query
  – Instant error alert with dynamic queries

• Good performance
  – Most static queries execute in one or two seconds
  – Most dynamic selection queries slow down programs less than 43%

54

Further Information

• Query-Based Debugging: http://www.cs.ucsb.edu/~raimisl/DQBD.html
  – OOPSLA’97 and ECOOP’99 papers

• Research: http://www.cs.ucsb.edu/~raimisl/Research.html

[email protected]

56

Program Slowdown

• Other join queries: slowdown factor of 77 to 229

• Microbenchmark
  – Selection: slowdown factor of 6.4
  – Hash join: slowdown factor of 228
  – Nested join: slowdown factor of 930

[Figure: bar chart of slowdown (axis 0 to 3.5) versus query number (1 to 14) for the Decaf, Gas tank, Jess, Compress, and Ray tracer queries. One off-scale bar is labeled 5.83.]

57

Breakdown of Query Overhead

• 76% Evaluation time

• 17% Loading

• 7% Garbage collection (128M heap)

[Figure: bar chart of overhead percentage (axis 0 to 100) versus query number (1 to 20), broken down into loading, GC, first evaluation, and evaluation.]