1

Chains of Evidence

(Thesis Proposal)

Tim Halloran

William L. Scherlis (advisor)
James D. Herbsleb
Mary Shaw
Joshua J. Bloch, Sun Microsystems Inc.

A Programmer-Oriented Approach to Assurance of Mechanical Program Properties

2

A thesis proposal should:

• Explain the basic ideas of the thesis topic
• Argue why the topic is interesting
    • I.e., scientific value and engineering impact
• State what kinds of results are expected
• Argue that these results are obtainable within a reasonable amount of time
• Demonstrate the student’s personal qualifications for doing the proposed work

3

A “bug” description

I got a NullPointerException in CallStackRootNode.CallStackChildren.changeChildren() where the CallStackProducer returned a null Location[]. Now in this case my code is incomplete but it seems to me that there is a case for the producer not being able to furnish a stack, or the filter filtering it all out in which case some placeholder Location[] needs to be created and displayed.

NetBeans Bug Report #31423

4

The “answer”

I do not understand, you would prefer to return null rather than new Location[0]? There is something like a convention here that we prefer to not return null values from functions—its more “safe”

What is the problem here?

NetBeans Bug Report #31423

5

Loss of design intent

• People leave and join software teams
• Documents become out of date and inconsistent
• Models are missing
• Source code becomes the only authoritative system artifact
• Maintainability suffers because code does not reveal all the design intent behind it
• Quality suffers because programmers make mistakes complying with tacit or informally expressed design intent

6

What models are missing?

• Low-level models of design intent about “mechanical” program properties not expressible in the language
• Focus on bureaucratic aspects of a program
    • E.g., concurrency policy, exception policy, mutability policy, type use policy, static program structure
• Rather than functional ones
    • E.g., correctly sorting a data structure or correct computation of a value

We hypothesize that expression and assurance of mechanical program properties can provide great value

/** @typerecommendation Collection, List */
public class ArrayList extends...

7

Reasons for this problem

Missing capability in today’s languages, models, tools, and processes to
• Express and capture intent
• Assure our implementations are faithful to that intent

Worse, we don’t know how to keep intent consistent with as-built reality of a system as both evolve.

My research addresses both these problems

8

NetBeans “bug” example

Capture the intent that the getCallStack() method should never return a null Location[]

Annotate the interface as follows:

package org.netbeans.modules.debugger;

public interface CallStackProducer extends CallStackRoot {
  ...
  public /*@not-null*/ Location[] getCallStack();
  ...
}

Programmers might overlook the annotation or not be confident they followed it. To address this problem, our approach uses a tool to statically assure that the code is consistent with the annotation

9

NetBeans “bug” example

We can go further and annotate that filterCallStack() (within the CompactCallStackFilter class) should not raise NullPointerException

/** @never-throws java.lang.NullPointerException */
public Location[] filterCallStack(
    /*@not-null*/ CallStackProducer producer) {
  ...
  Location[] stack = producer.getCallStack();
  ...
  int i, k = stack.length;
  ...
}

Now the programmer who reported NetBeans bug #31423 and implemented getCallStack() to return null could be informed that two models of design intent are violated
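The two annotated contracts can be sketched together as follows. This is a minimal illustration, not NetBeans code: `Location` is a stand-in type, `EmptyStackProducer` and the filter body are hypothetical, and the annotations are written as comments in the style of the slides.

```java
// Hypothetical stand-in for the NetBeans Location type.
class Location {
    final String name;
    Location(String name) { this.name = name; }
}

interface CallStackProducer {
    /*@not-null*/ Location[] getCallStack();
}

// A producer that cannot furnish a stack honors @not-null
// by returning an empty array instead of null.
class EmptyStackProducer implements CallStackProducer {
    public Location[] getCallStack() {
        return new Location[0];
    }
}

class CompactCallStackFilter {
    /** @never-throws java.lang.NullPointerException */
    static Location[] filterCallStack(/*@not-null*/ CallStackProducer producer) {
        Location[] stack = producer.getCallStack();
        int k = stack.length; // safe only because getCallStack() is @not-null
        Location[] kept = new Location[k];
        System.arraycopy(stack, 0, kept, 0, k);
        return kept;
    }
}
```

With this producer, `filterCallStack()` returns an empty array rather than failing on `stack.length`, which is the behavior the reply to the bug report asks for.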

10

NetBeans “bug” example – steps to assurance

1. Evidence filterCallStack() does not raise NullPointerException assuming the @not-null annotations are valid

2. Evidence that calls to the filterCallStack() method will never pass null in the producer parameter

3. Evidence that all implementations of getCallStack() never return null

Evidence that each step is valid, given its assumptions, is gathered by semantics-based program analysis

Each individual piece of evidence is useful to the programmer…but they can be linked
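The linking of the three steps can be sketched as a small data structure. This is an assumption-laden illustration, not the proposal's tool: `Link` and `EvidenceChain` are hypothetical names, and each claim is represented only by a string label.

```java
import java.util.*;

// One link: a claim established by an analysis, valid under the listed assumptions.
final class Link {
    final String claim;
    final List<String> assumptions;
    Link(String claim, String... assumptions) {
        this.claim = claim;
        this.assumptions = Arrays.asList(assumptions);
    }
}

final class EvidenceChain {
    private final Map<String, Link> byClaim = new LinkedHashMap<>();

    void add(Link link) { byClaim.put(link.claim, link); }

    // A claim is assured when some link establishes it and
    // every assumption of that link is itself assured.
    boolean assured(String claim) {
        return assured(claim, new HashSet<>());
    }

    private boolean assured(String claim, Set<String> inProgress) {
        Link link = byClaim.get(claim);
        // A missing link or a circular justification gives no assurance.
        if (link == null || !inProgress.add(claim)) return false;
        for (String assumption : link.assumptions) {
            if (!assured(assumption, inProgress)) return false;
        }
        inProgress.remove(claim);
        return true;
    }
}
```

Adding step 1 with steps 2 and 3 as its assumptions, and then only step 2, leaves the chain partial: `assured` reports false for step 1 until a link for step 3 is added.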


11

NetBeans “bug” example – key points

Each step becomes a “link” of evidence we are able to “chain” together to give us program assurance.

Our annotations capture design intent and serve as cut-points for program analysis

The program properties in the example are confusing to the programmer because they are non-local

12

Adoption in practice

• Consistency management
    • Stepwise approach to consistency
    • Support real-world inconsistencies
• Avoiding programming language change
    • “Rising tide of abstraction”
    • Support extra-language assurance
• User experience
    • Different from a compiler
• Assurance selection: where can we help?

24 October 2002 post on the Apache Jakarta General Mailing List:

Most of the automated code metrics I read complain about things like “duh its an API of course its an unused class”—“or duh it a development utility or test case which isn’t MEANT to be flexible”

A follow-up:

Exactly! Stuff like “This class is unused”—no, it’s just specified in a properties file somewhere and the static analysis is not picking that up! A couple of false positives like that and people start ignoring the tool. At least I do.

13

Research goals

Effective capture of implementation-level design decisions, incrementality, and tool supported consistency management

Assurance of properties not addressed by widely used programming languages

Design of an effective user experience for extra-language assurance

Understanding defects in widely deployed open source Java projects to understand where we can have the largest impact

14

Outline

• Introduction
    • Loss of design intent
    • NetBeans “bug” example
    • Adoption in practice
    • Research goals
    • Chains of evidence
• Thesis Statement
• Approach
• Hypotheses
• Preliminary Work
• Validation
• Schedule
• Expected Contribution

15

Chains of evidence

• Proofs that a software system satisfies the theorem that programmer-expressed models of design intent are consistent with source code
    • Models constructed from annotations within code and other documentation and focused on mechanical program properties
• Assurance is formed by linking together “chains” forged from small “links” of evidence about the software system

16

Chains of evidence

Partial chains of evidence are essential: they enable focused engagement with the programmer to determine if
• The design intent is wrong
• The design intent is incomplete
• The source code is wrong
• The program analysis algorithms (due to limitations) have insufficient information to provide a result

17

Assurance spectrum of chains of evidence

[Figure: assurance spectrum. Horizontal axis: Scalability of Assurance Technique (− to +). Vertical axis: Semantic “Depth” of Design Intent Assured (− to +). Labeled points: Type Checking; Program Verification; Concurrency Policy (Greenhouse); Thread Coloring (Sutherland); Java Best Practice; Program Structure; Exception Policy; Mutability Policy; Type Use Policy. Labeled region: Chains of Evidence – Assurance Focus (tractable).]

18

Thesis statement

Chains of evidence enables assurance of useful mechanical properties of programs with respect to explicit models of design intent, and the approach has the potential to be scalable and practical for working programmers to adopt

19

Key ideas

A set of representative and substantive assurances available as part of our prototype tool is necessary to show feasibility and flexibility of our approach

An effective architecture for chains of evidence is required to organize assurance results and scale up to large Java systems

An effective user experience is needed to elicit design intent from and communicate assurance results to programmers

20

Key ideas

A prototype tool set within the context of a Java IDE enables evaluation of the effectiveness of our approach

Selection of what design intent to model and how to assure it can be empirically informed through (formative) analyses of bug and quality practices and (evaluative) analysis and tool use

A business case analysis can show cost-effectiveness of our approach and assurances

21

Approach

• Develop an architecture, framework, tools, and user experience for chains of evidence*
• Develop specific assurances
• Conduct three empirical investigations
• Business case analysis

22

Assurance development

• Concurrency Policy *
• Mutability Policy
• API Protocol Policy
• NullPointerException Policy
• Alias Policy
• Types and Their Use *
• Program Structure

Research challenge to design, using state-of-the-art program analysis, substantive assurances along a representative set of points on our curve

[Figure: assurance spectrum. Horizontal axis: Scalability of Assurance Technique (− to +). Vertical axis: Semantic “Depth” of Design Intent Assured (− to +). Labeled points: Type Checking; Program Verification; Concurrency Policy (Greenhouse); Thread Coloring (Sutherland); Java Best Practice; Program Structure; Exception Policy; Mutability Policy; Type Use Policy. Labeled region: Chains of Evidence – Assurance Focus (tractable).]

23

Empirical investigations

• Survey of open source Java bugs (39,463)
    • Understand: “Where help is needed most?”
    • 2 phases: bug selection and bug analysis
• Sophomore experiment
    • Hypothesis: “Violations of Java best practice correlate with software defects”
• Prototype use studies
    • Qualitative use studies of our prototype tool
    • Understand utility and practicality of chains of evidence

24

Business case analysis

Cost/Benefit Analysis (in the sense of Reifer) to evaluate the programmer time and effort required to provide and maintain design models as compared with the costs of using current techniques
• Done for each individual assurance (eases identification of state-of-the-practice techniques that address similar concerns)

25

Hypotheses

• Safe evolution of software systems can be carried out with less up-front effort using our incremental approach than in approaches that rely on full functional specification
    • Qualitative use studies of our prototype tool
• Bugs of a non-local character (e.g., concurrency) are more difficult for programmers to solve and have great significance to engineering success
    • Survey of open source Java bugs

26

Hypotheses

• Cut points are feasible to provide scalability for a wide range of important program analyses
    • Assurance development
    • Program analysis theory
• Similar techniques can be used for assurances of model compliance and assessment of Java best practice (in the sense of Bloch)
    • Architecture for chains of evidence
    • Prototype tool coupled with assurance development

27

Hypotheses

• Violations of Java best practice correlate with software defects (and overall bad software quality)
    • Sophomore experiment
• Model compliance is a cost-effective approach to improve software quality
    • Business case analysis
• Consistency management can be an independent function that is not coupled to program analysis
    • Architecture for chains of evidence
    • Consistency management (part of user experience)

28

Evidence of Feasibility: Preliminary Work

• Two preliminary assurance prototypes
    • “Models of Thumb”
    • Demonstration of lock policy assurance
• Preliminary Architecture
    • Third prototype
• Empirical investigations
    • Survey of open source quality practices
    • Preliminary survey of Java bugs

29

“Models of Thumb”

• Assurance that Java “rules of thumb” are followed
• Two cases investigated on 2 million SLOC corpus
    • Ignored exceptions

      try {
        ...
      } catch (Throwable t) {
        ;
      }

    • Overspecific variable declarations

      ArrayList results = new ArrayList();
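For contrast, code that follows the two rules of thumb might look like the sketch below. `ThumbRules`, `collect`, and `closeQuietly` are hypothetical names chosen for illustration, and the comment in the catch block is one informal way of recording why the exception is ignored.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

class ThumbRules {
    // Declare the variable with the most abstract type that suffices (List, not ArrayList).
    static List<String> collect(String... items) {
        List<String> results = new ArrayList<>();
        for (String item : items) {
            results.add(item);
        }
        return results;
    }

    // Catch the narrowest exception and record why ignoring it is acceptable.
    static boolean closeQuietly(Closeable resource) {
        try {
            resource.close();
            return true;
        } catch (IOException e) {
            // Intentionally ignored: a failed close() leaves nothing to recover here.
            return false;
        }
    }
}
```

Whether an ignored exception like the one in `closeQuietly` is acceptable is a design decision, which is why a tool needs explicit intent to tell it apart from an unfinished handler.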

30

31

Tomcat: 230 ignored exceptions

32

33

34

Tomcat: 485 overspecific variable declarations

35

36

User Experience

Early prototype reported the following:

• Mimicking compiler error message reporting
• Not effective for extra-language assurance
    • Negative focus, no rationale, no next step

Extension.java [line 297] change
  FROM: ArrayList results = new ArrayList();
  TO: List results = new ArrayList();
  WHY: Use most abstract interface possible

Research challenge to design an effective user experience for extra-language assurance

37

Rationale

38

Flexible Organization of Results

39

Empirical Results

                   Overspecific Variable Declarations  Ignored Exceptions
                   Variable Decl.    Violations Found  catch Block   Violations Found
Name        kSLOC        Uses (u)      #  %u   /kSLOC     Uses (u)      #  %u  /kSLOC
Ant            64          13,953    434   3      6.7          916    163  18     2.5
Tomcat         66          13,970    485   3      7.3          964    230  24     3.5
J2SDK 1.4     508         116,397  3,650   3      7.2        3,239    686  21     1.4
NetBeans      571          99,201  5,851   6     10.2        5,085  1,048  21     1.8
Eclipse       792         178,872  8,325   5     10.5        6,511  1,110  17     1.4
Subtotal:   2,001         422,393 18,745   4      9.4       16,715  3,237  19     1.6
Whiteboard     38           6,823  1,205  18     28.0          199     40  20     1.4
Total:      2,039         429,216 19,950   5      9.8       16,914  3,257  19     1.6
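The derived columns can be reproduced from the raw counts. The sketch below assumes %u is the violation count as a percentage of uses, rounded to a whole number, and /kSLOC is violations per thousand source lines, rounded to one decimal place; the rounding rule and the name `ViolationDensity` are assumptions, not stated on the slide.

```java
class ViolationDensity {
    // Violations as a whole-number percentage of uses (the %u column).
    static long percentOfUses(int violations, int uses) {
        return Math.round(100.0 * violations / uses);
    }

    // Violations per thousand source lines, to one decimal place (the /kSLOC column).
    static double perKsloc(int violations, int ksloc) {
        return Math.round(10.0 * violations / ksloc) / 10.0;
    }

    public static void main(String[] args) {
        // Tomcat row: 66 kSLOC; 13,970 declarations, 485 overspecific;
        // 964 catch blocks, 230 ignored.
        System.out.println(percentOfUses(485, 13970) + "% " + perKsloc(485, 66)); // 3% 7.3
        System.out.println(percentOfUses(230, 964) + "% " + perKsloc(230, 66));   // 24% 3.5
    }
}
```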

40

Ignored Exceptions: Why?

                         Ignored Exceptions
          catch Block    Total (t)     Commented
Name         Uses (u)        #  %u       #  %t
Ant               916      213  23      59  28
Tomcat            964      248  26      66  27
J2SDK           3,239      744  23     291  39
NetBeans        5,085    1,241  24     443  36
Eclipse         6,511    1,275  20     440  35

Tomcat: sample of 50

Ignored exception                      #   %
Unfinished exception handling          1   2
Catch of an overly-broad exception     5  10
Unsure [comment or log]                8  16
Default-try-catch [comment]            9  18
Thread: InterruptedException           8  16
IO: IOException [close()]              7  14
Test code [wrapping test]              3   6
OK, well commented [not formal]        9  18

We sampled 50 ignored exceptions from Tomcat and Eclipse and found roughly 90% are false positives (program correctness only)
- Explicit design intent needed

41

Greenhouse Concurrency Assurance

Assurance obtained
• All accesses to shared fields are protected with the correct lock
• All lock preconditions are satisfied for method calls that require callers to hold locks
• Constructor does not allow references to escape (i.e., avoiding leakage)
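The three assured properties can be illustrated on a small class. This is a sketch only: `Account` is a hypothetical example, and the `@protected-by` and `@requires-lock` comments are made-up notation for this illustration, not the annotation syntax of the Greenhouse tool.

```java
// Hypothetical comment annotations state the lock policy a tool would check.
class Account {
    private final Object lock = new Object();
    /* @protected-by lock */ private long balance;

    Account(long opening) {
        balance = opening; // "this" does not escape the constructor
    }

    void deposit(long amount) {
        synchronized (lock) { // shared field is accessed under its lock
            add(amount);      // lock precondition of add() holds here
        }
    }

    long balance() {
        synchronized (lock) {
            return balance;
        }
    }

    /* @requires-lock lock */
    private void add(long amount) {
        balance += amount;
    }
}
```

A call to `add()` from a method that does not hold `lock`, or a constructor that registers `this` with another object, would be the kind of violation the assurance is meant to report.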

42

Greenhouse Concurrency Assurance

- Complex design intent models
- What is the next step?

43

Prototype problems

• Difficult to understand the network of analyses that make up an assurance
• Difficult to reuse portions of an assurance for another assurance
• No separation between data used to calculate results and actual results
• No benefit from building up assurances from smaller assurances
• No standard approach to communicate results to higher-level analyses
• No standard approach to communicate results to the user interface
• No standard ability to maintain assurance as the software or the design intent model is being changed by a programmer within the IDE (i.e., truth maintenance)

Diagnosis: Our architecture is wrong

44

Toward an Architecture for Chains of Evidence

Preliminary architecture for chains of evidence is based upon:
• A categorized blackboard
• A truth maintenance system
• A network of program analysis components

[Figure: the Sea Blackboard. Categories shown: Region design intent; Lock policy design intent; Lock policy assurance; Thread coloring design intent; Thread coloring assurance; Ignored exception assurance; Ignored exception design intent. Example entries: “OK to ignore InterruptedException within fluid.ex.*”; “OK: Ignored InterruptedException on line 13 of Foo.java”; “ISSUE: Ignored IOException on line 56 of Bar.java”.]

Developed a feasibility prototype
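A categorized blackboard with a simple form of truth maintenance can be sketched as below. This is an illustration under stated assumptions, not the Sea prototype: `Fact`, `Blackboard`, `post`, `retract`, and `inCategory` are hypothetical names, and truth maintenance is reduced to retracting every fact that was justified by a retracted one.

```java
import java.util.*;

// A fact posted to the blackboard, filed under a category and justified by other facts.
final class Fact {
    final String category;
    final String text;
    final Set<Fact> support;
    Fact(String category, String text, Fact... support) {
        this.category = category;
        this.text = text;
        this.support = new HashSet<>(Arrays.asList(support));
    }
}

final class Blackboard {
    private final Set<Fact> facts = new LinkedHashSet<>();

    Fact post(String category, String text, Fact... support) {
        Fact fact = new Fact(category, text, support);
        facts.add(fact);
        return fact;
    }

    // Truth maintenance: retracting a fact also retracts everything it justified.
    void retract(Fact fact) {
        if (!facts.remove(fact)) return;
        for (Fact other : new ArrayList<>(facts)) {
            if (other.support.contains(fact)) retract(other);
        }
    }

    // All fact texts currently filed under the given category.
    List<String> inCategory(String category) {
        List<String> out = new ArrayList<>();
        for (Fact fact : facts) {
            if (fact.category.equals(category)) out.add(fact.text);
        }
        return out;
    }
}
```

Posting the "OK to ignore" intent and an "OK" result that depends on it, then retracting the intent, removes the dependent result while leaving the independent "ISSUE" entry in place.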

45

Toward an Architecture for Chains of Evidence

Preliminary use has found this design:
• Enhances a programmer’s ability to understand and react to tool results
• Allows a separation of analysis results and design intent model information
• Provides efficient maintenance of assurance as models and program code evolve

Research challenge to design user experience, evaluate (and enhance) scalability and flexibility

46

Validation

• Prototype Tool Capabilities
• Assurance Soundness
• Empirical evidence of adoptability & utility
    • Bug survey, and Prototype use studies
• Cost-Effectiveness
    • Business case analysis

Chains of evidence enables assurance of useful mechanical properties of programs with respect to explicit models of design intent, and the approach has the potential to be scalable and practical for working programmers to adopt

47

Schedule

Date      Milestone Tasks
Jul 2003  Architecture completed and documented
          Prototype using updated architecture
          Representative program assurances designed
          Refine automatic selection for Java bug survey
Aug 2003  Draft ICSE paper
          Complete sophomore experiment plan
Sep 2003  Complete Java bug survey
Dec 2003  Complete and document sophomore experiment
          Complete representative program assurances
Jan 2004  Begin prototype tool use studies
          Begin dissertation draft
Mar 2004  Dissertation draft completed
May 2004  Prototype tool use studies completed and documented
          Oral and written thesis defense

48

Expected Contributions

I expect to
• provide an effective architecture, framework, tools, and user experience for chains of evidence,
• demonstrate the usefulness of the framework for representative assurances,
• provide an empirically informed assessment of the potential for adoption, and
• qualitatively demonstrate cost effectiveness

49

Chains of Evidence

Tim Halloran

William L. Scherlis (advisor)
James D. Herbsleb
Mary Shaw
Joshua J. Bloch, Sun Microsystems Inc.

A Programmer-Oriented Approach to Assurance of Mechanical Program Properties

Questions

50

Backup Slides

51

My Proposal in One Slide

• Problem: Increasing source code quality and assurance thereof
• Idea: Chains of Evidence, a tool-supported method to assist programmers in expressing models of low-level design intent and assuring their consistency with code
• Preliminary Results:
    • Java best practice and concurrency policy prototypes
    • Architecture for managing chains of evidence
    • Survey of open source quality practice and Java bugs
• Approach (demonstrating potential for):
    • Develop a set of substantive assurances: feasibility & flexibility
    • Develop architecture: scalability
    • Design an effective user experience: adoption
    • Develop prototype tool in Java IDE: feasibility
    • Empirical investigation (bug survey, experiment): adoption/impact
    • Develop a business case analysis: practicability

52

Chains of Evidence

A well-formed code base using chains of evidence includes:
• A collection of source code
• A set of low-level design models that address semantic properties significant to the mechanical attributes of code
• A linkage of the code base with models assuring consistency