OOPSLA 2005 Workshop on Library-Centric Software Design The Diary of a Datum: An Approach to Modeling Runtime Complexity in Framework-Based Applications Nick Mitchell, Gary Sevitsky (speaker), Harini Srinivasan IBM T.J. Watson Research Center Oct. 16, 2005


OOPSLA 2005 Workshop on Library-Centric Software Design

The Diary of a Datum: An Approach to Modeling Runtime Complexity in Framework-Based Applications

Nick Mitchell, Gary Sevitsky (speaker), Harini Srinivasan

IBM T.J. Watson Research Center

Oct. 16, 2005


Background

Applications are built more and more by integrating libraries and frameworks

– Lots of standard frameworks (J2EE, servlets, XML, JSPs, EMF, …)

– Plus industry-specific frameworks, in-house frameworks

Our research group has been diagnosing performance problems in large-scale framework-based Java applications for more than five years

– High volume web-based servers

– Client-side applications built on large frameworks like Eclipse


Problem

It takes a lot of work to perform very simple tasks, even after tuning at the application level

[Figure: Conversion of a stock purchase date field from SOAP to a Java business object field. Source: SOAP client, Trade benchmark v.3.1. Zoom level: 0. Data flow: bytes (SOAP) -> Parse, set field in business object -> Calendar* (business object field) -> Copy to another version of the business object -> Date* (business object field). Cost: 268 calls, 70 objects. (* = new objects)]


What are these applications doing that is so expensive?

Not what you would expect.

Example: accessing the database?

– Inefficiencies in multiple layers of frameworks to process queries are the source of many performance problems.

Example: expensive sort algorithm?

– More often the problem is in the coupling of the sort algorithm and the comparator, or the sort algorithm and the UI framework that calls it

In general, problems are not due to poor algorithms. Nor are they located in a few hot methods or paths.
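
The point about the coupling between a sort and its comparator can be made concrete with a small sketch. Everything here is hypothetical: `toKey` stands in for whatever conversion a framework performs on each element, and the counter only exists to make the repeated work visible. The sort algorithm is the same in both methods; only the place where the conversion happens differs.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class SortCouplingDemo {
    // Counts how often the (hypothetical) conversion runs.
    static int conversions = 0;

    // Stand-in for a costly per-element transformation (assumption, not from the slides).
    static int toKey(String s) {
        conversions++;
        return Integer.parseInt(s.trim());
    }

    // The comparator converts both operands on every comparison.
    static List<String> sortConvertingInComparator(List<String> in) {
        List<String> out = new ArrayList<>(in);
        Collections.sort(out, (a, b) -> Integer.compare(toKey(a), toKey(b)));
        return out;
    }

    // Each element is converted exactly once, then the cached keys are compared.
    static List<String> sortWithPrecomputedKeys(List<String> in) {
        final int[] keys = new int[in.size()];
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < in.size(); i++) {
            keys[i] = toKey(in.get(i));
            order.add(i);
        }
        Collections.sort(order, (a, b) -> Integer.compare(keys[a], keys[b]));
        List<String> out = new ArrayList<>();
        for (int i : order) {
            out.add(in.get(i));
        }
        return out;
    }
}
```

With n elements, the first method runs the conversion twice per comparison, while the second runs it n times in total.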


What is costing so much?

Most activity is transformation of data

– To meet the requirements of framework APIs or external standards

Each transformation often contains many smaller transformations

Much effort is also spent facilitating these transformations

– e.g. initializing converters or looking up schemas

Usually there is little or no change to the information content


From customer application: Diary of a timecard

[Figure: diary of a timecard. Transformation steps shown: extract content, parse, copy (and repackage), serialize, store in DB2. Data forms shown: MQ message, XML document, business object, record, serialized Java object, DB2 blob, DB2 record. Cost of parse step: 2000 calls, 300 objects.]

One timecard record has 11 fields

Each step can be very expensive

– and usually contains many smaller transformations


How can we understand the sources of inefficiency and runtime complexity?

We would like to view a run in terms that make these transformations visible

– Existing performance tools are focused on control flow, and report in terms of methods, paths, packages.

– Most of the work in these applications is massaging data. This work doesn’t line up with methods, paths, packages.

We would like to understand the general causes of cost and complexity in these applications

– So we can compare diverse implementations

– So we can surface more general characteristics: API design practices, implementation practices, opportunities for automated optimization, etc.

– Existing performance tools only help find specific bottlenecks


Approach

Structure a run into a hierarchy of “diaries”

– Organized according to the transformation of logical content

– e.g. flow of an Employee record from SOAP to Java to HTML

Metrics for cost and complexity

Manual approach right now

– Lots of opportunities for automation

Allows insights into single implementations, and comparisons across diverse implementations
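
One way to picture a hierarchy of diaries is as a tree of transformations, where each node carries its own cost and can be expanded into finer-grained steps. The sketch below is only an illustration: the class names, the `add` helper, and all numbers in `sampleDiary` are made up, and the slides do not prescribe any particular data structure.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model: one transformation step in a diary, possibly with sub-steps.
final class Transformation {
    final String name;
    final int ownCalls;    // calls attributed directly to this step
    final int ownObjects;  // objects created directly by this step
    final List<Transformation> steps = new ArrayList<>();

    Transformation(String name, int ownCalls, int ownObjects) {
        this.name = name;
        this.ownCalls = ownCalls;
        this.ownObjects = ownObjects;
    }

    // Adds a finer-grained sub-step and returns this node for chaining.
    Transformation add(Transformation child) {
        steps.add(child);
        return this;
    }

    // Cost rolled up over this step and everything beneath it.
    int totalCalls() {
        int total = ownCalls;
        for (Transformation s : steps) {
            total += s.totalCalls();
        }
        return total;
    }

    int totalObjects() {
        int total = ownObjects;
        for (Transformation s : steps) {
            total += s.totalObjects();
        }
        return total;
    }
}

final class DiaryDemo {
    // Illustrative numbers only; they are not measurements from the talk.
    static Transformation sampleDiary() {
        Transformation soapToJava = new Transformation("SOAP -> Java", 12, 3)
                .add(new Transformation("parse date field", 30, 5));
        Transformation javaToHtml = new Transformation("Java -> HTML", 20, 4);
        return new Transformation("Employee record", 0, 0)
                .add(soapToJava)
                .add(javaToHtml);
    }

    public static void main(String[] args) {
        Transformation diary = sampleDiary();
        System.out.println(diary.name + ": " + diary.totalCalls() + " calls, "
                + diary.totalObjects() + " objects");
    }
}
```

Rolling costs up the tree is what lets a single top-level step report a figure such as "268 calls, 70 objects" while still allowing a reader to zoom into the steps that produced it.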


Example

[Figure: Conversion of a stock purchase date field from SOAP to a Java business object field. Source: SOAP client, Trade benchmark v.3.1. Zoom level: 0. Data flow: bytes (SOAP) -> Parse, set field in business object -> Calendar* (business object field) -> Copy to another version of the business object -> Date* (business object field). Cost: 268 calls, 70 objects. (* = new objects)]


From Trade: Diary of a Date (SOAP parsing level)

Detail of just the first step of the previous slide

[Figure: diary of a Date at the SOAP parsing level. Zoom level: 1. (* = new objects)

Group labels: "Parse (using SOAP CalendarDeserializer)" and "Set business object field via reflection".

Transformations shown:

– extract value from SOAP tag (bytes -> String*)

– get schema info (XML and Java types, BeanPropertyDescriptor)

– get deserializer (Deserializer*)

– parse time zone and millis; reformat without them

– parse using SimpleDateFormat (String* -> Date*)

– add in timezone and millis

– build Calendar (Calendar* + 11 arrays* + TimeZone*)

– set time

– box into array (Object[]*)

– call invoke() on setter

Other data labels: ParsePosition*, TimeZone* (constant), SimpleDateFormat + Calendar, 2 longs (TZ and millis), Date, Calendar.

Per-step cost labels (calls / objects), as extracted: 11 / 6, 30 / 3, 10 / 0, 51 / 5, 7 / 1, 15 / 15, 95 / 39, 4 / 0, 6 / 1.]
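
The last group on this slide, setting a business object field via reflection, can be sketched with the standard `java.lang.reflect` API. `PurchaseRecord` and `ReflectiveSetDemo` are hypothetical names invented for this illustration; the SOAP framework's own code is not shown in the slides and may differ.

```java
import java.lang.reflect.Method;
import java.util.Calendar;

// Hypothetical business object with a conventional setter.
final class PurchaseRecord {
    private Calendar purchaseDate;

    public void setPurchaseDate(Calendar purchaseDate) {
        this.purchaseDate = purchaseDate;
    }

    public Calendar getPurchaseDate() {
        return purchaseDate;
    }
}

final class ReflectiveSetDemo {
    // Sets a property by looking up its setter and invoking it reflectively.
    static void setField(Object target, String setterName, Class<?> type, Object value) {
        try {
            Method setter = target.getClass().getMethod(setterName, type);
            Object[] args = new Object[] { value }; // "box into array": one new Object[]
            setter.invoke(target, args);            // "call invoke() on setter"
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Could not call " + setterName, e);
        }
    }
}
```

A direct call, `record.setPurchaseDate(calendar)`, does the same job in one call with no extra allocation, which is the kind of gap the per-step costs on the slide make visible.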


From Trade: Diary of a Date (Java SimpleDateFormat parsing)

Detail of SimpleDateFormat parse step from previous slide

[Figure: diary of a Date inside SimpleDateFormat parsing. Zoom level: 2. (* = new objects) Transformations shown: extract and parse subfield (String x 6 for YY, MM, DD, ... -> int); set field in Calendar; compute time (Calendar -> long); create Date from time (long -> Date*). Other new objects: boolean[]*. Per-step cost labels (calls / objects), as extracted: 4 / 1, 14 / 6, 1 / 0, 0 / 1.]
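
The steps named on this slide can be lined up against code. The sketch below puts the public `SimpleDateFormat.parse(String, ParsePosition)` call next to a hand-written imitation of the listed steps (parse subfields, set Calendar fields, compute time, create Date). The pattern `yyyy-MM-dd`, the UTC time zone, and the class name are assumptions chosen for the example; the hand-written method is not the JDK's implementation.

```java
import java.text.ParsePosition;
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

final class DateParseDemo {
    private static final TimeZone UTC = TimeZone.getTimeZone("UTC");

    // The library route: one public call that hides the steps listed on the slide.
    static Date parseWithSimpleDateFormat(String text) {
        SimpleDateFormat format = new SimpleDateFormat("yyyy-MM-dd", Locale.US);
        format.setTimeZone(UTC);
        ParsePosition position = new ParsePosition(0);
        Date date = format.parse(text, position); // null if the text does not match
        if (date == null) {
            throw new IllegalArgumentException("Unparseable date: " + text);
        }
        return date;
    }

    // A hand-written imitation of the same steps, for illustration only.
    static Date parseByHand(String text) {
        Calendar calendar = Calendar.getInstance(UTC, Locale.US);
        calendar.clear();
        // extract and parse each subfield (String -> int)
        int year = Integer.parseInt(text.substring(0, 4));
        int month = Integer.parseInt(text.substring(5, 7));
        int day = Integer.parseInt(text.substring(8, 10));
        // set field in Calendar
        calendar.set(Calendar.YEAR, year);
        calendar.set(Calendar.MONTH, month - 1); // Calendar months are zero-based
        calendar.set(Calendar.DAY_OF_MONTH, day);
        long millis = calendar.getTimeInMillis(); // compute time
        return new Date(millis);                  // create Date from time
    }
}
```

Both methods yield the same `Date` for a well-formed input such as `2005-10-16`.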


From Trade: Diary of a year/month/day…

Detail of extract and parse subfield from previous slide

Six transformations to parse a year!

[Figure: diary of a year/month/day subfield. Zoom level: 3. (* = new objects) Group labels: "Parse number using DecimalFormat.parse()" and "Parse long using DigitList.getLong()". Data flow: String -> extract digits -> DigitList -> copy digits -> StringBuffer* -> toString() -> String* -> parse -> long -> box -> Long* -> intValue() -> int. Other new objects: ParsePosition*, boolean[]*. Cost labels, as extracted: 11 calls, 5 objects; 4 calls, 3 objects, 600 instructions; 1 call, 0 objects.]
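
To make the six-step chain tangible, here is a hand-written imitation that walks a year string through the same sequence of representations. It is a sketch under stated assumptions: a `char[]` stands in for the JDK's internal `DigitList`, the class and method names are invented, and none of this is copied from the JDK source.

```java
final class YearParseDemo {
    // Imitates the chain on the slide: String -> digits -> StringBuffer -> String -> long -> Long -> int.
    static int parseYearTheLongWay(String field) {
        // 1. extract digits: String -> digit buffer (stand-in for DigitList)
        char[] digits = new char[field.length()];
        int count = 0;
        for (int i = 0; i < field.length(); i++) {
            char ch = field.charAt(i);
            if (ch < '0' || ch > '9') {
                break;
            }
            digits[count++] = ch;
        }
        // 2. copy digits: digit buffer -> StringBuffer
        StringBuffer buffer = new StringBuffer(count);
        buffer.append(digits, 0, count);
        // 3. toString(): StringBuffer -> String
        String text = buffer.toString();
        // 4. parse: String -> long
        long value = Long.parseLong(text);
        // 5. box: long -> Long (Long.valueOf may reuse cached instances for small values)
        Long boxed = Long.valueOf(value);
        // 6. intValue(): Long -> int
        return boxed.intValue();
    }

    // The same information content, obtained in a single step.
    static int parseYearDirectly(String field) {
        return Integer.parseInt(field);
    }
}
```

Both methods return the same `int` for an input such as `"2005"`; the difference lies only in how many intermediate representations are created along the way.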


Metrics of cost and complexity

Cost: aggregate costs by transformation

– Aids understanding by measuring something accomplished.

– e.g. 268 calls, 70 objects to parse a field

Complexity: count transformations

– Shows the complexity hidden in each step

– Histogram by level shows how far afield

– e.g. 36 transformations parsing subfields

These metrics enable comparisons across diverse implementations

[Figure: the zoom level 0 diary of the stock purchase date field. Data flow: bytes (SOAP) -> Parse, set field in business object -> Calendar* (business object field) -> Copy to another version of the business object -> Date* (business object field). Cost: 268 calls, 70 objects. (* = new objects)]

Total transformations: 58

Max depth: 3

– depth 1: 8

– depth 2: 14

– depth 3: 36
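
The complexity metric, a count of transformations broken down by depth, is straightforward to compute once a diary is held as a tree. The sketch below is a minimal illustration: `Step`, `ComplexityMetrics`, and the convention that the root sits at depth 0 and is not counted are all assumptions, chosen so that the per-depth counts sum to the total in the same way as the slide's 8 + 14 + 36 = 58.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical tree node: a transformation and the sub-steps it expands into.
final class Step {
    final String name;
    final List<Step> substeps = new ArrayList<>();

    Step(String name, Step... children) {
        this.name = name;
        for (Step child : children) {
            substeps.add(child);
        }
    }
}

final class ComplexityMetrics {
    // Counts the transformations found at each depth below the root.
    static Map<Integer, Integer> histogramByDepth(Step root) {
        Map<Integer, Integer> histogram = new TreeMap<>();
        count(root, 0, histogram);
        return histogram;
    }

    private static void count(Step step, int depth, Map<Integer, Integer> histogram) {
        for (Step child : step.substeps) {
            int childDepth = depth + 1;
            Integer seen = histogram.get(childDepth);
            histogram.put(childDepth, seen == null ? 1 : seen + 1);
            count(child, childDepth, histogram);
        }
    }

    // Total number of transformations across all depths.
    static int total(Map<Integer, Integer> histogram) {
        int sum = 0;
        for (int n : histogram.values()) {
            sum += n;
        }
        return sum;
    }

    // Deepest level that contains at least one transformation (0 if none).
    static int maxDepth(Map<Integer, Integer> histogram) {
        int max = 0;
        for (int depth : histogram.keySet()) {
            max = Math.max(max, depth);
        }
        return max;
    }
}
```

A histogram that is heavy at the deepest levels indicates that most of the work happens far from the step the application actually asked for.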


Ongoing research

Validation by hand on applications (large and small examples)

Automation of structuring into diaries

– Combination of static and dynamic analysis

– Automation will also enable further validation of approach

Classification of transformations

– Developing a framework-independent vocabulary for what transformations accomplish

– e.g. various kinds of change in physical representation

– e.g. various kinds of change in logical content

– Developing metrics based on classification

– Enables “descriptive characterization” of a run

– Also gives us a more formal definition of transformation