42
Explorations in the Grid/WS Computing Jungle A Narrative of the Reverse-Engineering Attempts of a “New” Distributed Paradigm François Taïani, Matti Hiltunen & Rick Schlichting Opera Talk, 6 May 2008, Comp. Dept., Cambridge University

Explorations in the Grid/WS Computing Jungle · 5 Globus Reference Implementation of the Grid Standards. Developed by the “Globus alliance”, a partnership around Argonne National

  • Upload
    others

  • View
    3

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Explorations in the Grid/WS Computing Jungle · 5 Globus Reference Implementation of the Grid Standards. Developed by the “Globus alliance”, a partnership around Argonne National

Explorations in theGrid/WS Computing Jungle

A Narrative of the Reverse-Engineering Attemptsof a “New” Distributed Paradigm

François Taïani, Matti Hiltunen & Rick Schlichting

Opera Talk, 6 May 2008, Comp. Dept., Cambridge University

Page 2: Explorations in the Grid/WS Computing Jungle · 5 Globus Reference Implementation of the Grid Standards. Developed by the “Globus alliance”, a partnership around Argonne National

2

The best and safest method of philosophizingseems to be, first to inquire diligently into theproperties of things, and to establish thoseproperties by experiences and then to proceedmore slowly to hypotheses for the explanation ofthem. For hypotheses should be employed only inexplaining the properties of things, but not assumedin determining them; unless so far as they mayfurnish experiments.

Isaac Newton

Page 3: Explorations in the Grid/WS Computing Jungle · 5 Globus Reference Implementation of the Grid Standards. Developed by the “Globus alliance”, a partnership around Argonne National

3

Context: Middleware Complexity

one request,2065 individual invocations,over 50 C-functions and 140C++ classes.

ORBacus RequestProcessing

Page 4: Explorations in the Grid/WS Computing Jungle · 5 Globus Reference Implementation of the Grid Standards. Developed by the “Globus alliance”, a partnership around Argonne National

4

Web Services’ Bright New World

Grid Computing: federating resourcesWeb Services: integrating services

Globus (Argonne Lab.): reference implementation

Large, complex, collaborative middleware (IBM, Apache,...)

Very poor performances: Up to 30s to create a simple distributed object (counter) Up to 2s for a roundtrip remote add operation on this counter

over the Internet

Where does it come from? Does it tell us something about modern mw development?

Page 5: Explorations in the Grid/WS Computing Jungle · 5 Globus Reference Implementation of the Grid Standards. Developed by the “Globus alliance”, a partnership around Argonne National

5

Globus

Reference Implementation of the Grid Standards. Developed by the “Globus alliance”, a partnership around

Argonne National Laboratory.

Globus is a high level “middleware” (software glue) It offers services to share/ use remote distributed “resources”

(CPU, memory, DB, bandwidth)

Since version 3.9.x use Web Service “connectivity” Web Services: “connectivity” middleware across the Internet Integration of services across organizations Related alphabet soup: SOAP, XML, WSDL, HTTP, .NET, etc.

Page 6: Explorations in the Grid/WS Computing Jungle · 5 Globus Reference Implementation of the Grid Standards. Developed by the “Globus alliance”, a partnership around Argonne National

6

Exploration Goals

We wanted to understand Globus (at least its connectivity)

Huge piece of software (3.9.x): 123,839 lines in Java (without reused libraries) 1,908,810 lines in C/C++ (including reused libraries)

Many libraries / technologies involved: XML, WSDL (Descr. Lang), WSRF (Resource Framework) Axis (Java to SOAP), Xerces (XML Parsing), com.ibm.wsdl

How to understand that?

A typical reverse engineering problem

Page 7: Explorations in the Grid/WS Computing Jungle · 5 Globus Reference Implementation of the Grid Standards. Developed by the “Globus alliance”, a partnership around Argonne National

7

Methodology I + First Resultscreate resource

subscribe to changes

add 3

notify ×4

destroy resource

client container

Java VMtracing

execution traces

First attempt: tracing everything (outside the JVM libs) client : 1,544,734 local method call (sic) server : 6,466,652 local method calls (sic) [+time out]

How to visualize such results?

Page 8: Explorations in the Grid/WS Computing Jungle · 5 Globus Reference Implementation of the Grid Standards. Developed by the “Globus alliance”, a partnership around Argonne National

8

Program Visualization: a few Notions

Problem studied for quite a long time now.

Different aspect : collection, manipulation, visualization.

Visualization some form of projection (many proposed).

Our goal: understand software structure:

lib 1

lib 2

lib 3

lib1.Wale .breathlib1.Mammal.inhalelib2.Lung .inhalelib2.Muscle.contractlib2.Nerve .transmitlib3.Signal.travel

GrowingCallStack

Tracing calls reveals the software structure.

Page 9: Explorations in the Grid/WS Computing Jungle · 5 Globus Reference Implementation of the Grid Standards. Developed by the “Globus alliance”, a partnership around Argonne National

9

lib1.Wale .breath

lib1.Mammal.inhale

lib2.Lung .inhale

lib2.Muscle.contract

lib2.Nerve .transmit

lib3.Signal.travel

Methodology I

lib3.Blood .flow

lib3.Pressure.foo

0%20%40%60%80%

100%

1 2 3 4 5 6Stack Depth

Cal

ls lib3lib2lib1

0

1

2

3

1 2 3 4 5 6Stack Depth

Cal

ls lib3lib2lib1

Aggregates invocations of the same library. Chart w.r.t. position in call stack.

A call graph obtained by tracing

lib2.Muscle.stop

lib2.Nerve .transmit

lib3.Signal.travel

Page 10: Explorations in the Grid/WS Computing Jungle · 5 Globus Reference Implementation of the Grid Standards. Developed by the “Globus alliance”, a partnership around Argonne National

10

Methodology I

lib 1

lib 2

lib 3

Package Activity vs. Stack Depth

Software Structure

Page 11: Explorations in the Grid/WS Computing Jungle · 5 Globus Reference Implementation of the Grid Standards. Developed by the “Globus alliance”, a partnership around Argonne National

11

Package Activity vs. Stack Depth

0

20,000

40,000

60,000

80,000

100,000

120,000

140,000

160,000

180,0001 4 7 10 13 16 19 22 25 28 31 34 37 40 43 46 49 52 55

Stack Depth

Cal

ls

org.apache.xerces org.apache.xml org.apache.axis org.apache.log4j org.apache.xpath org.apache.commons com.ibm.wsdlothers

(client, 1 creation, 4 requests, 1 destruction)

Page 12: Explorations in the Grid/WS Computing Jungle · 5 Globus Reference Implementation of the Grid Standards. Developed by the “Globus alliance”, a partnership around Argonne National

12

0

20,000

40,000

60,000

80,000

100,000

120,000

140,000

160,000

180,0001 4 7 10 13 16 19 22 25 28 31 34 37 40 43 46 49 52 55

Stack Depth

Cal

ls

org.apache.xerces org.apache.xml org.apache.axis org.apache.log4j org.apache.xpath org.apache.commons com.ibm.wsdlothers

(client, 1 creation, 4 requests, 1 destruction)

0

1

2

3

1 2 3 4 5 6Stack Depth

Cal

ls lib3lib2lib1

Package Activity vs. Stack Depth

Page 13: Explorations in the Grid/WS Computing Jungle · 5 Globus Reference Implementation of the Grid Standards. Developed by the “Globus alliance”, a partnership around Argonne National

13

Package Activity vs. Stack Depth

0

20,000

40,000

60,000

80,000

100,000

120,000

140,000

160,000

180,0001 4 7 10 13 16 19 22 25 28 31 34 37 40 43 46 49 52 55

Stack Depth

Cal

ls

org.apache.xerces org.apache.xml org.apache.axis org.apache.log4j org.apache.xpath org.apache.commons com.ibm.wsdlothers

(client, 1 creation, 4 requests, 1 destruction)

Looks better, but is the same!

89% of invocations (1,372,534) due to XML!

Page 14: Explorations in the Grid/WS Computing Jungle · 5 Globus Reference Implementation of the Grid Standards. Developed by the “Globus alliance”, a partnership around Argonne National

14

0%

20%

40%

60%

80%

100%1 4 7 10 13 16 19 22 25 28 31 34 37 40 43 46 49 52 55

Stack Depth

Cal

ls

org.apache.xerces org.apache.xml org.apache.axis org.apache.log4j org.apache.xpath org.apache.commons com.ibm.wsdlothers

0%20%40%60%80%

100%

1 2 3 4 5 6Stack Depth

Cal

ls lib3lib2lib1

Package Activity vs. Stack Depth

(client, percentage view)

Page 15: Explorations in the Grid/WS Computing Jungle · 5 Globus Reference Implementation of the Grid Standards. Developed by the “Globus alliance”, a partnership around Argonne National

15

Package Activity vs. Stack Depth

0%

20%

40%

60%

80%

100%1 4 7 10 13 16 19 22 25 28 31 34 37 40 43 46 49 52 55

Stack Depth

Cal

ls

org.apache.xerces org.apache.xml org.apache.axis org.apache.log4j org.apache.xpath org.apache.commons com.ibm.wsdlothers

(client, percentage view)

XML used by org.apache.axis, not by Globus!

Very long stack probably recursive parsing!

Page 16: Explorations in the Grid/WS Computing Jungle · 5 Globus Reference Implementation of the Grid Standards. Developed by the “Globus alliance”, a partnership around Argonne National

16

What does it tell us?

Most of the local invocations (89%) are related to XMLparsing (org.apache.xerces, org.apache.xml).

The parsing is used by Axis (the SOAP/Java bridge fromthe apache foundation), not directly by Globus.

The very long stacks we observe (up to 57 frames!) mostprobably reflect some recursive parsing loop, rather thanthe program structure.

Similar findings on the server side (only more dramatic,stack depth up to 108 (sic), 4 times more invocations).

Page 17: Explorations in the Grid/WS Computing Jungle · 5 Globus Reference Implementation of the Grid Standards. Developed by the “Globus alliance”, a partnership around Argonne National

17

New Questions

More insight needed:

Does invocation count reflect real performance?

How “bad” is really the platform?

Can we do the same kind of “structural” projection ofprofiling data?

If yes, is it useful?

Our choice: 2 step approach (1) Black box profiling (2) “Internal” profiling using sampling

Page 18: Explorations in the Grid/WS Computing Jungle · 5 Globus Reference Implementation of the Grid Standards. Developed by the “Globus alliance”, a partnership around Argonne National

18

Chosen Approach

2 steps:1. Black box profiling: minimal interferences. Coarse results.2. Sample based profiling: less accurate but more detailed.

We focused on the connectivity of the WSRFimplementation of GT4-Java: Low level “plumbing”. No high level service involved Motivation: profile the founding bricks of the Globus platform

Experimental set-up: Standalone SMP server running 4 Intel Xeon @ 1.6GHz No network cost involved! Avoids context switching overhead! Globus 3.9.4 used (last GT4 alpha release, released Dec.04)

Page 19: Explorations in the Grid/WS Computing Jungle · 5 Globus Reference Implementation of the Grid Standards. Developed by the “Globus alliance”, a partnership around Argonne National

19

client

clock instrumentation

Black-Box Profiling: Approach Black Box Approach: Measure externally visible latencies

Many different situations to be considered!

container

create

add 3

notify 3

destroy ×5

influence of resource init

×5

influence of client init

×5

influence of container init

×50

aver

agin

g

(10 000 invocations)

subscribe

Page 20: Explorations in the Grid/WS Computing Jungle · 5 Globus Reference Implementation of the Grid Standards. Developed by the “Globus alliance”, a partnership around Argonne National

20

Resource Set-Up

×5×5

subscribecreate

client cont.

Page 21: Explorations in the Grid/WS Computing Jungle · 5 Globus Reference Implementation of the Grid Standards. Developed by the “Globus alliance”, a partnership around Argonne National

21

Resource Set-UpContainer initoverhead(~8.2s!)

Client initoverhead(~24.8s!)

High lazy initialization costs! (> 30s!)

Stabilized latency remains high (380ms)

Page 22: Explorations in the Grid/WS Computing Jungle · 5 Globus Reference Implementation of the Grid Standards. Developed by the “Globus alliance”, a partnership around Argonne National

22

First Notification

×5×5client cont.

1st notify

Page 23: Explorations in the Grid/WS Computing Jungle · 5 Globus Reference Implementation of the Grid Standards. Developed by the “Globus alliance”, a partnership around Argonne National

23

First NotificationContainer initoverhead(~430ms)

Client initoverhead (~1.4s!)

Stabilizedlatency(~1.1s!)

Page 24: Explorations in the Grid/WS Computing Jungle · 5 Globus Reference Implementation of the Grid Standards. Developed by the “Globus alliance”, a partnership around Argonne National

24

Second Notification

Lazy initialization everywhere

×5×5client cont.

2nd notify

Resource initoverhead(~930ms!)

Stabilizedlatency 1st

notification(~1.1s)

Stabilized request latency still high (170ms)

Page 25: Explorations in the Grid/WS Computing Jungle · 5 Globus Reference Implementation of the Grid Standards. Developed by the “Globus alliance”, a partnership around Argonne National

25

Internal Profiling: Basics Profiling data obtained through sampling

(SUN hprof basic profiler on Java 1.4) JVM periodically stopped. Stack of active thread is captured. Result : A set of weighted stack traces. Weight = measures

how often the stack was observed.

Visualization:Set of weight stacks = multi-dimensional object Time (represented by weights) Threads: each trace belongs to a thread Control flow (represented by stacks, reflects use relationships) Code Structure (package organization, class hierarchy, etc.)

Projection (aggregation / collapsing) required

Many possibility. Our choice: code structure + stack depth

Page 26: Explorations in the Grid/WS Computing Jungle · 5 Globus Reference Implementation of the Grid Standards. Developed by the “Globus alliance”, a partnership around Argonne National

26

Methodology IIIlib1.Wale .breath

lib1.Mammal.inhale

lib2.Lung .inhale

lib2.Muscle.contract

lib2.Nerve .transmit

lib3.Signal.travel

lib3.Blood .flow

lib3.Pressure.foo

lib2.Muscle.stop

lib2.Nerve .transmit

lib3.Signal.travel

Sampling yields a set of weighted stack traces (weight reflects time spent)

×3 ×1 ×2

0123456

1 2 3 4 5 6Stack Depth

Tim

e U

nits lib3

lib2lib1

Aggregates invocations of the same library. Chart w.r.t. position in call stack.

EXCLUSIVE

0123456

1 2 3 4 5 6Stack Depth

Tim

e U

nits lib3

lib2lib1

INCLUSIVE

Page 27: Explorations in the Grid/WS Computing Jungle · 5 Globus Reference Implementation of the Grid Standards. Developed by the “Globus alliance”, a partnership around Argonne National

27

Experimental Set-Up

containerclient

createsubscribe

add 3notify 3

destroy ×5×5

Java VMhprof

profiling data

Page 28: Explorations in the Grid/WS Computing Jungle · 5 Globus Reference Implementation of the Grid Standards. Developed by the “Globus alliance”, a partnership around Argonne National

28

Sharp drop atlength 13

Container Profiling: Results

Some very deep traces. Lookquite regular beyond depth28 (recursion?)org.apache.axis predominant

Layered structuredfor upper stackdepths

Busy waiting related tonotification management.Outside request critical path.

Page 29: Explorations in the Grid/WS Computing Jungle · 5 Globus Reference Implementation of the Grid Standards. Developed by the “Globus alliance”, a partnership around Argonne National

29

subscribe

notify 3

+ extra granularity to observe packageorg.apache.axis

New Experimental Set-Up

containerclient

create

add 3

destroy ×5×5

Java VMhprof

profiling data

Page 30: Explorations in the Grid/WS Computing Jungle · 5 Globus Reference Implementation of the Grid Standards. Developed by the “Globus alliance”, a partnership around Argonne National

30

This is a recursion inorg.apache.wsdl.symbolTable(web services).Symbol management issue?

Traces of length 13 havedisappeared. They werecaused by the notificationmanagement.

org.globus.wsrf

org.globus.gsi (security)

sun.reflect (reflection)

New Results

Page 31: Explorations in the Grid/WS Computing Jungle · 5 Globus Reference Implementation of the Grid Standards. Developed by the “Globus alliance”, a partnership around Argonne National

31

Profiling Breakdown

Abstracts away low level packages (java.*, etc.)

Sample breakdown among “higher level” packages:Package Name Samples %

org.apache.axis.wsdl 231 21% org.apache.axis.encoding 66 6% org.apache.axis (others) 113 10% org.globus.gsi 249 23% org.globus.wsrf 49 4% cryptix.provider.rsa 82 7% org.apache.xerces 78 7% others 237 21%

Page 32: Explorations in the Grid/WS Computing Jungle · 5 Globus Reference Implementation of the Grid Standards. Developed by the “Globus alliance”, a partnership around Argonne National

32

Profiling Breakdown

Abstracts away low level packages (java.*, etc.)

Sample breakdown among “higher level” packages:Package Name Samples %

org.apache.axis.wsdl 231 21% org.apache.axis.encoding 66 6% org.apache.axis (others) 113 10% org.globus.gsi 249 23% org.globus.wsrf 49 4% cryptix.provider.rsa 82 7% org.apache.xerces 78 7% others 237 21%

Symbol management issue?

Page 33: Explorations in the Grid/WS Computing Jungle · 5 Globus Reference Implementation of the Grid Standards. Developed by the “Globus alliance”, a partnership around Argonne National

33

Profiling Breakdown

Abstracts away low level packages (java.*, etc.)

Sample breakdown among “higher level” packages:Package Name Samples %

org.apache.axis.wsdl 231 21% org.apache.axis.encoding 66 6% org.apache.axis (others) 113 10% org.globus.gsi 249 23% org.globus.wsrf 49 4% cryptix.provider.rsa 82 7% org.apache.xerces 78 7% others 237 21%

SOAP + XML: 44%

Page 34: Explorations in the Grid/WS Computing Jungle · 5 Globus Reference Implementation of the Grid Standards. Developed by the “Globus alliance”, a partnership around Argonne National

34

Profiling Breakdown

Abstracts away low level packages (java.*, etc.)

Sample breakdown among “higher level” packages:Package Name Samples %

org.apache.axis.wsdl 231 21% org.apache.axis.encoding 66 6% org.apache.axis (others) 113 10% org.globus.gsi 249 23% org.globus.wsrf 49 4% cryptix.provider.rsa 82 7% org.apache.xerces 78 7% others 237 21%

Security / Cryptography: 30%

Page 35: Explorations in the Grid/WS Computing Jungle · 5 Globus Reference Implementation of the Grid Standards. Developed by the “Globus alliance”, a partnership around Argonne National

35

Many Other Visualisation Ways

0

100

200

300

400

500

600

700

800Th

read

-3Th

read

-2Th

read

-5Th

read

-9Th

read

-20

Thre

ad-0

Thre

ad-1

0Th

read

-22

Thre

ad-1

1Th

read

-7Th

read

-17

Thre

ad-1

4Th

read

-15

Thre

ad-2

5Th

read

-6Th

read

-21

Thre

ad-1

9Th

read

-28

Thre

ad-2

4Th

read

-29

Thre

ad-1

6Th

read

-18

Thre

ad-2

6Th

read

-13

Thre

ad-2

7Th

read

-8m

ain

Thre

ad-1

2Th

read

-23

Thre

ad-1

Thre

ad-4

Ref

eren

ce H

andl

erpr

oces

s re

aper

Tim

e U

nits

0

5

10

15

20

25

30

Ave

rage

Sta

ck D

epth

Time UnitsAverage Stack Depth

(v3.9.2)

Page 36: Explorations in the Grid/WS Computing Jungle · 5 Globus Reference Implementation of the Grid Standards. Developed by the “Globus alliance”, a partnership around Argonne National

36

Summing Up Globus

Lazy optimisation: very high latency on first invocation ofoperations (up to 30s to set up a resource on a new container!)

Stabilized latencies still high: ~ 160ms for a round triprequest (with authentication turned on)

No clear culprit. Technology overload:WSDL, SOAP, security

Is lazy optimisation a problem? Yes and No.

Page 37: Explorations in the Grid/WS Computing Jungle · 5 Globus Reference Implementation of the Grid Standards. Developed by the “Globus alliance”, a partnership around Argonne National

37

Longer Term

Platform and technology come and goGlobus is a moving target

But experimental approaches stay

And so do development practices

Page 38: Explorations in the Grid/WS Computing Jungle · 5 Globus Reference Implementation of the Grid Standards. Developed by the “Globus alliance”, a partnership around Argonne National

38

Middleware Practices:Are we doomed?

Lazy optimization flexibility paradox

Poor performance abstraction leaking

Reverse engineering Frankenstein’s return?

Can Cognitive-based Middleware save us?

API are for real beings!

Page 39: Explorations in the Grid/WS Computing Jungle · 5 Globus Reference Implementation of the Grid Standards. Developed by the “Globus alliance”, a partnership around Argonne National

39

To look further

Globus profiling The Impact of Web Service Integration on Grid Performance,

François Taïani, Matti Hiltunen, Rick Schlichting, HPDC-14,2005

Large graph reverse engineering CosmOpen: Dynamic reverse-engineering on a budget,

Francois Taiani, Marc-Olivier Killijian, Jean-Charles Fabre, TR COMP-002-2008, Lancaster University, 2008

http://ftaiani.ouvaton.org/7-software/index.html#CosmOpen

Next Generation Middleware Group at Lancaster

Page 40: Explorations in the Grid/WS Computing Jungle · 5 Globus Reference Implementation of the Grid Standards. Developed by the “Globus alliance”, a partnership around Argonne National

40

The End(Thank you)

Page 41: Explorations in the Grid/WS Computing Jungle · 5 Globus Reference Implementation of the Grid Standards. Developed by the “Globus alliance”, a partnership around Argonne National

41

Package Activity vs. “Progress”

0%

10%

20%

30%

40%

50%

60%

70%

80%

90%

100%

1 6 11 16 21 26 31 36 41 46 51 56 61 66 71 76 81 86 91 96

Progress (% of Local Calls)

% o

f C

alls

in P

rogr

ess

Slic

e .

org.apache.xerces com.ibm.wsdl org.apache.xpath org.apache.xml org.apache.axis org.apache.log4j org.apache.commons org.xml.sax org.globus.wsrf org.apache.namingothersRemote Calls

client(1 creation, 4 requests, 1 destruction)

Page 42: Explorations in the Grid/WS Computing Jungle · 5 Globus Reference Implementation of the Grid Standards. Developed by the “Globus alliance”, a partnership around Argonne National

42

124

0

10

20

30

40

50

60

70

80

90

100

1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49

Stack Depth

Hits

java.netjava.securitysun.security.providerorg.apache.axisothersjava.util.zipjava.langorg.apache.xerces

1124

666.

Profiling Results (Exclusive, Server)