Java-Based Parallel Computing on the Internet: Javelin 2.0 & Beyond

Michael Neary & Peter Cappello, Computer Science, UCSB

Page 2: Introduction: Goals

• Service parallel applications that are:
  – Large: too big for a cluster
  – Coarse-grain: to hide communication latency

• Simplicity of use
  – Design focus: decomposition [composition] of computation

• Scalable high performance
  – despite large communication latency

• Fault tolerance
  – 1000s of hosts, each of which dynamically associates and disassociates

Page 3: Introduction: Some Related Work

Page 4: Introduction: Some Applications

• Search for extra-terrestrial life

• Computer-generated animation

• Computer modeling of drugs for:
  – Influenza
  – Cancer
  – Reducing chemotherapy’s side-effects

• Financial modeling

• Storing nuclear waste

Page 5: Outline

• Architecture

• Model of Computation

• API

• Scalable Computation

• Experimental Results

• Conclusions & Future Work

Page 6: Architecture: Basic Components

• Brokers

• Clients

• Hosts

Page 7: Architecture: Broker Discovery

[Figure: a network of brokers (B), the Broker Naming System, and a host (H)]

Page 10: Architecture: Broker Discovery

[Figure: brokers (B), the Broker Naming System, and a host (H), with the message PING(BID?)]

Page 12: Architecture: Network of Broker-Managed Host Trees

• Each broker manages a tree of hosts

Page 15: Architecture: Network of Broker-Managed Host Trees

• Brokers form a network

• Client contacts broker

• Client gets host trees
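The slides do not say how a broker shapes its tree of hosts. One simple possibility, assumed here purely for illustration, is a complete tree stored in array (heap) order: a joining host takes the next free slot, and parent and child links are computed from its index, which keeps the tree balanced as hosts arrive. The class `HostTree`, its methods, and the `fanOut` parameter are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch: a broker's host tree kept as a complete tree in
 * array (heap) order. Parent/child links follow from a host's index.
 * The branching factor (fanOut) is an assumption, not taken from Javelin.
 */
final class HostTree {
    private final int fanOut;                        // assumed children per host
    private final List<String> hosts = new ArrayList<>();

    HostTree(int fanOut) {
        if (fanOut < 1) throw new IllegalArgumentException("fanOut must be >= 1");
        this.fanOut = fanOut;
    }

    /** A newly joining host takes the next free slot; returns its index. */
    int join(String hostId) {
        hosts.add(hostId);
        return hosts.size() - 1;
    }

    /** Index of the parent, or -1 for the root of the tree. */
    int parent(int i) {
        return i == 0 ? -1 : (i - 1) / fanOut;
    }

    /** Indices of the children that are currently present. */
    List<Integer> children(int i) {
        List<Integer> result = new ArrayList<>();
        for (int k = 1; k <= fanOut; k++) {
            int c = fanOut * i + k;
            if (c < hosts.size()) result.add(c);
        }
        return result;
    }

    String host(int i) {
        return hosts.get(i);
    }

    int size() {
        return hosts.size();
    }
}
```

Repairing the tree when a host leaves (for example, by moving the last host into the vacated slot) is not shown.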

Page 16: Scalable Computation: Deterministic Work-Stealing Scheduler

[Figure: each HOST has a task container supporting addTask( task ), getTask( ), and stealTask( )]
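A minimal Java sketch of the per-host task container, assuming the common work-stealing convention that the owning host adds and removes tasks at one end of a deque while other hosts steal from the opposite end. The method names `addTask`, `getTask`, and `stealTask` come from the slide; the class name and the choice of ends are assumptions.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Sketch of a host's task container (a deque). The owner works LIFO at the
 * tail; thieves steal FIFO from the head. This end-selection policy is an
 * assumption, not taken from the slides.
 */
final class TaskContainer<T> {
    private final Deque<T> deque = new ArrayDeque<>();

    /** Owner adds a newly created task at the tail. */
    synchronized void addTask(T task) {
        deque.addLast(task);
    }

    /** Owner takes its most recently added task; null if empty. */
    synchronized T getTask() {
        return deque.pollLast();
    }

    /** Another host takes the oldest task; null if empty. */
    synchronized T stealTask() {
        return deque.pollFirst();
    }

    synchronized boolean isEmpty() {
        return deque.isEmpty();
    }
}
```

In a branch & bound tree, the oldest task in a deque tends to be the one closest to the root, so stealing from the head tends to move larger pieces of work per steal.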

Page 17: Scalable Computation: Deterministic Work-Stealing Scheduler

Task getWork( ) {
    if ( my deque has a task ) return task;
    else if ( any child has a task ) return child’s task;
    else return parent.getWork( );
}

[Figure: a CLIENT at the root of a tree of HOSTS]
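The getWork( ) pseudocode can be sketched over an in-memory tree of hosts. This is a single-JVM illustration, not Javelin's networked implementation; `HostNode`, `localTask`, and the rule of trying children in a fixed list order (which keeps the choice deterministic) are assumptions.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/**
 * Hypothetical in-memory host tree node. The root (parent == null) plays
 * the role of the client. Children are consulted in insertion order.
 */
class HostNode<T> {
    final Deque<T> deque = new ArrayDeque<>();
    final List<HostNode<T>> children = new ArrayList<>();
    final HostNode<T> parent;                       // null at the root

    HostNode(HostNode<T> parent) {
        this.parent = parent;
        if (parent != null) parent.children.add(this);
    }

    /** The owner takes from the tail of its own deque. */
    synchronized T localTask() {
        return deque.pollLast();
    }

    /** Other hosts steal from the head. */
    synchronized T stealTask() {
        return deque.pollFirst();
    }

    /** My deque first, then my children in a fixed order, then my parent. */
    T getWork() {
        T task = localTask();
        if (task != null) return task;
        for (HostNode<T> child : children) {
            task = child.stealTask();
            if (task != null) return task;
        }
        return parent == null ? null : parent.getWork();
    }
}
```

With this fixed order, an idle host checks itself, then its children, and only then escalates toward the client; null means no host on that path, or among the children it consulted, has work.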

Page 18: Models of Computation

• Master-slave
  – As far as we know, the model of all proposed commercial applications

• Branch & bound optimization
  – A generalization of master-slave

Page 19: Models of Computation: Branch & Bound

[Figure: a search tree whose nodes are labeled with lower bounds. Root: 0. Level 1: 2, 7. Level 2: 3, 6, 10, 8. Leaves: 3, 4, 8, 7, 12, 10, 9, 10.]

Pages 19 to 25 animate a depth-first search of this tree:

• Page 19: start at the root. UPPER = ∞, LOWER = 0

• Page 20: visit the node with bound 2. UPPER = ∞, LOWER = 2

• Page 21: visit the node with bound 3. UPPER = ∞, LOWER = 3

• Page 22: reach the leaf with cost 4, the first complete solution. UPPER = 4, LOWER = 4

• Page 23: reach the leaf with cost 3, a better solution. UPPER = 3, LOWER = 3

• Page 24: visit the node with bound 6; since 6 ≥ UPPER, it is killed. UPPER = 3, LOWER = 6

• Page 25: visit the node with bound 7; it is killed as well. UPPER = 3, LOWER = 7
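The search traced on these pages can be reproduced with a small depth-first branch & bound in the style of the compute() method on the API pages: an explicit stack, and children killed when their bound is not below UPPER. The tree is encoded in heap order from our reading of the figure; the class and array names are illustrative.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Depth-first branch & bound on the example tree (illustrative encoding). */
final class BranchAndBoundExample {
    // Lower bounds in heap order: node i has children 2i+1 and 2i+2.
    static final int[] BOUND = {0, 2, 7, 3, 6, 10, 8, 3, 4, 8, 7, 12, 10, 9, 10};

    /** A node without children is a complete solution; its bound is its cost. */
    static boolean isComplete(int node) {
        return 2 * node + 1 >= BOUND.length;
    }

    /** Returns the cost of an optimal complete solution. */
    static int solve() {
        int upper = Integer.MAX_VALUE;               // UPPER = infinity: no solution yet
        Deque<Integer> stack = new ArrayDeque<>();
        stack.push(0);
        while (!stack.isEmpty()) {
            int node = stack.pop();
            if (BOUND[node] >= upper) continue;      // UPPER improved since the push: kill
            if (isComplete(node)) {
                upper = BOUND[node];                 // strictly better complete solution
            } else {
                // Push the right child first so the left subtree is searched first.
                for (int child = 2 * node + 2; child >= 2 * node + 1; child--) {
                    if (BOUND[child] < upper) stack.push(child); // else killed implicitly
                }
            }
        }
        return upper;
    }
}
```

This sketch reaches the leaf with cost 3 before the leaf with cost 4, so it never records UPPER = 4, but it ends with the same optimum, 3, and likewise kills the subtrees rooted at the nodes with bounds 6 and 7.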

Page 26: Models of Computation: Branch & Bound

• Tasks created dynamically

• Upper bound is shared

• To detect termination, the scheduler detects tasks that have been:
  – Completed
  – Killed (“bounded”)

[Figure: the explored part of the search tree, nodes 0; 2, 7; 3, 6; 3, 4]
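One way a scheduler might detect termination when tasks are created dynamically is to track the task tree: a task is settled when it is reported completed or killed, or when it has branched and all of its live children are settled, and the computation is over when the root is settled. The sketch assumes string task ids and that a task's branch report reaches the scheduler before any report about its children; both are assumptions, as is the class name `TerminationDetector`.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical termination detector for a dynamically growing task tree. */
final class TerminationDetector {
    private final String rootId;
    private final Map<String, String> parentOf = new HashMap<>();
    private final Map<String, Integer> pendingChildren = new HashMap<>(); // branched tasks
    private final Set<String> settled = new HashSet<>();

    TerminationDetector(String rootId) {
        this.rootId = rootId;
    }

    /** A task branched; list only the children that survived the bound test. */
    synchronized void branched(String taskId, String... liveChildren) {
        if (settled.contains(taskId) || pendingChildren.containsKey(taskId)) return; // duplicate
        if (liveChildren.length == 0) {              // every child was killed
            settle(taskId);
            return;
        }
        pendingChildren.put(taskId, liveChildren.length);
        for (String child : liveChildren) parentOf.put(child, taskId);
    }

    /** A task finished its atomic computation, or was killed by the bound. */
    synchronized void completedOrKilled(String taskId) {
        // Duplicate reports (e.g. from re-assigned copies) and branched tasks are ignored.
        if (settled.contains(taskId) || pendingChildren.containsKey(taskId)) return;
        settle(taskId);
    }

    synchronized boolean isDone() {
        return settled.contains(rootId);
    }

    /** Mark a task settled, then settle ancestors whose last pending child this was. */
    private void settle(String taskId) {
        settled.add(taskId);
        String parent = parentOf.get(taskId);
        while (parent != null) {
            int remaining = pendingChildren.merge(parent, -1, Integer::sum);
            if (remaining > 0) return;
            settled.add(parent);
            parent = parentOf.get(parent);
        }
    }
}
```

Ignoring duplicate reports matters once tasks can be re-assigned, since two hosts may report the same task.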

Page 27: API

public class Host implements Runnable {
    . . .
    public void run() {
        while ( (node = jDM.getWork()) != null ) {
            if ( isAtomic() )
                compute(); // search space; return result
            else {
                child = node.branch(); // put children in child array
                for (int i = 0; i < node.numChildren; i++)
                    if ( child[i].setLowerBound() < UpperBound )
                        jDM.addWork( child[i] );
                    //else child is killed implicitly
            }
        }
    }

Page 28: API

    private void compute() {
        . . .
        boolean newBest = false;
        while ( (node = stack.pop()) != null ) {
            if ( node.isComplete() ) {
                if ( node.getCost() < UpperBound ) {
                    newBest = true;
                    UpperBound = node.getCost();
                    jDM.propagateValue( UpperBound );
                    best = node; // remember the best complete solution so far
                }
            } else { // braces make this else belong to isComplete(), not to the cost test
                child = node.branch();
                for (int i = 0; i < node.numChildren; i++)
                    if ( child[i].setLowerBound() < UpperBound )
                        stack.push( child[i] );
                    //else child is killed implicitly
            }
        }
        if ( newBest ) jDM.returnResult( best );
    }
}

Page 29: Scalable Computation: Weak Shared Memory Model

• Slow propagation of the bound affects performance, not correctness.

[Figure: propagating the bound]
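Why a slowly propagated bound is harmless can be seen in a sketch of one host's copy of UPPER. Assuming, hypothetically, that each host holds its own copy and applies every incoming value with min, the copy only ever moves downward. A late message leaves it too high, so fewer children are killed and more nodes are explored, but the host never kills a subtree because of a bound that no known solution achieves. `LocalUpperBound` and its methods are illustrative names.

```java
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Sketch of a host's local copy of the shared upper bound under a weak
 * shared memory model (an assumption, not Javelin's code). Updates only
 * lower the value, so stale or reordered messages cannot make it wrong.
 */
final class LocalUpperBound {
    private final AtomicInteger value = new AtomicInteger(Integer.MAX_VALUE);

    /** Apply a locally found or remotely propagated bound; true if it improved ours. */
    boolean offer(int candidate) {
        int before = value.getAndAccumulate(candidate, Math::min);
        return candidate < before;
    }

    /** A stale (too high) value only kills fewer nodes: extra work, same answer. */
    boolean shouldKill(int lowerBound) {
        return lowerBound >= value.get();
    }

    int get() {
        return value.get();
    }
}
```

Because min is commutative and idempotent, the order in which bound messages arrive does not affect the final value, only how soon each host sees it.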

Page 34: Scalable Computation: Fault Tolerance via Eager Scheduling

When:

• All tasks have been assigned

• Some results have not been reported

• A host wants a new task

Re-assign a task!

• Eager scheduling tolerates faults & balances the load.

  – The computation completes if at least 1 host communicates with the client.
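The eager-scheduling rule fits in a few lines for a fixed set of tasks, as in the master-slave model. In this sketch (class and method names are hypothetical), tasks are handed out in order; once all are assigned, a host asking for work receives the oldest task whose result is still missing, and duplicate results are ignored.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical eager scheduler over task ids 0 .. taskCount-1. */
final class EagerScheduler {
    private final Deque<Integer> unassigned = new ArrayDeque<>();
    private final Deque<Integer> rotation = new ArrayDeque<>();  // assigned, maybe unfinished
    private final Set<Integer> unfinished = new HashSet<>();

    EagerScheduler(int taskCount) {
        for (int id = 0; id < taskCount; id++) {
            unassigned.add(id);
            unfinished.add(id);
        }
    }

    /** Called when a host wants a task; returns null once every result is in. */
    synchronized Integer nextTask() {
        if (!unassigned.isEmpty()) {
            Integer id = unassigned.poll();
            rotation.add(id);
            return id;
        }
        // All tasks assigned: re-assign the oldest task still missing a result.
        while (!rotation.isEmpty()) {
            Integer id = rotation.poll();
            if (unfinished.contains(id)) {
                rotation.add(id);                    // keep cycling until its result arrives
                return id;
            }
        }
        return null;
    }

    /** Records a result; returns false for a duplicate from a re-assigned copy. */
    synchronized boolean reportResult(int id) {
        return unfinished.remove(id);
    }

    synchronized boolean isComplete() {
        return unfinished.isEmpty();
    }
}
```

If the host holding task 1 fails, the next host that asks for work simply gets task 1 again; the same rule keeps fast hosts busy, which is how it both tolerates faults and balances the load.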

Page 35: Scalable Computation: Fault Tolerance via Eager Scheduling

• Scheduler must know which:
  – Tasks have completed
  – Nodes have been killed

• Performance balance:
  – Centralized schedule info
  – Decentralized computation

[Figure: the explored part of the search tree, nodes 0; 2, 7; 3, 6; 3, 4]

Page 36: Experimental Results

[Figure: speedup vs. number of processors (both axes 0 to 100), with series labeled graph22 and graph24 plotted against ideal speedup]

Page 37: Experimental Results

[Figure: Example of a “bad” graph]

Page 38: Conclusions

• Javelin 2 relieves the designer/programmer of managing a set of [inter]networked processors that is:
  – Dynamic
  – Faulty

• A wide set of applications is covered by:
  – Master-slave model
  – Branch & bound model

• Weak shared memory performs well.

• Use multicast (?) for:
  – Code distribution
  – Propagating values

Page 39: Future Work

• Improve support for long-lived computation:
  – Do not require that the client run continuously.

• A DAG model of computation
  – with limited weak shared memory

Page 40: Future Work: Jini/JavaSpaces Technology

[Figure: a TaskManager (aka Broker) surrounded by hosts (H)]

“Continuously” disperse tasks among brokers via a physics model

Page 41: Future Work: Jini/JavaSpaces Technology

• TaskManager uses persistent JavaSpace
  – Host management: trivial
  – Eager scheduling: simple

• No single point of failure
  – Fat tree topology

Page 42: Future Work: Advanced Issues

• Privacy of data & algorithm

• Algorithms
  – New computation-communication complexity model
  – N-body problem, …

• Accounting: Associate specific work with specific host
  – Correctness
  – Compensation (how to quantify?)

• Create open source organization
  – System infrastructure
  – Application codes