Internet-Based TSP Computation with Javelin++
Michael Neary & Peter Cappello
Computer Science, UCSB
Introduction: Goals
• Service parallel applications that are:
– Large: too big for a cluster
– Coarse-grain: to hide communication latency
• Simplicity of use
– Design focus: decomposition [composition] of computation.
• Scalable high performance
– despite large communication latency
• Fault-tolerance
– 1000s of hosts, each dynamically [dis]associates.
Introduction: Some Related Work
Introduction: Some Applications
• Search for extra-terrestrial life
• Computer-generated animation
• Computer modeling of drugs for:
– Influenza
– Cancer
– Reducing chemotherapy’s side-effects
• Financial modeling
• Storing nuclear waste
Outline
• Architecture
• Model of Computation
• API
• Scalable Computation
• Experimental Results
• Conclusions & Future Work
Architecture: Basic Components
Brokers
Clients
Hosts
Architecture: Broker Discovery
[Figure: animated diagram of brokers (B), the Broker Naming System, and a host (H). In one frame the host sends PING(BID?) to brokers.]
Architecture: Network of Broker-Managed Host Trees
• Each broker manages a tree of hosts
• Brokers form a network
• Client contacts broker
• Client gets host trees
Scalable Computation: Deterministic Work-Stealing Scheduler
[Figure: each HOST holds a task container with the operations addTask( task ), getTask( ), and stealTask( ).]
Scalable Computation: Deterministic Work-Stealing Scheduler
Task getWork( )
{
    if ( my deque has a task )
        return task;
    else if ( any child has a task )
        return child’s task;
    else
        return parent.getWork( );
}
[Figure: the CLIENT and its HOSTS.]
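The getWork( ) pseudocode above can be fleshed out as a minimal single-threaded Java sketch. The class name HostNode, its fields, and the policy of stealing the oldest task from a child are assumptions made for illustration; they are not taken from the Javelin++ source, and the sketch leaves out networking and concurrency.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical host in a tree of hosts; all names are illustrative only.
class HostNode {
    final HostNode parent;                           // null at the root of the tree
    final List<HostNode> children = new ArrayList<>();
    final Deque<String> deque = new ArrayDeque<>();  // the task container

    HostNode(HostNode parent) {
        this.parent = parent;
        if (parent != null) parent.children.add(this);
    }

    void addTask(String task) { deque.addFirst(task); }

    // The owner takes its newest task.
    String getTask() { return deque.pollFirst(); }

    // A thief takes the oldest task (assumed policy).
    String stealTask() { return deque.pollLast(); }

    // Deterministic order: own deque, then children in fixed order, then parent.
    String getWork() { return getWork(null); }

    private String getWork(HostNode asker) {
        String task = getTask();
        if (task != null) return task;
        for (HostNode child : children) {
            if (child == asker) continue;            // the asker is known to be empty
            task = child.stealTask();
            if (task != null) return task;
        }
        return parent == null ? null : parent.getWork(this);
    }
}
```

Because the search order is fixed (own deque, children, parent), two runs with the same task placement steal the same tasks, which is one plausible reading of "deterministic" here.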
Models of Computation
• Master-slave
– As far as we know, covers all proposed commercial applications
• Branch-&-bound optimization
– A generalization of master-slave.
Models of Computation: Branch & Bound
[Figure: animated branch-and-bound search tree. Node values by level, from the root down: 0; 2, 7; 3, 6, 10, 8; 3, 4, 8, 7, 12, 10, 9, 10. As the search descends, LOWER rises through 0, 2, 3, and 4. The first complete solution sets UPPER = 4, and a better one sets UPPER = 3. The nodes with LOWER = 6 and LOWER = 7 are then reached while UPPER = 3.]
Models of Computation: Branch & Bound
• Tasks created dynamically
• Upper bound is shared
• To detect termination: scheduler detects tasks that have been:
– Completed
– Killed (“bounded”)
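One simple way a scheduler could track the two outcomes listed above is to keep the set of tasks that have been created but neither completed nor killed. The class and method names below are hypothetical, and this is only a sketch of the bookkeeping, not the Javelin++ scheduler.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical bookkeeping for termination detection; names are illustrative.
class TerminationTracker {
    private final Set<Integer> outstanding = new HashSet<>();

    // Called when branching creates a task dynamically.
    void created(int taskId) { outstanding.add(taskId); }

    // Called when a task's result has been reported.
    void completed(int taskId) { outstanding.remove(taskId); }

    // Called when a task is bounded: its lower bound is not below the upper bound.
    void killed(int taskId) { outstanding.remove(taskId); }

    // True once every created task has been completed or killed.
    boolean isTerminated() { return outstanding.isEmpty(); }
}
```

The sketch assumes that a task's children are registered with created( ) before the task itself is marked completed, and that the root task is registered before isTerminated( ) is first consulted; otherwise the set could be empty while work remains.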
API
public class Host implements Runnable
{
    . . .
    public void run()
    {
        while ( (node = jDM.getWork()) != null )
        {
            if ( isAtomic() )
                compute(); // search space; return result
            else
            {
                child = node.branch(); // put children in child array
                for (int i = 0; i < node.numChildren; i++)
                    if ( child[i].setLowerBound() < UpperBound )
                        jDM.addWork( child[i] );
                    // else child is killed implicitly
            }
        }
    }
API
    private void compute()
    {
        . . .
        boolean newBest = false;

        while ( (node = stack.pop()) != null )
        {
            if ( node.isComplete() )
            {
                // a complete solution: keep it only if it beats the current bound
                if ( node.getCost() < UpperBound )
                {
                    newBest = true;
                    UpperBound = node.getCost();
                    jDM.propagateValue( UpperBound );
                    best = node; // remember the best complete node found so far
                }
            }
            else
            {
                child = node.branch();
                for (int i = 0; i < node.numChildren; i++)
                    if ( child[i].setLowerBound() < UpperBound )
                        stack.push( child[i] );
                    // else child is killed implicitly
            }
        }
        if ( newBest )
            jDM.returnResult( best );
    }
}
Scalable Computation: Weak Shared Memory Model
• Slow propagation of bound affects performance, not correctness.
[Figure: animation labeled “Propagate bound”.]
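The point that a late bound costs time but not correctness can be illustrated with a small sketch of a per-host copy of the upper bound. The class LocalBound and its methods are hypothetical names, not the Javelin++ API. The sketch assumes a minimization problem, and a host only ever lowers its copy.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical per-host copy of the shared upper bound (minimization assumed).
class LocalBound {
    private final AtomicInteger upper = new AtomicInteger(Integer.MAX_VALUE);

    // Merge a value found locally or received from another host.
    // Returns true only if it lowered the local bound.
    boolean offer(int candidate) {
        int previous = upper.getAndAccumulate(candidate, Math::min);
        return candidate < previous;
    }

    int get() { return upper.get(); }

    // A node may be pruned only if its lower bound cannot beat the known upper bound.
    boolean canPrune(int lowerBound) { return lowerBound >= upper.get(); }
}
```

A host whose copy is stale holds a value that is too high, so it prunes fewer nodes than a fully informed host would. It does extra work, but it never discards a node that could lead to a better solution. Updates that arrive out of order are also harmless, because a larger value never overwrites a smaller one.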
Scalable Computation: Fault Tolerance via Eager Scheduling
When:
• All tasks have been assigned
• Some results have not been reported
• A host wants a new task
Re-assign a task!
• Eager scheduling tolerates faults & balances the load.
– Computation completes if at least 1 host communicates with client.
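The re-assignment rule above can be sketched as follows. EagerScheduler, its integer task ids, and the round-robin choice of which unfinished task to hand out again are assumptions for illustration; the slides do not say how Javelin++ picks the task to re-assign.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical eager scheduler; task ids and method names are illustrative.
class EagerScheduler {
    private final List<Integer> tasks = new ArrayList<>();
    private final Set<Integer> done = new HashSet<>();
    private int next = 0; // round-robin cursor over the task list

    EagerScheduler(int taskCount) {
        for (int i = 0; i < taskCount; i++) tasks.add(i);
    }

    // Returns a task with no reported result yet, or -1 when all are done.
    // Once every task has been handed out, unfinished tasks are handed out again.
    int requestTask() {
        for (int tried = 0; tried < tasks.size(); tried++) {
            int id = tasks.get(next);
            next = (next + 1) % tasks.size();
            if (!done.contains(id)) return id;
        }
        return -1;
    }

    // Returns true for the first result of a task; duplicates are ignored.
    boolean reportResult(int id) { return done.add(id); }

    boolean isComplete() { return done.size() == tasks.size(); }
}
```

In this sketch a single surviving host that keeps calling requestTask( ) and reportResult( ) eventually finishes every task, which mirrors the claim that the computation completes as long as one host stays in contact. Fast hosts also pick up tasks that slow hosts have not yet finished, which is where the load balancing comes from.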
Scalable Computation: Fault Tolerance via Eager Scheduling
• Scheduler must know which:
– Tasks have completed
– Nodes have been killed
• Performance balance
– Centralized schedule info
– Decentralized computation
Experimental Results
[Figure: speedup versus number of processors, both axes running from 0 to 100, with curves for graph22, graph24, and ideal speedup.]
Experimental Results
[Figure: example of a “bad” graph, illustrated with the branch-and-bound search tree.]
Conclusions
• Javelin 2 relieves the designer/programmer of managing a set of [Inter-]networked processors that is:
– Dynamic
– Faulty
• A wide set of applications is covered by:
– Master-slave model
– Branch & bound model
• Weak shared memory performs well.
• Use multicast (?) for:
– Code distribution
– Propagating values
Future Work
• Improve support for long-lived computation:
– Do not require that the client run continuously.
• A dag model of computation
– with limited weak shared memory.
Future Work: Jini/JavaSpaces Technology
[Figure: a TaskManager (aka Broker) surrounded by hosts (H).]
• “Continuously” disperse Tasks among brokers via a physics model
Future Work: Jini/JavaSpaces Technology
• TaskManager uses persistent JavaSpace
– Host management: trivial
– Eager scheduling: simple
• No single point of failure
– Fat tree topology
Future Work: Advanced Issues
• Privacy of data & algorithm
• Algorithms
– New computational complexity model: “minimize” communication between machines
– N-body problem, …
• Accounting: associate specific work with specific host
– Correctness
– Compensation (how to quantify?)
• Create international open source organization
– System infrastructure
– Application codes
Models of Computation: Branch & Bound
[Figure: the branch-and-bound search tree shown twice, with UPPER = 3 and LOWER = 0.]
Architecture: Broker Name Service (BNS)
[Figure: a BROKER, a HOST, and the BNS, with the steps below added one at a time.]
1. Register with BNS
2. Get broker list
3. Ping brokers on list
4. Connect to selected broker
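Steps 3 and 4 can be sketched from the host's side as below. The slides do not say how a broker is selected after pinging, so the lowest round-trip-time rule is an assumption, as are the class name BrokerSelector and the convention that a negative ping value means "no answer". The ping itself is passed in as a function, so the sketch does no real networking.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.ToLongFunction;

// Hypothetical host-side broker selection; the lowest-latency rule is an assumption.
class BrokerSelector {
    // pingMillis returns a round-trip time in milliseconds,
    // or a negative value if the broker did not answer.
    static Optional<String> select(List<String> brokers, ToLongFunction<String> pingMillis) {
        String best = null;
        long bestRtt = Long.MAX_VALUE;
        for (String broker : brokers) {
            long rtt = pingMillis.applyAsLong(broker);
            if (rtt >= 0 && rtt < bestRtt) {
                best = broker;
                bestRtt = rtt;
            }
        }
        // Empty when the list is empty or no broker answered.
        return Optional.ofNullable(best);
    }
}
```

A host would call select( ) with the list obtained in step 2 and connect to the returned broker, or retry later if the result is empty.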