Fault Tolerant Grid Workflow in Water Threat Management
Master’s project / thesis seminar
Young Suk Moon
Chair: Prof. Gregor von Laszewski Reader: Observer:
Outline
Brief summary of Water Threat Management
Goal of the project
Background for my topic
Dynamic job scheduling
Fault tolerant grid systems
My ideas
Water Threat Management project
Analyzing contamination of water in urban water distribution systems
[Figure: WTM system components: Sensor Data, Optimization Engine, Middleware, Grid Resources, and a Simulation Engine (MPI) running several EPANET instances. Labels: "find the contaminant source", "find the optimal solution".]
Goal of the project
Problem of the current WTM system (MPI): not fault tolerant
All computation must restart from the beginning if a node fails
Decision: change the MPI system to a loosely coupled system
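To make the motivation concrete, here is a minimal sketch of what "loosely coupled" buys: each simulation is an independent task, so a failure only forces that one task to be resubmitted. The names `run_loosely_coupled` and `FlakySimulation` are hypothetical, and the simulated failure stands in for a real node failure; this is not the WTM implementation.

```python
from typing import Callable, Dict


def run_loosely_coupled(tasks: Dict[str, Callable[[], float]],
                        max_attempts: int = 3) -> Dict[str, float]:
    """Run independent tasks and resubmit only the ones that fail."""
    results: Dict[str, float] = {}
    for name, task in tasks.items():
        for attempt in range(1, max_attempts + 1):
            try:
                results[name] = task()
                break
            except RuntimeError:
                # Only this task is retried; finished results are kept.
                if attempt == max_attempts:
                    raise
    return results


class FlakySimulation:
    """Hypothetical stand-in for one simulation run that fails a few times."""

    def __init__(self, value: float, failures: int) -> None:
        self.value = value
        self.failures_left = failures
        self.calls = 0

    def __call__(self) -> float:
        self.calls += 1
        if self.failures_left > 0:
            self.failures_left -= 1
            raise RuntimeError("simulated node failure")
        return self.value


if __name__ == "__main__":
    sim_a = FlakySimulation(1.0, failures=0)
    sim_b = FlakySimulation(2.0, failures=1)
    print(run_loosely_coupled({"sim-a": sim_a, "sim-b": sim_b}))
    print(sim_a.calls, sim_b.calls)  # sim-a is never rerun
```

In a tightly coupled MPI run, by contrast, the failure of `sim-b` would force `sim-a` to be recomputed as well.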
Problems to solve
Run-time job scheduling
Fault tolerance
Background: Dynamic resource selection
[Figure: jobs from a job queue are assigned to machines; a "select machine" step consults a performance DB.]
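One way to picture the "select machine" step is the sketch below. It assumes the performance DB is simply a table of average job runtimes per machine and that the scheduler picks the machine with the smallest estimated finish time. Both the selection rule and the names (`Machine`, `select_machine`, `schedule`) are assumptions made for illustration, not taken from the cited schedulers.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Machine:
    name: str
    queued_jobs: int = 0


def select_machine(machines: List[Machine],
                   avg_runtime_s: Dict[str, float]) -> Machine:
    """Pick the machine expected to finish one more job the soonest."""
    def estimated_finish(m: Machine) -> float:
        # Assumed cost model: jobs on a machine run one after another.
        return (m.queued_jobs + 1) * avg_runtime_s[m.name]
    return min(machines, key=estimated_finish)


def schedule(jobs: List[int], machines: List[Machine],
             avg_runtime_s: Dict[str, float]) -> Dict[int, str]:
    """Assign each job at run time, updating queue lengths as we go."""
    assignment: Dict[int, str] = {}
    for job in jobs:
        machine = select_machine(machines, avg_runtime_s)
        machine.queued_jobs += 1
        assignment[job] = machine.name
    return assignment


if __name__ == "__main__":
    # Hypothetical performance DB: average seconds per job on each machine.
    perf_db = {"A": 10.0, "B": 25.0, "C": 45.0}
    machines = [Machine("A"), Machine("B"), Machine("C")]
    print(schedule([1, 2, 3, 4, 5], machines, perf_db))
```

A real performance DB would be updated from observed runtimes, so the selection would adapt as machines slow down or speed up.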
Background: Fault tolerance in grid
Replication
Run the same job in multiple nodes
Need more resources
Checkpoint-restart
Checkpoint server
Slow due to checkpoint overhead
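As a rough illustration of checkpoint-restart and where its overhead comes from, the sketch below saves progress to a local JSON file after every step and resumes from that file after a simulated failure. The local file stands in for a checkpoint server, and `run_with_checkpoint` and its `fail_at` parameter are hypothetical names introduced here.

```python
import json
import os
import tempfile
from typing import Optional


def run_with_checkpoint(n_steps: int, checkpoint_path: str,
                        fail_at: Optional[int] = None) -> int:
    """Sum the step indices 0..n_steps-1, checkpointing after each step.

    `fail_at` simulates a node failure just before that step runs.
    """
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            state = json.load(f)  # restart: resume from the saved state
    else:
        state = {"next_step": 0, "total": 0}

    for step in range(state["next_step"], n_steps):
        if fail_at is not None and step == fail_at:
            raise RuntimeError(f"simulated node failure at step {step}")
        state["total"] += step
        state["next_step"] = step + 1
        # The overhead: one write per step. Write to a temporary file and
        # rename it so a crash cannot leave a half-written checkpoint.
        tmp_path = checkpoint_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump(state, f)
        os.replace(tmp_path, checkpoint_path)
    return state["total"]


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "ckpt.json")
    try:
        run_with_checkpoint(10, path, fail_at=6)
    except RuntimeError as err:
        print(err)
    # The second run resumes at step 6 instead of starting over.
    print(run_with_checkpoint(10, path))
```

Checkpointing less often reduces the write overhead but increases the work lost on a failure.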
My ideas: Multiple-enqueue and Discard
[Figure: a global queue holding jobs 1 to 5, and Machines A, B, and C with their own queues A, B, and C.]
My ideas: Multiple-enqueue and Discard
[Figure: the global queue now holds jobs 6 to 10; queues A, B, and C on Machines A, B, and C hold jobs 1 to 5, with jobs enqueued in more than one queue.]
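The slides give only the name of the idea, so the following is one possible reading of "multiple-enqueue and discard", sketched under stated assumptions: each job from the global queue is copied into several machine queues, the first machine to finish a job wins, and the remaining copies are discarded. The class `MultiEnqueueScheduler`, the round-robin placement, and the `copies` parameter are all hypothetical.

```python
from collections import deque
from typing import Deque, Dict, List, Optional, Set, Tuple


class MultiEnqueueScheduler:
    def __init__(self, machines: List[str], copies: int = 2) -> None:
        if not 1 <= copies <= len(machines):
            raise ValueError("copies must be between 1 and len(machines)")
        self.machines = list(machines)
        self.copies = copies
        self.queues: Dict[str, Deque[int]] = {m: deque() for m in machines}
        self.done: Set[int] = set()
        self._next = 0  # round-robin start position (an assumed policy)

    def enqueue(self, job: int) -> None:
        """Copy the job into `copies` consecutive machine queues."""
        n = len(self.machines)
        for i in range(min(self.copies, n)):
            self.queues[self.machines[(self._next + i) % n]].append(job)
        self._next = (self._next + 1) % n

    def next_job(self, machine: str) -> Optional[int]:
        """Pop the next job for this machine, skipping finished ones."""
        queue = self.queues[machine]
        while queue:
            job = queue.popleft()
            if job not in self.done:
                return job
        return None

    def complete(self, job: int) -> None:
        """Mark the job done and discard its duplicates everywhere."""
        self.done.add(job)
        for m in list(self.queues):
            self.queues[m] = deque(j for j in self.queues[m] if j != job)

    def fail_machine(self, machine: str) -> None:
        """Remove a failed machine; copies on other machines survive."""
        del self.queues[machine]
        self.machines.remove(machine)
        self._next = self._next % len(self.machines) if self.machines else 0


def drain(s: MultiEnqueueScheduler) -> List[Tuple[str, int]]:
    """Let the surviving machines take turns until every queue is empty."""
    order: List[Tuple[str, int]] = []
    progress = True
    while progress:
        progress = False
        for m in list(s.queues):
            job = s.next_job(m)
            if job is not None:
                s.complete(job)
                order.append((m, job))
                progress = True
    return order


if __name__ == "__main__":
    s = MultiEnqueueScheduler(["A", "B", "C"], copies=2)
    for job in range(1, 6):
        s.enqueue(job)
    s.next_job("A")      # Machine A starts job 1 ...
    s.fail_machine("A")  # ... and fails before finishing it
    print(drain(s))      # job 1 still completes, on Machine B
    print(sorted(s.done))
```

With `copies=2`, any single machine failure leaves at least one copy of every job on a surviving machine, at the cost of queue slots that are later discarded.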
Issues
How many duplicated jobs to enqueue
How to allocate which jobs to which machines
How to divide jobs or input data
How to cluster nodes
Evaluation
Comparison based on the different settings
References
G. von Laszewski, K. Mahinthakumar, R. Ranjithan, D. Brill, J. Uber, K. Harrison, S. Sreepathi, and E. Zechman, “An Adaptive Cyberinfrastructure for Threat Management in Urban Water Distribution Systems,” in Proceedings of ICCS 2006, vol. 3993, 2006, pp. 401–.
S. Sreepathi, “Cyberinfrastructure for Contamination Source Characterization in Water Distribution Systems,” Master’s thesis, North Carolina State University, 2006.
G. von Laszewski, “A Loosely Coupled Metacomputer: Cooperating Job Submissions Across Multiple Supercomputing Sites,” Concurrency: Practice and Experience, vol. 11, no. 5, pp. 933–948, Dec. 1999.
L. Ramakrishnan and D. A. Reed, “Performability modeling for scheduling and fault tolerance strategies for scientific workflows,” in Proceedings of the 17th International Symposium on High Performance Distributed Computing, 2008.
S. Ayyub and D. Abramson, “GridRod - A Dynamic Runtime Scheduler for Grid Workflows,” in Proceedings of the 21st annual international conference on Supercomputing, 2007.