http://www.epcc.ed.ac.uk/sungrid
EPCC Sun Data and Compute Grids Project
Using Sun Grid Engine and Globus to Schedule Jobs Across a Combination of Local and Remote Machines
Terry Sloan
Edinburgh Parallel Computing Centre (EPCC)
Telephone: +44 131 650 5155
Email: [email protected]
Overview
The Project
Why do it?
Project Scenario
Project Goal
How?
Project Achievements
The Compute Scheduler
The Compute & Data Scheduler
The Project
Develop a Globus-enabled compute and data scheduler
Based on Grid Engine, Globus and a variety of data technologies
The Project (cont)
Partners
– Sun Microsystems
– National e-Science Centre, represented by EPCC
Timescales
– 23 months
– Start Feb 2002
– End Dec 2003
– Feb 2003 = Project Month 13 (PM13)
Why do it?
Grid Engine
– Over 20000 downloads (Nov 2002)
– Distributed Resource Management tool
– Schedules activities across networked resources
Sun classifies 3 levels of Grid
– Cluster Grid – a single team or project and their associated resources
– Enterprise Grid – multiple teams and projects but within a single organisation, facilitating collaboration across the enterprise
– Global Grid – linked Cluster and Enterprise grids, providing collaboration amongst organisations
Grid Engine meets the first two levels but by itself does not meet the third
Why do it? (cont)
Globus Toolkit
– A Grid API for connecting distributed compute and instrument resources
Integration with Globus allows Grid Engine to meet level 3
– Collaboration amongst enterprises
– Most integration efforts use Globus to submit work to Grid Engine
This project tackles the opposite problem: engineering Grid Engine on top of Globus
Why do it? (cont)
Grid Engine is concerned with compute resources
– Extend it to work with popular data and service access protocols (e.g. OGSA-DAI)
Project Scenario
Two collaborating enterprises A and B both have some machines
– Both enterprises run Grid Engine to schedule jobs
– Local demand for machines is variable
• Sometimes it exceeds supply
• Other times machines lie idle
[Figure: Enterprise A runs Grid Engine over machines a, b, c, d for Users (A); Enterprise B runs Grid Engine over machines e, f, g, h for Users (B).]
Project Scenario (cont)
Ideal Situation
– If enterprises A and B could expose some of their machines to each other across the internet through Grid Engine…
• Both A and B could enjoy throughput efficiency improvements
• Large gains when one enterprise is busy while the other is idle
[Figure: each enterprise's Grid Engine now sees both sets of machines. A: a, b, c, d plus e, f, g, h for Users (A); B: e, f, g, h plus a, b, c, d for Users (B).]
Project Goal
Final goal
– Develop a scheduler based on Grid Engine to schedule jobs across a combination of local and remote machines
– Enable jobs to access necessary data sources
– Use Globus as the Grid API to provide secure communications and transfer
Development Criteria
– Industrial strength
– Application of software engineering techniques
– Use of industry standard design and analysis tools
– Migration to OGSA-compliant Globus 3
How?
Workpackages
WP 1: Analysis of existing Grid components
– WP 1.1: UML analysis of core Globus 2.0
– WP 1.2: UML analysis of Grid Engine
– WP 1.3: UML analysis of other Globus 2.0
– WP 1.4: UML analysis of Globus 3.0
– WP 1.5: Exploration of data technologies
WP 2: Requirements Capture & Analysis
WP 3: Prototype Compute Scheduler
WP 4: Compute/Data Scheduler Design
WP 5: Compute/Data Scheduler Development
The Project Team
Project Personnel
– Terry Sloan: Project leader
– Geoff Cawood: Project architect
– Ratna Abrol: Engineering
– Thomas Seed: Engineering
– Ali Anjomshoaa: Globus 2 Analysis
– Paul Graham: Requirements Capture and Analysis
– Amy Krause: Technical reviewer
Project Review Board
– Fritz Ferstl (Sun Microsystems GmbH)
– John Barr (Sun Microsystems Ltd)
– Steven Newhouse (London e-Science Centre)
– Neil Chue Hong (EPCC)
Achievements
Publications
– D1.1 Analysis of Globus Toolkit V2.0
– D1.2 Grid Engine UML Analysis
– D2.1 Use cases and requirements
– D2.2 Questionnaire Report
– D3.1 Prototype Development: Requirements
Software
– Transfer-queue Over Globus (TOG)
Transfer-queue Over Globus (TOG) - A Compute Scheduler
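[Figure: Grid Engine at A lists machines a, b, c, d plus e; Grid Engine at B lists machines e, f, g, h plus d; the two sites are linked through Globus 2. User A and User B each use their own site's Grid Engine.]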
Integrates Grid Engine and Globus 2 to access remote resources
GE execution methods provide job submission and control
GE job context stores job-specific information, e.g. job handle
Globus GSI for security
Globus GRAM enables interaction with remote resource
GASS for small data transfer, GridFTP for large datasets
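The division of labour on this slide can be sketched in Python. This is a minimal model under stated assumptions, not the TOG implementation: `FakeGram`, `TransferQueue`, `Job`, `choose_transfer` and the 10 MB threshold are all hypothetical names and values introduced for illustration, and no real Grid Engine or Globus interface is called.

```python
from dataclasses import dataclass, field
from itertools import count


class FakeGram:
    """Hypothetical stand-in for the remote job manager reached via GRAM."""

    def __init__(self):
        self._ids = count(1)
        self.states = {}  # job handle -> current state

    def submit(self, executable):
        handle = f"remote-job-{next(self._ids)}"
        self.states[handle] = "RUNNING"
        return handle

    def signal(self, handle, new_state):
        self.states[handle] = new_state


@dataclass
class Job:
    executable: str
    context: dict = field(default_factory=dict)  # models the GE job context


class TransferQueue:
    """Models a local queue whose execution methods forward to a remote site."""

    def __init__(self, gram):
        self.gram = gram

    def start(self, job):
        # Submission: keep the remote handle in the job context for later control
        job.context["job_handle"] = self.gram.submit(job.executable)

    def suspend(self, job):
        self.gram.signal(job.context["job_handle"], "SUSPENDED")

    def resume(self, job):
        self.gram.signal(job.context["job_handle"], "RUNNING")

    def terminate(self, job):
        self.gram.signal(job.context["job_handle"], "TERMINATED")


def choose_transfer(size_bytes, threshold_bytes=10 * 1024 * 1024):
    # The threshold is an assumed value; the slide only says "small" vs "large"
    return "GASS" if size_bytes < threshold_bytes else "GridFTP"


gram = FakeGram()
queue = TransferQueue(gram)
job = Job("simulate.sh")  # hypothetical executable name
queue.start(job)
queue.suspend(job)
state_after_suspend = gram.states[job.context["job_handle"]]
queue.resume(job)
queue.terminate(job)
final_state = gram.states[job.context["job_handle"]]
```

The point of the sketch is that every control action looks up the handle stored at submission time, which is the role the slide gives to the job context.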
TOG (cont)
Current Status
– Secure job submission functionality implemented and tested
• Staging of input data and executables and transfer of output
– Secure job control functionality implemented and tested
• Suspend, Resume, Terminate
– Basic scheduling functionality implemented and tested
• Schedules jobs to remote resources when local resources are full
– Testing
• Integrated successfully within Grid Engine test suite
• Tested through firewalls
TOG software available upon request
– Contact [email protected]
Generally available via web site soon
– www.epcc.ed.ac.uk/sungrid
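The basic scheduling behaviour listed under Current Status (send jobs to remote resources only when local resources are full) can be illustrated with a short Python sketch. The function name `place_jobs` and the slot-counting model are assumptions made for illustration; the slides do not describe how TOG counts free resources.

```python
def place_jobs(jobs, local_slots, remote_slots):
    """Assign each job to 'local', 'remote' or 'pending', filling local slots first."""
    placement = {}
    for job in jobs:
        if local_slots > 0:
            placement[job] = "local"
            local_slots -= 1
        elif remote_slots > 0:
            # Local resources are full, so overflow to the remote site
            placement[job] = "remote"
            remote_slots -= 1
        else:
            # Nothing free anywhere: the job waits in the queue
            placement[job] = "pending"
    return placement


placement = place_jobs(["j1", "j2", "j3", "j4"], local_slots=2, remote_slots=1)
```

With two local slots and one remote slot, the first two jobs stay local, the third overflows to the remote site, and the fourth waits.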
TOG (cont)
Pros
Simple approach
Usability – existing Grid Engine interface, users only need to be aware of Globus certificates
Remote administrators still have full control of their resources
TOG (cont)
Cons
Low quality scheduling decisions (?)
– May be a time-lag in getting query results back from remote resource
– Incorporating data transfer costs into scheduling
Mirror queues for remote resources
Possible set-up overhead
Globus 2 vs. Globus 3
Grid Engine specific solution
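The two concerns under scheduling quality (stale query results and data transfer costs) can be made concrete with a small Python sketch. It is one possible cost model, not something the slides specify: the `SiteInfo` fields, the 60-second freshness limit, and the "queue wait plus transfer time" estimate are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class SiteInfo:
    name: str
    queue_wait_s: float    # last reported wait before a slot frees up
    bandwidth_mb_s: float  # assumed rate at which input data reaches the site
    info_age_s: float      # how old the query result is


def estimated_start_s(site, input_mb):
    # Time until the job can start: wait for a slot, plus time to stage the input
    return site.queue_wait_s + input_mb / site.bandwidth_mb_s


def pick_site(sites, input_mb, max_info_age_s=60.0):
    # Prefer sites with recent query results; fall back to stale data if none are fresh
    fresh = [s for s in sites if s.info_age_s <= max_info_age_s]
    candidates = fresh or sites
    return min(candidates, key=lambda s: estimated_start_s(s, input_mb))


sites = [
    SiteInfo("local", queue_wait_s=600.0, bandwidth_mb_s=1000.0, info_age_s=1.0),
    SiteInfo("remote", queue_wait_s=0.0, bandwidth_mb_s=10.0, info_age_s=20.0),
]
small_job_site = pick_site(sites, input_mb=100.0)
large_job_site = pick_site(sites, input_mb=10000.0)
```

In this example the idle remote site wins for a 100 MB input, but a 10000 MB input makes the transfer so slow that waiting for the busy local site is cheaper, which is the kind of trade-off a scheduler that ignores transfer costs would miss.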
The Compute & Data Scheduler
Current status
Considering two possible routes
1. Extend TOG
– Migrate to Globus 3
– Incorporate OGSA-DAI
2. Hierarchical Scheduler
– Overcome limitations
– Global Grid vision
1. Extend compute scheduler
[Figure: a Compute Grid of three Grid Engine (GE) instances, each on top of Globus, alongside a Data Grid reached through Globus: GridFTP site, SRB, and OGSA-DAI (hides ODBC, JDBC, XMLDB etc.).]
2. Hierarchical Scheduler
Unified Interface
– Grid Scalability
[Figure: a hierarchy with levels labelled Scotland, Edinburgh and EPCC. Hierarchical Schedulers sit above Grid Engine instances, and each node is fronted by a Web Services Layer offering the same interface. Annotations: query child DRMs for capabilities; pass job specification to the child.]
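The "same interface at every level" idea can be sketched in Python as a composite structure. This is an illustrative model only: the class names, the `capabilities`/`submit` methods, the free-slot metric, and the sibling node "site-X" are hypothetical, and the web services layer is left out.

```python
class GridEngineNode:
    """Leaf: stands in for one Grid Engine installation (a child DRM)."""

    def __init__(self, name, free_slots):
        self.name = name
        self.free_slots = free_slots

    def capabilities(self):
        return {"free_slots": self.free_slots}

    def submit(self, job_spec):
        if self.free_slots < job_spec["slots"]:
            raise RuntimeError(f"{self.name}: not enough free slots")
        self.free_slots -= job_spec["slots"]
        return [self.name]  # route taken by the job


class HierarchicalScheduler:
    """Inner node: offers the same two methods as a leaf."""

    def __init__(self, name, children):
        if not children:
            raise ValueError("a hierarchical scheduler needs at least one child")
        self.name = name
        self.children = children

    def capabilities(self):
        # A job goes to a single child, so report the best child's capacity
        return {"free_slots": max(c.capabilities()["free_slots"] for c in self.children)}

    def submit(self, job_spec):
        # Query the child DRMs for capabilities, then pass the job specification down
        best = max(self.children, key=lambda c: c.capabilities()["free_slots"])
        return [self.name] + best.submit(job_spec)


epcc = GridEngineNode("EPCC", free_slots=4)
site_x = GridEngineNode("site-X", free_slots=6)  # hypothetical sibling site
edinburgh = HierarchicalScheduler("Edinburgh", [epcc])
scotland = HierarchicalScheduler("Scotland", [edinburgh, site_x])

first_route = scotland.submit({"slots": 5})
second_route = scotland.submit({"slots": 2})
```

Because a parent only calls `capabilities` and `submit`, it does not need to know whether a child is a Grid Engine instance or another scheduler, which is what lets the hierarchy grow by adding levels.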
Conclusions
Before proceeding
Examine Globus 3 Analysis
Examine Data Technologies, e.g. OGSA-DAI
Informed decision on whether to
– Extend Compute Scheduler, or
– Build Hierarchical Scheduler or some sub-set of this
Delivery in December 2003