
http://www.epcc.ed.ac.uk/sungrid

EPCC Sun Data and Compute Grids Project

Using Sun Grid Engine and Globus to Schedule Jobs Across a Combination of Local and Remote Machines

Terry Sloan

Edinburgh Parallel Computing Centre (EPCC)

Telephone: +44 131 650 5155

Email: [email protected]


Overview

– The Project
– Why do it?
– Project Scenario
– Project Goal
– How?
– Project Achievements
– The Compute Scheduler
– The Compute & Data Scheduler


The Project


The Project

Develop a Globus-enabled compute and data scheduler

Based on Grid Engine, Globus and a variety of data technologies


The Project (cont)

Partners
– Sun Microsystems
– National e-Science Centre, represented by EPCC

Timescales
– 23 months
– Start Feb 2002
– End Dec 2003
– Feb 2003 = Project Month 13 (PM13)


Why do it?


Why do it?

Grid Engine
– Over 20,000 downloads (Nov 2002)
– Distributed Resource Management tool
– Schedules activities across networked resources

Sun classifies three levels of Grid
– Cluster Grid: a single team or project and its associated resources
– Enterprise Grid: multiple teams and projects, but within a single organisation, facilitating collaboration across the enterprise
– Global Grid: linked Cluster and Enterprise Grids, providing collaboration amongst organisations

Grid Engine meets the first two levels but, by itself, does not meet the third


Why do it? (cont)

Globus Toolkit
– A Grid API for connecting distributed compute and instrument resources

Integration with Globus allows Grid Engine to meet level 3
– Collaboration amongst enterprises
– Most integration efforts use Globus to submit work to Grid Engine

This project tackles the opposite problem: engineering Grid Engine on top of Globus


Why do it? (cont)

Grid Engine is concerned with compute resources
– Extend it to work with popular data and service access protocols (e.g. OGSA-DAI)


Project Scenario


Project Scenario

Two collaborating enterprises, A and B, both have some machines
– Both enterprises run Grid Engine to schedule jobs
– Local demand for machines is variable
• Sometimes it exceeds supply
• Other times machines lie idle

[Figure: at enterprise A, Users (A) submit jobs to a Grid Engine managing machines a, b, c, d; at enterprise B, Users (B) submit jobs to a Grid Engine managing machines e, f, g, h]


Project Scenario (cont)

Ideal situation
– If enterprises A and B could expose some of their machines to each other across the internet through Grid Engine…
• Both A and B could enjoy throughput efficiency improvements
• Large gains when one enterprise is busy while the other is idle

[Figure: enterprise A's Grid Engine schedules Users (A)'s jobs on its own machines a, b, c, d and also on B's machines e, f, g, h; enterprise B's Grid Engine likewise schedules on e, f, g, h and on A's machines a, b, c, d]
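To make the "large gains" point concrete, here is a back-of-the-envelope sketch with hypothetical numbers (not project measurements): enterprise A has eight one-hour jobs and four machines, while B's four machines are idle. The `makespan` helper is an illustrative assumption; it treats all jobs and machines as identical and runs jobs in full waves.

```python
import math


def makespan(num_jobs: int, job_hours: float, machines: int) -> float:
    """Time to finish identical, independent jobs on identical machines,
    running them in waves of `machines` jobs at a time (hypothetical model)."""
    waves = math.ceil(num_jobs / machines)
    return waves * job_hours


# Hypothetical workload: enterprise A is busy, enterprise B is idle.
A_JOBS, JOB_HOURS = 8, 1.0
A_MACHINES, B_MACHINES = 4, 4

local_only = makespan(A_JOBS, JOB_HOURS, A_MACHINES)           # A uses a, b, c, d only
shared = makespan(A_JOBS, JOB_HOURS, A_MACHINES + B_MACHINES)  # A also borrows e, f, g, h
print(f"local only: {local_only} h, shared: {shared} h")
```

Under these assumptions sharing halves A's completion time; real gains depend on job mix, transfer costs and how busy B actually is.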


The Project Goal


Project Goal

Final goal
– Develop a scheduler based on Grid Engine to schedule jobs across a combination of local and remote machines
– Enable jobs to access necessary data sources
– Use Globus as the Grid API to provide secure communications and transfer

Development criteria
– Industrial strength
– Application of software engineering techniques
– Use of industry-standard design and analysis tools
– Migration to OGSA-compliant Globus 3


How?


Workpackages

WP 1: Analysis of existing Grid components
– WP 1.1: UML analysis of core Globus 2.0
– WP 1.2: UML analysis of Grid Engine
– WP 1.3: UML analysis of other Globus 2.0
– WP 1.4: UML analysis of Globus 3.0
– WP 1.5: Exploration of data technologies

WP 2: Requirements Capture & Analysis
WP 3: Prototype Compute Scheduler
WP 4: Compute/Data Scheduler Design
WP 5: Compute/Data Scheduler Development


The Project Team

Project Personnel
– Terry Sloan: Project leader
– Geoff Cawood: Project architect
– Ratna Abrol: Engineering
– Thomas Seed: Engineering
– Ali Anjomshoaa: Globus 2 analysis
– Paul Graham: Requirements capture and analysis
– Amy Krause: Technical reviewer

Project Review Board
– Fritz Ferstl (Sun Microsystems GmbH)
– John Barr (Sun Microsystems Ltd)
– Steven Newhouse (London e-Science Centre)
– Neil Chue Hong (EPCC)


Achievements


Achievements

Publications
– D1.1 Analysis of Globus Toolkit V2.0
– D1.2 Grid Engine UML Analysis
– D2.1 Use cases and requirements
– D2.2 Questionnaire Report
– D3.1 Prototype Development: Requirements

Software
– Transfer-queue Over Globus (TOG)


Transfer-queue Over Globus (TOG) - A Compute Scheduler


Transfer-queue Over Globus (TOG)

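[Figure: enterprise A's Grid Engine manages machines a, b, c, d plus B's machine e; enterprise B's Grid Engine manages machines e, f, g, h plus A's machine d; User A and User B each submit locally, and the two sites are linked through Globus 2]

– Integrates Grid Engine and Globus 2 to access remote resources
– GE execution methods provide job submission and control
– GE job context stores job-specific information, e.g. the job handle
– Globus GSI provides security
– Globus GRAM enables interaction with the remote resource
– GASS for small data transfers, GridFTP for large datasets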
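The division of labour above (execution methods drive the remote job, the job context remembers its handle) can be sketched as follows. This is a minimal illustration under stated assumptions, not TOG's code: `FakeGramGateway` and `TransferQueue` are hypothetical names, the gateway is an in-memory stand-in for Globus GRAM (it is not the Globus API and makes no network calls), and a plain dictionary plays the role of the Grid Engine job context.

```python
import itertools
from typing import Dict


class FakeGramGateway:
    """Hypothetical in-memory stand-in for a GRAM client (NOT the Globus API)."""

    def __init__(self) -> None:
        self._ids = itertools.count(1)
        self.state: Dict[str, str] = {}

    def submit(self, executable: str) -> str:
        # Return an opaque job handle, as a remote job manager would.
        handle = f"https://remote.example.org/jobs/{next(self._ids)}"
        self.state[handle] = "ACTIVE"
        return handle

    def set_state(self, handle: str, new_state: str) -> None:
        self.state[handle] = new_state


class TransferQueue:
    """Hypothetical queue whose execution methods forward work to a remote site."""

    def __init__(self, gateway: FakeGramGateway) -> None:
        self.gateway = gateway
        # Stands in for the GE job context: per-job key/value data.
        self.job_context: Dict[int, Dict[str, str]] = {}

    def start(self, job_id: int, executable: str) -> None:
        # Submission: send the job remotely and remember its handle.
        handle = self.gateway.submit(executable)
        self.job_context[job_id] = {"remote_handle": handle}

    def _handle(self, job_id: int) -> str:
        return self.job_context[job_id]["remote_handle"]

    # Job control: each method looks up the stored handle and acts on it.
    def suspend(self, job_id: int) -> None:
        self.gateway.set_state(self._handle(job_id), "SUSPENDED")

    def resume(self, job_id: int) -> None:
        self.gateway.set_state(self._handle(job_id), "ACTIVE")

    def terminate(self, job_id: int) -> None:
        self.gateway.set_state(self._handle(job_id), "TERMINATED")
        del self.job_context[job_id]  # job is gone, so drop its context


if __name__ == "__main__":
    q = TransferQueue(FakeGramGateway())
    q.start(42, "/bin/hostname")
    q.suspend(42)
    q.resume(42)
    print(q.job_context[42]["remote_handle"])
```

Keeping the handle in per-job context is what lets later suspend, resume and terminate requests find the remote job without the local scheduler holding any extra state.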


TOG (cont)

Current Status
– Secure job submission functionality implemented and tested
• Staging of input data and executables, and transfer of output
– Secure job control functionality implemented and tested
• Suspend, Resume, Terminate
– Basic scheduling functionality implemented and tested
• Schedules jobs to remote resources when local resources are full
– Testing
• Integrated successfully within Grid Engine test suite
• Tested through firewalls
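The basic scheduling rule listed above ("schedules jobs to remote resources when local resources are full") can be sketched as a two-pass queue selection. This is a hypothetical illustration, not Grid Engine's scheduler: the `Queue` class, its `remote` flag and the queue names are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Queue:
    name: str
    slots: int
    used: int = 0
    remote: bool = False  # hypothetical flag: True for a transfer queue to a remote machine

    def has_free_slot(self) -> bool:
        return self.used < self.slots


def choose_queue(queues: List[Queue]) -> Optional[Queue]:
    """Prefer any local queue with a free slot; fall back to a remote
    transfer queue only when every local queue is full."""
    for want_remote in (False, True):
        for q in queues:
            if q.remote == want_remote and q.has_free_slot():
                return q
    return None  # nothing free: the job stays pending


def dispatch(jobs: List[str], queues: List[Queue]) -> Dict[str, Optional[str]]:
    placement: Dict[str, Optional[str]] = {}
    for job in jobs:
        q = choose_queue(queues)
        if q is not None:
            q.used += 1
        placement[job] = q.name if q is not None else None
    return placement


if __name__ == "__main__":
    queues = [Queue("a.q", 1), Queue("e.remote.q", 2, remote=True), Queue("b.q", 1)]
    print(dispatch(["j1", "j2", "j3", "j4", "j5"], queues))
```

Local queues are always tried first, so remote machines absorb only overflow; the fifth job here finds nothing free and stays pending.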

TOG software is available upon request
– Contact [email protected]

Generally available via the web site soon
– www.epcc.ed.ac.uk/sungrid


TOG (cont)

Pros
– Simple approach
– Usability: existing Grid Engine interface; users only need to be aware of Globus certificates
– Remote administrators still have full control of their resources


TOG (cont)

Cons
– Possibly low-quality scheduling decisions (?)
• There may be a time lag in getting query results back from the remote resource
• Incorporating data transfer costs into scheduling
– Mirror queues needed for remote resources
– Possible set-up overhead
– Globus 2 vs. Globus 3
– Grid Engine-specific solution


The Compute & Data Scheduler


Current status

Considering two possible routes

1. Extend TOG
– Migrate to Globus 3
– Incorporate OGSA-DAI

2. Hierarchical Scheduler
– Overcome limitations
– Global Grid vision


1. Extend compute scheduler

[Figure: a compute grid of three Grid Engine (GE) sites, each running Globus, linked through Globus to a data grid comprising a GridFTP site, SRB and OGSA-DAI (which hides ODBC, JDBC, XMLDB, etc.)]


2. Hierarchical Scheduler

– Unified interface
– Grid scalability

[Figure: a tree of Hierarchical Schedulers above three Grid Engine installations, with every node wrapped in a Web Services Layer; the levels are labelled Scotland, Edinburgh and EPCC]

– Query child DRMs for capabilities
– Pass the job specification to the child
– Same interface at every level
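The hierarchical idea (each level queries its children for capabilities, passes the job specification down, and exposes the same interface as its children) maps naturally onto the composite pattern. The sketch below is hypothetical: `Scheduler`, `GridEngineLeaf`, `HierarchicalScheduler`, the CPU-count capability and the "largest free capacity" selection rule are all assumptions for illustration, and the Web Services Layer is omitted.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class JobSpec:
    name: str
    cpus: int


class Scheduler(ABC):
    """The single interface every level exposes (leaf DRM or hierarchical scheduler)."""

    @abstractmethod
    def capacity(self) -> int:
        """Largest job, in CPUs, this node could accept right now."""

    @abstractmethod
    def submit(self, job: JobSpec) -> Optional[str]:
        """Place the job; return the name of the leaf that took it, or None."""


class GridEngineLeaf(Scheduler):
    """Hypothetical wrapper for one Grid Engine installation."""

    def __init__(self, name: str, free_cpus: int) -> None:
        self.name = name
        self.free_cpus = free_cpus

    def capacity(self) -> int:
        return self.free_cpus

    def submit(self, job: JobSpec) -> Optional[str]:
        if job.cpus > self.free_cpus:
            return None
        self.free_cpus -= job.cpus
        return self.name


class HierarchicalScheduler(Scheduler):
    def __init__(self, children: List[Scheduler]) -> None:
        self.children = children

    def capacity(self) -> int:
        # A job runs under a single child, so report the best child's capacity.
        return max((c.capacity() for c in self.children), default=0)

    def submit(self, job: JobSpec) -> Optional[str]:
        # Query each child for its capabilities, then pass the job spec down.
        fits = [c for c in self.children if c.capacity() >= job.cpus]
        if not fits:
            return None
        return max(fits, key=lambda c: c.capacity()).submit(job)


if __name__ == "__main__":
    edinburgh = HierarchicalScheduler([GridEngineLeaf("ge-1", 8), GridEngineLeaf("ge-2", 32)])
    scotland = HierarchicalScheduler([edinburgh, GridEngineLeaf("ge-3", 16)])
    for spec in (JobSpec("big", 24), JobSpec("mid", 12), JobSpec("huge", 40)):
        print(spec.name, "->", scotland.submit(spec))
```

Because a `HierarchicalScheduler` is itself a `Scheduler`, a parent cannot tell whether a child is a single Grid Engine or a whole subtree, which is what lets the hierarchy grow without changing the interface.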


Conclusions

Before proceeding
– Examine the Globus 3 analysis
– Examine data technologies, e.g. OGSA-DAI
– Make an informed decision on whether to
• Extend the Compute Scheduler, or
• Build a Hierarchical Scheduler, or some subset of this

Delivery in December 2003