1 Parametric modeling on the Grid with Nimrod/G Jeff Tan Faculty of Information Technology Monash e-Science and Grid Engineering Laboratory

Parametric Modeling on the Grid with Nimrod/G · Distributed infrastructure Instruments are first class resources Lots of data Not just bigger – fundamentally different Some examples


Parametric modeling on the Grid with Nimrod/G

Jeff Tan
Faculty of Information Technology
Monash e-Science and Grid Engineering Laboratory


Overview

• New Methods in Scientific discovery
  – e-Science & e-Research
• The role of Grid Services & Middleware
• Software Lifecycle Tools
  – Applications development
  – Execution
• Examples from Monash Tools


Scientific discovery

e-Science & e-Research


e-Science

• Pre-Internet
  – Theorize &/or experiment, alone or in small teams; publish paper
• Post-Internet
  – Construct and mine large databases of observational or simulation data
  – Develop simulations & analyses
  – Access specialized devices remotely
  – Exchange information within distributed multidisciplinary teams

Source: Ian Foster


Typical Grid Applications

• Characteristics
  – High Performance Computation
  – Distributed infrastructure
  – Instruments are first class resources
  – Lots of data
  – Not just bigger – fundamentally different
• Some examples
  – In silico biology (See MyGrid)
  – Earthquake simulation
  – Virtual observatory
  – Dynamic aircraft maintenance
  – High energy physics
  – Medical applications
  – Environmental questions


Software Life Cycle on the Grid?

Deploy & Build

Execution

Applications Development

Test & Debug


Grid Services & Middleware


Building Software for the Grid

Applications: Environmental Sciences; Life & Pharmaceutical Sciences; Geo Sciences
Middleware: Globus GT4, Condor, APST
Platform / Infrastructure: Unix, Windows, JVM, TCP/IP, MPI, .Net Runtime, VPN, SSH

(Courtesy IBM)


Building Software for the Grid

Applications: Environmental Sciences; Life & Pharmaceutical Sciences; Geo Sciences
Middleware: Globus GT4, Condor, APST
Platform / Infrastructure: Unix, Windows, JVM, TCP/IP, MPI, .Net Runtime, VPN, SSH

Added labels: Lower Middleware; Upper Middleware & Tools; Bonds

(Courtesy IBM)


Building Software for the Grid

Applications: Environmental Sciences; Life & Pharmaceutical Sciences; Geo Sciences
Semantic Gap
Lower Middleware: Globus GT4, Web Services, Shibboleth, SRB
Platform / Infrastructure: Unix, Windows, JVM, TCP/IP, MPI, .Net Runtime, VPN, SSH


Coding to underwear

    import os

    # t5temp and hostname are defined elsewhere in the original source
    # and are not shown on the slide.

    def build_rsl_file(executable, args, stagein=[], stageout=[], cleanup=[]):
        tocleanup = []
        stderr = t5temp.mktempfile()
        stdout = t5temp.mktempfile()
        rstderr = '${GLOBUS_USER_HOME}/.nimrod/' + os.path.basename(stderr)
        rstdout = '${GLOBUS_USER_HOME}/.nimrod/' + os.path.basename(stdout)

        rslfile = t5temp.mktempfile()
        f = open(rslfile, 'w')
        f.write("<job>\n <executable>%s</executable>\n" % executable)
        for arg in args:
            f.write(" <argument>%s</argument>\n" % str(arg))
        f.write(" <stdout>%s</stdout>\n" % rstdout)
        f.write(" <stderr>%s</stderr>\n" % rstderr)
        # User defined stage-in section
        if stagein:
            f.write(" <fileStageIn>")
            for src, dest, leave in stagein:
                if not leave:
                    tocleanup.append(dest)
                f.write("""<transfer>
    <sourceUrl>gsiftp://%s%s</sourceUrl>
    <destinationUrl>file:///${GLOBUS_USER_HOME}/.nimrod/%s</destinationUrl>
    </transfer>""" % (hostname, src, dest))
            f.write("\n\t</fileStageIn>\n")

        f.write(" <fileStageOut>")
        # User defined stage-out files section
        …………………………………………………………


Software Layers

Applications: Environmental Sciences; Life & Pharmaceutical Sciences; Geo Sciences
Upper Middleware / Tools
Lower Middleware: Globus GT4, Web Services, Shibboleth, SRB
Platform / Infrastructure: Unix, Windows, JVM, TCP/IP, MPI, .Net Runtime, VPN, SSH


Software Layers

Applications: Environmental Sciences; Life & Pharmaceutical Sciences; Geo Sciences
Upper Middleware / Tools: Nimrod, Nimrod Portal & WS, DistANT, Motor, Worqbench, Debug, REMUS, GriddLeS, Kepler, Guard, ActiveSheets
Lower Middleware: Globus GT4, Web Services, Shibboleth, SRB
Platform / Infrastructure: Unix, Windows, JVM, TCP/IP, MPI, .Net Runtime, VPN, SSH

Lifecycle stages: Development, Deploy, Test/Debug, Execution


Applications Development


Why is this challenging?

Write software for local workstation


Why is this challenging?

Build heterogeneous testbed


Applications Development on the Grid

• New Applications
  – Code to middleware standards
  – Significant effort
  – Exciting new distributed application
  – Numerous programming techniques
• Legacy Applications
  – Were built before the Grid
  – They are fragile
  – File based IO
  – May be sequential
  – Leverage old codes to produce new virtual application
  – Amenable to Grid Workflows


Approaches to Grid programming

• General Purpose Workflows
  – Generic solution
  – Workflow editor
  – Scheduler
• Special purpose workflows
  – Solve one class of problem
  – Specification language
  – Scheduler


Parameter Sweep Workflows with Nimrod

Upper Middleware / Tools: Nimrod, Nimrod Portal & WS, DistANT, Motor, Worqbench, Debug, REMUS, GriddLeS, Kepler, Guard, ActiveSheets
Lower Middleware: Globus GT4, Web Services, Shibboleth, SRB


Nimrod…

• Supports workflows for robust design and search
  – Vary parameters
  – Execute programs
  – Copy data in and out
• Sequential and parallel dependencies
• Computational economy drives scheduling
• Computation scheduled near data when appropriate
• Use distributed high performance platforms
• Upper middleware broker for resource discovery
• Wide community adoption
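The point that a computational economy drives scheduling can be illustrated with a small sketch. This is not Nimrod/G's actual scheduler: the resource names, prices and job times are hypothetical, and the greedy rule used here (fill the cheapest resources first, as far as the deadline allows) is only one simple way to trade cost against completion time.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    name: str               # hypothetical resource name
    cpus: int               # number of CPUs available
    cost_per_job: float     # hypothetical price charged per job
    minutes_per_job: float  # hypothetical run time of one job


def allocate(resources, n_jobs, deadline_minutes):
    """Greedy allocation: cheapest resources first, limited by the deadline."""
    plan = {}
    remaining = n_jobs
    for r in sorted(resources, key=lambda r: r.cost_per_job):
        if remaining == 0:
            break
        # Each CPU can run this many jobs back to back before the deadline.
        jobs_per_cpu = int(deadline_minutes // r.minutes_per_job)
        take = min(r.cpus * jobs_per_cpu, remaining)
        if take:
            plan[r.name] = take
            remaining -= take
    if remaining:
        raise ValueError("deadline cannot be met with the given resources")
    return plan


def total_cost(resources, plan):
    """Sum the price of every job in the allocation plan."""
    price = {r.name: r.cost_per_job for r in resources}
    return sum(price[name] * count for name, count in plan.items())


RESOURCES = [
    Resource("cheap-cluster", cpus=4, cost_per_job=1.0, minutes_per_job=10.0),
    Resource("fast-cluster", cpus=8, cost_per_job=3.0, minutes_per_job=5.0),
]

relaxed = allocate(RESOURCES, n_jobs=20, deadline_minutes=60)
tight = allocate(RESOURCES, n_jobs=20, deadline_minutes=20)
print(relaxed, total_cost(RESOURCES, relaxed))
print(tight, total_cost(RESOURCES, tight))
```

With the relaxed deadline every job fits on the cheap resource; tightening the deadline forces some jobs onto the more expensive one, so the same experiment costs more.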

Nimrod Roadmap

[Timeline, 1994 to 2006: Nimrod; Nimrod/G; EnFuzion (www.axceleon.com); Nimrod/O; Nimrod/OI; Nimrod/K; Active Sheets (Excel); Nimrod/WS]


Parameter Studies & Search

• Study or search the behaviour of some of the output variables against a range of different input scenarios.
  – Design optimization
  – Allows robust analysis
  – More realistic simulations
• Computations are loosely coupled (file transfer)
• Very wide range of applications
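A parameter study of this kind can be sketched in a few lines. The example below is only an illustration under stated assumptions: `model`, the parameter names and their values are hypothetical, and the model runs as a local function, whereas in a real study each combination would be a separate job whose inputs and outputs travel by file transfer.

```python
import itertools


def model(pressure, angle):
    """Hypothetical stand-in for an expensive simulation run."""
    return pressure * 2.0 + angle ** 2


def sweep(parameters, run):
    """Run `run` once for every combination of the parameter values."""
    names = list(parameters)
    results = []
    for values in itertools.product(*(parameters[n] for n in names)):
        point = dict(zip(names, values))
        results.append((point, run(**point)))
    return results


# Hypothetical input scenarios: 3 pressures x 4 angles = 12 independent runs.
PARAMETERS = {"pressure": [1.0, 2.0, 3.0], "angle": [0, 5, 10, 15]}

results = sweep(PARAMETERS, model)
best_point, best_value = min(results, key=lambda item: item[1])
print(len(results), best_point, best_value)
```

Because no run depends on another, the loop body is what gets farmed out to remote machines; only the collection of results happens centrally.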


Nimrod scales from local to remote resources

Office

Department

Organisation

Nation


From Quantum chemistry to aircraft design

Drug Docking

Aerofoil Design


Nimrod Development Cycle

Prepare Jobs using Portal

Jobs Scheduled Executed Dynamically

Sent to available machines

Results displayed & interpreted


Optimization using Nimrod/O

• Nimrod/G allows exploration of design scenarios
  – Search by enumeration
• Search for local/global minima based on objective function
  – How do I minimize the cost of this design?
  – How do I maximize the life of this object?
• Objective function evaluated by computational model
  – Computationally expensive
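The difference between search by enumeration and a search guided by the objective function can be shown with a toy problem. Everything below is an assumption made for illustration: `cost` is a hypothetical objective standing in for an expensive model run, and the neighbourhood descent is a deliberately simple local search, not one of Nimrod/O's algorithms.

```python
import itertools


class CountingObjective:
    """Wraps an objective and counts distinct evaluations (one model run each)."""

    def __init__(self, func):
        self.func = func
        self.cache = {}

    def __call__(self, point):
        if point not in self.cache:
            self.cache[point] = self.func(point)
        return self.cache[point]

    @property
    def evaluations(self):
        return len(self.cache)


def cost(point):
    """Hypothetical design cost with its minimum at (3, -1)."""
    x, y = point
    return (x - 3) ** 2 + (y + 1) ** 2


GRID = range(-10, 11)  # hypothetical discrete design space, 21 x 21 points


def enumerate_search(objective):
    """Evaluate every design point and keep the cheapest."""
    return min(itertools.product(GRID, GRID), key=objective)


def descent_search(objective, start):
    """Move to the best neighbouring point until no neighbour improves."""
    current = start
    while True:
        x, y = current
        neighbours = [
            (x + dx, y + dy)
            for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))
            if x + dx in GRID and y + dy in GRID
        ]
        best = min(neighbours, key=objective)
        if objective(best) >= objective(current):
            return current
        current = best


exhaustive = CountingObjective(cost)
local = CountingObjective(cost)
by_enumeration = enumerate_search(exhaustive)
by_descent = descent_search(local, start=(-10, 10))
print(by_enumeration, exhaustive.evaluations)
print(by_descent, local.evaluations)
```

Both searches find the same design here, but the guided search needs far fewer evaluations, which matters when each evaluation is a full simulation. On a less well-behaved objective a local search can stop at a local minimum, which is why the choice of search method matters.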


How Nimrod/O Works

[Diagram labels: Nimrod Plan File; Genetic Algorithm, Simplex, BFGS; Function Evaluations; Dispatcher; Jobs; Nimrod or EnFuzion; Grid or Cluster]
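The relationship between an optimization algorithm, its function evaluations and a dispatcher can be sketched as follows. This is an assumption-laden illustration rather than Nimrod/O's implementation: `evaluate` is a hypothetical objective, a thread pool stands in for the grid or cluster, and the optimizer is a simple compass (pattern) search rather than the genetic algorithm, simplex or BFGS methods named on the slide.

```python
from concurrent.futures import ThreadPoolExecutor


def evaluate(point):
    """Hypothetical objective; in practice each call would be a full model run."""
    x, y = point
    return (x - 1.0) ** 2 + (y - 2.0) ** 2


def dispatch(points, max_workers=4):
    """Evaluate a batch of points concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(evaluate, points))


def pattern_search(start, step=1.0, min_step=1e-3):
    """Compass search: every iteration sends its trial points out as one batch."""
    best, best_value = start, dispatch([start])[0]
    while step > min_step:
        x, y = best
        trials = [(x + step, y), (x - step, y), (x, y + step), (x, y - step)]
        values = dispatch(trials)
        value, point = min(zip(values, trials))
        if value < best_value:
            best, best_value = point, value
        else:
            step /= 2.0  # no trial improved: refine the search around `best`
    return best, best_value


solution, value = pattern_search((0.0, 0.0))
print(solution, value)
```

The optimizer only ever sees points going out and objective values coming back, so the same search logic could hand its batches to any job system.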


Interactive Design

• Human-in-the-optimization-loop
• Use population based methods
• Rank solutions


Execution

Upper Middleware / Tools: Nimrod, Nimrod Portal & WS, DistANT, Motor, Worqbench, Debug, REMUS, GriddLeS, Kepler, Guard, ActiveSheets
Lower Middleware: Globus GT4, Web Services, Shibboleth, SRB


Why is this challenging?

Build, schedule & Execute virtual application


The Nimrod Portal


Nimrod’s Runtime machinery

[Chart: number of jobs (0 to 12) against time (0 to 54 minutes), one series per resource: Linux cluster - Monash (20), Sun - ANL (5), SP2 - ANL (5), SGI - ANL (15), SGI - ISI (10)]

Soft real-time scheduling problem


Active Sheets …


Can we support this process better?

Deploy & Build

Execution

Applications Development

Test & Debug

• Support scientists to do what they do best
  – Science
• Combination of
  – Middleware
  – Software tools


Acknowledgements (Monash Grid Research)

• Research Fellows
  – Colin Enticott
  – Slavisa Garic
  – Jagan Kommineni
  – Tom Peachy
  – Jeff Tan
• PhD Students
  – Shahaan Ayyub
  – Philip Chan
  – Tim Ho
  – Donny Kurniawan
  – Wojtek Goscinski
  – Aaron Searle
• Funding & Support
  – CRC for Enterprise Distributed Systems (DSTC)
  – Australian Research Council
  – GrangeNet (DCITA)
  – Australian Partnership for Advanced Computing (APAC)
  – Microsoft
  – Sun Microsystems
  – IBM
  – Hewlett Packard
  – Axceleon


Questions?

www.csse.monash.edu.au/~davida


Plan File

    parameter energy label "Variable Photon Energy" float select anyof 0.03 0.05 0.1 0.2 0.3 default 0.03 0.05 0.1 0.2 0.3;
    parameter iseed integer random from 0 to 10000;
    parameter length label "Length of collecting electrode" float select anyof .8 .9 1 default .8 .9 1;
    parameter radius label "Radius" float select anyof 0.0625 0.0725 0.0825 default 0.0625 0.0725 0.0825;
    task nodestart
        copy NE2611.dat node:.
        copy ne2611.skel node:.
    endtask
    task main
        node:substitute ne2611.skel NE2611.INP
        node:execute ne2611.xx
        copy node:NE2611.OP ne2611out.$jobname
        copy node:stderr ne2611.time.$jobname
    endtask
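To make the plan file concrete, the following sketch expands its parameters into a job list and fills a skeleton input file for each job. It rests on several assumptions that are not in the slides: one random `iseed` is drawn per job, job names take the hypothetical form `j1`, `j2`, ..., the skeleton text is invented (the real ne2611.skel is not shown), and Python's `string.Template` stands in for the `substitute` step.

```python
import itertools
import random
from string import Template

# Values copied from the plan file's "select anyof" lists.
ENERGY = [0.03, 0.05, 0.1, 0.2, 0.3]
LENGTH = [0.8, 0.9, 1.0]
RADIUS = [0.0625, 0.0725, 0.0825]

# Hypothetical skeleton; the real ne2611.skel is not shown in the slides.
SKELETON = Template(
    "ENERGY=$energy\nSEED=$iseed\nLENGTH=$length\nRADIUS=$radius\n"
)


def generate_jobs(seed=0):
    """Build one job per combination of energy, length and radius."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    jobs = []
    combos = itertools.product(ENERGY, LENGTH, RADIUS)
    for number, (energy, length, radius) in enumerate(combos, start=1):
        values = {
            "energy": energy,
            "iseed": rng.randint(0, 10000),  # assumption: one seed per job
            "length": length,
            "radius": radius,
        }
        jobs.append({
            "jobname": f"j{number}",                    # hypothetical naming
            "input_file": SKELETON.substitute(values),  # the "substitute" step
            "output_file": f"ne2611out.j{number}",      # ne2611out.$jobname
            "values": values,
        })
    return jobs


jobs = generate_jobs()
print(len(jobs))
print(jobs[0]["input_file"])
```

Under these assumptions the three enumerated parameters give 5 x 3 x 3 = 45 jobs, each with its own input file and its own uniquely named output file.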


www.monash.edu.au

Burnoff of the Australian savanna – Does it affect the climate? Testing the Pragma Testbed.

K. Görgen, A. Lynch, C. Enticott*, J. Beringer, D. Abramson**, P. Uotila, N. Tapper

School of Geography and Environmental Science
* Distributed Systems Technology Centre
** School of Computer Science and Software Engineering


Savanna Burnoff

• Extensive savanna eco-systems in northern Australia
  – 25 % of Australia
  – Vegetation: spinifex / tussock grasslands; forest / open woodland
  – Warm, semiarid tropical climate
  – Primary land uses:
    > Pastoralism
    > Mining
    > Tourism
    > Aboriginal land management

(Tropical Savannas CRC)


Motivation

• Extensive savanna eco-systems in northern Australia
• Changing fire regime
• Fires lead to abrupt changes in surface properties
  – Surface energy budgets
  – Partitioning of convective fluxes
  – Increased soil heat flux
  → Modified surface-atmosphere coupling
• Sensitivity study: do the fires' effects on atmospheric processes lead to changes in the highly variable precipitation regime of the Australian Monsoon?
• Many potential impacts (e.g. agricultural productivity)

(J. Beringer)


Experiment Design

• Combination of atmospheric modelling (C-CAM), re-analysis and observational data
• C-CAM Simulations
  – Part I, 1974 to 1978: spinup
  – Part II, 1979 to 1999: control run, no fires / succession; real fires / succession, selected scenarios
  – ~ 90 independent runs (fire / succession scenarios) for sensitivity studies → 1890 yrs of simulations


Use of Grid Computing

• 90 parallel independent model runs
• Single CPU model version of parallelized C-CAM (MPI)
• Distribution of forcing data repositories to cluster sites (~80 GB), 250 MB forcing data per month
• Machine independent data formats (NetCDF)
• Architecture specific, validated C-CAM executables
• ~1.5 month CPU time for one experiment (90 exp. total)
• Robust, portable, self-controlling model system incl. all processing tools and restart files
• PRAGMA Testbed
  – Can we get enough nodes to complete experiment?
  – Can we maintain a testbed for 1.5 Months?
  – Can we maintain a node up for 0.5 days?
  – Can we make this routine for climate modelers?
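The headline figures of the experiment can be checked with a little arithmetic. The sketch below assumes that the 1890 simulated years count only the 1979 to 1999 scenario period for each of the 90 runs, and that each run costs the quoted ~1.5 CPU-months; both readings are interpretations of the slides rather than statements from them.

```python
RUNS = 90                          # independent fire / succession scenarios
SCENARIO_YEARS = range(1979, 2000) # 1979 to 1999 inclusive
CPU_MONTHS_PER_RUN = 1.5           # approximate figure quoted for one experiment

years_per_run = len(SCENARIO_YEARS)               # years simulated by each run
simulated_years = RUNS * years_per_run            # total simulated years
total_cpu_months = RUNS * CPU_MONTHS_PER_RUN      # total single-CPU effort

print(years_per_run, simulated_years, total_cpu_months)
```

Under these assumptions 90 runs of 21 years give the 1890 simulated years quoted, and the whole study needs about 135 CPU-months. Run one after another on a single CPU that would take more than a decade, whereas 90 nodes working in parallel bring it down to roughly the 1.5 months the testbed has to stay up.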


[Chart: y-axis 0 to 100; x-axis dates from Mar 08 2006 to Aug 19 2006; series: mahar, rocks-52, ume, jupiter, pragma001, amata1, tgc, Total]