1
Parametric modeling on the Grid with Nimrod/G
Jeff Tan
Faculty of Information Technology
Monash e-Science and Grid Engineering Laboratory
2
Overview
New Methods in Scientific discovery
e-Science & e-Research
The role of Grid Services & Middleware
Software Lifecycle Tools
Applications development
Execution
Examples from Monash Tools
3
Scientific discovery
e-Science & e-Research
4
e-Science
Pre-Internet
• Theorize &/or experiment, alone or in small teams; publish paper
Post-Internet
• Construct and mine large databases of observational or simulation data
• Develop simulations & analyses
• Access specialized devices remotely
• Exchange information within distributed multidisciplinary teams
Source: Ian Foster
5
Typical Grid Applications
Characteristics
• High Performance Computation
• Distributed infrastructure
• Instruments are first class resources
• Lots of data
• Not just bigger – fundamentally different
Some examples
• In silico biology (See MyGrid)
• Earthquake simulation
• Virtual observatory
• Dynamic aircraft maintenance
• High energy physics
• Medical applications
• Environmental questions
6
Software Life Cycle on the Grid?
Deploy & Build
Execution
Applications Development
Test & Debug
7
Grid Services & Middleware
8
Building Software for the Grid
Applications: Environmental Sciences; Life & Pharmaceutical Sciences; Geo Sciences
Middleware: Globus GT4, Condor, APST
Platform Infrastructure: Unix, Windows, JVM, TCP/IP, MPI, .Net Runtime, VPN, SSH
Courtesy IBM
9
Building Software for the Grid
Applications: Environmental Sciences; Life & Pharmaceutical Sciences; Geo Sciences
Upper Middleware & Tools
Bonds
Lower Middleware
Middleware: Globus GT4, Condor, APST
Platform Infrastructure: Unix, Windows, JVM, TCP/IP, MPI, .Net Runtime, VPN, SSH
Courtesy IBM
10
Building Software for the Grid
Applications: Environmental Sciences; Life & Pharmaceutical Sciences; Geo Sciences
Semantic Gap
Lower Middleware: Globus GT4, Web Services, Shibboleth, SRB
Platform Infrastructure: Unix, Windows, JVM, TCP/IP, MPI, .Net Runtime, VPN, SSH
11
Coding to underwear

import os

# t5temp and hostname are not defined on the slide; presumably they come
# from the surrounding Nimrod code.

def build_rsl_file(executable, args, stagein=[], stageout=[], cleanup=[]):
    tocleanup = []
    stderr = t5temp.mktempfile()
    stdout = t5temp.mktempfile()
    rstderr = '${GLOBUS_USER_HOME}/.nimrod/' + os.path.basename(stderr)
    rstdout = '${GLOBUS_USER_HOME}/.nimrod/' + os.path.basename(stdout)

    rslfile = t5temp.mktempfile()
    f = open(rslfile, 'w')
    f.write("<job>\n <executable>%s</executable>\n" % executable)
    for arg in args:
        f.write(" <argument>%s</argument>\n" % str(arg))
    f.write(" <stdout>%s</stdout>\n" % rstdout)
    f.write(" <stderr>%s</stderr>\n" % rstderr)
    # User defined stage-in section
    if stagein:
        f.write(" <fileStageIn>")
        for src, dest, leave in stagein:
            # Files not marked "leave" are removed after the job
            if not leave:
                tocleanup.append(dest)
            f.write("""<transfer>
<sourceUrl>gsiftp://%s%s</sourceUrl>
<destinationUrl>file:///${GLOBUS_USER_HOME}/.nimrod/%s</destinationUrl>
</transfer>""" % (hostname, src, dest))
        f.write("\n\t</fileStageIn>\n")
    f.write(" <fileStageOut>")
    # User defined stage-out files section
    …………………………………………………………
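The fragment above assembles the job description by string formatting. As a point of comparison, here is a minimal sketch (not the Nimrod implementation) that builds the same element names seen on the slide (`job`, `executable`, `argument`, `fileStageIn`, `transfer`, `sourceUrl`, `destinationUrl`) with Python's standard `xml.etree.ElementTree`, which escapes special characters in argument values. The function name `build_job_xml`, the `hostname` parameter, and the sample host and paths are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

def build_job_xml(executable, args, hostname, stagein=()):
    """Return a job description as an XML string (illustrative sketch only)."""
    job = ET.Element("job")
    ET.SubElement(job, "executable").text = executable
    for arg in args:
        ET.SubElement(job, "argument").text = str(arg)
    if stagein:
        stage = ET.SubElement(job, "fileStageIn")
        for src, dest in stagein:
            transfer = ET.SubElement(stage, "transfer")
            ET.SubElement(transfer, "sourceUrl").text = (
                "gsiftp://%s%s" % (hostname, src))
            ET.SubElement(transfer, "destinationUrl").text = (
                "file:///${GLOBUS_USER_HOME}/.nimrod/%s" % dest)
    return ET.tostring(job, encoding="unicode")

# Hypothetical host, executable and file names.
job_xml = build_job_xml("/bin/echo", ["a<b", 2], "host.example.org",
                        stagein=[("/data/in.dat", "in.dat")])
print(job_xml)
```

Either way, the application developer is still writing to the lower middleware's job format, which is the gap the upper middleware tools are meant to close.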
12
Software Layers
Applications: Environmental Sciences; Life & Pharmaceutical Sciences; Geo Sciences
Upper Middleware/Tools
Lower Middleware: Globus GT4, Web Services, Shibboleth, SRB
Platform Infrastructure: Unix, Windows, JVM, TCP/IP, MPI, .Net Runtime, VPN, SSH
13
Software Layers
Applications: Environmental Sciences; Life & Pharmaceutical Sciences; Geo Sciences
Upper Middleware/Tools (grouped under Development, Deploy, Test/Debug, Execution): Nimrod, Nimrod Portal & WS, DistANT, Motor, Worqbench, Debug, REMUS, GriddLeS, Kepler, Guard, ActiveSheets
Lower Middleware: Globus GT4, Web Services, Shibboleth, SRB
Platform Infrastructure: Unix, Windows, JVM, TCP/IP, MPI, .Net Runtime, VPN, SSH
14
Applications Development
15
Why is this challenging?
Write software for local workstation
16
Why is this challenging?
Build heterogeneous testbed
17
Applications Development on the Grid
New Applications
• Code to middleware standards
• Significant effort
• Exciting new distributed application
• Numerous programming techniques
Legacy Applications
• Were built before the Grid
• They are fragile
• File based IO
• May be sequential
• Leverage old codes to produce new virtual application
• Amenable to Grid Workflows
18
Approaches to Grid programming
General Purpose Workflows
• Generic solution
• Workflow editor
• Scheduler
Special purpose workflows
• Solve one class of problem
• Specification language
• Scheduler
19
Parameter Sweep Workflows with Nimrod
Upper Middleware/Tools: Nimrod, Nimrod Portal & WS, DistANT, Motor, Worqbench, Debug, REMUS, GriddLeS, Kepler, Guard, ActiveSheets
Lower Middleware: Globus GT4, Web Services, Shibboleth, SRB
20
Nimrod…
Supports workflows for robust design and search
• Vary parameters
• Execute programs
• Copy data in and out
Sequential and parallel dependencies
Computational economy drives scheduling
Computation scheduled near data when appropriate
Use distributed high performance platforms
Upper middleware broker for resource discovery
Wide community adoption
Nimrod Roadmap (timeline, 1994 to 2006): Nimrod; Nimrod/G; EnFuzion (www.axceleon.com); Nimrod/O; Nimrod/OI; Nimrod/K; Active Sheets (Excel); Nimrod/WS
21
Parameter Studies & Search
Study or search the behaviour of some of the output variables against a range of different input scenarios.
• Design optimization
• Allows robust analysis
• More realistic simulations
Computations are loosely coupled (file transfer)
Very wide range of applications
22
Nimrod scales from local to remote resources
Office
Department
Organisation
Nation
23
From Quantum chemistry to aircraft design
Drug Docking Aerofoil Design
24
Nimrod Development Cycle
Prepare Jobs using Portal
Jobs Scheduled Executed Dynamically
Sent to available machines
Results displayed & interpreted
25
Optimization using Nimrod/O
Nimrod/G allows exploration of design scenarios
Search by enumeration
Search for local/global minima based on objective function
How do I minimize the cost of this design?
How do I maximize the life of this object?
Objective function evaluated by computational model
Computationally expensive
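The contrast between search by enumeration and search driven by an objective function can be sketched in a few lines. This is an illustration only: the `cost` function is a cheap stand-in for an expensive computational model, the integer grid and starting point are made up, and the neighbour-descent loop is a simple swapped-in local search, not one of the Nimrod/O algorithms.

```python
def cost(x, y):
    """Stand-in objective; a real study would run a computational model here."""
    return (x - 3) ** 2 + (y + 1) ** 2

LOW, HIGH = -10, 10
grid = range(LOW, HIGH + 1)

# Search by enumeration: evaluate every design point on the grid.
best_enum = min(((x, y) for x in grid for y in grid), key=lambda p: cost(*p))
enum_evals = len(grid) ** 2

# Local search: remember each evaluated point so model runs can be counted.
evaluated = {}

def evaluate(point):
    if point not in evaluated:
        evaluated[point] = cost(*point)
    return evaluated[point]

def local_search(start):
    """Move to the best neighbouring grid point until none improves."""
    point = start
    while True:
        neighbours = [(point[0] + dx, point[1] + dy)
                      for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))
                      if LOW <= point[0] + dx <= HIGH
                      and LOW <= point[1] + dy <= HIGH]
        candidate = min(neighbours, key=evaluate)
        if evaluate(candidate) >= evaluate(point):
            return point
        point = candidate

best_local = local_search((-10, 10))
local_evals = len(evaluated)

print(best_enum, enum_evals)
print(best_local, local_evals)
```

On this smooth stand-in function both searches reach the same point, but the local search evaluates far fewer design points, which matters when each evaluation is a full model run.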
26
How Nimrod/O Works
[Diagram components: Nimrod Plan File; optimization algorithms (Genetic Algorithm, Simplex, BFGS); Function Evaluations; Jobs; Dispatcher; Nimrod or EnFuzion; Grid or Cluster]
27
Interactive Design
• Human-in-the-optimization-loop
• Use population based methods
• Rank solutions
28
Execution
Upper Middleware/Tools: Nimrod, Nimrod Portal & WS, DistANT, Motor, Worqbench, Debug, REMUS, GriddLeS, Kepler, Guard, ActiveSheets
Lower Middleware: Globus GT4, Web Services, Shibboleth, SRB
29
Why is this challenging?
Build, schedule & Execute virtual application
30
The Nimrod Portal
31
Nimrod’s Runtime machinery
[Chart: Jobs (0 to 12) versus Time (0 to 54 minutes), one series per resource: Linux cluster - Monash (20), Sun - ANL (5), SP2 - ANL (5), SGI - ANL (15), SGI - ISI (10)]
Soft real-time scheduling problem
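One way to picture a deadline-driven, cost-aware scheduling decision is the greedy sketch below. It is not Nimrod's scheduler. The resource names and the figures in parentheses come from the chart legend; reading those figures as a price per job is an assumption, and the throughput values, job count and deadlines are invented for the example.

```python
def pick_resources(resources, jobs, deadline_minutes):
    """Greedily add the cheapest resources until the jobs fit the deadline.

    resources: list of (name, price, jobs_per_minute) tuples.
    Returns (chosen resource names, whether the deadline can be met).
    """
    chosen, rate = [], 0.0
    for name, price, jobs_per_minute in sorted(resources, key=lambda r: r[1]):
        if rate * deadline_minutes >= jobs:
            break
        chosen.append(name)
        rate += jobs_per_minute
    return chosen, rate * deadline_minutes >= jobs

# Prices read from the chart legend (assumed meaning); throughputs are hypothetical.
resources = [
    ("Linux cluster - Monash", 20, 2.0),
    ("Sun - ANL", 5, 0.5),
    ("SP2 - ANL", 5, 0.5),
    ("SGI - ANL", 15, 1.0),
    ("SGI - ISI", 10, 1.0),
]

relaxed = pick_resources(resources, jobs=100, deadline_minutes=60)
tight = pick_resources(resources, jobs=100, deadline_minutes=20)
impossible = pick_resources(resources, jobs=100, deadline_minutes=10)
print(relaxed)
print(tight)
print(impossible)
```

A relaxed deadline is met with the cheaper machines alone, while a tighter one pulls in the more expensive resources. A soft real-time scheduler would repeat this kind of decision as jobs complete and resource availability changes.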
32
Active Sheets …
33
Can we support this process better?
Deploy & Build
Execution
Applications Development
Test & Debug
Support scientists to do what they do best: Science
Combination of
• Middleware
• Software tools
34
Acknowledgements (Monash Grid Research)
Research Fellows: Colin Enticott, Slavisa Garic, Jagan Kommineni, Tom Peachy, Jeff Tan
PhD Students: Shahaan Ayyub, Philip Chan, Tim Ho, Donny Kurniawan, Wojtek Goscinski, Aaron Searle
Funding & Support: CRC for Enterprise Distributed Systems (DSTC), Australian Research Council, GrangeNet (DCITA), Australian Partnership for Advanced Computing (APAC), Microsoft, Sun Microsystems, IBM, Hewlett Packard, Axceleon
35
Questions?
www.csse.monash.edu.au/~davida
36
Plan File

parameter energy label "Variable Photon Energy" float select anyof 0.03 0.05 0.1 0.2 0.3 default 0.03 0.05 0.1 0.2 0.3;
parameter iseed integer random from 0 to 10000;
parameter length label "Length of collecting electrode" float select anyof .8 .9 1 default .8 .9 1;
parameter radius label "Radius" float select anyof 0.0625 0.0725 0.0825 default 0.0625 0.0725 0.0825;

task nodestart
    copy NE2611.dat node:.
    copy ne2611.skel node:.
endtask

task main
    node:substitute ne2611.skel NE2611.INP
    node:execute ne2611.xx
    copy node:NE2611.OP ne2611out.$jobname
    copy node:stderr ne2611.time.$jobname
endtask
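To make the plan file concrete, the sketch below expands its three enumerated parameters into one job per combination and fills a skeleton input file for each, roughly what the `substitute` step does on the node. The parameter values are copied from the plan file. The skeleton text, the `$name` placeholder syntax (here Python's `string.Template`), and drawing one random `iseed` per job are assumptions; the real `ne2611.skel` is not shown in the slides.

```python
import itertools
import random
from string import Template

# Values copied from the plan file.
energy = [0.03, 0.05, 0.1, 0.2, 0.3]
length = [0.8, 0.9, 1]
radius = [0.0625, 0.0725, 0.0825]

# Hypothetical skeleton; stands in for ne2611.skel.
skeleton = Template(
    "ENERGY $energy\nSEED $iseed\nLENGTH $length\nRADIUS $radius\n")

rng = random.Random(0)  # fixed seed so the sketch is repeatable
jobs = []
for e, l, r in itertools.product(energy, length, radius):
    values = {"energy": e, "length": l, "radius": r,
              "iseed": rng.randint(0, 10000)}  # "random from 0 to 10000"
    jobs.append(skeleton.substitute(values))

print(len(jobs))  # 5 * 3 * 3 = 45 jobs
print(jobs[0])
```

Each generated input would then be run by the `execute` step and its output copied back under a name that includes `$jobname`.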
www.monash.edu.au
Burnoff of the Australian savanna – Does it affect the climate? Testing the Pragma Testbed.
K. Görgen, A. Lynch, C. Enticott*, J. Beringer, D. Abramson**, P. Uotila, N. Tapper
School of Geography and Environmental Science
* Distributed Systems Technology Centre
** School of Computer Science and Software Engineering
38
Savanna Burnoff
• Extensive savanna eco-systems in northern Australia
– 25 % of Australia
– Vegetation: spinifex / tussock grasslands; forest / open woodland
– Warm, semiarid tropical climate
– Primary land uses:
> Pastoralism
> Mining
> Tourism
> Aboriginal land management
(Tropical Savannas CRC)
39
Motivation
• Extensive savanna eco-systems in northern Australia
• Changing fire regime
• Fires lead to abrupt changes in surface properties
– Surface energy budgets
– Partitioning of convective fluxes
– Increased soil heat flux
→ Modified surface-atmosphere coupling
• Sensitivity study: do the fires' effects on atmospheric processes lead to changes in the highly variable precipitation regime of the Australian Monsoon?
• Many potential impacts (e.g. agricultural productivity)
(J. Beringer)
40
Experiment Design
• Combination of atmospheric modelling (C-CAM), re-analysis and observational data
• C-CAM Simulations
– Part I, 1974 to 1978: spinup
– Part II, 1979 to 1999: control run, no fires / succession; real fires / succession, selected scenarios
• ~ 90 independent runs (fire / succession scenarios) for sensitivity studies → 1890 yrs of simulations
41
Use of Grid Computing
• 90 parallel independent model runs
• Single CPU model version of parallelized C-CAM (MPI)
• Distribution of forcing data repositories to cluster sites (~80 GB), 250 MB forcing data per month
• Machine independent data formats (NetCDF)
• Architecture specific, validated C-CAM executables
• ~1.5 month CPU time for one experiment (90 exp. total)
• Robust, portable, self-controlling model system incl. all processing tools and restart files
• PRAGMA Testbed
– Can we get enough nodes to complete experiment?
– Can we maintain a testbed for 1.5 months?
– Can we maintain a node up for 0.5 days?
– Can we make this routine for climate modelers?
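The headline numbers on the experiment slides can be cross-checked with a little arithmetic. The sketch below uses only figures quoted in the slides. Two things are assumptions: that forcing data is needed for every month from 1974 to 1999, and that 1 GB is taken as 1000 MB.

```python
# Figures taken from the slides.
runs = 90
part2_years = 1999 - 1979 + 1        # Part II covers 1979 to 1999 inclusive
simulated_years = runs * part2_years

# Assumption: forcing data covers every month of Parts I and II (1974 to 1999).
months = (1999 - 1974 + 1) * 12
forcing_gb = months * 250 / 1000     # 250 MB per month, 1 GB taken as 1000 MB

print(simulated_years)  # 1890
print(forcing_gb)       # 78.0
```

The first result matches the 1890 simulated years quoted for the sensitivity studies, and the second is consistent with the roughly 80 GB of forcing data distributed to each cluster site.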
42
[Chart: vertical axis 0 to 100; horizontal axis dates from Mar 08 2006 to Aug 19 2006; series: mahar, rocks-52, ume, jupiter, pragma001, amata1, tgc, Total]