9-1.1
“Grid-enabling” applications
Part 1
© 2010 B. Wilkinson/Clayton Ferner. Spring 2010 Grid computing course. slides9-1.ppt Modification date: Feb 26, 2010
Grid-enabling an application
A poorly defined and understood term.
It does NOT mean simply executing a job on a Grid platform!
Almost all computer batch programs can be shipped to a remote Grid site and executed with little more than a remote ssh connection.
This is a model we have had since computers were first connected (via telnet).
Grid-enabling should include utilizing the unique distributed nature of the Grid platform.
9-1.2
Grid-enabling an application
With that in mind, a simple definition is:
Being able to execute an application on a Grid platform, using the distributed
resources available on that platform.
However, even that simple definition is not agreed upon by everyone!
9-1.3
A broad definition that matches our view of Grid enabling applications is:
“Grid Enabling refers to the adaptation or development of a program to provide the capability of interfacing with a grid middleware in order to schedule and utilize resources from a dynamic and distributed pool of ‘grid resources’ in a manner that effectively meets the program’s needs” [2]
[2] Nolan, K., “Approaching the Challenge of Grid-Enabling Applications,” Open Source Grid & Cluster Conf., Oakland, CA, 2008.
9-1.4
9-1.5
How does one do “Grid-enabling”?
Still an open question and in the research domain without a standard approach.
Here we will describe various approaches.
We can divide the use of the computing resources in a Grid into two types:
•Using multiple computers separately to solve multiple problems
•Using multiple computers collectively to solve a single problem
9-1.6
Using Multiple Computers Separately
Parameter Sweep Applications
In some domain areas, scientists need to run the same program many times but with different input data.
“Sweep” across the parameter space with different input parameter values in search of a solution.
In many cases, it is not easy to compute the answer, and human intervention is required to search the design space.
9-1.7
Parameter Sweep Applications
Examples
•A scientist might wish to search for a new drug and needs to try different formulations that might best fit with a particular protein.
•A design engineer might be studying effects of different aerodynamic designs on performance of an aircraft.
•A computer-based aesthetic design process might produce many possible alternative designs, from which a human has to choose.
•Sometimes the sweep is a learning process: a design engineer wishes to understand the effects of changing various parameters.
9-1.8
Parameters in Parameter Sweep
Typically, many parameters can be altered.
There might be a vast number of combinations of parameter values.
Ideally, an automated way of doing a parameter sweep is needed, one that includes both specifying the sweep and scheduling the individual sweeps across the Grid platform.
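To give a feel for how quickly the combinations grow, here is a minimal Java sketch that expands named lists of parameter values into every combination, one per job. The class name, method name, and the "angle" and "speed" parameters are hypothetical and not taken from any Grid middleware; scheduling the resulting jobs is left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SweepCombinations {
    /** Expands named parameter value lists into every combination (Cartesian product). */
    public static List<Map<String, String>> expand(Map<String, List<String>> parameters) {
        List<Map<String, String>> combos = new ArrayList<>();
        combos.add(new LinkedHashMap<>());  // start with a single empty assignment
        for (Map.Entry<String, List<String>> p : parameters.entrySet()) {
            List<Map<String, String>> next = new ArrayList<>();
            for (Map<String, String> partial : combos) {
                for (String value : p.getValue()) {
                    // Copy the partial assignment and add one value of this parameter.
                    Map<String, String> extended = new LinkedHashMap<>(partial);
                    extended.put(p.getKey(), value);
                    next.add(extended);
                }
            }
            combos = next;
        }
        return combos;
    }

    public static void main(String[] args) {
        // Hypothetical parameters, for illustration only.
        Map<String, List<String>> parameters = new LinkedHashMap<>();
        parameters.put("angle", Arrays.asList("0", "5", "10"));
        parameters.put("speed", Arrays.asList("100", "200"));
        for (Map<String, String> job : expand(parameters)) {
            System.out.println(job);  // each line stands for one job to be scheduled
        }
    }
}
```

With three values for one parameter and two for the other, six jobs result; every added parameter multiplies the count again, which is why submitting jobs by hand soon becomes impractical.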
9-1.9
Implementing Parameter Sweep
Can be achieved simply by submitting multiple job description files, one for each set of parameters, but that is not very efficient.
Parameter sweep applications are so important that research projects are devoted to making them efficient on a Grid.
Parameter sweeps appear explicitly in job description languages.
9-1.10
RSL-2/JDD Example
<count> 5 </count>
causes five instances of the job to be submitted.
Simply causes five identical executables to be submitted.
Four of them would be pointless unless either:
•Code selects different actions for each instance, or
•Different input and output files are selected for each instance in the job description file.
Job description elements usually can be specified to change for each instance.
9-1.11
JSDL (version 1)
Originally did not have parameter sweep.
Has been (unofficially) extended to incorporate features for parameter sweep.
Two forms of parameter sweep creation identified:
•Enumeration in a list, and
•Numerically related arguments.
9-1.12
Arguments Enumerated in a List
Two additional elements:
•<Parameter> To specify selection of parameters
•<Value> To list the values
contained within an <Assignment> element for each assignment.
Multiple/nested assignments for various scenarios:
• Single substitution, or
• Multiple simultaneous substitutions in different combinations.
9-1.13
9-1.14
Fig 9.1
Parameter sweep element selection and substitution
9-1.15
Fig 9.2
Selecting XML Element
Expression needed that selects an XML element.
An XPath expression provides a way to select an XML element in an XML document.
9-1.16
XPath
Suppose an XML document has the form:
<a>
  <b>
    <c> </c>
  </b>
</a>
The XPath expression to identify the element
<c> ... </c>
would be /a/b/c
9-1.17
XPath allows for much more expressive forms.
For example suppose multiple tags called <c>:
<a><b>
<c> </c>..<c> </c>
</b></a>
Expression to select 3rd <c> element is /a/b/c[3]
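The two expressions above can be tried out with the XPath support in the standard Java library (javax.xml.xpath). This is a minimal sketch: the class name, the helper method, and the text placed inside each <c> element are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class XPathSelect {
    /** Returns the text of the first node matched by the expression, or null if none. */
    public static String select(String xml, String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            Node node = (Node) xpath.evaluate(expression, doc, XPathConstants.NODE);
            return node == null ? null : node.getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("could not evaluate " + expression, e);
        }
    }

    public static void main(String[] args) {
        // Illustrative document with three <c> elements inside <b>.
        String xml = "<a><b><c>first</c><c>second</c><c>third</c></b></a>";
        System.out.println(select(xml, "/a/b/c"));     // first <c> in document order
        System.out.println(select(xml, "/a/b/c[3]"));  // third <c> child of <b>
    }
}
```

Note that XPath positions count from 1, so c[3] is the third element, not the fourth.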
9-1.18
To take an example for parameter sweep, consider JSDL job:
<jsdl:JobDefinition>
  <jsdl:JobDescription>
    <jsdl:Application>
      <jsdl-posix:POSIXApplication>
        <jsdl-posix:Executable>/bin/echo</jsdl-posix:Executable>
        <jsdl-posix:Argument>Hello</jsdl-posix:Argument>
        <jsdl-posix:Argument>Fred</jsdl-posix:Argument>
      </jsdl-posix:POSIXApplication>
    </jsdl:Application>
  </jsdl:JobDescription>
</jsdl:JobDefinition>
9-1.19
To alter the second argument to be Bob, Alice, and Tom (3 sweeps):
<jsdl:JobDefinition>
  <jsdl:JobDescription>
    <jsdl:Application>
      <jsdl-posix:POSIXApplication>
        <jsdl-posix:Executable>/bin/echo</jsdl-posix:Executable>
        <jsdl-posix:Argument>Hello</jsdl-posix:Argument>
        <jsdl-posix:Argument>Fred</jsdl-posix:Argument>
      </jsdl-posix:POSIXApplication>
    </jsdl:Application>
  </jsdl:JobDescription>
  <sweep:Sweep>
    <sweep:Assignment>
      <sweep:Parameter>//jsdl-posix:Argument[2]</sweep:Parameter>
      <sweepfunc:Values>
        <sweepfunc:Value>Bob</sweepfunc:Value>
        <sweepfunc:Value>Alice</sweepfunc:Value>
        <sweepfunc:Value>Tom</sweepfunc:Value>
      </sweepfunc:Values>
    </sweep:Assignment>
  </sweep:Sweep>
</jsdl:JobDefinition>
9-1.20
Question
What is the output from the echo programs?
9-1.21
Numerically Related Arguments
Job description languages such as JSDL can be extended to increment an integer argument automatically with a for-like construct.
for construct would specify the values of an argument, which would substitute in a similar fashion to the previous substitutions—essentially a macro-substitution.
9-1.22
Example - XPML job description language
First, a parameter element specifies argument values, for example
<parameter name="arg1" type="integer" domain="range">
<range from="1" to="99" type="step" interval="2"/>
</parameter>
Argument called arg1. Values for arg1 here are 1,3,5 ... 99.
Argument arg1 would occur later within execute element:
<execute>
<command value=" ... "/>
<arg value="$arg1"/>
...
</execute>
One value of arg1 substitutes for each sweep.
9-1.23
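The range-and-substitute behaviour described above can be mimicked in a few lines of Java. This is only a sketch of the idea, not XPML's implementation: it assumes the "to" bound is inclusive (consistent with the values 1, 3, 5 ... 99 given above), and the command template "myprog $arg1" is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class RangeSweep {
    /** Lists from, from+interval, ... while the value does not exceed 'to'. */
    public static List<Integer> range(int from, int to, int interval) {
        if (interval <= 0) {
            throw new IllegalArgumentException("interval must be positive");
        }
        List<Integer> values = new ArrayList<>();
        for (int v = from; v <= to; v += interval) {
            values.add(v);
        }
        return values;
    }

    /** Macro-substitutes each value for the placeholder in a command template. */
    public static List<String> substitute(String template, String placeholder,
                                          List<Integer> values) {
        List<String> commands = new ArrayList<>();
        for (int v : values) {
            // String.replace does literal (non-regex) replacement, so "$" is safe here.
            commands.add(template.replace(placeholder, Integer.toString(v)));
        }
        return commands;
    }

    public static void main(String[] args) {
        List<Integer> values = range(1, 99, 2);
        System.out.println(values.size() + " sweeps");
        // "myprog" is a hypothetical executable name.
        for (String command : substitute("myprog $arg1", "$arg1", values)) {
            System.out.println(command);
        }
    }
}
```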
Using Multiple Computers Collectively
9-1.24
Data partitioning
Perhaps easiest way to use multiple computers together.
Divide data into parts.
Each computer works on a different part.
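A minimal Java sketch of data partitioning follows, with threads standing in for the separate computers. The data set is cut into near-equal slices, each slice is searched independently, and the partial results are combined. The class and method names are hypothetical, and the plain substring test is only a stand-in for real work on each part.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class DataPartition {
    /** Splits data into 'parts' contiguous slices whose sizes differ by at most one. */
    public static <T> List<List<T>> partition(List<T> data, int parts) {
        List<List<T>> slices = new ArrayList<>();
        int base = data.size() / parts;
        int extra = data.size() % parts;
        int start = 0;
        for (int i = 0; i < parts; i++) {
            int end = start + base + (i < extra ? 1 : 0);
            slices.add(new ArrayList<>(data.subList(start, end)));
            start = end;
        }
        return slices;
    }

    /** Counts entries containing the query, using one worker thread per slice. */
    public static int countMatches(List<String> database, String query, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<Integer>> partials = new ArrayList<>();
            for (List<String> slice : partition(database, workers)) {
                Callable<Integer> task = () -> {
                    int n = 0;
                    for (String entry : slice) {
                        if (entry.contains(query)) n++;  // stand-in for the real comparison
                    }
                    return n;
                };
                partials.add(pool.submit(task));
            }
            int total = 0;
            for (Future<Integer> partial : partials) {
                total += partial.get();  // combine the partial results
            }
            return total;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> database = Arrays.asList("ACGT", "TTGA", "GACG", "CCCC");
        System.out.println(countMatches(database, "ACG", 2));
    }
}
```

On a Grid the slices would be staged to different machines rather than handed to threads, but the split, work, and combine structure is the same.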
9-1.25
Example
BLAST algorithm used in bioinformatics to find statistical matches between gene sequences.
User might submit sequence query that is compared to a very large database of known sequences in order to discover relationships or to match sequence to a gene family.
Databases extremely large.
9-1.26
Partitioning BLAST database
9-1.27
If just one sequence from user, database partitioned into parts and different computers work on different parts.
Fig 9.3
Alternatively, if user(s) submitting many queries, submit each query to a different computer having access to whole database.
9-1.28
Fig 9.4
Legacy Code
9-1.29
In many cases, Grid users want to re-use their existing programs written in C, C++ or even Fortran if really old.
Documented source code may not be available.
May be pre-packaged by manufacturer so rewriting not an option.
9-1.30
Grid Enabling Legacy Software (GriddLeS)
One project that addresses porting legacy code onto a Grid.
Focuses on file handling
Overloads existing file handling routines and redirects requests to remote locations if required.
9-1.31
Grid Enabling Legacy Software (GriddLeS)
Derived from: http://www.csse.monash.edu.au/~davida/griddles/
Exposing an Application as a Service
• Grid computing has embraced Web service technology, so it is natural to consider its use for accessing applications.
• “Wrap” application code to produce a Web service.
• “Wrapping” means the application is not accessed directly but through a service interface.
9-1.32
Web Service Wrapper Approach
9-1.33
Fig 9.5
Web service invoking a program
If the Web service is written in Java, the service could issue a command in a separate process using the exec method of the current Runtime object with the construction:
Runtime runtime = Runtime.getRuntime();
Process process = runtime.exec("<command>");
where <command> is the command to issue, capturing its output with
InputStream stdout = process.getInputStream();
(getInputStream returns the stream connected to the process's standard output; getOutputStream is the stream that feeds its standard input.)
...
9-1.34
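A fuller, runnable sketch of a service-side helper that launches a program and collects its output is given below. It uses ProcessBuilder rather than Runtime.exec, which is a substitution made here for convenience: it takes the command as separate arguments and can merge standard error into standard output. The class and method names are hypothetical, and the example command assumes a Unix-like host with /bin/echo.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

public class CommandRunner {
    /** Runs the command, waits for it to finish, and returns its combined output. */
    public static String run(String... command) {
        try {
            Process process = new ProcessBuilder(command)
                    .redirectErrorStream(true)  // merge stderr into stdout
                    .start();
            StringBuilder output = new StringBuilder();
            // getInputStream() is connected to the child's standard output.
            try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                    process.getInputStream(), StandardCharsets.UTF_8))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    output.append(line).append('\n');
                }
            }
            int exitCode = process.waitFor();
            if (exitCode != 0) {
                throw new IllegalStateException("command failed with exit code " + exitCode);
            }
            return output.toString();
        } catch (IOException | InterruptedException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Assumes a Unix-like system where /bin/echo exists.
        System.out.print(run("/bin/echo", "Hello", "Fred"));
    }
}
```

Reading the output before calling waitFor matters: a child process that fills its output pipe will block until someone reads from it.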
Portlet acting as a front-end to a wrapped application
9-1.35
Fig 9.6
Application with physically distributed components
9-1.36
Fig 9.7
Using Grid Middleware APIs
Could use Grid middleware APIs in application code for operations such as:
• File input/output
• Starting and monitoring jobs
• Monitoring and discovery of Grid resources.
9-1.37
Using Globus APIs
Globus provides a suite of services that have APIs (C and Java interfaces) that could be called from the application.
Extremely steep learning curve!!
Literally hundreds, if not thousands, of C and Java routines listed at the Globus site.
No tutorial help or sample usage.
9-1.38
Code using Globus APIs to copy a file (C++)
Directly from (van Nieuwpoort). Also in (Kaiser 2004), (Kaiser 2005).
9-1.39
Using CoG kit APIs
Using CoG kit APIs is at a slightly higher level.
Not too difficult but still requires setting up the Globus context.
9-1.40
CoG Kit program to transfer files
9-1.41
Higher Level Middleware-Independent APIs
Higher level of abstraction than Globus middleware APIs desirable because:
•Complexity of Globus routines
•Grid middleware changes very often
•Globus not only Grid middleware
9-1.42
Other Grid middleware
Includes:
• UNICORE (Uniform Interface to Computing Resources)
• gLite (Lightweight Middleware for Grid computing)
– part of EGEE (Enabling Grids for E-sciencE) collaborative.
To give an indication of the rapid changes that occur:
• gLite 3.0.2 Update 43 released May 22, 2008.
• gLite 3.1 Update 27 released July 3, 2008, 6 weeks later.
9-1.43
Concept of higher-level APIs above Grid middleware
9-1.44
Higher-level APIs should expose a simple interface not tied to a specific version of Grid middleware, or even to a Grid middleware family at all.
Fig 9.8
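One way to picture such a layer is an application-facing interface with pluggable adaptors underneath. The Java sketch below is entirely hypothetical: none of the names come from GAT, SAGA, Globus, or any other toolkit, and the adaptor in main only returns a description instead of transferring anything.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PortableCopy {
    /** What a middleware-specific back end must provide; hypothetical interface. */
    public interface FileCopyAdaptor {
        String copy(String source, String destination);
    }

    private final Map<String, FileCopyAdaptor> adaptors = new LinkedHashMap<>();

    /** Associates a URL scheme (for example "gsiftp") with an adaptor. */
    public void register(String scheme, FileCopyAdaptor adaptor) {
        adaptors.put(scheme, adaptor);
    }

    /** The only call the application sees; the adaptor is picked from the source URL. */
    public String copy(String source, String destination) {
        int colon = source.indexOf(':');
        if (colon < 0) {
            throw new IllegalArgumentException("no scheme in " + source);
        }
        String scheme = source.substring(0, colon);
        FileCopyAdaptor adaptor = adaptors.get(scheme);
        if (adaptor == null) {
            throw new IllegalArgumentException("no adaptor registered for " + scheme);
        }
        return adaptor.copy(source, destination);
    }

    public static void main(String[] args) {
        PortableCopy api = new PortableCopy();
        // Stand-in adaptor: a real one would call the middleware's own routines.
        api.register("gsiftp", (src, dst) -> "would copy " + src + " to " + dst);
        System.out.println(api.copy("gsiftp://host.example/data.txt", "file:///tmp/data.txt"));
    }
}
```

The application code calls only copy; swapping or upgrading the middleware means registering a different adaptor, not rewriting the application.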
9-1.45
Grid Application Toolkit (GAT)
• APIs for developing and executing portable Grid applications that are independent of the underlying Grid infrastructure and available services.
• Developed in 2003-2005 time frame.
9-1.46
9-1.47
Copy a file in GAT/C++ (Kaiser, H. 2005)
9-1.48
SAGA (Simple API for Grid Applications)
A subsequent effort made by the Grid community to standardize higher-level APIs.
9-1.49
SAGA Reading a file (C++) (Kielmann 2006)
9-1.50
What is meant by parameter sweep?
(a) Executing an application multiple times, with the arguments incremented by one each time
(b) Executing an application multiple times with the same arguments
(c) Executing an application multiple times with different arguments
(d) Cleaning out the parameters from a computer program
SAQ 9-2
9-1.51
What is the XPath expression to select the second c element within the second b element within the second a element of an XML document?
(a) 2a/2b/2c
(b) a/b/c[2]
(c) a[2]/b[2]/c[2]
(d) a2/b2/c2
(e) 2a2b2c
(f) None of the other answers
SAQ 9-3
Questions
9-1.52