Grid Computing and NetSolve / GridSolve
Jack Dongarra
Between Now and the End
• Today, April 5th: Grids and NetSolve
• April 12th and 19th: Dr. Langou on Iterative Methods in Linear Algebra
• April 26th: Dr. Cronk on Tools for Debugging and Performance Analysis
• May 1st: Project reports are due (email to [email protected])
• May 3rd: Presentation of class projects, starting at 12:30
More on the In-Class Presentations
• Start at 12:30 on Wednesday, 5/3/06
• Roughly 20 minutes each
• Use slides
• Describe your project, perhaps motivate via application
• Describe your method/approach
• Provide comparison and results
See me about your topic
No In-Class Final Exam
Grade will be based on:
• Homework: 30%
• Project presentation: 30%
• Project write-up: 30%
• Class participation: 10%
Distributed Computing
• Concept has been around for two decades
• Basic idea: run a scheduler across systems to run processes on the least-used systems first
- Maximize utilization
- Minimize turnaround time
• Have to load executables and input files onto the selected resource
- Shared file system
- File transfers upon resource selection
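The "least-used systems first" idea can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `schedule`, the abstract load units, and the system names are hypothetical, and real schedulers such as Condor use far richer policies.

```python
import heapq

def schedule(jobs, systems):
    """Assign each job to the currently least-loaded system.

    jobs: list of (job_name, cost) pairs; cost is an abstract load unit.
    systems: dict mapping system name -> current load.
    Returns a dict mapping job_name -> chosen system name.
    """
    # Min-heap keyed on load; the system name breaks ties deterministically.
    heap = [(load, name) for name, load in systems.items()]
    heapq.heapify(heap)
    placement = {}
    for job_name, cost in jobs:
        load, name = heapq.heappop(heap)           # least-used system first
        placement[job_name] = name
        heapq.heappush(heap, (load + cost, name))  # account for the new job
    return placement

if __name__ == "__main__":
    print(schedule([("a", 3), ("b", 1), ("c", 2)], {"s1": 0.0, "s2": 2.0}))
```

The sketch only models load; staging executables and input files to the chosen system is left out.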
Examples of Distributed Computing
• Workstation farms, Condor flocks, etc.
- Generally share file system
• SETI@home project, United Devices, etc.
- Only one source code; copies correct binary code and input data to each system
• Napster, Gnutella, BitTorrent: file/data sharing
• NetSolve
- Runs numerical kernel on any of multiple independent systems, much like a Grid solution
The Grid Problem
Flexible, secure, coordinated resource sharing among dynamic collections of individuals, institutions, and resources
- From “The Anatomy of the Grid: Enabling Scalable Virtual Organizations”
Enable communities (“virtual organizations”) to share geographically distributed resources as they pursue common goals, assuming the absence of:
• central location,
• central control,
• omniscience,
• existing trust relationships.
Elements of the Problem
• Resource sharing
- Computers, storage, sensors, networks, …
- Sharing always conditional: issues of trust, policy, negotiation, payment, …
• Coordinated problem solving
- Beyond client-server: distributed data analysis, computation, collaboration, …
• Dynamic, multi-institutional virtual organisations
- Community overlays on classic org structures
- Large or small, static or dynamic
Grid vs. Internet/Web Services?
• We’ve had computers connected by networks for 20 years
• We’ve had web services for 5-10 years
• The Grid combines these things, and brings additional notions
- Virtual Organizations
- Infrastructure to enable computation to be carried out across these: authentication, monitoring, information, resource discovery, status, coordination, etc.
• Can I just plug my application into the Grid?
- No! Much work to do to get there!
What is Grid Computing?
Resource sharing & coordinated problem solving in dynamic, multi-institutional virtual organizations
[Figure labels: imaging instruments; computational resources; large-scale databases; data acquisition, analysis; advanced visualization]
The Computational Grid is…
…a distributed control infrastructure that allows applications to treat compute cycles as commodities.
• Power Grid analogy
- Power producers: machines, software, networks, storage systems
- Power consumers: user applications
• Applications draw power from the Grid the way appliances draw electricity from the power utility.
- Seamless
- High-performance
- Ubiquitous
- Dependable
Introduction
Grid systems can be classified depending on their usage:
• Computational Grid
- Distributed Supercomputing
- High Throughput
• Data Grid
• Service Grid
- On Demand
- Collaborative
- Multimedia
The Grid
The Grid Architecture Picture
[Figure labels: User Portals (Problem Solving Environments, Application Science Portals); Service Layers (Grid Access & Info, Authentication, Co-Scheduling, Naming & Files, Events, Resource Discovery & Allocation, Fault Tolerance); Resource Layer (Computers, Data bases, Online instruments, Software, High speed networks and routers)]
Basic Grid Architecture
• Clusters and how grids are different from clusters
• Departmental Grid Model
• Enterprise Grid Model
• Global Grid Model
Examples of Grids
• TeraGrid – NSF-funded, linking 5 major research sites at 40 Gb/s (www.teragrid.org)
• European Union Data Grid – grid for applications in high energy physics, environmental science, bioinformatics (www.eu-datagrid.org)
• Access Grid – collaboration systems using commodity technologies (www.accessgrid.org)
• Network for Earthquake Engineering Simulations Grid – grid for earthquake engineering (www.nees.org)
TeraGrid Network
...and enabling broad based collaborations
[Figure labels: Gloriad, NLR, UltraScience Net, TeraGrid]
Examples of Grid Programming Technologies
• MPICH-G2: Grid-enabled message passing
• CoG Kits, GridPort: Portal construction, based on N-tier architectures
• GDMP, Data Grid Tools, SRB: replica management, collection management
• Condor-G: simple workflow management
• Legion: object models for Grid computing
• NetSolve/GridSolve: Network enabled solver
• Cactus: Grid-aware numerical solver framework
Note tremendous variety, application focus
Atmospheric Sciences Grid
[Figure labels: Real time data; Data Fusion; General Circulation model; Regional weather model; Photo-chemical pollution model; Particle dispersion model; Bushfire model; Topography Database; Vegetation Database; Emissions Inventory]
Standard Implementation
[Figure labels: the Atmospheric Sciences Grid components (Real time data; Data Fusion; General Circulation model; Regional weather model; Photo-chemical pollution model; Particle dispersion model; Bushfire model; Topography Database; Vegetation Database; Emissions Inventory) with links labeled MPI, GASS, and GASS/GridFTP/GRC; annotation: Change Models]
MPICH-G2: A Grid-Enabled MPI
• A complete implementation of the Message Passing Interface (MPI) for heterogeneous, wide area environments
- Based on the Argonne MPICH implementation of MPI (Gropp and Lusk)
• Globus services for authentication, resource allocation, executable staging, output, etc.
• Programs run in wide area without change
• See also: MetaMPI, PACX, STAMPI, MAGPIE
www.globus.org/mpi
Useful References
• Book (Morgan Kaufmann)
- www.mkp.com/grids
• Perspective on Grids
- “The Anatomy of the Grid: Enabling Scalable Virtual Organizations”, IJSA, 2001
- www.globus.org/research/papers/anatomy.pdf
• All URLs in this section of the presentation, especially:
- www.gridforum.org, www.grids-center.org, www.globus.org
Globus Grid Services
• The Globus toolkit provides a range of basic Grid services
- Security, information, fault detection, communication, resource management, ...
• These services are simple and orthogonal
- Can be used independently, mix and match
- Programming model independent
• For each there are well-defined APIs
• Standards are used extensively
- E.g., LDAP, GSS-API, X.509, ...
• You don’t program in Globus, it’s a set of tools like Unix
Distributed and Parallel Systems
Distributed systems (heterogeneous):
• Gather (unused) resources
• Steal cycles
• System SW manages resources
• System SW adds value
• 10% - 20% overhead is OK
• Resources drive applications
• Time to completion is not critical
• Time-shared
Massively parallel systems (homogeneous):
• Bounded set of resources
• Apps grow to consume all cycles
• Application manages resources
• System SW gets in the way
• 5% overhead is maximum
• Apps drive purchase of equipment
• Real-time constraints
• Space-shared
[Figure labels: Grid based Computing; Beowulf cluster; Network of ws; Clusters w/ special interconnect; Entropia; ASCI Tflops (130 Tflop/s); SETI@home (27 Tflop/s); Parallel Dist mem]
GridSolve: Basic Usage Scenarios
• Grid based numerical library routines
- User doesn’t have to have the software library on their machine: LAPACK, SuperLU, ScaLAPACK, PETSc, AZTEC, ARPACK
• Task farming applications
- “Pleasantly parallel” execution, e.g. parameter studies
• Remote application execution
- Complete applications with user specifying input parameters and receiving output
• “Blue Collar” Grid Based Computing
- Does not require deep knowledge of network programming
- Level of expressiveness right for many users
- User can set things up, no “su” required
- In use today, up to 200 servers in 9 countries
• Can plug into Globus, Condor, NINF, …
NetSolve Network Enabled Server
• NetSolve is an example of a grid based hardware/software server.
• Ease of use paramount
• Based on an RPC model but with …
- resource discovery, dynamic problem solving capabilities, load balancing, fault tolerance, asynchronicity, security, …
• Other examples are NEOS from Argonne and NINF from Japan.
• Use resources, not tie together geographically distributed resources, for a single application.
NetSolve: The Big Picture
No knowledge of the grid required, RPC like.
[Figure, built up over four slides. Components: Client (Matlab, Mathematica, C, Fortran, Web); AGENT(s) with Schedule and Database; servers S1, S2, S3, S4; IBP Depot. Labels added step by step: “A, B” at the IBP Depot; “Handle back”; then “Request”, “S2 !”, “Op(C, A, B)”, “A, B”, “OP, handle”, and “Answer (C)”.]
NetSolve Agent
• Name server for the NetSolve system.
• Information Service
- client users and administrators can query the hardware and software services available.
• Resource scheduler
- maintains both static and dynamic information regarding the NetSolve server components to use for the allocation of resources
NetSolve Agent
• Resource Scheduling (cont’d):
- CPU Performance (LINPACK).
- Network bandwidth, latency.
- Server workload.
- Problem size/algorithm complexity.
- Calculates a “Time to Compute” for each appropriate server.
- Notifies client of most appropriate server.
NetSolve - Load Balancing
NetSolve agent:
- predicts the execution times and sorts the servers
Prediction for a server based on:
• Its distance over the network
- Latency and Bandwidth
- Statistical Averaging
• Its performance (LINPACK benchmark)
• Its workload
• The problem size and the algorithm complexity
Cached data gives a quick estimate; workload out of date?
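The slides list the inputs to the prediction but not the formula, so the following is a minimal sketch under an assumed model: transfer time (latency plus bytes over bandwidth) plus compute time on the unloaded share of the CPU. The `Server` fields, `time_to_compute`, and `rank_servers` are hypothetical names, not the agent's actual interface.

```python
from dataclasses import dataclass

@dataclass
class Server:
    name: str
    mflops: float         # measured rate, e.g. from a LINPACK run
    workload: float       # fraction of the CPU already in use, 0 <= workload < 1
    latency_s: float      # network latency between client and server, seconds
    bandwidth_bps: float  # network bandwidth between client and server, bytes/s

def time_to_compute(server, flops, bytes_moved):
    """Estimated seconds: data transfer plus computation on a loaded server."""
    transfer = server.latency_s + bytes_moved / server.bandwidth_bps
    effective_rate = server.mflops * 1e6 * (1.0 - server.workload)
    return transfer + flops / effective_rate

def rank_servers(servers, flops, bytes_moved):
    """Sort servers from shortest to longest predicted completion time."""
    return sorted(servers, key=lambda s: time_to_compute(s, flops, bytes_moved))
```

With this model a nearby slow server wins for small problems, while a distant fast server wins once the operation count dominates the transfer cost.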
NetSolve Client
• Function Based Interface.
• Client program embeds call from NetSolve’s API to access additional resources.
• Interface available to C, Fortran, Matlab, Mathematica, and Java.
• Opaque networking interactions.
• NetSolve can be invoked using a variety of methods: blocking, non-blocking, task farms, …
NetSolve Client
• Intuitive and easy to use.
• Matlab matrix multiply, e.g.:
A = matmul(B, C);
A = netsolve('matmul', B, C);
• Possible parallelisms hidden.
NetSolve Client
i. Client makes request to agent.
ii. Agent returns list of servers.
iii. Client tries each one in turn until one executes successfully or list is exhausted.
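The three steps above amount to a simple failover loop. A minimal sketch, assuming the agent's ranked list is already in hand; `call_with_failover` and the `run` callable are hypothetical stand-ins for the remote call, not part of the NetSolve API.

```python
def call_with_failover(servers, run):
    """Try each server in the agent's ranked list until one succeeds.

    servers: ranked list of server names returned by the agent.
    run: callable taking a server name and returning a result; it raises
         an exception on failure (hypothetical stand-in for the remote call).
    Returns (server_name, result) for the first server that succeeds.
    """
    errors = []
    for server in servers:
        try:
            return server, run(server)
        except Exception as exc:      # a failed server is recorded and skipped
            errors.append((server, exc))
    raise RuntimeError(f"all {len(servers)} servers failed: {errors}")
```

The caller sees a single result or a single error, which is what keeps the fault tolerance invisible to the user program.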
NetSolve - MATLAB Interface
>> define sparse matrix A
>> define rhs
>> [x, its] = netsolve( 'itmeth', 'petsc', A, rhs );
…
>> [x, its] = netsolve( 'itmeth', 'aztec', A, rhs );
>> [x, its] = netsolve( 'solve', 'superlu', A, rhs );
>> [x, its] = netsolve( 'solve', 'ma28', A, rhs );
Synchronous Call
Asynchronous Calls also available
NetSolve - FORTRAN Interface
parameter( MAX = 100)
double precision A(MAX,MAX), B(MAX)
integer IPIV(MAX), N, INFO, LWORK
integer NSINFO
call DGESV(N,1,A,MAX,IPIV,B,MAX,INFO)
Easy to ‘switch’ to NetSolve
call NETSL('DGESV()',NSINFO,N,1,A,MAX,IPIV,B,MAX,INFO)
Hiding the Parallel Processing
• User may be unaware of parallel processing
• NetSolve takes care of starting the message passing system, data distribution, and returning the results.
Problem Description Specification
Specifies the calling interface between GridSolve and the service routine
• Original NetSolve problem description files
- Strange notation
- Difficult for users to understand
• Previous attempts to simplify involved GUI front-ends
• In GridSolve, the format is totally re-designed
- Specified in a manner similar to normal function prototypes
- Similar to Ninf
GridSolve Problem Description (DGESV)
Original Fortran Subroutine:
SUBROUTINE DGESV(N,NRHS,A,LDA,IPIV,B,LDB,INFO)
INTEGER INFO, LDA, LDB, N, NRHS
INTEGER IPIV( * )
DOUBLE PRECISION A( LDA, * ), B( LDB, * )
GridSolve IDL Specification:
SUBROUTINE dgesv(IN int N, IN int NRHS,
INOUT double A[LDA][N], IN int LDA,
OUT int IPIV[N], INOUT double B[LDB][NRHS],
IN int LDB, OUT int INFO)
"This solves Ax=b using LAPACK"
LANGUAGE = "FORTRAN"
LIBS = "$(LAPACK_LIBS) $(BLAS_LIBS)"
COMPLEXITY = "2.0*pow(N,3.0)*(double)NRHS"
MAJOR="COLUMN"
GridSolve Interface Definition Language
• Data types: int, char, float, double
• Argument passing modes:
- IN -- input only; not modified
- INOUT -- input and output
- OUT -- output only
- VAROUT -- variable length output only data
- WORKSPACE -- server-side allocation of workspace; not passed as part of calling sequence
• Argument size
- Specified as expression using scalar arguments, e.g. ddot(IN int n, IN double dx[n*incx], IN int incx, …
- All typical operators supported (+, -, *, /, etc).
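To see how a size expression such as `n*incx` could be turned into an element count once the scalar arguments are known, here is a small sketch. It is not GridSolve's parser: `eval_size` is a hypothetical helper that borrows Python's own expression grammar, supports only the four operators named above, and assumes `/` means integer division.

```python
import ast
import operator

# Operators allowed in a size expression; "/" is treated as integer
# division here, which is an assumption of this sketch.
_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
        ast.Mult: operator.mul, ast.Div: operator.floordiv}

def eval_size(expr, scalars):
    """Evaluate an integer size expression such as "n*incx".

    expr: the expression text taken from the array declaration.
    scalars: dict mapping scalar argument names to their integer values.
    """
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.Name):
            return scalars[node.id]
        if isinstance(node, ast.Constant) and isinstance(node.value, int):
            return node.value
        raise ValueError(f"unsupported term in size expression: {expr!r}")
    return walk(ast.parse(expr, mode="eval"))
```

For the ddot example, `eval_size("n*incx", {"n": 5, "incx": 2})` gives the number of doubles the client would have to send for `dx`.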
Problem Attributes
• MAJOR
- Row or column major; depends on the implementation of the service routine
• LANGUAGE
- Language in which the service routine is implemented; currently C or Fortran
• LIBS
- Additional libraries to be linked
• COMPLEXITY
- Theoretical complexity of the service routine, specified in terms of the arguments
Building the Services
cd GridSolve/src/problem
make check
After a new service is built for the first time, the server should be restarted; later rebuilds do not require a restart.
For more detailed documentation, consult the manual:
cd GridSolve/doc/ug
make
ghostview ug.ps
Task Farming - Multiple Requests To Single Problem
• A Solution:
- Many calls to netslnb( ); /* non-blocking */
• Farming Solution:
- Single call to netsl_farm( );
• Request iterates over an “array of input parameters.”
• Adaptive scheduling algorithm.
• Useful for parameter sweeping, and independently parallel applications.
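The farming pattern can be illustrated locally with a thread pool. This is an analogy, not the NetSolve API: `farm` and `simulate` are hypothetical names, `submit()` stands in for one non-blocking request per parameter set, and the adaptive scheduling mentioned above is not modeled.

```python
from concurrent.futures import ThreadPoolExecutor

def farm(problem, parameter_sets, max_workers=4):
    """Run one problem over an array of input parameters.

    Results are returned in the same order as parameter_sets.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # submit() plays the role of one non-blocking request per parameter set
        futures = [pool.submit(problem, *params) for params in parameter_sets]
        # result() blocks until the corresponding request has finished
        return [f.result() for f in futures]

def simulate(x, scale):
    """Hypothetical stand-in for a remote service routine."""
    return scale * x * x

if __name__ == "__main__":
    print(farm(simulate, [(1, 2), (2, 2), (3, 1)]))
```

The single `farm` call hides the bookkeeping that a user would otherwise write by hand around many individual non-blocking calls.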
Data Persistence
• Chain together a sequence of NetSolve requests.
• Analyze parameters to determine data dependencies. Essentially a DAG is created where nodes represent computational modules and arcs represent data flow.
• Transmit superset of all input/output parameters and make persistent near server(s) for duration of sequence execution.
• Schedule individual request modules for execution.
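The dependency analysis can be sketched as follows, assuming each request is described by its input and output parameter names. `build_dag` and `execution_order` are hypothetical helpers; only read-after-write dependencies are tracked, and `graphlib` requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter

def build_dag(requests):
    """Derive data dependencies from a sequence of requests.

    requests: list of (name, inputs, outputs) tuples in program order,
              where inputs and outputs are lists of parameter names.
    Returns a dict mapping each request name to the set of earlier
    requests that produce one of its inputs.
    """
    last_writer = {}   # parameter name -> request that most recently wrote it
    deps = {}
    for name, inputs, outputs in requests:
        deps[name] = {last_writer[p] for p in inputs if p in last_writer}
        for p in outputs:
            last_writer[p] = name
    return deps

def execution_order(requests):
    """Return one valid order in which the requests can be scheduled."""
    return list(TopologicalSorter(build_dag(requests)).static_order())
```

Requests with no path between them in the DAG are independent, so a scheduler is free to run them on different servers at the same time.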
Server Software Repository
• Dynamic downloading of new software.
• Enhance server capabilities without shutdown and restart.
• Repository maintained independently of server.
[Figure labels: NetSolve Server; Hardware, Software]
NetSolve: A Plug into the Grid
[Figure, built up over three slides. PSE front-ends: C, Fortran, Matlab, Mathematica, SCIRun, Custom. Remote procedure call into NetSolve. Grid middleware: Resource Discovery, System Management, Resource Scheduling, Fault Tolerance. Proxies: Globus proxy, NetSolve proxy, Ninf proxy, Condor proxy. Grid back-ends: Globus, NetSolve servers, Ninf servers, Condor.]
University of Tennessee Deployment: Scalable Intracampus Research Grid (SInRG)
• Federated Ownership: CS, Chem. Eng., Medical School, Computational Ecology, El. Eng.
• Real applications, middleware development, logistical networking
The Knoxville Campus has two DS-3 commodity Internet connections and one DS-3 Internet2/Abilene connection. An OC-3 ATM link routes IP traffic between the Knoxville campus, National Transportation Research Center, and Oak Ridge National Laboratory. UT participates in several national networking initiatives including Internet2 (I2), Abilene, the federal Next Generation Internet (NGI) initiative, Southern Universities Research Association (SURA) Regional Information Infrastructure (RII), and Southern Crossroads (SoX).
The UT campus consists of a meshed ATM OC-12 being migrated over to switched Gigabit by early 2002.
NetSolve Monitor
http://anaka.cs.utk.edu:8080/monitor/signed.html
Thanks
• Fran Berman, Director, San Diego Supercomputer Center
• Jay Boisseau, Director, Texas Advanced Computing Center