Observe that -
Today’s processors are tremendously powerful, even compared to a few years ago
Millions of computers in the world
Most are not busy at any one time
…Observe that -
Large percentage of computers are interconnected via the Internet
Networking technology has made tremendous progress
Millions of computers have access to relatively high performance networking
Networking performance progressing rapidly
  Internet-2
  Lambda Rail: DWDM, 10 Gs/fiber
…Observe that -
Large number of computing problems have become increasingly complex
Computational demands of computing programs have outstripped the computational capability of any one computer
Yet, world-wide there appears to be a surplus of computational capacity (idle machines)
Recall that…
Clusters came about by tying together a group of desktop computers…
… to harness the computational power of these computers as a collective whole
… physically in one place…
… with a single common interconnect…
Grid Computing
Why not tie computational resources (desktop computers, supercomputers, etc.) together …
… and harness their collective computational power?
… thus Grid Computing
Grid Computing
“A computational grid is a hardware and software infrastructure that provides dependable, consistent, pervasive and inexpensive access to high-end computational capabilities” (Foster and Kesselman, 1998)
A Grid is…
… a collection of computational processing elements…
… possibly organized dynamically…
… utilizing relatively high performance networking…
… to provide computational resources beyond those normally available
Grid Computing
Primarily accomplished through middleware: software layers that tie discrete computers together into a grid
Must be based on standards. Why?
*** participating elements are administratively autonomous ***
The Virtual Organization
Important concept in grid computing
Enabled by and part of a grid
Dynamically “convening” expertise around a problem
Dynamically “constructing” resources to support the approach to a problem
May go away when the problem is solved or the project is completed
Middleware Issues
Security
  transaction/data security
  authentication
Resource Management
  authorization
  resource allocation
Information services
  resource monitoring
  job monitoring
Data Management
  data access
  data caching
Grids come in many “flavors”
Cluster of clusters, grids of high performance systems
  well known, stable resources
  under administrative management
Dynamic grids
  cycle “stealing”
  not so stable resources
  not always well known
  little or no communication among processes (sometimes)
Standards
OGSA: Open Grid Services Architecture
OGSI: Open Grid Services Infrastructure
  Infrastructure around which OGSA is built
  Core grid service specification
On-going development through the Global Grid Forum
  www.ggf.org
TeraGrid
Extensible Terascale Computational Facility
Ties together HPCs from major national supercomputing centers in the U.S.
Massive computational resources
Well known, controlled computing environment
See http://www.teragrid.org
The Sabre Grid
Overall managed by PSC
Composed of clusters from PSC…
… and WVU (Energy)…
… and the Department of Energy (NETL)…
… and a Condor flock
Early stages
Einstein@home
Cycle stealing
Searches for gravitational objects (pulsars) in astronomy data
Runs as a screen saver, when the computer is not in use
Berkeley Open Infrastructure for Network Computing (BOINC)
BOINC: “An open-source software platform for computing using volunteered resources.” (from http://boinc.berkeley.edu/)
Other BOINC based projects
SETI@home: search for extraterrestrial intelligence
Climateprediction.net: study climate change
Predictor@home: investigate protein related diseases
Global Grid Exchange
Uses central server
Deploys tasks to “common” computers
  from a large pool of available computers
  potentially massive pool of computers
Primarily Java based
No inter-task communications
Has process fail-over capability
Global Grid Exchange
Operated by the WV High Technology Consortium Foundation
Potentially thousands of computers
Can run non-Java code
  requires special “intervention” to bypass security
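
The dispatch model on the two slides above (a central server, independent tasks with no inter-task communication, and process fail-over) can be illustrated with a minimal sketch. This is not the Global Grid Exchange API, and it is written in Python rather than Java for brevity. The names `dispatch`, `healthy`, and `flaky` are hypothetical, and the "workers" are plain local functions standing in for remote machines.

```python
from collections import deque


def dispatch(tasks, workers, max_attempts=3):
    """Central-server loop: hand independent tasks to workers, re-queue on failure.

    tasks   -- list of (task_id, payload) pairs; tasks never talk to each other
    workers -- list of callables standing in for remote machines (hypothetical)
    """
    pending = deque((task_id, payload, 1) for task_id, payload in tasks)
    results = {}
    turn = 0
    while pending:
        task_id, payload, attempt = pending.popleft()
        worker = workers[turn % len(workers)]  # simple round-robin placement
        turn += 1
        try:
            results[task_id] = worker(payload)
        except Exception:
            if attempt >= max_attempts:
                raise RuntimeError(f"task {task_id} failed {attempt} times")
            # Fail-over: put the task back so a later turn can place it elsewhere.
            pending.append((task_id, payload, attempt + 1))
    return results


def healthy(x):
    """A machine that stays in the pool and finishes its task."""
    return x * x


def flaky(x):
    """A machine that drops out before finishing."""
    raise ConnectionError("machine left the pool")
```

Because tasks are independent, re-running a failed task on another machine is safe. That is the property that makes fail-over simple in this model.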
Condor
Developed and maintained by the University of Wisconsin-Madison
Originally a cycle-stealing approach to gathering high performance computational resources
Can function like a cluster
… or like a grid (flocking)
… can be part of a Globus based grid (Condor-G)
Supports message passing
Types of Grids
Desktop Grids
  collections of computers
  office grids
  volunteer compute elements
  can be heterogeneous
  unreliable
Types of Grids
Cluster Grids
  cluster of clusters
  single system image
  “completely compiled” code
  stable resources
  known environment
  Sabre
Types of Grids
HPC Grids
  grid of “Big Iron” supercomputers
  very high performance
  stable platform
    reliable
    known environment
    not so many organizational/human issues
  TeraGrid
Types of Grids
Data Grids
  access to distributed data resources
  global and local resource management
  common access protocol
  resources can be very large
  National Virtual Observatory
Requirements for a Grid
Interface
  should provide the user community with a familiar, understandable interface
  command-line commands (like qsub) and tools the user community is familiar with
Job Scheduling
  should be done in a manner similar to other parallel paradigms
  known queuing algorithms
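
As one example of a "known queuing algorithm", here is a minimal sketch of first-come, first-served scheduling. It assumes all jobs are submitted at time 0 and that the grid offers a fixed number of identical slots. The function name `fcfs_schedule` and the job tuples are hypothetical, not taken from any particular scheduler.

```python
import heapq


def fcfs_schedule(jobs, slots):
    """First-come, first-served: jobs start in submission order as slots free up.

    jobs  -- list of (name, runtime) in submission order (all submitted at time 0)
    slots -- number of jobs that can run at once
    Returns {name: (start, end)}.
    """
    free_at = [0] * slots            # min-heap of times at which each slot frees up
    heapq.heapify(free_at)
    timeline = {}
    for name, runtime in jobs:
        start = heapq.heappop(free_at)   # earliest available slot
        end = start + runtime
        timeline[name] = (start, end)
        heapq.heappush(free_at, end)
    return timeline
```

With two slots, `fcfs_schedule([("j1", 4), ("j2", 2), ("j3", 3)], 2)` starts `j1` and `j2` at time 0 and starts `j3` at time 2, when `j2` releases its slot.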
Requirements for a Grid
Data Management
  access to data by distributed processes
  grid global file system does not scale beyond a point
  staging/caching data
  consistent namespace
Remote Execution Environment
  user should have control of the execution environment
  environment variables/parameters
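
Staging and caching under a consistent namespace can be sketched as follows. This is only an illustration: `StagingCache` and its `fetch` callable are hypothetical stand-ins for whatever transfer mechanism a real grid uses. Mapping a logical name to a file name by replacing `/` with `_` is a simplification that could collide for unusual names.

```python
from pathlib import Path
import tempfile


class StagingCache:
    """Stage data to local disk once, then serve later reads from the local copy."""

    def __init__(self, fetch, cache_dir=None):
        self.fetch = fetch    # hypothetical callable: logical name -> bytes
        self.cache_dir = Path(cache_dir or tempfile.mkdtemp(prefix="grid-cache-"))
        self.remote_reads = 0

    def open_local(self, logical_name):
        # The same logical name always maps to the same local file name,
        # so every node can refer to the data the same way.
        local = self.cache_dir / logical_name.strip("/").replace("/", "_")
        if not local.exists():
            local.write_bytes(self.fetch(logical_name))   # stage in from the remote source
            self.remote_reads += 1
        return local
```

A process that asks for the same logical name twice triggers only one remote transfer. The second call returns the already staged file.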
Grid Requirements
Security
  Authentication: positively identify users, devices, other resources
  Confidentiality: information is not disclosed to unauthorized people, systems, …
  Data integrity: data not modified accidentally or maliciously
  Non-repudiation: trusted confirmation (“return receipt”)
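
Two of these requirements, data integrity and authentication of the sender, can be illustrated with a keyed hash (HMAC) from the Python standard library. This is a minimal sketch that assumes both parties already share a secret key. The helper names `sign` and `verify` are hypothetical. A shared key provides neither confidentiality, which needs encryption, nor non-repudiation, which needs public-key signatures.

```python
import hashlib
import hmac


def sign(key: bytes, message: bytes) -> str:
    """Return a tag that only a holder of `key` could have produced for `message`."""
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def verify(key: bytes, message: bytes, tag: str) -> bool:
    """True only if the message is unmodified and was tagged with the same key."""
    # compare_digest avoids leaking information through comparison timing.
    return hmac.compare_digest(sign(key, message), tag)
```

A receiver that recomputes the tag detects both accidental corruption and deliberate tampering, since changing the message or using a different key yields a different tag.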
Grid Requirements
Gang Scheduling
  process/thread scheduling must be managed grid wide
  all processes/threads must start/stop at the same time
  if a process/thread fails, grid must manage the entire job
    stop job, restart job
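
The start-together, stop-together, restart-together behavior can be sketched with threads on a single machine standing in for processes spread across a grid. This is an illustration under that assumption, not a grid scheduler. `run_gang`, `steady`, and `fails_first_time` are hypothetical names, and stopping is cooperative: each worker has to check the shared stop flag.

```python
import threading


def run_gang(workers, max_restarts=1):
    """Start all workers together; if any fails, stop them all and restart the job."""
    for attempt in range(max_restarts + 1):
        start = threading.Barrier(len(workers))   # nobody runs until everyone is ready
        stop = threading.Event()                  # set when any member fails
        results = [None] * len(workers)

        def member(i, work):
            start.wait()                          # simultaneous start
            try:
                results[i] = work(stop, attempt)
            except Exception:
                stop.set()                        # tell the rest of the gang to stop

        threads = [threading.Thread(target=member, args=(i, w))
                   for i, w in enumerate(workers)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        if not stop.is_set():
            return results, attempt               # the whole job succeeded
    raise RuntimeError("job failed on every attempt")


def steady(stop, attempt):
    """A well-behaved member that abandons its work if the job is being stopped."""
    for _ in range(100):
        if stop.is_set():
            return None
    return "done"


def fails_first_time(stop, attempt):
    """A member whose node drops out on the first attempt only."""
    if attempt == 0:
        raise RuntimeError("node dropped out")
    return "done"
```

The job is treated as a unit: one failed member discards the whole attempt, and the restart launches every member again, not just the one that failed.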
Grid Requirements
Checkpointing and Job Migration
  Fault tolerance: failure recovery
  Load balancing
  Checkpointing: automatic, user-induced, none
Management
  tools to manage grid as a system
  must respect rights, autonomy, authority of components
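
Automatic checkpointing for failure recovery can be sketched as a loop that periodically saves its state and, on restart, resumes from the last saved state. This is a minimal illustration with a toy computation (summing integers). The names `run_with_checkpoints` and `demo`, the JSON file format, and the `fail_at` parameter that simulates a node failure are all assumptions made for the example.

```python
import json
import os
import tempfile


def run_with_checkpoints(total_steps, path, every=10, fail_at=None):
    """Sum 0..total_steps-1, saving progress so a restart resumes where it stopped."""
    state = {"step": 0, "acc": 0}
    if os.path.exists(path):
        with open(path) as f:
            state = json.load(f)          # resume from the last checkpoint
    while state["step"] < total_steps:
        if fail_at is not None and state["step"] == fail_at:
            raise RuntimeError("simulated node failure")
        state["acc"] += state["step"]
        state["step"] += 1
        if state["step"] % every == 0:
            tmp = path + ".tmp"
            with open(tmp, "w") as f:
                json.dump(state, f)
            os.replace(tmp, path)         # atomic swap: never a half-written checkpoint
    return state["acc"]


def demo():
    """Run once with a failure at step 57, then restart from the checkpoint."""
    path = os.path.join(tempfile.mkdtemp(), "job.ckpt")
    try:
        run_with_checkpoints(100, path, fail_at=57)
    except RuntimeError:
        pass                              # the last checkpoint was written at step 50
    return run_with_checkpoints(100, path)
```

The restarted run repeats only the steps after the last checkpoint instead of starting over. The same saved state is what job migration would move to another machine.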
Some Barriers
Resource Sharing
  calls for sharing corporate resources
  things that have cost to companies/organizations
System Integrity
  once someone has code running on your computer…?
Data Integrity
  confidence in results: are they correct?
    architecture
    software environment
    tampering
Some Barriers
Availability
  Critical Grid App vs. Critical Corporate App
  who gets priority
  how to assert that priority
Ownership
  who owns the discovery if it was discovered on my computer?
  Intellectual Property: does the U of X own a piece of my work?
Licensing
  calls for new licensing models (no named seats)
Some Barriers
Culpability/Liability
  if it's wrong, who's to blame?
Propriety
  commercial code running on a state-owned computer
  inappropriate code