1a.1
Introduction to Grid Computing
ITCS 4146/5146, UNC-Charlotte, B. Wilkinson, 2007 Jan 17, 2007
1a.2
Grid Computing
• Using geographically distributed and interconnected computers together for computing and for resource sharing.
“The grid virtualizes heterogeneous geographically disperse resources” from "Introduction to Grid Computing with Globus," IBM Redbooks
1a.3
Need to harness computers
Original driving force behind grid computing same as behind the early development of networks that became the Internet:
– Connecting computers at distributed sites for high performance computing.
1a.4
History
• Began in mid 1990’s with experiments using computers at geographically dispersed sites.
• Seminal experiment – “I-way” experiment at 1995 Supercomputing conference (SC’95), using 17 sites across the US running:
– 60+ applications.
– Existing networks (10 networks).
1a.5
• Grid computing is about collaborating and resource sharing as much as it is about high performance computing.
1a.6
Virtual Organizations
Grid computing offers the potential of virtual organizations:
– groups of people, both geographically and organizationally distributed, working together on a problem, sharing computers AND other resources such as databases and experimental equipment.
• Crosses multiple administrative domains.
1a.7
Shared Resources
Can share much more than just computers:
• Storage
• Sensors for experiments at particular sites
• Application Software
• Databases
• Network capacity, …
1a.8
Interconnections and Protocols
Focus now on:
• using standard Internet protocols and technology, such as HTTP, SOAP, and web services.
1a.9
Applications
• Originally e-Science applications
– Computationally intensive
• Traditional high performance computing addressing large problems
• Not necessarily one big problem but a problem that has to be solved repeatedly with different parameters.
– Data intensive
• Computational but emphasis on large amounts of data to be stored and processed
– Experimental collaborative projects
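The "solved repeatedly with different parameters" case is often called a parameter sweep. A minimal sketch follows, assuming a made-up `simulate` function and made-up parameter names (`stiffness`, `damping`); local threads stand in for grid resources, so this only illustrates the shape of the workload, not how a grid would dispatch it.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def simulate(params):
    """Hypothetical stand-in for one independent run of a model."""
    stiffness, damping = params
    return stiffness / (1.0 + damping)


def parameter_sweep(stiffness_values, damping_values, workers=4):
    """Run simulate() once per parameter combination.

    Every run is independent of the others, so on a grid each one could be
    sent to a different machine. Here local threads play that role.
    """
    combos = list(product(stiffness_values, damping_values))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(simulate, combos))
    return dict(zip(combos, results))


if __name__ == "__main__":
    sweep = parameter_sweep([1.0, 2.0], [0.0, 1.0])
    for params, value in sorted(sweep.items()):
        print(params, value)
```

Because no run depends on another, the same problem scales by adding machines rather than by making one machine faster.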
1a.10
• Now also e-Business applications
– To improve business models and practices.
– Sharing corporate computing resources and databases
– On-demand grid computing
1a.11
Computational Grid Applications
• Biomedical research
• Industrial research
• Engineering research
• Studies in Physics and Chemistry
• …
1a.12
Sample Grid Computing Projects
NSF Network for Earthquake Engineering Simulation (NEES)
Transform our ability to carry out research vital to reducing vulnerability to catastrophic earthquakes.
From I. Foster
Environment/Earth
1a.14
www.earthsystemgrid.org
DOE Earth System Grid
Goal: address technical obstacles to sharing and analysis of high-volume data from advanced earth system models.
From I. Foster
1a.15
Earth System Grid I. Foster
1a.16
http://www.ediamond.ox.ac.uk/
Medicine/Biology
1a.17
http://www.openmolgrid.org/
1a.18
http://www.grid.org/projects/hpf/about.htm
1a.19
Physics: CERN grid
Large Hadron Collider: experimental facility for complex particle experiments at CERN (European Organization for Nuclear Research, near Geneva, Switzerland).
1a.20
http://www.gridpp.ac.uk/
1a.21
http://www.ppdg.net/
1a.22
http://eu-datagrid.web.cern.ch/eu-datagrid/
Data grids
1a.23
Grid computing infrastructure projects
Not tied to one specific application
1a.24
Grid Networks
Grid networks for collaborative grid computing projects
Grids have been set up at local level, national level, and international level throughout the world, to promote grid computing.
1a.25
TeraGrid
Funded by NSF in 2002 to link 5 supercomputer sites with 40 Gb/s links
1a.26
TeraGrid
1a.27
Grid2003: An Operational National Grid
• 28 sites: universities + national labs
• 2800 CPUs, 400–1300 jobs
• Running since October 2003
• Applications in HEP, LIGO, SDSS, genomics
(Figure label: Korea)
http://www.ivdgl.org/grid2003
From “A Grid of One to a Grid of Many,” Miron Livny, UW-Madison, keynote presentation, MIDnet conference, 2005.
1a.28
SURAGrid
1a.29
Close to home:
North Carolina’s Foundation for Grid: NCREN
• 4-7 MCNC-owned clusters distributed throughout the state (locations still under evaluation)
• Existing: blend of owned and leased fiber and circuits moving toward resilient rings powered by Cisco routers
• Planned: strong focus on owned and leased fiber, lambda, and few circuits, in resilient rings powered by Cisco routers and wave division multiplexers
(Figure labels: Cisco, EPA, Internet, Internet 2, NLR)
From “Grid Computing in the Industry” by Wolfgang Gentzsch, presentation to Fall 2004 grid computing course. Full set of slides on course home page.
1a.30
National Grids
Many countries have embraced grid computing and set up grid computing infrastructure:
• UK e-Science grid
• Grid-Ireland
• NorduGrid
• DutchGrid
• PIONIER grid (Poland)
• ACI grid (France)
• Japanese grid
• etc.
1a.31
UK e-Science Grid
1a.32
Campus Grids
Several examples of grids within one university and across campuses
Example: University of Virginia Campus Grid
1a.33
Grid Computing Software
1a.34
(Figure: timeline, 1985 to 2005)
Computing platforms: parallel computers; cluster computing; geographically distributed computers (grid computing); SC’95 experiment
Software techniques: distributed computing; remote procedure calls (RPC); concept of service registry; beginnings of service-oriented architecture; object-oriented approaches; CORBA (Common Object Request Broker Architecture); Java Remote Method Invocation (RMI); web services (adopted for grid infrastructure components)
Networks: Internet; ports, protocols; IP addresses, URLs, …; mark-up languages (HTML, XML)
Globus toolkit: 1.x, 2.x, 3.x, 4.0
1a.35
Globus Project
• Open source software toolkit developed for grid computing.
• Roots in I-way experiment.
• Work started in 1996.
• Four versions developed to present time.
• Reference implementations of grid computing standards.
• De facto standard for grid computing.
1a.36
Globus Toolkit: Recent History
• GT2 (2.4 released in 2002)
– GRAM, MDS, GridFTP, GSI.
• GT3 (3.2 released mid-2004): redesign
– OGSA (Open Grid Services Architecture) / OGSI (Open Grid Services Infrastructure) based.
– Introduced “Grid services” as an extension of web services.
– OGSI now abandoned.
• GT4 (released April 2005): redesign
– WSRF (Web Services Resource Framework) based.
– Grid standards merged with Web services.
1a.37
Globus
• A “toolkit” of services and packages for creating the basic grid computing infrastructure
• Higher level tools added to this infrastructure
• Version 4 is web-services based
• Some non-web services code exists from earlier versions (legacy) or where web services are not appropriate (for efficiency, etc.).
1a.39
• Each part comprises a set of web services and/or non-web service components.
• Some built upon earlier versions of Globus.
Globus Open Source Grid Software (figure from I. Foster)
Columns: Security, Data Management, Execution Management, Information Services, Common Runtime
Rows: Web Services Components, Non-WS Components
Components, grouped by the version label shown in the figure:
• GT2: Pre-WS Authentication Authorization; GridFTP; Grid Resource Allocation Mgmt (Pre-WS GRAM); Monitoring & Discovery System (MDS2); C Common Libraries
• GT3: WS Authentication Authorization; Reliable File Transfer; OGSA-DAI [Tech Preview]; Grid Resource Allocation Mgmt (WS GRAM); Monitoring & Discovery System (MDS4); Java WS Core; Community Authorization Service
• GT3: Replica Location Service; XIO
• GT4: Credential Management
• GT4: Python WS Core [contribution]; C WS Core; Community Scheduler Framework [contribution]; Delegation Service
1a.41
From “Globus Toolkit 4 Tutorial,” MCNC Jan-Feb, 2005, Pawel Plaszczak and Bogdan Lobodzinski, Gridwise Technologies.
1. secure environment: GSI
2. discover resource: MDS
3. submit job: GRAM
4. transfer data: GridFTP
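The four numbered steps on this slide (secure environment, discover resource, submit job, transfer data) can be walked through with a toy in-memory model. This is only a sketch of the order of operations: `ToyGrid`, `run_workflow`, and every method name below are hypothetical and are not Globus Toolkit interfaces; the comments note which Globus component plays each role.

```python
class ToyGrid:
    """In-memory stand-in for a grid; not a Globus interface."""

    def __init__(self, free_cpus, trusted_users):
        self.free_cpus = dict(free_cpus)        # resource name -> free CPUs
        self.trusted_users = set(trusted_users)
        self.storage = {}                       # (resource, filename) -> bytes

    def authenticate(self, user):
        """Step 1, secure environment (the role GSI plays)."""
        if user not in self.trusted_users:
            raise PermissionError(f"{user} is not trusted on this grid")
        return user

    def discover(self, cpus_needed):
        """Step 2, discover resource (the role MDS plays)."""
        return sorted(name for name, free in self.free_cpus.items()
                      if free >= cpus_needed)

    def submit(self, resource, job, cpus_needed):
        """Step 3, submit job (the role GRAM plays)."""
        self.free_cpus[resource] -= cpus_needed
        try:
            return job()
        finally:
            # Release the CPUs whether the job succeeded or failed.
            self.free_cpus[resource] += cpus_needed

    def transfer(self, resource, filename, data):
        """Step 4, transfer data (the role GridFTP plays)."""
        self.storage[(resource, filename)] = data
        return len(data)


def run_workflow(grid, user, cpus_needed, job):
    """Run the four steps in order; return (resource used, bytes moved)."""
    grid.authenticate(user)
    candidates = grid.discover(cpus_needed)
    if not candidates:
        raise RuntimeError("no resource can run this job")
    resource = candidates[0]
    output = grid.submit(resource, job, cpus_needed)
    moved = grid.transfer(resource, "result.txt", output)
    return resource, moved


if __name__ == "__main__":
    grid = ToyGrid({"cluster_a": 4, "cluster_b": 16}, ["alice"])
    print(run_workflow(grid, "alice", 8, lambda: b"done"))
```

Security comes first because every later step acts on behalf of an authenticated user.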
1a.42
Supercomputing 2003 Demonstration
• We used Globus version 2.4 in a Supercomputing 2003 demo organized by the University of Melbourne.
• 21 countries involved, numerous sites.
1a.43
1a.44
1a.45
Version 3
• A re-implementation of version 2 based upon the Open Grid Services Architecture (OGSA) and Open Grid Services Infrastructure (OGSI) “standards”.
• The first move towards a web services implementation.
• We used version 3.2 for the Fall 2004 course.
1a.46
Grid Computing Course (Fall 2004)
• Originated from WCU on NCREN network. Broadcast to:
– UNC-Wilmington
– NC State University
– UNC-Asheville
– UNC-Greensboro
– Appalachian State University
– NC Central University
– Cape Fear Community College
– Elon University
• Instructors: Barry Wilkinson and Clayton Ferner (UNC-W)
• Several faculty helped at various sites
• 43 students
(Photo: WCU teleclassroom)
1a.47
Participating Sites
1a.48
Globus Version 3
• Globus version 3 had a very short life (a little over 2 years, 2002-2004).
• Underlying implementation of version 3.x used a type of extension to web services (OGSI) that was not embraced by the community.
1a.49
Version 4
• Released early 2005.
• OGSA kept but OGSI abandoned in favor of new implementation standard based around a more compatible version of web services called Web Services Resource Framework (WSRF) standard.
1a.50
Participating Sites, Fall 2005
(Map legend: participating UNC campuses; private institutions)
1a.51
Fall 2005 Course grid structure
(Figure: sites MCNC, UNC-W, UNC-A, NCSU, WCU, UNC-C and ASU, with seven certificate authority (CA) labels; one facility is marked “Backup facility, not actually used”)
1a.52
National Publicity
Science Grid This Week feature story
Gridtoday.com
1a.53
Web Services-Based Grid Computing
• Grid Computing is now strongly based upon web services.
• Large number of newly proposed grid computing standards:
– WS-Resource Framework
– WS-Addressing
– etc.
1a.54
Other grid computing software
• National Science Foundation started the NSF middleware initiative in 2001 for bringing together all important grid computing software, including:
Grid portals
• Web-based interfaces for accessing and controlling grid resources and for communicating with other members of a Virtual Organization
1a.55
Grid Computing Course (Spring 2007)
• Uses GT version 4 (most recent version)
• Redesigned course, now also with OGCE2/Gridsphere grid computing portal software.
• Three sites
– UNC-Charlotte
– UNC-Wilmington
– UNC-Asheville
1a.56
Other software
Meta-schedulers – to allow a job to be scheduled across grid resources.
• It has taken some time to develop meta-schedulers.
• Example: GridWay
• Pre-existing local schedulers schedule jobs once they are at a local cluster
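The split between a meta-scheduler and local schedulers can be sketched as two small classes. This is a toy illustration, not how GridWay or any real scheduler works: the class names, the "most free slots" policy, and the slot counts are all assumptions made for the example.

```python
class LocalScheduler:
    """Stand-in for one cluster's own scheduler; it only sees its own slots."""

    def __init__(self, name, total_slots):
        self.name = name
        self.total_slots = total_slots
        self.running = []                      # list of (job_name, slots)

    def free_slots(self):
        return self.total_slots - sum(slots for _, slots in self.running)

    def run(self, job_name, slots):
        if slots > self.free_slots():
            raise RuntimeError(f"{self.name} cannot fit {job_name}")
        self.running.append((job_name, slots))


class MetaScheduler:
    """Chooses a cluster for each job, then hands the job to that cluster."""

    def __init__(self, clusters):
        self.clusters = list(clusters)

    def submit(self, job_name, slots):
        candidates = [c for c in self.clusters if c.free_slots() >= slots]
        if not candidates:
            return None                        # nothing on the grid fits
        # Assumed policy: pick the cluster with the most free slots.
        best = max(candidates, key=lambda c: c.free_slots())
        best.run(job_name, slots)
        return best.name


if __name__ == "__main__":
    meta = MetaScheduler([LocalScheduler("big", 104), LocalScheduler("small", 8)])
    for name, slots in [("a", 100), ("b", 6), ("c", 10)]:
        print(name, "->", meta.submit(name, slots))
```

The meta-scheduler never runs a job itself; it decides where the job goes and the local scheduler takes over from there.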
1a.57
Resource Discovery
• Still primitive and a research topic, but the ideal is to be able to submit a job and have the system find the best grid resources for that job across the whole grid.
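One simple way to picture that ideal is matchmaking: compare a job's requirements with a description of each resource and keep the best match. The sketch below assumes hypothetical attribute names (`free_cpus`, `memory_gb`, `software`) and a deliberately naive ranking rule; real discovery services publish much richer information.

```python
def matches(resource, requirements):
    """True if the resource satisfies every stated requirement."""
    return (resource["free_cpus"] >= requirements["cpus"]
            and resource["memory_gb"] >= requirements["memory_gb"]
            and requirements["software"] <= resource["software"])  # subset test


def best_resource(resources, requirements):
    """Return the name of the matching resource with the most free CPUs."""
    candidates = [r for r in resources if matches(r, requirements)]
    if not candidates:
        return None
    return max(candidates, key=lambda r: r["free_cpus"])["name"]


if __name__ == "__main__":
    resources = [
        {"name": "site_a", "free_cpus": 8, "memory_gb": 64,
         "software": {"globus", "condor"}},
        {"name": "site_b", "free_cpus": 40, "memory_gb": 32,
         "software": {"globus"}},
        {"name": "site_c", "free_cpus": 100, "memory_gb": 16,
         "software": {"globus", "condor"}},
    ]
    job = {"cpus": 16, "memory_gb": 32, "software": {"globus"}}
    print(best_resource(resources, job))
```

The hard part in practice is keeping resource descriptions current across many administrative domains, which this sketch sidesteps by using a static list.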
1a.58
Other issues
Account management
Users need access to resources, which means that, one way or another, users need accounts on all resources at their disposal (or access through common VO accounts).
For the course, we set up accounts manually, but real production grids use automated tools to assist.
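To make "accounts on all resources" concrete: Globus installations conventionally keep a grid-mapfile on each resource that maps a user's certificate subject name to a local account. The slides do not describe that file, so the line format assumed below (a quoted subject name followed by one local account name) and the sample entries are illustrative assumptions only.

```python
import shlex


def parse_grid_mapfile(text):
    """Parse lines of the assumed form: "<certificate subject>" <local account>."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue                           # skip blanks and comments
        parts = shlex.split(line)              # honours the quoted subject name
        if len(parts) != 2:
            raise ValueError(f"unrecognized line: {line!r}")
        subject, account = parts
        mapping[subject] = account
    return mapping


if __name__ == "__main__":
    # Made-up entries for illustration.
    sample = '''
    # subject name                      local account
    "/O=Example Grid/CN=Jane Doe"       jdoe
    "/O=Example Grid/CN=Course User 7"  student07
    '''
    for subject, account in parse_grid_mapfile(sample).items():
        print(account, "<-", subject)
```

Maintaining such a mapping by hand on every resource is the manual work that automated account tools aim to remove.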
1a.59
Security
• A big issue.
• Has to cross administrative domains.
• Agreed mechanisms.
• Focus is on Internet security mechanisms, modified to handle the special needs of grid computing.
• Will look at this in detail later in course.
1a.60
Grid computing projects at UNC-C
• VisualGrid project
• Project to develop a grid-enabled bioinformatics algorithm hardware accelerator
– Principal Investigators: Arun A Ravindran and Arindam Mukherjee (EE dept)
1a.61
VisualGrid Project
• Goal: Collaborative environmental visualization research using a grid computing infrastructure
• Started Jan 2006
• Involves two sites, UNC-Charlotte and UNC-Asheville
• plus Environmental Protection Agency, Raleigh, NC (funding agency)
1a.62
Project Structure at UNC-C (Virtual Organization)
• Visualization: Charlotte Visualization Center
– Bill Ribarsky, Bank of America Endowed Chair of Information Technology (VisualGrid PI)
– Aidong Lu, Asst. Professor of Computer Science
• Environmental Studies: Global Inst. of Energy & Environmental Syst.
– Hilary Inyang, Duke Energy Distinguished Professor
– Sunyoung Bae, Research Associate
• Grid Infrastructure
– Barry Wilkinson, Professor of Computer Science
1a.63
VisualGrid Infrastructure Group
Goal: To create a geographically distributed set of resources and facilitate collaboration between VisualGrid researchers.
Team (Department of Computer Science, UNC-Charlotte):
• Barry Wilkinson
• Jeremy Villalobos (MS student)
• Nikul Suthar (MS student)
• Keyur Sheth (MS student)
• Jasper Land (BS student)
Infrastructure Support:
• 52-node University Research Cluster
• Chuck Price, Director of University Research Computing
• Mike Mosley, Senior Systems Developer
1a.64
VisualGrid Configuration
(Figure)
UNC-Charlotte resources:
• VisualGrid portal
• Development system (four 3.4 GHz dual Xeons), visualgrid.uncc.edu
• Visualization lab data server (4 Tbytes)
• Compute resources: 52-node (104-processor) University Research Cluster
UNC-Asheville resources:
• transylvania.tr.cs.unca.edu (8-node system)
Software: Globus 4.0, Condor.
CA: Certificate Authority (two shown)
1a.65
National Attention
Listed as one of the portals to use OGCE2
1a.66
1a.67
UNC-Asheville
Bioinformatics hardware accelerator
52-node UNC-Charlotte university research cluster
UNC-C Dept of CS grid computing development system
4TB Windows 2003 data server reached through coit-grid02.uncc.edu (samba mount)
1a.68
Sample VisualGrid portlets
One CMAQ script editing portlet
CMAQ portlet, main page
CMAQ settings portlet Tabs for various CMAQ actions
1a.69
VisualGrid Links
VisualGrid Infrastructure group page: http://www.cs.uncc.edu/~abw/VisualGrid/
VisualGrid portal: http://visualgrid.uncc.edu
VisualGrid Portal User’s Guide: http://www.cs.uncc.edu/~abw/VisualGrid/PortalInstr.doc
Wiki: http://visualgrid.uncc.edu/wiki
1a.70
Quiz
There will (may) be multiple-choice quizzes in the course (on-line through WebCT).
Question: What is a virtual organization?
(a) An imaginary company.
(b) A web-based organization.
(c) A group of people geographically distributed that come together from different organizations to work on a grid project.
(d) A group of people that come together to work on a virtual reality grid project.
1a.71
Questions