Introduction to Themes and Technologies
Per Öster
CSC – IT Center for Science Ltd
Finland
CSC at a glance
● Founded in 1970 as a technical support unit for Univac 1108
● Reorganized as a company, CSC - Scientific Computing Ltd. in 1993
● All shares to the Ministry of Education of Finland in 1997
● Operates on a non-profit principle
● Facilities in Espoo, close to Otaniemi community (of 15,000 students and 16,000 technology professionals)
● Staff 170
● Turnover in 2008: 19.6 million euros
Themes of the First Week
Date         Theme                                                         Technology
Tue 7 July   Principles of job submission and execution management         UNICORE
Wed 8 July   Principles of high-throughput computing                       Condor
Thu 9 July   Principles of service-oriented architectures                  Globus
Fri 10 July  Principles of distributed data management                     gLite
Sat 11 July  Principles of using distributed and high performance systems  ARC
Themes of the Second Week
Date         Theme                                                      Technology
Mon 13 July  How to solve my problem?
Tue 14 July  Higher level APIs: OGSA-DAI, SAGA and metadata management  SAGA, OGSA-DAI, GridSAM
Wed 15 July  Workflows                                                  P-GRADE, Semantic Metadata
Thu 16 July  Integrating Practical                                      All
Fri 17 July  Cloud Computing (lecture)
The Acronyms
Acronym  What
WSRF     Web Services Resource Framework
OGSA     Open Grid Services Architecture
SOA      Service Oriented Architecture
Principles of job submission and execution management
Principles of high-throughput computing
Principles of service-oriented architecture
Principles of distributed data management
Principles of using distributed and high performance systems
Higher level APIs: OGSA-DAI, SAGA and metadata management
Workflows
1. Principles of job submission and execution management
• Vision
  • UNiform Interface to COmputing REsources
    - seamless, secure, and intuitive
• History
  • 08/1997 – 12/2002: UNICORE and UNICORE Plus projects
    - Initial development started in two German projects funded by the German ministry of education and research (BMBF)
  • Continuation in different EU projects since 2002
  • Open Source community development since summer 2004
http://www.unicore.eu
UNICORE 6 Guiding Principles, Implementation Strategies
• Open source under BSD license with software hosted on SourceForge
• Standards-based: OGSA-conform, WS-RF 1.2 compliant
• Open, extensible Service-Oriented Architecture (SOA)
• Interoperable with other Grid technologies
• Seamless, secure and intuitive following a vertical end-to-end approach
• Mature Security: X.509, proxy and VO support
• Workflow support tightly integrated while being extensible for different workflow languages and engines for domain-specific usage
• Application integration mechanisms on the client, services and resource level
• Variety of clients: graphical, command-line, API, portal, etc.
• Quick and simple installation and configuration
• Support for many operating systems (Windows, MacOS, Linux, UNIX) and batch systems (LoadLeveler, Torque, SLURM, LSF, OpenCCS)
• Implemented in Java to achieve platform-independence
[Figure: UNICORE 6 architecture, shown for two sites (Site 1, Site 2) behind Gateways. Recoverable labels:
- Layer labels: scientific clients and applications; authentication; central services running in WS-RF hosting environments; Grid services hosting; web service stack; job incarnation; authorization; data transfer to external storages
- Clients: UCC command-line client, URC Eclipse-based Rich client, HiLA Programming API, Portal (e.g. GridSphere)
- Central services: Gateway, Service Registry, Workflow Engine, Service Orchestrator, CIS Info Service, UVOS VO Service
- Per-site components: Gateway, UNICORE WS-RF hosting environment, UNICORE Atomic Services, OGSA-*, XNJS, IDB, XUUDB, XACML entity, Target System Interface, Local RMS (e.g. Torque, LL, LSF, etc.), USpace, External Storage
- Standards and protocols shown: X.509, Proxies, SOAP, WS-RF, WS-I, JSDL, OGSA-ByteIO, OGSA-BES, HPC-P, OGSA-RUS, UR, GLUE 2.0, XACML, SAML, DRMAA, GridFTP]
Workflows in UNICORE
• Two layer architecture for scalability
• Workflow engine
  - Based on Shark open-source XPDL engine
  - Pluggable, domain-specific workflow languages
• Service orchestrator
  - Job execution and monitoring
  - Callback to workflow engine
  - Brokering based on pluggable strategies
• Clients
  - GUI client based on Eclipse
  - Command-line submission of workflows is also possible
High-Throughput Computing
• Large number of tasks that can be executed independently
  • Parameter Studies
  • Monte Carlo or Stochastic Methods
  • Genome Sequencing (matching)
  • Analysis of LHC data
  • ...
Starting from this
Looking for this
(1 in 10^13)
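As a minimal sketch of "tasks that can be executed independently", the Python example below runs a small Monte Carlo study in which each task estimates pi from its own random seed. Nothing here comes from the slides: the function names, the sample counts and the use of a local thread pool as a stand-in for grid worker nodes are all assumptions made for illustration.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def estimate_pi(task):
    """One independent task: Monte Carlo estimate of pi from `samples` points."""
    seed, samples = task
    rng = random.Random(seed)  # private generator, so tasks share no state
    hits = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return 4.0 * hits / samples


def run_sweep(seeds, samples=20_000, workers=4):
    """Run one task per seed; the pool stands in for a set of worker nodes."""
    tasks = [(seed, samples) for seed in seeds]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(estimate_pi, tasks))


if __name__ == "__main__":
    estimates = run_sweep(range(8))
    print("mean estimate:", sum(estimates) / len(estimates))
```

Because no task depends on another, the same set of tasks could be handed to any number of machines in any order, which is what makes this class of problem a good fit for high-throughput systems.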
2. Principles of high-throughput computing
• Vision
  • Condor provides high-throughput computing in a variety of environments
    - Local dedicated clusters (machine rooms)
    - Local opportunistic (desktop) computers
    - Grid environments; can submit jobs to other systems
  • Can run workflows of jobs
  • Can run parallel jobs
    - Independently parallel (lots of single jobs)
    - Tightly coupled (such as MPI)
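To make "independently parallel (lots of single jobs)" concrete, the sketch below generates a Condor submit description that queues many copies of one program, each receiving its own index through the $(Process) macro. It assumes the standard submit-description keywords (universe, executable, arguments, output, error, log, queue); the helper name and the file names simulate.sh, job.*.out and sweep.log are hypothetical.

```python
def make_submit_description(executable, n_jobs, universe="vanilla"):
    """Return the text of a submit description queuing `n_jobs` independent jobs.

    Each job gets a distinct $(Process) value (0 .. n_jobs - 1), used here both
    as the program argument and to keep the per-job output files apart.
    """
    lines = [
        f"universe   = {universe}",
        f"executable = {executable}",
        "arguments  = $(Process)",
        "output     = job.$(Process).out",
        "error      = job.$(Process).err",
        "log        = sweep.log",
        f"queue {n_jobs}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # simulate.sh is a hypothetical user script that takes the job index.
    print(make_submit_description("simulate.sh", 100), end="")
```

The generated text would be saved to a file and handed to condor_submit; the scheduler then matches each of the queued jobs to an available machine.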
2. Principles of high-throughput computing
• History and Activity
  • Established in 1985
  • Distributed Computing research performed by a team of ~35 faculty, full time staff and students who
    - face software/middleware engineering challenges in a UNIX/Linux/Windows/OS X environment,
    - are involved in national and international collaborations,
    - interact with users in academia and industry,
    - maintain and support a distributed production environment (more than 5000 CPUs at UW),
    - educate and train students.
Condor Project: Main Threads of Activities
• Distributed Computing Research – develop and evaluate new concepts, frameworks and technologies
• Develop and maintain Condor; support our users – More on next slide
• The Open Science Grid (OSG) – build and operate a national High Throughput Computing infrastructure
• The Grid Laboratory Of Wisconsin (GLOW) – build, maintain and operate a distributed computing and storage infrastructure on the UW campus
• The NSF Middleware Initiative (NMI) – develop, build and operate a national Build and Test facility powered by Metronome (ETICS-II)
[Figure labels: RPC, DCE, DCOM, CORBA, RMI, Web Services, XML]
“Web services has dramatically reduced the programming and management cost of publishing and receiving information”
Jim Gray, Microsoft Research
EMBRACE – 4-year EU project to establish services for the bioinformatics community
3. Principles of service-oriented architectures
• Vision
  • Provide the fundamental components to get the grid working
• History
  • Starting point in I-WAY, a distributed high-performance network demonstrated at the SuperComputing '95 conference and exhibition
…14 Years Later
• 4 major versions
• Components to address the original problems
• Many new fields
  • recent hot topics: service oriented science, virtualization
• Diverse application areas
  • recently: lots of bioinformatics and medical apps
  • others include: earthquakes, particle physics, earth sciences
Globus Software now – many components
[Figure: grid of Globus software components, marked as Globus Projects or Incubator Projects. Recoverable labels:
- Categories: Security, Execution Mgmt, Info Services, Common Runtime, Data Mgmt, Incubator Mgmt, Other
- Projects and components shown: GT4, GT4 Docs, C Sec, GSI-OpenSSH, MyProxy, CAS, Delegation, GRAM, GridWay, MPICH-G2, MDS4, GridFTP, Reliable File Transfer, OGSA-DAI, Data Rep, Replica Location, Java Runtime, C Runtime, Python Runtime, Metrics, Cog WF, LRMA, GAARDS, OGRO, GDTE, UGP, HOC-SA, PURSE, GridShib, Introduce, Dyn Acct, WEEP, Gavia JSC, Gavia MS, DDM, Virt WkSp, SGGC, ServMark, MEDICUS, Others...]
4. Principles of distributed data management
Enabling Grids for E-sciencE
EGEE-III INFSO-RI-222667 Technical Status - Steven Newhouse - EGEE-III First Review 24-25 June 2009 24
EGEE Project Overview
• 17000 users
• 136000 LCPUs (cores)
• 25 PB disk
• 39 PB tape
• 12 million jobs/month (+45% in a year)
• 268 sites (+5% in a year)
• 48 countries (+10% in a year)
• 162 VOs (+29% in a year)
Middleware Supporting HTC
• Archeology
• Astronomy
• Astrophysics
• Civil Protection
• Comp. Chemistry
• Earth Sciences
• Finance
• Fusion
• Geophysics
• High Energy Physics
• Life Sciences
• Multimedia
• Material Sciences
Supported End-user Activity
• 13,000 end-users in 112 VOs
  • +44% users in a year
• 23 core VOs
  • A core VO has >10% of usage within its science cluster
History of gLite
• Development started in 2004
• Entered production in May 2006
• Middleware distribution of EGEE
gLite Middleware
[Figure: gLite middleware components on top of physical resources, distinguishing EGEE maintained components from external components. Recoverable labels:
- User Access: User Interface
- General Services: LHC File Catalogue, Hydra, Workload Management Service, File Transfer Service, Logging & Book keeping Service, AMGA
- Information Services: BDII, MON
- Security Services: Virtual Organisation Membership Service, Authz. Service, SCAS, Proxy Server, LCAS & LCMAPS
- Compute Element: CREAM, LCG-CE, gLExec, BLAH, Worker Node
- Storage Element: Disk Pool Manager, dCache]
The Computing “Eco-system”
[Figure: tiers of computing resources, TIER 1 to TIER 4, spanning capability computing and capacity computing. Labels: large-scale HPC centers; national/regional centers, Grid-collaboration; local centers; personal/office computing.]
• Scientific need for all tiers!
5. Principles of using distributed and high performance systems
ARC middleware (Advanced Resource Connector)
• open source out-of-the-box Grid solution software which enables production quality computational and data Grids (released in May 2002)
• development is coordinated by NDGF
• emphasis is put on scalability, stability, reliability and performance
• builds upon standard OS solutions, OpenLDAP, OpenSSL, SASL and Globus Toolkit
  • adds services not provided by Globus
  • extends or completely replaces some Globus components
ARC Tutorial & Grid Technologies Intro Slide 30 / 53
NorduGrid collaboration*
• a community around open source Grid middleware: ARC
• national Grids (e.g. M-grid, SweGrid, NorGrid), users also outside the Nordic countries
• real users, real applications
• implemented a production Grid system working non-stop since May 2002
• open for anyone to participate
* http://www.nordugrid.org/monitor
M-grid - the Finnish Material Sciences Grid
• joint project between seven Finnish universities, Helsinki Institute of Physics and CSC
• partners are laboratories and departments and not university IT centers
• not limited by the field of research, used for a wide range of physical, chemical and nanoscience applications
• jointly funded by the Academy of Finland and the participating universities
• first large initiative to put Grid middleware into production use in Finland
• goal: throughput computing capacity mainly for the needs of physics and chemistry researchers
  – opened to all CSC customers in Nov 2005
Grids at CSC (HPC and Grids in Practice)
HP CP4000BL ProLiant Cluster (gLite on HP cluster, ARC on HP cluster)
• 2176 processor cores
• 5 TB memory
• 11 TF peak performance
• Infiniband interconnect
Cray XT4/XT5 (UNICORE on Cray MPP)
• 10960 computing cores
• 11.092 TB
• computing peak power 100.8 TF
• Final configuration Q3/2008
6. Higher level APIs: OGSA-DAI, SAGA and metadata management (S-OGSA)
• OGSA-DAI Vision
  • To enable the sharing of data resources for collaboration, by supporting:
    - Data access: access to structured data in distributed heterogeneous data resources
    - Data transformation: e.g. expose data in schema X to users as data in schema Y
    - Data integration: e.g. expose multiple databases to users as a single virtual database
    - Data delivery: delivering data to where it's needed by the most appropriate means, e.g. web service, e-mail, HTTP, FTP, GridFTP
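The ideas of data access, transformation and integration can be illustrated with a small Python sketch. It does not use the OGSA-DAI API; it only assumes two in-memory SQLite databases with different, made-up schemas (a samples table in Celsius and a readings table in Fahrenheit) and shows them being presented to the user as one virtual table in a single target schema.

```python
import sqlite3


def make_source(create_sql, insert_sql, rows):
    """Create one in-memory database acting as an independent data resource."""
    conn = sqlite3.connect(":memory:")
    conn.execute(create_sql)
    conn.executemany(insert_sql, rows)
    return conn


# Two heterogeneous resources: same kind of data, different schemas and units.
site_a = make_source("CREATE TABLE samples (id INTEGER, temp_c REAL)",
                     "INSERT INTO samples VALUES (?, ?)",
                     [(1, 20.0), (2, 25.0)])
site_b = make_source("CREATE TABLE readings (rid INTEGER, temp_f REAL)",
                     "INSERT INTO readings VALUES (?, ?)",
                     [(3, 212.0)])


def virtual_table():
    """Return rows from both resources in one common schema (id, temp_c)."""
    rows = []
    # Data access: query each resource in its native schema.
    for sid, temp_c in site_a.execute("SELECT id, temp_c FROM samples ORDER BY id"):
        rows.append({"source": "A", "id": sid, "temp_c": temp_c})
    # Data transformation: map schema (rid, temp_f) onto (id, temp_c).
    for rid, temp_f in site_b.execute("SELECT rid, temp_f FROM readings ORDER BY rid"):
        rows.append({"source": "B", "id": rid, "temp_c": (temp_f - 32.0) * 5.0 / 9.0})
    # Data integration: the caller sees a single list, not two databases.
    return rows


if __name__ == "__main__":
    for row in virtual_table():
        print(row)
```

The caller never needs to know how many databases exist or which units each one stores, which is the point of exposing a single virtual database.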
6. Higher level APIs: OGSA-DAI, SAGA and metadata management (S-OGSA)
• OGSA-DAI History
  • The OGSA-DAI project started in February 2002 as part of the UK e-Science Grid Core Program
  • Is today part of OMII-UK, a partnership between:
    - OMII, The University of Southampton
    - myGrid, The University of Manchester
    - OGSA-DAI, The University of Edinburgh
6. Higher level APIs: OGSA-DAI, SAGA and metadata management (S-OGSA)
• Vision of a Simple API for Grid Applications - SAGA
  • Provide a simple programmatic interface that is widely adopted, usable and available for enabling applications for the grid
  • Simplicity:
    - easy to use, install, administer and maintain
  • Uniformity:
    - provides support for different application programming languages as well as consistent semantics and style for different Grid functionality
  • Scalability:
    - contains mechanisms for the same application (source) code to run on a variety of systems ranging from laptops to HPC resources
  • Genericity:
    - adds support for different grid middleware, even concurrent ones
  • Modularity:
    - provides a framework that is easily extendable
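The uniformity, genericity and modularity goals can be pictured as an adaptor pattern: application code calls one function, and the scheme of a resource URL selects the backend. The sketch below is not the SAGA API; every class, function and URL scheme in it is hypothetical, and the "echo" adaptor merely stands in for a real grid middleware.

```python
import subprocess
import sys
from abc import ABC, abstractmethod


class JobAdaptor(ABC):
    """Common interface that every middleware adaptor implements."""

    @abstractmethod
    def run(self, executable, arguments):
        """Run a job and return its standard output as text."""


class LocalAdaptor(JobAdaptor):
    """Runs the job as a process on the local machine."""

    def run(self, executable, arguments):
        done = subprocess.run([executable, *arguments],
                              capture_output=True, text=True, check=True)
        return done.stdout


class EchoAdaptor(JobAdaptor):
    """Placeholder for a remote middleware: only reports what it would submit."""

    def run(self, executable, arguments):
        return "would submit: " + " ".join([executable, *arguments])


# Modularity: supporting another middleware means adding one entry here.
ADAPTORS = {"local": LocalAdaptor(), "echo": EchoAdaptor()}


def run_job(resource_url, executable, arguments=()):
    """Same call for every backend; the URL scheme picks the adaptor."""
    scheme = resource_url.split("://", 1)[0]
    if scheme not in ADAPTORS:
        raise ValueError(f"no adaptor for scheme {scheme!r}")
    return ADAPTORS[scheme].run(executable, list(arguments))


if __name__ == "__main__":
    code = ["-c", "print(6 * 7)"]
    print(run_job("local://localhost", sys.executable, code), end="")
    print(run_job("echo://some-grid", sys.executable, code))
```

The application code in the final lines is identical for both backends; only the resource URL changes, which is the property that lets the same source run on a laptop or on a larger resource.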
6. Higher level APIs: OGSA-DAI, SAGA and metadata management (S-OGSA)
• Metadata management: Make metadata Princess in the kingdom of Semantic Web
7. Workflows
• Organize your work, e.g.:
  • Gather initial data
  • Pre-processing of data
  • Define computing job(s)
  • Initiate job(s)
  • Gather results
  • Post-processing of results
  • ...
  • Repeat
During the school you will understand how you can do this in different ways with the systems studied. But this can also be done with specific workflow systems: Taverna, P-GRADE Portal,…
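The workflow steps listed above can be sketched as a chain of plain Python functions. All function names and the toy data are assumptions made for illustration; in practice each step would call one of the systems studied during the school, and a workflow system would manage the chaining, monitoring and repetition.

```python
def gather_initial_data():
    """Step 1: collect the raw inputs (hard-coded toy values here)."""
    return [3, 1, 2]


def preprocess(data):
    """Step 2: clean and order the inputs."""
    return sorted(data)


def define_jobs(data):
    """Step 3: describe one computing job per input value."""
    return [{"input": value} for value in data]


def run_job(job):
    """Step 4: execute one job; a real workflow would submit it to a grid."""
    return job["input"] ** 2


def postprocess(results):
    """Step 6: reduce the gathered results to a single summary value."""
    return sum(results)


def run_workflow(rounds=1):
    """Run the whole chain `rounds` times (the 'Repeat' step)."""
    summaries = []
    for _ in range(rounds):
        data = preprocess(gather_initial_data())
        jobs = define_jobs(data)
        results = [run_job(job) for job in jobs]  # Step 5: gather results
        summaries.append(postprocess(results))
    return summaries


if __name__ == "__main__":
    print(run_workflow(rounds=2))
```

Each step only depends on the output of the previous one, so the chain can be drawn as a simple directed graph, which is how graphical workflow tools typically present it.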
Motivations for developing P-GRADE portal
• P-GRADE portal should
  – Give an answer for all the questions of an e-scientist
  – Hide the complexity of the underlying grid middleware
  – Provide a high-level graphical user interface that is easy-to-use for e-scientists
  – Support many different grid programming approaches (see Morris Riedel’s talk):
    • Simple Scripts & Control (sequential and MPI job execution)
    • Scientific Application Plug-ins (based on GEMLCA)
    • Complex Workflows
    • Parameter sweep applications: both on job and workflow level
    • Interoperability: transparent access to grids based on different middleware technology
  – Support three levels of parallelism
Short History of P-GRADE portal
• Parallel Grid Application and Development Environment
• Initial development started in the Hungarian SuperComputing Grid project in 2003
• It has been continuously developed since 2003
• Detailed information: http://portal.p-grade.hu/
• Open Source community development since January 2008: https://sourceforge.net/projects/pgportal/
Integrating Practical