General Introduction to technologies that will be seen in the school

Introduction to Themes and Technologies

Per Öster

<per.oster@csc.fi>

CSC – IT Center for Science Ltd

Finland

CSC at a glance

● Founded in 1970 as a technical support unit for Univac 1108

● Reorganized as a company, CSC - Scientific Computing Ltd. in 1993

● All shares to the Ministry of Education of Finland in 1997

● Operates on a non-profit principle

● Facilities in Espoo, close to Otaniemi community (of 15,000 students and 16,000 technology professionals)

● Staff: 170

● Turnover in 2008: 19.6 million euros

Themes of the First Week

Date | Theme | Technology
Tue 7 July | Principles of job submission and execution management | UNICORE
Wed 8 July | Principles of high-throughput computing | Condor
Thu 9 July | Principles of service-oriented architectures | Globus
Fri 10 July | Principles of distributed data management | gLite
Sat 11 July | Principles of using distributed and high performance systems | ARC

Themes of the Second Week

Date | Theme | Technology
Mon 13 July | How to solve my problem? |
Tue 14 July | Higher level APIs: OGSA-DAI, SAGA and metadata management | SAGA, OGSA-DAI, GridSAM
Wed 15 July | Workflows | P-GRADE, Semantic Metadata
Thu 16 July | Integrating Practical | All
Fri 17 July | Cloud Computing (lecture) |

The Acronyms

Acronym | What
WSRF | Web Services Resource Framework
OGSA | Open Grid Services Architecture
SOA | Service Oriented Architecture

Principles of job submission and execution management

Principles of high-throughput computing

Principles of service-oriented architecture

Principles of distributed data management

Principles of using distributed and high performance systems

Higher level APIs: OGSA-DAI, SAGA and metadata management

Workflows

1. Principles of job submission and execution management

• Vision
  - UNiform Interface to COmputing REsources
  - seamless, secure, and intuitive

• History
  - 08/1997 – 12/2002: UNICORE and UNICORE Plus projects
  - Initial development started in two German projects funded by the German ministry of education and research (BMBF)
  - Continuation in different EU projects since 2002
  - Open Source community development since summer 2004

http://www.unicore.eu

UNICORE 6 Guiding Principles, Implementation Strategies

• Open source under BSD license with software hosted on SourceForge
• Standards-based: OGSA-conform, WS-RF 1.2 compliant
• Open, extensible Service-Oriented Architecture (SOA)
• Interoperable with other Grid technologies
• Seamless, secure and intuitive following a vertical end-to-end approach
• Mature Security: X.509, proxy and VO support
• Workflow support tightly integrated while being extensible for different workflow languages and engines for domain-specific usage
• Application integration mechanisms on the client, services and resource level
• Variety of clients: graphical, command-line, API, portal, etc.
• Quick and simple installation and configuration
• Support for many operating systems (Windows, MacOS, Linux, UNIX) and batch systems (LoadLeveler, Torque, SLURM, LSF, OpenCCS)
• Implemented in Java to achieve platform-independence
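UNICORE lists JSDL among the standards it supports. As a rough illustration of what a job description in that format can look like, the sketch below builds a minimal JSDL document with Python's standard library. The job name, executable and file name are made-up values, and nothing here talks to a UNICORE server; in practice a client such as UCC or URC would prepare and submit the description.

```python
import xml.etree.ElementTree as ET

# Namespaces of the JSDL 1.0 specification and its POSIX application extension.
JSDL = "http://schemas.ggf.org/jsdl/2005/11/jsdl"
POSIX = "http://schemas.ggf.org/jsdl/2005/11/jsdl-posix"


def build_jsdl(job_name, executable, arguments, stdout_file):
    """Return a minimal JSDL job description as an XML string."""
    ET.register_namespace("jsdl", JSDL)
    ET.register_namespace("jsdl-posix", POSIX)

    job = ET.Element(f"{{{JSDL}}}JobDefinition")
    description = ET.SubElement(job, f"{{{JSDL}}}JobDescription")

    # Human-readable identification of the job.
    ident = ET.SubElement(description, f"{{{JSDL}}}JobIdentification")
    ET.SubElement(ident, f"{{{JSDL}}}JobName").text = job_name

    # What to run: a POSIX-style executable with arguments and stdout file.
    app = ET.SubElement(description, f"{{{JSDL}}}Application")
    posix = ET.SubElement(app, f"{{{POSIX}}}POSIXApplication")
    ET.SubElement(posix, f"{{{POSIX}}}Executable").text = executable
    for argument in arguments:
        ET.SubElement(posix, f"{{{POSIX}}}Argument").text = argument
    ET.SubElement(posix, f"{{{POSIX}}}Output").text = stdout_file

    return ET.tostring(job, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical example values.
    print(build_jsdl("hello-grid", "/bin/echo", ["hello", "grid"], "stdout.txt"))
```

The point of such a description is that it says what to run without naming a particular batch system; mapping it onto the local resource manager is left to the middleware.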

[Figure: UNICORE 6 architecture across two sites]

Components shown:
• Clients: UCC command-line client, URC Eclipse-based rich client, Portal (e.g. GridSphere), HiLA programming API
• Central: Gateway, Service Registry, Workflow Engine, Service Orchestrator, CIS Info Service, UVOS VO Service, External Storage
• Per site (Site 1, Site 2): Gateway, UNICORE WS-RF hosting environment with UNICORE Atomic Services and OGSA-*, XNJS with IDB, XUUDB, XACML entity, Target System Interface, USpace, Local RMS (e.g. Torque, LL, LSF, etc.)

Layer labels: scientific clients and applications; authentication; central services running in WS-RF hosting environments; Grid services hosting; web service stack; job incarnation; authorization; data transfer to external storages

Standards labels: X.509, Proxies, SOAP, WS-RF, WS-I, JSDL; OGSA-ByteIO, OGSA-BES, JSDL, HPC-P, OGSA-RUS, UR; X.509, XACML, SAML, Proxies; OGSA-RUS, UR, GLUE 2.0; DRMAA; GridFTP, Proxies

http://www.unicore.eu

Workflows in UNICORE

Two layer architecture for scalability

• Workflow engine
  - Based on Shark open-source XPDL engine
  - Pluggable, domain-specific workflow languages
• Service orchestrator
  - Job execution and monitoring
  - Callback to workflow engine
  - Brokering based on pluggable strategies
• Clients
  - GUI client based on Eclipse
  - Command-line submission of workflows is also possible

High-Throughput Computing

• Large number of tasks that can be executed independently
• Parameter Studies
• Monte Carlo or Stochastic Methods
• Genome Sequencing (matching)
• Analysis of LHC data
• …

[Figure: "Starting from this", "Looking for this" (1 in 10^13)]
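The common thread in these workloads is that each task can run without talking to the others. A minimal sketch of that idea, assuming a Monte Carlo estimate of pi as the stand-in problem: every task gets its own seed, runs in isolation, and only the final counts are combined. The local thread pool is merely a placeholder for a pool of grid worker nodes; the function names are invented for this illustration.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def run_task(seed, samples):
    """One independent task: count random points that fall inside the unit circle."""
    rng = random.Random(seed)  # a private generator, so tasks share no state
    hits = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits


def estimate_pi(n_tasks, samples_per_task):
    """Farm out n_tasks independent tasks and combine their results."""
    with ThreadPoolExecutor() as pool:
        hits = pool.map(run_task, range(n_tasks), [samples_per_task] * n_tasks)
        total_hits = sum(hits)
    return 4.0 * total_hits / (n_tasks * samples_per_task)


if __name__ == "__main__":
    print(estimate_pi(n_tasks=8, samples_per_task=20_000))
```

Because the tasks never communicate, the same decomposition works whether they run on one laptop or are spread over thousands of machines; only the submission mechanism changes.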

2. Principles of high-throughput computing

• Vision
  - Condor provides high-throughput computing in a variety of environments
    - Local dedicated clusters (machine rooms)
    - Local opportunistic (desktop) computers
    - Grid environments; can submit jobs to other systems
  - Can run workflows of jobs
  - Can run parallel jobs
    - Independently parallel (lots of single jobs)
    - Tightly coupled (such as MPI)
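To make "lots of single jobs" concrete, the sketch below generates a Condor submit description for a batch of independent jobs. The submit commands used (`universe`, `executable`, `arguments`, `output`, `error`, `log`, `queue`) and the `$(Process)` macro are standard Condor syntax; the script name `simulate.sh`, the log file name and the helper function itself are assumptions made for this example.

```python
def condor_submit_description(executable, n_jobs, log="sweep.log"):
    """Return the text of a Condor submit description for n_jobs independent jobs.

    Each job receives its own index through the $(Process) macro and writes
    to its own output and error files.
    """
    lines = [
        "universe   = vanilla",
        f"executable = {executable}",
        "arguments  = $(Process)",
        "output     = out.$(Process)",
        "error      = err.$(Process)",
        f"log        = {log}",
        f"queue {n_jobs}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical parameter sweep with 100 jobs.
    print(condor_submit_description("simulate.sh", 100), end="")
```

The generated text would typically be saved to a file and handed to `condor_submit`; the script then uses its argument (0 to 99 here) to pick the parameter set it should process.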

2. Principles of high-throughput computing

• History and Activity
  - Established in 1985
  - Distributed Computing research performed by a team of ~35 faculty, full-time staff and students who:
    - face software/middleware engineering challenges in a UNIX/Linux/Windows/OS X environment
    - are involved in national and international collaborations
    - interact with users in academia and industry
    - maintain and support a distributed production environment (more than 5000 CPUs at UW)
    - educate and train students

Condor Project: Main Threads of Activities

• Distributed Computing Research – develop and evaluate new concepts, frameworks and technologies

• Develop and maintain Condor; support our users – More on next slide

• The Open Science Grid (OSG) – build and operate a national High Throughput Computing infrastructure

• The Grid Laboratory Of Wisconsin (GLOW) – build, maintain and operate a distributed computing and storage infrastructure on the UW campus

• The NSF Middleware Initiative (NMI) - Develop, build and operate a national Build and Test facility powered by Metronome (ETICS-II)

[Figure: RPC, DCE, DCOM, CORBA, RMI, Web Services, XML]

“Web services has dramatically reduced the programming and management cost of publishing and receiving information”

Jim Gray, Microsoft Research

EMBRACE – 4yr EU project to establish services for the bioinformatics community

3. Principles of service-oriented architectures

• Vision
  - Provide the fundamental components to get the grid working

• History
  - Starting point in I-WAY, a distributed high-performance network demonstrated at the SuperComputing '95 conference and exhibition

…14 Years Later

• 4 major versions
• Components to address the original problems
• Many new fields
  - recent hot topics: service oriented science, virtualization
• Diverse application areas
  - recently: lots of bioinformatics and medical apps
  - others include: earthquakes, particle physics, earth sciences

Globus Software now – many components

[Figure: component map of Globus software]

Column headings: Security, Execution Mgmt, Info Services, Common Runtime, Data Mgmt, Other

Row groups: Globus Projects, Incubator Projects

Boxes shown: GT4, GT4 Docs, Incubator Mgmt, C Sec, CAS, Delegation, GSI-OpenSSH, MyProxy, GRAM, GridWay, MPICH-G2, MDS4, Java Runtime, C Runtime, Python Runtime, GridFTP, Reliable File Transfer, OGSA-DAI, Data Rep, Replica Location, Cog WF, LRMA, GAARDS, OGRO, GDTE, UGP, HOC-SA, PURSE, GridShib, Introduce, Dyn Acct, WEEP, Gavia JSC, Gavia MS, DDM, Virt WkSp, SGGC, ServMark, MEDICUS, Metrics, Others...

4. Principles of distributed data management

Enabling Grids for E-sciencE

EGEE-III INFSO-RI-222667 Technical Status - Steven Newhouse - EGEE-III First Review 24-25 June 2009

EGEE Project Overview

• 17000 users
• 136000 LCPUs (cores)
• 25 PB disk
• 39 PB tape
• 12 million jobs/month (+45% in a year)
• 268 sites (+5% in a year)
• 48 countries (+10% in a year)
• 162 VOs (+29% in a year)

Middleware Supporting HTC

• Archeology
• Astronomy
• Astrophysics
• Civil Protection
• Comp. Chemistry
• Earth Sciences
• Finance
• Fusion
• Geophysics
• High Energy Physics
• Life Sciences
• Multimedia
• Material Sciences

Supported End-user Activity

• 13,000 end-users in 112 VOs
  - +44% users in a year
• 23 core VOs
  - A core VO has >10% of usage within its science cluster

History of gLite

• Development started in 2004
• Entered production in May 2006
• Middleware distribution of EGEE

gLite Middleware

[Figure: gLite middleware components; the legend distinguishes EGEE maintained components from external components]

• User Access: User Interface
• Security Services: Virtual Organisation Membership Service, Authz. Service, SCAS, Proxy Server, LCAS & LCMAPS
• Information Services: BDII, MON
• General Services: LHC File Catalogue, Hydra, Workload Management Service, File Transfer Service, Logging & Book keeping Service, AMGA
• Compute Element: CREAM, LCG-CE, gLExec, BLAH, Worker Node
• Storage Element: Disk Pool Manager, dCache
• Physical Resources
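A core idea behind distributed data management is that users refer to a file by a logical name while a catalogue keeps track of where the physical copies live. The toy class below sketches that mapping in plain Python. It is not the gLite or LFC API; the class, its methods, the host names and the substring-based site matching are all assumptions made for illustration.

```python
class ReplicaCatalog:
    """Toy catalogue mapping a logical file name (LFN) to its replica URLs."""

    def __init__(self):
        self._replicas = {}

    def register(self, lfn, replica_url):
        """Record one more physical copy of the logical file."""
        self._replicas.setdefault(lfn, []).append(replica_url)

    def replicas(self, lfn):
        """Return all known copies of the logical file (possibly empty)."""
        return list(self._replicas.get(lfn, []))

    def closest(self, lfn, preferred_site):
        """Pick a replica at the preferred site, else fall back to the first one."""
        candidates = self.replicas(lfn)
        if not candidates:
            raise KeyError(f"no replica registered for {lfn}")
        for url in candidates:
            if preferred_site in url:
                return url
        return candidates[0]


if __name__ == "__main__":
    catalog = ReplicaCatalog()
    # Hypothetical storage elements at two sites.
    catalog.register("lfn:/grid/demo/run1.dat", "srm://se.site-a.example/data/run1.dat")
    catalog.register("lfn:/grid/demo/run1.dat", "srm://se.site-b.example/data/run1.dat")
    print(catalog.closest("lfn:/grid/demo/run1.dat", preferred_site="site-b"))
```

Keeping the logical name stable lets data be copied or moved between storage elements without changing the job descriptions that refer to it.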

The Computing “Eco-system”

[Figure: pyramid of computing tiers, TIER 1 to TIER 4, ranging from Capability Computing to Capacity Computing]

Tier labels shown:
• Large-scale HPC centers
• National/regional centers, Grid-collaboration
• Local centers
• Personal/office computing

• Scientific need for all tiers!

5. Principles of using distributed and high performance systems

ARC middleware (Advanced Resource Connector)

• open source out-of-the-box Grid solution software which enables production quality computational and data Grids (released in May 2002)
• development is coordinated by NDGF
• emphasis is put on scalability, stability, reliability and performance
• builds upon standard OS solutions, OpenLDAP, OpenSSL, SASL and Globus Toolkit
  - adds services not provided by Globus
  - extends or completely replaces some Globus components

ARC Tutorial & Grid Technologies Intro Slide 30 / 53

NorduGrid collaboration*

national Grids (e.g. M-grid, SweGrid, NorGrid), users also outside the Nordic countries

real users, real applications

implemented a production Grid system working non stop since May 2002

open for anyone to participate

* http://www.nordugrid.org/monitor

a community around open source Grid middleware: ARC

M-grid – the Finnish Material Sciences Grid

joint project between seven Finnish universities, Helsinki Institute of Physics and CSC

partners are laboratories and departments and not university IT centers

not limited by the field of research, used for a wide range of physical, chemical and nanoscience applications

jointly funded by the Academy of Finland and the participating universities

first large initiative to put Grid middleware into production use in Finland

goal: throughput computing capacity mainly for the needs of physics and chemistry researchers

– opened to all CSC customers in Nov 2005

Grids at CSC (HPC and Grids in Practice)

HP CP4000BL ProLiant Cluster (gLite on HP cluster, ARC on HP cluster)
• 2176 processor cores
• 5 TB memory
• 11 TF peak performance
• Infiniband interconnect

Cray XT4/XT5 (UNICORE on Cray MPP)
• 10960 computing cores
• 11.092 TB
• computing peak power 100.8 TF
• Final configuration Q3/2008

6. Higher level APIs: OGSA-DAI, SAGA and metadata management (S-OGSA)

• OGSA-DAI Vision
  - enable the sharing of data resources to enable collaboration, supporting:
    - Data access: access to structured data in distributed heterogeneous data resources
    - Data transformation: e.g. expose data in schema X to users as data in schema Y
    - Data integration: e.g. expose multiple databases to users as a single virtual database
    - Data delivery: delivering data to where it's needed by the most appropriate means, e.g. web service, e-mail, HTTP, FTP, GridFTP
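Data transformation and data integration can be illustrated on a very small scale with two SQLite databases. This is only an analogy, not OGSA-DAI itself: the sketch assumes two hypothetical sites whose tables use different column names and temperature units, and one query that presents them to the user as a single result in a common schema.

```python
import sqlite3


def build_federation():
    """Open two in-memory databases with different schemas on one connection."""
    conn = sqlite3.connect(":memory:")
    conn.execute("ATTACH DATABASE ':memory:' AS site_b")
    # Site A stores temperatures in degrees Celsius.
    conn.execute("CREATE TABLE main.samples (id INTEGER, temp_c REAL)")
    conn.executemany("INSERT INTO main.samples VALUES (?, ?)",
                     [(1, 20.0), (2, 25.0)])
    # Site B uses other column names and stores kelvin.
    conn.execute("CREATE TABLE site_b.readings (sample_id INTEGER, temp_k REAL)")
    conn.executemany("INSERT INTO site_b.readings VALUES (?, ?)",
                     [(3, 300.15)])
    return conn


def query_all(conn):
    """Present both sources as one result set in the schema (id, temp_c)."""
    sql = """
        SELECT id, temp_c FROM main.samples
        UNION ALL
        SELECT sample_id AS id, temp_k - 273.15 AS temp_c FROM site_b.readings
        ORDER BY id
    """
    return [(row_id, round(temp, 2)) for row_id, temp in conn.execute(sql)]


if __name__ == "__main__":
    for row in query_all(build_federation()):
        print(row)
```

The renaming and unit conversion in the second SELECT play the role of data transformation, and the UNION ALL plays the role of data integration; a service such as OGSA-DAI aims to offer this kind of view across databases that live at different institutions.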

6. Higher level APIs: OGSA-DAI, SAGA and metadata management (S-OGSA)

• OGSA-DAI History
  - The OGSA-DAI project started in February 2002 as part of the UK e-Science Grid Core Program
  - Is today part of OMII-UK, a partnership between:
    - OMII, The University of Southampton
    - myGrid, The University of Manchester
    - OGSA-DAI, The University of Edinburgh

6. Higher level APIs: OGSA-DAI, SAGA and metadata management (S-OGSA)

• Vision of a Simple API for Grid Applications - SAGA
  - Provide a simple programmatic interface that is widely adopted, usable and available for enabling applications for the grid
  - Simplicity:
    - easy to use, install, administer and maintain
  - Uniformity:
    - provides support for different application programming languages as well as consistent semantics and style for different Grid functionality
  - Scalability:
    - contains mechanisms for the same application (source) code to run on a variety of systems ranging from laptops to HPC resources
  - Genericity:
    - adds support for different grid middleware, even concurrent ones
  - Modularity:
    - provides a framework that is easily extendable
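The genericity and modularity goals can be pictured as an adaptor pattern: application code talks to one small interface, and the URL scheme decides which backend does the work. The sketch below is not the SAGA API. The class names, the `local` and `dryrun` schemes and the adaptor registry are all invented for this illustration.

```python
import subprocess
import sys
from urllib.parse import urlparse


class LocalAdaptor:
    """Runs the job as a local process and returns its standard output."""

    def run(self, executable, arguments):
        done = subprocess.run([executable, *arguments],
                              capture_output=True, text=True, check=True)
        return done.stdout


class DryRunAdaptor:
    """Stand-in for a remote middleware: only reports what would be submitted."""

    def run(self, executable, arguments):
        return "would submit: " + " ".join([executable, *arguments])


# Hypothetical registry; supporting a new middleware means adding an entry here.
ADAPTORS = {"local": LocalAdaptor, "dryrun": DryRunAdaptor}


class JobService:
    """Single entry point; the URL scheme selects the adaptor."""

    def __init__(self, url):
        scheme = urlparse(url).scheme
        if scheme not in ADAPTORS:
            raise ValueError(f"no adaptor for scheme {scheme!r}")
        self._adaptor = ADAPTORS[scheme]()

    def run(self, executable, arguments=()):
        return self._adaptor.run(executable, list(arguments))


if __name__ == "__main__":
    # The application code is identical for both backends; only the URL differs.
    for url in ("local://localhost", "dryrun://grid.example"):
        print(JobService(url).run(sys.executable, ["-c", "print('hello')"]).strip())
```

The application never imports a middleware-specific module, which is what allows the same source code to move from a laptop to an HPC resource.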

6. Higher level APIs: OGSA-DAI, SAGA and metadata management (S-OGSA)

• Metadata management: Make metadata Princess in the kingdom of Semantic Web

7. Workflows

• Organize your work, e.g.:
  - Gather initial data
  - Pre-processing of data
  - Define computing job(s)
  - Initiate job(s)
  - Gather results
  - Post-processing of results
  - …
  - Repeat

During the school you will understand how you can do this in different ways with the systems studied. But this can also be done with specific workflow systems: Taverna, P-GRADE Portal, …
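The steps listed above form a small dependency graph: each step needs the output of the one before it. A minimal sketch of how a workflow system might represent and run such a graph, using Python's standard `graphlib` module (Python 3.9 or later). The step functions and their toy data are placeholders; a real workflow would submit grid jobs in the compute step.

```python
from graphlib import TopologicalSorter

# Placeholder step implementations; each receives the outputs of its predecessors.
STEPS = {
    "gather": lambda: [3, 1, 2],                      # gather initial data
    "preprocess": lambda data: sorted(data),          # pre-processing of data
    "compute": lambda data: [x * x for x in data],    # stand-in for the jobs
    "postprocess": lambda results: sum(results),      # post-processing of results
}

# For each step, the steps that must finish before it can start.
DEPENDENCIES = {
    "preprocess": ("gather",),
    "compute": ("preprocess",),
    "postprocess": ("compute",),
}


def run_workflow(steps, dependencies):
    """Run every step after its predecessors and return (order, results)."""
    order = list(TopologicalSorter(dependencies).static_order())
    results = {}
    for name in order:
        inputs = [results[dep] for dep in dependencies.get(name, ())]
        results[name] = steps[name](*inputs)
    return order, results


if __name__ == "__main__":
    order, results = run_workflow(STEPS, DEPENDENCIES)
    print(" -> ".join(order))
    print("final result:", results["postprocess"])
```

Describing the workflow as data rather than as a hand-written script is what lets a workflow system reorder, parallelise or re-run individual steps.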

Motivations for developing P-GRADE portal

• P-GRADE portal should
  – Give an answer for all the questions of an e-scientist
  – Hide the complexity of the underlying grid middleware
  – Provide a high-level graphical user interface that is easy to use for e-scientists
  – Support many different grid programming approaches (see Morris Riedel’s talk):
    • Simple Scripts & Control (sequential and MPI job execution)
    • Scientific Application Plug-ins (based on GEMLCA)
    • Complex Workflows
    • Parameter sweep applications: both on job and workflow level
    • Interoperability: transparent access to grids based on different middleware technology
  – Support three levels of parallelism

Short History of P-GRADE portal

• Parallel Grid Application and Development Environment

• Initial development started in the Hungarian SuperComputing Grid project in 2003

• It has been continuously developed since 2003

• Detailed information: http://portal.p-grade.hu/

• Open Source community development since January 2008: https://sourceforge.net/projects/pgportal/

Integrating Practical
