- 1 - DEISA

Achim Streit, Forschungszentrum Jülich in der Helmholtz-Gesellschaft

GRID@Large - Lisbon, August 29-30 2005

www.deisa.org



- 2 - Agenda

Introduction

SA3: Resource Management

DEISA Extreme Computing Initiative

Conclusion

- 3 - The DEISA Consortium

DEISA is a consortium of leading national supercomputer centers in Europe

IDRIS – CNRS, France

FZJ, Jülich, Germany

RZG, Garching, Germany

CINECA, Bologna, Italy

EPCC, Edinburgh, UK

CSC, Helsinki, Finland

SARA, Amsterdam, The Netherlands

HLRS, Stuttgart, Germany

BSC, Barcelona, Spain

LRZ, Munich, Germany

ECMWF (European Organization), Reading, UK

Funded by: European Union FP6

Grant period: May 1, 2004 – April 30, 2008

- 4 - DEISA objectives

To enable Europe’s terascale science by the integration of Europe’s most powerful supercomputing systems.

Enabling scientific discovery across a broad spectrum of science and technology is the only criterion for success.

DEISA is a European Supercomputing Service built on top of existing national services.

DEISA deploys and operates a persistent, production quality, distributed, heterogeneous supercomputing environment with continental scope.

- 5 - Basic requirements and strategies for the DEISA research infrastructure

Fast deployment of a persistent, production quality, Grid-empowered supercomputing infrastructure with continental scope.

A European supercomputing service built on top of existing national services requires reliability and non-disruptive behavior.

User and application transparency

Top-down approach: technology choices result from the business and operational models of our virtual organization. DEISA technology choices are fully open.

- 6 - The DEISA supercomputing Grid: A layered infrastructure

Inner layer: a distributed super-cluster resulting from the deep integration of similar IBM AIX platforms at IDRIS, FZ-Jülich, RZG-Garching and CINECA (phase 1), then CSC (phase 2). To external users it looks like a single supercomputing platform.

Outer layer: a heterogeneous supercomputing Grid:

  • IBM AIX super-cluster (IDRIS, FZJ, RZG, CINECA, CSC), close to 24 Tf
  • BSC, IBM PowerPC Linux system, 40 Tf
  • LRZ, Linux cluster (2.7 Tf) moving to SGI ALTIX system (33 Tf in 2006, 70 Tf in 2007)
  • SARA, SGI ALTIX Linux cluster, 2.2 Tf
  • ECMWF, IBM AIX system, 32 Tf
  • HLRS, NEC SX8 vector system, close to 10 Tf
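As a quick arithmetic check, the peak figures quoted for the outer layer can be summed to estimate the aggregate capacity of the Grid in 2005. The sketch below only adds the numbers from the slide. The "close to" values are treated as exact and the planned LRZ upgrade figures for 2006 and 2007 are left out, so the result is a rough estimate rather than an official DEISA figure.

```python
# Peak performance in teraflops (Tf), as quoted on the slide for 2005.
# "Close to" values are treated as exact for this rough estimate.
peak_tf = {
    "IBM AIX super-cluster (IDRIS, FZJ, RZG, CINECA, CSC)": 24.0,
    "BSC": 40.0,
    "LRZ": 2.7,    # Linux cluster, before the planned SGI ALTIX upgrade
    "SARA": 2.2,
    "ECMWF": 32.0,
    "HLRS": 10.0,
}

# Round to one decimal to hide floating-point noise in the sum.
total_tf = round(sum(peak_tf.values()), 1)
print(f"Aggregate peak: about {total_tf} Tf")
```

Under these assumptions the total comes to about 110.9 Tf.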

- 7 - Logical view of the phase 2 DEISA network

[Figure: network diagram; labels: DFN, RENATER, GARR, GÉANT, SURFnet, UKERNA, RedIRIS, FUnet]

- 8 - AIX Super-Cluster May 2005

[Figure labels: CSC, ECMWF]

Services:

  • High performance datagrid via GPFS: access to remote files uses the full available network bandwidth
  • Job migration across sites: used to load balance the global workflow when a huge partition is allocated to a DEISA project at one site
  • Common Production Environment
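The job migration service can be pictured with a small sketch. The code below is a hypothetical illustration, not DEISA's actual scheduler logic: the `Site` class, the node counts, and the `migrate_for_partition` helper are all assumptions. It shows the idea from the slide: when a large partition is reserved at one site, queued jobs that no longer fit are moved to the site with the most free nodes.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    """Hypothetical model of one site: capacity, reservation, queued jobs."""
    name: str
    total_nodes: int
    reserved_nodes: int = 0
    queue: list = field(default_factory=list)  # (job_name, nodes) tuples

    @property
    def free_nodes(self) -> int:
        used = sum(nodes for _, nodes in self.queue)
        return self.total_nodes - self.reserved_nodes - used


def migrate_for_partition(home: Site, others: list, partition_nodes: int) -> list:
    """Reserve a partition at `home`, then move jobs away until the rest fit."""
    home.reserved_nodes += partition_nodes
    moved = []
    # Try the largest jobs first; stop once the home site is no longer oversubscribed.
    for job in sorted(home.queue, key=lambda j: j[1], reverse=True):
        if home.free_nodes >= 0:
            break
        target = max(others, key=lambda s: s.free_nodes)
        if target.free_nodes < job[1]:
            continue  # no remote site can host this job; leave it queued
        home.queue.remove(job)
        target.queue.append(job)
        moved.append((job[0], target.name))
    return moved


# Made-up example: a 24-node partition is reserved at FZJ for a DEISA project.
fzj = Site("FZJ", 40, queue=[("a", 16), ("b", 8), ("c", 8)])
idris = Site("IDRIS", 32, queue=[("d", 8)])
rzg = Site("RZG", 24)
moved = migrate_for_partition(fzj, [idris, rzg], partition_nodes=24)
print(moved)
```

Moving the largest job first is only one possible policy; a production scheduler would also weigh priorities, data locality and accounting.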

- 9 - Service Activities

SA1 – Network Operation and Support (FZJ): Deployment and operation of a gigabit per second network infrastructure for a European distributed supercomputing platform. Network operation and optimization during project activity.

SA2 – Data Management with Global File Systems (RZG): Deployment and operation of global distributed file systems, as basic building blocks of the “inner” super-cluster, and as a way of implementing global data management in a heterogeneous Grid.

SA3 – Resource Management (CINECA): Deployment and operation of global scheduling services for the European super-cluster, as well as for its heterogeneous Grid extension.

SA4 – Applications and User Support (IDRIS): Enabling the adoption by the scientific community of the distributed supercomputing infrastructure, as an efficient instrument for the production of leading computational science.

SA5 – Security (SARA): Providing administration, authorization and authentication for a heterogeneous cluster of HPC systems, with special emphasis on single sign-on.

- 10 - SA3: A Three Layer Architecture

Basic services: located closest to the operating system of the computing platforms, they enable the operation of a single cluster or multiple clusters through local or extended batch schedulers and other cluster-like features.

Intermediate services: first-level Grid services that allow access to an enlarged Grid-empowered infrastructure, dealing with resource and network monitoring and information systems.

Advanced services: use the previous layers to implement the global management of the distributed resources of the infrastructure.

- 11 - Logical Layout

Hardware

OS and communication

Resource manager

Policies implementation through the scheduler (workload, advance reservation, accounting)

Services:

  • access
  • workflow management
  • co-allocation
  • brokering
  • job rerouting
  • multiple accounting
  • data staging

- 12 - UNICORE Infrastructure

[Figure labels: Gateway 4.1.0, NJS 4.2.0, TSI 4.1.0; J2SE 1.4.2]

- 13 - Physical Layout: Resource Management

Site    Hardware  OS           Resource manager  Scheduler
IDRIS   Power 4   AIX 5.2      LL RM             LL backfill
FZJ     Power 4   AIX 5.2      LL RM             LL backfill
RZG     Power 4   AIX 5.2      LL RM             LL backfill
CINECA  Power 4   AIX 5.2      LL RM             LL backfill
CSC     Power 4   AIX 5.2      LL RM             LL backfill
ECMWF   Power 4   AIX 5.2      LL RM             LL backfill
SARA    IA64      RHEL+SGI PP  LSF RM            LSF HPC
LRZ     IA64      RHEL+SGI PP  SGE RM            SGE
BSC     PPC       SUSE         LL RM             LL backfill
HLRS    SX        NECOS        NEC NQE RM        NEC NQE

UNICORE

- 14 - Physical Layout: Data Management

Site    Hardware  OS
IDRIS   Power 4   AIX 5.2
FZJ     Power 4   AIX 5.2
RZG     Power 4   AIX 5.2
CINECA  Power 4   AIX 5.2
CSC     Power 4   AIX 5.2
ECMWF   Power 4   AIX 5.2
SARA    IA64      RHEL+SGI PP
LRZ     IA64      RHEL+SGI PP
BSC     PPC       SUSE
HLRS    SX        NECOS

IBM GPFS (General Parallel File System) over WAN

GPFS clients shown in the figure: seven native clients and three ad hoc clients (one of them marked “??”).

- 15 - DEISA Supercomputing Grid services

Workflow management: based on UNICORE plus further extensions and services coming from DEISA’s JRA7 and other projects (UniGrids, …)

Global data management: a well-defined architecture implementing extended global file systems on heterogeneous systems, fast data transfers across sites, and hierarchical data management at a continental scale.

Co-scheduling: needed to support Grid applications running on the heterogeneous environment.

Science Gateways and portals: specific Internet interfaces to hide complex supercomputing environments from end users, and facilitate the access of new, non-traditional scientific communities.

- 16 - Workflow Application with UNICORE, Global Data Management with GPFS

[Figure labels: Client; Job; Data; five sites, each with CPU and GPFS; + NRENs]

Job workflow: 1) FZJ 2) CINECA 3) RZG 4) IDRIS 5) SARA
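The idea behind this slide, a job that visits several sites in turn while its data stays in one global file system, can be sketched as follows. This is a hypothetical illustration and does not use the UNICORE client API: the `run_step` and `run_workflow` functions are assumptions, and a local temporary directory stands in for the shared GPFS namespace.

```python
import tempfile
from pathlib import Path

# Site order taken from the slide's job workflow.
WORKFLOW_SITES = ["FZJ", "CINECA", "RZG", "IDRIS", "SARA"]


def run_step(site: str, shared_dir: Path) -> None:
    """Hypothetical job step: append a record to a file in the shared namespace.

    With a global file system every site sees the same path, so no
    explicit data staging between steps is needed.
    """
    log = shared_dir / "workflow.log"
    with log.open("a") as fh:
        fh.write(f"step executed at {site}\n")


def run_workflow(sites: list, shared_dir: Path) -> list:
    """Run one step per site, in order, against the same shared directory."""
    for site in sites:
        run_step(site, shared_dir)
    return (shared_dir / "workflow.log").read_text().splitlines()


with tempfile.TemporaryDirectory() as tmp:
    # The temporary directory stands in for the GPFS mount (an assumption).
    history = run_workflow(WORKFLOW_SITES, Path(tmp))
    print(history)
```

Each step reads and writes the same path, which is the point of combining a workflow engine with a global file system: the job moves between sites, the data does not need to be copied by the user.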

- 17 - Resource Management Information System (RMIS)

Deliver up-to-date and complete resource management information about the grid.

Provide relevant information to system administrators from remote sites and to end-users.

Our approach:

  • performed an implementation-independent system analysis
  • attempted to model the DEISA distributed supercomputer platform designed to operate the grid
  • identified the resource management part as a sub-system that needs to interface with other sub-systems to get relevant information
  • other sub-systems use external tools (monitoring tools, databases and batch systems) with which we need to interface

- 18 - Implementation

[Figure labels: Cluster; Server; Batch Scheduler; Ganglia gmond (three instances); MDS2: static and almost static data; Firewall; RMIS web front-end; Web Server; Configuration files; Ganglia gmetad; MDS2 back-end]

Based on the Ganglia monitoring tool coupled with MDS2/Globus.

The published data are divided into two groups:

  • static data (MDS2): refresh time ~ hours or days
  • dynamic data (Ganglia): refresh time ~ seconds or minutes

A web server based on the Ganglia web front end allows the display of any relevant data from MDS2 or Ganglia.
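The split between static and dynamic data amounts to caching each source with its own refresh interval. The sketch below is a minimal illustration under stated assumptions: it does not call Ganglia or MDS2, the `CachedSource` class and both fetch functions are hypothetical, the returned values are made up, and the intervals (one hour, thirty seconds) are example values within the ranges given on the slide.

```python
import time
from typing import Callable, Optional


class CachedSource:
    """Cache a data source and re-fetch only after `refresh_s` seconds."""

    def __init__(self, fetch: Callable[[], dict], refresh_s: float):
        self.fetch = fetch
        self.refresh_s = refresh_s
        self._value: Optional[dict] = None
        self._stamp = float("-inf")  # forces a fetch on first access
        self.fetch_count = 0

    def get(self, now: Optional[float] = None) -> dict:
        now = time.monotonic() if now is None else now
        if now - self._stamp >= self.refresh_s:
            self._value = self.fetch()
            self._stamp = now
            self.fetch_count += 1
        return self._value


# Hypothetical stand-ins for an MDS2 query and a Ganglia query (made-up values).
def fetch_static() -> dict:
    return {"cpu_count": 512, "os": "AIX 5.2"}


def fetch_dynamic() -> dict:
    return {"load_one": 0.75, "queued_jobs": 12}


static = CachedSource(fetch_static, refresh_s=3600)  # hours-scale data
dynamic = CachedSource(fetch_dynamic, refresh_s=30)  # seconds-scale data


def front_end_view(now: float) -> dict:
    """Merge both groups, as a web front end might before display."""
    return {**static.get(now), **dynamic.get(now)}


# Simulated page loads at t = 0, 10, 40 and 100 seconds.
for t in (0, 10, 40, 100):
    view = front_end_view(t)

print(static.fetch_count, dynamic.fetch_count)
```

Over the four simulated page loads the static source is queried once and the dynamic source three times, which is the load reduction the two-group design aims for.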

- 19 - Portals (Science Gateways)

Same concept as TeraGrid’s Science Gateways

Needed to enhance the outreach of supercomputing infrastructures

Hiding complex supercomputing environments from end users, providing discipline-specific tools and support, and moving in some cases towards community allocations.

DEISA has already done work on Genomics and Material Sciences portals.

Intense brainstorming on the design of a global strategy, if possible interoperable with TeraGrid’s Science Gateways.

- 20 - Enabling science

Initial, “early users” program: a number of Joint Research Activities integrated in the project from the start.

Moving towards “exceptional users”: the DEISA Extreme Computing Initiative

Activity  Scientific program                                           Partners      Leader
JRA1      Enabling Material Science, CPMD code, portals                RZG           Hermann Lederer, RZG
JRA2      Computational environment for applications in Cosmology      EPCC          Gavin Pringle, EPCC
JRA3      Enabling the TORB Plasma Physics code                        RZG           Hermann Lederer, RZG
JRA4      Life science: genomic and eHealth applications               IDRIS, (BSC)  Victor Alessandrini, IDRIS BSC
JRA5      CFD in automobile industry                                   CINECA, CRI   Roberto Tregnago, CRI
JRA6      Coupled applications: Astrophysics, Combustion, Environment  IDRIS (HLRS)  Gilles Grasseau, IDRIS

- 21 - The Extreme Computing Initiative

Identification, deployment and operation of a number of “flagship” applications in selected areas of science and technology

Applications must rely on the DEISA Supercomputing Grid services (application profiles have been clearly defined). They will benefit from exceptional resources from the DEISA pool.

Applications are selected on the basis of scientific excellence, innovation potential, and relevance criteria.

European call for proposals: April 1 – May 30, 2005

- 22 - Evaluation and allocation of DEISA resources

National evaluation committees evaluate the proposals and determine priorities.

On the basis of this information, the DEISA consortium examines how the applications map to the resources available in the DEISA pool, and negotiates internally the way the resources will be allocated and the final priorities for projects.

Exceptional DEISA resources will be allocated – as in large scientific instruments – at well defined time windows (to be negotiated with the users).

- 23 - DEISA Extreme Computing Initiative (DECI)

Call for Expressions of Interest / Proposals in April and May 2005

  • 50 proposals submitted
  • Requested CPU time: 32 million CPU-hr

European countries involved: Finland, France, Germany, Greece, Hungary, Italy, Netherlands, Russia, Spain, Sweden, Switzerland, UK

Proposals by area:

  • Materials Science, Quantum Chemistry, Quantum Computing: 16
  • Astrophysics (Cosmology, Stars, Solar Sys.): 13
  • Life Sciences, Biophysics, Bioinformatics: 8
  • CFD, Fluid Mechanics, Combustion: 5
  • Earth Sciences, Climate Research: 4
  • Plasma Physics: 2
  • QCD, Particle Physics, Nuclear Physics: 2
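The per-area counts can be checked against the stated total of 50 proposals and turned into shares. The numbers below are copied from the slide; the average CPU time per proposal is a derived figure, not one the slide states, and it assumes the 32 million CPU-hr is the sum over all proposals.

```python
# Proposal counts per scientific area, as listed on the slide.
proposals = {
    "Materials Science, Quantum Chemistry, Quantum Computing": 16,
    "Astrophysics (Cosmology, Stars, Solar Sys.)": 13,
    "Life Sciences, Biophysics, Bioinformatics": 8,
    "CFD, Fluid Mechanics, Combustion": 5,
    "Earth Sciences, Climate Research": 4,
    "Plasma Physics": 2,
    "QCD, Particle Physics, Nuclear Physics": 2,
}

total = sum(proposals.values())

# Share of each area in the total number of proposals.
for area, count in proposals.items():
    print(f"{area}: {count} ({100 * count / total:.0f}%)")

# Derived figure: average request per proposal, assuming the
# 32 million CPU-hr is the sum over all submitted proposals.
requested_cpu_hours = 32_000_000
avg_cpu_hours = requested_cpu_hours / total
print(f"Total proposals: {total}, average request: {avg_cpu_hours:,.0f} CPU-hr")
```

The counts add up to the 50 proposals quoted above, and the average request works out to 640,000 CPU-hr per proposal.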

- 24 - Conclusions

DEISA adopts Grid technologies to integrate national supercomputing infrastructures, and to provide a European Supercomputing Service.

Service activities are supported by the coordinated action of the national centers' staff. DEISA operates as a virtual European supercomputing centre.

The big challenge we are facing is enabling new, first class computational science.

Integrating leading supercomputing platforms with Grid technologies creates a new research dimension in Europe.

- 25 -

October 11–12, 2005

ETSI Headquarters, Sophia Antipolis, France

http://summit.unicore.org/2005

In conjunction with Grids@work: Middleware, Components, Users, Contest and Plugtests

http://www.etsi.org/plugtests/GRID.htm

Supported by