31 January 2003 GridPP Collaboration Meeting 1 CLRC e-Science Centre David Boyd Deputy Director, CLRC e-Science Centre [email protected] http://www.e-science.clrc.ac.uk/


31 January 2003 GridPP Collaboration Meeting 1

CLRC e-Science Centre

David Boyd
Deputy Director, CLRC e-Science Centre

[email protected]
http://www.e-science.clrc.ac.uk/

CLRC e-Science Centre

• Organisational structure

• Centre’s core programme

• National e-Science role

• Development programme

• Future challenges

Organisational structure

• Centre Director – Paul Durham (secretary Shirley Miller)

• Deputy Director – David Boyd (secretary Virginia Jones)

• 5 operating groups
  – Computing and Data Services – John Gordon
  – Grid Support – Alistair Mills
  – Grid Technology – Rob Allan
  – Data Management – Kerstin Kleese van Dam
  – Grid Visualisation – Lakshmi Sastry

• Now 37 technical staff in post in these groups

Providing facilities . . .

• Providing Grid-accessible scientific computing facilities and expertise for CLRC’s user community (both in-house and external)
  – BaBar Tier A and UKHEP production centres and LHC Tier 1 prototype centre
    • 500 CPUs + 45TB disk (and growing)
  – Columbus
    • 24 CPU Alphaserver SC/Quadrics for the Computational Chemistry Working Party
  – Mott
    • 20 CPU Alphaserver SC/Quadrics for the Minerals and Ceramics Consortium
  – Beowulf/PC/Linux clusters
    • 32 AMD CPUs/Wulfkit, 16 AMD CPUs/Myrinet
    • 2x 32 AMD CPUs/ethernet (for ISIS)
    • 64 Alpha CPUs/QsNET/ethernet
  – Atlas Datastore
    • 100TB scientific data archive being upgraded to 1PB capacity

• Ensuring other major facilities are Grid-enabled – including HPCx

. . . and infrastructure

• Providing a Grid-based infrastructure for major scientific facilities in CLRC and elsewhere
  – Particle Physics (CERN, SLAC, Fermilab)
    • data analysis and management
  – ISIS
    • data visualisation, analysis and management, remote instrument control
  – Synchrotron Radiation Source
    • data analysis, data management, remote instrument control
  – British Atmospheric Data Centre
    • data management
  – Space Science and Astronomy (ESA, NASA)
    • data management

• Collaboration on new CLRC facility developments
  – Diamond
    • data management and analysis, remote instrument control
  – 4GLS
    • data management and analysis

National e-Science support

• Grid Support Centre (Alistair Mills)
  – Certification Authority, directory and information services, software distribution (Globus, Condor), helpdesk, user support, . . .

• BBSRC Grid Support Service (Pete Oliver)
  – Supporting BBSRC institutes, IGF centres and researchers outside e-Science
  – Helping to develop Grid demonstrators for DNA homology searching, etc
  – Working with PPARC project supporting biomolecular simulation

• Network monitoring (Robin Tasker)
  – GNT-sponsored monitoring and information services

• Engineering Task Force (David Boyd)
  – Set up to build the UK e-Science Grid
  – Level 1 Grid implemented and tested in June 2002
  – Level 2 Grid project team formed in October 2002
  – Target is an operational Grid infrastructure by April 2003
  – Continue to strengthen and expand and eventually migrate to OGSA/GT3

Grid Support Centre Services

• Helpdesk - [email protected]
  – provides access to expert technical support

• Web information resource - http://www.grid-support.ac.uk
  – offers Grid awareness and education material

• Grid Starter Kit
  – downloadable Grid software and installation tools

• National Grid Directory Service
  – supports Grid resource discovery and access to current status

• Certification Authority (CA) – http://www.grid-support.ac.uk/ca
  – assigns a trustable digital identity to an individual
  – you need one to use the Grid!

UK e-Science CA – Jens Jensen

• We issue X.509 certificates to people and servers in the UK e-Science community

• Web based software with extensive on-line documentation

• CP/CPS published on the web

• Registration Authorities (RAs) carry out local identity checks

• We currently have ~30 active RAs

• More RAs being appointed at about 3 per month

• We regularly run training courses for new RAs

• Over 250 certificates issued (personal and server)

• Approved by DataGrid and CrossGrid CAs and US DOE

• Collaborating with GGF CA group

• Grid Support Centre provides documentation and resolves problems

• See http://www.grid-support.ac.uk/ca/
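As an illustration of the digital identity an X.509 certificate carries, the sketch below splits a subject distinguished name (DN), written in the slash-separated form that OpenSSL-based tools commonly print, into its attributes. The example DN and the helper names `parse_subject_dn` and `common_name` are assumptions made up for this sketch; they are not taken from the UK e-Science CA.

```python
def parse_subject_dn(dn: str) -> list[tuple[str, str]]:
    """Split a slash-separated X.509 subject DN into (attribute, value) pairs.

    Simplification: assumes attribute values contain no '/' characters.
    """
    if not dn.startswith("/"):
        raise ValueError("expected a DN of the form /C=.../O=.../CN=...")
    pairs = []
    for component in dn[1:].split("/"):
        attribute, sep, value = component.partition("=")
        if not sep or not attribute or not value:
            raise ValueError(f"malformed DN component: {component!r}")
        pairs.append((attribute, value))
    return pairs


def common_name(dn: str) -> str:
    """Return the CN attribute, which names the person or server."""
    for attribute, value in parse_subject_dn(dn):
        if attribute == "CN":
            return value
    raise ValueError("DN has no CN attribute")


# Hypothetical subject DN, invented for this example.
EXAMPLE_DN = "/C=UK/O=ExampleGrid/OU=ExampleSite/CN=jane bloggs"
```

Calling `common_name(EXAMPLE_DN)` returns the name of the certificate holder, which is the part of the identity that a Registration Authority's local identity check would vouch for.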

Grid Information Services – Rob Allan

Information Portal created for deployment on the UK e-Science Grid

Uses the MDS system via services on ginfo to create HTML-based and on-line Web services for resource discovery and monitoring

• Resource-oriented view of compute and data resources: http://esc.dl.ac.uk/InfoPortal

• Site-oriented view via an active map: http://esc.dl.ac.uk/InfoPortal/Map

• Virtual Organisation view using UDDI with links to contacts, resources and trading models - available as a CLRC Web service http://esc5.dl.ac.uk:8080/uddi/inquiry

Grid Compute Resources

InfoPortal augments the Globus MDS with a static XML-based information system showing resource architecture and installed software details:

• showing resource-centric view and menu bar
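A static XML description of resource architecture and installed software could be read along the following lines. This is only a sketch: the element and attribute names, the resource names and the Condor version number are invented for illustration, since the slides do not show the InfoPortal's actual XML layout.

```python
import xml.etree.ElementTree as ET

# Hypothetical resource description; element and attribute names are invented.
RESOURCE_XML = """
<resources>
  <resource name="cluster-a" site="Site A">
    <architecture cpus="32" interconnect="Myrinet"/>
    <software name="Globus" version="2.2.3"/>
    <software name="Condor" version="6.4"/>
  </resource>
  <resource name="cluster-b" site="Site B">
    <architecture cpus="64" interconnect="ethernet"/>
    <software name="Globus" version="2.2.3"/>
  </resource>
</resources>
"""


def load_resources(xml_text):
    """Parse the static description into a dict keyed by resource name."""
    root = ET.fromstring(xml_text)
    resources = {}
    for res in root.findall("resource"):
        arch = res.find("architecture")
        resources[res.get("name")] = {
            "site": res.get("site"),
            "cpus": int(arch.get("cpus")),
            "interconnect": arch.get("interconnect"),
            "software": {s.get("name"): s.get("version")
                         for s in res.findall("software")},
        }
    return resources


def resources_with(resources, package):
    """Return the sorted names of resources that list the given package."""
    return sorted(name for name, info in resources.items()
                  if package in info["software"])
```

A resource-centric view then amounts to iterating over `load_resources(RESOURCE_XML)`, while `resources_with(...)` answers the kind of "where is this software installed?" question that dynamic MDS data alone may not cover.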

Active Map of Sites

Site-centric view of e-Science Grid, shows:

• List of Grid resources from MDS

• Globus Integration Test results

• Network monitoring data

UDDI

Two prototype UDDI registries have been developed:

• CLRC e-Science project registry listing services for HPCPortal, DataPortal and Visualisation Tools

• UK e-Science project registry
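To show what a query to such a registry might look like, the sketch below builds a `find_business` inquiry message wrapped in a SOAP 1.1 envelope. It assumes the registries speak the UDDI version 2 inquiry API, which the slides do not state, and it only constructs the message; nothing is sent, and the search pattern is a made-up example.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope
UDDI_NS = "urn:uddi-org:api_v2"                        # assumed: UDDI v2 API


def build_find_business(name_pattern: str, max_rows: int = 10) -> bytes:
    """Build a UDDI v2 find_business inquiry inside a SOAP envelope."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    find = ET.SubElement(
        body,
        f"{{{UDDI_NS}}}find_business",
        {"generic": "2.0", "maxRows": str(max_rows)},
    )
    # The name element carries the (partial) business name to search for.
    ET.SubElement(find, f"{{{UDDI_NS}}}name").text = name_pattern
    return ET.tostring(envelope, encoding="utf-8")


# Hypothetical search pattern, for illustration only.
message = build_find_business("e-Science")
```

The resulting bytes would be POSTed to the registry's inquiry URL by whatever HTTP client the caller prefers.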

Extending the Schema

Proposing an enriched schema for projects, users, applications and resources with OGSA-DAI access. Data will be stored as XML in an Oracle relational DBMS server.

Level 2 Grid project

• Building a usable, operational Grid for science and engineering applications linking resources at all the UK e-Science Centres

• First project attempting to integrate such a heterogeneous set of resources (physically, technically and organisationally) into a working Grid

• Project team led by Rob Allan

• 8 workpackages
  – Middleware deployment (Nick Hill)
  – Grid information services (Rob Allan)
  – User authentication (Alistair Mills)
  – User access management and accounting (Steven Newhouse)
  – Grid security (Jon Hillier)
  – Operational status monitoring (David Baker)
  – Grid platform deployment (Alistair Mills)
  – Grid applications (Simon Cox)

• Involves resource managers at all e-Science Centres setting up necessary middleware plus specialised technical input from several Centres

Level 2 Grid – current status

• Standardised on Globus 2.2.3 – now installed at most e-Science Centres – MDS still proving unreliable

• Monitoring operational state of middleware now routinely carried out using test scripts – results reported to web – availability improving

• Problem with browser restrictions of Open CA being addressed

• Imperial’s VOM software demonstrated and being evaluated for access management – will VOM/VOMS/CAS converge?

• Trusted host database being set up to help securely generate firewall rules – short term solution to satisfy local security concerns

• Applications being identified for initial release – workshop at S’ton on 22 January – issue raised of software licensing for Grid use

• Now using GridSite to control access to project confidential information

• Regular fortnightly progress meetings – usually followed by live demonstrations or debugging sessions
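The idea of generating firewall rules from a trusted host database can be sketched as follows. The host names and addresses are hypothetical (documentation-only address ranges), the iptables-style rule format is an assumption about how a site might apply the result, and only the default Globus Toolkit 2 control ports are covered; data-channel port ranges are left out for brevity.

```python
import ipaddress

# Default control ports of Globus Toolkit 2 services.
GLOBUS_PORTS = {"gatekeeper": 2119, "mds": 2135, "gridftp": 2811}


def firewall_rules(trusted_hosts, services=("gatekeeper", "gridftp")):
    """Build iptables-style ACCEPT rules for every trusted host and service.

    trusted_hosts maps a host name to its IPv4 address. An entry that is not
    a valid IPv4 address raises ValueError instead of producing a rule.
    """
    rules = []
    for name in sorted(trusted_hosts):
        address = ipaddress.IPv4Address(trusted_hosts[name])  # validates input
        for service in services:
            port = GLOBUS_PORTS[service]
            rules.append(
                f"-A INPUT -p tcp -s {address} --dport {port} -j ACCEPT"
            )
    return rules


# Hypothetical database contents, invented for this example.
TRUSTED_HOSTS = {
    "grid-a.example.org": "192.0.2.10",
    "grid-b.example.org": "192.0.2.20",
}
```

Validating every address before emitting a rule is the point of the exercise: a malformed database entry fails loudly rather than opening the firewall to an unintended source.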

Access Grid – group conferencing

Multi-site group-to-group conferencing system

Continuous audio and video contact with all participants

Globally deployed

All UK e-Science Centres have AG rooms

Widely used for technical and management meetings

Integrated e-Science Environment

Framework for distributed scientific computing and experimentation

[Diagram: “Problem Solving Environments” (domain-specific application interfaces for scientists) draw on four Grid services (computing, data discovery, data visualisation and experiment control), together with authentication, authorisation and accounting. Grid services middleware links these to local and remote computers, data storage and experiments.]

Scalable Application Visualisation Services – Lakshmi Sastry

• Motivation
  – Address the scalability requirements of scientific visualisation using Grid and Web services
  – Preserve investment in familiar domain-specific problem solving environments that are in everyday use
  – Improve the support for near-real time data exploration of very large datasets at the desktop

• Applications include
  – Next generation data analysis software for ISIS MAPS and MERLIN detectors (mslice)
  – Crystallography simulation and instrument monitoring (TobyFit)
  – Diagnostics of oceanographic modelling with assimilation of observed data (GODIVA)

Architecture

• Grid Aware Portals toolkit, GAPtk, provides scalable visualisation services and APIs to embed these services into familiar application portals and PSEs

• New software to be installed on the desk/lap top is minimal – e.g. no requirement to install Apache web servers or Globus type Grid toolkits

• GAPtk client-side software maps its advanced/specialised graphics and interaction onto the native tool’s facilities for drawing and input handling

• All communication between the client and server is based on SOAP

• On the server side, all computations are handled as third-party delegated tasks, generating computational data as well as visualisation data (geometry information from computational data)

• Connection to third party data services for data retrieval is also handled on the server side so that huge volumes of data don’t get to client desktop PC
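The SOAP-based exchange between client and server can be illustrated with a minimal message generator and parser. The namespace `urn:example:visualisation`, the operation name and its parameters are all hypothetical; the real GAPtk message formats are not given in the slides.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope
APP_NS = "urn:example:visualisation"                   # hypothetical namespace


def generate_request(operation, params):
    """Client side: wrap an operation and its parameters in a SOAP envelope."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    call = ET.SubElement(body, f"{{{APP_NS}}}{operation}")
    for key, value in params.items():
        ET.SubElement(call, f"{{{APP_NS}}}{key}").text = str(value)
    return ET.tostring(envelope, encoding="unicode")


def parse_request(message):
    """Server side: recover the operation name and its string parameters."""
    body = ET.fromstring(message).find(f"{{{SOAP_NS}}}Body")
    if body is None or len(body) != 1:
        raise ValueError("expected exactly one operation in the SOAP body")
    call = body[0]
    operation = call.tag.split("}", 1)[1]  # strip the {namespace} prefix
    params = {child.tag.split("}", 1)[1]: child.text for child in call}
    return operation, params


# Hypothetical request: only small parameters travel to the server, while
# the bulk data stays on the server side.
request = generate_request("RenderIsosurface",
                           {"dataset": "run-42", "threshold": 0.5})
```

Because the request carries only a dataset identifier and a few parameters, the client needs nothing beyond an XML library, which is consistent with the aim of keeping desktop installation minimal.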

[Diagram: GAPtk client and server components; the labels are listed below.]

User’s desktop: desktop client side software

• Application portal (a task based domain specific user interface - e.g. Java or Matlab based UI)

• Portal software’s own event handling backend (responds to user input)

• Client side interface for GAPtk utilities (additional functionality registered or linked into the portal software using native mechanisms for extensions)

• Communication backend to GAPtk services (SOAP generator and parser)

GAPtk AV Server

• GAPtk services communication frontend (SOAP messages parser and generator)

• Grid aware visualisation services

• Grid aware application services

• Data, compute portal and registry services

• Communication to Grid fabric layer using Globus services API

Grid fabric layer: Data & compute resources and networks

DataPortal – Kerstin Kleese

A Grid service connecting experiment, observation and simulation

[Diagram: CLRC DataPortal architecture; the box labels and their captions are listed below.]

PC-filesystem, Unix-filesystem, Tape Libraries, Databases, SAN

CLRC DataPortal

CLRC Metadata Model
General data description language, enabling the integration of experimental, observational and computational data from various scientific disciplines.

Security
Authentication, Access Control, Secure Communication

Query Generation
Generating queries based on user request using the CLRC Metadata Model + XML, addressing multiple XML Wrappers

User Information
ID Mapping, Session Control + History

Data Facility Information
Name, Address, Type of Data held

Active Topics

Additional Features

Result Presentation
Request reply collection, collation and presentation

Manipulating and Downloading Data

XML-Wrapper
Translates local Metadata Formats into the CLRC Metadata Format. Handles queries and requests.

External Metadata Databases (e.g. ISIS, BADC)
Linking data description and physical data

Visualisation Portal
Provides general and user-specific visualisation tools in close proximity to the data; a range of capabilities is offered.

HPC Portal
Providing access to compute facilities and codes on demand

SRB - Storage Resource Broker
Transparently integrates storage resources of various types. Allows logical views to be created on physical storage space.

RasDaMan
Database management system providing selective content-based access to multidimensional raster data. Allows manipulation and reduction of data before retrieval.

Data Extraction
http, ftp, Grid-ftp

DataPortal features

• Major functions of the DataPortal (DP) are grouped into modules

• Each module has a grid services interface to communicate with the other DP services and in some cases also with outside services like Visualisation or HPC Portal

• SOAP is used for communication, and WSDL to describe the various services

• DP does not change any local metadata system, but uses its own wrappers to translate its general query format into the local syntax

• Replies from the resources will be XML files compliant with the CLRC Scientific Metadata Format.

• As well as interacting with the DP via the Web Interface users can also run queries by directly calling the Query & Reply service assuming that they are properly authenticated

• Other services are also externally visible, for example the Shopping Cart
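The wrapper idea, one general query translated into each facility's local syntax, can be sketched as below. The query fields, table names and column names are hypothetical, and SQL is used only as an example of a local syntax; the slides do not describe the actual DataPortal query format or the facilities' metadata systems.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneralQuery:
    """A facility-independent query (fields are invented for this sketch)."""
    keyword: str
    year: int


class FacilityWrapper:
    """Translates the general query into one facility's local syntax."""

    def __init__(self, table, field_map):
        self.table = table
        self.field_map = field_map  # general field name -> local column name

    def translate(self, query):
        """Return a parameterised SQL string plus its bind values."""
        clauses = [
            f"{self.field_map['keyword']} LIKE ?",
            f"{self.field_map['year']} = ?",
        ]
        sql = f"SELECT * FROM {self.table} WHERE " + " AND ".join(clauses)
        return sql, (f"%{query.keyword}%", query.year)


# Two hypothetical facilities with different local schemas.
wrapper_a = FacilityWrapper("experiments", {"keyword": "title", "year": "run_year"})
wrapper_b = FacilityWrapper("datasets", {"keyword": "description", "year": "yr"})

query = GeneralQuery(keyword="magnetism", year=2002)
```

The same `query` object yields a different statement from each wrapper, so the facilities' own metadata systems stay untouched while the portal speaks a single query format.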

DataPortal users

The DataPortal currently allows access to selected metadata and data from four facilities, the first three of which are housed by CLRC:

• The Synchrotron Radiation Department (SRD)

• The Neutron Spallation Source (ISIS)

• The British Atmospheric Data Centre (BADC)

• Max-Planck Institute for Meteorology (MPIM)

Several e-Science projects are now using DataPortal technology:

• Environment from the Molecular level (NERC)

• NERC DataGrid

• E-Science technologies in the simulation of complex materials (EPSRC)

• European Spatio-Temporal Data Infrastructure for high-performance computing – ESTEDI (EU)

NERC DataGrid – Bryan Lawrence

[Diagram: NERC DataGrid architecture. Labels: NERC Grid; BADC, BODC and Group NDG Wrappers, each with online data and an XML database; tape robot; research group data sources; satellite; supercomputer; software agent; Grid user; NDG Web Portal with XML database; Internet links to Internet users and to ESG (& other) applications in the wider Internet.]

NDG – close links with ESG

[Diagram: Earth System Grid architecture spanning LBNL, LLNL, ISI, NCAR, ORNL and ANL. Labelled components: Tomcat servlet engine; MCS (Metadata Cataloguing Services) and MCS client; RLS (Replica Location Services) and RLS client; MyProxy server and client; GRAM gatekeeper; CAS (Community Authorization Services) and CAS client; SRM (Storage Resource Management); HPSS (High Performance Storage System); MSS (Mass Storage System); disk; gridFTP servers, striped gridFTP client and CAS-enabled striped gridFTP servers; openDAPg servers; LAS (Live Access Server). Links are labelled SOAP, RMI and gridFTP.]

NDG will provide support for:

• small but complex datasets,

• data-mining (searchable metadata).

ESG will provide support for:

• large but simple data sets,

• limited metadata, but not searchable.

NDG is complementary to ESG!

[Figure: Web-based Data Portal]

NDG will:

• Provide Python-based classes for non-gridded observational data to complement the access to 3D gridded data.

• Provide a web services wrapper so that other grid applications can access NDG data.

[Figure: Example of a Client Application]
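To give a feel for what Python classes for non-gridded observational data might involve, here is a small sketch of a container for point observations with a spatial selection and a summary statistic. The class names, fields and sample values are invented for illustration and are not NDG's actual design.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Observation:
    """A single measurement at one place and time (not on a regular grid)."""
    time: str         # ISO 8601 timestamp
    latitude: float   # degrees north
    longitude: float  # degrees east
    value: float


@dataclass
class ObservationSeries:
    """A named collection of point observations of one variable."""
    variable: str
    units: str
    observations: list = field(default_factory=list)

    def add(self, time, latitude, longitude, value):
        self.observations.append(Observation(time, latitude, longitude, value))

    def within(self, lat_min, lat_max, lon_min, lon_max):
        """Return a new series restricted to a latitude/longitude box."""
        selected = [o for o in self.observations
                    if lat_min <= o.latitude <= lat_max
                    and lon_min <= o.longitude <= lon_max]
        return ObservationSeries(self.variable, self.units, selected)

    def mean(self):
        if not self.observations:
            raise ValueError("no observations in series")
        return sum(o.value for o in self.observations) / len(self.observations)


# Hypothetical sample data.
series = ObservationSeries("sea_surface_temperature", "degC")
series.add("2003-01-01T00:00:00Z", 50.0, -5.0, 10.0)
series.add("2003-01-01T06:00:00Z", 51.0, -4.0, 12.0)
series.add("2003-01-01T12:00:00Z", 10.0, 60.0, 28.0)
north_east_atlantic = series.within(45.0, 60.0, -10.0, 0.0)
```

Unlike a 3D gridded field, each observation here carries its own coordinates, so selection works by filtering records rather than by slicing array indices.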

European Grid projects

• ESTEDI
  – Has developed a framework for storing and retrieving TB-scale multi-dimensional data for HPC applications in climate modelling, CFD, etc

• European DataGrid / UK GridPP
  – Building a European Grid for large-scale data-intensive science

• European Grid Support Centre (just starting)
  – Developing a prototype multi-national support centre
  – CLRC (UK Grid), CERN (LHC Grid), KTH (NorduGrid)

• Enabling Grids for E-Science and Industry in Europe (EGEE)
  – FP6 Research Infrastructure proposal to develop a pan-European framework integrating national and regional Grids to provide a single, coherent operational Grid for science and industry
  – Involves all major European countries and regions
  – Grid Support Centre involved in support and operations aspects

Some future challenges

• Delivering the promised benefits of the Grid philosophy and Grid technology for science and engineering

• Providing sufficiently credible and scalable solutions that they spread beyond the research community into commercial use

• Growing a body of knowledge in how to use Grid technology which is sufficient to drive and support its widespread adoption

• Defining a robust and enduring standards framework which will ensure the Grid doesn’t become a proprietary battleground

• Ensuring continuing open availability of essential generic Grid middleware in the face of growing pressure towards commercial licensing