
GridPP2: Data and Storage Management



GridPP9 – 5 February 2004 – Data Management

DataGrid is a project funded by the European Union

GridPP is funded by PPARC

GridPP2: Data and Storage Management

Gavin McCance - University of Glasgow

Jens Jensen - RAL

GridPP9, NeSC, Edinburgh


GridPP2 Middleware: Data and Storage Management


Work areas

UK metadata management group

Storage management


Metadata Management

The focus is upon Grid-enabling metadata services for the experiments

Building upon our previous work in this area

Building upon experiments’ existing work in this area

Formation of a UK metadata group with GridPP2

1 generic Grid metadata post @ Glasgow

~1 post per experiment

ATLAS @ Glasgow, LHCb @ Oxford, CMS @ Bristol/IC

US expts, others??

These posts were described yesterday – the UK metadata group should form part of their work

Input from the UK data management support teams


GridPP2 Metadata Group

Purpose will be to:

Take overall responsibility for common experiment metadata technologies in order to Grid-enable the experiments’ metadata

Identify the commonalities and experience across experiments and make sure these are recognized

i.e. technologies, schema: data product navigational problem

Come to agreement and feed this back into the wider ARDA process

Work directly with interested groups forming the ARDA:

EGEE JRA1 Data Management Group (@CERN)

LCG Deployment Teams (@CERN)

LCG Experiments

IT Database group (@CERN)


Metadata Responsibilities

Generic metadata post:

Concentration on the technologies used to create scalable, manageable and fault-tolerant metadata services

The underlying Grid software stack

Emphasis upon the service, not just the product

24/7 supportable production services

Not prescribing things like the schema, or saying the ‘API must look like Spitfire’: prototype interfaces should be based upon experiments’ existing metadata interfaces

Will track, develop and adopt as necessary Grid metadata access standards

Feed into standards to make sure we’re in a position to benefit from the future production products that implement these standards

Feed PPE use-case and experience back into the wider world


Metadata Responsibilities

Experiment metadata posts (~1 per experiment):

Document existing implementations from the experiments and make sure all the experiments’ use-cases are satisfied by the products and the technologies being proposed by the group

Work within the group to ensure that commonalities and experience across experiments are recognized and effort is not wasted

At the technology level – e.g. using the same underlying Grid software stack

At the interface level – e.g. GANGA

Possibly at the schema level…

Feed this understanding and agreement back into the wider ARDA process and back into their own experiments

ARDA terminology:

Dataset metadata: ARDA Metadata service

Data product navigation: ARDA Job Provenance service


Storage Management

Two areas of work (based at RAL)

SRM interface to UK storage sites

Site local data management


SRM interface to UK Storage

Initial deliverable will be to provide an SRM (Storage Resource Manager) v1 interface to the Atlas DataStore at RAL

Subsequent migration to the more advanced features offered by e.g. SRM v2

Perform an analysis of the UK Tier-2 storage sites and how these can be exposed via the common SRM interface

Implementation of SRM interfaces to these storage systems

Deployment on all the Tier-2 sites and support

Contribution to the SRM standardisation process

Work closely with the EGEE JRA1 and LCG deployment groups

Work with support staff for Tier-1 and Tier-2


Site-local Data Management

Management of data and files within a site

How you access the grid storage from the worker nodes

Cleanup of volatile data resources that a job no longer needs (Tier-2) – cache management

Evaluation of existing technologies: dCache, SAM, EDG Zambo prototype, Condor, …

Development and deployment of these local data management solutions (@ Tier-2)

Interaction with Tier-2 site managers is vital

Feed back solutions into LCG / EGEE
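The cache-management item above (cleanup of volatile data that jobs no longer need) can be illustrated with a small sketch. This is not taken from dCache, SAM or any of the other technologies listed; it is a generic least-recently-accessed eviction over a directory, and the function name `cleanup_cache` and its parameters are hypothetical.

```python
import time
from pathlib import Path


def cleanup_cache(cache_dir, max_bytes, now=None, min_age_seconds=0.0):
    """Delete least recently accessed files until the cache fits in max_bytes.

    Hypothetical helper: files accessed less than min_age_seconds ago are
    skipped, on the assumption that a running job may still need them.
    Returns the list of removed paths.
    """
    now = time.time() if now is None else now

    # Collect (access time, size, path) for every regular file in the cache.
    entries = []
    for path in Path(cache_dir).rglob("*"):
        if path.is_file():
            st = path.stat()
            entries.append((st.st_atime, st.st_size, path))

    total = sum(size for _, size, _ in entries)
    removed = []

    # Oldest access time first, so the least recently used files go first.
    for atime, size, path in sorted(entries, key=lambda e: e[0]):
        if total <= max_bytes:
            break
        if now - atime < min_age_seconds:
            continue  # too recent, leave it in place
        path.unlink()
        total -= size
        removed.append(path)
    return removed
```

A site would call something like `cleanup_cache("/scratch/gridcache", max_bytes=50 * 2**30)` from a periodic job; the path and the size limit here are placeholders, and a real deployment would also need to coordinate with the storage system so that files pinned by running jobs are never evicted.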


GridPP2 Support: Data and Storage Management


Data Management Support

UK data management support posts

Aim: to provide first-level support for all DM software

First stop for UK system administrators

Work directly with the development and deployment teams (GridPP2, EGEE and LCG)

Provide hands-on deployment help for data challenge support

Develop how-to portal to collect deployment experience

Feed back sys-admin issues and experience to developers

Site policies, quotas, firewalls – survey sysadmins

Develop site validation tools

Responsible for developing the overall support plan for the data management services beyond GridPP2

Need to fit all this in with the rest of the UK Support Plan
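As an illustration of the "site validation tools" item above, a first check might simply confirm that a site's storage area exists, is writable and has enough free space. The sketch below is an assumption about what such a tool could look like, not a description of any GridPP2, EGEE or LCG tool; `validate_storage_area` and its parameters are hypothetical names.

```python
import os
import shutil


def validate_storage_area(path, min_free_bytes):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    # Without the directory the remaining checks are meaningless, so stop here.
    if not os.path.isdir(path):
        return [f"{path}: storage directory does not exist"]

    problems = []
    if not os.access(path, os.W_OK):
        problems.append(f"{path}: not writable by the current user")

    free = shutil.disk_usage(path).free
    if free < min_free_bytes:
        problems.append(f"{path}: only {free} bytes free, need {min_free_bytes}")
    return problems
```

Returning a list of problems rather than raising on the first failure lets a support post collect every issue at a site in one pass, which fits the stated aim of feeding sys-admin experience back to developers.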