The lightweight Grid-enabled Disk Pool Manager (DPM)


Enabling Grids for E-sciencE

The lightweight Grid-enabled Disk Pool Manager (DPM)

Sophie Lemaitre – Jean-Philippe Baud

EGEE-OSG workshop, 25 June 2007

Agenda

• DPM architecture
• SRMv2.2
• VOMS and virtual ids
• What’s next?
• Issues

DPM architecture

Functionality offered

• Management of disk space on geographically distributed disk servers

• Management of the name space (including ACLs)

• Control interfaces
  – socket, SRM v1.0, SRM v2.1, SRM v2.2 (no srmCopy)

• Data access protocols
  – secure RFIO, gsiFTP, HTTPS (HTTP to come)
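For illustration, a minimal sketch of direct data access over two of these protocols; the head-node name dpmhead.example.org and the file path are hypothetical, and the rfcp form assumes DPM_HOST/DPNS_HOST point at the head node:

$ globus-url-copy gsiftp://dpmhead.example.org/dpm/example.org/home/dteam/file1 file:///tmp/file1
$ rfcp /dpm/example.org/home/dteam/file1 /tmp/file1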

DPM architecture

[Diagram: name space tree /dpm/<domain>/home/<vo>; DPM head node and DPM disk servers; clients (CLI, C API, SRM-enabled clients, etc.) transfer data directly to/from the disk servers]

• DPM Name Server
  – Name space
  – Authorization
  – Physical file locations

• DPM Server
  – Request queuing and processing
  – Space management

• SRM Servers (v1.1, v2.1, v2.2)

• Disk Servers
  – Physical files

• Direct data transfer from/to the disk servers (no bottleneck)


DPM administration

• Feedback from DPM administrators
  – “Easy to install and configure”
  – “It works for us!”
  – “Our DPM has been running untouched for months”
  – “Very good online documentation”

• Intuitive commands (see the example after this list)
  – As similar to UNIX commands as possible
  – E.g. dpns-ls, dpns-mkdir, dpns-getacl, etc.

• DPM architecture is database centric
  – No configuration file
  – Support for MySQL and Oracle

• Scalability
  – All servers (except the DPM one) can be replicated if needed (DNS load balancing)
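As a taste of the UNIX-like feel, a minimal sketch with the dpns-* commands named above; the head-node name and paths are hypothetical:

$ export DPNS_HOST=dpmhead.example.org
$ dpns-mkdir /dpm/example.org/home/dteam/demo
$ dpns-ls -l /dpm/example.org/home/dteam
$ dpns-getacl /dpm/example.org/home/dteam/demo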

Platforms

• Supported platforms
  – SL(C)3
  – SL(C)4
  – Mac OS X

• From the next release onwards
  – GridFTP 2 instead of GridFTP 1

• GridFTP 2 plugin
  – Allowed a cleaner implementation
  – Much simpler than GridFTP 1 to interface to

SRMv2.2

What’s new?

• SRMv2.2
  – Biggest effort of the last year
  – Required significant changes in the DPM server code

• 5 new method types
  – Space reservation: srmReserveSpace, srmReleaseSpace, …
  – Name space operations: srmMkdir, srmLs, …
  – Permissions and ACLs: srmSetPermission, srmGetPermission, …
  – Transfer functions: srmPrepareToPut, srmPrepareToGet, …
  – Admin functions: srmPing
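For illustration, clients usually drive the transfer functions indirectly; a sketch using the lcg-gt tool from the lcg-utils suite of the time, which issues an srmPrepareToGet and prints the resulting TURL (endpoint and path hypothetical):

$ lcg-gt srm://dpmhead.example.org/dpm/example.org/home/dteam/file1 gsiftp

The returned gsiftp TURL can then be passed to a GridFTP client such as globus-url-copy.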

What’s new?

• Retention policies
  – Given the quality of the disks, the admin defines a quality of service
  – Replica, Output, Custodial

• Access latency
  – Online, Nearline
  – Nearline will be used for the BIOMED DICOM integration

• File storage type
  – Volatile, Permanent

• File pinning
  – Extend TURL lifetime (srmPrepareToGet, srmPrepareToPut)
  – Extend file lifetime in a space (srmBringOnline)

Space reservation

• Static space reservation (admin)
  $ dpm-reservespace --gspace 20G --lifetime Inf --group atlas --token_desc Atlas_ESD
  $ dpm-reservespace --gspace 100M --lifetime 1h --group dteam/Role=lcgadmin --token_desc LcgAd
  $ dpm-updatespace --token_desc myspace --gspace 5G
  $ dpm-releasespace --token_desc myspace

• Dynamic space reservation (user)
  – Requested by the user: dpm-reservespace or srmReserveSpace
  – Limitations on the duration and size of the space reserved

VOMS & Virtual Ids

How to support VOMS?

• Lightweight VOMS handling in DPM
  – Check that the VOMS proxy signature comes from a trusted host
  – For scalability reasons, we did not want to contact another server for every authorization

• Why virtual ids?
  – We did not want to use local users/groups that admins would need to create a priori
  – DPM instead uses virtual ids
    Stored in the DPM Name Server database
    Created automatically when a user first connects with a valid proxy
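A minimal sketch of this first-contact behaviour, with a hypothetical head node and the dteam VO:

$ voms-proxy-init --voms dteam
$ dpns-ls /dpm/example.org/home/dteam

On this first contact the DPM Name Server automatically creates a virtual uid for the user’s DN and a virtual gid for the dteam FQAN; no prior account creation by the admin is needed.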

DPM virtual ids

• Each user’s DN
  – Is mapped to a unique virtual uid

• Each VOMS group, each VOMS role
  – Is mapped to a unique virtual gid

• Virtual uids/gids are created automatically
  – The first time a given user/group contacts the DPM

DPM virtual ids

• Virtual uids mapping (example)
  /C=CH/O=CERN/OU=GRID/CN=Sophie Lemaitre 2268 → 101
  /C=CH/O=CERN/OU=GRID/CN=Simone Campana 7461 → 102

• Virtual gids mapping (example)
  atlas → 101
  atlas/Role=lcgadmin → 102
  atlas/Role=production → 103

• Example
  $ grid-proxy-init
  $ voms-proxy-init --vo atlas
  Simone will be mapped to (uid, gid) = (102, 101)
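To check which FQANs a proxy carries (the first one listed is the primary FQAN, and thus determines the primary gid), the standard VOMS client can be used; output abridged and illustrative:

$ voms-proxy-info --fqan
/atlas/Role=NULL/Capability=NULL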

DPM secondary groups

• Same example mappings as above
  – Virtual uids: Sophie Lemaitre 2268 → 101, Simone Campana 7461 → 102
  – Virtual gids: atlas → 101, atlas/Role=lcgadmin → 102, atlas/Role=production → 103

• Example
  $ voms-proxy-init --voms atlas:/atlas/Role=production
  Simone will be mapped to (uid, gid, …) = (102, 103, 101)
  Simone still belongs to “atlas”

ACLs on files

• DPM supports POSIX ACLs based on virtual ids
  – Access Control Lists on files and directories
  – Default Access Control Lists on directories: they are inherited by the sub-directories and files under the directory

• Example
  $ dpns-mkdir /dpm/cern.ch/home/dteam/jpb
  $ dpns-setacl -m d:u::7,d:g::7,d:o:5 /dpm/cern.ch/home/dteam/jpb
  $ dpns-getacl /dpm/cern.ch/home/dteam/jpb
  # file: /dpm/cern.ch/home/dteam/jpb
  # owner: /C=CH/O=CERN/OU=GRID/CN=Jean-Philippe Baud 7183
  # group: dteam
  user::rwx
  group::r-x    #effective:r-x
  other::r-x
  default:user::rwx
  default:group::rwx
  default:other::r-x
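Because ACL entries refer to virtual ids, a VOMS group or role can be granted access directly; a hypothetical sketch in the same dpns-setacl syntax (the group entry, mask and path are illustrative):

$ dpns-setacl -m g:atlas/Role=production:7,m:7 /dpm/cern.ch/home/atlas/prod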

ACLs on pools

• DPM terminology
  – A DPM pool is a set of file systems on DPM disk servers

• By default, pools are generic

• Possibility to dedicate a pool to one or several groups
  – dpm-addpool --poolname poolA --group alice
  – dpm-addpool --poolname poolB --group atlas,cms,lhcb

• Easy to add or remove groups
  – dpm-modifypool --poolname poolA --group +atlas,-alice
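To verify the pool set-up, the configuration can be listed with dpm-qryconf, which shows each pool, the groups it is dedicated to, and its file systems:

$ dpm-qryconf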

Authorization models

• Follow the UNIX model
  – Name space: primary and secondary groups
  – Space reservation: primary group only
    For disk space accounting (and quotas later)
    Whoever actually uses the space gets to pay the bill…

What’s next?

What’s next?

• Next release
  – DPM Name Server as local LFC

• Short term (autumn 2007)
  – Quotas
  – srmCopy daemon
  – Medical data management
    Encryption
    DICOM backend

• Medium term (beginning of 2008)
  – NFSv4.1

Local LFC

• DPM Name Server
  – Can act as a local LFC (LCG File Catalog)

• Advantages
  – Only one service to run instead of two (LFC + DPM)
  – Transparent to the users

• Available in the next release

DPM quotas

• DPM terminology
  – A DPM pool is a set of file systems on DPM disk servers

• Unix-like quotas
  – Quotas are defined per disk pool
  – Usage in a given pool is tracked per DN and per VOMS FQAN
  – The primary group gets charged for usage
  – Quotas in a given pool can be defined/enabled per DN and/or per VOMS FQAN
  – Quotas can be assigned by the admin
  – Default quotas can be assigned by the admin and applied to new users/groups contacting the DPM

DPM quotas

• Unix-like quota interfaces (sketch after this list)
  – User interface
    dpns-quota: gives quota and usage information for a given user/group (restricted to the user’s own information)
  – Administrator interface
    dpns-quotacheck: compute the current usage on an existing system
    dpns-repquota: list the usage and quota information for all users/groups
    dpns-setquota: set or change quotas for a given user/group
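Since these tools were still to be released, the following is only a sketch of how the planned interfaces might be used, based on the names above; all options shown are hypothetical:

$ dpns-setquota --group atlas --pool poolA 10T    (admin: set or change a quota)
$ dpns-repquota --pool poolA                      (admin: usage and quota for all users/groups)
$ dpns-quota                                      (user: own quota and usage)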

DPM with NFSv4.1

• NFSv4.1 and DPM have similar architectures
  – Separate metadata server
  – Direct access to physical files
  – Easy NFSv4.1 integration

Encrypted Storage

• Medical community as the principal user
  – Large numbers of images are produced in DICOM
  – Privacy concerns vs. processing needs
  – Ease of use (image production and application)

• Strong security requirements
  – Anonymity (patient data is kept separate)
  – Fine-grained access control
  – Privacy (even the storage administrator cannot read the data)

• Data is encrypted (DICOM-SE) and decrypted (client) in memory

[Diagram: client workflow — 1. patient look-up in the AMGA metadata catalogue; 2. keys from the Hydra key store; 3. get TURL from the DICOM-SE via SRMv2 (3.1 the DICOM plug-in fetches the encrypted image: 3.1.1 keys, 3.1.2 image); 4. read the encrypted image via GridFTP/I/O; 5. decrypt on the client]
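On the client side, the gLite Encrypted Data Storage (Hydra) command-line tools were the intended interface; a rough sketch, assuming the glite-eds-* clients of that era (command options, argument order and the LFN are illustrative):

$ glite-eds-put scan.dcm lfn:/grid/biomed/scans/patient42.dcm    (register key in Hydra, encrypt, upload)
$ glite-eds-get lfn:/grid/biomed/scans/patient42.dcm scan.dcm    (fetch key, download, decrypt locally)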

Issues

Issues

• DPM is a stable and reliable service, but…

• No NFS support yet
  – For several sites, the reason for not moving from a Classic SE to DPM

• Lack of experience with big sites

• Lack of internal monitoring
  – Ex. 1: automatically disable a file system that is down
  – Ex. 2: automatically limit the number of transfers to a disk server

• Different VO types (HEP, BIOMED, etc.)
  – Need to develop different features for different needs

Summary

• DPM service
  – Manages space on distributed disks
  – Easy to configure and administer
  – Easy and transparent to use
  – Stable and reliable Grid service
  – Widely deployed
    125 DPM instances in EGEE
    138 VOs supported

• Short term
  – Quotas
  – NFSv4 support

[Chart: number of Storage Element instances (DPM, dCache, CASTOR) published in the EGEE top BDII]

Help?

• DPM online documentation
  https://twiki.cern.ch/twiki/bin/view/LCG/DataManagementDocumentation

• Support
  – helpdesk@ggus.org

• General questions
  – hep-service-dpm@cern.ch
