
Page 1: Storage

Alessandra Forti

Group seminar

16th May 2006

Page 2: Introduction

• Different applications, different data, different environments, different solutions.
   – ROOT, PAW ≠ Athena, BetaMiniApp or DZero applications
   – AOD ≠ user ntuples
   – Tier2 ≠ department
   – Home directories ≠ user space on data servers
• We don’t need to stick to one solution for everything.
• So the basic, fundamental questions to answer are:
   – What data will we have on our storage?
   – How much space do we need?
   – What do we want from our storage, i.e. redundancy, performance?
      • Can we obtain this in different ways?
   – How can we access this space?

Page 3: Solutions

• Accessibility
   – SRM
   – GFAL, lcg-utils
   – AFS/NFS
• Classic storage
   – RAID (Redundant Array of Inexpensive Disks) on local machines
• Grid storage
   – dCache
   – xrootd
   – /grid (HTTP-based file system)

Page 4: Accessibility

• SRM: the grid middleware component whose function is to provide dynamic space allocation and file management on shared distributed storage systems.
   – Manage space
      • Negotiate and assign space to users and manage the lifetime of spaces
   – Manage files on behalf of the user
      • Pin files in storage until they are released
      • Manage the lifetime of files
   – Manage file sharing
      • Policies on what should reside on a storage system and what to evict
   – Bring files from remote locations
      • Manage multi-file requests
      • Queue file requests, pre-stage
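
To make this concrete, a user-level interaction with an SRM endpoint could look like the sketch below. It assumes the srmcp client that ships with dCache is installed and that a valid grid proxy exists; the host name and paths are hypothetical.

import subprocess

# Hypothetical SRM endpoint and paths -- adjust to the local setup.
SRM_URL = ("srm://se01.tier2.example.ac.uk:8443"
           "/pnfs/example.ac.uk/data/atlas/aod.root")
LOCAL = "file:////tmp/aod.root"

def srm_copy(src, dst):
    """Copy a file through the SRM interface using srmcp (sketch)."""
    subprocess.run(["srmcp", src, dst], check=True)

# Stage the remote file to local disk; SRM negotiates the actual
# transfer protocol (e.g. gsiftp) and pins/stages the file for us.
srm_copy(SRM_URL, LOCAL)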

Page 5: Accessibility

• GFAL is a library that can be linked into applications to access data on a grid system.
   – It supports the SRM APIs and the majority of grid protocols.
• lcg-utils can also talk to SRM to copy, replicate and list data.
   – lcg-utils is the way to copy data onto a grid system and register the copy in the file catalogues.
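
For instance, copying a local file to grid storage and registering it in the catalogue might look like the sketch below; it assumes the lcg-utils command-line tools and a valid proxy, and the VO, SE host and logical file name are hypothetical.

import subprocess

def copy_and_register(local_path, lfn, storage_element, vo="atlas"):
    """Copy a local file to a grid storage element and register it in
    the file catalogue with lcg-cr (copy-and-register)."""
    result = subprocess.run(
        ["lcg-cr", "--vo", vo, "-d", storage_element, "-l", lfn,
         f"file://{local_path}"],
        capture_output=True, text=True, check=True)
    return result.stdout.strip()  # lcg-cr prints the new file's GUID

# Hypothetical SE and logical file name.
guid = copy_and_register("/tmp/ntuple.root",
                         "lfn:/grid/atlas/user/aforti/ntuple.root",
                         "se01.tier2.example.ac.uk")
print("registered as", guid)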

Page 6: Accessibility

• AFS/NFS (briefly) are shared file systems that can help share small amounts of data.
   – AFS on a WAN would be really good if used for software distribution, and I think ATLAS is supporting it.
   – NFS cannot be used outside the local site and it doesn’t scale very well with a large number (a few hundred) of clients writing at the same time. Reading is fine.

Page 7: Classic storage

• Classic storage consists of one or more data servers, normally with RAIDed disks, accessible by local machines, normally via NFS.
• It is sometimes accessible (mostly at bigger labs) by remote machines via transfer protocols like scp, ftp or others, but not by applications for direct data reading.
• There are no file catalogues attached.
• Files are not replicated anywhere else.
   – Local redundancy is needed.
• The file name space is local and normally offered by NFS.

Page 8: RAID

• There are different RAID levels depending on the purpose.
   – Most used: RAID 0, RAID 1, RAID 5
• RAID 0: clusters 2 or more disks; data are written in blocks (striped) across the disks; there is no redundancy.
   – Enhanced read/write performance, but no reliability: if one disk dies, all data are lost.
   – Good for access to temporary data, a web cache for example.
• RAID 1: mirrors two or more disks.
   – Exponentially enhanced reliability.
   – Linearly enhanced read performance (data striping for reading but not for writing).
   – Partitions can be mirrored instead of disks.
   – Good for servers: home dirs, web servers, Computing Element, dCache head node, sw servers.
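
As an illustration, on a Linux machine these software-RAID levels can be assembled with the standard mdadm tool. The sketch below wraps it from Python; all device names are purely illustrative and the commands need root.

import subprocess

def create_raid(md_device, level, members):
    """Assemble a Linux software-RAID array with mdadm (needs root).
    All device names here are hypothetical."""
    subprocess.run(
        ["mdadm", "--create", md_device,
         "--level", str(level),
         "--raid-devices", str(len(members)), *members],
        check=True)

# RAID 1 mirror of two partitions, e.g. for home directories.
create_raid("/dev/md0", 1, ["/dev/sda1", "/dev/sdb1"])

# RAID 0 stripe of two disks, e.g. for a scratch/web-cache area.
create_raid("/dev/md1", 0, ["/dev/sdc1", "/dev/sdd1"])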

Page 9: RAID

• RAID 2, 3, 4: data are striped across the disks at bit, byte and block level respectively.
   – They have parity disks for reliability.
      • Parity is a way of tracking changes using single bits or blocks of bits. Parity alone is not enough to do error recovery and reconstruction.
   – They are not very popular: if the parity disk dies, the whole RAID is unrecoverable.
   – They require a minimum of 3 disks.
• RAID 5: like RAID 4 (block-level striping), but the parity is distributed across the disks.
   – Enhanced reliability: parity and data blocks are distributed.
      • If one disk dies the array can be rebuilt (illustrated in the sketch below); if two die, the whole array is lost.
   – In theory an unlimited number of disks; in practice it is better to limit them.
   – Poorer write performance, due to the way parity must be kept consistent with each write.
• RAID 5 is what is normally used on data servers, where reads are more frequent than writes.
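
The parity arithmetic behind the rebuild is plain XOR: the parity block is the XOR of the data blocks, so any one lost block is the XOR of the parity with the survivors. A minimal, self-contained illustration (pure Python, not tied to any real disk array):

from functools import reduce

# Three equally sized data blocks, as they might be striped
# across three data disks.
blocks = [b"\x10\x22\x37", b"\x05\x00\xff", b"\x42\x42\x42"]

def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

# The parity block: stored on a dedicated disk in RAID 4,
# distributed across all the disks in RAID 5.
parity = xor_blocks(blocks)

# Suppose disk 1 dies: XORing the parity with the surviving
# blocks reconstructs the lost one.
rebuilt = xor_blocks([parity, blocks[0], blocks[2]])
assert rebuilt == blocks[1]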

Page 10: Grid Storage

• Grid storage consists of any device that has space: data servers, worker nodes, tapes…
• It is accessible to local CPUs via a number of different protocols, depending on what storage management software the site administrator has installed.
• It is accessible from anywhere in the world to copy data in and out using grid utilities.
• It has all the supported VO file catalogues attached.
• Files can easily be replicated at other sites (see the sketch after this list).
   – No real need for local redundancy.
• The file name space has to span multiple machines.
• In Manchester we have 400 TB of distributed disks on the worker nodes.
   – dCache, xrootd and other solutions are a way to exploit it.
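
As an illustration of the cross-site replication mentioned above, a hedged sketch assuming lcg-utils is installed and the file is already registered; the LFN and destination storage element are hypothetical.

import subprocess

def replicate(lfn, dest_se, vo="atlas"):
    """Replicate an already-registered grid file to another site's SE
    with lcg-rep; the catalogue records the additional replica."""
    subprocess.run(["lcg-rep", "--vo", vo, "-d", dest_se, lfn], check=True)

# Hypothetical LFN and destination storage element.
replicate("lfn:/grid/atlas/user/aforti/ntuple.root",
          "se2.lancs.example.ac.uk")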

Page 11: dCache

• dCache has been developed by Fermilab and DESY to deal with their tape storage systems and the staging of data on disk, but it has evolved into a more general storage management tool.
• Advantages
   – It is SRM-integrated, so it has most of the space management features.
   – It combines the disks of several hundred nodes under a single file name space.
   – Load balancing.
   – Data are only removed if space is running short (no threshold).
   – It takes care that at least 'n' but not more than 'm' copies of a single dataset exist within one dCache instance.
   – It takes care that this rule still holds if nodes go down (scheduled or even unexpected).
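
For local reads, dCache ships a dcap client, dccp. A minimal sketch, with a hypothetical door host and pnfs path (22125 is the standard dcap port):

import subprocess

# Hypothetical dCache door and pnfs path.
DCAP_URL = ("dcap://dcache01.tier2.example.ac.uk:22125"
            "/pnfs/example.ac.uk/data/atlas/aod.root")

def dcache_fetch(dcap_url, local_path):
    """Fetch a file from a dCache pool over the dcap protocol
    using the dccp client that ships with dCache."""
    subprocess.run(["dccp", dcap_url, local_path], check=True)

dcache_fetch(DCAP_URL, "/tmp/aod.root")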

Page 12: dCache (3)

• Disadvantages
   – It is not POSIX compliant: files cannot be accessed as on a normal Unix file system.
   – Supported protocols are rewritten in dCache language.
   – It is written in Java.
   – Sources are not available.
   – The file name space is implemented using a database in the middle.
   – Support is, for various reasons, inadequate.
• Unfortunately, up to now it has been the only solution available for a system like Manchester’s.
• Other viable solutions could be xrootd and StoRM.

Page 13: xrootd (1)

• XROOTD: a file server which provides high-performance file-based access. It was developed by BaBar/SLAC as an extension of rootd. It is now distributed as part of the standard ROOT distribution.
• It is now being adopted by two LHC experiments (ALICE and CMS).
• Advantages:
   – Data are located within the xrootd process; there is no need for a database to catalogue the files on the system.
   – It supports load balancing.
      • xrootd determines which server is the best for a client’s request to open a file.
   – It is fault tolerant.
      • Missing data can be restored again from other disks.
   – Authorization plugin.
      • Resolves "trusted/untrusted" users for write access.
• Disadvantages
   – It is not integrated with SRM, so all the space management isn’t there.
   – lcg-utils and GFAL cannot talk to xrootd (yet).
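
Accessing such a cluster typically goes through the xrdcp client; a minimal sketch with a hypothetical redirector host and path:

import subprocess

# Hypothetical xrootd redirector and file path.
XROOTD_URL = ("root://xrd01.tier2.example.ac.uk"
              "//store/user/aforti/ntuple.root")

def xrootd_fetch(url, local_path):
    """Copy a file out of an xrootd cluster with the xrdcp client;
    the redirector picks a server that holds the file."""
    subprocess.run(["xrdcp", url, local_path], check=True)

xrootd_fetch(XROOTD_URL, "/tmp/ntuple.root")

ROOT itself can also open such URLs directly (e.g. TFile::Open("root://...")), which is what makes xrootd attractive for direct analysis access.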

Page 14: Discussion