Storage
Alessandra Forti
Group seminar
16th May 2006
Introduction
• Different applications, different data, different environments, different solutions.
  – Root, PAW ≠ Athena, BetaMiniApp or DZero applications
  – AOD ≠ user ntuples
  – Tier2 ≠ department
  – Home directories ≠ user space on data servers
• We don’t need to stick to one solution for everything.
• So I guess the basic fundamental questions to answer are:
  – What data will we have on our storage?
  – How much space do we need?
  – What do we want from our storage, i.e. redundancy, performance?
    • Can we obtain this in different ways?
  – How can we access this space?
Solutions
• Accessibility
  – SRM
  – GFAL, lcg-utils
  – AFS/NFS
• Classic storage
  – RAID (Redundant Array of Inexpensive Disks) on local machines
• Grid storage
  – dCache
  – Xrootd
  – /grid (HTTP-based file system)
Accessibility
• SRM is the grid middleware component whose function is to provide dynamic space allocation and file management on shared distributed storage systems.
  – Manage space
    • Negotiate and assign space to users and manage the lifetime of spaces
  – Manage files on behalf of the user
    • Pin files in storage until they are released
    • Manage the lifetime of files
  – Manage file sharing
    • Policies on what should reside on a storage system and what to evict
  – Bring files from remote locations
    • Manage multi-file requests
    • Queue file requests, pre-stage
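The pin/lifetime bookkeeping described above can be sketched in a few lines of Python. This is a toy illustration of the idea only: the class and method names (`SpaceManager`, `pin`, `evictable`) are invented for this sketch and are not the real SRM API.

```python
import time

class SpaceManager:
    """Toy sketch of SRM-style pin/lifetime management.
    All names are illustrative -- not the real SRM API."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.pins = {}  # filename -> pin expiry time (seconds since epoch)

    def pin(self, filename, lifetime):
        """Pin a file so it stays on storage for `lifetime` seconds."""
        self.pins[filename] = time.time() + lifetime

    def release(self, filename):
        """Release a pin; the file becomes a candidate for eviction."""
        self.pins.pop(filename, None)

    def evictable(self, now=None):
        """Files whose pin lifetime has expired may be evicted."""
        now = time.time() if now is None else now
        return [f for f, expiry in self.pins.items() if expiry < now]

mgr = SpaceManager(capacity=10**12)
mgr.pin("aod_0001.root", lifetime=3600)
assert mgr.evictable() == []  # still pinned
assert mgr.evictable(now=time.time() + 7200) == ["aod_0001.root"]
```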
Accessibility
• GFAL is a library that can be linked into applications to access data on a grid system.
  – It supports the SRM APIs and the majority of grid protocols.
• lcg-utils can also talk to SRM to copy, replicate and list data.
  – lcg-utils are the way to copy data onto a grid system and register the copy in the file catalogs.
Accessibility
• AFS/NFS (briefly) are shared file systems that can help share small amounts of data.
  – AFS over a WAN would be really good if used for software distribution, and I think ATLAS is supporting it.
  – NFS cannot be used outside the local site and it doesn’t scale very well with a large number (a few hundred) of clients writing at the same time. Reading is fine.
Classic storage
• Classic storage consists of one or more data servers, normally with RAIDed disks, accessible by local machines, normally via NFS.
• Sometimes accessible (mostly at bigger labs) by remote machines via transfer protocols like scp or ftp, but not by applications for direct data reading.
• There are no file catalogs attached.
• Files are not replicated anywhere else.
  – Need for local redundancy
• The file name space is local and normally offered by NFS.
RAID
• There are different RAID levels depending on the purpose.
  – Most used: RAID 0, RAID 1, RAID 5
• RAID 0: clusters 2 or more disks; data are written in blocks (striped) across the disks; there is no redundancy.
  – Enhanced read/write performance, but no reliability: if one disk dies all data are lost.
  – Good for access to temporary data, a web cache for example.
• RAID 1: mirrors two or more disks.
  – Exponentially enhanced reliability
  – Linearly enhanced read performance (data striping for reading but not for writing)
  – Partitions can be mirrored instead of disks.
  – Good for servers: home dirs, web servers, Computing Element, dCache head node, sw servers
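RAID 0 striping can be illustrated with a short Python sketch. The helper name and the 4-byte block size are arbitrary choices for the demo; real controllers use much larger stripe sizes.

```python
def stripe(data: bytes, ndisks: int, block: int = 4):
    """RAID 0 sketch: split data into fixed-size blocks and deal
    them round-robin across ndisks. No redundancy: losing any one
    disk loses part of every large file."""
    disks = [bytearray() for _ in range(ndisks)]
    for i in range(0, len(data), block):
        disks[(i // block) % ndisks].extend(data[i:i + block])
    return disks

# Blocks 0,2,... land on disk 0 and blocks 1,3,... on disk 1,
# so sequential reads can be served by both disks in parallel.
disks = stripe(b"ABCDEFGHIJKLMNOP", ndisks=2)
assert bytes(disks[0]) == b"ABCDIJKL"
assert bytes(disks[1]) == b"EFGHMNOP"
```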
RAID
• RAID 2, 3, 4: data are striped across the disks at bit, byte and block level respectively.
  – They have parity disks for reliability.
    • Parity is a way of tracking changes using single bits or blocks of bits. Parity alone is not enough to do error recovery and reconstruction.
  – They are not very popular: if the parity disk dies the whole array is unrecoverable.
  – They require a minimum of 3 disks.
• RAID 5: like RAID 4 (block-level striping), but the parity is distributed across the disks.
  – Enhanced reliability: parity and data blocks are distributed.
    • If one disk dies it can be rebuilt; if two die the whole array is lost.
  – In theory an unlimited number of disks; in practice it is better to limit them.
  – Poorer write performance due to the way parity must be kept consistent with each write.
• RAID 5 is what is normally used on data servers, where reads are more frequent than writes.
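The parity trick behind RAID 4/5 is just XOR: the parity block is the XOR of the data blocks in a stripe, and any single lost block is the XOR of the survivors and the parity. A minimal Python demonstration (block contents are arbitrary):

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together; the same operation
    computes the parity and rebuilds a lost block."""
    return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), blocks)

# One stripe: three data blocks plus a parity block.
# (RAID 5 rotates which disk holds the parity; RAID 4 dedicates a disk.)
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# One data disk dies: its block is recovered from the survivors + parity,
# because A ^ C ^ (A ^ B ^ C) == B. Losing two blocks is unrecoverable.
rebuilt = xor_blocks([data[0], data[2], parity])
assert rebuilt == data[1]
```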
Grid Storage
• Grid storage consists of any device that has space: data servers, worker nodes, tapes…
• It is accessible to local CPUs via a number of different protocols, depending on what storage management software the site administrator has installed.
• It is accessible from anywhere in the world to copy data in and out using grid utilities.
• It has all the supported VO file catalogs attached.
• Files can be easily replicated at other sites.
  – No real need for local redundancy
• The file name space has to span multiple machines.
• In Manchester we have 400 TB of distributed disks on the worker nodes.
  – dCache, xrootd and other solutions are a way to exploit it.
dcache
• dCache has been developed by Fermilab and DESY to deal with their tape storage systems and the staging of data on disk, but it has evolved into a more general storage system management tool.
• Advantages
  – It is SRM-integrated, so it has most of the space management features.
  – Combines the disks of several hundred nodes under a single file name space.
  – Load balancing.
  – Data are only removed if space is running short (no threshold).
  – Takes care that at least 'n' but not more than 'm' copies of a single dataset exist within one dCache instance.
  – Takes care that this rule still holds if nodes go down (scheduled or even unexpected).
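The "at least 'n', at most 'm' copies" rule above can be sketched as a simple reconciliation check. This is a toy illustration of the policy, not dCache's actual replica manager; the function and parameter names are invented for the sketch.

```python
def replication_actions(replicas, alive, n_min=2, m_max=3):
    """Toy version of dCache's 'at least n, at most m copies' rule.
    replicas: dict dataset -> set of node names holding a copy.
    alive: set of nodes currently up.
    Returns (datasets needing extra copies, datasets with surplus copies)."""
    to_copy, to_trim = [], []
    for ds, nodes in replicas.items():
        live = nodes & alive  # only copies on live nodes count
        if len(live) < n_min:
            to_copy.append(ds)   # schedule extra copies
        elif len(live) > m_max:
            to_trim.append(ds)   # remove surplus copies
    return to_copy, to_trim

# Node wn02 goes down unexpectedly: "aod" drops below n_min and
# the next pass schedules a fresh copy, keeping the rule true.
replicas = {"aod": {"wn01", "wn02"}, "ntup": {"wn03"}}
copy, trim = replication_actions(replicas, alive={"wn01", "wn03"})
assert copy == ["aod", "ntup"] and trim == []
```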
dcache(3)
• Disadvantages
  – It is not POSIX-compliant: files cannot be accessed as on a normal unix file system.
  – Supported protocols are reimplemented in dCache’s own way.
  – It is written in Java.
  – Sources are not available.
  – The file name space is implemented using a database in the middle.
  – Support is, for various reasons, inadequate.
• Unfortunately, up to now it was the only solution available for a system like Manchester’s.
• Other viable solutions could be xrootd and StoRM.
Xrootd(1)
• XROOTD: a file server which provides high-performance file-based access. It was developed by BaBar/SLAC as an extension of rootd. It is now distributed as part of standard ROOT.
• It is now being adopted by two LHC experiments (ALICE and CMS).
• Advantages:
  – Data are located within the xrootd process; there is no need for a database to catalog the files on the system.
  – It supports load balancing.
    • xrootd determines which server is best for a client’s request to open a file.
  – It is fault tolerant.
    • Missing data can be restored from other disks.
  – Authorization plugin
    • Resolves "trusted/untrusted" users for write access.
• Disadvantages
  – It is not integrated with SRM, so all the space management isn’t there.
  – lcg-utils and GFAL cannot talk to xrootd (yet).
Discussion