
Page 1

Scalla/XRootd Advancements
xrootd/cmsd (f.k.a. olbd)

Fabrizio Furano, CERN IT/PSS
Andrew Hanushevsky, Stanford Linear Accelerator Center
http://xrootd.slac.stanford.edu

Page 2

Outline

Current Elaborations
  • Composite Cluster Name Space
  • POSIX file system access via FUSE+xrootd
New Developments
  • Cluster Management Service (cmsd)
  • Cluster globalization
  • WAN direct data access
Conclusion

28-November-07 2: http://xrootd.slac.stanford.edu


Page 3

The Distributed Name Space

The Scalla/xrootd suite implements a distributed name space
  • Very scalable and efficient
  • Sufficient for data analysis
Some users and applications (e.g., SRM) rely on a centralized name space
  • Spurred the development of a Composite Name Space (cnsd) add-on
  • Simplest solution with the least entanglement

Page 4

Composite Cluster Name Space

[Diagram: a Client, the Redirector (Manager, xrootd:1094), the Name Space server (xrootd:2094), and the Data Servers, which run cnsd.]

opendir() refers to the directory structure maintained by xrootd:2094
Redirector (Manager), xrootd:1094
  • xroot.redirect mkdir myhost:2094
Name Space, xrootd:2094
  • Operations shown in the diagram: open/trunc, mkdir, mv, rm, rmdir
Data Servers, running cnsd
  • ofs.notify closew, create, mkdir, mv, rm, rmdir |/opt/xrootd/etc/cnsd
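The notification flow in this diagram can be pictured with a small Python sketch. It is only a toy model: the class `CompositeNameSpace`, its methods, and the example paths are hypothetical names invented for illustration and are not part of xrootd or cnsd. The event names mirror the `ofs.notify` list on the slide.

```python
import posixpath


class CompositeNameSpace:
    """Toy in-memory stand-in for the name space kept by the common xrootd.

    Hypothetical illustration only; the real cnsd replicates the name space
    in a file system rather than in Python sets.
    """

    def __init__(self):
        self.dirs = {"/"}
        self.files = set()

    def apply(self, event, path, new_path=None):
        """Apply one notification forwarded by a data server."""
        if event == "mkdir":
            self.dirs.add(path)
        elif event in ("create", "closew"):
            # Make sure the parent directories exist, then record the file.
            parent = posixpath.dirname(path)
            while parent not in self.dirs:
                self.dirs.add(parent)
                parent = posixpath.dirname(parent)
            self.files.add(path)
        elif event == "rm":
            self.files.discard(path)
        elif event == "rmdir":
            self.dirs.discard(path)
        elif event == "mv":
            # Only files are renamed in this toy model.
            if path in self.files:
                self.files.discard(path)
                self.files.add(new_path)
        else:
            raise ValueError(f"unknown event: {event}")

    def opendir(self, path):
        """List the entries directly under path, whichever server holds them."""
        entries = self.dirs | self.files
        return sorted(
            posixpath.basename(e)
            for e in entries
            if e != path and posixpath.dirname(e) == path
        )


ns = CompositeNameSpace()
ns.apply("mkdir", "/atlas")
ns.apply("create", "/atlas/run1.root")   # reported by one data server
ns.apply("create", "/atlas/run2.root")   # reported by another data server
ns.apply("mv", "/atlas/run2.root", "/atlas/run3.root")
listing = ns.opendir("/atlas")
```

After these events `listing` holds both files, although each one was reported by a different data server. That single merged view is what the composite name space gives to applications that expect a centralized directory structure.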

Page 5

cnsd Specifics

Servers direct name space actions to common xrootd(s)
  • Common xrootd maintains composite name space
  • Typically, these run on the redirector nodes
Name space replicated in the file system
  • No external database needed
  • Small disk footprint
Deployed at SLAC for Atlas
  • Needs synchronization utilities, more documentation, and packaging
  • See Wei Yang for details
Similar MySQL-based system being developed by CERN/Atlas
  • Annabelle Leung <[email protected]>

Page 6

Data System vs File System

Scalla is a data access system
  • Some users/applications want file system semantics
  • More transparent but many times less scalable
For years users have asked ...
  • Can Scalla create a file system experience?
The answer is ...
  • It can, to a degree that may be good enough
  • We relied on FUSE to show how

Page 7

What is FUSE

Filesystem in Userspace
  • Used to implement a file system in a user space program
  • Linux 2.4 and 2.6 only
  • Refer to http://fuse.sourceforge.net/
Can use FUSE to provide xrootd access
  • Looks like a mounted file system
SLAC and FZK have xrootd-based versions of this
  • Wei Yang at SLAC
      • Tested and practically fully functional
  • Andreas Petzold at FZK
      • In alpha test, not fully functional yet

Page 8

XrootdFS (Linux/FUSE/Xrootd)

[Diagram: on the Client Host, an application in user space calls the POSIX file system interface; FUSE in the kernel hands the calls to the FUSE/Xroot interface, which uses the xrootd POSIX client. On the Redirector Host sit the Redirector (xrootd:1094) and the Name Space server (xrootd:2094). Name space operations shown: opendir, create, mkdir, mv, rm, rmdir.]

Should run cnsd on servers to capture non-FUSE events

Page 9

XrootdFS Performance

[Diagram: two test hosts, one of them labeled Client.]

Test hosts
  • Sun V20z: RHEL4, 2x 2.2GHz AMD Opteron, 4GB RAM, 1Gbit/sec Ethernet
  • VA Linux 1220: RHEL3, 2x 866MHz Pentium 3, 1GB RAM, 100Mbit/sec Ethernet
Results
  • Unix dd, globus-url-copy & uberftp: 5-7MB/sec with 128KB I/O block size
  • Unix cp: 0.9MB/sec with 4KB I/O block size
Conclusion: Better for some things than others.
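The gap between the 128KB and 4KB results suggests that the I/O block size matters a great deal when every request crosses the kernel-FUSE boundary. A minimal Python sketch for trying this on any mounted path follows. The helper `copy_with_block_size` is a hypothetical name, the temporary files are placeholders, and a run on a local disk should not be expected to reproduce the slide's figures.

```python
import os
import tempfile
import time


def copy_with_block_size(src, dst, block_size):
    """Copy src to dst using fixed-size reads; return (bytes copied, MB/s)."""
    copied = 0
    start = time.perf_counter()
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            block = fin.read(block_size)
            if not block:
                break
            fout.write(block)
            copied += len(block)
    elapsed = time.perf_counter() - start
    rate = copied / (1024 * 1024) / elapsed if elapsed > 0 else float("inf")
    return copied, rate


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "src.bin")
        with open(src, "wb") as f:
            f.write(os.urandom(4 * 1024 * 1024))  # 4 MB of test data
        # Compare the two block sizes quoted on the slide.
        for size in (4 * 1024, 128 * 1024):
            dst = os.path.join(tmp, "dst.bin")
            n, rate = copy_with_block_size(src, dst, size)
            print(f"{size // 1024:>4} KB blocks: {n} bytes at {rate:.1f} MB/s")
```

To measure an XrootdFS mount, `src` or `dst` would point into the mounted directory instead of a temporary one.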

Page 10

Why XrootdFS?

Makes some things much simpler
  • Most SRM implementations run transparently
  • Avoid pre-load library worries
But impacts other things
  • Performance is limited
      • Kernel-FUSE interactions are not cheap
      • Rapid file creation (e.g., tar) is limited
  • FUSE must be administratively installed to be used
      • Difficult if it involves many machines (e.g., batch workers)
      • Easier if it involves an SE node (i.e., SRM gateway)

Page 11

Next Generation Clustering

Cluster Management Service (cmsd)
  • Functionally replaces olbd
  • Compatible with olbd config file
      • Unless you are using deprecated directives
Straightforward migration
  • Either run olbd or cmsd everywhere
Currently in alpha test phase
  • Available in CVS head
  • Documentation on web site

Page 12

cmsd Advantages

Much lower latency
New very extensible protocol
Better fault detection and recovery
Added functionality
  • Global clusters
  • Authentication
  • Server selection can include space utilization metric
  • Uniform handling of opaque information
  • Cross protocol messages to better scale xproof clusters
Better implementation for reduced maintenance cost
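As one way to picture "server selection can include space utilization metric", here is a small Python sketch. It is not the cmsd selection algorithm: the `ServerState` fields, the weighted-sum score, and the default weights are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class ServerState:
    name: str
    load: float        # 0.0 (idle) .. 1.0 (saturated); hypothetical metric
    space_used: float  # 0.0 (empty) .. 1.0 (full); hypothetical metric


def select_server(servers, load_weight=0.5, space_weight=0.5):
    """Pick the server with the lowest weighted load/space score."""
    if not servers:
        raise ValueError("no servers available")
    return min(
        servers,
        key=lambda s: load_weight * s.load + space_weight * s.space_used,
    )


servers = [
    ServerState("data01", load=0.2, space_used=0.9),
    ServerState("data02", load=0.4, space_used=0.3),
]
chosen = select_server(servers)
```

With equal weights the nearly full `data01` loses to `data02` even though it is less loaded. Setting `space_weight=0` falls back to a purely load-based choice.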

Page 13

Cluster Globalization

[Diagram: a meta manager at BNL (xrootd + cmsd), reachable as root://atlas.bnl.gov/, includes the SLAC, UOM, and UTA xroot clusters. Each cluster has its own manager and data servers, each running xrootd + cmsd.]

BNL meta manager
  • all.role meta manager
  • all.manager meta atlas.bnl.gov:1312
SLAC, UOM, and UTA managers (each)
  • all.role manager
  • all.manager meta atlas.bnl.gov:1312
Meta Managers can be geographically replicated!
Note: the security hats will likely require you use xrootd native proxy support
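The subscription pattern on this slide (each site manager points at the meta manager, and clients enter through the meta manager) can be sketched in Python as follows. This is a toy model under stated assumptions: the `Manager` and `MetaManager` classes, their methods, and the file paths are hypothetical and do not correspond to xrootd or cmsd interfaces.

```python
class Manager:
    """A cluster manager that knows which files its data servers hold."""

    def __init__(self, name, files):
        self.name = name
        self.files = set(files)

    def has(self, path):
        return path in self.files


class MetaManager:
    """Hypothetical meta manager: managers subscribe, clients are redirected."""

    def __init__(self):
        self.managers = []

    def subscribe(self, manager):
        # Plays the role of "all.manager meta <host>:<port>" on the site side.
        self.managers.append(manager)

    def locate(self, path):
        """Return the names of the subscribed clusters that hold path."""
        return [m.name for m in self.managers if m.has(path)]

    def redirect(self, path):
        """Return one cluster to send the client to, or None if none has it."""
        holders = self.locate(path)
        return holders[0] if holders else None


meta = MetaManager()
meta.subscribe(Manager("SLAC", ["/atlas/a.root"]))
meta.subscribe(Manager("UOM", ["/atlas/b.root"]))
meta.subscribe(Manager("UTA", ["/atlas/a.root", "/atlas/c.root"]))
target = meta.redirect("/atlas/c.root")
```

The client only needs the meta manager's address; which site ends up serving the file is decided at lookup time.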

Page 14

Why Globalize?

Uniform view of participating clusters
  • Can easily deploy a virtual MSS
      • Included as part of the existing MPS framework
  • Try out real time WAN access
      • You really don't need data everywhere!
Alice is slowly moving in this direction
  • The non-uniform name space is an obstacle
      • Slowly changing the old approach
      • Some workarounds could be possible, though

Page 15

Virtual MSS

Powerful mechanism to increase reliability
  • Data replication load is widely distributed
  • Multiple sites are available for recovery
Allows virtually unattended operation
  • Based on BaBar experience with real MSS
  • Idea: a cluster treats the meta-cluster it subscribes to as an MSS
  • Automatic restore due to server failure
      • Missing files in one cluster fetched from another
      • Typically the fastest one which has the file really online
  • Local cluster file (pre)fetching on demand
      • Can be transformed into a 3rd-party copy
      • When cmsd is deployed
  • Practically no need to track file location
      • But still need for metadata repositories
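The restore rule above (fetch a missing file from another cluster, typically the fastest one that has it online) might look like this in a Python sketch. The function `restore_from_peers`, the tuple layout, and the transfer-rate estimates are hypothetical; the actual copy is replaced by a set update.

```python
def restore_from_peers(path, local_files, peers):
    """Fetch a missing file from the fastest peer that has it online.

    peers is a list of (name, online_files, estimated_mb_per_s) tuples.
    Returns the name of the peer used, or None if the file was already local.
    Raises LookupError if no peer has the file online.
    """
    if path in local_files:
        return None
    candidates = [(rate, name) for name, files, rate in peers if path in files]
    if not candidates:
        raise LookupError(f"{path} is not online at any collaborating cluster")
    rate, name = max(candidates)  # highest estimated rate wins
    local_files.add(path)         # stands in for the actual transfer
    return name


local = {"/atlas/a.root"}
peers = [
    ("GSI", {"/atlas/b.root"}, 20.0),
    ("SLAC", {"/atlas/b.root", "/atlas/c.root"}, 35.0),
]
source = restore_from_peers("/atlas/b.root", local, peers)
```

Here both peers hold the file, so the one with the higher estimated rate is chosen, and afterwards the file counts as local.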

Page 16

Virtual MSS – a way to do it

[Diagram: a meta manager at CERN (xrootd + cmsd), reachable as root://metaxrd.cern.ch/, includes the SLAC and GSI xroot clusters, each with its own xrootd + cmsd.]

CERN meta manager
  • all.role meta manager
  • all.manager meta metaxrd.cern.ch:1312
GSI and SLAC managers, as labeled on the slide
  • all.manager meta atlas.bnl.gov:1312
  • all.manager meta metaxrd.cern.ch:1312
Meta Managers can be geographically replicated!
A local client still continues to work
Missing a file? Ask the global meta manager
  • Get it from any other collaborating cluster

Page 17

Dumb WAN Access*

Setup: client at CERN, data at SLAC
  • 164ms RTT, available bandwidth < 100Mb/s
Test 1: Read a large ROOT Tree (~300MB, 200k interactions)
  • Expected time: 38000s (latency) + 750s (data) + CPU ➙ 10 hrs!
Test 2: Draw a histogram from that tree data (6k interactions)
  • Measured time ~15-20min
Using xrootd with WAN optimizations disabled

*Federico Carminati, The ALICE Computing Status and Readiness, LHCC, November 2007
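The figures on this slide can be checked with a few lines of Python. The only assumption is the simplest possible model, in which every interaction costs one full round trip; the function name `latency_seconds` is made up for this sketch.

```python
def latency_seconds(interactions, rtt_seconds):
    """Time spent waiting if every interaction costs one full round trip."""
    return interactions * rtt_seconds


RTT = 0.164  # seconds, from the setup on the slide

# Test 1 as quoted on the slide: 38000 s latency + 750 s data, CPU excluded.
test1_hours = (38000 + 750) / 3600
print(f"Test 1 quoted estimate: {test1_hours:.1f} hours")

# Plain interactions x RTT products for both tests.
test1_latency = latency_seconds(200_000, RTT)
test2_latency = latency_seconds(6_000, RTT)
print(f"Test 1 round trips alone: {test1_latency:.0f} s")
print(f"Test 2 round trips alone: {test2_latency / 60:.1f} min")
```

The quoted Test 1 terms add up to about 10.8 hours, matching the "10 hrs" on the slide. For Test 2, 6k round trips at 164 ms come to about 16.4 minutes, which sits inside the measured 15-20 minute range. For Test 1 the plain product is 32,800 s, somewhat below the 38,000 s quoted; the slide does not say what accounts for the difference.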

Page 18

Smart WAN Access*

Exploit xrootd WAN Optimizations
  • TCP multi-streaming: for up to 15x improvement in WAN data throughput
  • The ROOT TTreeCache provides the hints on "future" data accesses
  • TXNetFile/XrdClient "slides through", keeping the network pipeline full
      • Data transfer goes in parallel with computation
      • Throughput improvement comparable to "batch" file-copy tools
      • 70-80%, we are doing a live analysis, not a file copy!
Test 1 actual time: 60-70 seconds
  • Compared to 30 seconds using a Gb LAN
  • Very favorable for sparsely used files
Test 2 actual time: 7-8 seconds
  • Comparable to LAN performance
  • 100x improvement over dumb WAN access (i.e., 15-20 minutes)

*Federico Carminati, The ALICE Computing Status and Readiness, LHCC, November 2007
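Why keeping the network pipeline full helps so much can be seen in a deliberately crude Python model. It is a sketch under stated assumptions, not a description of how TXNetFile/XrdClient works: the window size, data volume, and bandwidth below are made-up illustration values, and TCP dynamics, multi-streaming, and CPU time are all ignored.

```python
import math


def serial_time(requests, rtt, bytes_total, bandwidth):
    """Each request waits a full round trip before the next one is sent."""
    return requests * rtt + bytes_total / bandwidth


def pipelined_time(requests, rtt, bytes_total, bandwidth, window):
    """Up to `window` requests are kept in flight at once."""
    rounds = math.ceil(requests / window)
    return rounds * rtt + bytes_total / bandwidth


# Hypothetical inputs: 6k requests and 164 ms RTT come from the slides,
# the 20 MB volume, 10 MB/s bandwidth, and window of 200 are invented.
requests, rtt = 6_000, 0.164
bytes_total, bandwidth = 20e6, 10e6

dumb = serial_time(requests, rtt, bytes_total, bandwidth)
smart = pipelined_time(requests, rtt, bytes_total, bandwidth, window=200)
print(f"serial: {dumb:.0f} s, pipelined: {smart:.1f} s")
```

In this model the latency term shrinks by the window size, while the data term is untouched. That is the general shape of the improvement the slide reports, although the model makes no attempt to reproduce the measured times.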

Page 19

Conclusion

Scalla is a robust framework
  • Elaborative
      • Composite Name Space
      • XrootdFS
  • Extensible
      • Cluster globalization
Many opportunities to enhance data analysis
  • Speed and efficiency

Page 20

Acknowledgements

Software Collaborators
  • INFN/Padova: Alvise Dorigo
  • Root: Fons Rademakers, Gerri Ganis (security), Bertrand Bellenot (windows)
  • Alice: Derek Feichtinger, Guenter Kickinger
  • CERN: Fabrizio Furano (client), Andreas Peters (Castor)
  • STAR/BNL: Pavel Jakl
  • Cornell: Gregory Sharp
  • SLAC: Jacek Becla, Tofigh Azemoon, Wilko Kroeger, Bill Weeks
  • BaBar: Peter Elmer (packaging)
Operational collaborators
  • BNL, CNAF, FZK, INFN, IN2P3, RAL, SLAC
Funding
  • US Department of Energy
      • Contract DE-AC02-76SF00515 with Stanford University
  • Formerly INFN (BaBar)