Derek Ross
E-Science Department
DCache Deployment at Tier1A
UK HEP Sysman April 2005
DCache at RAL 1
• Mid 2003
– We deployed a non-grid version for CMS.
– It was never used in production.
• End of 2003 / start of 2004
– RAL offered to package a production-quality dCache.
– Stalled due to bugs, which went back to the dCache developers and the LCG developers.
DCache at RAL 2
• Mid 2004
– Small deployment for EGEE JRA1
• Intended for gLite I/O testing.
• End of 2004
– CMS instance
• 3 disk servers, ~10TB disk space.
• Disk served via NFS to the pool nodes.
• Each pool node running a gridftp door.
• Registered in the LCG information system.
CMS Instance
DCache at RAL 4
• Start of 2005
– New production instance supporting the CMS, DTeam, LHCb and Atlas VOs.
• 22TB disk space.
• CMS instance decommissioned and reused.
• Separate gdbm file for each VO.
• Uses directory-pool affinity to map areas of the file system to each VO's assigned disk.
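The directory-pool affinity above amounts to a longest-prefix lookup from namespace path to allowed pools. The sketch below is only an illustration: the paths and pool names are hypothetical, and in a real dCache this mapping is expressed as PoolManager rules, not Python.

```python
# Hypothetical directory -> pool-group table; real dCache configures this
# via PoolManager rules rather than code like this.
AFFINITY = {
    "/pnfs/gridpp.rl.ac.uk/data/cms":   ["cms_pool1", "cms_pool2"],
    "/pnfs/gridpp.rl.ac.uk/data/atlas": ["atlas_pool1"],
    "/pnfs/gridpp.rl.ac.uk/data/lhcb":  ["lhcb_pool1"],
    "/pnfs/gridpp.rl.ac.uk/data/dteam": ["dteam_pool1"],
}

def pools_for(path):
    """Return the pools allowed to hold a file, by longest matching prefix."""
    matches = [p for p in AFFINITY if path.startswith(p + "/")]
    if not matches:
        raise LookupError("no pool group for " + path)
    return AFFINITY[max(matches, key=len)]

print(pools_for("/pnfs/gridpp.rl.ac.uk/data/atlas/raw/file1"))
```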
DCache at RAL 5
• Early 2005
– Service Challenge 2
• 4 disk servers, ~12TB disk space.
• UKLight connection to CERN.
• Pools directly on the disk servers.
• Standalone gridftp and SRM doors.
• SRM not used in the challenge due to software problems at CERN.
• Interfaced to the Atlas Data Store.
SC2 instance
[Diagram: SC2 instance. Eight dCache pools on four 3TB disk servers, diskless GridFTP doors, and a dCache head node with the SRM door and database, connected through a Nortel 5510 stack (80Gbps) and a Summit 7i, with uplinks to UKLight (2×1Gbps) and SuperJANET4 (2×1Gbps).]
SC2 results
• Achieved 75MB/s to disk, 50MB/s to tape.
– Have seen faster: 3000Mb/s to disk over the LAN.
– Network delivered at the last minute, under-provisioned.
• Odd iperf results: high UDP packet loss.
Future Developments
• Interface the ADS to the production dCache
– Considering a second SRM door.
– Implement a script to propagate deletes from dCache to the ADS.
• Service Challenge 3
– Still planning.
– Use the production dCache.
• Experiments may want to retain data.
– Avoid multi-homing if possible.
• Connect UKLight into the site network.
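The planned delete-propagation script could work along these lines. This is a hedged sketch only: the file listings are stand-ins, since in practice one side would come from the pnfs namespace and the other from the ADS catalogue, and neither interface is shown here.

```python
# Hedged sketch of the planned dCache -> ADS delete propagation.
# Both listings are stand-ins for the real pnfs and ADS interfaces.
def deletes_to_propagate(dcache_files, ads_files):
    """Tape copies whose dCache namespace entry has gone."""
    return sorted(set(ads_files) - set(dcache_files))

dcache = {"/data/cms/run1.dat", "/data/cms/run2.dat"}
ads = {"/data/cms/run1.dat", "/data/cms/run2.dat", "/data/cms/old.dat"}
print(deletes_to_propagate(dcache, ads))  # -> ['/data/cms/old.dat']
```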
Production Setup
[Diagram: proposed production setup; testing with DTeam only for now.]
VO Support
• Bit of a hack: dCache has no concept of VOs.
– The gridmap file is periodically run through a perl script to produce a mapping of DN to Unix UID/GID.
• Each VO member is mapped to the first pool account of the VO; all of the VO's files are owned by that account.
– VOMS support coming…
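The gridmap post-processing step can be sketched as below. The account names, UIDs and GIDs are made up for illustration, and the production version was a perl script, not Python.

```python
# Sketch of mapping every VO member to the VO's first pool account.
# Account names and UID/GID values are hypothetical.
import re

POOL_ACCOUNTS = {"cms": ("cms001", 40001, 4000),
                 "atlas": ("atlas001", 41001, 4100)}

def parse_gridmap(text):
    """Yield (DN, vo) pairs from grid-mapfile lines like '"<DN>" .vo'."""
    for line in text.splitlines():
        m = re.match(r'"(?P<dn>[^"]+)"\s+\.?(?P<vo>\S+)', line)
        if m:
            yield m.group("dn"), m.group("vo")

def dn_to_uid(text):
    table = {}
    for dn, vo in parse_gridmap(text):
        if vo in POOL_ACCOUNTS:
            table[dn] = POOL_ACCOUNTS[vo]  # all the VO's files owned here
    return table

gridmap = ('"/C=UK/O=eScience/OU=RAL/CN=A User" .cms\n'
           '"/C=UK/O=eScience/OU=RAL/CN=B User" .atlas\n')
print(dn_to_uid(gridmap))
```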
Postgres
• The Postgres SRM database is a CPU hog.
– Being worked on.
– Current recommendation is a separate host for PostgreSQL.
• Can use the database to store dCache transfer information for monitoring.
• In future it may be possible to use it for the pnfs databases.
SRM requests
• Each SRM request lasts for (by default) 24 hours if it is not finished properly.
– With too many open, the SRM door queues new requests until a slot is available.
– Educate users to run lcg-sd after an lcg-gt, and not to Ctrl-C lcg-rep…
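A toy model of the behaviour above: unreleased requests pin a slot for their whole lifetime, so new requests queue, while an explicit release (what lcg-sd does after lcg-gt) frees the slot at once. The slot count here is an assumption made up for the sketch; only the 24-hour lifetime comes from the slide.

```python
# Toy model of SRM door request slots (illustrative only; SLOTS is an
# assumed limit, not a dCache default).
from collections import deque

SLOTS = 3
LIFETIME = 24 * 3600  # seconds an unreleased request pins its slot

class Door:
    def __init__(self):
        self.active = {}      # request id -> expiry time
        self.queue = deque()  # requests waiting for a free slot
        self.next_id = 0

    def submit(self, now):
        rid = self.next_id
        self.next_id += 1
        self._expire(now)
        if len(self.active) < SLOTS:
            self.active[rid] = now + LIFETIME
        else:
            self.queue.append(rid)  # door queues new requests
        return rid

    def release(self, rid, now):
        """What lcg-sd does after lcg-gt: frees the slot immediately."""
        self.active.pop(rid, None)
        self._expire(now)
        while self.queue and len(self.active) < SLOTS:
            self.active[self.queue.popleft()] = now + LIFETIME

    def _expire(self, now):
        for rid, expiry in list(self.active.items()):
            if expiry <= now:
                del self.active[rid]

door = Door()
ids = [door.submit(now=0) for _ in range(4)]
print(len(door.queue))      # 1: the fourth request waits for a slot
door.release(ids[0], now=60)
print(len(door.queue))      # 0: a prompt release drains the queue
```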
SRM-SRM copies
• Pull mode
– If dCache is the destination, the destination pool initiates the gridftp transfer from the source SRM.
• The dcache-opt RPM must be installed on the pools (a running gridftp door is not needed).
• The pool node needs a host certificate, and its GLOBUS_TCP_PORT_RANGE must be accessible to incoming connections.
– lcg-utils don't do this, but srmcp does.
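A pull-mode copy driven from srmcp might be wrapped as below. This is a hedged sketch: the SURLs and port range are made-up examples, and the command is only assembled as a dry run; dropping the flag would call the real srmcp client, which must be installed with a valid grid proxy.

```python
# Hedged sketch of driving a pull-mode srm-srm copy with the srmcp client.
# SURLs and port range are hypothetical example values.
import os
import subprocess

def srm_copy(src, dst, port_range="50000,52000", dry_run=True):
    # Port range the destination pool's mover uses; must be open to incoming.
    env = dict(os.environ, GLOBUS_TCP_PORT_RANGE=port_range)
    cmd = ["srmcp", src, dst]
    if dry_run:
        return " ".join(cmd)   # just show what would run
    subprocess.run(cmd, env=env, check=True)

print(srm_copy(
    "srm://source.example.org:8443/pnfs/example.org/data/cms/file1",
    "srm://dcache.gridpp.rl.ac.uk:8443/pnfs/gridpp.rl.ac.uk/data/cms/file1"))
```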
Quotas
• If two vo’s can access same pool, no way to stop one vo grabbing all of pool.
• No global quotas
– Hard to do, pools can come and go
• Only way to restrict disk usage is limit pools a vo can write to.
– But can’t get space available per vo.
Links
• http://ganglia.gridpp.rl.ac.uk/?c=DCache
• http://ganglia.gridpp.rl.ac.uk/?c=SC