Scientific Computing Resources
Ian Bird – Computer Center
Hall A Analysis Workshop
December 11, 2001
Overview
• Current Resources
  – Recent evolution
  – Mass storage – HW & SW
  – Farm
  – Remote data access
  – Staffing levels
• Future Plans
  – Expansion/upgrades of current resources
  – Other computing – LQCD
  – Grid Computing
    • What is it? Should you care?
[Figure: Jefferson Lab Scientific Computing Environment, November 2001]
• Gigabit Ethernet switching fabric on the JLAB network backbone
• Batch Farm Cluster: 350 Linux nodes (400 MHz – 1 GHz), 10,000 SPECint95, managed by LSF + Java layer + web interface
• Interactive Analysis: 2 Sun 450 (4-processor), 2 4-processor Intel/Linux
• Lattice QCD Cluster: 40 Alpha/Linux (667 MHz), 256 Pentium 4 (Q2 FY02?), managed by PBS + web portal
• Unix, Linux, Windows desktops
• bbftp service
• Grid gateway
• 16 TB cache disk: SCSI + EIDE disk, RAID 0 on Linux servers
• JASMine-managed mass storage systems: 2 STK silos; 10 9940, 10 9840 and 8 Redwood drives; 10 Solaris/Linux data movers with 300 GB stage
• 10 TB work areas: SCSI disk, RAID 5
• CUE general services
• Internet (ESnet: OC-3)
[Figure: JLAB Farm and Mass Storage Systems, November 2001]
• 2 TB farm cache: SCSI, RAID 0 on Linux servers
• Batch farm: 350 processors in 175 dual nodes, each connected at 100 Mb to a 24-port switch with Gb uplink (8 switches)
• Foundry BigIron 8000 switch: 256 Gb backplane, ~45 of 60 Gb ports in use
• Site router: CUE and general services
• 2 STK silos: 10 9940, 10 9840, 8 Redwood drives
• 10 Solaris/Linux data movers, each with 300 GB stage & Gb uplink
• CH-Router: incoming data from Halls A & C
• Fiber Channel direct from CLAS
• Cache disk farm: 20 Linux servers, each with Gb uplink; total 16 TB SCSI/IDE, RAID 0
• Work disk farm: 4 Linux servers, each with Gb uplink; total 4 TB SCSI, RAID 5
• Work disks: 4 MetaStor systems, each with 100 Mb uplink; total 5 TB SCSI, RAID 5
CPU Resources
• Farm
  – Upgraded this summer with 60 dual 1 GHz P III (4 CPU / 1U rackmount)
  – Retired original 10 dual 300 MHz
  – Now 350 CPU (400, 450, 500, 750, 1000 MHz)
    • ~11,000 SPECint95
  – Deliver > 500,000 SI95-hrs/week
    • Equivalent to 75 1 GHz CPUs
• Interactive
  – Solaris: 2 E450 (4-proc)
  – Linux: 2 quad systems (4x450, 4x750 MHz)
  – If required, can use batch systems (via LSF) to add interactive CPU to these (Linux) front ends
[Photos: Intel Linux Farm]
First purchases: 9 duals per 24” rack
Last summer: 16 duals (2U) + 500 GB cache (8U) per 19” rack
Recently: 5 TB IDE cache disk (5 x 8U) per 19” rack
Tape storage
• Added 2nd silo this summer
  – Required moving a room of equipment
  – Added 10 9940 drives (5 as part of new silo)
  – Current:
    • 8 Redwood, 10 9840, 10 9940
  – Redwood: 50 GB @ 10 MB/s (helical scan, single reel)
  – 9840: 20 GB @ 10 MB/s (linear, mid-load cassette (fast))
  – 9940: 60 GB @ 10 MB/s (linear, single reel)
  – 9840 & 9940 are very reliable
  – 9840 & 9940 have upgrade paths that use the same media
    » 9940 2nd generation – 100 GB @ 20 MB/s ??
• Add 10 more 9940 this FY (budget..?)
• Replace Redwoods (reduce to 1-2)
  – Requires copying 4500 tapes – started – budget for tape?
    » Reliability, end of support(!)
Disk storage
• Added cache space
  – For frequently used silo files, to reduce tape accesses
  – Now have 22 cache servers
    • 4 dedicated to farm, ~2 TB
    • ~16 TB of cache space allocated to experiments
  – Some bought and owned by groups
    • Dual Linux systems, Gb network, ~1 TB disk, RAID 0
  – 9 SCSI systems
  – 13 IDE systems
    » Performance approximately equivalent
  – Good match of CPU : network throughput : disk space
  – This model will scale by a few factors, but probably not by a factor of 10 (and there is as yet no solution for that)
• Looking at distributed file systems (GFS, etc.) for the future, to avoid NFS complications, but no production-level system yet
  – N.B. Accessing data with jcache does not need NFS, and is fault tolerant
• Added work space
  – Added 4 systems to reduce load on fs3, fs4, fs5, fs6 (original /work)
  – Dual Linux systems, Gb network, ~1 TB disk, SCSI RAID 5
  – Performance on all systems is now good
• Problems
  – Some issues with IBM 75 GB ATA drives, 3ware IDE RAID cards, Linux kernels
• System is reasonably stable, but not yet perfect; the alternatives are not cost-effective
JASMine
• JASMine – Mass Storage System software
• Rationale – why write another MSS?
  – Had been using OSM
    • Not scalable, not supported, reached the limits of the software, had to run 2 instances to get sufficient drive capacity
    • Hidden from users by “Tapeserver”, a Java layer that
      » Hid complexities of OSM installations
      » Implemented tape disk buffers (stage)
      » Provided get, put, managed cache (read copies of archived data) capabilities
  – Migration from OSM
    • Production environment…
  – Timescales driven by experiment schedules, need to add drive capacity
  – Retain user interface
• Replace “osmcp” function – tape to disk, drive and library management
  – Choices investigated
    • Enstore, Castor, (HPSS)
  – Timescales, support, adaptability (missing functionality/philosophy – cache/stage)
  – Provide missing functions within Tapeserver environment, clean up and rework
• JASMine (JLAB Asynchronous Storage Manager)
Architecture
• JASMine
  – Written in Java
    • For data movement, as fast as C code
    • JDBC makes using and changing databases easy
  – Distributed Data Movers and Cache Managers
  – Scalable to the foreseeable needs of the experiments
  – Provides scheduling
    • Optimizing file access requests
    • User and group (and location-dependent) priorities
  – Off-site cache or ftp servers for data exporting
• JASMine Cache Software
  – Stand-alone component – can act as a local or remote client, allows remote access to JASMine
  – Can be deployed to a collaborator to manage a small disk system, and as a basis for coordinated data management between sites
  – Cache manager runs on each cache server
    • Hardware is not an issue
    • Needs only a JVM, a network, and a disk to store files
Software cont.
• MySQL database used by all servers
  – Fast and reliable
  – SQL
• Data Format
  – ANSI standard labels with extra information
  – Binary data
  – Support to read legacy OSM tapes
    • cpio, no file labels
• Protocol for file transfers
  – Writes to cache are never NFS
  – Reads from cache may be NFS
[Figure: JASMine architecture – Client, Request Manager, Scheduler, Log Manager, Library Manager and Database; several Data Movers, each with a Dispatcher, Cache Manager, Drive Manager, Volume Manager, tape drives and disk. Legend: database connection, service connection, log connection.]
JASMine Services
• Database
  – Stores metadata
    • Also presented to users on an NFS filesystem as “stubfiles”
    • But could equally be presented as, e.g., a web service, LDAP, …
    • Users do not need to access stubfiles – they just need to know filenames
  – Tracks status and locations of all requests, files, volumes, drives, etc.
• Request Manager
  – Handles user requests and queries
• Scheduler
  – Prioritizes user requests for tape access
    • priority = share / (0.01 + (num_a * ACTIVE_WEIGHT) + (num_c * COMPLETED_WEIGHT))
  – Host vs. user shares, farm priorities
• Log Manager
  – Writes out log and error files and databases
  – Sends out notices for failures
• Library Manager
  – Mounts and dismounts tapes, as well as other library-related tasks
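The scheduler formula works as a fair-share rule: a requester's share is divided by a measure of how much tape access it is already using, so lightly loaded users and hosts move up the queue. The following is a minimal Java sketch of that ordering. The weight values and the Requester type are hypothetical, because the slide does not give them.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Sketch of the fair-share rule quoted above:
//   priority = share / (0.01 + num_a * ACTIVE_WEIGHT + num_c * COMPLETED_WEIGHT)
// The weight values and the Requester type are hypothetical.
class SchedulerSketch {
    static final double ACTIVE_WEIGHT = 1.0;     // hypothetical cost of each active request
    static final double COMPLETED_WEIGHT = 0.1;  // hypothetical cost of each completed request

    // A user or host competing for tape drives (hypothetical type).
    record Requester(String name, double share, int numActive, int numCompleted) {
        double priority() {
            // The 0.01 term keeps the denominator non-zero for an idle requester.
            return share / (0.01 + numActive * ACTIVE_WEIGHT + numCompleted * COMPLETED_WEIGHT);
        }
    }

    // Returns the requesters ordered from highest to lowest priority.
    static List<Requester> order(List<Requester> requesters) {
        List<Requester> sorted = new ArrayList<>(requesters);
        sorted.sort(Comparator.comparingDouble(Requester::priority).reversed());
        return sorted;
    }

    public static void main(String[] args) {
        List<Requester> queue = List.of(
                new Requester("halla", 1.0, 4, 20),  // busy group
                new Requester("hallc", 1.0, 0, 2),   // mostly idle group
                new Requester("farm", 2.0, 8, 50));  // larger share, heavy use
        for (Requester r : order(queue)) {
            System.out.printf("%-6s %.3f%n", r.name(), r.priority());
        }
    }
}
```

With these made-up numbers, hallc sorts first, followed by halla and then farm. The farm's doubled share does not offset its eight active requests.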
JASMine Services – 2
• Data Mover
  – Dispatcher
    • Keeps track of available local resources and starts requests the local system can work on
  – Cache Manager
    • Manages a disk or disks for pre-staging data to and from tape
    • Sends and receives data to and from clients
  – Volume Manager
    • Manages tapes for availability
  – Drive Manager
    • Manages tape drives for usage
User Access
• jput – put one or more files on tape
• jget – get one or more files from tape
• jcache – copy one or more files from tape to cache
• jls – get metadata for one or more files
• jtstat – status of the request queue
• Web interface – query status and statistics for the entire system
Web interface
Data Access to cache
• NFS
  – Directory of links points the way
  – Mounted read-only by the farm
  – Users can mount read-only on their desktops
• jcache
  – Java client
  – Checks whether files are on cache disks
  – Gets/puts files from/to cache disks
  – More efficient than NFS, avoids NFS hangs if a server dies, etc., but users like NFS
Disk Cache Management
• Disk pools are divided into groups
  – Tape staging
  – Experiments
  – Pre-staging for the batch farm
• Management policy set per group
  – Cache – LRU files removed as needed
  – Stage – reference counting
  – Explicit – manual addition and deletion
  – Policies are pluggable – easy to add
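Per-group policies fit naturally into a small plug-in interface. The following minimal Java sketch assumes a hypothetical CachePolicy interface, because JASMine's real class names and method signatures are not shown here. It includes an LRU policy for "cache" groups and a reference-counting policy for "stage" groups. An "explicit" group would simply never nominate files for deletion.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class CachePolicySketch {
    // Hypothetical plug-in interface: one policy object per disk-pool group.
    interface CachePolicy {
        void added(String file, long bytes);
        void accessed(String file);
        void released(String file);
        // Removes and returns files freeing at least 'needed' bytes (fewer if not enough are eligible).
        List<String> victims(long needed);
    }

    // "Cache" groups: least recently used files are removed as needed.
    static class LruPolicy implements CachePolicy {
        // accessOrder = true: iteration runs from least to most recently accessed.
        private final LinkedHashMap<String, Long> files = new LinkedHashMap<>(16, 0.75f, true);

        public void added(String file, long bytes) { files.put(file, bytes); }
        public void accessed(String file) { files.get(file); }  // get() moves the entry to the end
        public void released(String file) { }                  // LRU ignores reference counts

        public List<String> victims(long needed) {
            List<String> out = new ArrayList<>();
            long freed = 0;
            for (Map.Entry<String, Long> e : files.entrySet()) {
                if (freed >= needed) break;
                out.add(e.getKey());
                freed += e.getValue();
            }
            for (String f : out) files.remove(f);  // remove after iterating
            return out;
        }
    }

    // "Stage" groups: a file is deletable only when no request still references it.
    static class RefCountPolicy implements CachePolicy {
        private final Map<String, Integer> refs = new LinkedHashMap<>();
        private final Map<String, Long> sizes = new HashMap<>();

        public void added(String file, long bytes) { refs.put(file, 1); sizes.put(file, bytes); }
        public void accessed(String file) { refs.computeIfPresent(file, (f, n) -> n + 1); }
        public void released(String file) { refs.computeIfPresent(file, (f, n) -> Math.max(0, n - 1)); }

        public List<String> victims(long needed) {
            List<String> out = new ArrayList<>();
            long freed = 0;
            for (Map.Entry<String, Integer> e : refs.entrySet()) {
                if (freed >= needed) break;
                if (e.getValue() == 0) {
                    out.add(e.getKey());
                    freed += sizes.get(e.getKey());
                }
            }
            for (String f : out) { refs.remove(f); sizes.remove(f); }
            return out;
        }
    }

    public static void main(String[] args) {
        CachePolicy lru = new LruPolicy();
        lru.added("/a", 100);
        lru.added("/b", 100);
        lru.added("/c", 100);
        lru.accessed("/a");                    // /b is now the least recently used file
        System.out.println(lru.victims(150));  // [/b, /c]
    }
}
```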
Protocol for file moving
• Simple extensible protocol for file copies
  – Messages are Java serialized objects passed over streams
  – Bulk data transfer uses raw data transfer over TCP
• Protocol is synchronous – all calls block
  – Asynchrony & multiple requests by threading
• CRC32 checksums at every transfer
• More fair than NFS
• Session may make many connections
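Checksumming every transfer needs nothing beyond the JDK, because java.util.zip.CRC32 can be updated as bytes stream past. The following minimal sketch shows the receiving side. It assumes the sender announces its own CRC32 in a protocol message; the message format itself is not described here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

class Crc32CopySketch {
    // Copies 'in' to 'out' and returns the CRC32 of every byte copied.
    static long copyWithCrc(InputStream in, OutputStream out) {
        CRC32 crc = new CRC32();
        byte[] buf = new byte[64 * 1024];
        try {
            int n;
            while ((n = in.read(buf)) != -1) {
                crc.update(buf, 0, n);  // checksum exactly the bytes that were read
                out.write(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return crc.getValue();
    }

    public static void main(String[] args) {
        byte[] payload = "raw event data".getBytes(StandardCharsets.US_ASCII);
        CRC32 sender = new CRC32();
        sender.update(payload);  // value the sender would announce in its message
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        long received = copyWithCrc(new ByteArrayInputStream(payload), sink);
        // The receiver compares its own CRC with the announced one.
        System.out.println(received == sender.getValue() ? "checksum OK" : "checksum MISMATCH");
    }
}
```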
Protocol for file moving (cont.)
• Cache server extends the basic protocol
  – Add database hooks for cache
  – Add hooks for cache policies
  – Additional message types were added
• High throughput disk pool
  – Database shared by many servers
  – Any server in the pool can look up file location
    • But data transfer is always direct between client and node holding the file
  – Adding servers and disk to the pool increases throughput with no overhead
    • Provides fault tolerance
Example: Get from cache
• cacheClient.getFile("/foo", "halla");
  – send locate request to any server
  – receive locate reply
  – contact appropriate server
  – initiate direct transfer
  – returns true on success
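[Figure: a client (farm node) asks a cache server “Where is /foo?”; using the shared database the reply is “cache3 has /foo”; the client sends “Get /foo” to cache3, which sends /foo directly. Pool: cache1 to cache4.]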
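The five steps can be simulated in-process to show why any server can answer the locate request while the bytes come only from the holder. This is a minimal sketch under stated assumptions. The CacheServer record, the LOCATION_DB map (which stands in for the shared MySQL database), and the method shapes are hypothetical and are not JASMine's actual cacheClient API. The real system performs these steps over the network protocol.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

class CacheGetSketch {
    // Hypothetical stand-in for the shared database: "group:path" -> name of the server holding it.
    static final Map<String, String> LOCATION_DB = new ConcurrentHashMap<>();

    // Hypothetical cache server: answers locate requests from the shared DB and serves its own files.
    record CacheServer(String name, Map<String, byte[]> localFiles) {
        Optional<String> locate(String path, String group) {
            return Optional.ofNullable(LOCATION_DB.get(group + ":" + path));
        }
        byte[] send(String path) { return localFiles.get(path); }
    }

    static boolean getFile(List<CacheServer> pool, String path, String group, Map<String, byte[]> dest) {
        CacheServer any = pool.get(0);                      // 1. any server can answer; take the first
        Optional<String> holder = any.locate(path, group);  // 2. locate reply
        if (holder.isEmpty()) return false;                 // not on cache disk
        for (CacheServer s : pool) {
            if (s.name().equals(holder.get())) {            // 3. contact the holding server
                byte[] data = s.send(path);                 // 4. direct transfer, not relayed via 'any'
                if (data == null) return false;
                dest.put(path, data);
                return true;                                // 5. true on success
            }
        }
        return false;
    }

    public static void main(String[] args) {
        CacheServer cache1 = new CacheServer("cache1", Map.of());
        CacheServer cache3 = new CacheServer("cache3", Map.of("/foo", new byte[] {1, 2, 3}));
        LOCATION_DB.put("halla:/foo", "cache3");
        Map<String, byte[]> local = new ConcurrentHashMap<>();
        System.out.println(getFile(List.of(cache1, cache3), "/foo", "halla", local));  // true
    }
}
```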
Example: simple put to cache
• putFile("/quux", "halla", 123456789);
[Figure: a client (data mover) asks “Where can I put /quux?”; using the shared database the reply is “cache4 has room”. Pool: cache1 to cache4.]
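Answering "Where can I put /quux?" comes down to a placement query over free space. The following minimal sketch assumes that the third putFile argument is the file size in bytes, and that the server with the most free space in the group is chosen. The slide only says "cache4 has room", so the selection rule and the ServerSpace record are hypothetical.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

class CachePutSketch {
    // Hypothetical view of one cache server's free space, as a shared database might record it.
    record ServerSpace(String name, String group, long freeBytes) { }

    // Among servers assigned to the group that can hold the whole file, pick the one with most free space.
    static Optional<String> placeFile(List<ServerSpace> servers, String group, long sizeBytes) {
        return servers.stream()
                .filter(s -> s.group().equals(group))
                .filter(s -> s.freeBytes() >= sizeBytes)
                .max(Comparator.comparingLong(ServerSpace::freeBytes))
                .map(ServerSpace::name);
    }

    public static void main(String[] args) {
        List<ServerSpace> pool = List.of(
                new ServerSpace("cache1", "halla", 50_000_000L),
                new ServerSpace("cache2", "hallc", 900_000_000L),
                new ServerSpace("cache3", "halla", 100_000_000L),
                new ServerSpace("cache4", "halla", 400_000_000L));
        // Same arguments as putFile("/quux", "halla", 123456789), reading the last one as a size.
        System.out.println(placeFile(pool, "halla", 123_456_789L).orElse("no room"));  // cache4
    }
}
```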
Fault Tolerance
• Dead machines do not stop the system
  – Data Movers work independently
    • Unfinished jobs will restart on another mover
  – Dead Cache Servers will only impact NFS clients
    • System recognizes a dead server and will re-cache the file from tape
    • If users did not use NFS, they would never see a failure – just extended access time
• Exception handling for
  – Receive timeouts
  – Refused connections
  – Broken connections
  – Complete garbage on connections
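In its simplest form, "restart on another mover" is a failover loop over independent workers. The following minimal Java sketch uses a hypothetical Mover interface. How JASMine actually reassigns unfinished work is not detailed here, so treat this as an illustration of the idea only. The JDK exceptions for timeouts and refused connections (java.net.SocketTimeoutException, java.net.ConnectException) are subclasses of IOException, which is what the loop catches.

```java
import java.io.IOException;
import java.util.List;

class FailoverSketch {
    // Hypothetical data mover: completes a request or throws IOException if it fails mid-job.
    @FunctionalInterface
    interface Mover {
        String run(String request) throws IOException;
    }

    // Tries each mover in turn; a failed attempt is restarted on the next mover.
    static String runWithFailover(List<Mover> movers, String request) {
        IOException last = null;
        for (Mover m : movers) {
            try {
                return m.run(request);
            } catch (IOException e) {  // timeout, refused or broken connection
                last = e;
            }
        }
        throw new IllegalStateException("all movers failed for " + request, last);
    }

    // Builds a fake mover that either succeeds or fails with a refused connection.
    static Mover mover(String name, boolean alive) {
        return request -> {
            if (!alive) throw new IOException(name + ": connection refused");
            return name + " completed " + request;
        };
    }

    public static void main(String[] args) {
        List<Mover> movers = List.of(mover("mover1", false), mover("mover2", true));
        System.out.println(runWithFailover(movers, "request-42"));  // mover2 completed request-42
    }
}
```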
Authorization and Authentication
• Shared secret for each file transfer session
  – Session authorization by policy objects
  – Example: receive 5 files from user@bar
• Plug-in authenticators
  – Establish shared secret between client and server
  – No clear text passwords
  – Extend to be compatible with GSI
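A per-session shared secret can be used without sending clear-text passwords by means of an HMAC challenge-response. This can be paired with a policy object that bounds what the session may do. The following minimal sketch uses only JDK classes (javax.crypto.Mac, MessageDigest.isEqual). The handshake shape and the ReceiveFilesPolicy class are assumptions and are not JASMine's actual authenticator plug-in API.

```java
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.security.SecureRandom;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

class SessionAuthSketch {
    // Client proves it knows the session secret by MACing a server-chosen nonce.
    static byte[] respond(byte[] secret, byte[] nonce) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(secret, "HmacSHA256"));
            return mac.doFinal(nonce);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Server recomputes and compares in constant time; the secret itself never crosses the wire.
    static boolean verify(byte[] secret, byte[] nonce, byte[] response) {
        return MessageDigest.isEqual(respond(secret, nonce), response);
    }

    // Hypothetical policy object for "receive 5 files from user@bar".
    static final class ReceiveFilesPolicy {
        private final String principal;
        private int remaining;

        ReceiveFilesPolicy(String principal, int maxFiles) {
            this.principal = principal;
            this.remaining = maxFiles;
        }

        // Returns true and consumes one slot if this principal may send another file.
        synchronized boolean allow(String who) {
            if (!principal.equals(who) || remaining <= 0) return false;
            remaining--;
            return true;
        }
    }

    public static void main(String[] args) {
        SecureRandom rng = new SecureRandom();
        byte[] secret = new byte[32];  // stand-in for a secret set up by an authenticator plug-in
        byte[] nonce = new byte[16];
        rng.nextBytes(secret);
        rng.nextBytes(nonce);
        System.out.println(verify(secret, nonce, respond(secret, nonce)));  // true

        ReceiveFilesPolicy policy = new ReceiveFilesPolicy("user@bar", 5);
        int accepted = 0;
        for (int i = 0; i < 7; i++) if (policy.allow("user@bar")) accepted++;
        System.out.println(accepted);  // 5
    }
}
```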
JASMine Bulk Data Transfers
• Model supports parallel transfers
  – Many files at once, but not bbftp style
    • But could replace the stream class with a parallel stream
  – For bulk data transfer over WANs
• Firewall issues
  – Client initiates all connections
Architecture: Disk pool hardware
• SCSI Disk Servers
  – Dual Pentium III 650 (later 933) MHz CPUs
  – 512 MBytes 100 MHz SDRAM ECC
  – ASUS P2B-D motherboard
  – NetGear GA620 Gigabit Ethernet PCI NIC
  – Mylex eXtremeRAID 1100, 32 MBytes cache
  – Seagate ST150176LW (Qty. 8) – 50 GBytes Ultra2 SCSI in hot-swap disk carriers
  – CalPC 8U rack-mount case with redundant 400 W power supplies
• IDE Disk Servers
  – Dual Pentium III 933 MHz CPUs
  – 512 MBytes 133 MHz SDRAM ECC
  – Intel STL2 or ASUS CUR-DLS motherboard
  – NetGear GA620 or Intel PRO/1000 T Server Gigabit Ethernet PCI NIC
  – 3ware Escalade 6800
  – IBM DTLA-307075 (Qty. 12) – 75 GBytes Ultra ATA/100 in hot-swap disk carriers
  – CalPC 8U rack-mount case with redundant 400 W power supplies
Cache Performance
• Matches network, disk I/O, and CPU performance with size of disk pool:
  – ~800 GB
  – 2 x 850 MHz
  – Gb Ethernet
Cache status
Performance – SCSI vs IDE
• Disk array/file system – ext2
  – SCSI Disk Server – 8 50 GByte disks in a RAID-0 stripe over 2 SCSI controllers
    • 68 MBytes/sec single disk write
    • 79 MBytes/sec burst for a single disk write
    • 52 MBytes/sec single disk read
    • 56 MBytes/sec burst for a single disk read
  – IDE Disk Server – 6 75 GByte disks in a RAID-0 stripe
    • 64 MBytes/sec single disk write
    • 77 MBytes/sec burst for a single disk write
    • 48 MBytes/sec single disk read
    • 49 MBytes/sec burst for a single disk read
Performance NFS vs jcache
• NFS v2 UDP – 16 clients, rsize=8192 and wsize=8192
  – Reads
    • SCSI Disk Servers
      – 7700 NFS ops/sec and 80% CPU utilization
      – 11000 NFS ops/sec burst and 83% CPU utilization
      – 32 MBytes/sec and 83% CPU utilization
    • IDE Disk Servers
      – 7700 NFS ops/sec and 72% CPU utilization
      – 11000 NFS ops/sec burst and 92% CPU utilization
      – 32 MBytes/sec and 72% CPU utilization
• jcache – 16 clients
  – Reads
    • SCSI Disk Servers – 32 MBytes/sec and 100% CPU utilization
    • IDE Disk Servers – 32 MBytes/sec and 100% CPU utilization
JASMine system performance
• End-to-end performance
  – i.e. tape load, copy to stage, network copy to client
  – Aggregate sustained performance of 50 MB/s is regularly observed in production
  – During stress tests, up to 120 MB/s was sustained for several hours
    • A data mover with 2 drives can handle ~15 MB/s (disk contention is the limit)
  – Expect the current system to handle 150 MB/s; it is scalable by adding data movers & drives
  – N.B. this is performance to a network client!
• Data handling
  – Currently the system regularly moves 2-3 TB per day in total
    • ~6000 files per day, ~2000 requests
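A quick calculation relates the daily volume to the sustained rates quoted above. This worked arithmetic assumes decimal units (1 TB = 10^6 MB). For the capacity line, it assumes that each of the 10 data movers shown in the farm diagram delivers the quoted ~15 MB/s. The pairing of those two figures is an inference and is not stated on the slide.

```java
class ThroughputArithmetic {
    static final double SECONDS_PER_DAY = 86_400.0;

    // Average rate in MB/s (decimal units) needed to move 'tbPerDay' terabytes in one day.
    static double averageMBps(double tbPerDay) {
        return tbPerDay * 1_000_000.0 / SECONDS_PER_DAY;
    }

    public static void main(String[] args) {
        // Daily volume quoted above: 2-3 TB/day in ~6000 files.
        System.out.printf("2 TB/day = %.1f MB/s average%n", averageMBps(2));
        System.out.printf("3 TB/day = %.1f MB/s average%n", averageMBps(3));
        System.out.printf("mean file size: %.0f-%.0f MB%n", 2e6 / 6000, 3e6 / 6000);
        // Capacity estimate, assuming 10 data movers at ~15 MB/s each.
        System.out.printf("10 movers x 15 MB/s = %.0f MB/s%n", 10 * 15.0);
    }
}
```

The daily load therefore averages about 23-35 MB/s, with files averaging roughly 333-500 MB. This is below the 50 MB/s routinely sustained, which leaves headroom for bursts. The 150 MB/s expectation matches ten movers at ~15 MB/s each.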
JASMine performance
Tape migration
• Begin migration of 5000 Redwood tapes to 9940
  – Procedure written
  – Uses any/all available drives
  – Uses staging to allow re-packing of tapes
  – Expected to last 9-12 months
Typical Data Flows
[Figure: data flows between the experimental halls, the batch farm cluster (350 Linux nodes, 400 MHz – 1 GHz, 10,000 SPECint95, managed by LSF + Java layer + web interface), the 16 TB cache disk (SCSI + EIDE, RAID 0 on Linux servers) and the 10 TB work areas (SCSI disk, RAID 5). Raw data arrives at < 10 MB/s over Gigabit Ethernet (Halls A & C) and > 20 MB/s over Fiber Channel (Hall B); two further flows are labelled 25-30 MB/s.]
How to make optimal use of the resources
• Plan ahead!
• As a group:
  – Organize data sets in advance (~week) and use the cache disks for their intended purpose
    • Hold frequently used data to reduce tape access
  – In a high data rate environment no other strategy works
• When running farm productions
  – Use jsub to submit many jobs in one command, as it was designed
    • Optimizes tape accesses
  – Gather output files together on work disks and make a single jput for a complete tape’s worth of data
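Gathering outputs into "a complete tape's worth" is a bin-packing problem. The following sketch uses first-fit decreasing, a standard bin-packing heuristic that is swapped in here; nothing above says how files should be grouped. Batches are capped at the 60 GB 9940 capacity quoted in the tape section. The OutFile record, file names and sizes are hypothetical, and each resulting batch is what one jput would archive.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class TapeBatchSketch {
    // A hypothetical output file on a work disk.
    record OutFile(String path, long bytes) { }

    // First-fit decreasing: place each file (largest first) into the first batch with room.
    // A file larger than a tape simply gets a batch of its own.
    static List<List<OutFile>> batches(List<OutFile> files, long tapeBytes) {
        List<OutFile> sorted = new ArrayList<>(files);
        sorted.sort(Comparator.comparingLong(OutFile::bytes).reversed());
        List<List<OutFile>> bins = new ArrayList<>();
        List<Long> used = new ArrayList<>();
        for (OutFile f : sorted) {
            int target = -1;
            for (int i = 0; i < bins.size(); i++) {
                if (used.get(i) + f.bytes() <= tapeBytes) { target = i; break; }
            }
            if (target < 0) {  // no batch has room: start a new tape's worth
                bins.add(new ArrayList<>());
                used.add(0L);
                target = bins.size() - 1;
            }
            bins.get(target).add(f);
            used.set(target, used.get(target) + f.bytes());
        }
        return bins;
    }

    public static void main(String[] args) {
        long gb = 1_000_000_000L;
        long tape9940 = 60 * gb;  // 9940 capacity quoted in the tape section
        List<OutFile> outputs = List.of(
                new OutFile("run1.root", 25 * gb), new OutFile("run2.root", 20 * gb),
                new OutFile("run3.root", 30 * gb), new OutFile("run4.root", 15 * gb),
                new OutFile("run5.root", 10 * gb));
        for (List<OutFile> batch : batches(outputs, tape9940)) {
            System.out.println(batch.stream().map(OutFile::path).toList());
        }
    }
}
```

With these hypothetical sizes, the 100 GB of output packs into two batches: run3 + run1 (55 GB) and run2 + run4 + run5 (45 GB).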
Remote data access
• Tape copying is deprecated
  – Expensive, time consuming (for you and us), and inefficient
  – We have an OC-3 (155 Mbps) connection that is under-utilized; filling it will get us upgraded to OC-12 (622 Mbps)
    • At the moment we often have to coordinate with ESnet and peers to ensure a high-bandwidth path, but this is improving as Grid development continues
• Use network copies
  – bbftp service
    • Parallel, secure ftp – optimizes use of WAN bandwidth
• Future
  – Remote jcache
    • Cache manager can be deployed remotely – demonstration Feb 02
  – Remote silo access, policy-based (unattended) data migration
  – GridFTP, bbftp, bbcp
    • Parallel, secure ftp (or ftp-like)
  – As part of a Grid infrastructure
    • PKI authentication mechanism
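The OC-3 and OC-12 line rates can be put in perspective by estimating the best-case time to ship a data set over the network. This sketch assumes decimal units, a hypothetical 1 TB data set and the full line rate being available. Real transfers get only a fraction of the line rate (the efficiency parameter), which is what parallel tools such as bbftp try to improve.

```java
class WanTransferTime {
    // Hours to move 'terabytes' (10^12 bytes) over a link of 'mbps' (10^6 bit/s) at the given efficiency.
    static double hours(double terabytes, double mbps, double efficiency) {
        double bits = terabytes * 1e12 * 8;
        return bits / (mbps * 1e6 * efficiency) / 3600.0;
    }

    public static void main(String[] args) {
        // Line rates from the text; the 1 TB size and 100% efficiency are illustrative only.
        System.out.printf("OC-3  (155 Mbps): %.1f h per TB%n", hours(1, 155, 1.0));  // 14.3
        System.out.printf("OC-12 (622 Mbps): %.1f h per TB%n", hours(1, 622, 1.0));  // 3.6
    }
}
```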
(Data-) Grid Computing
Particle Physics Data Grid Collaboratory Pilot
Who we are: four leading Grid Computer Science Projects and six international High Energy and Nuclear Physics Collaborations
What we do: develop and deploy Grid Services for our Experiment Collaborators, and promote and provide common Grid software and standards
The problem at hand today: petabytes of storage, teraops/s of computing, thousands of users, hundreds of institutions, 10+ years of analysis ahead
PPDG Experiments
ATLAS – A Toroidal LHC ApparatuS at CERN. Runs 2006 on. Goals: TeV physics – the Higgs and the origin of mass …
http://atlasinfo.cern.ch/Atlas/Welcome.html
BaBar – at the Stanford Linear Accelerator Center. Running now. Goals: study CP violation and more
http://www.slac.stanford.edu/BFROOT/
CMS – the Compact Muon Solenoid detector at CERN. Runs 2006 on. Goals: TeV physics – the Higgs and the origin of mass …
http://cmsinfo.cern.ch/Welcome.html/
D0 – at the D0 colliding beam interaction region at Fermilab. Runs soon. Goals: learn more about the top quark, supersymmetry, and the Higgs
http://www-d0.fnal.gov/
STAR – Solenoidal Tracker At RHIC at BNL. Running now. Goals: quark-gluon plasma …
http://www.star.bnl.gov/
Thomas Jefferson National Laboratory. Running now. Goals: understanding the nucleus using electron beams …
http://www.jlab.org/
PPDG Computer Science Groups
Condor – develop, implement, deploy, and evaluate mechanisms and policies that support High Throughput Computing on large collections of computing resources with distributed ownership.
http://www.cs.wisc.edu/condor/
Globus - developing fundamental technologies needed to build persistent environments that enable software applications to integrate instruments, displays, computational and information resources that are managed by diverse organizations in widespread locations
http://www.globus.org/
SDM - Scientific Data Management Research Group – optimized and standardized access to storage systems
http://gizmo.lbl.gov/DM.html
Storage Resource Broker - client-server middleware that provides a uniform interface for connecting to heterogeneous data resources over a network and cataloging/accessing replicated data sets.
http://www.npaci.edu/DICE/SRB/index.html
Delivery of End-to-End Applications & Integrated Production Systems
to allow thousands of physicists to share data & computing resources for scientific processing and analyses
Operators & Users
Resources: Computers, Storage, Networks
PPDG Focus:
- Robust Data Replication
- Intelligent Job Placement and Scheduling
- Management of Storage Resources
- Monitoring and Information of Global Services
Relies on Grid infrastructure:
- Security & Policy
- High Speed Data Transfer
- Network management
Project Activities, End-to-End Applications and Cross-Cut Pilots
Project Activities are focused Experiment – Computer Science collaborative developments:
- Replicated data sets for science analysis – BaBar, CMS, STAR
- Distributed Monte Carlo production services – ATLAS, D0, CMS
- Common storage management and interfaces – STAR, JLAB
End-to-End Applications used in Experiment data handling systems to give real-world requirements, testing and feedback:
- Error reporting and response
- Fault tolerant integration of complex components
Cross-Cut Pilots for common services and policies:
- Certificate Authority policy and authentication
- File transfer standards and protocols
- Resource Monitoring – networks, computers, storage
Year 0.5-1 Milestones (1)
Align milestones to Experiment data challenges:
– ATLAS – production distributed data service – 6/1/02
– BaBar – analysis across partitioned dataset storage – 5/1/02
– CMS – Distributed simulation production – 1/1/02
– D0 – distributed analyses across multiple workgroup clusters – 4/1/02
– STAR – automated dataset replication – 12/1/01
– JLAB – policy driven file migration – 2/1/02
Year 0.5-1 Milestones (2)
Common milestones with EDG:
GDMP – robust file replication layer – Joint Project with EDG Work Package (WP) 2 (Data Access)
Support of Project Month (PM) 9 WP6 TestBed Milestone. Will participate in integration fest at CERN - 10/1/01
Collaborate on PM21 design for WP2 - 1/1/02
Proposed WP8 Application tests using PM9 testbed – 3/1/02
Collaboration with GriPhyN:
SC2001 demos will use common resources, infrastructure and presentations – 11/16/01
Common, GriPhyN-led grid architecture
Joint work on monitoring proposed
Year ~0.5-1 “Cross-cuts”
• Grid File Replication Services used by >2 experiments:
  – GridFTP – production releases
    • Integrate with D0-SAM, STAR replication
    • Interfaced through SRB for BaBar, JLAB
    • Layered use by GDMP for CMS, ATLAS
  – SRB and Globus Replication Services
    • Include robustness features
    • Common catalog features and API
  – GDMP/Data Access layer continues to be shared between EDG and PPDG
• Distributed Job Scheduling and Management used by >1 experiment:
  – Condor-G, DAGMan, Grid-Scheduler for D0-SAM, CMS
  – Job specification language interfaces to distributed schedulers – D0-SAM, CMS, JLAB
• Storage Resource Interface and Management
  – Consensus on API between EDG, SRM, and PPDG
  – Disk cache management integrated with data replication services
Year ~1 other goals:
• Transatlantic Application Demonstrators:
  – BaBar data replication between SLAC and IN2P3
  – D0 Monte Carlo job execution between Fermilab and NIKHEF
  – CMS & ATLAS simulation production between Europe/US
• Certificate exchange and authorization
  – DOE Science Grid as CA?
• Robust data replication
  – Fault tolerant, between heterogeneous storage resources
• Monitoring Services
  – MDS2 (Metacomputing Directory Service)?
  – Common framework
  – Network, compute and storage information made available to scheduling and resource management
PPDG activities as part of the Global Grid Community
Coordination with other Grid Projects in our field:
- GriPhyN – Grid Physics Network
- European DataGrid
- Storage Resource Management collaboratory
- HENP Data Grid Coordination Committee
Participation in Experiment and Grid deployments in our field:
- ATLAS, BaBar, CMS, D0, STAR, JLAB experiment data handling systems
- iVDGL/DataTAG – International Virtual Data Grid Laboratory
- Use DTF computational facilities?
Active in Standards Committees:
- Internet2 HENP Working Group
- Global Grid Forum
Staffing Levels
• We are stretched thin
  – But compared with other labs with similar data volumes, we are efficient
• Systems support group: 5 + 1 vacant
• Farms, MSS development: 2
• HW support/Networks: 3.7
• Telecom: 2.3
• Security: 2
• User services: 3
• MIS, Database support: 8
• Support for Engineering: 1
  – We cannot do as much as we would like
Future (FY02)
• Removing Redwoods is a priority
  – Copying tapes, replacing drives with 9940s
• Modest farm upgrades – replace older CPUs as budget allows
  – Improve interactive systems
• Add more /work, /cache
• Grid developments:
  – Visible as efficient WAN data replication services
• After FY02
  – Global filesystems – to supersede NFS
  – 10 Gb Ethernet
  – Disk vs. tape? Improved tape densities, data rates
• We welcome (coordinated) input as to what would be most useful for your physics needs