4 March 2004 GridPP 9th Collaboration Meeting
SAMGrid: JIM and CDF Development
• CDF Accepts the Need for the Grid
– Requirements
• How to Meet the Need
– Status of SAMGrid for CDF
Rick St. Denis, University of Glasgow
Director’s review, International Finance Committee: 50% of computing outside FNAL
Maximize physics output at low luminosity
– L3 output rate: 80 -> 360 Hz by 2006
Spokespersons’ Requirements for CDF
CDFGrid supported by FNAL PAC
CDF needs the Grid
Scale of CDF Requirements
Year   THz    % offsite   CPU speed   # duals
FY04   3.7    25%         3 GHz       150
FY05   9.0    50%         5 GHz       +360
FY06   16.5   50%         8 GHz       +220
6-7 sites with 100 duals each by 2006, plus 700 at FNAL
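The table's figures can be cross-checked with a little arithmetic. The sketch below rests on an interpretation that the slide does not state: that the "# duals" column counts dual-CPU offsite nodes added each year, and that each dual contributes twice the quoted CPU speed. The function and constant names are illustrative only.

```python
# Rows from the requirements table:
# (year, total THz, offsite fraction, CPU speed in GHz, duals added that year).
REQUIREMENTS = [
    ("FY04", 3.7, 0.25, 3.0, 150),
    ("FY05", 9.0, 0.50, 5.0, 360),
    ("FY06", 16.5, 0.50, 8.0, 220),
]

CPUS_PER_DUAL = 2  # assumption: a "dual" is a two-CPU node


def offsite_capacity(rows):
    """Return (year, required offsite THz, cumulative installed offsite THz)."""
    installed_thz = 0.0
    summary = []
    for year, total_thz, offsite_fraction, cpu_ghz, duals_added in rows:
        # Assumption: each year's duals are bought at that year's CPU speed.
        installed_thz += duals_added * CPUS_PER_DUAL * cpu_ghz / 1000.0
        required_thz = total_thz * offsite_fraction
        summary.append((year, round(required_thz, 3), round(installed_thz, 3)))
    return summary


if __name__ == "__main__":
    for year, required, installed in offsite_capacity(REQUIREMENTS):
        print(f"{year}: required offsite {required} THz, installed {installed} THz")
```

Under these assumptions the installed offsite capacity comes to roughly 0.9, 4.5 and 8.0 THz against requirements of roughly 0.93, 4.5 and 8.25 THz, and the 730 duals in total are close to the "6-7 sites, 100 duals each" figure.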
CDF Computing Model
• Develop analysis on desktop
– Access to all CDF data from anywhere
• Large-scale processing on batch clusters
– Submission from anywhere
– Interactive tools: ls, top, head/tail/cat
– Output to scratch space or desktop
Implemented now with CAF
Use Cases for Summer 2004
• User-Level MC Production
– All CDF users have access
– No data on site -> SAM write
• User-Level Data Access
– All users have access
– Selected samples on site: full SAM support
SAM Essential for Summer 2004
Medium Term Vision
• Many Sites
• Fully transparent submission to all of CDF resources: 75% FNAL, 25% outside
• Fully transparent input and output of data
Summer 04 Functionality
• User selects submission site, saying what dataset they will use
• System checks they can do this (privileges)
• User accesses data with SAM/dCache
• User registers output with SAM
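The four steps above can be read as a simple submission workflow. What follows is a minimal sketch only: `Site`, `User`, `submit_job` and the in-memory `catalog` are hypothetical names invented for illustration and do not correspond to the real CAF, SAM or dCache interfaces.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    name: str
    datasets: set  # datasets available at this site (hypothetical model)


@dataclass
class User:
    name: str
    allowed_sites: set = field(default_factory=set)


def submit_job(user, site, dataset, catalog):
    """Walk through the four steps: select, check privileges, read, register."""
    # Step 1: the user has chosen a submission site and declared the dataset.
    # Step 2: the system checks that the user is allowed to do this.
    if site.name not in user.allowed_sites:
        raise PermissionError(f"{user.name} may not submit to {site.name}")
    if dataset not in site.datasets:
        raise LookupError(f"{dataset} is not available at {site.name}")
    # Step 3: data access (a stand-in for delivery through SAM/dCache).
    output_file = f"{user.name}_{dataset}_output.root"
    # Step 4: the output is registered (a stand-in for storing it in SAM).
    catalog.setdefault(dataset, []).append(output_file)
    return output_file
```

A call such as `submit_job(User("alice", {"glasgow"}), Site("glasgow", {"dsA"}), "dsA", {})` returns the registered output name, while a user without privileges for the chosen site gets a `PermissionError` before any data is touched.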
October 04
• To extend beyond 25% outside computing, JIM is essential: JIM test for CDF in June 2004, production in October 2004
• HOWEVER: it already seems that the 25% resources are not sufficient for the production passes, so JIM will be wanted earlier.
CDF Grid from a User Perspective
[Diagram: an AC++ job goes through the CAF GUI/CLI to the Grid, which comprises Toronto, Korea, Italy, Taiwan, FermiCAF and the UK. Three modes are labelled: only Fermilab, outside lab, and Grid. Each is marked "Uses SAM".]
CDF Grid Strategy
• 25% of CDF computing from external resources. All CDF computing on CDF Grid by April 15: utilize resources fully controlled by CDF: Kerberos/fbsng: dCAF + SAM
• October 15, 2004: JIM to capture shared resources
• June 2005: 50% of Computing resources external
Simple JIM
[Diagram components:
Desktop: anywhere
Condor Submitter: at regional centers
SAM DB, Condor Matchmaker: at FNAL
Globus GK, CAF Submitter, SAM Station: at each site
WN, dCache: on private LANs
Timeline: June 2004 testing; June 2005 required]
Detailed JIM
[Diagram labels:
Sites; User Interfaces
Resource Selector: Info Collector, Info Gatherer, Match Making
Submission: Global Job Queue, Grid Client
Global DH Services: SAM Naming Server, SAM Log Server, Resource Optimizer, SAM DB Server, RC MetaData Catalog, Bookkeeping Service
Data Handling: SAM Stager(s), SAM Station (+other servs), MSS, Cache
Local Job Handling: Worker Nodes, Grid Gateway, Local Job Handler (CAF, D0MC, BS, ...), JIM Advertise, Cluster, AAA, Dist. FS
Info Manager: XML DB server, Site Conf., Glob/Loc JID map, ..., Info Providers, MDS
Web Serv, Grid Monitoring, User Tools
Legend: flow of job, data, meta-data]
Meeting the Needs
• Progress in SAM
• JIM Status
• RunJob
• CDF Grid Workshop: “Nerd’s Paradise”
• Strict Project Management and process to respond to operational issues
Progress in SAM
• Dbserver, the database server between applications and Oracle, was upgraded to use a common schema for CDF and D0.
• All CDF data files are in SAM.
• SAM is in beta testing on the CDF CAF (1200 CPUs): passed 20 TB/day delivery.
• Minos uses SAM for its data handling.
• Steve Mrenna (Phenomenology) is depositing ALPGEN files in SAM for common CDF/D0 use.
JIM Deployment Issues
Focus:
• 200 jobs each getting 200 files generated 120000 requests simultaneously to the DBServer!
– Sensible sam: reliability went to 60%. Now add retries.
Training Users
• D0 has D0Tools: big script; determines where user is and copies files: harder to get into a sandbox
• CAF conditions users!
Distribution and compatibility:
• This has made great strides with SAM, now time for JIM
Communication with the expert!
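The "now add retries" remedy for the DBServer overload can be illustrated with a small retry wrapper. The slides say only that retries were added; the exponential backoff with random jitter used here is an assumption, chosen because hundreds of jobs retrying at the same instant would simply recreate the original burst. `TransientError` and `call_with_retries` are hypothetical names, not part of SAM or JIM.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical stand-in for a temporary server-side failure."""


def call_with_retries(request, max_attempts=5, base_delay=0.5, max_delay=30.0,
                      sleep=time.sleep, rng=random.random):
    """Call request(), retrying on TransientError with jittered backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return request()
        except TransientError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the failure to the caller
            # The upper bound doubles with each failed attempt, up to max_delay.
            cap = min(max_delay, base_delay * 2 ** (attempt - 1))
            # A random fraction of the bound spreads clients out in time.
            sleep(cap * rng())
```

The `sleep` and `rng` parameters exist only so that the waiting behaviour can be replaced in tests; in normal use the defaults apply.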
RunJob
• Dedicated farms at FNAL will go away and RunJob will be used for production processing of data
• CDF will use RunJob for MC production
• Dave Evans worked for CDF for 2 months and has made CDFRunJob based on RunJob (Shakar), a tool common to CMS. Morag will work on this.
Florida workshop:
• 11 installations in about 2 hours. Integrated with dCAF in 2 cases in 2 days.
• 3 in Asia, 4 in Europe
• 6 sites committed to summer 2004 usage of their facilities for all of CDF (mostly MC)
• SAM installation now: initsam cdf <stationname>
• Follow-up on April 1.
• Each site has a local user support person to reduce load on core development team.
• Generally: security ate 80% of the effort! Now 20%!
Installations progress
Participating institutes: installation and testing progress
Columns: INSTITUTE | krb5 | Caf Head | Caf Node | DCAF Works | CDF Sam Software | Sam Station | sam_par_ret | Sam AC++Dump | Sam File Store | Sam File Store Remote | Sam AC++Dump on CAF
MIT Yes ?
Korea Yes Yes Yes Yes Yes knu Yes Yes
Pisa Yes Yes Yes Yes Yes pisa Yes Yes Yes Yes Yes
Japan Yes Yes Yes Problems Yes japan Yes Yes Yes
Karlsruhe Yes Yes Yes Problems Yes fzzka Yes Yes Yes Yes Yes
Liverpool Yes Yes Yes Problems Yes liverpool Yes Yes Yes
Toronto Yes In progress Yes toronto Yes
Taiwan Yes Yes Yes Yes Yes taiwan Yes Yes
TTU Yes -ttu,-ttu-phys Yes
Glasgow Yes In progress Yes glasgow Yes Yes
UCSD Yes Yes Yes Yes Yes ucsd Yes
CNAF Yes Yes Yes Yes Yes cnaf Yes Yes Yes Yes Yes
Florida Workshop: After 2 Days
2TB/Day: Karlsruhe
CDF Dcache on CAF
ALL CDF on CAF reads 20TB/Day
Dcache and SAM
• Dcache shapes traffic into disk: if a SAM cache is large, need to use Dcache instead of NFS mounts
• Dcache gives the user what is requested. 1 TB gets same priority as 1 GB: CDF users must send email requesting data to be staged.
• SAM examines consumption rate before staging next files – no email needed.
• SAM uses Dcache for its caching at FNAL.
• This needs further work with SRM
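The contrast between staging whatever is requested and staging according to consumption rate can be sketched as a small policy function. This is an illustration of the general idea only, not SAM's actual algorithm: the function name, its parameters and the "stage enough to cover the staging latency" rule are all assumptions.

```python
import math


def files_to_stage(files_buffered, files_consumed_per_minute,
                   staging_latency_minutes, max_buffered_files):
    """Return how many more files to stage, given how fast the job reads."""
    # Files the consumer is expected to read while a stage request is in flight.
    expected_demand = math.ceil(files_consumed_per_minute * staging_latency_minutes)
    # Stay at least one file ahead, but never exceed the cache allowance.
    target = min(max_buffered_files, max(1, expected_demand))
    return max(0, target - files_buffered)
```

With this rule a slow consumer triggers little or no further staging, while a fast one is capped by the cache allowance, so a very large request cannot crowd out smaller ones simply by having been submitted.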
SAMGrid Management
Sam Management Team
Sam Operations and Projects
Sam Design
Sam Project Leaders
Sam Technical Leaders
SamGrid Development Process
[Diagram labels: SAMGrid Operations/Projects; Issue Raised; SAMGrid Design; SAMGrid Management Team; Grid Deliverables; Subproject; Chaired by Technical Managers; Chaired by Project Leaders]
Subproject Organization
• Each Subproject has a subproject leader (SPL) responsible for making a plan and reporting progress.
• Each Subproject has one of the Technical Leaders evaluating it against an assessment template.
• No deliverable requires more than 3 months of work to deliver.
SubProject Assessment Template
1. Background Documents
2. Project Definition/Mission Statement
3. Deliverables and timetable
4. Inter-project deliverables
5. Project status
6. Challenges and Critical Path Items
7. Lessons Learned
8. Project-specific comments, alternate views
SAMGrid Assigned SubProjects
Categories: Housekeeping; MC / Reconstruction; Infrastructure; User analysis Apps
• JIM: D0Tools
• Common API
• Database Server Rewrite
• Database Servers to Linux
• Metadata Query with configurable Params
• Work Flow Package
• MCRequest
• H Stream for CDF
• JIM: MCD0
• Test Harness
• Retire CDF Replica Catalog
• Caching
• Configuration Management
Status of Assessments
• Subprojects defined
• Interviews conducted for about half
• Assessment reports being written
Conclusions
• CDF has embraced the need for the Grid to achieve its physics mission
• Progress in deployment and robustness testing has established SAM in CDF
• JIM is rapidly solving its problems
• … with the help of a review and management process