DØ RAC Working Group Report


  • DØ RAC Working Group Report
    DØRACE Meeting, May 23, 2002, Jae Yu
    - Progress
    - Definition of an RAC
    - Services provided by an RAC
    - Requirements of an RAC
    - Pilot RAC program
    - Open issues

  • DØRAC Working Group Progress
    - The working group was formed at the February workshop
    - Members: I. Bertram, R. Brock, F. Filthaut, L. Lueking, P. Mättig, M. Narain, P. Lebrun, B. Thooris, J. Yu, C. Zeitnitz
    - Held many weekly meetings, plus a face-to-face meeting at FNAL three weeks ago, to settle some unclear issues
    - A proposal (DØ Note #3984) specifying the services and requirements for RACs has been drafted
      - Document at: http://www-hep.uta.edu/~d0race/d0rac-wg/d0rac-spec-050802-8.pdf
      - All comments are in; target release is next week, before the Director's computing review in two weeks

  • Proposed DØRAM Architecture
    [Diagram: Central Analysis Center (CAC) serving Regional Analysis Centers (RACs), which in turn serve Institutional Analysis Centers (IACs) and Desktop Analysis Stations; each tier provides various services]

  • What is a DØRAC?
    - An institute with large, concentrated, and available computing resources:
      - Many 100s of CPUs
      - Many 10s of TB of disk cache
      - Many 100 MB/s of network bandwidth
      - Possibly equipped with HPSS
    - An institute willing to provide services to a few smaller institutes in the region
    - An institute willing to grow its infrastructure as the data from the experiment grows
    - An institute willing to provide support personnel if necessary
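The resource profile above can be expressed as a simple eligibility check. This is an illustrative sketch only: the thresholds mirror the slide's "many 100s / many 10s / many 100" phrasing, and the candidate-site numbers are invented, not actual site figures.

```python
# Hypothetical check of whether a candidate site meets the minimal RAC
# resource profile sketched on the slide. Thresholds and site numbers
# are illustrative assumptions, not official requirements.
from dataclasses import dataclass

@dataclass
class SiteResources:
    cpus: int                # number of CPUs available
    disk_tb: float           # disk cache in TB
    bandwidth_mb_s: float    # network bandwidth in MB/s
    has_hpss: bool = False   # HPSS is optional in the proposal

def qualifies_as_rac(site: SiteResources) -> bool:
    """Return True if the site meets the minimal RAC profile."""
    return (site.cpus >= 100
            and site.disk_tb >= 10
            and site.bandwidth_mb_s >= 100)

# Invented example numbers for a large candidate site:
candidate = SiteResources(cpus=300, disk_tb=50, bandwidth_mb_s=155, has_hpss=True)
print(qualifies_as_rac(candidate))  # prints True
```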

  • What services do we want a DØRAC to provide?
    - Intermediary code distribution
    - Generation and reconstruction of MC data sets
    - Acceptance and execution of analysis batch job requests
    - Data storage and delivery upon request
    - Participation in re-reconstruction of data
    - Database access
    - Manpower support for the above activities

  • Code Distribution Service
    - Current releases: 4 GB total; will grow to >8 GB?
    - Why needed?
      - Downloading 8 GB once a week is not a big load on network bandwidth
      - Efficiency of release updates relies on network stability
      - Exploits remote human resources
    - What is needed?
      - Release synchronization at all RACs every time a new release becomes available
      - Potentially large disk space to keep releases
      - UPS/UPD deployment at RACs (FNAL-specific; interaction with other systems?)
      - Administrative support for bookkeeping
    - The current DØRACE procedure works well, even for individual users; we do not see the need for this service at this point

  • Generate and Reconstruct MC Data
    - Currently done 100% at remote sites
    - What is needed?
      - A mechanism to automate request processing
      - A grid that can:
        - Accept job requests
        - Package the job
        - Identify and locate the necessary resources
        - Assign the job to the selected institution
        - Provide status to the users
        - Deliver or keep the results
      - A database for noise and min-bias event addition
    - Perhaps the least disputed task of a DØRAC
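The request-processing steps listed above (accept, package, locate, assign, report status, deliver) can be sketched as a small state machine. All names here are invented for illustration; this is not the actual DØ grid software, and the site-selection rule is a trivial placeholder for real resource brokering.

```python
# Illustrative sketch of the automated MC request flow: accept ->
# package -> locate/assign resources -> report status. Names and the
# brokering rule are assumptions, not the real DØ implementation.
from enum import Enum, auto

class JobState(Enum):
    ACCEPTED = auto()
    PACKAGED = auto()
    ASSIGNED = auto()
    DELIVERED = auto()

class MCRequest:
    def __init__(self, n_events, process):
        self.n_events = n_events
        self.process = process
        self.state = JobState.ACCEPTED
        self.site = None

    def package(self):
        # Bundle the executable, run cards, and references to the
        # noise/min-bias database mentioned on the slide.
        self.state = JobState.PACKAGED

    def assign(self, free_cpus_by_site):
        # Placeholder brokering: pick the site with the most free CPUs.
        self.site = max(free_cpus_by_site, key=free_cpus_by_site.get)
        self.state = JobState.ASSIGNED

req = MCRequest(n_events=100_000, process="ttbar")
req.package()
req.assign({"SiteA": 120, "SiteB": 80})
print(req.site, req.state.name)  # prints: SiteA ASSIGNED
```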

  • Batch Job Processing
    - Currently relies on FNAL resources (DØmino, ClueDØ, CLUBS, etc.)
    - What is needed?
      - Sufficient computing infrastructure to process requests: network, CPU, and cache storage to hold job results until transfer
      - Access to relevant databases
      - A grid that can:
        - Accept job requests
        - Package the job
        - Identify and locate the necessary resources
        - Assign the job to the selected institution
        - Provide status to the users
        - Deliver or keep the results
    - This task definitely needs a DØRAC
    - Bring the input to the user, or bring the executable to the input?
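The closing question, move the data or move the executable, reduces to a cost comparison. A back-of-the-envelope sketch, with illustrative numbers (the data-set size, link speed, and job-shipping cost are assumptions, not measurements):

```python
# Back-of-the-envelope comparison for "bring the input to the user, or
# the executable to the input?". All numbers are illustrative.

def transfer_time_hours(size_gb, bandwidth_mb_s):
    """Hours needed to move size_gb over a link of bandwidth_mb_s MB/s."""
    return size_gb * 1024 / bandwidth_mb_s / 3600

def ship_executable_instead(data_gb, link_mb_s, job_ship_hours=0.1):
    """True if shipping the job (assumed ~constant cost) beats moving the data."""
    return job_ship_hours < transfer_time_hours(data_gb, link_mb_s)

# Assumed: a 500 GB DST selection over a 10 MB/s transatlantic link.
print(round(transfer_time_hours(500, 10), 1), "hours to move the data")
print(ship_executable_instead(500, 10))  # prints True
```

The point of the sketch is that for any sizeable input, the executable is orders of magnitude cheaper to move, which is why the question matters for the grid design.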

  • Data Caching and Delivery Service
    - Currently only at FNAL (CAC)
    - Why needed?
      - To make data readily available to users with minimal latency
    - What is needed?
      - Decide what data, and how much, we want to store:
        - 100% of TMB
        - 10-20% of DST? (so that the RACs together hold 100% of DST on the net)
        - Any RAW data at all?
        - What about MC? (~50% of the actual data)
      - Data should be on disk to minimize caching latency
        - How much disk space? (~50 TB if 100% TMB and 10% DST for Run IIa)
      - Constant shipment of data to all RACs from the CAC
        - Constant bandwidth occupation (~14 MB/s for Run IIa RAW)
        - Resources needed from the CAC
      - A grid that can:
        - Locate the data (SAM can do this already)
        - Tell the requester the extent of the request
        - Decide whether to move the data or pull the job over
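The ~50 TB and ~14 MB/s figures above can be reproduced with simple arithmetic. The 10 kB/event TMB size appears later in the deck; the DST and RAW event sizes, event count, and logging rate below are illustrative assumptions chosen to match the quoted totals, not official parameters.

```python
# Worked version of the cache-size and bandwidth estimates. TMB size
# (10 kB/event) is from the deck; the other inputs are assumptions
# chosen to reproduce the quoted ~50 TB and ~14 MB/s.

N_EVENTS = 1.0e9                  # assumed Run IIa event count
TMB_KB, DST_KB, RAW_KB = 10, 400, 280  # per-event sizes in kB
RATE_HZ = 50                      # assumed average event-logging rate

tmb_tb = N_EVENTS * TMB_KB / 1e9         # 100% of TMB on disk, in TB
dst_tb = 0.10 * N_EVENTS * DST_KB / 1e9  # 10% of DST on disk, in TB
raw_mb_s = RAW_KB * RATE_HZ / 1e3        # RAW shipping bandwidth, MB/s

print(f"disk per RAC: {tmb_tb + dst_tb:.0f} TB")  # prints: disk per RAC: 50 TB
print(f"RAW bandwidth: {raw_mb_s:.0f} MB/s")      # prints: RAW bandwidth: 14 MB/s
```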

  • Data Reprocessing Services
    - These include:
      - Re-reconstruction of actual and MC data (from DST? from RAW?)
      - Re-streaming of data
      - Re-production of TMB data sets
      - Re-production of ROOT trees
      - Ab initio reconstruction
    - Currently done only at the CAC offline farm

  • Reprocessing Services (cont'd)
    - What is needed?
      - Sufficiently large bandwidth to transfer the necessary data, or HPSS (?)
        - DSTs: RACs will already have ~10% permanently stored
        - RAW: transfer should begin when the need arises; RACs reconstruct as the data is transferred
      - Large data storage
      - Constant data transfer from the CAC to the RACs as the CAC reconstructs fresh data
        - A dedicated file server at the CAC for data distribution to the RACs
        - Constant bandwidth occupation
        - Sufficient buffer storage at the CAC in case the network goes down
      - A reliable and stable network
      - Access to relevant databases: calibration, luminosity, geometry, and magnetic field map
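The buffer-storage requirement above can be sized with one line of arithmetic. The 14 MB/s rate is the Run IIa RAW figure quoted earlier in the deck; the outage duration is an illustrative assumption.

```python
# Sizing the CAC buffer needed to keep reconstructing while the
# network to a RAC is down. Rate is the deck's Run IIa RAW figure;
# the outage length is an assumption.

def buffer_tb(rate_mb_s, outage_hours):
    """TB of buffer needed to absorb outage_hours of data at rate_mb_s."""
    return rate_mb_s * 3600 * outage_hours / 1e6

# Surviving an assumed 48-hour network outage at the full RAW rate:
print(f"{buffer_tb(14, 48):.1f} TB")  # prints: 2.4 TB
```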

  • Reprocessing Services (cont'd)
      - RACs have to transfer the newly produced TMB to the other sites
        - Since only ~10% of the DSTs reside at each RAC, only the TMB equivalent to that portion can be regenerated
      - A well-synchronized reconstruction executable and run-time environment
      - A grid that can:
        - Identify resources on the net
        - Optimize resource allocation for the most expeditious reproduction
        - Move data around if necessary
      - A dedicated block of time for concentrated CPU usage if disaster strikes
    - Questions:
      - Do we keep copies of all data at the CAC?
      - Do we ship DSTs and TMBs back to the CAC?

  • Database Access Service
    - Currently done only at the CAC
    - What is needed?
      - Remote DB access software services
      - Some copy of the DB at the RACs
        - A substitute for the Oracle DB at remote sites
        - A means of synchronizing the DBs
    - A possible solution is a proxy server at the central location, supplemented with a few replicated DBs for backup
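The proxy-server idea can be sketched as a caching layer in front of the central database, so that repeated remote requests for the same constants do not each hit the Oracle server. Everything below is illustrative: the class, the fetch function, and the run number are invented stand-ins, not the real DØ database interface.

```python
# Minimal sketch of the proposed proxy approach: cache calibration
# lookups so N remote jobs asking for the same run cost one central
# query. All names here are illustrative stand-ins.

class CalibProxy:
    def __init__(self, fetch_from_central):
        self._fetch = fetch_from_central  # callable hitting the central DB
        self._cache = {}
        self.central_hits = 0             # how often we touched the server

    def get(self, run_number):
        # Serve repeated requests for the same run from the local cache.
        if run_number not in self._cache:
            self._cache[run_number] = self._fetch(run_number)
            self.central_hits += 1
        return self._cache[run_number]

def fake_central_db(run):  # stand-in for the central Oracle server
    return {"run": run, "gain": 1.02}

proxy = CalibProxy(fake_central_db)
for _ in range(100):       # 100 jobs all ask for the same run
    proxy.get(160582)
print(proxy.central_hits)  # prints 1
```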

  • What services do we want a DØRAC to provide?
    - Intermediary code distribution
    - Generation and reconstruction of MC data sets
    - Acceptance and execution of analysis batch job requests
    - Data storage and delivery upon request
    - Participation in re-reconstruction of data
    - Database access
    - Manpower support for the above activities

  • DØRAC Implementation Timescale
    - Implement the first RACs by Oct. 1, 2002
      - CLUBS cluster at FNAL, and Karlsruhe, Germany
      - Cluster the associated IACs around them
      - Transfer the TMB (10 kB/event) data set constantly from the CAC to the RACs
    - Workshop on RACs in Nov. 2002
    - Implement the next set of RACs by Apr. 1, 2003
    - Implement and test DØGrid software as it becomes available
    - The DØGrid should work by the end of Run IIa (2004), retaining the DØRAM architecture
    - The next-generation DØGrid: a truly gridified network without …

  • Pilot DØRAC Program
    - RAC pilot sites:
      - Karlsruhe, Germany: already agreed to do this
      - CLUBS, Fermilab: need to verify
    - What should we accomplish?
      - Transfer TMB files as they get produced
        - A file server (both hardware and software) at the CAC for this job
        - Request-driven or constant push?
      - Network monitoring tools to observe network occupation and stability
        - From the CAC to the RAC
        - From the RAC to the IACs
      - Allow IAC users to access the TMB
      - Observe use of the data set:
        - Access patterns
        - Performance of the access system
        - SAM system performance for locating …
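The network-monitoring piece of the pilot amounts to timing each TMB transfer and logging the achieved throughput. A minimal sketch, with the copy simulated by a sleep; in practice the timing would wrap the real transfer tool, and nothing here reflects the actual monitoring software.

```python
# Sketch of pilot network monitoring: time a transfer and report the
# achieved CAC-to-RAC throughput. The copy is simulated here.
import time

def measure_transfer(size_mb, copy_fn):
    """Run copy_fn (the actual transfer) and return throughput in MB/s."""
    t0 = time.perf_counter()
    copy_fn()
    elapsed = time.perf_counter() - t0
    return size_mb / elapsed

# Simulated 100 MB transfer taking at least 0.05 s:
rate = measure_transfer(100, lambda: time.sleep(0.05))
print(f"{rate:.0f} MB/s")
```

Logging such samples over weeks, for both the CAC-to-RAC and RAC-to-IAC links, would give the occupation and stability record the pilot asks for.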

  • Pilot DØRAC Program (cont'd)
    - User account assignment?
    - Resource (CPU and disk space) needs?
    - What Grid software functionality is needed?
      - To interface with the users
      - To locate the input and the necessary resources
      - To gauge the resources
      - To package the job requests

  • Open Issues
    - What do we do with MC data?
      - Iain's suggestion is to keep the DØStar format, not DST
        - Additional storage for min-bias event samples
        - What is the analysis scheme? The only way is to re-do all the remaining steps of the MC chain (DØSim, reconstruction, reco-analysis), requiring additional CPU resources
        - Additional disk space to buffer intermediate files
      - Keep the DST and ROOT tuple? Where?
    - What other questions do we want the pilot RAC program to answer?
    - How do we acquire sufficient funds for these resources?
    - Which institutions are candidates for RACs?
    - Do we have full support from the collaboration?
    - Other detailed issues are covered in the proposal