Nurcan Ozturk
University of Texas at Arlington
Grid User Training for Local Community
TUBITAK ULAKBIM, Ankara, Turkey
April 5 - 9, 2010
Overview of ATLAS Distributed Analysis
Nurcan Ozturk
Outline
User’s Work-flow for Data Analysis
ATLAS Distributed Data Analysis System
What Type of Data You Can Run On
What Type of Jobs You Can Run
How to Find Your Input Data
AMI
dq2 end-user tools
ELSSI - Event Level Selection Service Interface
Tips for Submitting Your Jobs
How To Check Status of ATLAS Releases
How to Check Status of Installation of Releases at Sites
Tips for Retrieving Your Outputs
Tips for Debugging Failures
User Support
Pathena Example on 7 TeV Collision Data
User’s Work-flow for Data Analysis
Locate the data
Analyze the results
Setup the analysis job
Submit to the Grid
Retrieve the results
Setup the analysis code
ATLAS Distributed Data Analysis System
3 layers:
Gateways to resources
User interface
Execution infrastructure
What Type of Data You Can Run On
RAW: raw data from the DAQ system
ESD – Event Summary Data: output of the reconstruction of RAW data
AOD – Analysis Object Data: condensed version of the ESD
D1PD – Primary Derived Physics Data: ESD or AOD with:
certain events removed (skimming)
some data objects within an event removed (thinning)
certain data members within an object removed (slimming)
Commissioning DPD and Performance DPD (made from ESDs), Physics DPD (made from AODs)
DnPD – n-th Derived Physics Data (D2PD, D3PD): specific formats defined by physics/performance groups or users
TAG: event tags – event thumbnails with pointers to the full event in the AOD; fast access to specific events (file-based or database-based)
Monte Carlo data:
EVNT – event generator data: output of event generation
HITS – output of the detector simulation (uses EVNT as input)
RDO – Raw Data Object: output of digitization (uses HITS as input); the MC equivalent of raw data
AOD and DPD
You can run on all of the above; however, RAW/HITS/RDO are kept on tape, so you need to request DDM replication to disk storage first.
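The three reduction operations that define a DPD can be illustrated with a toy event store (the event fields, object names and cut values below are hypothetical, chosen only to show the pattern):

```python
# Toy illustration of skimming, thinning and slimming.
# All field names and cut values are hypothetical.

events = [
    {"njets": 1, "jets": [{"pt": 40.0, "eta": 1.2, "ncells": 90}]},
    {"njets": 2, "jets": [{"pt": 25.0, "eta": 0.3, "ncells": 50},
                          {"pt": 80.0, "eta": 2.1, "ncells": 120}]},
]

# Skimming: drop whole events that fail a selection.
skimmed = [ev for ev in events if ev["njets"] >= 2]

# Thinning: within each kept event, drop whole data objects.
for ev in skimmed:
    ev["jets"] = [j for j in ev["jets"] if j["pt"] > 30.0]

# Slimming: within each kept object, drop unneeded data members.
for ev in skimmed:
    ev["jets"] = [{"pt": j["pt"], "eta": j["eta"]} for j in ev["jets"]]
```

Each step reduces the data volume along a different axis: events, objects per event, and members per object.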
What Type of Jobs You Can Run
Athena jobs with official production transformations: Event generation, Simulation, Pileup, Digitization, Reconstruction, Merge
General jobs (non-Athena type analysis): ROOT (CINT, C++, PyROOT)
ARA (AthenaRootAccess)
Python, user's executable, shell script, etc.
Jobs with multiple input streams (e.g. the reco trf): cavern, minimum-bias, beam-halo, beam-gas input
TAG selection jobs
Jobs with nightly builds
Jobs with an arbitrary DBRelease (a database release contains Conditions, Geometry and Trigger data)
More info about DBReleases is in backup slides
etc.
How to Find Your Input Data
AMI
dq2 end-user tools
ELSSI (Event Level Selection Service Interface)
What is AMI
ATLAS Metadata Interface.
A generic cataloging framework: Dataset discovery tool
Tag Collector tool (release management tool)
Where does AMI get its data?
From real data: DAQ data from the Tier-0
From Monte Carlo and reprocessing:
pulled from the Task Request Database: tasks, dataset names, Monte Carlo and reprocessing configuration tags
pulled from the Production Database: finished tasks – files and metadata
From physicists:
Monte Carlo input files needed for event generation
Monte Carlo dataset number info, physics-group owner, …
corrected cross sections and comments
DPD tags
AMI Portal Page - http://ami.in2p3.fr
There is also a read-only server at CERN: http://atlas-ami.cern.ch
7 TeV Datasets
AMI Tutorial Page
AMI Fast Tutorial Page
Simple Search in AMI – search by name
Type here to search for the latest 7 TeV collision datasets: data10_7TeV%physics%MinBias%AOD%
Simple Search in AMI – various useful links
Group by
Apply filter
Simple Search in AMI – DQ2 link
By clicking on the DQ2 link:
Always use the merged (container) datasets, whose names end with '/'
Simple Search in AMI – PANDA link
By clicking on the PANDA link:
Simple Search in AMI - interpretation of tags (1)
Simple Search in AMI – Interpretation of tags (2)
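The dotted structure of a dataset name can be unpacked mechanically. A rough sketch (the field labels follow the usual project.runNumber.stream.prodStep.dataType.tags convention and are an assumption of this illustration, not an official parser):

```python
# Split an ATLAS dataset name into its dotted fields.
# The field labels below follow the common naming convention;
# this is an illustrative sketch, not an official tool.

def parse_dataset_name(name):
    name = name.rstrip("/")  # container dataset names end with '/'
    keys = ["project", "run_number", "stream", "prod_step", "data_type", "tags"]
    parsed = dict(zip(keys, name.split(".")))
    parsed["tags"] = parsed["tags"].split("_")  # "f241_p115" -> ["f241", "p115"]
    return parsed

info = parse_dataset_name(
    "data10_7TeV.00152489.physics_MinBias.merge.AOD.f241_p115/")
```

For the example dataset this yields project data10_7TeV, run number 00152489, stream physics_MinBias, production step merge, data type AOD and the tags f241 and p115.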
Simple Search in AMI – Run Summary
By clicking on the Run_Summary link:
Simple Search in AMI – Run Queries
By clicking on the Run_Query link:
dq2 End-User Tools (1)
Users interact with the DDM system via the dq2 end-user tools: querying, retrieving and creating datasets,
requesting dataset replication, deleting datasets, etc.
How to set up (on lxplus): source /afs/cern.ch/atlas/offline/external/GRID/ddm/DQ2Clients/setup.sh
voms-proxy-init --voms atlas
More info on the setup at: https://twiki.cern.ch/twiki/bin/view/Atlas/RegularComputingTutorial#Getting_ready_to_use_the_Grid
dq2 End-User Tools (2)
How to use:
List the available MinBias datasets in the DDM system (same search as in AMI):
dq2-ls 'data10_7TeV*physics*MinBias*'
Search for merged AODs in container datasets:
dq2-ls 'data10_7TeV*physics*MinBias*merge*AOD*'
Find the locations of a container dataset (a group of datasets, whose name ends with '/'):
dq2-list-dataset-replicas-container data10_7TeV.00152489.physics_MinBias.merge.AOD.f241_p115/
List the files in a container dataset:
dq2-ls -f data10_7TeV.00152489.physics_MinBias.merge.AOD.f241_p115/
Copy one file locally:
dq2-get -n 1 data10_7TeV.00152489.physics_MinBias.merge.AOD.f241_p115/
More info at: https://twiki.cern.ch/twiki/bin/view/Atlas/DQ2ClientsHowTo
https://twiki.cern.ch/twiki/bin/view/Atlas/DQ2Tutorial
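The '*' wildcards in the dq2-ls patterns above follow shell-style globbing, which can be reproduced locally with Python's fnmatch module (the small catalogue below is made up for illustration; only the first entry is a real dataset name taken from these slides):

```python
# Reproduce the dq2-ls wildcard matching locally with fnmatchcase.
# Catalogue entries other than the first are hypothetical.
from fnmatch import fnmatchcase

catalogue = [
    "data10_7TeV.00152489.physics_MinBias.merge.AOD.f241_p115/",
    "data10_7TeV.00152489.physics_MinBias.recon.ESD.f241/",   # hypothetical
    "mc09_7TeV.105001.pythia_minbias.merge.AOD.e517_s764/",   # hypothetical
]

pattern = "data10_7TeV*physics*MinBias*merge*AOD*"
matches = [d for d in catalogue if fnmatchcase(d, pattern)]
```

Only the merged-AOD dataset survives: the ESD entry fails the merge/AOD parts of the pattern and the MC entry fails the data10_7TeV prefix.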
DQ2ClientsHowTo
Extensive info here
https://twiki.cern.ch/twiki/bin/view/Atlas/DQ2ClientsHowTo
DQ2Tutorial https://twiki.cern.ch/twiki/bin/view/Atlas/DQ2Tutorial
ELSSI – Event Level Selection Service Interface
https://voatlas18.cern.ch/tagservices/index.htm
Goal: retrieve a TAG file from the TAG database
Define a query to select runs, streams, data quality, trigger chains, …
Review the query
Execute the query and retrieve the TAG file (a ROOT file) to be used in an Athena job
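The query ELSSI builds can be pictured as a filter over a table of event tags, each carrying a pointer back into the full event data. A minimal sketch, with entirely made-up tag records, trigger names and file names:

```python
# Toy event-level selection in the spirit of a TAG query.
# The tag records, trigger names and file names are all hypothetical.

tags = [
    {"run": 152489, "stream": "physics_MinBias", "triggers": ["L1_MBTS_1"],
     "pointer": ("dsA", "file1.root", 17)},  # (dataset, file, event index)
    {"run": 152489, "stream": "physics_MinBias", "triggers": ["L1_EM3"],
     "pointer": ("dsA", "file2.root", 4)},
]

def select(tags, run, trigger_chain):
    """Return pointers to the events in a run that fired a given trigger."""
    return [t["pointer"] for t in tags
            if t["run"] == run and trigger_chain in t["triggers"]]

hits = select(tags, 152489, "L1_EM3")
```

The returned pointers are what make TAG-based access fast: the Athena job only has to open the files and events they reference.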
Tips for Submitting Your Jobs
Always test your job locally before submitting to the Grid
Use the latest version of pathena/Ganga
Submit your jobs on container datasets (merged)
If your dataset is on tape, you need to request data replication to disk storage first
Do not specify any site name in your job submission, pathena/Ganga will choose the best site available
If you are using a new release that is not meant to be used for user analysis (e.g. cosmic data, Tier-0 reconstruction), it will not be available at all sites. In general you can check the release and installation status at:
http://atlas-computing.web.cern.ch/atlas-computing/projects/releases/status/
https://atlas-install.roma1.infn.it/atlas_install/
How To Check Status of ATLAS Releases
http://atlas-computing.web.cern.ch/atlas-computing/projects/releases/status/
How to Check Status of Installation of Releases at Sites (1)
https://atlas-install.roma1.infn.it/atlas_install/
Choose here 15.6.8.2
How to Check Status of Installation of Releases at Sites (2)
https://atlas-install.roma1.infn.it/atlas_install/list.php?rel=15.6.8.2
Tips for Retrieving Your Outputs
If everything went fine, you need to retrieve your outputs from the Grid
Decide where and how you will store your output datasets:
Request data replication: your output files stay as a dataset on the Grid. You need to freeze the dataset before requesting replication.
Download to your local disk using dq2-get: by default, user datasets are created on the _SCRATCHDISK area at the site where the jobs ran. All datasets on _SCRATCHDISK are deleted after a certain period (~30 days), so if you expect to use them on the Grid later, consider requesting replication.
No more than 30-40 GB/day can be copied with dq2-get
Details at: https://twiki.cern.ch/twiki/bin/view/Atlas/DQ2ClientsHowTo#AfterCreatingDataset
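The 30-40 GB/day figure makes it easy to estimate whether dq2-get is practical for a given output. A back-of-the-envelope sketch (the dataset size below is invented):

```python
# Rough wall-time estimate for dq2-get, using the conservative end of
# the ~30-40 GB/day limit quoted above. The dataset size is made up.
import math

dataset_size_gb = 120.0    # hypothetical output dataset
rate_gb_per_day = 30.0     # conservative end of the quoted range

days = math.ceil(dataset_size_gb / rate_gb_per_day)
```

A 120 GB dataset would take about four days to pull locally, which is usually a sign that replication to a Grid disk area is the better option.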
Tips for Debugging Failures
Try to understand whether the error is job-related or site-related
The job log files tell you most of the time
If the site went down while your jobs were being executed, you can check its status in the ATLAS Computer Operations Logbook: https://prod-grid-logger.cern.ch/elog/ATLAS+Computer+Operations+Logbook/
If input files are corrupted at a given site, you can exclude the site (until they are re-copied) and notify DAST (the Distributed Analysis Support Team)
Look at the FAQ (Frequently Asked Questions) section of the pathena/Ganga twiki pages for most common problems and their solutions: https://twiki.cern.ch/twiki/bin/view/Atlas/PandaAthena#FAQ
https://twiki.cern.ch/twiki/bin/view/Atlas/DAGangaFAQ
If you cannot find your problem listed there, search the archive of the distributed-analysis-help forum in e-groups: https://groups.cern.ch/group/hn-atlas-dist-analysis-help/default.aspx
If you still need help, send a message to the e-groups, DAST and other users will help you: [email protected]
Does your job require conditions database access?
If you are running at CERN or at Tier1's, you may not even have noticed it: they provide direct access to the Oracle databases. You may have seen occasional overload problems (errors in the job logs).
What is stored in Oracle databases: The geometry and most of the conditions data
LAr calibrations and InDet alignments are too large to be stored effectively in Oracle; they are stored as POOL files and replicated to all Tier1’s (soon to Tier2’s)
If your job runs at Tier2/Tier3's, remote access to the Oracle databases is possible.
Solution for users: provide the conditions DB release together with the job:
A DB release is an extraction of the needed constants into a tar file that is copied to the worker node and accessed locally. Available as a dataset in the DDM system.
--dbRelease NameOfDataset (in pathena/Ganga)
Now in place: user jobs access the Oracle databases through FroNtier/Squid caches.
Jobs can run anywhere without configuration changes.
This solves the latency problems for jobs running at Tier2/Tier3's.
More info about databases at: https://twiki.cern.ch/twiki/bin/view/Atlas/AtlasDBRelease https://twiki.cern.ch/twiki/bin/view/Atlas/AthenaDBAccess
ATLAS Computer Operations Logbook – site/service problems reported
https://prod-grid-logger.cern.ch/elog/ATLAS+Computer+Operations+Logbook/
pathena FAQ
https://twiki.cern.ch/twiki/bin/view/Atlas/PandaAthena#FAQ
Ganga FAQ
https://twiki.cern.ch/twiki/bin/view/Atlas/DAGangaFAQ
User Support (1)
https://twiki.cern.ch/twiki/bin/view/Atlas/AtlasDAST
DAST members
EU time zone            NA time zone
------------------------------------------------
Daniel van der Ster     Nurcan Ozturk
Mark Slater             Alden Stradling
Hurng-Chun Lee          Sergey Panitkin
Bjorn Samset            Kamile Yagci
Christian Kummer        Bill Edson
Maria Shiyakova         Wensheng Deng
Jaroslava Schovancova   Manuj Jha
Karl Harrison
Elena Oliver Garcia
------------------------------------------------
User Support (2)
DAST started in September 2008 for a combined support of pathena and Ganga users
First point of contact for distributed analysis questions
All kinds of problems are discussed, not just pathena/Ganga-related ones:
analysis-tools-related problems
DDM-related problems
and athena-related problems
15-hour coverage with 2 people on shift (one in the NA and one in the EU time zone). The plan is to have 2 people in each time zone.
DAST helps directly by solving the problem or escalating to relevant experts
More shifters and more user-to-user support are needed
Shift work counts to OTSMOU credit (Category-2 shifts currently)
User feedback is extremely useful to debug the distributed analysis tools and explore the features pathena/Ganga have to offer. So feel free to write to this forum
Pathena Example on 7 TeV Collision Data
Everyone needs to complete this tutorial: https://twiki.cern.ch/twiki/bin/view/Atlas/RegularComputingTutorial
Set up CMT and athena as explained above. You can use the latest release (production cache): source cmthome/setup.sh -tag=15.6.8.2,AtlasProduction,setup,32
Use an ntuple (D3PD) making package to run on AODs. Example: the SUSYD3PDMaker package https://twiki.cern.ch/twiki/bin/view/AtlasProtected/SUSYD3PDMaker
Get one file locally to test that athena runs fine: dq2-get -n 1 data10_7TeV.00152489.physics_MinBias.merge.AOD.f241_p115/
Configure the job option file and run athena locally as explained in the SUSYD3PDMaker wiki page: athena SUSYD3PDMaker_topOptions.py
Set up pathena on lxplus, or install it yourself as explained on the pathena wiki page: source /afs/cern.ch/atlas/offline/external/GRID/DA/panda-client/latest/etc/panda/panda_setup.sh https://twiki.cern.ch/twiki/bin/view/Atlas/DAonPanda
Submit your job to the Grid with pathena:
pathena --inDS data10_7TeV.00152489.physics_MinBias.merge.AOD.f241_p115/ --outDS user10.NurcanOzturk.trgrid.test SUSYD3PDMaker_topOptions.py
Monitor your job in the Panda monitor or with pbook (the bookkeeping tool for pathena). Once your job is completed, get your output (root files and log files) to your local machine:
dq2-get user10.NurcanOzturk.trgrid.test
Open ROOT and plot some histograms to see what the real data looks like.
Use analysis codes (D3PD readers) to do more detailed analysis on the ntuples. Codes can be based on C++ (example in the SUSYD3PDMaker wiki page), python (for instance SPyRoot), etc.