ATLAS Analysis Overview
Eric Torrence, University of Oregon/CERN
10 February 2010, Atlas Offline Software Tutorial
Outline
This talk is intended to give a broad outline of how analyses within Atlas should be done for first data.
Technical details will be covered by other speakers
• ATLAS data types and locations
• Derived samples (dAOD/dESD)
• Data sample selection
• Luminosity
AMFY Task Force: J. Boyd, A. Gibson, S. Hassani, R. Hawkings, B. Heinemann, S. Paganis, J. Stelzer, T. Wengler

Analysis Model for the First Year, final report October 2009
See talk by Thorsten Wengler at Barcelona Atlas Week
http://indico.cern.ch/materialDisplay.py?contribId=73&sessionId=11&materialId=slides&confId=47256
Or the final AMFY Report: http://cdsweb.cern.ch/record/1223952
Analysis Data Formats
• RAW - specialized detector studies (e.g.: alignment)
• ESD - detector performance studies
• AOD - starting point for physics analyses (~1/10 of ESD)

All events processed from RAW to AOD
Processing separated by trigger-based data STREAM

• Egamma
• Jet - Tau - MET
• Muon
• Minbias
• Bphysics
Streams are inclusive, e.g. : All electron triggers written to e/gamma stream
[Diagram: Physics Streams - RAW → ESD → AOD, one chain per stream]
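The inclusive-stream rule above can be sketched as a trigger-to-stream lookup. The mapping below is hypothetical (real stream assignment is done by the trigger/DAQ system); the trigger names reuse examples from later slides:

```python
# Hypothetical trigger -> stream mapping; an event is written to every
# stream whose triggers it fired, so streams overlap (inclusive).
trigger_to_stream = {
    "EF_e20_loose": "Egamma",
    "EF_mu10": "Muon",
    "EF_j50": "JetTauEtmiss",
}

def streams_for(fired_triggers):
    """Return the sorted list of streams an event is written to."""
    return sorted({trigger_to_stream[t] for t in fired_triggers
                   if t in trigger_to_stream})

print(streams_for(["EF_e20_loose", "EF_mu10"]))  # ['Egamma', 'Muon']
```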
5Eric Torrence February 2010
Derived Samples
• Derived samples (dESD, dAOD) aim to reduce sample size further by removing events (skimming) and/or removing info (slimming/thinning)
• Many versions/variants possible for specific uses
• Skimming based on offline quantities and/or trigger
[Diagram: AOD (+AODFix) → dAOD/dESD, plus specialised RAW samples; central production, group production, small group/user]
dAOD (Physics DPD) - e.g.: diMuon (2 mu, pT > 15 GeV)
dESD (Perf. DPD) - e.g.: eGamma (CaloCells near e/gamma triggers)
Currently popular: dESD_COLLCAND
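A skim like the diMuon dAOD above boils down to an event-level predicate. A minimal sketch, assuming a hypothetical event representation where each event carries its muon pT values in GeV:

```python
def passes_dimuon_skim(muon_pts, min_muons=2, pt_cut=15.0):
    """Keep the event if at least `min_muons` muons exceed `pt_cut` GeV."""
    return sum(1 for pt in muon_pts if pt > pt_cut) >= min_muons

events = [
    {"muon_pts": [22.0, 18.5]},   # passes: two muons above 15 GeV
    {"muon_pts": [40.0]},         # fails: only one muon
    {"muon_pts": [12.0, 9.0]},    # fails: muons too soft
]
skimmed = [ev for ev in events if passes_dimuon_skim(ev["muon_pts"])]
print(len(skimmed))  # 1
```

Note that the skimming efficiency of such a cut must later enter εsel in the cross-section formula.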
User Data
[Diagram: RAW/ESD/AOD → dAOD/dESD (+AODFix) → ntuples, for performance and physics analysis]
• ESD, AOD, dESD - stored on Grid in ATLAS space; production managed centrally
• dAOD, ntuples - stored in group disk or local user space; managed by groups/users
Each group has a grid space manager and a production manager. It is good to find out who these people are...
Databases and Metadata
[Diagram: Oracle server at nearest Tier-1, accessed by Tier-2/Tier-3, desktop, and laptop]
Analysis model assumes any Athena job (ESD/dESD, AOD,
dAOD) needs access to conditions data (e.g.: COOL)
• Detector geometry, conditions, alignments
• Data quality and trigger configuration metadata
• InSituPerformance information
• Dataset information (AMI)
• TAG database (TAG)
Even a purely ntuple-based analysis may need some of this.
Potential bottleneck and hassle for end users; running on MC is typically easier than on data
Be aware of external files referenced by DB...
Selecting your data
• Time range/machine conditions
- Collision Energy: 7 TeV
- All 2010 data approved for Summer conferences
• Detector/offline data quality (DQ flags)
- Data periods with good electrons in barrel
- Data with pixels turned on
• Trigger Configuration
- EF_e20_loose active
- Both e20 and mu10 active and unprescaled
All analyses must define their data sample - a key ingredient in defining the luminosity
Luminosity Blocks
• ATLAS runs are subdivided into Luminosity Blocks (~1 min)
• LB is the atomic unit for selecting a data sample
• Most conditions are mapped to specific luminosity blocks
- Trigger pre-scales (can only change at LB boundary)
- Data Quality flags (mapped to LBs offline)
• Luminosity can only be determined for a specific set of runs/luminosity blocks

Specifying your data sample functionally means specifying a list of runs/LBs to analyze.
aka: GoodRunList
[Illustration: Run 165789, luminosity blocks 1-10]
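A GoodRunList is, in effect, a map from run number to LB ranges. A minimal in-memory sketch (the run number is the example from this slide; real lists come out of RunQuery/DQ tools as XML):

```python
# Hypothetical GRL: run number -> list of inclusive (first_LB, last_LB) ranges.
grl = {165789: [(1, 10)]}

def in_grl(grl, run, lb):
    """True if the (run, luminosity block) pair is inside the Good Run List."""
    return any(lo <= lb <= hi for (lo, hi) in grl.get(run, []))

print(in_grl(grl, 165789, 4))   # True
print(in_grl(grl, 165789, 11))  # False (LB outside the range)
print(in_grl(grl, 152166, 1))   # False (run not in the list)
```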
Data Quality Flags

• DQ flags are simple indicators of data quality, but there are many sources (I counted 102...)
- Detectors (divided by barrel, 2 endcaps)
- Trigger (by slice)
- Offline combined perf. (e/mu/tau/jet/MET/...)
• Physics analyses should not arbitrarily choose DQ flags
• Combined performance groups will define a recommended set of DQ flag criteria for physics objects (e.g.: Barrel electrons or forward jets) - soon with virtual flags
• Users in their working group will decide which set of (virtual) DQ flags is needed for each analysis, which can then be used to generate a GoodRunList
• Standard lists will be centrally produced by DQ group
Reprocessing

• Data processed at Tier 0 (right off the detector) have preliminary calibrations and first-pass DQ flags
• Any physics results must start with reprocessed data
- Updated calibrations
- Consistent release/bug fixes
• DQ flags are re-evaluated, tagged and locked after each reprocessing - flags fixed, but must use correct tag!
• It is important to always access DQ information from the COOL database, using the tag appropriate to your data processing version; otherwise you won't get consistent results
Best to use officially generated (static) Good Run Listsappropriate for your reprocessed data
Luminosity Calculation
• In ATLAS, Lumi is calculated for a specific set of LBs and includes LB-dependent corrections for
- Trigger prescales
- L1 Trigger Deadtime
• The user-derived εsel must contain all other event-dependent efficiencies, including
- Unprescaled trigger efficiency (vs. pT for example)
- Skimming efficiency (in dAOD production)
- TAG selection efficiency (in TAG-based analyses)
- Event selection cuts
σ = (N_sel - N_bgd) / (ε_sel · Lumi)
The final luminosity can only be calculated after the full GRL and trigger selection are specified
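A worked example of the cross-section formula above, with made-up numbers (the counts, efficiency, and luminosity are purely illustrative):

```python
n_sel = 1250.0    # selected events in the GRL
n_bgd = 250.0     # estimated background events
eps_sel = 0.80    # product of all event-level efficiencies (trigger, skim, cuts, ...)
lumi = 2500.0     # integrated luminosity in nb^-1 for the chosen runs/LBs

# sigma = (N_sel - N_bgd) / (eps_sel * Lumi)
sigma = (n_sel - n_bgd) / (eps_sel * lumi)
print(sigma)  # 0.5 (nb)
```

The luminosity already includes the prescale and deadtime corrections, so only the remaining efficiencies go into εsel; double-counting a correction in both places is a common mistake.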
Example Physics Analysis Models

• Direct AOD/dAOD
- User submitted Grid job run over all AODs or group-generated dAOD/dESD samples
- Final user ntuples produced directly
• TAG-based analysis
- Start with TAG-selected sample (see TAG tutorial)
- Saves having to run over large datasets
• Ntuple-based analysis
- Start with a general purpose group ntuple where not all data selection criteria have been applied
- Probably most complicated, but tools support this also
We will concentrate on the first case; the other cases are not so different
AOD/dAOD analysis
• Pre-run query based on input parameters already defines GRL (instantiated as an XML file)
• atlas-runquery (or TAG browser) currently best way to do this - eventually most users will start with pre-made lists
• Can find 'expected' luminosity already without running a single job
• Needs COOL access, but can just be run once, XML GRL files can be transferred to your local area/laptop
[Diagram: pre-run query - time range, DQ query, trigger, ... → data sample definition → RunQuery (COOL) → GRL (XML file) → LumiCalc (COOL) → expected luminosity]
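Since the GRL is instantiated as an XML file, it can be inspected locally with the standard library. A sketch of pulling the runs/LBs out of one; the element names below (LumiBlockCollection, Run, LBRange) follow the common ATLAS GRL layout, but check them against your actual file:

```python
import xml.etree.ElementTree as ET

# Inline stand-in for a GRL XML file produced by RunQuery.
grl_xml = """<LumiRangeCollection>
  <NamedLumiRange>
    <LumiBlockCollection>
      <Run>165789</Run>
      <LBRange Start="1" End="10"/>
    </LumiBlockCollection>
  </NamedLumiRange>
</LumiRangeCollection>"""

root = ET.fromstring(grl_xml)
ranges = {}  # run -> list of (first_LB, last_LB)
for coll in root.iter("LumiBlockCollection"):
    run = int(coll.findtext("Run"))
    for lbr in coll.findall("LBRange"):
        ranges.setdefault(run, []).append(
            (int(lbr.get("Start")), int(lbr.get("End"))))

print(ranges)  # {165789: [(1, 10)]}
```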
AOD/dAOD analysis
• No automatic tool (yet) to convert a GRL to an AOD DataSet (although TAG does this for you)
• Using standard tools, you will get an output GRL - very useful to compare against input for cross-checks
• Copy of GRL will also be saved to your output ntuple if made with the PAT Ntuple Dumper*
[Diagram: input GRL (XML file) + DataSet (AOD files) → distributed analysis on the Grid (AMI, Ganga/pAthena) → jobs → output merge → ntuple with GRL + output GRL (XML file)]
AOD/dAOD analysis
• Comparing input and output XML files very useful to check for job failures and dataset consistency
• Output shows exactly which LBs were analyzed
- run LumiCalc directly on ntuple (with COOL access)
- extract XML file, copy to CERN, and run LumiCalc there
[Diagram: bookkeeping - compare the input GRL (XML file) against the output GRL (XML file) and the ntuple GRL; run LumiCalc (COOL) on either to get the observed luminosity]
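The input-vs-output cross-check amounts to expanding both lists to sets of (run, LB) pairs and diffing them; any difference flags failed jobs or missing files. A sketch, using the same run-to-ranges dict representation as above (hypothetical, not the real tool's data model):

```python
def expand(grl):
    """Expand {run: [(lo, hi), ...]} into a set of (run, LB) pairs."""
    return {(run, lb)
            for run, ranges in grl.items()
            for lo, hi in ranges
            for lb in range(lo, hi + 1)}

input_grl = {165789: [(1, 8)]}
output_grl = {165789: [(1, 5)]}   # e.g. the job covering LBs 6-8 crashed

missing = expand(input_grl) - expand(output_grl)
print(sorted(missing))  # [(165789, 6), (165789, 7), (165789, 8)]
```

An empty difference means the output GRL (and hence the observed luminosity) covers exactly what was requested.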
Sparse Data Problem
• Even if no events are selected from an LB, the LB still contributes to the luminosity
• GRL selection tool outputs LB metadata (also in ntuple) even when no events selected
• Derived (dAOD) samples must also obey this requirement
• Must merge all job output to correctly include all LBs considered - working to include this in the standard distributed analysis merger
[Illustration: Grid job output - job 1 (input LB 1-3, 1 event), job 2 (LB 4-5, 0 events), job 3 (LB 6-8) crashed; total LBs analyzed: 1-5]
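The merge logic of this example can be sketched directly: only jobs that actually finished contribute their LBs, and a job with zero selected events still counts (the job names and dict layout are hypothetical):

```python
# Per-job LB coverage, mirroring the slide's example.
job_coverage = {
    "job1": {"lbs": range(1, 4), "events": 1, "ok": True},
    "job2": {"lbs": range(4, 6), "events": 0, "ok": True},   # 0 events still counts
    "job3": {"lbs": range(6, 9), "events": 0, "ok": False},  # crashed: excluded
}

analyzed = sorted(lb for job in job_coverage.values()
                  if job["ok"] for lb in job["lbs"])
print(analyzed)  # [1, 2, 3, 4, 5]
```

Dropping job2's metadata just because it selected no events would silently inflate the luminosity attributed to the surviving sample.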
Other use cases

• Easiest to make the entire selection (complete GRL specification) before launching an AOD/dAOD job

• For many reasons, people may start with samples with partial (or no) GRL specifications applied
- skimmed group dAODs
- group ntuples
- TAG-selected data*
• Scheme still works, as long as samples have been produced with the standard tools - metadata about all LBs contributing to the sample still present and consistent
• Users must still apply final GRL specification (including trigger) to create sample with well-defined luminosity
• Tools also work at the ntuple level, provided the run/LB numbers are saved
Conclusions
• The analysis model for the first year has undergone considerable discussion and development recently - there may still be some rough edges
• A general scheme for simple cases now exists
• Work ongoing to support more advanced use cases
• Tools are available to make this easier for the end user - many more technical details shown today
• There are many ways to screw this up, use central tools and central GRLs as much as possible
• If the tools don't seem to do what you want, let a developer know!