ATLAS Analysis Overview
Eric Torrence, University of Oregon/CERN
10 February 2010, Atlas Offline Software Tutorial
Outline
This talk is intended to give a broad outline of how analyses within Atlas should be done for first data.
Technical details will be covered by other speakers
• ATLAS data types and locations
• Derived samples (dAOD/dESD)
• Data sample selection
• Luminosity
AMFY Task Force: J. Boyd, A. Gibson, S. Hassani, R. Hawkings, B. Heinemann, S. Paganis, J. Stelzer, T. Wengler

Analysis Model for the First Year, final report October 2009
See talk by Thorsten Wengler at Barcelona Atlas Week
http://indico.cern.ch/materialDisplay.py?contribId=73&sessionId=11&materialId=slides&confId=47256
Or the final AMFY Report: http://cdsweb.cern.ch/record/1223952
Analysis Data Formats
• RAW - specialized detector studies (e.g.: alignment)
• ESD - detector performance studies
• AOD - starting point for physics analyses (~1/10 of ESD)

All events processed from RAW to AOD
Processing separated by trigger-based data STREAM

• Egamma
• Jet - Tau - MET
• Muon
• Minbias
• Bphysics
Streams are inclusive, e.g. : All electron triggers written to e/gamma stream
[Diagram: Physics Streams - RAW → ESD → AOD, one chain per stream]
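The inclusive-stream rule above can be sketched as a trigger-to-stream lookup. The mapping below is hypothetical (real stream assignment is done by the trigger/DAQ system); the trigger names reuse examples from later slides:

```python
# Hypothetical trigger -> stream mapping; an event is written to every
# stream whose triggers it fired, so streams overlap (inclusive).
trigger_to_stream = {
    "EF_e20_loose": "Egamma",
    "EF_mu10": "Muon",
    "EF_j50": "JetTauEtmiss",
}

def streams_for(fired_triggers):
    """Return the sorted list of streams an event is written to."""
    return sorted({trigger_to_stream[t] for t in fired_triggers
                   if t in trigger_to_stream})

print(streams_for(["EF_e20_loose", "EF_mu10"]))  # ['Egamma', 'Muon']
```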
5Eric Torrence February 2010
Derived Samples
• Derived samples (dESD, dAOD) aim to reduce sample size further by removing events (skimming) and/or removing info (slimming/thinning)
• Many versions/variants possible for specific uses
• Skimming based on offline quantities and/or trigger
[Diagram: AOD (+AODFix) → dAOD/dESD, plus specialised RAW samples; central production, group production, small group/user]
dAOD (Physics DPD) - e.g.: diMuon (2 mu, pT > 15 GeV)
dESD (Perf. DPD) - e.g.: eGamma (CaloCells near e/gamma triggers)
Currently popular: dESD_COLLCAND
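A skim like the diMuon dAOD above boils down to an event-level predicate. A minimal sketch, assuming a hypothetical event representation where each event carries its muon pT values in GeV:

```python
def passes_dimuon_skim(muon_pts, min_muons=2, pt_cut=15.0):
    """Keep the event if at least `min_muons` muons exceed `pt_cut` GeV."""
    return sum(1 for pt in muon_pts if pt > pt_cut) >= min_muons

events = [
    {"muon_pts": [22.0, 18.5]},   # passes: two muons above 15 GeV
    {"muon_pts": [40.0]},         # fails: only one muon
    {"muon_pts": [12.0, 9.0]},    # fails: muons too soft
]
skimmed = [ev for ev in events if passes_dimuon_skim(ev["muon_pts"])]
print(len(skimmed))  # 1
```

Note that the skimming efficiency of such a cut must later enter εsel in the cross-section formula.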
User Data
[Diagram: RAW/ESD/AOD → dAOD/dESD (+AODFix) → ntuples, for performance and physics analysis]
• ESD, AOD, dESD - stored on Grid in ATLAS space; production managed centrally
• dAOD, ntuples - stored in group disk or local user space; managed by groups/users
Each group has a grid space manager and a production manager. It is good to find out who these people are...
Databases and Metadata
[Diagram: Oracle server at nearest Tier-1, accessed by Tier-2/Tier-3, desktop, and laptop]
Analysis model assumes any Athena job (ESD/dESD, AOD,
dAOD) needs access to conditions data (e.g.: COOL)
• Detector geometry, conditions, alignments
• Data quality and trigger configuration metadata
• InSituPerformance information
• Dataset information (AMI)
• TAG database (TAG)
Even a purely ntuple-based analysis may need some of this.
Potential bottleneck and hassle for end users; running on MC is typically easier than on data
Be aware of external files referenced by DB...
Selecting your data
• Time range/machine conditions
- Collision Energy: 7 TeV
- All 2010 data approved for Summer conferences
• Detector/offline data quality (DQ flags)
- Data periods with good electrons in barrel
- Data with pixels turned on
• Trigger Configuration
- EF_e20_loose active
- Both e20 and mu10 active and unprescaled
All analyses must define their data sample - a key ingredient in defining the luminosity
Luminosity Blocks
• ATLAS runs are subdivided into Luminosity Blocks (~1 min)
• LB is the atomic unit for selecting a data sample
• Most conditions are mapped to specific luminosity blocks
- Trigger pre-scales (can only change at LB boundary)
- Data Quality flags (mapped to LBs offline)
• Luminosity can only be determined for a specific set of runs/luminosity blocks

Specifying your data sample functionally means specifying a list of runs/LBs to analyze.
aka: GoodRunList
[Illustration: Run 165789, luminosity blocks 1-10]
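A GoodRunList is, in effect, a map from run number to LB ranges. A minimal in-memory sketch (the run number is the example from this slide; real lists come out of RunQuery/DQ tools as XML):

```python
# Hypothetical GRL: run number -> list of inclusive (first_LB, last_LB) ranges.
grl = {165789: [(1, 10)]}

def in_grl(grl, run, lb):
    """True if the (run, luminosity block) pair is inside the Good Run List."""
    return any(lo <= lb <= hi for (lo, hi) in grl.get(run, []))

print(in_grl(grl, 165789, 4))   # True
print(in_grl(grl, 165789, 11))  # False (LB outside the range)
print(in_grl(grl, 152166, 1))   # False (run not in the list)
```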
Data Quality Flags

• DQ flags are simple indicators of data quality, but there are many sources (I counted 102...)
- Detectors (divided by barrel, 2 endcaps)
- Trigger (by slice)
- Offline combined perf. (e/mu/tau/jet/MET/...)
• Physics analyses should not arbitrarily choose DQ flags
• Combined performance groups will define a recommended set of DQ flag criteria for physics objects (e.g.: Barrel electrons or forward jets) - soon with virtual flags
• Users in their working group will decide which set of (virtual) DQ flags is needed for each analysis, which can then be used to generate a GoodRunList
• Standard lists will be centrally produced by DQ group
Reprocessing

• Data processed at Tier 0 (right off the detector) have preliminary calibrations and first-pass DQ flags
• Any physics results must start with reprocessed data
- Updated calibrations
- Consistent release/bug fixes
• DQ flags are re-evaluated, tagged and locked after each reprocessing - flags fixed, but must use correct tag!
• It is important to always access DQ information from the COOL database, using the tag appropriate to your data processing version; otherwise you won't get consistent results
Best to use officially generated (static) Good Run Listsappropriate for your reprocessed data
Luminosity Calculation
• In ATLAS, Lumi is calculated for a specific set of LBs and includes LB-dependent corrections for
- Trigger prescales
- L1 Trigger Deadtime
• The user-derived εsel must contain all other event-dependent efficiencies, including
- Unprescaled trigger efficiency (vs. pT for example)
- Skimming efficiency (in dAOD production)
- TAG selection efficiency (in TAG-based analyses)
- Event selection cuts
σ = (N_sel - N_bgd) / (ε_sel · Lumi)
The final luminosity can only be calculated after the full GRL and trigger selection are specified
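A worked example of the cross-section formula above, with made-up numbers (the counts, efficiency, and luminosity are purely illustrative):

```python
n_sel = 1250.0    # selected events in the GRL
n_bgd = 250.0     # estimated background events
eps_sel = 0.80    # product of all event-level efficiencies (trigger, skim, cuts, ...)
lumi = 2500.0     # integrated luminosity in nb^-1 for the chosen runs/LBs

# sigma = (N_sel - N_bgd) / (eps_sel * Lumi)
sigma = (n_sel - n_bgd) / (eps_sel * lumi)
print(sigma)  # 0.5 (nb)
```

The luminosity already includes the prescale and deadtime corrections, so only the remaining efficiencies go into εsel; double-counting a correction in both places is a common mistake.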
Example Physics Analysis Models

• Direct AOD/dAOD
- User submitted Grid job run over all AODs or group-generated dAOD/dESD samples
- Final user ntuples produced directly
• TAG-based analysis
- Start with TAG-selected sample (see TAG tutorial)
- Saves having to run over large datasets
• Ntuple-based analysis
- Start with a general purpose group ntuple where not all data selection criteria have been applied
- Probably most complicated, but tools support this also
We will concentrate on the first case; the other cases are not so different
AOD/dAOD analysis
• Pre-run query based on input parameters already defines GRL (instantiated as an XML file)
• atlas-runquery (or TAG browser) currently best way to do this - eventually most users will start with pre-made lists
• Can find 'expected' luminosity already without running a single job
• Needs COOL access, but can just be run once, XML GRL files can be transferred to your local area/laptop
[Diagram: pre-run query - time range, DQ query, trigger, ... → data sample definition → RunQuery (COOL) → GRL (XML file) → LumiCalc (COOL) → expected luminosity]
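Since the GRL is instantiated as an XML file, it can be inspected locally with the standard library. A sketch of pulling the runs/LBs out of one; the element names below (LumiBlockCollection, Run, LBRange) follow the common ATLAS GRL layout, but check them against your actual file:

```python
import xml.etree.ElementTree as ET

# Inline stand-in for a GRL XML file produced by RunQuery.
grl_xml = """<LumiRangeCollection>
  <NamedLumiRange>
    <LumiBlockCollection>
      <Run>165789</Run>
      <LBRange Start="1" End="10"/>
    </LumiBlockCollection>
  </NamedLumiRange>
</LumiRangeCollection>"""

root = ET.fromstring(grl_xml)
ranges = {}  # run -> list of (first_LB, last_LB)
for coll in root.iter("LumiBlockCollection"):
    run = int(coll.findtext("Run"))
    for lbr in coll.findall("LBRange"):
        ranges.setdefault(run, []).append(
            (int(lbr.get("Start")), int(lbr.get("End"))))

print(ranges)  # {165789: [(1, 10)]}
```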
AOD/dAOD analysis
• No automatic tool (yet) to convert a GRL to an AOD DataSet (although TAG does this for you)
• Using standard tools, you will get an output GRL - very useful to compare against input for cross-checks
• Copy of GRL will also be saved to your output ntuple if made with the PAT Ntuple Dumper*
[Diagram: input GRL (XML file) + DataSet (AOD files) → distributed analysis on the Grid (AMI, Ganga/pAthena) → jobs → output merge → ntuple with GRL + output GRL (XML file)]
AOD/dAOD analysis
• Comparing input and output XML files very useful to check for job failures and dataset consistency
• Output shows exactly which LBs were analyzed
- run LumiCalc directly on ntuple (with COOL access)
- extract XML file, copy to CERN, and run LumiCalc there
[Diagram: bookkeeping - compare the input GRL (XML file) against the output GRL (XML file) and the ntuple GRL; run LumiCalc (COOL) on either to get the observed luminosity]
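The input-vs-output cross-check amounts to expanding both lists to sets of (run, LB) pairs and diffing them; any difference flags failed jobs or missing files. A sketch, using the same run-to-ranges dict representation as above (hypothetical, not the real tool's data model):

```python
def expand(grl):
    """Expand {run: [(lo, hi), ...]} into a set of (run, LB) pairs."""
    return {(run, lb)
            for run, ranges in grl.items()
            for lo, hi in ranges
            for lb in range(lo, hi + 1)}

input_grl = {165789: [(1, 8)]}
output_grl = {165789: [(1, 5)]}   # e.g. the job covering LBs 6-8 crashed

missing = expand(input_grl) - expand(output_grl)
print(sorted(missing))  # [(165789, 6), (165789, 7), (165789, 8)]
```

An empty difference means the output GRL (and hence the observed luminosity) covers exactly what was requested.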
Sparse Data Problem
• Even if no events are selected from an LB, the LB still contributes to the luminosity
• GRL selection tool outputs LB metadata (also in ntuple) even when no events selected
• Derived (dAOD) samples must also obey this requirement
• Must merge all job output to correctly include all LBs considered - working to include this in the standard distributed analysis merger
[Illustration: Grid job output - job 1 (input LB 1-3, 1 event), job 2 (LB 4-5, 0 events), job 3 (LB 6-8) crashed; total LBs analyzed: 1-5]
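The merge logic of this example can be sketched directly: only jobs that actually finished contribute their LBs, and a job with zero selected events still counts (the job names and dict layout are hypothetical):

```python
# Per-job LB coverage, mirroring the slide's example.
job_coverage = {
    "job1": {"lbs": range(1, 4), "events": 1, "ok": True},
    "job2": {"lbs": range(4, 6), "events": 0, "ok": True},   # 0 events still counts
    "job3": {"lbs": range(6, 9), "events": 0, "ok": False},  # crashed: excluded
}

analyzed = sorted(lb for job in job_coverage.values()
                  if job["ok"] for lb in job["lbs"])
print(analyzed)  # [1, 2, 3, 4, 5]
```

Dropping job2's metadata just because it selected no events would silently inflate the luminosity attributed to the surviving sample.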
Other use cases

• Easiest to make the entire selection (complete GRL specification) before launching an AOD/dAOD job

• For many reasons, people may start with samples with partial (or no) GRL specifications applied
- skimmed group dAODs
- group ntuples
- TAG-selected data*
• Scheme still works, as long as samples have been produced with the standard tools - metadata about all LBs contributing to the sample still present and consistent
• Users must still apply final GRL specification (including trigger) to create sample with well-defined luminosity
• Tools also work at the ntuple level, provided the run/LB numbers are saved
Conclusions
• The analysis model for the first year has undergone considerable discussion and development recently - there may still be some rough edges
• A general scheme for simple cases now exists
• Work ongoing to support more advanced use cases
• Tools are available to make this easier for the end user - many more technical details shown today
• There are many ways to screw this up, use central tools and central GRLs as much as possible
• If the tools don't seem to do what you want, let a developer know!