

Page 1

A. Gibson, Toronto; Jet/EtMiss DP; AMFY Status p. 1 September 1, 2009

Status of the Analysis Model for the First Year Task Force

Adam Gibson, University of Toronto

Jet/EtMiss Data Preparation Task Force Meeting, September 1, 2009

For the AMFY Task Force (J. Boyd, A. Gibson, S. Hassani, R. Hawkings, B. Heinemann, S. Paganis, J. Stelzer, chaired by T. Wengler)

But opinions expressed are sometimes my own. (See Thorsten’s talks for a more “official” view of preliminary conclusions.)

Page 2

Analysis Model for the First Year Task Force: Mandate

• The Analysis Model for the First Year (AMFY) task is to condense the current analysis model, building on existing work and including any necessary updates identified through the work of the TF, into concise recipes on how to do commissioning/performance/physics analysis in the first year. In particular it will:

• Identify the data samples needed for the first year, and how they derive from each other:
– How much raw data access is needed (centrally provided/sub-system solutions)
– How many different outputs, and of what type, will the Tier-0 produce
– Expected re-generation cycles for ESD/AOD/DPDs
– Types of processing to take place (ESD->PerfDPD, ESD->AOD, AOD->AOD, AOD->DPD, etc.)

• Related to the items above, address the following points:
– Are the Performance DPDs sufficient for all detector and trigger commissioning tasks (are changes to ESD needed)?
– What is the procedure for physics analysis in the first year in terms of data samples and common tools used (down to and including PerfDPDs and common D3PD generation tools), including both required and recommended items?
– How much physics will be done based on Performance DPDs?
– How will tag information be used?
– Every part of our processing chain needs validation; how will it be done?

• Scrutinise our current ability to do distributed analysis as defined in the computing model
• Match the items above to available resources (CPU/disk space/Tier-0/1/2 capabilities, etc.).


As listed on: https://twiki.cern.ch/twiki/bin/view/AtlasProtected/AnalysisModelFirstYear

Page 3

Mode of operation

• Small group of people from different areas in ATLAS; first meeting late May
– Full representation of all areas would make the task force unworkable
• Consulting widely in ATLAS
– Detector communities, Trigger, Performance groups, Physics groups, Computing, Selected Analyses, Individuals, University groups, etc.
• Identify and flag problems as we go along
• Not an implementation task force
– Problems are followed up in the appropriate groups (PAT, DPD Task Force, etc.)
• Rather practical approach; not just formats and tools
– How will we get from RAW data on disk to commissioning/performance/physics analyses?
• Updates from Thorsten at June and August Open Executive Board (EB) meetings
– http://indico.cern.ch/conferenceDisplay.py?confId=52946
– http://indico.cern.ch/conferenceDisplay.py?confId=52948
• Final report end of September 09
– Not necessarily a ToDo list, but hopefully a recipe of how to do analysis in the first year (with useful related details and many issues cleared up in the process of creating the recipes)

Page 4

How should a user analyse data

From the offline tutorial:

We should provide a recipe something like this, including relations to formats, lumi and DQ information, conditions data, etc.

Page 5

First Steps: Attempt to Define Current Model

• Tried to define the model as we saw it then
– Reach a common foundation for future discussion
• Flowcharts in several views:
– Format flow
– Conditions flow
– Reprocessing cycle
– Tools
– As from an analyst for study x, y, z, …
• Attempt to define current model didn’t really work
– Too many pieces currently in flux.
– A close look at any piece requires scrutiny of computing resources, production system, MC strategy, etc. (at least enough to understand the issues involved)
• Gave up on goal of defining “current” model
– Aim for a description, at the end of task force, of envisioned model for first year
• Probe assumptions, practicalities, and check it with as many people as possible
• Still working on distilling many detailed discussions into one high-level picture; this talk reflects some of that transition.

Page 6

Format Flow

[Figure: format-flow diagram, not reproduced here. A snapshot of one aspect of the (evolving) analysis model.]

Labels recoverable from the diagram:
• Online chain: FE, ROD, ROS, SFO, …
• Sites: Tier-0, CAF, sub-system resources, Tier-1, Tier-2 (x N)
• Formats shown: RAW, ESD, AOD, PerfDPD (x9), D1PD, TAG
• From the Tier-0 on, everything is stream-wise
• Merge Commissioning and Performance DPDs
• RAW data samples; “~10%” is marked twice near the CAF
• Commissioning ntuples (Comm. NT): Trigger/MinBias (from ESD), Muon Comm. (from RAW), Tracking Comm. (from RAW)
• One copy of RAW across all Tier-1s
• Two copies of ESD across all Tier-1s
• One copy of all active AOD/PerfDPD/D1PD sets across all Tier-2s of one cloud
• ESD/RAW data sets retrieved on request
• User/group data sets (i.e. D2PDs), MC produced on Tier-2, copied to Tier-1s
• Storage annotations: “on disk” (three times) and “on disk as long as possible”

Page 7

Issues under discussion

• Flow of conditions data
– Important for what type of job can be run where (e.g. analyze perfDPD’s on a laptop?)
– (Some AOD analysis may be possible without conditions information; but definitely needed for ESD analysis and to e.g. change jet calibrations or apply a new alignment)
– Copies of full Oracle DB at Tier-1’s
– CondDB access for PerfDPD analysis at Tier-2’s is a concern
– Our current favoured solution: Frontier/Squid caching
– Keep a cache of recently accessed conditions at Tier-2’s (and, if you like, on Tier-3’s, or even your personal computer)
– Applied successfully at e.g. CDF, CMS
– Testing in progress, some technical issues worked through
– Testing at multiple sites underway
– Hope for a widespread deployment this fall
– Backup solution: a “hero” effort of frequent sqlite file production?
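
The caching idea above can be illustrated with a small read-through cache. This is only a sketch of the general pattern, not the Frontier or Squid interface: the class `ConditionsCache`, the function `fake_central_db`, the folder name, and the run numbers are all hypothetical stand-ins.

```python
from typing import Callable, Dict, Tuple

# Hypothetical key layout: (conditions folder name, run number).
ConditionsKey = Tuple[str, int]


class ConditionsCache:
    """Read-through cache: contact the central database only on a cache miss."""

    def __init__(self, fetch_from_central_db: Callable[[str, int], dict]):
        self._fetch = fetch_from_central_db
        self._store: Dict[ConditionsKey, dict] = {}
        self.central_queries = 0  # counts how often the central database was hit

    def get(self, folder: str, run: int) -> dict:
        key = (folder, run)
        if key not in self._store:
            self.central_queries += 1
            self._store[key] = self._fetch(folder, run)
        return self._store[key]


def fake_central_db(folder: str, run: int) -> dict:
    # Stand-in for a query to a database copy at a Tier-1 (payload is invented).
    return {"folder": folder, "run": run, "payload": f"{folder}@{run}"}


cache = ConditionsCache(fake_central_db)
for _ in range(1000):  # many jobs at one site reading the same run
    cache.get("/hypothetical/JetCalib", 12345)
cache.get("/hypothetical/JetCalib", 12346)  # a new run causes one more query
print("central queries:", cache.central_queries)
```

In this toy setup a thousand identical requests reach the central database once, which is the load reduction a site-level cache is meant to provide.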

Page 8

Issues under discussion

• Use of TAGs
– ~1 kB/event containing general features of event
– Physics and cosmic/commissioning versions; ROOT files; DB and web interface
– Allows for quick identification of events, and fast access to particular events in other formats
– Should be extremely useful for finding unusual events, or accessing a particular run/event number to e.g. make an event display
– Mechanisms not extensively (user-) tested, as hardly needed for MC
– Would be helpful if they could point to events in all formats (RAW all the way to performance and physics DPD’s); some technical issues remain to be solved
– Could be used for large-scale sample selection, not just to select a few events
– Our current conclusion: TAG should be exercised on cosmic data and mixed MC samples now, to have them in regular use ASAP
– AMFY members are commissioning tests to:
• Use TAG ROOT files to select events.
• Use TAG database with web interface to select events.
• Select AOD events using physics TAG (on the grid).
• Select ESD and RAW events using commissioning TAG (on the grid).
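
As a rough illustration of TAG-style selection, the sketch below filters small per-event summary records and returns (run, event) pairs that could then be used to fetch the full events from another format. The record layout `TagRecord`, its field names, and the sample values are hypothetical; the real TAG content and tools are not shown here.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class TagRecord:
    """Hypothetical event summary, far smaller than the event itself."""
    run: int
    event: int
    n_jets: int
    missing_et_gev: float
    stream: str


def select_events(tags: Iterable[TagRecord],
                  min_missing_et_gev: float,
                  stream: str) -> List[Tuple[int, int]]:
    """Return (run, event) pairs passing a cut on the summary quantities."""
    return [(t.run, t.event)
            for t in tags
            if t.stream == stream and t.missing_et_gev > min_missing_et_gev]


# Invented sample records, for illustration only.
tags = [
    TagRecord(run=90001, event=17, n_jets=2, missing_et_gev=250.0, stream="physics"),
    TagRecord(run=90001, event=18, n_jets=0, missing_et_gev=5.0, stream="physics"),
    TagRecord(run=90002, event=3, n_jets=1, missing_et_gev=400.0, stream="cosmics"),
]

selected = select_events(tags, min_missing_et_gev=100.0, stream="physics")
print(selected)
```

The point of the pattern is that the cut runs over kilobyte-sized summaries, and only the selected run/event numbers are looked up in AOD, ESD, or RAW.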

Page 9

Example of (commissioning) TAG Use

Can also be used as a lightweight ntuple:

Analysis of the cosmic09 RNDM events with TAGs data
Pablo Barbero, Luca Fiorini; Barcelona
http://indico.cern.ch/conferenceDisplay.py?confId=59721

Page 10

[Figure: plots with axes “Average Sum Energy (MeV)” and “RMS Sum Energy (MeV)”, not reproduced here.]

Page 11

MET Tails

• 47 events with >1 TeV in LAr (data corruption issue, now, again, masked)
• Long, but continuous, tail in tile. Here to ~100 GeV.

Time Dependence of LAr Noise

• Shift in the EMB noise (or pedestals) on the time scale of ~1 hour.

Page 12

Tools: Common D3PD (ntuple) Makers

• Many steps in the analysis chain are well defined
– Everything down to AOD/D1PD level is centrally provided and produced
• Common D3PD (aka ntuple) maker under development
– A highly configurable, centrally supported ntuple dumper will be very beneficial (under construction in the PAT group)
– Based on StoreGate ntuple, no binding to any code producing the SG objects, therefore easy to use for many (all?) use cases
– Core code now exists; need to see what’s required from clients (jo’s, etc.)
– Offer to produce commissioning ntuples on the production system (e.g. Tier1’s) (some interest from tile, L1Calo, muons)
• L1Calo community is testing automated production (first with own tool, then will switch to new PAT tool)
– Physics and performance groups may eventually be interested in centralized production, but many prefer to keep close control for now

Page 13

Reprocessing Cycle

• Reprocessing cycle
– Unlikely that outputs directly from the Tier0 will be suitable for final results.
– How often, and from what format, should we reprocess the data?
– Other formats possible, but lowest level corrections (cable swaps, re-clustering in ID, even masking calorimeter cells) require reprocessing from RAW
– Should be able to do ~3 from RAW next year; do we need more?
• Can only keep ~2 versions of AOD on disk
• Limiting factor is likely to be validation effort
– Unless we use Tier0 cache based “fast” reprocessing as in July cosmics...
• Latest chunk of data is always ~ as good as a reprocessing
• Analysts need some stability to dig deep (physics and performance)
• Significant jumps in detector understanding need some time anyway
– Our preliminary conclusion: Only do reprocessing from RAW in 2009/10, foresee about 2-3 cycles, and validate these properly
– Role of “fast” reprocessing from Tier0 cache, a la June-July 2009 cosmics?

Page 14

Reprocessing Cycle, cont.: AODFix

• Reprocessing cycle cont.
– In the model above, could also deal with some part of the corrections with AOD-to-AOD corrections on the fly (AODfix mechanism)
• Frontier or SQLite file with DB updates + some routines for correction logic to be distributed with production caches
• To be called first in AOD job, delivers corrected quantities to subsequent chain
• Only works if corrections can be applied based on AOD information
– Reprocessing later merges such changes into a new (physical) version of AOD
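
The on-the-fly correction idea can be sketched as a chain in which correction routines run first and the analysis step only ever sees corrected quantities, while the stored events stay untouched. This is an illustration of the pattern only, not the AODfix implementation; the event layout, `scale_jet_pt`, `run_chain`, and the 2% scale factor are hypothetical.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Event = Dict[str, float]          # hypothetical flat event record
Fix = Callable[[Event], Event]    # a correction maps an event to a corrected copy


def scale_jet_pt(factor: float) -> Fix:
    """Build a correction that rescales the jet pT (illustrative only)."""
    def fix(event: Event) -> Event:
        corrected = dict(event)   # copy, so the stored event is never rewritten
        corrected["jet_pt"] = event["jet_pt"] * factor
        return corrected
    return fix


def run_chain(events: Iterable[Event],
              fixes: List[Fix],
              analysis: Callable[[Event], float]) -> Iterator[float]:
    """Apply all fixes first, then hand the corrected event to the analysis."""
    for event in events:
        for fix in fixes:
            event = fix(event)
        yield analysis(event)


stored_aod = [{"jet_pt": 50.0}, {"jet_pt": 100.0}]   # stands in for AOD on disk
fixes = [scale_jet_pt(1.02)]                          # invented 2% correction
results = list(run_chain(stored_aod, fixes, lambda e: e["jet_pt"]))
print(results)
```

Because each fix returns a copy, switching a correction off or replacing it needs no new version of the stored data, which is the property the slide relies on.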

Page 15

Data Access and Computing

• Estimated usage from detector, performance, and physics groups is mostly in line with expectations
– Dedicated calibration/alignment streams for analysis at CAF, dedicated Tier2 facilities
– A limited number of ntuples produced at Tier0, when speed’s critical or systematic RAW access is needed (trigger, muon, track validation)
– Lots of expected use of performance DPD’s (Tier2’s and downstream)
– Many requests for a modest amount of RAW data
• My favorite run or three to study in detail; cable swaps; validation of calculations in electronics, e.g. LAr DSP

Page 16

Data Access and Computing, cont.

• Sharing of CAF disk resources
– Calibration and alignment streams, Tier0 produced ntuples, a given
– Long assumed that we’d stream X% of the express stream (RAW and ESD?)
• Can be helpful for quick investigation of new, but not too rare, problem
• Sometimes just important to have “some RAW data”
• Perhaps only maximally utilized if automatic “Task Management System” is used
• Much interest in “on demand” access to RAW/ESD on CAF
– A popular model; a web form exists for requests
– Can they be approved automatically, below a certain threshold, or for certain users, etc.?
– How quickly is access possible? Only after end of run?
• Requests for RAW data of specific samples (CAF or group space)
– RAW data for all W/Z electrons
– RAW data for 100k high pt muons from early data
– RAW/ESD for 100k jets from early data for L1Calo timing studies
– Large amounts of RAW data in first weeks for HLT validation
– Significant amounts of ESD’s if perfDPD’s aren’t sufficiently flexible (e.g. jet calibrations, single beam, pre-HLT collision data)

Page 17

Data Access and Computing: Issues to be Discussed

• Model for access to RAW/ESD on Tier1’s?
– In the first year we can likely afford a certain amount of flexibility
– Not just for centralized reprocessing?
– Who can access data on Tier1’s? Users? Tier2/3 subscriptions? Detector, Performance, and/or Physics groups?
– In the first year will some access be allowed relatively freely? Or what procedures will be required?
– Important to understand before the data arrives.

Page 18

Performance DPD’s: Size Constraints

• So far, the goal is (total size of primary DPD’s) == (total size of AOD’s)
– ~90% of that space going to performance DPD’s (ESD information)
– That 90% is divided equally among nine performance DPD’s
• Strategic question: is the goal
– 1) Do photon-jet balancing on AOD’s; if problems are found, go to perfDPD to understand what’s happening, implement a fix, and then wait for a reprocessing?
– Or 2) Do full measurement with performance DPD?
– For case 1) you don’t need full statistics in the performance DPD; for case 2) you do
– If the size constraints are too tight for only a few measurements (e.g. jet calibrations), can we afford to allow them rather free access to ESD’s? Or can they use AOD’s (with no ability to e.g. recluster jets)?
• Space saving or redistribution possibilities could be implemented
– Possible to store fewer copies of performance DPD’s on grid? Is a copy per cloud more than we need? (Might the same be true even of AOD’s?)
– Or share space unequally among perfDPD’s?
– Or just allow them to be larger, if it reduces the need for ESD access?
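
A back-of-the-envelope version of this size budget is sketched below. It uses the nominal AOD (150 kB) and ESD (0.8 MB) event sizes quoted in the data-format reminder of this talk. The assumption that a kept event retains its full ESD content is mine, made only to get an order of magnitude; real performance DPD’s also slim the event content, so this is not the task force’s own calculation.

```python
# Nominal sizes quoted in this talk; 0.8 MB is taken as 800 kB.
AOD_KB_PER_EVENT = 150.0
ESD_KB_PER_EVENT = 800.0

PERF_DPD_SHARE = 0.90   # ~90% of the primary-DPD space goes to performance DPD's
N_PERF_DPDS = 9         # divided equally among nine performance DPD's

# Goal: total primary DPD size == total AOD size, so the budget can be
# expressed per recorded event in units of the AOD event size.
budget_per_dpd_kb = AOD_KB_PER_EVENT * PERF_DPD_SHARE / N_PERF_DPDS

# Assumption (illustrative): every kept event carries full ESD content.
max_kept_fraction = budget_per_dpd_kb / ESD_KB_PER_EVENT

print(f"budget per performance DPD: {budget_per_dpd_kb:.1f} kB per recorded event")
print(f"largest fraction of events kept at full ESD size: {max_kept_fraction:.2%}")
```

Under these assumptions each performance DPD gets about a tenth of the AOD volume and could keep only a few percent of events at full ESD size, which is why event selection or content slimming is unavoidable.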

Page 19

Performance DPD’s: Size Constraints, cont.

• Seems likely to be a significant issue for e.g. jet studies and calibrations
– Dijets (e.g. eta-dependent corrections from dijet balancing)
– Gamma-jet (e.g. absolute correction from photon-jet balancing)
– At first, it’s not at all clear we can afford to give up statistics at low PT; current DPD size requirements force us to give up ~98% of recorded events
• We should address this formally, and soon!
– Build a case starting from those doing jet studies and calibrations
• Is there a problem?
• Statistical error bars for different DPD scenarios?
– Feedback from users -> Jet Calibration and DP Task Force -> Jet/EtMiss conveners -> DPD Task Force, DP conveners, DPC community?
– Not so much time to find a solution…
• Possible solutions include:
– Full statistics calibrations always extracted from AOD’s? Or from ESD’s, but rather rarely, as part of a central request from Jet/EtMiss Group?
– Smarter DPD cuts?
– Somewhat larger jet DPD’s, perhaps based only at selected grid sites?
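
To make the question about statistical error bars concrete, the sketch below uses the standard 1/sqrt(N) scaling of a purely statistical uncertainty. It assumes independent events and ignores systematic effects and any reweighting for prescales; the scenario names and all fractions other than the ~2% implied by the slide are hypothetical.

```python
import math


def error_inflation(kept_fraction: float) -> float:
    """Factor by which a purely statistical error grows when only
    `kept_fraction` of the events is retained (error ~ 1/sqrt(N))."""
    if not 0.0 < kept_fraction <= 1.0:
        raise ValueError("kept_fraction must be in (0, 1]")
    return 1.0 / math.sqrt(kept_fraction)


# Hypothetical DPD scenarios; 0.02 corresponds to giving up ~98% of events.
scenarios = {
    "all events (ESD/AOD)": 1.00,
    "larger jet DPD": 0.10,
    "current size limit": 0.02,
}

for name, fraction in scenarios.items():
    print(f"{name:22s} keeps {fraction:5.0%} -> errors x{error_inflation(fraction):.1f}")
```

Keeping 2% of the events inflates statistical errors by about a factor of seven in this simple picture, so a calibration limited by low-PT statistics would need roughly fifty times more data to recover the same precision.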

Page 20

DPD Transition in Early Data

• Transition from detector-driven commissioning DPD’s to performance group-driven performance DPD’s
• Producing both is considered too heavy for the Tier0 (and inefficient)
• Are the Performance DPDs sufficient for all detector and trigger commissioning tasks?
– For detectors, possibly, if they can be made sufficiently flexible
– Run type: Cosmics -> single beam -> low lumi -> higher lumi (moving back and forth between these)
– Triggering: BPTX -> L1 menu -> HLT turns on -> evolving trigger menus with lumi
– Are the perfDPD trigger and stream selection and prescales flexible enough to handle this?
– Can the production system support that flexibility?
• Can we afford relatively free access to ESD’s if perfDPD selection isn’t flexible enough?
– L1Calo timing-in with first data; studies with low pT muons
– More widespread use for first month(s) data, while LHC and trigger ramp up to perfDPD assumptions?
– Or perhaps some residual use of commissioning DPD’s?
• If the LHC moves from single beam to 1E31 collisions relatively quickly, and the trigger commissioning goes well, these questions are less important; but it seems prudent to plan for a possibly longer transition period

Page 21

A Few Other Important Topics

• DPD format for first year analysis
– Recommendation: only one format between AOD and ntuple (merged D1PD/D2PD), called dAOD (suggest also renaming PerfDPD to dESD, without however changing their definition at this point)
– dAOD driven by group analysis needs, possibility for added group info (example: top D2PD, here directly produced from AOD: top-dAOD)
– Coordinated via PC (signal of one group useful for background studies of others)
• Formats for MC: We should have the same formats as for the data (built with the same software for reco quantities, whenever possible)
– Full DPD production for MC09
– Would allow more performance and physics analyses to test DPD’s
• InsituPerformance package for feeding performance information (e.g. efficiencies) into MC
• DQ and lumi information in analysis should be exercised
– Need MC sample with (fake) lumi/DQ info in ATLAS wide use before data taking starts

Page 22

Summary of Missing Pieces

• The Model
– We can hopefully document a reasonable analysis model for the first year.
– But several important pieces are not yet available, or not widely used/tested
• Not yet available, but urgently needed
– dAOD formats (provided the recommendation is accepted)
– Concrete plan for the CommissioningDPD -> PerformanceDPD (dESD) transition
– Frontier + backup solution for conditions data access on Tier-2 and below, deployed for ATLAS wide use
– Common ntuple dumper deployed for ATLAS wide use
– AODFix mechanism (if recommendation is accepted)
– MC samples equivalent to first data formats with lumi/DQ info
• Urgently needing exercise and feedback
– Use of lumi/DQ info in analysis
– InsituPerformance for feeding performance information (e.g. efficiencies) into MC
– TAG (especially use on the GRID, back-navigation to RAW and all formats)
– Improvements to Athena read speed and compile/initialization times
– Exercising distributed analysis for both physics and performance analyses (now!)

For jet studies: the question of DPD statistics?

Page 23

Analysis Model for the First Year Task Force: Outlook

• We continue to explore our assumptions of how analysis will happen in the first year
• Identify missing pieces, issues for discussion as we go along
– Pass them along to other groups that can implement solutions
– Including “items for immediate action” on next two pages
• Help raise awareness of analysis tools and procedures
• Your input still very welcome
– Contact info at https://twiki.cern.ch/twiki/bin/view/AtlasProtected/AnalysisModelFirstYear
• Target for report and wrap-up of task force is end of September 2009
– Including recipes of how to do commissioning/performance/physics analyses in the first year
– Be ready to robustly analyze LHC data in November!

Page 24

• Additional material

Page 25

• Analysis Model for the First Year (AMFY) Task Force Twiki
– https://twiki.cern.ch/twiki/bin/view/AtlasProtected/AnalysisModelFirstYear
• DPD Task Force
– https://twiki.cern.ch/twiki/bin/view/AtlasProtected/PrimaryDPDMaker
– With links to performance, physics, etc. DPD pages.
• Commissioning TAG Twiki
– https://twiki.cern.ch/twiki/bin/view/Atlas/CommissioningTag
• TAG tutorials
– https://twiki.cern.ch/twiki/bin/view/Atlas/EventTagTutorials

Page 26

Skeleton physics analysis model 09/10

[Figure: flow diagram, not reproduced here.]

Boxes recoverable from the diagram: Data, AOD, AODfix, dAOD, Athena [main selection & reco work], User file, User format [final complex analysis steps], results.

Annotations recoverable from the diagram:
• Data: super-set of good runs for this period
• 2-3 times reprocessing from RAW in 2009/10
• With release/cache
• Analysis group driven definitions coordinated by PC; may have added meta-data to allow ARA-only analysis from here
• Direct or Frontier/Squid DB access; POOL files; use of TAG
• PAT ntuple dumper: keep track of tag versions of meta-data, lumi-info, etc.
• Re-produce for reprocessed data and significant meta-data updates
• May have several forms (left to the user): POOL file (ARA analysis), ROOT tree, histograms, …
• Port developed analysis algorithms back to Athena as much as possible

Page 27

Reminder of Data Formats (old/existing nomenclature)

• RAW data
– Recorded at 200 Hz nominal trigger rate and 1.6 MB/event nominal size
– Archived on tape (kept on disk as long as possible for reprocessing, and analysis?)
• ESD (Event Summary Data): full output of the reconstruction and enough low level objects (hits, calorimeter cell energies) to allow reprocessing
– 0.8 MB nominal size
– Archived on tape; kept on disk at two different T1’s for reprocessing, and analysis?
• AOD (Analysis Object Data): summary of reconstruction (electrons, vertices, etc.)
– 150 kB nominal size
– Archived on tape; sent to all T1’s for distribution to associated Tier 2’s
• DPD (Derived Physics Data)
– Performance DPD’s begin from the ESD and select particular events and/or event info
– Physics DPD’s begin from the AOD and select particular events and/or event info
– Primary DPD’s (D1PD’s) are as above
– D2PD’s are derived from D1PD’s, are more specialized, and add analysis specific info
– D3PD’s are typically small flat ntuples and are used to produce final analysis plots
– (all others are POOL format, as are ESD and AOD)
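
The nominal numbers above translate directly into data rates. The short calculation below multiplies the 200 Hz trigger rate by each nominal event size; the live time of 10^6 seconds used for the volume estimate is a hypothetical round number chosen for illustration, not a figure from this talk.

```python
TRIGGER_RATE_HZ = 200.0                                   # nominal rate from the slide
EVENT_SIZE_MB = {"RAW": 1.6, "ESD": 0.8, "AOD": 0.15}      # nominal sizes from the slide


def rate_mb_per_s(fmt: str) -> float:
    """Output rate of one format at the nominal trigger rate."""
    return TRIGGER_RATE_HZ * EVENT_SIZE_MB[fmt]


def volume_tb(fmt: str, live_seconds: float) -> float:
    """Single-copy data volume in TB (1 TB taken as 10^6 MB)."""
    return rate_mb_per_s(fmt) * live_seconds / 1.0e6


HYPOTHETICAL_LIVE_SECONDS = 1.0e6   # assumption, for illustration only

for fmt in EVENT_SIZE_MB:
    print(f"{fmt}: {rate_mb_per_s(fmt):6.1f} MB/s, "
          f"{volume_tb(fmt, HYPOTHETICAL_LIVE_SECONDS):6.1f} TB per 10^6 s, single copy")
```

Replica counts (for example two ESD copies across the Tier-1’s) multiply these single-copy volumes.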


We’re considering possible alternate naming conventions, or even reducing the number of formats.

Page 28

Conditions data flow

[Figure: conditions data flow diagram, not reproduced here.]

Labels recoverable from the diagram:
• Levels: Online, Tier-0, Tier-1, Tier-2/3
• Databases: ATONR (100 GB/y), ATLR (100 GB/y), Tier-1 DB (100 GB/y)
• Sources and transfer: detectors, DAQ, PVSS, pvss2cool, HLT, dbproxy, pt1transfer
• Files: POOL files (one set marked <500 GB/y), SQLite/POOL, POOL files?
• Consumers: calibration processes, CAF, users, Tier-1 prod, Tier-1 analysis, Tier-2 MC prod, Tier-2/3 analysis?
• Distribution and caching: DDM, Frontier?, Squid cache?
• Releases: DBRelease: MC ~300 MB; CDRelease: data, 1 GB slice?
• Other label: RH

Page 29

Page 30

Reprocessing Cycle, cont.

• Reprocessing cycle cont.
– In the model above, could also deal with some part of the corrections with AOD-to-AOD corrections on the fly (AODfix mechanism)
• Frontier or SQLite file with DB updates + some routines for correction logic to be distributed with production caches
• To be called first in AOD job, delivers corrected quantities to subsequent chain
• Only works if corrections can be applied based on AOD information
– Benefits
• Needs no extra storage (not re-writing AOD, correcting on the fly)
• Keeps version of AOD on disk stable, with an easy way to check differences between corrections
• Easy to recover from mistakes in correction procedure
– Reprocessing later merges such changes into a new (physical) version of AOD

Page 31

Reprocessing Cycle, cont.

• Also to consider: reprocessing order of operations
– If ID produced a new alignment, do muons then have to realign?
– May need handshaking process before reprocessing; and bookkeeping to be sure a consistent set of conditions is applied

Page 32

Reprocessing Cycle, cont.

• Strategy for the Express Stream and Bulk Reconstruction
– Related to reprocessing in the sense that one lives with the first pass bulk reconstruction until the first reprocessing (perhaps months)
– Plans developing for cosmics express stream and 24-48 h calibration/DQ loop for October cosmics
– Ongoing discussion about first LHC data, when to implement an express stream, and when to begin holding the bulk reconstruction
• We’re happy to see discussion underway in DPC about when to release the Bulk for reconstruction on the Tier0
– Some comments in backup slides

Page 33

Strategy for the Express Stream and Bulk Reconstruction

• Related to reprocessing in the sense that one lives with the first pass bulk reconstruction until the first reprocessing (perhaps months)
• Plans developing for cosmics express stream and 24-48 h calibration/DQ loop for October cosmics
• From Claude Guyot (DPC):
• Strategy for the beam start-up yet to be defined:
– Express stream and calibration loop to be implemented “asap”
• Would imply a delay in the bulk reco by 24-48h
• Clearly not needed for the single beam operations
– “asap” meaning to be discussed with(in) Trigger coordination and with the detector groups
– Several calibration/alignment tasks may be based on the express stream even soon after the collision start-up:
• Pixels, TRT calibration, update of bad channels list, beam spot (if used in the bulk reco)


Strategy for the Express Stream and Bulk Reconstruction, cont.

• Implications of holding the express stream
– Trigger ntuple (and other T0 produced ntuples) only available promptly for the express stream; this impacts time sensitive commissioning tasks
– Offline DQ histograms only available promptly for the express stream
– Allows time to improve inputs to bulk reconstruction: missing module and noise maps, some offline DQ decisions, perhaps 24 hour alignment, beam spot, updated list of bad channels, etc.
– Affects plans for detector and DQ operations; good to test this in October
– Full calibration loop would significantly complicate Tier0 operations, already perhaps a bit chaotic for first LHC data, and may increase the chance of mistakes being made
– Most detector systems don’t think it’s critical to delay bulk reconstruction at first, but it’s certainly the long-term plan
– It may be best to not start bulk reconstruction at Tier-0 before the end of a run, at the earliest (so that all missing module and noise maps etc. are available)

• We’re happy to see discussion underway in DPC


A. Hoecker


Possible Startup Scenario for 2009
Under discussion in DP

A. Hoecker


Data Access and Computing

• Estimated usage from detector, performance, and physics groups is mostly in line with expectations
– Dedicated calibration/alignment streams for analysis at CAF, dedicated Tier2 facilities
– A limited number of ntuples produced at Tier0, when speed’s critical or systematic RAW access is needed (trigger, muon, track validation)
– Lots of expected use of performance DPD’s (Tier2’s and downstream)
– Many requests for a modest amount of RAW data
  • My favorite run or three to study in detail; cable swaps; validation of calculations in electronics e.g. LAr DSP
– A lot of interest in on-demand access to small to medium amounts of RAW or ESD data for debugging problems as they come up
  • More so than for streaming access to a flat X% of data?
• Where to use samples for commissioning
– Existing grid space; existing CAF space
– Many groups have made requests for dedicated T2 space
– Some requests for large amount of CAF space; some may be better suited for grid
  • e.g. A large sample (~40 TB) of muons to validate new alignments, calibrations, software fixes
  • e.g. Early sample of calorimeter events (DPD’s or ESD’s) for L1Calo timing


Performance DPD’s: Size Constraints

• So far, the goal is (total size of primary DPD’s) == (total size of AOD’s)
– ~90% of that space going to performance DPD’s (ESD information)
– That 90% is divided equally among nine performance DPD’s
• Strategic question: is the goal
– 1) Do a tag-and-probe measurement on AOD’s; if problems are found, go to perfDPD to understand what’s happening, implement a fix, and then wait for a reprocessing?
– Or 2) Do full tag-and-probe efficiency measurement with performance DPD?
– For case 1) you don’t need full statistics in the performance DPD; for case 2) you do
– If the size constraints are too tight for only a few measurements (e.g. jet calibrations), can we afford to allow them rather free access to ESD’s? Or can they use AOD’s (with no ability to e.g. recluster jets)?
• Space saving or redistribution possibilities could be implemented
– Possible to store fewer copies of performance DPD’s on grid? Is a copy per cloud more than we need? (Might the same be true even of AOD’s?)
– Or share space unequally among perfDPD’s?
– Or just allow them to be larger, if it reduces the need for ESD access?
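The size goal above works out to roughly 10% of the total AOD size per performance DPD (90% split nine ways). A minimal sketch of that arithmetic follows; the function name and the 100 TB AOD total are assumptions chosen only to make the numbers easy to read.

```python
def perf_dpd_budget(aod_total_tb, perf_fraction=0.90, n_perf_dpd=9):
    """Return (size per performance DPD, size left for other primary DPD's), in TB."""
    primary_total = aod_total_tb                 # goal: primary DPD total == AOD total
    perf_total = perf_fraction * primary_total   # ~90% goes to performance DPD's
    per_dpd = perf_total / n_perf_dpd            # shared equally among nine
    other = primary_total - perf_total           # remainder for other primary DPD's
    return per_dpd, other

# 100 TB of AOD is an arbitrary example value, not a number from the talk.
per_dpd, other = perf_dpd_budget(100.0)
print(f"{per_dpd:.1f} TB per performance DPD, {other:.1f} TB for the rest")
```

Changing `n_perf_dpd` or `perf_fraction` gives a quick feel for the redistribution options listed in the last bullet.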


Data Access and Computing: Special Cases

• HLT Commissioning

• Requires access to a range of data types, depending on slice (RAW, ESD, AOD, Tier0 Trigger Ntuple, Histograms)

• But, especially in Phase 1, needs access to a large amount of RAW data
– Want to validate HLT code offline on the full range of events
– Initial request was for all RAW data to CAF; but unless duty factor of LHC is small this by definition requires T0-like resources
– Some solution to be worked out…
– HLT validation will be a very high priority for first data


DPD’s and Physics

• Physics groups will also make use of performance DPD’s (early on may be important to be able to e.g. recalculate EtMiss)
– Especially for physics, would be helpful to have selection based on trigger objects from the EF whenever possible; independent of offline release and calibration
– Nevertheless, analysis should start from AOD quantities from the beginning, not ESD quantities, so that there are no transition problems later. (AOD’s may even play a role in performance studies)
• Reducing format proliferation: possible merger of (physics) D1PD and D2PD?
– Format is driven by physics groups, so primary physics DPD could well contain info like top inputs (becoming more like D2PD’s)
– Incentive to use common code further down analysis chain, making validation easier
– Whoever doesn’t have a specific format could borrow one, or use AOD’s
• Performance groups have relatively sophisticated plans, and often experience, with their DPD’s
– Less so for physics; when you’re using selected MC samples the DPD’s are less necessary; also true when statistics in data are modest
– Need more physics (and performance) DPD users! Production with MC09, and encouragement to exercise the DPD’s, should help (also for performance groups)


DPD’s and Physics

• Physics groups will also make use of performance DPD’s (early on may be important to be able to e.g. recalculate EtMiss)
– Recommend that final analysis results should be based on AOD quantities from the beginning, not ESD quantities, so that there are no transition problems later.
• Reducing format proliferation: possible merger of (physics) D1PD and D2PD?
– Format is driven by physics groups, so primary physics DPD could contain info like top inputs (becoming more like D2PD’s)


Selected MC Issues

• Formats for MC: We should have the same formats as for the data (done by the same software for reco quantities as for data)
– Full DPD production for MC09
– Would allow more performance and physics analyses to test DPD’s
– Including updated definition of physics D1/2PD’s?
• Time-dependent MC
– Detector coverage, triggers, alignments, resolutions, will all change over time
– Possibly large, discrete changes (e.g. SCT cooling fails), certainly smaller, more frequent changes (failing LAr front end boards; muon chambers that trip)
– Broad interest in time-dependent MC, but not a clear vision; Charlie Young investigating some technical possibilities
– Insitu package seems to be one helpful tool for applying efficiencies, resolutions, etc. in MC
  • Little experience so far; needs users!
  • What’s our model for estimating/bookkeeping performance variables that change over time?
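One simple way to picture time-dependent MC is to assign each simulated event to a run period in proportion to that period's luminosity, then apply the period's efficiency. The sketch below is a toy under stated assumptions: it does not use the Insitu package or any ATLAS software, and the period names, luminosity fractions, and efficiencies are invented.

```python
import random

# Hypothetical run periods: (name, luminosity fraction, muon efficiency).
# All numbers are made up for illustration.
PERIODS = [
    ("A", 0.5, 0.95),
    ("B", 0.3, 0.80),  # e.g. a period with some muon chambers tripped
    ("C", 0.2, 0.90),
]

def average_efficiency(periods):
    """Luminosity-weighted average efficiency over all periods."""
    total = sum(frac for _, frac, _ in periods)
    return sum(frac * eff for _, frac, eff in periods) / total

def assign_period(periods, rng):
    """Pick a period for one MC event, with probability proportional to luminosity."""
    names = [name for name, _, _ in periods]
    weights = [frac for _, frac, _ in periods]
    return rng.choices(names, weights=weights, k=1)[0]

def passes(period_name, periods, rng):
    """Keep or drop one MC object according to the efficiency of its period."""
    eff = {name: e for name, _, e in periods}[period_name]
    return rng.random() < eff

rng = random.Random(2009)  # fixed seed so the toy is reproducible
kept = sum(passes(assign_period(PERIODS, rng), PERIODS, rng) for _ in range(10000))
print(f"expected {average_efficiency(PERIODS):.3f}, toy MC {kept / 10000:.3f}")
```

The bookkeeping question in the last bullet corresponds to who maintains a table like `PERIODS` and how it is versioned.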


Using DQ info; Educating users

• DQ and lumi information in analysis should be exercised
– Lots of experience with detectors in monitoring, filling DQ (if not yet on 24-48 hour time scales; that’s planned in October)
– Also now for some performance groups
– Relatively little experience with users reading DQ flags in their analyses!
– Some of this, and TAG tests, is being done with TopMix samples
• Educating ATLAS members is important!
– The more time one spends at P1 the less involved one is in discussions of analysis models
– Hopefully the discussions AMFY has had across ATLAS have helped raise awareness
– We can’t afford to lose our detector experts! Need to make it as easy as possible for them to participate in analysis
– Also true for new students, new ATLAS members, etc.
– Clear documentation, recipes, offline tutorials, required and recommended techniques well publicized in visible talks
– Complicated, technical issues; so it’s a non-trivial task
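As a rough picture of what "users reading DQ flags in their analyses" could look like, the sketch below filters events by per-lumiblock flags and sums the luminosity of the good lumiblocks. The data layout, the flag colors, the detector names used as keys, and all numbers are assumptions for illustration, not the actual ATLAS DQ or luminosity interfaces.

```python
# Hypothetical DQ record: (run, lumiblock) -> {detector: flag}.
DQ_FLAGS = {
    (1000, 1): {"LAr": "green", "Pixel": "green"},
    (1000, 2): {"LAr": "red", "Pixel": "green"},
    (1000, 3): {"LAr": "green", "Pixel": "yellow"},
}

# Hypothetical integrated luminosity per lumiblock, in arbitrary units.
LUMI = {(1000, 1): 1.0, (1000, 2): 1.5, (1000, 3): 0.5}

def is_good(run, lb, required, accepted=("green",)):
    """True if every required detector has an accepted flag in this lumiblock."""
    flags = DQ_FLAGS.get((run, lb))
    if flags is None:
        return False  # no DQ information: treat as bad
    return all(flags.get(det) in accepted for det in required)

def select_events(events, required):
    """Keep only events recorded in lumiblocks that pass the DQ requirement."""
    return [e for e in events if is_good(e["run"], e["lb"], required)]

def good_luminosity(required):
    """Sum the luminosity of lumiblocks passing the same DQ requirement."""
    return sum(l for (run, lb), l in LUMI.items() if is_good(run, lb, required))

events = [{"run": 1000, "lb": lb, "id": i} for i, lb in enumerate([1, 2, 2, 3])]
print(len(select_events(events, ["LAr"])), good_luminosity(["LAr"]))
```

Using the same `is_good` requirement for both the event selection and the luminosity sum is the point: the two must stay consistent for a cross section to be meaningful.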



Validation

• Validation is a very important area: do we have all the tools we need to verify that the data and reconstruction are of good quality, deal with problems, check new releases, make the MC as realistic as possible, etc.?
– Much formal validation of AOD’s, much less for ESD’s, DPD’s, etc.
– Encourage common formats as far down in analysis chain as possible; more users help validation effort
– E.g. Even for Tier0 produced ntuples, derive from ESD if possible so that we validate the ESD when we use these ntuples for commissioning


Approval of Results

• Concern about approval procedure for results. When can a new Z cross section be shown? When the performance groups have signed off on new algorithms and calibrations, and then you reconstruct yourself with the approved methods, e.g. from a performance DPD? Or only after a full, central, reprocessing?
– Better to make realistic plans now, rather than invent them under pressure of CMS competition or conference schedules
– Smart to exercise full approval procedure now (athena code in CVS, etc.), for analysis walkthroughs; also for pre-collision PUB notes?


Role of “walkthroughs” of early analyses?
Role of MC analyses currently being approved?

Physics coordination says there are “common misunderstandings” of the approval process. E.g. your whole analysis doesn’t have to happen in Athena. But the process isn’t crystal clear, and seems likely to contain surprises.


Some common themes

• Making the data (and metadata) available without too much pain, as fast as needed/possible, is an important driving factor
– Tested, documented methods of access
– If data access is slow analysis is slow
– TAGs, Tier-0 ntuples, (Perf)DPDs, Frontier, etc. are all targeted to get this done, but
– LHC data flow and user access pattern for data are difficult to test before the LHC turns on
– Glad to hear from Andy yesterday that more “chaotic” tests are planned
• Distributed analysis is a must from the beginning
– Time critical (e.g. online-related tasks) must be done with CERN based resources, but:
– We will need the full computing model online from the beginning, or we will be painfully slow
– We all need to help commission grid, not just find one less painful solution (BNL Tier1 != the Grid)


http://indico.cern.ch/conferenceDisplay.py?confId=59721
Analysis of the cosmic09 RNDM events with TAGs data
Pablo Barbero; Barcelona
