FINDING THE “HIGGS” IN THE HAYSTACK(S)

Stephen J. Gowdy (CERN) 12th September 2012 XLDB Conference


Overview

Large Hadron Collider (LHC)

Compact Muon Solenoid (CMS) experiment

The Challenge

Worldwide LHC Computing Grid (wLCG)

Data Organisation

Analysis Techniques

Databases

Future Trends


Large Hadron Collider

a hadron is a composite particle made of quarks

Big machine characteristics

17 mile circular tunnel, 100m underground, straddling the French-Swiss border

Protons currently travel at 99.9999964% of the speed of light

Each proton enters Switzerland (CH) over 11,000 times a second

Will not reach design beam energy until 2014

Interactions potentially every 25 ns (40 MHz)

Each interaction has multiple collisions

Called “pileup”, currently around 30 collisions per event
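
The revolution and crossing rates quoted on this slide follow from simple arithmetic. A minimal sketch, assuming a ring circumference of 26,659 m (a value not given on the slide, which only quotes a rounded "17 mile" tunnel):

```python
# Rough check of the LHC rates quoted on this slide.
# Assumption (not from the slide): ring circumference of 26,659 m.
SPEED_OF_LIGHT_M_PER_S = 299_792_458
RING_CIRCUMFERENCE_M = 26_659          # assumed value
BETA = 0.999999964                     # 99.9999964% of c, from the slide
BUNCH_SPACING_S = 25e-9                # 25 ns, from the slide

# Turns per second = speed / circumference.
turns_per_second = BETA * SPEED_OF_LIGHT_M_PER_S / RING_CIRCUMFERENCE_M

# Crossing rate = 1 / spacing between potential interactions.
crossing_rate_hz = 1.0 / BUNCH_SPACING_S

print(f"turns per second: {turns_per_second:,.0f}")
print(f"bunch crossing rate: {crossing_rate_hz / 1e6:.0f} MHz")
```

Each turn crosses the French-Swiss border, which is where the "over 11,000 times a second" figure comes from.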


Accelerator Complex

Older machines feed newer machines

LHC Protons start in LINAC2 then go to the PS via the BOOSTER

From the PS they are injected to the SPS

Injected into the LHC at 450 GeV

Accelerated to 4 TeV in the LHC

Need to have “fills” ~1/day


[Figure labels: LHC, CMS, SPS, CERN Main Site]

Compact Muon Solenoid

a muon is a (comparatively) long-lived big brother to the electron


Particle Identification 101


Trigger Architecture

[Figure: trigger architecture block diagram; the recoverable labels are listed below]

Matching “Trigger Towers” ECAL, HCAL: ET(…), electron isolation, jet detection, sorting, ETmiss, ETtot

0.8 < |η| < 2.4, |η| < 1.2, |η| < 2.1 for endcap and barrel: pT, η, φ, quality; track segments

Endcap and barrel: ≤ 4 candidates

Final decision, partitioning; interface to TTC, TTS (Trigger Throttling System)


Data Rates

RAW (i.e. unprocessed) data is about 1 MB per event

Potential detector acquisition rate: 1 MB * 40 MHz = 40 TB/s

Actual data is much larger, but not all detectors can be read out at 40 MHz

Hardware trigger decision allows a 100 kHz rate

Looks at individual detectors to make a fast choice

Data rate up to 100 GB/s

High Level Trigger done on the filter farm

Output rate is nominally 300 Hz ~= 300 MB/s
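
The rates on this slide are all event size times accept rate. A small sketch that reproduces them, using only the slide's numbers (the stage names are labels chosen for this example):

```python
# Data-rate arithmetic for the trigger chain, using the slide's numbers.
EVENT_SIZE_BYTES = 1e6        # ~1 MB per RAW event

RATES_HZ = {
    "bunch crossings": 40e6,       # 40 MHz
    "hardware trigger": 100e3,     # 100 kHz
    "high level trigger": 300.0,   # nominal 300 Hz
}

def bandwidth_bytes_per_s(rate_hz, event_size=EVENT_SIZE_BYTES):
    """Bytes per second if every accepted event is fully read out."""
    return rate_hz * event_size

bandwidth = {stage: bandwidth_bytes_per_s(r) for stage, r in RATES_HZ.items()}
overall_reduction = RATES_HZ["bunch crossings"] / RATES_HZ["high level trigger"]

for stage, b in bandwidth.items():
    print(f"{stage:>20}: {b / 1e9:12.1f} GB/s")
print(f"overall rate reduction: about {overall_reduction:,.0f}x")
```

The two trigger stages together keep roughly one crossing in 130,000.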


The Challenge

why it isn’t easy

A “Higgs” event


A Haystack

40 reconstructed vertices, high-pileup run, 25th October 2011

Haystacks

So that was one event

2012 average is 30 collisions per event

By the end of 2012 we will have recorded almost 7 billion events

After the reduction of 40MHz to O(300Hz)

Doesn’t include simulated data

Looking for a half million Higgs particles

Assuming predicted cross sections are correct

Many are much much harder to find than 4 muons


Worldwide LHC Computing Grid (wLCG)

like an electric grid that supplies computing power

Tiered System

Tier-0 at CERN

Data gets “sorted” and gets its first-pass reconstruction

Tier-1 centres

CMS has seven; large regional facilities

Provide custodial tape storage

Large scale re-reconstruction

Tier-2 centres

Frequently universities or groups of universities

Simulation

End user analysis


Schematic

[Figure: tier schematic. Labels: CMS Detector, Filter Farm; Tier-0: CERN; Tier-1: Fermilab, IN2P3, ASGC, KIT, CNAF; Tier-2: Florida, UCSD; Tier-3: UCLA, MyLaptop]

LHCOPN (Optical Private Network)

CMS is green

[Figure caption: Traffic on a CERN Holiday]

Resources

Resource      Tier-0        Tier-1        Tier-2        Total
CPU (kHS06)   121 (21%)     137 (23%)     324 (56%)     582 kHS06 ~= 150 kSi2k
Disk (TB)     4800 (9%)     21000 (40%)   27000 (51%)   51800 TB
Tape (TB)     23000 (33%)   47000 (67%)   -             90000 TB

Data Organisation

lining up the bytes in a consumable order

Data Tiers

“Streamer” files written to disk by filter farm

Read and reorganised into Primary Datasets (PD)

Based on trigger selections (physics motivation)

Output is the custodial RAW data

Reconstruction run on RAW PDs

Output RECO and AOD (Analysis Object Data)

Simulation also produces similar data tiers plus truth information


Data Ordering

ROOT used as persistency framework

Ordering of data in files is adjusted to the expected reading pattern

RAW & RECO expected to read whole event

Ordering in file is by event

AOD could have subset of data read

Pass frequently over a single variable making plots
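
The two orderings described above can be illustrated with plain Python lists. This is only a sketch of the idea, not ROOT's API; the attribute names `pt`, `eta` and `phi` and their values are made up for the example:

```python
# Two ways to lay out the same events in a file (illustrative only).
events = [
    {"pt": 41.2, "eta": 0.3, "phi": 1.1},
    {"pt": 17.8, "eta": -1.9, "phi": -2.4},
    {"pt": 63.5, "eta": 1.2, "phi": 0.2},
]

# Event-ordered (row-wise): all attributes of event 1, then event 2, ...
# Suits RAW and RECO, where a job reads whole events.
event_ordered = [value for event in events for value in event.values()]

# Attribute-ordered (column-wise): one attribute for all events, then the next.
# Suits AOD-style plotting passes over a single variable.
attribute_ordered = {name: [event[name] for event in events] for name in events[0]}

# Plotting "pt" touches one contiguous column instead of every event record.
pt_values = attribute_ordered["pt"]
print(pt_values)
```

With the attribute-ordered layout, a job that plots one variable reads only that column and skips the rest of each event.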

[Figure: file layout with events 1 … n stored per attribute (Attribute 1 … Attribute 4)]

Skims

Event selection follows a train-like model

Various analyses each include their own event selection

Selection done using reco output

More detailed and accurate than trigger info

Can cut a lot harder

First skims done at Tier-1 on the Tier-0 output

Called PromptSkims as they are started ASAP

Currently write out 81 datasets from Tier-0 output
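
One reading of the "train" model is that each reconstructed event is read once and passed through every analysis's selection, with each selection writing its own output dataset. A minimal sketch under that assumption; the skim names, event fields and cut values are hypothetical:

```python
# A sketch of a "train" of skims: read each event once, apply every selection.
# Selection names and cuts below are hypothetical.
from collections import defaultdict

def two_muons(event):
    return event["n_muons"] >= 2

def high_met(event):
    return event["missing_et"] > 100.0

SKIMS = {"DiMuonSkim": two_muons, "HighMETSkim": high_met}

def run_skim_train(events, skims=SKIMS):
    """Single pass over the input; each event may land in several skims."""
    outputs = defaultdict(list)
    for event in events:
        for name, selection in skims.items():
            if selection(event):
                outputs[name].append(event["id"])
    return dict(outputs)

reco_events = [
    {"id": 1, "n_muons": 2, "missing_et": 20.0},
    {"id": 2, "n_muons": 0, "missing_et": 150.0},
    {"id": 3, "n_muons": 4, "missing_et": 120.0},
    {"id": 4, "n_muons": 1, "missing_et": 10.0},
]
skimmed = run_skim_train(reco_events)
print(skimmed)
```

The input is read once regardless of how many selections ride on the train, which matters when the input is the full Tier-0 output.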


Datasets

Files are collected in datasets

Datasets should be processed together

This actually uses a database (Oracle)

Each dataset has provenance attached to it

Can be superseded by a reprocessing

End user tool queries database and creates jobs to process it

Typically across all the Tier-2s hosting the dataset
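
The "query the database, then create jobs" step can be sketched as follows. The in-memory catalogue, dataset name, file names and site names are all hypothetical stand-ins for the real bookkeeping database, and the sketch assumes every hosting site holds the full dataset:

```python
# Sketch: look up a dataset's files and replica sites, then split into jobs.
# The catalogue, dataset name, file names and site names are all hypothetical;
# a real system would query the bookkeeping database instead.
CATALOGUE = {
    "/ExamplePD/Run2012-v1/AOD": {
        "files": [f"/store/example/file_{i}.root" for i in range(7)],
        "sites": ["T2_Example_A", "T2_Example_B"],
    }
}

def create_jobs(dataset, files_per_job=3, catalogue=CATALOGUE):
    """Group the dataset's files into jobs, spread round-robin over host sites."""
    entry = catalogue[dataset]
    files, sites = entry["files"], entry["sites"]
    jobs = []
    for start in range(0, len(files), files_per_job):
        jobs.append({
            "site": sites[len(jobs) % len(sites)],
            "files": files[start:start + files_per_job],
        })
    return jobs

jobs = create_jobs("/ExamplePD/Run2012-v1/AOD")
for job in jobs:
    print(job["site"], len(job["files"]))
```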


Analysis Techniques

narrowing the haystacks

Discriminating Variables

Each analysis will find the variables that enhance its signal-to-noise ratio

High-energy muon is an easy one

i.e. something going really fast doesn’t bend so much in the magnetic field

May end up losing a lot of signal to reduce the background by a larger factor

Optimise S/√B or S/√(S+B)
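
Optimising a cut on one discriminating variable can be sketched as a scan over cut values, keeping the one with the best S/√(S+B). The momenta below are toy numbers invented for illustration, not data from the talk:

```python
# Scan a lower cut on muon momentum and pick the one maximising S/sqrt(S+B).
# The toy momenta below are made up for illustration.
import math

signal = [28, 35, 41, 44, 47, 52, 55, 58]             # GeV
background = [3, 5, 6, 8, 9, 11, 12, 14, 15, 18, 22, 25, 31, 36, 45]

def significance(cut, sig=signal, bkg=background):
    """S/sqrt(S+B) for events with momentum above `cut`."""
    s = sum(1 for p in sig if p > cut)
    b = sum(1 for p in bkg if p > cut)
    return s / math.sqrt(s + b) if s + b > 0 else 0.0

cuts = range(0, 60, 5)
best_cut = max(cuts, key=significance)
print(f"best cut: p > {best_cut} GeV, S/sqrt(S+B) = {significance(best_cut):.2f}")
```

Cutting harder than the optimum keeps removing background but loses signal faster, so the figure of merit falls again.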

[Figure: histogram of momentum of muon (GeV) for pseudo data, background and signal]

Multivariate Analysis

Many different types

Simple rectangular cuts (multiple 1-d cuts)

Maximum Likelihood approaches

Combine the probability of all input variables

Fisher Discriminants

Input variables are projected to another space to avoid correlations

Neural Networks

Most of these methods rely on training

Some packages can apply many methods
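
As an illustration of one of the methods listed, here is a two-variable Fisher discriminant written in plain Python. It is a textbook sketch trained on invented toy points, not the implementation used by any particular package:

```python
# Two-variable Fisher discriminant in plain Python (toy numbers, made up).
def mean(points):
    n = len(points)
    return [sum(p[0] for p in points) / n, sum(p[1] for p in points) / n]

def scatter(points, mu):
    """2x2 within-class scatter matrix as (sxx, sxy, syy)."""
    sxx = sum((p[0] - mu[0]) ** 2 for p in points)
    sxy = sum((p[0] - mu[0]) * (p[1] - mu[1]) for p in points)
    syy = sum((p[1] - mu[1]) ** 2 for p in points)
    return sxx, sxy, syy

def fisher_weights(sig, bkg):
    """w = W^-1 (mu_sig - mu_bkg), with W the summed within-class scatter."""
    mu_s, mu_b = mean(sig), mean(bkg)
    a, b, d = (x + y for x, y in zip(scatter(sig, mu_s), scatter(bkg, mu_b)))
    det = a * d - b * b
    dx, dy = mu_s[0] - mu_b[0], mu_s[1] - mu_b[1]
    return [(d * dx - b * dy) / det, (a * dy - b * dx) / det]

def fisher_score(point, w):
    """Project a point onto the discriminant axis."""
    return w[0] * point[0] + w[1] * point[1]

signal = [(4.0, 3.0), (5.0, 4.5), (6.0, 5.0), (5.0, 3.5)]
background = [(1.0, 2.0), (2.0, 3.5), (3.0, 4.0), (2.0, 2.5)]
w = fisher_weights(signal, background)
print([round(fisher_score(p, w), 2) for p in signal + background])
```

The two toy variables are positively correlated within each class, so the second weight comes out negative: the projection subtracts the correlated part rather than counting it twice.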


TMVA (Toolkit for MVA in ROOT)


New Boson Plot

H -> ZZ -> llll

Use five angles and two masses as discriminators


Databases

not xldbs though

Conditions Database

Largest database use (not in size, ~300GB)

Provides calibration, geometry and alignment information

Used by all running jobs

Can be more than 100k jobs worldwide

Network of squid caches used

Database queries transformed into HTTP requests

Home-grown technology to achieve this (Frontier)

Works as data is written once, read many
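
The reason this works for write-once, read-many data can be sketched in a few lines: if a query is encoded into a URL, identical queries share one cache key and only the first one reaches the database. The host name, URL layout and query text below are hypothetical, and a dict stands in for the squid cache; this is not the Frontier protocol itself:

```python
# Sketch: turn a read-only database query into a cacheable HTTP-style URL.
# The host name, URL layout and query text are hypothetical, and the dict
# stands in for a squid cache; this is not the Frontier protocol itself.
from urllib.parse import quote

BACKEND_CALLS = 0
CACHE = {}

def query_to_url(sql, host="http://conditions.example.org"):
    """Encode the query in the URL so identical queries map to one cache key."""
    return f"{host}/query?sql={quote(sql)}"

def backend_fetch(url):
    """Stand-in for the central database server."""
    global BACKEND_CALLS
    BACKEND_CALLS += 1
    return f"payload for {url}"

def cached_fetch(sql):
    """Serve from cache when possible; safe because data is written once."""
    url = query_to_url(sql)
    if url not in CACHE:
        CACHE[url] = backend_fetch(url)
    return CACHE[url]

# 1000 jobs asking for the same calibration reach the backend only once.
for _ in range(1000):
    cached_fetch("SELECT payload FROM calibration WHERE run = 200000")
print(BACKEND_CALLS)
```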


Squids aggregate: 500k requests/min, 500 MB/s

Offline servers: 4k requests/min, 0.5 MB/s
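
A quick calculation of how much load the cache layer absorbs, assuming the offline-server figures are the requests that the squids could not answer themselves (the slide does not state this explicitly):

```python
# How much load the squid layer absorbs, from the figures above.
# Assumption: offline-server traffic is what the squids pass through.
SQUID_REQUESTS_PER_MIN = 500_000
SERVER_REQUESTS_PER_MIN = 4_000
SQUID_MB_PER_S = 500.0
SERVER_MB_PER_S = 0.5

request_hit_fraction = 1 - SERVER_REQUESTS_PER_MIN / SQUID_REQUESTS_PER_MIN
bandwidth_ratio = SQUID_MB_PER_S / SERVER_MB_PER_S

print(f"requests served from cache: {request_hit_fraction:.1%}")
print(f"bandwidth reduction at the servers: {bandwidth_ratio:.0f}x")
```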


Other Databases

PhEDEx : Manages file transfers

Single Oracle instance at CERN

DBS : Dataset Bookkeeping System

Contains meta-data about datasets and files

Main instance in Oracle at CERN

User instances available elsewhere with MySQL

Job tracking databases

Use both Oracle and MySQL

Recent system archiving information in CouchDB


Reading Rate

[Figure: reading rate plot; labelled values 6 TB/day and 250 TB/day]

Future Trends

…need to wear shades

Federated Storage

Aiming towards an architecture where all storage is visible globally

[Figure: lookup of /store/foo through a redirector hierarchy. Boxes: User App; Global Redirector; US Redirector and EU Redirector; Site A, Site B, Site C, Site D; US Region, EU Region, ?? Region. Arrow labels: Open /store/foo; Query /store/foo; Redirect Global; Redirect EU; Redirect Site C]
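
The lookup this slide's diagram labels suggest (try the home region, fall back to the global redirector, get redirected to the region and site that hold the file) can be sketched as below. It is a toy model under that reading of the diagram, not the real storage protocol, and the contents of each site are hypothetical:

```python
# Sketch of a hierarchical redirector lookup; not the real storage protocol.
# Region and site contents are hypothetical.
REGIONS = {
    "US": {"Site A": {"/store/bar"}, "Site B": set()},
    "EU": {"Site C": {"/store/foo"}, "Site D": set()},
}

def query_region(region, path):
    """Regional redirector: ask each of its sites whether it holds the file."""
    for site, files in REGIONS[region].items():
        if path in files:
            return site
    return None

def open_file(path, home_region):
    """Try the home region first, then go through the global redirector."""
    hops = [f"{home_region} Redirector"]
    site = query_region(home_region, path)
    if site is None:
        hops.append("Global Redirector")
        for region in REGIONS:
            if region == home_region:
                continue
            site = query_region(region, path)
            if site is not None:
                hops.append(f"{region} Redirector")
                break
    if site is None:
        raise FileNotFoundError(path)
    return hops + [site]

print(open_file("/store/foo", home_region="US"))
```

The user application only ever names the file; which site serves it is decided by the redirectors.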


Clouds: for a rainy day

Helix Nebula

European initiative to provide unified system

Shows the importance of standards

Proof of concept demonstrated on Amazon

Still prohibitively expensive

Estimated at an order of magnitude more

Running our own data centres more cost effective

May be interesting for adding short term capacity


Clouds: internal cloud

CERN moving to “agile” infrastructure

Commissioning new data centre in Hungary

Filter farm as cloud during LHC shutdown

Using OpenStack across 15k cores

Allows flexibility for redeployment

Farm also needed for detector work


Summary

Database technology used in various roles

Whole size around 10TB: not huge

Our Big Data: 20PB RAW data

CMS uses worldwide computing infrastructure to deliver physics results

We’ve found a needle, now need to figure out what kind it is: http://lanl.arxiv.org/abs/1207.7235


XLDB Europe 2013 @ CERN

CERN will be happy to host a European Satellite XLDB

Planned date: 25+26 June 2013

During the LHC long shutdown, which will also allow discussions on LHC data management issues

We invite everyone to help reach out to places in Europe with challenging XLDB-related issues

Please contact [email protected] and [email protected]
