16
overview Graeme Watt (IPPP Durham) 2nd DPHEP Collaboration Workshop CERN, 13 th March 2017 1 https://hepdata.net

watt hepdata dphep mar2017 · 2018. 11. 20. · G. Watt What is HEPData? • Unique open-access repository for scattering data from more than 8000 experimental HEP (“hep-ex”)

  • Upload
    others

  • View
    0

  • Download
    0

Embed Size (px)

Citation preview

Page 1: watt hepdata dphep mar2017 · 2018. 11. 20. · G. Watt What is HEPData? • Unique open-access repository for scattering data from more than 8000 experimental HEP (“hep-ex”)

overview

Graeme Watt (IPPP Durham)2nd DPHEP Collaboration Workshop

CERN, 13th March 2017

1

https://hepdata.net

Page 2: watt hepdata dphep mar2017 · 2018. 11. 20. · G. Watt What is HEPData? • Unique open-access repository for scattering data from more than 8000 experimental HEP (“hep-ex”)

G. Watt

What is HEPData?

• Unique open-access repository for scattering data from more than 8000 experimental HEP (“hep-ex”) papers.

• “Level 1” according to DPHEP classification [1205.4667].

• Traditional focus on production cross sections etc.

• In recent years also include data from LHC searches.

• Based in Institute for Particle Physics Phenomenology (IPPP) at Durham University (UK), going back to 1970s.

• Funded by UK Science & Technology Facilities Council (STFC). Current grant to support two staff until 2019.

2

Page 3: watt hepdata dphep mar2017 · 2018. 11. 20. · G. Watt What is HEPData? • Unique open-access repository for scattering data from more than 8000 experimental HEP (“hep-ex”)

G. Watt

HEPData staff in Durham

• Graeme Watt: Database Manager (2013-)

• Mike Whalley: Previous Database Manager (1982-2016)

• Joanne Bentham: Administrative Assistant (1995-)

• Frank Krauss: Principal Investigator (2014-)

3

Close collaboration with Open Access group at CERN (2014-)

Graeme Watt Mike Whalley Joanne Bentham Frank Krauss

Page 4: watt hepdata dphep mar2017 · 2018. 11. 20. · G. Watt What is HEPData? • Unique open-access repository for scattering data from more than 8000 experimental HEP (“hep-ex”)

G. Watt

HEPData redevelopment

4

hepdata.cedar.ac.uk

hepdata.net

• Originally developed by Andy Buckley and Mike Whalley: “HepData reloaded: reinventing the HEP data archive”, PoS ACAT2010 (2010) 067 [arXiv:1006.0517].

• Legacy site, hosted in Durham, soon to be switched off.

• Lead developer: Eamonn Maguire (CERN, 03/2015-10/2016).

• Overlay on Invenio v3 digital library software.

• Production site, hosted on CERN OpenStack infrastructure.

Code: https://github.com/HEPData

Page 5: watt hepdata dphep mar2017 · 2018. 11. 20. · G. Watt What is HEPData? • Unique open-access repository for scattering data from more than 8000 experimental HEP (“hep-ex”)

G. Watt

Old data “input” format

5

http://hepdata.cedar.ac.uk/view/ins1203852/d2/input

*dataset:*location: Page 20 of preprint*dscomment: The measured total cross sections. The first systematic uncertainty is the combined systematic uncertainty excluding luminosity, the second is the luminosity*reackey: P P --> Z0 Z0 X*obskey: SIG*qual: RE : P P --> Z0 Z0 X*yheader: SIG(total) IN FB*xheader: SQRT(S) IN GEV *data: x : y 7000; 6.7 +- 0.7 (DSYS=+0.4,-0.3,DSYS=0.3:lumi);*dataend:

• All data needed to be transformed into a standard “input” text format.

• Metadata for each table followed by data points in a structured format.

• *reackey and *obskey keywords can be used for searching database.

• Flexible table format: any number of rows and ‘x’ or ‘y’ columns possible.

• Limited support to attach and link auxiliary files stored in a web directory.

Page 6: watt hepdata dphep mar2017 · 2018. 11. 20. · G. Watt What is HEPData? • Unique open-access repository for scattering data from more than 8000 experimental HEP (“hep-ex”)

G. Watt

New YAML text format

6

---name: 'Table 2'label: 'Data from Page 20 of preprint'description: | The measured total cross sections. The first systematic uncertainty is the combined systematic uncertainty excluding luminosity, the second is the luminosity.keywords: - {name: reactions, values: ['P P --> Z0 Z0 X']} - {name: observables, values: [‘SIG']} - {name: phrases, values: ['Inclusive', 'Integrated Cross Section', 'Cross Section', 'Proton-Proton Scattering', 'Z Production', 'Z pair Production']} - {name: cmenergies, values: [7000.0]}additional_resources:independent_variables: - header: {name: 'SQRT(S)', units: 'GEV'} values: - {value: 7000}dependent_variables: - header: {name: 'SIG(total)', units: 'FB'} qualifiers: - {name: 'RE', value: 'P P --> Z0 Z0 X'} values: - value: 6.7 errors: - {symerror: 0.7, label: 'stat'} - {asymerror: {plus: 0.4, minus: -0.3}, label: 'sys'} - {symerror: 0.3, label: 'sys,lumi'}

• Data from existing HepData database was exported to the new YAML format for migration to the new system.

http://hepdata.cedar.ac.uk/view/ins1203852/d2/yaml

https://hepdata.net/record/ins1203852

New keywords

Auxiliary files

Page 7: watt hepdata dphep mar2017 · 2018. 11. 20. · G. Watt What is HEPData? • Unique open-access repository for scattering data from more than 8000 experimental HEP (“hep-ex”)

G. Watt

Example table on old site

• Automatic plots and export to various output formats.7

Page 8: watt hepdata dphep mar2017 · 2018. 11. 20. · G. Watt What is HEPData? • Unique open-access repository for scattering data from more than 8000 experimental HEP (“hep-ex”)

G. Watt

Example table on new site

8

Page 9: watt hepdata dphep mar2017 · 2018. 11. 20. · G. Watt What is HEPData? • Unique open-access repository for scattering data from more than 8000 experimental HEP (“hep-ex”)

G. Watt

Data output formats

• YAML: native HEPData format.

• CSV: comma-separated values.

• JSON: JavaScript Object Notation.

• ROOT: binary .root file, not CINT script.

• YODA: for inclusion in a Rivet analysis.

9

https://hepdata.net/record/ins1283842?format=yaml&table=Table1&version=1

•format={csv,json,root,yaml,yoda}

•The table or version can be omitted.

•Search results can also be returned as JSON.

Page 10: watt hepdata dphep mar2017 · 2018. 11. 20. · G. Watt What is HEPData? • Unique open-access repository for scattering data from more than 8000 experimental HEP (“hep-ex”)

G. Watt

Discoverability• Powerful search and faceting provided by Elasticsearch.

10https://hepdata.net/search/?q="dark+matter"&collaboration=ATLAS

Page 11: watt hepdata dphep mar2017 · 2018. 11. 20. · G. Watt What is HEPData? • Unique open-access repository for scattering data from more than 8000 experimental HEP (“hep-ex”)

G. Watt

DOIs and versioning

• DOIs minted for whole record and each table.11

Page 12: watt hepdata dphep mar2017 · 2018. 11. 20. · G. Watt What is HEPData? • Unique open-access repository for scattering data from more than 8000 experimental HEP (“hep-ex”)

G. Watt

Modes of data entry1. Manually harvested from “hep-ex” papers.

Assistant extracts tables from .tex source files.

2. Data directly submitted by experiments.

• Pre-2014: no guidelines on preferred format.

• Early 2014: encourage standard “input” format.

• Late 2014: introduce online submission system.

• Early 2017: allow submissions from hepdata.net.

12Clearly need to move more papers from 1. to 2.

Page 13: watt hepdata dphep mar2017 · 2018. 11. 20. · G. Watt What is HEPData? • Unique open-access repository for scattering data from more than 8000 experimental HEP (“hep-ex”)

G. Watt

(Old) submission system usage

13

• Total of 347 papers: ATLAS (124), CMS (100), ALICE (71), LHCb (21).

Page 14: watt hepdata dphep mar2017 · 2018. 11. 20. · G. Watt What is HEPData? • Unique open-access repository for scattering data from more than 8000 experimental HEP (“hep-ex”)

G. Watt

New submission system

14• Also a Sandbox for any user to test uploads without special privileges.

https://hepdata.net/submission

Page 15: watt hepdata dphep mar2017 · 2018. 11. 20. · G. Watt What is HEPData? • Unique open-access repository for scattering data from more than 8000 experimental HEP (“hep-ex”)

G. Watt

HEPData Coordinators

• ATLAS/CMS split into physics groups.15

https://hepdata.net/permissions/coordinators

Page 16: watt hepdata dphep mar2017 · 2018. 11. 20. · G. Watt What is HEPData? • Unique open-access repository for scattering data from more than 8000 experimental HEP (“hep-ex”)

G. Watt

Summary and Outlook• HEPData is a unique repository for publication-related HEP

data: “Level 1” according to the DPHEP classification.

• Transition to rewritten HEPData site (hepdata.net) hosted at CERN. Many improvements and new features.

• Submission system widely used by four main LHC experiments. Non-LHC experiments welcome to join and upload their data.

• Future plans:

• Broaden scope to include data from particle decays, neutrino experiments, low-energy data relevant for tuning Geant4.

• Expand input formats (e.g. mixed YAML/ROOT, HistFactory).

• More details: https://github.com/HEPData/chep2016-paper16