IWLSC – CHEP 2006 The Evolution of Databases in HEP A Time-Traveller's Tale [ With an intermezzo covering LHC++ ] Jamie Shiers, CERN ~ ~ ~ …, DD-CO, DD-US, CN-AS, CN-ASD, IT-ASD, IT-DB, IT-GD, ¿¿-??, …

The Evolution of Databases in HEP: A Time-Traveller's Tale

[ With an intermezzo covering LHC++ ]

Jamie Shiers, CERN

~ ~ ~…, DD-CO, DD-US, CN-AS, CN-ASD, IT-ASD, IT-DB, IT-GD, ¿¿-??, …

The past decade has been an era of sometimes tumultuous change in the area of Computing for High Energy Physics.

This talk addresses the evolution of databases in HEP, starting from the LEP era and the visions presented during the CHEP 92 panel "Databases for High Energy Physics" (D. Baden, B. Linder, R. Mount, J. Shiers).

It then reviews the rise and fall of Object Databases as a "one size fits all solution" in the mid to late 90's and finally summarises the more pragmatic approaches that are being taken in the final stages of preparation for LHC data taking.

The various successes and failures (depending on one's viewpoint) regarding database deployment during this period are discussed, culminating in the current status of database deployment for the Worldwide LCG.

Once upon a time…

Databases for HEP Panel at CHEP 1992:

1. Should we buy or build database systems for our calibration and book-keeping needs?

2. Will database technology advance sufficiently in the next 8 to 10 years to be able to provide byte-level access to petabytes of SSC/LHC data?

Drew Baden, University of Maryland (PASS Project); B. Linder, Oracle Corporation; Richard Mount, Caltech; Jamie Shiers, CERN.

Buy or build? – Calibration &


1. Is it technically possible to use a commercial system?

2. Would it be manageable administratively and financially?

Questions already investigated during LEP planning phase

Computer Physics Communications 45 (1987) 299-310

“Possibility” in 1984 (technically but not Q2);

“Probability” in 1992 (technically and conceivably also Q2).

Calibration & Bookkeeping in


Many experiments had independently developed solutions to these problems, with largely overlapping functionality

Two (of the) efforts at producing common solutions:

1. FATMEN – file catalog and more (used by DELPHI, L3, OPAL etc.);

2. HEPDB – calibration database based on DBL3 and OPCAL

Both of these were based on ZEBRA-RZ, with ZEBRA-FZ for “updates” and zftp/zserv for distribution

FATMEN also had an Oracle backend (later dropped…)

Arguments - 1992

A significant amount of HEP-specific code would need to be added – roughly comparable in size (within a factor of 2) to existing home-grown solutions

Commercial systems require well trained support staff at every site. (Home-grown too, but experience shows that this is typically much less than for commercial solutions.)

Licensing and support for large, diverse HEP collaborations clearly a concern

We shall revisit these questions later in the show…

Use of existing packages for SSC/LHC?

Could / should Zebra be used in 2001?

Hope to exploit language features of Fortran90 (?) and/or C++

File catalog for multi-PB of data? Move towards nameserver approach (LFC = castor ns)

Existing data access and management tools? Away from sequential access…

Where do we store the data?

Actually in a database; Assumed for metadata; less likely for data itself

In some ‘super filesystem’ – e.g. based on IEEE MSS reference model

Zebra RZ and Scalability Issues

Some fields within RZ were limited to 16 bits

Fine for storage of HBOOK histograms etc but a limitation for larger catalogs

Move from 16-bit fields to 32-bit (Sunanda Banerjee)

A disconcertingly recurrent theme over the next decade…

(Also introduction of ‘eXchange’ format – makes zftp redundant in favour of binary ftp) (Burkhard Holl)
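
To make the 16-bit limitation concrete, here is a minimal Python sketch. It does not reproduce the actual RZ record layout: the helper names and the choice of an unsigned little-endian field are assumptions made purely for illustration.

```python
import struct

# Largest values an unsigned field of each width can hold.
MAX_16 = 2**16 - 1   # 65535
MAX_32 = 2**32 - 1   # 4294967295

def pack_counter(value: int, bits: int) -> bytes:
    """Pack an unsigned counter into a 16- or 32-bit field (hypothetical layout)."""
    fmt = {16: "<H", 32: "<I"}[bits]
    return struct.pack(fmt, value)

def fits(value: int, bits: int) -> bool:
    """Return True if the value can be stored in a field of the given width."""
    try:
        pack_counter(value, bits)
        return True
    except struct.error:
        # struct refuses values outside the range of the chosen field width
        return False
```

A counter of a few thousand histograms fits comfortably in 16 bits, whereas a catalog entry count of 65536 or more only fits once the field is widened to 32 bits.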

CHEP 92 – the Birth of OO in HEP?

Wide-ranging discussions on the future of s/w development in

A number of proposals presented – including ZOO (I preferred “OOZE”) – leading to:

RD41 – MOOSE: The applicability of OO to offline particle physics code

RD44 – GEANT4: Produce a global object-oriented analysis and design of an improved GEANT simulation toolkit for HEP

RD45 – A Persistent Object Manager for HEP

(and later also LHC++ (subsequently ANAPHE))


CERN - Computing Challenges 12

LHC++ - a “C++ CERNLIB”

Jamie Shiers, IT/ASD group, CERN


Background - CERNLIB

Widely used, Fortran-based library

Developed at CERN over ~30 years

“Natural” evolution to Fortran 90/95/2K

Significant effort spent on F90 preparation

Circa 1993: strong growth in OO (C++)

Decision to drop plans for “F90 CERNLIB”

Effort redirected towards OO solutions

Why C++? Why LHC++?

“The equivalent of the current CERN program library” (ATLAS TP)

“… only language that meets these requirements is C++” (CMS TP)

Decision of NA45 to move to C++

Needs of R&D projects (RD41, 44, 45)

Working group set up mid-95

Provide CERNLIB-like functionality in C++

Libraries for HEP Computing

Goals of Working Group

Re-use existing work where possible
• e.g. CLHEP

Use standard solutions where appropriate
• de-facto or de-jure: STL, OpenGL, NAG C, ODMG, ...

Investigate commercial components
• including license / distribution issues / pros & cons

Formal User Requirements gathering step
• following ESA PSS-05 guidelines

Be prepared for change!
• LHC lifetime is extremely long

[Figure: LHC++ components and modules. Labels recoverable from the diagram: Applications; Event Generators (Pythia, Herwig, Pythia 7, ...); GEANT4; Histograms (HTL); GEMINI; OpenGL, OpenInventor; NAG C library; C++, STL, CLHEP; Components & Modules.]

Status (Oct. 98)

[ LHC++ ] has achieved the goal of demonstrating that ... standard tools ... could solve typical HEP problems.

~2-3K lines of code + commercial components provide a concrete CERNLIB replacement.

Several experiments (CMS, NA45, ...) heavily using LHC++ components

~100 sites registered for access
• ~120 predicted in 1998 COCOTIME review
• On track for 150-200 by end-1999

Emphasis on modularity of approach

Can (have) replace(d) individual



Vendor STL

Software Migration

Strategy developed assuming migration

Has already taken place! RogueWave to STL; OpenGL implementations

HEP-specific code, e.g. histOOgrams to HTL

Future migrations: OpenInventor to Fahrenheit? C++ to Java? Java to ???

Lifetime of future experiments >> individual s/w packages!

Wylbur to now ~ now to end-LHC

Referee Report on LHC++

LCB 10 March 1999

J. Knobloch


December 98 - Outstanding issues

• Alternatives to HEP Explorer should be investigated - growing interest in Java-based tools.

• There is interest in histogram storage without an OO DB

• Availability issues (frequency of releases, platforms and some licensing issues) remain to be finalised.

• Improvement in end-user training - slow outside CERN.

• CLHEP issues: releases coupled with rest of LHC++ or not? Need for a policy on further development.

• All experiments participating in LHC++ should define their milestones.

• Requirements and solutions for scripting should be evaluated and implemented.

• Information should be provided on LHC++ usage.

• Release strategy and priority setting should be looked at again, perhaps best through setting up an executive board.

Asked the 4 Experiments

• Current usage of LHC++
• Position on the various components
• Priorities
• Concerns
• Activities

• Received answers from all 4 experiments

• ALICE repeats general rejection of LHC++ expressed at the last LCB:

We have briefly evaluated LHC++ and we were not very satisfied.

At the moment we are using ROOT and we are content with it. Currently we have no plans regarding LHC++

We will consider very seriously any solution that is adopted by the other three experiments as a common solution

We believe that ROOT can be such a common solution and we encourage the other experiments to consider it as such

Proposal of conclusions

• Major progress has been made: meetings, new personnel

• Plan to replace Explorer with something better adapted to the user requirements

• Improve the HEP specific documentation of commercial components

• Document the overall architecture and dependencies

• Clarify the boundaries of LHC++

• Items from previous LCBs: scripting, usage statistics

• LHC++ is now moving from the development phase to the production phase. It is therefore appropriate to revisit the management of the project by establishing a “LHC++ Executive Board” composed of the LHC++ project leader and representatives of the LHC experiments

Discussing development and release plans

Setting priorities

Now back to our main thread…

We left off back at CHEP ’92… (The birth of OO in HEP?)

RD45 – Initial Milestones

[The project ] should be approved for an initial period of one year. The following milestones should be reached by the end of the 1st year.

1. A requirements specification for the management of persistent objects typical of HEP data together with criteria for evaluating potential implementations. [ Later dropped – experiments far from ready ]

2. An evaluation of the suitability of ODMG's Object Definition Language for specifying an object model describing HEP event data.

3. Starting from such a model, the development of a prototype using commercial ODBMSes that conform to the ODMG standard. The functionality and performance of the ODBMSes should be evaluated.

It should be noted that the milestones concentrate on event data. Studies or prototypes based on other HEP data should not be excluded, especially if they are valuable to gain experience in the initial months.

RD45 – Initial Steps

Contacts with the main ODBMS vendors of that time: O2, ObjectStore, Objectivity, Versant, Poet, …

Many presentations scheduled at CERN

Training on O2, Objectivity/DB, a few licenses acquired…

Prototyping focussed on Objectivity/DB with Versant (later) as primary fall-back

Scalability of architecture

Objectivity/DB Scalability

Item                   Limit
Page size              2^16 bytes
# DB/Federated DB      2^16 - 1
# containers/DB        2^15 - 1
# logical pages/cont   2^16 - 1
Maximum DB size        File-system limit

2^16 databases (files) of 100GB = 6.5PB

CERN has requested extended OID

16 bits again!
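
The 6.5PB figure follows directly from the 16-bit database identifier. The short Python sketch below reproduces that arithmetic and estimates how many identifier bits a larger federation would need. Decimal units (1 GB = 10^9 bytes) and the 100PB target are assumptions chosen for illustration; the function names are hypothetical and not part of any Objectivity/DB interface.

```python
GB = 10**9    # decimal units assumed
PB = 10**15

def federation_capacity_pb(n_databases: int, db_size_gb: int = 100) -> float:
    """Total capacity in PB of a federation of equally sized database files."""
    return n_databases * db_size_gb * GB / PB

def id_bits_needed(target_pb: int, db_size_gb: int = 100) -> int:
    """Bits needed to number enough database files to reach the target volume."""
    target_bytes = target_pb * PB
    db_bytes = db_size_gb * GB
    n_databases = -(-target_bytes // db_bytes)   # ceiling division
    return max(1, (n_databases - 1).bit_length())

capacity = federation_capacity_pb(2**16)   # 65536 files of 100GB each
bits_for_100pb = id_bits_needed(100)       # assumed 100PB target
```

With 100GB files, `capacity` comes out at about 6.55PB, and numbering enough files for 100PB would take 20 bits rather than 16, which is one way to read the request for an extended OID.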

LCRB review, March 1996

The RD45 project has made excellent progress in identifying and applying solutions for object persistence for HEP based on standards and commercial products

RD45 should be approved for a further year

The LCRB agrees with the program of future work outlined in the RD45 status report and regards the following activities (below) and milestones (next) as particularly important:

Provide the object persistence services needed for the first release of GEANT4 in early 1997

Collaborate with ATLAS and CMS in the development of those aspects of the Computing Technical Proposals which may be affected by the nature of object persistence services

RD45 Milestones - 96

1. Identify and analyse the impact of using an ODBMS for event data on the Object Model, the physical organisation of the data, coding guidelines and the use of third party class libraries

2. Investigate and report on ways that Objectivity/DB features for replication, schema evolution and object versions can be used to solve data management problems typical of the HEP environment

3. Make an evaluation of the effectiveness of an ODBMS and MSS as the query and access method for physics analysis. The evaluation should include performance comparisons with PAW and Ntuples

RD45 Milestones - 97

1. Demonstrate, by the end of 1997, the proof of principle that an ODBMS can satisfy the key requirements of typical production scenarios (e.g. event simulation and reconstruction), for data volumes up to 1TB. The key requirements will be defined, in conjunction with the LHC experiments, as part of this work,

2. Demonstrate the feasibility of using an ODBMS + MSS for Central Data Recording, at data rates sufficient to support ATLAS and CMS test-beam activities during 1997 and NA45 during their 1998 run,

3. Investigate and report on the impact of using an ODBMS for event data on end-users, including issues related to private and semi-private schema and collections, in typical scenarios including simulation, (re-)reconstruction and analysis.

RD45 Milestones - 98

1. Provide, together with the IT/PDP group, production data management services based on Objectivity/DB and HPSS with sufficient capacity to solve the requirements of ATLAS and CMS test beam and simulation needs, COMPASS and NA45 tests for their '99 data taking runs.

2. Develop and provide appropriate database administration tools, (meta-)data browsers and data import/export facilities, as required for (1).

3. Develop and provide production versions of the HepOODBMS class libraries, including reference and end-user guides.

4. Continue R&D, based on input and use cases from the LHC collaborations to produce results in time for the next versions of the collaborations' Computing Technical Proposals (end 1999).

Toward the 2001 Milestone: “Choice of ODBMS vendor”

“If the ODBMS industry flourishes it is very likely that by 2005 CMS will be able to obtain products, embodying thousands of man-years of work, that are well matched to its worldwide data management and access needs. The cost of such products to CMS will be equivalent to at most a few man-years. We believe that the ODBMS industry and the corresponding market are likely to flourish. However, if this is not the case, a decision will have to be made in approximately the year 2000 to devote some tens of man-years of effort to the development of a less satisfactory data management system for the LHC experiments.”

(CMS Computing Technical Proposal, section 3.2, page 22)

And the rest, as they say, is History…

Risk Analysis: Issues

Choice of Technology
ODBMS, ORDBMS, RDBMS, “light-weight” Persistency, files + meta-data, ...

Choice of Vendor (historically)
#1 Objectivity, #2 Versant

Size of market
Did not take off as anticipated; unlikely to grow significantly in short-medium term

Risk Analysis and Actions

ODBMS market has not grown as predicted

Need to understand alternatives

Possibilities include:
“Open Source” (?) ODBMS solution
ORDBMS-based solution (also for event data)
“Hybrid solutions”, incl. Meta-data + files

RD45 investigating & directly Based on experience at FNAL / BNL ...

Essential to consider all requirements

And not just file I/O…

Espresso is a proof-of-concept prototype built to answer questions from Risk Analysis

Could we build an alternative to Objectivity/DB?

How much manpower would be required?

Can we overcome limitations of Objy’s current architecture?

Support for VLDBs, multi-FD work-arounds etc.

Test / validate important architectural choices

Espresso – Lessons Learnt

Initial prototype suggests that building a full ODBMS is technically feasible

Discussions with other sites suggest that interest goes well beyond HEP

Manpower estimates / possible resources indicate “project” would have to start “soon”

2002: 3-year project with full system end-2004

ODBMS – In Retrospect

Used – in production – by several experiments at CERN, SLAC and other labs for a total of a few PB of data for just under a decade

It was – for some extended period – the baseline of ATLAS and CMS

Enhancements to the product obtained (with some effort)

MSS interface (actually much more general: xrootd); Linux port; VLDB support (ODMG compliance)

Much experience with ODBMS-like solutions obtained

“Risk analysis” clearly identified need for fallback, proof-of-concept prototype (Espresso) and eventually full solution (POOL)

Migration to Oracle+DATE (COMPASS) / POOL (LHC expts) successfully reported on at CHEP 2004 (scale of several hundred TB)

The Story So Far…

• 1992: CHEP – DB panel, CLHEP K/O, CVS …

• 1994: start of OO projects

• 1997: proposal of ODBMS+MSS; BaBar

• 2001: CMS change of baseline Objy

• Now: LCG Persistency Framework RTAG
– Resulted in POOL project…

15. Observations – IT “Eloise” Retreat 2000

• Large volume event data storage and retrieval is a complex problem that the particle physics community has had to face for decades.

• The LHC data presents a particularly acute problem in the cataloguing and sparse retrieval domains, as the number of recorded events is very large and the signal to background ratios are very small. All currently proposed solutions involve the use of a database in one way or another.

• A satisfactory solution has been developed over the last years based on a modular interface complying with the ODMG standard, including C++ binding, and the Objectivity/DB object database product.

• The pure object database market has not had strong growth and the user and provider communities have expressed concerns. The “Espresso” software design and partial implementation, performed by the RD-45 collaboration, has provided an estimate of 15 person-years of qualified software engineers for development of an adequate solution using the same modular interface. This activity has completed, resulting in the recent snapshot release of the Espresso proof-of-concept prototype. No further development or support of this prototype is foreseen by DB group.

• Major relational database vendors have announced support for Object-Relational databases, including C++ bindings.

• Potentially this could fulfil the requirements for physics data persistency using a mainstream product from an established company.

• CERN already runs a large Oracle relational database service

• The conclusion of the Espresso project, that a HEP-developed object database solution for the storage of event data would require more resources than available, should be announced to the user community.

• The possibility of a joint project between Oracle and CERN should be explored to allow participation in the Oracle 9i beta test with the goals of evaluating this product as a potential fallback solution and providing timely feedback on physics-style requirements. Non-staff human resources should be identified such that there is no impact on current production services for Oracle and Objectivity.

Fellow, later also openlab resources

Oracle for Physics Data

Work on LHC Computing started ~1992 (some would say earlier…)

Numerous projects kicked off 1994/5 to look at handling multi-PB of data; move from Fortran to OO (C++) etc.

Led to production solutions from ~1997

Always said that ‘disruptive technology’, like Web, would have to be taken into account

In 2002, major project started to move 350TB of data out of ODBMS solution; >100MB/s for 24 hour periods

Now ~2TB of physics data stored in Oracle on Linux servers

A few % of total data volume; expected to double in 2004 [ I guess it's 10 x this by now? ]

Oracle 10g launch, ZH
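
As a rough sanity check on the migration figures above, the following Python sketch converts a sustained rate into a daily volume and a minimum duration. It assumes decimal units and uninterrupted running at exactly 100MB/s; the slide only says the rate exceeded this for 24 hour periods, so the result is indicative rather than a record of the actual schedule. The function names are illustrative.

```python
MB = 10**6    # decimal units assumed
TB = 10**12
SECONDS_PER_DAY = 24 * 60 * 60

def daily_volume_tb(rate_mb_per_s: float) -> float:
    """Volume in TB moved in 24 hours at a sustained rate in MB/s."""
    return rate_mb_per_s * MB * SECONDS_PER_DAY / TB

def days_to_move(total_tb: float, rate_mb_per_s: float) -> float:
    """Days of uninterrupted transfer needed to move total_tb at the given rate."""
    return total_tb / daily_volume_tb(rate_mb_per_s)

per_day = daily_volume_tb(100)       # TB per day at 100MB/s
duration = days_to_move(350, 100)    # days to move 350TB at that rate
```

At 100MB/s a day of running moves 8.64TB, so 350TB corresponds to roughly 40 days of continuous transfer.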

LCG and Oracle

Current thinking is that bulk data will be streamed to [ ROOT ] files

RDBMS backend also being studied for ‘analysis data’

File catalog (10^9 files) and file-level metadata will be stored in Oracle in a Grid-aware catalog

[ This was the “RLS” family ]

In longer term, event level metadata may also be stored in the database, leading to much larger data volumes

A few PB, assuming total data volume of 100-200PB

[ This was probably an overestimate – TAG : RAW ratio? ]

Current storage management system – CASTOR at CERN – also uses an [Oracle] database to manage the naming / location of files

Bulk data stored in tape silos and faulted in to huge disk caches

Oracle 10g launch, ZH
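
The idea of a file catalog held in a relational database can be sketched in a few lines. The example below is a toy illustration only: it uses SQLite from the Python standard library rather than Oracle, and the table names, columns, file names and site labels are all hypothetical. It is not the schema of RLS, LFC or CASTOR.

```python
import sqlite3

def open_catalog(path: str = ":memory:") -> sqlite3.Connection:
    """Create a toy catalog: logical file names plus their physical replicas."""
    db = sqlite3.connect(path)
    db.executescript("""
        CREATE TABLE IF NOT EXISTS files (
            lfn        TEXT PRIMARY KEY,      -- logical file name
            size_bytes INTEGER NOT NULL       -- file-level metadata
        );
        CREATE TABLE IF NOT EXISTS replicas (
            lfn  TEXT NOT NULL REFERENCES files(lfn),
            site TEXT NOT NULL,               -- where this copy lives
            pfn  TEXT NOT NULL,               -- physical file name at that site
            PRIMARY KEY (lfn, site)
        );
    """)
    return db

def register_file(db, lfn, size_bytes):
    db.execute("INSERT INTO files (lfn, size_bytes) VALUES (?, ?)", (lfn, size_bytes))

def add_replica(db, lfn, site, pfn):
    db.execute("INSERT INTO replicas (lfn, site, pfn) VALUES (?, ?, ?)", (lfn, site, pfn))

def lookup(db, lfn):
    """Return (site, pfn) pairs for a logical file name, ordered by site."""
    cur = db.execute("SELECT site, pfn FROM replicas WHERE lfn = ? ORDER BY site", (lfn,))
    return cur.fetchall()

# Hypothetical entries, for illustration only.
catalog = open_catalog()
register_file(catalog, "/grid/demo/run001/events.root", 2_000_000_000)
add_replica(catalog, "/grid/demo/run001/events.root", "SITE_A", "storage://site-a.example/run001/events.root")
add_replica(catalog, "/grid/demo/run001/events.root", "SITE_B", "storage://site-b.example/run001/events.root")
found = lookup(catalog, "/grid/demo/run001/events.root")
```

The point of the split is that the bulk data stays in files, while the database only answers "which copies of this logical file exist, and where".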

Jamie Shiers, November 2004. Scientific and enterprise grid-

Physics research - prospects

• Re-engineering of all DBMS services for physics on Oracle 10g RAC

• Goals:
– Isolation – 10g ‘services’ and / or physical separation
– Scalability - both in database processing power and in storage devices
– Reliability – automatic failover in case of problems
– Manageability – simplified administration

• We will return to this later, in the ‘Enterprise Grids’ section …

Physics Activities - Futures

Re-engineering all DB services for Physics on Oracle 10g RAC

Goals are:
Isolation – 10g ‘services’ and / or physical separation
Scalability - in both database processing power and storage
Reliability – automatic failover in case of problems
Manageability – significantly easier to administer than now

Will revisit this under ‘Enterprise Grids’ later…

Oracle Grid Tech. day, Moscow

CERN & Oracle

Share a common vision regarding the future of high performance computing

Widespread use of commodity dual processor PCs running Linux; Focus on Grid computing

CERN has managed to influence Oracle product

Oracle 10g features:

Support for native IEEE floats & doubles;

Support for “Ultra large” Databases (ULDB); 16 bit fields again!

Cross-platform transportable tablespaces;

Instant-client developer etc.

Oracle Grid Tech. day, Moscow

LHC DB Applications

Clear that many “LHC construction / operations” applications will use Oracle

This is true also for detector construction / monitoring / calibration applications

But the main change is closer to “physics applications”

There was no “general purpose” DB service for the physics community at the time of LEP

Some applications certainly (OPAL online tape DB, …)

But these are legion at the time of LHC…

See hidden slides for some more info…


2D Workshop - Background

A number of discussions, initially with LHC online community, revealed significant number of new database applications, and developers, in the pipeline

Meeting in October 2004 with offline and online representatives from all 4 LHC experiments led to proposal of Database Developers’ Workshop

This is it!

Significant amount of preparation by many people

Please profit from it – and not the wireless network

Silencing your mobile phone would be much appreciated…

[ A free Tom Kyte book was given to all attendees]

(Workshop) Goals & Motivation

We are (finally) very close to LHC startup

Many new projects coming along

There is simply no time to waste!

Non-optimal (DB) applications are very expensive to ‘fix’ a posteriori

Try to establish some basic ‘best practice’ guidelines

And also a well defined dev → integration → production path

WLCG and Database Services

Many ‘middleware’ components require a database:

dCache – PostgreSQL (CNAF porting to Oracle?)
CASTOR / DPM / FTS* / LFC / VOMS – Oracle or MySQL
Some MySQL only: RB, R-GMA#, SFT#

Most of these fall into the ‘Critical’ or ‘High’ category at Tier0

See definitions below; T0 = C/H, T1 = H/M, T2 = M/L

Implicit requirement for ‘high-ish service level’ (to avoid using a phrase such as H/A…)

At this level, no current need beyond site-local+ services

Which may include RAC and / or DataGuard [ TBD together with service provider ]

Expected at AA & VO levels

*gLite 1.4 end October
#Oracle version foreseen
+R/O copies of LHCb FC?

Those Questions again…

1. Should we buy or build database systems for our calibration and book-keeping needs?

It now seems to be accepted that we build our calibration & book-keeping systems on top of a database system.

Both commercial and open-source databases are supported.

2. Will database technology advance sufficiently in the next 8 to 10 years to be able to provide byte-level access to petabytes of SSC/LHC data?

We (HEP) have run production database services up to the PB level. The issues related to licensing, and – perhaps more importantly – support, to cover the full range of institutes participating in an LHC experiment, remain.

Risk analysis suggests a more cautious – and conservative – approach, such as that currently adopted. (Who today are the concrete alternatives to the market leader?)

If you want to know more…

Visit http://hepdb.blogspot.com/

And also:

http://wwwasd.web.cern.ch/wwwasd/cernlib/rd45/
http://wwwasd.web.cern.ch/wwwasd/lhc++/indexold.html
http://hep-proj-database.web.cern.ch/hep-proj-database/

General Conclusions

“The past decade has been an era of sometimes tumultuous change in the area of Computing for High Energy Physics.”

This is certainly true – virtually no stone has been left unturned

Could we have got where we are now more directly?

With hindsight this might seem clear, but…

“Predicting is always very difficult. Especially predicting the future.” – Niels Bohr