Pierre Dubath SDC-CH Department of Astronomy University of...


Oct. 20th, 2016 Euclid data processing challenges 1

Euclid Consortium

The Euclid Challenges

Pierre Dubath
SDC-CH

Department of Astronomy
University of Geneva

IAU Symposium 325

Astroinformatics
Sorrento (Italy), October 20-24, 2016

List of Collaborators

Nikolaos Apostolakos, Andrea Bonchi, Andrey Belikov, Massimo Brescia, Peter Capak, Jean Coupon, Christophe Dabin, Hubert Degaudenzi, Shantanu Desai, Florian Dubath, Adriano Fontana, Sotiria Fotopoulou, Marco Frailis, Audrey Galametz, Catherine Grenet, John Hoar, Mark Holliman, Ben Hoyle, Olivier Ilbert, Martin Kuemmel, Clotilde Laigle, Giuseppe Longo, Henry Joy McCracken, Martin Melchior, Yannick Mellier, Joe Mohr, Nicolas Morisset, Stéphane Paltani, Roser Pello, Stefano Pilo, Gianluca Polenta, Maurice Poncet, Roberto Saglia, Mara Salvato, Marc Sauvage, Marc Schefer, Marco Scodeggio, Stella Seitz, Santiago Serrano, Marco Soldati, Andrea Tramacere, Rees Williams, Andrea Zacchei, etc.

Outline

1. Science and mission overview

2. Instruments and data analysis

3. Software development

4. Software integration and operation preparation

5. Swiss Science Data Center (SDC-CH) major tasks

● This presentation
– targets a non-Euclid audience
– focuses on data processing aspects of the Euclid mission

An Ever Expanding Universe?

Physics Nobel Prize 2011

Discovery of the accelerated expansion of the Universe through distant supernovae observations

Perlmutter, Schmidt and Riess

The Euclid mission main goal

● What is the nature of dark matter and dark energy?

[Figure: pie chart of the energy content of the Universe: 68% dark energy, 27% dark matter, 5% ordinary matter]

The Euclid mission

● ESA medium-class cosmology mission, selected in 2011

● Soyuz launch from Kourou to L2 in 2020 and 6-year mission

● Survey of 15'000 square degrees: optical and NIR images and NIR spectra → shape and distance measurements of billions of galaxies

● Constraints on cosmology models from different types of measurements (or probes):
– Gravitational (strong and weak) lensing
– Baryonic Acoustic Oscillation (BAO)
– Integrated Sachs-Wolfe (ISW) effect (galaxy clusters)
– Redshift-space distortions (Kaiser effect)

Weak lensing illustration

Masses bend light paths!

3D dark matter map, COSMOS field

NASA, ESA, R. Massey (California Institute of Technology).

Baryonic Acoustic Oscillation

Strong Lensing

Euclid will lead to the detection of very large numbers of strong lenses at cluster and galaxy scales

Beautiful images... but only Euclid legacy science!

The Euclid Spacecraft

1.2 m Korsch silicon carbide primary mirror

… and ground photometry for PHZ!

Dark Energy Survey (DES), Kilo-Degree Survey (KiDS), LSST (?), Javalambre/Spain, Subaru/Japan (?), CFHT/Canada

Processing budget

Year                            2021  2022  2023  2024  2025  2026  2027
Storage (PB)                      15    30    50    60    75    90    90
Computing (kilo cores / year)    2.5     5   8.5    12    16    20    21

Numbers from Christophe Dabin @ tk1

Processing functional breakdown

[Figure: processing functions (SIM, VIS, NIR, SIR, EXT, MER, SPE, SHE, PHZ, LE3) arranged by data level (Level 1, Level S, Level E, Level 2, Level 3), fed by the Ground Station, MOC and SOC (OPS, LE1)]

● SIM: simulated data
● VIS: visible calibrated frames
● NIR: near-IR calibrated frames
● SIR: calibrated 1-D spectra
● EXT: calibrated ground frames
● MER: catalog with consistent photometry and spectroscopy
● SPE: spectroscopic redshifts
● PHZ: photometric redshifts
● SHE: shape measurements
● LE3: high-level processing

Euclid data flow

Euclid SGS organization

SDC task: Software development and data processing

OU task: Algorithm specification and validation

Software Development

● C++ and Python languages

● One reference platform
– Linux from the Red Hat family (currently CentOS 7)
– Set of common libraries (EDEN)

● Software development on a virtual machine (LODEEN)

● RPM packaging

● XML-based common data model

● A common building and packaging framework

Elements framework

Elements is a CMake-based building and packaging framework (capitalizing on CERN expertise) featuring:

● a standard source code structure
● easy software building according to CMakeLists.txt instructions
● automated RPM packaging (make rpm)
● basic services, such as program option handling and logging
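The last bullet names two basic services: program option handling and logging. As a rough illustration of what such services give a pipeline program, here is a minimal Python sketch that uses only the standard library (`argparse`, `logging`). It does not use or reproduce the Elements API; the program name, option names and defaults are hypothetical.

```python
import argparse
import logging


def build_parser():
    """Declare the command-line options of a hypothetical pipeline program."""
    parser = argparse.ArgumentParser(prog="example_program")
    parser.add_argument("--input-catalog", required=True,
                        help="path to the input catalog (hypothetical option)")
    parser.add_argument("--log-level", default="INFO",
                        choices=["DEBUG", "INFO", "WARNING", "ERROR"],
                        help="verbosity of the log output")
    return parser


def main(argv):
    """Parse options, configure logging, and run a placeholder step."""
    args = build_parser().parse_args(argv)
    logging.basicConfig(level=getattr(logging, args.log_level),
                        format="%(asctime)s %(name)s %(levelname)s %(message)s")
    logger = logging.getLogger("example_program")
    logger.info("Processing %s", args.input_catalog)
    return 0


# Example call with an explicit argument list (hypothetical file name).
exit_code = main(["--input-catalog", "catalog.fits"])
```

Keeping option parsing and logging setup in one shared place is what lets every program in a large collaboration behave the same way on the command line and in its log files.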

Projects (Elements Framework)

Distributed data processing

● 10+ SDCs involved

● Central metadata database

● Data-centric approach: software runs where the required data have been shipped

● In each SDC
– Distributed processing management tools
– Computing infrastructure for processing and storage

[Figure: the Euclid Archive System, with the Euclid metadata database at its center, connected to the SOC and to each SDC (processing and local archive) through data products, metadata updates and metadata queries]
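The data-centric approach can be illustrated with a small Python sketch: given a record of which SDC stores which data product, a job is sent to an SDC that already holds all of its inputs. The SDC names, product identifiers and the `choose_sdc` helper are hypothetical; the real system relies on the Euclid Archive System metadata, which is not modelled here.

```python
def choose_sdc(required_products, holdings):
    """Return the name of an SDC that already stores every required product.

    holdings maps an SDC name to the set of product identifiers it stores.
    Returns None when no single SDC holds all inputs, in which case data
    would first have to be shipped.
    """
    required = set(required_products)
    for sdc_name in sorted(holdings):  # sorted for a deterministic choice
        if required <= holdings[sdc_name]:
            return sdc_name
    return None


# Hypothetical metadata: which SDC holds which product.
holdings = {
    "SDC-x": {"vis_frame_001", "nir_frame_001"},
    "SDC-y": {"vis_frame_002"},
    "SDC-z": {"vis_frame_001", "nir_frame_001", "ext_frame_001"},
}

target = choose_sdc(["vis_frame_001", "ext_frame_001"], holdings)
print(target)
```

The point of the design is that only small metadata records travel to the central database, while the bulky data products stay where the processing happens.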

[Figure: distributed processing infrastructure. Components shown for each SDC (x, y, z, …): data storage (files, XML), computing infrastructure, Infrastructure Abstraction Layer (IAL), processing control (Processing Order definition), software continuous integration and deployment (CernVM FS) and monitoring (Icinga), connected to the Euclid Archive System (science archive, metadata storage)]

Infrastructure Abstraction Layer

[Figure: IAL architecture. Components: Meta Scheduler, Pipeline Run Server, IAL DRM, WorkSpace, queuing system and compute nodes, spread over an IAL host and an HPC submission host, and linked to the metadata database, the data storage and the SDC Processing Order definition. Functions annotated in the figure:]

● Polls Processing Orders
● Fetches inputs from EAS and prepares workspace
● Ingests outputs into EAS
● Creates and traverses data flow graph
● Submits and monitors HPC jobs
● The workspace contains all inputs, outputs and intermediary data for pipeline runs
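One of the functions named for the IAL is to create and traverse a data flow graph and submit jobs accordingly. The Python sketch below shows the general idea with the standard-library `graphlib` module (Python 3.9+): tasks run in an order that respects their dependencies. The dependency graph between processing functions and the `submit_job` stand-in are assumptions made for illustration; this is not the IAL implementation.

```python
from graphlib import TopologicalSorter

# Assumed dependencies (task -> tasks it needs as inputs), illustration only.
data_flow = {
    "MER": {"VIS", "NIR", "SIR", "EXT"},
    "SPE": {"MER"},
    "PHZ": {"MER"},
    "SHE": {"MER"},
    "LE3": {"SPE", "PHZ", "SHE"},
}


def submit_job(task, log):
    """Stand-in for an HPC job submission: only records the task name."""
    log.append(task)


def run_pipeline(graph):
    """Traverse the graph so that every task runs after all of its inputs."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    submitted = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # tasks whose inputs are complete
        for task in ready:
            submit_job(task, submitted)
            sorter.done(task)  # mark as finished, which unlocks successors
    return submitted


order = run_pipeline(data_flow)
print(order)
```

In a real scheduler the tasks returned together by `get_ready()` could be submitted in parallel, with `done()` called as each job completes.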

Challenge-driven development

● Iterative development through the planning of a number of incremental integration tests

● Series of challenges for different aspects of the system
– weak lensing (GREAT)
– infrastructure
– “science”
– photometric redshifts

● Consolidation of the interfaces (Common Data Model)

Infrastructure Challenge 6

[Figure: SDC components exercised in the challenge: data storage (files, XML), computing infrastructure, Infrastructure Abstraction Layer (IAL), processing control (COORS, Processing Orders), software continuous integration and deployment (CernVM FS) and monitoring (Icinga), connected to the Euclid Archive System (science archive, metadata storage)]

Preliminary versions of (almost) all components, involving almost all SDCs!

Science challenges 2 and 3

[Figure: processing functional breakdown diagram (Level 1 to Level 3), annotated with:]

● Science 2 challenge (spring 2016)
● Science 3 challenge (spring 2017)

SDC-CH major tasks

● Develop and provide the Elements building and packaging framework to the collaboration

● Photometric redshift-related software development
– Phosphoros: template fitting algorithm implementation
– PHZ pipeline combining template fitting and machine learning algorithms

● Strong lens detection
– Contribution to algorithm exploration
– (Paraficz et al. 2016 https://arxiv.org/abs/1605.04309)
– (Tramacere et al. 2016 https://arxiv.org/abs/1609.06728)

● Development of a new (SExtractor) framework in C++
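Template fitting for photometric redshifts, in its generic textbook form, compares observed fluxes with model fluxes predicted by a set of templates on a redshift grid and keeps the model with the best chi-square. The Python sketch below illustrates that idea only; it is not the Phosphoros implementation, and the model grid, fluxes and function names are made up for the example.

```python
def best_scale(observed, errors, model):
    """Analytic amplitude that minimizes chi-square for one model."""
    num = sum(f * m / e ** 2 for f, m, e in zip(observed, model, errors))
    den = sum(m * m / e ** 2 for m, e in zip(model, errors))
    return num / den


def chi_square(observed, errors, model):
    """Chi-square between observed fluxes and the optimally scaled model."""
    a = best_scale(observed, errors, model)
    return sum(((f - a * m) / e) ** 2
               for f, m, e in zip(observed, model, errors))


def fit_redshift(observed, errors, model_grid):
    """Return (redshift, template, chi2) of the best model in the grid.

    model_grid maps (redshift, template_name) to a list of model fluxes,
    one per filter, in the same order as the observed fluxes.
    """
    best = None
    for (z, template), model in model_grid.items():
        chi2 = chi_square(observed, errors, model)
        if best is None or chi2 < best[2]:
            best = (z, template, chi2)
    return best


# Toy grid: made-up model fluxes in three filters.
model_grid = {
    (0.5, "elliptical"): [1.0, 2.0, 4.0],
    (1.0, "elliptical"): [1.0, 1.5, 2.0],
    (0.5, "spiral"): [3.0, 2.0, 1.0],
}
observed = [2.1, 2.9, 4.1]  # roughly twice the (1.0, "elliptical") model
errors = [0.1, 0.1, 0.1]

z_best, template_best, chi2_best = fit_redshift(observed, errors, model_grid)
print(z_best, template_best)
```

Solving for the amplitude analytically removes one free parameter from the grid search, so only redshift and template type need to be scanned.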

Phosphoros challenge results

SExtractor++

● A new modular and extensible SExtractor framework

● For the astronomical and the Euclid communities

● Long term maintenance and evolution perspectives

● Modern software design
– API based on interfaces
– Single responsibility principles
– Design patterns
– BOOST plugin system for adding algorithm steps

● Collaboration between Emmanuel Bertin and the Euclid community
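The design points listed above (an API based on interfaces, single-responsibility components, and a plugin system for adding algorithm steps) can be sketched in a few lines. SExtractor++ is a C++ framework with a Boost-based plugin system; the Python analogue below only illustrates the pattern, and every class and function name in it is hypothetical.

```python
from abc import ABC, abstractmethod


class MeasurementStep(ABC):
    """Interface: each step computes one set of properties for a source."""

    @abstractmethod
    def measure(self, source):
        """Return a dict of new properties for the given source."""


STEP_REGISTRY = {}


def register_step(name):
    """Class decorator that makes a step available under a given name."""
    def decorator(cls):
        STEP_REGISTRY[name] = cls
        return cls
    return decorator


@register_step("pixel_count")
class PixelCount(MeasurementStep):
    """Single responsibility: count the pixels of a source."""

    def measure(self, source):
        return {"pixel_count": len(source["pixels"])}


@register_step("total_flux")
class TotalFlux(MeasurementStep):
    """Single responsibility: sum the pixel values of a source."""

    def measure(self, source):
        return {"total_flux": sum(source["pixels"])}


def run_steps(source, step_names):
    """Apply the requested steps in order and merge their outputs."""
    result = {}
    for name in step_names:
        step = STEP_REGISTRY[name]()
        result.update(step.measure(source))
    return result


source = {"pixels": [1.0, 2.5, 0.5]}
properties = run_steps(source, ["pixel_count", "total_flux"])
print(properties)
```

A new algorithm step is added by writing one more registered class; the driver function `run_steps` does not change.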

SExtractor++ status

● Framework ready

● Simplified aperture photometry: comparison with SExtractor!

● Multi-frame model fitting

[Figure: side-by-side panels, SExtractor 2.23.1 and SExtractor++]
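Simplified aperture photometry, in its most basic form, sums the pixel values whose centers fall within a circular aperture around a source position. The Python sketch below shows that basic form on a toy image; it ignores background subtraction and partial-pixel weighting, and it is not taken from SExtractor or SExtractor++.

```python
def aperture_flux(image, x_center, y_center, radius):
    """Sum the pixels whose centers lie within `radius` of the position.

    image is a list of rows; pixel (x, y) is image[y][x].
    """
    total = 0.0
    for y, row in enumerate(image):
        for x, value in enumerate(row):
            if (x - x_center) ** 2 + (y - y_center) ** 2 <= radius ** 2:
                total += value
    return total


# Toy 5x5 image: flat background of 1 with a bright source in the middle.
image = [
    [1, 1, 1, 1, 1],
    [1, 2, 3, 2, 1],
    [1, 3, 9, 3, 1],
    [1, 2, 3, 2, 1],
    [1, 1, 1, 1, 1],
]

flux = aperture_flux(image, x_center=2, y_center=2, radius=1)
print(flux)
```

With a radius of 1 the aperture covers the central pixel and its four direct neighbours, which makes the result easy to check by hand when comparing two implementations.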

Conclusions

● Euclid challenges: science goals, hardware development, algorithm determination, software development, etc.

● Challenge-driven development: best approach for building up software systems through large collaborations?

● Possible extra benefits for the astronomical community:
– The “Elements” building and packaging framework
– Part of the “Infrastructure Abstraction Layer” (IAL)
– Science tools, such as Phosphoros and SExtractor

Thanks for your attention!
