An era of Big Data in astronomy
Emmanuel Gangler – LPC – Clermont-Ferrand (France)
Ljubljana seminar, 05/05/14


* The project
* The science
* The Big Data


In short:
● A stage-4 survey:
  ● 8.4 m telescope
  ● Cerro Pachon (Chile)
  ● (Very) wide-field astronomy
  ● 9.6 deg² camera
  ● 0.2″ pixel
● All visible sky in 6 bands (ugrizy) (~20,000 deg²)
● 15 s exposure, 1 visit / 3 days
● During 10 years!
● 60 PB of raw data


Observation strategy
● Main survey: 90% of time
  ● Pairs of 15-second exposures
  ● 2 visits separated by 1 h
  ● r ~ 24.5 mag in 1 exposure
  ● r ~ 27.5 after 10 yr (150 visits)
● Deep drilling survey: 10% of time
  ● r ~ 26; 30 fields, 300 deg²
  ● Continuous exposures, 1 h/night
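The jump from r ~ 24.5 to r ~ 27.5 can be checked with a rough stacking estimate. The sketch below assumes background-limited imaging, where the signal-to-noise ratio grows as the square root of the number of stacked exposures, so depth improves by 1.25 log10(N) magnitudes. That scaling law and the function name are assumptions, not taken from the slides. The slide does not say whether 24.5 refers to a single 15 s exposure or to a visit (a pair), so both readings are computed.

```python
import math

def coadd_depth(single_depth_mag, n_stacked):
    """Limiting magnitude after stacking n_stacked equal frames.

    Assumes background-limited noise: S/N grows as sqrt(N), i.e.
    the depth gain is 2.5 * log10(sqrt(N)) = 1.25 * log10(N) mag.
    """
    return single_depth_mag + 1.25 * math.log10(n_stacked)

# Reading 1: 24.5 is the depth of one visit, 150 visits are stacked.
depth_per_visit = coadd_depth(24.5, 150)

# Reading 2: 24.5 is the depth of one 15 s exposure, 150 pairs = 300 frames.
depth_per_exposure = coadd_depth(24.5, 300)

print(f"150 frames: r ~ {depth_per_visit:.1f}")
print(f"300 frames: r ~ {depth_per_exposure:.1f}")
```

Both readings land within a few tenths of a magnitude of the quoted r ~ 27.5.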


The LSST consortium

R&D Construction Operations

● Non-profit organization

● 35 institutions, with a major US contribution:
  ● SLAC, Google...
● Non-US: Republic of Chile, France (CNRS/IN2P3)
● + international partners (UK, ...)
● ~900 scientists expected to be involved
● NSF/DOE/private funding: ~670 M$
● Operations: 10 years (2022-2031)


The Cerro Pachon site


The camera and telescope
● Camera is located in the telescope beam
● Constraints on
  ● Envelope (ø < 1.6 m)
  ● Mass (3 t)
  ● Heat dissipation
  ● Lifetime (10 yr) and maintenance
● Median seeing: 0.6″ → 0.2″ pixel
● Minimum pixel size: 10 μm
● Plate scale → 10.3 m focal length
● Depth requirement: aperture 6.5 m → focal ratio < 1.5
● FOV 3.5° → 3.2 Gpix → 63 cm ø focal plane
● Fast slew (5°/sec) → compact design
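The chain of numbers on this slide follows from simple geometry. The sketch below recomputes the focal length, focal-plane diameter and pixel count from the quoted pixel size, pixel scale and field of view. It assumes a circular field and the small-angle approximation; the variable names are illustrative.

```python
import math

ARCSEC = math.radians(1.0 / 3600.0)   # one arcsecond in radians

pixel_size_m = 10e-6    # 10 um pixel (from the slide)
pixel_scale = 0.2       # arcsec per pixel (from the slide)
fov_deg = 3.5           # field-of-view diameter (from the slide)

# Plate scale: one pixel of 10 um must subtend 0.2 arcsec.
focal_length_m = pixel_size_m / (pixel_scale * ARCSEC)

# Small-angle approximation: linear size = focal length * angle.
focal_plane_diameter_m = focal_length_m * math.radians(fov_deg)

# Circular field tiled with 0.2" x 0.2" pixels (assumption: no gaps).
fov_area_arcsec2 = math.pi * (fov_deg * 3600.0 / 2.0) ** 2
n_pixels = fov_area_arcsec2 / pixel_scale ** 2

print(f"focal length   ~ {focal_length_m:.1f} m")
print(f"focal plane    ~ {focal_plane_diameter_m * 100:.0f} cm")
print(f"pixel count    ~ {n_pixels / 1e9:.1f} Gpix")
```

The results (about 10.3 m, 63 cm and 3.1 Gpix) agree with the slide to within rounding.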


The focal plane

189 CCDs


* The project
* The science
* The Big Data


Which science will LSST address?

(Slide from Ivezic)


The LSST science book
● 4 major themes
  ● Dark energy, dark matter
  ● Mapping the Milky Way
  ● Transient optical sky
  ● Solar system
● 11 science collaborations
  ● Weak lensing
  ● BAO
  ● Supernovae
  ● Strong lensing
  ● Galaxies
  ● AGN
  ● Milky Way and the local volume structure
  ● Stellar populations
  ● Transient/variable stars
  ● Solar system
  ● Informatics and statistics

arXiv:0912.0201
arXiv:1211.0310


(Slide from Ivezic)


LSST dark energy probes
● Quantitative and qualitative step:
  ● ~50,000 SN in deep fields (2013: 500 SN)
    → Homogeneity test
  ● ~10 B galaxies (10 M DESI; 1 M BOSS)
    → Structure growth
    → Redshift tomography
    → GR consistency checks

(Figure label: BAO)


Transients

(Figure labels: GRB orphan afterglows; GRB afterglow; ?. Credits: Kasliwal, Prsa)

● Transient detection
  ● High cadence in deep drilling fields
  ● High rate of false positives
● Follow-up will be the key
  ● LSST releases the alerts within 1 min
  ● Spectroscopic
  ● Other wavelengths
● LSST is also a follow-up instrument!


* The project
* The science
* The Big Data


Big Data: when « more is different »

Processing capacity doubles every 18 months, but data volume doubles every year!

×1000 in 10 years: same trend in astronomy
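The two doubling times can be compared directly. This short sketch takes the doubling periods quoted above at face value and computes the growth factors over a decade; the variable names are illustrative.

```python
years = 10

# Data volume doubles every year; processing capacity every 18 months.
data_growth = 2 ** (years / 1.0)
cpu_growth = 2 ** (years / 1.5)

# How much faster data grows than the capacity to process it.
gap = data_growth / cpu_growth

print(f"data volume x{data_growth:.0f}")
print(f"processing  x{cpu_growth:.0f}")
print(f"gap         x{gap:.0f}")
```

Data grows by about a factor of 1000 while processing grows by about 100, so after ten years each processor has roughly ten times more data to handle.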


Data management is a pillar of the project:

Telescope, Camera, Data Management, Outreach

“How do you turn petabytes of data into scientific knowledge?”
Kirk Borne (George Mason U.)

« The data volumes […] of LSST are so large that the limitation on our ability to do science isn't the ability to collect the data, it's the ability to understand […] the data »
Andrew Connolly (U. Washington)


LSST data flow

(Figure caption: ~1/1,000,000,000 of LSST data)

Camera: 189 CCDs (16 Mpix each) read in parallel
→ 3.2 Gpixels!
~6 GB / 17 seconds
→ 15 TB / night
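The per-exposure and per-night volumes can be estimated from the pixel count. The sketch below assumes 2 bytes per pixel and 10 hours of observing per night; neither figure is given on the slide, so treat both as illustrative.

```python
n_pixels = 3.2e9          # from the slide
bytes_per_pixel = 2       # assumption: 16-bit raw pixels
cadence_s = 17            # one image every 17 s (from the slide)
night_hours = 10          # assumption: usable observing time per night

exposure_gb = n_pixels * bytes_per_pixel / 1e9
exposures_per_night = night_hours * 3600 / cadence_s
night_tb = exposure_gb * exposures_per_night / 1e3

print(f"one exposure ~ {exposure_gb:.1f} GB")
print(f"one night    ~ {night_tb:.1f} TB")
```

Under these assumptions one exposure is about 6.4 GB and one night about 13.6 TB, the same order as the ~15 TB/night quoted above.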


LSST data flow

During 10 years!

~1000 visits per field
→ opens the time domain


LSST Data in short
● Huge data flow:
  ● Images: 2 × 6.4 GB / 39 seconds
  ● 15 TB/night
  ● 100 PB image archives
  ● 40 × 10⁹ objects (100-200 TB catalog)
  ● 5,000 × 10⁹ detections, "sources" (3-5 PB catalog)
  ● 32,000 × 10⁹ measurements, "forced sources" (1-2 PB catalog)
  ● Nightly transient alerts: > 2 × 10⁶
● Big data paradigm: acquire data first, analyze them later
● Astroinformatics paradigm: characterize first, analyze later
● Data analysis is NOT part of the project

(Figure: simulation of 1 CCD, 4k × 4k; arXiv:0909.3892)


Available data:


How to handle the problem
● Interdisciplinary field with statisticians and IT researchers
● Data access
  – Which DB model? (relational, graph, row-oriented, column-oriented, …)
  – How to parallelize access
● Data visualization
  – Interactive data exploration
  – Explore time domain, 3D
  – Explore new displays (screen walls, pads)
● Data mining
  – Machine learning: supervised/unsupervised
  – Sub-linear and approximate algorithms


Are LSST data « big »?

                        SDSS     LSST 1 yr (~2020)   LSST 10 yr (~2030)
Raw data                14 TB    6 PB                60 PB
Archive (tape)          -        19 PB               270 PB
Disk (DAC)              -        16 PB               90 PB
DB (baseline)           7 TB     0.5 PB              5 PB
Moore equivalent 2014   -        12 TB               1.2 TB

● 12 TB: >> usual sizes handled by a DBMS → Big
● ~6 h for a full scan at 600 MB/s
● 110 h to index 3 TB with MySQL
● System has to be distributed
● Moore's « law » isn't one ...

(Answer to Schlegel 2012)
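The scan and indexing times above translate into throughput figures with one line of arithmetic each. The sketch below uses decimal units (1 TB = 10¹² bytes), which is an assumption about how the slide counts.

```python
TB = 1e12   # decimal terabyte (assumption)
MB = 1e6

# Full sequential scan of a 12 TB database at 600 MB/s.
scan_hours = 12 * TB / (600 * MB) / 3600

# Effective throughput implied by "110 h to index 3 TB".
index_rate_mb_s = 3 * TB / (110 * 3600) / MB

print(f"full scan   ~ {scan_hours:.1f} h")
print(f"index build ~ {index_rate_mb_s:.1f} MB/s")
```

The scan takes about 5.6 h on a single stream, and the indexing figure corresponds to under 8 MB/s, which illustrates why the slide concludes that the system has to be distributed.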


Qserv: the LSST baseline

Unified user interface
− User input in SQL
− Distributed nature is hidden
− ~1000 nodes in parallel
− Fault-tolerant
− Commodity hardware

Partitioning:
● Geometry (cone searches)
● Light curves (Sources and Object together)

Limitations
● Patches are independent on the sky
● Result size
● Computation time
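To illustrate why geometric partitioning helps cone searches, here is a minimal sketch of a sky split into fixed 10° × 10° chunks. It is not Qserv's actual partitioning scheme; the chunk size, function names and table layout are all hypothetical. A cone search only opens the chunks its bounding box touches, then filters rows by exact angular separation.

```python
import math
from collections import defaultdict

CHUNK_DEG = 10.0                 # hypothetical chunk size
N_RA = int(360 / CHUNK_DEG)
N_DEC = int(180 / CHUNK_DEG)

def chunk_of(ra, dec):
    """Return the (ra_index, dec_index) chunk of a position in degrees."""
    i = int((ra % 360.0) // CHUNK_DEG) % N_RA
    j = min(int((dec + 90.0) // CHUNK_DEG), N_DEC - 1)
    return i, j

def separation(ra1, dec1, ra2, dec2):
    """Angular separation in degrees (haversine formula)."""
    r1, d1, r2, d2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((d2 - d1) / 2) ** 2
         + math.cos(d1) * math.cos(d2) * math.sin((r2 - r1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))

def chunks_for_cone(ra, dec, radius):
    """Chunks overlapping the bounding box of a cone (all in degrees)."""
    j_lo = chunk_of(0.0, max(dec - radius, -90.0))[1]
    j_hi = chunk_of(0.0, min(dec + radius, 90.0))[1]
    if abs(dec) + radius >= 90.0:
        ra_indices = set(range(N_RA))          # cone reaches a pole
    else:
        # RA half-width of a spherical cap: sin(half) = sin(r) / cos(dec)
        half = math.degrees(math.asin(
            math.sin(math.radians(radius)) / math.cos(math.radians(dec))))
        i_lo = math.floor((ra - half) / CHUNK_DEG)
        i_hi = math.floor((ra + half) / CHUNK_DEG)
        ra_indices = {i % N_RA for i in range(i_lo, i_hi + 1)}
    return [(i, j) for i in sorted(ra_indices) for j in range(j_lo, j_hi + 1)]

def cone_search(table, ra, dec, radius):
    """Names of objects within `radius` degrees of (ra, dec)."""
    hits = []
    for chunk in chunks_for_cone(ra, dec, radius):
        for name, r, d in table.get(chunk, []):
            if separation(ra, dec, r, d) <= radius:
                hits.append(name)
    return hits

# Toy catalog: each object is stored in the chunk that contains it.
table = defaultdict(list)
for name, r, d in [("A", 359.5, 0.2), ("B", 0.5, -0.3),
                   ("C", 180.0, 45.0), ("D", 1.0, 30.0)]:
    table[chunk_of(r, d)].append((name, r, d))

print(sorted(cone_search(table, 0.0, 0.0, 1.0)))
```

In this toy case a 1° cone at the origin opens 4 of the 648 chunks. Each chunk is processed independently, which matches the "patches are independent on the sky" limitation: cross-chunk operations need extra machinery.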


A new approach: map/reduce

Move the computation to the data
→ High-level computing skills needed

Qserv
● Still the classical approach
  − Select (extract) data
  − Run user algorithm
● Will fail when selectivity is low!

Map/Reduce
● Write the algorithm in parallel form
  − No data transfer
  − Use local CPU
● Not all tasks can be written in this form

(Diagram labels: DBMS, Computation, Query, Controller. Cartoon from PhD Comics)
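As a toy illustration of "writing the algorithm in parallel form", the sketch below computes a mean magnitude over data split across nodes. Each node reduces its own chunk to a small partial result (map), and only those partials travel to the controller (reduce). The node names and magnitudes are made up, and everything runs in one process here; a real framework would run `map_chunk` where the data lives.

```python
from functools import reduce

# Hypothetical magnitudes stored on three separate nodes.
chunks = {
    "node-1": [21.3, 22.1, 24.0],
    "node-2": [19.8, 23.5],
    "node-3": [25.1],
}

def map_chunk(mags):
    """Runs next to the data: summarize a chunk as (sum, count)."""
    return (sum(mags), len(mags))

def combine(a, b):
    """Runs on the controller: merge two partial results."""
    return (a[0] + b[0], a[1] + b[1])

partials = [map_chunk(mags) for mags in chunks.values()]
total, count = reduce(combine, partials, (0.0, 0))
mean_mag = total / count

print(f"mean magnitude over {count} objects: {mean_mag:.2f}")
```

The mean works because it decomposes into sums and counts. A median, for example, does not decompose this way, which is one instance of "not all tasks can be written in this form".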


Data mining

Astroinformatics point of view:

(Figure labels: Borne 2009; VO domain)


Which knowledge to extract?
● Classical problems in astronomy
  – Object classification
    ● Cluster significance? (statistical/scientific)
    ● Confusion problems
    ● Efficient algorithms for
      – Highly dimensional problems (> 1000 dimensions, > 10¹⁰ entries)
  – 2-point (or N-point) correlations
  – Rarity detection
    ● Rarity metric, efficient algorithms
    ● Discoveries? Anomalies (detector, software)
  – Dimensionality reduction
    ● Compact data representation
  – Measurement errors
    ● Usually neglected in existing approaches

(Credit: S. G. Djorgovski)


Classification examples

● Color-color plot classification

● Star-galaxy (+ morphology)
● Star/QSO
● ...

D. Bard


Classification as a process

● Whole range of classifiers (ANN, SVM, KNN, LDA, QDA, GMM ...)

● Optimum depends on the specific task

● ROC curve (receiver operating characteristic)

(Figure panels: KNN, SVM)

● Specific training required to be efficient
● Methods are not black boxes!
→ Astrostatistics
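For readers unfamiliar with ROC curves, here is a minimal pure-Python sketch of how one is built from classifier scores. The scores and labels are made up, the function names are illustrative, and the code assumes distinct scores and at least one object of each class.

```python
def roc_curve(scores, labels):
    """Return (fpr, tpr) lists, lowering the threshold one object at a time.

    labels are 1 (e.g. star) or 0 (e.g. galaxy); higher score = more star-like.
    Assumes distinct scores and both classes present.
    """
    ranked = sorted(zip(scores, labels), key=lambda p: -p[0])
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    tp = fp = 0
    fpr, tpr = [0.0], [0.0]
    for _, label in ranked:
        if label:
            tp += 1
        else:
            fp += 1
        fpr.append(fp / n_neg)
        tpr.append(tp / n_pos)
    return fpr, tpr

def auc(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((fpr[k] - fpr[k - 1]) * (tpr[k] + tpr[k - 1]) / 2
               for k in range(1, len(fpr)))

scores = [0.9, 0.8, 0.7, 0.6]
fpr, tpr = roc_curve(scores, [1, 0, 1, 0])
print("AUC =", auc(fpr, tpr))
```

An area of 1 means the classifier ranks every positive above every negative, and 0.5 corresponds to random ranking. Comparing such curves on the same labeled sample is one way to choose between, say, KNN and SVM for a given task.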


Rarity detection
● Transient searches dominated by noise
● Supernovae detection (http://arxiv.org/abs/1106.5491)
● 1000 detections for 1 event
● Millions of detections / night!

(Figure: PTF data)
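The 1-in-1000 ratio has a direct consequence for any classifier. The sketch below applies Bayes' rule with a hypothetical classifier that keeps 99% of real events and rejects 99% of bogus detections; those two rates are illustrative, not taken from the slides.

```python
def purity(prior, true_positive_rate, false_positive_rate):
    """Fraction of accepted detections that are real events (Bayes' rule)."""
    real = prior * true_positive_rate
    bogus = (1.0 - prior) * false_positive_rate
    return real / (real + bogus)

# 1 real event per 1000 detections (from the slide);
# 99% efficiency and 1% false-positive rate are assumptions.
sample_purity = purity(1 / 1000, 0.99, 0.01)

print(f"purity of the selected sample: {sample_purity:.1%}")
```

Even with that optimistic classifier, only about 9% of the selected candidates are real, so false positives still outnumber real events ten to one.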


Classifying transients:
● A real Big Data Mining issue


New ideas
● IT research comes with new approaches
  ● Tested in Big Data environments
  ● Yet to be applied (if it makes sense) to astronomy
● Integration of data mining and databases
  – Searching for relations between variables (functional dependencies)
    ● e.g. is (u,g,r,i,z) a predictor of 'is Star'?
      Works blindly: is (c1,c2,c5) predicting (c4)?
  – Skyline or Pareto front approach
    ● e.g. define a partial order on the data, then
      extract the extrema under this partial order
  – Graph-based searches (from text analysis)
    ● e.g. connect objects by their « likeness », then
      search for information contained in the « is alike » network
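The skyline idea can be shown in a few lines. The sketch below assumes the simplest partial order, where one point dominates another if it is at least as large in every coordinate and strictly larger in at least one; the skyline is the set of points nobody dominates. The naive O(n²) loop is for illustration only and would not scale to catalog sizes.

```python
def dominates(a, b):
    """True if point a is >= b in every coordinate and > in at least one."""
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))

def skyline(points):
    """Points not dominated by any other point (the Pareto front)."""
    return [p for p in points
            if not any(dominates(q, p) for q in points)]

# Hypothetical 2-parameter measurements.
points = [(1, 5), (2, 4), (3, 3), (2, 2), (1, 1)]
print(skyline(points))
```

Here (2, 2) and (1, 1) are dominated by (3, 3), while the other three points trade one coordinate against the other and all remain on the front.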


New ideas
● General statements about Big Data
  ● 1-pass (or few-pass) only algorithms
    – Read the data at most once whenever possible
  ● Sublinear approach:
    – Don't even read all the data
    – Approximate methods: degree of approximation is under control
    – Some can go down to log-log-N!
      ● e.g. compute an approximate number of non-empty bins in an N-dimensional histogram
    – Not all problems can be approximated
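Counting non-empty histogram bins is a distinct-count problem, and it can be approximated in one pass with bounded memory. The sketch below uses a k-minimum-values estimator, which is simpler than the LogLog family the "log-log-N" remark presumably refers to and uses more memory, but follows the same principle: hash every item and keep only a small summary. The function names and the choice k = 256 are illustrative.

```python
import hashlib
import heapq

def _hash01(item):
    """Deterministic hash of an item to a float in [0, 1)."""
    digest = hashlib.sha1(repr(item).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2 ** 64

def approx_distinct(stream, k=256):
    """One-pass estimate of the number of distinct items in `stream`.

    Keeps only the k smallest hash values seen. If n distinct items hash
    uniformly, the k-th smallest value is about k / n, so n ~ (k - 1) / v_k.
    Relative error is roughly 1 / sqrt(k).
    """
    heap = []       # negated values: heap[0] is minus the largest kept hash
    kept = set()
    for item in stream:
        h = _hash01(item)
        if h in kept:
            continue
        if len(heap) < k:
            heapq.heappush(heap, -h)
            kept.add(h)
        elif h < -heap[0]:
            evicted = -heapq.heappushpop(heap, -h)
            kept.discard(evicted)
            kept.add(h)
    if len(heap) < k:
        return len(heap)            # fewer than k distinct items: exact
    return int(round((k - 1) / -heap[0]))

# Each item is the bin index of one entry; 30,000 entries fall in 10,000 bins.
bins = [(i % 10000,) for i in range(30000)]
print(approx_distinct(bins))
```

Memory stays at k values regardless of how many entries are read, and the accuracy is tuned through k, which is what "degree of approximation is under control" means in practice.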


Conclusions

● LSST will provide unprecedented data
  ● Opens up the time domain
  ● … and a LOT of scientific opportunities
● Proper knowledge of how to use these data is needed
  ● Training of students
  ● IT cross-disciplinary field: astroinformatics
    – Annual conference
    – National initiatives
    – EU initiatives (COST BigSkyEarth)
● Data access has to be organized
  ● A European center is under study
    – Expressions of interest are welcome!


Thank you ...