J. Hernandez / A. Hutton ADASS XXIV, 5-9 October 2014, Calgary, Canada 1 Data Management Challenges in Gaia Jose Hernandez Alexander Hutton Gaia Science Operations Centre ESA/ESAC/SRE-OOO

Data Management Challenges in Gaia

Jose Hernandez

Alexander Hutton

Gaia Science Operations Centre

ESA/ESAC/SRE-OOO

Outline

• Gaia Observing strategy

• Data flow and Pipelines

• Data Challenges

• Data Tracking

• Tools

• Examples

Gaia Observing strategy

• Survey Mission at L2

• Scan the sky along great circles

• Accumulate the data on-board

• Download to Earth every night

• Full Sky Observed every 6 months

• Repeat it for at least 5 years => 10 Full Sky Maps

Gaia: Some numbers after 5 years

• 100 TB of raw data

• We expect to observe 10^9 Sources (could end up being 2 x 10^9)

• Spectra for 2 x 10^8 sources

• 80 Observations per source on average:

• 10^11 Astro/Photo Observations

• 2 x 10^10 Spectra
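The observation totals on this slide follow from multiplying the source counts by the average number of observations per source. A quick check, as a sketch only: the variable names are illustrative and the input figures are the ones quoted on the slide.

```python
# Back-of-the-envelope check of the five-year totals.
# Variable names are illustrative; the inputs are the slide's own figures.
sources = 10**9                # expected number of sources
spectro_sources = 2 * 10**8    # sources with spectra
obs_per_source = 80            # average observations per source

astro_photo_obs = sources * obs_per_source
spectra = spectro_sources * obs_per_source

print(f"{astro_photo_obs:.1e}")  # 8.0e+10, i.e. of order 10^11
print(f"{spectra:.1e}")          # 1.6e+10, i.e. roughly 2 x 10^10
```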

Operations

(Diagram labels: Launcher; Satellite; New Norcia; Cebreros; Mission Operation Centre (MOC), ESOC; Science Operation Centre (SOC), ESAC; Data Processing & Analysis Consortium (DPAC).)

Data flow

(Diagram labels: Launcher; Satellite; New Norcia; Cebreros; Malargüe; Mission Operation Centre (MOC), ESOC; Science Operation Centre (SOC), ESAC; Data Processing & Analysis Consortium (DPAC).)

Data flow

Figure Courtesy A. Brown, DPAC

Data Processing Cycles

(Diagram labels: MOC; <= 8.5 Mbit/s; SOC; Daily Pipelines; DPCs; MDB-00; MDB-01; MDB-02.)

Some Challenges

• Sheer number of Observations

• Ensuring No Data Loss

• Managing the Daily Data Flow

• Data Tracking

• DPCs Autonomous and Geographically Distributed

Tools: Data Modeling

• Single Data Model/ICD with DPCs

• MDB Dictionary Tool on-line:

• Keeps track of versions, changes,…

• Immediate visibility

• Automatic generation of DM classes, DB schema, Data Consumers…

• DM evolution controlled by CCB
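The idea of generating both DM classes and a DB schema from a single dictionary entry can be sketched as follows. This is only an illustration: the entry layout, the field names and the SQL type mapping are assumptions, since the talk does not describe the MDB Dictionary format.

```python
from dataclasses import make_dataclass

# Hypothetical dictionary entry: a table name plus (field name, type) pairs.
# The real MDB Dictionary format is not described in the talk.
DICTIONARY_ENTRY = {
    "name": "AstroObservation",
    "fields": [("solution_id", int), ("source_id", int),
               ("obmt_ns", int), ("flux", float)],
}

# Assumed mapping from Python types to SQL column types.
SQL_TYPES = {int: "BIGINT", float: "DOUBLE PRECISION", str: "TEXT"}


def make_dm_class(entry):
    """Build a data-model class from a dictionary entry."""
    return make_dataclass(entry["name"], entry["fields"])


def make_db_schema(entry):
    """Build a CREATE TABLE statement from the same entry."""
    cols = ", ".join(f"{name} {SQL_TYPES[typ]}" for name, typ in entry["fields"])
    return f"CREATE TABLE {entry['name']} ({cols});"


AstroObservation = make_dm_class(DICTIONARY_ENTRY)
schema = make_db_schema(DICTIONARY_ENTRY)
print(schema)
```

Because the class and the schema come from the same entry, a change to the dictionary propagates to both, which is the point of keeping a single data model.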

Data Management and Tracking

• All Data is tagged with a barcode

• Named “Solution Identifier”

• It is just a Long (64-bit) Number

• Each solutionId has some metadata

Data Tracking: solutionId

• Used to identify data

• Who generated the data, when, and where

• What SW version, environment, run number, at what time

• We also use it to manage the daily data flow

• Related data gets the same solutionId; this is a form of “data binning”
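A minimal sketch of how a solutionId could tie records to their metadata. The class names, field names and sample values are hypothetical, modelled on the bullets above; the talk does not show the actual implementation or how identifiers are allocated.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from itertools import count


@dataclass(frozen=True)
class SolutionMetadata:
    # Field names are assumptions modelled on the slide's bullets.
    producer: str        # who generated the data
    site: str            # where
    created: datetime    # when
    sw_version: str
    environment: str
    run_number: int


class SolutionRegistry:
    """Hands out 64-bit identifiers and remembers the metadata behind each."""

    MAX_ID = 2**63 - 1   # largest signed 64-bit value

    def __init__(self):
        self._next = count(1)
        self._metadata = {}

    def register(self, meta):
        solution_id = next(self._next)
        if solution_id > self.MAX_ID:
            raise OverflowError("solutionId no longer fits in 64 bits")
        self._metadata[solution_id] = meta
        return solution_id

    def lookup(self, solution_id):
        return self._metadata[solution_id]


registry = SolutionRegistry()
sid = registry.register(SolutionMetadata(
    producer="daily-pipeline", site="SOC",
    created=datetime(2014, 10, 5, tzinfo=timezone.utc),
    sw_version="1.0.0", environment="operations", run_number=42))

# Every record produced in this run carries the same id, so they share one bin.
records = [{"solution_id": sid, "payload": p} for p in ("a", "b", "c")]
```

Looking up `registry.lookup(sid)` then answers the who, when, where and which-software questions for every record tagged with that id.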

Data Tracking: solutionId

• Track Data Provenance

• Verify correct calibrations get used

• Find what was affected by incorrect data

• Remove incorrect data from the pipelines
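Finding what was affected by incorrect data amounts to walking a provenance graph downstream from the bad solutionId. The sketch below assumes a hypothetical table that records, for each solutionId, the solutionIds of its inputs; the identifiers are made up for illustration.

```python
from collections import deque

# Hypothetical provenance table: each solutionId maps to the solutionIds
# of the inputs (e.g. calibrations) it was produced from.
INPUTS = {
    200: [100],        # 200 was computed using calibration 100
    201: [101],
    300: [200, 201],
    400: [300],
}


def affected_by(bad_id, inputs):
    """Return every solutionId that directly or indirectly used bad_id."""
    # Invert the table so we can walk from an input to its consumers.
    consumers = {}
    for product, sources in inputs.items():
        for source in sources:
            consumers.setdefault(source, []).append(product)

    affected, queue = set(), deque([bad_id])
    while queue:
        current = queue.popleft()
        for product in consumers.get(current, []):
            if product not in affected:
                affected.add(product)
                queue.append(product)
    return affected


print(sorted(affected_by(100, INPUTS)))  # [200, 300, 400]
```

The returned set is the list of solutionIds whose data would need to be removed from the pipelines and regenerated.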

Data Integrity and Completeness

• Current Numbers:

• 10.4 x 10^9 Astro/Photo Observations

• 1.3 x 10^9 Spectra

• Received 6.3 TB of RAW Science Data

• 144 GB of HouseKeeping Data

• 21 TB Generated in the processing

• Typically the daily pipelines are writing thousands of objects per second

Data Integrity and Completeness

• Challenges:

• Ensuring there is no data leakage

• Data consistency and completeness

• Within the pipelines and with respect to the MDB

(Diagram labels: MOC; SOC MDB; DPCC; DPCB; DPCG; DPCI; DPCT.)

Time Data Binning

• All Gaia Data can be related to On Board Time, examples:

• At time x the source image crosses a CCD

• At time y Charge Injections occur

• Spacecraft attitude

• Use OBMT to collapse records of the same time together and count the number of Objects per bin
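Collapsing records by on-board time and counting objects per bin can be sketched with a counter keyed on the bin index. The record layout, the nanosecond unit and the one-second bin width are assumptions made for the example, not values given in the talk.

```python
from collections import Counter

BIN_WIDTH_NS = 1_000_000_000  # assumed bin width of one second, in nanoseconds


def bin_by_obmt(records, bin_width=BIN_WIDTH_NS):
    """Count records per OBMT bin; each record is a dict with an 'obmt' key."""
    return Counter(rec["obmt"] // bin_width for rec in records)


# Hypothetical records of different kinds, all carrying an on-board time.
records = [
    {"obmt": 500_000_000, "kind": "transit"},
    {"obmt": 900_000_000, "kind": "charge_injection"},
    {"obmt": 1_200_000_000, "kind": "attitude"},
    {"obmt": 3_100_000_000, "kind": "transit"},
]

timeline = bin_by_obmt(records)
print(sorted(timeline.items()))  # [(0, 2), (1, 1), (3, 1)]
```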

Time Data Binning

(Figure labels: Galactic Centre Crossings; Galaxy Tail Crossings; FOV-P; FOV-F.)

Time Data Binning

• Data Binning is done on the fly as the pipeline stores the data, with no overhead

• We can then compare the TimeLine data at different points

• We can also check Data Consistency

• All the checks can be automated and alarms raised if problems are found
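Comparing the binned timeline at two points in the data flow reduces to comparing per-bin counts and flagging any bin where they differ. A minimal sketch, with hypothetical counts and a plain printed message standing in for whatever alarm mechanism is actually used:

```python
from collections import Counter


def compare_timelines(upstream, downstream):
    """Return {bin: (upstream_count, downstream_count)} for bins that disagree."""
    mismatches = {}
    for time_bin in sorted(set(upstream) | set(downstream)):
        up, down = upstream.get(time_bin, 0), downstream.get(time_bin, 0)
        if up != down:
            mismatches[time_bin] = (up, down)
    return mismatches


# Hypothetical per-bin object counts at two points in the data flow.
received = Counter({0: 120, 1: 95, 2: 130})
stored = Counter({0: 120, 1: 90, 2: 130})

problems = compare_timelines(received, stored)
for time_bin, (up, down) in problems.items():
    print(f"ALARM: bin {time_bin} has {up} objects upstream but {down} downstream")
```

A bin that is missing on one side counts as zero there, so a completely lost time range shows up as a mismatch as well.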

Examples: Omega Centauri

Time Data Binning

(Figure labels: Galactic Plane Crossing (FOV-P); Omega Centauri Crossing.)

Omega Centauri observation

(Figure labels: 50 sec; 100,000 Observations.)

Questions?

(Image: NGC 1818 in the LMC)