J. Hernandez / A. Hutton, ADASS XXIV, 5-9 October 2014, Calgary, Canada
Data Management Challenges in Gaia
Jose Hernandez
Alexander Hutton
Gaia Science Operations Centre
ESA/ESAC/SRE-OOO
Outline
• Gaia Observing strategy
• Data flow and Pipelines
• Data Challenges
• Data Tracking
• Tools
• Examples
Gaia Observing strategy
• Survey Mission at L2
• Scan the sky along great circles
• Accumulate the data on board
• Download to Earth every night
• Full Sky Observed every 6 months
• Repeat it for at least 5 years => 10 Full Sky Maps
Gaia: Some numbers after 5 years
• 100 Tb of raw data
• We expect to observe 10^9 Sources (could end up being 2x10^9)
• Spectra for 2x10^8 sources
• 80 Observations per source on average:
• 10^11 Astro/Photo Observations
• 2x10^10 Spectra
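As a rough consistency check on the figures above, the observation counts follow from multiplying the source counts by the average number of observations per source. The sketch below only redoes that arithmetic; the variable names are illustrative, and applying the same average of 80 to the spectra is an assumption read off the slide's layout.

```python
# Back-of-the-envelope check; all input values are taken from the slide.
sources = 10**9            # expected number of sources
spectra_sources = 2 * 10**8  # sources for which spectra are expected
obs_per_source = 80        # average observations per source (assumed to apply to spectra too)

astro_photo_obs = sources * obs_per_source
spectra = spectra_sources * obs_per_source

print(f"Astro/Photo observations: {astro_photo_obs:.1e}")
print(f"Spectra: {spectra:.1e}")
```

The products, 8x10^10 and 1.6x10^10, agree with the rounded 10^11 and 2x10^10 quoted on the slide.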
Operations
[Diagram: ground stations (New Norcia, Cebreros); Mission Operation Centre (MOC), ESOC; Science Operation Centre (SOC), ESAC; Launcher; Satellite; Data Processing & Analysis Consortium (DPAC)]
Data flow
[Diagram: ground stations (New Norcia, Cebreros, Malargüe); Mission Operation Centre (MOC), ESOC; Science Operation Centre (SOC), ESAC; Launcher; Satellite; Data Processing & Analysis Consortium (DPAC)]
Data flow
Figure Courtesy A. Brown, DPAC
Data Processing Cycles
[Diagram labels: MOC; <=8.5 Mbit/s; SOC; Daily Pipelines; DPCs; MDB-00, MDB-01, MDB-02]
Some Challenges
• Sheer number of Observations
• Ensuring No Data Loss
• Managing the Daily Data Flow
• Data Tracking
• DPCs Autonomous and Geographically Distributed
Tools: Data Modeling
• Single Data Model/ICD with DPCs
• MDB Dictionary Tool on-line:
• Keeps track of versions, changes,…
• Immediate visibility
• Automatic generation of DM classes, DB schema, Data Consumers…
• DM evolution controlled by CCB
Data Management and Tracking
• All data is tagged with a barcode
• Named “Solution Identifier”
• It is just a long (64-bit) number
• Each solutionId has some metadata
Data Tracking: solutionId
• Used to identify data
• Who generated the data, when and where
• What SW version, environment, run number, at what time
• We also use it to manage the daily data flow
• Related data gets the same solutionId; this is a form of “data binning”
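The idea of a 64-bit solutionId that carries metadata and groups related data can be sketched as a small registry. This is a minimal illustration, not the Gaia implementation: the class names, the metadata fields (chosen to mirror the bullets above), and the example values are all hypothetical, and the slides do not say how the identifier is actually generated or stored.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class SolutionMetadata:
    """Hypothetical metadata attached to one solutionId."""
    producer: str           # who generated the data
    location: str           # where it was generated
    software_version: str   # what SW version produced it
    environment: str
    run_number: int
    created_at: datetime    # when it was generated


class SolutionRegistry:
    """Maps each solutionId to its metadata and to the records tagged with it."""

    def __init__(self):
        self._metadata = {}
        self._records = defaultdict(list)

    def register(self, solution_id, metadata):
        # The slides describe the identifier as a 64-bit long.
        if not -2**63 <= solution_id < 2**63:
            raise ValueError("solutionId must fit in a signed 64-bit integer")
        self._metadata[solution_id] = metadata

    def store(self, solution_id, record):
        # Refuse data whose tag has no known metadata.
        if solution_id not in self._metadata:
            raise KeyError(f"unknown solutionId {solution_id}")
        self._records[solution_id].append(record)

    def metadata(self, solution_id):
        return self._metadata[solution_id]

    def records(self, solution_id):
        # Related data shares a solutionId, so this returns one "bin".
        return list(self._records.get(solution_id, []))


registry = SolutionRegistry()
registry.register(
    1001,
    SolutionMetadata(
        producer="example-pipeline",
        location="example-site",
        software_version="1.0.0",
        environment="operations",
        run_number=42,
        created_at=datetime(2014, 10, 5, tzinfo=timezone.utc),
    ),
)
registry.store(1001, {"kind": "astro", "value": 1.5})
registry.store(1001, {"kind": "photo", "value": 2.5})
```

Because every record carries the tag, answering "who produced this, with what software, and what else belongs with it" becomes a single lookup on the solutionId.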
Data Tracking: solutionId
• Track Data Provenance
• Verify correct calibrations get used
• Find what was affected by incorrect data
• Remove incorrect data from the pipelines
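One way to picture "find what was affected by incorrect data" is a provenance graph keyed by solutionId, walked downstream from the faulty identifier. The sketch below assumes each output records the solutionIds of its inputs; the `ProvenanceGraph` class and the identifiers are hypothetical and only illustrate the traversal.

```python
from collections import defaultdict, deque


class ProvenanceGraph:
    """Records which solutionIds were derived from which inputs."""

    def __init__(self):
        # input solutionId -> set of output solutionIds that consumed it
        self._consumers = defaultdict(set)

    def record(self, output_id, input_ids):
        for input_id in input_ids:
            self._consumers[input_id].add(output_id)

    def affected_by(self, bad_id):
        """Return every solutionId derived, directly or indirectly, from bad_id."""
        affected = set()
        queue = deque([bad_id])
        while queue:
            current = queue.popleft()
            for consumer in self._consumers.get(current, ()):
                if consumer not in affected:
                    affected.add(consumer)
                    queue.append(consumer)
        return affected


graph = ProvenanceGraph()
graph.record(output_id=200, input_ids=[100, 101])  # e.g. a product built from two inputs
graph.record(output_id=300, input_ids=[200])
graph.record(output_id=301, input_ids=[101])

to_remove = graph.affected_by(100)
print(sorted(to_remove))
```

The returned set is the list of solutionIds whose data would need to be removed from the pipelines or reprocessed.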
Data Integrity and Completeness
• Current Numbers:
• 10.4x10^9 Astro/Photo Observations
• 1.3x10^9 Spectra
• Received 6.3 Tb RAW Science Data
• 144 Gb of HouseKeeping Data
• 21 Tb Generated in the processing
• Typically the daily pipelines write thousands of objects per second
Data Integrity and Completeness
• Challenges:
• Ensuring there are no data leakages
• Data consistency and completeness
• Within the pipelines and with respect to the MDB
[Diagram: MOC; SOC MDB; DPCs (DPCC, DPCB, DPCG, DPCI, DPCT)]
Time Data Binning
• All Gaia Data can be related to On Board Time, examples:
• At time x the source image crosses a CCD
• At time y Charge Injections occur
• Spacecraft attitude
• Use OBMT to collapse records of the same time together and count the number of Objects per bin
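Collapsing records into time bins and counting them can be sketched with a simple counter. The function name, the bin width, and the sample OBMT values below are illustrative assumptions; the slides do not state the units or the bin size used in practice.

```python
from collections import Counter


def bin_by_obmt(obmt_values, bin_width):
    """Count how many records fall into each OBMT bin of the given width."""
    counts = Counter()
    for obmt in obmt_values:
        counts[obmt // bin_width] += 1  # integer bin index
    return counts


# Hypothetical on-board times, in arbitrary integer units.
obmts = [3, 7, 12, 14, 18, 25]
timeline = bin_by_obmt(obmts, bin_width=10)

for bin_index in sorted(timeline):
    print(bin_index, timeline[bin_index])
```

The resulting per-bin counts form a compact timeline: dense regions of the sky show up as peaks, and missing data shows up as unexpectedly low or empty bins.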
Time Data Binning
[Plot annotations: Galactic Centre Crossings; Galaxy Tail Crossings; FOV-P; FOV-F]
Time Data Binning
• Data binning is done on the fly as the pipeline stores the data, so it adds no overhead
• We can then compare the timeline data at different points
• We can also check data consistency
• All the checks can be automated, with alarms raised if problems are found
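The combination of on-the-fly binning and automated comparison can be sketched as follows. This assumes each storage point keeps its own per-bin counter that is updated on every write; `BinningStore`, `compare_timelines`, and the sample data are hypothetical names and values used only to illustrate the check.

```python
from collections import Counter


class BinningStore:
    """Hypothetical store that updates its timeline as each record is written."""

    def __init__(self, bin_width):
        self.bin_width = bin_width
        self.records = []
        self.timeline = Counter()

    def write(self, obmt, payload):
        self.records.append((obmt, payload))
        # Binning happens as part of the write, with no separate pass over the data.
        self.timeline[obmt // self.bin_width] += 1


def compare_timelines(upstream, downstream):
    """Return {bin: (upstream_count, downstream_count)} for every bin that differs."""
    mismatches = {}
    for bin_index in sorted(set(upstream) | set(downstream)):
        up = upstream.get(bin_index, 0)
        down = downstream.get(bin_index, 0)
        if up != down:
            mismatches[bin_index] = (up, down)
    return mismatches


pipeline = BinningStore(bin_width=10)
mdb = BinningStore(bin_width=10)

for obmt in [3, 7, 12, 14, 25]:
    pipeline.write(obmt, "observation")
for obmt in [3, 7, 12, 25]:  # one record went missing on the way
    mdb.write(obmt, "observation")

problems = compare_timelines(pipeline.timeline, mdb.timeline)
if problems:
    print("ALARM: timeline mismatch in bins", problems)
```

Comparing the small per-bin counts, rather than the records themselves, is what makes it cheap to run such checks routinely between different points in the data flow.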
Time Data Binning
[Plot annotations: Galactic Plane Crossing (FOV-P); Omega Centauri Crossing]
Omega Centauri observation
50 sec
100,000 Observations