BigData for the BigBang: How large research infrastructures help shape data-driven science
Dr. Marco de Vos ([email protected]), Managing Director of ASTRON & Lector Computer Science at HIT
Bubble around central black hole in M87. LOFAR image © de Gasperin et al., 2012

Big Data for the Big Bang

  • How Science Works

    Observation, Model, Testable Hypothesis, Experiment Design, Research Question

  • WSRT image, Oosterloo et al., 2010

  • SKA as an example

    13 Tbit/s (all antennas)
    150 Gbit/s (all stations)
    10 Tbyte/day
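To put the three figures on this slide on a common footing, the short calculation below converts the two streaming rates to bytes per day and compares them with the daily volume. It is a sketch under stated assumptions: decimal prefixes (1 Tbit = 10^12 bit), and a reading of the three numbers as successive stages of one data path, which the slide does not state explicitly. All variable names are illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def bits_per_second_to_bytes_per_day(bits_per_second):
    """Convert a sustained bit rate to the number of bytes produced in one day."""
    return bits_per_second / 8 * SECONDS_PER_DAY


# Figures quoted on the slide (decimal prefixes assumed).
antenna_rate_bps = 13e12      # 13 Tbit/s, all antennas
station_rate_bps = 150e9      # 150 Gbit/s, all stations
daily_volume_bytes = 10e12    # 10 Tbyte/day

antenna_bytes_per_day = bits_per_second_to_bytes_per_day(antenna_rate_bps)
station_bytes_per_day = bits_per_second_to_bytes_per_day(station_rate_bps)

# Assumption: each stage feeds the next, so the ratios are reduction factors.
antenna_to_station = antenna_rate_bps / station_rate_bps
station_to_daily = station_bytes_per_day / daily_volume_bytes

print(f"all antennas: {antenna_bytes_per_day / 1e15:.1f} PB/day")
print(f"all stations: {station_bytes_per_day / 1e15:.2f} PB/day")
print(f"antennas -> stations reduction: about {antenna_to_station:.0f}x")
print(f"stations -> daily volume reduction: about {station_to_daily:.0f}x")
```

Under these assumptions the antenna stream corresponds to roughly 140 PB/day and the station stream to about 1.6 PB/day, so most of the data reduction has to happen before anything is stored.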

  • What is the Square Kilometre Array (SKA)?

    25 250 PB/day
    0.8 3 PB/s

  • [Figure: data scale on a logarithmic axis (10^3, 10^6, 10^9, 10^12, 10^15, 10^18), compared with three sets of analogies.]

    Books: half page; decent book (5 cm); small library (50 meters of books); large library (50 km of books); Library of Congress; books around Earth's equator; books 1/3 of the Earth-Sun distance.

    Universe Lifetime (years): half a minute; 8 hours; ~1 year; ~1000 years; ~1 million years; ~1 billion years *)

    HDTV: 10 minutes full HDTV; one week full HDTV; 20 years full HDTV; 20000 years full HDTV **)

    *) = 140 books per human on earth.
    **) ~ 2 minutes of HDTV per human on earth.

  • Data-centric Architectures

    Old Compute-centric Model: data lives on disk and tape; move data to CPU as needed; deep storage hierarchy.

    New Data-centric Model: data lives in persistent memory; many CPUs surround and use; shallow/flat storage hierarchy.

    Massive Parallelism, Persistent Memory

    Largest change in system architecture since the System 360. Huge impact on hardware, systems software, and application design.

    Flash, Phase Change, Manycore, FPGA

  • Power play (source: ExaScale Computing Study, 2008)

  • Uncalibrated image

  • Calibrated image

  • Wijnholds & van der Veen, 2009: Calibration of short-baseline effects

    Noise coupling, receiver noise, extended emission

    Solved by describing these effects in a non-diagonal noise covariance matrix with non-zero entries on short baselines.

  • Alternating between estimation of correlated noise parameters and calibration parameters.

    Statistically and computationally efficient (860 params solved in 0.4 s on a single 2.4 GHz core)
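The alternation described on this slide can be illustrated with a deliberately simplified toy problem. The sketch below is not the algorithm of Wijnholds & van der Veen: it is unweighted, and it assumes a single point source with a known unit-modulus response and noise-free covariance data. The function name `alternating_calibration`, the `mask` layout (diagonal plus nearest-neighbour baselines), and all test values are hypothetical.

```python
import numpy as np


def alternating_calibration(R, model, mask, n_iter=200):
    """Alternate between a gain update and a masked noise-covariance update.

    R     : measured covariance matrix (n x n, Hermitian)
    model : known unit-modulus source response per antenna (length n);
            assumption: the sky contributes (g * model)(g * model)^H
    mask  : boolean n x n, True where correlated noise may be non-zero
    """
    noise = np.zeros_like(R)
    for _ in range(n_iter):
        # Calibration step: gains from the dominant eigenpair of R - noise.
        eigvals, eigvecs = np.linalg.eigh(R - noise)
        v = np.sqrt(max(eigvals[-1], 0.0)) * eigvecs[:, -1]
        gains = v / model
        # Noise step: keep the model residual on the masked entries only.
        noise = np.where(mask, R - np.outer(v, v.conj()), 0.0)
    return gains, noise


def align(g):
    """Remove the common phase, which covariance data cannot determine."""
    return g * np.exp(-1j * np.angle(g[0]))


# Synthetic, noise-free test case (all values are illustrative).
rng = np.random.default_rng(0)
n = 16
idx = np.arange(n)
mask = np.abs(idx[:, None] - idx[None, :]) <= 1  # diagonal + shortest baselines
true_gains = (1 + 0.1 * rng.standard_normal(n)) * np.exp(1j * rng.uniform(-0.5, 0.5, n))
model = np.ones(n, dtype=complex)
coupling = 0.3 * (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n)))
true_noise = np.where(mask, (coupling + coupling.conj().T) / 2, 0.0) + np.eye(n)
R = np.outer(true_gains * model, (true_gains * model).conj()) + true_noise

gains, noise = alternating_calibration(R, model, mask)
gain_error = np.max(np.abs(align(gains) - align(true_gains)))
noise_error = np.max(np.abs(noise - true_noise))
print(f"max gain error: {gain_error:.2e}, max noise error: {noise_error:.2e}")
```

The point of the sketch is the structure of the loop: each pass fixes one set of parameters while re-estimating the other, and the mask restricts the noise model to the short baselines so that the long baselines pin down the gains.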

  • Data Stewardship

  • How eScience works

    System Model, Environment Model, Instrument Model, Sensor Data, Predict (simulated data), Solve, System being monitored, Control
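The labels on this slide suggest a loop in which models predict simulated data, the prediction is compared with the sensor data, and a solver updates the model. The sketch below shows one way such a predict-and-solve loop could look. It assumes a linear instrument model and noise-free sensor data; the names `design`, `predict`, and `solve` are hypothetical and do not come from the talk.

```python
import numpy as np


def predict(params, design):
    """Simulate sensor data from the current model parameters (linear model assumed)."""
    return design @ params


def solve(params, sensor_data, design):
    """Update the parameters from the mismatch between prediction and sensor data."""
    residual = sensor_data - predict(params, design)
    correction, *_ = np.linalg.lstsq(design, residual, rcond=None)
    return params + correction


# Illustrative setup: 50 sensor readings that depend on 3 model parameters.
rng = np.random.default_rng(1)
design = rng.standard_normal((50, 3))
true_params = np.array([1.0, -2.0, 0.5])
sensor_data = design @ true_params  # stands in for the system being monitored

params = np.zeros(3)  # initial model
for _ in range(3):
    params = solve(params, sensor_data, design)

print(params)
```

For a linear model a single solve step already recovers the parameters; the loop is kept only to mirror the repeated predict, compare, and solve cycle that a non-linear model would need.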

  • Astronomical research is critically dependent on ever more accurate observations. Radio-telescopes thus not only become increasingly big and complex, but also generate an ever growing stream of data. Classical ICT architectures are no longer sufficient to handle this data-flood. In this presentation we will examine data-centric architectures, machine learning and data-stewardship. In these areas, research for the Big-Bang will explore ways to handle Big Data in a variety of scientific disciplines.

    ASTRON/LOFAR 2004 - Reproduction in whole or in part is prohibited without written consent of the author.

  • I1: Galaxies, what are they? / I2: How science works

    A1: Big Questions / A2: Big Telescopes / A3: Big Data

    P1: Data Centric Models / P2: Machine Learning / P3: Data Stewardship

    CC: How does eScience work? / MS: To boldly go

  • Hubble, 1923

  • (i) Understanding the history and role of neutral Hydrogen in the Universe from the dark ages to the present-day,

    (ii) Detecting and timing binary pulsars and spin-stable millisecond pulsars in order to test theories of gravity (including General Relativity and quantum gravity), to discover gravitational waves from cosmological sources, and to determine the equation of state of nuclear matter and

    (iii) Detecting and imaging radio continuum emission from galaxies and active galactic nuclei to trace the evolution of galaxies, black holes, star formation and magnetism from the dawn of galaxies to the present era.

    Addressing the themes of Origins and Fundamental Physics, these main goals are supplemented with the theme of Discovery. In particular, the advances that SKA1 represents in terms of its sensitivity, survey speed, time domain sampling and image fidelity will open up new regions of discovery space.

  • Case 1: SKA much bigger than LOFAR

    14 Exabytes/day raw data

  • 1. Data-stewardship requires balancing of equipment (telescopes) and eInfrastructure beforehand