Transcript
Page 1: Status of LOFAR Data in HDF5 Format · 2011-11-23 · IDL, VisIt, HDFView, h5py, PyTables, DAL (LOFAR), PyDAL(LOFAR) LOFAR has started work on the next generaon of its C++ Data Access

Abstract

TheLowFrequencyArray(LOFAR)projectissolvingthechallengeofdatasizeandcomplexityusingtheHierarchicalDataFormat,version5(HDF5).MostofLOFAR'sstandarddataproductswillbestoredusingtheHDF5format;theBeam‐Formed(akaPulsar)andTransientBufferBoard(TBB)TimeSeriesDatahavetransiQonedfromproject‐specificbinarytoHDF5format.Itsupportsanunlimitedvarietyofdatatypes,andisdesignedforflexibleandefficientI/Oandforhighvolumeandcomplexdata. HDF5isportable,extensible,andparallelizable,allowingapplicaQonstoevolveintheiruseofHDF5.WereportonourefforttopavethewaytowardsnewastronomicaldataencapsulaQonusingHDF5,whichcanbeusedbyfuturegroundandspaceprojects.TheLOFARprojecthasformedacollaboraQonwithNROA,theVOAandtheHDFGrouptoobtainfundingforafull‐QmestaffmembertoworkondocumenQnganddevelopingstandardsforastronomicaldatawri[eninHDF5.WehopeoureffortwillenhanceHDF5visibilityandusagewithinthecommunity,specificallyforSKAPathfindersandothermajornewradiotelescopessuchasLOFAR,EVLA,ALMA,ASKAP,MeerKAT,MWA,LWA,andeMERLIN.

TheLOFARRadioTelescope

WWW.LOFAR.ORG

LOFARDatainHDF5:SolvesVariety,Complexity&SizeIssuesTimeSeriesdatasharesimilarissuestoalldatasetsproducedbyLOFARobservaQons‐‐theyvarytremendouslyintype,sizeandcomplexity.InsteadofusingmanydifferentfileformatstoencapsulateLOFARdata,theHierarchicalDataFormat,version5(HDF5)waschosenasaviablesoluQonforpotenQallymassiveLOFARdataproducts.

TheLOFAR“Superterp”,Exloo,NL(Fig.1)

LOFARdatastaQsQcs:•  Dataratesupto8GB/sec•  Filesizes,10sto100sTB•  Datasetsrangingfrom1upto6dimensions

•  SingledatasetcanbespreadovermulQpledisks

•  DatawriQngmustbemulQ‐threadedandparallelizable

•  Mostastronomicaldatacontainers(CASAMSorFITS,or~30pulsardataformats)couldnotmeetallourneeds

• The“LOwFrequencyARray”(LOFAR)• Currently,thelargestradiotelescopeintheworld• 48staQons(40NL,8InternaQonal);completebyendof2011(Figure1onrightshowsthecenterstaBonsinNL)• Baselinesfrom1‐1500km;ulQmatelyachievesub‐arcsecondresoluQonovermuchoftheband• LowBandAntennabandpass:30‐80MHz• HighBandAntennabandpass:120‐240MHz• TotalRecordableBandwidth:48MHz;100MHzTBBs• DataCorrelaQon:IBMBlueGene/Psupercomputer,Groningen,NL(seephotoinFigure2onright)• Offlineprocessingclusterhas100nodes,eachwith:24cores,64GBRAM,21TBstorage• LongTermArchive(LTA)has:2.2PBdisk,5PBtape• Accessto22,600coresviaBigGridandJUROPA

LOFARBeam‐FormedTimeSeriesDataFormatSpecificaQonsinHDF5

1 DocumentsavailablefromtheLOFARWiki(“DataProducts”link),h<p://lus.lofar.org/wiki2 UVVisibilityandNearFieldImagingareunderdevelopment

EachLOFARdipoleantennahasaringbuffer,transientbufferboard,thatcanstoreupto1.3secondsofrawQmeseriesdatasampledevery5ns(LOFAR’shighestQmeresoluQon).LOFARactsasa“Qmemachine”–atransienteventtriggersthebuffereddatatobewri[entoHDF5,providingupto1.3secondsofhistoricalinformaQonatfulltemporalandspaQalresoluQon.

TBB’s are used to study: air showers producedby cosmic rays, lightning onEarthandotherplanetarybodieswithinourownsolarsystem,andunknown

sub‐secondQmescaleastronomicaltransients.

TransientBufferBoard(TBB)Data:look‐backintothe“past” Summary,CollaboraQonsAndFutureConsideraQons

HDF5‐friendlyTools:IDL,VisIt,HDFView, h5py,PyTables,DAL(LOFAR),PyDAL(LOFAR)

LOFARhasstartedworkonthenextgeneraQonof itsC++DataAccessLibrary(DAL+PyDAL),basedoncurrentQmeseriesdatausecasesandusepa[erns,withthegoalofspeed‐up,simplificaQonofuseandbe[erpythonbindings. WealsointendtoexpandtheDALforusewithaddiQonalLOFARdataformatsandtobeablewriteconvertersto/from FITS, the Casa TablesMeasurement Sets and HDF5. Improvements coming in2012!

LOFARhasteamedupwithNRAO,theIVOAandTheHDFGroupinrequesQngfundingfordefininganddevelopingastronomicaldatastandardsinHDF5,basedonLOFARsICDandDALgroundwork.Thiseffortisnicknamed“AstroHDF”.Futureworkalsoinvolvesdeveloping converts to/from HDF5/FITS and collaboraQons on interfaces inDS9 andVisItforHDF5(WCS‐compliant)astronomicaldata.

We believe that the HDF5 format is well‐suited for complex and large astronomicaldata.ForsomeofLOFAR’sdataformatchallenges,HDF5istheonlyviablesoluQon.InHDF5,LOFARcaneasilymap10sofTBofcomplex,6‐dimensionaldatastructureswithintricateheaderandlogginginformaQonthroughoutthedata‐files. WefeelthatHDF5canmeet thedata needs anddemandsof futureprojects such as LSST and the SKA,wheresizeandcomplexitywillbeevengreaterofanissuethanfacedbyLOFAR.

TheBlueGene/PsupercomputeratGroningen(Fig.2)

LOFARInterfaceControlDocuments1(ICDs)providedetaileddescripQonsofallexpectedLOFARDataProductsinHDF5format:

• Beam‐Formed(BF)Data[shownontheleF]• TransientBufferBoard(TBB)TimeSeries[bo<omleFofposter]• RadioSkyImageCubes• DynamicSpectrumData• RotaRonMeasure(RM)SynthesisCubes• UVVisibilityData2• Near‐fieldImages2

ElementBeam

Sub‐ArrayPoinRng

LOFARStaRons

Tied‐ArrayBeams

DATA(I)

TIME

FREQUENCY

DATA(Q)

TIME

FREQUENCY

DATA(U)

TIME

FREQUENCY

DATA(V)

TIME

FREQUENCY

BinaryDataContainer

HDF5Container:Header&DataStructureLayout

Header+Data==

HDF5“File”(BeamFormed)

LOFARDataAccessLibrary(DAL):C++andPythonDataI/O

LinuxCluster“Offline”KnownPulsarPipeline

ConvertersteptoberemovedwhenDALlayeriscompleted.

Data

BlueGene/P“Online”Tied‐ArrayBeamPipeline

FortheLOFARproject,adataproductisdefinedwithinthecontextofHDF5.HDF5allowsforstorage,notonlyofthedata, but for the associated and related meta‐datadescribingthedata’scontents,condiQonsofobservaQons,logs, etc.. As an “all‐in‐one” wrapper, the HDF5 formatsimplifiesthemanagementofcomplexdatasets.DescribesLayout

LOFAR Data Format ICDBeam-Formed DataDocument ID: LOFAR-USG-ICD-003.tex

ISSUE 2.04.08

SVN Repository Revision: 8333

A. Alexov, K. Anderson, L. Bahren, J.-M. Grießmeier, J.W.T. Hessels,

J.S. Masters, B.W. Stappers

SVN Date: 2011-09-02

Contents

Change record 3

1. Introduction 81.1. Purpose and Scope . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81.2. Context and Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81.3. Applicable documents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

2. Overview 9

3. Organization of the data 103.1. High level LOFAR BF file structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103.2. Overview of BF Groups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103.3. Hierarchical Structure of the BF H5 File . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13

4. Detailed Data Specification 144.1. The Root Group (ROOT) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14

4.1.1. Common LOFAR Attributes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 154.1.2. Additional Beam-Formed Root Attributes . . . . . . . . . . . . . . . . . . . . . . . . . 18

4.2. The System Logs Group (SYS LOG) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 184.3. The Sub-Array Pointing Group (SUB ARRAY POINTING {NNN}) . . . . . . . . . . . . . . . . . . 184.4. The Processing History Group (PROCESS HISTORY) . . . . . . . . . . . . . . . . . . . . . . . . 184.5. The Beam Group (BEAM {NNN}) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 224.6. The Coordinates Group (COORDINATES) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24

4.6.1. The Time coordinate . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 244.6.2. The Spectral coordinate . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25

4.7. The Stokes Datasets (STOKES {N}) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 274.8. The Subband/Channel tables/arrays . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 274.9. TiedArray Observing Sub-Modes & Data Format . . . . . . . . . . . . . . . . . . . . . . . . . 28

4.9.1. Standard Observing Mode for TiedArray . . . . . . . . . . . . . . . . . . . . . . . . . 284.9.2. Sub-Observing Modes for TiedArray . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28

1

WhyuseHDF5?•  FormatformanagingandstoringlargeandcomplexscienQficdata•  Unlimitedvarietyofdatatypes•  NoinherentfilesizelimitaQons•  ParallelreadingandwriQng(MPI)•  Runsonmassivelyparallel/distributedsystems•  Built‐incompression•  LibraryAPIinC/C++/Java/Fortran•  Self‐describingandportable•  Free&has20+yearhistory@NASA

DefinesSWSpecificaQon

StatusofLOFARDatainHDF5FormatAnastasiaAlexov1,2,PimSchellart3,SanderterVeen3andtheLOFARICDGroup

1:UniversityofAmsterdam(UvA)[[email protected]],2:NetherlandsInsQtuteforRadioAstronomy(ASTRON);3:RadboudUniversityNijmegen

StaQonCorrelator

TransientBufferBoardRingBuffer,1.3s

LOFARantennasignal

LOFARCorrelator

ImagesBeam‐FormedData(HDF5)

TimeSeriesDataWriter(HDF5)

PipelineSoxware(HDF5)

A/DConverter

48MHz

100MHz

LOFARDataAccessLibrary(DAL):C++andPythonDataI/O

LOFARTBBDataFlow

Hearmoreat:HDF5BoFWedNov9th@17:15

LOFARprojectisdevelopingtheC++DataAccessLibrary(DAL),whichprovidesfullscopeconstructorsforcreaQngandaccessingLOFARdataproducts,usingtheICDspecificaQons.TBBandBeam‐Formedhavebeenimplemented.

CrabGiantPulse(@center),asseenbytheTBBboardsat0.015secsamplingBme

CollaboraQvemailinglistinfo:nextgen‐[email protected]:[email protected]/messagebody“subscribednextgen‐astrodata”

TBB HDF5 Data Stats: •  100 MHz of bandwidth •  1.3 sec of look-back time •  40MB - 48GB dataset/station •  2GB - 2.25TB dataset sizes •  Data maps easily to HDF5 •  2012 Upgrade: 5 sec look-back •  2012 Upgrade: 8.6 TB dataset

AST

RA

U

NIJ

E

NUI

R

O

M

T

T

D

EN

HY

SIC

S

DBO

D

VER

GN

EM

PEA

S TYI

P

R

FO

www.hdfgroup.org

Hearmoreat:HDF5BoF

WedNov9th@17:15

Recommended