July 2010 Cospar10 Bremen Slide 1
SDO Data Access and Distribution in Europe and the WisSDOm Data Centre in ROB, Brussels
Authors: David Boyes, Benjamin Mampaey, Cis Verbeeck, Veronique Delouille, Jean-François Hochedez
STCE + ROB
July 2010 Cospar10 Bremen Slide 2
What will be covered
• Where is the data and the access architecture for the users
• Some basic terms
• User access methods
– modules
– basic web access
– virtual observatories
– simplified web access
– pseudo files and other developments
• Interesting issues
– Retention
– Saved searches
– Evolving calibration
• Neat stuff to come
– Cutouts
– Helioviewer
– Grid integration
July 2010 Cospar10 Bremen Slide 3
Where is the data for the users
• Data is available from one or more data centre(s) - all are networked
• Some users are "close", some are "far" - distance matters
• All data is available somewhere
• Users can get data (an "export")
– from the nearest centre directly
– via the nearest centre from a remote centre
– directly from another centre
• Most of this is automatic
– you will see differences in e.g. delays
July 2010 Cospar10 Bremen Slide 4
How the data is accessed (a bit technical)
• the system is the netDRMS
– created by the JSOC at Stanford
• files are generated by content
• system holds data files + metadata
– SUMS + DRMS
• mediator is an "export" module
• makes your very own file
– FITS, tar of FITS etc.
• SQL etc. is hidden from user
July 2010 Cospar10 Bremen Slide 5
Access summary ...
• No files until you ask for them
• Data is referenced by content - provided as one or more files with whatever name you want
• The exported files are built using stored elements, so e.g. FITS with Rice compression is quite direct, as AIA data is stored internally in this format
• Can get anything but...
– you may as well ask for all metadata
– the files can be large - best not to ask for hundreds
July 2010 Cospar10 Bremen Slide 6
Some basic terms
• series
– basic collection of data items with shared properties
– by convention named <project>.<data>
– all series records share a metadata format (i.e. keywords)
• keywords
– FITS style keywords plus added metadata only keywords
– correspond to columns in the metadata (DRMS) database
• online means
– available from a disk at the site
– so offline means: not yet arrived/available, or deleted but can be fetched
• data format
– whatever is stored is native (FITS, JPEG2000), conversion is post-processing
– characterised by resolution, cadence (e.g. 4K x 4K at 10s, 1K x 1K at 90s)
– naturally can't do better, but can reduce by "cutouts" in time or space
• data records
– can be several items as a group (e.g. image + bad pixel map + alternative format)
– data is SUMS plus metadata, referenced by metadata tables (DRMS) - usually one to one
– each is self contained, for example cadence is not part of data
July 2010 Cospar10 Bremen Slide 7
Example series
• aia_test.lev1 AIA images 4Kx4K full disk full cadence
• aia_test.synoptic2 AIA images reduced to 1Kx1K full disk and 90s cadence
• hmi_test.M_45s magnetograms, 45s cadence
• hmi_test.v_45s dopplergrams, 45s cadence
• jpeg2K to come, browsing and forecasting
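A record selection combines a series name with bracketed key values, as in the `show_info` output on Slide 10 (for example `aia_test.synoptic2[2010-05-21T15:00:00.57Z][171]`). A minimal sketch of building such a string follows. The helper name `record_set` is hypothetical, and the sketch only covers the simple `series[key][key]` form seen in this presentation, not the full query syntax.

```python
def record_set(series, *keys):
    """Build a record-set string of the form series[key1][key2]...

    Hypothetical helper: only the simple bracketed form shown in the
    slides is handled; no validation of key values is attempted.
    """
    if "." not in series:
        # By convention a series is named <project>.<data>.
        raise ValueError("series is expected to look like <project>.<data>")
    return series + "".join(f"[{key}]" for key in keys)


# A synoptic AIA record at a given time and wavelength.
print(record_set("aia_test.synoptic2", "2010-05-21T15:00:00.57Z", 171))
# A whole series: no keys at all.
print(record_set("hmi_test.M_45s"))
```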
July 2010 Cospar10 Bremen Slide 8
User access methods
• Direct via “modules”
– on site of data centre
• Query based
– precursor to full data access
– checks a part of the data (metadata) without having to retrieve the very large part
• Indirect via network
– web/http based
– delivers data somewhere - maybe to fetch immediately or later
• Direct via wrapper
– on site e.g. IDL (Matlab on the way)
July 2010 Cospar10 Bremen Slide 9
A practical pause - limitations
• Sheer size of request - even if you have a 2TB USB stick, that's only 2 days
• Network speed - at about 200Mb/s it takes a day to get a day's worth
• Search/database speed - millions of records
• Raw data access/retrieval speed - the basic image data takes time to get from disk
• Retention time - you can get anything, but you will probably have to wait if you ask for a full day from 2 years ago that nobody else has ever used
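The size and network figures above can be checked with a little arithmetic. The sketch below assumes decimal units (1 TB = 10^12 bytes, 1 Mb/s = 10^6 bit/s) and a link that sustains its nominal rate; both function names are hypothetical. At a nominal 200 Mb/s a link carries at most about 2.16 TB per day, so the "day to get a day's worth" figure presumably allows for sustained throughput well below the nominal rate. That reading is an assumption, not something the slide states.

```python
SECONDS_PER_DAY = 86400


def transfer_days(volume_tb, rate_mbit_s):
    """Days needed to move volume_tb terabytes at rate_mbit_s megabits/s."""
    bits = volume_tb * 1e12 * 8
    seconds = bits / (rate_mbit_s * 1e6)
    return seconds / SECONDS_PER_DAY


def volume_per_day_tb(rate_mbit_s):
    """Terabytes moved in one day at a sustained rate_mbit_s megabits/s."""
    return rate_mbit_s * 1e6 * SECONDS_PER_DAY / 8 / 1e12


# Upper bound for one day at the nominal 200 Mb/s.
print(f"{volume_per_day_tb(200):.2f} TB per day")
# A 2 TB request at the same nominal rate.
print(f"{transfer_days(2.0, 200):.2f} days")
```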
July 2010 Cospar10 Bremen Slide 10
Access by : modules - the basic bricks
• At the data centres, for example
– show_series
– show_info
– jsoc_export_as_fits
[jdb@db1 ~]$ show_info -s ds=aia_test.synoptic2
First Record: aia_test.synoptic2[2010-05-21T15:00:00.57Z][171] is first of 6 records matching first keyword, Recnum = 1
Last Record: aia_test.synoptic2[2010-07-14T11:58:41.07Z][335] is first of 2 records matching first keyword, Recnum = 445376
Last Recnum: 445377
[jdb@db1 ~]$ show_series
aia_test.lev1
aia_test.synoptic2
drms.sites
hmi.doptest
hmi_test.m_45s
hmi_test.s_720s
lm_jps.lev1_test4k10s
[jdb@db1 ~]$ jsoc_export_as_fits reqid=REQ_FTP expversion=0.5 rsquery=aia_test.lev1[:#209866] path=tmp method=url protocol=FITS
'10552320' bytes exported.
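For scripted use on site, the export command from this slide can be assembled in Python. This is a minimal sketch: the argument names and default values are copied from the example command above and are not a complete or authoritative list of what `jsoc_export_as_fits` accepts. The wrapper functions `export_command` and `run_export` are hypothetical.

```python
import shutil
import subprocess


def export_command(rsquery, path="tmp", reqid="REQ_FTP", expversion="0.5",
                   method="url", protocol="FITS"):
    """Return the argument list for the export command shown on the slide.

    Defaults are the values from the slide's example, not documented defaults.
    """
    return [
        "jsoc_export_as_fits",
        f"reqid={reqid}",
        f"expversion={expversion}",
        f"rsquery={rsquery}",
        f"path={path}",
        f"method={method}",
        f"protocol={protocol}",
    ]


def run_export(rsquery, **options):
    """Run the export if the module is installed; return its output or None."""
    cmd = export_command(rsquery, **options)
    if shutil.which(cmd[0]) is None:
        return None  # not on a data-centre machine: nothing is run
    done = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return done.stdout


print(" ".join(export_command("aia_test.lev1[:#209866]")))
```

Passing the arguments as a list avoids shell interpretation of the square brackets in the record-set query.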
July 2010 Cospar10 Bremen Slide 11
Access by : basic web access
• System developed by JSOC : lookdata.html
• Online via JSOC web site, but heavily loaded
• Being tested at ROB
• Provides easy access to an overview of all the available data
• Formulating a selection query does require knowledge of query syntax
• Provides for a wide variety of data packaging
– normal user FITS or internal format (FITS with no keywords)
– via web for immediate or later access, as one or more individual files or as tar
– ROB working on fewer packaging options
July 2010 Cospar10 Bremen Slide 12
Access by : basic web access
July 2010 Cospar10 Bremen Slide 13
Access by : Virtual Observatories
• VSO
– development of existing VSO
– prototype for SDO running and definitive version in preparation
– http://sdac.virtualsolar.org/cgi/search
• Soteria
– demo provider made for ROB/USET, SDO provider being coded now
– http://soteria-space.eu/
• Uniform search paradigm
• Infrastructure hides efficient searches with complex syntax e.g. SQL in various flavours
July 2010 Cospar10 Bremen Slide 14
Access by : Soteria Virtual Observatory
• One part of an EU project
• Based on current web access technology
• The example is for the ROB USET telescope as a data provider; each SDO site will be able to act as a provider
July 2010 Cospar10 Bremen Slide 15
Access by : simplified web access
• Work in progress
• Offer limited to direct requests for tar files or individual FITS format files; front end for PFS
• Based on a simplified enquiry such as:
– aia.lev1 + time + period + cadence + wavelengths
• Preparation is actually more complex than basic access - for example it requires decisions as to what keys are useful for what series
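A simplified enquiry of the form series + time + period + cadence + wavelengths can be expanded into the individual samples it covers. The sketch below is illustrative only: the function name `expand_request` is hypothetical, and the five-hour window is an assumption chosen to match the PFS example later in the talk (160 file names, all AIA wavelengths, 15 min cadence).

```python
from datetime import datetime, timedelta


def expand_request(series, start, period, cadence, wavelengths):
    """Yield (series, time, wavelength) for every sample in the window.

    The window is half-open: start is included, start + period is not.
    """
    if cadence <= timedelta(0):
        raise ValueError("cadence must be positive")
    end = start + period
    when = start
    while when < end:
        for wavelength in wavelengths:
            yield series, when, wavelength
        when += cadence


# The eight wavelengths that appear in the PFS listing on Slide 17.
AIA_WAVELENGTHS = [94, 131, 171, 193, 211, 304, 335, 1600]

samples = list(expand_request(
    "aia_test.lev1",
    start=datetime(2010, 6, 17, 0, 0, 0),
    period=timedelta(hours=5),      # assumed window
    cadence=timedelta(minutes=15),
    wavelengths=AIA_WAVELENGTHS,
))
print(len(samples))  # 20 time steps x 8 wavelengths = 160
```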
July 2010 Cospar10 Bremen Slide 16
Access by : pseudo files (PFS)
• Systematically named files in a directory tree with no real files until you access them
• Typically based on query covering a much wider range than you really need (or could use)
• Real files kept in cache so further access very cheap
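The pseudo-file idea (names exist up front, content is created on first access and then served from cache) can be sketched in a few lines. This is not the ROB implementation: the class `PseudoFileStore` and its `fetch` callback are hypothetical, and the cache is an in-memory dictionary rather than a filesystem layer.

```python
class PseudoFileStore:
    """Names are listed immediately; content is fetched lazily and cached."""

    def __init__(self, names, fetch):
        self._names = set(names)
        self._fetch = fetch      # callable: name -> bytes (e.g. runs an export)
        self._cache = {}
        self.fetch_count = 0     # how many real exports were triggered

    def listdir(self):
        """Listing costs nothing: no real file is created."""
        return sorted(self._names)

    def read(self, name):
        """Return file content, exporting it only on first access."""
        if name not in self._names:
            raise FileNotFoundError(name)
        if name not in self._cache:
            self._cache[name] = self._fetch(name)
            self.fetch_count += 1
        return self._cache[name]


# Stand-in fetch function; a real one would call the export module.
store = PseudoFileStore(["a.fits", "b.fits"], fetch=lambda name: name.encode())
store.listdir()           # nothing fetched yet
store.read("a.fits")      # first access triggers the fetch
store.read("a.fits")      # second access is served from cache
print(store.fetch_count)  # 1
```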
July 2010 Cospar10 Bremen Slide 17
Access by : pseudo files (PFS)
mnt
`-- aia_test.lev1
    `-- 2010
        `-- 06
            `-- 17
                |-- H0000
                |   |-- AIA20100617_000000570000_0171.fits
                |   |-- AIA20100617_000003570000_0304.fits
                |   |-- AIA20100617_000009580000_94.fits
                |   |-- AIA20100617_000018570000_1600.fits
                |   |-- AIA20100617_000050070000_211.fits
                |   |-- AIA20100617_000053050000_335.fits
                |   |-- AIA20100617_000056100000_193.fits
                |   ......
                |   |-- AIA20100617_004505070000_335.fits
                |   |-- AIA20100617_004506570000_1600.fits
                |   |-- AIA20100617_004508070000_193.fits
                |   |-- AIA20100617_004509580000_94.fits
                |   `-- AIA20100617_004511070000_131.fits
                |-- H0100
                |   |-- AIA20100617_010000580000_0171.fits
                |   |-- AIA20100617_010002080000_211.f.......
                    |-- AIA20100617_043008060000_193.fits
                    |-- AIA20100617_043009550000_94.fits
                    |-- AIA20100617_043011090000_131.fits
                    |-- AIA20100617_043018580000_1600.fits
                    |-- AIA20100617_044500560000_0171.fits
                    |-- AIA20100617_044502050000_211.fits
                    |-- AIA20100617_044503570000_0304.fits
                    |-- AIA20100617_044505070000_335.fits
                    |-- AIA20100617_044506570000_1600.fits
                    |-- AIA20100617_044508070000_193.fits
                    |-- AIA20100617_044509580000_94.fits
                    `-- AIA20100617_044511070000_131.fits
9 directories, 160 files
• Example with 160 file names, all AIA wavelengths, 15min cadence
• In prototype at ROB, source downloadable
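Because the PFS names are systematic, the observation time and wavelength can be recovered from a file name alone. The sketch below assumes the pattern `AIA<YYYYMMDD>_<HHMMSS><6-digit fraction>_<wavelength>.fits`. That reading is inferred from the listing on this slide (for example `000000570000` read as 00:00:00.57) and is not a documented naming scheme. The function name `parse_pfs_name` is hypothetical.

```python
import re
from datetime import datetime

# Assumed layout: date, time of day, microseconds, wavelength.
_PFS_NAME = re.compile(r"^AIA(\d{8})_(\d{6})(\d{6})_(\d+)\.fits$")


def parse_pfs_name(name):
    """Return (observation datetime, wavelength) parsed from a PFS file name."""
    match = _PFS_NAME.match(name)
    if match is None:
        raise ValueError(f"not a recognised PFS file name: {name!r}")
    date, hms, micro, wavelength = match.groups()
    when = datetime(
        int(date[0:4]), int(date[4:6]), int(date[6:8]),
        int(hms[0:2]), int(hms[2:4]), int(hms[4:6]),
        int(micro),
    )
    return when, int(wavelength)


when, wavelength = parse_pfs_name("AIA20100617_000000570000_0171.fits")
print(when.isoformat(), wavelength)
```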
July 2010 Cospar10 Bremen Slide 18
Access by : useful methods in development
• Order and notify via e-mail for manual fetch
• Order and automatic delivery (e.g. sftp)
July 2010 Cospar10 Bremen Slide 19
Interesting issue - Retention
• All netDRMS sites have full information for selected series - their “subscribed” series
• But is it on line?
– sites keep the latest, but must selectively discard
• Enquiry modules can tell whether data is online, but what are the implications (delay...) if it is not?
• You can request it, but it can take some time to obtain
– for now quick, but after a year or so a record nobody has looked at will have to come from tape
July 2010 Cospar10 Bremen Slide 20
Interesting issue - Saved searches
• How to describe a selection of data
• Can save result as a record list for a reasonable number of records but this does not save the query
– save both query and result?
• For both your own use and publication
• Saved query might give different results (e.g. online only)
• Relates to the issue of calibration
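One way to address the open question above is to store the query and its result together, so that a later re-run can be compared with what was originally used. The sketch below is an assumption about how that might look, not an existing netDRMS feature. The JSON layout and the function names are hypothetical, and records are identified by the record numbers (Recnum) shown by `show_info`.

```python
import json
from datetime import datetime, timezone


def make_saved_search(query, recnums, online_only=False):
    """Bundle a query with the record numbers it returned and when."""
    return {
        "query": query,
        "online_only": online_only,
        "saved_at": datetime.now(timezone.utc).isoformat(),
        "recnums": sorted(recnums),
    }


def compare(saved, current_recnums):
    """Return (added, missing) record numbers relative to the saved result."""
    old = set(saved["recnums"])
    new = set(current_recnums)
    return sorted(new - old), sorted(old - new)


saved = make_saved_search("aia_test.synoptic2[2010-05-21T15:00:00.57Z][171]",
                          recnums=[1, 2, 3])
text = json.dumps(saved, indent=2)   # suitable for a file or a publication
restored = json.loads(text)

# Re-running later might give a different answer, e.g. if one record is
# no longer online and a new one has arrived.
added, missing = compare(restored, [2, 3, 4])
print(added, missing)  # [4] [1]
```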
July 2010 Cospar10 Bremen Slide 21
Interesting issue - Evolving calibration and which data did I use?
• More accurate calibration will be available as time goes on and more calibration points are acquired
• So the newest and best data can change
• For most users this is done by applying a calibration series e.g. via SolarSoft
• But there can also be metadata changes
• The raw data is unlikely to change
July 2010 Cospar10 Bremen Slide 22
Neat stuff to come - cutouts
• This is well on the way, again being developed by JSOC and LMSAL - for those who don't need the full 4Kx4K
• Very much reduced data storage requirements
• Closely related to event tracking and the HEK
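The storage saving from a spatial cutout is simple arithmetic on pixel counts. The sketch below uses plain Python lists to stay self-contained. The function names are hypothetical, and the calculation ignores compression and header overhead, so it is only a rough guide.

```python
def cutout(image, row0, col0, height, width):
    """Return a height x width sub-image starting at (row0, col0)."""
    return [row[col0:col0 + width] for row in image[row0:row0 + height]]


def storage_fraction(height, width, full=4096):
    """Fraction of the full-frame pixel count kept by a cutout."""
    return (height * width) / (full * full)


# A 512 x 512 region of a 4K x 4K image keeps 1/64 of the pixels.
print(storage_fraction(512, 512))  # 0.015625

# Tiny demonstration image: the value encodes row and column.
image = [[10 * r + c for c in range(4)] for r in range(4)]
print(cutout(image, 1, 1, 2, 2))   # [[11, 12], [21, 22]]
```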
July 2010 Cospar10 Bremen Slide 23
Neat stuff to come - Helioviewer
• www.helioviewer.org
• Existing project now being directed towards use with SDO data
• JPEG2000 based viewer with event marker overlay
• integration with JPEG2000 series
• rapid browsing with links to full data
• ROB is CoI in requested next stage
July 2010 Cospar10 Bremen Slide 24
Neat stuff to come - grid integration
• The data element size (tens of MB) is natural for use in a high performance grid
• The data is already geographically distributed - variety of access routes
• Distributed variety of resources - large clusters, pipelines, GPUs
• Sites are on high performance research networks
July 2010 Cospar10 Bremen Slide 25
Thanks to
• JSOC at Stanford
• LMSAL
• Belnet and Geant2 for networking
• The enthusiastic cooperation from the partner data centres
• Our sister institutes at the ROB site for hosting the data centre and infrastructure
July 2010 Cospar10 Bremen Slide 26
Web addresses
• The main source : JSOC at jsoc.stanford.edu
• HEK : www.lmsal.com/hek
• ROB : wissdom.oma.be
• SAO : www.cfa.harvard.edu/sao
• GDC : www.mps.mpg.de/projects/seismo/GDC-SDO
• UCLan : www.star.uclan.ac.uk
• IAS : idc-medoc.ias.u-psud.fr