Importance of accessible, well-documented, and scientifically relevant data: what we learned from UARS and Aura (Mark Schoeberl, STC)

  • Slide 1
  • Importance of accessible, well-documented, and scientifically relevant data: what we learned from UARS and Aura. Mark Schoeberl, STC
  • Slide 2
  • Why should we be motivated to provide accessible, well-documented data?
      • Encourages instrument data analysis
      • Leads to scientific improvement of the data
      • Leads to new science discoveries
      • Leads to new observational missions
      • Leads to improvements in current data sets through inter-comparison
      • Increases our general knowledge of the Earth System
  • Slide 3
  • Experience with Two Missions: UARS (1991-2005) and Aura (2004-present)
  • Slide 4
  • What was done with UARS
      • UARS data was controlled by a strict protocol. Only the selected PIs could use the data, and the protocol covered several years after launch. This upset and angered the community (more on the next slide).
      • UARS data was released very slowly and reluctantly.
      • Data was archived on a UARS machine (a VAX) with limited tools. The same machine was used to process the data as well as distribute it. By today's standards the data set was tiny.
      • Data documentation was not emphasized. Descriptions were finally made available through a special JGR issue that came out many years after launch. If you had problems with the data, you had to call the PI.
      • The protocol was dissolved a couple of years after launch and the data was generally distributed. It is now available through the Goddard DISC.
      • Validation: not much except for balloons. UARS PIs generally did not talk to aircraft or ground people.
      • Archiving: when UARS was decommissioned, we archived L1 and L2 data.
  • Slide 5
  • UARS: To release (data) or not to release, that is the question.
      • Releasing the data as soon as possible
          • Improves the data
          • Publicizes the data
      • But releasing the data too soon
          • Gives the team a bad reputation
          • Can produce bad science
      • Deciding when to release is difficult. You can't wait forever: you lose community and sponsor support, yet you want to release good data for your own reputation.
      • My general rule: if the data is good enough to do some science, then release!
  • Slide 6
  • What was done with Aura
      • Data was released within 6 months, and some within a few weeks.
      • Data was produced in a common format. A data users group was formed to create common format guidelines, etc.
      • Validation
          • Produced an extensive validation plan
          • The Big Book of Aura, which contained technical details on the instruments and algorithms for validators
          • Instrument teams participated in validation activity (especially aircraft campaigns) to get familiar with the validation data and techniques, and also to help the validators understand the satellite data.
          • Provided a validation center (AVDC) that could segment the commissioning phase data over validation sites. Validators were required to share their data before they could get access to Aura data. AVDC also archived team meeting presentations and other documents.
      • Documentation
          • Insisted on extensive documentation of products, including accuracy and precision
      • Access
          • Products had to be available at the end of the commissioning phase.
          • NASA data went mostly to the Goddard DISC (except TES).
          • Access to data was controlled by the DISC and NASA policy: free access to all data sets.
  • Slide 7
  • Aura Organization [organization chart: Mission (HIRDLS, MLS, OMI, TES); Validation Working Group; Data Working Group; Aircraft and Ground Based Programs; NASA Science Programs; NASA Data Centers; AVDC; connectors labeled $$, Needs, Validation Data, and Data]
  • Slide 8
  • How well did it work?
      • Pressure on teams to release as soon as possible. Some resistance to this.
      • Having TES in a different archive system was stupid. The Langley DAAC did not have the resources that the Goddard DISC has, and getting TES data was initially difficult.
      • Development of the data documents was slow and painful for some investigators.
      • The validation missions could have been better targeted to instrument measurements. Part of this was NASA politics.
      • Tried to achieve a balance between campaigns and long-term measurements (SHADOZ, Ticosonde, NDACC).
  • Slide 9
  • Four Guidelines for Satellite Data from UARS and Aura
      1) Data release and availability: instrument data is released and available, not hidden behind bureaucratic protocols, and some kind of browse system exists.
      2) Data format: data must be packed in easily readable, self-describing formats.
      3) Data tools exist: browse, unpack, read.
      4) Data documentation: data must be well documented and data quality flags clearly explained.
  • Slide 10
  • (1) Data release and availability
      • Release data as soon as you can...
      • Data must be easily accessible. If users can't find the data, they won't use it.
      • Data sets should be freely available (lesson from UARS).
      • Registering users (if you insist on this) should be optional and a positive experience.
      • A catalogue/browse system should help the users understand the data. The GSFC Giovanni system provides an example of an excellent browse system, including time series, image plots, multiple data types, and download. But it is limited to atmospheric data.
      • The catalogue/browse system should link directly to data access ports.
      • Ordering and staging data granules using a market basket approach is useful only in browse mode. This approach now seems archaic to me.
      • With the advent of high-speed access, all data should be made available through anonymous ftp sites or the equivalent.
  • Slide 11
  • (2) Data Formats/Gridding
      • The variety of data formats and the different mapping systems is somewhat bewildering to the novice.
      • NASA uses HDF, NOAA uses GRIB/BUFR, NCAR uses NetCDF, etc. There is no agreement on formatting standards within the U.S. and Europe.
      • Aura mandated HDF5, as did AVDC. This was a good move, although some validators objected until AVDC provided a conversion tool.
      • Nonetheless, a variety of formats are used even within EOS, and this creates barriers to the data.
  • Slide 12
  • (3) Data Tools
      • The best data centers provide generalized unpacking and reading tools to extract data for use.
      • Part of the issue is that the tools must be provided in multiple languages: IDL, Fortran, C, MATLAB, and others.*
      • The NASA DISC provides data read tools in multiple languages.
      • Nevertheless, data distributors should provide at least some tools to help users extract the data.
      • *GISS Panoply is a good generalized browse/unpacking tool.
  • Slide 13
  • (4) Data documentation
      • Aura experience is that documentation of the data set is key to scientific utility.
      • Documentation should change as data products are reprocessed. It needs to be a living document that includes new products, changes in precision, etc.
      • Documentation should be available online as HTML.
      • Documentation should include recommendations to users on how to use the data or how they might misuse the data.
      • Documentation should include references to publications on the data, who to contact with questions, etc.
      • Readme files on how to use the data need to be up to date.
      • Data quality flags or equivalent (accuracy, precision) are critical to prevent misuse.
      • Flags or screening data need to be clear.
      • An example of how to use flags would be helpful.
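As an illustration of the kind of flag example the slide asks for, here is a minimal sketch of screening retrievals before use. The field names (`precision`, `quality`, `status`), the threshold, and the conventions (negative precision means "do not use", even status means usable) are assumptions made up for this sketch. Each real product defines its own screening rules in its documentation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Retrieval:
    value: float      # retrieved quantity, e.g. a mixing ratio
    precision: float  # reported precision; negative marks "do not use" (assumed convention)
    quality: float    # hypothetical quality score between 0 and 1
    status: int       # hypothetical status flag; even values are usable in this sketch


def screen(records: List[Retrieval], min_quality: float = 0.5) -> List[Retrieval]:
    """Keep only the records that pass every screening rule."""
    return [
        r for r in records
        if r.precision > 0             # drop values flagged by negative precision
        and r.quality >= min_quality   # drop low-quality retrievals
        and r.status % 2 == 0          # drop odd (flagged) status values
    ]


records = [
    Retrieval(value=80.0, precision=5.0, quality=0.9, status=0),   # passes
    Retrieval(value=95.0, precision=-5.0, quality=0.9, status=0),  # bad precision
    Retrieval(value=70.0, precision=4.0, quality=0.2, status=0),   # low quality
    Retrieval(value=60.0, precision=4.0, quality=0.8, status=1),   # flagged status
]
good = screen(records)
print(len(good), "of", len(records), "records pass screening")
```

A short example like this in a readme lets a new user apply the flags correctly without having to contact the instrument team.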
  • Slide 14
  • Let's do some example reviews
  • Slide 15
  • Score sheet developed from these ideas (0 = not available, 1 = inadequate, 2 = adequate, 3 = good but could be improved, 4 = excellent)
      Instrument       | MLS | MIPAS | OMI | IASI
      Available        |     |       |     |
      Format/Grid      |     |       |     |
      Tools            |     |       |     |
      Documentation    |     |       |     |
      Total            |     |       |     |
  • Slide 16
  • MLS CO: products nicely labeled; data access clear; data tools; browse through Giovanni; excellent documentation
  • Slide 17
  • Click on order data; Click on CO Info; Description of CO Data; Click on Readers
  • Slide 18
  • Score Sheet (0 = not available, 1 = inadequate, 2 = adequate, 3 = good but could be improved, 4 = excellent)
      Instrument       | MLS | MIPAS | GOME II | VIIRS
      Available        | 4   |       |         |
      Simply formatted | 4   |       |         |
      Tools            | 3   |       |         |
      Documentation    | 4   |       |         |
      Total            | 15  |       |         |
  • Slide 19
  • MIPAS CO
      • Availability of MIPAS data
          • MIPAS Earthnet Online: need to register with ESA and create a research project
          • NERC Earth Observations: NERC web page has "get MIPAS data"; need to log on to NEODC, no instructions how
          • Multiple groups working on MIPAS (UK, IFAC, KIT) leads to some confusion on the data sets
      • Data tools: BEAT and VISAN, extensive read tools but somewhat confusing
      • Browse tool: http://earth.eo.esa.int/pcs/envisat/mipas/reports/dailymaps/20120101/
  • Slide 20
  • MIPAS Browse: not bad but really limited
  • Slide 21
  • Overall Impression of MIPAS
      • No clear path through multiple web information sources: excessive documentation in some places, under-documentation in others, and it is not clear how these link
      • Bureaucratic barriers to getting the data
      • Some data files are in non-standard formats that require specialized readers, with different readers from different groups
      • Browse system OK but limited to the last two years
      • Varying scales of browse imagery are confusing
  • Slide 22
  • Score Sheet (0 = not available, 1 = inadequate, 2 = adequate, 3 = good but could be improved, 4 = excellent)
      Instrument       | MLS | MIPAS | GOME II | IASI
      Available        | 4   | 2     |         |
      Simply formatted | 4   | 2     |         |
      Tools            | 3   | 3     |         |
      Documentation    | 4   | 3     |         |
      Total            | 15  | 10    |         |
  • Slide 23
  • GOME-2 NO2 (TEMIS): one-click download; reader tool; user manual (refers to SCIAMACHY)
  • Slide 24
  • Overall Impression of GOME II TEMIS
      • Good browse
      • Data available via ftp
      • Inconsistent formats (NO2 is in HDF, CH2O is in ASCII)
      • Documentation adequate, but too brief
  • Slide 25
  • Score Sheet (0 = not available, 1 = inadequate, 2 = adequate, 3 = good but could be improved, 4 = excellent)
      Instrument       | MLS | MIPAS | GOME II (TEMIS) | IASI
      Available        | 4   | 2     | 4               |
      Simply formatted | 4   | 2     | 3               |
      Tools            | 3   | 2     | 2               |
      Documentation    | 4   | 3     | 2               |
      Total            | 15  | 9     | 11              |
  • Slide 26
  • IASI
      • Want to look at CO, compare to AIRS & TES
      • Data web page: only Level 1c as EPS -> HDF5
      • Requires NEODC registration
  • Slide 27
  • IASI cont.: registered with NEODC
  • Slide 28
  • IASI cont.: link broken. ARGHHH!!!
  • Slide 29
  • Score Sheet (0 = not available, 1 = inadequate, 2 = adequate, 3 = good but could be improved, 4 = excellent)
      Instrument       | MLS | MIPAS | GOME II (TEMIS) | IASI
      Available        | 4   | 2     | 4               | 0
      Simply formatted | 4   | 2     | 3               | 2
      Tools            | 3   | 2     | 2               | 0
      Documentation    | 4   | 3     | 2               | 2
      Total            | 15  | 9     | 11              | 4
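The totals on the score sheet are simply the sum of the four category scores for each instrument. A minimal sketch of that arithmetic, using the per-category scores from the sheet (the dictionary layout and function name are illustrative choices, not part of the original talk):

```python
from typing import Dict

# Category scores as listed on the score sheet (0 = not available ... 4 = excellent).
SCORES: Dict[str, Dict[str, int]] = {
    "MLS": {"Available": 4, "Simply formatted": 4, "Tools": 3, "Documentation": 4},
    "MIPAS": {"Available": 2, "Simply formatted": 2, "Tools": 2, "Documentation": 3},
    "GOME II (TEMIS)": {"Available": 4, "Simply formatted": 3, "Tools": 2, "Documentation": 2},
    "IASI": {"Available": 0, "Simply formatted": 2, "Tools": 0, "Documentation": 2},
}


def totals(scores: Dict[str, Dict[str, int]]) -> Dict[str, int]:
    """Sum the category scores for each instrument."""
    return {instrument: sum(cats.values()) for instrument, cats in scores.items()}


for instrument, total in totals(SCORES).items():
    print(f"{instrument}: {total} out of 16")
```

With four categories scored from 0 to 4, the maximum possible total is 16.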
  • Slide 30
  • Lessons Learned and Recommendations
      1) Data availability: data must be easily available for download. Get rid of registration; use an anonymous FTP site with files organized logically (product, year, month, day, swaths).
      2) Data format: data must be packed in easily readable, self-describing formats. Use NetCDF or equivalent, no do-it-yourself formats.
      3) Provide data tools (unpack and read tools at minimum).
      4) Data must be well documented and data quality flags obvious.
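One way to read "files organized logically (product, year, month, day, swaths)" is as a predictable directory tree that users can navigate or script against without a catalogue. The sketch below builds such a path. The function name, file naming pattern, and `.nc` extension are assumptions for illustration only and do not describe any particular data center's layout.

```python
from datetime import date
from pathlib import PurePosixPath


def granule_path(product: str, day: date, swath: int, ext: str = "nc") -> PurePosixPath:
    """Build a product/year/month/day/file path for one swath (hypothetical layout)."""
    filename = f"{product}_{day:%Y%m%d}_swath{swath:03d}.{ext}"
    return PurePosixPath(product) / f"{day:%Y}" / f"{day:%m}" / f"{day:%d}" / filename


example = granule_path("CO", date(2012, 1, 1), 7)
print(example)  # CO/2012/01/01/CO_20120101_swath007.nc
```

Because the path is a pure function of product, date, and swath number, a user who knows what they want can construct the download location directly.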
  • Slide 31
  • Another General Recommendation
      • Form a data product utilization advisory group that would:
          • Provide the data center with advice/feedback on web presence and accessibility
          • Provide the data center with advice on priority products
          • Be a comprehensive data product review and test group
      • Who should be in this group?
          • Scientists familiar with products but not this instrument
          • Scientists from adjacent fields facing similar problems
          • Other data users
  • Slide 32
  • Grazie (thank you)