Importance of accessible, well-documented and scientifically relevant data: what we learned from UARS and Aura
Mark Schoeberl, STC
Slide 2
Why should we be motivated to provide accessible, well-documented data?
- Encourages instrument data analysis
- Leads to scientific improvement of the data
- Leads to new science discoveries
- Leads to new observational missions
- Leads to improvements in current data sets through inter-comparison
- Increases our general knowledge of the Earth System
Slide 3
Experience with Two Missions
- UARS: 1991-2005
- Aura: 2004-present
Slide 4
What was done with UARS
- UARS data was controlled by a strict protocol. Only the selected PIs could use the data. The protocol covered several years after launch. This upset and angered the community (more on next slide).
- UARS data was released very slowly and reluctantly.
- Data was archived on a UARS machine (VAX) with limited tools. The UARS machine was used to process data as well as distribute it. By today's standards the data set was tiny.
- Data documentation was not emphasized, and descriptions were finally made available through a special JGR issue that came out many years after launch. If you had problems with the data you had to call the PI.
- The protocol was dissolved a couple of years after launch and the data was generally distributed. It is now available through the Goddard DISC.
- Validation: not much except for balloons. UARS PIs generally did not talk to aircraft or ground people.
- Archiving: when UARS was decommissioned we archived L1 and L2 data.
Slide 5
UARS: To release (data) or not to release, that is the question...
- Releasing the data as soon as possible
  - Improves the data
  - Publicizes the data
- But releasing the data too soon
  - Gives the team a bad reputation
  - Can produce bad science
- Deciding when to release is difficult. You can't wait forever: you lose community and sponsor support, yet you want to release good data for your own reputation.
- My general rule: if the data is good enough to do some science, then release!
Slide 6
What was done with Aura
- Data was released within 6 months, and some within a few weeks.
- Data was produced in a common format. A data users group was formed to create common format guidelines, etc.
- Validation
  - Produced an extensive validation plan
  - The Big Book of Aura, which contained technical details on the instruments and algorithms for validators
  - Instrument teams participated in validation activity (especially aircraft campaigns) to get familiar with the validation data and techniques, and also to help the validators understand the satellite data
  - Provided a validation center (AVDC) that could segment the commissioning phase data over validation sites. Validators were required to share their data before they could get access to Aura data. AVDC also archived team meeting presentations and other documents.
- Documentation
  - Insisted on extensive documentation of products, including accuracy and precision
- Access
  - Products had to be available at the end of the commissioning phase. NASA data went mostly to the Goddard DISC (except TES).
  - Access to data was controlled by the DISC and NASA policy: free access to all data sets.
Slide 7
Aura Organization
[Organization diagram. Boxes: Mission; instrument teams HIRDLS, MLS, OMI, TES; Validation Working Group; Data Working Group; Aircraft and Ground Based Programs; NASA Science Programs; NASA Data Centers; AVDC. Arrow labels: $$, Needs, Validation Data, Data.]
Slide 8
How well did it work?
- Pressure on teams to release as soon as possible. Some resistance to this.
- Having TES in a different archive system was stupid. The Langley DAAC did not have the resources that the Goddard DISC has, and getting TES data was initially difficult.
- Development of the data documents was slow and painful for some investigators.
- The validation missions could have been better targeted to instrument measurements. Part of this was NASA politics.
- Tried to achieve a balance between campaigns and long-term measurements (SHADOZ, Ticosonde, NDACC).
Slide 9
Four Guidelines for Satellite Data from UARS and Aura
1) Data release and availability: instrument data is released and available, not hidden behind bureaucratic protocols, and some kind of browse system exists.
2) Data format: data must be packed in easily readable, self-describing formats.
3) Data tools exist: browse, unpack, read.
4) Data documentation: data must be well documented and data quality flags clearly explained.
Slide 10
(1) Data release and availability
- Release data as soon as you can...
- Data must be easily accessible. If users can't find the data, they won't use it.
- Data sets should be freely available (lesson from UARS).
- Registering users (if you insist on this) should be optional and a positive experience.
- A catalogue/browse system should help the users understand the data.
  - The GSFC Giovanni system provides an example of an excellent browse system, including time series, image plots, multiple data types and download. But it is limited to atmospheric data.
- The catalogue/browse system should link directly to data access ports.
  - Ordering and staging data granules using a market basket approach is useful only in browse mode; this approach now seems archaic to me.
  - With the advent of high-speed access, all data should be made available through anonymous ftp sites or the equivalent.
Slide 11
(2) Data Formats/Gridding
- The variety of data formats and the different mapping systems is somewhat bewildering to the novice.
- NASA uses HDF, NOAA uses GRIB/BUFR, NCAR uses NetCDF, etc. There is no agreement on formatting standards within the U.S. and Europe.
- Aura mandated HDF5, as did AVDC. This was a good move, although some validators objected until AVDC provided a conversion tool.
- Nonetheless, a variety of formats is used even within EOS, and this creates barriers to the data.
Slide 12
(3) Data Tools
- The best data centers provide generalized unpacking and reading tools to extract data for use.
- Part of the issue is that the tools must be provided in multiple languages: IDL, Fortran, C, MATLAB and others.*
- The NASA DISC provides data read tools in multiple languages.
- Nevertheless, data distributors should provide at least some tools to help users extract the data.
* GISS Panoply is a good generalized browse/unpacking tool.
Slide 13
(4) Data documentation
- Aura experience is that documentation of the data set is key to scientific utility.
- Documentation should change as data products are reprocessed. It needs to be a living document that includes new products, changes in precision, etc.
- Documentation should be available online as HTML.
- Documentation should include recommendations to users on how to use the data, and how they might misuse it.
- Documentation should include references to publications on the data, who to contact with questions, etc.
- Readme files on how to use the data need to be up to date.
- Data quality flags or equivalent (accuracy, precision) are critical to prevent misuse.
  - Flagging or screening of data needs to be clear.
  - An example of how to use the flags would be helpful.
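As an illustration of the kind of flag example that documentation could carry, here is a minimal sketch in Python. It is not taken from any instrument's documentation: the `Retrieval` fields, the flag convention (0 = good, 1 = suspect, 2 = bad) and the rule that a negative precision means "do not use" are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Retrieval:
    """One retrieved value with its (hypothetical) screening fields."""
    value: float        # retrieved quantity, e.g. a mixing ratio
    precision: float    # reported precision; negative means "do not use" (assumed)
    quality_flag: int   # 0 = good, 1 = suspect, 2 = bad (assumed convention)


def screen(retrievals: List[Retrieval], max_flag: int = 0) -> List[Retrieval]:
    """Keep only points whose flag is acceptable and whose precision is positive."""
    return [r for r in retrievals
            if r.quality_flag <= max_flag and r.precision > 0]


profile = [
    Retrieval(value=85.0, precision=10.0, quality_flag=0),
    Retrieval(value=430.0, precision=-10.0, quality_flag=0),  # negative precision
    Retrieval(value=92.0, precision=12.0, quality_flag=1),    # suspect
    Retrieval(value=78.0, precision=9.0, quality_flag=0),
]

good = screen(profile)
print([r.value for r in good])  # [85.0, 78.0]
```

A short example like this, shipped with the readme, tells the user exactly which points the team itself would discard.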
Slide 14
Let's do some example reviews
Slide 15
Score sheet developed from these ideas
Instrument        MLS   MIPAS   OMI   IASI
Available
Format/Grid
Tools
Documentation
Total
0 = not available, 1 = inadequate, 2 = adequate, 3 = good but could be improved, 4 = excellent
Slide 16
MLS CO
- Products nicely labeled
- Data access clear
- Data tools
- Browse through Giovanni
- Excellent documentation
Slide 17
- Click on order data
- Click on CO Info
- Description of CO Data
- Click on Readers
Slide 18
Score Sheet
0 = not available, 1 = inadequate, 2 = adequate, 3 = good but could be improved, 4 = excellent
Instrument         MLS   MIPAS   GOME II   VIIRS
Available            4
Simply formatted     4
Tools                3
Documentation        4
Total               15
Slide 19
MIPAS CO
- Availability of MIPAS data
  - MIPAS Earthnet Online: need to register with ESA and create a research project
  - NERC Earth Observations: NERC web page has "get MIPAS data"; need to log on to NEODC, with no instructions how
  - Multiple groups working on MIPAS (UK, IFAC, KIT) leads to some confusion on the data sets
- Data tools: BEAT and VISAN, extensive read tools but somewhat confusing
- Browse tool: http://earth.eo.esa.int/pcs/envisat/mipas/reports/dailymaps/20120101/
Slide 20
MIPAS Browse
Not bad but really limited
Slide 21
Overall Impression of MIPAS
- No clear path through multiple web information sources: excessive documentation in some places, under-documentation in others, and not clear how these link
- Bureaucratic barriers to getting the data
- Some data files are in a non-standard format and require specialized readers; different groups provide different readers
- Browse system OK but limited to the last two years
- Varying scales of browse imagery are confusing
Slide 22
Score Sheet
0 = not available, 1 = inadequate, 2 = adequate, 3 = good but could be improved, 4 = excellent
Instrument         MLS   MIPAS   GOME II   IASI
Available            4       2
Simply formatted     4       2
Tools                3       3
Documentation        4       3
Total               15      10
Slide 23
GOME 2 NO2 (TEMIS)
- One click download
- Reader tool
- User manual (refers to Sciamachy)
Slide 24
Overall Impression of GOME II TEMIS
- Good browse
- Data available via ftp
- Inconsistent formats (NO2 is in HDF, CH2O is in ASCII)
- Documentation adequate, but too brief
Slide 25
Score Sheet
0 = not available, 1 = inadequate, 2 = adequate, 3 = good but could be improved, 4 = excellent
Instrument         MLS   MIPAS   GOME II (TEMIS)   IASI
Available            4       2                 4
Simply formatted     4       2                 3
Tools                3       2                 2
Documentation        4       3                 2
Total               15       9                11
Slide 26
IASI
- Want to look at CO, compare to AIRS & TES
- Data web page: only Level 1c as EPS -> HDF5
- Requires NEODC registration
Slide 27
IASI cont. Registered with NEODC
Slide 28
IASI Cont. Link Broken ARGHHH!!!
Slide 29
Score Sheet
0 = not available, 1 = inadequate, 2 = adequate, 3 = good but could be improved, 4 = excellent
Instrument         MLS   MIPAS   GOME II (TEMIS)   IASI
Available            4       2                 4      0
Simply formatted     4       2                 3      2
Tools                3       2                 2      0
Documentation        4       3                 2      2
Total               15       9                11      4
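The totals are simply the sum of the four criteria for each instrument. A minimal Python sketch of that arithmetic follows; the per-criterion scores are copied from the final score sheet, while the names `SCALE`, `scores` and `total_scores` are hypothetical and exist only for this example.

```python
# Scoring scale from the score sheet.
SCALE = {
    0: "not available",
    1: "inadequate",
    2: "adequate",
    3: "good but could be improved",
    4: "excellent",
}

# Per-criterion scores from the final score sheet, in the order:
# Available, Simply formatted, Tools, Documentation.
scores = {
    "MLS": [4, 4, 3, 4],
    "MIPAS": [2, 2, 2, 3],
    "GOME II (TEMIS)": [4, 3, 2, 2],
    "IASI": [0, 2, 0, 2],
}


def total_scores(sheet):
    """Sum the criteria for each instrument, rejecting values outside the scale."""
    for instrument, values in sheet.items():
        for v in values:
            if v not in SCALE:
                raise ValueError(f"{instrument}: score {v} is outside the 0-4 scale")
    return {instrument: sum(values) for instrument, values in sheet.items()}


totals = total_scores(scores)
print(totals)
# {'MLS': 15, 'MIPAS': 9, 'GOME II (TEMIS)': 11, 'IASI': 4}
```

The maximum possible total is 16 (four criteria at 4 points each).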
Slide 30
Lessons Learned and Recommendations
1) Data availability: data must be easily available for download. Get rid of registration; use an anonymous FTP site with files organized logically (product, year, month, day, swaths).
2) Data format: data must be packed in easily readable, self-describing formats. Use NetCDF or equivalent, no do-it-yourself formats.
3) Provide data tools (unpack and read tools at minimum).
4) Data must be well documented and data quality flags obvious.
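One possible reading of "files organized logically (product, year, month, day, swaths)" is a directory tree that a user can navigate, or script against, without a catalogue. The Python sketch below builds such a path. The layout, the file-naming pattern, the `.nc` extension and the function name `granule_path` are all assumptions for illustration, not a convention used by any particular data center.

```python
from datetime import date
from pathlib import PurePosixPath


def granule_path(product: str, day: date, swath: int, ext: str = "nc") -> PurePosixPath:
    """Build a product/year/month/day/file path for one swath (hypothetical layout)."""
    filename = f"{product}_{day:%Y%m%d}_swath{swath:02d}.{ext}"
    return PurePosixPath(product) / f"{day:%Y}" / f"{day:%m}" / f"{day:%d}" / filename


path = granule_path("CO", date(2012, 1, 1), swath=3)
print(path)  # CO/2012/01/01/CO_20120101_swath03.nc
```

With a predictable layout like this, a user who knows the product and date can construct the download location directly instead of ordering granules through a staging system.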
Slide 31
Another General Recommendation
- Form a data product utilization advisory group.
  - Provide the data center with advice/feedback on web presence and accessibility
  - Provide the data center with advice on priority products
  - Be a comprehensive data product review and test group
- Who should be in this group?
  - Scientists familiar with products but not this instrument
  - Scientists from adjacent fields facing similar problems
  - Other data users