
Bridging the gap between quality and validation

Gregory Leptoukh, NASA GSFC

Some of the slides have been borrowed from Steve Volz, who in turn has borrowed from other folks (including myself), who borrowed some from Chris Lynnes … in a good collaborative spirit of reusing and recycling.

Overview
• Perspectives of Data Quality
  – At the Instrument/Measurement Level
  – At the Data Record Level
• Data Quality Management
• NASA Earth Science Programmatic Overview of Stewardship (from Steve Volz)
• Quality of Level 3 products
• Open Issues
• Proposed mini-workshop in Montreal

Context

• Users of the satellite data are insisting that the data used are accurate and reliable

• Users & Decision Makers are asking for better information about data quality…

• But this turns out to be surprisingly complicated.

What you need depends on who you are
• Measuring Climate Change:
  – Model validation: gridded, contiguous data with uncertainties in each grid cell
  – Long-term time series: bias assessment is a must
• Studying phenomena using multi-sensor data:
  – Consistently processed and presented data with quality information
• Realizing Societal Benefits through Applications:
  – Near-real-time data for transport and event monitoring; in some cases, coverage may be more important than quality
  – Monitoring (e.g., air-quality exceedance levels): uncertainty matters
• Education (users generally not well-versed in the intricacies of quality; taking all the data as usable can impair educational lessons): only the best products

Quality must include assessment of uncertainty and bias

What Is Data Quality?

• “Data Quality refers to the degree of excellence exhibited by the data in relation to the portrayal of the actual phenomena.”

• “The state of completeness, validity, consistency, timeliness and accuracy that makes data appropriate for a specific use.”

Completeness and Excellence

Different kinds of reported data quality

• Pixel-level Quality: algorithmic estimate of the usability of a data point
  – Granule-level Quality: statistical roll-up of pixel-level quality
• Product-level Quality: how closely the data represent the actual geophysical state
• Record-level Quality: how consistent and reliable the data record is across generations of measurements

These quality types are often erroneously assumed to have the same meaning. Ensuring data quality at these different levels requires a different focus and different actions.

Pixel-level: Use only pixels with quality "Good" or better.
[Figure: AIRS Total Column Precipitable Water alongside its pixel-level quality flags.]

Using bad-quality data has effects that are, in general, not negligible: include bad pixels, and hurricanes may look dry in the AIRS image above.
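The "Good or better" filtering rule can be sketched as a simple mask. The flag convention and the toy values below are illustrative, not the actual AIRS product layout; real products document their own flag meanings.

```python
import numpy as np

# Hypothetical quality convention: 0 = Best, 1 = Good, 2 = Do-not-use.
# (Real products define their own flag meanings; check the product guide.)
water = np.array([[4.1, 3.8, 0.2],
                  [5.0, 0.1, 4.6]])   # total column precipitable water, cm (toy values)
quality = np.array([[0, 1, 2],
                    [0, 2, 1]])       # per-pixel quality flags

GOOD_OR_BETTER = 1
# Keep only pixels with quality "Good" or better; mask the rest as NaN
usable = np.where(quality <= GOOD_OR_BETTER, water, np.nan)

# Statistics computed from usable pixels only
mean_good = np.nanmean(usable)
```

Filtering before averaging is exactly what prevents the "dry hurricane" artifact: the low, bad-quality values never enter the statistics.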

Product-level: "Which product to use?"

Top-of-atmosphere Total Solar Irradiance (TSI) data measured by various satellites from 1975 to 2005 (from Datla et al., 2010, Int. J. of Rem. Sensing, 31, 867–880)

[Figure: timeline of satellite altimetry missions, 2008–2018 — Jason-1, Jason-2, Jason-3, Jason-CS, Sentinel-3A, SARAL/AltiKa, GFO-FO; high-accuracy SSH from mid-inclination orbit and medium-accuracy SSH from high-inclination sun-synchronous orbit; missions marked Approved, Planned/Pending approval, In Orbit, or Extended.]

Record-level: Ensuring continuity across generations of satellites

[Figure: ENVISAT]

• Defining robust measurement strategies as the data source and measurement approach may change

Reliable Data Quality Requires Sustained Effort

• Common, Coordinated understanding and definition of quality terms and their application

• Cradle to Grave and Beyond Quality Management

Data Quality and Data/Information Policy
Data Policy tenets relevant to Data Quality:
• NASA will plan and follow data acquisition policies that ensure the collection of long-term data sets needed to satisfy the research requirements of NASA's Earth science program.
• NASA commits to the full and open sharing of Earth science data obtained from NASA Earth observing satellites, sub-orbital platforms and field campaigns with all users as soon as such data become available.
• There will be no period of exclusive access to NASA Earth science data. Following a post-launch checkout period, all data will be made available to the user community. Any variation in access will result solely from user capability, equipment, and connectivity.
• NASA will make available all NASA-generated standard products along with the source code for algorithm software, coefficients, and ancillary data used to generate these products.
  – All NASA Earth science missions, projects, and grants and cooperative agreements shall include data management plans to facilitate the implementation of these data principles.

NASA Earth Science’s Defined Data Maturity Levels

Beta
• Products intended to enable users to gain familiarity with the parameters and the data formats.

Provisional
• Products defined to facilitate data exploration and process studies that do not require rigorous validation. These data are partially validated and improvements are continuing; quality may not be optimal, since validation and quality assurance are ongoing.

Validated
• Products are high-quality data that have been fully validated and quality checked, and that are deemed suitable for systematic studies such as climate change, as well as for shorter-term process studies. These are publication-quality data with well-defined uncertainties, but they are also subject to continuing validation, quality assurance, and further improvements in subsequent versions. Users are expected to be familiar with quality summaries of all data before publication of results; when in doubt, contact the appropriate instrument team.

NASA Earth Science’s Defined Data Validation Levels

Open questions

• What do users want?
• What do users really need?
• What do data providers want users to pay attention to?

04/20/23 Leptoukh: Data Quality 14

General Product-Level Issues
• How can we extrapolate validation knowledge about Level 2 product quality to the corresponding Level 3 gridded product quality?
• How can we determine biases from product-level quality?
• How can we harmonize quality across products – which one has better quality over certain areas?


General Pixel-Level Issues
• How well can we extrapolate validation knowledge about selected Level 2 pixels to the whole Level 2 (swath) product?
• How can we harmonize terms and methods for pixel-level quality, e.g., AIRS "good" vs. MODIS "3"?
• What part should these different qualities play in provenance – quality provenance?
• When is granule-level quality useful?
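One way to frame the harmonization question is to map each sensor's vocabulary onto a shared ordinal scale. The mappings below are purely illustrative (not the official AIRS or MODIS flag definitions) — the point is the pattern, not the values:

```python
# Sketch of harmonizing sensor-specific quality vocabularies onto one
# common ordinal scale (0 = best). The tables below are hypothetical,
# NOT the official AIRS/MODIS flag definitions.
AIRS_TO_COMMON = {"best": 0, "good": 1, "do_not_use": 2}
MODIS_TO_COMMON = {3: 0, 2: 1, 1: 2, 0: 2}   # assumes MODIS 3 = highest confidence

TABLES = {"AIRS": AIRS_TO_COMMON, "MODIS": MODIS_TO_COMMON}

def harmonize(sensor: str, flag) -> int:
    """Map a sensor-specific quality flag to the common 0/1/2 scale."""
    return TABLES[sensor][flag]

# Under these assumed mappings, AIRS "good" and MODIS confidence 2
# land on the same common quality level.
assert harmonize("AIRS", "good") == harmonize("MODIS", 2) == 1
```

Recording which mapping table was applied is itself provenance information — the "quality provenance" the slide asks about.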


Level 3 (Gridded) data quality issues
• Modelers need gridded "non-gappy" data with error bars in each grid cell
• Many differences exist between Level 3 data from different sensors, with little uncertainty information
• Standard deviation within a grid cell reflects spatial variability at low–mid latitudes but mostly temporal variability at high latitudes
• What does validation of a Level 3 product mean?
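A minimal sketch of Level 2 → Level 3 aggregation that keeps per-cell mean, standard deviation, and sample count — so the "error bars in each grid cell" the modelers ask for survive the gridding. The grid resolution and sample points are toy values, not any particular product's scheme:

```python
import numpy as np

def grid_l2(lats, lons, vals, res=1.0):
    """Bin Level 2 pixels (lat, lon, value) onto a regular lat/lon grid,
    returning per-cell mean, standard deviation, and sample count."""
    ny, nx = int(180 / res), int(360 / res)
    total = np.zeros((ny, nx))
    total2 = np.zeros((ny, nx))
    n = np.zeros((ny, nx))
    iy = ((np.asarray(lats) + 90.0) / res).astype(int).clip(0, ny - 1)
    ix = ((np.asarray(lons) + 180.0) / res).astype(int).clip(0, nx - 1)
    np.add.at(total, (iy, ix), vals)               # sum of values per cell
    np.add.at(total2, (iy, ix), np.square(vals))   # sum of squares per cell
    np.add.at(n, (iy, ix), 1)                      # sample count per cell
    with np.errstate(invalid="ignore", divide="ignore"):
        mean = total / n                           # NaN where n == 0 (gaps stay gaps)
        std = np.sqrt(np.maximum(total2 / n - mean**2, 0.0))
    return mean, std, n

# Three toy pixels; two fall into the same 1-degree cell.
mean, std, n = grid_l2([10.2, 10.7, -45.0], [20.1, 20.9, 100.0], [1.0, 3.0, 5.0])
```

Note the caveat from the slide still applies: the per-cell standard deviation mixes spatial and temporal variability, so it is not a full uncertainty estimate by itself.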


Bias-related Issues
• How does bias relate to product-level quality?
• How does sampling bias affect product quality?
  – Spatial: sampling polar areas more than equatorial ones
  – Temporal: sampling only one time of day
  – Vertical: not sensitive to a certain part of the atmosphere, thus emphasizing other parts
  – Pixel quality: filtering by quality may mask out areas with specific features
  – Clear sky: e.g., measuring humidity only where there are no clouds may lead to a dry bias
  – Surface-type related
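The spatial item above has a standard first-order correction: on a regular lat/lon grid, polar cells cover less area than equatorial ones, so an unweighted mean over-counts high latitudes; weighting by cos(latitude) approximates equal-area averaging. The values here are toy data for illustration only:

```python
import numpy as np

# Toy example of correcting spatial sampling bias on a regular lat/lon grid.
lats = np.array([0.0, 60.0, 80.0])   # cell-center latitudes, degrees
vals = np.array([10.0, 2.0, 2.0])    # some retrieved quantity (toy values)

w = np.cos(np.deg2rad(lats))         # relative cell area ~ cos(latitude)
unweighted = vals.mean()             # over-counts the two high-latitude cells
area_weighted = np.sum(w * vals) / np.sum(w)
# area_weighted is pulled toward the equatorial value, as it should be
```

The other biases listed (temporal, vertical, clear-sky) need analogous, sensor-specific corrections; cos-latitude weighting addresses only the spatial term.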


Current initiatives
• The NASA 2010 ESDSWG/MPARWG initiative expands the 2008 MEaSUREs and ACCESS programs' emphasis on data quality; a legacy of the NASA Guidelines for Ensuring Quality of Information, 2001
• ESA is currently implementing contractual requirements for providing quality information within the Climate Change Initiative
• New (May 2010) Guideline for the Generation of Datasets and Products Meeting GCOS Requirements
• CEOS QA4EO provides recommendations for capturing uncertainties but basically stops at Level 1
• ISO 19115 provides a rich metadata structure for QA


How is it done now?
Different disciplines have different approaches to quality handling:
• Sea Surface Temperature – plenty of measurements, good assessment of biases
• Precipitation – multiple rain gauges, appreciation of sampling bias
• Ocean Color – good Cal/Val program
• Land
• Atmospheric – not very consistent


Sea Surface Temperature example
Sea Surface Temperature Error Budget: White Paper, 18 June 2010

A very good example of capturing many issues related to quality and uncertainty. In an effort to coordinate NASA-funded projects that relate to satellite-derived SST products, an Interim Sea Surface Temperature Science Team (ISSTST), consisting primarily of those in the U.S. research community using or developing SST products, was formed to characterize the uncertainty budget of satellite-derived SSTs.

The ISSTST workshop was attended by 42 ISSTST members and 4 observers. The white paper is the result of this workshop.


What do we propose to do?
A framework for consistent assessment, capture, and presentation of data quality information:
• Extend the QA4EO effort to Level 2 and 3 data
• Address various biases
• Consistently aggregate to Level 3 products to ensure compatibility between data from different instruments
• Deliver quality information to users of data in a way that they can understand and use


Purpose of the Montreal mini-workshop

Bring up the quality/validation issues and propose a two-day workshop in Spring 2011:
• Inform WGISS and WGCV
• Share quality and validation issues and methodologies
• Prepare a proposal for the Spring workshop


Possible participants for the 2-hour workshop in Montreal

• Greg Leptoukh (GSFC, context, goals) √
• Jan-Peter Muller (UK, terrain perspectives) √
• Bojan Bojkov (ESRIN, atmospheric chemistry) √
• Joanne Nightingale (GSFC, LPV) √
• Peter Minnett (Miami, SST) – in principle
• Chuck McClain (GSFC, Ocean Color) – haven't talked to yet
• Hook Hua (JPL, Informatics, Semantics) – haven't talked to yet
• Steve Hughes (JPL PDS, Dictionaries) – haven't talked to yet
• Rob Raskin (Informatics, Ontologies) – haven't talked to yet
• Pascal Lecomte (ESA perspectives) – Peter is going to talk to him


Notional graph of Product Quality (validation-based) and Aggregated Pixel Quality


Product quality ≠ aggregated pixel quality, but they get closer as the product matures.