A Data Quality Screening Service for Remote Sensing Data

Christopher Lynnes, NASA/GSFC (P.I.)
Edward Olsen, NASA/JPL
Peter Fox, RPI
Bruce Vollmer, NASA/GSFC
Robert Wolfe, NASA/GSFC

Contributions from R. Strub, T. Hearty, Y-I Won, M. Hegde, V. Jayanti, S. Zednik, P. West, N. Most, S. Ahmad, C. Praderas, K. Horrocks, I. Tcherednitchenko, and A. Rezaiyan-Nojani
Advancing Collaborative Connections for Earth System Science (ACCESS) Program
Outline

- Why a Data Quality Screening Service?
- Making Quality Information Easier to Use via the Data Quality Screening Service (DQSS)
- DEMO
- DQSS Architecture
- DQSS Accomplishments
- DQSS Going Forward
- The Complexity of Data Quality
Why a Data Quality Screening Service?
The quality of data can vary considerably

AIRS Variable              Best (%)   Good (%)   Do Not Use (%)
Total Precipitable Water      38         38           24
Carbon Monoxide               64          7           29
Surface Temperature            5         44           51

Version 5 Level 2 Standard Retrieval Statistics
Quality schemes can be relatively simple…

[Figure: Total Column Precipitable Water (kg/m2), colored by the Qual_H2O flag: Best, Good, Do Not Use]
…or they can be more complicated

[Figure: Hurricane Ike, viewed by the Atmospheric Infrared Sounder (AIRS). Panels show Air Temperature at 300 mbar and PBest, the maximum pressure for which the quality value is "Best" in temperature profiles.]
Quality flags are also sometimes packed together into bytes

- Cloud Mask Status Flag: 0=Undetermined, 1=Determined
- Cloud Mask Cloudiness Flag: 0=Confident cloudy, 1=Probably cloudy, 2=Probably clear, 3=Confident clear
- Day/Night Flag: 0=Night, 1=Day
- Sunglint Flag: 0=Yes, 1=No
- Snow/Ice Flag: 0=Yes, 1=No
- Surface Type Flag: 0=Ocean, deep lake/river; 1=Coast, shallow lake/river; 2=Desert; 3=Land

Big-endian arrangement for the Cloud_Mask_SDS variable in atmospheric products from the Moderate Resolution Imaging Spectroradiometer (MODIS)
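Unpacking such a packed byte is a matter of shifting and masking. The sketch below assumes the fields are laid out in the order listed, starting from the least-significant bit; the authoritative bit layout is in the MODIS atmosphere product documentation.

```python
def unpack_cloud_mask(byte_value):
    """Split one packed Cloud_Mask byte into named flag fields.

    Bit positions are assumed (listed order, LSB first); consult the
    product documentation for the real layout.
    """
    return {
        "determined": byte_value & 0b1,          # 0=Undetermined, 1=Determined
        "cloudiness": (byte_value >> 1) & 0b11,  # 0=Confident cloudy ... 3=Confident clear
        "day":        (byte_value >> 3) & 0b1,   # 0=Night, 1=Day
        "sunglint":   (byte_value >> 4) & 0b1,   # 0=Yes, 1=No
        "snow_ice":   (byte_value >> 5) & 0b1,   # 0=Yes, 1=No
        "surface":    (byte_value >> 6) & 0b11,  # 0=Ocean ... 3=Land
    }
```

Vectorizing the same shifts over a whole array (e.g., with NumPy) screens every pixel at once.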
Recommendations for quality filtering can be confusing

AIRS Quality Indicators:
- 0 = Best (for data assimilation)
- 1 = Good (for climatic studies)
- 2 = Do Not Use

MODIS Aerosols Confidence Flags ("use these flags in order to stay within expected error bounds"):
- 3 = Very Good, 2 = Good, 1 = Marginal, 0 = Poor
- Expected error bounds: ±0.03 ± 0.10τ (ocean), ±0.05 ± 0.15τ (land)

Note the different framing of the recommendations, and the opposite direction of the QC codes.
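Because the two code conventions run in opposite directions, any common screening service must normalize them before applying a single keep/drop decision. A minimal sketch, with the orderings taken from the slide (the function names are illustrative, not part of any real API):

```python
# AIRS: lower code = better quality.
AIRS_LEVELS = {0: "Best", 1: "Good", 2: "Do Not Use"}
# MODIS aerosol confidence: higher code = better quality.
MODIS_LEVELS = {3: "Very Good", 2: "Good", 1: "Marginal", 0: "Poor"}

def airs_keep(code, worst_allowed=1):
    """Keep AIRS retrievals at or better than the threshold (0=Best)."""
    return code <= worst_allowed

def modis_keep(code, least_confidence=3):
    """Keep MODIS aerosol pixels at or above the confidence threshold."""
    return code >= least_confidence
```

The comparison operator flips with the convention, which is exactly the kind of detail users get wrong when writing their own filters.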
Current user scenarios…

Nominal scenario:
- Search for and download data
- Locate documentation on handling quality
- Read & understand documentation on quality
- Write custom routine to filter out bad pixels

Equally likely scenario (especially in user communities not familiar with satellite data):
- Search for and download data
- Assume that quality has a negligible effect

(Repeat for each user.)
The effect of bad-quality data is often not negligible

[Figure: Total Column Precipitable Water (kg/m2) with quality legend Best, Good, Do Not Use; Hurricane Ike, 9/10/2008]
Neglecting quality may introduce bias (a more subtle effect)

[Figure: AIRS Relative Humidity compared against dropsondes, with and without applying PBest quality flag filtering. Boxed data points indicate AIRS Relative Humidity data with dry bias > 20%.]

From a study by Sun Wong (JPL) on specific humidity in the Atlantic Main Development Region for tropical storms.
Percent of biased-high data in MODIS Aerosols over land increases as confidence flag decreases

[Bar chart: for each confidence level (Very Good, Good, Marginal, Poor), the 0-100% breakdown into Compliant*, Biased Low, and Biased High]

*Compliant data are within ±0.05 ± 0.2τ of AERONET

Statistics derived from Hyer, E., J. Reid, and J. Zhang, 2010, An over-land aerosol optical depth data set for data assimilation by filtering, correction, and aggregation of MODIS Collection 5 optical depth retrievals, Atmos. Meas. Tech., 3, 4091-4167.
Making Quality Information Easier to Use
via the Data Quality Screening Service (DQSS)
DQSS Team

- P.I.: Christopher Lynnes
- Software Implementation: Goddard Earth Sciences Data and Information Services Center (GES DISC)
  - Implementation: Richard Strub
  - Local Domain Experts (AIRS): Thomas Hearty and Bruce Vollmer
- AIRS Domain Expert: Edward Olsen, NASA/JPL
- MODIS Implementation:
  - Implementation: Neal Most, Ali Rezaiyan, Cid Praderas, Karen Horrocks, Ivan Tcherednitchenko
  - Domain Experts: Robert Wolfe and Suraiya Ahmad
- Semantic Engineering: Tetherless World Constellation @ RPI (Peter Fox, Stephan Zednik, Patrick West)
The DQSS filters out bad pixels for the user

Default user scenario:
- Search for data
- Select science team recommendation for quality screening (filtering)
- Download screened data

More advanced scenario:
- Search for data
- Select custom quality screening options
- Download screened data
DQSS replaces bad-quality pixels with fill values

- Original data array (Total Column Precipitable Water)
- Mask based on user criteria (quality level < 1*)
- Good-quality data pixels retained
- Output file has the same format and structure as the input file (except for extra mask and original_data fields)

*0 = Best
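The masking step above can be sketched in a few lines of NumPy. The variable names and fill value are illustrative (real products define their own fill values); the quality convention follows the AIRS levels from the earlier slide.

```python
import numpy as np

FILL = -9999.0  # illustrative fill value; real products define their own

def screen(data, quality, worst_allowed=0):
    """Return a copy of `data` with bad-quality pixels replaced by FILL.

    `quality` follows the AIRS convention (0=Best, 1=Good, 2=Do Not Use),
    so the slide's "quality level < 1" criterion means worst_allowed=0.
    """
    out = data.copy()
    out[quality > worst_allowed] = FILL  # mask everything worse than the threshold
    return out

data = np.array([[10.2, 11.5], [9.8, 12.1]])   # e.g., precipitable water, kg/m2
quality = np.array([[0, 1], [2, 0]])
screened = screen(data, quality)               # only Best pixels survive
```

Because only pixel values change, the output array keeps the shape (and thus the file keeps the format and structure) of the input.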
Visualizations help users see the effect of different quality filters
DQSS can encode the science team recommendations on quality screening

AIRS Level 2 Standard Product:
- Use only Best for data assimilation
- Use Best+Good for climatic studies

MODIS Aerosols:
- Use only Very Good (highest value) over land
- Use Marginal+Good+Very Good over ocean
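One way to encode such recommendations is as data rather than code, so one screener can serve many products. The rule structure and key names below are invented for illustration; the code orderings follow the earlier slides.

```python
RECOMMENDATIONS = {
    # AIRS convention: lower code = better (0=Best), so rules cap the code.
    ("AIRS L2 Standard", "data_assimilation"): {"max_quality_code": 0},  # Best only
    ("AIRS L2 Standard", "climatic_studies"):  {"max_quality_code": 1},  # Best + Good
    # MODIS convention: higher code = better (3=Very Good), so rules floor it.
    ("MODIS Aerosols", "land"):  {"min_confidence": 3},  # Very Good only
    ("MODIS Aerosols", "ocean"): {"min_confidence": 1},  # Marginal and up
}

def keep(product, use_case, code):
    """Apply the recommended rule for a product/use case to one pixel's code."""
    rule = RECOMMENDATIONS[(product, use_case)]
    if "max_quality_code" in rule:
        return code <= rule["max_quality_code"]
    return code >= rule["min_confidence"]
```

Keeping the thresholds in a table is what lets defaults come from the science team while still allowing per-user overrides.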
Initial settings are based on the Science Team recommendation. (Note: "Good" retains retrievals that are Good or better.)

Users can choose settings for all parameters at once, or variable by variable, or select their own criteria entirely.
DEMO
http://ladsweb.nascom.nasa.gov
- Select Terra or Aqua Level 2 Atmosphere
- Select "51 - Collection 5.1"
- Add results to Shopping Cart (do not go directly to "Order Now")
- Go to the shopping cart and ask to "Post-Process and Order"

http://mirador.gsfc.nasa.gov
- Search for 'AIRX2RET'
DQSS Architecture
DQSS Flow

[Diagram: the user selection goes to the Screener, which issues an Ontology Query against the Quality Ontology and builds a quality mask from the data file; the Masker applies the quality mask to produce a screened data file (the data file with mask) for the End User]
DQSS Ontology (The Whole Enchilada)

[Diagram modules: Data Field Binding Module, Data Field Semantics Module, Quality View Module]
DQSS Quality View Zoom
DQSS encapsulates ontology and screening parameters

[Diagram: the LAADS Web GUI sends the data product and screening options to the DQSS Ontology Service, which returns a screening recipe (XML); the selected options and screening recipe (XML) are stored in the MODAPS database, and the MODAPS "minions" apply the screening parameters (XML)]

Encapsulation enables reuse in other data centers with diverse environments.
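The slides say the recipe is exchanged as XML but do not show its schema, so every element and attribute name in this sketch is invented purely to illustrate the encapsulation idea: the GUI asks for a recipe, stores it opaquely, and a downstream worker parses it back.

```python
import xml.etree.ElementTree as ET

def build_recipe(product, variable, flag, worst_allowed):
    """Build a hypothetical screening-recipe XML document.

    All tag and attribute names here are illustrative, not the real
    DQSS schema.
    """
    recipe = ET.Element("screeningRecipe", product=product)
    crit = ET.SubElement(recipe, "criterion", variable=variable)
    ET.SubElement(crit, "qualityFlag").text = flag
    ET.SubElement(crit, "worstAllowed").text = str(worst_allowed)
    return ET.tostring(recipe, encoding="unicode")

xml_text = build_recipe("AIRX2RET", "precipitable_water", "Qual_H2O", 1)
parsed = ET.fromstring(xml_text)  # a downstream "minion" could parse it back
```

Because the recipe travels as a self-contained document, the producer (ontology service) and consumer (processing system) need share only the schema, which is what makes reuse at other data centers feasible.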
DQSS Accomplishments

- DQSS is operational at two EOSDIS data centers (TRL = 9):
  - AIRS Level 2 Retrievals
  - MLS L2 Water Vapor
  - MODIS L2 Aerosols
  - MODIS L2 Water Vapor
- Metrics are being collected routinely
- DQSS is almost released (all known paperwork filed)
Papers and Presentations

- Managing Data Quality for Collaborative Science workshop (peer-reviewed paper + talk)
- Sounder Science meeting talk
- ESDSWG poster
- ESIP poster
- A-Train User Workshop (part of GES DISC presentation)
- AGU: Ambiguity of Data Quality in Remote Sensing Data
Metrics

- DQSS is included in web access logs sent to the EOSDIS Metrics System (EMS), tagged with "Protocol" DQSS
- An EMS-to-MCT (ESDS Metrics) bridge was implemented for both DAACs*
- Metrics from EMS: Users: 208; Downloads: 73,452
- Still mostly AIRS, some MLS; MODIS L2 Aerosol may be the "breakthrough" product

*DQSS is the first multi-DAAC bridge for ACCESS
ESDSWG Participation by DQSS

- Active participant (Lynnes) in the Technology Infusion Working Group subgroups: Semantic Web, Interoperability, and Processes and Strategies
- Established and led two ESIP clusters as technology infusion activities: Federated Search/Discovery and Earth Science Collaboratory
- Participated in the ESDSWG reorganization tiger team
- Founded the ESDSWG colloquium series
DQSS Spinoff Benefits

- DQSS staff unearthed a subtle error in the MODIS Dark Target Aerosols algorithm: quality confidence flags are sometimes set to Very Good, Good, or Marginal when they should be marked as "Bad"
  - Fixed in Version 6 of the algorithm
  - Also fixed in the Aerostat ACCESS project
- Semantic Web technology infusion in DQSS enables future infusion of the RPI component of the Multi-Sensor Data Synergy Advisor into the Atmospheric Composition Portal: shared skills, shared language, shared tools
- The Earth Science Collaboratory concept was born in a discussion with Kuo about DQSS in the ESDSWG poster session in New Orleans
Going Forward

- Maintain software as part of GES DISC core services
  - Modify as necessary for new releases; new data product versions have simpler quality schemes
- Add data products as part of data support
  - Sustainability test: how long to add a product? Answer: a couple of days for MLS Ozone
- Continue outreach
  - Demonstrated to scientists in NASA ARSET
- Release software and technology
DQSS Recap
- Screening satellite data can be difficult and time-consuming for users
- The Data Quality Screening Service provides an easy-to-use service
- The result should be: more attention to quality on users' part, and more accurate handling of quality information, with less user effort
The Complexity of Data Quality, Part 1

Quality Control ≠ Quality Assessment
- QC represents the algorithm's "happiness" with an individual retrieval or profile, at the "pixel" level
- QA is the science team's statistical assessment of quality at the product level, based on cal/val campaigns

[Diagram: Algorithm version N produces Data Product version N with a QC guess; the Science Team's quality assessment feeds improved QC estimation into Algorithm version N+1, which produces Data Product version N+1 with its own QC guess and quality assessment]
QC vs. QA – User View

- Quality Control: "Give me just the good-quality data values"
- Quality Assessment: "Tell me how good the dataset is"

(Both requests are directed at the Data Provider.)
Quality (Priority) is in the Eye of the Beholder

- Climate researchers: long-term consistency, temporal coverage, spatial coverage
- Algorithm developers: accuracy of retrievals, information content
- Data assimilators: spatial consistency
- Applications: latency, spatial resolution
- Education/Outreach: usability, simplicity
Recommendation 1: Harmonize Quality Terms
- Start with the ISO 19115/19157 Data Quality model, but…
- ISO: Temporal Consistency = "correctness of the order of events"
- Q: Examples of "Temporal Consistency" quality issues?

[Figure: Land Surface Temperature anomaly from the Advanced Very High Resolution Radiometer, showing a trend artifact from orbital drift and a discontinuity artifact from a change in satellites]
Recommendation 2: Address More Dimensions of Quality

- Accuracy: measurement bias + dispersion
  - Accuracy of data with low-quality flags
  - Accuracy of grid-cell aggregations
- Consistency: spatial, temporal, observing conditions
Recommendation #2 (cont.): More dimensions of quality

- Completeness
  - Temporal: time range, diurnal coverage, revisit frequency
  - Spatial: coverage and grid-cell representativeness
  - Observing conditions: cloudiness, surface reflectance
- N.B.: Incompleteness affects accuracy via sampling bias
  - AIRS dry sampling bias at high latitudes, due to incompleteness in high-cloud conditions, which tend to have rain and convective clouds with icy tops
  - AIRS wet sampling bias where low clouds are prevalent
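The sampling-bias point can be made concrete with a toy example (all numbers invented): if the humid, high-cloud scenes systematically fail retrieval, the mean of the surviving scenes is drier than the true mean.

```python
import numpy as np

# Invented humidity values for five scenes; the last two are high-cloud
# scenes in which the retrieval fails, so they drop out of the product.
true_humidity = np.array([20.0, 25.0, 30.0, 60.0, 70.0])
retrieved = true_humidity[:3]

true_mean = true_humidity.mean()
sampled_mean = retrieved.mean()
dry_bias = true_mean - sampled_mean  # the incompleteness shows up as a dry bias
```

No individual retained retrieval is wrong; the bias comes entirely from which scenes are missing.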
Quality Indicator: AOD spatial completeness (coverage)

[Figure: average percentage of non-fill values in daily gridded products, MODIS Aqua vs. MISR]

Due to its wider swath, MODIS AOD covers more area than MISR; the seasonal and zonal patterns are rather similar.
Quality Indicator: Diurnal Coverage for MODIS Terra in summer

[Figure: probability (%) of an "overpass*" in a given day, by local hour of day. *That is, being included in a MODIS L2 pixel during that hour of the day.]

Because Terra is sun-synchronous with a 10:30 equator crossing, observations are limited to a short period of the day at all but high latitudes (Arctic in summer).
Recommendation 3: Address Fitness for Purpose Directly*
- Standardize terms of recommendation
- Enumerate more positive realms and examples
- Enumerate negative examples
*A bit controversial
Recommendation #4: More Outreach…

…once we know what we want to say
- Quality screening: for DQSS-served datasets, there is no longer any excuse not to
  - Should NASA supply DQSS for more datasets?
- Quality assessment: dependent on Recommendations 1-3
- Venues
  - Science team meetings… but for QC this is "preaching to the choir"
  - NASA Applied Remote Sensing Education and Training (ARSET): get them at the start of their NASA data usage! Demonstrated DQSS and Aerostat to ARSET. Any others like ARSET?
  - Training workshops?
Backup Slides
Lessons Learned
The tall pole is acquiring knowledge about what QC indicators really mean, and how they should be used.
- Is anyone out there actually using the quality documentation?
  - Detailed analysis by DQSS of quality documentation turned up errors, ambiguities, inconsistencies, even an algorithm bug
  - Yet the help desks are not getting questions about these…
- Seemed like a good idea at the time…
  - Visualizations: good for validating screening, but do users use them?
  - HDF-Java: not quite the panacea for portability we hoped
  - DQSS-to-OPeNDAP gateway: OPeNDAP access to DQSS-screened data enables use by IDV/McIDAS-V, but it is hard to integrate into a search interface (not giving up yet…)
Possible Client-Side Concept

[Diagram: the Ontology drives a Web Form that produces screening criteria (XML file); the Data Provider supplies data files plus dataset-specific instance info (XML file); the Screener and Masker then run at the GES DISC to deliver screened data files to the End User]
DQSS Target Uses and Users

                       Routine Visualization | Quick Recon. | Machine-level Metrics
Interdisciplinary              X                    X
Educational                    X                    X
Expert/Power                   ?                    X                  X
Applications                                                           X
Algorithm Developers                                X
ESDSWG Activities

Contributions to all 3 TIWG subgroups:
- Semantic Web: contributed a use case for the Semantic Web tutorial @ ESIP; participation in the ESIP Information Quality Cluster
- Services Interoperability and Orchestration: combining ESIP OpenSearch with servicecasting and datacasting standards; spearheading the ESIP Federated OpenSearch cluster (5 servers, 2 more soon; 4+ clients), now the ESIP Discovery cluster (see above)
- Processes and Strategies: developed use cases for Decadal Survey missions: DESDynI-Lidar, SMAP, and CLARREO
File-level quality statistics are not always useful for data selection

[Figure: study area with "Percent cloud cover?" annotation]

Level 3 grid-cell standard deviation is difficult to interpret due to its dependence on magnitude

[Figure: MODIS Aerosol Optical Thickness at 550 nm, mean and standard deviation]
Neither pixel count nor standard deviation alone expresses how representative the grid-cell value is

[Figure: MODIS Aerosol Optical Thickness at 0.55 microns; a Level 2 swath with the corresponding Level 3 grid AOT mean, standard deviation, and input pixel count]
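A tiny numeric sketch of that last point, with invented AOT samples: two grid cells can have identical means and standard deviations yet be built from very different numbers of Level 2 pixels, so no single statistic conveys representativeness.

```python
import numpy as np

# Invented Level 2 AOT samples falling into two Level 3 grid cells.
cell_a = np.array([0.1, 0.2])        # 2 input pixels
cell_b = np.array([0.1, 0.2] * 10)   # 20 input pixels, same spread of values

same_mean = np.isclose(cell_a.mean(), cell_b.mean())
same_std = np.isclose(cell_a.std(), cell_b.std())
counts = (cell_a.size, cell_b.size)
# Identical mean and standard deviation, very different pixel counts:
# only the combination hints at how well the gridded value represents the cell.
```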