1 Automated Data Quality Assurance for Marine Observations James V. Koziana Science Applications International Corporation (SAIC) Hampton, VA 23666 USA Third Meeting of GCOOS DMAC Renaissance Orlando Hotel Forbes Place Orlando, FL 23-24 February 2009


Automated Data Quality Assurance for Marine Observations

James V. Koziana
Science Applications International Corporation (SAIC)
Hampton, VA 23666 USA

Third Meeting of GCOOS DMAC
Renaissance Orlando Hotel
Forbes Place, Orlando, FL
23-24 February 2009


Outline
• Introduction
  – IOOS
  – State of Oceans (time and space)
• Data Quality Assurance System
  – Quality Assurance Pyramid
  – Science Data Challenges
  – Quality Assurance System: block diagram and discussions
• Results
• Conclusions
  – Present and Future Applications
  – Wrap-up


Vice Admiral Lautenbacher, Jr., U.S. Navy (Ret.), Under Secretary of Commerce for Oceans & Atmosphere; November 21, 2005

11 groups funded by NOAA Coastal Services Center to establish Regional Associations (RAs): ACOOS, NANOOS, SCCOOS, CenCOOS, PacIOOS, GCOOS, CaRA, SECOORA, MACOORA, NERA, GLOS

U.S. Integrated Ocean Observing System (IOOS)

Vision: Lead the way in the provision of products and services based on ocean observations for a wide range of societal benefits.

Goal: Achieve unprecedented levels of resolution, quality, and distribution of all global and coastal ocean observations to improve predictions of ecosystem, weather and water, and climate events.

IOOS Requirements
• Vision for observing systems will bring streams of real-time data from a distributed sensor system
• Each data provider will prepare their data using Data Management and Communications (DMAC) standards and protocols.


State of Oceans & Coasts Varies Across Time and Space

Geophysical
1. Sea surface meteorological variables
2. Land–sea stream flows
3. Sea level
4. Surface waves, currents
5. Ice distribution
6. Temperature, salinity
7. Bathymetry

Biophysical
1. Optical properties
2. Benthic habitats

Chemical
1. pCO2
2. Dissolved inorganic nutrients
3. Contaminants
4. Dissolved oxygen

Biological
1. Fish species, abundance
2. Zooplankton species, abundance
3. Phytoplankton species, biomass (ocean color)
4. Waterborne pathogens

IOOS Core Variables



Data Quality

Data Quality refers to the quality of data. Data are of high quality "if they are fit for their intended uses in operations, decision making and planning" (J.M. Juran). Alternatively, the data are deemed of high quality if they correctly represent the real-world construct to which they refer. These two views can often be in disagreement, even about the same set of data used for the same purpose.

Quality Assurance (QA) and Quality Control (QC)

Quality assurance (QA): an integrated system of activities involving planning, quality control, quality assessment, reporting and quality improvement to ensure that a product or service meets defined standards of quality with a stated level of confidence.

Quality control (QC): the overall system of technical activities whose purpose is to measure and control the quality of a product or service so that it meets the needs of users. The aim is to provide quality that is satisfactory, adequate, dependable, and economical.

Source: Quality Assurance Division, National Center for Environmental Research and Quality Assurance, Office of Research and Development, U.S. Environmental Protection Agency, December 10, 1997


Quality Checks
• Static: single-station, single-time checks
  – Locate extreme outliers in observations.
  – Unaware of the previous or current meteorological or hydrological situation described by other observations and grids
    • Validity
    • Internal consistency
    • Vertical consistency
• Dynamic: refine the QC information by taking advantage of other available hydrological information.
  – Position consistency
  – Temporal consistency
  – Spatial consistency
• Single-character “data descriptor” for each observation
  – Provides an overall opinion of quality by combining the information from the various quality checks
  – The algorithms used to compute the “data descriptors” are a function of the types of QC checks applied to the observations and the sophistication of those checks
    • Level 1: least sophisticated
    • Level 2: medium
    • Level 3: most sophisticated
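As a rough illustration of how a single-character data descriptor could be derived from several checks, the sketch below collapses a list of check results into one character. It is illustrative only: the `CheckResult` type, the check names, and the precedence rule (the failed check with the highest level wins) are assumptions, and the characters "0", "L" and "V" are borrowed from the buoy example later in this deck.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class CheckResult:
    """Outcome of one QC check (hypothetical structure)."""
    name: str     # e.g. "range_limit" or "time_continuity"
    level: int    # 1 = least sophisticated ... 3 = most sophisticated
    passed: bool
    flag: str     # character reported when this check fails


def data_descriptor(results: Iterable[CheckResult], good: str = "0") -> str:
    """Collapse several check results into one character.

    Assumed rule: if nothing failed, report `good`; otherwise report the
    flag of the failed check with the highest sophistication level.
    """
    failed = [r for r in results if not r.passed]
    if not failed:
        return good
    return max(failed, key=lambda r: r.level).flag


# Example: an observation that fails both checks is reported as "V".
both_failed = [
    CheckResult("range_limit", 1, False, "L"),
    CheckResult("time_continuity", 2, False, "V"),
]
print(data_descriptor(both_failed))  # prints V
```

A real system could just as well keep every individual flag alongside the summary character; the reduction above only shows the "overall opinion" idea.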


Quality Assurance Pyramid

Begin at bottom and work upward (increased accuracy). Levels, listed from the base upward:
1. Range Limit Checks (Dynamic, Seasonal and Regional)
2. Rate Checks (Time Continuity)
3. Inter-comparisons (same sensor, same platform)
4. Inter-comparisons (nearby similar sensor/platform)
5. Inter-comparisons (dissimilar sensor/platform)
6. Comparison with statistical trends
7. Comparison with remotely sensed data
8. Comparison with model

• Base of pyramid: algorithms more concrete; work on a wider variety of data
• Top of pyramid: algorithms more conceptual; work on a narrower variety of data


Science Data Lifecycle

Phases: System Development, Deployment/O&M, Exploitation

Activities: Sensor Development; Platform Development; Algorithm Development; Algorithm Implementation; Model Development; Data Center Development; Platform Operations; Platform Maintenance; Data Acquisition; Processing; Dissemination; QA; Archiving; Data Center Ops; Basic Research; Model Runs; Applications; Decision Support; Model Implementation

• Enhanced Understanding
• Forecasts


Science Data Management Challenges
• Data management systems are an important component of Earth Science missions
  – Support primary mission goal of timely delivery of high quality data products to science community
• They are expensive and time consuming to develop and maintain
  – Relative size of data management code vs science code
  – Problems with traditional stovepipe development approach
  – Continuous change associated with science algorithms
• The science team needs to be able to allocate their time and resources to science, not data management.
  – Data management functions represent 60% - 80% of the code for typical science applications
• Continuous change inherent in research environments
  – Highly iterative nature of algorithm development drives numerous changes to code during development and after launch
• Data management code which is tightly coupled to algorithms and data products
  – Can be time consuming to modify as changes ripple throughout the code
  – Leads to stovepipe approach that results in significant duplication of code and effort


Data Quality Assurance System for Earth Science Data and Information

• Scalable, modular system that can be used to address various methods of characterizing the quality of data products
• This approach facilitates science software development
  – Reduces level of effort required and program risk (cost effective)
  – Allows data management team to be more responsive to science algorithm developers (flexibility)
• System is designed to:
  – Include substantial core functionality that is common to any science application
  – Be easily configurable to work with many different data sets (observations and model output)
  – Readily accommodate algorithm and data product additions and modifications with minimal code changes
  – Balance flexibility and performance
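One common way to get "additions with minimal code changes" is a name-keyed algorithm registry driven by configuration. The sketch below is a minimal illustration of that pattern, not the system's actual design: the registry, the decorator, the check names and the configuration layout are all hypothetical.

```python
from typing import Callable, Dict, List

# Hypothetical registry: algorithm name -> function(value, params) -> passed?
QC_ALGORITHMS: Dict[str, Callable[[float, dict], bool]] = {}


def register(name: str):
    """Decorator that adds a QC function to the registry under `name`."""
    def wrap(func):
        QC_ALGORITHMS[name] = func
        return func
    return wrap


@register("range_limit")
def range_limit(value: float, params: dict) -> bool:
    # Passes when the value lies inside the configured limits.
    return params["min"] <= value <= params["max"]


@register("non_negative")
def non_negative(value: float, params: dict) -> bool:
    # Passes when the value is zero or positive (no parameters needed).
    return value >= 0.0


def run_checks(value: float, config: List[dict]) -> List[str]:
    """Run the checks named in `config`; return the names of failed checks."""
    failed = []
    for entry in config:
        check = QC_ALGORITHMS[entry["algorithm"]]
        if not check(value, entry.get("params", {})):
            failed.append(entry["algorithm"])
    return failed


# Configuration would normally be read from a file; it is inlined here.
air_temp_config = [{"algorithm": "range_limit",
                    "params": {"min": 24.3, "max": 31.3}}]
print(run_checks(35.5, air_temp_config))  # prints ['range_limit']
```

With this layout, supporting a new parameter means writing a new configuration entry, and supporting a new test means registering one function; the control loop stays unchanged.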


Data Quality Assurance System Block Diagram

[Block diagram: Framework]
• Input Subsystem: reads Input Data (NetCDF, HDF, SensorML, XML, ASCII)
• Control Subsystem: driven by “run time” defined configuration files (Config. Files)
• Common Data Structure Components and Data Store
• Algorithm Library: Algorithm QC 1, Algorithm QC 2, Algorithm QC 3, plus User Supplied QC Algorithms
• Output Subsystem: writes Output File (NetCDF, HDF, SensorML, XML, ASCII)
• DataBase
• Legend: Input, Data Flow, Output
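The input and output subsystems in the diagram can be pictured as format-keyed readers and writers that share one common data structure. The sketch below shows only the ASCII path, using the record layout of the buoy listing later in this deck; the `Observation` class, the `READERS`/`WRITERS` tables and the function names are hypothetical, and the other formats are left out.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class Observation:
    """Common in-memory record shared by readers, QC algorithms and writers."""
    station: str
    time: datetime
    parameter: str
    value: float
    flag: str = ""   # filled in by the QC algorithms


def read_ascii(lines: List[str]) -> List[Observation]:
    """Parse lines such as '42007 6/25/2008 15:00:00 ATMP1 64.5'."""
    observations = []
    for line in lines:
        station, date, clock, parameter, value = line.split()
        when = datetime.strptime(f"{date} {clock}", "%m/%d/%Y %H:%M:%S")
        observations.append(Observation(station, when, parameter, float(value)))
    return observations


def write_ascii(observations: List[Observation]) -> List[str]:
    """Write records back in the same layout, with the flag appended."""
    return [
        f"{o.station} {o.time.month}/{o.time.day}/{o.time.year} "
        f"{o.time.hour}:{o.time:%M:%S} {o.parameter} {o.value:g} {o.flag}"
        for o in observations
    ]


# Readers and writers for NetCDF, HDF, SensorML and XML would register here.
READERS = {"ascii": read_ascii}
WRITERS = {"ascii": write_ascii}

records = READERS["ascii"](["42007 6/25/2008 15:00:00 ATMP1 64.5"])
records[0].flag = "V"
print(WRITERS["ascii"](records)[0])  # prints 42007 6/25/2008 15:00:00 ATMP1 64.5 V
```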


Dqa_app input_data input_data_cfg input_limits input_limits_cfg output_data output_data_cfg

NDBC Provided Air Temp Data, Buoy 42007, 6/25/08

42007 6/25/2008 0:00:00 ATMP1 29
42007 6/25/2008 1:00:00 ATMP1 29
42007 6/25/2008 2:00:00 ATMP1 28.9
42007 6/25/2008 3:00:00 ATMP1 29
42007 6/25/2008 4:00:00 ATMP1 29.1
42007 6/25/2008 5:00:00 ATMP1 28.7
42007 6/25/2008 6:00:00 ATMP1 28.8
42007 6/25/2008 7:00:00 ATMP1 28.9
42007 6/25/2008 8:00:00 ATMP1 28.5
42007 6/25/2008 9:00:00 ATMP1 28.5
42007 6/25/2008 10:00:00 ATMP1 28.4
42007 6/25/2008 11:00:00 ATMP1 28.2
42007 6/25/2008 12:00:00 ATMP1 28.3
42007 6/25/2008 13:00:00 ATMP1 28.2
42007 6/25/2008 14:00:00 ATMP1 26.6
42007 6/25/2008 15:00:00 ATMP1 64.5
42007 6/25/2008 16:00:00 ATMP1 37.5
42007 6/25/2008 17:00:00 ATMP1 35.5
42007 6/25/2008 18:00:00 ATMP1 28.2
42007 6/25/2008 19:00:00 ATMP1 28.2
42007 6/25/2008 20:00:00 ATMP1 28.3
42007 6/25/2008 21:00:00 ATMP1 28.4
42007 6/25/2008 22:00:00 ATMP1 28.6
42007 6/25/2008 23:00:00 ATMP1 28.4

Quality Checked Data for Buoy 42007, 6/25/08 (last column is the quality flag)

42007 6/25/2008 0:00:00 ATMP1 29 0
42007 6/25/2008 1:00:00 ATMP1 29 0
42007 6/25/2008 2:00:00 ATMP1 28.9 0
42007 6/25/2008 3:00:00 ATMP1 29 0
42007 6/25/2008 4:00:00 ATMP1 29.1 0
42007 6/25/2008 5:00:00 ATMP1 28.7 0
42007 6/25/2008 6:00:00 ATMP1 28.8 0
42007 6/25/2008 7:00:00 ATMP1 28.9 0
42007 6/25/2008 8:00:00 ATMP1 28.5 0
42007 6/25/2008 9:00:00 ATMP1 28.5 0
42007 6/25/2008 10:00:00 ATMP1 28.4 0
42007 6/25/2008 11:00:00 ATMP1 28.2 0
42007 6/25/2008 12:00:00 ATMP1 28.3 0
42007 6/25/2008 13:00:00 ATMP1 28.2 0
42007 6/25/2008 14:00:00 ATMP1 26.6 0
42007 6/25/2008 15:00:00 ATMP1 64.5 V
42007 6/25/2008 16:00:00 ATMP1 37.5 V
42007 6/25/2008 17:00:00 ATMP1 35.5 L
42007 6/25/2008 18:00:00 ATMP1 28.2 0
42007 6/25/2008 19:00:00 ATMP1 28.2 0
42007 6/25/2008 20:00:00 ATMP1 28.3 0
42007 6/25/2008 21:00:00 ATMP1 28.4 0
42007 6/25/2008 22:00:00 ATMP1 28.6 0
42007 6/25/2008 23:00:00 ATMP1 28.4 0

QA applied between input and output:
• Range Checks (hard & soft flags)
• Time Continuity Checks (hard & soft flags)

Input, Output and Control Configuration Files
• Input/Output Configuration: file type, data parameters, data set dimensions
• Control Subsystem (i.e., data flow): quality assurance procedures
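The flags in the quality-checked listing for buoy 42007 can be reproduced with a short sketch, under assumptions the slides do not confirm. The range limits are the June ATMP values from the deck's Central Gulf of Mexico table (24.3 to 31.3 C). The time-continuity test allows a change of 0.58 * sigma * sqrt(tau) since the last acceptable observation, with sigma = 11.0 C and tau capped at 3 hours; that form is patterned on the NDBC quality control handbook and should be checked against it. A continuity failure is assumed to take precedence over a limits failure. All function and constant names are hypothetical; this is not the Dqa_app implementation.

```python
import math

# Hourly air temperature (deg C), buoy 42007, 6/25/2008, copied from the slide.
ATMP = [29, 29, 28.9, 29, 29.1, 28.7, 28.8, 28.9, 28.5, 28.5, 28.4, 28.2,
        28.3, 28.2, 26.6, 64.5, 37.5, 35.5, 28.2, 28.2, 28.3, 28.4, 28.6, 28.4]

ATMP_MIN, ATMP_MAX = 24.3, 31.3   # assumed: June ATMP limits from the deck's table
SIGMA_ATMP = 11.0                 # assumed standard deviation for air temperature
MAX_TAU_HOURS = 3.0               # assumed cap on elapsed time in the rate test


def qc_flags(values, lo, hi, sigma):
    """Return one flag per hourly value: '0' good, 'V' rate failure, 'L' limits failure."""
    flags = []
    last_good = None  # (hour, value) of the last observation flagged '0'
    for hour, value in enumerate(values):
        flag = "0"
        if last_good is not None:
            # Time-continuity test against the last acceptable observation.
            tau = min(hour - last_good[0], MAX_TAU_HOURS)
            allowed_change = 0.58 * sigma * math.sqrt(tau)
            if abs(value - last_good[1]) > allowed_change:
                flag = "V"
        if flag == "0" and not (lo <= value <= hi):
            # Range-limit test, applied only if continuity passed.
            flag = "L"
        if flag == "0":
            last_good = (hour, value)
        flags.append(flag)
    return flags


flags = qc_flags(ATMP, ATMP_MIN, ATMP_MAX, SIGMA_ATMP)
print([(hour, f) for hour, f in enumerate(flags) if f != "0"])
# prints [(15, 'V'), (16, 'V'), (17, 'L')]
```

Under these assumptions the 17:00 value (35.5) passes the rate test because three hours have elapsed since the last good value, and is then caught by the limits test, which matches the V, V, L pattern on the slide.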


Source of Data

Western Gulf of Mexico Recent Marine Data: http://ndbc.noaa.gov/maps/WestGulf.shtml

Louisiana/Mississippi Coastal Region Recent Marine Data: http://ndbc.noaa.gov/maps/WestGulf_inset.shtml


Regional and Seasonal Limits: Central Gulf of Mexico

Parameter     JAN     FEB     MAR     APR     MAY     JUN     JUL     AUG     SEP     OCT     NOV     DEC
BARO MAX   1029.3  1028.4  1026.7  1025.3  1022.4  1021.4  1022.2  1021.1  1020.1  1022.7  1026.2  1028.1
BARO MIN   1010.3  1007.7  1007.9  1008.2  1009.8  1010.1  1012.5  1011.1  1008.5  1008.3  1010.2  1011.0
ATMP MAX     27.9    27.9    28.5    28.9    30.0    31.3    32.1    32.1    31.5    30.6    29.3    28.1
ATMP MIN     12.7    13.2    14.6    18.1    21.6    24.3    25.5    25.5    24.6    21.2    16.9    14.4
WTMP MAX     28.2    28.1    28.3    28.5    29.4    30.6    31.5    31.8    32.7    30.5    29.5    28.7
WTMP MIN     18.5    17.8    18.3    18.0    23.1    25.8    27.2    27.6    25.5    24.9    22.2    20.3
WDIR MAX    360.0   360.0   360.0   360.0   360.0   360.0   360.0   360.0   360.0   360.0   360.0   360.0
WDIR MIN      0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0
WSPD MAX     14.5    14.2    13.9    12.6    11.3    10.3     9.4     9.8    12.8    13.4    13.6    13.8
WSPD MIN      0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0
WVHGT MAX     3.0     3.0     2.9     2.5     2.2     2.0     1.7     1.8     2.7     2.7     2.7     2.8
WVHGT MIN     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0

[Figure: Seasonal/Regional Air Temperature Ranges; Air Temp (Max) and Air Temp (Min) versus Month of Year]

[Figure: Seasonal/Regional Wind Speed; Wind Speed (Max) and Wind Speed (Min) versus Month of Year]

NDBC Technical Document 03-02
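A range check against these limits reduces to a lookup by parameter and month. The sketch below encodes two rows of the Central Gulf of Mexico table on this slide (ATMP and WSPD) and tests a value against the limits for a given month. The dictionary layout and function names are hypothetical, and whether a failure becomes a hard or a soft flag is left to the caller.

```python
# Monthly limits (JAN..DEC) copied from the Central Gulf of Mexico table.
LIMITS = {
    "ATMP": {
        "max": [27.9, 27.9, 28.5, 28.9, 30.0, 31.3, 32.1, 32.1, 31.5, 30.6, 29.3, 28.1],
        "min": [12.7, 13.2, 14.6, 18.1, 21.6, 24.3, 25.5, 25.5, 24.6, 21.2, 16.9, 14.4],
    },
    "WSPD": {
        "max": [14.5, 14.2, 13.9, 12.6, 11.3, 10.3, 9.4, 9.8, 12.8, 13.4, 13.6, 13.8],
        "min": [0.0] * 12,
    },
}


def seasonal_limits(parameter: str, month: int):
    """Return (min, max) for a parameter and a calendar month (1 = January)."""
    if not 1 <= month <= 12:
        raise ValueError(f"month must be 1..12, got {month}")
    row = LIMITS[parameter]
    return row["min"][month - 1], row["max"][month - 1]


def within_limits(parameter: str, month: int, value: float) -> bool:
    """True when the value lies inside the seasonal/regional limits."""
    lo, hi = seasonal_limits(parameter, month)
    return lo <= value <= hi


print(seasonal_limits("ATMP", 6))      # prints (24.3, 31.3)
print(within_limits("ATMP", 6, 35.5))  # prints False
```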


Air Temperature for Station 42001 (June 25, 08)

[Figure: Atmospheric Temperature (C) versus Time (Hours); series: Air Temperature (42001), Max Air Temp, Min Air Temp; flagged points are marked V and L]

Note: L: Failed Limits Check; V: Failed Time-Continuity Test


Wind Speed for Station 42001 (August 2005)

[Figure: Wind Speed (Meters/Second) versus Time (axis labeled "Time (Days)" and annotated "(hours)"); series: Wind Speed, Wind Direction, Maximum Wind Speed, Minimum Wind Speed, Storm Flag, Seasonal/Regional Max Limit, Seasonal/Regional Min Limit; the slide also carries the annotation "42007"]

“a” soft flags for observations above the Seasonal/Regional Max Limit

Katrina August 23 to 29, 2005

http://visibleearth.nasa.gov/view_rec.php?id=7938


Present and Future Applications

• QA system is well-suited for a diverse science community employing many different methods to characterize the quality of their data products.
  – Allows everyone from single-data providers to large-scale data providers (i.e., observational and model) to perform automatic quality assurance on their data products
• QA system processes a real-time data stream
  – High quality data product
  – Associated metadata
  – Aggregated quality flags
• Expanding the QA system
  – To address additional input/output data types
  – To enhance the algorithm library by developing additional quality control algorithms
    o simple quality tests (e.g., storm limits, time continuity, internal consistency and others)
    o higher-order algorithms that exploit the relationship between sensors and parameters
  – To explore more sophisticated algorithms that provide the higher level of accuracy offered by specific configurations of sensors
• Explore applying data mining to quality control
• Define interface to the QA system for data providers
  – Use of state-of-the-art visualization
  – Use of analysis tools to monitor the real-time data streams
  – How users analyze data to determine the root causes of problems and editing


Wrap-Up
• Scalable, modular Data Quality Assurance System
  – QA Algorithm Library is extendable to other data parameters by changing ASCII configuration files
  – Reduces level of effort required (cost effective)
  – Easily configurable to work with many different data sets (observations and model output)
  – Readily accommodates algorithm and data product additions and modifications with minimal code changes
• QA was performed at same confidence level (Range Limit and Time Continuity Checks)
• Initial validation testing with NDBC products
  – Limited set of daily air temp (24 hours, station 42007) and wind speed (3 days, station 42001)
  – Air temp (3 hard flags)
  – Wind speed (48 soft flags)