Automated QA/QC Technique for Climate Sensor Data EPSCoR Hawaii HGDR Scientific Data Management Portal Development Team

Automated QA/QC Technique for Climate Sensor Data

EPSCoR Hawaii HGDR Scientific Data Management Portal Development Team

TOC

• QA/QC Requirements
• Detecting Outliers
  – Types of Outliers
  – Detection Methods
  – Statistical Correlation Functions
  – QuaT Correlational Method
• Data Mining for further automation

QA/QC Requirements

• Detect abnormal data and outliers
• Correct abnormal data and outliers where possible
• Find additional properties/correlations among variables
  – To catch changes over time

Detecting Outliers

• Types of Outliers
  – Correctable Outliers
    • Caused by calibration, sensor cleaning, low battery voltage, erroneous sensor installation, etc. Outliers caused by these factors can be corrected
  – Error Values
    • Missing or impossible values caused by sensor failure: physical damage, irreversible factor effects
    • This type of outlier cannot be corrected and must be discarded

Detecting Outliers

• Detection Methods
  1. Normal value range check (single variable)
  2. Diurnal pattern check (single variable)
  3. Correlational pattern check (multiple variables)
  4. Additional methods can be found by data mining

Normal value range check

For example, a humidity reading over 100% does not make sense. Regional and seasonal factors also need to be considered.
• Knowledge Required
  – Known/valid normal value ranges for all variables
  – Subsets of normal value ranges for all variables in different regions or seasons
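A minimal sketch of the range check, assuming the valid ranges are kept in a lookup table with optional seasonal overrides. The variable names, the numeric ranges, and the `range_check` function are hypothetical placeholders for illustration, not values used by the portal.

```python
import math

# Hypothetical normal ranges; real values must come from domain knowledge.
# Key: (variable, season). A season of None is the default range.
NORMAL_RANGES = {
    ("relative_humidity", None): (0.0, 100.0),
    ("air_temperature", None): (-5.0, 40.0),
    ("air_temperature", "winter"): (-5.0, 30.0),
}


def range_check(variable, value, season=None):
    """Classify one reading as 'missing', 'ok', or 'out_of_range'."""
    if value is None or math.isnan(value):
        return "missing"
    default_range = NORMAL_RANGES[(variable, None)]
    # Use the seasonal subset when one is defined, else the default range.
    low, high = NORMAL_RANGES.get((variable, season), default_range)
    return "ok" if low <= value <= high else "out_of_range"


print(range_check("relative_humidity", 104.2))
print(range_check("air_temperature", 35.0, season="winter"))
```

Regional subsets could be handled the same way by extending the key, for example to `(variable, region, season)`.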

Diurnal pattern check

For example, radiation should be high during the day and low at night.
• Knowledge Required
  – Known/valid diurnal pattern
  – Different diurnal patterns for all variables in different regions or seasons
• Challenge
  – How to slice time
  – What value ranges are considered high, average, or low for each variable? Simply take the standard deviation?
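One possible answer to the two challenges, sketched under stated assumptions: time is sliced by hour of day, and a reading counts as normal when it lies within `k` standard deviations of the reference mean for that hour. The hourly slicing, the default `k=3.0`, and both function names are illustrative choices, not something the slides prescribe.

```python
from collections import defaultdict
from statistics import mean, pstdev


def build_hourly_profile(reference):
    """Build {hour: (mean, std)} from trusted (hour, value) reference readings."""
    by_hour = defaultdict(list)
    for hour, value in reference:
        by_hour[hour].append(value)
    return {h: (mean(v), pstdev(v)) for h, v in by_hour.items()}


def diurnal_check(profile, hour, value, k=3.0):
    """Return True when the value fits the expected pattern for that hour."""
    expected, spread = profile[hour]
    return abs(value - expected) <= k * spread


# Hypothetical solar radiation readings (W/m^2) at midnight and at noon.
reference = [(0, 0.0), (0, 0.0), (0, 1.0),
             (12, 800.0), (12, 900.0), (12, 850.0)]
profile = build_hourly_profile(reference)

print(diurnal_check(profile, 12, 870.0))  # noon-like value at noon
print(diurnal_check(profile, 0, 850.0))   # noon-like value at midnight
```

A separate profile per region or season would cover the case where the diurnal pattern itself differs.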

Correlational pattern check

For example, radiation and temperature should be correlated.
• Knowledge Required
  – Known correlation between the variables
• How can we verify the correlations?
  – Correlation functions from statistics will be useful
  – Also, a method called QuaT might be useful to analyze the similarity of the trends of two variables along the timeline

Additional Analyses

4. Additional methods from data mining might be helpful
  – Finding additional correlations
  – Value range change over time (global climate change)

Statistical Correlation Functions

• Pearson’s Product Moment
• Spearman’s Rank Correlation
• Kendall’s Rank Correlation

Pearson’s Product Moment

• Pearson’s only works for parametric datasets
  – A dataset needs to be tested for normality before it can be analyzed
  – Normality test: Shapiro-Wilk normality test
• If a dataset is determined to be non-parametric, use either, or both, of Spearman’s and Kendall’s instead
  – Also, outliers decrease the precision of Pearson’s
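To illustrate the point about outliers, here is a small pure-Python sketch of Pearson's coefficient applied to a clean series and to the same series with one bad reading. The sample values are made up. The normality test is not implemented here; in practice a library routine such as `scipy.stats.shapiro` could be run first.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


# Hypothetical paired readings: radiation (W/m^2) and air temperature (deg C).
radiation = [100, 200, 300, 400, 500]
temperature = [20, 22, 24, 26, 28]
print(pearson_r(radiation, temperature))   # close to +1 for a linear relation

# One sensor error value is enough to distort the coefficient.
with_outlier = [20, 22, 24, 26, -99]
print(pearson_r(radiation, with_outlier))  # turns negative
```

This is one reason to run the range check before the correlational check: error values should be removed before correlations are computed.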

Spearman’s & Kendall’s Correlation

• If a dataset is not parametric, these correlation functions can be used
• Both operate on ranks, so the values must be ranked first
• Spearman’s – based on the differences between the ranks that paired observations receive in the two variables
• Kendall’s – based on the proportion of concordant versus discordant pairs of observations across the two variables
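Both rank correlations can be sketched in pure Python as follows. Spearman's is computed here as Pearson's coefficient on average ranks, and Kendall's as the simple tau-a variant with no tie correction. The function names and sample data are illustrative assumptions.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def spearman_rho(x, y):
    """Spearman's rho: Pearson's r applied to the ranks of x and y."""
    return pearson_r(average_ranks(x), average_ranks(y))


def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) / total number of pairs."""
    n = len(x)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            sign = (x[i] - x[j]) * (y[i] - y[j])
            if sign > 0:
                concordant += 1
            elif sign < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]  # two neighbouring pairs are swapped
print(spearman_rho(x, y), kendall_tau(x, y))
```

Because both functions look only at ordering, a monotonic but non-linear relationship (for example `y = x**2` on positive values) still scores 1.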

QuaT

• An algorithm to determine the similarity of two trend curves

• Introduced by Okabe A. & Masuyama A. of Tokyo University

• “A robust exploratory method for qualitative trend curve analysis”

• http://www.csis.u-tokyo.ac.jp/dp/8.pdf

QuaT - Basic steps of the algorithm

1. Find the peaks and bottoms of the curves being compared
2. Calculate the height of each peak
3. Determine the distinct height (a threshold height) and extract the peaks whose height is greater than or equal to it. In other words, ignore less distinct peaks
4. Compare the extracted peaks and determine whether the two variables’ curves have peaks that occur at the same times and in the same order of magnitude
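The four steps can be sketched roughly as below. This is a simplified illustration of the steps as listed on the slide, not the algorithm from the Okabe and Masuyama paper. In particular, the definition of peak height (peak value minus the higher of the two neighbouring bottoms), the `tolerance` parameter, and all function names are assumptions made for the example.

```python
def local_peaks(values):
    """Step 1: indices of strict local maxima (flat plateaus are not detected)."""
    return [i for i in range(1, len(values) - 1)
            if values[i - 1] < values[i] > values[i + 1]]


def peak_heights(values, peaks):
    """Step 2: height of each peak above the higher of its neighbouring bottoms."""
    bounds = [0] + peaks + [len(values) - 1]
    heights = {}
    for k, p in enumerate(peaks, start=1):
        left_bottom = min(values[bounds[k - 1]:p + 1])
        right_bottom = min(values[p:bounds[k + 1] + 1])
        heights[p] = values[p] - max(left_bottom, right_bottom)
    return heights


def distinct_peaks(values, distinct_height):
    """Step 3: keep only peaks at least as high as the distinct height."""
    peaks = local_peaks(values)
    heights = peak_heights(values, peaks)
    return [p for p in peaks if heights[p] >= distinct_height]


def similar_trend(a, b, height_a, height_b, tolerance=0):
    """Step 4: same peak times (within tolerance) and same magnitude order."""
    pa, pb = distinct_peaks(a, height_a), distinct_peaks(b, height_b)
    if len(pa) != len(pb):
        return False
    same_times = all(abs(i - j) <= tolerance for i, j in zip(pa, pb))
    order_a = sorted(range(len(pa)), key=lambda k: a[pa[k]])
    order_b = sorted(range(len(pb)), key=lambda k: b[pb[k]])
    return same_times and order_a == order_b


# Hypothetical series: the small bump at index 5 is ignored as not distinct.
radiation = [0, 2, 0, 5, 0, 1, 0.8, 3, 0]
temperature = [20, 22, 20, 26, 20, 21, 20.9, 24, 20]
print(similar_trend(radiation, temperature, height_a=1, height_b=1))
```

The comparison is qualitative: the two series have different units and scales, yet they match because their distinct peaks fall at the same positions and rank in the same order.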

Basic Relationship among and between the Variables

• Radiation (short, long, net, PAR)
• Rainfall (humidity, soil moisture)
• Temperature (air, surface, body)
• Wind (speed, direction)

Affecting            Relationship   Affected               Specific Variable
Radiation Category   direct         Temperature Category
Radiation Category   affect         Wind Category
Radiation Category   inverse        Rainfall Category      Soil Moisture
Rainfall Category    inverse        Radiation Category
Rainfall Category    inverse        Temperature Category
Rainfall Category    affect         Wind Category
Wind Category        inverse        Temperature Category
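One way these relationships could drive an automated check is to encode them as a lookup table and compare the sign of an observed correlation coefficient against the expected relationship. In this sketch, "direct" is read as positive correlation, "inverse" as negative, and "affect" as a dependence of unspecified sign. That reading, the `min_strength` threshold, and the names below are assumptions, not part of the slides.

```python
# Expected relationship between variable categories, taken from the table above.
EXPECTED = {
    ("radiation", "temperature"): "direct",
    ("radiation", "wind"): "affect",
    ("radiation", "rainfall"): "inverse",
    ("rainfall", "radiation"): "inverse",
    ("rainfall", "temperature"): "inverse",
    ("rainfall", "wind"): "affect",
    ("wind", "temperature"): "inverse",
}


def is_consistent(affecting, affected, r, min_strength=0.3):
    """Compare an observed correlation r with the expected relationship.

    Returns None when the table has no entry for the pair.
    min_strength is a hypothetical cut-off for a meaningful correlation.
    """
    relation = EXPECTED.get((affecting, affected))
    if relation is None:
        return None
    if relation == "direct":
        return r >= min_strength
    if relation == "inverse":
        return r <= -min_strength
    return abs(r) >= min_strength  # "affect": sign not specified


print(is_consistent("radiation", "temperature", 0.82))
print(is_consistent("rainfall", "temperature", 0.40))
```

A pair flagged as inconsistent would not necessarily be an error; it marks a time window worth inspecting for sensor problems.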