Automated QA/QC Technique for Climate Sensor Data
EPSCoR Hawaii HGDR Scientific Data Management Portal Development Team
TOC
• QA/QC Requirements
• Detecting Outliers
  – Types of Outliers
  – Detection Methods
  – Statistical Correlation Functions
  – QuaT Correlational Method
• Data Mining for further automation
QA/QC Requirements
• Detect abnormal data and outliers
• Correct abnormal data and outliers where possible
• Find additional properties and correlations among variables
  – To catch changes over time
Detecting Outliers
• Types of Outliers
  – Correctable Outliers
    • Caused by calibration, sensor cleaning, low battery voltage, erroneous sensor installation, etc. Outliers caused by these factors can be corrected
  – Error Values
    • Missing or impossible values caused by sensor failure: physical damage or other irreversible effects
    • This type of outlier cannot be corrected and must be discarded
Detecting Outliers
• Detection Methods
  1. Normal value range check (single variable)
  2. Diurnal pattern check (single variable)
  3. Correlational pattern check (multiple variables)
  4. Additional methods can be found by data mining
Normal value range check
For example, a humidity reading over 100% does not make sense. Regional and seasonal factors also need to be considered.
• Knowledge Required
  – Known/valid normal value ranges for all variables
  – Subsets of those ranges for all variables in different regions or seasons
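The range check above could be sketched as follows. This is a minimal illustration, not the portal's implementation: the variable names, the `RANGES` table, its numeric limits, and the `season` key are all hypothetical placeholders for the "knowledge required" listed above.

```python
# Hypothetical valid ranges, keyed by (variable, season).
# A season of None holds the general range used as a fallback.
RANGES = {
    ("relative_humidity_pct", None): (0.0, 100.0),
    ("air_temperature_c", None): (-10.0, 45.0),
    ("air_temperature_c", "winter"): (-10.0, 30.0),
}


def valid_range(variable, season=None):
    """Return the (low, high) range for a variable, preferring a seasonal subset."""
    return RANGES.get((variable, season), RANGES[(variable, None)])


def range_check(variable, values, season=None):
    """Return the indices of missing or out-of-range readings."""
    low, high = valid_range(variable, season)
    return [
        i for i, v in enumerate(values)
        if v is None or not (low <= v <= high)
    ]


# Index 1 is above 100% and index 2 is missing, so both are flagged.
humidity_flags = range_check("relative_humidity_pct", [55.0, 101.2, None, 99.9])
```

Keeping the ranges in a lookup table rather than in code makes it easy to add regional or seasonal subsets without changing the check itself.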
Diurnal pattern check
For example, radiation should be high during the day and low at night.
• Knowledge Required
  – Known/valid diurnal pattern
  – Different diurnal patterns for all variables in different regions or seasons
• Challenge
  – How to slice time
  – What value ranges are considered high, average, or low for each variable; is it enough to simply use the standard deviation?
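One possible answer to the two challenges above is sketched here. It assumes hourly time slices and a "mean plus or minus k standard deviations" band per hour; both choices, along with the names `hourly_profile`, `diurnal_check`, `k`, and `min_sd`, are illustrative assumptions rather than a settled design.

```python
from collections import defaultdict
from statistics import mean, stdev


def hourly_profile(history):
    """Build {hour: (mean, stdev)} from historical (hour_of_day, value) pairs."""
    by_hour = defaultdict(list)
    for hour, value in history:
        by_hour[hour].append(value)
    # stdev needs at least two samples, so sparser hours are skipped.
    return {h: (mean(v), stdev(v)) for h, v in by_hour.items() if len(v) >= 2}


def diurnal_check(readings, profile, k=3.0, min_sd=1e-9):
    """Return indices of readings more than k standard deviations from the hourly mean."""
    flagged = []
    for i, (hour, value) in enumerate(readings):
        if hour not in profile:
            continue  # no known pattern for this hour, so nothing to compare against
        m, s = profile[hour]
        # min_sd keeps the band from collapsing to zero width when history is constant.
        if abs(value - m) > k * max(s, min_sd):
            flagged.append(i)
    return flagged


# Made-up radiation history: zero at midnight, around 800 at noon.
history = [(0, 0.0), (0, 0.0), (0, 0.0), (12, 800.0), (12, 820.0), (12, 780.0)]
profile = hourly_profile(history)
flags = diurnal_check([(0, 0.0), (0, 500.0), (12, 810.0), (12, 100.0)], profile)
```

Separate profiles per region or season would be built the same way from the matching subset of the history.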
Correlational pattern check
For example, radiation and temperature should be correlated.
• Knowledge Required
  – Known correlations between the variables
• How can we verify the correlations?
  – Correlation functions from statistics will be useful
  – A method called QuaT might also be useful for analyzing the similarity of the trends of two variables along the timeline
Additional Analyses
4. Additional methods from data mining might be helpful
  – Finding additional correlations
  – Value range change over time (global climate change)
Statistical Functions
• Pearson’s Product Moment
• Spearman’s Rank Correlation
• Kendall’s Rank Correlation
Pearson’s Product Moment
• Pearson’s only works for parametric datasets
  – A dataset needs to be tested for normality before it can be analyzed
  – Normality test: Shapiro-Wilk normality test
• If a dataset is determined to be non-parametric, either or both of Spearman’s and Kendall’s can be used instead
  – Also, outliers decrease the precision of Pearson’s
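The sensitivity to outliers can be seen in a small sketch. The function below is a plain textbook computation of Pearson's coefficient. The sample readings are made up, and the normality test is not shown (in practice a routine such as SciPy's `scipy.stats.shapiro` would be run first).

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson's product-moment correlation coefficient of two equal-length series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two series of equal length with at least 2 values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / sqrt(sxx * syy)


# Made-up readings: temperature rises linearly with radiation.
radiation = [100.0, 200.0, 300.0, 400.0, 500.0]
temperature = [20.0, 22.0, 24.0, 26.0, 28.0]
r_clean = pearson_r(radiation, temperature)

# The same series with one erroneous temperature value (a sensor glitch).
temperature_with_outlier = [20.0, 22.0, 24.0, 26.0, -40.0]
r_outlier = pearson_r(radiation, temperature_with_outlier)
```

With the clean series the coefficient is essentially 1, while a single bad reading is enough to turn it negative, which is why outliers should be handled before Pearson's is trusted.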
Spearman’s & Kendall Correlation
• If a dataset is not parametric, these correlation functions can be used
• Both require the values to be ranked first
• Spearman’s – compares the ranks that paired values from the two variables receive, using the differences between those ranks
• Kendall’s – counts the pairs of observations that the two variables order the same way (concordant) against those they order differently (discordant)
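Both rank correlations can be sketched in a few lines. This is an illustrative version only: Spearman's is computed as Pearson's on average ranks, and Kendall's is the simple tau-a variant, which does not adjust for ties. The helper names are hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def _pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / sqrt(sxx * syy)


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson's coefficient applied to the ranks."""
    return _pearson(average_ranks(x), average_ranks(y))


def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / total number of pairs."""
    n = len(x)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            s = (x[i] - x[j]) * (y[i] - y[j])
            if s > 0:
                concordant += 1
            elif s < 0:
                discordant += 1  # pairs tied in x or y count as neither
    return (concordant - discordant) / (n * (n - 1) / 2)


# A monotonic but non-linear relationship still gives rho = tau = 1.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
rho, tau = spearman_rho(x, y), kendall_tau_a(x, y)
```

Because only the ordering matters, neither measure requires the data to be normally distributed, and a single extreme value has limited influence.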
QuaT
• An algorithm to determine the similarity of the two trend curves
• Introduced by Okabe A. & Masuyama A. of Tokyo University
• “A robust exploratory method for qualitative trend curve analysis”
• http://www.csis.u-tokyo.ac.jp/dp/8.pdf
QuaT - Basic steps of the algorithm
1. Find the peaks and bottoms of the curves being compared
2. Calculate the height of each peak
3. Determine the distinct height (a threshold height) and extract the peaks whose height is greater than or equal to it; in other words, ignore less distinct peaks
4. Compare the extracted peaks and determine whether the peaks of the two variables’ curves occur at the same times and with the same magnitude (order) for both variables
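The four steps can be illustrated with a simplified sketch. This is not the algorithm as published by Okabe and Masuyama; it only follows the outline above. In particular, the definition of peak height used here (the rise above the higher of the two neighbouring bottoms, with the series ends treated as bottoms) and the per-series thresholds are assumptions made for the example.

```python
def local_extrema(values):
    """Step 1: indices of interior local peaks and bottoms."""
    peaks, bottoms = [], []
    for i in range(1, len(values) - 1):
        if values[i - 1] < values[i] > values[i + 1]:
            peaks.append(i)
        elif values[i - 1] > values[i] < values[i + 1]:
            bottoms.append(i)
    return peaks, bottoms


def peak_heights(values, peaks, bottoms):
    """Step 2 (assumed definition): rise of each peak above its higher neighbouring bottom."""
    anchors = [0] + bottoms + [len(values) - 1]  # series ends act as bottoms
    heights = {}
    for p in peaks:
        left = max(b for b in anchors if b < p)
        right = min(b for b in anchors if b > p)
        heights[p] = values[p] - max(values[left], values[right])
    return heights


def distinct_peaks(values, threshold):
    """Step 3: keep only peaks at least as high as the distinct (threshold) height."""
    peaks, bottoms = local_extrema(values)
    heights = peak_heights(values, peaks, bottoms)
    return [p for p in peaks if heights[p] >= threshold]


def similar_trend(a, b, threshold_a, threshold_b):
    """Step 4: same peak times and same ordering of the peaks by magnitude."""
    peaks_a = distinct_peaks(a, threshold_a)
    peaks_b = distinct_peaks(b, threshold_b)
    if peaks_a != peaks_b:
        return False
    order_a = sorted(peaks_a, key=lambda i: a[i])
    order_b = sorted(peaks_b, key=lambda i: b[i])
    return order_a == order_b


# Made-up series sampled at the same times.
radiation = [0, 5, 1, 9, 2, 3, 2.5, 0]
temperature = [10, 14, 11, 18, 12, 12.5, 12.2, 10]
wind = [3, 2, 6, 2, 1, 7, 1, 3]

rad_vs_temp = similar_trend(radiation, temperature, 2, 2)
rad_vs_wind = similar_trend(radiation, wind, 2, 2)
```

Because only the timing and ordering of distinct peaks are compared, small fluctuations and differences in units do not affect the result.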
Basic Relationship among and between the Variables
• Radiation (short, long, net, PAR)
• Rainfall (humidity, soil moisture)
• Temperature (air, surface, body)
• Wind (speed, direction)
Affecting | Relationship | Affected | Specific Variable
Radiation Category | direct | Temperature Category |
Radiation Category | affect | Wind Category |
Radiation Category | inverse | Rainfall Category | Soil Moisture
Rainfall Category | inverse | Radiation Category |
Rainfall Category | inverse | Temperature Category |
Rainfall Category | affect | Wind Category |
Wind Category | inverse | Temperature Category |
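As an illustration of how these relationships might feed the correlational pattern check, the sketch below encodes them as a lookup and compares an observed correlation coefficient against the expected direction. The `min_abs` cutoff, the lower-case category names, and the treatment of "affect" as "correlated in either direction" are assumptions for the example.

```python
# Expected relationship for each (affecting, affected) category pair.
EXPECTED = {
    ("radiation", "temperature"): "direct",
    ("radiation", "wind"): "affect",
    ("radiation", "rainfall"): "inverse",
    ("rainfall", "radiation"): "inverse",
    ("rainfall", "temperature"): "inverse",
    ("rainfall", "wind"): "affect",
    ("wind", "temperature"): "inverse",
}


def consistent(affecting, affected, r, min_abs=0.3):
    """Check an observed correlation r against the expected relationship.

    Returns None when no relationship is recorded for the pair.
    min_abs is a hypothetical cutoff for a meaningful correlation.
    """
    relationship = EXPECTED.get((affecting, affected))
    if relationship is None:
        return None
    if relationship == "direct":
        return r >= min_abs
    if relationship == "inverse":
        return r <= -min_abs
    # "affect": a relationship is expected but its direction is not specified.
    return abs(r) >= min_abs


# A positive correlation between rainfall and temperature contradicts the
# recorded inverse relationship, so the data would be flagged for review.
rain_temp_ok = consistent("rainfall", "temperature", 0.5)
```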