


Problems in multiple linear regression

• Multicollinearity is a statistical phenomenon in which two or more predictor variables are highly correlated.

• In this situation the coefficient estimates of the multiple regression may change erratically in response to small changes in the model or the data, and the regression may not give valid estimates of the individual parameters.

• Variance Inflation Factor (VIF)

INTRODUCTION • Microwave radiometers onboard satellites have been used to measure a wide variety of atmospheric and surface parameters. The Advanced Microwave Sounding Unit-A (AMSU-A) is one of the satellite instruments with the largest impact on reducing forecast errors in data assimilation.

• All data assimilation systems are affected by biases, caused by problems with the data, by approximations in the observation operators used to simulate the data, by limitations of the assimilating model, or by the assimilation methodology itself. A clear symptom of bias in the assimilation is the presence of systematic features in the analysis increments (Dee, 2005).

• The objective of this study is to introduce the AMSU-A radiance pre-processing and quality control modules including bias correction at the KIAPS observation processing system.

KIAPS AMSU-A Processing System

Sihye Lee, Ju-Hye Kim, Jeon-Ho Kang, and Hyoung-Wook Chun Korea Institute of Atmospheric Prediction Systems (KIAPS), Seoul, South Korea

The 19th International TOVS Study Conference, Jeju Island, South Korea, 26 March - 1 April 2014

Development of AMSU-A Pre-processing and Quality Control Modules at KIAPS Observation Processing System

10p.07

SUMMARY

Quality Control and Bias Correction Multicollinearity of Airmass Predictors

• Observation Extraction: AMSU-A level-1d radiance data have been extracted using the ECMWF BUFR decoder.

• Sanity Check: Physical reality checks on geolocation and observation, blacklisting of broken channels, and QC flagging for clear-sky radiance assimilation.

• Background Ingest: Atmospheric variables of the model background have been matched to the observation locations by spatial interpolation.

• First Thinning: Duplicate observations in a defined grid box have been eliminated using the removal scores.

• Observation Operator: The RTTOV_10.2 fast RTM has been implemented to convert the atmospheric variables of the model state to the radiance of the observation state, and to calculate the Jacobian matrices of the model state.

• Initial Quality Control: The pixels contaminated by cloud, precipitation, and sea ice have been removed and the assimilation channels have been selected, considering surface type and topography.

• Bias Correction: Scan and airmass bias correction modules have been developed in two steps based on 30-day innovation statistics.

• Outlier Removal: The expected standard deviation of first guess (FG) departure has been estimated from assigned observation errors to eliminate outliers.

• Final Thinning: Final thinning has been performed considering the assimilation resolution, and the surviving radiance data have been prepared to pass to the KIAPS data assimilation system.

• Monitoring and Statistics: Bias correction coefficients and observation errors have been updated by off-line monitoring codes of statistics and QC scores.
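The thinning steps in the list above can be illustrated with a minimal NumPy sketch. It keeps one observation per latitude/longitude box. The function name `thin_by_grid_box`, the 1-degree box size, and the convention that a lower removal score marks the better observation are assumptions made for illustration; the poster does not specify how the removal scores are defined.

```python
import numpy as np

def thin_by_grid_box(lat, lon, score, box_deg=1.0):
    """Return sorted indices of the observations kept after thinning.

    One observation is kept per lat/lon box: the one with the lowest
    removal score (assumed convention: lower score = better observation).
    """
    lat = np.asarray(lat, dtype=float)
    lon = np.asarray(lon, dtype=float)
    score = np.asarray(score, dtype=float)

    # Integer box indices; longitudes are wrapped into [0, 360).
    i = np.floor((lat + 90.0) / box_deg).astype(int)
    j = np.floor((lon % 360.0) / box_deg).astype(int)
    n_lon = int(np.ceil(360.0 / box_deg))
    box = i * n_lon + j

    # Sort by box first, then by score inside each box
    # (np.lexsort uses the LAST key as the primary key).
    order = np.lexsort((score, box))
    sorted_box = box[order]

    # The first entry of every run of equal box ids has the lowest score.
    first = np.ones(len(order), dtype=bool)
    first[1:] = sorted_box[1:] != sorted_box[:-1]
    return np.sort(order[first])

# Two observations share a box; the one with the lower score survives.
kept = thin_by_grid_box([10.2, 10.7, 45.1], [20.1, 20.9, 100.0], [0.5, 0.1, 0.3])
```

The same routine could serve both the first and the final thinning by changing `box_deg`, although the operational system may use a different gridding.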


QC flags: Scattering index, Cloud liquid water, Sea ice index [Grody et al., 1999, 2001]

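Scattering index:

$$\mathrm{Scatt\_indx} = -113.2 + \left(2.41 - 0.0049\,T_{B(CH1)}\right) T_{B(CH1)} + 0.454\,T_{B(CH2)} - T_{B(CH15)}$$

Sea-ice index, for latitudes beyond 50 degrees of the equator:

$$\mathrm{Seaice\_indx} = 2.85 + 0.20\,T_{B(CH1)} - 0.028\,T_{B(CH3)}$$

Cloud liquid water (CLW):

$$\mathrm{CLW} = \mu \left[ D_0 + 0.754 \ln\left(285.0 - T_{B(CH1)}\right) - 2.265 \ln\left(285.0 - T_{B(CH2)}\right) \right]$$

$$D_0 = 8.240 - (2.622 - 1.846\,\mu)\,\mu$$

$$\mu = \cos(\mathrm{sat\_zenith})$$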

Threshold for initial QC

• Scatt Index > 40

• CLW > 0.2

• Sea-ice index > 50
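As an illustration of the QC flags and thresholds above, the following NumPy sketch evaluates the scattering index, CLW, and sea-ice index from the channel 1, 2, 3, and 15 brightness temperatures. The function name `amsua_initial_qc` and its argument names are hypothetical. Two further assumptions go beyond the poster: pixels where the CLW logarithm is undefined (TB of 285 K or more) are rejected, and surface-type handling is omitted.

```python
import numpy as np

def amsua_initial_qc(tb1, tb2, tb3, tb15, sat_zenith_deg, lat_deg):
    """Evaluate the three QC indices and a combined rejection mask.

    tb1, tb2, tb3, tb15: brightness temperatures (K) of AMSU-A channels
    1, 2, 3 and 15. Angles and latitudes are in degrees.
    """
    tb1, tb2, tb3, tb15 = (np.asarray(t, dtype=float) for t in (tb1, tb2, tb3, tb15))
    lat = np.asarray(lat_deg, dtype=float)
    mu = np.cos(np.deg2rad(np.asarray(sat_zenith_deg, dtype=float)))

    scatt = -113.2 + (2.41 - 0.0049 * tb1) * tb1 + 0.454 * tb2 - tb15
    seaice = 2.85 + 0.20 * tb1 - 0.028 * tb3

    d0 = 8.240 - (2.622 - 1.846 * mu) * mu
    with np.errstate(invalid="ignore", divide="ignore"):
        clw = mu * (d0 + 0.754 * np.log(285.0 - tb1) - 2.265 * np.log(285.0 - tb2))

    reject = (
        (scatt > 40.0)
        | (clw > 0.2)
        | ~np.isfinite(clw)                      # assumption: undefined CLW is rejected
        | ((np.abs(lat) > 50.0) & (seaice > 50.0))  # sea-ice test only beyond 50 degrees
    )
    return {"scatt_index": scatt, "clw": clw, "seaice_index": seaice, "reject": reject}

# Two nadir pixels at the equator; the second has a much colder channel 15.
qc = amsua_initial_qc(
    tb1=[150.0, 150.0], tb2=[140.0, 140.0], tb3=[220.0, 220.0],
    tb15=[200.0, 150.0], sat_zenith_deg=[0.0, 0.0], lat_deg=[0.0, 0.0],
)
```

In this example the second pixel exceeds the scattering-index threshold of 40 and is rejected, while the first passes all three tests.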

FG departures after sanity check: $y^o - H(M(x^b))$

FG departures after sanity check and duplicate removal: $y^o - H(M(x^b))$

FG departures after sanity check, duplicate removal and initial QC: $y^o - H(M(x^b))$

Bias corrected FG departures: $y^o - H(M(x^b)) - b$

Bias corrected FG departures after outlier removal: $y^o - H(M(x^b)) - b$

Bias corrected FG departures after outlier removal and final thinning: $y^o - H(M(x^b)) - b$

(All panels: 2012/11/02 00 UTC)

Global distributions for quality control and bias correction

Thick850-300 Thick200-50 Thick50-5 Thick10-1

1.04 1.04 10.42 1.24

Spatial distributions of airmass predictors

Remedies for multicollinearity of airmass predictors

• Selection of different airmass predictors or one predictor

• Ridge regression or principal component regression (PCR) with 4 atmospheric thickness predictors

$$\mathrm{VIF}_j = \frac{1}{1 - R_j^2}$$
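The VIF can be computed directly from its definition, where $R_j^2$ is the coefficient of determination obtained by regressing predictor $j$ on all the other predictors. The sketch below is a minimal NumPy version; the function name `variance_inflation_factors` and the synthetic predictors are illustrative assumptions and are not the poster's thickness data.

```python
import numpy as np

def variance_inflation_factors(X):
    """Return VIF_j = 1 / (1 - R_j^2) for each column j of X (n_samples, n_predictors)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    vifs = np.empty(p)
    for j in range(p):
        y = X[:, j]
        # Regress predictor j on the remaining predictors plus an intercept.
        A = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
        vifs[j] = 1.0 / (1.0 - r2)
    return vifs

# Synthetic example: z3 is nearly a copy of z1, z2 is independent of both.
rng = np.random.default_rng(0)
z1 = rng.normal(size=500)
z2 = rng.normal(size=500)
z3 = z1 + 0.1 * rng.normal(size=500)
vif = variance_inflation_factors(np.column_stack([z1, z2, z3]))
```

Here the VIFs of `z1` and `z3` come out large while that of `z2` stays close to 1. A VIF above roughly 10 is a commonly used rule of thumb for problematic collinearity.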

• We have developed the AMSU-A data pre-processing and quality control system to provide quality-controlled radiance data for the KIAPS data assimilation system.

• It appears to be successful in controlling the scan and airmass bias in the crucial channels which sound tropospheric and stratospheric temperature below 50 km altitude.

• However, multicollinearity arises when the 4 thickness predictors are highly correlated among themselves. We have tried to find a small set of linear combinations of the covariates that are uncorrelated with each other.

As a result, the multicollinearity of the predictors is resolved by PCR with 4 PCs, although the bias correction performance at the lower stratospheric channels does not improve much.

• Observed TB (0.35° x 0.23°)

In channel 5, the monthly mean of observed TB for November 2012 is high at low latitudes and decreases toward high latitudes. The variability over land (i.e., standard deviation) is more than about 4.5 K.

• Background TB (0.35° x 0.23°)

The monthly mean of the background (Unified Model output: e.g., qwqu00.pp_006) is similar to the observed TB, but the variation of the background TB over land is smaller than that of the observations.

• O-B (innovation)

Both the monthly mean and the standard deviation of the innovation are high over land, especially over high topography such as the Andes mountains and over desert areas.

Bias correction

(1) Step 1: adjust the mean innovation at each scan angle to equal the mean innovation at the center scan angle

$$b^{scan}_{j,s}(\theta) = \overline{(O - B)}_{j,s}(\theta) - \overline{(O - B)}_{j,s}(\theta = 0)$$

(2) Step 2: global multiple linear regression of the scan-corrected innovations against 4 predictors (thickness 850-300, 200-50, 50-5, 10-1 hPa) to correct the airmass bias

$$b^{air}_{j,s} = a_{j,s}\,(Z_{850} - \overline{Z}_{850}) + b_{j,s}\,(Z_{200} - \overline{Z}_{200}) + c_{j,s}\,(Z_{50} - \overline{Z}_{50}) + d_{j,s}\,(Z_{10} - \overline{Z}_{10}) + e_{j,s}$$

b^air : airmass bias; j : channel; s : satellite; a, b, c, d, e : airmass coefficients

Z850 : Thickness 850−300; Z200 : Thickness 200−50; Z50 : Thickness 50−5; Z10 : Thickness 10−1
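The two bias correction steps can be sketched in NumPy as follows, for one channel of one satellite. The function names, the dictionary used as a scan-bias table, and the synthetic inputs are assumptions made for illustration. The sketch takes the overbarred thickness to be the sample mean of each predictor, which the poster does not state explicitly.

```python
import numpy as np

def scan_bias_table(innov, scan_pos, center_pos):
    """Step 1: mean O-B at each scan position minus mean O-B at the center position."""
    innov = np.asarray(innov, dtype=float)
    scan_pos = np.asarray(scan_pos)
    means = {int(p): innov[scan_pos == p].mean() for p in np.unique(scan_pos)}
    return {p: m - means[center_pos] for p, m in means.items()}

def fit_airmass_coefficients(innov_scan_corrected, thickness):
    """Step 2: least-squares fit of scan-corrected O-B on centered thickness predictors.

    thickness has shape (n_obs, 4): Z850, Z200, Z50, Z10.
    Returns (coef, z_mean) with coef = [a, b, c, d, e].
    """
    Z = np.asarray(thickness, dtype=float)
    z_mean = Z.mean(axis=0)  # assumed meaning of the overbar
    A = np.column_stack([Z - z_mean, np.ones(len(Z))])
    coef, *_ = np.linalg.lstsq(A, np.asarray(innov_scan_corrected, dtype=float), rcond=None)
    return coef, z_mean

def airmass_bias(thickness, coef, z_mean):
    """Evaluate b_air for new thickness values with previously fitted coefficients."""
    return (np.asarray(thickness, dtype=float) - z_mean) @ coef[:-1] + coef[-1]

# Step 1 on a toy sample with three scan positions, position 1 taken as center.
table = scan_bias_table([1.0, 1.2, 0.5, 0.7, 0.1, 0.3], [0, 0, 1, 1, 2, 2], center_pos=1)

# Step 2 on synthetic predictors with known coefficients.
rng = np.random.default_rng(1)
Z = rng.normal(size=(200, 4))
true_coef = np.array([0.3, -0.2, 0.1, 0.05, 0.7])
d = (Z - Z.mean(axis=0)) @ true_coef[:4] + true_coef[4]
coef, z_mean = fit_airmass_coefficients(d, Z)
b_air = airmass_bias(Z, coef, z_mean)
```

With noise-free synthetic innovations the fit recovers `true_coef`. With real innovations and highly correlated thickness predictors, the individual coefficients can become unstable even when the fitted bias looks reasonable.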

[Figure: observed TB (K) and its standard deviation (K) versus scan position for AMSU-A channels 4 to 14, and O - B (K) versus scan position for AMSU-A ch05 on NOAA-15, NOAA-18, NOAA-19 and MetOp-A.]

b^scan : scan bias; j : channel; s : satellite; θ : scan angle


Similar patterns for observed and background TBs

Steps of PCR to calculate new airmass bias coefficients (monthly dataset, November 2012):

(1) Predictors (X): centered thicknesses, $X = Z - \overline{Z}$

(2) Covariance matrix (S) of predictors (X)

(3) Eigenvalues (D) and eigenvectors (V) of S

(4) PC score matrix: $T = X V$

(5) PC regression of the new data set: multiple linear regression on the PC scores
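The PCR procedure can be sketched in NumPy as follows. The function name `pcr_fit`, the `n_components` option, and the synthetic predictors are assumptions for illustration and are not taken from the KIAPS code. Predictors are centered before the covariance matrix is formed.

```python
import numpy as np

def pcr_fit(Z, y, n_components=None):
    """Principal component regression of y on the columns of Z.

    Returns (beta, intercept, z_mean, D, T) where beta holds the coefficients
    mapped back to the original predictors, D the eigenvalues of the covariance
    matrix in descending order, and T the PC score matrix.
    """
    Z = np.asarray(Z, dtype=float)
    y = np.asarray(y, dtype=float)
    z_mean = Z.mean(axis=0)
    X = Z - z_mean                          # centered predictors

    S = np.cov(X, rowvar=False)             # covariance matrix of predictors
    D, V = np.linalg.eigh(S)                # eigenvalues/eigenvectors of symmetric S
    order = np.argsort(D)[::-1]             # sort by explained variance, largest first
    D, V = D[order], V[:, order]

    k = Z.shape[1] if n_components is None else n_components
    V = V[:, :k]
    T = X @ V                               # PC score matrix, mutually uncorrelated columns

    A = np.column_stack([T, np.ones(len(T))])
    gamma, *_ = np.linalg.lstsq(A, y, rcond=None)
    beta = V @ gamma[:-1]                   # back to the original predictor space
    return beta, gamma[-1], z_mean, D, T

# Four mutually correlated synthetic predictors sharing one common signal.
rng = np.random.default_rng(2)
base = rng.normal(size=(300, 1))
Z = base + 0.3 * rng.normal(size=(300, 4))
true_beta = np.array([0.4, -0.1, 0.2, 0.05])
y = Z @ true_beta + 1.5
beta, intercept, z_mean, D, T = pcr_fit(Z, y)
```

When all 4 PCs are retained, PCR gives the same fitted values as ordinary multiple linear regression on the original predictors; the fit changes only when low-variance components are dropped through `n_components`.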

Experiments to remedy multicollinearity

① Multiple linear regression with 4 predictors

② Linear regression with 1 predictor for each channel

③ Principal component regression with 4 PCs

[Figure: three panels each for CH05 and CH08, 2012/11/02 00 UTC.]

Extracted AMSU-A radiance data (TB) monitoring