Download pptx - ECMWF Observation Influence Training Course 2010 slide 1 Influence matrix diagnostic to monitor the assimilation system Carla Cardinali

ECMWFObservation Influence Training Course 2010 slide 1

Influence matrix diagnostic to monitor the assimilation

system

Carla Cardinali


Monitoring Assimilation System

ECMWF 4D-Var system handles a large variety of space and surface-

based observations. It combines observations and atmospheric state a

priori information by using a linearized and non-linear forecast model

Effective monitoring of a such complex system with 108 degree of

freedom and 107 observations is a necessity. No just few indicators but a

more complex set of measures to answer questions like

How much influent are the observations in the analysis?How much influence is given to the a priori information?How much does the estimate depend on one single influential obs?


Influence Matrix: Introduction

Unusual or influential data points are not necessarily bad observations

but they may contain some of most interesting sample information

In Ordinary Least-Square the information is quantitatively available in

the Influence Matrix

Tuckey 63, Hoaglin and Welsch 78, Velleman and Welsch 81

Diagnostic methods are available for monitoring multiple regression

analysis to provide protection against distortion by anomalous data

y = Sy


Influence Matrix in OLS

The OLS regression model is

y = Xβ + ε

y = Sy

T -1 Tβ (X X) X y

Y (mx1) observation vector

X (mxq) predictors matrix, full rank q

β (qx1) unknown parameters

(mx1) error2( ) 0, ( )E Var ε ε I

1T TS X(X X) X

The fitted response is

OLS provide the solution

m>q


Influence Matrix Properties

y = Sy

S (mxm) symmetric, idempotent and positive definite matrix

It is seen

The diagonal element satisfy 0 1iiS

ˆ

y

Sy

( )Tr qS

ˆiij

j

yS

y

Cross-Sensitivity

ˆiii

i

yS

y

Self-Sensitivity

Average Self-Sensitivity=q/m


Influence Matrix Related Findings

y = Sy

The change in the estimate that occur when the i-th is deleted

iii

iii

iiiii

yyr

rS

Syy

ˆ

1ˆˆ )(

CV score can be computed by relying on the all data estimate ŷ and Sii

m

i ii

iim

i

iii S

yyyy

1

2

21

2)(

)1(

)ˆˆ()ˆˆ(


Outline

Conclusion

Observation and background Influence

Generalized Least Square method

Findings related to data influence and information content

Toy model: 2 observations


Hxb

y

HK

I-HK

Hxb

y

HKI-HK

Solution in the Observation Space

The analysis projected at the observation location

ˆ ( )a b y Hx HKy I HK Hx

The estimation ŷ is a weighted mean

Hxb y

HKI-HK

1 1 1 1T T K = (B + H R H) H R

B(qxq)=Var(xb)R(pxp)=Var(y)

K(qxp) gain matrixH(pxq) Jacobian matrix

( )a q b x Ky I KH x


Influence Matrix

ˆ( )T T T Observation Influence

y

S HK K Hy

ˆ

b

Background Influence

yI - S

Hx

Observation Influence is complementary to Background Influence

ˆ ( ) b y Sy I S Hx

ˆ ( )a b y Hx HKy I HK Hx


Influence Matrix Properties

The diagonal element satisfy 0 1iiS

1

N

iii

S Total Information Content

1

. .

N

iii

SAverage Influence

Tot Obs Number


Synop Surface Pressure Influence

>1

Sii>1due to the numericalapprossimation


Aircraft 250 hPa U-Comp Influence


>1


QuikSCAT U-Comp Influence

>1



Observation Influence: Vertical levels

>1



AMSU-A channel 13 Influence


>1


Toy Model: 2 Observations

1 1 1 1( )T T S R H B + HR H H

2oR IH I

2

2o

b

r

x1

x2y2

y1

Find the expression for S as function of r and the expression of for α=0 and 1given the assumptions:

ˆ ( ) b y Sy I S x

y

2

2

b

b

B



1 1 1 1( )T T S R H B + HR H H

2oR IH I

2

2o

b

r

x1

x2y2

y1

2

2

b

b

B

2

121122211

r

SSSS1

22

2

22

2222

2

12

1

12

1212

1

rr

r

rr

r

rr

r

rr

r

S

1

12211

r

SS0

Sii

r

1

1

1/2

1/3

1/2

=1

=0

0


Consideration (1)

• Where observations are dense Sii tends to be small and

the background sensitivities tend to be large and also the surrounding observations have large influence (off-diagonal term)

• When observations are sparse Sii and the background

sensitivity are determined by their relative accuracies (r) and the surrounding observations have small influence (off-diagonal term)



ˆ ( ) b y Sy I S x

2

21o

b

r

2

1 1 1 2 22 2 2

2 2ˆ ( )

4 4 4y y x y x

=0 1 1 1

1 1ˆ

2 2y y x

=1 1 1 1 2 2

1 2 1ˆ ( )

3 3 3y y x y x

x1

x2y2

y1

22

2

22

2222

2

12

1

12

1212

1

rr

r

rr

r

rr

r

rr

r

S


Consideration (2)

• When observation and background have similar accuracies (r), the estimate ŷ1 depends on y1 and x1 and an

additional term due to the second observation. We see that if R is diagonal the observational contribution is devaluated with respect to the background because a group of correlated background values count more than the single observation (2-α2 → 2). Also by increasing background correlation, the nearby observation and background have a larger contribution


Global and Partial Influence

Global Influence = GI = 1

N

iii

S

N

100% only Obs Influence

0% only Model Influence

Partial Influence = PI = ii

i I

IpS Type

Variable

Area

Level


Global and Partial Influence

Level

1 1000-850 2 850-700 3 700-500 4 500-400 5 400-300 6 300-200 7 200-100 8 100-70 9 70-50 10 50-30 11 30-0

Type

SYNOPAIREPSATOBDRIBUTEMPPILOTAMSUAHIRSSSMIGOESMETEOSATQuikSCAT

Variable

uvTqps

Tb

Area

Tropics S.HemN.Hem

Europe US N.Atl …


Average Influence and Information Content

Global Observation Influence GI=15%

Global Background Influence I-GI=85%

0

5

10

15

20

25

Synop

Dribu

Airep

Satob

Tem

pPilo

t

SCAT

AMSU-A

AMSU-B

AIRS

HIRS

TCW

V

SSM/I

Goes

Met

eo

Ozone

Info

rma

tio

n c

on

ten

t %

0

0.1

0.2

0.3

0.4

0.5

0.6

Synop

Dribu

Airep

Satob

Temp

Pilot

SCAT

AMSU-A

AMSU-B

AIRS

HIRS

TCWV

SSM/I

Goe

s

Met

eo

Ozo

ne

Mea

n I

nfl

uen

ce

2003 ECMWF Operational


0

4

8

12

16

20

DFS %

Global

0

2

4

6

8

10

12

14

16

Mean Influence %

2007 ECMWF OpDFS 20%

20

16

12

8

4

048

12162024

Synop

Dribu

Airep

Satob

Temp

Pilot

SCAT

AMSU-A

AMSU-B

AIRS

HIRS

TCWV

SSM/I

Goes

Met

eo

Ozone

0

0.2

0.4

0.6


DFS %Global Information Content 20%North Pole 5.6%South Pole 1.1%Tropics 5.5%

0

4

8

12

16

20

DFS %

North Pole South Pole Tropics


Evolution of the B matrix: B computed from EnDA

Xt+εStochastics

y+εo

SST+εSST

Xb+εb

y+εo

SST+εSST

AMSU-A ch6


Evolution of the GOS: Interim ReanalysisAircraft 200-300 hPa

1999

2007


Evolution of the GOS: Interim ReanalysisAMSU-A ch6

1999

2007


Evolution of the GOS and of the B: Interim ReanalysisAMSU-A ch13

1999

2007

2003


Evolution of the GOS: Interim ReanalysisAMSU-A

1999

2007


Evolution of the GOS: Interim ReanalysisU-comp Aircraft, Radiosonde, Vertical Profiler, AMV

1999

2007


ConclusionsThe Influence Matrix is well-known in multi-variate linear regression. It is

used to identify influential data. Influence patterns are not part of the

estimates of the model but rather are part of the conditions under which

the model is estimated

Disproportionate influence can be due to:incorrect data (quality control)legitimately extreme observations occurrence

to which extent the estimate depends on these data

Sii=1Data-sparse Single observation

Model under-confident (1-Sii)

Sii=0Data-dense

Model over-confident tuning (1-Sii)


Conclusions

Observational Influence pattern would provide information on different

observation system

New observation systemSpecial observing field campaign

Thinning is mainly performed to reduce the spatial correlation but also to

reduce the analysis computational cost

Knowledge of the observations influence helps in selecting appropriate

data density

Diagnose the impact of improved physics representation in the linearized

forecast model in terms of observation influence


Background and Observation Tuning in ECMWF 4D-Var

Observations

Model


Influence Matrix Computation

1

1 1

11( '') ( )( ) ( )( )

N MT Ti

i i i ii i i

u uN

J L L L L

1 1( '') T S R H J H

B

A sample of N=50 random vectors from (0,1) Truncated eigenvector expansion

with vectors obtained through the combined Lanczos/conjugatealgorithm. M=40

1 1 1( )T A B + HR H


Hessian Approximation B-A

0.00

0.50

1.00

1.50

2.00

2.50

0 100 200 300 400 500 600 700 800 900 1000

Number of Hessian Vectors

Nu

mb

er o

f S

elfs

ensi

tivi

ties

>1

(%

)

1

1( )( )

MTi

i ii i

L L

1

1( )( )

NT

i ii

u uN L L 500 random vector to represent B


Ill-Condition Problem

A set of linear equation is said to be ill-conditioned if small variations in

X=(HK I-HK) have large effect on the exact solution ŷ, e.g matrix close

to singularity

max

min

( )

X =K

A Ill-conditioning has effects on the stability and solution accuracy . A

measure of ill-conditioning is

A different form of ill-conditioning can results from collinearity: XXT

close to singularity

Large difference between the background and observation error

standard deviation and high dimension matrix


Flow Dependent b: MAMT+Q

DRIBU psInfluence

b

o