ECMWFObservation Influence Training Course 2010 slide 1
Influence matrix diagnostic to monitor the assimilation
system
Carla Cardinali
ECMWFObservation Influence Training Course 2010 slide 2
Monitoring Assimilation System
ECMWF 4D-Var system handles a large variety of space and surface-
based observations. It combines observations and atmospheric state a
priori information by using a linearized and non-linear forecast model
Effective monitoring of a such complex system with 108 degree of
freedom and 107 observations is a necessity. No just few indicators but a
more complex set of measures to answer questions like
How much influent are the observations in the analysis?How much influence is given to the a priori information?How much does the estimate depend on one single influential obs?
ECMWFObservation Influence Training Course 2010 slide 3
Influence Matrix: Introduction
Unusual or influential data points are not necessarily bad observations
but they may contain some of most interesting sample information
In Ordinary Least-Square the information is quantitatively available in
the Influence Matrix
Tuckey 63, Hoaglin and Welsch 78, Velleman and Welsch 81
Diagnostic methods are available for monitoring multiple regression
analysis to provide protection against distortion by anomalous data
y = Sy
ECMWFObservation Influence Training Course 2010 slide 4
Influence Matrix in OLS
The OLS regression model is
y = Xβ + ε
y = Sy
T -1 Tβ (X X) X y
Y (mx1) observation vector
X (mxq) predictors matrix, full rank q
β (qx1) unknown parameters
(mx1) error2( ) 0, ( )E Var ε ε I
1T TS X(X X) X
The fitted response is
OLS provide the solution
m>q
ECMWFObservation Influence Training Course 2010 slide 5
Influence Matrix Properties
y = Sy
S (mxm) symmetric, idempotent and positive definite matrix
It is seen
The diagonal element satisfy 0 1iiS
ˆ
y
Sy
( )Tr qS
ˆiij
j
yS
y
Cross-Sensitivity
ˆiii
i
yS
y
Self-Sensitivity
Average Self-Sensitivity=q/m
ECMWFObservation Influence Training Course 2010 slide 6
Influence Matrix Related Findings
y = Sy
The change in the estimate that occur when the i-th is deleted
iii
iii
iiiii
yyr
rS
Syy
ˆ
1ˆˆ )(
CV score can be computed by relying on the all data estimate ŷ and Sii
m
i ii
iim
i
iii S
yyyy
1
2
21
2)(
)1(
)ˆˆ()ˆˆ(
ECMWFObservation Influence Training Course 2010 slide 7
Outline
Conclusion
Observation and background Influence
Generalized Least Square method
Findings related to data influence and information content
Toy model: 2 observations
ECMWFObservation Influence Training Course 2010 slide 8
Hxb
y
HK
I-HK
Hxb
y
HKI-HK
Solution in the Observation Space
The analysis projected at the observation location
ˆ ( )a b y Hx HKy I HK Hx
The estimation ŷ is a weighted mean
Hxb y
HKI-HK
1 1 1 1T T K = (B + H R H) H R
B(qxq)=Var(xb)R(pxp)=Var(y)
K(qxp) gain matrixH(pxq) Jacobian matrix
( )a q b x Ky I KH x
ECMWFObservation Influence Training Course 2010 slide 9
Influence Matrix
ˆ( )T T T Observation Influence
y
S HK K Hy
ˆ
b
Background Influence
yI - S
Hx
Observation Influence is complementary to Background Influence
ˆ ( ) b y Sy I S Hx
ˆ ( )a b y Hx HKy I HK Hx
ECMWFObservation Influence Training Course 2010 slide 10
Influence Matrix Properties
The diagonal element satisfy 0 1iiS
1
N
iii
S Total Information Content
1
. .
N
iii
SAverage Influence
Tot Obs Number
ECMWFObservation Influence Training Course 2010 slide 11
Synop Surface Pressure Influence
>1
Sii>1due to the numericalapprossimation
ECMWFObservation Influence Training Course 2010 slide 12
Aircraft 250 hPa U-Comp Influence
Sii>1due to the numericalapprossimation
>1
ECMWFObservation Influence Training Course 2010 slide 13
QuikSCAT U-Comp Influence
>1
Sii>1due to the numericalapprossimation
ECMWFObservation Influence Training Course 2010 slide 14
Observation Influence: Vertical levels
>1
Sii>1due to the numericalapprossimation
ECMWFObservation Influence Training Course 2010 slide 15
AMSU-A channel 13 Influence
Sii>1due to the numericalapprossimation
>1
ECMWFObservation Influence Training Course 2010 slide 16
Toy Model: 2 Observations
1 1 1 1( )T T S R H B + HR H H
2oR IH I
2
2o
b
r
x1
x2y2
y1
Find the expression for S as function of r and the expression of for α=0 and 1given the assumptions:
ˆ ( ) b y Sy I S x
y
2
2
b
b
B
ECMWFObservation Influence Training Course 2010 slide 17
Toy Model: 2 Observations
1 1 1 1( )T T S R H B + HR H H
2oR IH I
2
2o
b
r
x1
x2y2
y1
2
2
b
b
B
2
121122211
r
SSSS1
22
2
22
2222
2
12
1
12
1212
1
rr
r
rr
r
rr
r
rr
r
S
1
12211
r
SS0
Sii
r
1
1
1/2
1/3
1/2
=1
=0
0
ECMWFObservation Influence Training Course 2010 slide 18
Consideration (1)
• Where observations are dense Sii tends to be small and
the background sensitivities tend to be large and also the surrounding observations have large influence (off-diagonal term)
• When observations are sparse Sii and the background
sensitivity are determined by their relative accuracies (r) and the surrounding observations have small influence (off-diagonal term)
ECMWFObservation Influence Training Course 2010 slide 19
Toy Model: 2 Observations
ˆ ( ) b y Sy I S x
2
21o
b
r
2
1 1 1 2 22 2 2
2 2ˆ ( )
4 4 4y y x y x
=0 1 1 1
1 1ˆ
2 2y y x
=1 1 1 1 2 2
1 2 1ˆ ( )
3 3 3y y x y x
x1
x2y2
y1
22
2
22
2222
2
12
1
12
1212
1
rr
r
rr
r
rr
r
rr
r
S
ECMWFObservation Influence Training Course 2010 slide 20
Consideration (2)
• When observation and background have similar accuracies (r), the estimate ŷ1 depends on y1 and x1 and an
additional term due to the second observation. We see that if R is diagonal the observational contribution is devaluated with respect to the background because a group of correlated background values count more than the single observation (2-α2 → 2). Also by increasing background correlation, the nearby observation and background have a larger contribution
ECMWFObservation Influence Training Course 2010 slide 21
Global and Partial Influence
Global Influence = GI = 1
N
iii
S
N
100% only Obs Influence
0% only Model Influence
Partial Influence = PI = ii
i I
IpS Type
Variable
Area
Level
ECMWFObservation Influence Training Course 2010 slide 22
Global and Partial Influence
Level
1 1000-850 2 850-700 3 700-500 4 500-400 5 400-300 6 300-200 7 200-100 8 100-70 9 70-50 10 50-30 11 30-0
Type
SYNOPAIREPSATOBDRIBUTEMPPILOTAMSUAHIRSSSMIGOESMETEOSATQuikSCAT
Variable
uvTqps
Tb
Area
Tropics S.HemN.Hem
Europe US N.Atl …
ECMWFObservation Influence Training Course 2010 slide 23
Average Influence and Information Content
Global Observation Influence GI=15%
Global Background Influence I-GI=85%
0
5
10
15
20
25
Synop
Dribu
Airep
Satob
Tem
pPilo
t
SCAT
AMSU-A
AMSU-B
AIRS
HIRS
TCW
V
SSM/I
Goes
Met
eo
Ozone
Info
rma
tio
n c
on
ten
t %
0
0.1
0.2
0.3
0.4
0.5
0.6
Synop
Dribu
Airep
Satob
Temp
Pilot
SCAT
AMSU-A
AMSU-B
AIRS
HIRS
TCWV
SSM/I
Goe
s
Met
eo
Ozo
ne
Mea
n I
nfl
uen
ce
2003 ECMWF Operational
ECMWFObservation Influence Training Course 2010 slide 25
0
4
8
12
16
20
DFS %
Global
0
2
4
6
8
10
12
14
16
Mean Influence %
2007 ECMWF OpDFS 20%
20
16
12
8
4
048
12162024
Synop
Dribu
Airep
Satob
Temp
Pilot
SCAT
AMSU-A
AMSU-B
AIRS
HIRS
TCWV
SSM/I
Goes
Met
eo
Ozone
0
0.2
0.4
0.6
ECMWFObservation Influence Training Course 2010 slide 27
DFS %Global Information Content 20%North Pole 5.6%South Pole 1.1%Tropics 5.5%
0
4
8
12
16
20
DFS %
North Pole South Pole Tropics
ECMWFObservation Influence Training Course 2010 slide 28
Evolution of the B matrix: B computed from EnDA
Xt+εStochastics
y+εo
SST+εSST
Xb+εb
y+εo
SST+εSST
AMSU-A ch6
ECMWFObservation Influence Training Course 2010 slide 29
Evolution of the GOS: Interim ReanalysisAircraft 200-300 hPa
1999
2007
ECMWFObservation Influence Training Course 2010 slide 30
Evolution of the GOS: Interim ReanalysisAMSU-A ch6
1999
2007
ECMWFObservation Influence Training Course 2010 slide 31
Evolution of the GOS and of the B: Interim ReanalysisAMSU-A ch13
1999
2007
2003
ECMWFObservation Influence Training Course 2010 slide 32
Evolution of the GOS: Interim ReanalysisAMSU-A
1999
2007
ECMWFObservation Influence Training Course 2010 slide 33
Evolution of the GOS: Interim ReanalysisU-comp Aircraft, Radiosonde, Vertical Profiler, AMV
1999
2007
ECMWFObservation Influence Training Course 2010 slide 34
ConclusionsThe Influence Matrix is well-known in multi-variate linear regression. It is
used to identify influential data. Influence patterns are not part of the
estimates of the model but rather are part of the conditions under which
the model is estimated
Disproportionate influence can be due to:incorrect data (quality control)legitimately extreme observations occurrence
to which extent the estimate depends on these data
Sii=1Data-sparse Single observation
Model under-confident (1-Sii)
Sii=0Data-dense
Model over-confident tuning (1-Sii)
ECMWFObservation Influence Training Course 2010 slide 35
Conclusions
Observational Influence pattern would provide information on different
observation system
New observation systemSpecial observing field campaign
Thinning is mainly performed to reduce the spatial correlation but also to
reduce the analysis computational cost
Knowledge of the observations influence helps in selecting appropriate
data density
Diagnose the impact of improved physics representation in the linearized
forecast model in terms of observation influence
ECMWFObservation Influence Training Course 2010 slide 36
Background and Observation Tuning in ECMWF 4D-Var
Observations
Model
ECMWFObservation Influence Training Course 2010 slide 37
Influence Matrix Computation
1
1 1
11( '') ( )( ) ( )( )
N MT Ti
i i i ii i i
u uN
J L L L L
1 1( '') T S R H J H
B
A sample of N=50 random vectors from (0,1) Truncated eigenvector expansion
with vectors obtained through the combined Lanczos/conjugatealgorithm. M=40
1 1 1( )T A B + HR H
ECMWFObservation Influence Training Course 2010 slide 38
Hessian Approximation B-A
0.00
0.50
1.00
1.50
2.00
2.50
0 100 200 300 400 500 600 700 800 900 1000
Number of Hessian Vectors
Nu
mb
er o
f S
elfs
ensi
tivi
ties
>1
(%
)
1
1( )( )
MTi
i ii i
L L
1
1( )( )
NT
i ii
u uN L L 500 random vector to represent B
ECMWFObservation Influence Training Course 2010 slide 39
Ill-Condition Problem
A set of linear equation is said to be ill-conditioned if small variations in
X=(HK I-HK) have large effect on the exact solution ŷ, e.g matrix close
to singularity
max
min
( )
X =K
A Ill-conditioning has effects on the stability and solution accuracy . A
measure of ill-conditioning is
A different form of ill-conditioning can results from collinearity: XXT
close to singularity
Large difference between the background and observation error
standard deviation and high dimension matrix
ECMWFObservation Influence Training Course 2010 slide 40
Flow Dependent b: MAMT+Q
DRIBU psInfluence
b
o