
Page 1: Information Content


Information Content

Tristan L’Ecuyer

Page 2: Information Content

Degrees of Freedom

Using the expression for the state vector that minimizes the cost function, it is relatively straightforward to show that

$$d_s = \mathrm{Tr}\left[\left(\mathbf{K}^T\mathbf{S}_y^{-1}\mathbf{K}+\mathbf{S}_a^{-1}\right)^{-1}\mathbf{K}^T\mathbf{S}_y^{-1}\mathbf{K}\right] = \mathrm{Tr}\left[\mathbf{A}\right]$$

$$d_n = \mathrm{Tr}\left[\mathbf{S}_y\left(\mathbf{K}\mathbf{S}_a\mathbf{K}^T+\mathbf{S}_y\right)^{-1}\right] = \mathrm{Tr}\left[\mathbf{I}_m-\mathbf{A}\right]$$

where I_m is the m × m identity matrix and A is the averaging kernel.

NOTE: Even if the number of retrieval parameters is equal to or less than the number of measurements, a retrieval can still be under-constrained if noise and redundancy are such that the number of degrees of freedom for signal is less than the number of parameters to be retrieved.
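Since the two traces sum to the number of measurements m, these expressions are easy to check numerically. Below is a minimal NumPy sketch (not from the slides; K, S_a, and S_y are made-up examples):

```python
# Minimal sketch: degrees of freedom for signal and noise.
# K, S_a, and S_y are made-up examples, not values from the lecture.
import numpy as np

rng = np.random.default_rng(0)
m, n = 5, 3                       # m measurements, n retrieved parameters
K = rng.normal(size=(m, n))       # forward-model Jacobian
S_a = np.eye(n)                   # a priori covariance
S_y = 0.1 * np.eye(m)             # measurement-error covariance

S_y_inv = np.linalg.inv(S_y)
S_x = np.linalg.inv(K.T @ S_y_inv @ K + np.linalg.inv(S_a))  # posterior covariance
A = S_x @ K.T @ S_y_inv @ K       # averaging kernel

d_s = np.trace(A)                                           # DOF for signal
d_n = np.trace(S_y @ np.linalg.inv(K @ S_a @ K.T + S_y))    # DOF for noise
print(f"d_s = {d_s:.3f}, d_n = {d_n:.3f}, sum = {d_s + d_n:.3f}")  # sum equals m
```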

Page 3: Information Content


Entropy-based Information Content

The Gibbs entropy is the logarithm of the number of discrete internal states of a thermodynamic system:

$$S(P) = -k\sum_i p_i \ln p_i$$

where p_i is the probability of the system being in state i and k is the Boltzmann constant.

The information theory analogue has k = 1, with the p_i representing the probabilities of all possible combinations of retrieval parameters.

More generally, for a continuous distribution (e.g. Gaussian):

$$S\left[P(x)\right] = -\int P(x)\,\log_2 P(x)\,dx$$

Page 4: Information Content


Entropy of a Gaussian Distribution

For the Gaussian distributions typically used in optimal estimation,

$$P(x) = \frac{1}{(2\pi)^{1/2}\sigma}\exp\left[-\frac{(x-\bar{x})^2}{2\sigma^2}\right]$$

we have:

$$S\left[P(x)\right] = \int \frac{1}{(2\pi)^{1/2}\sigma}\exp\left[-\frac{(x-\bar{x})^2}{2\sigma^2}\right]\left[\log_2\left((2\pi)^{1/2}\sigma\right)+\frac{(x-\bar{x})^2}{2\sigma^2}\log_2 e\right]dx$$

$$S\left[P(x)\right] = \log_2\left[(2\pi e)^{1/2}\,\sigma\right]$$

For an m-variable Gaussian distribution:

$$S\left[P(\mathbf{x})\right] = \frac{m}{2}\log_2(2\pi e) + \frac{1}{2}\log_2\left|\mathbf{S}_y\right|$$
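As a sanity check on these formulas, here is a small NumPy sketch (my own, not part of the lecture; the σ and the example covariance are made up) that integrates the 1-D entropy numerically and evaluates the m-variable expression for a diagonal covariance:

```python
# Minimal sketch: numerical check of the Gaussian entropy formulas.
# sigma and the example covariance are made-up values.
import numpy as np

# 1-D case: S[P(x)] should equal log2[(2*pi*e)^(1/2) * sigma]
sigma = 2.0
x = np.linspace(-10 * sigma, 10 * sigma, 200001)
dx = x[1] - x[0]
P = np.exp(-x**2 / (2 * sigma**2)) / (np.sqrt(2 * np.pi) * sigma)
S_numeric = -np.sum(P * np.log2(P)) * dx          # -∫ P log2 P dx
S_formula = np.log2(np.sqrt(2 * np.pi * np.e) * sigma)
print(S_numeric, S_formula)                        # agree closely

# m-variable case: S = (m/2) log2(2*pi*e) + (1/2) log2 |S|
cov = np.diag([1.0, 0.5, 2.0])                     # example covariance
m = cov.shape[0]
S_m = 0.5 * m * np.log2(2 * np.pi * np.e) + 0.5 * np.log2(np.linalg.det(cov))
# For a diagonal covariance this equals the sum of the 1-D entropies:
print(S_m, sum(np.log2(np.sqrt(2 * np.pi * np.e * v)) for v in np.diag(cov)))
```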

Page 5: Information Content


Information Content of a Retrieval

The information content of an observing system is defined as the difference in entropy between an a priori set of possible solutions, S(P1), and the subset of these solutions that also satisfy the measurements, S(P2):

$$H = S(P_1) - S(P_2)$$

If Gaussian distributions are assumed for the prior and posterior state spaces, as in the optimal estimation approach, this can be written

$$H = \frac{1}{2}\log_2\left|\mathbf{S}_a\right| - \frac{1}{2}\log_2\left|\hat{\mathbf{S}}_x\right|$$

since, after minimizing the cost function, the covariance of the posterior state space is

$$\hat{\mathbf{S}}_x = \left(\mathbf{S}_a^{-1} + \mathbf{K}^T\mathbf{S}_y^{-1}\mathbf{K}\right)^{-1}$$
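A minimal sketch of this calculation (my own, with made-up K, S_a, and S_y); slogdet is used rather than det for numerical stability:

```python
# Minimal sketch: information content H = (1/2)log2|S_a| - (1/2)log2|S_x|.
# K, S_a, and S_y are made-up examples, not values from the lecture.
import numpy as np

rng = np.random.default_rng(1)
m, n = 5, 3
K = rng.normal(size=(m, n))       # forward-model Jacobian
S_a = np.eye(n)                   # prior covariance
S_y = 0.1 * np.eye(m)             # measurement-error covariance

# Posterior covariance after minimizing the cost function
S_x = np.linalg.inv(np.linalg.inv(S_a) + K.T @ np.linalg.inv(S_y) @ K)

# log-determinants (natural log), converted to bits
_, logdet_a = np.linalg.slogdet(S_a)
_, logdet_x = np.linalg.slogdet(S_x)
H = 0.5 * (logdet_a - logdet_x) / np.log(2)
print(f"H = {H:.2f} bits")
```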

Page 6: Information Content


Interpretation

Qualitatively, information content describes the factor by which knowledge of a quantity is improved by making a measurement.

Using Gaussian statistics we see that the information content provides a measure of how much the ‘volume of uncertainty’ represented by the a priori state space is reduced after measurements are made.

Essentially this is a generalization of the scalar concept of ‘signal-to-noise’ ratio.

$$H = \frac{1}{2}\log_2\left|\hat{\mathbf{S}}_x^{-1}\,\mathbf{S}_a\right|$$

Page 7: Information Content

Liquid Cloud Retrievals

Blue: a priori state space
Green: state space that also matches the MODIS visible channel (0.64 μm)
Red: state space that matches both the 0.64 and 2.13 μm channels
Yellow: state space that matches all 17 MODIS channels

[Figure: liquid cloud retrieval state spaces in the LWP (g m-3) vs. Re (μm) plane. Panels: prior state space; 0.64 μm (H = 1.20); 0.64 & 2.13 μm (H = 2.51); 17 channels (H = 3.53).]

Page 8: Information Content


Measurement Redundancy

Using multiple channels with similar sensitivities to the parameters of interest merely adds redundant information to the retrieval.

While this can have the benefit of reducing random noise, it cannot remove biases introduced by forward model assumptions, which often affect both channels in similar ways.

Page 9: Information Content

Channel Selection

The information content of individual channels in an observing system can be assessed via

$$H_j = \frac{1}{2}\log_2\left(1 + \mathbf{k}_j\,\mathbf{S}_a\,\mathbf{k}_j^T\right)$$

where k_j is the row of K corresponding to channel j, normalized by that channel's measurement noise.

The channels providing the greatest amount of information can then be sequentially selected by adjusting the covariance matrix via

$$\mathbf{S}_1 = \left(\mathbf{S}_a^{-1} + \mathbf{k}_l^T\,\mathbf{k}_l\right)^{-1}$$

where k_l is the row corresponding to the channel just selected.
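A short sketch of the per-channel H_j (my own; the noise normalization of each row and all input values are assumptions, not the author's code):

```python
# Minimal sketch: per-channel information content H_j = 0.5*log2(1 + k_j S_a k_j^T),
# with each Jacobian row first normalized by its channel's noise (assumed here).
import numpy as np

rng = np.random.default_rng(2)
m, n = 5, 3
K = rng.normal(size=(m, n))          # forward-model Jacobian (made up)
S_a = np.eye(n)                      # prior covariance
sigma = np.full(m, 0.3)              # per-channel noise std. dev. (made up)

K_tilde = K / sigma[:, None]         # noise-normalized rows k_j
H = 0.5 * np.log2(1.0 + np.einsum('ij,jk,ik->i', K_tilde, S_a, K_tilde))
print(H)                             # the largest H_j identifies the first channel
```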

Page 10: Information Content

Method

• Evaluate S_y.
• Compute K.
• Establish prior information.
• Evaluate the information content of each channel, H_j, with respect to the a priori, S_a.
• Select the channel that provides the most information and update the covariance matrix using the appropriate row of K.
• Recompute the information content of all remaining channels with respect to this new error covariance, S_1.
• Select the channel that provides the most additional information.
• Repeat this procedure until the signal-to-noise ratio of all remaining channels is less than 1, i.e. until

$$H_j = 0.5\,\log_2(1+1) = 0.5$$

A sketch of this loop in code is given below.
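Putting the steps together, here is a minimal greedy-selection loop (my sketch under the same noise-normalization assumption as above, not the author's implementation; all inputs are made up):

```python
# Minimal sketch: sequential channel selection as described above.
# Uses noise-normalized Jacobian rows; all inputs are made-up examples.
import numpy as np

def select_channels(K, S_a, sigma, h_min=0.5):
    """Greedily pick channels by information content until H_j < h_min (SNR < 1)."""
    Kt = K / sigma[:, None]                  # noise-normalize each channel's row
    S = S_a.copy()                           # running state covariance
    remaining = list(range(K.shape[0]))
    order = []
    while remaining:
        # Information content of each remaining channel w.r.t. the current S
        H = np.array([0.5 * np.log2(1.0 + Kt[j] @ S @ Kt[j]) for j in remaining])
        best = int(np.argmax(H))
        if H[best] < h_min:                  # all remaining channels have SNR < 1
            break
        j = remaining.pop(best)
        order.append((j, float(H[best])))
        # Update the covariance with the selected row: S <- (S^-1 + k_l^T k_l)^-1
        S = np.linalg.inv(np.linalg.inv(S) + np.outer(Kt[j], Kt[j]))
    return order

rng = np.random.default_rng(3)
K = rng.normal(size=(6, 3))                  # 6 channels, 3 retrieved parameters
print(select_channels(K, np.eye(3), sigma=np.full(6, 0.3)))
```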

Page 11: Information Content


Optimizing Retrieval Algorithms

GOAL: Select the channel configuration that maximizes retrieval information content at the least possible computational cost by limiting the amount of redundancy in the observations.

APPROACH: Use the Jacobian of the forward model, combined with appropriate error statistics, to determine the set of measurements that provides the most information about the geophysical parameters of interest for the least computational cost.

Page 12: Information Content


Information Spectra

Relative to the a priori, the 11 μm channel provides the most information due to its sensitivity to cloud height and its lower uncertainty relative to the visible channels.

Once the information this channel carries is added to the retrieval, the I.C. of the remaining IR channels is greatly reduced and two visible channels are chosen next.

[Figure: information spectra for IWP = 100 g m-2, Re = 16 μm, Ctop = 9 km.]

Page 13: Information Content


Unrealistic Errors

When a uniform 10% measurement uncertainty is assumed, the visible/near-IR channels are weighted unrealistically strongly relative to the IR.

[Figure: information spectra for IWP = 100 g m-2, Re = 16 μm, Ctop = 9 km, assuming a uniform 10% measurement uncertainty.]

Page 14: Information Content


Thin Cloud (IWP = 10 g m-2)

For very thin clouds, the improved accuracy of IR channels relative to those in the visible increases their utility in the retrieval.

[Figure: information spectra for IWP = 10 g m-2 compared with IWP = 100 g m-2 (Re = 16 μm, Ctop = 9 km).]

Page 15: Information Content


Larger Crystals (Re = 40 μm)

At large effective radii, both the visible and IR channels lose sensitivity to effective radius. Two IR channels are chosen primarily for retrieving cloud height and optical depth.

[Figure: information spectra for Re = 40 μm compared with Re = 16 μm (IWP = 100 g m-2, Ctop = 9 km).]

Page 16: Information Content


High Cloud (Ctop = 14 km)

The enhanced contrast between cloud-top temperature and the surface increases the signal-to-noise ratio of the IR channels.

[Figure: information spectra for Ctop = 14 km compared with Ctop = 9 km (IWP = 100 g m-2, Re = 16 μm).]