Gaussianization based on Principal Components Analysis (GPCA): an easy tool for optimal signal processing. Valero Laparra, Jesús Malo, Gustavo Camps



Page 1: Valero Laparra Jesús Malo Gustavo Camps

Gaussianization based on Principal Components Analysis (GPCA): an easy tool for optimal signal processing

Valero Laparra

Jesús Malo

Gustavo Camps

Page 2

INDEX

- What?
- Why?
- How?
- Conclusions
- Toolbox

Page 3

What?

• Estimate multidimensional probability densities

• How is the N-D data distributed in the N-D space?

• What to pay attention to! What is important in our data?

Page 4

What?

Page 5

Why?

• GENERIC OPTIMAL SOLUTIONS

Page 6

Why?

• GENERIC OPTIMAL SOLUTIONS

Page 7

Why?

• GENERIC OPTIMAL SOLUTIONS

Page 8

Why?

• GENERIC OPTIMAL SOLUTIONS

Page 9

How?

• PDF estimation from samples always assumes a model.

• HISTOGRAM: estimation without assuming a functional model

Page 10

How?

• X = [ -1.66 1.25 0.73 1.72 0.88 0.19 -0.81 0.42 -0.14 …]

Page 11

How?

• Problem: estimating the number of bins. Nbins = √Nsamples
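The square-root rule above can be stated in a couple of lines (an illustrative Python sketch; the toolbox itself is MATLAB):

```python
import math

def n_bins(n_samples: int) -> int:
    """Square-root rule: Nbins = sqrt(Nsamples), rounded to an integer."""
    return int(round(math.sqrt(n_samples)))

print(n_bins(10_000))  # 100 bins for 10,000 samples
print(n_bins(121))     # 11 bins for 121 samples
```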

Page 12

How?

• Problem: "the curse of dimensionality"

- Total number of bins: Nb_total = Nb_dim ^ N_dim

- If we assume Ns = Nb^2 samples per 1-D histogram

- Then in total: Ns = Nb^(2·Nd)

Page 13

How?

• Problem: "the curse of dimensionality": Nb_total = Nb_dimension ^ N_dimensions

e.g., assuming a minimum of Nb = 11 bins per dimension, we need Ns = 11^(2·Nd) samples, stored as 8-byte doubles:

Nd = 1: Ns = 121          ->           968 bytes
Nd = 2: Ns = 14,641       ->       117,128 bytes
Nd = 3: Ns = 1,771,561    ->    14,172,488 bytes
Nd = 4: Ns = 214,358,881  -> 1,714,871,048 bytes
Nd = 5: Ns ≈ 2.59 × 10^10 -> HELP, MEMORY
Nd = 6: Ns ≈ 3.14 × 10^12 -> HELP, MEMORY
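The arithmetic behind these numbers can be regenerated in a few lines (an illustrative Python sketch; Ns = Nb^(2·Nd) and 8-byte doubles are the slide's assumptions):

```python
NB = 11  # minimum number of bins per dimension, as on the slide

for nd in range(1, 7):
    ns = NB ** (2 * nd)  # samples needed: Nb^(2*Nd)
    mem = 8 * ns         # stored as 8-byte doubles
    print(f"Nd = {nd}: Ns = {ns:,} ({mem:,} bytes)")
```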

Page 14

How?

• From P(x) to P(y) (Gaussian): which transform T?

Page 15

How?

Page 16

How?

MATLAB, MATLAB, WHAT A WONDERFUL WORLD

Answer: GPCA
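The core step behind GPCA can be sketched briefly. This is an illustrative Python reconstruction, not the toolbox code: each GPCA iteration rotates the data with PCA and then Gaussianizes each marginal; only the 1-D marginal Gaussianization (empirical CDF followed by the inverse standard-Gaussian CDF) is shown, assuming distinct sample values.

```python
from statistics import NormalDist

def gaussianize_1d(x):
    """Map each sample to its empirical CDF value, then through the
    inverse standard-Gaussian CDF. Assumes distinct sample values."""
    n = len(x)
    rank = {v: i + 1 for i, v in enumerate(sorted(x))}
    g = NormalDist()  # standard normal
    # rank/(n+1) keeps the CDF value strictly inside (0, 1)
    return [g.inv_cdf(rank[v] / (n + 1)) for v in x]

data = [0.3, 2.9, 1.1, 7.5, 0.2, 4.4, 1.8]
gauss = gaussianize_1d(data)
```

The mapping is monotone, so it preserves the ordering of the samples while reshaping their histogram toward a Gaussian.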

Page 17

How? Theoretical convergence proof

• Negentropy:
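The negentropy expression itself is in a slide figure; as a standard definition (not necessarily the slide's exact notation), it is the KL divergence between the data pdf and a Gaussian with the same mean and covariance:

```latex
J(\mathbf{x}) \;=\; D_{\mathrm{KL}}\!\left( p(\mathbf{x}) \,\|\, \mathcal{N}(\boldsymbol{\mu}_x, \boldsymbol{\Sigma}_x) \right)
\;=\; H\!\left(\mathcal{N}(\boldsymbol{\mu}_x, \boldsymbol{\Sigma}_x)\right) - H(\mathbf{x}) \;\geq\; 0
```

It vanishes if and only if the data are Gaussian, and each GPCA iteration (rotation plus marginal Gaussianization) does not increase it, which is the basis of the convergence argument.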

Page 18

How?

OPEN ISSUE

Page 19

How?

• Stop criterion:

NOTE THAT: this amounts to measuring mutual information. The Gaussian is the unique distribution whose marginals are both Gaussian and independent, so the iteration stops when I(X_n) ≈ 0.
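The quantity behind the stop criterion can be written with the standard multi-information (redundancy) definition; this is a hedged reconstruction, since the slide's formula is in a figure:

```latex
I(\mathbf{X}) \;=\; \sum_{n=1}^{N} H(X_n) \;-\; H(\mathbf{X}) \;\geq\; 0
```

It vanishes exactly when the components are independent; combined with Gaussian marginals, I(X) ≈ 0 certifies that the joint distribution is Gaussian.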

Page 20

How? GPCA Inverse

NOTE THAT: the inverse enables synthesis

Page 21

How? GPCA Jacobian
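The Jacobian slide's content is a figure; the identity underlying it is the standard change-of-variables formula. If y = T(x) is the (differentiable, invertible) GPCA transform with y ~ N(0, I), then:

```latex
p_x(\mathbf{x}) \;=\; \mathcal{N}\!\big(T(\mathbf{x});\, \mathbf{0}, \mathbf{I}\big)\; \big|\det \nabla T(\mathbf{x})\big|
```

This is why an easy Jacobian makes pdf estimation easy: evaluating the pdf at a point only requires transforming the point and computing one determinant.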

Page 22

CONCLUSIONS

• The optimal solution of many problems involves knowledge of the data pdf.

• GPCA obtains a transform that converts any pdf into a Gaussian pdf.

• It has an easy inverse.

• It has an easy Jacobian.

• This transform can be used to compute the pdf of any data.

Page 23

GPCA toolbox (Matlab): 3 examples

• PDF estimation

• Mutual Information Measures

• Synthesis

Wiki-page

Beta version

Page 24

Basic toolbox

• [datT Trans] = GPCA(dat, Nit, Perc)

- dat = data matrix, [N dimensions x N samples]
  e.g. 100 samples from a 2-D Gaussian: dat = [2 x 100]

- Nit = number of iterations

- Perc = percentage by which the pdf range is increased

Page 25

Basic toolbox

• Perc = percentage by which the pdf range is increased.

Page 26

Basic toolbox

• [datT Trans] = auto_GPCA(dat)

• [datT] = apply_GPCA(dat,Trans)

• [dat] = inv_GPCA(datT,Trans)

• [Px pT detJ JJ] = GPCA_probability(x0,Trans)

Page 27

Estimating PDF/manifold

• [datT Trans] = auto_GPCA(dat)
• [Px pT detJ JJ] = GPCA_probability(XX,Trans);

Page 28

Estimating PDF/manifold

• [datT Trans] = auto_GPCA(dat)
• [Px pT detJ JJ] = GPCA_probability(XX,Trans);

Page 29

Estimating PDF/manifold

• [datT Trans] = auto_GPCA(dat)
• [Px pT detJ JJ] = GPCA_probability(XX,Trans);

Page 30

Estimating PDF/manifold

• PROBLEMS

– It does not always converge to a Gaussian
– PDFs with clusters are more complicated
– The Jacobian estimation is highly point-dependent
– The derivative (in the Jacobian estimation) is much more irregular than the integral
– The pdf has to be estimated for each point

Page 31

Measuring Mutual Information

• [datT Trans] = auto_GPCA(dat)
• MI = abs(min(cumsum(cat(1,Trans.I))));
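The MATLAB one-liner accumulates the per-iteration changes in redundancy stored in Trans.I and takes the largest total reduction as the mutual-information estimate. A sketch of the same bookkeeping in Python (the delta values below are made up for illustration):

```python
from itertools import accumulate

def mi_estimate(delta_i):
    """Equivalent of abs(min(cumsum(...))) over the per-iteration
    redundancy changes: the largest cumulative reduction."""
    return abs(min(accumulate(delta_i)))

deltas = [-0.9, -0.4, -0.1, 0.02, -0.01]  # hypothetical Trans.I values
mi = mi_estimate(deltas)  # ≈ 1.4
```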

Error = (Real MI – Estimated MI) / Real MI (10 realizations)

N-dim   Pdf-1    Pdf-2    Pdf-3
3       0.0697   0.0787   0.0630
4       0.0150   0.0031   0.0048
5       0.0353   0.0297   0.0328
8       0.0313   0.0369   0.0372
10      0.0148   0.0145   0.0132

Page 32

Measuring Mutual Information

Page 33

Measuring Mutual Information

• PROBLEMS

– Entropy estimators are not perfectly defined
– More iterations, more error
– The more complicated the pdf, the larger the error

Page 34

Synthesizing data

• [datT Trans] = auto_GPCA(dat)
• [dat2] = inv_GPCA(randn(Dim,Nsamples), Trans);

[Figure: transform stages T1, T2 and their inverses Inv T1, Inv T2]

Page 35

Synthesizing data

• [datT Trans] = auto_GPCA(dat)
• [dat2] = inv_GPCA(randn(Dim,Nsamples),Trans);
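A minimal 1-D sketch of the synthesis idea (illustrative Python, not the toolbox): draw standard Gaussian samples, as randn does, and push them back through an inverse transform. Here the learned inverse is approximated by a plain empirical-quantile lookup on the training data.

```python
import random
from statistics import NormalDist

def synthesize(train, n_new, seed=0):
    """Draw n_new samples: Gaussian draw -> uniform via the Gaussian
    CDF -> empirical quantile of the training data."""
    rng = random.Random(seed)
    g = NormalDist()
    srt = sorted(train)
    out = []
    for _ in range(n_new):
        u = g.cdf(rng.gauss(0.0, 1.0))              # uniform in (0, 1)
        idx = min(int(u * len(srt)), len(srt) - 1)  # quantile index
        out.append(srt[idx])
    return out

train = [0.2, 0.3, 1.1, 1.8, 2.9, 4.4, 7.5]
new_samples = synthesize(train, 5)
```

With a fixed seed the draw is reproducible; with this crude quantile lookup the synthesized values always come from the training set, whereas the real inverse transform interpolates smoothly between them.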

Page 36

Synthesizing data

• PROBLEMS

– It does not always converge to a Gaussian
– Small variations in the variance of the random data produce very different results
– No information about the features of the data in the transformed domain

Page 37

• Thanks for your time