Inverse regression approach to (robust) non-linear high-to-low … · 2019-12-20 · Inverse...

Inverse regression approach to (robust) non-linear high-to-low dimensional mapping

Emeline Perthame

Joint work with Florence Forbes

INRIA, team MISTIS, Grenoble

LMNO, Caen

October 27, 2016


Outline

1. Non-linear mapping problem

2. GLLiM/SLLiM: inverse regression approach

3. Estimation of parameters

4. Results and conclusion


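A non-linear mapping problem

• A non-linear mapping problem: a high-dimensional input $(y_1, \dots, y_D)$ is mapped through $g(y)$ to a low-dimensional output $x$

• Prediction of $X$ from $Y$ through a non-linear regression function $g$:

$E(X \mid Y = y) = g(y)$

with $Y \in \mathbb{R}^D$, $X \in \mathbb{R}^L$, $D \gg L$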


A non-linear mapping problem

• Application: Ω mission on Mars → launch of a spectrometer around Mars

• Problem: Retrieving physical properties from hyperspectral images

− Y: spectrum (D=184)

− X: composition of the ground (L=3)

Mars Express - Omega (2004) [http://geops.geol.u-psud.fr/]

[Figure: example spectrum plotted against wavelength, with the target quantities: proportion of dust, proportion of CO2 ice, proportion of water ice]


Some approaches

• Difficulty: D large → curse of dimensionality

• Solutions: via dimensionality reduction

− Reduce the dimension of y before regression, e.g. PCA on y

→ Risk: poor prediction of x

− Take x into account: PLS, SIR, kernel SIR, PC-based methods

→ Two-step approaches, not expressed as a single optimization problem

→ Our approach: inverse regression to reduce dimension


Proposed Method: An inverse regression strategy

• $x \in \mathbb{R}^L$: low-dimensional space,

• $y \in \mathbb{R}^D$: high-dimensional space,

• $(y, x)$ are realizations of $(Y, X) \sim p(Y, X; \theta)$, where $\theta$ denotes the parameters

Inverse conditional density: $p(Y \mid X; \theta)$

• $Y$ is a noisy function of $X$

• Modeled via mixtures → tractable estimation of $\theta$

Forward conditional density: $p(X \mid Y; \theta^*)$, with $\theta^* = f(\theta)$

→ High-to-low prediction, e.g. $\hat{x} = E[X \mid Y = y; \theta^*]$


Student Locally-linear Mapping (SLLiM)

A piecewise affine model:

• Introduce a missing variable $Z$ → $Z = k$ ⇔ $Y$ is the image of $X$ by an affine transformation

$Y = \sum_{k=1}^{K} \mathbb{I}(Z = k)\,(A_k X + b_k + E_k)$

Definition of SLLiM

$p(Y \mid X, Z = k; \theta) = \mathcal{S}(Y; A_k X + b_k, \Sigma_k, \alpha_k^y, \gamma_k^y)$

• Affine transformations are local: mixture of $K$ Student laws

$p(X \mid Z = k; \theta) = \mathcal{S}(X; c_k, \Gamma_k, \alpha_k, 1)$

$p(Z = k; \theta) = \pi_k$

• The set of all model parameters is:

$\theta = \{\pi_k, c_k, \Gamma_k, A_k, b_k, \Sigma_k, \alpha_k\}_{k = 1, \dots, K}$
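The generative reading of this model can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes the Gaussian scale-mixture view of the Student law, with one Gamma-distributed weight per observation shared by X and Y. The function name `sample_sllim` and its argument layout (lists of per-component arrays) are hypothetical.

```python
import numpy as np

def sample_sllim(n, pi, c, Gamma, A, b, Sigma, alpha, seed=0):
    """Draw n pairs (x, y) from a K-component piecewise affine Student model.

    pi: K mixture weights; c[k], Gamma[k]: location and scale of X in component k;
    A[k], b[k], Sigma[k]: affine map and noise scale of Y given X; alpha[k]: tail parameter.
    """
    rng = np.random.default_rng(seed)
    K = len(pi)
    L, D = c[0].shape[0], b[0].shape[0]
    z = rng.choice(K, size=n, p=pi)            # latent component labels Z
    x = np.empty((n, L))
    y = np.empty((n, D))
    for i, k in enumerate(z):
        # Assumed weight variable: U ~ Gamma(shape alpha_k, rate 1)
        u = rng.gamma(shape=alpha[k], scale=1.0)
        x[i] = rng.multivariate_normal(c[k], Gamma[k] / u)
        # Y is the affine image of X plus noise, with the same weight u
        y[i] = rng.multivariate_normal(A[k] @ x[i] + b[k], Sigma[k] / u)
    return x, y, z
```

Small values of `alpha[k]` produce heavier tails, hence more outlying pairs in the sample.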


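Why a Student mixture?

• Dealing with outliers → generalized Student distribution for the joint density of $(X, Y)$

$\mathcal{S}_M(y; \mu, \Sigma, \alpha, \gamma) = \frac{\Gamma(\alpha + M/2)}{|\Sigma|^{1/2}\, \Gamma(\alpha)\, (2\pi\gamma)^{M/2}} \left[1 + \frac{\delta(y, \mu, \Sigma)}{2\gamma}\right]^{-(\alpha + M/2)}$

• Gaussian scale mixture representation (using a weight variable $U$ distributed according to a Gamma distribution)

$\mathcal{S}_M(y; \mu, \Sigma, \alpha, \gamma) = \int_0^{\infty} \mathcal{N}_M(y; \mu, \Sigma/u)\, \mathcal{G}(u; \alpha, \gamma)\, du$

• Parameter estimation is tractable by an EM algorithm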

[Figure: Gaussian density compared with a Student density (α = 0.1), horizontal axis from −6 to 6]
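The generalized Student density can be evaluated directly from its closed form. The sketch below is only an illustration: it assumes that δ(y, µ, Σ) is the squared Mahalanobis distance (y − µ)ᵀ Σ⁻¹ (y − µ) and that the Gamma density uses γ as a rate parameter, which are the conventions under which the closed form and the scale-mixture integral agree. The function name `gen_student_logpdf` is hypothetical.

```python
import math
import numpy as np

def gen_student_logpdf(y, mu, Sigma, alpha, gamma):
    """Log-density of the generalized Student S_M(y; mu, Sigma, alpha, gamma)."""
    y = np.atleast_1d(np.asarray(y, dtype=float))
    mu = np.atleast_1d(np.asarray(mu, dtype=float))
    Sigma = np.atleast_2d(np.asarray(Sigma, dtype=float))
    M = y.shape[0]
    diff = y - mu
    # Assumed definition of delta: squared Mahalanobis distance
    delta = float(diff @ np.linalg.solve(Sigma, diff))
    _, logdet = np.linalg.slogdet(Sigma)
    return (math.lgamma(alpha + M / 2) - math.lgamma(alpha)
            - 0.5 * logdet
            - 0.5 * M * math.log(2 * math.pi * gamma)
            - (alpha + M / 2) * math.log1p(delta / (2 * gamma)))
```

With M = 1 and α = γ = ν/2, this reduces to the usual Student t density with ν degrees of freedom, which gives a quick sanity check.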


Low-to-high (Inverse) Regression

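• If $X$ and $Y$ are both observed

− The parameter vector $\theta$ can be estimated in closed form using an EM inference procedure

− This yields the inverse conditional density, which is a Student mixture:

$p(Y \mid X; \theta) = \sum_{k=1}^{K} \frac{\pi_k\, \mathcal{S}(X; c_k, \Gamma_k, \alpha_k, 1)}{\sum_{j=1}^{K} \pi_j\, \mathcal{S}(X; c_j, \Gamma_j, \alpha_j, 1)}\, \mathcal{S}(Y; A_k X + b_k, \Sigma_k, \alpha_k^y, \gamma_k^y)$

• Both conditional densities are Student mixtures parameterized by $\theta$, which gives:

− A low-to-high inverse regression function:

$E[Y \mid X = x; \theta] = \sum_{k=1}^{K} \frac{\pi_k\, \mathcal{S}(x; c_k, \Gamma_k, \alpha_k, 1)}{\sum_{j=1}^{K} \pi_j\, \mathcal{S}(x; c_j, \Gamma_j, \alpha_j, 1)}\, (A_k x + b_k),$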
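The regression function is a weighted sum of affine predictions, with weights given by normalized Student densities. A minimal Python sketch follows, assuming δ is the squared Mahalanobis distance; the helper names `log_student` and `mixture_affine_mean` are hypothetical, and the weights are normalized in the log domain for numerical stability.

```python
import math
import numpy as np

def log_student(x, c, Gamma, alpha):
    """Log of S(x; c, Gamma, alpha, 1), the generalized Student density with gamma = 1."""
    M = x.shape[0]
    diff = x - c
    delta = float(diff @ np.linalg.solve(Gamma, diff))  # assumed: squared Mahalanobis distance
    _, logdet = np.linalg.slogdet(Gamma)
    return (math.lgamma(alpha + M / 2) - math.lgamma(alpha) - 0.5 * logdet
            - 0.5 * M * math.log(2 * math.pi)
            - (alpha + M / 2) * math.log1p(delta / 2))

def mixture_affine_mean(x, pi, c, Gamma, alpha, A, b):
    """Sum over k of w_k(x) (A_k x + b_k), with w_k proportional to pi_k S(x; c_k, Gamma_k, alpha_k, 1)."""
    x = np.asarray(x, dtype=float)
    K = len(pi)
    logw = np.array([math.log(pi[k]) + log_student(x, c[k], Gamma[k], alpha[k])
                     for k in range(K)])
    w = np.exp(logw - logw.max())   # subtract the max before exponentiating
    w /= w.sum()
    return sum(w[k] * (A[k] @ x + b[k]) for k in range(K))
```

The same function evaluates a regression in the other direction when it is fed the corresponding set of component parameters.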


High-to-low (Forward) Regression

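• The forward conditional density is a Student mixture as well:

$p(X \mid Y; \theta^*) = \sum_{k=1}^{K} \frac{\pi_k^*\, \mathcal{S}(Y; c_k^*, \Gamma_k^*, \alpha_k, 1)}{\sum_{j=1}^{K} \pi_j^*\, \mathcal{S}(Y; c_j^*, \Gamma_j^*, \alpha_j, 1)}\, \mathcal{S}(X; A_k^* Y + b_k^*, \Sigma_k^*, \alpha_k^x, \gamma_k^x)$

• The forward parameter vector $\theta^*$ has an analytic expression as a function of $\theta$

• Both conditional densities are Student mixtures parameterized by $\theta$, which gives:

− A high-to-low forward regression function:

$E[X \mid Y = y; \theta^*] = \sum_{k=1}^{K} \frac{\pi_k^*\, \mathcal{S}(y; c_k^*, \Gamma_k^*, \alpha_k, 1)}{\sum_{j=1}^{K} \pi_j^*\, \mathcal{S}(y; c_j^*, \Gamma_j^*, \alpha_j, 1)}\, (A_k^* y + b_k^*).$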


The forward parameter vector θ∗ from θ

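$c_k^* = A_k c_k + b_k,$

$\Gamma_k^* = \Sigma_k + A_k \Gamma_k A_k^\top,$

$A_k^* = \Sigma_k^* A_k^\top \Sigma_k^{-1},$

$b_k^* = \Sigma_k^* (\Gamma_k^{-1} c_k - A_k^\top \Sigma_k^{-1} b_k),$

$\Sigma_k^* = (\Gamma_k^{-1} + A_k^\top \Sigma_k^{-1} A_k)^{-1}.$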
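For one component, the mapping from θ to θ∗ is a handful of matrix operations. The sketch below is a plain NumPy transcription of the five formulas above for a single k; the function name `forward_params` is hypothetical, and explicit inverses are used for readability rather than numerical efficiency.

```python
import numpy as np

def forward_params(c, Gamma, A, b, Sigma):
    """Forward parameters (c*, Gamma*, A*, b*, Sigma*) of one component.

    c: (L,), Gamma: (L, L), A: (D, L), b: (D,), Sigma: (D, D).
    """
    Gamma_inv = np.linalg.inv(Gamma)
    Sigma_inv = np.linalg.inv(Sigma)
    c_star = A @ c + b
    Gamma_star = Sigma + A @ Gamma @ A.T
    Sigma_star = np.linalg.inv(Gamma_inv + A.T @ Sigma_inv @ A)
    A_star = Sigma_star @ A.T @ Sigma_inv
    b_star = Sigma_star @ (Gamma_inv @ c - A.T @ Sigma_inv @ b)
    return c_star, Gamma_star, A_star, b_star, Sigma_star
```

A useful check is that these expressions agree with conditioning the joint location and scale of (X, Y) on Y: A∗ equals Γ Aᵀ (Γ∗)⁻¹ and b∗ equals c − A∗ c∗.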


A joint model approach to reduce the number of parameters

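• Joint model

$p(X = x, Y = y \mid Z = k) = \mathcal{S}_{L+D}\left(\begin{bmatrix} x \\ y \end{bmatrix}; m_k, V_k, \alpha_k, 1\right)$

with $m_k = \begin{bmatrix} c_k \\ A_k c_k + b_k \end{bmatrix}$ and $V_k = \begin{bmatrix} \Gamma_k & \Gamma_k A_k^\top \\ A_k \Gamma_k & \Sigma_k + A_k \Gamma_k A_k^\top \end{bmatrix}$

• Reduce the number of parameters to estimate

− Forward strategy + $\Gamma_k$ diagonal

∗ nb. par. = $\frac{1}{2} D(D - 1) + DL + 2L + D$

∗ $D = 500$, $L = 2$ → 126 254 parameters

− Inverse strategy + $\Sigma_k$ diagonal

∗ nb. par. = $\frac{1}{2} L(L - 1) + DL + 2D + L$

∗ $D = 500$, $L = 2$ → 2 003 parameters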
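The two counts can be checked with a few lines of Python. The sketch reads the leading coefficient of each formula as one half, which is the reading that reproduces both quoted totals; the function names are hypothetical.

```python
def n_params_forward(D, L):
    """Parameter count quoted for the forward strategy with Gamma_k diagonal."""
    return D * (D - 1) // 2 + D * L + 2 * L + D

def n_params_inverse(D, L):
    """Parameter count quoted for the inverse strategy with Sigma_k diagonal."""
    return L * (L - 1) // 2 + D * L + 2 * D + L

print(n_params_forward(500, 2))  # 126254
print(n_params_inverse(500, 2))  # 2003
```

The forward count grows quadratically in D, while the inverse count grows only linearly in D, which is the motivation for the inverse strategy when D is much larger than L.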


Extension to partially observed responses

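• Incorporate a latent component into the low-dimensional variable:

$X = \begin{bmatrix} T \\ W \end{bmatrix}$

where $T \in \mathbb{R}^{L_t}$ is observed and $W \in \mathbb{R}^{L_w}$ is latent ($L = L_t + L_w$)

• Example on Mars data: lighting? temperature? grain size?

• Observed pairs $(y_n, T_n)$, $n = 1, \dots, N$ ($T \in \mathbb{R}^{L_t}$)

• Additional latent variable $W$ ($W \in \mathbb{R}^{L_w}$)

• Assuming the independence of $T$ and $W$ given $Z$:

$p(X = (T, W)^\top \mid Z = k) = \mathcal{S}_L((T, W)^\top; c_k, \Gamma_k, \alpha_k, 1)$

with $c_k = \begin{bmatrix} c_k^t \\ c_k^w \end{bmatrix}$, $\Gamma_k = \begin{bmatrix} \Gamma_k^t & 0 \\ 0 & \Gamma_k^w \end{bmatrix}$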


Extension to partially observed responses

• Extension of SLLiM to a more general covariance structure

• With $A_k = [A_k^t \;\; A_k^w]$,

$Y = \sum_{k=1}^{K} \mathbb{I}(Z = k)\,(A_k^t T + A_k^w W + b_k + E_k)$

rewrites as

$Y = \sum_{k=1}^{K} \mathbb{I}(Z = k)\,(A_k^t T + b_k + E'_k)$

with $\mathrm{Var}(E'_k) \propto \Sigma_k + A_k^w A_k^{w\top}$

− Diagonal $\Sigma_k$ → factor analysis with (at most) $L_w$ factors

− A compromise between full $O(D^2)$ and diagonal $O(D)$ covariances
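The resulting noise structure is a diagonal matrix plus a term of rank at most Lw. The sketch below builds it with NumPy and gives rough parameter counts for the three covariance models. It is an illustration only: the function names are hypothetical, the covariance is built up to the proportionality constant, and the low-rank count simply counts every entry of the latent loading matrix without correcting for rotation redundancy.

```python
import numpy as np

def marginal_noise_cov(sigma_diag, A_w):
    """Diagonal-plus-low-rank matrix Sigma_k + A_k^w (A_k^w)^T.

    sigma_diag: (D,) diagonal of Sigma_k; A_w: (D, L_w) latent loading matrix.
    """
    return np.diag(sigma_diag) + A_w @ A_w.T

def n_cov_params(D, L_w):
    """Rough free-parameter counts for the three covariance models."""
    return {
        "diagonal": D,
        "low_rank_plus_diagonal": D + D * L_w,  # upper bound: every entry of A_w counted
        "full": D * (D + 1) // 2,
    }
```

For example, `n_cov_params(184, 2)` contrasts the three options at the dimension of the Mars spectra.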


Estimation of $\theta = (c_k, \Gamma_k, A_k, b_k, \Sigma_k, \pi_k, \alpha_k)_{1 \le k \le K}$ by EM algorithm

• E-step

− Update posterior probabilities

($E_Z$) $p(Z = k \mid t, y, \theta^{(i)})$ → “SMM-like”

($E_W$) $p(W \mid Z = k, t, y, \theta^{(i)})$ → probabilistic PCA or factor analysis like

($E_U$) $E(U \mid Z = k, t, y, \theta^{(i)})$ → down-weighting of extreme/atypical values in the estimators → more robust

• M-step

($M_X$) $(\pi_k, c_k, \Gamma_k)$ → “SMM-like”

($M_{Y|X}$) $(A_k, b_k, \Sigma_k)$ → hybrid between linear regression and PPCA/FA

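$A_k = Y_k X_k^\top \left( \begin{bmatrix} 0 & 0 \\ 0 & S_k^w \end{bmatrix} + X_k X_k^\top \right)^{-1}$, where $S_k^w$ is the posterior covariance of $W$ from step ($E_W$)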

(Mα) αk → Not in closed-form but standard (specific to Student)
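The down-weighting effect of the weight variable U can be illustrated on a single generalized Student component. This is a sketch under stated assumptions, not the full SLLiM E-step: it ignores the conditioning on Z and on the joint (t, y) vector, and it assumes U follows a Gamma law with shape α and rate γ and that δ is the squared Mahalanobis distance. Under these assumptions, Gamma conjugacy gives a posterior for U with shape α + M/2 and rate γ + δ/2, hence the mean below. The function name `expected_weight` is hypothetical.

```python
import numpy as np

def expected_weight(y, mu, Sigma, alpha, gamma=1.0):
    """Posterior mean E[U | y] for one generalized Student component.

    Equals (alpha + M/2) / (gamma + delta/2), where M is the dimension of y
    and delta is the squared Mahalanobis distance of y from mu.
    """
    y = np.asarray(y, dtype=float)
    mu = np.asarray(mu, dtype=float)
    M = y.shape[0]
    diff = y - mu
    delta = float(diff @ np.linalg.solve(Sigma, diff))
    return (alpha + M / 2) / (gamma + delta / 2)
```

A point at the center receives the largest weight, and the weight decays as the Mahalanobis distance grows, so atypical observations contribute less to the weighted estimators of the M-step.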


Application L = D = 1

• RATP → subway in Paris

• Measure of air quality at Chatelet station, line 4

• March 2015 → N = 341 measurements

• Prediction of NO (L = 1) from NO2 (D = 1)

→ Robustness of SLLiM


Application L = D = 1 / SLLiM compared to GLLiM

[Figure: two panels comparing the GLLiM and SLLiM fits, horizontal axis from 20 to 80]

→ Illustration of robustness of the proposed model


Application L = D = 1 / SLLiM compared to GLLiM

[Figure: comparison of GLLiM, SLLiM, GLLiM-WO and SLLiM-WO, horizontal axis from 1 to 10]

→ SLLiM achieves better prediction rates than GLLiM on complete data

→ SLLiM becomes equivalent to GLLiM when outliers are removed


Other applications and augmented version of SLLiM

• Application when $D \gg L$

− Hyperspectral data on Mars (D=184, L=2, N=6983)

→ Comparison with other non-linear regression methods

Table: Mars data: average NRMSE (standard deviations in parentheses) for the proportions of CO2 ice and dust over 100 runs.

Method          Prop. of CO2 ice    Prop. of dust
SLLiM (K=10)    0.168 (0.019)       0.145 (0.020)
GLLiM (K=10)    0.180 (0.023)       0.155 (0.023)
MARS            0.173 (0.016)       0.160 (0.021)
SIR             0.243 (0.025)       0.157 (0.016)
RVM             0.299 (0.021)       0.275 (0.034)
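The slides do not state how the NRMSE is normalized. One common convention, assumed here purely for illustration, divides the squared prediction error by the squared deviation of the targets from a reference mean, so that a constant prediction equal to that mean scores 1. The function name `nrmse` and the `t_ref_mean` argument are hypothetical.

```python
import numpy as np

def nrmse(t_true, t_pred, t_ref_mean=None):
    """Normalized root mean squared error under an assumed normalization.

    t_ref_mean defaults to the mean of t_true; pass a training-set mean to
    normalize against the trivial 'predict the training mean' baseline.
    """
    t_true = np.asarray(t_true, dtype=float)
    t_pred = np.asarray(t_pred, dtype=float)
    if t_ref_mean is None:
        t_ref_mean = t_true.mean()
    num = np.sum((t_pred - t_true) ** 2)
    den = np.sum((t_true - t_ref_mean) ** 2)
    return float(np.sqrt(num / den))
```

Under this convention, values below 1 indicate a method that does better than the constant baseline.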


Results - Application to hyperspectral image analysis

[Figure: proportion of CO2 ice and proportion of dust obtained with GLLiM, SLLiM and Splines]


Conclusion and future work

• Mixture model used for prediction

• Addition of latent variables to handle partially observed responses

• Selection of K and Lw

− K fixed? Or selected by BIC?

− Lw selected by BIC?

Thank you for your attention! Any questions?
