Inverse regression approach to (robust) non-linear high-to-low dimensional mapping
Emeline Perthame
Joint work with Florence Forbes
INRIA, team MISTIS, Grenoble
LMNO, Caen
October 27, 2016
Outline
1. Non-linear mapping problem
2. GLLiM/SLLiM: inverse regression approach
3. Estimation of parameters
4. Results and conclusion
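A non-linear mapping problem
• Mapping a high-dimensional observation y to a low-dimensional response x:
$$ y = \begin{pmatrix} y_1 \\ \vdots \\ y_D \end{pmatrix} \;\xrightarrow{\;g(y)\;}\; \begin{pmatrix} x_1 \\ \vdots \\ x_L \end{pmatrix} = x $$
• Prediction of X from Y through a non-linear regression function g:
$$ \mathbb{E}(X \mid Y = y) = g(y), \quad \text{with } Y \in \mathbb{R}^D,\; X \in \mathbb{R}^L,\; D \gg L $$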
A non-linear mapping problem
• Application: Ω mission on Mars → launch of a spectrometer around Mars
• Problem: Retrieving physical properties from hyperspectral images
− Y: spectrum (D=184)
− X: composition of the ground (L=3)
Mars Express - Omega (2004) [http://geops.geol.u-psud.fr/]
[Figure: example spectrum, reflectance against wavelength; quantities to predict: proportion of dust, proportion of CO2 ice, proportion of water ice]
Some approaches
• Difficulty: D is large → curse of dimensionality
• Solutions: via dimensionality reduction
− Reduce the dimension of y before regression, e.g. PCA on y
→ Risk: poor prediction of x
− Take x into account: PLS, SIR, Kernel SIR, PC-based methods
→ Two-step approaches, not expressed as a single optimization problem
→ Our approach: inverse regression to reduce dimension
Proposed Method: An inverse regression strategy
• $x \in \mathbb{R}^L$: low-dimensional space,
• $y \in \mathbb{R}^D$: high-dimensional space,
• $(y, x)$ are realizations of $(Y, X) \sim p(Y, X; \theta)$, with parameters $\theta$
Inverse conditional density: $p(Y \mid X; \theta)$
• Y is a noisy function of X
• Modeled via mixtures → tractable estimation of $\theta$
Forward conditional density: $p(X \mid Y; \theta^*)$, with $\theta^* = f(\theta)$
→ High-to-low prediction, e.g. $\hat{x} = \mathbb{E}[X \mid Y = y; \theta^*]$
Student Locally-linear Mapping (SLLiM)
A piecewise affine model:
• Introduce a missing variable Z: Z = k ⇔ Y is the image of X by an affine transformation
$$ Y = \sum_{k=1}^{K} \mathbb{I}(Z = k)\,(A_k X + b_k + E_k) $$
Definition of SLLiM
$$ p(Y \mid X, Z = k; \theta) = \mathcal{S}(Y; A_k X + b_k, \Sigma_k, \alpha_k^y, \gamma_k^y) $$
• Affine transformations are local: mixture of K Student laws
$$ p(X \mid Z = k; \theta) = \mathcal{S}(X; c_k, \Gamma_k, \alpha_k, 1) $$
$$ p(Z = k; \theta) = \pi_k $$
• The set of all model parameters is:
$$ \theta = \{\pi_k, c_k, \Gamma_k, A_k, b_k, \Sigma_k, \alpha_k\}_{k=1}^{K} $$
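The generative model above can be simulated with a short sketch. It relies on an assumption that the slide does not state: within a component, X and Y share a single Gamma-distributed weight U, which is one way to obtain a jointly Student pair with γ = 1. The function name `sample_sllim` and all numerical values are hypothetical.

```python
import numpy as np

def sample_sllim(n, pi, c, Gamma, A, b, Sigma, alpha, rng):
    """Draw n pairs (x, y) from a piecewise affine Student model.

    Assumption (not on the slide): X and Y share one weight U ~ Gamma(alpha_k, 1).
    Shapes: pi (K,), c (K, L), Gamma (K, L, L), A (K, D, L), b (K, D),
    Sigma (K, D, D), alpha (K,).
    """
    K, L, D = len(pi), c.shape[1], b.shape[1]
    z = rng.choice(K, size=n, p=pi)             # component labels Z
    u = rng.gamma(shape=alpha[z], scale=1.0)    # one weight U per sample
    x = np.empty((n, L))
    y = np.empty((n, D))
    for i in range(n):
        k = z[i]
        # X | Z = k, U = u  ~  N(c_k, Gamma_k / u)
        x[i] = rng.multivariate_normal(c[k], Gamma[k] / u[i])
        # Y | X = x, Z = k, U = u  ~  N(A_k x + b_k, Sigma_k / u)
        y[i] = rng.multivariate_normal(A[k] @ x[i] + b[k], Sigma[k] / u[i])
    return x, y, z

# Hypothetical toy parameters: K = 2 components, L = 1, D = 3.
rng = np.random.default_rng(0)
pi = np.array([0.4, 0.6])
c = np.array([[-2.0], [2.0]])
Gamma = np.array([[[1.0]], [[0.5]]])
A = np.array([[[1.0], [0.0], [-1.0]], [[0.5], [2.0], [1.0]]])
b = np.zeros((2, 3))
Sigma = np.stack([0.1 * np.eye(3)] * 2)
alpha = np.array([2.0, 5.0])
x, y, z = sample_sllim(500, pi, c, Gamma, A, b, Sigma, alpha, rng)
```

Small values of U inflate both covariances at once, which is how this construction produces occasional outlying pairs.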
Why a Student mixture ?
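• Dealing with outliers → generalized Student distribution for the joint density of (X, Y)
$$ \mathcal{S}_M(y; \mu, \Sigma, \alpha, \gamma) = \frac{\Gamma(\alpha + M/2)}{|\Sigma|^{1/2}\, \Gamma(\alpha)\, (2\pi\gamma)^{M/2}} \left[ 1 + \frac{\delta(y, \mu, \Sigma)}{2\gamma} \right]^{-(\alpha + M/2)} $$
where $\delta(y, \mu, \Sigma) = (y - \mu)^\top \Sigma^{-1} (y - \mu)$ is the squared Mahalanobis distance
• Gaussian scale mixture representation (using a weight variable U distributed according to a Gamma distribution)
$$ \mathcal{S}_M(y; \mu, \Sigma, \alpha, \gamma) = \int_0^\infty \mathcal{N}_M(y; \mu, \Sigma/u)\; \mathcal{G}(u; \alpha, \gamma)\, du $$
• Parameter estimation is tractable with an EM algorithm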
[Figure: Gaussian density compared with a Student density (α = 0.1); x-axis: x, y-axis: density]
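The two expressions of the generalized Student density can be checked against each other numerically. The sketch below evaluates the closed form and, in dimension M = 1, integrates the Gaussian scale mixture on a grid. It assumes that G(u; α, γ) is a Gamma density with shape α and rate γ; the function names and the grid settings are hypothetical.

```python
import math
import numpy as np

def gen_student_logpdf(y, mu, Sigma, alpha, gamma):
    """Log of the closed-form generalized Student density S_M."""
    y = np.atleast_1d(y).astype(float)
    mu = np.atleast_1d(mu).astype(float)
    Sigma = np.atleast_2d(Sigma).astype(float)
    M = y.shape[0]
    diff = y - mu
    delta = diff @ np.linalg.solve(Sigma, diff)   # squared Mahalanobis distance
    _, logdet = np.linalg.slogdet(Sigma)
    return (math.lgamma(alpha + M / 2) - math.lgamma(alpha) - 0.5 * logdet
            - 0.5 * M * math.log(2 * math.pi * gamma)
            - (alpha + M / 2) * math.log1p(delta / (2 * gamma)))

def scale_mixture_pdf_1d(y, mu, sigma2, alpha, gamma, n_grid=200_000, u_max=80.0):
    """Trapezoid-rule integral of N(y; mu, sigma2 / u) G(u; alpha, gamma) over u."""
    u = np.linspace(1e-9, u_max, n_grid)
    normal = np.sqrt(u / (2 * np.pi * sigma2)) * np.exp(-0.5 * u * (y - mu) ** 2 / sigma2)
    # Gamma density with shape alpha and rate gamma (assumed parameterization).
    gamma_pdf = np.exp(alpha * math.log(gamma) - math.lgamma(alpha)
                       + (alpha - 1) * np.log(u) - gamma * u)
    f = normal * gamma_pdf
    return float(np.sum((f[1:] + f[:-1]) * np.diff(u)) / 2)

closed = math.exp(gen_student_logpdf(1.5, 0.0, 1.0, alpha=2.0, gamma=1.0))
numeric = scale_mixture_pdf_1d(1.5, 0.0, 1.0, alpha=2.0, gamma=1.0)
```

Under these assumptions `closed` and `numeric` agree up to the quadrature error, which illustrates why the weight variable U can be treated as a latent variable in EM.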
Low-to-high (Inverse) Regression
• If X and Y are both observed
− The parameter vector θ can be estimated in closed form using an EM inference procedure
− This yields the inverse conditional density, which is a Student mixture:
$$ p(Y \mid X; \theta) = \sum_{k=1}^{K} \frac{\pi_k\, \mathcal{S}(X; c_k, \Gamma_k, \alpha_k, 1)}{\sum_{j=1}^{K} \pi_j\, \mathcal{S}(X; c_j, \Gamma_j, \alpha_j, 1)}\; \mathcal{S}(Y; A_k X + b_k, \Sigma_k, \alpha_k^y, \gamma_k^y) $$
• Both densities are Student mixtures parameterized by θ. We therefore obtain:
− A low-to-high inverse regression function:
$$ \mathbb{E}[Y \mid X = x; \theta] = \sum_{k=1}^{K} \frac{\pi_k\, \mathcal{S}(x; c_k, \Gamma_k, \alpha_k, 1)}{\sum_{j=1}^{K} \pi_j\, \mathcal{S}(x; c_j, \Gamma_j, \alpha_j, 1)}\; (A_k x + b_k) $$
High-to-low (Forward) Regression
• The forward conditional density is a Student mixture as well:
$$ p(X \mid Y; \theta^*) = \sum_{k=1}^{K} \frac{\pi_k^*\, \mathcal{S}(Y; c_k^*, \Gamma_k^*, \alpha_k, 1)}{\sum_{j=1}^{K} \pi_j^*\, \mathcal{S}(Y; c_j^*, \Gamma_j^*, \alpha_j, 1)}\; \mathcal{S}(X; A_k^* Y + b_k^*, \Sigma_k^*, \alpha_k^x, \gamma_k^x) $$
• The forward parameter vector θ* has an analytic expression as a function of θ
• Since both densities are Student mixtures, we obtain:
− A high-to-low forward regression function:
$$ \mathbb{E}[X \mid Y = y; \theta^*] = \sum_{k=1}^{K} \frac{\pi_k^*\, \mathcal{S}(y; c_k^*, \Gamma_k^*, \alpha_k, 1)}{\sum_{j=1}^{K} \pi_j^*\, \mathcal{S}(y; c_j^*, \Gamma_j^*, \alpha_j, 1)}\; (A_k^* y + b_k^*) $$
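The forward regression function is a weighted sum of K affine predictions, with weights given by normalized Student densities of y. A minimal sketch follows, assuming the forward parameters are already available as arrays; the names `student_logpdf` and `forward_predict` and the toy values are hypothetical.

```python
import math
import numpy as np

def student_logpdf(y, mu, Sigma, alpha, gamma):
    """Log of the generalized Student density S_M(y; mu, Sigma, alpha, gamma)."""
    M = y.shape[0]
    diff = y - mu
    delta = diff @ np.linalg.solve(Sigma, diff)   # squared Mahalanobis distance
    _, logdet = np.linalg.slogdet(Sigma)
    return (math.lgamma(alpha + M / 2) - math.lgamma(alpha) - 0.5 * logdet
            - 0.5 * M * math.log(2 * math.pi * gamma)
            - (alpha + M / 2) * math.log1p(delta / (2 * gamma)))

def forward_predict(y, pi_s, c_s, Gamma_s, A_s, b_s, alpha):
    """E[X | Y = y]: mixture of the K affine maps A*_k y + b*_k."""
    K = len(pi_s)
    log_w = np.array([math.log(pi_s[k])
                      + student_logpdf(y, c_s[k], Gamma_s[k], alpha[k], 1.0)
                      for k in range(K)])
    w = np.exp(log_w - log_w.max())   # subtract the max for numerical stability
    w /= w.sum()                      # weights now sum to one
    return sum(w[k] * (A_s[k] @ y + b_s[k]) for k in range(K))

# Hypothetical forward parameters: K = 2, D = 3, L = 1.
y = np.array([0.5, -1.0, 2.0])
pi_s = np.array([0.3, 0.7])
c_s = np.array([[0.0, 0.0, 0.0], [1.0, -1.0, 2.0]])
Gamma_s = np.stack([np.eye(3), 2.0 * np.eye(3)])
A_s = np.array([[[1.0, 0.0, 0.0]], [[0.0, 1.0, 1.0]]])   # shape (K, L, D)
b_s = np.array([[0.0], [0.5]])
alpha = np.array([2.0, 3.0])
x_hat = forward_predict(y, pi_s, c_s, Gamma_s, A_s, b_s, alpha)
```

Working with log-weights avoids underflow when D is large, which is the regime the slides target.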
The forward parameter vector θ∗ from θ
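$$
\begin{aligned}
c_k^* &= A_k c_k + b_k, \\
\Gamma_k^* &= \Sigma_k + A_k \Gamma_k A_k^\top, \\
A_k^* &= \Sigma_k^* A_k^\top \Sigma_k^{-1}, \\
b_k^* &= \Sigma_k^* \left( \Gamma_k^{-1} c_k - A_k^\top \Sigma_k^{-1} b_k \right), \\
\Sigma_k^* &= \left( \Gamma_k^{-1} + A_k^\top \Sigma_k^{-1} A_k \right)^{-1}.
\end{aligned}
$$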
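These formulas translate directly into a few lines of linear algebra. The sketch below handles a single component k; the function name `forward_from_inverse` and the random test values are hypothetical, and explicit inverses are used only for readability.

```python
import numpy as np

def forward_from_inverse(c, Gamma, A, b, Sigma):
    """Forward parameters of one component from its inverse parameters.

    Shapes: c (L,), Gamma (L, L), A (D, L), b (D,), Sigma (D, D).
    """
    Gamma_inv = np.linalg.inv(Gamma)
    Sigma_inv = np.linalg.inv(Sigma)
    c_star = A @ c + b
    Gamma_star = Sigma + A @ Gamma @ A.T
    Sigma_star = np.linalg.inv(Gamma_inv + A.T @ Sigma_inv @ A)
    A_star = Sigma_star @ A.T @ Sigma_inv
    b_star = Sigma_star @ (Gamma_inv @ c - A.T @ Sigma_inv @ b)
    return c_star, Gamma_star, A_star, b_star, Sigma_star

# Hypothetical values with L = 2 and D = 5.
rng = np.random.default_rng(0)
L, D = 2, 5
c = rng.normal(size=L)
Gamma = np.diag([1.0, 0.5])
A = rng.normal(size=(D, L))
b = rng.normal(size=D)
Sigma = np.diag(rng.uniform(0.1, 1.0, size=D))
c_s, Gamma_s, A_s, b_s, Sigma_s = forward_from_inverse(c, Gamma, A, b, Sigma)
```

Only L x L systems and the covariance Σ_k need inverting, so with a diagonal Σ_k no dense D x D inverse is required. As a sanity check, the forward map sends the forward center back to the inverse center: A*_k c*_k + b*_k = c_k.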
A joint model approach to reduce the number of parameters
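• Joint model
$$ p(X = x, Y = y \mid Z = k) = \mathcal{S}_{L+D}\!\left( \begin{bmatrix} x \\ y \end{bmatrix}; m_k, V_k, \alpha_k, 1 \right) $$
with
$$ m_k = \begin{bmatrix} c_k \\ A_k c_k + b_k \end{bmatrix} \quad \text{and} \quad V_k = \begin{bmatrix} \Gamma_k & \Gamma_k A_k^\top \\ A_k \Gamma_k & \Sigma_k + A_k \Gamma_k A_k^\top \end{bmatrix} $$
• Reduce the number of parameters to estimate
− Forward strategy + $\Gamma_k$ diagonal
∗ nb. par. = $\tfrac{1}{2} D(D-1) + DL + 2L + D$
∗ D = 500, L = 2 → 126 254 parameters
− Inverse strategy + $\Sigma_k$ diagonal
∗ nb. par. = $\tfrac{1}{2} L(L-1) + DL + 2D + L$
∗ D = 500, L = 2 → 2 003 parameters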
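The two counts can be reproduced with a few lines of arithmetic. The function names below are hypothetical; the formulas are the ones given on the slide.

```python
def n_params_forward(D, L):
    """Count from the slide for the forward strategy with diagonal Gamma_k."""
    return D * (D - 1) // 2 + D * L + 2 * L + D

def n_params_inverse(D, L):
    """Count from the slide for the inverse strategy with diagonal Sigma_k."""
    return L * (L - 1) // 2 + D * L + 2 * D + L

forward_count = n_params_forward(500, 2)   # 126254
inverse_count = n_params_inverse(500, 2)   # 2003
```

The quadratic term is in D for the forward strategy and in L for the inverse one, which is where the saving comes from when D is much larger than L.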
Extension to partially observed responses
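• Incorporate a latent component into the low-dimensional variable:
$$ X = \begin{bmatrix} T \\ W \end{bmatrix} $$
where $T \in \mathbb{R}^{L_t}$ is observed and $W \in \mathbb{R}^{L_w}$ is latent ($L = L_t + L_w$)
• Example on Mars data: lighting? temperature? grain size?
• Observed pairs $(y_n, T_n)$, $n = 1, \dots, N$ ($T \in \mathbb{R}^{L_t}$)
• Additional latent variable W ($W \in \mathbb{R}^{L_w}$)
• Assuming the independence of T and W given Z:
$$ p(X = (T, W)^\top \mid Z = k) = \mathcal{S}_L((T, W)^\top; c_k, \Gamma_k, \alpha_k, 1) $$
with
$$ c_k = \begin{bmatrix} c_k^t \\ 0 \end{bmatrix}, \quad \Gamma_k = \begin{bmatrix} \Gamma_k^t & 0 \\ 0 & I_{L_w} \end{bmatrix} $$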
Extension to partially observed responses
• Extension of SLLiM to a more general covariance structure
• With $A_k = [A_k^t \;\; A_k^w]$,
$$ Y = \sum_{k=1}^{K} \mathbb{I}(Z = k)\,(A_k^t T + A_k^w W + b_k + E_k) $$
rewrites as
$$ Y = \sum_{k=1}^{K} \mathbb{I}(Z = k)\,(A_k^t T + b_k + E'_k) $$
with $\mathrm{Var}(E'_k) \propto \Sigma_k + A_k^w A_k^{w\top}$
− Diagonal $\Sigma_k$ → factor analysis with (at most) $L_w$ factors
− A compromise between full $O(D^2)$ and diagonal $O(D)$ covariances
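To make the compromise concrete, the sketch below counts the free entries of the three covariance structures and builds a matrix of the form Σ_k + A_k^w A_k^{w⊤}. The counts are raw entry counts (they ignore the rotational indeterminacy of factor loadings); D = 184 is taken from the Mars example, while L_w = 2, the function names and the random values are hypothetical.

```python
import numpy as np

def cov_param_counts(D, Lw):
    """Raw number of free entries for three D x D covariance structures."""
    return {
        "full": D * (D + 1) // 2,           # symmetric matrix, O(D^2)
        "diagonal": D,                      # O(D)
        "diagonal + factors": D + D * Lw,   # diagonal Sigma_k plus D x Lw loadings
    }

def marginal_noise_cov(sigma_diag, Aw):
    """Sigma_k + A_k^w A_k^w^T for a diagonal Sigma_k."""
    return np.diag(sigma_diag) + Aw @ Aw.T

counts = cov_param_counts(D=184, Lw=2)

rng = np.random.default_rng(0)
Aw = rng.normal(size=(6, 2))                     # toy loadings, D = 6, Lw = 2
V = marginal_noise_cov(np.full(6, 0.1), Aw)
low_rank = np.linalg.matrix_rank(Aw @ Aw.T)      # at most Lw
```

The resulting matrix is dense, so it captures correlations between the D coordinates, yet it is described by a number of entries that grows linearly in D.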
Estimation of $\theta = (c_k, \Gamma_k, A_k, b_k, \Sigma_k, \pi_k, \alpha_k)_{1 \le k \le K}$ by an EM algorithm
• E-step
− Update posterior probabilities
(E-Z) $p(Z = k \mid t, y, \theta^{(i)})$ → "SMM-like"
(E-W) $p(W \mid Z = k, t, y, \theta^{(i)})$ → probabilistic PCA or factor-analysis-like
(E-U) $\mathbb{E}(U \mid Z = k, t, y, \theta^{(i)})$ → down-weights extreme/atypical values in the estimators → more robust
• M-step
(M-X) $(\pi_k, c_k, \Gamma_k)$ → "SMM-like"
(M-Y|X) $(A_k, b_k, \Sigma_k)$ → hybrid between linear regression and PPCA/FA
$$ A_k = Y_k X_k^\top \left( \begin{bmatrix} 0 & 0 \\ 0 & S_k^w \end{bmatrix} + X_k X_k^\top \right)^{-1} $$
(M-α) $\alpha_k$ → not in closed form but standard (specific to Student)
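The down-weighting performed in the (EU) step can be illustrated on a toy problem. The exact SLLiM update is not given on the slide; the sketch below only uses the generic consequence of the Gaussian scale mixture representation: if U follows a Gamma(α, γ) law (shape, rate) and y given U = u is N_M(μ, Σ/u), then E[U | y] = (α + M/2) / (γ + δ(y, μ, Σ)/2). The function names, the fixed variance and the data are hypothetical.

```python
def expected_weight(delta, alpha, gamma, M):
    """E[U | y] for squared Mahalanobis distance delta in dimension M."""
    return (alpha + M / 2) / (gamma + delta / 2)

def robust_location(ys, alpha=1.0, gamma=1.0, sigma2=1.0, n_iter=50):
    """EM-style estimate of a 1-D Student location with known scale sigma2."""
    mu = sum(ys) / len(ys)                       # start from the plain mean
    for _ in range(n_iter):
        # E-step: one weight per observation, small for atypical values.
        w = [expected_weight((y - mu) ** 2 / sigma2, alpha, gamma, 1) for y in ys]
        # M-step: weighted mean.
        mu = sum(wi * yi for wi, yi in zip(w, ys)) / sum(w)
    return mu

ys = [1.0, 1.2, 0.8, 1.1, 25.0]                  # last value is an outlier
plain_mean = sum(ys) / len(ys)
robust_mean = robust_location(ys)
```

The weight decreases as δ grows, so the observation at 25.0 contributes very little to `robust_mean`, whereas it pulls `plain_mean` far away from the bulk of the data.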
Application L = D = 1
• RATP → subway in Paris
• Measurements of air quality at Chatelet station, line 4
• March 2015 → N = 341 measurements
• Prediction of NO (L = 1) from NO2 (D = 1)
→ Robustness of SLLiM
[Figure: scatter plot of NO against NO2]
Application L = D = 1 / SLLiM compared to GLLiM
[Figure: two scatter plots of NO against NO2, each with the GLLiM and SLLiM fits]
→ Illustration of robustness of the proposed model
Application L = D = 1 / SLLiM compared to GLLiM
[Figure: NRMSE as a function of K (1 to 10) for GLLiM, SLLiM, GLLiM-WO and SLLiM-WO]
→ SLLiM achieves better prediction rates than GLLiM on complete data
→ SLLiM becomes equivalent to GLLiM when outliers are removed
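The slides do not define the NRMSE. One common convention, assumed here, normalizes the squared prediction error by the squared error of predicting the sample mean, so that a value of 1 corresponds to a constant predictor. The function name `nrmse` is hypothetical.

```python
import numpy as np

def nrmse(x_true, x_pred):
    """Normalized RMSE (assumed definition): error relative to predicting the mean."""
    x_true = np.asarray(x_true, dtype=float)
    x_pred = np.asarray(x_pred, dtype=float)
    num = np.sum((x_true - x_pred) ** 2)
    den = np.sum((x_true - x_true.mean()) ** 2)
    return float(np.sqrt(num / den))

perfect = nrmse([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])    # exact predictions
constant = nrmse([1.0, 2.0, 3.0], [2.0, 2.0, 2.0])   # always predict the mean
```

With this convention lower is better, and values below 1 indicate an improvement over the constant predictor.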
Other applications and augmented version of SLLiM
• Application when D ≫ L
− Hyperspectral data on Mars (D=184, L=2, N=6983)
→ Comparison with other non-linear regression methods
Table: Mars data: average NRMSE (standard deviations in parentheses) for the proportions of CO2 ice and dust over 100 runs.
Method          Prop. of CO2 ice    Prop. of dust
SLLiM (K=10)    0.168 (0.019)       0.145 (0.020)
GLLiM (K=10)    0.180 (0.023)       0.155 (0.023)
MARS            0.173 (0.016)       0.160 (0.021)
SIR             0.243 (0.025)       0.157 (0.016)
RVM             0.299 (0.021)       0.275 (0.034)
Results - Application to hyperspectral image analysis
[Figure: image panels; columns: GLLiM, SLLiM, Splines; rows: proportion of CO2 ice, proportion of dust]
Conclusion and future work
• Mixture model used for prediction
• Addition of latent variables for partially observed responses
• Selection of K and Lw
− K fixed? Or selected by BIC?
− Lw selected by BIC?
Thank you for your attention! Any questions?