
Page 1

Online Robust Reduced-Rank Regression

Yangzhuoran Fin Yang, Monash University, Melbourne, Australia
Ziping Zhao, ShanghaiTech University, Shanghai, China

The Eleventh IEEE Sensor Array and Multichannel Signal Processing Workshop (SAM), Hangzhou, China

8th–10th June, 2020

Page 2

Yangzhuoran Fin Yang, Monash University

Prof. Ziping Zhao, ShanghaiTech University

Y. F. Yang and Z. Zhao Online Robust Reduced-Rank Regression IEEE SAM 2020 2 / 29

Page 3

Outline

1 The Ubiquitous Reduced-Rank Regression (RRR) in Data Science

2 Towards RRR Modeling under Robustness Pursuit and Streaming Data

3 An Online Algorithm via Stochastic Majorization-Minimization (SMM)

4 Numerical Simulations

5 Conclusions


Page 4

Reduced-Rank Regression in Data Science

Outline

1 The Ubiquitous Reduced-Rank Regression (RRR) in Data Science

2 Towards RRR Modeling under Robustness Pursuit and Streaming Data

3 An Online Algorithm via Stochastic Majorization-Minimization (SMM)

4 Numerical Simulations

5 Conclusions


Page 5

Reduced-Rank Regression in Data Science

The Reduced-Rank Regression Models in Data Science

Reduced-rank regression (RRR) [VelRei'13] is a multivariate linear regression model with a reduced-rank coefficient matrix:

$$y = \mu + AB^T x + Dz + \varepsilon, \qquad \text{(RRR)}$$

where $\mu \in \mathbb{R}^P$, the loading matrix $A \in \mathbb{R}^{P \times r}$, the factor matrix $B \in \mathbb{R}^{Q \times r}$, and $D \in \mathbb{R}^{P \times R}$ are coefficients, and $\varepsilon$ denotes the model innovation.

The product $AB^T$ forms a Stiefel (low-rank) manifold structure, realizing dimension reduction via the factors $B^T x$.
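To fix ideas, here is a minimal sketch of drawing one batch of data from the model above (the Gaussian innovations and random coefficients are illustrative choices of ours, not the paper's simulation setup):

```python
import numpy as np

rng = np.random.default_rng(0)
P, Q, R, r, N = 10, 10, 1, 1, 500   # output dim, input dim, controls, rank, samples

mu = rng.normal(size=P)
A = rng.normal(size=(P, r))         # loading matrix
B = rng.normal(size=(Q, r))         # factor matrix
D = rng.normal(size=(P, R))         # coefficients of the controls z
X = rng.normal(size=(Q, N))
Z = rng.normal(size=(R, N))
E = rng.normal(size=(P, N))         # innovations (Gaussian here, for simplicity)

# y_i = mu + A B^T x_i + D z_i + eps_i, stacked column-wise
Y = mu[:, None] + A @ B.T @ X + D @ Z + E

assert Y.shape == (P, N)
assert np.linalg.matrix_rank(A @ B.T) == r   # reduced-rank coefficient matrix
```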

(Figure from Internet)

- Financial econometrics [Johansen'92]
- Wireless systems [StankovicHaardt'08], [XieGaoJin'16]
- Computer vision [DongTorre'10], [Zheng'14], [WalterSaskaFranchi'18]
- Environmental engineering [Glasbey'92]
- ...


Page 6

Reduced-Rank Regression in Data Science

Estimation of RRR Models

Classical ways for RRR parameter estimation (learning) are via a Gaussian assumption on the innovations, $\varepsilon \sim \mathcal{N}(0, \Sigma)$, i.e.,

$$f(\varepsilon) = (2\pi)^{-P/2} \det(\Sigma)^{-1/2} \exp\{-\tfrac{1}{2}\varepsilon^T \Sigma^{-1} \varepsilon\}.$$

Problem formulation

Given $N$ observed samples $\{x_i, y_i, z_i\}_{i=1}^N$,

$$\underset{\theta}{\text{minimize}} \quad \tfrac{1}{2}\sum_{i=1}^N \big\|y_i - \mu - AB^T x_i - Dz_i\big\|_2^2 \qquad \text{(OLSE)}$$

where $\theta \triangleq \{\mu, A, B, D\}$, or

$$\begin{aligned} \underset{\theta}{\text{minimize}} \quad & \tfrac{N}{2}\log\det(\Sigma) + \tfrac{1}{2}\sum_{i=1}^N \big\|\Sigma^{-1/2}(y_i - \mu - AB^T x_i - Dz_i)\big\|_2^2 \\ \text{subject to} \quad & \Sigma \succ 0 \end{aligned} \qquad \text{(GMLE)}$$

where $\theta \triangleq \{\mu, A, B, D, \Sigma\}$.



Page 8

Reduced-Rank Regression in Data Science

Estimation of RRR Models (cont.)

To solve OLSE/GMLE for RRR, we first examine the first-order optimality conditions for $\{\mu, D, \Sigma\}$.

The optimum for $\{\mu, D, \Sigma\}$ as functions of $\{A, B\}$ is

$$\begin{cases} \big[\mu, D\big](A, B) = \big(Y - AB^T X\big)\,[\mathbf{1}, Z^T]\big([\mathbf{1}, Z^T]^T [\mathbf{1}, Z^T]\big)^{-1} \\ \Sigma(A, B) = \tfrac{1}{N}\big(YP - AB^T XP\big)\big(YP - AB^T XP\big)^T, \end{cases}$$

where $Y \triangleq [y_1, \ldots, y_N]$, $X \triangleq [x_1, \ldots, x_N]$, $Z \triangleq [z_1, \ldots, z_N]$, and $P \triangleq I_N - Z^T(ZZ^T)^{-1}Z$.

Substituting $\{\mu, D, \Sigma\}$ into the objective, we have the subproblems w.r.t. $\{A, B\}$:

$$\underset{A, B}{\text{minimize}} \quad \mathrm{tr}\big[\big(N - AB^T M\big)\big(N - AB^T M\big)^T\big] \qquad (AB^T\text{-subprob. in OLSE})$$

$$\underset{A, B}{\text{minimize}} \quad \det\big[\big(N - AB^T M\big)\big(N - AB^T M\big)^T\big] \qquad (AB^T\text{-subprob. in GMLE}),$$

where $N \triangleq YP$ and $M \triangleq XP$.

Page 9

Reduced-Rank Regression in Data Science

Estimation of RRR Models (cont.)

Proposition (Johansen'91, Lütkepohl'05)

Define $R_{mm} \triangleq MM^T$, $R_{mn} \triangleq MN^T$, $R_{nm} \triangleq NM^T = R_{mn}^T$, and $R_{nn} \triangleq NN^T$. The optimum in the OLSE/GMLE subproblems with respect to $A$ and $B$ is obtained at

$$A^\star = R_{nm} R_{mm}^{-1/2} U_r \quad \text{and} \quad B^\star = R_{mm}^{-1/2} U_r, \qquad (1)$$

where $U_r \in \mathbb{R}^{Q \times r}$ contains the left singular vectors corresponding to the $r$ largest singular values of the matrix $R_{mm}^{-1/2} R_{mn} R_{nn}^{-1/2}$, sorted in nonincreasing order.

As expected, for the deterministic estimation scheme under Gaussianity, OLSE is equivalent to GMLE.

(+) closed-form solution / (−) not adaptive to outliers and large-scale data

In data analytics, deterministic Gaussian estimation is inefficient with outliers and data proliferation.
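The closed form in the proposition can be checked numerically. A sketch under our notation (a symmetric inverse square root is one valid choice of $R_{mm}^{-1/2}$; the function names are ours):

```python
import numpy as np

def inv_sqrt(S):
    # symmetric inverse square root of a positive-definite matrix
    vals, vecs = np.linalg.eigh(S)
    return vecs @ np.diag(vals ** -0.5) @ vecs.T

def rrr_closed_form(M, N_, r):
    # A*, B* as in the proposition: U_r holds the top-r left singular
    # vectors of Rmm^{-1/2} Rmn Rnn^{-1/2}
    Rmm, Rmn = M @ M.T, M @ N_.T
    Rnm, Rnn = N_ @ M.T, N_ @ N_.T
    Rmm_is, Rnn_is = inv_sqrt(Rmm), inv_sqrt(Rnn)
    U, _, _ = np.linalg.svd(Rmm_is @ Rmn @ Rnn_is)  # singular values nonincreasing
    B = Rmm_is @ U[:, :r]
    A = Rnm @ Rmm_is @ U[:, :r]
    return A, B

rng = np.random.default_rng(1)
P, Q, n = 4, 3, 200
M = rng.normal(size=(Q, n))   # plays the role of M = XP
N_ = rng.normal(size=(P, n))  # plays the role of N = YP (underscore avoids the sample-size clash)

# full-rank sanity check: for r = Q, A*B*^T equals the unconstrained
# least-squares coefficient N M^T (M M^T)^{-1}
A, B = rrr_closed_form(M, N_, Q)
assert np.allclose(A @ B.T, N_ @ M.T @ np.linalg.inv(M @ M.T))

# the rank-r solution is genuinely reduced rank
A1, B1 = rrr_closed_form(M, N_, 1)
assert np.linalg.matrix_rank(A1 @ B1.T) == 1
```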



Page 12

RRR with Robustness Pursuit and Streaming Data

Outline

1 The Ubiquitous Reduced-Rank Regression (RRR) in Data Science

2 Towards RRR Modeling under Robustness Pursuit and Streaming Data

3 An Online Algorithm via Stochastic Majorization-Minimization (SMM)

4 Numerical Simulations

5 Conclusions


Page 13

RRR with Robustness Pursuit and Streaming Data

Target 1: Robust Against Heavy-Tails and Outliers

For applications especially involving big data, the data to analyze often exhibit heavy tails and outliers [George'14].

Robustness pursuit for RRR is essential in scenarios like statistically robust channel estimation [Garcia'06], erratic seismic noise attenuation [ChenSacchi'15], genetic analysis [SheChen'17], etc.

Figure: Scatter plots comparing the true location and shape, the location and shape estimated by the sample average, and the location and shape estimated by the Cauchy MLE.
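As a one-dimensional illustration of why the Cauchy MLE resists outliers (a toy example of ours, not the paper's experiment), compare the sample mean with the Cauchy location estimate computed by iterative reweighting, with the scale assumed known:

```python
import numpy as np

def cauchy_location(x, sigma=1.0, iters=100):
    # Cauchy MLE of location via iterative reweighting: the score equation
    # has the fixed point mu = sum(w_i x_i) / sum(w_i) with
    # w_i = 1 / (sigma^2 + (x_i - mu)^2), so outliers get tiny weights.
    mu = np.median(x)  # robust starting point
    for _ in range(iters):
        w = 1.0 / (sigma ** 2 + (x - mu) ** 2)
        mu = np.sum(w * x) / np.sum(w)
    return mu

# bulk of the data near 0, plus two gross outliers
x = np.array([-1.2, -0.5, -0.1, 0.0, 0.2, 0.4, 0.9, 1.1, 80.0, 95.0])

mean_est = x.mean()            # pulled far toward the outliers
robust_est = cauchy_location(x)

assert abs(robust_est) < 1.0 < abs(mean_est)
```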


Page 14

RRR with Robustness Pursuit and Streaming Data

Target 2: Online Estimation under Large-Scale Dataset

With the proliferation of data, the data dimension can be huge, or the data is continuously collected as streams.

Dealing with the full data batch can be computationally expensive.

The model needs to be updated as new information comes in.

- Online channel estimation [Venkateswaran'10]
- Online cointegration detection in finance [ZhaoPalomar'17]
- Real-time functional magnetic resonance imaging [Ulfarsson'10]
- etc.

Source: Apache Flink®

Research question: can we design an RRR estimation scheme that is robust and amenable to large datasets?

Yes, this paper!



Page 17

RRR with Robustness Pursuit and Streaming Data

Online RRRR: Robust Reduced-Rank Regression with Data Streams

To promote robustness, a Cauchy distribution assumption is used [Huber'11].

Assume $\varepsilon \sim \mathrm{Cauchy}(0, \Sigma)$ with $\Sigma \in \mathbb{S}^P_{++}$; then its p.d.f. is

$$f(\varepsilon) = \frac{\Gamma[(1+P)/2]}{\Gamma(1/2)\,\pi^{P/2}\det(\Sigma)^{1/2}} \big(1 + \varepsilon^T \Sigma^{-1} \varepsilon\big)^{-\frac{1+P}{2}},$$

and the negative log-likelihood function of a single observation $\xi \triangleq \{x, y, z\}$ is defined as

$$\ell(\theta, \xi) \triangleq \tfrac{1}{2}\log\det(\Sigma) + \tfrac{1+P}{2}\log\big[1 + (y - \mu - AB^T x - Dz)^T \Sigma^{-1} (y - \mu - AB^T x - Dz)\big].$$

Problem formulation for Online RRRR

Given the loss function $\ell(\theta, \xi)$ for one sample $\xi$, the online estimation problem is given as

$$\begin{aligned} \underset{\theta}{\text{minimize}} \quad & L(\theta) \triangleq \mathbb{E}_{\xi}[\ell(\theta, \xi)] \\ \text{subject to} \quad & \Sigma \succ 0. \end{aligned} \qquad \text{(CMLE)}$$

This is a highly nonconvex constrained stochastic optimization problem.
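The density above can be checked numerically. A sketch implementing the formula as written, verified against the known univariate standard Cauchy density for $P = 1$:

```python
import math
import numpy as np

def cauchy_pdf(eps, Sigma):
    # multivariate Cauchy density from the slide:
    # Gamma((1+P)/2) / (Gamma(1/2) pi^(P/2) det(Sigma)^(1/2))
    #   * (1 + eps^T Sigma^{-1} eps)^(-(1+P)/2)
    P = len(eps)
    quad = float(eps @ np.linalg.solve(Sigma, eps))
    const = math.gamma((1 + P) / 2) / (math.gamma(0.5) * math.pi ** (P / 2)
                                       * math.sqrt(np.linalg.det(Sigma)))
    return const * (1 + quad) ** (-(1 + P) / 2)

# for P = 1 and Sigma = 1 this reduces to the standard Cauchy density
# 1 / (pi (1 + x^2))
for x in (-2.0, 0.0, 0.7, 3.5):
    f = cauchy_pdf(np.array([x]), np.array([[1.0]]))
    assert math.isclose(f, 1.0 / (math.pi * (1 + x ** 2)))
```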


Page 18

Online Estimation via Stochastic Majorization-Minimization

Outline

1 The Ubiquitous Reduced-Rank Regression (RRR) in Data Science

2 Towards RRR Modeling under Robustness Pursuit and Streaming Data

3 An Online Algorithm via Stochastic Majorization-Minimization (SMM)

4 Numerical Simulations

5 Conclusions


Page 19

Online Estimation via Stochastic Majorization-Minimization

Stochastic Estimation via SMM

A classical approach for solving the CMLE problem is the SAA method [Plambeck'96].

Sample Average Approximation (SAA)

For a function $\ell(\cdot)$ of parameter $\theta$ and data $\xi_i$, the optimization step at iteration $k$ with $N^{(k)}$ samples is

$$\theta^{(k)} \leftarrow \arg\min_{\theta \in \Theta} \frac{1}{N^{(k)}} \sum_{i=1}^{N^{(k)}} \ell(\theta, \xi_i).$$

Since the function $\ell(\cdot)$ is nonconvex, the per-iteration SAA computation is computationally expensive.

Stochastic Majorization-Minimization (SMM) [Razaviyayn'16]

SMM in each iteration optimizes over a surrogate majorizing (upper-bound) function $\bar{\ell}(\theta, \theta^{(k-1)}, \xi_i)$ as

$$\theta^{(k)} \leftarrow \arg\min_{\theta \in \Theta} \frac{1}{N^{(k)}} \sum_{i=1}^{N^{(k)}} \bar{\ell}(\theta, \theta^{(k-1)}, \xi_i).$$

The majorizing function is commonly chosen to be strictly convex or one leading to a closed-form solution.


Page 20

Online Estimation via Stochastic Majorization-Minimization

Find a Majorizing Function in SMM

Figure: Illustration of majorization-minimization. At each iterate $x^{(k)}$, a majorizing surrogate $u(x; x^{(k)})$ of $f(x)$ is minimized to obtain $x^{(k+1)}$, so that $f(x^{(k+1)}) \le f(x^{(k)})$.

It is easy to see that the key in using SMM is to find a good majorizing function $\bar{\ell}(\theta, \theta^{(k-1)}, \xi_i)$.

Lemma: Linear majorization [ZhaoPalomar'17]

Given $\theta^{(k-1)}$, the loss function $\ell(\theta, \xi_i)$ can be majorized as

$$\bar{\ell}(\theta, \theta^{(k-1)}, \xi_i) \triangleq \tfrac{1}{2}\log\det(\Sigma) + \tfrac{1+P}{2}\big(\tilde{y}_i^{(k)} - \sqrt{w_i^{(k)}}\,\mu - AB^T \tilde{x}_i^{(k)} - D\tilde{z}_i^{(k)}\big)^T \Sigma^{-1} \big(\tilde{y}_i^{(k)} - \sqrt{w_i^{(k)}}\,\mu - AB^T \tilde{x}_i^{(k)} - D\tilde{z}_i^{(k)}\big) + \text{const.},$$

where $\tilde{y}_i^{(k)} \triangleq \sqrt{w_i^{(k)}}\, y_i$, $\tilde{x}_i^{(k)} \triangleq \sqrt{w_i^{(k)}}\, x_i$, $\tilde{z}_i^{(k)} \triangleq \sqrt{w_i^{(k)}}\, z_i$, and the weight $w_i^{(k)}$ is a function of $\theta^{(k-1)}$.
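The lemma leaves the weight implicit; in the standard derivation it comes from the tangent-line majorization of the concave $\log(1+t)$ at the previous squared Mahalanobis residual, so the slope $1/(1+t_0)$ plays the role of $w_i^{(k)}$. A quick numeric check of that majorization (our own sketch of the inequality, not the paper's code):

```python
import numpy as np

# Tangent-line majorization of the concave t -> log(1 + t) at t0:
#   log(1 + t) <= log(1 + t0) + (t - t0) / (1 + t0),  with equality at t = t0.
t0 = 2.0
t = np.linspace(0.0, 50.0, 2001)
surrogate = np.log(1 + t0) + (t - t0) / (1 + t0)

assert np.all(surrogate >= np.log(1 + t) - 1e-12)   # global upper bound
assert np.isclose(surrogate[80], np.log(1 + t0))    # tight at t = t0 (t[80] == 2.0)
```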


Page 21

Online Estimation via Stochastic Majorization-Minimization

Solving the Subproblem in SMM

The majorized objective function becomes

$$\bar{L}(\theta, \theta^{(k)}) \triangleq \frac{1}{N^{(k)}} \sum_{i=1}^{N^{(k)}} \bar{\ell}(\theta, \theta^{(k)}, \xi_i).$$

The subproblem in SMM in each iteration $k$ is

$$\begin{aligned} \underset{\theta}{\text{minimize}} \quad & \bar{L}(\theta, \theta^{(k)}) \\ \text{subject to} \quad & \Sigma \succ 0. \end{aligned} \qquad \text{(Majorized Subprob. in CMLE)}$$

Examining the first-order optimality conditions for $\mu$, $D$, and $\Sigma$, we have

$$\begin{cases} \big[\mu, D\big](A, B) = \big(\tilde{Y}^{(k)} - AB^T \tilde{X}^{(k)}\big)\, Q^{(k)T} \big(Q^{(k)} Q^{(k)T}\big)^{-1}, \\ \Sigma(A, B) = \tfrac{1+P}{N^{(k)}} \big(\tilde{Y}^{(k)}P^{(k)} - AB^T \tilde{X}^{(k)}P^{(k)}\big)\big(\tilde{Y}^{(k)}P^{(k)} - AB^T \tilde{X}^{(k)}P^{(k)}\big)^T, \end{cases}$$

where $Q^{(k)} \triangleq \big[\sqrt{w^{(k)}}, \tilde{Z}^{(k)T}\big]^T$ and $P^{(k)} \triangleq I_N - Q^{(k)T}(Q^{(k)}Q^{(k)T})^{-1}Q^{(k)}$.



Page 23

Online Estimation via Stochastic Majorization-Minimization

Solving the Subproblem in SMM

Given $[\mu, D](A, B)$ and $\Sigma(A, B)$, the objective $\bar{L}(\theta, \theta^{(k)})$ becomes

$$\bar{L}(\theta, \theta^{(k)}) = \tfrac{N^{(k)}}{2}\log\det\Big[\tfrac{1+P}{N^{(k)}}\big(\tilde{Y}^{(k)}P^{(k)} - AB^T \tilde{X}^{(k)}P^{(k)}\big)\big(\tilde{Y}^{(k)}P^{(k)} - AB^T \tilde{X}^{(k)}P^{(k)}\big)^T\Big] + \text{const.}$$

The subproblem now becomes

$$\underset{A, B}{\text{minimize}} \quad \det\big[\big(\tilde{N}^{(k)} - AB^T \tilde{M}^{(k)}\big)\big(\tilde{N}^{(k)} - AB^T \tilde{M}^{(k)}\big)^T\big], \qquad (\text{Majorized } AB^T\text{-Subprob. in CMLE})$$

where $\tilde{N}^{(k)} = \tilde{Y}^{(k)}P^{(k)}$ and $\tilde{M}^{(k)} = \tilde{X}^{(k)}P^{(k)}$.

The solution to the majorized $AB^T$-subproblem in CMLE is exactly the solution of the $AB^T$-subproblem in GMLE!

Solving CMLE is just iteratively solving the GMLE!


Page 24

Online Estimation via Stochastic Majorization-Minimization

SMM-Based Algorithm for Online RRRR

Online RRRR via SMM

Input: training data $\{\xi_i\}_{i=1}^{\infty}$, the initial parameter $\theta^{(0)} \in \Theta$, and $k = 1$.

For $k = 1, 2, \ldots$:

1. Calculate $\{w^{(k)}, \tilde{Y}^{(k)}, \tilde{X}^{(k)}, \tilde{Z}^{(k)}, Q^{(k)}, P^{(k)}\}$ based on the parameter $\theta^{(k)}$ and the data $\{\xi_i\}_{i=1}^{N^{(k)}}$;

2. Calculate $\{\tilde{M}^{(k)}, \tilde{N}^{(k)}, R_{mm}^{(k)}, R_{mn}^{(k)}, R_{nn}^{(k)}\}$;

3. Compute the $r$ left singular vectors of $R_{mm}^{-1/2} R_{mn} R_{nn}^{-1/2}$;

4. Update $\theta^{(k+1)}$; $k \leftarrow k + 1$.

Output: $\theta^{(k)} = \{\mu^{(k)}, A^{(k)}, B^{(k)}, D^{(k)}, \Sigma^{(k)}\}$.

Can we design a unified algorithm framework for Online RRRR with a GENERAL loss function (beyond CMLE)?

A class of such cases can be solved by the SMM algorithm with OLSE/GMLE solutions [ZhouZhaoPal'18]!
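The loop above can be sketched in a simplified batch form. This is our own simplification, not the authors' implementation: no control variables $z$, a fixed batch instead of a growing stream, the weight $w_i = 1/(1 + r_i^T \Sigma^{-1} r_i)$ implied by the linear majorization, and function names of our choosing. Each pass solves the weighted GMLE in closed form, so the Cauchy negative log-likelihood descends monotonically:

```python
import numpy as np

def inv_sqrt(S):
    # symmetric inverse square root of a positive-definite matrix
    vals, vecs = np.linalg.eigh(S)
    return vecs @ np.diag(vals ** -0.5) @ vecs.T

def weighted_rrr_step(Y, X, w, r):
    # exact minimizer of the majorized subproblem: scale the data by sqrt(w),
    # project out the intercept direction, then use the closed-form
    # OLSE/GMLE solution for A and B
    P, n = Y.shape[0], len(w)
    sw = np.sqrt(w)
    Yt, Xt = Y * sw, X * sw                       # "tilde" variables
    q = sw / np.linalg.norm(sw)
    Proj = np.eye(n) - np.outer(q, q)             # projects out the intercept
    Nt, Mt = Yt @ Proj, Xt @ Proj
    Rmm, Rmn = Mt @ Mt.T, Mt @ Nt.T
    Rnm, Rnn = Nt @ Mt.T, Nt @ Nt.T
    Rmm_is, Rnn_is = inv_sqrt(Rmm), inv_sqrt(Rnn)
    U, _, _ = np.linalg.svd(Rmm_is @ Rmn @ Rnn_is)
    B = Rmm_is @ U[:, :r]
    A = Rnm @ Rmm_is @ U[:, :r]
    mu = (Yt - A @ B.T @ Xt) @ sw / np.sum(w)     # weighted intercept
    Res = Yt - mu[:, None] * sw - A @ B.T @ Xt    # weighted residuals
    Sigma = (1 + P) / n * Res @ Res.T
    return mu, A, B, Sigma

def cauchy_loss(Y, X, mu, A, B, Sigma):
    # average Cauchy negative log-likelihood (constants dropped)
    P = Y.shape[0]
    R = Y - mu[:, None] - A @ B.T @ X
    u = np.einsum('ij,ij->j', R, np.linalg.solve(Sigma, R))
    return np.mean(0.5 * np.linalg.slogdet(Sigma)[1]
                   + (1 + P) / 2 * np.log(1 + u))

rng = np.random.default_rng(7)
P, Q, r, n = 4, 5, 1, 300
A0, B0 = rng.normal(size=(P, r)), rng.normal(size=(Q, r))
X = rng.normal(size=(Q, n))
Y = 0.5 + A0 @ B0.T @ X + rng.standard_t(df=3, size=(P, n))  # heavy-tailed noise

mu, A, B = np.zeros(P), np.zeros((P, r)), np.zeros((Q, r))
Sigma = np.eye(P)
losses = [cauchy_loss(Y, X, mu, A, B, Sigma)]
for _ in range(25):
    R = Y - mu[:, None] - A @ B.T @ X
    u = np.einsum('ij,ij->j', R, np.linalg.solve(Sigma, R))
    w = 1.0 / (1.0 + u)                            # Cauchy MM weights
    mu, A, B, Sigma = weighted_rrr_step(Y, X, w, r)
    losses.append(cauchy_loss(Y, X, mu, A, B, Sigma))

# majorization-minimization guarantees monotone descent of the Cauchy loss
assert all(b <= a + 1e-8 for a, b in zip(losses, losses[1:]))
assert losses[-1] < losses[0]
```

Since each step exactly minimizes the surrogate (which touches the true loss at the current iterate), the descent property asserted at the end holds by construction.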



Page 27

Numerical Simulations

Outline

1 The Ubiquitous Reduced-Rank Regression (RRR) in Data Science

2 Towards RRR Modeling under Robustness Pursuit and Streaming Data

3 An Online Algorithm via Stochastic Majorization-Minimization (SMM)

4 Numerical Simulations

5 Conclusions


Page 28

Numerical Simulations

Simulation Setup

An RRR model is specified with $P = Q = 10$ and $r = R = 1$.

A path of 1000 samples is generated where the innovations follow a Student's t-distribution with 3 degrees of freedom, mimicking real-data scenarios.

In the online estimation, we start with 25 samples, and 1 sample is added in each iteration.


Page 29

Numerical Simulations

Comparisons on Convergence and Estimation Accuracy

Figure: Algorithm convergence: objective function value vs. run time in seconds, averaged over 30 MC runs, for SAA (solver), SAA (MM), and SMM.

Figure: Estimation accuracy: $\|A^{(k)}B^{(k)T} - [AB^T]_{\mathrm{TRUE}}\|_F^2 \,/\, \|[AB^T]_{\mathrm{TRUE}}\|_F^2$ vs. run time in seconds, averaged over 30 MC runs, for the Gaussian MLE and for the Cauchy MLE via SAA (solver), SAA (MM), and SMM.


Page 30

Numerical Simulations

Comparisons on Runtime

To show the computational efficiency, the estimation time is compared for varying parameter dimensions, where $P = Q$ and $r = R = 1$, based on 100 Monte Carlo simulations.

Table: Average runtime (in secs) with standard error in parentheses

(P, Q)      (5, 5)        (10, 10)      (20, 20)       (30, 30)
SAA - MM    48.0 (16.8)   69.9 (17.7)   153.2 (27.6)   264.9 (27.2)
SMM         17.2 (6.85)   25.4 (7.49)   54.0 (11.04)   88.5 (11.06)

SMM consistently runs faster and more stably than SAA, and scales well with the dimension.


Page 31

Numerical Simulations

An R Package

The proposed algorithm has been provided as open source.

You are welcome to download it from GitHub:

devtools::install_github("finyang/RRRR")

or from the Comprehensive R Archive Network (CRAN):

install.packages("RRRR")

On CRAN, the package has achieved a total of 1134 downloads.


Page 32

Conclusions

Outline

1 The Ubiquitous Reduced-Rank Regression (RRR) in Data Science

2 Towards RRR Modeling under Robustness Pursuit and Streaming Data

3 An Online Algorithm via Stochastic Majorization-Minimization (SMM)

4 Numerical Simulations

5 Conclusions


Page 33

Conclusions

Conclusions

We have discussed the online robust reduced-rank regression problem.

An efficient algorithm based on stochastic majorization-minimization has been proposed.

The effectiveness of the model and algorithm has been demonstrated via simulations.


Page 34

Selected References I

Bernardini, E. and Cubadda, G. (2015). Macroeconomic forecasting and structural analysis through regularized reduced-rank regression. International Journal of Forecasting, 31(3):682–691.

Chen, K. and Sacchi, M. D. (2015). Robust reduced-rank filtering for erratic seismic noise attenuation. Geophysics, 80(1):V1–V11.

George, G., Haas, M. R., and Pentland, A. (2014). Big data and management.

Glasbey, C. A. (1992). A reduced rank regression model for local variation in solar radiation. Journal of the Royal Statistical Society, Series C (Applied Statistics), 41(2):381.

Gustafsson, T. and Rao, B. D. (2002). Statistical analysis of subspace-based estimation of reduced-rank linear regressions. IEEE Transactions on Signal Processing, 50(1):151–159.

Hua, Y., Nikpour, M., and Stoica, P. (2001). Optimal reduced-rank estimation and filtering. IEEE Transactions on Signal Processing, 49(3):457–469.

Huang, D. and De la Torre, F. (2010). Bilinear kernel reduced rank regression for facial expression synthesis. In European Conference on Computer Vision, pages 364–377. Springer.

Page 35

Selected References II

Huber, P. J. (2011). Robust statistics. Springer.

Johansen, S. (1991). Estimation and hypothesis testing of cointegration vectors in Gaussian vector autoregressive models. Econometrica, 59(6):1551.

Johansen, S. and Juselius, K. (1992). Some structural hypotheses in a multivariate cointegration analysis of the purchasing power parity and the uncovered interest parity for UK. Journal of Econometrics, 53(1-3):211–244.

Nicoli, M. and Spagnolini, U. (2005). Reduced-rank channel estimation for time-slotted mobile communication systems. IEEE Transactions on Signal Processing, 53(3):926–944.

Plambeck, E. L., Fu, B. R., Robinson, S. M., and Suri, R. (1996). Sample-path optimization of convex stochastic performance functions. Mathematical Programming, Series B, 75(2):137–176.

Razaviyayn, M., Sanjabi, M., and Luo, Z. Q. (2016). A stochastic successive minimization method for nonsmooth nonconvex optimization with applications to transceiver design in wireless communication networks. Mathematical Programming, 157(2):515–545.

She, Y. and Chen, K. (2017). Robust reduced-rank regression. Biometrika, 104(3):633–647.

Page 36

Selected References III

Ulfarsson, M. O. and Solo, V. (2011). Sparse variable reduced rank regression via Stiefel optimization. In 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 3892–3895. IEEE.

Velu, R. and Reinsel, G. C. (2013). Multivariate Reduced-Rank Regression: Theory and Applications. Springer Science & Business Media.

Venkateswaran, V. et al. (2010). Analog beamforming in MIMO communications with phase shift networks and online channel estimation. IEEE Transactions on.

Walter, V., Saska, M., and Franchi, A. (2018). Fast mutual relative localization of UAVs using ultraviolet LED markers. In 2018 International Conference on Unmanned Aircraft Systems (ICUAS), pages 1217–1226. IEEE.

Xie, H., Gao, F., and Jin, S. (2016). An overview of low-rank channel estimation for massive MIMO systems. IEEE Access, 4:7313–7321.

Zhao, Z. and Palomar, D. P. (2017). Robust maximum likelihood estimation of sparse vector error correction model. In 2017 IEEE Global Conference on Signal and Information Processing (GlobalSIP), pages 913–917. IEEE.

Zheng, W. (2014). Multi-view facial expression recognition based on group sparse reduced-rank regression. IEEE Transactions on Affective Computing, 5(1):71–85.

Page 37

Thanks!

For more information:

zipingzhao.github.io