Robust and Scalable Algorithms for Big Data Analytics
Georgios B. Giannakis
Acknowledgment: Drs. G. Mateos, K. Slavakis, G. Leus, and M. Mardani
Arlington, VA, USA, March 22, 2013
Roadmap
- Robust principal component analysis
  - Linear low-rank models and sparse outliers
- Scalable algorithms for big network data analytics
  - (De-)centralized and online rank minimization
- Robust sparse embedding via dictionary learning
  - Nonlinear low-rank models
  - Data-adaptive compressed sensing
- Concluding remarks
[Slide graphic: BIG and fast; BIG and messy]
Principal component analysis
- Motivation: (statistical) learning from high-dimensional data
- Principal component analysis (PCA) [Pearson'1901]
  - Extraction of low(est)-dimensional structure
  - Applications: source (de)coding, anomaly identification, recommender systems, ...
  - PCA is non-robust to outliers [Huber'81], [Jolliffe'86], [Wright et al'09-12]
- Objective: robustify PCA by controlling outlier sparsity
[Figure: example data sources: DNA microarray, traffic surveillance]
PCA formulations
- Training data: high-dimensional vectors collected as columns of a data matrix
- Minimum reconstruction error formulation
  - Compression operator maps each datum to a low-dimensional code
  - Reconstruction operator maps the code back to the ambient space
- Component analysis model: data approximated by a mean plus a low-rank (subspace) component
- Solution: the principal subspace, i.e., the dominant eigenvectors of the sample covariance (equivalently, the truncated SVD of the centered data)
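A minimal PCA sketch consistent with the formulation above, assuming the columns of the data matrix are the training vectors; the function name pca and the rank parameter q are illustrative choices, not notation from the slides.

```python
import numpy as np

# Minimal PCA sketch: given data Y (p x T, columns are training vectors),
# find a rank-q subspace and low-dimensional scores that minimize the
# reconstruction error; the solution is the truncated SVD of the centered data.
def pca(Y, q):
    m = Y.mean(axis=1, keepdims=True)                      # sample mean
    U, s, Vt = np.linalg.svd(Y - m, full_matrices=False)   # SVD of centered data
    Uq = U[:, :q]                                          # principal directions
    S = Uq.T @ (Y - m)                                     # compression: low-dimensional scores
    Y_hat = m + Uq @ S                                     # reconstruction
    return m, Uq, S, Y_hat
```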
Robustifying PCA
- Outlier variables: one per datum, nonzero if the datum is an outlier and zero otherwise
  - Nominal data obey the low-rank subspace model; outliers obey something else
  - Tied to robust linear regression [Fuchs'99], [Giannakis et al'11]
  - Both the subspace and the outliers are unknown; outliers are typically sparse!
- Natural (but intractable) estimator (P0): least-squares fit of the subspace model with a penalty on the number of outlier-contaminated data (an l0-type penalty)

G. Mateos and G. B. Giannakis, "Robust PCA as bilinear decomposition with outlier sparsity regularization," IEEE Transactions on Signal Processing, pp. 5176-5190, Oct. 2012.
Universal robustness
- (P0) is NP-hard; relax it to a convex surrogate (P1), e.g., [Tropp'06]
  - Role of the sparsity-controlling parameter is central
- Q: Does (P1) yield robust estimates? A: Yes! The Huber estimator is a special case
Alternating minimization (P1)
- Subspace update: SVD of the outlier-compensated data
- Outlier update: row-wise soft-thresholding of the residuals (thresholds at ±γ)
- Proposition: Algorithm 1's iterates converge to a stationary point of (P1)
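A minimal sketch of the alternating minimization, assuming (P1) takes the form of a least-squares fit with a row-wise group-lasso penalty on the outlier matrix; the exact weighting and mean-vector handling of the paper are omitted, and robust_pca_am, q, and lam are illustrative names.

```python
import numpy as np

# Sketch of alternating minimization for a robust-PCA surrogate of the form
#   min_{L, O}  0.5*||Y - L - O||_F^2 + lam * sum_t ||o_t||_2,  rank(L) <= q,
# where rows of Y are data vectors and O collects the (row-sparse) outliers.
def robust_pca_am(Y, q, lam, n_iter=50):
    O = np.zeros_like(Y)
    L = np.zeros_like(Y)
    for _ in range(n_iter):
        # Subspace update: rank-q truncated SVD of the outlier-compensated data
        U, s, Vt = np.linalg.svd(Y - O, full_matrices=False)
        L = U[:, :q] @ np.diag(s[:q]) @ Vt[:q, :]
        # Outlier update: row-wise soft-thresholding (vector shrinkage) of residuals
        R = Y - L
        row_norms = np.linalg.norm(R, axis=1, keepdims=True)
        shrink = np.maximum(0.0, 1.0 - lam / np.maximum(row_norms, 1e-12))
        O = R * shrink
    return L, O
```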
Video surveillance
- Background modeling from video feeds [De la Torre-Black '01]
[Figure panels: Data, PCA, Robust PCA, "Outliers"]
Data: http://www.cs.cmu.edu/~ftorre/
Robust unveiling of communities
- Robust kernel PCA for identification of cohesive subgroups
- Network: NCAA football teams (vertices), Fall '00 games (edges)
  - Identified exactly: Big 10, Big 12, ACC, SEC, ...; Outliers: independent teams
  - ARI = 0.8967
Data: http://www-personal.umich.edu/~mejn/netdata/
Online robust PCA
- Nominal data: obey the low-rank subspace model
- Outliers: sparse, as in the batch formulation
- Motivation: real-time big data and memory limitations
  - At time t, do not re-estimate past quantities in batch form
  - Scalability via exponentially weighted subspace tracking
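A simplified online sketch in the spirit of the bullets above, assuming per-datum outlier soft-thresholding followed by a stochastic-gradient subspace update with re-orthonormalization; this is not the exact exponentially weighted recursion of the tracker, and online_robust_pca, mu, and lam are illustrative.

```python
import numpy as np

# Simplified online robust PCA sketch: per new datum y_t, estimate the outlier
# by soft-thresholding the residual, refit the projection coefficients, then
# take a stochastic-gradient step on the subspace and re-orthonormalize.
def soft(v, tau):
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def online_robust_pca(data_stream, p, q, lam, mu=0.5, seed=0):
    rng = np.random.default_rng(seed)
    U = np.linalg.qr(rng.standard_normal((p, q)))[0]   # initial orthonormal basis
    for y in data_stream:
        s = U.T @ y                                    # projection coefficients
        o = soft(y - U @ s, lam)                       # sparse outlier estimate
        s = U.T @ (y - o)                              # refit after outlier removal
        r = y - o - U @ s                              # residual off the subspace
        U = np.linalg.qr(U + mu * np.outer(r, s))[0]   # gradient step + re-orthonormalize
        yield U, s, o
```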
Roadmap
- Robust principal component analysis
  - Linear low-rank models and sparse outliers
- Scalable algorithms for big network data
  - (De-)centralized and online rank minimization
- Robust embedding via dictionary learning
  - Nonlinear low-rank models
  - Data-adaptive compressed sensing
- Concluding remarks
Modeling traffic anomalies
- Graph G(N, L) with N nodes, L links, and F flows (F >> L); OD flow z_{f,t}
- Routing matrix R ∈ {0,1}^{L×F}
- Packet counts per link l and time slot t: y_{l,t} = Σ_f r_{l,f} (z_{f,t} + a_{f,t}) + v_{l,t}
- Matrix model across T time slots: Y = R(Z + A) + V, with Y of size L×T
[Figure: toy network with flows f1, f2 and link l, with an anomaly on one flow]
- Anomalies: changes in origin-destination (OD) flows [Lakhina et al'04]
  - Failures, congestions, DoS attacks, intrusions, flooding
Low-rank plus sparse matrices
- Z has low rank, e.g., [Zhang et al'05]; A is sparse across time and flows
Data: http://math.bu.edu/people/kolaczyk/datasets.html
[Figure: anomaly amplitudes |a_{f,t}| vs. time index t]
General decomposition problem
- Given Y and the routing matrix R, identify sparse A when X is low rank
  - R is fat, but X = RZ is still low rank
- (P1): minimize over {X, A} the LS fit ||Y − X − RA||_F² plus a nuclear-norm penalty on X and an l1-norm penalty on A
- Rank minimization with the nuclear norm, e.g., [Recht-Fazel-Parrilo'10]
  - Principal Components Pursuit (PCP) [Candes et al'10], [Chandrasekaran et al'11]
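A hedged sketch of a solver for the convex estimator, assuming (P1) is the least-squares fit with nuclear-norm and l1 penalties written above; it uses plain proximal-gradient (ISTA-style) iterations rather than the authors' solver, and lam_star/lam_1 are illustrative parameter names.

```python
import numpy as np

# Proximal-gradient sketch for
#   min_{X,A} 0.5*||Y - X - R A||_F^2 + lam_star*||X||_* + lam_1*||A||_1.
def svt(M, tau):
    """Singular value thresholding: prox of tau*||.||_*"""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def soft(M, tau):
    """Entrywise soft-thresholding: prox of tau*||.||_1"""
    return np.sign(M) * np.maximum(np.abs(M) - tau, 0.0)

def lowrank_plus_compressed_sparse(Y, R, lam_star, lam_1, n_iter=300):
    L_links, T = Y.shape
    F = R.shape[1]
    X, A = np.zeros((L_links, T)), np.zeros((F, T))
    step = 1.0 / (1.0 + np.linalg.norm(R, 2) ** 2)    # 1 / Lipschitz constant of the gradient
    for _ in range(n_iter):
        E = Y - X - R @ A                             # residual
        X = svt(X + step * E, step * lam_star)        # gradient + prox (nuclear norm)
        A = soft(A + step * (R.T @ E), step * lam_1)  # gradient + prox (l1 norm)
    return X, A
```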
Challenges and importance
- RA is not necessarily sparse and R is fat, so PCP is not applicable
- Important special cases
  - R = I: matrix decomposition with PCP [Candes et al'10]
  - X = 0: compressive sampling with basis pursuit [Chen et al'01]
  - X = C_{L×ρ} W'_{ρ×T} and A = 0: PCA [Pearson 1901]
  - X = 0, R = D unknown: dictionary learning [Olshausen'97]
- Many more unknowns (LT + FT) than observations (LT)
Exact recovery
- Noise-free case (P0)
- Q: Can one recover sparse A and low-rank X exactly? A: Yes! Under certain conditions on {X0, A0, R}
- Theorem: Given Y and R, assume every row and column of A0 has at most k < s nonzero entries, and R has full row rank. If conditions C1) and C2) hold, then (P0) exactly recovers {X0, A0}.

M. Mardani, G. Mateos, and G. B. Giannakis, "Recovery of low-rank plus compressed sparse matrices with application to unveiling traffic anomalies," IEEE Transactions on Information Theory, 2013.
In-network processing
- Robust imputation of the network data matrix
  - Applications: network health cartography, smart metering
- Goal: given a few rows per agent, perform distributed cleansing and imputation by leveraging the low rank of the nominal data and the sparsity of the outliers
- Challenge: the nuclear norm is not separable across rows (links/agents)

G. Mateos and K. Rajawat, "Dynamic network cartography," IEEE Signal Processing Magazine, May 2013.
Separable regularization
- Key property: the nuclear norm admits the separable characterization ||X||_* = min over factorizations X = C W' of (1/2)(||C||_F² + ||W||_F²), where C is L×ρ with ρ ≥ rank[X]
- Separable formulation (P2) equivalent to (P1): replace the nuclear norm of X by (1/2)(||C||_F² + ||W||_F²) with X = C W'
  - Nonconvex, but fewer optimization variables
- Proposition: If {C, W, A} is a stationary point of (P2) and the residual Y − CW' − RA has spectral norm no larger than the nuclear-norm weight, then {CW', A} is a global optimum of (P1).
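A small numerical check of the key property, assuming the separable characterization of the nuclear norm stated above; the dimensions and the SVD-based construction of the factors are illustrative.

```python
import numpy as np

# Verify ||X||_* = min_{X = C W'} 0.5*(||C||_F^2 + ||W||_F^2),
# attained, e.g., by C = U sqrt(S), W = V sqrt(S) from the SVD X = U S V'.
rng = np.random.default_rng(0)
L_dim, T, r = 40, 60, 5
X = rng.standard_normal((L_dim, r)) @ rng.standard_normal((r, T))   # rank-r matrix

U, s, Vt = np.linalg.svd(X, full_matrices=False)
C = U @ np.diag(np.sqrt(s))
W = Vt.T @ np.diag(np.sqrt(s))

nuc = s.sum()                                                        # nuclear norm
sep = 0.5 * (np.linalg.norm(C, 'fro')**2 + np.linalg.norm(W, 'fro')**2)
print(nuc, sep)   # the two values agree up to floating-point error
```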
Decentralized rank minimization
- Alternating-direction method of multipliers (ADMM) solver for (P2)
  - Method [Glowinski-Marrocco'75], [Gabay-Mercier'76]
  - Learning over networks [Schizas-Ribeiro-Giannakis'07]
- Consensus-based optimization; attains centralized performance

M. Mardani, G. Mateos, and G. B. Giannakis, "In-network sparsity-regularized rank minimization: Algorithms and applications," IEEE Transactions on Signal Processing, 2013.
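A toy global-variable consensus-ADMM sketch on distributed least squares, included only to illustrate the consensus mechanism behind in-network optimization; it is not the paper's ADMM recursions for (P2), and consensus_admm and rho are illustrative.

```python
import numpy as np

# Global-variable consensus ADMM for min_x sum_i 0.5*||A_i x - b_i||^2,
# where agent i holds (A_i, b_i) and all agents must agree on x.
def consensus_admm(A_list, b_list, rho=1.0, n_iter=100):
    n = A_list[0].shape[1]
    N = len(A_list)
    x = [np.zeros(n) for _ in range(N)]      # local copies
    u = [np.zeros(n) for _ in range(N)]      # scaled dual variables
    z = np.zeros(n)                          # consensus variable
    for _ in range(n_iter):
        for i in range(N):
            # Local update: min_x 0.5||A_i x - b_i||^2 + (rho/2)||x - z + u_i||^2
            x[i] = np.linalg.solve(A_list[i].T @ A_list[i] + rho * np.eye(n),
                                   A_list[i].T @ b_list[i] + rho * (z - u[i]))
        z = np.mean([x[i] + u[i] for i in range(N)], axis=0)   # averaging (consensus)
        for i in range(N):
            u[i] += x[i] - z                                   # dual update
    return z
```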
Internet2 data
- Real network data
  - Dec. 8-28, 2008
  - N = 11, L = 41, F = 121, T = 504
[Figure: detection probability vs. false-alarm probability (ROC) for the proposed method and for [Lakhina04] and [Zhang05] at ranks 1-3]
[Figure: true vs. estimated anomaly volume across flows and time; Pfa = 0.03, Pd = 0.92]
Data: http://www.cs.bu.edu/~crovella/links.html
Online rank minimization
- Construct an estimated map of anomalies in real time
  - Streaming data model: link counts y_t arrive sequentially, t = 1, 2, ...
- Approach: regularized exponentially-weighted LS formulation (a simplified sketch follows the figure below)

M. Mardani, G. Mateos, and G. B. Giannakis, "Dynamic anomalography: Tracking network anomalies via sparsity and low rank," IEEE Journal of Selected Topics in Signal Processing, pp. 50-66, Feb. 2013.
[Figure: tracking cleansed link traffic (links such as ATLA--HSTN, DNVR--KSCY, HSTN--ATLA) and real-time unveiling of anomalies (flows such as CHIN--ATLA, WASH--STTL, WASH--WASH); estimated vs. true curves over the time index t]
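A simplified streaming sketch, assuming the model y_t = x_t + R a_t + v_t with x_t = L q_t in a slowly varying subspace; the paper's exponentially weighted LS recursions are replaced here by per-datum ISTA steps for the anomaly and a stochastic-gradient subspace update, and all names (q_dim, lam, mu) are illustrative.

```python
import numpy as np

# Online unveiling of anomalies from streaming link counts y_t = L q_t + R a_t + v_t.
def soft(v, tau):
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def online_anomaly_tracking(y_stream, R, q_dim, lam, mu=0.1, inner=10, seed=0):
    L_links, F = R.shape
    rng = np.random.default_rng(seed)
    L = rng.standard_normal((L_links, q_dim)) / np.sqrt(L_links)  # initial subspace
    step = 1.0 / np.linalg.norm(R, 2) ** 2
    for y in y_stream:
        a = np.zeros(F)
        for _ in range(inner):
            q = np.linalg.lstsq(L, y - R @ a, rcond=None)[0]            # subspace coefficients
            a = soft(a + step * R.T @ (y - L @ q - R @ a), step * lam)  # anomaly (ISTA step)
        L = L + mu * np.outer(y - L @ q - R @ a, q)                     # subspace update
        yield a, L @ q                                                  # anomaly map, cleansed traffic
```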
Roadmap
- Robust principal component analysis
  - Linear low-rank models and sparse outliers
- Scalable algorithms for big network data analytics
  - (De-)centralized and online rank minimization
- Robust sparse embedding via dictionary learning
  - Nonlinear low-rank models; data-adaptive compressed sensing
- Concluding remarks
Nonlinear low-dimensional models?
- Compressive sampling (CS) [Donoho/Candes'06]: linear operator
  - CS vs. data-adaptive principal component analysis (PCA) [Pearson'1901]
  - Data-adaptive nonlinear CS?; quad-CS [Ohlsson et al'13]
- Nonlinear dimensionality reduction for data on manifolds
  - Kernel PCA [Scholkopf et al'98]; SDE [Weinberger'04]; reconstruction?
  - Local linear embedding (LLE) [Roweis-Saul'00]; LEM; MDS; Isomap ...
  - Sparsity-aware embeddings [Huang et al'10], [Vidal'11], [Kong et al'12]
  - Dictionary learning (DL) [Olshausen'97]; online DL [Mairal et al'10], [Carin et al'11]
Learning sparse manifold models
- Training data lie on a smooth but unknown manifold
- Robust sparse embedding via dictionary learning (RSE-DL)
  - The embedding reduces and morphs the training data to yield a smoother basis for the manifold
  - The reduced data matrix is used to learn the dictionary
- Criterion combines a sparse training-data fit with a smooth affine-manifold fit
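A minimal dictionary-learning sketch (alternating l1 sparse coding and a least-squares dictionary update); it omits the smooth affine-manifold fit term of RSE-DL, and n_atoms and lam are illustrative parameters.

```python
import numpy as np

# Classic dictionary learning: alternate sparse coding (ISTA for the lasso)
# and a least-squares dictionary update with atom renormalization.
def soft(M, tau):
    return np.sign(M) * np.maximum(np.abs(M) - tau, 0.0)

def dictionary_learning(Y, n_atoms, lam, n_outer=30, n_ista=50, seed=0):
    p, T = Y.shape
    rng = np.random.default_rng(seed)
    D = rng.standard_normal((p, n_atoms))
    D /= np.linalg.norm(D, axis=0, keepdims=True)        # unit-norm atoms
    S = np.zeros((n_atoms, T))
    for _ in range(n_outer):
        # Sparse coding: min_S 0.5||Y - D S||_F^2 + lam ||S||_1
        step = 1.0 / np.linalg.norm(D, 2) ** 2
        for _ in range(n_ista):
            S = soft(S + step * D.T @ (Y - D @ S), step * lam)
        # Dictionary update: least squares, then renormalize atoms
        D = Y @ S.T @ np.linalg.pinv(S @ S.T + 1e-8 * np.eye(n_atoms))
        D /= np.maximum(np.linalg.norm(D, axis=0, keepdims=True), 1e-12)
    return D, S
```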
Parsimonious nonlinear embedding
- Embedding preserves the local structure of the data
- Reduced-complexity embedding step
- RSE-DL appropriate for (de-)compression and reconstruction
- Robust sparse coding: also works for clustering/classification
RSE-DL compression and reconstruction
- Operational phase @ Tx: per data vector
  - Compress: sparse-code the datum against the learned dictionary
- Operational phase @ Rx: given the (possibly noisy) code
  - Reconstruct: synthesize the datum from the dictionary and the received code
- Less computationally demanding modules
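A hedged sketch of the operational phases, assuming compression amounts to sparse coding against the learned dictionary D and reconstruction to synthesis from the received code; compress and reconstruct are illustrative names.

```python
import numpy as np

def soft(M, tau):
    return np.sign(M) * np.maximum(np.abs(M) - tau, 0.0)

def compress(y, D, lam, n_ista=100):
    """Tx: sparse-code data vector y against dictionary D (ISTA for the lasso)."""
    s = np.zeros(D.shape[1])
    step = 1.0 / np.linalg.norm(D, 2) ** 2
    for _ in range(n_ista):
        s = soft(s + step * D.T @ (y - D @ s), step * lam)
    return s

def reconstruct(s_noisy, D):
    """Rx: synthesize an estimate of y from the (possibly noisy) sparse code."""
    return D @ s_noisy
```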
Test case: Swiss roll
- Noise on the manifold and channel noise added

Comparisons with LLE, RSE, RSGE
- (Average over 100 realizations)

Missing data
- "USC girl" image (predates Lena!) with 50% misses
- RSE-DL: reduced complexity relative to, e.g., Bayesian-type methods [Chen et al'10]
Concluding summary
- Robust PCA; online via robust subspace tracking
  - Leveraging linear low-rank models and outlier sparsity
- Unveiling anomalies in large-scale network data
  - Scalable decentralized and online algorithms
- Data-adaptive, nonlinear, low-dimensional models
- The road ahead
  - Performance bounds? Dynamical network data?
  - Learning via quantized big data (few bits)?
  - RSE-DL for nonlinear compressive sampling?

Thank you!
Numerical validation
- Setup: L = 105, F = 210, T = 420; R ~ Bernoulli(1/2); X0 = R P Q' with P, Q ~ N(0, 1/FT); a_{ij} ∈ {-1, 0, 1} w.p. {π/2, 1-π, π/2}
- Relative recovery error
[Figure: relative recovery error as a function of rank(X0) (r) and percentage of nonzero entries (s/FT, %)]
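A sketch that generates synthetic data per the stated setup; the particular rank r and anomaly probability pi are illustrative values that the experiment sweeps.

```python
import numpy as np

# Synthetic data per the validation setup: L=105, F=210, T=420,
# R ~ Bernoulli(1/2), X0 = R P Q' with P, Q ~ N(0, 1/(F T)),
# and anomalies a_ij in {-1, 0, 1} w.p. {pi/2, 1-pi, pi/2}.
rng = np.random.default_rng(0)
L_dim, F, T, r, pi = 105, 210, 420, 5, 0.01                # r and pi are swept in the experiment

R = (rng.random((L_dim, F)) < 0.5).astype(float)           # Bernoulli(1/2) routing matrix
P = rng.normal(0.0, np.sqrt(1.0 / (F * T)), (F, r))
Q = rng.normal(0.0, np.sqrt(1.0 / (F * T)), (T, r))
X0 = R @ P @ Q.T                                           # low-rank component (L x T)

A0 = rng.choice([-1.0, 0.0, 1.0], size=(F, T), p=[pi / 2, 1 - pi, pi / 2])
Y = X0 + R @ A0                                            # noise-free observations
```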