How Much Restricted Isometry is Needed in Nonconvex Matrix Recovery?

Cédric Josz, Javad Lavaei, Somayeh Sojoudi, Richard Y. Zhang

Neural Information Processing Systems 2018
Wed Dec 5th, 5 - 7 PM @ Room 210 & 230 AB
Poster #46

Nonconvex matrix recovery (Burer & Monteiro 2003)

• Recommendation engines
• Phase retrieval
• Power system state estimation
• Cluster analysis

Nonconvex matrix recovery (Burer & Monteiro 2003)

1. Express the low-rank matrix as a product of factors, X = UUᵀ
   (e.g., a table of movie ratings ≈ user preferences × movie genres).

2. Minimize the least-squares loss of the linear model over specific table elements
   (e.g., the known ratings in the table of movie ratings).

(A short numerical sketch of this recipe follows below.)

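To make the two-step recipe concrete, here is a minimal numerical sketch in Python. The Gaussian measurement operator, problem sizes, step size, and iteration count are illustrative assumptions, not values from the slides.

# Minimal sketch of nonconvex (Burer-Monteiro) matrix recovery.
# The Gaussian measurement operator, sizes, and step size below are
# illustrative assumptions, not taken from the slides.
import numpy as np

rng = np.random.default_rng(0)
n, r, m = 20, 2, 1000                            # matrix size, rank, number of measurements

Z = rng.standard_normal((n, r))                  # ground-truth factor
X_true = Z @ Z.T                                 # low-rank matrix to recover, X = ZZ^T
A = rng.standard_normal((m, n, n)) / np.sqrt(m)  # measurement matrices A_i
b = np.einsum('kij,ij->k', A, X_true)            # measurements b_i = <A_i, X_true>

def loss_grad(U):
    """Least-squares loss f(U) = sum_i (<A_i, UU^T> - b_i)^2 and its gradient in U."""
    resid = np.einsum('kij,ij->k', A, U @ U.T) - b
    G = np.einsum('k,kij->ij', resid, A)
    return np.sum(resid**2), 2.0 * (G + G.T) @ U

U = rng.standard_normal((n, r))                  # random initialization of the factor
for _ in range(2000):                            # plain gradient descent on the factor
    _, g = loss_grad(U)
    U -= 2e-3 * g

print("relative error:", np.linalg.norm(U @ U.T - X_true) / np.linalg.norm(X_true))

Any local-search method can replace the plain gradient-descent loop (the later slides use SGD); the point is only that the optimization runs over the factor U rather than the full matrix X.
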
Spurious local minima

[Figure: a nonconvex loss landscape over X = UUᵀ, with a global minimum at the ground truth and spurious local minima.]

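For reference, this can be stated formally (a standard definition, added here for clarity rather than copied from the slides). With the factored loss

    f(U) = \| \mathcal{A}(U U^\top) - b \|^2,

a point U* is a spurious local minimum (more precisely, a spurious second-order critical point) if

    \nabla f(U^*) = 0, \qquad \nabla^2 f(U^*) \succeq 0, \qquad U^* (U^*)^\top \neq X_{\mathrm{true}},

i.e., it passes the first- and second-order optimality tests yet does not recover the ground truth.
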
Exact recovery guarantee (Bhojanapalli et al. 2016)

δ-Restricted isometry property (δ-RIP)

If δ < 1/5, then no spurious local minima, so local search is guaranteed to succeed.

See also (Ge et al. 2017; Li & Tang 2017; Zhu et al. 2017)


Exact recovery guarantee (Ge et al. 2016)

δ-Concentration inequality: if δ is very small, then no spurious local minima.

A similar idea drives many proofs. See also (Ge et al. 2015; 2017; Sun et al. 2015; 2016; Park et al. 2017; etc.)


If δ < 1/5, then no spurious local min.

[Figure: the measurement operator preserves lengths with < 10% distortion (unit-length inputs, = 1.0, map to outputs of length between 0.9 and 1.1), which corresponds to δ < 1/5.]

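In symbols, the property pictured above is the rank-restricted isometry property (stated here for reference; the rank bound 2r is the convention used in these guarantees):

    (1-\delta)\,\|X\|_F^2 \;\le\; \|\mathcal{A}(X)\|^2 \;\le\; (1+\delta)\,\|X\|_F^2
    \qquad \text{for all } X \text{ with } \operatorname{rank}(X) \le 2r.

The 10% picture matches this: unit-norm inputs mapping to lengths between 0.9 and 1.1 means squared lengths between 0.81 and 1.21, i.e., a distortion of roughly 20%, which is the δ < 1/5 regime of the theorem.
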

Can this be significantly improved?

If δ < 1/5, then no spurious local min. Previous attempts all stuck at 1/5 (Bhojanapalli et al. 2016; Ge et al. 2017; Li & Tang 2017; Zhu et al. 2017; etc.)

[Figure: the δ axis from 0 to 1, marked "Good" below δ = 1/5 and "???" above.]

If NO:
• Problem hard.
• Specific to algorithm.
• Proof idea is limited.

If Yes:
• Problem easy.
• Agnostic to algorithm.
• Proof idea is powerful.

Can this be significantly improved? NO.

Contribution 1 (Zhang, Josz, Sojoudi, Lavaei, NeurIPS 2018).
If δ ≥ 1/2, many counterexamples.

If δ < 1/5, then no spurious local min.

[Figure: the δ axis from 0 to 1, marked "Good" below δ = 1/5 and "Bad" above δ = 1/2.]

Consequently:
• Problem hard.
• Specific to algorithm.
• Proof idea is limited.

Contribution 2 (Zhang, Sojoudi, Lavaei, submitted to JMLR 2018).
Let rank r = 1. If δ < 1/2, then no spurious local min.

[Figure: the δ axis from 0 to 1, now marked "Good" below δ = 1/2 and "Bad" above: if δ ≥ 1/2, many counterexamples.]

Counterexample with δ = 1/2
(Zhang, Josz, Sojoudi, Lavaei, NeurIPS 2018; Zhang, Sojoudi, Lavaei, submitted to JMLR 2018)

An instance that satisfies ½-RIP, with
• ground truth z = (1, 0),
• spurious local min x = (0, 1/√2).

Numerical check (a code sketch of this kind of check follows below):
• 100,000 trials w/ SGD
• 87,947 successful
• 12% failure rate

Generalizes to an arbitrary rank-1 ground truth.

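The success-rate figures above come from repeated random trials. The sketch below shows how such a Monte Carlo check can be run in the same rank-1, two-variable setting; note that the operator A is a random stand-in, not the actual δ = 1/2 counterexample constructed in the papers, so its failure rate will not reproduce the 12% reported above.

# Monte Carlo estimate of the local-search success rate on a rank-1,
# two-variable instance.  NOTE: the operator A below is a random stand-in,
# NOT the delta = 1/2 counterexample from the papers, and plain gradient
# descent stands in for the SGD used in the slides.
import numpy as np

rng = np.random.default_rng(1)
m = 6
A = rng.standard_normal((m, 2, 2)) / np.sqrt(m)
A = 0.5 * (A + np.transpose(A, (0, 2, 1)))     # symmetrize each measurement matrix
z = np.array([1.0, 0.0])                       # ground truth from the slide
b = np.einsum('kij,i,j->k', A, z, z)           # measurements b_k = <A_k, z z^T>

def run_descent(x, steps=8000, lr=3e-3):
    """Gradient descent on f(x) = sum_k (<A_k, x x^T> - b_k)^2."""
    for _ in range(steps):
        resid = np.einsum('kij,i,j->k', A, x, x) - b
        x = x - lr * 4.0 * np.einsum('k,kij,j->i', resid, A, x)
    return x

trials, successes = 200, 0                     # the slides use 100,000 trials
for _ in range(trials):
    x_final = run_descent(rng.standard_normal(2))
    # success = the recovered rank-1 matrix matches the ground truth
    if np.linalg.norm(np.outer(x_final, x_final) - np.outer(z, z)) < 1e-3:
        successes += 1
print(f"estimated success rate: {successes / trials:.1%}")
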
Proof idea. Counterexamples via convex optimization

Key insight. Relax into a semidefinite program.

Zhang, Josz, Sojoudi, Lavaei, NeurIPS (2018)

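A schematic version of that relaxation (a reconstruction for intuition; the notation and normalization differ from the paper's exact formulation): write the measurements through their quadratic form H = 𝒜*𝒜, so the loss is f_H(U) = ⟨H(UUᵀ - zzᵀ), UUᵀ - zzᵀ⟩. For a fixed pair (x, z), the conditions making x a spurious second-order critical point and the δ-RIP bounds are all linear matrix inequalities in (H, δ):

    \min_{\mathcal{H},\,\delta}\ \delta
    \quad \text{subject to} \quad
    \nabla f_{\mathcal{H}}(x) = 0, \qquad
    \nabla^2 f_{\mathcal{H}}(x) \succeq 0, \qquad
    (1-\delta)\,\mathcal{I} \;\preceq\; \mathcal{H} \;\preceq\; (1+\delta)\,\mathcal{I}.

Each constraint is linear in (H, δ), so this is a semidefinite program; solving it either yields a measurement operator realizing a counterexample at (x, z) or bounds the smallest δ at which one can exist.
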

Main Result 1. Counterexamples are almost everywhere

Theorem 1 (Zhang, Josz, Sojoudi, Lavaei 2018).
Given x, z nonzero and not collinear, there exists a counterexample that
• satisfies δ-RIP with 1/2 ≤ δ < 1,
• has z as ground truth,
• has x as a spurious local min.

Take-away. If δ-RIP holds only with δ ≥ 1/2, then expect spurious local minima.

Zhang, Josz, Sojoudi, Lavaei, NeurIPS (2018)


Conjecture (Zhang, Josz, Sojoudi, Lavaei 2018).

If δ-RIP with δ < 1/2, then no spurious local min.

Zhang, Josz, Sojoudi, Lavaei, NeurIPS (2018)

Main Result 2. Sharp RIP-based guarantee

Theorem 2 (Zhang, Sojoudi, Lavaei 2018).
If δ-RIP with δ < 1/2 and r = 1, then no spurious local min.

Zhang, Sojoudi, Lavaei. Submitted to JMLR (2018)

Proof for rank-1 case

Theorem 2 (Zhang, Sojoudi, Lavaei 2018).
If δ-RIP with δ < 1/2 and r = 1, then no spurious local min.

Ongoing work. Generalization to rank r.


Practical implications?

δ-RIP with 1/2 ≤ δ < 1

[Figure: loss landscape with the global minimum at xgood and an "engineered" spurious local minimum at xbad.]

Zhang, Josz, Sojoudi, Lavaei, NeurIPS (2018)

Experiment (Zhang, Josz, Sojoudi, Lavaei, NeurIPS 2018):
1. Select γ in [0, 1].
2. Start SGD at xinit = (1-γ) xbad + γ Gaussian.
3. Make 10k SGD steps, measure error = | xfinal - xgood |.

(A code sketch of this protocol follows below.)

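Only the protocol in this sketch follows the slide: the operator, xbad, and xgood are placeholders rather than the engineered counterexample from the paper, and plain gradient descent with fewer steps and trials stands in for the 10k-step SGD runs.

# Initialization sweep from the slides: start at x_init = (1-gamma)*x_bad + gamma*Gaussian,
# descend, and summarize error = |x_final - x_good| over repeated trials for each gamma.
# The operator A, x_bad, and x_good are PLACEHOLDERS, not the engineered counterexample.
import numpy as np

rng = np.random.default_rng(2)
m = 6
A = rng.standard_normal((m, 2, 2)) / np.sqrt(m)
A = 0.5 * (A + np.transpose(A, (0, 2, 1)))        # symmetric measurement matrices
x_good = np.array([1.0, 0.0])                     # placeholder ground-truth factor
x_bad = np.array([0.0, 1.0 / np.sqrt(2)])         # placeholder "bad" starting point
b = np.einsum('kij,i,j->k', A, x_good, x_good)    # measurements b_k = <A_k, x_good x_good^T>

def descend(x, steps=5000, lr=3e-3):              # the slides use 10k SGD steps
    for _ in range(steps):
        resid = np.einsum('kij,i,j->k', A, x, x) - b
        x = x - lr * 4.0 * np.einsum('k,kij,j->i', resid, A, x)
    return x

for gamma in np.linspace(0.0, 1.0, 11):
    errs = []
    for _ in range(50):                           # the slides use 1k trials per gamma
        x_init = (1 - gamma) * x_bad + gamma * rng.standard_normal(2)
        x_final = descend(x_init)
        # the factor is only determined up to sign, so compare against +/- x_good
        errs.append(min(np.linalg.norm(x_final - x_good), np.linalg.norm(x_final + x_good)))
    q = np.percentile(errs, [0, 5, 50, 95, 100])  # min, 5%, median, 95%, max
    print(f"gamma = {gamma:.1f}   error quantiles: {np.round(q, 4)}")
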
[Plot: error = | xfinal - xgood | versus γ, summarized by the min, 5%, median, 95%, and max over 1k trials, from xinit = xbad (γ = 0) to xinit ~ Gaussian (γ = 1).
Example 1: 100% failure when xinit = xbad, but 100% success rate for random initializations.
Example 2: 100% failure when xinit = xbad, and > 5% failure overall (< 95% success rate).]

Zhang, Josz, Sojoudi, Lavaei, NeurIPS (2018)

Practical implications?

δ-RIP with 1/2 ≤ δ < 1

• no spurious local min ⇒ 100% success
• spurious local min ⇒ > 0% failure

How Much Restricted Isometry is Needed in Nonconvex Matrix Recovery?

• If δ ≥ 1/2, many counterexamples.
• If δ < 1/2, then no spurious local min (?)
• Limitations of "no spurious local min" guarantees.

[Figure: the δ axis from 0 to 1, with 1/5 and 1/2 marked; "Good" below and "Bad" above δ = 1/2.]

R.Y. Zhang, C. Josz, S. Sojoudi, J. Lavaei, NeurIPS (2018)
Wed Dec 5th, 5 - 7 PM @ Room 210 & 230 AB
Poster #46
