Bayesianism, Convexity, and the quest towards Optimal Algorithms
Boaz Barak, Harvard University / Microsoft Research


Page 1:

Bayesianism, Convexity, and the quest towards Optimal Algorithms

Boaz Barak

Harvard University / Microsoft Research

Page 2:

Talk Plan

• Dubious historical analogy.
• Philosophize about automating algorithms.
• Wave hands about convexity and the Sum of Squares algorithm.
• Sudden shift to Bayesianism vs Frequentism.
• Non-results on the planted clique problem (or, how to annoy your friends).

Skipping today:

• Sparse coding / dictionary learning / tensor prediction [B-Kelner-Steurer'14,'15, B-Moitra'15]
• Unique games conjecture / small set expansion [...B-Brandao-Harrow-Kelner-Steurer-Zhou'12...]
• Connections to quantum information theory

Page 3:

Prologue: Solving equations

• Babylonians (~2000 BC): solutions for quadratic equations.
• del Ferro-Tartaglia-Cardano-Ferrari (1500's): solutions for cubics and quartics.
• van Roomen/Viète (1593): "challenge to all mathematicians in the world":

$x^{45} - 45x^{43} + \cdots + 45x = \sqrt{7/4 + \cdots + \sqrt{45/64}}$

• Euler (1740's): special cases of quintics.
• Vandermonde (1777): solution with square and fifth roots.
• Gauss (1796): the 17th root of unity; construction of the regular 17-gon.
• ...Ruffini-Abel-Galois (early 1800's):
  - Some equations can't be solved in radicals.
  - Characterization of solvable equations.
  - Birth of group theory.
  - The 17-gon construction is now "boring": a few lines of Mathematica.
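The "few lines" remark can be checked directly. A minimal sketch in Python with SymPy (standing in for the slide's Mathematica; I am assuming SymPy's `rewrite(sqrt)` support for constructible cosines, which is not something the slide specifies):

```python
# Sketch of the "boring" claim: express cos(2*pi/17) in nested square
# roots, which is exactly the algebraic content of Gauss's 17-gon
# construction. (SymPy here stands in for the slide's Mathematica.)
from sympy import cos, pi, sqrt, minimal_polynomial, Symbol

radical = cos(2 * pi / 17).rewrite(sqrt)  # nested square roots only
print(radical)

# Sanity check: cos(2*pi/17) is algebraic of degree 8 over the rationals.
print(minimal_polynomial(cos(2 * pi / 17), Symbol('x')))
```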

Page 4:

A prototypical TCS paper

Interesting problem leads to either:

• an efficient algorithm (e.g., MAX-FLOW is in P), or
• a hardness reduction (e.g., MAX-CUT is NP-hard).

Can we make algorithms boring? Can we reduce creativity in algorithm design?

Can we characterize the "easy" problems?

Page 5:

Characterizing easy problems

Goal: A single simple algorithm that efficiently solves every problem that can be efficiently solved.

Trivially true: the algorithm that enumerates all Turing machines.
Trivially false: analyzing that algorithm would mean resolving P vs NP.

Revised goal: A single simple algorithm that is conjectured to be optimal in some interesting domain of problems.

Byproducts: new algorithms, a theory of computational knowledge.


Page 6:

Domain: Combinatorial Optimization*

Maximize/minimize an objective subject to constraints. Examples: satisfiability, graph partitioning and coloring, traveling salesperson, matching, ...

Characteristics:

• Natural notions of approximation and noise.
• No/little algebraic structure.
• "…" ("good characterization"), "…"
• Threshold behavior: either very easy or very hard (e.g., 2SAT vs 3SAT, random kSAT).
• The same algorithmic ideas and themes keep recurring.

Hope: make this formal for some subclass of optimization.

Non-examples: integer factoring, determinant.

Page 7:

Theme: Convexity

Page 8:

Convexity in optimization

Interesting Problem → (Creativity!!) → Convex Problem → General Solver

Example: can embed in … or ….

Sum of Squares algorithm [Shor'87, Parrilo'00, Lasserre'01]: a universal embedding of any* optimization problem into an $n^{O(d)}$-dimensional convex set.

• Both the "quality" of the embedding and the running time grow with the degree $d$.
• $d = n$: optimal solution, exponential time.
• Encapsulates many natural algorithms. Optimal among a natural class [Lee-Raghavendra-Steurer'15].

An algorithmic version of works related to Hilbert's 17th problem [Artin'27, Krivine'64, Stengle'74].
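To make the embedding concrete, here is a minimal sketch (Python with cvxpy, my choice of solver stack for illustration; the slide names no implementation) of the degree-2 Sum of Squares relaxation of MAX-CUT, which coincides with the classic Goemans-Williamson SDP:

```python
# Degree-2 SoS relaxation of MAX-CUT on a toy graph: relax x in {-1,+1}^n
# to a PSD "moment matrix" X ~ E[x x^T] with X_ii = E[x_i^2] = 1.
import cvxpy as cp

edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]  # a toy 4-vertex graph
n = 4

X = cp.Variable((n, n), PSD=True)        # the convex set we embed into
constraints = [cp.diag(X) == 1]          # x_i^2 = 1 for +/-1 variables
cut_value = sum((1 - X[i, j]) / 2 for i, j in edges)

prob = cp.Problem(cp.Maximize(cut_value), constraints)
prob.solve()
print("degree-2 SoS upper bound on MAX-CUT:", prob.value)  # true optimum is 4
```

The "creativity" of the classical approach, i.e., finding a problem-specific convex formulation, is replaced here by a single mechanical recipe: introduce one moment variable per low-degree monomial and impose the PSD constraint.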

Page 9:

Talk Plan

• Dubious historical analogy.

• Philosophize about automating algorithms.

• Wave hands about convexity and the Sum of Squares algorithm.

• Sudden shift to Bayesianism vs Frequentism.

• Non-results on the planted clique problem.

Page 10:

Frequentists vs Bayesians

"There is a 10% chance that the …th digit of $\pi$ is 7."

"Nonsense! That digit either is 7 or it isn't."

"I will take a bet on this."

Page 11:

Computational version

$G$ = graph with an (unknown) maximum clique $S$ of size $k$.

What's the probability that vertex $i$ is in $S$?

Information-theoretically: it is either 0 or 1.

For a computationally bounded observer: it may be strictly in between.

Page 12:

Making this formal

$G$ = graph with an (unknown) maximum clique of size $k$.

Classical Bayesian uncertainty: a posterior distribution

$\mu : \{0,1\}^n \to \mathbb{R}, \qquad \mu(x) \ge 0 \ \ \forall x, \qquad \sum_x \mu(x) = 1,$

consistent with the observations ($\mu$ supported on size-$k$ cliques of $G$).

Computational analog: a degree-$d$ pseudo-distribution drops the condition $\mu(x) \ge 0$ and only requires

$\mathbb{E}_\mu\, p^2 \ge 0 \qquad \forall p,\ \deg(p) \le d/2.$

Theorem: $\mathrm{SOS}_d(G) = \max_{\mu :\, d\text{-p.dist}} \mathbb{E}_\mu \sum x_i.$

Corollary: computable via convex programming. The degree-$d$ pseudo-distributions form a convex set, defined by $n^{O(d)}$ linear equations plus a PSD constraint.
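A minimal sketch (Python/numpy; the moments and helper names are mine for illustration) of what the PSD constraint means operationally: a degree-2 pseudo-expectation over $\{0,1\}^n$ is just an assignment of values to multilinear monomials whose moment matrix is PSD.

```python
# Check the defining condition E_mu[p^2] >= 0 for all deg(p) <= 1, i.e. a
# degree-2 pseudo-distribution over {0,1}^n: it is equivalent to the
# moment matrix, indexed by the monomials {1, x_1, ..., x_n}, being PSD.
# (Uses x_i^2 = x_i, so all monomials are multilinear.)
import numpy as np
from itertools import combinations

def moment_matrix(n, E):
    """E maps a frozenset of variable indices to its pseudo-expectation."""
    monomials = [frozenset()] + [frozenset([i]) for i in range(n)]
    return np.array([[E[a | b] for b in monomials] for a in monomials])

# Toy "maximal ignorance" moments for a k-clique in an n-vertex graph:
# E[x_i] = k/n and E[x_i x_j] = (k/n)^2 (pretending all pairs are edges).
n, k = 10, 3
E = {frozenset(): 1.0}
for i in range(n):
    E[frozenset([i])] = k / n
for i, j in combinations(range(n), 2):
    E[frozenset([i, j])] = (k / n) ** 2

M = moment_matrix(n, E)
print("degree-2 pseudo-distribution?", np.linalg.eigvalsh(M).min() >= -1e-9)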

Page 13:

Making this formal (continued)

A General Perspective:

For every sound but incomplete proof system $\Pi$: $\mu$ is a $\Pi$-pseudo-distribution consistent with the observations if, for every function $f$ and number $\alpha$, whenever $\Pi$ proves "$f \le \alpha$" from the observations, $\mathbb{E}_\mu f \le \alpha$.

Since $\Pi$ is incomplete, $\mu$ might not be an actual distribution.

This is a computational analog of Bayesian probabilities.

Algorithms : Proof systems
Frequentist : Bayesian
Pseudorandom : Pseudo-distribution

Page 14:

Planted Clique Problem [Karp'76, Kucera'95]

Distinguish between $G(n, 1/2)$ and $G(n, 1/2)$ plus a planted clique of size $k$.

Theorem [Lovász'79, Juhász'82]: efficient (spectral) algorithms succeed once $k \gtrsim \sqrt{n}$.

No known polynomial-time algorithm does better than $k \approx \sqrt{n}$.

Theorem [Feige-Krauthgamer'02]: after $r$ rounds of the Lovász-Schrijver$_+$ hierarchy, the relaxation value for $G(n, 1/2)$ is still about $\sqrt{n/2^r}$.

A central problem in average-case complexity, related to problems in statistics, sparse recovery, finding equilibria, ... [Hazan-Krauthgamer'09, Koiran-Zouzias'12, Berthet-Rigollet'12]

Can SOS do better?
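A minimal sketch (Python/numpy, function names mine) of the problem and of the $\sqrt{n}$ phenomenon behind [Lovász'79, Juhász'82]: the top eigenvalue of a random $\pm 1$ adjacency matrix is about $2\sqrt{n}$, so a planted clique becomes spectrally visible once $k \gg \sqrt{n}$.

```python
import numpy as np

rng = np.random.default_rng(0)

def pm_adjacency_with_clique(n, k):
    """+/-1 'adjacency' matrix of G(n, 1/2) with a planted k-clique."""
    A = np.triu(rng.choice([-1.0, 1.0], size=(n, n)), k=1)
    A = A + A.T                           # symmetric, zero diagonal
    S = rng.choice(n, size=k, replace=False)
    A[np.ix_(S, S)] = 1.0                 # plant the clique
    np.fill_diagonal(A, 0.0)
    return A

n, k = 2000, 150                          # k well above sqrt(n) ~ 45
lam_null = np.linalg.eigvalsh(pm_adjacency_with_clique(n, 0))[-1]
lam_plant = np.linalg.eigvalsh(pm_adjacency_with_clique(n, k))[-1]
print(f"no clique:      lambda_max ~ {lam_null:.1f} (about 2*sqrt(n) = {2*n**0.5:.1f})")
print(f"planted clique: lambda_max ~ {lam_plant:.1f} (about k = {k})")
```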

Page 15:

"Theorem" [Meka-Wigderson'13]: degree-$d$ SOS does not beat the $\sqrt{n}$ barrier.

"Proof": Define $\mu$ of "maximal ignorance":

$\mathbb{E}_\mu \prod_{i \in T} x_i \cong \begin{cases} (k/n)^{|T|} & \text{if } T \text{ is a clique in } G, \\ 0 & \text{otherwise.} \end{cases}$

(The same pseudo-distribution as used for $\mathrm{SOS}_2$ by Feige-Krauthgamer.)

$\mu$ is a valid pseudo-distribution assuming a higher-degree matrix-valued Chernoff bound.

Bug [Pisier]: the concentration bound is false.

In fact, for $k$ large enough there is a degree-2 polynomial $p$ such that $\mathbb{E}_\mu p^2 < 0$ [Kelner].

The moments are OK for smaller $k$ [Meka-Potechin-Wigderson'15, Deshpande-Montanari'15, Hopkins-Kothari-Potechin'15].

Page 16:

MW's "moral" error

"Pseudo-distributions should be as simple as possible, but not simpler." (after A. Einstein)

Pseudo-distributions should have maximum entropy, but respect the data.

Page 17:

MW violated Bayesian reasoning.

Consider the degree of a vertex $i$:

$i \in S \ \Rightarrow\ \deg(i) \sim N(n/2 + k,\ n/2)$
$i \notin S \ \Rightarrow\ \deg(i) \sim N(n/2,\ n/2)$

According to MW, $\mathbb{E}_\mu x_i$ does not depend on $\deg(i)$??

By Bayesian reasoning, $\mathbb{E}_\mu x_i$ should be reweighed according to the likelihood of the observed $\deg(i)$ under the two cases.

Thm: Bayesian moments give a stronger SOS lower bound [Hopkins-Kothari-Potechin-Raghavendra-Schramm'16].

Pseudo-distributions should have maximum entropy but respect the data.
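A minimal sketch (Python with scipy; the function name is mine) of the Bayesian correction the slide asks for: start from the prior $\Pr[i \in S] = k/n$ and condition on the observed degree, using the slide's Gaussian approximations for $\deg(i)$.

```python
# Posterior Pr[i in S | deg(i)] by Bayes' rule, under the slide's
# approximations: deg(i) ~ N(n/2 + k, n/2) if i in S, N(n/2, n/2) if not,
# with prior Pr[i in S] = k/n. High-degree vertices get reweighed upward.
import numpy as np
from scipy.stats import norm

def posterior_in_clique(deg, n, k):
    prior = k / n
    like_in = norm.pdf(deg, loc=n / 2 + k, scale=np.sqrt(n / 2))
    like_out = norm.pdf(deg, loc=n / 2, scale=np.sqrt(n / 2))
    joint_in = prior * like_in
    return joint_in / (joint_in + (1 - prior) * like_out)

n, k = 2000, 100
for deg in [n // 2 - 50, n // 2, n // 2 + 50, n // 2 + k]:
    print(f"deg = {deg}: Pr[i in S | deg] ~ {posterior_in_clique(deg, n, k):.3f}")
```

The MW moments assign $\mathbb{E}_\mu x_i = k/n$ to every vertex, i.e., they ignore exactly this conditioning; the Bayesian moments build it in.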

Page 18:

Why is MW's error interesting?

• Shows SoS captures Bayesian reasoning in a way that other algorithms do not.
• Suggests a new way to define what a computationally bounded observer knows about some quantity...
• ...and a more principled way to design algorithms based on such knowledge (see [B-Kelner-Steurer'14,'15]).

Even if SoS is not the optimal algorithm we're looking for, the dream of a more general theory of hardness, easiness and knowledge is worth pursuing.

Page 19:


Thanks!!