Bayesianism, Convexity, and the quest towards Optimal
Algorithms
Boaz Barak
Harvard University / Microsoft Research
Talk Plan
• Dubious historical analogy.
• Philosophize about automating algorithms.
• Wave hands about convexity and the Sum of Squares algorithm.
• Sudden shift to Bayesianism vs Frequentism.
• Non-results on the planted clique problem (or, how to annoy your friends).
Skipping today:
• Sparse coding / dictionary learning / tensor prediction [B-Kelner-Steurer’14,’15, B-Moitra’15]
• Unique games conjecture / small set expansion [B-Brandão-Harrow-Kelner-Steurer-Zhou’12, …]
• Connections to quantum information theory
Prologue: Solving equations
Babylonians (~2000 BC): solutions for quadratic equations.
del Ferro-Tartaglia-Cardano-Ferrari (1500’s): solutions for cubics and quartics.
van Roomen/Viète (1593): a “challenge to all the mathematicians in the world”:
$x^{45} - 45x^{43} + \cdots + 45x = \sqrt{7/4 + \cdots \sqrt{45/64}}$
Euler (1740’s): special cases of quintics.
Vandermonde (1777): solved $x^{11} = 1$ with square and fifth roots.
Gauss (1796): construction of the 17-gon, via square roots for the 17th root of unity.
…Ruffini-Abel-Galois (early 1800’s):
• Some equations can’t be solved in radicals.
• Characterization of solvable equations.
• Birth of group theory.
• 17-gon construction now “boring”: a few lines of Mathematica.
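That this is now routine is easy to check in any computer algebra system. A minimal sketch using sympy instead of Mathematica (the polynomials are arbitrary examples): the quartic comes back in radicals, while an unsolvable quintic only comes back as abstract root objects — exactly the Ruffini-Abel-Galois divide.

```python
# Solving equations in radicals is now a library call.
# Requires sympy; the polynomials below are arbitrary examples.
import sympy as sp

x = sp.symbols('x')

# A quartic: the del Ferro/Cardano/Ferrari-style formulas are built in,
# so the roots come out as explicit (if ugly) radical expressions.
for r in sp.solve(x**4 - 3*x**2 + x - 1, x):
    print(r)

# A quintic with Galois group S_5: by Ruffini-Abel-Galois there is no
# radical formula, and sympy can only return abstract CRootOf objects.
print(sp.solve(x**5 - x + 1, x))
```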
A prototypical TCS paper
Interesting problem
Efficient Algorithm (e.g. MAX-FLOW in P)
Hardness Reduction (e.g. MAX-CUT NP-hard)
Can we make algorithms boring? Can we reduce creativity in algorithm design?
Can we characterize the “easy” problems?
Characterizing easy problems
Goal: A single simple algorithm that efficiently solves every problem that can be efficiently solved.
Trivially true: an algorithm that enumerates all Turing machines. Trivially false: analyzing that algorithm would amount to resolving P vs NP.
Revised Goal: A single simple algorithm that is
conjectured to be optimal in some interesting domain
of problems.
Byproducts: New algorithms, theory of computational knowledge.
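The “trivially true” answer can even be written down concretely, as Levin-style universal search: dovetail over all programs, giving program $i$ roughly a $2^{-i}$ share of the time, and check every candidate answer with an efficient verifier. A toy sketch — the interface here is hypothetical (programs are hand-picked generators), whereas real universal search enumerates all Turing machines:

```python
# Toy sketch of Levin-style universal search: dovetail over "programs"
# so that if program number i finds a solution in T steps, the search
# returns a verified solution within ~2^i * T total steps.
def universal_search(programs, instance, verify):
    running = []
    for phase, prog in enumerate(programs):
        running.append(prog(instance))         # admit one new program
        for i, proc in enumerate(running):     # older programs run longer:
            for _ in range(2 ** (phase - i)):  # program i gets a ~2^-i share
                candidate = next(proc, None)
                if candidate is not None and verify(instance, candidate):
                    return candidate           # answers are always checked

# Illustrative "programs": program j enumerates assignments from offset j.
def make_prog(j):
    def prog(inst):
        clauses, n = inst
        for a in range(j, 2 ** n):
            yield [(a >> b) & 1 for b in range(n)]
    return prog

def sat_verify(inst, assignment):              # the efficient NP verifier
    clauses, _ = inst
    return all(any(assignment[v] == s for v, s in c) for c in clauses)

clauses = [[(0, 1), (1, 0)], [(1, 1), (2, 1)]]  # (x0 or not x1) and (x1 or x2)
print(universal_search((make_prog(j) for j in range(8)),
                       (clauses, 3), sat_verify))
```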
Domain: Combinatorial Optimization*
Maximize/minimize an objective subject to constraints. Examples: Satisfiability, graph partitioning and coloring, Traveling Salesperson, Matching, ...
Characteristics:
• Natural notions of approximation and noise.
• No/little algebraic structure.
• “Good characterization” phenomena.
• Threshold behavior: either very easy or very hard (e.g. 2SAT vs 3SAT, random kSAT).
• Same algorithmic ideas and themes keep recurring.
Hope: Make this formal for some subclass of optimization.
Non-Examples: Integer factoring, Determinant.
Theme: Convexity
Convexity in optimization: Interesting Problem → (creativity!!) → Convex Problem → General Solver
Example: can embed the problem in an LP or an SDP.
Sum of Squares Algorithm [Shor’87, Parrilo’00, Lasserre’01]: universal embedding of any* optimization problem into an $n^{O(d)}$-dimensional convex set.
• Both “quality” of embedding and running time grow with $d$.
• $d = n$: optimal solution, exponential time.
• Encapsulates many natural algorithms. Optimal among a natural class [Lee-Raghavendra-Steurer’15].
Algorithmic version of works related to Hilbert’s 17th problem [Artin’27, Krivine’64, Stengle’74].
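A concrete instance of the pipeline: MAX-CUT on a 5-cycle, embedded into a convex (PSD) set and handed to a general solver. This is the degree-2 level of the SOS hierarchy, better known as the Goemans-Williamson relaxation. A minimal sketch assuming the cvxpy package (any off-the-shelf SDP solver would do):

```python
# "Interesting problem -> convex problem -> general solver":
# the degree-2 SOS / Goemans-Williamson SDP relaxation of MAX-CUT.
# Assumes cvxpy with an SDP-capable solver (e.g. the bundled SCS).
import cvxpy as cp
import numpy as np

# Toy graph: the 5-cycle, as an adjacency matrix.
n = 5
A = np.zeros((n, n))
for i in range(n):
    A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1

# Convex embedding: X relaxes the moment matrix of x in {-1,+1}^n.
X = cp.Variable((n, n), PSD=True)
objective = cp.Maximize(cp.sum(cp.multiply(A, 1 - X)) / 4)
constraints = [cp.diag(X) == 1]          # x_i^2 = 1
cp.Problem(objective, constraints).solve()

# ~4.52 for C_5; the true MAX-CUT optimum is 4.
print("SDP upper bound on MAX-CUT:", objective.value)
```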
Talk Plan
• Dubious historical analogy.
• Philosophize about automating algorithms.
• Wave hands about convexity and the Sum of Squares algorithm.
• Sudden shift to Bayesianism vs Frequentism.
• Non-results on the planted clique problem.
Frequentists vs Bayesians
“There is a 10% chance that the $10^{100}$-th digit of $\pi$ is 7.”
“Nonsense! The digit is either 7 or it isn’t.”
“I will take a $9{:}1$ bet on this.”
Computational version: $G$ = graph with an (unknown) maximum clique $S$ of size $k$.
What’s the probability that vertex $i$ is in $S$?
Information theoretically: either 0 or 1.
For a computationally bounded observer: may be strictly in between, e.g. $\approx k/n$.
Making this formal
Classical Bayesian uncertainty: a posterior distribution $\mu$ over solutions.
$G$ = graph with an (unknown) maximum clique of size $k$.
$\mu : \{0,1\}^n \to \mathbb{R},\quad \forall x,\ \mu(x) \ge 0,\quad \sum_x \mu(x) = 1$
$\mu$ consistent with observations: every $x$ in the support of $\mu$ is (the indicator of) a clique in $G$.
Computational relaxation — a degree-$d$ pseudo-distribution drops $\mu(x) \ge 0$ and requires only:
$\tilde{\mathbb{E}}_\mu\, p^2 \ge 0 \quad \forall p,\ \deg(p) \le d/2$
Theorem: $\mathrm{SOS}_d(G) = \max_{\mu:\ d\text{-p.dist}} \tilde{\mathbb{E}}_\mu \sum x_i$
Corollary: $\mathrm{SOS}_d(G)$ is computable in time $n^{O(d)}$ — the pseudo-distributions form a convex set, defined by linear equations + a PSD constraint.
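At degree 2 this is very concrete: a pseudo-expectation is just a moment matrix indexed by the monomials $1, x_1, \dots, x_n$, and $\tilde{\mathbb{E}}_\mu p^2 \ge 0$ for all linear $p$ is exactly positive semidefiniteness of that matrix. A numpy sketch — the moments below are those of an actual distribution (uniform over $k$-subsets of vertices), so the check passes by construction; a pseudo-distribution only has to pass checks like this one:

```python
# A degree-2 pseudo-expectation on {0,1}^n is a moment matrix M indexed
# by monomials {1, x_1, ..., x_n}:
#   M[0,0] = E[1], M[0,i] = E[x_i], M[i,j] = E[x_i x_j].
# "E_mu p^2 >= 0 for all deg(p) <= 1"  <=>  M is positive semidefinite.
import numpy as np

n, k = 6, 3
M = np.empty((n + 1, n + 1))
M[0, 0] = 1.0                              # E[1] = 1 (normalization)
M[0, 1:] = M[1:, 0] = k / n                # E[x_i] = k/n
M[1:, 1:] = (k / n) * (k - 1) / (n - 1)    # E[x_i x_j] for i != j
np.fill_diagonal(M[1:, 1:], k / n)         # x_i^2 = x_i on {0,1}^n

eigs = np.linalg.eigvalsh(M)
print("valid at degree 2:", eigs.min() >= -1e-9)
```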
A General Perspective:
For every sound but incomplete proof system $P$:
$\mu$ is a $P$-pseudo-distribution consistent with observations if,
for every function $f$ and number $c$:
if $P \vdash$ “observations $\Rightarrow f \ge c$” then $\tilde{\mathbb{E}}_\mu f \ge c$.
$P$ incomplete $\Rightarrow$ $\mu$ might not be an actual distribution.
Computational analog to Bayesian probabilities.
Algorithms : Proof Systems
Frequentist : Bayesian
Pseudorandom : Pseudo-distribution
Planted Clique Problem
Distinguish between $G(n,\tfrac12)$ and $G(n,\tfrac12)$ plus a planted $k$-clique
[Karp’76, Kucera’95]
Theorem [Lovász’79, Juhász’82]: the $\vartheta$ function distinguishes in polynomial time when $k \gg \sqrt{n}$.
No known poly time algorithm does better than $k \approx \sqrt{n}$.
Theorem [Feige-Krauthgamer’02]: $d$ rounds of the LS+ hierarchy succeed roughly iff $k \gtrsim \sqrt{n/2^d}$.
Central problem in average-case complexity. Related to problems in statistics, sparse recovery, finding equilibria, … [Hazan-Krauthgamer’09, Koiran-Zouzias’12, Berthet-Rigollet’12]
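Where the $\sqrt{n}$ threshold comes from, in code: the top eigenvalue of the $\pm1$ adjacency matrix of $G(n,\tfrac12)$ concentrates around $2\sqrt{n}$, while a planted $k$-clique contributes an eigenvalue $\approx k$, so a simple spectral test separates the two once $k \gg \sqrt{n}$. A numpy sketch with illustrative sizes:

```python
# Spectral test for planted clique: top eigenvalue ~2*sqrt(n) under the
# null, ~k once a k-clique is planted, so it works when k >> sqrt(n).
import numpy as np

rng = np.random.default_rng(0)

def signed_adjacency(n, clique=()):
    A = np.sign(rng.standard_normal((n, n)))   # i.i.d. +/-1 entries
    A = np.triu(A, 1)
    A = A + A.T                                # symmetric, zero diagonal
    idx = np.asarray(clique, dtype=int)
    if idx.size:
        A[np.ix_(idx, idx)] = 1                # plant the clique
        A[idx, idx] = 0
    return A

n, k = 2000, 150                               # k well above 2*sqrt(n) ~ 89
lam_null = np.linalg.eigvalsh(signed_adjacency(n))[-1]
lam_plant = np.linalg.eigvalsh(signed_adjacency(n, range(k)))[-1]
print(f"top eigenvalue: null ~ {lam_null:.0f}, planted ~ {lam_plant:.0f}")
```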
Can SOS do better?
“Theorem” [Meka-Wigderson’13]: w.h.p. over $G \sim G(n,\tfrac12)$, degree-$d$ SOS cannot detect planted cliques of size up to $k \approx \sqrt{n}$.
“Proof”: Let $k \approx \sqrt{n}$ and define $\mu$ of “maximal ignorance”:
$\tilde{\mathbb{E}}_\mu \prod_{i \in T} x_i \cong (k/n)^{|T|}$ if $T$ is a clique, $0$ otherwise.
(Same pseudo-distribution as used for LS+ by Feige-Krauthgamer.)
Bug [Pisier]: the concentration bound used in the proof is false.
In fact, for $k \gg n^{1/3}$ there is a degree-2 polynomial $p$ with $\tilde{\mathbb{E}}_\mu p^2 < 0$ [Kelner].
Moments are OK for $k \lesssim n^{1/3}$ [Meka-Potechin-Wigderson’15, Desphande-Montanari’15, Hopkins-Kothari-Potechin’15].
$\mu$ would be a valid p-dist assuming the higher-degree matrix-valued Chernoff bound — the bound that turned out to be false.
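To make the disputed object concrete, here is a numpy sketch that builds the degree-4 moment matrix $M[S,T] = \tilde{\mathbb{E}}_\mu[x_{S \cup T}]$ of the “maximal ignorance” $\mu$ on a small random graph. Validity at degree 4 is exactly $M \succeq 0$, and Kelner’s argument says the minimum eigenvalue goes negative once $k \gg n^{1/3}$; the sizes here are illustrative, so treat the printed value as a demo rather than a proof:

```python
# Degree-4 moment matrix of the Meka-Wigderson "maximal ignorance"
# pseudo-distribution: E_mu[prod_{i in T} x_i] = (k/n)^|T| if T is a
# clique in G, else 0. Rows/columns are vertex subsets of size <= 2.
import itertools
import numpy as np

rng = np.random.default_rng(1)
n, k = 30, 10                                  # k well above n^(1/3) ~ 3.1
A = rng.random((n, n)) < 0.5
A = np.triu(A, 1)
A = A | A.T                                    # G(n, 1/2)

def is_clique(S):
    return all(A[i, j] for i, j in itertools.combinations(sorted(S), 2))

def pseudo_moment(S):                          # uses x_i^2 = x_i via set union
    return (k / n) ** len(S) if is_clique(S) else 0.0

subsets = ([frozenset()] + [frozenset([i]) for i in range(n)]
           + [frozenset(e) for e in itertools.combinations(range(n), 2)])
M = np.array([[pseudo_moment(S | T) for T in subsets] for S in subsets])

print("min eigenvalue of degree-4 moment matrix:", np.linalg.eigvalsh(M)[0])
```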
MW’s “moral” error
Pseudo-distributions should be as simple as possible but not simpler.
Following A. Einstein.
Pseudo-distributions should have maximum entropy but respect the data.
MW violated Bayesian reasoning:
According to MW: $\Pr_\mu[i \in S] = k/n$ for every vertex $i$, regardless of its degree??
But the degrees carry information:
$i \in S \Rightarrow \deg(i) \sim N\!\left(\tfrac{n}{2} + k,\ \tfrac{n}{2}\right)$
$i \notin S \Rightarrow \deg(i) \sim N\!\left(\tfrac{n}{2},\ \tfrac{n}{2}\right)$
By Bayesian reasoning: $\mu$ should be reweighed by the likelihood ratio, so $\Pr_\mu[i \in S]$ should grow with $\deg(i)$.
Thm: Bayesian (degree-aware) moments get $k \approx \sqrt{n}/\mathrm{polylog}(n)$ at degree 4 [Hopkins-Kothari-Potechin-Raghavendra-Schramm’16].
Pseudo-distributions should have maximum entropy but respect the data.
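The reweighting MW missed is ordinary Bayes’ rule on the degrees. A numpy sketch following the slide’s normal approximations (parameters illustrative): conditioning on $\deg(i)$ moves the membership probability well away from the flat $k/n$.

```python
# Bayes' rule on a vertex's degree, using the two normal approximations
# N(n/2 + k, n/2) and N(n/2, n/2) from the slide and the flat prior k/n.
import numpy as np

def posterior_in_clique(deg, n, k):
    prior = k / n
    var = n / 2
    ll_in = np.exp(-(deg - (n / 2 + k)) ** 2 / (2 * var))   # i in S
    ll_out = np.exp(-(deg - n / 2) ** 2 / (2 * var))        # i not in S
    return prior * ll_in / (prior * ll_in + (1 - prior) * ll_out)

n, k = 10_000, 200
for deg in (n // 2, n // 2 + k // 2, n // 2 + k):
    print(f"deg = {deg}: Pr[i in S | deg] ~ {posterior_in_clique(deg, n, k):.3f}")
# Rises from ~0.000 at an average degree to ~0.5 at degree n/2 + k.
```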
Why is MW’s error interesting?
• Shows SoS captures Bayesian reasoning in a way that other algorithms do not.
• Suggests a new way to define what a computationally bounded observer knows about some quantity…
• …and a more principled way to design algorithms based on such knowledge (see [B-Kelner-Steurer’14,’15]).
Even if SoS is not the optimal algorithm we’re looking for, the dream of a more general theory of hardness, easiness and knowledge is worth pursuing.
Thanks!!