Bayesianism, Convexity, and the quest towards Optimal
Algorithms
Boaz Barak
Harvard University / Microsoft Research
Talk Plan
• Dubious historical analogy.
• Philosophize about automating algorithms.
• Wave hands about convexity and the Sum of Squares algorithm.
• Sudden shift to Bayesianism vs Frequentism.
• Non-results on the planted clique problem (or, how to annoy your friends).
Skipping today:
• Sparse coding / dictionary learning / tensor prediction [B-Kelner-Steurer’14,’15, B-Moitra’15]
• Unique games conjecture / small set expansion [B-Brandão-Harrow-Kelner-Steurer-Zhou’12, …]
• Connections to quantum information theory
Prologue: Solving equations
Babylonians (~2000 BC): solutions for quadratic equations.
del Ferro-Tartaglia-Cardano-Ferrari (1500’s): solutions for cubics and quartics.
van Roomen/Viète (1593): a “challenge to all the mathematicians in the world”:
$x^{45} - 45x^{43} + \cdots + 45x = \sqrt{7/4 + \cdots \sqrt{45/64}}$
Euler (1740’s): special cases of quintics.
Vandermonde (1777): solved $x^{11} = 1$ with square and fifth roots.
Gauss (1796): construction of the 17-gon, via square roots for the 17th root of unity.
…Ruffini-Abel-Galois (early 1800’s):
• Some equations can’t be solved in radicals.
• Characterization of solvable equations.
• Birth of group theory.
• 17-gon construction now “boring”: a few lines of Mathematica.
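That this is now routine is easy to check in any computer algebra system. A minimal sketch using sympy instead of Mathematica (the polynomials are arbitrary examples): the quartic comes back in radicals, while an unsolvable quintic only comes back as abstract root objects — exactly the Ruffini-Abel-Galois divide.

```python
# Solving equations in radicals is now a library call.
# Requires sympy; the polynomials below are arbitrary examples.
import sympy as sp

x = sp.symbols('x')

# A quartic: the del Ferro/Cardano/Ferrari-style formulas are built in,
# so the roots come out as explicit (if ugly) radical expressions.
for r in sp.solve(x**4 - 3*x**2 + x - 1, x):
    print(r)

# A quintic with Galois group S_5: by Ruffini-Abel-Galois there is no
# radical formula, and sympy can only return abstract CRootOf objects.
print(sp.solve(x**5 - x + 1, x))
```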
A prototypical TCS paper
Interesting problem
Efficient Algorithm (e.g. MAX-FLOW in P)
Hardness Reduction (e.g. MAX-CUT NP-hard)
Can we make algorithms boring? Can we reduce creativity in algorithm design?
Can we characterize the “easy” problems?
Characterizing easy problems
Goal: A single simple algorithm that efficiently solves every problem that can be efficiently solved.
Trivially true: an algorithm that enumerates all Turing machines. Trivially false: analyzing that algorithm would amount to resolving P vs NP.
Revised Goal: A single simple algorithm that is
conjectured to be optimal in some interesting domain
of problems.
Byproducts: New algorithms, theory of computational knowledge.
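The “trivially true” answer can even be written down concretely, as Levin-style universal search: dovetail over all programs, giving program $i$ roughly a $2^{-i}$ share of the time, and check every candidate answer with an efficient verifier. A toy sketch — the interface here is hypothetical (programs are hand-picked generators), whereas real universal search enumerates all Turing machines:

```python
# Toy sketch of Levin-style universal search: dovetail over "programs"
# so that if program number i finds a solution in T steps, the search
# returns a verified solution within ~2^i * T total steps.
def universal_search(programs, instance, verify):
    running = []
    for phase, prog in enumerate(programs):
        running.append(prog(instance))         # admit one new program
        for i, proc in enumerate(running):     # older programs run longer:
            for _ in range(2 ** (phase - i)):  # program i gets a ~2^-i share
                candidate = next(proc, None)
                if candidate is not None and verify(instance, candidate):
                    return candidate           # answers are always checked

# Illustrative "programs": program j enumerates assignments from offset j.
def make_prog(j):
    def prog(inst):
        clauses, n = inst
        for a in range(j, 2 ** n):
            yield [(a >> b) & 1 for b in range(n)]
    return prog

def sat_verify(inst, assignment):              # the efficient NP verifier
    clauses, _ = inst
    return all(any(assignment[v] == s for v, s in c) for c in clauses)

clauses = [[(0, 1), (1, 0)], [(1, 1), (2, 1)]]  # (x0 or not x1) and (x1 or x2)
print(universal_search((make_prog(j) for j in range(8)),
                       (clauses, 3), sat_verify))
```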
Domain: Combinatorial Optimization*
Maximize/minimize an objective subject to constraints. Examples: Satisfiability, graph partitioning and coloring, Traveling Salesperson, Matching, ...
Characteristics:
• Natural notions of approximation and noise.
• No/little algebraic structure.
• “Good characterization” phenomena.
• Threshold behavior: either very easy or very hard (e.g. 2SAT vs 3SAT, random kSAT).
• Same algorithmic ideas and themes keep recurring.
Hope: Make this formal for some subclass of optimization.
Non-Examples: Integer factoring, Determinant.
Theme: Convexity
Convexity in optimization: Interesting Problem → (creativity!!) → Convex Problem → General Solver
Example: can embed the problem in an LP or an SDP.
Sum of Squares Algorithm [Shor’87, Parrilo’00, Lasserre’01]: universal embedding of any* optimization problem into an $n^{O(d)}$-dimensional convex set.
• Both “quality” of embedding and running time grow with $d$.
• $d = n$: optimal solution, exponential time.
• Encapsulates many natural algorithms. Optimal among a natural class [Lee-Raghavendra-Steurer’15].
Algorithmic version of works related to Hilbert’s 17th problem [Artin’27, Krivine’64, Stengle’74].
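A concrete instance of the pipeline: MAX-CUT on a 5-cycle, embedded into a convex (PSD) set and handed to a general solver. This is the degree-2 level of the SOS hierarchy, better known as the Goemans-Williamson relaxation. A minimal sketch assuming the cvxpy package (any off-the-shelf SDP solver would do):

```python
# "Interesting problem -> convex problem -> general solver":
# the degree-2 SOS / Goemans-Williamson SDP relaxation of MAX-CUT.
# Assumes cvxpy with an SDP-capable solver (e.g. the bundled SCS).
import cvxpy as cp
import numpy as np

# Toy graph: the 5-cycle, as an adjacency matrix.
n = 5
A = np.zeros((n, n))
for i in range(n):
    A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1

# Convex embedding: X relaxes the moment matrix of x in {-1,+1}^n.
X = cp.Variable((n, n), PSD=True)
objective = cp.Maximize(cp.sum(cp.multiply(A, 1 - X)) / 4)
constraints = [cp.diag(X) == 1]          # x_i^2 = 1
cp.Problem(objective, constraints).solve()

# ~4.52 for C_5; the true MAX-CUT optimum is 4.
print("SDP upper bound on MAX-CUT:", objective.value)
```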
Talk Plan
• Dubious historical analogy.
• Philosophize about automating algorithms.
• Wave hands about convexity and the Sum of Squares algorithm.
• Sudden shift to Bayesianism vs Frequentism.
• Non-results on the planted clique problem.
Frequentists vs Bayesians
“There is a 10% chance that the $10^{100}$-th digit of $\pi$ is 7.”
“Nonsense! The digit is either 7 or it isn’t.”
“I will take a $9{:}1$ bet on this.”
Computational version: $G$ = graph with an (unknown) maximum clique $S$ of size $k$.
What’s the probability that vertex $i$ is in $S$?
Information theoretically: either 0 or 1.
For a computationally bounded observer: may be strictly in between, e.g. $\approx k/n$.
Making this formal
Classical Bayesian uncertainty: a posterior distribution $\mu$ over solutions.
$G$ = graph with an (unknown) maximum clique of size $k$.
$\mu : \{0,1\}^n \to \mathbb{R},\quad \forall x,\ \mu(x) \ge 0,\quad \sum_x \mu(x) = 1$
$\mu$ consistent with observations: every $x$ in the support of $\mu$ is (the indicator of) a clique in $G$.
Computational relaxation — a degree-$d$ pseudo-distribution drops $\mu(x) \ge 0$ and requires only:
$\tilde{\mathbb{E}}_\mu\, p^2 \ge 0 \quad \forall p,\ \deg(p) \le d/2$
Theorem: $\mathrm{SOS}_d(G) = \max_{\mu:\ d\text{-p.dist}} \tilde{\mathbb{E}}_\mu \sum x_i$
Corollary: $\mathrm{SOS}_d(G)$ is computable in time $n^{O(d)}$ — the pseudo-distributions form a convex set, defined by linear equations + a PSD constraint.
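At degree 2 this is very concrete: a pseudo-expectation is just a moment matrix indexed by the monomials $1, x_1, \dots, x_n$, and $\tilde{\mathbb{E}}_\mu p^2 \ge 0$ for all linear $p$ is exactly positive semidefiniteness of that matrix. A numpy sketch — the moments below are those of an actual distribution (uniform over $k$-subsets of vertices), so the check passes by construction; a pseudo-distribution only has to pass checks like this one:

```python
# A degree-2 pseudo-expectation on {0,1}^n is a moment matrix M indexed
# by monomials {1, x_1, ..., x_n}:
#   M[0,0] = E[1], M[0,i] = E[x_i], M[i,j] = E[x_i x_j].
# "E_mu p^2 >= 0 for all deg(p) <= 1"  <=>  M is positive semidefinite.
import numpy as np

n, k = 6, 3
M = np.empty((n + 1, n + 1))
M[0, 0] = 1.0                              # E[1] = 1 (normalization)
M[0, 1:] = M[1:, 0] = k / n                # E[x_i] = k/n
M[1:, 1:] = (k / n) * (k - 1) / (n - 1)    # E[x_i x_j] for i != j
np.fill_diagonal(M[1:, 1:], k / n)         # x_i^2 = x_i on {0,1}^n

eigs = np.linalg.eigvalsh(M)
print("valid at degree 2:", eigs.min() >= -1e-9)
```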
A General Perspective:
For every sound but incomplete proof system $P$:
$\mu$ is a $P$-pseudo-distribution consistent with observations if,
for every function $f$ and number $c$:
if $P \vdash$ “observations $\Rightarrow f \ge c$” then $\tilde{\mathbb{E}}_\mu f \ge c$.
$P$ incomplete $\Rightarrow$ $\mu$ might not be an actual distribution.
Computational analog to Bayesian probabilities.
Algorithms : Proof Systems
Frequentist : Bayesian
Pseudorandom : Pseudo-distribution
Planted Clique Problem
Distinguish between $G(n,\tfrac12)$ and $G(n,\tfrac12)$ plus a planted $k$-clique
[Karp’76, Kucera’95]
Theorem [Lovász’79, Juhász’82]: the $\vartheta$ function distinguishes in polynomial time when $k \gg \sqrt{n}$.
No known poly time algorithm does better than $k \approx \sqrt{n}$.
Theorem [Feige-Krauthgamer’02]: $d$ rounds of the LS+ hierarchy succeed roughly iff $k \gtrsim \sqrt{n/2^d}$.
Central problem in average-case complexity. Related to problems in statistics, sparse recovery, finding equilibria, … [Hazan-Krauthgamer’09, Koiran-Zouzias’12, Berthet-Rigollet’12]
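Where the $\sqrt{n}$ threshold comes from, in code: the top eigenvalue of the $\pm1$ adjacency matrix of $G(n,\tfrac12)$ concentrates around $2\sqrt{n}$, while a planted $k$-clique contributes an eigenvalue $\approx k$, so a simple spectral test separates the two once $k \gg \sqrt{n}$. A numpy sketch with illustrative sizes:

```python
# Spectral test for planted clique: top eigenvalue ~2*sqrt(n) under the
# null, ~k once a k-clique is planted, so it works when k >> sqrt(n).
import numpy as np

rng = np.random.default_rng(0)

def signed_adjacency(n, clique=()):
    A = np.sign(rng.standard_normal((n, n)))   # i.i.d. +/-1 entries
    A = np.triu(A, 1)
    A = A + A.T                                # symmetric, zero diagonal
    idx = np.asarray(clique, dtype=int)
    if idx.size:
        A[np.ix_(idx, idx)] = 1                # plant the clique
        A[idx, idx] = 0
    return A

n, k = 2000, 150                               # k well above 2*sqrt(n) ~ 89
lam_null = np.linalg.eigvalsh(signed_adjacency(n))[-1]
lam_plant = np.linalg.eigvalsh(signed_adjacency(n, range(k)))[-1]
print(f"top eigenvalue: null ~ {lam_null:.0f}, planted ~ {lam_plant:.0f}")
```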
Can SOS do better?
“Theorem” [Meka-Wigderson’13]: w.h.p. over $G \sim G(n,\tfrac12)$, degree-$d$ SOS cannot detect planted cliques of size up to $k \approx \sqrt{n}$.
“Proof”: Let $k \approx \sqrt{n}$ and define $\mu$ of “maximal ignorance”:
$\tilde{\mathbb{E}}_\mu \prod_{i \in T} x_i \cong (k/n)^{|T|}$ if $T$ is a clique, $0$ otherwise.
(Same pseudo-distribution as used for LS+ by Feige-Krauthgamer.)
Bug [Pisier]: the concentration bound used in the proof is false.
In fact, for $k \gg n^{1/3}$ there is a degree-2 polynomial $p$ with $\tilde{\mathbb{E}}_\mu p^2 < 0$ [Kelner].
Moments are OK for $k \lesssim n^{1/3}$ [Meka-Potechin-Wigderson’15, Desphande-Montanari’15, Hopkins-Kothari-Potechin’15].
$\mu$ would be a valid p-dist assuming the higher-degree matrix-valued Chernoff bound — the bound that turned out to be false.
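To make the disputed object concrete, here is a numpy sketch that builds the degree-4 moment matrix $M[S,T] = \tilde{\mathbb{E}}_\mu[x_{S \cup T}]$ of the “maximal ignorance” $\mu$ on a small random graph. Validity at degree 4 is exactly $M \succeq 0$, and Kelner’s argument says the minimum eigenvalue goes negative once $k \gg n^{1/3}$; the sizes here are illustrative, so treat the printed value as a demo rather than a proof:

```python
# Degree-4 moment matrix of the Meka-Wigderson "maximal ignorance"
# pseudo-distribution: E_mu[prod_{i in T} x_i] = (k/n)^|T| if T is a
# clique in G, else 0. Rows/columns are vertex subsets of size <= 2.
import itertools
import numpy as np

rng = np.random.default_rng(1)
n, k = 30, 10                                  # k well above n^(1/3) ~ 3.1
A = rng.random((n, n)) < 0.5
A = np.triu(A, 1)
A = A | A.T                                    # G(n, 1/2)

def is_clique(S):
    return all(A[i, j] for i, j in itertools.combinations(sorted(S), 2))

def pseudo_moment(S):                          # uses x_i^2 = x_i via set union
    return (k / n) ** len(S) if is_clique(S) else 0.0

subsets = ([frozenset()] + [frozenset([i]) for i in range(n)]
           + [frozenset(e) for e in itertools.combinations(range(n), 2)])
M = np.array([[pseudo_moment(S | T) for T in subsets] for S in subsets])

print("min eigenvalue of degree-4 moment matrix:", np.linalg.eigvalsh(M)[0])
```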
MW’s “moral” error
Pseudo-distributions should be as simple as possible but not simpler.
Following A. Einstein.
Pseudo-distributions should have maximum entropy but respect the data.
MW violated Bayesian reasoning:
According to MW: $\Pr_\mu[i \in S] = k/n$ for every vertex $i$, regardless of its degree??
But the degrees carry information:
$i \in S \Rightarrow \deg(i) \sim N\!\left(\tfrac{n}{2} + k,\ \tfrac{n}{2}\right)$
$i \notin S \Rightarrow \deg(i) \sim N\!\left(\tfrac{n}{2},\ \tfrac{n}{2}\right)$
By Bayesian reasoning: $\mu$ should be reweighed by the likelihood ratio, so $\Pr_\mu[i \in S]$ should grow with $\deg(i)$.
Thm: Bayesian (degree-aware) moments get $k \approx \sqrt{n}/\mathrm{polylog}(n)$ at degree 4 [Hopkins-Kothari-Potechin-Raghavendra-Schramm’16].
Pseudo-distributions should have maximum entropy but respect the data.
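The reweighting MW missed is ordinary Bayes’ rule on the degrees. A numpy sketch following the slide’s normal approximations (parameters illustrative): conditioning on $\deg(i)$ moves the membership probability well away from the flat $k/n$.

```python
# Bayes' rule on a vertex's degree, using the two normal approximations
# N(n/2 + k, n/2) and N(n/2, n/2) from the slide and the flat prior k/n.
import numpy as np

def posterior_in_clique(deg, n, k):
    prior = k / n
    var = n / 2
    ll_in = np.exp(-(deg - (n / 2 + k)) ** 2 / (2 * var))   # i in S
    ll_out = np.exp(-(deg - n / 2) ** 2 / (2 * var))        # i not in S
    return prior * ll_in / (prior * ll_in + (1 - prior) * ll_out)

n, k = 10_000, 200
for deg in (n // 2, n // 2 + k // 2, n // 2 + k):
    print(f"deg = {deg}: Pr[i in S | deg] ~ {posterior_in_clique(deg, n, k):.3f}")
# Rises from ~0.000 at an average degree to ~0.5 at degree n/2 + k.
```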
Why is MW’s error interesting?
• Shows SoS captures Bayesian reasoning in a way that other algorithms do not.
• Suggests a new way to define what a computationally bounded observer knows about some quantity…
• …and a more principled way to design algorithms based on such knowledge (see [B-Kelner-Steurer’14,’15]).
Even if SoS is not the optimal algorithm we’re looking for, the dream of a more general theory of hardness, easiness and knowledge is worth pursuing.
Thanks!!