Optimization, Learnability, and Games:From the Lens of Smoothed Analysis
Shang-Hua TengComputer Science@Viterbi School of Engineering@USC
Joint work with Daniel Spielman (Yale), Heiko Röglin (Maastricht University), Adam Kalai (Microsoft New England Lab), Alex Samorodnitsky (Hebrew University), Xi Chen (USC), Xiaotie Deng (City University of Hong Kong)
This Talk
• Part I: Overview of Smoothed Analysis
• Part II: Multiobjective Optimization
• Part III: Machine Learning
• Part IV: Games, Markets and Equilibrium
• Part V: Discussions
Practical Performance of Algorithms

“While theoretical work on models of computation and methods for analyzing algorithms has had enormous payoff, we are not done. In many situations, simple algorithms do well. Take for example the Simplex algorithm for linear programming, or the success of simulated annealing on certain supposedly intractable problems. We don't understand why! It is apparent that worst-case analysis does not provide useful insights on the performance of algorithms and heuristics and our models of computation need to be further developed and refined. Theoreticians are investing increasingly in careful experimental work leading to identification of important new questions in the algorithms area. Developing means for predicting the performance of algorithms and heuristics on real data and on real computers is a grand challenge in algorithms.”
-- Challenges for Theory of Computing: Report for an NSF-Sponsored Workshop on Research in Theoretical Computer Science (Condon, Edelsbrunner, Emerson, Fortnow, Haber, Karp, Leivant, Lipton, Lynch, Parberry, Papadimitriou, Rabin, Rosenberg, Royer, Savage,Selman, Smith, Tardos, and Vitter), 1999
Linear Programming & Simplex Method
max cᵀx
s.t. Ax ≤ b

Worst-Case: exponential
Average-Case: polynomial
Widely used in practice
Smoothed Analysis of Simplex Method(Spielman + Teng, 2001)
Theorem: For all A, b, c, the simplex method takes expected time polynomial in m, n, and 1/σ on the perturbed program:

max cᵀx s.t. Ax ≤ b   →   max cᵀx s.t. (A + σG)x ≤ b

G is Gaussian
Smoothed Complexity
Interpolates between worst and average case
Considers neighborhood of every input
If low, all bad inputs are unstable
Data in practice are not arbitrary but could be generated with noise and imprecision
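Concretely, smoothed complexity measures the worst expected running time over Gaussian perturbations of every input; a standard formulation (with T_A the running time of algorithm A):

```latex
\operatorname{Smoothed}_A(n,\sigma)
  \;=\; \max_{\bar{x}\,:\,\|\bar{x}\|\le 1}\;
        \mathbb{E}_{g\sim\mathcal{N}(0,\sigma^{2}I)}
        \bigl[\, T_A(\bar{x}+g) \,\bigr]
```

As σ → 0 this recovers worst-case complexity; as σ grows it approaches average-case complexity over Gaussian inputs, which is the interpolation stated above.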
Optimization: Single Criterion & Multiobjective
min f(x) subject to x ∈ S.
Examples:
• Linear Programming
• Shortest path
• Minimum spanning tree
• TSP
• Set cover
Optimization: Single Criterion & Multiobjective
Real-life logistical problems often involve multiple
objectives
• Travel time, fare, departure time
• Delay, cost, reliability
• Profit and risk
Optimization: Single Criterion & Multiobjective
min f1(x), ..., min fd(x) subject to x ∈ S
There may not be a solution that is simultaneously optimal for all fi
Question: What can we do algorithmically to support a decision maker?
Decision Makers only Choose Pareto-Optimal Solutions
Fact: Every monotone function, e.g., λ₁f₁(x) + ... + λ_d f_d(x) with λᵢ ≥ 0, is optimized by a Pareto-optimal solution.
Computational Problem:Return the Pareto curve (surface, set)
Decision Makers only Choose Pareto-Optimal Solutions
Return the Pareto curve (surface, set)
Central Question: How large is the Pareto set?
A Concrete Model
S : can encode arbitrary combinatorial structure.
Examples: all paths from s to t, all Hamiltonian cycles, all spanning trees, . . .
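To make the question concrete, here is a minimal sketch in Python. The instance is illustrative, not from the talk: S is taken to be all k-element subsets of n items (a stand-in for spanning trees or s-t paths), with two random linear objectives. It computes the Pareto set by brute force and checks the scalarization fact from the previous slide:

```python
import itertools
import random

# Toy instance of the model: S encodes a combinatorial structure
# (here: all k-element subsets of n items), with two linear objectives.
random.seed(0)
n, k = 10, 4
cost = [random.uniform(0, 1) for _ in range(n)]
delay = [random.uniform(0, 1) for _ in range(n)]

solutions = list(itertools.combinations(range(n), k))

def objectives(sol):
    return (sum(cost[i] for i in sol), sum(delay[i] for i in sol))

vals = {s: objectives(s) for s in solutions}

def dominated(s):
    # s is dominated if some t is no worse in both objectives, better in one
    cs, ds = vals[s]
    return any(vals[t][0] <= cs and vals[t][1] <= ds and vals[t] != vals[s]
               for t in solutions)

pareto = [s for s in solutions if not dominated(s)]
print(f"{len(pareto)} Pareto-optimal solutions out of {len(solutions)}")

# The slides' fact: any positive combination lambda1*f1 + lambda2*f2
# is minimized at a Pareto-optimal solution.
best = min(solutions, key=lambda s: 0.3 * vals[s][0] + 0.7 * vals[s][1])
assert best in pareto
```

On smoothed instances like this one (independent random coefficients), the Pareto set is typically far smaller than the 2ⁿ worst case, which is exactly the phenomenon the next slide quantifies.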
How Large can a Pareto Set be?
• Worst Case: Exponential
• In Practice: Usually smaller
– Müller-Hannemann, Weihe (2001): train connections (travel time, fare, number of train changes)
This Talk
• Part I: Overview of Smoothed Analysis
• Part II: Multi-objective Optimization
• Part III: Machine Learning
• Part IV: Games, Markets and Equilibrium
• Part V: Discussions
P.A.C. Learning!?
X = {0,1}ⁿ f: X → {–1,+1}
PAC assumption: target is from a particular concept class
(for example, an AND, e.g. f(x) = “Bank” & “Adam” & “Free”)
Input: training data (xj from D, f(xj)), j ≤ m (noiseless)
NIGERIA BANK VIAGRA ADAM LASER SALE FREE IN f(x)
x1 YES YES YES NO NO YES NO YES SPAM
x2 YES NO NO YES YES YES YES YES LEGIT
x3 NO YES YES YES YES YES YES YES LEGIT
x4 YES YES YES NO NO NO NO YES SPAM
x5 YES YES YES YES YES NO YES YES SPAM
[Valiant84]
P.A.C Learning
Poly-time learning algorithm
– Succeed with prob. ≥ 1 − δ (e.g., 0.99)
– m = # examples = poly(n/ε)
Output: h: X → {–1,+1} with
err(h) = Prx←D[ h(x)≠f(x) ] ≤ ε

OPTIONAL: “Proper” learning: h comes from the same class as the target.
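As a toy instance of this setup, the classic elimination algorithm PAC-learns a conjunction (the AND example above) from noiseless data: start with all variables and drop any variable that is 0 in a positive example. The feature names and the target conjunction below are illustrative, echoing the spam table, not taken from the talk:

```python
import random

random.seed(1)
features = ["NIGERIA", "BANK", "VIAGRA", "ADAM", "LASER", "SALE", "FREE", "IN"]
target = {"BANK", "ADAM", "FREE"}          # the unknown conjunction f

def f(x):
    return all(x[w] for w in target)

# Noiseless training data drawn i.i.d. from a product distribution D
sample = [{w: random.random() < 0.7 for w in features} for _ in range(500)]
data = [(x, f(x)) for x in sample]

# Elimination: keep only variables that are 1 in every positive example
hypothesis = set(features)
for x, label in data:
    if label:
        hypothesis -= {w for w in features if not x[w]}

def h(x):
    return all(x[w] for w in hypothesis)

# h can only err by rejecting positives it has never seen, so its error
# under D shrinks as m = poly(n/eps) grows; on the training set it is 0.
errors = sum(h(x) != f(x) for x, _ in data)
print("hypothesis:", sorted(hypothesis), "training errors:", errors)
```

Note the hypothesis always contains the target's variables, so h is one-sided: this is what makes the sample-size analysis of conjunctions elementary.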
Agnostic P.A.C. Learning!?
X = {0,1}ⁿ f: X → {–1,+1}
No PAC assumption: the target need not come from any particular concept class
Input: training data (xj from D, f(xj)) j≤m
Poly-time learning algorithm
– Succeed with prob. ≥ 1 − δ (e.g., 0.99)
– m = # examples = poly(n/ε)
Output: h: X → {–1,+1} with
err(h) = Prx←D[ h(x) ≠ f(x) ] ≤ ε + min_{g from the class} err(g)
[Kearns Schapire Sellie 92]
• Computation is the limiting resource
  – “Easy” ignoring computation
  – YET: Children learn many things computers can’t
  – Worst-case poly-time algorithms?
    • PAC-learn DNF, decision trees, juntas
    • Learning parity with noise
Computational Learning Theory
Some Smoothed Results in Learning(Kalai-Samorodnitsky-Teng)
• PAC learn decision trees over smoothed (constant-bounded) product distributions
• PAC learn DNFs over smoothed (constant-bounded) product distribution
• Agnostically learn decision trees over smoothed (constant-bounded) product distributions
A Formal Statement of the First Result
For μ ∈ [0,1]ⁿ, let πμ be the product distribution on {0,1}ⁿ whose i-th coordinate is a Boolean variable with mean μᵢ.

Theorem 1:
Concept Function: a decision tree f: {0,1}ⁿ → {–1,+1} of size s
Distribution: πμ defined by μ ∈ ν + [–.01, .01]ⁿ, where ν ∈ [.02, .98]ⁿ
Data: m = poly(ns/ε) training examples (xj, f(xj)), j ≤ m, with the xj i.i.d. from πμ
Learning Algorithm: a polynomial-time algorithm
Output: a function h
Quality: Prx←πμ[ sgn(h(x)) ≠ f(x) ] ≤ ε.
Fourier over Product Distributions

• x ∈ {0,1}ⁿ, μ ∈ [0,1]ⁿ, with μᵢ = E[xᵢ]
• Normalized coordinates: zᵢ = (xᵢ − μᵢ) / √(μᵢ(1 − μᵢ))
• For S ⊆ [n]: χ_S(x) = ∏_{i∈S} zᵢ (also called χ_S(x, μ))
• For any f: f(x) = Σ_S f̂(S) χ_S(x), where f̂(S) = E[f(x) χ_S(x)]
• Parseval: Σ_S f̂(S)² = E[f(x)²] = 1 for f: {0,1}ⁿ → {–1,+1}
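These identities are easy to check numerically; the sketch below (a sanity check, not part of the talk) takes a random μ, brute-forces expectations over all of {0,1}ⁿ, and verifies both orthonormality of the χ_S basis and Parseval for a random Boolean f:

```python
import itertools
import random

random.seed(0)
n = 4
mu = [random.uniform(0.2, 0.8) for _ in range(n)]

def prob(x):
    # product-distribution mass of the point x
    p = 1.0
    for xi, mi in zip(x, mu):
        p *= mi if xi else (1 - mi)
    return p

def chi(S, x):
    # chi_S(x) = prod_{i in S} (x_i - mu_i)/sqrt(mu_i(1-mu_i))
    v = 1.0
    for i in S:
        v *= (x[i] - mu[i]) / (mu[i] * (1 - mu[i])) ** 0.5
    return v

points = list(itertools.product([0, 1], repeat=n))
sets = [S for r in range(n + 1) for S in itertools.combinations(range(n), r)]

# Orthonormality: E[chi_S * chi_T] = 1 if S == T, else 0
for S in sets:
    for T in sets:
        e = sum(prob(x) * chi(S, x) * chi(T, x) for x in points)
        assert abs(e - (1.0 if S == T else 0.0)) < 1e-9

# Parseval for a Boolean f: sum_S fhat(S)^2 = E[f(x)^2] = 1
f = {x: random.choice([-1.0, 1.0]) for x in points}
fhat = {S: sum(prob(x) * f[x] * chi(S, x) for x in points) for S in sets}
assert abs(sum(c * c for c in fhat.values()) - 1.0) < 1e-9
print("orthonormality and Parseval verified")
```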
Non-Concentration Bound on Fourier Structures

For any f: {0,1}ⁿ → {–1,1}, α > 0, and d ≥ 1,

Pr_{μ←[.49,.51]ⁿ}[ ∃ S ≠ T with |T| ≤ d s.t. |f̂(S) − f̂(T)| ≤ α ] ≤ 200 · nᵈ · α^{1/(2d)}

Continuous generalization of the Schwartz–Zippel theorem: Let p: Rⁿ → R be a degree-d multilinear polynomial with a leading coefficient of 1. Then, for any ε > 0,

Pr_{x←[−1,1]ⁿ}[ |p(x)| ≤ ε ] ≤ 2d · ε^{1/d}

e.g., p(x) = x₁x₂x₉ + .3x₇ − 0.2
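A quick Monte Carlo experiment on the slide's example polynomial illustrates the non-concentration phenomenon. The threshold 2d·ε^{1/d} used below is one plausible reading of the slide's bound, taken here only as a sanity ceiling; the trial count is illustrative:

```python
import random

random.seed(0)
trials, eps = 200_000, 0.01

def p(x):
    # p(x) = x1*x2*x9 + .3*x7 - 0.2, a degree-3 multilinear polynomial
    # with a leading coefficient of 1
    return x[0] * x[1] * x[8] + 0.3 * x[6] - 0.2

hits = 0
for _ in range(trials):
    x = [random.uniform(-1, 1) for _ in range(9)]
    if abs(p(x)) <= eps:
        hits += 1

empirical = hits / trials
bound = 2 * 3 * eps ** (1 / 3)   # 2*d*eps^(1/d) with d = 3
print(f"Pr[|p(x)| <= {eps}] ~ {empirical:.4f} (ceiling {bound:.3f})")
assert empirical <= bound
```

The empirical probability of landing within ε of zero is tiny, consistent with the claim that such polynomials do not concentrate near any single value.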
Some Related Work
• Decision Trees:
  – P.A.C. + Membership Queries + Uniform Distribution [Kushilevitz-Mansour’91; Goldreich-Levin’89; Bshouty’94]
  – Agnostic + Membership Queries + Uniform D [Gopalan-Kalai-Klivans’08]
• DNF: P.A.C. + Membership Queries + Uniform D [Jackson’94]
Some Smoothed Results in Learning(Kalai-Samorodnitsky-Teng)
• PAC learn decision trees over smoothed (constant-bounded) product distributions
• PAC learn DNFs over smoothed (constant-bounded) product distribution
• Agnostically learn decision trees over smoothed (constant-bounded) product distributions
Multi-Objective Optimization

President: U_USA(x_USA, x_CA, x_MA, …), U_CA(x_USA, x_CA, x_MA, …), U_MA(x_USA, x_CA, x_MA, …)

Pareto optimum [Approximation]

Multi-Player Games

President: U_USA(x_USA, x_CA, x_MA, …)
Governor of CA: U_CA(x_USA, x_CA, x_MA, …)
Governor of MA: U_MA(x_USA, x_CA, x_MA, …)

Best response → Nash equilibrium
“Is the smoothed complexity of another classic algorithm, the Lemke-Howson algorithm for two-player games, polynomial?”
BIMATRIX Games

A zero-sum example (Rock-Paper-Scissors payoffs, written as (row, column)):

(0, 0)    (1, –1)   (–1, 1)
(–1, 1)   (0, 0)    (1, –1)
(1, –1)   (–1, 1)   (0, 0)
Mixed Strategies
Mixed equilibrium always exists [Nash, 1950]
Search Problem: Find an equilibrium
Nash Equilibria in Two-Player Games
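For a concrete check, take Rock-Paper-Scissors as the bimatrix game (a standard zero-sum example): the uniform mixed strategy is an equilibrium, which the sketch below verifies by comparing every pure deviation against the equilibrium value:

```python
from fractions import Fraction

# Row player's payoffs for Rock-Paper-Scissors; the column player's
# payoffs are the negation (zero-sum).
A = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]
B = [[-a for a in row] for row in A]

third = Fraction(1, 3)
x = [third, third, third]   # row player's mixed strategy
y = [third, third, third]   # column player's mixed strategy

# Expected payoff of each pure strategy against the opponent's mix
row_payoffs = [sum(A[i][j] * y[j] for j in range(3)) for i in range(3)]
col_payoffs = [sum(x[i] * B[i][j] for i in range(3)) for j in range(3)]

# Nash condition: no pure deviation beats the mixed strategy's value
value_row = sum(x[i] * row_payoffs[i] for i in range(3))
value_col = sum(col_payoffs[j] * y[j] for j in range(3))
assert all(p <= value_row for p in row_payoffs)
assert all(p <= value_col for p in col_payoffs)
print("uniform play is a Nash equilibrium; row value =", value_row)
```

Exact rational arithmetic (`Fraction`) avoids any floating-point question in the equilibrium check; the hard algorithmic problem on the slides is finding such an equilibrium in a general bimatrix game.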
Arrow-Debreu Equilibrium Price
A price vector at which the market clears
Distributed Exchange
• Every Trader:
  – Sells her initial endowment to the “market” (to get a budget)
  – Buys from the “market” to optimize her individual utility
• Market Clearing Price
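A minimal numerical sketch of a market-clearing price, under assumptions not from the talk (two traders, two goods, Cobb-Douglas utilities): each trader sells her endowment at prices (p, 1) and spends a fixed fraction of the budget on good 1, and bisection finds the price at which demand meets supply:

```python
# Illustrative exchange economy: trader i has Cobb-Douglas utility with
# exponent a[i] on good 1, so she spends fraction a[i] of her budget on it.
a = [0.7, 0.3]
endow = [(1.0, 0.0), (0.0, 1.0)]    # initial endowments (good1, good2)

def excess_demand_good1(p):
    # total demand for good 1 at prices (p, 1), minus total supply
    z = 0.0
    for ai, (e1, e2) in zip(a, endow):
        budget = p * e1 + e2        # revenue from selling the endowment
        z += ai * budget / p        # Cobb-Douglas demand for good 1
    return z - sum(e1 for e1, _ in endow)

# Excess demand is decreasing in p, so bisection finds the clearing price.
lo, hi = 1e-6, 1e6
for _ in range(200):
    mid = (lo + hi) / 2
    if excess_demand_good1(mid) > 0:
        lo = mid                    # price too low: demand exceeds supply
    else:
        hi = mid
p = (lo + hi) / 2
print(f"clearing price of good 1: {p:.4f}")
assert abs(excess_demand_good1(p)) < 1e-6
```

By Walras' law, clearing good 1 also clears good 2, so a single price ratio determines the Arrow-Debreu equilibrium here; the hardness results on the next slides concern far less tractable utility families such as Leontief.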
Complexity of Nash Equilibria
[Daskalakis-Goldberg-Papadimitriou, 2005]
• For any constant k ≥ 3, k-player NASH is PPAD-hard.

[Chen-Deng, 2005]
• 2-player NASH is PPAD-complete.

[Chen-Deng-Teng, 2006]
• If PPAD is not in P, then 2-player NASH does not have a fully polynomial-time approximation scheme.
Smoothed Complexity of Equilibria
[Chen-Deng-Teng, 2006]
• NO Smoothed Polynomial-Time Complexity for Lemke-Howson or any BIMATRIX algorithm, unless computation of game and market equilibria and Brouwer fixed points is in randomized P!
[Huang-Teng, 2006]
• Computation of Arrow-Debreu equilibria in Leontief Exchange Economies is not in Smoothed P, unless …
Tale of Two Types of Equilibria

Local Search (Potential Games)
• Linear Programming – P
• Simplex Method – Smoothed P
• PLS – FPTAS
• Intuitive

Fixed-Point Computation (Matrix Games)
• 2-Player Nash equilibrium – Unknown
• Lemke-Howson Algorithm – If in P, then NASH in RP
• PPAD – If FPTAS, then NASH in RP
• Intuitive to some
Random Separation of Local Search and Fixed Point Computation
Aldous (1983): • Randomization helps local search
Chen & Teng (2007):
• Randomization doesn’t help Fixed-Point Computation!!!
… in the black-box query model
Open Questions
• How hard is PPAD?
• Non-concentration of multi-linear polynomials
• Optimal smoothed bound for Pareto Sets