On Agnostic Boosting and Parity Learning

Adam Tauman Kalai, Georgia Tech
Yishay Mansour, Google and Tel-Aviv
Elad Verbin, Tsinghua
Defs
• Agnostic Learning = learning with adversarial noise
• Boosting = turn a weak learner into a strong learner
• Parities = parities of subsets of the bits
• f:{0,1}^n → {0,1}, e.g. f(x) = x1 ⊕ x3 ⊕ x7
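For concreteness, a parity over a subset S of the bits is just the XOR of those coordinates; a minimal Python sketch (the names here are ours, for illustration only):

```python
from functools import reduce

def parity(x, subset):
    """XOR of the bits of x indexed by `subset` (0-indexed)."""
    return reduce(lambda acc, i: acc ^ x[i], subset, 0)

# The slide's example f(x) = x1 XOR x3 XOR x7 (1-indexed), i.e. indices 0, 2, 6:
x = [1, 0, 1, 1, 0, 0, 1, 0]
print(parity(x, [0, 2, 6]))  # 1 ^ 1 ^ 1 = 1
```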
Outline
1. Agnostic Boosting: turning a weak agnostic learner into a strong agnostic learner
2. A 2^O(n/log n)-time algorithm for agnostically learning parities over any distribution
Agnostic boosting
• Weak learner: for any noise rate < ½, produces a better-than-trivial hypothesis
• Strong learner: produces an almost-optimal hypothesis; runs the weak learner as a black box
Learning with Noise
• Learning without noise: well understood*
• Learning with random noise: well understood* (SQ model)
• Learning with agnostic noise: it's, like, a really hard model!!!
* up to well-studied open problems (i.e. we know where we're stuck)
Agnostic Learning: some known results

Class          | Reference                              | Distribution       | Notes
Halfspaces     | [Kalai, Klivans, Mansour, Servedio]    | uniform, log-concave |
Parities       | [Feldman, Gopalan, Khot, Ponnuswami]   | uniform            | 2^O(n/log n)
Decision Trees | [Gopalan, Kalai, Klivans]              | uniform            | with MQ
Disjunctions   | [Kalai, Klivans, Mansour, Servedio]    | all distributions  | 2^O(√n)
???            |                                        | all distributions  |
Due to hardness, or lack of tools??? Agnostic boosting is a strong tool that makes it easier to design algorithms.
Why care about agnostic learning?
• More relevant in practice
• Impossibility results might be useful for building cryptosystems
• Non-noisy learning ≈ CSP; Agnostic Learning ≈ MAX-CSP
Noisy learning
• No noise: f:{0,1}^n → {0,1} from class F; the algorithm gets samples <x, f(x)> where x is drawn from distribution D, and should approximate f up to error ε.
• Random noise: each label is flipped with probability η (η% noise); the algorithm should still approximate f up to error ε.
• Adversarial (≈ agnostic) noise: an adversary is allowed to corrupt an η-fraction of the labels, yielding a function g; the algorithm should approximate g up to error opt + ε.
Agnostic learning (geometric view)
[Diagram: the class F as a set of points; the target g lies at distance opt from the nearest f in F; a ball of radius opt + ε is drawn around g.]
• Parameters: the class F, and a metric
• Input: an oracle for g
• Goal: return some element of the blue ball (radius opt + ε around g)
• PROPER LEARNING: return an element of F
Agnostic boosting (definition)
• Weak learner: given distribution D and samples from g with opt ≤ ½ − γ, w.h.p. outputs h with err_D(g,h) ≤ ½ − γ.
• Agnostic Booster: given samples from g under D, runs the weak learner poly(1/ε) times and w.h.p. outputs h' with err_D(g,h') ≤ opt + ε.
• More generally, an (α,γ)-weak learner: whenever opt ≤ ½ − α, it w.h.p. outputs h with err_D(g,h) ≤ ½ − γ. Boosting such a learner yields, w.h.p., h' with err_D(g,h') ≤ opt + α + ε, using poly(1/ε, 1/γ) calls to the weak learner.
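As a type-level sketch of the interface these slides describe (all names here are hypothetical, in Python):

```python
from typing import Callable, List, Tuple

# A labeled example drawn from the (possibly corrupted) target g under D.
Sample = Tuple[Tuple[int, ...], int]
Hypothesis = Callable[[Tuple[int, ...]], int]

# An (alpha, gamma)-weak agnostic learner: given weighted samples, returns h
# with err_D(g,h) <= 1/2 - gamma whenever opt <= 1/2 - alpha (a promise the
# booster exploits but cannot check).
WeakLearner = Callable[[List[Sample], List[float]], Hypothesis]

def empirical_error(h: Hypothesis, samples: List[Sample]) -> float:
    """Fraction of samples on which h disagrees with the observed label."""
    return sum(h(x) != y for x, y in samples) / len(samples)
```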
"Approximation Booster"
The same picture, viewed as approximation: a weak learner that for any noise rate < ½ produces a better-than-trivial hypothesis is boosted into a strong learner that produces an almost-optimal hypothesis.
Analogy
A poly-time MAX-3-SAT algorithm that, whenever opt = 7/8 + ε, produces a solution of value 7/8 + ε^100, can be boosted into an algorithm for MAX-3-SAT that produces a solution of value opt − ε, with running time poly(n, 1/ε).
Gap
[Diagram: the error axis from 0 to 1, with ½ marked.]
No hardness gap close to ½; with the booster, there is no gap anywhere (an additive PTAS).
Agnostic boosting
• New analysis of the Mansour-McAllester booster, which uses branching programs whose nodes are weak hypotheses.
• Previous agnostic boosting: Ben-David+Long+Mansour, and Gavinsky, defined agnostic boosting differently; their results cannot be used for our application.
Booster
[Diagram: the root is a weak hypothesis h1; the edge h1(x)=1 leads to leaf 1 and the edge h1(x)=0 leads to leaf 0.]

Booster: Split step
[Diagram: a leaf is replaced by a new weak hypothesis h2, with edges h2(x)=1 and h2(x)=0 to fresh leaves. Each node is trained on a different distribution: the examples that reach it. Two candidates h2 and h2' may be trained on different distributions, and the booster chooses the "better" option. Further leaves are split with h3, h4, ...]

Booster: Merge step
[Diagram: two nodes are merged if they are "similar", keeping the program small.]

Booster: Another split step
[Diagram: after a merge, nodes are split again with new weak hypotheses (h5, ...); split and merge steps alternate.]

Booster: final result
[Diagram: a branching program whose internal nodes are weak hypotheses and whose two sinks are labeled 0 and 1.]
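A much-simplified sketch of the split step in Python: each internal node is a weak hypothesis trained on the examples that reach it, and leaves predict the majority label there. The merge step (which keeps the branching program's width small) is omitted, and the toy weak learner below is ours, not the paper's:

```python
from typing import Callable, List, Tuple

Sample = Tuple[Tuple[int, ...], int]
Hypothesis = Callable[[Tuple[int, ...]], int]

def single_bit_learner(samples: List[Sample]) -> Hypothesis:
    """Toy weak learner: the single input bit (or its negation) that best
    matches the labels on these samples."""
    n = len(samples[0][0])
    i, neg = max(
        ((i, neg) for i in range(n) for neg in (0, 1)),
        key=lambda p: sum((x[p[0]] ^ p[1]) == y for x, y in samples),
    )
    return lambda x: x[i] ^ neg

def build_program(samples: List[Sample], depth: int) -> Hypothesis:
    """Split steps only: train a weak hypothesis on the examples reaching a
    node, split them along it, and recurse; leaves output the majority."""
    majority = int(2 * sum(y for _, y in samples) >= len(samples))
    if depth == 0 or len({y for _, y in samples}) == 1:
        return lambda x: majority
    h = single_bit_learner(samples)
    left = [s for s in samples if h(s[0]) == 0]
    right = [s for s in samples if h(s[0]) == 1]
    if not left or not right:  # the split made no progress; stop here
        return lambda x: majority
    f0, f1 = build_program(left, depth - 1), build_program(right, depth - 1)
    return lambda x: f1(x) if h(x) == 1 else f0(x)
```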
Agnostically learning parities
Application: Parity with Noise

                  Uniform distribution                         Any distribution
Random noise      2^O(n/log n) [Blum Kalai Wasserman]
Agnostic noise    2^O(n/log n) [Feldman Gopalan Khot           2^O(n/log n), this work*
                  Ponnuswami], via Fourier

* Non-proper learner: the hypothesis is a circuit with 2^O(n/log n) gates. Feldman et al. give a black-box reduction to the random-noise case; we give a direct result.
Theorem: for every ε, there is a weak learner that, for noise rate ½ − ε, produces a hypothesis which is wrong on at most a ½ − (2ε)^(n^0.001)/2 fraction of the space. Running time: 2^O(n/log n).
Corollary: learners for many classes (without noise)
• Any class with a "guaranteed correlated parity" can be learned without noise in time 2^O(n/log n), e.g. DNF; any others?
• A weak parity learner running in time 2^O(n^0.32) would beat the best algorithm known for learning DNF.
• Good evidence that parity with noise is hard: efficient cryptosystems [Hopper-Blum, Blum-Furst et al., and many others].
Idea of weak agnostic parity learner
Main Idea:
1. Take a learner that resists random noise (BKW).
2. Add randomness to its behavior, until you get a weak agnostic learner.
"Between two evils, I pick the one I haven't tried before" – Mae West
"Between two evils, I pick uniformly at random" – CS folklore
Summary
Problem: It is difficult but perhaps possible to design agnostic learning algorithms.
Proposed Solution: Agnostic Boosting.
Contributions:
1. Right(er) definition for weak agnostic learner
2. Agnostic boosting
3. Learning parity with noise in the hardest noise model
4. Entertaining STOC ’08 participants
Open Problems
1. Find other applications for Agnostic Boosting.
2. Improve PwN algorithms: get a proper learner for parity with noise; reduce PwN with agnostic noise to PwN with random noise.
3. Get evidence that PwN is hard: prove that if parity with noise is easy then FACTORING is easy. $128 reward!
May the parity be with you!
The end.
Sketch of weak parity learner
Weak parity learner
• Sample labeled points from the distribution; sample an unlabeled x; we want to guess f(x).
• Bucket the sample vectors according to their last 2n/log n bits; XOR pairs within each bucket to zero out those bits, and pass the sums to the next round.
• LAST ROUND: we are left with √n vectors whose sum (together with x) is 0; XORing their labels gives a guess for f(x).
• By symmetry, the probability of a mistake equals the fraction of mistakes over the space. Claim: the fraction of mistakes is bounded as in the theorem (proved via Cauchy-Schwarz).
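One bucketing round, as a hedged Python toy of the BKW-style step the slides sketch (the block size and helper names are ours, not the paper's):

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Sample = Tuple[Tuple[int, ...], int]  # (vector in {0,1}^n, noisy label)

def bucketing_round(samples: List[Sample], block: int) -> List[Sample]:
    """Group vectors by their last `block` bits, then XOR each vector with a
    pivot from its bucket, zeroing those bits. Labels are XORed too, so each
    output is still a (noisier) parity sample on the remaining bits."""
    buckets: Dict[Tuple[int, ...], List[Sample]] = defaultdict(list)
    for x, y in samples:
        buckets[x[-block:]].append((x, y))
    out: List[Sample] = []
    for group in buckets.values():
        px, py = group[0]  # pivot of this bucket
        for x, y in group[1:]:
            out.append((tuple(a ^ b for a, b in zip(x, px)), y ^ py))
    return out
```

Iterating this with block ≈ 2n/log n for about (log n)/2 rounds clears all coordinates, and each output is then a sum of 2^((log n)/2) = √n original vectors, matching the "√n vectors with sum = 0" above.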
Intuition behind two main parts
Intuition behind Boosting
[Diagram: points the current hypothesis classifies correctly get their weight decreased; misclassified points get their weight increased.]
• Run, reweight, run, reweight, … . Take the majority of the hypotheses.
• An algorithmic & efficient version of the Yao-von Neumann Minimax Principle.
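In code, the run-reweight-majority loop might look like this AdaBoost-flavored sketch (a standard instantiation of the intuition, not this paper's booster; all names are ours):

```python
import math
from typing import Callable, List, Tuple

Sample = Tuple[Tuple[int, ...], int]
Hypothesis = Callable[[Tuple[int, ...]], int]
WeakLearner = Callable[[List[Sample], List[float]], Hypothesis]

def boost(samples: List[Sample], weak: WeakLearner, rounds: int) -> Hypothesis:
    """Run, reweight, run, reweight, ...; output the weighted majority."""
    m = len(samples)
    w = [1.0 / m] * m
    voters: List[Tuple[float, Hypothesis]] = []
    for _ in range(rounds):
        h = weak(samples, w)
        err = sum(wi for wi, (x, y) in zip(w, samples) if h(x) != y)
        err = min(max(err, 1e-9), 0.5 - 1e-9)   # keep the update finite
        alpha = 0.5 * math.log((1 - err) / err) # this round's vote weight
        voters.append((alpha, h))
        # decrease weight on correctly classified points, increase on mistakes
        w = [wi * math.exp(alpha if h(x) != y else -alpha)
             for wi, (x, y) in zip(w, samples)]
        total = sum(w)
        w = [wi / total for wi in w]
    return lambda x: int(sum(a if h(x) == 1 else -a for a, h in voters) >= 0)
```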