
Page 1:

Adam Tauman Kalai, Georgia Tech.

Yishay Mansour, Google and Tel-Aviv

Elad Verbin, Tsinghua

On Agnostic Boosting and Parity Learning

Page 2:

Defs
• Agnostic Learning = learning with adversarial noise
• Boosting = turn a weak learner into a strong learner
• Parities = parities of subsets of the bits

• f : {0,1}^n → {0,1}. f(x) = x1 ⊕ x3 ⊕ x7
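As a concrete illustration of such a parity function, here is a minimal Python sketch (the bit indices {1, 3, 7} are just the example above, not anything from the paper):

```python
def parity(x, subset):
    """Parity (XOR) of the bits of x selected by `subset` (1-based indices, as above)."""
    result = 0
    for i in subset:
        result ^= x[i - 1]
    return result

# f(x) = x1 XOR x3 XOR x7 on a 10-bit input
x = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
print(parity(x, [1, 3, 7]))  # prints 1, since 1 ^ 1 ^ 1 = 1
```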

Outline

1. Agnostic Boosting: turning a weak agnostic learner into a strong agnostic learner
2. A 2^O(n/log n)-time algorithm for agnostically learning parities over any distribution

Page 3:

Agnostic boosting

(Diagram: the Agnostic Booster as a black box.)

Weak learner: for any noise rate < ½, produces a better-than-trivial hypothesis.
Strong learner: produces an almost-optimal hypothesis; runs the weak learner as a black box.

Page 4:

Learning with Noise

Learning without noise: well understood*
Learning with random noise: well understood* (SQ model)
Learning with agnostic noise: it's, like, a really hard model!!!

* up to well-studied open problems (i.e. we know where we're stuck)

Page 5:

Agnostic Learning: some known results

Class                                            | Ground distribution  | Notes
Halfspaces [Kalai, Klivans, Mansour, Servedio]   | uniform, log-concave |
Parities [Feldman, Gopalan, Khot, Ponnuswami]    | uniform              | 2^O(n/log n)
Decision Trees [Gopalan, Kalai, Klivans]         | uniform              | with MQ
Disjunctions [Kalai, Klivans, Mansour, Servedio] | all distributions    | 2^O(√n)
???                                              | all distributions    |

Page 6:

Agnostic Learning: some known results (same table as the previous slide)

Due to hardness, or lack of tools?
Agnostic boosting is a strong tool that makes it easier to design algorithms.

Page 7:

Why care about agnostic learning?

• More relevant in practice
• Impossibility results might be useful for building cryptosystems
• Non-noisy learning ≈ CSP; Agnostic Learning ≈ MAX-CSP

Page 8:

Noisy learning

(Diagram comparing three models. In all of them, f : {0,1}^n → {0,1} is from class F and the algorithm gets samples <x, f(x)> where x is drawn from distribution D.)

• No noise: the learning algorithm should approximate f up to error ε.
• Random noise: each label is flipped with some fixed probability (the noise rate); the algorithm should approximate f up to error ε.
• Adversarial (≈ agnostic) noise: an adversary is allowed to corrupt a fraction of the labels, yielding g; the algorithm should approximate g up to error opt + ε.

Page 9:

Agnostic learning (geometric view)

(Figure: the class F, the target g at distance opt from F, and a ball of radius opt + ε around g. Returning a hypothesis from F itself is PROPER LEARNING.)

Parameters: F, a metric
Input: oracle for g
Goal: return some element of the blue ball
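To make these quantities concrete, a minimal sketch assuming F is an explicit list of hypotheses and D is given as a list of weighted points (the names err_D and opt mirror the slides; nothing here is from the paper's code):

```python
def err_D(h, g, weighted_points):
    """Weighted disagreement between h and the target g over (x, weight) pairs from D."""
    return sum(w for x, w in weighted_points if h(x) != g(x))

def opt(F, g, weighted_points):
    """Distance from g to the class F: the error of the best hypothesis in F."""
    return min(err_D(f, g, weighted_points) for f in F)

# Agnostic learning goal: output h with err_D(h, g, ...) <= opt(F, g, ...) + eps.
# If h must itself come from F, this is proper learning (the "blue ball" intersected with F).
```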

Page 10:

Agnostic boosting: definition

(Diagram: the weak learner receives samples from g under distribution D; whenever opt ≤ ½ − γ, it outputs, w.h.p., a hypothesis h with err_D(g, h) ≤ ½ − α.)

Page 11:

Agnostic boosting

(Diagram: the weak learner, as above. The Agnostic Booster gets samples from g under D, runs the weak learner poly(…) times as a black box, and outputs, w.h.p., a hypothesis h' with err_D(g, h') ≤ opt + ε.)

Page 12:

Agnostic boosting

(Diagram: an (α, γ)-weak learner: whenever opt ≤ ½ − γ, it outputs, w.h.p., h with err_D(g, h) ≤ ½ − α. The Agnostic Booster gets samples from g under D, runs the weak learner poly(…) times, and outputs, w.h.p., h' with err_D(g, h') ≤ opt + … + ε.)
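Purely as an interface sketch of the black-box contract on these slides (the names WeakLearner and boost are hypothetical; the actual construction is the branching-program booster shown on the following slides):

```python
from typing import Callable, List, Tuple

Example = Tuple[tuple, int]            # (x, g(x)) with x drawn from distribution D
Hypothesis = Callable[[tuple], int]

# (alpha, gamma)-weak learner: on any distribution with opt <= 1/2 - gamma,
# it returns, w.h.p., a hypothesis h with err_D(g, h) <= 1/2 - alpha.
WeakLearner = Callable[[List[Example]], Hypothesis]

def boost(weak_learner: WeakLearner,
          sample: Callable[[int], List[Example]],
          eps: float) -> Hypothesis:
    """Agnostic booster contract: call the weak learner polynomially many times on
    reweighted/filtered samples and return h' whose error is close to opt + eps.
    The body is intentionally omitted; see the split/merge sketch later in the deck."""
    raise NotImplementedError
```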

Page 13:

Agnostic boosting

(Diagram: the Agnostic Booster as a black box.)

Weak learner: for any noise rate < ½, produces a better-than-trivial hypothesis.
Strong learner: produces an almost-optimal hypothesis.

Page 14:

"Approximation Booster"

Analogy: a poly-time MAX-3-SAT algorithm that, when opt = 7/8 + ε, produces a solution with value 7/8 + ε^100
⇒ an algorithm for MAX-3-SAT that produces a solution with value within ε of opt, with running time poly(n, …).

Page 15:

Gap

(Figure: error axis from 0 to 1, with ½ marked. No hardness gap close to ½ → booster → no gap anywhere (additive PTAS).)

Page 16:

Agnostic boosting: a new analysis for the Mansour-McAllester booster. It uses branching programs; the nodes are weak hypotheses.

Previous agnostic boosting: Ben-David, Long and Mansour, and Gavinsky, defined agnostic boosting differently; their results cannot be used for our application.

Page 17:

Booster

(Figure: start with a single node h1; on input x, follow the edge h1(x) = 0 or h1(x) = 1 to a leaf labeled 0 or 1.)

Page 18:

Booster: Split step

(Figure: a leaf of the current program is split by running the weak learner on the distribution of examples that reach it; each leaf induces a different distribution. Splitting one edge of h1 gives a new node h2, splitting the other gives h2'; choose the "better" option.)

Page 19:

Booster: Split step (continued)

(Figure: after another split the program has nodes h1, h2, h3, with leaves labeled 0 and 1.)

Page 20:

Booster: Split step (continued)

(Figure: a further split adds node h4.)

Page 21:

Booster: Merge step

(Figure: the program with nodes h1, h2, h3, h4. Merge two nodes if they are "similar".)

Page 22:

Booster: Merge step (continued)

(Figure: the program after merging the "similar" nodes.)

Page 23:

Booster: Another split step

(Figure: a further split adds node h5; splitting and merging continue in this way.)

Page 24:

Booster: final result

(Figure: the final hypothesis is a branching program whose internal nodes are weak hypotheses and whose leaves are labeled 0 and 1.)
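A highly simplified Python sketch of the split/merge idea in the figures above. This is not the Mansour-McAllester booster or the paper's analysis: here the "similar" test is just closeness of empirical label bias, leaves are labeled by majority vote, and weak_learner is any black box returning a {0,1}-valued hypothesis.

```python
def branching_program_booster(examples, weak_learner, rounds=8, merge_tol=0.05):
    """Schematic split/merge booster.  States play the role of branching-program
    nodes; each split installs a weak hypothesis, each merge fuses two states."""

    def bias(exs):
        return sum(y for _, y in exs) / len(exs)

    route = lambda x: 0                    # every x starts at the single root state
    states = {0: list(examples)}           # state id -> examples that currently reach it
    next_id = 1

    for _ in range(rounds):
        # Split step: run the weak learner on the distribution of examples
        # reaching the largest state (a different distribution for each state).
        sid = max(states, key=lambda s: len(states[s]))
        exs = states.pop(sid)
        if len(exs) < 2:
            states[sid] = exs
            break
        h = weak_learner(exs)
        id0, id1 = next_id, next_id + 1
        next_id += 2
        states[id0] = [(x, y) for (x, y) in exs if h(x) == 0]
        states[id1] = [(x, y) for (x, y) in exs if h(x) == 1]
        route = (lambda old, s, h, a, b:
                 lambda x: (a if h(x) == 0 else b) if old(x) == s else old(x)
                 )(route, sid, h, id0, id1)

        # Merge step: fuse one pair of states whose label bias is "similar",
        # keeping the width of the program under control.
        ids = [s for s in states if states[s]]
        done = False
        for i in range(len(ids)):
            for j in range(i + 1, len(ids)):
                a, b = ids[i], ids[j]
                if abs(bias(states[a]) - bias(states[b])) < merge_tol:
                    states[a] += states.pop(b)
                    route = (lambda old, frm, to:
                             lambda x: to if old(x) == frm else old(x))(route, b, a)
                    done = True
                    break
            if done:
                break

    # Final result: label every remaining state (leaf) by majority vote.
    labels = {s: (1 if e and bias(e) >= 0.5 else 0) for s, e in states.items()}
    return lambda x: labels[route(x)]
```

The real booster's guarantee comes from how the splits and merges are chosen and analyzed; this sketch only conveys the data flow of the figures.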

Page 25:

Agnostically learning parities

Page 26:

Application: Parity with Noise

                  | Uniform distribution                                        | Any distribution
Random noise      | 2^O(n/log n) [Blum Kalai Wasserman]                         |
Agnostic learning | 2^O(n/log n) [Feldman Gopalan Khot Ponnuswami], via Fourier | 2^O(n/log n), this work*

* Non-proper learner: the hypothesis is a circuit with 2^O(n/log n) gates. Feldman et al. give a black-box reduction to the random-noise case; we give a direct result.

Theorem: for every ε, there is a weak learner that, for noise rate ½ − ε, produces a hypothesis which is wrong on a ½ − (2ε)^(n^0.001)/2 fraction of the space. Running time 2^O(n/log n).

Page 27:

Corollary: learners for many classes (without noise)

Any class with a "guaranteed correlated parity" can be learned without noise in time 2^O(n/log n), e.g. DNF. Any others?

A weak parity learner running in 2^O(n^0.32) time would beat the best algorithm known for learning DNF.

Good evidence that parity with noise is hard: efficient cryptosystems [Hopper-Blum, Blum-Furst et al., and many others] are based on it.

Page 28:

Idea of weak agnostic parity learner

Main idea:
1. Take a learner which resists random noise (BKW).
2. Add randomness to its behavior, until you get a weak agnostic learner.

"Between two evils, I pick the one I haven't tried before" – Mae West
"Between two evils, I pick uniformly at random" – CS folklore

Page 29:

Summary

Problem: It is difficult but perhaps possible to design agnostic learning algorithms.

Proposed Solution: Agnostic Boosting.

Contributions:

1. Right(er) definition for weak agnostic learner

2. Agnostic boosting

3. Learning parity with noise in the hardest noise model

4. Entertaining STOC ’08 participants

Page 30:

Open Problems

1. Find other applications for Agnostic Boosting

2. Improve PwN algorithms:
   • Get a proper learner for parity with noise
   • Reduce PwN with agnostic noise to PwN with random noise

3. Get evidence that PwN is hard:
   • Prove that if parity with noise is easy then FACTORING is easy. $128 reward!

Page 31:

May the parity be with you!

The end.

Page 32:

Sketch of weak parity learner

Page 33:

Weak parity learner

Sample labeled points from the distribution; sample an unlabeled x; let's guess f(x).

(Figure: bucket the samples according to their last 2n/log n bits; XOR within each bucket to zero those bits out, and pass the results to the next round.)

Page 34:

Weak parity learner

(Figure, last round: the bucketed coordinates have all been zeroed out; √n vectors with sum = 0 give a guess for f(x).)

Page 35:

Weak parity learner

(Figure, last round: √n vectors with sum = 0 give a guess for f(x).)

By symmetry, the probability of a mistake equals the fraction of mistakes.
Claim (by Cauchy-Schwarz): the fraction of mistakes is at most ½ − … .
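A minimal sketch of one BKW-style bucketing round from the figures above, assuming samples are (x, label) pairs with x a 0/1 tuple; the full weak learner iterates this over several rounds and then combines about √n surviving vectors to guess f(x). The function name and parameters are illustrative, not from the paper.

```python
from collections import defaultdict

def bkw_round(samples, block_bits):
    """One bucketing round: group samples by their last `block_bits` coordinates,
    then XOR each sample with a pivot from its bucket so those coordinates become 0.
    Labels are XORed along with the vectors, so each output label stays consistent
    with the (noisy) parity of its vector."""
    buckets = defaultdict(list)
    for x, y in samples:
        buckets[x[-block_bits:]].append((x, y))

    next_round = []
    for group in buckets.values():
        pivot_x, pivot_y = group[0]
        for x, y in group[1:]:
            xored = tuple(a ^ b for a, b in zip(x, pivot_x))
            next_round.append((xored, y ^ pivot_y))   # last block_bits are now all 0
    return next_round
```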

Page 36:

Intuition behind two main parts

Page 37:

Intuition behind Boosting

Page 38:

Intuition behind Boosting

(Figure: reweight the examples; decrease the weight of points the current hypothesis gets right, increase the weight of points it gets wrong.)

Page 39:

Intuition behind Boosting

(Figure: the reweighting picture from the previous slide.)

Run, reweight, run, reweight, … Take a majority vote of the hypotheses.
Algorithmic & efficient; Yao / von Neumann minimax principle.
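The "run, reweight, take a majority" loop on this slide, as a minimal sketch (multiplicative reweighting with an illustrative constant; the weak learner is assumed to accept per-example weights, which is not spelled out on the slide):

```python
def reweight_and_vote(examples, weak_learner, rounds=20, beta=0.5):
    """Classical boosting intuition: run the weak learner, decrease the weight of
    points it gets right, increase the weight of points it gets wrong, repeat,
    and finally predict with a majority vote over all hypotheses."""
    weights = [1.0] * len(examples)
    hypotheses = []
    for _ in range(rounds):
        h = weak_learner(examples, weights)   # weak learner sees the current weights
        hypotheses.append(h)
        for i, (x, y) in enumerate(examples):
            if h(x) == y:
                weights[i] *= beta            # decrease weight of correct points
            else:
                weights[i] /= beta            # increase weight of mistakes

    def majority(x):
        votes = sum(h(x) for h in hypotheses)
        return 1 if 2 * votes >= len(hypotheses) else 0

    return majority
```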