Reductions to the Noisy Parity Problem
Vitaly Feldman (Harvard), Parikshit Gopalan (UW),
Subhash Khot (Georgia Tech), Ashok K. Ponnuswami (Georgia Tech)
a.k.a. New Results on Learning Parities, Halfspaces, Monomials, Mahjongg etc.
Uniform Distribution Learning
Examples: (x, f(x)), where x ← {0,1}^n and f: {0,1}^n → {+1,−1}.
Goal: Learn the function f in poly(n) time.
Uniform Distribution Learning
Examples: (x, f(x)).
Goal: Learn the function f in poly(n) time.
Information-theoretically impossible.
Will assume f has nice structure, such as:
1. Parity: f(x) = χ_α(x) = (−1)^{α·x}
2. Halfspace: f(x) = sgn(w·x)
3. k-junta: f(x) = f(x_{i_1}, …, x_{i_k})
4. Decision Tree
5. DNF

Here χ_α(x) = (−1)^{Σ_{i ∈ α} x_i}.
Uniform Distribution Learning
Examples: (x, f(x)). Goal: Learn the function f in poly(n) time.

Class             Time         Technique
1. Parity         n^{O(1)}     Gaussian elimination
2. Halfspace      n^{O(1)}     LP
3. k-junta        n^{0.7k}     [MOS]
4. Decision Tree  n^{log n}    Fourier
5. DNF            n^{log n}    Fourier
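In the noise-free case, the n^{O(1)} parity algorithm is plain linear algebra: each example gives one linear equation over GF(2). A minimal sketch (the hidden set and sample count below are illustrative choices, not from the talk):

```python
import random

def learn_parity(examples, n):
    """Learn a parity from noise-free examples by Gaussian elimination
    over GF(2).  Each example (x, y) with y = (-1)^{sum_{i in alpha} x_i}
    gives one linear equation <alpha, x> = b (mod 2), b = (1 - y) // 2.
    Assumes the x's span GF(2)^n (n + O(1) uniform x's do w.h.p.)."""
    pivots = {}  # pivot column -> (row bitmask, rhs bit), kept fully reduced
    for x, y in examples:
        mask = sum(xi << i for i, xi in enumerate(x))
        rhs = (1 - y) // 2
        # Reduce the incoming row against every existing pivot row.
        for col, (pmask, prhs) in pivots.items():
            if (mask >> col) & 1:
                mask ^= pmask
                rhs ^= prhs
        if mask:
            col = mask.bit_length() - 1
            # Clear the new pivot column from all earlier rows.
            for c, (pm, pr) in pivots.items():
                if (pm >> col) & 1:
                    pivots[c] = (pm ^ mask, pr ^ rhs)
            pivots[col] = (mask, rhs)
    # At full rank, each row now involves only its own pivot variable.
    alpha = [0] * n
    for col, (_, rhs) in pivots.items():
        alpha[col] = rhs
    return alpha

# Sanity check on a random instance (alpha_true is a hypothetical hidden set).
rng = random.Random(0)
n = 16
alpha_true = [rng.randint(0, 1) for _ in range(n)]
examples = []
for _ in range(n + 20):
    x = [rng.randint(0, 1) for _ in range(n)]
    y = (-1) ** sum(a * xi for a, xi in zip(alpha_true, x))
    examples.append((x, y))
alpha_hat = learn_parity(examples, n)
```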
Uniform Distribution Learning with Random Noise
Examples: (x, (−1)^e·f(x)), where x ← {0,1}^n, f: {0,1}^n → {+1,−1},
and e = 1 w.p. η, e = 0 w.p. 1 − η.
Goal: Learn the function f in poly(n) time.

Class             Time         Technique
1. Parity         —            Noisy Parity
2. Halfspace      n^{O(1)}     [BFKV]
3. k-junta        n^k          Fourier
4. Decision Tree  n^{log n}    Fourier
5. DNF            n^{log n}    Fourier
The Noisy Parity Problem
Examples: (x, (−1)^e·f(x)).
Coding Theory: Decoding a random linear code from random noise.
Best Known Algorithm: 2^{n/log n}, Blum-Kalai-Wasserman [BKW].
Believed to be hard.
Variant: Noisy parity of size k (|α| ≤ k). Brute force runs in time O(n^k).
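The O(n^k) brute force can be sketched directly: the true size-≤k parity has correlation 1 − 2η with the noisy label, while every other parity has correlation 0, so enumerate all subsets of size at most k and keep the empirically best one. A sketch with illustrative parameters (hidden set, η, sample size are arbitrary choices):

```python
import itertools
import random

def noisy_parity_examples(alpha, n, eta, m, rng):
    """m samples (x, (-1)^e * chi_alpha(x)) with Pr[e = 1] = eta."""
    out = []
    for _ in range(m):
        x = [rng.randint(0, 1) for _ in range(n)]
        y = (-1) ** sum(x[i] for i in alpha)
        if rng.random() < eta:
            y = -y
        out.append((x, y))
    return out

def brute_force_noisy_parity(examples, n, k):
    """O(n^k)-time brute force: the true size-<=k parity correlates with
    the label (E[y * chi_S(x)] = 1 - 2*eta), every other parity has
    correlation 0, so return the subset with the largest empirical one."""
    best, best_corr = None, -1.0
    for size in range(k + 1):
        for S in itertools.combinations(range(n), size):
            corr = sum(y * (-1) ** sum(x[i] for i in S)
                       for x, y in examples) / len(examples)
            if abs(corr) > best_corr:
                best, best_corr = set(S), abs(corr)
    return best

rng = random.Random(7)
examples = noisy_parity_examples([1, 4, 7], 10, 0.1, 2000, rng)
recovered = brute_force_noisy_parity(examples, 10, 3)
```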
Agnostic Learning under the Uniform Distribution
Examples: (x, g(x)), where g(x) is a {−1,+1}-valued random variable
with Pr_x[g(x) ≠ f(x)] ≤ η.
Goal: Get an approximation to g that is as good as f.

If the function f is a:
Class             Time           Reference
1. Parity         2^{n/log n}    [FGKP]
2. Halfspace      n^{O(1)}       [KKMS]
3. k-junta        n^k            [KKMS]
4. Decision Tree  n^{log n}      [KKMS]
5. DNF            n^{log n}      [KKMS]
Agnostic Learning of Parities
Examples: (x, g(x)).
Given g which has a large Fourier coefficient, find it:
if Pr_x[g(x) ≠ χ_α(x)] ≤ η, then ĝ(α) = E_x[g(x)·χ_α(x)] ≥ 1 − 2η.
Coding Theory: Decoding a random linear code with adversarial noise.
If queries were allowed:
• Hadamard list decoding [GL, KM].
• Basis of algorithms for Decision Trees [KM], DNF [Jackson].
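With query access (in the extreme, the full truth table), the Fourier coefficients can simply be read off via a fast Walsh-Hadamard transform; the difficulty in the agnostic model is doing it from random examples alone. A small illustration (the choice of corrupted points is arbitrary):

```python
def fourier_coefficients(truth_table):
    """Fast Walsh-Hadamard transform.  Input: the 2^n values g(x), with
    x encoded as an integer (bit i of the index = x_i).  Output: all
    coefficients g_hat(alpha) = E_x[g(x) * chi_alpha(x)], indexed by
    the bitmask of alpha."""
    a = list(truth_table)
    h = 1
    while h < len(a):
        for i in range(0, len(a), 2 * h):
            for j in range(i, i + h):
                a[j], a[j + h] = a[j] + a[j + h], a[j] - a[j + h]
        h *= 2
    return [v / len(a) for v in a]

# chi_5 on {0,1}^4 has a single coefficient of 1 at alpha = 5; flipping
# g on a 1/8 fraction of inputs moves it to 1 - 2*(1/8) = 0.75, and it
# remains the largest coefficient in absolute value.
n = 4
g = [(-1) ** bin(5 & x).count("1") for x in range(2 ** n)]
clean = fourier_coefficients(g)
g[0], g[3] = -g[0], -g[3]   # adversarial corruption on 2 of 16 points
noisy = fourier_coefficients(g)
```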
Reductions between problems and models
Noise-free: (x, f(x))   Random: (x, (−1)^e·f(x))   Agnostic: (x, g(x))
Reductions to Noisy Parity
Theorem [FGKP]: Learning Juntas, Decision Trees and DNFs reduce to learning noisy parities of size k.

Class               Size of Parity   Error-rate
k-junta             k                ½ − 2^{−k}
Decision tree, DNF  log n            ½ − n^{−2}
Reductions to Noisy Parity
Theorem [FGKP]: Learning Juntas, Decision Trees and DNFs reduce to learning noisy parities of size k.
Evidence in favor of noisy parity being hard?
The reduction holds even with random classification noise.
Reductions to Noisy Parity
Theorem [FGKP]: Agnostically learning parities with error-rate η reduces to learning noisy parities with (random) error-rate η.
Combined with [BKW], this gives a 2^{n/log n} agnostic learning algorithm for parities.
Main Idea: A noisy parity algorithm can help find large Fourier coefficients from random examples.
Reductions between problems and models
Noise-free: (x, f(x))   Random: (x, (−1)^e·f(x))   Agnostic: (x, g(x))
Common tool: the Probabilistic Oracle.
Probabilistic Oracles
Given h: {0,1}^n → [−1,1], the oracle for h returns pairs (x, b), where
x ← {0,1}^n, b ∈ {−1,+1}, and E[b | x] = h(x).
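Such an oracle is easy to realize: since b takes values in {−1,+1}, the condition E[b | x] = h(x) pins down Pr[b = +1 | x] = (1 + h(x))/2. A minimal sketch (the constant test function is an arbitrary illustrative choice):

```python
import random

def probabilistic_oracle(h, n, rng):
    """One draw (x, b) from the probabilistic oracle for h: {0,1}^n -> [-1,1].
    Since b is in {-1,+1}, E[b | x] = h(x) forces
    Pr[b = +1 | x] = (1 + h(x)) / 2."""
    x = tuple(rng.randint(0, 1) for _ in range(n))
    b = 1 if rng.random() < (1 + h(x)) / 2 else -1
    return x, b

# Quick check with the constant h(x) = 0.5: the mean of b should be ~0.5.
rng = random.Random(0)
m = 20000
mean_b = sum(probabilistic_oracle(lambda x: 0.5, 4, rng)[1]
             for _ in range(m)) / m
```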
Simulating Noise-free Oracles
Let f: {0,1}^n → {−1,1} and take h = f.
Then E[b | x] = f(x) ∈ {−1,1}; since b ∈ {−1,+1}, this forces b = f(x) always,
so the oracle is exactly the noise-free example oracle (x, f(x)).
Simulating Random Noise
Given f: {0,1}^n → {−1,1} and noise rate η = 0.1, let h(x) = 0.8·f(x)
(in general, h = (1 − 2η)·f).
Then E[b | x] = 0.8·f(x), hence
b = f(x) w.p. 0.9,
b = −f(x) w.p. 0.1,
i.e., the oracle is exactly the random-noise example oracle.
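A quick numerical check of this simulation (the hidden parity set and sample size below are illustrative): with h = (1 − 2η)·f, the oracle's label should agree with f(x) on a 1 − η fraction of draws.

```python
import random

rng = random.Random(1)
n, eta = 8, 0.1
alpha = [1, 3, 6]                        # hypothetical hidden parity set
f = lambda x: (-1) ** sum(x[i] for i in alpha)
h = lambda x: (1 - 2 * eta) * f(x)       # h = 0.8 * f for eta = 0.1

def draw():
    # Probabilistic oracle for h: b in {-1,+1} with E[b | x] = h(x).
    x = tuple(rng.randint(0, 1) for _ in range(n))
    b = 1 if rng.random() < (1 + h(x)) / 2 else -1
    return x, b

m = 20000
agree = sum(b == f(x) for x, b in (draw() for _ in range(m))) / m
# agree concentrates around 1 - eta = 0.9
```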
Simulating Adversarial Noise
Given that g(x) is a {−1,1}-valued r.v. with Pr_x[g(x) ≠ f(x)] = η,
let h(x) = E[g(x)].
The probabilistic oracle for h is equivalent (≡) to the agnostic oracle (x, g(x)).
The bound on the error rate implies E_x[|h(x) − f(x)|] ≤ 2η.
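A tiny sanity check of that bound, using a deterministic g (so that h = E[g] = g pointwise); the corrupted set below is an arbitrary illustrative choice:

```python
import itertools

n, eta = 4, 0.125
f = {x: (-1) ** (x[0] ^ x[2]) for x in itertools.product((0, 1), repeat=n)}
# Adversarial corruption: flip f on an eta-fraction of the inputs
# (2 of the 16 points, chosen arbitrarily for illustration).
flipped = {(0, 0, 0, 0), (1, 1, 1, 1)}
g = {x: -v if x in flipped else v for x, v in f.items()}
# g is deterministic here, so h(x) = E[g(x)] = g(x), and
# E_x[|h(x) - f(x)|] = 2 * Pr_x[g(x) != f(x)] = 2 * eta.
avg_gap = sum(abs(g[x] - f[x]) for x in f) / len(f)
```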