Optimal Quantum Sample Complexity of Learning Algorithms

Srinivasan Arunachalam

(Joint work with Ronald de Wolf)


Machine learning

Classical machine learning

Grand goal: enable AI systems to improve themselves

Practical goal: learn “something” from given data

Recent success: deep learning is extremely good at image recognition, natural language processing, even the game of Go

Why the recent interest? Flood of available data, increasing computational power, growing progress in algorithms

Quantum machine learning

What can quantum computing do for machine learning?

The learner will be quantum, the data may be quantum

Some examples are known of reduction in time complexity:

clustering (Aïmeur et al. '06)
principal component analysis (Lloyd et al. '13)
perceptron learning (Wiebe et al. '16)
recommendation systems (Kerenidis & Prakash '16)



Probably Approximately Correct (PAC) learning

Basic definitions

Concept class C: collection of Boolean functions on n bits (Known)

Target concept c: some function c ∈ C (Unknown)

Distribution D : {0,1}^n → [0,1] (Unknown)

Labeled example for c ∈ C: (x, c(x)) where x ∼ D

Formally: A theory of the learnable (L.G. Valiant '84)

Using i.i.d. labeled examples, a learner for C should output a hypothesis h that is Probably Approximately Correct

Error of h w.r.t. target c: err_D(c, h) = Pr_{x∼D}[c(x) ≠ h(x)]

An algorithm (ε, δ)-PAC-learns C if:

∀c ∈ C ∀D : Pr[ err_D(c, h) ≤ ε ] ≥ 1 − δ
(“err_D(c, h) ≤ ε” is the Approximately Correct part; “with probability ≥ 1 − δ” is the Probably part)
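
To make the definition concrete, here is a small Python sketch (the concept, hypothesis and distribution below are made-up toy choices, not from the talk) that estimates err_D(c, h) by sampling from D:

import random

def err_D(c, h, sample_D, trials=100_000):
    # Monte Carlo estimate of err_D(c, h) = Pr_{x ~ D}[c(x) != h(x)]
    disagreements = sum(c(x) != h(x) for x in (sample_D() for _ in range(trials)))
    return disagreements / trials

# Toy instance: concepts over n = 4 bits, D = uniform distribution
n = 4
def sample_uniform():
    return tuple(random.randint(0, 1) for _ in range(n))

c = lambda x: x[0] ^ x[1]                  # unknown target concept
h = lambda x: x[0]                         # hypothesis produced by some learner
print(err_D(c, h, sample_uniform))         # ~0.5, so h is not approximately correct for eps < 0.5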



Complexity of learning

Recap

Concept: some function c : {0,1}^n → {0,1}
Concept class C: set of concepts

An algorithm (ε, δ)-PAC-learns C if: ∀c ∈ C ∀D : Pr[ err_D(c, h) ≤ ε ] ≥ 1 − δ

How to measure the efficiency of the learning algorithm?

Sample complexity: number of labeled examples used by learner

Time complexity: number of time-steps used by learner

This talk: focus on sample complexity

No need for complexity-theoretic assumptions

No need to worry about the format of hypothesis h



Vapnik and Chervonenkis (VC) dimension

VC dimension of C ⊆ {c : {0,1}^n → {0,1}}

Let M be the |C| × 2^n Boolean matrix whose c-th row is the truth table of concept c : {0,1}^n → {0,1}

VC-dim(C): the largest d such that some |C| × d rectangle in M contains all of {0,1}^d; these d column indices are shattered by C

Table: VC-dim(C) = 2

Concepts   Truth table
c1         0 1 0 1
c2         0 1 1 0
c3         1 0 0 1
c4         1 0 1 0
c5         1 1 0 1
c6         0 1 1 1
c7         0 0 1 1
c8         0 1 0 0
c9         1 1 1 1

Table: VC-dim(C) = 3

Concepts   Truth table
c1         0 1 1 0
c2         1 0 0 1
c3         0 0 0 0
c4         1 1 0 1
c5         1 0 1 0
c6         0 1 1 1
c7         0 0 1 1
c8         0 1 0 1
c9         0 1 0 0
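
The shattering condition can be checked by brute force for small classes. A minimal Python sketch, assuming each concept is given as one row of its truth table, as in the tables above:

from itertools import combinations

def vc_dim(concepts):
    # concepts: list of truth-table rows (tuples of 0/1), one row per concept.
    # A set S of column indices is shattered if the rows restricted to S
    # contain all 2^|S| patterns.
    num_cols = len(concepts[0])
    best = 0
    for d in range(1, num_cols + 1):
        shattered = False
        for cols in combinations(range(num_cols), d):
            patterns = {tuple(row[i] for i in cols) for row in concepts}
            if len(patterns) == 2 ** d:      # all of {0,1}^d appears
                shattered = True
                break
        if not shattered:
            return best
        best = d
    return best

# The two tables above:
C1 = [(0,1,0,1),(0,1,1,0),(1,0,0,1),(1,0,1,0),(1,1,0,1),(0,1,1,1),(0,0,1,1),(0,1,0,0),(1,1,1,1)]
C2 = [(0,1,1,0),(1,0,0,1),(0,0,0,0),(1,1,0,1),(1,0,1,0),(0,1,1,1),(0,0,1,1),(0,1,0,1),(0,1,0,0)]
print(vc_dim(C1), vc_dim(C2))   # 2 3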


VC dimension characterizes PAC sample complexity


Fundamental theorem of PAC learning

Suppose VC-dim(C) = d

Blumer-Ehrenfeucht-Haussler-Warmuth '86: every (ε, δ)-PAC learner for C needs Ω(d/ε + log(1/δ)/ε) examples

Hanneke '16: there exists an (ε, δ)-PAC learner for C using O(d/ε + log(1/δ)/ε) examples
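
For intuition, the Θ(d/ε + log(1/δ)/ε) statement can be turned into a rough sample-size calculator; the leading constant below is an arbitrary placeholder, since the theorem fixes the bound only up to constant factors:

import math

def pac_sample_bound(d, eps, delta, const=4.0):
    # illustrative sample-size estimate of order (d + log(1/delta)) / eps;
    # the constant is a placeholder, not the one from Hanneke's analysis
    return math.ceil(const * (d + math.log(1.0 / delta)) / eps)

print(pac_sample_bound(d=10, eps=0.05, delta=0.01))   # 1169 with these illustrative numbers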



Quantum PAC learning

(Bshouty-Jackson’95): Quantum generalization of classical PAC

Learner is quantum:

Data is quantum: Quantum example is a superposition ∑_{x∈{0,1}^n} √D(x) |x, c(x)⟩

Measuring this state gives (x, c(x)) with probability D(x), so quantum examples are at least as powerful as classical
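
A small numpy sketch of such a state for a toy concept and distribution (all names here are illustrative, not from the talk): the amplitude on basis state |x, c(x)⟩ is √D(x), so a computational-basis measurement returns (x, c(x)) with probability D(x).

import numpy as np

n = 3
def c(x):                                   # some target concept on n bits
    return x[0] ^ (x[1] & x[2])

rng = np.random.default_rng(0)
D = rng.random(2 ** n)                      # an arbitrary distribution D on {0,1}^n
D /= D.sum()

# |E_{c,D}> lives on n+1 qubits: amplitude sqrt(D(x)) on basis state |x, c(x)>
state = np.zeros(2 ** (n + 1))
for i in range(2 ** n):
    x = [(i >> (n - 1 - j)) & 1 for j in range(n)]   # bits of x, most significant first
    state[2 * i + c(x)] = np.sqrt(D[i])              # index of |x, c(x)>

# measuring in the computational basis returns (x, c(x)) with probability D(x)
probs = state ** 2
assert np.allclose([probs[2 * i] + probs[2 * i + 1] for i in range(2 ** n)], D)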



Classical vs. Quantum PAC learning algorithm!

Question

Can quantum sample complexity be significantly smaller than classical?


Quantum PAC learning

Quantum Data

Quantum example: |E_{c,D}⟩ = ∑_{x∈{0,1}^n} √D(x) |x, c(x)⟩
Quantum examples are at least as powerful as classical examples

Quantum is indeed more powerful for learning! (for a fixed distribution)

Sample complexity: learning the class of linear functions under uniform D
Classical: Ω(n) classical examples needed
Quantum: O(1) quantum examples suffice (Bernstein-Vazirani '93; see the sketch below)

Time complexity: learning DNF under uniform D
Classical: best known upper bound is quasi-polynomial time (Verbeurgt '90)
Quantum: polynomial time (Bshouty-Jackson '95)

But in the PAC model, the learner has to succeed for all D!
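
To see where the O(1) claim comes from: for a linear target c(x) = a·x mod 2 and uniform D, Hadamarding all n+1 qubits of a single quantum example and measuring yields (a, 1) with probability 1/2. A short simulation sketch (parameters and seed are arbitrary):

import numpy as np

n = 5
rng = np.random.default_rng(1)
a = rng.integers(0, 2, n)                   # hidden string: c(x) = a.x mod 2

# one quantum example under the uniform distribution: 2^{-n/2} sum_x |x, a.x>
state = np.zeros(2 ** (n + 1))
for i in range(2 ** n):
    x = np.array([(i >> (n - 1 - j)) & 1 for j in range(n)])
    state[2 * i + (a @ x) % 2] = 2 ** (-n / 2)

# apply a Hadamard to each of the n+1 qubits
H = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)
Hall = np.array([[1.0]])
for _ in range(n + 1):
    Hall = np.kron(Hall, H)
out = Hall @ state

# the result is (|0^n,0> + |a,1>)/sqrt(2): measuring yields (a,1) with probability 1/2,
# so an expected O(1) quantum examples reveal a
probs = out ** 2
idx_a = 2 * int("".join(map(str, a)), 2) + 1
assert np.isclose(probs[idx_a], 0.5) and np.isclose(probs[0], 0.5)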


Quantum sample complexity = Classical sample complexity

Quantum upper bound

Classical upper bound O(d/ε + log(1/δ)/ε) carries over to quantum

Best known quantum lower bounds

Atici & Servedio '04: lower bound Ω(√d/ε + d + log(1/δ)/ε)

Zhang '10: improved the first term to d^{1−η}/ε for all η > 0

Our result: Tight lower bound

We show: Ω(d/ε + log(1/δ)/ε) quantum examples are necessary

Two proof approaches

Information theory: conceptually simple, nearly-tight bounds

Optimal measurement: tight bounds, some messy calculations



Proof approach

1. First, we consider the problem of probably exactly learning: the quantum learner should identify the concept

2. Here, the quantum learner is given one out of |C| quantum states. Identify the target concept using copies of the quantum state

3. Quantum state identification has been well-studied

4. We'll get to probably approximately learning soon!


Proof sketch: Quantum sample complexity T ≥ VC-dim(C)/ε

State identification: Ensemble E = {(p_z, |ψ_z⟩)}_{z∈[m]}

Given state |ψ_z⟩ ∈ E with probability p_z. Goal: identify z

Optimal measurement could be quite complicated, but we can always use the Pretty Good Measurement

Crucial property: if P_opt is the optimal success probability, then P_opt ≥ P_pgm ≥ P_opt² (Barnum-Knill '02)

How does learning relate to identification?

Quantum PAC: Given |ψ_c⟩ = |E_{c,D}⟩^{⊗T}, learn c approximately

Let VC-dim(C) = d. Suppose {s_0, ..., s_d} is shattered by C. Fix D(s_0) = 1 − ε, D(s_i) = ε/d on s_1, ..., s_d

Let k = Ω(d) and E : {0,1}^k → {0,1}^d be an error-correcting code

Pick 2^k codeword concepts {c_z}_{z∈{0,1}^k} ⊆ C: c_z(s_0) = 0, c_z(s_i) = E(z)_i for all i ∈ [d]
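
A small numpy sketch of the Pretty Good Measurement mentioned above, for an ensemble of random illustrative pure states (not the |E_{c_z,D}⟩^{⊗T} states from the proof): the PGM operators are E_z = ρ^{−1/2} p_z |ψ_z⟩⟨ψ_z| ρ^{−1/2} with ρ = ∑_z p_z |ψ_z⟩⟨ψ_z|.

import numpy as np

def pgm_success(states, probs):
    # success probability of the Pretty Good Measurement for pure states |psi_z>,
    # with measurement operators E_z = rho^{-1/2} p_z |psi_z><psi_z| rho^{-1/2}
    rho = sum(p * np.outer(psi, psi.conj()) for p, psi in zip(probs, states))
    vals, vecs = np.linalg.eigh(rho)
    inv_sqrt = np.zeros_like(rho)           # rho^{-1/2} on the support of rho
    for v, u in zip(vals, vecs.T):
        if v > 1e-12:
            inv_sqrt += np.outer(u, u.conj()) / np.sqrt(v)
    # Pr[guess z | state psi_z] = p_z |<psi_z| rho^{-1/2} |psi_z>|^2
    return sum(p * p * abs(psi.conj() @ inv_sqrt @ psi) ** 2
               for p, psi in zip(probs, states))

rng = np.random.default_rng(2)
m, dim = 4, 3                                # 4 random pure states in dimension 3
states = rng.normal(size=(m, dim)) + 1j * rng.normal(size=(m, dim))
states /= np.linalg.norm(states, axis=1, keepdims=True)
p_pgm = pgm_success(states, np.full(m, 1 / m))
print(p_pgm)    # by Barnum-Knill: P_opt >= P_pgm >= P_opt^2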


Pick concepts {c_z} ⊆ C: c_z(s_0) = 0, c_z(s_i) = E(z)_i for all i

Suppose VC-dim(C) = d + 1 and {s_0, ..., s_d} is shattered by C, i.e., the |C| × (d + 1) rectangle of columns s_0, ..., s_d contains all of {0,1}^{d+1}

Concepts      Truth table
c ∈ C         s_0  s_1  ...  s_{d−1}  s_d  ...
c_1            0    0   ...     0      0
c_2            0    0   ...     1      0
c_3            0    0   ...     1      1
...
c_{2^d}        0    1   ...     1      1
c_{2^d+1}      1    0   ...     0      1
...
c_{2^{d+1}}    1    1   ...     1      1
...

(the first 2^d rows, through c_{2^d}, are exactly the concepts with c(s_0) = 0)

Among c_1, ..., c_{2^d}, pick the 2^k concepts that correspond to codewords of E : {0,1}^k → {0,1}^d on s_1, ..., s_d
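
A sketch of the codeword-concept construction in code; the specific choice of E below (a random linear code) is an assumption for illustration only — the argument just needs some code with 2^k codewords, k = Ω(d), and constant relative distance.

import numpy as np

rng = np.random.default_rng(3)
d, k = 16, 8                         # k = Omega(d); assumption: a random linear code
G = rng.integers(0, 2, (k, d))       # E(z) = z G over GF(2) has constant relative
                                     # distance with high probability

def codeword_concept(z):
    # values of c_z on the shattered points (s_0, s_1, ..., s_d):
    # c_z(s_0) = 0 and c_z(s_i) = E(z)_i
    Ez = (np.asarray(z) @ G) % 2
    return np.concatenate(([0], Ez))

z = rng.integers(0, 2, k)
print(codeword_concept(z))           # a 0/1 vector of length d + 1 starting with 0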


Sample complexity lower bound via PGM

Recap

Learning c_z approximately (wrt D) is equivalent to identifying z!

If sample complexity is T, then there is a good learner that identifies z from |ψ_{c_z}⟩ = |E_{c_z,D}⟩^{⊗T} w.p. ≥ 1 − δ

Goal: Show T ≥ d/ε

Analysis of PGM

For the ensemble {|ψ_{c_z}⟩ : z ∈ {0,1}^k} with uniform probabilities p_z = 1/2^k, we have P_pgm ≥ P_opt^2 ≥ (1 − δ)^2

Recall k = Ω(d) because we used a good ECC

P_pgm ≤ · · · 4-page calculation · · · ≤ exp(T^2 ε^2/d + √(Tdε) − d − Tε)

This implies T = Ω(d/ε)
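
A quick way to see the final implication (with an unoptimized constant, not the paper's): combining P_pgm ≥ (1 − δ)^2 with the displayed upper bound and taking logarithms gives

\[
\frac{T^2\varepsilon^2}{d} + \sqrt{Td\varepsilon} - d - T\varepsilon \;\ge\; 2\ln(1-\delta).
\]

If \(T \le d/(5\varepsilon)\), the left-hand side is at most \(\big(\tfrac{1}{25} + \tfrac{1}{\sqrt{5}} - 1\big)d \le -d/2\), so \(P_{\mathrm{pgm}} \le e^{-d/2}\), contradicting \((1-\delta)^2 = \Omega(1)\) once \(d\) is large. Hence \(T > d/(5\varepsilon)\), i.e. \(T = \Omega(d/\varepsilon)\).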


23/ 23

Conclusion and future work

Further results

Agnostic learning: No quantum bounds were known before (unlike the PAC model). We showed that quantum examples do not reduce the sample complexity

We also studied the model with random classification noise and showed that quantum examples are no better than classical examples

Future work

Quantum machine learning is still young!

Theoretically, one could consider more optimistic PAC-like models where the learner need not succeed for all c ∈ C and all D

Efficient quantum PAC learnability of AC^0 under the uniform distribution D?


24/ 23

Buffer 1: Proof approach via Information theory

Suppose {s_0, . . . , s_d} is shattered by C. By definition:
∀ a ∈ {0,1}^d ∃ c ∈ C s.t. c(s_0) = 0 and c(s_i) = a_i ∀ i ∈ [d]

Fix a nasty distribution D: D(s_0) = 1 − 4ε, D(s_i) = 4ε/d on {s_1, . . . , s_d}.
A good learner produces a hypothesis h s.t. h(s_i) = c(s_i) = a_i for ≥ 3/4 of the i's

Think of c as a uniform d-bit string A, approximated by h ∈ {0,1}^d that depends on the examples B = (B_1, . . . , B_T)

1. I(A : B) ≥ I(A : h(B)) ≥ Ω(d)                       [because h ≈ A]
2. I(A : B) ≤ Σ_{i=1}^T I(A : B_i) = T · I(A : B_1)     [subadditivity]
3. I(A : B_1) ≤ 4ε                                      [because prob of a useful example is 4ε]

This implies Ω(d) ≤ I(A : B) ≤ 4Tε, hence T = Ω(d/ε)

For analyzing quantum examples, only step 3 changes:

I(A : B_1) ≤ O(ε log(d/ε))  ⇒  T = Ω(d / (ε log(d/ε)))
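
As a sanity check of step 3 in the classical case, the following short computation (the names and the toy values d = 6, ε = 0.05 are ours, not the slides') evaluates I(A : B_1) exactly from the joint distribution and compares it to 4ε. The quantum bound O(ε log(d/ε)) needs the separate quantum argument.

```python
import itertools
import numpy as np

def H(p):
    """Shannon entropy (in bits) of a probability vector, ignoring zero entries."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

d, eps = 6, 0.05    # toy values

# Joint distribution of (A, B_1): A is uniform on {0,1}^d, and a single example
# B_1 equals (s_0, 0) with prob 1 - 4*eps and (s_i, A_i) with prob 4*eps/d each.
b_vals = [("s0", 0)] + [(i, b) for i in range(1, d + 1) for b in (0, 1)]
joint = np.zeros((2 ** d, len(b_vals)))
for ai, a in enumerate(itertools.product([0, 1], repeat=d)):
    pa = 1.0 / 2 ** d
    joint[ai, 0] = pa * (1 - 4 * eps)
    for i in range(1, d + 1):
        joint[ai, b_vals.index((i, a[i - 1]))] += pa * (4 * eps / d)

# I(A : B_1) = H(A) + H(B_1) - H(A, B_1)
I_AB1 = H(joint.sum(axis=1)) + H(joint.sum(axis=0)) - H(joint.ravel())
print(f"I(A : B_1) = {I_AB1:.4f} bits,   4*eps = {4 * eps:.4f}")   # step 3 of the argument
```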


25/ 23

Buffer 2: Proof approach in detail

Suppose we're given state |ψ_i⟩ with probability p_i, i = 1, . . . , m. Goal: learn i

Optimal measurement could be quite complicated, but we can always use the Pretty Good Measurement.

This has POVM operators M_i = p_i ρ^{−1/2} |ψ_i⟩⟨ψ_i| ρ^{−1/2}, where ρ = Σ_i p_i |ψ_i⟩⟨ψ_i|

Success probability of PGM: P_PGM = Σ_i p_i Tr(M_i |ψ_i⟩⟨ψ_i|)

Crucial property (BK'02): if P_OPT is the success probability of the optimal POVM, then P_OPT ≥ P_PGM ≥ P_OPT^2

Let G be the m × m Gram matrix of the vectors √p_i |ψ_i⟩; then P_PGM = Σ_i √G(i, i)^2
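
A small numpy sketch of the PGM as defined above, on a toy qubit ensemble (the function name pgm, the regularization cutoff, and the example states are our choices): it builds ρ, forms the operators M_i, checks completeness on the support of ρ, and evaluates P_PGM.

```python
import numpy as np

rng = np.random.default_rng(1)

def pgm(states, probs):
    """Pretty Good Measurement for an ensemble {(p_i, |psi_i>)}: returns the POVM
    elements M_i = p_i rho^{-1/2}|psi_i><psi_i|rho^{-1/2} (inverse square root taken
    on the support of rho) and P_PGM = sum_i p_i <psi_i| M_i |psi_i>."""
    rho = sum(p * np.outer(v, v.conj()) for p, v in zip(probs, states))
    w, U = np.linalg.eigh(rho)
    inv_sqrt = U @ np.diag([1 / np.sqrt(x) if x > 1e-12 else 0.0 for x in w]) @ U.conj().T
    Ms = [p * inv_sqrt @ np.outer(v, v.conj()) @ inv_sqrt for p, v in zip(probs, states)]
    p_pgm = sum(p * np.real(v.conj() @ M @ v) for p, M, v in zip(probs, Ms, states))
    return Ms, p_pgm

# Toy ensemble: three non-orthogonal qubit states with uniform priors.
states = rng.normal(size=(3, 2)) + 1j * rng.normal(size=(3, 2))
states /= np.linalg.norm(states, axis=1, keepdims=True)
probs = np.ones(3) / 3

Ms, p_pgm = pgm(states, probs)
print("POVM completeness (on supp rho):", np.allclose(sum(Ms), np.eye(2)))
print("P_PGM =", round(p_pgm, 4))
```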


26/ 23

Buffer 3: Analysis of PGM

For the ensemble {|ψ_{c_z}⟩ : z ∈ {0,1}^k} with uniform probabilities p_z = 1/2^k, we have P_PGM ≥ (1 − δ)^2

Let G be the 2^k × 2^k Gram matrix of the vectors √p_z |ψ_{c_z}⟩; then P_PGM = Σ_z √G(z, z)^2

G_{x,y} = g(x ⊕ y). We can diagonalize G using the Hadamard transform, and its eigenvalues will be 2^k ĝ(s). This gives √G, and

Σ_z √G(z, z)^2 ≤ · · · 4-page calculation · · · ≤ exp(T^2 ε^2/d + √(Tdε) − d − Tε)

This implies T = Ω(d/ε)
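
A short numerical illustration of the diagonalization step (the matrix below is a generic PSD, trace-1 stand-in with the same XOR structure, not the actual Gram matrix of the |ψ_{c_z}⟩): it checks that the normalized Hadamard matrix diagonalizes G with eigenvalues 2^k ĝ(s), then evaluates Σ_z √G(z, z)^2 in that basis.

```python
import itertools
import numpy as np

rng = np.random.default_rng(2)
k = 3
N = 2 ** k                                        # ensemble size 2^k
xs = list(itertools.product([0, 1], repeat=k))

# +-1 Hadamard matrix H[x, s] = (-1)^{x.s};  U = H / 2^{k/2} is real unitary.
H = np.array([[(-1) ** int(np.dot(x, s)) for s in xs] for x in xs], dtype=float)
U = H / np.sqrt(N)

# Toy stand-in for the Gram matrix of the sqrt(p_z)|psi_{c_z}>: any PSD, trace-1
# matrix with the XOR structure G[x, y] = g(x XOR y).  Choosing a nonnegative
# Fourier spectrum c and setting g = H c guarantees positive semidefiniteness.
c = rng.random(N)
c /= N * c.sum()                                  # normalize so that trace(G) = 1
g = H @ c
xor = lambda x, y: xs.index(tuple(a ^ b for a, b in zip(x, y)))
G = np.array([[g[xor(x, y)] for y in xs] for x in xs])

# The Hadamard transform diagonalizes G; the eigenvalues are 2^k * g_hat(s),
# where g_hat(s) = 2^{-k} sum_w g(w) (-1)^{s.w}  (here g_hat = c by construction).
g_hat = (H @ g) / N
print("U diagonalizes G:", np.allclose(U @ G @ U, np.diag(N * g_hat)))

# sqrt(G) in the same basis, and P_PGM = sum_z sqrt(G)(z, z)^2 -- the quantity
# that the 4-page calculation upper-bounds for the real ensemble.
sqrtG = U @ np.diag(np.sqrt(N * g_hat)) @ U
print("P_PGM (toy) =", round(float(np.sum(np.diag(sqrtG) ** 2)), 4))
```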