Review of Lecture 6

• mH(N) is polynomial if H has a break point k — see the numerical check after this review

  Table of B(N, k), the maximum number of dichotomies on N points with break point k:

         k=1  k=2  k=3  k=4  k=5  k=6
  N=1     1    2    2    2    2    2
  N=2     1    3    4    4    4    4
  N=3     1    4    7    8    8    8
  N=4     1    5   11    .    .    .
  N=5     1    6    .    .    .    .
  N=6     1    7    .    .    .    .

  mH(N) ≤ Σ_{i=0}^{k−1} (N choose i)        (maximum power is N^{k−1})
• The VC Inequality

  [Figure: panels (a), (b), (c) — Hoeffding Inequality, Union Bound, VC Bound — over the space of data sets D]

  P[ |Ein(g) − Eout(g)| > ε ] ≤ 2 M e^{−2ε²N}                  (Hoeffding + union bound)

  P[ |Ein(g) − Eout(g)| > ε ] ≤ 4 mH(2N) e^{−(1/8)ε²N}        (VC bound)
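As a quick numerical check of this review, here is a minimal sketch (assumptions mine: the recursion B(N, k) = B(N−1, k) + B(N−1, k−1) with boundary values B(N, 1) = 1 and B(1, k) = 2 for k > 1, as developed in Lecture 6) regenerating the table and confirming it matches the binomial sum exactly:

```python
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def B(N, k):
    """Maximum number of dichotomies on N points with break point k."""
    if k == 1:
        return 1
    if N == 1:
        return 2
    return B(N - 1, k) + B(N - 1, k - 1)

for N in range(1, 7):
    row = [B(N, k) for k in range(1, 7)]
    assert row == [sum(comb(N, i) for i in range(k)) for k in range(1, 7)]
    print(N, row)   # reproduces the table above, e.g. B(4, 3) = 11
```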
Learning From Data
Yaser S. Abu-Mostafa
California Institute of Technology

Lecture 7: The VC Dimension

Sponsored by Caltech's Provost Office, E&AS Division, and IST • Tuesday, April 24, 2012
Outline

• The definition
• VC dimension of perceptrons
• Interpreting the VC dimension
• Generalization bounds
Definition of VC dimension

The VC dimension of a hypothesis set H, denoted by dvc(H), is the largest value of N for which mH(N) = 2^N, i.e., the most points H can shatter.

  N ≤ dvc(H)  ⟹  H can shatter N points
  k > dvc(H)  ⟹  k is a break point for H
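To make the definition concrete, here is a minimal brute-force sketch (the point set and the use of positive rays, h(x) = sign(x − a), are illustrative choices): it counts the dichotomies a hypothesis set generates and reads off dvc as the largest N with mH(N) = 2^N.

```python
def positive_ray_dichotomies(xs):
    """All dichotomies h(x) = sign(x - a) can produce on the points xs."""
    xs = sorted(xs)
    # One representative threshold per gap, plus one on each side.
    thresholds = [xs[0] - 1.0] + [(u + v) / 2 for u, v in zip(xs, xs[1:])] + [xs[-1] + 1.0]
    return {tuple(1 if x > a else -1 for x in xs) for a in thresholds}

for N in range(1, 5):
    m = len(positive_ray_dichotomies(list(range(N))))
    print(N, m, 2 ** N)   # mH(N) = N + 1; mH(N) = 2^N only for N = 1, so dvc = 1
```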
The growth function

In terms of a break point k:

  mH(N) ≤ Σ_{i=0}^{k−1} (N choose i)

In terms of the VC dimension dvc:

  mH(N) ≤ Σ_{i=0}^{dvc} (N choose i)        (maximum power is N^dvc)
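A short sketch of why a finite dvc tames the growth function (dvc = 3 here is an arbitrary illustration): the binomial sum is a polynomial of degree dvc in N, so it is eventually dwarfed by 2^N.

```python
from math import comb

def growth_bound(N, dvc):
    """Upper bound on mH(N): sum of (N choose i) for i = 0 .. dvc."""
    return sum(comb(N, i) for i in range(dvc + 1))

for N in [3, 5, 10, 20, 40]:
    print(N, growth_bound(N, dvc=3), 2 ** N)   # polynomial vs. exponential
```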
Examples

• H is positive rays:    dvc = 1

• H is 2D perceptrons:   dvc = 3

• H is convex sets:      dvc = ∞
VC dimension and learning

dvc(H) is finite ⟹ g ∈ H will generalize

• Independent of the learning algorithm
• Independent of the input distribution
• Independent of the target function

[Figure: the learning diagram — unknown target function f: X → Y, probability distribution P on X, training examples (x1, y1), ..., (xN, yN), learning algorithm A, hypothesis set H, final hypothesis g ≈ f]
VC dimension of perceptrons

For d = 2, dvc = 3

In general, dvc = d + 1

We will prove two directions:

  dvc ≤ d + 1
  dvc ≥ d + 1
Here is one direction

A set of N = d + 1 points in R^d shattered by the perceptron — the rows of

  X =  [ 1  0  0  …  0 ]
       [ 1  1  0  …  0 ]
       [ 1  0  1  …  0 ]
       [ ⋮         ⋱    ]
       [ 1  0  …  0  1 ]

with rows x1ᵀ, x2ᵀ, …, x(d+1)ᵀ: each point starts with the bias coordinate x0 = 1; the first point is the origin and each later point is a standard basis vector.

X is invertible
Can we shatter this data set?

For any y = (y1, y2, …, y(d+1))ᵀ with each yi = ±1, can we find a vector w satisfying

  sign(Xw) = y ?

Easy! Just make Xw = y, which means w = X⁻¹y
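A minimal numpy sketch of this construction (d = 3 is an arbitrary choice; the matrix is exactly the X above) confirming that w = X⁻¹y realizes every one of the 2^(d+1) dichotomies:

```python
from itertools import product
import numpy as np

d = 3
# Rows of X: bias coordinate 1, then the origin followed by the standard basis vectors.
X = np.hstack([np.ones((d + 1, 1)), np.vstack([np.zeros(d), np.eye(d)])])

for y in product([-1.0, 1.0], repeat=d + 1):   # all 2^(d+1) dichotomies
    w = np.linalg.solve(X, np.array(y))        # w = X^{-1} y
    assert np.array_equal(np.sign(X @ w), np.array(y))

print(f"all {2 ** (d + 1)} dichotomies realized, so dvc >= d + 1 = {d + 1}")
```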
We can shatter these d + 1 points

This implies what?

  [a] dvc = d + 1
  [b] dvc ≥ d + 1   ✓
  [c] dvc ≤ d + 1
  [d] No conclusion
Now, to show that dvc ≤ d + 1

We need to show that:

  [a] There are d + 1 points we cannot shatter
  [b] There are d + 2 points we cannot shatter
  [c] We cannot shatter any set of d + 1 points
  [d] We cannot shatter any set of d + 2 points   ✓
Take any d + 2 points

For any d + 2 points x1, …, x(d+1), x(d+2):

more points than dimensions ⟹ we must have

  xj = Σ_{i≠j} ai xi

where not all the ai's are zeros
So?

  xj = Σ_{i≠j} ai xi

Consider the following dichotomy:

  xi's with non-zero ai get yi = sign(ai), and xj gets yj = −1

No perceptron can implement such a dichotomy!
Why?

  xj = Σ_{i≠j} ai xi   ⟹   wᵀxj = Σ_{i≠j} ai wᵀxi

If yi = sign(wᵀxi) = sign(ai), then ai wᵀxi > 0

This forces

  wᵀxj = Σ_{i≠j} ai wᵀxi > 0

Therefore, yj = sign(wᵀxj) = +1
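A concrete instance of this argument (a sketch under assumed choices: d = 2, the four points from the shattering construction, and a linear-program feasibility test standing in for the analytic proof). With x1 = (1,0,0), x2 = (1,1,0), x3 = (1,0,1), x4 = (1,1,1) we get x4 = −x1 + x2 + x3, so the adversarial dichotomy is y = (−1, +1, +1, −1) — the XOR labeling — and no w realizes it:

```python
import numpy as np
from scipy.optimize import linprog

X = np.array([[1, 0, 0],
              [1, 1, 0],
              [1, 0, 1],
              [1, 1, 1]], dtype=float)

# Coefficients of the linear dependence x4 = a1*x1 + a2*x2 + a3*x3.
a = np.linalg.solve(X[:3].T, X[3])
y = np.append(np.sign(a), -1.0)        # yi = sign(ai) for i < 4, yj = -1
print("a =", a, "dichotomy y =", y)    # a = [-1. 1. 1.], y = [-1. 1. 1. -1.]

# Feasibility LP: is there a w with y_i * (w . x_i) >= 1 for all i?
res = linprog(c=np.zeros(3),
              A_ub=-(y[:, None] * X), b_ub=-np.ones(4),
              bounds=[(None, None)] * 3)
print("separable?", res.success)       # False: no perceptron implements it
```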
Putting it together

We proved dvc ≤ d + 1 and dvc ≥ d + 1:

  dvc = d + 1

What is d + 1 in the perceptron? It is the number of parameters w0, w1, …, wd
Outline

• The definition
• VC dimension of perceptrons
• Interpreting the VC dimension
• Generalization bounds
1. Degrees of freedom

Parameters create degrees of freedom

  # of parameters: analog degrees of freedom
  dvc: equivalent 'binary' degrees of freedom
The usual suspects

Positive rays (dvc = 1):

  [Figure: points x1, x2, x3, …, xN on a line; h(x) = −1 to the left of threshold a, h(x) = +1 to the right]

Positive intervals (dvc = 2):

  [Figure: points x1, x2, x3, …, xN on a line; h(x) = +1 inside the interval, h(x) = −1 outside]
Not just parameters

Parameters may not contribute degrees of freedom:

  [Figure: a cascade of linear models chained from input x to output y; the extra parameters are redundant]

dvc measures the effective number of parameters
2. Number of data points needed

Two small quantities in the VC inequality:

  P[ |Ein(g) − Eout(g)| > ε ] ≤ 4 mH(2N) e^{−(1/8)ε²N}        (right-hand side: δ)

If we want a certain ε and δ, how does N depend on dvc?

Let us look at N^d e^{−N}
N^d e^{−N}

Fix N^d e^{−N} = small value. How does N change with d?

  [Plot: N^30 e^{−N} versus N, vertical axis on a log scale]

Rule of thumb:

  N ≥ 10 dvc
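A small sketch of this back-of-the-envelope calculation (assumptions mine: the "small value" is fixed at 1e−5 and the crossing point is found with scipy's brentq); the required N grows roughly in proportion to d, in the spirit of the N ≥ 10 dvc rule of thumb:

```python
from math import log
from scipy.optimize import brentq

SMALL = 1e-5                                    # arbitrary "small value"

def required_N(d):
    """Smallest N beyond the hump with N^d e^{-N} = SMALL."""
    f = lambda N: d * log(N) - N - log(SMALL)   # log(N^d e^{-N} / SMALL)
    return brentq(f, d + 1, 10_000)             # N^d e^{-N} decreases for N > d

for d in [5, 10, 20, 40]:
    print(d, round(required_N(d), 1))
```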
Outline

• The definition
• VC dimension of perceptrons
• Interpreting the VC dimension
• Generalization bounds
Rearranging things

Start from the VC inequality:

  P[ |Eout − Ein| > ε ] ≤ 4 mH(2N) e^{−(1/8)ε²N}        (right-hand side: δ)

Get ε in terms of δ:

  δ = 4 mH(2N) e^{−(1/8)ε²N}   ⟹   ε = √( (8/N) ln(4 mH(2N)/δ) ) ≡ Ω

With probability ≥ 1 − δ,  |Eout − Ein| ≤ Ω(N, H, δ)
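A minimal sketch evaluating this error bar numerically (my assumption: mH(2N) is replaced by its binomial-sum upper bound from earlier in the lecture, which only loosens the guarantee):

```python
from math import comb, log, sqrt

def omega(N, dvc, delta):
    """Error bar sqrt((8/N) ln(4 mH(2N)/delta)), bounding mH by the binomial sum."""
    mH_2N = sum(comb(2 * N, i) for i in range(dvc + 1))
    return sqrt(8.0 / N * log(4.0 * mH_2N / delta))

for N in [100, 1_000, 10_000, 100_000]:
    print(N, round(omega(N, dvc=3, delta=0.05), 3))   # the bar shrinks as N grows
```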
Generalization bound

With probability ≥ 1 − δ,  |Eout − Ein| ≤ Ω(N, H, δ)

  ⟹  With probability ≥ 1 − δ,  Eout ≤ Ein + Ω