Steep learning curves Reading: Bishop Ch. 3.0, 3.1


Administrivia

•Reminder:

•Microsoft on campus for recruiting

•Next Mon, Feb 5

•FEC141, 11:00 AM

•All welcome

Viewing and re-viewing

•Last time:

•5 minutes of math: function optimization

•Measuring performance

•Today:

•Cross-validation

•Learning curves

Separation of train & test

•Fundamental principle (1st amendment of ML):

•Don’t evaluate accuracy (performance) of your classifier (learning system) on the same data used to train it!

Holdout data

•Usual to “hold out” a separate set of data for testing; not used to train classifier

•A.k.a., test set, holdout set, evaluation set, etc.

•E.g., for a classifier f,

•acc_train = (1/N_train) Σ_i 1[f(x_i) = y_i], computed over the training data, is training set (or empirical) accuracy

•acc_test = (1/N_test) Σ_j 1[f(x_j) = y_j], computed over the held-out data, is test set (or generalization) accuracy

Gotchas...

•What if you’re unlucky when you split data into train/test?

•E.g., all train data are class A and all test are class B?

•No “red” things show up in training data

•Best answer: stratification

•Try to make sure class (+feature) ratios are same in train/test sets (and same as original data)

•Why does this work?
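A stratified split can be sketched in a few lines: group the indices by class, then take the same fraction of each class for the test set. This is a minimal illustration (the `stratified_split` helper below is hypothetical, not from the lecture code):

```python
import random
from collections import defaultdict

def stratified_split(labels, test_frac=0.2, seed=0):
    """Split indices so each class appears in train/test at
    (roughly) its original ratio. Hypothetical helper."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train_idx, test_idx = [], []
    for y, idxs in by_class.items():
        rng.shuffle(idxs)
        n_test = max(1, round(len(idxs) * test_frac))  # at least one per class
        test_idx += idxs[:n_test]
        train_idx += idxs[n_test:]
    return train_idx, test_idx
```

Because each class is split independently, every class (even a rare one) shows up in both halves, which is exactly what guards against the "no red things in training" failure above.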

Gotchas...

•What if you’re unlucky when you split data into train/test?

•E.g., all train data are class A and all test are class B?

•No “red” things show up in training data

•Almost as good: randomization

•Shuffle data randomly before split

•Why does this work?

Gotchas

•What if the data is small?

•N=50 or N=20 or even N=10

•Can’t do perfect stratification

•Can’t get representative accuracy from any single train/test split

Gotchas

•No good answer

•Common answer: cross-validation

•Shuffle data vectors

•Break into k chunks

•Train on first k-1 chunks

•Test on last 1

•Repeat, with a different chunk held out

•Average all test accuracies together

Gotchas

• In code:

for (i=0; i<k; ++i) {
  [Xtrain,Ytrain,Xtest,Ytest] = splitData(X,Y,N/k,i);
  model[i] = train(Xtrain,Ytrain);
  cvAccs[i] = measureAcc(model[i],Xtest,Ytest);
}
avgAcc = mean(cvAccs);
stdAcc = stddev(cvAccs);
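The pseudocode above can be made runnable. Here is one possible Python version, where `train` and `measure_acc` are user-supplied callables standing in for the pseudocode’s `train()` and `measureAcc()` (a sketch, not the course’s exact implementation):

```python
import random
from statistics import mean, stdev

def k_fold_cv(X, Y, k, train, measure_acc, seed=0):
    """k-fold cross-validation: shuffle, split into k chunks,
    hold each chunk out once, average the test accuracies."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k roughly-equal chunks
    accs = []
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for f in folds[:i] + folds[i + 1:] for j in f]
        model = train([X[j] for j in train_idx], [Y[j] for j in train_idx])
        accs.append(measure_acc(model,
                                [X[j] for j in test_idx],
                                [Y[j] for j in test_idx]))
    return mean(accs), stdev(accs)
```

Note the shuffle before partitioning, which doubles as the randomization defense from the previous slide.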

CV in pix

[Figure: the original data [X;y] is randomly shuffled into [X’;y’], then k-way partitioned into chunks [X1’;Y1’], [X2’;Y2’], ..., [Xk’;Yk’], yielding k train/test sets and k accuracies (e.g., 53.7%, 85.1%, 73.2%)]

But is it really learning?

•Now we know how well our models are performing

•But are they really learning?

•Maybe any classifier would do as well

•E.g., a default classifier (pick the most likely class) or a random classifier

•How can we tell if the model is learning anything?
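The two baselines mentioned above are easy to write down. A sketch (these helper names are my own, not from the lecture):

```python
import random
from collections import Counter

def default_classifier(train_labels):
    """Baseline: always predict the most common class in the training data."""
    most_common = Counter(train_labels).most_common(1)[0][0]
    return lambda x: most_common

def random_classifier(classes, seed=0):
    """Baseline: predict a class uniformly at random."""
    rng = random.Random(seed)
    return lambda x: rng.choice(classes)
```

A trained model is only demonstrably learning if it beats these baselines; on a 90%/10% class split, the default classifier already scores 90%.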

The learning curve

•Train on successively larger fractions of data

•Watch how accuracy (performance) changes

[Figure: accuracy vs. training set size — a rising curve indicates learning; a flat curve indicates a static classifier (no learning); a falling curve indicates anti-learning (forgetting)]
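The procedure above can be sketched directly: train on growing prefixes of the training data and record test accuracy at each size (`train` and `measure_acc` are assumed callables, as in the CV sketch):

```python
def learning_curve(X, Y, test_X, test_Y, train, measure_acc,
                   fracs=(0.1, 0.25, 0.5, 0.75, 1.0)):
    """Train on successively larger fractions of the training data
    and record (training size, test accuracy) at each step."""
    curve = []
    for frac in fracs:
        n = max(1, int(len(X) * frac))
        model = train(X[:n], Y[:n])
        curve.append((n, measure_acc(model, test_X, test_Y)))
    return curve
```

Plotting the resulting pairs gives the rising, flat, or falling shapes described above.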

Measuring variance

•Cross validation helps you get a better estimate of accuracy for small data

•Randomization (shuffling the data) helps guard against poor splits/ordering of the data

•Learning curves help assess learning rate/asymptotic accuracy

•Still one big missing component: variance

•Definition: Variance of a classifier is the fraction of error due to the specific data set it’s trained on

Measuring variance

•Variance tells you how much you expect your classifier/performance to change when you train it on a new (but similar) data set

•E.g., take 5 samplings of a data source; train/test 5 classifiers

•Accuracies: 74.2, 90.3, 58.1, 80.6, 90.3

•Mean accuracy: 78.7%

•Std dev of acc: 13.4%

•Variance is usually a function of both classifier and data source

•High variance classifiers are very susceptible to small changes in data
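The numbers in the example can be checked directly with the standard library; note that the slide’s 13.4% matches the sample (n-1) standard deviation, which is what `statistics.stdev` computes:

```python
from statistics import mean, stdev

# The five accuracies from the example above:
accs = [74.2, 90.3, 58.1, 80.6, 90.3]
avg = mean(accs)   # 78.7
sd = stdev(accs)   # ~13.4 (sample standard deviation)
```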

Putting it all together

•Suppose you want to measure the expected accuracy of your classifier, assess learning rate, and measure variance all at the same time?

for (i=0; i<10; ++i) {                   // variance reps
  shuffle data
  do 10-way CV partition of data
  for each train/test partition {        // xval
    for (pct=0.1; pct<=0.9; pct+=0.1) {  // LC
      subsample pct fraction of training set
      train on subsample, test on test set
    }
  }
  avg across all folds of CV partition
  generate learning curve for this partition
}
get mean and std across all curves
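Following the pseudocode above, one possible runnable version (again with hypothetical `train`/`measure_acc` callables) combines the variance reps, the k-way CV, and the learning-curve subsampling in three nested loops:

```python
import random
from statistics import mean, stdev

def variance_cv_learning_curves(X, Y, train, measure_acc,
                                reps=10, k=10, fracs=None, seed=0):
    """Return {fraction: (mean acc, std acc)} across all reps x folds,
    i.e., a learning curve with variance estimates. A sketch."""
    fracs = fracs or [p / 10 for p in range(1, 10)]      # 0.1 .. 0.9
    curves = {f: [] for f in fracs}
    rng = random.Random(seed)
    for _ in range(reps):                                # variance reps
        idx = list(range(len(X)))
        rng.shuffle(idx)                                 # shuffle data
        folds = [idx[i::k] for i in range(k)]            # k-way partition
        for i in range(k):                               # cross-validation
            test_idx = folds[i]
            train_idx = [j for f in folds[:i] + folds[i + 1:] for j in f]
            for frac in fracs:                           # learning curve
                n = max(1, int(len(train_idx) * frac))   # subsample
                model = train([X[j] for j in train_idx[:n]],
                              [Y[j] for j in train_idx[:n]])
                curves[frac].append(
                    measure_acc(model,
                                [X[j] for j in test_idx],
                                [Y[j] for j in test_idx]))
    return {f: (mean(a), stdev(a)) for f, a in curves.items()}
```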

Putting it all together

[Figure: resulting learning curves on the “hepatitis” data]

5 minutes of math...

•Decision trees make very few assumptions about data

•Don’t know anything about relations between instances, except sets induced by feature splits

•No sense of spatial/topological relations among data

•Often, our data is real, honest-to-Cthulhu, mathematically sound vector data

•As opposed to the informal sense of vector that I have used so far

•Often comes endowed with a natural inner product and norm

5 minutes of math

•Mathematicians like to study the properties of spaces in general

•From linear algebra, you’ve already met the notion of a vector space:

•Definition: a vector space, V, is a set of elements (vectors) plus a scalar field, F, such that the following properties hold:

•Vector addition: for all a, b ∈ V, a + b ∈ V

•Scalar multiplication: for all c ∈ F and a ∈ V, c·a ∈ V

•Linearity; commutativity; associativity; etc.

5 minutes of math

•By themselves, vector spaces are only partially useful

•Gets more useful when you add a norm and an inner product

5 minutes of math

•Definition: a norm, ||·||, is a function of a single vector (∈ V) that returns a scalar (∈ F) such that for all a, b ∈ V and c ∈ F:

• ||a|| ≥ 0

• ||c a|| = |c| ||a||

• ||a+b|| ≤ ||a|| + ||b||

• Intuition: the norm gives you the length of a vector

•A vector space+norm ⇒ Banach space (*)
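The familiar Euclidean norm is one concrete example; the three axioms can be spot-checked numerically (a sanity check, not a proof):

```python
import math
import random

def norm(v):
    """Euclidean norm -- one concrete norm on R^n."""
    return math.sqrt(sum(x * x for x in v))

# Spot-check the three norm axioms on random vectors.
rng = random.Random(0)
for _ in range(100):
    a = [rng.uniform(-10, 10) for _ in range(3)]
    b = [rng.uniform(-10, 10) for _ in range(3)]
    c = rng.uniform(-10, 10)
    assert norm(a) >= 0                                              # non-negativity
    assert math.isclose(norm([c * x for x in a]), abs(c) * norm(a))  # homogeneity
    assert norm([x + y for x, y in zip(a, b)]) <= norm(a) + norm(b) + 1e-9  # triangle
```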

5 minutes of math

•Definition: an inner product, ⟨∙, ∙⟩, is a function of two vectors (∈ V) that returns a scalar (∈ F) such that:

•Symmetry

•Linearity in first variable

•Non-negativity

•Non-degeneracy

•A vector space+inner product ⇒ Hilbert space (*)
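The standard dot product on R^n is one concrete example, and it illustrates a useful fact: every inner product induces a norm via ||v|| = sqrt(⟨v, v⟩). A minimal sketch:

```python
import math

def inner(a, b):
    """Standard dot product -- one concrete inner product on R^n."""
    return sum(x * y for x, y in zip(a, b))

def induced_norm(v):
    """The norm induced by the inner product: ||v|| = sqrt(<v, v>)."""
    return math.sqrt(inner(v, v))
```

For the dot product, symmetry and linearity in the first argument follow directly from the sum, and ⟨v, v⟩ is a sum of squares, giving non-negativity and non-degeneracy.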
