
CSC411/2515 Fall 2016

Neural Networks Tutorial

Lluís Castrejón Oct. 2016

Slides adapted from Yujia Li’s tutorial and Prof. Zemel’s lecture notes.

Overfitting

• The training data contains information about the regularities in the mapping from input to output. But it also contains noise:
– The target values may be unreliable.
– There is sampling error: there will be accidental regularities just because of the particular training cases that were chosen.

• When we fit the model, it cannot tell which regularities are real and which are caused by sampling error.
– So it fits both kinds of regularity.
– If the model is very flexible, it can model the sampling error really well. This is a disaster.


Overfitting

[Figure omitted. Picture credit: Chris Bishop, Pattern Recognition and Machine Learning, Ch. 1.1.]

Preventing overfitting

• Use a model that has the right capacity:
– enough to model the true regularities
– not enough to also model the spurious regularities (assuming they are weaker)

• Standard ways to limit the capacity of a neural net:
– Limit the number of hidden units.
– Limit the size of the weights.
– Stop the learning before it has time to overfit.


Limiting the size of the weights

Weight-decay involves adding an extra term to the cost function that penalizes the squared weights.

– Keeps weights small unless they have big error derivatives.

$$C = E + \frac{\lambda}{2}\sum_i w_i^2$$

$$\frac{\partial C}{\partial w_i} = \frac{\partial E}{\partial w_i} + \lambda w_i$$

$$\text{when } \frac{\partial C}{\partial w_i} = 0,\qquad w_i = -\frac{1}{\lambda}\,\frac{\partial E}{\partial w_i}$$
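A minimal numpy sketch of this penalty (an illustration added here, not code from the slides; the function and variable names are hypothetical):

```python
import numpy as np

def cost_with_weight_decay(E, grad_E, w, lam):
    """Add an L2 weight-decay penalty to a base cost E.

    E      : base cost (scalar) on the training set
    grad_E : gradient of E with respect to the weights w
    w      : weight vector
    lam    : weight-decay strength (lambda above)
    """
    C = E + 0.5 * lam * np.sum(w ** 2)   # C = E + (lambda/2) * sum_i w_i^2
    grad_C = grad_E + lam * w            # dC/dw_i = dE/dw_i + lambda * w_i
    return C, grad_C
```

Gradient descent on C then shrinks each weight toward zero at a rate proportional to lambda, unless its error derivative pushes back.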


The effect of weight-decay

• It prevents the network from using weights that it does not need:
– This can often improve generalization a lot.
– It helps to stop it from fitting the sampling error.
– It makes a smoother model in which the output changes more slowly as the input changes.

• But if the network has two very similar inputs, it prefers to put half the weight on each rather than all the weight on one, since the penalty 2(w/2)² = w²/2 is smaller than w² + 0² = w². Would some other form of weight decay behave differently?

[Diagram: a unit with weights w/2 and w/2 vs. a unit with weights w and 0.]


Deciding how much to restrict the capacity

• How do we decide which limit to use and how strong to make the limit?
– If we use the test data, we get an unfair prediction of the error rate we would get on new test data.

– Suppose we compared a set of models that gave random results. The best one on a particular dataset would do better than chance, but it won't do better than chance on another test set.

• So use a separate validation set to do model selection.


Using a validation set

• Divide the total dataset into three subsets:
– Training data is used for learning the parameters of the model.

– Validation data is not used for learning, but is used for deciding what type of model and what amount of regularization works best.

– Test data is used to get a final, unbiased estimate of how well the network works. We expect this estimate to be worse than on the validation data.

• We could then re-divide the total dataset to get another unbiased estimate of the true error rate.
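A minimal sketch of such a three-way split (the fractions and function name are illustrative assumptions, not prescribed by the slides):

```python
import numpy as np

def three_way_split(X, y, frac_train=0.7, frac_val=0.15, seed=0):
    """Shuffle the data, then split into training / validation / test sets."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_train = int(frac_train * len(X))
    n_val = int(frac_val * len(X))
    train = idx[:n_train]                  # used to learn the parameters
    val = idx[n_train:n_train + n_val]     # used for model selection
    test = idx[n_train + n_val:]           # used once, for the final estimate
    return (X[train], y[train]), (X[val], y[val]), (X[test], y[test])
```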


Preventing overfitting by early stopping

• If we have lots of data and a big model, it's very expensive to keep re-training it with different amounts of weight decay.

• It is much cheaper to start with very small weights and let them grow until the performance on the validation set starts getting worse.

• The capacity of the model is limited because the weights have not had time to grow big.
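A sketch of that training loop, assuming hypothetical callables train_step(model) (one epoch of training) and val_error(model) (current validation-set error), plus assumed get_weights/set_weights accessors:

```python
def train_with_early_stopping(model, train_step, val_error,
                              max_epochs=1000, patience=10):
    """Train until the validation error stops improving, then roll back."""
    best_err = float("inf")
    best_weights = model.get_weights()   # assumed accessor, for illustration
    epochs_since_best = 0
    for epoch in range(max_epochs):
        train_step(model)                # one epoch: the weights grow a bit
        err = val_error(model)
        if err < best_err:               # validation performance still improving
            best_err = err
            best_weights = model.get_weights()
            epochs_since_best = 0
        else:
            epochs_since_best += 1
        if epochs_since_best >= patience:
            break                        # validation error has started getting worse
    model.set_weights(best_weights)      # return to the best validation point
    return model
```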


Why early stopping works

• When the weights are very small, every hidden unit is in its linear range.
– So a net with a large layer of hidden units is linear.

– It has no more capacity than a linear net in which the inputs are directly connected to the outputs!

• As the weights grow, the hidden units start using their non-linear ranges, so the capacity grows.

[Diagram omitted: a net mapping inputs to outputs.]
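A quick numerical check of the linear-range claim (an added illustration, using tanh units as an example):

```python
import numpy as np

# For small weights, tanh(w * x) is close to its linear approximation w * x,
# so a net of tanh hidden units with tiny weights computes a linear function.
x = np.linspace(-1.0, 1.0, 5)
for w in [0.01, 0.1, 1.0, 3.0]:
    gap = np.max(np.abs(np.tanh(w * x) - w * x))
    print(f"w = {w:<4}: max |tanh(wx) - wx| = {gap:.5f}")
```

The gap is negligible for w = 0.01 and grows with w, matching the claim that capacity grows as the weights grow.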


LeNet

• Yann LeCun and others developed a really good recognizer for handwritten digits by using backpropagation in a feedforward net with:
– Many hidden layers
– Many pools of replicated units in each layer.
– Averaging the outputs of nearby replicated units.
– A wide net that can cope with several characters at once, even if they overlap.
(A minimal architecture sketch follows below.)

• Demo of LeNet
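The sketch below expresses two of these ideas (replicated units via shared convolution weights, and averaging nearby replicated units via average pooling) as a hypothetical PyTorch module; it is a loose analogue, not the original LeNet, whose details differ:

```python
import torch
import torch.nn as nn

class LeNetSketch(nn.Module):
    """A LeNet-style net for 16x16 gray-level digit images (see next slide)."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 6, kernel_size=5),   # replicated units: shared weights
            nn.Tanh(),
            nn.AvgPool2d(2),                  # average outputs of nearby units
            nn.Conv2d(6, 16, kernel_size=5),
            nn.Tanh(),
            nn.AvgPool2d(2),                  # a 16x16 input is now 16 maps of 1x1
        )
        self.classifier = nn.Linear(16, num_classes)  # 10 digit classes

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

# e.g. LeNetSketch()(torch.randn(1, 1, 16, 16)) gives a (1, 10) score tensor
```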


Recognizing Digits

Hand-written digit recognition network:

– 7291 training examples, 2007 test examples
– Both contain ambiguous and misclassified examples
– Input pre-processed (segmented, normalized)

• 16x16 gray level [-1, 1], 10 outputs


LeNet: Summary

Main ideas:

• Local → global processing
• Retain coarse position info

Main technique: weight sharing – units arranged in feature maps

Connections: 1256 units, 64,660 connections, 9760 free parameters

Results: 0.14% (train), 5.0% (test)

vs. 3-layer net w/ 40 hidden units: 1.6% (train), 8.1% (test)


The 82 errors made by LeNet5

Notice that most of the errors are cases that people find quite easy. The human error rate is probably 20 to 30 errors.


A brute force approach

• LeNet uses knowledge about the invariances to design:
– the network architecture
– or the weight constraints
– or the types of feature

• But it's much simpler to incorporate knowledge of invariances by just creating extra training data:
– for each training image, produce new training data by applying all of the transformations we want to be insensitive to

– Then train a large, dumb net on a fast computer.
– This works surprisingly well (see the sketch below).
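A minimal sketch of this idea using image shifts (a hypothetical example; the slides do not fix the set of transformations, and np.roll wraps pixels around, which is only a harmless approximation when the image borders are blank):

```python
import numpy as np

def augment_with_shifts(images, labels, max_shift=2):
    """Produce extra training data by shifting each image a few pixels.

    images : array of shape (n, height, width)
    labels : array of shape (n,)
    """
    new_images, new_labels = [images], [labels]
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            if dy == 0 and dx == 0:
                continue                   # the original image is already included
            shifted = np.roll(np.roll(images, dy, axis=1), dx, axis=2)
            new_images.append(shifted)     # a shifted digit is the same digit,
            new_labels.append(labels)      # so the label is unchanged
    return np.concatenate(new_images), np.concatenate(new_labels)
```

With max_shift=2 this turns n examples into 25n, which is exactly the point: a large, dumb net can then learn the invariance from the data.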


Making backpropagation work for recognizing digits

• Use the standard viewing transformations, and local deformation fields, to get lots of data.

• Use many, globally connected hidden layers and learn for a very long time.
– This requires a GPU board or a large cluster.

• Use the appropriate error measure for multi-class categorization:
– Cross-entropy, with softmax activation (a sketch follows below).

• This approach can get 35 errors on MNIST!
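A minimal numpy sketch of cross-entropy with softmax activation (the function name is illustrative):

```python
import numpy as np

def softmax_cross_entropy(logits, target):
    """Cross-entropy loss with a softmax output layer.

    logits : shape (num_classes,), unnormalized scores from the net
    target : integer index of the true class
    """
    z = logits - np.max(logits)             # shift for numerical stability
    probs = np.exp(z) / np.sum(np.exp(z))   # softmax: positive, sums to 1
    return -np.log(probs[target])           # penalize low prob. on the true class
```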

Fabricating training data

Good generalization requires lots of training data, including examples from all relevant input regions.

Improve the solution if good data can be constructed.

Example: ALVINN


ALVINN: simulating training examples

On-the-fly training: current video camera image as input, current steering direction as target.

But: it over-trains on the same inputs, and has no experience going off-road.

Method: generate new examples by shifting images.

Replace 10 low-error & 5 random training examples with 15 new ones.

Key: the relation between input and output is known!
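A hypothetical sketch of that buffer update (the shifting of camera images and the recomputation of steering targets are assumed to happen upstream; all names are illustrative):

```python
import numpy as np

def update_training_buffer(buffer_X, buffer_y, errors, new_X, new_y,
                           n_low_error=10, n_random=5):
    """Replace 10 low-error and 5 random buffered examples with 15 new ones.

    errors       : per-example training error for the current buffer
    new_X, new_y : 15 examples made by shifting the camera image, with
                   steering targets recomputed from the known geometry
    """
    low = np.argsort(errors)[:n_low_error]               # the easiest examples
    rest = np.setdiff1d(np.arange(len(buffer_X)), low)
    rand = np.random.choice(rest, size=n_random, replace=False)
    replace = np.concatenate([low, rand])
    buffer_X[replace] = new_X                             # in-place replacement
    buffer_y[replace] = new_y
    return buffer_X, buffer_y
```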


Neural Net Demos

Scene recognition - Places MIT

Digit recognition

Neural Nets Playground

Neural Style Transfer
