Deep Learning and Statistics: Connections
Padhraic Smyth
Chancellor's Professor, Departments of Computer Science and Statistics
University of California, Irvine
[email protected]
Padhraic Smyth: Monash University, July 2019
AI Research 10 to 20 Years Ago
• Logic and Automated Reasoning
• Knowledge Representation
• Machine Learning
• Natural Language Processing
• Speech Recognition
• Computer Vision
• Game Playing
• Search Algorithms
• Robotics
AI Research in 2019
• Logic and Automated Reasoning
• Knowledge Representation
• Natural Language Processing
• Speech Recognition
• Computer Vision
• Game Playing
• Search Algorithms
• Robotics
• Deep Machine Learning
ImageNet Large Scale Visual Recognition Challenge (Russakovsky et al., 2015)
Figure from Kevin Murphy, Google, 2016
Deep neural networks
Deep Networks for Detecting Skin Cancer
From Esteva et al., Nature, 2017
Microsoft/IBM Benchmarks for Speech Recognition
Source: https://www.economist.com/node/21710907/sites/all/modules/custom/ec_essay
2016 IEEE Conference on Acoustics, Speech, and Signal Processing
From Kodish-Wachs et al., AMIA Symposium, 2018
Pedestrian Detection: Algorithms and Humans
Algorithms vs. Human Annotators
From Zhang et al., CVPR 2016
A Perspective on Deep Learning
• Deep learning (DL) research:
– High-visibility successes in vision, speech, text, game-playing
– Funding agencies, companies, students, and the public are in awe of deep learning
– Companies are driving a lot of the interest
– Highly empirical; little guidance from theory
– Few links (to date) to statistics or statistical thinking
• Academic research can play a key role
– Computer science, statistics, mathematics, etc.
– Provide guidance: where does DL work well? And not so well?
• Objective empirical analyses
• Development of principles and theory
– Provide balance to the "hype"
Outline of Today's Talk
• Key ideas in deep learning
• Links to statistical thinking
• Limitations of current deep learning
• Opportunities for new ideas and directions
Predictive Modeling
f = black box: inputs x → prediction of target y; parameters θ
The goal is to learn a model from training data to predict y values.
Machine learning: emphasis on predictions of y. Statistics: emphasis on models and parameters.
Training Data: D = {(x_i, y_i)}, i = 1, ..., N
Model: y_i ≈ f(x_i; θ)
– x_i: d-dimensional input vector
– y_i: target value
– f: functional form of the model
– θ: p-dimensional parameter vector (unknown)
Loss: ℓ(y_i, f(x_i; θ)), comparing the ideal target y_i with the model's prediction f(x_i; θ)
The Three Components of Predictive Modeling
1. Prediction Model f: What functional form should we choose for f?
2. Loss Function: How do we compare f's predictions to the true y?
3. Optimization: Given f and a loss function, how can we learn f's parameters?
Examples of Prediction Models

Linear Regression:
f(x; θ) = θ0 + θ1 x1 + θ2 x2 + ... + θd xd = Σ_{j=0}^{d} θj xj = θᵀx

Logistic Regression:
f(x; θ) = P(y = 1 | x; θ) = 1 / (1 + e^(−z)),  z = θᵀx
[Plot: the logistic function 1/(1 + e^(−z)) as a function of z]
Logistic Regression as a Simple Neural Network
Inputs x1, x2, x3, plus a constant +1 (bias) input. Each "edge" in the network has an associated weight or parameter θj.
f(x; θ) = P(y = 1 | x; θ) = 1 / (1 + e^(−z)),  z = θᵀx
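Viewed this way, logistic regression is a one-unit network and fits in a few lines of code. A minimal sketch (the weights θ below are made-up values for a 3-input network, not from the slides):

```python
import math

def sigmoid(z):
    # logistic activation: maps any real z into (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

def logistic_predict(theta, x):
    # z = theta^T x, with theta[0] acting on the constant "+1" bias input
    z = theta[0] + sum(t * xi for t, xi in zip(theta[1:], x))
    return sigmoid(z)

# Hypothetical weights: [bias, w1, w2, w3]
theta = [0.5, 1.0, -2.0, 0.25]
p = logistic_predict(theta, [1.0, 0.5, 2.0])  # P(y = 1 | x), here sigmoid(1.0)
```

The bias weight θ0 plays the role of the constant "+1" input in the network diagram.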
A Neural Network with One Hidden Layer (from the 1990s)
Inputs x1, x2, x3, +1 → Hidden Layer → Output f(x; θ)
Here the model learns 3 different logistic functions, each one a "hidden unit", and then combines the outputs of the 3 to make a prediction. More complex than the logistic function, with many more parameters.
Deep Learning: Models with More Hidden Layers
Use this idea to recursively build "deep models" with multiple hidden layers:
Inputs x1, x2, x3, +1 → Hidden Layer 1 → Hidden Layer 2 → Output f(x; θ)
Very flexible, highly non-linear functions. Can have different types of non-linearities, skip layers, etc.
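Stacking such layers gives the forward pass of a deep network. A sketch with logistic units throughout (all weights below are hypothetical; real networks use other non-linearities too):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def layer(weights, biases, inputs):
    # one fully connected layer: each unit applies a logistic GLM to the inputs
    return [sigmoid(b + sum(w * x for w, x in zip(ws, inputs)))
            for ws, b in zip(weights, biases)]

def forward(x, layers):
    # layers: list of (weight_matrix, bias_vector) pairs, applied in order
    h = x
    for W, b in layers:
        h = layer(W, b, h)
    return h

# Hypothetical weights: 3 inputs -> 3 hidden -> 2 hidden -> 1 output
net = [
    ([[0.1, -0.2, 0.3], [0.4, 0.0, -0.1], [-0.3, 0.2, 0.1]], [0.0, 0.1, -0.1]),
    ([[0.5, -0.5, 0.2], [0.1, 0.3, -0.4]], [0.0, 0.0]),
    ([[1.0, -1.0]], [0.0]),
]
out = forward([1.0, 2.0, 3.0], net)  # a single output value in (0, 1)
```

Each hidden layer's output becomes the next layer's input, which is exactly the recursive construction on the slide.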
Example of a Network for Digit Classification
Figure from http://parse.ele.tue.nl/
Input pixels, no feature extraction. Each output is an estimate of a class probability P(c = k | x), implemented via a multinomial logistic function. Mathematically the network is just a differentiable function... but a very complicated one.
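The multinomial logistic (softmax) output mentioned above can be sketched in a few lines (the input scores are made-up values):

```python
import math

def softmax(scores):
    # multinomial logistic: turns K real-valued scores into class probabilities
    m = max(scores)                         # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])  # estimates of P(c = k | x) for 3 classes
```

The probabilities sum to one, and a larger score always yields a larger probability.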
Deep network architecture for the GoogLeNet image recognition network: 27 layers, millions of parameters. Pixel inputs → Output.
A Brief History of Neural Networks...
• The Perceptron Era: 1950s and 60s
– Great optimism with perceptrons (linear models)...
– ...until Minsky, 1969: perceptrons have limited representation power
– Hard problems require hidden layers... but there was no training algorithm
• The Backpropagation Era: late 1980s to mid-90s
– Invention of backpropagation: training of models with hidden layers
– Wild enthusiasm (in the US at least)... conferences, funding, etc.
– Mid-1990s: enthusiasm dies out; training deep NNs is hard
• The Deep Learning Era: 2010-present
– Third wave of neural network enthusiasm
– What happened since the mid-90s?
• Now practical to train deep networks
• Much larger datasets and greater computational power
• Fast optimization techniques + other good engineering tricks
Figure adapted from Efron and Hastie, Computer Age Statistical Inference, 2016: a deep network viewed as feature extraction followed by a logistic model.
Rectified Linear Unit (ReLU) activations have connections with linear splines (e.g., Eckle and Schmidt-Hieber, 2018).
Machine Learning before Deep Models
Figure from Marc'Aurelio Ranzato
Deep Convolutional Network
Figure from Peemen et al., 2012
Key point: end-to-end differentiability allows features (convolutional filters) to be learned, removing the need for hand-crafted feature extraction. (Dense word embeddings play the same role for text.)
Convolutional Filters for Image Data
Figure from Marc'Aurelio Ranzato
Key idea: learn such filters in a discriminative fashion.
Examples of Learned Spatial Filters in Pixel Space
Generalized Linear Models (e.g., logistic)
Recursive GLMs
Define a latent feature via a GLM, then define a GLM on that latent feature.
[Mohamed, 2015; Tran et al., 2018]
Building Neural Nets from Recursive GLMs
In neural-network terminology, the pieces of the recursive GLM become the hidden unit / neuron, the hidden layer, the activation function, and the weight matrix.
Deep Neural Networks
... in effect doing regression with learned features or basis functions: the network's layers act as a feature extractor, computing a new representation of the features, which feeds a statistical model at the output.
Deep Neural Network Representations
• It may be useful to view deep networks as trainable feature extractors with statistical models as "back-ends"
• This view encourages the mixing of deterministic DNN representations ("embeddings") with conventional statistical models
• Examples
– Deep survival models
– Neural point process models
– Recurrent network models for time-series
Recurrent Networks and State-Space Models
RNN structure is similar to that of state-space models in statistics, e.g., Kalman filters, hidden Markov models, and so on.
RNN: no distributional assumptions on state variables → more flexibility.
State-space approach: better characterization of uncertainty.
The Three Components of Predictive Modeling
1. Prediction Model f: What functional form should we choose for f?
2. Loss Function: How do we compare f's predictions to y?
3. Optimization: Given f and a loss function, how can we learn f's parameters?
Loss: ℓ(y_i, f(x_i; θ)), comparing the ideal target y_i with the model's prediction f(x_i; θ)

Example: Squared Error  ℓ = (y_i − f(x_i; θ))²

Example: Log Loss  ℓ = log [1 / P(y_i | x_i; θ)]
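The two example losses translate directly into code. A minimal sketch (the target values and probabilities below are illustrative):

```python
import math

def squared_error(y, f_x):
    # l = (y_i - f(x_i; theta))^2
    return (y - f_x) ** 2

def log_loss(y, p):
    # p is the model's estimate of P(y = 1 | x); l = log [1 / P(y_i | x_i)]
    p_of_y = p if y == 1 else 1.0 - p
    return math.log(1.0 / p_of_y)

se = squared_error(3.0, 2.5)  # 0.25
ll = log_loss(1, 0.8)         # log(1 / 0.8)
```

Log loss punishes confident mistakes heavily: log_loss(1, 0.01) is far larger than log_loss(1, 0.8).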
Empirical Risk
L(θ) = Σ_{i=1}^{N} ℓ(y_i, f(x_i; θ))
a sum over the training data points; f is the functional form of the model and θ its p-dimensional parameter vector (unknown).
The focus is on getting point estimates of θ by minimization of risk. For simple models the loss is convex and optimization can be straightforward; for deep network models the loss is non-convex and difficult to optimize.
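The empirical risk sum can be written directly in code. A minimal sketch (the 1-d linear model and data points are illustrative, not from the slides):

```python
def empirical_risk(theta, data, loss, f):
    # L(theta) = sum over the N training points of loss(y_i, f(x_i; theta))
    return sum(loss(y, f(x, theta)) for x, y in data)

# Illustrative 1-d linear model f(x; theta) = theta * x with squared-error loss
data = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.0)]
f = lambda x, theta: theta * x
loss = lambda y, pred: (y - pred) ** 2
risk_at_2 = empirical_risk(2.0, data, loss, f)  # near-perfect fit, small risk
```

Learning then amounts to searching for the θ that makes this sum as small as possible.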
The Three Components of Predictive Modeling
1. Prediction Model f: What functional form should we choose for f?
2. Loss Function: How do we compare f's predictions to y?
3. Optimization: Given f and a loss function, how can we learn f's parameters?
Gradient Descent
θ^(k+1) = θ^(k) − γ ∇L(θ^(k))
– θ^(k+1): updated p-dimensional parameter vector
– θ^(k): current parameter vector
– γ: scalar learning rate (how far we move)
– ∇L: vector gradient (the direction we move)
Simple gradient methods are the "workhorse" of machine learning. Newton (2nd-order) methods are rarely used: they require inversion of the p × p Hessian matrix, O(p³).
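The update rule above is a one-line loop. A minimal sketch on a toy convex loss (the loss L(θ) = (θ − 3)² and the step size are illustrative):

```python
def grad_descent(grad, theta0, gamma, steps):
    # theta^(k+1) = theta^(k) - gamma * grad L(theta^(k))
    theta = theta0
    for _ in range(steps):
        theta = theta - gamma * grad(theta)
    return theta

# Toy convex loss L(theta) = (theta - 3)^2, whose gradient is 2 * (theta - 3);
# the minimizer is theta = 3.
theta_hat = grad_descent(lambda t: 2.0 * (t - 3.0), theta0=0.0, gamma=0.1, steps=100)
```

On a convex loss like this one the iterates contract geometrically toward the minimizer; on the non-convex losses of deep networks the same update only finds a local solution.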
Full Gradient:
∇L(θ) = Σ_{i=1}^{N} ∇L_i(θ)

Stochastic Gradient:
∇L(θ) ≈ (N/m) Σ_{j=1}^{m} ∇L_j(θ)
an approximation of the full gradient from a random sample of m data points (a "mini-batch")

Intuition: for m << N, we can make many fast, noisy updates. Can lead to sublinear convergence for large N.
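The mini-batch approximation is easy to sketch. A toy example under illustrative choices (scalar data, squared-error loss, constant step size; real SGD implementations vary):

```python
import random

def sgd(grad_i, data, theta0, gamma, batch_size, steps, seed=0):
    # Each step approximates the full gradient with a random mini-batch:
    # grad L(theta) ~= (N / m) * sum over the mini-batch of grad L_j(theta)
    rng = random.Random(seed)
    n, theta = len(data), theta0
    for _ in range(steps):
        batch = rng.sample(data, batch_size)
        g = (n / batch_size) * sum(grad_i(theta, y) for y in batch)
        theta = theta - gamma * g
    return theta

# Toy example: per-point loss L_i = (y_i - theta)^2 for scalar data,
# whose empirical-risk minimizer is the sample mean (here 3.0).
data = [1.0, 2.0, 3.0, 4.0, 5.0]
theta_hat = sgd(lambda t, y: 2.0 * (t - y), data, theta0=0.0,
                gamma=0.005, batch_size=2, steps=2000)
```

With a constant step size the iterates hover in a noisy neighborhood of the minimizer, which is the "many fast noisy updates" intuition from the slide; decaying the step size, as in Robbins-Monro, removes the residual noise.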
Stochastic Gradient in 2-d Parameter Space
[Figure: gradient steps vs. stochastic gradient steps in a 2-d parameter space.]
Empirically works very well on large datasets, with some theoretical support: an application of the Robbins-Monro (1951) stochastic approximation method. Useful for statistical model fitting in general (not just for deep learning), e.g., Wang et al., 2015; Chen et al., 2016.
CONNECTIONS TO STATISTICS
The Three Components of Predictive Modeling
Model + Loss Function + Optimization Method
– Model: the functional form of f
– Loss function: how we measure the quality of the model's predictions
– Optimization method: the algorithm that finds the parameters that minimize empirical risk
Deep learning was presented as an optimization problem. Where is statistics lurking?
Empirical Risk Minimization
Empirical loss with regularization:
L(θ) = Σ_{i=1}^{N} ℓ(y_i, f(x_i; θ)) + λ R(θ)
Find the parameters that minimize empirical risk on the training data. This directly corresponds to maximizing likelihood:
– Squared error loss → Gaussian likelihood for regression
– Log loss → binomial/multinomial likelihood for classification
Implication: an implicit conditional independence assumption over the data.
Empirical Risk Minimization with Regularization
L(θ) = Σ_{i=1}^{N} ℓ(y_i, f(x_i; θ)) + λ R(θ)
– λ: strength of regularization
– R(θ): regularization on the parameters
The regularization term can be interpreted as (minus) a log prior:
– R_L2(θ) = Σ_j θ_j²  (Gaussian prior)
– R_L1(θ) = Σ_j |θ_j|  (Laplacian prior)
In addition, DL techniques such as dropout can be interpreted as a form of prior, generalizing to a broad "dropout family" (see Bhadra et al., arXiv 2019, and Nalisnick et al., ICML 2019).
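The two penalties and the regularized risk are a direct translation of the formulas above. A minimal sketch (the 1-parameter model, data, and λ are illustrative):

```python
def l2_penalty(theta):
    # R_L2(theta) = sum_j theta_j^2  (minus log of a Gaussian prior, up to constants)
    return sum(t * t for t in theta)

def l1_penalty(theta):
    # R_L1(theta) = sum_j |theta_j|  (minus log of a Laplacian prior, up to constants)
    return sum(abs(t) for t in theta)

def regularized_risk(theta, data, loss, f, lam, penalty):
    # L(theta) = sum_i loss(y_i, f(x_i; theta)) + lam * R(theta)
    return sum(loss(y, f(x, theta)) for x, y in data) + lam * penalty(theta)

# Illustrative ridge-style risk for a 1-parameter linear model f(x; theta) = theta * x
data = [(1.0, 2.0), (2.0, 4.0)]
risk = regularized_risk([2.0], data, lambda y, p: (y - p) ** 2,
                        lambda x, th: th[0] * x, lam=0.1, penalty=l2_penalty)
```

Here the data fit is perfect, so the entire risk comes from the penalty term, which is exactly the shrinkage pressure the (log) prior exerts on θ.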
Looks like a deterministic problem?
The expected loss is minimized by setting f(x; θ) to E[y|x], at every x.
Conclusion: the optimization problem is really a statistical estimation problem.
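A quick numerical check of the claim that E[y|x] minimizes expected squared error: among constant predictions, the sample mean beats every other constant (the data values are made up):

```python
def avg_squared_error(c, ys):
    # average squared-error loss of the constant prediction f(x) = c
    return sum((y - c) ** 2 for y in ys) / len(ys)

ys = [1.0, 2.0, 6.0, 7.0]
mean = sum(ys) / len(ys)  # 4.0, the empirical estimate of E[y]
errors = {c: avg_squared_error(c, ys) for c in [2.0, 3.0, mean, 5.0, 6.0]}
```

The same argument applied conditionally at each x is why the squared-error-optimal regression function is E[y|x].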
The Bias-Variance Tradeoff
Expected future error = Model Bias² (approximation error) + Model Variance (estimation error) + Intrinsic Uncertainty (lower bound)
Note: the decomposition above is optimistic: it assumes future data comes from the same distribution as the training data.
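The decomposition can be verified by simulation. A minimal sketch (the true mean, noise level, and the constant-model setup are all illustrative assumptions, not from the slides):

```python
import random

def bias_variance_sim(n_train, n_reps, seed=0):
    # True relationship: y = 2 + Gaussian noise; fitted model: the sample mean.
    rng = random.Random(seed)
    true_mean, noise_sd = 2.0, 1.0
    fits = []
    for _ in range(n_reps):
        ys = [true_mean + rng.gauss(0.0, noise_sd) for _ in range(n_train)]
        fits.append(sum(ys) / n_train)        # model fitted on this training set
    avg_fit = sum(fits) / n_reps
    bias_sq = (avg_fit - true_mean) ** 2      # (model bias)^2
    variance = sum((f - avg_fit) ** 2 for f in fits) / n_reps  # model variance
    intrinsic = noise_sd ** 2                 # irreducible uncertainty
    return bias_sq, variance, intrinsic

bias_sq, variance, intrinsic = bias_variance_sim(n_train=10, n_reps=5000)
# variance should be close to noise_sd^2 / n_train = 0.1, and bias_sq close to 0
```

The sample mean is unbiased, so here all of the reducible error is variance; a constrained model would trade some variance for bias.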
Unexpected Bias-Variance Trends with DNNs
From Neal, Mittal, et al., arXiv, 2019
Class Probabilities
For both MSE and log-loss, the optimal prediction at any x is E[y|x]. For K-ary classification, y is a K-dimensional indicator vector, and
E[y_k | x] = 1 · P(y_k = 1 | x) + 0 · P(y_k = 0 | x) = P(y_k = 1 | x)
i.e., the optimal predictor for class k is the probability of that class. So deep networks will produce estimates of class probabilities... in theory, given enough data and assuming no local minima. (Note: this is a property of the loss function, not of deep networks.)
Example of Test-Bed Data: CIFAR Image Classification
• An example of a widely used dataset in deep learning research
– Up to 100 classes
– 50,000 images for training
– 10,000 images for test
• Studies on generalization, optimization, etc., often use this dataset
Deep Networks are often Miscalibrated (CIFAR data)
Predicted as Tiger with P(y|x) = 0.99
Predicted as Television with P(y|x) = 0.99
Network of Depth 5 vs. Network of Depth 110
Figure from Guo et al., ICML 2017
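One standard way to quantify this kind of miscalibration is the expected calibration error (ECE): bin predictions by confidence and compare each bin's average confidence with its accuracy. A minimal sketch (the confidences and labels below are made up, not from the paper):

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    # Bin predictions by confidence; ECE is the weighted average of
    # |accuracy - average confidence| across non-empty bins.
    bins = [[] for _ in range(n_bins)]
    for p, c in zip(confidences, correct):
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, c))
    n = len(confidences)
    ece = 0.0
    for b in bins:
        if not b:
            continue
        avg_conf = sum(p for p, _ in b) / len(b)
        acc = sum(c for _, c in b) / len(b)
        ece += (len(b) / n) * abs(acc - avg_conf)
    return ece

# Calibrated toy model: 0.8-confidence predictions that are right 80% of the time
ece_good = expected_calibration_error([0.8] * 10, [1, 1, 1, 1, 1, 1, 1, 1, 0, 0])
# Overconfident toy model (like the depth-110 network): 0.99 confidence, 50% right
ece_bad = expected_calibration_error([0.99] * 10, [1, 0] * 5)
```

A well-calibrated model has ECE near zero; the overconfident one pays the full gap between its stated confidence and its actual accuracy.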
Expected loss with respect to P(x)... for the training data
[Figure: simulated regression data. Left panel: observed data and the true E[y|x] function. Right panel: the model's prediction for E[y|x] and its 95% confidence intervals, plotted against the true E[y|x] function.]
What will happen when we extrapolate beyond P(x)?
Generalizing from 100m Olympic Winning Times
From Tatem et al., Nature, 2004 (see also the response letters at http://faculty.washington.edu/kenrice/natureletter.pdf)
Figure from Kevin Murphy, Google, 2016
Deep neural networks: how well do these models extrapolate to new types of images?
Accuracy of ImageNet Classifiers on New ImageNet Data
From Recht et al., ICML 2019
A Deep Neural Network for Image Recognition
Images used for Training vs. New Images
From Nguyen, Yosinski, Clune, CVPR 2015
[Figure: a classifier's decision boundary in a 2-d feature space of age vs. monthly income.]
Poor extrapolation for test points like the one shown, far from the training data.
Non-Robustness in Deep Image Classification
Figure from Engstrom et al., ICML 2019
External versus Internal Validation
From Zech et al., PLOS Medicine, 2018
AUCs on test data from hospitals used in model training ("internal") versus test data from a hospital not used in model training ("external")
Bayesian Assessment of Black-Box Models
(New work in the Smyth/Steyvers group at UC Irvine)
• Scenario
– A black-box prediction model (e.g., a neural network) has been trained, its parameters are fixed, and we can only query the model
– We wish to evaluate its performance (accuracy, calibration, precision, etc.) online in a new environment
Results with deep networks on CIFAR image classification: N = 100 queries, N = 500 queries, N = 10,000 queries
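The slides do not give the estimator, but a natural Bayesian treatment of black-box accuracy is a conjugate Beta-Binomial model: with a Beta(1, 1) prior, observing k correct answers in N queries yields a Beta(1 + k, 1 + N − k) posterior over the model's accuracy. A sketch under that assumption:

```python
import math

def accuracy_posterior(n_correct, n_queries, a0=1.0, b0=1.0):
    # Conjugate update: Beta(a0, b0) prior on accuracy, Binomial likelihood.
    a = a0 + n_correct
    b = b0 + (n_queries - n_correct)
    mean = a / (a + b)
    var = (a * b) / ((a + b) ** 2 * (a + b + 1))
    return mean, math.sqrt(var)

# Posterior uncertainty shrinks as the number of queries grows
# (cf. the N = 100 vs N = 10,000 query panels):
m100, s100 = accuracy_posterior(80, 100)
m10k, s10k = accuracy_posterior(8000, 10000)
```

The posterior standard deviation quantifies how much the new environment's accuracy estimate should still be trusted after a given number of queries.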
Bayesian Assessment of Accuracy and Calibration
(Ongoing work in the Smyth/Steyvers group at UC Irvine)
Results on the CIFAR image classification dataset
Bayesian Assessment via Ranking and Active Learning
(Ongoing work in the Smyth/Steyvers group at UC Irvine)
[Figure: Bayesian ranking by predicted class. Most accurate classes: palm tree, wardrobe, motorcycle, sunflower, keyboard. Least accurate: lizard, seal, otter, shrew, boy.]
Bayesian active learning: identifying the class with the least accurate predictions and the class with the least calibrated predictions.
THE OVERFITTING QUESTION
Lack of Overfitting of Deep Networks on CIFAR-10
From Poggio et al., 2018: Theory of Deep Learning III: the non-overfitting puzzle
More parameters than data
Lack of Overfitting: Different Networks, Different Data
From Neyshabur et al., 2018: Towards understanding the role of over-parametrization in generalization of neural networks
CIFAR-10 Data, SVHN Data, MNIST Data
Overfitting in the DL Literature
• Standard bias-variance theory seems not to apply
– DL models can interpolate the data (zero training error) but still generalize well on test data
• Training error (or loss) often tends to be much lower than test error
– This is traditionally an indicator of overfitting... but not here
• Various emerging conjectures and theories
– e.g., minimum-norm interpolators generalize well in the overparametrized regime (see Belkin et al. (2018, 2019), Hastie et al. (2019))
... but this is very much still an open problem
The "Double Descent" Theory
Belkin et al., Reconciling modern machine learning and the bias-variance tradeoff, 2018
"Double Descent" on MNIST Data
Belkin et al., Reconciling modern machine learning and the bias-variance tradeoff, 2018
Recent work from statistics confirms these theories: Hastie et al., arXiv, 2019
CONCLUDING COMMENTS
Cautionary Notes about Deep Learning
• Very large amounts of labeled data needed (for classification problems)
• Extrapolation properties are unpredictable
• Model building and optimization can be complex (significant human effort)
• Interpretability and explanation are difficult
• Reliance on empirical "folk wisdom" rather than principles and theory
Questions Worth Asking for AI Applications
1. Is machine learning an appropriate approach?
2. If so, is deep learning the best approach?
3. How do we build models that generalize well to new situations?
Sculley et al., NIPS 2015 Conference
Concluding Comments
• Deep learning has achieved impressive results in pattern recognition
– Particularly useful with high-dimensional signals (images, speech, text)
• Many foundational ideas are grounded in statistics (including other topics we did not discuss: fairness, adversarial/robust learning, reinforcement learning, ...)
• However, deep learning has blind spots
– e.g., reported empirical accuracies may be optimistic
• As deep machine learning is applied more broadly, we need
– Robust principles and theory to guide model-building
– Objective diagnosis and evaluation methods for practitioners
THANK YOU FOR LISTENING. QUESTIONS?
Additional Reading
• Efron, Bradley, and Trevor Hastie. Computer Age Statistical Inference. Cambridge University Press, 2016. (Chapter 18: Neural Networks and Deep Learning.)
• Goodfellow, Ian, Yoshua Bengio, and Aaron Courville. Deep Learning. Cambridge: MIT Press, 2016.
• Jordan, Michael I., and Tom M. Mitchell. Machine learning: trends, perspectives, and prospects. Science 349.6245 (2015): 255-260.
• Taddy, Matt. The Technological Elements of Artificial Intelligence. No. w24301. National Bureau of Economic Research, 2018.
• Brynjolfsson, Erik, and Tom Mitchell. What can machine learning do? Workforce implications. Science 358.6370 (2017): 1530-1534.
• Breiman, L. (2001). Statistical modeling: the two cultures. Statistical Science, 16(3), 199-231.