
Deep Learning and Statistics: Connections

Padhraic Smyth
Chancellor's Professor, Departments of Computer Science and Statistics
University of California, Irvine
smyth@ics.uci.edu

AI Research 10 to 20 Years Ago

Logic and Automated Reasoning
Knowledge Representation
Machine Learning
Natural Language Processing
Speech Recognition
Computer Vision
Game Playing
Search Algorithms
Robotics

AI Research in 2019

Logic and Automated Reasoning
Knowledge Representation
Natural Language Processing
Speech Recognition
Computer Vision
Game Playing
Search Algorithms
Robotics
Deep Machine Learning

ImageNet Large Scale Visual Recognition Challenge (Russakovsky et al., 2015)

[Figure from Kevin Murphy, Google, 2016: deep neural networks]

Deep Networks for Detecting Skin Cancer

From Esteva et al., Nature, 2017

Microsoft/IBM Benchmarks for Speech Recognition

Source: https://www.economist.com/node/21710907/sites/all/modules/custom/ec_essay

2016 IEEE Conference on Acoustics, Speech, and Signal Processing

From Kodish-Wachs et al., AMIA Symposium, 2018

Pedestrian Detection: Algorithms and Humans

[Figure panels: Algorithms; Human Annotators. From Zhang et al., CVPR 2016]

A Perspective on Deep Learning

• Deep learning (DL) research:
– High-visibility successes in vision, speech, text, game playing
– Funding agencies, companies, students, and the public are in awe of deep learning
– Companies are driving a lot of the interest
– Highly empirical; little guidance from theory
– Few links (to date) to statistics or statistical thinking

• Academic research can play a key role
– Computer science, statistics, mathematics, etc.
– Provide guidance: where does DL work well? And not so well?
• Objective empirical analyses
• Development of principles and theory
– Provide balance to the "hype"

Outline of Today's Talk

• Key ideas in deep learning
• Links to statistical thinking
• Limitations of current deep learning
• Opportunities for new ideas and directions

Predictive Modeling

f = black box: inputs x, parameters θ, prediction of target y

The goal is to learn a model from training data to predict y values.

Machine learning: emphasis on predictions of y
Statistics: emphasis on models and parameters

Training data: $D = \{(x_i, y_i)\}, \; i = 1, \dots, N$

Model: $y_i \approx f(x_i; \theta)$

where $x_i$ is a d-dimensional input vector, $y_i$ is the target value, f is the functional form of the model, and θ is a p-dimensional parameter vector (unknown).

Loss: $\ell\big(y_i, f(x_i; \theta)\big)$, comparing the ideal target $y_i$ with the model's prediction $f(x_i; \theta)$.

The Three Components of Predictive Modeling

1. Prediction Model f: What functional form should we choose for f?

2. Loss Function: How do we compare f's predictions to the true y?

3. Optimization: Given f and a loss function, how can we learn f's parameters?


Examples of Prediction Models

Linear regression:
$f(x; \theta) = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \dots + \theta_d x_d = \sum_{j=0}^{d} \theta_j x_j = \theta^T x$

Logistic regression:
$f(x; \theta) = P(y = 1 \mid x; \theta) = \dfrac{1}{1 + e^{-z}}, \qquad z = \theta^T x$

[Figure: the logistic (sigmoid) curve $1/(1 + e^{-z})$ as a function of z]

Logistic Regression as a Simple Neural Network

[Network diagram: inputs x1, x2, x3 and a constant +1 bias node feeding a single output unit]

Each "edge" in the network has an associated weight or parameter θj.

$f(x; \theta) = P(y = 1 \mid x; \theta) = \dfrac{1}{1 + e^{-z}}, \qquad z = \theta^T x$
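To make the slide concrete, here is a minimal NumPy sketch of logistic regression as a one-unit network. The names (sigmoid, f, theta) and the toy numbers are illustrative, not from the talk; the input vector carries a leading 1 to play the role of the +1 bias node.

```python
import numpy as np

def sigmoid(z):
    # The logistic function: maps a real-valued score into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def f(x, theta):
    # Logistic regression as a single-unit "network":
    # P(y = 1 | x; theta) = sigmoid(theta^T x)
    return sigmoid(np.dot(theta, x))

x = np.array([1.0, 0.5, -1.2, 2.0])      # [bias, x1, x2, x3]
theta = np.array([0.1, -0.4, 0.8, 0.3])  # one weight per edge
print(f(x, theta))                       # a probability in (0, 1)
```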

A Neural Network with One Hidden Layer (from the 1990s)

Here the model learns 3 different logistic functions, each one a "hidden unit", and then combines the outputs of the 3 to make a prediction. More complex than a single logistic function, with many more parameters.

[Network diagram: inputs x1, x2, x3 and a +1 bias node; a hidden layer; an output f(x; θ)]

Deep Learning: Models with More Hidden Layers

Use this idea to recursively build "deep" models with multiple hidden layers. These are very flexible, highly non-linear functions; they can have different types of non-linearities, skip layers, etc.

[Network diagram: inputs x1, x2, x3 and a +1 bias node; hidden layers 1 and 2; an output f(x; θ)]
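As a sketch of that recursion (the shapes, names, and the tanh non-linearity are my choices, not the slides'), each hidden layer is an affine map followed by a non-linearity, and the output is a logistic unit:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, layers):
    # layers: list of (W, b) pairs; all but the last are hidden layers.
    h = x
    for W, b in layers[:-1]:
        h = np.tanh(W @ h + b)          # hidden layer: affine map + non-linearity
    W_out, b_out = layers[-1]
    return sigmoid(W_out @ h + b_out)   # output unit: P(y = 1 | x)

rng = np.random.default_rng(0)
layers = [(rng.normal(size=(4, 3)), np.zeros(4)),   # hidden layer 1
          (rng.normal(size=(4, 4)), np.zeros(4)),   # hidden layer 2
          (rng.normal(size=(1, 4)), np.zeros(1))]   # output layer
print(forward(rng.normal(size=3), layers))
```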

Example of a Network for Digit Classification

Figure from http://parse.ele.tue.nl/

Mathematically the network is just a differentiable function … but a very complicated one.

Each output is an estimate of a class probability P(c = k | x), implemented via a multinomial logistic function. The inputs are raw pixels, with no feature extraction.

Deep network architecture for the GoogLeNet image recognition network: 27 layers, millions of parameters. [Diagram: pixel inputs → output]

A Brief History of Neural Networks …

• The Perceptron Era: 1950s and 60s
– Great optimism with perceptrons (linear models) …
– … until Minsky, 1969: perceptrons have limited representational power
– Hard problems require hidden layers … but there was no training algorithm

• The Backpropagation Era: late 1980s to mid-1990s
– Invention of backpropagation: training of models with hidden layers
– Wild enthusiasm (in the US at least) … conferences, funding, etc.
– Mid-1990s: enthusiasm dies out; training deep NNs is hard

• The Deep Learning Era: 2010–present
– Third wave of neural network enthusiasm
– What happened since the mid-90s?
• Now practical to train deep networks
• Much larger datasets and greater computational power
• Fast optimization techniques + other good engineering tricks

Figure adapted from Efron and Hastie, Computer Age Statistical Inference, 2016

[Annotations on the figure: feature extraction feeding a logistic model; Rectified Linear Unit (ReLU) activations, which connect with linear splines (e.g., Eckle and Schmidt-Hieber, 2018)]

Machine Learning before Deep Models

Figure from Marc'Aurelio Ranzato

Deep Convolutional Network

Figure from Peeman et al., 2012

Key point: end-to-end differentiability allows features (convolutional filters) to be learned, removing the need for hand-crafted feature extraction. (Dense word embeddings play the same role for text.)

Convolutional Filters for Image Data

Figure from Marc'Aurelio Ranzato

Key idea: learn such filters in a discriminative fashion, as sketched below.
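To show what "such filters" are mechanically, here is a deliberately slow sketch of a 2-d "valid" convolution (the function name, sizes, and the hand-built edge filter are illustrative; in a deep network the filter weights would be learned, not hand-designed):

```python
import numpy as np

def conv2d_valid(image, filt):
    # Slide the filter across the image, taking a dot product at each
    # position (no padding, stride 1), to produce a feature map.
    H, W = image.shape
    h, w = filt.shape
    out = np.zeros((H - h + 1, W - w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + h, j:j + w] * filt)
    return out

image = np.random.rand(8, 8)
edge_filter = np.array([[1.0, 0.0, -1.0]] * 3)  # crude vertical-edge detector
print(conv2d_valid(image, edge_filter).shape)   # (6, 6) feature map
```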

Examples of Learned Spatial Filters in Pixel Space

Generalized Linear Models (e.g., logistic)

Recursive GLMs

Define a latent feature via a GLM; then define a GLM on that latent feature. [Mohamed, 2015; Tran et al., 2018]

Building Neural Nets from Recursive GLMs

The GLM construction maps directly onto neural network vocabulary: hidden unit/neuron, hidden layer, activation function, weight matrix.
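Written out (in my notation, consistent with the slides' GLM framing), the recursion is: each layer applies a weight matrix and then an activation function g (the GLM's inverse link), and the output layer is itself a GLM on the last hidden layer:

$$h^{(0)} = x, \qquad h^{(l)} = g\big(W^{(l)} h^{(l-1)} + b^{(l)}\big), \quad l = 1, \dots, L, \qquad f(x; \theta) = \sigma\big(w^{\top} h^{(L)} + b\big)$$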

Deep Neural Networks

… stacking these layers, a deep network is in effect doing regression with learned features or basis functions: a feature extractor (a new representation of the inputs computed by the NN layers) followed by a statistical model.

Deep Neural Network Representations

• It may be useful to view deep networks as trainable feature extractors with statistical models as "back ends"
• This view encourages mixing deterministic DNN representations ("embeddings") with conventional statistical models

• Examples
– Deep survival models
– Neural point process models
– Recurrent network models for time series

Recurrent Networks and State-Space Models

The RNN structure is similar to that of state-space models in statistics, e.g., Kalman filters, hidden Markov models, and so on.

RNN: no distributional assumptions on state variables → more flexibility
State-space approach: better characterization of uncertainty
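To make the analogy visible, a minimal RNN state update (names and shapes are illustrative) has the same recursive form as a state-space model, but the state is a deterministic non-linear function rather than a random variable with a distribution:

```python
import numpy as np

def rnn_step(h_prev, x_t, W, U, b):
    # RNN: h_t = tanh(W h_{t-1} + U x_t + b), a deterministic update.
    # A linear-Gaussian state-space model would instead posit
    # h_t = A h_{t-1} + B x_t + noise, and track a distribution over h_t.
    return np.tanh(W @ h_prev + U @ x_t + b)

rng = np.random.default_rng(1)
W, U, b = rng.normal(size=(4, 4)), rng.normal(size=(4, 2)), np.zeros(4)
h = np.zeros(4)
for x_t in rng.normal(size=(10, 2)):   # a length-10 input sequence
    h = rnn_step(h, x_t, W, U, b)
print(h)                               # final hidden state
```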

The Three Components of Predictive Modeling

1. Prediction Model f: What functional form should we choose for f?

2. Loss Function: How do we compare f's predictions to the true y?

3. Optimization: Given f and a loss function, how can we learn f's parameters?

Loss: $\ell\big(y_i, f(x_i; \theta)\big)$, comparing the ideal target with the model's prediction.

Example: squared error, $\ell = \big(y_i - f(x_i; \theta)\big)^2$

Example: log loss, $\ell = \log \dfrac{1}{P(y_i \mid x_i; \theta)}$
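Both examples are one-liners in code; a sketch with illustrative names, where p is the model's predicted probability P(y = 1 | x) for a binary target:

```python
import numpy as np

def squared_error(y, f_x):
    # (y_i - f(x_i; theta))^2
    return (y - f_x) ** 2

def log_loss(y, p, eps=1e-12):
    # log(1 / P(y_i | x_i; theta)): the negative log of the probability
    # the model assigned to the observed label y in {0, 1}.
    p_observed = p if y == 1 else 1.0 - p
    return -np.log(max(p_observed, eps))

print(squared_error(1.0, 0.8))  # 0.04
print(log_loss(1, 0.8))         # ~0.223
```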

Empirical Loss (Empirical Risk):

$L(\theta) = \sum_{i=1}^{N} \ell\big(y_i, f(x_i; \theta)\big)$

a sum over the training data points, where f is the functional form of the model and θ is the p-dimensional parameter vector (unknown).

The focus is on getting point estimates of θ by minimization of risk. For simple models the loss is convex and optimization can be straightforward; for deep network models the loss is non-convex and difficult to optimize.

The Three Components of Predictive Modeling

1. Prediction Model f: What functional form should we choose for f?

2. Loss Function: How do we compare f's predictions to the true y?

3. Optimization: Given f and a loss function, how can we learn f's parameters?

Gradient Descent

$\theta^{(k+1)} = \theta^{(k)} - \gamma \, \nabla L\big(\theta^{(k)}\big)$

where $\theta^{(k)}$ is the current parameter vector, $\theta^{(k+1)}$ is the updated p-dimensional parameter vector, the scalar learning rate $\gamma$ sets how far we move, and the gradient vector $\nabla L$ sets the direction we move in.

Simple gradient methods are the "workhorse" of machine learning. Newton (2nd-order) methods are rarely used: they require inversion of a p × p Hessian matrix, which is O(p³).
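A bare-bones version of the loop, for a linear model with squared-error loss (the toy data and the learning rate are made up for illustration):

```python
import numpy as np

def full_gradient(theta, X, y):
    # Gradient of L(theta) = sum_i (y_i - theta^T x_i)^2
    return 2.0 * X.T @ (X @ theta - y)

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
theta_true = np.array([1.0, -2.0, 0.5])
y = X @ theta_true + 0.1 * rng.normal(size=100)

theta = np.zeros(3)
gamma = 0.001                                  # scalar learning rate
for _ in range(500):
    theta = theta - gamma * full_gradient(theta, X, y)
print(theta)                                   # approaches theta_true
```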

Full gradient: $\nabla L(\theta) = \sum_{i=1}^{N} \nabla L_i(\theta)$

Stochastic gradient: $\nabla L(\theta) \approx \frac{N}{m} \sum_{j=1}^{m} \nabla L_j(\theta)$

an approximation of the full gradient computed from a random sample of m data points (a "mini-batch").

Intuition: for m << N, we can make many fast, noisy updates. This can lead to sublinear convergence for large N.
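The mini-batch estimator drops into the same loop; a sketch on the same toy linear-regression setup as above (m, the learning rate, and the sampling scheme are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
theta_true = np.array([1.0, -2.0, 0.5])
y = X @ theta_true + 0.1 * rng.normal(size=100)

def minibatch_gradient(theta, X, y, m):
    # Unbiased estimate of the full gradient from a random sample of
    # m points (a "mini-batch"), rescaled by N / m.
    N = X.shape[0]
    idx = rng.choice(N, size=m, replace=False)
    return (N / m) * 2.0 * X[idx].T @ (X[idx] @ theta - y[idx])

theta = np.zeros(3)
for _ in range(2000):                      # many fast, noisy updates
    theta = theta - 0.0005 * minibatch_gradient(theta, X, y, m=10)
print(theta)                               # noisy, but approaches theta_true
```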

Stochastic Gradient in 2-d Parameter Space

[Figure: gradient steps versus stochastic gradient steps on a 2-d loss surface]

Empirically this works very well on large datasets, and has some theoretical support: it is an application of the Robbins-Monro (1951) stochastic approximation method. It is useful for statistical model fitting in general, not just for deep learning; e.g., Wang et al., 2015; Chen et al., 2016.

CONNECTIONS TO STATISTICS

The Three Components of Predictive Modeling

Model + Loss Function + Optimization Method

The model: the functional form of f.
The loss function: how we measure the quality of the model's predictions.
The optimization method: the algorithm that finds the parameters that minimize empirical risk.

Deep learning was presented above as an optimization problem. Where is statistics lurking?

Empirical Risk Minimization

Empirical loss with regularization:

$L(\theta) = \sum_{i=1}^{N} \ell\big(y_i, f(x_i; \theta)\big) + \lambda R(\theta)$

Find the parameters that minimize the empirical risk on the training data. This directly corresponds to maximizing likelihood:

Squared-error loss → Gaussian likelihood, for regression
Log loss → binomial/multinomial likelihood, for classification

Implication: an implicit conditional independence assumption over the data.
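The step behind that correspondence: if the loss is the negative log of a conditional likelihood (and the regularization term is dropped), minimizing the summed loss is maximizing the product of likelihood terms, and writing the joint likelihood as that product is exactly the conditional independence assumption:

$$\arg\min_{\theta} \sum_{i=1}^{N} \log \frac{1}{P(y_i \mid x_i; \theta)} = \arg\max_{\theta} \sum_{i=1}^{N} \log P(y_i \mid x_i; \theta) = \arg\max_{\theta} \prod_{i=1}^{N} P(y_i \mid x_i; \theta)$$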

Empirical Risk Minimization with Regularization

$L(\theta) = \sum_{i=1}^{N} \ell\big(y_i, f(x_i; \theta)\big) + \lambda R(\theta)$

where λ is the strength of regularization and R(θ) is the regularization term on the parameters. The regularization term can be interpreted as (minus) a log prior:

$R_{L_2}(\theta) = \sum_j \theta_j^2$ (Gaussian prior)

$R_{L_1}(\theta) = \sum_j |\theta_j|$ (Laplacian prior)

In addition, DL techniques such as dropout can be interpreted as a form of prior, generalizing to a broad "dropout family" (see Bhadra et al., arXiv 2019, and Nalisnick et al., ICML 2019).
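In code, the regularizer is a one-line addition to the empirical risk; a sketch with illustrative names (lam plays the role of λ):

```python
import numpy as np

def penalized_loss(theta, losses, lam, penalty="l2"):
    # Empirical risk plus lam * R(theta).
    # L2: R = sum theta_j^2   (interpretable as minus a Gaussian log prior)
    # L1: R = sum |theta_j|   (interpretable as minus a Laplacian log prior)
    R = np.sum(theta ** 2) if penalty == "l2" else np.sum(np.abs(theta))
    return np.sum(losses) + lam * R

theta = np.array([0.5, -1.0, 2.0])
losses = np.array([0.3, 0.1, 0.7, 0.2])  # per-example losses l(y_i, f(x_i; theta))
print(penalized_loss(theta, losses, lam=0.1))
```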

Looks like a deterministic problem?

The expected loss is minimized by setting f(x; θ) to E[y | x], at every x.

Conclusion: the optimization problem is really a statistical estimation problem.
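The reasoning step behind that conclusion, for squared error: at any fixed x, the expected loss decomposes around the conditional mean,

$$E\big[(y - f(x))^2 \mid x\big] = E\big[(y - E[y \mid x])^2 \mid x\big] + \big(E[y \mid x] - f(x)\big)^2,$$

and since the first term does not depend on f, the expected loss is minimized by choosing f(x) = E[y | x].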

The Bias-Variance Tradeoff

Expected future error = Model Bias² + Model Variance + Intrinsic Uncertainty
(approximation error + estimation error + lower bound on achievable error)

Note: the decomposition above is optimistic: it assumes future data is drawn from the same distribution as the training data.
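In symbols, for squared-error loss at a fixed x, with $\hat{f}(x)$ the fitted model's prediction (random through the training data) and $\sigma^2$ the intrinsic noise variance:

$$E\big[(y - \hat{f}(x))^2\big] = \underbrace{\big(E[\hat{f}(x)] - E[y \mid x]\big)^2}_{\text{bias}^2 \;(\text{approximation error})} + \underbrace{\mathrm{Var}\big(\hat{f}(x)\big)}_{\text{variance} \;(\text{estimation error})} + \underbrace{\sigma^2}_{\text{intrinsic uncertainty}}$$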

Unexpected Bias-Variance Trends with DNNs

From Neal, Mittal, et al., arXiv, 2019

Class Probabilities

For both MSE and log loss, the optimal prediction at any x is E[y | x]. For K-ary classification, y is a K-dimensional indicator vector, and

$E[y_k \mid x] = 1 \cdot P(y_k = 1 \mid x) + 0 \cdot P(y_k = 0 \mid x) = P(y_k = 1 \mid x)$

i.e., the optimal predictor for class k is the probability of that class. So deep networks will produce estimates of class probabilities … in theory, given enough data and assuming no local minima. (Note: this is a property of the loss function, not of deep networks.)

Example of Test-Bed Data: CIFAR Image Classification

• An example of a widely used dataset in deep learning research
– Up to 100 classes
– 50,000 images for training
– 10,000 images for test

• Studies on generalization, optimization, etc., often use this dataset

Deep Networks are often Miscalibrated (CIFAR data)

Predicted as Tiger with P(y|x) = 0.99
Predicted as Television with P(y|x) = 0.99

[Figure: calibration of a network of depth 5 versus a network of depth 110. From Guo et al., ICML 2017]
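Miscalibration of this kind is usually quantified by binning predictions by confidence and comparing each bin's average confidence to its empirical accuracy; a minimal sketch of such a check (the binning scheme and names are mine, in the spirit of the expected calibration error used by Guo et al., 2017):

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    # Partition predictions into confidence bins; in each bin, compare mean
    # confidence to empirical accuracy; average the gaps, weighted by bin size.
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece, N = 0.0, len(confidences)
    for lo, hi in zip(bins[:-1], bins[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            gap = abs(confidences[in_bin].mean() - correct[in_bin].mean())
            ece += (in_bin.sum() / N) * gap
    return ece

conf = np.array([0.99, 0.95, 0.90, 0.80, 0.99])  # predicted P(y | x)
hit = np.array([1.0, 0.0, 1.0, 1.0, 0.0])        # 1 if the prediction was right
print(expected_calibration_error(conf, hit))     # 0 for a perfectly calibrated model
```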

Expected loss with respect to P(x) … for the training data

[Figure: simulated regression data. Left panel: observed data and the true E[y|x] function. Right panel: the model's prediction for E[y|x], with 95% confidence intervals, against the true E[y|x] function.]

What will happen when we extrapolate beyond P(x)?

Generalizing from 100m Olympic Winning Times

From Tatem et al., Nature, 2004 (see also the response letters at http://faculty.washington.edu/kenrice/natureletter.pdf)

[Figure from Kevin Murphy, Google, 2016: deep neural networks]

How well do these models extrapolate to new types of images?

Accuracy of ImageNet Classifiers on New ImageNet Data

From Recht et al., ICML 2019

A Deep Neural Network for Image Recognition

[Figure panels: images used for training versus new images. From Nguyen, Yosinski, Clune, CVPR 2015]

[Figure: a classifier's decision boundary in a 2-d feature space, age versus monthly income]

Poor extrapolation for test points like this …

Non-Robustness in Deep Image Classification

Figure from Engstrom et al., ICML 2019

External versus Internal Validation

From Zech et al., PLOS Medicine, 2018

AUCs on test data from hospitals used in model training ("internal") versus AUCs on test data from a hospital not used in model training ("external")

Bayesian Assessment of Black-Box Models

• Scenario
– A black-box prediction model (e.g., a neural network) has been trained; its parameters are fixed, and we can only query the model
– We wish to evaluate its performance (accuracy, calibration, precision, etc.) online in a new environment

(New work in the Smyth/Steyvers group at UC Irvine)

Results with deep networks on CIFAR image classification: N = 100 queries, N = 500 queries, N = 10,000 queries
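A minimal version of the accuracy piece of this idea (my own sketch using SciPy, not the group's code): treat each labeled query as a Bernoulli trial and maintain a Beta posterior over the black-box model's unknown accuracy, which narrows as the number of queries N grows:

```python
from scipy import stats

def accuracy_posterior(n_correct, n_queries, a=1.0, b=1.0):
    # Beta(a, b) prior on the model's accuracy; each labeled query is a
    # Bernoulli trial, so the posterior is Beta(a + k, b + (n - k)).
    return stats.beta(a + n_correct, b + n_queries - n_correct)

# Hypothetical query counts at a fixed 92% hit rate, for illustration only
for n, k in [(100, 92), (500, 460), (10_000, 9_200)]:
    post = accuracy_posterior(k, n)
    lo, hi = post.interval(0.95)
    print(f"N={n:>6}: mean {post.mean():.3f}, 95% interval ({lo:.3f}, {hi:.3f})")
```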

Bayesian Assessment of Accuracy and Calibration (ongoing work in the Smyth/Steyvers group at UC Irvine)

Results on the CIFAR image classification dataset

Bayesian Assessment via Ranking and Active Learning (ongoing work in the Smyth/Steyvers group at UC Irvine)

Bayesian ranking by predicted class:
[Figure: most accurate classes (palm tree, wardrobe, motorcycle, sunflower, keyboard) versus least accurate classes (lizard, seal, otter, shrew, boy)]

Bayesian active learning:
[Figure: the class with the least accurate predictions and the class with the least calibrated predictions]

THE OVERFITTING QUESTION

Lack of Overfitting of Deep Networks on CIFAR-10

From Poggio et al., 2018: Theory of Deep Learning III: the non-overfitting puzzle

More parameters than data

Lack of Overfitting: Different Networks, Different Data

From Neyshabur et al., 2018: Towards understanding the role of over-parametrization in generalization of neural networks

[Panels: CIFAR-10 data, SVHN data, MNIST data]

Overfitting in the DL Literature

• Standard bias-variance theory seems not to apply
– DL models can interpolate the data (zero training error) but still generalize well on test data

• Training error (or loss) often tends to be much lower than test error
– This is traditionally an indicator of overfitting … but not here

• Various emerging conjectures and theories
– e.g., minimum-norm interpolators generalize well in the overparametrized regime (see Belkin et al., 2018, 2019; Hastie et al., 2019)

… but this is very much still an open problem

The "Double Descent" Theory

Belkin et al., Reconciling modern machine learning and the bias-variance tradeoff, 2018

"Double Descent" on MNIST Data

Belkin et al., Reconciling modern machine learning and the bias-variance tradeoff, 2018

Recent work from statistics that confirms these theories: Hastie et al., arXiv, 2019

CONCLUDING COMMENTS

Cautionary Notes about Deep Learning

• Very large amounts of labeled data needed (for classification problems)
• Extrapolation properties are unpredictable
• Model building and optimization can be complex (significant human effort)
• Interpretability and explanation are difficult
• Reliance on empirical "folk wisdom" rather than principles and theory

Questions Worth Asking for AI Applications

1. Is machine learning an appropriate approach?
2. If so, is deep learning the best approach?
3. How do we build models that generalize well to new situations?

Sculley et al., NIPS 2015 Conference

Concluding Comments

• Deep learning has achieved impressive results in pattern recognition
– Particularly useful with high-dimensional signals (images, speech, text)

• Many foundational ideas are grounded in statistics (including other topics we did not discuss: fairness, adversarial/robust learning, reinforcement learning, …)

• However, deep learning has blind spots
– e.g., reported empirical accuracies may be optimistic

• As deep machine learning is applied more broadly we need
– Robust principles and theory to guide model building
– Objective diagnosis and evaluation methods for practitioners

THANK YOU FOR LISTENING. QUESTIONS?

Additional Reading

• Efron, Bradley, and Trevor Hastie. Computer Age Statistical Inference. Cambridge University Press, 2016. (Chapter 18: Neural Networks and Deep Learning.)
• Goodfellow, Ian, Yoshua Bengio, and Aaron Courville. Deep Learning. Cambridge: MIT Press, 2016.
• Jordan, Michael I., and Tom M. Mitchell. Machine learning: trends, perspectives, and prospects. Science 349.6245 (2015): 255-260.
• Taddy, Matt. The Technological Elements of Artificial Intelligence. No. w24301. National Bureau of Economic Research, 2018.
• Brynjolfsson, Erik, and Tom Mitchell. What can machine learning do? Workforce implications. Science 358.6370 (2017): 1530-1534.
• Breiman, L. (2001). Statistical modeling: the two cultures. Statistical Science, 16(3), 199-231.
