Deep Learning and Statistics: Connections

Padhraic Smyth
Chancellor’s Professor
Departments of Computer Science and Statistics
University of California, Irvine


Padhraic Smyth: Monash University, July 2019

AI Research 10 to 20 Years Ago

• Logic and Automated Reasoning
• Knowledge Representation
• Machine Learning
• Natural Language Processing
• Speech Recognition
• Computer Vision
• Game Playing
• Search Algorithms
• Robotics

AI Research in 2019

• Logic and Automated Reasoning
• Knowledge Representation
• Natural Language Processing
• Speech Recognition
• Computer Vision
• Game Playing
• Search Algorithms
• Robotics
• Deep Machine Learning

ImageNet Large Scale Visual Recognition Challenge (Russakovsky et al, 2015)

Deep neural networks

Figure from Kevin Murphy, Google, 2016

Deep Networks for Detecting Skin Cancer

From Esteva et al, Nature, 2017


Microsoft/IBM Benchmarks for Speech Recognition

Source: https://www.economist.com/node/21710907/sites/all/modules/custom/ec_essay


2016 IEEE Conference on Acoustics, Speech, and Signal Processing

From Kodish-Wachs et al, AMIA Symposium, 2018

Pedestrian Detection: Algorithms and Humans

[Figure comparing algorithms with human annotators]

From Zhang et al, CVPR 2016

A Perspective on Deep Learning

• Deep learning (DL) research:
  – High visibility successes in vision, speech, text, game-playing
  – Funding agencies, companies, students, public are in awe of deep learning
  – Companies are driving a lot of the interest
  – Highly empirical: little guidance from theory
  – Few links (to date) to statistics or statistical thinking

• Academic research can play a key role
  – Computer science, statistics, mathematics, etc
  – Provide guidance: where does DL work well? And not so well?
      • Objective empirical analyses
      • Development of principles and theory
  – Provide balance to the “hype”

Outline of Today’s Talk

• Key ideas in deep learning

• Links to statistical thinking

• Limitations of current deep learning

• Opportunities for new ideas and directions

Predictive Modeling

[Diagram: inputs x enter a black box f, with parameters θ, which outputs a prediction of the target y]

Goal is to learn a model from training data to predict y values

Machine learning: emphasis on predictions of y
Statistics: emphasis on models and parameters

Training Data: $D = \{x_i, y_i\}, \quad i = 1, \ldots, N$
  – $x_i$: d-dimensional input vector
  – $y_i$: target value

Model: $y_i \approx f(x_i; \boldsymbol{\theta})$
  – $f$: functional form of the model
  – $\boldsymbol{\theta}$: p-dimensional parameter vector (unknown)

Loss: $\Delta\big(y_i, f(x_i; \boldsymbol{\theta})\big)$
  – $y_i$: ideal target
  – $f(x_i; \boldsymbol{\theta})$: model’s prediction

The Three Components of Predictive Modeling

1. Prediction Model f: What functional form should we choose for f?

2. Loss Function: How do we compare f’s predictions to true y?

3. Optimization: Given f and a loss function, how can we learn f’s parameters?

Examples of Prediction Models

Linear Regression

$f(x; \boldsymbol{\theta}) = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \ldots + \theta_d x_d = \sum_{j=0}^{d} \theta_j x_j = \boldsymbol{\theta}^T x$ (taking $x_0 = 1$)

Logistic Regression

$f(x; \boldsymbol{\theta}) = P(y = 1 \mid x; \boldsymbol{\theta}) = \frac{1}{1 + e^{-z}}, \quad z = \boldsymbol{\theta}^T x$

[Figure: the logistic function $1/(1 + e^{-z})$ plotted against $z$, for $z$ from -100 to 100]
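As a rough illustration of the two prediction models on this slide, the following Python sketch evaluates both for a single input. The function names and the example numbers are assumptions chosen for illustration, and the intercept is stored as `theta[0]`, which is just one possible convention.

```python
import math

def linear_predict(theta, x):
    """Linear regression: theta[0] is the intercept, theta[1:] pairs with x."""
    return theta[0] + sum(t * xj for t, xj in zip(theta[1:], x))

def logistic_predict(theta, x):
    """Logistic regression: P(y = 1 | x; theta) = 1 / (1 + exp(-z)), z = theta^T x."""
    z = linear_predict(theta, x)
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical parameters and input (d = 2)
theta = [0.5, 2.0, -1.0]
x = [1.0, 3.0]

z = linear_predict(theta, x)    # 0.5 + 2.0*1.0 - 1.0*3.0
p = logistic_predict(theta, x)  # a probability strictly between 0 and 1
print(z, p)
```

The logistic model reuses the linear predictor unchanged and only adds the squashing function, which is the sense in which the next slide treats it as a one-unit neural network.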

Logistic Regression as a Simple Neural Network

[Network diagram: inputs $x_1$, $x_2$, $x_3$ and a constant $+1$ feed a single output unit]

Each “edge” in the network has an associated weight or parameter, $\theta_j$

$f(x; \boldsymbol{\theta}) = P(y = 1 \mid x; \boldsymbol{\theta}) = \frac{1}{1 + e^{-z}}, \quad z = \boldsymbol{\theta}^T x$

A Neural Network with One Hidden Layer (from 1990’s)

Here the model learns 3 different logistic functions, each one a “hidden unit”, and then combines the outputs of the 3 to make a prediction
More complex than logistic function, many more parameters

[Network diagram: inputs $x_1$, $x_2$, $x_3$ and a constant $+1$ feed a hidden layer, which feeds the output $f(x; \boldsymbol{\theta})$]
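The one-hidden-layer network described on this slide can be sketched in a few lines of Python. This is a minimal sketch under assumptions: the weight values are made up, the bias is stored first in each weight vector, and the output unit is taken to be logistic as well.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def hidden_layer_net(x, W, v):
    """One hidden layer of logistic units followed by a logistic output.

    W: one weight vector per hidden unit (bias first, then one weight per input).
    v: output weights (bias first, then one weight per hidden unit).
    """
    x1 = [1.0] + list(x)                      # prepend the constant +1 input
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x1))) for row in W]
    h1 = [1.0] + h                            # constant +1 for the output unit
    return sigmoid(sum(vj * hj for vj, hj in zip(v, h1)))

# Hypothetical weights: 3 hidden units, 3 inputs
W = [[0.1, 0.5, -0.3, 0.2],
     [-0.2, 0.4, 0.1, -0.6],
     [0.0, -0.7, 0.3, 0.5]]
v = [0.2, 1.0, -1.5, 0.8]

p = hidden_layer_net([1.0, 2.0, 3.0], W, v)
n_params = sum(len(row) for row in W) + len(v)   # 3*4 + 4
print(p, n_params)
```

With three inputs, plain logistic regression has 4 parameters while this small network has 16, which gives a concrete sense of the slide's "many more parameters".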

Deep Learning: Models with More Hidden Layers

Use this idea to recursively build “deep models” with multiple hidden layers

Very flexible, highly non-linear functions
Can have different types of non-linearities, skip layers, etc

[Network diagram: inputs $x_1$, $x_2$, $x_3$ and a constant $+1$ feed Hidden Layer 1, then Hidden Layer 2, then the output $f(x; \boldsymbol{\theta})$]

Example of a Network for Digit Classification

Figure from http://parse.ele.tue.nl/

Mathematically the network is just a differentiable function… but a very complicated one

Each output is an estimate of a class probability P(c = k | x), implemented via a multinomial logistic function

Input pixels, no feature extraction
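The multinomial logistic function mentioned on this slide is commonly called the softmax. A minimal Python sketch follows; the score values are hypothetical, and subtracting the maximum score is a standard numerical-stability step rather than something stated in the slides.

```python
import math

def softmax(scores):
    """Map K real-valued scores to K class probabilities that sum to 1."""
    m = max(scores)                              # shift for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])                 # hypothetical output-layer scores
print(probs)
```

For a digit classifier there would be ten scores, one per class, and the largest probability would give the predicted digit.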

Deep network architecture for the GoogLeNet image recognition network
27 layers, millions of parameters

[Architecture diagram running from pixel inputs to output]

A Brief History of Neural Networks…

• The Perceptron Era: 1950s and 60s
  – Great optimism with perceptrons (linear models)…
  – …until Minsky, 1969: perceptrons had limited representation power
  – Hard problems require hidden layers… but there was no training algorithm

• The Backpropagation Era: Late 1980s to mid-90’s
  – Invention of backpropagation: training of models with hidden layers
  – Wild enthusiasm (in the US at least)… conferences, funding, etc
  – Mid 1990’s: enthusiasm dies out: training deep NNs is hard

• The Deep Learning Era: 2010-present
  – 3rd wave of neural network enthusiasm
  – What happened since mid 90’s?
      • Now practical to train deep networks
      • Much larger data sets and greater computational power
      • Fast optimization techniques + other good engineering tricks

[Neural network figure adapted from Efron and Hastie, Computer Age Statistical Inference, 2016, shown over several builds]

The network is annotated in two parts: feature extraction, followed by a logistic model

Rectified Linear Unit (ReLU): connections with linear splines (e.g., Eckle and Schmidt-Hieber, 2018)
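The connection between ReLU units and linear splines noted on these slides can be seen in a small example: a weighted sum of shifted ReLU units is a piecewise-linear function whose kinks sit at the shifts. The sketch below is illustrative only; the knot locations, slopes, and function names are assumptions and are not taken from the cited paper.

```python
def relu(z):
    return max(0.0, z)

def relu_spline(x, knots, slopes, intercept=0.0):
    """f(x) = intercept + sum_k slopes[k] * relu(x - knots[k]).

    Each ReLU term changes the slope of f by slopes[k] at knots[k],
    so f is a linear spline with one knot per hidden unit.
    """
    return intercept + sum(s * relu(x - k) for k, s in zip(knots, slopes))

# Three ReLU units that together form a "tent" function peaking at x = 1
knots = [0.0, 1.0, 2.0]
slopes = [1.0, -2.0, 1.0]

xs = [-1.0, 0.0, 0.5, 1.0, 1.5, 2.0, 3.0]
values = [relu_spline(x, knots, slopes) for x in xs]
print(values)
```

The function is zero to the left of 0, rises linearly to 1 at x = 1, falls back to zero at x = 2, and stays flat afterwards.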

Machine Learning before Deep Models

Figure from Marc’Aurelio Ranzato

Deep Convolutional Network

Figure from Peeman et al, 2012

Key point: end-to-end differentiability allows features (convolutional filters) to be learned, and removes the need for hand-crafted feature extraction (dense word embeddings play the same role for text)

Convolutional Filters for Image Data

Figure from Marc’Aurelio Ranzato

Key idea: Learn such filters in a discriminative fashion

Examples of Learned Spatial Filters in Pixel Space

Generalized Linear Models (e.g., logistic)

Recursive GLMs

Define a latent feature via a GLM:

Define GLM on latent feature:

[Mohamed, 2015; Tran et al, 2018]
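The two steps named on this slide, a latent feature defined by a GLM and then a GLM defined on that feature, can be sketched as follows. The slide's own equations did not survive extraction, so the notation here (weights `w`, bias `b`, a logistic inverse link) is an assumption chosen for illustration, not a reconstruction of the original formulas.

```python
import math

def glm(x, w, b, inverse_link):
    """A generalized linear model: an inverse link applied to a linear predictor."""
    eta = b + sum(wj * xj for wj, xj in zip(w, x))
    return inverse_link(eta)

def logistic(eta):
    return 1.0 / (1.0 + math.exp(-eta))

x = [0.5, -1.0, 2.0]                      # hypothetical input

# Step 1: each latent feature is the output of its own GLM on x
hidden_params = [([1.0, 0.0, -0.5], 0.1),
                 ([0.3, 0.3, 0.3], -0.2)]
h = [glm(x, w, b, logistic) for w, b in hidden_params]

# Step 2: a GLM whose inputs are the latent features
y_hat = glm(h, [1.5, -2.0], 0.0, logistic)
print(h, y_hat)
```

Repeating step 1 on `h` instead of `x` adds another layer, which is the recursion the following slides build on.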

Building Neural Nets from Recursive GLMs

[Diagram, built up over several slides, with the following components labeled: hidden unit / neuron, hidden layer, activation function, weight matrix]

Deep Neural Networks

…in effect doing regression with learned features or basis functions

[Diagram, built up over several slides: the NN layers act as a feature extractor, computing a new representation of the features, which is passed to a statistical model]

Deep Neural Network Representations

• It may be useful to view deep networks as trainable feature extractors with statistical models as “back-ends”
• This view encourages the mixing of deterministic DNN representations (“embeddings”) with conventional statistical models

• Examples
  – Deep survival models
  – Neural point process models
  – Recurrent network models for time-series

Recurrent Networks and State-Space Models

RNN structure is similar to state-space models in statistics, e.g., Kalman filters, hidden Markov models, and so on

RNN: no distributional assumptions on state variables -> more flexibility
State-space approach: better characterization of uncertainty
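To make the comparison concrete, here is a minimal sketch of a recurrent state update with a single scalar hidden state. The tanh nonlinearity, the parameter values, and the function names are assumptions for illustration. A linear-Gaussian state-space model would have a similar recursion, but its state would be a random variable with an explicit noise term, whereas the state below is a deterministic function of the inputs.

```python
import math

def rnn_step(h_prev, x_t, w_h, w_x, b):
    """Deterministic state update: h_t = tanh(w_h * h_{t-1} + w_x * x_t + b)."""
    return math.tanh(w_h * h_prev + w_x * x_t + b)

def run_rnn(xs, w_h=0.5, w_x=1.0, b=0.0, h0=0.0):
    """Run the recursion over an input sequence and return all hidden states."""
    h, states = h0, []
    for x_t in xs:
        h = rnn_step(h, x_t, w_h, w_x, b)
        states.append(h)
    return states

states = run_rnn([1.0, 0.0, -1.0])   # hypothetical input sequence
print(states)
```

Because nothing in the recursion is random, the hidden state carries no uncertainty of its own, which is one way to read the slide's contrast with the state-space approach.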

The Three Components of Predictive Modeling

1. Prediction Model f: What functional form should we choose for f?

2. Loss Function: How do we compare f’s predictions to y?

3. Optimization: Given f and a loss function, how can we learn f’s parameters?

Loss: $\Delta\big(y_i, f(x_i; \boldsymbol{\theta})\big)$
  – $y_i$: ideal target
  – $f(x_i; \boldsymbol{\theta})$: model’s prediction

Example: Squared Error $\Delta = \big(y_i - f(x_i; \boldsymbol{\theta})\big)^2$

Example: Log Loss $\Delta = \log \frac{1}{P(y_i \mid x_i; \boldsymbol{\theta})}$
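The two example losses can be written directly as functions of one target and one prediction. This is a small sketch assuming binary targets for the log loss, with `p` standing for the model's P(y = 1 | x); the function names are illustrative.

```python
import math

def squared_error(y, f):
    """Squared error between a real-valued target y and prediction f."""
    return (y - f) ** 2

def log_loss(y, p):
    """Log loss log(1 / P(y | x)) for a binary target y in {0, 1}.

    p is the model's predicted probability that y = 1.
    """
    p_y = p if y == 1 else 1.0 - p     # probability the model gave to the observed label
    return math.log(1.0 / p_y)

print(squared_error(3.0, 1.0))
print(log_loss(1, 0.9), log_loss(1, 0.1))
```

The log loss is small when the model puts high probability on the observed label and grows without bound as that probability approaches zero.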

Empirical Loss (Empirical Risk):

$L(\boldsymbol{\theta}) = \sum_{i=1}^{N} \Delta\big(y_i, f(x_i; \boldsymbol{\theta})\big)$
  – $\sum_{i=1}^{N}$: sum over training data points
  – $f$: functional form of the model
  – $\boldsymbol{\theta}$: p-dimensional parameter vector (unknown)

Focus is on getting point estimates of θ, by minimization of risk
Simple models: loss is convex, optimization can be straightforward
Deep network models: the loss is non-convex, difficult to optimize

The Three Components of Predictive Modeling

1. Prediction Model f: What functional form should we choose for f?

2. Loss Function: How do we compare f’s predictions to y?

3. Optimization: Given f and a loss function, how can we learn f’s parameters?

Gradient Descent

$\boldsymbol{\theta}^{(k+1)} = \boldsymbol{\theta}^{(k)} - \alpha \, \nabla L(\boldsymbol{\theta}^{(k)})$
  – $\boldsymbol{\theta}^{(k+1)}$: updated p-dimensional parameter vector
  – $\boldsymbol{\theta}^{(k)}$: current parameter vector
  – $\alpha$: scalar learning rate: how far we move
  – $\nabla L$: vector gradient: direction we move

Simple gradient methods are the “workhorse” of machine learning
Newton (2nd order) methods are rarely used: they require inversion of a p x p Hessian matrix, $O(p^3)$
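The update rule on this slide can be sketched as a short loop. The quadratic objective below is a made-up example with a known minimum, and the learning rate and step count are assumptions picked so that the iteration converges.

```python
def grad_descent(grad, theta0, lr, n_steps):
    """Repeat theta <- theta - lr * grad(theta) for a fixed number of steps."""
    theta = list(theta0)
    for _ in range(n_steps):
        g = grad(theta)
        theta = [t - lr * gj for t, gj in zip(theta, g)]
    return theta

# Hypothetical loss L(theta) = (theta_0 - 3)^2 + (theta_1 + 1)^2, minimized at (3, -1)
def grad_L(theta):
    return [2.0 * (theta[0] - 3.0), 2.0 * (theta[1] + 1.0)]

theta_hat = grad_descent(grad_L, [0.0, 0.0], lr=0.1, n_steps=200)
print(theta_hat)
```

Each step costs only one gradient evaluation and a vector update, in contrast to the matrix inversion a Newton step would need.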

Full Gradient:

$\nabla L(\boldsymbol{\theta}) = \sum_{i=1}^{N} \nabla L_i(\boldsymbol{\theta})$

Stochastic Gradient:

$\nabla L(\boldsymbol{\theta}) \approx \frac{N}{m} \sum_{j=1}^{m} \nabla L_j(\boldsymbol{\theta})$
  – Approximation of the full gradient
  – The sum is over a random sample of m data points (“mini-batch”)

Intuition: for m << N, we can make many fast noisy updates
Can lead to sublinear convergence for large N
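A minimal sketch of the mini-batch approximation follows, using a one-parameter linear model with squared-error loss on synthetic data. The data-generating process, learning rate, batch size, and number of steps are all assumptions made for the example.

```python
import random

def full_gradient(theta, data):
    """Gradient of sum_i (y_i - theta * x_i)^2 over all N data points."""
    return sum(-2.0 * x * (y - theta * x) for x, y in data)

def minibatch_gradient(theta, data, m, rng):
    """Estimate of the full gradient from m randomly sampled points, rescaled by N/m."""
    batch = rng.sample(data, m)
    n = len(data)
    return (n / m) * sum(-2.0 * x * (y - theta * x) for x, y in batch)

rng = random.Random(0)
xs = [rng.uniform(-1.0, 1.0) for _ in range(1000)]
data = [(x, 2.0 * x + rng.gauss(0.0, 0.1)) for x in xs]   # true slope is 2.0

theta = 0.0
for k in range(2000):
    theta -= 1e-4 * minibatch_gradient(theta, data, m=10, rng=rng)
print(theta)
```

Each update touches 10 points instead of 1000, so the estimate is noisy, but many such cheap steps bring `theta` close to the true slope.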

Stochastic Gradient in 2d Parameter Space

[Figure comparing gradient steps with stochastic gradient steps]

Empirically works very well on large data sets: some theoretical support
An application of the Robbins-Monro (1951) stochastic approximation method
Useful for statistical model fitting in general (not just for deep learning), e.g., Wang et al, 2015; Chen et al, 2016

CONNECTIONS TO STATISTICS

The Three Components of Predictive Modeling

Model + Loss Function + Optimization Method
  – Model: the functional form of f
  – Loss function: how we measure the quality of the model’s predictions
  – Optimization method: the algorithm that finds the parameters that minimize empirical risk

Deep learning was presented as an optimization problem
Where is statistics lurking?

Empirical Risk Minimization

Empirical Loss with Regularization:

$L(\boldsymbol{\theta}) = \sum_{i=1}^{N} \Delta\big(y_i, f(x_i; \boldsymbol{\theta})\big) + \lambda R(\boldsymbol{\theta})$

Find the parameters that minimize empirical risk on training data
This directly corresponds to maximizing likelihood:

  – Squared error loss -> Gaussian likelihood for regression
  – Log loss -> binomial/multinomial likelihood for classification

Implication: implicit conditional independence assumption over data
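The correspondence between squared error and Gaussian likelihood can be spelled out in a short derivation. This is a sketch under an assumed noise model: the targets are conditionally independent given the inputs, with Gaussian noise of fixed variance $\sigma^2$ (the symbol $\sigma^2$ is introduced here and does not appear in the slides).

```latex
\begin{align*}
y_i \mid x_i &\sim \mathcal{N}\big(f(x_i;\boldsymbol{\theta}),\, \sigma^2\big),
  \qquad i = 1,\ldots,N \\[4pt]
-\log p(y_1,\ldots,y_N \mid x_1,\ldots,x_N;\boldsymbol{\theta})
  &= -\sum_{i=1}^{N} \log p(y_i \mid x_i;\boldsymbol{\theta})
  \qquad \text{(conditional independence)} \\
  &= \frac{1}{2\sigma^2} \sum_{i=1}^{N} \big(y_i - f(x_i;\boldsymbol{\theta})\big)^2
     + \frac{N}{2}\log\big(2\pi\sigma^2\big)
\end{align*}
```

The second term does not depend on $\boldsymbol{\theta}$, so minimizing the sum of squared errors gives the same $\boldsymbol{\theta}$ as maximizing this likelihood. The factorization in the second line is where the conditional independence assumption enters.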

Empirical Risk Minimization with Regularization

Empirical Loss with Regularization:

$L(\boldsymbol{\theta}) = \sum_{i=1}^{N} \Delta\big(y_i, f(x_i; \boldsymbol{\theta})\big) + \lambda R(\boldsymbol{\theta})$
  – $\lambda$: strength of regularization
  – $R(\boldsymbol{\theta})$: regularization on parameters

The regularization term can be interpreted as (minus) a log prior:

  – $R_{L2}(\boldsymbol{\theta}) = \sum_j \theta_j^2$: Gaussian prior
  – $R_{L1}(\boldsymbol{\theta}) = \sum_j |\theta_j|$: Laplacian prior

In addition, DL techniques such as dropout can be interpreted as a form of prior; this generalizes to a broad “dropout family” (see Bhadra et al, ArXiv 2019 and Nalisnick et al, ICML 2019)

Looks like a deterministic problem?

Minimized by setting f(x; θ) to E[y|x], at every x

Conclusion: the optimization problem is really a statistical estimation problem
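The claim that squared-error loss is minimized by predicting E[y|x] can be checked numerically at a single fixed x. The sketch below draws samples of y from a made-up distribution and compares the average squared loss of a few constant predictions; the distribution, sample size, and candidate offsets are assumptions for illustration.

```python
import random

rng = random.Random(1)
# Samples of y at one fixed x, drawn from a hypothetical distribution with mean 5.0
ys = [5.0 + rng.gauss(0.0, 2.0) for _ in range(10000)]

def avg_sq_loss(c):
    """Average squared error when predicting the constant c for every sample."""
    return sum((y - c) ** 2 for y in ys) / len(ys)

mean_y = sum(ys) / len(ys)                       # sample estimate of E[y | x]
candidates = [mean_y + d for d in (-1.0, -0.1, 0.0, 0.1, 1.0)]
best = min(candidates, key=avg_sq_loss)
print(mean_y, best)
```

Among the candidates, the sample mean gives the lowest average loss, so what the optimizer is really doing is estimating a conditional expectation from noisy data.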

The Bias-Variance Tradeoff

Expected future error = Model Bias² + Model Variance + Intrinsic Uncertainty
  – Model Bias²: approximation error
  – Model Variance: estimation error
  – Intrinsic Uncertainty: lower bound

Note: the decomposition above is optimistic: it assumes future data is from the same distribution as the training data

Unexpected Bias-Variance Trends with DNNs

From Neal, Mittal, et al, ArXiv, 2019

Class Probabilities

For both MSE and log-loss the optimal prediction at any x is E[y|x]

For K-ary classification, y is a K-dimensional indicator vector:

$E[y_k \mid x] = 1 \cdot P(y_k = 1 \mid x) + 0 \cdot P(y_k = 0 \mid x) = P(y_k = 1 \mid x)$

i.e., the optimal predictor for class k is the probability of that class
So deep networks will produce estimates of class probabilities… in theory, given enough data and assuming no local minima
(Note: this is a property of the loss function, not deep networks)

Example of Test Bed Data: CIFAR Image Classification

• An example of a widely used data set in deep learning research
  – Up to 100 classes
  – 50,000 images for training
  – 10,000 images for test

• Studies on generalization, optimization, etc, often use this data set

Deep Networks are often Miscalibrated (CIFAR data)

[Figure from Guo et al, ICML 2017, with two panels: Network of Depth 5 and Network of Depth 110]

Example images:
  – Predicted as Tiger with P(y|x) = 0.99
  – Predicted as Television with P(y|x) = 0.99
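One simple way to check calibration, sketched here under assumptions, is to group predictions by their stated confidence and compare each group's average confidence with its actual accuracy. The binning scheme, function name, and toy data are illustrative choices and are not taken from the slides or from the cited paper.

```python
def calibration_table(confidences, correct, n_bins=10):
    """Return (mean confidence, accuracy, count) for each non-empty confidence bin."""
    bins = [[] for _ in range(n_bins)]
    for c, ok in zip(confidences, correct):
        idx = min(int(c * n_bins), n_bins - 1)   # confidence 1.0 goes in the last bin
        bins[idx].append((c, ok))
    table = []
    for b in bins:
        if b:
            mean_conf = sum(c for c, _ in b) / len(b)
            acc = sum(1 for _, ok in b if ok) / len(b)
            table.append((mean_conf, acc, len(b)))
    return table

# Toy predictions: four made with confidence 0.99, of which only three are correct
confidences = [0.99, 0.99, 0.99, 0.99, 0.55, 0.55]
correct = [True, True, True, False, True, False]
table = calibration_table(confidences, correct)
print(table)
```

A well-calibrated model would have accuracy close to mean confidence in every bin; a bin with confidence near 0.99 but noticeably lower accuracy is the kind of overconfidence the slide's examples point to.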

Expected loss with respect to P(x)… for the training data

[Figure with two panels. Left: observed data and the true E[y|x] function (axes: X values, Y values). Right: model prediction for E[y|x], model 95% confidence intervals, and the true E[y|x] function (axes: X values, predicted and true Y values).]

What will happen when we extrapolate beyond P(x)?

Generalizing from 100m Olympic Winning Times

From Tatem et al., Nature 2004 (see also response letters at http://faculty.washington.edu/kenrice/natureletter.pdf)

Deep neural networks

Figure from Kevin Murphy, Google, 2016

How well do these models extrapolate to new types of images?

Accuracy of ImageNet Classifiers on New ImageNet Data

From Recht et al, ICML 2019


A Deep Neural Network for Image Recognition

[Figure panels: Images used for Training; New Images]

From Nguyen, Yosinski, Clune, CVPR 2015


[Figure: MONTHLY INCOME (0 to 14000) against AGE (0 to 90), showing a decision boundary.]

Poor extrapolation for test points like this…


Non-Robustness in Deep Image Classification

Figure from Engstrom et al, ICML 2019


External versus Internal Validation

From Zech et al., PLOS Medicine, 2018

AUCs on test data from hospital not used in model training (“external”)

AUCs on test data from hospitals used in model training (“internal”)
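For readers less familiar with the metric, AUC can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one. The sketch below computes it directly from that definition and applies it to two test sets. The scores and labels are toy numbers made up for illustration; they are not data from Zech et al., and the names `internal_*` and `external_*` are hypothetical.

```python
def auc(scores, labels):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 1/2."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy model scores (made up): one test set from a site seen in training,
# one from a site that was held out entirely.
internal_scores = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]
internal_labels = [1, 1, 1, 0, 0, 0]
external_scores = [0.9, 0.4, 0.6, 0.7, 0.2, 0.5]
external_labels = [1, 1, 1, 0, 0, 0]

auc_internal = auc(internal_scores, internal_labels)
auc_external = auc(external_scores, external_labels)
print(f"internal AUC: {auc_internal:.3f}")
print(f"external AUC: {auc_external:.3f}")
```

The pairwise loop is quadratic in the number of examples, which is fine for a sketch; a rank-based computation would be the usual choice for large test sets.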


Bayesian Assessment of Black Box Models

•  Scenario
–  Black-box prediction (e.g., neural network) has been trained, parameters are fixed, we can only query the model
–  We wish to evaluate its performance (accuracy, calibration, precision, etc) online in a new environment

(New work in Smyth/Steyvers group at UC Irvine)

[Figure panels: N = 100 queries; N = 500 queries; N = 10,000 queries]

Results with deep networks on CIFAR image classification
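The slide does not spell out the model used in this work, so the following is only a generic sketch of the idea: treat the black box's unknown accuracy as a Beta-distributed quantity and update it as labeled queries come in. The uniform Beta(1, 1) prior, the function name `beta_posterior`, and the assumed 90% hit rate are all illustrative assumptions. The point is how the posterior uncertainty shrinks as the number of queries grows from 100 to 500 to 10,000.

```python
import math


def beta_posterior(correct, total, prior_a=1.0, prior_b=1.0):
    """Posterior mean and standard deviation of accuracy under a Beta prior."""
    a = prior_a + correct            # prior pseudo-count plus observed hits
    b = prior_b + (total - correct)  # prior pseudo-count plus observed misses
    mean = a / (a + b)
    variance = (a * b) / ((a + b) ** 2 * (a + b + 1.0))
    return mean, math.sqrt(variance)


results = []
for n_queries in (100, 500, 10000):
    # Hypothetical outcome: the model is right on 90% of the queries.
    n_correct = 9 * n_queries // 10
    mean, sd = beta_posterior(n_correct, n_queries)
    results.append((n_queries, mean, sd))
    print(f"N={n_queries:>6}: posterior mean {mean:.4f}, posterior sd {sd:.4f}")
```

Because the update only needs counts of correct and incorrect predictions, it can be run online as each new labeled query arrives, which matches the query-only setting described above.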


Bayesian Assessment of Accuracy and Calibration
(Ongoing work in Smyth/Steyvers group at UC Irvine)

Results on CIFAR image classification dataset


Bayesian Assessment via Ranking and Active Learning
(Ongoing work in Smyth/Steyvers group at UC Irvine)

[Figure: Bayesian ranking by predicted class (axis from 0.0 to 0.6). Most accurate: palm tree, wardrobe, motorcycle, sunflower, keyboard. Least accurate: lizard, seal, otter, shrew, boy.]

[Figure: Bayesian active learning. Panel titles: Class with Least Accurate Predictions; Class with Least Calibrated Predictions.]


THE OVERFITTING QUESTION


Lack of Overfitting of Deep Networks on CIFAR-10

From Poggio et al, 2018: Theory of deep learning III: the non-overfitting puzzle

More parameters than data


Lack of Overfitting: Different Networks, Different Data

From Neyshabur et al, 2018; Towards understanding the role of overparametrization in generalization of neural networks

[Figure panels: CIFAR-10 Data; SVHN Data; MNIST Data]


Overfitting in the DL Literature

•  Standard bias-variance theory seems not to apply
–  DL models can interpolate the data (zero training error) but still generalize well on test data

•  Training error (or loss) often tends to be much lower than test error
–  This is traditionally an indicator of overfitting… but not here

•  Various emerging conjectures and theories
–  e.g. minimum-norm interpolators generalize well in the overparametrized regime (see Belkin et al (2018, 2019), Hastie et al (2019))

… but very much still an open problem
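To make "minimum-norm interpolator" concrete, here is the standard definition in the simplest case, linear regression with more parameters than data points. The notation is an assumption introduced for this illustration: X is the n × p design matrix with p > n and full row rank, y is the vector of n responses, and β is the coefficient vector. Among the infinitely many coefficient vectors that fit the training data exactly, the interpolator picks the one with the smallest Euclidean norm, which has a closed form:

```latex
\hat{\beta}_{\min}
  = \arg\min_{\beta \in \mathbb{R}^{p}} \|\beta\|_{2}
    \quad \text{subject to} \quad X\beta = y,
\qquad
\hat{\beta}_{\min} = X^{\top} \left( X X^{\top} \right)^{-1} y .
```

Substituting the closed form back gives X β̂ = X Xᵀ (X Xᵀ)⁻¹ y = y, so the training error is exactly zero. The open question raised above is why, and when, such a solution also predicts well on new data.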


The “Double Descent” Theory

Belkin et al., Reconciling modern machine learning and the bias-variance tradeoff, 2018


“Double Descent” on MNIST Data

Belkin et al., Reconciling modern machine learning and the bias-variance tradeoff, 2018

Recent work from statistics that confirms these theories: Hastie et al, ArXiv, 2019


CONCLUDING COMMENTS


Cautionary Notes about Deep Learning

•  Very large amounts of labeled data needed (for classification problems)

•  Extrapolation properties are unpredictable

•  Model building and optimization can be complex (significant human effort)

•  Interpretability and explanation are difficult

•  Reliance on empirical “folk wisdom” rather than principles and theory


Questions worth asking for AI Applications

1.  Is machine learning an appropriate approach?

2.  If so, is deep learning the best approach?

3.  How do we build models that generalize well to new situations?


Sculley et al, NIPS 2015 Conference


Concluding Comments

•  Deep learning has achieved impressive results in pattern recognition
–  Particularly useful with high-dimensional signals (images, speech, text)

•  Many foundational ideas are grounded in statistics (including other topics we did not discuss: fairness, adversarial/robust learning, reinforcement learning, …)

•  However, deep learning has blind spots
–  e.g., reported empirical accuracies may be optimistic

•  As deep machine learning is applied more broadly we need
–  Robust principles and theory to guide model-building
–  Objective diagnosis and evaluation methods for practitioners


THANK YOU FOR LISTENING

QUESTIONS?


Additional Reading

•  Efron, Bradley, and Trevor Hastie. Computer Age Statistical Inference. Cambridge University Press, 2016. (Chapter 18: Neural Networks and Deep Learning).

•  Goodfellow, Ian, Yoshua Bengio, Aaron Courville. Deep Learning. Cambridge: MIT Press, 2016.

•  Jordan, Michael I., and Tom M. Mitchell. Machine learning: Trends, perspectives, and prospects. Science 349.6245 (2015): 255-260.

•  Taddy, Matt. The Technological Elements of Artificial Intelligence. No. w24301. National Bureau of Economic Research, 2018.

•  Brynjolfsson, Erik, and Tom Mitchell. What can machine learning do? Workforce implications. Science 358.6370 (2017): 1530-1534.

•  Breiman, L. (2001). Statistical modeling: The two cultures. Statistical Science, 16(3), 199-231.