Bayesian Inference & Neural Networks


Lukasz Krawczyk, 1st March 2017

Homo Apriorius

Homo Pragmaticus

Homo Frequentistus

Homo Sapiens

Homo Bayesianis

Good evening. My name is ... Today I'd like to talk about...

Agenda

About me

The Problem

Bayesian Inference

Hierarchical Models


About me

Data Scientist at Asurion Japan Holdings

Previous: Data Scientist at Abeja Inc.

MSc degree from Jagiellonian University, Poland

Contributor to several ML libraries

Currently I'm working as a ...

PART 1
The Problem

Missing Uncertainty

Making a confident error is the worst thing we can do

With DL models we generally only have point estimates of parameters and predictions

It is hard to make decisions when we're not able to tell whether a DL model is certain about its output or not

Trust and adoption of DL is still low

PART 2
Bayesian Inference

Bayesian Inference

Inference

Posterior

Data

Credible Region

Uncertainty

Better Insights

Prior

Model

Assumptions about data, controlled by the prior

Bayes formula

P(θ_true | D): The posterior, the probability of the model parameters given the data. This is the result we want to compute.

P(D | θ_true): The likelihood, proportional to the likelihood function used in the frequentist approach.

P(θ_true): The model prior, which encodes what we knew about the model before seeing the data D.

P(D): The data probability, which in practice amounts to a normalization term.
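
For reference, the four terms above fit together as follows. The symbol θ_true for the model parameters is an assumption here, since the symbol did not survive on the slide, and the integral form of P(D) is the standard way to write the normalization term.

```latex
P(\theta_{\text{true}} \mid D)
  = \frac{P(D \mid \theta_{\text{true}})\, P(\theta_{\text{true}})}{P(D)},
\qquad
P(D) = \int P(D \mid \theta)\, P(\theta)\, d\theta
```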

There is no presentation about Bayesian inference without Bayes' formula, so just as a reminder
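
As a rough illustration of how prior, likelihood and normalization combine into a posterior and a credible region, here is a minimal sketch using a grid approximation. The coin-bias model, the flat prior and the data (7 successes in 10 trials) are all hypothetical and chosen only to keep the example small.

```python
def grid_posterior(successes, trials, grid_size=1001):
    """Discretized posterior over a success probability theta (flat prior assumed)."""
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    prior = [1.0 for _ in grid]  # P(theta): flat prior (an assumption)
    # P(D | theta), up to a constant factor that cancels in the normalization
    likelihood = [t ** successes * (1 - t) ** (trials - successes) for t in grid]
    unnormalized = [l * p for l, p in zip(likelihood, prior)]
    evidence = sum(unnormalized)  # plays the role of P(D)
    posterior = [u / evidence for u in unnormalized]
    return grid, posterior


def credible_interval(grid, posterior, mass=0.95):
    """Equal-tailed credible interval read off the discretized posterior."""
    tail = (1.0 - mass) / 2.0
    cumulative = 0.0
    lower = upper = None
    for theta, p in zip(grid, posterior):
        cumulative += p
        if lower is None and cumulative >= tail:
            lower = theta
        if upper is None and cumulative >= 1.0 - tail:
            upper = theta
            break
    return lower, upper


# Hypothetical data: 7 successes in 10 trials.
grid, posterior = grid_posterior(successes=7, trials=10)
posterior_mean = sum(t * p for t, p in zip(grid, posterior))
low, high = credible_interval(grid, posterior)
print(f"posterior mean = {posterior_mean:.3f}")
print(f"95% credible interval = [{low:.3f}, {high:.3f}]")
```

With a flat prior this posterior is a Beta(8, 4) distribution, so the grid result can be checked against the analytic mean of 8/12.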

Bayesian Inference

Bayesian Inference

General purpose framework

Generative models

Clarity of FS + Power of ML

White-box modelling

Black-box fitting (NUTS, ADVI)

Uncertainty

Intuitive insights

Learning from very small datasets

Probabilistic Programming

Automatic Differentiation Variational Inference (ADVI)

No-U-Turn Sampler (NUTS)
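
To give a feel for what "white-box modelling, black-box fitting" means, here is a minimal sketch in plain Python. It uses a random-walk Metropolis sampler as a deliberately simpler stand-in: it is not NUTS or ADVI, which a probabilistic programming library would provide. The model (Normal likelihood with known sigma, wide Normal prior on the mean) and the data are hypothetical.

```python
import math
import random


def log_posterior(mu, data, sigma=1.0, prior_mean=0.0, prior_sd=10.0):
    """White-box part: unnormalized log posterior of the mean mu."""
    log_prior = -0.5 * ((mu - prior_mean) / prior_sd) ** 2
    log_likelihood = sum(-0.5 * ((x - mu) / sigma) ** 2 for x in data)
    return log_prior + log_likelihood


def metropolis(data, n_samples=5000, step=0.5, seed=0):
    """Black-box part: the sampler only ever calls log_posterior."""
    rng = random.Random(seed)
    mu = 0.0
    current = log_posterior(mu, data)
    samples = []
    for _ in range(n_samples):
        proposal = mu + rng.gauss(0.0, step)
        candidate = log_posterior(proposal, data)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, candidate - current)):
            mu, current = proposal, candidate
        samples.append(mu)
    return samples


data = [4.8, 5.1, 5.3, 4.9, 5.4, 5.0, 4.7, 5.2]  # hypothetical observations
samples = metropolis(data)[1000:]                 # drop burn-in draws
sampled_mean = sum(samples) / len(samples)

# Analytic posterior mean for this conjugate model (sigma=1, prior N(0, 10^2)).
exact_mean = sum(data) / (len(data) + 1.0 / 10.0 ** 2)
print(f"sampled mean = {sampled_mean:.3f}, exact mean = {exact_mean:.3f}")
```

The model definition lives entirely in `log_posterior`. Swapping in a different sampler would not require touching it, which is the separation the slide points at.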

Bayesian Inference

Bayesian Optimization (GP)

Hierarchical models (badass models)

Bonus points

Robust in high dimensions

Minibatches

Knowledge transfer

Bayesian Inference

Bayesian Inference

Very easy way to cook your laptop

PART 3
Hierarchical Models

Hierarchical Models - parameter pooling

Pooled

Unpooled

Partial-pooling

More accurate fitting

Not enough data

Generalization

Small datasets

Missing variations among groups

Example - call duration model

Each advisor has his/her own distribution

The overall call center distribution is controlled by a hyperparameter
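
The difference between pooled, unpooled and partially pooled estimates for the call duration example can be sketched as below. This is a simplified plug-in version, not a full hierarchical model: the within-advisor spread `sigma` and between-advisor spread `tau` are fixed by hand, and the hyperparameter for the call center mean is replaced by the grand mean, whereas a full Bayesian model would infer all three. Advisor names and durations are hypothetical.

```python
from statistics import mean


def partial_pooling(groups, sigma=2.0, tau=1.0):
    """Normal-Normal shrinkage of per-group means toward the overall mean.

    sigma: assumed within-group standard deviation (minutes)
    tau:   assumed between-group standard deviation (minutes)
    """
    all_values = [v for values in groups.values() for v in values]
    pooled = mean(all_values)  # one estimate for everyone (fully pooled)
    estimates = {}
    for name, values in groups.items():
        unpooled = mean(values)  # estimate from this group alone
        data_precision = len(values) / sigma ** 2
        prior_precision = 1.0 / tau ** 2
        # Precision-weighted average: few calls -> stronger pull to pooled mean.
        partial = (data_precision * unpooled + prior_precision * pooled) / (
            data_precision + prior_precision
        )
        estimates[name] = (unpooled, partial)
    return pooled, estimates


# Hypothetical call durations in minutes for three advisors.
calls = {
    "advisor_a": [5.1, 6.0, 5.5, 5.8, 6.2, 5.4, 5.9, 6.1],
    "advisor_b": [9.5, 10.2],
    "advisor_c": [3.0],
}

pooled, estimates = partial_pooling(calls)
print(f"pooled mean = {pooled:.2f}")
for name, (unpooled, partial) in estimates.items():
    print(f"{name}: unpooled = {unpooled:.2f}, partial = {partial:.2f}")
```

An advisor with a single call is pulled most of the way toward the call center mean, while an advisor with many calls keeps an estimate close to their own average.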


Hierarchical Models - benefits

Modelling is very easy and intuitive

Natural hierarchical structure of observational data

Variation among individual groups

Knowledge transfer between groups

PART 4

Synergy

Replace weights with probability distributions
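
A minimal sketch of what "replace weights with probability distributions" can look like at prediction time: each weight is described by a mean and a standard deviation, every forward pass draws a fresh set of weights, and the spread of the outputs serves as an uncertainty estimate. The tiny architecture and the (mean, sd) values below are hypothetical and not trained; in practice they would come from fitting the model.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def sample_weights(weight_dists, rng):
    """Draw one concrete weight from each (mean, sd) pair."""
    return [rng.gauss(m, s) for m, s in weight_dists]


def forward(x, hidden_w, output_w):
    """2 inputs -> 2 tanh hidden units -> 1 sigmoid output (biases omitted)."""
    h1 = math.tanh(hidden_w[0] * x[0] + hidden_w[1] * x[1])
    h2 = math.tanh(hidden_w[2] * x[0] + hidden_w[3] * x[1])
    return sigmoid(output_w[0] * h1 + output_w[1] * h2)


def predict(x, hidden_dists, output_dists, n_draws=2000, seed=0):
    """Return mean and standard deviation of the output over weight draws."""
    rng = random.Random(seed)
    draws = [
        forward(x, sample_weights(hidden_dists, rng), sample_weights(output_dists, rng))
        for _ in range(n_draws)
    ]
    avg = sum(draws) / len(draws)
    sd = math.sqrt(sum((d - avg) ** 2 for d in draws) / len(draws))
    return avg, sd


# Hypothetical weight distributions as (mean, sd) pairs, not fitted to data.
hidden_dists = [(1.0, 0.3), (-2.0, 0.3), (0.5, 0.3), (1.5, 0.3)]
output_dists = [(2.0, 0.5), (-1.0, 0.5)]

for x in [(0.1, 1.0), (0.1, -1.3)]:
    avg, sd = predict(x, hidden_dists, output_dists)
    print(f"x = {x}: mean output = {avg:.3f}, spread = {sd:.3f}")
```

Setting every sd to zero recovers an ordinary network with point-estimate weights and a single output per input.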

Example - standard NN

x1    x2    y
0.1   1.0   0
0.1   -1.3  1

2 hidden layers

sigmoid

tanh

Data

Backpropagation

Example - NN with Bayesian Backpropagation

n=2

Bayesian Backpropagation

2 hidden layers

Data

x1    x2    y
0.1   1.0   [0,1,...]
0.1   -1.3  [1,1,...]

Results

Uncertainty

Standard NN

NN with Bayesian Backpropagation

Synergy - going deeper

Bayesian Hierarchical Model

Regularization: weight regularization similar to L2

Knowledge transfer
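
The remark that placing distributions on the weights acts like L2 regularization can be made concrete with a short derivation. It assumes a zero-mean Gaussian prior with a shared standard deviation σ_w on every weight. That choice is an assumption for illustration, since the equations on these slides were not recoverable.

```latex
w_i \sim \mathcal{N}(0, \sigma_w^2)
\;\;\Longrightarrow\;\;
-\log P(w \mid D)
  = -\log P(D \mid w) + \frac{1}{2\sigma_w^2} \sum_i w_i^2 + \text{const}
```

Maximizing the posterior over w is therefore the same as minimizing the usual data loss plus an L2 penalty with strength λ = 1 / (2σ_w²). A narrower prior means stronger regularization.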

Why is this important?

Scientific perspective

NN models with small datasets

Complex hierarchical neural networks (Bayesian CNN)

Minibatches

Knowledge transfer

Business perspective

Clear and intuitive models

Uncertainty in Finance & Insurance is extremely important

Better trust and adoption of Neural Network-based models

Thank you!
