
Statistical methods for big data

in life sciences and health with R

Philippe Jacquet

Data Scientist at CADMOS

June 2018

CADMOS

CADMOS: Center for Advanced Modeling Of Science

Free service for all UNIL, EPFL and UNIGE researchers

Help with HPC parallel computing (code optimization) or Big Data analysis (pipeline development), including some computing power at SCITAS (the EPFL cluster)

Contact:
UNIL: philippe.jacquet@unil.ch, etienne.orliac@unil.ch
EPFL: vincent.keller@epfl.ch
UNIGE: jean-luc.falcone@unige.ch

Outline

TODAY

13:30 – 15:00 Lecture on Neural Network

15:00 – 15:30 Coffee break

15:30 – 17:00 Practicals on Neural Network

TOMORROW (Room 321 - Amphipôle)

09:00 – 10:30 Lecture on Decision Tree and Random Forest

10:30 – 11:00 Coffee break

11:00 – 12:30 Practicals on Decision Tree and Random Forest

Outline

If you have any questions during the lecture, you are welcome to ask.

We will analyse the same dataset (iris data)

with all the methods

And present some applications

in health science

The iris dataset

The iris dataset

4 measurements on 50 flowers from each of 3 species

The iris dataset

Sepal length   Sepal width   Petal length   Petal width   Species
5.1            3.5           1.4            0.2           setosa
7.0            3.2           4.7            1.4           versicolor
6.3            3.3           6.0            2.5           virginica

(the first four columns are the input; Species is the output)

The iris dataset

GOAL

Given an input (4 measurements)

predict its output (species)

Machine learning

MACHINE LEARNING

UNSUPERVISED LEARNING
- only input data
- explorative models: clustering, dimensionality reduction

SUPERVISED LEARNING
- input data + output data
- predictive models: classification, regression

Supervised learning

Data splitting:

  train (0-60%)    validation (60-80%)    test (80-100%)
  model training   model selection        model assessment

Model training: estimate the model parameters

Model selection: estimate the performance of different models in order to choose the best one

Model assessment: having chosen a final model, estimate its prediction accuracy on new data
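The 60/20/20 split above can be sketched in plain Python (the course practicals use R; this is only an illustration, assuming we split the 150 iris flowers by shuffled row indices):

```python
import random

random.seed(42)              # fixed seed for reproducibility

n = 150                      # number of iris flowers
indices = list(range(n))
random.shuffle(indices)      # shuffle before splitting to avoid ordering bias

# 60% train, 20% validation, 20% test
train = indices[: int(0.6 * n)]
validation = indices[int(0.6 * n): int(0.8 * n)]
test = indices[int(0.8 * n):]

print(len(train), len(validation), len(test))  # 90 30 30
```

Shuffling matters here because the iris rows are sorted by species; an unshuffled split would put whole species into a single subset.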

Supervised learning

EXAMPLE

Supervised learning

Model 1 = Neural Network

Estimate the model parameters using the training set

Measure the model accuracy based on the validation set

Supervised learning

Model 2 = Decision Tree

Estimate the model parameters using the training set

Measure the model accuracy based on the validation set

Supervised learning

Model 3 = Random Forest

Estimate the model parameters using the training set

Measure the model accuracy based on the validation set

Supervised learning

Model Selection

Select the best model based on its accuracy

Neural Network = Decision Tree

Supervised learning

Model Assessment

Measure the best model's accuracy on the test set

THE END

Neural network for the iris data

How to build a neural network

for the iris data?

Neural network for the iris data

[Diagram: feed-forward network. Input Layer x1-x4 (Sepal Length, Sepal Width, Petal Length, Petal Width) -> Hidden Layer -> Output Layer y1-y3 (Prob. Setosa, Prob. Versicolor, Prob. Virginica).]

Neural network for the iris data

[Diagram: the same network, but with many hidden layers between the input layer x1-x4 and the output layer y1-y3.]

Deep Learning = More than 1 Hidden Layer

Neural network for the iris data

[Diagram: the same network with labelled nodes and weights. Hidden-layer nodes z(1)_1 ... z(1)_5 and z(2)_1 ... z(2)_5, output nodes z(3)_1 ... z(3)_3. A weight matrix W(1) (entries w(1)_11 ... w(1)_54) connects the input layer to the hidden layer, and W(2) (entries w(2)_11 ... w(2)_35) connects the hidden layer to the output layer. An activation function maps z(1) to z(2), and a SoftMax maps z(3) to the output probabilities y1-y3.]

Neural network for the iris data

What is

the mathematical formula

for this neural network?

Neural network for the iris data

[The network diagram, read as a pipeline: X -> W(1) -> Z(1) -> ReLU -> Z(2) -> W(2) -> Z(3) -> SoftMax -> Y]

Building the formula step by step:

Z(1) = W(1) X

Z(2) = ReLU(Z(1)) = ReLU(W(1) X)

Z(3) = W(2) Z(2) = W(2) ReLU(W(1) X)

Y = SoftMax(Z(3)) = SoftMax(W(2) ReLU(W(1) X))

Neural network for the iris data

In summary

Y = f(X) = SoftMax(W(2) ReLU(W(1) X))

Forward propagation

In forward propagation, we go from X to Y by applying a series of functional transformations:

Y = SoftMax(W(2) ReLU(W(1) X))

Questions:

How do we determine the weights W(1) and W(2)?

Why do we need the activation function ReLU?

Why do we use the SoftMax function at the end?
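The forward propagation above can be written out as a minimal plain-Python sketch (the layer sizes 4 -> 5 -> 3 and the ReLU/SoftMax choices come from the slides; the Gaussian random weights here are illustrative placeholders, and `matvec` is a hypothetical helper, not from the course material):

```python
import math
import random

def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]

def relu(v):
    """Element-wise ReLU: max(0, v_i)."""
    return [max(0.0, v_i) for v_i in v]

def softmax(v):
    """Convert arbitrary real values into probabilities summing to 1."""
    exps = [math.exp(v_i) for v_i in v]
    s = sum(exps)
    return [e / s for e in exps]

def forward(W1, W2, x):
    """Y = SoftMax(W(2) ReLU(W(1) X))"""
    return softmax(matvec(W2, relu(matvec(W1, x))))

random.seed(0)
W1 = [[random.gauss(0, 0.5) for _ in range(4)] for _ in range(5)]  # 5x4
W2 = [[random.gauss(0, 0.5) for _ in range(5)] for _ in range(3)]  # 3x5

x = [5.1, 3.5, 1.4, 0.2]   # first flower of the iris table (a setosa)
y = forward(W1, W2, x)     # three class probabilities
print(y)
```

With untrained random weights the three probabilities are arbitrary; training (discussed below) is what makes them meaningful.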

Weight matrices

How do we determine the weight values?

We want to choose the weights that make the best predictions possible. We will use an algorithm to find these weights, and this algorithm needs some initial weight values. How do we choose the initial weight values?

Weight matrices

The naive choice: set all the weights to zero.

Then all the predictions are the same (P = 1/3 for the three species). The weights are updated by the same amount and thus cannot get unequal values: the network cannot learn.
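This is easy to check directly: with all-zero weight matrices, the forward pass Y = SoftMax(W(2) ReLU(W(1) X)) outputs P = 1/3 for every class, whatever the input (a plain-Python sketch with the helper functions written out by hand):

```python
import math

def matvec(W, x):
    return [sum(w * xj for w, xj in zip(row, x)) for row in W]

def relu(v):
    return [max(0.0, vi) for vi in v]

def softmax(v):
    exps = [math.exp(vi) for vi in v]
    s = sum(exps)
    return [e / s for e in exps]

# All-zero weight matrices (5x4 and 3x5, as in the iris network)
W1 = [[0.0] * 4 for _ in range(5)]
W2 = [[0.0] * 5 for _ in range(3)]

x = [5.1, 3.5, 1.4, 0.2]
y = softmax(matvec(W2, relu(matvec(W1, x))))
print(y)  # uniform probabilities 1/3, 1/3, 1/3, regardless of x
```

Since every zero weight sees the same situation, the gradient update is identical for all of them, so they stay equal forever: the symmetry is never broken.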

Weight matrices

A good choice: random initialisation around 0.

We choose the weights according to a normal distribution N(0, σ²). The value of σ² affects the performance (speed and accuracy).

Weight matrices

Two common choices:

σ² = 1 / N_in              (Xavier)

σ² = 2 / (N_in + N_out)    (Glorot & Bengio)

N_in = the number of nodes going into the weight matrix W
N_out = the number of nodes coming out of the weight matrix W

Weight matrices

Probabilistic framework

Linear model: Y = W X, i.e.

Y_i = sum_{j=1}^{N_in} W_ij X_j ,   i = 1, ..., N_out

Assumptions: X_j ~ N(0, σ_x²) and W_ij ~ N(0, σ²)

The condition V(Y_i) = V(X_j) leads to the values for σ²: Xavier, and Glorot & Bengio (when including backpropagation)
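The variance-preservation condition can be checked with a quick simulation (a sketch; N_in = 100, the input scale σ_x = 2 and the sample size are arbitrary choices, not from the slides). Drawing W_ij ~ N(0, 1/N_in), the variance of Y_i = Σ_j W_ij X_j stays close to the variance of X_j:

```python
import random
import statistics

random.seed(1)
n_in = 100
sigma2 = 1.0 / n_in          # Xavier choice: sigma^2 = 1 / N_in
sigma_x = 2.0                # arbitrary input standard deviation

ys = []
for _ in range(5000):
    x = [random.gauss(0, sigma_x) for _ in range(n_in)]
    w = [random.gauss(0, sigma2 ** 0.5) for _ in range(n_in)]
    ys.append(sum(wi * xi for wi, xi in zip(w, x)))  # one output Y_i

var_y = statistics.variance(ys)
print(var_y, sigma_x ** 2)   # the two variances should be comparable
```

The analytic version of the same check: V(Y_i) = N_in · σ² · σ_x², so requiring V(Y_i) = σ_x² forces σ² = 1/N_in.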

Weight matrices

EXAMPLE

[Diagram: input layer x1-x4 fully connected to hidden-layer nodes z(1)_1 ... z(1)_5 through the weights w(1)_11 ... w(1)_54.]

N_in = 4

N_out = 5
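For this W(1) (N_in = 4), the Xavier choice gives σ² = 1/4, i.e. σ = 0.5, which matches the σ = 0.5 used in the numerical STEP 1 below. A plain-Python sketch of the initialisation (illustrative only; the course practicals use R):

```python
import random

random.seed(2)
n_in, n_out = 4, 5           # input layer -> hidden layer
sigma = (1.0 / n_in) ** 0.5  # Xavier: sigma^2 = 1 / N_in, so sigma = 0.5

# W(1) is a 5x4 matrix: one row per hidden node, one column per input node
W1 = [[random.gauss(0.0, sigma) for _ in range(n_in)] for _ in range(n_out)]

print(sigma)  # 0.5
```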

STEP 1

Y = SoftMax(W(2) ReLU(W(1) X))

Matrix multiplication (σ = 0.5), for the first flower X = (5.1, 3.5, 1.4, 0.2)ᵀ:

W(1) X =

[  0.5   0.1  -0.2  -0.4 ]   [ 5.1 ]   [  5.4 ]
[ -0.4   1.0   0.5   1.0 ]   [ 3.5 ]   [  2.2 ]
[ -0.2  -0.2  -0.5  -0.1 ] · [ 1.4 ] = [  2.9 ]
[  0.2   0.7   0.3   0.2 ]   [ 0.2 ]   [ -1.7 ]
[  0.6   0.6   0.1  -0.4 ]             [ -2.4 ]

Activation function

Why do we need the activation function ReLU?

Activation function

A neural network without any non-linear activation function would simply be a linear regression model.

There are several possible non-linear activation functions; ReLU is very simple and very popular in deep neural networks.

STEP 2

Y = SoftMax(W(2) ReLU(W(1) X))

Activation function (ReLU), applied element-wise:

ReLU(W(1) X) = ReLU( (5.4, 2.2, 2.9, -1.7, -2.4)ᵀ ) = (5.4, 2.2, 2.9, 0, 0)ᵀ

STEP 3

Y = SoftMax(W(2) ReLU(W(1) X))

Matrix multiplication (σ = 0.5):

W(2) ReLU(W(1) X) =

[ 0.6   0.1   0.9  -0.2  -0.5 ]   [ 5.4 ]   [ 1.9 ]
[ 0.3  -0.3   0.3  -0.9  -0.9 ] · [ 2.2 ] = [ 0.3 ]
[ 0.3   0.2   0.4  -1.0   0.6 ]   [ 2.9 ]   [ 2.9 ]
                                  [ 0.0 ]
                                  [ 0.0 ]

SoftMax function

Why do we use the

SoftMax function?

SoftMax function

[The network diagram again: the SoftMax sits between the last-layer values z(3)_1 ... z(3)_3 and the output probabilities y1-y3.]

The outputs Z(3) are arbitrary real values and we want to convert them into probabilities:

SoftMax(Z(3))_i = e^{z(3)_i} / ( e^{z(3)_1} + e^{z(3)_2} + e^{z(3)_3} ) = Prob. of class i ∈ (0, 1)
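The formula translates directly into code (a plain-Python sketch; subtracting max(v) before exponentiating is a standard numerical-stability trick, not something shown on the slides, and it leaves the probabilities unchanged because the shift cancels in the ratio):

```python
import math

def softmax(v):
    """SoftMax(v)_i = exp(v_i) / sum_k exp(v_k), shifted by max(v) for stability."""
    m = max(v)
    exps = [math.exp(vi - m) for vi in v]
    s = sum(exps)
    return [e / s for e in exps]

p = softmax([1.0, 2.0, 3.0])
print(p)        # three probabilities in (0, 1), increasing with the inputs
print(sum(p))   # sums to 1
```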

SoftMax function

Binary classification:
- empirical observation: logistic regression
- motivation: sigmoid function (1 output neuron) [sigmoid plot, from wikipedia]

Multinomial logistic regression leads to the SoftMax function (K classes):

P_l(X) = e^{β_l X} / sum_{k=1}^{K} e^{β_k X}

SoftMax(X)_l = e^{X_l} / sum_{k=1}^{K} e^{X_k}

STEP 4

Y = SoftMax(W(2) ReLU(W(1) X))

SoftMax function:

SoftMax(W(2) ReLU(W(1) X)) = SoftMax( (1.9, 0.3, 2.9)ᵀ ) = (0.25, 0.05, 0.70)ᵀ

Prediction: Virginica. This is wrong! It is Setosa.
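The last step can be reproduced in a plain-Python sketch; with exact arithmetic the probabilities come out near (0.26, 0.05, 0.69), close to the rounded values shown above, and the largest one picks virginica:

```python
import math

def softmax(v):
    exps = [math.exp(vi) for vi in v]
    s = sum(exps)
    return [e / s for e in exps]

z3 = [1.9, 0.3, 2.9]           # W(2) ReLU(W(1) X) from STEP 3
y = softmax(z3)
species = ["setosa", "versicolor", "virginica"]
prediction = species[y.index(max(y))]
print(y, prediction)           # the largest probability selects virginica
```

With these untrained random weights the network confidently picks the wrong species, which is exactly why the weights must now be trained.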

Network training using gradient descent

What to do when

the prediction is wrong?
