PLS Tutorial


Partial Least Squares: A Tutorial

Lutgarde Buydens

12/9/2013

Outline

- Multivariate regression
  - Multiple Linear Regression (MLR)
  - Principal Component Regression (PCR)
  - Partial Least Squares (PLS)
- Validation
- Preprocessing

Multivariate Regression

Data: X (n×p) and Y (n×k).

- Rows: cases / observations, e.g. analytical observations of different samples, experimental runs, persons, …
- Columns of X: the p variables, e.g. spectral variables or analytical measurements
- Columns of Y: the k classes / tags, e.g. class information or concentrations
- X: independent variables (will always be available)
- Y: dependent variables (to be predicted later from X)

[Figure: raw spectra, absorbance vs. wavenumber (cm⁻¹)]


Y = f(X): predict Y from X

- MLR: Multiple Linear Regression
- PCR: Principal Component Regression
- PLS: Partial Least Squares

From univariate to Multiple Linear Regression (MLR)

Least squares regression (univariate):

y = b0 + b1·x1 + e    (b0: intercept, b1: slope)

MLR: Multiple Linear Regression

Extending least squares to p variables:

y = b0 + b1·x1 + b2·x2 + … + bp·xp + e

Least squares chooses b0 … bp so that the correlation r(ŷ, y) between fitted and observed responses is maximal (equivalently, the sum of squared residuals is minimal).


MLR: Multiple Linear Regression, matrix form

y = b0 + b1·x1 + b2·x2 + … + bp·xp + e

In matrix notation (a leading column of ones in X carries the intercept b0):

y(n×1) = X(n×(p+1)) b((p+1)×1) + e(n×1)

and, for k response variables simultaneously:

Y(n×k) = X(n×p) B(p×k) + E(n×k)

The least-squares solution is

b = (Xᵀ X)⁻¹ Xᵀ y
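As a sketch, the least-squares solution above can be computed directly with NumPy; the data below are hypothetical, generated noise-free so that the true coefficients are known:

```python
import numpy as np

# Hypothetical, noise-free data: n = 8 samples, p = 2 variables,
# generated with b0 = 1, b1 = 2, b2 = -3.
rng = np.random.default_rng(0)
X = rng.normal(size=(8, 2))
y = 1.0 + 2.0 * X[:, 0] - 3.0 * X[:, 1]

# Prepend a column of ones so the intercept b0 is estimated with the slopes.
X1 = np.hstack([np.ones((len(X), 1)), X])

# Normal equations b = (X^T X)^{-1} X^T y; solve() avoids an explicit inverse.
b = np.linalg.solve(X1.T @ X1, X1.T @ y)
# b recovers [1, 2, -3]
```

Because the toy data are exact, the normal equations recover the generating coefficients up to floating-point error.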

MLR: disadvantages

Computing (Xᵀ X)⁻¹ requires:
- X-variables that are not (nearly) collinear
- more samples than variables: n ≥ p + 1

When r(x1, x2) → 1, the data points lie almost on a line in the x1-x2 plane, and MLR fits a plane through a line! The fitted plane, and hence the coefficients, become extremely unstable.

Example: two data sets that are identical except for a single value of x2 (0.21 vs. 0.23):

    x1       x2 (Set A)   x2 (Set B)   y
     3.67     3.76         3.76        11.29
    -2.87    -2.91        -2.91        -8.09
     0.23     0.21         0.23         2.19
     5.49     5.55         5.55        19.09
     3.23     3.25         3.25        10.33
    -1.01    -0.99        -0.99        -1.89

Model: y = b1·x1 + b2·x2 + e

            Set A             Set B
            b1      b2        b1      b2
    MLR     10.3   -6.92      2.96    0.28
            R² = 0.98         R² = 0.98

One barely noticeable change in the data flips the coefficients completely, although both fits describe the data equally well.
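This instability can be reproduced numerically. The sketch below (values transcribed from the table above, fitted without an intercept as on the slide) refits the model for both sets and compares the coefficients:

```python
import numpy as np

x1  = np.array([3.67, -2.87, 0.23, 5.49, 3.23, -1.01])
x2a = np.array([3.76, -2.91, 0.21, 5.55, 3.25, -0.99])  # Set A
x2b = np.array([3.76, -2.91, 0.23, 5.55, 3.25, -0.99])  # Set B: one value differs
y   = np.array([11.29, -8.09, 2.19, 19.09, 10.33, -1.89])

def mlr_no_intercept(x2):
    X = np.column_stack([x1, x2])
    # lstsq is more stable than forming (X^T X)^{-1} when X is near-collinear
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    return b

b_a = mlr_no_intercept(x2a)
b_b = mlr_no_intercept(x2b)
# b_a and b_b differ strongly, even though the data barely changed
```

Both fits have high R², yet the coefficient on x2 swings by several units between the two sets: exactly the pathology the slide illustrates.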

MLR: remedies

(Xᵀ X)⁻¹ also fails when X is "fat" (p + 1 > n). Both problems are handled by reducing the dimension of X first:
- variable selection
- latent variables (PCR, PLS)

PCR: Principal Component Regression

Step 1: perform PCA on the original X (n×p), giving scores T (n×a).
Step 2: use the orthogonal PC scores as independent variables in an MLR model for y, giving a-coefficients a1 … aa (and b0).
Step 3: calculate the b-coefficients b1 … bp for the original variables from the a-coefficients.

  • 7/26/2019 Pls Tutorial

    3/12

    12/9/2013

    3

PCR: Principal Component Regression

Dimension reduction: use the scores (projections) on latent variables (PC1, PC2, …) that explain maximal variance in X.

PCR: the algorithm

Step 0: mean-center X.
Step 1: perform PCA: X = T Pᵀ; keep a components, i.e. MLR is done on the reconstructed X* = (T Pᵀ)*.
Step 2: perform MLR on the scores: Y = T A, so A = (Tᵀ T)⁻¹ Tᵀ Y.
Step 3: calculate B for the original variables: Y = X* B = (T Pᵀ) B, so A = Pᵀ B; because P is orthonormal (Pᵀ P = I), B = P A. Finally, calculate the intercepts b0 from the variable means.
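The three steps can be sketched in NumPy as follows; the function and the toy data are illustrative, not part of the original tutorial:

```python
import numpy as np

def pcr_fit(X, y, a):
    """Principal Component Regression sketch: PCA via SVD, MLR on the scores."""
    # Step 0: mean-center (centering y and computing b0 from the means
    # is equivalent to carrying an explicit intercept)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    # Step 1: PCA, keep a components: X ~ T P^T
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:a].T               # loadings (p x a)
    T = Xc @ P                 # scores   (n x a), orthogonal columns
    # Step 2: MLR on the scores: A = (T^T T)^{-1} T^T y
    A = np.linalg.solve(T.T @ T, T.T @ yc)
    # Step 3: back-transform to coefficients for the original variables
    b = P @ A
    b0 = y_mean - x_mean @ b
    return b0, b

# Hypothetical noise-free data with known coefficients
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 0.0]) + 3.0
b0, b = pcr_fit(X, y, a=5)     # with all 5 PCs, PCR coincides with MLR here
```

With all components retained PCR reproduces the MLR solution; in practice a < p is chosen by cross-validation, as described below.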

PCR: optimal number of PCs

Calculate the cross-validation RMSE for different numbers of PCs and choose the minimum:

RMSECV = √( Σᵢ (ŷᵢ − yᵢ)² / n )

PLS: Partial Least Squares Regression

As in PCR, X (n×p) is replaced by a small set of scores T (n×a), and Y (n×k) is regressed on them:

Phase 1: calculate the new independent variables (the scores T).
Phase 2: MLR of Y on the scores, giving a-coefficients a1 … aa.
Phase 3: calculate the b-coefficients b0, b1 … bp for the original variables.

PLS: Projection to Latent Structure

PCR uses PCs, which maximize the variance in X.
PLS uses LVs (LV1 = w), which maximize the covariance of X and y:

cov(Xw, y) = sd(Xw) · sd(y) · cor(Xw, y)

so an LV balances variance in X against correlation with y.

PLS: the sequential algorithm

Latent variables and their scores are calculated sequentially.

Step 0: mean-center X.
Step 1 (Phase 1, calculate the new independent variables T): find LV1 = w1, the direction maximizing the covariance of X and Y, via an SVD of XᵀY:

(Xᵀ Y)(p×k) = W(p×a) D(a×a) Zᵀ(a×k),  with w1 = 1st column of W.


Step 2: calculate t1, the scores (projections) of X on w1:

t(n×1) = X(n×p) w(p×1)
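A sketch of this SVD step with random (hypothetical) data:

```python
import numpy as np

# Hypothetical data: 30 samples, 6 X-variables, 2 Y-variables.
rng = np.random.default_rng(1)
X = rng.normal(size=(30, 6))
Y = rng.normal(size=(30, 2))
Xc = X - X.mean(axis=0)        # Step 0: mean-center
Yc = Y - Y.mean(axis=0)

# Step 1: SVD of X^T Y; the first left singular vector is w1,
# the direction of maximal covariance between X-scores and Y.
W, D, Zt = np.linalg.svd(Xc.T @ Yc, full_matrices=False)
w1 = W[:, 0]                   # unit-length weight vector

# Step 2: scores t1 = projections of X on w1
t1 = Xc @ w1
```

By the properties of the SVD, no other unit direction v gives a larger value of ‖Yᵀ X v‖ than w1, which is what "maximizes the covariance" means here.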

PLS: optimal number of LVs

As for PCR, calculate the cross-validation RMSE for different numbers of LVs and choose the minimum:

RMSECV = √( Σᵢ (ŷᵢ − yᵢ)² / n )

PLS: the collinearity example revisited

For the two data sets A and B above (identical except for one value of x2), with y = b1·x1 + b2·x2 + e:

            Set A             Set B
            b1      b2        b1      b2
    MLR     10.3   -6.92      2.96    0.28
    PCR     1.60    1.62      1.60    1.62
    PLS     1.60    1.62      1.60    1.62

Unlike MLR, the latent-variable methods give stable, nearly identical coefficients for both sets.

MLR, PCR, PLS: VALIDATION

Estimating prediction error. Basic principle: test how well your model works on new data it has not seen yet! A common measure of prediction error is the RMSEP (root mean squared error of prediction).


A biased approach

Computing the prediction error on the samples the model was built on gives a biased error: those samples were also used to build the model, so the model is biased towards accurate prediction of these specific samples. Hence the basic principle: test how well your model works on new data it has not seen yet!

Validation: basic principle

Split the data into a training set and a test set. Several ways:
- one large test set
- leave one out and repeat (LOO)
- leave n objects out and repeat (LnO)
- …

The entire modelling procedure is applied within the training data; the test set is used only for evaluation.

Validation

Full data set → training set + test set. Build the model (b0 … bp) on the training set, predict the test set (ŷ) and compute the RMSEP.

Remark: for the final model, use the whole data set.

Training and test sets

Split into a training set and a test set. The test set should be representative of the training set; a random choice is often the best, but check for extremely unlucky divisions. Apply the whole procedure to each training/test division.

Cross-validation

Simplest case: leave-one-out (LOO, segment = 1 sample). Normally 10-20% is left out per segment (LnO).

Remark: for the final model, use the whole data set.
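The leave-one-out scheme can be sketched for any fit/predict pair; here with MLR on hypothetical data (the helper names are mine, not from the tutorial):

```python
import numpy as np

def rmsecv_loo(X, y, fit, predict):
    """Leave-one-out cross-validation RMSE (sketch)."""
    n = len(y)
    errs = np.empty(n)
    for i in range(n):
        mask = np.arange(n) != i              # everything except sample i
        model = fit(X[mask], y[mask])         # build on the training part
        errs[i] = predict(model, X[i:i + 1])[0] - y[i]  # predict the left-out sample
    return np.sqrt(np.mean(errs ** 2))

# MLR with intercept as the model being validated
def fit_mlr(X, y):
    X1 = np.hstack([np.ones((len(X), 1)), X])
    b, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return b

def predict_mlr(b, X):
    return np.hstack([np.ones((len(X), 1)), X]) @ b

# Hypothetical data with a little noise
rng = np.random.default_rng(2)
X = rng.normal(size=(25, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + 1.0 + 0.1 * rng.normal(size=25)
rmsecv = rmsecv_loo(X, y, fit_mlr, predict_mlr)
```

Each sample is predicted by a model that never saw it, so the resulting RMSE is an (almost) unbiased estimate of the prediction error.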

    Cross-validation: an example

    The data


Split the data into a training set and a test (validation) set, build a model on the training set, and predict the held-out samples. Then split again, and repeat until every sample has been in the validation set exactly once. Common choice: leave-one-out (LOO).



Cross-validation: a warning

Data: 13 × 5 = 65 NIR spectra (1102 wavelengths); 13 samples with different compositions of NaOH, NaOCl and Na2CO3, each sample measured at 5 temperatures.

    Sample   NaOH (wt%)   NaOCl (wt%)   Na2CO3 (wt%)   Temperature (°C)
    1        18.99        0             0              15, 21, 27, 34, 40
    2         9.15        9.99          0.15           15, 21, 27, 34, 40
    3        15.01        0             4.01           15, 21, 27, 34, 40
    4         9.34        5.96          3.97           15, 21, 27, 34, 40
    …
    13       16.02        2.01          1.00           15, 21, 27, 34, 40

Because each of the 13 samples appears five times in the 65 × 1102 data matrix (once per temperature), cross-validation must leave the whole SAMPLE out, i.e. all five of its spectra at once, rather than single spectra; otherwise near-identical spectra of the held-out sample remain in the training set and the error estimate is far too optimistic.
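For the data above this means cross-validating over the 13 samples, not the 65 spectra. A sketch of the split generator (index layout assumed: 5 consecutive temperatures per sample):

```python
import numpy as np

# 13 samples x 5 temperatures = 65 spectra; one group label per spectrum.
sample_id = np.repeat(np.arange(13), 5)

def leave_sample_out_splits(sample_id):
    """Yield (train, test) index arrays, holding out one whole sample at a time."""
    for s in np.unique(sample_id):
        test = sample_id == s
        yield np.where(~test)[0], np.where(test)[0]

splits = list(leave_sample_out_splits(sample_id))  # 13 splits, 5 test spectra each
```

Each split keeps all replicate spectra of the held-out sample together, which is the point of the warning.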


Selection of the number of LVs

Through validation: choose the number of LVs that gives the model with the lowest prediction error. The test set reserved for assessing the final model cannot be used for this! Instead, divide the training set further (cross-validation):

Full data set → training set + test set.
1) Determine the number of LVs on validation splits of the training set.
2) Build the model (b0 … bp) on the training set and compute the RMSEP on the untouched test set.

Remark: for the final model, use the whole data set.

Double Cross-Validation

Full data set → training set + test set, repeated in two nested CV loops (CV1 and CV2):
1) inner CV loop: determine the number of LVs;
2) outer CV loop: build the model (b0 … bp) and estimate the RMSEP.

Remark: for the final model, use the whole data set.

Double cross-validation: an example

Split the data into a training set and a validation set; the validation set is used only later, to assess model performance!


Apply cross-validation to the rest: split the training set into a (new) training set and a test set, fit models with 1, 2, 3, … LVs on each split, and choose the number of LVs with the lowest RMSECV.


Repeat the procedure until all samples have been in the validation set once.

Double cross-validation

In this way:
- the number of LVs is determined using samples not used to build the model;
- the prediction error is also determined using samples the model has not seen before.

Remark: for the final model, use the whole data set.
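A minimal numerical sketch of the double loop; the fold construction and the toy "complexity" knob (number of variables used) are illustrative stand-ins for the number of LVs:

```python
import numpy as np

def kfold(n, k):
    """Deterministic contiguous folds (illustrative)."""
    return np.array_split(np.arange(n), k)

def double_cv(X, y, complexities, fit, predict, k_outer=5, k_inner=4):
    """Nested CV: the inner loop picks the complexity, the outer loop
    estimates the prediction error on data the chosen model never saw."""
    sq_errs = []
    for test_idx in kfold(len(y), k_outer):
        train_idx = np.setdiff1d(np.arange(len(y)), test_idx)
        Xtr, ytr = X[train_idx], y[train_idx]
        # Inner loop: cross-validate each candidate on the training part only.
        inner_rmse = []
        for a in complexities:
            errs = []
            for val_idx in kfold(len(ytr), k_inner):
                fit_idx = np.setdiff1d(np.arange(len(ytr)), val_idx)
                m = fit(Xtr[fit_idx], ytr[fit_idx], a)
                errs.extend(predict(m, Xtr[val_idx]) - ytr[val_idx])
            inner_rmse.append(np.sqrt(np.mean(np.square(errs))))
        best = complexities[int(np.argmin(inner_rmse))]
        # Outer loop: refit with the chosen complexity, score the test fold.
        m = fit(Xtr, ytr, best)
        sq_errs.extend((predict(m, X[test_idx]) - y[test_idx]) ** 2)
    return np.sqrt(np.mean(sq_errs))

# Toy model family: MLR on the first `a` variables (a stand-in for #LVs).
def fit_first(X, y, a):
    b, *_ = np.linalg.lstsq(X[:, :a], y, rcond=None)
    return (a, b)

def predict_first(model, X):
    a, b = model
    return X[:, :a] @ b

rng = np.random.default_rng(3)
X = rng.normal(size=(40, 6))
y = X[:, :2] @ np.array([1.5, -1.0]) + 0.1 * rng.normal(size=40)
rmsep = double_cv(X, y, [1, 2, 3, 4], fit_first, predict_first)
```

The returned RMSEP is computed only on outer-loop folds that played no part in choosing the complexity, which is exactly the property the slide emphasizes.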

Raw + mean-centered data

[Figure: raw spectra and mean-centered spectra, absorbance (a.u.) vs. wavenumber (cm⁻¹)]

PLS: an example

[Figure: RMSECV vs. number of LVs (1-10) for the prediction of NaOH]

Regression coefficients

[Figure: raw spectra and the corresponding regression coefficients vs. wavenumber (cm⁻¹)]


True vs. predicted

[Figure: predicted vs. true NaOH concentration (wt%)]

Why pre-processing? Data artefacts

[Figure: original spectrum and versions distorted by offset, slope and scatter]

Typical corrections: baseline correction, alignment, scatter correction, noise removal, scaling, normalisation, transformation, …
Other issues: missing values, outliers.

[Figure: simulated spectra showing the original and versions with an offset, offset + slope, multiplicative, and combined offset + slope + multiplicative artefacts]

Pre-processing methods

4914 combinations, all reasonable:

STEP 1, baseline (7×): no baseline correction; detrending with polynomial order 2, 3 or 4 (3×); derivatisation, 1st or 2nd derivative (2×); AsLS.
STEP 2, scatter (10×): no scatter correction; scaling by mean, median, max or L2 norm (4×); SNV; RNV at 15, 25 or 35 % (3×); MSC.
STEP 3, noise (10×): no noise removal; Savitzky-Golay smoothing with window 5, 9 or 11 points and polynomial order 2, 3 or 4 (9×).
STEP 4, scaling & transformation (7×): mean centering; autoscaling; range scaling; Pareto scaling; Poisson scaling; level scaling; log transformation.

Supervised pre-processing methods: OSC or DOSC, combined with no noise removal and one of the scaling/transformation options above (mean centering, autoscaling, range scaling, Pareto scaling, Poisson scaling, level scaling, log scaling).
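As one concrete example from the scatter-correction step, SNV (standard normal variate) centers and scales each spectrum individually; the sketch below shows it removing a simulated offset and multiplicative effect exactly:

```python
import numpy as np

def snv(spectra):
    """Standard Normal Variate: subtract each spectrum's mean and divide by
    its standard deviation (row-wise), a common scatter correction."""
    mu = spectra.mean(axis=1, keepdims=True)
    sd = spectra.std(axis=1, ddof=1, keepdims=True)
    return (spectra - mu) / sd

# Simulated spectrum plus a copy with offset and multiplicative scatter.
base = np.sin(np.linspace(0.0, 3.0, 200))
spectra = np.vstack([base,                 # original
                     2.5 * base + 0.7])    # scaled + shifted copy
corrected = snv(spectra)
# After SNV both rows coincide: the offset and scale artefacts are gone.
```

Because SNV is invariant to any per-spectrum affine distortion with positive scale, the two rows become identical after correction.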

Pre-processing results

[Figure: classification accuracy (%) vs. model complexity (number of LVs) for all pre-processing combinations, with the raw data marked for reference]

J. Engel et al., TrAC 2013


SOFTWARE

- PLS Toolbox (Eigenvector Inc.), www.eigenvector.com: for use in MATLAB (or standalone!)
- XLSTAT-PLS (XLSTAT), www.xlstat.com: for use in Microsoft Excel
- Package pls for R, free software: http://cran.r-project.org