Partial Least Squares
A Standard Tool for: Multivariate Regression
Regression:
Modeling dependent variable(s): Y
  Chemical property
  Biol. activity
By predictor variables: X
  Chem. composition
  Chem. structure (coded)
Traditional method: MLR (multiple linear regression)
MLR works if the X-variables are:
few (# X-variables < # samples)
uncorrelated (full-rank X)
noise-free (when some correlation exists)
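To illustrate why MLR needs a full-rank X, here is a minimal pure-Python sketch (an illustration, not part of the original slides) that fits MLR through the normal equations X'X b = X'y. The helpers `solve` and `mlr` and both small data sets are hypothetical. When two X-variables are exactly collinear, X'X is singular and the solver refuses to continue.

```python
def solve(a, b, tol=1e-10):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix [a | b]
    scale = max(abs(v) for row in a for v in row) or 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) <= tol * scale:
            raise ValueError("X'X is singular: the X-variables are collinear (X is not full rank)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def mlr(X, y):
    """Least-squares coefficients b from the normal equations X'X b = X'y."""
    cols = list(zip(*X))
    xtx = [[sum(u * v for u, v in zip(ci, cj)) for cj in cols] for ci in cols]
    xty = [sum(u * v for u, v in zip(ci, y)) for ci in cols]
    return solve(xtx, xty)


# Full-rank X (intercept column, x1, x2) with y = 1 + 2*x1 + 3*x2 exactly
X_ok = [[1.0, x1, x2] for x1, x2 in zip([1, 2, 3, 4, 5], [2, 1, 4, 3, 6])]
y_ok = [1 + 2 * row[1] + 3 * row[2] for row in X_ok]
b = mlr(X_ok, y_ok)
print([round(v, 6) for v in b])

# Collinear X: x2 = 2 * x1, so X'X cannot be inverted
X_bad = [[1.0, x, 2.0 * x] for x in [1, 2, 3, 4, 5]]
try:
    mlr(X_bad, y_ok)
    collinear_rejected = False
except ValueError as err:
    collinear_rejected = True
    print(err)
```

The same failure occurs when there are more X-variables than samples, because X'X then has rank at most equal to the number of samples.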
But!
Instruments (spectrometers, chromatographs, sensor arrays) produce data that are:
numerous
correlated
noisy
incomplete
X: the so-called "independent" variables are in fact correlated, so they are better called predictors
PLSR models:
1. The relation between two matrices, X and Y, by a linear multivariate regression
2. The structure of X and Y
This gives richer results than traditional multivariate regression.
PLSR is a generalization of MLR.
PLSR can analyze data with:
noise
collinearity (highly correlated X-variables)
numerous X-variables (more than # samples)
incompleteness (missing values) in both X and Y
History
Herman Wold (1975):
Modeling of chains of matrices by NIPALS (Nonlinear Iterative Partial Least Squares):
regression between a variable matrix and one parameter vector,
with the other parameter vector held fixed
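The NIPALS idea of regressing on one parameter vector while the other is held fixed can be sketched for a single bilinear component X ≈ t p' (the PCA case). This is a minimal pure-Python illustration under that assumption, not a reproduction of Wold's 1975 chain-matrix algorithm. The function `nipals_component` and the test matrix are hypothetical.

```python
import math


def nipals_component(X, tol=1e-12, max_iter=500):
    """Fit one component X ~ t p' by alternating least-squares regressions."""
    n, k = len(X), len(X[0])
    # Start t from the column of X with the largest sum of squares (a common heuristic)
    j = max(range(k), key=lambda c: sum(X[i][c] ** 2 for i in range(n)))
    t = [X[i][j] for i in range(n)]
    for _ in range(max_iter):
        # t held fixed: regress every column of X on t, p = X't / (t't)
        tt = sum(v * v for v in t)
        p = [sum(X[i][c] * t[i] for i in range(n)) / tt for c in range(k)]
        norm = math.sqrt(sum(v * v for v in p))
        p = [v / norm for v in p]  # scale p to unit length
        # p held fixed: regress every row of X on p, t = X p / (p'p) with p'p = 1
        t_new = [sum(X[i][c] * p[c] for c in range(k)) for i in range(n)]
        converged = sum((a - b) ** 2 for a, b in zip(t_new, t)) <= tol * sum(v * v for v in t_new)
        t = t_new
        if converged:
            break
    return t, p


# A rank-one test matrix X = u v', whose single component is known in advance
u = [1.0, 2.0, 3.0, -1.0]
v = [2.0, -1.0, 0.5]
X = [[ui * vj for vj in v] for ui in u]
t, p = nipals_component(X)
v_unit = [vj / math.sqrt(sum(w * w for w in v)) for vj in v]
print(p, v_unit)
```

For the rank-one matrix, the recovered p matches the normalized v up to sign, and t p' rebuilds X.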
Svante Wold & H. Martens (1980):
Completion and modification of
two-block (X, Y) PLS (the simplest case)
Herman Wold (~2000):
Projection to Latent Structures,
as a more descriptive interpretation of the acronym PLS
A QSPR (quantitative structure-property relationship) example
One Y-variable: a chemical property, the free energy of unfolding of a protein
X-variables: a quantitative description of the variation in chemical structure
Seven X-variables describing the 19 different amino acids at position 49 of the protein; they are highly correlated
Table 1. X-variables (PIE, PIF, DGR, SAC, MR, Lam, Vol) and Y-variable (DDGTS) for the 19 amino acids (samples 1-19)

No.    PIE    PIF    DGR    SAC     MR    Lam    Vol | DDGTS
  1   0.23   0.31  -0.55  254.2  2.126  -0.02   82.2 |   8.5
  2  -0.48  -0.60   0.51  303.6  2.994  -1.24  112.3 |   8.2
  3  -0.61  -0.77   1.20  287.9  2.994  -1.08  103.7 |   8.5
  4   0.45   1.54  -1.40  282.9  2.933  -0.11   99.1 |  11.0
  5  -0.11  -0.22   0.29  335.0  3.458  -1.19  127.5 |   6.3
  6  -0.51  -0.64   0.76  311.6  3.243  -1.43  120.5 |   8.8
  7   0.00   0.00   0.00  224.9  1.662   0.03   65.0 |   7.1
  8   0.15   0.13  -0.25  337.2  3.856  -1.06  140.6 |  10.1
  9   1.20   1.80  -2.10  322.6  3.350   0.04  131.7 |  16.8
 10   1.28   1.70  -2.00  324.0  3.518   0.12  131.5 |  15.0
 11  -0.77  -0.99   0.78  336.6  2.933  -2.26  144.3 |   7.9
 12   0.90   1.23  -1.60  336.3  3.860  -0.33  132.3 |  13.3
 13   1.56   1.79  -2.60  366.1  4.638  -0.05  155.8 |  11.2
 14   0.38   0.49  -1.50  288.5  2.876  -0.31  106.7 |   8.2
 15   0.00  -0.04   0.09  266.7  2.279  -0.40   88.5 |   7.4
 16   0.17   0.26  -0.58  283.9  2.743  -0.53  105.3 |   8.8
 17   1.85   2.25  -2.70  401.8  5.755  -0.31  185.9 |   9.9
 18   0.89   0.96  -1.70  377.8  4.791  -0.84  162.7 |   8.8
 19   0.71   1.22  -1.60  295.1  3.054  -0.13  115.6 |  12.0
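The claim that the seven X-variables are highly correlated can be checked directly from Table 1. This pure-Python sketch copies the columns for samples 1-19 and prints the pairwise Pearson correlations of the X-variables. The helper `pearson` is hypothetical, written here only for illustration.

```python
import math

# Columns of Table 1, samples 1-19 in table order
X_VARS = {
    "PIE": [0.23, -0.48, -0.61, 0.45, -0.11, -0.51, 0.00, 0.15, 1.20, 1.28, -0.77, 0.90, 1.56, 0.38, 0.00, 0.17, 1.85, 0.89, 0.71],
    "PIF": [0.31, -0.60, -0.77, 1.54, -0.22, -0.64, 0.00, 0.13, 1.80, 1.70, -0.99, 1.23, 1.79, 0.49, -0.04, 0.26, 2.25, 0.96, 1.22],
    "DGR": [-0.55, 0.51, 1.20, -1.40, 0.29, 0.76, 0.00, -0.25, -2.10, -2.00, 0.78, -1.60, -2.60, -1.50, 0.09, -0.58, -2.70, -1.70, -1.60],
    "SAC": [254.2, 303.6, 287.9, 282.9, 335.0, 311.6, 224.9, 337.2, 322.6, 324.0, 336.6, 336.3, 366.1, 288.5, 266.7, 283.9, 401.8, 377.8, 295.1],
    "MR": [2.126, 2.994, 2.994, 2.933, 3.458, 3.243, 1.662, 3.856, 3.350, 3.518, 2.933, 3.860, 4.638, 2.876, 2.279, 2.743, 5.755, 4.791, 3.054],
    "Lam": [-0.02, -1.24, -1.08, -0.11, -1.19, -1.43, 0.03, -1.06, 0.04, 0.12, -2.26, -0.33, -0.05, -0.31, -0.40, -0.53, -0.31, -0.84, -0.13],
    "Vol": [82.2, 112.3, 103.7, 99.1, 127.5, 120.5, 65.0, 140.6, 131.7, 131.5, 144.3, 132.3, 155.8, 106.7, 88.5, 105.3, 185.9, 162.7, 115.6],
}
DDGTS = [8.5, 8.2, 8.5, 11.0, 6.3, 8.8, 7.1, 10.1, 16.8, 15.0, 7.9, 13.3, 11.2, 8.2, 7.4, 8.8, 9.9, 8.8, 12.0]  # the Y-variable


def pearson(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)


names = list(X_VARS)
for i, first in enumerate(names):
    for second in names[i + 1:]:
        print(f"{first:>3} vs {second:<3}  r = {pearson(X_VARS[first], X_VARS[second]):+.2f}")
```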
Transformation (log) → more symmetrical distribution
12.5   4235   0.2   546   100584
→ log10 →
1.097  3.627  -0.699  2.737  5.003
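The log example can be reproduced in a couple of lines. This sketch assumes base-10 logarithms, which is what the listed values correspond to. The helper name `log_transform` is hypothetical.

```python
import math


def log_transform(values):
    """Base-10 log of strictly positive values, rounded to 3 decimals for display."""
    if any(v <= 0 for v in values):
        raise ValueError("log transform needs strictly positive values")
    return [round(math.log10(v), 3) for v in values]


raw = [12.5, 4235, 0.2, 546, 100584]
logged = log_transform(raw)
print(logged)  # [1.097, 3.627, -0.699, 2.737, 5.003]
# The raw values span a factor of about 500 000; the logs span about 5.7 units
print(max(raw) / min(raw), max(logged) - min(logged))
```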
Scaling: increases the weights of the more informative X-variables
No knowledge about the importance of the variables → auto scaling:
1. Scale to unit variance (x_i / SD)
2. Center (x_i - x_mean)
→ Same weight for all X-variables
Auto scaling is also numerically more stable
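Auto scaling can be sketched as follows, applied to two Table 1 columns that sit on very different scales (SAC in the hundreds, MR around 2 to 6). The slides do not say which standard deviation is meant. This sketch assumes the sample SD (n - 1 denominator). The function name `autoscale` is hypothetical.

```python
import statistics


def autoscale(column):
    """Center a column to mean 0 and scale it to unit variance."""
    mean = statistics.mean(column)
    sd = statistics.stdev(column)  # sample SD (n - 1); an assumption, the slides do not specify
    return [(x - mean) / sd for x in column]


# Two X-variables from Table 1 on very different scales
SAC = [254.2, 303.6, 287.9, 282.9, 335.0, 311.6, 224.9, 337.2, 322.6, 324.0,
       336.6, 336.3, 366.1, 288.5, 266.7, 283.9, 401.8, 377.8, 295.1]
MR = [2.126, 2.994, 2.994, 2.933, 3.458, 3.243, 1.662, 3.856, 3.350, 3.518,
      2.933, 3.860, 4.638, 2.876, 2.279, 2.743, 5.755, 4.791, 3.054]

for name, col in (("SAC", SAC), ("MR", MR)):
    z = autoscale(col)
    # After auto scaling, both columns have mean 0 and SD 1, so neither dominates
    print(name, round(statistics.mean(z), 9), round(statistics.stdev(z), 9))
```

Centering and scaling commute here: (x - mean) / SD gives the same result whichever step is listed first.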
The PLSR Model (usually linear)
A few "new" variables, the X-scores $t_a$ ($a = 1, 2, \dots, A$), which are:
orthogonal
linear combinations of the X-variables, with weights $W^*$: $T = XW^*$
The X-scores $T$ are:
modelers of X: $X = TP' + E$ ($P$: loadings)
predictors of Y: $Y = TC' + F$ ($C$: weights for Y)
Hence $Y = XW^*C' + F = XB + F$, with the PLS-regression coefficients $B = W^*C'$
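The model equations can be checked numerically: predicting Y through the scores ($T = XW^*$, then $TC'$) gives the same result as predicting it directly with $B = W^*C'$. In this pure-Python sketch, X, W* and C are hypothetical numbers chosen only for illustration, not the output of a fitted model.

```python
def matmul(A, B):
    """Matrix product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]


# Hypothetical numbers: N = 4 samples, K = 3 X-variables, A = 2 components, one Y-variable
X = [[0.5, -1.2, 0.3], [1.1, 0.4, -0.7], [-0.9, 0.8, 1.5], [-0.7, 0.0, -1.1]]
W_star = [[0.6, 0.2], [-0.3, 0.7], [0.4, -0.5]]  # weights W*, K x A
C = [[1.8, -0.6]]                                # weights for Y, one row per Y-variable

T = matmul(X, W_star)                   # X-scores: T = X W*
Y_via_scores = matmul(T, transpose(C))  # fitted Y = T C'
B = matmul(W_star, transpose(C))        # PLS-regression coefficients: B = W* C'
Y_via_B = matmul(X, B)                  # fitted Y = X B, the same prediction in one step
print(B)
print(Y_via_scores)
print(Y_via_B)
```

B has one coefficient per X-variable, so new samples can be predicted without computing their scores explicitly.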
Estimation of $T$:
By stepwise subtraction of each component ($t_a p_a'$) from X
$X = TP' + E \;\Rightarrow\; X - TP' = E$
Residual after subtraction of the a-th component: $E_a = X - \sum_{i=1}^{a} t_i p_i' = E_{a-1} - t_a p_a'$
$X = t_1p_1' + t_2p_2' + t_3p_3' + \dots + t_Ap_A' + E = X_1 + X_2 + X_3 + \dots + X_A + E$, where $X_a = t_a p_a'$ and $E = E_A$; the residuals $E_1, E_2, E_3, \dots, E_{a-1}$ remain after each successive subtraction
Stepwise "deflation" of the X-matrix, starting from $E_0 = X$:
$t_1 = Xw_1$
$E_1 = X - t_1p_1'$, $t_2 = E_1w_2$
$E_2 = E_1 - t_2p_2'$, $t_3 = E_2w_3$
⋮
$E_{a-1} = E_{a-2} - t_{a-1}p_{a-1}'$, $t_a = E_{a-1}w_a$
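The deflation steps can be sketched for the simplest case, one y-variable (PLS1). The slides give $t_a = E_{a-1}w_a$ but have not yet said how $w_a$ is found. This sketch assumes the usual NIPALS PLS1 choice: $w_a$ proportional to $E_{a-1}'$ times the current y-residual, normalized to unit length. The function `pls1_nipals` and the data are hypothetical. The sketch checks two properties stated above: successive scores are orthogonal, and $X = \sum_a t_a p_a' + E_A$.

```python
import math


def pls1_nipals(X, y, n_components):
    """PLS with one y-variable (PLS1): extract components by stepwise deflation of X."""
    n, k = len(X), len(X[0])
    E = [list(row) for row in X]  # E_0 = X
    f = list(y)                   # current y-residual
    T, P, W, C = [], [], [], []
    for _ in range(n_components):
        # Assumed weight choice: w_a proportional to E_{a-1}' f, scaled to unit length
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(k)]
        norm = math.sqrt(sum(v * v for v in w))
        w = [v / norm for v in w]
        t = [sum(E[i][j] * w[j] for j in range(k)) for i in range(n)]       # t_a = E_{a-1} w_a
        tt = sum(v * v for v in t)
        p = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(k)]  # loading p_a
        c = sum(f[i] * t[i] for i in range(n)) / tt                         # weight c_a for y
        E = [[E[i][j] - t[i] * p[j] for j in range(k)] for i in range(n)]   # E_a = E_{a-1} - t_a p_a'
        f = [f[i] - c * t[i] for i in range(n)]                             # deflate y as well
        T.append(t); P.append(p); W.append(w); C.append(c)
    return T, P, W, C, E


# Hypothetical data, mean-centered before fitting: 6 samples, 3 X-variables, one y
raw_X = [[1.0, 2.0, 0.5], [2.0, 1.0, 1.5], [3.0, 4.0, 2.0],
         [4.0, 3.0, 3.5], [5.0, 6.0, 4.0], [6.0, 5.0, 5.5]]
raw_y = [1.2, 1.9, 3.1, 3.9, 5.2, 5.8]
means = [sum(col) / len(col) for col in zip(*raw_X)]
X = [[v - m for v, m in zip(row, means)] for row in raw_X]
y_mean = sum(raw_y) / len(raw_y)
y = [v - y_mean for v in raw_y]

T, P, W, C, E = pls1_nipals(X, y, n_components=2)
print("t1't2 =", sum(a * b for a, b in zip(T[0], T[1])))  # close to 0: the scores are orthogonal
```

Orthogonality follows from the deflation itself: $t_1'E_1 = t_1'X - (t_1't_1)p_1' = 0$ because $p_1 = X't_1/(t_1't_1)$, so every later score, which is a combination of the columns of $E_1$, is orthogonal to $t_1$.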
W