Partial Least Squares Regression (PLSR)
• Partial least squares (PLS) is a method for constructing predictive models when the predictors are many and highly collinear.
• Note that the emphasis is on predicting the responses and not necessarily on trying to understand the underlying relationship between the variables.
• When prediction is the goal and there is no practical need to limit the number of measured factors, PLS can be a useful tool.
• PLS was developed in the 1960s by Herman Wold as an econometric technique, but some of its most avid proponents (including Wold’s son Svante) are chemical engineers and chemometricians.
• Partial least squares regression (PLSR) is a multivariate data analytical technique designed to handle intercorrelated regressors.
• It is based on Herman Wold’s general PLS principle, in which complicated multivariate systems analysis problems are solved by a sequence of simple least squares regressions.
How Does PLS Work?
• In principle, multiple linear regression (MLR) can be used with very many predictors.
• However, if the number of predictors gets too large (for example, greater than the number of observations), you are likely to get a model that fits the sampled data perfectly but that will fail to predict new data well.
• This phenomenon is called over-fitting.
• In such cases, although there are many manifest predictors, there may be only a few underlying or latent factors that account for most of the variation in the response.
• The general idea of PLS is to try to extract these latent factors, accounting for as much of the manifest predictor variation as possible while modeling the responses well.
• For this reason, the acronym PLS has also been taken to mean “projection to latent structure.”
• The overall goal is to use the predictors to predict the responses in the population.
• This is achieved indirectly by extracting latent variables T and U from sampled factors and responses, respectively.
• The extracted factors T (also referred to as X-scores) are used to predict the Y-scores U, and then the predicted Y-scores are used to construct predictions for the responses.
• This procedure actually covers various techniques, depending on which source of variation is considered most crucial.
• PCR is based on the spectral decomposition of $X^T X$, where X is the matrix of predictor values;
• PLS is based on the singular value decomposition of $X^T Y$.
• If the number of extracted factors is greater than or equal to the rank of the sample factor space, then PLS is equivalent to MLR.
• An important feature of the method is that usually a great deal fewer factors are required.
• One approach to extracting the optimum number of factors is to construct the PLS model for a given number of factors on one set of data and then to test it on another, choosing the number of extracted factors for which the total prediction error is minimized.
• Alternatively, van der Voet (1994) suggests choosing the least number of extracted factors whose residuals are not significantly greater than those of the model with minimum error.
• If no convenient test set is available, then each observation can be used in turn as a test set; this is known as cross-validation.
• PLSR is a bilinear regression method that extracts a small number of factors, $t_a$, a = 1, 2, …, A, that are linear combinations of the K X-variables, and uses these factors as regressors for y.
• What is special about PLSR compared to principal component regression (PCR) is that the y variable is used actively in determining how the regression factors $t_a$ are computed from X.
• Each PLSR factor $t_a$ is defined so that it describes as much as possible of the covariance between X and y remaining after the previous a − 1 factors have been estimated and subtracted.
• The purpose of using PLSR in multivariate calibration is to obtain good insight and good predictive ability at the same time.
• In classical stepwise multiple linear regression (SMLR) the collinearity is handled by picking out a small subset of individual, distinctly different X variables from all the available X variables.
• This reduced subset is used as regressors for y, leaving the other X variables unused.
• The estimated factors are often defined to be orthogonal to one another.
• The model for regressions on estimated latent variables can be summarized as follows:
T = w(X)
X = p(T) + E
y = q(T) + f
y = q(w(X)) + f = b(X) + f
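As a minimal sketch of the model above, assuming that w, p and q are all linear and that there is a single response y, the factors can be extracted one at a time as shown below. The function name `pls1`, the simulated data and the variable names are illustrative assumptions and do not come from the slides; X and y are assumed to be mean-centered.

```python
import numpy as np

def pls1(X, y, n_factors):
    """Illustrative PLS1 sketch; X (n x K) and y (n,) are assumed mean-centered."""
    X = X.copy()
    y = y.copy()
    n, K = X.shape
    W = np.zeros((K, n_factors))  # loading weights w_a
    P = np.zeros((K, n_factors))  # X-loadings p_a
    T = np.zeros((n, n_factors))  # factor scores t_a
    q = np.zeros(n_factors)       # y-loadings q_a
    for a in range(n_factors):
        w = X.T @ y               # direction of maximal covariance with y
        w /= np.linalg.norm(w)
        t = X @ w                 # scores: T = w(X)
        tt = t @ t
        p = X.T @ t / tt          # X = p(T) + E
        q[a] = y @ t / tt         # y = q(T) + f
        X -= np.outer(t, p)       # subtract what this factor has modeled
        y -= q[a] * t
        W[:, a], P[:, a], T[:, a] = w, p, t
    # y = b(X) + f, with b = W (P^T W)^{-1} q in the linear case
    b = W @ np.linalg.solve(P.T @ W, q)
    return T, W, P, q, b

# Simulated data (an assumption made purely for illustration)
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 4))
y = X @ np.array([1.0, -2.0, 0.5, 0.0]) + 0.1 * rng.normal(size=30)
X -= X.mean(axis=0)
y -= y.mean()

T, W, P, q, b = pls1(X, y, n_factors=4)
b_mlr = np.linalg.lstsq(X, y, rcond=None)[0]  # ordinary MLR solution
```

In this sketch the score vectors in `T` come out mutually orthogonal, and because the number of extracted factors equals the rank of X (here 4), `b` agrees with the MLR solution `b_mlr` up to rounding.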
• In practice, the model parameters have to be estimated from empirical data.
• Since the regression is intended for later prediction of y from X, the factor scores T are generally defined as functions of X: T = w(X).
• The major difference between calibration methods is how T is estimated.
• For instance, in PCR it is estimated as a series of eigenvector spectra for $(X - \mathbf{1}\bar{x}^T)^T (X - \mathbf{1}\bar{x}^T)$, etc.
• In PLSR w() is defined as a sequence of X versus y covariances.
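The leave-one-out cross-validation described earlier in this section can be sketched as follows. This is a hedged illustration for a single response, not the procedure of any particular software package; the helper names `pls1_coef` and `loo_press`, the simulated collinear data and the choice of at most 6 factors are all assumptions.

```python
import numpy as np

def pls1_coef(X, y, n_factors):
    """Regression vector of a PLS1 model on centered X, y (illustrative helper)."""
    Xr = X.copy()
    W, P, q = [], [], []
    for _ in range(n_factors):
        w = Xr.T @ y
        w /= np.linalg.norm(w)
        t = Xr @ w
        tt = t @ t
        p = Xr.T @ t / tt
        q.append(y @ t / tt)
        Xr -= np.outer(t, p)
        W.append(w)
        P.append(p)
    W, P = np.array(W).T, np.array(P).T
    return W @ np.linalg.solve(P.T @ W, np.array(q))

def loo_press(X, y, max_factors):
    """Leave-one-out prediction error sum of squares for 1..max_factors factors."""
    n = len(y)
    press = np.zeros(max_factors)
    for i in range(n):
        keep = np.arange(n) != i          # observation i is the test set
        x_mean, y_mean = X[keep].mean(axis=0), y[keep].mean()
        Xc, yc = X[keep] - x_mean, y[keep] - y_mean
        for a in range(1, max_factors + 1):
            b = pls1_coef(Xc, yc, a)
            y_hat = y_mean + (X[i] - x_mean) @ b
            press[a - 1] += (y[i] - y_hat) ** 2
    return press

# Ten collinear predictors driven by two latent factors (simulated assumption)
rng = np.random.default_rng(1)
latent = rng.normal(size=(25, 2))
X = latent @ rng.normal(size=(2, 10)) + 0.05 * rng.normal(size=(25, 10))
y = latent @ np.array([1.0, -1.0]) + 0.1 * rng.normal(size=25)

press = loo_press(X, y, max_factors=6)
best = int(np.argmin(press)) + 1          # factors with minimum total error
print(press.round(3), "-> chosen number of factors:", best)
```

Picking the minimum of `press` corresponds to the first rule above; the van der Voet (1994) alternative would instead pick the smallest model whose residuals are not significantly larger, which this sketch does not implement.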
PLS-Regression (PLS-R): A Powerful Alternative to PCR
• It is possible to obtain the same prediction results as PCR, but based on a smaller number of components, by allowing the y-data structure to intervene directly in the X-decomposition.
• This is done by condensing the two-stage PCR process into just one: PLS-R (Partial Least Squares Regression).
• Usually the term used is just PLS, which has also been interpreted to signify Projection to Latent Structures.
• PLS claims to do the same job as PCR, only with fewer bilinear components.
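A small simulated example may help show why letting y guide the decomposition can save components. It is only a sketch under stated assumptions: the data are constructed so that the largest X-variation is unrelated to y, and the names `v_pcr`, `w_pls` and `r2_one_component` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
big = 3.0 * rng.normal(size=n)     # large X-variation unrelated to y
small = 1.0 * rng.normal(size=n)   # smaller X-variation that drives y
X = np.column_stack([big, small])
y = small + 0.05 * rng.normal(size=n)
X -= X.mean(axis=0)
y -= y.mean()

# PCR: the first direction is the dominant eigenvector of X^T X (y is ignored)
eigvals, eigvecs = np.linalg.eigh(X.T @ X)   # eigenvalues in ascending order
v_pcr = eigvecs[:, -1]

# PLS: the first loading weight is proportional to X^T y (y guides the choice)
w_pls = X.T @ y
w_pls /= np.linalg.norm(w_pls)

def r2_one_component(direction):
    """Fraction of y-variance explained by regressing y on one score vector."""
    t = X @ direction
    y_hat = t * (t @ y) / (t @ t)
    return 1.0 - np.sum((y - y_hat) ** 2) / np.sum(y ** 2)

r2_pcr = r2_one_component(v_pcr)
r2_pls = r2_one_component(w_pls)
print(f"one-component R2: PCR = {r2_pcr:.3f}, PLS = {r2_pls:.3f}")
```

Here the first PCR component follows the high-variance column and explains little of y, while the first PLS component points mostly along the column that carries the y-relevant information. PCR would need its second component to reach a comparable fit.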
PLS(X, Y); Initial Comparison with PCA(X), PCA(Y)
• In a comparison between PCR and PLS, PLS uses the y-data structure, the y-variance, directly as a guiding hand in decomposing the X-matrix, so that the outcome constitutes an optimal regression, precisely in the strict prediction validation sense.
• A very first approximation to an understanding of how the PLS-approach works (though not entirely correct) is tentatively and simply to view it as two simultaneous PCA-analyses, PCA of X and PCA of Y.
• The equivalent PCA equations are presented in the following figure.
• Note how the score and loading complements in X are called T and P respectively (X also has an alternative W-loading in addition to the familiar P-loading), while these are called U and Q respectively for the Y-space.
$X = T P_A^T + E$
$Y = U Q_A^T + F$
• However PLS does not really perform two independent PCA-analyses on the two spaces.
• On the contrary, PLS actively connects the X- and Y-spaces by specifying the u-score vector(s) to act as the starting points for (actually instead of) the t-score vectors in the X-space decomposition.
w = loading weight; p = X-loading; q = Y-loading
• Thus the starting proxy-t1 is actually u1 in the PLS-R method, thereby letting the Y-data structure directly guide the otherwise much more “PCA-like” decomposition of X.
• Subsequently, u1 is substituted by t1 at the relevant stage in the PLS algorithm in which the Y-space is decomposed.
• The crucial point is that it is the u1 (reflecting the Y-space structure) that first influences the X-decomposition leading to calculation of the X-loadings, but these are now termed “w” (for “loading-weights”).
• Then the X-space t-vectors are calculated, formally in a “standard” PCA fashion, but necessarily based on this newly calculated w-vector.
• This t-vector is now immediately used as the starting proxy u1-vector, i.e. instead of u1, as described above, only symmetrically with the X- and the Y-spaces interchanged.
• By this means, the X-data structure also influences the “PCA (Y)-like” decomposition.
$B = W (P^T W)^{-1} Q^T$
• Thus, what might at first sight appear as two sets of independent PCA decompositions is in fact based on these interchanged score vectors.
• In this way we have achieved the goal of modeling the X- and Y-space interdependently. PLS actively reduces the influence of large X-variations which do not correlate with Y.
• PCR is based on the spectral decomposition of $X^T X$, where X is the matrix of variables, and PLS is based on the singular value decomposition of $X^T Y$.
• An alternative overview of PLS (indirect modeling) states that the overall goal is to use the variables to predict the responses in the population.
• This is achieved indirectly by extracting latent variables T and U from sampled variables and responses, respectively.
• The extracted factors T (also referred to as X-scores) are used to predict the Y-scores U, and then the predicted Y-scores are used to construct predictions for the responses.
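The exchange of score vectors described in this section can be sketched as one NIPALS-style iteration loop for a single component. This is a hedged illustration: the function name `nipals_pls2_component`, the starting choice for u, the normalisation of q (conventions vary between texts) and the simulated data are assumptions, and X and Y are assumed to be mean-centered.

```python
import numpy as np

def nipals_pls2_component(X, Y, tol=1e-10, max_iter=500):
    """One PLS2 component from centered X (n x K) and Y (n x M); a sketch."""
    u = Y[:, np.argmax(Y.var(axis=0))].copy()  # start u from a Y column
    for _ in range(max_iter):
        w = X.T @ u                # u acts as proxy for t: Y guides X
        w /= np.linalg.norm(w)     # loading weights w
        t = X @ w                  # X-scores t
        q = Y.T @ t                # t acts as proxy for u: X guides Y
        q /= np.linalg.norm(q)     # Y-loadings q
        u_new = Y @ q              # updated Y-scores u
        converged = np.linalg.norm(u_new - u) <= tol * np.linalg.norm(u_new)
        u = u_new
        if converged:
            break
    p = X.T @ t / (t @ t)          # X-loadings p
    return w, t, p, q, u

# Simulated data sharing one latent factor (an assumption for illustration)
rng = np.random.default_rng(3)
latent = rng.normal(size=(40, 1))
X = latent @ np.array([[1.0, -0.8, 0.6, 0.9, -0.5, 0.7]])
X += 0.3 * rng.normal(size=X.shape)
Y = latent @ np.array([[1.0, 0.7, -0.9]])
Y += 0.3 * rng.normal(size=Y.shape)
X -= X.mean(axis=0)
Y -= Y.mean(axis=0)

w, t, p, q, u = nipals_pls2_component(X, Y)

# The converged w should match the first left singular vector of X^T Y
w_svd = np.linalg.svd(X.T @ Y)[0][:, 0]
print("agreement up to sign:", abs(w @ w_svd))
```

The final comparison ties the iteration back to the statement that PLS is based on the singular value decomposition of the X-Y cross-product: at convergence, `w` should agree with the dominant left singular vector up to sign.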
Interpretation of PLS models
• In principle PLS models are interpreted in much the same way as PCA and PCR models.
• Plotting the X- and the Y-loadings in the same plot allows you to study the inter-variable relationship, now also including the relationship between the X- and Y-variables.
• Since PLS focuses on Y, the Y-relevant information is usually expected already in early components.
• There are however situations where the variation related to Y is very subtle, so many components will be necessary to explain enough of Y.
Loadings (p) and loading weights (w)
• The P-loadings are very much like the well-known PCA-loadings; they express the relationship between the raw data matrix X and its scores, T (in PLS these may be called PLS scores).
• These loadings may be interpreted in the same way as in PCA or PCR, so long as one is aware that the scores have been calculated by PLS.
• In many PLS applications P and W are quite similar. This means that the dominant structures in X “happen” to be directed more or less along the same directions as those with maximum correlation to Y.
• The loading weights, W, however, represent the effective loadings directly connected to building the sought-for regression relationship between X and Y.
• In PLS there is also a set of Y-loadings, Q, which are the regression coefficients from the Y-variables onto the scores, U.
• Q and W may be used to interpret relationships between the X- and Y-variables, and to interpret the patterns in the score plots related to these loadings.
Loading plot of non-spectra variables
Loading plot of spectra variables
• The fact that both P and W are important, however, is clear from the construction of the formal regression equation Y = XB from any specific PLS solution with A components.
• This B matrix is calculated from:
$B = W (P^T W)^{-1} Q^T$
• This B-matrix is often used for practical (numerical) prediction purposes.
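The B-matrix formula can be checked numerically with a short sketch. It assumes mean-centered X and Y and takes each loading weight from the singular value decomposition of the current cross-product matrix, which is one way (not necessarily the one used in the slides' software) of obtaining the converged weights; `pls2_fit` and the simulated data are hypothetical.

```python
import numpy as np

def pls2_fit(X, Y, n_components):
    """PLS2 sketch on centered X (n x K), Y (n x M); returns B and the factors."""
    Xr, Yr = X.copy(), Y.copy()
    W, P, Q, T = [], [], [], []
    for _ in range(n_components):
        w = np.linalg.svd(Xr.T @ Yr)[0][:, 0]  # loading weight (unit length)
        t = Xr @ w                             # X-scores
        tt = t @ t
        p = Xr.T @ t / tt                      # X-loadings
        q = Yr.T @ t / tt                      # Y-loadings
        Xr -= np.outer(t, p)                   # deflate X
        Yr -= np.outer(t, q)                   # deflate Y
        W.append(w)
        P.append(p)
        Q.append(q)
        T.append(t)
    W, P, Q, T = (np.array(m).T for m in (W, P, Q, T))
    B = W @ np.linalg.solve(P.T @ W, Q.T)      # B = W (P^T W)^{-1} Q^T
    return B, T, W, P, Q

# Simulated data (an assumption made purely for illustration)
rng = np.random.default_rng(4)
X = rng.normal(size=(50, 8))
Y = X[:, :3] @ rng.normal(size=(3, 2)) + 0.1 * rng.normal(size=(50, 2))
X -= X.mean(axis=0)
Y -= Y.mean(axis=0)

B, T, W, P, Q = pls2_fit(X, Y, n_components=3)
Y_hat_from_B = X @ B            # prediction directly from the X-variables
Y_hat_from_scores = T @ Q.T     # prediction via the scores
```

In this sketch the two predictions coincide up to rounding, which is why B can be applied directly to new (centered) X data without recomputing scores. Both P and W enter the formula because the scores relate to the original X through $T = X W (P^T W)^{-1}$.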
When to use which method?
• The PLS approach is easy to understand conceptually and is to be preferred because it is direct and effective.
• PLS is said to produce results that are easier to interpret because they are less complex (using fewer components).
• Often PCR may give prediction errors as low as those of PLS, but almost invariably by using more PCs to do the job.
• PLS2 is a natural method to start with when there are many Y-variables.
• You quickly get an overview of the basic patterns and see if there is significant correlation between the Y-variables.
• PLS2 may actually in a few cases even give better results if Y is collinear, because it utilises all the available information in Y.
• The drawback is that you may need different numbers of PCs for the different Y-variables, which you must remember at interpretation and prediction.
Exercise: Interpretation of PLS (Jam)