Partial Least Squares (PLS): Path Modeling
Method Talk, Winter Term 2015/16
Pascal Stichler
Outline
1 Introduction to PLS
2 Putting PLS in Context
3 Model Definition
4 Solution Algorithm
5 Model Evaluation
6 Wrap-Up
Today’s Lecture
Objectives
1 Evaluate when to use PLS
2 Learn how PLS works and how to use it
3 Investigate how to evaluate a PLS model, interpret the results and adjust the model accordingly
1 Introduction to PLS
PLS: A silver bullet?
Partial Least Squares Path Modeling is a statistical data analysis methodology that exists at the intersection of Regression Models, Structural Equation Models, and Multiple Table Analysis methods [9]
Goal: Use theoretical knowledge about the structure of latent variables to predict indicators based on data
- Do so with the fewest possible distributional assumptions
- PLS-PM is known under several names: PLS-PM, PLS-SEM, component-based structural equation modeling, projection to latent structures, soft modeling etc.
- Developed by Herman Wold in the mid-1960s under the term "soft modeling" [14]
- After its initial introduction and discussion it received little attention until the late 1990s; since then interest has risen sharply
Why use PLS?
PLS-PM is worth considering when ...
Structural model
- ... you have a theoretical model that involves latent variables
- ... the phenomenon you investigate is relatively new and measurement models need to be newly developed
- ... the structural equation model is complex with a large number of latent variables and indicator variables [12]
Observed variables
- ... you have small sample sets (e.g. more variables than observations) [7]
- ... you have non-normally distributed data
- ... you have multicollinearity problems
- ... you have formative and reflective measures (to be discussed)
- ... you need minimum requirements regarding measurement scales (e.g. ratio and nominal variables)
- ... you need minimum requirements regarding the distribution of residuals [1]
2 Putting PLS in Context
General Overview
Types of PLS:
- PLS-Path Modeling: Component-based modeling based on a theoretical structural model. Mainly used in: social sciences, econometrics, marketing and strategic management
- PLS-Regression: Regression-based approach investigating the linear relationship between multiple independent variables and dependent variable(s). Mainly used in: chemometrics, bioinformatics, sensometrics, neuroscience and anthropology
- OPLS: Orthogonal projection improves interpretability
- PLS-DA: Used when Xr is categorical
- CB-SEM: Covariance-based structural equation modelling
[Figure: diagram relating LVp, LVr, Xp and Xr]
- Predictors Xp ⊂ X
- Responses Xr ⊂ X with Xp ∩ Xr = ∅
- Exogenous latent variables LVp ⊂ LV
- Endogenous latent variables LVr ⊂ LV with LVp ∩ LVr = ∅
PLS-PM vs. CB-SEM
The two methods differ from a statistical point of view. Hence, neither technique is generally superior to the other and neither is appropriate for all situations. In general, the strengths of PLS-SEM are CB-SEM’s weaknesses, and vice versa. [3]
PLS-PM (PLS-SEM), variance-based
- The goal is prediction and theory development
- Formatively measured constructs are part of the structural model
- The structural model is complex
- The sample size is small and/or the data are non-normally distributed
- The plan is to use latent variable scores in subsequent analyses
- Available software: SmartPLS, PLSGraph, R packages (plspm) etc.
CB-SEM, covariance-based
- The goal is theory testing, theory confirmation, or the comparison of alternative theories
- Error terms require additional specification, such as the covariation
- The structural model has non-recursive relationships
- The research requires a global goodness-of-fit criterion
- Available software: LISREL, AMOS, EQS etc.
Based on [8], [4], [11]
Comparison
[Figure: CB-SEM and PLS-PM compared along two axes, Prediction and Theory Testing]
3 Model Definition
Exemplary Model
[Figure: path diagram of three latent variables with three indicators each: LV1 (X11, X12, X13), LV2 (X21, X22, X23) and LV3 (X31, X32, X33). The indicator blocks form the Measurement Model/Outer Model; the relations among the latent variables form the Structural Model/Inner Model.]
Formal definition:
- X: data set with n observations and m variables
- X can be divided into J mutually exclusive blocks X1, . . . , XJ; block Xj contains K variables Xj1, . . . , XjK
- Each block Xj is associated with a latent variable LVj; its estimate (the "score") is denoted by L̂Vj = Yj
- LV1 and LV3: reflective blocks; LV2: formative block [9]
Structural Model (Inner Model)
1 Linear Relationship
All relationships are considered linear and can be written as
$LV_j = \beta_0 + \sum_{i \to j} \beta_{ji} LV_i + \varepsilon_j$
The coefficients $\beta_{ji}$ are the path coefficients
2 Recursive Model mandatory
Causality flow must be unidirectional (no loops)
3 Regression Specification (Predictor Specification)
$E(LV_j \mid LV_i) = \beta_{0i} + \sum_{i \to j} \beta_{ji} LV_i$
This specifies that the regression is linear, under the assumption that $\operatorname{cov}(LV_j, \varepsilon_j) = 0$ and $E(\varepsilon_j) = 0$
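The recursiveness requirement can be checked mechanically. The following is a minimal Python sketch, assuming the inner model is stored as a 0/1 path matrix in which entry [j][i] = 1 denotes a path LVi → LVj. The function name `is_recursive` and this matrix layout are illustrative assumptions, not something defined in the slides.

```python
def is_recursive(path_matrix):
    """Return True if the inner model contains no loops (is acyclic).

    path_matrix[j][i] == 1 means there is a path LV_i -> LV_j
    (assumed layout: rows are targets, columns are sources).
    """
    n = len(path_matrix)
    # Number of incoming paths per latent variable.
    indegree = [sum(row) for row in path_matrix]
    # Latent variables without incoming paths can be processed first.
    ready = [j for j in range(n) if indegree[j] == 0]
    visited = 0
    while ready:
        i = ready.pop()
        visited += 1
        # Remove all paths leaving LV_i and release newly free targets.
        for j in range(n):
            if path_matrix[j][i]:
                indegree[j] -= 1
                if indegree[j] == 0:
                    ready.append(j)
    # Every latent variable is reached only if there is no cycle.
    return visited == n


chain = [[0, 0, 0],
         [1, 0, 0],
         [1, 1, 0]]   # LV1 -> LV2, LV1 -> LV3, LV2 -> LV3
loop = [[0, 1],
        [1, 0]]       # LV1 -> LV2 and LV2 -> LV1
print(is_recursive(chain), is_recursive(loop))
```

A lower-triangular path matrix with a zero diagonal always passes this check, which is one reason such a layout is convenient for specifying inner models.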
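Measurement Model (Outer Model)
Reflective Indicators
[Figure: LV1 with indicators X11, X12, X13 and loadings λ11, λ12, λ13]
- Linear relationships: $X_{jk} = \lambda_{0jk} + \lambda_{jk} LV_j + \varepsilon_{jk}$ ($\lambda_{jk}$ is called the loading)
- Regression specification: $E(X_{jk} \mid LV_j) = \lambda_{0jk} + \lambda_{jk} LV_j$
- Characteristics: unidimensional, correlated, Xjk "fully relevant"
Formative Indicators
[Figure: LV2 with indicators X21, X22, X23 and coefficients λ21, λ22, λ23]
- Linear relationships: $LV_j = \lambda_{0j} + \lambda_{jk} X_{jk} + \varepsilon_j$
- Regression specification: $E(LV_j \mid X_{jk}) = \lambda_{0j} + \lambda_{jk} X_{jk}$
- Characteristics: multidimensional, uncorrelated, Xjk "partly relevant"
MIMIC*
[Figure: LV3 with indicators X31, X32, X33 and coefficients λ31, λ32, λ33]
- Linear relationships: equivalent to reflective and formative (depending on indicator)
- Regression specification: equivalent to reflective and formative (depending on indicator)
- Not possible in the R package plspm
*multiple effect indicators for multiple causes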
Weight Relations (Scores)
- The latent variables are only virtual entities
- However, as all linear relations depend on the latent variables, they need a representation: Weight Relations
Score: $\widehat{LV}_j = Y_j = \sum_k w_{jk} X_{jk}$
- The score, as a representation of the latent variable, is calculated as a weighted sum of its indicators (similar to the approach in principal component analysis)
- Because of this, PLS is called a component-based approach
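As a small illustration of the weight relation, the sketch below computes a score as a weighted sum of one block's indicators. The helper names `standardize` and `latent_score` and the toy data are assumptions made for this example only; the weights are simply given here rather than estimated.

```python
import math


def standardize(values):
    """Center to mean 0 and scale to (population) standard deviation 1."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]


def latent_score(block, weights):
    """Weighted sum of indicators: Y_j = sum_k w_jk * X_jk.

    block   -- list of indicator columns, each a list of n observations
    weights -- one outer weight per indicator column
    """
    n_obs = len(block[0])
    return [sum(w * col[t] for w, col in zip(weights, block))
            for t in range(n_obs)]


# Toy block with two indicators and three observations (made-up numbers).
block = [[1.0, 2.0, 3.0],
         [2.0, 4.0, 6.0]]
score = latent_score(block, [1.0, 0.5])
print(score)               # raw weighted sums
print(standardize(score))  # scores are usually standardized afterwards
```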
4 Solution Algorithm
PLS-PM Algorithm Overview
Stage 1: Get the weights to compute latent variable scores → Most important and most difficult
Stage 2: Estimate the path coefficients (inner model) → Usually done via OLS
Stage 3: Obtain the loadings (outer model) → Calculation of correlations
Stage 1: Latent Variable Scores
Start: Initial arbitrary outer weights (e.g. $w_{jk} = 1$)
Step 1: Compute the external approximation of latent variables: $Y_j = \sum_k w_{jk} X_{jk}$
Step 2: Obtain inner weights $e_{ij}$
Inner weighting schemes:
• Centroid scheme
• Factor scheme
• Path scheme
Step 3: Compute the internal approximation of latent variables: $Z_j = \sum_{i \leftrightarrow j} e_{ij} Y_i$
Step 4: Calculate new outer weights $w_{jk}$
• Mode A: Simple regression
• Mode B: Multiple regression
• (Mode C: Combination)
Check for convergence of outer weights; if not converged, return to Step 1
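The iteration of Stage 1 can be sketched in a few lines of plain Python. This is a simplified illustration under stated assumptions, not the plspm implementation: it uses Mode A for every block, the centroid scheme for the inner weights, population covariances, and it assumes every latent variable is connected to at least one other. The function names (`pls_stage1`, `outer_scores`) and the toy data are made up for the example.

```python
import math


def standardize(values):
    """Center to mean 0 and scale to (population) standard deviation 1."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]


def cov(a, b):
    """Population covariance of two equally long lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / n


def outer_scores(blocks, weights):
    """Step 1: standardized weighted sums Y_j of each block's indicators."""
    n_obs = len(blocks[0][0])
    return [standardize([sum(w * col[t] for w, col in zip(ws, block))
                         for t in range(n_obs)])
            for ws, block in zip(weights, blocks)]


def pls_stage1(blocks, path_matrix, tol=1e-6, maxiter=100):
    """Iterate outer weights (Mode A, centroid scheme) until they settle."""
    blocks = [[standardize(col) for col in block] for block in blocks]
    n_lv, n_obs = len(blocks), len(blocks[0][0])
    weights = [[1.0] * len(block) for block in blocks]  # Start: w_jk = 1
    for _ in range(maxiter):
        scores = outer_scores(blocks, weights)          # Step 1
        # Step 2 (centroid): e_ij is the sign of the correlation between
        # connected latent variables, and 0 for unconnected ones.
        inner = [[math.copysign(1.0, cov(scores[i], scores[j]))
                  if i != j and (path_matrix[i][j] or path_matrix[j][i])
                  else 0.0
                  for j in range(n_lv)] for i in range(n_lv)]
        # Step 3: internal approximation Z_j = sum_i e_ij * Y_i
        proxies = [[sum(inner[i][j] * scores[i][t] for i in range(n_lv))
                    for t in range(n_obs)] for j in range(n_lv)]
        # Step 4 (Mode A): covariance of each indicator with Z_j, i.e. the
        # simple-regression slope up to a scale that standardization removes.
        new_weights = [[cov(col, z) for col in block]
                       for block, z in zip(blocks, proxies)]
        change = sum((a - b) ** 2
                     for old, new in zip(weights, new_weights)
                     for a, b in zip(old, new))
        weights = new_weights
        if change < tol:                                # convergence check
            break
    return weights, outer_scores(blocks, weights)


# Toy data: two blocks with two indicators each, six observations.
blocks = [[[1, 2, 3, 4, 5, 6], [2, 1, 4, 3, 6, 5]],
          [[1, 3, 2, 5, 4, 6], [2, 2, 3, 4, 6, 5]]]
path_matrix = [[0, 0],
               [1, 0]]  # LV1 -> LV2
weights, scores = pls_stage1(blocks, path_matrix)
print(weights)
```

Standardizing the scores in every pass keeps the scale of the weights from drifting, so only their relative sizes within a block matter.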
Stage 2 & 3
Stage 2: Path Coefficients
The path coefficient estimates $\hat{\beta}_{ji} = B_{ji}$ are usually calculated by ordinary least squares in the multiple regression of $Y_j$ on the $Y_i$'s related to it
$Y_j = \sum_{i \to j} \hat{\beta}_{ji} Y_i$
If high multicollinearity occurs, PLS regression can be applied instead [11]
Stage 3: Loadings
For convenience and simplicity, loadings are preferably calculated as correlations between a latent variable and its indicators:
$\hat{\lambda}_{jk} = \operatorname{cor}(X_{jk}, Y_j)$
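Stages 2 and 3 can be illustrated with a short Python sketch. It assumes the latent variable scores are already available and standardized, in which case the OLS coefficients solve the normal equations R b = r, with R the correlation matrix of the predictor scores and r their correlations with the target score. The function names and the toy numbers are illustrative assumptions.

```python
import math


def cor(a, b):
    """Pearson correlation of two equally long lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    sab = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    saa = sum((x - mean_a) ** 2 for x in a)
    sbb = sum((y - mean_b) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination.

    Assumes a is square and non-singular (no perfectly collinear scores).
    """
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[pivot] = m[pivot], m[c]
        for r in range(n):
            if r != c:
                factor = m[r][c] / m[c][c]
                m[r] = [x - factor * y for x, y in zip(m[r], m[c])]
    return [m[i][n] / m[i][i] for i in range(n)]


def path_coefficients(target, predictors):
    """Stage 2: standardized OLS coefficients of target on its predictors."""
    r_xx = [[cor(p, q) for q in predictors] for p in predictors]
    r_xy = [cor(p, target) for p in predictors]
    return solve(r_xx, r_xy)


def loadings(block, score):
    """Stage 3: correlation of each indicator with its block's score."""
    return [cor(col, score) for col in block]


# Toy scores: Y3 depends on two uncorrelated predictor scores Y1 and Y2.
y1 = [1.0, -1.0, 1.0, -1.0]
y2 = [1.0, 1.0, -1.0, -1.0]
y3 = [0.5 * a + 0.25 * b for a, b in zip(y1, y2)]
print(path_coefficients(y3, [y1, y2]))
print(loadings([[1.0, 2.0, 3.0]], [2.0, 4.0, 6.0]))
```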
PLS-PM usage in R (package plspm)
Parameters to define the PLS Path Model
Data          Data for the model
path_matrix   Definition of inner model
blocks        List defining the blocks of variables of the outer model
scaling       List defining the measurement scale of variables for non-metric data
modes         Vector defining the measurement mode of each block
Parameters related to the PLS-PM algorithm
scheme        Inner path weighting scheme
scaled        Indicates whether the data should be standardized
tol           Tolerance threshold for checking convergence of the iterative stages
maxiter       Maximum number of iterations
plscomp       Indicates the number of PLS components when handling non-metric data
Additional parameters
boot.val      Indicates whether bootstrap validation must be performed
br            Number of bootstrap resamples
dataset       Indicates whether the data matrix should be retrieved
5 Model Evaluation
Interpreting the Results
In PLS the real challenge is interpreting the results and making well-founded adjustments to the model ([9], p. 54)
Steps of Model Assessment:
1 Assessment of the Measurement Model (Outer Model)
2 Assessment of the Structural Model (Inner Model)
(It is important to keep this order due to model dependencies)
1. Measurement Model Assessment (Outer Model)
- Formative Blocks: Evaluation relatively straightforward
- Reflective Blocks: Evaluation rather complex ⇒ Test theory applied
Formative Blocks: Variables are considered as causing the latent variable
- They do not necessarily measure the same underlying construct
- They are not supposed to be correlated
- Compare outer weights to check which indicator contributes most efficiently
- Elimination of variables should be based on multicollinearity
Reflective Blocks: Variables are considered as measuring the same underlying construct
- Hence they need a strong mutual association
- Further, they should be strongly related to their latent variable
1 Unidimensionality of indicators
2 Indicators well explained
3 Constructs differ from each other
Deep Dive: Reflective Indicators
1 Unidimensionality of indicators: All for one and one for all
(a) Cronbach’s alpha
Measures the average inter-variable correlation (considered good if > 0.7)
(b) Dillon-Goldstein’s rho
Focuses on the variance of the sum of variables; considered a better indicator than Cronbach’s alpha ([1], p. 320) (considered good if > 0.7; see [11], [13] p. 50 for the formal definition)
(c) First eigenvalue
The first eigenvalue of the correlation matrix should be larger than one and the second one significantly smaller (preferably smaller than 1)
2 Loadings & Communalities: Indicators well explained
- Loadings are considered for each indicator (considered good if > 0.7)
- Communalities (squared loadings): amount of indicator variance explained by its corresponding LV
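Two of the unidimensionality measures are easy to compute by hand. The sketch below uses the textbook variance form of Cronbach's alpha and a commonly stated form of Dillon-Goldstein's rho that takes the block's loadings as input; how those loadings are obtained (typically from the first principal component of the block) is left out and assumed given. Function names and numbers are illustrative.

```python
def variance(values):
    """Population variance of a list of numbers."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / n


def cronbach_alpha(block):
    """alpha = k/(k-1) * (1 - sum of item variances / variance of the sum).

    block -- list of k indicator columns with n observations each (k >= 2)
    """
    k = len(block)
    totals = [sum(obs) for obs in zip(*block)]
    item_variance = sum(variance(col) for col in block)
    return k / (k - 1) * (1 - item_variance / variance(totals))


def dillon_goldstein_rho(loadings):
    """rho = (sum l)^2 / ((sum l)^2 + sum (1 - l^2)) for block loadings l."""
    squared_sum = sum(loadings) ** 2
    return squared_sum / (squared_sum + sum(1 - l ** 2 for l in loadings))


def communalities(loadings):
    """Squared loadings: share of indicator variance explained by the LV."""
    return [l ** 2 for l in loadings]


# Three identical indicators are perfectly consistent with each other.
print(cronbach_alpha([[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4]]))
print(dillon_goldstein_rho([0.8, 0.8]))
print(communalities([0.8, 0.6]))
```

Both indices can then be compared against the 0.7 rule of thumb mentioned above.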
3 Cross-loadings: Constructs differ from each other
Cross-loadings ≙ loadings of an indicator with the rest of the latent variables
Goal: Ensure that the shared variance between a construct and its indicators is higher than with other constructs (no "traitor" indicators)
⇒ Loadings should always be highest for the respective block
[. . . ]$crossloadings
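The "traitor" check can be expressed as a small routine. The following sketch assumes the latent variable scores are already computed and compares, for every indicator, its correlation with its own block's score against its correlations with all other scores. The names `crossloadings` and `traitors` and the toy data are hypothetical and unrelated to the plspm output shown above.

```python
import math


def cor(a, b):
    """Pearson correlation of two equally long lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    sab = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    saa = sum((x - mean_a) ** 2 for x in a)
    sbb = sum((y - mean_b) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)


def crossloadings(blocks, scores):
    """For each block and indicator, the correlations with every LV score."""
    return [[[cor(col, score) for score in scores] for col in block]
            for block in blocks]


def traitors(blocks, scores):
    """Return (block, indicator) index pairs that load higher elsewhere."""
    found = []
    for j, block_rows in enumerate(crossloadings(blocks, scores)):
        for k, row in enumerate(block_rows):
            own = abs(row[j])
            if any(abs(r) > own for i, r in enumerate(row) if i != j):
                found.append((j, k))
    return found


# Toy example: the second indicator of block 0 actually tracks LV 1.
scores = [[1.0, 2.0, 3.0, 4.0, 5.0],
          [2.0, 1.0, 4.0, 3.0, 6.0]]
blocks = [[[1.0, 2.0, 3.0, 4.0, 5.0], [2.0, 1.0, 4.0, 3.0, 6.0]],
          [[2.0, 1.0, 4.0, 3.0, 6.0]]]
print(traitors(blocks, scores))
```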
2. Structural Model Assessment (Inner Model)
Standard OLS regression output:
3 further indicators of model quality:
- R² determination coefficient: Amount of variance of an endogenous LV explained by its independent LVs (considered low below 0.3 and high above 0.6)
- Redundancy Index: Amount of variance in the endogenous block that is explained by its independent LVs (defined as $Rd(LV_j, x_{jk}) = \text{loading}_{jk}^2 \, R_j^2$)
- Goodness-of-Fit (GoF): No single criterion exists for the overall quality of a model. GoF as a pseudo criterion: $GoF = \sqrt{\overline{\text{communality}} \times \overline{R^2}}$ (considered good if > 0.7) [10] [11]
- Validation: Resampling (bootstrapping, jackknifing) is possible; more traditional approaches are not (as no assumptions are made on the distribution)
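The redundancy index and the GoF are simple arithmetic once loadings and R² values are known. The sketch below assumes they are given as plain lists and takes unweighted means for the GoF; which indicators and latent variables enter those means is a modelling decision not fixed here. Function names and numbers are made up for illustration.

```python
import math


def redundancy(loading, r_squared):
    """Rd = loading^2 * R^2 for one indicator of an endogenous block."""
    return loading ** 2 * r_squared


def goodness_of_fit(communalities, r_squared_values):
    """GoF = sqrt(mean communality * mean R^2)."""
    mean_communality = sum(communalities) / len(communalities)
    mean_r_squared = sum(r_squared_values) / len(r_squared_values)
    return math.sqrt(mean_communality * mean_r_squared)


# Illustrative values: loadings 0.8 and 0.6, one endogenous LV with R^2 = 0.5.
print(redundancy(0.8, 0.5))
print(goodness_of_fit([0.8 ** 2, 0.6 ** 2], [0.5]))
```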
6 Wrap-Up
Summary: PLS
Advisable for the following conditions (based on [8])
Focus                            Prediction and theory development
Distribution                     Minimum assumptions made regarding indicator distribution
Sample size                      Small sample size possible (however questioned in literature [2], [6], [5])
Model definition
Indicators                       Define blocks of variables and respective latent variables
Measurement Model                Define relations (formative/reflective)
Structural Model                 Define internal model
Interpreting the results
Measurement Model (formative)    Eliminate multicollinearity
Measurement Model (reflective)   Unidimensionality, loadings & communalities and cross-loadings
Structural Model                 Consider R², redundancy index and GoF
Validation                       Apply resampling (bootstrapping, jackknifing)
Bibliography
[1] W. W. CHIN. The partial least squares approach to structural equation modeling. In: Modern methods for business research, Vol. 295, No. 2 (1998), pp. 295–336.
[2] D. GOODHUE, W. LEWIS, and R. THOMPSON. PLS, small sample size, and statistical power in MIS research. In: System Sciences, 2006. HICSS’06. Proceedings of the 39th Annual Hawaii International Conference on. Vol. 8. IEEE. 2006, 202b–202b.
[3] J. F. HAIR JR et al. A primer on partial least squares structural equation modeling (PLS-SEM). Sage Publications, 2013.
[4] J. F. HAIR, C. M. RINGLE, and M. SARSTEDT. PLS-SEM: Indeed a silver bullet. In: Journal of Marketing Theory and Practice, Vol. 19, No. 2 (2011), pp. 139–152.
[5] G. A. MARCOULIDES, W. W. CHIN, and C. SAUNDERS. A critical look at partial least squares modeling. In: MIS Quarterly (2009), pp. 171–175.
[6] G. A. MARCOULIDES and C. SAUNDERS. Editor’s comments: PLS: a silver bullet? In: MIS Quarterly, Vol. 30, No. 2 (2006), pp. iii–ix.
[7] B.-H. MEVIK and R. WEHRENS. The pls package: principal component and partial least squares regression in R. In: Journal of Statistical Software, Vol. 18, No. 2 (2007), pp. 1–24.
[8] W. REINARTZ, M. HAENLEIN, and J. HENSELER. An empirical comparison of the efficacy of covariance-based and variance-based SEM. In: International Journal of Research in Marketing, Vol. 26, No. 4 (2009), pp. 332–344.
[9] G. SANCHEZ. PLS path modeling with R. In: Online, January (2013).
[10] M. TENENHAUS, S. AMATO, and V. ESPOSITO VINZI. A global goodness-of-fit index for PLS structural equation modelling. In: Proceedings of the XLII SIS scientific meeting. Vol. 1. CLEUP Padova. 2004, pp. 739–742.
[11] M. TENENHAUS et al. PLS path modeling. In: Computational Statistics & Data Analysis, Vol. 48, No. 1 (2005), pp. 159–205.
[12] N. URBACH and F. AHLEMANN. Structural equation modeling in information systems research using partial least squares. In: Journal of Information Technology Theory and Application, Vol. 11, No. 2 (2010), pp. 5–40.
[13] V. E. VINZI, L. TRINCHERA, and S. AMATO. PLS path modeling: from foundations to recent developments and open issues for model assessment and improvement. In: Handbook of partial least squares. Springer, 2010, pp. 47–82.
[14] H. WOLD et al. Estimation of principal components and related models by iterative least squares. In: Multivariate analysis, Vol. 1 (1966), pp. 391–420.