Bayesian Structural Equations Modeling (SEM)

M’hamed (Hamy) Temkit

Division of Biostatistics, Mayo Clinic, Arizona

Applied Statistics Seminar, November 17, 2016

Outline

Introduction to SEM

Covariance Analysis

SEM Estimation (GLS vs MLE)

The General Model of SEM

lavaan

Bayesian Paradigm

Bayesian SEM

Bayesian CFA

blavaan

CONCLUSION

Motivation

Two Paradigms

Covariance Analysis

Σ = Σ(θ)

Bayesian Inference

p(θ | y) ∝ p(y | θ) p(θ)

Brief SEM Terminology

Measurement model

Structural model

Endogenous latent variables

Exogenous latent variables

Background

Factor Analysis (Spearman, 1904)

Path Analysis (Sewall Wright, 1918, 1921, 1934, 1960)

Confirmatory Factor Analysis (CFA) (Jöreskog, 1969)

General SEM (Jöreskog, 1973; Wiley, 1973)

LISREL model (Wiley, 1973; Jöreskog, 1977)

Generalized least squares (Browne, 1974, 1982, 1984)

Relevant Reading References

Structural Equations With Latent Variables (Bollen, 1989)

Structural Equation Modeling With AMOS (Byrne)

Latent Curve Models (Bollen and Curran, 2006)

Structural Equation Modeling: A Bayesian Approach (Sik-Yum Lee, 2007)

Structural Equation Modeling: A Multidisciplinary Journal

First Principle: Linear Regression

Linear Regression: The Machinery

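$$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \quad i = 1, \dots, n \quad \text{(regression line)}$$

$$\min_{\beta_0, \beta_1} \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2 \quad \text{(OLS)}$$

and if the $\varepsilon_i \sim N(0, \sigma^2)$ are iid,

$$\max_{\beta_0, \beta_1, \sigma^2} \; (2\pi\sigma^2)^{-n/2} \exp\Big(-\frac{1}{2\sigma^2} \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2\Big) \quad \text{(ML)}$$

$$\hat{\beta} \sim N\big(\beta, \sigma^2 (X'X)^{-1}\big)$$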
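The equivalence of the OLS and normal-theory ML estimates of the regression coefficients can be checked numerically. The sketch below uses simulated data; the sample size, true coefficients, and starting values are assumptions chosen only for illustration.

```r
# Simulated data (hypothetical values, for illustration only)
set.seed(1)
n <- 200
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n, sd = 0.5)

# OLS: minimize the residual sum of squares
ols <- coef(lm(y ~ x))

# ML: minimize the negative normal log-likelihood over (beta0, beta1, log sigma)
negloglik <- function(par) {
  mu <- par[1] + par[2] * x
  -sum(dnorm(y, mean = mu, sd = exp(par[3]), log = TRUE))
}
ml <- optim(c(0, 0, 0), negloglik, method = "BFGS")

# The two sets of regression coefficients agree up to optimizer tolerance
print(rbind(OLS = ols, ML = ml$par[1:2]))
```

The error standard deviation is parameterized on the log scale so that the optimizer can search without constraints.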

Pros and Cons of Regression (Linear Models)

Oversimplified view of the phenomenon

Ignores measurement error (covariates are treated as fixed)

Does not handle simultaneous equations in general (e.g., mediation)

Lacks the flexibility to fit SEM models

What is SEM

A melding of factor analysis and path (regression) analysis into one comprehensive statistical methodology

Simultaneous equation modeling

Does the model-implied covariance matrix match the observed covariance matrix?

Degree to which they match represents the goodness of fit

Estimation (graph)

[Figure: path diagram of the example model, labeled with its parameter estimates]

Estimation (equations)

Measurement Model:

x1 = a1 + epistemology + e1

x2 = a2 + b2 epistemology + e2

x3 = a3 + tolerance + e3

x4 = a4 + b4 tolerance + e4

x5 = a5 + engagement + e5

x6 = a6 + b6 engagement + e6

x7 = a7 + range + e7

x8 = a8 + b8 range + e8

Structural Model:

tolerance = a9 + b9 epistemology + e9

range = a10 + b10 tolerance + b11 engagement + e10

cov(epistemology, engagement) ≠ 0

Estimation: objective function

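Sample covariance matrix:

$$S = \begin{pmatrix} \frac{1}{n-1}\sum_{i=1}^{n}(x_{1i}-\bar{x}_1)^2 & \frac{1}{n-1}\sum_{i=1}^{n}(x_{1i}-\bar{x}_1)(x_{2i}-\bar{x}_2) & \cdots & \text{cov}(x_1, x_8) \\ \text{cov}(x_1, x_2) & \text{var}(x_2) & \cdots & \text{cov}(x_2, x_8) \\ \vdots & \vdots & \ddots & \vdots \\ \text{cov}(x_1, x_8) & \text{cov}(x_2, x_8) & \cdots & \text{var}(x_8) \end{pmatrix}$$

Model-implied covariance matrix:

$$\Sigma(\theta) = \text{cov}(x_1, x_2, \dots, x_8) = \begin{pmatrix} \text{var}(x_1) & \text{cov}(x_1, x_2) & \cdots & \text{cov}(x_1, x_8) \\ \text{cov}(x_1, x_2) & \text{var}(x_2) & \cdots & \text{cov}(x_2, x_8) \\ \vdots & \vdots & \ddots & \vdots \\ \text{cov}(x_1, x_8) & \text{cov}(x_2, x_8) & \cdots & \text{var}(x_8) \end{pmatrix}$$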

S ≈ Σ(θ)

Basically, minimize f (Σ(θ), S)

Generalized Least Squares (GLS)

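$$x_1, \dots, x_n \sim N(0, \Sigma(\theta_0)) \text{ iid}, \quad x_i \in \mathbb{R}^p$$

$$\operatorname{vec}(S) \xrightarrow{L} N(\operatorname{vec}\Sigma(\theta_0), C)$$

$$G(\theta) = 2^{-1} \operatorname{tr}\big\{[(S - \Sigma(\theta))V]^2\big\}, \quad V > 0$$

$$\hat{\theta} \xrightarrow{L} N(\theta_0, D(\theta_0))$$

$$n\,G(\hat{\theta}) \xrightarrow{L} \chi^2_{p^* - q}, \quad p^* = \frac{p(p+1)}{2}, \quad q \text{ free parameters}$$

$$H_0: \Sigma = \Sigma(\theta) \quad \text{vs} \quad H_a: \Sigma \neq \Sigma(\theta)$$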
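A minimal sketch of the GLS discrepancy function in base R. The 2 x 2 matrix `S` is a hypothetical sample covariance matrix, and the weight matrix defaults to the common choice V = S⁻¹ (an assumption; any positive definite V is allowed by the definition).

```r
# GLS discrepancy: G = 0.5 * tr[ ((S - Sigma) V)^2 ], with V = S^{-1} by default
gls_discrepancy <- function(S, Sigma, V = solve(S)) {
  A <- (S - Sigma) %*% V
  0.5 * sum(diag(A %*% A))
}

# Hypothetical sample covariance matrix
S <- matrix(c(2.0, 0.8,
              0.8, 1.5), nrow = 2, byrow = TRUE)

g_saturated    <- gls_discrepancy(S, S)              # model reproduces S exactly
g_independence <- gls_discrepancy(S, diag(diag(S)))  # model forces cov(x1, x2) = 0

print(c(saturated = g_saturated, independence = g_independence))
```

The discrepancy is zero when the implied matrix equals S and grows as the model ignores more of the observed covariance structure.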

Maximum Likelihood (ML)

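$$x_1, \dots, x_n \sim N(\mu_0, \Sigma(\theta_0)) \text{ iid}, \quad x_i \in \mathbb{R}^p$$

$$(n - 1)S \sim W_p(R_0, \rho_0)$$

$$F(\theta) = \log\det\Sigma(\theta) + \operatorname{tr}\big(S\,\Sigma(\theta)^{-1}\big) - \log\det S - p$$

$$\hat{\theta}_{ML} \xrightarrow{L} N(\theta_0, C_2(\theta_0))$$

$$n\,F(\hat{\theta}_{ML}) \xrightarrow{L} \chi^2_{p^* - q}$$

$$H_0: \Sigma = \Sigma(\theta) \quad \text{vs} \quad H_a: \Sigma \neq \Sigma(\theta)$$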
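The ML discrepancy function and the chi-square test can be sketched in base R. The matrix `S` is hypothetical; the two test statistics and their degrees of freedom (85.306 on 24 df and 38.125 on 35 df) are the values reported in the lavaan outputs of the CFA and SEM examples in these slides.

```r
# ML discrepancy: F = log det(Sigma) + tr(S Sigma^{-1}) - log det(S) - p
f_ml <- function(S, Sigma) {
  p <- nrow(S)
  as.numeric(determinant(Sigma)$modulus) + sum(diag(S %*% solve(Sigma))) -
    as.numeric(determinant(S)$modulus) - p
}

# Hypothetical sample covariance matrix
S <- matrix(c(2.0, 0.8,
              0.8, 1.5), nrow = 2, byrow = TRUE)

f_saturated    <- f_ml(S, S)              # zero: perfect fit
f_independence <- f_ml(S, diag(diag(S)))  # positive: covariance ignored

# Chi-square p-values for the test statistics reported by lavaan
p_cfa <- pchisq(85.306, df = 24, lower.tail = FALSE)
p_sem <- pchisq(38.125, df = 35, lower.tail = FALSE)
print(round(c(CFA = p_cfa, SEM = p_sem), 3))
```

A small p-value rejects H0 (the model-implied covariance structure), so the CFA example fits poorly by this test while the SEM example is not rejected.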

SEM Modeling

Model (diagram)

Identifiability (q ≤ p(p + 1)/2); check the identifiability rules in Bollen (1989, page 238)

Constraints (e.g., one loading per factor fixed to 1)

EDA (distributions, correlations, outliers, etc.)

Estimation

Fit indices (SRMR, based on residuals)

Diagnostics (residuals, outliers, etc.)
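The counting rule q ≤ p(p + 1)/2 is easy to verify by hand or in code. The sketch below applies it to the two lavaan examples in these slides; the helper name `count_df` is hypothetical, and the parameter counts are read off the model syntax and output. The rule is necessary but not sufficient for identification.

```r
# Counting rule: p* = p(p + 1)/2 distinct (co)variances versus q free parameters
count_df <- function(p, q) {
  p_star <- p * (p + 1) / 2
  c(p_star = p_star, q = q, df = p_star - q)
}

# Three-factor CFA, 9 indicators:
# 6 free loadings + 9 residual variances + 3 factor variances + 3 factor covariances
cfa_count <- count_df(p = 9, q = 6 + 9 + 3 + 3)

# Political democracy SEM, 11 indicators:
# 8 free loadings + 3 regressions + 6 residual covariances
# + 11 residual variances + 3 latent (residual) variances
sem_count <- count_df(p = 11, q = 8 + 3 + 6 + 11 + 3)

print(rbind(CFA = cfa_count, SEM = sem_count))
```

The resulting degrees of freedom (24 and 35) match the values lavaan reports for the two fitted models.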

Measurement model (CFA)

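$$x_i = \Lambda \xi_i + \varepsilon_i, \quad i = 1, \dots, n$$

$\xi \sim N(0, \Phi)$: latent variables

$\varepsilon \sim N(0, \Psi_\varepsilon)$, with $\Psi_\varepsilon$ diagonal

$\xi$ and $\varepsilon$ are uncorrelated

$$\Sigma = \Lambda \Phi \Lambda^{\top} + \Psi_\varepsilon$$

$\Lambda$, $\Phi$, $\Psi_\varepsilon$ are the parameters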

CFA Example (graph)

[Figure: path diagram of the three-factor CFA, with latent factors vsl, txt, and spd measured by indicators x1 to x9 and labeled with the estimated loadings, residual variances, and factor variances]

CFA (loadings and latents)

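Rows are x1 to x9; columns are the factors vsl, txt, spd:

$$\Lambda = \begin{pmatrix} 1 & 0 & 0 \\ \lambda_{21} & 0 & 0 \\ \lambda_{31} & 0 & 0 \\ 0 & 1 & 0 \\ 0 & \lambda_{52} & 0 \\ 0 & \lambda_{62} & 0 \\ 0 & 0 & 1 \\ 0 & 0 & \lambda_{83} \\ 0 & 0 & \lambda_{93} \end{pmatrix}$$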

But also remember the variances and covariances

CFA using lavaan (R)

library(stringr)
library(lavaan)
library(DiagrammeR)
library(dplyr)
library(semPlot)

# specify the model: three correlated factors, three indicators each
HS.model <- " visual  =~ x1 + x2 + x3
              textual =~ x4 + x5 + x6
              speed   =~ x7 + x8 + x9 "

# fit by maximum likelihood (the lavaan default)
fit.HS <- sem(HS.model, data = HolzingerSwineford1939)
summary(fit.HS)

# path diagram labeled with the parameter estimates
semPaths(fit.HS, intercepts = FALSE, whatLabels = "est",
         residuals = TRUE, exoCov = TRUE)

CFA Example (output)

> summary(fit.HS)

lavaan (0.5-22) converged normally after 35 iterations

  Number of observations                           301

  Estimator                                         ML
  Minimum Function Test Statistic               85.306
  Degrees of freedom                                24
  P-value (Chi-square)                           0.000

Parameter Estimates:

  Information                                 Expected
  Standard Errors                             Standard

Latent Variables:
                     Estimate  Std.Err  z-value  P(>|z|)
  visual =~
    x1                  1.000
    x2                  0.554    0.100    5.554    0.000
    x3                  0.729    0.109    6.685    0.000
  textual =~
    x4                  1.000
    x5                  1.113    0.065   17.014    0.000
    x6                  0.926    0.055   16.703    0.000
  speed =~
    x7                  1.000
    x8                  1.180    0.165    7.152    0.000
    x9                  1.082    0.151    7.155    0.000

Covariances:
                     Estimate  Std.Err  z-value  P(>|z|)
  visual ~~
    textual             0.408    0.074    5.552    0.000
    speed               0.262    0.056    4.660    0.000
  textual ~~
    speed               0.173    0.049    3.518    0.000

Variances:
                     Estimate  Std.Err  z-value  P(>|z|)
    .x1                 0.549    0.114    4.833    0.000
    .x2                 1.134    0.102   11.146    0.000
    .x3                 0.844    0.091    9.317    0.000
    .x4                 0.371    0.048    7.779    0.000
    .x5                 0.446    0.058    7.642    0.000
    .x6                 0.356    0.043    8.277    0.000
    .x7                 0.799    0.081    9.823    0.000
    .x8                 0.488    0.074    6.573    0.000
    .x9                 0.566    0.071    8.003    0.000
    visual              0.809    0.145    5.564    0.000
    textual             0.979    0.112    8.737    0.000
    speed               0.384    0.086    4.451    0.000
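The link between these estimates and the covariance structure Σ = ΛΦΛᵀ + Ψε can be made concrete by rebuilding the implied covariance matrix from the printed values. This is a sketch in base R using the rounded estimates from the lavaan output above, so results are accurate only to about three decimals.

```r
# Loadings (rows x1..x9, columns visual/textual/speed), from the printed output
Lambda <- matrix(0, 9, 3,
                 dimnames = list(paste0("x", 1:9), c("visual", "textual", "speed")))
Lambda[1:3, 1] <- c(1, 0.554, 0.729)
Lambda[4:6, 2] <- c(1, 1.113, 0.926)
Lambda[7:9, 3] <- c(1, 1.180, 1.082)

# Factor covariance matrix Phi and diagonal residual covariance matrix Psi
Phi <- matrix(c(0.809, 0.408, 0.262,
                0.408, 0.979, 0.173,
                0.262, 0.173, 0.384), nrow = 3, byrow = TRUE)
Psi <- diag(c(0.549, 1.134, 0.844, 0.371, 0.446, 0.356, 0.799, 0.488, 0.566))

# Model-implied covariance matrix
Sigma_hat <- Lambda %*% Phi %*% t(Lambda) + Psi
print(round(Sigma_hat[1:4, 1:4], 3))
```

For example, the implied variance of x1 is 0.809 + 0.549 = 1.358, and the implied covariance of x1 and x4 equals the visual-textual factor covariance, 0.408, because both loadings are fixed to 1.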

Structural model (SEM)

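$$\eta = B\eta + \Gamma\xi + \zeta$$

$$y = \Lambda_y \eta + \varepsilon$$

$$x = \Lambda_x \xi + \delta$$

$B$, $\Gamma$, $\Lambda_y$, $\Lambda_x$, $\Phi$, $\Psi$, $\Theta_\varepsilon$, $\Theta_\delta$ are the parameters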
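As a worked step, the model-implied covariance matrix follows from solving the structural equation for η. The derivation below is a sketch under the standard assumptions, which are not stated on the slide: (I − B) is invertible; Cov(ξ) = Φ, Cov(ζ) = Ψ, Cov(ε) = Θε, Cov(δ) = Θδ; and ξ, ζ, ε, δ are mutually uncorrelated.

```latex
% Reduced form of the structural model and the resulting covariance blocks.
% Assumptions: (I - B) invertible; xi, zeta, epsilon, delta mutually uncorrelated.
\begin{align*}
\eta &= (I - B)^{-1}(\Gamma\xi + \zeta) \\
\Sigma_{yy}(\theta) &= \Lambda_y (I - B)^{-1}\big(\Gamma\Phi\Gamma^{\top} + \Psi\big)
                       (I - B)^{-\top}\Lambda_y^{\top} + \Theta_\varepsilon \\
\Sigma_{yx}(\theta) &= \Lambda_y (I - B)^{-1}\Gamma\Phi\Lambda_x^{\top},
  \qquad \Sigma_{xy}(\theta) = \Sigma_{yx}(\theta)^{\top} \\
\Sigma_{xx}(\theta) &= \Lambda_x \Phi \Lambda_x^{\top} + \Theta_\delta
\end{align*}
```

With B = 0 and no y-side equations, the last line reduces to the CFA structure Σ = ΛΦΛᵀ + Ψε of the measurement model.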

SEM Example (graph)

[Figure: path diagram of the political democracy SEM, with i60 measured by x1 to x3, d60 by y1 to y4, and d65 by y5 to y8, labeled with the parameter estimates]

SEM Example (some equations)

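$$\begin{pmatrix} d60 \\ d65 \end{pmatrix} = \begin{pmatrix} 0 & 0 \\ B_{21} & 0 \end{pmatrix} \begin{pmatrix} d60 \\ d65 \end{pmatrix} + \begin{pmatrix} \gamma_{11} \\ \gamma_{21} \end{pmatrix} (i60) + \begin{pmatrix} \zeta_1 \\ \zeta_2 \end{pmatrix}$$

$$\Sigma(\theta) = \begin{pmatrix} \Sigma_{yy}(\theta) & \Sigma_{yx}(\theta) \\ \Sigma_{xy}(\theta) & \Sigma_{xx}(\theta) \end{pmatrix}$$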

SEM Example ( R code)

library(lavaan)
library(semPlot)

# specify the model
model <- '
  # latent variables
  ind60 =~ x1 + x2 + x3
  dem60 =~ y1 + y2 + y3 + y4
  dem65 =~ y5 + y6 + y7 + y8
  # regressions
  dem60 ~ ind60
  dem65 ~ ind60 + dem60
  # residual covariances
  y1 ~~ y5
  y2 ~~ y4 + y6
  y3 ~~ y7
  y4 ~~ y8
  y6 ~~ y8
'

# fit by maximum likelihood (the lavaan default)
fit <- sem(model, data = PoliticalDemocracy)
summary(fit)

# path diagram labeled with the parameter estimates
semPaths(fit, intercepts = FALSE, whatLabels = "est",
         residuals = FALSE, exoCov = FALSE)

SEM Example (output)

> summary(fit)
lavaan (0.5-22) converged normally after 68 iterations

  Estimator                                         ML
  Minimum Function Test Statistic               38.125
  Degrees of freedom                                35
  P-value (Chi-square)                           0.329

  Information                                 Expected
  Standard Errors                             Standard

Latent Variables:
                     Estimate  Std.Err  z-value  P(>|z|)
  ind60 =~
    x1                  1.000
    x2                  2.180    0.139   15.742    0.000
    x3                  1.819    0.152   11.967    0.000
  dem60 =~
    y1                  1.000
    y2                  1.257    0.182    6.889    0.000
    y3                  1.058    0.151    6.987    0.000
    y4                  1.265    0.145    8.722    0.000
  dem65 =~
    y5                  1.000
    y6                  1.186    0.169    7.024    0.000
    y7                  1.280    0.160    8.002    0.000
    y8                  1.266    0.158    8.007    0.000

Regressions:
                     Estimate  Std.Err  z-value  P(>|z|)
  dem60 ~
    ind60               1.483    0.399    3.715    0.000
  dem65 ~
    ind60               0.572    0.221    2.586    0.010
    dem60               0.837    0.098    8.514    0.000

SEM Example (output)

Covariances:
                     Estimate  Std.Err  z-value  P(>|z|)
  .y1 ~~
    .y5                 0.624    0.358    1.741    0.082
  .y2 ~~
    .y4                 1.313    0.702    1.871    0.061
    .y6                 2.153    0.734    2.934    0.003
  .y3 ~~
    .y7                 0.795    0.608    1.308    0.191
  .y4 ~~
    .y8                 0.348    0.442    0.787    0.431
  .y6 ~~
    .y8                 1.356    0.568    2.386    0.017

Variances:
                     Estimate  Std.Err  z-value  P(>|z|)
    .x1                 0.082    0.019    4.184    0.000
    .x2                 0.120    0.070    1.718    0.086
    .x3                 0.467    0.090    5.177    0.000
    .y1                 1.891    0.444    4.256    0.000
    .y2                 7.373    1.374    5.366    0.000
    .y3                 5.067    0.952    5.324    0.000
    .y4                 3.148    0.739    4.261    0.000
    .y5                 2.351    0.480    4.895    0.000
    .y6                 4.954    0.914    5.419    0.000
    .y7                 3.431    0.713    4.814    0.000
    .y8                 3.254    0.695    4.685    0.000
    ind60               0.448    0.087    5.173    0.000
    .dem60              3.956    0.921    4.295    0.000
    .dem65              0.172    0.215    0.803    0.422

Why Bayesian

Flexibility to incorporate prior knowledge (priors)

More reliable than ML when the sample size is small (Lee and Song, 2004)

Bayes factors and flexibility in comparing models

Easy production of the latent scores (factors)

blavaan (open-source R package)

WinBUGS (freely available software)

Bayesian References

A Bayesian approach to confirmatory factor analysis (Lee, 1980)

Evaluation of the Bayesian and maximum likelihood approaches in analyzing structural equation models with small sample sizes (Lee and Song, 2004)

Structural Equation Modeling: A Bayesian Approach (Lee, 2007)

Basic and Advanced Bayesian Structural Equation Modeling: With Applications in the Medical and Behavioral Sciences (Song and Lee, 2012)

Bayesian estimation

$$\log p(\theta \mid Y, M) = \log p(Y \mid \theta, M) + \log p(\theta) + \text{constant}$$

M: an arbitrary SEM model

Y: observed dataset of raw observations, with sample size n

θ: random vector of the parameters in M

Conjugate priors

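$$p(y \mid \theta) = \binom{n}{y}\theta^{y}(1-\theta)^{n-y}, \quad \theta \in (0, 1)$$

$$p(\theta) \propto \theta^{\alpha-1}(1-\theta)^{\beta-1}, \quad \theta \sim \text{Beta}(\alpha, \beta)$$

$$p(\theta \mid y) \propto p(y \mid \theta)\,p(\theta) \propto \theta^{y}(1-\theta)^{n-y}\,\theta^{\alpha-1}(1-\theta)^{\beta-1} \propto \theta^{y+\alpha-1}(1-\theta)^{n-y+\beta-1}$$

so that $\theta \mid y \sim \text{Beta}(y + \alpha, n - y + \beta)$.

The prior p(θ) and the posterior p(θ|y) belong to the same distributional family.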
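The Beta-binomial update can be checked numerically. In this sketch the prior parameters and the data (14 successes in 20 trials) are hypothetical values chosen for illustration.

```r
# Hypothetical prior and data
alpha <- 2; beta <- 2     # Beta(2, 2) prior
n <- 20; y <- 14          # 14 successes in 20 trials

# Conjugate update: posterior is Beta(y + alpha, n - y + beta)
post_a <- y + alpha
post_b <- n - y + beta
post_mean <- post_a / (post_a + post_b)
cred_int  <- qbeta(c(0.025, 0.975), post_a, post_b)

# Numerical check: normalized likelihood x prior on a grid matches the Beta density
step  <- 0.01
theta <- seq(step, 1 - step, by = step)
unnorm   <- dbinom(y, n, theta) * dbeta(theta, alpha, beta)
num_post <- unnorm / (sum(unnorm) * step)
max_diff <- max(abs(num_post - dbeta(theta, post_a, post_b)))

print(c(mean = post_mean, lower = cred_int[1], upper = cred_int[2]))
```

No sampling is needed here because conjugacy gives the posterior in closed form; the grid computation only confirms the algebra.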

Measurement model (CFA) Bayesian approach

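$$y_i = \Lambda w_i + \varepsilon_i, \quad i = 1, \dots, n, \quad y_i \in \mathbb{R}^p$$

$$w_i \sim N(0, \Phi), \quad w_i \in \mathbb{R}^q$$

$\varepsilon_i \sim N(0, \Psi_\varepsilon)$, with $\Psi_\varepsilon$ diagonal with elements $\psi_{\varepsilon k}$, $k = 1, \dots, p$

$w_i$ and $\varepsilon_i$ are independent

$\Lambda$, $\Phi$, $\Psi_\varepsilon$ are the parameters

Let $\Lambda_k^{\top}$ be the kth row of $\Lambda$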

Measurement model (CFA) priors

The conjugate priors on the parameters are:

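$$\psi_{\varepsilon k} \sim \text{IGamma}(\alpha^{*}_{0\varepsilon k}, \beta^{*}_{0\varepsilon k})$$

$$[\Lambda_k \mid \psi_{\varepsilon k}] \sim N(\Lambda_{0k}, \psi_{\varepsilon k} H_{0yk})$$

$$\Phi \sim IW_q(R^{*}_0, \rho_0), \quad R^{*}_0 \text{ positive definite}$$

The difficulty lies in choosing the hyperparameters, which determine whether the priors are informative or noninformative.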

Measurement model (CFA) Gibbs Sampling (MCMC)

Let Y = (y1, · · · , yn) be the observed data matrix

Ω = (w1, · · · , wn) is the matrix of the latent variables

(Y, Ω) is the complete dataset (augmented data)

P(Λ, Φ, Ψε | Y), the posterior, is intractable

P(Λ, Φ, Ψε | Ω, Y) is usually a standard distribution

P(Ω | Λ, Φ, Ψε, Y) can also be derived from the model M

Measurement model (CFA) Gibbs Sampling

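The Gibbs sampling algorithm allows us to sample from $P(\Lambda, \Phi, \Psi_\varepsilon, \Omega \mid Y)$.

At the (j + 1)th iteration, given $\Omega^{(j)}, \Lambda^{(j)}, \Phi^{(j)}, \Psi_\varepsilon^{(j)}$:

Generate $\Omega^{(j+1)} \sim P(\Omega \mid \Lambda^{(j)}, \Phi^{(j)}, \Psi_\varepsilon^{(j)}, Y)$

Generate $\Psi_\varepsilon^{(j+1)} \sim P(\Psi_\varepsilon \mid \Omega^{(j+1)}, \Lambda^{(j)}, \Phi^{(j)}, Y)$

Generate $\Phi^{(j+1)} \sim P(\Phi \mid \Omega^{(j+1)}, \Lambda^{(j)}, \Psi_\varepsilon^{(j+1)}, Y)$

Generate $\Lambda^{(j+1)} \sim P(\Lambda \mid \Omega^{(j+1)}, \Phi^{(j+1)}, \Psi_\varepsilon^{(j+1)}, Y)$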
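The four steps can be sketched for the simplest case: a one-factor model with centered data, the first loading fixed to 1, and scalar conjugate priors. Everything specific here is an assumption made for illustration (the simulated data, the hyperparameter values, and the use of an inverse gamma prior for the scalar factor variance in place of the inverse Wishart); it is not the sampler used by any particular package.

```r
set.seed(123)
# Simulate centered data from a one-factor model (all values hypothetical)
n <- 500; p <- 4
lambda_true <- c(1, 0.8, 1.2, 0.6)
Y <- outer(rnorm(n), lambda_true) + matrix(rnorm(n * p, sd = sqrt(0.5)), n, p)

# Assumed hyperparameters
a0 <- 2; b0 <- 1      # psi_k ~ IGamma(a0, b0)
l0 <- 0; h0 <- 10     # lambda_k | psi_k ~ N(l0, psi_k * h0)
r0 <- 2; s0 <- 1      # phi ~ IGamma(r0, s0)

# Initial values and storage
lambda <- rep(1, p); psi <- rep(1, p); phi <- 1
free <- c(FALSE, rep(TRUE, p - 1))   # lambda_1 is fixed to 1 for identification
n_iter <- 2000; burn <- 500
draws <- matrix(NA_real_, n_iter, 2 * p + 1)
colnames(draws) <- c(paste0("lambda", 1:p), paste0("psi", 1:p), "phi")

for (j in 1:n_iter) {
  # Step 1: latent scores, w_i | rest ~ N(m_i, v)
  v <- 1 / (1 / phi + sum(lambda^2 / psi))
  m <- v * as.vector(Y %*% (lambda / psi))
  w <- rnorm(n, m, sqrt(v))

  # Step 2: residual variances given the current loadings
  rss <- colSums((Y - outer(w, lambda))^2)
  psi <- 1 / rgamma(p, shape = a0 + n / 2 + 0.5 * free,
                    rate = b0 + 0.5 * rss + free * (lambda - l0)^2 / (2 * h0))

  # Step 3: factor variance
  phi <- 1 / rgamma(1, shape = r0 + n / 2, rate = s0 + 0.5 * sum(w^2))

  # Step 4: free loadings given the new residual variances
  A <- 1 / (1 / h0 + sum(w^2))
  for (k in which(free)) {
    lambda[k] <- rnorm(1, A * (l0 / h0 + sum(w * Y[, k])), sqrt(psi[k] * A))
  }
  draws[j, ] <- c(lambda, psi, phi)
}

post <- draws[-(1:burn), ]
print(round(colMeans(post), 2))
```

The update order follows the slide (Ω, then Ψε, then Φ, then Λ). With this sample size the posterior means of the free loadings should land near the simulated values, but convergence should still be checked in practice, for example with several chains.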

Measurement model (CFA) Posterior Parameters Estimates

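$$\theta^{(t)} = (\Lambda^{(t)}, \Phi^{(t)}, \Psi_\varepsilon^{(t)}), \quad t = 1, \dots, T^*$$

$$\hat{\theta} = \frac{1}{T^*}\sum_{t=1}^{T^*} \theta^{(t)}$$

$$\widehat{\text{var}}(\theta \mid Y) = \frac{1}{T^* - 1}\sum_{t=1}^{T^*} (\theta^{(t)} - \hat{\theta})(\theta^{(t)} - \hat{\theta})^{\top}$$

along with 95% credible intervals given by the 0.025 and 0.975 quantiles of the draws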
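Given a matrix of retained MCMC draws (one row per iteration, one column per parameter), these summaries take a few lines of base R. The draws below are simulated stand-ins with hypothetical parameter names, not output from a fitted model.

```r
set.seed(42)
# Stand-in for T* = 4000 retained draws of two parameters (hypothetical values)
draws <- cbind(lambda21 = rnorm(4000, mean = 0.55, sd = 0.10),
               psi1     = 1 / rgamma(4000, shape = 20, rate = 10))

# Posterior mean: average of the draws
post_mean <- colMeans(draws)

# Posterior covariance with the 1 / (T* - 1) divisor, written out explicitly
centered <- sweep(draws, 2, post_mean)
post_cov <- crossprod(centered) / (nrow(draws) - 1)

# 95% credible intervals from the 0.025 and 0.975 quantiles
cred_int <- apply(draws, 2, quantile, probs = c(0.025, 0.975))

print(round(rbind(mean = post_mean, cred_int), 3))
```

The explicit formula agrees with R's built-in `cov()`, which uses the same divisor.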

Bayesian CFA Example using blavaan

library(blavaan)

# specify the model
bHS.model <- " visual  =~ x1 + x2 + x3
               textual =~ x4 + x5 + x6
               speed   =~ x7 + x8 + x9
               # intercepts
               x1 ~ 0
               x2 ~ 0
               x3 ~ 0
               x4 ~ 0
               x5 ~ 0
               x6 ~ 0
               x7 ~ 0
               x8 ~ 0
               x9 ~ 0 "

# fit by MCMC with the default priors
bfit.HS <- bsem(bHS.model, data = HolzingerSwineford1939)
summary(bfit.HS)

# Bayesian fit measures (PPP, DIC, WAIC, LOOIC, marginal log-likelihood)
fitMeasures(bfit.HS, fit.measures = "all", baseline.model = NULL)

Bayesian CFA Example (output)

blavaan (0.2-2) results of 10000 samples after 5000 adapt+burnin iterations

  Number of missing patterns                         1

  Statistic                                 MargLogLik         PPP
  Value                                      -4481.087       0.000

Latent Variables:
                     Estimate  Post.SD  HPD.025  HPD.975     PSRF  Prior
  visual =~
    x1                  1.000
    x2                  1.221    0.018    1.186    1.255    1.000  dnorm(0,1e-2)
    x3                  0.463    0.012    0.438    0.487    1.000  dnorm(0,1e-2)
  textual =~
    x4                  1.000
    x5                  1.404    0.020    1.365    1.445    1.004  dnorm(0,1e-2)
    x6                  0.731    0.016      0.7    0.761    1.001  dnorm(0,1e-2)
  speed =~
    x7                  1.000
    x8                  1.320    0.020     1.28    1.357    1.002  dnorm(0,1e-2)
    x9                  1.286    0.019     1.25    1.325    1.002  dnorm(0,1e-2)

Covariances:
                     Estimate  Post.SD  HPD.025  HPD.975     PSRF  Prior
  visual ~~
    textual            15.500    1.321   12.998    18.14    1.000  dwish(iden,4)
    speed              20.910    1.764   17.576   24.439    1.000  dwish(iden,4)
  textual ~~
    speed              13.003    1.118     10.9   15.259    1.000  dwish(iden,4)

Intercepts:
                     Estimate
    .x1                 0.000
    .x2                 0.000
    .x3                 0.000
    .x4                 0.000
    .x5                 0.000
    .x6                 0.000
    .x7                 0.000
    .x8                 0.000
    .x9                 0.000
    visual              0.000
    textual             0.000
    speed               0.000

Variances:
                     Estimate  Post.SD  HPD.025  HPD.975     PSRF  Prior
    .x1                 0.716    0.088    0.547    0.891    1.001  dgamma(1,.5)
    .x2                 1.219    0.138     0.96      1.5    1.000  dgamma(1,.5)
    .x3                 0.993    0.086    0.832    1.164    1.000  dgamma(1,.5)
    .x4                 0.449    0.053    0.346    0.552    1.001  dgamma(1,.5)
    .x5                 0.314    0.069    0.184    0.452    1.002  dgamma(1,.5)
    .x6                 0.509    0.048    0.417    0.604    1.000  dgamma(1,.5)
    .x7                 0.877    0.084    0.717    1.045    1.000  dgamma(1,.5)
    .x8                 0.567    0.077    0.417     0.72    1.000  dgamma(1,.5)
    .x9                 0.478    0.068    0.347     0.61    1.000  dgamma(1,.5)
    visual             24.998    2.118   20.929   29.176    1.000  dwish(iden,4)
    textual            10.256    0.882    8.518   11.953    1.001  dwish(iden,4)
    speed              17.812    1.539   14.813   20.859    1.001  dwish(iden,4)

> fitMeasures(bfit.HS,fit.measures="all", baseline.model= NULL)
      npar       logl        ppp        bic        dic      p_dic       waic
    21.000  -4398.287      0.000   8916.354   8837.747     20.586   8838.364
    p_waic      looic      p_loo margloglik
    20.848   8838.391     20.861  -4481.087

Conclusions

The frequentist SEM approach is based on MLE

The Bayesian approach, with data augmentation and MCMC methods, is a flexible way to analyze SEMs

The Bayesian approach may be used when prior knowledge is available or when the sample size is small

Some open problems (power, optimal designs, GSEM, etc...)

THANK YOU!
