Univariate Twin Analysis

OpenMx Tc26 2012

Hermine Maes, Elizabeth Prom-Wormley, Nick Martin

Overall Questions to be Answered

•Does a trait of interest cluster among related individuals?

•Can clustering be explained by genetic or environmental effects?

•What is the best way to explain the degree to which genetic and environmental effects affect a trait?

Family & Twin Study Designs

•Family Studies

•Classical Twin Studies

•Adoption Studies

•Extended Twin Studies

The Data

•Australian Twin Register

•18-30 years old, males and females

•Work in this session focuses on Body Mass Index (weight/height²) in females only

•Sample size

•MZF = 534 complete pairs (zyg = 1)

•DZF = 328 complete pairs (zyg = 3)

A Quick Look at the Data

Classical Twin Studies Basic Background

•The Classical Twin Study (CTS) uses MZ and DZ twins reared together

•MZ twins share 100% of their genes

•DZ twins share on average 50% of their genes

•Expectation: genetic factors are assumed to contribute to a phenotype when MZ twins are more similar than DZ twins

Classical Twin Study

Assumptions

•MZ twins are genetically identical

•Equal Environments of MZ and DZ pairs

Basic Data Assumptions

•MZ and DZ twins are sampled from the same population, therefore we expect:

•Equal means/variances in Twin 1 and Twin 2

•Equal means/variances in MZ and DZ twins

•Further assumptions would need to be tested if we introduce male twins and opposite sex twin pairs

“Old Fashioned” Data Checking

                        MZ              DZ
                      T1     T2      T1     T2
mean                 21.35  21.34   21.45  21.46
variance              0.73   0.79    0.77   0.82
covariance(T1-T2)     0.59            0.25

Nice, but how can we actually be sure that these means and variances are truly the same?

Univariate Analysis: A Roadmap

1- Use the data to test basic assumptions (equal means & variances for twin 1/twin 2 and MZ/DZ pairs)

Saturated Model

2- Estimate contributions of genetic and environmental effects on the total variance of a phenotype

ACE or ADE Models

3- Test ACE (ADE) submodels to identify and report significant genetic and environmental contributions

AE or CE or E Only Models

Saturated Twin Model

Saturated Code Deconstructed

mean MZ = 1 x 2 matrix: mMZ1 mMZ2

mean DZ = 1 x 2 matrix: mDZ1 mDZ2

Saturated Code Deconstructed

covMZ = 2 x 2 matrix

      T1     T2
T1    vMZ1   cMZ21
T2    cMZ21  vMZ2

covDZ = 2 x 2 matrix

      T1     T2
T1    vDZ1   cDZ21
T2    cDZ21  vDZ2

Time to Play... with OpenMx

Please open the file twinSatCon.R

Estimated Values

Saturated Model

               MZ              DZ
             T1     T2      T1     T2
mean        21.34  21.35   21.45  21.46
cov   T1     0.73           0.77
      T2     0.59   0.79    0.24   0.82

10 Total Parameters Estimated

mMZ1, mMZ2, vMZ1, vMZ2, cMZ21
mDZ1, mDZ2, vDZ1, vDZ2, cDZ21

Standardize covariance matrices for twin pair correlations (covMZ & covDZ)
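
The standardization step can be sketched in base R, outside OpenMx. This is only an illustration using the rounded saturated-model estimates from this slide, not the code in twinSatCon.R; the object names corMZ, corDZ, rMZ and rDZ are assumptions made for the example.

```r
# Saturated-model covariance estimates (rounded values from the slide).
covMZ <- matrix(c(0.73, 0.59,
                  0.59, 0.79), nrow = 2, byrow = TRUE,
                dimnames = list(c("T1", "T2"), c("T1", "T2")))
covDZ <- matrix(c(0.77, 0.24,
                  0.24, 0.82), nrow = 2, byrow = TRUE,
                dimnames = list(c("T1", "T2"), c("T1", "T2")))

# cov2cor() divides each covariance by the product of the two
# standard deviations, turning a covariance matrix into a correlation matrix.
corMZ <- cov2cor(covMZ)
corDZ <- cov2cor(covDZ)

# The off-diagonal element is the twin pair correlation.
rMZ <- corMZ["T2", "T1"]
rDZ <- corDZ["T2", "T1"]
print(round(c(rMZ = rMZ, rDZ = rDZ), 2))
```

With these rounded inputs the twin correlations come out near 0.78 for MZ pairs and 0.30 for DZ pairs.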

Fitting Nested Models

•Saturated Model

  • likelihood of data without any constraints

  • fitting as many means and (co)variances as possible

•Equality of means & variances by twin order

  • test if mean of twin 1 = mean of twin 2

  • test if variance of twin 1 = variance of twin 2

•Equality of means & variances by zygosity

  • test if mean of MZ = mean of DZ

  • test if variance of MZ = variance of DZ

Estimates

Equate Means & Variances across Twin Order

               MZ              DZ
             T1     T2      T1     T2
mean        21.35  21.35   21.45  21.45
cov   T1     0.76           0.79
      T2     0.59   0.76    0.24   0.79

Equate Means & Variances across Twin Order & Zygosity

               MZ              DZ
             T1     T2      T1     T2
mean        21.39  21.39   21.39  21.39
cov   T1     0.78           0.78
      T2     0.61   0.78    0.23   0.78

Stats

Model                    ep  -2ll     df    AIC     diff -2ll  diff df  p
Saturated                10  4055.93  1767  521.93
mT1=mT2                   8  4056     1769  518     0.07       2        0.97
mT1=mT2 & varT1=varT2     6  4058.94  1771  516.94  3.01       4        0.56
ZygMZ=DZ                  4  4063.45  1773  517.45  7.52       6        0.28

No significant differences between saturated model and models where means/variances/covariances are equal by zygosity and between twins
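
The fit statistics on this slide can be re-derived in base R as a rough check. The sketch below assumes that AIC here is -2ll minus twice the degrees of freedom (which reproduces the slide's values) and that each constrained model is compared with the saturated model by a likelihood-ratio chi-square test. The data frame fits and its column names are made up for the example.

```r
# Fit statistics copied from the slide, one entry per model.
fits <- data.frame(
  model    = c("Saturated", "mT1=mT2", "mT1=mT2 & varT1=varT2", "ZygMZ=DZ"),
  ep       = c(10, 8, 6, 4),
  minus2ll = c(4055.93, 4056.00, 4058.94, 4063.45),
  df       = c(1767, 1769, 1771, 1773)
)

# AIC as it appears on the slide: -2ll minus twice the degrees of freedom.
fits$AIC <- fits$minus2ll - 2 * fits$df

# Likelihood-ratio test against the saturated model (row 1):
# the difference in -2ll is chi-square distributed with
# degrees of freedom equal to the difference in estimated parameters.
fits$diffLL <- fits$minus2ll - fits$minus2ll[1]
fits$diffdf <- fits$ep[1] - fits$ep
fits$p <- NA_real_
fits$p[-1] <- pchisq(fits$diffLL[-1], df = fits$diffdf[-1], lower.tail = FALSE)

print(fits)
```

A p-value above 0.05 means the equality constraints do not significantly worsen the fit, which is the reading given on the slide.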

Questions?

This is all great, but we still don’t understand the genetic and environmental contributions to BMI

Patterns of Twin Correlation

rMZ = 2rDZ: Additive (DZ twins on average share 50% of additive effects)

rMZ = rDZ: Shared Environment

rMZ > 2rDZ: Additive & Dominance (DZ twins on average share 25% of dominance effects)

???: Additive & “Shared Environment”

A = 2(rMZ-rDZ)

C = 2rDZ - rMZ

E = 1 - rMZ

BMI Twin Correlations

rDZ = 0.30

rMZ = 0.78

A = 2(rMZ-rDZ) = 0.96

C = 2rDZ - rMZ = -0.20

E = 1 - rMZ = 0.22

ADE or ACE?
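
The three formulas can be checked in a few lines of base R. This is a sketch using the rounded correlations 0.78 and 0.30. With those inputs C comes out at about -0.18 rather than -0.20, presumably because the slide's value was computed from unrounded correlations. The variable suggested is a made-up name for the example.

```r
# Twin correlations for BMI (rounded values from the slide).
rMZ <- 0.78
rDZ <- 0.30

# Rough variance component estimates from the twin correlations.
A <- 2 * (rMZ - rDZ)   # additive genetic
C <- 2 * rDZ - rMZ     # shared environment
E <- 1 - rMZ           # unique environment

print(round(c(A = A, C = C, E = E), 2))

# A negative C means rMZ > 2 * rDZ, the pattern associated with
# dominance, so an ADE model is the natural starting point.
suggested <- if (rMZ > 2 * rDZ) "ADE" else "ACE"
print(suggested)
```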

Univariate Analysis: A Roadmap

1- Use the data to test basic assumptions inherent to standard ACE (ADE) models

Saturated Model

2- Estimate contributions of genetic and environmental effects on the total variance of a phenotype

ADE or ACE Models

3- Test ADE (ACE) submodels to identify and report significant genetic and environmental contributions

AE or E Only Models

ADE Model: A Few Considerations

•Dominance refers to non-additive genetic effects resulting from interactions between alleles at the same locus

•ADE models also include effects of interactions between different loci (epistasis)

•In addition to sharing on average 50% of their DNA, DZ twins also share about 25% of effects due to dominance

ADE Model Deconstructed: Path Coefficients

a: 1 x 1 matrix

d: 1 x 1 matrix

e: 1 x 1 matrix

covA <- mxAlgebra( expression=a %*% t(a), name="A" )

covD <- mxAlgebra( expression=d %*% t(d), name="D" )

covE <- mxAlgebra( expression=e %*% t(e), name="E" )

ADE Model Deconstructed: Variance Components

a (1 x 1 matrix) * aT

d (1 x 1 matrix) * dT

e (1 x 1 matrix) * eT

ADE Model Deconstructed: Means & Covariances

expMean = 1 x 2 matrix

expMean  expMean

covMZ & covDZ = 2 x 2 matrix

      T1      T2
T1    A+D+E
T2            A+D+E

covDZ

      T1      T2
T1    A+D+E
T2    0.5A    A+D+E

covMZ

      T1      T2
T1    A+D+E
T2    A       A+D+E

covDZ

      T1            T2
T1    A+D+E         0.5A+0.25D
T2    0.5A+0.25D    A+D+E

covMZ

      T1      T2
T1    A+D+E   A+D
T2    A+D     A+D+E
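
The expected covariance matrices can be assembled in base R to see how the pieces fit together. This is a sketch only: it uses plain matrices in place of the OpenMx objects, takes the rounded ADE path estimates (0.57, 0.54, 0.41) from the Estimated Values table later in this deck, and the names V, cMZ and cDZ are assumptions made for the example.

```r
# Path estimates for the full ADE model (rounded values from the deck).
a <- matrix(0.57, nrow = 1, ncol = 1)
d <- matrix(0.54, nrow = 1, ncol = 1)
e <- matrix(0.41, nrow = 1, ncol = 1)

# Variance components, mirroring the mxAlgebra expressions a %*% t(a), etc.
A <- a %*% t(a)
D <- d %*% t(d)
E <- e %*% t(e)

V   <- A + D + E            # total variance, on the diagonal for both groups
cMZ <- A + D                # MZ pairs share all additive and dominance effects
cDZ <- 0.5 * A + 0.25 * D   # DZ pairs share half of A and a quarter of D

# Assemble the 2 x 2 expected covariance matrices for twin 1 and twin 2.
covMZ <- rbind(cbind(V, cMZ), cbind(cMZ, V))
covDZ <- rbind(cbind(V, cDZ), cbind(cDZ, V))

print(round(covMZ, 2))
print(round(covDZ, 2))
```

The resulting variances and covariances land close to the constrained saturated-model estimates, as expected when the ADE model fits well.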

ADE Model Deconstructed

4 Parameters Estimated

ExpMean

Variance due to A

Variance due to D

Variance due to E

Testing Specific Questions

Please open twinADEcon.R

•‘Full’ ADE Model

  •Comparison against Saturated

•Nested Models

  •AE Model

    •test significance of D

  •E Model vs AE Model

    •test significance of A

  •E Model vs ADE Model

    •test combined significance of A & D

AE Model Deconstructed

AeModel <- mxModel( AdeFit, name="AE" )

AdeFit: Object containing Full Model

AeModel: New Object Containing AE Model

AeModel <- omxSetParameters( AeModel, labels="d11", free=FALSE, values=0 )

free=FALSE: The d path will not be estimated

values=0: Fixing the value of the d path to zero

AeFit <- mxRun(AeModel)

Goodness-of-Fit Statistics

Model      ep   -2ll     df    AIC      diff -2ll  diff df  p
Saturated  10   4055.93  1767  521.93   -          -        -
ADE         4   4063.45  1773  517.45   7.52       6        0.28
AE          3   4067.66  1774  519.66   4.21       1        0.06
DE         no!  no!      no!   no!      no!        no!      no!
E           2   4591.79  1775  1041.79  528.3      2        0.00

What about the magnitudes of genetic and environmental contributions to BMI?

Estimated Values

       a     d     e     c     a²    d²    e²
ADE   0.57  0.54  0.41   -    0.41  0.37  0.22
AE    0.78   -    0.42   -    0.78   -    0.22
E      -     -    0.88   -     -     -    1.00
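
The standardized components (a², d², e²) are the squared path estimates divided by the total variance. A minimal base R sketch follows; standardize() is a hypothetical helper written for this example and is not part of OpenMx or twinADEcon.R. Because the path estimates are rounded to two decimals, results can differ from the table in the second decimal place.

```r
# Hypothetical helper: squared paths as proportions of total variance.
# Paths left at 0 correspond to components dropped from the model.
standardize <- function(a = 0, d = 0, e = 0) {
  comps <- c(a2 = a^2, d2 = d^2, e2 = e^2)
  comps / sum(comps)
}

# Rounded path estimates from the table.
ade   <- standardize(a = 0.57, d = 0.54, e = 0.41)
ae    <- standardize(a = 0.78, e = 0.42)
eOnly <- standardize(e = 0.88)

print(round(rbind(ADE = ade, AE = ae, E = eOnly), 2))
```

In the AE model the additive component absorbs the variance that the ADE model splits between A and D, while the unique environment share stays essentially the same.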

Okay, so we have a sense of which model best explains the data... or do we?

BMI Twin Correlations

rDZ = 0.30

rMZ = 0.78

A = 2(rMZ-rDZ)

C = 2rDZ - rMZ

E = 1 - rMZ

ADE or ACE?

Giving the ACE Model a Try

(if there is time)

Model      ep  -2ll     df    AIC     diff -2ll  diff df  p
Saturated  10  4055.93  1767  521.93  -          -        -
ADE         4  4063.45  1773  517.45  7.52       6        0.28
ACE         4  4067.66  1773  521.66  11.73      6        0.07

Estimated Values

       a     d     e     c     a²    d²    e²    c²
ADE   0.57  0.54  0.41   -    0.41  0.37  0.22   -
AE    0.78   -    0.42   -    0.78   -    0.22   -
ACE   0.78   -    0.42  0.00  0.78   -    0.22  0.00
CE     -     -    0.56  0.68   -     -    0.41  0.59
E      -     -    0.88   -     -     -    1.00   -

Conclusions

•BMI in young OZ females (age 18-30)

•additive genetic factors: highly significant

•dominance: borderline significant

•specific environmental factors: significant

•shared environmental factors: not significant

Potential Complications

•Assortative Mating

•Gene-Environment Correlation

•Gene-Environment Interaction

•Sex Limitation

•Gene-Age Interaction

For Your Reference: An Overall Map of How OpenMx Builds the ADE Model as Coded in twinADEcon.R

Make matrices: pathA, pathD, pathE

Do matrix algebra with matrices: covA, covD, covE, covMZ, covDZ

Call data for use in the model: dataMZ, dataDZ

Build models from matrices/algebras: modelMZ, modelDZ

Build/compile overall model from matrices/algebras: AdeModel

Run overall model: AdeModel

Get summary information from overall model: AdeSumm

Generate parameter estimates from overall model: estMean, estCovMZ, estCovDZ

Thank You!
