Univariate Twin Analysis- Saturated Models for Continuous and Categorical Data
September 2, 2014
Elizabeth Prom-Wormley & Hermine Maes
ecpromwormle@vcu.edu
804-828-8154
Overall Questions to be Answered
• Do the data satisfy the assumptions of the classical twin study?
• Does a trait of interest cluster among related individuals?
Family & Twin Study Designs
• Family Studies
• Classical Twin Studies
• Adoption Studies
• Extended Twin Studies
The Data
• Please open twinSatConECPW Fall2014.R
• Australian Twin Register
• 18-30 years old, males and females
• Work from this session will focus on Body Mass Index (weight/height²) in females only
• Sample size
– MZF = 534 complete pairs (zyg = 1)
– DZF = 328 complete pairs (zyg = 3)
Classical Twin Studies Basic Background
• The Classical Twin Study (CTS) uses MZ and DZ twins reared together
– MZ twins share 100% of their genes
– DZ twins share on average 50% of their genes
• Expectation- Genetic factors are assumed to contribute to a phenotype when MZ twins are more similar than DZ twins
Classical Twin Study Assumptions
• MZ twins are genetically identical
• Equal Environments of MZ and DZ pairs
Basic Data Assumptions
• MZ and DZ twins are sampled from the same population, therefore we expect:
– Equal means/variances in Twin 1 and Twin 2
– Equal means/variances in MZ and DZ twins
• Further assumptions would need to be tested if we introduce male twins and opposite sex twin pairs
“Old Fashioned” Data Checking
                      MZ              DZ
                    T1     T2       T1     T2
mean                21.35  21.34    21.45  21.46
variance            0.73   0.79     0.77   0.82
covariance (T1-T2)     0.59            0.25
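The descriptive check above can be sketched in base R. This is a minimal illustration, not the workshop script: the data frame `toy`, its values, and the helper `describe_pairs` are invented here, and the column names `bmi1`, `bmi2`, and `zyg` are assumptions about how the twin data might be laid out (one row per pair).

```r
# Toy data invented for illustration only (NOT the Australian Twin Register data).
# One row per twin pair; zyg = 1 for MZF pairs, zyg = 3 for DZF pairs.
toy <- data.frame(
  zyg  = c(1, 1, 1, 1, 3, 3, 3, 3),
  bmi1 = c(21.0, 22.1, 20.5, 21.8, 21.2, 22.5, 20.9, 21.6),
  bmi2 = c(21.2, 22.0, 20.9, 21.5, 22.0, 21.1, 21.4, 20.8)
)

# Hypothetical helper: means for twin 1 / twin 2 and their 2 x 2 covariance matrix
describe_pairs <- function(d) {
  vars <- d[, c("bmi1", "bmi2")]
  list(means = colMeans(vars), cov = cov(vars))
}

mz_stats <- describe_pairs(toy[toy$zyg == 1, ])
dz_stats <- describe_pairs(toy[toy$zyg == 3, ])

print(mz_stats)
print(dz_stats)
```

The diagonal of each `cov` element holds the twin 1 and twin 2 variances, and the off-diagonal holds the twin 1-twin 2 covariance, which matches the layout of the table.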
Nice, but how can we actually be sure that these means and variances are truly the same?
Univariate Analysis: A Roadmap
1- Use the data to test basic assumptions (equal means & variances for twin 1/twin 2 and MZ/DZ pairs)
Saturated Model
2- Estimate contributions of genetic and environmental effects on the total variance of a phenotype
ACE or ADE Models
3- Test ACE (ADE) submodels to identify and report significant genetic and environmental contributions
AE or CE or E Only Models
Saturated Code Deconstructed
mean MZ = 1 x 2 matrix
mMZ1  mMZ2
mean DZ = 1 x 2 matrix
mDZ1  mDZ2
meanMZ <- mxMatrix( type="Full", nrow=1, ncol=ntv, free=TRUE, values=meVals, labels=c("mMZ1","mMZ2"), name="meanMZ" )
meanDZ <- mxMatrix( type="Full", nrow=1, ncol=ntv, free=TRUE, values=meVals, labels=c("mDZ1","mDZ2"), name="meanDZ" )
Saturated Code Deconstructed
covMZ = 2 x 2 matrix
      T1     T2
T1    vMZ1   cMZ21
T2    cMZ21  vMZ2
covDZ = 2 x 2 matrix
      T1     T2
T1    vDZ1   cDZ21
T2    cDZ21  vDZ2
covMZ <- mxMatrix( type="Symm", nrow=ntv, ncol=ntv, free=TRUE, values=cvVals, lbound=lbVals, labels=c("vMZ1","cMZ21","vMZ2"), name="covMZ" )
covDZ <- mxMatrix( type="Symm", nrow=ntv, ncol=ntv, free=TRUE, values=cvVals, lbound=lbVals, labels=c("vDZ1","cDZ21","vDZ2"), name="covDZ" )
Estimated Values
Saturated Model
              MZ                DZ
            T1     T2         T1     T2
mean        21.34  21.35      21.45  21.46
cov   T1    0.73              0.77
      T2    0.59   0.79       0.24   0.82
10 Total Parameters Estimated
mMZ1, mMZ2, vMZ1, vMZ2, cMZ21
mDZ1, mDZ2, vDZ1, vDZ2, cDZ21
Standardize covariance matrices for twin pair correlations (covMZ & covDZ)
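As a rough sketch of the standardization step, the estimated covariance matrices can be converted to correlation matrices with base R's `cov2cor()`. The numbers are the estimates shown above; the object names `corMZ`, `corDZ`, `rMZ`, and `rDZ` are chosen here for illustration and are not necessarily those used in the workshop script.

```r
# Estimated covariance matrices from the saturated model (values from the slide)
covMZ <- matrix(c(0.73, 0.59,
                  0.59, 0.79),
                nrow = 2, byrow = TRUE,
                dimnames = list(c("T1", "T2"), c("T1", "T2")))
covDZ <- matrix(c(0.77, 0.24,
                  0.24, 0.82),
                nrow = 2, byrow = TRUE,
                dimnames = list(c("T1", "T2"), c("T1", "T2")))

# Standardize: divide each covariance by the product of the two standard deviations
corMZ <- cov2cor(covMZ)
corDZ <- cov2cor(covDZ)

# The off-diagonal element is the twin pair correlation
rMZ <- corMZ["T2", "T1"]
rDZ <- corDZ["T2", "T1"]
print(round(c(rMZ = rMZ, rDZ = rDZ), 2))
```

With these rounded inputs the MZ correlation comes out at roughly 0.78 and the DZ correlation at roughly 0.30.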
Fitting Nested Models
• Saturated Model
– likelihood of data without any constraints
– fitting as many means and (co)variances as possible
• Equality of means & variances by twin order
– test if mean of twin 1 = mean of twin 2
– test if variance of twin 1 = variance of twin 2
• Equality of means & variances by zygosity
– test if mean of MZ = mean of DZ
– test if variance of MZ = variance of DZ
Estimates
Equate Means & Variances across Twin Order
              MZ                DZ
            T1     T2         T1     T2
mean        21.35  21.35      21.45  21.45
cov   T1    0.76              0.79
      T2    0.59   0.76       0.24   0.79
Equate Means & Variances across Twin Order & Zygosity
              MZ                DZ
            T1     T2         T1     T2
mean        21.39  21.39      21.39  21.39
cov   T1    0.78              0.78
      T2    0.61   0.78       0.23   0.78
Stats
Model                    ep   -2ll      df     AIC      diff -2ll   diff df   p
Saturated                10   4055.93   1767   521.93
mT1=mT2                  8    4056      1769   518      0.07        2         0.97
mT1=mT2 & varT1=VarT2    6    4058.94   1771   516.94   3.01        4         0.56
Zyg MZ=DZ                4    4063.45   1773   517.45   7.52        6         0.28
No significant differences between saturated model and models where means/variances/covariances are equal by zygosity and between twins
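The comparison columns in the table can be reproduced by hand as a likelihood ratio test. The sketch below uses only base R and the -2ll and df values reported above. The data frame `fits` and its column names are made up for this illustration, and AIC is computed as -2ll minus 2 x df because that reproduces the AIC column shown.

```r
# Fit statistics copied from the table
fits <- data.frame(
  model    = c("Saturated", "mT1=mT2", "mT1=mT2 & varT1=varT2", "Zyg MZ=DZ"),
  ep       = c(10, 8, 6, 4),
  minus2ll = c(4055.93, 4056.00, 4058.94, 4063.45),
  df       = c(1767, 1769, 1771, 1773)
)

# AIC as reported in the table: -2ll - 2 * df
fits$AIC <- fits$minus2ll - 2 * fits$df

# Each nested model is compared against the saturated model (row 1)
fits$diffLL <- fits$minus2ll - fits$minus2ll[1]
fits$diffdf <- fits$df - fits$df[1]

# The difference in -2ll is referred to a chi-square distribution with diffdf degrees of freedom
fits$p <- c(NA, pchisq(fits$diffLL[-1], df = fits$diffdf[-1], lower.tail = FALSE))

print(fits, digits = 6)
```

None of the p-values falls below 0.05, which is the basis for the conclusion that equating means and variances across twin order and zygosity does not significantly worsen fit.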
Working with Binary and Ordinal Data
Elizabeth Prom-Wormley and Hermine Maes
Special Thanks to Sarah Medland
Transitioning from Continuous Logic to Categorical Logic
• Ordinal data has 1 less degree of freedom compared to continuous data
– MZcov, DZcov, Prevalence
– No information on the variance
• Thinking about our ACE/ADE model
– 4 parameters being estimated
– A/ C/ E/ mean
• ACE/ADE model is unidentified without adding a constraint
Two Approaches to the Liability Threshold Model
• Traditional
– Maps data to a standard normal distribution
– Total variance constrained to be 1
• Alternate
– Fixes an alternate parameter (usually E)
– Estimates the remaining parameters
Observed Binary BMI is Imperfect Measure of Underlying Continuous Distribution
[Figure: Density of BMI; x-axis: twinData$bmiB2, y-axis: Density]
Mean (bmiB2) = 0.39
SD (bmiB2) = 0.49
Prevalence “low” BMI = 60.6%
We are interested in the liability of risk for being in the “high” BMI category
It’s Helpful to Rescale
[Figure: density plot, density.default(x = test1), N = 100000, Bandwidth = 0.0444; y-axis: Density]
Raw Data (Unstandardized)
mean=0.49, SD=0.39
- Data not mapped to a standard normal
- No easy conversion to %
- Difficult to compare between groups since the scaling is now arbitrary
Standard Normal (Standardized)
mean=0, SD=1
Area under the curve between two z-values is interpreted as a probability or percentage
Binary Review
Threshold calculated using the cumulative normal distribution (CND)
-We used frequencies and inverse CND to do our own estimation of the threshold
qnorm(0.816) = 0.90
- Threshold is the Z Value that corresponds with the proportion of the population having “low BMI”
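A small base R sketch of this calculation, using the proportion quoted above. The variable names `prop_low`, `threshold`, and `prop_high` are illustrative only.

```r
# Proportion of the sample in the "low BMI" category (value quoted on the slide)
prop_low <- 0.816

# Inverse cumulative normal distribution: the z value with prop_low of the area below it
threshold <- qnorm(prop_low)
print(round(threshold, 2))

# The cumulative normal distribution maps the threshold back to the proportion
print(pnorm(threshold))

# Area above the threshold: proportion in the "high BMI" category
prop_high <- pnorm(threshold, lower.tail = FALSE)
print(prop_high)
```

`qnorm()` and `pnorm()` are inverses of each other, so moving between a category frequency and a threshold on the liability scale is a one-line calculation in either direction.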
Getting a Feel for the Data
Open twinSatOrd.R
Calculate the frequencies of the 5 BMI categories for the second twins of the MZ pairs
CrossTable(mzDataOrdF$bmi2)
Estimating MZ Twin 2 Thresholds by Hand
T1 = qnorm(0.124)                            T1 = -1.155
T2 = qnorm(0.124 + 0.236)                    T2 = -0.358
T3 = qnorm(0.124 + 0.236 + 0.291)            T3 = 0.388
T4 = qnorm(0.124 + 0.236 + 0.291 + 0.175)    T4 = 0.939
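The four hand calculations can be collapsed into one vectorized step. This is a minimal sketch in base R using the category frequencies given above; the names `freq`, `cum_freq`, and `thresholds` are chosen here for illustration.

```r
# Observed frequencies of the first four (of five) BMI categories, MZ twin 2
freq <- c(0.124, 0.236, 0.291, 0.175)

# Cumulative proportion of the sample at or below each category boundary
cum_freq <- cumsum(freq)

# Each threshold is the z value that cuts off that cumulative proportion
thresholds <- qnorm(cum_freq)
names(thresholds) <- paste0("T", seq_along(thresholds))

print(round(thresholds, 3))
```

Five categories need only four thresholds, because the cumulative proportion for the last category is 1 by definition.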
Estimate Twin Pair Correlations for the Liabilities Too!
Handling Ordinal Data in OpenMx
• 1- Determine the 1st threshold
• 2- Determine displacements between 1st threshold and subsequent thresholds
• 3- Add the 1st threshold and the displacement to obtain the subsequent thresholds
Ordinal Saturated Code Deconstructed
Defining Threshold Matrices
threM <- mxMatrix( type="Full", nrow=nth, ncol=ntv, free=TRUE, values=thVal, lbound=thLB, labels=thLabMZ, name="ThreMZ" )
threD <- mxMatrix( type="Full", nrow=nth, ncol=ntv, free=TRUE, values=thVal, lbound=thLB, labels=thLabDZ, name="ThreDZ" )
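ThreMZ
t1MZ1  t1MZ2
t2MZ1  t2MZ2
t3MZ1  t3MZ2
t4MZ1  t4MZ2
ThreDZ
t1DZ1  t1DZ2
t2DZ1  t2DZ2
t3DZ1  t3DZ2
t4DZ1  t4DZ2
[Figure: path diagrams for MZ and DZ pairs. Liabilities LT1 and LT2 each have variance fixed to 1 (Variance Constraint) and means μ fixed to 0, and are connected by covMZ (MZ pairs) or covDZ (DZ pairs); thresholds are estimated (Threshold Model)]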
Ordinal Saturated Code Deconstructed
Defining Threshold Matrices- ThreMZ
1- Determine the 1st threshold
2- Determine displacements between 1st thresholds and subsequent thresholds
Tw1    Tw2
-1.19  -1.16
0.81   0.79
0.73   0.76
0.55   0.55
Double Check- Moving from Frequencies to Displacements
Frequency BMI T2   Cumulative Frequency   Z Value   Displacement
0.124              0.124                  -1.16
0.236              0.360                  -0.37     0.79
0.291              0.651                  0.39      0.76
0.175              0.826                  0.94      0.55
0.175              1                      -         -
Ordinal Saturated Code Deconstructed
Estimating Expected Threshold Matrices
threMZ <- mxAlgebra( expression= Inc %*% ThreMZ, name="expThreMZ" )
Inc <- mxMatrix( type="Lower", nrow=nth, ncol=nth, free=FALSE, values=1, name="Inc" )
Inc                 ThreMZ                expThreMZ
1 0 0 0             -1.19  -1.16          -1.19  -1.16
1 1 0 0     %*%     0.81   0.79     =     -0.38  -0.37
1 1 1 0             0.73   0.76           0.34   0.39
1 1 1 1             0.55   0.55           0.89   0.93
3- Add the 1st threshold and the displacement to obtain the subsequent thresholds
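The multiplication can be checked outside OpenMx with plain R matrices. This is a sketch under the assumption that the first row of the matrix holds the first threshold and the remaining rows hold displacements, as described above. The input values are the rounded numbers from the slide, so the products can differ from the slide in the last digit.

```r
nth <- 4  # number of thresholds

# Lower triangular matrix of 1s, the plain-R counterpart of the "Inc" matrix
Inc <- matrix(0, nrow = nth, ncol = nth)
Inc[lower.tri(Inc, diag = TRUE)] <- 1

# Row 1: first threshold; rows 2-4: displacements (rounded values from the slide)
ThreMZ <- matrix(c(-1.19, -1.16,
                    0.81,  0.79,
                    0.73,  0.76,
                    0.55,  0.55),
                 nrow = nth, byrow = TRUE,
                 dimnames = list(paste0("t", 1:nth), c("Tw1", "Tw2")))

# Pre-multiplying by Inc accumulates down each column:
# threshold k = first threshold + sum of the first (k - 1) displacements
expThreMZ <- Inc %*% ThreMZ
print(round(expThreMZ, 2))

# The same result via a column-wise cumulative sum
check <- apply(ThreMZ, 2, cumsum)
print(all.equal(unname(expThreMZ), unname(check)))
```

Estimating displacements instead of the thresholds themselves has a practical benefit: as long as each displacement stays positive, for example through a lower bound, the expected thresholds remain in increasing order.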
Ordinal Saturated Code Deconstructed
Estimating Correlations & Fixing Variance
corMZ <- mxMatrix( type="Stand", nrow=ntv, ncol=ntv, free=TRUE, values=corVals, lbound=lbrVal, ubound=ubrVal, labels="rMZ", name="expCorMZ" )
corDZ <- mxMatrix( type="Stand", nrow=ntv, ncol=ntv, free=TRUE, values=corVals, lbound=lbrVal, ubound=ubrVal, labels="rDZ", name="expCorDZ" )
How Many Parameters in this Ordinal Model?
• MZ correlation- rMZ
• DZ correlation- rDZ
• Thresholds
– t1MZ1,t2MZ1,t3MZ1,t4MZ1
– t1MZ2,t2MZ2,t3MZ2,t4MZ2
– t1DZ1,t2DZ1,t3DZ1,t4DZ1
– t1DZ2,t2DZ2,t3DZ2,t4DZ2
Problem Set 1
• Open twinSatOrdA.R and twinSatOrd.R
– What do these scripts do?
– Looking at the scripts only: How are they similar? How are they different?
– What do these differences in the scripts reflect regarding conceptual differences in the two models?
• Run either script and double check against your previously hand-calculated values of thresholds. Report your results. If you can’t get it to match up, don’t panic…do e-mail.
• Run twinSatOrd.R
– Is testing an ACE model with the usual model assumptions justified? Why or why not?
Univariate Analysis with Ordinal Data: A Roadmap
1- Use the data to test basic assumptions inherent to standard ACE (ADE) models
Saturated Model
2- Estimate contributions of genetic and environmental effects on the liability of a trait
ADE or ACE Models
3- Test ADE (ACE) submodels to identify and report significant genetic and environmental contributions
AE or E Only Models