Univariate Twin Analysis- Saturated Models for Continuous and Categorical Data

September 2, 2014

Elizabeth Prom-Wormley & Hermine Maes

ecpromwormle@vcu.edu

804-828-8154

Overall Questions to be Answered

• Do the data satisfy the assumptions of the classical twin study?

• Does a trait of interest cluster among related individuals?

Family & Twin Study Designs

• Family Studies

• Classical Twin Studies

• Adoption Studies

• Extended Twin Studies

The Data

• Please open twinSatConECPW Fall2014.R

• Australian Twin Register

• 18-30 years old, males and females

• Work from this session will focus on Body Mass Index (weight/height^2) in females only

• Sample size

– MZF = 534 complete pairs (zyg = 1)

– DZF = 328 complete pairs (zyg = 3)

A Quick Look at the Data

Classical Twin Studies Basic Background

• The Classical Twin Study (CTS) uses MZ and DZ twins reared together

– MZ twins share 100% of their genes

– DZ twins share on average 50% of their genes

• Expectation: genetic factors are assumed to contribute to a phenotype when MZ twins are more similar than DZ twins

Classical Twin Study Assumptions

• MZ twins are genetically identical

• Equal Environments of MZ and DZ pairs

Basic Data Assumptions

• MZ and DZ twins are sampled from the same population, therefore we expect:

– Equal means/variances in Twin 1 and Twin 2

– Equal means/variances in MZ and DZ twins

• Further assumptions would need to be tested if we introduce male twins and opposite sex twin pairs

“Old Fashioned” Data Checking

                     MZ              DZ
                     T1      T2      T1      T2
mean                 21.36
variance             0.73    0.79    0.77    0.82
covariance (T1-T2)   0.59            0.25

Nice, but how can we actually be sure that these means and variances are truly the same?

Univariate Analysis: A Roadmap

1- Use the data to test basic assumptions (equal means & variances for twin 1/twin 2 and MZ/DZ pairs)

Saturated Model

2- Estimate contributions of genetic and environmental effects on the total variance of a phenotype

ACE or ADE Models

3- Test ACE (ADE) submodels to identify and report significant genetic and environmental contributions

AE or CE or E Only Models

Saturated Twin Model

Saturated Code Deconstructed

meanMZ = 1 x 2 matrix

mMZ1 mMZ2

meanDZ = 1 x 2 matrix

mDZ1 mDZ2

meanMZ <- mxMatrix( type="Full", nrow=1, ncol=ntv, free=TRUE, values=meVals, labels=c("mMZ1","mMZ2"), name="meanMZ" )

meanDZ <- mxMatrix( type="Full", nrow=1, ncol=ntv, free=TRUE, values=meVals, labels=c("mDZ1","mDZ2"), name="meanDZ" )

Saturated Code Deconstructed

covMZ = 2 x 2 matrix

vMZ1 cMZ21
cMZ21 vMZ2

covDZ = 2 x 2 matrix

vDZ1 cDZ21
cDZ21 vDZ2

covMZ <- mxMatrix( type="Symm", nrow=ntv, ncol=ntv, free=TRUE, values=cvVals, lbound=lbVals, labels=c("vMZ1","cMZ21","vMZ2"), name="covMZ" )

covDZ <- mxMatrix( type="Symm", nrow=ntv, ncol=ntv, free=TRUE, values=cvVals, lbound=lbVals, labels=c("vDZ1","cDZ21","vDZ2"), name="covDZ" )
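To see how the three labels map onto the 2 x 2 symmetric matrix, the base R sketch below mimics the fill order: lower triangle, column by column, then mirrored to the upper triangle. It does not call OpenMx, and the helper name fillSymm is hypothetical, used only for illustration.

```r
# Hypothetical helper (not part of OpenMx): fill a symmetric matrix from a
# vector of labels, lower triangle column by column, then mirror it.
fillSymm <- function(labels, n) {
  m <- matrix(NA_character_, nrow = n, ncol = n)
  m[lower.tri(m, diag = TRUE)] <- labels   # column-major fill of the lower triangle
  m[upper.tri(m)] <- t(m)[upper.tri(m)]    # copy the lower triangle to the upper one
  m
}

covLabMZ <- fillSymm(c("vMZ1", "cMZ21", "vMZ2"), n = 2)
print(covLabMZ)
```

The printed matrix has the variances on the diagonal and cMZ21 in both off-diagonal cells, which is why only three labels are needed for four cells.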

Time to Play...

Continue with the file twinSatConECPW Fall2014.R

Estimated Values

Saturated Model

              MZ                DZ
              T1      T2        T1      T2
mean          21.34   21.35     21.45   21.46
cov    T1     0.73              0.77
       T2     0.59    0.79      0.24    0.82

10 Total Parameters Estimated

mMZ1, mMZ2, vMZ1, vMZ2, cMZ21

mDZ1, mDZ2, vDZ1, vDZ2, cDZ21

Standardize covariance matrices for twin pair correlations (covMZ & covDZ)
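As a rough illustration of the standardization step, the base R sketch below converts the saturated-model covariance estimates shown above into twin pair correlations. The matrices are typed in by hand from the slide values rather than pulled from a fitted OpenMx model, so the object names here are assumptions.

```r
# Saturated-model covariance estimates, typed in from the slide
covMZ <- matrix(c(0.73, 0.59,
                  0.59, 0.79), nrow = 2, byrow = TRUE)
covDZ <- matrix(c(0.77, 0.24,
                  0.24, 0.82), nrow = 2, byrow = TRUE)

# cov2cor() divides each covariance by the product of the two standard deviations
rMZ <- cov2cor(covMZ)[2, 1]
rDZ <- cov2cor(covDZ)[2, 1]
print(round(c(rMZ = rMZ, rDZ = rDZ), 2))
```

With these inputs the MZ correlation comes out near 0.78 and the DZ correlation near 0.30, so MZ pairs are more similar than DZ pairs for BMI in this sample.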

Fitting Nested Models

• Saturated Model

– likelihood of data without any constraints

– fitting as many means and (co)variances as possible

• Equality of means & variances by twin order

– test if mean of twin 1 = mean of twin 2

– test if variance of twin 1 = variance of twin 2

• Equality of means & variances by zygosity

– test if mean of MZ = mean of DZ

– test if variance of MZ = variance of DZ

Estimates

Equate Means & Variances across Twin Order

              MZ                DZ
              T1      T2        T1      T2
mean          21.35   21.35     21.45   21.45
cov    T1     0.76              0.79
       T2     0.59    0.76      0.24    0.79

Equate Means & Variances across Twin Order & Zygosity

              MZ                DZ
              T1      T2        T1      T2
mean          21.39   21.39     21.39   21.39
cov    T1     0.78              0.78
       T2     0.61    0.78      0.23    0.78

Model                     ep   -2ll      df     AIC      diff -2ll   diff df   p
Saturated                 10   4055.93   1767   521.93
mT1=mT2                    8   4056      1769   518      0.07        2         0.97
mT1=mT2 & varT1=varT2      6   4058.94   1771   516.94   3.01        4         0.56
MZ=DZ                      4   4063.45   1773   517.45   7.52        6         0.28

No significant differences between the saturated model and the models in which means and variances are equated across twin order and zygosity
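The p values in this comparison come from likelihood ratio tests against the saturated model. A minimal base R sketch, using the -2ll differences and degrees-of-freedom differences from the table (the vector names are made up for this example):

```r
# Differences in -2 log-likelihood relative to the saturated model,
# and the matching differences in degrees of freedom
diffLL <- c(eqMeans = 0.07, eqMeansVars = 3.01, eqZyg = 7.52)
diffDf <- c(eqMeans = 2,    eqMeansVars = 4,    eqZyg = 6)

# Under the null hypothesis the difference in -2ll follows a chi-square
# distribution with diffDf degrees of freedom
pVals <- pchisq(diffLL, df = diffDf, lower.tail = FALSE)
print(round(pVals, 2))
```

All three p values are well above 0.05, which is the basis for concluding that the equality constraints do not significantly worsen fit.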

Working with Binary and Ordinal Data

Elizabeth Prom-Wormley and Hermine Maes

Special Thanks to Sarah Medland

Transitioning from Continuous Logic to Categorical Logic

• Ordinal data has one fewer degree of freedom than continuous data

– MZcov, DZcov, Prevalence

– No information on the variance

• Thinking about our ACE/ADE model

– 4 parameters being estimated

– A / C / E / mean

• ACE/ADE model is unidentified without adding a constraint

Two Approaches to the Liability Threshold Model

• Traditional

– Maps data to a standard normal distribution

– Total variance constrained to be 1

• Alternate

– Fixes an alternate parameter (usually E)

– Estimates the remaining parameters

Time to Look at the Data!

Please open BinaryWarmUp.R

Observed Binary BMI is an Imperfect Measure of the Underlying Continuous Distribution

[Figure: Density of BMI; x-axis: twinData$bmiB2]

Mean (bmiB2) = 0.39

SD (bmiB2) = 0.49

Prevalence of “low” BMI = 60.6%

We are interested in the liability of risk for being in the “high” BMI category
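The mean and SD reported for the binary variable follow directly from the prevalence. The sketch below assumes bmiB2 is coded 0 = "low" and 1 = "high", which is consistent with the reported mean but is not stated on the slide.

```r
pLow  <- 0.606        # prevalence of "low" BMI from the slide
pHigh <- 1 - pLow     # assumption: bmiB2 is coded 0 = low, 1 = high

meanB <- pHigh                        # mean of a 0/1 variable is the proportion of 1s
sdB   <- sqrt(pHigh * (1 - pHigh))    # SD of a 0/1 variable
print(round(c(mean = meanB, sd = sdB), 2))
```

Because the mean and SD of a 0/1 variable are both functions of the prevalence alone, the binary data carry no separate information about the variance of the underlying liability.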

It’s Helpful to Rescale

[Figure: density plot, density.default(x = test1), N = 100000, bandwidth = 0.0444]

Raw Data (Unstandardized)

mean = 0.49, SD = 0.39

- Data not mapped to a standard normal

- No easy conversion to %

- Difficult to compare between groups, since the scaling is arbitrary

Standard Normal (Standardized)

mean = 0, SD = 1

- The area under the curve between two z-values is interpreted as a probability or percentage

Binary Review

Threshold calculated using the cumulative normal distribution (CND)

- We used frequencies and the inverse CND to do our own estimation of the threshold

qnorm(0.816) = 0.90

- The threshold is the Z value that corresponds with the proportion of the population having “low BMI”
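A short base R sketch of the round trip between a proportion and a threshold, using the 0.816 value from the slide (variable names are illustrative only):

```r
pBelow <- 0.816            # proportion of the population below the threshold
thr    <- qnorm(pBelow)    # inverse cumulative normal: proportion -> z value
print(round(thr, 2))

pBack  <- pnorm(thr)       # cumulative normal: z value -> proportion below
pAbove <- 1 - pBack        # proportion above the threshold
print(round(c(below = pBack, above = pAbove), 3))
```

qnorm() and pnorm() are inverses of each other, so a threshold on the standard normal liability scale and a prevalence carry the same information.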

Moving to Ordinal Data!

Getting a Feel for the Data

Open twinSatOrd.R

Calculate the frequencies of the 5 BMI categories for the second twins of the MZ pairs

CrossTable(mzDataOrdF$bmi2)

Estimating MZ Twin 2 Thresholds by Hand

T1 = qnorm(0.124) = -1.155

T2 = qnorm(0.124 + 0.236) = -0.358

T3 = qnorm(0.124 + 0.236 + 0.291) = 0.388

T4 = qnorm(0.124 + 0.236 + 0.291 + 0.175) = 0.939
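The same hand calculation can be written more compactly by accumulating the category frequencies first. This is a sketch in base R; the frequencies are typed in from the calculation above, and the vector names are assumptions.

```r
# Frequencies of the first four BMI categories for MZ twin 2;
# the fifth category is the remainder and needs no threshold
freq <- c(0.124, 0.236, 0.291, 0.175)

cumFreq    <- cumsum(freq)     # cumulative proportion below each threshold
thresholds <- qnorm(cumFreq)   # z value for each cumulative proportion
print(round(thresholds, 3))
```

With five categories there are four thresholds, one fewer than the number of categories.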

Estimate Twin Pair Correlations for the Liabilities Too!

Translating Back to the SEM Approach in OpenMx

Handling Ordinal Data in OpenMx

• 1- Determine the 1st threshold

• 2- Determine displacements between 1st threshold and subsequent thresholds

• 3- Add the 1st threshold and the displacement to obtain the subsequent thresholds

Ordinal Saturated Code Deconstructed

Defining Threshold Matrices

threM <- mxMatrix( type="Full", nrow=nth, ncol=ntv, free=TRUE, values=thVal, lbound=thLB, labels=thLabMZ, name="ThreMZ" )

threD <- mxMatrix( type="Full", nrow=nth, ncol=ntv, free=TRUE, values=thVal, lbound=thLB, labels=thLabDZ, name="ThreDZ" )

t1MZ1 t1MZ2

t2MZ1 t2MZ2

t3MZ1 t3MZ2

t4MZ1 t4MZ2

t1DZ1 t1DZ2

t2DZ1 t2DZ2

t3DZ1 t3DZ2

t4DZ1 t4DZ2

[Figure: liability threshold model diagram; recoverable labels: Variance Constraint, Threshold Model, and liability means μ MZT1, μ MZT2, μ DZT1, μ DZT2 (0)]

Defining Threshold Matrices - ThreMZ

1- Determine the 1st threshold

Tw1 Tw2

-1.19 -1.16
0.81 0.79
0.73 0.76
0.55 0.55

2- Determine displacements between 1st thresholds and subsequent thresholds

Double Check- Moving from Frequencies to Displacements

Frequency BMI T2   Cumulative Frequency   Z Value   Displacement
0.124              0.124                  -1.16
0.236              0.360                  -0.37     0.79
0.291              0.651                  0.39      0.76
0.175              0.826                  0.94      0.55
0.175              1                      -         -

Estimating Expected Threshold Matrices

threMZ <- mxAlgebra( expression= Inc %*% ThreMZ, name="expThreMZ" )

Inc <- mxMatrix( type="Lower", nrow=nth, ncol=nth, free=FALSE, values=1, name="Inc" )

Inc %*% ThreMZ = expThreMZ

Inc

1 0 0 0
1 1 0 0
1 1 1 0
1 1 1 1

ThreMZ (1st threshold, then displacements)

-1.19 -1.16
0.81 0.79
0.73 0.76
0.55 0.55

expThreMZ

-1.19 -1.16
-0.38 -0.37
0.34 0.39
0.89 0.93

3- Add the 1st threshold and the displacement to obtain the subsequent thresholds
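The effect of pre-multiplying by the lower-triangular matrix of ones can be checked in base R without OpenMx. The sketch below types in the rounded first thresholds and displacements shown above, so the last digit of some results can differ slightly from the slide, which presumably used unrounded estimates.

```r
nth <- 4   # number of thresholds

# Row 1: first thresholds; rows 2-4: displacements (columns: Tw1, Tw2)
thre <- matrix(c(-1.19, -1.16,
                  0.81,  0.79,
                  0.73,  0.76,
                  0.55,  0.55), nrow = nth, byrow = TRUE)

# Lower-triangular matrix of ones, same shape as the Inc matrix
Inc <- matrix(0, nrow = nth, ncol = nth)
Inc[lower.tri(Inc, diag = TRUE)] <- 1

# Each row of the product is the running sum of the rows above it
expThre <- Inc %*% thre
print(round(expThre, 2))

# Differencing the expected thresholds recovers the displacements
print(round(diff(expThre), 2))
```

Estimating displacements with a lower bound, rather than the thresholds directly, is one way to keep the thresholds in increasing order during optimization.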

Estimating Correlations & Fixing Variance

corMZ <- mxMatrix( type="Stand", nrow=ntv, ncol=ntv, free=TRUE, values=corVals, lbound=lbrVal, ubound=ubrVal, labels="rMZ", name="expCorMZ" )

corDZ <- mxMatrix( type="Stand", nrow=ntv, ncol=ntv, free=TRUE, values=corVals, lbound=lbrVal, ubound=ubrVal, labels="rDZ", name="expCorDZ" )

How Many Parameters in this Ordinal Model?

• MZ correlation- rMZ

• DZ correlation- rDZ

• Thresholds

– t1MZ1,t2MZ1,t3MZ1,t4MZ1

– t1MZ2,t2MZ2,t3MZ2,t4MZ2

– t1DZ1,t2DZ1,t3DZ1,t4DZ1

– t1DZ2,t2DZ2,t3DZ2,t4DZ2

Problem Set 1

• Open twinSatOrdA.R and twinSatOrd.R

– What do these scripts do?

– Looking at the scripts only: How are they similar? How are they different?

– What do these differences in the scripts reflect regarding conceptual differences in the two models?

• Run either script and double check against your previously hand-calculated values of thresholds. Report your results. If you can’t get it to match up, don’t panic… do e-mail.

• Run twinSatOrd.R

– Is testing an ACE model with the usual model assumptions justified? Why or why not?

Univariate Analysis with Ordinal Data: A Roadmap

1- Use the data to test basic assumptions inherent to standard ACE (ADE) models

Saturated Model

2- Estimate contributions of genetic and environmental effects on the liability of a trait

ADE or ACE Models

3- Test ADE (ACE) submodels to identify and report significant genetic and environmental contributions

AE or E Only Models

Questions?
