Comparison of Genetic Risk Factors Between Two Type II Diabetes Subtypes

Item Type text; Electronic Thesis

Authors Schader, Lindsey Marie

Publisher The University of Arizona.

Rights Copyright © is held by the author. Digital access to this material is made possible by the University Libraries, University of Arizona. Further transmission, reproduction or presentation (such as public display or performance) of protected items is prohibited except with permission of the author.

Download date 31/05/2018 18:18:07

Link to Item http://hdl.handle.net/10150/595048

COMPARISON OF GENETIC RISK FACTORS BETWEEN TWO TYPE II DIABETES SUBTYPES

By

LINDSEY MARIE SCHADER

A Thesis Submitted to the Honors College

In Partial Fulfillment of the Bachelor's degree With Honors in

Biology

THE UNIVERSITY OF ARIZONA

DECEMBER 2015

Approved by:

Dr. Yann Klimentidis Department of Epidemiology and Biostatistics

Abstract

Type 2 Diabetes (T2D) is an extremely heterogeneous disease, and its heritability is not fully accounted for. This study seeks to identify T2D subtypes based on clinical features measured before T2D diagnosis, and to test whether genetic risk factors differ between the subtypes. A sample of 13,459 White GWAS participants was obtained from FRAM, MESA, and ARIC. This sample consisted of 832 cases (individuals who developed T2D during follow-up) and 12,066 controls (individuals who did not develop T2D). K-means clustering was used to cluster the cases based on metabolic and anthropometric characteristics. Cox proportional hazards models were used to test whether T2D genetic risk factors differed between the clusters. The clustering analysis yielded two clusters, with cluster one containing a higher percentage of women and showing higher WHR, lower HDL, and higher FI than cluster two. There were no statistically significant differences in genetic risk factors between the two clusters. The strongest, though non-significant, differences involved genetic risk factors associated with adiposity, suggesting some interaction between adiposity genes and the characteristic phenotypes of each cluster in T2D development. Further research is needed to replicate these subtypes and to find significant genetic associations.

Introduction

Type 2 Diabetes (T2D) is an increasingly prevalent disease worldwide. In 2014, 9.3% of the U.S.

population suffered from the disease (1), and it is projected that there will be around 300

million diabetes cases in 2025 (2). In addition, diabetes was the seventh leading cause of death

in the U.S. in 2010 (1). Diabetes along with its associated health complications cost the United

States a total of $245 billion in 2012, with $176 billion in direct costs (1). T2D is characterized

by the body’s inability to control blood glucose levels, which can lead to multiple health

problems including kidney failure, heart disease, hypoglycemia, hyperglycemia, eye problems,

and amputations, among others (1). Clearly diabetes is of epidemic proportions, and it is a

major health concern for the United States and the rest of the world.

In contrast to T1D which is characterized by the autoimmune destruction of pancreatic beta-

cells, T2D is an extremely heterogeneous disease characterized by either abnormal production

of insulin or insulin resistance (3). Some researchers consider the categorization of T2D as a

single disease entity to be a major error due to the large heterogeneity of the disease. T2D

exists on a continuum from insulin-resistant obese patients to insulin-deficient lean patients

(4), and patient phenotypes tend to differ in insulin dependency, metabolic characteristics, and

the presence of GAD antibodies (5). As a result of the phenotypic and genetic heterogeneity of

T2D, the disease is difficult to diagnose. T2D is typically diagnosed based on diabetes patients

simply not meeting the diagnosis criteria for other forms of the disease (6). Due to this

heterogeneity, researchers have suggested that there may be multiple subtypes of the disease.

Identifying such subtypes would improve diagnostic tools, and currently research is underway

to identify these subtypes.

Recent research has focused upon identifying T2D subtypes based on different physical

characteristics of patients. Faerch et al. grouped T2D patients based on their fasting insulin and

two-hour glucose serum concentrations and found that these groups of patients had differing

trajectories of multiple phenotypic measurements such as beta-cell functioning and risk of

cardiovascular disease (7). More recently, Bapat et al. identified an increased level of Treg cells

(cells involved in immune responses and inflammation) in mice with age-associated T2D as

compared to healthy mice and mice with obesity-associated T2D. They also found that blocking

the growth of these Treg cells in mice prevented age-associated insulin resistance, suggesting a

major etiological difference between T2D patients (8). In addition to these studies, recent

research has used mathematical tools for grouping patients based on multiple characteristics.

One common method, called k-means clustering, has been used to classify different subtypes of

Parkinson’s disease as well as to group individuals based on their metabolic characteristics for

targeted nutritional advice (9,10). The use of clustering to subgroup T2D patients is very

limited. One study published in Science Translational Medicine used topological analysis to

create subgroups of T2D patients based on 73 clinical features. This process resulted in three

subgroups of T2D patients with differing genetic associations between the groups (11). Despite

this study, and other studies that focus on the genetics of T2D, only 15% of the heritability of

T2D has been explained, while around 80% of the heritability of T1D has been accounted for

(6). Even with these promising results on subtyping and the genetics of T2D, a comprehensive

picture is still lacking that includes both the genetics and the etiology of T2D.

The objective of our study is to cluster patients who go on to develop T2D into distinct

subtypes and to test whether genetic risk factors differ between the subtypes. This analysis

will allow us to identify genes that play a role in T2D development but were previously

unidentified because they affect only one subgroup of T2D patients. This research is clinically

useful because it allows us to further understand the heterogeneity of T2D and thus to create

individualized treatments (4). In addition, the identification of genetic risk factors will allow us

to learn more about the etiology of T2D by performing research on the molecular functions of

any genes identified. Lastly, the ability to cluster patients based on phenotypic characteristics

before T2D development allows us to more accurately estimate disease risk by allowing us to

consider physical characteristics and genetics jointly for pre-diabetic patients. Currently, genetic

risk scores (GRS), or models that use genetic data as inputs to estimate disease risk, are used to

estimate patient risk for T2D. Some of these models also incorporate physical characteristics

(12). Our analysis allows us to more accurately describe how the physical environment and

genetics interact to lead to T2D development, by creating subtypes of patients based on

physical characteristics and by determining whether some genes cause increased risk of T2D for

a certain subtype.

Methods

Studies

We used data on 13,459 GWAS study participants obtained from the Framingham Heart Study

(FRAM), the MESA SHARe Study (MESA), and the Atherosclerosis Risk in Communities study

(ARIC). The first study, FRAM, was conducted to identify risk factors for cardiovascular disease.

The data used in our study was from the offspring cohort which consisted of men and women

ages 30 to 62 years living in Framingham, Massachusetts. The phenotypic data used in our

study was from visit four of the FRAM Offspring cohort (13). The second study, MESA, was a

prospective cohort study with the purpose of investigating cardiovascular disease. The

genotyped cohort used in our study consisted of men and women ages 45 to 84 years from six

different communities across the United States (14). The last study, ARIC, is another prospective

study with the aim of identifying the causes and clinical outcomes of atherosclerosis. The

cohort component used from this study consisted of men and women ages 45 to 64 years from

four communities across the United States (15).

The data on self-declared white participants from all three of these studies was compiled to

create our cohort. All individuals with prevalent T2D at the initiation of data collection were

excluded from the study, and subjects were divided into cases and controls. Cases consisted of

those individuals who developed T2D over the course of follow-up, while controls were subjects

who did not develop T2D over the course of follow-up. Among the cases we excluded

individuals on cholesterol-lowering or hypertension medication along with anyone with missing

values for the phenotypic variables of interest. The final cases dataset consisted of 178 FRAM

participants, 109 MESA participants, and 545 ARIC participants, for a total of 832 individuals.

The control dataset consisted of 2,602 FRAM participants, 2,119 MESA participants, and 7,345

ARIC participants, for a total of 12,066 individuals.

Phenotypes

The phenotypes chosen for clustering included the important metabolic and anthropometric

measurements that were available in all three studies. These variables included sex, body-mass

index (BMI), waist-to-hip ratio (WHR), triglycerides (TG), high-density lipoprotein (HDL), fasting

glucose (FG), fasting insulin (FI), total cholesterol (TC), systolic blood pressure (SBP), and

diastolic blood pressure (DBP). These phenotypes are common measurements used in risk

scores for predicting T2D development (12). Further, glucose, total cholesterol, and triglyceride

levels have all been determined to be important indicators of metabolic health (10). The units

of measurement for TG, HDL, FI, and FG differed between the studies. To control for any

variation between studies, all phenotypic variables were scaled in the cohort dataset before

clustering analysis.
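
One way to put measurements recorded in different units on a common scale is to convert each variable to z-scores within each study. The sketch below assumes within-study standardization, which the text does not specify; the function name and the triglyceride values are hypothetical and serve only as an illustration.

```python
from collections import defaultdict
from statistics import mean, stdev


def zscore_within_study(records, variable):
    """Return z-scores of `variable`, standardized separately within each study."""
    values_by_study = defaultdict(list)
    for record in records:
        values_by_study[record["study"]].append(record[variable])
    # Per-study mean and sample standard deviation
    stats = {s: (mean(v), stdev(v)) for s, v in values_by_study.items()}
    return [
        (r[variable] - stats[r["study"]][0]) / stats[r["study"]][1]
        for r in records
    ]


# Hypothetical triglyceride values: one study in mmol/L, another in mg/dL
records = [
    {"study": "ARIC", "TG": 1.0},
    {"study": "ARIC", "TG": 2.0},
    {"study": "ARIC", "TG": 3.0},
    {"study": "MESA", "TG": 88.5},
    {"study": "MESA", "TG": 177.0},
    {"study": "MESA", "TG": 265.5},
]
tg_z = zscore_within_study(records, "TG")
```

After this step the two studies share the same scale (mean 0, standard deviation 1 within each study), so the original units no longer influence distances between participants.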

Genotypes

600 SNPs were selected for analysis along with 22 genetic risk scores (GRS). GRS were

calculated for the phenotypes of T2D, FI, FG, two-hour glucose (THG), proinsulin (PRO), SNPs

relating to HbA1c (HBA), low levels of adiponectin (ADPN), BMI, WHR, TC, low-density

lipoprotein (LDL), HDL, adiposity (FAT), C-reactive protein (CRP), serum urate (URATE), blood

pressure (BP), insulin resistance (IR), beta-cell function (BC), WHR adjusted for BMI, and

triglycerides (TG). Each GRS was calculated as a weighted average of alleles that have

been previously identified as risk alleles for the corresponding phenotypes.
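
A weighted-average risk score of this kind can be sketched as follows. The choice of weights (for example, published effect sizes) is an assumption, since the text does not state which weights were used; the function name and the example values are hypothetical.

```python
def weighted_grs(risk_allele_counts, weights):
    """Weighted average of risk-allele counts (0, 1, or 2 copies per SNP)."""
    if len(risk_allele_counts) != len(weights):
        raise ValueError("one weight is required per SNP")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    weighted_sum = sum(c * w for c, w in zip(risk_allele_counts, weights))
    return weighted_sum / total_weight


# Hypothetical participant with three SNPs and assumed per-SNP weights
example_grs = weighted_grs([2, 1, 0], [0.1, 0.2, 0.3])
```

With equal weights the score reduces to the mean number of risk alleles per SNP; unequal weights let SNPs with larger assumed effects contribute more.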

Clustering Method

K-means clustering was used to create clusters of T2D patients from our cases cohort. The k-means clustering method is a partitioning method in which participants are grouped into a pre-specified number of clusters. In the initial stage, cluster membership is determined by Euclidean distance from randomly chosen points. The mean of each resulting cluster is then calculated, and each participant is reassigned to the cluster whose mean is closest in Euclidean distance. This process is repeated a specified number of times or until cluster membership is stable (9). The variables selected for the clustering analysis are listed in the Phenotypes section of this paper, and all were measured before the patient was diagnosed with T2D. All phenotypic variables were standardized so that no variable carried more weight than another in the analysis because of its unit of measurement. The optimal number of clusters was determined using the cascadeKM function from version 2.0-10 of the vegan package in R. The Calinski criterion was used, as determined most appropriate by Milligan and Cooper (16). This method identified two as the optimal number of clusters. For the clustering portion of the analysis, the kmeans function in R was applied to our dataset of T2D cases only. The default algorithm of Hartigan and Wong (1979) was used, with 25 repetitions specified in order to avoid locally optimal solutions.
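
The procedure described above can be sketched in a few lines. This is a simplified illustration, not the analysis code: it uses Lloyd's algorithm rather than the Hartigan and Wong algorithm, it computes the Calinski-Harabasz index directly rather than calling cascadeKM, and the data points and function names are hypothetical.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points):
    return [sum(column) / len(points) for column in zip(*points)]


def lloyd_kmeans(points, k, rng, max_iter=100):
    """One k-means run (Lloyd's algorithm) from k randomly chosen starting points."""
    centers = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Assign every point to its nearest center
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centers[j]))
            for p in points
        ]
        if new_labels == labels:  # assignments are stable
            break
        labels = new_labels
        # Recompute each center as the mean of its members
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                centers[j] = centroid(members)
    wss = sum(squared_distance(p, centers[label]) for p, label in zip(points, labels))
    return labels, centers, wss


def best_of_restarts(points, k, n_starts=25, seed=0):
    """Repeat k-means from several random starts; keep the lowest within-cluster SS."""
    rng = random.Random(seed)
    runs = [lloyd_kmeans(points, k, rng) for _ in range(n_starts)]
    return min(runs, key=lambda run: run[2])


def calinski_harabasz(points, labels, centers):
    """Ratio of between-cluster to within-cluster dispersion, each per degree of freedom."""
    n, k = len(points), len(centers)
    overall = centroid(points)
    within = sum(squared_distance(p, centers[label]) for p, label in zip(points, labels))
    between = sum(labels.count(j) * squared_distance(centers[j], overall) for j in range(k))
    return (between / (k - 1)) / (within / (n - k))


# Hypothetical standardized data: two well-separated groups of three participants
points = [[-1.0, -1.2], [-0.8, -1.0], [-1.1, -0.9],
          [1.0, 1.1], [0.9, 0.8], [1.2, 1.0]]
labels2, centers2, _ = best_of_restarts(points, 2)
labels3, centers3, _ = best_of_restarts(points, 3)
ch2 = calinski_harabasz(points, labels2, centers2)
ch3 = calinski_harabasz(points, labels3, centers3)
```

The number of clusters with the larger index is preferred. On this toy data the two-cluster solution scores higher than the three-cluster one, mirroring how an optimal cluster number is chosen before the final clustering is run.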

To determine whether the clusters obtained were significantly different from one another, a t-test comparing the means of the two clusters was performed for each phenotypic variable of interest.
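
As an illustration, a two-sample t-statistic can be computed from summary statistics alone. The sketch below assumes the unequal-variance (Welch) form, which the text does not specify, and uses the WHR row of Table 2 with cluster sizes taken as the column totals of Table 3; the function name is hypothetical.

```python
from math import sqrt


def welch_t_from_summary(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's t-statistic and approximate degrees of freedom from group summaries."""
    se1_sq = sd1 ** 2 / n1
    se2_sq = sd2 ** 2 / n2
    t_stat = (mean1 - mean2) / sqrt(se1_sq + se2_sq)
    # Welch-Satterthwaite approximation to the degrees of freedom
    df = (se1_sq + se2_sq) ** 2 / (se1_sq ** 2 / (n1 - 1) + se2_sq ** 2 / (n2 - 1))
    return t_stat, df


# Cluster sizes: sums of the ARIC, FRAM, and MESA counts in Table 3
n_c1 = 332 + 98 + 68
n_c2 = 213 + 80 + 41

# Standardized WHR means and standard deviations from Table 2
t_whr, df_whr = welch_t_from_summary(
    0.49058065, 0.6669220, n_c1,
    -0.72419048, 0.9669827, n_c2,
)
```

With these inputs the statistic comes out close to 20, near the value reported for WHR in Table 2. This is consistent with an unequal-variance test but does not confirm which variant was used.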

Statistical Analyses

To test whether genetic effects on T2D development differ between the two clusters, Cox proportional hazards models were used to estimate the effects of genetics and cluster membership on the risk of T2D development over time. Two such models were used for this analysis.

Primary Model

The first model was estimated on the cases-only dataset. In this Cox proportional hazards model, the hazard rate of T2D development was regressed on cluster membership, included as a categorical variable (via as.factor), and on an interaction term between cluster membership and a genetic variable (SNP or GRS). The coefficient of this interaction term was the quantity of interest in our analysis, as it conveys a difference in genetic effects between the two clusters. The main effects of age, sex, and the genetic variable were included in this model as control variables.

coxph(Surv(time to development, diabetes incidence) ~ SNP (or GRS) + Age + Sex + as.factor(cluster) + as.factor(cluster)*SNP (or GRS), data = both clusters)

Secondary Model

The second analysis consisted of running a Cox proportional hazards model on two separate

datasets. The first dataset consisted of members of the first cluster and all controls. The second

dataset consisted of members of the second cluster and all controls. A genetic variable (SNP or

GRS) was used as an input variable into the model, and the hazard rate of T2D development

was used as the output. Age and sex were controlled for. This model was estimated for both

datasets and the difference between their coefficients for the genetic variables was evaluated

to indicate the differing effects of SNPs and GRSs between the two clusters.
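
In standard Cox regression notation, the two models described above can be sketched as follows. The symbols (G for the SNP or GRS, C for a cluster indicator, and the coefficient names) are notation assumed here for illustration and do not come from the original analysis code.

```latex
% Primary model (cases only): interaction between the genetic variable G and cluster indicator C
h_i(t) = h_0(t)\,\exp\!\big(\beta_G G_i + \beta_A \mathrm{Age}_i + \beta_S \mathrm{Sex}_i
         + \beta_C C_i + \beta_{GC}\, G_i C_i\big)

% Secondary model: fitted separately for k = 1, 2 on the members of cluster k plus all controls
h_i^{(k)}(t) = h_0^{(k)}(t)\,\exp\!\big(\gamma_G^{(k)} G_i + \gamma_A^{(k)} \mathrm{Age}_i
               + \gamma_S^{(k)} \mathrm{Sex}_i\big)
```

Under this notation, the primary analysis asks whether the interaction coefficient \beta_{GC} differs from zero, and the secondary analysis compares \gamma_G^{(1)} with \gamma_G^{(2)}.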

coxph(Surv(time to development, diabetes incidence) ~ SNP (or GRS) + Age + Sex, data = cluster 1 or 2, plus all controls)

These models were run for all SNPs and GRS of interest.

Results

A summary of the phenotypic characteristics of the cases is shown in Table 1. The Calinski criterion indicated that two was the optimal number of clusters for our cases dataset. After performing k-means clustering in R, we obtained two clusters with significant differences between cluster means for nine of the eleven clustering variables, as determined by t-tests between cluster means (Table 2). The first cluster (C1) consisted mostly of women and had a higher WHR, lower HDL, and higher FI than the second cluster (C2), which consisted mostly of men and had a lower WHR, higher HDL, and lower FI. The differences in age and total cholesterol between the clusters were not significant. Summary statistics of the phenotypic characteristics of each cluster may be found in Table 3.

After clustering was performed, we fit our primary model to the cases-only dataset. This model is a Cox proportional hazards model with cluster membership coded as a categorical variable (via as.factor). The most significant results of this analysis are listed in Table 4 (SNP) and Table 5 (GRS). This model did not generate any statistically significant coefficients for the interaction between cluster membership and genotype characteristic (SNP or GRS). The five most significant SNPs were rs1553318, rs731839, rs6882076, rs2294239, and rs2652834. These SNPs relate to TG, TG, LDL, WHR, and HDL, respectively. Furthermore, the most significant GRS identified were FAT and TG. Our secondary Cox proportional hazards model was run on the two cluster datasets separately to see what effects certain genetic characteristics had on T2D development. The result of interest in this model is the coefficient corresponding to the genetic characteristic in each cluster model and how these coefficients differ between the two clusters. These results are summarized in Table 4 for the 20 most significant SNPs identified by our primary model and in Table 5 for all the GRS tested. The genes to which the 20 most significant SNPs correspond are listed in Table 6.
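
The between-cluster comparison in the secondary analysis can be illustrated with the first three rows of Table 4. The sketch below rests on two assumptions: that the difference column is the absolute difference between the two per-cluster estimates, and that those estimates are log-hazard coefficients (some tabulated values are negative), so that exponentiating them gives hazard ratios. The variable and function names are hypothetical.

```python
from math import exp

# Per-cluster estimates (cluster 1, cluster 2) copied from Table 4
estimates = {
    "rs1553318": (2.48e-03, 0.0041739),
    "rs731839": (-1.58e-01, -0.055687),
    "rs6882076": (1.58e-02, -0.008271),
}


def absolute_difference(c1, c2):
    """Absolute difference between the cluster 1 and cluster 2 estimates."""
    return abs(c1 - c2)


differences = {
    snp: absolute_difference(c1, c2) for snp, (c1, c2) in estimates.items()
}

# Assumption: estimates are log-hazard coefficients, so exp() yields hazard ratios
hazard_ratios = {snp: (exp(c1), exp(c2)) for snp, (c1, c2) in estimates.items()}
```

Under these assumptions the computed differences agree with the tabulated difference column to the precision shown, and an exponentiated value below 1 corresponds to a lower estimated hazard per unit of the genetic variable.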

Discussion

The objective of our study was to identify subtypes of individuals who go on to develop T2D and

to determine whether the role of specific genetic factors differs between the two groups. Our

cluster analysis resulted in two groups of phenotypically distinct patients. C1 was generally

characterized by women with higher WHR, lower HDL, and higher FI, as compared to C2 which

consisted mostly of men with lower WHR, higher HDL, and lower FI. Although there were no

statistically significant findings on differing genetic effects between the two groups, we did

identify some associations of interest that may be explored in future research. Both the most

significant SNPs and the most significant GRS relate to adiposity, suggesting a differing

interaction between the phenotypes that characterize each cluster and adiposity-related

genotypes.

Although our two clusters were distinct from one another, they do not reflect the disease

subtypes that have been found in recent research. One recent study identified that there is a

physiological difference between patients that have age-associated onset of T2D versus

obesity-associated onset of the disease. They found that altering Treg cells in mice prevented

age-associated onset of T2D while obesity-associated onset of T2D had no association with Treg

cells (8). Our clusters did not reflect these different subtypes of T2D. C1 had a slightly higher

age on average (55.05 vs. 54.88 years), but the difference is not significant. It is true that many

measurements of metabolic health differed between the two groups, but there is not a clear

distinction between late age of onset and obesity-related T2D. Another study, more similar to

our analysis, looked at subtyping T2D patients based on multiple phenotypic characteristics

using a topological approach. This study had more power to identify subtypes than our study

due to its wealth of phenotypic data (73 variables included in clustering) and large sample

size of 11,210 T2D patients. This analysis yielded three subtypes. The first subtype was

characterized by young, overweight patients, the second was characterized by low-weight

patients, while the third group of patients had high SBP, serum chloride, and troponin I levels.

These subtypes also do not reflect the subtypes discovered in our analysis or the analysis of

other researchers, and the detailed differences between their three subtypes included

metabolic measurements that were not available in our data, such as white blood cell count

and serum albumin (11). Current research is focusing on disease subtyping of T2D because the

disease is so heterogeneous, but our study, along with others, does not create a clear consensus

on distinct subtypes.

In addition to the unique subtypes delineated by our analysis, some interesting genetic trends

were identified that reflect some current T2D research. The two most significant SNPs identified

in our analysis are both related to TG levels. The first SNP, rs1553318, is associated with the

HAVCR1 gene. This SNP increased the risk of T2D development in both clusters, but more so in

C2. The second SNP, rs731839, is near the PEPD gene. This SNP was associated with increased

T2D risk in C1 and decreased risk in C2. It is of interest that both SNPs are associated with TG

because the pre-diabetic state is characterized by elevated TG levels (a characteristic of

dyslipidemia). Furthermore, TG levels can be used to predict T2D risk, and one study found that

looking at the change in TG levels over time helped predict T2D risk in men (17). In addition to

the effects of TG on T2D risk, some research suggests that there is an association between a

certain variant of the PEPD gene and T2D development. A recent GWAS of Han Chinese participants

found that higher levels of n-3 fatty acids help mitigate the increased risk of T2D development

associated with the PEPD variant (18). This is consistent with our results, in which the PEPD SNP had a protective

effect against T2D development in C2, since C2 is characterized by lower TG levels (associated

with high n-3 fatty acid consumption) (19).

The third most significant SNP in our analysis, rs6882076, is near the TIMD4 gene which affects

LDL levels. This SNP increased the risk of T2D development in C1 while decreasing risk in C2.

Although LDL is not a major characteristic used in calculating T2D risk (12) because LDL levels do

not differ significantly between diabetics and non-diabetics, research has found that LDL

particles in T2D patients are typically smaller than those in their non-diabetic peers (20). The

results of our analysis suggest that there may be some association between the TIMD4 gene

and the environment that affects T2D development.

There are far reaching implications for T2D research that incorporates both disease subtype

and genetic risk. First, by identifying distinct subtypes of patients, one may create more

accurate models of disease risk that incorporate both physical characteristics and genetics. The

interaction between genes and the environment is complex, and it cannot be fully captured by a

mathematical analysis alone. By identifying characteristics of subgroups, researchers can gain a better

understanding of what traits may be involved in disease pathways and focus their research in

these areas. This may lead to a greater understanding of the disease itself and targeted

diagnosis and treatment based on disease subtype. The goal is to find distinct subtypes of the

disease that may be clearly defined and then to identify the metabolic pathways involved.

Our study had a number of limitations that prevented us from determining whether the

patients were clustered into truly distinct subtypes and which made it challenging to find

statistically significant results. First, we were not able to test whether the clusters could be

recreated in another dataset to confirm our findings. Further, our cohort was small for a genetic

analysis (only 832 T2D cases) because we removed all people on cholesterol-lowering

medication and hypertension medication. Lastly, our cohort included only White participants, so the results

may not be generalizable to other populations. Despite these weaknesses, our study has

multiple strengths including the inclusion of multiple studies which provided a wide range of

phenotypic and genetic data. In addition, although k-means clustering is not a common

method in the T2D literature, it has been used to identify subtypes of Parkinson’s disease

(9) and is therefore a promising method for disease subtyping.

Future research should focus upon replicating these subtypes in another cohort and including

more variables and subjects in the clustering analysis. Once clustering analyses produce

replicable results, thereby identifying distinct subtypes of T2D, animal models may be used to

further understand the disease etiology. There is still much to explore regarding how genetics

and the environment interact to influence the development of this disease.

References

1. National Diabetes Statistics Report, 2014. Centers for Disease Control and Prevention:

National Center for Chronic Disease Prevention and Health Promotion: Division of

Diabetes Translation. 2014;1-11.

2. King H, Aubert R, Herman W. Global Burden of Diabetes, 1995-2025. Diabetes Care.

1998;21:1414-1431.

3. Zimmet P, Alberti KGMM, Shaw J. Global and societal implications of the diabetes

epidemic. Nature. 2001;414(6865):782-787.

4. Gale EAM. Is type 2 diabetes a category error? The Lancet. 2013;381(9881):1956-1957.

5. Tuomi T, Santoro N, Caprio S, Cai M, Weng J, Groop L. The many faces of diabetes: a

disease with increasing heterogeneity. The Lancet. 2014;383(9922):1084-1094.

6. Groop L, Pociot F. Genetics of diabetes – are we missing the genes or the disease?

Molecular and Cellular Endocrinology. 2014;382(1):726-739.

7. Faerch K, Witte DR, Tabak AG, Perreault L, Herder C, et al. Trajectories of

cardiometabolic risk factors before diagnosis of three subtypes of type 2 diabetes: a

post-hoc analysis of the longitudinal Whitehall II cohort study. The Lancet Diabetes and

Endocrinology. 2013;1(1):43-51.

8. Bapat SP, Suh JM, Fang S, Liu S, Zhang Y, et al. Depletion of fat-resident Treg cells

prevents age-associated insulin resistance. Nature. 2015;528(7580):137-141.

9. Van Rooden SM, Heiser WJ, Kok JN, Verbaan D, van Hilten JJ, Marinus J. The

identification of Parkinson's disease subtypes using cluster analysis: a systematic

review. Movement Disorders. 2010;25(8):969–978.

10. O’Donovan CB, Walsh MC, Nugent AP, McNulty B, Walton J, et al. Use of metabotyping

for the delivery of personalized nutrition. Molecular Nutrition and Food Research.

2014;59(3):377-385.

11. Li L, Cheng W, Glicksberg BS, Gottesman O, Tamler R, et al. Identification of type 2

diabetes subgroups through topological analysis of patient similarity. Science

Translational Medicine. 2015;7(311): 311ra174-3.

12. Noble D, Mathur R, Dent T, Meads C, Greenhalgh T. Risk models and scores for type 2

diabetes: systematic review. BMJ. 2011;343:d7163.

13. Framingham Heart Study. NHLBI. 2015.

https://www.framinghamheartstudy.org/index.php.

14. About MESA. MESA Coordinating Center: University of Washington. 2015.

http://www.mesa-nhlbi.org/aboutMESA.aspx.

15. Atherosclerosis Risk in Communities Study. Collaborating Studies Coordinating Center.

Department of Biostatistics Gillings School of Global Public Health. North Carolina

Chapel Hill. 2015. https://www2.cscc.unc.edu/aric/desc.

16. Milligan GW, Cooper MC. An examination of procedures for determining the number of

clusters in a data set. Psychometrika. 1985;50(2):159-179.

17. Tirosh A, Shai I, Bitzur R, Kochba I, Tekes-Manova D, Israeli E. Changes in triglyceride

levels over time and risk of type 2 diabetes in young men. Diabetes Care.

2008;31(10):2032+.

18. Zheng JS, Huang T, Li K, Chen Y, Xie H, et al. Modulation of the Association between the

PEPD variant and the risk of type 2 diabetes by n-3 fatty acids in Chinese Hans. Journal

of Nutrigenetics and Nutrigenomics. 2015;8(1):36-43.

19. Harris WS, Bulchandani D. Why do omega-3 fatty acids lower serum triglycerides?

Current Opinion in Lipidology. 2006;17(4):387-393.

20. Nesto RW. LDL cholesterol lowering in type 2 diabetes: what is the optimum approach?

Clinical Diabetes. 2008;26(1):8-13.

Table 1: Baseline characteristics of participants in each of three cohorts.

ARIC FRAM MESA

n 545 178 109

Age (yrs) 54.02 (5.51) 54.92 (8.69) 59.87 (9.81)

% Female 59.82% 53.37% 52.29%

BMI (kg/m2) 29.49 (4.76) 30.88 (5.84) 29.52 (6.40)

WHR 0.97 (0.06) 0.95 (0.09) 0.93 (0.08)

TG* 1.82 (1.04) 196.84 (128.23) 150.79 (95.71)

HDL** 1.12 (0.34) 43.19 (12.27) 49.58 (15.41)

FG**** 6.00 (0.54) 117.25 (33.65) 95.17 (13.13)

TC 5.50 (1.03) 208.77 (34.96) 205.39 (36.79)

SBP (mmHg) 121.07 (15.47) 132.88 (16.92) 125.95 (20.60)

DBP (mmHg) 73.73 (10.23) 78.92 (9.29) 71.10 (10.40)

FI*** 105.31 (60.86) 38.21 (16.28) 11.76 (8.36)

*TG units: ARIC - mmol/L; FRAM - Meq/L; MESA - mg/dL

**HDL units: ARIC - mmol/L; FRAM&MESA - mg/dL

***FI units: ARIC - pmol/L , FRAM - μmol/L, MESA - mU/L

****FG units: ARIC - mmol/L, FRAM - mg/dl, MESA - mg/dl

Table 2: Cluster phenotype comparisons via t-test.

Variable clust1mean clust1SD clust2mean clust2SD t-stat p-value

WHR_pheno 0.49058065 0.6669220 -0.72419048 0.9669827 20.0258690 2.456061e-67

HDL_pheno -0.46698033 0.6136599 0.68935191 1.0576818 -18.0838850 2.524972e-56

FI_pheno 0.38237082 1.0520284 -0.56445216 0.5506614 16.9133715 5.768705e-55

Sex 0.77620968 0.4172040 0.27678571 0.4480769 16.2165669 2.746904e-50

TG_pheno 0.31562379 1.1145907 -0.46592083 0.5232078 13.5651645 1.177836e-37

BMI_pheno 0.31943235 0.9644238 -0.47154299 0.8521667 12.4497415 1.460480e-32

DBP_pheno 0.29310623 0.9775874 -0.43268063 0.8646014 11.2642577 2.324341e-27

SBP_pheno 0.24978211 0.9528888 -0.36872597 0.9509042 9.1979556 3.866903e-19

FG_pheno 0.22054251 0.9732836 -0.32556276 0.9468849 8.0709721 2.857736e-15

TC_pheno 0.05963691 0.9818105 -0.08803544 1.0184503 2.0820662 3.769834e-02

Age 55.05443548 6.8291606 54.87500000 7.7795657 0.3426994 7.319345e-01

Table 3: Characteristics of each cluster organized by study.

Cluster 1 (first three columns), Cluster 2 (last three columns)

Study ARIC FRAM MESA ARIC FRAM MESA

N 332 98 68 213 80 41

Age (yrs) 54.33 (5.62) 54.67 (8.28) 59.18 (8.31) 53.54 (5.30) 55.23 (9.20) 61.02 (11.92)

% Female 79.00% 79.00% 71.00% 30.00% 22.00% 22.00%

BMI (kg/m2) 31.00 (4.60) 32.38 (5.52) 32.15 (6.26) 27.14 (4.00) 29.04 (5.71) 25.17 (3.73)

WHR 1.00 (0.04) 0.99 (0.06) 0.97 (0.06) 0.92 (0.06) 0.89 (0.08) 0.87 (0.08)

TG* 2.14 (1.13) 239.53 (151.69) 181.49 (108.30) 1.34 (0.60) 144.55 (59.68) 99.88 (28.76)

HDL** 0.97 (0.21) 36.49 (8.15) 42.71 (9.64) 1.35 (0.37) 51.39 (11.48) 60.98 (16.49)

FG**** 6.11 (0.50) 124.18 (39.78) 99.97 (11.71) 5.84 (0.57) 108.76 (21.48) 87.20 (11.45)

TC 5.56 (0.97) 209.60 (36.32) 209.65 (40.12) 5.42 (1.11) 207.75 (33.42) 198.34 (29.61)

SBP (mmHg) 125.33 (15.03) 136.79 (16.07) 128.70 (17.66) 114.43 (13.73) 128.11 (16.81) 121.40 (24.28)

DBP (mmHg) 76.69 (10.11) 82.21 (8.98) 73.29 (9.75) 69.11 (8.59) 74.89 (8.04) 67.48 (10.56)

FI*** 127.72 (62.70) 44.60 (18.33) 15.13 (8.90) 70.37 (36.79) 30.39 (8.27) 6.18 (2.14)

*TG units: ARIC - mmol/L; FRAM - Meq/L; MESA - mg/dL

**HDL units: ARIC - mmol/L; FRAM&MESA - mg/dL

***FI units: ARIC - pmol/L , FRAM - μmol/L, MESA - mU/L

****FG units: ARIC - mmol/L, FRAM - mg/dl, MESA - mg/dl

Table 4: Most significant primary and secondary analysis results corresponding to SNP data.

SNP   Interaction coefficient for SNP and cluster membership   p-value   Hazard Ratio for Cluster 1   Hazard Ratio for Cluster 2   Hazard Ratio Difference

rs1553318 0.3298372 0.0018933 2.48E-03 0.0041739 1.69E-03

rs731839 -0.313511 0.0052293 -1.58E-01 -0.055687 1.02E-01

rs6882076 0.2692967 0.0091396 1.58E-02 -0.008271 2.41E-02

rs2294239 -0.267014 0.0093985 -8.76E-02 -0.053721 3.39E-02

rs2652834 -0.307825 0.0115077 -2.15E-02 0.0146121 3.61E-02

rs13139571 -0.315492 0.0115378 -1.52E-01 -0.103202 4.84E-02

rs576674 0.3626719 0.0118196 -8.95E-02 0.0898402 1.79E-01

rs4823006 -0.2515 0.0119255 -7.15E-02 -0.013835 5.77E-02

rs849135 -0.236855 0.01251 -1.39E-01 0.0084402 1.48E-01

rs6477694 0.2613947 0.0144687 5.31E-03 0.0278352 2.25E-02

rs7998202 0.4151296 0.0148486 8.41E-02 0.1806077 9.65E-02

rs1689800 -0.241809 0.0187326 6.38E-02 0.0086597 5.51E-02

rs6804842 0.2314893 0.0218175 9.61E-02 0.0206176 7.55E-02

rs7739232 -0.457728 0.0244218 8.93E-02 0.0068656 8.25E-02

rs2779116 0.2566808 0.0257419 -6.03E-02 -0.097148 3.69E-02

rs6795735 -0.225945 0.0266387 2.60E-02 -0.022721 4.88E-02

rs2954022 0.231793 0.0267414 -4.37E-02 0.0294779 7.32E-02

rs2078267 -0.228196 0.027513 -4.82E-02 0.0031743 5.14E-02

rs2954029 0.2308958 0.0275867 -4.88E-02 0.0217629 7.05E-02

rs12328675 0.3312844 0.0291462 -1.17E-01 -0.01858 9.84E-02

rs4865796 -0.241173 0.0300234 3.18E-02 -0.051541 8.34E-02

rs11694172 -0.256426 0.0301398 3.00E-02 -0.057769 8.77E-02

rs8182584 -0.2283 0.0331252 -1.00E-01 -0.0663 3.38E-02

rs2290547 0.3144762 0.0451986 -1.89E-01 0.1023083 2.91E-01

rs10929925 -0.216843 0.0457031 -3.29E-02 -0.039378 6.48E-03

rs17367504 -0.281034 0.0471991 1.96E-02 -0.021738 4.13E-02

rs7225700 -0.20684 0.0485637 1.79E-02 -0.078474 9.64E-02

rs3780181 0.3922958 0.0497388 -4.78E-02 0.0059285 5.38E-02

Table 5: Primary and Secondary analysis results corresponding to GRS data.

GRS   Interaction coefficient for GRS and cluster membership   p-value   Hazard Ratio for Cluster 1   Hazard Ratio for Cluster 2   Hazard Ratio Difference

FAT_GRS -1.035413 0.0464675 -0.125982 0.1108843 0.236866

TG_GRS -0.007944 0.0638599 -0.001277 -0.014616 0.013339

ADPN_GRS 1.7136247 0.0812667 1.0190954 0.9404897 0.078606

CRP_GRS 0.5235282 0.0934355 -0.138709 0.129723 0.268432

BMI_GRS 0.2146506 0.1011368 0.2438031 0.0369369 0.206866

PRO_GRS 1.4593472 0.1190019 0.5452933 -0.725566 1.270859

IR_GRS 0.0855827 0.1244277 0.0664784 0.0578434 0.008635

HDL_GRS -0.028909 0.142099 0.0128152 -0.032373 0.045188

URATE_GRS 0.3532343 0.1545706 -0.062828 0.139879 0.202707

HBA_GRS 1.3121146 0.1706568 0.4518716 1.6238976 1.172026

TG_GRS40 -0.431003 0.1836546 0.2381145 -0.864953 1.103068

BP_GRS 0.0507751 0.1887276 -0.032825 -0.02229 0.010535

FI_GRS 2.186906 0.1951746 2.8075312 2.6641247 0.143407

T2D_GRS 0.0767174 0.2964663 0.2875567 0.3214717 0.033915

LDL_GRS -0.008485 0.3233295 -0.006835 -0.010248 0.003413

TC_GRS -0.005176 0.4849864 -0.006943 -0.011592 0.004649

BC_GRS -0.027769 0.4905962 0.1140514 0.0997041 0.014347

WHR_GRS -0.648085 0.4958957 1.4135402 -0.021876 1.435416

FG_GRS 0.398858 0.5610801 1.7399193 2.3933806 0.653461

BMI96_GRS -0.163293 0.717723 0.5384569 0.209882 0.328575

WHRadjBMI48_GRS -0.212589 0.7464577 0.8287973 0.0777437 0.751054

THG_GRS 0.115274 0.796558 0.174948 0.0680792 0.106869

Table 6: Phenotypes and genes relating to the most significant SNPs in the primary analysis.

SNP   Associated Trait   Effect   Risk Allele   Gene (ID)   Chromosome

rs1553318 TG 2.63 HAVCR1 5:157052312

rs731839 TG 0.022 PEPD 19:33408159

rs6882076 LDL 1.67 C TIMD4 5:156963286

rs2294239 WHR 0.025 ZNRF3 22:29053489

rs2652834 HDL 0.39 LACTB 15:63104668

rs13139571 BP 0.321259 GUCY1A3, LOC105377506 4:155724361

rs576674 FG 0.016697 G KL, ~36 kb upstream 13:32980164

rs4823006 WHR 0.023 A ZNRF3 22:29055683

rs849135 T2D 1.11 G JAZF1 7:28156794

rs6477694 BMI C C9orf4 9:109170062

rs7998202 HBA 0.031 G ATP11AUN 13:112677554

rs1689800 HDL 0.47 G GLUL, ZNF648 1:182199750

rs6804842 BMI G LOC101927874 3:25064946

rs7739232 HIPadjBMI A Locus: KLHL31 6:53675537

rs2779116 HBA 0.024 T SPTA1 1:158615625

rs6795735 T2D 1.08 C ADAMTS9-AS2 3:64719689

rs2954022 TC 2.3 C LOC105375745 8:125470379

rs2078267 URATE 0.073 C SLC22A11 11:64566642

rs2954029 TG 5.64 A LOC105375745 8:125478730

rs12328675 HDL 0.68 T COBLL1 2:164684290

rs4865796 FI 0.015358 A ARL15 5:53976834

rs11694172 TC 0.028 G FAM117B 2:202667581

rs8182584 T2D 1.04 T PEPD 19:33418804

rs2290547 HDL -0.03 A SETD2 3:47019693

rs10929925 HIP C SOX11 2:6015425

rs17367504 BP 0.9030779 A MTHFR 1:11802721

rs7225700 LDL 0.87 C LOC102724508 17:47314438

rs3780181 TC -0.044 G VLDLR 9:2640759