Upload
others
View
0
Download
0
Embed Size (px)
Citation preview
1
Varicose veins of lower extremities: insights from the first large-scale genetic study 1
2
Alexandra S. Shadrina1,2*, Sodbo Z. Sharapov2,3, Tatiana I. Shashkova2,4,5, Yakov A. Tsepilov2,3* 3
4
1 Laboratory of Pharmacogenomics, Institute of Chemical Biology and Fundamental Medicine, 5
Novosibirsk, Russia 6
2 Laboratory of Theoretical and Applied Functional Genomics, Novosibirsk State University, 7
Novosibirsk, Russia 8
3 Laboratory of Recombination and Segregation Analysis, Institute of Cytology and Genetics, 9
Novosibirsk, Russia 10
4 Knomics LLC, Skolkovo Innovation Center, Moscow, Russia 11
5 Department of Biological and Medical Physics, Moscow Institute of Physics and Technology, 12
Moscow, Russia 13
14
15
16
* Corresponding authors: 17
Dr. Alexandra Shadrina, Laboratory of Pharmacogenomics, Institute of Chemical Biology and 18
Fundamental Medicine, 8 Lavrentjev Avenue, Novosibirsk 630090, Russia. Tel: +7 383 19
3635171, E-mail: [email protected] 20
ORCID 0000-0003-1384-3413 21
Dr. Yakov Tsepilov, Laboratory of Recombination and Segregation Analysis, Institute of 22
Cytology and Genetics, 10 Lavrentjev Avenue, Novosibirsk 630090, Russia. Tel: +7 383 23
3634980, E-mail: [email protected] 24
ORCID 0000-0002-4931-6052 25
26
not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which wasthis version posted July 13, 2018. . https://doi.org/10.1101/368365doi: bioRxiv preprint
2
Abstract 27
Varicose veins of lower extremities (VVs) are a common multifactorial vascular disease. 28
Genetic factors underlying VVs development remain largely unknown. Here we report the first 29
large-scale study of VVs performed on a freely available genetic data of 408,455 European-30
ancestry individuals. We identified 7 reliably associated loci that explain 10% of the SNP-based 31
heritability, and prioritized the most likely causal genes CASZ1, PPP3R1, EBF1, STIM2, and 32
HFE. Genetic correlation analysis confirmed known epidemiological associations and found 33
genetic overlap with various traits including fluid intelligence score, educational attainment, 34
smoking, and pain. Finally, we observed causal effects of height, weight, both fat and fat-free 35
mass, and plasma levels of MICB and CD209 proteins. 36
not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which wasthis version posted July 13, 2018. . https://doi.org/10.1101/368365doi: bioRxiv preprint
3
Varicose veins (VVs) are one of the clinical manifestations of chronic venous disease 37
posing both a cosmetic and medical problem. VVs can be found in different parts of the body, 38
but most commonly occur in the lower extremities. Prevalence estimates of this condition vary 39
across ethnic groups ranging from 2-4% in the Northern group of the Cook Islands to 50-60% in 40
some countries of the Western world1. Increased age, female sex, number of pregnancies, 41
obesity, history of deep venous thrombosis, and standing occupation are among other risk 42
factors2. VVs not related to the post-thrombotic syndrome or venous malformations are defined 43
as primary VVs. 44
Pathogenesis of VVs is still not fully clarified. According to current understanding, key 45
factors implicated in VVs development include changes in hemodynamic forces (decrease in 46
laminar shear stress and increase in venous filling pressure), endothelial activation, 47
inflammation, hypoxia, and dysregulation of matrix metalloproteinases and their tissue 48
inhibitors3-5. These alterations underlie pathological remodeling of the vascular wall and loss of 49
its tone. Questions remain about the order of events and the primary stimulus triggering the set of 50
disease-related changes. 51
The cumulative evidence from epidemiological, family, and genetic association studies 52
strongly indicates that there is a hereditary component in VVs etiology6-8. However, despite 53
progress in this field9-12, current knowledge of the genetic basis of this pathology is far from 54
being complete. Elucidating genes involved in susceptibility to VVs would help to identify key 55
molecular players in the disease initiation, provide deeper insights into its pathogenesis, and 56
eventually contribute to development of improved targeted therapy aimed at VVs treating and 57
preventing. 58
Large-scale biobanks linked to electronic health records open up unparalleled opportunities 59
to investigate the genetics of complex traits. Today, UK Biobank is the largest repository that 60
contains information on genotypes and phenotypes for half a million participating individuals13. 61
This resource is open to all bona fide researchers, and access to data is provided upon approval 62
not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which wasthis version posted July 13, 2018. . https://doi.org/10.1101/368365doi: bioRxiv preprint
4
of their application and payment of necessary costs. However, the need to incur high costs 63
related to data access and computation can be an insurmountable obstacle for those who cannot 64
afford these expenses. The Neale Lab (http://www.nealelab.is/) and the Gene ATLAS database14 65
(http://geneatlas.roslin.ed.ac.uk/) are two independent projects, which intend to remove this 66
burden by generating genetic association data and sharing them with broader scientific 67
community. These resources provide free open access to “quick-and-dirty” genome-wide 68
association study (GWAS) summary statistics for a wide range of phenotypes measured in the 69
UK Biobank. In our study, we aimed to employ state-of-the-art bioinformatics approaches to 70
extract maximum possible information from these open resources with regard to the genetics of 71
VVs of lower extremities. Our objectives were to (1) identify genetic loci reliably associated 72
with VVs risk and prioritize the genes that account for the revealed associations, (2) elucidate 73
pleiotropic effects of identified loci, (3) investigate genetic overlap between VVs and other 74
complex traits, (4) gain etiological insights and explore cause-and-effect relationships by means 75
of Mendelian randomization analysis. 76
Results 77
Study design, assumptions, and limitations 78
Our study was designed as a classical GWAS involving the initial discovery screening 79
followed by the replication stage. All calculations were entirely based on the UK Biobank data 80
for white British individuals available in open access databases. The overall workflow of the 81
study is depicted in Fig.1. In the discovery stage, we used GWAS summary statistics provided 82
by the Neale Lab. To be able to perform a replication, we extracted the UK Biobank data not 83
covered by this project by means of reverse meta-analysis of two overlapping datasets: genetic 84
association data provided by the Gene ATLAS database and the GWAS data supplied by the 85
Neale Lab. The projects used different software and methods of analysis. The Gene ATLAS 86
project14 applied less stringent filtering criteria and therefore had larger sample size. This 87
difference in quality control approaches enabled us to generate a subset of “subtracted” 88
not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which wasthis version posted July 13, 2018. . https://doi.org/10.1101/368365doi: bioRxiv preprint
5
individuals and use them as a replication cohort. Data obtained in both GWAS stages were used 89
for further downstream analyses. 90
In carrying out this study, we had to face a number of challenges and limitations that must 91
be acknowledged. The first limitation was intrinsic to the general approach to phenotype 92
definition based on the electronic medical records system, which was employed in the UK 93
Biobank study. Phenotype “VVs of lower extremities” was defined based on International 94
Classification of Disease (ICD-10) billing code “I83” present in the electronic patient record. 95
The Neale Lab reported the phenotype prevalence of 2.1%. It is much lower than VVs 96
prevalence estimated by European epidemiological studies. Despite the evidence of a “healthy 97
volunteer” selection bias in the UK Biobank study15, such a low rate indicates that a proportion 98
of individuals remained undiagnosed. This is in line with a recent primary healthcare register-99
based study reporting VVs prevalence rate of 3% in German general practice16. This 100
phenomenon could be explained by a non-life-threatening nature of varicose veins, which might 101
discourage patients from communication to the doctor. Given that individuals not diagnosed with 102
I83 served as controls in our study, we could therefore expect an overall decrease in the 103
statistical power to detect gene-disease associations. Another potential source of missing 104
associations was immensely strict criteria used by the Neale Lab for single nucleotide 105
polymorphisms (SNPs) quality control removing around 75% of SNPs initially provided by UK 106
Biobank. 107
The next important limitation arose from the lack of access to individual-level data 108
resulting in the inability to control a possible selection or sampling bias. We suggested that traits 109
related to VVs risk factors could potentially cause unequal representation of patients with 110
different characteristics in the case and the control groups, and thereby induce spurious 111
associations or effect modification. The Neale Lab analyses were adjusted for sex, but other 112
factors were beyond our control. In order to address this challenge, we performed an adjustment 113
for two potential confounders – body mass index (BMI) and deep venous thrombosis (DVT) – by 114
not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which wasthis version posted July 13, 2018. . https://doi.org/10.1101/368365doi: bioRxiv preprint
6
implementation of the method based on GWAS summary statistics17,18 (Supplementary 115
Methods). 116
Last but not least, a noteworthy debatable point in our study is the approach to generating a 117
replication cohort. Due to our intention to use only free open resources, we had to bypass the 118
need to refer to individual patient data. Our replication study sample comprised individuals who 119
have passed the Gene ATLAS quality control criteria, but have been filtered out by the Neale 120
Lab. It was almost 5 times smaller than our discovery study sample. However, although this 121
“dirty” design limited our power to replicate associations, it still could not produce any false 122
positive findings. Moreover, we calculated genetic correlations between VVs in the replication 123
cohort and VVs in both public databases to ensure that these traits are genetically similar 124
(Supplementary table 1). VVs in our replication study was highly correlated with the same trait 125
in other datasets (rg ~ 0.9). Furthermore, it is noteworthy that VVs in the Gene ATLAS, the 126
Neale Lab as well as in BMI- and DVT-adjusted Neale Lab GWAS were almost genetically 127
equivalent (rg = 0.99 or rg = 1.00). 128
Summarizing the above, we can state that the limitations which we could not circumvent 129
were mainly related to the loss of statistical power. Nevertheless, given a large sample size and a 130
large number of associations tested, we assume that this obstacle could be at least partially 131
compensated by a huge scale of the UK Biobank study itself. 132
Genome-wide association study for VVs of lower extremities 133
We defined associated loci as regions within 250 kb from the lead SNP and reported only 134
the most significant SNP hits per locus. 135
In the discovery stage, we identified 18 loci that met a genome-wide level of statistical 136
significance of P < 5.0e-08 as provided by the Neale Lab (Table 1, Supplementary table 2). 137
Adjustment for DVT and BMI only slightly changed the observed effects of SNPs, but yielded 138
two additional genome-wide significant loci tagged by rs11693897 and rs247749. However, both 139
these SNPs as well as polymorphism rs1061539 turned insignificant after the correction for the 140
not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which wasthis version posted July 13, 2018. . https://doi.org/10.1101/368365doi: bioRxiv preprint
7
genomic inflation factor (LD Score regression intercept) λ = 1.0403. In the replication stage, 7 141
loci were replicated at P < 0.0029 (0.05/17). Manhattan plot of –log10(P) is presented in Fig.2. A 142
quantile-quantile plot for observed vs. expected distribution of P-values is shown in 143
Supplementary fig. 2. 144
Regional association plots for replicated loci are given in Supplementary fig. 1. Genes 145
located in these regions are listed in Supplementary table 2. We performed conditional and joint 146
(COJO) analysis and detected two independent signals in the locus on chromosome 16 at a 147
distance of 39 kb from each other (Supplementary table 3). Lead SNPs rs9972645 and rs2911463 148
were in low linkage disequilibrium (r2 = 0.15). 149
Functional annotation of the revealed signals 150
Literature-based annotation 151
We explored whether genes in close proximity to replicated hits could have biologically 152
plausible roles in VVs based on their functions or previously revealed associations with other 153
complex traits. The results of literature-based candidate gene prioritization are provided in Table 154
2. Furthermore, we compiled a list of traits associated at a genome-wide significance level with 155
either lead SNPs, or variants in linkage disequilibrium (LD) with these polymorphisms using 156
both the Pubmed and the PhenoScanner databases (Table 2, Supplementary table 4). Our analysis 157
showed that among the genes nearest to the top GWAS loci (±250 kb), 2 genes were related to 158
vascular development and remodeling (CASZ1 and PIEZO1) with one gene having putative role 159
in vascular remodeling (STIM2), 4 genes were implicated in blood pressure and hypertension 160
(CASZ1, PIEZO1, SLC12A2, and EBF1), 2 genes were linked to inflammation (PPP3R1 and 161
EBF1), and 2 genes were ion channels involved in regulation of both cell volume and vascular 162
tone (PIEZO1 and SLC12A2). Near the SLC12A2, we identified the FBN2 gene which encodes a 163
protein regulating elastogenesis, so we prioritized both genes. 164
Two signals overlapped with findings from a recent GWAS for VVs conducted by 165
“23andMe” company12: rs11121615 in the CASZ1 gene, which was among the top SNPs 166
not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which wasthis version posted July 13, 2018. . https://doi.org/10.1101/368365doi: bioRxiv preprint
8
identified in that study, and rs2861819 near the PPP3R1, which was in complete LD 167
(D’ = 1.00, r2 = 0.98) with their another hit rs6712038. For 3 loci (tagged by rs2911463, 168
rs3101725, and rs7773004) a genome-wide significant association with red blood cell traits has 169
previously been demonstrated. One of these signals was near the HFE gene implicated in iron 170
absorption. The same locus showed association with height. 171
Changes in blood pressure, abnormal vascular wall remodeling, altered vascular tone, 172
deterioration of vein wall elastic properties as well as inflammation are known factors related to 173
VVs. Association with red blood cell traits is intriguing. Hypothetically, both erythrocyte 174
homeostasis and VVs could be linked to iron overload19,20. It is noteworthy that association 175
between hemochromatosis-related functional variant in the HFE gene and primary VVs was 176
revealed in our previous candidate-gene study in ethnic Russian individuals19. 177
Searching for functional variants 178
Ensembl Variant Effect Predictor (VEP)51 analysis identified one moderate-impact 179
missense variant rs2044693 in the PNO1 gene (in the locus tagged by rs2861819). SIFT and 180
PolyPhen tools predicted no damaging effect of this SNP on the protein function. Additionally, 181
we revealed 19 SNPs in regulatory regions (in loci tagged by rs9972656, rs2861819, 182
rs11135046, and rs7773004), and one synonymous variant rs1368298 in the EBF1 gene (in the 183
locus tagged by rs11135046). The full data is available in Supplementary table 5. 184
Testing for pleiotropy: VVs and gene expression 185
We used Summary data-based Mendelian Randomization (SMR) analysis followed by the 186
Heterogeneity in Dependent Instruments (HEIDI)52 test to identify genes whose expression level 187
is associated with the same causal SNPs that affect the risk of VVs. The results are given in 188
Supplementary table 6. SMR/HEIDI analysis provided evidence for pleiotropic effects of 189
variants within rs3101725 and rs2861819 regions. In the first region, association was revealed 190
with the expression of long intergenic non-protein coding RNA 1184 (LINC01184) located in 191
close proximity to the SLC12A2 on the reverse strand. Changes in the SLC12A2 expression were 192
not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which wasthis version posted July 13, 2018. . https://doi.org/10.1101/368365doi: bioRxiv preprint
9
not associated with VVs in our tests. Interestingly, the previous study showed that disruption of a 193
bi-directional promoter between these genes altered the expression of both SLC12A2 and 194
LINC01184, while deletion of the third exon of LINC01184 affected only LINC01184 195
expression36. 196
In the locus tagged by rs2861819, our analysis identified four genes – PPP3R1, PLEK, 197
WDR92, and PNO1. It is likely that this region contains a strong eQTL that affects the 198
expression of nearby genes in different tissues and influences the risk of VVs. 199
It is important to note that SMR/HEIDI analysis does not distinguish pleiotropy from 200
causality. Therefore, we can hypothesize causal relationships between the expression level of 201
identified genes and the risk of VVs development. 202
Besides this, we found associations with expression levels of four genes within rs2911463 203
region (APRT, ZFPM1, RNF166, and PIEZO1) as well as with expression level of the TRIM38 204
gene located ~280 kb from rs7773004. However, statistically significant heterogeneity revealed 205
in the HEIDI test indicated that these associations were mediated by polymorphisms different 206
from those that alter VVs risk. 207
DEPICT analysis 208
Data-driven Expression Prioritized Integration for Complex Traits (DEPICT) framework53 209
was used to conduct a gene set and tissue enrichment analysis as well as to provide additional 210
evidence for gene prioritization. Results of DEPICT analysis for SNP sets associated with VVs 211
at P < 5.0e-08 and P < 1.0e-05 are given in Supplementary tables 7 and 8, respectively. For the 212
first set, we identified a significant enrichment of the Mammalian Phenotype (MP) Ontology 213
terms “abnormal vasculogenesis” and “abnormal vascular development”. When we relaxed 214
significance threshold of input SNPs to P < 1.0e-05, 22 MP terms were shown to be enriched 215
with additional 9 categories related to abnormal morphology of heart and vessels (“abnormal 216
blood vessel morphology”, “abnormal vascular smooth muscle morphology”, “failure of heart 217
looping”, etc.). The remaining MP terms were associated with aberrant morphology of other 218
not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which wasthis version posted July 13, 2018. . https://doi.org/10.1101/368365doi: bioRxiv preprint
10
tissues and organs and disruption of embryonic development. Besides this, we observed 219
enrichment for several subnetworks of protein-protein interactions, including that for ENG 220
(major glycoprotein of the vascular endothelium, mutations in its gene cause multisystemic 221
vascular dysplasia), SMAD 2, 4, 6, 7 (transduce signals from TGF-beta family members), and 222
Notch4 (regulates arterial specification54). 223
Tissue enrichment analysis revealed no statistically significant categories. 224
DEPICT gene prioritization tool provided 2 and 47 prioritized genes depending on the 225
level of statistical significance of the input SNPs (P < 5.0e-08 and P < 1.0e-05, respectively; 226
Supplementary tables 7 and 8). 227
Summary of gene prioritization 228
Basing on cumulative evidence from different analyses, we drew up a list of genes which 229
most likely account for the revealed associations and seem to be functionally relevant to VVs 230
development (Table 3). For two loci on chromosomes 2 and 5, we were unable to prioritize any 231
candidate gene. However, for the locus on chromosome 2, we advocate for the PPP3R1 gene 232
since it displayed the lowest P and the greatest effect size in the SMR test (P = 1.3e-22 and β = 233
120.0, Supplementary table 6). Moreover, biological functions of PPP3R1 suggest its 234
involvement in VVs etiology (Table 2). In particular, it induces MCP-1 production, which was 235
shown to be enhanced in VVs55,56. Finally, a recent study demonstrated that expression of 236
PPP3R1 is increased in the venous tissue of VVs patients compared with veins of healthy 237
individuals (fold change = 1.5)57. 238
Testing for pleiotropy: VVs and other complex traits 239
SMR/HEIDI analysis revealed 31 traits that shared the same casual variants with VVs 240
(traits for this and further analyses were obtained from the GWAS-MAP database, see Methods 241
section for details). These SNPs were located in 5 out of 7 loci identified in our study (Fig. 3, 242
Supplementary table 10). Variants within loci tagged by rs11121615 and rs3101725 showed 243
positive SMR beta coefficients (the same direction of effect) with predicted and fat-free mass of 244
not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which wasthis version posted July 13, 2018. . https://doi.org/10.1101/368365doi: bioRxiv preprint
11
both legs and a whole body as well as with basal metabolic rate, and negative coefficients 245
(opposed effects) – with leg fat percentage as well as impedance of both legs and a whole body 246
(parameter negatively correlated with fat-free mass). Loci tagged by rs3101725 and rs7773004 247
comprised SNPs with pleiotropic effects on red blood cell erythrocyte distribution width 248
(negative beta). Locus tagged by rs7773004 was also related to numerous blood traits, such as 249
mean corpuscular haemoglobin concentration, mean platelet thrombocyte volume, and monocyte 250
and reticulocyte count. Three loci (rs11135046, rs28558138, and rs28558138) were linked to 251
blood pressure and hypertension with different SMR beta signs. Another interesting finding was 252
identification of positive SMR beta for associations with vascular/heart problems (rs28558138 253
locus), cellulitis (rs3101725 locus), forced viral capacity, and height size at age 10 (rs7773004 254
locus). Overall, our analysis revealed four main groups of traits: one cluster related mainly to 255
mass, one cluster involving mostly fat- and blood pressure-related traits, one cluster of blood-256
related traits linked only to rs7773004 locus, and one independent trait “red blood cell 257
erythrocyte distribution width” (Fig. 3). 258
Genetic correlations 259
Genetic correlations between VVs in different datasets (discovery and replication cohorts, 260
etc.) are given in Supplementary table 1. A list of genetic correlation estimates (rg) between VVs 261
and 861 complex traits is presented in Supplementary table 11. Twenty five traits showed 262
statistically significant correlation with VVs with absolute values of rg ≥ 0.2. Correlation matrix 263
for this subset is displayed as a heatmap in Fig. 4. We observed 5 main clusters: traits related to 264
the type of job, intelligence, and qualification; traits related to height and mass (including 265
predicted and fat-free mass of both legs); thrombosis-related traits; traits related to operations; 266
and traits related to pain (including leg pain and gonarthrosis) and health satisfaction. VVs trait 267
was closest to the thrombosis-related cluster (positive correlation). It was also positively 268
correlated with mass, operations, and pain-related traits as well as with lower levels of 269
qualification and heavy manual/walking/standing job. For traits with less prominent correlation 270
not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which wasthis version posted July 13, 2018. . https://doi.org/10.1101/368365doi: bioRxiv preprint
12
with VVs, we observed the same trend: pain and anthropometric traits (sitting and standing 271
height, BMI, mass, etc.) showed positive correlations, whilst higher levels of education - 272
negative. Interestingly, negative correlation was observed with usual walking pace, and positive 273
– with current smoking. 274
We calculated partial genetic correlations for the subset of 7 non-collinear traits with 275
|rg |≥ 0.2. Two traits – DVT and height size at age 10 – were shown to share common genetic 276
background with VVs after the adjustment for the remaining traits in the analyzed subset 277
(Supplementary fig. 3). In other words, their shared genetic components were demonstrated to be 278
at least partially independent of other complex traits. Additionally, we estimated partial genetic 279
correlations between VVs, standing height, and weight. Both standing height and weight had 280
independent genetic components shared with VVs (Supplementary fig. 4). 281
Hypothesis-free search for causal relationships 282
We applied a 2-sample Mendelian randomization (2SMR)58 strategy to infer causal 283
relationships between a broad range of “exposure” phenotypes and VVs as an outcome. In total, 284
39 complex traits were shown to be potential causative factors. Although only genome-wide 285
associated SNPs from exposure GWAS were selected as instrumental variables, we did not 286
require these loci to be replicated. Nevertheless, we checked the stability of our tests with regard 287
to instruments selection by performing the robustness analysis. This test along with the Steiger 288
test59 for the correct direction of effect underpinned the exclusion of 2 out of 39 traits 289
(Supplementary table 12). Further, we assessed violations of MR assumption of absence of 290
horizontal pleiotropy (influence of genetic instruments on the outcome only through the 291
exposure, also known as “exclusion-restriction criterion”) by means of sensitivity analyses58. We 292
did not observe a statistically significant intercept in MR-Egger regression for any trait. 293
However, only a small proportion of traits showed symmetry in Funnel plots and had no 294
heterogeneity in causal effects amongst instruments. This provides evidence that, for the 295
majority of traits, at least some of the selected instruments exhibit horizontal pleiotropic effects. 296
not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which wasthis version posted July 13, 2018. . https://doi.org/10.1101/368365doi: bioRxiv preprint
13
Such traits mainly involved several hundred genome-wide SNPs that made leave-one-out 297
analysis also uninformative. In order to correct for horizontal pleiotropy, we applied a 298
straightforward approach having excluded all instrumental variables associated with VVs at the 299
level of statistical significance higher than 0.01. Our correction led to symmetry in 26 out of 33 300
asymmetrical Funnel plots and eliminated heterogeneity in causal effects for 28 out of 34 traits, 301
although 6 traits lost the statistical significance of 2SMR coefficients (Supplementary fig. 5-42, 302
Supplementary table 12). Removing potential sources of heterogeneity also reduced absolute 303
values of 2SMR beta for all the corrected phenotypes. A graphical representation of our results, 304
including 25 causal inferences that we consider the most reliable, is shown in Fig. 5. Twenty one 305
traits were related to anthropometry and included standing and sitting height, weight, hip and 306
waist circumference, fat-free, predicted, and fat mass of legs and arms, etc. One trait was a 307
spirometry measurement associated with pulmonary function. Nonetheless, since it is positively 308
correlated with height, we suppose that it has no independent effect on VVs development. 309
Similarly, the reverse association between malabsorption/coeliac disease and VVs could actually 310
be induced by a weight loss as one of the complications of these conditions. Moreover, although 311
this trait has passed all the necessary tests, we avoid making strong claims about its causality 312
since it was self-reported and involved only ~1,500 cases. 313
All the above mentioned traits were derived from the Neale Lab repository and therefore 314
had about 80% sample overlap with VVs dataset obtained from the Gene ATLAS. Shared 315
participants between the “exposure” and the “outcome” GWAS can cause bias in the Mendelian 316
randomization analysis when weak instruments are selected60. In our study, this limitation was 317
partially mitigated by using only strong instruments associated with exposure traits at a high 318
level of statistical significance (P < 1.0e-08). In order to fully overcome the problem and confirm 319
our results, we searched for independent GWAS and performed a replication analysis. Data were 320
obtained for height, weight, waist circumference, and body fat percentage. Only height reached 321
Bonferroni-corrected level of statistical significance and remained significant after the correction 322
not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which wasthis version posted July 13, 2018. . https://doi.org/10.1101/368365doi: bioRxiv preprint
14
for pleiotropy (2SMR P < 1.0e-07, Supplementary table 13). However, null results for other 323
traits could be explained by a limited power of the analysis. For example, sample size for weight 324
was 4.5 times smaller than in the Neale Lab study. 325
Finally, the remaining two traits from the “reliable” set were plasma levels of MHC class I 326
polypeptide-related sequence B protein and CD209 antigen. These phenotypes were not UK 327
Biobank traits, therefore our results could not be confounded by a sample overlap. 328
Discussion 329
In the present study, we utilized freely available GWAS summary data to unravel the 330
genetic underpinnings of VVs of lower extremities. Lack of individual-level data made us be as 331
rigorous as possible to avoid false positive discoveries, so we skimmed off the most apparent 332
evidence. Nevertheless, as far as we are aware, our study is the largest and the most 333
comprehensive study of VVs genetics to date. Beyond that, the availability of GWAS results 334
makes our research fully reproducible. 335
We identified and replicated 7 susceptibility loci that explain 10% of the SNP-based 336
heritability, and prioritized the most likely causal genes (Fig.2, Tables 1-3). Overall, SNP 337
heritability on the liability scale was estimated at ~28 % assuming the disease prevalence of 20-338
30%. Almost 40% of the variance explained by our top SNPs was attributable to the 339
polymorphism in the CASZ1 gene involved in blood vessel development. This strong signal has 340
previously been revealed by “23andMe”12 and subsequently replicated in our own sample of 341
ethnic Russian individuals10. An especially interesting finding, in our opinion, is a novel 342
association of VVs with SNPs in a recently discovered PIEZO1 gene. PIEZO1 encodes a 343
pressure-activated ion channel which senses shear stress and controls vascular architecture27. 344
Mice embryos lacking functional Piezo1 exhibit defects in vascular remodeling and die at 345
midgestation61. Other prioritized genes were PPP3R1, EBF1, STIM2, and HFE. The latter gene 346
has been linked to the risk of VVs in our recent candidate-gene study19. Meanwhile, we could 347
not prioritize any gene in the locus tagged by rs3101725. On the one hand, this region contains 348
not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which wasthis version posted July 13, 2018. . https://doi.org/10.1101/368365doi: bioRxiv preprint
15
two genes (SLC12A2 and FBN2) that could play a role in varicose transformation, and on the 349
other hand, the causal polymorphism was shown to be eQTL for a nearby non-protein coding 350
RNA LINC01184 with unknown function. The revealed genes can be considered as good 351
candidates for future follow-up functional studies. It is noteworthy that SLC12A2, FBN2, STIM2, 352
and HFE were included in the druggable gene set by Finan et al.62, and the PIEZO1 gene product 353
belongs to a “potential drug target” class according to the Human Protein Atlas 354
(https://www.proteinatlas.org/). 355
Gene set enrichment analysis involving both genome-wide and less strongly associated 356
signals detected gene categories related to abnormal vascular development and morphology. This 357
observation is consistent with the hypothesis that pathological changes in the vein wall are the 358
primary event preceding VVs formation63. Furthermore, our genetic correlations analysis 359
confirmed known epidemiological associations between VVs and DVT1,16 as well as standing 360
job1,64,65, rough labour65,66, and lower levels of education64 (Fig.4, Supplementary table 11). 361
Shared familial susceptibility with venous thromboembolism has already been shown by Zöller 362
et al67. Here we demonstrate that DVT and VVs share specific genetic components which are 363
independent from other factors such as obesity or number of operations (Supplementary fig. 3). 364
Since none of the top GWAS hits were related to thrombosis, we can conclude that SNPs with 365
less prominent associations are responsible for this genetic overlap. On the contrary, albeit 366
several identified loci were also associated with blood pressure/hypertension and red blood cell 367
traits, including mean corpuscular hemoglobin concentration (Table 2, Fig. 3), we found no 368
evidence for genetic correlation between VVs and these traits. We can therefore attribute these 369
effects only to pleiotropy. Intriguingly, we observed small, but significant genetic overlap with 370
smoking (rg = 0.16). Smoking is considered only as suggestive risk factor for VVs since 371
epidemiological studies have mainly shown no association with this habit1,64,66,68. Other novel 372
interesting findings include genetic links with prospective memory and fluid intelligence 373
not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which wasthis version posted July 13, 2018. . https://doi.org/10.1101/368365doi: bioRxiv preprint
16
(negative correlation), pain (knee pain, pain all over the body, neck or shoulder pain, and leg 374
pain on walking), usual walking pace, and gonarthrosis. 375
Further, we obtained strong evidence for association between VVs and anthropometric 376
traits such as weight, height, waist and hip circumference. We not only observed genetic 377
correlations (Fig. 3, Fig. 4, Supplementary tables 10, 11), but also demonstrated these traits to be 378
causative factors for VVs development (Fig. 5, Supplementary table 12). It is important to note 379
that genetic overlap and causal relationships were inferred for both fat and fat-free mass. Thus, 380
we can speculate that increased weight is a risk factor for VVs regardless of whether it was 381
caused by excess body fat or a large mass of other tissues. According to some theories, 382
association between overweight and VVs can be explained by greater concentrations of 383
circulating estrogens or even by a confounding effect of parity1. Although this can be true, our 384
results suggest that the cause may also be an increase in mass per se. Moreover, we found that 385
height has common genetic component with VVs independent from weight and other traits 386
(Supplementary fig. 4) and is also causally related to VVs (Fig. 5). Causal inference was 387
additionally confirmed using an independent dataset (Supplementary table 13). This is in 388
agreement with the results of the Edinburgh Vein Study that has shown a significant relationship 389
between increasing height and VVs64. However, the majority of epidemiological studies to date 390
have been focused only on BMI and did not consider height or other body characteristics. 391
Notably, the formula for BMI contains the square of height in the denominator. It is possible that 392
the impact of height underlies the inconsistency in the results, when some studies show a 393
positive association between BMI and VVs, while others reveal no effect1. We therefore 394
recommend that future epidemiological studies collect and analyze data on height and weight 395
along with data on BMI. 396
Last but not least, we detected the causal effect of plasma levels of two proteins – MHC 397
class I polypeptide-related sequence B protein (MICB) and CD209 antigen. Both molecules are 398
involved in innate and adaptive immunity. MICB is a ligand for the activating receptor NKG2D 399
not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which wasthis version posted July 13, 2018. . https://doi.org/10.1101/368365doi: bioRxiv preprint
17
present on the surface of natural killer and some other immune-related cells. CD209, also known 400
as DC-SIGN, is a C-type lectin receptor expressed on dendritic cells and macrophages. In our 401
study, Mendelian randomization analysis for CD209 was performed using two genetic 402
instruments – rs505922 and rs8106657 identified by Suhre et al69. The first SNP is located in the 403
ABO gene responsible for ABO blood group determination. Allele rs505922 C tags non-O group 404
and is in linkage disequilibrium with the allele A of the neighboring SNP rs507666 (D’ = 1.00, r2 405
= 0.39), which was found to be the top variant associated with VVs in the “23andMe” GWAS12 406
This association was validated in our previous study using data from UK Biobank10. At the same 407
time, rs505922 explains nearly 40% of the variability of the plasma CD209 concentration (our 408
estimate based on the published data69) with the allele C being linked to higher levels of CD209. 409
It is tempting to speculate that association between the ABO gene polymorphisms and VVs as 410
well as between VVs and blood group A70 is mediated by this protein. We conducted 411
SMR/HEIDI analysis for the locus containing these SNPs and demonstrated that both 412
associations with VVs and CD209 were related to the same causal polymorphism (PSMR = 3.3e-413
06, PHEIDI = 0.75), that supports our hypothesis. Nevertheless, the role of CD209 as well as 414
MICB in VVs etiology needs further experimental confirmation. Besides this, it should be taken 415
into account that the level of CD209 in plasma does not necessarily correlates with its level on 416
the surface of the cells, and circulating CD209 may possess its own specific function. 417
418
not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which wasthis version posted July 13, 2018. . https://doi.org/10.1101/368365doi: bioRxiv preprint
18
Methods 419
Study sample, genotyping, and quality control 420
All cases and controls analyzed in our study were derived from UK Biobank13. 421
Sociodemographic, physical, lifestyle, and health-related characteristics of UK Biobank 422
participants have been reported elsewhere15. In brief, enrolled individuals were aged 40–69 423
years, were less likely to be obese, to smoke, to drink alcohol, and had fewer self-reported health 424
conditions compared with the general population. All study participants provided written 425
informed consent, and the study was approved by the North West Multi-centre Research Ethics 426
Committee. Genotyping was performed using the Affymetrix UK BiLEVE and the Affymetrix 427
UK Biobank Axiom arrays. Details on DNA extraction, genotyping, imputation, and quality 428
control (QC) procedures have been reported previously71. 429
The discovery stage of our GWAS was based on genetic association data provided by the 430
Neale Lab (http://www.nealelab.is/) for 337,199 QC positive individuals (6,958 patients 431
diagnosed with “I83: VVs of lower extremities” and 330,241 control individuals without this 432
diagnosis). Information on data processing is given on the Neale Lab website 433
(http://www.nealelab.is/blog/2017/9/11/details-and-considerations-of-the-uk-biobank-gwas). In 434
short, phenotypes were analyzed automatically by the PHEnome Scan ANalysis Tool. SNPs QC 435
criteria included minor allele frequency (MAF) > 0.1%, the Hardy-Weinberg equilibrium 436
P > 1.0e-10, and INFO score > 0.8. After filtration, 10,879,183 autosomal SNPs were left for the 437
analysis. Per-individual QC procedure included removing non-white British and closely related 438
individuals, patients with sex chromosome aneuploidies, and individuals who had withdrawn 439
consent from the UK Biobank study. Associations were adjusted for sex and the first 10 principal 440
components from the UK Biobank sample QC file. We also adjusted the data for body mass 441
index and deep venous thrombosis using GWAS summary statistics for these traits downloaded 442
from the same resource, and corrected the results for potential confounding using the intercept72 443
as a genomic control inflation factor λ. Details are given in Supplementary Methods. 444
not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which wasthis version posted July 13, 2018. . https://doi.org/10.1101/368365doi: bioRxiv preprint
19
A replication stage was performed by a reverse inverse-variance weighted meta-analysis 445
that yielded genetic association data for 71,256 individuals (Fig. 1). For these purpose, we 446
derived summary statistics for 408,455 UK Biobank participants (10,861 cases and 397,594 447
controls) from the Gene ATLAS database14 (http://geneatlas.roslin.ed.ac.uk/) and used these data 448
as a “minuend” dataset. A detailed description of our approach with statistical processing is 449
presented in Supplementary Methods, in the study of Deng et al.18, and in our manuscript 450
deposited on bioRxiv17. Data provided by the Gene ATLAS project were obtained for a cohort of 451
both related and unrelated individuals of white British descent. The associations were calculated 452
using Mixed Linear Models with adjustment for age, age2, sex, array batch, UK Biobank 453
Assessment Center, and the leading 20 genomic principal components as computed by UK 454
Biobank. Details of this study including QC criteria have been reported previously14. In brief, 455
they excluded individuals with missing or conflicting data, non-biallelic variants, and SNPs with 456
MAF <0.01% and/or the Hardy-Weinberg equilibrium P < 1.0e-50. The Gene ATLAS contains 457
information for over 30 million variants. In our analysis, we focused on 10,829,469 SNPs that 458
overlap the Neale Lab SNP set. 459
Conditional analysis 460
Regional association plots were generated with LocusZoom tool (http://locuszoom.org/) 461
for regions within 250 kb from the lead SNP. Only replicated loci were analyzed. Conditional 462
and joint (COJO) analysis was carried out by a summary statistics-based method described by 463
Yang et al.73 Calculations were performed using the GCTA software74. LD matrix was computed 464
with PLINK 1.9 software (https://www.cog-genomics.org/plink2) using 1000 Genomes data for 465
503 European individuals (http://www.internationalgenome.org/data/). We claimed the presence 466
of one independent signal per locus if none of the polymorphisms except the lead SNP passed 467
the significance threshold of P = 5.0e-08. 468
469
not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which wasthis version posted July 13, 2018. . https://doi.org/10.1101/368365doi: bioRxiv preprint
20
Literature-based functional annotation 470
We used regional association plots to identify genes located within associated loci. These 471
genes were further queried for potential involvement in the processes relevant to VVs 472
pathogenesis. For each gene, we scanned an Online Catalog of Human Genes and Genetic 473
Disorders (OMIM, https://www.omim.org/), the NCBI Gene 474
(https://www.ncbi.nlm.nih.gov/gene), and the Pubmed database to inquire into their biological 475
functions. Furthermore, we interrogated whether other hypothetical varicose-related traits were 476
associated with these genes according to previously published GWAS. The Pubmed search was 477
performed using the gene name as a keyword as well as combinations of the gene name and 478
“GWAS”, or “genome-wide association study”, or “varicose”, or “venous”, or “vascular”. The 479
information obtained was used for the literature-based gene prioritization. 480
Additionally, we used a PhenoScanner75 tool 481
(http://www.phenoscanner.medschl.cam.ac.uk/phenoscanner) to make a list of complex traits 482
associated at P < 5.0e-08 with lead SNPs or other polymorphisms in high LD (r2 ≥ 0.7) with lead 483
variants. 1000 Genomes-derived polymorphisms served as proxies for the analysis. 484
VEP analysis 485
For the VEP51 analysis, we used a set of SNPs within replicated regions associated with 486
VVs at P ≤ T, where log10(T) = log10(Pmin) + 1, and Pmin is a P-value for the strongest association 487
per locus (±250 kb from the independent hit). Analysis was performed using software available 488
online (https://www.ensembl.org/info/docs/tools/vep/index.html). 489
SMR/HEIDI analysis 490
SMR/HEIDI52 approach was used to test for potential pleiotropic effects of identified loci 491
on VVs and other complex traits including gene expression levels in certain tissues. SMR 492
analysis provides evidence for pleiotropy, but is unable to define whether both traits are affected 493
by the same underlying causal polymorphism. The latter is specified by a HEIDI test that 494
distinguishes pleiotropy from linkage disequilibrium. Summary statistics for gene expression 495
not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which wasthis version posted July 13, 2018. . https://doi.org/10.1101/368365doi: bioRxiv preprint
21
levels was obtained from Westra Blood eQTL76 (peripheral blood, 496
http://cnsgenomics.com/software/smr/#eQTLsummarydata) and the GTEx77 database (44 tissues, 497
https://gtexportal.org). VVs summary statistics was obtained from the Gene ATLAS database 498
since it enables the maximum statistical power due to the greatest sample size. At the same time, 499
VVs in this GWAS is completely correlated at a genetic level with VVs in our discovery dataset 500
(rg = 1.00, Supplementary table 1). Summary statistics for other complex traits was derived from 501
the GWAS-MAP database (see below). 502
Nominal P for SMR test was set at 2.63e-05 (0.05/1,898, where 1,898 is the total number 503
of probes for the gene expression analysis) and 1.89e-06 (0.05/7*3,787, where 7 is the number of 504
loci and 3,787 is the total number of non-VVs traits in GWAS-MAP). For HEIDI analysis in a 505
gene expression study, we used a conservative threshold of P = 0.01 (P < 0.01 corresponds to the 506
rejection of pleiotropy hypothesis). For HEIDI in a complex traits analysis, we implemented a 507
less conservative threshold of P = 0.001 since the number of independent test was much higher. 508
Details of data processing are given in Supplementary Methods. 509
DEPICT analysis 510
DEPICT analysis was performed using DEPICT53 software version 1.1, release 194 511
(https://data.broadinstitute.org/mpg/depict/) with default parameters. GWAS summary statistics 512
was obtained from the Gene ATLAS database. We employed DEPICT for both genome-wide 513
significant SNPs (P < 5.0e-08) as well as for loci associated with VVs at P < 1.0e-05. As in 514
previous analyzes, we defined locus as ±250 kb from the lead SNP. The major histocompatibility 515
complex (MHC) region was eliminated. Significance threshold was set at FDR < 0.05. 516
GWAS-MAP platform 517
GWAS-MAP platform integrates a database which was created to study cardiovascular 518
diseases and contains summary-level GWAS results for 123 metabolomics traits, 1,206 519
circulating proteins, 2,419 complex traits from the UK Biobank, and 8 traits related to coronary 520
artery disease, myocardial infarction and their risk factors. We additionally added 6 VVs-related 521
not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which wasthis version posted July 13, 2018. . https://doi.org/10.1101/368365doi: bioRxiv preprint
22
traits analyzed in the present study (our discovery and replication datasets; the Gene ATLAS 522
data for VVs; and the Neale Lab data for VVs, BMI, and DVT) as well as 33 traits from the 523
Gene ATLAS database that we supposed to be biologically relevant to VVs. Description of all 524
3,795 traits is provided in Supplementary table 9. GWAS-MAP platform contains embedded 525
software for LD Score regression72, 2SMR analysis (MR-Base package58), and our 526
implementation of SMR/HEIDI analysis52. Further details are given in Supplementary methods. 527
Genetic correlations 528
Genetic correlations (rg) between VVs and other complex traits were calculated using 529
LDSC software (https://github.com/bulik/ldsc/). We applied a cross-trait LD Score regression 530
technique as previously described by Bulik-Sullivan et al.78 This method requires only GWAS 531
summary statistics and is not biased by a sample overlap. We analyzed 861 heritable non-VVs 532
traits from the GWAS-MAP database. Only traits with a total sample size of ≥ 10,000 and ≥ 1 533
million SNPs tested were included in the analysis. GWAS summary statistics for VVs was 534
obtained from the Gene ATLAS database. Statistical significance threshold was set at 1.16e-05 535
(0.01/861). For 25 traits that passed this threshold with |rg| ≥ 0.2, we calculated a matrix of 536
genetic correlations. Partial genetic correlations were estimated using the inverse of the genetic 537
correlation matrix. Significance threshold was set at 3.1e-05 (0.01/325, where 325 is the number 538
of pairwise combinations). For partial genetic correlations between VVs, standing height, and 539
weight, nominal P was set at 3.3e-03 (0.01/3). Clustering and visualization was performed by the 540
“corrplot” package for the R language (basic “hclust” function). Further details are provided in 541
Supplementary Methods. 542
2SMR analysis 543
Casual relationships between 2,221 non-VVs phenotypes (“exposures”) from the GWAS-544
MAP database and VVs (“outcome”) were assessed by 2-sample Mendelian randomization 545
(2SMR) as previously described by the MR-Base collaboration58 (http://www.mrbase.org/). All 546
the details of our protocol are given in Supplementary Methods. Binary traits with the number of 547
not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which wasthis version posted July 13, 2018. . https://doi.org/10.1101/368365doi: bioRxiv preprint
23
cases or controls less than 1000 were not included in the study. GWAS summary statistics for 548
VVs was obtained from the Gene ATLAS database. Analysis was performed on the GWAS-549
MAP platform. Two 2SMR approaches were used: an inverse variance weighted meta-analysis 550
of Wald ratios (IVW) and MR-Egger regression. The nominal P was set at 1.1e-05 551
(0.05/2*2,221, where 2 is the number of approaches). For traits that passed either IVW or MR-552
Egger test, we performed the Steiger test59 for identifying the correct direction of effect, and 553
conducted the robustness analysis (our own approach). Traits that passed all the analyses were 554
then subjected to sensitivity tests58, that included assessing heterogeneity in causal effects 555
amongst instruments, horizontal pleiotropy test (based on the intercept in MR Egger regression), 556
leave-one-out analysis, and Funnel plots inspection. The nominal P for the Steiger test was set at 557
2.3e-05 (0.05/2,221), and for the robustness analysis, horizontal pleiotropy test, and 558
heterogeneity analysis – at 1.3e-03 (0.05/38, where 38 is the number of traits that passed both 559
2SMR analysis and the Steiger test). Leave-one-out and Funnel plots were examined manually. 560
Sensitivity analyses revealed the presence of horizontal pleiotropy for the majority of traits. To 561
correct for this confounder, we omitted all instrumental variables associated with VVs with 562
P < 0.01, and then repeated IVW 2SMR and sensitivity analyses. Additionally, for the resulting 563
set of traits, we searched for independent GWAS in the MR-Base database58 and conducted 564
confirmatory IVW 2SMR analysis (where appropriate) with the MR-Base default parameters. 565
The nominal P was set at 0.013 (0.05/4, where 4 is the number of traits). 566
567
568
569
not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which wasthis version posted July 13, 2018. . https://doi.org/10.1101/368365doi: bioRxiv preprint
24
Acknowledgments 570
The work of ASS was supported by the Russian Science Foundation [Project No 17-75-20223]. 571
The work of YAT was supported by the Russian Ministry of Science and Education under the 5-572
100 Excellence Programme. 573
The work of SZS was supported by the Institute of Cytology and Genetics [Project No 0324-574
2018-0017]. 575
The development of GWAS-MAP platform was supported by grants from the Russian Ministry 576
of Science and Education under the 5-100 Excellence Programme, British Council's Institutional 577
Links Programme for Novosibirsk State University and University of Edinburgh (Project 578
reference No IL4277322879), and by PolyKnomics BV. We thank Maatschap PolyOmica for 579
providing free of charge access to computational services. 580
581
We express our gratitude to the Neale Lab and the Gene ATLAS projects for providing GWAS 582
summary statistics for UK Biobank traits. We sincerely thank Yurii Aulchenko for valuable 583
discussion. We gratefully acknowledge Denis Gorev, Eugene Pakhomov, and Anna Torgasheva 584
who contributed to the development of the GWAS-MAP platform. We acknowledge Eugene 585
Phakomov who provided computational support with SMR/HEIDI analysis. We also would like 586
to thank Mariya Smetanina and Maxim Filipenko for obtaining funding. 587
588
Author contributions 589
YAT and ASS conceived the study and provided its design. YAT carried out statistical analyses. 590
ASS worked with literature sources. ASS and YAT interpreted all data and wrote the paper. TIS, 591
SZS, and YAT contributed to the design and implementation of the GWAS-MAP database. SZS 592
contributed to statistical analysis. All authors reviewed and approved the final version of the 593
manuscript. 594
not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which wasthis version posted July 13, 2018. . https://doi.org/10.1101/368365doi: bioRxiv preprint
25
Additional information 595
Competing interests 596
The authors declare no competing financial interests. 597
Materials & Correspondence 598
Correspondence and requests for materials should be addressed to Alexandra S. Shadrina (email: 599
[email protected]) or Yakov A. Tsepilov (email: [email protected]). 600
not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which wasthis version posted July 13, 2018. . https://doi.org/10.1101/368365doi: bioRxiv preprint
26
References 601
1. Beebe-Dimmer, J. L., Pfeifer, J. R., Engle, J. S. & Schottenfeld, D. The epidemiology of 602
chronic venous insufficiency and varicose veins. Ann. Epidemiol. 15, 175–184 (2005). 603
2. Segiet, O. A. et al. Biomolecular mechanisms in varicose veins development. Ann. Vasc. 604
Surg. 29, 377–384 (2015). 605
3. Raffetto, J. D. Pathophysiology of chronic venous disease and venous ulcers. Surg. Clin. 606
North Am. 98, 337–347 (2018). 607
4. Pfisterer, L., König, G., Hecker, M. & Korff, T. Pathogenesis of varicose veins - lessons 608
from biomechanics. Vasa 43, 88–99 (2014). 609
5. Lim, C. S., Kiriakidis, S., Sandison, A., Paleolog, E. M. & Davies, A. H. Hypoxia-610
inducible factor pathway and diseases of the vascular wall. J. Vasc. Surg. 58, 219–230 611
(2013). 612
6. Ng, M. Y. M., Andrew, T., Spector, T. D. & Jeffery, S. Linkage to the FOXC2 region of 613
chromosome 16 for varicose veins in otherwise healthy, unselected sibling pairs. J. Med. 614
Genet. 42, 235–239 (2005). 615
7. Krysa, J., Jones, G. T. & van Rij, A. M. Evidence for a genetic role in varicose veins and 616
chronic venous insufficiency. Phlebology 27, 329–335 (2012). 617
8. Grant, Y., Onida, S. & Davies, A. Genetics in chronic venous disease. Phlebol. J. Venous 618
Dis. 32, 3–5 (2017). 619
9. Ellinghaus, E. et al. Genome-wide association analysis for chronic venous disease 620
identifies EFEMP1 and KCNH8 as susceptibility loci. Sci. Rep. 7, 45652 (2017). 621
10. Shadrina, A. et al. Polymorphisms of genes involved in inflammation and blood vessel 622
development influence the risk of varicose veins. Clin. Genet. 94, 191–199 (2018). 623
11. Shadrina, A. et al. Genome-wide association study in ethnic Russians suggests an 624
association of the MHC class III genomic region with the risk of primary varicose veins. 625
Gene 659, 93–99 (2018). 626
12. Bell R.K., Durand E.Y., McLean C.Y., Eriksson N., Tung J.Y., Hinds D. A large scale 627
genome wide association study of varicose veins in the 23andMe cohort. In: The 64th 628
Annual Meeting of The American Society of Human Genetics, San Diego, California, 629
USA, 18-22 October 2014, paper no. 2082M, p.487. San Diego: ASHG. 630
https://blog.23andme.com/wp-content/uploads/2014/10/Bell_ASHG2014_varicose.pdf 631
Accessed July 12, 2018. 632
13. Sudlow, C. et al. UK Biobank: an open access resource for identifying the causes of a 633
wide range of complex diseases of middle and old age. PLOS Med. 12, e1001779 (2015). 634
14. Canela-Xandri, O., Rawlik, K. & Tenesa, A. An atlas of genetic associations in UK 635
Biobank. Preprint at bioRxiv 176834 (2017). doi:10.1101/176834 636
15. Fry, A. et al. Comparison of sociodemographic and health-related characteristics of UK 637
Biobank participants with those of the general population. Am. J. Epidemiol. 186, 1026–638
1034 (2017). 639
16. Müller-Bühl, U. et al. Varicose veins are a risk factor for deep venous thrombosis in 640
general practice patients. Vasa 41, 360–365 (2012). 641
17. Tsepilov, Y. A. et al. Network based conditional genome wide association analysis of 642
human metabolomics. Preprint at bioRxiv 096982 (2016). doi:10.1101/096982 643
18. Deng, Y. & Pan, W. Conditional analysis of multiple quantitative traits based on marginal 644
GWAS summary statistics. Genet. Epidemiol. 41, 427–436 (2017). 645
19. Sokolova, E. A. et al. HFE p.C282Y gene variant is associated with varicose veins in 646
Russian population. Clin. Exp. Med. 16, 463–470 (2016). 647
20. Zamboni, P. et al. The overlapping of local iron overload and HFE mutation in venous leg 648
ulcer pathogenesis. Free Radic. Biol. Med. 40, 1869–1873 (2006). 649
21. Charpentier, M. S. et al. CASZ1 promotes vascular assembly and morphogenesis through 650
the direct regulation of an EGFL7/RhoA-mediated pathway. Dev. Cell 25, 132–143 651
not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which wasthis version posted July 13, 2018. . https://doi.org/10.1101/368365doi: bioRxiv preprint
27
(2013). 652
22. Charpentier, M. S., Dorr, K. M. & Conlon, F. L. Transcriptional regulation of blood vessel 653
formation. Cell Cycle 12, 2165–2166 (2013). 654
23. Levy, D. et al. Genome-wide association study of blood pressure and hypertension. Nat. 655
Genet. 41, 677–687 (2009). 656
24. Kato, N. et al. Meta-analysis of genome-wide association studies identifies common 657
variants associated with blood pressure variation in east Asians. Nat. Genet. 43, 531–538 658
(2011). 659
25. Lu, X. et al. Genome-wide association study in Chinese identifies novel loci for blood 660
pressure and hypertension. Hum. Mol. Genet. 24, 865–874 (2015). 661
26. Takeuchi, F. et al. Blood pressure and hypertension are associated with 7 loci in the 662
Japanese population. Circulation 121, 2302–2309 (2010). 663
27. Li, J. et al. Piezo1 integration of vascular architecture with physiological force. Nature 664
515, 279–282 (2014). 665
28. Gudipaty, S. A. et al. Mechanical stretch triggers rapid epithelial cell division through 666
Piezo1. Nature 543, 118–121 (2017). 667
29. Retailleau, K. et al. Piezo1 in smooth muscle cells is involved in hypertension-dependent 668
arterial remodeling. Cell Rep. 13, 1161–1171 (2015). 669
30. Rode, B. et al. Piezo1 channels sense whole body physical activity to reset cardiovascular 670
homeostasis and enhance performance. Nat. Commun. 8, 350 (2017). 671
31. Wang, S. et al. Endothelial cation channel PIEZO1 controls blood pressure by mediating 672
flow-induced ATP release. J. Clin. Invest. 126, 4527–4536 (2016). 673
32. Ilkan, Z. et al. Evidence for shear-mediated Ca2+entry through mechanosensitive cation 674
channels in human platelets and a megakaryocytic cell line. J. Biol. Chem. 292, 9204–675
9217 (2017). 676
33. Faucherre, A., Kissa, K., Nargeot, J., Mangoni, M. E. & Jopling, C. Piezo1 plays a role in 677
erythrocyte volume homeostasis. Haematologica 99, 70–5 (2014). 678
34. Cahalan, S. M. et al. Piezo1 links mechanical forces to red blood cell volume. Elife 4, 679
e0737 (2015). 680
35. van der Harst, P. et al. Seventy-five genetic loci influencing the human red blood cell. 681
Nature 492, 369–375 (2012). 682
36. Hodonsky, C. J. et al. Genome-wide association study of red blood cell traits in 683
Hispanics/Latinos: The Hispanic Community Health Study/Study of Latinos. PLOS 684
Genet. 13, e1006760 (2017). 685
37. Wu, W. et al. Calcineurin B stimulates cytokine production through a CD14-independent 686
Toll-like receptor 4 pathway. Immunol. Cell Biol. 94, 285–292 (2016). 687
38. Satonaka, H. et al. Calcineurin promotes the expression of monocyte chemoattractant 688
protein-1 in vascular myocytes and mediates vascular inflammation. Circ. Res. 94, 693-689
700 (2004). 690
39. Orlov, S. N., Tremblay, J. & Hamet, P. NKCC1 and hypertension: a novel therapeutic 691
target involved in the regulation of vascular tone and renal function. Curr. Opin. Nephrol. 692
Hypertens. 19, 163–168 (2010). 693
40. Olivieri, J., Smaldone, S. & Ramirez, F. Fibrillin assemblies: extracellular determinants of 694
tissue formation and fibrosis. Fibrogenesis Tissue Repair 3, 24 (2010). 695
41. Carta, L. et al. Fibrillins 1 and 2 perform partially overlapping functions during aortic 696
development. J. Biol. Chem. 281, 8016–8023 (2006). 697
42. Chami, N. et al. Exome genotyping identifies pleiotropic variants associated with red 698
blood cell traits. Am. J. Hum. Genet. 99, 8–21 (2016). 699
43. Gisler, R., Jacobsen, E. W. & Sigvardsson, M. Cloning of human early B-cell factor and 700
identification of target genes suggest a conserved role in B-cell development in man and 701
mouse. Blood. 96, 1457-1464 (2000). 702
44. Griffin, M. J. et al. Early B-cell factor-1 (EBF1) is a key regulator of metabolic and 703
not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which wasthis version posted July 13, 2018. . https://doi.org/10.1101/368365doi: bioRxiv preprint
28
inflammatory signaling pathways in mature adipocytes. J. Biol. Chem. 288, 35925–35939 704
(2013). 705
45. Ehret, G. B. et al. Genetic variants in novel pathways influence blood pressure and 706
cardiovascular disease risk. Nature 478, 103–109 (2011). 707
46. Singh, A. et al. Gene by stress genome-wide interaction analysis and path analysis 708
identify EBF1 as a cardiovascular and metabolic risk gene. Eur. J. Hum. Genet. 23, 854–709
862 (2015). 710
47. Brandman, O., Liou, J., Park, W. S. & Meyer, T. STIM2 is a feedback regulator that 711
stabilizes basal cytosolic and endoplasmic reticulum Ca2+ levels. Cell 131, 1327–1339 712
(2007). 713
48. Fernandez, R. A. et al. Upregulated expression of STIM2, TRPC6, and Orai2 contributes 714
to the transition of pulmonary arterial smooth muscle cells from a contractile to 715
proliferative phenotype. Am. J. Physiol. Physiol. 308, C581–C593 (2015). 716
49. van der Harst, P. et al. Seventy-five genetic loci influencing the human red blood cell. 717
Nature 492, 369–75 (2012). 718
50. Wood, A. R. et al. Defining the role of common variation in the genomic and biological 719
architecture of adult human height. Nat. Genet. 46, 1173–1186 (2014). 720
51. McLaren, W. et al. The Ensembl Variant Effect Predictor. Genome Biol. 17, 122 (2016). 721
52. Zhu, Z. et al. Integration of summary data from GWAS and eQTL studies predicts 722
complex trait gene targets. Nat. Genet. 48, 481–487 (2016). 723
53. Pers, T. H. et al. Biological interpretation of genome-wide association studies using 724
predicted gene functions. Nat. Commun. 6, 5890 (2015). 725
54. Fish, J. E. & Wythe, J. D. The molecular regulation of arteriovenous specification and 726
maintenance. Dev. Dyn. 244, 391–409 (2015). 727
55. del Rio Solá L. et al. Varicose veins show enhanced chemokine expression. Eur. J. Vasc. 728
Endovasc. Surg. 38, 635–641 (2009). 729
56. Lattimer, C. R., Kalodiki, E., Geroulakos, G., Hoppensteadt, D. & Fareed, J. Are 730
inflammatory biomarkers increased in varicose vein blood? Clin. Appl. Thromb. Hemost. 731
22, 656–664 (2016). 732
57. Smetanina, M. A. et al. DNA methylation and gene expression profiling reveal MFAP5 as 733
a regulatory driver of extracellular matrix remodeling in varicose vein disease. 734
Epigenomics (2018). doi:10.2217/epi-2018-0001 [Epub ahead of print]. 735
58. Hemani, G. et al. The MR-Base platform supports systematic causal inference across the 736
human phenome. Elife 7, e34408 (2018). 737
59. Hemani, G., Tilling, K. & Davey Smith, G. Orienting the causal relationship between 738
imprecisely measured traits using GWAS summary data. PLOS Genet. 13, e1007081 739
(2017). 740
60. Burgess, S., Davies, N. M. & Thompson, S. G. Bias due to participant overlap in two-741
sample Mendelian randomization. Genet. Epidemiol. 40, 597–608 (2016). 742
61. Ranade, S. S. et al. Piezo1, a mechanically activated ion channel, is required for vascular 743
development in mice. Proc. Natl. Acad. Sci. 111, 10347–10352 (2014). 744
62. Finan, C. et al. The druggable genome and support for target identification and validation 745
in drug development. Sci. Transl. Med. 9, eaag1166 (2017). 746
63. Lim, C. S. & Davies, A. H. Pathogenesis of primary varicose veins. Br. J. Surg. 96, 1231–747
1242 (2009). 748
64. Lee, A. J., Evans, C. J., Allan, P. L., Ruckley, C. V. & Fowkes, F. G. R. Lifestyle factors 749
and the risk of varicose veins: Edinburgh Vein Study. J. Clin. Epidemiol. 56, 171–179 750
(2003). 751
65. Bihari, I., Tornoci, L. & Bihari, P. Epidemiological study on varicose veins in Budapest. 752
Phlebol. J. Venous Dis. 27, 77–81 (2012). 753
66. Zolotukhin, I. et al. Prevalence and risk factors for chronic venous disease in Russian 754
general population. Eur. J. Vasc. Endovasc. Surg. 54, 752-758 (2017). 755
not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which wasthis version posted July 13, 2018. . https://doi.org/10.1101/368365doi: bioRxiv preprint
29
67. Zöller, B., Ji, J., Sundquist, J. & Sundquist, K. Venous thromboembolism and varicose 756
veins share familial susceptibility: a nationwide family study in Sweden. J. Am. Heart 757
Assoc. 3, e000850 (2014). 758
68. Brand, F. N., Dannenberg, A. L., Abbott, R. D. & Kannel, W. B. The epidemiology of 759
varicose veins: the Framingham Study. Am. J. Prev. Med. 4, 96–101 (1988). 760
69. Suhre, K. et al. Connecting genetic risk to disease end points through the human blood 761
plasma proteome. Nat. Commun. 8, 14357 (2017). 762
70. Cornu-Thenard, A., Dab, W., de Vicenzi, I. & Valty, J. Relationship between blood 763
groups (ABO) and varicose veins of the lower limbs. A case-control study. Phlebol. J. 764
Venous Dis. 4, 37–40 (1989). 765
71. Bycroft, C. et al. Genome-wide genetic data on ~500,000 UK Biobank participants. 766
Preprint at bioRxiv 166298 (2017). doi:10.1101/166298 767
72. Bulik-Sullivan, B. K. et al. LD Score regression distinguishes confounding from 768
polygenicity in genome-wide association studies. Nat. Genet. 47, 291–295 (2015). 769
73. Yang, J. et al. Conditional and joint multiple-SNP analysis of GWAS summary statistics 770
identifies additional variants influencing complex traits. Nat. Genet. 44, 369–375 (2012). 771
74. Yang, J., Lee, S. H., Goddard, M. E. & Visscher, P. M. GCTA: A Tool for Genome-wide 772
Complex Trait Analysis. Am. J. Hum. Genet. 88, 76–82 (2011). 773
75. Staley, J. R. et al. PhenoScanner: a database of human genotype–phenotype associations. 774
Bioinformatics 32, 3207–3209 (2016). 775
76. Westra, H.-J. et al. Systematic identification of trans eQTLs as putative drivers of known 776
disease associations. Nat. Genet. 45, 1238–1243 (2013). 777
77. GTEx Consortium. The Genotype-Tissue Expression (GTEx) project. Nat. Genet. 45, 778
580–585 (2013). 779
78. Bulik-Sullivan, B. et al. An atlas of genetic correlations across human diseases and traits. 780
Nat. Genet. 47, 1236–1241 (2015). 781
782
not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which wasthis version posted July 13, 2018. . https://doi.org/10.1101/368365doi: bioRxiv preprint
30
Figures 783
784
785
Figure 1. Scheme depicting the overall workflow of our study. 786
In the discovery stage, we used GWAS summary statistics for VVs provided by the Neale Lab 787
and adjusted it for two potential confounders – body mass index (BMI) and deep venous 788
thrombosis (DVT). GWAS data for these traits was also obtained from the Neale Lab database. 789
In order to validate the revealed association signals, we generated a replication cohort by means 790
of reverse meta-analysis using data from by the Gene ATLAS (larger sample size) and the Neale 791
Lab (smaller sample size) databases. VVs traits in all datasets including our simulated replication 792
sample were highly genetically correlated with each other (Supplementary table 1). For 7 793
replicated loci, we performed a functional annotation analysis. Other analyses were conducted 794
using the Gene ATLAS data since it enables to achieve the highest statistical power and is 795
genetically equivalent to our discovery dataset (rg = 1.00). 796
797
not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which wasthis version posted July 13, 2018. . https://doi.org/10.1101/368365doi: bioRxiv preprint
31
798 Figure 2. Graphical summary of the discovery GWAS stage after the adjustment for deep 799
venous thrombosis and body mass index and the correction for genomic control. Red line 800
corresponds to the genome-wide significance threshold of P = 5.0e-08. Only associations with 801
P < 1.0e-02 are presented. Points with -log10(P) >15 are depicted as points with -log10(P) =15. 802
803
not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which wasthis version posted July 13, 2018. . https://doi.org/10.1101/368365doi: bioRxiv preprint
32
804 Figure 3. Pleiotropic effects of variants within revealed loci on other complex traits. 805
Color depicts the sign and the magnitude of SMR beta coefficient. Negative sign (red) means 806
opposed effects, and positive sign (blue) means the same direction of effect. Traits that passed 807
both SMR and HEIDI test are marked with an asterisk. Prioritized genes in loci are shown in 808
brackets. Full names of traits as indicated in the Neale lab and the Gene ATLAS databases are 809
given in Supplementary table 14. 810
RBC, red blood cell. 811
812
not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which wasthis version posted July 13, 2018. . https://doi.org/10.1101/368365doi: bioRxiv preprint
33
813 Figure 4. Matrix of genetic correlations between VVs and other complex traits. 814
Color depicts the sign and the absolute value of genetic correlation coefficient (rg). Only the 815
traits having |rg| ≥ 2 with VVs are shown. Combinations with statistically significant rg are 816
marked with points (P < 3.1e-05). O levels of qualification correspond to the General Certificate 817
of Education Ordinary Level/the General Certificate of Secondary Education; A or AS level 818
correspond to Advanced or Advanced Subsidiary levels. “Qualifications: NN” means no 819
qualification. Full names of traits as indicated in the Neale lab and the Gene ATLAS databases 820
are given in Supplementary table 15. For “Prospective memory result” and “Health satisfaction”, 821
high scores correspond to poor performance. 822
DVT, deep venous thrombosis; VTE, venous thromboembolism. 823
824
not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which wasthis version posted July 13, 2018. . https://doi.org/10.1101/368365doi: bioRxiv preprint
34
825 Figure 5. Volcano plot depicting the results of hypothesis-free 2SMR analysis (IVW approach). 826
The x-axis shows 2SMR beta, and the y-axis denotes the level of statistical significance on a 827
logarithmic scale. Grey color indicates the traits that did not pass either 2SMR, or the Steiger 828
test, or additional robustness analysis. Small colored circles depict the traits that passed these 829
test, but either exhibited heterogeneity in causal effects amongst instruments after removing 830
potential outlier SNPs, or did not passed 2SMR after this procedure. Large colored circles 831
represent the traits that both passed 2SMR and showed no heterogeneity after the outliers 832
removal (“the most reliable traits”). 2SMR coefficients are shown as calculated before the 833
correction for pleiotropy. 834
“Pulmonary function” traits include different estimates of forced vital capacity and forced 835
expiratory volume in 1-second. Further details are provided in Supplementary table 12. 836
MICB, MHC class I polypeptide-related sequence B. 837
838
not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which wasthis version posted July 13, 2018. . https://doi.org/10.1101/368365doi: bioRxiv preprint
35
Supplementary tables 839
Supplementary table 1. Genetic correlations between VVs in different datasets included in the 840
analysis. 841
Supplementary table 2. Top loci associated with VVs of lower extremities. 842
Supplementary table 3. Results of conditional and joint analysis for replicated loci. 843
Supplementary table 4. Complex traits associated with the revealed hits (found in a 844
PhenoScanner catalogue). 845
Supplementary table 5. Results of the VEP analysis. 846
Supplementary table 6. Results of SMR/HEIDI analysis. Searching for pleiotropic effects on 847
VVs and gene expression. 848
Supplementary table 7. Results of DEPICT analysis for SNPs with P < 5e-08. 849
Supplementary table 8. Results of DEPICT analysis for SNPs with P < 1e-05. 850
Supplementary table 9. Description of traits included in the GWAS-MAP database. 851
Supplementary table 10. Results of SMR/HEIDI analysis. Searching for pleiotropic effects on 852
VVs and other complex traits. Associations that passed both SMR and HEIDI analyses. 853
Supplementary table 11. Genetic correlations between VVs and other complex traits. 854
Supplementary table 12. Results of hypothesis-free 2SMR analysis. 855
Supplementary table 13. Results of 2SMR analysis performed on independent datasets. 856
Supplementary table 14. Trait abbreviations, full names of traits, and their sources 857
(SMR/HEIDI). 858
Supplementary table 15. Trait abbreviations, full names of traits, and their sources (genetic 859
correlations). 860
861
not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which wasthis version posted July 13, 2018. . https://doi.org/10.1101/368365doi: bioRxiv preprint
36
Supplementary figure captions 862
863 Supplementary figure 1. Regional association plot of –log10 (P) for SNPs located at the 864
distance of ≤ 250 kb from the index SNP. Color of circles indicates the strength of linkage 865
disequilibrium with the lead SNP based on the squared correlation coefficient (r2). Blue line 866
indicates recombination rate (cM/Mb). Genes are indicated as blue bars under the plot. 867
868
Supplementary figure 2. A quantile-quantile plot for observed vs. expected distribution of P-869
values for χ2 statistics. 870
871
Supplementary figure 3. Matrix of partial genetic correlations between VVs and a subset of 872
non-collinear complex traits. Color depicts the sign and the absolute value of genetic correlation 873
coefficient (rg). Combinations with statistically significant correlations are marked with points. 874
Full names of traits as indicated in the Neale lab and the Gene ATLAS databases are given in 875
Supplementary table 15. 876
877
Supplementary figure 4. Matrix of partial genetic correlations between VVs, standing height, 878
and weight. Color depicts the sign and the absolute value of genetic correlation coefficient (rg). 879
Combinations with statistically significant correlations are marked with points. 880
881
Supplementary figures 5-42. Leave-one-out and Funnel plots before and after removal of 882
instrumental variables associated with VVs with P < 0.01. 883
884
not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which wasthis version posted July 13, 2018. . https://doi.org/10.1101/368365doi: bioRxiv preprint
37
Table 1. Top SNPs associated with varicose veins of lower extremities. Replicated associations are shown in bold. SNP Chr: position* EffA/RefA† EAF Discovery stage Replication stage,
N = 71,256
h2 (%)§ Without adjustment for BMI and DVT,
N = 337,199
Adjusted for BMI and DVT,
N = 336,107
β SE P β SE P P‡(GC) Z P
rs11121615 1: 10825577 T/C 0.69 -0.0056 0.0004 1.1e-49 -0.0056 0.0004 7.6e-50 6.0e-48 -8.55 1.2e-17 3.7
rs2911463 16: 88835545 A/G 0.69 -0.0041 0.0004 1.0e-27 -0.0041 0.0004 9.1e-28 9.7e-27 -4.05 5.1e-05 1.6
rs2861819 2: 68489221 C/G 0.66 0.0030 0.0004 8.5e-17 0.0030 0.0004 7.4e-17 3.0e-16 5.34 9.5e-08 1.2
rs3101725 5: 127524018 C/T 0.76 -0.0030 0.0004 1.5e-13 -0.0030 0.0004 2.9e-13 8.4e-13 -4.02 5.7e-05 0.9
rs11135046 5: 158230013 T/G 0.54 -0.0025 0.0003 5.9e-13 -0.0025 0.0003 3.7e-13 1.1e-12 -4.13 3.6e-05 0.9
rs28558138 4: 26818080 C/G 0.42 -0.0024 0.0004 1.8e-11 -0.0024 0.0004 1.3e-11 3.2e-11 -6.39 1.7e-10 1.1
rs7773004 6: 26267755 G/A 0.49 -0.0023 0.0003 5.2e-11 -0.0023 0.0003 6.4e-11 1.5e-10 -3.05 0.002 0.6
rs7856039 9: 118385117 C/T 0.67 0.0024 0.0004 5.7e-11 0.0024 0.0004 8.1e-11 1.9e-10 0.10 0.92 ̶
rs9880192 3: 128297569 C/G 0.41 -0.0023 0.0004 8.0e-11 -0.0023 0.0004 8.7e-11 2.0e-10 -2.03 0.04 ̶
rs192647746 4: 182760366 G/A 0.001 0.0336 0.0054 4.5e-10 0.0333 0.0052 1.2e-10 2.9e-10 -1.01 0.31 ̶
rs12625547 20: 50154647 G/T 0.17 -0.0029 0.0005 1.2e-10 -0.0029 0.0005 1.3e-10 2.9e-10 -2.18 0.03 ̶
rs236530 17: 68217471 T/C 0.75 -0.0026 0.0004 1.8e-10 -0.0026 0.0004 1.4e-10 3.2e-10 -2.84 0.005 ̶
rs2241173 17: 70028445 G/A 0.58 -0.0022 0.0004 4.5e-10 -0.0022 0.0004 4.4e-10 9.7e-10 -1.84 0.07 ̶
rs73107980 12: 48187601 T/C 0.24 0.0025 0.0004 1.7e-09 0.0025 0.0004 1.4e-09 2.8e-09 1.65 0.10 ̶
rs10084824 4: 89515141 A/C 0.01 0.0100 0.0018 4.8e-08 0.0101 0.0018 1.2e-08 2.3e-08 -1.75 0.08 ̶
rs6062619 20: 62683002 G/A 0.27 -0.0022 0.0004 2.0e-08 -0.0022 0.0004 1.4e-08 2.6e-08 -2.80 0.005 ̶
rs1936800 6: 127436064 T/C 0.53 -0.0019 0.0003 2.0e-08 -0.0020 0.0003 1.9e-08 3.6e-08 -0.31 0.76 ̶
rs11693897 2: 45933993 T/C 0.004 0.0148 0.0029 3.3e-07 0.0149 0.0027 3.3e-08 6.2e-08 -0.94 0.34 ̶
rs247749 5: 76296735 C/T 0.31 0.0021 0.0004 5.5e-07 0.0021 0.0004 3.9e-08 7.3e-08 1.78 0.07 ̶
rs1061539 6: 29937555 C/T 0.83 -0.0025 0.0005 4.0e-08 -0.0025 0.0005 4.8e-08 8.9e-08 -2.69 0.007 ̶
BMI, body mass index; DVT, deep venous thrombosis; EAF, effect allele frequency; SE, standard error; SNP, single nucleotide polymorphism *Chromosome: position on chromosome according to GRCh37.p13 assembly †Effect allele/reference allele ‡P-value corrected for the genomic inflation factor §Proportion of SNP-based heritability explained by the variant
not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which wasthis version posted July 13, 2018. . https://doi.org/10.1101/368365doi: bioRxiv preprint
38
Table 2. Gene prioritization based on a literature review.
Lead SNP Candidate
gene
OMIM*
code
Location
Functional effects of the encoded proteins Traits† associated with SNPs in the revealed
loci according to previous GWAS (lead SNP,
D’ and r2 for LD‡ with our hit)
rs11121615
CASZ1 609895 Intronic Castor zinc finger 1 transcription factor
Development of heart and blood vessels: blood vessel assembly,
lumen formation, and sprouting21,22
Blood pressure and arterial hypertension23-26
(rs880315 in CASZ1, D’ = 0.12, r2 = 0.01)
VVs of lower extremities12 (rs11121615 in
CASZ1, the same SNP)
rs2911463§ PIEZO1 611184 Intronic Piezo type mechanosensitive ion channel component 1
Converts mechanical forces into biological signals: senses shear
stress in vascular endothelium, smooth muscle cells, red blood
cells, and other cells, and mediates stretch-activated currents
Determinant of vascular architecture in both development and
adult physiology27
Controls epithelial cell number: triggers cell division in response
to mechanical stretch and induces cell extrusion in response to
crowding 28
Mediates hypertension-dependent arterial remodeling through
smooth muscle cell-related mechanisms29
Regulation of blood pressure, vascular tone, NO formation, and
blood flow redistribution30,31
Regulation of platelet function: contributes to the shear-dependent
thrombus formation under arterial flow (suggestive evidence)32
Red blood cell volume regulation33,34
Mean corpuscular hemoglobin
concentration35,36
(rs10445033 in PIEZO1, D’ = 0.88,
r2 = 0.55 with rs2911463, and D’ = 0.84,
r2 = 0.12 with rs9972645;
rs9932423 in PIEZO1, D’ = 0.93,
r2 = 0.86 with rs2911463 and D’ = 0.85,
r2 = 0.17 with rs9972645;
rs551118 5’near PIEZO1, D’ = 0.96,
r2 = 0.557 with rs2911463 and D’ = 0.83,
r2 = 0.10 with rs9972645)
rs2861819 PPP3R1 601302 6 kb from
the 5’ end
Protein phosphatase 3 regulatory subunit B, alpha
(calcineurin subunit B, type 1)
Pro-inflammatory effect: induction of cytokine and chemokine
VVs of lower extremities12
(rs6712038 5’near PPP3R1, D’ = 1.00,
r2 = 0.98)
not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which wasthis version posted July 13, 2018. . https://doi.org/10.1101/368365doi: bioRxiv preprint
39
LD, linkage disequilibrium; UTR, untranslated region *Online Mendelian Inheritance in Man® database (https://www.omim.org/)
(MCP-1) production37,38
rs3101725
SLC12A2 600840 3' UTR Solute carrier family 12 member 2
Membrane ion-transport protein involved in cell-volume
regulation
Regulation of vascular tone39
Regulation of blood pressure through vascular and renal effects39
Red blood cell distribution width36,42
(rs17764730 5’near SLC12A2, D’ = 0.95,
r2 = 0.84;
rs10063647 5’near SLC12A2,
D’ = 0.93, r2 = 0.30;
rs10089 in SLC12A2, D’ = 1.00, r2 = 0.06)
FBN2 612570 70 kb from
the 3’ end Fibrillin 2
Extracellular matrix protein, component of connective tissue
microfibrils involved in elastic fiber assembly40
Is implicated into the assembly of the aortic matrix41
rs11135046 EBF1 164343 Intronic Early B cell factor 1
Transcription factor
Regulation of development and differentiation of B lymphocytes43
Regulator of metabolic and inflammatory signaling pathways in
mature adipocytes44
Blood pressure and hypertension14,45
(rs11135046 in EBF1, the same SNP, and
rs11953630 3’near EBF1, D’ = 0.13,
r2 = 0.01)
Cardiometabolic traits46
(rs4704963 in EBF1, D’ = 1.00, r2 = 0.09)
rs28558138 STIM2 610841 44 kb from
the 5’ end Stromal interaction molecule 2
Regulation of Ca2+ concentration in cytosol and endoplasmic
reticulum47
Potential role in vascular remodeling due to smooth muscle cell
phenotype alteration48
N/A
rs7773004 HFE 613609 171 kb
from the 5’
end
Homeostatic iron regulator
Regulation of iron absorption
Mean corpuscular hemoglobin concentration
and mean corpuscular volume49
(rs7773004 3’near HFE,, the same SNP)
Height50
(rs7773004 3’near HFE, the same SNP)
not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which wasthis version posted July 13, 2018. . https://doi.org/10.1101/368365doi: bioRxiv preprint
40
†The full list of traits is given in Supplementary table 2 ‡LD were determined for European populations using LDlink online tool (https://analysistools.nci.nih.gov/LDlink/) §The second independent signal in this region (tagged by rs9972645) was also located in the PIEZO1 gene
not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which wasthis version posted July 13, 2018. . https://doi.org/10.1101/368365doi: bioRxiv preprint
41
Table 3. Summary of gene prioritization.
Lead SNP Locus* Number of
genes per
locus†
Prioritized
gene
Nearest
gene, yes/no
Evidence for
prioritization
rs11121615 1: 11075577–10575577 4 CASZ1 yes L, D
rs2911463 16: 89085545–88585545‡ 20 PIEZO1 yes L, D
rs2861819 2: 68739221–68239221 8 PPP3R1 yes L, E, S
PLEK no S
WDR92 no S
PNO1 no V, S
rs3101725 5: 127774018–127274018 3 SLC12A2 yes L
FBN2 no L
LINC01184 no S
rs11135046 5: 158480013–157980013 1 EBF1 yes L
rs28558138 4: 27068080–26568080 2 STIM2 no L, D
rs7773004 6: 26517755–26017755 41 HFE no L
D, DEPICT analysis; L, literature-based prioritization; E, expression change in VVs; S, SMR/HEIDI analysis; V, Variant Effect Predictor analysis
*Chromosome: position on chromosome according to GRCh37.p13 assembly †Calculated based on regional association plots generated with LocusZoom tool (http://locuszoom.org/, Supplementary fig.1) ‡This locus contains two independent association signals. SNP rs9972645 that tags the second signal is also located in the PIEZO1 gene
not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which wasthis version posted July 13, 2018. . https://doi.org/10.1101/368365doi: bioRxiv preprint