The American Journal of Human Genetics, Volume 107 Supplemental Data A Fast and Accurate Method for Genome-Wide Time-to-Event Data Analysis and Its Application to UK Biobank Wenjian Bi, Lars G. Fritsche, Bhramar Mukherjee, Sehee Kim, and Seunggeun Lee

A Fast and Accurate Method for Genome-Wide Time-to-Event … · 2020. 6. 25. · The American Journal of Human Genetics, Volume 107 Supplemental Data A Fast and Accurate Method for


The American Journal of Human Genetics, Volume 107

Supplemental Data

A Fast and Accurate Method for Genome-Wide

Time-to-Event Data Analysis and Its Application

to UK Biobank

Wenjian Bi, Lars G. Fritsche, Bhramar Mukherjee, Sehee Kim, and Seunggeun Lee


Figure S1. Comparisons between Score test and SPACox-NoSPA based on $\widehat{\mathrm{Var}}_{\mathrm{emp}}(S)$.

(A) Comparison between $\widehat{\mathrm{Var}}(S)$ estimated from the observed information matrix and the empirical variance $\widehat{\mathrm{Var}}_{\mathrm{emp}}(S)$; (B) comparison of p values between Score test and SPACox-NoSPA; (C) QQ plot of -log10(p values) of Score test and SPACox-NoSPA. We simulated $2 \times 10^5$ replications under three event rates (ERs) of 1%, 10% and 50%. The sample size was 4,000 and we considered common variants (MAF = 0.3, expected MAC = 2,400) and low-frequency variants (MAF = 0.01, expected MAC = 80). MAF: minor allele frequency; MAC: minor allele count. Score test and SPACox-NoSPA use $\widehat{\mathrm{Var}}(S)$ and $\widehat{\mathrm{Var}}_{\mathrm{emp}}(S)$, respectively, to standardize the score statistics and calculate p values.
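The two quantities quoted in this caption can be checked with a short sketch. It assumes diploid genotypes, so that the expected MAC is 2 × n × MAF, and it assumes a two-sided p value from a standard normal approximation to the standardized score statistic. The function names `expected_mac` and `normal_p_value` are illustrative and are not taken from the SPACox software; how the variance itself is estimated is outside the scope of this sketch.

```python
import math


def expected_mac(n_samples: int, maf: float) -> float:
    """Expected minor allele count for n diploid subjects: 2 * n * MAF."""
    return 2 * n_samples * maf


def normal_p_value(score: float, variance: float) -> float:
    """Two-sided p value for a score statistic S with variance Var(S).

    Assumption: S / sqrt(Var(S)) is approximately standard normal.
    erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)).
    """
    z = score / math.sqrt(variance)
    return math.erfc(abs(z) / math.sqrt(2.0))


if __name__ == "__main__":
    # Settings quoted in the caption: n = 4,000 with MAF 0.3 and 0.01.
    print(expected_mac(4000, 0.3))
    print(expected_mac(4000, 0.01))
    # Hypothetical score statistic and variance, for illustration only.
    print(normal_p_value(score=3.0, variance=2.0))
```

The same score statistic gives different p values under the two tests only through the variance passed as the second argument.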


Figure S2. Comparisons between Score test and SPACox-NoSPA based on $\widehat{\mathrm{Var}}_{\mathrm{emp}}(S)$|�̇�.

(A) Comparison between $\widehat{\mathrm{Var}}(S)$ estimated from the observed information matrix and the empirical variance $\widehat{\mathrm{Var}}_{\mathrm{emp}}(S)$|�̇�; (B) comparison of p values between Score test and SPACox-NoSPA with variance $\widehat{\mathrm{Var}}_{\mathrm{emp}}(S)$|�̇�; (C) QQ plot of -log10(p values) of Score test and SPACox-NoSPA. We simulated $2 \times 10^5$ replications under three event rates (ERs) of 1%, 10% and 50%. The sample size was 4,000 and we considered common variants (MAF = 0.3, expected MAC = 2,400) and low-frequency variants (MAF = 0.01, expected MAC = 80). MAF: minor allele frequency; MAC: minor allele count. Score test and SPACox-NoSPA use $\widehat{\mathrm{Var}}(S)$ and $\widehat{\mathrm{Var}}_{\mathrm{emp}}(S)$|�̇�, respectively, to standardize the score statistics and calculate p values.


Figure S3. Empirical Type I Error Rates of Wald Test Based on Signs of $\hat{\gamma}$.

From left to right, the plots considered 5 event rates (ERs) of 0.2%, 1%, 10%, 20%, and 50%. Top and bottom plots are for empirical type I error rates at $\alpha = 5 \times 10^{-5}$ and $5 \times 10^{-8}$, respectively. Sample size $n = 100{,}000$. For each pair of MAF and event rate, we simulated $10^9$ replications.


Figure S4. Empirical Powers of SPACox, Firth, Wald, Score, and SPACC Tests when $\gamma$ is Negative.

From left to right, the plots considered 3 MAFs of 0.01, 0.05, and 0.3. From top to bottom, the plots considered 5 event rates (ERs) of 0.2%, 1%, 10%, 20%, and 50%. Empirical powers were evaluated at the significance level $5 \times 10^{-8}$. Sample size $n = 100{,}000$. For each pair of MAF and event rate, we simulated 1,000 replications.


Figure S5. Empirical Powers of SPACox, Firth, Wald, Score, and SPACC Tests when Testing Rare Variants with MAF of 0.001.

From top to bottom, the plots considered 5 event rates (ERs) of 0.2%, 1%, 10%, 20%, and 50%. Empirical powers were evaluated at the significance level $5 \times 10^{-8}$. Sample size $n = 100{,}000$. For each event rate, we simulated 1,000 replications.


Figure S6. QQ Plots for 12 Diseases from UK Biobank.

QQ plots were based on p values calculated from the SPACox method. The red line represents the genome-wide significance level $\alpha = 5 \times 10^{-8}$.


Figure S7. Manhattan Plots for 12 Diseases from UK Biobank (SPACox-NoSPA).

Manhattan plots were based on p values calculated from the SPACox-NoSPA method. The red line represents the genome-wide significance level $\alpha = 5 \times 10^{-8}$. SPACox-NoSPA uses a normal approximation (based on the empirical variance $\widehat{\mathrm{Var}}_{\mathrm{emp}}(S)$) for all SNPs.


Figure S8. QQ Plots for 12 Diseases from UK Biobank (SPACox-NoSPA).

QQ plots were based on p values calculated from the SPACox-NoSPA method. The red line represents the genome-wide significance level $\alpha = 5 \times 10^{-8}$. SPACox-NoSPA uses a normal approximation (based on the empirical variance $\widehat{\mathrm{Var}}_{\mathrm{emp}}(S)$) for all SNPs.


Figure S9. QQ Plots and Manhattan Plots for Three Diseases of Essential Hypertension, Asthma and Alzheimer’s Disease from UK Biobank (Wald).

Upper plots are QQ plots and lower plots are Manhattan plots. P values were calculated from a hybrid version of the Wald test, in which the Wald test is used when the SPACox p value is < 5e-3. The Wald test was performed via the coxph function in the R package survival. The red line represents the genome-wide significance level $\alpha = 5 \times 10^{-8}$.
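The hybrid strategy in this caption can be sketched as a simple two-stage rule. This is only an illustration under stated assumptions: the caption does not say which p value is reported when the SPACox p value is at or above the cutoff, so the sketch assumes the SPACox p value is kept in that case. The name `hybrid_p_value` and the callable `wald_test` are hypothetical placeholders; in the analysis itself the Wald test was run with coxph in R.

```python
from typing import Callable


def hybrid_p_value(
    spacox_p: float,
    wald_test: Callable[[], float],
    threshold: float = 5e-3,
) -> float:
    """Run the slower Wald test only for variants that pass a screening cutoff.

    spacox_p  : p value from the fast screening test.
    wald_test : zero-argument callable that fits the full model and returns
                the Wald p value (hypothetical placeholder).
    threshold : screening cutoff; 5e-3 is the value quoted in the caption.
    """
    if spacox_p < threshold:
        return wald_test()
    # Assumption: variants that fail the screen keep their screening p value.
    return spacox_p


if __name__ == "__main__":
    # Hypothetical p values, for illustration only.
    print(hybrid_p_value(1e-4, wald_test=lambda: 2e-4))
    print(hybrid_p_value(0.3, wald_test=lambda: 0.25))
```

Passing the Wald test as a callable means the expensive model fit is skipped for every variant that does not pass the screen.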


Figure S10. Hazard Ratios and MAFs of the 611 Significant Loci from UK Biobank.

SPACox identified 611 significant loci for the 12 diseases. Variants within a region of 200 kb or in the same gene were treated as the same locus.


Figure S11. P values of SPACC and SPACox for the 611 Significant Loci from UK Biobank.

(A) SPACC uses top 4 PCs, sex, and birth year as covariates; (B) SPACC uses top 4 PCs, sex, and time-to-event as covariates. SPACox identified 611 significant loci for the 12 diseases. The red line represents the genome-wide significance level $\alpha = 5 \times 10^{-8}$. PC: principal component.


Figure S12. Cumulative Risk Curves of the most Significant SNPs for 12 diseases.


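Table S1. Empirical Type I Error Rates of SPACox, SPACox-NoSPA, Wald, Firth, and Score Tests.

| Significance Level | Event Rate | MAF | SPACox | SPACox-NoSPA | Firth | Wald | Score |
|---|---|---|---|---|---|---|---|
| 5.00E-05 | 0.2% | 0.001 | 5.24E-05 | 0.004 | 6.43E-05 | 0.0006 | 0.0008 |
| 5.00E-05 | 0.2% | 0.01 | 4.72E-05 | 0.0004 | 4.07E-05 | 0.0003 | 0.0005 |
| 5.00E-05 | 0.2% | 0.3 | 4.32E-05 | 5.08E-05 | 4.94E-05 | 4.44E-05 | 5.13E-05 |
| 5.00E-05 | 1% | 0.001 | 4.90E-05 | 0.0007 | 5.01E-05 | 0.0004 | 0.0007 |
| 5.00E-05 | 1% | 0.01 | 4.88E-05 | 0.0001 | 4.84E-05 | 0.0001 | 0.0001 |
| 5.00E-05 | 1% | 0.3 | 4.83E-05 | 4.96E-05 | 4.96E-05 | 4.88E-05 | 5.01E-05 |
| 5.00E-05 | 10% | 0.001 | 5.02E-05 | 7.43E-05 | 5.22E-05 | 0.0001 | 0.0002 |
| 5.00E-05 | 10% | 0.01 | 4.93E-05 | 5.16E-05 | 4.98E-05 | 5.84E-05 | 6.16E-05 |
| 5.00E-05 | 10% | 0.3 | 5.02E-05 | 5.03E-05 | 5.04E-05 | 5.04E-05 | 5.05E-05 |
| 5.00E-05 | 20% | 0.001 | 5.02E-05 | 6.89E-05 | 5.27E-05 | 0.0001 | 0.0001 |
| 5.00E-05 | 20% | 0.01 | 4.92E-05 | 5.10E-05 | 4.98E-05 | 5.53E-05 | 5.68E-05 |
| 5.00E-05 | 20% | 0.3 | 4.95E-05 | 4.96E-05 | 4.99E-05 | 4.98E-05 | 4.99E-05 |
| 5.00E-05 | 50% | 0.001 | 5.01E-05 | 8.27E-05 | 5.22E-05 | 7.94E-05 | 8.69E-05 |
| 5.00E-05 | 50% | 0.01 | 5.01E-05 | 5.32E-05 | 5.00E-05 | 5.28E-05 | 5.34E-05 |
| 5.00E-05 | 50% | 0.3 | 5.04E-05 | 5.05E-05 | 5.00E-05 | 5.00E-05 | 5.01E-05 |
| 5.00E-08 | 0.2% | 0.001 | 2.11E-08 | 0.0005 | 6.68E-08 | 3.76E-05 | 0.0004 |
| 5.00E-08 | 0.2% | 0.01 | 3.94E-08 | 1.11E-05 | 4.31E-08 | 3.22E-06 | 1.25E-05 |
| 5.00E-08 | 0.2% | 0.3 | 2.13E-08 | 4.25E-08 | 3.30E-08 | 4.54E-08 | 5.64E-08 |
| 5.00E-08 | 1% | 0.001 | 4.75E-08 | 2.61E-05 | 7.10E-08 | 9.14E-06 | 5.04E-05 |
| 5.00E-08 | 1% | 0.01 | 4.87E-08 | 8.48E-07 | 5.15E-08 | 6.60E-07 | 1.22E-06 |
| 5.00E-08 | 1% | 0.3 | 3.39E-08 | 3.74E-08 | 3.65E-08 | 4.36E-08 | 4.64E-08 |
| 5.00E-08 | 10% | 0.001 | 4.61E-08 | 2.02E-07 | 5.16E-08 | 1.10E-06 | 2.11E-06 |
| 5.00E-08 | 10% | 0.01 | 4.44E-08 | 4.61E-08 | 4.43E-08 | 1.23E-07 | 1.42E-07 |
| 5.00E-08 | 10% | 0.3 | 5.25E-08 | 5.25E-08 | 5.39E-08 | 5.39E-08 | 5.53E-08 |
| 5.00E-08 | 20% | 0.001 | 3.72E-08 | 1.40E-07 | 6.47E-08 | 6.07E-07 | 9.47E-07 |
| 5.00E-08 | 20% | 0.01 | 6.50E-08 | 7.55E-08 | 6.50E-08 | 8.16E-08 | 9.40E-08 |
| 5.00E-08 | 20% | 0.3 | 4.82E-08 | 4.82E-08 | 5.32E-08 | 4.76E-08 | 4.76E-08 |
| 5.00E-08 | 50% | 0.001 | 3.32E-08 | 3.35E-07 | 4.19E-08 | 2.74E-07 | 3.49E-07 |
| 5.00E-08 | 50% | 0.01 | 4.36E-08 | 6.11E-08 | 4.61E-08 | 7.46E-08 | 7.60E-08 |
| 5.00E-08 | 50% | 0.3 | 5.71E-08 | 5.71E-08 | 4.85E-08 | 4.71E-08 | 4.85E-08 |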


Table S2. UK Biobank Inpatient Data Provider Information.

| Hospital Admissions (Inpatients) | Data Provider | International Classification of Disease: ICD9 | International Classification of Disease: ICD10 | Censoring | Total Sample Size | Sample Size in UK Biobank Analysis |
|---|---|---|---|---|---|---|
| Hospital Episode Statistics for England | NHS Digital | | 1996 onwards | 31 March 2017 | 366,439 | 248,992 |
| Scottish Morbidity Record | Information and Statistics Division, Scotland | 1987-1996 | 1996 onwards | 31 October 2016 | 31,135 | 22,193 |
| Patient Episode Database for Wales | Secure Anonymized Information Linkage, Wales | | 1999 onwards | 29 February 2016 | 16,115 | 11,686 |


Table S3: Summary Information of the 38 Loci Significant Based on SPACox but not Significant Based on SPACC. We use a Genome-wide Significance Level $5 \times 10^{-8}$. (UTR = untranslated region, ncRNA = non-coding RNA)

| Phenotype | RSID | CHR | REF | ALT | MAF | Gene.refGene | Func.refGene | Cox PH Wald^a HR | Cox PH Wald^a SE | Cox PH Wald^a p value | SPACC p value | SPACox p value |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Asthma | rs5758324 | 22 | G | T | 0.2145 | TEF | intronic | 1.06 | 0.011 | 2.49E-08 | 1.19E-07 | 2.39E-08 |
| Asthma | rs2844649 | 6 | A | G | 0.1135 | SFTA2; DPCR1 [49] | intergenic | 1.08 | 0.014 | 5.03E-08 | 1.63E-07 | 4.45E-08 |
| Cardiac dysrhythmias | rs412768 | 14 | A | G | 0.3109 | MYH6 [45] | intronic | 1.05 | 0.009 | 2.42E-08 | 5.51E-08 | 2.46E-08 |
| Cataract | rs149821426 | 4 | G | A | 0.0033 | LINC02429; MIR548AG1 | intergenic | 1.49 | 0.071 | 1.87E-08 | 6.64E-08 | 4.83E-08 |
| Cataract | rs1043618 | 6 | G | C | 0.3792 | HSPA1A [46] | UTR5 | 1.05 | 0.010 | 3.76E-08 | 7.12E-08 | 3.82E-08 |
| Coronary atherosclerosis | rs9515203 | 13 | T | C | 0.2641 | COL4A2 [41] | intronic | 0.94 | 0.012 | 1.22E-08 | 5.99E-08 | 1.28E-08 |
| Coronary atherosclerosis | rs112043140 | 3 | C | T | 0.2211 | LRRC2 | intronic | 1.07 | 0.012 | 4.59E-08 | 9.66E-08 | 4.88E-08 |
| Essential hypertension | rs2304615 | 19 | A | G | 0.2064 | REXO1 | intronic | 0.97 | 0.006 | 4.87E-08 | 5.03E-08 | 4.30E-08 |
| Essential hypertension | rs752520449 | 12 | A | G | 0.0005 | LINC02400; GXYLT1 | intergenic | 1.82 | 0.106 | 1.85E-08 | 5.75E-08 | 4.65E-08 |
| Essential hypertension | rs10838835 | 11 | A | G | 0.1494 | OR4B1; OR4X2 | intergenic | 0.96 | 0.007 | 1.43E-08 | 6.27E-08 | 1.15E-08 |
| Essential hypertension | rs7763581 | 6 | T | G | 0.4866 | FOXC1 | downstream | 0.97 | 0.005 | 3.06E-08 | 6.36E-08 | 2.93E-08 |
| Essential hypertension | rs73094438 | 7 | T | C | 0.0069 | LOC401324; HERPUD2 | intergenic | 1.19 | 0.031 | 1.34E-08 | 6.86E-08 | 1.85E-08 |
| Essential hypertension | rs2814949 | 6 | A | G | 0.3595 | C6orf106 | intronic | 1.03 | 0.005 | 9.44E-09 | 6.90E-08 | 9.01E-09 |
| Essential hypertension | rs76702537 | 12 | C | A | 0.0316 | LINC02468; PDE3A [47] | intergenic | 0.92 | 0.015 | 1.91E-08 | 6.93E-08 | 1.35E-08 |
| Essential hypertension | rs9932220 | 16 | G | A | 0.2181 | SALL1; LINC01571 [44] | intergenic | 0.97 | 0.006 | 2.48E-08 | 6.98E-08 | 2.13E-08 |
| Essential hypertension | rs10828266 | 10 | A | G | 0.2844 | DNAJC1 | intronic | 0.97 | 0.006 | 1.01E-08 | 7.21E-08 | 9.89E-09 |
| Essential hypertension | rs2725371 | 8 | A | G | 0.3044 | PURG | UTR3 | 0.97 | 0.006 | 2.92E-08 | 7.51E-08 | 2.89E-08 |
| Essential hypertension | rs9683944 | 4 | A | G | 0.2034 | LINC02510; PCDH18 | intergenic | 1.04 | 0.006 | 3.64E-08 | 8.76E-08 | 3.63E-08 |
| Essential hypertension | rs2282143 | 6 | C | T | 0.0155 | SLC22A1 | exonic | 1.12 | 0.020 | 1.77E-08 | 9.23E-08 | 2.52E-08 |
| Essential hypertension | rs4678408 | 3 | A | G | 0.3708 | NME9; MRAS | intergenic | 0.97 | 0.005 | 9.43E-09 | 9.89E-08 | 9.20E-09 |
| Essential hypertension | rs10756197 | 9 | G | A | 0.4848 | PTPRD-AS2; TYRP1 [43] | intergenic | 1.03 | 0.005 | 2.71E-08 | 1.00E-07 | 2.64E-08 |
| Essential hypertension | rs433750 | 5 | T | G | 0.4215 | LOC101927078 | ncRNA_intronic | 0.97 | 0.005 | 3.63E-08 | 1.05E-07 | 3.41E-08 |
| Essential hypertension | rs11657730 | 17 | C | T | 0.3621 | LOC100130370; BAHCC1 | intergenic | 0.97 | 0.005 | 1.97E-08 | 1.07E-07 | 1.78E-08 |
| Essential hypertension | rs10067451 | 5 | G | A | 0.1094 | LINC00461 | ncRNA_intronic | 0.95 | 0.008 | 3.03E-08 | 1.12E-07 | 2.33E-08 |
| Essential hypertension | rs35275911 | 7 | G | C | 0.1947 | LOC101926943 | ncRNA_intronic | 1.04 | 0.006 | 2.67E-08 | 1.29E-07 | 2.75E-08 |
| Essential hypertension | rs13062241 | 3 | C | T | 0.4700 | FGD5 [39] | intronic | 0.97 | 0.005 | 5.20E-09 | 1.39E-07 | 5.06E-09 |
| Essential hypertension | rs2046301 | 11 | A | G | 0.1426 | OR4C3; OR4C45 | intergenic | 0.96 | 0.007 | 3.47E-08 | 1.55E-07 | 2.79E-08 |
| Essential hypertension | rs2517521 | 6 | A | G | 0.1453 | HCG22 [48] | UTR3 | 1.04 | 0.007 | 5.02E-08 | 2.04E-07 | 4.26E-08 |
| Essential hypertension | rs28724242 | 6 | A | G | 0.2964 | HLA-DQB1 [42] | intronic | 1.03 | 0.006 | 4.36E-08 | 2.05E-07 | 4.26E-08 |
| Essential hypertension | rs9603420 | 13 | G | T | 0.4888 | B3GLCT; RXFP2 | intergenic | 1.03 | 0.005 | 2.41E-08 | 4.64E-07 | 2.27E-08 |
| Hyperlipidemia | rs2517521 | 6 | A | G | 0.1453 | HCG22 | UTR3 | 1.06 | 0.011 | 3.70E-08 | 1.01E-07 | 3.39E-08 |
| Osteoarthrosis | rs114786346 | 12 | T | C | 0.0892 | RFLNA | intronic | 1.08 | 0.014 | 3.78E-08 | 6.16E-08 | 4.19E-08 |
| Osteoarthrosis | rs62063281 | 17 | A | G | 0.2219 | MAPT | intronic | 1.06 | 0.010 | 2.61E-08 | 6.18E-08 | 2.71E-08 |
| Osteoarthrosis | rs1724411 | 17 | T | C | 0.2303 | LRRC37A4P; MAPK8IP1P2 | intergenic | 1.06 | 0.010 | 2.64E-08 | 6.68E-08 | 2.73E-08 |
| Osteoarthrosis | rs4841411 | 8 | C | G | 0.4298 | RP1L1; MIR4286 | intergenic | 1.05 | 0.008 | 5.18E-08 | 7.59E-08 | 4.88E-08 |
| Osteoarthrosis | rs2532386 | 17 | G | A | 0.2253 | KANSL1; LRRC37A | intergenic | 1.06 | 0.010 | 4.29E-08 | 9.18E-08 | 4.39E-08 |
| Type 2 diabetes | rs646123 | 17 | G | A | 0.2867 | MLX | intronic | 1.06 | 0.011 | 3.11E-08 | 6.30E-08 | 3.33E-08 |
| Type 2 diabetes | rs146886108 | 5 | C | T | 0.0071 | ANKH | exonic | 0.66 | 0.074 | 3.73E-08 | 6.64E-08 | 3.55E-08 |

^a HR: hazard ratio, $\exp(\gamma)$; SE: standard error.


Table S4: Summary Information of the 17 Loci Significant Based on SPACC but not Significant Based on SPACox. We use a Genome-wide Significance Level $5 \times 10^{-8}$. (UTR = untranslated region, ncRNA = non-coding RNA)

| Phenotype | RSID | CHR | REF | ALT | MAF | Gene.refGene | Func.refGene | Cox PH Wald^a HR | Cox PH Wald^a SE | Cox PH Wald^a p value | SPACC p value | SPACox p value |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Type 2 diabetes | rs3810291 | 19 | G | A | 0.3225 | ZC3H4 | UTR3 | 1.06 | 0.011 | 8.04E-08 | 4.24E-08 | 7.94E-08 |
| Hyperlipidemia | rs2287029 | 19 | C | T | 0.1885 | DNM2 | intronic | 0.95 | 0.010 | 5.04E-08 | 2.38E-08 | 5.01E-08 |
| Essential hypertension | rs11857726 | 15 | G | A | 0.3793 | CHP1 | intronic | 1.03 | 0.005 | 5.23E-08 | 4.74E-08 | 5.12E-08 |
| Essential hypertension | rs12902197 | 15 | T | A | 0.1873 | MIR4713HG | ncRNA_intronic | 1.04 | 0.007 | 5.89E-08 | 1.25E-08 | 5.59E-08 |
| Essential hypertension | rs11249906 | 8 | T | G | 0.3053 | PPP1R3B; LOC101929128 | intergenic | 1.03 | 0.006 | 5.91E-08 | 1.65E-08 | 5.61E-08 |
| Essential hypertension | rs11998678 | 8 | C | T | 0.4651 | CTSB; DEFB136 | intergenic | 1.03 | 0.005 | 6.47E-08 | 4.42E-09 | 6.00E-08 |
| Essential hypertension | rs7115856 | 11 | A | C | 0.4633 | HSD17B12 | intronic | 0.97 | 0.005 | 7.56E-08 | 3.09E-08 | 7.21E-08 |
| Essential hypertension | rs142076278 | 16 | A | G | 0.0148 | LINC01571; C16orf97 | intergenic | 1.12 | 0.022 | 1.04E-07 | 3.63E-08 | 1.24E-07 |
| Essential hypertension | rs2867695 | 4 | C | T | 0.1069 | ANTXR2; PRDM8 | intergenic | 0.96 | 0.008 | 1.10E-07 | 3.02E-08 | 9.82E-08 |
| Essential hypertension | rs1242765 | 14 | G | A | 0.2353 | UNC79 | intronic | 0.97 | 0.006 | 1.20E-07 | 4.55E-08 | 1.08E-07 |
| Essential hypertension | rs2251473 | 8 | C | A | 0.4452 | MTMR9 | intronic | 1.03 | 0.005 | 1.31E-07 | 7.96E-09 | 1.21E-07 |
| Essential hypertension | rs2341599 | 4 | G | A | 0.3406 | MAP9; GUCY1A3 | intergenic | 0.97 | 0.005 | 1.34E-07 | 9.78E-09 | 1.29E-07 |
| Essential hypertension | rs8184986 | 22 | A | T | 0.1345 | CHEK2 | intronic | 1.04 | 0.007 | 1.65E-07 | 2.81E-08 | 1.67E-07 |
| Coronary atherosclerosis | rs2073532 | 7 | G | C | 0.3197 | ETV1 | intronic | 1.06 | 0.011 | 5.08E-08 | 4.14E-08 | 5.31E-08 |
| Coronary atherosclerosis | rs8003602 | 14 | T | C | 0.2604 | HHIPL1; CYP46A1 | intergenic | 1.07 | 0.012 | 5.48E-08 | 3.81E-08 | 5.54E-08 |
| Coronary atherosclerosis | rs10841443 | 12 | C | G | 0.3315 | LINC02398 | ncRNA_intronic | 1.06 | 0.011 | 5.90E-08 | 4.81E-08 | 6.08E-08 |
| Asthma | rs6835638 | 4 | C | T | 0.1516 | IL21-AS1 | ncRNA_intronic | 1.07 | 0.012 | 1.87E-07 | 4.25E-08 | 1.73E-07 |

^a HR: hazard ratio, $\exp(\gamma)$; SE: standard error.


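Table S5. Empirical Type I Error Rates of SPACox When Covariates are Time-Varying.

| Event Rate | MAF | Significance Level | SPACox Type I Error Rate |
|---|---|---|---|
| 0.20% | 0.001 | 5.00E-05 | 5.99E-05 |
| 0.20% | 0.001 | 5.00E-08 | 5.00E-08 |
| 0.20% | 0.01 | 5.00E-05 | 4.75E-05 |
| 0.20% | 0.01 | 5.00E-08 | 4.20E-08 |
| 0.20% | 0.3 | 5.00E-05 | 4.38E-05 |
| 0.20% | 0.3 | 5.00E-08 | 2.01E-08 |
| 1% | 0.001 | 5.00E-05 | 4.96E-05 |
| 1% | 0.001 | 5.00E-08 | 4.00E-08 |
| 1% | 0.01 | 5.00E-05 | 4.93E-05 |
| 1% | 0.01 | 5.00E-08 | 4.00E-08 |
| 1% | 0.3 | 5.00E-05 | 4.84E-05 |
| 1% | 0.3 | 5.00E-08 | 5.45E-08 |
| 10% | 0.001 | 5.00E-05 | 4.98E-05 |
| 10% | 0.001 | 5.00E-08 | 4.20E-08 |
| 10% | 0.01 | 5.00E-05 | 5.00E-05 |
| 10% | 0.01 | 5.00E-08 | 5.60E-08 |
| 10% | 0.3 | 5.00E-05 | 5.01E-05 |
| 10% | 0.3 | 5.00E-08 | 4.11E-08 |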


Supplementary Methods

Section A: Discussion about the time-varying covariates

One of the strengths of the Cox PH model is its ability to accommodate covariates that change over time. The R package survival provides a detailed vignette [55] describing how to incorporate time-varying covariates into a Cox PH model. SPACox follows the same process and uses time intervals to code time-varying covariates. When one subject has multiple time intervals, after fitting a null Cox PH model we sum the martingale residuals corresponding to these time intervals to obtain the overall martingale residual of this subject, and then calculate the empirical SPA of the martingale residuals. In step 2, we use the weighted mean of the covariates to calculate the centered covariate-adjusted genotype $\tilde{G}$.
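The residual aggregation step described above can be sketched as follows. The sketch assumes that the null model has already been fitted elsewhere and that one martingale residual is available per (subject, time interval) row; the function name `sum_residuals_by_subject` and the toy numbers are hypothetical and are not part of the SPACox software.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable


def sum_residuals_by_subject(
    subject_ids: Iterable[Hashable],
    interval_residuals: Iterable[float],
) -> Dict[Hashable, float]:
    """Add up interval-level martingale residuals within each subject.

    Each input row corresponds to one time interval of one subject. The
    result holds a single overall residual per subject.
    """
    totals: Dict[Hashable, float] = defaultdict(float)
    for subject, residual in zip(subject_ids, interval_residuals):
        totals[subject] += residual
    return dict(totals)


if __name__ == "__main__":
    # Toy input: subject "s1" has two intervals, subject "s2" has one.
    ids = ["s1", "s1", "s2"]
    residuals = [-0.25, 0.75, -0.5]
    print(sum_residuals_by_subject(ids, residuals))
```

The per-subject totals then play the role of the martingale residuals in the fixed-covariate case.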

We carried out simulation studies to evaluate the type I error rates of SPACox. As in the main text, for subject $i$, we first simulated the censoring time $C_i$ and the underlying failure time $T_i^*$, and then calculated $T_i = \min(T_i^*, C_i)$ and $\delta_i = I(T_i^* \le C_i)$. The censoring time $C_i$ was simulated from a Weibull distribution with a scale parameter of 0.15 and a shape parameter of 1. The survival time $T_i$ was simulated from a Cox proportional hazards model with a Weibull baseline hazard function and two time-varying covariates as

$$
T_i(X_{i1}, X_{i2}) =
\begin{cases}
\sqrt{\dfrac{\lambda^2 \cdot (-\log U_i)}{\exp(\eta_i^0)}}, & -\log U_i < \dfrac{\exp(\eta_i^0)\, t_S^2}{\lambda^2} \\[2ex]
\sqrt{\dfrac{\lambda^2 \cdot (-\log U_i)}{\exp(\eta_i^1)} - \dfrac{t_S^2 \exp(\eta_i^0)}{\exp(\eta_i^1)} + t_S^2}, & -\log U_i \ge \dfrac{\exp(\eta_i^0)\, t_S^2}{\lambda^2}
\end{cases}
$$

where $U_i$ was simulated from a uniform distribution on the interval (0, 1), and $t_S = 0.2$ was the time point at which covariate $X_{i1}$ changed from $x_{i1}^0$ to $x_{i1}^1$ and covariate $X_{i2}$ changed from $x_{i2}^0$ to $x_{i2}^1$. We simulated $x_{i1}^j$, $j = 0, 1$, from a standard normal distribution and $x_{i2}^j$, $j = 0, 1$, from a Bernoulli distribution with a probability of 0.5. The linear predictor was $\eta_i^j = 0.5 x_{i1}^j + 0.5 x_{i2}^j$, $j = 0, 1$, and the scale parameter $\lambda$ was selected to correspond to fixed event rates.
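A minimal sketch of this simulation for one subject is given below, using only the Python standard library. It rests on one reading of the piecewise formula: a cumulative hazard $\exp(\eta_i^0)\, t^2 / \lambda^2$ before $t_S$, with the linear predictor switching to $\eta_i^1$ afterwards, inverted at $-\log U_i$. The value `lam=1.0` is a hypothetical placeholder, since the text says only that $\lambda$ was tuned to match the target event rates, and all function names are illustrative.

```python
import math
import random

T_S = 0.2  # change point t_S given in the text


def cumulative_hazard(t, eta0, eta1, lam, t_s=T_S):
    """Cumulative hazard implied by the piecewise formula (an inferred form)."""
    if t < t_s:
        return math.exp(eta0) * t**2 / lam**2
    return (math.exp(eta0) * t_s**2 + math.exp(eta1) * (t**2 - t_s**2)) / lam**2


def simulate_failure_time(u, eta0, eta1, lam, t_s=T_S):
    """Failure time from a uniform draw u, following the two cases in the text."""
    e = -math.log(u)
    if e < math.exp(eta0) * t_s**2 / lam**2:
        # The event happens before the covariates change.
        return math.sqrt(lam**2 * e / math.exp(eta0))
    # The event happens at or after the change point.
    return math.sqrt(
        lam**2 * e / math.exp(eta1)
        - t_s**2 * math.exp(eta0) / math.exp(eta1)
        + t_s**2
    )


def simulate_subject(rng, lam):
    """Return (observed time, event indicator) for one subject."""
    x1 = [rng.gauss(0.0, 1.0) for _ in range(2)]                 # standard normal
    x2 = [1.0 if rng.random() < 0.5 else 0.0 for _ in range(2)]  # Bernoulli(0.5)
    eta = [0.5 * x1[j] + 0.5 * x2[j] for j in range(2)]
    u = 1.0 - rng.random()  # lies in (0, 1], so log(u) is always defined
    t_star = simulate_failure_time(u, eta[0], eta[1], lam)
    c = rng.weibullvariate(0.15, 1.0)  # censoring: scale 0.15, shape 1
    return min(t_star, c), int(t_star <= c)


if __name__ == "__main__":
    rng = random.Random(2020)
    sample = [simulate_subject(rng, lam=1.0) for _ in range(10000)]
    print("empirical event rate:", sum(d for _, d in sample) / len(sample))
```

Rerunning the last lines with different values of `lam` shows how the scale parameter moves the empirical event rate, which is the tuning step the text refers to.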

We considered common, low-frequency and rare variants with MAFs of 0.3, 0.01 and 0.001, and simulated $10^6$ genetic variants for each MAF. We considered three event rates of 0.2%, 1% and 10%, and simulated 1,000 datasets of time-to-event phenotypes for each event rate. Hence, for each pair of MAF and event rate, a total of $10^9$ replications were evaluated. The type I error rates of SPACox are presented in Table S5. In all parameter settings, the type I error rates were well controlled.
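The replication count and the resolution it gives at each significance level can be worked out with a few lines of arithmetic. The sketch assumes independent replications, which is a simplification because the variants tested against one simulated phenotype dataset share that phenotype; the binomial standard error is therefore only a rough yardstick. The function names are illustrative.

```python
import math

N_VARIANTS = 10**6   # variants simulated per MAF
N_DATASETS = 1000    # phenotype datasets per event rate
N_REPS = N_VARIANTS * N_DATASETS  # replications per (MAF, event rate) pair


def expected_rejections(alpha: float, n_reps: int) -> float:
    """Expected number of false positives when the test is exactly calibrated."""
    return alpha * n_reps


def rate_standard_error(alpha: float, n_reps: int) -> float:
    """Binomial standard error of the empirical type I error rate.

    Assumption: replications are independent Bernoulli(alpha) trials.
    """
    return math.sqrt(alpha * (1.0 - alpha) / n_reps)


if __name__ == "__main__":
    for alpha in (5e-5, 5e-8):
        print(
            alpha,
            expected_rejections(alpha, N_REPS),
            rate_standard_error(alpha, N_REPS),
        )
```

At $\alpha = 5 \times 10^{-8}$ a calibrated test is expected to reject only about 50 times in $10^9$ replications, so empirical rates at that level are estimated from a small number of events.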

55. Therneau, T., Crowson, C., and Atkinson, E. (2017). Using time dependent covariates and time dependent coefficients in the Cox model. Survival Vignettes.