
Defining rare variants by their frequencies in controls may increase type I error


Correspondence

Nature Genetics | Volume 43 | Number 5 | May 2011 | 391

To the Editor: In the August 2010 edition of Nature Genetics, Johansen et al.¹ reported a genome-wide association study for hypertriglyceridemia (HTG) and re-sequenced four genes that showed association with HTG at genome-wide significant levels in a subset of 438 individuals with HTG (cases) and 327 controls. Focusing on rare protein-modifying variants (those with a minor allele frequency of at most 1% in the controls), they observed nearly twice as many carriers of at least one rare allele among the cases (28.1%) compared to the controls (15.3%; P = 2.6 × 10⁻⁵). We want to point out some methodological issues that arise when aggregation-based methods are applied to a set of rare variants defined by frequency thresholds calculated from the controls. Because this procedure leaves the frequency of variants unbounded in the cases while inherently imposing an upper bound in the controls, this selection bias can result in inflated type I errors, even when no true genetic effect is present in the sequenced regions.

To illustrate this, we simulated data for 380 SNPs, in linkage equilibrium for simplicity, whose frequencies were sampled from an infinite population according to Wright’s formula²,³; a large majority of these SNPs had very low allele frequency. We simulated genotypes for 770 individuals according to the rules of Hardy-Weinberg equilibrium. We arbitrarily labeled 330 individuals as controls and the remaining 440 individuals as cases; no true genetic effect was simulated. On average, only 94 SNPs were polymorphic in the 770 individuals and 80 of them had allele frequencies in the controls of less than 1%; these numbers are similar to the number of variants discovered and the number of variants labeled as rare in Johansen et al.¹. We compared the count of carriers of at least one minor allele across the rare SNPs, among the cases and the controls, to the count of non-carriers using Fisher’s exact test. Table 1 shows that the odds ratio estimated from these counts is close to 1.15. The true type I error of the test is 0.66% at a nominal P < 0.001 for a test known to be conservative. We note that this deviation from the null is aggravated when a larger set of rare variants enters the analysis from, for example, sequencing larger regions, and also more dramatically when the region is sequenced in a smaller sample (Table 1). A permutation procedure that consists of comparing the observed test statistic to the ones obtained from random permutations of the cases and controls would lead to a correct estimate of the significance level, as long as the set of rare variants upon which the test statistic is based is recomputed with each replicate (from the full set of polymorphic SNPs, not just the SNPs that entered the original analysis); otherwise, the selection bias would not be carried over from one replicate to the next, and its effect would not be correctly accounted for.

The advantage of working with a set of rare variants defined as those with a frequency calculated in the controls below a certain threshold (as opposed to, say, a frequency estimated from the combined sample of cases and controls) is that this procedure imposes no bounds on how high the frequency may get in the cases, which is a desirable effect. Working with a seemingly unbiased set of rare variants defined as those below a certain frequency threshold in either the cases or the controls still resulted in inflated type I errors, whereas using estimates from the combined set of cases and controls results in expected, albeit generally conservative, type I errors


Table 1  Testing for the aggregation of rare alleles in cases

Cases/controls  SNPs  Polymorphic  Selection  Rare  OR     95% CI       α (5%)  α (0.1%)
440/330         380   94           Controls   80    1.148  1.143–1.153  13.1    0.66
440/330         380   94           Either     83    1.015  1.010–1.019  5.5     0.18
440/330         380   94           Combined   80    1.000  0.995–1.004  4.7     0.14
440/330         760   188          Controls   160   1.184  1.179–1.189  19.5    1.58
440/330         760   188          Either     166   1.018  1.014–1.023  6.0     0.24
440/330         760   188          Combined   161   1.001  0.997–1.005  5.1     0.06
110/80          380   57           Controls   41    1.916  1.899–1.933  47.7    7.09
110/80          380   57           Either     49    1.001  0.993–1.011  6.2     0.20
110/80          380   57           Combined   41    1.000  0.991–1.010  3.9     0.12

Odds ratio (OR), 95% confidence interval (CI) and true type I error (α, in percent) of the test at nominal P < 5% and P < 0.1%, each estimated from 5,000 simulated replicates. A genomic region that comprises SNPs in an infinite population is simulated under Wright’s formula, as applied in reference 3, with a scaled mutation rate of 0.001, a scaled back-mutation rate of 0.00033 and a scaled selection rate of 60. In the sample of cases and controls, only a fraction of the SNPs are polymorphic (Polymorphic, rounded to the nearest integer). The Selection column indicates how rare variants are defined: those with frequency less than 1% in the controls; less than 1% in either the cases or the controls; and less than 1% in the combined sample of cases and controls. Depending on the selection method, the average number of polymorphic rare variants that are ascertained varies (Rare, rounded to the nearest integer).
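The null simulation summarized in Table 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the author's code: the allele frequencies are drawn from a hypothetical skewed distribution (`0.05 * u**3`) rather than from Wright's formula, only 60 SNPs and a small number of replicates are used to keep the run short, and all function names are made up for this example. It compares the "controls" and "combined" definitions of a rare variant using a carrier-based Fisher's exact test on data with no genetic effect.

```python
import math
import random


def fisher_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = math.comb(row1 + row2, col1)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    probs = {x: math.comb(row1, x) * math.comb(row2, col1 - x) / denom
             for x in range(lo, hi + 1)}
    cutoff = probs[a] * (1 + 1e-9)  # tolerance for floating-point ties
    return min(1.0, sum(p for p in probs.values() if p <= cutoff))


def simulate_genotypes(rng, freqs, n_individuals):
    """Minor-allele counts (0, 1 or 2) per SNP and individual under Hardy-Weinberg."""
    genotypes = []
    for p in freqs:
        hom, het = p * p, 2 * p * (1 - p)
        row = []
        for _ in range(n_individuals):
            u = rng.random()
            row.append(2 if u < hom else 1 if u < hom + het else 0)
        genotypes.append(row)
    return genotypes


def select_rare(genotypes, n_cases, rule, threshold=0.01):
    """Indices of polymorphic SNPs labelled rare under the given selection rule."""
    n = len(genotypes[0])
    rare = []
    for j, row in enumerate(genotypes):
        case_alleles, control_alleles = sum(row[:n_cases]), sum(row[n_cases:])
        if case_alleles + control_alleles == 0:
            continue  # monomorphic in the sample, so never discovered
        if rule == "controls":
            freq = control_alleles / (2 * (n - n_cases))
        else:  # "combined"
            freq = (case_alleles + control_alleles) / (2 * n)
        if freq < threshold:
            rare.append(j)
    return rare


def carrier_test(genotypes, rare, n_cases):
    """Fisher's exact test on carriers of at least one rare allele (cases first)."""
    n = len(genotypes[0])
    carrier = [any(genotypes[j][i] for j in rare) for i in range(n)]
    case_carriers, control_carriers = sum(carrier[:n_cases]), sum(carrier[n_cases:])
    return fisher_two_sided(case_carriers, n_cases - case_carriers,
                            control_carriers, (n - n_cases) - control_carriers)


def type_one_error(n_reps=150, n_snps=60, n_cases=110, n_controls=80,
                   alpha=0.05, seed=1):
    """Fraction of null replicates with P < alpha, for each selection rule."""
    rng = random.Random(seed)
    rejections = {"controls": 0, "combined": 0}
    for _ in range(n_reps):
        # Assumed skewed frequency spectrum (most SNPs very rare); NOT Wright's formula.
        freqs = [0.05 * rng.random() ** 3 for _ in range(n_snps)]
        genotypes = simulate_genotypes(rng, freqs, n_cases + n_controls)
        for rule in rejections:
            rare = select_rare(genotypes, n_cases, rule)
            if carrier_test(genotypes, rare, n_cases) < alpha:
                rejections[rule] += 1
    return {rule: count / n_reps for rule, count in rejections.items()}


if __name__ == "__main__":
    print(type_one_error())
```

Because labels are assigned arbitrarily, any rejection is a false positive. Under these assumptions the "controls" rule should reject far more often than the "combined" rule, in line with the pattern in Table 1; the exact rates depend on the assumed frequency spectrum and will not reproduce the table's values.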

© 2011 Nature America, Inc. All rights reserved.


(Table 1); however, these two strategies could potentially lead to a higher loss of power if, in a given region, both rare susceptibility and rare protective variants coexist³,⁴.

Instead of a test based on carrier counts, an allelic test that is based on comparing the total count of minor and major alleles across the rare variants could be used, a test also reported in Johansen et al.¹ (but with inaccuracies, see below). We found the latter test to have higher true type I errors than the former (data not shown): this aggravation may partly be caused by the procedure of adding up alleles across loci, which effectively produces what would be the equivalent of a single ‘super locus’ in a much larger sample that is unlikely to follow the rules of Hardy-Weinberg equilibrium (because of potential differences in allele frequencies), a situation that is known to produce invalid allelic tests of association⁵. In their allelic test, Johansen et al.¹ incorrectly compared the total number of rare alleles across the 80 loci out of twice the number of subjects in cases and controls (154 rare alleles out of 876 ‘total alleles’ in cases, calculated from 438 diploid genomes, compared to 53 rare alleles out of 654 in controls; P = 6.2 × 10⁻⁸) instead of comparing it to the total number of alleles across the loci (154 rare alleles out of 70,080 alleles in cases versus 53 out of 52,320 alleles in controls; P = 3.3 × 10⁻⁷). In more extreme scenarios, the total count of rare alleles across many loci could exceed twice the sample size.

Most of the rare variants reported in Johansen et al.¹ were unique to the samples in which they were discovered and clearly aggregated more often in cases than controls; the results of Johansen et al.¹ are thus unlikely to have been severely affected by this selection bias. Nevertheless, because of the potential bias and because it is difficult to fully interpret results not knowing the number of additional SNPs that could have entered the analysis if other selection criteria had been used, investigators should be encouraged to report not only those variants that have a frequency below a pre-defined threshold in the controls but also those that reach the same threshold in the cases, and to calculate a significance level from an unbiased set of variants or, more desirably, from a permutation procedure that correctly accounts for the bias.

COMPETING FINANCIAL INTERESTS
The author declares no competing financial interests.

Mathieu Lemire

Ontario Institute for Cancer Research, MaRS Centre, Toronto, Ontario, Canada. Correspondence should be addressed to M.L. ([email protected]).

1. Johansen, C.T. et al. Nat. Genet. 42, 684–687 (2010).
2. Pritchard, J.K. Am. J. Hum. Genet. 69, 124–137 (2001).
3. Madsen, B.E. & Browning, S.R. PLoS Genet. 5, e1000384 (2009).
4. Li, B. & Leal, S.M. Am. J. Hum. Genet. 83, 311–321 (2008).
5. Sasieni, P.D. Biometrics 53, 1253–1261 (1997).
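The permutation procedure recommended in the letter above can be sketched as follows. This is a minimal illustration under assumptions, not the author's implementation: the test statistic is the difference in carrier proportions (cases minus controls) rather than Fisher's exact test, the P value is one-sided, and the function names are hypothetical. The essential step is that the rare set is reselected from all polymorphic SNPs after every shuffle of the case and control labels, so that each permuted replicate carries the same selection bias as the observed data.

```python
import random


def rare_by_controls(genotypes, is_case, threshold=0.01):
    """Indices of polymorphic SNPs whose allele frequency in controls is below threshold."""
    n_controls = sum(1 for c in is_case if not c)
    rare = []
    for j, row in enumerate(genotypes):
        control_alleles = sum(g for g, c in zip(row, is_case) if not c)
        if sum(row) > 0 and control_alleles / (2 * n_controls) < threshold:
            rare.append(j)
    return rare


def carrier_excess(genotypes, is_case, rare):
    """Carrier proportion in cases minus carrier proportion in controls."""
    n_cases = sum(1 for c in is_case if c)
    n_controls = len(is_case) - n_cases
    case_carriers = control_carriers = 0
    for i, c in enumerate(is_case):
        if any(genotypes[j][i] for j in rare):
            if c:
                case_carriers += 1
            else:
                control_carriers += 1
    return case_carriers / n_cases - control_carriers / n_controls


def permutation_p_value(genotypes, is_case, n_perm=1000, seed=0):
    """One-sided permutation P value with the rare set recomputed in every replicate."""
    rng = random.Random(seed)
    observed = carrier_excess(genotypes, is_case,
                              rare_by_controls(genotypes, is_case))
    labels = list(is_case)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        # Reselect from ALL polymorphic SNPs, not only the originally selected ones.
        rare = rare_by_controls(genotypes, labels)
        if carrier_excess(genotypes, labels, rare) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Toy data: 5 SNPs, each carried by 3 different cases and by no control.
    is_case = [True] * 20 + [False] * 20
    genotypes = [[1 if 3 * j <= i < 3 * j + 3 else 0 for i in range(40)]
                 for j in range(5)]
    print(permutation_p_value(genotypes, is_case, n_perm=200, seed=1))
```

Here `genotypes[j][i]` is the minor-allele count of individual `i` at SNP `j`. Holding the originally selected SNPs fixed across permutations would be simpler, but, as the letter notes, the selection bias would then not be carried over to the replicates.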

Bias due to selection of rare variants using frequency in controls

To the Editor: Johansen et al.¹ report an excess of rare variants in individuals with hypertriglyceridemia (HTG) (cases) compared to controls. The definition of rare variants as those having low minor allele frequency in controls (as opposed to in all samples) biases the selection of rare variants in favor of those that have higher frequencies in cases. We used a simulation study to show that, although this bias is unlikely to alter the main conclusions of the Johansen et al.¹ study, using such an approach in studies with different population allele frequencies could lead to erroneous conclusions.

The bias can be appreciated intuitively by considering two SNPs, A and B, both having a true frequency of 1% in both cases and controls. Suppose that in the samples observed, the frequency of A is 0.9% in cases and 1.1% in controls (caused by noise), whereas the frequency of B is 1.1% in cases and 0.9% in controls. SNP A will be removed from further consideration because the frequency in the controls is >1%. SNP B will be included in the analysis and will show a greater frequency in cases as compared to controls.

Johansen et al.¹ reported 154 rare variants in 438 individuals with HTG and 53 rare variants in 327 controls. This gives a case-to-control frequency ratio of 2.17. In an attempt to create a dataset with similar allele frequencies, we simulated 180 variants with allele frequencies at regular intervals from 0.00005–0.095%, 10 variants with frequencies at regular intervals from 0.1–1% and 13 variants at regular frequencies from 2–14%. We simulated each variant using the same frequency in 438 case samples and 327 control samples. We then excluded variants that had a frequency of (i) >1% in controls and (ii) >1% in cases and controls combined. We repeated the simulations 10,000 times. After we excluded variants that had a frequency of >1% in controls, the mean number of singleton variants across 10,000 simulations was 56, similar to the 54 reported by Johansen et al.¹. The mean number of rare variants with >10 alleles was 2.03, similar to the 2 reported by Johansen et al.¹. The mean total number of rare variant alleles was 199, similar to the 207 reported by Johansen et al.¹. We excluded a mean of 14.5 variants, as they had allele frequencies >1%, similar to the 14 reported by Johansen et al.¹. We conclude that the distribution of allele frequencies of the variants simulated was similar to the distribution of allele frequencies in the Johansen et al.¹ study.

For each simulation, we calculated the case-to-control frequency ratios. When variants were excluded that had a frequency of >1% in controls, the median case-to-control frequency ratio was 1.049. When variants were excluded that had a frequency of >1% in cases and controls combined, the median case-to-control frequency ratio was 1.003. The bias introduced by setting a frequency threshold in controls only results in a significant difference in case-to-control frequency ratio (two-sided t test, P = 2.2 × 10⁻¹⁶). It is important to stress that the case-to-control frequency ratios seen in these simulations are much lower than the 2.16 reported by Johansen et al.¹. As such, it is highly unlikely that the bias introduced by using a frequency threshold based on controls only will alter the main conclusions in that study.

We were interested in better understanding the extent of bias introduced by setting a rare
