6
Extensive tissue-related and allele-related mtDNA heteroplasmy suggests positive selection for somatic mutations Mingkun Li a,b , Roland Schröder a , Shengyu Ni a , Burkhard Madea c , and Mark Stoneking a,1 a Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Anthropology, D04103 Leipzig, Germany; b Fondation Mérieux, Lyon 69002, France; and c Institut für Rechtsmedizin, Universitätsklinikum Bonn, D53111 Bonn, Germany Edited by Wen-Hsiung Li, University of Chicago, Chicago, IL, and approved January 22, 2015 (received for review October 13, 2014) Heteroplasmy in human mtDNA may play a role in cancer, other diseases, and aging, but patterns of heteroplasmy variation across different tissues have not been thoroughly investigated. Here, we analyzed complete mtDNA genome sequences at 3,500× average coverage from each of 12 tissues obtained at autopsy from each of 152 individuals. We identified 4,577 heteroplasmies (with an alter- native allele frequency of at least 0.5%) at 393 positions across the mtDNA genome. Surprisingly, different nucleotide positions (nps) exhibit high frequencies of heteroplasmy in different tissues, and, moreover, heteroplasmy is strongly dependent on the specific con- sensus allele at an np. All of these tissue-related and allele-related heteroplasmies show a significant age-related accumulation, sug- gesting positive selection for specific alleles at specific positions in specific tissues. We also find a highly significant excess of liver-spe- cific heteroplasmies involving nonsynonymous changes, most of which are predicted to have an impact on protein function. This apparent positive selection for reduced mitochondrial function in the liver may reflect selection to decrease damaging byproducts of liver mitochondrial metabolism (i.e., survival of the slowest). Overall, our results provide compelling evidence for positive selec- tion acting on some somatic mtDNA mutations. mtDNA | heteroplasmy | selection | human | tissue variation A lthough mtDNA heteroplasmy (intraindividual variability in mtDNA sequences) was initially thought to be rare in humans, studies using next-generation sequencing platforms have documented extensive heteroplasmy at low levels (<2%) (14). Heteroplasmy is thought to represent an intermediate stage in the fixation of mtDNA mutations within an individual, and heteroplasmic mtDNA mutations have been implicated in various diseases, cancer, and aging (58). Heteroplasmies occur prefer- entially at positions with a high mutation rate (2, 9), suggesting that mutation and drift are the primary forces influencing hetero- plasmy. In addition, elevated levels of nonsynonymous (NS) heteroplasmies within individuals relative to NS polymorphisms among individuals (2) suggest that there is negative selection against heteroplasmies that involve deleterious amino acid changes. In other words, deleterious amino acid changes can be observed as heteroplasmies as long as the allele frequency stays below a certain threshold and normalmitochondrial function is maintained; exceeding this threshold results in impaired mi- tochondrial function, as is commonly observed for disease- associated mtDNA mutations (10). However, fundamental aspects about the nature of hetero- plasmy remain unknown. For example, although certain nucleotide positions (nps) in the mtDNA genome are prone to heteroplasmy (2, 4, 11), it is not clear to what extent these positions simply have a high mutation rate, and thus are more prone to heteroplasmy across all tissues, vs. a role for tissue-specific processes in hetero- plasmy (i.e., certain tissues may be more prone to heteroplasmy due to their metabolic requirements, rate of cellular turnover, etc.). Tissue-specific patterns of heteroplasmy are commonly ob- served with mtDNA mutations associated with disease and are thought to reflect the differing bioenergetic requirements of dif- ferent tissues (10). Moreover, mice constructed to be heteroplasmic for different haplotypes show tissue-specific patterns of segregation over time (12, 13), suggesting positive selection related to mtDNA function in different tissues. However, these results may be specific to disease-associated mutations and to the artificial mouse con- structs. A recent study of heteroplasmy in 10 tissues from two in- dividuals found some tissue-specific patterns that suggested positive selection (11), but, currently, there is little evidence to support a significant role for positive selection as a force promoting hetero- plasmy (5, 14). Previous studies of variation in heteroplasmy across different tissues have been limited in terms of number of individuals and/or tissues studied (1, 2, 11, 1519). Here, we present the results of the most comprehensive study to date (to our knowledge) of patterns of heteroplasmy in different tissues in humans. Our results suggest a surprisingly significant role for positive selection as a force influencing some somatic heteroplasmic mutations. Results and Discussion We obtained samples from each of 12 tissues obtained at autopsy from each of 152 individuals, ranging in age at death from 3 d after birth to 96 y (mean age = 56 y; SI Appendix, Fig. S1). Details concerning the major causes of death are provided in SI Appendix, Table S1. The complete mtDNA genome was sequenced from each sample (where sample refers to a single tissue from a single Significance Heteroplasmy is the existence of different mtDNA sequences within an individual due to somatic or inherited mutations, and it has been implicated in many mtDNA-related diseases, other diseases, cancer, and aging. However, little is known about how heteroplasmy varies across different tissues from the same individual; here, we analyzed heteroplasmy variation across the entire mtDNA genome in 12 tissues obtained at autopsy from each of 152 individuals. Our results suggest that in addition to neutral processes and negative selection, positive selection has an important influence on heteroplasmy: As individuals get older, specific alleles are selected for at specific nucleotide positions in specific tissues. The functional con- sequences of these positively selected somatic mutations may play a role in human health and disease. Author contributions: M.L. and M.S. designed research; R.S. performed research; S.N. and B.M. contributed new reagents/analytic tools; M.L. analyzed data; and M.L. and M.S. wrote the paper. The authors declare no conflict of interest. This article is a PNAS Direct Submission. Data deposition: The sequence reported in this paper has been deposited in the European Nucleotide Archives Sequence Read Archive (accession no. PRJEB5480). 1 To whom correspondence should be addressed. Email: [email protected]. This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10. 1073/pnas.1419651112/-/DCSupplemental. www.pnas.org/cgi/doi/10.1073/pnas.1419651112 PNAS | February 24, 2015 | vol. 112 | no. 8 | 24912496 GENETICS Downloaded by guest on November 6, 2020

Extensive tissue-related and allele-related mtDNA heteroplasmy … · Extensive tissue-related and allele-related mtDNA heteroplasmy suggests positive selection for somatic mutations

  • Upload
    others

  • View
    2

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Extensive tissue-related and allele-related mtDNA heteroplasmy … · Extensive tissue-related and allele-related mtDNA heteroplasmy suggests positive selection for somatic mutations

Extensive tissue-related and allele-related mtDNAheteroplasmy suggests positive selection forsomatic mutationsMingkun Lia,b, Roland Schrödera, Shengyu Nia, Burkhard Madeac, and Mark Stonekinga,1

aDepartment of Evolutionary Genetics, Max Planck Institute for Evolutionary Anthropology, D04103 Leipzig, Germany; bFondation Mérieux, Lyon 69002,France; and cInstitut für Rechtsmedizin, Universitätsklinikum Bonn, D53111 Bonn, Germany

Edited by Wen-Hsiung Li, University of Chicago, Chicago, IL, and approved January 22, 2015 (received for review October 13, 2014)

Heteroplasmy in human mtDNA may play a role in cancer, otherdiseases, and aging, but patterns of heteroplasmy variation acrossdifferent tissues have not been thoroughly investigated. Here, weanalyzed complete mtDNA genome sequences at ∼3,500× averagecoverage from each of 12 tissues obtained at autopsy from each of152 individuals. We identified 4,577 heteroplasmies (with an alter-native allele frequency of at least 0.5%) at 393 positions across themtDNA genome. Surprisingly, different nucleotide positions (nps)exhibit high frequencies of heteroplasmy in different tissues, and,moreover, heteroplasmy is strongly dependent on the specific con-sensus allele at an np. All of these tissue-related and allele-relatedheteroplasmies show a significant age-related accumulation, sug-gesting positive selection for specific alleles at specific positions inspecific tissues. We also find a highly significant excess of liver-spe-cific heteroplasmies involving nonsynonymous changes, most ofwhich are predicted to have an impact on protein function. Thisapparent positive selection for reduced mitochondrial function inthe liver may reflect selection to decrease damaging byproductsof liver mitochondrial metabolism (i.e., “survival of the slowest”).Overall, our results provide compelling evidence for positive selec-tion acting on some somatic mtDNA mutations.

mtDNA | heteroplasmy | selection | human | tissue variation

Although mtDNA heteroplasmy (intraindividual variabilityin mtDNA sequences) was initially thought to be rare in

humans, studies using next-generation sequencing platformshave documented extensive heteroplasmy at low levels (<2%)(1–4). Heteroplasmy is thought to represent an intermediate stagein the fixation of mtDNA mutations within an individual, andheteroplasmic mtDNA mutations have been implicated in variousdiseases, cancer, and aging (5–8). Heteroplasmies occur prefer-entially at positions with a high mutation rate (2, 9), suggestingthat mutation and drift are the primary forces influencing hetero-plasmy. In addition, elevated levels of nonsynonymous (NS)heteroplasmies within individuals relative to NS polymorphismsamong individuals (2) suggest that there is negative selectionagainst heteroplasmies that involve deleterious amino acidchanges. In other words, deleterious amino acid changes can beobserved as heteroplasmies as long as the allele frequency staysbelow a certain threshold and “normal” mitochondrial functionis maintained; exceeding this threshold results in impaired mi-tochondrial function, as is commonly observed for disease-associated mtDNA mutations (10).However, fundamental aspects about the nature of hetero-

plasmy remain unknown. For example, although certain nucleotidepositions (nps) in the mtDNA genome are prone to heteroplasmy(2, 4, 11), it is not clear to what extent these positions simply havea high mutation rate, and thus are more prone to heteroplasmyacross all tissues, vs. a role for tissue-specific processes in hetero-plasmy (i.e., certain tissues may be more prone to heteroplasmydue to their metabolic requirements, rate of cellular turnover,etc.). Tissue-specific patterns of heteroplasmy are commonly ob-served with mtDNA mutations associated with disease and are

thought to reflect the differing bioenergetic requirements of dif-ferent tissues (10). Moreover, mice constructed to be heteroplasmicfor different haplotypes show tissue-specific patterns of segregationover time (12, 13), suggesting positive selection related to mtDNAfunction in different tissues. However, these results may be specificto disease-associated mutations and to the artificial mouse con-structs. A recent study of heteroplasmy in 10 tissues from two in-dividuals found some tissue-specific patterns that suggested positiveselection (11), but, currently, there is little evidence to supporta significant role for positive selection as a force promoting hetero-plasmy (5, 14).Previous studies of variation in heteroplasmy across different

tissues have been limited in terms of number of individualsand/or tissues studied (1, 2, 11, 15–19). Here, we present the resultsof the most comprehensive study to date (to our knowledge) ofpatterns of heteroplasmy in different tissues in humans. Our resultssuggest a surprisingly significant role for positive selection as aforce influencing some somatic heteroplasmic mutations.

Results and DiscussionWe obtained samples from each of 12 tissues obtained at autopsyfrom each of 152 individuals, ranging in age at death from 3 d afterbirth to 96 y (mean age = 56 y; SI Appendix, Fig. S1). Detailsconcerning the major causes of death are provided in SI Appendix,Table S1. The complete mtDNA genome was sequenced from eachsample (where sample refers to a single tissue from a single

Significance

Heteroplasmy is the existence of different mtDNA sequenceswithin an individual due to somatic or inherited mutations, andit has been implicated in many mtDNA-related diseases, otherdiseases, cancer, and aging. However, little is known abouthow heteroplasmy varies across different tissues from thesame individual; here, we analyzed heteroplasmy variationacross the entire mtDNA genome in 12 tissues obtained atautopsy from each of 152 individuals. Our results suggest thatin addition to neutral processes and negative selection, positiveselection has an important influence on heteroplasmy: Asindividuals get older, specific alleles are selected for at specificnucleotide positions in specific tissues. The functional con-sequences of these positively selected somatic mutations mayplay a role in human health and disease.

Author contributions: M.L. and M.S. designed research; R.S. performed research; S.N. andB.M. contributed new reagents/analytic tools; M.L. analyzed data; and M.L. and M.S.wrote the paper.

The authors declare no conflict of interest.

This article is a PNAS Direct Submission.

Data deposition: The sequence reported in this paper has been deposited in the EuropeanNucleotide Archive’s Sequence Read Archive (accession no. PRJEB5480).1To whom correspondence should be addressed. Email: [email protected].

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1419651112/-/DCSupplemental.

www.pnas.org/cgi/doi/10.1073/pnas.1419651112 PNAS | February 24, 2015 | vol. 112 | no. 8 | 2491–2496

GEN

ETICS

Dow

nloa

ded

by g

uest

on

Nov

embe

r 6,

202

0

Page 2: Extensive tissue-related and allele-related mtDNA heteroplasmy … · Extensive tissue-related and allele-related mtDNA heteroplasmy suggests positive selection for somatic mutations

individual) to an average coverage of ∼3,458× (range: 46–36,444×; SI Appendix, Fig. S2).We used stringent criteria(Materials and Methods) to identify 4,577 heteroplasmies with analternative allele frequency of at least 0.5%, distributed across393 positions in the mtDNA genome (Fig. 1A and Dataset S1).When counting across all of the tissues from each individual,there are 1,198 heteroplasmies, of which 35% are observed in asingle tissue from an individual, 22% are observed in two tissuesfrom an individual, and 43% are observed in three or more tis-sues from an individual. Although we do not have information asto which heteroplasmies are inherited and which represent so-matic mutations, these numbers provide a rough guide, becauseheteroplasmies in a single tissue are more likely to be somaticmutations, heteroplasmies in three or more tissues are morelikely to be inherited (or to have occurred early in development),and heteroplasmies in two tissues are likely to be a mixture of

inherited and somatic mutations (because of the high mutationrate for mtDNA, we expect some cases where somatic mutationshave occurred at the same position independently in differenttissues). Thus, around 43–65% of the heteroplasmies detected inthis study are likely to be inherited.

Tissue-Specific and Allele-Specific Characteristics of HeteroplasmicMutations. Heteroplasmies are not distributed at random acrossthe mtDNA genome but occur preferentially at certain positions(Fig. 1A), mostly in the control region (CR), in agreement withprevious studies (1, 2). We focus attention on the 10 positionswhere heteroplasmy is observed most frequently, because thereare sufficient numbers of heteroplasmies for these positions toinvestigate tissue-related and age-related variation; these high-frequency heteroplasmies (HFHs) account for about 50% of thetotal number of observed heteroplasmies. Unexpectedly, theseHFH positions are strongly tissue-related (Fig. 1B). For example,np 72 shows high levels of heteroplasmy in the liver and kidney,moderate levels in skeletal muscle, and low levels in all othertissues, whereas np 189 shows high levels of heteroplasmy in skeletalmuscle but hardly any heteroplasmy in any other tissue. Thus, we donot find a general pattern of particular heteroplasmies at high fre-quency regardless of the type of tissue (implying a mutation-drivenprocess), and we do not find an elevated frequency of all hetero-plasmies in a particular tissue (implying a tissue-driven process).Instead, HFHs are localized to a particular position in a particulartissue, suggesting that heteroplasmy at this position is advantageous(or permitted) in this tissue but not in others.Another unexpected finding is that HFHs are strongly allele-

dependent. We identified seven positions for which there is a sig-nificant difference in the level of heteroplasmy depending on theconsensus allele at that position (Fig. 2). For example, heteroplasmyat np 189 is not only much more frequent in skeletal muscle than inother tissues, but heteroplasmy occurs significantly more frequentlyin all tissues when the consensus allele (based on the average allelefrequency across all tissues) at this position is A than when it is G.Thus, heteroplasmies at np 189 are biased in favor of AT→GCchanges and against GC→AT changes. In addition to the sevenpositions in Fig. 2, there are likely to be other positions with allele-related heteroplasmy, but the overall levels of heteroplasmy are toolow to detect a significant effect. Note that at some nps in sometissues the level of heteroplasmy can reach more than 50%, whichmeans that there is a different consensus allele at that np in thattissue than there is at that np in the other tissues. For example, at np16,093 virtually all individuals with C as the consensus allele in othertissues have T as the consensus allele in skeletal muscle (Fig. 2).Also note that for several positions, heteroplasmy is preferentiallyobserved when the consensus allele is the less frequent allele in thepopulation (e.g., positions 185, 16,086, 16,092, 16,093, 16,129;Fig. 2). Heteroplasmy is thus not only determined by the specificnp and the specific tissue but also by the specific consensus allele.Although some of these tissue-specific and allele-specific pat-terns have been observed in previous studies (SI Appendix, TableS2), the larger scale of the present study enables more thoroughinvestigation of the nature and significance of these patterns. Asdiscussed further below, our results imply positive selection forheteroplasmies involving specific alleles in specific tissues.Heteroplasmies are not only tissue-related and allele-related

but also age-related. There is a significant correlation betweenage and the occurrence of heteroplasmy in each individual tissue(Fig. 3), in agreement with previous studies (10, 20, 21). More-over, many individual HFHs are also significantly correlatedwith age (SI Appendix, Fig. S3), either across all tissues (e.g.,np 64, np 189) or only in some tissues (e.g., np 94, np 408). Overall,the strongest correlation observed is at np 564 in myocardial muscle(r= 0.85, P< 0.0001; SI Appendix, Fig. S3H), where age explains 70%of the heteroplasmy level variation among different individuals.Moreover, some of these age-related mutations are not independent

Fig. 1. Nonrandom distribution of heteroplasmies across positions and tis-sues. (A) Distribution and occurrence of heteroplasmy at 393 positions acrossthe human mtDNA genome. The length of each bar in the outer ring indi-cates the number of heteroplasmies observed across all individuals and tis-sues, whereas the inner rings indicate heteroplasmies for each tissue, orderedas follows from the innermost to outermost ring: ovary, skin, cerebrum,cerebellum, cortex, skeletal muscle, myocardial muscle, liver, kidney, largeintestine, small intestine, and blood. (B) Heat plot of the frequency and tissuedistribution of heteroplasmy at HFH sites (the 10 sites with the highestnumber of observed heteroplasmies). Rows are HFH sites; columns are tissues;and each cell contains a line for each individual (ordered from left to right asoldest to youngest in age), with the line color indicating the frequency ofheteroplasmy according to the scale. BL, blood; CEL, cerebellum; CER, cere-brum; CO, cortex; KI, kidney; LI, large intestine; LIV, liver; MM, myocardialmuscle; OV, ovary; SI, small intestine; SK, skin; SM, skeletal muscle.

2492 | www.pnas.org/cgi/doi/10.1073/pnas.1419651112 Li et al.

Dow

nloa

ded

by g

uest

on

Nov

embe

r 6,

202

0

Page 3: Extensive tissue-related and allele-related mtDNA heteroplasmy … · Extensive tissue-related and allele-related mtDNA heteroplasmy suggests positive selection for somatic mutations

of one another; we found nine pairs of sites in which heteroplasmiesco-occurred within the same tissue more often than expected bychance (SI Appendix, Table S3). Previous studies did not findevidence for nonindependence of heteroplasmic sites (11), butthese studies involved much smaller sample sizes, and thuslacked sufficient power to detect such associations. The potentialfunctional consequences of interactions between different hetero-plasmies in the same tissue need further investigation.The mutational spectrum of heteroplasmies differs from the

mutational spectrum of polymorphism data (i.e., differences in theconsensus sequences from the same individuals; P < 0.0001, Fisherexact test; SI Appendix, Table S4). The major difference is thattransversions are observed more often in heteroplasmies thanamong consensus sequences, especially for low-level and tissue-specific heteroplasmies (TSHs) (SI Appendix, Table S4). We pre-viously did not find a difference between the mutational spectrumof heteroplasmies vs. polymorphism data (2), which probably isbecause this previous study involved fewer samples and could onlydetect heteroplasmies with a relatively high alternative allele fre-quency. The elevated frequency of transversions among low-fre-quency heteroplasmies (SI Appendix, Table S4) is probably a furtherindication of negative selection against heteroplasmies that are

likely to have functional consequences; that is, at least some of thesetransversions presumably have a detrimental effect on mtDNAfunction and so can be tolerated as low-frequency heteroplasmiesbut cannot reach “fixation” within an individual, and hence do notappear as polymorphisms among individuals.Several lines of evidence, including verification of a subset of

the results by an independent method, suggest that these un-expected findings regarding TSH are real and not experimentalartifacts (SI Appendix, Supplementary Note 1). Mutational biasesthat might reflect biochemical processes, such as oxidation ordeamination, cannot explain the results, because for the sevenpositions that exhibit significant allele-related heteroplasmy (Fig. 2),four involve preferences for TA→CG changes over CG→TAchanges, whereas three involve preferences in the opposite di-rection, for CG→TA changes over TA→CG changes. The ex-planation that seems most likely for these tissue-related andallele-related aspects of heteroplasmy is that there is positiveselection for the specific alternative allele in the specific tissue,as suggested previously (11). For example, when the consensusallele at np 72 is T, there is presumably selection in favor of a Cin the kidney, liver, and (to a lesser extent) skeletal muscle.However, when the consensus allele is already a C, there is nosuch selection for, and hence no heteroplasmy, involving a T atthis position. In some tissues, this selection must be quite strong,because the alternative allele in some individuals becomes theconsensus allele in that tissue (Fig. 2).The functional explanation for this putative tissue-related

and allele-related selection remains unknown but may perhapsbe related to the different metabolic roles fulfilled by the

Fig. 2. Allele-related heteroplasmy. Distribution of the alternative allelefrequency at seven heteroplasmic sites that exhibit significant differences inlevels of heteroplasmy, depending on the consensus allele (defined by theconsensus sequence across all tissues for that individual). For each positionand tissue, the frequency of heteroplasmy is shown for each of the twoobserved consensus alleles at that position; the frequency of each alleleamong all of the individuals is shown in the upper left corner of each plot.Blue indicates the allele that is in higher frequency among the consensussequences, and red indicates the allele in lower frequency.

Fig. 3. Correlation between the number of heteroplasmic sites and age for eachtissue. The label for each plot indicates the tissue, correlation coefficient, and Pvalue for the null hypothesis that the correlation coefficient is equal to 0.

Li et al. PNAS | February 24, 2015 | vol. 112 | no. 8 | 2493

GEN

ETICS

Dow

nloa

ded

by g

uest

on

Nov

embe

r 6,

202

0

Page 4: Extensive tissue-related and allele-related mtDNA heteroplasmy … · Extensive tissue-related and allele-related mtDNA heteroplasmy suggests positive selection for somatic mutations

mitochondria in different tissues (22). Because all of the posi-tions that exhibit this tissue/allele-related behavior are in the CR,it seems likely that regulation of mtDNA stability, replication, and/ortranscription is involved, as suggested previously for CR het-eroplasmies (11). Indeed, many of the HFH positions in the CRpotentially alter the mtDNA secondary structure (SI Appendix,Fig. S4), which suggests that there could be an effect on regu-lation or control of mtDNA replication and/or transcription.

Excess of NS Heteroplasmies in Liver Tissue. A further striking dif-ference among tissues is observed in the distribution of TSHs innoncoding vs. coding regions, and nonsynonymous (NS) vs. syn-onymous (S) heteroplasmies. Overall, there are more TSHs in theliver than in any other tissue, and the ratio of TSHs in the CR vs.coding regions varies significantly among tissues (Fig. 4A and SIAppendix, Table S5). Furthermore, there are more TSHs involvingNS changes in the liver than in any other tissue (Fig. 4A). To in-vestigate the NS changes further, we calculated the hN/hS ratio,where hN is the number of NS heteroplasmies per NS site andhS is the number of S heteroplasmies per S site (see SI Ap-pendix, Supplementary Note 2 for further discussion of thisstatistic), for both tissue-shared heteroplasmies and TSHs (Fig.4B and SI Appendix, Table S5). The hN/hS ratio is elevated withrespect to the pN/pS ratio (NS polymorphisms per NS site di-vided by S polymorphisms per S site, where the polymorphismsare identified from the consensus sequences of the 152 indi-viduals) across all tissues (Fig. 4B). The higher hN/hS ratio isin keeping with previous observations, and probably reflectsnegative selection against some NS heteroplasmies that pre-vents them from reaching “fixation” within an individual, andthereby appearing as polymorphisms (2, 4). Surprisingly, thehN/hS ratio for TSHs in the liver is 3.11 (Fig. 4B and SI Ap-pendix, Table S5); hN/hS ratios greater than 1 are indicative ofpositive selection. To assess the statistical significance of theobserved ratio in the liver, we devised a resampling test that takesinto account the mutational spectrum of heteroplasmies (detailsare provided in SI Appendix, Supplementary Note 2), and the em-pirical P value (based on 100,000 resamplings) is 0.00241. Theseresults suggest that the hN/hS ratio in the liver is significantlygreater than 1, and hence there is positive selection for NS het-eroplasmies. Calculating the hN/hS ratio per gene (SI Appendix,Table S6) shows that this ratio is highest for ND5, but is sub-stantially elevated for many mtDNA genes, indicating that theputative positive selection for amino acid changes is occurring formany, if not all, of the mtDNA genes in the liver. Moreover, thealternative allele at 82% of all NS heteroplasmies in the liver hasnever been observed as the consensus allele among humanmtDNA sequences (SI Appendix, Table S5), and 84% of the liver-specific amino acid changes are predicted to have a medium orhigh risk of having an impact on protein function (Fig. 4C). Theproportion of liver-specific amino acid changes that are medium/high-risk is slightly more than the proportion of medium/high-riskamino acid changes for other tissues (Fig. 4C), and significantlymore than would be expected if NS heteroplasmies were occurringat random with respect to risk of a functional effect (SI Appendix,Supplementary Note 2).These results therefore suggest that there is positive selection for

somatic mutations that decrease mitochondrial function in the liver.Many important metabolic processes occur in the liver that produceDNA-damaging byproducts, and mitochondrial turnover in the liveris quite rapid compared with other tissues (23). We hypothesize thatpositive selection for presumably detrimental somatic mutations inthe liver could therefore reflect selection for a decrease in livermitochondrial function as a means of avoiding excess damage toliver mitochondria. This putative positive selection for presumablydetrimental mutations is consistent with the “survival of the slowest”hypothesis (24), which proposes that there may be positive selectionfor mutations that reduce mitochondrial damage. However, further

studies are needed to investigate the impact of the excess hN/hSratio and excess number of high/medium-risk somatic mutations onmitochondrial function in the liver.

Conclusions. We have identified an unexpectedly strong tissue-related and allele-related specificity for mtDNA heteroplasmies.

Fig. 4. Liver exhibits an excess number of NS heteroplasmies. (A) Number ofTSHs for each tissue that are noncoding, nonsynonymous (nonsyn), or synon-ymous (syn). No TSHs were observed for the CO, CEL, SI, or OV. (B) hN/hS ratiofor tissue-shared heteroplasmies, TSHs, and all heteroplasmies observed in themtDNA coding region for each tissue. The dashed horizontal line indicates thepN/pS ratio (i.e., the ratio of NS polymorphisms per NS site to S polymorphismsper S site) observed among the consensus sequences of the individuals in thisstudy. The asterisk indicates that the hN/hS ratio for liver-specific hetero-plasmies is significantly greater than 1 (P = 0.00241), based on a resampling test(SI Appendix, Supplementary Note 2), indicating positive selection for NS het-eroplasmies in the liver (also SI Appendix, Table S5). (C) Predicted risk (neutral,low, medium, or high) of functional impact for NS polymorphisms observedamong the consensus sequences of the individuals in this study, compared withNS TSHs in the liver and in all other tissues. The x-axis labels are as follows: LS-H,liver-specific heteroplasmies; other TS-H, other TSHs; Poly, polymorphisms.

2494 | www.pnas.org/cgi/doi/10.1073/pnas.1419651112 Li et al.

Dow

nloa

ded

by g

uest

on

Nov

embe

r 6,

202

0

Page 5: Extensive tissue-related and allele-related mtDNA heteroplasmy … · Extensive tissue-related and allele-related mtDNA heteroplasmy suggests positive selection for somatic mutations

These heteroplasmies are also strongly age-related. In addition,we find a significantly elevated hN/hS ratio in liver tissue. Theseresults indicate that positive selection, in addition to drift andnegative selection (2, 4, 14, 25), plays a major role in influencinghuman mtDNA heteroplasmies. Some of the HFH alleles thatwe identify are also associated with tumors of specific tissues (SIAppendix, Table S2); whether this association simply reflects theincreased frequency of that allele in that tissue during aging, orwhether the HFH plays a role in causing the tumor, remains anopen question. Evaluating the functional importance and po-tential impact of this presumed positive selection for somaticmtDNA mutations on human health and disease would nowseem to be crucial.

Materials and MethodsSample Preparation. Tissue dissections (blood, cerebellum, cerebrum, cortex,kidney, large intestine, liver, myocardial muscle, ovary, skeletal muscle, skin,and small intestine) were performed during autopsy within 48–72 h of deathand frozen at −20 °C. The collection of samples and informed consent pro-cedures for this study were approved by the Ethics Commission of theRheinische Friedrich Wilhelm University Medical Faculty and by the EthicsCommission of the University of Leipzig Medical Faculty. Approximately25 mg of tissue was mechanically homogenized with a TissueRuptor (Qiagen),and DNA was extracted from the homogenized tissue (or, in the case ofblood, from 100 μL) with QIAamp DNA Mini Kits (Qiagen) according to themanufacturer’s protocol.

Sequencing and Assembling. A double-indexed Illumina sequencing library,with barcodes specific for each sample, was prepared from each extract asdescribed previously (26). Up to 250 libraries were pooled in an equimolarratio, and mtDNA sequences were enriched via in-solution capture (27). Thecapture-enriched library pools were then pooled in an equimolar ratio intoa single pool (containing 1,732 individual libraries), which was then se-quenced on eight lanes on an Illumina HiSeq2000 platform with 95 basepair, paired-end reads. Base-calling was done with IBIS (28). A total of 3.7billion reads were obtained, which were then filtered as follows: (i) readswere discarded if the index sequence quality score was lower than 10 or ifthere were more than five bases with a quality score lower than 15, and(ii) forward reads and reverse reads were merged if they completely over-lapped (i.e., length of reads greater than length of the molecule). Afterquality-filtering, reads were first mapped to the human genome (HG19)using the Burrows–Wheeler Aligner (BWA) (29) and those reads that map-ped to the nuclear genome were removed. The remaining reads were thenmapped to the Revised Cambridge Reference Sequence (rCRS) (30), with thefirst 550 base pairs copied to the end of the sequence to permit mapping toa circular genome, followed by realignment with the Genomes AnalysisToolkit (GATK) (31). Duplicate reads were then removed by SAMtools (32),along with reads with a mapping quality score lower than 20 and a basequality score lower than 20. The preliminary consensus sequence was calledusing the majority rule [with N (no call) if the alternative allele frequencywas greater than 0.3]. The rCRS was then replaced by the consensus se-quence, and the mapping and assembly were redone by means of the sameprocedure. A total of 48 samples did not have sufficient reads and werediscarded from further analysis; the average sequencing coverage for theremaining individuals was 3,458 (range: 46–36,444). The raw sequencingdata are publicly available from the European Nucleotide Archive’s SequenceRead Archive through accession no. PRJEB5480.

Heteroplasmy Detection. In this study, we focus on heteroplasmies involvingbase substitutions at a single np; heteroplasmies involving indels requiredifferent strategies for reliable detection and will be the subject of a sub-sequent study. A two-stage process was used to identify substitution hetero-plasmies. First, we required a minimum alternative allele frequency of 2% oneach strand, at least three reads in each direction with the alternative allele,a detecting low-level mutations by utilizing the resequencing error profile ofthe data (DREEP) quality score (33) of at least 10, no indels in the 3 bp of theflanking sequence in each direction, and a total sequencing coverage that is atleast 50× and that is within 20–200% of the genome average. We also ex-cluded the poly-C runs (np 302–316, np 513–526, np 566–573, and np 16,181–16,194). Potential contamination was detected using the following criteria.First, we determined if the alternative alleles at five or more heteroplasmicpositions could define an alternative haplogroup; 28 samples were identifiedby this criterion and removed from further analysis. We then determined if

more than 50% of the heteroplasmies identified in a particular sample couldbe explained by contamination from another sample in the same library, andwe also required that more than 70% of the nucleotides from the potentialcontamination donor at the donor–recipient discrepant positions were detect-able. This latter requirement is important, because many heteroplasmies willoverlap with polymorphic positions due to the high mutation rate at somepositions; for example, sample 1 might have six heteroplasmies, three ofwhich could be derived from sample 2. However, the consensus sequencesfrom sample 1 and sample 2 differ at 20 positions; thus, if the hetero-plasmies in sample 1 are derived by contamination from sample 2, we wouldexpect to detect heteroplasmy at more of the positions at which sample 1and sample 2 differ. An additional three samples were detected by thissecond criterion, for a total of 31 samples (1.7% of the data, because eachsample is a single tissue from a single individual) removed from analysisbecause of potential contamination. In a previous study (33), this pipelineproved to be highly reliable when applied to the same type of data (false-positive discovery rate is lower than 0.01).

With this procedure, we identified 2,412 primary heteroplasmies. For eachprimary heteroplasmy in a specific tissue and individual, we then examinedthe reads covering that np for every other tissue in that individual, andcounted this np as a secondary heteroplasmy in that tissue if there was analternative allele identical to the allele for the primary heteroplasmy witha frequency of at least 0.5% and present in at least three reads in each di-rection. This procedure identified 2,165 additional heteroplasmies, for a totalof 4,577 heteroplasmies at 393 positions (Dataset S1).

Confirmation by Droplet Digital PCR.We used droplet digital PCR (ddPCR) (34)to confirm independently a subset of the heteroplasmies observed via se-quencing. Briefly, the method involves setting up a conventional PCR assayin a 20-μL volume with target DNA, primers, polymerase, nucleotides, buffer,and two allele-specific probes (one for each allele at the position of interest)labeled with different fluorescent tags. The PCR is then partitioned into∼20,000 individual emulsion droplets, each containing, on average, onetemplate DNA molecule. PCR is then carried out on the individual droplets,and the fluorescence from each droplet is read. Droplets containing eitherno template molecule or more than one template molecule are discarded,and the output is a count of the number of template molecules carryingeither the consensus or the alternative allele. The alternative allele fre-quency is then determined by dividing the number of template moleculeswith the alternative allele by the total number of template molecules. Wechose eight positions for confirmation (SI Appendix, Tables S7 and S8); thesepositions included four positions from the CR that exhibited varying degreesof tissue specificity and a wide range of alternative allele frequencies andfour positions at which NS heteroplasmies were observed in liver tissue. TheddPCR reactions containing allele-specific probes were prepared and ana-lyzed on a QX200 Droplet Digital PCR System (Bio-Rad) according to pro-tocols supplied by the manufacturer. The annealing temperature was 60 °Cin all reactions, and PCR was carried out for 40 cycles.

Secondary Structure. The secondary structure of mtDNA with differentnucleotideswas estimated for sevenpositions: 72, 185, 189, 16,083, 16,092, 16,093,and 16,129. We used Mfold software (35) and np 1–500 as the input sequencefor np 72, np 185, and np 189; for the other positions, the input sequence wasnp 16,024–16,569. In each case, the context sequence was retrieved from oneof the samples showing the allele-dependent pattern. If multiple predictionswere given for one input, the one showing the smallest change was chosen.

Estimation of hN/hS. The software KaKs_Calculator (https://code.google.com/p/kaks-calculator/) was used to estimate the number of NS and S sites (basedon the human mtDNA genetic code). Heteroplasmies observed at the sameposition in different individuals were regarded as multiple events, whereaspolymorphisms at the same position in different individuals were regardedas a single event.

Prediction of the Impact of NS Mutations. The functional impact of each NSheteroplasmy/polymorphism was predicted by Mutationassessor (36), release2. Mutations were classified as high risk, medium risk, low risk, or neutralbased on the evolutionary conservation of the affected amino acid in pro-tein homologs.

Identification of New Mutations. All of the complete mtDNA sequences (9,402entries) archived on PhyloTree, version 15 (www.phylotree.org/) were down-loaded and compared with the rCRS. All nucleotide changes relative to therCRS, including all haplogroup-defining mutations, were regarded as existing

Li et al. PNAS | February 24, 2015 | vol. 112 | no. 8 | 2495

GEN

ETICS

Dow

nloa

ded

by g

uest

on

Nov

embe

r 6,

202

0

Page 6: Extensive tissue-related and allele-related mtDNA heteroplasmy … · Extensive tissue-related and allele-related mtDNA heteroplasmy suggests positive selection for somatic mutations

mutations, whereas all other mutations identified in this study but not presentin the list of existing mutations were defined as new mutations.

Construction of the Tissue Tree Based on the Sharing of Heteroplasmies. Thedistance between tissue A and tissue B, based on the frequency of the alternativeallele at shared heteroplasmies, was estimated using the following formula:

DistanceðABÞ=P

NabsðfreðAÞ− freðBÞÞ

minðfreðAÞ+ freðBÞ, ð1− freðAÞÞ+ ð1− freðBÞÞN

,

where N is the total number of heteroplasmies where either tissue A or tissueB or both present the same alternative allele, fre(A) is the frequency of the

alternative allele in tissue A, and fre(B) is the frequency of the alternativeallele in tissue B. A neighbor-joining tree was estimated from the distancematrix using programs in the software package PHYLIP (evolution.genetics.washington.edu/phylip.html). TSHs were excluded from this analysis.

ACKNOWLEDGMENTS. We thank the medical staff at the Institüt fürRechtsmedizin (University of Bonn) for assistance with the sampling; theSequencing and Bioinformatics Groups in the Department of EvolutionaryGenetics (Max Planck Institute for Evolutionary Anthropology) for producingthe sequence data; A. Butthoff, E. Macholdt, and V. Lede for technical assis-tance; M. Meyer for assistance with the ddPCR experiments; and M. Mörl forhelpful discussion. This work was funded by the Institüt für Rechtsmedizin inBonn and by the Max Planck Society.

1. He Y, et al. (2010) Heteroplasmic mitochondrial DNA mutations in normal and tumourcells. Nature 464(7288):610–614.

2. Li M, et al. (2010) Detecting heteroplasmy from high-throughput sequencing ofcomplete human mitochondrial DNA genomes. Am J Hum Genet 87(2):237–249.

3. Payne BA, et al. (2013) Universal heteroplasmy of human mitochondrial DNA. HumMol Genet 22(2):384–390.

4. Ye K, Lu J, Ma F, Keinan A, Gu Z (2014) Extensive pathogenicity of mitochondrial het-eroplasmy in healthy human individuals. Proc Natl Acad Sci USA 111(29):10654–10659.

5. Chinnery PF, Hudson G (2013) Mitochondrial genetics. Br Med Bull 106:135–159.6. Greaves LC, Reeve AK, Taylor RW, Turnbull DM (2012) Mitochondrial DNA and dis-

ease. J Pathol 226(2):274–286.7. Lombès A, Auré K, Bellanné-Chantelot C, Gilleron M, Jardel C (2014) Unsolved issues

related to human mitochondrial diseases. Biochimie 100:171–176.8. Wallace DC (2012) Mitochondria and cancer. Nat Rev Cancer 12(10):685–698.9. Stoneking M (2000) Hypervariable sites in the mtDNA control region are mutational

hotspots. Am J Hum Genet 67(4):1029–1032.10. Wallace DC, Chalkia D (2013) Mitochondrial DNA genetics and the heteroplasmy

conundrum in evolution and disease. Cold Spring Harb Perspect Med 3(10):a021220.11. Samuels DC, et al. (2013) Recurrent tissue-specific mtDNA mutations are common in

humans. PLoS Genet 9(11):e1003929.12. Jenuth JP, Peterson AC, Shoubridge EA (1997) Tissue-specific selection for different

mtDNA genotypes in heteroplasmic mice. Nat Genet 16(1):93–95.13. Burgstaller JP, et al. (2014) MtDNA segregation in heteroplasmic tissues is common in

vivo and modulated by haplotype differences and developmental stage. Cell Reports7(6):2031–2041.

14. Durham SE, Samuels DC, Chinnery PF (2006) Is selection required for the accumulationof somatic mitochondrial DNA mutations in post-mitotic cells? Neuromuscul Disord16(6):381–386.

15. Chinnery PF, et al. (1999) Nonrandom tissue distribution of mutant mtDNA. Am J MedGenet 85(5):498–501.

16. Krjutškov K, et al. (2014) Tissue-specific mitochondrial heteroplasmy at position16,093 within the same individual. Curr Genet 60(1):11–16.

17. Liu VW, Zhang C, Nagley P (1998) Mutations in mitochondrial DNA accumulate dif-ferentially in three different human tissues during ageing. Nucleic Acids Res 26(5):1268–1275.

18. Shanske S, et al. (2004) Varying loads of the mitochondrial DNA A3243G mutation indifferent tissues: Implications for diagnosis. Am J Med Genet A 130A(2):134–137.

19. Thèves C, et al. (2006) Detection and quantification of the age-related point mutationA189G in the human mitochondrial DNA. J Forensic Sci 51(4):865–873.

20. Sondheimer N, et al. (2011) Neutral mitochondrial heteroplasmy and the influence of

aging. Hum Mol Genet 20(8):1653–1659.21. Williams SL, Mash DC, Züchner S, Moraes CT (2013) Somatic mtDNA mutation spectra

in the aging human putamen. PLoS Genet 9(12):e1003990.22. Fernández-Vizarra E, Enríquez JA, Pérez-Martos A, Montoya J, Fernández-Silva P

(2011) Tissue-specific differences in mitochondrial activity and biogenesis. Mito-

chondrion 11(1):207–213.23. Miwa S, Lawless C, von Zglinicki T (2008) Mitochondrial turnover in liver is fast in vivo

and is accelerated by dietary restriction: Application of a simple dynamic model.

Aging Cell 7(6):920–923.24. de Grey AD (1997) A proposed refinement of the mitochondrial free radical theory of

aging. BioEssays 19(2):161–166.25. Stewart JB, Freyer C, Elson JL, Larsson NG (2008) Purifying selection of mtDNA and its

implications for understanding evolution and mitochondrial disease. Nat Rev Genet

9(9):657–662.26. Kircher M, Sawyer S, Meyer M (2012) Double indexing overcomes inaccuracies in

multiplex sequencing on the Illumina platform. Nucleic Acids Res 40(1):e3.27. Maricic T, Whitten M, Pääbo S (2010) Multiplexed DNA sequence capture of mito-

chondrial genomes using PCR products. PLoS ONE 5(11):e14004.28. Kircher M, Stenzel U, Kelso J (2009) Improved base calling for the Illumina Genome

Analyzer using machine learning strategies. Genome Biol 10(8):R83.29. Li H, Durbin R (2009) Fast and accurate short read alignment with Burrows-Wheeler

transform. Bioinformatics 25(14):1754–1760.30. Andrews RM, et al. (1999) Reanalysis and revision of the Cambridge reference se-

quence for human mitochondrial DNA. Nat Genet 23(2):147.31. DePristo MA, et al. (2011) A framework for variation discovery and genotyping using

next-generation DNA sequencing data. Nat Genet 43(5):491–498.32. Li H, et al.; 1000 Genome Project Data Processing Subgroup (2009) The Sequence

Alignment/Map format and SAMtools. Bioinformatics 25(16):2078–2079.33. Li M, Stoneking M (2012) A new approach for detecting low-level mutations in next-

generation sequence data. Genome Biol 13(5):R34.34. Hindson BJ, et al. (2011) High-throughput droplet digital PCR system for absolute

quantitation of DNA copy number. Anal Chem 83(22):8604–8610.35. Zuker M (2003) Mfold web server for nucleic acid folding and hybridization pre-

diction. Nucleic Acids Res 31(13):3406–3415.36. Reva B, Antipin Y, Sander C (2011) Predicting the functional impact of protein mu-

tations: Application to cancer genomics. Nucleic Acids Res 39(17):e118.

2496 | www.pnas.org/cgi/doi/10.1073/pnas.1419651112 Li et al.

Dow

nloa

ded

by g

uest

on

Nov

embe

r 6,

202

0