
Biased Substitutions in the Human Genome: Sex, Gambling and Non-Darwinian Evolution Tim Dreszer, Katie Pollard and David Haussler


Dreszer T.R., Wall G.D., Haussler D. and Pollard K.S. Biased Substitutions in the Human Genome: The Footprints of Male-Driven Biased Gene Conversion. Genome Res. 17:1420-1430. Epub: September 4, 2007. ISSN 1088-9051/07.

Fastest Evolving Regions of the Human Genome Show a Surprising “Bias”

Where this Began: [1]

In Top 4 Regions:
• AT pair replaced by GC or CG pair 33 times
• GC replaced by AT pair only once

1 Pollard K.S., Salama S.R., King B., Kern A., Dreszer T., Katzman S., Siepel A., Pedersen J., Bejerano G., Baertsch R., Rosenbloom K.R., Kent J. and Haussler D. Forces Shaping the Fastest Evolving Regions in the Human Genome. PLoS Genetics. 2(10):e168 Oct. 13, 2006.

Initial Terms

This work finds surprising biased substitution patterns.

“Bias” here specifically refers to “Weak to Strong” SNPs or substitutions.

AT pairs are held together by 2 hydrogen bonds and are here referred to as “Weak” pairs.

GC pairs are held together by 3 hydrogen bonds and are referred to as “Strong” pairs.

SNPs are single nucleotide polymorphisms while “substitutions” are single pair changes that have been “fixed” in the genome.
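The terms above can be made concrete with a small classifier. This is only a minimal sketch, not the authors' code: the function name `classify_change` and the labels `"WtoS"`, `"StoW"` and `"other"` are assumptions chosen for illustration.

```python
WEAK = {"A", "T"}    # pairs with 2 hydrogen bonds
STRONG = {"G", "C"}  # pairs with 3 hydrogen bonds


def classify_change(ancestral: str, derived: str) -> str:
    """Classify a single-base change by the strength of the pairs involved.

    Returns "WtoS" (weak to strong), "StoW" (strong to weak), or "other"
    for changes that keep the same strength (A<->T or G<->C).
    """
    a, d = ancestral.upper(), derived.upper()
    valid = WEAK | STRONG
    if a not in valid or d not in valid:
        raise ValueError(f"not a DNA base: {ancestral!r} or {derived!r}")
    if a == d:
        raise ValueError("ancestral and derived bases are identical")
    if a in WEAK and d in STRONG:
        return "WtoS"
    if a in STRONG and d in WEAK:
        return "StoW"
    return "other"


if __name__ == "__main__":
    for pair in [("A", "G"), ("C", "T"), ("A", "T")]:
        print(pair, classify_change(*pair))
```

The same function applies to SNPs and to fixed substitutions; only the source of the ancestral and derived bases differs.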

Large Scale Bias: Isochores

In warm-blooded vertebrate genomes, regions as large as 300 kilobases, dubbed “isochores”, can be strikingly higher or lower in GC content.[2]

Isochores stretch across conserved and non-conserved sequences.

The GC content of genes is correlated with that of the isochores containing them[3], and one study suggests that the genes may lead the selection.[4]

2 Bernardi G, Olofsson B, Filipski J, Zerial M, Salinas J, Cuny G, Meunier-Rotival M, Rodier F. May 24, 1985. The mosaic genome of warm-blooded vertebrates. Science. 228(4702):953-8.

3 Bernardi G, Olofsson B, Filipski J, Zerial M, Salinas J, Cuny G, Meunier-Rotival M, Rodier F. May 24, 1985. The mosaic genome of warm-blooded vertebrates. Science. 228(4702):953-8.

4 Press W.H. and Robins H. Oct. 2006. Isochores Exhibit Evidence of Genes Interacting With the Large-Scale Genomic Environment. Genetics, 174:1029-1040.

Three Possible Causes of Current Human Bias and of Isochores [5]

Mutation Bias: variation in mutation rates in different regions of the genome.[6]

Natural Selection (fitness selection) for GC alleles may have driven isochore formation[7], and may be behind local GC content as well.

Biased Gene Conversion (BGC) may result in a pressure that pushes GC pairs to fixation at recombination hot spots.[8]

5 Eyre-Walker A. and Hurst L.D. July 2001. The evolution of isochores. Nat Rev Genet. 2(7):549-55.

6 Sueoka N. April 1988. Directional mutation pressure and neutral molecular evolution. Proc. Natl. Acad. Sci. 85(8):2653-7.

7 Bernardi G, Bernardi G. 1986. Compositional constraints and genome evolution. J. Mol Evol. 24(1-2):1-11.

8 Eyre-Walker A. June 22 1993. Recombination and mammalian genome evolution. Proc Biol Sci. 252(1335):237-43.

Biased Gene Conversion: Mismatched SNP Repair in Heteroduplex During Recombination [9]

[Figure: heteroduplex DNA strands containing two mismatched base pairs]

Both mismatches are converted to “strong” G-C pairs, replacing “weak” SNPs.

9 Brown, T. C., and J. Jiricny. 1988. Different base/base mispairs are corrected with different efficiencies and specificities in monkey kidney cells. Cell 54:705–711.

Distinguishing between the Three Models

Mutation bias should result in similar patterns of bias between SNPs and substitutions.[10]

Natural Selection may result in a correlation between biased substitutions and conservation.

BGC may result in a correlation between biased substitutions and current recombination hot spots or rates.[11]

BGC should be most easily recognized in clusters of closely spaced substitutions. Clusters are not, however, inconsistent with fitness selection.

10 Lercher M.J., Urrutia A.O., Pavlícek A. and Hurst L.D. 2003. A unification of mosaic structures in the human genome. Hum. Mol. Genetics, 12(19):2411-2415.

11 Kong, A., Gudbjartsson, D.F., Sainz, J., Jonsdottir, G.M., Gudjonsson, S.A., Richardsson, B., Sigurdardottir, S., Barnard, J., Hallbeck, B., Masson, G., Shlien, A., Palsson, S.T., Frigge, M.L., Thorgeirsson, T.E., Gulcher, J.R., and Stefansson, K. 2002. A high-resolution recombination map of the human genome, Nature Genetics, 31(3):241-247.

Mutation Bias?

SNPs in Humans (hg17):            3,424,895
  Weak to Strong SNPs:            1,368,922 (39.97%)
  Strong to Weak SNPs:            1,506,024 (43.97%)

Substitutions in Humans (hg17):  10,871,681
  Weak to Strong:                 4,685,494 (43.10%)
  Strong to Weak:                 4,650,554 (42.78%)
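The percentages in this table follow directly from the counts. The short check below is a sketch for the reader: the counts are copied from the slide, while the constant and function names are illustrative assumptions.

```python
# Counts copied from the slide (human assembly hg17).
SNPS_TOTAL = 3_424_895
SNPS_WTOS = 1_368_922
SNPS_STOW = 1_506_024

SUBS_TOTAL = 10_871_681
SUBS_WTOS = 4_685_494
SUBS_STOW = 4_650_554


def percent(part: int, total: int) -> float:
    """Share of `part` in `total`, as a percentage rounded to 2 decimals."""
    return round(100.0 * part / total, 2)


def wtos_to_stow_ratio(wtos: int, stow: int) -> float:
    """Ratio of weak-to-strong changes to strong-to-weak changes."""
    return wtos / stow


if __name__ == "__main__":
    print("SNP WtoS %:", percent(SNPS_WTOS, SNPS_TOTAL))
    print("SNP StoW %:", percent(SNPS_STOW, SNPS_TOTAL))
    print("Sub WtoS %:", percent(SUBS_WTOS, SUBS_TOTAL))
    print("Sub StoW %:", percent(SUBS_STOW, SUBS_TOTAL))
    print("SNP WtoS/StoW:", wtos_to_stow_ratio(SNPS_WTOS, SNPS_STOW))
    print("Sub WtoS/StoW:", wtos_to_stow_ratio(SUBS_WTOS, SUBS_STOW))
```

Among SNPs the weak-to-strong count is below the strong-to-weak count, while among fixed substitutions it is slightly above it. The two listed classes do not sum to 100%; the remainder is presumably made up of changes that do not alter pair strength.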

Bias by Window: G+C Content

Windows: Bias by Conservation?

Windows: Bias At Hotspots?

Bias Near Telomeres?

Bias by Substitution Density

Bias as a Social Disease: Zippers

Bias as a Social Disease: Heat

Biased Gang Members are Recruited From Unbiased Individuals

[Figure panels: SNPs; Substitutions]

Mutation Bias is Rejected.

Assuming the rates of mutation and fixation have not changed over the last 6 my, biased mutations are fixed in the genome in greater proportion than they occur.

When examining clusters of biased differences, the evidence is strikingly against the biased mutation hypothesis.

Clusters are predicted by the BGC hypothesis but are not contrary to a natural selection model.

The Story is in the Data, but How to Look At It?

Individual Weak to Strong Substitutions don’t show the story well: too much noise.

Comparing windows of fixed size results in comparing apples to oranges: a cluster of 3 with 2 WtoS compared to a cluster of 7 with 7 WtoS.

What is Needed: A measure of the degree to which a single substitution is biased and clustered.

With such a measure, mapping where this phenomenon occurs might be revealing.

Geography of a Chromosome

UBCS or UC BS?

• Clusters are 5 or more substitutions within 300 bp.
• Biased Clusters are at least 80% Weak to Strong substitutions.
• Biased Clustered Substitutions (BCS) are substitutions within biased clusters.
• Expected BCS is computed from the binomial probability of BCS within each 1 Mbp bin.

Actual BCS – Expected BCS = Unexpected BCS

UBCS is Unexpected Biased Clustered Substitutions
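The definitions above can be sketched in code. This is one possible reading, not the authors' implementation: the sliding-window rule, the inclusive 300 bp span, the per-cluster binomial tail, and every name (`find_bcs`, `prob_biased`, `unexpected_bcs`) are assumptions made for illustration.

```python
from math import comb


def find_bcs(subs, min_size=5, max_span=300, min_pct=80):
    """Return the indices of substitutions that fall in a biased cluster.

    `subs` is a list of (position, is_weak_to_strong) tuples sorted by
    position. A window is a cluster when it holds at least `min_size`
    substitutions spanning at most `max_span` bp (assumed inclusive), and
    it is biased when at least `min_pct` percent are weak to strong.
    """
    bcs = set()
    end = 0
    for start in range(len(subs)):
        end = max(end, start)
        # Extend the window while the next substitution is close enough.
        while end + 1 < len(subs) and subs[end + 1][0] - subs[start][0] <= max_span:
            end += 1
        size = end - start + 1
        if size < min_size:
            continue
        wtos = sum(1 for _, is_wtos in subs[start:end + 1] if is_wtos)
        if 100 * wtos >= min_pct * size:  # integer test avoids float rounding
            bcs.update(range(start, end + 1))
    return bcs


def prob_biased(n, p_wtos, min_pct=80):
    """Binomial probability that a cluster of n substitutions is biased,
    given a background weak-to-strong probability p_wtos (an assumed input)."""
    k_min = -(-min_pct * n // 100)  # ceiling of min_pct * n / 100
    return sum(comb(n, k) * p_wtos**k * (1 - p_wtos)**(n - k)
               for k in range(k_min, n + 1))


def unexpected_bcs(actual, expected):
    """UBCS for one bin: actual BCS minus expected BCS."""
    return actual - expected


if __name__ == "__main__":
    # Hypothetical positions, for illustration only.
    subs = [(100, True), (150, True), (200, True), (250, True),
            (300, False), (5000, True), (9000, False)]
    print(sorted(find_bcs(subs)))
    print(prob_biased(5, 0.5))
```

How the expected count per 1 Mbp bin is assembled from such probabilities is not specified on the slide, so the sketch stops at the per-cluster probability and the final subtraction.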

Geography of a Chromosome Take 2

UBCS is Predictable

UBCS Rises Near the Telomeres of All Human Autosomes

Chromosome 5 is Typical

Chromosome Y is an Exception

No signal on chromosome Y is what would be expected if Biased Gene Conversion is the driving force of UBCS.

Recombination doesn’t occur on chromosome Y.

Fitness Selection isn’t ruled out, but it doesn’t predict a missing signal on Y.

Chromosome X is an Enigma

Is it Just that Sex is Mysterious?

The BGC model doesn’t predict a missing signal on X: Chromosome X recombines so there should be a signal.

Natural Selection doesn’t predict the missing signal either. However, there may be fitness based reasons for selecting GC SNPs near the telomeres of autosomes that do not apply to the sex chromosomes.

Is there a Clue in the Pseudo-Autosomal Regions?

Fun with Correlations

Conservative Bias?

Recruitment?

Recombination Hot Spots

Recombination Rate

Males are the Troublemakers!

Rank Troublemakers

History vs. Geography

BGC Model is Accepted

While fitness selection cannot be ruled out, there is no process known that would explain a fitness advantage of increasing the GC content of mega-base regions by selecting localized clusters of GC SNPs.

Fitness selection cannot explain the correlation with recombination rates or the lack of correlation with conserved regions.

Biased Gene Conversion explains all the observed data:
• Clusters of biased substitutions within 300 bases of each other.
• Selection of Biased Substitutions from Unbiased SNPs.
• Correlation of UBCS with recombination rate.
• Lack of signal on the Y chromosome.
• Lack of signal on the X chromosome.
• Even the correlation of UBCS with GC content makes sense.

Why is this Striking?

These datasets are mutually exclusive! They represent a pattern in substitutions occurring since humans and chimps diverged approximately 6mya.

UBCS Rises Near the Telomeres of All Chimp Autosomes

UBCS Signal is Remarkably Similar Between Human and Chimp Genomes

Stable BCS Accumulation is Revealing

A moderate correlation with current recombination hotspots but a strong correlation with male recombination rates agrees with models of hot spots moving[12,13] while regional recombination rates remain steady.[14]

The similarity of human and chimp UBCS profiles attests to a stable force across 12 my of genetic divergence.

The highly localized bias suggests an explanation for the origin of isochores. The telomeres of autosomes may be Duret’s GC factories[15], allowing the build up of isochores over millions of years.

The borders between high and low GC regions seen today may represent the historical record of chromosomal rearrangements.

12 Pineda-Krch M. and Redfield R.J. April 2005. Persistence and Loss of Meiotic Recombination Hotspots. Genetics, 169:2319-2333.

13 Winckler W., Myers S.R., Richter D.J., Onofrio R.C., McDonald G.J., Bontrop R.E., McVean G.A.T., Gabriel S.B., Reich D., Donnelly P., Altshuler D. April 1, 2005. Comparison of Fine-Scale Recombination Rates in Humans and Chimpanzees. Science 308(5718):107-111.

14 Myers S, Bottolo L, Freeman C, McVean G, Donnelly P. Oct. 14 2005. A fine-scale map of recombination rates and hotspots across the human genome. Science. 310(5746):321-4.

15 Duret L., Eyre-Walker A. and Galtier N. Aug. 2006. A new perspective on isochore evolution. Gene 385:71–74.

Chromosome 2 is an Exception

Current Recombination Rate is not High in Zone of Fusion

Two Autosomes on a Date

Assuming:
1. The internal peak was created while the regions were still telomeric.
2. Since fusion the region is no longer accumulating UBCS.
3. The force creating UBCS has been constant over the past 6 my.
Then the fusion might be dated.

Using the ability to predict the UBCS signal at human telomeres from the UBCS signal at chimpanzee telomeres, the predicted height minus the actual height of the chr2 peak may be proportional to the time that has elapsed since the fusion.

Estimated Fusion Date is 740,000 years ago with a CI95 of no more than 2.71 mya.
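The proportionality argument can be written out under the three assumptions listed on this slide. The sketch below is an assumed linear reading, not the authors' actual calculation: if UBCS accumulates at a constant rate over the 6 my since divergence and stops at fusion, the missing share of the predicted peak height equals the share of that period elapsed since fusion. The function name and the example heights are hypothetical.

```python
def elapsed_since_fusion(predicted_height: float,
                         actual_height: float,
                         divergence_years: float = 6_000_000) -> float:
    """Years since fusion under an assumed constant accumulation rate.

    predicted_height: UBCS peak expected had the region stayed telomeric.
    actual_height:    UBCS peak actually observed at the fusion site.
    """
    if predicted_height <= 0:
        raise ValueError("predicted_height must be positive")
    deficit = predicted_height - actual_height
    # Constant rate: deficit / predicted = elapsed / divergence time.
    return divergence_years * deficit / predicted_height


if __name__ == "__main__":
    # Hypothetical heights, for illustration only (not values from the study).
    print(elapsed_since_fusion(100.0, 90.0))
```

The estimate and confidence bound reported on the slide come from the authors' own analysis; this sketch only shows the shape of the reasoning.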

Chromosome 2 Fusion Dating

Non-Darwinian Selection

BGC acts as a selection pressure[16], separate from fitness. It selects GC SNPs over AT SNPs with enough pressure that some of them are fixed into the genome.

While the individual SNPs may have already been tested as not too harmful, a newly selected cluster may be a novel allele never before seen.

If a single point mutation is far more likely to be harmful than beneficial, what about a cluster of them?

BGC selection can be expected to accelerate positive selection.

BGC selection can also be expected to compete with and slow negative selection.

16 Nagylaki T. Oct. 1983. Evolution of a finite population under gene conversion. Proc. Natl. Acad. Sci. USA. 80(20):6278–6281.

Non-Darwinian Evolution Take 1

Despite a lack of correlation between UBCS and transcription density genome-wide, the most extremely biased regions of the genome contain a disproportionate number of genes.

Non-Darwinian Evolution has Sculpted Humans

Of the 10 top-scoring regions of biased clustered substitutions, 4 are involved in brain development or function! Eight of 10 are transcribed, while the other 2 are predicted genes, transcribed in mammals.

Non-Darwinian Evolution Take 2

Although there is no genome-wide SNP bias, some extremely biased clusters of SNPs do exist.

Two of the top five regions occur in genes associated with cancer in humans (e.g., SERINC1[17] and CSMD1[18]).

The region with the most biased SNPs (8 within 148 bp) falls in the intron of a gene required for pain perception.[19]

These data suggest that the force leading to the fixation of clusters of biased changes is still active and may represent cases where BGC is competing with purifying selection.

17 Zhang M., Yu L., Wu Q., Zheng L.H., Wei Y.H., Wan B., Zhao S.Y. July 2003. Identification and characterization of TDE2, a plasma-membrane protein with 11 transmembrane helices, and its variable expression in human lung cancer and liver cancer tissues.

18 Scholnick S.B., Richter T.M. 2003. The role of CSMD1 in head and neck carcinogenesis. Genes Chromosomes Cancer 38(3):281-283.

19 Kim E, Cho KO, Rothschild A, Sheng M. July 1996. Heteromultimerization and NMDA receptor-clustering activity of Chapsyn-110, a member of the PSD-95 family of proteins. Neuron. 17(1):103-13.

Association of BGC with Male but not Female Meiosis is Provocative

It is well known that mutation rates are higher in males than females[20, 21], which has been dubbed “male driven evolution”.[22]

While the higher mutation rate is attributed to the greater number of cell divisions in male germ cells, this does not explain the full increase.[23, 24, 25]

It is not obvious why or how the dangers of BGC would be tolerated in males while avoided in females.

20 Goetting-Minesky MP, Makova KD. Sep. 4, 2006. Mammalian Male Mutation Bias: Impacts of Generation Time and Regional Variation in Substitution Rates. J Mol Evol. [Epub ahead of print].

21 Crow JF. 1993. How much do we know about spontaneous human mutation rates? Environ. Mol. Mutagen. 21(4):389.

22 Li WH, Yi S, Makova K. Dec. 2002. Male-driven evolution. Curr. Opin. Genet. Dev. 12(6):650-6.

23 Lercher M.J., Williams E.J., Hurst L.D. Nov. 2001. Local similarity in evolutionary rates extends over whole chromosomes in human-rodent and mouse-rat comparisons: implications for understanding the mechanistic basis of the male mutation bias. Mol Biol Evol. 18(11):2032-9.

24 Filatov D.A., Charlesworth D. June 2002. Substitution rates in the X- and Y-linked genes of the plants, Silene latifolia and S. dioica. Mol. Biol. Evol. 19(6):898-907.

25 Gaffney D.J., Keightley P.D. Aug. 2005. The scale of mutational variation in the murid genome. Genome Res. 15(8):1086-94. Epub 2005 Jul 15.

“Wanna Get Lucky?” A Male Reproductive Strategy?

The female reproductive strategy treats every single gamete as precious, since each has a high probability of becoming a child. Females guarantee one good copy of the genome.

Male gametes have an extremely low probability of success: there are millions per mating.

This allows a male strategy of rolling the dice in the form of mutations and BGC.

While most genetic changes are benign and some are harmful, one in a million will be beneficial.

“Sperm Selection”[26] or intra-mating sperm competition allows for testing the gametes in order to select that one in a million. Thus each mating is an evolutionary experiment!

Warning: the sperm selection strategy requires that new mutations are transcribed and tested.

26 Holt W.V., Van Look K.J. May 2004. Concepts in sperm heterogeneity, sperm selection and sperm competition as biological foundations for laboratory tests of semen quality. Reproduction. 127(5):527-35.

Transcription Associated Recombination

It has been proposed that transcription-associated recombination (TAR)[27, 28, 29, 30] might be driving BGC[31].

If this were true, then the demands of gamete generation would make BGC hard to avoid in males, while the “maternal effect”[32] would explain how BGC is avoided in females.

Finally, sperm selection might mitigate some of the dangers of new biased cluster alleles.

27 Aguilera A. 2002. The connection between transcription and genomic instability. EMBO J. 21:195–201.

28 Prado F., Piruat J.I. and Aguilera A. 1997. Recombination between DNA repeats in yeast hpr1Delta cells is linked to transcription elongation. The EMBO Journal 16:2826–2835.

29 Bell S.J., Chow Y.C., Ho J.Y. and Forsdyke D.R. 1998. Correlation of chi orientation with transcription indicates a fundamental relationship between recombination and transcription. Gene, 216:285–292.

30 Nickoloff J.A. and Reynolds R.J. 1990. Transcription stimulates homologous recombination in mammalian cells. Mol. Cell. Biol. 10:4837–4845.

31 Vinogradov A.E. Sept. 1, 2003. Isochores and tissue-specificity. Nucleic Acids Res. 31(17):5212–5220.

32 Dobzhansky T. July 1935. Maternal Effect as a Cause of the Difference between the Reciprocal Crosses in Drosophila Pseudoöbscura. Proc Natl Acad Sci U S A. 21(7):443-6.

Loose Ends

The TAR model explains the finding that the most biased regions are disproportionately transcribed.

If sub-telomeric regions are GC factories building isochores, then the TAR model explains why:
• Genes appear to lead the accumulation of GC in GC rich isochores.[33]
• Widely expressed housekeeping genes are more often found in GC rich isochores than are tissue specific genes.[34]

33 Press W.H. and Robins H. Oct. 2006. Isochores Exhibit Evidence of Genes Interacting With the Large-Scale Genomic Environment. Genetics, 174:1029-1040.

34 Vinogradov A.E. Sept. 1, 2003. Isochores and tissue-specificity. Nucleic Acids Res. 31(17):5212–5220.

Sex, Gambling and Non-Darwinian Evolution

BCS are the Footprints of Male Driven BGC.

BGC is a Non-Darwinian selective pressure that leads to faster evolution but also can slow purifying selection.

Female reproductive strategy goes to extra effort to ensure the genome is protected.

Male reproductive strategy makes each gamete a roll of the dice, while each mating is an evolutionary experiment.

BGC is a stable force that has sculpted human and chimpanzee evolution.

To Do:
• Other Species? How widespread? Primates, Placental Mammals? Marsupials? Birds?
• Neanderthals and Chr2?
• Duret’s challenge?
• What is going on in PAR?
• Wet lab work to tie down TAR -> BGC.

To See: http://www.soe.ucsc.edu/research/compbio/ubcs/

Thanks

• David Haussler for taking me into his lab and putting me to work on something so interesting.
• Katie Pollard for patiently explaining to me all the things I should already have known, and putting up with my limitations.
• The Genome Browser and all contributors.
• Clint, some lab macaque and Watson and Crick.
• Greg Wall for statistical work.
• Daryl Thomas for generating some of the datasets.
• Laurent Duret for excellent feedback.
• Bill Press for taking an interest.
• My wife for supporting me throughout, my son for being my sounding board and my daughter for preventing me from finishing before the best discoveries were made.