Comparative analysis of primate genomes: where has natural selection gone ?
Laurent Duret, Nicolas Galtier, Peter Arndt
ACI-IMPBIO, 4-5 October 2007
What’s in our genome ?
• 3.1 × 10^9 bp
• Repeated sequences: ~50%
• 20,000-25,000 protein-coding genes
• Protein-coding regions: 1.2%
• Other functional elements in non-coding regions: 4-10%
How to identify functional elements ?
What makes chimps different from us ?
• What are the functional elements responsible for adaptive evolution ?
30 × 10^6 point substitutions + indels + duplications (copy number variations)
Genome annotation by comparative genomics
• Basic principle:
– Functional element <=> constrained by natural selection
– Detecting the hallmarks of selection in genomic sequences
• Negative selection (conservation)
• Positive selection (adaptation)
Evolution : mutation, selection, drift
[Diagram: a premutation (base modification, replication error, deletion, insertion, ...) that escapes DNA repair becomes a mutation in an individual. In the soma it is not transmitted to the offspring. In the germline it is transmitted to the offspring (polymorphism) and, in the population (N), ends in either loss of the allele or fixation (substitution).]
Evolution : mutation, selection, drift
Probability of fixation:
p = f(s, Ne)
s : relative impact on fitness
– s = 0 : neutral mutation (random genetic drift)
– s < 0 : disadvantageous mutation = negative (purifying) selection
– s > 0 : advantageous mutation = positive (directional) selection
Ne : effective population size: stochastic effects of gamete sampling are stronger in small populations
|Nes| < 1 : effectively neutral mutation
Demonstrate the action of selection = reject the predictions of the neutral model
[Diagram: base modification, replication error, deletion, insertion, etc. -> mutation (individual) -> polymorphism (population, Ne) -> fixation = substitution]
Substitution rate =
f(mutation rate, fixation probability)
|Nes| < 1 : substitution rate = mutation rate
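The link between substitution rate, mutation rate and fixation probability can be sketched numerically. The snippet below is a minimal illustration, assuming a diploid population with Ne = N and Kimura's diffusion formula for the fixation probability of a new mutation (the same expression the deck uses for its BGC model). The function names and the example values of s and N are hypothetical, chosen only to show the three regimes.

```python
import math

def fixation_probability(s: float, n: int) -> float:
    """Fixation probability of a new mutation (initial frequency 1/(2N)).

    Assumes Kimura's diffusion approximation with Ne = N:
    p = (1 - exp(-2s)) / (1 - exp(-4Ns)); the neutral limit is 1/(2N).
    """
    if s == 0.0:
        return 1.0 / (2 * n)
    # expm1 keeps precision when s is very small
    return math.expm1(-2.0 * s) / math.expm1(-4.0 * n * s)

def relative_substitution_rate(s: float, n: int) -> float:
    """K/u: 2N*u new mutations arise per generation, each fixes with probability p."""
    return 2 * n * fixation_probability(s, n)

N = 10_000  # illustrative population size
for s in (-1e-4, -1e-5, 0.0, 1e-5, 1e-4):
    ratio = relative_substitution_rate(s, N)
    print(f"s = {s:+.0e}   N*s = {N * s:+.2f}   K/u = {ratio:.3f}")
```

With these assumptions K/u is below 1 for s < 0, exactly 1 for s = 0, and above 1 for s > 0, and it stays close to 1 whenever |Ns| is much smaller than 1.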
Tracking natural selection ...
• Mutation rate: u
• Substitution rate: K
• Negative selection => K < u
• Neutral evolution => K = u
• Positive selection => K > u
How to estimate u ? => Use of neutral markers
Tracking natural selection ...
• Synonymous substitution rate: Ks
• Non-synonymous substitution rate: Ka
• Hypothesis: synonymous sites evolve (nearly) neutrally: Ks ~ u
• Negative selection => Ka < Ks
• Neutral evolution => Ka = Ks
• Positive selection => Ka > Ks
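The Ka/Ks decision rule can be written as a small helper. This is only a sketch: the `tolerance` band around Ka/Ks = 1 is a hypothetical stand-in for a proper statistical test, and the function name is not from the deck.

```python
def classify_selection(ka: float, ks: float, tolerance: float = 0.1) -> str:
    """Classify a gene from its Ka/Ks ratio.

    `tolerance` is an arbitrary illustrative band around 1; a real analysis
    would test whether Ka/Ks differs significantly from 1.
    """
    if ks <= 0:
        raise ValueError("Ks must be positive to form the Ka/Ks ratio")
    omega = ka / ks
    if omega < 1 - tolerance:
        return "negative selection"
    if omega > 1 + tolerance:
        return "positive selection"
    return "compatible with neutral evolution"

print(classify_selection(ka=0.01, ks=0.20))
```

The rule is only as good as the hypothesis that synonymous sites evolve neutrally, which is the assumption the following slides call into question.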
Tracking natural selection ... is not so easy
• Patterns of neutral substitution vary along chromosomes
– Impact of molecular processes (replication, DNA-repair, transcription, recombination, …)
– Genomic environment (susceptibility to mutagens)
Mammalian genomic landscapes
• Large scale variations of base composition along chromosomes (isochores)
[Figure: GC% (30-60%) along 1000 kb of chromosome 19 and chromosome 21; sliding windows: 20 kb, step = 2 kb]
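A sliding-window GC profile of the kind plotted here (20 kb windows, 2 kb step) can be computed as follows. This is a minimal sketch, not the authors' pipeline: the function name is hypothetical, and skipping windows without any A/C/G/T is an assumption about how ambiguous bases would be handled.

```python
def sliding_gc(sequence: str, window: int = 20_000, step: int = 2_000):
    """Return (window_start, GC fraction) pairs along a sequence.

    Only A, C, G and T are counted; windows without any of them are skipped.
    """
    seq = sequence.upper()
    profile = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        gc = chunk.count("G") + chunk.count("C")
        at = chunk.count("A") + chunk.count("T")
        if gc + at > 0:
            profile.append((start, gc / (gc + at)))
    return profile

# Toy sequence (not real data): 20 bp of AT followed by 20 bp of GC
toy = "AT" * 10 + "GC" * 10
print(sliding_gc(toy, window=20, step=10))  # [(0, 0.0), (10, 0.5), (20, 1.0)]
```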
GC content variations affect both coding and non-coding regions
[Figure: G+C% at the 3rd codon position vs. G+C content of the DNA fragment; N = 3661 genes, R2 = 0.43]
3661 human genes from 1652 large genomic sequences (> 50 kb; average = 134 kb). Total = 221 Mb (98% non-coding)
What is the evolutionary process responsible for these large-scale variations in base composition ?
Variation in mutation patterns ?
• Analysis of polymorphism data: in GC-rich regions, AT->GC mutations have a higher probability of fixation than GC->AT mutations (Eyre-Walker 1999; Duret et al. 2002; Spencer et al. 2006)
Selection ?
• What could be the selective advantage conferred by a single AT->GC mutation in a Mb-long genomic region ?
Biased Gene Conversion ?
Biased Gene Conversion (BGC)
If DNA mismatch repair is biased (i.e. probability of repair is not 50% in favor of each base) => BGC
[Diagram: molecular events of meiotic recombination (non-crossover and crossover). A T/G mismatch in heteroduplex DNA is resolved by DNA mismatch repair into either T-A (G->A) or C-G (T->C).]
BGC: a neutral process that looks like selection
• The dynamics of the fixation process for one locus under BGC is identical to that under directional selection (Nagylaki 1983)
• BGC intensity depends on:
– Recombination rate
– Bias in the repair of DNA mismatches
– Effective population size
• GC-alleles have a higher probability of fixation than AT-alleles (Eyre-Walker 1999, Duret et al. 2002, Lercher et al. 2002, Spencer et al. 2006)
• This fixation bias in favor of GC-alleles increases with recombination rate (Spencer 2006)
Does BGC affect substitution patterns ?
• BGC should affect the relative rates of AT->GC vs GC->AT substitutions in regions of high recombination
• Relationship between neutral substitution patterns and recombination rate ?
Substitution patterns in the hominidae lineage
• Human, chimp, macaque whole genome alignments:
– Genomicro: database of whole genome alignments
– 2700 Mb (introns and intergenic regions)
• Substitutions inferred by a maximum likelihood approach (collaboration with Peter Arndt, Berlin)
• Substitution rates:
– 4 transversion rates: A->T; C->G; A->C; C->A
– 2 transition rates: A->G; G->A
– transitions at CpG sites: G->A
• Cross-over rate: HAPMAP
GC-content expected at equilibrium (GC*)
• Equilibrium GC-content : the GC content that sequences would reach if the pattern of substitution remains constant over time = the future of GC-content
• Ratio of AT->GC over GC->AT substitution rates (taking into account CpG hypermutability)
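The idea of an equilibrium GC-content can be illustrated with a two-state sketch. It assumes a single AT->GC rate and a single GC->AT rate per site and ignores CpG hypermutability, so it is simpler than the estimate described on the slide. Under that assumption the equilibrium is GC* = r(AT->GC) / (r(AT->GC) + r(GC->AT)). Function names and rate values are hypothetical.

```python
def equilibrium_gc(rate_at_to_gc: float, rate_gc_to_at: float) -> float:
    """GC* under a two-state model: the GC fraction at which gains equal losses."""
    total = rate_at_to_gc + rate_gc_to_at
    if total <= 0:
        raise ValueError("at least one substitution rate must be positive")
    return rate_at_to_gc / total

def project_gc(gc_now: float, rate_at_to_gc: float, rate_gc_to_at: float, steps: int) -> float:
    """Iterate the GC fraction forward in time with constant per-step rates."""
    gc = gc_now
    for _ in range(steps):
        gc += rate_at_to_gc * (1.0 - gc) - rate_gc_to_at * gc
    return gc

# Illustrative rates only: GC->AT twice as frequent as AT->GC
print(equilibrium_gc(0.001, 0.002))
print(project_gc(0.60, 0.001, 0.002, steps=10_000))
```

Whatever the present GC-content, the projected value converges to GC* as long as the substitution pattern stays constant, which is why GC* describes the future of GC-content rather than its current state.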
GC-content expected at equilibrium and recombination
[Figure: equilibrium GC-content (GC*) vs. crossover rate (cM/Mb); R2 = 36%, p < 0.0001; N = 2707 non-overlapping windows (1 Mb), from autosomes]
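The R2 values quoted for these window-based plots are coefficients of determination. A minimal sketch of the calculation for one predictor is shown below, assuming a simple linear regression of GC* on crossover rate across windows. The function name is hypothetical and the numbers are toy values, not the data behind the figure.

```python
def r_squared(x, y):
    """Squared Pearson correlation, i.e. R2 of a simple linear regression of y on x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must both vary")
    return (sxy * sxy) / (sxx * syy)

# Toy windows (illustrative values only)
crossover_rate = [0.2, 0.9, 1.5, 2.4, 3.8]   # cM/Mb
gc_star = [0.34, 0.37, 0.36, 0.42, 0.47]
print(f"R2 = {r_squared(crossover_rate, gc_star):.2f}")
```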
GC-content and Recombination
• Strong correlation: suggests a direct causal relationship
• GC-rich sequences promote recombination ?
– Gerton et al. (2000), Petes & Merker (2002), Spencer et al. (2006)
• Recombination promotes AT->GC substitutions ?
GC-content and recombination
[Figure: present GC-content vs. crossover rate (cM/Mb); N = 2707, R2 = 14%, p < 0.001]
Recombination and GC-content
• Recombination events: crossover + non-crossover
• Genetic maps: crossover only
[Diagram: molecular events of meiotic recombination (non-crossover vs. crossover)]
=> The correlation between GC* and crossover rate might underestimate the real correlation between GC* and recombination
Evolution of GC-content: distance to telomeres
[Figure: equilibrium GC-content (GC*) vs. distance to telomere (Mb, log scale); N = 2707, R2 = 41%, p < 0.0001]
GC* vs. crossover rate + distance to telomeres: R2 = 53%
BGC: a realistic model ?
• Recombination occurs predominantly in hotspots that cover only 3% of the genome (Myers et al. 2005)
• Recombination hotspots evolve rapidly (their location is not conserved between human and chimp) (Ptak et al. 2005, Winkler et al. 2005)
Can BGC affect the evolution of Mb-long isochores ?
BGC: a realistic model ?
• Probability of fixation of an AT-allele: $q = \frac{1 - e^{2s}}{1 - e^{4Ns}}$
• Probability of fixation of a GC-allele: $p = \frac{1 - e^{-2s}}{1 - e^{-4Ns}}$
• Effective population size N ~ 10,000
• s : BGC coefficient
– Recombination hotspots: s = 1.3 × 10^-4 (Spencer et al. 2006)
– No BGC outside hotspots: s = 0
• Hotspot density: 3% (on average), with variations along chromosomes (0.05% to 10.7%)
• Pattern of mutation: constant across chromosomes
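The parameters on this slide can be combined into a rough numerical sketch of the model. Several choices below are assumptions rather than the authors' implementation: the positive-s formula is applied to GC-alleles (since the deck states that BGC favors GC), fixation probabilities are averaged over hotspot and non-hotspot sites in proportion to hotspot density, and the 2:1 mutation bias toward AT is a purely illustrative value. All function names are hypothetical.

```python
import math

N = 10_000          # effective population size (from the slide)
S_HOTSPOT = 1.3e-4  # BGC coefficient inside hotspots (from the slide)

def fixation_probability(s: float, n: int = N) -> float:
    """(1 - exp(-2s)) / (1 - exp(-4Ns)); equals 1/(2N) when s = 0."""
    if s == 0.0:
        return 1.0 / (2 * n)
    return math.expm1(-2.0 * s) / math.expm1(-4.0 * n * s)

def predicted_gc_star(hotspot_density: float,
                      at_to_gc_mutation: float = 1.0,
                      gc_to_at_mutation: float = 2.0) -> float:
    """Equilibrium GC-content for a region with a given fraction of hotspot sites.

    Assumption: a site is in a hotspot with probability `hotspot_density`,
    and the mutation rates (arbitrary units) are the same everywhere.
    """
    neutral = fixation_probability(0.0)
    p_gc = hotspot_density * fixation_probability(S_HOTSPOT) + (1 - hotspot_density) * neutral
    q_at = hotspot_density * fixation_probability(-S_HOTSPOT) + (1 - hotspot_density) * neutral
    gain = at_to_gc_mutation * p_gc   # AT->GC substitution rate
    loss = gc_to_at_mutation * q_at   # GC->AT substitution rate
    return gain / (gain + loss)

for density in (0.0005, 0.03, 0.107):
    print(f"hotspot density {density:.2%}: GC* = {predicted_gc_star(density):.3f}")
```

Under these assumptions GC* rises with hotspot density even though the mutation pattern is held constant, which is the qualitative behavior the model is meant to test against the observed GC* values.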
BGC: a realistic model ?
[Figure: equilibrium GC-content (GC*) vs. crossover rate (cM/Mb): observations compared with predictions of the BGC model]
Summary (1)
• Recombination:
– Strong impact on patterns of substitutions
– Drives the evolution of GC-content
• Most probably a consequence of BGC
– Mutation hypothesis: at odds with the fixation bias favoring GC alleles
– Selection hypothesis: at odds with the correlation with recombination rate
– BGC: all observations fit the predictions of the model
BGC can affect functional regions
• Fxy gene : translocated in the pseudoautosomal region (PAR) of the X chromosome in Mus musculus
                      X-specific      PAR
Recombination rate    normal          extreme
GC synonymous sites   normal (55%)    very high (90%)
Amino-acid substitutions in Fxy
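[Figure: phylogeny of Homo, Rattus, M. spretus and M. musculus (time scale: 0-80 Myrs), showing for each lineage whether Fxy is X-specific or in the PAR, and the number of amino-acid substitutions per branch in the 5' part and the 3' part of Fxy]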
28 non-synonymous substitutions, all AT->GC
NB: strong negative selection (Ka/Ks < 0.1)
Amino-acid substitutions in Fxy
BGC can drive the fixation of deleterious mutations
BGC: a neutral process that looks like selection
• BGC can confound selection tests
HARs: human-accelerated regions
• Pollard et al. (Nature, Plos Genet. 2006) : searching for positive selection in non-coding regulatory elements
• Identify regulatory elements that have significantly accelerated in the human lineage = HARs
Positive selection in the human lineage ?
• 49 significant HARs
• HAR1: 120 bp
– Rate of evolution >> neutral rate (18 fixed substitutions in the human lineage, vs. 0.7 expected)
– Part of a non-coding RNA gene
– Expressed in the brain
– Involved in the evolution of human-specific brain features ?
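To get a feel for how extreme 18 substitutions are when 0.7 are expected, one can compute a Poisson upper tail. This is a back-of-the-envelope sketch that assumes substitutions accumulate as a Poisson process at the neutral rate. It is not the statistical test used to define HARs, and the function name is hypothetical.

```python
import math

def poisson_upper_tail(k_min: int, mean: float, terms: int = 200) -> float:
    """P(X >= k_min) for X ~ Poisson(mean), with mean > 0.

    The tail is summed term by term rather than as 1 - CDF, which would
    lose all precision for very small probabilities.
    """
    # log P(X = k_min), computed in log space to avoid overflow in k!
    log_term = -mean + k_min * math.log(mean) - math.lgamma(k_min + 1)
    term = math.exp(log_term)
    total = 0.0
    for k in range(k_min, k_min + terms):
        total += term
        term *= mean / (k + 1)  # P(X = k+1) = P(X = k) * mean / (k+1)
    return total

print(f"P(X >= 18 | mean = 0.7) = {poisson_upper_tail(18, 0.7):.2e}")
```

Under this simple model the probability is vanishingly small, so the acceleration itself is not in doubt. The question raised on the following slides is whether positive selection is the only process that can produce it.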
Positive selection ?
• GC-biased substitution pattern in HARs
– HAR1: the 18 substitutions are all AT->GC changes
– Known functional elements (coding or non-coding) are not GC-rich
• HAR1-5: no evidence of selective sweep (Pollard et al. 2006)
• HAR1: the accelerated region covers >1 kb, i.e. is not restricted to the functional element
Positive selection or BGC ?
• HARs are located in regions of high recombination
• Recombination occurs in hotspots (<2 kb)
• Given known parameters (population size, fixation bias), the BGC model predicts substitution hotspots within recombination hotspots
HARs = substitution hotspots caused by BGC in recombination hotspots
Conclusion (1)
GC-rich isochores = result of BGC in highly recombining parts of the genome
Recombination drives the evolution of GC-content in mammals
Probably a universal process: correlation GC / recombination in many taxa (yeast, Drosophila, nematode, paramecia, …)
Conclusion (2)
Recombination hotspots = the Achilles’ heel of our genome
BGC => substitution hotspots in recombination hotspots
Conclusion (3)
Probability of fixation depends on:
- selection
- drift (population size)
- BGC
Extending the null hypothesis of neutral evolution: mutation + BGC
Galtier & Duret (2007) Trends Genet
Thanks
• Vincent Lombard (Génomicro)
• Nicolas Galtier (Montpellier)
• Peter Arndt (Berlin)
• Katherine Pollard (UC Davis)
Sex-specific effects
• Correlation GC* / crossover rate (deCODE genetic map):
– male: R2 = 31%
– female: R2 = 15%
• The crossover rate is a poor predictor of the total recombination rate in females: more variability in the ratio non-crossover / crossover along chromosomes ?
Chromosome size, recombination and GC-content
[Figure: human and chicken panels plotting crossover rate (cM/Mb) against chromosome length (Mb), and GC* or current GC against crossover rate (cM/Mb); R2 values shown on the panels: 0.84, 0.66, 0.82, 0.81]
Recombination and GC-content: a universal relationship ?
G+C content vs. chromosome length: yeast
R2= 61%
Bradnam et al. (1999) Mol Biol Evol
G+C content vs. chromosome length: Paramecium
[Figure: GC-content vs. chromosome size (kb); R2 = 67%]
Evolution of GC-content
• Equilibrium GC-content correlates with ...
– Cross-over rate (HAPMAP): R2 = 36%
– Distance to telomere: R2 = 41%
– Cross-over rate + distance to telomeres: R2 = 53%
• Recombination pattern: ratio non-crossover / crossover higher near telomeres ?
Frequency distribution of GC and AT alleles
[Figure: proportion of SNPs (0-0.6) in each allele-frequency class (<5%, 5%-15%, 15%-50%, >50%) for GC->AT and AT->GC mutations: distribution expected in absence of fixation bias]
NB: the shape of the distribution may vary according to population history, but should be identical for GC and AT alleles
Frequency distribution of AT and GC alleles at silent sites
• 410 SNPs with allele frequency data (Cargill et al. 1999)
• Chimpanzee as an outgroup to orientate mutations
• GC alleles segregate at significantly higher frequencies than AT alleles in GC-median and GC-rich genes
[Figure: proportion of SNPs per allele-frequency class (<5%, 5%-15%, 15%-50%, >50%) in GC-poor, GC-median and GC-rich genes, for GC->AT and AT->GC mutations]
Duret et al. 2002
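The comparison of frequency spectra can be sketched as a binning step: each SNP is oriented with an outgroup, labeled AT->GC or GC->AT, and assigned to one of the four frequency classes used in the plots. The code below is an illustrative sketch with hypothetical function names and toy input; how the class boundaries at exactly 5%, 15% and 50% are assigned is an assumption.

```python
from collections import Counter

LABELS = ["<5%", "5%-15%", "15%-50%", ">50%"]

def frequency_class(freq: float) -> str:
    """Map a derived-allele frequency (0-1) to one of the four plotted classes."""
    if freq < 0.05:
        return "<5%"
    if freq < 0.15:
        return "5%-15%"
    if freq <= 0.50:
        return "15%-50%"
    return ">50%"

def class_proportions(snps):
    """snps: iterable of (mutation_type, derived_frequency) pairs.

    Returns, for each mutation type, the proportion of SNPs in each class.
    """
    counts = {}
    for mutation, freq in snps:
        counts.setdefault(mutation, Counter())[frequency_class(freq)] += 1
    return {
        mutation: {label: c[label] / sum(c.values()) for label in LABELS}
        for mutation, c in counts.items()
    }

# Toy SNPs (not real data)
toy_snps = [("AT->GC", 0.02), ("AT->GC", 0.60), ("GC->AT", 0.02), ("GC->AT", 0.10)]
print(class_proportions(toy_snps))
```

In the absence of a fixation bias the two spectra should have the same shape, so a shift of AT->GC mutations toward the high-frequency classes is the signal of interest.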
Frequency distribution of GC and AT alleles
• Spencer (2006): analysis of HAPMAP data (SNPs from 60 unrelated individuals)
• The fixation bias in favor of GC increases near recombination hotspots
[Figure: average derived allele frequency for AT->GC, GC->AT, GC->GC and AT->AT alleles (Spencer 2006)]