Comparative analysis of primate genomes: where has natural selection gone ?

Laurent Duret, Nicolas Galtier, Peter Arndt

ACI-IMPBIO, 4-5 October 2007

What’s in our genome ?

• 3.1 × 10^9 bp

• Repeated sequences: ~50%

• 20,000-25,000 protein-coding genes

• Protein-coding regions: 1.2%

• Other functional elements in non-coding regions: 4-10%

How to identify functional elements ?

What makes chimps different from us ?

• What are the functional elements responsible for adaptive evolution ?

30 × 10^6 point substitutions + indels + duplications (copy number variations)

Genome annotation by comparative genomics

• Basic principle:

– Functional element <=> constrained by natural selection

– Detecting the hallmarks of selection in genomic sequences

• Negative selection (conservation)

• Positive selection (adaptation)

Evolution : mutation, selection, drift

[Diagram: a premutation (base modification, replication error, deletion, insertion, ...) is either corrected by DNA repair or becomes a mutation in an individual. In the soma it is not transmitted to the offspring; in the germline it is transmitted to the offspring (polymorphism) and, in the population (N), ends either in loss of the allele or in fixation (substitution).]

Evolution : mutation, selection, drift

Probability of fixation:

p = f(s, Ne)

s : relative impact on fitness

– s = 0 : neutral mutation (random genetic drift)

– s < 0 : disadvantageous mutation = negative (purifying) selection

– s > 0 : advantageous mutation = positive (directional) selection

Ne : effective population size: stochastic effects of gamete sampling are stronger in small populations

|Ne·s| < 1 : effectively neutral mutation

Demonstrate the action of selection = reject the predictions of the neutral model

[Diagram: premutation (base modification, replication error, deletion, insertion, etc.) -> mutation in an individual -> polymorphism in the population (Ne) -> fixation (substitution).]

Substitution rate = f(mutation rate, fixation probability)

|Ne·s| < 1 : substitution rate = mutation rate
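The link between fixation probability and substitution rate can be made concrete with a small numerical sketch. It assumes the standard diffusion approximation for the fixation probability of a new mutation in a diploid population (the same form that appears later in the BGC model slides) and takes Ne = N; the function names are illustrative and not part of the talk.

```python
import math

def fixation_probability(s, n):
    """Fixation probability of a new mutation (initial frequency 1/(2n)).

    Assumed form: p = (1 - exp(-2s)) / (1 - exp(-4ns)), with Ne = N = n.
    """
    if s == 0:
        return 1.0 / (2 * n)  # neutral limit: pure drift
    # expm1(x) = exp(x) - 1; the two minus signs cancel in the ratio
    return math.expm1(-2 * s) / math.expm1(-4 * n * s)

def relative_substitution_rate(s, n):
    """K / u: 2n new mutations arise per unit of u, each fixing with probability p."""
    return 2 * n * fixation_probability(s, n)

n = 10_000  # illustrative population size
for s in (-1e-3, -1e-4, -1e-6, 0.0, 1e-6, 1e-4, 1e-3):
    print(f"N*s = {n * s:+.2f}   K/u = {relative_substitution_rate(s, n):.4f}")
```

Under these assumptions, K/u stays close to 1 while |N·s| is well below 1, drops below 1 for s < 0 and rises above 1 for s > 0.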

Tracking natural selection ...

• Mutation rate: u

• Substitution rate: K

• Negative selection => K < u

• Neutral evolution => K = u

• Positive selection => K > u

How to estimate u ? => Use of neutral markers

Tracking natural selection ...

• Synonymous substitution rate: Ks

• Non-synonymous substitution rate: Ka

• Hypothesis: synonymous sites evolve (nearly) neutrally => Ks ~ u

• Negative selection => Ka < Ks

• Neutral evolution => Ka = Ks

• Positive selection => Ka > Ks

Tracking natural selection ... is not so easy

• Patterns of neutral substitution vary along chromosomes

– Impact of molecular processes (replication, DNA-repair, transcription, recombination, …)

– Genomic environment (susceptibility to mutagens)

Mammalian genomic landscapes

• Large scale variations of base composition along chromosomes (isochores)

[Figure: GC% (30-60%) along 1000 kb of chromosome 19 and chromosome 21; sliding windows of 20 kb, step = 2 kb; scale bar = 100 kb.]
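A sliding-window GC profile of the kind plotted on this slide (20 kb windows, 2 kb step) can be sketched as follows. The function name and the handling of non-ACGT characters are assumptions made for the illustration, not the authors' pipeline.

```python
def gc_sliding_windows(sequence, window=20_000, step=2_000):
    """Return (window start, GC%) pairs along a DNA sequence.

    Characters other than A, C, G, T (e.g. N) are ignored in the percentage.
    """
    seq = sequence.upper()
    profile = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        gc = chunk.count("G") + chunk.count("C")
        acgt = gc + chunk.count("A") + chunk.count("T")
        percent = 100.0 * gc / acgt if acgt else float("nan")
        profile.append((start, percent))
    return profile

# Toy example with tiny windows so the result is easy to check by hand
print(gc_sliding_windows("AAAAGGGGCCCCTTTT", window=8, step=4))
```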

GC content variations affect both coding and non-coding regions

[Figure: G+C% at third codon positions (10-90%) vs. G+C content of the DNA fragment (30-60%); N = 3661 genes, R2 = 0.43.]

3661 human genes from 1652 large genomic sequences (> 50 kb; average = 134 kb). Total = 221 Mb (98% non-coding)

What is the evolutionary process responsible for these large-scale variations in base composition ?

Variation in mutation patterns ?

• Analysis of polymorphism data: in GC-rich regions, AT->GC mutations have a higher probability of fixation than GC->AT mutations (Eyre-Walker 1999; Duret et al. 2002; Spencer et al. 2006)

Selection ?

• What could be the selective advantage conferred by a single AT->GC mutation in a Mb-long genomic region ???

Biased Gene Conversion ?

Biased Gene Conversion (BGC)

If DNA mismatch repair is biased (i.e. probability of repair is not 50% in favor of each base) => BGC

[Diagram: molecular events of meiotic recombination (non-crossover and crossover). In heteroduplex DNA, a T/G mismatch is resolved by DNA mismatch repair into either T/A (G->A) or C/G (T->C).]

BGC: a neutral process that looks like selection

• The dynamics of the fixation process for one locus under BGC is identical to that under directional selection (Nagylaki 1983)

• BGC intensity depends on:

– Recombination rate

– Bias in the repair of DNA mismatches

– Effective population size

• GC-alleles have a higher probability of fixation than AT-alleles (Eyre-Walker 1999, Duret et al. 2002, Lercher et al. 2002, Spencer et al. 2006)

• This fixation bias in favor of GC-alleles increases with recombination rate (Spencer 2006)

Does BGC affect substitution patterns ?

• BGC should affect the relative rates of AT->GC vs GC->AT substitutions in regions of high recombination

• Relationship between neutral substitution patterns and recombination rate ?

Substitution patterns in the hominidae lineage

• Human, chimp, macaque whole genome alignments:

– Genomicro: database of whole genome alignments

– 2700 Mb (introns and intergenic regions)

• Substitutions inferred by a maximum likelihood approach (collaboration with Peter Arndt, Berlin)

• Substitution rates:

– 4 transversion rates: A->T; C->G; A->C; C->A

– 2 transition rates: A->G; G->A

– transitions at CpG sites: G->A

• Cross-over rate: HAPMAP

GC-content expected at equilibrium (GC*)

• Equilibrium GC-content : the GC content that sequences would reach if the pattern of substitution remains constant over time = the future of GC-content

• Ratio of AT->GC over GC->AT substitution rates (taking into account CpG hypermutability)
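The idea of GC* as "the future of GC-content" can be illustrated with a deliberately simplified two-state model. It assumes a single AT->GC rate per AT site and a single GC->AT rate per GC site, and ignores the CpG correction mentioned above; the function names and the numerical rates are illustrative only.

```python
def equilibrium_gc(rate_at_to_gc, rate_gc_to_at):
    """GC* in a two-state model: the GC fraction at which gains equal losses."""
    return rate_at_to_gc / (rate_at_to_gc + rate_gc_to_at)

def project_gc(gc_now, rate_at_to_gc, rate_gc_to_at, steps):
    """Iterate the GC fraction forward under a constant substitution pattern."""
    gc = gc_now
    for _ in range(steps):
        gain = (1.0 - gc) * rate_at_to_gc  # AT sites becoming GC
        loss = gc * rate_gc_to_at          # GC sites becoming AT
        gc += gain - loss
    return gc

# Hypothetical rates: GC->AT substitutions twice as frequent as AT->GC
print(equilibrium_gc(0.001, 0.002))
print(project_gc(0.50, 0.001, 0.002, steps=10_000))
```

In this sketch, the present GC-content only sets the starting point; the value reached after many steps depends on the two rates alone.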

GC-content expected at equilibrium and recombination

[Figure: equilibrium GC-content (GC*, 30-60%) vs. cross-over rate (0-9 cM/Mb); N = 2707 non-overlapping windows (1 Mb), from autosomes; R2 = 36%, p < 0.0001.]

GC-content and Recombination

• Strong correlation: suggests direct causal relationship

• GC-rich sequences promote recombination ? – Gerton et al. (2000), Petes & Merker (2002), Spencer et al. (2006)

• Recombination promotes AT->GC substitutions ?

GC-content and recombination

[Figure: present GC-content (40-70%) vs. cross-over rate (0-9 cM/Mb); N = 2707, R2 = 14%, p < 0.001.]

Recombination and GC-content

• Recombination events: crossover + non-crossover

• Genetic maps: crossover

[Diagram: molecular events of meiotic recombination (non-crossover and crossover).]

=> The correlation between GC* and crossover rate might underestimate the real correlation between GC* and recombination

Evolution of GC-content: distance to telomeres

[Figure: equilibrium GC-content (GC*, 0.30-0.60) vs. distance to telomere (0.1-100 Mb); N = 2707, R2 = 41%, p < 0.0001.]

GC* vs. crossover rate + distance to telomeres: R2 = 53%

BGC: a realistic model ?

• Recombination occurs predominantly in hotspots that cover only 3% of the genome (Myers et al 2005)

• Recombination hotspots evolve rapidly (their location is not conserved between human and chimp) (Ptak et al. 2005, Winkler et al. 2005)

Can BGC affect the evolution of Mb-long isochores ?

BGC: a realistic model ?

• Probability of fixation of an AT-allele

• Probability of fixation of a GC-allele

• Effective population size N ~ 10,000

• s : BGC coefficient

– Recombination hotspots: s = 1.3 × 10^-4 (Spencer et al. 2006)

– No BGC outside hotspots: s = 0

• Hotspot density: 3% (on average), with variations along chromosomes (0.05% to 10.7%)

• Pattern of mutation: constant across chromosomes

$p = \frac{1 - e^{-2s}}{1 - e^{-4Ns}}$

$q = \frac{1 - e^{2s}}{1 - e^{4Ns}}$
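How these ingredients combine into a predicted GC* can be sketched numerically. The sketch makes three assumptions that are not stated on the slide: p is read as the fixation probability of the allele favored by BGC (GC, for s > 0) and q as that of the disfavored allele (AT); substitution rates are averaged over hotspot and non-hotspot sites in proportion to hotspot density; and `mutation_bias` (GC->AT over AT->GC mutation rate) is a hypothetical value chosen only for illustration.

```python
import math

N = 10_000           # effective population size (from the slide)
S_HOTSPOT = 1.3e-4   # BGC coefficient in hotspots (Spencer et al. 2006, per the slide)

def fixation_probability(s, n=N):
    """(1 - exp(-2s)) / (1 - exp(-4ns)); equals 1/(2n) when s = 0."""
    if s == 0:
        return 1.0 / (2 * n)
    return math.expm1(-2 * s) / math.expm1(-4 * n * s)

def predicted_gc_star(hotspot_density, mutation_bias=2.0, s=S_HOTSPOT, n=N):
    """Predicted equilibrium GC-content for a region with a given hotspot density.

    mutation_bias is a HYPOTHETICAL ratio of GC->AT to AT->GC mutation rates.
    """
    neutral = fixation_probability(0.0, n)
    p = fixation_probability(s, n)    # new GC allele inside a hotspot (favored)
    q = fixation_probability(-s, n)   # new AT allele inside a hotspot (disfavored)
    fix_gc = hotspot_density * p + (1.0 - hotspot_density) * neutral
    fix_at = hotspot_density * q + (1.0 - hotspot_density) * neutral
    k_at_to_gc = 1.0 * fix_gc             # AT->GC substitution rate (relative units)
    k_gc_to_at = mutation_bias * fix_at   # GC->AT substitution rate (relative units)
    return k_at_to_gc / (k_at_to_gc + k_gc_to_at)

for density in (0.0005, 0.03, 0.107):  # range of hotspot densities quoted on the slide
    print(f"hotspot density {density:.2%}: GC* = {predicted_gc_star(density):.3f}")
```

With a constant mutation pattern, the only quantity that varies between regions in this sketch is hotspot density, so any variation in predicted GC* comes from BGC alone.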

BGC: a realistic model ?

[Figure: equilibrium GC-content (GC*) vs. crossover rate (cM/Mb), two panels: observations and predictions of the BGC model.]

Summary (1)

• Recombination:

– Strong impact on patterns of substitutions

– Drives the evolution of GC-content

• Most probably a consequence of BGC:

– Mutation: does not explain the fixation bias favoring GC alleles

– Selection: does not explain the correlation with recombination rate

– BGC: all observations fit the predictions of the model

BGC can affect functional regions

• Fxy gene : translocated in the pseudoautosomal region (PAR) of the X chromosome in Mus musculus

                          X-specific      PAR
Recombination rate        normal          extreme
GC at synonymous sites    normal (55%)    very high (90%)

Amino-acid substitutions in Fxy

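[Figure: phylogeny of Homo, Rattus, M. spretus and M. musculus on a time scale of 0-80 Myrs, showing the X, Y and PAR regions and the number of amino-acid substitutions per branch for the 5' part and the 3' part of Fxy. Most branches carry 0 to 5 substitutions; one branch carries 28.]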

28 non-synonymous substitutions, all AT->GC

NB: strong negative selection (Ka/Ks < 0.1)

Amino-acid substitutions in Fxy

BGC can drive the fixation of deleterious mutations

BGC: a neutral process that looks like selection

• BGC can confound selection tests

HARs: human-accelerated regions

• Pollard et al. (Nature, PLoS Genet. 2006) : searching for positive selection in non-coding regulatory elements

• Identify regulatory elements that have significantly accelerated in the human lineage = HARs

Positive selection in the human lineage ?

• 49 significant HARs

• HAR1: 120 bp

– Rate of evolution >> neutral rate (18 fixed substitutions in the human lineage, vs. 0.7 expected)

– Part of a non-coding RNA gene

– Expressed in the brain

– Involved in the evolution of human-specific brain features ?
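To give a sense of how extreme "18 observed vs. 0.7 expected" is, the following sketch computes the probability of seeing at least 18 substitutions if counts followed a Poisson distribution with mean 0.7. The Poisson model is an assumption made here for illustration; it is not the test used to detect HARs. The upper tail is summed directly, because computing 1 minus the lower tail would lose all precision for a probability this small.

```python
import math

def poisson_upper_tail(k_min, expected, extra_terms=200):
    """P(X >= k_min) for X ~ Poisson(expected), summed term by term."""
    # First term: P(X = k_min)
    term = math.exp(-expected) * expected**k_min / math.factorial(k_min)
    total = 0.0
    for k in range(k_min, k_min + extra_terms):
        total += term
        term *= expected / (k + 1)  # P(X = k+1) from P(X = k)
    return total

print(f"P(X >= 18 | mean 0.7) = {poisson_upper_tail(18, 0.7):.3e}")
```

Such a small probability rules out chance under a uniform neutral rate, but says nothing about whether the acceleration is due to selection or to another process that raises the local substitution rate.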

Positive selection ?

• GC-biased substitution pattern in HARs

– HAR1: the 18 substitutions are all AT->GC changes

– Known functional elements (coding or non-coding) are not GC-rich

• HAR1-5: no evidence of selective sweep (Pollard et al. 2006)

• HAR1: the accelerated region covers >1 kb, i.e. is not restricted to the functional element

Positive selection or BGC ?

• HARs are located in regions of high recombination

• Recombination occurs in hotspots (<2 kb)

• Given known parameters (population size, fixation bias), the BGC model predicts substitution hotspots within recombination hotspots

HARs = substitution hotspots caused by BGC in recombination hotspots

Conclusion (1)

GC-rich isochores = result of BGC in highly recombining parts of the genome

Recombination drives the evolution of GC-content in mammals

Probably a universal process: correlation GC / recombination in many taxa (yeast, Drosophila, nematode, paramecia, …)

Conclusion (2)

Recombination hotspots = the Achilles’ heel of our genome

BGC => substitution hotspots in recombination hotspots

Conclusion (3)

Probability of fixation depends on:

- selection

- drift (population size)

- BGC

Extending the null hypothesis of neutral evolution: mutation + BGC

Galtier & Duret (2007) Trends Genet

Thanks

• Vincent Lombard (Génomicro)

• Nicolas Galtier (Montpellier)

• Peter Arndt (Berlin)

• Katherine Pollard (UC Davis)

Sex-specific effects

• Correlation GC* / crossover rate (deCODE genetic map):

– male: R2 = 31%

– female: R2 = 15%

• The rate of cross-over is a poor predictor of the total recombination rate in females: more variability in the ratio non-crossover / crossover along chromosomes ?

Chromosome size, recombination and GC-content

[Figure: four panels, for human and chicken: crossover rate (cM/Mb) vs. chromosome length (Mb), and GC* (human) or current GC (chicken) vs. crossover rate (cM/Mb). R2 values shown: 0.84, 0.66, 0.82, 0.81.]

Recombination and GC-content: a universal relationship ?

G+C content vs. chromosome length: yeast

R2= 61%

Bradnam et al. (1999) Mol Biol Evol

G+C content vs. chromosome length: Paramecium

[Figure: GC-content vs. chromosome size (kb); R2 = 67%]

Evolution of GC-content

• Equilibrium GC-content correlates with ...

– Cross-over rate (HAPMAP): R2 = 36%

– Distance to telomere: R2 = 41%

– Cross-over rate + distance to telomeres: R2 = 53%

• Recombination pattern: ratio non-crossover / crossover higher near telomeres ?

Frequency distribution of GC and AT alleles

[Figure: proportion of SNPs (0-0.6) per allele-frequency class (<5%, 5%-15%, 15%-50%, >50%) for GC->AT and AT->GC alleles: distribution expected in the absence of fixation bias.]

NB: the shape of the distribution may vary according to population history, but should be identical for GC and AT alleles

Frequency distribution of AT and GC alleles at silent sites

• 410 SNPs with allele frequency (Cargill et al 1999)

• Chimpanzee as an outgroup to orientate mutations

• GC alleles segregate at significantly higher frequencies than AT alleles in GC-median and GC-rich genes

[Figure: proportion of SNPs per allele-frequency class (<5%, 5%-15%, 15%-50%, >50%) in GC-poor, GC-median and GC-rich genes, for GC->AT and AT->GC alleles.]

Duret et al. 2002

Frequency distribution of GC and AT alleles

• Spencer (2006): analysis of HAPMAP data (SNPs from 60 unrelated individuals)

• The fixation bias in favor of GC increases near recombination hotspots

Frequency distribution of GC and AT alleles

[Figure from Spencer (2006): average derived frequency for AT->GC, GC->AT, GC->GC and AT->AT alleles.]
