Molecular evolution Part I: The evolution of macromolecules


Molecular evolution Part I: The evolution of macromolecules.
Part II: The reconstruction of the evolutionary history of genes and organisms.

Molecular evolution encompasses two areas of study:
The evolution of macromolecules: the rates and patterns of change in the genetic material (DNA sequences) and in its encoded products (proteins).
The evolutionary history of genes and organisms.

Biol336-12 Molecular Evolution

[Figure: AGAMOUS; transcription factor [Arabidopsis thaliana], www.ncbi.nlm.nih.gov]

What information can DNA sequences give us?
Evaluating the role of drift/demography vs. selection on trait divergence.
Identifying function.
Looking at genes whose evolutionary history was shared.

[Figure: 3D protein structure of human proinsulin (Munte et al., FEBS J)]
1952: Frederick Sanger and coworkers determine the complete amino acid sequence of insulin.
MALWMRLLPLLALLALWGPDPAAAFVNQHLCG

This field has its roots in two separate disciplines: population genetics and molecular biology. Population genetics provides the theoretical background and molecular biology provides the empirical data. The first complete sequence of a protein (insulin) was determined in 1952 by F. Sanger and colleagues.
How and why have molecular sequences evolved to be the way they are?

Learning Objectives:
Variability within a population
Substitution rates
Neutral theory
Detecting selection at the DNA level

REVIEW: NUCLEOTIDE SUBSTITUTIONS
Identifier    Lys  Ala  Leu  Val  Leu  Leu
AgAt145       AAG  GCA  CTG  GTC  CTG  TTG
AgAt134       AAA  GCA  CTG  GTC  CTC  TTG
SepAt145      AGG  GCA  CTG  GTC  CTG  GTG
SepAt134      AGG  GCA  CTG  GTC  CTG  GTG
CalAt145      AAG  ---  CTG  TTC  CTG  TTG
CalAt134      AAG  ---  CTG  TTC  CTG  TTG

Here are some sequences taken from two Arabidopsis thaliana individuals, 145 and 134. These are MADS-box genes, which are important for flower development. On the left is the identifier: name of gene, species, individual. AgAt145 is the reference individual, and we will compare its sequence to the others.

Between AgAt145 and AgAt134 there are two changes, both in the third codon position. Both are synonymous changes because the amino acid stays the same. The first, G->A, is a transition (a change from one purine to another). The second, G->C, is a transversion (a change from a purine to a pyrimidine).

Comparing AgAt145 to SepAt134 and SepAt145, we see A->G (a transition) and T->G (a transversion). This time lysine changes to arginine in the first case and leucine changes to valine in the second. Both are nonsynonymous changes because they change the amino acid.

Comparing AgAt145 to CalAt145 and CalAt134, we see the deletion of a codon, which creates a gap. There is also a G->T change (a transversion) that results in a nonsynonymous change: valine changes to phenylalanine.

What happens after a mutation arises in the DNA sequence at a locus?
Polymorphism: the mutant allele is one of several present in the population.
Substitution: the mutant allele fixes in the population. (New mutations at other nucleotides may occur later.)

Mutations arise as the result of DNA replication errors or errors in DNA repair that change individual nucleotides. Sometimes the changes are larger, creating deletions and insertions.
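The codon comparisons in the nucleotide substitution review can be sketched in a few lines of Python. This is only an illustration: the function name `classify_change` and the partial codon table are assumptions introduced here, covering just the codons in the MADS-box example, and the function assumes the two codons differ at exactly one nucleotide.

```python
# Partial standard genetic code: only the codons used in the MADS-box example.
# A full codon table would be needed for arbitrary sequences.
CODON_TO_AA = {
    "AAG": "Lys", "AAA": "Lys", "AGG": "Arg",
    "GCA": "Ala",
    "CTG": "Leu", "CTC": "Leu", "TTG": "Leu",
    "GTC": "Val", "GTG": "Val",
    "TTC": "Phe",
}
PURINES = {"A", "G"}  # C and T are pyrimidines


def classify_change(ref_codon, alt_codon):
    """Classify a single-nucleotide difference between two codons.

    Returns (mutation_type, effect), or None if the codons are identical.
    Hypothetical helper: assumes exactly one differing position.
    """
    diffs = [(r, a) for r, a in zip(ref_codon, alt_codon) if r != a]
    if not diffs:
        return None
    if len(diffs) != 1:
        raise ValueError("expected exactly one nucleotide difference")
    ref_nt, alt_nt = diffs[0]
    # Transition: purine <-> purine or pyrimidine <-> pyrimidine.
    same_class = (ref_nt in PURINES) == (alt_nt in PURINES)
    mutation_type = "transition" if same_class else "transversion"
    same_aa = CODON_TO_AA[ref_codon] == CODON_TO_AA[alt_codon]
    effect = "synonymous" if same_aa else "nonsynonymous"
    return mutation_type, effect


# Reference codons from AgAt145 compared with the other sequences.
comparisons = [
    ("AAG", "AAA"),  # AgAt145 vs AgAt134, codon 1
    ("CTG", "CTC"),  # AgAt145 vs AgAt134, codon 5
    ("AAG", "AGG"),  # AgAt145 vs SepAt, codon 1
    ("TTG", "GTG"),  # AgAt145 vs SepAt, codon 6
    ("GTC", "TTC"),  # AgAt145 vs CalAt, codon 4
]
for ref, alt in comparisons:
    print(ref, "->", alt, classify_change(ref, alt))
```

Aligned gaps, such as the deleted codon in the Cal sequences, would have to be handled separately before calling this function.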
When we ask HOW and WHY molecular sequences have evolved the way they have, we are really asking what happens to a mutation once it arises. Does it remain in the population as one of several alleles? In that case we consider it a polymorphism. If, on the other hand, it fixes in the population, then the change is considered a substitution.

Time (generations)   L1    L2    L3    L4    L5    L6    L7
 0                   aaat  aaat  aaat  aaat  aaat  aaat  aaat
10                   aaat  aaat  aaat  aaat  acat  aaat  aaat
20                   aaat  aaat  acat  aaat  acat  acat  acat
30                   acat  acat  acat  acat  acat  acat  acat
40                   acat  acat  actt  acat  acat  acat  acat

Generation 10: new mutation (polymorphism)
Generation 30: mutation fixed (substitution)
Generation 40: new mutation (polymorphism)

At the start (generation 0), everyone in the population is aaat. As time passes, a mutation arises in individual 5. Now there are two different sequences segregating in the population: a polymorphism. This polymorphism is present in several individuals and persists from generation 10 to about generation 29. Finally, in generation 30, it is fixed: every individual in the population has a c at the second nucleotide. This is the same as the allele frequency p reaching 1. In generation 40 a new mutation arises and we have a polymorphism again.

Imagine that five sequences are obtained from each of two species, and that the sequences are related to each other as in the phylogeny.

[Figure: phylogeny of five sequences from each of two species; within-species branches in red (species 1) and blue (species 2), between-species branch in green]

Any mutation that happens on a red branch will appear as a polymorphism within species 1.
Any mutation that happens on a blue branch will appear as a polymorphism within species 2.
Any mutation that happens on the green branch will appear as a fixed difference between the species.

Here the phylogeny is divided into two parts: between-species branches and within-species branches. Within-species branches connect all the alleles within each species to their most recent common ancestor. Between-species branches connect these common ancestors to the common ancestor of the whole phylogeny.
A mutation on a between-species branch will appear in all the descendant alleles and thus will be a fixed difference between species. A mutation on a within-species branch will be a polymorphism within a species.

Substitution rate: the rate at which mutant alleles rise to fixation within a lineage.
By comparing DNA sequences from different organisms, we can estimate the rate at which mutations appear and fix, causing base-pair substitutions.

How many selectively neutral mutants reach fixation per unit time?
Neutral mutations occur at a rate $\mu$ per locus per generation.
In a diploid population there are $2N$ alleles at a particular locus.
The number of mutants arising every generation at a given locus in a diploid population of size $N$ is $2N\mu$.
What is the probability of fixation of a selectively neutral allele? A new mutant arising as a single copy in a diploid population of size $N$ has an initial frequency of $1/(2N)$. If only drift is acting, its probability of fixation equals that initial frequency, $1/(2N)$.
Thus, the substitution rate for neutral alleles is $(1/(2N))(2N\mu) = \mu$.
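The cancellation of population size in the neutral substitution rate can be checked with a minimal Python sketch. The function name `neutral_substitution_rate` and the mutation rate used below are illustrative assumptions, not values from the lecture.

```python
from fractions import Fraction


def neutral_substitution_rate(N, mu):
    """Neutral substitution rate per generation in a diploid population.

    rate = (new mutants per generation) * (fixation probability of each)
    """
    new_mutants = 2 * N * mu      # 2N allele copies, each mutating at rate mu
    p_fix = Fraction(1, 2 * N)    # neutral mutant: P(fix) = initial frequency
    return new_mutants * p_fix


mu = Fraction(1, 1_000_000)  # hypothetical mutation rate, for illustration only
for N in (10, 1_000, 100_000):
    print(f"N = {N:>7}: substitution rate = {neutral_substitution_rate(N, mu)}")
```

Exact fractions are used so that the result is visibly equal to the mutation rate for every population size, with no rounding error.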
What is the substitution rate for neutral alleles?
What is the substitution rate for beneficial alleles (s > 0)?
What is the substitution rate for deleterious alleles?

For beneficial alleles, the substitution rate is the number of new mutants arising every generation multiplied by the probability of fixation of an advantageous allele: $(2N\mu)(2s) = 4Ns\mu$.
For positive values of $s$ when $N$ is large, the probability of fixation is approximately $2s$.
If the absolute value of $s$ is small, the probability of fixation is $2s / (1 - e^{-4Ns})$.
For deleterious alleles, the substitution rate is close to zero.

Consider a numerical example:
A new mutant arises in a population of 1000 individuals.
If it is neutral, the probability it will fix is 1/(2N) = 1/(2 × 1000) = 0.0005.
If it confers a selective advantage of s = 0.01, the probability it will fix is 2s = 0.02 (2%).
If it has a selective disadvantage of s = -0.001, the probability it will fix is 0.004%.

These last two results are noteworthy because they mean that advantageous mutations do not always fix in a population. An advantageous mutation with s = 0.01 fixes with probability 2%, which means that 98% of all mutations with a selective advantage of 0.01 are lost. On the other hand, even slightly deleterious mutations have a finite (albeit small) chance of fixing in a population.

If the population size is very large, the probability of fixation for an advantageous mutation converges to 2s.
Given s = 0.01, N = 1000, P(fixation) = 0.02 or 2%.
Given s = 0.01, N = 100, P(fixation) = ?
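The numerical example can be reproduced with a short Python sketch. It assumes the approximation quoted in the lecture, 2s / (1 - exp(-4Ns)), for a new mutant present as a single copy, and falls back to 1/(2N) when s = 0. The function name `p_fix` is introduced here for illustration.

```python
import math


def p_fix(s, N):
    """Approximate fixation probability of a new single-copy mutant.

    Uses 2s / (1 - exp(-4Ns)) for a diploid population of size N,
    and the neutral value 1/(2N) when s is exactly zero.
    """
    if s == 0:
        return 1.0 / (2 * N)
    return 2 * s / (1 - math.exp(-4 * N * s))


# Neutral, advantageous and slightly deleterious mutants from the example,
# plus the smaller population size.
for s, N in [(0, 1000), (0.01, 1000), (-0.001, 1000), (0.01, 100)]:
    print(f"s = {s:+.3f}  N = {N:4d}  P(fixation) = {p_fix(s, N):.6f}")
```

Running the same function over a range of N shows how the fixation probability of a selected mutant moves toward the neutral value 1/(2N) as the population gets smaller and drift becomes stronger.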
What about slightly deleterious mutations?
s = -0.001, N = 1000, P(fixation) = ?
s = -0.001, N = 100, P(fixation) = ?
s = -0.001, N = 10, P(fixation) = ?

Are most substitutions (fixed changes) due to drift or natural selection?
Neutralists vs. selectionists
Agreed: Most mutations are deleterious and are removed. Some mutations are favourable and are fixed.
In dispute: Are most replacement mutations that fix beneficial or neutral? Is observed polymorphism due to selection or drift?

The 1960s witnessed a revolution in population genetics. The introduction of electrophoresis into population genetic studies soon led to the discovery of large amounts of genetic variability in natural populations, such as those of humans and Drosophila. In 1968 Kimura postulated that the majority of molecular changes in evolution are due to the random fixation of neutral or nearly neutral mutations. This created a dispute between neutralists and selectionists. The dispute essentially concerns the distribution of fitness values of mutant alleles.

Silent (or synonymous) mutations, where the amino acid remains unchanged, are more likely to be neutral.
Replacement (or non-synonymous) mutations, which cause an amino acid change, are more likely to experience selection.
The form and strength of selection depend on the gene and its function.

Substitution rates in mammalian genes (per site per 10^9 years)

Gene               Non-synonymous   Synonymous
Histone 4          0.00             4.52
Histone 3          (missing)        3.94
Myosin             0.10             2.15
Insulin            0.20             3.03
Growth hormone     1.34             3.79
Immunoglobulin k   2.03             5.56

From this table, it is clear that the rate of nonsynonymous substitution is variable among different genes, ranging from zero to about $2 \times 10^{-9}$ substitutions per nonsynonymous site per year. Histones have an unusually low replacement substitution rate. Now look at the column describing the rate of synonymous substitution. It also varies, though not as much as the rate of nonsynonymous substitution.
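One way to compare the two columns of the mammalian gene table is to take the ratio of the non-synonymous to the synonymous rate for each gene. The sketch below is an illustration rather than part of the lecture: the dictionary `rates` and the function `rate_ratio` are names introduced here, and Histone 3 is left out because the table gives only one value for it.

```python
# Rates per site per 10^9 years: (non-synonymous, synonymous).
# Histone 3 is omitted because only one of its two values is available.
rates = {
    "Histone 4": (0.00, 4.52),
    "Myosin": (0.10, 2.15),
    "Insulin": (0.20, 3.03),
    "Growth hormone": (1.34, 3.79),
    "Immunoglobulin k": (2.03, 5.56),
}


def rate_ratio(nonsyn, syn):
    """Ratio of non-synonymous to synonymous substitution rate."""
    return nonsyn / syn


# Sort from most constrained (lowest ratio) to least constrained.
for gene, (nonsyn, syn) in sorted(rates.items(), key=lambda kv: rate_ratio(*kv[1])):
    print(f"{gene:<18} ratio = {rate_ratio(nonsyn, syn):.3f}")
```

A ratio near zero, as for Histone 4, indicates that almost no amino acid changes fix relative to silent changes; higher ratios indicate weaker constraint on the protein sequence.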
Histones seem to have an unusually low replacement substitution rate. This suggests that mutations causing replacement (amino acid) changes in histones are deleterious. Why?

Histones are DNA-binding proteins around which DNA is coiled to form chromatin. Looking at H3 and H4, it is clear that many positions within the protein interact with the DNA or with other histones.

Most amino acid changes in histone proteins may have negative or even lethal consequences. Histone proteins have strong functional constraints.

Active sites (antigen-binding sites) of immunoglobulins often have higher substitution rates than silent sites.

Immunoglobulins are proteins found in the blood or bodily fluids of vertebrates and are used by the immune system to identify and neutralize foreign objects. A small region at the tip of the protein is extremely variable. Each variant can bind a different target, or antigen. The huge diversity in this region allows the immune system to recognize an equally wide diversity of antigens.

It could be that selection favours mutations in these regions, thereby increasing the diversity among antibodies produced by the body and improving the immune response.
To infer that selection has acted within a genome, one must reject the null hypothesis that no selection has acted.
Null hypothesis: describes the pattern of sequence evolution under the forces of mutation and drift alone.
Remember from neutral theory: the rate at which one nucleotide is replaced by another throughout a population (substitution) equals the rate of mutation ($\mu$) at that site.

How do we detect selection at DNA sequences?
Comparing intra-species polymorphism to inter-species differences (McDonald-Kreitman test).
Linked/neighbouring neutral markers.
Examining genes for Dn/Ds ratios.

The McDonald-Kreitman Test
Kreitman and Hudson (1991) sequenced a 4750 base-pair region near the alcohol dehydrogenase (ADH) gene from 11 individuals of D. melanogaster and found higher than expected levels of polymorphism.

There is only one amino acid polymorphism (AdhF/AdhS) within this region, and it occurs at site 1490.

Selection may be maintaining this polymorphism at or near this site.

ADH is an enzyme that breaks down ethanol. Flies carrying the AdhF allele survive better when their food is spiked with ethanol than do flies carrying the AdhS allele (Cavener and Clegg 1981). Nonetheless, the factor that maintains the AdhF/AdhS polymorphism remains unknown.

How do we explain the patterns of variation observed in ADH DNA sequences?
Recall the phylogeny of five sequences from each of two species. A mutation on a within-species branch (red for species 1, blue for species 2) appears as a polymorphism within that species. A mutation on the between-species (green) branch appears in all the descendant alleles and is therefore a fixed difference between the species.
Some abbreviations:
Within species:
Ps = number of synonymous polymorphisms
Pn = number of non-synonymous polymorphisms
Between species:
Ds = number of synonymous substitutions
Dn = number of non-synonymous substitutions

Remember that we have divided nucleotide substitutions in a coding region into two types: replacement (non-synonymous) and silent (synonymous). For a particular phylogeny and mutation rate, if mutations occur randomly over time and if the chance that a mutation does or does not cause an amino acid change remains constant, then the ratio of replacement to silent changes should be the same along any of these branches, within or between species.

If mutations are neutral, each of them has an equal chance of persisting. So the ratio of replacement to silent polymorphisms within a species (Pn/Ps) should be the same as the ratio of replacement to silent differences fixed between species (Dn/Ds):

Pn/Ps = Dn/Ds

The McDonald-Kreitman Test:
H0: If all changes are neutral, the ratio of replacement to silent changes at polymorphic sites (within species) should equal the ratio among fixed differences (between species).

H1: If replacement mutations are advantageous, they fix rapidly, causing a higher replacement to silent ratio between species and a lower replacement to silent ratio within species.

H2: If replacement mutations are deleterious, they rarely fix. Thus there will be a lower ratio of replacement to silent changes between species and a higher replacement to silent ratio within species.

H3: If replacement mutations are subject to heterozygote advantage or frequency-dependent selection, they rarely fix, causing a lower replacement to silent ratio between species and a higher replacement to silent ratio within species.

In summary:
Null: all changes are neutral (drift).
H1: replacement changes are advantageous (positive selection).
H2: replacement changes are deleterious (purifying selection).
H3: replacement changes never fix because of heterozygote advantage.
ADH gene       Fixed differences     Polymorphisms
               (between species)     (within species)
Replacement    7                     2
Silent         17                    42

Between species: ratio of replacement to silent = 7/17 = 0.41
Within species: ratio of replacement to silent = 2/42 = 0.05
FIXED > POLYMORPHISM

Using a χ² test, the null hypothesis that selection is absent is statistically rejected for ADH. The excess of replacement differences between species suggests that mutations have been positively favoured.
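The ADH comparison can be worked through with a small Python sketch. It assumes a plain Pearson chi-square test of independence on the 2 × 2 table, with one degree of freedom and no continuity correction; the lecture does not say which variant of the test was used, so the exact statistic here is illustrative. The function name `mk_test` is introduced for this example.

```python
import math


def mk_test(dn, ds, pn, ps):
    """McDonald-Kreitman comparison on a 2x2 table of counts.

    dn, ds: fixed replacement and silent differences between species.
    pn, ps: replacement and silent polymorphisms within species.
    Returns (Dn/Ds, Pn/Ps, chi-square statistic, p-value).
    Assumption: Pearson chi-square, 1 df, no continuity correction.
    """
    n = dn + ds + pn + ps
    row_replacement, row_silent = dn + pn, ds + ps
    col_fixed, col_poly = dn + ds, pn + ps
    # Shortcut formula for the Pearson statistic of a 2x2 table.
    chi2 = n * (dn * ps - pn * ds) ** 2 / (
        row_replacement * row_silent * col_fixed * col_poly
    )
    # Survival function of a chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return dn / ds, pn / ps, chi2, p_value


# ADH counts from the table: 7 and 17 fixed, 2 and 42 polymorphic.
fixed_ratio, poly_ratio, chi2, p_value = mk_test(dn=7, ds=17, pn=2, ps=42)
print(f"Dn/Ds = {fixed_ratio:.2f}, Pn/Ps = {poly_ratio:.2f}")
print(f"chi-square = {chi2:.2f}, p = {p_value:.4f}")
```

With counts this small, an exact test would be a reasonable alternative; the chi-square version is shown because it is the test named in the lecture.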
The test assumes:
All synonymous mutations are neutral (codon bias violates this).
All non-synonymous mutations are either strongly deleterious, neutral, or strongly advantageous.
Levels of polymorphism are governed by the neutral mutation rate.
Within a species, advantageous mutations contribute little to polymorphism but can contribute to divergence between species.

Problems with this test:
A failure to reject the null hypothesis could arise because both purifying and directional selection have taken place.
Not all synonymous changes are in fact neutral. In some organisms, some codons are preferentially used.

How else might you detect selection in the genome, in particular the presence of selective sweeps?

Neighbouring marker sites
If a beneficial mutation appears and sweeps through a population, what will happen to the level of polymorphism present at neighbouring DNA sites?
Genetic hitchhiking will decrease variation.

In the case of Plasmodium falciparum, diversity at neighbouring marker loci decreased (Wootton et al. 2002, Nature).
If there is overdominance at a nucleotide site, what will happen to the level of polymorphism at neighbouring sites?
Variation at linked sites is more likely to be maintained.

If there is directional selection to remove a particular mutant allele (purifying selection), what will happen to a marker allele that happens to be on the same chromosome?
It will decrease in frequency as a result of this association. This is called background selection.

So what is the evidence for natural selection shaping DNA sequences?
[Figure: Nielsen et al. (2005) PLoS Biology; H0: neutral, H1: positive selection]

How can you detect the signature of selection?
Comparing intra-species polymorphism to inter-species differences (McDonald-Kreitman test).
Linked/neighbouring neutral markers.
Examining genes for Dn/Ds ratios.

Zayed and Whitfield (2008) PNAS:
If drift and demography are important, then the effects will be seen across the whole genome.
If selection is important, then the effects will be seen in specific regions of the genome.