
Reference: Dan’s book chapter 4

Evolutionary rates

Evolutionary rates - history

•Emile Zuckerkandl and Linus Pauling were the first to suggest using DNA and proteins to investigate evolutionary history.

(They discussed molecular evolution before the genetic code was established.)

Linus Pauling (1901-1994)

•The only person ever to receive two unshared Nobel Prizes—for Chemistry (1954) and for Peace (1962).

•His introductory textbook General Chemistry, revised three times since its first printing in 1947 and translated into 13 languages, has been used by generations of undergraduates.


•Also wrote popular science books, e.g., “How to Live Longer and Feel Better”, and “Vitamin C and the Common Cold”.

•Published over 1,000 articles and books.

•Campaigned against nuclear weapons testing.

•He received a Ph.D. in chemistry and mathematical physics from the California Institute of Technology (Caltech) in 1925, at age 24.

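Evolutionary rates

Rate is distance divided by time. Distance is the number of substitutions per site; time is in years. The time must be doubled because, after the two sequences split from their common ancestor, each evolved independently for T years:

$$r = \frac{d}{2T}$$

where r is the rate (substitutions per site per year), d is the distance between the two sequences, and T is the time since their divergence.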
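A minimal Python sketch of the formula. The numbers are hypothetical and chosen only for illustration: two sequences that differ by d = 0.1 substitutions per site and split T = 50 million years ago give a rate on the order of 10^-9 substitutions/site/year.

```python
def substitution_rate(d: float, t_years: float) -> float:
    """r = d / (2T): d is substitutions per site between two sequences,
    T is the time (years) since they diverged. Both lineages evolved
    for T years, hence the factor 2."""
    if t_years <= 0:
        raise ValueError("divergence time must be positive")
    return d / (2 * t_years)


# Hypothetical example: d = 0.1 substitutions/site, split 50 million years ago.
r = substitution_rate(0.1, 50e6)
print(f"{r:.1e} substitutions/site/year")  # 1.0e-09
```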


This formula is not accurate for closely related taxa, in which polymorphism must be taken into account (Takahata and Satta 1997).

Mean Rate of Nucleotide Substitutions in Mammalian Genomes

Evolution is a very slow process at the molecular level (“Nothing happens…”).

~10^-9 substitutions/site/year

Sequence alignments

Alignment is needed for phylogeny and for molecular evolution. We will assume that the alignment is given.

How to construct alignment is outside the scope of this course.

For most proteins, it is observed that the rate of synonymous substitutions (silent substitutions) is much larger than the nonsynonymous rate (amino-acid modifying substitutions).

Synonymous vs. nonsynonymous substitutions

UUU -> UUC (both encode phenylalanine): synonymous

UUU -> CUU (phenylalanine to leucine): nonsynonymous

(Figure: synonymous substitutions, “a lot”; nonsynonymous substitutions, “a little”.)
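The synonymous/nonsynonymous distinction can be checked mechanically with a genetic-code lookup. This sketch assumes the standard genetic code, stored compactly as a 64-letter string (codons ordered TTT, TTC, TTA, TTG, TCT, and so on up to GGG; '*' marks stop codons). The helper names `translate` and `classify_change` are our own. RNA codons are accepted by reading U as T.

```python
# Standard genetic code as a 64-letter string; codons are ordered
# TTT, TTC, TTA, TTG, TCT, ..., GGG (bases in the order T, C, A, G).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(codon: str) -> str:
    """One-letter amino acid for a DNA or RNA codon ('*' = stop); U is read as T."""
    return CODE[codon.upper().replace("U", "T")]


def classify_change(codon1: str, codon2: str) -> str:
    """Classify a single-nucleotide change between two codons."""
    diffs = sum(x != y for x, y in zip(codon1.upper(), codon2.upper()))
    if len(codon1) != 3 or len(codon2) != 3 or diffs != 1:
        raise ValueError("expected two codons differing at exactly one position")
    return "synonymous" if translate(codon1) == translate(codon2) else "nonsynonymous"


print("UUU -> UUC:", classify_change("UUU", "UUC"))  # Phe -> Phe
print("UUU -> CUU:", classify_change("UUU", "CUU"))  # Phe -> Leu
```

In this sketch, a change to a stop codon is reported as nonsynonymous. Some analyses treat such nonsense changes as a separate class.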

Empirical findings:

Important proteins evolve slower than unimportant ones.

Insulin

In 1953, Frederick Sanger determined the amino-acid sequence of insulin.

It was the FIRST protein whose amino-acid sequence was determined.

It demonstrated that insulin consists only of L-amino acids.

Insulin was shown to consist of two chains (A and B) linked together by disulfide (S-S) bonds.

(Figure: insulin chain A, 21 AA; chain B, 30 AA.)

How is the two-chain protein synthesized?

Donald Steiner (University of Chicago) gave the answer.

He studied an islet-cell adenoma of the pancreas, a rare human tumor producing large amounts of insulin.

Adenoma

An adenoma is a benign (not malignant) tumor; “benign” means harmless.

Benign tumor: a tumor that does not recur locally and does not spread to other parts of the body.

An adenoma is of glandular origin (i.e., it arises from a gland).

Adenomas can arise in many organs, including the colon, adrenal, pituitary, and thyroid.

Steiner incubated slices of the pancreatic tumor with tritiated leucine and then analyzed the labeled proteins.

He found a new protein that was later proven to be the biosynthetic precursor of insulin: proinsulin.

Proinsulin has 30 residues that are absent from insulin.

There is an even earlier precursor, preproinsulin, which carries an additional 24 AA at the N-terminus. This hydrophobic signal sequence directs preproinsulin to the ER.

Preproinsulin -> proinsulin (ER membrane)

From the ER, proinsulin moves on to the Golgi and then to secretory granules.

Proinsulin -> insulin (granules)

Xenopus  MALWMQCLP-LVLVLLFSTPNTEALANQHL
Bos      MALWTRLRPLLALLALWPPPPARAFVNQHL

Xenopus  CGSHLVEALYLVCGDRGFFYYPKIKRDIEQ
Bos      CGSHLVEALYLVCGERGFFYTPKARREVEG

Xenopus  AQVNGPQDNELDG-MQFQPQEYQKMKRGIV
Bos      PQVG---ALELAGGPGAGGLEGPPQKRGIV

Xenopus  EQCCHSTCSLFQLENYCN
Bos      EQCCASVCSLYQLENYCN

Alignment of preproinsulin
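To make the pattern in this alignment visible, a short sketch can count identical residues in 30-column blocks, matching the layout of the alignment, and skip columns that contain a gap. It marks identity only ('*'); it does not reproduce Clustal's ':' and '.' similarity classes. The function name `identity_profile` is our own. The third block lies mostly within the C-peptide region of the precursor, and it should show the lowest identity.

```python
# Preproinsulin alignment (Xenopus vs. Bos), rows joined; '-' is a gap.
XENOPUS = ("MALWMQCLP-LVLVLLFSTPNTEALANQHL"
           "CGSHLVEALYLVCGDRGFFYYPKIKRDIEQ"
           "AQVNGPQDNELDG-MQFQPQEYQKMKRGIV"
           "EQCCHSTCSLFQLENYCN")
BOS = ("MALWTRLRPLLALLALWPPPPARAFVNQHL"
       "CGSHLVEALYLVCGERGFFYTPKARREVEG"
       "PQVG---ALELAGGPGAGGLEGPPQKRGIV"
       "EQCCASVCSLYQLENYCN")


def identity_profile(seq1: str, seq2: str, block: int = 30):
    """(first column, identical, compared) per block, skipping gapped columns."""
    if len(seq1) != len(seq2):
        raise ValueError("aligned sequences must have equal length")
    results = []
    for start in range(0, len(seq1), block):
        cols = [(a, b)
                for a, b in zip(seq1[start:start + block], seq2[start:start + block])
                if a != "-" and b != "-"]
        identical = sum(a == b for a, b in cols)
        results.append((start + 1, identical, len(cols)))
    return results


for first_col, identical, compared in identity_profile(XENOPUS, BOS):
    print(f"columns {first_col}+: {identical}/{compared} identical "
          f"({100 * identical / compared:.0f}%)")

# Identity marks: '*' where the two residues are identical.
print("".join("*" if a == b and a != "-" else " " for a, b in zip(XENOPUS, BOS)))
```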

Empirical findings:

Functional regions evolve slower than nonfunctional regions.

Rates of amino-acid replacements in different proteins

Protein               Rate (replacements per site per 10^9 years)
Fibrinopeptides       8.3
Insulin C-peptide     2.4
Ribonuclease          2.1
Hemoglobin            1.0
Cytochrome c          0.3
Histone H4            0.01
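The table can be restated as a more intuitive quantity: how long two lineages take to differ at 1% of their amino-acid sites. This sketch rearranges the rate formula r = d / (2T) as T = d / (2r). It ignores multiple hits, which is acceptable only for small d. The 1% threshold is our choice for illustration.

```python
# Rates from the table, in replacements per site per 10^9 years.
RATES_PER_1E9_YEARS = {
    "Fibrinopeptides": 8.3,
    "Insulin C-peptide": 2.4,
    "Ribonuclease": 2.1,
    "Hemoglobin": 1.0,
    "Cytochrome c": 0.3,
    "Histone H4": 0.01,
}


def years_to_diverge(rate_per_1e9: float, divergence: float = 0.01) -> float:
    """Years until two lineages differ at `divergence` of their sites,
    using T = d / (2r) and ignoring multiple substitutions per site."""
    rate_per_year = rate_per_1e9 * 1e-9
    return divergence / (2 * rate_per_year)


for protein, rate in RATES_PER_1E9_YEARS.items():
    myr = years_to_diverge(rate) / 1e6
    print(f"{protein:18s} {myr:8.1f} million years to 1% divergence")
```

The roughly 800-fold range in the table becomes the gap between about 0.6 million years for fibrinopeptides and about 500 million years for histone H4.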

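Clotting: the end reaction

fibrinogen -> fibrin (catalyzed by thrombin)

Thrombin converts fibrinogen into fibrin by cleaving off the fibrinopeptides, which have little further function. This weak functional constraint is consistent with their very high replacement rate in the table above.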

Synonymous vs. nonsynonymous substitutions

Histone H4 between human and wheat: excess of synonymous substitutions

Mean nonsynonymous rate: 0.74 ± 0.67 (10^-9 substitutions per site per year)

Mean synonymous rate: 3.51 ± 1.01 (10^-9 substitutions per site per year)

The coefficient of variation is an attribute of a distribution: its standard deviation divided by its mean.

Coefficient of variation of nonsynonymous rate: 91%

Coefficient of variation of synonymous rate: 29%
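The coefficients of variation above follow directly from the reported means and standard deviations. This is a minimal sketch. The second part starts from per-gene rates, which are hypothetical here, and uses the sample standard deviation (`statistics.stdev`). That estimator is an assumption, because the slide does not say which one was used.

```python
import statistics


def coefficient_of_variation(mean: float, sd: float) -> float:
    """CV = standard deviation / mean."""
    return sd / mean


# Mean and SD across genes, in 10^-9 substitutions per site per year.
for label, mean, sd in [("nonsynonymous", 0.74, 0.67), ("synonymous", 3.51, 1.01)]:
    print(f"{label}: CV = {coefficient_of_variation(mean, sd):.0%}")  # 91%, then 29%

# Starting from per-gene rates instead (hypothetical values).
rates = [0.2, 0.5, 1.1, 1.6]
cv = coefficient_of_variation(statistics.mean(rates), statistics.stdev(rates))
print(f"CV = {cv:.0%}")
```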

(Figure: transition vs. transversion rates by degeneracy class (0-, 2-, and 4-fold degenerate sites); ratios shown: 1.5, 4.4, 1.1.)

Computing synonymous and non-synonymous rates

Silent and non-silent…


Our goal is to compare two (or, later, more) sequences and to compare the rate of neutral evolution (as measured by the synonymous rate) with that of nonsynonymous substitutions.

The lower the ratio of nonsynonymous to synonymous substitution rates, the stronger the purifying selection.

Ka/Ks (also written dN/dS)

p-distance of synonymous subs. = 3/6
p-distance of nonsynonymous subs. = 3/6

Problematic: the p-distance does not correct for multiple substitutions at the same site…

Solution: apply the Jukes-Cantor (JC) correction to the p-distance.
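A minimal sketch of the Jukes-Cantor correction, d = -(3/4) ln(1 - 4p/3), applied to a p-distance p. Returning infinity for p ≥ 3/4, where the correction is undefined (saturation), is our own convention.

```python
import math


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor distance d = -(3/4) ln(1 - 4p/3) for a p-distance p.
    Returns infinity for p >= 3/4, where the correction is undefined."""
    if not 0 <= p <= 1:
        raise ValueError("p must lie in [0, 1]")
    if p >= 0.75:
        return math.inf
    return -0.75 * math.log(1 - 4 * p / 3)


for p in (0.01, 0.1, 0.3, 0.5, 0.7):
    print(f"p = {p:.2f}  ->  d = {jukes_cantor(p):.4f}")
```

The correction is negligible for small p but grows quickly. For p = 3/6 = 0.5 it gives d = (3/4) ln 3 ≈ 0.82.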

By chance alone, a random point mutation is much more likely to be nonsynonymous than synonymous.

Assume a protein without selection (evolving neutrally). The nine possible single-nucleotide changes of AAA (Lys) are:

first position: GAA (Glu), TAA (Stop), CAA (Gln)
second position: ATA (Ile), AGA (Arg), ACA (Thr)
third position: AAT (Asn), AAG (Lys), AAC (Asn)

Only AAG (Lys) is synonymous: 1 of the 9 changes.

The proportion also differs among codons. For GCA (Ala):

first position: CCA (Pro), TCA (Ser), ACA (Thr)
second position: GTA (Val), GGA (Gly), GAA (Glu)
third position: GCT (Ala), GCG (Ala), GCC (Ala)

Here 3 of the 9 changes are synonymous, all at the third position.
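Enumerating all nine point mutations of a codon makes this imbalance explicit. The sketch reuses the standard genetic code stored as a 64-letter string (an encoding choice of ours) and counts changes to stop codons as nonsynonymous. For AAA it should report 1/9 synonymous, and for GCA 3/9, matching the two lists above.

```python
BASES = "TCAG"
# Standard genetic code; codons ordered TTT, TTC, TTA, TTG, TCT, ..., GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}


def neighbours(codon: str):
    """Yield the 9 codons reachable by a single nucleotide substitution."""
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                yield codon[:pos] + base + codon[pos + 1:]


def synonymous_fraction(codon: str) -> float:
    """Fraction of the 9 point mutations that keep the amino acid
    (changes to stop codons count as nonsynonymous)."""
    same = sum(CODE[n] == CODE[codon] for n in neighbours(codon))
    return same / 9


for codon in ("AAA", "GCA"):
    changes = ", ".join(f"{n} ({CODE[n]})" for n in neighbours(codon))
    print(f"{codon} ({CODE[codon]}): {synonymous_fraction(codon):.2f} synonymous")
    print("   ", changes)
```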

So when one “observes” six times more nonsynonymous substitutions than synonymous ones, does it indicate that the protein is under purifying selection? For a codon such as AAA, where 8 of the 9 possible changes are nonsynonymous, chance alone would produce an even larger excess.

We must normalize for the potential of the codons in question to undergo silent vs. non-silent mutations.

Nei & Gojobori (1986) method

Masatoshi Nei Takashi Gojobori

Counting synonymous sites

Consider a particular position in a codon (j = 1, 2, 3). Let $f_j$ be the fraction of the three possible changes at this position that are synonymous.

In TTT (Phe), the first two positions are nonsynonymous, because no synonymous change can occur at them. The third position is 1/3 synonymous and 2/3 nonsynonymous, because one of its three possible changes (TTT -> TTC, Phe) is synonymous.

Let s be the number of synonymous sites of a codon:

$$s = \sum_{j=1}^{3} f_j$$

In effect, s is the expected number of synonymous changes out of 3 (one random change at each codon position), assuming equal probability for each type of substitution. For TTT, s = 1/3.

Let n be the number of nonsynonymous sites of a codon, defined in the same way for nonsynonymous changes:

$$n = 3 - s$$

For TTT, n = 3 - 1/3 = 8/3 (i.e., 2 + 2/3).
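The per-codon site counts can be computed by trying the three alternative bases at each position. In this sketch, changes to stop codons count as nonsynonymous, following the "out of 3" definition above. Published implementations of the Nei & Gojobori method may handle stop codons differently. Using `Fraction` keeps the results exact. For TTT the sketch gives s = 1/3 and n = 8/3, as in the text.

```python
from fractions import Fraction

BASES = "TCAG"
# Standard genetic code; codons ordered TTT, TTC, TTA, TTG, TCT, ..., GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}


def synonymous_sites(codon: str) -> Fraction:
    """s = f1 + f2 + f3, where f_j is the fraction of the 3 possible
    changes at position j that leave the amino acid unchanged."""
    s = Fraction(0)
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODE[mutant] == CODE[codon]:
                    s += Fraction(1, 3)
    return s


codon = "TTT"
s = synonymous_sites(codon)
n = 3 - s
print(f"{codon}: s = {s}, n = {n}")  # TTT: s = 1/3, n = 8/3
```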

Assume we have r codons (3r sites). Let $s_i$ and $n_i$ be s and n for the i'th codon. We define:

$$S = \sum_{i=1}^{r} s_i, \qquad N = \sum_{i=1}^{r} n_i, \qquad S + N = 3r$$

Classification of sites

Like s, S can be read as an expected number of synonymous changes, here out of 3r, assuming equal probability for each type of substitution.

We have two sequences:

ACG CCG ATT
ATG CCT CTA

For the pair, S is the average of the S values of the two sequences. The same goes for N.
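Applying the per-codon count to the two example sequences gives S and N for the pair. The sketch makes the same assumptions as above: the standard genetic code, with changes to stop codons counted as nonsynonymous. `sites` is a helper name of our own. For these sequences it should give S = 5/2 and N = 13/2, so S + N = 9 = 3r with r = 3.

```python
from fractions import Fraction

BASES = "TCAG"
# Standard genetic code; codons ordered TTT, TTC, TTA, TTG, TCT, ..., GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}


def synonymous_sites(codon: str) -> Fraction:
    """s for one codon: synonymous changes among its 9 point mutations, / 3."""
    s = Fraction(0)
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODE[mutant] == CODE[codon]:
                    s += Fraction(1, 3)
    return s


def sites(seq: str) -> tuple[Fraction, Fraction]:
    """(S, N) for one coding sequence, with S + N = 3r."""
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    S = sum((synonymous_sites(c) for c in codons), Fraction(0))
    return S, 3 * len(codons) - S


seq1, seq2 = "ACGCCGATT", "ATGCCTCTA"
S1, N1 = sites(seq1)
S2, N2 = sites(seq2)
S, N = (S1 + S2) / 2, (N1 + N2) / 2
print(f"S1 = {S1}, S2 = {S2}; S = {S}, N = {N}")
```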

Counting synonymous substitutions

So far we have counted the potential for synonymous and nonsynonymous substitutions. Now we wish to count the actual number of synonymous and nonsynonymous substitutions.

For two codons that differ by only one nucleotide, the difference is easily inferred.

GTC (Val) -> GTT (Val): synonymous

GTC (Val) -> GCC (Ala): nonsynonymous

We define $s_d$ and $n_d$ to be the numbers of synonymous and nonsynonymous substitutions per codon:

GTC (Val) -> GTT (Val): $s_d = 1$, $n_d = 0$

GTC (Val) -> GCC (Ala): $s_d = 0$, $n_d = 1$

For two codons that differ by two or more nucleotides, the estimation problem is more complicated, because we need to determine the order in which the substitutions occurred.

Pathway (1) requires one synonymous and one nonsynonymous substitution, whereas pathway (2) requires two nonsynonymous substitutions.

If there are 3 differences between two codons, there are 6 possible pathways. For ABC -> XYZ:

•A changed first, B second, and finally C.
•A changed first, C second, and finally B.
•B changed first, A second, and finally C.
•B changed first, C second, and finally A.
•C changed first, A second, and finally B.
•C changed first, B second, and finally A.

There are two approaches to dealing with multiple substitutions at a codon:

The unweighted method: average the numbers of the different types of substitutions over all possible pathways. For example, if the two pathways are assumed to be equally likely, the number of nonsynonymous substitutions is (1 + 2)/2 = 1.5, and the number of synonymous substitutions is (1 + 0)/2 = 0.5.

The weighted method: use an a priori criterion to assign a probability to each pathway. For instance, if the weight of pathway 1 is 0.9 and the weight of pathway 2 is 0.1, the number of nonsynonymous substitutions between the two codons is (0.9 × 1) + (0.1 × 2) = 1.1, and the number of synonymous substitutions is (0.9 × 1) + (0.1 × 0) = 0.9.

Counting synonymous substitutions

Assume we have r codons (3r sites). Let $s_{d_i}$ and $n_{d_i}$ be $s_d$ and $n_d$ for the i'th codon. We define:

$$S_d = \sum_{i=1}^{r} s_{d_i}, \qquad N_d = \sum_{i=1}^{r} n_{d_i}$$

$S_d + N_d$ is the total number of “observed” substitutions.
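The unweighted method can be sketched by enumerating every order in which the differing positions can change (2 pathways for two differences, 6 for three) and averaging the synonymous and nonsynonymous steps. Summing over codons then gives $S_d$ and $N_d$. Skipping pathways that pass through a stop codon is our assumption, although it is a common convention. The weighted method would replace the plain average with pathway weights. The codon pair TTT -> GTA is our own illustration of the two-pathway case: TTT -> GTT -> GTA takes one nonsynonymous and one synonymous step, while TTT -> TTA -> GTA takes two nonsynonymous steps, giving $s_d = 0.5$ and $n_d = 1.5$.

```python
from itertools import permutations

BASES = "TCAG"
# Standard genetic code; codons ordered TTT, TTC, TTA, TTG, TCT, ..., GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}


def codon_differences(codon1: str, codon2: str) -> tuple[float, float]:
    """Unweighted (s_d, n_d) for one codon pair, averaged over all pathways
    that do not pass through a stop codon."""
    if CODE[codon1] == "*" or CODE[codon2] == "*":
        raise ValueError("stop codons are not handled in this sketch")
    positions = [i for i in range(3) if codon1[i] != codon2[i]]
    syn_total = nonsyn_total = n_paths = 0
    for order in permutations(positions):  # 1, 2 or 6 pathways
        current, syn, nonsyn = codon1, 0, 0
        for pos in order:
            nxt = current[:pos] + codon2[pos] + current[pos + 1:]
            if CODE[nxt] == "*":
                break  # pathway through a stop codon: discard it
            if CODE[nxt] == CODE[current]:
                syn += 1
            else:
                nonsyn += 1
            current = nxt
        else:  # no break: the pathway is valid
            syn_total += syn
            nonsyn_total += nonsyn
            n_paths += 1
    if n_paths == 0:
        raise ValueError("every pathway passes through a stop codon")
    return syn_total / n_paths, nonsyn_total / n_paths


def total_differences(seq1: str, seq2: str) -> tuple[float, float]:
    """(S_d, N_d): sums of s_d and n_d over aligned codons."""
    pairs = [(seq1[i:i + 3], seq2[i:i + 3]) for i in range(0, len(seq1), 3)]
    counts = [codon_differences(a, b) for a, b in pairs]
    return sum(c[0] for c in counts), sum(c[1] for c in counts)


print(codon_differences("TTT", "GTA"))  # (0.5, 1.5)
print(total_differences("ACGCCGATT", "ATGCCTCTA"))
```

For the two example sequences above (ACG CCG ATT vs. ATG CCT CTA), this gives $S_d = 2$ and $N_d = 2$.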

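Counting synonymous substitutions per synonymous site

We define p-distances for each type of substitution:

$$p_S = \frac{S_d}{S}, \qquad p_N = \frac{N_d}{N}$$

These distances are then corrected using the JC formula:

$$d_S = -\frac{3}{4}\ln\left(1 - \frac{4}{3}p_S\right), \qquad d_N = -\frac{3}{4}\ln\left(1 - \frac{4}{3}p_N\right)$$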

Three types of selection

If $d_N < d_S$: purifying selection
If $d_N = d_S$: neutral evolution
If $d_N > d_S$: positive selection
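The last steps can be combined in one function. Given S, N, $S_d$ and $N_d$, the sketch computes the p-distances, applies the JC correction, and classifies the result with the three rules above. The counts are hypothetical (a pair of 100-codon sequences, so S + N = 300). The `rel_tol` parameter is a stand-in for a proper statistical test of whether $d_N$ differs from $d_S$. Neither comes from the source.

```python
import math


def jukes_cantor(p: float) -> float:
    """d = -(3/4) ln(1 - 4p/3); infinite when p >= 3/4 (saturation)."""
    return math.inf if p >= 0.75 else -0.75 * math.log(1 - 4 * p / 3)


def nei_gojobori_distances(S: float, N: float, Sd: float, Nd: float) -> tuple[float, float]:
    """Return (d_S, d_N) from site counts (S, N) and substitution counts (S_d, N_d)."""
    p_s, p_n = Sd / S, Nd / N
    return jukes_cantor(p_s), jukes_cantor(p_n)


def selection_regime(d_n: float, d_s: float, rel_tol: float = 0.1) -> str:
    """Apply the three rules; `rel_tol` (hypothetical) stands in for a statistical test."""
    if math.isclose(d_n, d_s, rel_tol=rel_tol):
        return "neutral evolution"
    return "purifying selection" if d_n < d_s else "positive selection"


# Hypothetical counts for a pair of 100-codon sequences (S + N = 300).
d_s, d_n = nei_gojobori_distances(S=60, N=240, Sd=12, Nd=8)
print(f"dS = {d_s:.4f}, dN = {d_n:.4f}, dN/dS = {d_n / d_s:.2f} -> "
      f"{selection_regime(d_n, d_s)}")
```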

Humans are not so special?

Generation time and genomic evolution in primates

Vincent M. Sarich & Allan C. Wilson, Science 179: 1144-1147 (1973).

Some background on Primates

(Figure: primate phylogeny. Groups labeled: Prosimians (Strepsirhines), Tarsiers, New World monkeys (Platyrrhines), Catarrhines, Old World monkeys, Gibbons, Hominidae, Haplorhines (higher primates). Source: http://www.whozoo.org/mammals/Primates/primatephylogeny.htm)

•Primates: 233 species and 13 families

•The smallest living primate is the pygmy marmoset (a New World monkey), which weighs around 70 g; the largest is the gorilla, weighing up to around 175 kg.

http://animaldiversity.ummz.umich.edu/site/accounts/information/Primates.html

•Most primate species live in the tropics or subtropics, although a few, most notably humans, also inhabit temperate regions.

•Except for a few terrestrial species, primates are arboreal. Some species eat leaves or fruit; others are insectivorous or carnivorous.

Arbor = tree in Latin

Great apes

Hominidae is the primate family that includes the extant species of humans, chimpanzees, gorillas, and orangutans, as well as many extinct species.

The members of the family are called hominids. The family is also called “great apes”.

Originally, the non-human great apes were grouped in a separate family, Pongidae. However, treating Pongidae as a family implies that it is monophyletic, which is not the case: chimpanzees and gorillas are more closely related to humans than to orangutans.

Many studies have shown a correlation between time of divergence and the amount of evolutionary (molecular) distance:

Protein sequences of species that diverged earlier show more differences.

(Figure: p-distance plotted against divergence time.)

There is considerable disagreement about whether time should be measured in astronomical time (i.e., years) or in generations.

The generation-time hypothesis:

The number of substitutions is proportional to the number of generations.

Prediction:

Short generation time -> more generations since divergence -> more substitutions (in B)

(Figure: A (human) and B (tree shrew) descending from their common ancestor O.)

Absolute rates of evolution demand knowledge of divergence dates (from the fossil record).

However, relative rates of evolution can be computed from the phylogeny…

This will be done using the “relative rate method”.

Assume 3 taxa, A, B and C.

(Figure: A (human) and B (tree shrew) descend from node O; C is the outgroup; times T1 and T2 are marked on the tree.)

The generation time hypothesis predicts that B, with its shorter generation time, has accumulated more substitutions since O than A has:

$$BO > AO \;\Rightarrow\; BO + OC > AO + OC \;\Rightarrow\; BC > AC$$

In words, the distance from an outgroup should be higher for species with short generation times than for species with long generation times. Because OC is shared by both paths, it cancels out, so the comparison needs no divergence dates.

Sarich and Wilson used modern carnivore species as their outgroup (C).

The authors compared immunological distances between several primate species (plus the tree shrew) and carnivore species.

For each species, e.g., Homo sapiens, the distance to each of 4 carnivore species was measured, and the average was reported.

The 4 carnivore species are: Hyaena, Genetta, Ursus, and Arctogalidia.

Genetta genetta (small-spotted genet)

Although catlike in appearance and habit, the genet is not a cat but a member of the family Viverridae.

Genets were kept as pets by the ancient Egyptians, as they are today by Berbers in North Africa. From the Greek empire to the Middle Ages, the genet was kept as a rat catcher and was often portrayed on tapestries of the period. The domestic cat eventually replaced the genet, probably because it is more efficient at killing rats, and perhaps because it is less smelly.

Results:

Immunological distances from carnivore species:

Species                                 Mean distance
Homo sapiens                            162
Macaca mulatta (rhesus monkey)          166
Ateles geoffroyi (spider monkey)        149
Nycticebus coucang (slow loris)         125
Lemur fulvus (brown lemur)              135
Tarsius spectrum (tarsier)              137
Tupaia glis (tree shrew)                156


Prosimian

Nycticebus coucang (slow loris)

India, Malaysia, Sumatra, Java, Borneo, Philippines

Life span is 20 years (generation time < 20 years). Nocturnal and arboreal, they spend the day sleeping in a tight ball up a tree.

These results are against the generation-time hypothesis…

No correlation between distance and generation length for humans vs. prosimians.

Scandentia

Common tree shrew - TUPAIA GLIS

Order: Scandentia (climbing mammals). Family: Tupaiidae.

This small order of tree shrews was at one time in the midst of a controversy: is it a primate (order Primates) or an insectivore (order Insectivora)?

For several years, different groups placed the tree shrews in one or the other of these orders. Finally, in 1984, the issue was resolved when they were placed in their own order, Scandentia. Some researchers, however, still argue that they are the most primitive form of primates.

Tarsius spectrum (tarsier)

Although data are not available on the lifespan of this species, another member of the genus, T. syrichta, is reported to have lived 13.5 years in captivity. Tarsius spectrum is likely to have a similar maximum lifespan.

No correlation between distance and generation length: Homo has the longest generation time and the tree shrew the shortest, yet the tree shrew is not farther from the carnivores (156 vs. 162).
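The relative-rate logic can be applied directly to the table of immunological distances. This sketch takes Homo sapiens as A and each other species in turn as B. Because both share the carnivore outgroup, BC - AC estimates BO - AO. It assumes the distances behave additively along the tree, which is only an approximation for immunological data. Under the generation-time hypothesis, the tree shrew should come out positive. The data give 156 - 162 = -6.

```python
# Mean immunological distances from four carnivore species (table above).
DISTANCE_TO_CARNIVORES = {
    "Homo sapiens": 162,
    "Macaca mulatta": 166,
    "Ateles geoffroyi": 149,
    "Nycticebus coucang": 125,
    "Lemur fulvus": 135,
    "Tarsius spectrum": 137,
    "Tupaia glis": 156,
}


def relative_rate(d_ac: float, d_bc: float) -> float:
    """BC - AC = (BO + OC) - (AO + OC) = BO - AO. A positive value means
    lineage B changed more than lineage A since their common ancestor O."""
    return d_bc - d_ac


human = DISTANCE_TO_CARNIVORES["Homo sapiens"]
for species, d in DISTANCE_TO_CARNIVORES.items():
    if species != "Homo sapiens":
        print(f"{species:20s} BO - AO = {relative_rate(human, d):+d}")
```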

An evolutionary experiment

Spalax ehrenbergi

The structural protein composing the lens is called α-crystallin.

It is composed of two subunits, αA and αB. Each subunit is encoded by a single-copy gene, and the two genes are located on different chromosomes.

The two subunits have approximately 57% sequence homology, probably reflecting ancient gene duplication.

They also have low sequence similarity to heat-shock proteins (possible origin of family).

In Spalax, αA-crystallin lost its functional role more than 25 million years ago, when the mole rat became subterranean and presumably lost the use of its eyes.

The αA-crystallin of Spalax evolves 4 times faster than the αA-crystallins of other rodents, such as rats, mice, hamsters, gerbils, and squirrels: functional relaxation.

The αA-crystallin of Spalax evolves 5 times slower than pseudogenes: it is still functional.

The αA-crystallin gene of Spalax possesses all the prerequisites for normal function and expression, including the proper signals for alternative splicing.

The αA-crystallin of Spalax was shown to still be present in the rudimentary lens of the mole rat: functional.

Explanation 1:

There is good evidence that the rudimentary eye, though no longer used for vision, is still of vital importance for photoperiod perception, which is required for the physiological adaptations of the animal to seasonal changes.

Explanation 2:

The blind mole rat lost its vision more recently than 25 million years ago, so the rate of nonsynonymous substitutions after nonfunctionalization has been underestimated.

Contradicting evidence: the αA-crystallin gene is still intact as far as the essential molecular structures for its expression are concerned.

Explanation 3:

The αA-crystallin gene product serves a function unrelated to the eye.

Supporting evidence:
1. αA-crystallin has been found in other tissues.
2. αA-crystallin also functions as a chaperone that binds denaturing proteins and prevents their aggregation.
3. The regions within αA-crystallin responsible for chaperone activity are conserved in the mole rat.
