Why it matters
• Pure scientific curiosity
• Knowledge is intrinsically valuable, regardless of applications
• Critical for truly understanding function
• Translating research/knowledge between model
organisms
• Evolution shapes population genetics
• Critical for understanding how mutations cause disease
Why it matters
• Ecology, ecological interactions, diversity
• Antibiotic resistance
• Microbiome
• Cancer
A Brief History of Life on Earth
Time (years ago; B = billion)
4.5B: Origin of the Earth
3 – 4B: Origin of Life
2.7B: Bacteria
1.5B: Eukaryotes
1B: Animals
Definitions
• Homology
• Descent from a common ancestor
• All or nothing, no such thing as percent homology
• Divergence
• Change in two sequences over time, after splitting from a common
ancestor
• Convergence
• Similarity due to independent evolutionary events
• On the amino acid level: rare and difficult to prove
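Because homology is all-or-nothing, the quantity that can actually be measured between two sequences is identity (or similarity), which is then used as evidence for or against homology. A minimal sketch, assuming the two sequences are already aligned to the same length and use "-" for gaps; the function name is illustrative only:

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            continue  # gapped columns are skipped (one of several conventions)
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Two sequences differing at one of 15 aligned positions.
identity = percent_identity("AGTCCAAGGCCTTAA", "AGTTCAAGGCCTTAA")
```

The result is a statement about identity only: two sequences can be called 93% identical, but they are either homologous or not.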
Two Groups of Processes
• Mutation
• Provides raw material of evolution
• Many different processes and mechanisms
• Happens within individuals
• Selection and Drift
• Happens within populations of organisms
• Affect the frequency of mutations within populations over time
Types of mutation (figure):
• Point mutation: AGTCCAAGGCCTTAA → AGTTCAAGGCCTTAA
• Insertion (CCTTA): AGTCCAAGGCCTTAA → AGTCCAAGGCCTTACCTTAA
• Deletion (AAGG): AGTCCAAGGCCTTAA → AGTCC-CCTTAA
• Inversion: AGTCCAAGGCCTTAA → AGTCCCCTTCCTTAA
• Translocation: AGTCCAAGGCCTTAA + GGTCCTGGAATTCAG → AGTCCAAGGCC + GGTCCTGGAATTCAGTTAA
• Duplication: AGTCCAAGGCC → AGTCCAAGGCCAGTCCAAGGCC
• Recombination (AAGG, AGGC): AGTCCAAGGCCTTAA → AGTCCAAAGGCTTAA
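The mutation types illustrated above can be reproduced as simple string operations. This is only a sketch: the function names and 0-based positions are illustrative choices, and real mutational mechanisms are far more varied than string edits.

```python
# Complement table used to build the reverse complement for an inversion.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def point_mutation(seq, pos, base):
    """Replace the base at a 0-based position."""
    return seq[:pos] + base + seq[pos + 1:]


def insertion(seq, pos, fragment):
    """Insert a fragment before the 0-based position."""
    return seq[:pos] + fragment + seq[pos:]


def deletion(seq, start, end):
    """Remove the bases in the half-open interval [start, end)."""
    return seq[:start] + seq[end:]


def inversion(seq, start, end):
    """Replace [start, end) with its reverse complement."""
    segment = seq[start:end].translate(COMPLEMENT)[::-1]
    return seq[:start] + segment + seq[end:]


def duplication(seq):
    """Tandem duplication of the whole sequence."""
    return seq + seq


def translocation(donor, recipient, start):
    """Move the donor's tail (from start onward) onto the end of the recipient."""
    return donor[:start], recipient + donor[start:]


seq = "AGTCCAAGGCCTTAA"
print(point_mutation(seq, 3, "T"))
print(insertion(seq, 14, "CCTTA"))
print(deletion(seq, 5, 9))
print(inversion(seq, 5, 9))
print(translocation(seq, "GGTCCTGGAATTCAG", 11))
print(duplication("AGTCCAAGGCC"))
```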
Mutational Processes
• Arise generally as unrepaired mismatches during DNA
replication
• Some repair processes introduce mutation
• Chemical processes change non-replicating DNA
• Multicellularity keeps most acquired (somatic) mutations from
being inherited; only germline mutations are passed on
• Humans:
• de novo mutation rate of 1.2 × 10^-8 per nucleotide per generation
• ~70 per child
• Majority of paternal origin
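As a rough check on the "~70 per child" figure, the per-nucleotide rate can be multiplied by the number of nucleotides inherited. The haploid genome size below is a round-number assumption and is not taken from the slides.

```python
MUTATION_RATE = 1.2e-8      # per nucleotide per generation (from the slide)
HAPLOID_GENOME_BP = 3.0e9   # assumption: round figure for the haploid human genome


def expected_de_novo(rate, haploid_bp):
    """Expected new mutations per child: rate x two inherited genome copies."""
    return rate * 2 * haploid_bp


estimate = expected_de_novo(MUTATION_RATE, HAPLOID_GENOME_BP)
print(round(estimate))
```

Under these assumptions the estimate is about 72, in line with the ~70 quoted above.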
Mutations, Polymorphisms, Substitutions
• Mutations: Appear in individuals within a population
• In human genetics, sometimes used specifically to describe
pathogenic or disease-causing variation
• Polymorphism: An unfixed mutation of varying frequency
within a population
• In human genetics, generally used to describe functionally
neutral/benign variation. Often required to have a frequency of >5%
• Substitution: A fixed mutation. All individuals within a
population have the mutation
• Most often used when comparing two or more species
Selection and Drift
• Fitness
• Measured as the number of offspring that survive to
reproduce themselves
• Positive Selection
• Rare
• Mutation confers some fitness advantage
• Negative Selection
• Frequent
• Mutation confers a fitness disadvantage
• Neutral
• Mutation has little to no impact on fitness
• Most frequent
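The interplay of selection and drift can be sketched with a minimal Wright-Fisher style simulation. This is a simplified haploid model chosen for illustration, not a method taken from the slides; the function name, parameters and example values are all assumptions.

```python
import random


def wright_fisher(pop_size, start_freq, selection, generations, rng):
    """Return the mutant allele frequency in each generation (haploid model).

    selection is the fitness advantage s of the mutant (relative fitness 1 + s);
    s = 0 is neutral, s < 0 is deleterious. Requires s > -1.
    """
    freq = start_freq
    trajectory = [freq]
    for _ in range(generations):
        # Selection: reweight the mutant by its relative fitness.
        weighted = freq * (1 + selection)
        expected = weighted / (weighted + (1 - freq))
        # Drift: the next generation is a random sample of pop_size individuals.
        mutants = sum(rng.random() < expected for _ in range(pop_size))
        freq = mutants / pop_size
        trajectory.append(freq)
    return trajectory


rng = random.Random(1)  # fixed seed so a run can be repeated
neutral = wright_fisher(100, 0.1, 0.0, 50, rng)
favoured = wright_fisher(100, 0.1, 0.1, 50, rng)
print(neutral[-1], favoured[-1])
```

With s = 0 the frequency wanders by drift alone; a positive s biases each generation upward, but in a small population a beneficial mutation can still be lost by chance.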
Examples of Positive Selection
• MHC Genes
• Balancing selection: favours diversity at loci
• Many genes involved in metabolism and digestion
• Accelerated evolution over last ~10,000 years
• Adaptation to Agriculture
• Human adaptations to high altitude
• EPAS1, PPARA, EGLN1 (Tibetans)
• CBARA1, VAV3, ARNT2, THRB (Ethiopian Highlanders)
• EGLN1 (Andean Peruvians)
Mutation at the Codon Level
• Synonymous (Silent) Mutation: Codon still codes for the same amino acid
• Non-Synonymous Mutation: Codon now codes for a different amino acid
(missense), a premature stop codon (nonsense), or alters a start codon
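The categories above can be made concrete by translating the reference and mutated codons with the standard genetic code. A minimal sketch: the function name and the is_start flag are illustrative, and the "stop-lost" label is an extra category not listed on the slide.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify(ref_codon, alt_codon, is_start=False):
    """Classify a codon change; is_start marks the gene's initiation codon."""
    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"
    if is_start:
        return "start-altering"
    if alt_aa == "*":
        return "nonsense"
    if ref_aa == "*":
        return "stop-lost"  # not on the slide; included so stop codons are handled
    return "missense"


print(classify("CTT", "CTC"))                 # Leu -> Leu
print(classify("GAT", "TAT"))                 # Asp -> Tyr
print(classify("TGG", "TGA"))                 # Trp -> stop
print(classify("ATG", "ACG", is_start=True))  # start codon changed
```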
Evolutionary Rates and Constraints
• Evolution is only partially random
• Mutations (quasi-random, non-uniform distribution of possibilities)
• Drift (Random)
• Selection (Non-random)
• Evolutionary rate at the protein level is the number of
fixed amino acid substitutions over evolutionary time
• Measured from one or more between-species comparisons
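One simple way to turn this definition into a number is to count amino acid differences between two aligned orthologous proteins and divide by the time over which they accumulated. The sketch below makes several assumptions not stated in the slides: an uncorrected proportion of differing sites (no correction for multiple substitutions at one site), a known divergence time, and a factor of 2 because both lineages have been evolving since the split. The sequences and the divergence time are made-up toy values.

```python
def substitution_rate(seq_a, seq_b, divergence_years):
    """Amino acid substitutions per site per year between two aligned proteins."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    # Ignore columns where either sequence has a gap.
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    differences = sum(a != b for a, b in pairs)
    per_site = differences / len(pairs)
    # Both lineages accumulate changes, so the elapsed time is 2 x divergence.
    return per_site / (2 * divergence_years)


# Toy example: 1 difference in 10 sites, species split 100 million years ago.
rate = substitution_rate("MKTAYIAKQR", "MKTAHIAKQR", 1e8)
print(rate)
```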
Evolutionary Rates and Constraints
• Different proteins have different overall rates of evolution
• Functional necessity
• Structural necessity
• Number of protein-protein interactions
• Different regions within a protein have different rates of
evolution
• Functional constraint
• Structural constraint
Evolutionary Rate: Structure/Function Relationship
• Rates are typically slowest near the centre of a protein and
fastest on the exterior
• Distance to catalytic centre
• Hydrophobic packing of the interior
• Spatial/size constraints in interior
• More loops and alpha-helices on exterior
• How does this change for structural proteins like tubulin or
actin?
Identifying Disease Causing Genes
• Lynch Syndrome
• Autosomal dominant cancer syndrome
• Defective mismatch repair
• Increased risk of many cancers, particularly colorectal
Identifying the Gene using Evolutionary Reasoning
• Inactivation of genes known to be involved in mismatch
repair in E. coli and yeast leads to a ‘mutator’ phenotype
• Microsatellite instability observed
• Searched for homologous genes in humans based on
microsatellite instability
• Identified MLH1 and MSH2
• Sequenced genes in Lynch syndrome patients and identified
mutations
Identifying Likely Pathogenic Mutations
• Needle in a stack of needles (Exome and Genome
Sequencing)
• Each individual carries ~70 new mutations
• Can be hundreds to thousands of shared variants between small
numbers of individuals in a family
Evolutionary Profile of Pathogenic Mutations
• Highly conserved amino acids more likely to be
functionally important
• Highly conserved genes more likely to be indispensable
• Conservation alone can be misleading
• Factor in evolutionary history and relatedness of species being
compared
• Best tools use many sources of information and high-level machine
learning
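Conservation at each position of a multiple sequence alignment can be scored in many ways. The sketch below uses one of the simplest, the fraction of sequences sharing the most common residue in each column. It is illustrative only: it ignores the evolutionary history and relatedness of the species, which the points above warn is misleading on its own, and the toy alignment is made up.

```python
from collections import Counter


def column_conservation(alignment):
    """Fraction of sequences sharing the most common residue in each column.

    Gaps ("-") never count as the shared residue, so gapped columns score lower.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all sequences must be aligned to the same length")
    scores = []
    for column in zip(*alignment):
        residues = [r for r in column if r != "-"]
        if not residues:
            scores.append(0.0)
            continue
        top_count = Counter(residues).most_common(1)[0][1]
        scores.append(top_count / len(column))
    return scores


toy_alignment = ["MKDLV", "MKDIV", "MRDLV", "MKD-V"]
print(column_conservation(toy_alignment))
```

A substitution at a column scoring 1.0 would be a stronger candidate for functional impact than one at a column scoring 0.5, all else being equal.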
Gastric Cancer
o Older age of diagnosis
o Often diagnosed at later stages, as symptoms are similar to those of many common diseases
o 3rd leading cause of cancer death worldwide: 730,000 deaths per year
o 90% of cases are sporadic
o Most cases of familial clustering are due to shared environmental factors
o 60% of hereditary cases are caused by mutations in the gene CDH1
Genomic Regions | Number of Exomes | Number of Variants <5% Allele Frequency in Regions of Interest | Number With Medium or High Impact
All Affected | 3 | 14 | 0
Siblings Only | 2 | 9550 | 525
Filtering steps (figure): All Variants in Exome → Variants in Shared Regions → Variant Frequency in Population → Variant Impact → Candidates
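The filtering steps above (all variants, shared regions, population frequency, impact, candidates) can be sketched as a chain of filters. The Variant record, its field names and the example variants are hypothetical; the <5% allele frequency cutoff and the medium/high impact classes follow the table headings above.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    gene: str                # hypothetical gene label
    in_shared_region: bool   # lies in a region shared by the affected individuals
    allele_frequency: float  # frequency in a reference population
    impact: str              # "low", "medium" or "high"


def filter_candidates(variants, max_frequency=0.05):
    """Apply the three filters in order and return the remaining candidates."""
    shared = [v for v in variants if v.in_shared_region]
    rare = [v for v in shared if v.allele_frequency < max_frequency]
    return [v for v in rare if v.impact in {"medium", "high"}]


# Made-up variants, for illustration only.
exome = [
    Variant("GENE_A", True, 0.001, "high"),
    Variant("GENE_B", True, 0.200, "high"),     # too common
    Variant("GENE_C", False, 0.001, "medium"),  # not in a shared region
    Variant("GENE_D", True, 0.010, "low"),      # low predicted impact
]
candidates = filter_candidates(exome)
print([v.gene for v in candidates])
```

Each step only removes variants, so the order of the filters changes the intermediate counts but not the final candidate list.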
MAP3K6 (figure: protein domain diagram)
• Domains and sites: Protein Kinase, ATP Binding, Proton Acceptor, Coiled-Coil
• Variants: D200Y, V207G, H506Y*, P946L, P958T, F849Sfs*142
Functional Divergence
• Duplicated genes (paralogs)
• Can diverge in function as well as sequence
(Figure labels: Gene 1, Gene 1a, Gene 2)
Types of Functional Divergence
• Subfunctionalization
• Specialize and retain only a subset of ancestral function
• Neofunctionalization
• Gain a new function, lose the ancestral function
• Subneofunctionalization
• Specialize and elaborate
Glyceraldehyde-3-Phosphate Dehydrogenase (GAPDH)
• Cytosol (Glycolysis): Glyceraldehyde-3-Phosphate + NAD+ + Pi ⇌ 1,3-Bisphosphoglycerate + NADH + H+
• Plastid (Calvin Cycle): Glyceraldehyde-3-Phosphate + NADP+ + Pi ⇌ 1,3-Bisphosphoglycerate + NADPH + H+
Divergent and Convergent Evolution in GAPDH
• Many sites predicted to be functionally divergent
• 69 in the green group (GapA/B)
• 26 in GapC1
• 20 in both GapC1 and GapA/B