Upload
prosper-melton
View
212
Download
0
Embed Size (px)
Citation preview
1
Population Genetics Basics
2
Terminology review
• Allele• Locus• Diploid• SNP
3
Single Nucleotide Polymorphisms
000001010111000110100101000101010010000000110001111000000101100110
Infinite Sites Assumption:Each site mutates at most once
4
What causes variation in a population?
• Mutations (may lead to SNPs)• Recombinations• Other genetic events (gene conversion)• Structural Polymorphisms
5
Recombination
0000000011111111
00011111
6
Gene Conversion
• Gene Conversion versus crossover– Hard to distinguish
in a population
7
Structural polymorphisms
• Large scale structural changes (deletions/insertions/inversions) may occur in a population.
8
Topic 1: Basic Principles
• In a ‘stable’ population, the distribution of alleles obeys certain laws– Not really, and the deviations are
interesting• HW Equilibrium
– (due to mixing in a population)• Linkage (dis)-equilibrium
– Due to recombination
9
Hardy Weinberg equilibrium
• Consider a locus with 2 alleles, A, a• p (respectively, q) is the frequency of A
(resp. a) in the population• 3 Genotypes: AA, Aa, aa• Q: What is the frequency of each genotype
If various assumptions are satisfied, (such as random mating, no natural selection), Then• PAA=p2
• PAa=2pq• Paa=q2
10
Hardy Weinberg: why?
• Assumptions:– Diploid– Sexual reproduction– Random mating– Bi-allelic sites– Large population size, …
• Why? Each individual randomly picks his two chromosomes. Therefore, Prob. (Aa) = pq+qp = 2pq, and so on.
11
Hardy Weinberg: Generalizations
• Multiple alleles with frequencies– By HW,
• Multiple loci?
1,2,,H
Pr[homozygous genotype i] =i2
Pr[heterozygous genotype i, j] = 2i j
12
Hardy Weinberg: Implications
• The allele frequency does not change from generation to generation. Why?
• It is observed that 1 in 10,000 caucasians have the disease phenylketonuria. The disease mutation(s) are all recessive. What fraction of the population carries the mutation?
• Males are 100 times more likely to have the “red’ type of color blindness than females. Why?
• Conclusion: While the HW assumptions are rarely satisfied, the principle is still important as a baseline assumption, and significant deviations are interesting.
13
Recombination
0000000011111111
00011111
14
What if there were no recombinations?
• Life would be simpler• Each individual sequence would have a
single parent (even for higher ploidy)• The relationship is expressed as a tree.
15
The Infinite Sites Assumption
0 0 0 0 0 0 0 0
0 0 1 0 0 0 0 0
0 0 1 0 0 0 0 1 0 0 1 0 1 0 0 0
3
8 5
• The different sites are linked. A 1 in position 8 implies 0 in position 5, and vice versa.
• Some phenotypes could be linked to the polymorphisms• Some of the linkage is “destroyed” by recombination
16
Infinite sites assumption and Perfect Phylogeny
• Each site is mutated at most once in the history.
• All descendants must carry the mutated value, and all others must carry the ancestral value
i
1 in position i0 in position i
17
Perfect Phylogeny
• Assume an evolutionary model in which no recombination takes place, only mutation.
• The evolutionary history is explained by a tree in which every mutation is on an edge of the tree. All the species in one sub-tree contain a 0, and all species in the other contain a 1. Such a tree is called a perfect phylogeny.
18
The 4-gamete condition
• A column i partitions the set of species into two sets i0, and i1
• A column is homogeneous w.r.t a set of species, if it has the same value for all species. Otherwise, it is heterogenous.
• EX: i is heterogenous w.r.t {A,D,E}
iA 0B 0C 0D 1E 1F 1
i0
i1
19
4 Gamete Condition
• 4 Gamete Condition– There exists a perfect phylogeny if and only if for all
pair of columns (i,j), either j is not heterogenous w.r.t i0, or i1.
– Equivalent to– There exists a perfect phylogeny if and only if for all
pairs of columns (i,j), the following 4 rows do not exist
(0,0), (0,1), (1,0), (1,1)
20
4-gamete condition: proof
• Depending on which edge the mutation j occurs, either i0, or i1 should be homogenous.
• (only if) Every perfect phylogeny satisfies the 4-gamete condition
• (if) If the 4-gamete condition is satisfied, does a prefect phylogeny exist?
i0 i1
i
21
An algorithm for constructing a perfect phylogeny
• We will consider the case where 0 is the ancestral state, and 1 is the mutated state. This will be fixed later.
• In any tree, each node (except the root) has a single parent.– It is sufficient to construct a parent for every node.
• In each step, we add a column and refine some of the nodes containing multiple children.
• Stop if all columns have been considered.
22
Inclusion Property
• For any pair of columns i,j– i < j if and only if i1
j1 • Note that if i<j then the
edge containing i is an ancestor of the edge containing i
i
j
23
Example
1 2 3 4 5A 1 1 0 0 0B 0 0 1 0 0C 1 1 0 1 0D 0 0 1 0 1E 1 0 0 0 0
r
A B C D E
Initially, there is a single clade r, and each node has r as its parent
24
Sort columns
• Sort columns according to the inclusion property (note that the columns are already sorted here).
• This can be achieved by considering the columns as binary representations of numbers (most significant bit in row 1) and sorting in decreasing order 1 2 3 4 5
A 1 1 0 0 0B 0 0 1 0 0C 1 1 0 1 0D 0 0 1 0 1E 1 0 0 0 0
25
Add first column
• In adding column i– Check each edge
and decide which side you belong.
– Finally add a node if you can resolve a clade
r
A BC DE
1 2 3 4 5
A 1 1 0 0 0B 0 0 1 0 0C 1 1 0 1 0D 0 0 1 0 1E 1 0 0 0 0
u
26
Adding other columns
• Add other columns on edges using the ordering property
r
E B
C
D
A
1 2 3 4 5
A 1 1 0 0 0B 0 0 1 0 0C 1 1 0 1 0D 0 0 1 0 1E 1 0 0 0 0
1
2
4
3
5
27
Unrooted case
• Switch the values in each column, so that 0 is the majority element.
• Apply the algorithm for the rooted case
28
Handling recombination
• A tree is not sufficient as a sequence may have 2 parents
• Recombination leads to loss of correlation between columns
29
Linkage (Dis)-equilibrium (LD)
• Consider sites A &B• Case 1: No
recombination– Pr[A,B=0,1] = 0.25
• Linkage disequilibrium
• Case 2:Extensive recombination– Pr[A,B=(0,1)=0.125
• Linkage equilibrium
A B0 10 10 00 01 01 01 01 0
30
Handling recombination
• A tree is not sufficient as a sequence may have 2 parents
• Recombination leads to loss of correlation between columns
31
Recombination, and populations
• Think of a population of N individual chromosomes.
• The population remains stable from generation to generation.
• Without recombination, each individual has exactly one parent chromosome from the previous generation.
• With recombinations, each individual is derived from one or two parents.
• We will formalize this notion later in the context of coalescent theory.