31
1 Population Genetics Basics

1 Population Genetics Basics. 2 Terminology review Allele Locus Diploid SNP

Embed Size (px)

Citation preview

Page 1: 1 Population Genetics Basics. 2 Terminology review Allele Locus Diploid SNP

1

Population Genetics Basics

Page 2: 1 Population Genetics Basics. 2 Terminology review Allele Locus Diploid SNP

2

Terminology review

• Allele• Locus• Diploid• SNP

Page 3: 1 Population Genetics Basics. 2 Terminology review Allele Locus Diploid SNP

3

Single Nucleotide Polymorphisms

000001010111000110100101000101010010000000110001111000000101100110

Infinite Sites Assumption:Each site mutates at most once

Page 4: 1 Population Genetics Basics. 2 Terminology review Allele Locus Diploid SNP

4

What causes variation in a population?

• Mutations (may lead to SNPs)• Recombinations• Other genetic events (gene conversion)• Structural Polymorphisms

Page 5: 1 Population Genetics Basics. 2 Terminology review Allele Locus Diploid SNP

5

Recombination

0000000011111111

00011111

Page 6: 1 Population Genetics Basics. 2 Terminology review Allele Locus Diploid SNP

6

Gene Conversion

• Gene Conversion versus crossover– Hard to distinguish

in a population

Page 7: 1 Population Genetics Basics. 2 Terminology review Allele Locus Diploid SNP

7

Structural polymorphisms

• Large scale structural changes (deletions/insertions/inversions) may occur in a population.

Page 8: 1 Population Genetics Basics. 2 Terminology review Allele Locus Diploid SNP

8

Topic 1: Basic Principles

• In a ‘stable’ population, the distribution of alleles obeys certain laws– Not really, and the deviations are

interesting• HW Equilibrium

– (due to mixing in a population)• Linkage (dis)-equilibrium

– Due to recombination

Page 9: 1 Population Genetics Basics. 2 Terminology review Allele Locus Diploid SNP

9

Hardy Weinberg equilibrium

• Consider a locus with 2 alleles, A, a• p (respectively, q) is the frequency of A

(resp. a) in the population• 3 Genotypes: AA, Aa, aa• Q: What is the frequency of each genotype

If various assumptions are satisfied, (such as random mating, no natural selection), Then• PAA=p2

• PAa=2pq• Paa=q2

Page 10: 1 Population Genetics Basics. 2 Terminology review Allele Locus Diploid SNP

10

Hardy Weinberg: why?

• Assumptions:– Diploid– Sexual reproduction– Random mating– Bi-allelic sites– Large population size, …

• Why? Each individual randomly picks his two chromosomes. Therefore, Prob. (Aa) = pq+qp = 2pq, and so on.

Page 11: 1 Population Genetics Basics. 2 Terminology review Allele Locus Diploid SNP

11

Hardy Weinberg: Generalizations

• Multiple alleles with frequencies– By HW,

• Multiple loci?

1,2,,H

Pr[homozygous genotype i] =i2

Pr[heterozygous genotype i, j] = 2i j

Page 12: 1 Population Genetics Basics. 2 Terminology review Allele Locus Diploid SNP

12

Hardy Weinberg: Implications

• The allele frequency does not change from generation to generation. Why?

• It is observed that 1 in 10,000 caucasians have the disease phenylketonuria. The disease mutation(s) are all recessive. What fraction of the population carries the mutation?

• Males are 100 times more likely to have the “red’ type of color blindness than females. Why?

• Conclusion: While the HW assumptions are rarely satisfied, the principle is still important as a baseline assumption, and significant deviations are interesting.

Page 13: 1 Population Genetics Basics. 2 Terminology review Allele Locus Diploid SNP

13

Recombination

0000000011111111

00011111

Page 14: 1 Population Genetics Basics. 2 Terminology review Allele Locus Diploid SNP

14

What if there were no recombinations?

• Life would be simpler• Each individual sequence would have a

single parent (even for higher ploidy)• The relationship is expressed as a tree.

Page 15: 1 Population Genetics Basics. 2 Terminology review Allele Locus Diploid SNP

15

The Infinite Sites Assumption

0 0 0 0 0 0 0 0

0 0 1 0 0 0 0 0

0 0 1 0 0 0 0 1 0 0 1 0 1 0 0 0

3

8 5

• The different sites are linked. A 1 in position 8 implies 0 in position 5, and vice versa.

• Some phenotypes could be linked to the polymorphisms• Some of the linkage is “destroyed” by recombination

Page 16: 1 Population Genetics Basics. 2 Terminology review Allele Locus Diploid SNP

16

Infinite sites assumption and Perfect Phylogeny

• Each site is mutated at most once in the history.

• All descendants must carry the mutated value, and all others must carry the ancestral value

i

1 in position i0 in position i

Page 17: 1 Population Genetics Basics. 2 Terminology review Allele Locus Diploid SNP

17

Perfect Phylogeny

• Assume an evolutionary model in which no recombination takes place, only mutation.

• The evolutionary history is explained by a tree in which every mutation is on an edge of the tree. All the species in one sub-tree contain a 0, and all species in the other contain a 1. Such a tree is called a perfect phylogeny.

Page 18: 1 Population Genetics Basics. 2 Terminology review Allele Locus Diploid SNP

18

The 4-gamete condition

• A column i partitions the set of species into two sets i0, and i1

• A column is homogeneous w.r.t a set of species, if it has the same value for all species. Otherwise, it is heterogenous.

• EX: i is heterogenous w.r.t {A,D,E}

iA 0B 0C 0D 1E 1F 1

i0

i1

Page 19: 1 Population Genetics Basics. 2 Terminology review Allele Locus Diploid SNP

19

4 Gamete Condition

• 4 Gamete Condition– There exists a perfect phylogeny if and only if for all

pair of columns (i,j), either j is not heterogenous w.r.t i0, or i1.

– Equivalent to– There exists a perfect phylogeny if and only if for all

pairs of columns (i,j), the following 4 rows do not exist

(0,0), (0,1), (1,0), (1,1)

Page 20: 1 Population Genetics Basics. 2 Terminology review Allele Locus Diploid SNP

20

4-gamete condition: proof

• Depending on which edge the mutation j occurs, either i0, or i1 should be homogenous.

• (only if) Every perfect phylogeny satisfies the 4-gamete condition

• (if) If the 4-gamete condition is satisfied, does a prefect phylogeny exist?

i0 i1

i

Page 21: 1 Population Genetics Basics. 2 Terminology review Allele Locus Diploid SNP

21

An algorithm for constructing a perfect phylogeny

• We will consider the case where 0 is the ancestral state, and 1 is the mutated state. This will be fixed later.

• In any tree, each node (except the root) has a single parent.– It is sufficient to construct a parent for every node.

• In each step, we add a column and refine some of the nodes containing multiple children.

• Stop if all columns have been considered.

Page 22: 1 Population Genetics Basics. 2 Terminology review Allele Locus Diploid SNP

22

Inclusion Property

• For any pair of columns i,j– i < j if and only if i1

j1 • Note that if i<j then the

edge containing i is an ancestor of the edge containing i

i

j

Page 23: 1 Population Genetics Basics. 2 Terminology review Allele Locus Diploid SNP

23

Example

1 2 3 4 5A 1 1 0 0 0B 0 0 1 0 0C 1 1 0 1 0D 0 0 1 0 1E 1 0 0 0 0

r

A B C D E

Initially, there is a single clade r, and each node has r as its parent

Page 24: 1 Population Genetics Basics. 2 Terminology review Allele Locus Diploid SNP

24

Sort columns

• Sort columns according to the inclusion property (note that the columns are already sorted here).

• This can be achieved by considering the columns as binary representations of numbers (most significant bit in row 1) and sorting in decreasing order 1 2 3 4 5

A 1 1 0 0 0B 0 0 1 0 0C 1 1 0 1 0D 0 0 1 0 1E 1 0 0 0 0

Page 25: 1 Population Genetics Basics. 2 Terminology review Allele Locus Diploid SNP

25

Add first column

• In adding column i– Check each edge

and decide which side you belong.

– Finally add a node if you can resolve a clade

r

A BC DE

1 2 3 4 5

A 1 1 0 0 0B 0 0 1 0 0C 1 1 0 1 0D 0 0 1 0 1E 1 0 0 0 0

u

Page 26: 1 Population Genetics Basics. 2 Terminology review Allele Locus Diploid SNP

26

Adding other columns

• Add other columns on edges using the ordering property

r

E B

C

D

A

1 2 3 4 5

A 1 1 0 0 0B 0 0 1 0 0C 1 1 0 1 0D 0 0 1 0 1E 1 0 0 0 0

1

2

4

3

5

Page 27: 1 Population Genetics Basics. 2 Terminology review Allele Locus Diploid SNP

27

Unrooted case

• Switch the values in each column, so that 0 is the majority element.

• Apply the algorithm for the rooted case

Page 28: 1 Population Genetics Basics. 2 Terminology review Allele Locus Diploid SNP

28

Handling recombination

• A tree is not sufficient as a sequence may have 2 parents

• Recombination leads to loss of correlation between columns

Page 29: 1 Population Genetics Basics. 2 Terminology review Allele Locus Diploid SNP

29

Linkage (Dis)-equilibrium (LD)

• Consider sites A &B• Case 1: No

recombination– Pr[A,B=0,1] = 0.25

• Linkage disequilibrium

• Case 2:Extensive recombination– Pr[A,B=(0,1)=0.125

• Linkage equilibrium

A B0 10 10 00 01 01 01 01 0

Page 30: 1 Population Genetics Basics. 2 Terminology review Allele Locus Diploid SNP

30

Handling recombination

• A tree is not sufficient as a sequence may have 2 parents

• Recombination leads to loss of correlation between columns

Page 31: 1 Population Genetics Basics. 2 Terminology review Allele Locus Diploid SNP

31

Recombination, and populations

• Think of a population of N individual chromosomes.

• The population remains stable from generation to generation.

• Without recombination, each individual has exactly one parent chromosome from the previous generation.

• With recombinations, each individual is derived from one or two parents.

• We will formalize this notion later in the context of coalescent theory.