Sampling distributions of alleles under models of neutral evolution

1. Genetic drift and mutation
2. Coalescent
3. Pairwise differences and numbers of segregating sites
4. Population with time-varying size

Mathematical model for sampling distributions of alleles

Genetic drift Mutation

Genetic drift

Alleles:

A1: A2:

Replication = sampling with replacement

A1 – becomes fixed

A2 – becomes lost
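
The replication rule above (sampling with replacement until one allele is fixed and the other lost) can be sketched as a short simulation. This is a minimal sketch, assuming a fixed number of gene copies, two alleles and no mutation; the function name `wright_fisher` and its parameters are illustrative and do not come from the slides.

```python
import random

def wright_fisher(num_copies, initial_a1, rng, max_generations=100_000):
    """Simulate drift for two alleles until A1 is fixed or lost.

    Each generation, every gene copy draws its parent uniformly at random,
    with replacement, from the previous generation.
    Returns (final number of A1 copies, generations simulated).
    """
    count_a1 = initial_a1
    generations = 0
    while 0 < count_a1 < num_copies and generations < max_generations:
        freq = count_a1 / num_copies
        # A copy is A1 with probability equal to the current A1 frequency.
        count_a1 = sum(1 for _ in range(num_copies) if rng.random() < freq)
        generations += 1
    return count_a1, generations

rng = random.Random(1)
final_count, gens = wright_fisher(num_copies=20, initial_a1=10, rng=rng)
outcome = "fixed" if final_count == 20 else "lost"
print(f"A1 became {outcome} after {gens} generations")
```

Rerunning with different seeds shows that, without mutation, every run ends with A1 either fixed or lost.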

Mutation

Mutation introduces genetic variability to the evolution process

Mutation follows a Poisson process with intensity μ measured per locus (or per site) per generation. A mutation model is further specified by describing where mutations occur and what effects they cause. The most often applied models are: the infinite sites model, where it is assumed that each mutation takes place at a DNA site that never mutated before; the infinite alleles model, where each mutation produces an allele never present in the population before; the recurrent mutation model, where multiple changes of the nucleotide at a site are possible; and the stepwise mutation model, where mutation acts bidirectionally, increasing or reducing the number of repeats of a fixed DNA motif.
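
As an illustration of one of the models listed above, the stepwise mutation model can be sketched as follows. This is only a sketch under stated assumptions: mutation events arrive as a Poisson process with intensity `mu` per generation, each event changes the repeat count by exactly one, up and down steps are equally likely, and the count never drops below one. None of these specifics, nor the function name, come from the slides.

```python
import random

def stepwise_mutation(repeats, mu, generations, rng):
    """Evolve the repeat count of one lineage under a stepwise mutation model.

    Waiting times between mutations are exponential with rate mu, so the
    number of events in `generations` is Poisson distributed.
    Returns (final repeat count, number of mutation events).
    """
    events = 0
    waiting = rng.expovariate(mu)            # time of the first mutation
    while waiting <= generations:
        step = rng.choice((-1, 1))           # assumption: symmetric steps
        repeats = max(1, repeats + step)     # assumption: at least one repeat
        events += 1
        waiting += rng.expovariate(mu)       # time of the next mutation
    return repeats, events

rng = random.Random(7)
final_repeats, num_events = stepwise_mutation(
    repeats=20, mu=0.01, generations=1000, rng=rng
)
print(f"{num_events} mutations, final repeat count {final_repeats}")
```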

Infinite sites model

Mutation configuration in the infinite sites model is fully described by a map between numbers of sequences and numbers of mutations

[Figure: sample sequences with mutations 1 to 6 marked]

Statistics of mutations (segregating sites)

Number of segregating sites

[Figure: sample sequences with mutations 1 to 6 marked]

Pairwise differences

[Figure: sample sequences with mutations 1 to 6 marked]

Number of differences: d23 = 3

Average number of pairwise differences = 3
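
The two statistics above can be computed directly from a table of sequences against mutations. The sketch below uses a small hypothetical sample (it is not the sample drawn on the slide, so the numbers differ from d23 = 3 and the average of 3), with rows as sequences, columns as mutations, and 1 marking a sequence that carries the mutation.

```python
from itertools import combinations

# Hypothetical sample: 4 sequences (rows) and 6 mutations (columns).
sample = [
    [1, 0, 0, 1, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 1, 0, 1, 0],
    [0, 0, 1, 0, 0, 1],
]

def segregating_sites(rows):
    """Count the sites at which the sample is polymorphic."""
    return sum(1 for col in zip(*rows) if 0 < sum(col) < len(rows))

def pairwise_differences(rows):
    """Return {(i, j): number of differing sites}, sequences numbered from 1."""
    return {
        (i + 1, j + 1): sum(a != b for a, b in zip(rows[i], rows[j]))
        for i, j in combinations(range(len(rows)), 2)
    }

num_sites = segregating_sites(sample)
diffs = pairwise_differences(sample)
mean_diff = sum(diffs.values()) / len(diffs)
print(num_sites, diffs, round(mean_diff, 3))
```

The dictionary `diffs` is also the raw material for a histogram of pairwise differences.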

Histogram of pairwise differences

[Figure: histogram; horizontal axis: number of differences, 0 to 6]

Classes of mutations

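[Figure: sample sequences with mutations 1 to 6 marked; one mutation is labelled as a mutation of class 2]

Histogram of classes of mutations

[Figure: histogram; horizontal axis: class of mutation]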
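
A histogram of mutation classes can be tabulated as in the sketch below. It assumes that the class of a mutation is the number of sequences in the sample that carry it, which is the usual reading but is not spelled out on the slide; the sample is hypothetical.

```python
from collections import Counter

# Hypothetical sample: rows are sequences, columns are mutations (1 = carrier).
sample = [
    [1, 0, 0, 1, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 1, 0, 1, 0],
    [0, 0, 1, 0, 0, 1],
]

def mutation_classes(rows):
    """Return, for each mutation, the number of sequences carrying it."""
    return [sum(col) for col in zip(*rows)]

classes = mutation_classes(sample)
histogram = Counter(classes)       # class -> number of mutations in that class
for mutation_class in sorted(histogram):
    print(mutation_class, histogram[mutation_class])
```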

Coalescence method

One looks at the past of an n-sample of sequences taken at present. Possible events that happen in the past are coalescences, leading to common ancestors of sequences, and mutations along branches of the ancestral tree.

Coalescence method

[Figure: ancestry of an n-sample traced back from the present through generation 1 (τ = 1), generation 2 (τ = 2), ..., generation k (τ = k)]

Coalescence – pairwise statistics

Two sequences. For each sequence, randomly draw a parent in generation 1 (τ = 1); then for each parent randomly draw a (grand)parent in generation 2 (τ = 2), and so on. $p_2(i)$ is the probability that a common ancestor of the two sequences lived in generation $i$ (τ = i):

$p_2(i) = \frac{1}{2N}\left(1 - \frac{1}{2N}\right)^{i-1}$
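
The drawing procedure above can be checked by simulation. The sketch assumes a population of 2N gene copies in every generation and a geometric law for the generation of the common ancestor; the function names are illustrative.

```python
import random

def coalescence_probability(i, pop_size):
    """P(common ancestor of two sequences lived in generation i)."""
    p = 1.0 / (2 * pop_size)
    return p * (1.0 - p) ** (i - 1)

def simulate_coalescence_generation(pop_size, rng):
    """Trace two lineages back until both draw the same parent."""
    num_copies = 2 * pop_size
    generation = 1
    # Each lineage draws its parent uniformly among the 2N gene copies.
    while rng.randrange(num_copies) != rng.randrange(num_copies):
        generation += 1
    return generation

rng = random.Random(42)
pop_size = 10
draws = [simulate_coalescence_generation(pop_size, rng) for _ in range(20_000)]
mean_generation = sum(draws) / len(draws)
print(f"simulated mean {mean_generation:.2f}, expected {2 * pop_size}")
```

The simulated mean should be close to 2N, the mean of the geometric distribution.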

Coalescence – continuous time approximations

Population time scale: 1 unit = 2N generations

$p_2(t) = e^{-t}$

Mutational time scale: 1 unit = 1/(2μ) generations

$t = 2\mu\tau$, $p_2(t) = \frac{1}{4\mu N} e^{-t/(4\mu N)}$
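
A quick numerical check shows how good the continuous-time approximation is on the population time scale. The sketch compares the discrete probability that two sequences have not yet coalesced after 2N·t generations with the exponential value; the function name is illustrative.

```python
import math

def survival_discrete(t, pop_size):
    """P(no common ancestor within 2N*t generations) in the discrete model."""
    generations = int(round(2 * pop_size * t))
    return (1.0 - 1.0 / (2 * pop_size)) ** generations

for pop_size in (10, 100, 10_000):
    discrete = survival_discrete(1.0, pop_size)
    print(f"N={pop_size:>6}: discrete {discrete:.6f}, exp(-1) {math.exp(-1.0):.6f}")
```

The two values agree more closely as N grows, which is why the approximation is used for large populations.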

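Coalescence n-sample

$T_k$: independent, exponentially distributed random variables

μ: mutation intensity

N: population's effective size

θ = 4μN: product parameter

t = 2μτ: mutational time scale (τ is time in number of generations)

Coalescence times $(T_n, \ldots, T_2)$: $T_k$ is the time during which the sample has $k$ ancestors, with density $p_k(t_k) = \frac{\binom{k}{2}}{\theta} e^{-\binom{k}{2} t_k / \theta}$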
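
Coalescence times for an n-sample can be drawn as in the sketch below. It assumes the standard constant-size coalescent on the mutational time scale, in which the time spent with k ancestors is exponential with rate C(k,2)/θ; the function name and parameter values are illustrative.

```python
import random
from math import comb

def coalescence_times(sample_size, theta, rng):
    """Return {k: T_k}, the time during which the sample has k ancestors."""
    return {
        k: rng.expovariate(comb(k, 2) / theta)
        for k in range(sample_size, 1, -1)
    }

rng = random.Random(3)
sample_size, theta = 5, 2.0
heights = [
    sum(coalescence_times(sample_size, theta, rng).values())
    for _ in range(20_000)
]
mean_height = sum(heights) / len(heights)
expected_height = 2 * theta * (1 - 1 / sample_size)
print(f"mean tree height {mean_height:.3f}, expected {expected_height:.3f}")
```

The expected tree height follows from summing the means θ/C(k,2) over k = 2, ..., n.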

Coalescence method

The use of coalescence theory allows efficient formulation of appropriate models and gives a good basis for approaching model analysis problems, like hypotheses testing or parameter estimation.


Independence of metrics (coalescence times) and topology

Topologies of trees (with ordered branches) are all equally probable.

Metrics (distributions of branch lengths) of trees are determined by the coalescence process which, in turn, depends on population parameters.

Coalescence – statistics of pairwise differences

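Assume the mutational time scale. Then mutations occur with intensity μ = 1/2. Let $A_2$ denote a Z+-valued random variable defined as the number of segregating sites between sequence 1 and sequence 2, and let $T$ be the random variable given by the coalescence time $t$. The conditional probability that $A_2 = n$ is Poisson with λ = t:

$P[A_2 = n \mid T = t] = \frac{e^{-t} t^n}{n!}$

Probability generating function:

$\psi_2(s) = \sum_{n=0}^{\infty} s^n P[A_2 = n]$

$\psi_2(s \mid T = t) = e^{t(s-1)}$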
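
For a population of constant size, averaging the Poisson law over the coalescence time gives a geometric distribution of pairwise differences. The sketch below assumes that on the mutational time scale the coalescence time of two sequences is exponential with mean θ = 4μN, which yields P[A2 = n] = θ^n / (1 + θ)^(n+1); it then checks the formula by simulation. Function names are illustrative.

```python
import random

def pairwise_difference_pmf(n, theta):
    """P[A2 = n] for constant population size (geometric distribution)."""
    return theta ** n / (1.0 + theta) ** (n + 1)

def simulate_pairwise_differences(theta, rng):
    """Draw T ~ Exp(mean theta), then a Poisson(T) number of differences."""
    t = rng.expovariate(1.0 / theta)
    count = 0
    elapsed = rng.expovariate(1.0)     # unit-rate arrivals falling in [0, t]
    while elapsed < t:
        count += 1
        elapsed += rng.expovariate(1.0)
    return count

rng = random.Random(11)
theta = 3.0
draws = [simulate_pairwise_differences(theta, rng) for _ in range(20_000)]
mean_differences = sum(draws) / len(draws)
print(f"simulated mean {mean_differences:.3f}, theta {theta}")
```

The mean number of pairwise differences equals θ in this setting, which is what the simulation should reproduce.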

Coalescence – population with time varying size

Population with time-varying size

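Population's effective size N(t) changes in time; the product parameter is then also a function of time, θ(t) = 4μN(t)

Joint probability density function:

$p(t_n, \ldots, t_2) = \prod_{k=2}^{n} \frac{\binom{k}{2}}{\theta(s_k)} \exp\left(-\int_{s_{k+1}}^{s_k} \frac{\binom{k}{2}}{\theta(\sigma)}\, d\sigma\right)$, where $s_k = t_n + \cdots + t_k$ and $s_{n+1} = 0$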
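
For two sequences, the effect of a time-varying θ(t) on the distribution of pairwise differences can be computed numerically. The sketch below assumes the pairwise coalescence density p(t) = (1/θ(t)) exp(-∫ dσ/θ(σ)), with the integral taken from 0 to t, and a Poisson number of differences with mean t given T = t; it integrates with a simple midpoint rule. The function name, the grid settings and the step-shaped history are illustrative assumptions, not taken from the slides.

```python
import math

def pairwise_difference_distribution(theta_of_t, max_n, t_max=60.0, steps=60_000):
    """Return [P(A2 = 0), ..., P(A2 = max_n)] for a history theta_of_t(t)."""
    dt = t_max / steps
    probs = [0.0] * (max_n + 1)
    cumulative = 0.0                      # integral of 1/theta from 0 to cell start
    for i in range(steps):
        t = (i + 0.5) * dt                # midpoint of the grid cell
        rate = 1.0 / theta_of_t(t)
        density = rate * math.exp(-(cumulative + 0.5 * rate * dt))
        poisson = math.exp(-t)            # Poisson(t) probability of n = 0
        for n in range(max_n + 1):
            probs[n] += poisson * density * dt
            poisson *= t / (n + 1)        # move on to the probability of n + 1
        cumulative += rate * dt
    return probs

constant = pairwise_difference_distribution(lambda t: 3.0, max_n=15)

def step_history(t):
    """Hypothetical history: large theta recently, small theta in the past."""
    return 10.0 if t < 2.0 else 1.0

stepwise = pairwise_difference_distribution(step_history, max_n=15)
print([round(p, 4) for p in constant[:5]])
print([round(p, 4) for p in stepwise[:5]])
```

With constant θ the result reduces to the geometric distribution θ^n / (1 + θ)^(n+1), which gives a simple sanity check of the numerical scheme.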

How is the history of population size N(t), θ(t) encoded in histograms of pairwise differences and mutation classes?

Pairwise differences

[Figures: Pairwise differences I, II, III. Each panel pairs a population size history plotted against time t with a histogram of the number of pairwise differences.]

Mutation classes

Frequencies are computed under the assumption that mutation intensity is low
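
For the constant-size case, class frequencies can be computed in closed form. The sketch below assumes the standard neutral result for a population of constant size under the infinite sites model, where the expected number of mutations of class b (carried by b of the n sequences) is proportional to 1/b; whether the slides use exactly this normalisation is an assumption.

```python
def expected_class_frequencies(sample_size):
    """Return expected frequencies of mutation classes 1 .. n-1 (N(t) = const)."""
    weights = [1.0 / b for b in range(1, sample_size)]
    total = sum(weights)
    return [w / total for w in weights]

frequencies = expected_class_frequencies(10)
for mutation_class, freq in enumerate(frequencies, start=1):
    print(mutation_class, round(freq, 4))
```

Deviations from this 1/b shape are what carry information about changes in population size.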

[Figures: Mutation classes I, II, III. Each panel pairs a population size history plotted against time t with a histogram over SNP type (1 to 10). Panel annotations: N(t) = const; N(t) = N0 exp(rt) with N0 r = 10.]

Conclusions

Different histories of population sizes lead to different sampling distributions of alleles

Parametric models of different form (exponential, stepwise, logistic) can lead to similar (difficult to distinguish) distributions of alleles

Estimation of population size history from DNA data can be unstable

Models versus data

Parametric and nonparametric estimation of population size histories from DNA samples

Testing hypotheses on values of parameters under parametric models, testing hypotheses of time-constant versus time-varying scenario

Models versus data

[Figure: histogram of pairwise differences] Data on worldwide distribution of mtDNA pairwise differences, R. Cann et al. 1987

Estimation of history of human population size

Models versus data II

[Figure] Histogram of classes of mutations. Data on worldwide distribution of mtDNA pairwise differences, R. Cann et al. 1987

Models versus data III

[Figures] Data on types of 44 SNPs randomly located in the genome, Picoult, Newberg 2000

Parametric estimates of N(t) based on the above data
