Sampling distributions of alleles under models of neutral evolution

Page 1: Sampling distributions of alleles under models of neutral evolution

Sampling distributions of alleles under models of neutral evolution

Page 2: Sampling distributions of alleles under models of neutral evolution

Plan

1. Genetic drift and mutation
2. Coalescent
3. Pairwise differences and numbers of segregating sites
4. Population with time-varying size

Page 3: Sampling distributions of alleles under models of neutral evolution

Mathematical model for sampling distributions of alleles

Genetic drift

Mutation

Page 4: Sampling distributions of alleles under models of neutral evolution

Genetic drift

Alleles: A1, A2

Replication = sampling with replacement

A1 becomes fixed

A2 becomes lost

[Figure: successive generations G1, G2, ..., Gn.]
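The statement "replication = sampling with replacement" can be illustrated with a small simulation. The sketch below assumes a Wright-Fisher style scheme with a fixed number of gene copies per generation and no mutation; the function name `wright_fisher` and the parameter values are illustrative choices, not taken from the slides.

```python
import random


def wright_fisher(n_copies, start_a1, rng, max_generations=100_000):
    """Follow the number of A1 copies until A1 is fixed or lost.

    n_copies: gene copies per generation (assumed constant).
    start_a1: number of A1 copies in the first generation.
    """
    count = start_a1
    trajectory = [count]
    for _ in range(max_generations):
        if count == 0 or count == n_copies:
            break  # A1 is lost (0) or fixed (n_copies)
        freq = count / n_copies
        # Every copy in the next generation picks its parent uniformly at
        # random, with replacement, so it is A1 with probability `freq`.
        count = sum(1 for _ in range(n_copies) if rng.random() < freq)
        trajectory.append(count)
    return trajectory


rng = random.Random(1)
runs = [wright_fisher(20, 10, rng) for _ in range(2000)]
fixed = sum(1 for t in runs if t[-1] == 20)
print("fraction of runs in which A1 became fixed:", fixed / len(runs))
```

Starting from equal numbers of A1 and A2, the fraction of runs in which A1 becomes fixed should be close to one half, since neither allele is favoured.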

Page 5: Sampling distributions of alleles under models of neutral evolution

Mutation

[Figure: generations Gk and Gk+1.]

Mutation introduces genetic variability to the evolution process

Page 6: Sampling distributions of alleles under models of neutral evolution

Mutation

Mutation follows a Poisson process with intensity μ, measured per locus (per site) per generation. A mutation model is further specified by where mutations occur and what effects they have. The most commonly applied models are:

- infinite sites model, where each mutation is assumed to take place at a DNA site that has never mutated before;
- infinite alleles model, where each mutation produces an allele never before present in the population;
- recurrent mutation model, where multiple changes of the nucleotide at a site are possible;
- stepwise mutation model, where mutation acts bidirectionally, increasing or reducing the number of repeats of a fixed DNA motif.

Page 7: Sampling distributions of alleles under models of neutral evolution

Infinite sites model

The mutation configuration in the infinite sites model is fully described by a map between sequence numbers and mutation numbers.

[Figure: sequences 1 to 5 plotted against mutations 1 to 6.]

Page 8: Sampling distributions of alleles under models of neutral evolution

Statistics of mutations (segregating sites)

Page 9: Sampling distributions of alleles under models of neutral evolution

Number of segregating sites

[Figure: sequences 1 to 5 plotted against mutations 1 to 6.]

S = 6

Page 10: Sampling distributions of alleles under models of neutral evolution

Pairwise differences

[Figure: sequences 1 to 5 plotted against mutations 1 to 6.]

Number of differences: d23 = 3

Average number of pairwise differences = 3

Page 11: Sampling distributions of alleles under models of neutral evolution

Histogram of pairwise differences

[Figure: histogram; horizontal axis: number of differences (0 to 6); vertical axis: number of pairs (0 to 3).]

Page 12: Sampling distributions of alleles under models of neutral evolution

Classes of mutations

[Figure: sequences 1 to 5 plotted against mutations 1 to 6, with one mutation marked as a mutation of class 2.]
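The three statistics introduced on these slides (number of segregating sites, pairwise differences, classes of mutations) can all be computed from a 0/1 table of sequences against mutations. The sketch below uses a hypothetical table, chosen only so that it agrees with the summary values quoted on the slides (S = 6, d23 = 3, average of 3); it is not the table from the original figure. It also assumes that the class of a mutation is the number of sequences in the sample that carry it.

```python
from collections import Counter
from itertools import combinations

# Hypothetical table: rows are sequences 1-5, columns are mutations 1-6.
# A 1 means that the sequence carries the mutation.
table = [
    [1, 1, 0, 1, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0],
    [0, 0, 1, 0, 1, 0],
    [0, 0, 0, 0, 0, 1],
]
n = len(table)

# Class of a mutation (assumed definition): number of sequences carrying it.
classes = [sum(row[j] for row in table) for j in range(len(table[0]))]

# A site is segregating if some, but not all, sequences carry the mutation.
seg_sites = sum(1 for c in classes if 0 < c < n)

# Number of differences between every pair of sequences (1-based labels).
diffs = {
    (a + 1, b + 1): sum(x != y for x, y in zip(table[a], table[b]))
    for a, b in combinations(range(n), 2)
}
mean_diff = sum(diffs.values()) / len(diffs)

# Relative frequency of each mutation class.
class_freq = {c: k / seg_sites for c, k in sorted(Counter(classes).items())}

print("S =", seg_sites)                # S = 6
print("d23 =", diffs[(2, 3)])          # d23 = 3
print("average =", mean_diff)          # average = 3.0
print("class frequencies:", class_freq)  # {1: 0.5, 2: 0.5}
```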

Page 13: Sampling distributions of alleles under models of neutral evolution

Histogram of classes of mutations

[Figure: histogram; horizontal axis: class of mutation (1, 2); vertical axis: frequency (0 to 1).]

Page 14: Sampling distributions of alleles under models of neutral evolution

Coalescence method

One looks at the past of an n-sample of sequences taken at present. The possible events in the past are coalescences, which lead to common ancestors of sequences, and mutations along branches of the ancestral tree.

Page 15: Sampling distributions of alleles under models of neutral evolution

Coalescence method

[Figure: an n-sample taken at present is traced back into the past through generation 1 ($\tau = 1$), generation 2 ($\tau = 2$), ..., generation k ($\tau = k$); the population size is 2N in every generation.]

Page 16: Sampling distributions of alleles under models of neutral evolution

Coalescence – pairwise statistics

Two sequences. For each sequence, draw a parent at random in generation 1 ($\tau = 1$); then, for each parent, draw a (grand)parent at random in generation 2 ($\tau = 2$), and so on. $\pi_2(i)$ is the probability that the common ancestor of the two sequences lived in generation $i$ ($\tau = i$).

Page 17: Sampling distributions of alleles under models of neutral evolution

$$\pi_2(1) = \frac{1}{2N}$$

$$\pi_2(2) = \left(1 - \frac{1}{2N}\right)\frac{1}{2N}$$

$$\pi_2(k) = \left(1 - \frac{1}{2N}\right)^{k-1}\frac{1}{2N}$$
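The parent-drawing procedure for two sequences can be simulated directly. The sketch below assumes that each sequence picks its parent uniformly among 2N gene copies; the names `coalescence_generation`, `geometric_pmf` and `two_n` are illustrative. It compares the simulated mean with the mean 2N of the geometric distribution, and the geometric probability with its exponential approximation on the scale where one unit is 2N generations.

```python
import math
import random


def coalescence_generation(two_n, rng):
    """Trace two sequences back; return the generation of their common ancestor."""
    generation = 0
    while True:
        generation += 1
        # Each of the two lineages picks a parent among the 2N copies.
        # They coalesce when both pick the same parent.
        if rng.randrange(two_n) == rng.randrange(two_n):
            return generation


def geometric_pmf(k, two_n):
    """Probability that the common ancestor lived exactly k generations back."""
    return (1 - 1 / two_n) ** (k - 1) / two_n


two_n = 50
rng = random.Random(7)
draws = [coalescence_generation(two_n, rng) for _ in range(20000)]
mean_generation = sum(draws) / len(draws)
print("simulated mean:", mean_generation, "geometric mean:", two_n)

# Exponential approximation: with t = k / (2N), pmf(k) is close to exp(-t) / (2N).
k = 25
exact = geometric_pmf(k, two_n)
approx = math.exp(-k / two_n) / two_n
print("geometric:", exact, "exponential approximation:", approx)
```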

Page 18: Sampling distributions of alleles under models of neutral evolution

Coalescence – continuous time approximations

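Population time scale: 1 unit = 2N generations ($\tau$ is time measured in generations)

$$t = \frac{\tau}{2N}, \qquad p_2(t) = e^{-t}$$

Mutational time scale: 1 unit = 1/(2μ) generations

$$t = 2\mu\tau, \qquad \theta = 4\mu N, \qquad p_2(t) = \frac{1}{\theta}\, e^{-t/\theta}$$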

Page 19: Sampling distributions of alleles under models of neutral evolution

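Coalescence: n-sample

- $S_k$, $k = 2, \dots, n$: independent, exponentially distributed random variables; $S_k$ is the time during which the sample has $k$ ancestors
- $\mu$: mutation intensity
- $N$: population's effective size
- $\theta = 4\mu N$: product parameter
- $t = 2\mu\tau$: mutational time scale ($\tau$ is time in number of generations)

$$p(s_2, \dots, s_n) = \prod_{k=2}^{n} \frac{\binom{k}{2}}{\theta} \exp\left(-\frac{\binom{k}{2}}{\theta}\, s_k\right)$$

$$p(s_k) = \frac{\binom{k}{2}}{\theta} \exp\left(-\frac{\binom{k}{2}}{\theta}\, s_k\right)$$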
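Because the times between successive coalescences are independent exponential variables, a genealogy's branch lengths are easy to sample. The sketch below assumes the mutational time scale, on which the waiting time with k ancestors has rate C(k, 2)/θ; the function name `sample_intervals` and the values of `n` and `theta` are illustrative. It checks the simulated mean tree height against the sum of the expected intervals, 2θ(1 - 1/n).

```python
import random
from math import comb


def sample_intervals(n, theta, rng):
    """Return {k: s_k} for k = n, ..., 2 on the mutational time scale."""
    # While k ancestors remain, the waiting time to the next coalescence
    # is exponential with rate C(k, 2) / theta.
    return {k: rng.expovariate(comb(k, 2) / theta) for k in range(n, 1, -1)}


n, theta = 5, 2.0
rng = random.Random(3)

# Height of the tree = time to the common ancestor of the whole sample.
heights = [sum(sample_intervals(n, theta, rng).values()) for _ in range(20000)]
mean_height = sum(heights) / len(heights)

# E[s_k] = theta / C(k, 2); summing over k = 2..n gives 2 * theta * (1 - 1/n).
expected_height = 2 * theta * (1 - 1 / n)
print("simulated mean height:", mean_height, "expected:", expected_height)
```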

Page 20: Sampling distributions of alleles under models of neutral evolution

Coalescence method

The use of coalescence theory allows efficient formulation of appropriate models and gives a good basis for approaching model analysis problems such as hypothesis testing or parameter estimation.

[Figure: coalescent tree for a sample of 5 sequences, with coalescence times t2, ..., t5 and the intervals s2, ..., s5 between them.]

Page 21: Sampling distributions of alleles under models of neutral evolution

Independence of metrics (coalescence times) and topology

Topologies of trees (with ordered branches) are all equally probable.

Metrics (distributions of branch lengths) of trees are determined by the coalescence process, which in turn depends on population parameters.

Page 22: Sampling distributions of alleles under models of neutral evolution

Coalescence – statistics of pairwise differences

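Assume the mutational time scale. Mutations then occur with intensity 1/2 along each lineage. Let $A_2$ denote a $\mathbb{Z}_+$-valued random variable equal to the number of segregating sites between sequence 1 and sequence 2, and let $T$ be the random variable given by their coalescence time $t$. Two lineages of length $t$ accumulate mutations with total intensity 1, so conditionally on $T = t$, $A_2$ is Poisson with $\lambda = t$:

$$P[A_2 = n \mid T = t] = \frac{e^{-t}\, t^n}{n!}$$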

Page 23: Sampling distributions of alleles under models of neutral evolution

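Unconditional distribution of the number of pairwise differences:

$$P[A_2 = n] = \frac{1}{1+\theta}\left(\frac{\theta}{1+\theta}\right)^n$$

Probability generating function:

$$\psi_2(s) = \sum_{n=0}^{\infty} s^n P[A_2 = n]$$

$$\psi_2(s \mid T = t) = e^{t(s-1)}$$

$$\psi_2(s) = \frac{1}{1+\theta(1-s)} = \frac{1}{1+\theta}\cdot\frac{1}{1-\frac{\theta}{1+\theta}\,s}$$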
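The geometric form of the distribution of pairwise differences can be checked by simulation. The sketch below assumes the mutational time scale, draws the coalescence time of two sequences from an exponential distribution with mean θ, and then counts Poisson mutations with parameter equal to that time; the function names and the value θ = 3 are illustrative.

```python
import random


def pairwise_differences(theta, rng):
    """Draw the number of differences between two sequences."""
    # Coalescence time on the mutational time scale: exponential, mean theta.
    t = rng.expovariate(1 / theta)
    # Poisson(t) count: number of unit-rate exponential arrivals in [0, t].
    count = 0
    elapsed = rng.expovariate(1.0)
    while elapsed <= t:
        count += 1
        elapsed += rng.expovariate(1.0)
    return count


def geometric_formula(n, theta):
    """P[A_2 = n] = 1/(1 + theta) * (theta/(1 + theta))**n."""
    return (1 / (1 + theta)) * (theta / (1 + theta)) ** n


theta = 3.0
rng = random.Random(11)
sample = [pairwise_differences(theta, rng) for _ in range(50000)]
for n in range(4):
    simulated = sample.count(n) / len(sample)
    print(n, "simulated:", round(simulated, 4),
          "formula:", round(geometric_formula(n, theta), 4))
```

The simulated frequencies should agree with the formula up to sampling noise, and the sample mean should be close to θ.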

Page 24: Sampling distributions of alleles under models of neutral evolution

Coalescence – population with time varying size

Page 25: Sampling distributions of alleles under models of neutral evolution

Population with time-varying size

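The population's effective size N(t) changes in time, so the product parameter is also a function of time: $\theta(t) = 4\mu N(t)$.

Joint probability density function:

$$p(t_2, \dots, t_n) = \prod_{k=2}^{n} \frac{\binom{k}{2}}{\theta(t_k)} \exp\left(-\int_{t_{k+1}}^{t_k} \frac{\binom{k}{2}}{\theta(\sigma)}\, d\sigma\right), \qquad t_2 \ge t_3 \ge \dots \ge t_n \ge t_{n+1} = 0.$$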
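Coalescence times under a time-varying product parameter can be sampled by rescaling time. The sketch below is one way to do this, under stated assumptions: time t runs backwards from the present, and the parameter decays into the past as θ(t) = θ0·exp(-r·t) with r > 0 (a population that has been growing exponentially). This parameterization, the function name `sample_coalescence_times` and the numerical values are illustrative and not taken from the slides. With Λ(t) denoting the integral of 1/θ(σ) from 0 to t, the increments Λ(t_k) - Λ(t_{k+1}) are exponential with rate C(k, 2), which follows from a joint density of the product form with k running from 2 to n.

```python
import math
import random
from math import comb


def sample_coalescence_times(n, theta0, r, rng):
    """Return {k: t_k} for k = n, ..., 2, with t_n <= ... <= t_2.

    Assumes theta(t) = theta0 * exp(-r * t), t measured backwards, r > 0.
    """
    times = {}
    cumulative = 0.0  # Lambda(t_{k+1}); Lambda(0) = 0 at the present
    for k in range(n, 1, -1):
        # Increment of Lambda while k ancestors remain: Exp(rate C(k, 2)).
        cumulative += rng.expovariate(comb(k, 2))
        # Lambda(t) = (exp(r t) - 1) / (r theta0); invert it to recover t_k.
        times[k] = math.log1p(r * theta0 * cumulative) / r
    return times


rng = random.Random(5)
growth = sample_coalescence_times(5, theta0=2.0, r=1.0, rng=rng)
print("coalescence times with growth:", growth)

# With r close to 0 the model approaches constant theta, where the
# expected time to the common ancestor is 2 * theta0 * (1 - 1/n).
flat = [sample_coalescence_times(5, 2.0, 1e-9, rng)[2] for _ in range(20000)]
mean_flat = sum(flat) / len(flat)
print("mean t_2 for r near 0:", mean_flat)
```

With growth, the coalescence times are compressed towards the time when the population was small, which is the mechanism behind the changing shapes of the histograms on the following slides.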

Page 26: Sampling distributions of alleles under models of neutral evolution

How is the history of population size N(t), and hence θ(t), encoded in histograms of pairwise differences and mutation classes?

Page 27: Sampling distributions of alleles under models of neutral evolution

Pairwise differences

Page 28: Sampling distributions of alleles under models of neutral evolution

Pairwise differences I

[Figure, two panels: θ(t) against time t; frequency against number of differences.]

Page 29: Sampling distributions of alleles under models of neutral evolution

Pairwise differences II

[Figure, two panels: θ(t) against time t; frequency against number of differences.]

Page 30: Sampling distributions of alleles under models of neutral evolution

Pairwise differences III

[Figure, two panels: θ(t) against time t; frequency against number of differences.]

Page 31: Sampling distributions of alleles under models of neutral evolution

Mutation classes

Frequencies are computed under the assumption that mutation intensity is low

Page 32: Sampling distributions of alleles under models of neutral evolution

Mutation classes I

[Figure, two panels: N(t) against time t, with N(t) = const; frequency against SNP type (1 to 10).]

Page 33: Sampling distributions of alleles under models of neutral evolution

Mutation classes II

[Figure, two panels: N(t) against time t, with N(t) = N0 exp(rt), N0 r = 10; frequency against SNP type (1 to 10).]

Page 34: Sampling distributions of alleles under models of neutral evolution

Mutation classes III

[Figure, two panels: N(t) against time t; frequency against SNP type (1 to 10).]

Page 35: Sampling distributions of alleles under models of neutral evolution

Conclusions

Different histories of population sizes lead to different sampling distributions of alleles

Parametric models of different form (exponential, stepwise, logistic) can lead to similar (difficult to distinguish) distributions of alleles

Estimation of population size history from DNA data can be unstable

Page 36: Sampling distributions of alleles under models of neutral evolution

Models versus data

Parametric and nonparametric estimation of population size histories from DNA samples

Testing hypotheses on parameter values under parametric models; testing the hypothesis of a time-constant versus a time-varying scenario

Page 37: Sampling distributions of alleles under models of neutral evolution

Models versus data

[Figure, two panels: data on the worldwide distribution of mtDNA pairwise differences (R. Cann et al. 1987); estimation of the history of human population size.]

Page 38: Sampling distributions of alleles under models of neutral evolution

Models versus data II

[Figure: histogram of classes of mutations. Data on the worldwide distribution of mtDNA pairwise differences (R. Cann et al. 1987).]

Page 39: Sampling distributions of alleles under models of neutral evolution

Models versus data III

Data on types of 44 SNPs randomly located in the genome Picoult, Newberg 2000

[Figure: two plots.]

Parametric estimates of N(t) based on the above data