The Coalescent Theory and Coalescent-Based Population Genetics Programs

Overview

Set up IMa run

The theory

Influence

Computer programs

IMa tutorial

Set up IMa Run

Download data file from Wiki

Open terminal

Type command:

ima -i IMaEliurus -o IMaEliurus.out -q1 10 -q2 20 -m1 1 -m2 1 -n 10 -t 80 -b 100000 -L0.5 -p 45

You can vary the numbers given for the q’s, t, and the m’s

COALESCENT THEORY

Formalized in 1982 by Kingman in “The Coalescent”

Based on the main idea of:

Retrospective model of population genetics

Dependent on ancestral population size and time since divergence

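COALESCENT THEORY

Terms:

Coalescence: two lineages tracing back to a common ancestor at a particular time

Effective population size (Ne): size of the idealized Wright-Fisher population that would show the same amount of genetic drift; usually smaller than the census size

Theta, Θ: capacity of a population to maintain genetic variability (= 4Neμ, where μ is the mutation rate per generation)

Incomplete lineage sorting: failure of lineages to coalesce within a population's own history, so that they persist back into the ancestral population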

Wright-Fisher Model

Describes genetic drift in a finite population

Assumptions:

N diploid organisms

Monoecious reproduction with an infinite number of gametes

Non-overlapping generations

Random mating

No mutation

No selection
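The assumptions listed for the Wright-Fisher model can be illustrated with a small drift simulation. This is a minimal sketch, not part of the original slides: the function name `wright_fisher`, the population size, the starting frequency, and the random seed are all illustrative choices. Each generation, every one of the 2N gene copies draws its parent copy at random from the previous generation, with no mutation and no selection.

```python
import random

def wright_fisher(n_diploid, p0, generations, rng):
    """Track one allele's frequency under pure drift (no mutation, no selection)."""
    copies = 2 * n_diploid                  # N diploid organisms carry 2N gene copies
    count = round(p0 * copies)              # starting number of copies of the allele
    freqs = [count / copies]
    for _ in range(generations):
        p = count / copies
        # Non-overlapping generations with random mating: each new copy
        # independently inherits the allele with probability p (binomial sampling).
        count = sum(1 for _ in range(copies) if rng.random() < p)
        freqs.append(count / copies)
    return freqs

rng = random.Random(1)                      # fixed seed so the run is repeatable
trajectory = wright_fisher(n_diploid=50, p0=0.5, generations=200, rng=rng)
print(trajectory[0], trajectory[-1])
```

Running it with a smaller `n_diploid` should show the frequency wandering faster and reaching 0 or 1 sooner, which is the drift behaviour the model describes.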

Incomplete Lineage Sorting

Degnan & Salter (2005)

COALESCENT THEORY

Mathematical expectation of the distribution of time back to coalescence

Seeks to predict the amount of time elapsed between the introduction of a mutation and the arising of a particular allele/gene distribution in the population

[Figure sequence: time axis running from Present to Past]

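Mathematical Representation

$\Theta = 4N_e\mu$

Probability that two lineages coalesce in the previous generation: $P(\text{coalescent event}) = \frac{1}{2N_e}$

Probability that two lineages coalesce exactly $t$ generations back: $P_c(t) = \left(1 - \frac{1}{2N_e}\right)^{t-1}\frac{1}{2N_e}$

Expected time during which $k$ lineages remain, in units of $2N_e$ generations: $E(t_k) = \frac{2}{k(k-1)}$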
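The four expressions on this slide can be checked numerically. The sketch below is illustrative only: the function names, the sample size of 10 lineages, and the seed are assumptions, and the simulation uses the usual continuous-time approximation (exponential waiting times, reasonable when Ne is large) rather than generation-by-generation sampling. Times from `expected_interval` and `simulate_tmrca` are in units of 2Ne generations.

```python
import random

def theta(ne, mu):
    """Theta = 4 * Ne * mu (diploid population, mu per generation)."""
    return 4.0 * ne * mu

def p_coalesce_at(t, ne):
    """Probability that two lineages coalesce exactly t generations back."""
    p = 1.0 / (2.0 * ne)
    return (1.0 - p) ** (t - 1) * p

def expected_interval(k):
    """E(t_k) = 2 / (k(k-1)), in units of 2*Ne generations."""
    return 2.0 / (k * (k - 1))

def expected_tmrca(n):
    """Sum of E(t_k) for k = 2..n, which simplifies to 2 * (1 - 1/n)."""
    return sum(expected_interval(k) for k in range(2, n + 1))

def simulate_tmrca(n, rng):
    """One draw of the time back to the common ancestor of n lineages."""
    total = 0.0
    for k in range(n, 1, -1):
        # While k lineages remain, coalescence happens at rate k(k-1)/2.
        total += rng.expovariate(k * (k - 1) / 2.0)
    return total

rng = random.Random(7)
draws = [simulate_tmrca(10, rng) for _ in range(20000)]
print(expected_tmrca(10), sum(draws) / len(draws))
```

The simulated mean should sit close to the analytical value, and most of that time is spent waiting for the last two lineages to coalesce, since E(t_2) = 1 on this scale.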

Influence

Population Genetics

Phylogenetics

Statistical Phylogeography

Population Genetics

Theory describes the genealogical relationships among individuals in a Wright-Fisher population

Phylogenetics

Gene tree / species tree

Predicts a certain distribution of gene tree frequencies
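As a concrete instance of a predicted gene tree distribution, the simplest case is three species with one sampled lineage each. The sketch below is not from the slides; it assumes a species tree ((A,B),C) whose internal branch has length T in coalescent units (2Ne generations), and uses the standard result that the gene tree matches the species tree with probability 1 - (2/3)e^(-T), with each of the two discordant trees having probability (1/3)e^(-T). The function name is illustrative.

```python
import math

def three_taxon_gene_tree_probs(t_internal):
    """Gene tree probabilities for species tree ((A,B),C).

    t_internal is the internal branch length in units of 2*Ne generations.
    Returns (concordant, discordant_1, discordant_2).
    """
    # Probability that the A and B lineages fail to coalesce on the internal branch.
    p_no_coal = math.exp(-t_internal)
    # If they fail, all three lineages enter the root population and each
    # of the three possible pairs is equally likely to coalesce first.
    discordant = p_no_coal / 3.0
    concordant = 1.0 - 2.0 * discordant
    return concordant, discordant, discordant

for t in (0.1, 1.0, 3.0):
    print(t, three_taxon_gene_tree_probs(t))
```

Short internal branches push all three topologies toward equal frequency, which is the incomplete lineage sorting situation defined earlier in the deck.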

Statistical Phylogeography

Individual gene trees contain information about past demographic events when rate of coalescence different between

Computer Programs

Kuhner, 2008

BEAST

GENETREE

LAMARC

MIGRATE-N

IM/IMa

IMa2

Computer Programs

Coalescent Simulators

Approximate Bayesian Computation: DIY-ABC, PopABC

Simulation (Using “Pipeline” Approach): GENOME, ms, COAL, CoaSim

Introduction

MCMC simulation of gene genealogies

IM simulates model parameters

Hey, J (2006)

Introduction cont’d

Assumptions:

No other populations are more closely related to the sampled populations than they are to each other

Selective neutrality

No recombination within loci

Free recombination between loci

Mutation model chosen is correct (infinite sites, Hasegawa-Kishino-Yano, stepwise, or compound locus)

Input File

Example data for IM

# im test data
population1 population2
3
locus1 1 1 13 I 1 0.0000000008 (0.0000000001, 0.0000000015)
pop1_1 ACTACTGTCATGA
pop2_1 AGTACTATCACGA
hapstrexample 2 1 4 J2 0.75
pop1_1 13 34 GTAC
pop1_2 12 35 GTAT
pop2_1 12 37 GTAT
strexample 2 2 1 S1 1 0.00001 (0.000001, 0.00005)
strpop11a 23
strpop11b 26
strpop21a 25
strpop21b 31
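The layout of the example input can be made explicit by reading it back programmatically. The sketch below is a reading of this one example, not the program's own parser: it assumes the file is a comment line, a line of population names, the number of loci, and then for each locus a header (name, sample size in population 1, sample size in population 2, length, mutation model code, inheritance scalar, optional mutation rate and range) followed by one line per sampled gene copy. The function name and dictionary keys are illustrative, and splitting on whitespace is a simplification; the real program may be stricter about column layout.

```python
EXAMPLE = """\
# im test data
population1 population2
3
locus1 1 1 13 I 1 0.0000000008 (0.0000000001, 0.0000000015)
pop1_1 ACTACTGTCATGA
pop2_1 AGTACTATCACGA
hapstrexample 2 1 4 J2 0.75
pop1_1 13 34 GTAC
pop1_2 12 35 GTAT
pop2_1 12 37 GTAT
strexample 2 2 1 S1 1 0.00001 (0.000001, 0.00005)
strpop11a 23
strpop11b 26
strpop21a 25
strpop21b 31
"""

def parse_im_example(text):
    """Split the example into comment, population names, and per-locus records."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    comment = lines[0]
    populations = lines[1].split()
    n_loci = int(lines[2])
    loci = []
    pos = 3
    for _ in range(n_loci):
        fields = lines[pos].split()
        n1, n2 = int(fields[1]), int(fields[2])     # sample sizes per population
        n_samples = n1 + n2
        loci.append({
            "name": fields[0],
            "n1": n1,
            "n2": n2,
            "length": int(fields[3]),
            "model": fields[4],                      # I, J2, S1 in this example
            "inheritance_scalar": float(fields[5]),
            "samples": [ln.split() for ln in lines[pos + 1 : pos + 1 + n_samples]],
        })
        pos += 1 + n_samples                         # skip header plus data lines
    return comment, populations, loci

comment, populations, loci = parse_im_example(EXAMPLE)
for locus in loci:
    print(locus["name"], locus["model"], len(locus["samples"]))
```

Under this reading the counts are self-consistent: each locus header's two sample sizes add up to the number of data lines that follow it, and the 13-base sequences of locus1 match its stated length.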

Command Line (terminal)

Command line:

ima -i IMaEliurus -o IMaEliurus.out -q1 10 -q2 20 -m1 1 -m2 1 -n 10 -t 80 -b 100000 -L100000 -p 45

More complex run line:

ima -i IMaEliurus -o IMaEliurus.out -q1 10 -q2 10 -qA 300 -m1 2 -m2 3 -t 80 -n 20 -b 100000 -L 0.5 -fl -g1 0.01 -p 45
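Because the slide's command was typed in presentation software, its hyphens are easy to corrupt into en dashes, which a shell will not treat as option flags. One way to avoid that is to assemble the command from a list. The sketch below only builds and prints the string; it does not run `ima`. The helper name `build_ima_command` is hypothetical, and the flags and values are copied from the basic command line on the slide, not checked against the program's documentation.

```python
import shlex

def build_ima_command(infile, outfile, options):
    """Assemble an argument list from (flag, value) pairs, using plain ASCII hyphens."""
    args = ["ima", "-i", infile, "-o", outfile]
    for flag, value in options:
        args.append("-" + flag)
        if value is not None:           # flags without a value are passed bare
            args.append(str(value))
    return args

# Values taken from the slide; change the q's, t, and m's as needed.
options = [
    ("q1", 10), ("q2", 20),
    ("m1", 1), ("m2", 1),
    ("n", 10), ("t", 80),
    ("b", 100000), ("L", 100000),
    ("p", 45),
]
cmd = build_ima_command("IMaEliurus", "IMaEliurus.out", options)
print(shlex.join(cmd))
```

The resulting list could be handed to `subprocess.run` on a machine where the program is installed; that step is left out here because it depends on the local setup.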

Important Note!

Need an “IMrun” file that contains only the word “yes” for the run to continue indefinitely (or until it crashes or DSCR kicks the job)
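Creating the control file can be scripted along with the rest of the run setup. This is a minimal sketch: the helper name `write_imrun` is hypothetical, and placing the file in the directory the run is started from is an assumption, since the slide does not say where the file has to live. The demonstration writes into a temporary directory so nothing is left behind.

```python
import tempfile
from pathlib import Path

def write_imrun(run_dir):
    """Create a file named IMrun whose only content is the word "yes"."""
    path = Path(run_dir) / "IMrun"
    path.write_text("yes\n")
    return path

# Demonstration in a throwaway directory; point run_dir at the real run folder in practice.
with tempfile.TemporaryDirectory() as tmp:
    imrun = write_imrun(tmp)
    name = imrun.name
    content = imrun.read_text().strip()

print(name, content)
```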

Output File .out

MCMC information

Summary

Acceptance rates

Autocorrelation

ESS

Chain swapping

Output File .out

Marginal Peak

Marginal distributions: Minbin, Maxbin, HiPt, HiSmth, Mean, 95lo/hi, HPD90lo/hi
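To make the HPD90 columns concrete, a highest posterior density interval is the narrowest interval that contains the stated share of the posterior. The sketch below computes one from a list of sampled values. It is an illustration of the concept only: the program reports its intervals from its own binned marginal distributions, so numbers from this function are not expected to match the output file exactly, and the function name and the gamma-distributed stand-in sample are assumptions.

```python
import math
import random

def hpd_interval(samples, mass=0.90):
    """Narrowest interval containing at least `mass` of the sampled values."""
    xs = sorted(samples)
    n = len(xs)
    k = max(1, math.ceil(mass * n))          # number of points the interval must cover
    # Slide a window of k consecutive sorted points and keep the narrowest one.
    best = min(range(n - k + 1), key=lambda i: xs[i + k - 1] - xs[i])
    return xs[best], xs[best + k - 1]

rng = random.Random(3)
# Stand-in for a skewed posterior sample, e.g. of a population size parameter.
posterior = [rng.gammavariate(2.0, 1.5) for _ in range(5000)]
lo, hi = hpd_interval(posterior, mass=0.90)
print(lo, hi)
```

For a skewed distribution like this one, the HPD interval sits closer to the peak than an equal-tailed interval would, which is why the two kinds of bounds are listed separately.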

Output File .out

ASCII

Curves

Plots

Output File .out.ti

No outward information

Can be used on subsequent runs when in “L mode”

How can I get a “good” run?

Conduct preliminary run

Duration? Ideally, until the run reaches stationarity and convergence

Assess autocorrelation

Use Metropolis-coupled MCMC

Run many, many times (well, at least 3)
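The autocorrelation and ESS checks mentioned here can be illustrated on any recorded trace of parameter values. The sketch below is generic and is not the calculation the program itself performs: it estimates lag autocorrelations directly and uses the common rule ESS = N / (1 + 2 × sum of positive autocorrelations), stopping at the first non-positive lag. The function names, the `max_lag` cutoff, and the two synthetic traces are all assumptions for the demonstration.

```python
import random

def autocorrelation(trace, lag):
    """Sample autocorrelation of a trace at the given lag."""
    n = len(trace)
    mean = sum(trace) / n
    var = sum((x - mean) ** 2 for x in trace) / n
    if var == 0:
        return 0.0
    cov = sum((trace[i] - mean) * (trace[i + lag] - mean) for i in range(n - lag)) / n
    return cov / var

def effective_sample_size(trace, max_lag=200):
    """ESS = N / (1 + 2 * sum of autocorrelations up to the first non-positive one)."""
    n = len(trace)
    rho_sum = 0.0
    for lag in range(1, min(max_lag, n - 1) + 1):
        rho = autocorrelation(trace, lag)
        if rho <= 0:
            break
        rho_sum += rho
    return n / (1 + 2 * rho_sum)

rng = random.Random(42)
# A well-mixed trace: independent draws.
mixed = [rng.gauss(0, 1) for _ in range(2000)]
# A poorly mixed trace: each value stays close to the previous one.
sticky = [0.0]
for _ in range(1999):
    sticky.append(0.9 * sticky[-1] + rng.gauss(0, 1))

print(effective_sample_size(mixed), effective_sample_size(sticky))
```

Both traces have 2000 recorded values, but the sticky one carries far fewer independent pieces of information, which is the situation a longer run or Metropolis coupling is meant to address.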

Robustness of Coalescent

Violations of assumptions:

Intralocus recombination

Population structure

Gene flow from unsampled populations

Linkage among loci

Divergent selection

Different model of substitution

Questions?