The Coalescent Theory
And coalescent-based population genetics programs
Overview
Set up IMa run
The theory
Influence
Computer programs
IMa tutorial
Set up IMa Run
  Download data file from Wiki
  Open terminal
  Type command:
  ima -i IMaEliurus -o IMaEliurus.out -q1 10 -q2 20 -m1 1 -m2 1 -n 10 -t 80 -b 100000 -L0.5 -p 45
  Can vary the numbers for the q's, t, and m's
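The note about varying the q, t, and m values can be sketched as a small helper that assembles the command line. This is only an illustration: `build_ima_command` is a hypothetical name, the flags and default values are copied from the slide, and the command is printed rather than executed.

```python
import shlex

def build_ima_command(q1=10, q2=20, m1=1, m2=1, t=80,
                      infile="IMaEliurus", outfile="IMaEliurus.out"):
    """Assemble the IMa command from the slide; only q1, q2, m1, m2 and t vary."""
    args = [
        "ima", "-i", infile, "-o", outfile,
        "-q1", str(q1), "-q2", str(q2),
        "-m1", str(m1), "-m2", str(m2),
        "-n", "10", "-t", str(t),
        "-b", "100000", "-L0.5", "-p", "45",   # remaining flags kept as on the slide
    ]
    return shlex.join(args)

# Default values reproduce the slide; the second call varies the priors.
print(build_ima_command())
print(build_ima_command(q1=5, q2=15, m1=2, m2=2, t=40))
```

Printing the string first makes it easy to check a batch of prior settings before submitting any of them.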
COALESCENT THEORY
  Formalized in 1982 by Kingman in “The Coalescent”
  Based on main idea of:
    Retrospective model of population genetics
    Dependent on ancestral population size and time since divergence
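COALESCENT THEORY
Terms:
  Coalescence: two lineages tracing back to a common ancestor at a particular time
  Effective population size (Ne): size of the idealized Wright-Fisher population that would show the same amount of genetic drift as the real population; usually smaller than the census size
  Theta, Θ: capacity of a population to maintain genetic variability (= 4Neμ)
  Incomplete lineage sorting: failure of two lineages to coalesce in the ancestral population where they first meet, going back in time, so that both persist into a deeper ancestral population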
Wright-Fisher Model
  Describes genetic drift in a finite population
  Assumptions:
    N diploid organisms
    Monoecious reproduction with an infinite number of gametes
    Non-overlapping generations
    Random mating
    No mutation
    No selection
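The assumptions above translate almost directly into a drift simulation. The following is a minimal sketch, not taken from the slides: `wright_fisher` and its parameter names are illustrative, and the population size, starting frequency, and seed are arbitrary example values.

```python
import random

def wright_fisher(n_diploid, p0, generations, seed=None):
    """Track one allele's frequency under pure drift (no mutation, no selection)."""
    rng = random.Random(seed)
    copies = 2 * n_diploid            # N diploid organisms carry 2N gene copies
    freq = p0
    trajectory = [freq]
    for _ in range(generations):
        # Non-overlapping generations with random mating: each gene copy in the
        # next generation is drawn independently from the current allele pool.
        count = sum(1 for _ in range(copies) if rng.random() < freq)
        freq = count / copies
        trajectory.append(freq)
    return trajectory

traj = wright_fisher(n_diploid=50, p0=0.5, generations=200, seed=1)
print(traj[-1])   # final allele frequency; 0.0 or 1.0 means loss or fixation
```

Re-running with different seeds shows how strongly trajectories differ in a small population, which is the variability that Ne summarizes.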
Incomplete Lineage Sorting
Degnan & Salter (2005)
COALESCENT THEORY
  Mathematical expectation of the distribution of time back to coalescence
  Seeks to predict the amount of time elapsed between the introduction of a mutation and the arising of a particular allele/gene distribution in the population
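The idea of a distribution of times back to coalescence can be illustrated by simulation. This sketch assumes the standard continuous-time approximation, in which time is measured in units of 2Ne generations and, while k lineages remain, the waiting time to the next coalescence is exponential with rate k(k-1)/2. The function name and the sample size are illustrative.

```python
import random

def simulate_tmrca(n_samples, rng):
    """Draw one time to the most recent common ancestor, in units of 2Ne generations."""
    total = 0.0
    for k in range(n_samples, 1, -1):
        rate = k * (k - 1) / 2          # number of lineage pairs that could coalesce
        total += rng.expovariate(rate)  # waiting time while k lineages remain
    return total

rng = random.Random(42)
n = 5
draws = [simulate_tmrca(n, rng) for _ in range(20000)]
mean_tmrca = sum(draws) / len(draws)
expected = 2 * (1 - 1 / n)              # sum of 2/(k(k-1)) for k = 2..n
print(mean_tmrca, expected)
```

The simulated mean should sit close to the analytical expectation, while individual draws vary widely; that spread is why single-locus estimates are noisy.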
[Figure: diagram with a time axis labeled Present and Past]
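Mathematical Representation
  $\Theta = 4 N_e \mu$
  $P(\text{coalescent event}) = \frac{1}{2N_e}$ (two lineages, per generation)
  $P_c(t) = \left(1 - \frac{1}{2N_e}\right)^{t-1} \frac{1}{2N_e}$ (first coalescence exactly $t$ generations back)
  $E(t_k) = \frac{2}{k(k-1)}$ (expected time with $k$ lineages, in units of $2N_e$ generations)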
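The formulas on this slide can be evaluated directly. The sketch below simply transcribes them; the function names and the values of Ne and μ are illustrative assumptions, not figures from the presentation.

```python
def theta(ne, mu):
    """Population mutation parameter, theta = 4 * Ne * mu."""
    return 4 * ne * mu

def p_coalesce_at(t, ne):
    """Probability that two lineages first coalesce exactly t generations back."""
    p = 1 / (2 * ne)                  # per-generation coalescence probability
    return (1 - p) ** (t - 1) * p     # geometric distribution

def expected_interval(k):
    """Expected time during which k lineages persist, in units of 2Ne generations."""
    return 2 / (k * (k - 1))

ne, mu = 10_000, 1e-8                 # illustrative values only
print(theta(ne, mu))
print(p_coalesce_at(1, ne), p_coalesce_at(1000, ne))
print([expected_interval(k) for k in range(2, 6)])
```

The list of expected intervals shrinks quickly as k grows: most of a genealogy's depth is spent waiting for the last two lineages to coalesce.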
Influence
  Population Genetics
  Phylogenetics
  Statistical Phylogeography
Population Genetics
  Theory describes the genealogical relationships among individuals in a Wright-Fisher population
Phylogenetics
  Gene tree-Species tree
  Predicts certain distribution of gene tree frequencies
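For the simplest case, three species with species tree ((A,B),C) and one sampled lineage each, the predicted gene tree frequencies have a closed form. The sketch below assumes the internal branch length T is given in coalescent units (generations divided by 2Ne); the function name and the example branch lengths are illustrative.

```python
import math

def gene_tree_probs(t_coal):
    """Gene tree topology probabilities for species tree ((A,B),C).

    t_coal is the internal branch length in coalescent units.
    """
    miss = math.exp(-t_coal)   # P(A and B lineages fail to coalesce on the internal branch)
    return {
        "((A,B),C)": 1 - (2 / 3) * miss,   # matches the species tree
        "((A,C),B)": miss / 3,             # discordant
        "((B,C),A)": miss / 3,             # discordant
    }

for t in (0.1, 1.0, 5.0):
    print(t, gene_tree_probs(t))
```

Short internal branches push all three topologies toward one third each, which is the incomplete lineage sorting regime; long branches make the matching topology almost certain.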
Statistical Phylogeography
  Individual gene trees contain information about past demographic events when rate of coalescence different between
Computer Programs
Kuhner, 2008
  BEAST
  GENETREE
  LAMARC
  MIGRATE-N
  IM/IMa
  IMa2
Computer Programs
Coalescent Simulators
  Approximate Bayesian Computation
    DIY-ABC
    PopABC
  Simulation (Using “Pipeline” Approach)
    GENOME
    ms
    COAL
    CoaSim
Introduction
  MCMC simulation of gene genealogies
  IM simulates model parameters
  Hey, J (2006)
Introduction cont’d
Assumptions
  No other populations more closely related
  Selective neutrality
  No recombination within loci
  Free recombination between loci
  Mutation model chosen is correct
    Infinite sites
    Hasegawa-Kishino-Yano
    Stepwise
    Compound locus
Input File
Example data for IM
# im test data
population1 population2
3
locus1 1 1 13 I 1 0.0000000008 (0.0000000001, 0.0000000015)
pop1_1 ACTACTGTCATGA
pop2_1 AGTACTATCACGA
hapstrexample 2 1 4 J2 0.75
pop1_1 13 34 GTAC
pop1_2 12 35 GTAT
pop2_1 12 37 GTAT
strexample 2 2 1 S1 1 0.00001 (0.000001, 0.00005)
strpop11a 23
strpop11b 26
strpop21a 25
strpop21b 31
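The locus header lines in the example can be unpacked with a few lines of code. The field meanings used here (name, sample sizes for populations 1 and 2, length, mutation model, inheritance scalar, optional mutation rate with a range) are an assumption inferred from the example data, where the two sample sizes match the number of rows under each header. `parse_locus_header` is a hypothetical helper, not part of IM.

```python
def parse_locus_header(line):
    """Split one locus header line from the example file into named fields."""
    head, _, tail = line.partition("(")      # the optional "(low, high)" range comes last
    fields = head.split()
    locus = {
        "name": fields[0],
        "n_pop1": int(fields[1]),             # assumed: sequences sampled from population 1
        "n_pop2": int(fields[2]),             # assumed: sequences sampled from population 2
        "length": int(fields[3]),
        "model": fields[4],                   # e.g. I, J2, S1 as in the example
        "inheritance_scalar": float(fields[5]),
        "mutation_rate": float(fields[6]) if len(fields) > 6 else None,
        "rate_range": None,
    }
    if tail:
        low, high = tail.rstrip(") \t").split(",")
        locus["rate_range"] = (float(low), float(high))
    return locus

headers = [
    "locus1 1 1 13 I 1 0.0000000008 (0.0000000001, 0.0000000015)",
    "hapstrexample 2 1 4 J2 0.75",
    "strexample 2 2 1 S1 1 0.00001 (0.000001, 0.00005)",
]
for h in headers:
    print(parse_locus_header(h))
```

A check like this before launching a long run can catch a header whose sample sizes do not match the number of data rows that follow.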
Command Line (terminal)
Command line:
  ima -i IMaEliurus -o IMaEliurus.out -q1 10 -q2 20 -m1 1 -m2 1 -n 10 -t 80 -b 100000 -L10000 -p 45
Variant shown with -L100000:
  ima -i IMaEliurus -o IMaEliurus.out -q1 10 -q2 20 -m1 1 -m2 1 -n 10 -t 80 -b 100000 -L100000 -p 45
More complex run line:
  ima -i IMaEliurus -o IMaEliurus.out -q1 10 -q2 10 -qA 300 -m 12 -m 23 -t 80 -n 20 -b 100000 -L 0.5 -fl -g1 0.01 -p 45
Important Note! Need “IMrun” file which only says “yes”
to continue indefinitely (or until it crashes or DSCR kicks the job)
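Creating the control file described in the note takes one line; a sketch follows. `write_imrun` is a hypothetical helper, the demonstration writes into a temporary directory rather than a real run directory, and writing "no" to request a stop is an assumption (the slide only states that the file should say "yes" to continue).

```python
import tempfile
from pathlib import Path

def write_imrun(run_dir, keep_running=True):
    """Create the IMrun control file in run_dir and return its path."""
    path = Path(run_dir) / "IMrun"
    # "yes" keeps the run going, per the note; "no" is an assumed way to stop it
    path.write_text("yes\n" if keep_running else "no\n")
    return path

# Demonstration in a throwaway directory; a real run would use the job's working directory.
with tempfile.TemporaryDirectory() as demo_dir:
    imrun = write_imrun(demo_dir)
    print(imrun.name, repr(imrun.read_text()))
```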
Output File .out
  MCMC information
    Summary
    Acceptance rates
    Autocorrelation
    ESS
    Chain swapping
Output File .out
  Marginal peak
  Marginal distributions
    Minbin, Maxbin, HiPt, HiSmth, Mean, 95lo/hi, HPD90lo/hi
Output File .out
  ASCII
    Curves
    Plots
Output File .out.ti
  No outward information
  Can be used on subsequent runs when in “L mode”
How can I get a “good” run?
  Conduct a preliminary run
  Duration?
    Ideally, long enough for the run to reach stationarity and convergence
  Assess autocorrelation
  Use Metropolis-coupled MCMC
  Run many, many times (well, at least 3)
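Assessing autocorrelation can be illustrated on any sampled parameter trace. The sketch below uses a simple estimator (sum the autocorrelations up to the first non-positive lag) and synthetic chains; it is not the calculation IMa itself reports, and the function names and the AR(1) stand-in chain are assumptions made for illustration.

```python
import random

def autocorrelation(xs, lag):
    """Sample autocorrelation of a trace at the given lag."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs)
    cov = sum((xs[i] - mean) * (xs[i + lag] - mean) for i in range(n - lag))
    return cov / var

def effective_sample_size(xs, max_lag=200):
    """ESS = N / (1 + 2 * sum of positive-lag autocorrelations), truncated early."""
    total = 0.0
    for lag in range(1, min(max_lag, len(xs) - 1) + 1):
        rho = autocorrelation(xs, lag)
        if rho <= 0:          # stop at the first non-positive autocorrelation
            break
        total += rho
    return len(xs) / (1 + 2 * total)

rng = random.Random(0)
independent = [rng.gauss(0, 1) for _ in range(2000)]
# AR(1) chain as a stand-in for a poorly mixing MCMC trace
sticky = [0.0]
for _ in range(1999):
    sticky.append(0.9 * sticky[-1] + rng.gauss(0, 1))

print(effective_sample_size(independent), effective_sample_size(sticky))
```

Both traces have 2000 samples, but the strongly autocorrelated one carries far fewer effectively independent draws, which is the case where a longer run or heated chains are needed.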
Robustness of Coalescent
Violations of assumptions:
  Intralocus recombination
  Population structure
  Gene flow from unsampled populations
  Linkage among loci
  Divergent selection
  Different model of substitution
Questions?