Statistical inference of phylogenetic networks
Claudia Solís-Lemus¹
Joint work with Cécile Ané¹,²
¹ Department of Statistics, UW-Madison
² Department of Botany, UW-Madison
October 28, 2015
Motivation: Tree of life
Solis-Lemus networks October 28, 2015 2 / 32
Data
Phylogenetic tree
(a) Rooted (b) Unrooted
Figure: Binary phylogenetic tree
Estimated tree
Alligator
Emu
Kiwi
Ostrich
Swan
Goose
Chicken
Falcon
Finch
Osprey
Woodpecker
Ibis
Stork
Vulture
Penguin
Gene tree reconstruction
Human
Chimpanzee
Gorilla
Orangutan
Markov model for
sequence evolution
Markov model for sequence evolution: A,C,G,T
$$L = \sum_{w} \sum_{y} \sum_{x} \pi(w)\, P_{t_6}(w, y)\, P_{t_5}(w, G)\, P_{t_3}(y, x)\, P_{t_4}(y, C)\, P_{t_2}(x, C)\, P_{t_1}(x, A)$$
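The sum over the internal states w, y, x above can be evaluated by brute force for a single site. The sketch below is only an illustration: it assumes the Jukes-Cantor (JC69) model for the transition probabilities P_t and a uniform π, neither of which is specified on the slide, and the branch lengths are made-up values.

```python
import itertools
import math

STATES = "ACGT"

def jc69_prob(t, i, j):
    """JC69 transition probability P_t(i, j); t in expected substitutions per site.
    (JC69 is an assumption here; the slide does not name a model.)"""
    decay = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * decay if i == j else 0.25 - 0.25 * decay

def site_likelihood(t, leaves=("A", "C", "C", "G")):
    """Sum over internal states w, y, x of the product in the formula above.

    t maps branch index 1..6 to a branch length. leaves holds the observed
    states at the ends of branches t1, t2, t4 and t5, in that order.
    """
    leaf1, leaf2, leaf4, leaf5 = leaves
    total = 0.0
    for w, y, x in itertools.product(STATES, repeat=3):
        total += (0.25  # pi(w): uniform stationary distribution under JC69
                  * jc69_prob(t[6], w, y) * jc69_prob(t[5], w, leaf5)
                  * jc69_prob(t[3], y, x) * jc69_prob(t[4], y, leaf4)
                  * jc69_prob(t[2], x, leaf2) * jc69_prob(t[1], x, leaf1))
    return total

# Hypothetical branch lengths, chosen only to make the example runnable.
branch_lengths = {k: 0.1 for k in range(1, 7)}
print(site_likelihood(branch_lengths))
```

For real alignments the same quantity is normally computed with a pruning recursion rather than an explicit triple sum, and the per-site likelihoods are multiplied across sites.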
Gene tree reconstruction
Numerical optimization for branch lengths: t
Heuristic optimization for tree topology
Available software:
MrBayes (Huelsenbeck and Ronquist, 2001)
RAxML (Stamatakis, 2014)
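As a toy illustration of numerical optimization for a branch length, the sketch below maximizes the likelihood of a single branch t between two aligned sequences. It assumes the JC69 model and uses a plain golden-section search; this is not how MrBayes or RAxML work internally, and the site counts are invented. Under JC69 the maximizer has a closed form, which gives a check on the numerical answer.

```python
import math

def jc69_loglik(t, n_sites, n_diff):
    """Log-likelihood of branch length t for two sequences under JC69
    (assumed model), given the number of sites and of differing sites."""
    decay = math.exp(-4.0 * t / 3.0)
    p_same = 0.25 * (0.25 + 0.75 * decay)   # joint prob. of an identical site
    p_diff = 0.25 * (0.25 - 0.25 * decay)   # joint prob. of one specific mismatch
    return (n_sites - n_diff) * math.log(p_same) + n_diff * math.log(p_diff)

def golden_section_max(f, lo, hi, tol=1e-10):
    """Maximize a unimodal function f on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    while b - a > tol:
        if f(c) > f(d):
            b = d          # maximum lies in [a, d]
        else:
            a = c          # maximum lies in [c, b]
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
    return (a + b) / 2.0

n_sites, n_diff = 1000, 100   # hypothetical data
t_numeric = golden_section_max(lambda t: jc69_loglik(t, n_sites, n_diff), 1e-6, 5.0)
t_closed = -0.75 * math.log(1.0 - 4.0 * (n_diff / n_sites) / 3.0)
print(t_numeric, t_closed)
```

On a full tree, all branch lengths are optimized jointly and no closed form is available, which is why numerical methods are needed.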
Challenge: big space of trees
# Species   # Unrooted trees    # Rooted trees
1           1                   1
2           1                   1
3           1                   3
4           3                   15
5           15                  105
6           105                 945
7           945                 10,395
8           10,395              135,135
9           135,135             2,027,025
10          2,027,025           34,459,425
11          34,459,425          654,729,075
12          654,729,075         13,749,310,575
13          13,749,310,575      316,234,143,225
...         ...                 ...
52          > # atoms in universe
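The counts in this table follow the standard double-factorial formulas for binary trees on n labeled species: (2n-5)!! unrooted and (2n-3)!! rooted topologies. A short sketch that reproduces the table; the function names are illustrative, and the comparison with atoms uses the commonly quoted rough figure of 10^80, which is an assumption and not taken from the slides.

```python
def odd_double_factorial(k):
    """Return k * (k - 2) * ... * 3 * 1 for odd k, and 1 when k <= 1."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result

def n_unrooted_trees(n_species):
    """Number of unrooted binary tree topologies: (2n - 5)!!"""
    return odd_double_factorial(2 * n_species - 5)

def n_rooted_trees(n_species):
    """Number of rooted binary tree topologies: (2n - 3)!!"""
    return odd_double_factorial(2 * n_species - 3)

for n in range(1, 14):
    print(n, n_unrooted_trees(n), n_rooted_trees(n))

# Rough comparison with ~10^80 atoms (assumed order of magnitude).
print(n_rooted_trees(52) > 10**80)
```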
Challenge: gene tree discordance
Figure: gene tree topologies on Human, Chimpanzee, Gorilla and Orangutan, with frequencies 76%, 12% and 12%
Species tree and gene trees can be different!
Species tree vs Gene tree
Human Chimpanzee Gorilla
From gene trees to species tree
?
Multiple reasons for gene tree discordance
1 gene tree reconstruction error
2 horizontal gene transfer (HGT)
3 incomplete lineage sorting (ILS)
2. Horizontal Gene Transfer (HGT)
Psst! Hey kid! Wanna be a Superbug...?
Stick some of this into your genome...
Even penicillin won't be able to harm you...
www.nearingzero.net
www.quora.net
2. HGT => Phylogenetic network
(c) Rooted (d) Unrooted
Figure: Binary phylogenetic network
3. Incomplete lineage sorting (ILS)
Gene trees differ inside a species tree
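A small simulation can make this concrete. The sketch below assumes a rooted three-species tree ((A,B),C) whose internal branch has length t in coalescent units, and draws gene trees under the standard coalescent: the A and B lineages coalesce within the internal branch at rate 1, and otherwise the first coalescence among three lineages above the root joins a uniformly chosen pair. The function names and the value of t are illustrative only.

```python
import math
import random

def simulate_concordance(t, n_genes, seed=0):
    """Fraction of simulated gene trees that match the species tree ((A,B),C)."""
    rng = random.Random(seed)
    concordant = 0
    for _ in range(n_genes):
        if rng.expovariate(1.0) < t:
            # A and B coalesce inside the internal branch: gene tree matches.
            concordant += 1
        elif rng.randrange(3) == 0:
            # Three lineages reach the root population; each of the three
            # pairs is equally likely to coalesce first, so A-B wins 1/3 of the time.
            concordant += 1
    return concordant / n_genes

def expected_concordance(t):
    """Closed-form probability that the gene tree matches the species tree."""
    return 1.0 - (2.0 / 3.0) * math.exp(-t)

t = 1.0  # hypothetical internal branch length, in coalescent units
print(simulate_concordance(t, 20000), expected_concordance(t))
```

Each of the two discordant topologies has probability exp(-t)/3, so the discordant trees are expected in equal proportions. Under this model, a 76% / 12% / 12% split corresponds to t = -ln(0.36), roughly 1.02 coalescent units.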
Multispecies coalescent model on a network (ILS+HGT)
P(gene tree | species network, t, γ)
Meng & Kubatko (2009); Yu, Degnan & Nakhleh (2012)
Maximum likelihood estimation
Model:
Data: multiple gene alignments → multiple gene trees
$$L(\text{network}, t, \gamma) = \prod_{g} P(g \mid \text{network}, t, \gamma)$$
Numerical optimization for t, γ
Heuristic optimization for network topology
PhyloNet Maximum Likelihood
Yu, Dong, Liu & Nakhleh (2014)
Challenge: huge space of networks
Challenge: expensive computation of likelihood
Gene tree: ((a, c), (b1, b2))
Figure: Yu, Dong, Liu, Nakhleh (2014)
Maximum pseudolikelihood estimation
Quartet-based inference:
$$\tilde{L}(\text{network}, t, \gamma) \propto \prod_{q \in Q(\text{network})} \text{Likelihood}(q, t, \gamma)$$
Numerical optimization for t, γ
Heuristic optimization for network topology
SNaQ Maximum Pseudolikelihood
S.-L., Ané (2015)
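To give a feel for why a quartet-based pseudolikelihood is cheap to evaluate, here is a minimal sketch. It is not SNaQ's implementation: it assumes each 4-taxon subset contributes a multinomial term over its three possible unrooted resolutions, and it uses the expected quartet frequencies on a species tree (no hybridization, so no γ), where the resolution matching the tree has probability 1 - (2/3)exp(-t) and the other two exp(-t)/3 each. All names and counts below are hypothetical.

```python
import math

def quartet_cfs_on_tree(t):
    """Expected frequencies of the three resolutions of one 4-taxon set on a
    species TREE with internal branch t (coalescent units): (major, minor, minor)."""
    minor = math.exp(-t) / 3.0
    return (1.0 - 2.0 * minor, minor, minor)

def log_pseudolikelihood(observed_counts, expected_cfs):
    """Sum over 4-taxon sets of multinomial log-likelihood terms.
    Multinomial constants are dropped, hence 'proportional to'."""
    total = 0.0
    for quartet, counts in observed_counts.items():
        for count, cf in zip(counts, expected_cfs[quartet]):
            if count > 0:
                total += count * math.log(cf)
    return total

# Hypothetical data: gene tree counts for the three resolutions of one quartet.
observed = {("a", "b", "c", "d"): (76, 12, 12)}
for t in (0.1, 1.0, 3.0):
    expected = {q: quartet_cfs_on_tree(t) for q in observed}
    print(t, log_pseudolikelihood(observed, expected))

# Number of 4-taxon sets grows only polynomially with the number of taxa.
print(math.comb(24, 4))
```

Each term involves only four taxa, so the cost of one evaluation scales with the number of quartets rather than with the size of the full network likelihood computation.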
SNaQ (S.-L., Ané, 2015)
Maximum pseudolikelihood estimation of unrooted networks
Quartet-based inference
Input: unrooted gene trees
Fast, scalable, easy pseudolikelihood computation
www.github.com/crsl4/PhyloNetworks
SNaQ (S.-L., Ané, 2015)
Figure: running time of PhyloNet and SNaQ versus number of genes (10 to 3k), in four panels: n=6, h=1; n=6, h=2; n=10, h=2; n=15, h=3. Time is in minutes or hours depending on the panel.
Identifiability issues: flat pseudolikelihood
Xiphophorus fish data (Cui et al., 2013)
1183 genes, 24 swordtails and platyfish, 5 hybridizations
X.alvarezi
X.mayae
X.hellerii
X.signum
X.clemenciae
X.monticolus
X.multilineatus
X.nigrensis
X.pygmaeus
X.continens
X.cortezi
X.nezahualcoyotl
X.montezumae
X.malinche
X.birchmanni
X.milleri
X.evelynae
X.variatus
X.couchianus
X.gordoni
X.meyeri
X.xiphidium
X.andersi
X.maculatus
Figure labels: SS, NS, NP, SP
Legend: loss of sword; loss of ability to produce sword
Many challenges
Huge space of networks
- How to search efficiently?
No clear idea on the shape of pseudolikelihood function
- Flatness => Identifiability issues:
  · Identify hybrid node in cycle?
- Many local maxima
Properties for pseudolikelihood estimator
Final remark
(SNaQ uses NLopt)
Acknowledgements
Joint work with Cécile Ané
Thanks:
Doug Bates
Noah Stenz
Sarah Friedrich (for cool SNaQ logo)
DEB 0949121
DEB 1354793
www.github.com/crsl4/PhyloNetworks