Evolutionary HMMs: A Bayesian Approach to Multiple Alignment
Siva Theja Maguluri
CS 598 SS
Goal
Given a set of sequences and a tree representing their evolutionary relationship, find the multiple sequence alignment that maximizes the probability of the evolutionary relationships between the sequences.
Apr 22, 2023
2
Siva Theja Maguluri
Evolutionary Model
Pairwise likelihood for relation between two sequences
Reversibility
Additivity
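Stated as equations, the two properties might read as follows. This is a sketch under assumed notation that does not appear on the slide: P(b | a, t) is the probability that a evolves into b over time t, and π is the equilibrium distribution.

```latex
% Reversibility: the pairwise likelihood does not depend on which sequence is the ancestor
\pi(a)\, P(b \mid a, t) = \pi(b)\, P(a \mid b, t)

% Additivity: evolving for t_1 and then for t_2 equals evolving for t_1 + t_2
\sum_{b} P(b \mid a, t_1)\, P(c \mid b, t_2) = P(c \mid a, t_1 + t_2)
```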
Alignment can be inferred from the sequences using dynamic programming (DP) if the Markov condition applies
Joint likelihood of a multiple alignment on a tree
Alignment Model
Substitution models
Links Model
Birth-death process with immigration, i.e. each residue can either spawn a child or die
Birth rate λ, death rate µ
Immortal link at the left-hand side
Independent, homogeneous substitution
Probability evolution in Links Model
Time evolution of the probability of a link surviving and spawning n descendants
Time evolution of the probability of a link dying before time t and spawning n descendants
Probability evolution in Links Model
Time evolution of the probability of the immortal link spawning n descendants at time t
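The equations themselves are not reproduced in this transcript. For reference, the birth-death equations usually written for this model (the TKF91 form) are sketched below. The notation p_n, p'_n and p''_n for the three cases (link survives, link dies, immortal link) is an assumption, not taken from the slides; n counts the descendants, including the link itself when it survives.

```latex
% Mortal link that survives, with n descendants including itself (n >= 1)
\frac{dp_n}{dt} = \lambda (n-1)\, p_{n-1} + \mu n\, p_{n+1} - (\lambda + \mu)\, n\, p_n

% Mortal link that has died, leaving n descendants (n >= 0)
\frac{dp'_n}{dt} = \lambda (n-1)\, p'_{n-1} + \mu (n+1)\, p'_{n+1} - (\lambda + \mu)\, n\, p'_n + \mu\, p_{n+1}

% Immortal link, with n descendants including itself (n >= 1)
\frac{dp''_n}{dt} = \lambda (n-1)\, p''_{n-1} + \mu n\, p''_{n+1} - \bigl[\lambda n + \mu (n-1)\bigr]\, p''_n
```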
Probability evolution in Links Model
These differential equations have a closed-form solution, expressed in terms of three functions α, β and γ
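The closed forms did not survive in this transcript. A sketch of the usual TKF91 solution, written in terms of α, β and γ, is given below. The symbols p_n (link survives), p'_n (link dies) and p''_n (immortal link) are assumed notation, with n the number of descendants.

```latex
p_n(t)   = \alpha\, \beta^{\,n-1} (1 - \beta), \quad n \ge 1

p'_n(t)  = \begin{cases}
             (1-\alpha)(1-\gamma), & n = 0 \\
             (1-\alpha)\, \gamma\, \beta^{\,n-1} (1-\beta), & n \ge 1
           \end{cases}

p''_n(t) = \beta^{\,n-1} (1 - \beta), \quad n \ge 1

\alpha(t) = e^{-\mu t}
\beta(t)  = \frac{\lambda \bigl(1 - e^{(\lambda-\mu) t}\bigr)}{\mu - \lambda\, e^{(\lambda-\mu) t}}
\gamma(t) = 1 - \frac{\mu \bigl(1 - e^{(\lambda-\mu) t}\bigr)}{\bigl(1 - e^{-\mu t}\bigr)\bigl(\mu - \lambda\, e^{(\lambda-\mu) t}\bigr)}
```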
Probability evolution in Links Model
Conceptually, α is the probability the ancestral residue survives
β is the probability of more insertions given one or more descendants
γ is the probability of insertion given ancestor did not survive
In the limit, the immortal link generates residues according to a geometric distribution
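As a numerical check on these interpretations, the sketch below evaluates α, β and γ and confirms that the outcomes for a single mortal link sum to one. It assumes the standard TKF91 closed forms with λ < µ; the function names and the rate values are illustrative and do not come from the slides.

```python
import math


def tkf91_params(lam, mu, t):
    """Return (alpha, beta, gamma) for birth rate lam, death rate mu, time t.

    Assumes the standard TKF91 closed forms and lam < mu.
    """
    x = math.exp((lam - mu) * t)
    alpha = math.exp(-mu * t)                  # ancestral residue survives
    beta = lam * (1.0 - x) / (mu - lam * x)    # one more insertion follows
    # An insertion occurs given that the ancestral residue died.
    gamma = 1.0 - mu * (1.0 - x) / ((1.0 - alpha) * (mu - lam * x))
    return alpha, beta, gamma


def mortal_link_total(alpha, beta, gamma, n_max=200):
    """Sum the probabilities of every outcome for one mortal link."""
    survive = sum(alpha * beta ** (n - 1) * (1.0 - beta)
                  for n in range(1, n_max))
    die_none = (1.0 - alpha) * (1.0 - gamma)
    die_some = sum((1.0 - alpha) * gamma * beta ** (n - 1) * (1.0 - beta)
                   for n in range(1, n_max))
    return survive + die_none + die_some


alpha, beta, gamma = tkf91_params(lam=1.0, mu=2.0, t=1.0)
total = mortal_link_total(alpha, beta, gamma)
print(alpha, beta, gamma, total)
```

The descendant counts are truncated at `n_max`, which is harmless here because the geometric tail in β decays quickly.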
Links model as a Pair HMM
Just like a standard HMM, but emits two sequences instead of one
Emitting two sequences with a pair HMM implicitly aligns them
Pair HMM for Links model
Either the residue lives or dies, spawning geometrically distributed residues in each case
Links model as a Pair HMM
The path through the pair HMM is π
DP is used to infer the alignment of two sequences
Viterbi algorithm for finding the optimum π
Forward algorithm to sum over all alignments or to sample from the posterior, Pr[A | D]
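The Forward recursion can be sketched as follows for a three-state pair HMM (match, delete, insert). The transition layout is an assumption modeled on the usual TKF91-style pair HMM: an insertion follows with probability β (γ after a deletion), and otherwise the ancestor continues with probability κ and its next residue survives with probability α. The function name, the parameter κ, and the toy substitution table are illustrative, not taken from the slides.

```python
def pair_forward(x, y, alpha, beta, gamma, kappa, pi, sub):
    """Forward probability Pr[x, y], summed over all alignments.

    x is the ancestor and y the descendant. pi[a] is the equilibrium
    probability of residue a; sub[a][b] is Pr[a evolves into b].
    """
    def row(ins):
        # ins = probability that an insertion comes next
        return {"I": ins,
                "M": (1 - ins) * kappa * alpha,
                "D": (1 - ins) * kappa * (1 - alpha),
                "E": (1 - ins) * (1 - kappa)}

    trans = {"M": row(beta), "I": row(beta), "D": row(gamma)}
    n, m = len(x), len(y)
    f = {s: [[0.0] * (m + 1) for _ in range(n + 1)] for s in "MDI"}
    f["M"][0][0] = 1.0  # the start state shares M's outgoing transitions

    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            if i > 0 and j > 0:  # match: emit x[i-1] aligned to y[j-1]
                inc = sum(f[s][i - 1][j - 1] * trans[s]["M"] for s in "MDI")
                f["M"][i][j] = pi[x[i - 1]] * sub[x[i - 1]][y[j - 1]] * inc
            if i > 0:            # delete: emit x[i-1] against a gap
                inc = sum(f[s][i - 1][j] * trans[s]["D"] for s in "MDI")
                f["D"][i][j] = pi[x[i - 1]] * inc
            if j > 0:            # insert: emit y[j-1] against a gap
                inc = sum(f[s][i][j - 1] * trans[s]["I"] for s in "MDI")
                f["I"][i][j] = pi[y[j - 1]] * inc

    return sum(f[s][n][m] * trans[s]["E"] for s in "MDI")


pi = {c: 0.25 for c in "ACGT"}
sub = {a: {b: (0.7 if a == b else 0.1) for b in "ACGT"} for a in "ACGT"}
p = pair_forward("ACG", "AG", alpha=0.5, beta=0.2, gamma=0.1,
                 kappa=0.9, pi=pi, sub=sub)
print(p)
```

Replacing each `sum` over predecessor states with `max` turns the same recursion into Viterbi, and a stochastic traceback through `f` would sample a path from the posterior.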
Multiple HMMs
Instead of emitting 2 sequences, emit N sequences
2^N − 1 emit states!
Can develop such a model for any tree
Viterbi and Forward algorithms use an N-dimensional dynamic programming matrix
Given a tree relating N sequences, a multiple HMM can be constructed from pair HMMs so that the likelihood function is the joint probability of the alignment and the sequences {S} on the tree T
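To see why the exclamation mark is warranted, the sketch below counts emit states and DP cells as N grows. It assumes the emit-state count is 2^N − 1 (one emit state per non-empty subset of sequences, which gives the 3 emit states of a pair HMM when N = 2) and that all sequences have the same length. The function names are illustrative.

```python
def emit_states(n_seqs):
    """Emit states when any non-empty subset of the N sequences can emit."""
    return 2 ** n_seqs - 1


def dp_cells(n_seqs, length):
    """Cells in an N-dimensional DP matrix for N sequences of equal length."""
    return (length + 1) ** n_seqs


for n in (2, 3, 5, 10):
    print(n, emit_states(n), dp_cells(n, 100))
```

Both quantities grow exponentially in N, which is the cost the sampling moves later in the deck are designed to avoid.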
Composing multiple alignment from branch alignments
Residues Xi and Yj in a multiple alignment containing sequences X and Y are aligned iff
They are in the same column
That column contains no gaps for intermediate sequences
No deletion followed by re-insertion is allowed
Ignoring all-gap columns provides an unambiguous way of composing a multiple alignment from branch alignments and vice versa
Eliminating internal nodes
Internal nodes are missing data
Sum them out of the likelihood function
Summing over indel histories will kill the independence
Sum over substitution histories using the post-order traversal algorithm of Felsenstein
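Summing out the residues at internal nodes for one alignment column can be sketched with Felsenstein's post-order (pruning) recursion. The Jukes-Cantor substitution model, the nested-list tree encoding, and the use of `*` as a wildcard for a gap are all assumptions made for illustration; the slides do not specify them.

```python
import math

BASES = "ACGT"


def jc69(t):
    """Jukes-Cantor substitution probabilities for branch length t."""
    e = math.exp(-4.0 * t / 3.0)
    same, diff = 0.25 + 0.75 * e, 0.25 - 0.25 * e
    return {a: {b: (same if a == b else diff) for b in BASES} for a in BASES}


def partial(node):
    """Post-order pass: Pr[leaves below node | node carries residue a]."""
    if isinstance(node, str):
        # Leaf: an observed residue, or '*' which matches any residue.
        return {a: 1.0 if node in (a, "*") else 0.0 for a in BASES}
    result = {a: 1.0 for a in BASES}
    for child, t in node:  # internal node: list of (child, branch length)
        p = jc69(t)
        below = partial(child)
        for a in BASES:
            result[a] *= sum(p[a][b] * below[b] for b in BASES)
    return result


def column_likelihood(tree):
    """Likelihood of one column with all internal residues summed out."""
    at_root = partial(tree)
    return sum(0.25 * at_root[a] for a in BASES)


tree = [("A", 0.1), ([("A", 0.2), ("C", 0.3)], 0.05)]
print(column_likelihood(tree))
```

Each node is visited once, so the cost is linear in the number of nodes rather than exponential in the number of internal residues.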
Algorithm
Progressive alignment: profiles of parents are estimated by aligning siblings on a post-order traversal (impatient strategy)
Iterative refinement: revisit branches following the initial alignment phase (greedy strategy)
Sampling: sample from a population of alignments, exploring suboptimal alignments in anticipation of long-term improvements
Algorithm
Moves to explore alignment space
These moves need to be ergodic, i.e. allow for transformation of any alignment into any other alignment
These moves need to satisfy detailed balance, i.e. the chain converges to the desired stationary distribution
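Detailed balance is usually written as the condition below, where P(A) is the target distribution over alignments and T(A → B) is the probability that a move takes alignment A to alignment B. Both symbols are assumed notation rather than the slide's.

```latex
P(A)\, T(A \to B) = P(B)\, T(B \to A) \quad \text{for all alignments } A, B
```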
Move 1: Parent Sampling
Goal: Align two sibling nodes Y and Z and infer their parent X
Construct the multiple HMM for X, Y and Z
Sample an alignment of Y and Z using the forward algorithm
This imposes an alignment of XY and XZ
Similar to the sibling alignment step of impatient-progressive alignment
Move 2: Branch Sampling
Goal: realign two adjacent nodes X and Y
Construct the pair HMM for X and Y, fixing everything else
Resample the alignment using the forward algorithm
This is similar to the branch alignment step of the greedy-refined algorithm
Move 3: Node Sampling
Goal: resample the sequence at an internal node X
Construct the multiple HMM for X, its parent W and its children Y and Z, fixing everything else
Resample the sequence of X, conditioned on relative alignment of W,Y and Z
This is similar to inferring parent sequence lengths in impatient-progressive algorithms
Algorithm
1. Parent sample up the guide tree and construct a multiple alignment
2. Visit each branch and node once for branch sampling or node sampling respectively
3. Repeat 2 to get more samples
Algorithm
Replace ‘sampling by the Forward algorithm’ with ‘optimizing by the Viterbi algorithm’:
Impatient-progressive is the ML version of parent sampling
Greedy-refinement is the ML version of branch and node sampling
Gibbs sampling in ML context
Periodically save the current alignment, apply greedy refinement and record the likelihood of the refined alignment, then return to the saved alignment
Store the refined alignment and compare its likelihood with the other alignments at the end of the run
Ordered over-relaxation
Sampling is a random walk on a Markov chain, so it follows Brownian motion, i.e. rms drift grows as sqrt(n)
Would be better to avoid previously explored spaces, i.e. ‘boldly go where no alignment has gone before’
Impose a strict weak order on alignments
Sample N alignments at each stage and sort them
If the original sample ends up in position k, choose the (N-k)th sample for the next emission
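The rank-mirroring step can be sketched as below. The function name, the `draw` and `key` callables, and the use of plain floats in place of alignments are illustrative assumptions. Positions are counted from 0 among the N fresh samples plus the original, so position k maps to position N − k.

```python
import random


def ordered_overrelaxation(current, draw, key, n, rng):
    """Return the sample whose rank mirrors the rank of `current`.

    draw(rng) produces one fresh sample; key(item) returns a sortable
    score that defines the order on samples.
    """
    candidates = [draw(rng) for _ in range(n)]
    tagged = [(key(c), i, c) for i, c in enumerate(candidates)]
    tagged.append((key(current), n, current))  # tag n marks the original
    tagged.sort(key=lambda entry: (entry[0], entry[1]))
    k = next(pos for pos, entry in enumerate(tagged) if entry[1] == n)
    return tagged[n - k][2]  # mirrored position among the n + 1 entries


rng = random.Random(0)
nxt = ordered_overrelaxation(0.3, lambda r: r.gauss(0.0, 1.0),
                             key=lambda v: v, n=10, rng=rng)
print(nxt)
```

For alignments, `key` could be the log-likelihood, which is one possible choice of strict weak order; the slides do not say which order is used.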
Implementation and results
A True alignment
B Impatient progressive
C Greedy refined
D Gibbs sampling followed by greedy refinement
E Gibbs sampling with simulated annealing
F Gibbs sampling with over-relaxation
G Without Felsenstein wild cards
Discussion
Outlines a very appealing Bayesian framework for multiple alignment
Performs very well, considering the simplicity of the model
Could add profile information and variable sized indels to the model to improve performance
Questions
What assumption lets us use this algorithm and avoid the N-dimensional DP matrices?
What is the importance of the immortal link in the Links model?
References
“Evolutionary HMMs: a Bayesian approach to multiple alignment” - Holmes and Bruno. Bioinformatics 2001
More results
Poor performance on 4 is probably because Handel produces a global alignment and doesn’t handle affine gaps
Handel doesn’t incorporate any profile information
Handel cannot use BLOSUM (it’s not additive)