EM with Many Random Variables
Another Example of EM

Sequence Alignment via HMM

Lecture #10

This class has been edited from Nir Friedman’s lecture. Changes made by Dan Geiger, then Shlomo Moran.

Background readings: Chapters 11.6, 3.4, 3.5, and 4 in Durbin et al., 2001; Chapter 3.4 in Setubal et al., 1997.


EM for processes with many dice

In the previous class we presented the EM algorithm for the case where the parameters are probabilities associated with a single "die" (i.e., a single probability space/random variable).

In practical applications the model may include many dice (as in the HMM model).

The generalization of the EM algorithm to many dice is rather straightforward, and is given next.


EM for processes with many dice

The model is defined by the parameters (random variables, or dice) and the simple events.

Let the random variables be Zl (l = 1,...,r), where Zl has ml values zl,1,...,zl,ml with probabilities {qlk | k = 1,...,ml}.

Each simple event y corresponds to a sequence of outcomes (zl1,k1,...,zln,kn) of the random variables used in y.

Let Nlk(y) = #(times zlk appears in y).


EM for processes with many dice

Similarly to the single-die case, the probability of a simple event y is

$$P(y \mid \lambda) = \prod_{l=1}^{r} \prod_{k=1}^{m_l} \lambda_{lk}^{N_{lk}(y)}$$

Define Nlk as the expected value of Nlk(y), given x and θ:

$$N_{lk} = E(N_{lk}(y) \mid x, \theta) = \sum_{y} p(y \mid x, \theta)\, N_{lk}(y)$$

Then we have:


L(λ) for processes with many dice

$$L(\lambda) = \sum_{y} p(y \mid x, \theta)\,\log P(y \mid \lambda)
            = \sum_{y} p(y \mid x, \theta) \sum_{l=1}^{r} \sum_{k=1}^{m_l} N_{lk}(y)\,\log \lambda_{lk}
            = \sum_{l=1}^{r} \sum_{k=1}^{m_l} N_{lk}\,\log \lambda_{lk}$$

where Nlk = ∑y Nlk(y) p(y|x,θ) is the expected number of times that, given x and θ, the outcome of die l was k.

L(λ) is maximized for

$$\lambda_{lk} = \frac{N_{lk}}{\sum_{k'} N_{lk'}}$$


EM algorithm for processes with many dice

Similarly to the single-die case we get:

Expectation step: Set Nlk to E(Nlk(y)|x,θ), i.e. Nlk = ∑y p(y|x,θ) Nlk(y).

Maximization step: Set λlk = Nlk / (∑k' Nlk').
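To make the two steps concrete, here is a minimal Python sketch of one EM round for a model with several dice, under the assumption that the simple events y consistent with the observation x can be enumerated explicitly; the (die, value) encoding and the function name are illustrative, not part of the lecture.

```python
from collections import Counter, defaultdict

def em_round(events, theta):
    """One EM round for a model with several "dice" (minimal sketch).

    events: the simple events y consistent with the observation x, each
            given as a list of (die, value) outcomes; N_lk(y) is simply the
            number of times (die l, value k) appears in the list.
    theta:  dict mapping (die, value) -> current probability.
    Returns lambda_lk = N_lk / sum_k' N_lk' for every (die, value).
    """
    # E-step: p(y | x, theta) is proportional to the product of the
    # probabilities of the outcomes appearing in y.
    raw = []
    for y in events:
        p = 1.0
        for outcome in y:
            p *= theta[outcome]
        raw.append(p)
    total = sum(raw)

    # Expected counts N_lk = sum_y p(y | x, theta) * N_lk(y).
    expected = defaultdict(float)
    for y, p in zip(events, raw):
        for outcome, n in Counter(y).items():
            expected[outcome] += (p / total) * n

    # M-step: normalize the expected counts separately for each die.
    per_die = defaultdict(float)
    for (die, _), n in expected.items():
        per_die[die] += n
    return {(die, val): n / per_die[die] for (die, val), n in expected.items()}
```

A value that appears in no consistent event gets no entry in the returned dictionary, i.e. its new probability is 0.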


EM algorithm for n independent observations x1,…,xn:

Expectation step: It can be shown that, if the xj are independent, then

$$N_{lk} = \sum_{j=1}^{n} \sum_{y^j} p(y^j \mid x^j, \theta)\, N_{lk}(y^j, x^j) = \sum_{j=1}^{n} N_{lk}^{\,j}$$

where

$$N_{lk}^{\,j} = \frac{1}{p(x^j)} \sum_{y^j} p(y^j, x^j \mid \theta)\, N_{lk}(y^j, x^j)$$
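A correspondingly small sketch for several independent observations: the expected counts Nlk^j are accumulated over the observations and only then normalized. The helper events_for(x), which enumerates the simple events consistent with an observation, is a hypothetical caller-supplied function.

```python
from collections import defaultdict
from math import prod

def em_round_independent(observations, theta, events_for):
    """One EM round for independent observations x^1,...,x^n (sketch).

    events_for(x) enumerates the simple events y consistent with x, each a
    list of (die, value) outcomes; theta maps (die, value) -> probability.
    The per-observation expected counts N_lk^j are accumulated over j
    (N_lk = sum_j N_lk^j) and only then normalized in the M-step.
    """
    expected = defaultdict(float)
    for x in observations:
        events = events_for(x)
        raw = [prod(theta[o] for o in y) for y in events]
        total = sum(raw)                        # proportional to p(x | theta)
        for y, p in zip(events, raw):
            for outcome in y:
                expected[outcome] += p / total  # one unit per occurrence in y
    per_die = defaultdict(float)
    for (die, _), n in expected.items():
        per_die[die] += n
    return {(die, val): n / per_die[die] for (die, val), n in expected.items()}
```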


EM – One More Example

The DNA of species on planet Melmek contains two letters: A and B. In the evolutionary process on Melmek, a mutation of a letter occurs in two steps: first the letter may be deleted, and if it is not deleted it may be changed to the other letter. The unknown probabilities of these events are, respectively, pdel, pA→B, pB→A.

There are two species in Melmek, S (for Son) and its direct ancestor F (Father). Scientists were able to deduce that the DNA of F contained a sequence of two letters "AX", where "X" is equally likely to be A or B (Prob(X=A) = Prob(X=B) = 0.5).

1. Describe the probability space defined by the evolution of the two letters "AX" in F into a sequence (of up to two letters) in S:
a. Write down all the "simple events".
b. For each event write its probability (as a function of pdel, pA→B, pB→A).
c. Write its contribution to the six statistics used by the EM algorithm.


Example (cont.)

The parameters:

pdel, premain = 1 − pdel;
pA→B, pA→A = 1 − pA→B;
pB→A, pB→B = 1 − pB→A.

For writing the simple events, we assume the following order of "dice tossing":
(a) Decide if the initial sequence is AA or AB (with probability 0.5 each).
(b) Delete/replace the right letter (one or two tosses).
(c) Same for the left letter.

For instance, a simple event of deletion of two letters is:

A deletion of both letters of AA: the outcomes are (del, del); its probability: 0.5 · pdel²; its parameter counts: 2 of del.


Example (cont.)

Two more simple events which start with AA:

AA ⇒ A, corresponding to (del, remain, A→A):
probability: 0.5 · pdel · (1 − pdel) · (1 − pA→B);
counts: 1 del, 1 remain, 1 A→A.

AA ⇒ B, corresponding to (del, remain, A→B):
probability: 0.5 · pdel · (1 − pdel) · pA→B;
counts: 1 del, 1 remain, 1 A→B.

There are altogether 18 simple events: 9 which start with AA and 9 which start with AB


Example (cont.)

2. Later information showed that the sequence "AX" of F evolved into the sequence "A" in S. For given pdel, pA→B, pB→A, write down the probability of the above scenario of evolution (of "AX" in F evolving into "A" in S).

This scenario consists of 4 simple events: y1, y2, which start with AA, and y3, y4, which start with AB:

y1: AA ⇒ A; y2: AA ⇒ A:
prob(y1) = prob(y2) = 0.5 · pdel · (1 − pdel) · (1 − pA→B)

y3: AB ⇒ A; y4: AB ⇒ A:
prob(y3) = 0.5 · pdel · (1 − pdel) · (1 − pA→B); prob(y4) = 0.5 · pdel · (1 − pdel) · pB→A

The total probability of all 4 events is

∑ prob(yi) = pdel · (1 − pdel) · [ 1.5 · (1 − pA→B) + 0.5 · pB→A ]


Example (cont.)

3. Write a single round of the EM algorithm to estimate the parameters which maximize the likelihood of the above scenario, starting with arbitrary initial parameters. Show that the outcome is independent of the initial parameters.

Calculate the counts of each outcome in each simple event:

y1: AA ⇒ A: the outcomes are (del, remain, A→A)
y2: AA ⇒ A: (remain, A→A, del)
y3: AB ⇒ A: (del, remain, A→A)
y4: AB ⇒ A: (remain, B→A, del)

Regardless of the initial parameters: del and remain have exactly the same expected count, hence they each receive probability 0.5. Also, A→A and B→A both get probability 1.
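The claim can also be checked mechanically. The sketch below encodes the four simple events as lists of dice outcomes and runs one EM round; the tuple encoding and the particular initial numbers are mine, chosen only for illustration, and any non-degenerate initial parameters give the same answer: pdel = 0.5, pA→A = 1 (i.e. pA→B = 0) and pB→A = 1.

```python
from collections import defaultdict

# The four simple events of the "AX evolved into A" scenario, written as
# lists of (die, value) outcomes.  The "init" die (AA vs. AB, 0.5 each) is
# known and is not one of the three unknown dice, but it is included so the
# posterior weights come out right.
events = [
    [("init", "AA"), ("del", "del"), ("del", "remain"), ("A", "A->A")],  # y1
    [("init", "AA"), ("del", "remain"), ("A", "A->A"), ("del", "del")],  # y2
    [("init", "AB"), ("del", "del"), ("del", "remain"), ("A", "A->A")],  # y3
    [("init", "AB"), ("del", "remain"), ("B", "B->A"), ("del", "del")],  # y4
]

def one_em_round(events, theta):
    """E-step (posterior weights) followed by M-step (per-die normalization)."""
    raw = []
    for y in events:
        p = 1.0
        for outcome in y:
            p *= theta[outcome]
        raw.append(p)
    total = sum(raw)
    expected = defaultdict(float)
    for y, p in zip(events, raw):
        for outcome in y:
            expected[outcome] += p / total
    per_die = defaultdict(float)
    for (die, _), n in expected.items():
        per_die[die] += n
    return {(die, val): n / per_die[die] for (die, val), n in expected.items()}

# Arbitrary non-degenerate initial parameters (illustrative values).
theta0 = {("init", "AA"): 0.5, ("init", "AB"): 0.5,
          ("del", "del"): 0.3, ("del", "remain"): 0.7,
          ("A", "A->A"): 0.8, ("A", "A->B"): 0.2,
          ("B", "B->A"): 0.6, ("B", "B->B"): 0.4}

new = one_em_round(events, theta0)
# new[("del", "del")] == 0.5   (del and remain appear once in every event)
# new[("A", "A->A")] == 1.0    (A->B appears in no consistent event)
# new[("B", "B->A")] == 1.0    (B->B appears in no consistent event)
```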


Example (end)

4. What are the parameters which maximize the likelihood of the above scenario? Justify your answer.

Solution 1: By analyzing the formula of the likelihood of the event, which we computed earlier:

∑ prob(yi) = pdel · (1 − pdel) · [ 1.5 · (1 − pA→B) + 0.5 · pB→A ]

which can be maximized by maximizing separately for each of the 3 dice:

pdel = 0.5; pA→B = 0; pB→A = 1.

Solution 2: The above parameters are obtained after each iteration of the EM algorithm. By the EM correctness theorem, this must be the (unique) maximum.


EM for other discrete stochastic processes

The EM algorithm is applicable to a general scenario in which we wish to maximize p(x|λ) = ∑y p(x,y|λ), where the experiment (x,y) is generated by a general "stochastic process". The only assumption we make is that the outcome of each experiment consists of a (finite) sequence of samplings of r discrete random variables (dice) Z1,...,Zr; each of the Zi's can be sampled several times. This can be realized by a probabilistic acyclic state machine, where at each state some Zi is sampled and the next state is determined by the outcome, until a final state is reached.


EM in Practice

Initial parameters:
• Random parameter setting
• "Best" guess from another source

Stopping criteria:
• Small change in likelihood of data
• Small change in parameter values

Avoiding bad local maxima:
• Multiple restarts
• Early "pruning" of unpromising ones
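The points above translate into a small generic driver; the function names and signatures are placeholders for whatever model-specific EM round and likelihood are at hand.

```python
def run_em(em_round, log_likelihood, init_params, max_iter=200, tol=1e-6):
    """Iterate EM until the data log-likelihood stops improving (sketch).

    em_round(params) -> updated params (one E-step + M-step);
    log_likelihood(params) -> log p(x | params).
    """
    params = init_params
    ll = log_likelihood(params)
    for _ in range(max_iter):
        params = em_round(params)
        new_ll = log_likelihood(params)
        if abs(new_ll - ll) < tol:   # stopping criterion: small change in likelihood
            ll = new_ll
            break
        ll = new_ll
    return params, ll

def em_with_restarts(em_round, log_likelihood, random_init, n_restarts=10):
    """Multiple random restarts; keep the run that reaches the best likelihood."""
    best_params, best_ll = None, float("-inf")
    for _ in range(n_restarts):
        params, ll = run_em(em_round, log_likelihood, random_init())
        if ll > best_ll:
            best_params, best_ll = params, ll
    return best_params, best_ll
```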


Sequence Comparison using HMM

Recall: We used log-odds scoring functions for gapless alignments, as follows:

s(a,b) = log(pab / (qa qb)), where pab and qa are the probabilities under the "Match" and "Random" models.

We now use an HMM to extend such log-odds scoring functions to alignments which may contain gaps (indels).
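As a quick reminder of how the gapless score behaves, here is a tiny sketch over a made-up two-letter alphabet; the numerical values are purely illustrative.

```python
import math

def s(a, b, p, q):
    """Log-odds score of aligning a with b: log( p_ab / (q_a * q_b) )."""
    return math.log(p[(a, b)] / (q[a] * q[b]))

q = {"A": 0.5, "B": 0.5}                                   # "Random" model
p = {("A", "A"): 0.35, ("A", "B"): 0.15,
     ("B", "A"): 0.15, ("B", "B"): 0.35}                   # "Match" model

# s("A", "A", p, q) > 0 : the pair is more likely under the Match model.
# s("A", "B", p, q) < 0 : the pair is more likely under the Random model.
```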


Sequence alignment using HMM

• Each “output symbol” of the HMM is an aligned pair of two letters, or of a letter and a gap.

• Example: Insertion of a first gap in this model:

[Figure: a fragment of a state path in which an M state emits the aligned pair (G,T) and an X state emits (C,-), illustrating the insertion of a first gap.]

We still need to assign transition/emission probabilities


• Need to define the hidden states, and the transition and emission probabilities, which define the probability of each aligned pair of sequences.

• Given two input sequences, we look for an alignment of these sequences of maximum probability.


Hidden states and emitted symbols

"Hidden" states: Match, Insertion in x, Insertion in y.

Symbols emitted:
Match: {(a,b) | a,b in ∑}.
Insertion in x: {(a,-) | a in ∑}.
Insertion in y: {(-,a) | a in ∑}.


Transition and Emission Probabilities

Transition probabilities (note the forbidden transitions between X and Y):

       M       X     Y
M      1-2δ    δ     δ
X      1-ε     ε     0
Y      1-ε     0     ε

δ = probability for a 1st gap; ε = probability for a tailing gap.

Emission probabilities:
Match: (a,b) with probability pab - only from the M state.
Insertion in x: (a,-) with probability qa - only from the X state.
Insertion in y: (-,a) with probability qa - only from the Y state.

(Note that the hidden states can be reconstructed from the alignment.)


Scoring alignments

For each pair of sequences x (of length m) and y (of length n) there are many alignments of x and y, each corresponding to a different state path (the lengths of the paths are between max{m,n} and m+n). Given the transition and emission probabilities, each alignment has a defined score: the product of the corresponding probabilities. An alignment is "most probable" if it maximizes this score.


Finding the most probable alignment

Let vM(i,j) be the probability of the most probable alignment of x(1..i) and y(1..j) which ends with a match. Similarly, vX(i,j) and vY(i,j) are the probabilities of the most probable alignments of x(1..i) and y(1..j) which end with an insertion to x or to y, respectively.

Then using a recursive argument, we get:

$$v^M[i,j] = p_{x_i y_j} \cdot \max \begin{cases} (1-2\delta)\, v^M(i-1,j-1) \\ (1-\varepsilon)\, v^X(i-1,j-1) \\ (1-\varepsilon)\, v^Y(i-1,j-1) \end{cases}$$


Most probable alignment

By a similar argument, vX(i,j) and vY(i,j), the probabilities of the most probable alignments of x(1..i) and y(1..j) which end with an insertion to x or to y respectively, are:

$$v^X[i,j] = q_{x_i} \cdot \max \begin{cases} \delta\, v^M(i-1,j) \\ \varepsilon\, v^X(i-1,j) \end{cases}
\qquad
v^Y[i,j] = q_{y_j} \cdot \max \begin{cases} \delta\, v^M(i,j-1) \\ \varepsilon\, v^Y(i,j-1) \end{cases}$$
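A compact sketch of the three recursions in Python. Starting from the M state with probability 1 (as in the forward initialization a few slides later) and the handling of the first row and column are implementation choices not spelled out on the slides; p and q are emission tables keyed as in the earlier snippet.

```python
def most_probable_alignment_prob(x, y, p, q, delta, eps):
    """Probability of the most probable alignment of x and y under the pair
    HMM (no END state), via the v^M, v^X, v^Y recursions of the slides."""
    n, m = len(x), len(y)
    vM = [[0.0] * (m + 1) for _ in range(n + 1)]
    vX = [[0.0] * (m + 1) for _ in range(n + 1)]
    vY = [[0.0] * (m + 1) for _ in range(n + 1)]
    vM[0][0] = 1.0                       # start from the M state
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:
                vM[i][j] = p[(x[i-1], y[j-1])] * max(
                    (1 - 2 * delta) * vM[i-1][j-1],
                    (1 - eps) * vX[i-1][j-1],
                    (1 - eps) * vY[i-1][j-1])
            if i > 0:                    # insertion in x emits x[i-1]
                vX[i][j] = q[x[i-1]] * max(delta * vM[i-1][j],
                                           eps * vX[i-1][j])
            if j > 0:                    # insertion in y emits y[j-1]
                vY[i][j] = q[y[j-1]] * max(delta * vM[i][j-1],
                                           eps * vY[i][j-1])
    return max(vM[n][m], vX[n][m], vY[n][m])

# e.g. most_probable_alignment_prob("AB", "AAB", p, q, delta=0.1, eps=0.3)
```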


The Probability Space

Different alignments of x and y may have different lengths, so the probability space which we used earlier, for HMM of a fixed length L, is not applicable to this alignment HMM model.

However, there is a probability space which contains all infinite sequence alignments (finite alignments are compound events in this model). The algorithm of the previous slides computes the correct probability of each alignment in this probability space.

Another approach is to define a probability space which contains all alignments of finite length. In the following we adapt our algorithm to this model.


Adding termination probabilities

A probability space for all finite alignments is obtained by adding an END state, which denotes the end of the alignment.

The last transition in each alignment is to the END state, with probability τ. Each other state has a transition probability τ to the END state. This results in an expected sequence length of 1/τ.

       M        X     Y     END
M      1-2δ-τ   δ     δ     τ
X      1-ε-τ    ε     0     τ
Y      1-ε-τ    0     ε     τ
END    0        0     0     1


The log-odds scoring function

We wish to compare the “model” alignment score to the “random” alignment score.

For gapless alignments we used the log-odds ratio: s(a,b) = log(pab / (qa qb)). To adapt this to the HMM model, we need to model a random sequence by an HMM with an end state.


Scoring function for the random model

The transition probabilities for the random model, with termination probability η (X is the start state):

       X      Y      END
X      1-η    η      0
Y      0      1-η    η
END    0      0      1

The emission probability for a letter a is qa.

Thus the probability of x (of length n) and y (of length m) being random is:

$$P(x,y \mid Random) = \eta^{2}(1-\eta)^{n+m} \prod_{i=1}^{n} q_{x_i} \prod_{j=1}^{m} q_{y_j}$$

And the corresponding score is:

$$\log P(x,y \mid Random) = 2\log\eta + (n+m)\log(1-\eta) + \sum_{i=1}^{n}\log q_{x_i} + \sum_{j=1}^{m}\log q_{y_j}$$


Markov Matrices for “Random” and “Model”

"Model":

       M        X     Y     END
M      1-2δ-τ   δ     δ     τ
X      1-ε-τ    ε     0     τ
Y      1-ε-τ    0     ε     τ
END    0        0     0     1

"Random":

       X      Y      END
X      1-η    η      0
Y      0      1-η    η
END    0      0      1


Combining models in the log-odds scoring function

In order to compare the M score to the R score of sequences x and y, we can find an optimal M score, and then subtract from it the R score.

This is insufficient when we look for local alignments, where the optimal substrings in the alignment are not known in advance. A better way:

1. Define a log-odds scoring function which keeps track of the difference between the Match and Random scores of the partial strings during the alignment.

2. At the end add to the score (logτ – 2logη) to compensate for the end transitions in both models.

We get the following:


The log-odds scoring function

$$V^M[i,j] = \log\frac{p_{x_i y_j}}{q_{x_i} q_{y_j}} - 2\log(1-\eta) + \max \begin{cases} \log(1-2\delta-\tau) + V^M[i-1,j-1] \\ \log(1-\varepsilon-\tau) + V^X[i-1,j-1] \\ \log(1-\varepsilon-\tau) + V^Y[i-1,j-1] \end{cases}$$

$$V^X[i,j] = -\log(1-\eta) + \max \begin{cases} \log\delta + V^M[i-1,j] \\ \log\varepsilon + V^X[i-1,j] \end{cases}
\qquad
V^Y[i,j] = -\log(1-\eta) + \max \begin{cases} \log\delta + V^M[i,j-1] \\ \log\varepsilon + V^Y[i,j-1] \end{cases}$$

And at the end add to the score (log τ - 2 log η).

(assuming that letters at insertions/deletions are selected by the random model)


The log-odds scoring function

Another way, with uniform scoring for the M state (Durbin et al., Chapter 4.1): Define a scoring function s with penalties d and e for a first gap and a tailing gap, respectively.

Then modify the algorithm to correct for extra prepayment, as follows:

$$s(a,b) = \log\frac{p_{ab}}{q_a q_b} + \log\frac{1-2\delta-\tau}{(1-\eta)^2} \quad \text{(assume the move is from the M state)}$$

$$d = -\log\frac{\delta\,(1-\varepsilon-\tau)}{(1-\eta)(1-2\delta-\tau)} \quad \text{("prepayment" when moving to the X or Y states)}$$

$$e = -\log\frac{\varepsilon}{1-\eta}$$


Log-odds alignment algorithm

$$V^M[i,j] = s(x_i, y_j) + \max \begin{cases} V^M(i-1,j-1) \\ V^X(i-1,j-1) \\ V^Y(i-1,j-1) \end{cases}$$

$$V^X[i,j] = \max \begin{cases} V^M(i-1,j) - d \\ V^X(i-1,j) - e \end{cases}
\qquad
V^Y[i,j] = \max \begin{cases} V^M(i,j-1) - d \\ V^Y(i,j-1) - e \end{cases}$$

Initialization: VM(0,0) = log τ - 2 log η.

Termination: V = max{ VM(m,n), VX(m,n) + c, VY(m,n) + c }, where c = log(1-2δ-τ) - log(1-ε-τ).
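Putting the last three slides together, here is a sketch that derives s, d and e from the model parameters (per the Durbin et al. definitions above) and runs the log-odds recursion with the stated initialization and termination. The gaps-only treatment of the first row and column is my own assumption; the slides do not spell it out.

```python
import math

def gap_scores(p, q, delta, eps, tau, eta):
    """s(a,b), d, e derived from the pair-HMM parameters (Durbin et al., 4.1)."""
    def s(a, b):
        return (math.log(p[(a, b)] / (q[a] * q[b]))
                + math.log((1 - 2 * delta - tau) / (1 - eta) ** 2))
    d = -math.log(delta * (1 - eps - tau) / ((1 - eta) * (1 - 2 * delta - tau)))
    e = -math.log(eps / (1 - eta))
    return s, d, e

def log_odds_alignment(x, y, p, q, delta, eps, tau, eta):
    """Optimal global log-odds score V, following the slide's recursions."""
    s, d, e = gap_scores(p, q, delta, eps, tau, eta)
    n, m = len(x), len(y)
    NEG = float("-inf")
    VM = [[NEG] * (m + 1) for _ in range(n + 1)]
    VX = [[NEG] * (m + 1) for _ in range(n + 1)]
    VY = [[NEG] * (m + 1) for _ in range(n + 1)]
    VM[0][0] = math.log(tau) - 2 * math.log(eta)          # initialization
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:
                VM[i][j] = s(x[i-1], y[j-1]) + max(VM[i-1][j-1],
                                                   VX[i-1][j-1],
                                                   VY[i-1][j-1])
            if i > 0:
                VX[i][j] = max(VM[i-1][j] - d, VX[i-1][j] - e)
            if j > 0:
                VY[i][j] = max(VM[i][j-1] - d, VY[i][j-1] - e)
    c = math.log(1 - 2 * delta - tau) - math.log(1 - eps - tau)  # termination
    return max(VM[n][m], VX[n][m] + c, VY[n][m] + c)
```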


Total probability of x and y

Rather than computing the probability of the most probable alignment, we look for the total probability that x and y are related by our model.

Let fM(i,j) be the sum of the probabilities of all alignments of x(1..i) and y(1..j) which end with a match. Similarly, fX(i,j) and fY(i,j) are the sums of the probabilities of those alignments which end with an insertion to x (resp. y). A "forward"-type algorithm computes these functions. Initialization: fM(0,0) = 1, fX(0,0) = fY(0,0) = 0 (we start from M, but we could select another initial state).


Total probability of x and y (cont.)

$$f^M[i,j] = p_{x_i y_j} \left[ (1-2\delta)\, f^M(i-1,j-1) + (1-\varepsilon)\, f^X(i-1,j-1) + (1-\varepsilon)\, f^Y(i-1,j-1) \right]$$

$$f^X[i,j] = q_{x_i} \left[ \delta\, f^M(i-1,j) + \varepsilon\, f^X(i-1,j) \right]
\qquad
f^Y[i,j] = q_{y_j} \left[ \delta\, f^M(i,j-1) + \varepsilon\, f^Y(i,j-1) \right]$$

The total probability of all alignments is: P(x,y|model) = fM[m,n] + fX[m,n] + fY[m,n]
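The same dynamic program with sums in place of maxima gives the total probability. In this sketch the δ and ε factors inside fX and fY are filled in by analogy with the Viterbi recursion, since they are hard to read off the damaged slide.

```python
def total_alignment_probability(x, y, p, q, delta, eps):
    """P(x, y | model): sum over all alignments of x and y, computed with the
    forward-type recursions f^M, f^X, f^Y (no END state)."""
    n, m = len(x), len(y)
    fM = [[0.0] * (m + 1) for _ in range(n + 1)]
    fX = [[0.0] * (m + 1) for _ in range(n + 1)]
    fY = [[0.0] * (m + 1) for _ in range(n + 1)]
    fM[0][0] = 1.0                       # initialization: start from M
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:
                fM[i][j] = p[(x[i-1], y[j-1])] * (
                    (1 - 2 * delta) * fM[i-1][j-1]
                    + (1 - eps) * fX[i-1][j-1]
                    + (1 - eps) * fY[i-1][j-1])
            if i > 0:
                fX[i][j] = q[x[i-1]] * (delta * fM[i-1][j] + eps * fX[i-1][j])
            if j > 0:
                fY[i][j] = q[y[j-1]] * (delta * fM[i][j-1] + eps * fY[i][j-1])
    return fM[n][m] + fX[n][m] + fY[n][m]
```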