Inside-outside algorithm LING 572 Fei Xia 02/28/06


Outline

• HMM, PFSA, and PCFG

• Inside and outside probability

• Expected counts and update formulae

• Relation to EM

• Relation between inside-outside and forward-backward algorithms

HMM, PFSA, and PCFG

PCFG

• A PCFG is a tuple $(N, \Sigma, N^1, R, P)$:
  – $N$ is a set of non-terminals: $\{N^i\}$
  – $\Sigma$ is a set of terminals: $\{w^k\}$
  – $N^1$ is the start symbol
  – $R$ is a set of rules
  – $P$ is the set of probabilities on rules

• We assume the PCFG is in Chomsky Normal Form.

• Parsing algorithms:
  – Earley (top-down)
  – CYK (bottom-up)
  – …
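To make the tuple concrete, here is a minimal sketch of one way to store a CNF PCFG in Python. The dictionary layout, the names `BINARY`, `UNARY` and `lhs_totals`, and the toy grammar itself are assumptions made for illustration; none of them come from the slides.

```python
from collections import defaultdict

# Hypothetical toy grammar in Chomsky Normal Form (names and numbers are
# illustrative only).
BINARY = {("S", "S", "S"): 0.4}   # P(N^j -> N^r N^s), keyed by (j, r, s)
UNARY = {("S", "a"): 0.6}         # P(N^j -> w^k), keyed by (j, word)


def lhs_totals(binary, unary):
    """Sum the rule probabilities for each left-hand-side non-terminal."""
    totals = defaultdict(float)
    for (lhs, _r, _s), prob in binary.items():
        totals[lhs] += prob
    for (lhs, _word), prob in unary.items():
        totals[lhs] += prob
    return dict(totals)


# In a proper PCFG the rules of every non-terminal sum to 1.
TOTALS = lhs_totals(BINARY, UNARY)
print(TOTALS)
```

Keying binary rules by `(j, r, s)` and lexical rules by `(j, word)` mirrors the two rule shapes that Chomsky Normal Form allows.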

PFSA vs. PCFG

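• A PFSA can be seen as a special case of a PCFG:
  – State ↔ non-terminal
  – Output symbol ↔ terminal
  – Arc ↔ context-free rule
  – Path ↔ parse tree (only right-branching binary trees)

[Figure: the PFSA path S1 → S2 → S3, emitting a and then b, drawn next to the equivalent right-branching parse tree with rules S1 → a S2, S2 → b S3, S3 → ε.]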
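The correspondence above can be sketched in a few lines of Python. The arc and stop-probability dictionaries and the function name `pfsa_to_rules` are hypothetical; the example data is the three-state path from the figure, with probabilities chosen arbitrarily.

```python
def pfsa_to_rules(arcs, stop):
    """Turn PFSA arcs into right-branching context-free rules.

    arcs: {(source_state, output_symbol, target_state): probability}
    stop: {final_state: probability of stopping there}   (assumed layout)
    Returns {(lhs, rhs_tuple): probability}.
    """
    rules = {}
    for (src, symbol, dst), prob in arcs.items():
        rules[(src, (symbol, dst))] = prob   # arc  ->  rule  src -> symbol dst
    for state, prob in stop.items():
        rules[(state, ())] = prob            # final state  ->  rule  state -> epsilon
    return rules


# The path S1 -a-> S2 -b-> S3 from the figure, with made-up probabilities.
ARCS = {("S1", "a", "S2"): 1.0, ("S2", "b", "S3"): 1.0}
STOP = {"S3": 1.0}
RULES = pfsa_to_rules(ARCS, STOP)
```

Each state becomes a non-terminal and each arc becomes one rule, so every path through the automaton maps to exactly one right-branching tree.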

PFSA and HMM

Add a “Start” state and a transition from “Start” to any state in the HMM.
Add a “Finish” state and a transition from any state in the HMM to “Finish”.

[Figure: Start → HMM → Finish]

The connection between two algorithms

• An HMM can (almost) be converted to a PFSA.
• A PFSA is a special case of a PCFG.
• Inside-outside is an algorithm for PCFGs.
  ⇒ The inside-outside algorithm will work for HMMs.

• Forward-backward is an algorithm for HMMs.
  ⇒ In fact, the inside-outside algorithm is the same as forward-backward when the PCFG is a PFSA.

Forward and backward probabilities

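[Figure: an HMM unrolled as states $X_1, \ldots, X_t, \ldots, X_n, X_{n+1}$ with outputs $O_1, \ldots, O_n$. The forward probability $\alpha_i(t)$ covers the outputs $O_1 \ldots O_{t-1}$ emitted before state $X_t$; the backward probability $\beta_i(t)$ covers the outputs $O_t \ldots O_n$ emitted from $X_t$ onward.]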

Backward/forward prob vs. Inside/outside prob

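[Figure: PFSA vs. PCFG. In the PFSA, with $X_t = N^i$, the forward probability $\alpha_i(t)$ covers $O_1 \ldots O_{t-1}$ and the backward probability $\beta_i(t)$ covers $O_t \ldots O_n$. In the PCFG, the outside probability $\alpha_j(p,q)$ covers $w_1 \ldots w_{p-1}$ and $w_{q+1} \ldots w_m$ under the start symbol $N^1$, and the inside probability $\beta_j(p,q)$ covers $w_p \ldots w_q$ under $N^j$. Forward corresponds to outside; backward corresponds to inside.]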

Notation

Inside and outside probabilities

Definitions

• Inside probability: the total probability of generating the words $w_p \ldots w_q$ from the non-terminal $N^j$:

$$\beta_j(p,q) = P(w_{pq} \mid N^j_{pq})$$

• Outside probability: the total probability of beginning with the start symbol $N^1$ and generating $N^j_{pq}$ and all the words outside $w_p \ldots w_q$:

$$\alpha_j(p,q) = P(w_{1(p-1)}, N^j_{pq}, w_{(q+1)m})$$

• When $p > q$:

$$\alpha_j(p,q) = \beta_j(p,q) = 0$$

Calculating inside probability (CYK algorithm)

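Base case:

$$\beta_j(k,k) = P(N^j \to w_k)$$

Recursive case:

$$\beta_j(p,q) = \sum_{r,s} \sum_{d=p}^{q-1} P(N^j \to N^r N^s)\, \beta_r(p,d)\, \beta_s(d+1,q)$$

[Figure: $N^j$ rewrites as $N^r\, N^s$; $N^r$ spans $w_p \ldots w_d$ and $N^s$ spans $w_{d+1} \ldots w_q$.]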
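The base case and recursion above can be sketched as a bottom-up chart fill. This is a minimal illustration, assuming a hypothetical dictionary representation of the grammar (`binary` keyed by `(j, r, s)`, `unary` keyed by `(j, word)`) and a made-up toy grammar; it is not the course's reference implementation.

```python
from collections import defaultdict


def inside(words, binary, unary):
    """Return beta, where beta[(j, p, q)] is the inside probability of
    non-terminal j over words p..q (1-based, inclusive)."""
    m = len(words)
    beta = defaultdict(float)
    # Base case: beta_j(k, k) = P(N^j -> w_k)
    for k, word in enumerate(words, start=1):
        for (j, w), prob in unary.items():
            if w == word:
                beta[(j, k, k)] += prob
    # Recursive case: fill wider spans from narrower ones.
    for width in range(2, m + 1):
        for p in range(1, m - width + 2):
            q = p + width - 1
            for (j, r, s), prob in binary.items():
                for d in range(p, q):  # split point: d = p .. q-1
                    beta[(j, p, q)] += prob * beta[(r, p, d)] * beta[(s, d + 1, q)]
    return beta


# Hypothetical toy grammar: S -> S S (0.4), S -> a (0.6)
BINARY = {("S", "S", "S"): 0.4}
UNARY = {("S", "a"): 0.6}
BETA = inside(["a", "a", "a"], BINARY, UNARY)
print(BETA[("S", 1, 3)])
```

For the sentence "a a a" there are two parse trees, each with probability $0.4^2 \cdot 0.6^3$, so `BETA[("S", 1, 3)]` should come out close to 0.06912.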

Calculating outside probability (case 1)

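$N^j$ is the left child of its parent $N^f$:

$$\alpha_j(p,q) = \sum_{f,g} \sum_{e=q+1}^{m} \alpha_f(p,e)\, P(N^f \to N^j N^g)\, \beta_g(q+1,e)$$

[Figure: under $N^1$ spanning $w_1 \ldots w_m$, the node $N^f$ rewrites as $N^j\, N^g$; $N^j$ spans $w_p \ldots w_q$ and $N^g$ spans $w_{q+1} \ldots w_e$.]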

Calculating outside probability (case 2)

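$N^j$ is the right child of its parent $N^f$:

$$\alpha_j(p,q) = \sum_{f,g} \sum_{e=1}^{p-1} \alpha_f(e,q)\, P(N^f \to N^g N^j)\, \beta_g(e,p-1)$$

[Figure: under $N^1$ spanning $w_1 \ldots w_m$, the node $N^f$ rewrites as $N^g\, N^j$; $N^g$ spans $w_e \ldots w_{p-1}$ and $N^j$ spans $w_p \ldots w_q$.]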

Outside probability

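Recursive case (sum of case 1 and case 2):

$$\alpha_j(p,q) = \sum_{f,g} \sum_{e=q+1}^{m} \alpha_f(p,e)\, P(N^f \to N^j N^g)\, \beta_g(q+1,e) + \sum_{f,g} \sum_{e=1}^{p-1} \alpha_f(e,q)\, P(N^f \to N^g N^j)\, \beta_g(e,p-1)$$

Base case:

$$\alpha_j(1,m) = \begin{cases} 1 & \text{if } j = 1 \\ 0 & \text{otherwise} \end{cases}$$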
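A minimal sketch of the outside computation follows. It rearranges the same sum in "push" form: each parent span distributes its outside probability to its left child (case 1) and right child (case 2), visiting parents from widest to narrowest so that a span's outside probability is complete before it is used. The grammar representation, function names and toy grammar are assumptions for illustration.

```python
from collections import defaultdict


def inside(words, binary, unary):
    """beta[(j, p, q)]: inside probability of j over words p..q (1-based)."""
    m = len(words)
    beta = defaultdict(float)
    for k, word in enumerate(words, start=1):
        for (j, w), prob in unary.items():
            if w == word:
                beta[(j, k, k)] += prob
    for width in range(2, m + 1):
        for p in range(1, m - width + 2):
            q = p + width - 1
            for (j, r, s), prob in binary.items():
                for d in range(p, q):
                    beta[(j, p, q)] += prob * beta[(r, p, d)] * beta[(s, d + 1, q)]
    return beta


def outside(words, binary, beta, start):
    """alpha[(j, p, q)]: outside probability of j over words p..q (1-based)."""
    m = len(words)
    alpha = defaultdict(float)
    alpha[(start, 1, m)] = 1.0              # base case
    for width in range(m, 1, -1):           # parent spans, widest first
        for p in range(1, m - width + 2):
            q = p + width - 1
            for (f, left, right), prob in binary.items():
                parent = alpha[(f, p, q)]
                if parent == 0.0:
                    continue
                for d in range(p, q):
                    # case 1: the left child gets parent * rule * inside(right sibling)
                    alpha[(left, p, d)] += parent * prob * beta[(right, d + 1, q)]
                    # case 2: the right child gets parent * rule * inside(left sibling)
                    alpha[(right, d + 1, q)] += parent * prob * beta[(left, p, d)]
    return alpha


# Hypothetical toy grammar: S -> S S (0.4), S -> a (0.6)
BINARY = {("S", "S", "S"): 0.4}
UNARY = {("S", "a"): 0.6}
WORDS = ["a", "a", "a"]
BETA = inside(WORDS, BINARY, UNARY)
ALPHA = outside(WORDS, BINARY, BETA, "S")
```

A useful sanity check is that $\alpha_j(k,k)\, P(N^j \to w_k)$, summed over $j$, gives the same sentence probability for every position $k$.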

Probability of a sentence

$$P(w_{1m}) = \beta_1(1,m)$$

$$P(w_{1m}) = \sum_j \alpha_j(k,k)\, P(N^j \to w_k) \quad \text{for any } k$$

$$P(w_{1m}, N^j_{pq}) = \alpha_j(p,q)\, \beta_j(p,q)$$
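The second identity may be less obvious than the first. One way to see it, sketched here under the CNF assumption that every word $w_k$ is produced by exactly one pre-terminal node $N^j_{kk}$:

```latex
\begin{aligned}
P(w_{1m})
  &= \sum_j P\bigl(w_{1(k-1)},\, N^j_{kk},\, w_k,\, w_{(k+1)m}\bigr) \\
  &= \sum_j P\bigl(w_{1(k-1)},\, N^j_{kk},\, w_{(k+1)m}\bigr)\,
            P\bigl(w_k \mid w_{1(k-1)},\, N^j_{kk},\, w_{(k+1)m}\bigr) \\
  &= \sum_j \alpha_j(k,k)\, P(N^j \to w_k)
\end{aligned}
```

The last step uses the context-free independence assumption: given $N^j_{kk}$, the word $w_k$ does not depend on anything outside the span, so the conditional probability is just $\beta_j(k,k) = P(N^j \to w_k)$.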

Recap so far

• Inside probability: bottom-up

• Outside probability: top-down using the same chart.

• Probability of a sentence can be calculated in many ways.

Expected counts and update formulae

The probability of a binary rule is used

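For a given span $(p,q)$:

$$P(N^j_{pq},\, N^j \to N^r N^s \mid w_{1m}) = \frac{\sum_{d=p}^{q-1} \alpha_j(p,q)\, P(N^j \to N^r N^s)\, \beta_r(p,d)\, \beta_s(d+1,q)}{P(w_{1m})}$$

Summing over all spans:

$$P(N^j \to N^r N^s,\, N^j \mid w_{1m}) = \sum_{p=1}^{m} \sum_{q=p}^{m} P(N^j_{pq},\, N^j \to N^r N^s \mid w_{1m}) = \frac{\sum_{p=1}^{m} \sum_{q=p}^{m} \sum_{d=p}^{q-1} \alpha_j(p,q)\, P(N^j \to N^r N^s)\, \beta_r(p,d)\, \beta_s(d+1,q)}{P(w_{1m})} \tag{1}$$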

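The probability of $N^j$ is used

$$P(w_{1m}, N^j_{pq}) = \alpha_j(p,q)\, \beta_j(p,q)$$

$$P(N^j_{pq} \mid w_{1m}) = \frac{P(w_{1m}, N^j_{pq})}{P(w_{1m})} = \frac{\alpha_j(p,q)\, \beta_j(p,q)}{P(w_{1m})}$$

Summing over all spans:

$$P(N^j \mid w_{1m}) = \sum_{p=1}^{m} \sum_{q=p}^{m} P(N^j_{pq} \mid w_{1m}) = \frac{\sum_{p=1}^{m} \sum_{q=p}^{m} \alpha_j(p,q)\, \beta_j(p,q)}{P(w_{1m})} \tag{2}$$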

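Dividing (1) by (2):

$$P(N^j \to N^r N^s \mid w_{1m}) = \frac{P(N^j \to N^r N^s,\, N^j \mid w_{1m})}{P(N^j \mid w_{1m})} = \frac{(1)}{(2)} = \frac{\sum_{p=1}^{m} \sum_{q=p}^{m} \sum_{d=p}^{q-1} \alpha_j(p,q)\, P(N^j \to N^r N^s)\, \beta_r(p,d)\, \beta_s(d+1,q)}{\sum_{p=1}^{m} \sum_{q=p}^{m} \alpha_j(p,q)\, \beta_j(p,q)}$$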
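As a worked example of (1), (2) and their ratio, take a hypothetical one-non-terminal grammar (an assumption for illustration, not from the slides): $N^1 = S$ with $P(S \to S\,S) = 0.4$ and $P(S \to a) = 0.6$, and the sentence $a\,a\,a$ ($m = 3$).

```latex
\begin{aligned}
&\beta_S(k,k) = 0.6, \quad \beta_S(1,2) = \beta_S(2,3) = 0.4 \cdot 0.6 \cdot 0.6 = 0.144, \\
&\beta_S(1,3) = 0.4 \cdot (0.6 \cdot 0.144 + 0.144 \cdot 0.6) = 0.06912 = P(w_{13}) \\[4pt]
&\alpha_S(1,3) = 1, \quad \alpha_S(1,2) = \alpha_S(2,3) = 1 \cdot 0.4 \cdot 0.6 = 0.24, \\
&\alpha_S(k,k) = 0.4 \cdot 0.144 + 0.24 \cdot 0.4 \cdot 0.6 = 0.1152 \\[4pt]
&(1) = \frac{1 \cdot 0.06912 + 2 \cdot 0.24 \cdot 0.144}{0.06912} = \frac{0.13824}{0.06912} = 2 \\[4pt]
&(2) = \frac{0.13824 + 3 \cdot 0.1152 \cdot 0.6}{0.06912} = \frac{0.3456}{0.06912} = 5 \\[4pt]
&\frac{(1)}{(2)} = \frac{2}{5} = 0.4
\end{aligned}
```

This matches a direct count: each of the two parse trees of $a\,a\,a$ uses $S \to S\,S$ twice and contains five $S$ nodes, so the expected counts are exactly 2 and 5.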

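The probability of a unary rule is used

$$P(N^j \to w^k,\, N^j \text{ is used} \mid w_{1m}) = \frac{\sum_{h=1}^{m} \alpha_j(h,h)\, \beta_j(h,h)\, \delta(w_h, w^k)}{P(w_{1m})} \tag{3}$$

Here $\delta(w_h, w^k)$ is 1 if the word at position $h$ is $w^k$ and 0 otherwise.

Dividing (3) by (2):

$$P(N^j \to w^k \mid N^j, w_{1m}) = \frac{P(N^j \to w^k,\, N^j \mid w_{1m})}{P(N^j \mid w_{1m})} = \frac{(3)}{(2)} = \frac{\sum_{h=1}^{m} \alpha_j(h,h)\, \beta_j(h,h)\, \delta(w_h, w^k)}{\sum_{p=1}^{m} \sum_{q=p}^{m} \alpha_j(p,q)\, \beta_j(p,q)}$$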

Multiple training sentences

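For each training sentence $W_i$, define:

$$h_i(j) = P(N^j \mid w_{1m}) = \frac{\sum_{p=1}^{m} \sum_{q=p}^{m} \alpha_j(p,q)\, \beta_j(p,q)}{P(w_{1m})} \tag{2}$$

$$f_i(j,r,s) = P(N^j \to N^r N^s,\, N^j \mid w_{1m}) = \frac{\sum_{p=1}^{m} \sum_{q=p}^{m} \sum_{d=p}^{q-1} \alpha_j(p,q)\, P(N^j \to N^r N^s)\, \beta_r(p,d)\, \beta_s(d+1,q)}{P(w_{1m})} \tag{1}$$

The update sums both quantities over the sentences:

$$P(N^j \to N^r N^s) = \frac{\sum_i f_i(j,r,s)}{\sum_i h_i(j)}$$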

Inner loop of the Inside-outside algorithm

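Given an input sequence $w_{1m}$ and the current rule probabilities:

1. Calculate inside probability:
  • Base case:

$$\beta_j(k,k) = P(N^j \to w_k)$$

  • Recursive case:

$$\beta_j(p,q) = \sum_{r,s} \sum_{d=p}^{q-1} P(N^j \to N^r N^s)\, \beta_r(p,d)\, \beta_s(d+1,q)$$

2. Calculate outside probability:
  • Base case:

$$\alpha_j(1,m) = \begin{cases} 1 & \text{if } j = 1 \\ 0 & \text{otherwise} \end{cases}$$

  • Recursive case:

$$\alpha_j(p,q) = \sum_{f,g} \sum_{e=q+1}^{m} \alpha_f(p,e)\, P(N^f \to N^j N^g)\, \beta_g(q+1,e) + \sum_{f,g} \sum_{e=1}^{p-1} \alpha_f(e,q)\, P(N^f \to N^g N^j)\, \beta_g(e,p-1)$$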

Inside-outside algorithm (cont)

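3. Collect the counts:

$$P(N^j \to N^r N^s,\, N^j \mid w_{1m}) = \frac{\sum_{p=1}^{m} \sum_{q=p}^{m} \sum_{d=p}^{q-1} \alpha_j(p,q)\, P(N^j \to N^r N^s)\, \beta_r(p,d)\, \beta_s(d+1,q)}{P(w_{1m})}$$

$$P(N^j \to w^k,\, N^j \text{ is used} \mid w_{1m}) = \frac{\sum_{h=1}^{m} \alpha_j(h,h)\, \beta_j(h,h)\, \delta(w_h, w^k)}{P(w_{1m})}$$

4. Normalize and update the parameters:

$$P(N^j \to N^r N^s) = \frac{Cnt(N^j \to N^r N^s)}{\sum_{r} \sum_{s} Cnt(N^j \to N^r N^s)} = \frac{P(N^j \to N^r N^s,\, N^j \mid w_{1m})}{\sum_{r} \sum_{s} P(N^j \to N^r N^s,\, N^j \mid w_{1m})}$$

$$P(N^j \to w^k) = \frac{Cnt(N^j \to w^k)}{\sum_{k} Cnt(N^j \to w^k)} = \frac{P(N^j \to w^k,\, N^j \text{ is used} \mid w_{1m})}{\sum_{k} P(N^j \to w^k,\, N^j \text{ is used} \mid w_{1m})}$$

Summing over rules of one type only gives the right normalizer when $N^j$ has no rules of the other type. In general the denominator is the expected count of $N^j$, formula (2), as in the ratios (1)/(2) and (3)/(2).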
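Steps 1 to 4 can be put together in a short sketch of one iteration over several training sentences. The grammar representation, the function names and the toy data are assumptions for illustration. The normalizer used here is the total expected count of each left-hand side over all of its rules, binary and unary, which equals formula (2) because $\beta_j(p,q)$ is itself a sum over the rules of $N^j$.

```python
from collections import defaultdict


def inside(words, binary, unary):
    m, beta = len(words), defaultdict(float)
    for k, word in enumerate(words, start=1):
        for (j, w), prob in unary.items():
            if w == word:
                beta[(j, k, k)] += prob
    for width in range(2, m + 1):
        for p in range(1, m - width + 2):
            q = p + width - 1
            for (j, r, s), prob in binary.items():
                for d in range(p, q):
                    beta[(j, p, q)] += prob * beta[(r, p, d)] * beta[(s, d + 1, q)]
    return beta


def outside(words, binary, beta, start):
    m, alpha = len(words), defaultdict(float)
    alpha[(start, 1, m)] = 1.0
    for width in range(m, 1, -1):  # push from wide parents to their children
        for p in range(1, m - width + 2):
            q = p + width - 1
            for (f, r, s), prob in binary.items():
                for d in range(p, q):
                    alpha[(r, p, d)] += alpha[(f, p, q)] * prob * beta[(s, d + 1, q)]
                    alpha[(s, d + 1, q)] += alpha[(f, p, q)] * prob * beta[(r, p, d)]
    return alpha


def em_step(sentences, binary, unary, start="S"):
    """One inside-outside iteration; returns re-estimated (binary, unary)."""
    f = defaultdict(float)   # expected counts of binary rules, formula (1)
    g = defaultdict(float)   # expected counts of unary rules, formula (3)
    h = defaultdict(float)   # expected counts of non-terminals, formula (2)
    for words in sentences:
        m = len(words)
        beta = inside(words, binary, unary)
        alpha = outside(words, binary, beta, start)
        z = beta[(start, 1, m)]          # P(w_1m)
        if z == 0.0:
            continue                     # sentence not covered by the grammar
        for p in range(1, m + 1):
            for q in range(p + 1, m + 1):
                for (j, r, s), prob in binary.items():
                    for d in range(p, q):
                        c = alpha[(j, p, q)] * prob * beta[(r, p, d)] * beta[(s, d + 1, q)] / z
                        f[(j, r, s)] += c
                        h[j] += c
        for k, word in enumerate(words, start=1):
            for (j, w), prob in unary.items():
                if w == word:
                    c = alpha[(j, k, k)] * prob / z
                    g[(j, w)] += c
                    h[j] += c
    # Normalize; keep the old value for non-terminals that were never used.
    new_binary = {rule: (f[rule] / h[rule[0]] if h[rule[0]] > 0 else prob)
                  for rule, prob in binary.items()}
    new_unary = {rule: (g[rule] / h[rule[0]] if h[rule[0]] > 0 else prob)
                 for rule, prob in unary.items()}
    return new_binary, new_unary


# Hypothetical starting grammar and training data.
NEW_BINARY, NEW_UNARY = em_step([["a", "a"], ["a", "a", "a"]],
                                {("S", "S", "S"): 0.3}, {("S", "a"): 0.7})
```

With this one-non-terminal grammar every parse of a sentence of length $m$ uses the binary rule $m - 1$ times and the lexical rule $m$ times, so the two training sentences give expected counts of 3 and 5 and the update should move $P(S \to S\,S)$ to about 3/8 whatever the starting value.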

Relation to EM


• A PCFG is a PM (Product of Multinomials) model.

• The inside-outside algorithm is a special case of the EM algorithm for PM models.

• X (observed data): each data point is a sentence $w_{1m}$.

• Y (hidden data): the parse tree $Tr$.

• Θ (parameters): $P(N^j \to N^r N^s)$ and $P(N^j \to w^k)$.

Relation to EM (cont)

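$$\begin{aligned}
count(N^j \to N^r N^s) &= \sum_{Y} P(Y \mid X, \Theta) \cdot count(X, Y, N^j \to N^r N^s) \\
&= \sum_{Tr} P(Tr \mid w_{1m}, \Theta) \cdot count(w_{1m}, Tr, N^j \to N^r N^s) \\
&= \sum_{p=1}^{m} \sum_{q=p}^{m} P(N^j_{pq},\, N^j \to N^r N^s \mid w_{1m}, \Theta) \\
&= P(N^j \to N^r N^s,\, N^j \mid w_{1m}, \Theta)
\end{aligned}$$

$$\begin{aligned}
count(N^j \to w^k) &= \sum_{Y} P(Y \mid X, \Theta) \cdot count(X, Y, N^j \to w^k) \\
&= \sum_{Tr} P(Tr \mid w_{1m}, \Theta) \cdot count(w_{1m}, Tr, N^j \to w^k) \\
&= P(N^j \to w^k,\, N^j \text{ is used} \mid w_{1m}, \Theta)
\end{aligned}$$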

Summary

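• HMM (forward-backward): the forward probability $\alpha_i(t)$ and the backward probability $\beta_j(t+1)$ meet on the arc from $X_t$ to $X_{t+1}$ that emits $O_t$. Parameters:
  – $a_{ij} = P(X_{t+1} = j \mid X_t = i)$
  – $b_{ijk} = P(O_t = w_k \mid X_t = i, X_{t+1} = j)$

• PCFG (inside-outside): the outside probability $\alpha_j(p,q)$ and the inside probability $\beta_j(p,q)$ meet at the node $N^j$ under $N^1$, where $N^j$ rewrites as $N^r\, N^s$ with $N^r$ spanning $w_p \ldots w_d$ and $N^s$ spanning $w_{d+1} \ldots w_q$. Parameters:
  – $P(N^j \to N^r N^s)$
  – $P(N^j \to w^k)$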

Summary (cont)

• Topology is known:
  – (states, arcs, output symbols) in an HMM
  – (non-terminals, rules, terminals) in a PCFG

• Probabilities of arcs/rules are unknown.

• Estimate the probabilities using EM (introducing hidden data Y).

Additional slides

Relation between forward-backward and inside-outside algorithms

Converting HMM to PCFG

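• Given an HMM $= (S, \Sigma, \pi, A, B)$, create a PCFG $= (S_1, \Sigma_1, S_0, R, P)$ as follows:
  – $S_1 = \{Start\} \cup \{N^i, D_{ij}, D_0 \mid i, j \in [1, N]\}$
  – $\Sigma_1 = \Sigma \cup \{BOS, EOS\}$
  – $S_0 = Start$
  – $R = \{N^i \to D_{ij}\, N^j \mid i, j \in [1, N]\} \cup \{Start \to D_0\, N^i\} \cup \{D_{ij} \to w^k,\; D_0 \to BOS,\; N^i \to EOS\}$
  – $P$:

$$\begin{aligned}
P(N^i \to D_{ij}\, N^j) &= a_{ij} \\
P(Start \to D_0\, N^i) &= \pi_i \\
P(D_{ij} \to w^k) &= b_{ijk} \\
P(D_0 \to BOS) &= 1 \\
P(N^i \to EOS) &= 1
\end{aligned}$$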
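The construction above can be sketched as a small function. The input layout (`pi[i]`, `a[i][j]`, `b[i][j][w]` as nested dictionaries), the generated symbol names such as `N0` and `D0_1`, and the example HMM are all assumptions for illustration.

```python
def hmm_to_pcfg(states, symbols, pi, a, b):
    """Build the rule tables of the PCFG described above from an
    arc-emission HMM. Returns (binary, unary) dictionaries keyed by
    (lhs, left, right) and (lhs, word) respectively."""
    binary, unary = {}, {}
    unary[("D0", "BOS")] = 1.0                                  # D_0 -> BOS
    for i in states:
        binary[("Start", "D0", f"N{i}")] = pi[i]                # Start -> D_0 N^i
        unary[(f"N{i}", "EOS")] = 1.0                           # N^i -> EOS
        for j in states:
            binary[(f"N{i}", f"D{i}_{j}", f"N{j}")] = a[i][j]   # N^i -> D_ij N^j
            for w in symbols:
                unary[(f"D{i}_{j}", w)] = b[i][j][w]            # D_ij -> w^k
    return binary, unary


# Hypothetical two-state HMM over the symbols x and y.
STATES = [0, 1]
SYMBOLS = ["x", "y"]
PI = {0: 0.6, 1: 0.4}
A = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.4, 1: 0.6}}
B = {0: {0: {"x": 0.5, "y": 0.5}, 1: {"x": 0.9, "y": 0.1}},
     1: {0: {"x": 0.2, "y": 0.8}, 1: {"x": 0.6, "y": 0.4}}}
PCFG_BINARY, PCFG_UNARY = hmm_to_pcfg(STATES, SYMBOLS, PI, A, B)
```

With $N$ states and $K$ output symbols this produces $N + N^2$ binary rules and $1 + N + N^2 K$ lexical rules.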

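Path ↔ Parse tree

[Figure: the HMM path $X_1, X_2, \ldots, X_T, X_{T+1}$ emitting $o_1, o_2, \ldots, o_T$, drawn next to the equivalent right-branching parse tree: $Start \to D_0\, X_1$ with $D_0 \to BOS$; $X_1 \to D_{12}\, X_2$ with $D_{12} \to o_1$; …; $X_T \to D_{T,T+1}\, X_{T+1}$ with $D_{T,T+1} \to o_T$; and $X_{T+1} \to EOS$.]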

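Outside probability

• Outside prob for $N^j$: start from $\alpha_j(p,q)$; set $q = T$ to get $\alpha_j(p,T)$; rename $(j, i)$, $(p, t)$ to get $\alpha_i(t,T) = \alpha_i(t)$.

• Outside prob for $D_{ij}$: start from $\alpha_{i\_j}(p,q)$; set $q = p$ to get $\alpha_{i\_j}(p,p)$; rename $(p, t)$ to get $\alpha_{i\_j}(t,t) = \alpha_i(t)\, a_{ij}\, \beta_j(t+1)$.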

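Inside probability

• Inside prob for $N^j$: start from $\beta_j(p,q)$; set $q = T$ to get $\beta_j(p,T)$; rename $(j, i)$, $(p, t)$ to get $\beta_i(t,T) = \beta_i(t)$.

• Inside prob for $D_{ij}$: start from $\beta_{i\_j}(p,q)$; set $q = p$ to get $\beta_{i\_j}(p,p)$; rename $(p, t)$ to get $\beta_{i\_j}(t,t) = b_{ij o_t}$.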

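Estimating $P(N^i_{tT},\, N^i \to D_{ij}\, N^j \mid O_{1T})$

General PCFG formula:

$$P(N^j_{pq},\, N^j \to N^r N^s \mid w_{1m}) = \frac{\sum_{d=p}^{q-1} \alpha_j(p,q)\, P(N^j \to N^r N^s)\, \beta_r(p,d)\, \beta_s(d+1,q)}{P(w_{1m})}$$

Renaming: $(j, i)$, $(s, j)$, $(p, t)$, $(m, T)$

$$P(N^i_{tq},\, N^i \to N^r N^j \mid O_{1T}) = \frac{\sum_{d=t}^{q-1} \alpha_i(t,q)\, P(N^i \to N^r N^j)\, \beta_r(t,d)\, \beta_j(d+1,q)}{P(O_{1T})}$$

Setting $q = T$, $N^r = D_{ij}$, $d = t$:

$$P(N^i_{tT},\, N^i \to D_{ij}\, N^j \mid O_{1T}) = \frac{\alpha_i(t,T)\, P(N^i \to D_{ij}\, N^j)\, \beta_{i\_j}(t,t)\, \beta_j(t+1,T)}{P(O_{1T})} = \frac{\alpha_i(t)\, a_{ij}\, b_{ij o_t}\, \beta_j(t+1)}{P(O_{1T})}$$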

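Estimating $P(N^i \to D_{ij}\, N^j,\, N^i \mid O_{1T})$

General PCFG formula:

$$P(N^j \to N^r N^s,\, N^j \mid w_{1m}) = \sum_{p=1}^{m} \sum_{q=p}^{m} P(N^j_{pq},\, N^j \to N^r N^s \mid w_{1m})$$

Renaming: $(j, i)$, $(s, j)$, $(p, t)$, $(m, T)$

$$P(N^i \to N^r N^j,\, N^i \mid O_{1T}) = \sum_{t=1}^{T} \sum_{q=t}^{T} P(N^i_{tq},\, N^i \to N^r N^j \mid O_{1T})$$

Setting $q = T$, $N^r = D_{ij}$, $d = t$:

$$P(N^i \to D_{ij}\, N^j,\, N^i \mid O_{1T}) = \sum_{t=1}^{T} P(N^i_{tT},\, N^i \to D_{ij}\, N^j \mid O_{1T}) = \sum_{t=1}^{T} \xi_{ij}(t)$$

where $\xi_{ij}(t) = \alpha_i(t)\, a_{ij}\, b_{ij o_t}\, \beta_j(t+1) / P(O_{1T})$.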

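Estimating $P(N^i \mid O_{1T})$

General PCFG formula:

$$P(N^j \mid w_{1m}) = \frac{\sum_{p=1}^{m} \sum_{q=p}^{m} \alpha_j(p,q)\, \beta_j(p,q)}{P(w_{1m})}$$

Renaming: $(j, i)$, $(s, j)$, $(p, t)$, $(m, T)$

$$P(N^i \mid O_{1T}) = \frac{\sum_{t=1}^{T} \sum_{q=t}^{T} \alpha_i(t,q)\, \beta_i(t,q)}{P(O_{1T})}$$

Setting $q = T$, $N^r = D_{ij}$, $d = t$:

$$P(N^i \mid O_{1T}) = \frac{\sum_{t=1}^{T} \alpha_i(t)\, \beta_i(t)}{P(O_{1T})} = \sum_{t=1}^{T} \gamma_i(t)$$

where $\gamma_i(t) = \alpha_i(t)\, \beta_i(t) / P(O_{1T})$.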

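Calculating $P(N^i \to N^r N^j \mid O_{1T})$

General PCFG formula:

$$P(N^j \to N^r N^s \mid w_{1m}) = \frac{P(N^j \to N^r N^s,\, N^j \mid w_{1m})}{P(N^j \mid w_{1m})}$$

Renaming: $(j, i)$, $(s, j)$, $(w, o)$, $(m, T)$

$$P(N^i \to N^r N^j \mid O_{1T}) = \frac{P(N^i \to N^r N^j,\, N^i \mid O_{1T})}{P(N^i \mid O_{1T})} = \frac{\sum_{t=1}^{T} \xi_{ij}(t)}{\sum_{t=1}^{T} \gamma_i(t)} = a_{ij}$$

Here $\xi_{ij}(t) = \alpha_i(t)\, a_{ij}\, b_{ij o_t}\, \beta_j(t+1) / P(O_{1T})$ and $\gamma_i(t) = \alpha_i(t)\, \beta_i(t) / P(O_{1T})$, so the result is the forward-backward update for $a_{ij}$.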

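Calculating $P(D_{i\_j} \to w^k \mid O_{1T})$

General PCFG formula:

$$P(N^j \to w^k \mid w_{1m}) = \frac{\sum_{h=1}^{m} \alpha_j(h,h)\, \beta_j(h,h)\, \delta(w_h, w^k)}{\sum_{p=1}^{m} \sum_{q=p}^{m} \alpha_j(p,q)\, \beta_j(p,q)}$$

Renaming: $(j, i\_j)$, $(s, j)$, $(p, t)$, $(h, t)$, $(m, T)$, $(w, O)$, $(N, D)$

$$P(D_{i\_j} \to w^k \mid O_{1T}) = \frac{\sum_{t=1}^{T} \alpha_{i\_j}(t,t)\, \beta_{i\_j}(t,t)\, \delta(O_t, w^k)}{\sum_{t=1}^{T} \sum_{q=t}^{T} \alpha_{i\_j}(t,q)\, \beta_{i\_j}(t,q)}$$

Since $D_{i\_j}$ only spans single words, $q = t$, with

$$\alpha_{i\_j}(t,t) = \alpha_i(t)\, a_{ij}\, \beta_j(t+1), \qquad \beta_{i\_j}(t,t) = b_{ij o_t}$$

Therefore:

$$P(D_{i\_j} \to w^k \mid O_{1T}) = \frac{\sum_{t=1}^{T} \alpha_i(t)\, a_{ij}\, b_{ij o_t}\, \beta_j(t+1)\, \delta(O_t, w^k)}{\sum_{t=1}^{T} \alpha_i(t)\, a_{ij}\, b_{ij o_t}\, \beta_j(t+1)} = \frac{\sum_{t=1}^{T} \xi_{ij}(t)\, \delta(O_t, w^k)}{\sum_{t=1}^{T} \xi_{ij}(t)}$$

where $\xi_{ij}(t) = \alpha_i(t)\, a_{ij}\, b_{ij o_t}\, \beta_j(t+1) / P(O_{1T})$.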

Tkji OwDP