Course Review (Part 1)
LING 572
Fei Xia
1/19/06
Outline
• Recap
• Homework 1
• Project Part 1
Recap
Recap
• FSA and HMM
• DT, DL, TBL
A learning algorithm
• Modeling:
  – Representation
  – Decomposition
  – Parameters
  – Properties
• Training:
  – Simple counting, hill-climbing, greedy algorithm, …
  – Pruning and filtering
  – Smoothing issues
A learning algorithm (cont)
• Decoding:
  – Simply verify conditions: DT, DL, TBL
  – Viterbi: FSA and HMM
  – Pruning during the search
• Relation with other algorithms:
  – Ex: DNF, CNF, DT, DL, and TBL
  – Ex: WFA and HMM, PFA and HMM
NLP task
• Choose a ML method: e.g., DT, TBL
• Modeling:
  – Ex: TBL: What kinds of features?
  – Ex: HMM: What are the states? What are the output symbols?
• Training: e.g., DT
  – Select a particular algorithm: ID3, C4.5
  – Choose pruning/filtering/smoothing strategies, thresholds, quality measures, etc.
• Decoding:
  – Pruning strategies
Homework 1
Hw1
• Problem 3 & 4: State-emission and arc-emission HMMs.
• Problem 5: Viterbi algorithm
• Problem 2: HMM
• Problem 1: FSA
Problem 3: State-emission HMM → arc-emission HMM
(a) Construction: HMM2 keeps the states and parameters of HMM1,

    S2 = S1,   π2 = π1,   A2 = A1,

and each arc s_i → s_j emits with the distribution of its destination state:

    B2(o | s_i, s_j) = B1(o | s_j)

(b) Given a path X1, X2, ..., Xn+1 in HMM1, the path in HMM2 is X1, X2, ..., Xn+1, with the same joint probability:

    P(X, O) = π(X1) · Π_{t=1}^{n} a_{X_t X_{t+1}} · b_{X_t X_{t+1}}(o_t)
Problem 3 (cont)
(c) P(O | HMM1)
    = Σ_X π(X1) · Π_i P(X_{i+1} | X_i) · P(o_i | X_{i+1})
    = Σ_X π(X1) · Π_i P(X_{i+1} | X_i) · P(o_i | X_i, X_{i+1})
    = P(O | HMM2)
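The equivalence can be checked numerically. The sketch below (with made-up probabilities for a two-state, two-symbol HMM) builds the arc-emission model exactly as in the construction above and verifies by brute-force enumeration that both models assign the same probability to every short observation sequence:

```python
import itertools

# Toy state-emission HMM (hypothetical numbers): states 0,1; symbols 'a','b'.
pi = [0.6, 0.4]                                       # initial state probs
A = [[0.7, 0.3], [0.2, 0.8]]                          # A[i][j] = P(j | i)
B = [{'a': 0.5, 'b': 0.5}, {'a': 0.1, 'b': 0.9}]      # B[j][o] = P(o | state j)

# Construction: same states and transitions; the arc i->j emits with the
# destination state's distribution, B2[(i,j)][o] = B[j][o].
B2 = {(i, j): B[j] for i in range(2) for j in range(2)}

def p_state_emission(obs):
    # State sequence is X1..X_{n+1}; o_t is emitted by the destination X_{t+1}.
    total = 0.0
    for path in itertools.product(range(2), repeat=len(obs) + 1):
        p = pi[path[0]]
        for t, o in enumerate(obs):
            p *= A[path[t]][path[t + 1]] * B[path[t + 1]][o]
        total += p
    return total

def p_arc_emission(obs):
    total = 0.0
    for path in itertools.product(range(2), repeat=len(obs) + 1):
        p = pi[path[0]]
        for t, o in enumerate(obs):
            p *= A[path[t]][path[t + 1]] * B2[(path[t], path[t + 1])][o]
        total += p
    return total

for obs in itertools.product('ab', repeat=3):
    assert abs(p_state_emission(obs) - p_arc_emission(obs)) < 1e-12
```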
Problem 4: Arc-emission HMM → state-emission HMM
(a) Construction: each state of HMM2 is a pair of HMM1 states,

    S2 = { s_i_s_j | s_i, s_j ∈ S1 }
    π2(s_i_s_j) = π1(s_i) if i = j, 0 otherwise
    A2(s_j_s_k | s_i_s_j) = A1(s_k | s_j); all other transitions get probability 0
    B2(w | s_i_s_j) = B1(w | s_i, s_j)
Problem 4 (cont)
(b) Given a path X1, X2, …, Xn+1 in HMM1, the path in HMM2 is X1_X1, X1_X2, …, Xn_Xn+1.
(c) P(O | HMM1)
    = Σ_X π(X1) · Π_i P(X_{i+1} | X_i) · P(o_i | X_i, X_{i+1})
    = Σ_X π2(X1_X1) · Π_i P(X_i_X_{i+1} | X_{i-1}_X_i) · P(o_i | X_i_X_{i+1})
    = P(O | HMM2)
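The pair-state construction can be sanity-checked the same way. Below, a toy two-state arc-emission HMM (all numbers invented for illustration) is converted into a state-emission HMM over state pairs, and brute-force enumeration confirms that both models agree on P(O):

```python
import itertools

# Toy arc-emission HMM (hypothetical numbers): states 0,1; symbols 'a','b'.
pi1 = [0.5, 0.5]
A1 = [[0.6, 0.4], [0.3, 0.7]]                          # A1[i][j] = P(s_j | s_i)
B1 = {(0, 0): {'a': 0.2, 'b': 0.8}, (0, 1): {'a': 0.9, 'b': 0.1},
      (1, 0): {'a': 0.4, 'b': 0.6}, (1, 1): {'a': 0.7, 'b': 0.3}}

# HMM2: the tuple (i, j) plays the role of the pair state s_i_s_j.
states2 = [(i, j) for i in range(2) for j in range(2)]
pi2 = {(i, j): (pi1[i] if i == j else 0.0) for i, j in states2}
A2 = {(s, t): (A1[s[1]][t[1]] if s[1] == t[0] else 0.0)
      for s in states2 for t in states2}
B2 = {(i, j): B1[(i, j)] for i, j in states2}          # (i, j) emits like arc i->j

def p_arc(obs):
    total = 0.0
    for path in itertools.product(range(2), repeat=len(obs) + 1):
        p = pi1[path[0]]
        for t, o in enumerate(obs):
            p *= A1[path[t]][path[t + 1]] * B1[(path[t], path[t + 1])][o]
        total += p
    return total

def p_state(obs):
    total = 0.0
    for path in itertools.product(states2, repeat=len(obs) + 1):
        p = pi2[path[0]]
        for t, o in enumerate(obs):
            p *= A2[(path[t], path[t + 1])] * B2[path[t + 1]][o]
        total += p
    return total

for obs in itertools.product('ab', repeat=2):
    assert abs(p_arc(obs) - p_state(obs)) < 1e-12
```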
Problem 5: Viterbi algorithm with ε-emission
Definition:
    δ_j(t) = max over X_{1,t-1} of P(O_{1,t-1}, X_{1,t-1}, X_t = s_j)

Initialization:
    δ_j(1) = max( π_j, max_i π_i · Cost(i, j) )

Induction:
    δ_j(t+1) = max( max_i δ_i(t) · a_ij · b_ij(o_t),
                    max_m [ max_i δ_i(t) · a_im · b_im(o_t) ] · Cost(m, j) )

The first term covers a direct emitting transition into s_j; the second first takes an emitting transition into some s_m and then follows the best ε-producing path from s_m to s_j.
Problem 5 (cont)
Cost(i, j) is the max probability of a path from i to j which produces nothing. To calculate Cost(i, j), let

    c_ij^(1) = a_ij · b_ij(ε)   and   C^(n) = ( c_ij^(n) )
    c_ij^(n) = max( c_ij^(n-1), max_k c_ik^(n-1) · c_kj^(1) )
    Cost(i, j) = c_ij^(N-1)

where N is the number of states in the HMM.
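The recurrence is a max-product matrix iteration and can be implemented directly. The sketch below uses invented values for a_ij and b_ij(ε) and checks the result against brute-force enumeration of ε-producing paths (a path longer than N-1 transitions cannot win, since every extra transition multiplies in a factor ≤ 1):

```python
from itertools import product

# Toy arc-emission HMM with epsilon outputs (all numbers are assumptions).
N = 3
a     = [[0.5, 0.3, 0.2], [0.1, 0.6, 0.3], [0.4, 0.4, 0.2]]
b_eps = [[0.1, 0.0, 0.2], [0.0, 0.1, 0.0], [0.3, 0.0, 0.1]]   # b_ij(eps)

# c^(1)_ij = a_ij * b_ij(eps): best epsilon path of exactly one transition.
c1 = [[a[i][j] * b_eps[i][j] for j in range(N)] for i in range(N)]

# c^(n)_ij = max(c^(n-1)_ij, max_k c^(n-1)_ik * c^(1)_kj); Cost = c^(N-1).
c = [row[:] for row in c1]
for _ in range(N - 2):
    c = [[max(c[i][j], max(c[i][k] * c1[k][j] for k in range(N)))
          for j in range(N)] for i in range(N)]

def brute_cost(i, j):
    # Best epsilon-producing path with 1..N-1 transitions.
    best = 0.0
    for L in range(1, N):
        for mid in product(range(N), repeat=L - 1):
            path = (i,) + mid + (j,)
            prob = 1.0
            for u, v in zip(path, path[1:]):
                prob *= c1[u][v]
            best = max(best, prob)
    return best

for i in range(N):
    for j in range(N):
        assert abs(c[i][j] - brute_cost(i, j)) < 1e-12
```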
Problems 1 & 2: Important tricks
Constants can be moved outside the sum signs:

    Σ_{x_1} … Σ_{x_n} [ c_1 · f_1(x_1, …, x_n) + c_2 · f_2(x_1, …, x_n) ]
    = c_1 · Σ_{x_1} … Σ_{x_n} f_1(x_1, …, x_n) + c_2 · Σ_{x_1} … Σ_{x_n} f_2(x_1, …, x_n)
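A quick numeric check of this identity, with arbitrary stand-in functions and constants:

```python
import itertools

# Toy values; f1 and f2 are arbitrary stand-in functions of (x1, ..., xn).
vals = [1, 2, 3]
c1, c2 = 0.5, 2.0
f1 = lambda xs: sum(xs)
f2 = lambda xs: max(xs)

lhs = sum(c1 * f1(xs) + c2 * f2(xs)
          for xs in itertools.product(vals, repeat=3))
rhs = (c1 * sum(f1(xs) for xs in itertools.product(vals, repeat=3))
       + c2 * sum(f2(xs) for xs in itertools.product(vals, repeat=3)))
assert abs(lhs - rhs) < 1e-9
```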
Tricks (cont)
The order of sums can be changed:
    Σ_{x_1} Σ_{x_2} … Σ_{x_n} f(x_1, …, x_n) = Σ_{x_2} Σ_{x_1} … Σ_{x_n} f(x_1, …, x_n)
Tricks (cont)
• The order of sum and product:
    Σ_{x_1} Σ_{x_2} … Σ_{x_n} Π_{i=1}^{n} f_i(x_i)
    = Σ_{x_1} … Σ_{x_{n-1}} [ Π_{i=1}^{n-1} f_i(x_i) ] · Σ_{x_n} f_n(x_n)
    = …
    = ( Σ_{x_1} f_1(x_1) ) · … · ( Σ_{x_n} f_n(x_n) )
    = Π_{i=1}^{n} Σ_{x_i} f_i(x_i)
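This is the identity that turns an exponential sum over sequences into a product of small per-position sums. A numeric check with toy per-position functions:

```python
import itertools

# Toy per-position functions f_1, f_2, f_3 (assumptions for illustration).
fs = [lambda x: x + 1, lambda x: 2 * x, lambda x: x * x]
vals = [0, 1, 2]

# Left side: sum over all (x1, x2, x3) of the product of per-position terms.
lhs = sum(fs[0](x1) * fs[1](x2) * fs[2](x3)
          for x1, x2, x3 in itertools.product(vals, repeat=3))

# Right side: product of the per-position sums.
rhs = 1.0
for f in fs:
    rhs *= sum(f(x) for x in vals)

assert abs(lhs - rhs) < 1e-9
```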
Problem 2: HMM
• Prove by induction on the length of O:
  – When the length is 0:
        Σ_{|O|=0} P(O) = Σ_i π(s_i) = 1
  – When the length is n-1, we assume that
        Σ_{|O|=n-1} P(O) = 1
Problem 2 (cont)
For length n:

    Σ_{|O|=n} P(O_{1,n})
    = Σ_{O_{1,n-1}} Σ_{o_n} Σ_i Σ_j P(O_{1,n-1}, X_n = s_i) · a_ij · b_ij(o_n)
    = Σ_{O_{1,n-1}} Σ_i Σ_j P(O_{1,n-1}, X_n = s_i) · a_ij · Σ_{o_n} b_ij(o_n)
    = Σ_{O_{1,n-1}} Σ_i P(O_{1,n-1}, X_n = s_i) · Σ_j a_ij
    = Σ_{O_{1,n-1}} Σ_i P(O_{1,n-1}, X_n = s_i)
    = Σ_{O_{1,n-1}} P(O_{1,n-1})
    = 1
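The whole induction can be verified by brute force on a small model. The toy arc-emission HMM below uses invented numbers, chosen so that every transition row and every arc's output distribution sums to 1; the total probability over all observation sequences of each length then comes out to 1:

```python
import itertools

# Toy arc-emission HMM (hypothetical numbers): rows of A and each arc's
# output distribution sum to 1.
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.2, 0.8]]
B = {(0, 0): {'a': 0.5, 'b': 0.5}, (0, 1): {'a': 0.1, 'b': 0.9},
     (1, 0): {'a': 0.8, 'b': 0.2}, (1, 1): {'a': 0.3, 'b': 0.7}}

def p_obs(obs):
    # P(O) summed over all state sequences X1..X_{n+1}.
    total = 0.0
    for path in itertools.product(range(2), repeat=len(obs) + 1):
        p = pi[path[0]]
        for t, o in enumerate(obs):
            p *= A[path[t]][path[t + 1]] * B[(path[t], path[t + 1])][o]
        total += p
    return total

for n in range(4):
    # Sum P(O) over every observation sequence of length n.
    s = sum(p_obs(obs) for obs in itertools.product('ab', repeat=n))
    assert abs(s - 1.0) < 1e-9
```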
Problem 1: FSA
    Σ_{w_{1,n}} P(w_{1,n})
    = Σ_{w_{1,n}} Σ_{q_{1,n+1}} I(q_1) · [ Π_{i=1}^{n} p(q_i, w_i, q_{i+1}) ] · F(q_{n+1})
    = Σ_{q_{1,n+1}} I(q_1) · F(q_{n+1}) · Σ_{w_{1,n}} Π_{i=1}^{n} p(q_i, w_i, q_{i+1})
    = Σ_{q_{1,n+1}} I(q_1) · F(q_{n+1}) · Π_{i=1}^{n} Σ_{w_i} p(q_i, w_i, q_{i+1})
    = Σ_{q_{1,n+1}} I(q_1) · F(q_{n+1}) · Π_{i=1}^{n} t(q_i, q_{i+1})

where t(q_i, q_{i+1}) = Σ_w p(q_i, w, q_{i+1}).
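The step that moves the sums over words inside the product is the same sum/product trick from the earlier slide. The sketch below, on a toy PFA with invented transition probabilities, checks that summing P(w) over all strings of a fixed length equals the path sum computed with t(q, q') = Σ_w p(q, w, q'):

```python
import itertools

# Toy PFA (all numbers are assumptions): I = initial weights, F = final
# weights, p[(q, w, q')] = transition probability; missing entries are 0.
Q = 2
sigma = 'ab'
I = [1.0, 0.0]
F = [0.2, 0.5]
p = {(0, 'a', 0): 0.2, (0, 'a', 1): 0.3, (0, 'b', 1): 0.3,
     (1, 'a', 0): 0.1, (1, 'b', 1): 0.4}

def pr(q, w, q2):
    return p.get((q, w, q2), 0.0)

def P(word):
    # P(w) = sum over state sequences of I(q1) * prod p(qi, wi, qi+1) * F(qn+1)
    total = 0.0
    for path in itertools.product(range(Q), repeat=len(word) + 1):
        x = I[path[0]] * F[path[-1]]
        for i, w in enumerate(word):
            x *= pr(path[i], w, path[i + 1])
        total += x
    return total

def t(q, q2):
    # t(q, q') = sum over symbols of p(q, w, q')
    return sum(pr(q, w, q2) for w in sigma)

n = 3
lhs = sum(P(word) for word in itertools.product(sigma, repeat=n))
rhs = 0.0
for path in itertools.product(range(Q), repeat=n + 1):
    x = I[path[0]] * F[path[-1]]
    for i in range(n):
        x *= t(path[i], path[i + 1])
    rhs += x
assert abs(lhs - rhs) < 1e-9
```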
Problem 1 (cont)
[Figure: trellis with initial state q0, columns of states q1, …, qN at each step, and final state f]
Project Part 1
Carmel: a WFA package
[Figure: Carmel takes a WFA plus input/output symbols and returns the best path]
Bigram tagging
• FST1 (tag bigram model):
  – States: tags; arc t_i → t_j with probability P(t_j | t_i)
  – Initial states: {BOS}
  – Final states: {EOS}
• FST2 (tag-to-word model): a single state q with arcs t/w : P(w | t)
• WFA = FST1 ∘ FST2
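The composed WFA behaves like a bigram HMM tagger, so its best path can be sketched as plain Viterbi. Everything below (tags, words, and probabilities) is a toy assumption for illustration, not course data or Carmel syntax:

```python
# Minimal bigram Viterbi tagger sketch, equivalent in effect to composing
# FST1 (tag bigram model) with FST2 (tag-to-word model).
trans = {('BOS', 'D'): 0.8, ('BOS', 'N'): 0.2,          # P(t_j | t_i)
         ('D', 'N'): 0.9, ('D', 'D'): 0.1,
         ('N', 'EOS'): 0.7, ('N', 'N'): 0.3}
emit = {('D', 'the'): 0.6, ('N', 'dog'): 0.3, ('N', 'the'): 0.01}  # P(w | t)

def viterbi(words, tags=('D', 'N')):
    # best[t]: probability of the best tag sequence ending in tag t
    best = {t: trans.get(('BOS', t), 0.0) * emit.get((t, words[0]), 0.0)
            for t in tags}
    backptrs = []
    for w in words[1:]:
        new, bp = {}, {}
        for t in tags:
            prev = max(tags, key=lambda s: best[s] * trans.get((s, t), 0.0))
            new[t] = best[prev] * trans.get((prev, t), 0.0) * emit.get((t, w), 0.0)
            bp[t] = prev
        best = new
        backptrs.append(bp)
    # Pick the ending tag that best transitions into EOS, then follow
    # the backpointers to recover the tag sequence.
    last = max(tags, key=lambda t: best[t] * trans.get((t, 'EOS'), 0.0))
    seq = [last]
    for bp in reversed(backptrs):
        seq.append(bp[seq[-1]])
    return list(reversed(seq))
```

With the toy numbers above, `viterbi(['the', 'dog'])` prefers the D N reading because P(N | D) and P(dog | N) dominate the alternatives.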
Trigram tagging
• FST1 (tag trigram model):
  – States: tag pairs; arc t0t1 → t1t2 with probability P(t2 | t1, t0)
  – Initial state: {BOS-BOS}
  – Final state: {EOS-EOS}
• FST2 (tag-to-word model): a single state q with arcs t/w : P(w | t)
• WFA = FST1 ∘ FST2
Minor details
• BOS and EOS:– No need for special treatment for BOS– EOS:
• Add two “EOS”s at the end of a sentence, or• Replace input symbol “EOS” with ε (a.k.a. *e*).
Results
# of training sentences    Tagging accuracy
1K                         85.67%
5K                         92.11%
10K                        93.43%
40K                        95.35%