A Generalization of Forward-backward Algorithm
Ai Azuma, Yuji Matsumoto
Nara Institute of Science and Technology
Forward-backward algorithm
• Allows efficient calculation of sums (e.g. expectation, ...) over all paths in a trellis.
• Plays an important role in sequence modeling
  • HMMs (Hidden Markov Models)
  • CRFs (Conditional Random Fields) [Lafferty et al., 2001]
  • ...
A sequential labeling example: part-of-speech tagging
[Figure: trellis for “Time flies like an arrow”. Between SOURCE and SINK, each word (Time, flies, like, an, arrow) has one node per candidate tag: noun, verb, prep., indef. art.]
In CRFs and HMMs, we need to compute the "sum" of the probabilities (or scores) of all paths.
The forward-backward algorithm efficiently computes sums over all paths in the trellis with dynamic programming.
Enumerating all paths in the trellis is intractable because the number of paths is enormous.
The forward-backward algorithm recursively computes the sum from source to sink (forward) or from sink to source (backward), keeping intermediate results on each node and arc.
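As a minimal sketch of this idea (not the authors' code), the following Python compares brute-force path enumeration with a forward pass on a tiny trellis. The list-based layout `node_w[t][a]`, `arc_w[t][a][b]` and the example weights are assumptions made only for illustration.

```python
from itertools import product

# Hypothetical toy trellis: 3 positions, 2 labels per position.
# node_w[t][a]   = weight of label a at position t
# arc_w[t][a][b] = weight of the arc from label a at t to label b at t + 1
node_w = [[1.0, 2.0], [0.5, 1.5], [2.0, 1.0]]
arc_w = [[[1.0, 0.5], [2.0, 1.0]], [[0.5, 1.0], [1.0, 2.0]]]


def brute_force_sum(node_w, arc_w):
    """Sum of path weights by enumerating all L**T paths (exponential cost)."""
    T, L = len(node_w), len(node_w[0])
    total = 0.0
    for path in product(range(L), repeat=T):
        w = node_w[0][path[0]]
        for t in range(1, T):
            w *= arc_w[t - 1][path[t - 1]][path[t]] * node_w[t][path[t]]
        total += w
    return total


def forward_sum(node_w, arc_w):
    """Same sum by dynamic programming: one pass, linear in nodes and arcs."""
    T, L = len(node_w), len(node_w[0])
    # alpha[a] = summed weight of all partial paths ending in label a
    alpha = list(node_w[0])
    for t in range(1, T):
        alpha = [
            node_w[t][b] * sum(alpha[a] * arc_w[t - 1][a][b] for a in range(L))
            for b in range(L)
        ]
    return sum(alpha)
```

Both functions return the same value on the toy trellis; only the forward pass stays feasible as the sentence grows.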
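Forward-backward algorithm is applicable to:
• Normalization constant of CRFs (0th-order moment):
$Z(x) = \sum_{y \in Y} \prod_{c \in C_y} \exp(\lambda \cdot F(c, x))$
• E-step for HMMs (1st-order moment):
$E_P[t] = \sum_{y \in Y} \Big[ \prod_{c \in C_y} \psi(c, x) \Big] \Big[ \sum_{c \in C_y} \delta(c, t) \Big]$
• Feature expectation on CRFs (1st-order moment):
$E_P[f_k] = \frac{1}{Z(x)} \sum_{y \in Y} \Big[ \prod_{c \in C_y} \exp(\lambda \cdot F(c, x)) \Big] \Big[ \sum_{c \in C_y} f_k(c, x) \Big]$
t = type of node/node pair
f_k = k-th feature
C_y = set of nodes and arcs (cliques) in path y
Y = set of paths
(here ψ(c, x) denotes the probability factor of clique c, and δ(c, t) indicates that clique c is of type t)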
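Type of sums computable with forward-backward algorithm:
$\sum_{y \in Y} \prod_{c \in C_y} \psi(c)$
$\sum_{y \in Y} \Big[ \prod_{c \in C_y} \psi(c) \Big] \Big[ \sum_{c \in C_y} f(c) \Big]$
C_y = set of nodes and arcs (cliques) in path y
Y = set of paths
ψ(c) = weight of clique c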
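The first-order sum can be sketched with the classic forward and backward passes. This is an illustrative Python sketch, not the authors' implementation: the list layout and the names `node_w`, `arc_w`, `feat` are assumptions, and features are placed on nodes only to keep it short.

```python
from itertools import product


def path_weight(path, node_w, arc_w):
    """Product of node and arc weights along one path."""
    w = node_w[0][path[0]]
    for t in range(1, len(path)):
        w *= arc_w[t - 1][path[t - 1]][path[t]] * node_w[t][path[t]]
    return w


def first_moment_brute(node_w, arc_w, feat):
    """Reference value: sum over paths of weight(path) * sum of node features."""
    T, L = len(node_w), len(node_w[0])
    return sum(
        path_weight(p, node_w, arc_w) * sum(feat[t][p[t]] for t in range(T))
        for p in product(range(L), repeat=T)
    )


def first_moment_fb(node_w, arc_w, feat):
    """Same value from forward (alpha) and backward (beta) variables."""
    T, L = len(node_w), len(node_w[0])
    alpha = [[0.0] * L for _ in range(T)]
    beta = [[0.0] * L for _ in range(T)]
    alpha[0] = list(node_w[0])
    for t in range(1, T):
        for b in range(L):
            alpha[t][b] = node_w[t][b] * sum(
                alpha[t - 1][a] * arc_w[t - 1][a][b] for a in range(L)
            )
    beta[T - 1] = [1.0] * L
    for t in range(T - 2, -1, -1):
        for a in range(L):
            beta[t][a] = sum(
                arc_w[t][a][b] * node_w[t + 1][b] * beta[t + 1][b] for b in range(L)
            )
    # alpha[t][a] * beta[t][a] = total weight of all paths through node (t, a)
    return sum(
        feat[t][a] * alpha[t][a] * beta[t][a] for t in range(T) for a in range(L)
    )


# Hypothetical toy data
node_w = [[1.0, 2.0], [0.5, 1.5], [2.0, 1.0]]
arc_w = [[[1.0, 0.5], [2.0, 1.0]], [[0.5, 1.0], [1.0, 2.0]]]
feat = [[1.0, 0.0], [0.0, 2.0], [3.0, 1.0]]
```

Dividing the result by the 0th-order sum would give an expectation of the feature under the normalized path distribution.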
But sometimes we need higher-order multivariate moments...
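$\sum_{y \in Y} \Big[ \prod_{c \in C_y} \psi(c) \Big] \Big( \sum_{c \in C_y} f_1(c) \Big)^{n_1} \cdots \Big( \sum_{c \in C_y} f_K(c) \Big)^{n_K}$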
To name a few examples:
• Correlation between features
• Objectives more complex than log-likelihood
• Parameter differentiations of these
• ...
Our goal: to generalize the forward-backward algorithm to higher-order multivariate moments!
Can we derive dynamic programming for this formula?
$\sum_{y \in Y(x)} \Big[ \prod_{c \in C_y} \psi(c) \Big] \Big( \sum_{c \in C_y} f_1(c) \Big)^{n_1} \cdots \Big( \sum_{c \in C_y} f_K(c) \Big)^{n_K}$
Answer: record multiple forward/backward variables for each clique, and combine all the previously calculated values by the binomial theorem.
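Forward variables recorded at a node u between SOURCE and SINK:
$\alpha_0(u) = \sum_{y \in Y_{\mathrm{src} \to u}} \prod_{c \in C_y} \psi(c)$
$\alpha_1(u) = \sum_{y \in Y_{\mathrm{src} \to u}} \Big[ \prod_{c \in C_y} \psi(c) \Big] \Big[ \sum_{c \in C_y} f(c) \Big]$
・・・・・
$\alpha_n(u) = \sum_{y \in Y_{\mathrm{src} \to u}} \Big[ \prod_{c \in C_y} \psi(c) \Big] \Big( \sum_{c \in C_y} f(c) \Big)^{n}$
$Y_{\mathrm{src} \to u}$ = set of paths from SOURCE to u
Ordinary forward-backward records only $\alpha_0(u)$.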
Recursion at a node v, where prev(v) = direct ancestors of v:
$\alpha_0(v) = \psi(v) \sum_{x \in \mathrm{prev}(v)} \alpha_0(x)$
$\alpha_1(v) = \psi(v) \Big[ \sum_{x \in \mathrm{prev}(v)} \alpha_1(x) + f(v) \sum_{x \in \mathrm{prev}(v)} \alpha_0(x) \Big]$
・・・・・
$\alpha_i(v) = \psi(v) \sum_{j=0}^{i} \binom{i}{j} f(v)^{i-j} \sum_{x \in \mathrm{prev}(v)} \alpha_j(x), \quad i = 0, \ldots, n$
These are derived from the binomial theorem.
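A forward recursion of this kind can be sketched as follows for a single feature. This is a hedged illustration rather than the authors' implementation: the trellis layout, the names `node_w`, `arc_w`, `feat`, and the choice to give arcs a weight but no feature are all assumptions. Each node keeps n + 1 values, and extending a path by one node expands (S + f)^i with the binomial theorem.

```python
from itertools import product
from math import comb


def extend(prev, w, f, n):
    """Append a clique with weight w and feature f to sums prev[0..n].

    prev[j] holds sum(weight * S**j) over incoming partial paths, S being the
    accumulated feature sum. (S + f)**i is expanded by the binomial theorem.
    """
    return [
        w * sum(comb(i, j) * f ** (i - j) * prev[j] for j in range(i + 1))
        for i in range(n + 1)
    ]


def moments_forward(node_w, arc_w, feat, n):
    """Return [m_0, ..., m_n], m_i = sum over paths of weight * (feature sum)**i."""
    T, L = len(node_w), len(node_w[0])
    source = [1.0] + [0.0] * n  # empty path: weight 1, feature sum 0
    alpha = [extend(source, node_w[0][a], feat[0][a], n) for a in range(L)]
    for t in range(1, T):
        new_alpha = []
        for b in range(L):
            # Arcs carry a weight but no feature here, so they only rescale.
            prev = [
                sum(alpha[a][j] * arc_w[t - 1][a][b] for a in range(L))
                for j in range(n + 1)
            ]
            new_alpha.append(extend(prev, node_w[t][b], feat[t][b], n))
        alpha = new_alpha
    return [sum(alpha[a][i] for a in range(L)) for i in range(n + 1)]


def moments_brute(node_w, arc_w, feat, n):
    """Reference values by enumerating every path."""
    T, L = len(node_w), len(node_w[0])
    out = [0.0] * (n + 1)
    for p in product(range(L), repeat=T):
        w = node_w[0][p[0]]
        for t in range(1, T):
            w *= arc_w[t - 1][p[t - 1]][p[t]] * node_w[t][p[t]]
        s = sum(feat[t][p[t]] for t in range(T))
        for i in range(n + 1):
            out[i] += w * s ** i
    return out


# Hypothetical toy data
node_w = [[1.0, 2.0], [0.5, 1.5], [2.0, 1.0]]
arc_w = [[[1.0, 0.5], [2.0, 1.0]], [[0.5, 1.0], [1.0, 2.0]]]
feat = [[1.0, 0.0], [0.0, 2.0], [3.0, 1.0]]
```

With n = 0 the recursion reduces to the ordinary forward pass, which matches the remark that ordinary forward-backward records only the 0th variable.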
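At SINK, where prev(SINK) = direct ancestors of SINK:
$\alpha_0(\mathrm{SINK}) = \psi(\mathrm{SINK}) \sum_{x \in \mathrm{prev}(\mathrm{SINK})} \alpha_0(x)$
・・・・・
$\alpha_i(\mathrm{SINK}) = \psi(\mathrm{SINK}) \sum_{j=0}^{i} \binom{i}{j} f(\mathrm{SINK})^{i-j} \sum_{x \in \mathrm{prev}(\mathrm{SINK})} \alpha_j(x), \quad i = 0, \ldots, n$
The variables $\alpha_i(\mathrm{SINK})$ are the desired values.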
Summary of Our Ideas
[Figure: adjacent cliques u and v on a path from SOURCE, holding the variables $\alpha_0(u), \alpha_1(u), \ldots, \alpha_n(u)$ and $\alpha_0(v), \alpha_1(v), \ldots, \alpha_n(v)$.]
• Multiple variables for each clique
• Dependency between variables in a step, which is derived from the binomial theorem
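For multivariate cases, forward/backward variables have multiple indices. Each variable at a node u corresponds to one combination of exponents:
$\alpha_{0,\ldots,0}(u)$ corresponds to $\sum_{y \in Y(x)} \Big[ \prod_{c \in C_y} \psi(c) \Big] \Big( \sum_{c \in C_y} f_1(c) \Big)^{0} \cdots \Big( \sum_{c \in C_y} f_K(c) \Big)^{0}$
$\alpha_{0,\ldots,1}(u)$ corresponds to $\sum_{y \in Y(x)} \Big[ \prod_{c \in C_y} \psi(c) \Big] \Big( \sum_{c \in C_y} f_1(c) \Big)^{0} \cdots \Big( \sum_{c \in C_y} f_K(c) \Big)^{1}$
・・・・・
$\alpha_{n_1,\ldots,n_K}(u)$ corresponds to $\sum_{y \in Y(x)} \Big[ \prod_{c \in C_y} \psi(c) \Big] \Big( \sum_{c \in C_y} f_1(c) \Big)^{n_1} \cdots \Big( \sum_{c \in C_y} f_K(c) \Big)^{n_K}$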
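One way to picture the multi-index variables is the sketch below, which keeps one value per index tuple (i_1, ..., i_K) at each node and applies the binomial expansion separately in every coordinate. It is an assumption-laden illustration, not the authors' code: the dictionary representation, the names `feats` and `orders`, and the node-only features are choices made here.

```python
from itertools import product
from math import comb, prod


def multivariate_moments(node_w, arc_w, feats, orders):
    """Return {(i_1..i_K): sum over paths of weight * prod_k (sum of f_k)**i_k}.

    feats[k][t][a] is the k-th feature at node (t, a); orders = (n_1, ..., n_K).
    """
    T, L = len(node_w), len(node_w[0])
    idx = list(product(*(range(n + 1) for n in orders)))

    def extend(prev, w, fvals):
        out = {}
        for i in idx:
            total = 0.0
            # All j with j_k <= i_k: one binomial expansion per coordinate.
            for j in product(*(range(ik + 1) for ik in i)):
                coef = prod(
                    comb(ik, jk) * fk ** (ik - jk)
                    for ik, jk, fk in zip(i, j, fvals)
                )
                total += coef * prev[j]
            out[i] = w * total
        return out

    source = {i: (0.0 if any(i) else 1.0) for i in idx}
    alpha = [
        extend(source, node_w[0][a], [f[0][a] for f in feats]) for a in range(L)
    ]
    for t in range(1, T):
        new_alpha = []
        for b in range(L):
            prev = {
                i: sum(alpha[a][i] * arc_w[t - 1][a][b] for a in range(L))
                for i in idx
            }
            new_alpha.append(extend(prev, node_w[t][b], [f[t][b] for f in feats]))
        alpha = new_alpha
    return {i: sum(alpha[a][i] for a in range(L)) for i in idx}


def multivariate_brute(node_w, arc_w, feats, orders):
    """Reference values by enumerating every path."""
    T, L = len(node_w), len(node_w[0])
    idx = list(product(*(range(n + 1) for n in orders)))
    out = {i: 0.0 for i in idx}
    for p in product(range(L), repeat=T):
        w = node_w[0][p[0]]
        for t in range(1, T):
            w *= arc_w[t - 1][p[t - 1]][p[t]] * node_w[t][p[t]]
        sums = [sum(f[t][p[t]] for t in range(T)) for f in feats]
        for i in idx:
            out[i] += w * prod(s ** ik for s, ik in zip(sums, i))
    return out


# Hypothetical toy data with K = 2 features
node_w = [[1.0, 2.0], [0.5, 1.5], [2.0, 1.0]]
arc_w = [[[1.0, 0.5], [2.0, 1.0]], [[0.5, 1.0], [1.0, 2.0]]]
feats = [
    [[1.0, 0.0], [0.0, 2.0], [3.0, 1.0]],
    [[0.5, 1.0], [1.0, 0.0], [0.0, 2.0]],
]
```

The entry for index (1, 1) is the unnormalized cross moment of the two features, the kind of quantity needed for correlations between features.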
To calculate the following form,
$\sum_{y \in Y(x)} \Big[ \prod_{c \in C_y} \psi(c) \Big] \Big( \sum_{c \in C_y} f_1(c) \Big)^{n_1} \cdots \Big( \sum_{c \in C_y} f_K(c) \Big)^{n_K}$
the computational cost of the generalized forward-backward is proportional to
$(|V| + |E|)\,(n_1 + 1)^2 (n_2 + 1)^2.$
Computational cost is only linear in the number of nodes and arcs in the trellis, that is, linear in |V| and |E|.
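To see why the per-clique work does not depend on the trellis size, the short sketch below counts the (i, j) index pairs that a binomial-expansion update visits at one clique. The function name and the counting model are assumptions made for illustration; the count is bounded by the product of (n_k + 1)^2 over the features, and the total work is that count times the number of cliques.

```python
from itertools import product
from math import prod


def updates_per_clique(orders):
    """Number of (i, j) pairs with 0 <= j_k <= i_k <= n_k for every k."""
    count = 0
    for i in product(*(range(n + 1) for n in orders)):
        # For a fixed i there are prod(i_k + 1) admissible j tuples.
        count += prod(ik + 1 for ik in i)
    return count


def upper_bound(orders):
    """Simple bound: prod over k of (n_k + 1) ** 2."""
    return prod((n + 1) ** 2 for n in orders)
```

For example, `updates_per_clique((1, 1))` counts the terms needed when two features each appear to the first power, and it stays the same whether the trellis has ten cliques or ten million.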
Merits of the generalized forward-backward algorithm
1. The generalized forward-backward subsumes many existing task-specific algorithms
2. For some tasks, it leads to a solution more efficient than the existing ones
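Merit 1. The generalized forward-backward subsumes many existing task-specific algorithms:
Task and sum to compute:
• Parameter diffs. of Hamming-loss for CRFs [Kakade et al., 2002]:
$\sum_{y} \Big[ \prod_{c \in C_y} \exp(\lambda \cdot F(c, x)) \Big] \Big[ \sum_{c \in C_y} \ell(c) \Big] \Big[ \sum_{c \in C_y} f_k(c, x) \Big]$
(ℓ(c) = loss of clique c)
• Parameter diffs. of entropy for CRFs [Mann et al., 2007]:
$\sum_{y} \Big[ \prod_{c \in C_y} \exp(\lambda \cdot F(c, x)) \Big] \Big[ \sum_{c \in C_y} \lambda \cdot F(c, x) \Big] \Big[ \sum_{c \in C_y} f_k(c, x) \Big]$
• Hessian-vector product for CRFs [Vishwanathan et al., 2006]:
$\sum_{y} \Big[ \prod_{c \in C_y} \exp(\lambda \cdot F(c, x)) \Big] \Big[ \sum_{c \in C_y} F(c, x) \Big] \Big[ \sum_{c \in C_y} \lambda \cdot F(c, x) \Big]$
and
$\Big[ \sum_{y} \prod_{c \in C_y} \exp(\lambda \cdot F(c, x)) \sum_{c \in C_y} F(c, x) \Big] \Big[ \sum_{y} \prod_{c \in C_y} \exp(\lambda \cdot F(c, x)) \sum_{c \in C_y} \lambda \cdot F(c, x) \Big]$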
All these formulas have a form computable with our proposed method.
The previously proposed algorithms for these tasks are task-specific
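The generalized forward-backward is a task-independent algorithm applicable to formulae of the form
$\sum_{y \in Y(x)} \Big[ \prod_{c \in C_y} \psi(c) \Big] \Big( \sum_{c \in C_y} f_1(c) \Big)^{n_1} \cdots \Big( \sum_{c \in C_y} f_K(c) \Big)^{n_K}$
If a problem involves this form, the algorithm immediately offers an efficient solution.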
Merit 2. Efficient optimization procedure with respect to Generalized Expectation Criteria for CRFs [Mann et al., 2008]
• Algorithm proposed in [Mann et al., 2008]: computational cost is proportional to $L\,(|V| + |E|)$
• By a specialization of the generalization: computational cost is proportional to $|V| + |E|$
(L = # of nodes labeled as answers)
Future tasks
• Explore other tasks to which our generalized forward-backward algorithm is applicable
• Extend the generalized forward-backward to trees and general graphs containing cycles
Summary
• We have generalized the forward-backward algorithm to allow for higher-order multivariate moments
• The generalization offers an efficient way to compute complex models of sequences that involve higher-order multivariate moments
• Many existing task-specific algorithms are instances of this generalization
• It leads to a faster algorithm for computing Generalized Expectation Criteria for CRFs
Thank you for your attention!