Hidden Permutation Model and
Location-Based Activity Recognition
Hung Bui
SRI International
Dinh Phung, Svetha Venkatesh, Hai Phan
Curtin University of Technology
Talk Outline
• Why model permutations?
• Distribution of random permutations
• Hidden Permutation Model (HPM)
• How to estimate HPM parameters?
• How to perform approximate inference?
• Experiments with location-based activity recognition
Why Model Permutations?
• Permutations arise in many real-world problems
  – Data association, information extraction from text, machine translation, activity recognition
• Usually, there is an unknown matching that needs to be recovered
  – Correspondence in data association
  – Field-to-value matching in IR
  – Word/phrase matching in machine translation
• A permutation is the simplest form of matching
• Brute-force computation is at least O(n!)
Permutations in Activity Recognition
• Many activities require carrying out a collection of sub-steps, each performed just once (or repeated a small number of times)
  – AAAI travel = (get_approval, book_hotel, book_air_ticket, register, prepare_slides, do_travel)
• The ordering of the steps is an unknown permutation that needs to be recovered
• Factors affecting the ordering between steps:
  – Strongly ordered: A enables B; A and B follow a timetable
  – Weakly ordered: A performed before B out of habit
  – Unordered: A performed before B by chance
• Learning these ordering constraints from data can lead to better recognition performance
Permutations and Markov Models
• Permutation constraints lead to awkward graphical models, since conditional independence is lost
• Need a more direct way of defining a distribution on permutations
• A standard HMM does not enforce permutation constraints
  [Figure: HMM state diagram with checks x_n = x_1?, x_n = x_2?, …]
Distributions on Permutations
• Let Per(n) = the set of permutations of {1, 2, …, n}
• Multinomial over Per(n): very general, but expensive (requires n! parameters) (Kirshner et al., ICML 2003)
• Exponential family: few parameters
  – f : Per(n) → R^d : feature function
  – λ ∈ R^d : natural parameters
  – E.F. distribution on permutations: Pr(x | λ) = exp {〈f(x), λ〉 − A(λ)}
  – Log-partition function: A(λ) = ln Σ_{x ∈ Per(n)} exp (〈f(x), λ〉)
Exponential Family on Permutations (cont.)
• What features to use?
• Factors affecting ordering between activity steps:
  – Strongly ordered: A enables B; A and B follow a timetable
  – Weakly ordered: A performed before B out of habit
  – Unordered: A performed before B by chance
• For i < j, f_ij(x) = I{x^{-1}(i) < x^{-1}(j)}: does step i appear before step j in x?
• With no loss of information, keep only d = n(n−1)/2 features (also the number of parameters)
Exponential Family on Permutations (cont.)
• Simplified density forms (sum over all in-order pairs):

  Pr(x | λ) = exp ( Σ_{l<k s.t. x_l < x_k} λ_{x_l, x_k} − A(λ) )
            = exp ( Σ_{i<j s.t. x^{-1}(i) < x^{-1}(j)} λ_{ij} − A(λ) )

• Example: x = (2 4 1 5 3) gives
  λ_{2,4} + λ_{2,5} + λ_{2,3} + λ_{4,5} + λ_{1,5} + λ_{1,3}
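As a sanity check on the forms above, here is a brute-force Python sketch (ours, not from the talk) that evaluates the density by enumerating Per(n) for small n, using the pairwise "i before j" features:

```python
# Sketch: brute-force evaluation of the exponential-family permutation
# density for small n, with features f_ij(x) = I{i appears before j in x}.
import itertools, math, random

def features(x):
    """f_ij(x) = 1 if i appears before j in x, for all pairs i < j."""
    n = len(x)
    pos = {v: t for t, v in enumerate(x)}   # x^{-1}: value -> position
    return {(i, j): 1.0 if pos[i] < pos[j] else 0.0
            for i in range(1, n + 1) for j in range(i + 1, n + 1)}

def log_partition(n, lam):
    """A(lambda) = log sum over all n! permutations (feasible only for tiny n)."""
    total = 0.0
    for x in itertools.permutations(range(1, n + 1)):
        fx = features(x)
        total += math.exp(sum(lam[p] * fx[p] for p in fx))
    return math.log(total)

def prob(x, lam):
    fx = features(x)
    score = sum(lam[p] * fx[p] for p in fx)
    return math.exp(score - log_partition(len(x), lam))

n = 5
random.seed(0)
lam = {(i, j): random.gauss(0, 1)
       for i in range(1, n + 1) for j in range(i + 1, n + 1)}
x = (2, 4, 1, 5, 3)
f = features(x)
# The in-order pairs of the slide's example: (2,4),(2,5),(2,3),(4,5),(1,5),(1,3)
print(sorted(p for p, v in f.items() if v == 1.0))
# The densities over Per(5) sum to one:
print(sum(prob(p, lam) for p in itertools.permutations(range(1, 6))))
```

Enumeration is O(n!), so this is only a check of the definitions, not a practical inference routine.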
Some Properties
• Swapping adjacent elements x_i and x_{i+1}: x' = (x_1, …, x_{i+1}, x_i, …, x_n)

  Pr(x' | λ) / Pr(x | λ) = e^{−λ_{x_i, x_{i+1}}} if x_i < x_{i+1};  e^{λ_{x_{i+1}, x_i}} if x_i > x_{i+1}

  The cost of switching an adjacent in-order pair (i, j), i < j, is e^{λ_{ij}}

• Reverse permutation: x' = (x_n, x_{n−1}, …, x_1)

  Pr(x' | λ) = exp ( Σ_{i<j} λ_{ij} − 2A(λ) ) / Pr(x | λ),  so Pr(x' | λ) · Pr(x | λ) = const(λ)
Hidden Permutation Model
• "Graphical model": a prior Pr(x | λ) over the hidden permutation, with a multinomial observation model
  Pr(o_t | x_t = i, η) = Mult(η_i)
• Joint distribution:
  Pr(x, o | λ, η) = Pr(x | λ) Π_{t=1}^{n} Pr(o_t | x_t, η_{x_t})
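The joint distribution corresponds to a simple generative process, sketched below (our illustration; the λ and η values are hypothetical, and the permutation is sampled by exact enumeration, feasible only for tiny n):

```python
# Sketch of the HPM generative process: draw x ~ Pr(x | lambda), then emit
# one observation per step from the multinomial Mult(eta_i) of that step.
import itertools, math, random

def sample_hpm(n, lam, eta, rng):
    perms = list(itertools.permutations(range(1, n + 1)))
    def score(x):
        pos = {v: t for t, v in enumerate(x)}
        return math.exp(sum(lam[(i, j)] for i in range(1, n + 1)
                            for j in range(i + 1, n + 1) if pos[i] < pos[j]))
    w = [score(x) for x in perms]
    x = rng.choices(perms, weights=w)[0]        # x ~ Pr(x | lambda)
    o = [rng.choices(range(len(eta[i])), weights=eta[i])[0] for i in x]
    return x, o                                 # o_t ~ Mult(eta_{x_t})

rng = random.Random(1)
n = 3
lam = {(1, 2): 2.0, (1, 3): 2.0, (2, 3): 0.0}   # hypothetical: step 1 tends first
eta = {1: [0.9, 0.1], 2: [0.1, 0.9], 3: [0.5, 0.5]}  # hypothetical emissions
x, o = sample_hpm(n, lam, eta, rng)
print(x, o)
```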
Max. Likelihood Estimation, Permutation Known
• Log-likelihood function: L(λ, η) = ln P(x | λ) + ln P(o | x, η)
• Optimizing η: trivial (count frequencies)
• Optimizing λ: a convex problem
  – Derivative: ∇_{λ_ij}(L) = f_ij(x) − Σ_x f_ij(x) P(x | λ)
    (did i appear before j in the data, minus Pr(i appears before j) under the model)
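A minimal sketch of this convex λ update (our code, with exact enumeration over Per(n) standing in for the model expectation, so it is feasible only for tiny n):

```python
# Sketch: gradient of the log-likelihood in lambda when x is observed.
# Gradient = f_ij(x_obs) minus the model expectation E[f_ij] = Pr(i before j).
import itertools, math

def fij(x, i, j):
    pos = {v: t for t, v in enumerate(x)}
    return 1.0 if pos[i] < pos[j] else 0.0

def grad(x_obs, lam, n):
    perms = list(itertools.permutations(range(1, n + 1)))
    w = [math.exp(sum(lam[(i, j)] * fij(x, i, j) for (i, j) in lam))
         for x in perms]
    Z = sum(w)
    g = {}
    for (i, j) in lam:
        expect = sum(wk * fij(x, i, j) for wk, x in zip(w, perms)) / Z
        g[(i, j)] = fij(x_obs, i, j) - expect   # observed minus Pr(i before j)
    return g

n = 3
lam = {(i, j): 0.0 for i in range(1, n + 1) for j in range(i + 1, n + 1)}
x_obs = (1, 2, 3)
g = grad(x_obs, lam, n)
print(g)  # at lambda = 0, Pr(i before j) = 0.5, so every component is 0.5
```

A gradient-ascent step would then be lam[(i, j)] += step * g[(i, j)].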
Max. Likelihood Estimation, Permutation Unknown
• Log-likelihood function:
  l(λ, η) = Σ_{k=1}^{K} log { Σ_x P(o^k, x | λ, η) }
• Need to jointly optimize λ and η; a non-convex problem
• Can we use EM?
  – The M-step for λ does not have a closed form
• Can try coordinate ascent:
  – Fix η and improve λ by one gradient step
  – Fix λ and improve η by EM (now has a closed form)
  – Didn't work as well as simple gradient ascent
Max. Likelihood Estimation, Permutation Unknown
• Derivative for λ:
  ∇_{λ_ij}(l) = Σ_x f_ij(x) P(x | o, λ, η) − Σ_x f_ij(x) P(x | λ)
  = Pr(i appears before j given o) − Pr(i appears before j)
• Derivative for η:
  ∇_{η_iv}(l) = Σ_x I{x^{-1}(i) ∈ o[v]} P(x | o, λ, η) − Pr(v | η_i)
  (the first term is Pr(i appears at one of v's position(s) given o))
• Avoid dealing with constraints by transforming to the natural parameters of the multinomial
Approximate Inference via MCMC
• Typical "inference" problems require calculating an expectation
• Expectations can be approximated if we can generate samples x ∼ Pr(x | λ)
• How to draw random permutations? Try a well-known MCMC idea:
  – Start with a random initial permutation x
  – Randomly switch two positions to propose x'
  – Accept the new permutation with probability min { Pr(x' | λ) / Pr(x | λ), 1 }
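The sampler described above fits in a few lines (our sketch; the strongly ordered λ values are a hypothetical choice to make the behaviour visible). Note the acceptance ratio needs no A(λ), since the partition function cancels:

```python
# Sketch of the Metropolis sampler over permutations: propose by swapping
# two random positions, accept with probability min{Pr(x')/Pr(x), 1}.
import math, random

def log_score(x, lam):
    pos = {v: t for t, v in enumerate(x)}
    return sum(lam[(i, j)] for (i, j) in lam if pos[i] < pos[j])

def metropolis(n, lam, steps, rng):
    x = list(range(1, n + 1))
    rng.shuffle(x)                      # random initial permutation
    for _ in range(steps):
        a, b = rng.sample(range(n), 2)  # randomly switch two positions
        xp = x[:]
        xp[a], xp[b] = xp[b], xp[a]
        # accept if log u < log Pr(x') - log Pr(x); A(lambda) cancels
        if math.log(rng.random()) < log_score(xp, lam) - log_score(x, lam):
            x = xp
    return tuple(x)

rng = random.Random(0)
lam = {(i, j): 3.0 for i in range(1, 6) for j in range(i + 1, 6)}
# Strong preference for ascending order, so most samples land near identity:
samples = [metropolis(5, lam, 200, rng) for _ in range(50)]
print(max(set(samples), key=samples.count))
```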
Atomic activities    Physical locations
Banking              Bank
Lecture 1            Watson theater
Lecture 2            Hayman theater
Lecture 3            Davis theater
Lecture 4            Jones theater
Group meeting 1      Bookmark cafe, Library, CBS
Group meeting 2      Library, CBS, Psychology Bld
Group meeting 3      Angazi cafe, Psychology Bld
Coffee               TAV, Angazi cafe, Bookmark cafe
Breakfast            TAV, Angazi cafe, Bookmark cafe
Lunch                TAV, Bookmark cafe
Location-Based Activity Recognition on Campus
[Figure: student activity routines, a permutation with partial-order constraints, link atomic activities to their corresponding locations; the detection problem is to recover the activities from observed GPS "places"]
• Preprocessing
  – Removal of points above a speed threshold
  – Often missing precisely the samples we want! (e.g. inside buildings)
  – Interpolation within a day and across days
  – Clustering into groups to find significant places, using DBSCAN
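A minimal, pure-Python stand-in for the DBSCAN clustering step on toy 2-D points (our sketch; the talk does not give parameters, and a real pipeline would use a library implementation such as sklearn.cluster.DBSCAN with eps in metres after projecting lat/lon):

```python
# Sketch: density-based clustering of stationary GPS readings into "places".
# Minimal DBSCAN: core points have >= min_pts neighbours within eps;
# clusters grow from core points; isolated readings are labelled noise (-1).
import math

def dbscan(points, eps, min_pts):
    """Return a cluster label per point; -1 marks noise."""
    n = len(points)
    def neighbors(i):
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]
    labels = [None] * n
    cid = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        nb = neighbors(i)
        if len(nb) < min_pts:
            labels[i] = -1              # noise (may become a border point later)
            continue
        labels[i] = cid
        queue = list(nb)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cid         # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cid
            nb_j = neighbors(j)
            if len(nb_j) >= min_pts:    # only core points expand the cluster
                queue.extend(nb_j)
        cid += 1
    return labels

# Two tight clusters of "stationary" points plus one stray reading:
pts = [(0, 0), (0.1, 0), (0, 0.1), (5, 5), (5.1, 5), (5, 5.1), (20, 20)]
print(dbscan(pts, eps=0.5, min_pts=2))  # -> [0, 0, 0, 1, 1, 1, -1]
```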
“Places” from GPS
Detection Performance
         TP     FP     Precision   Recall
NBC      6      4      60%         60%
HMM      8.5    5.3    61.6%       85%
HPM      9.8    1.9    83.8%       98%

Activity 1:
         TP     FP     Precision   Recall
  HMM    18.2   19.5   48.3%       91.0%
  KIR    18.5   2.0    90.2%       92.5%
  HPM    19.1   4.1    82.3%       95.5%
Activity 2:
  HMM    17.9   4.4    80.3%       89.5%
  KIR    18.0   0.7    96.3%       90.5%
  HPM    18.8   0.4    97.9%       94.0%

Activity 1:
         TP     FP     Precision   Recall
  NBC    16.6   11.1   59.9%       80.3%
  HMM    18.3   19.8   48.0%       91.5%
  KIR    18.3   8.5    68.3%       91.5%
  HPM    19.1   5.1    78.9%       95.5%
Activity 2:
  NBC    17.1   11.0   60.9%       85.5%
  HMM    17.7   3.8    82.3%       88.5%
  KIR    18.1   4.7    79.4%       90.5%
  HPM    18.5   0.5    97.4%       92.5%
Experimental settings for the tables above:
• Simulated data, supervised (atomic activities given)
• Simulated data, unsupervised
• Real data, unsupervised: in a long sequence of GPS "places", detect occurrences of the activity routine
Conclusion
• Modelling permutations is hard, but not impossible
• A general way to parameterize distributions over permutations using the exponential family
• If the permutation is not observed, use the Hidden Permutation Model (HPM)
• Demonstrated better performance than other models that do not exploit permutation constraints, as well as the naïve multinomial permutation model (Kirshner et al.)
• Future work
  – Generalize to permutations with repetitions
  – In supervised mode, a discriminative formulation similar to CRF might work better