Automatic video annotation via hierarchical topic trajectory model considering cross-modal correlations
Takuho Nakano (1), Akisato Kimura (2), Hirokazu Kameoka (1,2), Shigeki Sagayama (1), Shigeki Miyabe (1), Nobutaka Ono (1), Kunio Kashino (2), Takuya Nishimoto (1)
(1) The University of Tokyo
(2) NTT Communication Science Laboratories
Contact: Akisato Kimura <akisato@ieee.org>
ABSTRACT
• We propose a generative model, the Hierarchical Topic Trajectory Model (HTTM), for automatic video annotation and video retrieval.
• HTTM is a unified model for both tasks.
    • It incorporates low-level features together with keyframe-wise topic models that can encode an arbitrary number of cross-modal correlations.
    • It deals with temporal topic transitions by hidden Markov models (HMM).
• Some behaviors have been analyzed through the TRECVID Semantic Indexing task.

PROCEDURE
(1) Model learning
(A) Extracts low-level features for each modality.
    [Images] SIFT (keypoints & descriptors) + BoVW (Bag of Visual Words)
    [Texts] Tag occurrences weighted by idf (inverse document frequency)
(B) Calculates latent variables using PCCA (probabilistic CCA).
    (Figure annotations: projection matrices obtained via CCA; diagonal matrix)
(C) Unsupervised learning of model parameters for the topic transition model (HMM) via Viterbi learning.
    Observation: sequence of latent variables
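Step (B) above can be illustrated with a minimal sketch, assuming classical CCA computed by whitening and SVD, and the maximum-likelihood PCCA solution in which the diagonal matrix of canonical correlations P is split symmetrically (M1 = M2 = P^(1/2)). That split, the ridge term `reg`, the function names, and the synthetic data are assumptions made for illustration; the poster does not state which variant was used.

```python
import numpy as np

def fit_cca(X, Y, dim, reg=1e-6):
    """Classical CCA by whitening each view and taking an SVD.

    X: (n, dx) image features, Y: (n, dy) text features (toy stand-ins here).
    Returns means, projection matrices Ux, Uy and canonical correlations.
    """
    n = X.shape[0]
    mx, my = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - mx, Y - my
    Sxx = Xc.T @ Xc / n + reg * np.eye(X.shape[1])  # small ridge for stability
    Syy = Yc.T @ Yc / n + reg * np.eye(Y.shape[1])
    Sxy = Xc.T @ Yc / n
    Lx, Ly = np.linalg.cholesky(Sxx), np.linalg.cholesky(Syy)
    # Whitened cross-covariance K = Lx^{-1} Sxy Ly^{-T}
    A = np.linalg.solve(Lx, Sxy)
    K = np.linalg.solve(Ly, A.T).T
    U, s, Vt = np.linalg.svd(K, full_matrices=False)
    return {
        "mx": mx, "my": my,
        "Ux": np.linalg.solve(Lx.T, U[:, :dim]),   # projection for images
        "Uy": np.linalg.solve(Ly.T, Vt[:dim].T),   # projection for texts
        "corr": np.clip(s[:dim], 0.0, 1.0),        # diagonal of P
    }

def latent_from_images(model, X):
    """Posterior mean E[z | x] = P^(1/2) Ux^T (x - mx): image features only."""
    return (X - model["mx"]) @ model["Ux"] * np.sqrt(model["corr"])

def latent_from_both(model, X, Y):
    """Posterior mean E[z | x, y] = P^(1/2) (I + P)^(-1) (Ux^T x' + Uy^T y')."""
    p = model["corr"]
    a = (X - model["mx"]) @ model["Ux"]
    b = (Y - model["my"]) @ model["Uy"]
    return (a + b) * np.sqrt(p) / (1.0 + p)

# Synthetic two-view data sharing a 2-D latent cause (illustrative only).
rng = np.random.default_rng(0)
Z = rng.normal(size=(500, 2))
X = Z @ rng.normal(size=(2, 6)) + 0.5 * rng.normal(size=(500, 6))
Y = Z @ rng.normal(size=(2, 4)) + 0.5 * rng.normal(size=(500, 4))

model = fit_cca(X, Y, dim=2)
z_img = latent_from_images(model, X)       # what annotation time has access to
z_both = latent_from_both(model, X, Y)     # what training time has access to
```

The image-only posterior mean is what an annotation pass would compute for an untagged keyframe, while the two-view version is available only when tags are observed.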
(2) Inference (= annotation)
(a) Extracts low-level image features.
(b) Calculates latent variables only from image features.
(c) Estimates hidden states by Viterbi search.
(d) Latent variable modification with image features & hidden states.
(e) Missing feature (= semantic index) estimation.

METHOD
(1) Motivation
• Modeling relationships between images and texts
    Objects: co-occurrences are useful.
    Events: co-occurrences are almost necessary; sometimes they might be the only cue.
• Incorporating temporal information into the model
    A hierarchical temporal structure would be necessary.
    (Figure example: "Crossroad, traffic jam")
• Standard approaches: discriminative (e.g. SVM)
    Each classifier performs well in general.
    Difficult to incorporate "co-occurrences".
    (Figure examples: "cars", "bikes", "buses")
• Our approach: generative (topic models)
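Inference step (c), Viterbi search over a sequence of latent variables, can be sketched as follows. This is a generic log-domain Viterbi decoder with isotropic Gaussian emissions; the two state means, the variance, the sticky transition matrix, and the toy sequence are all hypothetical values chosen for illustration, not parameters from the poster.

```python
import numpy as np

def viterbi(log_init, log_trans, log_emit):
    """Most likely hidden-state path.

    log_init: (S,) log initial probabilities
    log_trans: (S, S) log transition probabilities, indexed [from, to]
    log_emit: (T, S) log-likelihood of each observation under each state
    """
    T, S = log_emit.shape
    delta = log_init + log_emit[0]
    back = np.zeros((T, S), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + log_trans          # (from, to)
        back[t] = np.argmax(scores, axis=0)          # best predecessor per state
        delta = scores[back[t], np.arange(S)] + log_emit[t]
    path = np.empty(T, dtype=int)
    path[-1] = int(np.argmax(delta))
    for t in range(T - 1, 0, -1):                    # backtrack
        path[t - 1] = back[t, path[t]]
    return path

def gaussian_log_emit(Z, means, var=1.0):
    """Isotropic Gaussian log-likelihood of each latent vector per state."""
    d2 = ((Z[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
    dim = Z.shape[1]
    return -0.5 * (d2 / var + dim * np.log(2.0 * np.pi * var))

# Hypothetical 2-state model over 2-D latent variables.
means = np.array([[-2.0, 0.0], [2.0, 0.0]])
log_init = np.log(np.array([0.5, 0.5]))
log_trans = np.log(np.array([[0.9, 0.1],
                             [0.1, 0.9]]))

# Toy keyframe sequence: the third frame is ambiguous on its own.
Z = np.array([[-2.1, 0.1], [-1.8, -0.2], [0.3, 0.0], [-2.2, 0.1],
              [1.9, 0.1], [2.2, -0.1], [2.0, 0.0]])

log_emit = gaussian_log_emit(Z, means)
framewise = np.argmax(log_emit, axis=1)   # ignores temporal context
path = viterbi(log_init, log_trans, log_emit)
print(framewise.tolist())  # [0, 0, 1, 0, 1, 1, 1]
print(path.tolist())       # [0, 0, 0, 0, 1, 1, 1]
```

In this toy case the frame-wise decision flips state on the ambiguous third frame, while the Viterbi path keeps it in state 0 because two state switches cost more than the small emission gain.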
(2) Hierarchical topic trajectory model (HTTM)
[Figure: HTTM structure. Labels: hidden states, hierarchical temporal structures; latent variables, cross-modal co-occurrences; low-level features, feature correlations; images / text tags (e.g. "Car, Road"); semantic index]
• Topic trajectory expression
    • HMM is used (observations = latent variables).
    • Each hidden state might correspond to a concept or a story expressed by a sequence of topics.
    • HMM is feasible for simple expression of topic transition.
    • Topic variation can be easily introduced by using Gaussian mixtures (GMM).
• Keyframe-wise topic model
    • Incorporates co-occurrences in a natural way.
    • Serves as a unified framework for semantic indexing and retrieval.
    • Tool: Probabilistic CCA [Bach+ 2005]
    • Low computational cost for model learning and inference.
    • Easy to extend the model to nonlinear ones via the kernel trick.

EXPERIMENTS
(1) Conditions
[Dataset] 127 video clips, 56191 shots from TRECVID 2005
[Image features] VIREO-374 (SIFT keypoints & descriptors + normalized BoVWs with 500 dims)
[Labels] 47 concepts selected from LSCOM and LSCOM-lite
[Evaluation] Mean average precision (meanAP, a commonly used measure in TRECVID)
(2) Results
1. Latent variable modification is effective.
2. Appropriate setting of hyper-parameters provides high precision.
[Figure: mean average precision. Legend: chance; image only; PCCA; HTTM (temporal structure + latent variable modification). Settings: s1_m240, s2_m120, s3_m80, s4_m60, s5_m48, s6_m40, s7_m35, s10_m24, s20_m12, s30_m8, s40_m6. Concepts shown: Airplane, Airplane_Flying, Maps, Urban, Sports, Studio]
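The evaluation measure, mean average precision, can be made concrete with a short sketch. It assumes non-interpolated average precision over the full ranked list of shots for each concept; official TRECVID tooling may differ in details such as truncation depth. The concept names are borrowed from the poster's figure labels, but the scores and labels are made-up toy values, not experimental data.

```python
def average_precision(ranked_relevance):
    """Non-interpolated AP for one concept.

    ranked_relevance: 0/1 relevance flags, ordered from highest to
    lowest annotation score.
    """
    hits = 0
    precision_sum = 0.0
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            precision_sum += hits / rank   # precision at this relevant shot
    return precision_sum / hits if hits else 0.0

def mean_average_precision(scores, labels):
    """Mean of per-concept AP.

    scores, labels: dicts mapping concept name to per-shot score lists
    and per-shot 0/1 ground-truth lists of the same length.
    """
    aps = []
    for concept, concept_scores in scores.items():
        order = sorted(range(len(concept_scores)),
                       key=lambda i: -concept_scores[i])
        aps.append(average_precision([labels[concept][i] for i in order]))
    return sum(aps) / len(aps)

# Toy example with four shots and two concepts (illustrative values only).
scores = {"Car": [0.9, 0.2, 0.7, 0.4], "Road": [0.1, 0.8, 0.3, 0.6]}
labels = {"Car": [1, 0, 0, 1], "Road": [0, 1, 0, 1]}

map_value = mean_average_precision(scores, labels)
print(round(map_value, 4))  # 0.9167
```

Here "Car" ranks its relevant shots first and third (AP = 5/6) and "Road" ranks both relevant shots on top (AP = 1), giving a mean of 11/12.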