
The Neural Hawkes Process: A Neurally Self-Modulating Multivariate Point Process
Hongyuan Mei and Jason Eisner
Center for Language and Speech Processing, Department of Computer Science, Johns Hopkins University

Overview
Events happen at random times: at time $t_i$, there occurs an event of type $k_i \in \{1, 2, \ldots, K\}$. Given past events, what might happen next, and when?
• Generative model of event streams:
  • Medical: patient's visits, tests and diagnoses
  • Online shopping: purchasing and feedback
  • Social media: posts, shares, comments
  • Other: quantified self, news, dialogue, music, etc.

The traditional model is a Hawkes process [1]:
• Each event type $k$ has an intensity $\lambda_k(t)$
• Each event token occurs with probability $\lambda_k(t)\,dt$
• Past events temporarily excite future events (sketch below):
  $\lambda_k(t) = \mu_k + \sum_{h:\, t_h < t} \alpha_{k_h,k} \exp(-\delta_{k_h,k}(t - t_h))$
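To make the excitation formula concrete, here is a minimal Python sketch (mine, not from the poster) that evaluates the classical Hawkes intensities at a query time; the parameter values are invented for illustration.

    import math

    def hawkes_intensity(t, history, mu, alpha, delta):
        # lambda_k(t) = mu_k + sum over past events (t_h, k_h) of
        #               alpha[k_h][k] * exp(-delta[k_h][k] * (t - t_h))
        K = len(mu)
        lam = list(mu)  # start from the base rates
        for (t_h, k_h) in history:
            if t_h < t:
                for k in range(K):
                    lam[k] += alpha[k_h][k] * math.exp(-delta[k_h][k] * (t - t_h))
        return lam

    # Toy example with K = 2 event types (illustrative parameters only).
    mu = [0.2, 0.1]
    alpha = [[0.5, 0.1], [0.2, 0.4]]  # alpha[k_h][k]: excitation of type k by a past event of type k_h
    delta = [[1.0, 1.0], [1.0, 1.0]]  # decay rates
    history = [(0.5, 0), (1.2, 1)]    # (time, type) pairs
    print(hawkes_intensity(2.0, history, mu, alpha, delta))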

Model
An event stream from a neural Hawkes process: events occur at random times $0 < t_1 < t_2 < \ldots$, and each event's time and type depend on the details of the past history and on each other:
  $P((k_i, t_i) \mid (k_1, t_1), \ldots, (k_{i-1}, t_{i-1}))$
Neural Hawkes process (sketch below):
• Continuous-time LSTM: cell $c(t)$ → hidden $h(t)$ → intensity $\lambda(t)$ → event? → updated $c(t + dt)$
• Hidden state $h(t) = o_i \odot (2\sigma(2c(t)) - 1)$
• Cell memory $c(t) = \bar{c}_{i+1} + \nu_{i+1}(t)$, where $\nu_{i+1}(t) = (c_{i+1} - \bar{c}_{i+1}) \exp(-\delta_{i+1}(t - t_i))$
• Extra gates to compute $\bar{c}_{i+1}$ and $\delta_{i+1}$
• Intensity $\lambda_k(t) = f_k(w_k^\top h(t))$
[Figure: an example stream with events of Type-1 and Type-2, showing cell memories $c_1(t)$, $c_2(t)$, $c_3(t)$ decaying between events and the resulting intensities $\lambda_1(t)$, $\lambda_2(t)$ over time with their base rates (legend: Intensity-1, Intensity-2, BaseRate-1, BaseRate-2, LSTM-Unit), alongside a Hawkes process for comparison.]
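A minimal sketch (my own illustration, not the authors' implementation) of the interval dynamics above: the cell decays from $c_{i+1}$ toward its target $\bar{c}_{i+1}$ at rate $\delta_{i+1}$, the hidden state is read out from the decayed cell, and a softplus transfer function stands in for $f_k$ to keep intensities positive. The gate computations are omitted and the numbers below are placeholders.

    import numpy as np

    def decayed_cell(c_start, c_target, delta, dt):
        # c(t) = c_bar + (c_start - c_bar) * exp(-delta * (t - t_i)), with dt = t - t_i
        return c_target + (c_start - c_target) * np.exp(-delta * dt)

    def hidden_state(c_t, o):
        # h(t) = o ⊙ (2 * sigmoid(2 * c(t)) - 1)
        return o * (2.0 / (1.0 + np.exp(-2.0 * c_t)) - 1.0)

    def intensities(h_t, W):
        # lambda_k(t) = f_k(w_k^T h(t)); softplus used here as the positivity-preserving f_k
        return np.log1p(np.exp(W @ h_t))

    # Placeholder values standing in for the outputs of the (omitted) LSTM gates.
    rng = np.random.default_rng(0)
    D, K = 4, 2
    c_start, c_target = np.ones(D), np.zeros(D)
    o, delta = np.full(D, 0.5), np.full(D, 0.8)
    W = 0.1 * rng.standard_normal((K, D))
    for dt in [0.0, 0.5, 2.0]:
        c_t = decayed_cell(c_start, c_target, delta, dt)
        print(dt, intensities(hidden_state(c_t, o), W))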

Algorithms
Train the model by maximizing log-likelihood (Monte Carlo sketch below):
  $\ell = \sum_{i:\, t_i \le T} \log \lambda_{k_i}(t_i) - \int_{t=0}^{T} \lambda(t)\,dt$
• Total intensity $\lambda(t) = \sum_{k=1}^{K} \lambda_k(t)$
• Integral estimated by Monte Carlo simulation
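One plausible way to implement the Monte Carlo estimate (a sketch of mine, not the authors' code): sum the log-intensities at the observed events, and estimate $\int_0^T \lambda(t)\,dt$ as $T$ times the average total intensity at uniformly sampled times.

    import math, random

    def log_likelihood(events, T, intensity_fn, num_mc_samples=1000):
        # events: list of (t_i, k_i); intensity_fn(t) returns [lambda_1(t), ..., lambda_K(t)]
        ll = sum(math.log(intensity_fn(t)[k]) for (t, k) in events)
        # Monte Carlo estimate of int_0^T lambda(t) dt as T * (average total intensity)
        mc = sum(sum(intensity_fn(random.uniform(0.0, T))) for _ in range(num_mc_samples))
        return ll - T * mc / num_mc_samples

    # Toy check with constant intensity 0.5 for each of K = 2 types:
    print(log_likelihood([(1.0, 0), (2.5, 1)], T=4.0, intensity_fn=lambda t: [0.5, 0.5]))
    # expected value is about 2 * log(0.5) - 4.0 * 1.0, i.e. roughly -5.39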

Minimum Bayes risk prediction (numerical sketch below):
• Density for $t_i$ is $p_i(t) = \lambda(t) \exp(-\int_{s=t_{i-1}}^{t} \lambda(s)\,ds)$
• Time prediction: $\hat{t}_i = \int_{t=t_{i-1}}^{\infty} t\, p_i(t)\,dt$
• Type prediction: $\hat{k}_i = \arg\max_k \int_{t=t_{i-1}}^{\infty} p_i(t)\, \lambda_k(t)/\lambda(t)\,dt$
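A sketch of the minimum Bayes risk decisions via simple quadrature on a truncated horizon (my approximation; the integrals formally extend to infinity). It assumes intensity_fn(t) returns the per-type intensities conditioned on the history up to $t_{i-1}$ and that the total intensity stays positive.

    import numpy as np

    def mbr_predict(t_prev, intensity_fn, horizon=50.0, grid=5000):
        # Predict the next event's time (mean of p_i) and type
        # (argmax_k of int p_i(t) * lambda_k(t) / lambda(t) dt), on a finite grid.
        ts = np.linspace(t_prev, t_prev + horizon, grid)
        dt = ts[1] - ts[0]
        lams = np.array([intensity_fn(t) for t in ts])  # shape (grid, K)
        total = lams.sum(axis=1)                        # lambda(t)
        cum = np.cumsum(total) * dt                     # int_{t_prev}^t lambda(s) ds
        p = total * np.exp(-cum)                        # density p_i(t)
        t_hat = float(np.sum(ts * p) * dt)              # expected next event time
        type_scores = (p[:, None] * lams / total[:, None]).sum(axis=0) * dt
        return t_hat, int(np.argmax(type_scores))

    # Toy usage with constant intensities (type 1 is twice as likely as type 0):
    print(mbr_predict(0.0, lambda t: [0.3, 0.6]))  # time close to 1/0.9, type 1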

Thinning algorithm for sampling sequences (sketch below).
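The poster only names the thinning sampler; below is a generic Ogata-style sketch, under the assumption that lambda_max upper-bounds the total intensity over the horizon and that intensity_fn(t, history) returns the per-type intensities given the events sampled so far.

    import random

    def sample_sequence(intensity_fn, lambda_max, T):
        # Propose candidate times from a Poisson process of rate lambda_max,
        # accept each with probability lambda(t) / lambda_max, then draw the type.
        history, t = [], 0.0
        while True:
            t += random.expovariate(lambda_max)  # next candidate time
            if t > T:
                return history
            lams = intensity_fn(t, history)
            total = sum(lams)
            if random.random() < total / lambda_max:
                u, k, acc = random.random() * total, 0, lams[0]
                while acc < u:                   # choose type k with probability lambda_k / lambda
                    k += 1
                    acc += lams[k]
                history.append((t, k))

    # Toy usage with history-independent intensities:
    print(sample_sequence(lambda t, h: [0.4, 0.2], lambda_max=1.0, T=10.0))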

Experiments (many more in paper)

Experiments on artificial datasets
• Models try to fit data generated by each other (data: Hawkes, Hawkes+inhibition, neural Hawkes)
• Oracle model performance shown for reference (—)
[Figure: log-likelihood per event vs. number of training sequences, one panel per generating model (Hawkes, Hawkes+inhibition, neural Hawkes), with a zoomed-in view.]

Missing data experiments
★ Censor all events of some types
★ neural Hawkes > Hawkes process
★ Consistent over all combos
[Figure: log-likelihood per event vs. number of training sequences (125 to 16000).]

Experiments on real-world social media datasets
• Retweet (top): long sequences, with K = 3
• MemeTrack (bottom): short sequences, with K = 5000
[Figure: log-likelihood per event vs. number of training sequences (125 to 16000 for Retweet, 1000 to 32000 for MemeTrack), comparing the neural Hawkes process against a Hawkes process.]

Neural Hawkes process vs. similar work [2]
• Prediction error for type (upper, error rate %) and time (lower, RMSE) on the Financial, MIMIC-II and Stack Overflow datasets
• Neural Hawkes is the winner (4/5, 5/5, and 5/5) on type prediction; no clear winner on time prediction

References
[1] Hawkes, Alan G. Spectra of some self-exciting and mutually exciting point processes. 1971.
[2] Du, Nan, et al. Recurrent marked temporal point processes: Embedding event history to vector. 2016.
