Scalable and Near Real-Time Burst Detection from eCommerce Queries Nish Parikh, Neel Sundaresan ACM SIGKDD ’08 Presenter: Luo Yiming


Scalable and Near Real-Time Burst Detection from eCommerce Queries

Nish Parikh, Neel Sundaresan

ACM SIGKDD ’08

Presenter: Luo Yiming


Outline

◦Context in which the problem was born (former related work)

◦Infinite-state automaton: "Bursty and Hierarchical Structure in Streams", ACM SIGKDD’02

◦Main contribution of this work


Main Idea—Bursty and Hierarchical Structure in Streams

Extract meaningful structure from document stream

Burst of activity: certain features rising sharply in frequency as the topic emerges

A formal approach for modeling such “bursts”

◦An infinite-state automaton

◦Bursts appear as state transitions

◦A nested representation of the set of bursts that imposes a hierarchical structure on the overall stream


A Weighted Automaton Model: One State Model

Generating model: $f(x) = \alpha e^{-\alpha x}$

◦$x$: the gap in time between two consecutive messages

◦Expectation: $E[x] = \alpha^{-1}$

◦$\alpha$: rate of message arrivals

Why this model?


A Weighted Automaton Model: Two State Model

Two-state automaton A with states q0 and q1

A changes state with probability p, remaining in its current state with probability 1-p, independently of previous emissions and state changes.

A begins in state q0. Before each message is emitted, A changes state with probability p. A message is then emitted, and the gap in time until the next message is determined by the distribution associated with A's current state.

State q0: $f_0(x) = \alpha_0 e^{-\alpha_0 x}$

State q1: $f_1(x) = \alpha_1 e^{-\alpha_1 x}$
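
The generative process described on this slide can be sketched in a few lines of Python. This is only an illustration, not code from the paper: the function name `simulate_two_state` and the parameter values in the usage line are assumptions, and the example assumes the rate of q1 is higher than the rate of q0 so that q1 plays the role of the bursty state.

```python
import random


def simulate_two_state(n, alpha0, alpha1, p, seed=0):
    """Generate n inter-arrival gaps from the two-state automaton.

    Returns (states, gaps): states[t] is 0 or 1, and gaps[t] is drawn from
    the exponential density of the state the automaton is in at step t.
    """
    rng = random.Random(seed)
    rates = (alpha0, alpha1)
    state = 0  # the automaton begins in q0
    states, gaps = [], []
    for _ in range(n):
        if rng.random() < p:  # change state with probability p
            state = 1 - state
        states.append(state)
        # Exponential gap with rate alpha; its expected value is 1 / alpha.
        gaps.append(rng.expovariate(rates[state]))
    return states, gaps


# Illustrative values only (not taken from the slides).
states, gaps = simulate_two_state(n=1000, alpha0=1.0, alpha1=5.0, p=0.05)
```

Running the sketch with a small `p` produces long runs in one state, which is what makes bursts visible as stretches of short gaps.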


A Weighted Automaton Model: Two State Model

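Estimate a state sequence from a set of messages

◦Maximum likelihood

$n$ inter-arrival gaps: $\mathbf{x} = (x_1, x_2, \ldots, x_n)$

A state sequence: $\mathbf{q} = (q_{i_1}, q_{i_2}, \ldots, q_{i_n})$

$b$ denotes the number of state transitions in the sequence $\mathbf{q}$

$$\Pr[\mathbf{q} \mid \mathbf{x}] = \frac{\Pr[\mathbf{q}]\, f_{\mathbf{q}}(\mathbf{x})}{\sum_{\mathbf{q}'} \Pr[\mathbf{q}']\, f_{\mathbf{q}'}(\mathbf{x})} = \frac{1}{Z} \left(\frac{p}{1-p}\right)^{b} (1-p)^{n} \prod_{t=1}^{n} f_{i_t}(x_t)$$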
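
As a brief worked step (a sketch following the two-state model described on the previous slide, with $Z$ used here as a name for the normalizing constant), the prior and likelihood factors of the posterior can be written out separately. Each of the $n$ steps either changes state (probability $p$) or does not (probability $1-p$), and there are $b$ changes in total:

```latex
\Pr[\mathbf{q}] = p^{b} (1-p)^{n-b} = \left(\frac{p}{1-p}\right)^{b} (1-p)^{n},
\qquad
f_{\mathbf{q}}(\mathbf{x}) = \prod_{t=1}^{n} f_{i_t}(x_t),
\qquad
Z = \sum_{\mathbf{q}'} \Pr[\mathbf{q}']\, f_{\mathbf{q}'}(\mathbf{x}).
```

Multiplying the first two factors and dividing by $Z$ gives the posterior probability of the state sequence given the gaps.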


A Weighted Automaton Model: Two State Model

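Finding a state sequence $\mathbf{q}$ that maximizes the previous probability is equivalent to finding one that minimizes

$$-\ln \Pr[\mathbf{q} \mid \mathbf{x}] = b \ln\left(\frac{1-p}{p}\right) + \left(\sum_{t=1}^{n} -\ln f_{i_t}(x_t)\right) - n \ln(1-p) + \ln Z$$

The last two terms do not depend on $\mathbf{q}$, so this is equivalent to minimizing the following cost function:

$$c(\mathbf{q} \mid \mathbf{x}) = b \ln\left(\frac{1-p}{p}\right) + \left(\sum_{t=1}^{n} -\ln f_{i_t}(x_t)\right)$$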
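
One common way to minimize a cost of this form is dynamic programming over the two states (a Viterbi-style pass). The sketch below is an illustration under stated assumptions, not the presenters' implementation: the function name `two_state_viterbi` and the numeric values in the usage lines are hypothetical, and the densities are assumed to be the exponentials $f_i(x) = \alpha_i e^{-\alpha_i x}$ of the two-state model.

```python
import math


def two_state_viterbi(gaps, alpha0, alpha1, p):
    """Return the 0/1 state sequence minimizing
    c(q | x) = b * ln((1 - p) / p) + sum_t -ln f_{i_t}(x_t)."""
    if not gaps:
        return []
    rates = (alpha0, alpha1)
    trans = math.log((1 - p) / p)  # cost charged for each state change

    def emit(state, x):
        # -ln(alpha * exp(-alpha * x)) = alpha * x - ln(alpha)
        return rates[state] * x - math.log(rates[state])

    # best[s]: minimal cost of any sequence that ends in state s.
    # The automaton starts in q0 and may switch before the first message.
    best = [emit(0, gaps[0]), trans + emit(1, gaps[0])]
    back = []  # back[t][s]: predecessor of state s at step t + 1
    for x in gaps[1:]:
        new, ptr = [], []
        for s in (0, 1):
            stay, switch = best[s], best[1 - s] + trans
            if stay <= switch:
                new.append(stay + emit(s, x))
                ptr.append(s)
            else:
                new.append(switch + emit(s, x))
                ptr.append(1 - s)
        best = new
        back.append(ptr)

    # Backtrack from the cheaper final state.
    s = 0 if best[0] <= best[1] else 1
    path = [s]
    for ptr in reversed(back):
        s = ptr[s]
        path.append(s)
    path.reverse()
    return path


# Illustrative input: five slow gaps, five fast gaps, five slow gaps.
gaps = [1.0] * 5 + [0.1] * 5 + [1.0] * 5
path = two_state_viterbi(gaps, alpha0=1.0, alpha1=5.0, p=0.1)
print(path)  # the five short gaps are labeled as state 1
```

The pass runs in time linear in the number of gaps, since each step only compares "stay" against "switch" for each of the two states.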


Experiment Setup

Dataset: 5 months of queries from eBay.com in 2007 (75+ TB of data).

Assumptions and pre-definitions:

i) The number of queries is uniformly distributed over the time of one day;

ii) The maximum number of segments of query arrivals per day is scaled to 48;

iii) Each arrival is represented by a UNIX timestamp.


◦$\alpha_0$: average rate of arrival for the query

◦$\alpha_1 = 2.5\,\alpha_0$

◦$C$: given on the slide in terms of $\ln(0.38)$


Incremental Burst Detection

Based on the rate of change of percentage volume for a query

◦Compared with the change of absolute volume: noiseless

Designed for batched arrival of new queries: avoids recalculating the entire state sequence when a new batch arrives.


Incremental Burst Detection

For a query $Q$, let the $t^{\text{th}}$ batch contain $r_t$ instances of $Q$ out of a total of $d_t$ queries, and let $n$ be the total number of batches.

◦$q_0$: $p_0 = R/D$

◦$q_1$: $p_1 = s \cdot p_0$

◦Cost: incurred when the $t^{\text{th}}$ batch is assigned to state $q_i$
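
The idea of updating the state costs one batch at a time, instead of recomputing the whole sequence, can be sketched as follows. Several parts are assumptions rather than content from the slides: the per-batch cost is taken to be the binomial cost used in Kleinberg's batched model, $-\ln\left[\binom{d_t}{r_t} p_i^{r_t} (1-p_i)^{d_t - r_t}\right]$; a fixed penalty `trans_cost` is charged only when moving up into the burst state; and the class name `IncrementalBurstDetector`, the baseline fraction `p0`, and all numeric values are hypothetical. The state returned after each batch is the cheaper of the two running costs, not a fully backtracked sequence.

```python
import math


def batch_cost(p_i, r_t, d_t):
    """Assumed binomial cost: -ln[C(d_t, r_t) * p_i^r_t * (1 - p_i)^(d_t - r_t)]."""
    log_binom = (math.lgamma(d_t + 1) - math.lgamma(r_t + 1)
                 - math.lgamma(d_t - r_t + 1))
    log_lik = r_t * math.log(p_i) + (d_t - r_t) * math.log(1 - p_i)
    return -(log_binom + log_lik)


class IncrementalBurstDetector:
    """Keeps only the two running costs, so each new batch is O(1) work."""

    def __init__(self, p0, s=3.0, trans_cost=2.0):
        # p0: baseline fraction of the query; p1 = s * p0 for the burst state.
        self.p = (p0, min(s * p0, 0.9999))
        self.trans_cost = trans_cost   # hypothetical penalty for entering q1
        self.best = [0.0, math.inf]    # start in q0

    def update(self, r_t, d_t):
        """Consume one batch and return the currently cheaper state (0 or 1)."""
        prev0, prev1 = self.best
        cost0 = min(prev0, prev1) + batch_cost(self.p[0], r_t, d_t)
        cost1 = (min(prev1, prev0 + self.trans_cost)
                 + batch_cost(self.p[1], r_t, d_t))
        self.best = [cost0, cost1]
        return 0 if cost0 <= cost1 else 1


# Illustrative batches as (r_t, d_t): baseline, spike, baseline.
detector = IncrementalBurstDetector(p0=0.01)
batch_states = [detector.update(r, d)
                for r, d in [(10, 1000), (40, 1000), (10, 1000)]]
print(batch_states)  # [0, 1, 0]
```

Because only `self.best` is carried forward, a new batch never triggers a recomputation over earlier batches, which is the property the slide asks for.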


Burst Classification

Method basis: wavelet transforms

4 classes: i) Matterhorns; ii) Cuestas; iii) Dogtooths; iv) Hogbacks.


Sorting and Ranking

Concentration-based ranking

-Duration of Burst (D);

-Mass (Popularity) of Burst (M);

-Arrival Rate for Burst (A);

-Span Ratio (SR);

-Momentum of Burst (Mo): $Mo = M \cdot A$;

-Concentration of Burst (Xc): defined using the factor $(SR)^{0.1}$.


Sorting and Ranking

Distance Based Ranking


Performance Comparison


Implementation


Thank You!