25
© Copyright 2010 Hewlett-Packard Development Company, L.P. 1 Jianshu Weng and Bu-Sung Lee, Francis HP Labs Singapore EVENT DETECTION IN TWITTER

EVENT DETECTION IN TWITTER - Semantic Scholar · 2017. 8. 26. · –Practical case study –Singapore General Election 2011 ... •say Yes to Tin Pei Ling & get a free Kate Spade

  • Upload
    others

  • View
    0

  • Download
    0

Embed Size (px)

Citation preview

Page 1: EVENT DETECTION IN TWITTER - Semantic Scholar · 2017. 8. 26. · –Practical case study –Singapore General Election 2011 ... •say Yes to Tin Pei Ling & get a free Kate Spade

© Copyright 2010 Hewlett-Packard Development Company, L.P.1

Jianshu Weng and Bu-Sung Lee, Francis

HP Labs Singapore

EVENT DETECTION IN TWITTER

Page 2: EVENT DETECTION IN TWITTER - Semantic Scholar · 2017. 8. 26. · –Practical case study –Singapore General Election 2011 ... •say Yes to Tin Pei Ling & get a free Kate Spade

© Copyright 2010 Hewlett-Packard Development Company, L.P.2

AGENDA

– Motivation

– Proposed Method

• Wavelet analysis

• Event Detection with Clustering of Wavelet-based Signals (EDCoW)

– Evaluation

• Experimental study

• General Election case study

– Conclusion

Page 3: EVENT DETECTION IN TWITTER - Semantic Scholar · 2017. 8. 26. · –Practical case study –Singapore General Election 2011 ... •say Yes to Tin Pei Ling & get a free Kate Spade

© Copyright 2010 Hewlett-Packard Development Company, L.P.3

MOTIVATION

Benefits

Different perspective from the traditional

media

Disaster early warning

Challenges

Big & fast

Noisy

Page 4: EVENT DETECTION IN TWITTER - Semantic Scholar · 2017. 8. 26. · –Practical case study –Singapore General Election 2011 ... •say Yes to Tin Pei Ling & get a free Kate Spade

© Copyright 2010 Hewlett-Packard Development Company, L.P.6

MOTIVATION (CONT’D)

–We need a way to detect “events” out of

“babble”-flooded twitter content stream. •Fast

•Low storage requirement

•Can tell how “significant” the event is.

–Existing techniques: usually first-order signals, e.g.

TFIDF/DFIDF

•Redundant information

•Many features would seem to have similar waveforms when

we apply feature-pivot methods

Page 5: EVENT DETECTION IN TWITTER - Semantic Scholar · 2017. 8. 26. · –Practical case study –Singapore General Election 2011 ... •say Yes to Tin Pei Ling & get a free Kate Spade

© Copyright 2010 Hewlett-Packard Development Company, L.P.7

PROPOSED METHOD

–EDCoW: Event Detection with Clustering of

Wavelet-based Signals

–Steps:

•Construct 1st-order signals for individual features based on

their DFIDF scores, for a given interval

•Wavelet analysis on the 1st-order signals to construct 2nd-

order signals

•Filtering

•Clustering with modularity-based graph cut

•Measuring the event “significance”

Page 6: EVENT DETECTION IN TWITTER - Semantic Scholar · 2017. 8. 26. · –Practical case study –Singapore General Election 2011 ... •say Yes to Tin Pei Ling & get a free Kate Spade

© Copyright 2010 Hewlett-Packard Development Company, L.P.8

SIGNAL CONSTRUCTION

– First order: DFIDF

– is the number of tweets containing feature x at time

point i, and is the total number of tweets at time point i.

Page 7: EVENT DETECTION IN TWITTER - Semantic Scholar · 2017. 8. 26. · –Practical case study –Singapore General Election 2011 ... •say Yes to Tin Pei Ling & get a free Kate Spade

© Copyright 2010 Hewlett-Packard Development Company, L.P.9

SIGNAL CONSTRUCTION (CONT’D)

–Second stage:•applying wavelet analysis to compress the DFIDF signals to keep only the changes within a number of sample points in the first stage. −H is the normalized wavelet entropy in a sliding window

−Each 2nd-stage sampling point captures how much the change in the wavelet entropy is

•Why 2nd-stage signal?−Make the change in usage pattern more salient, easier to measure

the similarity between features

−Compression: only non-trivial changes are stored

Page 8: EVENT DETECTION IN TWITTER - Semantic Scholar · 2017. 8. 26. · –Practical case study –Singapore General Election 2011 ... •say Yes to Tin Pei Ling & get a free Kate Spade

© Copyright 2010 Hewlett-Packard Development Company, L.P.10

SIGNAL CONSTRUCTION (CONT’D)

1st stage 4210 3 6 7 85 ……... Tct=

210 3 ……... T’c

9

D1 D2

D2*

2nd

stage,

Δ=3t'=

Page 9: EVENT DETECTION IN TWITTER - Semantic Scholar · 2017. 8. 26. · –Practical case study –Singapore General Election 2011 ... •say Yes to Tin Pei Ling & get a free Kate Spade

© Copyright 2010 Hewlett-Packard Development Company, L.P.11

SIGNAL CONSTRUCTION (CONT’D)

1st-order 2nd-order

Page 10: EVENT DETECTION IN TWITTER - Semantic Scholar · 2017. 8. 26. · –Practical case study –Singapore General Election 2011 ... •say Yes to Tin Pei Ling & get a free Kate Spade

© Copyright 2010 Hewlett-Packard Development Company, L.P.12

FILTERING OF TRIVIAL FEATURES

–Measure the auto-correlation of the signal

corresponding to a feature

–Filtering with median absolute deviation (MAD)

– 5% features remain

Page 11: EVENT DETECTION IN TWITTER - Semantic Scholar · 2017. 8. 26. · –Practical case study –Singapore General Election 2011 ... •say Yes to Tin Pei Ling & get a free Kate Spade

© Copyright 2010 Hewlett-Packard Development Company, L.P.13

MEASURE THE SIMILARITY BETWEEN FEATURES

–Similarity measured as the cross-correlation

between signals

– Ignore the similarity between features if it is too

weak

•Using Median Absolute Deviation

– complexity, but still tractable with small

portion of remaining features

Page 12: EVENT DETECTION IN TWITTER - Semantic Scholar · 2017. 8. 26. · –Practical case study –Singapore General Election 2011 ... •say Yes to Tin Pei Ling & get a free Kate Spade

© Copyright 2010 Hewlett-Packard Development Company, L.P.14

CLUSTERING FEATURES TO DETECT EVENTS

–Organize the cross-correlation between features as

a matrix• Adjacency matrix of a graph .

0 0.8 0.5

0.8 0 0.6

0.5 0.6 0

Page 13: EVENT DETECTION IN TWITTER - Semantic Scholar · 2017. 8. 26. · –Practical case study –Singapore General Election 2011 ... •say Yes to Tin Pei Ling & get a free Kate Spade

© Copyright 2010 Hewlett-Packard Development Company, L.P.15

CLUSTERING FEATURES TO DETECT EVENTS (CONT’D)– Event detection can be formulated as a graph partitioning

problem.

– Modularity can be applied to measure the quality of the

partitioning

• the sum of weights of all the edges that fall within sub-graphs (after

partitioning) subtracted by the expected edge weight sum if edges were placed

at random

• The goal is to partition the graph so that value is maximized.

Page 14: EVENT DETECTION IN TWITTER - Semantic Scholar · 2017. 8. 26. · –Practical case study –Singapore General Election 2011 ... •say Yes to Tin Pei Ling & get a free Kate Spade

© Copyright 2010 Hewlett-Packard Development Company, L.P.16

MEASURE THE SIGNIFICANCE OF THE EVENT

–The first part sums up all the cross correlation

values between signals (i.e. features) associated

with an event

–The second part discounts the significance if

the event is associated with too many features• it is not reasonable for an event to be associated with too many words

Page 15: EVENT DETECTION IN TWITTER - Semantic Scholar · 2017. 8. 26. · –Practical case study –Singapore General Election 2011 ... •say Yes to Tin Pei Ling & get a free Kate Spade

© Copyright 2010 Hewlett-Packard Development Company, L.P.17

EVALUATION

–Experimental

–Data•Obtain the top 1000 Singapore-based Twitter users with

the most followers from http://twitaholic.com/.

•For each user in U, include her Singapore-based followers

and friends within 2 hops.

•For each user in , collect the tweets published in June

2010.

–Remove stop words, apply stemming

Page 16: EVENT DETECTION IN TWITTER - Semantic Scholar · 2017. 8. 26. · –Practical case study –Singapore General Election 2011 ... •say Yes to Tin Pei Ling & get a free Kate Spade

© Copyright 2010 Hewlett-Packard Development Company, L.P.18

Page 17: EVENT DETECTION IN TWITTER - Semantic Scholar · 2017. 8. 26. · –Practical case study –Singapore General Election 2011 ... •say Yes to Tin Pei Ling & get a free Kate Spade

© Copyright 2010 Hewlett-Packard Development Company, L.P.19

EVALUATION (CONT’D)

–Practical case study

–Singapore General Election 2011

•7 parties

•Most Singaporeans vote for the first time

•Social media is allowed as a campaign platform for the

first time

Page 18: EVENT DETECTION IN TWITTER - Semantic Scholar · 2017. 8. 26. · –Practical case study –Singapore General Election 2011 ... •say Yes to Tin Pei Ling & get a free Kate Spade

© Copyright 2010 Hewlett-Packard Development Company, L.P.20

Page 19: EVENT DETECTION IN TWITTER - Semantic Scholar · 2017. 8. 26. · –Practical case study –Singapore General Election 2011 ... •say Yes to Tin Pei Ling & get a free Kate Spade

© Copyright 2010 Hewlett-Packard Development Company, L.P.21

Page 20: EVENT DETECTION IN TWITTER - Semantic Scholar · 2017. 8. 26. · –Practical case study –Singapore General Election 2011 ... •say Yes to Tin Pei Ling & get a free Kate Spade

© Copyright 2010 Hewlett-Packard Development Company, L.P.22

Page 21: EVENT DETECTION IN TWITTER - Semantic Scholar · 2017. 8. 26. · –Practical case study –Singapore General Election 2011 ... •say Yes to Tin Pei Ling & get a free Kate Spade

© Copyright 2010 Hewlett-Packard Development Company, L.P.23

Page 22: EVENT DETECTION IN TWITTER - Semantic Scholar · 2017. 8. 26. · –Practical case study –Singapore General Election 2011 ... •say Yes to Tin Pei Ling & get a free Kate Spade

© Copyright 2010 Hewlett-Packard Development Company, L.P.24

Page 23: EVENT DETECTION IN TWITTER - Semantic Scholar · 2017. 8. 26. · –Practical case study –Singapore General Election 2011 ... •say Yes to Tin Pei Ling & get a free Kate Spade

© Copyright 2010 Hewlett-Packard Development Company, L.P.25

SOME TWEETS

•PAP is changing! @PAPSingapore: Join PM Lee's live chat

at the PAP FB page TONIGHT at 8pm.

http://on.fb.me/kg4qiS #sgelections #sgelection

•say Yes to Tin Pei Ling & get a free Kate Spade wallet

•Potong Pasir comes back to big boss PAP. Let them upgrade

already, 5 years later go back to SPP. Smart tactic

residents.

Page 24: EVENT DETECTION IN TWITTER - Semantic Scholar · 2017. 8. 26. · –Practical case study –Singapore General Election 2011 ... •say Yes to Tin Pei Ling & get a free Kate Spade

© Copyright 2010 Hewlett-Packard Development Company, L.P.26

CONCLUSION

–Text analysis on Twitter content reveals discussion

about “real-life” events

–Spaces for improvement:

•Combine discussion-based similarity (signal cross

correlation) with other factors

−N-Gram

−semantic similarity

•Further exploit the social relationship among users

•Larger scale study

Page 25: EVENT DETECTION IN TWITTER - Semantic Scholar · 2017. 8. 26. · –Practical case study –Singapore General Election 2011 ... •say Yes to Tin Pei Ling & get a free Kate Spade

© Copyright 2010 Hewlett-Packard Development Company, L.P.27

Q&A