Transcript
Page 1: Dynamic Multi-Faceted Topic Discovery in Twitter

Dynamic Multi-Faceted Topic Discovery in Twitter

Jan Vosecky

Di Jiang

Kenneth Wai-Ting Leung

Wilfred Ng

Page 2: Dynamic Multi-Faceted Topic Discovery in Twitter

Twitter

Page 3: Dynamic Multi-Faceted Topic Discovery in Twitter

Representation

• Vector space model
  – Term vector sparseness issue

• Topic models
  – Latent topic vector better than VSM?
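
The sparseness issue can be illustrated with a small sketch. The messages below are invented for illustration, and the helper names `term_vectors` and `sparsity` are hypothetical; the point is only that short texts sharing few words produce term vectors that are mostly zeros.

```python
from collections import Counter

def term_vectors(docs):
    """Build raw term-count vectors over the shared vocabulary."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vectors.append([counts.get(term, 0) for term in vocab])
    return vocab, vectors

def sparsity(vectors):
    """Fraction of zero entries across all term vectors."""
    total = sum(len(v) for v in vectors)
    zeros = sum(1 for v in vectors for x in v if x == 0)
    return zeros / total

# Hypothetical short messages, invented for illustration only.
tweets = [
    "protests continue in libya",
    "new phone released today",
    "football final tonight",
]
vocab, vectors = term_vectors(tweets)
print(len(vocab), round(sparsity(vectors), 2))
```

With no shared words, every vector has non-zero entries only for its own few terms, so any similarity measure over these vectors sees the messages as unrelated.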

Page 4: Dynamic Multi-Faceted Topic Discovery in Twitter

Topic Models

A latent topic in LDA

“Arab revolutions”

Libya 0.00040
Force 0.00020
Human 0.00010
Abuse 0.00010
Protect 0.00009
Secure 0.00008
War 0.00005
Execute 0.00004

Page 5: Dynamic Multi-Faceted Topic Discovery in Twitter

A topic in Twitter?

• Not just words

• People talk about entities

Persons, Organizations, Locations, Time, …

Page 6: Dynamic Multi-Faceted Topic Discovery in Twitter

Multi-faceted Topic Model

• Each topic consists of n facets
  – Elements of each facet ~ multinomial distribution

• Each document d is a distribution over topics
  – General terms, named entities and timestamp drawn from the respective facet of topic z
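
The generative story on this slide can be sketched roughly as follows. This is a minimal illustration under assumptions, not the authors' exact model: the Dirichlet prior on the document's topic distribution, the choice to draw a topic per element, the facet names, and all probabilities are invented for the example.

```python
import random

def sample_dirichlet(alpha, k, rng):
    """Draw a k-dimensional symmetric Dirichlet sample via normalized gammas."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(draws)
    return [d / total for d in draws]

def draw(dist, rng):
    """Draw one element from a {element: probability} multinomial."""
    items = list(dist)
    return rng.choices(items, weights=[dist[i] for i in items], k=1)[0]

def generate_document(topics, facet_sizes, alpha, rng):
    """Per element: pick a topic z from theta, then draw from that facet of z."""
    theta = sample_dirichlet(alpha, len(topics), rng)  # assumed Dirichlet prior
    doc = {}
    for facet, n_elements in facet_sizes.items():
        doc[facet] = []
        for _ in range(n_elements):
            z = rng.choices(range(len(topics)), weights=theta, k=1)[0]
            doc[facet].append(draw(topics[z][facet], rng))
    return theta, doc

# Hypothetical topics; facet names and probabilities are made up.
topics = [
    {"terms": {"protest": 0.6, "force": 0.4},
     "locations": {"Libya": 0.7, "Egypt": 0.3},
     "time": {"2011-02": 0.5, "2011-03": 0.5}},
    {"terms": {"match": 0.5, "goal": 0.5},
     "locations": {"London": 1.0},
     "time": {"2011-05": 1.0}},
]
rng = random.Random(0)
theta, doc = generate_document(
    topics, {"terms": 3, "locations": 1, "time": 1}, 0.5, rng
)
print(doc)
```

Each facet keeps its own multinomial per topic, so a single topic can tie together its typical words, places and time period.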

Page 7: Dynamic Multi-Faceted Topic Discovery in Twitter

Multi-faceted Topic Model

Multi-faceted latent topic “Arab revolutions”

Facets: General terms, Persons, Locations, Organizations, Time

Page 8: Dynamic Multi-Faceted Topic Discovery in Twitter

Parameter Inference

• Scalability
  – Gibbs sampling and variational inference process data in a batch

• Online inference
  – Stochastic variational inference to process streaming data
  – Model continuously updated
  – Constant time to process a new doc

[Figure: doc, doc, doc, doc → inference → doc, doc, doc, doc → inference → …]
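
The idea of updating the model continuously from a stream can be sketched with the generic update used in stochastic variational inference: the global parameter is blended with an estimate from each new mini-batch, using a decaying step size. This is an illustration of the general technique, not the authors' exact update; `step_size`, `online_update`, the defaults for `tau0` and `kappa`, and the numbers in `stream` are all assumptions, and the computation of the per-batch estimate itself is omitted.

```python
def step_size(t, tau0=1.0, kappa=0.7):
    """Decaying learning rate rho_t = (tau0 + t) ** -kappa, kappa in (0.5, 1]."""
    return (tau0 + t) ** (-kappa)

def online_update(global_param, batch_estimate, t, tau0=1.0, kappa=0.7):
    """Blend the current global parameter with one mini-batch estimate."""
    rho = step_size(t, tau0, kappa)
    return [(1.0 - rho) * g + rho * b
            for g, b in zip(global_param, batch_estimate)]

# Hypothetical stream of per-batch estimates for a 3-dimensional parameter.
stream = [[4.0, 1.0, 1.0], [3.0, 2.0, 1.0], [5.0, 0.5, 0.5]]
param = [1.0, 1.0, 1.0]
for t, estimate in enumerate(stream):
    param = online_update(param, estimate, t)
print(param)
```

Each update touches only the current mini-batch, so the cost per new document does not grow with the amount of data already seen.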

Page 9: Dynamic Multi-Faceted Topic Discovery in Twitter

Perplexity comparison: Online inference vs. Gibbs sampling

[Figure panels: K = 50 and K = 200]
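
For reference, perplexity is commonly computed as the exponential of the negative mean per-token log-likelihood on held-out data, with lower values indicating a better fit. The sketch below shows that standard definition only; how the per-token probabilities are obtained from a fitted model is not shown here, and the function name is hypothetical.

```python
import math

def perplexity(token_log_probs):
    """exp of the negative mean per-token log-likelihood (natural log)."""
    if not token_log_probs:
        raise ValueError("need at least one token")
    return math.exp(-sum(token_log_probs) / len(token_log_probs))

# Sanity check: a model that is uniform over 4 words has perplexity 4.
uniform = [math.log(0.25)] * 10
print(perplexity(uniform))
```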

Page 10: Dynamic Multi-Faceted Topic Discovery in Twitter

Tweet Clustering

(a) Manually-labeled dataset

(b) Hashtag-labeled dataset

Clustering methods (both panels): DBSCAN, K-means, Direct

Vector space model (TF-IDF)
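
The slide does not define the "Direct" method. One plausible reading, shown here purely as an assumption, is that each tweet is assigned to its most probable topic and tweets sharing a topic form a cluster. The function name and the topic distributions below are hypothetical.

```python
from collections import defaultdict

def direct_clusters(doc_topic):
    """Group documents by their highest-probability topic (assumed reading)."""
    clusters = defaultdict(list)
    for doc_id, theta in doc_topic.items():
        best = max(range(len(theta)), key=lambda k: theta[k])
        clusters[best].append(doc_id)
    return dict(clusters)

# Hypothetical per-tweet topic distributions.
doc_topic = {
    "t1": [0.8, 0.1, 0.1],
    "t2": [0.2, 0.7, 0.1],
    "t3": [0.6, 0.3, 0.1],
}
print(direct_clusters(doc_topic))
```

DBSCAN and K-means, by contrast, would operate on the document vectors themselves, whether TF-IDF term vectors or latent topic vectors.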

Page 11: Dynamic Multi-Faceted Topic Discovery in Twitter

Summary

• Model multi-faceted topics in microblogs
  – Entity-oriented and dynamic

• Online inference method

• Beneficial for downstream applications

Page 12: Dynamic Multi-Faceted Topic Discovery in Twitter

Thank You!

Jan Vosecky

Di Jiang

Kenneth Wai-Ting Leung

Wilfred Ng

