EVENT IDENTIFICATION IN SOCIAL MEDIA

Hila Becker, Luis Gravano (Columbia University)
Mor Naaman (Rutgers University)

Page 2: EVENT IDENTIFICATION IN SOCIAL MEDIA

Social Media Sites Host Many “Event” Documents

  • Photo-sharing: Flickr
  • Video-sharing: YouTube
  • Social networking: Facebook

“Event” = something that occurs at a certain time in a certain place [Yang et al. ’99]

  • Popular, widely known events: Presidential Inauguration, Thanksgiving Day Parade
  • Smaller events, without traditional news coverage: local food drive, street fair

Example: social media documents for the “All Points West” festival, Liberty State Park, New Jersey, 8/8/08

Page 3: EVENT IDENTIFICATION IN SOCIAL MEDIA

Identifying Events and Associated Social Media Documents

Applications:

  • Event search and browsing
  • Local search
  • …

General approach: group similar documents via clustering. Each cluster corresponds to one event and its associated social media documents.

Page 4: EVENT IDENTIFICATION IN SOCIAL MEDIA

Event Identification: Challenges

  • Uneven data quality
    - Missing, short, uninformative text
    - … but revealing structured context available: tags, date/time, geo-coordinates
  • Scalability
  • Dynamic data stream of event information
  • Unknown number of events
    - Necessary for many clustering algorithms
    - Difficult to estimate

Page 5: EVENT IDENTIFICATION IN SOCIAL MEDIA

Clustering Social Media Documents

  • Social media document representation
  • Social media document similarity
  • Social media document clustering
    - Clustering task: definition
    - Ensemble algorithm: combining multiple clustering results
  • Preliminary evaluation

Page 6: EVENT IDENTIFICATION IN SOCIAL MEDIA

Social Media Document Representation

Features extracted from each social media document: Title, Description, Tags, Date/Time, Location, All-Text

Page 7: EVENT IDENTIFICATION IN SOCIAL MEDIA

Social Media Document Similarity

Document features: Title, Description, Tags, All-Text, Date/Time-Keywords, Date/Time-Proximity, Location-Keywords, Location-Proximity

Similarity metric by feature type:

  • Text: tf-idf weights, cosine similarity
  • Time: proximity in minutes
  • Location: geo-coordinate proximity
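
The three similarity types above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the smoothed idf formula, the linear decay for time, the one-year `window`, the haversine distance, and the `max_km` cutoff are all assumptions chosen only so that every score falls in [0,1].

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Build tf-idf weighted term vectors for a list of token lists."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Smoothed idf (an assumption): keeps every weight strictly positive.
        vectors.append({t: c * (math.log((1 + n) / (1 + df[t])) + 1)
                        for t, c in tf.items()})
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def time_similarity(minutes_a, minutes_b, window=60 * 24 * 365):
    """1.0 for identical timestamps, falling linearly to 0.0 at `window` minutes (assumed)."""
    return max(0.0, 1.0 - abs(minutes_a - minutes_b) / window)

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))

def location_similarity(a, b, max_km=50.0):
    """1.0 at the same coordinates, falling linearly to 0.0 at `max_km` (assumed cutoff)."""
    return max(0.0, 1.0 - haversine_km(*a, *b) / max_km)
```

Each function returns a value in [0,1], so the scores from different features can later be compared or combined on a common scale.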

Page 8: EVENT IDENTIFICATION IN SOCIAL MEDIA

Social Media Document Clustering Framework

Pipeline: social media documents → document feature representation → event clusters

Page 9: EVENT IDENTIFICATION IN SOCIAL MEDIA

Clustering: Ensemble Algorithm

  • Clusterers Ctitle, Ctags, Ctime with weights Wtitle, Wtags, Wtime (learned in a training step)
  • Consensus function f(C,W): combine ensemble similarities
  • Output: ensemble clustering solution
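
One way to read the consensus function f(C,W) is as a weighted vote over the individual clusterers. The sketch below assumes each clusterer makes a binary prediction (1 if two documents share a cluster, 0 otherwise) and that f is the weighted average of those predictions. The function name, the document ids, and the weight values are hypothetical and serve only as illustration.

```python
def consensus_similarity(doc_a, doc_b, clusterings, weights):
    """Weighted fraction of clusterers that put doc_a and doc_b in the same cluster."""
    total = sum(weights.values())
    agree = sum(w for name, w in weights.items()
                if clusterings[name][doc_a] == clusterings[name][doc_b])
    return agree / total

# Hypothetical cluster assignments from three clusterers (document id -> cluster id).
EXAMPLE_CLUSTERINGS = {
    "title": {"d1": 0, "d2": 0, "d3": 1},
    "tags":  {"d1": 0, "d2": 1, "d3": 1},
    "time":  {"d1": 0, "d2": 0, "d3": 0},
}
# Hypothetical weights; in the talk these are learned in a training step.
EXAMPLE_WEIGHTS = {"title": 0.2, "tags": 0.5, "time": 0.3}

score = consensus_similarity("d1", "d2", EXAMPLE_CLUSTERINGS, EXAMPLE_WEIGHTS)
```

Dividing by the total weight keeps the score in [0,1] even when the weights do not sum to one.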

Page 10: EVENT IDENTIFICATION IN SOCIAL MEDIA

Clustering: Measuring Quality

  • Homogeneous clusters
  • Complete clusters
  • Metric: Normalized Mutual Information (NMI), the shared information between the clustering solution and the “ground truth”
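
NMI can be computed directly from two label lists. Several normalizations are in use, and the slides do not say which one was chosen, so the sketch below assumes the common variant that divides the mutual information by the arithmetic mean of the two entropies. The function name `nmi` is illustrative.

```python
import math
from collections import Counter

def nmi(predicted, truth):
    """Normalized mutual information: 2 * I(P;T) / (H(P) + H(T))."""
    n = len(predicted)
    count_p = Counter(predicted)
    count_t = Counter(truth)
    joint = Counter(zip(predicted, truth))
    # Mutual information between predicted clusters and ground-truth events.
    mi = sum((c / n) * math.log((c * n) / (count_p[a] * count_t[b]))
             for (a, b), c in joint.items())
    h_p = -sum((c / n) * math.log(c / n) for c in count_p.values())
    h_t = -sum((c / n) * math.log(c / n) for c in count_t.values())
    if h_p + h_t == 0:
        return 1.0  # both labelings are one single cluster, hence identical
    return 2 * mi / (h_p + h_t)
```

The score does not depend on how clusters are numbered: relabeling a perfect clustering still gives 1, while a clustering that is statistically independent of the ground truth gives 0.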

Page 11: EVENT IDENTIFICATION IN SOCIAL MEDIA

Experimental Setup

  • Data: >270K Flickr photos
    - Event labels from Yahoo!’s “upcoming” event database
    - Split into 3 parts for training/validation/testing
  • Clusterers: single-pass algorithm with centroid similarity
  • Weighting scheme: Normalized Mutual Information (NMI) scores on validation set
  • Consensus function: weighted average of clusterers’ binary predictions
  • Final prediction step: single-pass clustering algorithm
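
A single-pass algorithm with centroid similarity can be sketched as follows. Each document is compared once against the existing cluster centroids and either joins the most similar cluster or starts a new one. The sparse-dict vector format, the cosine measure, and the `threshold` parameter are assumptions for illustration; the slides do not give these details.

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def single_pass_cluster(vectors, threshold, similarity=cosine):
    """Return one cluster label per vector, processing the vectors in order."""
    centroids, sizes, labels = [], [], []
    for vec in vectors:
        best, best_sim = None, threshold
        for idx, centroid in enumerate(centroids):
            sim = similarity(vec, centroid)
            if sim >= best_sim:
                best, best_sim = idx, sim
        if best is None:
            # No centroid is similar enough: start a new cluster.
            centroids.append(dict(vec))
            sizes.append(1)
            labels.append(len(centroids) - 1)
        else:
            # Update the chosen centroid as a running mean of its members.
            n, centroid = sizes[best], centroids[best]
            for term in set(centroid) | set(vec):
                centroid[term] = (centroid.get(term, 0.0) * n + vec.get(term, 0.0)) / (n + 1)
            sizes[best] = n + 1
            labels.append(best)
    return labels
```

Because no document is revisited, the number of clusters does not have to be fixed in advance, and the method can consume a stream of incoming documents.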


Page 12: EVENT IDENTIFICATION IN SOCIAL MEDIA

Preliminary Evaluation Results

  • Individual clusterer performance
    - Highest NMI: Tags, All-Text
    - Lowest NMI: Description, Title
  • Ensemble performance, compared against all individual clusterers
    - Highest overall performance in terms of NMI
    - More homogeneous clusters: each event is spread over fewer clusters

Details in paper

Page 13: EVENT IDENTIFICATION IN SOCIAL MEDIA

Future Work: Alternative Choices

  • Document similarity metric
    - Ensemble approach
        Weight assignment
        Choice of clusterers
    - Train a classifier to predict document similarity
        Features correspond to similarity scores (all-text, title, tags, time, location, etc.), numeric values in [0,1]
        State-of-the-art classifiers: SVM, Logistic Regression, …
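
The classifier-based alternative listed above can be illustrated with a small logistic regression trained on per-feature similarity scores. This is only a sketch of the idea: the plain gradient-descent trainer, the learning rate, and the four training pairs are made up for illustration and are not taken from the talk.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(features, labels, lr=0.5, epochs=2000):
    """Fit weights and a bias by batch gradient descent on the log loss."""
    dim, n = len(features[0]), len(features)
    weights, bias = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(features, labels):
            err = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - y
            for j in range(dim):
                grad_w[j] += err * x[j]
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

def same_event_probability(x, weights, bias):
    """Predicted probability that a document pair belongs to the same event."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

# Made-up document pairs: [all_text_sim, tags_sim, time_sim], label 1 = same event.
PAIRS = [[0.9, 0.8, 0.95], [0.8, 0.9, 0.9], [0.1, 0.2, 0.3], [0.2, 0.1, 0.1]]
LABELS = [1, 1, 0, 0]
WEIGHTS, BIAS = train_logistic(PAIRS, LABELS)
```

The predicted probability could then stand in for the ensemble's combined similarity score when documents are clustered.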

Page 14: EVENT IDENTIFICATION IN SOCIAL MEDIA

Future Work: Alternative Choices

  • Final clustering step
    - Apply graph partitioning algorithms (requires estimating the number of clusters)
  • Evaluation metrics: beyond NMI
  • Datasets
    - Flickr
    - LastFM, YouTube
  • Exploit social network connections

Page 15: EVENT IDENTIFICATION IN SOCIAL MEDIA

Conclusions

  • Identified events and their corresponding social media documents
  • Proposed a clustering solution
    - Leveraged different representations of social media documents
    - Employed various social media similarity metrics
    - Developed a weighted ensemble clustering approach
  • Reported preliminary results of our event identification approach on a large-scale dataset of Flickr photographs