KDD 2012, Beijing, China

Community Discovery and Profiling with Social Messages

Wenjun Zhou • Hongxia Jin • Yan Liu

Background

Statistics show that email is ubiquitous in the workplace

With the availability of additional online social media, information overload has become problematic

• Use emails daily or several times each week: 97%
• Emails being “essential” for their everyday work: 71%

--- Institute for the Future (Bowes 2000)

US workers average 49 minutes a day managing email, and 25% spend more than one hour per day on that task (Gartner 2001).

Motivation

Help users boost productivity
• Summarize their work areas automatically
• Keep track of past and on-going collaborations
• Prioritize work-related tasks

Problem Formulation

Given: a user’s emails
Find: the user’s work profile -- a set of work areas

Constraints:
• Unsupervised (or semi-supervised later on)
• Effectiveness in providing insights
• Computation efficiency

Teaching: class, homework, score
People: Alice, Bob, Charlie

Research: email, mining, data, paper
People: Hongxia, Yan

Advising: meeting, report, draft
People: Dane, Ellen, Flint

Grants: project, proposal, grant, due
People: Sarah, Tim
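One way to picture the target output of the problem formulation is as a small data structure: a work profile is a list of work areas, each with its keywords and people. The sketch below only encodes the example from the slide. The `WorkArea` class, its field names, and the `areas_for` helper are hypothetical names chosen for illustration, not part of the presented model.

```python
from dataclasses import dataclass, field


@dataclass
class WorkArea:
    """One work area: what you collaborate on, and with whom (hypothetical type)."""
    label: str
    keywords: list[str] = field(default_factory=list)
    people: list[str] = field(default_factory=list)


# The example work profile shown on the slide.
profile = [
    WorkArea("Teaching", ["class", "homework", "score"], ["Alice", "Bob", "Charlie"]),
    WorkArea("Research", ["email", "mining", "data", "paper"], ["Hongxia", "Yan"]),
    WorkArea("Advising", ["meeting", "report", "draft"], ["Dane", "Ellen", "Flint"]),
    WorkArea("Grants", ["project", "proposal", "grant", "due"], ["Sarah", "Tim"]),
]


def areas_for(person: str, work_profile: list[WorkArea]) -> list[str]:
    """Return the labels of all work areas that include the given person."""
    return [area.label for area in work_profile if person in area.people]
```

A lookup such as `areas_for("Yan", profile)` then answers which work areas a given collaborator belongs to.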

Traditional Community Finding

Community (i.e. Work Area)

Two aspects:
• people (whom you collaborate with)
• task (what you collaborate on)

The Email Data

Data Preprocessing

People (email accounts)
• Disregarded roles, only considered occurrence

Content (subject + body)
• Removed punctuation and stop words
• Words are stemmed
• Documents converted into bag of words

Unused: replicate messages; time-stamps; attachments
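The preprocessing steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the stop-word list is a tiny placeholder, `crude_stem` is a simplistic stand-in for a real stemmer (such as a Porter stemmer), and the function names are hypothetical.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"a", "an", "and", "are", "for", "in", "is", "of", "on", "the", "to"}


def crude_stem(word: str) -> str:
    """Strip a few common English suffixes; a stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(subject: str, body: str) -> Counter:
    """Turn subject + body into a bag of stemmed, non-stop words."""
    text = f"{subject} {body}".lower()
    tokens = re.findall(r"[a-z]+", text)  # keeps letters only, so punctuation is dropped
    stems = [crude_stem(t) for t in tokens if t not in STOP_WORDS]
    return Counter(stems)


def participants(sender: str, recipients: list[str]) -> set[str]:
    """Collect the accounts on a message, ignoring roles (From/To/Cc)."""
    return {sender.lower(), *(r.lower() for r in recipients)}


bag = preprocess("Meeting report", "The reports are due on Monday.")
people = participants("alice@example.com", ["Bob@example.com", "alice@example.com"])
```

Returning a set from `participants` reflects the "only considered occurrence" choice: an account either appears on a message or it does not, regardless of its role or how many header fields list it.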

Topic Models: A Bayesian Approach

Assume:
• a topic is a unique distribution of words
• a document has a mixture of topics
• documents are generated by sampling from topics and words
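The three assumptions above describe a generative process, which the following sketch simulates. The topic names and words are borrowed from the work-area example earlier in the deck, but every probability here is invented purely for illustration, and `generate_document` is a hypothetical helper rather than anything from the talk.

```python
import random

# Each topic is a distribution over words (probabilities are made up).
TOPICS = {
    "teaching": {"class": 0.5, "homework": 0.3, "score": 0.2},
    "research": {"email": 0.25, "mining": 0.25, "data": 0.25, "paper": 0.25},
}


def generate_document(topics, topic_mixture, length, rng):
    """Sample a document: pick a topic per word, then a word from that topic."""
    topic_names = list(topic_mixture)
    topic_weights = [topic_mixture[name] for name in topic_names]
    words = []
    for _ in range(length):
        # Step 1: draw a topic from the document's topic mixture.
        topic = rng.choices(topic_names, weights=topic_weights, k=1)[0]
        # Step 2: draw a word from that topic's word distribution.
        vocab = list(topics[topic])
        probs = [topics[topic][w] for w in vocab]
        words.append(rng.choices(vocab, weights=probs, k=1)[0])
    return words


rng = random.Random(0)
doc = generate_document(TOPICS, {"teaching": 0.7, "research": 0.3}, length=10, rng=rng)
```

Inference in a topic model runs this process in reverse: given only the observed words, it estimates the topics and the per-document mixtures that most plausibly produced them.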

Latent Dirichlet Allocation (Blei et al., 2003)
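For reference, the generative process of LDA is commonly written as below. The notation is an assumption, since the slide's own symbols are not available here: this is the usual smoothed form with K topics, Dirichlet hyperparameters \alpha and \beta, per-document topic mixtures \theta_d, per-topic word distributions \phi_k, topic assignments z_{d,n}, and observed words w_{d,n}.

```latex
\begin{align*}
\phi_k   &\sim \mathrm{Dirichlet}(\beta)            && k = 1, \dots, K \\
\theta_d &\sim \mathrm{Dirichlet}(\alpha)           && d = 1, \dots, D \\
z_{d,n}  &\sim \mathrm{Multinomial}(\theta_d)       && n = 1, \dots, N_d \\
w_{d,n}  &\sim \mathrm{Multinomial}(\phi_{z_{d,n}}) && n = 1, \dots, N_d
\end{align*}

p(w, z, \theta, \phi \mid \alpha, \beta)
  = \prod_{k=1}^{K} p(\phi_k \mid \beta)
    \prod_{d=1}^{D} p(\theta_d \mid \alpha)
    \prod_{n=1}^{N_d} p(z_{d,n} \mid \theta_d)\, p(w_{d,n} \mid \phi_{z_{d,n}})
```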

COllaborator COMmunity Profiling Model (COCOMP)

Enron Emails

Social Messages

Summary

COCOMP: a latent community model

Each social media document corresponds to a sharing activity within a community.

A community is represented with a list of top participants and associated list of topics.

Experiments on email and social media datasets demonstrate interesting results.

Future work
• Different sources of data with the same user
• Evolution over time with incremental learning
• Scalable inference with user feedback

Thank You!

Questions?
