KDD 2012, Beijing, China Community Discovery and Profiling with Social Messages Wenjun Zhou Hongxia Jin Yan Liu

KDD 2012, Beijing, China

Community Discovery and Profiling

with Social Messages

Wenjun Zhou • Hongxia Jin • Yan Liu

Background

Statistics show that emails have been ubiquitous at the workplace

With the availability of additional online social media, information overload has become problematic

• Use emails daily or several times each week: 97%
• Emails being “essential” for their everyday work: 71%

--- Institute for the Future (Bowes 2000)

US workers average 49 minutes a day managing email, and 25% spend more than one hour per day on that task (Gartner 2001).

Motivation

Help users boost productivity
• Summarize their work areas automatically
• Keep track of past and on-going collaborations
• Prioritize work-related tasks

Problem Formulation

Given: a user’s emails

Find: the user’s work profile (a set of work areas)

Constraints:
• Unsupervised (or semi-supervised later on)
• Effectiveness in providing insights
• Computational efficiency

Example work areas (label: top words; people):
• Teaching: class, homework, score; Alice, Bob, Charlie
• Research: email, mining, data, paper; Hongxia, Yan
• Advising: meeting, report, draft; Dane, Ellen, Flint
• Grants: project, proposal, grant, due; Sarah, Tim
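
As an illustration of the target output, a work profile can be held as a list of work areas, each pairing top words with top people. The sketch below is only one possible representation: the `WorkArea` class and the `areas_for` helper are hypothetical names, and the human-readable labels come from the slide's example rather than from an unsupervised model, which would not produce them.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class WorkArea:
    """One work area: a label plus its top words and top collaborators."""
    label: str  # illustrative only; an unsupervised model yields no labels
    keywords: List[str] = field(default_factory=list)
    collaborators: List[str] = field(default_factory=list)


# The example profile from the slide, encoded as data.
work_profile = [
    WorkArea("Teaching", ["class", "homework", "score"], ["Alice", "Bob", "Charlie"]),
    WorkArea("Research", ["email", "mining", "data", "paper"], ["Hongxia", "Yan"]),
    WorkArea("Advising", ["meeting", "report", "draft"], ["Dane", "Ellen", "Flint"]),
    WorkArea("Grants", ["project", "proposal", "grant", "due"], ["Sarah", "Tim"]),
]


def areas_for(person: str, profile: List[WorkArea]) -> List[str]:
    """Return the labels of all work areas in which `person` is a collaborator."""
    return [area.label for area in profile if person in area.collaborators]
```

For instance, `areas_for("Yan", work_profile)` looks up which work areas involve a given collaborator.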

Traditional Community Finding

Community (i.e. Work Area)

Two aspects:
• people (whom you collaborate with)
• task (what you collaborate on)

The Email Data

Data Preprocessing

• People (email accounts): disregarded roles, only considered occurrence
• Content (subject + body): removed punctuation and stop words; words are stemmed; documents converted into bags of words
• Unused: replicate messages; time-stamps; attachments
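
The preprocessing steps above can be sketched roughly as follows. This is a minimal illustration and not the authors' pipeline: the stop-word list is a tiny hypothetical sample, `naive_stem` is a crude stand-in for a real stemmer such as Porter's, and the function names are invented for this example.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real pipeline would use a fuller one.
STOP_WORDS = {"a", "an", "and", "are", "is", "of", "on", "the", "to", "for", "in"}


def naive_stem(word):
    """Crude suffix stripping; only a stand-in for a real stemming algorithm."""
    # Drop a plural "s" (but keep words like "class").
    if word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        word = word[:-1]
    # Drop "ing" / "ed" when at least three characters remain.
    for suffix in ("ing", "ed"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def email_to_bag_of_words(subject, body):
    """Combine subject and body, drop punctuation and stop words, stem, and count."""
    text = f"{subject} {body}".lower()
    tokens = re.findall(r"[a-z]+", text)  # keeps letters only, so punctuation is removed
    return Counter(naive_stem(t) for t in tokens if t not in STOP_WORDS)


def people_set(sender, recipients):
    """Ignore roles (From/To/Cc); record only which accounts occur in the email."""
    return {sender.lower(), *(r.lower() for r in recipients)}
```

Each email then reduces to a pair: a set of accounts and a bag of stemmed words, which matches the two aspects (people and task) of a work area.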

Topic Models: A Bayesian Approach

Assume:
• a topic is a unique distribution of words
• a document has a mixture of topics
• documents are generated by sampling from topics and words
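
These three assumptions can be made concrete with a toy generator. The sketch below is illustrative only: the two topics, their word probabilities, and the `generate_document` function are hypothetical, and a real topic model infers such distributions from data instead of fixing them by hand.

```python
import random

# Hypothetical toy topics: each topic is a probability distribution over words.
TOPICS = {
    "teaching": {"class": 0.5, "homework": 0.3, "score": 0.2},
    "grants": {"project": 0.4, "proposal": 0.3, "grant": 0.2, "due": 0.1},
}


def generate_document(topic_mixture, n_words, rng):
    """Generate words by first sampling a topic, then sampling a word from that topic."""
    topic_names = list(topic_mixture)
    topic_weights = [topic_mixture[t] for t in topic_names]
    words = []
    for _ in range(n_words):
        # Step 1: pick a topic according to the document's topic mixture.
        topic = rng.choices(topic_names, weights=topic_weights, k=1)[0]
        # Step 2: pick a word according to that topic's word distribution.
        vocab = list(TOPICS[topic])
        word_weights = [TOPICS[topic][w] for w in vocab]
        words.append(rng.choices(vocab, weights=word_weights, k=1)[0])
    return words


# A document that is mostly about teaching, with some grant-related content.
doc = generate_document({"teaching": 0.7, "grants": 0.3}, n_words=8, rng=random.Random(0))
```

Inference runs this process in reverse: given only the observed words, it estimates the topics and each document's mixture.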

Latent Dirichlet Allocation (Blei et al., 2003)
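
For reference, the generative process of LDA is commonly written as below. The notation is an assumption made here, since the slide's figure did not survive: K topics, D documents, N_d words in document d, and a Dirichlet prior on the topic-word distributions (the smoothed variant of the model).

```latex
\begin{align*}
\phi_k &\sim \mathrm{Dirichlet}(\beta), & k &= 1,\dots,K \\
\theta_d &\sim \mathrm{Dirichlet}(\alpha), & d &= 1,\dots,D \\
z_{d,n} \mid \theta_d &\sim \mathrm{Categorical}(\theta_d), & n &= 1,\dots,N_d \\
w_{d,n} \mid z_{d,n}, \phi &\sim \mathrm{Categorical}(\phi_{z_{d,n}})
\end{align*}
```

Here \phi_k is the word distribution of topic k, \theta_d is the topic mixture of document d, z_{d,n} is the topic assigned to the n-th word of document d, and w_{d,n} is the observed word.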

COllaborator COMmunity Profiling Model (COCOMP)

Enron Emails

Social Messages

Summary

COCOMP: a latent community model
• Each social media document corresponds to a sharing activity within a community.
• A community is represented with a list of top participants and an associated list of topics.
• Experiments on email and social media datasets demonstrate interesting results.

Future work
• Different sources of data with the same user
• Evolution over time with incremental learning
• Scalable inference with user feedback

Thank You!

Questions?