Modeling Information Seeking Behavior in Social Media

Page 1: Modeling Information Seeking Behavior in Social Media

Modeling Information Seeking Behavior in Social Media

Eugene Agichtein
Intelligent Information Access Lab (IRLab)

Page 2: Modeling Information Seeking Behavior in Social Media

Eugene Agichtein, Emory University, IR Lab 2

Intelligent Information Access Lab (IRLab)

Qi Guo (3rd year PhD)

Ablimit Aji (2nd year PhD)

Yandong Liu (2nd year PhD)

1st year graduate students: Julia Kiseleva, Dmitry Lagun, Qiaoling Liu, Wang Yu

• Modeling information seeking behavior
• Web search and social media search
• Text and data mining for medical informatics and public health

In collaboration with:
- Beth Buffalo (Neurology)
- Charlie Clarke (Waterloo)
- Ernie Garcia (Radiology)
- Phil Wolff (Psychology)
- Hongyuan Zha (GaTech)

Page 3: Modeling Information Seeking Behavior in Social Media

Eugene Agichtein, Emory University, IR Lab 3

Online Behavior and Interactions

Information sharing: blogs, forums, discussions

Search logs: queries, clicks

Client-side behavior: Gaze tracking, mouse movement, scrolling

Page 4: Modeling Information Seeking Behavior in Social Media

Research Overview

Eugene Agichtein, Emory University, IR Lab 4

Information sharing

Health Informatics

Cognitive Diagnostics

Intelligent search

Discover Models of Behavior (machine learning/data mining)

Page 5: Modeling Information Seeking Behavior in Social Media

Eugene Agichtein, Emory University, IR Lab 5

Key Challenges for Web Search

• Query interpretation (infer intent)

• Ranking (high dimensionality)

• Evaluation (system improvement)

• Result presentation (information visualization)

Page 6: Modeling Information Seeking Behavior in Social Media

Eugene Agichtein, Emory University, IR Lab 6

Contextualized Intent Inference

• SERP text
• Mouse trajectory, hovering/dynamics
• Scrolling
• Clicks

Page 7: Modeling Information Seeking Behavior in Social Media

Eugene Agichtein, Emory University, IR Lab 7

Research Intent

Page 8: Modeling Information Seeking Behavior in Social Media

Eugene Agichtein, Emory University, IR Lab 8

Purchase Intent

Page 9: Modeling Information Seeking Behavior in Social Media

Eugene Agichtein, Emory University, IR Lab 9

Relationship between behavior and intent?

• Search intent is contextualized within a search session

• Implication 1: model session-level state
• Implication 2: improve detection based on client-side interactions

Page 10: Modeling Information Seeking Behavior in Social Media

Eugene Agichtein, Emory University, IR Lab 10

Model: Linear Chain CRF
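
As a rough illustration of what decoding with a linear-chain model involves, the sketch below runs Viterbi over per-step label scores and label-to-label transition scores. The function name, the two intent labels, and all score values are hypothetical; the slide does not give the talk's actual features, weights, or training procedure.

```python
from typing import Dict, List


def viterbi_decode(emissions: List[Dict[str, float]],
                   transitions: Dict[str, Dict[str, float]]) -> List[str]:
    """Return the highest-scoring label sequence for a linear-chain model.

    emissions[t][y]     : score of label y at step t (hypothetical values)
    transitions[y0][y1] : score of moving from label y0 to label y1
    """
    if not emissions:
        return []
    labels = list(emissions[0])
    # best[t][y] = best total score of any label path ending in y at step t
    best = [dict(emissions[0])]
    back: List[Dict[str, str]] = []
    for t in range(1, len(emissions)):
        scores: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for y in labels:
            prev = max(labels, key=lambda p: best[-1][p] + transitions[p][y])
            scores[y] = best[-1][prev] + transitions[prev][y] + emissions[t][y]
            pointers[y] = prev
        best.append(scores)
        back.append(pointers)
    # Follow the back-pointers from the best final label to the first step.
    path = [max(labels, key=lambda y: best[-1][y])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


# Hypothetical session of three searches with two intent labels.
session_scores = [
    {"research": 2.0, "purchase": 0.0},
    {"research": 0.4, "purchase": 0.6},
    {"research": 0.0, "purchase": 2.0},
]
stay_bonus = {
    "research": {"research": 1.0, "purchase": 0.0},
    "purchase": {"research": 0.0, "purchase": 1.0},
}
print(viterbi_decode(session_scores, stay_bonus))
```

The transition scores are what let the label at one step depend on its neighbors, which is the session-level state the previous slide asks for.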

Page 11: Modeling Information Seeking Behavior in Social Media

Eugene Agichtein, Emory University, IR Lab 11

Results: Ad Click Prediction

• 200%+ precision improvement (within mission)

Page 12: Modeling Information Seeking Behavior in Social Media

Research Overview

Eugene Agichtein, Emory University, IR Lab 12

Information sharing

Health Informatics

Cognitive Diagnostics

Intelligent search

Discover Models of Behavior (machine learning/data mining)

Page 13: Modeling Information Seeking Behavior in Social Media

Finding Information Online (Revisited)

13

Next generation of search: Algorithmically-mediated information exchange

CQA (collaborative question answering):
• Realistic information exchange
• Searching archives
• Train NLP, IR, QA systems
• Study of social behavior, norms

Content quality, asker satisfaction

Current and future work

Page 14: Modeling Information Seeking Behavior in Social Media

Goal: Hybrid Human-Powered Search

14

Page 15: Modeling Information Seeking Behavior in Social Media

Eugene Agichtein, Emory University, IR Lab 15

Talk Outline

Overview of the Emory IR Lab

Intent-centric Web Search

Classifying intent of a query

Contextualized search intent detection

Page 16: Modeling Information Seeking Behavior in Social Media

16

Page 17: Modeling Information Seeking Behavior in Social Media

(Text) Social Media Today

Published: 4Gb/day
Social media: 10Gb/day

Technorati + Blogpulse: 120M blogs, 2M posts/day

Twitter (since 11/07): 2M users, 3M msgs/day

Facebook/Myspace: 200-300M users, avg 19 m/day

Yahoo Answers: 90M users, 20M questions, 400M answers

[Data from Andrew Tomkins, SSM2008 Keynote]

Yes, we could read your blog. Or, you could tell us about your day

Page 18: Modeling Information Seeking Behavior in Social Media

18

Page 19: Modeling Information Seeking Behavior in Social Media

19

Total time: 7-10 minutes, active “work”

Page 20: Modeling Information Seeking Behavior in Social Media

Someone must know this…

Page 21: Modeling Information Seeking Behavior in Social Media

+1 minute

Page 22: Modeling Information Seeking Behavior in Social Media

+7 hours: perfect answer

Page 23: Modeling Information Seeking Behavior in Social Media

Update (2/15/2009)

23

Page 24: Modeling Information Seeking Behavior in Social Media

24

http://answers.yahoo.com/question/index;_ylt=3?qid=20071008115118AAh1HdO

Page 25: Modeling Information Seeking Behavior in Social Media

25

Page 26: Modeling Information Seeking Behavior in Social Media

Finding Information Online (Revisited)

26

Next generation of search: Algorithmically-mediated information exchange

CQA (collaborative question answering):
• Realistic information exchange
• Searching archives
• Train NLP, IR, QA systems
• Study of social behavior, norms

Content quality, asker satisfaction

Current and future work

Page 27: Modeling Information Seeking Behavior in Social Media

(Some) Related Work

• Adamic et al., WWW 2007, WWW 2008:
  – Expertise sharing, network structure
• Elsas et al., SIGIR 2008:
  – Blog search
• Glance et al.:
  – Blog Pulse, popularity, information sharing
• Harper et al., CHI 2008, 2009:
  – Answer quality across multiple CQA sites
• Kraut et al.:
  – Community participation
• Kumar et al., WWW 2004, KDD 2008, …:
  – Information diffusion in blogspace, network evolution

SIGIR 2009 Workshop on Searching Social Media
http://ir.mathcs.emory.edu/SSM2009/

27

Page 28: Modeling Information Seeking Behavior in Social Media

Finding High Quality Content in SM

• Well-written
• Interesting
• Relevant (answer)
• Factually correct
• Popular?
• Provocative?
• Useful?

28

As judged by professional editors

E. Agichtein, C. Castillo, D. Donato, A. Gionis, and G. Mishne, Finding High Quality Content in Social Media, in WSDM 2008

Page 29: Modeling Information Seeking Behavior in Social Media

Social Media Content Quality

29

E. Agichtein, C. Castillo, D. Donato, A. Gionis, G. Mishne, Finding High Quality Content in Social Media, WSDM 2008

quality

Page 30: Modeling Information Seeking Behavior in Social Media

30

Page 31: Modeling Information Seeking Behavior in Social Media

31

How do Question and Answer Quality relate?

Page 32: Modeling Information Seeking Behavior in Social Media

32

Page 33: Modeling Information Seeking Behavior in Social Media

33

Page 34: Modeling Information Seeking Behavior in Social Media

34

Page 35: Modeling Information Seeking Behavior in Social Media

35

Page 36: Modeling Information Seeking Behavior in Social Media

Community

36

Page 37: Modeling Information Seeking Behavior in Social Media

Link Analysis for Authority Estimation

37

[Diagram: Users 1-6 (askers) connected through Questions 1-3 and Answers 1-6 to Users 1-6 (answerers)]

Hub (asker): $H(i) = \sum_{j=0}^{K} A(j)$

Authority (answerer): $A(j) = \sum_{i=0}^{M} H(i)$
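
The hub/authority scheme on this slide (askers as hubs, answerers as authorities) can be sketched as an iterative update over asker-to-answerer links. This is a minimal sketch of standard HITS under assumptions: the edge list, user ids, iteration count, and L2 normalization are illustrative choices, not details taken from the talk.

```python
import math
from typing import Dict, List, Tuple


def _normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Scale scores to unit L2 norm so they stay bounded across iterations."""
    norm = math.sqrt(sum(v * v for v in scores.values()))
    return {k: v / norm for k, v in scores.items()} if norm else dict(scores)


def hits(edges: List[Tuple[str, str]],
         iterations: int = 50) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Run HITS on (asker, answerer) pairs.

    Each pair means the answerer replied to a question from the asker.
    Returns (hub scores for askers, authority scores for answerers).
    """
    askers = {asker for asker, _ in edges}
    answerers = {answerer for _, answerer in edges}
    hub = {u: 1.0 for u in askers}
    auth = {u: 1.0 for u in answerers}
    for _ in range(iterations):
        # Authority of an answerer: sum of hub scores of the askers they answered.
        new_auth = {u: 0.0 for u in answerers}
        for asker, answerer in edges:
            new_auth[answerer] += hub[asker]
        # Hub of an asker: sum of authority scores of the users who answered them.
        new_hub = {u: 0.0 for u in askers}
        for asker, answerer in edges:
            new_hub[asker] += new_auth[answerer]
        auth = _normalize(new_auth)
        hub = _normalize(new_hub)
    return hub, auth


# Hypothetical interactions: u4 answers three askers, u5 answers only u3.
example_edges = [("u1", "u4"), ("u2", "u4"), ("u3", "u4"), ("u3", "u5")]
hub_scores, authority_scores = hits(example_edges)
print(sorted(authority_scores, key=authority_scores.get, reverse=True))
```

A user who both asks and answers simply receives one hub score and one authority score.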

Page 38: Modeling Information Seeking Behavior in Social Media

Qualitative Observations

HITS effective

HITS ineffective

38

Page 39: Modeling Information Seeking Behavior in Social Media

39

Random forest classifier

Page 40: Modeling Information Seeking Behavior in Social Media

Result 1: Identifying High Quality Questions

40

Page 41: Modeling Information Seeking Behavior in Social Media

Top Features for Question Classification

• Asker popularity (“stars”)

• Punctuation density

• Question category

• Page views

• KL divergence from reference LM
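
For readers unfamiliar with the last feature, here is a minimal sketch of a KL divergence between a question's unigram distribution and a reference language model. The tokenization, the add-one smoothing, and the tiny reference counts are assumptions made for illustration; the slide does not say which reference corpus or smoothing the authors used.

```python
import math
from collections import Counter
from typing import Iterable, Mapping


def kl_divergence(text_tokens: Iterable[str],
                  reference_counts: Mapping[str, int],
                  smoothing: float = 1.0) -> float:
    """KL(P_text || P_reference) over unigrams, in nats.

    Both distributions are add-`smoothing` smoothed over the joint vocabulary
    so that every probability is non-zero (an illustrative choice).
    """
    text_counts = Counter(text_tokens)
    vocab = set(text_counts) | set(reference_counts)
    text_total = sum(text_counts.values()) + smoothing * len(vocab)
    ref_total = sum(reference_counts.values()) + smoothing * len(vocab)
    divergence = 0.0
    for word in vocab:
        p = (text_counts[word] + smoothing) / text_total
        q = (reference_counts.get(word, 0) + smoothing) / ref_total
        divergence += p * math.log(p / q)
    return divergence


# Hypothetical reference counts standing in for a large, well-edited corpus.
reference = {"how": 5, "do": 5, "i": 5, "fix": 2, "my": 3, "bike": 1}
typical = "how do i fix my bike".split()
unusual = "omg omg omg plz plz help".split()
print(kl_divergence(typical, reference) < kl_divergence(unusual, reference))
```

Text whose word distribution is far from the reference gets a larger value, which is what makes the quantity usable as a quality feature.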

Page 42: Modeling Information Seeking Behavior in Social Media

Identifying High Quality Answers

42

Page 43: Modeling Information Seeking Behavior in Social Media

Top Features for Answer Classification

• Answer length

• Community ratings

• Answerer reputation

• Word overlap

• Kincaid readability score

Page 44: Modeling Information Seeking Behavior in Social Media

Finding Information Online (Revisited)

44

• Next generation of search:
  • human-machine-human
• CQA: a case study in complex IR
  • Content quality
  • Asker satisfaction
  • Understanding the interactions

Page 45: Modeling Information Seeking Behavior in Social Media

Dimensions of “Quality”

• Well-written
• Interesting
• Relevant (answer)
• Factually correct
• Popular?
• Timely?
• Provocative?
• Useful?

45

As judged by the asker (or community)

Page 46: Modeling Information Seeking Behavior in Social Media

Are Editor Labels “Meaningful” for CGC?

• Information seeking process: want to find useful information about topic with incomplete knowledge
  – N. Belkin: “Anomalous states of knowledge”

• Want to model directly if user found satisfactory information

• Specific (amenable) case: CQA

Page 47: Modeling Information Seeking Behavior in Social Media

Yahoo! Answers: The Good News

• Active community of millions of users in many countries and languages

• Effective for subjective information needs
  – Great forum for socialization/chat

• Can be invaluable for hard-to-find information not available on the web

47

Page 48: Modeling Information Seeking Behavior in Social Media

48

Page 49: Modeling Information Seeking Behavior in Social Media

Yahoo! Answers: The Bad News

49

May have to wait a long time to get a satisfactory answer

May never obtain a satisfying answer

[Chart: time to close a question (hours), y-axis 0 to 40, for ten categories: 1. FIFA World Cup, 2. Optical, 3. Poetry, 4. Football (American), 5. Soccer, 6. Medicine, 7. Winter Sports, 8. Special Education, 9. General Health Care, 10. Outdoor Recreation]

Page 50: Modeling Information Seeking Behavior in Social Media

Predicting Asker Satisfaction

Given a question submitted by an asker in CQA, predict whether the user will be satisfied with the answers contributed by the community.

– “Satisfied”:
  • The asker has closed the question AND
  • Selected the best answer AND
  • Rated best answer >= 3 “stars” (# not important)

– Else, “Unsatisfied”
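
The labeling rule above is simple enough to write down directly. In this sketch the `QuestionThread` record and its field names are hypothetical; only the three conditions and the 3-star threshold come from the slide.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class QuestionThread:
    """Hypothetical record of a resolved (or abandoned) CQA question."""
    closed_by_asker: bool
    best_answer_selected: bool
    best_answer_rating: Optional[int]  # stars given by the asker; None if unrated


def is_satisfied(thread: QuestionThread, min_stars: int = 3) -> bool:
    """Apply the slide's rule: closed AND best answer chosen AND rating >= 3 stars."""
    return (
        thread.closed_by_asker
        and thread.best_answer_selected
        and thread.best_answer_rating is not None
        and thread.best_answer_rating >= min_stars
    )


print(is_satisfied(QuestionThread(True, True, 4)))     # all three conditions hold
print(is_satisfied(QuestionThread(True, True, 2)))     # rated below the threshold
print(is_satisfied(QuestionThread(False, False, None)))  # never closed by the asker
```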

50

Yandong Liu Jiang Bian

Y. Liu, J. Bian, and E. Agichtein, in SIGIR 2008

Page 51: Modeling Information Seeking Behavior in Social Media

51

ASP: Asker Satisfaction Prediction

[Diagram: a classifier with inputs labeled Question, Answer, Asker History, Answerer History, Text, Category, Wikipedia, and News, and two outputs: “asker is satisfied” and “asker is not satisfied”]

Page 52: Modeling Information Seeking Behavior in Social Media

52

Experimental Setup: Data

Questions   Answers     Askers    Categories   % Satisfied
216,170     1,963,615   158,515   100          50.7%

Crawled from Yahoo! Answers in early 2008

“Anonymized” dataset available at: http://ir.mathcs.emory.edu/shared/

1/2009: Yahoo! Webscope : “Comprehensive” Answers dataset: ~5M questions & answers.

Page 53: Modeling Information Seeking Behavior in Social Media

Satisfaction by Topic

Topic                 Questions   Answers   A per Q   Satisfied   Asker rating   Time to close by asker
2006 FIFA World Cup   1194        35,659    29.86     55.4%       2.63           47 minutes
Mental Health         151         1159      7.68      70.9%       4.30           1.5 days
Mathematics           651         2329      3.58      44.5%       4.48           33 minutes
Diet & Fitness        450         2436      5.41      68.4%       4.30           1.5 days

Page 54: Modeling Information Seeking Behavior in Social Media

54

Satisfaction Prediction: Human Judges

• Truth: asker’s rating
• A random sample of 130 questions
• Researchers
  – Agreement: 0.82; F1: 0.45 (F1 = 2*P*R/(P+R))
• Amazon Mechanical Turk
  – Five workers per question
  – Agreement: 0.9; F1: 0.61
  – Best when at least 4 out of 5 raters agree
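
The F1 figures on this slide use the formula 2*P*R/(P+R) for the "satisfied" class. A minimal sketch of that computation follows; the label strings and the four-item example are made up for illustration and are not the study's data.

```python
from typing import Sequence, Tuple


def precision_recall_f1(truth: Sequence[str],
                        predicted: Sequence[str],
                        positive: str = "satisfied") -> Tuple[float, float, float]:
    """Compute precision, recall, and F1 = 2*P*R/(P+R) for one class."""
    pairs = list(zip(truth, predicted))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical judgments: asker ratings as truth, one rater's guesses as predictions.
asker_truth = ["satisfied", "satisfied", "unsatisfied", "unsatisfied"]
rater_guess = ["satisfied", "unsatisfied", "satisfied", "unsatisfied"]
print(precision_recall_f1(asker_truth, rater_guess))
```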

Page 55: Modeling Information Seeking Behavior in Social Media

Performance: ASP vs. Humans (F1, Satisfied)

Classifier          With Text   Without Text   Selected Features
ASP_SVM             0.69        0.72           0.62
ASP_C4.5            0.75        0.76           0.77
ASP_RandomForest    0.70        0.74           0.68
ASP_Boosting        0.67        0.67           0.67
ASP_NB              0.61        0.65           0.58
Best Human Perf     0.61
Baseline (random)   0.66

ASP is significantly more effective than humans

Human F1 is lower than the random baseline!

Page 56: Modeling Information Seeking Behavior in Social Media

Top Features by Information Gain

• 0.14 Q: Asker’s previous rating
• 0.14 Q: Average past rating by asker
• 0.10 UH: Member since (interval)
• 0.05 UH: Average # answers for past Q
• 0.05 UH: Previous Q resolved for the asker
• 0.04 CA: Average asker rating for category
• 0.04 UH: Total number of answers received
…
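
The numbers in front of each feature are information gain scores. As a minimal sketch of how such a score is computed for a discrete feature (the slide does not describe how continuous features were binned, so the bucketed values and labels below are hypothetical):

```python
import math
from collections import Counter, defaultdict
from typing import Hashable, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy of a label sequence, in bits."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(feature_values: Sequence[Hashable],
                     labels: Sequence[Hashable]) -> float:
    """Entropy of the labels minus their entropy conditioned on the feature."""
    total = len(labels)
    groups = defaultdict(list)
    for value, label in zip(feature_values, labels):
        groups[value].append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


# Hypothetical data: a bucketed "previous rating" feature against satisfaction labels.
previous_rating = ["high", "high", "low", "low"]
category_parity = ["a", "b", "a", "b"]
satisfied = [1, 1, 0, 0]
print(information_gain(previous_rating, satisfied))
print(information_gain(category_parity, satisfied))
```

A feature that splits the labels perfectly recovers the full label entropy, while one that is independent of the labels scores zero.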

56

Page 57: Modeling Information Seeking Behavior in Social Media

57

“Offline” vs. “Online” Prediction

• Offline prediction (AFTER answers arrive)
  – All features (question, answer, asker & category)
  – F1: 0.77
• Online prediction (BEFORE question posted)
  – NO answer features
  – Only asker history and question features (stars, #comments, sum of votes…)
  – F1: 0.74

Page 58: Modeling Information Seeking Behavior in Social Media

Personalized Prediction of Satisfaction

Same information != same usefulness for different searchers!

Personalization vs. “Groupization”?

58

Y. Liu and E. Agichtein, You've Got Answers: Personalized Models for Predicting Success in Community Question Answering, ACL 2008

Page 59: Modeling Information Seeking Behavior in Social Media

Example Personalized Models

59

Page 60: Modeling Information Seeking Behavior in Social Media

Outline

60

• Next generation of search:
  • Algorithmically mediated information exchange
• CQA: a case study in complex IR
  • Content quality
  • Asker satisfaction

Page 61: Modeling Information Seeking Behavior in Social Media

Current Work (in Progress)

• Partially supervised models of expertise (Bian et al., WWW 2009)

• Real-time CQA

• Sentiment, temporal sensitivity analysis

• Understanding Social Media dynamics

Page 62: Modeling Information Seeking Behavior in Social Media

Answer Arrival

62

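[Chart: answer arrival over the first hour. x-axis: time in minutes (5 to 60); y-axis: number of answers arrived in < T (0 to 700,000)]

Bar values, in x-axis order:

Time (minutes)   Answers
5                573,086
10               378,227
15               146,845
20               72,260
25               46,364
30               34,573
35               27,322
40               23,194
45               19,952
50               17,260
55               15,481
60               13,985

First hour: 69% of answers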

Page 63: Modeling Information Seeking Behavior in Social Media

Exponential Decay Model [Lerman 2007]

Page 64: Modeling Information Seeking Behavior in Social Media

Factors Influencing Dynamics

Page 65: Modeling Information Seeking Behavior in Social Media

Example: Answer Arrival | Category

Page 66: Modeling Information Seeking Behavior in Social Media

Subjectivity

Page 67: Modeling Information Seeking Behavior in Social Media

Answer, Rating Arrival

Page 68: Modeling Information Seeking Behavior in Social Media

Preliminary Results: Modeling SM Dynamics for Real-Time Classification

• Adapt SM dynamics models to classification, e.g.: predict ratings

feature value:

Page 69: Modeling Information Seeking Behavior in Social Media

Outline

69

• Next generation of search:
  • Algorithmically mediated information exchange
• CQA: a case study in complex IR
  • Content quality
  • Asker satisfaction
  • Understanding social media dynamics

Page 70: Modeling Information Seeking Behavior in Social Media

Eugene Agichtein, Emory University, IR Lab 70

Question Urgency

Problem – a growing volume of questions competing for visibility

• Time-sensitive (urgent) questions pushed out by newer questions

• Delayed responses may become useless to seeker – wastes site resources and responders’ time

Page 71: Modeling Information Seeking Behavior in Social Media

Goal: Query Processing over Web and Social Systems

71

Page 72: Modeling Information Seeking Behavior in Social Media

Takeaways

Robust machine learning over behavior data: system improvements, insights into behavior

Contextualized models for NLP and text mining: system improvements, insights into interactions

Mining social media: potential for transformative impact for IR, sociology, psychology, medical informatics, public health, …

72

Page 73: Modeling Information Seeking Behavior in Social Media

References
• Modeling web search behavior [SIGIR 2006, 2007]
• Estimating content quality [WSDM 2008]
• Estimating contributor authority [CIKM 2007]
• Searching CQA archives [WWW 2008, WWW 2009]
• Inferring asker intent [EMNLP 2008]
• Predicting satisfaction [SIGIR 2008, ACL 2008, TKDE]
• Coping with spam [AIRWeb 2008]

More information, datasets, papers, slides:
http://www.mathcs.emory.edu/~eugene/

Page 74: Modeling Information Seeking Behavior in Social Media

Eugene Agichtein, Emory University, IR Lab 74

Thank you!

• Yandex (for hosting my visit)

Supported by: