Finding High-Quality Content in Social Media chenwq 2011/11/26


Authors

Eugene Agichtein

Emory University

Research: Intelligent Information Access Lab (IRLab)

News: our team wins the "Best Paper" award at SIGIR 2011.


Abstract

From the early 2000s, user-generated content has become popular on the web.

The quality of user-generated content varies drastically, from excellent to abuse and spam.

To separate high-quality content from the rest automatically

Graph-based framework
– combine the different sources of evidence in a classification formulation

Contents

1. Related work
2. CONTENT QUALITY ANALYSIS
3. MODELING CONTENT QUALITY
4. EXPERIMENT & Conclusion


Related work

Link analysis in social media

Propagating reputation

Question/answering portals and forums

Expert finding

Text analysis for content quality

Implicit feedback for ranking


Related work

Link analysis in social media

– G = (V, E)

– V corresponds to the users of a question/answer system

– a directed edge e = (u, v) ∈ E from a user u ∈ V to a user v ∈ V if user u has answered at least one question of user v

– G’ = (V, E’)

PageRank, ExpertiseRank, HITS
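
The graph defined on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the user names, the helper names (build_answer_graph, transpose, pagerank) and the damping factor 0.85 are not from the slides, and the PageRank here is a plain power iteration rather than any particular library's implementation.

```python
from collections import defaultdict


def build_answer_graph(answers):
    """Build G = (V, E) from (answerer, asker) pairs: edge u -> v if u answered v."""
    graph = defaultdict(set)
    for answerer, asker in answers:
        graph[answerer].add(asker)
        graph[asker]  # touch the key so every asker is a vertex too
    return dict(graph)


def transpose(graph):
    """Return the graph with every edge reversed."""
    reversed_graph = {v: set() for v in graph}
    for u, targets in graph.items():
        for v in targets:
            reversed_graph[v].add(u)
    return reversed_graph


def pagerank(graph, damping=0.85, iterations=50):
    """Power-iteration PageRank; rank of vertices without out-edges is spread evenly."""
    nodes = list(graph)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        dangling = sum(rank[v] for v in nodes if not graph[v])
        new_rank = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for u in nodes:
            for v in graph[u]:
                new_rank[v] += damping * rank[u] / len(graph[u])
        rank = new_rank
    return rank


# Hypothetical data: alice answered bob and carol, bob answered carol.
answers = [("alice", "bob"), ("alice", "carol"), ("bob", "carol")]
graph = build_answer_graph(answers)
ranks = pagerank(graph)                 # score flows from answerer to asker
ranks_t = pagerank(transpose(graph))    # score flows from asker to answerer
```

With the edge direction used on the slide, score accumulates at users whose questions get answered; on the reversed graph it accumulates at users who answer, so in this toy data alice ranks highest in ranks_t.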


CONTENT QUALITY ANALYSIS: Intrinsic content quality

As a baseline, we use textual features only, with all word n-grams up to length 5 that appear in the collection more than 3 times used as features
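
The n-gram baseline can be sketched as follows. The whitespace tokenizer, the lowercasing and all function names are assumptions made for illustration; the slide only fixes the maximum length (5) and the frequency threshold (more than 3 occurrences in the collection).

```python
from collections import Counter


def word_ngrams(text, max_n=5):
    """Yield all word n-grams of length 1..max_n (whitespace tokens, lowercased)."""
    tokens = text.lower().split()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield " ".join(tokens[i:i + n])


def build_vocabulary(collection, max_n=5, min_count=4):
    """Keep n-grams that occur more than 3 times (at least min_count) in the collection."""
    counts = Counter()
    for text in collection:
        counts.update(word_ngrams(text, max_n))
    return {gram for gram, count in counts.items() if count >= min_count}


def featurize(text, vocabulary, max_n=5):
    """Map a text to counts of the vocabulary n-grams it contains."""
    return Counter(g for g in word_ngrams(text, max_n) if g in vocabulary)


# Hypothetical toy collection.
docs = ["how do i fix my bike"] * 4 + ["what is a bike"]
vocab = build_vocabulary(docs)
feats = featurize("fix my bike please", vocab)
```

In this toy collection "fix my bike" occurs four times and becomes a feature, while "what" occurs once and is dropped.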

CONTENT QUALITY ANALYSIS: Intrinsic content quality

Punctuation and typos
1. Punctuation
2. Capitalization
3. Spacing density
4. Character-level entropy
5. Spelling mistakes
6. Out-of-vocabulary words

Syntactic and semantic
1. Average number of syllables per word
2. Entropy of word lengths
3. Readability measures

Grammaticality
1. Part-of-speech sequences
2. Formality score
3. Distance between its (trigram) language model and several given language models
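
A few of the surface features listed on this slide are easy to compute directly. The sketch below is one possible reading, not the authors' definitions: the densities are taken as fractions of all characters, entropy is Shannon entropy in bits, and the feature names are made up for the example.

```python
import math
import string
from collections import Counter


def entropy(items):
    """Shannon entropy (in bits) of the empirical distribution of items."""
    counts = Counter(items)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def surface_features(text):
    """Compute a handful of punctuation, capitalization and entropy features."""
    words = text.split()
    n_chars = max(len(text), 1)  # avoid division by zero on empty text
    return {
        "punctuation_density": sum(ch in string.punctuation for ch in text) / n_chars,
        "capitalization_ratio": sum(ch.isupper() for ch in text) / n_chars,
        "spacing_density": sum(ch.isspace() for ch in text) / n_chars,
        "char_entropy": entropy(text),
        "word_length_entropy": entropy(len(w) for w in words),
    }


shouting = surface_features("HELLO!!!")
```

Spelling mistakes, readability measures and part-of-speech features need external resources (a dictionary, a syllable counter, a tagger) and are left out of this sketch.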

CONTENT QUALITY ANALYSIS: User relationships

Graph of items and users
– u → q (answer): user u has answered question q

User-user graph
– u → v: u has answered a question from user v

CONTENT QUALITY ANALYSIS: Usage statistics

– The number of clicks on some item
– The dwell time on some item

CONTENT QUALITY ANALYSIS: Classification framework

We cast the problem of quality ranking as a binary classification problem
– support vector machines
– log-linear classifiers
– stochastic gradient boosted trees

Our goal is to discover interesting, well-formulated and factually accurate content
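
To make the "ranking as binary classification" idea concrete, here is a minimal sketch of one of the three learners named above, a log-linear classifier (logistic regression), trained by stochastic gradient descent. The two features, the four training rows, the learning rate and the function names are all invented for the illustration and do not come from the slides.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(rows, labels, lr=0.5, epochs=500):
    """Fit weights and bias by plain SGD on the logistic loss."""
    w = [0.0] * len(rows[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(rows, labels):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            g = p - y  # gradient of the loss with respect to the linear score
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b


def quality_score(x, w, b):
    """Probability that an item is high quality; usable as a ranking score."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


# Made-up feature vectors: [punctuation density, normalized author rank].
rows = [[0.9, 0.1], [0.8, 0.2], [0.2, 0.9], [0.1, 0.8]]
labels = [0, 0, 1, 1]  # 1 = high quality
w, b = train_logistic(rows, labels)
good = quality_score([0.15, 0.85], w, b)
bad = quality_score([0.85, 0.15], w, b)
```

Sorting items by quality_score turns the binary classifier into a ranker, which is the sense in which ranking is cast as classification here.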


MODELING CONTENT QUALITY: user relationships

Our dataset, viewed as a graph, as illustrated in Figure 1

MODELING CONTENT QUALITY: user relationships

The relationships between questions, users asking and answering questions, and answers can be captured by a tripartite graph outlined in Figure 2

MODELING CONTENT QUALITY: user relationships

The unique characteristics of the community question/answering domain

MODELING CONTENT QUALITY: user relationships

Question subtree
– Q: Features from the question being answered
– QU: Features from the asker of the question being answered
– QA: Features from the other answers to the same question

MODELING CONTENT QUALITY: user relationships

User subtree
– UA: Features from the answers of the user
– UQ: Features from the questions of the user
– UV: Features from the votes of the user
– UQA: Features from answers received to the user's questions
– U: Other user-based features

MODELING CONTENT QUALITY: user relationships

Question features

MODELING CONTENT QUALITY: user relationships

Implicit user-user relations

G = (V, E)
– E = Ea ∪ Eb ∪ Ev ∪ Es ∪ E+ ∪ E−

Gx = (V, Ex)
– hx: the vector of hub scores on the vertices V
– ax: the vector of authority scores
– px: the vector of PageRank scores
– p'x: the vector of PageRank scores in the transposed graph
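
For a single edge type x, the hub vector hx and authority vector ax can be computed with the standard HITS iteration. The sketch below assumes a made-up edge list for one relation and L2 normalization after each step; the user ids and the function name are hypothetical.

```python
import math


def hits(edges, iterations=50):
    """Return (hub, authority) score dicts for a directed edge list."""
    nodes = sorted({node for edge in edges for node in edge})
    hub = {v: 1.0 for v in nodes}
    auth = {v: 1.0 for v in nodes}
    for _ in range(iterations):
        # Authority of v: sum of hub scores of vertices pointing to v.
        auth = {v: 0.0 for v in nodes}
        for u, v in edges:
            auth[v] += hub[u]
        norm = math.sqrt(sum(a * a for a in auth.values())) or 1.0
        auth = {v: a / norm for v, a in auth.items()}
        # Hub of u: sum of authority scores of vertices u points to.
        hub = {v: 0.0 for v in nodes}
        for u, v in edges:
            hub[u] += auth[v]
        norm = math.sqrt(sum(h * h for h in hub.values())) or 1.0
        hub = {v: h / norm for v, h in hub.items()}
    return hub, auth


# Hypothetical edges Ex for one relation type x.
edges_x = [("u1", "u2"), ("u1", "u3"), ("u4", "u3")]
hub_x, auth_x = hits(edges_x)
```

Running the same routine once per edge type gives one pair of score vectors per Gx, and each user's entries can then be used as features.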


MODELING CONTENT QUALITY: user relationships

Content features for QA
– to identify the most salient features for the specific tasks of question or answer quality classification
  • the KL-divergence between the language models of the two texts
  • their non-stopword overlap
  • the ratio between their lengths
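
The three pairwise features above can be sketched for a question and one of its answers. Several choices here are assumptions, not taken from the slides: unigram language models with add-one smoothing, Jaccard similarity as the overlap measure, length measured in whitespace tokens, and a tiny illustrative stopword list.

```python
import math
from collections import Counter

# Tiny illustrative stopword list, not a real resource.
STOPWORDS = {"the", "a", "an", "is", "to", "of", "and", "in", "it", "do", "i", "my", "how"}


def tokenize(text):
    return text.lower().split()


def kl_divergence(text_p, text_q, alpha=1.0):
    """KL(P || Q) between smoothed unigram models of two texts."""
    p_counts, q_counts = Counter(tokenize(text_p)), Counter(tokenize(text_q))
    vocab = set(p_counts) | set(q_counts)
    p_total = sum(p_counts.values()) + alpha * len(vocab)
    q_total = sum(q_counts.values()) + alpha * len(vocab)
    kl = 0.0
    for word in vocab:
        p = (p_counts[word] + alpha) / p_total
        q = (q_counts[word] + alpha) / q_total
        kl += p * math.log(p / q)
    return kl


def nonstopword_overlap(text_a, text_b):
    """Jaccard overlap of the non-stopword vocabularies of two texts."""
    a = set(tokenize(text_a)) - STOPWORDS
    b = set(tokenize(text_b)) - STOPWORDS
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def length_ratio(text_a, text_b):
    """Ratio of token counts, guarding against an empty second text."""
    return len(tokenize(text_a)) / max(len(tokenize(text_b)), 1)


question = "how do i fix my bike"
answer = "take the bike to a shop"
kl = kl_divergence(question, answer)
overlap = nonstopword_overlap(question, answer)
ratio = length_ratio(question, answer)
```

KL-divergence is asymmetric, so the order of the two texts matters; which direction to use is a modeling choice left open here.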

MODELING CONTENT QUALITY: user relationships

Usage features for QA
– number of item views (clicks)
– metadata of question
  • how long ago the question was posted
– derived statistics
  • the expected number of views for a given category
  • the deviation from the expected number of views
– other second-order statistics
  • the click frequency
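
The derived usage statistics can be sketched as below. The definitions are assumptions chosen for illustration: the expected number of views is taken as the mean over items in the same category, the deviation as the difference from that mean, and the click frequency as views per day since posting. The field names and the sample records are made up.

```python
from collections import defaultdict


def usage_features(items):
    """items: list of dicts with 'category', 'views' and 'age_days' keys."""
    views_by_category = defaultdict(list)
    for item in items:
        views_by_category[item["category"]].append(item["views"])
    # Expected views per category, here simply the category mean.
    expected = {c: sum(v) / len(v) for c, v in views_by_category.items()}

    features = []
    for item in items:
        exp = expected[item["category"]]
        features.append({
            "views": item["views"],
            "age_days": item["age_days"],
            "expected_views": exp,
            "deviation": item["views"] - exp,
            "click_frequency": item["views"] / max(item["age_days"], 1),
        })
    return features


# Hypothetical usage records.
items = [
    {"category": "pets", "views": 30, "age_days": 10},
    {"category": "pets", "views": 10, "age_days": 5},
    {"category": "cars", "views": 8, "age_days": 4},
]
feats_usage = usage_features(items)
```

Normalizing views by category and by age keeps an old question in a popular category from looking better than a recent one in a niche category purely because of raw click counts.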


Experiment & Conclusions: EXPERIMENTAL SETTING

Dataset

Edges induced from the whole dataset.

Experiment & Conclusions: EXPERIMENTAL SETTING

Dataset statistics


Thanks for your attention!