This presentation on Diversification is part of the ARCOMEM training curriculum. Feel free to roam around or contact us on Twitter via @arcomem to learn more about ARCOMEM training on archiving Social Media.
Athena Research and Innovation Center
Yahoo! Research
Diversifying User Comments on News Articles
Problem
- Problem description:
  - Given a news article and the respective set of user comments, return a subset of the most diverse comments
- Perception of a diverse set of comments: a set of comments that
  - represents different opinions and sentiments,
  - is expressed by users with different demographic characteristics,
  - covers different aspects of the news article.
Motivation
- An article's content itself is not always enough to form a complete view of a topic
- Public opinion complements the article and represents the "wisdom of the crowds"
Example
Given a political article:
- Find all the subtopics handled
  - Related persons
  - Events (election, bill voting)
- Find all opinions and sentiments expressed
  - Positive/negative/neutral
  - On the whole article/on specific subtopics
- Find different kinds of users commenting
  - Different demographics
  - Different commenting history on previous articles
- Present a set of comments that better represents the diversity of the above dimensions
Motivation
- Several articles are very popular (>10000 comments)
- When articles are aggregated, they attract even more comments
- It is impossible for the reader to review them all
- Current comment sorting options are based on simpler criteria
  - Date
  - Votes
  - Replies
Method outline
- Define diversification criteria
  - Dimensions: Content, Sentiment, Named Entities, User co-commenting behavior
- Define a (dis)similarity function that produces a diversity score based on the criteria
  - Quantify the dissimilarity of comments
  - Weighted sum of cosine similarities on diversity feature vectors
- Apply an iterative heuristic algorithm that, at each step, selects the candidate comment that maximizes a diversity objective
Method description - Criteria: Content
- Baseline diversity criterion
- Used in the rest of the literature to diversify search results
- Objective: obtain comments with diverse content
- Processing
  - Comments' text is converted to term vectors
  - Document length-normalized tf values
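The content criterion can be illustrated with a minimal sketch. The slides do not specify the tokenizer or the exact normalization, so the regular-expression tokenizer and the choice to divide each term count by the total number of terms in the comment are assumptions, and `tf_vector` is a hypothetical helper name.

```python
import re
from collections import Counter


def tf_vector(text):
    """Map each term to its count divided by the comment length (assumed normalization)."""
    # Assumed tokenizer: lowercase runs of letters, digits and apostrophes.
    terms = re.findall(r"[a-z0-9']+", text.lower())
    if not terms:
        return {}
    counts = Counter(terms)
    total = len(terms)
    return {term: count / total for term, count in counts.items()}


vec = tf_vector("The bill passed. The senate voted.")
```

Normalizing by length keeps long comments from dominating the similarity computation simply because they contain more words.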
Method description - Criteria: Named Entities (NEs)
- Persons, Organizations, Locations
- News articles often revolve around NEs
  - Even when an article talks about events or situations, usually one or more Persons or Locations are involved
- Objective: obtain comments that cover (uniformly) as many different NEs as possible
- Processing
  - Extraction of NEs in comments (Stanford NER)
  - Comments' NEs are converted to term vectors
  - Document length-normalized tf values
Method description - Criteria: Sentiment
- 9 classes of sentiment within the interval [-4, 4]
  - -4 = very negative, 0 = neutral, 4 = very positive
- Expresses users' opinions on the news articles' topics
- Objective: obtain comments that cover (uniformly) different classes of sentiment
- Processing
  - Sentiment analysis of the comment's text (SentiStrength)
  - Construct sentiment vectors
  - Each vector value represents a sentiment class
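One possible shape for such a sentiment vector is sketched below. The slides only say that each vector value represents a sentiment class, so the one-hot encoding is an assumption, as is the hypothetical helper name `sentiment_vector`. The integer score is assumed to have been computed already by a sentiment analyzer; no SentiStrength call is shown.

```python
def sentiment_vector(score):
    """Return a 9-element one-hot vector for an integer sentiment score in [-4, 4]."""
    if not isinstance(score, int) or not -4 <= score <= 4:
        raise ValueError("score must be an integer between -4 and 4")
    vec = [0.0] * 9
    # Position 0 holds class -4 (very negative), position 8 holds class 4 (very positive).
    vec[score + 4] = 1.0
    return vec


neutral = sentiment_vector(0)
very_negative = sentiment_vector(-4)
```

With this encoding, two comments in the same sentiment class have cosine similarity 1 and comments in different classes have cosine similarity 0.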
Comment scoring
- Cosine similarity function between
  - a pair of comments
  - a comment and a set of comments
- Apply the similarity function for each criterion separately
- Produce a final diversity score as a weighted sum of all criterion scores
- Produce a final score that incorporates comment-to-article similarity
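The per-criterion scoring and weighted combination might look like the sketch below. It assumes sparse vectors stored as dictionaries, takes dissimilarity to be 1 minus cosine similarity, and uses illustrative criterion names and weights; none of these details are given in the slides. The comment-to-article similarity term is left out because the slides do not say how it is combined.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(key, 0.0) for key, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def diversity_score(a, b, weights):
    """Weighted sum over criteria of (1 - cosine); a and b map criterion -> vector."""
    return sum(w * (1.0 - cosine(a[c], b[c])) for c, w in weights.items())


# Hypothetical feature vectors for two comments.
comment_a = {"content": {"tax": 1.0}, "sentiment": {"-2": 1.0}}
comment_b = {"content": {"tax": 1.0}, "sentiment": {"3": 1.0}}
weights = {"content": 0.5, "sentiment": 0.5}  # assumed weights

score = diversity_score(comment_a, comment_b, weights)
```

Here the two comments share the same content vector but fall in different sentiment classes, so only the sentiment criterion contributes to the score.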
Algorithm (MAXSUM)
- Initially
  - The diverse result set is empty; all comments belong to the candidate set
  - Arbitrary insertion of a candidate comment into the result set
- Greedy construction heuristic
  - Compare each candidate comment with the centroid (average) of the current result set
  - Finish after (k-1) iterations, when k comments have been inserted
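The greedy construction described above can be sketched as follows. This is a simplified illustration rather than the presenters' implementation: it uses a single sparse feature vector per comment, takes the first comment as the arbitrary seed, and scores candidates by 1 minus cosine similarity to the centroid, all of which are assumptions. `greedy_diversify` is a hypothetical name.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(key, 0.0) for key, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def centroid(vectors):
    """Component-wise average of a non-empty list of sparse vectors."""
    total = {}
    for vec in vectors:
        for key, value in vec.items():
            total[key] = total.get(key, 0.0) + value
    return {key: value / len(vectors) for key, value in total.items()}


def greedy_diversify(vectors, k):
    """Return the indices of up to k comments chosen greedily for diversity."""
    if k <= 0 or not vectors:
        return []
    candidates = list(range(len(vectors)))
    selected = [candidates.pop(0)]  # arbitrary seed: the first comment
    while candidates and len(selected) < k:
        center = centroid([vectors[i] for i in selected])
        # Pick the candidate least similar to the current result set.
        best = max(candidates, key=lambda i: 1.0 - cosine(vectors[i], center))
        candidates.remove(best)
        selected.append(best)
    return selected


# Hypothetical comment vectors.
comments = [
    {"tax": 1.0},
    {"tax": 0.9, "bill": 0.1},
    {"election": 1.0},
    {"senate": 0.5, "vote": 0.5},
]
picked = greedy_diversify(comments, 3)
```

In this toy input the near-duplicate second comment is skipped in favor of comments on other topics. Replacing the single-vector dissimilarity with a weighted multi-criterion score would bring the sketch closer to the method outlined in the slides.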
Evaluation
- Comparison of the methods' coverage of the different information nuggets contained in the returned comments
  - Baseline: diversification based only on content
  - Proposed method: combination of multiple criteria
- Proposed methods outperform the baseline
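A coverage measure of this kind could be computed as in the sketch below. The slides do not define the metric, so the formula (distinct nuggets in the selected comments divided by all distinct nuggets), the hypothetical name `nugget_coverage`, and the toy annotations are all assumptions made for illustration.

```python
def nugget_coverage(selected_ids, nuggets_by_comment):
    """Fraction of all distinct nuggets that appear in the selected comments."""
    all_nuggets = set().union(*nuggets_by_comment.values())
    if not all_nuggets:
        return 0.0
    covered = set()
    for comment_id in selected_ids:
        covered |= nuggets_by_comment.get(comment_id, set())
    return len(covered) / len(all_nuggets)


# Hypothetical annotations: comment id -> nuggets found in that comment.
annotations = {
    "c1": {"tax", "senate"},
    "c2": {"tax"},
    "c3": {"election"},
}
redundant = nugget_coverage(["c1", "c2"], annotations)
diverse = nugget_coverage(["c1", "c3"], annotations)
```

Under this definition, a selection that repeats the same nuggets scores lower than an equally sized selection that spreads across them.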
Framework - Implementations
- A desktop Java application retrieving news articles and comments
  - Comments stored in a MySQL database
  - News and comments obtained through the NY Times API
- ARCOMEM offline module for calculating diverse WebObjects of WebResources