Athena Research and Innovation Center Yahoo! Research Diversifying User Comments on News Articles

Arcomem training diversification

Uploaded by: arcomem

DESCRIPTION

This presentation on Diversification is part of the ARCOMEM training curriculum. Feel free to roam around or contact us on Twitter via @arcomem to learn more about ARCOMEM training on archiving Social Media.

Page 1: Arcomem training diversification

Athena Research and Innovation Center
Yahoo! Research

Diversifying User Comments on News Articles

Page 2: Arcomem training diversification

Problem

Problem description:
  • Given a news article and the respective set of user comments, return a subset of the most diverse comments

Perception of a diverse set of comments:
  • A set of comments that represents different opinions and sentiments, …
  • expressed by users with different demographic characteristics, …
  • covering different aspects of the news article.

Motivation
  • The article’s content itself is not always enough to form a complete view of a topic
  • The public opinion complements the article and represents the “wisdom of the crowds”

Page 3: Arcomem training diversification

Example

Given a political article:
  • Find all the subtopics handled
      • Persons related
      • Events (election, bill voting)
  • Find all opinions and sentiments expressed
      • Positive/negative/neutral
      • On the whole article/on specific subtopics
  • Find different kinds of users commenting
      • Different demographics
      • Different commenting history on previous articles
  • Present a set of comments that better represents the diversity of the above dimensions

Page 4: Arcomem training diversification

Motivation
  • Several articles are very popular (>10000 comments)
  • Articles get aggregated, yielding even more comments
  • Impossible for the reader to review
  • Current comment sorting options are based on simpler criteria
      • Date
      • Votes
      • Replies

Page 5: Arcomem training diversification

Method outline
  • Define diversification criteria
      • Dimensions: Content, Sentiment, Named Entities, User co-commenting behavior
  • Define a (dis)similarity function that produces a diversity score based on the criteria
      • Quantify the dissimilarity of comments
      • Weighted sum of cosine similarities on diversity feature vectors
  • Apply an iterative heuristic algorithm that, at each step, selects the candidate comment that maximizes a diversity objective

Page 6: Arcomem training diversification

Method description - Criteria

Content
  • Baseline diversity criterion
  • Used in the rest of the literature to diversify search results
  • Objective: obtain comments with diverse content
  • Processing
      • Comments’ text term vectors
      • Document length-normalized tf values
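
The processing step above can be sketched as follows. This is a minimal illustration, not the presenters' implementation: the tokenizer and the function name tf_vector are assumptions, since the slides only state that term vectors hold document length-normalized tf values.

```python
import re
from collections import Counter


def tf_vector(text):
    """Return a sparse term vector {term: count / number of tokens}."""
    # Assumed tokenizer: lowercase runs of letters, digits and apostrophes.
    # The slides do not say how comments are tokenized.
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    if not tokens:
        return {}
    counts = Counter(tokens)
    length = len(tokens)
    # Dividing by the comment length keeps long comments from dominating.
    return {term: count / length for term, count in counts.items()}


vec = tf_vector("Taxes, taxes and more taxes")
```

Here the five-token comment yields a weight of 3/5 for "taxes" and 1/5 each for "and" and "more".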

Page 7: Arcomem training diversification

Method description - Criteria

Named Entities (NEs)
  • Persons, Organizations, Locations
  • Many times news articles revolve around NEs
      • Even when an article talks about events or situations, usually one or more Persons or Locations are involved
  • Objective: obtain comments that cover (uniformly) as many different NEs as possible
  • Processing
      • Extraction of NEs in comments (Stanford NER)
      • Comments’ NEs term vectors
      • Document length-normalized tf values
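
A possible shape for the NE vectors, as a sketch only. It assumes the entity mentions have already been extracted by a NER tool and are passed in as plain strings; no NER call is shown. The function name entity_vector, the case folding, and the choice to normalize by the comment's token count are all assumptions.

```python
from collections import Counter


def entity_vector(mentions, doc_length):
    """Return {entity: mention count / comment length in tokens}."""
    if doc_length <= 0:
        return {}
    # Assumed normalization: fold case and trim whitespace so that
    # "Obama" and "obama" count as the same entity.
    counts = Counter(m.strip().lower() for m in mentions if m.strip())
    return {entity: count / doc_length for entity, count in counts.items()}


# Hypothetical comment of 12 tokens with three extracted mentions.
ne_vec = entity_vector(["Obama", "Congress", "obama"], doc_length=12)
```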

Page 8: Arcomem training diversification

Method description - Criteria

Sentiment
  • 9 classes of sentiment within the interval [-4, 4]
      • -4: very negative
      • 4: very positive
      • 0: neutral
  • Expresses users’ opinions on the news articles’ topics
  • Objective: obtain comments that cover (uniformly) different classes of sentiment
  • Processing
      • Sentiment analysis of the comment’s text (SentiStrength)
      • Construct sentiment vectors
      • Each vector value represents a sentiment class
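
One way to read "each vector value represents a sentiment class" is a 9-dimensional indicator vector. The sketch below assumes an integer score in [-4, 4] is already available from the sentiment analysis step and that the encoding is one-hot; the slides do not confirm either detail, and sentiment_vector is an illustrative name.

```python
# The 9 sentiment classes from the slide: -4 (very negative) .. 4 (very positive).
SENTIMENT_CLASSES = list(range(-4, 5))


def sentiment_vector(score):
    """Return a 9-dimensional one-hot vector for a score in [-4, 4]."""
    if score not in SENTIMENT_CLASSES:
        raise ValueError("score must be an integer in [-4, 4]")
    # Assumed encoding: 1.0 in the position of the comment's class, 0.0 elsewhere.
    return [1.0 if cls == score else 0.0 for cls in SENTIMENT_CLASSES]


neutral = sentiment_vector(0)
very_negative = sentiment_vector(-4)
```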

Page 9: Arcomem training diversification

Comment scoring
  • Cosine similarity function between
      • A pair of comments
      • A comment and a set of comments
  • Apply the similarity function for each criterion separately
  • Produce a final diversity score as a weighted sum of all criterion scores
  • Produce a final score that incorporates comment-to-article similarity
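
The scoring steps above might look like the following sketch. Several details are assumptions rather than statements from the slides: all criterion vectors are sparse dicts, a comment is compared to a set through the set's centroid, per-criterion diversity is taken as 1 minus cosine similarity, and the weights are illustrative. The comment-to-article similarity term is left out because the slides do not say how it is combined.

```python
import math


def cosine(u, v):
    """Cosine similarity of two sparse vectors given as {key: weight} dicts."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def centroid(vectors):
    """Average of a list of sparse vectors."""
    total = {}
    for vec in vectors:
        for k, w in vec.items():
            total[k] = total.get(k, 0.0) + w
    return {k: w / len(vectors) for k, w in total.items()} if vectors else {}


def diversity_score(comment, selected, weights):
    """Weighted sum over criteria of (1 - cosine to the selected set's centroid)."""
    score = 0.0
    for criterion, weight in weights.items():
        center = centroid([s[criterion] for s in selected])
        score += weight * (1.0 - cosine(comment[criterion], center))
    return score


# Hypothetical comments: same content, opposite sentiment classes.
a = {"content": {"tax": 1.0}, "sentiment": {"-2": 1.0}}
b = {"content": {"tax": 1.0}, "sentiment": {"3": 1.0}}
weights = {"content": 0.5, "sentiment": 0.5}
score_b = diversity_score(b, [a], weights)
```

With these inputs, b is identical to a in content and fully different in sentiment, so only the sentiment criterion contributes to its score.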

Page 10: Arcomem training diversification

Algorithm (MAXSUM)
  • Initially
      • Empty diverse result set
      • All comments belong to the candidate set
  • Arbitrary insertion of a candidate comment into the result set
  • Greedy construction heuristic
      • Compare each candidate comment with the centroid (average) of the current result set
  • Finish after (k-1) iterations
      • k comments are inserted
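
The greedy construction above can be sketched as below, under simplifying assumptions: a single criterion with sparse dict vectors, the "arbitrary" first pick taken to be the first comment, and the candidate with the lowest cosine similarity to the centroid chosen at each step. The function name greedy_diversify is illustrative and this is not claimed to be the presenters' exact procedure.

```python
import math


def cosine(u, v):
    """Cosine similarity of two sparse vectors given as {key: weight} dicts."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def centroid(vectors):
    """Average of a non-empty list of sparse vectors."""
    total = {}
    for vec in vectors:
        for k, w in vec.items():
            total[k] = total.get(k, 0.0) + w
    return {k: w / len(vectors) for k, w in total.items()}


def greedy_diversify(vectors, k):
    """Return the indices of up to k comments, chosen greedily."""
    if not vectors or k <= 0:
        return []
    candidates = list(range(len(vectors)))
    # Arbitrary first insertion; here simply the first comment (an assumption).
    selected = [candidates.pop(0)]
    # k-1 further iterations, one insertion each.
    while candidates and len(selected) < k:
        center = centroid([vectors[i] for i in selected])
        # Pick the candidate least similar to the current result set.
        best = min(candidates, key=lambda i: cosine(vectors[i], center))
        candidates.remove(best)
        selected.append(best)
    return selected


# Hypothetical content vectors for four comments.
comments = [{"tax": 1.0}, {"tax": 0.9, "vote": 0.1}, {"war": 1.0}, {"vote": 1.0}]
picked = greedy_diversify(comments, k=3)
```

In this toy input the second comment nearly repeats the first, so it is the one left out when three comments are requested.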

Page 11: Arcomem training diversification

Evaluation
  • Comparison of the methods’ coverage of the different information nuggets they contain
      • Baseline: diversification based only on content
      • Proposed method (combination of multiple criteria)
  • Proposed methods outperform the baseline
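
The slides do not define the coverage measure. One plausible reading, shown here purely as an assumption, is the fraction of all known information nuggets that appear in at least one selected comment. The function name nugget_coverage and the nugget identifiers are hypothetical.

```python
def nugget_coverage(selected, all_nuggets):
    """Fraction of nuggets covered by at least one selected comment.

    selected: list of sets of nugget ids, one set per selected comment.
    all_nuggets: every nugget id identified for the article.
    """
    universe = set(all_nuggets)
    if not universe:
        return 0.0
    covered = set()
    for nuggets in selected:
        # Ignore ids that are not part of the article's nugget list.
        covered |= set(nuggets) & universe
    return len(covered) / len(universe)


# Hypothetical example: two selected comments cover three of four nuggets.
coverage = nugget_coverage([{"a", "b"}, {"b", "c"}], ["a", "b", "c", "d"])
```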

Page 12: Arcomem training diversification

Framework - Implementations
  • A desktop Java application retrieving news articles and comments
      • Comments stored in a MySQL database
      • News and comments obtained via the NY Times API
  • Arcomem offline module for calculating diverse WebObjects of WebResources