Building a Microblog Corpus for Search Result Diversification
Ke Tao, Claudia Hauff, Geert-Jan Houben. Web Information Systems, TU Delft, the Netherlands
AIRS 2013, Singapore, December 10


Page 1: Building a Microblog Corpus for Search Result Diversification

Delft University of Technology

Building a Microblog Corpus for Search Result Diversification

AIRS 2013, Singapore, December 10

Ke Tao, Claudia Hauff, Geert-Jan Houben
Web Information Systems, TU Delft, the Netherlands

Page 2: Building a Microblog Corpus for Search Result Diversification


1. Diversification needed: users tend to issue shorter, underspecified queries when searching on microblogs.

2. Lack of a corpus for diversification studies: how can one build a microblog corpus to evaluate diversification approaches?

Research Challenges

[Diagram: a query is issued over the tweet corpus and produces a search result; a diversification strategy yields a diversified result, which is evaluated against diversity judgments.]

Page 3: Building a Microblog Corpus for Search Result Diversification


Methodology Overview

1. Data Source: How can we find a good, representative Twitter dataset?

2. Topic Selection: How do we select the search topics?

3. Tweet Pooling: Which tweets are we going to annotate?

4. Diversity Annotation: How do we annotate the tweets with diversity characteristics?

Page 4: Building a Microblog Corpus for Search Result Diversification


Methodology – Data source

• From where? The Twitter sampling API, which provides around 1% of the whole Twitter stream

• Duration: from February 1st to March 31st, 2013, coinciding with the TREC 2013 Microblog Track

• Tools: the Twitter Public Stream Sampling Tool by @lintool, run on Amazon EC2 in the EU

TREC 2013 Microblog Guidelines: https://github.com/lintool/twitter-tools/wiki/TREC-2013-Track-Guidelines

Twitter Public Stream Sampling Tool: https://github.com/lintool/twitter-tools/wiki/Sampling-the-public-Twitter-stream

Page 5: Building a Microblog Corpus for Search Result Diversification


Methodology – Topic Selection

How do we select the search topics?
• Candidates from the Wikipedia Current Events Portal
• Sufficient importance: more than local interest
• Temporal characteristics: evenly distributed over the two-month period, enabling further analysis of temporal behavior
• Selected: 50 topics on trending news events

Wikipedia Current Events Portal: http://en.wikipedia.org/wiki/Portal:Current_events

Page 6: Building a Microblog Corpus for Search Result Diversification


Methodology – Tweet Pooling – 1/2

Maximize coverage & minimize effort
• Challenge in adopting existing pooling solutions: we lack access to multiple retrieval systems
• Topic expansion: a manually created query for each topic, aiming at maximum coverage of tweets relevant to the topic
• Duplicate filtering: filter out duplicate tweets (cosine similarity > 0.9)
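The pooling-stage duplicate filter can be sketched as follows. The slides only specify the cosine-similarity > 0.9 criterion; the bag-of-words tokenization and the `filter_duplicates` helper are assumptions for illustration:

```python
from collections import Counter
import math

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def filter_duplicates(tweets, threshold=0.9):
    """Keep a tweet only if it is not near-identical (cosine > threshold)
    to any tweet already kept. Tokenization is a simple lowercase split,
    which is an assumption, not the paper's exact preprocessing."""
    kept, vectors = [], []
    for text in tweets:
        vec = Counter(text.lower().split())
        if all(cosine(vec, v) <= threshold for v in vectors):
            kept.append(text)
            vectors.append(vec)
    return kept
```

A near-verbatim retweet of an already-pooled tweet would be dropped, while a tweet on a different aspect of the topic survives.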

Page 7: Building a Microblog Corpus for Search Result Diversification


Methodology – Tweet Pooling – 2/2

Topic Expansion Example

Topic: Hillary Clinton steps down as United States Secretary of State

[Diagram: the possible variety of expressions for this topic]

Page 8: Building a Microblog Corpus for Search Result Diversification


Methodology – Diversity Annotation

Annotation Effort
• 500 tweets for each topic
• No identification of subtopics beforehand
• Tweets about the general topic only (= no added value) are judged non-relevant
• No further checks on URLs, since links may become unavailable over time
• 50 topics split between 2 annotators: a subjective process (comparative results later)
• 3 topics dropped, e.g. for not having enough diversity or relevant documents

Page 9: Building a Microblog Corpus for Search Result Diversification


Topic Analysis – The Topics and Subtopics 1/2

                       All topics   Annotator 1   Annotator 2
Avg. #subtopics            9.27          8.59          9.88
Std. dev. #subtopics       3.88          5.11          2.14
Min. #subtopics            2             2             6
Max. #subtopics            21            21            13

On average, we found about 9 subtopics per topic. The subjectivity of the annotation is confirmed by the difference between the two annotators in the standard deviation of the number of subtopics per topic.

Page 10: Building a Microblog Corpus for Search Result Diversification


Topic Analysis – The Topics and Subtopics 2/2

The annotators spent 6.6 seconds on average to annotate a tweet. Most tweets are assigned exactly one subtopic.

Page 11: Building a Microblog Corpus for Search Result Diversification


Topic Analysis – The Relevance Judgments 1/2

• Topics differ in diversity
  • 25 topics have fewer than 100 tweets with subtopics
  • 6 topics have more than 350 tweets with subtopics
• Difference between the 2 annotators
  • On average, 96 vs. 181 tweets with a subtopic assignment

Page 12: Building a Microblog Corpus for Search Result Diversification


Topic Analysis – The Relevance Judgments 2/2

• Temporal persistence
  • Some topics are active during the entire timespan
    • The Northern Mali conflict
    • The Syrian civil war
  • Others last as little as 24 hours
    • The BBC Twitter account being hacked
    • The Eiffel Tower being evacuated due to a bomb threat

Page 13: Building a Microblog Corpus for Search Result Diversification


Topic Analysis – Diversity Difficulty

                                 All topics   Annotator 1   Annotator 2
Avg. diversity difficulty            0.71         0.72          0.70
Std. dev. diversity difficulty       0.07         0.06          0.07

• The difficulty of diversifying the search results depends on
  • the ambiguity or under-specification of the topics
  • the diversity of content available in the corpus
• Golbus et al. proposed the diversity difficulty measure dd
  • dd > 0.9: an arbitrarily ranked list is likely to cover all subtopics
  • dd < 0.5: subtopics are hard to discover with an untuned retrieval system

Golbus et al.: Increasing evaluation sensitivity to diversity. Information Retrieval (2013) 16

Page 14: Building a Microblog Corpus for Search Result Diversification


Topic Analysis – Diversity Difficulty (cont.)

• dd > 0.9 also indicates a diverse query
• Difference between long- and short-term topics
  • Topics with a longer timespan (> 50 days) are easier in terms of diversity difficulty (0.73 vs. 0.70)

Golbus et al.: Increasing evaluation sensitivity to diversity. Information Retrieval (2013) 16

Page 15: Building a Microblog Corpus for Search Result Diversification


Diversification by De-Duplicating – 1/6

Lower redundancy, but higher diversity?

• In previous work, we were motivated by the finding that about 20% of search results duplicate other results to some extent
• We therefore proposed removing duplicates to lower the redundancy in the top-k results
  • Implemented with a machine learning framework
  • Makes use of syntactical, semantic, and contextual features
  • Eliminates the lower-ranked member of each identified duplicate pair from the search result

Tao et al.: Groundhog Day: Near-duplicate Detection on Twitter. In Proceedings of the 22nd International World Wide Web Conference.

Can this also achieve higher diversity?
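The rank-preserving elimination step can be sketched as below. The `is_duplicate` predicate is a hypothetical interface standing in for the learned classifier over syntactical, semantic, and contextual features; it is not the actual API of the Groundhog Day framework:

```python
def deduplicate_ranking(ranked, is_duplicate):
    """Given a result list ordered best-first and a pairwise duplicate
    predicate, drop the lower-ranked member of every duplicate pair,
    keeping the relative order of the survivors intact."""
    kept = []
    for doc in ranked:  # iterate in rank order, best first
        # doc survives only if no higher-ranked survivor duplicates it
        if not any(is_duplicate(doc, earlier) for earlier in kept):
            kept.append(doc)
    return kept
```

With a classifier trained on the Sy/Se/Co feature sets, `is_duplicate` would wrap the model's prediction for a pair of tweets.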

Page 16: Building a Microblog Corpus for Search Result Diversification


Diversification by De-Duplicating – 2/6

Measures

• We adopt the following measures:
  • alpha-(n)DCG
  • Precision-IA
  • Subtopic Recall
  • Redundancy

Clarke et al.: Novelty and Diversity in Information Retrieval Evaluation. In Proceedings of SIGIR, 2008.
Agrawal et al.: Diversifying Search Results. In Proceedings of WSDM, 2009.
Zhai et al.: Beyond Independent Relevance: Methods and Evaluation Metrics for Subtopic Retrieval. In Proceedings of SIGIR, 2003.
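Two of these measures are simple enough to sketch. Assuming each result is represented by the set of subtopics it was judged to cover (an assumed data layout), alpha-DCG (the unnormalized core of alpha-nDCG, with the usual alpha = 0.5) and subtopic recall can be computed as:

```python
import math

def alpha_dcg(ranking, alpha=0.5):
    """ranking: per-rank sets of covered subtopic ids, best-first.
    Each repeated subtopic contributes a gain discounted by (1 - alpha)
    for every earlier occurrence (Clarke et al., SIGIR 2008)."""
    seen = {}       # subtopic id -> number of times covered so far
    score = 0.0
    for k, subtopics in enumerate(ranking, start=1):
        gain = sum((1 - alpha) ** seen.get(s, 0) for s in subtopics)
        score += gain / math.log2(k + 1)   # log discount by rank
        for s in subtopics:
            seen[s] = seen.get(s, 0) + 1
    return score

def subtopic_recall(ranking, n_subtopics):
    """Fraction of all subtopics covered anywhere in the ranking
    (Zhai et al., SIGIR 2003)."""
    covered = set().union(*ranking) if ranking else set()
    return len(covered) / n_subtopics
```

Full alpha-nDCG divides alpha-DCG by the score of an ideal reordering of the judged results, which is NP-hard to compute exactly and is usually approximated greedily.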

Page 17: Building a Microblog Corpus for Search Result Diversification


Diversification by De-Duplicating – 3/6

Baseline and De-Duplication Strategies

• Baseline strategies
  • Automatic run: standard queries (no more than 3 terms)
  • Filtered automatic run: duplicates filtered out based on cosine similarity
  • Manual run: manually created complex queries, with automatic filtering
• De-duplication strategies
  • Sy = syntactical, Se = semantic, Co = contextual features
  • Four strategies: Sy, SyCo, SySe, SySeCo

Page 18: Building a Microblog Corpus for Search Result Diversification


Diversification by De-Duplicating – 4/6

Overall comparison

Overall, the de-duplication strategies did achieve lower redundancy. However, they did not achieve higher diversity.

Page 19: Building a Microblog Corpus for Search Result Diversification


Diversification by De-Duplicating – 5/6

Influence of Annotator Subjectivity

Page 20: Building a Microblog Corpus for Search Result Diversification


Diversification by De-Duplicating – 5/6

Influence of Annotator Subjectivity

The same general trends hold for both annotators. The higher alpha-nDCG scores for Annotator 2 are explained by Annotator 2 judging more documents as relevant on average.

Page 21: Building a Microblog Corpus for Search Result Diversification


Diversification by De-Duplicating – 6/6

Influence of Temporal Persistence

Page 22: Building a Microblog Corpus for Search Result Diversification


Diversification by De-Duplicating – 6/6

Influence of Temporal Persistence

The de-duplication strategies can help for long-term topics, whose vocabulary is richer; short-term topics use only a small set of terms.

Page 23: Building a Microblog Corpus for Search Result Diversification


Conclusions

• What we have done:
  • Created a microblog corpus for search result diversification
  • Conducted a comprehensive analysis and showed its suitability
  • Confirmed considerable subjectivity among annotators, although the trends w.r.t. the different evaluation measures were largely independent of the annotator
• We have made the corpus available via:
  • http://wis.ewi.tudelft.nl/airs2013/
• What we will do:
  • Apply diversification approaches that have been shown to perform well in the Web search setting
  • Propose diversification approaches specifically designed for search on microblogging platforms

Page 24: Building a Microblog Corpus for Search Result Diversification


Thank you!

Ke Tao @taubau

@wisdelft
http://ktao.nl