Leveraging Learning To Rank in an Optimization Framework for Timeline Summarization

Giang Binh Tran, Anh Tuan Tran, Nam Khanh Tran, Mohammad Alrifai, Nattiya Kanhabua

L3S Research Center & University of Hannover, Germany

SIGIR Workshop TAIA’13, Dublin, August 1, 2013



Timeline Summarization

News Topic: Arab Spring
What happened, and how?
A summary with temporal structure (a list of daily key events). Example:

• 11 Feb 2011: Egyptian President Hosni Mubarak resigned
• 15 Feb 2011: Protests broke out against Muammar Gaddafi’s regime
• 03 Mar 2011: Egyptian Prime Minister Ahmed Shafik resigned


Example

(Figure: an example timeline. Labels: Day; Summaries of key events; Important dates.)


Related work
• Timeline Summarization:
  • Chieu et al. (SIGIR’04): burstiness + interest score (~ sum of TF×IDF similarity to neighbor sentences)
  • Yan et al. (SIGIR’11): topic relevancy + coverage + coherence + diversity, based on word distribution

Both work in an unsupervised manner.

Our approach: learn from expert-created timeline summaries, and optimize with different criteria


Sentence Ranking Model

(Diagram. Labels: Manually created Timelines; Learning Algorithms; Rs; Ranked Sentences; Optimization; TIMELINE.)

TIMELINE

Date        Summary
2011-08-29  Eni CEO meets with members of the rebel government.
2011-09-08  Gaddafi vows to fight on
…….         ……


Learning to Rank sentences
• Assumption
  • Day summaries are created from the input news articles (e.g. BBC timelines from BBC news articles)
• Generate training data automatically
  • Relevance R(s) ~ Textual Similarity(s, DS)
  • A sentence with higher similarity to the Day Summary (DS) is more likely to be selected as part of the summary
• Feature extraction
  • Surface: length, stop/non-stop words, #pronouns, position
  • Coherence: #temporal/logical/causal signals
  • Topic: sum/avg TF-IDF, log-odds, cross entropy, semantic similarity to the document abstract
  • Temporal: popularity, has temporal expression
  • Event: probability of describing the main events, in terms of top word pairs
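The automatic labeling step, Relevance R(s) ~ Textual Similarity(s, DS), can be sketched as follows. The slides do not name the similarity measure, so this minimal sketch assumes cosine similarity over raw term counts; `tokenize`, `cosine_similarity`, and `label_sentences` are hypothetical names used only for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between two texts using raw term counts (an assumed measure)."""
    va, vb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0


def label_sentences(sentences, day_summary):
    """Return (sentence, relevance) pairs, most relevant first."""
    scored = [(s, cosine_similarity(s, day_summary)) for s in sentences]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

The resulting (sentence, relevance) pairs could then serve as training targets for whichever learning-to-rank algorithm is used.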


Optimize Timeline Generation
N-gram-based computation
• Novelty: avoid duplication in a day summary when selecting s
• Continuity: generate the timeline as a flow of information (connecting the dots between day summaries)

Maximize the objective using dynamic programming
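The objective itself is not preserved in the slide text, so the following is only an illustration of how dynamic programming can trade ranking score against continuity. It assumes a simplified setting (one sentence per day, total score = sum of relevance plus a weighted continuity term between consecutive days) and ignores within-day novelty. `best_timeline`, `relevance`, `continuity`, and `weight` are hypothetical names, not the authors' formulation.

```python
def best_timeline(days, relevance, continuity, weight=1.0):
    """Pick one sentence per day maximizing total relevance plus
    weight * continuity between consecutive picks (Viterbi-style DP).

    days: list of non-empty lists of candidate sentences, in date order.
    relevance: function sentence -> float (e.g. a learned ranking score).
    continuity: function (previous_sentence, sentence) -> float.
    """
    if not days:
        return []
    # best[i][j]: best total score of a partial timeline ending in days[i][j].
    best = [[relevance(s) for s in days[0]]]
    back = [[None] * len(days[0])]
    for i in range(1, len(days)):
        row, ptr = [], []
        for s in days[i]:
            # Best predecessor among the previous day's candidates.
            k = max(
                range(len(days[i - 1])),
                key=lambda p: best[i - 1][p] + weight * continuity(days[i - 1][p], s),
            )
            row.append(
                best[i - 1][k] + weight * continuity(days[i - 1][k], s) + relevance(s)
            )
            ptr.append(k)
        best.append(row)
        back.append(ptr)
    # Trace back from the best candidate on the last day.
    j = max(range(len(days[-1])), key=lambda p: best[-1][p])
    picks = []
    for i in range(len(days) - 1, -1, -1):
        picks.append(days[i][j])
        j = back[i][j]
    return picks[::-1]
```

With `weight=0` this reduces to taking the top-ranked sentence of each day independently; a larger `weight` favors sentences that connect to the previous day's pick.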


Evaluation
Dataset: Timeline17 (www.l3s.de/~gtran/timeline)
• 4650 articles collected from well-known news agencies (e.g., BBC, CNN, ...)
• 17 timelines from 9 topics: BP Oil Spill, Haiti Earthquake, H1N1, Financial Crisis, Libyan War, ...
• Leave-one-out strategy
• "In-house" experiment: a timeline generated from BBC news should be compared against the BBC expert-generated timeline
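A leave-one-out split can be sketched as below. The slide does not say whether the held-out unit is a single timeline or a whole topic; this sketch assumes one timeline is held out at a time, and `leave_one_out` is a hypothetical helper name.

```python
def leave_one_out(timelines):
    """Yield (training_timelines, held_out_timeline) pairs, one per timeline."""
    for i, held_out in enumerate(timelines):
        yield timelines[:i] + timelines[i + 1:], held_out
```

Each pair would be used to train the ranking model on the training timelines and to score the generated timeline against the held-out one.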


Metric
ROUGE: n-gram-based measurement (overlapping n-grams between the generated day summary and the expert-created day summaries; Precision/Recall/F-measure)
• ROUGE-1 uses unigrams
• ROUGE-2 uses bigrams
• ROUGE-S* uses skip-bigrams

Baselines
• Chieu et al. (SIGIR 2004)
• MEAD: traditional multi-document summarization system
• ETS (Yan et al., SIGIR 2011)
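The n-gram overlap behind ROUGE-1 and ROUGE-2 can be illustrated with a simplified computation. This is a sketch, not the official ROUGE toolkit: it assumes whitespace tokenization, lowercasing, a single reference summary, and no stemming or stop-word handling, and it does not cover ROUGE-S*. `ngrams` and `rouge_n` are hypothetical names.

```python
from collections import Counter


def ngrams(tokens, n):
    """Multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, reference, n=1):
    """Return (precision, recall, f1) from clipped n-gram overlap."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    # Counter intersection keeps the minimum count of each shared n-gram.
    overlap = sum((cand & ref).values())
    precision = overlap / sum(cand.values()) if cand else 0.0
    recall = overlap / sum(ref.values()) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return precision, recall, f1
```

For example, `rouge_n(generated, expert, n=2)` gives the bigram-based scores for one generated day summary against one expert-created day summary.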


Michael Jackson Death trial, example

2009-07-28 Dr Murray's home is also raided.
2011-05-02 The trial is delayed again, as Dr Murray's lawyers ask for extra time to prepare for new prosecution witnesses.
-----------------------
2009-07-29 Court documents filed in Nevada show that Dr Murray is heavily in debt, owing more than $780,000 in judgements against him and his medical practice, outstanding mortgage payments on his house, child support and credit cards.

BBC Timeline (ground truth)

2009-07-28 (Ok) Police raid Jackson doctor's home
2011-05-02 In Los Angeles, lawyers for Dr Conrad Murray had asked for a delay to prepare for new prosecution witnesses.
----------------------
2009-07-29 (Bad) Michael Flanagan of the DEA describes the operation Police have searched the Las Vegas home and offices of Michael Jackson's doctor as part of a manslaughter investigation into the singer's death.

Ours


H1N1: Continuity vs. Non-Continuity

Without Continuity

2009-04-25 The World Health Organisation has warned countries to be on alert for any unusual flu outbreaks after a swine flu virus was implicated in possibly dozens of human deaths in Mexico.
2009-04-26 The World Health Organisation said at least 81 people had died from severe pneumonia caused by the flu-like illness in Mexico.

With Continuity

2009-04-25 The World Health Organisation has warned countries to be on alert for any unusual flu outbreaks after a swine flu virus was implicated in possibly dozens of human deaths in Mexico.
2009-04-26 The influenza strain that has struck Mexico and the United States involves, in many cases, a never-before-seen strain of the H1N1 virus.


Thank you very much!


Novelty computation (s: sentence, S: set of sentences)

Continuity computation (s: sentence, DS(d_{i-1}): the previous day summary)
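The formulas for these two computations are not preserved in the slide text. As a rough illustration only, an n-gram-based novelty and continuity score could look like the sketch below. It assumes Jaccard overlap of bigram sets, with novelty defined as one minus the highest overlap with any already selected sentence; these definitions and the names `ngram_set`, `overlap`, `novelty`, and `continuity` are assumptions, not the authors' formulas.

```python
def ngram_set(text, n=2):
    """Set of word n-grams in a text (lowercased, whitespace-tokenized)."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def overlap(a, b, n=2):
    """Jaccard overlap between the n-gram sets of two texts."""
    ga, gb = ngram_set(a, n), ngram_set(b, n)
    union = ga | gb
    return len(ga & gb) / len(union) if union else 0.0


def novelty(s, selected, n=2):
    """1 minus the highest overlap of s with any sentence in the set S (`selected`)."""
    return 1.0 - max((overlap(s, t, n) for t in selected), default=0.0)


def continuity(s, previous_day_summary, n=2):
    """Overlap of s with the previous day summary DS(d_{i-1})."""
    return overlap(s, previous_day_summary, n)
```

Under these assumed definitions, a sentence that repeats an already selected one gets novelty 0, and a sentence sharing many bigrams with the previous day summary gets a high continuity score.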