Dissertation Defense Multiple Alternative Sentence Compressions as a Tool for Automatic Summarization Tasks David Zajic University of Maryland College Park Department of Computer Science November 28, 2006

Dissertation Defense

Multiple Alternative Sentence Compressions as a Tool for

Automatic Summarization Tasks

David Zajic

University of Maryland College Park

Department of Computer Science

November 28, 2006

Intuition

• A newspaper editor was found dead in a hotel room in this Pacific resort city a day after his paper ran articles about organized crime and corruption in the city government. The body of the editor, Misael Tamayo Hernández, of the daily El Despertar de la Costa, was found early Friday with his hands tied behind his back in a room at the Venus Motel…

Intuition

• A newspaper editor was found dead in a hotel room in this Pacific resort city a day after his paper ran articles about organized crime and corruption in the city government.

• Newspaper editor found dead in Pacific resort city

Intuition

• A newspaper editor was found dead in a hotel room in this Pacific resort city a day after his paper ran articles about organized crime and corruption in the city government.

• Paper ran articles about corruption in government

Intuition

• A newspaper editor was found dead in a hotel room in this Pacific resort city a day after his paper ran articles about organized crime and corruption in the city government.

• Hernández, Zihuatanejo: Newspaper editor found dead in Pacific resort city

Intuition

• A newspaper editor was found dead in a hotel room in this Pacific resort city a day after his paper ran articles about organized crime and corruption in the city government.

• Newspaper Editor Killed in Mexico
  – (A) Newspaper Editor (was) killed in Mexico

Talk Roadmap

• Introduction
• Automatic Summarization under MASC framework
  – Single Document and evaluation
    • HMM Hedge, Trimmer, Topiary
  – Extending to Multi-document Summarization
• Review of Evaluations
• Conclusion
• Future Work

Introduction

• Automatic Summarization
  – Distillation of important information from a source into an abridged form
  – Extractive Summarization: select sentences with important content from the document
  – Abstractive Summarization
  – Limitations
    • Sentences contain mixture of relevant, non-relevant information
    • Sentences partially redundant to rest of summary

Contributions

• Two implementations of select-words-in-order
  – Statistical Method: HMM Hedge (Headline Generation)
  – Syntactic Method: Trimmer

Contributions

• Multiple Alternative Sentence Compressions (MASC)
  – Framework for Automatic Text Summarization
  – Sentence Compression: rewriting a sentence in an abridged form
  – Generation of many compressions of source sentences to serve as candidates
  – Select from candidates using weighted features to generate summary
  – Environment for testing hypotheses

Hypotheses

• Extractive summarization systems can create better summaries using larger pool of compressed candidates

• Sentence selectors choose better summary candidates using larger sets of features

• For Headline Generation, combination of fluent text and topics better than either alone

Contributions

• Sentence Compression
  – HMM Hedge
  – Trimmer
  – Topiary
• Sentence Selection
  – Lead Sentence for Headline Generation
  – Maximal Marginal Relevance for Multi-document Summarization

Summarization Tasks

• Single Document Summarization
  – Very short: Headline Generation
  – Single sentence
  – 75 characters
  – DUC2002, 2003, 2004
• Query-focused Multi-Document Summarization
  – Multiple sentences
  – 100 – 250 words
  – DUC2005, 2006

Headline Generation

• Newspaper Headlines
  – Natural example of human summarization
  – Three criteria for a good headline:
    • Summarize a story
    • Make people want to read it
    • Fit in specified space
  – Headlinese: compressed form of English

Introduction

• Headline Types
  – Eye-Catcher: Under God Under Fire
  – Indicative: Pledge of Allegiance
  – Informative: U.S. Court Decides Pledge of Allegiance Unconstitutional

Talk Roadmap

• Introduction
• Automatic Summarization under MASC framework
  – HMM Hedge, Trimmer, Topiary
  – Single Document, Multi-document
  – Experimental evidence supporting hypotheses
• Review of Evaluations
• Conclusion
• Future Work

General Architecture

[Diagram components: Document, sentence selection, Compression (HMM Hedge, Trimmer, Topiary), Candidates, Candidate Selection (Maximal Marginal Relevance), Summary]

Sentence Selection

• Select sentences to be compressed

• Lead sentence, first 5

Sentence Compression

• Selecting words in order from a sentence
  – Or window of words
• Feasibility Studies (my work!)
  – Describe the task in a backup slide
  – Humans can almost always do this for written news
  – Bias for words from within a single sentence
  – Bias for words early in document
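The select-words-in-order constraint can be checked mechanically. The sketch below is only an illustration, not code from the dissertation: the function name `is_in_order_selection` and the simple whitespace tokenization are assumptions. It tests whether every word of a compression appears in the source sentence in the same order.

```python
def is_in_order_selection(compression: str, sentence: str) -> bool:
    """Return True if the compression's words occur in the sentence in the same order."""
    # Hypothetical tokenization: split on whitespace, drop trailing punctuation, lowercase.
    source = iter(word.strip(".,").lower() for word in sentence.split())
    # `word in source` advances the iterator, so matches must occur left to right.
    return all(word.lower() in source for word in compression.split())


SENTENCE = (
    "A newspaper editor was found dead in a hotel room in this Pacific resort "
    "city a day after his paper ran articles about organized crime and "
    "corruption in the city government."
)

print(is_in_order_selection("Newspaper editor found dead in Pacific resort city", SENTENCE))
print(is_in_order_selection("Newspaper Editor Killed in Mexico", SENTENCE))
```

The two compressions from the Intuition slides pass this check, while "Newspaper Editor Killed in Mexico" does not, because it uses words that never appear in the sentence.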

Sentence Compression

• Human study showed potential for saving space by using sentence compression in multi-document summarization

• Subject was shown 103 sentences, relevant to 39 queries, asked to make relevance judgments on 430 compressed versions

• Potential for 16.7% reduction by word count, 17.6% reduction by characters, with no loss of relevance

Candidate Selection

• Maximal Marginal Relevance (MMR) (Carbonell and Goldstein, 1998)
  – All candidates given scores: linear combination of static and dynamic features
  – Weights optimized for Rouge 1 recall, using BBN’s Optimizer
  – Add highest-scoring candidate to summary
    • Other compressions of source sentence removed from pool
  – Recalculate dynamic features, rescore candidates
  – Iterate until summary is complete
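The selection loop above can be sketched as follows. This is a minimal illustration under stated assumptions, not the system's implementation: the candidate dictionary layout, the single static feature `relevance`, the word-overlap stand-in for the dynamic redundancy feature, and the word budget `max_words` are all hypothetical.

```python
def score(candidate, summary_words, weights):
    """Linear combination of static features and one dynamic feature (redundancy)."""
    words = candidate["text"].lower().split()
    # Hypothetical dynamic feature: fraction of words already present in the summary.
    overlap = sum(w in summary_words for w in words) / len(words)
    features = dict(candidate["static"], redundancy=overlap)
    return sum(weights[name] * value for name, value in features.items())


def mmr_select(pool, weights, max_words):
    """Greedily add the best candidate, drop its sibling compressions, rescore, repeat."""
    pool = list(pool)
    summary, summary_words, used = [], set(), 0
    while pool:
        fitting = [c for c in pool if used + len(c["text"].split()) <= max_words]
        if not fitting:
            break  # summary is complete: nothing else fits the budget
        # Dynamic features are recomputed here because summary_words has changed.
        best = max(fitting, key=lambda c: score(c, summary_words, weights))
        summary.append(best["text"])
        used += len(best["text"].split())
        summary_words.update(best["text"].lower().split())
        # Other compressions of the same source sentence leave the pool.
        pool = [c for c in pool if c["source"] != best["source"]]
    return summary


POOL = [
    {"source": 1, "text": "Illegal fireworks injured hundreds of people", "static": {"relevance": 0.9}},
    {"source": 1, "text": "Illegal fireworks started six fires", "static": {"relevance": 0.8}},
    {"source": 2, "text": "Fireworks injured hundreds", "static": {"relevance": 0.85}},
    {"source": 3, "text": "Officials banned sales", "static": {"relevance": 0.5}},
]
WEIGHTS = {"relevance": 1.0, "redundancy": -1.0}
print(mmr_select(POOL, WEIGHTS, max_words=10))
```

In this toy run the second pick is the lower-relevance candidate from source 3, because the candidate from source 2 repeats words that are already in the summary.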

Talk Roadmap

• Introduction
• Automatic Summarization under MASC framework
  – HMM Hedge, Trimmer, Topiary
  – Single Document, Multi-document
  – Experimental evidence supporting hypotheses
• Review of Evaluations
• Conclusion
• Future Work

HMM Hedge Architecture: Single Compression

[Diagram components: Document, Part of Speech Tagger, Verb Tags, Headline Language Model, General Language Model, HMM Hedge, Selection, Summary]

HMM Hedge: Noisy Channel Model

• Underlying method: select words in order
• Sentences are observed data
• Headlines are unobserved data
• Noisy channel adds words to headlines to create sentences

Headline: President signed legislation

Sentence: On Tuesday the President signed the controversial legislation at a private ceremony

HMM Hedge: Noisy Channel Model

• Probability of Headline estimated with bigram model of Headlinese

• Probability of observed Sentence given unobserved Headline (the channel model) estimated by unigram model of General English
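Written out, the two estimates combine in the standard noisy-channel decomposition. The block below is a sketch of that decomposition, not a formula taken from the slides. The notation is assumed: $H = h_1 \dots h_n$ is a headline, $S$ is the observed sentence, $h_0$ is a start symbol, and the channel term is assumed to range only over the sentence words that the channel added (those not selected for the headline).

```latex
\hat{H} = \arg\max_{H} P(H \mid S) = \arg\max_{H} P(H)\, P(S \mid H)

P(H) \approx \prod_{i=1}^{n} P_{\text{Headlinese}}(h_i \mid h_{i-1})

P(S \mid H) \approx \prod_{w \in S \setminus H} P_{\text{General English}}(w)
```

The first line follows from Bayes' rule, since $P(S)$ is constant for a given sentence and does not affect the maximization.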

HMM Hedge

• Viterbi decoding parameters to mimic headlines (give examples; are these model parameters or decoding parameters?)
  – Groups of contiguous words, clumpiness
  – Size of gaps between words, gappiness
  – Sentence position of words
  – Require verb

HMM Hedge

• Adaptation to Multi-candidate compression

• Finds the 5 most likely headlines for summary lengths 5 to 15 words of document sentences

Automatic Evaluation (Give references & examples)

• Recall-Oriented Understudy for Gisting Evaluation (Rouge)
  – Rouge Recall: ratio of matching candidate n-gram count to reference n-gram count
  – Rouge Precision: ratio of matching candidate n-gram count to candidate n-gram count times number of references
  – R1 preferred for single document summarization
  – R2 preferred for multi-document summarization
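The two ratios defined above can be computed as in the sketch below. It is a simplified illustration, not the official Rouge toolkit: the function names, lowercased whitespace tokenization, and clipped per-reference match counting are assumptions, and stemming, stopword handling, and jackknifing are left out.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, references, n=1):
    """Return (recall, precision) for one candidate against several references."""
    cand = ngrams(candidate.lower().split(), n)
    matches = ref_total = 0
    for ref in references:
        ref_counts = ngrams(ref.lower().split(), n)
        # A reference n-gram matches at most as often as it occurs in the candidate.
        matches += sum(min(count, cand[gram]) for gram, count in ref_counts.items())
        ref_total += sum(ref_counts.values())
    # Precision denominator: candidate n-gram count times number of references.
    cand_total = sum(cand.values()) * len(references)
    recall = matches / ref_total if ref_total else 0.0
    precision = matches / cand_total if cand_total else 0.0
    return recall, precision


REFERENCES = ["newspaper editor killed in mexico", "editor found dead in hotel"]
print(rouge_n("newspaper editor found dead", REFERENCES, n=1))  # (0.5, 0.625)
```

In this made-up example, 5 of the 10 reference unigrams are matched, giving recall 0.5, and the precision denominator is 4 candidate unigrams times 2 references.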

HMM Hedge

[Figure: Rouge 1 scores (y-axis, 0.1 to 0.28) against N-best at each length (x-axis, 1 to 5). Series: R1 Recall and R1 Precision for 1, 2, and 3 sentences.]

HMM Hedge

• Features of the linear combination scoring function, with (default weight, optimized weight for fold A) from 5-fold cross validation
  – Word position sum (-0.05, 1.72)
  – Small gaps (-0.01, 1.02)
  – Large gaps (-0.05, 3.70)
  – Clumps (-0.05, -0.17)
  – Sentence position (0, -945)
  – Length in words (1, 42)
  – Length in characters (1, 85)
  – Unigram probability of story words (1, 1.03)
  – Bigram probability of headline words (1, 1.51)
  – Emit probability of headline words (1, 3.60)

HMM Hedge

Fold   Default Weights          Optimized Weights
       R1 recall   R1 Prec.     R1 recall   R1 Prec.
A      0.11214     0.10726      0.24722     0.21482
B      0.11021     0.10231      0.24307     0.21425
C      0.11781     0.10811      0.24129     0.20795
D      0.11993     0.10660      0.16595     0.13454
E      0.11282     0.10003      0.25341     0.21775
Avg    0.11458     0.10486      0.23019     0.19786

Talk Roadmap

• Introduction
• Automatic Summarization under MASC framework
  – HMM Hedge, Trimmer, Topiary
  – Single Document, Multi-document
  – Experimental evidence supporting hypotheses
• Review of Evaluations
• Conclusion
• Future Work

HMM Hedge: Multi-Document Summarization

[Diagram components: Document, Part of Speech Tagger, Verb Tags, Headline Language Model, General Language Model, HMM Hedge, URA, URA Index, Query (optional), Feature Weights, Selection, Summary. Edge labels: "Candidates, HMM Features" and "Candidates, HMM Features, URA Features".]

Candidate Selection

• Static Features
  – Sentence Position
  – Relevance
  – Centrality
  – Compression-specific features
• Dynamic Features
  – Redundancy
  – Count of summary candidates from source document

Relevance and Centrality

• Universal Retrieval Architecture (URA)
  – Infrastructure for information retrieval tasks
• Four score components
  – Candidate Query Relevance: Matching score between candidate and query
  – Document Query Relevance: Lucene similarity score between document and query
  – Candidate Centrality: Average Lucene similarity of candidate to other sentences in document
  – Document Centrality: Average Lucene similarity of document to other documents in cluster

Redundancy: Intuition

• Consider a summary about earthquakes

• “Generated” by topic: Earthquake, seismic, Richter scale

• “Generated” by general language: Dog, under, during

• Sentences with many words “generated” by the topic are redundant

Redundancy: Formal

$$\mathrm{redundancy}(S) = \prod_{s \in S} \left[ \lambda\, P(s \mid D) + (1 - \lambda)\, P(s \mid C) \right]$$

$$P(w \mid D) = \frac{\mathrm{count}(w, D)}{\mathrm{size}(D)}$$

$$P(w \mid C) = \frac{\mathrm{count}(w, C)}{\mathrm{size}(C)}$$
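A small numeric sketch of this score follows. It rests on assumptions the slide does not spell out: D is read as the words of the current summary (the "topic"), C as a general-language corpus, the mixing weight `lam` defaults to 0.5, and the score is a product over the candidate's words. The toy word lists reuse the earthquake example from the intuition slide.

```python
from collections import Counter


def word_prob(word, counts, size):
    """Maximum-likelihood estimate count(w, X) / size(X); 0.0 for an empty collection."""
    return counts[word] / size if size else 0.0


def redundancy(candidate, summary, corpus, lam=0.5):
    """Probability that the candidate's words were generated by the summary/corpus mixture."""
    d_counts, c_counts = Counter(summary), Counter(corpus)
    d_size, c_size = len(summary), len(corpus)
    score = 1.0
    for word in candidate:
        score *= (lam * word_prob(word, d_counts, d_size)
                  + (1 - lam) * word_prob(word, c_counts, c_size))
    return score


SUMMARY = ["earthquake", "seismic", "richter", "scale"]   # hypothetical current summary
CORPUS = ["dog", "under", "during", "earthquake"]         # hypothetical general-language sample

print(redundancy(["earthquake", "seismic"], SUMMARY, CORPUS))  # 0.03125
print(redundancy(["dog", "under"], SUMMARY, CORPUS))           # 0.015625
```

The candidate made of topic words scores higher, that is, it is judged more redundant with the summary than the candidate made of general-language words.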

HMM Hedge Multi-doc

• Placeholder for results of HMM Hedge

Talk Roadmap

• Introduction
• Automatic Summarization under MASC framework
  – HMM Hedge, Trimmer, Topiary
  – Single Document, Multi-document
  – Experimental evidence supporting hypotheses
• Review of Evaluations
• Conclusion
• Future Work

Trimmer

• Underlying method: select words in order
• Parse and Trim
• Rules come from study of Headlinese
  – Different distributions of syntactic structures

Phenomenon                    Headlines   Lead Sent
Preposed adjunct              0%          2.7%
Time expression               1.5%        24%
Noun Phrase Relative Clause   0.3%        3.5%

Trimmer: Mask operation

Trimmer: Mask Outside

Trimmer Single Document

[Diagram components: Document, Parser, Parses, Entity Tagger, Entity Tags, Trimmer, Selection, Summary. Edge label: "Candidates, Trimmer Features".]

Trimmer: Root S

• Select the lowest leftmost S which has NP and VP children, in that order.

[S [S [NP Rebels] [VP agree to talks with government]] officials said Tuesday.]
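The Root S rule can be sketched on a toy parse tree. Everything below is illustrative rather than Trimmer's code: trees are nested tuples of the form `(label, child, ...)` with plain strings as words, "lowest leftmost" is read as the deepest qualifying S with ties going to the first one met in left-to-right order, and the NP and VP labels on "officials said Tuesday" are added by assumption, since the slide leaves that part unbracketed.

```python
def has_np_then_vp(node):
    """True if the node has an NP child followed (not necessarily adjacently) by a VP child."""
    labels = [child[0] for child in node[1:] if isinstance(child, tuple)]
    return "NP" in labels and "VP" in labels[labels.index("NP") + 1:]


def root_s(tree):
    """Return the deepest S with NP and VP children, preferring the leftmost on ties."""
    best, best_depth = None, -1

    def visit(node, depth):
        nonlocal best, best_depth
        if isinstance(node, str):
            return
        # Strict '>' keeps the first (leftmost) node found at any given depth.
        if node[0] == "S" and has_np_then_vp(node) and depth > best_depth:
            best, best_depth = node, depth
        for child in node[1:]:
            visit(child, depth + 1)

    visit(tree, 0)
    return best


def words(node):
    """Flatten a tree back into its words, in order."""
    if isinstance(node, str):
        return [node]
    return [w for child in node[1:] for w in words(child)]


TREE = ("S",
        ("S", ("NP", "Rebels"), ("VP", "agree", "to", "talks", "with", "government")),
        ("NP", "officials"),
        ("VP", "said", "Tuesday"))

print(" ".join(words(root_s(TREE))))  # Rebels agree to talks with government
```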

Trimmer: Preposed Adjunct

• Remove [YP …] preceding first NP inside chosen S

[S [PP According to a now-finalized blueprint described by U.S. officials and other sources] [NP the Bush administration] [VP plans to take complete, unilateral control of a post-Saddam Hussein Iraq]]

Trimmer: Conjunction

• Remove [X][CC][X] or [X][CC][X]

[S Illegal fireworks [VP [VP injured hundreds of people] [CC and] [VP started six fires.]]]

[S A company offering blood cholesterol tests in grocery stores says [S [S medical technology has outpaced state laws,] [CC but] [S the state says the company doesn’t have the proper licenses.]]]

Trimmer

• Adaptation to multi-candidate compression
• Multi-candidate rules
  – Root S
  – Preamble
  – Conjunction

Trimmer

• Multi-candidate Root S
• [S1 [S2 The latest flood crest, the eighth this summer, passed Chongqing in southwest China], and [S3 waters were rising in Yichang, in central China’s Hubei province, on the middle reaches of the Yangtze], state television reported Sunday.]
• Single-candidate version would choose only S2. Multi-candidate Root-S generates all three choices.

Trimmer: Preamble Rule

Trimmer: Preamble Rule

Trimmer: Preamble Rule

Trimmer: Conjunction

• [S Illegal fireworks [VP [VP injured hundreds of people] [CC and] [VP started six fires.]]]

• Illegal fireworks injured hundreds of people

• Illegal fireworks started six fires
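The two candidates above can be reproduced with a small sketch of a multi-candidate conjunction rule. This illustrates the idea only and is not Trimmer's implementation: the nested-tuple tree format `(label, child, ...)` and the function names are assumptions, and only the first [X][CC][X] pattern found is expanded.

```python
def words(node):
    """Flatten a tree back into its words, in order."""
    if isinstance(node, str):
        return [node]
    return [w for child in node[1:] for w in words(child)]


def conjunction_candidates(node):
    """Return trees that keep only the left or only the right conjunct of [X][CC][X]."""
    if isinstance(node, str):
        return []
    kids = list(node[1:])
    for i in range(len(kids) - 2):
        left, cc, right = kids[i:i + 3]
        if (all(isinstance(k, tuple) for k in (left, cc, right))
                and cc[0] == "CC" and left[0] == right[0]):
            # Keep one conjunct; drop the CC and the other conjunct.
            return [(node[0], *kids[:i], keep, *kids[i + 3:]) for keep in (left, right)]
    for i, kid in enumerate(kids):
        variants = conjunction_candidates(kid)
        if variants:
            # Splice each rewritten child back into a copy of this node.
            return [(node[0], *kids[:i], v, *kids[i + 1:]) for v in variants]
    return []


TREE = ("S", "Illegal", "fireworks",
        ("VP",
         ("VP", "injured", "hundreds", "of", "people"),
         ("CC", "and"),
         ("VP", "started", "six", "fires")))

for candidate in conjunction_candidates(TREE):
    print(" ".join(words(candidate)))
```

Running it prints "Illegal fireworks injured hundreds of people" and then "Illegal fireworks started six fires".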

Trimmer: Features

• Selection among Trimmer candidates based on three sets of features
  – L: Length in characters or words
  – R: Counts of rule applications
  – C: Centrality
• Baseline LUL: select longest version under limit

Trimmer: Benefit of using more candidates

[Figure: Rouge 1 Recall (y-axis, 0.15 to 0.27) by Trimmer configuration (x-axis: Trimmer, Trimmer +R, Trimmer +P, Trimmer +C, Trimmer +R+P, Trimmer +R+C, Trimmer +S+C, Trimmer +R+S+C). Series: feature sets LUL, L, R, C, LR, LC, RC, LRC.]

Trimmer: Benefit of using more features

[Figure: Rouge 1 Recall (y-axis, 0.15 to 0.27) by feature set (x-axis: LUL, L, R, C, LR, LC, RC, LRC). Series: Trimmer, Trimmer +R, Trimmer +P, Trimmer +C, Trimmer +R+P, Trimmer +R+C, Trimmer +S+C, Trimmer +R+S+C.]

Talk Roadmap

• Introduction
• Automatic Summarization under MASC framework
  – HMM Hedge, Trimmer, Topiary
  – Single Document, Multi-document
  – Experimental evidence supporting hypotheses
• Review of Evaluations
• Conclusion
• Future Work

Trimmer Multi-Document

[Diagram components: Document, Parser, Parses, Entity Tagger, Entity Tags, Trimmer, URA, URA Index, Query (optional), Feature Weights, Selection, Summary. Edge labels: "Candidates, Trimmer Features" and "Candidates, Trimmer Features, URA Features".]

Trimmer

System             R1 Recall   R1 Prec.   R2 Recall   R2 Prec.
Trimmer MultiDoc   0.38198     0.37617    0.08051     0.07922
HMM MultiDoc       0.37404     0.37405    0.07884     0.07887

Talk Roadmap

• Introduction
• Automatic Summarization under MASC framework
  – HMM Hedge, Trimmer, Topiary
  – Single Document, Multi-document
  – Experimental evidence supporting hypotheses
• Review of Evaluations
• Conclusion
• Future Work

Topiary

• Combines topic terms and fluent text
  – Fluent text comes from Trimmer
  – Topics come from Unsupervised Topic Detection (UTD)
• Single-candidate algorithm
  – Lower Trimmer threshold to make room for highest scoring non-redundant topic term
  – Trim to lower threshold
  – Adjust if topic redundancy changes because of trimming

Topiary Single-Candidate

[Diagram components: Document, Parser, Parses, Entity Tagger, Entity Tags, Unsupervised Topic Detection, Topics, Topiary, Summary]

Topiary, Trimmer, UTD

[Figure: Rouge 1, Rouge 2, Rouge 3, and Rouge 4 scores (y-axis, 0 to 0.3) for First 75 chars, Topiary, Trimmer 2003, Trimmer 2004, and UTD.]

Topiary

• Multi-candidate Algorithm
  – Generate Multi-candidate Trimmer candidates
  – Fill space in all Trimmer candidates with all combinations of non-redundant topics
  – Score and select summary
  – Give an example of how this works.
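One way to picture the "fill space with all combinations" step is the sketch below. It is a hypothetical illustration, not Topiary's code: the "topics: text" output format follows the example on the Intuition slides, a topic counts as redundant when it already appears in the fluent text, and the 75-character limit is taken from the headline task. The scoring and selection step is omitted.

```python
from itertools import combinations


def topiary_candidates(trimmer_candidates, topics, limit=75):
    """Prefix each Trimmer candidate with every topic combination that fits the limit."""
    results = []
    for text in trimmer_candidates:
        # Assumed redundancy test: skip topics whose term already occurs in the text.
        fresh = [t for t in topics if t.lower() not in text.lower()]
        for k in range(len(fresh) + 1):
            for combo in combinations(fresh, k):
                headline = f"{', '.join(combo)}: {text}" if combo else text
                if len(headline) <= limit:
                    results.append(headline)
    return results


TRIMMED = [
    "Newspaper editor found dead in Pacific resort city",
    "Newspaper editor found dead",
]
TOPICS = ["Hernández", "Zihuatanejo", "editor"]  # hypothetical topic terms

for headline in topiary_candidates(TRIMMED, TOPICS):
    print(headline)
```

Each Trimmer candidate yields four headlines here (no topics, either topic alone, both topics); "editor" is never used as a topic because it already appears in the fluent text.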

Topiary Multi-Candidate

[Diagram components: Document, Parser, Parses, Entity Tagger, Entity Tags, Unsupervised Topic Detection, Topics, Topiary, URA, URA Index, Query (optional), Feature Weights, Selection, Summary. Edge labels: "Candidates, Trimmer Features" and "Candidates, Trimmer Features, URA Features".]

DUC 2004 Task 1 Results (Rouge)

[Figure: scores (y-axis, 0 to 0.35) on ROUGE-1, ROUGE-L, ROUGE-W-1.2, ROUGE-2, ROUGE-3, and ROUGE-4 for automatic summaries (numbered systems 1 to 137, with Topiary and the baseline marked) and human references A to H.]

Topiary Evaluation

Rouge Metric     Topiary   Multi-Candidate Topiary
Rouge-1 Recall   0.25027   0.26490
Rouge-2 Recall   0.06484   0.08168*
Rouge-3 Recall   0.02130   0.02805
Rouge-4 Recall   0.00717   0.01105
Rouge-L          0.20063   0.22283*
Rouge-W1.2       0.11951   0.13234*

Talk Roadmap

• Introduction
• Automatic Summarization
  – HMM Hedge, Trimmer, Topiary
    • Single-candidate, MASC versions
  – Multi-document Summarization
    • HMM Hedge, Trimmer
• Evaluation
• Conclusion
• Future Work

Evaluation: Review

• HMM Hedge, Single-document. Rouge-1 recall increases as number of candidates increases

• HMM Hedge, Single-document. Rouge-1 doubles when scored with optimized weights on features

• Trimmer, Single-document. Rouge-1 increases with greater use of multi-candidate rules

• Trimmer, Single-document. Rouge-1 increases with larger set of features

• Topiary, Single-document. Multi-candidate Topiary scores significantly higher on some Rouge metrics than single-candidate Topiary.

• Trimmer scored higher than HMM for Multi-document summarization

Evaluation

Human extrinsic evaluation of HMM, Trimmer, Topiary and First 75

LDC agreement: ~20x increase in speed. Some loss of accuracy.

Relevance Prediction

Baseline First75 char, hard to beat

Talk Roadmap

• Introduction

• Automatic Summarization
  – HMM Hedge, Trimmer, Topiary
  – Multiple Alternative Sentence Compressions (MASC)

• Evaluation

• Conclusion

• Future Work

Contributions

• Use of MASC framework improves performance across summarization tasks and compression sources

• Fluent and informative summaries can be constructed by selecting words in order from sentences. Verified by doing a human study.

• Headlines combining fluent text and topic terms score better than either alone

Future Work

• Enhance redundancy score with paraphrase detection

• Anaphora resolution in candidates

• Expand candidates by sentence merging

• Sentence ordering in multi-sentence summaries

End