Dissertation Defense Multiple Alternative Sentence Compressions as a Tool for Automatic Summarization Tasks David Zajic University of Maryland College

Dissertation Defense

Multiple Alternative Sentence Compressions as a Tool for

Automatic Summarization Tasks

David Zajic

University of Maryland College Park

Department of Computer Science

November 28, 2006

Intuition

• A newspaper editor was found dead in a hotel room in this Pacific resort city a day after his paper ran articles about organized crime and corruption in the city government. The body of the editor, Misael Tamayo Hernández, of the daily El Despertar de la Costa, was found early Friday with his hands tied behind his back in a room at the Venus Motel…

Intuition

• A newspaper editor was found dead in a hotel room in this Pacific resort city a day after his paper ran articles about organized crime and corruption in the city government.

• Newspaper editor found dead in Pacific resort city

Intuition


• Paper ran articles about corruption in government

Intuition


• Hernández, Zihuatanejo: Newspaper editor found dead in Pacific resort city

Intuition


• Newspaper Editor Killed in Mexico
  – (A) Newspaper Editor (was) killed in Mexico

Talk Roadmap

• Introduction
• Automatic Summarization under MASC framework
  – Single Document and evaluation
    • HMM Hedge, Trimmer, Topiary
  – Extending to Multi-document Summarization
• Review of Evaluations
• Conclusion
• Future Work

Introduction

• Automatic Summarization
  – Distillation of important information from a source into an abridged form
  – Extractive Summarization: select sentences with important content from the document
  – Abstractive Summarization
  – Limitations
    • Sentences contain a mixture of relevant and non-relevant information
    • Sentences are partially redundant with the rest of the summary

Contributions

• Two implementations of select-words-in-order
  – Statistical Method: HMM Hedge (Headline Generation)
  – Syntactic Method: Trimmer

Contributions

• Multiple Alternative Sentence Compressions (MASC)
  – Framework for Automatic Text Summarization
  – Sentence Compression: rewriting a sentence in an abridged form
  – Generation of many compressions of source sentences to serve as candidates
  – Select from candidates using weighted features to generate summary
  – Environment for testing hypotheses

Hypotheses

• Extractive summarization systems can create better summaries using larger pool of compressed candidates

• Sentence selectors choose better summary candidates using larger sets of features

• For Headline Generation, combination of fluent text and topics better than either alone

Contributions

• Sentence Compression
  – HMM Hedge
  – Trimmer
  – Topiary
• Sentence Selection
  – Lead Sentence for Headline Generation
  – Maximal Marginal Relevance for Multi-document Summarization

Summarization Tasks

• Single Document Summarization
  – Very short: Headline Generation
  – Single sentence
  – 75 characters
  – DUC2002, 2003, 2004
• Query-focused Multi-Document Summarization
  – Multiple sentences
  – 100 – 250 words
  – DUC2005, 2006

Headline Generation

• Newspaper Headlines
  – Natural example of human summarization
  – Three criteria for a good headline:
    • Summarize a story
    • Make people want to read it
    • Fit in specified space
  – Headlinese: compressed form of English

Introduction

• Headline Types
  – Eye-Catcher: Under God Under Fire
  – Indicative: Pledge of Allegiance
  – Informative: U.S. Court Decides Pledge of Allegiance Unconstitutional

Talk Roadmap


• Automatic Summarization under MASC framework
  – HMM Hedge, Trimmer, Topiary
  – Single Document, Multi-document
  – Experimental evidence supporting hypotheses


General Architecture

[Diagram: Document → Sentence Selection → Compression (HMM Hedge, Trimmer, Topiary) → Candidates → Candidate Selection (Maximal Marginal Relevance) → Summary]

Sentence Selection

• Select sentences to be compressed

• Lead sentence, first 5

Sentence Compression

• Selecting words in order from a sentence
  – Or window of words
• Feasibility Studies (my work!)
  – Describe the task in a backup slide
  – Humans can almost always do this for written news
  – Bias for words from within a single sentence
  – Bias for words early in document

Sentence Compression

• Human study showed potential for saving space by using sentence compression in multi-document summarization

• Subject was shown 103 sentences, relevant to 39 queries, asked to make relevance judgments on 430 compressed versions

• Potential for 16.7% reduction by word count, 17.6% reduction by characters, with no loss of relevance

Candidate Selection

• Maximal Marginal Relevance (MMR) (Carbonell and Goldstein, 1998)
  – All candidates given scores: linear combination of static and dynamic features
  – Weights optimized for Rouge 1 recall, using BBN’s Optimizer
  – Add highest-scoring candidate to summary
    • Other compressions of source sentence removed from pool
  – Recalculate dynamic features, rescore candidates
  – Iterate until summary is complete
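The selection loop can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Candidate` class, the toy `overlap` feature standing in for the dynamic redundancy feature, the single `redundancy_weight`, and the word budget are all hypothetical, and the weights are set by hand rather than optimized.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    source_id: int        # which source sentence this compression came from
    text: str
    static_score: float   # hypothetical pre-weighted sum of static features


def overlap(text: str, summary: list[str]) -> float:
    """Toy dynamic feature: fraction of candidate words already in the summary."""
    words = text.lower().split()
    seen = {w for s in summary for w in s.lower().split()}
    return sum(w in seen for w in words) / len(words) if words else 0.0


def select(candidates: list[Candidate], max_words: int,
           redundancy_weight: float = -1.0) -> list[str]:
    pool = list(candidates)
    summary: list[str] = []
    used = 0
    while pool:
        # Dynamic features depend on the current summary, so rescore each round.
        best = max(pool, key=lambda c: c.static_score
                   + redundancy_weight * overlap(c.text, summary))
        length = len(best.text.split())
        if used + length > max_words:
            pool.remove(best)   # does not fit the budget; try the next best
            continue
        summary.append(best.text)
        used += length
        # Remove the other compressions of the same source sentence.
        pool = [c for c in pool if c.source_id != best.source_id]
    return summary


candidates = [
    Candidate(0, "quake hits city", 1.0),
    Candidate(0, "strong quake hits coastal city", 1.2),
    Candidate(1, "quake hits city center", 0.9),
    Candidate(1, "rescue teams arrive", 0.8),
]
print(select(candidates, max_words=8))
```

In the second round the overlap penalty pushes "quake hits city center" below "rescue teams arrive", which is the effect the dynamic features are meant to have.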

HMM Hedge Architecture: Single Compression

[Diagram components: Document, Part of Speech Tagger, Verb Tags, Headline Language Model, General Language Model, HMM Hedge, Selection, Summary]

HMM Hedge: Noisy Channel Model

• Underlying method: select words in order
• Sentences are observed data
• Headlines are unobserved data
• Noisy channel adds words to headlines to create sentences

President signed legislation

On Tuesday the President signed the controversial legislation at a private ceremony

HMM Hedge: Noisy Channel Model

• Probability of Headline estimated with bigram model of Headlinese

• Probability of observed Sentence given unobserved Headline (the channel model) estimated by unigram model of General English
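A minimal sketch of how one candidate headline could be scored under this model: a bigram headline model for P(H), and a unigram general-English model for the sentence words the channel "added". The function name, the `<s>` start symbol, the probability floor, and the toy probability tables are assumptions made for illustration, not values from the system.

```python
import math


def headline_log_score(sentence, keep, bigram, unigram, floor=1e-6):
    """log P(H) + log P(S|H) for the headline made of sentence[i], i in keep."""
    kept = set(keep)
    headline = [sentence[i] for i in sorted(kept)]
    log_p = 0.0
    prev = "<s>"
    for word in headline:                      # bigram model of Headlinese
        log_p += math.log(bigram.get((prev, word), floor))
        prev = word
    for i, word in enumerate(sentence):        # unigram model of general English
        if i not in kept:
            log_p += math.log(unigram.get(word, floor))
    return log_p


sentence = "On Tuesday the President signed the controversial legislation".split()
bigram = {("<s>", "President"): 0.1, ("President", "signed"): 0.2,
          ("signed", "legislation"): 0.1}
unigram = {"On": 0.01, "Tuesday": 0.005, "the": 0.05, "controversial": 0.001,
           "President": 0.002, "signed": 0.002, "legislation": 0.001}

good = headline_log_score(sentence, [3, 4, 7], bigram, unigram)  # President signed legislation
bad = headline_log_score(sentence, [0, 1, 2], bigram, unigram)   # On Tuesday the
print(good > bad)
```

A decoder would search over all in-order word subsets rather than score two by hand; the sketch only shows the quantity being maximized.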

HMM Hedge

• Viterbi decoding parameters to mimic headlines (give examples; are these model parameters or decoding parameters?)
  – Groups of contiguous words, clumpiness
  – Size of gaps between words, gappiness
  – Sentence position of words
  – Require verb

HMM Hedge

• Adaptation to Multi-candidate compression

• For each document sentence, finds the 5 most likely headlines at each summary length from 5 to 15 words

Automatic Evaluation (Give references & examples)

• Recall-Oriented Understudy for Gisting Evaluation (Rouge)
  – Rouge Recall: ratio of matching candidate n-gram count to reference n-gram count
  – Rouge Precision: ratio of matching candidate n-gram count to candidate n-gram count times number of references
  – R1 preferred for single document summarization
  – R2 preferred for multi-document summarization
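The two ratios can be written out directly. The sketch below follows the definitions on this slide, summing clipped n-gram matches over all references; it is an illustration, not the official Rouge toolkit, and the lowercasing and whitespace tokenization are simplifying assumptions.

```python
from collections import Counter


def ngrams(tokens, n):
    """Multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, references, n=1):
    """Return (recall, precision) of a candidate against several references."""
    cand = ngrams(candidate.lower().split(), n)
    matches = 0
    ref_total = 0
    for ref in references:
        ref_counts = ngrams(ref.lower().split(), n)
        matches += sum((cand & ref_counts).values())   # clipped matches
        ref_total += sum(ref_counts.values())
    cand_total = sum(cand.values()) * len(references)
    recall = matches / ref_total if ref_total else 0.0
    precision = matches / cand_total if cand_total else 0.0
    return recall, precision


refs = ["newspaper editor killed in mexico",
        "editor found dead in resort city"]
print(rouge_n("newspaper editor found dead", refs, n=1))
print(rouge_n("newspaper editor found dead", refs, n=2))
```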

HMM Hedge

[Figure: Rouge 1 scores (y-axis, 0.10 to 0.28) by N-best at each length (x-axis, 1 to 5). Series: R1 Recall and R1 Precision for 1, 2, and 3 sentences.]

HMM Hedge

• Features (5-fold cross validation), shown as (default weight, optimized weight for fold A)
• Linear combination scoring function
  – Word position sum (-0.05, 1.72)
  – Small gaps (-0.01, 1.02)
  – Large gaps (-0.05, 3.70)
  – Clumps (-0.05, -0.17)
  – Sentence position (0, -945)
  – Length in words (1, 42)
  – Length in characters (1, 85)
  – Unigram probability of story words (1, 1.03)
  – Bigram probability of headline words (1, 1.51)
  – Emit probability of headline words (1, 3.60)

HMM Hedge

Fold   Default Weights         Optimized Weights
       R1 recall   R1 Prec.    R1 recall   R1 Prec.
A      0.11214     0.10726     0.24722     0.21482
B      0.11021     0.10231     0.24307     0.21425
C      0.11781     0.10811     0.24129     0.20795
D      0.11993     0.10660     0.16595     0.13454
E      0.11282     0.10003     0.25341     0.21775
Avg    0.11458     0.10486     0.23019     0.19786

HMM Hedge: Multi-Document Summarization

[Diagram components: Document, Part of Speech Tagger, Verb Tags, Headline Language Model, General Language Model, HMM Hedge, Candidates with HMM Features, URA, URA Index, Query (optional), Candidates with HMM Features and URA Features, Feature Weights, Selection, Summary]

Candidate Selection

• Static Features
  – Sentence Position
  – Relevance
  – Centrality
  – Compression-specific features
• Dynamic Features
  – Redundancy
  – Count of summary candidates from source document

Relevance and Centrality

• Universal Retrieval Architecture (URA)
  – Infrastructure for information retrieval tasks
• Four score components
  – Candidate Query Relevance: Matching score between candidate and query
  – Document Query Relevance: Lucene similarity score between document and query
  – Candidate Centrality: Average Lucene similarity of candidate to other sentences in document
  – Document Centrality: Average Lucene similarity of document to other documents in cluster

Redundancy: Intuition

• Consider a summary about earthquakes

• “Generated” by topic: Earthquake, seismic, Richter scale

• “Generated” by general language: Dog, under, during

• Sentences with many words “generated” by the topic are redundant

Redundancy: Formal

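\[
\mathrm{redundancy}(S) = \prod_{s \in S} \bigl[ \lambda P(s \mid D) + (1 - \lambda) P(s \mid C) \bigr]
\]

\[
P(w \mid C) = \frac{\mathrm{count}(w, C)}{\mathrm{size}(C)}
\]

\[
P(w \mid D) = \frac{\mathrm{count}(w, D)}{\mathrm{size}(D)}
\]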
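A small sketch of this redundancy score as a two-component mixture over the words of a sentence. It assumes that D is the word list of the summary so far (the "topic"), that C is a general-language corpus, and that a mixing weight `lam` combines the two relative-frequency estimates; these readings, the function name, and the toy word lists are assumptions for illustration.

```python
from collections import Counter


def redundancy(sentence, summary_words, corpus_words, lam=0.5):
    """Product over words of lam * P(w|D) + (1 - lam) * P(w|C)."""
    d_counts, c_counts = Counter(summary_words), Counter(corpus_words)
    d_size, c_size = len(summary_words), len(corpus_words)
    score = 1.0
    for w in sentence:
        p_d = d_counts[w] / d_size if d_size else 0.0   # count(w, D) / size(D)
        p_c = c_counts[w] / c_size if c_size else 0.0   # count(w, C) / size(C)
        score *= lam * p_d + (1 - lam) * p_c
    return score


summary_words = "earthquake seismic richter earthquake".split()
corpus_words = "dog under during earthquake the the the dog".split()

topical = redundancy(["earthquake", "seismic"], summary_words, corpus_words)
general = redundancy(["dog", "under"], summary_words, corpus_words)
print(topical, general)
```

The sentence made of topic words scores higher, so it would be treated as more redundant with the existing summary. For real sentences a sum of logarithms would avoid underflow.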

HMM Hedge Multi-doc

• Placeholder for results of HMM Hedge

Trimmer

• Underlying method: select words in order
• Parse and Trim
• Rules come from study of Headlinese
  – Different distributions of syntactic structures

Phenomenon                    Headlines   Lead Sent
Preposed adjunct              0%          2.7%
Time expression               1.5%        24%
Noun Phrase Relative Clause   0.3%        3.5%

Trimmer: Mask operation

Trimmer: Mask Outside

Trimmer Single Document

[Diagram components: Document, Parser, Parses, Entity Tagger, Entity Tags, Trimmer, Candidates with Trimmer Features, Selection, Summary]

Trimmer: Root S

• Select the lowest leftmost S which has NP and VP children, in that order.

[S [S [NP Rebels] [VP agree to talks with government]] officials said Tuesday.]
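The Root S rule can be tried out on bracketed text like the example above. The sketch below is an illustration only: the tiny bracket parser, the `(label, children)` tuple representation, and the reading of "lowest leftmost" as "deepest, with ties broken left to right" are assumptions, not the Trimmer implementation.

```python
import re


def parse(text):
    """Parse '[LABEL child ...]' into (label, children); bare words are leaves."""
    tokens = re.findall(r"\[|\]|[^\s\[\]]+", text)
    pos = 0

    def node():
        nonlocal pos
        pos += 1                      # consume "["
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != "]":
            if tokens[pos] == "[":
                children.append(node())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1                      # consume "]"
        return (label, children)

    return node()


def words(tree):
    """Leaf words of a tree, in order."""
    if isinstance(tree, str):
        return [tree]
    return [w for child in tree[1] for w in words(child)]


def root_s(tree):
    """Deepest (then leftmost) S with an NP child followed by a VP child."""
    best = None                       # (depth, node)

    def visit(t, depth):
        nonlocal best
        if isinstance(t, str):
            return
        label, children = t
        labels = [c[0] for c in children if not isinstance(c, str)]
        if (label == "S" and "NP" in labels
                and "VP" in labels[labels.index("NP") + 1:]):
            if best is None or depth > best[0]:
                best = (depth, t)
        for c in children:
            visit(c, depth + 1)

    visit(tree, 0)
    return best[1] if best else None


tree = parse("[S [S [NP Rebels] [VP agree to talks with government]] "
             "officials said Tuesday.]")
print(" ".join(words(root_s(tree))))
```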

Trimmer: Preposed Adjunct

• Remove [YP …] preceding first NP inside chosen S

[S [PP According to a now-finalized blueprint described by U.S. officials and other sources] [NP the Bush administration] [VP plans to take complete, unilateral control of a post-Saddam Hussein Iraq]]

Trimmer: Conjunction

• Remove [X][CC] or [CC][X] from [X][CC][X]

[S Illegal fireworks [VP [VP injured hundreds of people] [CC and] [VP started six fires.]]]

[S A company offering blood cholesterol tests in grocery stores says [S [S medical technology has outpaced state laws,] [CC but] [S the state says the company doesn’t have the proper licenses.]]]

Trimmer

• Adaptation to multi-candidate compression

• Multi-candidate rules
  – Root S
  – Preamble
  – Conjunction

Trimmer

• Multi-candidate Root S
• [S1 [S2 The latest flood crest, the eighth this summer, passed Chongqing in southwest China], and [S3 waters were rising in Yichang, in central China’s Hubei province, on the middle reaches of the Yangtze], state television reported Sunday.]

• Single-candidate version would choose only S2. Multi-candidate Root-S generates all three choices.

Trimmer: Preamble Rule



Trimmer: Conjunction

• [S Illegal fireworks [VP [VP injured hundreds of people] [CC and] [VP started six fires.]]]

• Illegal fireworks injured hundreds of people

• Illegal fireworks started six fires
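The two candidates above can be reproduced with a short sketch of the multi-candidate conjunction rule. The tree is written by hand as nested `(label, children)` tuples, and the rule is applied only to the first [X][CC][X] pattern found; both choices are simplifying assumptions for illustration.

```python
def words(tree):
    """Leaf words of a (label, children) tree, in order."""
    if isinstance(tree, str):
        return [tree]
    return [w for child in tree[1] for w in words(child)]


def conjunction_candidates(tree):
    """Trees with one conjunct and the CC removed from the first [X][CC][X]."""
    if isinstance(tree, str):
        return []
    label, children = tree
    for i in range(1, len(children) - 1):
        left, cc, right = children[i - 1], children[i], children[i + 1]
        if (not isinstance(cc, str) and cc[0] == "CC"
                and not isinstance(left, str) and not isinstance(right, str)
                and left[0] == right[0]):
            keep_left = (label, children[:i] + children[i + 2:])
            keep_right = (label, children[:i - 1] + children[i + 1:])
            return [keep_left, keep_right]
    # No conjunction here: look inside the children and rebuild this node.
    for i, child in enumerate(children):
        variants = conjunction_candidates(child)
        if variants:
            return [(label, children[:i] + [v] + children[i + 1:])
                    for v in variants]
    return []


tree = ("S", ["Illegal", "fireworks",
              ("VP", [("VP", ["injured", "hundreds", "of", "people"]),
                      ("CC", ["and"]),
                      ("VP", ["started", "six", "fires"])])])

for candidate in conjunction_candidates(tree):
    print(" ".join(words(candidate)))
```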

Trimmer: Features

• Selection among Trimmer candidates based on three sets of features
  – L: Length in characters or words
  – R: Counts of rule applications
  – C: Centrality

• Baseline LUL: select longest version under limit
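The LUL baseline is simple enough to state as code. This sketch assumes the limit is measured in characters, as in the 75-character headline task; the function name and the handling of the empty case are choices made for illustration.

```python
def longest_under_limit(candidates, limit=75):
    """Return the longest candidate that fits within the limit, or None."""
    fitting = [c for c in candidates if len(c) <= limit]
    return max(fitting, key=len) if fitting else None


candidates = [
    "Illegal fireworks injured hundreds of people and started six fires",
    "Illegal fireworks injured hundreds of people",
    "Illegal fireworks started six fires",
]
print(longest_under_limit(candidates, limit=50))
```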

Trimmer: Benefit of using more candidates

[Figure: Rouge 1 Recall (y-axis, 0.15 to 0.27) for each candidate set (x-axis: Trimmer, Trimmer +R, Trimmer +P, Trimmer +C, Trimmer +R+P, Trimmer +R+C, Trimmer +S+C, Trimmer +R+S+C). Series: feature sets LUL, L, R, C, LR, LC, RC, LRC.]

Trimmer: Benefit of using more features

[Figure: Rouge 1 Recall (y-axis, 0.15 to 0.27) for each feature set (x-axis: LUL, L, R, C, LR, LC, RC, LRC). Series: candidate sets Trimmer, Trimmer +R, Trimmer +P, Trimmer +C, Trimmer +R+P, Trimmer +R+C, Trimmer +S+C, Trimmer +R+S+C.]

Trimmer Multi-Document

[Diagram components: Document, Parser, Parses, Entity Tagger, Entity Tags, Trimmer, Candidates with Trimmer Features, URA, URA Index, Query (optional), Candidates with Trimmer Features and URA Features, Feature Weights, Selection, Summary]

Trimmer

System             R1 Recall   R1 Prec.   R2 Recall   R2 Prec.
Trimmer MultiDoc   0.38198     0.37617    0.08051     0.07922
HMM MultiDoc       0.37404     0.37405    0.07884     0.07887

Topiary

• Combines topic terms and fluent text
  – Fluent text comes from Trimmer
  – Topics come from Unsupervised Topic Detection (UTD)
• Single-candidate algorithm
  – Lower Trimmer threshold to make room for highest scoring non-redundant topic term
  – Trim to lower threshold
  – Adjust if topic redundancy changes because of trimming

Topiary Single-Candidate

[Diagram components: Document, Parser, Parses, Entity Tagger, Entity Tags, Unsupervised Topic Detection, Topics, Topiary, Summary]

Topiary, Trimmer, UTD

[Figure: Rouge scores (y-axis, 0 to 0.3) on Rouge 1, Rouge 2, Rouge 3, and Rouge 4. Series: First 75 chars, Topiary, Trimmer 2003, Trimmer 2004, UTD.]

Topiary

• Multi-candidate Algorithm
  – Generate Multi-candidate Trimmer candidates
  – Fill space in all Trimmer candidates with all combinations of non-redundant topics
  – Score and select summary
  – Give an example of how this works.

Topiary Multi-Candidate

[Diagram components: Document, Parser, Parses, Entity Tagger, Entity Tags, Unsupervised Topic Detection, Topics, Topiary, Candidates with Trimmer Features, URA, URA Index, Query (optional), Candidates with Trimmer Features and URA Features, Feature Weights, Selection, Summary]

DUC 2004 Task 1 Results (Rouge)

[Figure: Rouge scores (y-axis, 0 to 0.35) on ROUGE-1, ROUGE-L, ROUGE-W-1.2, ROUGE-2, ROUGE-3, and ROUGE-4. Legend groups: Topiary, Baseline, Automatic Summaries (identified by system number), and Human References (A to H).]

Topiary Evaluation

Rouge Metric      Topiary    Multi-Candidate Topiary
Rouge-1 Recall    0.25027    0.26490
Rouge-2 Recall    0.06484    0.08168*
Rouge-3 Recall    0.02130    0.02805
Rouge-4 Recall    0.00717    0.01105
Rouge-L           0.20063    0.22283*
Rouge-W1.2        0.11951    0.13234*

Talk Roadmap

• Introduction
• Automatic Summarization
  – HMM Hedge, Trimmer, Topiary
    • Single-candidate, MASC versions
  – Multi-document Summarization
    • HMM Hedge, Trimmer
• Evaluation
• Conclusion
• Future Work

Evaluation: Review

• HMM Hedge, Single-document. Rouge-1 recall increases as number of candidates increases

• HMM Hedge, Single-document. Rouge-1 doubles when scored with optimized weights on features

• Trimmer, Single-document. Rouge-1 increases with greater use of multi-candidate rules

• Trimmer, Single-document. Rouge-1 increases with larger set of features

• Topiary, Single-document. Multi-candidate Topiary scores significantly higher on some Rouge metrics than single-candidate Topiary.

• Trimmer scored higher than HMM for Multi-document summarization

Evaluation

Human extrinsic evaluation of HMM, Trimmer, Topiary and First 75

LDC agreement: ~20x increase in speed. Some loss of accuracy.

Relevance Prediction

Baseline First75 char, hard to beat

Talk Roadmap

• Introduction

• Automatic Summarization
  – HMM Hedge, Trimmer, Topiary
  – Multiple Alternative Sentence Compressions (MASC)

• Evaluation

• Conclusion

• Future Work

Contributions

• Use of the MASC framework improves performance across summarization tasks and compression sources

• Fluent and informative summaries can be constructed by selecting words in order from sentences. Verified by doing a human study.

• Headlines combining fluent text and topic terms score better than either alone

Future Work

• Enhance redundancy score with paraphrase detection

• Anaphora resolution in candidates

• Expand candidates by sentence merging

• Sentence ordering in multi-sentence summaries

End
