Dissertation Defense
Multiple Alternative Sentence Compressions as a Tool for
Automatic Summarization Tasks
David Zajic
University of Maryland, College Park
Department of Computer Science
November 28, 2006
Intuition
• A newspaper editor was found dead in a hotel room in this Pacific resort city a day after his paper ran articles about organized crime and corruption in the city government. The body of the editor, Misael Tamayo Hernández, of the daily El Despertar de la Costa, was found early Friday with his hands tied behind his back in a room at the Venus Motel…
Intuition
• A newspaper editor was found dead in a hotel room in this Pacific resort city a day after his paper ran articles about organized crime and corruption in the city government.
• Newspaper editor found dead in Pacific resort city
Intuition
• A newspaper editor was found dead in a hotel room in this Pacific resort city a day after his paper ran articles about organized crime and corruption in the city government.
• Paper ran articles about corruption in government
Intuition
• A newspaper editor was found dead in a hotel room in this Pacific resort city a day after his paper ran articles about organized crime and corruption in the city government.
• Hernández, Zihuatanjo: Newspaper editor found dead in Pacific resort city
Intuition
• A newspaper editor was found dead in a hotel room in this Pacific resort city a day after his paper ran articles about organized crime and corruption in the city government.
• Newspaper Editor Killed in Mexico
  – (A) Newspaper Editor (was) killed in Mexico
Talk Roadmap
• Introduction
• Automatic Summarization under MASC framework
  – Single Document and evaluation
    • HMM Hedge, Trimmer, Topiary
  – Extending to Multi-document Summarization
• Review of Evaluations
• Conclusion
• Future Work
Introduction
• Automatic Summarization
  – Distillation of important information from a source into an abridged form
  – Extractive Summarization: select sentences with important content from the document
  – Abstractive Summarization
  – Limitations
    • Sentences contain mixture of relevant, non-relevant information
    • Sentences partially redundant to rest of summary
Contributions
• Two implementations of select-words-in-order
  – Statistical Method: HMM Hedge (Headline Generation)
  – Syntactic Method: Trimmer
Contributions
• Multiple Alternative Sentence Compressions (MASC)
  – Framework for Automatic Text Summarization
  – Sentence Compression: rewriting a sentence in an abridged form
  – Generation of many compressions of source sentences to serve as candidates
  – Select from candidates using weighted features to generate summary
  – Environment for testing hypotheses
Hypotheses
• Extractive summarization systems can create better summaries using larger pool of compressed candidates
• Sentence selectors choose better summary candidates using larger sets of features
• For Headline Generation, combination of fluent text and topics better than either alone
Contributions
• Sentence Compression
  – HMM Hedge
  – Trimmer
  – Topiary
• Sentence Selection
  – Lead Sentence for Headline Generation
  – Maximal Marginal Relevance for Multi-document Summarization
Summarization Tasks
• Single Document Summarization
  – Very short: Headline Generation
  – Single sentence
  – 75 characters
  – DUC2002, 2003, 2004
• Query-focused Multi-Document Summarization
  – Multiple sentences
  – 100 – 250 words
  – DUC2005, 2006
Headline Generation
• Newspaper Headlines
  – Natural example of human summarization
  – Three criteria for a good headline:
    • Summarize a story
    • Make people want to read it
    • Fit in specified space
  – Headlinese: compressed form of English
Introduction
• Headline Types
  – Eye-Catcher: Under God Under Fire
  – Indicative: Pledge of Allegiance
  – Informative: U.S. Court Decides Pledge of Allegiance Unconstitutional
Talk Roadmap
• Introduction
• Automatic Summarization under MASC framework
  – HMM Hedge, Trimmer, Topiary
  – Single Document, Multi-document
  – Experimental evidence supporting hypotheses
• Review of Evaluations
• Conclusion
• Future Work
General Architecture
[Diagram: general architecture. Labels: Document; sentence selection; Compression (HMM Hedge, Trimmer, Topiary); Candidates; Candidate Selection (Maximal Marginal Relevance); Summary]
Sentence Selection
• Select sentences to be compressed
• Lead sentence, or first 5 sentences
Sentence Compression
• Selecting words in order from a sentence
  – Or window of words
• Feasibility Studies (my work!)
  – Describe the task in a backup slide
  – Humans can almost always do this for written news
  – Bias for words from within a single sentence
  – Bias for words early in document
Sentence Compression
• Human study showed potential for saving space by using sentence compression in multi-document summarization
• Subject was shown 103 sentences, relevant to 39 queries, asked to make relevance judgments on 430 compressed versions
• Potential for 16.7% reduction by word count, 17.6% reduction by characters, with no loss of relevance
Candidate Selection
• Maximal Marginal Relevance (MMR) (Carbonell and Goldstein, 1998)
  – All candidates given scores: linear combination of static and dynamic features
  – Weights optimized for Rouge 1 recall, using BBN’s Optimizer
  – Add highest-scoring candidate to summary
    • Other compressions of source sentence removed from pool
  – Recalculate dynamic features, rescore candidates
  – Iterate until summary is complete.
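The selection loop on this slide can be sketched roughly as follows. This is a minimal illustration, not the system's actual code: the feature names (`relevance`, `redundancy`), the weights, the word-overlap redundancy measure, the word limit, and the second-sentence candidate texts are all assumptions made for the example.

```python
def overlap(words, summary_words):
    """Hypothetical dynamic feature: fraction of a candidate's words already in the summary."""
    if not words:
        return 0.0
    return sum(w in summary_words for w in words) / len(words)


def mmr_select(candidates, weights, max_words):
    """Greedy selection: score, add the best candidate, drop its siblings, rescore, repeat."""
    pool = list(candidates)
    summary, used = [], set()
    while pool:
        remaining = max_words - sum(len(c["text"].split()) for c in summary)
        # Assumption: the summary is complete when no remaining candidate fits.
        pool = [c for c in pool if len(c["text"].split()) <= remaining]
        if not pool:
            break

        def score(c):
            # Linear combination of static features and the recomputed dynamic feature.
            feats = dict(c["static"], redundancy=overlap(c["text"].split(), used))
            return sum(weights[name] * value for name, value in feats.items())

        best = max(pool, key=score)
        summary.append(best)
        used.update(best["text"].split())
        # Other compressions of the same source sentence leave the pool.
        pool = [c for c in pool if c["source"] != best["source"]]
    return [c["text"] for c in summary]


candidates = [
    {"source": 1, "text": "newspaper editor found dead in pacific resort city", "static": {"relevance": 0.9}},
    {"source": 1, "text": "newspaper editor found dead", "static": {"relevance": 0.7}},
    {"source": 2, "text": "editor found dead in hotel room", "static": {"relevance": 0.6}},
    {"source": 2, "text": "body found at venus motel", "static": {"relevance": 0.5}},
]
weights = {"relevance": 1.0, "redundancy": -1.0}
selected = mmr_select(candidates, weights, max_words=16)
```

In this toy run the second pick is "body found at venus motel" even though its static relevance is lower, because the alternative repeats most of the words already in the summary.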
HMM Hedge Architecture: Single Compression
[Diagram. Labels: Document; Part of Speech Tagger; Verb Tags; Headline Language Model; General Language Model; HMM Hedge; Selection; Summary]
HMM Hedge: Noisy Channel Model
• Underlying method: select words in order
• Sentences are observed data
• Headlines are unobserved data
• Noisy channel adds words to headlines to create sentences
Headline: President signed legislation
Sentence: On Tuesday the President signed the controversial legislation at a private ceremony
HMM Hedge: Noisy Channel Model
• Probability of Headline estimated with bigram model of Headlinese
• Probability of observed Sentence given unobserved Headline (the channel model) estimated by unigram model of General English
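A rough sketch of how such a noisy-channel score might be computed for the example sentence. The probabilities below are invented toy values, the probability floor is an assumption, and the channel probability is assumed here to be the product of the General English unigram probabilities of the words not kept in the headline. The search is brute force over word subsets kept in order, not Viterbi decoding.

```python
import math
from itertools import combinations


def score(sentence, keep, bigram, unigram, floor=1e-6):
    """log P(H) + log P(S|H) for the headline formed by the kept word positions."""
    logp, prev = 0.0, "<s>"
    for i in sorted(keep):
        # Headline prior: bigram model of Headlinese.
        logp += math.log(bigram.get((prev, sentence[i]), floor))
        prev = sentence[i]
    logp += math.log(bigram.get((prev, "</s>"), floor))
    for i, word in enumerate(sentence):
        if i not in keep:
            # Channel model (assumed form): words added by the channel, unigram General English.
            logp += math.log(unigram.get(word, floor))
    return logp


def best_headline(sentence, length, bigram, unigram):
    """Try every in-order selection of `length` words and return the best-scoring one."""
    best = max(
        combinations(range(len(sentence)), length),
        key=lambda keep: score(sentence, keep, bigram, unigram),
    )
    return [sentence[i] for i in best]


sentence = "on tuesday the president signed the controversial legislation at a private ceremony".split()
# Toy models; every number here is hypothetical.
bigram = {
    ("<s>", "president"): 0.2,
    ("president", "signed"): 0.3,
    ("signed", "legislation"): 0.3,
    ("legislation", "</s>"): 0.4,
}
unigram = {word: 0.001 for word in sentence}
unigram.update({"the": 0.05, "on": 0.02, "at": 0.02, "a": 0.03})

headline = best_headline(sentence, 3, bigram, unigram)
# headline == ["president", "signed", "legislation"]
```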
HMM Hedge
• Viterbi Decoding parameters to mimic Headlines - give examples. Are these model parameters or decoding parameters?
  – Groups of contiguous words, clumpiness
  – Size of gaps between words, gappiness
  – Sentence position of words
  – Require verb
HMM Hedge
• Adaptation to Multi-candidate compression
• Finds the 5 most likely headlines for summary lengths 5 to 15 words of document sentences
Automatic Evaluation (Give references & examples)
• Recall-Oriented Understudy for Gisting Evaluation (Rouge)
  – Rouge Recall: ratio of matching candidate n-gram count to reference n-gram count
  – Rouge Precision: ratio of matching candidate n-gram count to candidate n-gram count times number of references.
  – R1 preferred for single document summarization
  – R2 preferred for multi-document summarization
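The two ratios above can be sketched as follows. This is a simplified illustration, not the official Rouge toolkit: it assumes pre-tokenized, lowercased input, counts matches per reference with clipping, and omits stemming, stopword handling, and jackknifing.

```python
from collections import Counter


def ngrams(tokens, n):
    """Multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, references, n=1):
    """Return (recall, precision) as defined on the slide."""
    cand = ngrams(candidate, n)
    matches = 0
    ref_total = 0
    for ref in references:
        ref_counts = ngrams(ref, n)
        # A reference n-gram matches at most as often as it occurs in the candidate.
        matches += sum(min(count, cand[gram]) for gram, count in ref_counts.items())
        ref_total += sum(ref_counts.values())
    cand_total = sum(cand.values()) * len(references)
    recall = matches / ref_total if ref_total else 0.0
    precision = matches / cand_total if cand_total else 0.0
    return recall, precision


candidate = "newspaper editor found dead in pacific resort city".split()
reference = "newspaper editor killed in mexico".split()
r1 = rouge_n(candidate, [reference], n=1)  # (0.6, 0.375): 3 matches, 5 reference words, 8 candidate words
r2 = rouge_n(candidate, [reference], n=2)  # one matching bigram: ("newspaper", "editor")
```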
HMM Hedge
[Chart: Rouge 1 scores (y-axis, 0.1 to 0.28) against N-best at each length (x-axis, 1 to 5). Series: R1 Recall and R1 Precision for 1, 2, and 3 sentences.]
HMM Hedge
• Features (5-fold cross validation), shown as (default weight, optimized weight for fold A)
• Linear combination scoring function
  – Word position sum (-0.05, 1.72)
  – Small gaps (-0.01, 1.02)
  – Large gaps (-0.05, 3.70)
  – Clumps (-0.05, -0.17)
  – Sentence position (0, -945)
  – Length in words (1, 42)
  – Length in characters (1, 85)
  – Unigram probability of story words (1, 1.03)
  – Bigram probability of headline words (1, 1.51)
  – Emit probability of headline words (1, 3.60)
HMM Hedge
Fold   Default Weights         Optimized Weights
       R1 recall   R1 Prec.    R1 recall   R1 Prec.
A      0.11214     0.10726     0.24722     0.21482
B      0.11021     0.10231     0.24307     0.21425
C      0.11781     0.10811     0.24129     0.20795
D      0.11993     0.10660     0.16595     0.13454
E      0.11282     0.10003     0.25341     0.21775
Avg    0.11458     0.10486     0.23019     0.19786
HMM Hedge: Multi-Document Summarization
[Diagram. Labels: Document; Part of Speech Tagger; Verb Tags; Headline Language Model; General Language Model; HMM Hedge; Candidates, HMM Features; Query (optional); URA; URA Index; Candidates, HMM Features, URA Features; Feature Weights; Selection; Summary]
Candidate Selection
• Static Features
  – Sentence Position
  – Relevance
  – Centrality
  – Compression-specific features
• Dynamic Features
  – Redundancy
  – Count of summary candidates from source document
Relevance and Centrality
• Universal Retrieval Architecture (URA)
  – Infrastructure for information retrieval tasks
• Four score components
  – Candidate Query Relevance: Matching score between candidate and query
  – Document Query Relevance: Lucene similarity score between document and query
  – Candidate Centrality: Average Lucene similarity of candidate to other sentences in document
  – Document Centrality: Average Lucene similarity of document to other documents in cluster
Redundancy: Intuition
• Consider a summary about earthquakes
• “Generated” by topic: Earthquake, seismic, Richter scale
• “Generated” by general language: Dog, under, during
• Sentences with many words “generated” by the topic are redundant
Redundancy: Formal
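\[ \mathrm{redundancy}(S) = \prod_{s \in S} \bigl[ \lambda P(s \mid D) + (1 - \lambda) P(s \mid C) \bigr] \]
\[ P(w \mid D) = \frac{\mathrm{count}(w, D)}{\mathrm{size}(D)} \]
\[ P(w \mid C) = \frac{\mathrm{count}(w, C)}{\mathrm{size}(C)} \]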
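A small sketch of this redundancy score, following the earthquake intuition. It assumes the score is a product over the words of the candidate of an interpolation between two relative-frequency estimates, that D holds the words of the topic text (for example the summary so far), and that C is a general-language sample. The value of `lam` and both word lists are invented for the example.

```python
def word_prob(word, words):
    """Relative frequency: count(w, X) / size(X); zero for an empty collection."""
    return words.count(word) / len(words) if words else 0.0


def redundancy(sentence, D, C, lam=0.5):
    """Product over the sentence's words of lam * P(s|D) + (1 - lam) * P(s|C)."""
    score = 1.0
    for s in sentence:
        score *= lam * word_prob(s, D) + (1 - lam) * word_prob(s, C)
    return score


# Hypothetical data: D is topic text, C is a tiny general-language sample.
D = ["earthquake", "seismic", "richter", "scale"]
C = ["dog", "under", "during", "earthquake", "the", "a", "of", "in", "to", "and"]

topical = redundancy(["earthquake", "seismic"], D, C)  # words mostly "generated" by the topic
general = redundancy(["dog", "under"], D, C)           # words "generated" by general language
```

With these toy inputs the topical sentence scores higher than the general one, which is the behavior the intuition slide describes.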
HMM Hedge Multi-doc
• Placeholder for results of HMM Hedge
Trimmer
• Underlying method: select words in order
• Parse and Trim
• Rules come from study of Headlinese
  – Different distributions of syntactic structures

Phenomenon                     Headlines   Lead Sent
Preposed adjunct               0%          2.7%
Time expression                1.5%        24%
Noun Phrase Relative Clause    0.3%        3.5%
Trimmer: Mask operation
Trimmer: Mask Outside
Trimmer Single Document
[Diagram. Labels: Document; Parser; Parses; Entity Tagger; Entity Tags; Trimmer; Candidates, Trimmer Features; Selection; Summary]
Trimmer: Root S
• Select the lowest leftmost S which has NP and VP children, in that order.
[S [S [NP Rebels] [VP agree to talks with government]] officials said Tuesday.]
Trimmer: Preposed Adjunct
• Remove [YP …] preceding first NP inside chosen S
[S [PP According to a now-finalized blueprint described by U.S. officials and other sources] [NP the Bush administration] [VP plans to take complete, unilateral control of a post-Saddam Hussein Iraq]]
Trimmer: Conjunction
• In [X][CC][X], remove [X][CC] or [CC][X]
[S Illegal fireworks [VP [VP injured hundreds of people] [CC and] [VP started six fires.]]]
[S A company offering blood cholesterol tests in grocery stores says [S [S medical technology has outpaced state laws,] [CC but] [S the state says the company doesn’t have the proper licenses.]]]
Trimmer
• Adaptation to multi-candidate compression
• Multi-candidate rules
  – Root S
  – Preamble
  – Conjunction
Trimmer
• Multi-candidate Root S
• [S1 [S2 The latest flood crest, the eighth this summer, passed Chongqing in southwest China], and [S3 waters were rising in Yichang, in central China’s Hubei province, on the middle reaches of the Yangtze], state television reported Sunday.]
• Single-candidate version would choose only S2. Multi-candidate Root-S generates all three choices.
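The difference between the single-candidate and multi-candidate Root S rules can be sketched on the shorter "Rebels" example from the Root S slide. The nested-tuple tree format is an assumption of this sketch, as are the NP and VP labels on "officials" and "said Tuesday" (the slide leaves them unlabeled), and "lowest leftmost" is interpreted here as the deepest qualifying S, with ties broken left to right.

```python
def leaves(node):
    """Words under a node; a node is (label, children) and a leaf is a plain string."""
    if isinstance(node, str):
        return [node]
    return [word for child in node[1] for word in leaves(child)]


def has_np_then_vp(node):
    """True if the node has an NP child followed later by a VP child."""
    labels = [child[0] for child in node[1] if not isinstance(child, str)]
    return "NP" in labels and "VP" in labels[labels.index("NP") + 1:]


def root_s_candidates(node, depth=0):
    """Yield (depth, node) for every qualifying S, in left-to-right preorder."""
    if isinstance(node, str):
        return
    if node[0] == "S" and has_np_then_vp(node):
        yield depth, node
    for child in node[1]:
        yield from root_s_candidates(child, depth + 1)


def single_root_s(tree):
    """Single-candidate rule (assumed reading): deepest qualifying S, leftmost on ties."""
    candidates = list(root_s_candidates(tree))
    deepest = max(depth for depth, _ in candidates)
    return next(node for depth, node in candidates if depth == deepest)


tree = ("S", [
    ("S", [("NP", ["Rebels"]),
           ("VP", ["agree", "to", "talks", "with", "government"])]),
    ("NP", ["officials"]),        # label assumed for this sketch
    ("VP", ["said", "Tuesday"]),  # label assumed for this sketch
])

single = " ".join(leaves(single_root_s(tree)))
multi = [" ".join(leaves(node)) for _, node in root_s_candidates(tree)]
```

Here `single` is the inner clause only, while `multi` contains both the full sentence and the inner clause as separate candidates.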
Trimmer: Preamble Rule
Trimmer: Conjunction
• [S Illegal fireworks [VP [VP injured hundreds of people] [CC and] [VP started six fires.]]]
• Illegal fireworks injured hundreds of people
• Illegal fireworks started six fires
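One way the multi-candidate conjunction rule could produce the two compressions above is sketched below. The nested-tuple tree format and the requirement that both conjuncts carry the same label are assumptions of this sketch, and the sentence-final period is dropped from the tokens.

```python
def leaves(node):
    """Words under a node; a node is (label, children) and a leaf is a plain string."""
    if isinstance(node, str):
        return [node]
    return [word for child in node[1] for word in leaves(child)]


def conjunction_candidates(node):
    """Yield copies of the tree in which one [X][CC][X] is reduced to a single conjunct."""
    if isinstance(node, str):
        return
    label, children = node
    for i in range(len(children) - 2):
        left, cc, right = children[i:i + 3]
        if any(isinstance(part, str) for part in (left, cc, right)):
            continue
        if cc[0] == "CC" and left[0] == right[0]:
            # Keep the first conjunct (drop [CC][X]), then the second (drop [X][CC]).
            yield (label, children[:i] + [left] + children[i + 3:])
            yield (label, children[:i] + [right] + children[i + 3:])
    # Apply the rule inside each child and rebuild the parent around the result.
    for i, child in enumerate(children):
        for new_child in conjunction_candidates(child):
            yield (label, children[:i] + [new_child] + children[i + 1:])


tree = ("S", [
    "Illegal", "fireworks",
    ("VP", [
        ("VP", ["injured", "hundreds", "of", "people"]),
        ("CC", ["and"]),
        ("VP", ["started", "six", "fires"]),
    ]),
])

compressions = [" ".join(leaves(t)) for t in conjunction_candidates(tree)]
```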
Trimmer: Features
• Selection among Trimmer candidates based on three sets of features
  – L: Length in characters or words
  – R: Counts of rule applications
  – C: Centrality
• Baseline LUL: select longest version under limit
Trimmer: Benefit of using more candidates
[Chart: Rouge 1 Recall (y-axis, 0.15 to 0.27) for each candidate set (x-axis: Trimmer, Trimmer +R, Trimmer +P, Trimmer +C, Trimmer +R+P, Trimmer +R+C, Trimmer +S+C, Trimmer +R+S+C). Series: feature sets LUL, L, R, C, LR, LC, RC, LRC.]
Trimmer: Benefit of using more features
[Chart: Rouge 1 Recall (y-axis, 0.15 to 0.27) for each feature set (x-axis: LUL, L, R, C, LR, LC, RC, LRC). Series: candidate sets Trimmer, Trimmer +R, Trimmer +P, Trimmer +C, Trimmer +R+P, Trimmer +R+C, Trimmer +S+C, Trimmer +R+S+C.]
Trimmer Multi-Document
[Diagram. Labels: Document; Parser; Parses; Entity Tagger; Entity Tags; Trimmer; Candidates, Trimmer Features; Query (optional); URA; URA Index; Candidates, Trimmer Features, URA Features; Feature Weights; Selection; Summary]
Trimmer
System             R1 Recall   R1 Prec.   R2 Recall   R2 Prec.
Trimmer MultiDoc   0.38198     0.37617    0.08051     0.07922
HMM MultiDoc       0.37404     0.37405    0.07884     0.07887
Topiary
• Combines topic terms and fluent text
  – Fluent text comes from Trimmer
  – Topics come from Unsupervised Topic Detection (UTD)
• Single-candidate algorithm
  – Lower Trimmer threshold to make room for highest scoring non-redundant topic term
  – Trim to lower threshold.
  – Adjust if topic redundancy changes because of trimming
Topiary Single-Candidate
[Diagram. Labels: Document; Parser; Parses; Entity Tagger; Entity Tags; Unsupervised Topic Detection; Topics; Topiary; Summary]
Topiary, Trimmer, UTD
[Chart: scores (y-axis, 0 to 0.3) for Rouge 1, Rouge 2, Rouge 3, and Rouge 4. Series: First 75 chars, Topiary, Trimmer 2003, Trimmer 2004, UTD.]
Topiary
• Multi-candidate Algorithm
  – Generate Multi-candidate Trimmer candidates
  – Fill space in all Trimmer candidates with all combinations of non-redundant topics
  – Score and select summary
  – Give an example of how this works.
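The candidate-generation step might look roughly like this. It is a sketch under several assumptions: a topic counts as redundant when it already appears in the compressed text, topics are joined with ", " and attached with ": " as in the "Hernández, Zihuatanjo: ..." example from the Intuition slides, and the limit is the 75-character headline length. The topic list is invented, and scoring and selection are left out.

```python
from itertools import combinations


def topiary_candidates(trimmer_candidates, topics, limit=75):
    """Prefix each Trimmer candidate with every combination of non-redundant topics that fits."""
    results = []
    for text in trimmer_candidates:
        # Assumption: a topic already present in the text is redundant.
        fresh = [t for t in topics if t.lower() not in text.lower()]
        for k in range(len(fresh) + 1):
            for combo in combinations(fresh, k):
                headline = ", ".join(combo) + ": " + text if combo else text
                if len(headline) <= limit:
                    results.append(headline)
    return results


trimmer_candidates = [
    "Newspaper editor found dead in Pacific resort city",
    "Newspaper editor found dead",
]
topics = ["Hernández", "Zihuatanjo", "editor"]  # hypothetical topic-detection output

headlines = topiary_candidates(trimmer_candidates, topics)
```

Each Trimmer candidate here yields four headlines (no topic, either topic alone, both topics), and "editor" is never added because both candidates already contain it.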
Topiary Multi-Candidate
[Diagram. Labels: Document; Parser; Parses; Entity Tagger; Entity Tags; Unsupervised Topic Detection; Topics; Topiary; Candidates, Trimmer Features; Query (optional); URA; URA Index; Candidates, Trimmer Features, URA Features; Feature Weights; Selection; Summary]
DUC 2004 Task 1 Results (Rouge)
[Chart: scores (y-axis, 0 to 0.35) for ROUGE-1, ROUGE-L, ROUGE-W-1.2, ROUGE-2, ROUGE-3, and ROUGE-4. Legend: TOPIARY and systems 1, 9, 10, 18, 25, 26, 31, 32, 33, 50–54, 75–80, 87–92, 98–101, 110, 128–132, 135–137; human references A–H. Annotations: Human References, Topiary, Baseline, Automatic Summaries.]
Topiary Evaluation
Rouge Metric     Topiary   Multi-Candidate Topiary
Rouge-1 Recall   0.25027   0.26490
Rouge-2 Recall   0.06484   0.08168*
Rouge-3 Recall   0.02130   0.02805
Rouge-4 Recall   0.00717   0.01105
Rouge-L          0.20063   0.22283*
Rouge-W1.2       0.11951   0.13234*
Talk Roadmap
• Introduction
• Automatic Summarization
  – HMM Hedge, Trimmer, Topiary
    • Single-candidate, MASC versions
  – Multi-document Summarization
    • HMM Hedge, Trimmer
• Evaluation
• Conclusion
• Future Work
Evaluation: Review
• HMM Hedge, Single-document. Rouge-1 recall increases as number of candidates increases
• HMM Hedge, Single-document. Rouge-1 doubles when scored with optimized weights on features
• Trimmer, Single-document. Rouge-1 increases with greater use of multi-candidate rules
• Trimmer, Single-document. Rouge-1 increases with larger set of features
• Topiary, Single-document. Multi-candidate Topiary scores significantly higher on some Rouge metrics than single-candidate Topiary.
• Trimmer scored higher than HMM for Multi-document summarization
Evaluation
Human extrinsic evaluation of HMM, Trimmer, Topiary and First 75
LDC agreement: ~20x increase in speed. Some loss of accuracy.
Relevance Prediction
Baseline First75 char, hard to beat
Talk Roadmap
• Introduction
• Automatic Summarization
  – HMM Hedge, Trimmer, Topiary
  – Multiple Alternative Sentence Compressions (MASC)
• Evaluation
• Conclusion
• Future Work
Contributions
• Use of the MASC framework improves performance across summarization tasks and compression sources
• Fluent and informative summaries can be constructed by selecting words in order from sentences. Verified by doing a human study.
• Headlines combining fluent text and topic terms score better than either alone
Future Work
• Enhance redundancy score with paraphrase detection
• Anaphora resolution in candidates
• Expand candidates by sentence merging
• Sentence ordering in multi-sentence summaries
End