Summarization CS4705. 2 What is Summarization? Input: –Data: database, software trace, expert system –Text: one or more news articles, emails –Multimedia:

Summarization

CS4705

2

What is Summarization?

• Input:– Data: database, software trace, expert system– Text: one or more news articles, emails– Multimedia: text, speech, pictures, …

• Output:– Text summary– Speech summary– Multimedia summary

• Summaries must convey maximal information in minimal space

5

Template-Based SummarizationTemplate-Based Summarization

• N input articles parsed by information extraction system

• N sets of templates produced as output• Content planner uses planning operators to

identify similarities and trends• Refinement (Later template reports new # victims)

• New template constructed and passed to sentence generator

6

Sample TemplateSample Template

Message ID TST-COL-0001

Secsource: source ReutersSecsource: date 26 Feb 93

Early afternoonIncident: date 26 Feb 93Incident: location World Trade CenterIncident:Type BombingHum Tgt: number At least 5

7

Extractive Summarization

• Input: one or more text documents• Output: paragraph length summary• Sentence extraction is the standard method

• Using features such as key words, sentence position in document, cue phrases

• Identify sentences within documents that are salient• Extract and string sentences together• Luhn – 1950s• Hovy and Lin 1990s

• Schiffman 2000: Machine learning for extraction• Corpus of document/summary pairs• Learn the features that best determine important sentences• Kupiec 1995: Summarization of scientific articles

8

Problems with Sentence Extraction

• Extraneous phrases• “The five were apprehended along Interstate 95,

heading south in vehicles containing an array of gear including … ... authorities said.”

• Dangling noun phrases and pronouns• “The five”

• Misleading• Why would the media use this specific word

(fundamentalists), so often with relation to Muslims? *Most of them are radical Baptists, Lutheran and Presbyterian groups.

9

Beyond Extraction

• Shallow analysis instead of information extraction

• Extraction of phrases rather than sentences• Generation from surface representations in

place of semantics

10

Cut and Paste in Professional Summarization

• Humans also reuse the input text to produce summaries

• But they “cut and paste” the input rather than simply extract– Automatic corpus analysis

• 300 summaries, 1,642 sentences• 81% sentences were constructed by cutting and

pasting

– Linguistic studies

11

Major Cut and Paste Operations

• (1) Sentence reduction

~~~~~~~~~~~~

12



~~~~~~~~~~~~

13



• (2) Sentence Combination~~~~~~~~~~~~

~~~~~~~~~~~~~~ ~~~~~~

14


• (3) Syntactic Transformation

• (4) Lexical paraphrasing

~~~~~

~~~~~~~~~~~

~~~~~

~~~

15

Cut and Paste Based Single Document Summarization -- System Architecture

Extraction

Sentence reduction

Generation

Sentence combination

Input: single document

Extracted sentences

Output: summary

Corpus

Decomposition

Lexicon

Parser

Co-reference

16

(1) Decomposition of Human-written Summary Sentences

• Input: – a human-written summary sentence – the original document

• Decomposition analyzes how the summary sentence was constructed

• The need for decomposition– provide training and testing data for studying

cut and paste operations

17

Sample Decomposition Output

Summary sentence: Arthur B. Sackler, vice

president for law and public policy of Time Warner Cable Inc. and a member of the direct marketing association told the Communications Subcommittee of the Senate Commerce Committee that legislation to protect children’s privacy on-line could destroy the spondtaneous nature that makes the Internet unique.

Document sentences:S1: A proposed new law that would require web publishers to obtain parental consent before collecting personal information from children could destroy the spontaneous nature that makes the internet unique, a member of the Direct Marketing Association told a Senate panel Thursday.S2: Arthur B. Sackler, vice president for law and public policy of Time Warner Cable Inc., said the association supported efforts to protect children on-line, but he…S3: “For example, a child’s e-mail address is necessary …,” Sackler said in testimony to the Communications subcommittee of the SenateCommerce Committee.S5: The subcommittee is considering the Children’s Online Privacy Act, which was drafted…

18

Decomposition of human-written summaries




A Sample Decomposition Output

19






20






21

The Algorithm for Decomposition• A Hidden Markov Model based solution• Evaluations:

– Human judgements• 50 summaries, 305 sentences• 93.8% of the sentences were decomposed correctly

– Summary sentence alignment– Tested in a legal domain

• Details in (Jing&McKeown-SIGIR99)

22

(2) Sentence Reduction

• An example:Original Sentence: When it arrives sometime next year

in new TV sets, the V-chip will give parents a new and potentially revolutionary device to block out programs they don’t want their children to see.

Reduction Program: The V-chip will give parents a new and potentially revolutionary device to block out programs they don’t want their children to see.

Professional: The V-chip will give parents a device to block out programs they don’t want their children to see.

23

The Algorithm for Sentence Reduction

• Preprocess: syntactic parsing• Step 1: Use linguistic knowledge to decide what

phrases MUST NOT be removed

• Step 2: Determine what phrases are most important in the local context

• Step 3: Compute the probabilities of humans removing a certain type of phrase

• Step 4: Make the final decision

24

Step 1: Use linguistic knowledge to decide what MUST NOT be removed

• Syntactic knowledge from a large-scale, reusable lexicon constructed for this purpose convince: meaning 1: NP-PP :PVAL (“of”) (E.g., “He convinced me of his innocence”)

NP-TO-INF-OC (E.g., “He convinced me to go to the party”)

meaning 2: ... • Required syntactic arguments are not removed

25

• Saudi Arabia on Tuesday decided to sign…• The official Saudi Press Agency reported that

King Fahd made the decision during a cabinet meeting in Riyadh, the Saudi capital.

• The meeting was called in response to … the Saudi foreign minister, that the Kingdom…

• An account of the Cabinet discussions and decisions at the meeting…

• The agency...

Step 2: Determining context importance based on lexical links

26

Step 3: Compute probabilities of humans removing a phrase

verb(will give)

vsubc(when)

subj(V-chip)

iobj(parents)

obj(device)

ndet(a)

adjp(and)

lconj(new)

rconj(revolutionary)

Prob(“when_clause is removed”| “v=give”)

Prob (“to_infinitive modifier is removed” | “n=device”)

27

Step 4: Make the final decision

verb(will give)

vsubc(when)

subj(V-chip)

iobj(parents)

ndet(a)

adjp(and)

lconj(new)

rconj(revolutionary)

L Cn Pr

L Cn Pr

L Cn Pr L Cn Pr L Cn Pr

L Cn Pr

L Cn Pr L Cn Pr

obj(device)

L Cn Pr

L -- linguisticCn -- contextPr -- probabilities

28

Evaluation of Reduction

• Success rate: 81.3% – 500 sentences reduced by humans– Baseline: 43.2% (remove all the clauses,

prepositional phrases, to-infinitives,…)• Reduction rate: 32.7%

– Professionals: 41.8%• Details in (Jing-ANLP’00)

29

Multi-Document Summarization

• Monitor variety of online information sources• News, multilingual • Email

• Gather information on events across source and time

• Same day, multiple sources• Across time

• Summarize• Highlighting similarities, new information, different

perspectives, user specified interests in real-time

30

Approach

• Use a hybrid of statistical and linguistic knowledge

• Statistical analysis of multiple documents • Identify important new, contradictory information

• Information fusion and rule-driven content selection

• Generation of summary sentences • By re-using phrases• Automatic editing/rewriting summary

31

Newsblaster

http://newsblaster.cs.columbia.edu/• Clustering articles into events• Categorization by broad topic• Multi-document summarization • Generation of summary sentences

• Fusion• Editing of referring expressions

http://newsblaster.cs.columbia.edu/

32

Newsblaster Architecture Crawl News SitesCrawl News Sites

Form ClustersForm Clusters

CategorizeCategorize

TitleClusters

TitleClusters

SummaryRouter

SummaryRouter

SelectImagesSelectImages

EventSummary

EventSummary

BiographySummaryBiographySummary

Multi-EventMulti-Event

Convert Output to HTMLConvert Output to HTML

33

34

Fusion

35

Sentence Fusion Computation

• Common information identification• Alignment of constituents in parsed theme

sentences: only some subtrees match• Bottom-up local multi-sequence alignment• Similarity depends on

» Word/paraphrase similarity» Tree structure similarity

• Fusion lattice computation• Choose a basis sentence• Add subtrees from fusion not present in basis

36

• Add alternative verbalizations• Remove subtrees from basis not present in fusion

• Lattice linearization• Generate all possible sentences from the fusion

lattice• Score sentences using statistical language model

37

38

39

Tracking Across Days

• Users want to follow a story across time and watch it unfold• Network model for connecting clusters across days

• Separately cluster events from today’s news• Connect new clusters with yesterday’s news• Allows for forking and merging of stories

• Interface for viewing connections• Summaries that update a user on what’s new

• Statistical metrics to identify differences between article pairs• Uses learned model of features• Identifies differences at clause and paragraph levels

40

41

42

43

44

Different Perspectives

• Hierarchical clustering• Each event cluster is divided into clusters by

country

• Different perspectives can be viewed side by side

• Experimenting with update summarizer to identify key differences between sets of stories

45

46

47

48

Sentence Simplification

• Machine translated sentences long and ungrammatical

• Use sentence simplification on English sentences to reduce input to approximately “one fact” per sentence

• Use Arabic sentences to find most similar simple sentences

• Present multiple high similarity sentences

49

Simplification Examples

• 'Operation Red Dawn', which led to the capture of Saddam Hussein, followed crucial information from a member of a family close to the former Iraqi leader.

– ' Operation Red Dawn' followed crucial information from a member of a family close to the former Iraqi leader.

– Operation Red Dawn led to the capture of Saddam Hussein.

• Saddam Hussein had been the object of intensive searches by US-led forces in Iraq but previous attempts to locate him had proved unsuccessful.

– Saddam Hussein had been the object of intensive searches by US-led forces in Iraq.

– But previous attempts to locate him had proved unsuccessful.

50

Results on alquds.co.uk.195

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.90

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

F-measure vs. Threshold

Full Threshold .10Full Threshold .65Simplified Threshold .10Simplified Threshold .65FASTR Simplified .10FASTR Simplified .65

Threshold

F-m

ea

sure

51

Summarization Evaluation

• DUC (Document Understanding Conference): run by NIST– Held annually– Manual creation of topics (sets of documents)– 2-7 human written summaries per topic– How well does a system generated summary cover

the information in a human summary?• Metrics

– Rouge– Pyramid

52

User Study: Objectives

• Does multi-document summarization help?

• Do summaries help the user find information needed to perform a report writing task?

• Do users use information from summaries in gathering their facts?

• Do summaries increase user satisfaction with the online news system?

• Do users create better quality reports with summaries?• How do full multi-document summaries compare with

minimal 1-sentence summaries such as Google News?

53

User Study: Design

• Four parallel news systems– Source documents only; no summaries– Minimal single sentence summaries (Google

News)– Newsblaster summaries– Human summaries

• All groups write reports given four scenarios– A task similar to analysts– Can only use Newsblaster for research– Time-restricted

54

User Study: Execution• 4 scenarios

– 4 event clusters each– 2 directly relevant, 2 peripherally relevant– Average 10 documents/cluster

• 45 participants– Balance between liberal arts, engineering– 138 reports

• Exit survey– Multiple-choice and open-ended questions

• Usage tracking– Each click logged, on or off-site

55

“Geneva” Prompt

• The conflict between Israel and the Palestinians has been difficult for government negotiators to settle. Most recently, implementation of the “road map for peace”, a diplomatic effort sponsored by ……

• Who participated in the negotiations that produced the Geneva Accord?

• Apart from direct participants, who supported the Geneva Accord preparations and how?

• What has the response been to the Geneva Accord by the Palestinians?

56

Measuring Effectiveness

• Score report content and compare across summary conditions

• Compare user satisfaction per summary condition

• Comparing where subjects took report content from

57

58

User Satisfaction

• More effective than a web search with Newsblaster • Not true with documents only or single-sentence

summaries• Easier to complete the task with summaries than with

documents only• Enough time with summaries than documents only• Summaries helped most

• 5% single sentence summaries• 24% Newsblaster summaries• 43% human summaries

59

User Study: Conclusions

• Summaries measurably improve a news browswer’s effectiveness for research

• Users are more satisfied with Newsblaster summaries are better than single-sentence summaries like those of Google News

• Users want search– Not included in evaluation

60

Email Summarization

• Cross between speech and text– Elements of dialog– Informal language– More context explicitly repeated than speech

• Wide variety of types of email– Conversation to decision-making

• Different reasons for summarization– Browsing large quantities of email – a mailbox– Catch-up: join a discussion late and participate – a

thread

61

Email Summarization: Approach

• Collected and annotated multiple corpora of email– Hand-written summary, categorization

threads&messages – Identified 3 categories of email to address:

• Event planning, Scheduling, Information gathering

• Developed tools:– Automatic categorization of email – Preliminary summarizers

• Statistical extraction using email specific features• Components of category specific summarization

62

Email Summarization by Sentence Extraction

• Use features to identify key sentences• Non-email specific: e.g., similarity to centroid• Email specific: e.g., following quoted material

• Rule-based supervised machine learning• Training on human-generated summaries

• Add “wrappers” around sentences to show who said what

63

Data for Sentence Extraction

• Columbia ACM chapter executive board mailing list• Approximately 10 regular participants• ~300 Threads, ~1000 Messages• Threads include: scheduling and planning of meetings

and events, question and answer, general discussion and chat.

• Annotated by human annotators:– Hand-written summary– Categorization of threads and messages– Highlighting important information (such as question-

answer pairs)

64

Email Summarization by Sentence Extraction

• Creation of Training Data– Start with human-generated summaries– Use SimFinder (a trained sentence similarity measure –

Hatzivassiloglou et al 2001) to label sentences in threads as important

• Learning of Sentence Extraction Rules– Use Ripper (a rule learning algorithm – Cohen 1996) to learn

rules for sentence classification– Use basic and email-specific features in machine learning

• Creating summaries– Run learned rules on unseen data– Add “wrappers” around sentences to show who said what

• Results– Basic: .55 precision; .40 F-measure– Email-specific: .61 precision; .50 F-measure

65

Sample Automatically Generated Summary (ACM0100)

Regarding "meeting tonight...", on Oct 30, 2000, David Michael Kalin wrote: Can I reschedule my C session for Wednesday night, 11/8, at 8:00?

Responding to this on Oct 30, 2000, James J Peach wrote: Are you sure you want to do it then?

Responding to this on Oct 30, 2000, Christy Lauridsen wrote: David , a reminder that your scheduled to do an MSOffice session on Nov. 14, at 7pm in 252Mudd.

66

Information Gathering Email:The Problem

Summary from our rule-based sentence extractor:

Regarding "acm home/bjarney", on Apr 9, 2001, Mabel Dannon wrote:Two things: Can someone be responsible for the press releases for Stroustrup?

Responding to this on Apr 10, 2001, Tina Ferrari wrote:I think Peter, who is probably a better writer than most of us, is writing up something for dang and Dave to send out to various ACM chapters. Peter, we can just use that as our "press release", right?

In another subthread, on Apr 12, 2001, Keith Durban wrote:Are you sending out upcoming events for this week?

67

Detection of Questions

Questions in interrogative form: inverted subject-verb order

Supervised rule induction approach, training Switchboard, test ACM corpus

• Results:

• Recall low because:

Questions in ACM corpus start with a declarative clause– So, if you're available, do you want to come?– if you don't mind, could you post this to the class bboard?

• Results without declarative-initial questions:

Recall 0.56

Precision 0.96

F-measure 0.70

Recall 0.72

Precision 0.96

F-measure 0.82

68

Detection of Answers

Supervised Machine Learning Approach• Use human annotated data to generate gold standard training

data• Annotators were asked to highlight and associate question-answer

pairs in the ACM corpus.

• Learn a classifier that predicts if a subsequent segment to a question segment answers it– Represent each question and candidate answer segment

by a feature vector

Labeller 1 Labeller 2 Union

Precision 0.690 0.680 0.728

Recall 0.652 0.612 0.732

F1-Score 0.671 0.644 0.730

69

Integrating QA detection with summarization

• Use QA labels as features in sentence extraction (F=.545)

• Add automatically detected answers to questions in extractive summaries (F=.566)

• Start with QA pair sentences and augmented with extracted sentences (F=.573)

70

Integrated in Microsoft Outlook

71

Meeting Summarization(joint with Berkeley, SRI, Washington)

• Goal: automatic summarization of meetings by generating “minutes” highlighting the debate that affected each decision.

• Work to date: Identification of agreement/disagreement – Machine learning approach: lexical, structure,

acoustic features– Use of context: who agreed with who so far?

– Adressee identification– Bayesian modeling of context

72

Conclusions

• Non-extractive summarization is practical today• User studies show summarization improves

access to needed information• Advances and ongoing research in tracking

events, multilingual summarization, perspective identification

• Moves to new media (email, meetings) raise new challenges with dialog, informal language

Documents

Summarization CS4705. 2 What is Summarization? Input: –Data: database, software trace, expert system –Text: one or more news articles, emails –Multimedia: