Course overview Introduction to summarization Lecture 1

Instructor: Ani Nenkova
– 505 Levine, [email protected]
– Office hours: Tuesdays 3:15–4:15 or by appointment

TA: Annie Louis
– [email protected]

Textbook

No required text
– Slides/lecture notes and handouts will be given in class

Recommended
– Speech and Language Processing (second edition, 2007, Prentice-Hall), by Daniel Jurafsky and James Martin

Also see
– Christopher Manning and Hinrich Schütze, “Foundations of Statistical Natural Language Processing”
– Advances in Automatic Text Summarization, edited by Inderjeet Mani and Mark T. Maybury

Grading

5 homeworks (65%)
– One will be a literature overview assignment
– One will be at the end of the semester, instead of a final

You are encouraged to form teams for the homework (programming) assignments, but all write-ups should be individual

Midterm (20%)

Class participation (15%)
– “Submit” 5 questions each week

Late submission policy

5 late days for the semester
– Can be used for any assignment with no penalty

Late submissions after “late days” have been used up will not be graded

What you will learn

A lot about summarization and the natural language techniques used in summarization

Tools and resources
– Part-of-speech and named entity taggers, parsers, WordNet, WEKA

Problem formalization/distributions
– Distributions: Zipfian, binomial, multinomial
– Graph representations

System comparisons
– Statistical significance and statistical tests

Reading scientific articles
– Part of the assigned readings
– A useful skill, regardless of your future job plans

Improving writing skills
– Immensely useful, regardless of your future job plans
– The literature overview assignment will focus on this, but in other assignments the way you describe your work will also be evaluated

What is summarization?

Columbia Newsblaster

The academic version

What is the input?

News, or clusters of news
– A single article or several articles on a related topic

Email and email threads

Scientific articles

Health information: patients and doctors

Meeting summarization

Video

What is the output?

Keywords

Highlighted information in the input

Chunks of text or speech taken directly from the input, or a paraphrase and aggregation of the input in novel ways

Modality: text, speech, video, graphics

Ideal stages of summarization

Analysis
– Input representation and understanding

Transformation
– Selecting important content

Realization
– Generating novel text corresponding to the gist of the input

Most current systems

Use shallow analysis methods
– Rather than full understanding

Work by sentence selection
– Identify important sentences and piece them together to form a summary

Data-driven approaches

Rely on features of the input documents that can be easily computed from statistical analysis
– Word statistics
– Cue phrases
– Section headers
– Sentence position
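A minimal sketch of the four kinds of features listed above, computed for every sentence of one document. The tokenizer, the cue-phrase list and the exact feature definitions (average relative word frequency, overlap with a set of header words, position scaled to the range 0 to 1) are illustrative assumptions, not definitions from the lecture.

```python
import re

# Hypothetical cue-phrase list; real systems derive such lists from data.
CUE_PHRASES = ("in conclusion", "in summary", "significantly", "we propose")


def tokenize(text):
    """Lowercase word tokens from a deliberately simple regular expression."""
    return re.findall(r"[a-z0-9]+", text.lower())


def sentence_features(sentences, header_words=frozenset()):
    """Return one feature dict per sentence of a single document."""
    counts = {}
    for s in sentences:
        for w in tokenize(s):
            counts[w] = counts.get(w, 0) + 1
    total = sum(counts.values())

    features = []
    for i, s in enumerate(sentences):
        words = tokenize(s)
        lowered = s.lower()
        features.append({
            # Word statistics: average relative frequency of the sentence's words.
            "avg_word_prob": sum(counts[w] for w in words) / (total * len(words)) if words else 0.0,
            # Cue phrases: does the sentence contain any listed phrase?
            "has_cue_phrase": any(c in lowered for c in CUE_PHRASES),
            # Section headers: number of distinct words shared with the header.
            "header_overlap": len(set(words) & header_words),
            # Sentence position: 0.0 for the first sentence, 1.0 for the last.
            "position": i / (len(sentences) - 1) if len(sentences) > 1 else 0.0,
        })
    return features
```

How the features are combined into a sentence score is left open here; it could be a hand-weighted sum or a learned model.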

Knowledge-based systems

Use more sophisticated natural language processing

Discourse information
– Resolve anaphora, text structure

Use external lexical resources
– WordNet, adjective polarity lists, opinion

Use machine learning

What are summaries useful for?

Relevance judgments
– Does this document contain information I am interested in?
– Is this document worth reading?

Save time

Reduce the need to consult the full document

Multi-document summarization

Very useful for presenting and organizing search results
– Many results are very similar, and grouping closely related documents helps cover more event facets
– Summarizing similarities and differences between documents

Scientific article summarization

Not only what the article is about, but also how it relates to the work it cites

Determine which approaches are criticized and which are supported
– Automatic genre-specific summaries are more useful than the original paper abstracts

Other uses

Document indexing for information retrieval

Automatic essay grading, topic identification module

Data-driven summarization

Frequency as an indicator of importance

The topic of a document will be repeated many times

In multi-document summarization, important content is repeated in different sources
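The multi-document point can be made concrete by counting, for each word, how many input documents mention it. This is a sketch under simple assumptions: a regular-expression tokenizer, a small hypothetical stop-word list, and three toy "documents" adapted from the PAL sentences used later in the lecture.

```python
import re
from collections import Counter

# Hypothetical stop-word list; without one, words like "the" dominate the counts.
STOPWORDS = frozenset({"a", "an", "the", "in", "by", "was", "of", "and", "to", "is"})


def tokenize(text):
    return [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS]


def document_frequency(documents):
    """For each word, count how many input documents mention it at least once."""
    df = Counter()
    for doc in documents:
        df.update(set(tokenize(doc)))  # set(): each document counts at most once
    return df


docs = [
    "PAL was devastated by a pilots' strike in June.",
    "In June, PAL was embroiled in a crippling pilots' strike.",
    "The airline PAL faces a currency crisis.",
]
print(document_frequency(docs).most_common(1))  # [('pal', 3)]
```

Words shared across sources ("pal", then "pilots", "strike", "june") rise to the top, while words found in only one source stay at 1.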

Greedy frequency method

Compute word probability from input

Compute sentence weight as function of word probability

Pick best sentence
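The three steps above can be sketched as follows. Assumptions: word probability is a word's count divided by the total number of words in the input, sentence weight is the average probability of its words, and selection stops after a fixed number of sentences. The optional squaring update after each pick is borrowed from the SumBasic system and is not part of the slide; without it the method does nothing about redundancy.

```python
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def greedy_frequency_summary(sentences, max_sentences, squash_repeats=False):
    # Step 1: word probability p(w) = count(w) / total number of words in the input.
    counts = Counter(w for s in sentences for w in tokenize(s))
    total = sum(counts.values())
    prob = {w: c / total for w, c in counts.items()}

    def weight(sentence):
        # Step 2: sentence weight = average probability of the sentence's words.
        words = tokenize(sentence)
        return sum(prob[w] for w in words) / len(words) if words else 0.0

    # Step 3: pick the best sentence; repeat until the budget is used up.
    remaining, summary = list(sentences), []
    while remaining and len(summary) < max_sentences:
        best = max(remaining, key=weight)
        summary.append(best)
        remaining.remove(best)
        if squash_repeats:
            # SumBasic-style update (not on the slide): square p(w) for the
            # words just used, so sentences repeating them score lower next time.
            for w in set(tokenize(best)):
                prob[w] **= 2
    return summary


sentences = ["the cat sat", "the cat ran", "a dog barked loudly"]
print(greedy_frequency_summary(sentences, 2))                       # ['the cat sat', 'the cat ran']
print(greedy_frequency_summary(sentences, 2, squash_repeats=True))  # ['the cat sat', 'a dog barked loudly']
```

The toy input shows the trade-off: plain frequency picks two near-duplicates, while the update steers the second pick toward new content.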

How to deal with redundancy?

Author JK Rowling has won her legal battle in a New York court to get an unofficial Harry Potter encyclopaedia banned from publication.

A U.S. federal judge in Manhattan has sided with author J.K. Rowling and ruled against the publication of a Harry Potter encyclopedia created by a fan of the book series.

– Shallow techniques are not likely to work well: the two sentences report the same ruling in largely different words
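A quick check of why shallow word overlap struggles here: cosine similarity between the word-count vectors of the two Rowling sentences. The tokenizer and the stop-word list are assumptions for this sketch.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list.
STOPWORDS = frozenset({"a", "an", "the", "in", "of", "to", "and", "by", "with", "from", "has", "her"})


def bag_of_words(text, stopwords=frozenset()):
    return Counter(w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in stopwords)


def cosine(a, b, stopwords=frozenset()):
    """Cosine similarity between two sentences' word-count vectors."""
    va, vb = bag_of_words(a, stopwords), bag_of_words(b, stopwords)
    dot = sum(count * vb[w] for w, count in va.items())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0


S1 = ("Author JK Rowling has won her legal battle in a New York court to get an "
      "unofficial Harry Potter encyclopaedia banned from publication.")
S2 = ("A U.S. federal judge in Manhattan has sided with author J.K. Rowling and ruled "
      "against the publication of a Harry Potter encyclopedia created by a fan of the book series.")

print(f"{cosine(S1, S2):.2f}")             # 0.32, inflated by shared function words
print(f"{cosine(S1, S2, STOPWORDS):.2f}")  # 0.28 on content words alone
```

Only five content words are shared, and spelling variants such as "encyclopaedia"/"encyclopedia" or "JK"/"J.K." do not match at all, so a typical overlap threshold would likely treat these two sentences as distinct.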

Global optimization for content selection

What is the best summary? vs. What is the best sentence?

Form all summaries and choose the best
– What is the problem with this approach?
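Forming all summaries can be sketched as an exhaustive search over subsets of input sentences under a word budget. The objective (total probability of the distinct words a summary covers) and the hand-set probabilities are hypothetical choices for illustration. The loop makes the problem with the approach visible: there are 2 ** n subsets of n sentences, so an input of only 30 sentences already yields more than a billion candidates.

```python
import itertools
import re


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def coverage(summary, prob):
    """Hypothetical objective: total probability of the distinct words covered."""
    covered = {w for s in summary for w in tokenize(s)}
    return sum(prob.get(w, 0.0) for w in covered)


def best_summary_exhaustive(sentences, prob, max_words):
    """Score every subset of the input that fits the word budget; keep the best."""
    best, best_score = (), float("-inf")
    # There are 2 ** len(sentences) subsets, so this loop explodes quickly.
    for size in range(len(sentences) + 1):
        for subset in itertools.combinations(sentences, size):
            if sum(len(tokenize(s)) for s in subset) > max_words:
                continue
            score = coverage(subset, prob)
            if score > best_score:
                best, best_score = subset, score
    return list(best), best_score


prob = {"the": 0.2, "cat": 0.2, "sat": 0.1, "ran": 0.1, "a": 0.1, "dog": 0.1, "barked": 0.1}
summary, score = best_summary_exhaustive(["the cat sat", "the cat ran", "a dog barked"], prob, max_words=6)
print(summary, round(score, 2))
```

Unlike the greedy method, this scores whole summaries, so it naturally avoids pairing the two "the cat" sentences; the price is the exponential search.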

Sentence clustering for theme identification

1. PAL was devastated by a pilots' strike in June and by the region's currency crisis.

2. In June, PAL was embroiled in a crippling three-week pilots' strike.

3. Tan wants to retain the 200 pilots because they stood by him when the majority of PAL's pilots staged a devastating strike in June.

Cluster sentences from the input into similar themes

Choose one sentence to represent a theme

Consider bigger themes as more important
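The three steps above, sketched with deliberately simple choices that are assumptions rather than the lecture's method: Jaccard overlap of content words as sentence similarity, a greedy single pass that adds a sentence to the first theme containing a similar enough sentence, a fixed threshold of 0.2, and as representative the sentence with the highest total similarity to the rest of its theme.

```python
import re

# Hypothetical stop-word list ("s" catches the split-off possessive in "PAL's").
STOPWORDS = frozenset({"a", "an", "the", "in", "of", "by", "and", "to", "was",
                       "s", "he", "him", "they", "when", "from"})


def content_words(text):
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}


def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0


def theme_summary(sentences, threshold=0.2):
    words = [content_words(s) for s in sentences]
    themes = []  # each theme is a list of sentence indices
    for i in range(len(sentences)):
        for theme in themes:
            # Join the first theme holding a sufficiently similar sentence.
            if any(jaccard(words[i], words[j]) >= threshold for j in theme):
                theme.append(i)
                break
        else:
            themes.append([i])  # no similar theme: start a new one

    themes.sort(key=len, reverse=True)  # bigger themes count as more important
    summary = []
    for theme in themes:
        # Representative: highest total similarity to the other theme members.
        rep = max(theme, key=lambda i: sum(jaccard(words[i], words[j]) for j in theme if j != i))
        summary.append(sentences[rep])
    return summary
```

On the three PAL sentences plus one unrelated sentence, the PAL sentences form a single theme, which is listed first, and the unrelated sentence forms a theme of its own.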

Using graph representations

Nodes
– Sentences
– Discourse entities

Arcs
– Between similar sentences
– Between related entities
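A sketch of the sentence half of this representation: nodes are sentences, and an arc joins two sentences whose cosine similarity reaches a threshold. Ranking nodes by degree (number of arcs) is one simple way to pick central sentences; systems such as LexRank use eigenvector-style centrality instead. The threshold value and the tokenizer are assumptions, and discourse-entity nodes are not modeled.

```python
import math
import re
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two word-count vectors (Counters)."""
    dot = sum(count * b[w] for w, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_sentence_graph(sentences, threshold=0.2):
    """Adjacency sets: node i is sentence i; arcs join similar sentences."""
    vectors = [Counter(re.findall(r"[a-z0-9]+", s.lower())) for s in sentences]
    graph = {i: set() for i in range(len(sentences))}
    for i in range(len(sentences)):
        for j in range(i + 1, len(sentences)):
            if cosine(vectors[i], vectors[j]) >= threshold:
                graph[i].add(j)
                graph[j].add(i)
    return graph


def rank_by_degree(graph):
    """Order nodes by number of arcs, most connected first."""
    return sorted(graph, key=lambda node: len(graph[node]), reverse=True)


sentences = ["pilots strike at airline", "pilots strike continues",
             "airline losses mount", "rain fell in manila"]
graph = build_sentence_graph(sentences)
print(graph)                  # {0: {1, 2}, 1: {0}, 2: {0}, 3: set()}
print(rank_by_degree(graph))  # [0, 1, 2, 3]
```

Sentence 0 links the strike and the airline losses, so it is the best-connected node; the unrelated weather sentence has no arcs.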

Using machine learning

Ask people to select sentences

Use these as training examples for machine learning
– Each sentence is represented as a number of features
– Based on the features, distinguish sentences that are appropriate for a summary from sentences that are not

Run on new inputs
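A toy version of this pipeline. The two features (is this the first sentence, does it contain a cue phrase), the cue-phrase list and the "human" labels are all hypothetical, and a perceptron stands in for whatever classifier a real system would use.

```python
# Hypothetical cue phrases and features, for illustration only.
CUE_PHRASES = ("in conclusion", "in summary")


def featurize(sentence, index):
    """Feature vector: [is first sentence, contains a cue phrase]."""
    lowered = sentence.lower()
    return [1.0 if index == 0 else 0.0,
            1.0 if any(c in lowered for c in CUE_PHRASES) else 0.0]


def predict(model, x):
    weights, bias = model
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else 0


def train_perceptron(examples, epochs=20):
    """examples: (feature_vector, label) pairs; label 1 = a person selected it."""
    weights, bias = [0.0] * len(examples[0][0]), 0.0
    for _ in range(epochs):
        for x, y in examples:
            error = y - predict((weights, bias), x)
            if error:  # classic perceptron update on mistakes only
                weights = [w + error * xi for w, xi in zip(weights, x)]
                bias += error
    return weights, bias


# Toy "human selections": (sentence, position in its document, selected?)
annotated = [
    ("The committee approved the budget.", 0, 1),
    ("In summary, spending will rise.", 3, 1),
    ("The meeting lasted two hours.", 2, 0),
    ("In summary, the vote was close.", 0, 1),
]
model = train_perceptron([(featurize(s, i), y) for s, i, y in annotated])
print(predict(model, featurize("In conclusion, taxes fall.", 4)))  # 1
print(predict(model, featurize("Lunch was served.", 5)))            # 0
```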

Information ordering

In what order should the selected sentences be presented?
– An article with permuted sentences will not be easy to understand

Very important for multi-document summarization
– Sentences come from different documents
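One simple baseline for multi-document ordering, offered as an assumption rather than the lecture's method, is chronological ordering: sort the selected sentences by the publication date of their source document, breaking ties by position within that document. The `Selected` record, its fields and the sample dates are hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Selected:
    """A selected sentence plus where it came from (hypothetical record)."""
    text: str
    doc_date: date  # publication date of the source article
    position: int   # index of the sentence within that article


def chronological_order(selected):
    """Earlier articles first; within one article, keep the original order."""
    return [s.text for s in sorted(selected, key=lambda s: (s.doc_date, s.position))]


summary = [
    Selected("Talks resumed.", date(2008, 10, 2), 0),
    Selected("Flights were cancelled.", date(2008, 9, 30), 3),
    Selected("The strike began.", date(2008, 9, 30), 1),
]
print(chronological_order(summary))
# ['The strike began.', 'Flights were cancelled.', 'Talks resumed.']
```

The baseline breaks down when an article describes earlier events or background, which is part of why ordering remains hard.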

Automatic summary edits

Some expressions might not be appropriate in the new context
– References: he, Putin, Russian Prime Minister Vladimir Putin
– Discourse connectives: however, moreover, subsequently

Requires more sophisticated NLP techniques
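A crude illustration of the discourse-connective problem: if the sentence a connective originally referred back to is not in the summary, drop the connective. The connective list comes from the slide; the rule itself is a hypothetical heuristic that can change meaning (a dropped "however" loses its contrast), which is one reason the slide calls for more sophisticated NLP.

```python
import re

# Connectives named on the slide; a real list would be much longer.
CONNECTIVES = ("however", "moreover", "subsequently")
LEADING_CONNECTIVE = re.compile(r"^(?:%s)\b\s*,?\s*" % "|".join(CONNECTIVES), re.IGNORECASE)


def drop_dangling_connective(sentence, previous_sentence_kept):
    """Strip a sentence-initial connective if its original antecedent sentence is gone."""
    if previous_sentence_kept:
        return sentence
    stripped = LEADING_CONNECTIVE.sub("", sentence, count=1)
    if stripped == sentence:
        return sentence
    return stripped[:1].upper() + stripped[1:]  # re-capitalize the new first word


print(drop_dangling_connective("However, the talks failed.", False))  # The talks failed.
print(drop_dangling_connective("However, the talks failed.", True))   # However, the talks failed.
```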

Before

Pinochet was placed under arrest in London Friday by British police acting on a warrant issued by a Spanish judge. Pinochet has immunity from prosecution in Chile as a senator-for-life under a new constitution that his government crafted. Pinochet was detained in the London clinic while recovering from back surgery.

After

Gen. Augusto Pinochet, the former Chilean dictator, was placed under arrest in London Friday by British police acting on a warrant issued by a Spanish judge. Pinochet has immunity from prosecution in Chile as a senator-for-life under a new constitution that his government crafted. Pinochet was detained in the London clinic while recovering from back surgery.

Before

Turkey has been trying to form a new government since a coalition government led by Yilmaz collapsed last month over allegations that he rigged the sale of a bank. Ecevit refused even to consult with the leader of the Virtue Party during his efforts to form a government. Ecevit must now try to build a government. Demirel consulted Turkey's party leaders immediately after Ecevit gave up.

After

Turkey has been trying to form a new government since a coalition government led by Prime Minister Mesut Yilmaz collapsed last month over allegations that he rigged the sale of a bank. Premier-designate Bulent Ecevit refused even to consult with the leader of the Virtue Party during his efforts to form a government. Ecevit must now try to build a government. President Suleyman Demirel consulted Turkey's party leaders immediately after Ecevit gave up.
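The Before/After pairs above can be imitated with a small rewriting step: give each entity its full description at its first mention and leave later mentions short. In this sketch the mapping from short names to full descriptions is supplied by hand (a hypothetical input); finding those descriptions in the source articles and deciding when a pronoun such as "he" is still safe are the hard parts, and they are not attempted here.

```python
import re


def rewrite_first_mentions(sentences, full_forms):
    """Give each entity its full description at first mention; later mentions stay short.

    full_forms maps a short name to a full description. It is supplied by hand
    here (hypothetical input); a real system would have to extract it.
    """
    seen = set()
    rewritten = []
    for sentence in sentences:
        for short, full in full_forms.items():
            pattern = r"\b%s\b" % re.escape(short)
            if short in seen or not re.search(pattern, sentence):
                continue
            seen.add(short)
            if full not in sentence:
                # A function replacement avoids any escape processing of `full`.
                sentence = re.sub(pattern, lambda m: full, sentence, count=1)
        rewritten.append(sentence)
    return rewritten


before = [
    "Turkey has been trying to form a new government since a coalition government "
    "led by Yilmaz collapsed last month over allegations that he rigged the sale of a bank.",
    "Ecevit refused even to consult with the leader of the Virtue Party during his "
    "efforts to form a government.",
    "Ecevit must now try to build a government.",
    "Demirel consulted Turkey's party leaders immediately after Ecevit gave up.",
]
full_forms = {
    "Yilmaz": "Prime Minister Mesut Yilmaz",
    "Ecevit": "Premier-designate Bulent Ecevit",
    "Demirel": "President Suleyman Demirel",
}
print(" ".join(rewrite_first_mentions(before, full_forms)))
```

With this hand-made mapping, the joined output reproduces the Turkey "After" text above; the same function with {"Pinochet": "Gen. Augusto Pinochet, the former Chilean dictator,"} reproduces the Pinochet example.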
