The ICSI Summarization System Dan Gillick, Benoit Favre, and Dilek Hakkani-Tür {dgillick, favre, dilek}@icsi.berkeley.edu International Computer Science Institute


The ICSI Summarization System

Dan Gillick, Benoit Favre, and Dilek Hakkani-Tür

{dgillick, favre, dilek}@icsi.berkeley.edu

International Computer Science Institute

Berkeley, CA

November 18, 2008 ICSI at TAC 2008 Dan Gillick (2)

Who Are We?

Dan Gillick: Graduate student at UC Berkeley

Benoit Favre: Postdoc at ICSI, PhD from Avignon

Dilek Hakkani-Tür: Senior Researcher at ICSI

Summarization Assumptions

• Information is conveyed by discrete, independent concepts.

• The content value of a summary can be measured by the total value of the unique concepts it contains.

• Linguistic quality is enforced primarily by units of selection (e.g. sentences).

What are Concepts?

Original sentence:

Christians make up just 3 percent of Iraq's population of about 25 million.

Pyramid concepts:

(1) Christians make up 3 percent of Iraq’s population

(2) The population of Iraq is 25 million

Word bigram concepts:

(1) Christians make

(2) 3 percent

(3) Iraq’s population

(4) 25 million
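A minimal sketch of word-bigram concept extraction, assuming a simple regular-expression tokenizer (the `bigram_concepts` helper and its tokenization rule are illustrative, not the system's code). It returns every adjacent word pair, whereas the slide lists only four of them; stop-word handling and stemming are omitted here.

```python
import re

def bigram_concepts(sentence):
    """Return the set of adjacent word pairs (bigrams) in a sentence."""
    # Assumption: lowercase, and treat runs of letters, digits,
    # and apostrophes as tokens.
    tokens = re.findall(r"[a-z0-9']+", sentence.lower())
    return set(zip(tokens, tokens[1:]))

sentence = ("Christians make up just 3 percent of Iraq's population "
            "of about 25 million.")
concepts = bigram_concepts(sentence)
print(sorted(concepts))
```

Because the result is a set, a bigram that occurs twice in a sentence still counts as one concept.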

ILP Formulation

Maximize a single linear objective function:

$$\max \sum_i w_i c_i$$

i : concept index

c_i : indicator for concept i in summary

w_i : weight (value) of concept i

ILP Formulation

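Maximize a single linear objective function:

$$\max \sum_i w_i c_i$$

Subject to linear constraints:

$$\sum_j l_j s_j \le L$$

$$s_j o_{ij} \le c_i \quad \forall i, j$$

$$\sum_j s_j o_{ij} \ge c_i \quad \forall i$$

$$c_i \in \{0, 1\} \ \forall i, \qquad s_j \in \{0, 1\} \ \forall j$$

i : concept index

j : sentence index

c_i : indicator for concept i in summary

s_j : indicator for sentence j in summary

w_i : weight (value) of concept i

l_j : length of sentence j

o_ij : indicator for concept i in sentence j

L : maximum summary length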
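To make the optimization concrete, here is a toy sketch that maximizes the total weight of the unique concepts covered by the selected sentences, subject to the length limit L. It uses exhaustive search over sentence subsets rather than an ILP solver, so it is only practical for a handful of sentences; the function name and the toy data are hypothetical.

```python
from itertools import combinations

def best_summary(sentences, weights, max_length):
    """Exhaustively pick the sentence subset with the highest concept value.

    sentences: list of (length, set_of_concepts) pairs
    weights:   dict mapping concept -> weight
    """
    best_value, best_choice = 0, ()
    indices = range(len(sentences))
    for size in range(1, len(sentences) + 1):
        for choice in combinations(indices, size):
            length = sum(sentences[j][0] for j in choice)
            if length > max_length:
                continue  # violates the length constraint
            # A concept is in the summary when any selected sentence
            # contains it, so each concept is counted only once.
            covered = set().union(*(sentences[j][1] for j in choice))
            value = sum(weights.get(c, 0) for c in covered)
            if value > best_value:
                best_value, best_choice = value, choice
    return best_value, best_choice

# Hypothetical toy problem: four sentences, four concepts.
sentences = [
    (5, {"a", "b"}),
    (4, {"b", "c"}),
    (3, {"c"}),
    (6, {"a", "d"}),
]
weights = {"a": 3, "b": 2, "c": 2, "d": 1}
value, choice = best_summary(sentences, weights, max_length=8)
print(value, choice)
```

Sentences 0 and 1 together would cover the same concepts but exceed the length limit, so the search settles on sentences 0 and 2.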

Building Systems (1)

ICSI-1

– Concepts: word bigrams

– Mapping Function: document frequency

• only include sentences with some query overlap

• prune concepts appearing in fewer than 3 documents

– Units of Selection: sentences

ICSI-2

– Units of Selection: compressed sentence candidates
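The ICSI-1 mapping function could be sketched as follows. This is an illustration under assumptions: the input is already tokenized, "query overlap" means sharing at least one word with the query, and the overlap filter is applied while counting bigrams. The function name and the toy documents are hypothetical.

```python
from collections import Counter

def document_frequency_weights(documents, query_tokens, min_df=3):
    """Weight each bigram by the number of documents that contain it.

    documents: list of documents, each a list of tokenized sentences.
    Sentences sharing no word with the query are skipped (assumption),
    and bigrams found in fewer than min_df documents are pruned.
    """
    query = set(query_tokens)
    df = Counter()
    for doc in documents:
        doc_bigrams = set()
        for tokens in doc:
            if query & set(tokens):
                doc_bigrams.update(zip(tokens, tokens[1:]))
        df.update(doc_bigrams)  # count each bigram once per document
    return {bigram: n for bigram, n in df.items() if n >= min_df}

docs = [
    [["christians", "flee", "baghdad"], ["weather", "was", "mild"]],
    [["many", "christians", "flee", "north"]],
    [["iraqi", "christians", "flee", "violence"]],
]
concept_weights = document_frequency_weights(docs, ["christians", "iraq"])
print(concept_weights)
```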

Building Systems (2)

MRO (Maximum ROUGE Oracle)

– Concepts: word bigrams

– Mapping Function: document frequency in human “gold” summaries

– Units of Selection: sentences
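For the oracle, the same idea applies with the human summaries standing in for the source documents. A brief sketch, assuming each reference summary is given as one flat token list (so bigrams may span sentence boundaries) and that no pruning is applied; `oracle_weights` and the toy references are hypothetical.

```python
from collections import Counter

def oracle_weights(reference_summaries):
    """Weight each bigram by the number of reference summaries containing it."""
    df = Counter()
    for tokens in reference_summaries:
        # A set ensures each bigram counts once per reference summary.
        df.update(set(zip(tokens, tokens[1:])))
    return dict(df)

references = [
    ["christians", "flee", "iraq"],
    ["christians", "flee", "baghdad"],
    ["attacks", "target", "christians"],
]
gold_weights = oracle_weights(references)
print(gold_weights[("christians", "flee")])
```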

Pre/post-processing

• Sentence segmentation, tokenization, stop-words, Porter stemming – NLTK

• Simple rules for removing newswire headers and formatting markup

• ICSI-1, MRO: ordering first by source date, then by sentence number

• ICSI-2: dendrogram ordering (not clear this is better)
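The ordering used for ICSI-1 and MRO (source date first, then sentence number) amounts to a two-key sort. A small sketch, assuming each selected sentence carries its document date and its position in that document; the field names and dates are hypothetical.

```python
from datetime import date

def order_sentences(selected):
    """Sort by source document date, then by position within the document."""
    return sorted(selected, key=lambda s: (s["date"], s["position"]))

selected = [
    {"text": "C", "date": date(2005, 1, 18), "position": 0},
    {"text": "B", "date": date(2004, 10, 17), "position": 9},
    {"text": "A", "date": date(2004, 10, 17), "position": 4},
]
ordered = order_sentences(selected)
print([s["text"] for s in ordered])
```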

Only the Most Related Work

• Assigning value to words based on frequency (Nenkova and Vanderwende, 2005)

• Global optimization with learned word values using a beam search (Yih, et al., 2007)

• Set cover formalism for summarization (Filatova and Hatzivassiloglou, 2004)

• ILP for summarization (McDonald, 2007)

• Approximate ROUGE-1 oracle results (Conroy et al., 2006)

TAC Results (1)

• Excellent performance on non-update problems: a t-test shows no significant difference between ICSI-1 and the best system in every category

• No specific update task processing

TAC Results (2)

• Overall best ROUGE scores

• Relatively poor linguistic quality

Linguistic Quality Analysis

Among summaries receiving linguistic quality scores of 1 or 2, we counted how many contained each type of error:

• ICSI-1 could be drastically improved by better sentence segmentation and rules for removing a few words.

• ICSI-2 is too aggressive with sentence compression.

• Co-reference resolution is a major problem.

An Oracle Experiment (1)

Data: DUC 2007 update task set A (10 topics)

Note: “Content responsiveness” evaluation does not include linguistic quality as in TAC 2008.

Systems Evaluated:

B1: Returns all leading sentences up to the length limit from the most recent document

B2: NIST’s “high performance generic summarizer” (Conroy, et al., 2004)

ICSI-1: Our submitted system

MRO: The oracle system

H: Each of 4 human summaries written by NIST’s IR experts.
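The B1 baseline is simple enough to sketch directly. This version assumes length is measured in whitespace-separated words, that only whole sentences are returned, and that dates are ISO strings; none of these details are given on the slide, and the function name and documents are hypothetical.

```python
def lead_baseline(documents, max_words):
    """Leading sentences of the most recent document, up to max_words."""
    latest = max(documents, key=lambda d: d["date"])
    summary, used = [], 0
    for sentence in latest["sentences"]:
        n = len(sentence.split())
        if used + n > max_words:
            break  # assumption: stop at the first sentence that does not fit
        summary.append(sentence)
        used += n
    return summary

documents = [
    {"date": "2005-01-10",
     "sentences": ["Old lead sentence here.", "More old text."]},
    {"date": "2005-01-18",
     "sentences": ["Newest lead sentence here.",
                   "A second sentence follows it.",
                   "A third one is longer than the rest."]},
]
baseline = lead_baseline(documents, max_words=10)
print(baseline)
```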

An Oracle Experiment (2)

• MRO gets better content scores than ICSI-1, but worse than humans

• All differences are significant at the 95% confidence level, using Tukey’s “Honestly Significant Difference” test

• Suggests there is room for improvement in sentence extraction

Not Randomly Selected Example

Summaries for D0808-A: “Describe the events related to Christian minorities in Iraq and their current status.”

MRO: Iraq's Christians, increasingly targeted by insurgents, are fleeing Baghdad for the safety of the Kurdish north or neighboring Syria and Jordan. But the exodus is temporary, insist many, because they are not selling their homes and property. Christians make up just 3 percent of Iraq's population of about 25 million. Officials estimate that as many as 15,000 of Iraq's nearly one million Christians have left the country since August, when four churches in Baghdad and one in Mosul were attacked in a coordinated series of car bombings. Insurgents abducted Syrian Catholic Archbishop Basile Casmoussa apparently to frighten Iraqi Christians.

ICSI-1: In an interview, Yonadem Kana, the leader of the Assyrian Democratic Movement in Iraq and a member of the Iraqi National Council, said the fighters have been deployed in Baghdida near the northern city of Mosul. Christians make up just 3 percent of Iraq's population of about 25 million. Officials estimate that as many as 15,000 of Iraq's nearly one million Christians have left the country since August, when four churches in Baghdad and one in Mosul were attacked in a coordinated series of car bombings. Most of Christians in Iraq are in Baghdad and northern cities.

ICSI-2: Officials estimate that as many as 15000 of Iraq's nearly one million Christians have left the country since August, when four churches in Baghdad and one in Mosul were attacked in a coordinated series of car bombings. Most of Christians in Iraq are in Baghdad and northern cities. Christians make up just 3 percent of Iraq's population of about 25 million. Armed men kidnapped a Catholic archbishop in Iraq's main northern city of Mosul Monday. In an interview, Yonadem Kana, the leader of the Assyrian Democratic Movement in Iraq and a member, said the fighters have been deployed in Baghdida.

Responsiveness: ?, Linguistic Quality: ?, Pyramid: ?, ROUGE-2: 0.186

Responsiveness: 3, Linguistic Quality: 3, Pyramid: 0.489, ROUGE-2: 0.118

Responsiveness: 4, Linguistic Quality: 4, Pyramid: 0.517, ROUGE-2: 0.119

Conclusion

• ICSI system is simple, fast, and performs well.

• Linguistic quality needs work but a set of rules for cleaning sentences will help a lot.

• Oracle system suggests:

– room for improvement in sentence selection

– more is likely needed to match human performance

• Source code available soon (www.dgillick.com/summarize.html)