
Enhancing Diversity, Coverage and Balance for Summarization through Structure Learning


Liangda Li 1 Ke Zhou 1 Gui-Rong Xue 1 Hongyuan Zha 2 Yong Yu 1

1Department of Computer Science

Shanghai Jiao-Tong University

2College of Computing

Georgia Institute of Technology

WWW 2009


Outline

Introduction

Diversity, Coverage and Balance

Optimization Problem and Structure Learning Framework

Experiments

Conclusion


Introduction


Example for Search Results


Example for News Browsing


Traditional summarization approaches

Treat the summarization task as a binary classification problem

Use a 0-1 loss function

This leads to serious redundancy, imbalance, and low recall.

Robert H. Bork, who once hoped for a Supreme Court seat, instead stood before the nation's highest court Monday.

He was there in the capacity of a lawyer _ apparently making him the first defeated Supreme Court nominee ever to argue before the justices.

Robert H. Bork, who once hoped for a Supreme Court seat, instead stood before the nation's highest court Monday.

Representing Citibank in a big-stakes battle, Bork argued that U.S. banks with branch offices overseas should not be required to pay depositors after foreign governments seize or freeze those accounts.

He was branded a conservative extremist by opponents.

Bork, nominated by then-President Reagan, has said he was the victim of a campaign of lies and distortions led by liberals.

There were no references to that fight Monday as Bork engaged in debate with the justices and opposing lawyers over the complexities of federal banking law and related matters.

He will be paid $25,000 a year to teach one course each semester as a part-time professor.

Bork was treated much the same as any attorney who appears before the justices.

They questioned him vigorously, occasionally interrupting him for clarification and elaboration.

Justice Anthony M. Kennedy, who occupies the seat that eluded Bork, directed a few questions at Bork.

Bork, 63, a former federal appeals court judge, is a fellow at the American Enterprise Institute and will begin teaching constitutional law in the fall at the George Mason University law school in Arlington, Va.

Bork was only the sixth man this century to be denied a Supreme Court seat by the Senate and the 26th in its history.


Diversity, Coverage and Balance


Three Key Requirements in Summarization

Diversity: fewer redundant sentences

Coverage: little information loss

Balance: emphasize the various aspects of the document in a balanced way


Example

AP890616-0192 from DUC2001

Generally three topics (views from different perspectives): Michael Milken himself, the people involved, and the company.

13 AP890616-0192 4826<DOC><DOCNO> AP890616-0192 </DOCNO><FILEID>AP-NR-06-16-89 0237EST</FILEID><FIRST>r f PM-Milken Bjt 06-16 0734</FIRST><SECOND>PM-Milken, Bjt,0757</SECOND><HEAD>Indicted Bond Trader Quits Drexel, Sets Up Own Firm</HEAD><BYLINE>By STEFAN FATSIS</BYLINE><BYLINE>AP Business Writer</BYLINE><DATELINE>NEW YORK (AP) </DATELINE><TEXT>Michael Milken, the fallen Drexel Burnham Lambert financier, is striking out on his own.But what isn't clear is whether loyal ex-colleagues will follow the Pied Piper of junk bonds to his new firm _ and how long the venture will last with Milken facing a lengthy jail term. ……


Example for Diversity

A: Milken, 42, resigned Thursday after 19 years at Drexel, where he began a wildly successful career that helped reshape corporate America in the 1980s through the pioneering use of low-grade securities called junk bonds.

B: Milken, who made a reported $550 million in 1987, said he is forming a consulting firm to assist companies that want to raise money to start up, grow or stay in business.

C: People involved in Milken's plans for the new firm, International Capital Assets Group, said Milken does not intend to raid Drexel's Beverly Hills, Calif., junk bond division, which he founded and ran until his March indictment.

B & C is better than A & B

Select sentences belonging to different topics

Milken himself

Involved People


Example for Coverage

A: Milken, who made a reported $550 million in 1987, said he is forming a consulting firm to assist companies that want to raise money to start up, grow or stay in business.

B: Milken joined Drexel full-time in 1970 after graduating from the University of Pennsylvania's Wharton School.

A is better than B

Select sentences more relevant to one of the topics.

Relevant

Irrelevant


Example for Balance

A: Milken, who made a reported $550 million in 1987, said he is forming a consulting firm to assist companies that want to raise money to start up, grow or stay in business.

B: But he faces $1.85 billion in forfeitures of alleged illegal profits and a lengthy jail term if convicted on a 98-count fraud and racketeering indictment, the government's largest securities crime prosecution to date.

C: A Drexel official, speaking on condition of anonymity, said the Wall Street giant does not anticipate conflicts with Milken's new firm because it will not be in the brokerage business.

D: “Michael Milken made many important contributions to Drexel Burnham, and his resignation, although not unexpected, is a sad event,” Drexel stated.

A & B & C is better than B & C & D

For each topic, select a percentage of sentences in proportion to its weight.

Milken himself

The company


Optimization Problem and Structure Learning Framework


Problem Formulation

Predict a summary: $y^* = \arg\max_{y} f(x, y)$

Learn a model: $f(x, y) = \langle w, \psi(x, y) \rangle$

Joint feature representation: $\psi(x, y)$

Loss function: $\Delta(y, y')$
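The prediction rule above can be illustrated with a minimal sketch. The feature map here (distinct words, total words, sentence count) and the brute-force search over all k-sentence subsets are assumptions made for illustration only; the slides do not specify the actual features or the inference procedure.

```python
from itertools import combinations

def joint_features(sentences, summary_idx):
    """Toy psi(x, y): [distinct words, total words, number of sentences] (hypothetical)."""
    chosen = [sentences[i].lower().split() for i in summary_idx]
    distinct = {w for words in chosen for w in words}
    total = sum(len(words) for words in chosen)
    return [float(len(distinct)), float(total), float(len(chosen))]

def score(w, sentences, summary_idx):
    """f(x, y) = <w, psi(x, y)>."""
    feats = joint_features(sentences, summary_idx)
    return sum(wi * fi for wi, fi in zip(w, feats))

def predict(w, sentences, k):
    """y* = argmax_y f(x, y), searched by brute force over all k-sentence subsets."""
    candidates = combinations(range(len(sentences)), k)
    return max(candidates, key=lambda y: score(w, sentences, y))
```

With a weight vector that rewards distinct words and penalizes total length, a pair of identical sentences scores lower than a pair of different ones, which is the redundancy effect the talk is concerned with.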


Large margin approach:

The parameter c controls the tradeoff between model complexity and the sum of slack variables.

The constraints force the ground-truth summary to score higher than the other candidate summaries.

Structural Support Vector Machines (Tsochantaridis et al., 2005)
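The formula on this slide did not survive extraction. For reference, the standard margin-rescaling structural SVM of Tsochantaridis et al. (2005) has the form below. The choice of margin rescaling, the $c/n$ scaling, and the indexing over $n$ training pairs $(x_i, y_i)$ are assumptions; the slide may have used a different variant.

```latex
\min_{w,\ \xi \ge 0} \quad \frac{1}{2}\lVert w \rVert^2 + \frac{c}{n} \sum_{i=1}^{n} \xi_i
\qquad \text{s.t.} \qquad
\langle w, \psi(x_i, y_i) \rangle - \langle w, \psi(x_i, y) \rangle
\;\ge\; \Delta(y_i, y) - \xi_i
\quad \forall i,\ \forall y \ne y_i
```

Here $\xi_i$ are the slack variables and $\Delta(y_i, y)$ is the loss of predicting $y$ when the ground truth is $y_i$.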


Constraint for Diversity

Diversity: little overlap

The sum of the summary sentences’ individual scores should be no more than the overall score of the sentences taken as a whole set.

Each sentence should focus on different subtopics
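The idea behind this constraint can be seen with a toy score. The sketch below uses the number of distinct words as a stand-in for the learned score; this stand-in is an assumption and not the model's actual scoring function. With such a score, the sum of individual scores stays within the score of the whole set only when the sentences do not overlap.

```python
def set_score(sentences):
    """Toy stand-in for a summary score: number of distinct words in the set (hypothetical)."""
    return len({w for s in sentences for w in s.lower().split()})

def satisfies_diversity(sentences):
    """True if the sum of individual scores does not exceed the score of the whole set."""
    individual = sum(set_score([s]) for s in sentences)
    return individual <= set_score(sentences)
```

Two sentences that share words are counted twice in the individual sum but only once in the set score, so the check fails for redundant pairs.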


Constraint for Coverage

Coverage: cover all subtopics as much as possible

Vector v: a sentence’s coverage of the subtopic set.

Subtopic Coverage Degree


Subtopic Set

Each document has a subtopic set T; each subtopic is associated with a set of words.

cover(t, s) defines sentence s’s coverage of subtopic t: $\mathrm{cover}(t, s) = |t \cap s| \, / \, |t|$

That is, cover(t, s) is the proportion of the words in subtopic t that also appear in sentence s.
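The definition of cover(t, s) translates directly into a few lines of code. The sketch below assumes lowercase whitespace tokenization and returns 0 for an empty subtopic; both choices are assumptions, since the slides do not say how words are extracted.

```python
def cover(subtopic_words, sentence):
    """Proportion of the subtopic's words that also appear in the sentence."""
    topic = {w.lower() for w in subtopic_words}
    if not topic:
        return 0.0  # assumption: an empty subtopic is covered by nothing
    sentence_words = set(sentence.lower().split())
    return len(topic & sentence_words) / len(topic)
```

For example, a subtopic with the four words consulting, firm, companies, money is covered to degree 0.75 by a sentence that contains three of them.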


Example

Each topic may have several subtopics, which indicates its importance.

Topic: Michael Milken himself

Subtopics: Milken’s contribution, Milken’s fall, Milken’s current situation.

For subtopic t: Milken’s contribution

a: Milken, who made a reported $550 million in 1987, said he is forming a consulting firm to assist companies that want to raise money to start up, grow or stay in business.

b: But he faces $1.85 billion in forfeitures of alleged illegal profits and a lengthy jail term if convicted on a 98-count fraud and racketeering indictment, the government's largest securities crime prosecution to date.

c: “I am naturally disappointed to be forced to leave Drexel as part of the firm's settlement with the government, but I look forward to the opportunity of helping people build companies,” Michael Milken said in a statement.

cover(t,a)=1; cover(t,b)=0; cover(t,c)=0.16.


Constraint for Balance

Balance: relatively equal coverage for each subtopic

Variation of subtopics’ coverage:
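The formula for the variation did not survive extraction. One plausible reading, shown here only as an assumption, is the population variance of the per-subtopic coverage values: a perfectly balanced summary has zero variation, and the value grows as coverage concentrates on a few subtopics.

```python
from statistics import pvariance

def coverage_variation(subtopic_coverage):
    """Population variance of the coverage values, one value per subtopic (assumed measure)."""
    return pvariance(subtopic_coverage)
```

Under this reading, a summary covering every subtopic to degree 0.5 has variation 0, while one that fully covers one subtopic and ignores another has variation 0.25.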


Combined Optimization Problem


Structure Learning

Independence graphs: measure the similarity between sentences and shrink the search space

Learning algorithm: cutting plane algorithm

Making predictions
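Cutting-plane training for a structural SVM repeatedly searches for the most violated constraint under the current weights, adds it to a working set, and re-solves the quadratic program. The sketch below shows only the search step. The additive feature map, the recall-style loss, and the brute-force enumeration are all assumptions for illustration; the independence graph mentioned above, which would shrink the candidate set, is not modeled.

```python
from itertools import combinations

def loss(y_true, y_pred):
    """Toy Delta(y, y'): fraction of ground-truth sentences missing from y' (hypothetical)."""
    return 1.0 - len(set(y_true) & set(y_pred)) / len(y_true)

def linear_score(w, features, y):
    """<w, psi(x, y)> with a toy additive psi: the sum of per-sentence feature vectors."""
    return sum(wi * sum(features[i][j] for i in y) for j, wi in enumerate(w))

def most_violated(w, features, y_true):
    """Separation oracle: argmax_y Delta(y_true, y) + <w, psi(x, y)> by brute force."""
    k = len(y_true)
    candidates = combinations(range(len(features)), k)
    return max(candidates, key=lambda y: loss(y_true, y) + linear_score(w, features, y))
```

When the oracle returns the ground-truth summary itself, no constraint is violated for that training example and nothing needs to be added to the working set.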


Experiments


Experimental Setup

Datasets: DUC2001 Bigset, Docset1, Docset2

Bigset: contains 147 document-summary pairs from the DUC2001 dataset

Docset1, Docset2: the two main subsets of Bigset

Evaluation metrics:

F1 evaluation

ROUGE evaluation (comparable to F1 evaluation): ROUGE-N-R, ROUGE-N-P, ROUGE-N-F

$gram_n$ denotes the n-grams in document y
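The ROUGE formulas on this slide were lost in extraction. The sketch below follows the usual definition of ROUGE-N as clipped n-gram overlap between a candidate and a single reference summary, reporting recall, precision, and F-measure. Lowercase whitespace tokenization and the single-reference setting are assumptions, and this is not the official ROUGE toolkit.

```python
from collections import Counter

def ngrams(text, n):
    """Multiset of n-grams in a whitespace-tokenized, lowercased text."""
    tokens = text.lower().split()
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def rouge_n(candidate, reference, n=1):
    """Return (recall, precision, f1) computed from clipped n-gram overlap."""
    cand, ref = ngrams(candidate, n), ngrams(reference, n)
    overlap = sum((cand & ref).values())  # Counter & keeps the minimum count per n-gram
    recall = overlap / sum(ref.values()) if ref else 0.0
    precision = overlap / sum(cand.values()) if cand else 0.0
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return recall, precision, f1
```

The three returned values correspond to ROUGE-N-R, ROUGE-N-P and ROUGE-N-F respectively.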


Overall Performance

Our approach performs best

Results on the smaller data sets show the robustness of our approach


Constraint Selection

Coverage-biased constraint makes the greatest contribution to summarization.

The model trained with all three constraints performs the best


Conclusion

Diversity, coverage and balance: prove to be of great importance to the summarization task.

Structural learning framework:

Structural SVM

Three constraints enforce diversity, coverage and balance separately

Independence graphs and cutting plane algorithm

Experimental results:

Our approach outperforms state-of-the-art ones.

The constraints improve the performance significantly.


Thank you!