Intelligent Database Systems Presenter : JIAN-REN CHEN Authors : Rafael Ferreira a, * , Luciano De Souza Cabral a , Rafael Dueire Lins a , Gabriel Pereira E Silva a , Fred Freitas a , George D.C. Cavalcanti a , Rinaldo Lima a , Steven J. Simske b , Luciano Favaro c 2013.ESA Assessing sentence scoring techniques for extractive text summarization

Assessing sentence scoring techniques for extractive text summarization

Page 1: Assessing  sentence scoring techniques  for  extractive text summarization

Intelligent Database Systems Lab

Presenter : JIAN-REN CHEN

Authors : Rafael Ferreira a,*, Luciano De Souza Cabral a, Rafael Dueire Lins a, Gabriel Pereira E Silva a, Fred Freitas a, George D.C. Cavalcanti a, Rinaldo Lima a, Steven J. Simske b, Luciano Favaro c

2013.ESA

Assessing sentence scoring techniques for extractive text summarization

Page 2: Assessing  sentence scoring techniques  for  extractive text summarization

Outline

• Motivation
• Objectives
• Methodology
• Experiments
• Conclusions
• Comments

Page 3: Assessing  sentence scoring techniques  for  extractive text summarization

Motivation

Due to the huge volume of information on the Internet, it has become infeasible to efficiently sieve useful information from the huge mass of documents.

Text Summarization

- Extractive

- Abstractive

Page 4: Assessing  sentence scoring techniques  for  extractive text summarization

Objectives

• We want to introduce 15 sentence scoring methods and assess all of them for extractive text summarization.
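
A minimal sketch of the extractive step, assuming every sentence already carries a numeric score from one of the scoring methods: keep the k highest-scoring sentences and output them in their original order. The function name `extract_summary` and the sample scores are illustrative and do not come from the paper.

```python
def extract_summary(sentences, scores, k):
    """Return the k best-scoring sentences, kept in document order."""
    # Rank sentence indices by score, highest first (ties keep document order).
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    # Restore the original order so the summary reads naturally.
    chosen = sorted(ranked[:k])
    return [sentences[i] for i in chosen]


sentences = ["A.", "B.", "C.", "D."]
scores = [0.2, 0.9, 0.1, 0.5]  # hypothetical scores from any scoring method
summary = extract_summary(sentences, scores, k=2)
```

Any of the word, sentence, or graph scoring methods on the following slides could supply `scores`.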

Page 5: Assessing  sentence scoring techniques  for  extractive text summarization

Methodology – Word scoring

• Word frequency
• TF/IDF
• Upper case
• Proper noun
• Word co-occurrence
• Lexical similarity

Score(s) = [formula not preserved in the transcript]

n-gram
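
The Score(s) formula on this slide did not survive extraction. As a rough sketch of the word frequency method, the code below assumes that a sentence's score is the sum of the document-level counts of its words. The tokenizer (lowercase, letters only) and the function names are simplifying assumptions, not the paper's definitions.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of letters (a simplifying assumption)."""
    return re.findall(r"[a-z]+", text.lower())


def word_frequency_scores(sentences):
    """Score each sentence as the sum of the document-level counts of its words."""
    counts = Counter(word for s in sentences for word in tokenize(s))
    return [sum(counts[word] for word in tokenize(s)) for s in sentences]


sentences = [
    "Summarization selects important sentences.",
    "Important sentences contain frequent words.",
    "The weather was pleasant.",
]
scores = word_frequency_scores(sentences)
```

Under this assumption, longer sentences tend to score higher, so a practical variant might normalize by sentence length.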

Page 6: Assessing  sentence scoring techniques  for  extractive text summarization

Methodology – Sentence scoring

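• Cue-phrases
• Sentence inclusion of numerical data
• Sentence length
• Sentence position
• Sentence centrality
• Sentence resemblance to the title

Cue-phrase examples: in summary, in conclusion, our investigation, the best, the most important, according to the study, significantly, important, in particular, hardly, impossible

Score(s) = [formula not preserved in the transcript]

Sentence position:

$$Sp(S_i) = \begin{cases} 1 & \text{if } S_i \text{ is among the first } N \text{ sentences} \\ 0 & \text{otherwise} \end{cases}$$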
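
Three of the sentence-level features can be sketched as simple indicator functions. The position score follows the slide's rule (1 for the first N sentences, 0 otherwise). The cue-phrase and numerical-data scores are written here as binary flags, which is an assumption, since the slide's own formula is not preserved. The cue-phrase list is the one shown on the slide; the function names are illustrative.

```python
import re

# Cue phrases as listed on the slide.
CUE_PHRASES = (
    "in summary", "in conclusion", "our investigation", "the best",
    "the most important", "according to the study", "significantly",
    "important", "in particular", "hardly", "impossible",
)


def position_score(index, n):
    """Return 1 if the sentence (0-based index) is among the first n, else 0."""
    return 1 if index < n else 0


def cue_phrase_score(sentence):
    """Return 1 if the sentence contains any cue phrase (binary by assumption)."""
    text = sentence.lower()
    return 1 if any(phrase in text for phrase in CUE_PHRASES) else 0


def numeric_score(sentence):
    """Return 1 if the sentence contains at least one digit (binary by assumption)."""
    return 1 if re.search(r"\d", sentence) else 0


sample = [
    "In conclusion, TF/IDF performed well.",
    "The corpus has 400 documents.",
    "It was a sunny day.",
]
position = [position_score(i, 2) for i in range(len(sample))]
cue = [cue_phrase_score(s) for s in sample]
numeric = [numeric_score(s) for s in sample]
```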

Page 7: Assessing  sentence scoring techniques  for  extractive text summarization

Methodology – Graph scoring

• TextRank
• Bushy path of the node
• Aggregate similarity

Bushy path: Score(s) = #(branches connected to the node)

Score(s) = [formula not preserved in the transcript]
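
The bushy path score counts the branches connected to a sentence's node. The sketch below builds a sentence graph under two assumptions that the slide does not state: similarity is the cosine of raw word-count vectors, and two sentences are connected when that similarity exceeds a threshold of 0.2. It also computes an aggregate score as the sum of the similarities on those branches, which is a common reading of aggregate similarity but is likewise an assumption here.

```python
import math
import re
from collections import Counter


def bag_of_words(sentence):
    """Count lowercase alphabetic tokens in a sentence."""
    return Counter(re.findall(r"[a-z]+", sentence.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def graph_scores(sentences, threshold=0.2):
    """Return (bushy, aggregate): branch counts and summed branch similarities."""
    bags = [bag_of_words(s) for s in sentences]
    bushy = [0] * len(sentences)
    aggregate = [0.0] * len(sentences)
    for i in range(len(sentences)):
        for j in range(i + 1, len(sentences)):
            sim = cosine(bags[i], bags[j])
            if sim > threshold:  # the two sentences share a branch
                bushy[i] += 1
                bushy[j] += 1
                aggregate[i] += sim
                aggregate[j] += sim
    return bushy, aggregate


sentences = ["cats chase mice", "mice eat cheese", "cats chase birds", "rain falls"]
bushy, aggregate = graph_scores(sentences)
```

In this toy input the first sentence is connected to two others, so it gets the highest bushy path score, while the last sentence shares no words with the rest and scores zero.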

Page 8: Assessing  sentence scoring techniques  for  extractive text summarization

Experiments - Datasets, Evaluation

Datasets:

                 CNN    Blog                SUMMAC
Data             400    100 post, 50 blog   183
Summary length   2-4    7                   2-7

Evaluation:

ROUGE
- Quantitative Assessment
- Qualitative Assessment
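
ROUGE compares a system summary with a reference summary by n-gram overlap. The following is a minimal sketch of ROUGE-N recall (overlapping n-grams, with clipped counts, divided by the number of n-grams in the reference). Whitespace tokenization and the function name are assumptions; this is not the official ROUGE toolkit, and the slides do not say which ROUGE variant was used.

```python
from collections import Counter


def rouge_n_recall(candidate, reference, n=1):
    """ROUGE-N recall: clipped n-gram overlap divided by reference n-gram count."""

    def ngrams(text):
        tokens = text.lower().split()
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    cand, ref = ngrams(candidate), ngrams(reference)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    # Each reference n-gram is matched at most as often as it occurs in the candidate.
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / total


reference = "the cat sat on the mat"
candidate = "the cat lay on the mat"
rouge_1 = rouge_n_recall(candidate, reference, n=1)
rouge_2 = rouge_n_recall(candidate, reference, n=2)
```

Here five of the six reference unigrams and three of the five reference bigrams are recovered by the candidate.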

Page 9: Assessing  sentence scoring techniques  for  extractive text summarization

Experiments - CNN

word scoring: TF/IDF
sentence scoring: Sentence position 1
graph scoring: TextRank score

Page 10: Assessing  sentence scoring techniques  for  extractive text summarization

Experiments - Blog

word scoring: TF/IDF
sentence scoring: Sentence length
graph scoring: TextRank score

Page 11: Assessing  sentence scoring techniques  for  extractive text summarization

Experiments - SUMMAC

word scoring: TF/IDF
sentence scoring: Resemblance to the title
graph scoring: TextRank score

Page 12: Assessing  sentence scoring techniques  for  extractive text summarization

Improving sentence scoring results

• Morphological transformation: Truncation, Stemming, Lemmatization
• Stop words
• Similar semantics - WordNet, Lexical Chains
• Co-reference - word frequency features
• Ambiguity - Lexical Chains
• Redundancy - Sentence fusion

Examples:

lights: light, lights, lighting, lit
colleg*: college, colleges, collegium, collegial
col*r: color, colour, colander
be: is, am, are
car, wheel, seat, passenger => automobile topic
John will travel tomorrow. He bought the ticket yesterday.
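
The truncation examples (`colleg*`, `col*r`) can be reproduced with shell-style wildcard matching. This is a small sketch using Python's standard `fnmatch` module; the helper name `expand_truncation` and the toy vocabulary are illustrative, and the slides do not say how truncation was actually implemented.

```python
from fnmatch import fnmatchcase


def expand_truncation(pattern, vocabulary):
    """Return the vocabulary words matched by a wildcard pattern such as 'colleg*'."""
    return [word for word in vocabulary if fnmatchcase(word, pattern)]


# Toy vocabulary built from the slide's examples, plus one non-matching word.
vocabulary = [
    "college", "colleges", "collegium", "collegial",
    "color", "colour", "colander", "cold",
]
prefix_matches = expand_truncation("colleg*", vocabulary)
infix_matches = expand_truncation("col*r", vocabulary)
```

Note that `col*r` also matches `colander`, which illustrates how truncation can group unrelated words.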

Page 13: Assessing  sentence scoring techniques  for  extractive text summarization

Conclusions

• Word Frequency, TF/IDF, Lexical Similarity, Sentence Length and TextRank Score were chosen as providing good results.

- computationally intensive: TF/IDF
- balance in execution time: Word Frequency, Sentence Length

Page 14: Assessing  sentence scoring techniques  for  extractive text summarization

Comments

• Advantages
- understand the basic methods and their differences

• Applications
- text summarization