Page 1

Ph.D. Dissertation Defense

Semantic Analysis for Improved Multi-Document Summarization

Quinsulon L. Israel

Committee Members:

Dr. Il-Yeol Song (Chair)

Dr. Hyoil Han (Co-chair)

Dr. Jung-Ran Park

Dr. Harry Wang

Dr. Erjia Yan

1

Page 2

Overview

Motivation

Background

Research Goals

Literature Review

Methodology

› Approach 1 - MDS by Aggregated SDS via Semantic Linear Combination

› Approach 2 - MDS by Semantic Triples Clustering with Focus Overlap

Evaluation

Results

Conclusion

Further Work

2

Page 3

Motivation

▪ What is automatic focused Multi-Document Text Summarization (fMDS)?

‒ Automatic text summarization: creation of a gist of text by an artificial system

‒ Multi-document summarization: automatic summarization of multiple, yet related documents

‒ fMDS: multi-document summarization focused on some input given to an artificial system

3

Page 4

▪ Why automatic focused Multi-Document text Summarization (fMDS)?

‒ Purpose: Reduction of the information overload of multiple, related documents according to an inputted focus (e.g., a query, topic, or question)

‒ Use: Quick overviews of news and reports by analysts that focus on specific details and/or new information

‒ How: Extract subset of sentences from multiple, related text sources, while maximizing “informativeness” and reducing redundancy in the new summary

4

Motivation (cont.)

Page 5

5

Motivation (cont.)

“Government Analyst”

Page 6

Hypothesis

The use of semantic analysis can improve focused multi-document summarization (fMDS) beyond the use of baseline sentence features.

› Semantic analysis here uses light-weight semantic triples to help both represent and filter sentences.

› Semantic analysis also uses weights assigned to the semantic classes (e.g., named entities typed as person, organization, location, date, etc.)

› Semantic analysis also uses “semantic cues” for identifying important information
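Illustrative only, not the dissertation's actual configuration: a minimal Python sketch of tallying semantic class counts and semantic cue hits for one sentence, assuming named-entity labels come from an external NER step; the class weights and cue list below are hypothetical.

```python
# Hypothetical weights for semantic classes (NE types) and a small cue list;
# the dissertation's actual weights and cues may differ.
CLASS_WEIGHTS = {"PERSON": 1.0, "ORGANIZATION": 1.0, "LOCATION": 0.8, "DATE": 0.5}
SEMANTIC_CUES = {"according to", "announced", "reported", "said"}

def semantic_scores(sentence_text, entity_types):
    """Return (class_score, cue_score) for one sentence.

    entity_types: list of NE type labels found in the sentence, e.g.
    ["PERSON", "DATE"], as produced by an external NER step.
    """
    class_score = sum(CLASS_WEIGHTS.get(t, 0.0) for t in entity_types)
    lowered = sentence_text.lower()
    cue_score = sum(1 for cue in SEMANTIC_CUES if cue in lowered)
    return class_score, cue_score

print(semantic_scores(
    "Airbus announced the A380 test flight in Toulouse on April 27.",
    ["ORGANIZATION", "LOCATION", "DATE"]))
```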

6

Motivation (cont.)

Page 7

7

Research Question & Sub-questions

What effects does semantic analysis of sentences have on the improvement of focused multi-document summarization (fMDS)?

› What is the effect on overall system performance of clustering sentences based on semantic analysis for improving fMDS?

› What is the effect on overall system performance of using the semantic class scoring of sentences for improving fMDS?

Motivation (cont.)

Page 8

Research Goals

› Improve upon the gold standard baseline

› Examine the use of the new “semantic class scoring” with “semantic cue scoring”

› Examine the use of the new “semantic triples clustering” methods for extractive fMDS

› Create a portable, light-weight improvement for fMDS that can be easily modified

8

Page 9

Background

Human Summarization Activity

› Single document summarization (SDS): 79% "direct match" to a sentence within the source document (Kupiec, Pedersen et al. 1995)

› Multiple document summarization (MDS): uses 55% of the "vocabulary" contained within the source documents (Copeck and Szpakowicz 2004)

Man vs. Machine

› "Highly frequent content words" from the corpus are not found in automatic summaries during evaluation but do appear in human summaries (Nenkova, Vanderwende et al. 2006)

› Man and machine both have difficulties with generic MDS (Copeck and Szpakowicz 2004)

9

Page 10

Human Summarization Agreement

› SDS: 40% unigram overlap (Lin and Hovy 2002)

− Humans tend to agree on "highly frequent content words" (Nenkova, Vanderwende et al. 2006)

− Words not agreed upon may not be highly frequent but may still be useful

› SDS: 30-40 summaries needed before consensus

› MDS: no such human studies found within the literature

10

Background (cont.)

Page 11

Focused MDS (fMDS) Process

Figure 1. Standard summarization system (multi-phase process): Focus Processing, Sentence Processing, Focus-to-Sentences Comparison, Sentence Scoring and Ranking, Sentence Selection (into summary), Redundancy Removal, Sentence Compression (optional), and Summary Truncation. The original figure also marks where the research system deviates from this standard pipeline.

• Process the initial focus and the document sentences
• Score and rank focus-salient sentences
• Add sentences to the summary until a pre-determined length is reached

* Compression is optional.
* Sentence scoring and ranking can be an iterative process.

11

Background (cont.)

Page 12

Research System

Figure 2. Architecture of the research system: a Text Collection is processed into a Text Summary through Sentence Processing (sentence splitting, tokenization, POS, NER, phrase detection), Semantic Annotation (person, location, organization), Semantic Triples Parsing (subject-verb-object), Sentence Scoring (semantic classes and semantic cues combined into an aggregate score, plus query overlap), Conceptual Representations (semantic triples clustering), and Sentence Selection (STC cluster representative). Labels in the original figure attribute parts of the pipeline to the GATE 7 Toolkit and the MultiPax plugin, and mark the developed light semantic components and improvements as the novel features of the system.

12

Background (cont.)

Page 13

Overview

Summarization Timeline

Systems Comparison

› Probability/statistical modeling

› Features combination

› Multi-level text relationships

› Graph-based

› Semantic analysis

13

Literature Review

Page 14

Summarization Timeline

14

Timeline (years shown on the original slide: 1958, 1961, 1968, 1973, 1977, 1979, 1982, 1988, 1989, 1990-1999, 2000, 2004, 2010-2011):

• Lexical occurrence statistics by Luhn (1958)
• Linguistic approaches
• Position and cue words by Edmundson
• "Cohesion streamlining" by Mathais
• Frames and semantic networks (TOPIC system)
• Logic rules, generative (SUSY system)
• "Sketchy scripts" (templates) (FRUMP system)
• Hybrid representations, corpus-based (SCISOR system)
• Return of occurrence statistics ("Renaissance" era, 1990-1999)
• Statistics, corpus training ("state-of-the-art" era, 2000s)
• MDS (2004)
• Deeper semantic and structural analyses (2010-2011)

Literature Review (cont.)

Page 15

15

System Categories

• Conroy (2006): Statistical Approaches
• Nenkova (2006): Statistical Approaches
• Arora (2008): Statistical Approaches
• Yih (2007): Features Combination
• Ouyang (2007): Features Combination
• Wan (2008): Graph-based
• Wei (2008): Multi-level Text Relationships
• Wang (2008): Semantic Analysis
• Harabagiu (2010): Semantic Analysis

Literature Review (cont.)

Because systems report different evaluation measures or use different datasets, normalizing performances across years is not possible.

Page 16

16

Statistical Approaches

Literature Review (cont.)

Feature matrix (columns: Compress, Mat Calc, Freq/Prob, LDA, Summary Op):
• Conroy 2006: x x x
• Nenkova* 2006
• Arora 2008: x x

Legend:
• Focused: uses some focusing input to the system
• Compress: uses some form of simplification to add more information
• Mat Calc: uses complex matrix calculations to filter and select sentences
• Freq/Prob: uses a statistical or probability-distribution method to score terms
• LDA: uses complex Latent Dirichlet Allocation to model summaries
• Summary Op: uses a method of creating multiple summaries and choosing the optimum summary
* Not focus-based, but still important

Advantages and disadvantages:
• Conroy 2006: uses likely human vocabulary; however, uses other collections external to the one observed.
• Nenkova 2006: simple yet relatively effective; however, reports a frequency-based indicator of human vocabulary but uses probability instead.
• Arora 2008: captures fixed topics from the corpus and optimizes the summary; however, it is very complex, relies on sampling over the corpus, and a sentence can represent only one topic.

Page 17

17

Features Combination

Literature Review (cont.)

Feature matrix (columns: Log Reg, Word Pos, Summary Op, Freq/Prob, Sentence Pos, NER Count, WordNet):
• Yih 2007: x x x x
• Ouyang 2007: x x x x

Legend:
• Log Reg: uses logistic regression to tune a scoring estimator
• Word Pos: adds a word-position metric to the score
• Freq/Prob: uses a statistical or probability-distribution method to score terms
• Sentence Pos: adds a sentence-position metric to the score
• Summary Op: uses a method of creating multiple summaries and choosing the optimum summary
• NER Count: counts named entities found and adds the count to the score
• WordNet: used to determine semantically related words

Advantages and disadvantages:
• Yih 2007: simple estimated scoring and summary optimization; however, uses only two sentence features and no comparison of meaning between words.
• Ouyang 2007: determines the most important features; however, semantics between words in the focus and the sentence are compared arbitrarily.

Page 18

18

Graph-based Approaches

Literature Review (cont.)

Feature matrix (columns: Bi-partite, CM Rand Walk):
• Wan 2008: x x

Legend:
• Bi-partite: uses a bi-partite link graph (between clusters and sentences)
• CM Rand Walk: a Conditional Markov Random Walk is done between nodes in the graph

Advantages and disadvantages:
• Wan 2008: introduces link analysis via a modified Google PageRank to MDS; however, uses only cosine similarity for all values, thus no comparison of meaning between words.

Page 19

19

Multi-level Text Relationships

Literature Review (cont.)

Feature matrix (columns: Mat Calc, Pair-wise, WordNet):
• Wei 2008: x x x

Legend:
• Mat Calc: uses complex matrix calculations to filter and select sentences
• Pair-wise: compares pairs of text units closely for determining the score
• WordNet: used to determine semantically related words

Advantages and disadvantages:
• Wei 2008: introduces affinity relationships between text-unit levels, and pairing text units intersected with the query reduces noise; however, it is very complex and needs a better formulation for creating vectors for singly observed terms (e.g., too much noise from WordNet without better constraints).

Page 20

20

Semantic Analysis

Literature Review (cont.)

Feature matrix (columns: Mat Calc, Pair-wise, Structure, Coherence):
• Wang 2008: x x
• Harabagiu 2010: x x

Legend:
• Mat Calc: uses complex matrix calculations to filter and select sentences
• Pair-wise: compares pairs of text units closely for determining the score
• Structure: adds ordering and/or proximity of text units to the scoring
• Coherence: attempts to improve the readability of the summary

Advantages and disadvantages:
• Wang 2008: uses semantic analysis, and pairing text units reduces noise; however, the matrix reduction is complex and performance is only above the average system.
• Harabagiu 2010: complete semantic parsing and adds coherence for improvement; however, requires heavy corpus training/machine learning and an internally created KB, is not easily replicable, and raises the issue of fMDS vs. generic MDS.

Page 21

Similar Research

Wang et al. 2008 vs. Proposed Research

21

Wang vs. Harabagiu vs. Research System:

• Wang: complex matrix factorization. Harabagiu: corpus-trained, hand-crafted KB. Research System: semantic triples clustering.
• Redundancy removal. Wang: semantic frames overlap, complex SNMF. Harabagiu: heavy, all argument roles. Research System: light-weight, simpler semantic triples.
• Scoring. Wang: sentence-to-sentence semantic relationship more important than the focus. Harabagiu: position, ordering. Research System: semantic class scoring, semantic cue scoring.
• Training. Wang: none. Harabagiu: extensive. Research System: none.

Literature Review (cont.)

Page 22

22

Approach 1 - fMDS by Aggregated SDS via Semantic Linear Combination

− Applies to:

What is the effect on overall system performance of using the semantic class scoring of sentences for improving fMDS?

• Stage 1 Algorithm: SDS via Semantic Linear Combination (SLC)

› Uses the combination of a feature set to create an aggregate score

› Introduces semantic class and semantic cues scoring to the feature set

• Stage 2 Algorithm: fMDS by SDS via SLC

› Uses all the scored sentences from Stage 1

› Introduces redundancy removal via cosine similarity

Approach 2 - fMDS by Semantic Triples Clustering and Aggregated SDS via SLC

› Uses only the aggregate scores from Approach 1

› Introduces semantic triples clustering for redundancy removal and sentence selection

− Applies to:

What is the effect on overall system performance of clustering sentences based on semantic analysis for improving fMDS?

Approach (cont.)

Page 23

23

Approach 3 - fMDS by Semantic Triples Clustering with Cluster Connections

› Uses the aggregate scores from Approach 1

› Uses the semantic triples clustering from Approach 2

› Introduces sentence intra- and inter-connectivity for redundancy removal and sentence selection

− Applies to:

What is the effect on overall system performance of measuring the conceptual connectivity of sentence triples for improving fMDS?

Approach (cont.)

Page 24

Approach 1

Stage 1 Algorithm: SDS via Semantic Linear Combination

Input: A Corpus (C) of topically related Documents (D) pre-processed into Sentences (S) on which shallow semantic analysis has been performed: the Named Entities (NE) have been labeled externally by GATE ANNIE.

Output: A summary (SUMM) is a subset of Sentences (S) from the input corpus documents up to a maxLength (i.e., SUMM = {S1, S2, ... , SN}, where N is the maximum number of sentences that could be added to the summary). SUMM contains the best sentences toward a single-document summary.

24

Approach (cont.)

Page 25

25

Evaluation

Test Collection

Input: newswire documents; Focus: "Airbus A380"

Output of System 100 (MDS on top of Semantic SDS Linear Combination [Semantic MDS]):

Here are some key dates in its development: January 23, 2002: Production starts of Airbus A380 components.
May 7, 2004: French Prime Minister Jean-Pierre Raffarin inaugurates the Toulouse assembly line.
Ravenel said sound levels near Charles de Gaulle airport normally reached about 40 decibels.
According to the source, Wednesday's flight may be at an altitude slightly higher than the some 10,000 feet (3,000 meters) achieved in the first flight, and could climb up to 13,000 feet.

Input documents (excerpts):

2: AFP_ENG_20050116.0346
A 380 'superjumbo' will be profitable from 2008 : Airbus chief
PARIS , Jan 16
The A 380 'superjumbo', which will be presented to the world in a lavish ceremony in southern France on Tuesday , will be profitable from 2008 , its maker Airbus told the French financial newspaper La Tribune .
"You need to count another three years ," Airbus chief Noel Forgeard told Monday 's edition of the newspaper when asked when the break-even point of the 10 - billion-euro-plus ( 13 - billion-dollar-plus ) A 380 programme would come .
So far , 13 airlines have placed firm orders for 139 of the new planes , which can seat between 555 and 840 passengers and which have a catalogue price of between 263 and 286 million dollars ( 200 and 218 million euros ) .
The break-even point is calculated to arrive when the 250 th A 380 is sold .

6: AFP_ENG_20050427.0493
Paris airport neighbors complain about noise from giant Airbus A 380
TOULOUSE , France , April 27
An association of residents living near Paris 's Charles-de-Gaulle airport on Wednesday denounced the noise pollution generated by the giant Airbus A 380 , after the new airliner 's maiden flight .
French acoustics expert Joel Ravenel , a member of the Advocnar group representing those who live near Charles de Gaulle , told AFP he had recorded a maximum sound level of 88 decibels just after the aircraft took off from near the southwestern city of Toulouse .
The figure makes the world 's largest commercial jet "one of the loudest planes that will for decades fly over the heads of the four million people living in the area" outside Paris , Advocnar said in a statement .
Ravenel said sound levels near Charles de Gaulle airport normally reached about 40 decibels .
Journalists watching the Airbus A 380 's first flight at Toulouse airport in southwestern France , however , noted how quiet the take-off and landing had seemed .
Tens of thousands of spectators cheered as the A 380 touched down at the airport near Toulouse , home of the European aircraft maker Airbus Industrie , after a test flight of three hours and 54 minutes .

Page 26

Flow of Approach 1 Stage 1 Algorithm: SDS via Semantic Linear Combination

26

Approach (cont.)

Page 27

27

Approach (cont.)

Stage 1 Algorithm: Feature set of Step 10

Aggregated Score Formula:

Aggregate Score = Σ_{i ∈ F} ω_i * f_i, where f_i is one of the features described in section 5.1, i is the index of the feature, ω_i is the weight of f_i, and F is the feature set that this research uses.

Page 28

Flow of Approach 1 Stage 1 Algorithm: SDS via Semantic Linear Combination

28

Approach (cont.)

Page 29

Approach 1

Stage 2 Algorithm: MDS By SDS via Semantic Linear Combination

• Input: A corpus (C) of topically related documents (D) pre-processed into the best sentences (S) from the Stage 1 SDS by Shallow Semantic Analysis.

• Output: A summary (SUMMm) is a subset of Sentences (S) from the input documents up to maxLength (i.e., SUMMm = {S1^x, S2^y, ..., SN^z}, where m refers to multi-document and the superscripts x, y, and z identify each sentence's containing document).
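A minimal sketch of greedy sentence selection with cosine-similarity redundancy removal, assuming simple bag-of-words vectors and an assumed similarity threshold; the dissertation's actual vectorization and cutoff may differ:

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    common = set(a) & set(b)
    dot = sum(a[t] * b[t] for t in common)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def select_summary(scored_sentences, max_length=4, threshold=0.5):
    """Greedily add the highest-scored sentences, skipping near-duplicates.

    scored_sentences: list of (aggregate_score, sentence_text) pairs, already
    scored by the Stage 1 SDS step; threshold is an assumed redundancy cutoff.
    """
    summary = []
    for _, text in sorted(scored_sentences, reverse=True):
        vec = Counter(text.lower().split())
        if all(cosine(vec, Counter(s.lower().split())) < threshold for s in summary):
            summary.append(text)
        if len(summary) >= max_length:
            break
    return summary

print(select_summary([(0.9, "The A380 made its maiden flight."),
                      (0.8, "The A380 completed its first flight."),
                      (0.7, "Residents complained about noise levels.")], max_length=2))
```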

29

Approach (cont.)

Page 30

30

Approach (cont.)

Flow of Approach 1 Stage 2 Algorithm: MDS via SDS Semantic Linear Combination

Page 31

Approach 2

Algorithm: STC Focused MDS By SDS via Semantic Linear Combination

• Input: A corpus (C) of topically related documents (D) pre-processed into the best sentences (S) from the Stage 1 SDS by Shallow Semantic Analysis of Approach 1 and pre-processed for their subject-verb-object triples. Stage 2 MDS of Approach 1 is not used as part of this approach.

• Output: A summary (SUMMstc) is a subset of Sentences (S) from the input corpus documents up to maxLength (i.e., SUMMstc = {S1^x, S2^y, ..., SN^z}, where the superscripts x, y, and z identify each sentence's containing document).

31

Approach (cont.)

Page 32

32

According to police, the violence erupted after two boys, aged 14 and 16, died when they scaled a wall of an electrical relay station and fell against a transformer.

<Sentence s-v-o="erupt:violence:*:f;die:boy:*:f;scaled:they:wall:f;"></Sentence>

Example Sentence

Triple 1: (erupt, violence, *)
Triple 2: (die, boy, *)
Triple 3: (scale, they, wall)

This represents an example of a sentence transformed into semantic triples, written here as (verb, subject, object). In the original figure, the circle node represents the verb, the first square node represents the subject, and the last square node represents the (direct) object, if found.

Approach 2 Algorithm: Example of Step 1

Semantic Triples

Approach (cont.)
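A minimal sketch of parsing the s-v-o annotation string shown above into (verb, subject, object) triples; the trailing ':f' flag is ignored here, since its meaning is not spelled out on the slide:

```python
def parse_svo(attr):
    """Parse 'verb:subject:object:f;...' into (verb, subject, object) tuples."""
    triples = []
    for chunk in attr.strip().strip(";").split(";"):
        parts = chunk.split(":")
        if len(parts) >= 3:
            verb, subject, obj = parts[0], parts[1], parts[2]
            triples.append((verb, subject, obj))
    return triples

svo = "erupt:violence:*:f;die:boy:*:f;scaled:they:wall:f;"
print(parse_svo(svo))
# [('erupt', 'violence', '*'), ('die', 'boy', '*'), ('scaled', 'they', 'wall')]
```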

Page 33

Flow of Approach 2 Algorithm: STC fMDS By SDS via SLC

33

Approach (cont.)

Page 34

34

Approach 2 Algorithm: Example of Step 2

Semantic Triple Cluster Representation

Sentence 1 (2 semantic triples): "The riot has spread to 200 city suburbs and towns, including Marseille, Nice, Toulouse, Lille, Rennes, Rouen, Bordeaux and Montpellier and central Paris, French police said." → triples (say, police, *) and (spread, riot, *)

Sentence 2 (2 semantic triples): "_ Nov. 4 _ Youths torch 750 cars, throw stones at paramedics, as violence spreads to other towns." → triples (throw, youth, stone) and (spread, riot, *)

Sentence 3 (1 semantic triple): "Rioting spreads to at least 20 Paris-region towns." → triple (spread, riot, *)

* The yellow triple (at top left of the original figure) signifies the main cluster semantic triple, (spread, riot, *).

Approach (cont.)
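A minimal sketch of grouping the example sentences into clusters keyed by a shared semantic triple and ranking the clusters by density; using the exact (verb, subject, object) tuple as the cluster key is an assumption about the clustering criterion:

```python
from collections import defaultdict

# Each sentence is paired with its extracted (verb, subject, object) triples,
# taken from the example above.
sentences = [
    ("The riot has spread to 200 city suburbs and towns ... French police said.",
     [("say", "police", "*"), ("spread", "riot", "*")]),
    ("Youths torch 750 cars, throw stones at paramedics, as violence spreads to other towns.",
     [("throw", "youth", "stone"), ("spread", "riot", "*")]),
    ("Rioting spreads to at least 20 Paris-region towns.",
     [("spread", "riot", "*")]),
]

clusters = defaultdict(list)          # triple -> sentences containing it
for text, triples in sentences:
    for triple in triples:
        clusters[triple].append(text)

# Rank clusters by density (number of member sentences), densest first.
for triple, members in sorted(clusters.items(), key=lambda kv: len(kv[1]), reverse=True):
    print(len(members), triple)
```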

Page 35

Flow of Approach 2 Algorithm: STC fMDS By SDS via SLC

35

Approach (cont.)

Page 36

36

Approach 2 Algorithm: Example of Step 3

Semantic Triple Cluster Representation

Higher-ranked triple (contained in a sentence with a high triple count): "The riot has spread to 200 city suburbs and towns, including Marseille, Nice, Toulouse, Lille, Rennes, Rouen, Bordeaux and Montpellier and central Paris, French police said." (2 semantic triples) → triples (say, police, *) and (spread, riot, *)

Lower-ranked: "Rioting spreads to at least 20 Paris-region towns." (1 semantic triple) → triple (spread, riot, *)

* The yellow triple (at top left of the original figure) signifies the main cluster semantic triple.

Approach (cont.)

Page 37

Flow of Approach 2 Algorithm: STC fMDS By SDS via SLC

37

Approach (cont.)

Page 38

38

Approach 2 Algorithm: Example of Step 5

Query Overlap with the Semantic Triples

Query: "Paris Riots"

Semantic Triple Cluster Representation:

Sentence (2 semantic triples): "The riot has spread to 200 city suburbs and towns, including Marseille, Nice, Toulouse, Lille, Rennes, Rouen, Bordeaux and Montpellier and central Paris, French police said." → triples (say, police, *) and (spread, riot, *); semantic triple overlap with the query = 1

Sentence (1 semantic triple): "Rioting spreads to at least 20 Paris-region towns." → triple (spread, riot, *); semantic triple overlap with the query = 1

* The yellow triple (at top left of the original figure) signifies the main cluster semantic triple.

Approach (cont.)
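A minimal sketch of computing the query overlap for a triple by counting query terms that appear in its slots; the crude "riots" to "riot" normalization is an assumption made so the toy result matches the figure:

```python
def query_overlap(query_terms, triple):
    """Count query terms that appear in any slot of a (verb, subject, object) triple."""
    slots = {slot.lower() for slot in triple if slot != "*"}
    overlap = 0
    for term in query_terms:
        term = term.lower().rstrip("s")          # crude normalization: riots -> riot
        if any(term == slot or term in slot for slot in slots):
            overlap += 1
    return overlap

print(query_overlap(["Paris", "Riots"], ("spread", "riot", "*")))   # -> 1
print(query_overlap(["Paris", "Riots"], ("say", "police", "*")))    # -> 0
```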

Page 39

Flow of Approach 2 Algorithm: STC fMDS By SDS via SLC

39

Approach (cont.)

Page 40

Approach 3

Algorithm: STC Focused MDS By SDS via Semantic Linear Combination

• Input: A corpus (C) of topically related documents (D) pre-processed into the best sentences (S) from the Stage 1 SDS by Shallow Semantic Analysis of Approach 1 and pre-processed for their subject-verb-object triples. Stage 2 MDS of Approach 1 is not used as part of this approach. In addition to the Stage 1 SDS processing from Approach 1, Approach 2 Steps 1-6 are used to collect the triples into their proper ordering, and the sentences are later ordered by the connections between these triples.

• Output: A summary (SUMMconn) is a subset of Sentences (S) from the input corpus documents up to maxLength (i.e., SUMMconn = {S1^x, S2^y, ..., SN^z}, where the superscripts x, y, and z identify each sentence's containing document).

40

Approach (cont.)

Page 41

41

Approach 3

Approach (cont.)

Page 42

42

Approach 3

Approach (cont.)

Page 43

Flow of Approach 3 Algorithm: STC fMDS By Connectivity

43

Approach (cont.)

Page 44

44

Goal:
› To get a summary with a ROUGE score higher than the gold standard baseline system
› To place well against the veteran automatic systems

For the evaluation, the following methods were used in combination:
› Counting semantic classes and semantic cues to boost informative sentences
› Simpler semantic triples clustering method (including with focus)

Method:
› Gather human reference summaries and automated system summaries from the NIST 2008 Text Analysis Conference competition
› Use the evaluation script from the competition to compare the research system summaries against all other automatic summaries
› Compare extrinsic evaluation ROUGE scores with the gold standard baseline system and other automatic systems

Evaluation

Page 45

45

newswire documents

Focus: “Paris Riots”

Data used in Evaluation:

› Input for each focus text is a collection of 10 newswire documents

› Each document has approximately 250-500 words, or about 20 sentences

› Total input for each collection ranges from ~150-200 sentences

› In total, 46 document collections were used, for a total of about 10,000 sentences

Evaluation

Example Input (truncated)

<DOC id="AFP_ENG_20051028.0154" type="story" >

<HEADLINE>

Riot rocks Paris suburb after teenagers killed

</HEADLINE>

<DATELINE>

CLICHY-SOUS-BOIS, France, Oct 28

</DATELINE>

<TEXT>

<P>

Dozens of youths went on a rampage, burning vehicles and vandalising buildings in a tough Paris suburb Friday in an act of rage following the death by electrocution of two teenagers trying to flee police.

</P>

</DOC>
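Not part of the dissertation's pipeline: a minimal sketch of pulling the headline and paragraph text out of a TAC-style <DOC> block with regular expressions:

```python
import re

doc = """<DOC id="AFP_ENG_20051028.0154" type="story" >
<HEADLINE>
Riot rocks Paris suburb after teenagers killed
</HEADLINE>
<TEXT>
<P>
Dozens of youths went on a rampage, burning vehicles and vandalising buildings.
</P>
</TEXT>
</DOC>"""

# Grab the headline and every <P> paragraph from the SGML-like markup.
headline = re.search(r"<HEADLINE>(.*?)</HEADLINE>", doc, re.S).group(1).strip()
paragraphs = [p.strip() for p in re.findall(r"<P>(.*?)</P>", doc, re.S)]
print(headline)
print(paragraphs)
```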

Test Collection

NIST Text Analysis Conference Data

Page 46

46

Evaluation

Evaluation Metrics:

Three ROUGE metrics from the NIST competitions are used to evaluate the performance of the proposed system: ROUGE-1, ROUGE-2, and ROUGE-SU4.

ROUGE is an N-gram co-occurrence statistic between a candidate system summary and a set of human model summaries. It is calculated as follows (with n = 1 for ROUGE-1):

ROUGE-N = [ Σ_{S ∈ {Reference Summaries}} Σ_{gram_n ∈ S} Count_match(gram_n) ] / [ Σ_{S ∈ {Reference Summaries}} Σ_{gram_n ∈ S} Count(gram_n) ]

Reference calculation: four (4) human model summaries created by judges according to the NIST competition guidelines.

Gold Standard (Lead Baseline System): collects the first four (4) lines of the most recent document.

Page 47

47

Evaluation

Evaluation Metric: ROUGE-1

Unigram co-occurrence statistic between a system summary and a set of four human reference summaries. ROUGE-1 is calculated as follows:

Semi-automatic summary vs. 1 reference summary

System Candidate Summary Sentence:
Police detained 14 people Saturday after a second [night] of [rioting] that broke out in a working-class [Paris] suburb following the deaths of two youths who were electrocuted while trying to evade police.
→ 3 unigrams found

Human Reference Summary Sentence:
On successive [nights] the [rioting] spread to other parts of [Paris] and then to other cities.
→ 16 unigrams total

Matching unigrams: {night}, {rioting}, {Paris}

ROUGE-1 = 3 / 16 = 0.1875
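A minimal sketch reproducing the unigram arithmetic above; restricting the candidate to the three matched content words (and treating "nights" as matching "night") is an assumption made so the toy numbers equal the slide's 3/16:

```python
from collections import Counter

def rouge1_recall(candidate_tokens, reference_tokens):
    """Clipped unigram recall of a candidate against one reference summary."""
    cand, ref = Counter(candidate_tokens), Counter(reference_tokens)
    matches = sum(min(count, cand[tok]) for tok, count in ref.items())
    return matches / len(reference_tokens)

reference = ("on successive nights the rioting spread to other "
             "parts of paris and then to other cities").split()
# The slide's simplified count keeps only the three matching content words
# (night/nights, rioting, Paris) and ignores function-word matches.
matched_content_words = ["nights", "rioting", "paris"]
print(len(reference))                                    # 16 reference unigrams
print(rouge1_recall(matched_content_words, reference))   # 3 / 16 = 0.1875
```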

Page 48

48

Evaluation

Evaluation Metric: ROUGE-SU4

Skip-bigram co-occurrence statistic that allows up to four (4) words to appear between the two (2) words of a bigram, as long as the two words occur in the same order in both the candidate and the human reference summary.

Semi-automatic summary vs. 1 reference summary

System Candidate Summary Sentence:
Police detained 14 people Saturday after a second [night of rioting that broke out] in a working-class [Paris] suburb following the deaths of two youths who were electrocuted while trying to evade police.
→ 1 skip bigram found

Human Reference Summary Sentence:
On successive [nights the rioting spread to other] parts of [Paris] and then to other cities.
→ 21 skip bigrams total

Skip bi-grams: {night rioting}

ROUGE-SU4 = 1 / 21 = 0.04762
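A minimal sketch of generating skip-bigrams with a maximum gap of four and intersecting them, shown only for the bracketed snippets above; the full ROUGE-SU4 metric also counts unigrams, which this sketch omits:

```python
def skip_bigrams(tokens, max_gap=4):
    """All ordered word pairs with at most max_gap words between them."""
    pairs = set()
    for i, first in enumerate(tokens):
        for j in range(i + 1, min(i + 2 + max_gap, len(tokens))):
            pairs.add((first, tokens[j]))
    return pairs

candidate = "night of rioting that broke out".split()
reference = "nights the rioting spread to other".split()

# With a crude 'nights' -> 'night' normalization (an assumption), the only
# shared skip-bigram in these snippets is ('night', 'rioting').
reference = [t.rstrip("s") if t == "nights" else t for t in reference]
print(skip_bigrams(candidate) & skip_bigrams(reference))
```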

Page 49

49

Results: Approach 1 - System Ranking by MDS via SDS Semantic Analysis Approach Variations

Page 50

Discussion: Approach 1

• The poorer performances of Systems 5, 10R, and 16R show that stop-word removal is absolutely necessary for improvement, even with the semantic analysis. Without it, systems could not outperform the gold standard baseline system. Slight improvement is shown by the semantic-analysis SDS-based MDS System 6 over its relative, System 5.

• The improved ROUGE score of System 9 over System 16P shows the importance of adding more semantic cueing and semantic class scoring to the selection of sentences. The weights are similar, but System 16P takes weight away from the semantic cueing and semantic class scoring and gives it to "df", hence the drop.

• Another related class of tested systems is that of Systems 9, 10R, and 11H. These systems differ mostly in their alternative frequency measures. System 11H's use of the well-known tf*idf measure outperforms the pure "df" measure that the other two use. tf*idf, along with the semantic cueing and semantic class scoring, allowed System 11H to obtain a higher score than the gold standard baseline.

50

Page 51

51

Results: Approach 2 - System Ranking by STC Approach Variations

Page 52

Discussion: Approach 2

• All systems displayed are instances of the STC MDS system, except System 0 and this work's System 100, which is fMDS by SDS SLC from Approach 1.

• Although its performance on its own was not as promising compared to the veteran systems, the addition of System 3100's semantic triples clustering greatly improves performance by more than 10 rankings over the gold standard baseline. System 3100 also used a minimum cluster density of 2 and a ranking method that gave preference to the cluster aggregate score over the cluster density.

• Systems 1600, 1200, and 2700 show improvement over the automatic gold standard baseline with semantic triples clustering alone; however, each additional ranking method shows added improvement.

52

Page 53

53

Results: Approach 3 - System Ranking by STC Connectivity Process

Page 54

Discussion: Approach 3

• Systems Conn1 and Conn2 show only slight improvement over the gold standard baseline System 0 in Table 11, with System Conn2 performing best with stop-word pruning of the clusters based on their semantic roles (i.e., if a stop word was found within a subject, verb, or object slot, it was removed from consideration).

• Because these system variations show a minor drop in performance against the gold standard baseline system on the other ROUGE values, the approach does not perform satisfactorily, but it may be relevant for improvement in future work.

54

Page 55

55

Conclusion

This work sought to answer the question: what effects does semantic analysis of sentences have on the improvement of focused multi-document summarization (fMDS)?

1. What is the effect on overall system performance of using the semantic class scoring of sentences for improving fMDS?

Even though it was shown that tf*idf is extremely important in selecting the best sentences, there is a gap that this semantic analysis starts to fill.

The semantic class and semantic cue scoring still improved the ranking by several places over the gold standard baseline system.

Page 56

56

Conclusion

2. What is the effect on overall system performance of clustering sentences based on semantic analysis for improving fMDS?

This work’s System 3100 outperformed the gold standard baseline System 0 by over 10 places. This performance improvement is mainly attributed to the semantic analysis technique of filtering the sentences by clustering their semantic triples. The semantic triples represent the most basic "meaning" of the sentences during the filtering process.

However, a small drop in performance of two places was observed when attempting to focus the semantic triples themselves. This is possibly due to the absence of the focus terms within the main propositions of the sentences; they may instead appear somewhere else within the sentence. Using the query feature helped mitigate this effect for the semantic triples clustering in System 3100; hence, the better performance without the query overlap determination.

Page 57

57

Conclusion

3. What is the effect on overall system performance of measuring the conceptual connectivity of sentence triples for improving fMDS?

Unfortunately, the technique used for sentence intra- and inter-connectivity did not perform well enough against the gold standard baseline system. This approach was able to capture slightly more vocabulary, as indicated by its slightly higher ROUGE-1 score, but the other scores were slightly lower compared to the performance of the gold standard baseline system.

Important note: within all three semantic analysis approaches, no word sense disambiguation (WSD) was performed. Even for terms within semantic roles across multiple triples that can be clustered together, their actual "meaning" may be different. It would be worthwhile to add WSD utilizing the words that appear around each discovered role. This may help improve the accuracy of the systems implemented in this research.

Page 58

Contributions:

› Provides an improvement over the gold standard baseline by more than ten positions.

› Proposes “semantic triples clustering” along with “semantic class scoring” and “semantic cue scoring” as methods to improve extractive fMDS.

› Provides a comparison of the semantic analysis techniques on performance for fMDS that can be used later for new abstractive summarization.

› Created a light-weight, portable fMDS system with no training.

58

Conclusion

Page 59

Significance:

› Improved over the gold standard baseline: the research provides a more comparable semantic analysis against fundamental techniques and a gold standard baseline.

› More domain-independent improvement due to no need for training: the system can be used as a new baseline and can be tested easily on other corpora.

› Simpler and less expensive than extensively corpus-trained 'a priori' systems: other veteran methods are too expensive and time-consuming to reproduce. This research does not rely on extensive corpus training or on building tailored, domain-dependent resources, and does not carry the associated cost in time.

› Compressing the sentences into a basic form of meaning takes a step in the direction of an abstractive technique: the semantic triples used for this extractive fMDS can be modified to take a step into the relatively unexplored area of abstractive fMDS. Humans tend to extract whole sentences from documents to create a summary; however, they also shorten, move, and/or fuse information depending upon importance and length.

59

Conclusion

Page 60

Direct semantic triplet summaries

Weight dampening

Advanced semantic class analysis

60

Further Work

Conclusion

Page 61

Publications

Submitted:

Israel, Quinsulon L., Hyoil Han, and Il-Yeol Song. Semantic analysis for focused multi-document summarization of text. Submitted to ACM Symposium on Applied Computing (SAC) 2015.

61

Published:

Israel, Quinsulon L., Hyoil Han, and Il-Yeol Song (2010). Focused multi-document summarization: human summarization activity vs. automated systems techniques. Journal of Computing Sciences in Colleges 25(5): 10-20.

Page 62

Publication Plan

Journals

Fall 2014 Submit to Journal of Intelligent Information Systems:

› Covers the integration of artificial intelligence and database technologies to create next generation Intelligent Information Systems

› http://www.springer.com/computer/database+management+%26+information+retrieval/journal/10844

Submit to Information Processing & Management:

› Covers experimental and advanced processes related to information retrieval (IR) and a variety of information systems, networks, and contexts, as well as their implementations and related evaluation.

› http://www.journals.elsevier.com/information-processing-and-management

62

Page 63

References

• Conroy, J. M., J. D. Schlesinger, et al. (2006). Topic-focused multi-document summarization using an approximate oracle score. Proceedings of the COLING/ACL Main Conference Poster Sessions. Sydney, Australia, Association for Computational Linguistics: 152-159.

• Dang, H. T. (2006). Overview of DUC 2006. Document Understanding Conference.

• Edmundson, H. P. (1969). New methods in automatic extracting. Journal of the ACM 16(2): 264-285.

• Harabagiu, S. and F. Lacatusu (2010). Using topic themes for multi-document summarization. ACM Transactions on Information Systems 28(3): 1-47.

• Ouyang, Y., S. Li, et al. (2007). Developing learning strategies for topic-based summarization. Proceedings of the Sixteenth ACM Conference on Information and Knowledge Management. Lisbon, Portugal, ACM: 79-86.

• Nenkova, A. and K. McKeown (2003). References to named entities: a corpus study. Proceedings of HLT-NAACL 2003, Companion Volume (Short Papers), Volume 2. Edmonton, Canada, Association for Computational Linguistics: 70-72.

• Yih, W.-T., J. Goodman, et al. (2007). Multi-document summarization by maximizing informative content-words. International Joint Conference on Artificial Intelligence, Hyderabad, India.

• Wan, X. and J. Yang (2008). Multi-document summarization using cluster-based link analysis. Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. Singapore, ACM: 299-306.

• Wang, D., T. Li, et al. (2008). Multi-document summarization via sentence-level semantic analysis and symmetric matrix factorization. Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. Singapore, ACM: 307-314.

• Wei, F., W. Li, et al. (2008). Query-sensitive mutual reinforcement chain and its application in query-oriented multi-document summarization. Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. Singapore, ACM: 283-290.

63