
Probabilistic Text Structuring: Experiments with Sentence Ordering

Mirella Lapata, Department of Computer Science

University of Sheffield, UK

(ACL 2003)


Abstract

Ordering information is a critical task for natural language generation applications.

We describe a model that learns constraints on sentence order from a corpus of domain-specific texts and an algorithm that yields the most likely order among several alternatives.

We evaluate the automatically generated orderings against authored texts and against human subjects.

We also assess the model's appropriateness for multi-document summarization.


Introduction

Structuring a set of facts into a coherent text is a non-trivial task which has received much attention in the area of concept-to-text generation.

The problem of finding an acceptable ordering does not arise solely in concept-to-text generation but also in the emerging field of text-to-text generation.

Examples of applications are single- and multi-document summarization as well as question answering.


Introduction

Barzilay et al. (2002) address the problem of information ordering in multi-document summarization and propose two naïve algorithms:

Majority ordering: select the most frequent orders across the input documents. Issue: orders can conflict across documents, e.g., (Th1, Th2, Th3) in one document vs. (Th3, Th1) in another.

Chronological ordering: order facts according to publication date. Issue: event switching.

Based on a human study, Barzilay et al. further propose an algorithm that first identifies topically related groups of sentences (e.g., lexical chains) and then orders them according to chronological information.


Introduction

In this paper, we introduce an unsupervised probabilistic model for text structuring that learns ordering constraints.

Sentences are represented by a set of informative features that can be automatically extracted without recourse to manual annotation.

We also propose an algorithm that constructs an acceptable ordering rather than the best one.

We propose an automatic method of evaluating the generated orders by measuring their closeness to (or distance from) a gold standard.


Learning to Order
The method

The task of predicting the next sentence depends, in principle, on all of its preceding sentences.

We simplify by assuming that the probability of any given sentence is determined only by its immediately preceding sentence:
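The factorization itself appeared as an image on the slide; in the paper's notation, the chain-rule decomposition and the bigram-style simplification are roughly:

    P(T) = \prod_{i=1}^{n} P(S_i \mid S_1, \ldots, S_{i-1}) \approx \prod_{i=1}^{n} P(S_i \mid S_{i-1})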


Learning to Order

We therefore estimate P(S_i | S_{i-1}) from features that express the sentences' structure and content.

We further assume that these features are independent and that P(S_i | S_{i-1}) can be estimated from the pairs in the Cartesian product of the two sentences' feature sets:
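The product over feature pairs (shown as an image on the slide) is, per the paper, roughly:

    P(S_i \mid S_{i-1}) = \prod_{(a_{i,j},\, a_{i-1,k}) \in S_i \times S_{i-1}} P(a_{i,j} \mid a_{i-1,k})

where a_{i,j} denotes the j-th feature of sentence S_i.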


Learning to Order

The probability P(a_{i,j} | a_{i-1,k}) is estimated as:
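The estimator (an image on the slide) is a relative-frequency count over adjacent sentences in the training corpus; reconstructed from the paper, roughly:

    P(a_{i,j} \mid a_{i-1,k}) = \frac{f(a_{i,j},\, a_{i-1,k})}{\sum_{a_{i,j}} f(a_{i,j},\, a_{i-1,k})}

where f(a_{i,j}, a_{i-1,k}) counts how often feature a_{i,j} occurs in a sentence whose immediately preceding sentence contains feature a_{i-1,k}.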

To illustrate with an example: the transition between S_2 and S_3 decomposes into the feature pairs P(h|e), P(h|f), P(h|g), P(i|e), P(i|f), P(i|g); for instance, P(h|e) = f(h,e)/f(e) = 1/6 ≈ 0.17.


Learning to Order
Determining an order

The set of possible orders can be represented as a complete graph, where the set of vertices V is equal to the set of sentences S and each edge u -> v has a weight, the probability P(v|u). Finding the optimal path through this graph is an NP-complete problem.

Fortunately, Cohen et al. (1999) propose an approximate solution which can be easily modified for our task.


Learning to Order

The algorithm starts by assigning each vertex v in V a probability (the product of its feature probabilities). The greedy algorithm then picks the node with the highest probability and orders it ahead of the other nodes.

The selected node and its incident edges are deleted from the graph, and each remaining node is assigned its conditional probability given the selected node.

The node which yields the highest conditional probability is selected and ordered next.

The process is repeated until the graph is empty
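A minimal sketch of this greedy search in Python, assuming a hypothetical initial_prob(v) giving each sentence's start probability and a table pair_prob[(u, v)] holding P(v | u); neither name comes from the paper:

```python
def greedy_order(sentences, initial_prob, pair_prob):
    """Order `sentences` left to right greedily (beam of width one)."""
    remaining = set(sentences)
    # Pick the sentence with the highest start probability first.
    current = max(remaining, key=initial_prob)
    order = [current]
    remaining.remove(current)
    # Repeatedly append the sentence most likely to follow the last one;
    # removing it from `remaining` plays the role of deleting the node
    # and its incident edges from the graph.
    while remaining:
        current = max(remaining, key=lambda v: pair_prob[(order[-1], v)])
        order.append(current)
        remaining.remove(current)
    return order
```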


Learning to Order

As an example (the original slide walks through the greedy search on a small example graph).


Parameter Estimation

The model was trained on the BLLIP corpus (30M words), a collection of texts from the Wall Street Journal (1987-89).

The average story length is 19.2 sentences; 71.3% of the texts are less than 50 sentences long.


Parameter Estimation

The corpus is distributed in a Treebank-style machine-parsed version which was produced with Charniak's (2000) parser.

We also obtained a dependency-style version of the corpus using MINIPAR (Lin, 1998).

From the two different versions of the BLLIP corpus the following features were extracted: verbs, nouns, and dependencies.


Parameter Estimation
Verbs

We capture the lexical inter-dependencies between sentences by focusing on verbs and their precedence relationships in the corpus

From the Treebank parses we extracted the verbs contained in each sentence

A lemmatized version: verbs are reduced to their base forms. For example, in Figure 3(1): say, will, be, ask and approve.

A non-lemmatized version: preserves tense-related information. For example, in Figure 3(1): said, will be asked, to approve.
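A rough sketch of the two verb encodings, using NLTK as a convenient stand-in for the Treebank-style parses (the paper used the BLLIP parses produced with Charniak's parser, not this exact pipeline):

```python
from nltk.tree import Tree
from nltk.stem import WordNetLemmatizer  # requires the WordNet data package

lemmatizer = WordNetLemmatizer()

def verb_features(parse_string):
    """Return (lemmatized, non_lemmatized) verb features for one parsed sentence."""
    tree = Tree.fromstring(parse_string)
    # Treebank verb tags start with "VB" (VB, VBD, VBZ, VBN, ...); MD marks modals.
    verbs = [word for word, tag in tree.pos() if tag.startswith("VB") or tag == "MD"]
    lemmatized = [lemmatizer.lemmatize(v.lower(), pos="v") for v in verbs]
    return lemmatized, verbs

# verb_features("(S (NP (DT The) (NNS holders)) (VP (MD will) (VP (VB approve) (NP (DT the) (NN change)))))")
# -> (['will', 'approve'], ['will', 'approve'])
```

Note that this sketch keeps individual verb tokens; the slide's non-lemmatized example groups verb complexes such as "will be asked" as single units.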


Parameter Estimation
Nouns

We operationalize entity-based coherence for text-to-text generation by simply keeping track of the nouns attested in a sentence without taking the personal pronouns into account

We extracted nouns from a lemmatized version of the Treebank-style parsed corpus.

In case of noun compounds, only the compound head was taken into account

A small set of rules was used to identify organizations, person names, locations spanning more than one word

A back-off model was used to tackle unknown words. Examples in sentence (1) of Figure 3: Laidlaw Transportation Ltd., shareholder, Dec 7, meeting, change, name and Laidlaw Inc. In sentence (2): company, name, business, 1984, sale and operation.


Parameter Estimation
Dependencies

The noun and verb features do not capture the structure of the sentences to be ordered.

The dependencies were obtained from the output of MINIPAR and are represented as triples consisting of a head, a relation, and a modifier (e.g., N:mod:A).

Three dependency-based feature sets were used: verb-related dependencies (49 types), noun-related dependencies (52 types), and verbs and nouns combined (101 types), keeping only dependency types with a frequency larger than one per million.
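Putting the features together, a sketch of how the pair counts f(a, b) behind the estimates above could be collected from a corpus; featurize is a hypothetical function returning a sentence's feature set (verbs, nouns, dependencies, or any combination):

```python
from collections import Counter
from itertools import product

def collect_counts(texts, featurize):
    """Count feature co-occurrences over adjacent sentence pairs."""
    pair_counts = Counter()  # f(a, b): feature a follows feature b
    prev_counts = Counter()  # sum over a of f(a, b), the denominator
    for sentences in texts:  # each text is a list of sentences in order
        for prev, curr in zip(sentences, sentences[1:]):
            for a, b in product(featurize(curr), featurize(prev)):
                pair_counts[(a, b)] += 1
                prev_counts[b] += 1
    return pair_counts, prev_counts

def pair_prob(a, b, pair_counts, prev_counts):
    """Relative-frequency estimate of P(a | b); no smoothing in this sketch."""
    return pair_counts[(a, b)] / prev_counts[b] if prev_counts[b] else 0.0
```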


Experiments
Evaluation Metric

Kendall's τ is based on the number of inversions in the rankings and is defined below, where N is the number of objects (i.e., sentences) being ranked and the number of inversions is the number of interchanges of consecutive elements necessary to arrange them in their natural order.
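The definition itself was shown as an image; the form used in the paper is:

    \tau = 1 - \frac{2 \cdot (\text{number of inversions})}{N(N-1)/2}

τ ranges from -1 (inverse ranks) to 1 (identical ranks).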

Example: M1 and M2: 1 - 8/45 = 0.822; M1 and M3: 1 - 34/45 = 0.244.
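A toy implementation, for concreteness; it counts discordant pairs, which equals the minimum number of interchanges of adjacent elements:

```python
from itertools import combinations

def kendall_tau(reference, predicted):
    """tau = 1 - 2*inversions / (N*(N-1)/2), ranging from -1 to 1."""
    pos = {item: i for i, item in enumerate(predicted)}
    n = len(reference)
    inversions = sum(
        1
        for a, b in combinations(reference, 2)
        if pos[a] > pos[b]  # the pair is ordered differently in `predicted`
    )
    return 1 - 2 * inversions / (n * (n - 1) / 2)

# kendall_tau([1, 2, 3, 4], [1, 2, 3, 4]) == 1.0
# kendall_tau([1, 2, 3, 4], [4, 3, 2, 1]) == -1.0
```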


Experiments
Experiment 1: Ordering Newswire Texts

The model was trained on the BLLIP corpus and tested on 20 randomly selected held-out texts (average length: 15.3 sentences).

The ordered output was compared against the original authored text using τ.

ANOVA test


Experiments
Experiment 2: Human Evaluation

We compare our model's performance against human judges.

Twelve texts were randomly selected from the 20 texts in our test data and were presented to subjects with the order of their sentences scrambled.

Each participant (137 volunteers, 33 per text) saw three texts randomly chosen from the pool of 12 and was asked to reorder the sentences so as to produce a coherent text.

ANOVA test


Experiments
Experiment 3: Summarization

Barzilay et al. (2002) collected ten sets of articles, each consisting of two to three articles reporting the same event, and simulated MULTIGEN by manually selecting the sentences to be included in the final summary. Ten subjects provided orders for each summary, which had an average length of 8.8 sentences.

We simulated the participants' task by using the model to produce an order for each candidate summary and then compared the orderings generated by the model against those produced by the participants.

Note that the model was trained on the BLLIP corpus, whereas the sentences to be ordered were taken from news articles describing the same event.


Experiments
Experiment 3: Summarization (continued)

Not only were the news articles unseen, but their syntactic structure was also unfamiliar to the model.

ANOVA test


Discussion

In this paper, we proposed a data-intensive approach to text coherence where constraints on sentence ordering are learned from a corpus of domain-specific texts.

We experimented with different feature encodings and showed that lexical and syntactic information is important for the ordering task.

Our results indicate that the model can successfully generate orders for texts taken from the corpus on which it is trained.

The model also compares favorably with human performance on a single- and a multi-document ordering task.


Discussion
Future work

Our evaluation metric only measures order similarities or dissimilarities. What about coherence?

Would a trigram model perform better than the proposed model?

The greedy algorithm implements a search procedure with a beam of width one. What about using a beam of two or three?

Introducing features that express semantic similarities across documents, relying on WordNet or on automatic clustering methods.