Page 1: The role of linguistic information for shallow language processing

The Role of Linguistic Information for Shallow Language Processing

Constantin Orasan
Research Group in Computational Linguistics
University of Wolverhampton
http://www.wlv.ac.uk/~in6093/

Page 2: The role of linguistic information for shallow language processing

KEPT2007 - 6th June 2007

We need to be able to process language automatically:

To have better access to information

To interact better with computers

To have texts translated from one language to another

… so why not replicate the way humans process language?

Page 3: The role of linguistic information for shallow language processing

Process language in a similar manner to humans

… “natural language systems must not simply understand the shallow surface meaning of language, but must also be able to understand the deeper implications and inferences that a user is likely to intend and is likely to take from language” (Waltz, 1982)

Also referred to as deep processing

Page 4: The role of linguistic information for shallow language processing

Deep vs. shallow linguistic processing

Deep processing: tries to build an elaborated representation of the document in order to “understand” and make inferences

Shallow processing: extracts bits of information which could be useful for the task (e.g. shallow surface meaning), but no attempt is made to understand the document

Page 5: The role of linguistic information for shallow language processing

Purpose of this talk

To show that deep processing has limited applicability

To show that it is possible to improve the performance of shallow methods by adding linguistic information

Text summarisation is taken as example

Page 6: The role of linguistic information for shallow language processing

Structure

1. Introduction
2. FRUMP
3. Shallow processing for automatic summarisation
4. Evaluation
5. Conclusions

Page 7: The role of linguistic information for shallow language processing

Automatic summarisation

Attempts to produce summaries using automatic means

Produces extracts: extract and rearrange

Uses units from the source as such

Produces abstracts: understand and generate

Rewords the information in the source

Page 8: The role of linguistic information for shallow language processing

Automatic abstraction

Many methods try to replicate the way humans produce summaries

Very popular in the 1980s because it fit the overall AI trend

The abstracts are quite good in terms of coherence and cohesion

Tend to keep the information in some intermediate format

Page 9: The role of linguistic information for shallow language processing

FRUMP

The most famous automatic abstracting system

Attempts to understand parts of the document

Uses 50 sketchy scripts

Discards information which is not relevant to the script

Words from the source are used to select the relevant script

Page 10: The role of linguistic information for shallow language processing

Example of script

The ARREST script:
1. The police go to where the suspect is
2. There is optional fighting between the suspect and the police
3. The suspect is apprehended
4. The suspect is taken to a police station
5. The suspect is charged
6. The suspect is incarcerated or released on bond

Page 11: The role of linguistic information for shallow language processing

System organisation

Relies on:

a PREDICTOR which takes the current context and predicts next events

a SUBSTANTIATOR which verifies and fleshes out the predictions

If the PREDICTOR is wrong, it backtracks

The SUBSTANTIATOR relies on textual information and inferences

Page 12: The role of linguistic information for shallow language processing

The output

Example of summary: A bomb explosion in a Philippines Airlines jet has killed the person who planted the bomb and injured 3 people.

The output can be in several languages

It is very coherent and brief

Page 13: The role of linguistic information for shallow language processing

Limitations

It works very well when it can understand the text, but …

Language is ambiguous so it is common to misunderstand a text (e.g. “Carter and Sadat embraced under a cherry tree in the White House garden, a symbolic gesture belying the differences between the two governments” MEETING script)

Page 14: The role of linguistic information for shallow language processing

Limitations (II)

It can handle only scripts which are predefined

It can deal only with information which is encoded in the scripts

It can make inferences only about concepts it knows

… it is domain dependent and cannot be easily adapted to other domains

Page 15: The role of linguistic information for shallow language processing

Limitations (III)

Sometimes it can misunderstand some scripts with funny results:

Vatican City. The death of the Pope shakes the world. He passed away …

Summary:

Earthquake in the Vatican. One dead.

Page 16: The role of linguistic information for shallow language processing

… “natural language systems must not simply understand the shallow surface meaning of language, but must also be able to understand the deeper implications and inferences that a user is likely to intend and is likely to take from language” (Waltz, 1982)

“…there seems to be no prospect for anything other than narrow-domain natural-language systems for the foreseeable future” (Waltz, 1982)

Page 17: The role of linguistic information for shallow language processing

Automatic extraction

Uses various shallow methods to determine which sentences are important

It is fairly domain independent

Extracts units (e.g. sentences, paragraphs) and usually presents them in the order they appear

The extracts are not very coherent, but they can give the gist of the text

Page 18: The role of linguistic information for shallow language processing

Purpose of this research

Show how different types of linguistic information can be used to improve the quality of automatic summaries

Build automatic summarisers which rely on an increasing number of modules

Combine this information

Assess each of the summarisers

Page 19: The role of linguistic information for shallow language processing

Setting of this research

A corpus of 65 scientific articles from JAIR was used

Over 600,000 words in total

They were in electronic format

They contain author-produced summaries

2%, 3%, 5%, 6% and 10% summaries are produced

Page 20: The role of linguistic information for shallow language processing

Evaluation metric

Cosine similarity between the automatic extract and the human produced abstract

It would be very interesting to repeat the experiments using alternative evaluation metrics, e.g. ROUGE
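
A minimal sketch of the cosine similarity measure, assuming plain word-count vectors over lower-cased tokens. The slides do not say how the texts were tokenised or weighted, so `bag_of_words` and `cosine_similarity` are illustrative names and choices, not the ones used in the experiments.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lower-case the text and count its word tokens (assumed tokenisation)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(text_a, text_b):
    """Cosine of the angle between the word-count vectors of two texts."""
    a, b = bag_of_words(text_a), bag_of_words(text_b)
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is treated as sharing nothing
    return dot / (norm_a * norm_b)
```

A score of 1 means the two texts use the same words in the same proportions; a score of 0 means they share no words.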

Page 21: The role of linguistic information for shallow language processing

Extracts vs. abstracts

Human abstract

The main operations in Inductive Logic Programming (ILP) are generalization and specialization, which only make sense in a generality order.

Extract

S16 Inductive Logic Programming (ILP) is a subfield of Logic Programming and Machine Learning that tries to induce clausal theories from given sets of positive and negative examples.

S24 The two main operations in ILP for modification of a theory are generalization and specialization.

S26 These operations only make sense within a generality order.

Page 23: The role of linguistic information for shallow language processing

Extracts vs. abstracts (II)

It is not possible to obtain 100% match between extracts and abstracts

There is an upper limit somewhere for extracts

This upper limit is represented by the set of sentences which maximise the similarity with human abstracts

Page 24: The role of linguistic information for shallow language processing

Determining the upper limit

Find the set of sentences which maximises the similarity with the human abstract

Two approaches:

Greedy algorithm

A genetic algorithm

More details in Orasan (2005)
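
The greedy approach can be sketched roughly as follows. This is an illustration under stated assumptions, not the algorithm from Orasan (2005): the length limit is expressed as a number of sentences (`max_sentences`) rather than a percentage of the source, similarity is cosine over raw word counts, and all function names are hypothetical.

```python
import math
import re
from collections import Counter


def _vector(text):
    """Word-count vector of a text (assumed tokenisation)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def _cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0


def greedy_upper_limit(sentences, abstract, max_sentences):
    """Repeatedly add the sentence that most increases similarity with the abstract."""
    target = _vector(abstract)
    chosen, current, best = [], Counter(), 0.0
    while len(chosen) < max_sentences:
        candidate, candidate_score = None, best
        for index, sentence in enumerate(sentences):
            if index in chosen:
                continue
            score = _cosine(current + _vector(sentence), target)
            if score > candidate_score:
                candidate, candidate_score = index, score
        if candidate is None:  # no remaining sentence improves the extract
            break
        chosen.append(candidate)
        current += _vector(sentences[candidate])
        best = candidate_score
    return sorted(chosen), best
```

A greedy search can stop at a local maximum, which is one reason to try a genetic algorithm as well.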

Page 25: The role of linguistic information for shallow language processing

The upper limit

[Chart: scores from 0.2 to 0.9 for 2%, 3%, 5%, 6% and 10% summaries. Series: Upper limit]

Page 26: The role of linguistic information for shallow language processing

Baseline

A very simple method which does not employ much knowledge

The first and last sentences of each paragraph were used
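
A minimal sketch of such a baseline, assuming the document is already split into paragraphs and sentences. The slides do not say how the selection was cut to the target summary length, so that step is left out; `baseline_extract` is a hypothetical name.

```python
def baseline_extract(paragraphs):
    """Return the first and last sentence of every paragraph, in document order.

    `paragraphs` is a list of paragraphs, each a list of sentence strings.
    """
    extract = []
    for sentences in paragraphs:
        if not sentences:
            continue
        extract.append(sentences[0])
        if len(sentences) > 1:  # avoid repeating a one-sentence paragraph
            extract.append(sentences[-1])
    return extract
```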

Page 27: The role of linguistic information for shallow language processing

The upper and lower limit

[Chart: scores from 0.2 to 0.9 for 2%, 3%, 5%, 6% and 10% summaries. Series: Upper limit, Baseline]

Page 28: The role of linguistic information for shallow language processing

Term-based summarisation

One of the most popular summarisation methods

It is rarely used on its own

Assumes that the importance of a sentence can be determined on the basis of the importance of the words it contains

Various methods can be used to determine the importance of words

Page 29: The role of linguistic information for shallow language processing

Term-frequency

The importance of a word is determined by how frequent it is

Not very good for very frequent words such as articles and prepositions

A stop list can be used to filter out such words
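
A rough sketch of term-frequency scoring with a stop list. The stop list here is a tiny illustrative sample and the sentence score is simply the sum of the frequencies of its words; both are assumptions, since the slides do not give the exact formula.

```python
import re
from collections import Counter

# Tiny illustrative stop list; a real one would be much longer.
STOP_WORDS = {"a", "an", "the", "of", "in", "on", "to", "and", "is", "are", "for", "by"}


def tokenize(text):
    """Lower-cased alphabetic tokens (assumed tokenisation)."""
    return re.findall(r"[a-z]+", text.lower())


def term_frequencies(document):
    """Count every word of the document that is not on the stop list."""
    return Counter(word for word in tokenize(document) if word not in STOP_WORDS)


def score_sentences(sentences):
    """Score each sentence by summing the document frequencies of its words."""
    tf = term_frequencies(" ".join(sentences))
    return [
        sum(tf[word] for word in tokenize(sentence) if word not in STOP_WORDS)
        for sentence in sentences
    ]
```

Summing frequencies favours long sentences; dividing by sentence length is a common alternative.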

Page 30: The role of linguistic information for shallow language processing

TF*IDF

Very popular method in IR and AS

IDF = inverse document frequency

A word which is frequent across a collection of documents cannot be important for a single document, even if it is quite frequent in that document
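
A small sketch of TF*IDF weighting. It assumes one common variant, tf × log(N / df), and that the document being weighted is itself part of the collection; the slides do not state which variant was used, and `tf_idf` is a hypothetical name.

```python
import math
from collections import Counter


def tf_idf(collection, index):
    """TF*IDF weights for the words of collection[index].

    `collection` is a list of documents, each a list of tokens.
    A word occurring in every document gets weight 0.
    """
    n_docs = len(collection)
    doc_freq = Counter()
    for tokens in collection:
        doc_freq.update(set(tokens))  # count each word once per document
    tf = Counter(collection[index])
    return {word: count * math.log(n_docs / doc_freq[word]) for word, count in tf.items()}
```

In this sketch a word such as "the", which appears in every document, is weighted 0 however often it occurs in the document itself.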

Page 31: The role of linguistic information for shallow language processing

[Chart: scores from 0.2 to 0.9 for 2%, 3%, 5%, 6% and 10% summaries. Series: Upper limit, Baseline, TF, TF*IDF]

Page 32: The role of linguistic information for shallow language processing

[Chart: scores from 0.4 to 0.5 for 5%, 6% and 10% summaries. Series: Upper limit, Baseline, TF, TF*IDF]

Page 33: The role of linguistic information for shallow language processing

Indicating phrases

Indicating phrases are groups of words which can indicate the importance or “un-importance” of a sentence

They are usually meta-discourse markers

They are genre dependent

E.g. in this paper, we present, we conclude that, for example, we believe
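
A sketch of how indicating phrases might be turned into a sentence score. The phrases come from the slide, but their weights and polarity are assumptions made for illustration (here "for example" is taken to signal an unimportant sentence and the others an important one).

```python
# Assumed weights: positive marks importance, negative marks "un-importance".
INDICATING_PHRASES = {
    "in this paper": 1.0,
    "we present": 1.0,
    "we conclude that": 1.0,
    "we believe": 1.0,
    "for example": -1.0,
}


def indicating_phrase_score(sentence):
    """Sum the weights of all indicating phrases found in the sentence."""
    text = " ".join(sentence.lower().split())  # normalise case and whitespace
    return sum(weight for phrase, weight in INDICATING_PHRASES.items() if phrase in text)
```

Because the phrases are genre dependent, a list like this would need to be rebuilt for any genre other than scientific articles.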

Page 34: The role of linguistic information for shallow language processing

[Chart: scores from 0.2 to 0.6 for 2%, 3%, 5%, 6% and 10% summaries. Series: Baseline, TF, TF*IDF, IP]

Page 35: The role of linguistic information for shallow language processing

More accurate word frequencies

Words can be referred to by pronouns, which means that the concepts represented by these words do not get accurate frequency scores

A pronoun resolution algorithm was employed to determine the antecedents of pronouns and obtain more accurate frequency scores for words
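
The effect on frequency counts can be sketched as follows, assuming the pronoun resolver has already produced a mapping from token positions to antecedent words. The mapping format and the function name are hypothetical.

```python
from collections import Counter


def frequencies_with_antecedents(tokens, antecedents):
    """Count words after replacing each resolved pronoun with its antecedent.

    `tokens` is a list of words; `antecedents` maps the position of a
    resolved pronoun to the word it refers to (assumed resolver output).
    """
    resolved = [antecedents.get(position, token).lower() for position, token in enumerate(tokens)]
    return Counter(resolved)
```

For the tokens of "The parser is fast . It uses a chart ." with `{5: "parser"}`, the word "parser" is counted twice instead of once.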

Page 36: The role of linguistic information for shallow language processing

Mitkov’s Anaphora Resolution System (MARS)

Relies on a set of boosting and impeding indicators to determine the antecedent from a set of candidates:

Prefer: subject, terms, closer candidates

Penalise: indefinite NPs, far away candidates

A third of the pronouns in the corpus were annotated with anaphoric information

MARS: 51% success rate

More in Mitkov, Evans and Orasan (2002)
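
A toy illustration of indicator-based antecedent selection. It is not MARS: the `Candidate` fields, the score values and the distance thresholds are all assumptions chosen only to show how boosting and impeding indicators can be summed per candidate.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    text: str
    is_subject: bool
    is_term: bool
    is_indefinite: bool
    sentence_distance: int  # 0 means same sentence as the pronoun


def indicator_score(candidate):
    """Sum boosting (+1) and impeding (-1) indicators; values are assumed."""
    score = 0
    if candidate.is_subject:
        score += 1
    if candidate.is_term:
        score += 1
    if candidate.is_indefinite:
        score -= 1
    if candidate.sentence_distance == 0:
        score += 1  # closer candidates are preferred
    elif candidate.sentence_distance >= 2:
        score -= 1  # far away candidates are penalised
    return score


def choose_antecedent(candidates):
    """Return the highest-scoring candidate (the list must not be empty)."""
    return max(candidates, key=indicator_score)
```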

Page 37: The role of linguistic information for shallow language processing

[Chart: scores from 0.35 to 0.55 for 2%, 3%, 5%, 6% and 10% summaries. Series: TF, TF*IDF, IP, TF+MARS, TF*IDF+MARS]

Page 38: The role of linguistic information for shallow language processing

Combination of modules

Used a linear combination of the previous modules:

Term-based summariser enhanced with anaphora resolution

Indicating phrases

Positional clues

The scores assigned by each module are normalised and each module obtained a weight of 1
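
A sketch of such a linear combination. The slides say the scores are normalised but not how, so min-max scaling to the range 0 to 1 is an assumption here; the module names in the usage note are illustrative.

```python
def normalise(scores):
    """Scale scores to the range 0..1 (min-max scaling is an assumption)."""
    low, high = min(scores), max(scores)
    if high == low:
        return [0.0 for _ in scores]
    return [(score - low) / (high - low) for score in scores]


def combine(module_scores, weights=None):
    """Weighted sum of normalised per-sentence scores from several modules.

    `module_scores` maps a module name to its list of sentence scores;
    every module gets a weight of 1 unless `weights` says otherwise.
    """
    names = list(module_scores)
    if weights is None:
        weights = {name: 1.0 for name in names}
    normalised = {name: normalise(module_scores[name]) for name in names}
    n_sentences = len(normalised[names[0]])
    return [
        sum(weights[name] * normalised[name][i] for name in names)
        for i in range(n_sentences)
    ]
```

For example, `combine({"terms": [4, 3, 2], "phrases": [0, 2, 0], "position": [1, 0, 1]})` gives `[2.0, 1.5, 1.0]`, so the first sentence would be extracted first.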

Page 39: The role of linguistic information for shallow language processing

[Chart: scores from 0.35 to 0.55 for 2%, 3%, 5%, 6% and 10% summaries. Series: TF, TF*IDF, IP, TF+MARS, TF*IDF+MARS, Combination]

Page 40: The role of linguistic information for shallow language processing

Discourse information

Use a genetic algorithm to produce extracts in which:

The score assigned by the “Combined” summariser is high

Consecutive sentences feature the same entities

Loosely implements Centering Theory
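
The two criteria could be folded into a fitness function for the genetic algorithm along these lines. Only the fitness is sketched; the population, crossover and mutation steps are omitted, and the entity sets, the `continuity_weight` parameter and the way the two terms are added are assumptions.

```python
def extract_fitness(extract, combined_scores, entities, continuity_weight=1.0):
    """Fitness of a candidate extract.

    `extract` is a sorted list of sentence indices, `combined_scores` holds
    the combined summariser's score per sentence, and `entities` holds one
    set of entity names per sentence (assumed to be available).
    """
    content = sum(combined_scores[i] for i in extract)
    # Reward each pair of consecutive extract sentences that share an entity.
    continuity = sum(
        1 for previous, following in zip(extract, extract[1:])
        if entities[previous] & entities[following]
    )
    return content + continuity_weight * continuity
```

With this fitness, an extract whose sentences keep talking about the same entities can beat one with slightly higher individual sentence scores.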

Page 41: The role of linguistic information for shallow language processing

[Chart: scores from 0.25 to 0.75 for 2%, 3%, 5%, 6% and 10% summaries. Series: Upper limit, Baseline, TF, TF*IDF, IP, TF+MARS, TF*IDF+MARS, Combination, Discourse]

Page 42: The role of linguistic information for shallow language processing

Conclusions

It is possible to improve the accuracy of shallow automatic summarisers by using additional linguistic information

The linguistic information is relatively simple and easy to obtain

… but things are not always the way we expect (see Orasan 2006)

The methods are domain independent

Page 44: The role of linguistic information for shallow language processing

Thank you