Interpreting noun compounds using paraphrases András Dobó University of Oxford Stephen G. Pulman University of Oxford

Page 1: Interpreting noun compounds using paraphrases

Interpreting noun compounds using paraphrases

András Dobó, University of Oxford

Stephen G. Pulman, University of Oxford

Page 2: Interpreting noun compounds using paraphrases

Interpreting noun compounds using paraphrases

1. Motivation

2. Related work

3. Method

4. Results

5. Summary

6. Future work

Page 3: Interpreting noun compounds using paraphrases

Motivation

English is full of noun compounds, which are sequences of nouns acting as a single noun

Their interpretation is crucial for many NLP tasks

Using dictionaries is infeasible

Automated methods are needed

Page 4: Interpreting noun compounds using paraphrases

Related work

Statistical approaches
  Web queries or large corpora
  Two main categories of methods

Inventory-based approaches
  Small number of abstract relational categories
  Criticized for numerous reasons

Paraphrasing approaches
  Verbs and prepositions as paraphrases
  water bottle = bottle that is for water (paraphrase: be for)

Page 5: Interpreting noun compounds using paraphrases

Method

Paraphrasing method
  Ranked list of paraphrases for each noun compound (NC)
  Uses large corpora to search for paraphrases

Second noun is the head
  subject = second noun, object = first noun

Validates paraphrases using web queries

Two main approaches in the search for paraphrases

Page 6: Interpreting noun compounds using paraphrases

Subject-paraphrase-object-triples

Counts the frequency of all (subject, paraphrase, object) triples in the corpus

Then for each NC it searches for those triples, where subject = second noun, object = first noun

List of suitable paraphrases for each NC

Ranks paraphrases for each NC using a scoring method based on their frequency
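
The triple-based search above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function name, the toy triples, and the alphabetical tie-breaking are assumptions made for the example.

```python
from collections import Counter


def rank_paraphrases_by_triples(triples, noun_compound):
    """Rank paraphrases for one noun compound by raw triple frequency.

    triples: iterable of (subject, paraphrase, object) tuples from a corpus.
    noun_compound: (first_noun, second_noun); the second noun is the head.
    """
    first_noun, second_noun = noun_compound
    counts = Counter(triples)
    # Keep only triples where subject = second noun and object = first noun.
    matches = {
        paraphrase: freq
        for (subj, paraphrase, obj), freq in counts.items()
        if subj == second_noun and obj == first_noun
    }
    # Highest frequency first; ties broken alphabetically (an assumption).
    return sorted(matches.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical toy corpus, for illustration only.
toy_triples = [
    ("bottle", "be for", "water"),
    ("bottle", "be for", "water"),
    ("bottle", "contain", "water"),
    ("museum", "be devoted to", "art"),
]
ranking = rank_paraphrases_by_triples(toy_triples, ("water", "bottle"))
```

Here `ranking` lists each matching paraphrase with its frequency, most frequent first.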

Page 7: Interpreting noun compounds using paraphrases

Subject-paraphrase-and-paraphrase-object-pairs

Counts the frequency of all (subject, paraphrase) and (paraphrase, object) pairs in the corpus

Then for each NC it searches for those pairs, where subject = second noun, object = first noun

Two lists of paraphrases for each NC

Ranks paraphrases for each NC using a scoring method based on their frequency

Page 8: Interpreting noun compounds using paraphrases

Scoring methods

Subject-paraphrase-object-triples version:
  Simply the frequency of the relevant (subject, paraphrase, object) triple

Subject-paraphrase-and-paraphrase-object-pairs version:
  Using frequencies is not suitable
  The product of the pointwise mutual information of the relevant (subject, paraphrase) and (paraphrase, object) pairs
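
A possible reading of the pairs-based score is sketched below. The slides do not give the exact estimator, so the maximum-likelihood probabilities, the base-2 logarithm, and all function names here are assumptions.

```python
import math
from collections import Counter


def pmi_table(pairs):
    """Return {(x, y): PMI(x, y)} estimated from a list of observed (x, y) pairs."""
    pair_counts = Counter(pairs)
    left_counts = Counter(x for x, _ in pairs)
    right_counts = Counter(y for _, y in pairs)
    total = len(pairs)
    # PMI = log2( p(x, y) / (p(x) * p(y)) ), with maximum-likelihood estimates.
    return {
        (x, y): math.log2((c * total) / (left_counts[x] * right_counts[y]))
        for (x, y), c in pair_counts.items()
    }


def rank_by_pmi_product(subj_para_pairs, para_obj_pairs, noun_compound):
    """Score each paraphrase by PMI(subject, paraphrase) * PMI(paraphrase, object)."""
    first_noun, second_noun = noun_compound
    sp_pmi = pmi_table(subj_para_pairs)
    po_pmi = pmi_table(para_obj_pairs)
    scores = {}
    for (subj, para), left in sp_pmi.items():
        if subj != second_noun:
            continue
        right = po_pmi.get((para, first_noun))
        if right is not None:  # paraphrase must appear in both lists
            scores[para] = left * right
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical toy pair lists, for illustration only.
subj_para = [("bottle", "contain"), ("bottle", "be for"),
             ("cup", "contain"), ("cup", "hold")]
para_obj = [("be for", "water"), ("contain", "water"),
            ("contain", "milk"), ("hold", "milk")]
pmi_ranking = rank_by_pmi_product(subj_para, para_obj, ("water", "bottle"))
```

Note that a plain product turns two negative PMI values into a positive score. The slides do not say how that case is handled, and this sketch does not address it.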

Page 9: Interpreting noun compounds using paraphrases

Used corpora and their preprocessing

Search for paraphrases:
  British National Corpus
    100 million words
    Grammatical relations from parser
  Web 1T 5-gram Corpus
    Generated from 1 trillion words of web page text
    Grammatical relations from POS patterns
      Noun verb determiner noun

Validation of paraphrases:
  The Web through Google and Yahoo!
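
As an illustration of the "noun verb determiner noun" POS pattern, the sketch below pulls (subject, verb, object) triples out of an n-gram that has already been POS-tagged. The Penn Treebank-style tags, the prefix matching, and the function name are assumptions. The slides do not say which tagger or tag set was used.

```python
def triples_from_tagged_ngram(tagged_tokens):
    """Extract (subject, verb, object) triples matching noun-verb-determiner-noun.

    tagged_tokens: list of (word, tag) pairs, assumed to carry Penn
    Treebank-style tags (NN*, VB*, DT).
    """
    pattern = ("N", "V", "D", "N")  # tag prefixes, an assumed simplification
    triples = []
    for start in range(len(tagged_tokens) - len(pattern) + 1):
        window = tagged_tokens[start:start + len(pattern)]
        if all(tag.startswith(prefix)
               for (_, tag), prefix in zip(window, pattern)):
            subject, verb, _, obj = (word for word, _ in window)
            triples.append((subject, verb, obj))
    return triples


# Hypothetical tagged 5-gram, for illustration only.
tagged = [("bottles", "NNS"), ("contain", "VBP"), ("the", "DT"),
          ("water", "NN"), (".", ".")]
extracted = triples_from_tagged_ngram(tagged)
```

A real pipeline would also lemmatize the words before counting. That step is left out here.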

Page 10: Interpreting noun compounds using paraphrases

Passive paraphrases

Their surface subject is actually their object

(subject, paraphrase) = (paraphrase2, object)
  paraphrase: passive, without preposition
  paraphrase2: active version of paraphrase
  subject = object
  Their frequencies are counted together

Page 11: Interpreting noun compounds using paraphrases

Passive paraphrases

(subject, paraphrase, object) = (subject2, paraphrase2, object2)
  paraphrase: passive, with by preposition
  paraphrase2: active version of paraphrase, without preposition
  object2 = subject
  subject2 = object
  Their frequencies are counted together

Such (paraphrase, object) and (subject2, paraphrase2) pairs are treated the same way
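
The by-passive case can be sketched as a normalization step applied before counting. The `active_form` lookup table and the "be <participle> by" string shape are hypothetical stand-ins for whatever lemmatizer and representation the actual system used.

```python
from collections import Counter


def normalize_passive(subject, paraphrase, obj, active_form):
    """Map a passive 'be <participle> by' triple onto its active equivalent.

    active_form: hypothetical dict from past participle to base verb.
    """
    words = paraphrase.split()
    if len(words) == 3 and words[0] == "be" and words[2] == "by":
        participle = words[1]
        if participle in active_form:
            # The surface subject becomes the object, and vice versa.
            return (obj, active_form[participle], subject)
    return (subject, paraphrase, obj)


# Hypothetical data, for illustration only.
active_form = {"written": "write"}
raw_triples = [
    ("author", "write", "book"),
    ("book", "be written by", "author"),
]
merged_counts = Counter(
    normalize_passive(s, p, o, active_form) for s, p, o in raw_triples
)
```

After normalization, the active and passive occurrences share one entry, so their frequencies are counted together.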

Page 12: Interpreting noun compounds using paraphrases

Patientive ambitransitive verbs

Three main groups of verbs: strictly transitive, strictly intransitive, ambitransitive

Strictly intransitive verbs have two subclasses: unergative and unaccusative

Ambitransitive verbs have two subclasses too: agentive and patientive

Patientive ambitransitive verbs in intransitive use behave in the same way as passive verbs, so they are treated the same way

Page 13: Interpreting noun compounds using paraphrases

Using synonyms, hypernyms, sister words etc.

No paraphrases are found for several NCs

Hypothesis: NCs comprising semantically similar words are interpreted the same way

Using semantically similar words in the search for paraphrases
  Synonyms, hypernyms, sister words from WordNet
  Semantically similar words that are automatically found with a method proposed by Dekang Lin
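
One way to picture the substitution idea is to let each noun of the NC also match its substitute words when looking up triples. The sketch below pools the frequencies without any weighting, which is an assumption. The `similar_words` dictionary is a hypothetical stand-in for WordNet lookups or Lin-style similarity lists.

```python
from collections import Counter


def paraphrases_with_substitutes(noun_compound, triple_counts, similar_words):
    """Collect paraphrase frequencies, letting substitute words stand in for each noun.

    triple_counts: dict mapping (subject, paraphrase, object) to frequency.
    similar_words: hypothetical dict mapping a noun to a list of substitutes.
    """
    first_noun, second_noun = noun_compound
    heads = {second_noun, *similar_words.get(second_noun, [])}
    modifiers = {first_noun, *similar_words.get(first_noun, [])}
    totals = Counter()
    for (subj, para, obj), freq in triple_counts.items():
        if subj in heads and obj in modifiers:
            totals[para] += freq  # unweighted pooling, an assumption
    return totals


# Hypothetical data, for illustration only.
triple_counts = {
    ("museum", "be devoted to", "art"): 3,
    ("gallery", "display", "art"): 2,
    ("bottle", "be for", "water"): 5,
}
similar_words = {"museum": ["gallery"]}
pooled = paraphrases_with_substitutes(("art", "museum"), triple_counts, similar_words)
```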

Page 14: Interpreting noun compounds using paraphrases

Validation of paraphrases

Some paraphrases are incorrect
  Validation is needed

Hypothesis: If a paraphrase is suitable for an NC, then there should exist at least some web pages containing the NC paraphrased by that paraphrase

Page 15: Interpreting noun compounds using paraphrases

Validation of paraphrases

Google and Yahoo! queries
  Simple queries: “n2Infl THAT p n1Infl”
  Extended queries:
    Multiple verb tenses
    Wildcard characters (up to 9)

Score for each paraphrase is recalculated
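
The query template can be illustrated by a small generator of quoted phrase queries. This sketch only builds strings and calls no search API. The position of the wildcards (between the paraphrase and the first noun), the `*` wildcard token, and the caller-supplied inflected forms are all assumptions.

```python
from itertools import product


def build_queries(n1_forms, n2_forms, paraphrase_forms, max_wildcards=0):
    """Build quoted phrase queries of the shape 'n2 that p [* ...] n1'.

    n1_forms, n2_forms: inflected forms of the first and second noun.
    paraphrase_forms: the paraphrase in the verb tenses to try.
    max_wildcards: largest number of '*' tokens to insert (assumed placement).
    """
    queries = []
    for n2, p, n1 in product(n2_forms, paraphrase_forms, n1_forms):
        for k in range(max_wildcards + 1):
            parts = [n2, "that", p] + ["*"] * k + [n1]
            queries.append('"' + " ".join(parts) + '"')
    return queries


# Hypothetical inputs, for illustration only.
queries = build_queries(["water"], ["bottles"], ["are for"], max_wildcards=1)
```

The hit counts returned for such queries would then feed into the recalculated score. How they are combined is not specified on the slide.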

Page 16: Interpreting noun compounds using paraphrases

Testing and evaluation

Tested on the first 50 NCs of the SemEval-2 Task #9
  3 best paraphrases for each NC

5 native speakers recruited for evaluation
  They score each paraphrase from 1 to 5
  Their agreement was checked using Krippendorff’s alpha, and it was too low

The (noun compound, paraphrase) pairs with highest disagreement were omitted
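
For reference, Krippendorff's alpha can be computed as one minus the ratio of observed to expected disagreement. The sketch below uses the interval (squared-difference) metric, which is an assumption, since the slides do not say which metric was applied to the 1 to 5 scores.

```python
def krippendorff_alpha_interval(ratings_per_unit):
    """Krippendorff's alpha with the interval metric.

    ratings_per_unit: list of lists; each inner list holds the scores that
    one (noun compound, paraphrase) pair received from the raters.
    """
    # Units with fewer than two ratings carry no agreement information.
    units = [u for u in ratings_per_unit if len(u) >= 2]
    pooled = [x for u in units for x in u]
    n = len(pooled)
    # Observed disagreement: squared differences within each unit.
    observed = sum(
        sum((a - b) ** 2 for a in u for b in u) / (len(u) - 1)
        for u in units
    ) / n
    # Expected disagreement: squared differences across all pooled ratings.
    expected = sum((a - b) ** 2 for a in pooled for b in pooled) / (n * (n - 1))
    if expected == 0:
        raise ValueError("alpha is undefined when all ratings are identical")
    return 1.0 - observed / expected


perfect = krippendorff_alpha_interval([[1, 1], [2, 2], [3, 3]])
opposed = krippendorff_alpha_interval([[1, 2], [2, 1]])
```

Values near 1 indicate strong agreement, while values at or below 0 indicate agreement no better than chance.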

Page 17: Interpreting noun compounds using paraphrases

Best version

Subject-paraphrase-object-triples version

Web 1T 5-gram Corpus

Combination of two basic versions:
  No substitute words
  Sister words
  Scores are recalculated in a way that favors paraphrases returned by the first version

Validation: Google, present simple, up to 1 wildcard

Page 18: Interpreting noun compounds using paraphrases

Results

Mixed performance

  Noun compound     1st rank   2nd rank        3rd rank
  arts museum       be of      be devoted to   be for
  bird droppings    be in      be for          be

Average scores

  Rank of paraphrase   Average score
  1st rank             3.1842
  2nd rank             2.7687
  3rd rank             2.5583

Promising results given the difficulty of the task

Page 19: Interpreting noun compounds using paraphrases

Results

Best scoring NCs

  Noun compound             Avg. score
  broadway youngster        4.7500
  cell membrane             4.6000
  cattle population         4.4000
  arts museum               4.3333
  business sector           4.2000
  arts colleges             4.0000
  backwoods protagonist     3.8750
  antibiotic regimen        3.8667
  census population         3.8667
  business applications     3.7000

Worst scoring NCs

  Noun compound             Avg. score
  championship bout         2.0000
  buddhist philosophy       1.8000
  cell block                1.7500
  banana industry           1.7333
  ancestor spirits          1.6000
  anode loss                1.5000
  bird droppings            1.2667
  bow scrape                1.2500
  activity spectrum         1.0000
  altitude reconnaissance   1.0000

Page 20: Interpreting noun compounds using paraphrases

Future work

Parsing the Web 1T 5-gram Corpus
  Much lower error rate in obtaining the grammatical relations

Extended validation part
  Employing synonyms, hypernyms, sister words or semantically similar words

Combining the different extensions

Page 21: Interpreting noun compounds using paraphrases

Summary

Interpreting noun compounds is crucial for many NLP tasks

We presented a method for noun compound interpretation that searches for paraphrases in large corpora and issues web queries to validate the results

The results are promising, and could be further improved

Page 22: Interpreting noun compounds using paraphrases

Acknowledgements

Attendance at this workshop was partly supported by the Hungarian National Office for Research and Technology within the framework of the R&D project MASZEKER (Modell-Alapú Szemantikus Kereső Rendszer, Model Based Semantic Search System).

Page 23: Interpreting noun compounds using paraphrases

Thank you!