Learning Taxonomic Relations from Heterogeneous Evidence Philipp Cimiano Aleksander Pivk Lars Schmidt-Thieme Steffen Staab (ECAI 2004)


Page 1: Learning Taxonomic Relations from Heterogeneous Evidence

Learning Taxonomic Relations from Heterogeneous Evidence

Philipp Cimiano

Aleksander Pivk

Lars Schmidt-Thieme

Steffen Staab (ECAI 2004)

Page 2: Learning Taxonomic Relations from Heterogeneous Evidence

Purpose
To examine the possibility of learning taxonomic relations by considering various sources of evidence.

Main aims:
1. To gain insight into the behavior of different approaches to learning taxonomic relations
2. To provide a first step towards combining these different approaches
3. To establish a baseline model for further research

Page 3: Learning Taxonomic Relations from Heterogeneous Evidence

Introduction
Taxonomies or conceptual hierarchies are useful in many NLP applications.
However, developing suitable ontologies is time-consuming, so ontological knowledge needs to be acquired automatically.
The approach proposed in this paper learns taxonomic relations (is-a relations) by considering four different sources of evidence:
1. Hearst-patterns matched in a large corpus
2. Hearst-patterns matched in the WWW
3. WordNet
4. The ‘vertical relations’-heuristic

Page 4: Learning Taxonomic Relations from Heterogeneous Evidence

Introduction
Goal: learning is-a relations in the tourism domain

Training corpus:
Domain-specific: http://www.lonelyplanet.com, http://www.all-inall.de
General: British National Corpus

Ontology for evaluation:
A tourism reference ontology modeled by an ontology engineer.
A few abstract concepts are removed.
272 concepts, 225 direct is-a relations, and 636 non-direct is-a relations

Page 5: Learning Taxonomic Relations from Heterogeneous Evidence

Hearst Patterns
Lexico-syntactic patterns proposed by Hearst (1992):

N such as N1, N2, …
such N as N1, N2, …
N1, N2, … and other N
N, (especially | including) N1, N2, …

From each match of these patterns, is-a(Ni, N) can be derived.
The number of Hearst-pattern matches for each pair of terms is recorded and normalized to the range [0, 1].
Different thresholds on this value are tried in the experiments.
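The pattern matching and counting described on this slide could be sketched roughly as follows. This is only an illustration under several assumptions that are not in the slides: noun phrases are reduced to single lowercase words, the function names and the `SAMPLE` text are hypothetical, and the normalization (dividing each pair's count by the total number of matches in which the same term appears as hyponym) is one plausible way to obtain a value in [0, 1], not necessarily the one the authors used.

```python
import re
from collections import Counter

# Assumption: a noun phrase is a single lowercase word. A real system
# would use a chunker or POS tagger instead.
NP = r"[a-z]+"
NP_LIST = rf"{NP}(?:, {NP})*(?:,? (?:and|or) {NP})?"

# The four Hearst patterns from the slide, with named groups for the
# hypernym N ("hyper") and the hyponym list N1, N2, ... ("hypos").
PATTERNS = [
    re.compile(rf"\b(?P<hyper>{NP}) such as (?P<hypos>{NP_LIST})"),
    re.compile(rf"\bsuch (?P<hyper>{NP}) as (?P<hypos>{NP_LIST})"),
    re.compile(rf"\b(?P<hypos>{NP}(?:, {NP})*),? and other (?P<hyper>{NP})"),
    re.compile(rf"\b(?P<hyper>{NP}),? (?:especially|including) (?P<hypos>{NP_LIST})"),
]
SPLIT = re.compile(r",? (?:and|or) |, ")


def count_hearst_matches(text):
    """Count how often each (hyponym, hypernym) pair is matched."""
    counts = Counter()
    lowered = text.lower()
    for pattern in PATTERNS:
        for match in pattern.finditer(lowered):
            for hypo in SPLIT.split(match.group("hypos")):
                counts[(hypo, match.group("hyper"))] += 1
    return counts


def normalize(counts):
    """Assumed normalization: count(t1, t2) / total matches with t1 as hyponym."""
    totals = Counter()
    for (hypo, _), n in counts.items():
        totals[hypo] += n
    return {pair: n / totals[pair[0]] for pair, n in counts.items()}


def above_threshold(scores, threshold):
    """Keep only the pairs whose normalized score reaches the threshold."""
    return {pair for pair, score in scores.items() if score >= threshold}


# Hypothetical sample text, for illustration only.
SAMPLE = (
    "Cities such as Paris, Rome and Berlin. Such cities as Paris. "
    "Places such as Paris. Hostels, motels and other accommodations. "
    "Accommodations, including hostels."
)
```

Calling `above_threshold(normalize(count_hearst_matches(SAMPLE)), 0.5)` keeps is-a(paris, cities) but drops is-a(paris, places), which is the effect of raising the threshold in this toy setting.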

Page 6: Learning Taxonomic Relations from Heterogeneous Evidence

Hearst Patterns

Page 7: Learning Taxonomic Relations from Heterogeneous Evidence

WordNet
WordNet is not an “unstructured” source of evidence; however, it is general and domain-independent.
One term may have several senses, and there may be more than one hypernym path between two terms.
Two different strategies are used:

1. Normalizing all hypernym paths between two terms:

$$\max\left(\frac{|\mathrm{paths}(\mathrm{sense}(t_1), \mathrm{sense}(t_2))|}{|\mathrm{sense}(t_1)|},\ 1\right)$$

2. Considering only the most frequent sense of t1

Page 8: Learning Taxonomic Relations from Heterogeneous Evidence

WordNet

Page 9: Learning Taxonomic Relations from Heterogeneous Evidence

WordNet

Page 10: Learning Taxonomic Relations from Heterogeneous Evidence

‘Vertical Relations’-Heuristic
Given two terms t1 and t2, if t2 matches t1 and t1 is additionally modified by certain terms or adjectives, the relation is-a(t1, t2) is derived.
Example: is-a_HEURISTIC(international conference, conference)
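A minimal sketch of this heuristic, assuming that "t2 matches t1" means t2 is the trailing word sequence of t1 (modifiers precede the head, as in English) and that any preceding words count as modifiers. The function name `vertical_isa` is hypothetical, and the slide does not say which modifiers the authors actually accept.

```python
def vertical_isa(t1: str, t2: str) -> bool:
    """Return True if is-a(t1, t2) follows from the vertical-relations heuristic.

    Assumption: t2 must be a proper suffix of t1 at the word level, so that
    t1 is t2 plus at least one preceding modifier.
    """
    words1 = t1.lower().split()
    words2 = t2.lower().split()
    if not words2 or len(words1) <= len(words2):
        return False
    return words1[-len(words2):] == words2
```

With this definition, `vertical_isa("international conference", "conference")` holds, while `vertical_isa("conference hotel", "conference")` does not, because "conference" is the modifier there rather than the head.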

Page 11: Learning Taxonomic Relations from Heterogeneous Evidence

World Wide Web
The Google API (http://www.google.com/apis/) is used to count the matches of certain Hearst-patterns in the Web.
For a given pair (t1, t2), the number of Google hits is summed over all patterns and normalized by dividing by the number of hits returned for t1.
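The normalization described above amounts to a single division. The sketch below assumes the hit counts have already been retrieved; it does not call any web API, and the function name, the query strings, and the numbers in the usage note are made up for illustration.

```python
from typing import Mapping


def web_isa_score(pattern_hits: Mapping[str, int], t1_hits: int) -> float:
    """Sum the hits over all pattern queries for (t1, t2) and divide by hits(t1).

    pattern_hits maps each instantiated Hearst-pattern query to its hit count.
    Returning 0.0 when t1 has no hits is an assumption; the slide does not
    cover that case.
    """
    if t1_hits == 0:
        return 0.0
    return sum(pattern_hits.values()) / t1_hits
```

For instance, with hypothetical counts `{"conferences such as workshop": 30, "workshops and other conferences": 20}` and 1000 hits for t1 alone, the score is 50 / 1000 = 0.05.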

Page 12: Learning Taxonomic Relations from Heterogeneous Evidence

World Wide Web

Page 13: Learning Taxonomic Relations from Heterogeneous Evidence

Combining Evidences

Page 14: Learning Taxonomic Relations from Heterogeneous Evidence

Conclusion and Further Work
A simple combination strategy improves the results.
It remains for further work to find out whether other sources of evidence can be integrated into this approach.
It could be useful to consider only domain-specific text collections instead of a general corpus such as the BNC, and to consider only pages in the World Wide Web that are related to the domain.
Determining the optimal strategy for combining the different approaches remains a challenge.
To apply machine learning techniques for this purpose, it is necessary to cope with the high number of negative examples.