Methods & Tools: Ontology Learning from Text. 18/5/2007. Pervasive Computing Research Group, Communication Networks Laboratory, Department of Informatics and Telecommunications, University of Athens – Greece. Polyxeni Katsiouli


Methods & Tools

Ontology Learning from Text

18/5/2007

Pervasive Computing Research Group
Communication Networks Laboratory

Department of Informatics and Telecommunications
University of Athens – Greece

Polyxeni Katsiouli


Definition of Ontology

‘A formal, explicit specification of a shared conceptualization’

• formal: must be machine understandable

• explicit: types of concepts and constraints must be clearly defined

• shared: not private to some individual, but accepted by a group

• conceptualization: an abstract model of some phenomenon in the world, formed by identifying the relevant concepts of that phenomenon


Main elements of an ontology

Hierarchy of concepts (is-a relations)

Object property (relation): links a domain concept to a range concept, e.g. wasWrittenBy

Datatype property (attribute): links a domain concept to a datatype range such as xsd:string, e.g. hasTitle


Definition of Ontology Learning

The application of a set of methods and techniques used for building an ontology from scratch

Uses distributed and heterogeneous knowledge and information sources

Allows a reduction in the time and effort needed in the ontology development process


Ontology Learning methods from…

Unstructured sources

• Involves NLP techniques, morphological and syntactic analysis, etc.

Semi-structured sources

• Elicits an ontology from sources that have some predefined structure, such as XML Schema

Structured data

• Extracting concepts and relations from knowledge contained in structured data, such as databases


Ontology Learning ‘Layer Cake’

Axioms & Rules: ∀x, y (sufferFrom(x, y) → ill(x))

Relations: cure (domain: Doctor, range: Disease)

Taxonomy (Concept hierarchies): is_a (Doctor, Person)

Concepts: Disease := <I, E, L>

Synonyms: {disease, illness}

Terms: disease, illness, hospital


Part 1 Terms Extraction

Axioms & Rules

Relations

Taxonomy (Concept hierarchies)

Concepts

Synonyms

Terms: disease, illness, hospital


Terms

Linguistic realizations of domain-specific concepts

Are the basis of the ontology learning process

Term extraction implies:

• Linguistic processing: part-of-speech tagging, morphological analysis, etc.

• Statistical processing: compares the distribution of terms between corpora


Terms Extraction: Process

Run a Part-Of-Speech (POS) tagger over the domain corpus

Identify possible terms by constructing patterns, such as: Adj-Noun, Noun-Noun, Adj-Noun-Noun,…

Ignore names

Keep only the terms that are relevant to the text by applying statistical metrics
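
The pattern-matching step of this process can be sketched in a few lines of Python. This is a minimal illustration, not the method of any particular tool: it assumes the corpus has already been POS-tagged, and the tag names (ADJ, NOUN, ...), the sample sentence and the function name candidate_terms are all hypothetical.

```python
# Hypothetical, already POS-tagged sentence: (token, tag) pairs.
TAGGED = [("the", "DET"), ("chronic", "ADJ"), ("heart", "NOUN"), ("disease", "NOUN"),
          ("requires", "VERB"), ("hospital", "NOUN"), ("care", "NOUN")]

# Tag patterns named on the slide: Adj-Noun, Noun-Noun, Adj-Noun-Noun.
PATTERNS = [("ADJ", "NOUN"), ("NOUN", "NOUN"), ("ADJ", "NOUN", "NOUN")]

def candidate_terms(tagged, patterns=PATTERNS):
    """Return every token n-gram whose tag sequence equals one of the patterns."""
    found = []
    for start in range(len(tagged)):
        for pattern in patterns:
            window = tagged[start:start + len(pattern)]
            if tuple(tag for _, tag in window) == pattern:
                found.append(" ".join(tok for tok, _ in window))
    return found

print(candidate_terms(TAGGED))
```

The candidates returned here would still have to be filtered by a statistical metric, which this sketch does not do.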


Linguistic Analysis: an example

Discourse Analysis: [[He SUBJ] [booked PRED] [[this] [table HEAD] NP:DOBJ:X1] …] … [[It SUBJ:X1] [was PRED] still available …]

Dependency Structure (S): [[He SUBJ] [booked PRED] [[this] [table HEAD] NP:DOBJ] S]

Dependency Structure (Phrases): [[the SPEC] [large MOD] [table HEAD] NP]

Phrase Recognition: [[the] [large] [table] NP] [[in] [the] [corner] PP]

Morphological Analysis (stemming): [work~ing V]

Part of Speech & Semantic Tagging: [table N:ARTIFACT] [table N:furniture]

Tokenization (incl. Named-Entity Rec.): [table] [2005-06-01] [John Smith]


Statistical Analysis

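Statistical metrics used in term extraction:

Chi-square:

$$\chi^2 = \sum \frac{(\mathrm{obs} - \mathrm{exp})^2}{\mathrm{exp}}$$

Term weighting (TFIDF):

$$\mathrm{tfidf}(w) = \mathrm{tf}(w) \cdot \log\left(\frac{N}{\mathrm{df}(w)}\right)$$

Mutual Information:

$$\mathrm{mi}(x, y) = \log \frac{P(x, y)}{P(x)\,P(y)}$$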


TFIDF

$$\mathrm{tfidf}(w) = \mathrm{tf}(w) \cdot \log\left(\frac{N}{\mathrm{df}(w)}\right)$$

tf(w): term frequency (number of occurrences of the word in a document)

df(w): document frequency (number of documents containing the word)

N: number of all documents

tfidf(w): relative importance of the word in the document

Most popular weighting scheme

The word is more popular when it appears several times in a document

The word is more important if it appears in fewer documents
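
As a worked illustration of the weighting, the sketch below computes tf(w) · log(N / df(w)) for every word of every document. The three toy documents and the function name tfidf are assumptions made for the example; the natural logarithm is used, since the slide does not fix a base.

```python
import math
from collections import Counter

def tfidf(docs):
    """docs: list of token lists. Returns one {word: weight} dict per document."""
    n_docs = len(docs)
    # df(w): number of documents that contain w at least once
    df = Counter(word for doc in docs for word in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)  # tf(w): occurrences of w in this document
        weights.append({w: tf[w] * math.log(n_docs / df[w]) for w in tf})
    return weights

# Hypothetical toy corpus
DOCS = [["disease", "hospital", "disease"], ["hospital", "doctor"], ["hospital", "nurse"]]
for doc_weights in tfidf(DOCS):
    print(doc_weights)
```

In this toy corpus 'hospital' occurs in every document, so its weight is 0 everywhere, while 'disease' occurs twice in a single document and receives the highest weight.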


Part 2 Synonyms

Axioms & Rules

Relations

Taxonomy (Concept hierarchies)

Concepts

Synonyms: {disease, illness}

Terms


Synonyms

Identification of terms that share semantics, i.e., potentially refer to the same concept

Methods for extracting synonyms

• Based on WordNet

• Latent Semantic Indexing (LSI)


WordNet

A lexical database for the English language

Nouns, verbs, adjectives & adverbs are grouped into sets of synonyms (synsets)

Synsets are interlinked by means of conceptual-semantic and lexical relations


Adapting WordNet to a specific domain

Partition the set of synonymy relations defined in WordNet into three classes:

• Relations irrelevant in the specific domain

• Relations that are relevant but incorrect in the specific domain

• Relations that are relevant and correct in the specific domain

Remove relations from the first two classes and include relations from the third class

Rank the remaining sets according to their frequency in the corpus


Latent Semantic Indexing (LSI)

LSI is an NLP technique for analyzing relationships between a set of documents and the terms they contain

Uses a term-document matrix which describes the occurrences of terms in documents – Vector Space Model

Example: doc1 doc2

database X

computer X X

access X
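
A term-document matrix like the one in the example can be built as follows. This sketch covers only the matrix construction; LSI would additionally apply a truncated singular value decomposition to the matrix, which is omitted here. The document contents are an assumption: the example only makes clear that 'computer' occurs in both documents, so the column assigned to 'database' and 'access' is hypothetical.

```python
def term_document_matrix(docs):
    """docs: {doc_name: text}. Returns (sorted terms, doc names, one 0/1 row per term)."""
    names = list(docs)
    tokens = {name: set(text.lower().split()) for name, text in docs.items()}
    terms = sorted(set().union(*tokens.values()))
    matrix = [[1 if term in tokens[name] else 0 for name in names] for term in terms]
    return terms, names, matrix

# Assumed contents of the two documents (hypothetical)
DOCS = {"doc1": "database computer", "doc2": "computer access"}
terms, names, matrix = term_document_matrix(DOCS)
for term, row in zip(terms, matrix):
    print(term, row)
```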


Part 3 Concepts

Axioms & Rules

Relations

Taxonomy (Concept hierarchies)

Concepts: Disease := <I, E, L>

Synonyms

Terms


Concepts: Intension, Extension, Lexicon

A term may indicate a concept if we can define its:

Intension: an (in)formal definition of the set of objects that this concept describes

Example: a disease is an impairment of health or a condition of abnormal functioning

Extension: a set of objects that the definition of this concept describes

Example: influenza, cancer, heart disease

Lexical realizations: the term itself and its multilingual synonyms

Example: disease, illness, maladie


Part 4 Taxonomy Induction

Axioms & Rules

Relations

Taxonomy (Concept hierarchies): is_a (Doctor, Person)

Concepts

Synonyms

Terms


Concept Hierarchy Extraction

Basic methods used for taxonomy extraction:

With the use of WordNet

Lexico-syntactic patterns

Machine Readable Dictionaries

Co-occurrence Analysis

Linguistic approaches


Taxonomy Extraction with WordNet

Given two terms t1 and t2, check if they stand in a hypernym relation with regard to WordNet

Normalize the number of hypernym paths by dividing by the number of senses of t1

$$\mathrm{isa}(t_1, t_2) = \min\left(\frac{|\mathrm{paths}(\mathrm{senses}(t_1), \mathrm{senses}(t_2))|}{|\mathrm{senses}(t_1)|}, 1\right)$$

path: a sequence of edges connecting the two synsets

Example:
- 4 different hypernym paths between synsets ‘country’ and ‘region’
- ‘country’ has 5 senses

value of isa (country, region) = 4/5 = 0.8
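
The normalization can be checked with a small function. It is only a sketch of the arithmetic: the two counts are passed in by the caller rather than looked up in WordNet, and the name isa_score is hypothetical.

```python
def isa_score(num_hypernym_paths, num_senses_t1):
    """min(|paths| / |senses(t1)|, 1), with both counts supplied by the caller."""
    if num_senses_t1 <= 0:
        raise ValueError("t1 must have at least one sense")
    return min(num_hypernym_paths / num_senses_t1, 1.0)

# The slide's example: 4 hypernym paths, 5 senses of 'country'
print(isa_score(4, 5))
```

The cap at 1 matters when there are more paths than senses, for instance isa_score(7, 5).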


Lexico-syntactic patterns - Hearst

Aim: the acquisition of hyponym lexical relations from text

Uses a set of predefined lexico-syntactic patterns which

• occur frequently and in many text genres

• indicate the relation of interest

• can be recognized with little or no pre-encoded knowledge

Principal idea: match these patterns in texts to retrieve is_a relations

Precision with respect to WordNet: 55.45%


Lexico-syntactic patterns - Hearst

NP0 such as {NP1, NP2, …, (and | or)} NPn

‘Vehicles such as cars, trucks and bikes…’

is_a (car, vehicle), is_a (truck, vehicle), is_a (bike, vehicle)

such NP as {NP,}* {(or | and)} NP

‘Such fruits as oranges, nectarines or apples…’

is_a (orange, fruit), is_a (nectarine, fruit), is_a (apple, fruit)

NP {, NP}* {,} {or | and} other NP

‘Swimming, running, or/and other activities…’

is_a (swimming, activity), is_a (running, activity)


Lexico-syntactic patterns - Hearst

NP {,} including {NP,}* {or | and} NP

‘Injuries, including broken bones, wounds and bruises…’

is_a (broken bone, injury), is_a (wound, injury), is_a (bruise, injury)

NP {,} especially {NP,}* {or | and} NP

‘Publications, especially papers and books…’

is_a (paper, publication), is_a (book, publication)
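
Matching one of these patterns can be approximated with a regular expression. The sketch below handles only the ‘such as’ pattern and makes strong simplifying assumptions: every noun phrase is a single word, no POS tagging or lemmatization is done (so plural surface forms are returned as they appear), and the names SUCH_AS and hearst_such_as are hypothetical.

```python
import re

# Separator between listed items: ", ", ", and ", ", or ", " and ", " or "
SEP = r"(?:,\s*(?:and\s+|or\s+)?|\s+(?:and|or)\s+)"
# "<hypernym> such as <item>{SEP <item>}" with single-word noun phrases only
SUCH_AS = re.compile(r"(\w+) such as (\w+(?:" + SEP + r"\w+)*)", re.IGNORECASE)

def hearst_such_as(sentence):
    """Return (hyponym, hypernym) pairs found by the 'such as' pattern."""
    pairs = []
    for match in SUCH_AS.finditer(sentence):
        hypernym = match.group(1).lower()
        for hyponym in re.split(SEP, match.group(2)):
            pairs.append((hyponym.lower(), hypernym))
    return pairs

print(hearst_such_as("Vehicles such as cars, trucks and bikes are common."))
```

A realistic implementation would match over noun-phrase chunks instead of single words, which is what allows multi-word hyponyms such as ‘broken bones’.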


Machine Readable Dictionaries

A method for extracting taxonomies which goes back to the 80’s

Main idea: exploit the regularity of dictionary entries to find a suitable hypernym for the defined word

Example:

spring: “the season between winter and summer and in which leaves and flowers appear”

is_a (spring, season)


MRDs: Exceptions

The hypernym can be preceded by an expression such as ‘a kind of’, ‘a sort of’, or ‘a type of’

The problem is solved by keeping an exception list with words such as ‘kind’, ‘sort’, ‘type’ and taking the head of the NP following the preposition ‘of’

Example:

hornbeam: “a type of tree with a hard wood, sometimes used in hedges”

is_a (hornbeam, tree)

The word can be defined in terms of a part-of or membership relation

Example:

republican: “a member of a political party advocating republicanism”

not is_a (republican, political party) but part_of (republican, political party)
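
The dictionary heuristic, including the exception list, can be sketched without a parser. This is a crude approximation under stated assumptions: the first noun phrase is taken to end at the first word from a hand-made boundary list, and its last word is taken as the head. The BOUNDARIES list and the function names are hypothetical; a real system would use a POS tagger or chunker.

```python
DETERMINERS = {"a", "an", "the"}
EXCEPTIONS = {"kind", "sort", "type"}
# Hypothetical boundary words that end the first noun phrase in this sketch
BOUNDARIES = {"of", "between", "with", "in", "that", "which", "who", "for", "and"}

def first_np(words):
    """Split words into (first noun phrase, remainder) using the boundary list."""
    if words and words[0] in DETERMINERS:
        words = words[1:]
    for i, word in enumerate(words):
        if word in BOUNDARIES:
            return words[:i], words[i:]
    return words, []

def hypernym_from_definition(definition):
    """Return the head of the defining noun phrase, skipping 'kind/sort/type of'."""
    words = definition.lower().replace(",", " ").split()
    np, rest = first_np(words)
    if np and np[-1] in EXCEPTIONS and rest[:1] == ["of"]:
        np, rest = first_np(rest[1:])  # take the NP following 'of' instead
    return np[-1] if np else None

print(hypernym_from_definition("a type of tree with a hard wood, sometimes used in hedges"))
```

For the ‘republican’ definition this sketch returns ‘member’, which illustrates why part-of and membership definitions need separate handling.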


Co-occurrence analysis

Document-based subsumption

A certain term t1 is more specific than a term t2 if t2 also appears in all the documents in which t1 appears.

$$P(x \mid y) = \frac{n(x, y)}{n(y)}$$

Term x subsumes term y iff P(x | y) = 1, where

n(x,y): the number of documents in which x and y co-occur

n(y): the number of documents that contain y
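
A minimal sketch of document-based subsumption, reading the condition as P(x | y) = 1, which matches the statement that the more general term appears in all the documents of the more specific one. Documents are modelled as sets of terms; the toy corpus and the function name subsumes are assumptions.

```python
def subsumes(x, y, docs):
    """True if x appears in every document that contains y, i.e. n(x,y) / n(y) = 1."""
    docs_with_y = [doc for doc in docs if y in doc]
    if not docs_with_y:
        return False  # y never occurs, so nothing can be concluded
    n_xy = sum(1 for doc in docs_with_y if x in doc)
    return n_xy / len(docs_with_y) == 1.0

# Hypothetical corpus: each document is the set of terms it contains
DOCS = [{"animal", "dog"}, {"animal", "cat"}, {"animal"}]
print(subsumes("animal", "dog", DOCS), subsumes("dog", "animal", DOCS))
```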


Linguistic Approaches

Modifiers typically restrict or narrow down the meaning of the modified noun

Example: is_a (international credit card, credit card)
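
The modifier heuristic amounts to dropping leading words of a multi-word term. The sketch below assumes that every leading word is a modifier and that the head is the last word; the function name modifier_hypernyms is hypothetical.

```python
def modifier_hypernyms(term):
    """Return (term, hypernym) pairs obtained by dropping leading modifiers."""
    words = term.split()
    return [(term, " ".join(words[i:])) for i in range(1, len(words))]

print(modifier_hypernyms("international credit card"))
```

For the slide's example this proposes both ‘credit card’ and ‘card’ as hypernyms of ‘international credit card’.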


Part 5 Relations (non-taxonomic)

Axioms & Rules

Relations: cure (domain: Doctor, range: Disease)

Taxonomy (Concept hierarchies)

Concepts

Synonyms

Terms


Extracting relations & attributes

Specific relations

• Part-of

• Qualia (Formal, Constitutive, Telic, Agentive)

General relations

• Exploiting linguistic structure

Attributes


Learning attributes: Introduction

Attributes: relations with a datatype as range

Typically expressed in texts using the preposition of, the verb have or genitive constructs, e.g. ‘the color of the car’, ‘the car’s color’, ‘every car has a color’

Values of attributes are expressed using copula constructs, adjectives or expressions specific to the attribute in question, e.g.,

• ‘the car is red’ (copula + value)

• ‘the red car’ (adjective)

• ‘the baby weighs 3 kg’ (specific expressions)


Classification of attributes

To systematize the learning process, attributes are classified according to their range


An approach to learning attributes

Tokenize & part-of-speech tag the corpus

Apply the following patterns to extract adjective/noun pairs

(\w+{DET})? (\w+{NN})+ is{VBZ} \w+{JJ}

(\w+{DET})? \w+{JJ} (\w+{NN})+

JJ: adjective, DET: determiner, NN: noun, VBZ: verb, 3rd person singular present

These pairs are weighted using conditional probability:

$$P(a \mid n) = \frac{f(n, a)}{f(n)}$$

f(n,a): joint frequency of adjective a and noun n

f(n): the frequency of noun n

For each of the adjectives we look up the corresponding attributes in WordNet
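
The two patterns and the weighting can be tried out on a small tagged string. This is a sketch under several assumptions: the corpus is already tagged and written in word{TAG} notation with single spaces, the last noun of a noun sequence is treated as its head, the weight is taken to be f(n,a) / f(n), and the WordNet lookup step is left out. The corpus and all names are hypothetical.

```python
import re
from collections import Counter

# Hypothetical tagged corpus in the word{TAG} notation used by the patterns
CORPUS = ("the{DET} car{NN} is{VBZ} red{JJ} . the{DET} red{JJ} car{NN} "
          "stopped{VBD} . a{DET} car{NN} is{VBZ} fast{JJ} .")

# (DET)? (NN)+ is{VBZ} JJ   and   (DET)? JJ (NN)+
COPULA = re.compile(r"(?:\w+\{DET\} )?((?:\w+\{NN\} )+)is\{VBZ\} (\w+)\{JJ\}")
PRENOMINAL = re.compile(r"(?:\w+\{DET\} )?(\w+)\{JJ\}((?: \w+\{NN\})+)")

def head_noun(nouns):
    """Last noun of a noun sequence, taken here as its head (an assumption)."""
    return re.findall(r"(\w+)\{NN\}", nouns)[-1]

def weighted_pairs(corpus):
    """Return {(noun, adjective): f(n, a) / f(n)} for all matched pairs."""
    pairs = Counter()
    for match in COPULA.finditer(corpus):
        pairs[(head_noun(match.group(1)), match.group(2))] += 1
    for match in PRENOMINAL.finditer(corpus):
        pairs[(head_noun(match.group(2)), match.group(1))] += 1
    noun_freq = Counter(re.findall(r"(\w+)\{NN\}", corpus))
    return {(n, a): count / noun_freq[n] for (n, a), count in pairs.items()}

print(weighted_pairs(CORPUS))
```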


“meronymy” / “part-of” relations

Given a “seed” word, find parts of that word in a large corpus of text

Pattern format: type_of_word TAG type_of_word TAG …

whole NN[-PL] ’s POS part NN[-PL]

e.g. …building’s basement…

part NN[-PL] of PREP {the|a} DET mods [JJ|NN]* whole NN

e.g. …basement of a building…

NN = Noun, NN-PL = Plural Noun, PREP = Preposition, POS = Possessive, JJ = Adjective

55% accuracy


Qualia structures

The meaning of a lexical element is described in terms of four roles:

Constitutive: physical properties of an object (e.g., weight, material, parts)

Agentive: typically a verb denoting an action which brings the object into existence

Formal: normally consists of typing information about the object (e.g., hypernym)

Telic: the purpose or function of an object, expressed either by a verb or by a nominal

Example: Qualia structure for knife

Formal: artifact_tool

Constitutive: blade, handle,…

Telic: cut_act

Agentive: make_act


Qualia Structures: Learning Approach

Aim: to automatically learn qualia structures from the WWW

Based on the idea of matching certain lexico-syntactic patterns conveying a standard relation


Qualia Structures: Learning Process

Word → Generate Clues → Download Google Abstracts → POS-tagging → Matching regular expressions → Statistical Weighting → Weighted QS

Clues: search engine queries indicating the relation of interest

Calculate the weight of a candidate qualia element e for the term t using the Jaccard coefficient:

$$\mathrm{Jaccard}(e, t) = \frac{\mathrm{GoogleHits}(e \wedge t)}{\mathrm{GoogleHits}(e) + \mathrm{GoogleHits}(t) - \mathrm{GoogleHits}(e \wedge t)}$$
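
The Jaccard weighting reduces to simple arithmetic over three hit counts. In this sketch the counts are passed in by the caller; no search engine is queried, and the numbers in the usage line are invented for illustration.

```python
def jaccard(hits_e, hits_t, hits_both):
    """Jaccard coefficient: hits(e AND t) / (hits(e) + hits(t) - hits(e AND t))."""
    union = hits_e + hits_t - hits_both
    return hits_both / union if union > 0 else 0.0

# Invented counts: 100 hits for e, 50 for t, 25 for both together
print(jaccard(100, 50, 25))
```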


Qualia Structure: Patterns (1/2)

Formal Role

Telic Role


Qualia Structure: Patterns (2/2)

Constitutive Role


Relations by syntactic analysis

OntoLT

SubjToClass_PredToSlot_DObjToRange

Maps a subject to the domain, the predicate or verb to a slot or relation, and the object to its range.

Example:

‘The player kicked the ball to the net’

relation: kick (domain: player, range: ball)


RelExt: A tool for Relation Extraction

identifies relevant triples (pairs of concepts connected by a relation) over concepts from an existing ontology

is based on the fact that verbs express a relation between two classes that specify the domain and range

extracts relevant verbs & their grammatical arguments and computes corresponding relations through statistical & linguistic processing

was developed in the context of the SmartWeb project to provide intelligent information services for the FIFA World Cup 2006


RelExt: Linguistic processing

Corpus → Linguistic annotation → NER & Concept Tagging → Annotated corpus

● Linguistic annotation

the SCHUG system was used

provides a multi-layer XML format for a given text: dependency structure, lemmatization, POS

● NER (Named Entity Recognition)

performed to map instances of football players to existing ontology classes

● Concept tagging

maps synonyms for given terms to the corresponding ontology concepts


RelExt: Statistical Processing

Relevance measure

• χ² test used to compute relevance ranking

Co-occurrence measure

Relation extraction

[Diagram: frequencies in BNC and NZZ; relevance measure; relevance scores (Heads, Preds); co-occurrence measure; co-occurrence scores (Heads <> Preds)]


Part 6 Axioms & Rules

Axioms & Rules: ∀x, y (sufferFrom(x, y) → ill(x))

Relations

Taxonomy (Concept hierarchies)

Concepts

Synonyms

Terms


DIRT: Discovery of Inference Rules from Text

An unsupervised method for discovering inference rules from text, such as

• X is author of Y ≈ X wrote Y

• X caused Y ≈ Y is blamed on X

• X manufactures Y ≈ X’s Y factory

Is based on the Distributional Hypothesis: words that occur in the same contexts tend to be similar


DIRT: Distributional Hypothesis

The Distributional Hypothesis is applied to dependency trees

If two paths tend to link the same sets of words, their meanings are hypothesized to be similar


DIRT: Dependency trees

The inference rules discovered by DIRT are between paths in dependency trees

The dependency trees are generated by the Minipar parser

Minipar represents its grammar as a network where nodes represent grammatical categories and links represent syntactic relationships

[Table: a subset of the dependency relations in Minipar output]


DIRT: Dependency trees

Example: “John found a solution to the problem”

Dependency links (head → modifier):

• found → John (subj)

• found → solution (obj)

• solution → a (det)

• solution → to (mod)

• to → problem (pcomp)

• problem → the (det)

Links represent dependency relationships

Direction: from the head to the modifier

Labels represent types of dependency relations

Each link between two words represents a direct semantic relationship

Path between “John” and “problem”:

N:subj:V ← find → V:obj:N → solution → N:to:N

meaning “X finds solution to Y”


DIRT: Paths in Dependency Trees

Transformation rule: connect the prepositional complement directly to the words modified by the preposition

Each link between two words represents a direct semantic relationship

A path represents indirect semantic relationships between two content words


Ontology Learning Tools

Text2Onto

• Open source (Java)

• http://ontoware.org/projects/text2onto

OntoLT

• Open source (Protégé plug-in, Java)

• http://olp.dfki.de/OntoLT/OntoLT.htm

OntoGen

• Open source (C++, .NET)

• http://www.textmining.net


Text2Onto: Main Features

Learns primitives independent of a specific KR language (Probabilistic Ontology Model, POM)

System calculates a confidence for each learned object for better user interaction

Updates the learned knowledge each time the corpus is changed and avoids processing it from scratch

Allows for easy

• combination of algorithms,

• execution of algorithms,

• writing new algorithms


Text2Onto: Algorithms used

Concepts

• Statistical measures, e.g. TFIDF, C-value/NC-value,…

Subclass_of relations

• Exploits hypernym relations from WordNet

• Hearst patterns

Mereological relations (part-of)

General relations: extracts the following syntactic frames:

• Transitive, e.g., love(subj, obj)

• Intransitive + PP-complement, e.g., walk(subj, pp(to))

• Transitive + PP-complement, e.g., hit(subj, obj, pp(with))

Instance-of

Equivalence


Text2Onto: screenshot


OntoGen : Techniques used

Linear Dimensionality Reduction (a.k.a. LSI)

• words related to the same topic co-occur more often than words related to different topics

• Result: clusters of words each describing one topic

K-means clustering algorithm

• Partitions the corpus into k clusters so that two documents within the same cluster are more closely related than two documents from different clusters


OntoGen: screenshot


Onto-LT

A Protégé plug-in with which classes and relations can be extracted from a linguistically annotated text collection

Provides mapping rules that allow for a mapping between linguistic entities and class/slot candidates in Protégé


Onto-LT: Mapping rules

HeadNounToClass_ModToSubClass

Maps a head-noun to a class and, in combination with its modifier(s), to one or more sub-class(es)

SubjToClass_PredToSlot_DObjToRange

Maps a linguistic subject to a class, its predicate to a corresponding slot for this class and the direct object to the “range” of the slot


Onto-LT: System architecture


Onto-LT: screenshot


Conclusions

A detailed methodology that guides the ontology learning process does not exist

Only general guidelines are provided

No complete correspondence between the methods and the tools

Methods are based mainly on NLP techniques complemented with statistical measures

Tools only support some of the steps proposed in the different approaches (except Text2Onto)


Some References…

Cimiano, P., Ontology Learning and Population from Text: Algorithms, Evaluation and Applications. Springer, 2006

Hearst, M.A., Automatic Acquisition of Hyponyms from Large Text Corpora. In: Proceedings of the 14th International Conference on Computational Linguistics, pp. 539-545, 1992

Gómez-Pérez, A., & Manzano-Macho, D., An overview of methods and tools for ontology learning from text. The Knowledge Engineering Review, Vol. 19:3, pp. 187-212, 2005

Cimiano, P., & Wenderoth, J., Automatically Learning Qualia Structures from the Web. In: Proceedings of the ACL Workshop on Deep Lexical Acquisition, pp. 28-37, 2005