An Ontology Creation Methodology: A Phased Approach
Jon Atle Gulla, Norwegian University of Science and Technology, Norway
Vijay Sugumaran, Oakland University, [email protected]
Agenda
• Ontology development
• Traditional ontology learning
• Limitations of ontology learning
• A phased approach to ontology learning
The Challenge
• How to develop large complex ontologies?
• How to keep ontologies updated in dynamic domains?
Ontology Modeling vs. Learning
• Traditional ontology engineering approach:
– Project: Form team of ontology and domain experts
– Ontology & domain experts: Collaborative manual modeling process
– Domain experts: Verify ontology against domain knowledge
– Ontology experts: Verify ontology against syntactic and semantic quality measures
• Expensive and time-consuming approach
• Stable domains assumed
• Ontology learning approach:
– Domain experts: Find representative domain text
– Tool: Extract candidate classes, individuals and properties automatically from domain texts
– Ontology & domain experts: Verify candidate structures and complete ontology
• Can also be used to verify domain quality of existing ontology
• Cost-effective approach
• Not unproblematic in dynamic domains
Ontology Learning Basis
• People communicate using domain-specific concepts
• People document using domain-specific concepts
• Ontology learning: Extract ontology structures from written documentation
• Requirements:
– Documents representative of domain terminology
– Documents cover all the terminology
– Well-defined and consistent use of terminology in domain
[Figure labels: Ontology discussions; Realm of ontology learning; Realm of ontology engineering; Ontology in use]
Levels of Ontology Learning
Layers, listed by increasing degree of difficulty:
• Terms: sponsors, costs, charter
• Synonyms: (leader, manager, lead)
• Concepts: PROJECT
• Concept hierarchies: is_a(MANAGER, EMPLOYEE)
• Relations: FINANCE(ag: SPONSOR, go: PROJECT)
• Rules: ∀x,y (manager(x,y) → report(y,x))
Ontology Learning Strategies
• Term extraction
– Linguistic analysis
– Statistical analysis
• Synonyms
– Classification-based techniques
– Distribution-based techniques
• Concept formation
– Structure recognition
– Keyphrase generation
– Instance learning
• Concept hierarchy
– Clustering
– Lexico-syntactic patterns
– Head-modifier approaches
– Subsumption approaches
– Classification-based techniques
• Relations
– Association rules
– Concept vectors
• Rules
– Structure recognition for meta-property recognition
– Dependency trees and path similarities
Ontology Learning Process
• Automatic extraction of concept and relationship candidates
• Manual selection of candidates and completion of model
[Figure labels: Domain text (PMBOK); Reference set; Concept candidates (Scope management, WBS, Business need, Constituent components, Product description, ...); Abstract elements, Constraints, Properties, Rules; Search ontology]
Ex 1. Learning Concept/Individual Candidates
• Input sentence: Scope planning is the process of progressively elaborating and documenting the project work (project scope) that produces the product of the project.
• POS tagging: Scope/NNP planning/NN is/VBZ the/DT process/NN of/IN progressively/RB elaborating/VBG and/CC documenting/VBG the/DT project/NN work/NN (/( project/NN scope/NN )/) that/WDT produces/VBZ the/DT product/NN of/IN the/DT project/NN ./.
• Stopword removal (571 words)
• Lemmatization/stemming (POS tags not shown): Scope plan process progress elaborate document project work project scope produce product project
• Select consecutive nouns as candidate phrases: {scope planning, process, project work, project scope, product, project}
• Calculate tf.idf score for phrases: {(scope planning, 0.0097), (project scope, 0.0047), (product, 0.0043), (project work, 0.0008), (project, 0.0001), (process, 0.0000)}
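The candidate selection and scoring steps above can be sketched in a few lines of Python. This is a minimal illustration, not the tool used in the project: it starts from the already tagged sentence on the slide, skips stopword removal and lemmatization, and uses one common tf.idf variant (relative frequency times log of inverse document frequency). The function names and the two reference documents are hypothetical, so the scores will not match the values on the slide.

```python
import math
from collections import Counter

# Penn Treebank noun tags, as used in the tagged sentence on the slide.
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}

TAGGED = (
    "Scope/NNP planning/NN is/VBZ the/DT process/NN of/IN progressively/RB "
    "elaborating/VBG and/CC documenting/VBG the/DT project/NN work/NN (/( "
    "project/NN scope/NN )/) that/WDT produces/VBZ the/DT product/NN of/IN "
    "the/DT project/NN ./."
)


def noun_phrases(tagged_text):
    """Return lowercase phrases built from runs of consecutive noun tokens."""
    phrases, run = [], []
    for item in tagged_text.split():
        # Split on the last slash so tokens such as "(/(" parse correctly.
        word, _, tag = item.rpartition("/")
        if tag in NOUN_TAGS:
            run.append(word.lower())
        else:
            if run:
                phrases.append(" ".join(run))
            run = []
    if run:
        phrases.append(" ".join(run))
    return phrases


def tf_idf(doc_phrases, reference_docs):
    """Score phrases of one document against a list of reference phrase lists.

    Assumed variant: tf = relative frequency in the document,
    idf = log(N / df), where N counts the document plus the reference set.
    """
    counts = Counter(doc_phrases)
    all_docs = [doc_phrases] + reference_docs
    scores = {}
    for phrase, count in counts.items():
        df = sum(1 for doc in all_docs if phrase in doc)
        scores[phrase] = (count / len(doc_phrases)) * math.log(len(all_docs) / df)
    return scores


# Hypothetical reference set: phrase lists from two other documents.
REFERENCE = [["project", "process", "budget"], ["process", "project", "risk"]]

if __name__ == "__main__":
    candidates = noun_phrases(TAGGED)
    for phrase, score in sorted(tf_idf(candidates, REFERENCE).items(),
                                key=lambda kv: -kv[1]):
        print(f"{phrase}: {score:.4f}")
```

As on the slide, phrases that also occur throughout the reference set (here "project" and "process") end up at the bottom of the ranking, while domain-specific phrases such as "scope planning" score higher.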
Classes Relevant to the Drama Genre
• Data sources: IMDB, Wikipedia, Videoload
• Keyphrase extraction technique
• Noun phrases ranked according to various statistical measures
[Figure: processing pipeline. Components: Tokenizer; GATE sentence splitter; GATE tagger; GATE lemmatizer; GATE noun phrase extractor; Noun phrase indexer; Lucene document indexer; Lucene paragraph indexer; Lucene sentence indexer; Light stemmer; Concept profile builder (concept profiles); Concept similarity calculation; Association rules miner (association rules); Relationship merger]
Ex 2. Learning Relationship Candidates
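$$\mathrm{sim}(\vec{q},\vec{d}) = \frac{\sum_{i=1}^{n} q_i \, d_i}{\sqrt{\sum_{i=1}^{n} q_i^2}\;\sqrt{\sum_{i=1}^{n} d_i^2}} = \frac{\vec{q}\cdot\vec{d}}{\sqrt{\vec{q}\cdot\vec{q}}\;\sqrt{\vec{d}\cdot\vec{d}}}$$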
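A minimal sketch of the similarity measure sim(q, d) as cosine similarity, assuming q and d are equal-length lists of numeric weights (for instance two concept vectors; the example values in the usage note are made up). Returning 0.0 for an all-zero vector is a convention chosen here, not something stated on the slide.

```python
import math


def cosine_similarity(q, d):
    """Cosine of the angle between two equal-length weight vectors."""
    if len(q) != len(d):
        raise ValueError("vectors must have the same length")
    dot = sum(qi * di for qi, di in zip(q, d))
    norm_q = math.sqrt(sum(qi * qi for qi in q))
    norm_d = math.sqrt(sum(di * di for di in d))
    if norm_q == 0 or norm_d == 0:
        # Assumed convention: an empty profile is similar to nothing.
        return 0.0
    return dot / (norm_q * norm_d)
```

For example, `cosine_similarity([1, 0], [0, 1])` is 0.0 because the vectors share no non-zero dimension, while two vectors pointing in the same direction score approximately 1 regardless of their length.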
X ⇒ Y, where X ⊂ I, Y ⊂ I, and X ∩ Y = ∅
A rule X ⇒ Y holds in the transaction set D with confidence c if c% of the transactions in D that contain X also contain Y. The rule X ⇒ Y has support s in the transaction set D if s% of the transactions in D contain X ∪ Y.
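The support and confidence definitions above can be written out directly. The sketch below assumes each transaction is a set of concepts that co-occur in one text unit (for example one sentence or paragraph); that choice, the function names, and the sample transactions are illustrative assumptions. Values are returned as fractions rather than percentages.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(transactions, x, y):
    """Fraction of the transactions containing X that also contain Y."""
    x, y = set(x), set(y)
    with_x = [t for t in transactions if x <= set(t)]
    if not with_x:
        return 0.0  # assumed convention when X never occurs
    return sum(1 for t in with_x if y <= set(t)) / len(with_x)


# Hypothetical transactions: concepts found together in four text units.
TRANSACTIONS = [
    {"drama", "actor", "director"},
    {"drama", "actor"},
    {"drama", "award"},
    {"comedy", "actor"},
]

if __name__ == "__main__":
    # Rule {drama} => {actor}: support uses X ∪ Y, confidence conditions on X.
    print(support(TRANSACTIONS, {"drama", "actor"}))
    print(confidence(TRANSACTIONS, {"drama"}, {"actor"}))
```

A rule miner would typically keep only the rules whose support and confidence both exceed chosen thresholds and propose the surviving concept pairs as relationship candidates.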
Relationships Relevant to Drama Genre
• Association rules on extracted concepts
Automatic OWL Generation
Limitations of Ontology Learning
• Different techniques produce different results
• Different data sources produce different results
• Lost control over process
• Extensive verification of final ontology needed
• New data hard to combine with old data
Ontology Learning for Entertainment Domain
Ontology evolution for Deutsche Telekom’s Videoload download service
• What does Brangelina mean?
• Should Pitt be Brad Pitt or Michael Pitt?
• Actor vs. Schauspieler?
• All movies of Brad Pitt?
• Last movie of Pitt?
Ontology Learning Project
• Duration: Nov 2007 – Nov 2009
• Domain: movie download service
• Ontology analysis and creation based on indexed noun phrases from movie documents
• Ontology used for search and navigation on top of FAST search platform
• Ontology learning challenges:– Domain changes from one day to another
– No consistent domain terminology
– No professional domain terminology
– Multiple languages
– Movies about anything... unlimited domain
– Ontology needs to be up to date to support search
Ontology Workbench
• 3 phases that are carried out independently– Crawling into Lucene indices
– Supervised extraction of candidates
– Combining candidates into ontology structures
[Figure: the three phases. Collection: web documents → crawling and linguistic pre-processing → document & term statistics. Analysis: statistical ontology extraction → extracted ontology elements. Creation: set-theoretical ontology operations and OWL creation → OWL ontology (document ontology).]
Interactive Ontology Development
[Screenshot callouts: Expandable indices; Subset of data source; Focus of analysis; List of techniques; Partial results; Stored results; Set operations for combining results]
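Combining stored results with set operations can be illustrated with plain Python sets. This is only a sketch of the idea: the candidate sets, their names, and the choice of operations are hypothetical and do not describe the workbench's actual interface.

```python
# Hypothetical stored results from two extraction techniques.
tfidf_candidates = {"actor", "director", "drama", "award"}
rule_candidates = {"actor", "drama", "soundtrack"}

# Hypothetical candidates rejected by a domain expert during review.
rejected = {"award"}

# Intersection: candidates that both techniques agree on.
agreed = tfidf_candidates & rule_candidates

# Union minus rejections: everything proposed and not explicitly discarded.
pooled = (tfidf_candidates | rule_candidates) - rejected

# Difference: candidates found by only one technique, worth a closer look.
only_tfidf = tfidf_candidates - rule_candidates

if __name__ == "__main__":
    print(sorted(agreed))
    print(sorted(pooled))
    print(sorted(only_tfidf))
```

Keeping each technique's output as a separate stored set means new data or a new technique can be added later and recombined, rather than rerunning one monolithic extraction.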
Thank you