A Domain Ontology Engineering Tool with General Ontologies and Text Corpus
Naoki Sugiura, Masaki Kurematsu, Naoki Fukuta, Naoki Izumi, & Takahira Yamaguchi
DODDLE and DODDLE II
Domain Ontology rapiD DeveLopment Environment
Builds taxonomic and non-taxonomic relationships
Uses dictionary approach and text corpus (body) to build relationships
DODDLE & DODDLE II
Large Ontologies are difficult to build by hand
Locates relationships between words based on context similarities; even if separated
Disadvantages
  Human interaction is still required
  Low amount of success
DODDLE vs DODDLE II
DODDLE only works on taxonomic relationships
DODDLE II
  Extension of DODDLE
  Finds non-taxonomic relationships
Outline
Overview
Taxonomic Relationships
Non-Taxonomic Relationships
Case Studies
Problems/Future Work
Conclusion
Assessment
Overview
[Figure: system overview. Labels: Domain Terms, Domain Specific Text Corpus, Concept Extraction Module, TRA Module, NTRL Module]
Overview: TRA Module
[Figure: TRA Module. Labels: MRD (WordNet), Matched Result Analysis, Trimmed Result Analysis, Modification using syntactic strategies, Taxonomic Relationship]
Overview: NTRL Module
[Figure: NTRL Module. Labels: Domain Specific Text Corpus, Extraction of frequent words, WordSpace creation, Extraction of similar concept pairs, Concept specification templates, Non-Taxonomic Relationship]
TRA
Matched Result Analysis
  Constructs PAB and STM
Trimmed Result Analysis
  Removes unnecessary nodes
Modification using statistical strategies
  Allows for human input
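
The trimming step can be illustrated with a small sketch. The slides only say that unnecessary nodes are removed, so the rule used here (splice out any unmatched node that has exactly one child) is an assumption, and the `trim` function, the tree, and the term names are all hypothetical.

```python
def trim(tree, root, matched):
    """Return a child map with unmatched single-child nodes spliced out.

    tree:    dict mapping each node to a list of its children
    matched: set of nodes that correspond to input domain terms
    Assumption: a node is "unnecessary" when it is unmatched and has one child.
    """
    def collapse(node):
        # Walk down through unmatched nodes that only pass through to one child.
        while node not in matched and len(tree.get(node, [])) == 1:
            node = tree[node][0]
        return node

    trimmed = {}

    def build(node):
        trimmed[node] = [collapse(child) for child in tree.get(node, [])]
        for child in trimmed[node]:
            build(child)

    build(root)  # the root itself is always kept
    return trimmed


# Hypothetical hierarchy taken from a dictionary, with two matched domain terms.
tree = {
    "entity": ["abstraction"],
    "abstraction": ["communication"],
    "communication": ["contract", "message"],
    "message": ["offer"],
    "contract": [],
    "offer": [],
}
trimmed = trim(tree, "entity", matched={"contract", "offer"})
```

In this toy case "abstraction" and "message" are dropped, while "communication" stays because it keeps "contract" and "offer" as siblings.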
NTRL
Extraction of key words
  Primitive: 4 words (a 4-gram)
  Collocation matrix: $a_{i,j}$ = number of times $f_i$ appears before $f_j$
  Example 4-gram sequence: … f8 f4 f3 f7 f8 f4 f1 f3 f4 f9 f2 f5 f1 f7 f1 f5 …
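
As a rough illustration of the collocation matrix, the sketch below counts adjacent pairs in the example 4-gram sequence from this slide. It assumes that "before" means "immediately before" and that fused tokens such as f8f4 are two separate 4-grams; `collocation_counts` is a hypothetical helper name.

```python
from collections import Counter


def collocation_counts(sequence):
    """Count how often 4-gram f_i occurs immediately before 4-gram f_j.

    Returns a sparse matrix as a Counter keyed by (f_i, f_j).
    """
    return Counter(zip(sequence, sequence[1:]))


# The example sequence from the slide, with every 4-gram as its own token.
seq = "f8 f4 f3 f7 f8 f4 f1 f3 f4 f9 f2 f5 f1 f7 f1 f5".split()
counts = collocation_counts(seq)

# counts[("f8", "f4")] is the matrix entry a_{8,4}; missing pairs count as 0.
```

A sparse counter is used instead of a dense array because most 4-gram pairs never co-occur in a real corpus.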
NTRL
WordSpace Creation
  Context vectors
  Word vectors: sum of context vectors

  $$\tau(w) = \sum_{i \in C(w)} \Bigl( \sum_{f \text{ close to } i} \varphi(f) \Bigr)$$

  $\tau(w)$: a vector representation of a word or phrase $w$
  $\varphi(f)$: a 4-gram vector of a 4-gram $f$
  $C(w)$: appearance places of a word or phrase $w$
WordSpace is a collection of $\tau(w)$
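
The word-vector sum on this slide can be sketched as follows. Two points are assumptions rather than details from the slides: the context vector of a 4-gram is taken to be its row of the collocation matrix, and "close to i" is taken to mean within `window` positions of i in the 4-gram sequence. The function names and the toy data are hypothetical.

```python
def context_vectors(sequence, vocab):
    """Context vector per 4-gram: how often each 4-gram directly follows it."""
    index = {f: k for k, f in enumerate(vocab)}
    phi = {f: [0] * len(vocab) for f in vocab}
    for a, b in zip(sequence, sequence[1:]):
        phi[a][index[b]] += 1
    return phi


def word_vector(positions, sequence, phi, window=1):
    """Word vector: sum the context vectors of all 4-grams near each
    appearance place of the word (positions plays the role of C(w))."""
    dim = len(next(iter(phi.values())))
    tau = [0] * dim
    for i in positions:
        lo = max(0, i - window)
        hi = min(len(sequence), i + window + 1)
        for f in sequence[lo:hi]:
            tau = [t + p for t, p in zip(tau, phi[f])]
    return tau


# Toy data: a word whose only appearance place is position 1.
seq = ["f1", "f2", "f1", "f3"]
phi = context_vectors(seq, vocab=["f1", "f2", "f3"])
tau = word_vector([1], seq, phi)
```

Collecting one such vector per word or phrase gives the WordSpace.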
NTRL
Extraction of Concept Pairs
  Each input term has a best-matched "synset"
    Synset: collection of word vectors
  The sum of the word vectors is assigned to the concept that corresponds to each input term
  Inner product is computed for all combinations of concept pairs
  A match is determined by a user-set threshold
    Case study: 0.87
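
A minimal sketch of the pair-extraction step, assuming each concept already has a vector. The slides mention only an inner product; normalizing the vectors to unit length first (so the score is a cosine similarity between -1 and 1) is an assumption made here so that a threshold such as 0.87 is meaningful. The concept names and vectors are invented for illustration.

```python
import math
from itertools import combinations


def unit(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)


def similar_pairs(concept_vectors, threshold):
    """Return every concept pair whose normalized inner product meets the threshold."""
    units = {name: unit(v) for name, v in concept_vectors.items()}
    pairs = []
    for a, b in combinations(units, 2):
        score = sum(x * y for x, y in zip(units[a], units[b]))
        if score >= threshold:
            pairs.append((a, b))
    return pairs


# Hypothetical concept vectors; 0.87 is the threshold quoted for the case study.
concepts = {
    "goods": [1.0, 0.0, 1.0],
    "delivery": [1.0, 0.0, 0.8],
    "party": [0.0, 1.0, 0.0],
}
pairs = similar_pairs(concepts, threshold=0.87)
```

Only the "goods" and "delivery" pair survives here, since "party" is orthogonal to both.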
NTRL
Constructing Concept Specification Templates
  Set of similar concept pairs and association rules
  DODDLE sets priorities between concept pairs
    Based on TRA Module and co-occurrence information
Case Study
Law: "Contract for International Sale of Goods"
Business: "XML Common Business Library"
Support: 0.4%
Confidence: 80%
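
Support and confidence for a rule between two terms can be computed as in the sketch below. It assumes each sentence of the corpus is treated as one transaction (a set of terms); the slides do not state this, and the sentences and the `rule_stats` helper are hypothetical. The thresholds are the ones quoted for the case study.

```python
def rule_stats(transactions, x, y):
    """Support and confidence of the rule x -> y over a list of term sets."""
    total = len(transactions)
    both = sum(1 for t in transactions if x in t and y in t)
    has_x = sum(1 for t in transactions if x in t)
    support = both / total if total else 0.0
    confidence = both / has_x if has_x else 0.0
    return support, confidence


# Hypothetical sentences, each reduced to the set of input terms it contains.
sentences = [
    {"seller", "goods"},
    {"seller", "goods", "buyer"},
    {"seller", "goods"},
    {"seller", "goods"},
    {"seller", "buyer"},
]
support, confidence = rule_stats(sentences, "seller", "goods")

# Keep the rule only if it meets both thresholds from the case study.
MIN_SUPPORT = 0.004     # 0.4%
MIN_CONFIDENCE = 0.80   # 80%
accepted = support >= MIN_SUPPORT and confidence >= MIN_CONFIDENCE
```

The low support threshold lets rare term pairs through, while the high confidence threshold requires that the second term almost always accompanies the first.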
Law Case Study
Given 46 concepts
WordSpace: 77 concept pairs
Association between input terms: 55 pairs of terms
Templates
Taxonomic Results

Bus.              Precision   Recall per path   Recall per subtree
Matched Result    .2          .29               .71
Trimmed Result    .22         .13               .5

Law               Precision   Recall per path   Recall per subtree
Matched Result    .25         .23               .19
Trimmed Result    .3          .3                .15
Non-taxonomic Results

Law                          WS     AR     Join of WS and AR
# Extracted Concept Pairs    77     55     117
# Accepted Concept Pairs     18     13     27
Precision                    .23    .24    .23
Recall                       .38    .27    .56

Bus.                         WS     AR     Join of WS and AR
# Extracted Concept Pairs    40     39     66
# Accepted Concept Pairs     30     20     39
Precision                    .75    .51    .59
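
The precision figures in the non-taxonomic results follow from the two count rows, taking precision as accepted pairs divided by extracted pairs. The short check below recomputes them from the counts in the slides; recall is left out because the total number of correct pairs is not given.

```python
# (extracted, accepted) concept-pair counts copied from the results above.
results = {
    "Law": {"WS": (77, 18), "AR": (55, 13), "Join": (117, 27)},
    "Bus.": {"WS": (40, 30), "AR": (39, 20), "Join": (66, 39)},
}


def precision(extracted, accepted):
    """Fraction of extracted concept pairs that were accepted."""
    return accepted / extracted


table = {
    domain: {method: round(precision(*counts), 2) for method, counts in rows.items()}
    for domain, rows in results.items()
}
```

Rounded to two decimals, `table` reproduces the Precision rows for both domains.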
Problems/Future Work
Threshold
  Changes with each domain
Specification of a Concept Relation
  Still need to specify relationships
Ambiguity of Multiple Terminology
  "transmission"
  Semantic specialization of multi-definition words needed
DODDLE-R
  Uses RDF tags
Conclusion
Uses MRD and text corpus
Two strategies for taxonomic: matched result analysis and trimmed result analysis
Non-taxonomic: extracted by co-occurrence information in text corpus
Concept specification: a way to eliminate concept pairs to build an ontology