Upload
neci
View
31
Download
4
Embed Size (px)
DESCRIPTION
Learning to Map between Ontologies on the Semantic Web. AnHai Doan, Jayant Madhavan , Pedro Domingos, and Alon Halevy Databases and Data Mining group University of Washington. Semantic Web. Mark-up data on the web using ontologies Enable intelligent information processing over the web - PowerPoint PPT Presentation
Citation preview
Learning to Map between Learning to Map between Ontologies on the Semantic Ontologies on the Semantic WebWeb
AnHai Doan, Jayant Madhavan,Pedro Domingos, and Alon Halevy
Databases and Data Mining group
University of Washington
Semantic WebSemantic Web
Mark-up data on the web using ontologies
Enable intelligent information processing over the web Personal software agents Queries over multiple web pages …
An ExampleAn Example
Semantic Mappings allow information processing across Semantic Mappings allow information processing across ontologiesontologies
www.cs.washington.edu www.cs.usyd.edu.au
James CookPhD, U Sydney
Data Instance
Find Prof. Cook, a professor in a Seattle college, earlier an assoc. professor at his alma mater in Australia
Semantic Mapping
People
Staff Faculty
Professor Assoc. Professor
Asst. Professor
Academic Technical
Professor SeniorLecturer
Lecturer
Staff
NameEducation
… …
Semantic Web: State of the ArtSemantic Web: State of the Art
Languages for ontologies RDF, DAML+OIL,…
Ontology learning and Ontology design tools [Maedche’02], Protégé, Ontolingua,…
Semantic Mappings crucial to the SW vision [Uscold’01, Berners-Lee, et al.’01]
Without semantic mappings…Tower of Without semantic mappings…Tower of Babel !!!Babel !!!
Semantic Mapping Semantic Mapping ChallengesChallenges
Ontologies can be very different Different vocabularies, different design principles Overlap, but not coincide
Semantic Mapping information Data instances marked up with ontologies Concept names and taxonomic structure Constraints on the mapping
OverviewOverview
People
Staff Faculty
Professor
Assoc. Professo
r
Asst. Professo
r
Academic Technical
Professor
SeniorLecturer
Lecturer
Staff
Faculty
Define Similarity
Sim(Fac,Staff
)Sim(Fac,Prof)
Sim(Fac,Acad)
ComputeSimilarity
SatisfyConstraints
Academic?
Our ContributionsOur Contributions An automatic solution to taxonomy matching
Handles different similarity notions Exploits information in data instances and taxonomic
structure, using multi-strategy learning
Extend solution to handle wide variety of constraints, using Relaxation Labeling
An implementation, our GLUEGLUE system, and experiments on real-world taxonomies
High accuracy (68-98%) on large taxonomies (100-330 concepts)
Defining Similarity Defining Similarity
Multiple Similarity measures in terms of the JPDMultiple Similarity measures in terms of the JPD
Assoc. Prof Snr. Lecturer
A,S
A, S
A,S
A,S
P(A,S) + P(A,S) + P(A,S)
P(A,S)=
Joint Probability Distribution: P(A,S),P(A,S),P(A,S),P(A,S)
HypotheticalCommonMarked updomain
P(A S)
P(A S)Sim(Assoc. Prof., Snr. Lect.) =
[Jaccard, 1908]
No common data instancesNo common data instances
In practice, not easy to find data tagged with both ontologies !
United States Australia
Solution: Use Machine LearningSolution: Use Machine Learning
A
A
S
S
Machine Learning for computing Machine Learning for computing similaritiessimilarities
JPD estimated by counting the sizes of the partitionsJPD estimated by counting the sizes of the partitions
CLS
S
S
United States AustraliaA
A
S
S
CLA
A
A
A,S A,S
A,S A,S
A,S A,S
A,S A,S
Improve Predictive Accuracy – Use Improve Predictive Accuracy – Use Multi-Strategy Learning Multi-Strategy Learning
Single Classifier cannot exploit all available information
Combine the prediction of multiple classifiersCombine the prediction of multiple classifiers
CLA1
A
A
A
A
CLAN
A
A
…Content Learner
Frequencies on different words in the text in the data instances
Name LearnerWords used in the names of concepts in the taxonomy
Others …
Meta-Learner
So far…So far…
Define Similarity
ComputeSimilarity
SatisfyConstraints
Joint ProbabilityDistribution
Multi-strategyLearning
Constraints due to the taxonomy structure
Domain specific constraints Department-Chair can only map to a unique
concept
Numerous constraints of different types
StaffPeople
Next Step: Exploit ConstraintsNext Step: Exploit Constraints
Staff Fac
Prof Assoc. Prof Asst. Prof
Acad Tech
Prof Snr. Lect. Lect.
PeoplePeople StaffStaffParentsParents
ChildrenChildren
Extended Relaxation Labeling to ontology matchingExtended Relaxation Labeling to ontology matching
Solution: Relaxation LabelingSolution: Relaxation Labeling
Find the best label assignment given a set of constraints
Start with an initial label assignment Iteratively improves labels, given constraints
Standard Relaxation Labeling not applicable Extended in many ways
Acad
StaffPeople
Staff Fac
Prof Assoc. Prof Asst. Prof
Tech
Prof Snr. Lect. Lect.
PeoplePeople
ProfProf Assoc. Assoc. ProfProf
Asst. Asst. ProfProf
FacFac
ProProff
?
StaffStaff
Snr. Lect.Snr. Lect. Lect.Lect.
StaffStaff
AcadAcad
ProProff
Snr. Lect.Snr. Lect. Lect.Lect.
?
Distribution Estimator
Joint Distributions:P(A,B),P(A,B),…
Taxonomy O2
(structure + data instances)Taxonomy O1
(structure + data instances)
Putting it all together Putting it all together GLUE System GLUE System
Relaxation Labeler
Generic & Domain constraints
Mappings for O1 , Mappings for O2
Similarity Estimator
Similarity Matrix
Similarity function
DistributionEstimator
Meta Learner
Learner CL1 Learner CLN
Real World ExperimentsReal World Experiments Taxonomies on the web
University classes (UW and Cornell) Companies (Yahoo and The Standard)
For each taxonomy Extracted data instances – course descriptions, and company
profiles Trivial data cleaning 100 – 300 concepts per taxonomy 3-4 depth of taxonomies 10-90 average data instances per concept
Evaluation against manual mappings as the gold standard
ResultsResults
0
10
20
30
40
50
60
70
80
90
100
Cornell to Wash. Wash. to Cornell Cornell to Wash. Wash. to Cornell Standard to Yahoo Yahoo to Standard
Mat
chin
g a
ccu
racy
(%)
Name Learner Content Learner Meta Learner Relaxation Labeler
University I University II Companies
Related WorkRelated Work Our LSD schema matching system [Doan,
Domingos, Halevy ’01] GLUE handles taxonomies, richer models, and a
much richer set of constraints
Other Ontology and Schema Matching work [Noy, Musen’01], [Melnik, et al.’02], [Ichise, et al.’01] Mostly heuristics, or single machine learning
techniques
Relaxation Labeling for constraint satisfaction [Hummel, Zucker’83], [Chakrabarti, et al.’00] Significantly extend this approach
Conclusions & Future WorkConclusions & Future Work
An automated solution to taxonomy matching Handles multiple notions of similarity Exploits data instances and taxonomy structure Incorporates generic and domain-specific constraints Produces high accuracy results
Future Work More expressive models Complex Mappings Automated reasoning about mappings between
models