4 North Park • Suite 106 • Hunt Valley, MD 21030 • 410-584-0009 • www.revelytix.com
Ontology Based Information Management
MatchIT 1.1: Data Integration with
Semantic Mapping Technologies
Michael Schidlowsky
Sr. Software Architect
Data Integration
Motivated by:
• Organizational Changes
- Mergers and Acquisitions
- Internal reorganizations (e.g., DHS)
• Data Mining
• Standards Conformance
• Migration Efforts
• Legacy Systems
• Decouple data sources from application code
Data Integration
Challenges for integration specialists include:
• Domain-specific terms
• Unfamiliarity with source schemas
• Large size of schema set
• Semantics often not captured
• Captured semantics
- Stored in ad-hoc formats
- Cannot be reused to facilitate future data integration efforts
Data Integration: Example
Background:
Acme Inc. merges with CompuGlobalHyperMeganet.
Technical Challenge:
Need “Virtual Database” of all sales for all stores in real-time.
• Which fields represent customers?
- CUSTOMERID
- CUST_ID
- SSN
• Which fields represent ‘Price’?
- Sale_Amt
- Total_Sale
• What if your database has 10,000 columns?
Data Integration: Example
Background:
HR needs to use employee information for new company portal.
Technical Challenge:
Data must be in XML and conform to standard HR schema.
• Find all fields related to Address?
- RESIDENCE
- PREV_RESIDENCE
• What if your database has 10,000 columns?
Ideal Matching Solution
• Finds lexical relationships
• Captures semantic information
• Finds semantic relationships
• Provides programmatic access to results (API)
• Fast
• Scalable
• Human Involvement
MatchIT Philosophy
Best Matching tool already exists!
What is meant by “ID”?
- “PLEASE PRESENT ID”
- NY, NJ, ID
- SUPEREGO, EGO, ID
MatchIT 1.1
- MatchIT is a semantic and lexical matching tool.
- Session Outline:
- Import and process schemas
- Perform lexical matching
- Create and manage a semantic vocabulary
- Perform semantic matching
- Demonstrate 3rd-party integration with a data integration tool (MetaMatrix)
Import & Process Schemas
Revelytix Models are RDF/OWL
• Flexible model architecture
• Extensible
• Interoperable
Current Importers:
• JDBC
• XML Schema
• MetaMatrix XMI Models
Importer Demo
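As a rough illustration of what an XML Schema importer might do, the sketch below reads named elements from an XSD and emits them as simple subject/predicate/object triples. It is not MatchIT's importer: the function name `import_xsd`, the `ex:` predicates, and the sample schema are all assumptions made for this example, and real RDF/OWL output would need a proper RDF library.

```python
import xml.etree.ElementTree as ET

# Standard XML Schema namespace, in ElementTree's {uri}tag notation
XSD_NS = "{http://www.w3.org/2001/XMLSchema}"


def import_xsd(xsd_text: str) -> list[tuple[str, str, str]]:
    """Return (subject, predicate, object) triples for named elements."""
    root = ET.fromstring(xsd_text)
    triples = []
    for element in root.iter(XSD_NS + "element"):
        name = element.get("name")
        if name is None:  # skip references such as <xs:element ref="..."/>
            continue
        # "ex:" terms are hypothetical placeholders, not a real ontology
        triples.append((name, "rdf:type", "ex:SchemaElement"))
        declared_type = element.get("type")
        if declared_type:
            triples.append((name, "ex:declaredType", declared_type))
    return triples


# Hypothetical HR schema fragment used only for this example
SAMPLE_XSD = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Employee">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="RESIDENCE" type="xs:string"/>
        <xs:element name="PREV_RESIDENCE" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

triples = import_xsd(SAMPLE_XSD)
for triple in triples:
    print(triple)
```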
Lexical Matching
Uses lexical distance measures to determine lexical similarity.
• Fastest matching technique
• Requires no work other than importing schemas
• Often yields interesting results
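One common lexical distance measure is Levenshtein edit distance. The slides do not say which measures MatchIT uses, so the sketch below is only an illustration of the general idea; the function names and the normalization into a 0 to 1 score are assumptions.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum insertions, deletions, and substitutions."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def lexical_similarity(a: str, b: str) -> float:
    """Scale edit distance into [0, 1]; 1.0 means identical ignoring case."""
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def rank_candidates(target: str, columns: list[str]) -> list[tuple[str, float]]:
    """Sort candidate column names by similarity to the target, best first."""
    scored = [(c, lexical_similarity(target, c)) for c in columns]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Column names taken from the sales example earlier in the deck
ranked = rank_candidates("CUSTOMERID", ["SSN", "Sale_Amt", "CUST_ID"])
print(ranked[0][0])
```

Here CUST_ID ranks first, while SSN scores low even though it may also identify a customer. That gap is what semantic matching is meant to close.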
Lexical Matching Demo
Create Vocabulary from Schemas
A Vocabulary is
• A set of symbols
• Occurrences of those symbols in your schemas
• Binding of each symbol to one or more semantic concepts
• Created by MatchIT from schemas using tokenization algorithms.
• Reusable
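The three parts of a Vocabulary listed above (symbols, occurrences, concept bindings) can be pictured as a small data structure. This is a minimal sketch, not MatchIT's model: the class, its method names, and the delimiter-only tokenization are assumptions for illustration.

```python
import re
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Vocabulary:
    # symbol -> schema elements in which the symbol occurs
    occurrences: dict = field(default_factory=lambda: defaultdict(set))
    # symbol -> semantic concepts the symbol is bound to
    bindings: dict = field(default_factory=lambda: defaultdict(set))

    def add_element(self, element_name: str) -> None:
        """Record every symbol found in a schema element name."""
        # Simplistic tokenization: split on non-alphanumeric delimiters only
        for token in re.split(r"[^A-Za-z0-9]+", element_name):
            if token:
                self.occurrences[token.lower()].add(element_name)

    def bind(self, symbol: str, concept: str) -> None:
        """Bind a symbol to one more semantic concept."""
        self.bindings[symbol.lower()].add(concept)

    def symbols(self) -> set:
        return set(self.occurrences)


vocab = Vocabulary()
for name in ["CUST_ID", "Sale_Amt", "Total_Sale"]:
    vocab.add_element(name)

# A symbol may carry more than one meaning, as with "ID"
vocab.bind("ID", "identifier")
vocab.bind("ID", "Idaho")

print(sorted(vocab.symbols()))
print(sorted(vocab.occurrences["sale"]))
```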
Tokenization Algorithms
Different schemas require different tokenization techniques.
Tokenization algorithms determine how symbols are extracted from schemas:
• Capitalization
• Delimiters
• English Language
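The three techniques above might look like the following in code. These are illustrative sketches only; the function names, the regular expressions, and the tiny word list are assumptions, and the English-language tokenizer is a simple greedy longest-match stand-in for whatever MatchIT does.

```python
import re


def tokenize_delimiters(name: str) -> list[str]:
    """Split on underscores, hyphens, dots, and whitespace."""
    return [t for t in re.split(r"[_\-.\s]+", name) if t]


def tokenize_capitalization(name: str) -> list[str]:
    """Split camelCase or PascalCase, keeping acronym runs together."""
    return re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", name)


def tokenize_english(name: str, words: set[str]) -> list[str]:
    """Greedy longest-match segmentation against a word list."""
    lowered = name.lower()
    tokens, start = [], 0
    while start < len(lowered):
        for end in range(len(lowered), start, -1):
            if lowered[start:end] in words:
                tokens.append(lowered[start:end])
                start = end
                break
        else:
            # No dictionary word starts here; emit one character and move on
            tokens.append(lowered[start])
            start += 1
    return tokens


print(tokenize_delimiters("PREV_RESIDENCE"))
print(tokenize_capitalization("GenderCode"))
print(tokenize_english("TOTALSALE", {"total", "sale"}))
```

No single technique suits every schema: PREV_RESIDENCE needs the delimiter split, GenderCode needs the capitalization split, and an all-caps name with no delimiters needs a word list.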
Vocabulary Demo
Matching Techniques
MatchIT currently uses two types of matching techniques:
• Lexical Matching
Attempts to determine the similarity of two symbols based on the lexical distance between them.
• Semantic Matching
Attempts to determine the similarity of two concepts based on the ontological distance between them within a semantic knowledge base.
Parts Supplier Schema (as seen by a person)
Parts Supplier Schema (as seen by a computer)
Semantic Matching
How semantically similar are two concepts?
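Concept hierarchy ("is a" relationships):
• car is a motor vehicle
• truck is a motor vehicle
• motor vehicle is a self-propelled vehicle, which is a wheeled vehicle, which is a vehicle
• airplane is a heavier-than-air craft, which is an aircraft, which is a craft, which is a vehicle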
Car and truck are very similar
Car and airplane are less similar
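One simple way to quantify this is to count the "is a" edges on the path between two concepts through their nearest common ancestor. The sketch below is an assumption about how such a measure could work, not MatchIT's algorithm; the edge table is one reading of the hierarchy on this slide.

```python
# Child -> parent ("is a") edges, as read from the hierarchy on this slide
IS_A = {
    "car": "motor vehicle",
    "truck": "motor vehicle",
    "motor vehicle": "self-propelled vehicle",
    "self-propelled vehicle": "wheeled vehicle",
    "wheeled vehicle": "vehicle",
    "airplane": "heavier-than-air craft",
    "heavier-than-air craft": "aircraft",
    "aircraft": "craft",
    "craft": "vehicle",
}


def ancestors(concept: str) -> list[str]:
    """Return the chain from concept up to the root, inclusive."""
    chain = [concept]
    while chain[-1] in IS_A:
        chain.append(IS_A[chain[-1]])
    return chain


def ontological_distance(a: str, b: str) -> int:
    """Number of is-a edges on the shortest path between a and b."""
    depth_b = {c: i for i, c in enumerate(ancestors(b))}
    for steps_a, c in enumerate(ancestors(a)):
        if c in depth_b:
            # First shared ancestor: edges climbed from a plus edges from b
            return steps_a + depth_b[c]
    raise ValueError("no common ancestor")


print(ontological_distance("car", "truck"))     # 2
print(ontological_distance("car", "airplane"))  # 8
```

A smaller distance means greater similarity: car and truck meet at "motor vehicle" after one step each, while car and airplane only meet at "vehicle".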
Semantic Matching
Uses knowledge base distance measures to determine semantic similarity.
• Presents ranked candidate matches
• Based on semantics captured in Vocabularies
• The only way to effectively find relationships between lexically dissimilar symbols:
GenderCode SexCode
Provider Supplier
Amount Quantity
Semantic Matching Demo
3rd Party Integration
MatchIT Integration
• MatchIT Java API
• Stand-alone application
• Embeddable application (as Eclipse plug-ins).
• Hides unapproved matches
• Useful for various 3rd Party applications:
- Data Integration
- Data Discovery
- Ontology Mediation
- Search
- Metadata Management
- Data Cleansing
MetaMatrix Demo
Questions?
MatchIT 30-day trial available at http://www.revelytix.com
Michael Schidlowsky