1
Ontology Enabled Data Discovery and Integration
Kai Lin, San Diego Supercomputer Center
University of California, San Diego
A. K. Sinha, Z. Malik, A. Rezgui, A. Dalton, Virginia Tech
2
Motivations
• A better way to discover and understand datasets
Use the knowledge in ontologies to find datasets
• A better way to query datasets
Query through ontologies without knowing the schemas
• A better way to integrate multiple datasets
Integrate multiple datasets on-the-fly if they are mapped to ontologies
3
What Is an Ontology?
"A formal, explicit specification of a shared conceptualization"
• formal: machine-readability
• explicit: unambiguous definition of all concepts, attributes and relationships
• shared: commonly accepted understanding
• conceptualization: conceptual model of a domain
4
Why Represent Domain Knowledge as an Ontology
• Separate the domain knowledge module from the operational module
• Configurable knowledge module
• Share and reuse domain knowledge
• Analyze domain knowledge
5
What’s Inside an Ontology?
• Concepts: classes + class hierarchy
  – instances
• Properties: often also called "roles" or "slots"
  – labeled instance-value pairs
• Axioms/relations:
  – relations between classes (disjoint, covers)
  – inheritance (multiple? defaults?)
  – restrictions on slots (type, cardinality)
  – characteristics of slots (symmetric, transitive, …)
• Reasoning tasks:
  – Classification: Which classes does an instance belong to?
  – Subsumption: Does a class subsume another one?
  – Consistency checking: Is there a contradiction in my axioms/instances?
6
Resource Description Framework (RDF)
XML Schema is not enough for semantics:
• it only describes grammar, i.e. the syntax of single documents
• it cannot express inheritance for concepts
• it has no means to express complex integrity constraints in an unambiguous way
Resource Description Framework (RDF): an infrastructure for the encoding, exchange and reuse of structured metadata
<document href="page.html"> <author>Peter Morris</author></document>
<author> <firstName>Peter</firstName> <lastName>Morris</lastName> <documents> <uri>page.html</uri> </documents></author>
Both documents encode the statement: the author of 'page.html' is Peter Morris.
What is the "correct" way of expressing it?
7
RDF Idea
RDF is intended to provide a simple way of making statements about resources.
• Resource: an object that is uniquely identified by a URI (Uniform Resource Identifier)
  – Anything can have a URI: an entire Web page, a whole collection of pages (e.g. an entire Web site), or an object that is not directly accessible via the Web, such as a printed book.
• Property: a specific aspect, characteristic, attribute, or relation used to describe a resource; it has a specific meaning and defines its permitted values
  – e.g. Lives-In, CarColor, WorkFor, HasA, IncludedIn, hasAuthor, …
• Statement: a specific resource together with a named property plus the value of that property for that resource. Each RDF statement can be written down as a triple (Subject, Property, Object) or as a graph.
[Figure: a statement as a graph, Resource --property--> Value, where the value may itself be a Resource]
8
A RDF Example
<?xml version="1.0"?> <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:dc="http://purl.org/dc/elements/1.1/"> <rdf:Description rdf:about="http://www.polleres.net/page.html"> <dc:creator> <rdf:Description rdf:about="http://www.polleres.net/peter">
<hasName>Peter Morris</hasName> </rdf:Description> </dc:creator> </rdf:Description></rdf:RDF>
[Figure: RDF graph. http://www.polleres.net/page.html --http://purl.org/dc/elements/1.1/creator--> http://www.polleres.net/peter --hasName--> "Peter Morris"; page.html also has creationDate "April 1, 2004" and http://purl.org/dc/elements/1.1/language "English"]
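Read as data, the graph is just a set of (subject, property, object) triples. The following is a minimal in-memory sketch, not an RDF library API: the Triple record and the objectsOf helper are hypothetical, and hasName and creationDate are kept as bare names because the example does not give them a namespace.

```java
import java.util.List;
import java.util.stream.Collectors;

/** A minimal in-memory triple list; a sketch, not a real RDF library. */
class TripleSketch {
    /** One RDF statement: (subject, property, object). */
    record Triple(String subject, String property, String object) {}

    static final String PAGE = "http://www.polleres.net/page.html";
    static final String PETER = "http://www.polleres.net/peter";
    static final String DC = "http://purl.org/dc/elements/1.1/";

    /** The statements read off the example graph (literals kept as plain strings). */
    static final List<Triple> GRAPH = List.of(
        new Triple(PAGE, DC + "creator", PETER),
        new Triple(PETER, "hasName", "Peter Morris"),
        new Triple(PAGE, "creationDate", "April 1, 2004"),
        new Triple(PAGE, DC + "language", "English"));

    /** Returns every object o such that (subject, property, o) is in the graph. */
    static List<String> objectsOf(String subject, String property) {
        return GRAPH.stream()
            .filter(t -> t.subject().equals(subject) && t.property().equals(property))
            .map(Triple::object)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // "Who created page.html, and what is that resource's name?"
        for (String creator : objectsOf(PAGE, DC + "creator")) {
            System.out.println(objectsOf(creator, "hasName")); // prints [Peter Morris]
        }
    }
}
```

Following the creator link and then hasName is exactly the two-hop path drawn in the graph.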
9
A General RDF Format
<?xml version="1.0"?><Resource-A> <property-A> <Resource-B> <property-B> <Resource-C> <property-C> Value-C </property-C> </Resource-C> </property-B> </Resource-B> </property-A></Resource-A>
Here Resource-B is the value of property-A, and Resource-C is the value of property-B.
Convention:
• A capital letter starts a type (class) name
• A lowercase letter starts a property name
10
RDF Schema (RDFS)
Core classes: rdfs:Resource, rdfs:Literal, rdf:XMLLiteral, rdfs:Class, rdf:Property, rdfs:Datatype, rdfs:Container
Core properties: rdf:type, rdfs:subClassOf, rdfs:subPropertyOf, rdfs:domain, rdfs:range, rdfs:label, rdfs:comment
RDFS is a simple ontology language:
• RDF: triples for making assertions about resources
• RDFS extends RDF with a "schema vocabulary", e.g.:
  – Class, Property
  – type, subClassOf, subPropertyOf
  – range, domain
• Together they represent simple assertions, taxonomy + typing
11
RDFS Example
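[Figure: RDFS example with a legend for resources, classes and properties. HoverVehicle, SeaVehicle, LandVehicle and Vehicle are connected by subClassOf links, type links connect resources to classes, and the properties producedBy and numberOfEngine point to Company and Number]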
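Typing and taxonomy in RDFS pay off through entailment: an instance of a class is also an instance of its superclasses, and a property's domain and range assign types to its subject and object. A minimal sketch of just those three rules, not a real RDFS reasoner. The links HoverVehicle ⊑ SeaVehicle, HoverVehicle ⊑ LandVehicle, SeaVehicle ⊑ Vehicle, LandVehicle ⊑ Vehicle and the domain Vehicle / range Company of producedBy are an assumed reading of the figure, and the instances hover1 and acme are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Sketch of three RDFS entailment rules (subClassOf, domain, range) by fixpoint iteration. */
class RdfsSketch {
    record Stmt(String subject, String property, String object) {}

    // Assumed reading of the example figure (not stated in the slide text).
    static final Map<String, Set<String>> SUBCLASS_OF = Map.of(
        "HoverVehicle", Set.of("SeaVehicle", "LandVehicle"),
        "SeaVehicle", Set.of("Vehicle"),
        "LandVehicle", Set.of("Vehicle"));
    static final Map<String, String> DOMAIN = Map.of("producedBy", "Vehicle");
    static final Map<String, String> RANGE = Map.of("producedBy", "Company");

    /** Infers all rdf:type facts from asserted types and property statements. */
    static Map<String, Set<String>> inferTypes(Map<String, Set<String>> asserted, List<Stmt> statements) {
        Map<String, Set<String>> types = new HashMap<>();
        asserted.forEach((x, cs) -> types.computeIfAbsent(x, k -> new HashSet<>()).addAll(cs));
        for (Stmt st : statements) {
            // domain rule: (s p o) and p domain C  =>  s type C
            String d = DOMAIN.get(st.property());
            if (d != null) types.computeIfAbsent(st.subject(), k -> new HashSet<>()).add(d);
            // range rule: (s p o) and p range C  =>  o type C
            String r = RANGE.get(st.property());
            if (r != null) types.computeIfAbsent(st.object(), k -> new HashSet<>()).add(r);
        }
        // subClassOf rule: x type C and C subClassOf D  =>  x type D, repeated until nothing changes
        boolean changed = true;
        while (changed) {
            changed = false;
            for (Set<String> cs : types.values()) {
                for (String c : new ArrayList<>(cs)) {
                    for (String sup : SUBCLASS_OF.getOrDefault(c, Set.of())) {
                        changed |= cs.add(sup);
                    }
                }
            }
        }
        return types;
    }

    public static void main(String[] args) {
        // Hypothetical data: hover1 is a HoverVehicle produced by acme.
        Map<String, Set<String>> types = inferTypes(
            Map.of("hover1", Set.of("HoverVehicle")),
            List.of(new Stmt("hover1", "producedBy", "acme")));
        System.out.println(new TreeSet<>(types.get("hover1"))); // [HoverVehicle, LandVehicle, SeaVehicle, Vehicle]
        System.out.println(types.get("acme"));                  // [Company]
    }
}
```

Nothing here can express cardinality, inverse or disjointness, which is where RDFS runs out.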
12
Limitations of RDFS
• RDFS is too weak to describe resources in sufficient detail:
  – No localised range and domain constraints
    • Can’t say that the range of hasChild is person when applied to persons and elephant when applied to elephants
  – No existence/cardinality constraints
    • Can’t say that all instances of person have a mother that is also a person, or that persons have exactly 2 parents
  – No transitive, inverse or symmetrical properties
    • Can’t say that isPartOf is a transitive property, that hasPart is the inverse of isPartOf, or that touches is symmetrical
  – No in/equality
    • Can’t say that a class/instance is the same as some other class/instance, or that some classes/instances are definitely disjoint/different
  – No boolean algebra
    • Can’t say that one class is the union, intersection or complement of other classes, etc.
13
OWL Language - Overview
• Three species of OWL:
  – OWL DL stays in the Description Logic fragment
  – OWL Lite is an "easier to implement" subset of OWL DL
  – OWL Full is the union of OWL syntax and RDF
• OWL DL is based on Description Logic
  – In fact it is equivalent to the SHOIN(Dn) DL
• OWL DL benefits from many years of DL research:
  – Well-defined semantics
  – Formal properties well understood (complexity, decidability)
  – Known reasoning algorithms
  – Implemented systems (highly optimised)
• OWL Full has all that plus all the possibilities of RDF/RDFS, which destroy decidability
[Figure: nested layers, Lite inside DL inside Full]
14
OWL Layers (Lite, DL, Full)
[Figure: OWL layers built on RDF Schema: Lite inside DL inside Full]
• OWL Full
  – Allows meta-classes etc.
• OWL DL
  – Negation (disjointWith, complementOf)
  – unionOf
  – Full cardinality
  – Enumerated types (oneOf)
• OWL Lite
  – (sub)classes, individuals
  – (sub)properties, domain, range
  – intersection
  – (in)equality
  – cardinality 0/1
  – datatypes
  – inverse, transitive, symmetric
  – hasValue
  – someValuesFrom
  – allValuesFrom
15
Ontology Inconsistency
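• You may define classes where no individual can fulfill the definition. Reasoning engines can find such definitions even in big ontologies. In the following example, MadCow is such a class:
  – $\mathit{Cow} \equiv \mathit{Animal} \sqcap \mathit{Vegetarian}$
  – $\mathit{Sheep} \sqsubseteq \mathit{Animal}$
  – $\mathit{Vegetarian} \equiv \forall \mathit{eats}.\neg\mathit{Animal}$
  – $\mathit{MadCow} \equiv \mathit{Cow} \sqcap \exists \mathit{eats}.\mathit{Sheep}$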
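Spelled out by hand, the argument a reasoner makes runs as follows. This assumes the four axioms read $\mathit{Cow} \equiv \mathit{Animal} \sqcap \mathit{Vegetarian}$, $\mathit{Sheep} \sqsubseteq \mathit{Animal}$, $\mathit{Vegetarian} \equiv \forall \mathit{eats}.\neg\mathit{Animal}$ and $\mathit{MadCow} \equiv \mathit{Cow} \sqcap \exists \mathit{eats}.\mathit{Sheep}$. It is a derivation sketch, not the trace of any particular reasoner.

```latex
\begin{align*}
x \in \mathit{MadCow}
  &\;\Rightarrow\; x \in \mathit{Cow} \text{ and } x \in \exists \mathit{eats}.\mathit{Sheep}\\
  &\;\Rightarrow\; \text{there is } y \text{ with } (x,y) \in \mathit{eats},\ y \in \mathit{Sheep} \sqsubseteq \mathit{Animal}\\
x \in \mathit{Cow}
  &\;\Rightarrow\; x \in \mathit{Vegetarian} \equiv \forall \mathit{eats}.\neg\mathit{Animal}
   \;\Rightarrow\; y \in \neg\mathit{Animal}\\
\text{hence } y
  &\in \mathit{Animal} \sqcap \neg\mathit{Animal} \equiv \bot,
  \quad\text{so}\quad \mathit{MadCow} \sqsubseteq \bot .
\end{align*}
```

The ontology itself stays consistent as long as no individual is asserted to be a MadCow. Only the class is unsatisfiable.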
16
Open/Closed World Assumption
• Closed World Assumption: the facts in the ontology completely describe what I know; everything that is not in the ontology is assumed to be false.
• Open World Assumption (used in OWL): there may be facts that the ontology does not describe.
An ontology says: There is a train at 14:00. There is a train at 15:00.
Is there a train at 17:00?
• No, by the Closed World Assumption
• Unknown, by the Open World Assumption
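The train example amounts to one rule about missing facts, shown in this minimal sketch. The class and method names are hypothetical, and the sketch holds only positive facts. A real OWL reasoner can still answer "no" under the open world assumption when a negation is entailed, for example if the ontology also said there are no trains after 16:00.

```java
import java.util.Set;

/** Answers "is there a train at time t?" under the closed and the open world assumption. */
class WorldAssumptionSketch {
    enum Answer { YES, NO, UNKNOWN }

    // The only facts the ontology states: trains at 14:00 and 15:00.
    static final Set<String> TRAINS = Set.of("14:00", "15:00");

    static Answer trainAt(String time, boolean closedWorld) {
        if (TRAINS.contains(time)) {
            return Answer.YES; // stated facts are true under both assumptions
        }
        // CWA: a missing fact is false. OWA: a missing fact is merely not known.
        return closedWorld ? Answer.NO : Answer.UNKNOWN;
    }

    public static void main(String[] args) {
        System.out.println(trainAt("17:00", true));  // NO
        System.out.println(trainAt("17:00", false)); // UNKNOWN
    }
}
```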
17
Resource Discovery in GEON
• A Resource Registration System for Data Providers
  – Register ontologies (domain knowledge)
  – Register datasets with metadata, including data access information
  – Optionally register datasets to ontologies (which is crucial for data integration and smart search)
• A Search Engine for Data Users
  – Metadata based search
  – Spatial coverage based search
  – Temporal coverage based search
  – Concept based search
• Both are available through a public portal on the web
18
GEON Data Registration System
[Figure: the Resource Registration System attaches ADN metadata to SRB, Excel, GeoTIFF and Shapefile resources and stores it in a catalog (general information, ontology annotations, access control; subjects, format, keywords, spatial and temporal coverages; integrated resources, log, resource metadata and resource schemas), which GEON Search uses]
19
Database Registration
[Figure: from the original database, the provider selects the tables and views to register; their table and view definitions form the published database, which applications reach through the GEON JDBC Driver and the GEON Mediator]
20
Write Protection
[Figure: a request UPDATE B sent through the Mediator to a database with tables A, B and C is rejected]
• The mediator only accepts SELECT statements
• It rejects any request other than SELECT
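The slide does not say how the mediator recognizes a SELECT (its architecture includes an SQL parser). The following is a deliberately naive, hypothetical keyword-based guard that shows the accept/reject policy. A production check should rely on a real SQL parser, since this one rejects any statement containing a semicolon inside a string literal and does not handle leading comments or WITH clauses.

```java
import java.util.Locale;

/** Naive SELECT-only guard; a hypothetical sketch, not the GEON mediator's code. */
class SelectOnlyGuard {
    /** Accepts a single statement whose first keyword is SELECT; rejects everything else. */
    static boolean isAllowed(String sql) {
        String s = sql.strip();
        // Allow one optional trailing semicolon...
        if (s.endsWith(";")) {
            s = s.substring(0, s.length() - 1).strip();
        }
        // ...but reject stacked statements such as "SELECT 1; DELETE FROM B".
        if (s.contains(";")) {
            return false;
        }
        String firstWord = s.split("\\s+", 2)[0].toUpperCase(Locale.ROOT);
        return firstWord.equals("SELECT");
    }

    public static void main(String[] args) {
        System.out.println(isAllowed("SELECT * FROM B"));    // true
        System.out.println(isAllowed("UPDATE B SET x = 1")); // false
    }
}
```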
21
Read Protection on Unregistered Tables and Views
[Figure: a query SELECT * FROM A sent through the Mediator fails because, of the database tables A, B and C, only B is registered]
An unregistered table or view is invisible to an end user:
• The data in the table can’t be viewed by a SELECT statement
• The schema can’t be fetched
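The visibility rule can be expressed as a lookup against the set of registered tables. This is a minimal sketch that assumes a SQL parser has already extracted the referenced table names and that table names are case-insensitive. The class and method names are hypothetical.

```java
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

/** Hypothetical sketch: only registered tables and views are visible through the mediator. */
class RegisteredTables {
    private final Set<String> registered;

    RegisteredTables(Set<String> registered) {
        // Assumption: table names are case-insensitive, so store them upper-cased.
        this.registered = registered.stream()
            .map(t -> t.toUpperCase(Locale.ROOT))
            .collect(Collectors.toSet());
    }

    /** True only if every table the query references has been registered. */
    boolean canQuery(Set<String> referencedTables) {
        return referencedTables.stream()
            .allMatch(t -> registered.contains(t.toUpperCase(Locale.ROOT)));
    }

    /** Schema lookups are refused for unregistered tables, so their columns stay hidden. */
    boolean canFetchSchema(String table) {
        return registered.contains(table.toUpperCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        // Tables A, B and C exist in the database, but only B was registered.
        RegisteredTables catalog = new RegisteredTables(Set.of("B"));
        System.out.println(catalog.canQuery(Set.of("A"))); // false
        System.out.println(catalog.canQuery(Set.of("b"))); // true
        System.out.println(catalog.canFetchSchema("A"));   // false
    }
}
```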
22
Item Level Ontological Data Registration for Discovery
The search engine uses ontologies to find more results. For example, it uses the fact that Polygon is a subclass of GeometricalObject_2D during search.
[Figure: ontology classes Rectangle, Circle, Polygon, Surface and GeometricalObject_2D; datasets are associated with ontology classes through properties such as mentions, uses and has instances]
A search for GeometricalObject_2D also returns datasets associated with Polygon.
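Concept-based search widens a query class to include all of its subclasses. In the minimal sketch below, only Polygon ⊑ GeometricalObject_2D comes from the slide. The links Rectangle ⊑ Polygon and Circle ⊑ GeometricalObject_2D, the class Borehole, and the dataset names are hypothetical. A single parent per class is also assumed, although OWL allows multiple inheritance.

```java
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

/** Concept-based search: a query class also matches datasets registered to its subclasses. */
class ConceptSearch {
    // Direct subclass links (child -> parent). Only Polygon -> GeometricalObject_2D is from the slide.
    static final Map<String, String> PARENT = Map.of(
        "Polygon", "GeometricalObject_2D",
        "Circle", "GeometricalObject_2D",
        "Rectangle", "Polygon");

    // Hypothetical registrations: dataset -> ontology class it is associated with.
    static final Map<String, String> REGISTERED = Map.of(
        "dataset1", "Polygon",
        "dataset2", "Rectangle",
        "dataset3", "Borehole");

    /** True if cls equals ancestor or lies below it in the hierarchy. */
    static boolean isSubclassOf(String cls, String ancestor) {
        for (String c = cls; c != null; c = PARENT.get(c)) {
            if (c.equals(ancestor)) return true;
        }
        return false;
    }

    /** All datasets whose registered class is the query concept or one of its subclasses. */
    static SortedSet<String> search(String concept) {
        SortedSet<String> hits = new TreeSet<>();
        REGISTERED.forEach((dataset, cls) -> {
            if (isSubclassOf(cls, concept)) hits.add(dataset);
        });
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(search("GeometricalObject_2D")); // [dataset1, dataset2]
        System.out.println(search("Circle"));               // []
    }
}
```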
23
Data Integration Challenges: Heterogeneities
• Syntactical Heterogeneity
heterogeneous data format
e.g. 02-04-2004 vs. 02/04/04
• Structural Heterogeneity
heterogeneous data models and schemas
e.g. 02-04-2004 is saved as three columns or as one column
• Semantic Heterogeneity
fuzzy metadata, terminology, “hidden” semantics, implicit assumptions
GEON Preferred Solution:
• Datasets are semantically registered first
• Heterogeneities are resolved by registration
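Syntactic and structural heterogeneity in the date example disappear once every source is mapped to one canonical value. This minimal sketch uses java.time. It assumes both text formats put the month first, which the slide does not say (02-04-2004 could also mean 2 April). Semantic heterogeneity is not handled here. GEON resolves it through registration.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

/** Maps dates from heterogeneous sources to one canonical LocalDate. */
class DateNormalizer {
    // Assumption: both sources write the month first.
    private static final DateTimeFormatter DASHED = DateTimeFormatter.ofPattern("MM-dd-yyyy");
    // A two-letter year pattern parses into the range 2000-2099.
    private static final DateTimeFormatter SLASHED = DateTimeFormatter.ofPattern("MM/dd/yy");

    /** Syntactic heterogeneity: one text column, two formats. */
    static LocalDate fromText(String text) {
        return LocalDate.parse(text, text.contains("/") ? SLASHED : DASHED);
    }

    /** Structural heterogeneity: the same date stored as three integer columns. */
    static LocalDate fromColumns(int year, int month, int day) {
        return LocalDate.of(year, month, day);
    }

    public static void main(String[] args) {
        LocalDate a = fromText("02-04-2004");
        LocalDate b = fromText("02/04/04");
        LocalDate c = fromColumns(2004, 2, 4);
        System.out.println(a + " " + a.equals(b) + " " + b.equals(c)); // 2004-02-04 true true
    }
}
```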
24
Database Integration
Integration at three levels
Level 1: Federation Based Integration
• Users need to be knowledgeable about each database
Level 2: View Based Integration
• The intended users are those who want to integrate data for others or make integration results reusable
Level 3: Ontology Based Integration
• The easiest way for end users
25
Level 1: Federation Based Integration
[Figure: backend databases holding tables A to G are federated by the Mediator, and the user queries the tables directly, e.g. SELECT * FROM A, E WHERE ……]
• Use SQL to query the federated database
• Structural and semantic heterogeneity should be resolved by users themselves
26
Level 2: View Based Integration
[Figure: views V and W are defined in the Mediator on top of backend tables A to G, and the user queries the views, e.g. SELECT * FROM V, W WHERE ……]
• Allows defining views on top of the federated databases
• Allows hiding the original backend schemas
• Integration results can be shared and reused
27
Level 3: Ontology Based Integration
• Requires ontology annotations for backend databases
• Uses a simple ontology query language to query the integrated database
• Users don’t need to know the backend schemas and local semantics
[Figure: the user sends an ontology-based query to the Mediator, which answers it from the annotated backend tables A to G]
28
Ontology Enabled Data Integration
• Ontology Enabled Semantic Integration
Challenges for Computer Scientists and Domain Scientists
– Computer Scientists: build an integration system based on the ontological registration of datasets
– Domain Scientists: create domain ontologies
– Data Providers: register datasets to ontologies
[Figure: dataset1 to dataset4 registered to Ontology1, Ontology2 and Ontology3]
29
Ontological Data Registration for Data integration
• Registering a dataset to an ontology for data integration is a procedure to generate a partial model of the ontology from the dataset itself
[Figure: registration maps a dataset to individuals of the ontology]
Not all the constraints in the ontology are necessarily satisfied by the generated individuals.
30
Registering Relational Tables to Ontology Classes
• Associate one or more columns, under an optional SQL condition, with a selected class in the ontology
• Provide a mapping method if no explicit names of individuals should be generated
[Figure: a table with columns Latitude and Longitude and a row (23.5, 47.9)]
Location(23.5, 47.9) is the name of an individual of the class Location; the same name indicates the same location.
[Figure: a table with columns RockSample and GeologicAge, with values such as Jurassic/Triassic and Precambrian, mapped to the ontology class GeologicalAge, whose hierarchy includes Precambrian, Cenozoic and Paleozoic]
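The naming convention makes identity a matter of string equality: rows with equal column values yield the same individual. This minimal sketch follows the Location(23.5, 47.9) form. Rendering of the values is assumed to be already canonical, since "23.50" and "23.5" would otherwise produce different names. The rows are hypothetical.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Generates individual names from one or more column values, as in Location(23.5, 47.9). */
class IndividualNaming {
    /** Builds a name such as Location(23.5, 47.9); equal column values give the same name. */
    static String individualName(String className, List<String> columnValues) {
        return className + "(" + String.join(", ", columnValues) + ")";
    }

    public static void main(String[] args) {
        // Hypothetical rows of (Latitude, Longitude), read from a table as text.
        List<List<String>> rows = List.of(
            List.of("23.5", "47.9"),
            List.of("23.5", "47.9"),   // the same location sampled twice
            List.of("31.0", "-115.2"));
        Set<String> individuals = new LinkedHashSet<>();
        for (List<String> row : rows) {
            individuals.add(individualName("Location", row));
        }
        System.out.println(individuals); // [Location(23.5, 47.9), Location(31.0, -115.2)]
    }
}
```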
31
Registering Tables to Ontology Object Properties
• Associate two entities that are already registered to the domain class and the range class of a selected object property in the ontology
[Figure: a table with columns RockSampleID and PERIOD registered to the object property hasAge, which links Rock to GeologicAge]
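Once both columns are registered to classes, each row yields one object property assertion linking the two individuals. The following minimal sketch covers the hasAge case. The naming of individuals as Rock(id) and GeologicAge(period) borrows the Location(…) convention above and is an assumption, as are the rows and the decision to skip rows without a period.

```java
import java.util.ArrayList;
import java.util.List;

/** Turns rows of (RockSampleID, PERIOD) into hasAge assertions between registered individuals. */
class ObjectPropertyRegistration {
    record Row(String rockSampleId, String period) {}
    record Assertion(String subject, String property, String object) {}

    /** Each row links a Rock individual (domain) to a GeologicAge individual (range). */
    static List<Assertion> hasAgeAssertions(List<Row> rows) {
        List<Assertion> out = new ArrayList<>();
        for (Row r : rows) {
            if (r.period() == null) continue; // no age recorded: no assertion generated
            out.add(new Assertion("Rock(" + r.rockSampleId() + ")", "hasAge",
                                  "GeologicAge(" + r.period() + ")"));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Row> rows = List.of(new Row("10001", "Jurassic"), new Row("10002", null));
        hasAgeAssertions(rows).forEach(System.out::println);
        // prints Assertion[subject=Rock(10001), property=hasAge, object=GeologicAge(Jurassic)]
    }
}
```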
32
ODAL (Ontological Database Annotation Language)
<odal:NamedIndividuals odal:id="RockSample" odal:database="VTDatabase"> <odal:Class odal:resource="http://geon.vt.edu#RockSample" /> <odal:Table>Samples</odal:Table> <odal:Table>RockTexture</odal:Table> <odal:Table>RockGeoChemistry</odal:Table> <odal:Table>ModalData</odal:Table> <odal:Table>MineralChemistry</odal:Table> <odal:Table>Images</odal:Table> <odal:Column>ssID</odal:Column> </odal:NamedIndividuals>
[Figure: a GUI generates ODAL, which is passed to the ODAL processor]
The values in the column ssID of the tables Samples, RockTexture, RockGeoChemistry, ModalData, MineralChemistry and Images represent instances of RockSample.
• Creates a partial model of ontologies from a database
• Independent of any GUI
• Independent of any concrete implementation
• Reusable
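Because ODAL is plain XML, any processor can read a declaration with a standard parser, independent of the GUI that produced it. This minimal sketch uses the JDK's namespace-aware DOM parser. The odal namespace URI is the one declared in the ODAL import example, the xmlns:odal binding is added to the snippet for parsing, and the Declaration record is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Reads an odal:NamedIndividuals declaration with the JDK's DOM parser. */
class OdalReader {
    static final String ODAL_NS = "http://www.sdsc.edu/odal#";

    record Declaration(String id, String database, String classUri, List<String> tables, List<String> columns) {}

    static Declaration read(String xml) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true); // required for the *NS lookups below
            Document doc = f.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Element decl = doc.getDocumentElement();
            Element cls = (Element) decl.getElementsByTagNameNS(ODAL_NS, "Class").item(0);
            return new Declaration(
                decl.getAttributeNS(ODAL_NS, "id"),
                decl.getAttributeNS(ODAL_NS, "database"),
                cls.getAttributeNS(ODAL_NS, "resource"),
                texts(decl.getElementsByTagNameNS(ODAL_NS, "Table")),
                texts(decl.getElementsByTagNameNS(ODAL_NS, "Column")));
        } catch (Exception e) {
            throw new IllegalArgumentException("not a valid ODAL declaration", e);
        }
    }

    private static List<String> texts(NodeList nodes) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            out.add(nodes.item(i).getTextContent().strip());
        }
        return out;
    }

    public static void main(String[] args) {
        // Shortened form of the RockSample declaration, with the namespace binding added.
        String xml = "<odal:NamedIndividuals xmlns:odal=\"http://www.sdsc.edu/odal#\""
            + " odal:id=\"RockSample\" odal:database=\"VTDatabase\">"
            + "<odal:Class odal:resource=\"http://geon.vt.edu#RockSample\"/>"
            + "<odal:Table>Samples</odal:Table><odal:Table>RockTexture</odal:Table>"
            + "<odal:Column>ssID</odal:Column></odal:NamedIndividuals>";
        System.out.println(read(xml));
    }
}
```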
33
ODAL: Import Ontologies
The ontologies used for annotating a database can be imported as follows:
<?xml version="1.0"?> <odal:ODAL xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:owl="http://www.w3.org/2002/07/owl#" xmlns:odal="http://www.sdsc.edu/odal#" ><odal:Ontology> <odal:Imports rdf:resource="http://www.library.org/Book.owl"/> <odal:Imports rdf:resource="http://www.writer.org/Writer.owl"/></odal:Ontology>
……
</odal:ODAL>
34
ODAL: Database Connection Declaration
The target databases to be annotated are declared as follows:
<?xml version="1.0"?> <odal:ODAL xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:owl="http://www.w3.org/2002/07/owl#" xmlns:odal="http://www.sdsc.edu/odal#" >……<odal:Database odal:id="PublicationDatabase"> <odal:DatabaseProductName>Oracle</odal:DatabaseProductName> <odal:DatabaseProductVersion>9.1.21</odal:DatabaseProductVersion> <odal:Host>oracle.sdsc.edu</odal:Host> <odal:Port>3456</odal:Port> <odal:DatabaseName>Publications</odal:DatabaseName></odal:Database>……
</odal:ODAL>
35
ODAL: Simple Named Individuals
<odal:NamedIndividuals odal:id="BookInTableBookPrice" odal:database="PublicationDatabase" > <odal:Class odal:resource="http://www.amazon.com/Book.owl#Book"/> <odal:Schema>Collections</odal:Schema> <odal:Table>book-price</odal:Table> <odal:Column>ISBN</odal:Column></odal:NamedIndividuals>
Suppose the book ontology contains a class Book and the schema Collections contains a table book-price with a column ISBN.
odal:id names the declaration and represents the set of individuals generated by the statement.
The statement says that each value in the column ISBN represents a book individual.
36
ODAL: Named Individuals from Multiple Columns
<odal:NamedIndividuals odal:id="LocationInTableRockSample" > <odal:Class odal:resource="http://www.usgs.org/Space.owl#Location"/> <odal:Schema>California</odal:Schema> <odal:Table>Rock-Sample</odal:Table> <odal:Column>Latitude</odal:Column> <odal:Column>Longitude</odal:Column></odal:NamedIndividuals>
Suppose an ontology contains a class Location and a database table Rock-Sample with two columns Latitude and Longitude.
The statement says that a pair of latitude and longitude gives a location
37
ODAL: Named Individuals with Conditions
<odal:NamedIndividuals odal:id="MaleEmployeeInTableEmployee" > <odal:Class odal:resource="http://www.abc.com/Employee.owl#MaleEmployee"/> <odal:Table>employee</odal:Table> <odal:Column>EmployeeId</odal:Column> <odal:Condition><![CDATA[ Gender='M' ]]></odal:Condition></odal:NamedIndividuals>
<odal:NamedIndividuals odal:id="FemaleEmployeeInTableEmployee" > <odal:Class odal:resource="http://www.abc.com/Employee.owl#FemaleEmployee"/> <odal:Table>employee</odal:Table> <odal:Column>EmployeeId</odal:Column> <odal:Condition><![CDATA[ Gender='F' ]]></odal:Condition></odal:NamedIndividuals>
A condition in an odal:Condition element should be a boolean expression that is valid in the WHERE clause of an SQL query.
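An ODAL processor has to turn each NamedIndividuals declaration into a query that lists the generated individuals. The slides do not show that SQL, so what follows is a hypothetical translation. Each distinct combination of the listed columns names one individual, several tables are combined with UNION (as in the RockSample declaration), and the optional condition becomes the WHERE clause. Identifiers are double-quoted per standard SQL because names like book-price contain a hyphen. Note that quoting makes identifiers case-sensitive in systems such as Oracle and PostgreSQL.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical translation of an odal:NamedIndividuals declaration into one SQL query. */
class NamedIndividualsSql {
    /** Quotes an SQL identifier with double quotes (standard SQL), escaping embedded quotes. */
    static String quote(String identifier) {
        return "\"" + identifier.replace("\"", "\"\"") + "\"";
    }

    /** schema and condition may be null; the condition is inserted verbatim as trusted input. */
    static String toSql(String schema, List<String> tables, List<String> columns, String condition) {
        String cols = columns.stream().map(NamedIndividualsSql::quote).collect(Collectors.joining(", "));
        return tables.stream()
            .map(t -> "SELECT DISTINCT " + cols + " FROM "
                + (schema == null ? "" : quote(schema) + ".") + quote(t)
                + (condition == null ? "" : " WHERE " + condition.strip()))
            .collect(Collectors.joining(" UNION "));
    }

    public static void main(String[] args) {
        // SELECT DISTINCT "ISBN" FROM "Collections"."book-price"
        System.out.println(toSql("Collections", List.of("book-price"), List.of("ISBN"), null));
        // SELECT DISTINCT "EmployeeId" FROM "employee" WHERE Gender='M'
        System.out.println(toSql(null, List.of("employee"), List.of("EmployeeId"), " Gender='M' "));
    }
}
```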
38
ODAL: Data Type Property Declaration
<odal:NamedIndividuals odal:id="PersonInTablePerson" > <odal:Class odal:resource="http://www.foo.org/Person.owl#Person"/> <odal:Table>Person</odal:Table> <odal:Column>ssn</odal:Column></odal:NamedIndividuals>
<odal:OntologyProperty> <odal:DatatypeProperty odal:resource="http://www.foo.org/Person.owl#hasAge"/> <odal:Table>person</odal:Table> <odal:Domain odal:resource="PersonInTablePerson" /> <odal:Range odal:resource="age" /></odal:OntologyProperty>
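[Figure: the table person with columns SSN and age (e.g. 1234-56-7890 and 8); SSN values become Person individuals, and age values become their hasAge values of type double]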
39
Conditions for Joining from Different Resources
• Usually we don’t join individuals across different resources
• A set of datatype properties can be declared as a key for a class in the ontology; joins across multiple resources are based on keys
  – e.g. {hasLatitude, hasLongitude} can be declared as a key of Location: two locations from different resources are the same if they have the same latitude and longitude
[Figure: a Rock table in one resource has RockSampleID 10001; a table in another resource has RockID 10001]
We don’t know whether 10001 represents the same rock in the two resources. By default, we assume it does not.
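The default rule can be pictured as two different identity tests. This minimal sketch, with hypothetical record and method names, treats {latitude, longitude} as a declared key of Location, so locations match across resources. Rock has no declared key, so a local ID such as 10001 identifies a rock only within its own resource. Exact floating-point equality is used for simplicity, and real coordinates may need rounding or a tolerance.

```java
/** Cross-resource identity: only declared key properties decide whether two individuals are the same. */
class KeyBasedJoin {
    /** Location individual; {lat, lon} is its declared key. */
    record Location(String source, double lat, double lon) {
        /** Same location iff the key values match, whatever resource the row came from. */
        boolean sameAs(Location other) {
            return Double.compare(lat, other.lat) == 0 && Double.compare(lon, other.lon) == 0;
        }
    }

    /** Rock individual with no declared key: its ID is meaningful only inside its resource. */
    record Rock(String source, String localId) {
        boolean sameAs(Rock other) {
            return source.equals(other.source) && localId.equals(other.localId);
        }
    }

    public static void main(String[] args) {
        Location a = new Location("resourceA", 23.5, 47.9);
        Location b = new Location("resourceB", 23.5, 47.9);
        System.out.println(a.sameAs(b)); // true: joined on the key
        Rock r1 = new Rock("resourceA", "10001");
        Rock r2 = new Rock("resourceB", "10001");
        System.out.println(r1.sameAs(r2)); // false: same ID, but no key says they match
    }
}
```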
40
SOQL (Simple Ontology Query Language)
Query single or integrated resources
• via ontologies (i.e., high-level logical views)
• independent of any physical representation (i.e., schemas)
[Figure: ontology fragment: RockSample has the property location, a Location with lat and long, and the property hasSiO2, a ValueWithUnit with a float value and a string unit]
SELECT X.location.* FROM RockSample X WHERE X.location.lat > 60 AND X.location.long > 100 AND X.hasSiO2.value < 30 AND X.hasSiO2.unit = 'weightPercentage'
[Figure: a GUI generates SOQL, which is passed to the SOQL processor]
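The SOQL query is defined over the ontology, not over tables, so its meaning can be stated over plain objects. This minimal sketch evaluates the same filter over in-memory records shaped like the ontology fragment. It is not how the SOQL processor executes queries (that processor rewrites them into SQL). The sample data and the unit string are hypothetical, and the field is named lon because long is a Java keyword.

```java
import java.util.List;
import java.util.stream.Collectors;

/** The meaning of the example SOQL query, evaluated over in-memory objects instead of SQL tables. */
class SoqlSemantics {
    record Location(float lat, float lon) {}
    record ValueWithUnit(float value, String unit) {}
    record RockSample(String id, Location location, ValueWithUnit hasSiO2) {}

    /** X.location for every RockSample X with lat > 60, long > 100 and SiO2 < 30 weight percent. */
    static List<Location> query(List<RockSample> samples) {
        return samples.stream()
            .filter(x -> x.location().lat() > 60 && x.location().lon() > 100)
            .filter(x -> x.hasSiO2().value() < 30 && x.hasSiO2().unit().equals("weightPercentage"))
            .map(RockSample::location)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<RockSample> samples = List.of( // hypothetical data
            new RockSample("s1", new Location(65f, 120f), new ValueWithUnit(25f, "weightPercentage")),
            new RockSample("s2", new Location(65f, 120f), new ValueWithUnit(55f, "weightPercentage")),
            new RockSample("s3", new Location(10f, 120f), new ValueWithUnit(20f, "weightPercentage")));
        System.out.println(query(samples)); // [Location[lat=65.0, lon=120.0]]
    }
}
```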
41
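The Architecture of the GEON Semantic Mediator
[Figure: a portal or application reaches the mediator through the Mediator JDBC Driver or a GUI. The SOQL Processor (SOQL Parser and Semantic Query Rewriter, using an ontology reasoner over OWL and over ODAL annotations prepared by the ODAL Processor) produces spatial SQL against the federated schemas; the SQL Parser, Query Planning, Query Optimization and Query Execution stages (with an internal database) run it against Oracle, DB2, MySQL, SQL Server, PostgreSQL and PostGIS]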
42
Question: Find all seismic stations within 1 mile of railroads.
SOQL query built in the GEON SOQL GUI:
SELECT X.code, X.location.* FROM SeismicStation X, Railroad Y WHERE distance(X.location, Y.geometry) < 1
The SOQL processor rewrites it against the federated schemas (a railroad shapefile and a seismic station table):
SELECT X2.stationcode, X2.lat, X2.lon FROM railroads_of_the_united_states X1, stationdatatable X2 WHERE distance(X1.the_geom, MakePoint(X2.lat, X2.lon)) < 1
The schema mediator splits this into one query per source, plus the remaining distance condition:
SELECT X1.the_geom FROM railroads X1
SELECT X2.stationcode, X2.lat, X2.lon FROM stationdatatable X2 WHERE bounding box condition
distance(X1.the_geom, MakePoint(X2.lat, X2.lon)) < 1
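The split plan pairs a cheap bounding-box filter, which a station source can evaluate, with an exact distance test on the survivors. The following minimal sketch in planar coordinates shows the idea. It is not PostGIS, and distances are in coordinate units, so the slide's "1 mile" on lat/lon data would first require a projection (an assumption outside this sketch). The railroad is modelled as a polyline, and the names and data are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Bounding-box prefilter followed by an exact point-to-polyline distance test (planar sketch). */
class DistanceFilter {
    record Point(double x, double y) {}

    /** Euclidean distance from p to the segment ab. */
    static double distToSegment(Point p, Point a, Point b) {
        double dx = b.x() - a.x(), dy = b.y() - a.y();
        double len2 = dx * dx + dy * dy;
        // Projection parameter of p onto ab, clamped to the segment.
        double t = len2 == 0 ? 0 : ((p.x() - a.x()) * dx + (p.y() - a.y()) * dy) / len2;
        t = Math.max(0, Math.min(1, t));
        return Math.hypot(p.x() - (a.x() + t * dx), p.y() - (a.y() + t * dy));
    }

    static double distToPolyline(Point p, List<Point> line) {
        if (line.size() == 1) return distToSegment(p, line.get(0), line.get(0));
        double best = Double.POSITIVE_INFINITY;
        for (int i = 0; i + 1 < line.size(); i++) {
            best = Math.min(best, distToSegment(p, line.get(i), line.get(i + 1)));
        }
        return best;
    }

    /** Stations strictly closer than maxDist to the railroad. */
    static List<Point> near(List<Point> stations, List<Point> railroad, double maxDist) {
        // Step 1: the railroad's bounding box, grown by maxDist (the "bounding box condition").
        double minX = Double.POSITIVE_INFINITY, minY = Double.POSITIVE_INFINITY;
        double maxX = Double.NEGATIVE_INFINITY, maxY = Double.NEGATIVE_INFINITY;
        for (Point q : railroad) {
            minX = Math.min(minX, q.x()); minY = Math.min(minY, q.y());
            maxX = Math.max(maxX, q.x()); maxY = Math.max(maxY, q.y());
        }
        List<Point> result = new ArrayList<>();
        for (Point s : stations) {
            boolean inBox = s.x() >= minX - maxDist && s.x() <= maxX + maxDist
                         && s.y() >= minY - maxDist && s.y() <= maxY + maxDist;
            // Step 2: the exact test runs only for candidates that passed the cheap box test.
            if (inBox && distToPolyline(s, railroad) < maxDist) result.add(s);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Point> railroad = List.of(new Point(0, 0), new Point(10, 0));
        List<Point> stations = List.of(
            new Point(5, 0.5),    // passes both tests
            new Point(5, 3),      // rejected by the bounding box
            new Point(10.8, 0.8)); // inside the box, but about 1.13 from the line end
        System.out.println(near(stations, railroad, 1.0)); // [Point[x=5.0, y=0.5]]
    }
}
```

The box test never drops a true match, because any point within maxDist of the line lies inside the grown box. It only saves exact distance computations.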
43
Questions?
44
How to Connect to GEON Databases
• Download the GEON JDBC Driver
• Use the following code to create a connection:
import java.sql.Connection;
import java.sql.DriverManager;
Class.forName("org.geongrid.jdbc.driver.Driver"); // load the driver
String url = "jdbc:geon://geon01.sdsc.edu:2532/GEON-63cb404c-6038-11d9-a69f"; // set the mediator URL
Connection conn = DriverManager.getConnection(url, "geonuser", "geongrid"); // open the connection
In the URL, jdbc:geon is the GEON JDBC protocol, geon01.sdsc.edu:2532 gives the host name and port number of the GEON Mediator, and GEON-63cb404c-6038-11d9-a69f is the GEON ID.
Note: the original account information is invisible to end users