joanne-luciano
DESCRIPTION
BioIT 2005 Reported on Project with Siderean Software. http://bit.ly/gsi68E (J Web Semantics Paper,
An RDF Data Model for the Semantic Web
5th Oracle Life Sciences User Group meeting, May 16-17, 2005
Agenda
Introduction – 5 min – Susie Stephens
Semantic Web for Life Sciences – 25 min – Susie Stephens
Oracle support of RDF in RDBMS – 25 min – Souripriya Das
Demo of Siderean’s Seamark Navigation Server – 25 min – Mike DiLascio, David LaVigna & Joanne Luciano
Discussion – 10 min – Susie Stephens
Semantic Web for Life Sciences
Susie Stephens
What is the Semantic Web?
A machine-readable format that is Web compatible
The Semantic Web adds definition tags to information in Web pages
– Enables computers to discover data more effectively
– Allows new associations to form between pieces of information
Resource Description Framework
W3C standard for the common data format
Based on triples (subject–predicate–object)
Everything has a URI
Ontologies used to label the RDF tagged elements
Image Source: W3C
Image Source: W3C
Enterprise Integration Hub
Image Source: W3C
Semantic Web Stack
Image Source: W3C
Pharma Productivity
Source: PhRMA & FDA 2003
Critical Path Initiative
Source: Innovation or Stagnation, FDA Report, March 2004
Ontology Frameworks for Integration
[Figure: ontology graph. Nodes: Gene, mRNA, Protein, Drug, Disease, Treatment, Target model, Intervention point, Localization, Cascade pathway, Bio-process, Microarray experiment. Edge labels: <transcribes>, <translatesTo>, <hasProduct>, <located>, <targets>, <affectedTissue>, <MOA>, <probeFor>, <participatesIn>, <partOf>, <drugInteraction>, <affecting>, <influences>, <profiledBy>, <efficacyMarkerFor>]
Biological Pathways
Image Source: Cytoscape
Beyond the “Dead” Graphical Model
Image Source: KEGG
Assigning Trust Values to Data
Image Source: SWANS
Inferencing
If Gene G is implicated in Disease D, and its Protein Product P is a functional component of only Pathway P2, then Disease D directly perturbs Pathway P2
<rdf:Description>
  <log:is rdf:parseType='Quote'>
    <rdf:Description rdf:about='variable#Gene_G'>
      <hasProduct rdf:resource='variable#Protein_P'/>
      <isImplicatedIn rdf:resource='variable#Disease_D'/>
    </rdf:Description>
    <rdf:Description rdf:about='variable#Protein_P'>
      <inPathway rdf:resource='variable#Pathway_P2'/>
    </rdf:Description>
  </log:is>
  <log:implies rdf:parseType='Quote'>
    <rdf:Description rdf:about='variable#Disease_D'>
      <D_perturbs rdf:resource='variable#Pathway_P2'/>
    </rdf:Description>
  </log:implies>
</rdf:Description>
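As a rough illustration of the rule on this slide, the Python sketch below applies it to a few in-memory triples. The predicate names come from the slide; the sample data (Gene_H, Protein_Q, Disease_E, Pathway_P1) and the helper names `objects` and `infer_perturbs` are hypothetical, and this is not how a log:implies rule engine would actually evaluate the rule.

```python
# Hypothetical sample data; predicate names follow the slide.
triples = {
    ("Gene_G", "hasProduct", "Protein_P"),
    ("Gene_G", "isImplicatedIn", "Disease_D"),
    ("Protein_P", "inPathway", "Pathway_P2"),
    ("Gene_H", "hasProduct", "Protein_Q"),
    ("Gene_H", "isImplicatedIn", "Disease_E"),
    ("Protein_Q", "inPathway", "Pathway_P1"),
    ("Protein_Q", "inPathway", "Pathway_P2"),
}


def objects(graph, subject, predicate):
    """Return all objects o such that (subject, predicate, o) is in graph."""
    return {o for s, p, o in graph if s == subject and p == predicate}


def infer_perturbs(graph):
    """Derive (disease, D_perturbs, pathway) when the gene's protein
    product takes part in exactly one pathway."""
    inferred = set()
    genes = {s for s, p, _ in graph if p == "hasProduct"}
    for gene in genes:
        for protein in objects(graph, gene, "hasProduct"):
            pathways = objects(graph, protein, "inPathway")
            if len(pathways) != 1:
                continue  # the "only Pathway P2" condition fails
            (pathway,) = pathways
            for disease in objects(graph, gene, "isImplicatedIn"):
                inferred.add((disease, "D_perturbs", pathway))
    return inferred


print(infer_perturbs(triples))
```

Gene_H yields nothing because its product sits in two pathways, which is the case the word "only" in the rule excludes.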
Why Semantic Web for Life Sciences?
Heterogeneous data integration using explicit semantics
Expressing well-defined and rich models of biological systems
Annotating findings and interpretations formally and sharing them with other scientists
Embedding models and semantics within papers
Applying logic to infer additional insights and to propose and/or capture new hypotheses
Questions & Answers
RDF Support in Oracle RDBMS
Souripriya Das, Ph.D.
Consultant Member of Technical Staff
Oracle New England Development Center
Overview
Three types of database objects
– Model: RDF graph consisting of a set of triples
– Rulebase: set of (user-defined) rules
– Rule Index: entailed RDF graph
We discuss the following aspects for each type of object
– DDL
– DML
– Views
– Security
RDF Query (with Inference)
RDF Models
Model: Overview
Each RDF Model (graph) consists of a set of triples
A triple (statement) consists of three components
– Subject: URI or blank node
– Predicate: URI
– Object: URI or literal or blank node
A statement itself can be a resource (allowing nested graphs)
Model: Example
Family:
(:John :brotherOf :Mary)
(:John :age “16”^^xsd:integer)
(:Mary :parentOf :Matt)
(:John :name “John”)
(:Mary :name “Mary”)
Reification:
(:John :thinks _:S1)
(_:S1 rdf:subject :Sue)
(_:S1 rdf:predicate :livesIn)
(_:S1 rdf:object “NYC”)
[Figure: graph of the example. Edges: :John brotherOf :Mary; :Mary parentOf :Matt; :John age 16; :John thinks the statement (:Sue livesIn NYC)]
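To make the reification part of the example concrete, here is a minimal Python sketch that reassembles the statement a node such as _:S1 stands for from its rdf:subject, rdf:predicate and rdf:object triples. The tuple representation and the function name `resolve_reified` are assumptions made for illustration, not part of the Oracle API.

```python
# Reification triples from the slide, written as plain Python tuples.
triples = [
    (":John", ":thinks", "_:S1"),
    ("_:S1", "rdf:subject", ":Sue"),
    ("_:S1", "rdf:predicate", ":livesIn"),
    ("_:S1", "rdf:object", '"NYC"'),
]


def resolve_reified(graph, node):
    """Rebuild the (subject, predicate, object) statement that a
    reification node stands for, or return None if it is incomplete."""
    parts = {p: o for s, p, o in graph if s == node}
    try:
        return (parts["rdf:subject"], parts["rdf:predicate"], parts["rdf:object"])
    except KeyError:
        return None


# Print every triple whose object is itself a (reified) statement.
for s, p, o in triples:
    statement = resolve_reified(triples, o)
    if statement is not None:
        print(s, p, statement)
```

Only the :thinks triple is printed, because _:S1 is the only object that carries all three reification properties.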
RDF Query
SDO_RDF_MATCH Table Function
Arguments
– Graph pattern: a sequence of triple patterns; triple patterns typically use variables
– RDF Data set: a set of models
– Filter
– Aliases
… FROM TABLE(SDO_RDF_MATCH(
  '(?x :brotherOf ?y) (?y :parentOf ?z)',
  SDO_RDF_Models('family'),
  …
)) t …
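The idea of a graph pattern as a sequence of triple patterns with shared variables can be sketched in a few lines of Python. This is only an in-memory illustration of the matching semantics, assuming triples stored as tuples; the function name `match_pattern` is hypothetical and says nothing about how SDO_RDF_MATCH is implemented.

```python
def match_pattern(graph, patterns, binding=None):
    """Yield variable bindings (dicts) that satisfy every triple pattern.
    Terms starting with '?' are variables; all other terms must match exactly."""
    binding = binding or {}
    if not patterns:
        yield dict(binding)
        return
    first, rest = patterns[0], patterns[1:]
    for triple in graph:
        candidate = dict(binding)
        ok = True
        for term, value in zip(first, triple):
            if term.startswith("?"):
                # Bind the variable, or check it against an earlier binding.
                if candidate.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok:
            yield from match_pattern(graph, rest, candidate)


family = [
    (":John", ":brotherOf", ":Mary"),
    (":Mary", ":parentOf", ":Matt"),
    (":John", ":name", '"John"'),
]

pattern = [("?x", ":brotherOf", "?y"), ("?y", ":parentOf", "?z")]
rows = list(match_pattern(family, pattern))
print(rows)
```

Each result row is one consistent assignment of the variables, which corresponds to one row returned by the table function.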
SDO_RDF_MATCH: return
Columns (of type VARCHAR2) in each returned row:
For each variable ?x in Graph Pattern
– x
– x$rdfVTYP: URI, Literal, Blank node
– x$rdfLTYP: specific literal type (e.g., xsd:integer)
– x$rdfCLOB: contains actual value, if ?x matches a CLOB value
– x$rdfLANG: language tag, if any (e.g., “en-us”)
If no variable in Graph Pattern
– A dummy column
SDO_RDF_MATCH: matching
Matching multiple representations
The same point in value space may have multiple representations
– “10”^^xsd:integer
– “10”^^xsd:positiveInteger
– “010”^^xsd:integer
– “000010”^^xsd:integer
SDO_RDF_MATCH automatically resolves these
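One way to picture "multiple representations of the same point in value space" is to map each typed literal to a canonical key before comparing. The Python sketch below does this for the four literals on the slide, using the standard lowercase XSD names. The set `INTEGER_TYPES` and the function `canonical` are assumptions for illustration and do not describe Oracle's internal canonicalization.

```python
# Datatypes treated here as integer-valued; a small illustrative subset.
INTEGER_TYPES = {"xsd:integer", "xsd:positiveInteger"}


def canonical(lexical, datatype):
    """Map a typed literal to a key identifying its point in value space."""
    if datatype in INTEGER_TYPES:
        # Leading zeros and the specific integer subtype do not change the value.
        return ("integer", int(lexical))
    return (datatype, lexical)


literals = [
    ("10", "xsd:integer"),
    ("10", "xsd:positiveInteger"),
    ("010", "xsd:integer"),
    ("000010", "xsd:integer"),
]
keys = {canonical(lex, dt) for lex, dt in literals}
print(keys)
```

All four literals collapse to a single key, so a match on any one of them would find the others.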
RDF Query: Example
Find salary and hiredate of all the uncles
SELECT emp.name, emp.salary, emp.hiredate
FROM emp,
  TABLE(SDO_RDF_MATCH(
    '(?x :brotherOf ?y) (?y :parentOf ?z) (?x :name ?name)',
    SDO_RDF_Models('family'),
    …)) t
WHERE emp.name=t.name;
Use of SDO_RDF_MATCH allows embedding a graph query in a SQL query
RDF Query: Example 2
Find pairs of persons residing at the same address where the first person rents a truck and the second person buys a fertilizer
SELECT t3.x name1, t3.y name2
FROM AddrTable t1, AddrTable t2,
  TABLE(SDO_RDF_MATCH(
    '(?x :rents ?a) (?a rdf:type :Truck) (?y :buys ?b) (?b rdf:type :Fertilizer)',
    SDO_RDF_Models('Activities'),
    …)) t3
WHERE t1.name=t3.x and t2.name=t3.y and
  t1.addr=t2.addr;
RDF Rulebases
Rulebase: Overview
Each RDF rulebase consists of a set of rules
Each rule consists of
– Antecedent: graph-pattern
– Filter condition (optional)
– Consequent: graph-pattern
One or more rulebases may be used with relevant RDF models (graphs) to obtain entailed graphs
Rulebase: Example
Rules in a rulebase family_rb:
Antecedent: '(?x :brotherOf ?y) (?y :parentOf ?z)'
Filter: NULL
Consequent: '(?x :uncleOf ?z)'

Antecedent: '(?x :age ?a)'
Filter: 'a >= 65'
Consequent: '(?x :ageGroup "Senior")'

Antecedent: '(?x :parentOf ?y) (?y :parentOf ?z)'
Filter: NULL
Consequent: '(?x :grandParentOf ?z)'
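The antecedent / filter / consequent structure can be sketched in Python as follows. To stay short, this sketch handles only rules whose antecedent is a single triple pattern (such as the ageGroup rule); rules with several patterns, like the uncleOf rule, would additionally need a join over shared variables. The `Rule` type, `apply_rule`, and the sample person :Ann are hypothetical, and ages are stored as plain integers rather than typed literals.

```python
from typing import Callable, NamedTuple, Optional


class Rule(NamedTuple):
    antecedent: tuple                          # one triple pattern
    filter: Optional[Callable[[dict], bool]]   # None mirrors "Filter: NULL"
    consequent: tuple                          # triple pattern to instantiate


def apply_rule(graph, rule):
    """Return the triples entailed by a single-pattern rule."""
    inferred = set()
    for triple in graph:
        binding = {}
        matched = all(
            (binding.setdefault(term, value) == value)
            if term.startswith("?") else term == value
            for term, value in zip(rule.antecedent, triple)
        )
        if matched and (rule.filter is None or rule.filter(binding)):
            # Substitute bound variables into the consequent pattern.
            inferred.add(tuple(binding.get(t, t) for t in rule.consequent))
    return inferred


senior_rule = Rule(
    antecedent=("?x", ":age", "?a"),
    filter=lambda b: b["?a"] >= 65,
    consequent=("?x", ":ageGroup", "Senior"),
)

people = {(":John", ":age", 16), (":Ann", ":age", 70)}
print(apply_rule(people, senior_rule))
```

The filter runs on the bindings produced by the antecedent, so :John matches the pattern but is rejected by the age condition.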
RDF Rule Indexes
Rule Index: Overview
A rule index represents an entailed graph
A rule index is created on an RDF dataset (consisting of a set of RDF models and a set of RDF rulebases)
Rule Index: Example
A rule index may be created on a dataset consisting of
– family RDF data, and
– family_rb rulebase (shown earlier)
The rule index will contain inferred triples showing uncleOf and ageGroup information
RDF Query with Inference
SDO_RDF_MATCH with Rulebases
Arguments
– Graph pattern: a sequence of triples (with variables)
– RDF Data set: a set of models and a set of rulebases
– Filter
– Aliases
… FROM TABLE(SDO_RDF_MATCH(
  '(?x :uncleOf ?y)',
  SDO_RDF_Models('family'),
  SDO_RDF_Rulebases('rdfs', 'family_rb')
  …
)) t …
RDF Query w/ Inference: Example
Find salary and hiredate of all the uncles
SELECT emp.name, emp.salary, emp.hiredate
FROM emp,
  TABLE(SDO_RDF_MATCH(
    '(?x :uncleOf ?y) (?x :name ?name)',
    SDO_RDF_Models('family'),
    SDO_RDF_Rulebases('rdfs', 'family_rb'),
    …)) t
WHERE emp.name=t.name;
RDF Query w/ Inference: Example 2
Find pairs of persons residing at the same address where the first person rents a truck and the second person buys a fertilizer
SELECT t3.x name1, t3.y name2
FROM AddrTable t1, AddrTable t2,
  TABLE(SDO_RDF_MATCH(
    '(?x :rents ?a) (?a rdf:type :Truck) (?y :buys ?b) (?b rdf:type :Fertilizer)',
    SDO_RDF_Models('Activities'),
    SDO_RDF_Rulebases('rdfs'),
    …)) t3
WHERE t1.name=t3.x and t2.name=t3.y and
  t1.addr=t2.addr;
RDF Models
Model: DDL
Procedures provided as part of the API may be used to
– Create a model
– Drop a model
When a user creates a model, a database view gets created automatically
– rdfm_family
A model corresponds to a column of type SDO_RDF_TRIPLE_S in a base table
Each model has exactly one base table associated with it
Model: DDL Creating a Model
Create an Application Table
CREATE TABLE family_table (
  id NUMBER,
  family_triple SDO_RDF_TRIPLE_S);
Create a Model
EXEC SDO_RDF.CREATE_RDF_MODEL('family', 'family_table', 'family_triple');
Automatically creates the following database view
rdfm_family (…)
Loading RDF Data into Oracle
Java API provided to load NTriple into NDM
Sample XSLs provided
– To convert RDF to NTriple
– To convert RDF to INSERT statements
Model: DML
SQL DML commands may be used on a base table to effect DML (i.e., triple insert, delete, and update) on the corresponding model
Insert Triples
INSERT INTO family_table VALUES (1,
  SDO_RDF_TRIPLE_S('family',
    '<http://example.org/family/John>',
    '<http://example.org/family/brotherOf>',
    '<http://example.org/family/Mary>'));
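The deck mentions sample XSLs that convert RDF into INSERT statements. As a loose stand-in, the Python sketch below turns one N-Triples line into an INSERT statement of the shape shown on this slide. It is an assumption-laden illustration: the function name `to_insert` is hypothetical, only triples made of three URIs are handled, and a real loader would more likely use bind variables than string building.

```python
import re

# Matches a simple N-Triples line whose three terms are all URIs.
TRIPLE_RE = re.compile(r"^\s*(<[^>]+>)\s+(<[^>]+>)\s+(<[^>]+>)\s*\.\s*$")


def to_insert(line, row_id, model="family", table="family_table"):
    """Build an INSERT statement in the shape shown on the slide."""
    match = TRIPLE_RE.match(line)
    if match is None:
        raise ValueError(f"unsupported N-Triples line: {line!r}")
    # Double any single quote so the value is safe inside a SQL string literal.
    s, p, o = (term.replace("'", "''") for term in match.groups())
    return (
        f"INSERT INTO {table} VALUES ({row_id}, "
        f"SDO_RDF_TRIPLE_S('{model}', '{s}', '{p}', '{o}'));"
    )


line = ("<http://example.org/family/John> "
        "<http://example.org/family/brotherOf> "
        "<http://example.org/family/Mary> .")
print(to_insert(line, 1))
```

Lines with literals or blank nodes raise ValueError here rather than being silently mangled.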
Model: Security
The creator of the base table corresponding to a model can grant privileges to other users
To perform DML on a model, a user must have DML privileges for the corresponding base table
The creator of a model can grant QUERY privileges on the corresponding database view to other users
A user can query only those models for which s/he has QUERY privileges on the corresponding database views
Only the creator of a model can drop the model
Model: Views
Database views corresponding to the models
RDF Rulebases
Rulebase: DDL
Procedures provided as part of the API may be used to
– Create a rulebase: create_rulebase('family_rb');
– Drop a rulebase: drop_rulebase('family_rb');
When a user creates a rulebase, a database view gets created automatically
– rdfr_family_rb (rule_name, antecedent, filter, consequent, aliases)
Rulebase: DML
SQL DML commands may be used on the database view corresponding to a target rulebase to insert, delete, and update rules
insert into mdsys.rdfr_family_rb values(
  'uncle_rule',
  '(?x :brotherOf ?y) (?y :parentOf ?z)',
  NULL,
  '(?x :uncleOf ?z)',
  SDO_RDF_Aliases(…));
Rulebase: Security
Creator of a rulebase can grant privileges on the corresponding database view to other users
Performing DML operations requires the invoker to have appropriate privileges on the database view
Only the creator of a rulebase can drop the rulebase
Rulebase: Views
RDF_RULEBASE_INFO
– Contains the list of rulebases
– For each rulebase, contains additional information (such as creator, view name, etc.)
Content of each rulebase is available from the corresponding database view
RDF Rule Indexes
Rule Index: DDL
Procedures provided as part of the API may be used to
– Create a rule index:
  create_rules_index('family_rb_rix_family',
    SDO_RDF_Models('family'),
    SDO_RDF_Rulebases('rdfs', 'family_rb'));
– Drop a rule index:
  drop_rules_index('family_rb_rix_family');
When a user creates a rule index, a database view gets created automatically
– rdfi_family_rb_rix_family (…)
Rule Index: Security
To create a rule index on an RDF dataset (models and rulebases), a user needs to have QUERY privileges on those models and rulebases
Creator of a rule index holds QUERY privilege on the rule index and may grant this privilege to other users
Only the creator of a rule index can drop it
Rule Index: Views
RDF_RULEINDEX_INFO
– Contains the list of rule indexes
– For each rule index, contains additional information (such as creator, status, etc.)
RDF_RULEINDEX_DATASETS
– For every rule index, stores the names of its models and rulebases
Rule Index: Dependencies
Content of a rule index depends upon the content of each element of its dataset
– Any modification to the models or rulebases in its dataset invalidates the rule index
– Dropping a model or rulebase will drop dependent rule indexes automatically.
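The dependency behaviour described here (modification invalidates, drop cascades) can be pictured with a toy catalog in Python. Everything in this sketch is hypothetical: the class `RuleIndexCatalog`, the status strings, and the simplification that models and rulebases share one namespace. It only mirrors the two rules stated on the slide, not Oracle's actual bookkeeping.

```python
class RuleIndexCatalog:
    """Toy catalog tracking which models and rulebases each rule index uses."""

    def __init__(self):
        self.indexes = {}  # index name -> {"deps": set, "status": str}

    def create(self, name, models, rulebases):
        self.indexes[name] = {
            "deps": set(models) | set(rulebases),
            "status": "VALID",
        }

    def modified(self, element):
        """A model or rulebase changed: mark dependent indexes invalid."""
        for info in self.indexes.values():
            if element in info["deps"]:
                info["status"] = "INVALID"

    def dropped(self, element):
        """A model or rulebase was dropped: drop dependent indexes too."""
        self.indexes = {
            name: info for name, info in self.indexes.items()
            if element not in info["deps"]
        }


catalog = RuleIndexCatalog()
catalog.create("family_rb_rix_family", ["family"], ["rdfs", "family_rb"])
catalog.modified("family")
print(catalog.indexes["family_rb_rix_family"]["status"])
catalog.dropped("family_rb")
print(list(catalog.indexes))
```

After the modification the index is still listed but marked invalid; after the drop it is gone from the catalog.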
Summary
RDF Data Model
– Models (Graphs)
– RDF Query using SDO_RDF_MATCH Table Function
RDF Data Model with (user-defined) Rules
– Models (Graphs)
– Rulebases
– Rule Indexes
– RDF Query on entailed RDF graphs
Management (DDL, DML, Security, …)
– Models, Rulebases, and Rule Indexes
RDF Data Model Demo
Demo: Family Schema
Demo: Family Schema 2
Demo: Family Model Data
Demo: Family Model Data (Alt)
Demo: Query without Inference
select m from TABLE(SDO_RDF_MATCH(
  '(?m rdf:type :Male)',
  SDO_RDF_Models('family'),
  null,
  SDO_RDF_Aliases(SDO_RDF_Alias('', 'http://www.example.org/family/')),
  null));
M
--------------------------------------------------------------------------------
http://www.example.org/family/Jack
http://www.example.org/family/Tom
Demo: Query w/ RDFS Inference
select m from TABLE(SDO_RDF_MATCH(
  '(?m rdf:type :Male)',
  SDO_RDF_Models('family'),
  SDO_RDF_Rulebases('RDFS'),
  SDO_RDF_Aliases(SDO_RDF_Alias('', 'http://www.example.org/family/')),
  null));
M
--------------------------------------------------------------------------------
http://www.example.org/family/Jack
http://www.example.org/family/Tom
http://www.example.org/family/John
http://www.example.org/family/Matt
http://www.example.org/family/Sammy
Demo: Family Rulebase
Antecedent: '(?x :parentOf ?y) (?y :parentOf ?z)'
Filter: NULL
Consequent: '(?x :grandParentOf ?z)'
Demo: Query w/ Family and RDFS Inference
select x, y from TABLE(SDO_RDF_MATCH(
  '(?x :grandParentOf ?y) (?x rdf:type :Male)',
  SDO_RDF_Models('family'),
  SDO_RDF_Rulebases('RDFS', 'family_rb'),
  SDO_RDF_Aliases(SDO_RDF_Alias('', 'http://www.example.org/family/')),
  null));
X                                    Y
------------------------------------ ------------------------------------
http://www.example.org/family/John   http://www.example.org/family/Cindy
http://www.example.org/family/John   http://www.example.org/family/Tom
http://www.example.org/family/John   http://www.example.org/family/Jack
http://www.example.org/family/John   http://www.example.org/family/Cathy
Questions & Answers
Demo of Siderean’s Seamark Navigation Server
Mike DiLascio & Joanne Luciano
Agenda
About Siderean Software & Predictive Medicine, Inc.
Introducing Seamark Navigation Server v.3.6
Seamark & Oracle 10g RDF Data Model
Demonstration of Seamark / Oracle 10g integration
Lessons Learned / Q&A
About Siderean Software
Aggregate, organize and navigate information the way users think, to improve analysis and decision making.
Founded in 2001 and based in El Segundo, CA
Venture-backed in 2004
Delivering RDF-centric navigation and analysis capabilities for end users (a.k.a. “the last mile”)
Active W3C member leveraging Semantic Web standards
Demonstrating integrated Seamark navigation layer over Oracle 10g RDF Data Model in collaboration with Predictive Medicine, Inc.
Current solutions
“50,000 results!!! Now what?”
“I give up! Hello? Get me an apple!”
“Why do I get oranges when I’m looking for apples?”
CONTENT PRODUCER: “I just produced three apples last week!”
Knowledge management – breathtakingly expensive
Enterprise search – a brute force approach
IT: “As soon as I fix his, hers stops working.”
Introducing Seamark Navigation Server
“I can see the big picture!”
“No more staring at a blank text box.”
“I can drill down quickly to what I want.”
CONTENT PRODUCER: “I knew we had an apple in here somewhere.”
Seamark – layering organization to deliver pinpoint navigation
IT: “I can take my coffee break now.”
How it works: process
[Figure: metadata types (Term, Event, Person, Place, Text) feeding generated views]
Metadata about data and content is aggregated…
Organized into a unified information architecture…
Analyzed to generate on-demand views…
Providing pinpoint navigation across the data and content
How it works: architecture
[Figure: architecture diagram. Components: Unstructured Content and Data Feeds, Search Engines, Structured Content Sources, Metadata Aggregator, Navigation Metadata, Navigation Web Services, Web Browsers & Portals, User Alerts, Feed Aggregators, User Navigation and User Tagging]
Seamark/Oracle integration architecture: Phase 1
[Figure: architecture diagram. Components: User Navigation and User Tagging, Web Browsers & Portals, Feed Aggregators, Cached Navigation Metadata, Navigation Web Services, User Alerts]
Oracle 10g RDF Data Model for scalable persistence of metadata
Batch RDFMatchQuery issued from Seamark at index time
Seamark/Oracle integration architecture: Phase 2
[Figure: architecture diagram. Components: User Navigation and User Tagging, Dynamic Navigation Metadata, Navigation Web Services, Web Browsers & Portals, User Alerts, Feed Aggregators]
Federated RDFMatchQueries issued from Seamark at query time
Oracle 10g RDF Data Model for scalable persistence of metadata
Seamark Demo: Background & Concepts
Life Sciences demonstration premise
– RDF offers high value during early stage research
Leveraging strengths of Oracle 10g & Seamark v3.6
– Oracle: large datasets / scalability
– Seamark: useful subsets / flexible navigation & insights
Project elapsed time: about one week
– Locating and identifying data sources represented the greatest time element
– Data sources in RDF required minimal integration time
– Non-RDF data sources required transformation and linking values (non-trivial but straightforward)
Seamark Demonstration: Identification of new drug candidates
[Figure: integrated data sources. Files: GO2Keyword.rdf, UniProt.rdf, GO.rdf, Keywords.rdf, Taxonomy.rdf, PubMed.xml, IntAct.rdf, Enzymes.rdf, OMIM.rdf, GO2OMIM.rdf, GO2Enzyme.rdf, KEGG.rdf, GO2UniProt.rdf, ProbeSet.rdf. Linking concepts: Citation, Organism, MIM Id, Keyword, Protein, Enzyme, Gene, Probe, Pathway, Compound]
1. Differentiate different forms of disease
2. Identify patient subgroups
3. Identify top biomarkers
4. Identify function
5. Identify biological and chemical properties and disease associations of biomarker
6. Identify documents
7. Identify role in metabolic pathways
8. Identify compounds that interact
9. Identify and compare function in other organisms
10. Identify any prior art
Live Seamark Life Sciences Demonstration:
Sample Screenshots
Seamark application start page shows integration of OMIM, GO, KEGG, UniProt and NCBI
Select: Probe Set ID: “M18255_cds2_s_at”
Results: 9 Matches on “M18255_cds2_s_at” to the Gene Ontology
Cytoplasm: 1st of 9 Matches (Cellular Location Via Gene Ontology)
Plasma Membrane, …: 2nd of 9 Matches (Cellular Location Via Gene Ontology)
Page Scroll for more results, etc.
Start Page: Optionally search across entire collection based upon keywords from the integrated data sources
Seamark Lessons Learned
RDF offers multiple unconstrained views of data/relationships
– Provides maximum flexibility during early stage research
– Later stages can leverage OWL to constrain known relationships
Data providers
– Timing is right to publish in RDF format
– Cut your customer’s integration costs
– Speed discovery time
Even with one week of effort…
– Proof of Concept demonstrates value of broad & deep integration
– Additional value in extending POC in customer pilot initiatives
Siderean Seamark Conclusion
Getting the precise information we need from today’s data glut is profoundly difficult
Solving this problem requires a solution that works the way you think
Siderean is the world’s first turnkey navigation server for the enterprise and people at large
To arrange a demonstration of Seamark or for more information please contact:
Mike DiLascio
Office: +1 781 652 0339
Mobile: +1 781 354 [email protected]
Thank You!
Siderean Software, Inc.
390 North Sepulveda Blvd., Suite 2070
El Segundo, CA 90245-4475 USA
http://www.siderean.com