Semantic Integration of Heterogeneous Domain-Specific Information: The NIF Case

Amarnath Gupta

Univ. of California San Diego


When is an information system “semantic”?

An Abstract Question

There is no concrete answer …but …

Everybody Loves Google

Can Bing be far behind?

A Domain-specific Search

publications

Hub sources

“Bing is a search engine that finds and organizes the answers you need so you can make faster, more informed decisions” !!!

What happened to “organizes the answers” and helping make more informed decisions? !!!

Neuroscience Information Framework

Recognized entities

“semantic equivalence”

In NIF: Domain = ontological definitions + axioms

Indexing property chains for fast query expansion
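One way to read "indexing property chains" is to saturate the asserted edges under OWL 2 style chain axioms (r1 ∘ r2 ⊑ r) once, at load time, so that query expansion becomes a dictionary lookup instead of a graph traversal. The sketch below assumes that reading; the relation names, the toy edges, and `build_chain_index` are hypothetical and not taken from NIF. It uses naive fixed-point iteration for clarity; a production index would more likely use semi-naive evaluation.

```python
from collections import defaultdict

# Hypothetical asserted edges: (subject, relation, object).
BASE_EDGES = [
    ("CA1", "partOf", "Hippocampus"),
    ("Hippocampus", "partOf", "Forebrain"),
    ("PyramidalCell", "locatedIn", "CA1"),
]

# Hypothetical chain axioms (r1, r2, r): x r1 y and y r2 z entail x r z.
CHAIN_RULES = [
    ("partOf", "partOf", "partOf"),        # partOf is transitive
    ("locatedIn", "partOf", "locatedIn"),  # located in a part => located in the whole
]


def build_chain_index(edges, rules):
    """Saturate `edges` under `rules` and index the result by (subject, relation)."""
    index = defaultdict(set)
    for s, r, o in edges:
        index[(s, r)].add(o)
    changed = True
    while changed:  # repeat until a fixed point: no rule adds a new edge
        changed = False
        for r1, r2, r in rules:
            # Iterate over snapshots because the index grows during the pass.
            for (x, rel), ys in list(index.items()):
                if rel != r1:
                    continue
                for y in list(ys):
                    for z in list(index.get((y, r2), ())):
                        if z not in index[(x, r)]:
                            index[(x, r)].add(z)
                            changed = True
    return index


index = build_chain_index(BASE_EDGES, CHAIN_RULES)
print(sorted(index[("CA1", "partOf")]))               # ['Forebrain', 'Hippocampus']
print(sorted(index[("PyramidalCell", "locatedIn")]))  # ['CA1', 'Forebrain', 'Hippocampus']
```

Here the locatedIn edges to Hippocampus and Forebrain were never asserted; they are paid for in storage instead of at query time, which is the materialized-versus-computed trade-off the talk returns to at the end.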

Schema mapping when possible

Ontological Source Filtering

Bicycle as in bi-cyclic

Bicycle as a therapeutic aid

Ontological Resource Annotation
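The bicycle example hints at how resource annotation supports source filtering: if each source is annotated with the ontology categories it covers, a keyword whose intended sense has been resolved to a category only needs to be sent to the matching sources. The sketch below assumes that setup; the source names, categories, and `filter_sources` are hypothetical, and how the sense itself is resolved (from query context or a user choice) is not shown.

```python
# Hypothetical annotations: source name -> ontology categories it covers.
SOURCE_ANNOTATIONS = {
    "ChemicalCompoundDB": {"ChemicalEntity"},
    "ClinicalTrialsIndex": {"Therapy", "Disease"},
    "RehabDeviceCatalog": {"Therapy", "Device"},
}

# Hypothetical sense inventory: ambiguous keyword -> {sense: ontology category}.
TERM_SENSES = {
    "bicycle": {
        "bicyclic compound": "ChemicalEntity",
        "therapeutic aid": "Therapy",
    },
}


def filter_sources(term, sense):
    """Return the sources annotated with the category of the chosen sense of `term`."""
    category = TERM_SENSES[term][sense]
    return sorted(
        name for name, categories in SOURCE_ANNOTATIONS.items() if category in categories
    )


print(filter_sources("bicycle", "bicyclic compound"))  # ['ChemicalCompoundDB']
print(filter_sources("bicycle", "therapeutic aid"))    # ['ClinicalTrialsIndex', 'RehabDeviceCatalog']
```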

Current Query Architecture

[Figure: architecture diagram. Components shown: Data Ingestion and Transformation; Ontology Ingestion and Transformation; OWL, OBO and RDFS Readers; Ontology Repository; OntoQuest Index Structures; Type-Partitioned Data Store; Semantic & Assn. Catalogs; User Query Parser; Keyword, Relational, Tree and Graph Query Processors; Query Planner; Execution Engine; Data Readers]

• How to store, index and query ontologies efficiently?
• What about different forms of ontology?
• What about multiple inter-mapped ontologies?
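The architecture lists separate OWL, OBO and RDFS readers feeding one ontology repository, which implies that each input form is normalized to a common internal representation. As an illustration only, the sketch below reads the `[Term]` stanzas of an OBO flat file (tag-value lines such as `id:`, `name:`, `synonym:`, `is_a:`) into subject-predicate-object triples. The predicate names ("label", "synonym", "subClassOf"), the `EX:` identifiers, and `read_obo_terms` are hypothetical, and the parser ignores most of the OBO specification (escaped quotes, qualifiers, other tags).

```python
SAMPLE_OBO = """format-version: 1.2

[Term]
id: EX:0001
name: hippocampus
synonym: "hippocampus proper" RELATED []
is_a: EX:0000 ! brain region

[Typedef]
id: part_of
name: part of
"""


def read_obo_terms(text):
    """Turn the [Term] stanzas of an OBO file into (subject, predicate, object) triples."""
    triples = []
    current_id = None
    in_term = False
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("["):  # a new stanza starts; only [Term] is handled
            in_term = line == "[Term]"
            current_id = None
            continue
        if not in_term or ":" not in line:
            continue
        tag, value = line.split(":", 1)
        value = value.strip()
        if tag == "id":
            current_id = value
        elif current_id is None:
            continue  # skip tags that appear before the stanza's id
        elif tag == "name":
            triples.append((current_id, "label", value))
        elif tag == "synonym":
            parts = value.split('"')  # synonym: "text" SCOPE [xrefs]
            if len(parts) >= 3:
                triples.append((current_id, "synonym", parts[1]))
        elif tag == "is_a":
            # is_a: TARGET ! comment -> keep only the target id.
            triples.append((current_id, "subClassOf", value.split("!")[0].strip()))
    return triples


for triple in read_obo_terms(SAMPLE_OBO):
    print(triple)
```

An OWL or RDFS reader would emit the same triple shape, so indexing and query processing downstream would not need to know which form an ontology arrived in.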

Some Performance Numbers

Q1. A single-term ontological query: synonyms(Hippocampus)
Q2. transcription AND gene AND pathway
Q3. (gene) AND (pathway) AND (regulation OR "biological regulation") AND (transcription) AND (recombinant)
Q4. synonyms(zebrafish AND descendants(promoter,subclassOf))
Q5. synonyms(descendants(Hippocampus,partOf))
Q6. synonyms(Hippocampus) AND equivalent(synonyms(memory))
Q7. synonyms(x:descendants(neuron,subclassOf) where x.neurotransmitter='GABA') AND synonyms(gene where gene name='IGF')
Q8. synonyms(x:descendants(neuron,subclassOf) where x.soma.location=descendants(Hippocampus,partOf))
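Several of these queries compose ontology operators, e.g. Q5 = synonyms(descendants(Hippocampus,partOf)). A minimal sketch of how such a composition might be expanded into a plain keyword disjunction, assuming a toy in-memory ontology: the `CHILDREN` and `SYNONYMS` tables, the choice to include the start term in `descendants`, and the OR-query output format are all assumptions rather than NIF's actual operator semantics. Boolean arguments inside an operator (as in Q4) are not handled.

```python
from collections import deque

# Toy ontology (hypothetical): relation -> {term: direct sub-terms}, and term -> synonyms.
CHILDREN = {
    "partOf": {
        "Hippocampus": ["CA1", "Dentate gyrus"],
        "CA1": ["CA1 stratum oriens"],
    },
}
SYNONYMS = {
    "Hippocampus": ["Ammon's horn", "hippocampus proper"],
    "Dentate gyrus": ["fascia dentata"],
}


def descendants(term, relation):
    """Breadth-first list of `term` and every term below it via `relation`."""
    edges = CHILDREN.get(relation, {})
    result, seen, queue = [], {term}, deque([term])
    while queue:
        current = queue.popleft()
        result.append(current)
        for child in edges.get(current, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return result


def synonyms(terms):
    """Each input term followed by its synonyms; accepts one term or a list."""
    if isinstance(terms, str):
        terms = [terms]
    expanded = []
    for term in terms:
        expanded.append(term)
        expanded.extend(SYNONYMS.get(term, []))
    return expanded


def to_keyword_query(terms):
    """Render an expansion as a quoted OR query for a keyword engine."""
    return " OR ".join(f'"{t}"' for t in terms)


q5 = synonyms(descendants("Hippocampus", "partOf"))
print(to_keyword_query(q5))
```

If the partOf closure were precomputed at load time, `descendants` would reduce to a lookup, which is the point of indexing property chains.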

The Abstract Problem

Given:
• n data sources (n of the order of hundreds)
   • Structured (relational)
   • Semi-structured (XML, RDF)
   • Unstructured (text)
   • With specialized data semantics (pathway graphs, social nets, annotated images, …)
• A domain specified by an ontology with known entailment rules (preferably less expressive than full MSO logic)
• A set of mappings from the data to the ontology

Construct:
• An information system such that
   • The ontology is the effective target schema
   • Its query language has an enhanced keyword model (or any associative query language)
   • User queries are transformed into “intentionally equivalent” source queries
   • Results are ranked by relevance
   • The system is responsive, robust and scalable

• Bootstrapping from a seed ontology
• Creating a feature-derived ontology
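Transforming user queries into source queries through the data-to-ontology mappings can be pictured as global-as-view rewriting. A minimal sketch, assuming each mapping names a source, its data model, and the field that holds instances of an ontology concept; `Mapping`, `rewrite`, the source names, and the target syntaxes (SQL, an XPath predicate, a field-scoped keyword query) are hypothetical. The term list would come from ontology expansion, only SQL literals are escaped, and relevance ranking of the merged results is not shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mapping:
    """Hypothetical mapping: an ontology concept is exposed by one source field."""
    concept: str
    source: str
    kind: str    # "relational", "xml", or "text"
    target: str  # table.column, element-path/@attribute, or text field


MAPPINGS = [
    Mapping("Neuron", "CellDB", "relational", "cells.cell_type"),
    Mapping("Neuron", "AtlasXML", "xml", "/atlas/region/cell/@type"),
    Mapping("Neuron", "LiteratureIndex", "text", "abstract"),
]


def rewrite(concept, terms, mappings=MAPPINGS):
    """Turn one ontology-level query (concept + expanded terms) into per-source queries."""
    queries = {}
    for m in mappings:
        if m.concept != concept:
            continue
        if m.kind == "relational":
            table, column = m.target.split(".", 1)
            values = ", ".join("'" + t.replace("'", "''") + "'" for t in terms)
            queries[m.source] = f"SELECT * FROM {table} WHERE {column} IN ({values})"
        elif m.kind == "xml":
            path, attr = m.target.rsplit("/@", 1)
            conds = " or ".join('@{}="{}"'.format(attr, t) for t in terms)
            queries[m.source] = f"{path}[{conds}]"
        else:  # free text: a boolean keyword query scoped to one field
            words = " OR ".join('"{}"'.format(t) for t in terms)
            queries[m.source] = f"{m.target}:({words})"
    return queries


for source, query in rewrite("Neuron", ["Purkinje cell", "pyramidal cell"]).items():
    print(source, "->", query)
```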

A Linked Graph Perspective

We can view the data problem as a “constrained” graph integration exercise where

Every data/knowledge resource can be considered as a graph that is governed by a set of (Description Logic) axioms about its structure and component relationships

Connections between individual resources can be defined either at the instance level or at the concept level

The connections themselves can be defined in terms of asserted or inferred Description Logic statements

The ontology’s role is to provide the bridges that can be considered “general knowledge” that is modularized under a well-formed upper ontology.
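In this view, a cross-resource link is acceptable only if it respects the axioms governing the graphs it connects. The sketch below checks instance-level links against domain/range declarations for a few bridge relations. It reads those declarations as closed-world integrity constraints, which is a common engineering simplification and not Description Logic semantics: under DL entailment a domain axiom would instead let a reasoner infer the subject's type. All identifiers (`AXIOMS`, `TYPES`, `valid_link`, `integrate`, the prefixed instance names) are hypothetical.

```python
# Hypothetical domain/range axioms for bridge relations between resources.
AXIOMS = {
    "expressedIn": ("Gene", "BrainRegion"),
    "releases": ("Neuron", "Neurotransmitter"),
}

# Hypothetical asserted types for instances that live in different resources.
TYPES = {
    "geneDB:Igf1": {"Gene"},
    "atlas:Hippocampus": {"BrainRegion"},
    "cellDB:BasketCell": {"Neuron"},
    "chem:GABA": {"Neurotransmitter"},
}


def valid_link(subject, relation, obj):
    """Closed-world check: both ends must already carry the axiom's domain/range types."""
    if relation not in AXIOMS:
        return False
    domain, range_ = AXIOMS[relation]
    return domain in TYPES.get(subject, set()) and range_ in TYPES.get(obj, set())


def integrate(candidate_links):
    """Split candidate cross-resource links into accepted and rejected lists."""
    accepted, rejected = [], []
    for link in candidate_links:
        (accepted if valid_link(*link) else rejected).append(link)
    return accepted, rejected


accepted, rejected = integrate([
    ("geneDB:Igf1", "expressedIn", "atlas:Hippocampus"),
    ("chem:GABA", "expressedIn", "atlas:Hippocampus"),
    ("cellDB:BasketCell", "releases", "chem:GABA"),
])
print(len(accepted), rejected)
```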

Too Many Issues, Too Little Time (and Resources)

What’s the best way to implement ontologies with concrete domains through a graph-based approach?

Graphs with Colored DAG backbones?

Balancing Materialized vs. Computed edges for best time-space tradeoffs

What is an appropriate result model for an associative graph query?

What is the query language and result model of a story?

Combining result presentation and navigation options?

Ranking Models? Contextual Query Interpretation and Ranking?

Oh! Scalability!!!