A Design to Integrate Heterogeneous Microarray Databases

Abhishek Dabral

Microarray DB Integration



Page 2: Microarray DB Integration

Background

What are microarrays? Generally speaking: DNA expression chips

So what is the problem? 1000s of genes/chip × 1000s of chips = a vast amount of information

Heterogeneous databases

Page 3: Microarray DB Integration

Our goal…

…is the creation of an “idealized” system that actively identifies data sources of interest, automatically overcomes syntactic and semantic heterogeneities wherever it discovers them, and provides transparent, declarative, optimized query access over all sources.

Specifically targeted for microarray domains

Benefits: cost- and time-efficient

Page 4: Microarray DB Integration

First Step: Standardization

Microarray Gene Expression Data (MGED) Society

Minimum Information About a Microarray Experiment (MIAME)

Microarray and Gene Expression (MAGE) group

Page 5: Microarray DB Integration

Methodology

[Figure: Steps 1 to 4, taking each source schema to object classes and attributes]

Step 1: The schemas are extracted from the data sources.

Step 2: Tokenization. Each table name, together with its attribute, serves as a token for the schema. This is carried out for every table in each of the schemas.

Step 3: An ontology is constructed from these schemas.

Step 4: The object classes from Step 2 are clustered into class clusters.
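Steps 1, 2 and 4 can be illustrated with a minimal Python sketch. It is not the author's implementation. It assumes the sources are SQLite databases (a stand-in for the relational sources), and it clusters object classes greedily by Jaccard overlap of their attribute names with a 0.5 threshold; the slides do not specify a clustering method. All table names, column names and function names are hypothetical. Step 3 (ontology construction) is left out.

```python
import sqlite3

def extract_schema(conn):
    """Step 1: read table names and their column names from one source."""
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    schema = {}
    for table in tables:
        # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
        cols = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        schema[table] = [col[1] for col in cols]
    return schema

def tokenize(source, schema):
    """Step 2: one token per (table, attribute) pair, for every table."""
    return [(source, table, attr.lower())
            for table, attrs in schema.items() for attr in attrs]

def jaccard(a, b):
    """Overlap of two attribute sets, from 0.0 (disjoint) to 1.0 (equal)."""
    return len(a & b) / len(a | b) if a | b else 0.0

def cluster_classes(tokens, threshold=0.5):
    """Step 4 (assumed method): greedy clustering by attribute overlap."""
    classes = {}
    for source, table, attr in tokens:
        classes.setdefault((source, table), set()).add(attr)
    clusters = []  # each cluster is a list of (source, table) keys
    for key in sorted(classes):
        for cluster in clusters:
            if any(jaccard(classes[key], classes[m]) >= threshold
                   for m in cluster):
                cluster.append(key)
                break
        else:
            clusters.append([key])
    return clusters

def load(statements):
    """Build a throwaway in-memory database from CREATE TABLE statements."""
    conn = sqlite3.connect(":memory:")
    for statement in statements:
        conn.execute(statement)
    return conn

# Two hypothetical microarray sources with differently named tables.
source_a = load([
    "CREATE TABLE experiment (id INTEGER, name TEXT, organism TEXT)",
    "CREATE TABLE array_design (id INTEGER, platform TEXT, spots INTEGER)",
])
source_b = load([
    "CREATE TABLE expt (id INTEGER, name TEXT, organism TEXT, run_date TEXT)",
    "CREATE TABLE chip (id INTEGER, platform TEXT, spots INTEGER, vendor TEXT)",
])

tokens = (tokenize("A", extract_schema(source_a))
          + tokenize("B", extract_schema(source_b)))
clusters = cluster_classes(tokens)
print(clusters)
```

With these toy schemas the two experiment tables fall into one class cluster and the two array tables into another, because each pair shares three of four attribute names. Real sources would need a similarity measure that tolerates differently named attributes, which is where the ontology of Step 3 comes in.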

Page 6: Microarray DB Integration

Methodology

Step 5: The clusters obtained from Step 4 are treated as base sets. Their ontologies are traced and their common features examined to arrive at an ontology base set per cluster.

Step 6: The ontologies in the ontology base set from Step 5 are integrated to obtain ontology clusters. The result of this step is a cluster of related metadata terms grouped together.

Step 7: The ontology clusters are then named accordingly and recorded in the metadata updater to be used in the next iteration.
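Steps 5 to 7 can be sketched in Python under strong assumptions. Here each object class's ontology is modeled as a plain mapping from attribute name to a concept label, "common features" is read as the set of concepts shared by every class in a cluster, and a dictionary stands in for the metadata updater. None of these representations, names or concept labels come from the slides; they are hypothetical.

```python
from collections import defaultdict

# Hypothetical ontologies (output of Step 3): object class -> {attribute: concept}
ontologies = {
    ("A", "array_design"): {
        "platform": "ArrayPlatform",
        "spots": "FeatureCount",
    },
    ("B", "chip"): {
        "chip_type": "ArrayPlatform",
        "n_features": "FeatureCount",
        "vendor": "Manufacturer",
    },
}

def ontology_base_set(cluster, ontologies):
    """Step 5: keep only the concepts shared by every class in the cluster."""
    concept_sets = [set(ontologies[cls].values()) for cls in cluster]
    common = set.intersection(*concept_sets)
    return {cls: {attr: concept
                  for attr, concept in ontologies[cls].items()
                  if concept in common}
            for cls in cluster}

def integrate(base_set):
    """Step 6: group metadata terms from all classes under a shared concept."""
    grouped = defaultdict(set)
    for (source, table), attrs in base_set.items():
        for attr, concept in attrs.items():
            grouped[concept].add(f"{source}.{table}.{attr}")
    return dict(grouped)

def record(ontology_clusters, store):
    """Step 7: name each cluster after its concept and record it."""
    for concept, terms in ontology_clusters.items():
        store[concept] = sorted(terms)
    return store

# One class cluster, as Step 4 might produce it.
class_cluster = [("A", "array_design"), ("B", "chip")]

metadata_store = {}  # stand-in for the metadata updater
base_set = ontology_base_set(class_cluster, ontologies)
record(integrate(base_set), metadata_store)

for name, terms in metadata_store.items():
    print(name, terms)
```

In this toy run the "Manufacturer" concept is dropped in Step 5 because only one source carries it, and the recorded clusters pair each remaining concept with the source-specific terms that express it. Whether unshared concepts should be discarded or kept for later iterations is a design choice the slides leave open.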

[Figure: Steps 5 to 7, from Class Clusters to Ontology Base Set to Ontology Clusters]

Page 7: Microarray DB Integration

Architecture

Semantically Enhanced Enterprise Directory Services (SEEDS)

Figure 1: MicroSEEDS Architecture

[Figure legend: automated process, data storage, human actor. Labeled components: DNA microarray researcher; MicroSEEDS Metadata; Metadata Access; Semantic Facilitator™ SM; Metadata Updater; Semantic Homogeneity Promotion; External Information; Bioinformatics Sources; MicroSEEDS attributes; MicroSEEDS objectclasses. Source types: relational database, flat file, LDAP directory, XML Schema.]

Specific to the microarray domain, the preliminary goal of the MicroSEEDS architecture is to minimize semantic heterogeneity through proactive promotion of semantic homogeneity.

Page 8: Microarray DB Integration

References

http://www.mged.org/ - MGED Society