A Design to Integrate Heterogeneous Microarray Databases
Abhishek Dabral
Background
What are Microarrays? Generally speaking: DNA expression chips
So what is the problem? 1000s of genes/chip × 1000s of chips = a vast amount of information
Heterogeneous databases
Our goal…
…is the creation of an “idealized” system that actively identifies data sources of interest, automatically overcomes syntactic and semantic heterogeneities wherever it discovers them, and provides transparent, declarative, optimized query access over all sources.
Specifically targeted for microarray domains
Benefits: cost- and time-efficient
First Step: Standardization
Microarray Gene Expression Data (MGED) Society
Minimum Information About a Microarray Experiment (MIAME)
Microarray and Gene Expression (MAGE) group
Methodology
Diagram: Steps 1 to 4, with labels Schema and Object classes and attributes.
Step 1: The schemas from the data sources are to be extracted.
Step 2: Tokenization. The table name together with the attribute forms the token for the schema. This is carried out for every table in each of the schemas.
Step 3: An ontology is constructed from these schemas.
Step 4: The object classes from Step 2 are clustered into class clusters.
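As a rough illustration of Steps 2 and 4, the sketch below tokenizes two small schemas and groups the resulting tokens. The slides do not say how similarity between object classes is measured, so the `normalize` key (lowercasing and dropping underscores) is purely an assumption, as are the example schemas and every function name.

```python
from collections import defaultdict

# Hypothetical schemas extracted in Step 1: {source: {table: [attributes]}}
SCHEMAS = {
    "source_a": {
        "Experiment": ["exp_id", "array_type"],
        "Sample": ["sample_id", "organism"],
    },
    "source_b": {
        "Expt": ["ExpID", "Organism"],
    },
}


def tokenize(schemas):
    """Step 2 (sketch): one 'table.attribute' token per attribute of every table."""
    return [
        (source, f"{table}.{attr}")
        for source, tables in schemas.items()
        for table, attrs in tables.items()
        for attr in attrs
    ]


def normalize(attr):
    """Assumed similarity key: lowercase the name and drop underscores."""
    return attr.replace("_", "").lower()


def cluster(tokens):
    """Step 4 (sketch): group tokens whose attribute names share a key."""
    clusters = defaultdict(list)
    for source, token in tokens:
        attr = token.split(".", 1)[1]
        clusters[normalize(attr)].append((source, token))
    return dict(clusters)


tokens = tokenize(SCHEMAS)
class_clusters = cluster(tokens)
```

Here `exp_id` and `ExpID` fall into one class cluster even though they come from differently named tables in different sources. A real system would likely need a richer similarity measure than this string normalization.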
Step 5: The clusters obtained from Step 4 are treated as base sets and their ontologies are traced and the common features are examined to arrive at an ontology base set per cluster.
Step 6: The ontologies in the ontology base set from Step 5 are integrated to obtain ontology clusters. The result of this step is a cluster of related metadata terms grouped together.
Step 7: The ontology clusters are then named accordingly and recorded in the metadata updater to be used in the next iteration.
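Steps 5 to 7 can be sketched in the same spirit. The slides do not specify how ontologies are represented or integrated, so this example assumes each token's ontology is simply a set of concept terms, takes the "common features" of a cluster to be the intersection of those sets, and names each ontology cluster after its shared concepts. All data and names below are hypothetical.

```python
# Hypothetical inputs: class clusters (Step 4) and a per-token ontology (Step 3),
# modeled here as the set of concept terms each token is linked to.
CLASS_CLUSTERS = {
    "c1": ["Experiment.exp_id", "Expt.ExpID"],
    "c2": ["Sample.organism", "Expt.Organism"],
}
ONTOLOGY = {
    "Experiment.exp_id": {"identifier", "experiment"},
    "Expt.ExpID": {"identifier", "experiment", "accession"},
    "Sample.organism": {"species", "sample"},
    "Expt.Organism": {"species"},
}


def ontology_base_set(members, ontology):
    """Step 5 (sketch): concepts common to every member of a class cluster."""
    concept_sets = [ontology[m] for m in members]
    return set.intersection(*concept_sets)


def build_metadata(class_clusters, ontology):
    """Steps 6-7 (sketch): one named ontology cluster per class cluster."""
    metadata = {}
    for members in class_clusters.values():
        base = ontology_base_set(members, ontology)
        name = "/".join(sorted(base))  # assumed naming rule
        metadata[name] = sorted(members)  # stand-in for the metadata updater
    return metadata


metadata = build_metadata(CLASS_CLUSTERS, ONTOLOGY)
```

The resulting dictionary plays the role of the record kept for the next iteration: each key is a cluster name and each value is the group of related metadata terms.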
Diagram: Steps 5 to 7, with labels Class Clusters, Ontology Base Set, and Ontology Clusters.
Architecture
Semantically Enhanced Enterprise Directory Services (SEEDS)
Figure 1: MicroSEEDS Architecture
Legend: automated process, data storage, human actor.
Components: DNA microarray Researcher; MicroSEEDS Metadata; Metadata Access; Semantic Facilitator™ SM Metadata Updater; Semantic Homogeneity Promotion; External Information; Bioinformatics Sources; MicroSEEDS Attributes; MicroSEEDS objectclasses.
Source types: relational database, flat file, LDAP directory, XML Schema.
Specific to the microarray domain, the preliminary goal of the MicroSEEDS architecture is to minimize semantic heterogeneity through proactive promotion of semantic homogeneity.
References
http://www.mged.org/ - MGED Society