Sharing Microarray Experiment Knowledge Chips to Hits Oct. 28, 2002 Chris Stoeckert, Ph.D. Dept. of...


Sharing Microarray Experiment Knowledge

Chips to Hits Oct. 28, 2002

Chris Stoeckert, Ph.D.

Dept. of Genetics & Center for Bioinformatics

University of Pennsylvania

Nature, October 3, 2002

http://plasmodb.org/

David Roos, Jessie Kissinger, Bindu Gajria, Martin Fraunholz, Jules Milgram, Phil Labo, Amit Bahl, Dave Pearson, Dinesh Gupta, Hagai Ginsburg

Jonathan Crabtree, Jonathan Schug, Brian Brunk, Greg Grant, Trish Whetzel, Matt Mailman, Li Li

Desirable Microarray Queries

• Return all experiments using developmental stage X.
  – Sort by platform type
  – Which are untreated? Treated?
    • Treated by what?
• How comparable are these?
• What can these experiments tell me?
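The first query above can be sketched in a few lines of Python. This is only an illustration: the `Experiment` record, its field names, and the sample data are hypothetical and are not taken from RAD, MAGE, or any MGED software.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Experiment:
    """Hypothetical, simplified experiment record (not a RAD or MAGE structure)."""
    name: str
    developmental_stage: str
    platform: str
    treatment: Optional[str] = None  # None is used here to mean "untreated"


def experiments_at_stage(experiments: List[Experiment], stage: str) -> List[Experiment]:
    """Return all experiments using the given stage, sorted by platform type."""
    hits = [e for e in experiments if e.developmental_stage == stage]
    return sorted(hits, key=lambda e: (e.platform, e.name))


def split_by_treatment(experiments: List[Experiment]) -> Tuple[List[str], Dict[str, str]]:
    """Answer "Which are untreated? Treated?" and "Treated by what?"."""
    untreated = [e.name for e in experiments if e.treatment is None]
    treated = {e.name: e.treatment for e in experiments if e.treatment is not None}
    return untreated, treated


# Made-up records for illustration only.
EXPERIMENTS = [
    Experiment("exp1", "Stage 28", "spotted cDNA", "fenofibrate"),
    Experiment("exp2", "Stage 28", "oligonucleotide"),
    Experiment("exp3", "Stage 12", "spotted cDNA"),
    Experiment("exp4", "Stage 28", "spotted cDNA"),
]

hits = experiments_at_stage(EXPERIMENTS, "Stage 28")
untreated, treated = split_by_treatment(hits)
print([e.name for e in hits])
print(untreated, treated)
```

The filter relies on an exact string match for the stage, which only works if every annotator uses the same term for the same stage. That is the motivation for a shared vocabulary.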

Microarray Information to be Shared

Figure from: David J. Duggan et al. (1999) Expression profiling using cDNA microarrays. Nature Genetics 21: 10-14

The Computational View of Microarray Information

Need an ontology to unambiguously represent this information.

What is an Ontology?

• In philosophy, an ontology is a systematic account of Existence.
• In AI, an ontology is a systematic account of what can be represented.
• The knowledge of a domain is represented in a declarative formalism.
  – Classes, relations, functions, or other objects are defined with human-readable text describing what the names mean, and formal axioms that constrain the interpretation.
• A common ontology defines the vocabulary with which queries and assertions are exchanged.

Excerpted and adapted from: http://www-ksl.stanford.edu/kst/what-is-an-ontology.html
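As a rough illustration of the definition above, the sketch below pairs each class with human-readable text and a very simple "axiom" (a set of allowed values) and uses it to check assertions. The `OntologyClass` and `Ontology` types, the definitions, and the allowed values are all hypothetical; real ontologies of this period were written in formalisms such as DAML+OIL, not Python.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable


@dataclass(frozen=True)
class OntologyClass:
    name: str
    definition: str                             # human-readable meaning of the name
    allowed_values: FrozenSet[str] = frozenset()  # empty set = value is unconstrained


class Ontology:
    """A toy shared vocabulary: class names, definitions, and value constraints."""

    def __init__(self, classes: Iterable[OntologyClass]) -> None:
        self._classes = {c.name: c for c in classes}

    def definition(self, class_name: str) -> str:
        return self._classes[class_name].definition

    def is_valid(self, class_name: str, value: str) -> bool:
        """An assertion is valid if the class is known and the value obeys its axiom."""
        cls = self._classes.get(class_name)
        if cls is None:
            return False
        return not cls.allowed_values or value in cls.allowed_values


# Illustrative content only; definitions and values are assumptions.
onto = Ontology([
    OntologyClass("Sex", "Sex of the organism sampled.", frozenset({"female", "male"})),
    OntologyClass("Age", "Time elapsed since an initial time point such as birth."),
])

print(onto.is_valid("Sex", "female"))
print(onto.is_valid("Sex", "liver"))
```

Two parties that load the same `Ontology` agree on which class names exist and which values are acceptable, which is what allows queries and assertions to be exchanged.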

An Experimental Ontology

• An ontology for microarray experiments
  – Not an ontology of life but of experiments
  – Parts are applicable to describing experiments in general
• Our approach to interfacing with other ontologies is “experimental”
  – Not mapping terms from related ontologies
  – Provide a framework to hang other ontologies off of
    • Know where to find different types of annotation
    • How to interpret that annotation

http://www.mged.org

Relationship of MGED Efforts

[Diagram labels: MAGE, MIAME, DB, External Ontologies/CVs, MGED Ontology, Software and database developers, Investigators annotating experiments]

The MGED Ontology Home Page

http://www.cbil.upenn.edu/Ontology

The MGED Ontology Home Page

http://mged.sourceforge.net/ontologies/

The MGED Ontology Provides a Listing of Resources for Many Species

The MGED Ontology Organizes the Resources According to Concepts

The MGED Ontology is Structured in DAML+OIL using OilEd 3.4

MGED Ontology: BiomaterialDescription: BiosourceProperty: Age

MGED Ontology: BiosourceOntologyEntry: DiseaseState

MGED Ontology class: MGED Ontology instance [External Reference]

BioMaterialDescription
BiosourceProperty
Organism: Mus musculus musculus, id: 39442 [NCBI Taxonomy]
Age: 7 weeks after birth
DevelopmentStage: Stage 28 [Mouse Anatomical Dictionary]
Sex: Female
StrainOrLine: C57BL/6N [International Committee on Standardized Genetic Nomenclature for Mice]
BiosourceProvider: Charles River, Japan
OrganismPart: Liver [Mouse Anatomical Dictionary]
BioMaterialManipulation
EnvironmentalHistory
CultureCondition
Temperature: 22 ± 2 °C
Humidity: 55 ± 5%
Light: 12 hours light/dark cycle
PathogenTests: Specified pathogen free conditions
Water: ad libitum
Nutrients: MF, Oriental Yeast, Tokyo, Japan
Treatment
CompoundBasedTreatment
(Compound): Fenofibrate, CAS 49562-28-9 [ChemIDplus]
(Treatment_application): in vivo, oral gavage
(Measurement): 100 mg/kg body weight

An example of microarray sample annotation using the MGED ontology

Susanna A. Sansone, Helen Parkinson, Philippe Rocca-Serra, Chris Stoeckert and Alvis Brazma

The MGED Ontology in Action: MIAMExpress

Journals are Adopting the MGED Standards

Use of Minimum Information About a Microarray Experiment (MIAME)

The MGED Ontology in Action: RAD

Generating Forms from the MGED Ontology

[Diagram labels: OntologyEntry; ExternalDatabases; PHP/SQL; WWW; RAD Forms; MGED Ontology; Anatomy, DevelopmentalStage, Disease, Lineage, PATOAttribute, Phenotype, Taxon; SRES; RAD3]
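The idea of generating annotation forms from ontology terms can be sketched as follows. The slide names PHP/SQL as the technology behind the RAD forms; this Python version is a stand-in for illustration, and the function names, the `VOCABULARY` mapping, and its terms are hypothetical.

```python
from html import escape
from typing import Dict, List


def select_field(class_name: str, terms: List[str]) -> str:
    """Render one ontology class as a labelled drop-down of its controlled terms."""
    field_id = escape(class_name, quote=True)
    options = "\n".join(
        '  <option value="{0}">{0}</option>'.format(escape(term, quote=True))
        for term in terms
    )
    return (
        '<label for="{0}">{0}</label>\n'
        '<select id="{0}" name="{0}">\n{1}\n</select>'
    ).format(field_id, options)


def build_form(vocabulary: Dict[str, List[str]]) -> str:
    """Build one form with a drop-down per ontology class."""
    fields = "\n".join(select_field(name, terms) for name, terms in vocabulary.items())
    return '<form method="post">\n{0}\n</form>'.format(fields)


# Illustrative terms only; a real form would read them from the database.
VOCABULARY = {
    "Sex": ["female", "male"],
    "OrganismPart": ["liver", "kidney"],
}

form_html = build_form(VOCABULARY)
print(form_html)
```

Because the drop-downs are built from the vocabulary rather than typed by hand, adding a term to the ontology changes the form without editing any page, and annotators can only submit controlled terms.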

Using the MGED standards in RAD

• RAD: RNA Abundance Database
  – Stoeckert et al. (2000) Bioinformatics
• RAD 3.0
  – MIAME compliant and MAGE supportive
  – Building importers, exporters for MAGE
• Incorporates MGED ontology
  – Uses OntologyEntry to point to internal tables and external resources
• Expand processing and analysis information storage
  – Driven by experience and new approaches
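One way to picture an OntologyEntry that points either to an internal table or to an external resource is the record below. It is a sketch under stated assumptions: the field names and the `source` method are hypothetical and do not reproduce the actual RAD columns, and the internal table name is used only as an example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class OntologyEntry:
    """Hypothetical annotation record; field names are not RAD's real columns."""
    category: str                            # ontology class, e.g. "Organism"
    value: str                               # the term chosen by the annotator
    internal_table: Optional[str] = None     # local table holding the term, if any
    external_database: Optional[str] = None  # outside resource holding the term, if any
    accession: Optional[str] = None          # identifier within the external resource

    def source(self) -> str:
        """Say where the term can be looked up."""
        if self.external_database is not None:
            if self.accession is not None:
                return "{0}:{1}".format(self.external_database, self.accession)
            return self.external_database
        if self.internal_table is not None:
            return "table:{0}".format(self.internal_table)
        return "free text"


organism = OntologyEntry(
    "Organism", "Mus musculus musculus",
    external_database="NCBI Taxonomy", accession="39442",
)
organ = OntologyEntry("OrganismPart", "Liver", internal_table="Anatomy")
age = OntologyEntry("Age", "7 weeks after birth")

for entry in (organism, organ, age):
    print(entry.category, "->", entry.source())
```

Keeping the pointer alongside the value tells a consumer of the annotation both where to find the term and how to interpret it.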

[Figure: UML class diagram with 1 to 0..* and 0..1 to 0..* associations among the following classes: ElementAnnotation, Analysis, AnalysisImplementationParam, AnalysisInput, AnalysisImplementation, AnalysisInvocationParam, AnalysisInvocation, AnalysisOutput, CompositeElementAnnotation, ArrayAnnotation, CompositeElementImp, ElementResultImp, CompositeElementResultImp, QuantificationParam, RelatedQuantification, Study, StudyDesignDescription, StudyAssay, StudyDesignAssay, StudyFactorValue, AssayLabeledExtract, BioMaterialImp, LabelMethod, ProtocolParam, MAGEDocumentation, MAGE_ML, AcquisitionParam, Assay, Channel, Quantification, Acquisition, RelatedAcquisition, ProcessImplementationParam, ProcessIO, ProcessInvocation, ProcessInvocationParam, Array, BioMaterialMeasurement, Protocol, Treatment, StudyDesign, BioMaterialCharacteristic, ProcessImplementation, ElementImp, Control, ProcessResult, StudyFactor, OntologyEntry]

RAD schema uses MAGE/MIAME

MAGE: Experiment; Array; BioMaterial; BioAssay; BioAssayData; Protocol, Descr.; HigherLevelAnalysis

MIAME: Experimental Design; Array design; Samples; Hybridization, Measure; Normalization

RAD is now part of GUS-3.0

GUS has 5 namespaces compartmentalizing different types of information.

Namespace   Domain                    Features
Core        Data provenance           Workflows
SRes        Shared resources          Ontologies
DoTS        Sequence and annotation   Central dogma
RAD         Gene expression           MIAME/MAGE
TESS        Gene regulation           Grammars

Data Integration

• Ontologies (SRes)
  – GO, Species, Tissue, Dev. Stage
• Data Provenance (Core)
  – Ownership, Protection, Algorithms, Similarity, Versioning, Workflow
• GenomicSequence, TranscribedSequence, ProteinSequence (DoTS)
  – GenomicSequence: Genes, gene models; STSs, repeats, etc; Cross-species analysis
  – TranscribedSequence: Characterize transcripts; RH mapping; Library analysis; Cross-species analysis; DOTS
  – ProteinSequence: Domains; Function; Structure; Cross-species analysis
• TranscriptExpression (RAD)
  – Arrays, SAGE, Conditions
• Gene Regulation (TESS)
  – Binding Sites, Patterns, Grammars

Example query spanning the namespaces: Transcription factors (DoTS) up-regulated in (RAD) acute myeloid leukemia (SRes) with sequence similarity to c-fos (Core) and common promoter motifs (TESS)

GUS Supports Multiple Projects

• Projects: AllGenes, PlasmoDB, EPConDB
• Namespaces: Core, SRES, TESS, RAD, DoTS
• Oracle RDBMS
• Object Layer for Data Loading
• Java Servlets
• Other sites, other projects, e.g. GeneDB

Available at http://www.gusdb.org

Summary

• The MGED ontology is being developed within the microarray community to provide consistent terminology for experiments.
  – Make it easier and more accurate to annotate a microarray experiment.
  – Use structured fields and controlled terms to query databases.
• This community effort has resulted in a list of multiple resources for many species and a machine-readable document of microarray concepts, definitions, and values.
  – The MGED Ontology is a work in progress but can be used now to build forms for databases
• RAD has incorporated the MGED ontology for forms
  – Can export data from RAD into MAGE
  – RAD as part of GUS provides integration of gene expression, annotation, and sequence.

Acknowledgements

• MGED Ontology
  – Helen Parkinson (EBI)
  – Trish Whetzel
  – The MGED Ontology Working Group
  – MAGE working group
• RAD/GUS
  – Brian Brunk
  – Jonathan Crabtree
  – Steve Fischer
  – Yongchang Gan
  – Greg Grant
  – Hongxian He
  – Li Li
  – Junmin Liu
  – Matt Mailman
  – Elisabetta Manduchi
  – Joan Mazzarelli
  – Shannon McWeeney (OHSU)
  – Debbie Pinney
  – Angel Pizarro
  – Jonathan Schug
  – Trish Whetzel

www.mged.org www.cbil.upenn.edu

http://www.ebi.ac.uk/SOFG
