ILSI-HESI agreement with EBI: ArrayExpress, public repository for toxicogenomics data. Susanna Assunta Sansone, [email protected], Microarray Informatics Team, European Bioinformatics Institute (EBI). Hoffmann-La Roche


  • ILSI-HESI agreement with EBI: ArrayExpress, public repository for toxicogenomics data. Susanna Assunta Sansone, [email protected], Microarray Informatics Team, European Bioinformatics Institute (EBI)

    Hoffmann-La Roche

  • Acknowledgments: Microarray Informatics Team, EBI, esp. Alvis Brazma, Helen Parkinson, Mohammad Shojatalab, Ugis Sarkans; Industry Support team, EBI; MGED steering committee; MIAME working group; Chris Stoeckert, U. Penn., and members of MGED

  • Talk structure: Part I = ArrayExpress at EBI: A public repository for gene expression data

    Demo = MIAMExpress: Submission/annotation tool

    Part II = ILSI-HESI IMD: Toxicogenomics data transfer to ArrayExpress

  • Part I - Talk structure. Data standardization: MGED group, MIAME concepts, MGED Ontology

    Uses of MIAME concepts: ArrayExpress database, MAGE-OM the object model

    Data flow in and out of ArrayExpress


  • Data standardization - MGED. MGED = Microarray Gene Expression Database group: EBI + the world's largest labs (TIGR, Sanger, Stanford, Agilent, Affymetrix, etc.); www.mged.org. Aims: facilitate adoption of standards (annotation, data representation); introduce experimental controls and data normalization methods

  • Data standardization - Why? Size of dataset. Different platforms - nylon, glass. Different technologies - oligos, spotted. References to external db not stable! Gene expression data only have a meaning in the context of a detailed experiment description

  • MIAME - Minimum Information About Microarray Experiment. The MGED group has published the MIAME v1.0 document (Brazma et al., Nature Genetics, 2001)

    Minimum information that must be reported about a microarray experiment in order to ensure: its interpretability; potential verification of the results

  • MIAME - Minimum Information About Microarray Experiment: describes the 6 parts of a microarray experiment. [Diagram labels: Publication, External links, Normalisation]

  • MIAME - Experimental design: the set of the hybridisation experiments as a whole. [Diagram: 6 parts of a microarray experiment; labels: Normalisation, Data, Sample, Hybridisation, Array, Source (e.g. Taxonomy), Gene (e.g. EMBL), Publication]

  • MIAME - Experimental design: one or more hybridisation experiments that are in some way related and address related questions. Author, contact information, citations. Type of experiment, e.g.: time course, normal vs diseased comparison. Experimental factors, i.e. tested parameters in the experiment, e.g.: time, dose, response to a compound. List of organisms used in the experiment. List of platforms used

  • MIAME - Experimental design: list of samples, arrays and hybridisations and their relationship, e.g.: Samples: S1, S2, S3. Arrays: A1, A2, A3. Hybridisations: H1 is S1 and S2 on A1; H2 is S2 and S3 on A2; H3 is S1 and S2 on A3

    Which hybridisations are replicates, e.g.: H1 and H3 are replicates
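The sample/array/hybridisation relationships on this slide can be written down as a small data structure. The sketch below is illustrative only: the dictionary layout and the `replicate_groups` helper are hypothetical, and the rule "same set of samples means replicate" is an assumption that happens to reproduce the slide's example, not a MIAME definition.

```python
from collections import defaultdict

# Hybridisation -> samples and array, copied from the slide's example.
hybridisations = {
    "H1": {"samples": ("S1", "S2"), "array": "A1"},
    "H2": {"samples": ("S2", "S3"), "array": "A2"},
    "H3": {"samples": ("S1", "S2"), "array": "A3"},
}


def replicate_groups(hybs):
    """Group hybridisations that use the same set of samples (assumed rule)."""
    groups = defaultdict(list)
    for name, info in hybs.items():
        groups[frozenset(info["samples"])].append(name)
    # Only groups with more than one hybridisation count as replicates.
    return [sorted(names) for names in groups.values() if len(names) > 1]


print(replicate_groups(hybridisations))
```

In a real submission the replicate relationship is stated by the submitter rather than inferred, so a helper like this would at most serve as a consistency check.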

  • MIAME - Experimental design: quality related indicators, e.g.: type of replicates

    Free-text description of the experiment or link to an e-publication

  • MIAME - Minimum Information About Microarray Experiment. [Diagram: 6 parts of a microarray experiment; labels: Publication, External links, Normalisation, Array]

  • MIAME - Array design: each array used and each element (spot) on the array. [Diagram labels: Normalisation, Data, Sample, Hybridisation, Source (e.g. Taxonomy), Gene (e.g. EMBL), Publication, Array, Experiment]

  • MIAME - Array design: for the database, the array description should normally be submitted only once

    For each physical array used in the experiment a unique ID and the array type are given

    Array design related information, e.g.: platform type = in situ synthesized or spotted, array provider, etc.; surface type = glass, membrane, etc.

  • MIAME - Array design: properties of each type of element on the array that are generated by similar protocols, e.g.: synthesized oligos, PCR products, plasmids, colonies, etc.

    Each element (spot) on the array: Elements may be simple or composite (Affymetrix). Each element must be identified by either the sequence, clone ID, PCR primer pair, or in any other unambiguous way. Composite elements may be identified by a reference sequence. Elements may be linked to genes (preferably). This information is normally provided in a separate file, e.g.: spreadsheet

  • MIAME - Minimum Information About Microarray Experiment. [Diagram: 6 parts of a microarray experiment; labels: Publication, External links, Normalisation, Sample]

  • MIAME - Sample: samples used, the extract preparation and labelling. [Diagram labels: Normalisation, Data, Sample, Hybridisation, Source (e.g. Taxonomy), Gene (e.g. EMBL), Publication, Array, Experiment]

  • MIAME - Sample: sample source, e.g.: Organism; Cell source and type; Developmental stage; Organism part (tissue); Animal/plant strain or line; Genetic variation; Disease state or normal. Typically only some of these qualifiers are relevant, and there is the need to implement the annotation for sample source! (To be continued)

  • MIAME - Sample: sample treatment, e.g.: in vivo / in vitro; Compounds. There is the need to implement the annotation for sample treatment! (To be continued)

    Hybridisation extract preparation: laboratory protocol, including extraction method, whether RNA, mRNA, or genomic DNA is extracted, amplification method

    Labelling: laboratory protocol, including amount of nucleic acids labelled, label used (e.g. Cy3, Cy5, 33P, etc.)

  • MIAME - Minimum Information About Microarray Experiment. [Diagram: 6 parts of a microarray experiment; labels: Publication, External links, Normalisation, Hybridisation]

  • MIAME - Hybridisations: procedures and parameters. [Diagram labels: Normalisation, Data, Sample, Hybridisation, Source (e.g. Taxonomy), Gene (e.g. EMBL), Publication, Array, Experiment]

  • MIAME - Hybridisations: laboratory protocol including: the solution, e.g.: concentration of solutes; blocking agent; wash procedure; quantity of labelled target used; time, concentration, volume, temperature; description of the hybridisation instruments

  • MIAME - Minimum Information About Microarray Experiment. [Diagram: 6 parts of a microarray experiment; labels: Publication, External links, Normalisation, Data]

  • MIAME - Data: images, quantitation, specifications. [Diagram labels: Normalisation, Data, Sample, Hybridisation, Source (e.g. Taxonomy), Gene (e.g. EMBL), Publication, Array, Experiment]

  • MIAME - Data: three data processing levels (raw, intermediate and final)

  • MIAME - Data: why three data processing levels? Each experiment uses different units! Non-reliable information

    Lack of gene expression measurement units!

    What do we do in absence of standards? Record raw, intermediate and final analysis data, together with detailed annotation on the analysis

    This passes on the responsibility of interpreting the final data to the user

  • MIAME - Data: raw data (array scans). The scanner image file, e.g.: TIFF, DAT

    Scanning information: scan parameters (laser power, spatial resolution, pixel space, PMT voltage); laboratory protocol for scanning; scanning hardware and software

    No MGED consensus on raw data!!

  • MIAME - Data: intermediate data (spot quantitations). Image analysis and quantitation: complete image analysis output for each element, normally given as a separate file, e.g.: spreadsheet. [Figure labels: Spots, Quantitations]

    Image analysis information: image analysis software specifications; all parameters

  • MIAME - Data: final data (gene expression levels). Summarised information from possible replicates: derived measurement values summarising related elements as used by the author; reliability information for these values, given as a separate file, e.g.: spreadsheet; specifications of these two, e.g.: median value of the replicates, standard deviation

    [Figure labels: Conditions, Genes, Gene expression levels]
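As a small illustration of the "median value of the replicates, standard deviation" example, the sketch below summarises replicate measurements for one element. The `summarise` helper and the numbers are made up for illustration; only the two statistics come from the slide.

```python
import statistics


def summarise(values):
    """Summarise replicate measurements for one array element."""
    return {
        "median": statistics.median(values),  # derived measurement value
        "stdev": statistics.stdev(values),    # reliability information (sample SD)
    }


# Hypothetical replicate intensities for two elements.
replicates = {
    "element_1": [10.0, 12.0, 11.0],
    "element_2": [100.0, 80.0, 120.0],
}

for name, values in replicates.items():
    print(name, summarise(values))
```

Whatever statistics are chosen, the MIAME point is that the specification of how they were derived travels with the final data.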

  • MIAME - Minimum Information About Microarray Experiment. [Diagram: 6 parts of a microarray experiment; labels: Publication, External links, Normalisation]

  • MIAME - Normalisation: a typical experiment involves a number of hybridisations in which the data from multiple samples are analysed and compared. [Diagram labels: Normalisation, Data, Sample, Hybridisation, Source (e.g. Taxonomy), Gene (e.g. EMBL), Publication, Array, Experiment]

    For this comparison, the reported hybridisation intensities (from the image processing) must be first normalised

  • MIAME - Normalisation: normalisation adjusts for a number of technical variations between and within hybridisations

    Normalisation strategy, e.g.: Spiking; Housekeeping gene; Total array

    Normalisation algorithm

    Control array elements

    Hybridisation extract preparation
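To make the "total array" strategy concrete, here is a minimal sketch under a stated assumption: each hybridisation is rescaled so that its summed intensity matches a common target (by default the mean total across hybridisations). This is one common reading of the term, not necessarily the algorithm a given submitter used, and the function name and data are hypothetical.

```python
def total_array_normalise(hybridisations, target_total=None):
    """Scale each hybridisation so its total intensity equals target_total."""
    totals = [sum(h) for h in hybridisations]
    if target_total is None:
        # Assumed default: the mean of the per-hybridisation totals.
        target_total = sum(totals) / len(totals)
    return [
        [value * target_total / total for value in h]
        for h, total in zip(hybridisations, totals)
    ]


# Two hypothetical hybridisations; the second is uniformly half as bright.
raw = [[100.0, 300.0, 600.0], [50.0, 150.0, 300.0]]
normalised = total_array_normalise(raw)
print(normalised)
```

Because the choice of strategy changes the final values, MIAME asks for the strategy, the algorithm and the control elements to be reported alongside the data.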

  • MIAME - Annotation: annotation implementations required. Gene expression data only have a meaning in the context of a detailed sample (source-treatment) and array (gene) description. [Diagram: 6 parts of a microarray experiment; labels: Normalisation, Data, Sample, Hybridisation, Source (e.g. Taxonomy), Gene (e.g. EMBL), Publication, Array, Experiment]

  • MIAME - Gene annotation. Unambiguous identification: Interpret data. [Diagram labels: Normalisation, Data, Sample, Hybridisation, Gene (e.g. EMBL), Publication, Array, Experiment, Source (e.g. Taxonomy)]

    !! Synonyms !! Alternative to gene names; Community approved names

    Usable external sources, e.g.: EMBL-GenBank (sequence acc#); Jackson Lab (approved mouse gene names); HUGO (approved human gene names)

  • MIAME - Sample annotation. Unambiguous identification: Interpret data. [Diagram labels: Normalisation, Data, Sample, Hybridisation, Gene (e.g. EMBL), Publication, Array, Experiment, Source (e.g. Taxonomy)]

    Usable external sources, e.g.: NCBI Taxonomy (organisms); Jackson Lab (mouse strains); Mouse Atlas (mouse anatomy); Merck Index, CAS # (compounds)

    CVs and ontologies are needed: Reduce free-text description; Facilitate data queries-analysis

  • What are CV and Ontology? CV = Controlled Vocabulary: a set of restrictive terms used to describe something; in the simplest case it could be a list

    Ontology: describes the relationship between the terms in a structured way; provides semantics and constraints; allows for computational inferences and reliable comparisons

  • Ontology example: build an ontology for, e.g., the Affymetrix GeneChip Rat Toxicology U34 Array

    (Top Level Class) Array element type; (Sub-Class) oligos; (slot constraint) manufactured by Affymetrix; (instance) GeneChip Rat Toxicology U34 Array
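The difference between a controlled vocabulary and an ontology can be sketched in a few lines. Everything here is illustrative: the dictionary layout, the key names (`is_a`, `instance_of`, `slots`) and the helper functions are hypothetical, the CV terms are borrowed from the surface-type example earlier in the talk, and attaching the "manufactured by" slot to the instance is an assumption about how the slide's example is meant to be read.

```python
# Controlled vocabulary: in the simplest case just a list of allowed terms.
SURFACE_TYPE_CV = {"glass", "membrane"}


def is_valid_term(term, cv):
    """A CV can only answer: is this term allowed?"""
    return term in cv


# Ontology: terms plus structured relationships and constraints.
ONTOLOGY = {
    "Array element type": {"is_a": None},                # top level class
    "oligos": {"is_a": "Array element type"},            # sub-class
    "GeneChip Rat Toxicology U34 Array": {               # instance
        "instance_of": "oligos",
        "slots": {"manufactured_by": "Affymetrix"},      # slot constraint
    },
}


def ancestors(term, ontology):
    """Walk the relationships upwards; a simple computational inference."""
    entry = ontology[term]
    parent = entry.get("instance_of") or entry.get("is_a")
    chain = []
    while parent is not None:
        chain.append(parent)
        parent = ontology[parent].get("is_a")
    return chain


print(is_valid_term("glass", SURFACE_TYPE_CV))
print(ancestors("GeneChip Rat Toxicology U34 Array", ONTOLOGY))
```

The list can only validate a term, whereas the structured form lets a query for "Array element type" also find the U34 array.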

  • MIAME - MGED Ontology. MGED Sample (BioMaterial) ontology: under construction by Chris Stoeckert (www.cbil.upenn.edu/Ontology/MGED_ontology.html); motivated by MIAME; defines terms, provides constraints, develops CVs for microarray experiment submissions; links also to external CVs and ontologies

  • MIAME Q,V,S triplets. MIAME definitions include the Q,V,S triplets: a user defined qualifier, value, source triplet, used to describe a new term. qualifier = what the term describes (cell type); value = its value (epithelial); source = its source (Gray's Anatomy, 38th ed.). User defined terms are added to the MGED ontology
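A Q,V,S triplet maps naturally onto a three-field record. The sketch below uses the slide's own example values; the `QVS` class and `describe` helper are hypothetical names chosen for illustration and are not part of MIAME or MAGE.

```python
from typing import NamedTuple


class QVS(NamedTuple):
    """User defined qualifier, value, source triplet."""
    qualifier: str  # what the term describes
    value: str      # its value
    source: str     # where the value is defined


def describe(triplet):
    """Render a triplet in the bracketed-source style used for annotations."""
    return f"{triplet.qualifier} = {triplet.value} [{triplet.source}]"


cell_type = QVS("cell type", "epithelial", "Gray's Anatomy, 38th ed.")
print(describe(cell_type))
```

Keeping the source next to the value is what allows a user defined term to be reviewed and later folded into the ontology.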

  • Part I - Talk structure. Data standardization: MGED group, MIAME concepts, MGED Ontology

    Uses of MIAME concepts: ArrayExpress database, MAGE-OM the object model

  • Uses of MIAME concepts. Specifies the content of the information: sufficient information must be recorded to correctly interpret and replicate the experiments; structured information must be recorded to correctly retrieve and analyse the data. Uses: creation of MIAME-compliant databases, e.g.: ArrayExpress at EBI; development of submission/annotation tools for generating MIAME-compliant information, e.g.: MIAMExpress

  • ArrayExpress

    A public repository for gene expression data

    MIAME-compliant

  • ArrayExpress - Object Model. MAGE-OM, Microarray Gene Expression Object Model: MIAME compliant; standard; joint submission to OMG, 2001, by MGED and Rosetta. OMG (Object Management Group) is an international non-profit software consortium that is setting standards in the area of distributed object computing

  • ArrayExpress - Object Model. MAGE-ML, Mark-up Language: derived from MAGE-OM; describes and communicates MIAME information; DTD = predominantly computer readable

    UML, Unified Modelling Language: UML specifications are used to develop and describe MAGE-OM; UML = human readable

  • MAGE-OM - UML specifications: related classes are grouped together in packages; MAGE-OM has 16 packages

  • MAGE-OM mapping to MIAME. [Diagram label: Normalisation] + other 7 auxiliary packages: AuditAndSecurity, Protocol, Measurement, BioEvent, BQS, Description, HigherLevelAnalysis

  • Part I - Talk structure. Data standardization: MGED group, MIAME concepts, MGED Ontology

    Uses of MIAME concepts: ArrayExpress database, MAGE-OM the object model

    Data flow in and out of ArrayExpress

  • Data flow in-out ArrayExpress

    Users

  • Data flow in-out ArrayExpress

    [Diagram labels: Users, Loader, Submission, MAGE-ML.] MIAME compliant. Data model implemented in ORACLE. Deals with: raw data, processed data, data transformation. Independent of: experimental platform, image analysis method, normalization method

  • Data flow in-out ArrayExpress

    [Diagram labels: Users, central database, data warehouse, ArrayExpress, Loader, Submission, MAGE-ML.] MIAMExpress: submission/annotation tool; generates MIAME-compliant information; beta-testers; demo version (general); target specific interfaces, e.g.: species specific, toxicology specific

  • Talk structure: Part I = ArrayExpress at EBI: A public repository for gene expression data

    Demo = MIAMExpress: Submission/annotation tool

  • Talk structure: Part I = ArrayExpress at EBI: A public repository for gene expression data

    Demo = MIAMExpress: Submission/annotation tool

    Part II = ILSI-HESI IMD: Toxicogenomics data transfer to ArrayExpress

  • Part II - Talk structure. Data transfer from IMD to ArrayExpress: Can data be parsed? MIAME-compliant?

    Toxicology specific MIAMExpress interface: ILSI toxicogenomics data submission

    Areas of collaboration - Summary

  • Part II - Talk structure. Data transfer from IMD to ArrayExpress: Can data be parsed? MIAME-compliant?

  • Data parsing? From IMD to ArrayExpress: lexical parsing; mapping information to MAGE-OM

    !! Semantic parsing !! Glossary issues

  • Data mapping - Semantics! [Diagram labels: Normalisation, Sample, Hybridisation, Array, Data]

  • Data mapping - Semantics! IMD = chip, microarray chip. !! Synonyms !! [Diagram labels: Experiment, Normalisation, Sample, Hybridisation, Data]

  • Data mapping - Semantics! IMD = chip description, microarray chip description. !! Synonyms !! [Diagram labels: Experiment, Normalisation, Sample, Hybridisation, Data]

  • Data mapping - Semantics! IMD = chip design, microarray chip design. !! Synonyms !! [Diagram labels: Experiment, Normalisation, Sample, Hybridisation, Data]

  • Data mapping - Semantics! IMD = platform, microarray platform, microarray platform type. !! Synonyms !! [Diagram labels: Experiment, Normalisation, Sample, Hybridisation, Data]
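The synonym problem in these slides amounts to a many-to-one lookup from IMD glossary terms to one MIAME concept. The sketch below assumes that all the IMD terms listed map to the MIAME "Array" part, which is inferred from Array being the one part missing from the diagram labels on those slides; the table and the `to_miame` helper are hypothetical and not an actual IMD or ArrayExpress parser.

```python
# IMD glossary terms taken from the slides; the target concept is an assumption.
IMD_TO_MIAME = {
    "chip": "Array",
    "microarray chip": "Array",
    "chip description": "Array",
    "microarray chip description": "Array",
    "chip design": "Array",
    "microarray chip design": "Array",
    "platform": "Array",
    "microarray platform": "Array",
    "microarray platform type": "Array",
}


def to_miame(term):
    """Map an IMD term to a MIAME concept, or None if it is not in the glossary."""
    key = " ".join(term.lower().split())  # normalise case and whitespace
    return IMD_TO_MIAME.get(key)


print(to_miame("Microarray Chip"))
print(to_miame("dose"))
```

Returning `None` for unknown terms keeps the glossary gaps visible, which is where the semantic (rather than lexical) parsing effort is needed.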

  • MIAME - compliant? IMD MIAME-compliant? Minimal system for data exchange. Comparisons

    Current status for toxicogenomic data: Non-MIAME compliant

    Additional information required: to be flagged as MIAME compliant; to build queries to the database (ArrayExpress has an object model query mechanism)

    Why additional information?

  • ILSI-HESI Objective. ILSI-HESI objective: to have publicly available information to assist in developing consensus on potential applications and interpretation of microarray data with respect to mechanism-based risk assessment; to critically assess the potential utility of these new methods for the process of hazard identification

    Toxicologists (other than ILSI-HESI members): can correctly interpret and replicate the toxicogenomics experiments; can correctly retrieve and analyse the toxicogenomics data

    Sufficient and structured information must be recorded in order to achieve the ILSI-HESI objective

  • IMD - Data. Three types of data: Required: fold_change of spot intensity. Optional: relative_intensity; coefficient_variation of relative_intensity. Additional: present/absent/marginal_call (for Affymetrix); P_value (for replicates)
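The required/optional/additional split can be expressed as a simple record check. This is a sketch under assumptions: the field names `fold_change`, `relative_intensity` and `coefficient_variation` follow the slide, while the keys `call` and `p_value`, the record layout and the `check_record` helper are hypothetical and do not describe the real IMD schema.

```python
REQUIRED = {"fold_change"}
OPTIONAL = {"relative_intensity", "coefficient_variation"}
ADDITIONAL = {"call", "p_value"}  # hypothetical key names
ALLOWED_CALLS = {"present", "absent", "marginal"}


def check_record(record):
    """Return a list of problems found in one IMD-style data record."""
    problems = []
    fields = set(record)
    for name in sorted(REQUIRED - fields):
        problems.append(f"missing required field: {name}")
    for name in sorted(fields - REQUIRED - OPTIONAL - ADDITIONAL):
        problems.append(f"unknown field: {name}")
    call = record.get("call")
    if call is not None and call not in ALLOWED_CALLS:
        problems.append(f"invalid call: {call}")
    return problems


print(check_record({"fold_change": 2.5, "call": "present"}))
print(check_record({"relative_intensity": 0.8, "call": "maybe"}))
```

Note that a record passing this check still carries only final data, which is why the following slides ask for raw and intermediate data as well.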

  • MIAME compliant - Data Requirements:

  • MIAME compliant - Data: why three data processing levels? Lack of gene expression measurement units!

    What do we do in absence of standards? Record raw, intermediate and final analysis data, together with detailed annotation on the analysis

    This allows toxicologists (other than ILSI-HESI members) to interpret the final data

    Increase the value of toxicology data by achieving the ILSI-HESI objective. To give a critical mass to the ILSI-HESI studies

  • IMD Experiment description

    Hepatotoxicity, e.g.: Oral (gavage) Study in Male SD Rats on Methapyrilene

  • IMD Experiment description: good level of information

    Still incomplete to be MIAME compliant, e.g.: detailed protocols required, e.g.: hybridization chamber type, scanner type, label quantity, etc.

    Need for: CVs and ontologies

  • Excerpt from Sample Description (courtesy of M. Hoffman, S. Schmidtke, Lion BioSciences)

    Organism: Mus musculus [ NCBI taxonomy browser ]

    Cell source: in-house bred mice (contact: [email protected])

    Sex: female [ MGED ]

    Age: 3 - 4 weeks after birth [ MGED ]

    Growth conditions: normal controlled environment; 20 - 22 °C average temperature; housed in cages according to EU legislation; specified pathogen free conditions (SPF); 14 hours light cycle; 10 hours dark cycle

    Developmental stage: stage 28 (juvenile (young) mice) [ GXD "Mouse Anatomical Dictionary" ]

    Organism part: thymus [ GXD "Mouse Anatomical Dictionary" ]

    Strain or line: C57BL/6 [ International Committee on Standardized Genetic Nomenclature for Mice ]

    Genetic Variation: Inbr (J) 150. Origin: substrains 6 and 10 were separated prior to 1937. This substrain is now probably the most widely used of all inbred strains. Substrain 6 and 10 differ at the H9, Igh2 and Lv loci. Maint. by J, N, Ola. [ International Committee on Standardized Genetic Nomenclature for Mice ]

    Treatment: in vivo [ MGED ] [ intraperitoneal ] injection of [ Dexamethasone ] into mice, 10 microgram per 25 g bodyweight of the mouse

    Compound: drug [ MGED ] synthetic [ glucocorticoid ] [ Dexamethasone ], dissolved in PBS

  • Part II - Talk structure. Data transfer from IMD to ArrayExpress: Can data be parsed? MIAME-compliant?

    Toxicology specific MIAMExpress interface: ILSI toxicogenomics data submission

    Areas of collaboration - Summary

  • Toxicology specific MIAMExpress. Toxicology specific interface options: in vivo or in vitro; study specific (Hepatotoxicity, Nephrotoxicity, Genotoxicity). CVs and ontologies to be developed: CVs in pull down menus; Q,V,S user-driven ontologies; extend MGED ontology to include toxicology specific terms. Dynamic, fast and easy to use. Browse: Protocols, Arrays

  • Areas of collaboration. Data transfer: parser from IMD to ArrayExpress (MAGE-ML). Additional information required: MIAME compliant flag (e.g. data, protocols, sample pooling, etc.); build complex queries. Data submission: submission via toxicology specific MIAMExpress; CVs and ontologies; interface options; protocols. Other data: volume (79 from Hepatotoxicity); clinical chemistry, histopathology; format (images also?) and volume. Mailing list