Publishing expression data from the SMD Catherine Ball Tuesday, May 30, 2006 array@genome.stanford.edu http://smd.stanford.edu/

Publishing expression data from the SMD

Catherine Ball
Tuesday, May 30, 2006
array@genome.stanford.edu
http://smd.stanford.edu/

User Help: Tutorials and Workshops

• SMD Help & FAQ
  – http://genome-www.stanford.edu/microarray/helpindex.html
• SMD Tutorials – regularly scheduled (we hope)
  – Welcome to SMD
  – Data analysis, Normalization and Clustering
  – Publishing expression data
  – Power users and the data repository
  – Interested? Email [email protected]

Publishing expression data : a tutorial

• What we won’t discuss:
  – User Registration
  – Loader Accounts
  – Submitting Data
  – Finding Your Data
  – Displaying Your Data
  – Data Retrieval and Analysis
  – Submitting a Printlist
  – Data Normalization
  – Data Quality Assessment
  – Data Analysis (clustering)
  – External User Tools (XCluster, TreeView, etc.)
• What we will discuss:
  – Publishing
    • Publisher’s requirements
    • Experimenter’s responsibilities
  – Hybridization Annotation
    • Categories, Subcategories
    • Protocols
    • Procedures and parameters
    • Clinical Data
  – Experiment Set Annotation
    • Organizing Data
    • Experiment Design Categories
    • Experimental Factors
    • Factor Values
  – Making your data available
    • SMD
    • Web Supplements
    • Public Data repositories

Please fill out the sign-up sheet and survey form

Questions? email us at: [email protected]

Publishing expression data

• Background
• Publishing requirements and responsibilities
• Pre-publication responsibilities
  – Hybridization Annotation
  – Experiment Set Annotation
• Post-publication responsibilities
  – Making your data available

Background : Interpretation and Analysis

• Extremely difficult to either interpret or analyze expression results without being aware of all the variables
  – Biological characteristics, experimental design, protocol parameters, filtering parameters, etc.
• Typically, these annotations, if they exist at all, are not attached to the data
  – Perhaps in a lab notebook, eventual publication (if ever published), or in the worst scenario, only in the experimenter’s head

Background : MGED

• Microarray Gene Expression Database Society
• http://www.mged.org/
• Initially established November 1999, Cambridge, UK.
• Realized there were serious problems in communicating genomic-scale expression results
• Keen interest in data standards, specifications, and transmission.

Background : Emerging standards

• MIAME : Minimum Information About a Microarray Experiment

– the requisite information needed to both verify your analysis and allow others to perform distinct analyses

– Nature Genetics (2001) 29, 365-371

• MAGE-ML: MicroArray Gene Expression Markup Language

– data format standard required for transmission and integration into other expression repositories

– Genome Biology (2002), 3(9):research0046.1–0046.9

Background : MIAME checklist

• MGED Guide to authors, editors and reviewers of microarray gene expression papers

• In the interests of full disclosure and open research, a checklist of requirements was proposed, aimed at allowing manuscript readers “to understand the experiment, to identify the sequences being assayed, and to interpret the resulting data.”

http://www.mged.org/Workgroups/MIAME/miame_checklist.html

Publication Requirement?

… also being adopted by Cell and The Lancet - others to follow…

Publishing responsibilities

• Pre-publication
  – Provide the data and full annotation to the reviewers and editors.
  – This may evolve to sending data to a repository prior to publication (reviewer anonymity)
• Post-publication
  – For the foreseeable future, provide a static snapshot of the raw result data and filtered/clustered data along with the gene annotation at the time of publication

Implications of MIAME for Stanford Microarray Researchers

• As of December 1, 2002, anyone submitting a paper to a Nature journal must submit his/her data to a public microarray data repository (such as ArrayExpress).

• SMD users should start assembling and entering experimental data in preparation for more widespread acceptance of these standards.

MIAME checklist

Six parts

1. Biological Samples

2. Hybridizations

3. Data Normalization and Transformation

4. Experimental Design and Factors

5. Array Design

6. Measurements
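
The six checklist parts lend themselves to a simple completeness check before submission. The following is a minimal Python sketch, not part of SMD: the function name `missing_miame_parts`, the dictionary layout, and the sample annotation strings are all hypothetical and only illustrate checking that every part has some annotation.

```python
# The six MIAME checklist parts, as listed on the slide above.
MIAME_PARTS = [
    "Biological Samples",
    "Hybridizations",
    "Data Normalization and Transformation",
    "Experimental Design and Factors",
    "Array Design",
    "Measurements",
]


def missing_miame_parts(annotation):
    """Return the checklist parts that have no (or empty) entry in `annotation`.

    `annotation` is a hypothetical dict mapping a part name to free text.
    """
    return [part for part in MIAME_PARTS if not annotation.get(part)]


# Illustrative draft annotation (made-up content, three parts still missing).
draft = {
    "Biological Samples": "sample source and growth conditions recorded",
    "Hybridizations": "hybridization protocol attached",
    "Array Design": "printlist submitted",
}

print(missing_miame_parts(draft))
```

A check like this only confirms that something was entered for each part; whether the annotation is adequate still has to be judged by the experimenter and the reviewers.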

SMD Stores Procedures

• Biological Sample (Channels 1 and 2)
• Growth Conditions (Channels 1 and 2)
• Treatment (Channels 1 and 2)
• Extract Preparation (Channels 1 and 2)
• Chromatin IP
• Amplification (Channels 1 and 2)
• Labeling (Channels 1 and 2)
• Hybridization Conditions
• Scanning Procedure (Channels 1 and 2)
• Feature Extraction
• User-defined Procedures

Recording Procedural Details : Two Mechanisms

• Full text Protocols
  – Great for providing the full documentation of the protocol to a fellow researcher, but…
  – Poor for indicating which experimental parameter is the key to the experimental design
• Procedural parameters
  – Great for supervised analysis and singling out the important details of the experiment, but…
  – Poor for synthesizing the entire procedure together in a legible manner
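
The contrast between the two mechanisms can be sketched as a small data model. This is an illustrative Python sketch under stated assumptions, not SMD's actual schema: the class names `Protocol` and `Hybridization`, the `(procedure, parameter)` key layout, and the sample values are hypothetical. It shows why structured parameters can be queried across hybridizations while full-text protocols are meant to be read.

```python
from dataclasses import dataclass, field


@dataclass
class Protocol:
    """Full-text documentation for one procedure: readable, not queryable."""
    procedure: str
    text: str = ""
    url: str = ""


@dataclass
class Hybridization:
    name: str
    protocols: list = field(default_factory=list)
    # Structured values keyed by (procedure, parameter name).
    parameters: dict = field(default_factory=dict)


def parameter_values(hybs, procedure, parameter):
    """Collect one structured parameter across a list of hybridizations."""
    return {h.name: h.parameters.get((procedure, parameter)) for h in hybs}


# Hypothetical example: one shared protocol, one parameter that varies.
labeling = Protocol("Labeling", text="Full labeling protocol text goes here.")
hybs = [
    Hybridization("hyb_001", [labeling], {("Treatment", "time_min"): 0}),
    Hybridization("hyb_002", [labeling], {("Treatment", "time_min"): 30}),
]

print(parameter_values(hybs, "Treatment", "time_min"))
```

Keeping both on the same record reflects the point of the slide: the protocol text documents the whole procedure, and the parameter values single out what actually differs between experiments.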

Where are the tools?

Enter New Data

View Existing Data

List Existing Protocols

• Display within SMD, or View external resource

• Edit your protocol from the list

Edit Existing Protocol

Entering a New Protocol

• Choose the procedure
• Supply the formatted plain text, or a simple description if providing the URL

Flowchart to Add Annotations

Edit your hybridizations

Use “Edit” to add procedural details to your experiments

Experiment Types

• CGH
  – Comparison of genomic copy number between samples (Comparative Genome Hybridization).
• Chromatin IP
  – Investigation of DNA-protein interactions in which protein-bound DNA is immunoprecipitated.
• Expression (Type I)
  – Investigation of gene expression where the control sample is tailored to the particular experiment (not a common reference).
• Expression (Type II)
  – Investigation of gene expression where the control RNA is made from a common reference.
• GMS
  – Genome Mismatch Scanning. Investigation of the parental origin of genomic DNA.
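
The experiment types above form a closed set, so they can be represented as an enumeration. The Python sketch below is illustrative only: the `ExperimentType` enum and the `expression_type` helper are hypothetical names, and the only rule encoded is the one stated on the slide (Type II uses a common reference, Type I a control tailored to the experiment).

```python
from enum import Enum


class ExperimentType(Enum):
    """Experiment types as named on the slide above."""
    CGH = "CGH"
    CHROMATIN_IP = "Chromatin IP"
    EXPRESSION_TYPE_I = "Expression (Type I)"
    EXPRESSION_TYPE_II = "Expression (Type II)"
    GMS = "GMS"


def expression_type(uses_common_reference):
    """Pick the expression type from the kind of control sample used."""
    if uses_common_reference:
        return ExperimentType.EXPRESSION_TYPE_II
    return ExperimentType.EXPRESSION_TYPE_I


print(expression_type(True).value)
print(expression_type(False).value)
```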

Edit your hybridizations

Use “Edit” to add procedural details to your experiments

Associating a protocol with a hybridization

• Associate a previously entered protocol
• Enter a new one, if need be

Adding Procedural Parameter Values for a Hybridization

• Same interface is used to add experimental parameter values

• Parameter values are linked directly to the hybridization

• Procedural parameters are modeled as experimental factors

Edit your hybridizations

Use “Edit” to add clinical annotation to your experiments

Associating Patient Information

• Patient parameters we store
  – Age at diagnosis
  – Sex
  – Ethnicity
  – Family History
  – Status
  – Time from Operation to Death
  – Date of last follow-up
  – Patient lost prior to follow-up?

Associating Clinical Sample Information

• Sample parameters we store
  – Tracking Information
  – Unique Sample ID
  – Linking Database
  – Sample Information
  – Sample Source
  – Time Post-mortem (hrs) of sample removal
  – Sample State, Size
  – Granularity
  – Organ of origin
  – Attending Surgeon
  – Pre-Operative Information
  – Prior Treatment
  – Clinical Stage
  – Post-Operative Information
  – Tumor Grade, Size, Type
  – Margins
  – Time from Diagnosis To Operation
  – Angioinvasion
  – Total Lymph Nodes
  – Positive Lymph Nodes
  – Pathological Stages
  – FollowUp Information
  – Recurrence
  – Post Operative Therapy
  – Time from Operation to Recurrence

Batch Association of Annotations

Batch Entry

MIAME checklist

Six parts

1. Biological Samples

2. Hybridizations

3. Data Normalization and Transformation

4. Experimental Design and Factors

5. Array Design

6. Measurements

MIAME checklist : Data Normalization and Transformation

MIAME checklist

Six parts

1. Biological Samples

2. Hybridizations

3. Data Normalization and Transformation

4. Experimental Design and Factors

5. Array Design

6. Measurements

MIAME : Experimental Design

• Experimental Design and Factors
  – type of experiment (set of hybridizations)
  – the number of hybridizations performed
  – experimental factors
  – hybridization design
  – the type of reference used for the hybridization
  – quality control steps taken

Organizing Data: Arraylists vs Experiment Sets

• Arraylists
  – Personal list of experiments
  – Contains no annotation
  – More difficult to share with others
  – Flat file that exists in your loader account
  – Accessed through Advanced Search
• Experiment Sets
  – Annotated list of experiments
  – Exists in the database, therefore dynamic (edit, delete, or annotate through a web interface)
  – Easily shared with other users/collaborators
  – Extensible
  – Accessed through Basic Search
  – Required for publication within SMD

Easily convert your arraylist into an experiment set
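
Conceptually, the conversion wraps a bare list of experiments in a record that can carry annotation. The Python sketch below is a hypothetical illustration, not the SMD tool: it assumes an arraylist is a flat text file with one experiment identifier per line (the real file format is not described here), and the function name, field names, and identifiers are invented for the example.

```python
def arraylist_to_experiment_set(arraylist_text, name, description=""):
    """Turn a flat list of experiment IDs into an annotatable experiment set.

    Assumes one experiment ID per line; blank lines and repeated IDs are dropped.
    """
    experiment_ids = []
    for line in arraylist_text.splitlines():
        line = line.strip()
        if line and line not in experiment_ids:
            experiment_ids.append(line)
    return {
        "name": name,
        "description": description,   # e.g. an abstract or figure legend
        "experiments": experiment_ids,
        "factors": {},                # experimental factors, filled in later
    }


# Made-up arraylist contents with a blank line and a repeated ID.
arraylist = "1001\n1002\n\n1003\n1002\n"
exp_set = arraylist_to_experiment_set(
    arraylist, "Example time course", "Figure legend text goes here."
)
print(exp_set["experiments"])
```

The arraylist itself carries no annotation, so the description and factors start empty or are supplied by the experimenter at conversion time.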

Experiment Set Creation

Selecting the data for inclusion within the experiment set

• Select experiments using either the basic or advanced search as a starting point

Experiment Set Organization

Base Annotation for the Experiment Set

– Set description
  • For publications, this would likely be either the abstract or a figure legend

Finding Your Sets in SMD: Basic Search

Experiment Sets allow you to search data on pre-defined experiment groups.

Edit your Experiment Set

Experiment Factors : Step 1

Procedures Parameters Measurements?

Experiment Factors : Step 2

These values can be automatically acquired/suggested from your procedural parameter values, but only if you have annotated your experiments.

Note: full text protocols cannot be utilized for this purpose, but fulfill their own purpose.
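
One plausible way such suggestions could work is to propose, as experimental factors, the parameters whose values differ across the hybridizations in a set. The Python sketch below illustrates that idea only; the "varies across the set" rule, the function name `suggest_factors`, and the sample parameters are assumptions, not a description of how SMD actually derives its suggestions.

```python
def suggest_factors(hyb_parameters):
    """Suggest factors from annotated procedural parameter values.

    `hyb_parameters` maps a hybridization name to a dict of
    parameter name -> value. A parameter is suggested as a factor
    when it takes more than one distinct value across the set.
    """
    values = {}
    for params in hyb_parameters.values():
        for name, value in params.items():
            values.setdefault(name, set()).add(value)
    return {
        name: sorted(vals) for name, vals in values.items() if len(vals) > 1
    }


# Hypothetical annotated set: only the time point varies.
hybs = {
    "hyb_001": {"time_min": 0, "temperature_C": 37, "strain": "wild type"},
    "hyb_002": {"time_min": 15, "temperature_C": 37, "strain": "wild type"},
    "hyb_003": {"time_min": 30, "temperature_C": 37, "strain": "wild type"},
}

print(suggest_factors(hybs))
```

Hybridizations with no parameter annotation contribute nothing to the result, which mirrors the caveat above: suggestions are only possible for experiments that have been annotated.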

Benefits of Experiment Annotation

• Meet MIAME requirements
• Meet publishing requirements (see above)

• Serve as a basis for new analysis tools

Post-publication responsibilities

• Making your data easily available and accessible for the foreseeable future
  – SMD
  – web supplement
  – public repositories

Post-publication : SMD

• Send us the name of your MIAME-annotated experiment set

• We’ll make the arrays world-viewable for you, and publicize your paper

• Gene annotations and normalizations may change, so you must also provide a distinct, static view (web supplement)

Contact [email protected]

Post-publication : web supplement

• We encourage you to make a web supplement, which represents a snapshot of the data, as published
• Options:
  1. You can make the web-site and host it on your own.
  2. You can make the web-site on your own and you can ask us to host it.
  3. You can ask us to construct one for you. Usually, given the amount of work that this entails (ask us ahead of time), the curator creating the website will expect collaborative consideration.

Contact [email protected]

Post-publication : repositories

– Submit your data to a public repository
  • ArrayExpress at the EBI
    – http://www.ebi.ac.uk/arrayexpress/
  • Gene Expression Omnibus (GEO) at NCBI
    – http://www.ncbi.nlm.nih.gov/geo/
– We produce valid MAGE-ML for experiment sets and array designs and can communicate these to the repositories for you

Contact [email protected]

If you require assistance with either the creation of a web supplement or submission of your dataset to a repository, contact us at [email protected]

MIAME Resources

• MIAME working group
  – http://www.mged.org/miame
• MIAME checklist for authors, editors
  – http://www.mged.org/miame/miame_checklist.html

SMD: Getting Help

• Click on the “Help” menu
  – Tool-specific links will be listed at the top.
• Use the SMD help index to look for specific subjects
• Send e-mail to: [email protected]

SMD: Office Hours

• Grant building, S201
• Mondays 1-3 pm
• Wednesdays 2-4 pm

SMD Staff

Gavin Sherlock, Co-Investigator
Catherine Ball, Director
Janos Demeter, Computational Biologist
Catherine Beauheim, Scientific Programmer
Heng Jin, Scientific Programmer
Patrick Brown, Co-Investigator
Farrell Wymore, Lead Programmer
Michael Nitzberg, Database Administrator
Zac Zachariah, Systems Administrator
Don Maier, Senior Software Engineer
Takashi Kido, Visiting Scholar