Analysis and Interpretation of Microarray Data Michael F. Miles, M.D., Ph.D. Depts. of Pharmacology/Toxicology and Neurology and the Center for Study of Biological Complexity Virginia Commonwealth University Richmond, VA [email protected]


Analysis and Interpretation of Microarray Data

Michael F. Miles, M.D., Ph.D.

Depts. of Pharmacology/Toxicology and Neurology and the Center for Study of Biological Complexity

Virginia Commonwealth University

Richmond, VA

[email protected]


Expression Profiling: A Non-biased, Genomic Approach to Resolving the Mechanisms of Addiction

Candidate Gene Studies

Cycles of Expression Profiling:

“Molecular Triangulation”

Merge with Biological Databases


High Density DNA Microarrays


Oligonucleotide Array Analysis

[Figure: workflow. Total RNA (AAAA) + Oligo(dT)-T7 → RTase/Pol II → dsDNA (AAAA-T7/TTTT-T7) → T7 pol + CTP-biotin → Biotin-cRNA → Hybridization → Streptavidin-phycoerythrin → Scanning (PM and MM probes)]


Stepwise Analysis of Microarray Data

• Low-level analysis -- image analysis, expression quantitation

• Primary analysis -- is there a change in expression?

• Secondary analysis -- what genes show correlated patterns of expression? (supervised vs. unsupervised)

• Tertiary analysis -- is there a phenotypic “trace” for a given expression pattern?


Hybridization and Scanning

GE Database (SQL Server)

Primary Analysis (MAS-5, S-score, d-chip, PDNN)

Clustering Techniques

Statistical Filtering (e.g. SAM)

Overlay Biological Databases (PubGene, GenMAPP, EASE, WebQTL, etc.)

Provisional Gene “Patterns”

Filtered Gene Lists

Candidate Genes

Molecular Validation (RT-PCR, in situ, Western)

Behavioral Validation

Normalize, De-noise

Experimental Design


Quality Assessment

• Gene specific: R/G correlation, %BG, %spot, biological variation

• Array specific: normalization factor, % genes present, linearity, control/spike performance (e.g. 5’/3’ ratio, intensity)

• Across arrays: linearity, correlation, background, normalization factors
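The across-array correlation check listed above can be sketched as a simple screen. This is only an illustration under stated assumptions, not the pipeline used in the talk: the array names, intensities, the use of log2 values, the median summary and the 0.9 cutoff are all hypothetical choices.

```python
import math
from itertools import combinations
from statistics import mean, median


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def flag_outlier_arrays(arrays, cutoff=0.9):
    """Return names of arrays whose median correlation with the others is low.

    arrays: dict mapping array name -> list of intensities (same gene order).
    Correlations use log2 intensities; cutoff is an assumed value.
    """
    logged = {name: [math.log2(v) for v in vals] for name, vals in arrays.items()}
    per_array = {name: [] for name in arrays}
    for a, b in combinations(arrays, 2):
        r = pearson(logged[a], logged[b])
        per_array[a].append(r)
        per_array[b].append(r)
    return sorted(name for name, rs in per_array.items() if median(rs) < cutoff)


# Hypothetical intensities for five genes on four arrays.
arrays = {
    "chip_A": [100, 220, 410, 800, 1650],
    "chip_B": [110, 200, 400, 820, 1600],
    "chip_C": [95, 210, 390, 790, 1700],
    "chip_D": [900, 150, 1200, 100, 400],  # deliberately discordant
}
flagged = flag_outlier_arrays(arrays)
print(flagged)
```

The median is used so that one discordant chip does not drag down the summary for every other chip.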


Sources of Variance in Microarray Experiments

Type of variance and contributing factors:

• Biological: animal-animal differences (intra/inter cage, supplier), genotype, circadian rhythms, stress

• Technical: sample treatment/harvesting (dissections, injections), target preparation (enzyme lots, mRNA quality), lot-to-lot chip variation, chip processing (scanning order)

• Environmental: temperature, handling, noise/odors


Chip Normalization Procedures

• Whole chip intensity
– Assumes relatively few changes, uniform error/noise across chip and abundance classes
– Linear vs. “piece wise” linear (quantile, lowess)

• Spiked standards
– Requires exquisite technical control, assumes uniform behavior

• Internal Standards
– Assumes no significant regulation
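Two of the whole-chip options above, linear scaling and quantile normalization, can be sketched in a few lines. This is a minimal illustration with made-up values; the function names and the target intensity are assumptions, and production tools handle ties, missing values and trimming more carefully.

```python
from statistics import mean


def scale_to_target(values, target=100.0):
    """Linear whole-chip scaling: multiply so the array mean equals target."""
    factor = target / mean(values)
    return [v * factor for v in values], factor


def quantile_normalize(arrays):
    """Quantile normalization of equal-length arrays (ties broken by position).

    Each value is replaced by the mean, across arrays, of the values that
    hold the same rank, so all arrays end up sharing one distribution.
    """
    n = len(arrays[0])
    sorted_arrays = [sorted(a) for a in arrays]
    rank_means = [mean(s[i] for s in sorted_arrays) for i in range(n)]
    result = []
    for a in arrays:
        order = sorted(range(n), key=lambda i: a[i])  # indices, low to high
        normalized = [0.0] * n
        for rank, idx in enumerate(order):
            normalized[idx] = rank_means[rank]
        result.append(normalized)
    return result


# Hypothetical signals for four probe sets on three chips.
chip1 = [5.0, 2.0, 3.0, 4.0]
chip2 = [4.0, 1.0, 4.5, 2.0]
chip3 = [3.0, 4.0, 6.0, 8.0]

scaled, factor = scale_to_target(chip1, target=7.0)
normalized = quantile_normalize([chip1, chip2, chip3])
print(factor, scaled)
print(normalized)
```

Linear scaling changes only the overall level of a chip, while quantile normalization also forces the shape of the intensity distribution to match across chips, which is why it depends more heavily on the "relatively few changes" assumption.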


Slide Normalization: Pieces and Pins

[Figure panels: “Lowess” normalization, pin-specific profiles; after print-tip normalization]

See also: Schuchhardt, J. et al., NAR 28: e47 (2000)

http://www.ipam.ucla.edu/publications/fg2000/fgt_tspeed9.pdf
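The idea of normalizing each print tip separately can be illustrated with a much simpler stand-in for lowess: centering the log2(R/G) ratios of each pin group at their median. This is not the lowess fit shown on the slide, only a sketch of the per-pin grouping; the spot tuples and function name are hypothetical.

```python
import math
from collections import defaultdict
from statistics import median


def print_tip_median_center(spots):
    """Center log2(R/G) at zero within each print-tip group.

    spots: list of (pin_id, red, green) tuples.
    Returns a list of (pin_id, normalized_log_ratio) in input order.
    """
    by_pin = defaultdict(list)
    for pin, red, green in spots:
        by_pin[pin].append(math.log2(red / green))
    pin_median = {pin: median(vals) for pin, vals in by_pin.items()}
    return [(pin, math.log2(red / green) - pin_median[pin])
            for pin, red, green in spots]


# Hypothetical spots: pin 1 carries a two-fold red bias, pin 2 does not.
spots = [
    (1, 200, 100), (1, 400, 200), (1, 800, 200),
    (2, 100, 100), (2, 50, 100), (2, 100, 100),
]
centered = print_tip_median_center(spots)
print(centered)
```

A lowess-based version would replace the per-pin median with an intensity-dependent curve fitted within each pin group.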


Affymetrix Arrays: PM-MM Difference Calculation

Probe pairs control for non-specific hybridization of oligonucleotides
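A minimal sketch of the PM-MM calculation for one probe set follows. It assumes a plain trimmed mean that drops the single highest and lowest difference; the intensities are invented, and the actual Affymetrix outlier rules are not reproduced here.

```python
from statistics import mean


def average_difference(pm, mm, trim=1):
    """Trimmed mean of PM - MM across the probe pairs of one probe set.

    trim: number of differences dropped from each end (an assumed setting).
    """
    diffs = sorted(p - m for p, m in zip(pm, mm))
    if len(diffs) <= 2 * trim:
        raise ValueError("not enough probe pairs to trim")
    kept = diffs[trim:len(diffs) - trim] if trim else diffs
    return mean(kept)


# Hypothetical intensities for six probe pairs.
pm = [1200, 950, 1100, 4000, 1020, 980]
mm = [400, 380, 420, 450, 1100, 360]

trimmed = average_difference(pm, mm)
untrimmed = average_difference(pm, mm, trim=0)
print(trimmed, untrimmed)
```

The fifth pair, where MM exceeds PM, gives a negative difference, which is one reason later methods model or drop the MM signal.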


Probe Level Analysis: Challenges

• Large variability in PM and MM intensities

• Only 11-25 probe pairs

• MM is a complex mixture of true signal and background

• Normalization required to compare across chips

• Intensity dependent noise

• Etc.


Probe Level Analysis Methods

• AvgDiff -- Affymetrix 1996, trimmed mean with exclusion of outliers, PM-MM

• MAS 5 -- Affymetrix 2001, modeled correction of MM, Tukey’s bi-weight, PM-MM or PM-m

• MBEI -- Li and Wong 2001, modeled correction and outlier detection, PM-MM or PM only

• RMA (Robust Multi-array Average) -- Irizarry et al. 2002, PM only

• PDNN (Position Dependent Nearest Neighbor) -- Zhang et al. 2003, thermodynamic model for probe interactions, PM only
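The Tukey bi-weight named in the MAS 5 entry is a robust average that down-weights probes far from the median. The sketch below shows only that one step, applied to hypothetical log2 probe signals; the tuning constants `c` and `eps` are assumptions, and the MM correction and the rest of the MAS 5 procedure are omitted.

```python
from statistics import median


def tukey_biweight(values, c=5.0, eps=1e-4):
    """One-step Tukey biweight estimate of location.

    Values far from the median, measured in units of the median absolute
    deviation (MAD), receive low or zero weight.
    """
    m = median(values)
    mad = median(abs(v - m) for v in values)
    weights = []
    for v in values:
        u = (v - m) / (c * mad + eps)
        weights.append((1 - u * u) ** 2 if abs(u) < 1 else 0.0)
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


# Hypothetical log2 signals for one probe set; the last probe is an outlier.
probe_signals = [8.1, 8.3, 7.9, 8.0, 8.2, 12.5]
robust = tukey_biweight(probe_signals)
plain = sum(probe_signals) / len(probe_signals)
print(robust, plain)
```

The outlying probe pulls the plain mean upward but receives zero weight in the bi-weight estimate.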


MAS 5 Fold-Change vs. S-scores


Secondary Analysis: Expression Patterns

• Supervised multivariate analyses
– Support vector machines

• Non-supervised clustering methods
– Hierarchical
– K-means
– SOM
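As an illustration of the hierarchical option above, here is a small average-linkage clustering of expression profiles using 1 - Pearson correlation as the distance. The gene names and values are hypothetical, the linkage and distance choices are assumptions, and the code is written for clarity rather than speed (profiles must not be constant).

```python
import math
from statistics import mean


def correlation_distance(x, y):
    """1 - Pearson correlation between two expression profiles."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 1.0 - sxy / math.sqrt(sxx * syy)


def hierarchical_clusters(profiles, n_clusters):
    """Average-linkage agglomerative clustering, stopped at n_clusters.

    profiles: dict mapping gene name -> list of expression values.
    Returns a sorted list of sorted gene-name lists.
    """
    clusters = [[name] for name in profiles]

    def linkage(c1, c2):
        return mean(correlation_distance(profiles[a], profiles[b])
                    for a in c1 for b in c2)

    while len(clusters) > n_clusters:
        # Find the closest pair of clusters and merge them.
        pairs = ((i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters)))
        i, j = min(pairs, key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return sorted(sorted(c) for c in clusters)


# Hypothetical profiles across four conditions.
profiles = {
    "gene_a": [1.0, 2.0, 3.0, 4.0],
    "gene_b": [2.0, 4.1, 5.9, 8.0],
    "gene_c": [4.0, 3.0, 2.0, 1.0],
    "gene_d": [8.0, 6.2, 3.9, 2.0],
    "gene_e": [1.5, 2.4, 3.6, 4.4],
}
groups = hierarchical_clusters(profiles, n_clusters=2)
print(groups)
```

Correlation distance groups genes by the shape of their profile rather than by absolute level, so `gene_a` and `gene_b` cluster together despite a two-fold difference in magnitude.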


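Use of S-score in Hierarchical Clustering of Brain Regional Expression Patterns

[Figure: two hierarchical clustering heat maps, one computed from AvgDiff and one from S-score, with columns for PFC, HIP, NAC and VTA; color scale shows relative change from -2 to +2]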


Tertiary Analysis: Connecting Function with Expression Patterns

• Annotation
– UniGene/Swiss-Prot, SOURCE, DAVID

• Biased functional assessment
– Manual, GenMAPP, GeneSpring

• Non-biased functional queries
– PubGene
– MAPPFinder, DAVID/EASE, GEPAS, GOTree Machine, others

• Overlaying genomics and genetics
– WebQTL


Non-biased (semi) Functional Group Analysis: GenMAPP


Expression Analysis Systematic Explorer -- EASE

http://apps1.niaid.nih.gov/david/upload.jsp

Genome Biol. 2003;4(10):R70. Epub 2003 Sep 11.
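The kind of over-representation statistic that tools such as EASE report for a gene list can be sketched with a hypergeometric upper tail. The penalized variant below, which subtracts one gene from the observed category hits, is only a conservative approximation in the spirit of the EASE score; the exact contingency-table bookkeeping used by EASE is not reproduced, and all counts are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) when n genes are drawn from N, of which K are in the category."""
    upper = min(n, K)
    total = comb(N, n)
    hits = sum(comb(K, x) * comb(N - K, n - x) for x in range(max(k, 0), upper + 1))
    return hits / total


def penalized_score(k, n, K, N):
    """Conservative variant: one category hit is removed before testing."""
    return hypergeom_upper_tail(max(k - 1, 0), n, K, N)


# Hypothetical counts: 1000 genes on the array, 40 in a GO category,
# 50 genes in the filtered list, 8 of them in the category.
N, K, n, k = 1000, 40, 50, 8
p_plain = hypergeom_upper_tail(k, n, K, N)
p_penalized = penalized_score(k, n, K, N)
print(p_plain, p_penalized)
```

Removing one hit matters most for categories supported by only a few genes, which makes such categories less likely to rank highly by chance.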


EASE -- Options in Analysis


Efforts to Integrate Diverse Biological Databases with Expression Information: PubGene

www.PubGene.org


[Figure: expression heat map with rows grouped under labels 1-11; columns for NAC, PFC and VTA, each subdivided into B6 Et, D2 Et and B6/D2]

Functional Annotation Association Mining (EASE)

High-throughput Literature Association Mining (PubGene)

Genetic Associations (WebQTL)

Additional Expression Associations (Molecular Triangulation)


Analysis Stages for Oligonucleotide Microarrays

Analysis Stage | Description | Examples of Methods

Normalization | Equalizes overall signal across arrays to be compared; ensures linearity of response across abundance classes | Whole chip (26); Quantile (27)

Probe reduction | Combines signals from multiple probes or probe pairs to define “expression level”. Identifies genes with invalid or hyper-variable expression levels. | Weighted average (MAS 4) (29); Tukey bi-weight (MAS 5) (30); Model-based (MBEI) (31); Log scale linear additive (RMA) (32); Position-dependent stacking energy modeling (PDNN) (33)

Comparative | Compares expression of a gene across two or more arrays to determine significant changes in expression | t-test; rank order (MAS 5) (30); permutation (SAM) (46, 47); S-score (48)

Multivariate studies | Identifies significant correlations in expression data across experiments/conditions | hierarchical clustering; k-means clustering; self-organizing maps; principal components analysis; & many more (34, 49)

Biological overlay | Identifies functions for given genes, clusters of genes; hypothesis generation | Multiple database access (Source) (50); PubMed correlations (PubGene) (51); Gene Ontology rankings (GenMAPP, MAPPFinder, DAVID/EASE) (52, 53)
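The permutation idea listed under the comparative stage can be illustrated for a single gene with an exact two-group test on the difference in means. This is a plain permutation test, not SAM itself, which uses a moderated statistic and estimates a false discovery rate across all genes; the expression values and group names are hypothetical.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(group_a, group_b):
    """Exact two-sided permutation p-value for a difference in group means.

    Every relabeling of the pooled values into groups of the original sizes
    is enumerated, so this is practical only for small groups.
    """
    pooled = list(group_a) + list(group_b)
    n_a, n_b = len(group_a), len(group_b)
    observed = abs(mean(group_a) - mean(group_b))
    total = sum(pooled)
    hits = count = 0
    for idx in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in idx)
        diff = abs(sum_a / n_a - (total - sum_a) / n_b)
        count += 1
        if diff >= observed - 1e-12:  # tolerance for floating-point rounding
            hits += 1
    return hits / count


# Hypothetical log2 expression of one gene in four control and four treated arrays.
control = [7.1, 7.3, 6.9, 7.2]
treated = [8.4, 8.1, 8.6, 8.3]
p_value = permutation_p_value(control, treated)
print(p_value)
```

With four arrays per group there are only 70 possible relabelings, so the smallest attainable two-sided p-value is 2/70, which shows how replicate number limits what a permutation test can resolve.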


Bioinformatics Resources for Microarray Experiments

Name | Description | Link

SOURCE | Human, rat, mouse gene compilation from multiple databases; allows batch submissions for annotation | http://source.stanford.edu/cgi-bin/sourceSearch

GeneLynx | Human, mouse gene compilation; multiple database links regarding gene/protein structure and function | http://www.genelynx.org/

DAVID/EASE | Mines gene list for frequency of GO categories; annotation of gene list; statistical analysis of biological themes in gene list (EASE) | http://apps1.niaid.nih.gov/David/upload.asp

GenMAPP/MAPPFinder | Superimposes array data on biological pathways; statistical ranking of functional groups | http://www.genmapp.org/

FatiGO | Mines gene list for occurrence of GO terms; statistical comparison of two lists for over-representation | http://fatigo.bioinfo.cnio.es/

PubGene | Finds associations between genes in biomedical literature; superimposes array data on literature links; commercial version available | http://www.pubgene.org/

MEME | Searches promoter regions of genes in list/cluster for conserved motifs | http://meme.sdsc.edu/meme/website/intro.html


Quaternary Analysis: Profiles to Physiology

[Diagram labels: Expression Networks; Expression Profiling; Pharmacology; Genetics; Complex Trait; Prot-Prot Interactions; Ontology; HomoloGene; BioMed Lit Relations]
