Analysis and Interpretation of Microarray Data
Michael F. Miles, M.D., Ph.D.
Depts. of Pharmacology/Toxicology and Neurology and the Center for Study of Biological Complexity
Virginia Commonwealth University
Richmond, VA
Expression Profiling: A Non-biased, Genomic Approach to Resolving the Mechanisms of Addiction
Candidate Gene Studies
Cycles of Expression Profiling:
“Molecular Triangulation”
Merge with Biological Databases
High Density DNA Microarrays
Oligonucleotide Array Analysis
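[Figure: target preparation and detection. Total RNA (poly-A, AAAA) is primed with oligo(dT)-T7 and converted by RTase/Pol II to double-stranded cDNA carrying a T7 promoter; T7 pol with CTP-biotin yields biotin-cRNA, followed by hybridization, streptavidin-phycoerythrin staining, and scanning of PM and MM probes.]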
Stepwise Analysis of Microarray Data
• Low-level analysis -- image analysis, expression quantitation
• Primary analysis -- is there a change in expression?
• Secondary analysis -- what genes show correlated patterns of expression? (supervised vs. unsupervised)
• Tertiary analysis -- is there a phenotypic “trace” for a given expression pattern?
Hybridization and Scanning
GE Database (SQL Server)
Primary Analysis
(MAS-5, S-score, d-chip, PDNN)
Clustering Techniques
Statistical Filtering
(e.g. SAM)
Overlay Biological Databases (PubGene, GenMAPP, EASE, WebQTL, etc.)
Provisional Gene
“Patterns”
Filtered Gene Lists
Candidate Genes
Molecular Validation
(RT-PCR, in situ, Western)
Behavioral Validation
Normalize, De-noise
Experimental Design
Quality Assessment
• Gene specific: R/G correlation, %BG, %spot, biological variation
• Array specific: normalization factor, % genes present, linearity, control/spike performance (e.g. 5’/3’ ratio, intensity)
• Across arrays: linearity, correlation, background, normalization factors
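The "across arrays" correlation check can be sketched as follows. This is a minimal illustration, not a method named in the slides: the function names, the median-correlation rule, the 0.9 cutoff, and the intensity values are all assumptions made for the example.

```python
from math import sqrt
from statistics import median

def pearson(x, y):
    """Pearson correlation between two equal-length intensity vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def flag_outlier_arrays(arrays, min_median_r=0.9):
    """Return names of arrays whose median correlation with the others is low.

    The 0.9 cutoff is an illustrative assumption, not a recommended value.
    """
    flagged = []
    for name in arrays:
        others = [pearson(arrays[name], arrays[o]) for o in arrays if o != name]
        if median(others) < min_median_r:
            flagged.append(name)
    return flagged

# Hypothetical log2 intensities for four genes on four arrays.
arrays = {
    "chip_A": [7.1, 9.8, 5.2, 11.0],
    "chip_B": [7.3, 9.5, 5.0, 11.2],
    "chip_C": [10.9, 5.1, 9.9, 7.0],  # behaves unlike the other three
    "chip_D": [6.9, 9.9, 5.4, 10.8],
}
print(flag_outlier_arrays(arrays))  # ['chip_C']
```

The median is used rather than the mean so that one aberrant chip does not drag down the score of every other chip.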
Sources of Variance in Microarray Experiments
Type of variance and contributing factors:
• Biological
– Animal-animal differences (intra/inter cage, supplier)
– Genotype
– Circadian rhythms
– Stress
• Technical
– Sample treatment/harvesting (dissections, injections)
– Target preparation (enzyme lots, mRNA quality)
– Lot-to-lot chip variation
– Chip processing (scanning order)
• Environmental
– Temperature
– Handling
– Noise/odors
Chip Normalization Procedures
• Whole chip intensity
– Assumes relatively few changes, uniform error/noise across chip and abundance classes
– Linear vs. “piecewise” linear (quantile, lowess)
• Spiked standards
– Requires exquisite technical control, assumes uniform behavior
• Internal Standards
– Assumes no significant regulation
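Two of the whole-chip approaches listed above can be sketched in a few lines. This is a simplified illustration: the function names, the target intensity of 100, and the toy data are assumptions, ties are broken by position, and lowess is not shown.

```python
def scale_to_target(chip, target=100.0):
    """Linear whole-chip normalization: rescale so the mean intensity equals target."""
    factor = target / (sum(chip) / len(chip))
    return [v * factor for v in chip]

def quantile_normalize(chips):
    """Quantile normalization: give every chip the same intensity distribution.

    chips is a list of equal-length lists, one list per chip.
    """
    n = len(chips[0])
    # Sort each chip, then average across chips at each rank.
    sorted_chips = [sorted(c) for c in chips]
    rank_means = [sum(s[i] for s in sorted_chips) / len(chips) for i in range(n)]
    normalized = []
    for chip in chips:
        order = sorted(range(n), key=lambda i: chip[i])  # probe indices, low to high
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = rank_means[rank]  # replace each value by the mean at its rank
        normalized.append(out)
    return normalized

# Hypothetical intensities for three probes on two chips.
chips = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]
print(scale_to_target([1.0, 3.0]))   # [50.0, 150.0]
print(quantile_normalize(chips))     # [[5.5, 1.5, 3.5], [3.5, 1.5, 5.5]]
```

Linear scaling applies one factor to the whole chip, whereas quantile normalization can also correct intensity-dependent differences between chips. Both still rest on the assumption that relatively few genes change.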
“Lowess” normalization, Pin-specific Profiles
After Print-tip Normalization
Slide Normalization: Pieces and Pins
See also: Schuchhardt, J. et al., NAR 28: e47 (2000)
http://www.ipam.ucla.edu/publications/fg2000/fgt_tspeed9.pdf
Affymetrix Arrays: PM-MM Difference Calculation
Probe pairs control for non-specific hybridization of oligonucleotides
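The PM-MM difference for one probe set can be sketched as below. This is only an illustration: the fixed trimming rule, the function name, and the intensities are assumptions, and it is not the exact outlier exclusion used by the Affymetrix software.

```python
def average_difference(pm, mm, trim=1):
    """Trimmed mean of PM - MM over the probe pairs of one probe set.

    Drops the `trim` lowest and highest differences before averaging
    (an illustrative simplification of outlier exclusion).
    """
    diffs = sorted(p - m for p, m in zip(pm, mm))
    kept = diffs[trim:len(diffs) - trim] if trim else diffs
    return sum(kept) / len(kept)

# Hypothetical intensities for five probe pairs; the fourth PM is an outlier.
pm = [1200, 950, 1100, 4000, 1020]
mm = [300, 280, 350, 310, 900]
print(average_difference(pm, mm))
```

With these values the differences are 900, 670, 750, 3690 and 120; trimming removes 120 and 3690, so the result is the mean of the remaining three.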
Probe Level Analysis: Challenges
• Large variability in PM and MM intensities
• Only 11-25 probe pairs
• MM is a complex mixture of true signal and background
• Normalization required to compare across chips
• Intensity dependent noise
• Etc.
Probe Level Analysis Methods
• AvgDiff -- Affymetrix 1996, trimmed mean with exclusion of outliers, PM-MM
• MAS 5 -- Affymetrix 2001, modeled correction of MM, Tukey’s bi-weight, PM-MM or PM-m
• MBEI -- Li and Wong 2001, modeled correction and outlier detection, PM-MM or PM only
• RMA (Robust Multichip Analysis) -- Irizarry et al. 2002, PM only
• PDNN (Position Dependent Nearest Neighbor) -- Zhang et al. 2003, thermodynamic model for probe interactions, PM only
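The Tukey bi-weight step mentioned for MAS 5 can be illustrated with a one-step estimator. This is a sketch under stated assumptions: the tuning constant c = 5 and the small eps term are the values commonly quoted for this estimator, the input is assumed to be log-scale probe values, and the MM correction itself is not shown.

```python
from statistics import median

def tukey_biweight(values, c=5.0, eps=1e-4):
    """One-step Tukey biweight: a weighted mean that down-weights outliers.

    c and eps are assumed tuning values, not taken from the slides.
    """
    m = median(values)
    mad = median([abs(v - m) for v in values])  # median absolute deviation
    weights = []
    for v in values:
        u = (v - m) / (c * mad + eps)  # scaled distance from the median
        weights.append((1 - u * u) ** 2 if abs(u) < 1 else 0.0)
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)

# Hypothetical log2 probe values; the last probe is an outlier.
probes = [8.0, 8.2, 7.9, 8.1, 12.0]
print(tukey_biweight(probes))      # close to 8.06; the outlier gets zero weight
print(sum(probes) / len(probes))   # plain mean (about 8.84) is pulled upward
```

Probes far from the median receive zero weight, which is why the estimate stays near the bulk of the probe set.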
MAS 5 Fold-Change vs. S-scores
Secondary Analysis: Expression Patterns
• Supervised multivariate analyses
– Support vector machines
• Non-supervised clustering methods
– Hierarchical
– K-means
– SOM
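As an illustration of the non-supervised route, the sketch below performs a small agglomerative (hierarchical) clustering. Average linkage, Euclidean distance, the gene names, and the values are all assumptions chosen for the example; real analyses often use correlation-based distances.

```python
from math import sqrt

def euclidean(a, b):
    """Euclidean distance between two expression profiles."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def average_linkage(c1, c2, profiles):
    """Mean pairwise distance between the members of two clusters."""
    dists = [euclidean(profiles[i], profiles[j]) for i in c1 for j in c2]
    return sum(dists) / len(dists)

def hierarchical_clusters(profiles, k):
    """Agglomerative clustering: merge the two closest clusters until k remain."""
    clusters = [[name] for name in profiles]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = average_linkage(clusters[a], clusters[b], profiles)
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]

# Hypothetical relative-change profiles (one value per condition) for four genes.
profiles = {
    "gene1": [2.0, 1.8, -0.1],
    "gene2": [1.9, 2.1, 0.0],
    "gene3": [-1.5, -1.7, 0.2],
    "gene4": [-1.6, -1.4, 0.1],
}
print(hierarchical_clusters(profiles, k=2))
# [['gene1', 'gene2'], ['gene3', 'gene4']]
```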
Use of S-score in Hierarchical Clustering of Brain Regional Expression Patterns
[Figure: hierarchical clustering of PFC, HIP, NAC and VTA expression patterns, AvgDiff vs. S-score; color scale shows relative change from -2 to +2.]
Tertiary Analysis: Connecting Function with Expression Patterns
• Annotation
– UniGene/Swiss-Prot, SOURCE, DAVID
• Biased functional assessment
– Manual, GenMAPP, GeneSpring
• Non-biased functional queries
– PubGene
– MAPPFinder, DAVID/EASE, GEPAS, GOTree Machine, others
• Overlaying genomics and genetics
– WebQTL
Non-biased (semi) Functional Group Analysis: GenMAPP
Expression Analysis Systematic Explorer -- EASE
http://apps1.niaid.nih.gov/david/upload.jsp
Genome Biol. 2003;4(10):R70. Epub 2003 Sep 11.
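The kind of over-representation statistic EASE reports can be sketched as a one-tailed Fisher exact (hypergeometric) probability, with the EASE-style penalty of removing one category gene from the list before testing. The function names and the example counts are assumptions, and the penalty follows one reading of the published description rather than the tool's source code.

```python
from math import comb

def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) when n genes are drawn from N, of which K belong to the category."""
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return hits / total

def ease_score(list_hits, list_total, pop_hits, pop_total):
    """Over-representation p-value after removing one category gene from the list.

    Assumption: the removed gene lowers both the hit count and the list size by one.
    """
    if list_hits < 1:
        return 1.0
    return hypergeom_upper_tail(list_hits - 1, list_total - 1, pop_hits, pop_total)

# Hypothetical counts: 3 of 300 list genes fall in a category that
# holds 40 of 30000 genes on the array.
print(hypergeom_upper_tail(3, 300, 40, 30000))  # plain one-tailed Fisher exact
print(ease_score(3, 300, 40, 30000))            # more conservative EASE-style value
```

The penalty matters most for categories supported by only a few list genes, where a single gene can otherwise make a category look significant.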
EASE -- Options in Analysis
Efforts to Integrate Diverse Biological Databases with Expression Information: PubGene
www.PubGen.org
[Figure: expression clusters 1-11 across NAC, PFC and VTA; columns within each region are B6 Et, D2 Et and B6/D2.]
Functional Annotation Association Mining (EASE)
High-throughput Literature Association Mining (PubGene)
Genetic Associations (WebQTL)
Additional Expression Associations (Molecular Triangulation)
Analysis Stages for Oligonucleotide Microarrays
Analysis Stage | Description | Examples of Methods
Normalization | Equalizes overall signal across arrays to be compared, ensures linearity of response across abundance classes | Whole chip (26); Quantile (27)
Probe reduction | Combines signals from multiple probes or probe pairs to define “expression level”. Identifies genes with invalid or hyper-variable expression levels. | Weighted average (MAS 4) (29); Tukey bi-weight (MAS 5) (30); Model-based (MBEI) (31); Log scale linear additive (RMA) (32); Position-dependent stacking energy modeling (PDNN) (33)
Comparative | Compares expression of a gene across two or more arrays to determine significant changes in expression | t-test; rank order (MAS 5) (30); permutation (SAM) (46, 47); S-score (48)
Multivariate studies | Identifies significant correlations in expression data across experiments/conditions | hierarchical clustering; k-means clustering; self-organizing maps; principal components analysis; & many more (34, 49)
Biological overlay | Identify functions for given genes, clusters of genes; hypothesis generation | Multiple database access (Source) (50); PubMed correlations (PubGene) (51); Gene Ontology rankings (GenMAPP, MAPPFinder, DAVID/EASE) (52, 53)
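For the comparative stage, the idea behind permutation-based testing can be sketched for a single gene. This is a plain exact permutation test on the difference of group means, not SAM itself (which adds a modified statistic and false discovery rate estimation); the function name and the expression values are assumptions.

```python
from itertools import combinations

def permutation_p_value(group_a, group_b):
    """Two-sided exact permutation p-value for a difference in group means."""
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    observed = abs(sum(group_a) / n_a - sum(group_b) / len(group_b))
    extreme = total = 0
    # Enumerate every way of relabeling n_a of the pooled samples as group A.
    for idx in combinations(range(len(pooled)), n_a):
        a = [pooled[i] for i in idx]
        b = [pooled[i] for i in range(len(pooled)) if i not in idx]
        diff = abs(sum(a) / len(a) - sum(b) / len(b))
        if diff >= observed - 1e-12:  # tolerance for floating-point ties
            extreme += 1
        total += 1
    return extreme / total

# Hypothetical log2 expression of one gene in control and treated arrays.
control = [8.1, 8.3, 8.2]
treated = [9.0, 9.2, 9.1]
print(permutation_p_value(control, treated))  # 0.1
```

With three arrays per group there are only 20 relabelings, so the smallest attainable two-sided p-value is 0.1. This is one reason replicate number matters in experimental design.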
Bioinformatics Resources for Microarray Experiments
Name | Description | Link
SOURCE | Human, rat, mouse gene compilation from multiple databases; allows batch submissions for annotation | http://source.stanford.edu/cgi-bin/sourceSearch
GeneLynx | Human, mouse gene compilation; multiple database links regarding gene/protein structure and function | http://www.genelynx.org/
DAVID/EASE | Mines gene list for frequency of GO categories; annotation of gene list; statistical analysis of biological themes in gene list (EASE) | http://apps1.niaid.nih.gov/David/upload.asp
GenMAPP/MAPPFinder | Superimposes array data on biological pathways; statistical ranking of functional groups | http://www.genmapp.org/
FatiGO | Mines gene list for occurrence of GO terms; statistical comparison of two lists for over-representation | http://fatigo.bioinfo.cnio.es/
PubGene | Finds associations between genes in biomedical literature; superimposes array data on literature links; commercial version available | http://www.pubgene.org/
MEME | Searches promoter regions of genes in list/cluster for conserved motifs | http://meme.sdsc.edu/meme/website/intro.html
Quaternary Analysis: Profiles to Physiology
[Figure: diagram linking Expression Networks, Expression Profiling, Pharmacology, Genetics, Complex Trait, Prot-Prot Interactions, Ontology, HomoloGene and BioMed Lit Relations.]