

Microarray Quality Assessment

Issues in High-Throughput Data Analysis

BIOS 691-803 Spring 2010

Dr Mark Reimers

Quality Assessment

• Are there any factors that would lead you to doubt or distrust a particular datum (array)?

• Quality of inputs – e.g. RNA quality

• Statistical QA – evidence of systematic variation different from others

BioAnalyzer

Ideal: Two sharp peaks for 18S & 28S RNA

Spot QA for cDNA Spotted Arrays

• Spot measures
– Signal/noise: foreground / background, or foreground / SD
– Uniformity
– Spot area

• Global measures
– Qualitative assessments
– Averages of spot measures

• Inspect images for artifacts
– Streaks of dye, scratches, etc.

• Are there biases in regions?

With commercial arrays we assume these issues are under control
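The two spot-level signal/noise measures listed above can be sketched as follows. This is a minimal illustration, not the output of any particular image-analysis program: the function name is hypothetical, and it assumes that "SD" means the standard deviation of the local background pixels, which the slide does not state.

```python
from statistics import mean, stdev

def spot_signal_to_noise(foreground, background_pixels):
    """Return (foreground/background, foreground/SD) for one spot.

    Assumption: SD is the standard deviation of the local background pixels.
    """
    bg_mean = mean(background_pixels)
    bg_sd = stdev(background_pixels)
    return foreground / bg_mean, foreground / bg_sd

# Made-up numbers for a single spot
fg_over_bg, fg_over_sd = spot_signal_to_noise(1200.0, [100.0, 110.0, 90.0, 100.0])
```

Averaging either measure over all spots on a slide gives one of the global measures mentioned above.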

Statistical Approaches

• Question: Are any samples different from others on technical grounds?

• Exploratory Data Analysis (EDA)

• Boxplots, clustering, PCA
– Are there any outliers?
– Are there associations with technical factors (technician, date of sample prep, etc.)?
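A rough numerical counterpart to the boxplot check is to compare each chip's median with the medians of the other chips. The sketch below is an assumption-laden illustration: the function names, the made-up data, and the cutoff of 3 MADs are all hypothetical choices, not part of the lecture.

```python
from statistics import median, quantiles

def chip_summary(log_signal):
    """Quartiles of one chip's log intensities (the numbers a boxplot draws)."""
    q1, q2, q3 = quantiles(log_signal, n=4)
    return {"q1": q1, "median": q2, "q3": q3, "iqr": q3 - q1}

def flag_outlier_chips(chips, n_mads=3.0):
    """chips: dict of chip name -> list of log intensities.

    Flags chips whose median lies more than n_mads median absolute
    deviations from the median of all chip medians.
    """
    medians = {name: median(values) for name, values in chips.items()}
    center = median(medians.values())
    mad = median(abs(m - center) for m in medians.values())
    if mad == 0:
        return [name for name, m in medians.items() if m != center]
    return [name for name, m in medians.items() if abs(m - center) > n_mads * mad]

# Made-up data: chip "d" is shifted upward relative to the others
chips = {
    "a": [7.0, 8.0, 9.0],
    "b": [7.1, 8.1, 9.1],
    "c": [6.9, 7.9, 8.9],
    "d": [9.0, 10.0, 11.0],
}
flagged = flag_outlier_chips(chips)
summary = chip_summary([1, 2, 3, 4, 5, 6, 7])
```

A flagged chip is only a candidate for closer inspection; the next step is to look for an association with a technical factor such as technician or preparation date.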

EDA - Boxplots

• Boxplot of 16 chips from Cheung et al., Nature 2005

Another Portrait - Densities

Figure: Density Plots: before and after. x-axis: log(Signal); y-axis: Density. Legend (Chips): GSM25524.CEL to GSM25531.CEL, GSM25540.CEL to GSM25543.CEL, GSM25548.CEL to GSM25551.CEL.

Probe Intensities in 23 Replicates

Some Causes of Technical Variation

• Temperature of hybridization differs

• Amount of RNA differs

• RNA degraded in some samples

• Yield of conversion to cDNA or cRNA differs

• Strength of ionic buffers differs

• Stringency of wash differs

• Scratches on some chips

• Ozone (affects Cy5) at some times

Borrow an Idea from Model Testing

• Question: Is the model adequate? Or do hidden factors cause systematic errors?

• Examine residuals after fitting model
– Should be IID Normal
– Is there structure in residuals?
– Plot against known technical covariates, such as order of sample

• How to adapt residual examination for high-throughput assays?

Statistical QA for Arrays

• Model for signal of probe i on chip j: $y_{ij} \sim \mu_i + \varepsilon_{ij}$

– Each gene has same mean in all arrays (mostly true)
– Look at residuals after fitting model

• New twist for high-throughput assays:
– Examine residuals within each chip (fix j; vary i)
– Plot against known technical factors of probes
– Is there any factor that seems to be predicting systematic errors?
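The residual step can be sketched as below. This is an illustration under stated assumptions, not the lecture's own code: the function names are hypothetical, and the per-probe median across chips is swapped in for the mean as a robust estimate of the probe-level term, so that one bad chip does not distort it.

```python
from statistics import median

def probe_residuals(y):
    """y[i][j]: log signal of probe i on chip j.

    Returns residuals r[i][j] = y[i][j] - (median of probe i across chips).
    The median is a robust stand-in for the per-probe mean in the model.
    """
    residuals = []
    for row in y:
        center = median(row)
        residuals.append([value - center for value in row])
    return residuals

def chip_residuals(residuals, j):
    """Fix chip j and vary probe i: the residuals examined within one chip."""
    return [row[j] for row in residuals]

# Made-up data: three probes on three chips; the third chip reads high
y = [
    [8.0, 8.2, 9.0],
    [10.0, 10.1, 11.0],
    [6.0, 5.9, 7.0],
]
res = probe_residuals(y)
third_chip_shift = median(chip_residuals(res, 2))
```

The residuals for one chip can then be plotted or summarized against any known technical factor of the probes.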

Statistical QA of Arrays

• Significant artifacts may not be obvious from visual inspection or bulk statistics

• General approach: plot deviations from average or residuals from fit against any technical variable:
– Average intensity across chips
– CG content or Tm
– Probe position relative to 3’ end of gene (for poly-T primed RNA)
– Physical location on chip
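When no plot is at hand, the general approach above can be approximated by binning. The sketch below computes the median residual within equal-count bins of a technical variable. The function name, the bin count, and the data are hypothetical; a smoother such as loess would be a common alternative.

```python
from statistics import median

def binned_trend(covariate, residuals, n_bins=4):
    """Median residual within equal-count bins of a technical covariate.

    Returns one (median covariate, median residual) pair per bin,
    ordered by increasing covariate.
    """
    if len(covariate) < n_bins:
        raise ValueError("need at least one probe per bin")
    pairs = sorted(zip(covariate, residuals))
    size = len(pairs) // n_bins
    trend = []
    for b in range(n_bins):
        start = b * size
        # The last bin absorbs any leftover probes
        stop = len(pairs) if b == n_bins - 1 else start + size
        chunk = pairs[start:stop]
        trend.append((median(c for c, _ in chunk), median(r for _, r in chunk)))
    return trend

# Made-up data: residuals rise with the covariate (e.g. average intensity)
covariate = [0, 1, 2, 3, 4, 5, 6, 7]
residuals = [0.0, 0.0, 0.0, 0.0, 0.1, 0.1, 0.5, 0.5]
trend = binned_trend(covariate, residuals)
```

A flat sequence of bin medians near zero suggests the variable is not predicting systematic error on that chip; a clear trend suggests that it is.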

Ratio vs Intensity Plots: Saturation & Quenching

• Saturation
– Decreasing rate of binding of RNA at higher occupancies on probe

• Quenching
– Light emitted by one dye molecule may be re-absorbed by a nearby dye molecule
– Then lost as heat
– Effect proportional to square of density

Plot of log ratio against average log intensity across chips

GSM25377 from the CEPH expression data GSE2552

How Much Variability on R-I?

• Ratio-Intensity plots for six arrays at random from Cheung et al Nature (2005)

Covariation with Probe Tm

• MAQC project

• Agilent 44K
– Array 1C3
– Performed by Agilent

• Plot of log ratios to average against Tm

• Bimodal distribution because two samples are very different

Covariation with Probe Position

• RNA degrades from 5’ end

• Intensity should decrease with distance from the 3’ end, and by a similar amount on every chip

• AffyRNAdeg and plotAffyRNAdeg in the affy package

Plot of average intensity for each probe position across all genes against probe position
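The quantity plotted here can be sketched in a few lines. This is not the affy implementation, only an illustration of the idea under simplifying assumptions: every probe set has the same number of probes, probes are ordered from 5’ to 3’, and the function names and data are hypothetical.

```python
from statistics import mean

def position_profile(probesets):
    """Average log intensity at each probe position across all probe sets.

    probesets: list of equal-length lists, each ordered from 5' to 3'.
    """
    n_positions = len(probesets[0])
    return [mean(ps[k] for ps in probesets) for k in range(n_positions)]

def least_squares_slope(values):
    """Slope of values against position index 0, 1, 2, ..."""
    xs = range(len(values))
    x_bar = mean(xs)
    y_bar = mean(values)
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, values))
    den = sum((x - x_bar) ** 2 for x in xs)
    return num / den

# Made-up data: two probe sets per chip, four probes each
chip_a = [[8.0, 8.5, 9.0, 9.5], [7.0, 7.5, 8.0, 8.5]]
chip_b = [[6.0, 8.0, 10.0, 12.0], [5.0, 7.0, 9.0, 11.0]]
slope_a = least_squares_slope(position_profile(chip_a))
slope_b = least_squares_slope(position_profile(chip_b))
```

In this toy example chip_b has a much steeper 5’ to 3’ slope than chip_a, which is the kind of difference between chips that would prompt a closer look at RNA quality.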

Effect of Runs of Guanines

• A run of 4 G’s allows a quadruplex structure to form
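Probes containing such a run can be picked out from their sequences so that their residuals can be compared with those of other probes. A minimal sketch, with hypothetical function names and made-up sequences:

```python
import re

# Four or more consecutive guanines
G_RUN = re.compile(r"G{4,}")

def has_g_run(sequence):
    """True if the probe sequence contains a run of at least four G's."""
    return G_RUN.search(sequence.upper()) is not None

def longest_g_run(sequence):
    """Length of the longest run of consecutive G's (0 if there are none)."""
    runs = re.findall(r"G+", sequence.upper())
    return max(map(len, runs), default=0)

probes = ["ACGGGGTCA", "ACGGGTCGGA", "ACTA"]
with_runs = [p for p in probes if has_g_run(p)]
```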

Spatial Variation Across Chips

Red/Green ratios show variation-probably concentrated

Ratios of ratios on slide to ratios on standard show consistent biases
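One way to put numbers on regional bias is to tile the slide and take the median log ratio in each tile. The sketch below assumes the log ratios (or ratios to a standard) are already arranged in a grid by physical position. The function name, tile size, and data are hypothetical.

```python
from statistics import median

def block_medians(grid, block):
    """Median of each block x block tile of a 2-D grid of log ratios.

    grid: list of rows, laid out by physical position on the slide.
    Tiles at the right and bottom edges may be smaller than block.
    """
    n_rows, n_cols = len(grid), len(grid[0])
    result = []
    for r0 in range(0, n_rows, block):
        row_out = []
        for c0 in range(0, n_cols, block):
            tile = [
                grid[r][c]
                for r in range(r0, min(r0 + block, n_rows))
                for c in range(c0, min(c0 + block, n_cols))
            ]
            row_out.append(median(tile))
        result.append(row_out)
    return result

# Made-up 4 x 4 slide with an elevated top-left corner
grid = [
    [1.0, 1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
tiles = block_medians(grid, 2)
```

If there were no spatial bias, the tile medians would all be close to the overall median; a tile that stands out marks a region worth inspecting in the image.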

In House Spotted Arrays

Ratio of ratios shows much clearer concentration of red spots on some slides

Note non-random but highly irregular concentration of red


Bioconductor arrayQuality Package

Background Subtraction (1)

• We think that local background contributes to bias

• Does subtracting background remove bias?

Local off-spot background may not be the best estimate of spot background (non-specific hyb)

Panels: Spots; BG subtracted

Background Subtraction (2)

Raw spot ratios show a mild bias relative to average

After subtracting a high green bg in the center, a red bias results

Panels: Raw Ratios; Background; BG-subtracted

Other Bias Patterns

This spotted oligo array shows strong biases at the beginning and end of each print-tip group

The background shows a milder version of this effect

Subtracting background compensates for about half this effect

Panels: Processed; Raw Spot; Background

Local Bias on Affymetrix Chips

Image of raw data on a log2 scale shows striations but no obvious artifacts

Image of ratios of probes to standard shows a smudge

Non-coding probes

Images show high values as red, low values as yellow

Spatial Artifacts on Affy Chips

Bubbles (yellow) in hybridization chamber

Touching cover slip and wiping incompletely

Scratches on cover slip

QC in Bioconductor

• Robust Multi-chip Analysis (RMA)
– Fits a linear model to each probe set
– High residuals show regional patterns

High residuals in green

Available in affyQCReport package at www.bioconductor.org

See http://plmimagegallery.bmbolstad.com/

Affy QC Metrics in Bioconductor

• affyPLM package fits probe level model to Affymetrix raw data

• NUSE: Normalized Unscaled Standard Errors, normalized relative to each gene

• How many big errors?
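The normalization behind NUSE can be sketched as follows, assuming the per-gene standard errors from a probe-level model fit are already available as a gene-by-chip table. This is not the affyPLM code: the function names and data are hypothetical, and only the arithmetic is shown.

```python
from statistics import median

def nuse(standard_errors):
    """standard_errors[g][j]: SE of the expression estimate for gene g on chip j.

    Each gene's SEs are divided by that gene's median SE across chips,
    so a typical chip has values near 1 for every gene.
    """
    return [[se / median(row) for se in row] for row in standard_errors]

def chip_nuse_medians(nuse_values):
    """Median NUSE per chip, a one-number summary of each chip."""
    n_chips = len(nuse_values[0])
    return [median(row[j] for row in nuse_values) for j in range(n_chips)]

# Made-up SEs for three genes on three chips; the third chip is noisier
se = [
    [0.10, 0.10, 0.20],
    [0.20, 0.20, 0.30],
    [0.05, 0.05, 0.10],
]
per_chip = chip_nuse_medians(nuse(se))
```

A chip whose median NUSE sits well above 1 has larger standard errors than its peers for most genes, which is one way to answer the "how many big errors?" question.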

Spatial Artifacts in Agilent

• Usually not so strong as on other array types

• More diffuse artifacts – probably reflecting washing irregularities

Spatial Artifacts in Nimblegen

• More common than Agilent

• Usually more diffuse, probably reflecting washing

• Some sharp artifacts of unclear origin

Spatial Artifacts in Illumina Arrays

• Often bigger artifacts than Affy

• Less consequential because more beads, and all have same sequence
