Filtering and Normalization of Microarray Gene Expression Data
Waclaw Kusnierczyk Norwegian University of Science and Technology
Trondheim, Norway
Slide 2
Outline
- Filtering (spots): removal of spots based on quality measures
- Normalization: compensation for measurement errors
- Examples of common problems
Slide 3
Useful plots
- Channel-channel plot (CC)
- Intensity-ratio plot (AM or IR)
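The coordinates of an intensity-ratio plot can be sketched as follows. This assumes the usual convention M = log2(R/G) for the log ratio and A = (log2 R + log2 G)/2 for the mean log intensity; the function name `ma_values` and the choice to skip non-positive intensities are illustrative assumptions, not part of the slides.

```python
import math

def ma_values(red, green):
    """Return a list of (A, M) pairs for paired channel intensities.

    Assumed convention: M = log2(R/G), A = 0.5 * (log2 R + log2 G).
    """
    pairs = []
    for r, g in zip(red, green):
        if r <= 0 or g <= 0:
            continue  # logarithm undefined; skipped in this sketch
        log_r, log_g = math.log2(r), math.log2(g)
        pairs.append((0.5 * (log_r + log_g), log_r - log_g))
    return pairs
```

Plotting M against A gives the intensity-ratio plot; plotting log2 R against log2 G gives the channel-channel plot.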
Slide 4
Filtering: Spots
Criteria used to remove spots:
- spot area [pixels]
- signal/noise ratio (spot intensity vs. background intensity)
- other quality measures (e.g. based on quality scores from image analysis software)
- morphological criteria
- pixel-level variability
Slide 5
Filtering: Spots
Spot area
Slide 6
Filtering: Spots
Spot area based filtering
- keep spots with area > threshold in both channels
- problem: setting the appropriate threshold, which depends on the definition of the spot (image analysis software) and on the distribution of the spot area
- typical value: 10 pixels
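A minimal sketch of the area rule, assuming each spot is a dictionary with hypothetical keys `area_ch1` and `area_ch2` holding the spot area in pixels for the two channels. The default of 10 pixels is the typical value quoted above; a suitable threshold depends on the image analysis software.

```python
def filter_by_area(spots, min_area=10):
    """Keep spots whose area exceeds min_area pixels in both channels.

    The keys 'area_ch1' and 'area_ch2' are assumed names for illustration.
    """
    return [
        s for s in spots
        if s["area_ch1"] > min_area and s["area_ch2"] > min_area
    ]
```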
Slide 7
Filtering: Spots
Signal and background
Slide 8
Filtering: Spots
Signal/noise based filtering
- keep spots with signal / background > threshold in both channels
- problem: setting the appropriate threshold, which depends on the spot and background definition (image analysis software)
- typical value: sgn/bkg > 2 (or, equivalently, sgn - bkg > bkg)
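The ratio rule can be sketched as below. The tuple layout `(sgn1, bkg1, sgn2, bkg2)` and the function names are assumptions made for illustration; the default threshold of 2 is the typical value quoted above.

```python
def passes_snr(signal, background, threshold=2.0):
    """True if signal / background exceeds the threshold.

    A non-positive background is treated as failing, since the ratio
    is not meaningful there (a choice made for this sketch).
    """
    return background > 0 and signal / background > threshold

def filter_by_snr(spots, threshold=2.0):
    """Keep spots passing in both channels; spot = (sgn1, bkg1, sgn2, bkg2)."""
    return [
        s for s in spots
        if passes_snr(s[0], s[1], threshold) and passes_snr(s[2], s[3], threshold)
    ]
```

With a threshold of 2, the test `signal / background > 2` selects the same spots as `signal - background > background` whenever the background is positive.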
Slide 9
Filtering: Spots
Signal/noise based filtering (alternative)
- flag spots if $S_{ij} < B_{ij} + \tau \, \sigma_{B_{ij}}$, where:
  - $S_{ij}$: intensity of the $i$-th spot in the $j$-th channel (not corrected)
  - $B_{ij}$: background of the $i$-th spot in the $j$-th channel
  - $\sigma_{B_{ij}}$: background deviation of the $i$-th spot in the $j$-th channel
  - $\tau$: user-defined threshold
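The flagging rule can be sketched per spot and per channel as follows. The names `background_sd` and `tau` are chosen here for the background deviation and the user-defined threshold, and the nested-list layout (`values[i][j]` for spot i, channel j) is an assumption. No default is given for `tau`, since the slides do not state one.

```python
def flag_low_signal(signal, background, background_sd, tau):
    """Return flags[i][j] = True where spot i in channel j is flagged.

    A spot/channel is flagged when the uncorrected intensity is below
    background + tau * background deviation.
    """
    flags = []
    for s_row, b_row, sd_row in zip(signal, background, background_sd):
        flags.append([
            s < b + tau * sd
            for s, b, sd in zip(s_row, b_row, sd_row)
        ])
    return flags
```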
Slide 10
Filtering: Spots (example)
Slide 11
Filtering: Spots
Other criteria
- Intensity threshold on background-corrected intensity (for each channel separately)
- Spot quality measures (pixelwise distributional properties of spot and background intensities, manual morphology-based spot flagging etc.)
- Replicate-based spot filtering (adaptive threshold selection based on a repeatability coefficient, coefficient of variation etc.)
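As an illustration of replicate-based filtering, the sketch below keeps genes whose replicate measurements have a low coefficient of variation. It is a simplification: the cutoff `max_cv` is fixed by the caller rather than selected adaptively, and the dictionary layout (gene name mapped to a list of replicate values) is an assumption.

```python
import statistics

def coefficient_of_variation(values):
    """Sample standard deviation divided by the absolute mean."""
    mean = statistics.mean(values)
    if mean == 0:
        return float("inf")  # CV undefined; treated as maximally variable
    return statistics.stdev(values) / abs(mean)

def filter_replicates(replicates, max_cv):
    """Keep genes with at least two replicates and CV <= max_cv."""
    return {
        gene: values
        for gene, values in replicates.items()
        if len(values) >= 2 and coefficient_of_variation(values) <= max_cv
    }
```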
Slide 12
Filtering: Spots
Total intensity (log2) threshold
Slide 13
Filtering: Spots
Morphology based filtering
Slide 14
Normalization
- Analysis of systematic errors: adjustment for bias coming from variation in the technology rather than from biology
- Different sources of non-linearity:
  - Print-tip differences
  - Efficiency of dye incorporation (labelling)
  - Non-uniformity in hybridisation
  - Scanning
  - Between-slide variation (print quality, ambient conditions)
Slide 15
Normalization
- Selection of elements: housekeeping genes, spike controls, tip-dependence, raw data, between-array normalization
- Method:
  - Constant subtraction (shift) (mean/median log2 ratio, iterative c estimation, ANOVA)
  - Locally weighted mean (intensity or location dependent)
  - Other recently proposed methods
Slide 16
Normalization (example 1)
Intensity-independent normalization with median ratio subtraction
Slide 17
Normalization (example 1)
Intensity-independent normalization with median ratio subtraction
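Median ratio subtraction can be sketched in a few lines, assuming the input is a list of log2 ratios for the spots that survived filtering. The function name is illustrative.

```python
import statistics

def median_normalize(log_ratios):
    """Shift log2 ratios by a constant so that their median becomes zero."""
    shift = statistics.median(log_ratios)
    return [m - shift for m in log_ratios]
```

The same constant is subtracted from every spot regardless of its intensity, which is why the method is intensity independent.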
Slide 18
Normalization (example 1)
Intensity-dependent normalization with locally weighted mean, global
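A locally weighted mean can be sketched as below. This is a simplified stand-in rather than a full lowess fit: for each spot it averages the log ratios M of the nearest fraction `span` of spots along the mean log intensity A, using tricube weights, and subtracts that local mean. The function name, the default `span`, and the kernel choice are assumptions for illustration, and the quadratic running time makes it suitable only for small inputs.

```python
def normalize_intensity_dependent(a_values, m_values, span=0.4):
    """Subtract a tricube-weighted local mean of M, evaluated at each spot's A."""
    n = len(a_values)
    k = min(n, max(2, int(round(span * n))))  # neighbours per local window
    normalized = []
    for a0, m0 in zip(a_values, m_values):
        # Bandwidth: distance from a0 to its k-th nearest neighbour.
        h = sorted(abs(a - a0) for a in a_values)[k - 1]
        if h == 0:
            # All neighbours share the same A: use their plain mean.
            same = [m for a, m in zip(a_values, m_values) if a == a0]
            fit = sum(same) / len(same)
        else:
            num = den = 0.0
            for a, m in zip(a_values, m_values):
                u = abs(a - a0) / h
                if u < 1:
                    w = (1 - u ** 3) ** 3  # tricube weight
                    num += w * m
                    den += w
            fit = num / den  # den >= 1: the spot itself has weight 1
        normalized.append(m0 - fit)
    return normalized
```

Fitting over all spots on the array at once corresponds to a global fit; the same function could be applied to subsets of spots separately.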
Slide 22
Normalization
Location-dependent normalization with locally weighted mean (from SNOMAD web page)
Slides 23-29
Common problems: examples
Slide 30
Acknowledgments
- Mette Langaas, Department of Mathematical Sciences, Norwegian University of Science and Technology
- Astrid Lægreid, Kristin Nørsett, Department of Physiology and Biomedical Engineering, Norwegian University of Science and Technology
- Per Kristian Lehre, Department of Computer and Information Science, Norwegian University of Science and Technology