Image Processing for cDNA Microarray Data

Department of Statistics, University of California, Berkeley, and Division of Genetics and Bioinformatics, Walter and Eliza Hall Institute of Medical Research

Prepared with massive assistance from Yee Hwa Yang (Berkeley, WEHI), and reporting on work done jointly with her, Sandrine Dudoit (Stanford) and Mike Buckley (CSIRO, Sydney).

References: M. Eisen and P. Brown, Methods in Enzymology, vol. 303, 1999; Chapter 2, DNA Microarrays (ed. M. Schena, OUP 1999) by Mack J. Schermer; Chapter 13, Microarray Biochip Technology (ed. M. Schena, Eaton 2000) by Basarsky et al.

Scanner Process

Dye --[Laser: excitation]--> Photons --[PMT: amplification]--> Electrons --[A/D converter: filtering, time-space averaging]--> Signal

GenePix 4000a Microarray Scanner Protocol

1. Turn on scanner.

2. Slide scanner door open. Insert chip hyb side down and clip chip holder easily around the slide.

3. Set PMTs to 600 in both 635 nm (Cy5) and 532 nm (Cy3) channels.

4. Perform low resolution “PREVIEW SCAN” to determine location of spots and initial hyb intensities.

5. Once scan location determined, draw a “SCAN AREA” marquee around the array.

6. Perform quick visual inspection of hyb and make initial adjustments to PMTs

7. For gene expression hybs, raise or lower the red and green PMTs to achieve color balance

8. Before you perform your data scan, change “LINES TO AVERAGE” to 2.

9. Perform a high-resolution “DATA SCAN”.

10. Observe the histograms and make adjustments to PMTs.

11. Once the PMT level has been set so that the Intensity Ratio is near 1.00, perform a “DATA SCAN” over “SCAN AREA” and save the results.

12. To save your image, select “SAVE IMAGES”.

13. Save as type=Multi-image TIFF files.

14. Once scanned and saved, you are ready to assign spot identities and calculate results.

Note: For us, normalization is performed later during data analysis, see next lecture.

Scanner

[Figure: scanner optics schematic. Labels: Laser, PMT, Dye, Glass Slide, Objective Lens, Detector Lens, Pinhole, Beam-splitter.]

How to adjust for PMT?

Scan   Cy3   Cy5
1      600   600
2      650   600
3      650   650
4      700   650
5      650   700
6      700   700
7      750   750

[Figure annotations: saturated; very weak]

After normalisation

In addition, the ranking of the genes stays largely the same after normalisation.

Practical Problems 1

• Comet Tails
• Likely caused by insufficiently rapid immersion of the slides in the succinic anhydride blocking solution.

Practical Problems 2

Practical Problems 3

High Background
• 2 likely causes:
  – Insufficient blocking.
  – Precipitation of the labeled probe.

Weak Signals

Practical Problems 4

Spot overlap. Likely cause: too much rehydration during post-processing.

Practical Problems 5

Dust

Steps in Image Processing

1. Addressing: locate centers.

2. Segmentation: classification of pixels as either signal or background (using seeded region growing).

3. Information extraction: for each spot of the array, calculate signal intensity pairs, background and quality measures.

Steps in Image Processing

3. Information Extraction

• Spot Intensities
  – mean (pixel intensities).
  – median (pixel intensities).
  – Pixel variation (IQR of log (pixel intensities)).
• Background values
  – Local
  – Morphological opening
  – Constant (global)
  – None
• Quality Information

[Figure labels: Signal; Background]
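
The spot summaries listed on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the pixels are taken to be already segmented into signal and local background for one channel, the function name `spot_summary` is hypothetical, and using the median of the local background pixels as the background value is one possible choice, not something the slides prescribe.

```python
import math
import statistics


def spot_summary(signal_pixels, background_pixels):
    """Summarise one spot in one channel from already-segmented pixel values."""
    # Pixel variation: interquartile range of log2 pixel intensities.
    logs = [math.log2(p) for p in signal_pixels if p > 0]
    q1, _, q3 = statistics.quantiles(logs, n=4)
    return {
        "mean": statistics.mean(signal_pixels),
        "median": statistics.median(signal_pixels),
        "log_iqr": q3 - q1,
        # Assumption: local background summarised by its median.
        "background": statistics.median(background_pixels),
    }
```

Calling this once per channel gives the signal intensity pair (red, green) and the matching background values for a spot.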

Addressing

This is the process of assigning coordinates to each of the spots.

Automating this part of the procedure permits high throughput analysis.

4 by 4 grids; 19 by 21 spots per grid
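
As a rough illustration of what addressing has to produce, the sketch below generates nominal spot centres for an ideal, unrotated array with the layout on this slide. The function name, the pitch parameters and the reading of "19 by 21" as 19 rows by 21 columns are all assumptions made for the example; real addressing must also estimate the origin, spacing and any rotation or skew from the image.

```python
def nominal_spot_centres(origin, spot_pitch, grid_pitch,
                         grid_rows=4, grid_cols=4,
                         spot_rows=19, spot_cols=21):
    """Return {(grid_row, grid_col, spot_row, spot_col): (x, y)}.

    origin      -- (x, y) of the first spot of the first grid
    spot_pitch  -- centre-to-centre spot distance in pixels (assumed equal in x and y)
    grid_pitch  -- (x, y) distance between the origins of adjacent grids
    """
    x0, y0 = origin
    centres = {}
    for gr in range(grid_rows):
        for gc in range(grid_cols):
            for sr in range(spot_rows):
                for sc in range(spot_cols):
                    x = x0 + gc * grid_pitch[0] + sc * spot_pitch
                    y = y0 + gr * grid_pitch[1] + sr * spot_pitch
                    centres[(gr, gc, sr, sc)] = (x, y)
    return centres
```

With the default layout this yields 4 x 4 x 19 x 21 = 6384 spot coordinates, one per spot on the array.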

Addressing

Registration

Problems in automatic addressing

Misregistration of the red and green channels

Rotation of the array in the image

Skew in the array

Segmentation methods

• Fixed circles
• Adaptive Circle
• Adaptive Shape
  – Edge detection.
  – Seeded Region Growing (R. Adams and L. Bischof, 1994): regions grow outwards from the seed points preferentially according to the difference between a pixel’s value and the running mean of values in an adjoining region.
• Histogram Methods
  – Adaptive threshold.
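
The region-growing idea can be sketched as follows. This is a simplified version written for illustration, not the authors' implementation: it uses 4-connected neighbours, takes the image as a list of rows, and computes each candidate's priority from the region mean at the time the candidate is queued rather than re-evaluating it later. The function name and the seed format are assumptions.

```python
import heapq


def seeded_region_growing(image, seeds):
    """Label every pixel by growing regions from seeds.

    image -- list of rows of numeric pixel values
    seeds -- {label: [(row, col), ...]}, e.g. one seed inside the spot
             and one in the background
    """
    n_rows, n_cols = len(image), len(image[0])
    labels = [[None] * n_cols for _ in range(n_rows)]
    total, count = {}, {}
    heap = []
    tie = 0  # unique counter so heap entries never compare labels

    def push_neighbours(r, c, label):
        nonlocal tie
        mean = total[label] / count[label]  # running mean of the region
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            rr, cc = r + dr, c + dc
            if 0 <= rr < n_rows and 0 <= cc < n_cols and labels[rr][cc] is None:
                diff = abs(image[rr][cc] - mean)
                heapq.heappush(heap, (diff, tie, rr, cc, label))
                tie += 1

    for label, points in seeds.items():
        total[label] = sum(image[r][c] for r, c in points)
        count[label] = len(points)
        for r, c in points:
            labels[r][c] = label
    for label, points in seeds.items():
        for r, c in points:
            push_neighbours(r, c, label)

    # Always absorb the candidate pixel closest to its adjoining region's mean.
    while heap:
        _, _, r, c, label = heapq.heappop(heap)
        if labels[r][c] is not None:
            continue
        labels[r][c] = label
        total[label] += image[r][c]
        count[label] += 1
        push_neighbours(r, c, label)
    return labels
```

Because the regions are not constrained to any shape, the spot mask follows the actual outline of the spot, which is the point of the adaptive shape methods.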

Examples of algorithms and software implementation

Methods            Software / algorithms
Fixed Circle       ScanAlyze, GenePix, QuantArray
Adaptive Circle    GenePix
Adaptive Shape     Edging and region growing
Histogram Method   QuantArray and adaptive thresholding

Limitation of fixed circle method

[Figure panels: SRG; Fixed Circle]

Limitation of circular segmentation

• Small spot
• Not circular

Results from SRG

Information Extraction

• Spot Intensities
  – mean (pixel intensities).
  – median (pixel intensities).
• Background values
  – Local
  – Morphological opening
  – Constant (global)
  – None
• Quality Information

Take the average
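
Of the background options above, morphological opening is the least self-explanatory, so here is a minimal sketch of it. A grey-scale opening is an erosion (local minimum) followed by a dilation (local maximum); with a window larger than any spot, the spots are removed and what remains is an estimate of the background surface. The square window, the clipping of the window at the image border and the function names are assumptions of this example.

```python
def _window_filter(image, half, pick):
    """Apply pick (min or max) over a (2*half+1)-square window, clipped at the borders."""
    n_rows, n_cols = len(image), len(image[0])
    out = []
    for r in range(n_rows):
        row = []
        for c in range(n_cols):
            window = [image[rr][cc]
                      for rr in range(max(0, r - half), min(n_rows, r + half + 1))
                      for cc in range(max(0, c - half), min(n_cols, c + half + 1))]
            row.append(pick(window))
        out.append(row)
    return out


def opening_background(image, half):
    """Grey-scale opening: erosion (local min) then dilation (local max)."""
    eroded = _window_filter(image, half, min)
    return _window_filter(eroded, half, max)
```

The background value for a spot can then be read from the opened image at the spot's location. This pure-Python version is slow on full-size scans; it is meant to show the operation, not to be used as is.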

Local Backgrounds

Information

• Quality
  – Area
  – Circularity
  – Signal to Noise ratio
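
The slides do not give formulas for these quality measures, so the sketch below uses common definitions as assumptions: area is the number of signal pixels, circularity is 4πA/P² with the perimeter P counted as exposed pixel edges, and signal to noise is the difference between mean signal and mean background divided by the background standard deviation. All function names are hypothetical.

```python
import math
import statistics


def area_and_perimeter(mask):
    """mask: list of rows of 0/1. Perimeter counts exposed pixel edges."""
    n_rows, n_cols = len(mask), len(mask[0])
    area = perimeter = 0
    for r in range(n_rows):
        for c in range(n_cols):
            if not mask[r][c]:
                continue
            area += 1
            for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                inside = 0 <= rr < n_rows and 0 <= cc < n_cols
                if not inside or not mask[rr][cc]:
                    perimeter += 1
    return area, perimeter


def circularity(mask):
    """4*pi*A / P**2: larger for compact shapes, smaller for elongated ones."""
    area, perimeter = area_and_perimeter(mask)
    return 4 * math.pi * area / perimeter ** 2


def signal_to_noise(signal_pixels, background_pixels):
    """(mean signal - mean background) / SD of background (assumed definition)."""
    diff = statistics.mean(signal_pixels) - statistics.mean(background_pixels)
    return diff / statistics.stdev(background_pixels)
```

With the edge-count perimeter used here, a square mask scores π/4 and a one-pixel-wide line scores much lower, so the measure separates compact spots from smeared ones even though it never reaches 1 on a pixel grid.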

Quality Measurements

• Array
  – Correlation between spot intensities.
  – Percentage of spots with no signals.
  – Distribution of spot signal area.
• Spot
  – Signal / Noise ratio.
  – Variation in pixel intensities.
  – Identification of “bad spot” (spots with no signal).
• Ratio (2 spots combined)
  – Circularity

Quality of Array

Distribution of areas:
  – Judge by eye.
  – Look at variation (e.g. SD).

Cy3 area
• mean 57
• median 56
• SD 20.67

Cy5 area
• mean 59
• median 57
• SD 24.34

Does the image analysis matter?

[Figure panels: Spot.nbg; Spot.morph; Spot.valley; ScanAlyze]

Background makes a difference

Background method   Segmentation method   Exp1   Exp2
No background       S.nbg                    6      6
                    Gp.nbg                   7      6
                    SA.nbg                   6      6
                    QA.fix.nbg               7      6
                    QA.hist.nbg              7      6
                    QA.adp.nbg              14     14
Local surrounding   S.valley                17     21
                    GP                      11     11
                    SA                      12     14
                    QA.fix                  18     23
                    QA.hist                  9      8
                    QA.adp                  27     26
Others              S.morph                  9      9
                    S.const                 14     14

Medians of the SD of log2(R/G) for 8 replicated spots, multiplied by 100 and rounded to the nearest integer.
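
The statistic in the caption can be computed as in the sketch below. The input format (a mapping from gene to the R/G ratios of its replicate spots) and the function name are assumptions for illustration; the use of the sample standard deviation is also an assumption, since the slides do not say which form was used.

```python
import math
import statistics


def replicate_sd_score(ratios_by_gene):
    """Median over genes of the SD of log2(R/G) across replicate spots, times 100, rounded.

    ratios_by_gene -- {gene: [R/G ratio for each replicate spot]}
    """
    sds = []
    for ratios in ratios_by_gene.values():
        log_ratios = [math.log2(r) for r in ratios]
        sds.append(statistics.stdev(log_ratios))  # sample SD (assumed)
    return round(100 * statistics.median(sds))
```

A smaller score means the replicate spots agree more closely, which is how the table compares combinations of segmentation and background method.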

Recommended