MB206 1
Also known as a DNA chip. Allows simultaneous measurement of the level of transcription for every gene in a genome (gene expression).
Transcription? The process of copying DNA into messenger RNA (mRNA). Environment dependent!
A microarray detects mRNA, or rather the more stable cDNA reverse-transcribed from it.
• High-throughput measurement
  - 5000-20000 gene expression levels at the same time
• Identify genes that behave differently in different cell populations
  - tumor cells vs. healthy cells
  - brain cells vs. liver cells
  - the same tissue in different organisms
• Time series experiments
  - gene expression over time after treatment
[Figure: two-color microarray workflow. cDNA clones (probes) → PCR product amplification and purification → printing (0.1 nl per spot) → hybridize cDNA from the tumor sample and from the reference sample (each made from RNA) → scanning (excitation with a red and a green laser, emission) → overlay images and normalize → analysis]
[Figure: analysis pipeline. Biological question (differentially expressed genes, sample class prediction, etc.) → experimental design → microarray experiment → image analysis → normalization → estimation, testing, clustering, discrimination → biological verification and interpretation]
[Figure: image analysis. Scanned images R, G (16-bit TIFF files) → foreground and background intensities (Rfg, Rbg), (Gfg, Gbg)]
A laser scans the array and produces images, with one laser for each color, e.g. one for green and one for red.
Image analysis, main tasks:
• Noise suppression
• Spot localization and detection, including the extraction of the background intensity, the spot position, and the spot boundary and size
• Data quantification and quality assessment
Image analysis is a book on its own:
Kamberova, G. & Shah, S. “DNA Array Image Analysis Nuts & Bolts”. DNA Press LLC, 2002
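Spot quantification ends with a foreground and a background intensity per channel, (Rfg, Rbg) and (Gfg, Gbg). Below is a minimal Python sketch of background correction by plain subtraction. It assumes two things: that subtraction is the correction method, and that a positive floor is used so the later log2 stays defined. The helper names and the floor value are hypothetical and are not taken from any image analysis package.

```python
def background_correct(fg: float, bg: float, floor: float = 1.0) -> float:
    """Subtract the local background from a spot's foreground intensity.

    Assumption: corrected values at or below zero are clamped to a small
    positive floor so that log2 stays defined later; flagging such spots
    instead is an equally common choice.
    """
    return max(fg - bg, floor)


def spot_signals(r_fg: float, r_bg: float, g_fg: float, g_bg: float):
    """Return the background-corrected (R, G) pair for one spot (hypothetical helper)."""
    return background_correct(r_fg, r_bg), background_correct(g_fg, g_bg)
```

For example, `spot_signals(1200, 200, 700, 200)` gives `(1000, 500)`. A spot whose red background exceeds its foreground gets R clamped to the floor.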
“Observed” data $\{(R_n, G_n)\}_{n=1}^{5184}$:
R = red channel signal, G = green channel signal (background corrected or not)
Transformed data $\{(M_n, A_n)\}_{n=1}^{5184}$:
$M = \log_2(R/G)$ (log ratio)
$A = \log_2 (R \cdot G)^{1/2} = \tfrac{1}{2}\log_2(R \cdot G)$ (intensity signal)
$R = (2^{2A+M})^{1/2}, \quad G = (2^{2A-M})^{1/2}$
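You can check the transform and its inverse numerically. This is a minimal Python sketch with hypothetical function names. It assumes R and G are already positive, e.g. after background correction.

```python
import math


def to_ma(r: float, g: float) -> tuple[float, float]:
    """Map positive (R, G) signals to (M, A)."""
    m = math.log2(r / g)          # M = log2(R/G), the log ratio
    a = 0.5 * math.log2(r * g)    # A = 1/2 log2(R*G), the average log intensity
    return m, a


def from_ma(m: float, a: float) -> tuple[float, float]:
    """Invert the transform: R = (2^(2A+M))^(1/2), G = (2^(2A-M))^(1/2)."""
    r = math.sqrt(2 ** (2 * a + m))
    g = math.sqrt(2 ** (2 * a - m))
    return r, g
```

For R = 1024 and G = 256, `to_ma` gives M = 2 (a fourfold excess of red) and A = 9, and `from_ma(2, 9)` recovers (1024, 256).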
[Figure: M vs. A plots at successive stages. Raw data (biased towards the green channel, with intensity-dependent artifacts); after scaled print-tip normalization; after median absolute deviation (MAD) scaling; after averaging]
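Below is a simplified Python sketch of print-tip normalization followed by MAD scaling. It rests on several assumptions, and the helper names are hypothetical:

- Each spot carries a print-tip group label.
- The location step subtracts each group's median M rather than fitting an intensity-dependent loess curve. It therefore removes a constant bias (such as the bias towards green) but not intensity-dependent artifacts.
- The scale step divides each group by its MAD relative to the geometric mean of all groups' MADs. This is one common choice, not necessarily the one used for the figure.

```python
import math
import statistics
from collections import defaultdict


def mad(values):
    """Median absolute deviation (unscaled)."""
    med = statistics.median(values)
    return statistics.median(abs(v - med) for v in values)


def print_tip_normalize(m_values, tips):
    """Simplified scaled print-tip normalization (hypothetical helper).

    m_values: M = log2(R/G) for each spot
    tips:     print-tip group label for each spot (same length)
    Assumes every group has a nonzero MAD.
    """
    groups = defaultdict(list)
    for i, tip in enumerate(tips):
        groups[tip].append(i)

    # Location: centre each print-tip group at median M = 0
    # (a stand-in for the usual per-group loess fit of M on A).
    centered = list(m_values)
    for idx in groups.values():
        med = statistics.median(m_values[i] for i in idx)
        for i in idx:
            centered[i] = m_values[i] - med

    # Scale: divide group i by MAD_i / geometric mean of all MADs,
    # so that all groups end up with the same spread.
    mads = {tip: mad([centered[i] for i in idx]) for tip, idx in groups.items()}
    geo_mean = math.exp(sum(math.log(v) for v in mads.values()) / len(mads))

    normalized = list(centered)
    for tip, idx in groups.items():
        scale = mads[tip] / geo_mean
        for i in idx:
            normalized[i] = centered[i] / scale
    return normalized
```

Two groups with the same shape but different offsets and spreads, e.g. M = [1, 2, 3] and [0, 2, 4], both come out as approximately [-1.41, 0, 1.41].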
Extreme in T values?
Extreme in M values? ... or extreme in some other statistic?
Gene   Mavg   Aavg   T       SE
2341   -0.86  10.9   -18.0   0.125
6412   -0.75  11.1   -14.7   0.102
6123   -0.70   9.8   -12.2   0.121
102     0.65  10.3   -14.5   0.136
2020    0.64   9.3   -11.9   0.118
3132    0.62   9.9   -14.4   0.090
4439   -0.62   9.7   -14.6   0.088
2031   -0.61  10.7   -13.7   0.087
657    -0.60   9.2   -13.6   0.094
502     0.58  10.0   -12.7   0.101
1239   -0.58   9.8   -11.4   0.103
5392   -0.57   9.9   -20.7   0.057
3921    0.52  11.3    13.5   0.083
...
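The slides do not define T or SE. The sketch below assumes replicate slides per gene, with SE = sd(M)/√n and T = Mavg/SE (the one-sample t-statistic). The helpers `gene_stats` and `rank_genes` are hypothetical. The example shows that ranking by |Mavg| and ranking by |T| can disagree: a gene with a large but noisy average log ratio drops when ranked by T.

```python
import math
import statistics


def gene_stats(m_reps, a_reps):
    """Summaries for one gene across n >= 2 replicate slides (hypothetical helper).

    Assumes SE = sd(M) / sqrt(n) and T = Mavg / SE.
    """
    n = len(m_reps)
    m_avg = statistics.mean(m_reps)
    a_avg = statistics.mean(a_reps)
    se = statistics.stdev(m_reps) / math.sqrt(n)
    return {"Mavg": m_avg, "Aavg": a_avg, "T": m_avg / se, "SE": se}


def rank_genes(table, key):
    """Gene ids sorted by |table[gene][key]|, most extreme first."""
    return sorted(table, key=lambda gene: abs(table[gene][key]), reverse=True)


# Synthetic replicate M values, for illustration only.
table = {
    "g1": gene_stats([1.0, 1.2, 0.8], [10.0, 10.0, 10.0]),  # moderate, consistent
    "g2": gene_stats([3.0, 0.0, 3.0], [9.0, 9.0, 9.0]),     # larger, but noisy
}
```

Here `rank_genes(table, "Mavg")` puts g2 first, while `rank_genes(table, "T")` puts g1 first.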
1. Image analysis
   - what is foreground?
   - what is background?
2. Quality
   - which spots can we trust?
   - which slides can we trust?
3. Artifacts from preparing the RNA, the printing, the scanning, etc.
4. Data cleanup
5. Normalization within an experiment
   - when few genes change
   - when many genes change
   - dye-swap to minimize dye effects
6. Normalization between experiments
   - location and scale effects
7. What is noise and what is variability?
10. Which genes are actually up- and down-regulated?
11. P-values
12. Planning of experiments
   - what is the best design?
   - what is the optimal sample size?
13. Classification
   - of samples
   - of genes
14. Clustering
   - of samples
   - of genes
15. Time course experiments
16. Gene networks
   - identification of pathways
17. ...
Brown & Botstein, 1999