Two Color Microarrays

EPP 245/298

Statistical Analysis of

Laboratory Data

November 10, 2004 EPP 245 Statistical Analysis of Laboratory Data

Two-Color Arrays

• Two-color arrays are designed to account for variability in slides and spots by using two samples on each slide, each labeled with a different dye.

• If a spot is too large, for example, both signals will be inflated together, so taking the difference or ratio of the two signals removes that source of variability.
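
As a minimal sketch of the point above, assume a spot-size effect that multiplies both channels by the same factor. All numbers and variable names here are invented for illustration; they are not from the course data.

```r
# Toy illustration (made-up numbers): a spot-size effect scales both channels.
true_red   <- c(200, 400, 800)   # hypothetical true Cy5 signals
true_green <- c(200, 200, 200)   # hypothetical true Cy3 signals
spot_size  <- c(0.5, 1.0, 2.0)   # hypothetical per-spot multiplicative effect

# Observed signals: both channels on a spot share the same spot effect.
red   <- true_red   * spot_size
green <- true_green * spot_size

# The ratio (or the difference of logs) no longer depends on spot size.
log_ratio      <- log2(red / green)
true_log_ratio <- log2(true_red / true_green)
print(cbind(red, green, log_ratio, true_log_ratio))
```

The observed intensities differ widely across the three spots, but the log ratio matches the true log ratio because the shared factor cancels.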

Dyes

• The most common dye pair is Cy3 (green) and Cy5 (red), with absorption maxima near 550 nm and 649 nm and emission maxima near 570 nm and 670 nm, respectively (green light ~ 550 nm, red light ~ 700 nm)

• The dyes are excited with lasers at 532 nm (Cy3 green) and 635 nm (Cy5 red)

• The emissions are read via filters using a CCD device

File Format

• A slide scanned with Axon GenePix produces a file with extension .gpr that contains the results: http://www.axon.com/gn_GenePix_File_Formats.html

• This contains 29 rows of headers followed by 43 columns of data (in our example files)

• For full analysis one may also need a .gal file that describes the layout of the arrays
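
The number of header rows can differ between files, so one option is to locate the column-name row rather than hard-code it. The sketch below assumes the column-name row is the first line beginning with "Block", as in the column list for our example files; the toy file contents are invented and are only a stand-in for a real .gpr file.

```r
# Write a tiny stand-in for a .gpr file (contents invented for illustration).
toy <- tempfile(fileext = ".gpr")
writeLines(c(
  'ATF\t1.0',
  '1\t5',
  '"Type=GenePix Results"',
  '"Block"\t"Column"\t"Row"\t"Name"\t"F635 Median"',
  '1\t1\t1\t"geneA"\t1200',
  '1\t2\t1\t"geneB"\t300'
), toy)

# Find the column-name row instead of hard-coding the number of header rows.
lines      <- readLines(toy)
header_row <- grep('^"?Block"?\t', lines)[1]

# Read the tab-delimited spot table, keeping column names such as "F635 Median".
spots <- read.table(toy, header = TRUE, sep = "\t", quote = "\"",
                    skip = header_row - 1, check.names = FALSE,
                    stringsAsFactors = FALSE)
print(spots)
```

With `check.names = FALSE` the columns keep their original names, so they can be selected by name (for example `spots[["F635 Median"]]`) instead of by position.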

"Block" "Column" "Row" "Name" "ID" "X" "Y" "Dia." "F635 Median" "F635 Mean" "F635 SD" "B635 Median" "B635 Mean" "B635 SD" "% > B635+1SD" "% > B635+2SD" "F635 % Sat."

"F532 Median" "F532 Mean" "F532 SD" "B532 Median" "B532 Mean" "B532 SD" "% > B532+1SD" "% > B532+2SD" "F532 % Sat."

"Ratio of Medians (635/532)" "Ratio of Means (635/532)" "Median of Ratios (635/532)" "Mean of Ratios (635/532)" "Ratios SD (635/532)" "Rgn Ratio (635/532)" "Rgn R² (635/532)" "F Pixels" "B Pixels" "Sum of Medians" "Sum of Means" "Log Ratio (635/532)" "F635 Median - B635" "F532 Median - B532" "F635 Mean - B635" "F532 Mean - B532" "Flags"

Analysis Choices

• Mean or median foreground intensity

• Background corrected or not

• Log transform (base 2, e, or 10) or glog transform

• Log is compatible only with no background correction, because background-corrected intensities can be zero or negative, where the log is undefined

• Glog is best with background correction
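
To illustrate the last two points, the sketch below compares the ordinary log with one common form of the generalized log, log(y + sqrt(y^2 + lambda)). The intensities and the value of lambda are made up; in practice lambda would be estimated from the data.

```r
# Generalized log in the form log(y + sqrt(y^2 + lambda)).
# lambda is a tuning constant; the value used below is arbitrary.
glog <- function(y, lambda) log(y + sqrt(y^2 + lambda))

# Hypothetical background-corrected intensities, including a negative and a zero.
bc <- c(-20, 0, 15, 500, 20000)

plain_log <- suppressWarnings(log(bc))   # NaN for the negative value, -Inf at zero
g         <- glog(bc, lambda = 100^2)    # defined for every value

print(cbind(bc, plain_log, glog = g))

# For large intensities glog is close to log(2 * y), i.e. a shifted log.
print(g[5] - log(2 * bc[5]))
```

The ordinary log breaks down for the first two values, while the glog stays finite and increasing over the whole range.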

# Read each GenePix results file, skipping the 29 header rows, and keep
# columns 4,5 (Name, ID), 9,10,12,13 (F635/B635 median and mean) and
# 18,19,21,22 (F532/B532 median and mean).
d41 <- read.table("037841.gpr",header=TRUE,skip=29)
d41 <- d41[,c(4,5,9,10,12,13,18,19,21,22)]

d50 <- read.table("037850.gpr",header=TRUE,skip=29)
d50 <- d50[,c(4,5,9,10,12,13,18,19,21,22)]

d46 <- read.table("037846.gpr",header=TRUE,skip=29)
d46 <- d46[,c(4,5,9,10,12,13,18,19,21,22)]

d47 <- read.table("037847.gpr",header=TRUE,skip=29)
d47 <- d47[,c(4,5,9,10,12,13,18,19,21,22)]

d48 <- read.table("037848.gpr",header=TRUE,skip=29)
d48 <- d48[,c(4,5,9,10,12,13,18,19,21,22)]

d49 <- read.table("037849.gpr",header=TRUE,skip=29)
d49 <- d49[,c(4,5,9,10,12,13,18,19,21,22)]

d43 <- read.table("037843.gpr",header=TRUE,skip=29)
d43 <- d43[,c(4,5,9,10,12,13,18,19,21,22)]

dataprep <- function(method="median",bc=FALSE){
  # Columns 3:6 (635 nm) and 7:10 (532 nm) of each slide hold
  # foreground median, foreground mean, background median, background mean;
  # cvec picks the requested statistic, with or without background subtraction.
  if ((method=="median") && bc) cvec <- c(1,0,-1,0)
  if ((method=="mean") && bc) cvec <- c(0,1,0,-1)
  if ((method=="median") && !bc) cvec <- c(1,0,0,0)
  if ((method=="mean") && !bc) cvec <- c(0,1,0,0)

  d41a <- as.matrix(d41[,3:6]) %*% cvec
  d41b <- as.matrix(d41[,7:10]) %*% cvec
  d50a <- as.matrix(d50[,3:6]) %*% cvec
  d50b <- as.matrix(d50[,7:10]) %*% cvec
  d46a <- as.matrix(d46[,3:6]) %*% cvec
  d46b <- as.matrix(d46[,7:10]) %*% cvec
  ... ... ... ... ... ... ... ... ... ...

  # Assumes a d45 data frame read like the others (the slide used d43 here).
  d45a <- as.matrix(d45[,3:6]) %*% cvec
  d45b <- as.matrix(d45[,7:10]) %*% cvec
  alldata <- cbind(d41a,d41b,d50a,d50b,d46a,d46b,d47a,d47b,
                   d48a,d48b,d49a,d49b,d43a,d43b,d44a,d44b,d42a,d42b,d45a,d45b)
  return(alldata)
}

alldata <- dataprep(method="median",bc=FALSE)
rownames(alldata) <- d41[,1]
dye <- as.factor(rep(c("Cy5","Cy3"),10))
slide <- as.factor(rep(1:10,each=2))
treat <- c(1,0,0,1,0,1,1,0,0,3,3,0,0,3,3,0,0,1,1,0)

geneID <- d41[,1:2]

Array normalization

• Array normalization is meant to increase the precision of comparisons by adjusting for variations that cover entire arrays

• Without normalization, the analysis would be valid, but possibly less sensitive

• However, a poor normalization method will be worse than none at all.

Possible normalization methods

• We can equalize the mean or median intensity by adding or multiplying a correction term

• We can use different normalizations at different intensity levels (intensity-based normalization), for example by lowess or quantiles

• We can normalize for other things such as print tips
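
As a rough sketch of intensity-based normalization, the code below fits a lowess curve to simulated log ratios (M) against average log intensities (A) for one slide and subtracts the fitted trend. The simulated data, the shape of the dye bias, and the smoothing span `f = 0.4` are all assumptions chosen for illustration.

```r
set.seed(1)
n <- 500

# Simulated single-slide data (not real): A is the average log2 intensity,
# M is the log2 ratio with an intensity-dependent dye bias plus noise.
A <- runif(n, 6, 14)
M <- 0.5 - 0.08 * (A - 10) + rnorm(n, sd = 0.2)

# Fit a smooth trend of M on A; lowess() returns the fit sorted by A.
fit <- lowess(A, M, f = 0.4)

# Evaluate the fitted trend at each spot's A and subtract it.
trend  <- approx(fit$x, fit$y, xout = A, ties = mean)$y
M_norm <- M - trend

print(c(mean_before = mean(M), mean_after = mean(M_norm)))
print(c(cor_before = cor(A, M), cor_after = cor(A, M_norm)))
```

After subtracting the trend, the log ratios are centered near zero and no longer drift with intensity, which is what a single additive or multiplicative constant cannot achieve.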

Example for Normalization

              Group 1            Group 2
          Array 1  Array 2   Array 3  Array 4
Gene 1       1100      900       425      550
Gene 2        110       95        85      110
Gene 3         80       65        55       80

> normex <- matrix(c(1100,110,80,900,95,65,425,85,55,550,110,80),ncol=4)
> normex
     [,1] [,2] [,3] [,4]
[1,] 1100  900  425  550
[2,]  110   95   85  110
[3,]   80   65   55   80
> group <- as.factor(c(1,1,2,2))

> anova(lm(normex[1,] ~ group))
Analysis of Variance Table

Response: normex[1, ]
          Df Sum Sq Mean Sq F value  Pr(>F)
group      1 262656  262656  18.888 0.04908 *
Residuals  2  27812   13906
---
Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1

> anova(lm(normex[2,] ~ group))
Analysis of Variance Table

Response: normex[2, ]
          Df Sum Sq Mean Sq F value Pr(>F)
group      1   25.0    25.0  0.1176 0.7643
Residuals  2  425.0   212.5

> anova(lm(normex[3,] ~ group))
Analysis of Variance Table

Response: normex[3, ]
          Df Sum Sq Mean Sq F value Pr(>F)
group      1   25.0    25.0  0.1176 0.7643
Residuals  2  425.0   212.5

Additive Normalization by Means

              Group 1            Group 2
          Array 1  Array 2   Array 3  Array 4
Gene 1        975      851       541      608
Gene 2        -15       46       201      168
Gene 3        -45       16       171      138

> cmn <- colMeans(normex)   # column (array) means; definition assumed, it is not shown on the slide
> mn <- mean(cmn)
> normex - rbind(cmn,cmn,cmn)+mn
         [,1]   [,2]   [,3]     [,4]
cmn 974.58333 851.25 541.25 607.9167
cmn -15.41667  46.25 201.25 167.9167
cmn -45.41667  16.25 171.25 137.9167
> normex.1 <- normex - rbind(cmn,cmn,cmn)+mn

> mn <- mean(cmn)
> anova(lm(normex.1[1,] ~ group))
Analysis of Variance Table

Response: normex.1[1, ]
          Df Sum Sq Mean Sq F value  Pr(>F)
group      1 114469  114469  23.295 0.04035 *
Residuals  2   9828    4914
> anova(lm(normex.1[2,] ~ group))
Analysis of Variance Table

Response: normex.1[2, ]
          Df  Sum Sq Mean Sq F value  Pr(>F)
group      1 28617.4 28617.4  23.295 0.04035 *
Residuals  2  2456.9  1228.5
> anova(lm(normex.1[3,] ~ group))
Analysis of Variance Table

Response: normex.1[3, ]
          Df  Sum Sq Mean Sq F value  Pr(>F)
group      1 28617.4 28617.4  23.295 0.04035 *
Residuals  2  2456.9  1228.5

Multiplicative Normalization by Means

              Group 1            Group 2
          Array 1  Array 2   Array 3  Array 4
Gene 1        779      776       687      679
Gene 2         78       82       137      136
Gene 3         57       56        89       99

> normex*mn/rbind(cmn,cmn,cmn)
         [,1]      [,2]      [,3]      [,4]
cmn 779.16667 775.82547 687.33407 679.13851
cmn  77.91667  81.89269 137.46681 135.82770
cmn  56.66667  56.03184  88.94912  98.78378
> normex.2 <- normex*mn/rbind(cmn,cmn,cmn)
> anova(lm(normex.2[1,] ~ group))

Response: normex.2[1, ]
          Df Sum Sq Mean Sq F value   Pr(>F)
group      1 8884.9  8884.9  453.71 0.002197 **
Residuals  2   39.2    19.6
> anova(lm(normex.2[2,] ~ group))

Response: normex.2[2, ]
          Df Sum Sq Mean Sq F value   Pr(>F)
group      1 3219.7  3219.7  696.33 0.001433 **
Residuals  2    9.2     4.6
> anova(lm(normex.2[3,] ~ group))

Response: normex.2[3, ]
          Df  Sum Sq Mean Sq F value  Pr(>F)
group      1 1407.54 1407.54  57.969 0.01682 *
Residuals  2   48.56   24.28
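
The two mean-based normalizations above can also be written compactly with `sweep()`, and the per-gene tests collected in one table. This is a sketch of an equivalent formulation, not the code used on the slides; `gene_p` is a helper name introduced here for illustration.

```r
# Same toy data as in the example above.
normex <- matrix(c(1100,110,80, 900,95,65, 425,85,55, 550,110,80), ncol = 4)
group  <- as.factor(c(1, 1, 2, 2))

cmn <- colMeans(normex)   # per-array means
mn  <- mean(cmn)          # common target level

# Additive: shift each array so its mean equals mn.
additive <- sweep(normex, 2, cmn, "-") + mn
# Multiplicative: rescale each array so its mean equals mn.
multiplicative <- sweep(normex, 2, cmn, "/") * mn

# gene_p (hypothetical helper): p-value of the one-way ANOVA F test per gene (row).
gene_p <- function(x) {
  apply(x, 1, function(y) anova(lm(y ~ group))[["Pr(>F)"]][1])
}

print(round(rbind(raw            = gene_p(normex),
                  additive       = gene_p(additive),
                  multiplicative = gene_p(multiplicative)), 4))
```

Laying the p-values side by side makes the comparison easier to see: in this toy example the choice of normalization changes which genes appear significant, so the method needs to match how the array effects actually act on the data.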

Multiplicative Normalization by Medians

              Group 1            Group 2
          Array 1  Array 2   Array 3  Array 4
Gene 1
Gene 2
Gene 3
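
A median-based version can be sketched by analogy with the mean-based code. The assumption here is that each array is rescaled so its median equals the mean of the array medians; the names `cmd`, `md` and `normex.3` are introduced for illustration and do not come from the slides.

```r
# Same toy data as in the normalization example.
normex <- matrix(c(1100,110,80, 900,95,65, 425,85,55, 550,110,80), ncol = 4)

cmd <- apply(normex, 2, median)   # per-array medians
md  <- mean(cmd)                  # assumed target level: mean of the medians

# Rescale each array so that its median equals md.
normex.3 <- sweep(normex, 2, cmd, "/") * md
print(round(normex.3, 2))
```

With only three genes the median of every array is its Gene 2 value, so Gene 2 becomes constant across arrays after this normalization.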