32
Identification of spatial biases in Affymetrix oligonucleotide microarrays Jose Manuel Arteaga-Salas, Graham J. G. Upton, William B. Langdon and Andrew P. Harrison University of Essex, U. K.

Identification of spatial biases in Affymetrix oligonucleotide microarrays Jose Manuel Arteaga-Salas, Graham J. G. Upton, William B. Langdon and Andrew

  • View
    225

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Identification of spatial biases in Affymetrix oligonucleotide microarrays Jose Manuel Arteaga-Salas, Graham J. G. Upton, William B. Langdon and Andrew

Identification of spatial biases in Affymetrix oligonucleotide

microarrays

Jose Manuel Arteaga-Salas, Graham J. G. Upton, William B. Langdon and Andrew P. Harrison

University of Essex, U. K.

Page 2: Identification of spatial biases in Affymetrix oligonucleotide microarrays Jose Manuel Arteaga-Salas, Graham J. G. Upton, William B. Langdon and Andrew

Agenda

Arteaga-Salas, et. al.

1. Introduction

2. Identification of spatial biases with replicates .

3. Reduction of spatial biases with replicates.

4. Identification of spatial biases without replicates.

5. Reduction of spatial biases without replicates.

6. Conclusions

Page 3: Identification of spatial biases in Affymetrix oligonucleotide microarrays Jose Manuel Arteaga-Salas, Graham J. G. Upton, William B. Langdon and Andrew

1. Introduction

Arteaga-Salas, et. al.

• Microarrays are popular tools to measure gene expression.

• Several laboratories invest important resources on this technology.

• Affymetrix Oligonucleotide Microarrays contain spatial biases in

their hybridizations (Suarez-Fariñas et. al. (2005); Langdon et. al.

(2008)). The problem is independent of chip-type.

• Some methods have been proposed to identify and reduce these

biases for replicated arrays.

• No methods available for experiments without replication.

Page 4: Identification of spatial biases in Affymetrix oligonucleotide microarrays Jose Manuel Arteaga-Salas, Graham J. G. Upton, William B. Langdon and Andrew

2. Identification of spatial flaws w/replicates

Arteaga-Salas, et. al.

• Suarez-Fariñas et. al. (2005) developed the “Harshlight” package

(available in Bioconductor).

• Harshlight uses statistical and image processing methods to

identify spatial defects.

• After identification of flawed locations in the array the user can

correct by substituting with the median value of all the available

arrays at each location, or with “N/A”.

• Disadvantage: ONLY works in the presence of replicate arrays.

Page 5: Identification of spatial biases in Affymetrix oligonucleotide microarrays Jose Manuel Arteaga-Salas, Graham J. G. Upton, William B. Langdon and Andrew

Arteaga-Salas, et. al.

ArcsBlobs

Rings

Harshlight report for 3 replicates of the GSE4217 experiment available at GEO (arrays GSM96262-4)

Page 6: Identification of spatial biases in Affymetrix oligonucleotide microarrays Jose Manuel Arteaga-Salas, Graham J. G. Upton, William B. Langdon and Andrew

2.1 Another method

Arteaga-Salas, et. al.

• Arteaga-Salas et. al. (2008) developed an independent method to

identify spatial biases using replicate arrays.

ij

ijijrijr

Ld

Where Lijr is the logarithm of the observed intensity values, ij is the

median of the Lijr values and ij is the standard deviation of the Lij

values.

• Select locations where abs(dijr)>25% (say).

•For location (i,j) and replicate r calculate dijr

Page 7: Identification of spatial biases in Affymetrix oligonucleotide microarrays Jose Manuel Arteaga-Salas, Graham J. G. Upton, William B. Langdon and Andrew

Arteaga-Salas, et. al.

• The selected locations represent “unusually high” or “unusually

low” values, in comparison with a reference set (in this case, the

reference set is the median of all replicates).

• Disadvantage: ONLY works in the presence of replicate arrays.

• Next:

Example 1: Three HG-U133 Plus 2.0 replicates (from GEO).

Example 2: Three HG-U133A replicates (from Affymetrix).

Example 3: Four DrosGenome1 replicates (from GEO).

Page 8: Identification of spatial biases in Affymetrix oligonucleotide microarrays Jose Manuel Arteaga-Salas, Graham J. G. Upton, William B. Langdon and Andrew

Arteaga-Salas, et. al.

Arcs

BlobsRings

Spatial flaws for 3 replicates of the GSE4217 experiment available at GEO (GSM96262-4) using HG-U133A Plus 2.0 arrays

Unusually

high values

Unusually

low values

Replicate 1 Replicate 2 Replicate 3

Page 9: Identification of spatial biases in Affymetrix oligonucleotide microarrays Jose Manuel Arteaga-Salas, Graham J. G. Upton, William B. Langdon and Andrew

Arteaga-Salas, et. al.

Spatial flaws for 3 replicates of the HG-U133A SpikeIn Experiment -- Affymetrix

Unusually

high values

Unusually

low values

Replicate 1 Replicate 2 Replicate 3

Page 10: Identification of spatial biases in Affymetrix oligonucleotide microarrays Jose Manuel Arteaga-Salas, Graham J. G. Upton, William B. Langdon and Andrew

Arteaga-Salas, et. al.

Spatial flaws for 4 replicates of the GSE6515 experiment available at GEO (GSM149276-9) using DrosGenome1 arrays

Replicate 1 Replicate 2 Replicate 3 Replicate 4

Page 11: Identification of spatial biases in Affymetrix oligonucleotide microarrays Jose Manuel Arteaga-Salas, Graham J. G. Upton, William B. Langdon and Andrew

3. Reducing spatial biases w/replicates

Arteaga-Salas, et. al.

• Harshlight proposes to substitute flawed locations with the

median (HMS) of all the arrays at each location or with “N/A”.

• Arteaga-Salas et. al. (2008) introduced two procedures to assist

with flaw removal:

CPP (complementary probe pair) adjustment, suitable only for

replicated arrays.

LPE (local probe effect) adjustment, suitable for replicate or non-

replicate arrays.

• CPP and LPE can be used separately or in sequence.

Page 12: Identification of spatial biases in Affymetrix oligonucleotide microarrays Jose Manuel Arteaga-Salas, Graham J. G. Upton, William B. Langdon and Andrew

3.1 Local Probe Effect (LPE) adjustment

Arteaga-Salas, et. al.

• LPE can be used whenever R (R>2) arrays are available.

• It uses the spatial structure in a 5 x 5 window centred at location

(i,j) to decide whether adjustment should take place.

• For array r we first calculate the values dijr given by,

ij

ijijrijr

Ld

Where Lijr is the logarithm of the observed value, ij is the median of

the Lijr values and ij is the standard deviation of the Lij values.

Page 13: Identification of spatial biases in Affymetrix oligonucleotide microarrays Jose Manuel Arteaga-Salas, Graham J. G. Upton, William B. Langdon and Andrew

Arteaga-Salas, et. al.

• Now, define Iij and Gij as follows:

Iij – The identifier of the array where dijr has largest absolute value.

Gij – Is 1 if the d-value with largest magnitude is positive, otherwise is

equal to -1.

• Using these two values calculate Eij with,

ijijij GIE

So, with R arrays, Eij takes one of the values { -R,-(R-1),…,-2,-1, 1,2,…

(R-1),R }

Page 14: Identification of spatial biases in Affymetrix oligonucleotide microarrays Jose Manuel Arteaga-Salas, Graham J. G. Upton, William B. Langdon and Andrew

Cell at location (i,j)

r =1 r =2 r =3

Original 45 38.8 34952

L ijr 3.807 3.658 10.462

d ijr -0.558 -0.596 1.154

ij = 5.976

ij = 3.886

I ij = 3

G ij = 1

E ij = 3

Arteaga-Salas, et. al.

• An example,

• If the 5 x 5 window contains a majority of informative locations (PM

or MM only) with the same E-code, then a spatial bias is present.

We adjust the value in cell (i,j,r).

5 x 5 window centered at (i,j)

3

-1 -1 3 1 -2

-1 3 3 3 3

3 3 3 3 1

3 3 3 3 3

-1 3 3 3 -2

17 cases where E=3

Page 15: Identification of spatial biases in Affymetrix oligonucleotide microarrays Jose Manuel Arteaga-Salas, Graham J. G. Upton, William B. Langdon and Andrew

• The adjusted value is given by,

•For each location in we calculate the d-values for array r in need

of correction, and let be their average.

Arteaga-Salas, et. al.

• Let be the set of N informative locations within the window (in the

example, N=17).

dLL ijijraijr

aijrL

d

Page 16: Identification of spatial biases in Affymetrix oligonucleotide microarrays Jose Manuel Arteaga-Salas, Graham J. G. Upton, William B. Langdon and Andrew

Arteaga-Salas, et. al.

• We apply LPE+CPP and Harshlight Median Substitution (HMS) to

Example 1 to illustrate the reduction of the spatial biases:

3.2 Results

Total % of defects

replicate 1 replicate 2 replicate 3

original 6.3 7.9 8.9

HMS (once) 1.7 3.0 3.3

HMS (twice) 0.8 2.2 2.3

CPP 0.9 0.9 1.8

LPE 3.8 5.3 5.2

CPP+LPE 0.8 0.9 1.8

LPE+CPP 0.6 0.6 1.7

Page 17: Identification of spatial biases in Affymetrix oligonucleotide microarrays Jose Manuel Arteaga-Salas, Graham J. G. Upton, William B. Langdon and Andrew

Arteaga-Salas, et. al.

Example 1 (three HG-U133 Plus 2.0 replicates)

Page 18: Identification of spatial biases in Affymetrix oligonucleotide microarrays Jose Manuel Arteaga-Salas, Graham J. G. Upton, William B. Langdon and Andrew

Arteaga-Salas, et. al.

Example 1 after LPE+CPP

Page 19: Identification of spatial biases in Affymetrix oligonucleotide microarrays Jose Manuel Arteaga-Salas, Graham J. G. Upton, William B. Langdon and Andrew

Arteaga-Salas, et. al.

How do we know that these adjustments are the appropriate adjustments?

Page 20: Identification of spatial biases in Affymetrix oligonucleotide microarrays Jose Manuel Arteaga-Salas, Graham J. G. Upton, William B. Langdon and Andrew

Arteaga-Salas, et. al.

From Arteaga-Salas et. al. (2008) in “Statistical Applications in Genetics and Molecular

Biology” (SAGMB).

ROC curves to measure the rate of false/negative positives in the HG-U133A

Spike-In Experiment (Affymetrix) before and after Spatial Flaws Reduction. Gene

Expression summarized with RMA.

Page 21: Identification of spatial biases in Affymetrix oligonucleotide microarrays Jose Manuel Arteaga-Salas, Graham J. G. Upton, William B. Langdon and Andrew

4. Identification --- without replicates

Arteaga-Salas, et. al.

• In the absence of replicates the two methods described before are

not applicable to visualize spatial flaws.

• To identify spatial biases without replicates we need an alternative

reference set to compare the values.

• Langdon et. al. (2008) calculated an “Average GeneChip” and a

“Variance GeneChip” using Affymetrix Chips in the Gene

Expression Omnibus (GEO) as available in February 2007.

• This was done separately by Chip type and organism.

Page 22: Identification of spatial biases in Affymetrix oligonucleotide microarrays Jose Manuel Arteaga-Salas, Graham J. G. Upton, William B. Langdon and Andrew

4.1 The Average GeneChip

Arteaga-Salas, et. al.

• To obtain the “Average GeneChip” the arithmetic mean of the

natural logarithm of the observed probe values in each available

chip was calculated.

• The upper and lower 0.5% of the values were discarded to avoid

the effects of outliers.

• Using the same set of data the variance was calculated to obtain

the “Variance GeneChip”.

Page 23: Identification of spatial biases in Affymetrix oligonucleotide microarrays Jose Manuel Arteaga-Salas, Graham J. G. Upton, William B. Langdon and Andrew

4.2 Steps to visualize spatial biases

Arteaga-Salas, et. al.

Let A be the Average GeneChip, V the Variance GeneChip and L the

logarithm of the observed values.

1. For each location (i,j) in the array, calculate

ij

ijijij

V

ALh

2. Sort hij by column j. For each sorted value assign a rank, and

store them in array K.

Page 24: Identification of spatial biases in Affymetrix oligonucleotide microarrays Jose Manuel Arteaga-Salas, Graham J. G. Upton, William B. Langdon and Andrew

Arteaga-Salas, et. al.

3. Define a “sub-array” centered at (i,j). A sub-array size 11 x 11

includes enough spatial information in a neighbourhood.

2

61

1

*61

*61

nn

ij

KZ

4. The sub-array centered at Kij contains information about

PM/MM/other probes. To avoid correlated values we do not

consider adjacent cells (only one probe in a PM,MM probe pair).

In total we select 61 probes from the total 121 available.

Calculate the scores Zij,

is the mean and is the variance of a discrete uniform distribution

(defined by the size of the chip).

Page 25: Identification of spatial biases in Affymetrix oligonucleotide microarrays Jose Manuel Arteaga-Salas, Graham J. G. Upton, William B. Langdon and Andrew

Arteaga-Salas, et. al.

5. Plot the locations where abs(Z)>= 2*S to identify neighbourhoods

with unusually low or unusually high values.

Following these 5 steps we applied the procedure separately to

three HG-U133 Plus 2.0 arrays from GEO (GSM46959,

GSM76563 and GSM117700), from the accession number

GSE2109.

The scores Z ~ N(0,S2). In the absence of spatial biases S2=1.

Page 26: Identification of spatial biases in Affymetrix oligonucleotide microarrays Jose Manuel Arteaga-Salas, Graham J. G. Upton, William B. Langdon and Andrew

Arteaga-Salas, et. al.

GSM46959 GSM76563 GSM117700

ScratchBlobs Unusual concentration

high

low

Page 27: Identification of spatial biases in Affymetrix oligonucleotide microarrays Jose Manuel Arteaga-Salas, Graham J. G. Upton, William B. Langdon and Andrew

5. Reducing biases – without replicates

Arteaga-Salas, et. al.

• Problem: In the absence of replicates, two of the three methods

presented are not applicable (CPP and Harshlight are not, LPE is).

• Without replicates we don’t know which are the “correct” values (we

need some reference arrays).

• Alternative: We can compare a “contaminated” array with other

arrays (at least two) of the same type where flaws have been

previously reduced.

• In Section 4 we presented three HG-U133A Plus2.0 arrays

“contaminated”. In Section 3 we “cleaned” three replicate arrays of the

same type.

Page 28: Identification of spatial biases in Affymetrix oligonucleotide microarrays Jose Manuel Arteaga-Salas, Graham J. G. Upton, William B. Langdon and Andrew

Arteaga-Salas, et. al.

• We now have three arrays of the same type.

• We can remove the flaws in the contaminated array using LPE.

The “clean” arrays: choose two of the three replicates previously

cleaned with LPE+CPP (let’s choose the first and second replicates

according to the Table).

The “contaminated” arrays: the three arrays presented in part 3.2

(the process is done separately for the three arrays).

Page 29: Identification of spatial biases in Affymetrix oligonucleotide microarrays Jose Manuel Arteaga-Salas, Graham J. G. Upton, William B. Langdon and Andrew

Arteaga-Salas, et. al.

GSM46959 GSM76563 GSM117700

high

low

Page 30: Identification of spatial biases in Affymetrix oligonucleotide microarrays Jose Manuel Arteaga-Salas, Graham J. G. Upton, William B. Langdon and Andrew

Arteaga-Salas, et. al.

Remaining flaws after LPE

Page 31: Identification of spatial biases in Affymetrix oligonucleotide microarrays Jose Manuel Arteaga-Salas, Graham J. G. Upton, William B. Langdon and Andrew

5. Conclusions

Arteaga-Salas, et. al.

• Oligonucleotide arrays contain spatial flaws in their hybridizations

(they are usually manifested as “blobs”, “rings” or “scratches”).

• The problem IS NOT uncommon.

• Some methods to reduce flaws exist, but not for experiments

without replication.

• Spatial biases AFFECT gene expression measurements.

Page 32: Identification of spatial biases in Affymetrix oligonucleotide microarrays Jose Manuel Arteaga-Salas, Graham J. G. Upton, William B. Langdon and Andrew

THANK YOU!

Arteaga-Salas, et. al.