
Detection and Compensation of Cross-Hybridization in DNA Microarray Data

Jim Huang (1)

Joint work with Quaid Morris (1), Tim Hughes (2) and Brendan Frey (1)

(1) Probabilistic and Statistical Inference Group, University of Toronto

(2) Banting & Best Department of Medical Research, University of Toronto


Description and Applications of DNA Microarrays

Microarrays consist of a 2-D array of probes, each with a short DNA sequence attached. These sequences are called oligonucleotide sequences.

The output of each probe is approximately proportional to the amount of DNA that binds to the probe from a given tissue; the data for each probe is an N-dimensional expression profile vector, where N is the number of tissues used on the array.

DNA microarrays can be used to measure the level of gene expression across these N tissues.


Hybridization and cross-hybridization

The process of two complementary DNA strands binding is called hybridization;

Ideally, an oligonucleotide probe will only bind to the DNA sequence for which it was designed and to which it is complementary;

However, many DNA sequences are similar to one another, so a sequence can also bind to probes that were designed for a different sequence;

This phenomenon is called cross-hybridization;

[Figure: an oligonucleotide probe pairing with DNA from a tissue sample. Hybridization panel: AGCTAGGAT paired with TCGATCCTA. Cross-hybridization panel: ATCTAGAAT paired with TCGATCCTA.]
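The difference between the two pairings in the figure can be checked by counting non-complementary positions. The following is a minimal sketch, not part of the original slides: the function name `count_mismatches` is hypothetical, and the strands are compared position by position exactly as they are drawn, ignoring strand orientation.

```python
# Watson-Crick base pairing: A pairs with T, C pairs with G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def count_mismatches(strand, partner):
    """Count positions where `partner` is not the complement of `strand`.

    Assumption for illustration: the strands are aligned position by
    position, as drawn in the figure, without reversing either strand.
    """
    if len(strand) != len(partner):
        raise ValueError("strands must have equal length")
    return sum(COMPLEMENT[a] != b for a, b in zip(strand, partner))


shared = "TCGATCCTA"  # the sequence that appears in both panels
print(count_mismatches(shared, "AGCTAGGAT"))  # hybridization panel: 0
print(count_mismatches(shared, "ATCTAGAAT"))  # cross-hybridization panel: 2
```

The perfect match has no mismatched positions, while the cross-hybridizing pair differs from a perfect complement at only two of nine positions, which is why the two sequences can still bind.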


The trouble with cross-hybridization

With cross-hybridization, a probe may signal the presence of sequences other than the one it was designed for;

This skews the observed data away from the expected data.

[Figure: the observed expression profile vector (cross-hybridized) shown as a sum that includes the expected expression profile vector (no cross-hybridization).]


Detecting cross-hybridization (1)

To test whether cross-hybridization affects the gene expression data, we run a BLAST sequence match on all oligonucleotide probe sequences used on the microarray;

Many probes are matched with sequences for which they were not specifically designed.


Detecting cross-hybridization (2)

We compute the Pearson correlation coefficient ρ between the expression profiles of BLAST-matched probe pairs and between the profiles of randomly paired probes;

Approximately 33% of the BLAST-matched probe pairs have ρ > 0.95, whereas only 2% of randomly paired probes have ρ > 0.95;

This difference between the two distributions indicates that cross-hybridization indeed has a significant impact on the observed gene expression data.
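The comparison above can be sketched as follows. This is an illustration on synthetic data, not the authors' code or data: the helper names `pearson` and `fraction_above`, the array sizes, and the way "matched" pairs are simulated (one profile is a noisy copy of the other) are all assumptions made for the example.

```python
import math
import random


def pearson(u, v):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    du = [a - mean_u for a in u]
    dv = [b - mean_v for b in v]
    num = sum(a * b for a, b in zip(du, dv))
    den = math.sqrt(sum(a * a for a in du) * sum(b * b for b in dv))
    return num / den


def fraction_above(profiles, pairs, threshold=0.95):
    """Fraction of probe pairs whose profiles correlate above `threshold`."""
    rhos = [pearson(profiles[i], profiles[j]) for i, j in pairs]
    return sum(r > threshold for r in rhos) / len(rhos)


# Synthetic stand-in for the array data (hypothetical sizes).
random.seed(0)
n_probes, n_tissues = 200, 12
profiles = [[random.gauss(0, 1) for _ in range(n_tissues)]
            for _ in range(n_probes)]

# Simulated "sequence-matched" pairs: probe 2k+1 is a noisy copy of probe 2k.
matched_pairs = [(2 * k, 2 * k + 1) for k in range(50)]
for i, j in matched_pairs:
    profiles[j] = [x + random.gauss(0, 0.05) for x in profiles[i]]

# Control: pairs of distinct probes drawn at random.
random_pairs = [tuple(random.sample(range(n_probes), 2)) for _ in range(50)]

print("matched pairs with rho > 0.95:", fraction_above(profiles, matched_pairs))
print("random pairs with rho > 0.95: ", fraction_above(profiles, random_pairs))
```

On real data the pairs would come from the BLAST matches rather than from simulation; the point of the sketch is only the shape of the test, which compares the same statistic across a matched set and a random control set.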


Compensating for cross-hybridization

We model the observed, cross-hybridized expression profile vector x as a matrix product of a hybridization matrix Λ and an unobserved expression profile vector z in which there is no cross-hybridization.

The elements λij of the Λ matrix are set as parameterized functions of the Gibbs free energy ΔGij between probes i and j.

To compensate for cross-hybridization, we use a generalized Expectation-Maximization algorithm in which we solve for z and Λ iteratively.
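Written element by element, the model above reads x_i = Σ_j λ_ij z_j. The sketch below illustrates only one half of the iteration: recovering z when Λ is held fixed. It is not the generalized Expectation-Maximization algorithm from the slides. The 3-probe matrix and its values are invented for illustration, the free-energy parameterization of λ_ij is not modeled, and a single tissue is assumed so that each profile reduces to one number per probe.

```python
def mat_vec(matrix, vec):
    """Matrix-vector product: the forward model x = Lambda z."""
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]


def solve(matrix, rhs):
    """Solve matrix @ z = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    a = [list(row) + [b] for row, b in zip(matrix, rhs)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("hybridization matrix is singular")
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= factor * a[col][c]
    z = [0.0] * n
    for r in reversed(range(n)):  # back substitution
        tail = sum(a[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (a[r][n] - tail) / a[r][r]
    return z


# Hypothetical hybridization matrix: probes 0 and 1 cross-hybridize,
# probe 2 is unaffected. All values are invented for this example.
Lambda = [[1.0, 0.3, 0.0],
          [0.3, 1.0, 0.0],
          [0.0, 0.0, 1.0]]

z_true = [5.0, 1.0, 2.0]         # expression without cross-hybridization
x_obs = mat_vec(Lambda, z_true)  # observed, cross-hybridized signal
z_hat = solve(Lambda, x_obs)     # compensation with Lambda held fixed

print("observed:   ", x_obs)
print("compensated:", z_hat)
```

In this toy case probe 1 reads roughly 2.5 although its true level is 1.0, because it picks up signal intended for probe 0; inverting the known Λ removes that contribution. In the method described on the slide Λ is not known in advance, which is why z and Λ are estimated iteratively.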