ViaLogy Lien Chung Jim Breaux, Ph.D. SoCalBSI 2004 “Improvements to Microarray Analytical Methods and Development of Differential Expression Toolkit”


Page 1

ViaLogy

Lien Chung

Jim Breaux, Ph.D.

SoCalBSI 2004

“Improvements to Microarray Analytical Methods and Development of Differential Expression Toolkit”

Funded by the National Science Foundation and National Institutes of Health

Page 2

Outline of Talk

Background
  • Affymetrix GeneChips
  • ViaLogy and Microarray Analysis

Accelerating Low Level Analysis Algorithms
  • Quantile Normalization
  • Median Polish

Differential Expression Toolkit
  • Significance Analysis of Microarrays (SAM)

Future Direction

Page 3

Affymetrix GeneChip® Microarrays

Useful tool to measure the level of mRNA expression of thousands of genes in a biological sample

Low Level Analysis
  • Signal detection: convert fluorescence to signal
  • Normalization: reduce unwanted variation across chips
  • Summarization: reduce the 11-20 probe intensities of each gene to a single value

Page 4

Internet Resources

BioConductor
  • An open source and open development software project for the analysis and comprehension of genomic data
  • A collection of analysis packages implemented in the R language
  • Packages used: affy, siggenes

R Project
  • Open source language and environment for statistical computing and graphics
  • Pros: built-in mathematical functions, supports graphics
  • Cons: computationally slow

Page 5

ViaLogy’s Low Level Analysis (Part 1)

Signal Detection via “Active Signal Processing”

[Diagram: Microarray image (pixel intensity) → VMAxS → CEL Report (feature level signal)]

Page 6

ViaLogy’s Low Level Analysis (Part 2)

[Diagram: CEL Report → Normalization (Quantile Normalization) → Summarization (Median Polish)]

Robust Multi-array Average (RMA), Irizarry, R. et al. (2003)
  • Written in R and C (affy package)
  • Specific to Affymetrix input files
  • Has no special way of dealing with zero values
  • Slow run time in R

Project 1: Recode RMA as a C interface from R
  • Specific to ViaLogy’s input files
  • Introduce a way to deal with zero values
  • Break up the process into individual functions

Page 7

Quantile Normalization

  • Intensity distributions vary significantly across arrays
  • Transforms the distribution of probe intensities to be the same across arrays
  • The final distribution is the average of each quantile across chips

Bolstad et al. (2003)

[Figure: density plot of log intensities; x-axis: log intensities, y-axis: density]

Page 8

Quantile Normalization cont’d

Original matrix:

    1     5     3     5
    2     1     6     7
    3     2     2     6
    4     6     1     8

1. Sort each column of the original matrix:

    1     1     1     5
    2     2     2     6
    3     5     3     7
    4     6     6     8

2. Take the average across each row:

    2
    3
    4.5
    6

3. Set each value to its corresponding row average:

    2     2     2     2
    3     3     3     3
    4.5   4.5   4.5   4.5
    6     6     6     6

4. Unsort the columns of the matrix to their original order:

    2     4.5   4.5   2
    3     2     6     4.5
    4.5   3     3     3
    6     6     2     6

Bolstad et al. (2003)
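The four steps above can be sketched in a few lines of Python. This is an illustrative sketch only, not the C code described in this talk: the function name `quantile_normalize` is hypothetical, the input is assumed to be a list of rows with one column per array, and ties are simply left in sort order rather than handled specially.

```python
def quantile_normalize(matrix):
    """Quantile-normalize a list of rows (one column per array).

    Hypothetical helper for illustration; ties are not treated specially.
    """
    n_rows = len(matrix)
    n_cols = len(matrix[0])

    # Step 1: for each column, find the row indices in ascending value order.
    orders = []
    for j in range(n_cols):
        column = [matrix[i][j] for i in range(n_rows)]
        orders.append(sorted(range(n_rows), key=lambda i: column[i]))

    # Step 2: average the k-th smallest value of every column.
    row_means = []
    for k in range(n_rows):
        total = sum(matrix[orders[j][k]][j] for j in range(n_cols))
        row_means.append(total / n_cols)

    # Steps 3 and 4: write each average back at the position the
    # k-th smallest value originally occupied in its column.
    result = [[0.0] * n_cols for _ in range(n_rows)]
    for j in range(n_cols):
        for k, i in enumerate(orders[j]):
            result[i][j] = row_means[k]
    return result


example = [
    [1, 5, 3, 5],
    [2, 1, 6, 7],
    [3, 2, 2, 6],
    [4, 6, 1, 8],
]
for row in quantile_normalize(example):
    print(row)
```

Applied to the 4 × 4 example from this slide, the sketch returns the same final matrix as the slide. Sorting indices instead of values keeps the "unsort" step trivial, at the cost of holding one index list per array in memory.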

Page 9

Median Polish

  • Summarization step used in RMA
  • Fits a linear model to the data for each probe set across all microarrays
  • Greatly reduces variability for genes expressed at lower levels

Tukey, J. (1977)

Irizarry, R. et al. (2003)

[Diagram: 11-20 features per gene → 1 expression value per gene]

Page 10

Quantile Normalization and Median Polish in C

Steps Involved . . .
  • Read literature on Quantile Normalization and Median Polish
  • Use R and C code as the foundation for my code
  • Add functionality to deal with ties and zeroes
  • Test the code for accuracy of the algorithm

Results . . .

For ~20,000 genes, 30 arrays:

                            R code            C code
    QUANTILE NORMALIZATION  11 min 53 secs    10 secs
    MEDIAN POLISH           4 min 43 secs     20 secs

Page 11

To Recap . . .

[Diagram: CEL file → Normalization (Quantile Normalization) → Summarization (Median Polish) → Differential Expression Toolkit]

Project 2: Differential Expression Toolkit

Page 12

Significance Analysis of Microarrays (SAM)

1. Calculate a statistic (d-score) for each gene: $d_i;\ i = 1, 2, \ldots, p$.

2. Order the d-scores: $d_{(1)} \le d_{(2)} \le \cdots \le d_{(p)}$.

3. Create B sets of random permutations of the group labels. For each permutation $b$, calculate d-scores for all genes and order them: $d^{b}_{(1)} \le d^{b}_{(2)} \le \cdots \le d^{b}_{(p)}$.

4. From the B sets of ordered statistics, find the expected order statistics: $\bar{d}_{(i)} = \frac{1}{B} \sum_{b=1}^{B} d^{b}_{(i)}$.

5. Plot observed d-scores vs. expected d-scores and evaluate significant genes based on a user-defined threshold (Δ).

Tusher et al. (2001)

Page 13

SAM Example

              Group 1              Group 2                         ordered
              1      2      3      4      5      6      d-score    d-score
    Gene 1    1.1    0.3    0.4    2.1    1.6    1.3    -1.5       -1.5
    Gene 2    0.1    1.2    0.5    1.5   -0.3   -0.3     0.3       -0.2
    Gene 3    0.7   -0.2    1.3   -0.3   -0.5    1.5     0.4        0.3
    Gene 4   -0.9    1.4    0.6   -0.6    1.0    1.3    -0.2        0.4
    Gene 5    1.5    0.8    1.0   -0.7    0.3   -0.8     1.6        1.6

Observed d-scores

$d_i = \frac{\bar{x}_{i2} - \bar{x}_{i1}}{s_i + s_0}$

  • $\bar{x}_{i1}, \bar{x}_{i2}$: mean of gene $i$ in group 1 and group 2
  • $s_i$: standard deviation for gene $i$
  • $s_0$: fudge factor (constant)
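As a rough illustration of the d-score formula, the Python sketch below computes one score per gene. Several details are assumptions rather than anything stated on the slide: the name `d_scores` is hypothetical, $s_i$ is taken to be the pooled standard error used in Tusher et al. (2001), and the value of $s_0$ is left to the caller. Because the slide gives neither $s_0$ nor the exact form of $s_i$, the sketch is not expected to reproduce the d-score column shown in the table.

```python
from math import sqrt


def d_scores(data, group1, group2, s0=0.0):
    """Return one d-score per gene (row of `data`).

    group1 and group2 are lists of column indices. s_i is ASSUMED to be
    the pooled standard error; s0 is the fudge factor. With s0 = 0 a gene
    whose values are constant within both groups would divide by zero.
    """
    n1, n2 = len(group1), len(group2)
    scores = []
    for row in data:
        x1 = [row[j] for j in group1]
        x2 = [row[j] for j in group2]
        mean1 = sum(x1) / n1
        mean2 = sum(x2) / n2
        # Pooled sum of squared deviations around each group mean.
        ss = sum((v - mean1) ** 2 for v in x1) + sum((v - mean2) ** 2 for v in x2)
        s_i = sqrt((1 / n1 + 1 / n2) * ss / (n1 + n2 - 2))
        scores.append((mean2 - mean1) / (s_i + s0))
    return scores


# Expression values from the SAM example table (5 genes, 6 arrays).
table = [
    [1.1, 0.3, 0.4, 2.1, 1.6, 1.3],
    [0.1, 1.2, 0.5, 1.5, -0.3, -0.3],
    [0.7, -0.2, 1.3, -0.3, -0.5, 1.5],
    [-0.9, 1.4, 0.6, -0.6, 1.0, 1.3],
    [1.5, 0.8, 1.0, -0.7, 0.3, -0.8],
]
observed = d_scores(table, group1=[0, 1, 2], group2=[3, 4, 5], s0=0.1)
print(sorted(observed))  # ordered observed d-scores
```

A larger `s0` shrinks every score toward zero, which is the purpose of the fudge factor: it keeps genes with very small variance from producing extreme d-scores.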

Page 14

SAM Example (cont’d)

Permutation # i

              Group 1              Group 2                         ordered
              5      2      4      1      6      3      d-score    d-score
    Gene 1    1.1    0.3    0.4    2.1    1.6    1.3     0.3       -1.2
    Gene 2    0.1    1.2    0.5    1.5   -0.3   -0.3     0.9       -0.2
    Gene 3    0.7   -0.2    1.3   -0.3   -0.5    1.5    -0.2        0.3
    Gene 4   -0.9    1.4    0.6   -0.6    1.0    1.3     0.5        0.5
    Gene 5    1.5    0.8    1.0   -0.7    0.3   -0.8    -1.2        0.9

Ordered d-scores:

    Permutation #1    Permutation #2    …    Permutation #B    Avg d-scores
    -1.2              -0.6              …    -0.2              0.5
    -0.2              -0.3              …    -0.1              0.8
     0.3               0.1              …     1.0              1.3
     0.5               0.2              …     1.2              1.2
     0.9               1.6              …     1.3              0.6

Expected d-scores
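The permutation step can be sketched as follows: shuffle the group labels, recompute and sort the d-scores, and average each rank position over all permutations. This is a hedged sketch, not the siggenes code or the C implementation from this talk. The function names, the default `s0=0.1`, the number of permutations, and the fixed random seed are all illustrative assumptions, and $s_i$ is again assumed to be the pooled standard error of Tusher et al. (2001).

```python
import random
from math import sqrt


def d_scores(data, labels, s0=0.1):
    """d-score per gene for samples labelled 1 or 2 (s0 is an assumed constant)."""
    scores = []
    for row in data:
        g1 = [v for v, lab in zip(row, labels) if lab == 1]
        g2 = [v for v, lab in zip(row, labels) if lab == 2]
        m1, m2 = sum(g1) / len(g1), sum(g2) / len(g2)
        ss = sum((v - m1) ** 2 for v in g1) + sum((v - m2) ** 2 for v in g2)
        s_i = sqrt((1 / len(g1) + 1 / len(g2)) * ss / (len(g1) + len(g2) - 2))
        scores.append((m2 - m1) / (s_i + s0))
    return scores


def expected_d_scores(data, labels, n_perm=200, seed=0):
    """Average the ordered d-scores over n_perm random label permutations."""
    rng = random.Random(seed)
    shuffled = list(labels)
    totals = [0.0] * len(data)
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # one random relabelling of the arrays
        for rank, d in enumerate(sorted(d_scores(data, shuffled))):
            totals[rank] += d
    return [t / n_perm for t in totals]


table = [
    [1.1, 0.3, 0.4, 2.1, 1.6, 1.3],
    [0.1, 1.2, 0.5, 1.5, -0.3, -0.3],
    [0.7, -0.2, 1.3, -0.3, -0.5, 1.5],
    [-0.9, 1.4, 0.6, -0.6, 1.0, 1.3],
    [1.5, 0.8, 1.0, -0.7, 0.3, -0.8],
]
labels = [1, 1, 1, 2, 2, 2]
observed = sorted(d_scores(table, labels))
expected = expected_d_scores(table, labels)
for obs, exp in zip(observed, expected):
    print(f"observed {obs:6.2f}   expected {exp:6.2f}")
```

Pairing the k-th smallest observed score with the k-th expected score gives the points for the observed vs. expected plot; genes whose points lie further than Δ from the diagonal are the candidates for differential expression.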

Page 15

SAM Example (cont’d)

Page 16

SAM Implementation

Siggenes (BioConductor)
  • R language (slow)
  • Too many options

C interface from R
  • Faster run time
  • Specific to ViaLogy’s input files and functionalities

Steps involved:
  • Read SAM literature and understand the algorithm
  • Go through the Siggenes source code
  • Write C code, taking out unnecessary steps and adding additional functionality

For a data set of ~7000 genes, 8 arrays:

    SAM in R              ~60 seconds
    C interface from R    ~5 seconds

Page 17

Input to SAM

Page 18

Results in R

Page 19

Future Direction

1. SAM
  • Implementation for other study types such as “paired” and “one-class”
  • Procedures for dealing with zeros

2. Differential Expression Toolkit
  • Evaluate other, more accurate and efficient methods

Page 20

References

Journals

Irizarry, R. et al. (2003). “Exploration, normalization, and summaries of high density oligonucleotide array probe level data.” Biostatistics.

Bolstad et al. (2003). “A comparison of normalization methods for high density oligonucleotide array data based on bias and variance.” Bioinformatics.

Tukey, J. (1977). “Exploratory Data Analysis.”

Tusher et al. (2001). “Significance analysis of microarrays applied to the ionizing radiation response.” PNAS.

Websites

www.bioconductor.org

www.r-project.org

www-stat.stanford.edu/~tibs/SAM/

Page 21

Acknowledgements

SoCalBSI Members
  • Prof. Jamil Momand
  • Prof. Sandra Sharp
  • Prof. Wendie Johnston
  • Prof. Nancy Warter-Perez
  • Jacqueline Heras
  • Fellow Interns

Jim Breaux, Ph.D.

Sandeep Gulati, Ph.D.

Robin Hill

Juan Guitterez

Vijay Daggumati

Other Employees

National Science Foundation & National Institutes of Health

Page 22

Page 23

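Median Polish Cont’d

... and so on, until the sum of the absolute “residuals” of the matrix stops changing appreciably from one iteration to the next

The probe-set summary for each gene is computed from the row and column effects estimated by Median Polish: on the log scale, each array's expression value is the overall effect plus that array's effect, with the probe effects separated out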

Tukey, J. (1977)
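A minimal Python sketch of the iteration may make the procedure concrete. It follows the usual formulation of Tukey's median polish (alternating row and column median sweeps) and is not the C code from this talk. The function name, the `max_iter` and `eps` parameters, and the stopping rule based on the relative change in the sum of absolute residuals are assumptions. The layout with probes as rows and arrays as columns is also assumed.

```python
from statistics import median


def median_polish(matrix, max_iter=10, eps=0.01):
    """Decompose matrix[i][j] into overall + row[i] + col[j] + residual[i][j].

    Hypothetical helper; rows are assumed to be probes, columns arrays.
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    resid = [[float(v) for v in row] for row in matrix]
    overall = 0.0
    row_eff = [0.0] * n_rows
    col_eff = [0.0] * n_cols
    previous = None
    for _ in range(max_iter):
        # Row sweep: move each row's median into the row effect.
        for i in range(n_rows):
            m = median(resid[i])
            row_eff[i] += m
            resid[i] = [v - m for v in resid[i]]
        m = median(col_eff)
        overall += m
        col_eff = [c - m for c in col_eff]
        # Column sweep: move each column's median into the column effect.
        for j in range(n_cols):
            m = median(resid[i][j] for i in range(n_rows))
            col_eff[j] += m
            for i in range(n_rows):
                resid[i][j] -= m
        m = median(row_eff)
        overall += m
        row_eff = [r - m for r in row_eff]
        # Stop once the sum of absolute residuals no longer changes much.
        total = sum(abs(v) for row in resid for v in row)
        if total == 0 or (previous is not None and abs(previous - total) <= eps * total):
            break
        previous = total
    return overall, row_eff, col_eff, resid


# Toy probe set: 3 probes (rows) by 3 arrays (columns), purely additive.
probe_set = [
    [7, 9, 11],
    [8, 10, 12],
    [9, 11, 13],
]
overall, row_eff, col_eff, resid = median_polish(probe_set)
summary = [overall + c for c in col_eff]  # one expression value per array
print(summary)
```

Under the assumed layout, `summary` holds one value per array for the probe set, which corresponds to the "1 expression value per gene" step of the summarization slide.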