Affymetrix Probe Level Analysis - Biostatisticsririzarr/Talks/jnj-affy.pdf · • affy R package ()...

Affymetrix Probe Level Analysis

Rafael A. Irizarry and Zhijin WuDepartment of Biostatistics, JHU

Johnson and Johnson, 12/5/3

Contact Information

• e-mails: rafa@jhu.edu, zwu@jhsph.edu

• Personal webpages: • http://www.biostat.jhsph.edu/~ririzarr• http://www.biostat.jhsph.edu/~zwu

• (http://www.bioconductor.org)

Acknowledgements

• Paco Martinez-Murillo, Forrest Spencer (JHU)• Felix Naef (Rockefeller)• Ben Bolstad, Sandrine Dudoit, Terry Speed

(Berkeley)• Jean Yang (UCSF)• Robert Gentleman (Harvard)• Wolfgang Huber (Germany)• Johnson & Johnson

Outline

• Quick review of technology• Overview of Issues• Previous Work• RMA• Improvements to RMA

Applications of microarrays

• Measuring transcript abundance– Differential Expression– Classifying samples– Detecting expression pattern

• Other applications:– Genotyping– TAG arrays

How they work

Before Labelling

Array 1 Array 2

Sample 1 Sample 2

Before Hybridization

Array 1 Array 2

Sample 1 Sample 2

After Hybridization

Array 1 Array 2

Scanner Image

Array 1 Array 2

Quantification

Array 1 Array 2

4 2 0 3 0 4 0 3

Microarray Image

Case Study: Preprocessing Affymetrix GeneChip Arrays

2424µµmm

Millions of copies of a specificMillions of copies of a specificoligonucleotideoligonucleotide probeprobe

Image of Hybridized Probe ArrayImage of Hybridized Probe Array

>200,000 different>200,000 differentcomplementary probes complementary probes

Single stranded, Single stranded, labeled RNA targetlabeled RNA target

OligonucleotideOligonucleotide probeprobe

1.28cm1.28cm

GeneChipGeneChip Probe ArrayProbe ArrayHybridized Probe CellHybridized Probe Cell

Compliments of D. Gerhold

Before Hybridization

Array 1 Array 2

Sample 1 Sample 2

More Realistic

Array 1 Array 2

Sample 1 Sample 2

Non-specific Hybridization

Array 1 Array 2

Statistical Problem

• Each gene is represented by 11-20 pairs (PM and MM) of probe intensities

• Each array has 8K-20K genes• Usually there are various arrays• Obtain measure for each gene on each array:

• Background adjustment and normalizationare issues

Summarize probeset data

Default until 2002 (MAS 4.0)• GeneChip® software used Avg.diff

• with A a set of “suitable” pairs chosen by software.

• Obvious Problems:– Many negative expression values– No log transform

∑Α∈

jj MMPMdiffAvg )(1.

Why use log?

Original Scale Log Scale

Original scale Log scale

Current default (MAS 5.0)

• GeneChip® new version uses something else

• with MM* a version of MM that is never bigger than PM.

• Ad-hoc background procedure and scale normalization are used.

)}{log( *jj MMPMghtTukeyBiweisignal −=

Can this be improved?

Log-scale scatter plot

log2(expression 1)

MvA Plot

A= ½{ log2(expression 2) + log2(expression 1) } /2

Can this be improved?

Precision/Accuracy

• It appears precision can be improved. How does it relate to accuracy?

• Spike-in experiments (Affymetrix and GeneLogic)

• Dilution Study (GeneLogic)

Use Spike-In Experiment

First academic alternative: dChip

Li and Wong fit a model

Here represents expression on chip iand represents the probe effect

A non-linear normalization technique is used and the model assumptions are used to remove outliers.

),0(, 2σεεφθ NMMPM ijijjiijij ∝+=−

iθjφ

dChip is betterbut still room for improvement

Three steps

From the spike-in data we learn that:

• We need to background adjust• Normalize• Summarize appropriately (in the log-scale)

Why background correct?

100 100

Concentration of 0 pM

Concentration of 1.0 pM

Concentration of 0.5 pM

Why background correct?

Why normalize?

Compliments of Ben Bolstad

Why correct for non-specific hyb?

One MM not enough? Look for more!

RMA• Robust Multiarray Analysis (RMA) is a 3-step

approch: – ignores MM and remove global background– quantile normalize– use median polish to estimate log expression robustly

• Irizarry et al: Biostatistics (2003)

• Irizarry et al: NAR (2003)

• affy R package (www.bioconductor.org)

Background adjustment

Deterministic Model

PM = B + N + SMM = B + N

PM – MM = S

Do MMs measure non-specific binding?Look at Yeast DNA hybridized to Human Chip(HGU95)

log (PM-B) v log (MM-B)

Not perfectly: This explains large variance

Stochastic Model(Additive background/multiplicative error)

PM = BPM + NPM + S,MM = BMM+ NMM

log (NPM), log (NMM) ~ Bivariate Normal (ρ ≈ 0.7)S = exp ( Ө + α + ε )Ө is the quantity of interest (log scale expression)

E[ PM – MM ] = S, but Var[ log( PM – MM ) ] ~ 1/exp(Ө) (can be very large)

Can we just ignore background?

PM is a biased estimate of Ө

Alternative Approach

Predict log(S) from PM,MM

For example: 1) E[ log(S) | PM, MM ]

2) Estimate Ө and obtain standard error: Formal hypothesis testing

Quantile normalization

Summarization

• Do it in the log-scale• Account for the probe effect• Use robust procedure

Probe-effect

• Li and Wong (2001) first observed the very strong probe effect

• Within the same probeset, a large range of intensities (orders of magnitude) is observed. But across arrays, variance of intensities, for the same probe, is relatively small

• This probe effect explains high correlation between replicate arrays

Expression from 2 replicate arrays

Correlation is higher than 0.99

Expression from probesets divided into 2 (at random)

Correlation drops to 0.55

Probe effect seen in spike-ins

Why fit log scale additive model?

RMA• Instead of subtracting MM,

Assume PM = B + S• To estimate S, use expectation: E[S|B+S], with B

normal and S exponential• After quantile normalization, assume:

log2Sij = Өi + αj + εij• Estimate Өi using robust procedure (median

polish)• We call this procedure RMA• Does it make a difference?

Does it make a difference?

Ranks 1

27020743063 3935 4639 4652 5149 5372 5947 64486870703775498429 9721

Perfect

Ranks1 23 4 56 7 8 9

10 111213141516

MAS 5.0

Ranks 1

27020743063 3935 4639 4652 5149 5372 5947 64486870703775498429 9721

Ranks1 23 4 67

10 16 4556 5888

406999

1643 2739

Can RMA be improved?

Global Accuracy and Precision

499.960.110.61RMA218882.430.630.69MAS 5.0RankPercentileMedian SDSlope

Can RMA be improved?

Current Work

• Incorporate MM and sequence information to build an improved model and estimate

• Find alternative, faster, approaches to posterior mean

• Preliminary work: GCRMA

Predict NSB with sequence info

Naef’s model

• Assume that being an A,T,G or C has a position dependent effect on probe effect

• Assume that this effect is a smooth function of position (Naef uses cubic polynomials we use splines)

• Use training data to get affinities

Naef uses these to predict probe effect

We use them to predict NSB too

Problems with MM

Also they take up half the space on the chip ($250)

Affymetrix Probe Level Analysis - Biostatisticsririzarr/Talks/jnj-affy.pdf · • affy R package ()...

Documents

AFFY Pharma Private Limited Uttar Pradesh India

SP Qudax Caps 100mg Rus Ins Affy TJKG01 · Предохраняет липиды клеточных мембран от перекисного окисления. Сокращает

Time Words and Music by Michael Bublé, Alan Chang and … · Words and Music by Michael Bublé, Alan Chang and Amy Foster 2 3 4 MM MM MM MM MM MM MM MM MM MM MM MM MM MM MM MM MM

Time 3.28 min. Words and Music by Michael Bublé, Alan ... · Words and Music by Michael Bublé, Alan Chang and Amy Foster 2 3 4 MM MM MM MM MM MM MM MM MM MM MM MM MM MM MM MM MM

Bioconductor · 2020. 4. 30. · # 108 affy_huex_1_0_st_v2 AFFY HuEx 1 0 st v2 probe feature_page # 109 affy_hugenefl AFFY HuGeneFL probe feature_page # 110 affy_hugene_1_0_st_v1

Analysis of Affy 1.0 ST Gene Array Data in R

Package ‘affy’ · Package ‘affy’ April 4, 2014 Version 1.40.0 Title Methods for Affymetrix Oligonucleotide Arrays Author Rafael A. Irizarry , Laurent Gautier

Download Tapple.pdf · Affy Tapple argues that it relied on Aptos' representation that any defect in the site's security system would be remedied to prevent any further issues. Affy

Textual Description of a y - Johns Hopkins Bloomberg ...ririzarr/Raffy/affy.pdf · Various built in function permit the user to easily view graphical displays of the ... description=

Time Music and lyrics by Barry Manilow ONE VOICE Rit. 32 MM MM MM MM MM MM MM MM MM MM MM MM MM MM MM MM MM MM MM MM 33 A Tempo 33 34 35 36 Sop. Cor. Solo Cor. Rep. ICor. 2nd Cor

Time Words and Music by Andrew Hozier Byrne TAKE ME … · Words and Music by Andrew Hozier rByrne 3 4 MM MM MM MM MM MM MM MM MM MM MM MM MM MM MM MM MM MM MM MM MM MM MM MM MM MM

Time Words and music byJames Blunt, Steve Robson,Ryan Tedder and Bob Marley … · 2017-08-10 · MM MM MM MM MM MM MM MM MM MM MM MM MM MM MM MM MM MM MM MM MM MM MM MM MM MM MM

for AFFY Stock Week Investment Advice · AFFY, and answer all of your questions about the economy, stock market, and your investments. Investment Advice for AFFY Fundamental Analysis

Laser and Rotary Materials · L932-203 L932-206 L932-209 L706-206 L612-203 L612-206 L622-206 L502-203 L502-206 mm mm mm mm mm mm mm mm mm mm mm mm mm mm mm mm mm mm mm Black/White

Working with Aﬀymetrix data: estrogen, a 2x2 factorial ...compdiag.molgen.mpg.de/ngfn/docs/2004/nov/exercises-affy.pdf · can consult the vignettes for the affy package for this

FRIDGE & FREEZER - LG USA...400 mm 50 mm 570 mm mm 565 mm 570 mm 592 mm 404 mm 51 mm 54.5 mm 39 mm 696 mm 5 mm 400 mm 1017 mm 617 mm 1775 mm 1802 mm 220 mm 60 mm 300 mm 360 mm •

affy, diff. exp., clustering - Lectures

AFFY PHARMA PRIVATE LIMITED · Affy group of companies are companies closely held by Arvind Billa and family members supported by good professionals and very strong committed work

Mm, Mm, Mm, Mm is for moon . Mm, Mm, Mm, Mm is for mouse . Mm, / m /, moon

Data Extraction cDNA arrays Affy arrays. Stanford microarray database