
A Review/Update of the ERCC & MAQC Microarray Consortia and Some Applications of Their Findings

“Expressionist” Seminar Group
Johns Hopkins School of Public Health

Ernest S. Kawasaki
NCI Advanced Technology Center Microarray Facility

August 9, 2006

ERCC Summary/Update: External RNA Controls Consortium

MAQC Summary/Update: MicroArray Quality Control Consortium

Possible Use of ERCC/MAQC Standards & Large Data Set

Organizations/Consortia Developing Standards & Controls for Gene Expression Profiling Technologies

• MGED -- Microarray Gene Expression Database. Standard for data reporting (MIAME)

• MAQC -- Microarray Quality Control Group. FDA sponsored RNA standards, ref datasets, etc. (Leming Shi et al)

• ERCC -- External RNA Controls Consortium (M. Salit, J. Warrington et al)

• NIST -- Metrology for Gene Expression Program. Provides a better understanding of the fundamentals of microarray technologies (M. Salit, M. Satterfield, et al)

Rapid Increase in Microarray Publications
2005 -- The 10 Year Anniversary of the First Expression Microarray

[Figure: bar chart of the number of publications per year. Yearly summary from PubMed.]

Year      Publications
1995-8    55
1999      140
2000      425
2001      1125
2002      2000
2003      3110
2004      4350
2005      5400!

No common standards are used across platforms, so data are difficult or impossible to compare.

Proliferation of Whole Genome Arrays

Vendor             Probe type       Probe sets
ABI                60mer            31,000
Affymetrix         25mer            54,000
Agilent            60mer            44,000
GE Amersham        30mer            55,000
Illumina           50mer            46,000
Microarrays Inc.   70mer            49,000
NimbleGen          60mer            38,000
Phalanx Biotech    60mer            ~30,000
Home Brew          70mer or cDNA    ~40,000

Etc. Many other companies (e.g. Combimatrix) are making smaller custom arrays.

A DNA-DNA hybrid occupies ~4 nm² on the slide surface.

ERCC: External RNA Control Consortium

Conception in March 2003, Stanford University

The private, public, and academic sectors working together to produce control materials for gene expression analysis.

Mark Salit, NIST / Janet Warrington, Affymetrix

Mission of the ERCC

The ERCC is developing external RNA controls useful for gene expression assays in microarrays & QRT-PCR on a wide variety of platforms.

J. Warrington -- Affymetrix

Members of the ERCC: More Than 70 and Counting…

A good mix of academic, government and commercial organizations with ~115 scientists from 10 countries:

Affymetrix; Agilent; Ambion; Applied Biosystems; ATCC; Biomerieux; BMS; Cambridge University; Capital Bio; Celera Diagnostics; Cenetron; Centers for Disease Control; Centers for Medicare & Medicaid Services; Clinical & Laboratory Standards Institute; Clinical Hospital Center Zagreb; Combimatrix; Eli Lilly; Eppendorf Microarray Division; Expression Analysis; FDA, CBER; FDA, CDER; FDA, CDRH; FDA, NCTR; FDA, OIVD; GE Healthcare; Genetics Society of Vietnam; Harvard University; Illumina; Informax, Inc.; International Federation of Clinical Chemistry & Laboratory Medicine; Invitrogen; Johns Hopkins University; Lawrence Livermore Lab; LGC; Marine Molecular Quality Controls; Mayo Clinic; National Institute of Standards & Technology; NIH, National Cancer Institute; Northwestern; Nugenic; Qiagen; Queens University Hospital; Roche Molecular Systems; Stanford University; Stratagene; Tokyo University; UCLA; University Health Network; US Department of Agriculture; Veridex, Johnson & Johnson; Vialogy; Vigentech; etc.

J. Warrington -- Affymetrix

Nature Methods vol 2, p731, 2005

The ERCC is producing standardized expression controls, analysis tools and protocols

• Well-characterized, widely accepted RNA standard controls for multiple platforms: Certified Reference Material (CRM)

• Protocols for multiple applications, research and the clinical laboratory (CLSI: Clinical & Laboratory Standards Institute). Approved July 2006!

• Software tools to support development work

• Software tools to support multiple applications

J. Warrington -- Affymetrix

Control Sequences, June 2006

Number      Affiliation            Genus/Species           Length
1 - 28      Affymetrix             B. subtilis             700 - 2000
29 - 40     Affymetrix             Artificial sequences    500 - 1900
41 - 43     USDA-ARS-NCAUR         Bos taurus              500
44 - 46     USDA-ARS-NCAUR         Glycine max             500
47          Ambion                 Lambda phage            2002
48 - 53     Ambion                 Artificial sequences    1000
54 - 61     Ambion                 E. coli                 750 - 2000
62 - 82     Stanford University    Methanococcus           500 - 750
83 - 85     Agilent Technologies   Artificial sequences    500
86 - 90     GE HealthCare          E. coli                 1000
91 - 140    Ambion/Atactic         Artificial sequences    1000
141 - 146   Eppendorf

L. Reid -- Expression Analysis, J. Warrington -- Affymetrix

A Summary of How ERCC Controls Will Be Tested and Selected

Testing Strategy for RNA Controls

1. Design and development -- generate reagents -- ~100 in place w/70 sequenced

2. Prototype testing -- validate reagents

3. Proof of concept -- validate the assays

4. Functional testing -- validate the product

5. Performance review -- analyze all data

Testing begins in the 4th quarter. L. Reid et al

Uses of RNA Controls/Standards

• Negative Controls

-- Determine “true” background

-- QC for slide quality, hybridization, etc.

• Positive Controls

-- QC as above

-- Labeling efficiency

-- Dilution series: determine sensitivity of assay, determine lowest conc. with reliable signal

-- Ratiometric series, normalization tool

Will allow better comparison of intra- or inter-lab data, with the same or different array platforms.

Tests for Validation of ERCC Controls

• Negative control test: background studies

• Cross-hybridization: determine if any of the controls hybridize to each other or to mRNAs

• Labeling test: determine efficiency in the presence of a complex RNA sample

• Latin square: test controls over a range of concentrations (1:5,000,000 to 1:1000)

• Linear range test and ratiometric studies

Above studies will require ~102 arrays per site!

Latin Squares Design for Testing Controls

A1 - A4 = the 4 arrays used

G1 - G4 = the 4 transcripts being studied

L1 - L4 = the 4 concentrations of each transcript

L. Reid, BMC Genomics 6:150
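The design above can be sketched as a cyclic Latin square: across arrays A1-A4, each transcript G1-G4 is spiked at each concentration level L1-L4 exactly once, and each array carries every level once. The cyclic construction and the function names below are assumptions for illustration; the layout actually used is the one described in Reid, BMC Genomics 6:150.

```python
def latin_square(n=4):
    """Return an n x n cyclic Latin square of concentration levels.

    Row i is array A(i+1); column j is transcript G(j+1);
    the entry is the level index (1..n) spiked in for that pair.
    """
    return [[(i + j) % n + 1 for j in range(n)] for i in range(n)]


def is_latin(square):
    """Check that every level occurs once per row and once per column."""
    n = len(square)
    full = set(range(1, n + 1))
    rows_ok = all(set(row) == full for row in square)
    cols_ok = all({square[i][j] for i in range(n)} == full for j in range(n))
    return rows_ok and cols_ok


design = latin_square(4)
for i, row in enumerate(design, start=1):
    cells = "  ".join(f"G{j}:L{level}" for j, level in enumerate(row, start=1))
    print(f"A{i}  {cells}")
```

Because every transcript sees every concentration and every array sees every concentration, array effects and transcript effects can be separated from the concentration response with only four arrays.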

ERCC Test Sites
>100 Arrays/Site for Validating Controls

• Affymetrix
• GE Healthcare
• Illumina
• NIAID
• Novartis
• Qiagen
• Agilent, ABI, Roche maybe

The MAQC Project: MicroArray Quality Control

• An FDA sponsored consortium (Leming Shi)

• Founded to address concerns of the microarray community about the reproducibility of expression profiling experiments.

• Group consists of over 140 members from academia, government, pharma & biotech.

• A large study was designed to compare expression data from 10 different platforms and 40 different test sites with >650 arrays.

• Study has been completed and results will be published in Nature Biotechnology. Data will be released next month.

MAQC Study Goals/Exptl. Design

• Establish a set of reference standards for use in the MAQC, but more importantly for the array community
• Generate a large collection of reference data sets using multiple microarray platforms and many diff. labs…
• Promote the use of reference RNA samples…
• Make recommendations on the appropriate uses of microarray technology.

The MAQC group first tested multiple RNAs with 160 arrays and then chose two for titration studies with 200 arrays. Two RNAs at two concentrations were chosen for repeated (5 arrays per sample) assays for four pools. The samples were UHRR from Stratagene and Human Brain Ref (HBRR) from Ambion. The four pools were:
A. 100% UHRR
B. 100% HBRR
C. 75% UHRR : 25% HBRR
D. 25% UHRR : 75% HBRR

At the completion of this study there is data from over 1026 arrays!
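One way to read the four-pool design: if a gene's signal were linear in RNA input, its signal in pools C and D would be the 75:25 and 25:75 weighted means of its signals in pools A and B. The sketch below computes those expected values and checks the ordering the mixture implies. The linearity assumption, the function names and the example intensities are illustrative only and are not taken from the MAQC data.

```python
def expected_pool_signals(a, b):
    """Expected signals in pools C and D for a gene with signal `a` in pool A
    (100% UHRR) and `b` in pool B (100% HBRR), assuming a linear response."""
    c = 0.75 * a + 0.25 * b  # pool C: 75% UHRR, 25% HBRR
    d = 0.25 * a + 0.75 * b  # pool D: 25% UHRR, 75% HBRR
    return c, d


def titration_order_ok(a, c, d, b):
    """True if measured signals follow the monotone order implied by the mix
    (A >= C >= D >= B, or the reverse for genes higher in pool B)."""
    return a >= c >= d >= b or a <= c <= d <= b


# Hypothetical intensities for one gene
c, d = expected_pool_signals(a=4000.0, b=800.0)
print(c, d)
print(titration_order_ok(4000.0, c, d, 800.0))
```

A platform that preserves this ordering for most genes is responding to known changes in input, which is the point of running pools C and D alongside the pure samples.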

Platforms Used In MAQC Study

Code   Vendor                 Assay type           Size
ABI    Applied Biosystems     One-Color Array      32,878 Probes
AFX    Affymetrix             One-Color Array      54,675 Probes
AGL    Agilent                Two-Color Array      43,931 Probes
AGI    Agilent                One-Color Array      43,931 Probes
CBC    CapitalBioCorp         One & Two Color      23,231 Probes
EPP    Eppendorf              One-Color Array      294 Probes
GEH    GE Healthcare          One-Color Array      54,359 Probes
ILM    Illumina               One-Color Array      47,293 Probes
NCI    NCI-Operon             Two-Color Array      37,632 Probes
TCI    TeleChem Int           One & Two Color      27,648 Probes
TAQ    Applied Biosystems     TaqMan® Assays       1,004 PCRs
QGN    Panomics               QuantiGene Assay     245 Probes
GEX    GeneExpress            StaRT-PCR™ Assay     205 Probes

MAQC STUDY DESIGN

12,091 genes used for comparison across all platforms. (Damir Herman, Jean Thierry-Mieg)

Take Home Messages/General Findings From MAQC Study

• Large data sets are available for objectively assessing platform performance and various data analysis algorithms.

• Microarray technology is reproducible and reliable when one has an understanding of its limitations.

• Cross-platform analyses require very careful annotation & mapping of probe sequences.

• All the platforms had good intra-lab repeatability, and inter-lab reproducibility after removal of outliers.

• Methods of microarray analysis are an important variable, and this large data set will help resolve issues in this area (statisticians and bioinformaticists take delight…)

Manuscripts in MAQC Study -- Entire Issue of Nature Biotechnology, Sept. 2006

• Editorial
• FDA Foreword
• Stanford - Data quality in genomics and microarrays
• Impact of microarray data quality in genomic data submissions to the FDA
• US EPA efforts to develop a framework for using genomics data in risk assessment and regulatory decision making
• MAQC main manuscript: overall description
• The reproducibility of differentially expressed gene lists in microarray studies*
• An analysis and comparison of alternative platforms
• Use of RNA titrations to assess platform performance
• Performance of one-color vs two-color arrays

MAQC Manuscripts (cont.)

• External RNA controls for assessment of microarray analytical performance
• Normalization and technical variation in gene expression measurements*
• Toxicogenomics and microarrays: biological response measurements are preserved across platforms
• Reproducibility probability score: a metric incorporating measurement variability across labs for gene comparison*

Late news: 9 manuscripts were submitted and 6 were accepted. With 3 commentaries there are 9 articles in the Sept. Nature Biotechnology Suppl. from the MAQC.

Nature May 25, 2006

“With proper use of negative and positive controls, microarrays may be used to identify, quantitate expression and count the absolute number of genes being expressed in any given cell or tissue sample.”
-- Anonymous (aka ESK)

Present (P) & Absent (A) Calls in Spotted Long Oligo Arrays

• “Average” cell expresses <10,000 genes.

• “Whole” genome array contains >25,000 genes.

• Therefore, Present calls should be 40% or less (<10,000 of >25,000), i.e. 60% or more Absent.

• However, P calls are usually 90% or more using the usual image analysis systems like GenePix.

• Why is this? Why do we care?

Good negative controls may resolve this issue.

Li et al (2005) Bioinformatics 21:2875

What is Background?

Articles are still being written about how to determine “true” background. Controls can be used to settle this issue.

[Figure labels: Internal Background; External Background]

W Yin et al (2005) Bioinformatics 21:2410

Common Methods for Background Subtraction

Use of Negative Controls for Background Subtraction

Internal Background ~ 500-1000 units
External Background ~ 100-200 units

% Present using external = 96% (21,565/22,464)
% Present using internal = 76% (17,010/22,464)

Background subtraction eliminated 4,555 genes from further analysis. Good or bad??

Use of negative controls can dramatically change values for % genes expressed and gene expression ratios!
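The percentages on this slide follow directly from the counts. A minimal sketch, assuming a probe is called Present when its raw signal exceeds the chosen background level; that simple rule, the function names and the toy signals are assumptions for illustration, not the procedure used in the lab.

```python
def percent_present(n_present, n_total):
    """Percentage of probes called Present."""
    return 100.0 * n_present / n_total


def count_present(signals, background):
    """Count probes whose raw signal exceeds the background level."""
    return sum(1 for s in signals if s > background)


total = 22_464
print(round(percent_present(21_565, total), 1))  # using external background
print(round(percent_present(17_010, total), 1))  # using internal background
print(21_565 - 17_010)                           # probes dropped by the stricter background

# Toy example: the same five signals under a low and a high background level
signals = [120, 450, 700, 1500, 5200]
print(count_present(signals, 150), count_present(signals, 750))
```

The counts work out to roughly 96% and 75.7% Present, and the difference between them is the 4,555 genes mentioned on the slide.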

[Figure labels: Low Signal; Negative Control Background; External Background]

Negative Controls & Background

[Figure: two bar-chart panels, one per channel (Cy5 and Cy3). x-axis: slides jurkat_70 to jurkat_73, L428_57, L428_58, L428_74, L428_75, lncap_53 to lncap_56, mcf_48, mcf_49, mcf_51, mcf_52, oci_66 to oci_69, sud_61 to sud_64. y-axis: mean intensities (0-5000 for Cy5, 0-6000 for Cy3). Series: B, neg, F.]

Signal distribution of noise background (B), negative control background (median) (neg) and mean intensities of all probes (F) on the slide, separated by Cy5 and Cy3 channels.

Histogram of Intensities of Negative Controls

[Figure: histogram of Jurkat 70, all negative control spots, Cy3 and Cy5. x-axis: intensity bin (0-6600 in steps of 100). y-axis: negative probe numbers (0-50).]

Influence of Type of Background Subtraction on Expression Ratios

• Assume the control sample gene has a signal of 600 units. The experimental sample has a signal of 5600 for the same gene.

• The external background is 100 units.

• Therefore, the calculated ratio would be 11: (5600 - 100)/(600 - 100) = 5500/500 = 11

• But if the negative control background is 500, the ratio is now 51: (5600 - 500)/(600 - 500) = 5100/100 = 51

• Use of negative controls as background may relieve some of the “compression” in ratios for these types of arrays and give a more accurate expression value.
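The arithmetic on this slide can be sketched in a few lines. The function name and the guard against a control signal at or below background are assumptions added for illustration; the signal and background values are the ones used on the slide.

```python
def background_corrected_ratio(experimental, control, background):
    """Ratio of background-subtracted signals (experimental / control)."""
    if control <= background:
        # A control at or below background has no usable net signal.
        raise ValueError("control signal does not exceed background")
    return (experimental - background) / (control - background)


control, experimental = 600, 5600

# External background of 100 units
print(background_corrected_ratio(experimental, control, background=100))  # 11.0

# Negative control background of 500 units
print(background_corrected_ratio(experimental, control, background=500))  # 51.0
```

The effect is largest for weakly expressed genes: when the control signal sits close to the background, a small change in the background estimate produces a large change in the ratio.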

[Figure: box plots of CV for twelve data sets, numbered 1-12.]

1. jurkat; 2. jurkat_neg; 3. L428; 4. L428_neg; 5. lncap; 6. lncap_neg; 7. mcf; 8. mcf_neg; 9. oci; 10. oci_neg; 11. sud; 12. sud_neg

Box plots of CV (data are loess normalized, one set with negative-control background subtraction, another set without). This figure shows that background subtraction could improve the data quality.

Probability of True Positives and True Negatives Using 3 SD Cutoff

[Figure: probability at each cut-off threshold. x-axis: cut-off threshold (0-2000). y-axis: probability (0.00-1.00). Series: tp5, tn5. The 3 standard deviation cutoff is marked.]

Sensitivity & Specificity Cutoff Threshold At a 3 SD Value

[Figure: ROC of Jurkat_70 Cy5. x-axis: 1 - specificity (0.0-1.0). y-axis: sensitivity (0.00-1.00). The 3 standard deviation cutoff is marked.]
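A minimal sketch of a 3 SD cutoff, assuming the threshold is the mean of the negative-control intensities plus three sample standard deviations, and that sensitivity and specificity are scored against spots of known status. The slides do not state exactly how the threshold was computed, so this definition, the function names and all numbers below are hypothetical.

```python
from statistics import mean, stdev


def three_sd_cutoff(negative_controls, k=3.0):
    """Threshold = mean + k * sample SD of the negative-control intensities."""
    return mean(negative_controls) + k * stdev(negative_controls)


def sensitivity_specificity(positives, negatives, cutoff):
    """Fraction of known positives above the cutoff (sensitivity) and
    fraction of known negatives at or below it (specificity)."""
    tp = sum(1 for x in positives if x > cutoff)
    tn = sum(1 for x in negatives if x <= cutoff)
    return tp / len(positives), tn / len(negatives)


# Hypothetical intensities
neg = [380, 420, 450, 500, 520, 560, 610, 640]
pos = [550, 900, 1400, 2600, 5200]

cutoff = three_sd_cutoff(neg)
sens, spec = sensitivity_specificity(pos, neg, cutoff)
print(round(cutoff), sens, spec)
```

Sweeping `k` (or the cutoff itself) and plotting sensitivity against 1 - specificity gives an ROC curve of the kind shown for Jurkat_70.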

Courtesy Eric Hoffman

Perfect Match (PM) and Mismatch (MM): The Affy Image Quantitation Methods

GCOS (GeneChip Operating System): default Affy analysis software.

RMA (Robust Multiarray Average): Irizarry method using only PM signals.

GCRMA: Similar to RMA but takes GC content into account.

dChip: Similar to GCOS but has with-MM and without-MM options.

YW Chip: The Yonghong Wang method. PM only, with only sequence-validated oligos used in the analysis.

Correlation Between 2 Technical Replicates -- Affy Chips

[Figure panels: GCOS; No background subtraction; MAS background subtraction; RMA background subtraction]

Influence of Different Methods of Background Subtraction

PM Only vs PM-MM Analysis of Technical Replicates

[Figure panels: “Mean Values Intensities” and “S.D. Dist. Probesets 4 Reps”, each shown for PM Only and for PM-MM. x-axis: Log2 intensities.]

Correlation Study of Genes With Absent Calls

Genes here were called absent by GCOS in 8 hybs from 2 technical replicates. The data indicate that absent calls may not be truly absent in many cases.

The MM Probes: C or T at the 13th Position May Result in Artefactual High Signal. 92% of All MM with Higher Signal Than PM Have C or T

Probe Mapping Data Will Be Available for All Platforms Used In The MAQC Study

Analysis of Probe Sequences Within Probe Sets in Affy Gene Chip

# of “Correct” or Mapped      # Probe Sets in
Oligos/Probe Set              Each Category
1                             692
2                             514
3                             433
4                             450
5                             425
6                             499
7                             626
8                             862
9                             1608
10                            3771
11                            36562
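The table can be summarized with a little arithmetic. The sketch below tallies the probe sets listed and the share that have all 11 oligos mapped; the counts are copied from the table, while the variable names and the choice of summary are illustrative.

```python
# Number of mapped oligos per probe set -> number of probe sets in that category
mapped_counts = {1: 692, 2: 514, 3: 433, 4: 450, 5: 425, 6: 499,
                 7: 626, 8: 862, 9: 1608, 10: 3771, 11: 36562}

total_sets = sum(mapped_counts.values())
fully_mapped = mapped_counts[11]
pct_fully_mapped = round(100.0 * fully_mapped / total_sets, 1)

print(total_sets)
print(pct_fully_mapped)
```

Restricting an analysis to validated oligos, as in the PM-only YW Chip method mentioned earlier, affects the remaining probe sets that have fewer than 11 mapped oligos.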

How The ERCC & MAQC Can Increase The Reliability/Acceptance of Microarray Data

• A set of controls used by all expression platforms will go a long way to end confusion about comparability of data from related experiments.
• Probe mapping and sequences from all platforms will be extremely useful for cross-platform comparisons.
• Very large data set from all major platforms will point out problem areas in present protocols/technologies, which, hopefully, will result in their improvement.
• Large data sets from ERCC and MAQC combined will provide a great resource for critically evaluating algorithms used in analyzing arrays. Which analysis method provides “true” answers?
• Hopefully, a (workable) consensus about utilization of microarray technologies will arise from these two large exercises in (sometimes a bit contentious) human scientific cooperation.

In Closing… My attempt at being funny…

Is your back to the wall? Are you under a lot of pressure?

USF

Do you feel you’re on the Treadmill of Life?

Moebius Strip II by M.C. Escher

Nature vol. 246, p776, 2003

Are you uncertain of what to do next?

Nano Smiley DNAs --- Many Happy Genomes

100 nm

Courtesy P Rothemund, Nature v440 p297 y06

Keep on smilin’, ’cause when you’re smilin’, the whole world smiles with you…

Thank you all ------

ERCC & MAQC Consortia

ATC Microarray Lab Crew

YW for Analysis & AP for Chip Data