15
BioMed Central Page 1 of 15 (page number not for citation purposes) BMC Genomics Open Access BMC Genomics 2002, 3 x Methodology article Optimization and evaluation of T7 based RNA linear amplification protocols for cDNA microarray analysis Hongjuan Zhao 1 , Trevor Hastie 2 , Michael L Whitfield 3 , Anne-Lise Børresen- Dale 4 and Stefanie S Jeffrey* 1 Address: 1 Department of Surgery, Medical School Lab-Surge Bldg P214, Stanford University, Stanford, CA 94305-5494, USA, 2 Department of Statistics, Sequoia Hall, Stanford University, Stanford, CA 94305-4065, USA, 3 Department of Genetics, CCSR 2260, Stanford University, Stanford, CA 94305-5163, USA and 4 Department of Genetics, Norwegian Radium Hospital, University of Oslo, Oslo, Norway E-mail: Hongjuan Zhao - [email protected]; Trevor Hastie - [email protected]; Michael L Whitfield - [email protected]; Anne-Lise Børresen-Dale - [email protected]; Stefanie S Jeffrey* - [email protected] *Corresponding author Abstract Background: T7 based linear amplification of RNA is used to obtain sufficient antisense RNA for microarray expression profiling. We optimized and systematically evaluated the fidelity and reproducibility of different amplification protocols using total RNA obtained from primary human breast carcinomas and high-density cDNA microarrays. Results: Using an optimized protocol, the average correlation coefficient of gene expression of 11,123 cDNA clones between amplified and unamplified samples is 0.82 (0.85 when a virtual array was created using repeatedly amplified samples to minimize experimental variation). Less than 4% of genes show changes in expression level by 2-fold or greater after amplification compared to unamplified samples. Most changes due to amplification are not systematic both within one tumor sample and between different tumors. Amplification appears to dampen the variation of gene expression for some genes when compared to unamplified poly(A) + RNA. The reproducibility between repeatedly amplified samples is 0.97 when performed on the same day, but drops to 0.90 when performed weeks apart. The fidelity and reproducibility of amplification is not affected by decreasing the amount of input total RNA in the 0.3–3 µg range. Adding template-switching primer, DNA ligase, or column purification of double-stranded cDNA does not improve the fidelity of amplification. The correlation coefficient between amplified and unamplified samples is higher when total RNA is used as template for both experimental and reference RNA amplification. Conclusion: T7 based linear amplification reproducibly generates amplified RNA that closely approximates original sample for gene expression profiling using cDNA microarrays. Background Gene expression profiling using complementary DNA (cDNA) microarrays is being applied for multiple purpos- es such as defining the taxonomy of different molecular subtypes of human breast and other cancers [1–10] and discovering biomarkers and therapeutic targets [11,12]. A limitation of the use of this technology is that small spec- imens of human tissue, such as obtained by core needle or Published: 30 October 2002 BMC Genomics 2002, 3:31 Received: 6 July 2002 Accepted: 30 October 2002 This article is available from: http://www.biomedcentral.com/1471-2164/3/31 © 2002 Zhao et al; licensee BioMed Central Ltd. This article is published in Open Access: verbatim copying and redistribution of this article are permitted in all media for any purpose, provided this notice is preserved along with the article's original URL.

Open Access Optimization and evaluation of T7 based RNA ...€¦ · (not employing a TS primer) and 9700 clone cDNA microarrays. They found that a differential expression ra-tio cut-off

  • Upload
    others

  • View
    5

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Open Access Optimization and evaluation of T7 based RNA ...€¦ · (not employing a TS primer) and 9700 clone cDNA microarrays. They found that a differential expression ra-tio cut-off

BioMed CentralBMC Genomics

ss

Open AcceBMC Genomics 2002, 3 xMethodology articleOptimization and evaluation of T7 based RNA linear amplification protocols for cDNA microarray analysisHongjuan Zhao1, Trevor Hastie2, Michael L Whitfield3, Anne-Lise Børresen-Dale4 and Stefanie S Jeffrey*1

Address: 1Department of Surgery, Medical School Lab-Surge Bldg P214, Stanford University, Stanford, CA 94305-5494, USA, 2Department of Statistics, Sequoia Hall, Stanford University, Stanford, CA 94305-4065, USA, 3Department of Genetics, CCSR 2260, Stanford University, Stanford, CA 94305-5163, USA and 4Department of Genetics, Norwegian Radium Hospital, University of Oslo, Oslo, Norway

E-mail: Hongjuan Zhao - [email protected]; Trevor Hastie - [email protected]; Michael L Whitfield - [email protected]; Anne-Lise Børresen-Dale - [email protected]; Stefanie S Jeffrey* - [email protected]

*Corresponding author

AbstractBackground: T7 based linear amplification of RNA is used to obtain sufficient antisense RNA formicroarray expression profiling. We optimized and systematically evaluated the fidelity andreproducibility of different amplification protocols using total RNA obtained from primary humanbreast carcinomas and high-density cDNA microarrays.

Results: Using an optimized protocol, the average correlation coefficient of gene expression of11,123 cDNA clones between amplified and unamplified samples is 0.82 (0.85 when a virtual arraywas created using repeatedly amplified samples to minimize experimental variation). Less than 4%of genes show changes in expression level by 2-fold or greater after amplification compared tounamplified samples. Most changes due to amplification are not systematic both within one tumorsample and between different tumors. Amplification appears to dampen the variation of geneexpression for some genes when compared to unamplified poly(A)+ RNA. The reproducibilitybetween repeatedly amplified samples is 0.97 when performed on the same day, but drops to 0.90when performed weeks apart. The fidelity and reproducibility of amplification is not affected bydecreasing the amount of input total RNA in the 0.3–3 µg range. Adding template-switching primer,DNA ligase, or column purification of double-stranded cDNA does not improve the fidelity ofamplification. The correlation coefficient between amplified and unamplified samples is higher whentotal RNA is used as template for both experimental and reference RNA amplification.

Conclusion: T7 based linear amplification reproducibly generates amplified RNA that closelyapproximates original sample for gene expression profiling using cDNA microarrays.

BackgroundGene expression profiling using complementary DNA(cDNA) microarrays is being applied for multiple purpos-es such as defining the taxonomy of different molecular

subtypes of human breast and other cancers [1–10] anddiscovering biomarkers and therapeutic targets [11,12]. Alimitation of the use of this technology is that small spec-imens of human tissue, such as obtained by core needle or

Published: 30 October 2002

BMC Genomics 2002, 3:31

Received: 6 July 2002Accepted: 30 October 2002

This article is available from: http://www.biomedcentral.com/1471-2164/3/31

© 2002 Zhao et al; licensee BioMed Central Ltd. This article is published in Open Access: verbatim copying and redistribution of this article are permitted in all media for any purpose, provided this notice is preserved along with the article's original URL.

Page 1 of 15(page number not for citation purposes)

Page 2: Open Access Optimization and evaluation of T7 based RNA ...€¦ · (not employing a TS primer) and 9700 clone cDNA microarrays. They found that a differential expression ra-tio cut-off

BMC Genomics 2002, 3 http://www.biomedcentral.com/1471-2164/3/31

fine needle aspiration (FNA) biopsies, may not be suffi-cient for microarray hybridization using direct labellingprotocols. Typical microarray labelling procedures require2–4 µg poly(A)+ RNA or 25–50 µg total RNA per cDNAmicroarray. This amount of poly(A)+ RNA or total RNAcan be obtained from samples of human tissue that weighgreater than 50–100 mg. However, core needle biopsies ofbreast cancers, for example, weigh in the 10–25 mg rangeand yield only 3–15 µg of total RNA. Small tumors iden-tified using early detection strategies may thus be toosmall to excise a specimen with enough RNA for microar-ray analysis. A pilot study by Assersohn et al. [13] showedthat only 15% of FNA samples from human breast cancersproduced sufficient mRNA for expression array analysis.One approach to low specimen RNA input has been to useindirect labelling techniques to increase fluorescence sig-nal intensity, such as with aminoallyl nucleotides. Al-though less expensive, we and other colleagues havefound that indirect labelling techniques are not always re-liable compared to direct labelling methods. For valuabletumor specimen, reliability is paramount. A very recent re-port used amino C6dT-modified random hexamers toprime cDNA synthesis in conjunction with aminoallyl-dUTP and increased fluorescence intensity enough suchthat as little as 1 µg of total RNA from cell lines gave suffi-cient signal for cDNA microarray hybridization [14]. Thereliability of this method with human tumor specimenwarrants further testing.

RNA amplification techniques have been developed to ad-dress the need for sufficient RNA from tiny specimen formicroarray hybridization. Other examples of specimen re-quiring amplification for genome-wide characterizationof gene expression include purified populations of cellsobtained by either flow cytometry, laser capture microdis-section, breast ductal or bronchial lavage, or microendos-copy. Although one group has used unamplified totalRNA extracted from ~2 × 104 microdissected cells for hy-bridization on 5000 clone membrane-based arrays [15],most groups perform RNA amplification for this purpose[16–18], especially when using high-density slide-basedarrays.

The most commonly used mechanism for RNA amplifica-tion is a T7 based linear amplification method first devel-oped by Van Gelder, Eberwine and coworkers [19–21].This method utilizes a synthetic oligo(dT) primer contain-ing the phage T7 RNA polymerase promoter to prime syn-thesis of first strand cDNA by reverse transcription of thepoly(A)+ RNA component of total RNA. Second strandcDNA is synthesized by degrading the poly(A)+ RNAstrand with RNase H, followed by second strand synthesiswith E. coli DNA polymerase I. Amplified antisense RNA(aRNA) is obtained from in vitro transcription of the dou-ble-stranded cDNA (ds cDNA) template using T7 RNA

polymerase. Several protocols based on this mechanismhave been developed and used in microarray analyses[16,22–28].

In spite of the increasing use of T7 based linear amplifica-tion techniques in the study of human disease, systematicevaluation of the fidelity and the reproducibility of ampli-fication mechanisms has been limited. Such informationis important to determine how well the amplified sampleresembles the unamplified sample and the validity of ap-plying this technique to the study of human tissues. Astudy by Wang and Miller et al. [24] described a T7 basedamplification protocol modified with a template-switch-ing (TS) primer used to theoretically generate a full-lengthds cDNA. The gene expression of amplified total RNAfrom melanoma cell lines hybridized to 2000 gene micro-arrays was compared to that of unamplified total or po-ly(A)+ RNA using cluster analysis and determining thenumber of outlier genes between single experiments. Ap-proximately 3–6% of genes were discordant outliers whenanalyzed using 3-fold or greater expression ratios in atleast one hybridization and compared between total RNA,mRNA, and different amounts of input aRNA. Hu and co-workers [27] compared amplified and unamplified sam-ples using total RNA obtained from human glioma celllines and 2300 clone microarrays (printed with duplicateson the same array) to evaluate a similar T7 based protocolwith a TS mechanism adopted from Wang and Miller et al.[24]. Their results were based on nine microarray experi-ments and showed concordance between amplified andunamplified samples, verifying four expressed and twodifferentially expressed genes using Northern and Westernblotting and immunohistochemical assay.

Since there are multiple T7 based amplification protocolsin use, questions remain regarding the effects of differenc-es between these protocols and how these differencestranslate when applied to solid tumors rather than celllines on a genome-wide scale. A study from Incyte Ge-nomics [25] examined gene expression of kidney vs. pla-centa RNA and RNA from matched normal and tumorrenal tissue using their own T7 based amplification kit(not employing a TS primer) and 9700 clone cDNAmicroarrays. They found that a differential expression ra-tio cut-off of greater than or equal to 2-fold produced ex-cellent correlation between samples amplified withdifferent amounts of input poly(A)+ RNA but that a 3-folddifferential expression ratio threshold should be set forcomparing ratios between amplified and unamplifiedmRNA. Decreasing the input of tissue lysate increasedgene expression discordance between amplified and un-amplified samples. As more human tissues were tested,single round amplification produced a 200- to 500-foldyield, lower than the 700-fold yield originally found in

Page 2 of 15(page number not for citation purposes)

Page 3: Open Access Optimization and evaluation of T7 based RNA ...€¦ · (not employing a TS primer) and 9700 clone cDNA microarrays. They found that a differential expression ra-tio cut-off

BMC Genomics 2002, 3 http://www.biomedcentral.com/1471-2164/3/31

their study and lower than yields reported in amplifica-tion studies using cell cultures.

The differences between several T7 based linear amplifica-tion protocols are mainly the following: 1. whether a tem-plate-switching mechanism is used in the synthesis ofsecond strand cDNA, 2. what enzymes are used in the syn-thesis of second strand cDNA, 3. how ds cDNA is purified("cleaned up") prior to in vitro transcription, and 4. howin vitro transcription is performed. Information regardingthe effects of these differences on the fidelity or reproduc-ibility of amplification should help eliminate both unnec-essary procedures and those actually detrimental toamplification. Determination of the range of input totalRNA necessary to achieve reasonable fidelity and repro-ducibility is crucial for researchers dealing with very smallspecimens of human tissue.

To answer these questions and to define an optimal pro-tocol for T7 based linear amplification, we carried out aseries of amplification reactions under different condi-tions using total RNA isolated from primary human breastcarcinomas. The amplified samples were compared to un-amplified samples on high-density cDNA microarrayscontaining 41,931–42,602 clones. We evaluated the ef-fects of TS primer in second strand cDNA synthesis, DNA

ligase activity in second strand cDNA synthesis, columnpurification of ds cDNA, and in vitro transcription timeon the fidelity and reproducibility of amplification andthe yield of aRNA. The effect of diminishing amounts ofinput total RNA was also tested.

Results and DiscussionVariation in cDNA microarray analysis of gene expression using unamplified poly(A)+ RNAIn order to assess the reproducibility of microarray hy-bridization using standard methods, poly(A)+ RNA wasisolated from both primary breast carcinoma BC2 andUniversal Human Reference total RNA (Stratagene®). TheBC2 poly(A)+ RNA labelled with Cy5 and the referencepoly(A)+ RNA labelled with Cy3 were hybridized on five42,000 clone cDNA microarrays. 16,333 clones had a sig-nal greater than 50% above background on all five arrays.Three hybridizations were done on the same day using ar-rays from the same print batch and the average correlationcoefficient between any two hybridizations was 0.97 ±0.01, demonstrating a high reproducibility between paral-lel hybridizations done on the same day. Another two hy-bridizations were performed using poly(A)+ RNA isolatedfrom BC2 total RNA three months later and a differentprint batch of microarrays. The correlation coefficient be-tween these two arrays was 0.95, similar to the average

Table 1: Correlation coefficients of amplified and unamplified expression levels of 14,044 genes selected according to the described cri-teria. Amplifications with or without TS primer and with two different ds cDNA cleanup protocols were performed on BC91 total RNA.

Column for ds cDNA cleanup Bio-6 G-50

Reference RNA amplified Total RNA Poly(A)+ RNA Total RNA Poly(A)+ RNA

w/o TS Virtual 0.84 (n = 2) 0.77 (n = 4) 0.82 (n = 2) 0.80 (n = 4)Average 0.83 ± 0.01 (n = 2) 0.73 ± 0.07 (n = 4) 0.81 ± 0.00 (n = 2) 0.76 ± 0.05 (n = 4)

TS Virtual 0.81 (n = 2) 0.79 (n = 3) 0.77 (n = 1) 0.71 (n = 2)Average 0.80 ± 0.05 (n = 2) 0.74 ± 0.04 (n = 3) 0.68 ± 0.02 (n = 2)

Table 2: Efficacy of amplification using 3 µg total RNA from BC91 and different ds cDNA cleanup methods, with or without TS primer.

Column used for ds cDNA cleanup Bio-6 G-50

Amplification Protocol TS w/o TS TS w/o TS

Yield of aRNA (µg) 7.6 ± 1.0 (n = 5) 11.1 ± 2.2 (n = 3) 8.9 ± 3.8 (n = 6) 10.0 ± 5.1 (n = 6)Fold of amplification 253 ± 33 370 ± 73 297 ± 126 333 ± 170

Page 3 of 15(page number not for citation purposes)

Page 4: Open Access Optimization and evaluation of T7 based RNA ...€¦ · (not employing a TS primer) and 9700 clone cDNA microarrays. They found that a differential expression ra-tio cut-off

BMC Genomics 2002, 3 http://www.biomedcentral.com/1471-2164/3/31

correlation coefficient of the first three arrays. However,when the unamplified poly(A)+ RNA arrays performedweeks apart from different print batches were compared,the average correlation coefficient dropped to 0.89 ± 0.01,indicating that experimental variations due to differencesin microarray printing, poly(A)+ RNA isolation, and RNAlabelling and hybridization contribute a small but detect-able change in results.

In order to minimize experimental variations, we createda virtual poly(A)+ RNA expression array that idealizes thegene expression of sample BC2 by using the average ex-pression level of each clone over multiple hybridizations.By "expression level" we mean the normalized log (base2)ratio of signal intensities of Cy5 (experimental sample) toCy3 (reference) fluorescence. The idealized expressionprofile from the poly(A)+ RNA virtual array is used as ourunamplified "gold standard" for data analyses involvingBC2. The correlation coefficient between each individualpoly(A)+ RNA array and the gold standard ranges from0.95–0.97, similar to that observed between hybridiza-tions performed on the same day. The gold standard vir-tual array therefore represents well-measured geneexpression in the primary tumor and minimizes individu-al experimental variations.

Template-switching does not affect the fidelity of amplifi-cationAs previously mentioned, a protocol based on T7 basedlinear amplification published by Wang, Miller and cow-orkers [24] incorporated a TS mechanism [29] in the syn-thesis of second strand cDNA at the 5' end in order togenerate full-length ds cDNA. This was speculated to be ofadvantage in the hybridization of unmapped sequencesspotted on arrays and to enable higher temperature cDNAsynthesis that would enhance sequence specificity. How-ever, no experimental evidence was provided to supportthe idea that the TS mechanism increases the fidelity ofamplification.

To determine whether the addition of TS primer improvesthe fidelity of amplification, we compared gene expres-sion profiles of aRNA amplified in the presence or absenceof TS primer with expression profiles of unamplified po-ly(A)+ RNA. Total RNA isolated from primary breast carci-noma BC91 and reference total RNA were amplified withor without TS primer using the Wang-Miller protocol [24],except that aRNA was purified using an RNeasy® kit (Qia-gen®). A virtual "gold standard" poly(A)+ RNA array wascreated for BC91 using the average expression level of fourhybridization replicates of unamplified poly(A)+ RNA. A"virtual correlation coefficient" for a given amplificationprotocol was obtained by comparing the virtual amplifiedarray (averaged expression level for each clone from mul-tiple amplified samples) to the virtual gold standard un-

amplified array for BC91. To determine the correlationbetween individual amplified samples and the gold stand-ard, an "average correlation coefficient" for a given ampli-fication protocol was also calculated (the sum ofcorrelation coefficients of individual amplified sampleswith the gold standard divided by the number of ampli-fied samples tested for each condition).

As shown in Table 1, using aRNA amplified from totalRNA as reference, the expression profiles obtained in theabsence of TS primer correlated with the gold standardslightly better than in the presence of TS primer, althoughthe difference was not statistically significant. When po-ly(A)+ RNA rather than total RNA was amplified as refer-ence, the correlation with the gold standard was slightly,but not statistically significantly, better with TS primer(Table 1). The correlation coefficient using aRNA ampli-fied from total RNA as reference is higher than usingaRNA amplified from poly(A)+ RNA as reference regard-less of whether TS primer is used. This suggests that whentotal RNA from a tumor sample is amplified for microar-ray analysis, the reference RNA should also be amplifiedfrom total RNA.

The yield of aRNA amplified from BC91 total RNA usingdifferent protocols is shown in Table 2. Assuming 1% ofthe total RNA is poly(A)+ RNA, a 253 and 370-fold ampli-fication was observed in the presence and absence of TSprimer, respectively. The yield of aRNA amplified from to-tal RNA which are generated from cultured cell lines was2- to 3-fold higher than when the primary tumor totalRNA was amplified (data not shown).

These experiments demonstrate that the TS mechanismdoes not increase the fidelity of amplification, and there-fore can be eliminated from the protocol. The reasons forthe limited effect of the TS mechanism on the correlationcoefficients probably are: 1) the second strand cDNA syn-thesis primed by the TS primer probably represents asmall fraction, while the majority of the synthesis is self-primed or primed by small pieces of RNA generated byRNase H; and 2) adding a few base pairs to the ds cDNAprior to in vitro transcription does not change the aRNAsignificantly enough to affect the array hybridization.

DNA ligase activity is not required for amplificationAmplification protocols that do not include DNA ligase insecond strand cDNA synthesis generate the same lengthaRNA (ranging from 0.2 kb to 6 kb, data not shown) asgenerated from a widely used T7 based amplification pro-tocol developed by Affymetrix®[26] which uses E. coliDNA ligase. The correlation coefficient between amplifiedand unamplified sample and the yield of aRNA amplifiedwithout DNA ligase are high enough to suggest that DNAligase activity is not necessary for RNA amplification in

Page 4 of 15(page number not for citation purposes)

Page 5: Open Access Optimization and evaluation of T7 based RNA ...€¦ · (not employing a TS primer) and 9700 clone cDNA microarrays. They found that a differential expression ra-tio cut-off

BMC Genomics 2002, 3 http://www.biomedcentral.com/1471-2164/3/31

microarray analysis. For confirmation, we omitted DNAligase from the protocol developed by Affymetrix® andcompared the expression profiles of the resulting aRNA toaRNA obtained using the standard Affymetrix® protocolthat includes DNA ligase. The correlation coefficient be-tween amplified and unamplified samples is slightly high-er in the absence of ligase (Table 3), supporting ourprevious conclusion that ligase is not required for totalRNA amplification in cDNA microarray analysis. Howev-er, the yield of aRNA is higher when ligase is used, suggest-ing that ligase may play a role in improving the efficiencyof amplification.

Column cleanup of ds cDNA does not improve the fidelity of amplification, but decreases the yield of aRNAIn the Wang-Miller protocol [24], ds cDNA is purified us-ing a Bio-6 column (Bio-Rad). A drawback to this methodis that the cDNA is eluted with a large volume and needsto be concentrated into a much smaller volume by lyophi-lization prior to in vitro transcription. This is a time-con-suming step, especially when large numbers of samplesare processed. To eliminate the lyophilization step fromthe protocol, we used an alternative column–the Sepha-dex™ G-50 column–to filter out free nucleotides from theds cDNA after completion of the second strand synthesis

reaction. The ds cDNA is then precipitated following phe-nol-chloroform extraction and re-suspended in propervolume for in vitro transcription. The correlation coeffi-cient between amplified and unamplified samples and theyield of aRNA using this less time-consuming modifica-tion are similar to that using the Wang-Miller protocol(Tables 1 and 2).

We further explored the question of what effects the dscDNA column cleanup step itself had on amplification.We amplified total RNA from tumor BC2, either with orwithout the cleanup step of Sephadex™ G-50. Seven am-plifications were done on different dates with the Sepha-dex™ G-50 column and five amplifications were donewithout this cleanup step. Both the virtual and the averagecorrelation coefficients using the column are slightly low-er than without it (Table 4), suggesting that the columncleanup does not improve the fidelity of amplification.Moreover, the yield of aRNA is significantly higher with-out the column purification of ds cDNA, suggesting someloss of ds cDNA on the column. Since the column had anegative effect on amplification by decreasing the yield ofaRNA without improving the fidelity of amplification, weeliminated this step from our protocol.

Table 3: Effect of DNA ligase on the fidelity of amplification.a,b

Protocol Affymetrix Affymetrix w/o ligase

Number of amplifications 6 5Correlation coefficient Virtual 0.84 0.86

Average 0.79 ± 0.04 0.82 ± 0.04Yield of aRNA (µg) 24.1 ± 4.7 19.2 ± 5.9Fold of amplification 803 ± 157 640 ± 197

aData were obtained from comparing expression level of 13,783 clones using the described selection criteria. bInput BC2 total RNA is 3 µg.

Table 4: Effect of column cleanup on the fidelity and yield of amplification.a,b

Protocol G-50 cleanup w/o G-50 cleanup

Number of amplifications 7 5Correlation coefficient Virtual 0.83 0.85

Average 0.79 ± 0.03 0.81 ± 0.02Yield of aRNA (µg) 11.9 ± 2.8 15.9 ± 2.7Fold of amplification 397 ± 93 530 ± 90

aData were obtained from comparing expression level of 12,305 clones using the described selection criteria. bInput BC2 total RNA is 3 µg and amplification was done without TS primer.

Page 5 of 15(page number not for citation purposes)

Page 6: Open Access Optimization and evaluation of T7 based RNA ...€¦ · (not employing a TS primer) and 9700 clone cDNA microarrays. They found that a differential expression ra-tio cut-off

BMC Genomics 2002, 3 http://www.biomedcentral.com/1471-2164/3/31

Effect of in vitro transcription time on the fidelity of RNA amplificationTo determine the effect of in vitro transcription time onamplification, duplicate reactions were performed at 37°Cfor 2, 3, 4, 5 and 6 hours. Two additional 5-hour incuba-tion reactions were stored at 4°C overnight to determinethe effect of low temperature incubation on amplification.The virtual correlation coefficient is slightly higher for the5-hour incubation at 37°C (Figure 1A). However, in vitrotranscription for 5 hours at 37°C plus overnight incuba-tion at 4°C gives the highest yield of aRNA (Figure 1B).Since the yield of aRNA at any time point is sufficient formultiple hybridizations, we decided to use 5-hour incuba-tion at 37°C for all subsequent amplifications.

Evaluation of the fidelity of T7 based linear amplification protocolsTo systematically evaluate the fidelity of T7 based ampli-fication, we compared the correlation coefficient obtainedfrom four different protocols. The correlation coefficientsof individual samples amplified using different protocolswith the gold standard range from 0.74–0.86 (Figure 2).The scatter plots comparing the gene expression of the vir-tual amplified samples for each protocol with the unam-plified gold standard are shown in Figure 3, with thevirtual correlation coefficients ranging from 0.83–0.86.The differences in correlations obtained using differentprotocols are not statistically different by Student's t-test,demonstrating that differences in gene expression forsamples amplified using different protocols are minor.

Our results also suggest that the level of bias introducedinto gene expression profiling by amplification is relative-ly low; expression profiles obtained using aRNA provide aclose approximation of the true expression profile of theoriginal sample. To assess the biases of amplificationquantitatively, we calculated the number and percentageof genes with expression level change by 4- or 2-fold afteramplification. The biases of amplification by differentprotocols are similar. Specifically, less than 0.2% of11,123 genes (12–15 genes) changed their expression lev-el by 4-fold or greater and less than 6% (306–594 genes)changed expression by 2-fold or greater after amplifica-tion. With the Jeffrey lab protocol, less than 4% of genesshowed changes in expression level by 2-fold or greater.Of the genes that changed, 7 genes and 139 genes changedtheir expression in all four protocols in the same directiongreater than 4-fold and 2-fold, respectively. Also, the vir-tual correlation coefficients between different protocolsare high (average 0.95) (Figure 3), suggesting that slightdifferences in protocols based on T7 linear amplificationmechanism do not affect the correlation of amplifiedsamples to unamplified samples. These results suggest theconclusion that aRNA provides a close approximation ofthe true expression profile of the original sample.

We present here a components of variance model for ex-plaining the different sources of variation in the amplifi-cation protocols (see Statistical Appendix, additional file1). The expression measurement for a gene for a specificarray/protocol/sample can be broken down as

X = Z + e, where

X is the measured expression value

Z is the "true" expression that does not change under rep-lication, and

e is the measurement error that does change under repli-cation.

While we cannot estimate Z and e directly, we can esti-mate their variances from the data. For the four differentamplified protocols and the unamplified arrays, the rele-vant variances are estimated in Table 5. The variance of thetrue expression (Var Z) ranges from 0.623–0.661 for theamplified protocols and is 0.726 for the unamplified ar-rays. The variance of the measurement error (Var e) rangesfrom 0.055–0.102 for the amplified protocols and is0.059 for the unamplified arrays. The estimates of Var Zwere obtained by averaging the pairwise covariances ofthe replicates within each protocol, and the estimates ofVar e by using the within-protocol variance. (While thevariance of a collection of numbers measures how muchthey vary about their mean, the covariance of two sets ofnumbers measures how much they vary with respect toeach other. See Statistical Appendix for more details). Wenotice that Var Z for unamplified poly(A)+ RNA is largerthan all of the others, indicating a dampening effect ongene expression by amplification. Measurement error var-iance is lowest for the Jeffrey lab protocol.

The covariances between the different Zs for the differentmethods (estimated from the virtual arrays) are shown inTable 6. The off-diagonal elements of Table 6 are the cov-ariances; the diagonal elements are the variances of thevirtual arrays. We notice that covariances among the am-plified protocols, which range from 0.62 to 0.66, are high-er than their covariances with the unamplified arrays,which range between 0.57 and 0.61. Furthermore, the var-iances of the amplified protocols (0.63 to 0.68) are lowerthan that of the unamplifed (0.74).

This suggests the following further breakdown:

Z = Zc + Zs, where

Zc is a common expression component, with varianceabout 0.6, and

Page 6 of 15(page number not for citation purposes)

Page 7: Open Access Optimization and evaluation of T7 based RNA ...€¦ · (not employing a TS primer) and 9700 clone cDNA microarrays. They found that a differential expression ra-tio cut-off

BMC Genomics 2002, 3 http://www.biomedcentral.com/1471-2164/3/31

Figure 1Effects of in vitro transcription time on the fidelity of T7 based amplification and the yield of aRNA amplifiedfrom BC2 total RNA. Average correlation coefficients between amplified samples vs. unamplified poly(A)+ RNA at eachtime point are shown in A and average yields of aRNA from each time point in B.

Incubation time at 37ºC

Incubation time at 37ºC

A

B

0.73

0.74

0.75

0.76

0.77

0.78

0.79

0.80

0.81

0.82

2 hr 3 hr 4 hr 5 hr 6 hr 5 hr at 37ºC

and overnightat 4ºC

0

5

10

15

20

25

30

35

2 hr 3 hr 4 hr 5 hr 6 hr 5 hr at 37°C

and overnightat 4ºC

Page 7 of 15(page number not for citation purposes)

Page 8: Open Access Optimization and evaluation of T7 based RNA ...€¦ · (not employing a TS primer) and 9700 clone cDNA microarrays. They found that a differential expression ra-tio cut-off

BMC Genomics 2002, 3 http://www.biomedcentral.com/1471-2164/3/31

Zs is a specific expression component, with variance about0.04 for the amplified arrays and 0.14 for the unamplifiedarrays.

Therefore, the amplified expression values for genes on anarray are largely the same as on the unamplified array. Thecomponent of variation in which they differ appears to becommon for all amplified protocols, and shows a muchhigher variance in the unamplified arrays. The effect ofamplification can be summarized by saying that it has adampening effect on the true expression of some genes(decreased variance in gene expression – see Figure 4).

A recent report compared amplified expression profiles ofdifferent primary breast tumors using Affymetrix® arrays[30]. Unger et al. found that correlation coefficients be-tween gene expressions in different tumors revealed byaRNA ranged between 0.71–0.89. However, gene expres-sion was not measured using unamplified poly(A)+ RNA

from these different tumors, raising the question ofwhether the observed fairly high correlation between di-verse tumors was due to amplification bias. To answer thisquestion, we compared the correlation coefficients be-tween gene expression profiles of different tumors deter-mined either with unamplified poly(A)+ RNA or aRNAamplified using our protocol (Table 7). The correlationcoefficients between BC2 and BC91 measured using po-ly(A)+ RNA or aRNA are the same, 0.55. Moreover, thecorrelation between differences in gene expression be-tween amplified and unamplified samples for different tu-mors is weak (Figure 4), suggesting that genes that differthrough amplification depend on the sample rather thanon systematic changes from amplification. Our resultsdemonstrate that different primary breast tumors are notclosely related to each other in gene expression profiles,and amplification does not affect the correlation of geneexpression between different tumors. Amplification is

Figure 2Box graph of correlation coefficients of the gene expression levels for 11,123 clones, comparing individualamplified samples to the gold standard of BC2 (idealizing unamplified poly(A)+ RNA). Each closed circle repre-sents the correlation coefficient for each individual sample amplified with a particular protocol to the gold standard. The aver-age and virtual correlation coefficients of the replicate samples for each protocol are shown below the graph.

0.72

0.74

0.76

0.78

0.82

0.84

0.86

0.88

0.80

Virtual correlation

coefficient 0.85 0.86 0.83 0.85

Average correlation

coefficient0.79±0.04 0.83±0.04 0.79±0.03 0.82±0.03

Affymetrix Affymetrixw/o ligase

Jeffrey lab withG-50 cleanup

Jeffrey lab

Co

rre

latio

n c

oe

ffic

ien

t

Page 8 of 15(page number not for citation purposes)

Page 9: Open Access Optimization and evaluation of T7 based RNA ...€¦ · (not employing a TS primer) and 9700 clone cDNA microarrays. They found that a differential expression ra-tio cut-off

BMC Genomics 2002, 3 http://www.biomedcentral.com/1471-2164/3/31

Figure 3Scatterplot matrix using average expression ratios of multiple replicate amplifications for each protocol andthe gold standard. The X-axis and Y-axis show virtual gene expression level [normalized log(base2) fluorescence intensityratio of sample to reference averaged over multiple arrays] measured using aRNA amplified by different protocols or unampli-fied poly(A)+ RNA as labelled. The last column of plots shows each amplification protocol (Y-axis) vs. gold standard (X-axis).The correlation coefficient for each pair is listed in each plot. The orange and blue shaded regions indicate more than a two-fold difference between the virtual expression values for each protocol being compared.

Page 9 of 15(page number not for citation purposes)

Page 10: Open Access Optimization and evaluation of T7 based RNA ...€¦ · (not employing a TS primer) and 9700 clone cDNA microarrays. They found that a differential expression ra-tio cut-off

BMC Genomics 2002, 3 http://www.biomedcentral.com/1471-2164/3/31

therefore suitable for comparison of gene expression pro-files among large sample sets.

Evaluation of reproducibility of T7 based linear amplifica-tionAnother important aspect of RNA amplification is the de-gree of reproducibility. To evaluate this, we calculated thecorrelation coefficients between individual amplifiedsamples. The correlation coefficients between individualhybridizations done on the same day for poly(A)+ RNAaveraged 0.97 and for aRNA amplified on the same dayaveraged from 0.91–0.98 (Table 8). The correlation coef-ficients between individual hybridizations done on differ-ent days averaged 0.89 for poly(A)+ RNA and 0.85–0.90for aRNA amplified from total RNA on different days. Thereproducibility of our protocol is slightly better than thatusing other protocols. Notably, when samples are ampli-fied on the same day, the correlations are significantlyhigher than when samples are amplified on different days.In addition, samples amplified with protocols omittingligase activity give higher reproducibility regardless ofwhether they are amplified on the same day or not.

The effect of the amount of input total RNA on amplifica-tionTo determine the effect of the amount of input total RNAon single round amplification, we amplified differentamounts of total RNA, 3 µg, 1 µg, 300 ng, 100 ng, 30 ng

and 10 ng, using different amounts of T7 primer accord-ing to the quantity of input total RNA (Table 9). When theinput total RNA is lower than 300 ng, the yield of aRNAfor is lower than the standard quantity required for onehybridization (3 µg). At amounts greater or equal to 300ng, the correlation coefficients between amplified and un-amplified samples and among amplified samples remainabout the same. The fold of amplification increases withsmaller quantities of template RNA, but the absolute yieldof aRNA decreases. Therefore, within the range of 0.3–3µg total RNA, decreasing the input RNA does not affect thefidelity and reproducibility of amplification.

All original microarray data may be accessed at the RNAAmplification for Microarrays website [http://genome-www.stanford.edu/breast_cancer/amplification/].

ConclusionsIn conclusion, T7 based linear amplification generateshigh fidelity aRNA for gene expression profiling usinghigh-density cDNA microarrays. The average correlationcoefficient between amplified and unamplified samples is0.82 with less than 4% of genes showing changes in ex-pression level by 2-fold or greater using the optimized(Jeffrey lab) protocol. The correlation to unamplified po-ly(A)+ RNA increases to 0.85 when experimental variabil-ity is minimized by configuring multiple amplifiedsamples into a virtual array. Reproducibility between sam-

Table 5: Variance of true gene expression (Var Z) and measurement error (Var e) for each of the different amplified protocols and the unamplified arrays.

Affymetrix Affymetrix w/o ligase Jeffrey lab with G-50 cleanup

Jeffrey lab Poly(A)+ RNA

Var Z 0.661 0.658 0.623 0.661 0.726Var e 0.102 0.066 0.078 0.055 0.059

Table 6: Covariances between the "true" gene expression for the different amplification protocols, estimated from the virtual arrays. The diagonal of the table contains the variances for the techniques.

Affymetrix Affymetrix w/o ligase

Jeffrey lab with G-50 cleanup

Jeffrey lab Poly(A)+ RNA

Affymetrix 0.68 0.66 0.62 0.63 0.60Affymetrix w/o ligase 0.66 0.67 0.62 0.64 0.61

Jeffrey lab with G-50 cleanup 0.62 0.62 0.63 0.64 0.57Jeffrey lab 0.63 0.64 0.64 0.67 0.60

Poly(A)+ RNA 0.60 0.61 0.57 0.60 0.74

Page 10 of 15(page number not for citation purposes)

Page 11: Open Access Optimization and evaluation of T7 based RNA ...€¦ · (not employing a TS primer) and 9700 clone cDNA microarrays. They found that a differential expression ra-tio cut-off

BMC Genomics 2002, 3 http://www.biomedcentral.com/1471-2164/3/31

ples amplified with this technique is high, especiallywhen performed on the same day rather than weeks apart.Amplification produces a dampening effect on gene ex-pression variation.

MethodsTissue acquisitionTwo primary breast carcinomas, BC2 and BC91, in whichmore than 90% of the breast epithelial cells were cancer

cells, were chosen for experiments in this study. The spec-imens were frozen in either liquid nitrogen (BC2) or ondry ice (BC91) within 30 minutes following devasculari-zation and stored at -80°C. Frozen sections were cut fromprimary breast carcinoma specimens and stained with he-matoxylin and eosin to confirm tumor content.

Figure 4Scatterplot of the t-statistics (the numerical score underlying a t-test) comparing the differences in geneexpression between amplified and unamplified RNA for BC2 (X-axis) and BC91 (Y-axis). The tests were based on7 amplified total RNA and 5 unamplified poly(A)+ RNA samples for BC2, and 2 amplified total RNA and 4 unamplified poly(A)+RNA samples for BC91.

Page 11 of 15(page number not for citation purposes)

Page 12: Open Access Optimization and evaluation of T7 based RNA ...€¦ · (not employing a TS primer) and 9700 clone cDNA microarrays. They found that a differential expression ra-tio cut-off

BMC Genomics 2002, 3 http://www.biomedcentral.com/1471-2164/3/31

RNA preparationTotal RNA was isolated from primary tumor tissue usingTRIzol® solution (Invitrogen™) following homogeniza-tion using a PowerGen Model 125 (Fisher Scientific). Po-ly(A)+ RNA was isolated from total RNA with theFastTrack 2.0 kit (Invitrogen™). The concentration of total

RNA and poly(A)+ RNA was determined using a GeneSpecI spectrophotometer (Hitachi) and the integrity of totalRNA and poly(A)+ RNA was assessed using a 2100 Bioan-alyzer (Agilent).

Table 7: Correlation between expression levels of different tumors (BC2 and BC91) determined with both poly(A)+ RNA and aRNA for each tumor.a,b

Virtual correlation coefficient Number of arrays

BC2 poly(A)+ RNA vs. BC91 poly(A)+ RNA 0.55 5 vs. 4BC2 aRNA vs. BC91 aRNA 0.55 2 vs. 2BC2 aRNA vs. BC2 poly(A)+ RNA 0.84 2 vs. 5BC91 aRNA vs. BC91 poly(A)+ RNA 0.82 2 vs. 4BC2 aRNA vs. BC91 poly(A)+ RNA 0.52 2 vs. 4BC91 aRNA vs. BC2 poly(A)+ RNA 0.46 2 vs. 5

aData were obtained from comparing expression level of 11,929 clones using the described selection criteria. bData were obtained using the Jeffrey lab protocol with G-50 cleanup.

Table 8: Evaluation of the reproducibility of T7 based amplification.a,b

Protocol Affymetrixc Affymetrix w/o ligase

Jeffrey lab with G-50 Jeffrey lab poly(A)+ RNA

Average correlation coefficient

Same day 0.91 ± 0.04 (n = 3) 0.98 (n = 2) 0.95 ± 0.01 (n = 3) 0.97 (n = 2) 0.97 ± 0.01 (n = 3)

Different day

0.84 ± 0.05 (n = 3) 0.88 ± 0.05 (n = 3) 0.88 ± 0.03 (n = 4) 0.90 ± 0.03 (n = 3)

0.89 ± 0.03 (n = 2)

Overall 0.86 ± 0.05 (n = 6) 0.89 ± 0.05 (n = 5) 0.89 ± 0.03 (n = 7) 0.91 ± 0.03 (n = 5)

0.92 ± 0.05 (n = 5)

aData were obtained from comparing expression level of 11,123 clones using the described selection criteria. bAmplification was done using 3 µg BC2 total RNA. cNote that hybridizations were performed on Stanford printed cDNA microarrays, not Affymetrix arrays.

Table 9: The effect of the amount of template BC2 total RNA on the fidelity, reproducibility and yield of amplification.a

Input total RNA 3 µg 1 µg 300 ng 100 ng 30 ng 10 ng

T7 primer used (µg) 0.5 0.2 0.1 0.1 0.1 0.1Average correlation coefficient Amplified vs. gold standard

(fidelity)0.80 ± 0.04 (n = 4) 0.81 ± 0.05 (n = 3) 0.84 ± 0.05 (n = 2) ND ND ND

Amplified vs. amplified (reproducibility)

0.92 ± 0.04 (n = 4) 0.88 ± 0.04 (n = 3) 0.90 (n = 2) ND ND ND

Yield (µg) 15.2 ± 5.6 (n = 4) 7.4 ± 4.6 (n = 3) 3.1 ± 1.6 (n = 2) 0.60 ± 0.3 (n = 2)

0.33 ± 0.2 (n = 3)

0.11 ± 0.1 (n = 3)

Fold of amplification 507 ± 186 740 ± 460 1033 ± 530 600 ± 300 1100 ± 667

1100 ± 1000

aData were obtained from comparing expression levels of 13,164 clones using the described selection criteria. ND = not determined due to insuffi-cient yield of aRNA for microarray hybridization after single round of amplification.

Page 12 of 15(page number not for citation purposes)

Page 13: Open Access Optimization and evaluation of T7 based RNA ...€¦ · (not employing a TS primer) and 9700 clone cDNA microarrays. They found that a differential expression ra-tio cut-off

BMC Genomics 2002, 3 http://www.biomedcentral.com/1471-2164/3/31

RNA amplificationThe amplification of total RNA or poly(A)+ RNA was per-formed based on a previously described protocol [24]with our modifications.

For first strand cDNA synthesis, 3 µg (unless otherwisespecified) tumor total RNA, Universal Human Referencetotal RNA (Stratagene®), or 150 ng poly(A)+ RNA wasmixed with 1 µg Eberwine primer (Operon®) in RNase-free water to a total volume of 9 µl. The RNA/primer mix-ture was denatured at 70°C for 3 min and cooled on icefor 2 min, followed by adding: 4 µl of 5X first strand buff-er (Invitrogen™), 2 µl 0.1 M DTT, 1 µl RNasin® (40 U/µl,Promega™), 2 µl 10 mM dNTP, and 2 µl Superscript™ II(200 U/µl, Invitrogen™), and incubated at 42°C for 1.5hours.

Second strand cDNA synthesis was performed by mixingthe first strand synthesis reaction with 106 µl RNase-freewater, 15 µl 10X Advantage™ PCR buffer (Clontech), 3 µl10 mM dNTP mix, 3 µl Advantage™ cDNA polymerasemix (Clontech), and 1 µl RNase H (2 U/µl, Invitrogen™).The reaction was incubated at 37°C for 5 min to digestRNA, followed by 94°C for 2 min to activate the Advan-tage™ cDNA polymerase, 65°C for 1 min to prime and75°C for 30 min to extend the second strand cDNA. Thereaction was stopped by the addition of 7.5 µl 1 M NaOH/2 mM EDTA and incubated at 65°C for 10 min.

ds cDNA was extracted with an equal volume of phe-nol:chloroform: isoamyl alcohol (25:24:1), transferred toa Phase Lock Gel™ tube (Eppendorf) and centrifuged at16,000 g for 5 min. The ds cDNA (aqueous layer) wastransferred to a new tube and precipitated by adding 1 µllinear acrylamide (0.1 µg/µl), 70 µl 7.5 M NH4Ac and 1ml 200 proof ethanol, and centrifuged at 16,000 g for 20min at room temperature. The pellet was washed in 500µl 75% ethanol, centrifuged at 16,000 g for 5 min, airdried and resuspended in 16 µl RNase-free water.

In vitro transcription of ds cDNA was performed using aT7 MEGAscript™ kit (Ambion®). Four microliters of each75 mM NTP, 4 µl of 10X reaction buffer and 4 µl of T7polymerase mix was added to the 16 µl of ds cDNA. Thereaction was then carried out at 37°C for 5 hours. aRNAwas cleaned up using an RNeasy® mini kit (Qiagen®) asdescribed by the manufacturer.

The optimized version of the amplification protocol mayalso be downloaded from [http://www.stanford.edu/group/sjeffreylab/].

RNA labelling and hybridizationThree micrograms of aRNA (unless otherwise specified) or2 µg poly(A)+ RNA were labelled either with Cy5-dUTP

(experimental sample) or Cy3-dUTP (reference) in a 30.4µl reaction. RNA was first mixed with either 8 µg of ran-dom primer for aRNA or 5 µg of oligo(dT) primer for po-ly(A)+ RNA in 16 µl of RNase-free water. RNA/primer mixwas incubated at 70°C for 10 min and cooled on ice for 2min. The following reagents were added: 6 µl of 5X firststrand buffer, 3 µl 0.1 M DTT, 0.7 µl 50X dNTP (25 mMdATP, dCTP, dGTP and 10 mM dTTP), 3 µl 1 mM Cy3-dUTP or Cy5-dUTP and 1.7 µl Superscript™ II (200 U/µl).The labelling reaction was carried out at 42°C for 2 hourduring which 1 µl Superscript™ II was added to the reac-tion at the end of the first hour. The input RNA was hydro-lyzed by adding 15 µl 0.1 M NaOH/2 mM EDTA andincubated at 65°C for 8 min, followed by neutralizationwith 15 µl 0.1 M HCl. The Cy5 and Cy3 labelled probeswere combined and purified in a Microcon® YM-30 col-umn (Millipore) by washing three times with Tris-EDTAbuffer. 15 µg Human Cot-1 DNA was added to the probebefore the first wash. The purified probe was adjusted to atotal volume of 26 µl and mixed with 5.3 µl 20X saline-sodium citrate (SSC), 1 µl yeast tRNA (10 µg/µl), 2 µl po-ly(A) DNA (10 µg/µl), and 0.6 µl 10% sodium dodecylsulfate (SDS). The resulting 35 µl probe solution was de-natured at 95°C for 2 min and then incubated at 42°C for25 min. The probe was then hybridized to cDNA arrays at65°C for 14–18 hours. Depending on the print batch, thearrays contained from 42,772 to 43,915 spots (41,931–42,602 distinct clones representing 16,907–18,417named genes, 3946–4145 ESTs with known functions and19,369–21,384 ESTs with unknown functions), and weremanufactured as previously described [31–33]. Followinghybridization, the arrays were washed with 2X SSC with0.05% SDS once for 2 min at room temperature, 1X SSCfor 2 min at room temperature, 0.2X SSC three times for 1min at 45–50°C.

Imaging and data analysisThe arrays with hybridized probes were scanned using anAxon scanner. The scanned images were analyzed first us-ing GenePix® Pro 3.0 software (Axon Instruments) andspots of poor quality determined by visual inspectionwere also removed from further analysis. The resultingdata collected from each array was submitted to the Stan-ford Microarray Database (SMD, [http://genome-www5.stanford.edu/microarray/SMD]) [34]. A total of 97arrays were submitted (60 experiments done with BC2and 37 experiments performed with BC91). Only featureswith a signal intensity >50% above background in bothCy5 and Cy3 channels for all of the samples included in aparticular analysis were retrieved from SMD. Pearson'scorrelation coefficient was calculated using Microsoft® Ex-cel 2000. A components of variance model was used to ex-plain different sources of variation in the amplificationprotocols.

Page 13 of 15(page number not for citation purposes)

Page 14: Open Access Optimization and evaluation of T7 based RNA ...€¦ · (not employing a TS primer) and 9700 clone cDNA microarrays. They found that a differential expression ra-tio cut-off

BMC Genomics 2002, 3 http://www.biomedcentral.com/1471-2164/3/31

Authors' contributionsHZ participated in study design, isolated the RNA, per-formed all the amplification and hybridization experi-ments, participated in data analysis, and drafted themanuscript. TH performed the statistical analyses anddrafted the statistical portions of the manuscript. MW par-ticipated in study design and editing of the manuscript.ALBD participated in data analysis and editing of themanuscript. SSJ conceived of the study, guided its designand coordination, participated in data analysis, draftedportions of the manuscript, and supervised editing.

All authors read and approved the final manuscript.

AcknowledgementsWe are grateful to Drs. David Botstein and Patrick O. Brown for helpful discussions and Susan Overholser for her invaluable assistance in the prep-aration of this manuscript. This work was supported by NIH/NCI Grant U01 CA85129 and California Breast Cancer Research Program Grant 5JB-0126. M.L.W. is supported by a National Research Service award from the National Human Genome Research Institute and by funds from the Sclero-derma Research Foundation. S.S.J.'s website is Stefanie Jeffrey Lab [http://www.stanford.edu/group/sjeffreylab/].

References1. Perou CM, Jeffrey SS, van de Rijn M, Rees CA, Eisen MB, Ross DT,

Pergamenschikov A, Williams CF, Zhu SX, Lee JC, et al: Distinctivegene expression patterns in human mammary epithelialcells and breast cancers. Proc Natl Acad Sci USA 1999, 96:9212-9217

2. Golub TR, Slonim DK, Tamayo P, Huard C, Gaasenbeek M, MesirovJP, Coller H, Loh ML, Downing JR, Caligiuri MA, et al: Molecularclassification of cancer: class discovery and class predictionby gene expression monitoring. Science 1999, 286:531-537

3. Alizadeh AA, Eisen MB, Davis RE, Ma C, Lossos IS, Rosenwald A,Boldrick JC, Sabet H, Tran T, Yu X, et al: Distinct types of diffuselarge B-cell lymphoma identified by gene expression profil-ing. Nature 2000, 403:503-511

4. Perou CM, Sørlie T, Eisen MB, van de Rijn M, Jeffrey SS, Rees CA, Pol-lack JR, Ross DT, Johnsen H, Akslen LA, et al: Molecular portraitsof human breast tumours. Nature 2000, 406:747-752

5. Sørlie T, Perou CM, Tibshirani R, Aas T, Geisler S, Johnsen H, HastieT, Eisen MB, van de Rijn M, Jeffrey SS, et al: Gene expression pat-terns of breast carcinomas distinguish tumor subclasses withclinical implications. Proc Natl Acad Sci USA 2001, 98:10869-10874

6. Hedenfalk I, Duggan D, Chen Y, Radmacher M, Bittner M, Simon R,Meltzer P, Gusterson B, Esteller M, Kallioneiemi OP, et al: Gene ex-pression profiles in hereditary breast cancer. N Engl J Med2001, 344:539-548

7. Garber ME, Troyanskaya OG, Schluens K, Petersen S, Thaesler Z, Pa-cyna-Gengelbach M, van de Rijn M, Rosen GD, Perou CM, et al: Di-versity of gene expression in adenocarcinoma of the lung.Proc Natl Acad Sci USA 2001, 98:13784-13789

8. Hegde P, Qi R, Gaspard R, Abernathy K, Dharap S, Earlem-Hughes J,Gay C, Nwokekeh NU, Chen T, Saeed AI, et al: Identification of tu-

mor markers in models of human colorectal cancer using a19,200-element complementary DNA microarray. Cancer Res2001, 61:7792-7797

9. van't Veer LJ, Dai H, van de Vijver MJ, He YD, Hart AA, Mao M, Pe-terse HL, van der Kooy K, Marton MJ, Witteveen AT, et al: Gene ex-pression profiling predicts clinical outcome of breast cancer.Nature 2002, 415:530-536

10. Jeffrey SS, Fero MJ, Børresen-Dale A-L, Botstein D: Expression ar-ray technology: applications for the diagnosis and treatmentof breast cancer. Mol Interv 2002, 2:101-109

11. Debouck C, Metcalf B: The impact of genomics on drug discov-ery. Annu Rev Pharmacol Toxicol 2000, 40:193-207

12. Clarke PA, te Poele R, Wooster R, Workman P: Gene expressionmicroarray analysis in cancer biology, pharmacology, anddrug development: progress and potential. Biochem Pharmacol2001, 62:1311-1336

13. Assersohn L, Gangi L, Zhao Y, Dowsett M, Simon R, Powles TJ, LiuET: The feasibility of using fine needle aspiration from prima-ry breast cancers for cDNA microarray analyses. Clin CancerRes 2002, 8:794-801

14. Xiang CC, Kozhich OA, Chen M, Inman JM, Phan QP, Chen Y, Brown-stein MJ: Amine-modified random primers to label probes forDNA microarrays. Nat Biotechnol 2002, 20:738-742

15. Sgroi DC, Teng S, Robinson G, LeVangie R, Hudson JR Jr, ElkahlounAG: In vivo gene expression profile analysis of human breastcancer progression. Cancer Research 1999, 59:5656-5661

16. Luo L, Salunga RC, Guo H, Bittner A, Joy KC, Galindo JE, Xiao H, Rog-ers KE, Wan JS, Jackson MR, et al: Gene expression profiles of la-ser-captured adjacent neuronal subtypes. Nat Med 1999, 5:117-122

17. Ohyama H, Zhang X, Kohno Y, Alevizos I, Posner M, Wong DT, ToddR: Laser capture microdissection-generated target samplefor high-density oligonucleotide array hybridization. Biotech-niques 2000, 29:530-536

18. Luzzi V, Holtschlag V, Watson MA: Expression Profiling of ductalcarcinoma in situ by laser capture microdissection and high-density oligonucleotide arrays. Am J Pathology 2001, 158:2005-2010

19. Van Gelder RN, von Zastrow ME, Yool A, Dement WC, Barchas JD,Eberwine JH: Amplified RNA synthesized from limited quanti-ties of heterogeneous cDNA. Proc Natl Acad Sci U S A 1990,87:1663-1667

20. Eberwine J, Yeh H, Miyashiro K, Cao Y, Nair S, Finnell R, Zettel M,Coleman P: Analysis of gene expression in single live neurons.Proc Natl Acad Sci U S A 1992, 89:3010-3014

21. Phillips J, Eberwine JH: Antisense RNA amplification: a linearamplification method for analyzing the mRNA populationfrom single living cells. Methods 1996, 10:283-288

22. Wodicka L, Dong H, Mittmann M, Ho MH, Lockhart DJ: Genome-wide expression monitoring in Saccharomyces cerevisiae.Nat Biotechnol 1997, 15:1359-1367

23. Mahadevappa M, Warrington J: A high-density probe array sam-ple preparation method using 10- to 100-fold fewer cells. NatBiotechnol 1999, 17:1134-1136

24. Wang E, Miller LD, Ohnmacht GA, Liu ET, Marincola FM: High-fidel-ity mRNA amplification for gene profiling. Nat Biotechnol 2000,18:457-459

25. Pabón C, Modrusan Z, Ruvolo MV, Coleman IM, Daniel S, Yue H, Ar-nold LJ Jr, Reynolds MA: Optimized T7 amplification system formicroarray analysis. Biotechniques 2001, 31:874-879

26. Affymetrix GeneChip® Expression Analysis Technical Manu-al [http://www.affymetrix.com/Download/manuals/expression_manual.pdf]

27. Hu L, Wang J, Baggerly K, Wang H, Fuller GN, Hamilton SR,Coombes KR, Zhang W: Obtaining reliable information fromminute amounts of RNA using cDNA microarrays. BMC Ge-nomics 2002, 3:16

28. Sotiriou C, Powles TJ, Dowsett M, Jazaeri AA, Feldman AL, Asser-sohn L, Gadisetti C, Libutti SK, Liu ET: Gene expression profilesderived from fine needle aspiration correlate with responseto systemic chemotherapy in breast cancer. Breast Cancer Res2002, 4:R3

29. Matz M, Shagin D, Bogdanova E, Britanova O, Lukyanov S, DiatchenkoL, Chenchik A: Amplification of cDNA ends based on tem-plate-switching effect and step-out PCR. Nucleic Acids Res 1999,27:1558-1560

Additional material

Additional File 1Statistical AppendixClick here for file[http://www.biomedcentral.com/content/supplementary/1471-2164-3-31-S1.pdf]

Page 14 of 15(page number not for citation purposes)

Page 15: Open Access Optimization and evaluation of T7 based RNA ...€¦ · (not employing a TS primer) and 9700 clone cDNA microarrays. They found that a differential expression ra-tio cut-off

BMC Genomics 2002, 3 http://www.biomedcentral.com/1471-2164/3/31

30. Unger MA, Rishi M, Clemmer VB, Hartman JL, Keiper EA, GreshockJD, Chodosh LA, Liebman MN, Weber BL: Characterization of ad-jacent breast tumors using oligonucleotide microarrays.Breast Cancer Res 2001, 3:336-341

31. Schena M, Shalon D, Davis RW, Brown PO: Quantitative monitor-ing of gene expression patterns with a complementary DNAmicroarray. Science 1995, 270:467-70

32. Shalon D, Smith SJ, Brown PO: A DNA microarray system foranalyzing complex DNA samples using two-color fluores-cent probe hybridization. Genome Res 1996, 6:639-645

33. DeRisi JL, Iyer VR, Brown PO: Exploring the metabolic and ge-netic control of gene expression on a genomic scale. Science1997, 278:680-686

34. Sherlock G, Hernandez-Boussard T, Kasarskis A, Binkley G, MateseJC, Dwight SS, Kaloper M, Weng S, Jin H, Ball CA, et al: The Stan-ford Microarray Database. Nucleic Acids Res 2001, 29:152-155

Publish with BioMed Central and every scientist can read your work free of charge

"BioMedcentral will be the most significant development for disseminating the results of biomedical research in our lifetime."

Paul Nurse, Director-General, Imperial Cancer Research Fund

Publish with BMC and your research papers will be:

available free of charge to the entire biomedical community

peer reviewed and published immediately upon acceptance

cited in PubMed and archived on PubMed Central

yours - you keep the copyright

[email protected] your manuscript here:http://www.biomedcentral.com/manuscript/

BioMedcentral.com

Page 15 of 15(page number not for citation purposes)