13
SURVEY AND SUMMARY Biases in small RNA deep sequencing data Carsten A. Raabe 1, *, Thean-Hock Tang 2 , Juergen Brosius 1 and Timofey S. Rozhdestvensky 1, * 1 Institute of Experimental Pathology (ZMBE), University of Muenster, Von-Esmarch-Strasse 56, 48149 Muenster, Germany and 2 Advanced Medical and Dental Institute (AMDI), Universiti Sains Malaysia, 13200 Penang, Malaysia Received July 11, 2013; Revised September 20, 2013; Accepted October 8, 2013 ABSTRACT High-throughput RNA sequencing (RNA-seq) is considered a powerful tool for novel gene discovery and fine-tuned transcriptional profiling. The digital nature of RNA-seq is also believed to simplify meta-analysis and to reduce background noise associated with hybridization-based approaches. The development of multiplex sequencing enables efficient and economic parallel analysis of gene ex- pression. In addition, RNA-seq is of particular value when low RNA expression or modest changes between samples are monitored. However, recent data uncovered severe bias in the sequencing of small non-protein coding RNA (small RNA-seq or sRNA-seq), such that the expression levels of some RNAs appeared to be artificially enhanced and others diminished or even undetectable. The use of different adapters and barcodes during ligation as well as complex RNA structures and modifications drastically influence cDNA synthesis efficacies and exemplify sources of bias in deep sequencing. In addition, variable specific RNA G/C-content is associated with unequal polymerase chain reaction amplification efficiencies. Given the central importance of RNA-seq to molecular biology and personalized medicine, we review recent findings that challenge small non-protein coding RNA-seq data and suggest approaches and precau- tions to overcome or minimize bias. INTRODUCTION Based on the detection of open reading frames (ORFs), entire transcriptomes are differentiated into two main RNA classes. The first comprises all protein coding messenger RNAs (mRNAs), and various subclasses of non-protein coding RNAs constitute the second (1,2). Although non-protein coding RNAs do not encode proteins, they are key players in controlling diverse bio- chemical pathways. Functions carried out by non-protein coding RNAs might be exerted by the RNA itself or in complex with proteins (1–3). RNA size criteria to define small non-protein coding RNA (The size cutoff for defining small RNAs used to be about 500 nt to include small RNAs, such as SRP RNA (300 nt), 7SK RNA (332 nt), MRP RNA (287 nt) and small nuclear RNAs. Large or long RNAs were defined as being in the size range of mRNAs but devoid of apparent ORFs. The first publications on miRNAs correctly designated them as tiny RNAs to discriminate them from the larger small RNAs (4,6–8)) remain a matter of debate (4); however, in eukaryotes, size ranges of 10–500 nt are generally accepted (5). Small non-protein coding RNAs participate in the regulation of many cellular activities, including replica- tion, chromosome modification, transcription, splicing and translation, in various organisms (6,7). Expression of many non-protein coding RNAs is subject to tight spatiotemporal control (8). Therefore, in addition to non-protein coding RNA discovery, it is of major import- ance to validate their corresponding expression profiles. Of late, research on non-protein coding RNAs has focused on discovery and functional analysis of micro RNAs (miRNAs), a vast class of 22 nt-long non- protein coding RNAs, detected in multicellular and most unicellular eukaryotes (9–12). They are involved in the fine-tuned regulation of various biochemical pathways by modulating gene expression. The majority of miRNAs exert their function via base complementarities to mRNA targets. Depending on the degree of comple- mentarity, miRNAs complexed with proteins (RISC, RNA-induced silencing complexes) down-regulate trans- lation or trigger RNA degradation (7,13). In addition to these posttranscriptional mechanisms of action, novel nuclear functions for miRNAs are also suggested (14,15). Furthermore, the use of miRNAs as potential *To whom correspondence should be addressed. Tel: +49 251 8358607; Fax:+49 251 8352134; Email: [email protected] Correspondence may also be addressed to Carsten A. Raabe. Tel: +49 251 8358615; Fax: +49 251 8352134; Email: [email protected] Nucleic Acids Research, 2013, 1–13 doi:10.1093/nar/gkt1021 ß The Author(s) 2013. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. Nucleic Acids Research Advance Access published November 5, 2013

SURVEY AND SUMMARY Biases in small RNA deep sequencing data

  • Upload
    others

  • View
    2

  • Download
    0

Embed Size (px)

Citation preview

Page 1: SURVEY AND SUMMARY Biases in small RNA deep sequencing data

SURVEY AND SUMMARY

Biases in small RNA deep sequencing dataCarsten A Raabe1 Thean-Hock Tang2 Juergen Brosius1 and

Timofey S Rozhdestvensky1

1Institute of Experimental Pathology (ZMBE) University of Muenster Von-Esmarch-Strasse 56 48149 MuensterGermany and 2Advanced Medical and Dental Institute (AMDI) Universiti Sains Malaysia 13200 PenangMalaysia

Received July 11 2013 Revised September 20 2013 Accepted October 8 2013

ABSTRACT

High-throughput RNA sequencing (RNA-seq) isconsidered a powerful tool for novel gene discoveryand fine-tuned transcriptional profiling The digitalnature of RNA-seq is also believed to simplifymeta-analysis and to reduce background noiseassociated with hybridization-based approachesThe development of multiplex sequencing enablesefficient and economic parallel analysis of gene ex-pression In addition RNA-seq is of particular valuewhen low RNA expression or modest changesbetween samples are monitored However recentdata uncovered severe bias in the sequencing ofsmall non-protein coding RNA (small RNA-seq orsRNA-seq) such that the expression levels ofsome RNAs appeared to be artificially enhancedand others diminished or even undetectable Theuse of different adapters and barcodes duringligation as well as complex RNA structures andmodifications drastically influence cDNA synthesisefficacies and exemplify sources of bias in deepsequencing In addition variable specific RNAGC-content is associated with unequal polymerasechain reaction amplification efficiencies Given thecentral importance of RNA-seq to molecular biologyand personalized medicine we review recentfindings that challenge small non-protein codingRNA-seq data and suggest approaches and precau-tions to overcome or minimize bias

INTRODUCTION

Based on the detection of open reading frames (ORFs)entire transcriptomes are differentiated into two mainRNA classes The first comprises all protein codingmessenger RNAs (mRNAs) and various subclasses of

non-protein coding RNAs constitute the second (12)Although non-protein coding RNAs do not encodeproteins they are key players in controlling diverse bio-chemical pathways Functions carried out by non-proteincoding RNAs might be exerted by the RNA itself or incomplex with proteins (1ndash3) RNA size criteria to definesmall non-protein coding RNA (The size cutoff fordefining small RNAs used to be about 500 nt to includesmall RNAs such as SRP RNA (300 nt) 7SK RNA(332 nt) MRP RNA (287 nt) and small nuclear RNAsLarge or long RNAs were defined as being in the sizerange of mRNAs but devoid of apparent ORFs Thefirst publications on miRNAs correctly designated themas tiny RNAs to discriminate them from the larger smallRNAs (46ndash8)) remain a matter of debate (4) however ineukaryotes size ranges of 10ndash500 nt are generally accepted(5) Small non-protein coding RNAs participate in theregulation of many cellular activities including replica-tion chromosome modification transcription splicingand translation in various organisms (67) Expressionof many non-protein coding RNAs is subject to tightspatiotemporal control (8) Therefore in addition tonon-protein coding RNA discovery it is of major import-ance to validate their corresponding expression profilesOf late research on non-protein coding RNAs hasfocused on discovery and functional analysis of microRNAs (miRNAs) a vast class of 22 nt-long non-protein coding RNAs detected in multicellular and mostunicellular eukaryotes (9ndash12) They are involved in thefine-tuned regulation of various biochemical pathwaysby modulating gene expression The majority ofmiRNAs exert their function via base complementaritiesto mRNA targets Depending on the degree of comple-mentarity miRNAs complexed with proteins (RISCRNA-induced silencing complexes) down-regulate trans-lation or trigger RNA degradation (713) In addition tothese posttranscriptional mechanisms of action novelnuclear functions for miRNAs are also suggested(1415) Furthermore the use of miRNAs as potential

To whom correspondence should be addressed Tel +49 251 8358607 Fax +49 251 8352134 Email rozhdestuni-muensterdeCorrespondence may also be addressed to Carsten A Raabe Tel +49 251 8358615 Fax +49 251 8352134 Email raabecuni-muenstserde

Nucleic Acids Research 2013 1ndash13doi101093nargkt1021

The Author(s) 2013 Published by Oxford University PressThis is an Open Access article distributed under the terms of the Creative Commons Attribution License (httpcreativecommonsorglicensesby30) whichpermits unrestricted reuse distribution and reproduction in any medium provided the original work is properly cited

Nucleic Acids Research Advance Access published November 5 2013

biomarkers to monitor progression of disease or treatmenthas been reported (1617)The high capacity and comparably low cost of modern

deep-sequencing analysis suggest that high-throughputsmall non-protein coding RNA sequencing (smallRNAseq or sRNA-seq) might be a superior tool whenapplied to personalized medicine The commonly usedmicroarray-based analysis has to compensate fordifferences in hybridization efficacies in order to ensuretrue representation of relative expression levelsHybridization-based assays must also be sensitiveenough to permit isoform-specific miRNA detectionSuch requirements are sometimes hard to attain sinceindividual miRNAs often display very different meltingtemperatures (18) Furthermore the quantification oflow-abundance miRNAs by micro-array is oftencomplicated due to low signal to noise ratios RNA-seqis particularly suited to investigating transcripts of lowabundance (81719) Deep sequencing is thereforethought to ultimately widen the dynamic range of RNAquantification (20ndash23) and the digital nature of RNA-seqreduces noise classically associated with hybridization-based approaches In addition RNA-seq does notdepend on prior genome annotation and enablesmapping of RNAs at single nucleotide resolution(2324) Because probetarget-specific hybridizationkinetics require expensive normalization to compareresults across platforms and to rank RNA levels accordingto relative abundance RNA-seq is generally considered tosimplify RNA quantification (25) In general RNA-seq iswidely regarded as an appropriate tool to characterizeentire transcriptomes and it has proven to be of particularvalue in small non-protein coding RNA discovery (sRNA-seq methods) (81924ndash26) Recent systematic investiga-tions however revealed severe method-dependent distor-tions in for example miRNA quantification (2728)Based on identical starting material the analysesdemonstrated that alternative methods in cDNA construc-tion resulted in entirely different miRNA expression levelprofiles Surprisingly the choice of sequencing platformcontributed little to the differences reported Library rep-licates to test for reproducibility yielded comparableresults indicating that data distortion was likely causedby differences inherent to cDNA construction protocols(27) Artificial RNA test sets containing equal amounts of473 synthetic human miRNAs were quantified by digitalgene expression analysis Throughout the study a non-uniform distribution of the miRNAs was uncovered thecorresponding values of miRNA expression differed by upto four orders of magnitude between datasets (27) Untilrecently sources of bias that might distort sRNA-seq datawere not considered Enzymes involved in RNA end-modification are obvious candidates for causing bias inrelative expression levels (29) however the subsequentsteps of reverse transcription and polymerase chainreaction (PCR) amplification are also likely to displaytemplate preferences thereby favoring the amplificationof some RNAs over others In particular the respectiveRNA GC-content greatly influences rates of cDNA syn-thesis and is also responsible for template-specific prefer-ences in PCR amplification (30ndash32) Not surprisingly a

recent analysis demonstrated that RNA secondary struc-ture and RNAndashadapter cofolding drastically influencedRNA ligation efficacies (33)

Here we review recent data indicating various sourcesof bias in sRNA-seq inherent to the common multipleexperimental steps involved in library preparationExperimental strategies and methods to overcome orminimize such distortion in expression levels are suggestedwherever possible Application of sRNA-seq-relatedmethods in mRNA deep sequencing and personalizedmedicine are also discussed

RNA END-MODIFICATION

The analysis of full-length non-protein coding RNAs insequencing projects requires RNA end-modification orequivalent strategies to ensure identification of nativeRNA termini as a precondition for cDNA construction(29) Irrespective of the ensuing protocol RNA 30-endsare subjected to enzymatic modification to add eitherhomopolymer stretches or synthetic adapter oligonucleo-tides (Figure 1 Supplementary Data and SupplementaryTable S1) The template-independent addition of oligo(C)or (A) stretches is catalyzed by Escherichia coli poly(A)polymerase Alternative strategies of RNA end-modifica-tion rely on RNA ligation to introduce adapter oligo-nucleotides RNA ligation is also applied to modifyRNA 50-ends

RNA 30-tailing

Homopolymer addition catalyzed by E coli poly(A) poly-merase is a well-established approach (3435) The enzymeaccepts all nucleotide triphosphates (NTPs) as substratesand catalyzes the template-independent addition ofnucleotide monophophate (NMP) to RNA 30-terminiAlternative procedures that employ the correspondingyeast nuclear counterpart (Poly(U) polymerase PUP)are more of a theoretical interest (3536) Sources of thepotential biases influencing RNA-30-end tailing are dis-cussed below

Substrate preferences of E coli Poly(A) polymeraseIn vitro bacterial polyadenylation is known to reduce thebiological halftime of targeted RNAs (37) Sequenceanalyses of in vivo-modified RNAs strongly suggestthat E coli poly(A) polymerase accepts only ATP as asubstrate in vitro it is reported to utilize albeit withunequal affinity all NTPs as substrates for transfer (38)Different affinities of poly(A) polymerase toward specificNTPs influence reaction rates significantly (38) In generalUTP and GTP additions necessitate extended incubationtimes to reach product yields comparable to tailing withATP and CTP Therefore the addition of homopolymeric(A) or (C) stretches is the most effective and commonpractice in cDNA generation (39ndash42)

RNA secondary structure influences RNA 30-end tailingIn vitro data indicate significant reduction in theefficacy of RNA 30-tailing caused by terminal stem loopstructures (38) implying that RNA denaturation prior to

2 Nucleic Acids Research 2013

RNA-tailing is of major importance to avoid bias (39)In vitro experiments also demonstrated that addition ofthree to six unpaired nucleotides might be sufficient torender the reaction secondary structure-independent

(38) However approximation of the minimal extensionlength needed to ameliorate RNA structure effectsshould not be generalized because it is likely to be RNAstructure-specific

Figure 1 Illustration of the steps involved in cDNA construction including potential sources of bias (A) The starting pool of non-protein codingRNAs with different 50- and 30-end modifications schematically indicated by different line types Abbreviations for the various modificationsOH hydroxyl OPO3 20-30-cyclic phosphate ppp triphosphate p monophosphate cap cap and 20-OCH3 20-O-methyl (B) The left paneldepicts different enzymatic pre-treatments prior to RNA 50-end ligation to enrich for different RNA subtypes From left to right RNA withoutany pretreatment (50-adapter ligation) RNA pretreated with tobacco acid phosphatase (TAP) RNA pretreated with TerminatorTM 50-phosphate-dependent exonuclease and TAP (Terminator 50-exo TAP) RNA treated with T4 polynucleotide kinase (T4 PNK) RNA classes accessible foradapter ligation after the respective 50-end pretreatments are schematically represented below each pretreatment The right panel depicts subtypes ofRNA classes accessible for 30-end tailing (-oligo(A) or ndasholigo(C) tailing) and adapter ligation (C) Possible biases associated with RNA 50-(left) and30-(right) end-modifications and with the subsequent steps of cDNA construction

Nucleic Acids Research 2013 3

RNA primary structure influences RNA 30-end tailingRecently to examine the influence of primary structureon RNA tailing efficacy three in vitro-transcribed testRNAs were assayed in comparative oligo(A)- andoligo(C)-tailing reactions (39) Interestingly after RNAdenaturation all test RNAs irrespective of primaryRNA structure performed equally well in the C-tailingassays and proceeded up to 95 completion as judgedby shifts in mobility (calculated as percent input) (39) Toexamine the influences of the 30-terminal nucleotide onRNA A-tailing RNAs identical except for the last nucleo-tide were examined in oligo(A)-tailing assays and up to2-fold changes in the efficacy of RNA modification werereported (36) The analysis also revealed that uridine atthe RNA 30-ends was the least preferred nucleotideHowever reaction kinetics indicated that even an RNAmolecule with an unfavorable 30-terminal uridine wasalmost completely modified if the reactions proceededlonger (36) Therefore increased incubation periodsmight compensate for substrate-specific variation inRNA 30-tailing efficacies Hence the available dataindicate that primary structure-related influences are lesslikely to bias sRNA-seq data if the protocol relies onRNA 30-end tailing However recent observations sug-gested that the RNA 50-end phosphorylation state influ-ences the resulting 30-end homopolymer tail length ofotherwise identical test RNAs (39)

RNA 30-end modification impacts tailing efficienciesThe 20-O-methylation of RNA 30-ends is also reported toreduce the efficacy of RNA tailing reactions (36) Theefficacy of oligo(A) addition can drop by as much as80 and even after extended incubation periods certainRNAs remain virtually unmodified (Figure 1) Therefore20-O-methylation at RNA 30-ends renders RNAs lessaccessible to RNA tailing (36) and such RNAs wouldeventually appear underrepresented in sRNA-seqNotably most plant miRNAs and vertebrate piRNAsare reported to be substrates for 20-O-methylation(4344) The drop in tailing efficacy and the correspondingdistortion in expression level might be particularlyrelevant when such RNAs are subjected to RNA tailingprocedures

RNA ligation

Today RNA ligase-catalyzed reactions are most oftenutilized to modify RNA termini for cDNA constructionThe genome of bacteriophage T4 harbors two differentRNA ligases T4Rnl1 and T4Rnl2 T4Rnl1 compensatesfor host defense mechanisms by sealing breaks introducedinto the anticodon loop of host tRNALys during infection(4546) T4Rnl2 has been detected more recentlyAlthough phylogenetically related to various classes ofDNA and RNA ligases the biological function ofT4Rnl2 remains unknown (334748) Both RNA ligasesas well as derivatives of T4Rnl2 are frequently utilized incDNA library construction In particular the truncatedversion of RNA ligase 2 has been reported to minimizeside reactions (4950) Variations in the compositions of

substrates and reagents during ligation are likely to intro-duce biases

RNA ligation-the reaction pathwayRNA ligation leads to ATP-dependent 30-50 phospho-diester bond formation Two molecular features aredistinguished (51) donor molecules that provide50-monophosphorylated RNA termini (5253) and ac-ceptors that contain 30-hydroxyl functional groups(52ndash54) (Figure 2) Two reaction intermediates areisolated suggesting a three-step reaction mechanismDuring the initial step of ligation accumulation ofmonoadenylated RNA ligase is observed (5556) Theenzyme-catalyzed adenosyl-transfer to the RNA ligaserequires phosphoramide bond formation and isaccompanied by pyrophosphate release (Figure 2) In thesecond step the AMP moiety is transferred to a50-monophosphorylated RNA donor molecule to form a50-adenylated RNA intermediate (50-AppRNA) (5253)(Figure 2) The 50-50 anhydride linkage is required toactivate donor molecules for the subsequent sealingreaction Finally an acceptor RNA 30-end hydroxylgroup (30-OH) attacks the alpha phosphorus of the 50-AppRNA donor molecule leading to covalent strandsealing and AMP release (5253) (Figure 2) The factthat pre-adenylated donors perform efficiently in reactionsconducted without ATP supplement provides formalproof of the reaction pathway and also is of major import-ance to practical applications (4957) The 50-App-modified adapters are now the reagents of choice tominimize side reactions (294957)

Side products in ligation reactionsA side reaction common to RNA ligation is the intramo-lecular RNA circularization (Figures 1 and 2) (29) Thereaction pathway is identical to the intermolecular ligationdiscussed above and is generally preferred due to closeproximity of the RNA ends Thus reactions that permitintra- and intermolecular products are dominated byintramolecular circularization (5158) It is of major im-portance to minimize side products in RNA ligation tomaintain reaction yields Generally the application of50-pre-adenylated adapter molecules in RNA 30-end liga-tions is presumed to abolish RNA circularization (5960)However the reversal of donor adenylation as a source ofcircularization was recently reported (293961) (Figure 2)In reactions performed with pre-adenylated donor mol-ecules 50-monophosphorylated RNAs were circularizedeven in the absence of ATP (2939) Remarkably forsome test RNAs the circularization constituted themajor product (39) Therefore preferred circularizationmight render certain RNAs inaccessible to cDNA con-struction The yield of circularization also depends onthe type of RNA-ligase enzyme utilized T4Rnl1-catalyzedreactions display the highest level of circularization (29)The tendency of T4Rnl1 to increase the amount of circularproduct might be explained in part by its biologicalfunction in intramolecular sealing reactions A C-termin-ally truncated version of T4Rnl2 displays minimal adenyl-transfer activity while maintaining the ability to form30 50 phosphodiester bonds (52) Ligase-mediated

4 Nucleic Acids Research 2013

RNA circularization depends on AMP removal from thepre-adenylated 50-App-adapter and its subsequent transferto 50-monophosphorylated RNAs therefore the applica-tion of truncated T4Rnl2 (T4Rnl2tr) to minimize this sidereaction is highly suggested (4950) In addition theexchange of lysine with glutamine (K227Q) in T4Rnl2trfurther reduced adenyl-transfer competence but retainedactivity to catalyze phosphodiester bond formation (48)Therefore the application of T4Rnl2 derivatives is bene-ficial for decreasing bias in sRNA-seq analysis (29)

Alternatively phosphatase pretreatment to generateRNA 50-OH termini that are inert to ligase-catalyzedadenylation helps to minimize side reactions The proced-ure ultimately requires the application of polynucleotidekinase to generate monophosphorylated RNA endsHowever tRNA rRNA and mRNA processing inter-mediates are reported to contain 50-hydroxylated termini(2962) Polynucleotide kinase treatment as a necessaryprerequisite to RNA 50-adapter ligation leads to the inclu-sion of abundant processing intermediates in cDNAlibraries The alternative techniques to avoid RNA circu-larization might distort cDNA representation and aretherefore likely to reduce the complexity of sRNA-seqexperiments (2962)

30-end ligation depends on RNAndashadapter combinationRNA 30-end ligation using T4Rnl1 was recently reportedas a potential source of bias in deep sRNA sequencing(39) Comparisons of the product yields of ligation reac-tions were made using three test RNAs and variousadapter sequences Depending on the specific RNAndashadapter combination the products differed remarkably

Similarly Jayaprakash et al (63) reported variations inproduct yields depending on specific RNAndashadapter com-binations in Rnl2tr-catalyzed reactions Because the RNA30-end ligation experiments were conducted with 50 pre-adenylated adapters the results also indicate thatadenyl-transfer is less likely the cause of the observedbias (63) RNA 30-adapter ligation was enhanced whenadapter pools displaying nucleotide variation at the two50 terminal positions were used (63) Therefore increasedvariability at adapter 50-ends enhances RNA ligation yieldand in general helps to minimize distortion in sRNA-seq(333963) RNA 50-phosphorylation also influences theefficacy of adapter ligation to RNA 30-ends (39) Theligation efficacies of T4 Rnl1-mediated reactions werecompared for test RNAs featuring different phosphoryl-ation states at RNA 50-ends and different adaptersincluding those with 50-App-modification UnexpectedlyRNAs that performed well when monophosphorylateddisplayed weak acceptor characteristics whentriphosphorylated Therefore the efficacy of RNA 30-endmodification might also be a function of the RNA 50-phos-phorylation state (Figure 1)

Ligation of adapters to modified RNAsIn ligation assays T4Rnl1 and T4Rnl2tr performedalmost identically using 30-unmodified test RNA sub-strates However the efficacy of T4Rnl1-catalyzed reac-tions dropped remarkably when 20-O-methylated RNAswere used (36) In contrast T4Rnl2tr performed almostequally well irrespective of RNA modification (36) Theadditions of 25 (wv) PEG 8000 to the T4Rnl2tr-mediated reactions significantly increased reaction yields

Figure 2 Schematic representation of the three-step mechanism of the bacteriophage T4 RNA ligase (T4Rnl) reaction with the potential sideproducts Encircled Arabic numbers indicate the order of ligation steps Step 1 adenylation of the T4Rnl active site Step 2 donor 50-adenylation(side reaction circularization of 30-end-unprotected donor molecules) Step 3 phosphodiester bond formation between 30-hydroxylated (OH) acceptorand donor molecules (side reaction reverse adenylation of donor and circularization of acceptor molecules) PPi pyrophosphate p monophosphateAMP adenosine monophosphate (Ap) App 50-adenylated termini

Nucleic Acids Research 2013 5

(5064) Hence additives that increase the effective mo-lecular concentration have a major influence on productyields and meta-analyses of data generated by differentligases or even in different buffer compositions are error-proneAs an aside specific enrichment of 30-end 20-

O-methylated RNAs like the piwi-interacting RNAs(piRNAs) or plant miRNAs is achieved by periodatetreatment The reaction relies on vicinal hydroxyl groupsto oxidize sugar moieties and is accompanied by pentosering opening The resulting vicinal dialdehyde reacts withwater and forms a chemical equilibrium with thecorresponding dioxane derivative upon ring closure The20-O-methyl group-containing RNAs are insensitive tooxidation and remain accessible for adapter ligation ortailing (65ndash67)

Impact of RNA secondary structure and adapter-RNAcofolding on ligationRecent analysis indicates that RNA secondary structureimpacts sRNA-seq bias (29) During kinetic profiling ofRNAndashadapter ligation test miRNAs were trapped in non-reactive states Denaturing the remaining non-ligatedRNAs resulted in increased ligation yield indirectly sug-gesting the influence of RNA secondary structure (29) Tofurther analyze the impact of secondary structure on RNA30-end ligation a pool of substrate RNAs with 21randomized nucleotides at their 30-ends was ligated withindividual pre-adenylated DNA adapters and reactionyields were subsequently quantified by deep sequencing(33) To avoid bias caused by template-specific RNA50-end modification(s) all test RNAs contained identical50-termini (33) RNAs that harbored stems at their 30-endswere underrepresented in the sequencing data whereasRNAs containing a minimum of three unstructured nu-cleotides at the 30-ends were overrepresented The dataindicate that RNA ligase-mediated reactions in generalare more efficacious with RNA substrates that harborstructurally accessible 30-ends (33)In addition the cofolding of RNA and adapter mol-

ecules impacts ligation (33) Adapter sequence variationthat interfered with the RNA 30-stem formation andpermitted intermolecular base-pairing increased ligationyields (33) Interestingly this data contradicts prioranalysis emphasizing a predominant primary sequence-related influence on data distortion (3363) Althoughthe actual source of discrepancy remains to be analyzedprotocol-specific variation probably due to changes in de-naturation is likely to have caused the observed difference(see supplementary information) Irrespective of the actualreason the advice for minimizing this bias is identical andrelies on the application of adapter pools to increase thelikelihood of productive interactions (33) Thus inaddition to the RNA nucleotide composition RNAndashadapter cofolding and secondary structure-related influ-ences contribute to sRNA-seq bias

RNA 50-end ligationThe chemical properties of RNA 50-ends display structuralvariations Depending on the starting material and experi-mental design it may be necessary to alter the chemistry

of RNA 50-termini prior to cDNA generation Forexample neither 50-triphosphorylated RNAs in prokary-otes nor primary eukaryotic RNA polymerase III tran-scripts are accessible for direct 5-adapter ligationSimilarly RNA polymerase II-specific trimethylguanosinecap structures render RNA 50-ends inaccessible Thusenzymatic pretreatment is usually conducted to avoidbias in the starting material (Figure 1 SupplementaryData and Supplementary Table S1)

Specific biochemistry of RNA 50-termini and enrichmentof various RNA classesAlthough the various chemical properties of RNA 50-endsestablish limitations for unbiased representation they alsopermit specific enrichments (41) (Figure 1 SupplementaryData and Supplementary Table S1) RNAs that containmonophosphorylated 50-ends and 30-hydroxyl groups aredirectly accessible to cDNA generation proceduresConsequently unaltered RNA starting material leadsto its specific enrichment (Figure 1 SupplementaryTable S1) In contrast triphosphorylated or 50-cappedRNAs require TAP treatment to generate 50-monophosphorylated RNAs and to permit 50-end ligation

Biochemical alternatives to specifically cleave pyrophos-phate bonds in 50-triphophorylated RNAs were reportedrecently (68) whereby TAP-mediated hydrolysis issubstituted by polyphosphatase (69) or pyrophospho-hydrolase (7071) treatment In vitro analysis to investigatethe substrate specificity of E coli rppH pyrophospho-hydrolase indicated that pyrophosphate hydrolysisproceeds with identical efficacy irrespective of the RNA50-terminal nucleotide (71) Furthermore a subtle declinein the enzymatic activity in response to potentially inhibi-tory RNA 50-terminal stem loop structures was reported(71) These strategies are particularly appealing when bac-terial primary transcripts or nascent eukaryotic RNA poly-merase III transcripts are central to the investigation Inaddition to targeting triphophorylated RNA 50-ends poly-phosphatase also accepts diphophorylated RNA 50-terminito catalyze pyrophosphate bond cleavage It should beemphasized that 50-capped RNA polymerase II transcriptsare not amenable to polyphosphatase or pyrophospho-hydrolase treatment and therefore escape identification

The application of a TerminatorTM 50-phosphate-dependent exonuclease is frequently utilized to enrich for50-capped and 50-triphospharylated RNAs (417273)The inherent 50 30 exonuclease activity digests50-monophosphorylated RNAs and leaves other tran-scripts unaltered (Figure 1 Supplementary Data andSupplementary Table S1) Therefore different chemistriesof RNA 50-ends not only distort sRNA-seq but also allowfor specific enrichment protocols Subsequent analysispermits the distinction among RNA subsets such asprimary transcripts capped RNAs and other productsof RNA 50-end processing (417273)

Nucleotide preference of RNA 50 ligationApart from specific differences in RNA 50-ends the influ-ence of the adapter sequence itself on ligation efficacy iswell documented Biochemical data indicated that ligationreactions proceeded most effectively when an adenosine

6 Nucleic Acids Research 2013

residue occupied the 30-terminal position of adapter(acceptor) molecules (5174) Recently adapter pools dis-playing sequence identity at all except the two 30-terminalnucleotides were used to investigate sequence-dependentvariation in acceptor function Based on deep sequencingas the readout the relative ranking of miRNA frequenciesvaried as a consequence of the specific adapterndashRNA com-binations (63) Once more miRNA expression profiles arenot only a function of true RNA abundance but alsoreflect varying ligation efficacies (63) Interestingly theuse of adapter pools highly improved the correlation ofmiRNA levels between replicates (63)

Influence of RNA secondary structure on RNA 50-endligationTo investigate potential influences on 50-end modificationderived from RNA secondary structure ligation yieldswere compared among test sets of fully randomizedRNAs A remarkable underrepresentation of double-strand folds at 50-ligation sites indicated strong preferencesfor single-stranded RNA substrates in ligation reactionsconducted with T4Rnl1 (75)

Sugar moiety-dependent change in ligation efficacyDNA-based adapter oligonucleotides performwell in RNA30-end ligation reactions (especially when 50-adenylated)whereas RNA-based adapters are preferred for RNA50-end modification (76) Kinetic analysis of DNA-basedadapter molecules for RNA 50-ligation demonstratedslower reactions compared with their RNA counterparts(77) Recent analyses revealed more complex substrate-specific effects between test RNAs and different adaptersWhen DNA- and RNA-based adapters of identical se-quences were investigated in ligation assays no generalpreference with regard to the sugar moiety of the adaptermolecule was observed in RNA 50-end ligation (3963) Inconclusion to avoid bias adapter pools are recommendedto display sequence variation The chance of productiveligation might be enhanced using RNA DNA or mixedadapter oligonucleotides (3963)

Adapter barcoding influences ligationIn addition to nucleotide preferences of the reactingresidues upstream sequences also exert influence onligation yields Adapters harboring different Roche MIDtags placed 19 nucleotides upstream from the reacting30-ends performed entirely differently when assayed withidentical test RNAs (39) Depending on the combinationof in vitro transcribed non-protein coding RNAs andMID-modified adapters ligation product levels (calculatedas non-protein coding RNA input) ranged from 60ndash70to undetectable (39) Recent reports emphasized thatbarcoding for multiplex sequencing leads to significantbias in miRNA expression profiles (78ndash80) This bias wasdiminished when barcodes were introduced by PCRamplification of cDNA (78ndash81)

Quantitative considerations and barcodingIn general the discovery of novel RNAs and analyses oftranscripts that are expressed at very low levels requiresubstantially more cDNA reads than experiments

directed toward investigating differential gene expressionsHence the design of sRNA-seq experiments needs totake into account the rationale of the investigationContigs formed by continuously overlapping readsduring genomic mapping procedures represent thebio-computational equivalent of individual RNAentities Any selected read during cDNA mapping eithercontributes to pre-existing already populated RNAs ormight generate novel contigs The probability of eitherprocess is a function of RNA size as well as the corres-ponding RNA expression value Of course contigs repre-senting longer rather than shorter RNAs are more likelyto be populated by chance RNAs of higher relative ex-pression levels are generally more densely packed withcDNA reads than RNAs that display medium or low tran-scription levels Consequently if multiplexing is chosenfor economic reasons the necessary sequencing depthmust be approximated prior to sequence runs in order togenerate adequate sample sizes The two fixed parametersfor estimating admissible multiplexing are therefore (i) ex-perimental goal and (ii) lane capacity According toIllumina TruSeq Small RNA Sample Prep Kitguidance 5 million sRNA-seq reads are considered appro-priate for novel miRNA discovery within human-sizedgenomes For miRNA profiling usually about 25 millionreads are recommended These approximations permit re-calculation geared toward specific needs and requirements

Alternative cloning strategies to avoid 50-end modificationBecause the 30-terminal position of cDNA corresponds tothe RNA 50-ends Pak et al (82) used a technique to avoidmodification of RNA 50-termini and therefore to minimizethe need for enzymatic pretreatment The strategy intro-duces adapter sequences not to RNA 50-termini butrather to 30-ends of the corresponding cDNAs (82) Theadvantage of omitting enzymatic treatment of RNA50-ends might be offset by incomplete reverse transcriptionthat precludes full-length cDNA cloning or appropriateassessment of RNA 50-endsRecently ligation to cDNA 30-ends was investigated in

more detail (36) Perfect double-stranded DNA-RNA testhybrids and single-stranded DNAs were utilized to mimicpotential products of reverse transcription All T4 DNA-and RNA-ligases including T4Rnl2 truncated derivativeswere compared to delineate optimal conditions for cDNA30-end modification Surprisingly ligation reactionscarried out with pre-adenylated DNA donor moleculesstrongly indicated that double-stranded DNA-RNAchimera constituted the most reactive substrates forT4Rnl1-mediated ligation In addition supplements ofPEG 8000 strikingly influenced ligation final concentra-tions of 25 PEG 8000 increased reaction yields Identicalsingle-stranded test cDNAs exhibited only poor substratecharacteristics The A-form-like conformation of theDNA-RNA hybrid might have compensated for theprimary structure-related influences and permitted moresimilar ligation outcomes among substrates (36)However the analysis examined only one test substrateand therefore might not be amenable to drawing generalconclusions (36) A further interesting observation indi-cates that cDNA 50-terminal nucleotides influence cDNA

Nucleic Acids Research 2013 7

30-end ligation yields (39) As the cDNA 50-terminalsequence is a consequence of specific RNA 30-end modifi-cation differences between protocols exert different influ-ences on cDNA 30-end adapter ligation and in turn mightalso cause expression profile bias (39)The lsquoSmartTM Approachrsquo (83) offers another interesting

alternative to capture mature RNA 50-ends and to avoidRNA 50-end adapter ligation The technology is based onthe template switching capacity of Moloney murineleukemia virus (M-MLV) reverse transcriptase Oncefirst strand synthesis approaches the RNA 50-end theterminal transferase activity of M-MLV reverse transcript-ase adds oligo(C)-stretches to the growing cDNAOligonucleotides that 30-terminate with oligo(G) stretchesserve to extend templates and permit reverse transcriptionto proceed Thus M-MLV reverse transcriptase enablesthe generation of cDNAs with known terminal sequenceelements (83) and thereby eliminates the need to modifythe RNA 50-ends Recently the technology was adapted tothe specific needs of Ilumina RNA-seq analysis (84)

Adapterndashadapter ligation disturbs sRNA-seqA side effect common to sRNA-seq protocols that rely onligase-mediated sequential RNA end-modification is PCRamplification of empty adapterndashadapter ligation products(8185) Adapterndashdimer formation is a direct consequenceof the molecular adapter excess employed to ensure suffi-cient product yield and might dominate in reactions wherethe concentration of total RNA starting material is low(8186) Adapter molecules that are 50-pre-adenylatedremain in reaction mixtures after RNA 30-end modifica-tion and react with RNA 50-adapter oligonucleotideswhich subsequently favors dimer formation Thereforepurifying products away from excess adapters via gelelectrophoresis is recommended (78) However for prep-aration of tiny RNAs such as miRNAs separation of thesmall size difference between ligated product and adaptersis not trivial Alternative approaches to avoid size frac-tionation rely on application of complementary lsquodimer-eliminator-LNA-oligonucleotidesrsquo (8586) The interferingoligonucleotide is designed to overlap the adapterndashadapterligation junction and thereby distinguish between targetedside product and adapter ligated RNA (86) The comple-mentarity of the interfering oligonucleotides extends intothe RNA 30-adapter sequence thus blocking primerannealing during reverse transcription and as a resultexcludes adapter dimers from RNA sequencing (86)Vigneault et al reported a similar strategy to overcomeunintended dimer amplification (81) Pre-annealing of theRT primer after RNA 30-adapter ligation but prior toRNA 50-end modification efficiently decreased adapterdimer formation The excess of non-ligated free RNA30-adapters was captured by complementary reverse tran-scriptase primers and sequestered in less-reactive double-stranded structures (81) The latter approach permitseffective reduction of dimer amplification and is reportedto increase cDNA library complexity (81) ObviouslyRNA tailing strategies (see above) are superior becausethey completely circumvent the problem of adapter-dimer formation To avoid adapter tailing as an analogous

side reaction to dimer formation it is advisable to carryout 30-tailing prior to 50-adapter ligation

Depending on the experimental goal sRNA-seqrequires varying amounts of cDNA reads (see above)Therefore techniques to reduce the amount of readsthat represent empty adapter dimers are of central import-ance as they permit an increase in sequencing depthreducing costs and ultimately enabling novel smallnon-protein coding RNA gene discovery

PCR AMPLIFICATION BIAS

Apart from novel third-generation sequencingtechnologies that permit direct sequence analysis of un-amplified cDNA all commonly used protocols implementPCR amplification (87) In addition the introduction ofbarcode identifiers by PCR amplification may also avoidbias introduced by differences in RNAndashadapter-specificligation efficacies (78) Therefore multiplexing of differentsamples during PCR is widely regarded to permit meta-analysis (78) However even PCR amplification itself isknown to introduce expression bias into datasets(303288) In particular when many different templatesare amplified in parallel variations not only in templatelength but also in base composition might lead topreferred template-specific amplifications (30) Extremeexamples of allelic dropout due to single base-exchangewere reported for PCR-mediated genotyping (89) TheRNA GC-content is especially responsible for data dis-tortion Generally cDNAs of high GC-content are lessreadily amplified and thus remain underrepresented Forexample read distributions calculated over the entire 28SrRNA were inversely correlated with the correspondingregional GC-content (32) As argued earlier base com-positions of different small non-protein coding RNAs(miRNAs piRNAs snoRNAs scaRNAs etc) varywidely The parallel amplification of complex templatemixtures that vary in their respective GC-content is there-fore a challenging procedure and likely to influence thecomplexity of the resulting cDNA library (30) Changesin complexity do not only influence distortion of quanti-tative measurements but certain RNAs might also lsquodropoutrsquo completely and remain unidentified Extended periodsof denaturation are considered to effectively decrease thelikelihood of uneven amplification (90) PCR amplifica-tion does not only change the overall GC-content of theresulting cDNA pool but is also reported to introducelength bias (30) which might be an important factorwhen RNAs of different sizes are subjected to parallelcDNA preparations Shorter sequences of significantlylower GC-content are likely to be overrepresented

Recent analysis also emphasized the impact of PCRbuffer compositions on PCR amplification bias (30)Remarkably PCR-mediated bias was already observedin some cases within 15 rounds of PCR amplification(30) While the popular Phusion PCR system surprisinglyperformed the worst (30) data indicate that someoptimized PCR buffers are capable of significantlyreducing PCR-mediated distortion (3090)

8 Nucleic Acids Research 2013

sRNA-SEQ AND PERSONALIZED MEDICINE

In recent years increasing attention has focused on thepotential of miRNA transcriptional profiling to diagnosedisease or to monitor progression and success in treatment(161791) The decreasing costs of high-throughputsequence analysis favors sRNA-seq as a common toolfor medical diagnostics and personalized medicine Aftercell death the total RNA of circulating cancer cells isreleased into the blood stream (92) These extracellularRNAs (exRNAs) are packaged within vesicles such asexosomes where they are complexed with differentproteins for example miRNAs in the RISC complexVesicles protect exRNA from ribonucleases and preventdegradation Among the various exRNAs miRNAs aresome of the most stable molecules (93) Easy accessibilityof patient blood samples and the stability of miRNAsmake them effective biomarkers (94) Diagnostics basedon miRNA were successfully conducted on biologicalsamples of low quality including dried blood (95) Inaddition miRNAs remain stable even when samples aresubjected to repeated rounds of tissue freezing andthawing which usually degrades RNAs (96) A furtherappealing feature of miRNAs in medical research is thatchanges in their expression profiles are observed duringdisease onset (97) In addition miRNAs often displaytissue-specific expression Therefore it might be possibleto identify the cellular origin of most metastases (91)possibly providing decision guidance for appropriatetreatment However as elaborated in detail miRNA ex-pression profiles detected by deep sequencing do not onlyreflect changes in relative expression but are alsoinfluenced by miRNA-adapter-specific ligation efficaciesespecially in cost-effective parallel analyses that oftenrequire multiplexing Different barcode introduced byligation generate bias in miRNA representation and abun-dance (78) Even when incorporated by PCR barcodingmight still lead to data distortion (30ndash32) Third-gener-ation sequencing techniques such as the Helicos (InNovember 2012 Helicos filed bankruptcy (httpbizyahoocome121115hlcs8-khtml)) system avoid PCRamplification steps in cDNA preparation and permitdirect sequencing of unamplified samples (87) The RNA30-end modification to prime cDNA synthesis (or directsequencing) in addition to relatively high error frequencies(15) might still lead to bias and so would not beespecially applicable in personalized medicine (98)Collectively deep sequencing to monitor change inmedical samples is of importance and certainly ofinterest but requires resolution of serious standardizationissues prior to general application

qRT-PCR a valid alternative to deep sequencing

Alternative techniques to quantify specific miRNA testsets rely on quantitative real-time reverse transcriptionPCR (qRT-PCR) (99) In particular diagnostic screensin cancer medicine rely on the application of qRT-PCR(99ndash101) In cases where biomarker disease correlation iswell established and when quantification of limitednumbers of biomarkers is sufficient for diagnosis and tomonitor disease progression or treatment qRT-PCR is

a valuable tool (99) Recently the convenience ofqRT-PCR-based medical diagnosis was suggested tomake it superior in routine clinical applications (99)Interestingly novel developments in qRT-PCR allowsample processing in 384-well formats and liken thismethod to cost-effective high-throughput applications(99102) A further advantage of qRT-PCR is the reduc-tion in downstream bio-computation (99103) The timebetween qRT-PCR and diagnostic result is shorter thanfor sRNA-seq or microarray-based inference Howeverthe analysis of miRNA expression profiles by qRT-PCRis technically challenging due to RNA-inherent pitfalls(103104)The miRNAs lack structural features such as mRNA

poly(A) tails that could serve as class-specific enrichmentor priming of reverse transcription (103) Alternative tech-niques for miRNA sample enrichment would rely onimmune-precipitation but most of them are not suitablefor typical small size samples in medical analysis (105) Asmentioned above the heterogeneity of the RNA GC-content translates into a wide range of primer annealingtemperatures for a population of miRNAs and mighttherefore complicate parallel analysis of multiple RNAs(103) Furthermore the qRT-PCR template is presentnot only within the mature miRNA molecule but also ispart of various precursor transcripts (103) It is thereforeimportant to ensure that primary or precursor-miRNAs(pri- or pre-miRNAs) do not provide templates inqRT-PCR amplification and as a consequence do notcontribute to the signal intensities detected Difficultiesthat relate to the similarity displayed between differentmiRNA isoforms are mentioned above within thecontext of array-based inference to quantify differentialgene expressionThus far two alternative approaches are commonly

utilized to reverse transcribe miRNA templates(103106) miRNA-specific stem-loop primers are oftenused when only limited numbers of RNAs are subject toquantification A major advantage of using miRNA-specific primers for reverse transcription is an increase insensitivity and specificity However RNA 30-end-tailing-based approaches enable bulk reverse transcription ofmultiple targets in parallel and are therefore advantageouswhen many miRNAs are analyzed (see above) A recentcomparative analysis to investigate reverse transcriptionefficacies of the two methods uncovered only minor dif-ferences (106) The analysis involved only a limitedmiRNA test set and therefore the results cannot begeneralized As an aside the oligo d(T)-based designmight also enable reverse transcription of miRNA andthe corresponding mRNA target in single test tube reac-tions (106) Ligation-mediated PCR or padlock probes incombination with rolling circle amplification are technicalalternatives for avoiding reverse transcription (107108)The methods rely on ligation-mediated template gener-ation and are therefore not devoid of bias (see above)Both approaches permit generation of miRNA signaturesacross samples and do not require elaborate analyticalpost-processing or expensive technical equipment(107108) However as elaborated here and abovereverse transcription is not devoid of template-specific

Nucleic Acids Research 2013 9

bias In particular RNA secondary structure and RNAmodifications both influence reverse transcription efficacy(see above) Similar to PCRs reverse transcription is nega-tively affected by increasing RNA GC-contentBecause the relative RNA GC-content dictates primer

Tm qRT-PCRs are also strongly influenced by forwardprimer design thereby affecting both qRT-PCR specificityand sensitivity Altering the length of primer-templatecomplementarity permits adjustment of the actualprimer Tm Conversely in cases where RNA nucleotidecomposition would dictate very low melting temperaturesLNA-modified nucleotides might be incorporated into theprimer sequence to increase the corresponding primer Tm(18) However the design of LNA-containing oligonucleo-tides is not always trivial (109) For the actual measure-ment of miRNA in real time two alternative fluorescentreporter molecules are frequently utilized SYBR Greenand the Taqman assay (103) Detection of the fluores-cence signal emitted by SYBR Green increases about100-fold as the molecule intercalates in double-strandedDNA However the mechanism of DNA intercalationsuggests the dye is not target-specific and thereforecannot discriminate primer dimers unspecific productsand target amplification (103) Because the intercalatingdye is simply added to the assay it is often possible toeasily adapt the qRT-PCR assay to already establishedand validated analytical PCRs (104) In comparison theTaqman assay provides increased specificity as the de-tection is based on sequence-specific probendashtarget inter-action Certainly the qRT-PCR assays offer manyadvantages but require evaluation of various factors thatinfluence and potentially bias analysis Thereforeinternational standardization to ensure comparable ex-perimental design and to permit meta-analysis amongdatasets is of prime importance for establishing clinicalqRT-PCR-based personal medicine The MinimumInformation for Publication of Quantitative Real-TimePCR experiments guidelines established standards for bet-ter experimental practice and suggest routes to data ana-lysis that increase reliability and data comparison (110)

mRNA-SEQ

Deep sequencing is not limited to expression profiling ofmiRNA or other small non-protein coding RNAs Evenentire transcriptomes including all protein coding andlong non-protein coding RNAs are subjected to high-throughput sequencing methods Currently as the lengthof single sequence runs is limited (from 100 to 500 nt de-pending on the sequencing platform) longer RNAsrequire fragmentation prior to cDNA synthesis (25)Millions of small reads resulting from mRNA-seq areanalyzed to identify novel transcripts or splice variants(111112) Relative quantification of reads enables one tomonitor differential gene expression among samples andto identify changes in isoform distribution (111) As withsRNA-seq the digital nature of RNA deep sequencing isthought to simplify meta-analysis of mRNA or long non-protein coding RNA expression data (25) Some protocolsfor modification of RNA fragment ends rely on

chemistries identical to sRNA-seq (84113) Therefore ex-pression biases observed in sRNA-seq analysis are alsolikely to cause distortion in whole transcriptome profilingVariations in nucleotide composition and changes in RNAsecondary structure of different RNA fragments mightcorrelate with uneven read distribution Unequal readcoverage generates uncertainty in isoform detection andin extreme cases exons representing unfavorable nucleo-tide compositions might escape analysis As fragmentationis suggested to generate many different but at least par-tially overlapping reads bias in mRNA-seq might be lessprominent Recently a novel technique based on circligasewas introduced to avoid RNA 50-end ligation (114)Reverse transcription is conducted with anchoredoligo(dT) primers that contain an additional sequence topermit later PCR amplification Following first strandcDNA synthesis circligase-mediated ligation is carriedout The resulting circular molecules are subjected toPCR amplification The strategy is of general interestand might be a valuable tool for sRNA-seq as well

CONCLUSION

Bias in general describes systematic errors that reflectmethod-related distortion from the truth Bias might bedetected easily when data generated by different methodsare subjected to analytical comparison Because it is sys-tematic in nature bias is not subject to variation inrepeated experiments Here based on biochemical anddeep sequencing data analyses we have presentedvarious steps in sRNA-seq protocols likely to causesevere distortions in the relative expression levels of indi-vidual sRNAs

The efficacy of RNA end-modification to permit cDNAconstruction depends not only on differences in strategiesand reagents but also on changes in among others buffercompositions and additives that could increase molecularcrowding Therefore meta-analysis of data generated withnon-identical methods reagents or even buffers is likely tobe error-prone (36396378) To achieve comparableresults and to evaluate relative changes in the quantifica-tion of the resulting data experiments and analyses shouldbe conducted under strictly identical conditions It has tobe stressed that bias in RNA-specific variation is not onlyrestricted to inter-sample comparison but also effects therelative ranking of intra-sample RNA expression

In addition to the more obvious quantitative aspects ofbias there are also qualitative aspects because RNA-seqmethods are also employed for RNA discovery IndividualRNAs might lsquodrop outrsquo from detection because they arenot amenable to cDNA construction under chosen condi-tions Certainly to increase the likelihood of productiveRNA end-modification and increase the complexity ofsRNA-seq library content the parallel application of dif-ferent strategies and reagents is required (33396379) Inparticular increased variability of adapter sequences (ieadapter pools) helps to increase the diversity of RNAsaccessed (63) A recent report emphasized the need forparallel application of different RNA- and DNA-basedadapter oligonucleotides (39)

10 Nucleic Acids Research 2013

The source of expression level distortion and the failureto detect certain RNA species at all appear not to belimited to reactions of RNAcDNA-end modificationbut also apply to cDNA amplification by PCR (30ndash32)Differences in reaction buffers RNA GC-content sec-ondary structures RNA length and primers might eachlead to bias during PCR Therefore the analysis ofsRNA-seq data to examine relative changes in geneexpression or identification reflects both true RNA abun-dance and biases related to the methods applied

The sRNA-seq approach is also attractive to medicalresearch Deep sequencing might permit the screening ofmultiple patient samples to monitor the progression ofdisease or treatment Therefore it is urgently necessaryto establish rigid international standardizations tominimize distortion and to avoid misleading conclusionsduring RNA-seq data interpretation In summary quali-tative and quantitative RNomics remain more challengingthan anticipated

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online

FUNDING

National Genome Research Network [NGFNIII01GS0808 to JB and TSR] the Deutsche Forschungs-gemeinschaft [BR 7544-1 to JB] USM ResearchUniversity Grant [1001CIPPT81305 to THT]Funding for open access charge The licence was fundedby Dr Brosiusrsquos research budget

Conflict of interest statement None declared

REFERENCES

1 BrosiusJ (2009) The fragmented gene Ann N Y Acad Sci1178 186ndash193

2 BrosiusJ and TiedgeH (2004) RNomenclature RNA Biol 181ndash83

3 LiuQ and ParooZ (2010) Biochemical principles of small RNApathways Annu Rev Biochem 79 295ndash319

4 BrosiusJ (2012) In ErdmannVA and BarciszewskiJ (eds)From Nucleic Acids Sequences to Molecular Medicine RNATechnologies Springer Berlin Heidelberg pp 1ndash18

5 HuttenhoferA BrosiusJ and BachellerieJP (2002) RNomicsidentification and function of small non-messenger RNAs CurrOpin Chem Biol 6 835ndash843

6 AnS and SongJJ (2011) The coded functions of noncodingRNAs for gene regulation Mol Cells 31 491ndash496

7 BushatiN and CohenSM (2007) microRNA functions AnnuRev Cell Dev Biol 23 175ndash205

8 ChiangHR SchoenfeldLW RubyJG AuyeungVCSpiesN BaekD JohnstonWK RussC LuoS BabiarzJEet al (2010) Mammalian microRNAs experimental evaluation ofnovel and previously annotated genes Genes Dev 24 992ndash1009

9 AmbrosV (2001) microRNAs tiny regulators with greatpotential Cell 107 823ndash826

10 LauNC LimLP WeinsteinEG and BartelDP (2001) Anabundant class of tiny RNAs with probable regulatory roles inCaenorhabditis elegans Science 294 858ndash862

11 RuvkunG (2001) Molecular biology Glimpses of a tiny RNAworld Science 294 797ndash799

12 RuvkunGB (2003) The tiny RNA world Harvey Lect 991ndash21

13 GurtanAM and SharpPA (2013) The Role of miRNAs inRegulating Gene Expression Networks J Mol Biol 4253582ndash3600

14 LiangH ZhangJ ZenK ZhangCY and ChenX (2013)Nuclear microRNAs and their unconventional role in regulatingnon-coding RNAs Protein Cell 4 325ndash330

15 HuangV and LiLC (2012) miRNA goes nuclear RNA Biol 9269ndash273

16 WangZ HanJ CuiY FanK and ZhouX (2013) CirculatingmicroRNA-21 as noninvasive predictive biomarker for response incancer immunotherapy Med Hypotheses 81 41ndash43

17 ZhaoC ZhangJ ZhangS YuD ChenY LiuQ ShiMNiC and ZhuM (2013) Diagnostic and biological significance ofmicroRNA-192 in pancreatic ductal adenocarcinoma Oncol Rep30 276ndash284

18 van RooijE (2011) The art of microRNA research Circ Res108 219ndash234

19 LeeLW ZhangS EtheridgeA MaL MartinD GalasDand WangK (2010) Complexity of the microRNA repertoirerevealed by next-generation sequencing RNA 16 2170ndash2180

20 OkoniewskiMJ and MillerCJ (2006) Hybridization interactionsbetween probesets in short oligo microarrays lead to spuriouscorrelations BMC Bioinformatics 7 276

21 MortazaviA WilliamsBA McCueK SchaefferL andWoldB (2008) Mapping and quantifying mammaliantranscriptomes by RNA-Seq Nat Methods 5 621ndash628

22 LiuS LinL JiangP WangD and XingY (2011) Acomparison of RNA-Seq and high-density exon array fordetecting differential gene expression between closely relatedspecies Nucleic Acids Res 39 578ndash588

23 RaghavachariN BarbJ YangY LiuP WoodhouseKLevyD OrsquoDonnellC MunsonPJ and KatoG (2012) Asystematic comparison and evaluation of high density exon arraysand RNA-seq technology used to unravel the peripheral bloodtranscriptome of sickle cell disease BMC Med Genomics 5 28

24 MartinJA and WangZ (2011) Next-generation transcriptomeassembly Nat Rev Genet 12 671ndash682

25 WangZ GersteinM and SnyderM (2009) RNA-Seq arevolutionary tool for transcriptomics Nat Rev Genet 1057ndash63

26 KozomaraA and Griffiths-JonesS (2011) miRBase integratingmicroRNA annotation and deep-sequencing data Nucleic AcidsRes 39 D152ndashD157

27 LinsenSE de WitE JanssensG HeaterS ChapmanLParkinRK FritzB WymanSK de BruijnE VoestEE et al(2009) Limitations and possibilities of small RNA digital geneexpression profiling Nat Methods 6 474ndash476

28 BakerM (2010) MicroRNA profiling separating signal fromnoise Nat Methods 7 687ndash692

29 HafnerM RenwickN BrownM MihailovicA HolochDLinC PenaJT NusbaumJD MorozovP LudwigJ et al(2011) RNA-ligase-dependent biases in miRNA representationin deep-sequenced small RNA cDNA libraries RNA 171697ndash1712

30 DabneyJ and MeyerM (2012) Length and GC-biases duringsequencing library amplification a comparison of variouspolymerase-buffer systems with ancient and modern DNAsequencing libraries Biotechniques 52 87ndash94

31 OrpanaAK HoTH and StenmanJ (2012) Multiple heatpulses during PCR extension enabling amplification of GC-richsequences and reducing amplification bias Anal Chem 842081ndash2087

32 SendlerE JohnsonGD and KrawetzSA (2011) Local andglobal factors affecting RNA sequencing analysis Anal Biochem419 317ndash322

33 ZhuangF FuchsRT SunZ ZhengY and RobbGB (2012)Structural bias in T4 RNA ligase-mediated 30-adapter ligationNucleic Acids Res 40 e54

34 DeChiaraTM and BrosiusJ (1987) Neural BC1 RNA cDNAclones reveal nonrepetitive sequence content Proc Natl AcadSci USA 84 2624ndash2628

Nucleic Acids Research 2013 11

35 WickensM and KwakJE (2008) Molecular biology A tail talefor U Science 319 1344ndash1345

36 MunafoDB and RobbGB (2010) Optimization of enzymaticreaction conditions for generating representative pools of cDNAfrom small RNA RNA 16 2537ndash2552

37 SarkarN (1997) Polyadenylation of mRNA in prokaryotesAnnu Rev Biochem 66 173ndash197

38 Yehudai-ResheffS and SchusterG (2000) Characterization of theEcoli poly(A) polymerase nucleotide specificity RNA-bindingaffinities and RNA structure dependence Nucleic Acids Res 281139ndash1144

39 RaabeCA HoeCH RandauG BrosiusJ TangTH andRozhdestvenskyTS (2011) The rocks and shallows of deep RNAsequencing Examples in the Vibrio cholerae RNome RNA 171357ndash1366

40 RaabeCA SanchezCP RandauG RobeckT SkryabinBVChinniSV KubeM ReinhardtR NgGH ManickamRet al (2010) A global view of the nonprotein-codingtranscriptome in Plasmodium falciparum Nucleic Acids Res 38608ndash617

41 SharmaCM HoffmannS DarfeuilleF ReignierJ FindeissSSittkaA ChabasS ReicheK HackermullerJ ReinhardtRet al (2010) The primary transcriptome of the major humanpathogen Helicobacter pylori Nature 464 250ndash255

42 SharmaCM and VogelJ (2009) Experimental approaches forthe discovery and characterization of regulatory small RNACurr Opin Microbiol 12 536ndash546

43 YuB YangZ LiJ MinakhinaS YangM PadgettRWStewardR and ChenX (2005) Methylation as a crucial step inplant microRNA biogenesis Science 307 932ndash935

44 HouwingS KammingaLM BerezikovE CronemboldDGirardA van den ElstH FilippovDV BlaserH RazEMoensCB et al (2007) A role for Piwi and piRNAs in germcell maintenance and transposon silencing in Zebrafish Cell 12969ndash82

45 AckermannHW (2003) Bacteriophage observations andevolution Res Microbiol 154 245ndash251

46 RossmannMG MesyanzhinovVV ArisakaF andLeimanPG (2004) The bacteriophage T4 DNA injectionmachine Curr Opin Struct Biol 14 171ndash180

47 HoCK and ShumanS (2002) Bacteriophage T4 RNA ligase 2(gp241) exemplifies a family of RNA ligases found in allphylogenetic domains Proc Natl Acad Sci USA 9912709ndash12714

48 YinS HoCK and ShumanS (2003) Structure-function analysisof T4 RNA ligase 2 J Biol Chem 278 17601ndash17608

49 AravinA and TuschlT (2005) Identification and characterizationof small RNAs involved in RNA silencing FEBS Lett 5795830ndash5840

50 ViolletS FuchsRT MunafoDB ZhuangF and RobbGB(2011) T4 RNA ligase 2 truncated active site mutants improvedtools for RNA analysis BMC Biotechnol 11 72

51 UhlenbeckOC and GumportRI (1982) T4 RNA LigaseIn BoyerPD (ed) The Enzymes Nucleic Acids Part BAcademic Press Inc New York pp 31ndash58

52 HoCK WangLK LimaCD and ShumanS (2004) Structureand mechanism of RNA ligase Structure 12 327ndash339

53 El OmariK RenJ BirdLE BonaMK KlarmannGLeGriceSF and StammersDK (2006) Molecular architectureand ligand recognition determinants for T4 RNA ligase J BiolChem 281 1573ndash1579

54 StarkMR PleissJA DerasM ScaringeSA and RaderSD(2006) An RNA ligase-mediated method for the efficient creationof large synthetic RNAs RNA 12 2014ndash2019

55 CranstonJW SilberR MalathiVG and HurwitzJ (1974)Studies on ribonucleic acid ligase Characterization of anadenosine triphosphate-inorganic pyrophosphate exchangereaction and demonstration of an enzyme-adenylate complex withT4 bacteriophage-induced enzyme J Biol Chem 2497447ndash7456

56 HigginsNP GeballeAP SnopekTJ SuginoA andCozzarelliNR (1977) Bacteriophage T4 RNA ligase preparationof a physically homogeneous nuclease-free enzyme fromhyperproducing infected cells Nucleic Acids Res 4 3175ndash3186

57 PfefferS Lagos-QuintanaM and TuschlT (2005) Cloning ofsmall RNA molecules Curr Protoc Mol Biol Chapter 26 Unit26 24

58 BruceAG and UhlenbeckOC (1978) Reactions at the terminiof tRNA with T4 RNA ligase Nucleic Acids Res 5 3665ndash3677

59 VigneaultF SismourAM and ChurchGM (2008) EfficientmicroRNA capture and bar-coding via enzymatic oligonucleotideadenylation Nat Methods 5 777ndash779

60 LiuJM LivnyJ LawrenceMS KimballMD WaldorMKand CamilliA (2009) Experimental discovery of sRNAs in Vibriocholerae by direct cloning 5StRNA depletion and parallelsequencing Nucleic Acids Res 37 e46

61 KrugM and UhlenbeckOC (1982) Reversal of T4 RNA ligaseBiochemistry 21 1858ndash1864

62 HafnerM LandgrafP LudwigJ RiceA OjoT LinCHolochD LimC and TuschlT (2008) Identification ofmicroRNAs and other small regulatory RNAs using cDNAlibrary sequencing Methods 44 3ndash12

63 JayaprakashAD JabadoO BrownBD and SachidanandamR(2011) Identification and remediation of biases in the activity ofRNA ligases in small-RNA deep sequencing Nucleic Acids Res39 e141

64 YewCW and KumarSV (2012) Isolation and cloning ofmicroRNAs from recalcitrant plant tissues with small amounts oftotal RNA a step-by step approach Mol Biol Rep 391783ndash1790

65 KurthHM and MochizukiK (2009) 2rsquo-O-methylation stabilizesPiwi-associated small RNAs and ensures DNA elimination inTetrahymena RNA 15 675ndash685

66 EbhardtHA ThiEP WangMB and UnrauPJ (2005)Extensive 30 modification of plant small RNAs is modulated byhelper component-proteinase expression Proc Natl Acad SciUSA 102 13398ndash13403

67 WangJ CzechB CrunkA WallaceA MitrevaMHannonGJ and DavisRE (2011) Deep small RNA sequencingfrom the nematode Ascaris reveals conservation functionaldiversification and novel developmental profiles Genome Res 211462ndash1477

68 ZhuangF FuchsRT and RobbGB (2012) Small RNAexpression profiling by high-throughput sequencing implicationsof enzymatic manipulation J Nucleic Acids 2012 360358

69 FujimuraT and EstebanR (2012) Cap snatching of yeast L-Adouble-stranded RNA virus can operate in trans and requiresviral polymerase actively engaging in transcription J Biol Chem287 12797ndash12804

70 BessmanMJ WalshJD DunnCA SwaminathanJWeldonJE and ShenJ (2001) The gene ygdP associated withthe invasiveness of Escherichia coli K1 designates a Nudixhydrolase Orf176 active on adenosine (50)-pentaphospho-(50)-adenosine (Ap5A) J Biol Chem 276 37834ndash37838

71 DeanaA CelesnikH and BelascoJG (2008) The bacterialenzyme RppH triggers messenger RNA degradation by50 pyrophosphate removal Nature 451 355ndash358

72 BlewettN CollerJ and GoldstrohmA (2011) A quantitativeassay for measuring mRNA decapping by splinted ligation reversetranscription polymerase chain reaction qSL-RT-PCR RNA 17535ndash543

73 KrogerC DillonSC CameronAD PapenfortKSivasankaranSK HokampK ChaoY SittkaA HebrardMHandlerK et al (2012) The transcriptional landscape and smallRNAs of Salmonella enterica serovar Typhimurium Proc NatlAcad Sci USA 109 E1277ndashE1286

74 EnglandTE and UhlenbeckOC (1978) Enzymaticoligoribonucleotide synthesis with T4 RNA ligase Biochemistry17 2069ndash2076

75 SorefanK PaisH HallAE KozomaraA Griffiths-JonesSMoultonV and DalmayT (2012) Reducing ligation bias of smallRNAs in libraries for next generation sequencing Silence 3 4

76 ZhelkovskyAM and McReynoldsLA (2011) Simple andefficient synthesis of 50 pre-adenylated DNA using thermostableRNA ligase Nucleic Acids Res 39 e117

77 McCoyMI and GumportRI (1980) T4 ribonucleic acid ligasejoins single-strand oligo(deoxyribonucleotides) Biochemistry 19635ndash642

12 Nucleic Acids Research 2013

78 AlonS VigneaultF EminagaS ChristodoulouDCSeidmanJG ChurchGM and EisenbergE (2011) Barcodingbias in high-throughput multiplex sequencing of miRNA GenomeRes 21 1506ndash1511

79 SunG WuX WangJ LiH LiX GaoH RossiJ andYenY (2011) A bias-reducing strategy in profiling small RNAsusing Solexa RNA 17 2256ndash2262

80 Van NieuwerburghF SoetaertS PodshivalovaK Ay-LinWangE SchafferL DeforceD SalomonDR HeadSR andOrdoukhanianP (2011) Quantitative bias in Illumina TruSeq anda novel post amplification barcoding strategy for multiplexedDNA and small RNA deep sequencing PLoS One 6 e26969

81 VigneaultF Ter-OvanesyanD AlonS EminagaSC ChristodoulouD SeidmanJG EisenbergE and MChurchG (2012) High-throughput multiplex sequencingof miRNA Curr Protoc Hum Genet Chapter 11 Unit 1112 11-10

82 PakJ and FireA (2007) Distinct populations of primary andsecondary effectors during RNAi in C elegans Science 315241ndash244

83 ZhuYY MachlederEM ChenchikA LiR and SiebertPD(2001) Reverse transcriptase template switching a SMARTapproach for full-length cDNA library constructionBiotechniques 30 892ndash897

84 LevinJZ YassourM AdiconisX NusbaumCThompsonDA FriedmanN GnirkeA and RegevA (2010)Comprehensive comparative analysis of strand-specific RNAsequencing methods Nat Methods 7 709ndash715

85 AndoY BurroughsAM KawanoM de HoonMJ andHayashizakiY (2012) Targeted methods to improve small RNAprofiles generated by deep sequencing In MallickB andGhoshZ (eds) Regulatory RNAs Basics Methods andApplications Springer-Verlag Heidelberg Germany pp 253ndash271

86 KawanoM KawazuC LizioM KawajiH CarninciPSuzukiH and HayashizakiY (2010) Reduction of non-insertsequence reads by dimer eliminator LNA oligonucleotide forsmall RNA deep sequencing Biotechniques 49 751ndash755

87 KapranovP OzsolakF and MilosPM (2012) Profiling of shortRNAs using Helicos single-molecule sequencing Methods MolBiol 822 219ndash232

88 KanagawaT (2003) Bias and artifacts in multitemplatepolymerase chain reactions (PCR) J Biosci Bioeng 96 317ndash323

89 DayDJ SpeiserPW SchulzeE BettendorfM FitnessJBaranyF and WhitePC (1996) Identification of non-amplifyingCYP21 genes when using PCR-based diagnosis of 21-hydroxylasedeficiency in congenital adrenal hyperplasia (CAH) affectedpedigrees Hum Mol Genet 5 2039ndash2048

90 AirdD RossMG ChenWS DanielssonM FennellTRussC JaffeDB NusbaumC and GnirkeA (2011) Analyzingand minimizing PCR amplification bias in Illumina sequencinglibraries Genome Biol 12 R18

91 UllahMF and AatifM (2009) The footprints of cancerdevelopment cancer biomarkers Cancer Treat Rev 35 193ndash200

92 MoMH ChenL FuY WangW and FuSW (2012) Cell-freeCirculating miRNA Biomarkers in Cancer J Cancer 3 432ndash448

93 ZenK and ZhangCY (2012) Circulating microRNAs a novelclass of biomarkers to diagnose and monitor human cancersMed Res Rev 32 326ndash348

94 KimT and ReitmairA (2013) Non-Coding RNAs FunctionalAspects and Diagnostic Utility in Oncology Int J Mol Sci 144934ndash4968

95 PatnaikSK MallickR and YendamuriS (2010) Detection ofmicroRNAs in dried serum blots Anal Biochem 407 147ndash149

96 MitchellPS ParkinRK KrohEM FritzBR WymanSKPogosova-AgadjanyanEL PetersonA NoteboomJOrsquoBriantKC AllenA et al (2008) Circulating microRNAs as

stable blood-based markers for cancer detection Proc NatlAcad Sci USA 105 10513ndash10518

97 CieslaM SkrzypekK KozakowskaM LobodaAJozkowiczA and DulakJ (2011) MicroRNAs as biomarkers ofdisease onset Anal Bioanal Chem 401 2051ndash2061

98 OzsolakF (2012) Third-generation sequencing techniques andapplications to drug discovery Expert Opin Drug Discov 7231ndash243

99 TsongalisGJ CalinG CordelierP CroceC MonzonF andSzafranska-SchwarzbachAE (2013) MicroRNA analysis is itready for prime time Clin Chem 59 343ndash347

100 AmiraN Cancel-TassinG BernardiniS Cochand-PriolletBBittardH ManginP FournierG LatilA and CussenotO(2004) Expression in bladder transitional cell carcinoma by real-time quantitative reverse transcription polymerase chain reactionarray of 65 genes at the tumor suppressor locus 9q341-2identification of 5 candidates tumor suppressor genes Int JCancer 111 539ndash542

101 ZhangX (2007) Biomarker validation movement towardspersonalized medicine Expert Rev Mol Diagn 7 469ndash471

102 SchmittgenTD LeeEJ and JiangJ (2008) High-throughputreal-time PCR Methods Mol Biol 429 89ndash98

103 BenesV and CastoldiM (2010) Expression profiling ofmicroRNA using real-time quantitative PCR how to use it andwhat is available Methods 50 244ndash249

104 BustinSA and NolanT (2004) Pitfalls of quantitative real-timereverse-transcription polymerase chain reaction J Biomol Tech15 155ndash166

105 BeitzingerM and MeisterG (2011) Experimental identificationof microRNA targets by immunoprecipitation of Argonauteprotein complexes Methods Mol Biol 732 153ndash167

106 FiedlerSD CarlettiMZ and ChristensonLK (2010)Quantitative RT-PCR methods for mature microRNA expressionanalysis Methods Mol Biol 630 49ndash64

107 ArefianE KianiJ SoleimaniM ShariatiSA Aghaee-BakhtiariSH AtashiA GheisariY AhmadbeigiN Banaei-MoghaddamAM NaderiM et al (2011) Analysis ofmicroRNA signatures using size-coded ligation-mediated PCRNucleic Acids Res 39 e80

108 JonstrupSP KochJ and KjemsJ (2006) A microRNAdetection system based on padlock probes and rolling circleamplification RNA 12 1747ndash1752

109 LatorraD ArarK and HurleyJM (2003) Designconsiderations and effects of LNA in PCR primers Mol CellProbes 17 253ndash259

110 BustinSA BenesV GarsonJA HellemansJ HuggettJKubistaM MuellerR NolanT PfafflMW ShipleyGLet al (2009) The MIQE guidelines minimum information forpublication of quantitative real-time PCR experiments ClinChem 55 611ndash622

111 TrapnellC WilliamsBA PerteaG MortazaviA KwanGvan BarenMJ SalzbergSL WoldBJ and PachterL (2010)Transcript assembly and quantification by RNA-Seq revealsunannotated transcripts and isoform switching during celldifferentiation Nat Biotechnol 28 511ndash515

112 TrapnellC PachterL and SalzbergSL (2009) TopHatdiscovering splice junctions with RNA-Seq Bioinformatics 251105ndash1111

113 ListerR OrsquoMalleyRC Tonti-FilippiniJ GregoryBDBerryCC MillarAH and EckerJR (2008) Highly integratedsingle-base resolution maps of the epigenome in ArabidopsisCell 133 523ndash536

114 PolidorosAN PasentsisK and TsaftarisAS (2006) Rollingcircle amplification-RACE a method for simultaneous isolationof 50 and 30 cDNA ends from amplified cDNA templatesBiotechniques 41 35ndash42

Nucleic Acids Research 2013 13

Page 2: SURVEY AND SUMMARY Biases in small RNA deep sequencing data

biomarkers to monitor progression of disease or treatmenthas been reported (1617)The high capacity and comparably low cost of modern

deep-sequencing analysis suggest that high-throughputsmall non-protein coding RNA sequencing (smallRNAseq or sRNA-seq) might be a superior tool whenapplied to personalized medicine The commonly usedmicroarray-based analysis has to compensate fordifferences in hybridization efficacies in order to ensuretrue representation of relative expression levelsHybridization-based assays must also be sensitiveenough to permit isoform-specific miRNA detectionSuch requirements are sometimes hard to attain sinceindividual miRNAs often display very different meltingtemperatures (18) Furthermore the quantification oflow-abundance miRNAs by micro-array is oftencomplicated due to low signal to noise ratios RNA-seqis particularly suited to investigating transcripts of lowabundance (81719) Deep sequencing is thereforethought to ultimately widen the dynamic range of RNAquantification (20ndash23) and the digital nature of RNA-seqreduces noise classically associated with hybridization-based approaches In addition RNA-seq does notdepend on prior genome annotation and enablesmapping of RNAs at single nucleotide resolution(2324) Because probetarget-specific hybridizationkinetics require expensive normalization to compareresults across platforms and to rank RNA levels accordingto relative abundance RNA-seq is generally considered tosimplify RNA quantification (25) In general RNA-seq iswidely regarded as an appropriate tool to characterizeentire transcriptomes and it has proven to be of particularvalue in small non-protein coding RNA discovery (sRNA-seq methods) (81924ndash26) Recent systematic investiga-tions however revealed severe method-dependent distor-tions in for example miRNA quantification (2728)Based on identical starting material the analysesdemonstrated that alternative methods in cDNA construc-tion resulted in entirely different miRNA expression levelprofiles Surprisingly the choice of sequencing platformcontributed little to the differences reported Library rep-licates to test for reproducibility yielded comparableresults indicating that data distortion was likely causedby differences inherent to cDNA construction protocols(27) Artificial RNA test sets containing equal amounts of473 synthetic human miRNAs were quantified by digitalgene expression analysis Throughout the study a non-uniform distribution of the miRNAs was uncovered thecorresponding values of miRNA expression differed by upto four orders of magnitude between datasets (27) Untilrecently sources of bias that might distort sRNA-seq datawere not considered Enzymes involved in RNA end-modification are obvious candidates for causing bias inrelative expression levels (29) however the subsequentsteps of reverse transcription and polymerase chainreaction (PCR) amplification are also likely to displaytemplate preferences thereby favoring the amplificationof some RNAs over others In particular the respectiveRNA GC-content greatly influences rates of cDNA syn-thesis and is also responsible for template-specific prefer-ences in PCR amplification (30ndash32) Not surprisingly a

recent analysis demonstrated that RNA secondary struc-ture and RNAndashadapter cofolding drastically influencedRNA ligation efficacies (33)

Here we review recent data indicating various sourcesof bias in sRNA-seq inherent to the common multipleexperimental steps involved in library preparationExperimental strategies and methods to overcome orminimize such distortion in expression levels are suggestedwherever possible Application of sRNA-seq-relatedmethods in mRNA deep sequencing and personalizedmedicine are also discussed

RNA END-MODIFICATION

The analysis of full-length non-protein coding RNAs insequencing projects requires RNA end-modification orequivalent strategies to ensure identification of nativeRNA termini as a precondition for cDNA construction(29) Irrespective of the ensuing protocol RNA 30-endsare subjected to enzymatic modification to add eitherhomopolymer stretches or synthetic adapter oligonucleo-tides (Figure 1 Supplementary Data and SupplementaryTable S1) The template-independent addition of oligo(C)or (A) stretches is catalyzed by Escherichia coli poly(A)polymerase Alternative strategies of RNA end-modifica-tion rely on RNA ligation to introduce adapter oligo-nucleotides RNA ligation is also applied to modifyRNA 50-ends

RNA 30-tailing

Homopolymer addition catalyzed by E coli poly(A) poly-merase is a well-established approach (3435) The enzymeaccepts all nucleotide triphosphates (NTPs) as substratesand catalyzes the template-independent addition ofnucleotide monophophate (NMP) to RNA 30-terminiAlternative procedures that employ the correspondingyeast nuclear counterpart (Poly(U) polymerase PUP)are more of a theoretical interest (3536) Sources of thepotential biases influencing RNA-30-end tailing are dis-cussed below

Substrate preferences of E coli Poly(A) polymeraseIn vitro bacterial polyadenylation is known to reduce thebiological halftime of targeted RNAs (37) Sequenceanalyses of in vivo-modified RNAs strongly suggestthat E coli poly(A) polymerase accepts only ATP as asubstrate in vitro it is reported to utilize albeit withunequal affinity all NTPs as substrates for transfer (38)Different affinities of poly(A) polymerase toward specificNTPs influence reaction rates significantly (38) In generalUTP and GTP additions necessitate extended incubationtimes to reach product yields comparable to tailing withATP and CTP Therefore the addition of homopolymeric(A) or (C) stretches is the most effective and commonpractice in cDNA generation (39ndash42)

RNA secondary structure influences RNA 30-end tailingIn vitro data indicate significant reduction in theefficacy of RNA 30-tailing caused by terminal stem loopstructures (38) implying that RNA denaturation prior to

2 Nucleic Acids Research 2013

RNA-tailing is of major importance to avoid bias (39)In vitro experiments also demonstrated that addition ofthree to six unpaired nucleotides might be sufficient torender the reaction secondary structure-independent

(38) However approximation of the minimal extensionlength needed to ameliorate RNA structure effectsshould not be generalized because it is likely to be RNAstructure-specific

Figure 1 Illustration of the steps involved in cDNA construction including potential sources of bias (A) The starting pool of non-protein codingRNAs with different 50- and 30-end modifications schematically indicated by different line types Abbreviations for the various modificationsOH hydroxyl OPO3 20-30-cyclic phosphate ppp triphosphate p monophosphate cap cap and 20-OCH3 20-O-methyl (B) The left paneldepicts different enzymatic pre-treatments prior to RNA 50-end ligation to enrich for different RNA subtypes From left to right RNA withoutany pretreatment (50-adapter ligation) RNA pretreated with tobacco acid phosphatase (TAP) RNA pretreated with TerminatorTM 50-phosphate-dependent exonuclease and TAP (Terminator 50-exo TAP) RNA treated with T4 polynucleotide kinase (T4 PNK) RNA classes accessible foradapter ligation after the respective 50-end pretreatments are schematically represented below each pretreatment The right panel depicts subtypes ofRNA classes accessible for 30-end tailing (-oligo(A) or ndasholigo(C) tailing) and adapter ligation (C) Possible biases associated with RNA 50-(left) and30-(right) end-modifications and with the subsequent steps of cDNA construction

Nucleic Acids Research 2013 3

RNA primary structure influences RNA 30-end tailingRecently to examine the influence of primary structureon RNA tailing efficacy three in vitro-transcribed testRNAs were assayed in comparative oligo(A)- andoligo(C)-tailing reactions (39) Interestingly after RNAdenaturation all test RNAs irrespective of primaryRNA structure performed equally well in the C-tailingassays and proceeded up to 95 completion as judgedby shifts in mobility (calculated as percent input) (39) Toexamine the influences of the 30-terminal nucleotide onRNA A-tailing RNAs identical except for the last nucleo-tide were examined in oligo(A)-tailing assays and up to2-fold changes in the efficacy of RNA modification werereported (36) The analysis also revealed that uridine atthe RNA 30-ends was the least preferred nucleotideHowever reaction kinetics indicated that even an RNAmolecule with an unfavorable 30-terminal uridine wasalmost completely modified if the reactions proceededlonger (36) Therefore increased incubation periodsmight compensate for substrate-specific variation inRNA 30-tailing efficacies Hence the available dataindicate that primary structure-related influences are lesslikely to bias sRNA-seq data if the protocol relies onRNA 30-end tailing However recent observations sug-gested that the RNA 50-end phosphorylation state influ-ences the resulting 30-end homopolymer tail length ofotherwise identical test RNAs (39)

RNA 30-end modification impacts tailing efficienciesThe 20-O-methylation of RNA 30-ends is also reported toreduce the efficacy of RNA tailing reactions (36) Theefficacy of oligo(A) addition can drop by as much as80 and even after extended incubation periods certainRNAs remain virtually unmodified (Figure 1) Therefore20-O-methylation at RNA 30-ends renders RNAs lessaccessible to RNA tailing (36) and such RNAs wouldeventually appear underrepresented in sRNA-seqNotably most plant miRNAs and vertebrate piRNAsare reported to be substrates for 20-O-methylation(4344) The drop in tailing efficacy and the correspondingdistortion in expression level might be particularlyrelevant when such RNAs are subjected to RNA tailingprocedures

RNA ligation

Today RNA ligase-catalyzed reactions are most oftenutilized to modify RNA termini for cDNA constructionThe genome of bacteriophage T4 harbors two differentRNA ligases T4Rnl1 and T4Rnl2 T4Rnl1 compensatesfor host defense mechanisms by sealing breaks introducedinto the anticodon loop of host tRNALys during infection(4546) T4Rnl2 has been detected more recentlyAlthough phylogenetically related to various classes ofDNA and RNA ligases the biological function ofT4Rnl2 remains unknown (334748) Both RNA ligasesas well as derivatives of T4Rnl2 are frequently utilized incDNA library construction In particular the truncatedversion of RNA ligase 2 has been reported to minimizeside reactions (4950) Variations in the compositions of

substrates and reagents during ligation are likely to intro-duce biases

RNA ligation-the reaction pathwayRNA ligation leads to ATP-dependent 30-50 phospho-diester bond formation Two molecular features aredistinguished (51) donor molecules that provide50-monophosphorylated RNA termini (5253) and ac-ceptors that contain 30-hydroxyl functional groups(52ndash54) (Figure 2) Two reaction intermediates areisolated suggesting a three-step reaction mechanismDuring the initial step of ligation accumulation ofmonoadenylated RNA ligase is observed (5556) Theenzyme-catalyzed adenosyl-transfer to the RNA ligaserequires phosphoramide bond formation and isaccompanied by pyrophosphate release (Figure 2) In thesecond step the AMP moiety is transferred to a50-monophosphorylated RNA donor molecule to form a50-adenylated RNA intermediate (50-AppRNA) (5253)(Figure 2) The 50-50 anhydride linkage is required toactivate donor molecules for the subsequent sealingreaction Finally an acceptor RNA 30-end hydroxylgroup (30-OH) attacks the alpha phosphorus of the 50-AppRNA donor molecule leading to covalent strandsealing and AMP release (5253) (Figure 2) The factthat pre-adenylated donors perform efficiently in reactionsconducted without ATP supplement provides formalproof of the reaction pathway and also is of major import-ance to practical applications (4957) The 50-App-modified adapters are now the reagents of choice tominimize side reactions (294957)

Side products in ligation reactionsA side reaction common to RNA ligation is the intramo-lecular RNA circularization (Figures 1 and 2) (29) Thereaction pathway is identical to the intermolecular ligationdiscussed above and is generally preferred due to closeproximity of the RNA ends Thus reactions that permitintra- and intermolecular products are dominated byintramolecular circularization (5158) It is of major im-portance to minimize side products in RNA ligation tomaintain reaction yields Generally the application of50-pre-adenylated adapter molecules in RNA 30-end liga-tions is presumed to abolish RNA circularization (5960)However the reversal of donor adenylation as a source ofcircularization was recently reported (293961) (Figure 2)In reactions performed with pre-adenylated donor mol-ecules 50-monophosphorylated RNAs were circularizedeven in the absence of ATP (2939) Remarkably forsome test RNAs the circularization constituted themajor product (39) Therefore preferred circularizationmight render certain RNAs inaccessible to cDNA con-struction The yield of circularization also depends onthe type of RNA-ligase enzyme utilized T4Rnl1-catalyzedreactions display the highest level of circularization (29)The tendency of T4Rnl1 to increase the amount of circularproduct might be explained in part by its biologicalfunction in intramolecular sealing reactions A C-termin-ally truncated version of T4Rnl2 displays minimal adenyl-transfer activity while maintaining the ability to form30 50 phosphodiester bonds (52) Ligase-mediated

4 Nucleic Acids Research 2013

RNA circularization depends on AMP removal from thepre-adenylated 50-App-adapter and its subsequent transferto 50-monophosphorylated RNAs therefore the applica-tion of truncated T4Rnl2 (T4Rnl2tr) to minimize this sidereaction is highly suggested (4950) In addition theexchange of lysine with glutamine (K227Q) in T4Rnl2trfurther reduced adenyl-transfer competence but retainedactivity to catalyze phosphodiester bond formation (48)Therefore the application of T4Rnl2 derivatives is bene-ficial for decreasing bias in sRNA-seq analysis (29)

Alternatively phosphatase pretreatment to generateRNA 50-OH termini that are inert to ligase-catalyzedadenylation helps to minimize side reactions The proced-ure ultimately requires the application of polynucleotidekinase to generate monophosphorylated RNA endsHowever tRNA rRNA and mRNA processing inter-mediates are reported to contain 50-hydroxylated termini(2962) Polynucleotide kinase treatment as a necessaryprerequisite to RNA 50-adapter ligation leads to the inclu-sion of abundant processing intermediates in cDNAlibraries The alternative techniques to avoid RNA circu-larization might distort cDNA representation and aretherefore likely to reduce the complexity of sRNA-seqexperiments (2962)

30-end ligation depends on RNAndashadapter combinationRNA 30-end ligation using T4Rnl1 was recently reportedas a potential source of bias in deep sRNA sequencing(39) Comparisons of the product yields of ligation reac-tions were made using three test RNAs and variousadapter sequences Depending on the specific RNAndashadapter combination the products differed remarkably

Similarly Jayaprakash et al (63) reported variations inproduct yields depending on specific RNAndashadapter com-binations in Rnl2tr-catalyzed reactions Because the RNA30-end ligation experiments were conducted with 50 pre-adenylated adapters the results also indicate thatadenyl-transfer is less likely the cause of the observedbias (63) RNA 30-adapter ligation was enhanced whenadapter pools displaying nucleotide variation at the two50 terminal positions were used (63) Therefore increasedvariability at adapter 50-ends enhances RNA ligation yieldand in general helps to minimize distortion in sRNA-seq(333963) RNA 50-phosphorylation also influences theefficacy of adapter ligation to RNA 30-ends (39) Theligation efficacies of T4 Rnl1-mediated reactions werecompared for test RNAs featuring different phosphoryl-ation states at RNA 50-ends and different adaptersincluding those with 50-App-modification UnexpectedlyRNAs that performed well when monophosphorylateddisplayed weak acceptor characteristics whentriphosphorylated Therefore the efficacy of RNA 30-endmodification might also be a function of the RNA 50-phos-phorylation state (Figure 1)

Ligation of adapters to modified RNAsIn ligation assays T4Rnl1 and T4Rnl2tr performedalmost identically using 30-unmodified test RNA sub-strates However the efficacy of T4Rnl1-catalyzed reac-tions dropped remarkably when 20-O-methylated RNAswere used (36) In contrast T4Rnl2tr performed almostequally well irrespective of RNA modification (36) Theadditions of 25 (wv) PEG 8000 to the T4Rnl2tr-mediated reactions significantly increased reaction yields

Figure 2 Schematic representation of the three-step mechanism of the bacteriophage T4 RNA ligase (T4Rnl) reaction with the potential sideproducts Encircled Arabic numbers indicate the order of ligation steps Step 1 adenylation of the T4Rnl active site Step 2 donor 50-adenylation(side reaction circularization of 30-end-unprotected donor molecules) Step 3 phosphodiester bond formation between 30-hydroxylated (OH) acceptorand donor molecules (side reaction reverse adenylation of donor and circularization of acceptor molecules) PPi pyrophosphate p monophosphateAMP adenosine monophosphate (Ap) App 50-adenylated termini

Nucleic Acids Research 2013 5

(5064) Hence additives that increase the effective mo-lecular concentration have a major influence on productyields and meta-analyses of data generated by differentligases or even in different buffer compositions are error-proneAs an aside specific enrichment of 30-end 20-

O-methylated RNAs like the piwi-interacting RNAs(piRNAs) or plant miRNAs is achieved by periodatetreatment The reaction relies on vicinal hydroxyl groupsto oxidize sugar moieties and is accompanied by pentosering opening The resulting vicinal dialdehyde reacts withwater and forms a chemical equilibrium with thecorresponding dioxane derivative upon ring closure The20-O-methyl group-containing RNAs are insensitive tooxidation and remain accessible for adapter ligation ortailing (65ndash67)

Impact of RNA secondary structure and adapter-RNAcofolding on ligationRecent analysis indicates that RNA secondary structureimpacts sRNA-seq bias (29) During kinetic profiling ofRNAndashadapter ligation test miRNAs were trapped in non-reactive states Denaturing the remaining non-ligatedRNAs resulted in increased ligation yield indirectly sug-gesting the influence of RNA secondary structure (29) Tofurther analyze the impact of secondary structure on RNA30-end ligation a pool of substrate RNAs with 21randomized nucleotides at their 30-ends was ligated withindividual pre-adenylated DNA adapters and reactionyields were subsequently quantified by deep sequencing(33) To avoid bias caused by template-specific RNA50-end modification(s) all test RNAs contained identical50-termini (33) RNAs that harbored stems at their 30-endswere underrepresented in the sequencing data whereasRNAs containing a minimum of three unstructured nu-cleotides at the 30-ends were overrepresented The dataindicate that RNA ligase-mediated reactions in generalare more efficacious with RNA substrates that harborstructurally accessible 30-ends (33)In addition the cofolding of RNA and adapter mol-

ecules impacts ligation (33) Adapter sequence variationthat interfered with the RNA 30-stem formation andpermitted intermolecular base-pairing increased ligationyields (33) Interestingly this data contradicts prioranalysis emphasizing a predominant primary sequence-related influence on data distortion (3363) Althoughthe actual source of discrepancy remains to be analyzedprotocol-specific variation probably due to changes in de-naturation is likely to have caused the observed difference(see supplementary information) Irrespective of the actualreason the advice for minimizing this bias is identical andrelies on the application of adapter pools to increase thelikelihood of productive interactions (33) Thus inaddition to the RNA nucleotide composition RNAndashadapter cofolding and secondary structure-related influ-ences contribute to sRNA-seq bias

RNA 50-end ligationThe chemical properties of RNA 50-ends display structuralvariations Depending on the starting material and experi-mental design it may be necessary to alter the chemistry

of RNA 50-termini prior to cDNA generation Forexample neither 50-triphosphorylated RNAs in prokary-otes nor primary eukaryotic RNA polymerase III tran-scripts are accessible for direct 5-adapter ligationSimilarly RNA polymerase II-specific trimethylguanosinecap structures render RNA 50-ends inaccessible Thusenzymatic pretreatment is usually conducted to avoidbias in the starting material (Figure 1 SupplementaryData and Supplementary Table S1)

Specific biochemistry of RNA 50-termini and enrichmentof various RNA classesAlthough the various chemical properties of RNA 50-endsestablish limitations for unbiased representation they alsopermit specific enrichments (41) (Figure 1 SupplementaryData and Supplementary Table S1) RNAs that containmonophosphorylated 50-ends and 30-hydroxyl groups aredirectly accessible to cDNA generation proceduresConsequently unaltered RNA starting material leadsto its specific enrichment (Figure 1 SupplementaryTable S1) In contrast triphosphorylated or 50-cappedRNAs require TAP treatment to generate 50-monophosphorylated RNAs and to permit 50-end ligation

Biochemical alternatives to specifically cleave pyrophos-phate bonds in 50-triphophorylated RNAs were reportedrecently (68) whereby TAP-mediated hydrolysis issubstituted by polyphosphatase (69) or pyrophospho-hydrolase (7071) treatment In vitro analysis to investigatethe substrate specificity of E coli rppH pyrophospho-hydrolase indicated that pyrophosphate hydrolysisproceeds with identical efficacy irrespective of the RNA50-terminal nucleotide (71) Furthermore a subtle declinein the enzymatic activity in response to potentially inhibi-tory RNA 50-terminal stem loop structures was reported(71) These strategies are particularly appealing when bac-terial primary transcripts or nascent eukaryotic RNA poly-merase III transcripts are central to the investigation Inaddition to targeting triphophorylated RNA 50-ends poly-phosphatase also accepts diphophorylated RNA 50-terminito catalyze pyrophosphate bond cleavage It should beemphasized that 50-capped RNA polymerase II transcriptsare not amenable to polyphosphatase or pyrophospho-hydrolase treatment and therefore escape identification

The application of a TerminatorTM 50-phosphate-dependent exonuclease is frequently utilized to enrich for50-capped and 50-triphospharylated RNAs (417273)The inherent 50 30 exonuclease activity digests50-monophosphorylated RNAs and leaves other tran-scripts unaltered (Figure 1 Supplementary Data andSupplementary Table S1) Therefore different chemistriesof RNA 50-ends not only distort sRNA-seq but also allowfor specific enrichment protocols Subsequent analysispermits the distinction among RNA subsets such asprimary transcripts capped RNAs and other productsof RNA 50-end processing (417273)

Nucleotide preference of RNA 50 ligationApart from specific differences in RNA 50-ends the influ-ence of the adapter sequence itself on ligation efficacy iswell documented Biochemical data indicated that ligationreactions proceeded most effectively when an adenosine

6 Nucleic Acids Research 2013

residue occupied the 30-terminal position of adapter(acceptor) molecules (5174) Recently adapter pools dis-playing sequence identity at all except the two 30-terminalnucleotides were used to investigate sequence-dependentvariation in acceptor function Based on deep sequencingas the readout the relative ranking of miRNA frequenciesvaried as a consequence of the specific adapterndashRNA com-binations (63) Once more miRNA expression profiles arenot only a function of true RNA abundance but alsoreflect varying ligation efficacies (63) Interestingly theuse of adapter pools highly improved the correlation ofmiRNA levels between replicates (63)

Influence of RNA secondary structure on RNA 50-endligationTo investigate potential influences on 50-end modificationderived from RNA secondary structure ligation yieldswere compared among test sets of fully randomizedRNAs A remarkable underrepresentation of double-strand folds at 50-ligation sites indicated strong preferencesfor single-stranded RNA substrates in ligation reactionsconducted with T4Rnl1 (75)

Sugar moiety-dependent change in ligation efficacyDNA-based adapter oligonucleotides performwell in RNA30-end ligation reactions (especially when 50-adenylated)whereas RNA-based adapters are preferred for RNA50-end modification (76) Kinetic analysis of DNA-basedadapter molecules for RNA 50-ligation demonstratedslower reactions compared with their RNA counterparts(77) Recent analyses revealed more complex substrate-specific effects between test RNAs and different adaptersWhen DNA- and RNA-based adapters of identical se-quences were investigated in ligation assays no generalpreference with regard to the sugar moiety of the adaptermolecule was observed in RNA 50-end ligation (3963) Inconclusion to avoid bias adapter pools are recommendedto display sequence variation The chance of productiveligation might be enhanced using RNA DNA or mixedadapter oligonucleotides (3963)

Adapter barcoding influences ligationIn addition to nucleotide preferences of the reactingresidues upstream sequences also exert influence onligation yields Adapters harboring different Roche MIDtags placed 19 nucleotides upstream from the reacting30-ends performed entirely differently when assayed withidentical test RNAs (39) Depending on the combinationof in vitro transcribed non-protein coding RNAs andMID-modified adapters ligation product levels (calculatedas non-protein coding RNA input) ranged from 60ndash70to undetectable (39) Recent reports emphasized thatbarcoding for multiplex sequencing leads to significantbias in miRNA expression profiles (78ndash80) This bias wasdiminished when barcodes were introduced by PCRamplification of cDNA (78ndash81)

Quantitative considerations and barcodingIn general the discovery of novel RNAs and analyses oftranscripts that are expressed at very low levels requiresubstantially more cDNA reads than experiments

directed toward investigating differential gene expressionsHence the design of sRNA-seq experiments needs totake into account the rationale of the investigationContigs formed by continuously overlapping readsduring genomic mapping procedures represent thebio-computational equivalent of individual RNAentities Any selected read during cDNA mapping eithercontributes to pre-existing already populated RNAs ormight generate novel contigs The probability of eitherprocess is a function of RNA size as well as the corres-ponding RNA expression value Of course contigs repre-senting longer rather than shorter RNAs are more likelyto be populated by chance RNAs of higher relative ex-pression levels are generally more densely packed withcDNA reads than RNAs that display medium or low tran-scription levels Consequently if multiplexing is chosenfor economic reasons the necessary sequencing depthmust be approximated prior to sequence runs in order togenerate adequate sample sizes The two fixed parametersfor estimating admissible multiplexing are therefore (i) ex-perimental goal and (ii) lane capacity According toIllumina TruSeq Small RNA Sample Prep Kitguidance 5 million sRNA-seq reads are considered appro-priate for novel miRNA discovery within human-sizedgenomes For miRNA profiling usually about 25 millionreads are recommended These approximations permit re-calculation geared toward specific needs and requirements

Alternative cloning strategies to avoid 50-end modificationBecause the 30-terminal position of cDNA corresponds tothe RNA 50-ends Pak et al (82) used a technique to avoidmodification of RNA 50-termini and therefore to minimizethe need for enzymatic pretreatment The strategy intro-duces adapter sequences not to RNA 50-termini butrather to 30-ends of the corresponding cDNAs (82) Theadvantage of omitting enzymatic treatment of RNA50-ends might be offset by incomplete reverse transcriptionthat precludes full-length cDNA cloning or appropriateassessment of RNA 50-endsRecently ligation to cDNA 30-ends was investigated in

more detail (36) Perfect double-stranded DNA-RNA testhybrids and single-stranded DNAs were utilized to mimicpotential products of reverse transcription All T4 DNA-and RNA-ligases including T4Rnl2 truncated derivativeswere compared to delineate optimal conditions for cDNA30-end modification Surprisingly ligation reactionscarried out with pre-adenylated DNA donor moleculesstrongly indicated that double-stranded DNA-RNAchimera constituted the most reactive substrates forT4Rnl1-mediated ligation In addition supplements ofPEG 8000 strikingly influenced ligation final concentra-tions of 25 PEG 8000 increased reaction yields Identicalsingle-stranded test cDNAs exhibited only poor substratecharacteristics The A-form-like conformation of theDNA-RNA hybrid might have compensated for theprimary structure-related influences and permitted moresimilar ligation outcomes among substrates (36)However the analysis examined only one test substrateand therefore might not be amenable to drawing generalconclusions (36) A further interesting observation indi-cates that cDNA 50-terminal nucleotides influence cDNA

Nucleic Acids Research 2013 7

30-end ligation yields (39) As the cDNA 50-terminalsequence is a consequence of specific RNA 30-end modifi-cation differences between protocols exert different influ-ences on cDNA 30-end adapter ligation and in turn mightalso cause expression profile bias (39)The lsquoSmartTM Approachrsquo (83) offers another interesting

alternative to capture mature RNA 50-ends and to avoidRNA 50-end adapter ligation The technology is based onthe template switching capacity of Moloney murineleukemia virus (M-MLV) reverse transcriptase Oncefirst strand synthesis approaches the RNA 50-end theterminal transferase activity of M-MLV reverse transcript-ase adds oligo(C)-stretches to the growing cDNAOligonucleotides that 30-terminate with oligo(G) stretchesserve to extend templates and permit reverse transcriptionto proceed Thus M-MLV reverse transcriptase enablesthe generation of cDNAs with known terminal sequenceelements (83) and thereby eliminates the need to modifythe RNA 50-ends Recently the technology was adapted tothe specific needs of Ilumina RNA-seq analysis (84)

Adapterndashadapter ligation disturbs sRNA-seqA side effect common to sRNA-seq protocols that rely onligase-mediated sequential RNA end-modification is PCRamplification of empty adapterndashadapter ligation products(8185) Adapterndashdimer formation is a direct consequenceof the molecular adapter excess employed to ensure suffi-cient product yield and might dominate in reactions wherethe concentration of total RNA starting material is low(8186) Adapter molecules that are 50-pre-adenylatedremain in reaction mixtures after RNA 30-end modifica-tion and react with RNA 50-adapter oligonucleotideswhich subsequently favors dimer formation Thereforepurifying products away from excess adapters via gelelectrophoresis is recommended (78) However for prep-aration of tiny RNAs such as miRNAs separation of thesmall size difference between ligated product and adaptersis not trivial Alternative approaches to avoid size frac-tionation rely on application of complementary lsquodimer-eliminator-LNA-oligonucleotidesrsquo (8586) The interferingoligonucleotide is designed to overlap the adapterndashadapterligation junction and thereby distinguish between targetedside product and adapter ligated RNA (86) The comple-mentarity of the interfering oligonucleotides extends intothe RNA 30-adapter sequence thus blocking primerannealing during reverse transcription and as a resultexcludes adapter dimers from RNA sequencing (86)Vigneault et al reported a similar strategy to overcomeunintended dimer amplification (81) Pre-annealing of theRT primer after RNA 30-adapter ligation but prior toRNA 50-end modification efficiently decreased adapterdimer formation The excess of non-ligated free RNA30-adapters was captured by complementary reverse tran-scriptase primers and sequestered in less-reactive double-stranded structures (81) The latter approach permitseffective reduction of dimer amplification and is reportedto increase cDNA library complexity (81) ObviouslyRNA tailing strategies (see above) are superior becausethey completely circumvent the problem of adapter-dimer formation To avoid adapter tailing as an analogous

side reaction to dimer formation it is advisable to carryout 30-tailing prior to 50-adapter ligation

Depending on the experimental goal sRNA-seqrequires varying amounts of cDNA reads (see above)Therefore techniques to reduce the amount of readsthat represent empty adapter dimers are of central import-ance as they permit an increase in sequencing depthreducing costs and ultimately enabling novel smallnon-protein coding RNA gene discovery

PCR AMPLIFICATION BIAS

Apart from novel third-generation sequencingtechnologies that permit direct sequence analysis of un-amplified cDNA all commonly used protocols implementPCR amplification (87) In addition the introduction ofbarcode identifiers by PCR amplification may also avoidbias introduced by differences in RNAndashadapter-specificligation efficacies (78) Therefore multiplexing of differentsamples during PCR is widely regarded to permit meta-analysis (78) However even PCR amplification itself isknown to introduce expression bias into datasets(303288) In particular when many different templatesare amplified in parallel variations not only in templatelength but also in base composition might lead topreferred template-specific amplifications (30) Extremeexamples of allelic dropout due to single base-exchangewere reported for PCR-mediated genotyping (89) TheRNA GC-content is especially responsible for data dis-tortion Generally cDNAs of high GC-content are lessreadily amplified and thus remain underrepresented Forexample read distributions calculated over the entire 28SrRNA were inversely correlated with the correspondingregional GC-content (32) As argued earlier base com-positions of different small non-protein coding RNAs(miRNAs piRNAs snoRNAs scaRNAs etc) varywidely The parallel amplification of complex templatemixtures that vary in their respective GC-content is there-fore a challenging procedure and likely to influence thecomplexity of the resulting cDNA library (30) Changesin complexity do not only influence distortion of quanti-tative measurements but certain RNAs might also lsquodropoutrsquo completely and remain unidentified Extended periodsof denaturation are considered to effectively decrease thelikelihood of uneven amplification (90) PCR amplifica-tion does not only change the overall GC-content of theresulting cDNA pool but is also reported to introducelength bias (30) which might be an important factorwhen RNAs of different sizes are subjected to parallelcDNA preparations Shorter sequences of significantlylower GC-content are likely to be overrepresented

Recent analysis also emphasized the impact of PCRbuffer compositions on PCR amplification bias (30)Remarkably PCR-mediated bias was already observedin some cases within 15 rounds of PCR amplification(30) While the popular Phusion PCR system surprisinglyperformed the worst (30) data indicate that someoptimized PCR buffers are capable of significantlyreducing PCR-mediated distortion (3090)

8 Nucleic Acids Research 2013

sRNA-SEQ AND PERSONALIZED MEDICINE

In recent years increasing attention has focused on thepotential of miRNA transcriptional profiling to diagnosedisease or to monitor progression and success in treatment(161791) The decreasing costs of high-throughputsequence analysis favors sRNA-seq as a common toolfor medical diagnostics and personalized medicine Aftercell death the total RNA of circulating cancer cells isreleased into the blood stream (92) These extracellularRNAs (exRNAs) are packaged within vesicles such asexosomes where they are complexed with differentproteins for example miRNAs in the RISC complexVesicles protect exRNA from ribonucleases and preventdegradation Among the various exRNAs miRNAs aresome of the most stable molecules (93) Easy accessibilityof patient blood samples and the stability of miRNAsmake them effective biomarkers (94) Diagnostics basedon miRNA were successfully conducted on biologicalsamples of low quality including dried blood (95) Inaddition miRNAs remain stable even when samples aresubjected to repeated rounds of tissue freezing andthawing which usually degrades RNAs (96) A furtherappealing feature of miRNAs in medical research is thatchanges in their expression profiles are observed duringdisease onset (97) In addition miRNAs often displaytissue-specific expression Therefore it might be possibleto identify the cellular origin of most metastases (91)possibly providing decision guidance for appropriatetreatment However as elaborated in detail miRNA ex-pression profiles detected by deep sequencing do not onlyreflect changes in relative expression but are alsoinfluenced by miRNA-adapter-specific ligation efficaciesespecially in cost-effective parallel analyses that oftenrequire multiplexing Different barcode introduced byligation generate bias in miRNA representation and abun-dance (78) Even when incorporated by PCR barcodingmight still lead to data distortion (30ndash32) Third-gener-ation sequencing techniques such as the Helicos (InNovember 2012 Helicos filed bankruptcy (httpbizyahoocome121115hlcs8-khtml)) system avoid PCRamplification steps in cDNA preparation and permitdirect sequencing of unamplified samples (87) The RNA30-end modification to prime cDNA synthesis (or directsequencing) in addition to relatively high error frequencies(15) might still lead to bias and so would not beespecially applicable in personalized medicine (98)Collectively deep sequencing to monitor change inmedical samples is of importance and certainly ofinterest but requires resolution of serious standardizationissues prior to general application

qRT-PCR a valid alternative to deep sequencing

Alternative techniques to quantify specific miRNA testsets rely on quantitative real-time reverse transcriptionPCR (qRT-PCR) (99) In particular diagnostic screensin cancer medicine rely on the application of qRT-PCR(99ndash101) In cases where biomarker disease correlation iswell established and when quantification of limitednumbers of biomarkers is sufficient for diagnosis and tomonitor disease progression or treatment qRT-PCR is

a valuable tool (99) Recently the convenience ofqRT-PCR-based medical diagnosis was suggested tomake it superior in routine clinical applications (99)Interestingly novel developments in qRT-PCR allowsample processing in 384-well formats and liken thismethod to cost-effective high-throughput applications(99102) A further advantage of qRT-PCR is the reduc-tion in downstream bio-computation (99103) The timebetween qRT-PCR and diagnostic result is shorter thanfor sRNA-seq or microarray-based inference Howeverthe analysis of miRNA expression profiles by qRT-PCRis technically challenging due to RNA-inherent pitfalls(103104)The miRNAs lack structural features such as mRNA

poly(A) tails that could serve as class-specific enrichmentor priming of reverse transcription (103) Alternative tech-niques for miRNA sample enrichment would rely onimmune-precipitation but most of them are not suitablefor typical small size samples in medical analysis (105) Asmentioned above the heterogeneity of the RNA GC-content translates into a wide range of primer annealingtemperatures for a population of miRNAs and mighttherefore complicate parallel analysis of multiple RNAs(103) Furthermore the qRT-PCR template is presentnot only within the mature miRNA molecule but also ispart of various precursor transcripts (103) It is thereforeimportant to ensure that primary or precursor-miRNAs(pri- or pre-miRNAs) do not provide templates inqRT-PCR amplification and as a consequence do notcontribute to the signal intensities detected Difficultiesthat relate to the similarity displayed between differentmiRNA isoforms are mentioned above within thecontext of array-based inference to quantify differentialgene expressionThus far two alternative approaches are commonly

utilized to reverse transcribe miRNA templates(103106) miRNA-specific stem-loop primers are oftenused when only limited numbers of RNAs are subject toquantification A major advantage of using miRNA-specific primers for reverse transcription is an increase insensitivity and specificity However RNA 30-end-tailing-based approaches enable bulk reverse transcription ofmultiple targets in parallel and are therefore advantageouswhen many miRNAs are analyzed (see above) A recentcomparative analysis to investigate reverse transcriptionefficacies of the two methods uncovered only minor dif-ferences (106) The analysis involved only a limitedmiRNA test set and therefore the results cannot begeneralized As an aside the oligo d(T)-based designmight also enable reverse transcription of miRNA andthe corresponding mRNA target in single test tube reac-tions (106) Ligation-mediated PCR or padlock probes incombination with rolling circle amplification are technicalalternatives for avoiding reverse transcription (107108)The methods rely on ligation-mediated template gener-ation and are therefore not devoid of bias (see above)Both approaches permit generation of miRNA signaturesacross samples and do not require elaborate analyticalpost-processing or expensive technical equipment(107108) However as elaborated here and abovereverse transcription is not devoid of template-specific

Nucleic Acids Research 2013 9

bias In particular RNA secondary structure and RNAmodifications both influence reverse transcription efficacy(see above) Similar to PCRs reverse transcription is nega-tively affected by increasing RNA GC-contentBecause the relative RNA GC-content dictates primer

Tm qRT-PCRs are also strongly influenced by forwardprimer design thereby affecting both qRT-PCR specificityand sensitivity Altering the length of primer-templatecomplementarity permits adjustment of the actualprimer Tm Conversely in cases where RNA nucleotidecomposition would dictate very low melting temperaturesLNA-modified nucleotides might be incorporated into theprimer sequence to increase the corresponding primer Tm(18) However the design of LNA-containing oligonucleo-tides is not always trivial (109) For the actual measure-ment of miRNA in real time two alternative fluorescentreporter molecules are frequently utilized SYBR Greenand the Taqman assay (103) Detection of the fluores-cence signal emitted by SYBR Green increases about100-fold as the molecule intercalates in double-strandedDNA However the mechanism of DNA intercalationsuggests the dye is not target-specific and thereforecannot discriminate primer dimers unspecific productsand target amplification (103) Because the intercalatingdye is simply added to the assay it is often possible toeasily adapt the qRT-PCR assay to already establishedand validated analytical PCRs (104) In comparison theTaqman assay provides increased specificity as the de-tection is based on sequence-specific probendashtarget inter-action Certainly the qRT-PCR assays offer manyadvantages but require evaluation of various factors thatinfluence and potentially bias analysis Thereforeinternational standardization to ensure comparable ex-perimental design and to permit meta-analysis amongdatasets is of prime importance for establishing clinicalqRT-PCR-based personal medicine The MinimumInformation for Publication of Quantitative Real-TimePCR experiments guidelines established standards for bet-ter experimental practice and suggest routes to data ana-lysis that increase reliability and data comparison (110)

mRNA-SEQ

Deep sequencing is not limited to expression profiling ofmiRNA or other small non-protein coding RNAs Evenentire transcriptomes including all protein coding andlong non-protein coding RNAs are subjected to high-throughput sequencing methods Currently as the lengthof single sequence runs is limited (from 100 to 500 nt de-pending on the sequencing platform) longer RNAsrequire fragmentation prior to cDNA synthesis (25)Millions of small reads resulting from mRNA-seq areanalyzed to identify novel transcripts or splice variants(111112) Relative quantification of reads enables one tomonitor differential gene expression among samples andto identify changes in isoform distribution (111) As withsRNA-seq the digital nature of RNA deep sequencing isthought to simplify meta-analysis of mRNA or long non-protein coding RNA expression data (25) Some protocolsfor modification of RNA fragment ends rely on

chemistries identical to sRNA-seq (84113) Therefore ex-pression biases observed in sRNA-seq analysis are alsolikely to cause distortion in whole transcriptome profilingVariations in nucleotide composition and changes in RNAsecondary structure of different RNA fragments mightcorrelate with uneven read distribution Unequal readcoverage generates uncertainty in isoform detection andin extreme cases exons representing unfavorable nucleo-tide compositions might escape analysis As fragmentationis suggested to generate many different but at least par-tially overlapping reads bias in mRNA-seq might be lessprominent Recently a novel technique based on circligasewas introduced to avoid RNA 50-end ligation (114)Reverse transcription is conducted with anchoredoligo(dT) primers that contain an additional sequence topermit later PCR amplification Following first strandcDNA synthesis circligase-mediated ligation is carriedout The resulting circular molecules are subjected toPCR amplification The strategy is of general interestand might be a valuable tool for sRNA-seq as well

CONCLUSION

Bias in general describes systematic errors that reflectmethod-related distortion from the truth Bias might bedetected easily when data generated by different methodsare subjected to analytical comparison Because it is sys-tematic in nature bias is not subject to variation inrepeated experiments Here based on biochemical anddeep sequencing data analyses we have presentedvarious steps in sRNA-seq protocols likely to causesevere distortions in the relative expression levels of indi-vidual sRNAs

The efficacy of RNA end-modification to permit cDNAconstruction depends not only on differences in strategiesand reagents but also on changes in among others buffercompositions and additives that could increase molecularcrowding Therefore meta-analysis of data generated withnon-identical methods reagents or even buffers is likely tobe error-prone (36396378) To achieve comparableresults and to evaluate relative changes in the quantifica-tion of the resulting data experiments and analyses shouldbe conducted under strictly identical conditions It has tobe stressed that bias in RNA-specific variation is not onlyrestricted to inter-sample comparison but also effects therelative ranking of intra-sample RNA expression

In addition to the more obvious quantitative aspects ofbias there are also qualitative aspects because RNA-seqmethods are also employed for RNA discovery IndividualRNAs might lsquodrop outrsquo from detection because they arenot amenable to cDNA construction under chosen condi-tions Certainly to increase the likelihood of productiveRNA end-modification and increase the complexity ofsRNA-seq library content the parallel application of dif-ferent strategies and reagents is required (33396379) Inparticular increased variability of adapter sequences (ieadapter pools) helps to increase the diversity of RNAsaccessed (63) A recent report emphasized the need forparallel application of different RNA- and DNA-basedadapter oligonucleotides (39)

10 Nucleic Acids Research 2013

The source of expression level distortion and the failureto detect certain RNA species at all appear not to belimited to reactions of RNAcDNA-end modificationbut also apply to cDNA amplification by PCR (30ndash32)Differences in reaction buffers RNA GC-content sec-ondary structures RNA length and primers might eachlead to bias during PCR Therefore the analysis ofsRNA-seq data to examine relative changes in geneexpression or identification reflects both true RNA abun-dance and biases related to the methods applied

The sRNA-seq approach is also attractive to medicalresearch Deep sequencing might permit the screening ofmultiple patient samples to monitor the progression ofdisease or treatment Therefore it is urgently necessaryto establish rigid international standardizations tominimize distortion and to avoid misleading conclusionsduring RNA-seq data interpretation In summary quali-tative and quantitative RNomics remain more challengingthan anticipated

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online

FUNDING

National Genome Research Network [NGFNIII01GS0808 to JB and TSR] the Deutsche Forschungs-gemeinschaft [BR 7544-1 to JB] USM ResearchUniversity Grant [1001CIPPT81305 to THT]Funding for open access charge The licence was fundedby Dr Brosiusrsquos research budget

Conflict of interest statement None declared

REFERENCES

1 BrosiusJ (2009) The fragmented gene Ann N Y Acad Sci1178 186ndash193

2 BrosiusJ and TiedgeH (2004) RNomenclature RNA Biol 181ndash83

3 LiuQ and ParooZ (2010) Biochemical principles of small RNApathways Annu Rev Biochem 79 295ndash319

4 BrosiusJ (2012) In ErdmannVA and BarciszewskiJ (eds)From Nucleic Acids Sequences to Molecular Medicine RNATechnologies Springer Berlin Heidelberg pp 1ndash18

5 HuttenhoferA BrosiusJ and BachellerieJP (2002) RNomicsidentification and function of small non-messenger RNAs CurrOpin Chem Biol 6 835ndash843

6 AnS and SongJJ (2011) The coded functions of noncodingRNAs for gene regulation Mol Cells 31 491ndash496

7 BushatiN and CohenSM (2007) microRNA functions AnnuRev Cell Dev Biol 23 175ndash205

8 ChiangHR SchoenfeldLW RubyJG AuyeungVCSpiesN BaekD JohnstonWK RussC LuoS BabiarzJEet al (2010) Mammalian microRNAs experimental evaluation ofnovel and previously annotated genes Genes Dev 24 992ndash1009

9 AmbrosV (2001) microRNAs tiny regulators with greatpotential Cell 107 823ndash826

10 LauNC LimLP WeinsteinEG and BartelDP (2001) Anabundant class of tiny RNAs with probable regulatory roles inCaenorhabditis elegans Science 294 858ndash862

11 RuvkunG (2001) Molecular biology Glimpses of a tiny RNAworld Science 294 797ndash799

12 RuvkunGB (2003) The tiny RNA world Harvey Lect 991ndash21

13 GurtanAM and SharpPA (2013) The Role of miRNAs inRegulating Gene Expression Networks J Mol Biol 4253582ndash3600

14 LiangH ZhangJ ZenK ZhangCY and ChenX (2013)Nuclear microRNAs and their unconventional role in regulatingnon-coding RNAs Protein Cell 4 325ndash330

15 HuangV and LiLC (2012) miRNA goes nuclear RNA Biol 9269ndash273

16 WangZ HanJ CuiY FanK and ZhouX (2013) CirculatingmicroRNA-21 as noninvasive predictive biomarker for response incancer immunotherapy Med Hypotheses 81 41ndash43

17 ZhaoC ZhangJ ZhangS YuD ChenY LiuQ ShiMNiC and ZhuM (2013) Diagnostic and biological significance ofmicroRNA-192 in pancreatic ductal adenocarcinoma Oncol Rep30 276ndash284

18 van RooijE (2011) The art of microRNA research Circ Res108 219ndash234

19 LeeLW ZhangS EtheridgeA MaL MartinD GalasDand WangK (2010) Complexity of the microRNA repertoirerevealed by next-generation sequencing RNA 16 2170ndash2180

20 OkoniewskiMJ and MillerCJ (2006) Hybridization interactionsbetween probesets in short oligo microarrays lead to spuriouscorrelations BMC Bioinformatics 7 276

21 MortazaviA WilliamsBA McCueK SchaefferL andWoldB (2008) Mapping and quantifying mammaliantranscriptomes by RNA-Seq Nat Methods 5 621ndash628

22 LiuS LinL JiangP WangD and XingY (2011) Acomparison of RNA-Seq and high-density exon array fordetecting differential gene expression between closely relatedspecies Nucleic Acids Res 39 578ndash588

23 RaghavachariN BarbJ YangY LiuP WoodhouseKLevyD OrsquoDonnellC MunsonPJ and KatoG (2012) Asystematic comparison and evaluation of high density exon arraysand RNA-seq technology used to unravel the peripheral bloodtranscriptome of sickle cell disease BMC Med Genomics 5 28

24 MartinJA and WangZ (2011) Next-generation transcriptomeassembly Nat Rev Genet 12 671ndash682

25 WangZ GersteinM and SnyderM (2009) RNA-Seq arevolutionary tool for transcriptomics Nat Rev Genet 1057ndash63

26 KozomaraA and Griffiths-JonesS (2011) miRBase integratingmicroRNA annotation and deep-sequencing data Nucleic AcidsRes 39 D152ndashD157

27 LinsenSE de WitE JanssensG HeaterS ChapmanLParkinRK FritzB WymanSK de BruijnE VoestEE et al(2009) Limitations and possibilities of small RNA digital geneexpression profiling Nat Methods 6 474ndash476

28 BakerM (2010) MicroRNA profiling separating signal fromnoise Nat Methods 7 687ndash692

29 HafnerM RenwickN BrownM MihailovicA HolochDLinC PenaJT NusbaumJD MorozovP LudwigJ et al(2011) RNA-ligase-dependent biases in miRNA representationin deep-sequenced small RNA cDNA libraries RNA 171697ndash1712

30 DabneyJ and MeyerM (2012) Length and GC-biases duringsequencing library amplification a comparison of variouspolymerase-buffer systems with ancient and modern DNAsequencing libraries Biotechniques 52 87ndash94

31 OrpanaAK HoTH and StenmanJ (2012) Multiple heatpulses during PCR extension enabling amplification of GC-richsequences and reducing amplification bias Anal Chem 842081ndash2087

32 SendlerE JohnsonGD and KrawetzSA (2011) Local andglobal factors affecting RNA sequencing analysis Anal Biochem419 317ndash322

33 ZhuangF FuchsRT SunZ ZhengY and RobbGB (2012)Structural bias in T4 RNA ligase-mediated 30-adapter ligationNucleic Acids Res 40 e54

34 DeChiaraTM and BrosiusJ (1987) Neural BC1 RNA cDNAclones reveal nonrepetitive sequence content Proc Natl AcadSci USA 84 2624ndash2628

Nucleic Acids Research 2013 11

35 WickensM and KwakJE (2008) Molecular biology A tail talefor U Science 319 1344ndash1345

36 MunafoDB and RobbGB (2010) Optimization of enzymaticreaction conditions for generating representative pools of cDNAfrom small RNA RNA 16 2537ndash2552

37 SarkarN (1997) Polyadenylation of mRNA in prokaryotesAnnu Rev Biochem 66 173ndash197

38 Yehudai-ResheffS and SchusterG (2000) Characterization of theEcoli poly(A) polymerase nucleotide specificity RNA-bindingaffinities and RNA structure dependence Nucleic Acids Res 281139ndash1144

39 RaabeCA HoeCH RandauG BrosiusJ TangTH andRozhdestvenskyTS (2011) The rocks and shallows of deep RNAsequencing Examples in the Vibrio cholerae RNome RNA 171357ndash1366

40 RaabeCA SanchezCP RandauG RobeckT SkryabinBVChinniSV KubeM ReinhardtR NgGH ManickamRet al (2010) A global view of the nonprotein-codingtranscriptome in Plasmodium falciparum Nucleic Acids Res 38608ndash617

41 SharmaCM HoffmannS DarfeuilleF ReignierJ FindeissSSittkaA ChabasS ReicheK HackermullerJ ReinhardtRet al (2010) The primary transcriptome of the major humanpathogen Helicobacter pylori Nature 464 250ndash255

42 SharmaCM and VogelJ (2009) Experimental approaches forthe discovery and characterization of regulatory small RNACurr Opin Microbiol 12 536ndash546

43 YuB YangZ LiJ MinakhinaS YangM PadgettRWStewardR and ChenX (2005) Methylation as a crucial step inplant microRNA biogenesis Science 307 932ndash935

44 HouwingS KammingaLM BerezikovE CronemboldDGirardA van den ElstH FilippovDV BlaserH RazEMoensCB et al (2007) A role for Piwi and piRNAs in germcell maintenance and transposon silencing in Zebrafish Cell 12969ndash82

45 AckermannHW (2003) Bacteriophage observations andevolution Res Microbiol 154 245ndash251

46 RossmannMG MesyanzhinovVV ArisakaF andLeimanPG (2004) The bacteriophage T4 DNA injectionmachine Curr Opin Struct Biol 14 171ndash180

47 HoCK and ShumanS (2002) Bacteriophage T4 RNA ligase 2(gp241) exemplifies a family of RNA ligases found in allphylogenetic domains Proc Natl Acad Sci USA 9912709ndash12714

48 YinS HoCK and ShumanS (2003) Structure-function analysisof T4 RNA ligase 2 J Biol Chem 278 17601ndash17608

49 AravinA and TuschlT (2005) Identification and characterizationof small RNAs involved in RNA silencing FEBS Lett 5795830ndash5840

50 ViolletS FuchsRT MunafoDB ZhuangF and RobbGB(2011) T4 RNA ligase 2 truncated active site mutants improvedtools for RNA analysis BMC Biotechnol 11 72

51 UhlenbeckOC and GumportRI (1982) T4 RNA LigaseIn BoyerPD (ed) The Enzymes Nucleic Acids Part BAcademic Press Inc New York pp 31ndash58

52 HoCK WangLK LimaCD and ShumanS (2004) Structureand mechanism of RNA ligase Structure 12 327ndash339

53 El OmariK RenJ BirdLE BonaMK KlarmannGLeGriceSF and StammersDK (2006) Molecular architectureand ligand recognition determinants for T4 RNA ligase J BiolChem 281 1573ndash1579

54 StarkMR PleissJA DerasM ScaringeSA and RaderSD(2006) An RNA ligase-mediated method for the efficient creationof large synthetic RNAs RNA 12 2014ndash2019

55 CranstonJW SilberR MalathiVG and HurwitzJ (1974)Studies on ribonucleic acid ligase Characterization of anadenosine triphosphate-inorganic pyrophosphate exchangereaction and demonstration of an enzyme-adenylate complex withT4 bacteriophage-induced enzyme J Biol Chem 2497447ndash7456

56 HigginsNP GeballeAP SnopekTJ SuginoA andCozzarelliNR (1977) Bacteriophage T4 RNA ligase preparationof a physically homogeneous nuclease-free enzyme fromhyperproducing infected cells Nucleic Acids Res 4 3175ndash3186

57 PfefferS Lagos-QuintanaM and TuschlT (2005) Cloning ofsmall RNA molecules Curr Protoc Mol Biol Chapter 26 Unit26 24

58 BruceAG and UhlenbeckOC (1978) Reactions at the terminiof tRNA with T4 RNA ligase Nucleic Acids Res 5 3665ndash3677

59 VigneaultF SismourAM and ChurchGM (2008) EfficientmicroRNA capture and bar-coding via enzymatic oligonucleotideadenylation Nat Methods 5 777ndash779

60 LiuJM LivnyJ LawrenceMS KimballMD WaldorMKand CamilliA (2009) Experimental discovery of sRNAs in Vibriocholerae by direct cloning 5StRNA depletion and parallelsequencing Nucleic Acids Res 37 e46

61 KrugM and UhlenbeckOC (1982) Reversal of T4 RNA ligaseBiochemistry 21 1858ndash1864

62 HafnerM LandgrafP LudwigJ RiceA OjoT LinCHolochD LimC and TuschlT (2008) Identification ofmicroRNAs and other small regulatory RNAs using cDNAlibrary sequencing Methods 44 3ndash12

63 JayaprakashAD JabadoO BrownBD and SachidanandamR(2011) Identification and remediation of biases in the activity ofRNA ligases in small-RNA deep sequencing Nucleic Acids Res39 e141

64 YewCW and KumarSV (2012) Isolation and cloning ofmicroRNAs from recalcitrant plant tissues with small amounts oftotal RNA a step-by step approach Mol Biol Rep 391783ndash1790

65 KurthHM and MochizukiK (2009) 2rsquo-O-methylation stabilizesPiwi-associated small RNAs and ensures DNA elimination inTetrahymena RNA 15 675ndash685

66 EbhardtHA ThiEP WangMB and UnrauPJ (2005)Extensive 30 modification of plant small RNAs is modulated byhelper component-proteinase expression Proc Natl Acad SciUSA 102 13398ndash13403

67 WangJ CzechB CrunkA WallaceA MitrevaMHannonGJ and DavisRE (2011) Deep small RNA sequencingfrom the nematode Ascaris reveals conservation functionaldiversification and novel developmental profiles Genome Res 211462ndash1477

68 ZhuangF FuchsRT and RobbGB (2012) Small RNAexpression profiling by high-throughput sequencing implicationsof enzymatic manipulation J Nucleic Acids 2012 360358

69 FujimuraT and EstebanR (2012) Cap snatching of yeast L-Adouble-stranded RNA virus can operate in trans and requiresviral polymerase actively engaging in transcription J Biol Chem287 12797ndash12804

70 BessmanMJ WalshJD DunnCA SwaminathanJWeldonJE and ShenJ (2001) The gene ygdP associated withthe invasiveness of Escherichia coli K1 designates a Nudixhydrolase Orf176 active on adenosine (50)-pentaphospho-(50)-adenosine (Ap5A) J Biol Chem 276 37834ndash37838

71 DeanaA CelesnikH and BelascoJG (2008) The bacterialenzyme RppH triggers messenger RNA degradation by50 pyrophosphate removal Nature 451 355ndash358

72 BlewettN CollerJ and GoldstrohmA (2011) A quantitativeassay for measuring mRNA decapping by splinted ligation reversetranscription polymerase chain reaction qSL-RT-PCR RNA 17535ndash543

73 KrogerC DillonSC CameronAD PapenfortKSivasankaranSK HokampK ChaoY SittkaA HebrardMHandlerK et al (2012) The transcriptional landscape and smallRNAs of Salmonella enterica serovar Typhimurium Proc NatlAcad Sci USA 109 E1277ndashE1286

74 EnglandTE and UhlenbeckOC (1978) Enzymaticoligoribonucleotide synthesis with T4 RNA ligase Biochemistry17 2069ndash2076

75 SorefanK PaisH HallAE KozomaraA Griffiths-JonesSMoultonV and DalmayT (2012) Reducing ligation bias of smallRNAs in libraries for next generation sequencing Silence 3 4

76 ZhelkovskyAM and McReynoldsLA (2011) Simple andefficient synthesis of 50 pre-adenylated DNA using thermostableRNA ligase Nucleic Acids Res 39 e117

77 McCoyMI and GumportRI (1980) T4 ribonucleic acid ligasejoins single-strand oligo(deoxyribonucleotides) Biochemistry 19635ndash642

12 Nucleic Acids Research 2013

78 AlonS VigneaultF EminagaS ChristodoulouDCSeidmanJG ChurchGM and EisenbergE (2011) Barcodingbias in high-throughput multiplex sequencing of miRNA GenomeRes 21 1506ndash1511

79 SunG WuX WangJ LiH LiX GaoH RossiJ andYenY (2011) A bias-reducing strategy in profiling small RNAsusing Solexa RNA 17 2256ndash2262

80 Van NieuwerburghF SoetaertS PodshivalovaK Ay-LinWangE SchafferL DeforceD SalomonDR HeadSR andOrdoukhanianP (2011) Quantitative bias in Illumina TruSeq anda novel post amplification barcoding strategy for multiplexedDNA and small RNA deep sequencing PLoS One 6 e26969

81 VigneaultF Ter-OvanesyanD AlonS EminagaSC ChristodoulouD SeidmanJG EisenbergE and MChurchG (2012) High-throughput multiplex sequencingof miRNA Curr Protoc Hum Genet Chapter 11 Unit 1112 11-10

82 PakJ and FireA (2007) Distinct populations of primary andsecondary effectors during RNAi in C elegans Science 315241ndash244

83 ZhuYY MachlederEM ChenchikA LiR and SiebertPD(2001) Reverse transcriptase template switching a SMARTapproach for full-length cDNA library constructionBiotechniques 30 892ndash897

84 LevinJZ YassourM AdiconisX NusbaumCThompsonDA FriedmanN GnirkeA and RegevA (2010)Comprehensive comparative analysis of strand-specific RNAsequencing methods Nat Methods 7 709ndash715

85 AndoY BurroughsAM KawanoM de HoonMJ andHayashizakiY (2012) Targeted methods to improve small RNAprofiles generated by deep sequencing In MallickB andGhoshZ (eds) Regulatory RNAs Basics Methods andApplications Springer-Verlag Heidelberg Germany pp 253ndash271

86 KawanoM KawazuC LizioM KawajiH CarninciPSuzukiH and HayashizakiY (2010) Reduction of non-insertsequence reads by dimer eliminator LNA oligonucleotide forsmall RNA deep sequencing Biotechniques 49 751ndash755

87 KapranovP OzsolakF and MilosPM (2012) Profiling of shortRNAs using Helicos single-molecule sequencing Methods MolBiol 822 219ndash232

88 KanagawaT (2003) Bias and artifacts in multitemplatepolymerase chain reactions (PCR) J Biosci Bioeng 96 317ndash323

89 DayDJ SpeiserPW SchulzeE BettendorfM FitnessJBaranyF and WhitePC (1996) Identification of non-amplifyingCYP21 genes when using PCR-based diagnosis of 21-hydroxylasedeficiency in congenital adrenal hyperplasia (CAH) affectedpedigrees Hum Mol Genet 5 2039ndash2048

90 AirdD RossMG ChenWS DanielssonM FennellTRussC JaffeDB NusbaumC and GnirkeA (2011) Analyzingand minimizing PCR amplification bias in Illumina sequencinglibraries Genome Biol 12 R18

91 UllahMF and AatifM (2009) The footprints of cancerdevelopment cancer biomarkers Cancer Treat Rev 35 193ndash200

92 MoMH ChenL FuY WangW and FuSW (2012) Cell-freeCirculating miRNA Biomarkers in Cancer J Cancer 3 432ndash448

93 ZenK and ZhangCY (2012) Circulating microRNAs a novelclass of biomarkers to diagnose and monitor human cancersMed Res Rev 32 326ndash348

94 KimT and ReitmairA (2013) Non-Coding RNAs FunctionalAspects and Diagnostic Utility in Oncology Int J Mol Sci 144934ndash4968

95 PatnaikSK MallickR and YendamuriS (2010) Detection ofmicroRNAs in dried serum blots Anal Biochem 407 147ndash149

96 MitchellPS ParkinRK KrohEM FritzBR WymanSKPogosova-AgadjanyanEL PetersonA NoteboomJOrsquoBriantKC AllenA et al (2008) Circulating microRNAs as

stable blood-based markers for cancer detection Proc NatlAcad Sci USA 105 10513ndash10518

97 CieslaM SkrzypekK KozakowskaM LobodaAJozkowiczA and DulakJ (2011) MicroRNAs as biomarkers ofdisease onset Anal Bioanal Chem 401 2051ndash2061

98 OzsolakF (2012) Third-generation sequencing techniques andapplications to drug discovery Expert Opin Drug Discov 7231ndash243

99 TsongalisGJ CalinG CordelierP CroceC MonzonF andSzafranska-SchwarzbachAE (2013) MicroRNA analysis is itready for prime time Clin Chem 59 343ndash347

100 AmiraN Cancel-TassinG BernardiniS Cochand-PriolletBBittardH ManginP FournierG LatilA and CussenotO(2004) Expression in bladder transitional cell carcinoma by real-time quantitative reverse transcription polymerase chain reactionarray of 65 genes at the tumor suppressor locus 9q341-2identification of 5 candidates tumor suppressor genes Int JCancer 111 539ndash542

101 ZhangX (2007) Biomarker validation movement towardspersonalized medicine Expert Rev Mol Diagn 7 469ndash471

102 SchmittgenTD LeeEJ and JiangJ (2008) High-throughputreal-time PCR Methods Mol Biol 429 89ndash98

103 BenesV and CastoldiM (2010) Expression profiling ofmicroRNA using real-time quantitative PCR how to use it andwhat is available Methods 50 244ndash249

104 BustinSA and NolanT (2004) Pitfalls of quantitative real-timereverse-transcription polymerase chain reaction J Biomol Tech15 155ndash166

105 BeitzingerM and MeisterG (2011) Experimental identificationof microRNA targets by immunoprecipitation of Argonauteprotein complexes Methods Mol Biol 732 153ndash167

106 FiedlerSD CarlettiMZ and ChristensonLK (2010)Quantitative RT-PCR methods for mature microRNA expressionanalysis Methods Mol Biol 630 49ndash64

107 ArefianE KianiJ SoleimaniM ShariatiSA Aghaee-BakhtiariSH AtashiA GheisariY AhmadbeigiN Banaei-MoghaddamAM NaderiM et al (2011) Analysis ofmicroRNA signatures using size-coded ligation-mediated PCRNucleic Acids Res 39 e80

108 JonstrupSP KochJ and KjemsJ (2006) A microRNAdetection system based on padlock probes and rolling circleamplification RNA 12 1747ndash1752

109 LatorraD ArarK and HurleyJM (2003) Designconsiderations and effects of LNA in PCR primers Mol CellProbes 17 253ndash259

110 BustinSA BenesV GarsonJA HellemansJ HuggettJKubistaM MuellerR NolanT PfafflMW ShipleyGLet al (2009) The MIQE guidelines minimum information forpublication of quantitative real-time PCR experiments ClinChem 55 611ndash622

111 TrapnellC WilliamsBA PerteaG MortazaviA KwanGvan BarenMJ SalzbergSL WoldBJ and PachterL (2010)Transcript assembly and quantification by RNA-Seq revealsunannotated transcripts and isoform switching during celldifferentiation Nat Biotechnol 28 511ndash515

112 TrapnellC PachterL and SalzbergSL (2009) TopHatdiscovering splice junctions with RNA-Seq Bioinformatics 251105ndash1111

113 ListerR OrsquoMalleyRC Tonti-FilippiniJ GregoryBDBerryCC MillarAH and EckerJR (2008) Highly integratedsingle-base resolution maps of the epigenome in ArabidopsisCell 133 523ndash536

114 PolidorosAN PasentsisK and TsaftarisAS (2006) Rollingcircle amplification-RACE a method for simultaneous isolationof 50 and 30 cDNA ends from amplified cDNA templatesBiotechniques 41 35ndash42

Nucleic Acids Research 2013 13

Page 3: SURVEY AND SUMMARY Biases in small RNA deep sequencing data

RNA-tailing is of major importance to avoid bias (39)In vitro experiments also demonstrated that addition ofthree to six unpaired nucleotides might be sufficient torender the reaction secondary structure-independent

(38) However approximation of the minimal extensionlength needed to ameliorate RNA structure effectsshould not be generalized because it is likely to be RNAstructure-specific

Figure 1 Illustration of the steps involved in cDNA construction including potential sources of bias (A) The starting pool of non-protein codingRNAs with different 50- and 30-end modifications schematically indicated by different line types Abbreviations for the various modificationsOH hydroxyl OPO3 20-30-cyclic phosphate ppp triphosphate p monophosphate cap cap and 20-OCH3 20-O-methyl (B) The left paneldepicts different enzymatic pre-treatments prior to RNA 50-end ligation to enrich for different RNA subtypes From left to right RNA withoutany pretreatment (50-adapter ligation) RNA pretreated with tobacco acid phosphatase (TAP) RNA pretreated with TerminatorTM 50-phosphate-dependent exonuclease and TAP (Terminator 50-exo TAP) RNA treated with T4 polynucleotide kinase (T4 PNK) RNA classes accessible foradapter ligation after the respective 50-end pretreatments are schematically represented below each pretreatment The right panel depicts subtypes ofRNA classes accessible for 30-end tailing (-oligo(A) or ndasholigo(C) tailing) and adapter ligation (C) Possible biases associated with RNA 50-(left) and30-(right) end-modifications and with the subsequent steps of cDNA construction

Nucleic Acids Research 2013 3

RNA primary structure influences RNA 30-end tailingRecently to examine the influence of primary structureon RNA tailing efficacy three in vitro-transcribed testRNAs were assayed in comparative oligo(A)- andoligo(C)-tailing reactions (39) Interestingly after RNAdenaturation all test RNAs irrespective of primaryRNA structure performed equally well in the C-tailingassays and proceeded up to 95 completion as judgedby shifts in mobility (calculated as percent input) (39) Toexamine the influences of the 30-terminal nucleotide onRNA A-tailing RNAs identical except for the last nucleo-tide were examined in oligo(A)-tailing assays and up to2-fold changes in the efficacy of RNA modification werereported (36) The analysis also revealed that uridine atthe RNA 30-ends was the least preferred nucleotideHowever reaction kinetics indicated that even an RNAmolecule with an unfavorable 30-terminal uridine wasalmost completely modified if the reactions proceededlonger (36) Therefore increased incubation periodsmight compensate for substrate-specific variation inRNA 30-tailing efficacies Hence the available dataindicate that primary structure-related influences are lesslikely to bias sRNA-seq data if the protocol relies onRNA 30-end tailing However recent observations sug-gested that the RNA 50-end phosphorylation state influ-ences the resulting 30-end homopolymer tail length ofotherwise identical test RNAs (39)

RNA 30-end modification impacts tailing efficienciesThe 20-O-methylation of RNA 30-ends is also reported toreduce the efficacy of RNA tailing reactions (36) Theefficacy of oligo(A) addition can drop by as much as80 and even after extended incubation periods certainRNAs remain virtually unmodified (Figure 1) Therefore20-O-methylation at RNA 30-ends renders RNAs lessaccessible to RNA tailing (36) and such RNAs wouldeventually appear underrepresented in sRNA-seqNotably most plant miRNAs and vertebrate piRNAsare reported to be substrates for 20-O-methylation(4344) The drop in tailing efficacy and the correspondingdistortion in expression level might be particularlyrelevant when such RNAs are subjected to RNA tailingprocedures

RNA ligation

Today RNA ligase-catalyzed reactions are most oftenutilized to modify RNA termini for cDNA constructionThe genome of bacteriophage T4 harbors two differentRNA ligases T4Rnl1 and T4Rnl2 T4Rnl1 compensatesfor host defense mechanisms by sealing breaks introducedinto the anticodon loop of host tRNALys during infection(4546) T4Rnl2 has been detected more recentlyAlthough phylogenetically related to various classes ofDNA and RNA ligases the biological function ofT4Rnl2 remains unknown (334748) Both RNA ligasesas well as derivatives of T4Rnl2 are frequently utilized incDNA library construction In particular the truncatedversion of RNA ligase 2 has been reported to minimizeside reactions (4950) Variations in the compositions of

substrates and reagents during ligation are likely to intro-duce biases

RNA ligation-the reaction pathwayRNA ligation leads to ATP-dependent 30-50 phospho-diester bond formation Two molecular features aredistinguished (51) donor molecules that provide50-monophosphorylated RNA termini (5253) and ac-ceptors that contain 30-hydroxyl functional groups(52ndash54) (Figure 2) Two reaction intermediates areisolated suggesting a three-step reaction mechanismDuring the initial step of ligation accumulation ofmonoadenylated RNA ligase is observed (5556) Theenzyme-catalyzed adenosyl-transfer to the RNA ligaserequires phosphoramide bond formation and isaccompanied by pyrophosphate release (Figure 2) In thesecond step the AMP moiety is transferred to a50-monophosphorylated RNA donor molecule to form a50-adenylated RNA intermediate (50-AppRNA) (5253)(Figure 2) The 50-50 anhydride linkage is required toactivate donor molecules for the subsequent sealingreaction Finally an acceptor RNA 30-end hydroxylgroup (30-OH) attacks the alpha phosphorus of the 50-AppRNA donor molecule leading to covalent strandsealing and AMP release (5253) (Figure 2) The factthat pre-adenylated donors perform efficiently in reactionsconducted without ATP supplement provides formalproof of the reaction pathway and also is of major import-ance to practical applications (4957) The 50-App-modified adapters are now the reagents of choice tominimize side reactions (294957)

Side products in ligation reactionsA side reaction common to RNA ligation is the intramo-lecular RNA circularization (Figures 1 and 2) (29) Thereaction pathway is identical to the intermolecular ligationdiscussed above and is generally preferred due to closeproximity of the RNA ends Thus reactions that permitintra- and intermolecular products are dominated byintramolecular circularization (5158) It is of major im-portance to minimize side products in RNA ligation tomaintain reaction yields Generally the application of50-pre-adenylated adapter molecules in RNA 30-end liga-tions is presumed to abolish RNA circularization (5960)However the reversal of donor adenylation as a source ofcircularization was recently reported (293961) (Figure 2)In reactions performed with pre-adenylated donor mol-ecules 50-monophosphorylated RNAs were circularizedeven in the absence of ATP (2939) Remarkably forsome test RNAs the circularization constituted themajor product (39) Therefore preferred circularizationmight render certain RNAs inaccessible to cDNA con-struction The yield of circularization also depends onthe type of RNA-ligase enzyme utilized T4Rnl1-catalyzedreactions display the highest level of circularization (29)The tendency of T4Rnl1 to increase the amount of circularproduct might be explained in part by its biologicalfunction in intramolecular sealing reactions A C-termin-ally truncated version of T4Rnl2 displays minimal adenyl-transfer activity while maintaining the ability to form30 50 phosphodiester bonds (52) Ligase-mediated

4 Nucleic Acids Research 2013

RNA circularization depends on AMP removal from thepre-adenylated 50-App-adapter and its subsequent transferto 50-monophosphorylated RNAs therefore the applica-tion of truncated T4Rnl2 (T4Rnl2tr) to minimize this sidereaction is highly suggested (4950) In addition theexchange of lysine with glutamine (K227Q) in T4Rnl2trfurther reduced adenyl-transfer competence but retainedactivity to catalyze phosphodiester bond formation (48)Therefore the application of T4Rnl2 derivatives is bene-ficial for decreasing bias in sRNA-seq analysis (29)

Alternatively phosphatase pretreatment to generateRNA 50-OH termini that are inert to ligase-catalyzedadenylation helps to minimize side reactions The proced-ure ultimately requires the application of polynucleotidekinase to generate monophosphorylated RNA endsHowever tRNA rRNA and mRNA processing inter-mediates are reported to contain 50-hydroxylated termini(2962) Polynucleotide kinase treatment as a necessaryprerequisite to RNA 50-adapter ligation leads to the inclu-sion of abundant processing intermediates in cDNAlibraries The alternative techniques to avoid RNA circu-larization might distort cDNA representation and aretherefore likely to reduce the complexity of sRNA-seqexperiments (2962)

30-end ligation depends on RNAndashadapter combinationRNA 30-end ligation using T4Rnl1 was recently reportedas a potential source of bias in deep sRNA sequencing(39) Comparisons of the product yields of ligation reac-tions were made using three test RNAs and variousadapter sequences Depending on the specific RNAndashadapter combination the products differed remarkably

Similarly Jayaprakash et al (63) reported variations inproduct yields depending on specific RNAndashadapter com-binations in Rnl2tr-catalyzed reactions Because the RNA30-end ligation experiments were conducted with 50 pre-adenylated adapters the results also indicate thatadenyl-transfer is less likely the cause of the observedbias (63) RNA 30-adapter ligation was enhanced whenadapter pools displaying nucleotide variation at the two50 terminal positions were used (63) Therefore increasedvariability at adapter 50-ends enhances RNA ligation yieldand in general helps to minimize distortion in sRNA-seq(333963) RNA 50-phosphorylation also influences theefficacy of adapter ligation to RNA 30-ends (39) Theligation efficacies of T4 Rnl1-mediated reactions werecompared for test RNAs featuring different phosphoryl-ation states at RNA 50-ends and different adaptersincluding those with 50-App-modification UnexpectedlyRNAs that performed well when monophosphorylateddisplayed weak acceptor characteristics whentriphosphorylated Therefore the efficacy of RNA 30-endmodification might also be a function of the RNA 50-phos-phorylation state (Figure 1)

Ligation of adapters to modified RNAsIn ligation assays T4Rnl1 and T4Rnl2tr performedalmost identically using 30-unmodified test RNA sub-strates However the efficacy of T4Rnl1-catalyzed reac-tions dropped remarkably when 20-O-methylated RNAswere used (36) In contrast T4Rnl2tr performed almostequally well irrespective of RNA modification (36) Theadditions of 25 (wv) PEG 8000 to the T4Rnl2tr-mediated reactions significantly increased reaction yields

Figure 2 Schematic representation of the three-step mechanism of the bacteriophage T4 RNA ligase (T4Rnl) reaction with the potential sideproducts Encircled Arabic numbers indicate the order of ligation steps Step 1 adenylation of the T4Rnl active site Step 2 donor 50-adenylation(side reaction circularization of 30-end-unprotected donor molecules) Step 3 phosphodiester bond formation between 30-hydroxylated (OH) acceptorand donor molecules (side reaction reverse adenylation of donor and circularization of acceptor molecules) PPi pyrophosphate p monophosphateAMP adenosine monophosphate (Ap) App 50-adenylated termini

Nucleic Acids Research 2013 5

(5064) Hence additives that increase the effective mo-lecular concentration have a major influence on productyields and meta-analyses of data generated by differentligases or even in different buffer compositions are error-proneAs an aside specific enrichment of 30-end 20-

O-methylated RNAs like the piwi-interacting RNAs(piRNAs) or plant miRNAs is achieved by periodatetreatment The reaction relies on vicinal hydroxyl groupsto oxidize sugar moieties and is accompanied by pentosering opening The resulting vicinal dialdehyde reacts withwater and forms a chemical equilibrium with thecorresponding dioxane derivative upon ring closure The20-O-methyl group-containing RNAs are insensitive tooxidation and remain accessible for adapter ligation ortailing (65ndash67)

Impact of RNA secondary structure and adapter-RNAcofolding on ligationRecent analysis indicates that RNA secondary structureimpacts sRNA-seq bias (29) During kinetic profiling ofRNAndashadapter ligation test miRNAs were trapped in non-reactive states Denaturing the remaining non-ligatedRNAs resulted in increased ligation yield indirectly sug-gesting the influence of RNA secondary structure (29) Tofurther analyze the impact of secondary structure on RNA30-end ligation a pool of substrate RNAs with 21randomized nucleotides at their 30-ends was ligated withindividual pre-adenylated DNA adapters and reactionyields were subsequently quantified by deep sequencing(33) To avoid bias caused by template-specific RNA50-end modification(s) all test RNAs contained identical50-termini (33) RNAs that harbored stems at their 30-endswere underrepresented in the sequencing data whereasRNAs containing a minimum of three unstructured nu-cleotides at the 30-ends were overrepresented The dataindicate that RNA ligase-mediated reactions in generalare more efficacious with RNA substrates that harborstructurally accessible 30-ends (33)In addition the cofolding of RNA and adapter mol-

ecules impacts ligation (33) Adapter sequence variationthat interfered with the RNA 30-stem formation andpermitted intermolecular base-pairing increased ligationyields (33) Interestingly this data contradicts prioranalysis emphasizing a predominant primary sequence-related influence on data distortion (3363) Althoughthe actual source of discrepancy remains to be analyzedprotocol-specific variation probably due to changes in de-naturation is likely to have caused the observed difference(see supplementary information) Irrespective of the actualreason the advice for minimizing this bias is identical andrelies on the application of adapter pools to increase thelikelihood of productive interactions (33) Thus inaddition to the RNA nucleotide composition RNAndashadapter cofolding and secondary structure-related influ-ences contribute to sRNA-seq bias

RNA 50-end ligationThe chemical properties of RNA 50-ends display structuralvariations Depending on the starting material and experi-mental design it may be necessary to alter the chemistry

of RNA 50-termini prior to cDNA generation Forexample neither 50-triphosphorylated RNAs in prokary-otes nor primary eukaryotic RNA polymerase III tran-scripts are accessible for direct 5-adapter ligationSimilarly RNA polymerase II-specific trimethylguanosinecap structures render RNA 50-ends inaccessible Thusenzymatic pretreatment is usually conducted to avoidbias in the starting material (Figure 1 SupplementaryData and Supplementary Table S1)

Specific biochemistry of RNA 50-termini and enrichmentof various RNA classesAlthough the various chemical properties of RNA 50-endsestablish limitations for unbiased representation they alsopermit specific enrichments (41) (Figure 1 SupplementaryData and Supplementary Table S1) RNAs that containmonophosphorylated 50-ends and 30-hydroxyl groups aredirectly accessible to cDNA generation proceduresConsequently unaltered RNA starting material leadsto its specific enrichment (Figure 1 SupplementaryTable S1) In contrast triphosphorylated or 50-cappedRNAs require TAP treatment to generate 50-monophosphorylated RNAs and to permit 50-end ligation

Biochemical alternatives to specifically cleave pyrophos-phate bonds in 50-triphophorylated RNAs were reportedrecently (68) whereby TAP-mediated hydrolysis issubstituted by polyphosphatase (69) or pyrophospho-hydrolase (7071) treatment In vitro analysis to investigatethe substrate specificity of E coli rppH pyrophospho-hydrolase indicated that pyrophosphate hydrolysisproceeds with identical efficacy irrespective of the RNA50-terminal nucleotide (71) Furthermore a subtle declinein the enzymatic activity in response to potentially inhibi-tory RNA 50-terminal stem loop structures was reported(71) These strategies are particularly appealing when bac-terial primary transcripts or nascent eukaryotic RNA poly-merase III transcripts are central to the investigation Inaddition to targeting triphophorylated RNA 50-ends poly-phosphatase also accepts diphophorylated RNA 50-terminito catalyze pyrophosphate bond cleavage It should beemphasized that 50-capped RNA polymerase II transcriptsare not amenable to polyphosphatase or pyrophospho-hydrolase treatment and therefore escape identification

The application of a TerminatorTM 50-phosphate-dependent exonuclease is frequently utilized to enrich for50-capped and 50-triphospharylated RNAs (417273)The inherent 50 30 exonuclease activity digests50-monophosphorylated RNAs and leaves other tran-scripts unaltered (Figure 1 Supplementary Data andSupplementary Table S1) Therefore different chemistriesof RNA 50-ends not only distort sRNA-seq but also allowfor specific enrichment protocols Subsequent analysispermits the distinction among RNA subsets such asprimary transcripts capped RNAs and other productsof RNA 50-end processing (417273)

Nucleotide preference of RNA 50 ligationApart from specific differences in RNA 50-ends the influ-ence of the adapter sequence itself on ligation efficacy iswell documented Biochemical data indicated that ligationreactions proceeded most effectively when an adenosine

6 Nucleic Acids Research 2013

residue occupied the 30-terminal position of adapter(acceptor) molecules (5174) Recently adapter pools dis-playing sequence identity at all except the two 30-terminalnucleotides were used to investigate sequence-dependentvariation in acceptor function Based on deep sequencingas the readout the relative ranking of miRNA frequenciesvaried as a consequence of the specific adapterndashRNA com-binations (63) Once more miRNA expression profiles arenot only a function of true RNA abundance but alsoreflect varying ligation efficacies (63) Interestingly theuse of adapter pools highly improved the correlation ofmiRNA levels between replicates (63)

Influence of RNA secondary structure on RNA 50-endligationTo investigate potential influences on 50-end modificationderived from RNA secondary structure ligation yieldswere compared among test sets of fully randomizedRNAs A remarkable underrepresentation of double-strand folds at 50-ligation sites indicated strong preferencesfor single-stranded RNA substrates in ligation reactionsconducted with T4Rnl1 (75)

Sugar moiety-dependent change in ligation efficacyDNA-based adapter oligonucleotides performwell in RNA30-end ligation reactions (especially when 50-adenylated)whereas RNA-based adapters are preferred for RNA50-end modification (76) Kinetic analysis of DNA-basedadapter molecules for RNA 50-ligation demonstratedslower reactions compared with their RNA counterparts(77) Recent analyses revealed more complex substrate-specific effects between test RNAs and different adaptersWhen DNA- and RNA-based adapters of identical se-quences were investigated in ligation assays no generalpreference with regard to the sugar moiety of the adaptermolecule was observed in RNA 50-end ligation (3963) Inconclusion to avoid bias adapter pools are recommendedto display sequence variation The chance of productiveligation might be enhanced using RNA DNA or mixedadapter oligonucleotides (3963)

Adapter barcoding influences ligationIn addition to nucleotide preferences of the reactingresidues upstream sequences also exert influence onligation yields Adapters harboring different Roche MIDtags placed 19 nucleotides upstream from the reacting30-ends performed entirely differently when assayed withidentical test RNAs (39) Depending on the combinationof in vitro transcribed non-protein coding RNAs andMID-modified adapters ligation product levels (calculatedas non-protein coding RNA input) ranged from 60ndash70to undetectable (39) Recent reports emphasized thatbarcoding for multiplex sequencing leads to significantbias in miRNA expression profiles (78ndash80) This bias wasdiminished when barcodes were introduced by PCRamplification of cDNA (78ndash81)

Quantitative considerations and barcodingIn general the discovery of novel RNAs and analyses oftranscripts that are expressed at very low levels requiresubstantially more cDNA reads than experiments

directed toward investigating differential gene expressionsHence the design of sRNA-seq experiments needs totake into account the rationale of the investigationContigs formed by continuously overlapping readsduring genomic mapping procedures represent thebio-computational equivalent of individual RNAentities Any selected read during cDNA mapping eithercontributes to pre-existing already populated RNAs ormight generate novel contigs The probability of eitherprocess is a function of RNA size as well as the corres-ponding RNA expression value Of course contigs repre-senting longer rather than shorter RNAs are more likelyto be populated by chance RNAs of higher relative ex-pression levels are generally more densely packed withcDNA reads than RNAs that display medium or low tran-scription levels Consequently if multiplexing is chosenfor economic reasons the necessary sequencing depthmust be approximated prior to sequence runs in order togenerate adequate sample sizes The two fixed parametersfor estimating admissible multiplexing are therefore (i) ex-perimental goal and (ii) lane capacity According toIllumina TruSeq Small RNA Sample Prep Kitguidance 5 million sRNA-seq reads are considered appro-priate for novel miRNA discovery within human-sizedgenomes For miRNA profiling usually about 25 millionreads are recommended These approximations permit re-calculation geared toward specific needs and requirements

Alternative cloning strategies to avoid 50-end modificationBecause the 30-terminal position of cDNA corresponds tothe RNA 50-ends Pak et al (82) used a technique to avoidmodification of RNA 50-termini and therefore to minimizethe need for enzymatic pretreatment The strategy intro-duces adapter sequences not to RNA 50-termini butrather to 30-ends of the corresponding cDNAs (82) Theadvantage of omitting enzymatic treatment of RNA50-ends might be offset by incomplete reverse transcriptionthat precludes full-length cDNA cloning or appropriateassessment of RNA 50-endsRecently ligation to cDNA 30-ends was investigated in

more detail (36) Perfect double-stranded DNA-RNA testhybrids and single-stranded DNAs were utilized to mimicpotential products of reverse transcription All T4 DNA-and RNA-ligases including T4Rnl2 truncated derivativeswere compared to delineate optimal conditions for cDNA30-end modification Surprisingly ligation reactionscarried out with pre-adenylated DNA donor moleculesstrongly indicated that double-stranded DNA-RNAchimera constituted the most reactive substrates forT4Rnl1-mediated ligation In addition supplements ofPEG 8000 strikingly influenced ligation final concentra-tions of 25 PEG 8000 increased reaction yields Identicalsingle-stranded test cDNAs exhibited only poor substratecharacteristics The A-form-like conformation of theDNA-RNA hybrid might have compensated for theprimary structure-related influences and permitted moresimilar ligation outcomes among substrates (36)However the analysis examined only one test substrateand therefore might not be amenable to drawing generalconclusions (36) A further interesting observation indi-cates that cDNA 50-terminal nucleotides influence cDNA

Nucleic Acids Research 2013 7

30-end ligation yields (39) As the cDNA 50-terminalsequence is a consequence of specific RNA 30-end modifi-cation differences between protocols exert different influ-ences on cDNA 30-end adapter ligation and in turn mightalso cause expression profile bias (39)The lsquoSmartTM Approachrsquo (83) offers another interesting

alternative to capture mature RNA 50-ends and to avoidRNA 50-end adapter ligation The technology is based onthe template switching capacity of Moloney murineleukemia virus (M-MLV) reverse transcriptase Oncefirst strand synthesis approaches the RNA 50-end theterminal transferase activity of M-MLV reverse transcript-ase adds oligo(C)-stretches to the growing cDNAOligonucleotides that 30-terminate with oligo(G) stretchesserve to extend templates and permit reverse transcriptionto proceed Thus M-MLV reverse transcriptase enablesthe generation of cDNAs with known terminal sequenceelements (83) and thereby eliminates the need to modifythe RNA 50-ends Recently the technology was adapted tothe specific needs of Ilumina RNA-seq analysis (84)

Adapterndashadapter ligation disturbs sRNA-seqA side effect common to sRNA-seq protocols that rely onligase-mediated sequential RNA end-modification is PCRamplification of empty adapterndashadapter ligation products(8185) Adapterndashdimer formation is a direct consequenceof the molecular adapter excess employed to ensure suffi-cient product yield and might dominate in reactions wherethe concentration of total RNA starting material is low(8186) Adapter molecules that are 50-pre-adenylatedremain in reaction mixtures after RNA 30-end modifica-tion and react with RNA 50-adapter oligonucleotideswhich subsequently favors dimer formation Thereforepurifying products away from excess adapters via gelelectrophoresis is recommended (78) However for prep-aration of tiny RNAs such as miRNAs separation of thesmall size difference between ligated product and adaptersis not trivial Alternative approaches to avoid size frac-tionation rely on application of complementary lsquodimer-eliminator-LNA-oligonucleotidesrsquo (8586) The interferingoligonucleotide is designed to overlap the adapterndashadapterligation junction and thereby distinguish between targetedside product and adapter ligated RNA (86) The comple-mentarity of the interfering oligonucleotides extends intothe RNA 30-adapter sequence thus blocking primerannealing during reverse transcription and as a resultexcludes adapter dimers from RNA sequencing (86)Vigneault et al reported a similar strategy to overcomeunintended dimer amplification (81) Pre-annealing of theRT primer after RNA 30-adapter ligation but prior toRNA 50-end modification efficiently decreased adapterdimer formation The excess of non-ligated free RNA30-adapters was captured by complementary reverse tran-scriptase primers and sequestered in less-reactive double-stranded structures (81) The latter approach permitseffective reduction of dimer amplification and is reportedto increase cDNA library complexity (81) ObviouslyRNA tailing strategies (see above) are superior becausethey completely circumvent the problem of adapter-dimer formation To avoid adapter tailing as an analogous

side reaction to dimer formation it is advisable to carryout 30-tailing prior to 50-adapter ligation

Depending on the experimental goal sRNA-seqrequires varying amounts of cDNA reads (see above)Therefore techniques to reduce the amount of readsthat represent empty adapter dimers are of central import-ance as they permit an increase in sequencing depthreducing costs and ultimately enabling novel smallnon-protein coding RNA gene discovery

PCR AMPLIFICATION BIAS

Apart from novel third-generation sequencingtechnologies that permit direct sequence analysis of un-amplified cDNA all commonly used protocols implementPCR amplification (87) In addition the introduction ofbarcode identifiers by PCR amplification may also avoidbias introduced by differences in RNAndashadapter-specificligation efficacies (78) Therefore multiplexing of differentsamples during PCR is widely regarded to permit meta-analysis (78) However even PCR amplification itself isknown to introduce expression bias into datasets(303288) In particular when many different templatesare amplified in parallel variations not only in templatelength but also in base composition might lead topreferred template-specific amplifications (30) Extremeexamples of allelic dropout due to single base-exchangewere reported for PCR-mediated genotyping (89) TheRNA GC-content is especially responsible for data dis-tortion Generally cDNAs of high GC-content are lessreadily amplified and thus remain underrepresented Forexample read distributions calculated over the entire 28SrRNA were inversely correlated with the correspondingregional GC-content (32) As argued earlier base com-positions of different small non-protein coding RNAs(miRNAs piRNAs snoRNAs scaRNAs etc) varywidely The parallel amplification of complex templatemixtures that vary in their respective GC-content is there-fore a challenging procedure and likely to influence thecomplexity of the resulting cDNA library (30) Changesin complexity do not only influence distortion of quanti-tative measurements but certain RNAs might also lsquodropoutrsquo completely and remain unidentified Extended periodsof denaturation are considered to effectively decrease thelikelihood of uneven amplification (90) PCR amplifica-tion does not only change the overall GC-content of theresulting cDNA pool but is also reported to introducelength bias (30) which might be an important factorwhen RNAs of different sizes are subjected to parallelcDNA preparations Shorter sequences of significantlylower GC-content are likely to be overrepresented

Recent analysis also emphasized the impact of PCRbuffer compositions on PCR amplification bias (30)Remarkably PCR-mediated bias was already observedin some cases within 15 rounds of PCR amplification(30) While the popular Phusion PCR system surprisinglyperformed the worst (30) data indicate that someoptimized PCR buffers are capable of significantlyreducing PCR-mediated distortion (3090)

8 Nucleic Acids Research 2013

sRNA-SEQ AND PERSONALIZED MEDICINE

In recent years increasing attention has focused on thepotential of miRNA transcriptional profiling to diagnosedisease or to monitor progression and success in treatment(161791) The decreasing costs of high-throughputsequence analysis favors sRNA-seq as a common toolfor medical diagnostics and personalized medicine Aftercell death the total RNA of circulating cancer cells isreleased into the blood stream (92) These extracellularRNAs (exRNAs) are packaged within vesicles such asexosomes where they are complexed with differentproteins for example miRNAs in the RISC complexVesicles protect exRNA from ribonucleases and preventdegradation Among the various exRNAs miRNAs aresome of the most stable molecules (93) Easy accessibilityof patient blood samples and the stability of miRNAsmake them effective biomarkers (94) Diagnostics basedon miRNA were successfully conducted on biologicalsamples of low quality including dried blood (95) Inaddition miRNAs remain stable even when samples aresubjected to repeated rounds of tissue freezing andthawing which usually degrades RNAs (96) A furtherappealing feature of miRNAs in medical research is thatchanges in their expression profiles are observed duringdisease onset (97) In addition miRNAs often displaytissue-specific expression Therefore it might be possibleto identify the cellular origin of most metastases (91)possibly providing decision guidance for appropriatetreatment However as elaborated in detail miRNA ex-pression profiles detected by deep sequencing do not onlyreflect changes in relative expression but are alsoinfluenced by miRNA-adapter-specific ligation efficaciesespecially in cost-effective parallel analyses that oftenrequire multiplexing Different barcode introduced byligation generate bias in miRNA representation and abun-dance (78) Even when incorporated by PCR barcodingmight still lead to data distortion (30ndash32) Third-gener-ation sequencing techniques such as the Helicos (InNovember 2012 Helicos filed bankruptcy (httpbizyahoocome121115hlcs8-khtml)) system avoid PCRamplification steps in cDNA preparation and permitdirect sequencing of unamplified samples (87) The RNA30-end modification to prime cDNA synthesis (or directsequencing) in addition to relatively high error frequencies(15) might still lead to bias and so would not beespecially applicable in personalized medicine (98)Collectively deep sequencing to monitor change inmedical samples is of importance and certainly ofinterest but requires resolution of serious standardizationissues prior to general application

qRT-PCR a valid alternative to deep sequencing

Alternative techniques to quantify specific miRNA testsets rely on quantitative real-time reverse transcriptionPCR (qRT-PCR) (99) In particular diagnostic screensin cancer medicine rely on the application of qRT-PCR(99ndash101) In cases where biomarker disease correlation iswell established and when quantification of limitednumbers of biomarkers is sufficient for diagnosis and tomonitor disease progression or treatment qRT-PCR is

a valuable tool (99) Recently the convenience ofqRT-PCR-based medical diagnosis was suggested tomake it superior in routine clinical applications (99)Interestingly novel developments in qRT-PCR allowsample processing in 384-well formats and liken thismethod to cost-effective high-throughput applications(99102) A further advantage of qRT-PCR is the reduc-tion in downstream bio-computation (99103) The timebetween qRT-PCR and diagnostic result is shorter thanfor sRNA-seq or microarray-based inference Howeverthe analysis of miRNA expression profiles by qRT-PCRis technically challenging due to RNA-inherent pitfalls(103104)The miRNAs lack structural features such as mRNA

poly(A) tails that could serve as class-specific enrichmentor priming of reverse transcription (103) Alternative tech-niques for miRNA sample enrichment would rely onimmune-precipitation but most of them are not suitablefor typical small size samples in medical analysis (105) Asmentioned above the heterogeneity of the RNA GC-content translates into a wide range of primer annealingtemperatures for a population of miRNAs and mighttherefore complicate parallel analysis of multiple RNAs(103) Furthermore the qRT-PCR template is presentnot only within the mature miRNA molecule but also ispart of various precursor transcripts (103) It is thereforeimportant to ensure that primary or precursor-miRNAs(pri- or pre-miRNAs) do not provide templates inqRT-PCR amplification and as a consequence do notcontribute to the signal intensities detected Difficultiesthat relate to the similarity displayed between differentmiRNA isoforms are mentioned above within thecontext of array-based inference to quantify differentialgene expressionThus far two alternative approaches are commonly

utilized to reverse transcribe miRNA templates(103106) miRNA-specific stem-loop primers are oftenused when only limited numbers of RNAs are subject toquantification A major advantage of using miRNA-specific primers for reverse transcription is an increase insensitivity and specificity However RNA 30-end-tailing-based approaches enable bulk reverse transcription ofmultiple targets in parallel and are therefore advantageouswhen many miRNAs are analyzed (see above) A recentcomparative analysis to investigate reverse transcriptionefficacies of the two methods uncovered only minor dif-ferences (106) The analysis involved only a limitedmiRNA test set and therefore the results cannot begeneralized As an aside the oligo d(T)-based designmight also enable reverse transcription of miRNA andthe corresponding mRNA target in single test tube reac-tions (106) Ligation-mediated PCR or padlock probes incombination with rolling circle amplification are technicalalternatives for avoiding reverse transcription (107108)The methods rely on ligation-mediated template gener-ation and are therefore not devoid of bias (see above)Both approaches permit generation of miRNA signaturesacross samples and do not require elaborate analyticalpost-processing or expensive technical equipment(107108) However as elaborated here and abovereverse transcription is not devoid of template-specific

Nucleic Acids Research 2013 9

bias In particular RNA secondary structure and RNAmodifications both influence reverse transcription efficacy(see above) Similar to PCRs reverse transcription is nega-tively affected by increasing RNA GC-contentBecause the relative RNA GC-content dictates primer

Tm qRT-PCRs are also strongly influenced by forwardprimer design thereby affecting both qRT-PCR specificityand sensitivity Altering the length of primer-templatecomplementarity permits adjustment of the actualprimer Tm Conversely in cases where RNA nucleotidecomposition would dictate very low melting temperaturesLNA-modified nucleotides might be incorporated into theprimer sequence to increase the corresponding primer Tm(18) However the design of LNA-containing oligonucleo-tides is not always trivial (109) For the actual measure-ment of miRNA in real time two alternative fluorescentreporter molecules are frequently utilized SYBR Greenand the Taqman assay (103) Detection of the fluores-cence signal emitted by SYBR Green increases about100-fold as the molecule intercalates in double-strandedDNA However the mechanism of DNA intercalationsuggests the dye is not target-specific and thereforecannot discriminate primer dimers unspecific productsand target amplification (103) Because the intercalatingdye is simply added to the assay it is often possible toeasily adapt the qRT-PCR assay to already establishedand validated analytical PCRs (104) In comparison theTaqman assay provides increased specificity as the de-tection is based on sequence-specific probendashtarget inter-action Certainly the qRT-PCR assays offer manyadvantages but require evaluation of various factors thatinfluence and potentially bias analysis Thereforeinternational standardization to ensure comparable ex-perimental design and to permit meta-analysis amongdatasets is of prime importance for establishing clinicalqRT-PCR-based personal medicine The MinimumInformation for Publication of Quantitative Real-TimePCR experiments guidelines established standards for bet-ter experimental practice and suggest routes to data ana-lysis that increase reliability and data comparison (110)

mRNA-SEQ

Deep sequencing is not limited to expression profiling ofmiRNA or other small non-protein coding RNAs Evenentire transcriptomes including all protein coding andlong non-protein coding RNAs are subjected to high-throughput sequencing methods Currently as the lengthof single sequence runs is limited (from 100 to 500 nt de-pending on the sequencing platform) longer RNAsrequire fragmentation prior to cDNA synthesis (25)Millions of small reads resulting from mRNA-seq areanalyzed to identify novel transcripts or splice variants(111112) Relative quantification of reads enables one tomonitor differential gene expression among samples andto identify changes in isoform distribution (111) As withsRNA-seq the digital nature of RNA deep sequencing isthought to simplify meta-analysis of mRNA or long non-protein coding RNA expression data (25) Some protocolsfor modification of RNA fragment ends rely on

chemistries identical to sRNA-seq (84113) Therefore ex-pression biases observed in sRNA-seq analysis are alsolikely to cause distortion in whole transcriptome profilingVariations in nucleotide composition and changes in RNAsecondary structure of different RNA fragments mightcorrelate with uneven read distribution Unequal readcoverage generates uncertainty in isoform detection andin extreme cases exons representing unfavorable nucleo-tide compositions might escape analysis As fragmentationis suggested to generate many different but at least par-tially overlapping reads bias in mRNA-seq might be lessprominent Recently a novel technique based on circligasewas introduced to avoid RNA 50-end ligation (114)Reverse transcription is conducted with anchoredoligo(dT) primers that contain an additional sequence topermit later PCR amplification Following first strandcDNA synthesis circligase-mediated ligation is carriedout The resulting circular molecules are subjected toPCR amplification The strategy is of general interestand might be a valuable tool for sRNA-seq as well

CONCLUSION

Bias in general describes systematic errors that reflectmethod-related distortion from the truth Bias might bedetected easily when data generated by different methodsare subjected to analytical comparison Because it is sys-tematic in nature bias is not subject to variation inrepeated experiments Here based on biochemical anddeep sequencing data analyses we have presentedvarious steps in sRNA-seq protocols likely to causesevere distortions in the relative expression levels of indi-vidual sRNAs

The efficacy of RNA end-modification to permit cDNAconstruction depends not only on differences in strategiesand reagents but also on changes in among others buffercompositions and additives that could increase molecularcrowding Therefore meta-analysis of data generated withnon-identical methods reagents or even buffers is likely tobe error-prone (36396378) To achieve comparableresults and to evaluate relative changes in the quantifica-tion of the resulting data experiments and analyses shouldbe conducted under strictly identical conditions It has tobe stressed that bias in RNA-specific variation is not onlyrestricted to inter-sample comparison but also effects therelative ranking of intra-sample RNA expression

In addition to the more obvious quantitative aspects ofbias there are also qualitative aspects because RNA-seqmethods are also employed for RNA discovery IndividualRNAs might lsquodrop outrsquo from detection because they arenot amenable to cDNA construction under chosen condi-tions Certainly to increase the likelihood of productiveRNA end-modification and increase the complexity ofsRNA-seq library content the parallel application of dif-ferent strategies and reagents is required (33396379) Inparticular increased variability of adapter sequences (ieadapter pools) helps to increase the diversity of RNAsaccessed (63) A recent report emphasized the need forparallel application of different RNA- and DNA-basedadapter oligonucleotides (39)

10 Nucleic Acids Research 2013

The source of expression level distortion and the failureto detect certain RNA species at all appear not to belimited to reactions of RNAcDNA-end modificationbut also apply to cDNA amplification by PCR (30ndash32)Differences in reaction buffers RNA GC-content sec-ondary structures RNA length and primers might eachlead to bias during PCR Therefore the analysis ofsRNA-seq data to examine relative changes in geneexpression or identification reflects both true RNA abun-dance and biases related to the methods applied

The sRNA-seq approach is also attractive to medicalresearch Deep sequencing might permit the screening ofmultiple patient samples to monitor the progression ofdisease or treatment Therefore it is urgently necessaryto establish rigid international standardizations tominimize distortion and to avoid misleading conclusionsduring RNA-seq data interpretation In summary quali-tative and quantitative RNomics remain more challengingthan anticipated

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online

FUNDING

National Genome Research Network [NGFNIII01GS0808 to JB and TSR] the Deutsche Forschungs-gemeinschaft [BR 7544-1 to JB] USM ResearchUniversity Grant [1001CIPPT81305 to THT]Funding for open access charge The licence was fundedby Dr Brosiusrsquos research budget

Conflict of interest statement None declared

REFERENCES

1 BrosiusJ (2009) The fragmented gene Ann N Y Acad Sci1178 186ndash193

2 BrosiusJ and TiedgeH (2004) RNomenclature RNA Biol 181ndash83

3 LiuQ and ParooZ (2010) Biochemical principles of small RNApathways Annu Rev Biochem 79 295ndash319

4 BrosiusJ (2012) In ErdmannVA and BarciszewskiJ (eds)From Nucleic Acids Sequences to Molecular Medicine RNATechnologies Springer Berlin Heidelberg pp 1ndash18

5 HuttenhoferA BrosiusJ and BachellerieJP (2002) RNomicsidentification and function of small non-messenger RNAs CurrOpin Chem Biol 6 835ndash843

6 AnS and SongJJ (2011) The coded functions of noncodingRNAs for gene regulation Mol Cells 31 491ndash496

7 BushatiN and CohenSM (2007) microRNA functions AnnuRev Cell Dev Biol 23 175ndash205

8 ChiangHR SchoenfeldLW RubyJG AuyeungVCSpiesN BaekD JohnstonWK RussC LuoS BabiarzJEet al (2010) Mammalian microRNAs experimental evaluation ofnovel and previously annotated genes Genes Dev 24 992ndash1009

9 AmbrosV (2001) microRNAs tiny regulators with greatpotential Cell 107 823ndash826

10 LauNC LimLP WeinsteinEG and BartelDP (2001) Anabundant class of tiny RNAs with probable regulatory roles inCaenorhabditis elegans Science 294 858ndash862

11 RuvkunG (2001) Molecular biology Glimpses of a tiny RNAworld Science 294 797ndash799

12 RuvkunGB (2003) The tiny RNA world Harvey Lect 991ndash21

13 GurtanAM and SharpPA (2013) The Role of miRNAs inRegulating Gene Expression Networks J Mol Biol 4253582ndash3600

14 LiangH ZhangJ ZenK ZhangCY and ChenX (2013)Nuclear microRNAs and their unconventional role in regulatingnon-coding RNAs Protein Cell 4 325ndash330

15 HuangV and LiLC (2012) miRNA goes nuclear RNA Biol 9269ndash273

16 WangZ HanJ CuiY FanK and ZhouX (2013) CirculatingmicroRNA-21 as noninvasive predictive biomarker for response incancer immunotherapy Med Hypotheses 81 41ndash43

17 ZhaoC ZhangJ ZhangS YuD ChenY LiuQ ShiMNiC and ZhuM (2013) Diagnostic and biological significance ofmicroRNA-192 in pancreatic ductal adenocarcinoma Oncol Rep30 276ndash284

18 van RooijE (2011) The art of microRNA research Circ Res108 219ndash234

19 LeeLW ZhangS EtheridgeA MaL MartinD GalasDand WangK (2010) Complexity of the microRNA repertoirerevealed by next-generation sequencing RNA 16 2170ndash2180

20 OkoniewskiMJ and MillerCJ (2006) Hybridization interactionsbetween probesets in short oligo microarrays lead to spuriouscorrelations BMC Bioinformatics 7 276

21 MortazaviA WilliamsBA McCueK SchaefferL andWoldB (2008) Mapping and quantifying mammaliantranscriptomes by RNA-Seq Nat Methods 5 621ndash628

22 LiuS LinL JiangP WangD and XingY (2011) Acomparison of RNA-Seq and high-density exon array fordetecting differential gene expression between closely relatedspecies Nucleic Acids Res 39 578ndash588

23 RaghavachariN BarbJ YangY LiuP WoodhouseKLevyD OrsquoDonnellC MunsonPJ and KatoG (2012) Asystematic comparison and evaluation of high density exon arraysand RNA-seq technology used to unravel the peripheral bloodtranscriptome of sickle cell disease BMC Med Genomics 5 28

24 MartinJA and WangZ (2011) Next-generation transcriptomeassembly Nat Rev Genet 12 671ndash682

25 WangZ GersteinM and SnyderM (2009) RNA-Seq arevolutionary tool for transcriptomics Nat Rev Genet 1057ndash63

26 KozomaraA and Griffiths-JonesS (2011) miRBase integratingmicroRNA annotation and deep-sequencing data Nucleic AcidsRes 39 D152ndashD157

27 LinsenSE de WitE JanssensG HeaterS ChapmanLParkinRK FritzB WymanSK de BruijnE VoestEE et al(2009) Limitations and possibilities of small RNA digital geneexpression profiling Nat Methods 6 474ndash476

28 BakerM (2010) MicroRNA profiling separating signal fromnoise Nat Methods 7 687ndash692

29 HafnerM RenwickN BrownM MihailovicA HolochDLinC PenaJT NusbaumJD MorozovP LudwigJ et al(2011) RNA-ligase-dependent biases in miRNA representationin deep-sequenced small RNA cDNA libraries RNA 171697ndash1712

30 DabneyJ and MeyerM (2012) Length and GC-biases duringsequencing library amplification a comparison of variouspolymerase-buffer systems with ancient and modern DNAsequencing libraries Biotechniques 52 87ndash94

31 OrpanaAK HoTH and StenmanJ (2012) Multiple heatpulses during PCR extension enabling amplification of GC-richsequences and reducing amplification bias Anal Chem 842081ndash2087

32 SendlerE JohnsonGD and KrawetzSA (2011) Local andglobal factors affecting RNA sequencing analysis Anal Biochem419 317ndash322

33 ZhuangF FuchsRT SunZ ZhengY and RobbGB (2012)Structural bias in T4 RNA ligase-mediated 30-adapter ligationNucleic Acids Res 40 e54

34 DeChiaraTM and BrosiusJ (1987) Neural BC1 RNA cDNAclones reveal nonrepetitive sequence content Proc Natl AcadSci USA 84 2624ndash2628

Nucleic Acids Research 2013 11

35 WickensM and KwakJE (2008) Molecular biology A tail talefor U Science 319 1344ndash1345

36 MunafoDB and RobbGB (2010) Optimization of enzymaticreaction conditions for generating representative pools of cDNAfrom small RNA RNA 16 2537ndash2552

37 SarkarN (1997) Polyadenylation of mRNA in prokaryotesAnnu Rev Biochem 66 173ndash197

38 Yehudai-ResheffS and SchusterG (2000) Characterization of theEcoli poly(A) polymerase nucleotide specificity RNA-bindingaffinities and RNA structure dependence Nucleic Acids Res 281139ndash1144

39 RaabeCA HoeCH RandauG BrosiusJ TangTH andRozhdestvenskyTS (2011) The rocks and shallows of deep RNAsequencing Examples in the Vibrio cholerae RNome RNA 171357ndash1366

40 RaabeCA SanchezCP RandauG RobeckT SkryabinBVChinniSV KubeM ReinhardtR NgGH ManickamRet al (2010) A global view of the nonprotein-codingtranscriptome in Plasmodium falciparum Nucleic Acids Res 38608ndash617

41 SharmaCM HoffmannS DarfeuilleF ReignierJ FindeissSSittkaA ChabasS ReicheK HackermullerJ ReinhardtRet al (2010) The primary transcriptome of the major humanpathogen Helicobacter pylori Nature 464 250ndash255

42 SharmaCM and VogelJ (2009) Experimental approaches forthe discovery and characterization of regulatory small RNACurr Opin Microbiol 12 536ndash546

43 YuB YangZ LiJ MinakhinaS YangM PadgettRWStewardR and ChenX (2005) Methylation as a crucial step inplant microRNA biogenesis Science 307 932ndash935

44 HouwingS KammingaLM BerezikovE CronemboldDGirardA van den ElstH FilippovDV BlaserH RazEMoensCB et al (2007) A role for Piwi and piRNAs in germcell maintenance and transposon silencing in Zebrafish Cell 12969ndash82

45 AckermannHW (2003) Bacteriophage observations andevolution Res Microbiol 154 245ndash251

46 RossmannMG MesyanzhinovVV ArisakaF andLeimanPG (2004) The bacteriophage T4 DNA injectionmachine Curr Opin Struct Biol 14 171ndash180

47 HoCK and ShumanS (2002) Bacteriophage T4 RNA ligase 2(gp241) exemplifies a family of RNA ligases found in allphylogenetic domains Proc Natl Acad Sci USA 9912709ndash12714

48 YinS HoCK and ShumanS (2003) Structure-function analysisof T4 RNA ligase 2 J Biol Chem 278 17601ndash17608

49 AravinA and TuschlT (2005) Identification and characterizationof small RNAs involved in RNA silencing FEBS Lett 5795830ndash5840

50 ViolletS FuchsRT MunafoDB ZhuangF and RobbGB(2011) T4 RNA ligase 2 truncated active site mutants improvedtools for RNA analysis BMC Biotechnol 11 72

51 UhlenbeckOC and GumportRI (1982) T4 RNA LigaseIn BoyerPD (ed) The Enzymes Nucleic Acids Part BAcademic Press Inc New York pp 31ndash58

52 HoCK WangLK LimaCD and ShumanS (2004) Structureand mechanism of RNA ligase Structure 12 327ndash339

53 El OmariK RenJ BirdLE BonaMK KlarmannGLeGriceSF and StammersDK (2006) Molecular architectureand ligand recognition determinants for T4 RNA ligase J BiolChem 281 1573ndash1579

54 StarkMR PleissJA DerasM ScaringeSA and RaderSD(2006) An RNA ligase-mediated method for the efficient creationof large synthetic RNAs RNA 12 2014ndash2019

55 CranstonJW SilberR MalathiVG and HurwitzJ (1974)Studies on ribonucleic acid ligase Characterization of anadenosine triphosphate-inorganic pyrophosphate exchangereaction and demonstration of an enzyme-adenylate complex withT4 bacteriophage-induced enzyme J Biol Chem 2497447ndash7456

56 HigginsNP GeballeAP SnopekTJ SuginoA andCozzarelliNR (1977) Bacteriophage T4 RNA ligase preparationof a physically homogeneous nuclease-free enzyme fromhyperproducing infected cells Nucleic Acids Res 4 3175ndash3186

57 PfefferS Lagos-QuintanaM and TuschlT (2005) Cloning ofsmall RNA molecules Curr Protoc Mol Biol Chapter 26 Unit26 24

58 BruceAG and UhlenbeckOC (1978) Reactions at the terminiof tRNA with T4 RNA ligase Nucleic Acids Res 5 3665ndash3677

59 VigneaultF SismourAM and ChurchGM (2008) EfficientmicroRNA capture and bar-coding via enzymatic oligonucleotideadenylation Nat Methods 5 777ndash779

60 LiuJM LivnyJ LawrenceMS KimballMD WaldorMKand CamilliA (2009) Experimental discovery of sRNAs in Vibriocholerae by direct cloning 5StRNA depletion and parallelsequencing Nucleic Acids Res 37 e46

61 KrugM and UhlenbeckOC (1982) Reversal of T4 RNA ligaseBiochemistry 21 1858ndash1864

62 HafnerM LandgrafP LudwigJ RiceA OjoT LinCHolochD LimC and TuschlT (2008) Identification ofmicroRNAs and other small regulatory RNAs using cDNAlibrary sequencing Methods 44 3ndash12

63 JayaprakashAD JabadoO BrownBD and SachidanandamR(2011) Identification and remediation of biases in the activity ofRNA ligases in small-RNA deep sequencing Nucleic Acids Res39 e141

64 YewCW and KumarSV (2012) Isolation and cloning ofmicroRNAs from recalcitrant plant tissues with small amounts oftotal RNA a step-by step approach Mol Biol Rep 391783ndash1790

65 KurthHM and MochizukiK (2009) 2rsquo-O-methylation stabilizesPiwi-associated small RNAs and ensures DNA elimination inTetrahymena RNA 15 675ndash685

66 EbhardtHA ThiEP WangMB and UnrauPJ (2005)Extensive 30 modification of plant small RNAs is modulated byhelper component-proteinase expression Proc Natl Acad SciUSA 102 13398ndash13403

67 WangJ CzechB CrunkA WallaceA MitrevaMHannonGJ and DavisRE (2011) Deep small RNA sequencingfrom the nematode Ascaris reveals conservation functionaldiversification and novel developmental profiles Genome Res 211462ndash1477

68 ZhuangF FuchsRT and RobbGB (2012) Small RNAexpression profiling by high-throughput sequencing implicationsof enzymatic manipulation J Nucleic Acids 2012 360358

69 FujimuraT and EstebanR (2012) Cap snatching of yeast L-Adouble-stranded RNA virus can operate in trans and requiresviral polymerase actively engaging in transcription J Biol Chem287 12797ndash12804

70 BessmanMJ WalshJD DunnCA SwaminathanJWeldonJE and ShenJ (2001) The gene ygdP associated withthe invasiveness of Escherichia coli K1 designates a Nudixhydrolase Orf176 active on adenosine (50)-pentaphospho-(50)-adenosine (Ap5A) J Biol Chem 276 37834ndash37838

71 DeanaA CelesnikH and BelascoJG (2008) The bacterialenzyme RppH triggers messenger RNA degradation by50 pyrophosphate removal Nature 451 355ndash358

72 BlewettN CollerJ and GoldstrohmA (2011) A quantitativeassay for measuring mRNA decapping by splinted ligation reversetranscription polymerase chain reaction qSL-RT-PCR RNA 17535ndash543

73 KrogerC DillonSC CameronAD PapenfortKSivasankaranSK HokampK ChaoY SittkaA HebrardMHandlerK et al (2012) The transcriptional landscape and smallRNAs of Salmonella enterica serovar Typhimurium Proc NatlAcad Sci USA 109 E1277ndashE1286

74 EnglandTE and UhlenbeckOC (1978) Enzymaticoligoribonucleotide synthesis with T4 RNA ligase Biochemistry17 2069ndash2076

75 SorefanK PaisH HallAE KozomaraA Griffiths-JonesSMoultonV and DalmayT (2012) Reducing ligation bias of smallRNAs in libraries for next generation sequencing Silence 3 4

76 ZhelkovskyAM and McReynoldsLA (2011) Simple andefficient synthesis of 50 pre-adenylated DNA using thermostableRNA ligase Nucleic Acids Res 39 e117

77 McCoyMI and GumportRI (1980) T4 ribonucleic acid ligasejoins single-strand oligo(deoxyribonucleotides) Biochemistry 19635ndash642

12 Nucleic Acids Research 2013

78 AlonS VigneaultF EminagaS ChristodoulouDCSeidmanJG ChurchGM and EisenbergE (2011) Barcodingbias in high-throughput multiplex sequencing of miRNA GenomeRes 21 1506ndash1511

79 SunG WuX WangJ LiH LiX GaoH RossiJ andYenY (2011) A bias-reducing strategy in profiling small RNAsusing Solexa RNA 17 2256ndash2262

80 Van NieuwerburghF SoetaertS PodshivalovaK Ay-LinWangE SchafferL DeforceD SalomonDR HeadSR andOrdoukhanianP (2011) Quantitative bias in Illumina TruSeq anda novel post amplification barcoding strategy for multiplexedDNA and small RNA deep sequencing PLoS One 6 e26969

81 VigneaultF Ter-OvanesyanD AlonS EminagaSC ChristodoulouD SeidmanJG EisenbergE and MChurchG (2012) High-throughput multiplex sequencingof miRNA Curr Protoc Hum Genet Chapter 11 Unit 1112 11-10

82 PakJ and FireA (2007) Distinct populations of primary andsecondary effectors during RNAi in C elegans Science 315241ndash244

83 ZhuYY MachlederEM ChenchikA LiR and SiebertPD(2001) Reverse transcriptase template switching a SMARTapproach for full-length cDNA library constructionBiotechniques 30 892ndash897

84 LevinJZ YassourM AdiconisX NusbaumCThompsonDA FriedmanN GnirkeA and RegevA (2010)Comprehensive comparative analysis of strand-specific RNAsequencing methods Nat Methods 7 709ndash715

85 AndoY BurroughsAM KawanoM de HoonMJ andHayashizakiY (2012) Targeted methods to improve small RNAprofiles generated by deep sequencing In MallickB andGhoshZ (eds) Regulatory RNAs Basics Methods andApplications Springer-Verlag Heidelberg Germany pp 253ndash271

86 KawanoM KawazuC LizioM KawajiH CarninciPSuzukiH and HayashizakiY (2010) Reduction of non-insertsequence reads by dimer eliminator LNA oligonucleotide forsmall RNA deep sequencing Biotechniques 49 751ndash755

87 KapranovP OzsolakF and MilosPM (2012) Profiling of shortRNAs using Helicos single-molecule sequencing Methods MolBiol 822 219ndash232

88 KanagawaT (2003) Bias and artifacts in multitemplatepolymerase chain reactions (PCR) J Biosci Bioeng 96 317ndash323

89 DayDJ SpeiserPW SchulzeE BettendorfM FitnessJBaranyF and WhitePC (1996) Identification of non-amplifyingCYP21 genes when using PCR-based diagnosis of 21-hydroxylasedeficiency in congenital adrenal hyperplasia (CAH) affectedpedigrees Hum Mol Genet 5 2039ndash2048

90 AirdD RossMG ChenWS DanielssonM FennellTRussC JaffeDB NusbaumC and GnirkeA (2011) Analyzingand minimizing PCR amplification bias in Illumina sequencinglibraries Genome Biol 12 R18

91 UllahMF and AatifM (2009) The footprints of cancerdevelopment cancer biomarkers Cancer Treat Rev 35 193ndash200

92 MoMH ChenL FuY WangW and FuSW (2012) Cell-freeCirculating miRNA Biomarkers in Cancer J Cancer 3 432ndash448

93 ZenK and ZhangCY (2012) Circulating microRNAs a novelclass of biomarkers to diagnose and monitor human cancersMed Res Rev 32 326ndash348

94 KimT and ReitmairA (2013) Non-Coding RNAs FunctionalAspects and Diagnostic Utility in Oncology Int J Mol Sci 144934ndash4968

95 PatnaikSK MallickR and YendamuriS (2010) Detection ofmicroRNAs in dried serum blots Anal Biochem 407 147ndash149

96 MitchellPS ParkinRK KrohEM FritzBR WymanSKPogosova-AgadjanyanEL PetersonA NoteboomJOrsquoBriantKC AllenA et al (2008) Circulating microRNAs as

stable blood-based markers for cancer detection Proc NatlAcad Sci USA 105 10513ndash10518

97 CieslaM SkrzypekK KozakowskaM LobodaAJozkowiczA and DulakJ (2011) MicroRNAs as biomarkers ofdisease onset Anal Bioanal Chem 401 2051ndash2061

98 OzsolakF (2012) Third-generation sequencing techniques andapplications to drug discovery Expert Opin Drug Discov 7231ndash243

99 TsongalisGJ CalinG CordelierP CroceC MonzonF andSzafranska-SchwarzbachAE (2013) MicroRNA analysis is itready for prime time Clin Chem 59 343ndash347

100 AmiraN Cancel-TassinG BernardiniS Cochand-PriolletBBittardH ManginP FournierG LatilA and CussenotO(2004) Expression in bladder transitional cell carcinoma by real-time quantitative reverse transcription polymerase chain reactionarray of 65 genes at the tumor suppressor locus 9q341-2identification of 5 candidates tumor suppressor genes Int JCancer 111 539ndash542

101 ZhangX (2007) Biomarker validation movement towardspersonalized medicine Expert Rev Mol Diagn 7 469ndash471

102 SchmittgenTD LeeEJ and JiangJ (2008) High-throughputreal-time PCR Methods Mol Biol 429 89ndash98

103 BenesV and CastoldiM (2010) Expression profiling ofmicroRNA using real-time quantitative PCR how to use it andwhat is available Methods 50 244ndash249

104 BustinSA and NolanT (2004) Pitfalls of quantitative real-timereverse-transcription polymerase chain reaction J Biomol Tech15 155ndash166

105 BeitzingerM and MeisterG (2011) Experimental identificationof microRNA targets by immunoprecipitation of Argonauteprotein complexes Methods Mol Biol 732 153ndash167

106 FiedlerSD CarlettiMZ and ChristensonLK (2010)Quantitative RT-PCR methods for mature microRNA expressionanalysis Methods Mol Biol 630 49ndash64

107 ArefianE KianiJ SoleimaniM ShariatiSA Aghaee-BakhtiariSH AtashiA GheisariY AhmadbeigiN Banaei-MoghaddamAM NaderiM et al (2011) Analysis ofmicroRNA signatures using size-coded ligation-mediated PCRNucleic Acids Res 39 e80

108 JonstrupSP KochJ and KjemsJ (2006) A microRNAdetection system based on padlock probes and rolling circleamplification RNA 12 1747ndash1752

109 LatorraD ArarK and HurleyJM (2003) Designconsiderations and effects of LNA in PCR primers Mol CellProbes 17 253ndash259

110 BustinSA BenesV GarsonJA HellemansJ HuggettJKubistaM MuellerR NolanT PfafflMW ShipleyGLet al (2009) The MIQE guidelines minimum information forpublication of quantitative real-time PCR experiments ClinChem 55 611ndash622

111 TrapnellC WilliamsBA PerteaG MortazaviA KwanGvan BarenMJ SalzbergSL WoldBJ and PachterL (2010)Transcript assembly and quantification by RNA-Seq revealsunannotated transcripts and isoform switching during celldifferentiation Nat Biotechnol 28 511ndash515

112 TrapnellC PachterL and SalzbergSL (2009) TopHatdiscovering splice junctions with RNA-Seq Bioinformatics 251105ndash1111

113 ListerR OrsquoMalleyRC Tonti-FilippiniJ GregoryBDBerryCC MillarAH and EckerJR (2008) Highly integratedsingle-base resolution maps of the epigenome in ArabidopsisCell 133 523ndash536

114 PolidorosAN PasentsisK and TsaftarisAS (2006) Rollingcircle amplification-RACE a method for simultaneous isolationof 50 and 30 cDNA ends from amplified cDNA templatesBiotechniques 41 35ndash42

Nucleic Acids Research 2013 13

Page 4: SURVEY AND SUMMARY Biases in small RNA deep sequencing data

RNA primary structure influences RNA 30-end tailingRecently to examine the influence of primary structureon RNA tailing efficacy three in vitro-transcribed testRNAs were assayed in comparative oligo(A)- andoligo(C)-tailing reactions (39) Interestingly after RNAdenaturation all test RNAs irrespective of primaryRNA structure performed equally well in the C-tailingassays and proceeded up to 95 completion as judgedby shifts in mobility (calculated as percent input) (39) Toexamine the influences of the 30-terminal nucleotide onRNA A-tailing RNAs identical except for the last nucleo-tide were examined in oligo(A)-tailing assays and up to2-fold changes in the efficacy of RNA modification werereported (36) The analysis also revealed that uridine atthe RNA 30-ends was the least preferred nucleotideHowever reaction kinetics indicated that even an RNAmolecule with an unfavorable 30-terminal uridine wasalmost completely modified if the reactions proceededlonger (36) Therefore increased incubation periodsmight compensate for substrate-specific variation inRNA 30-tailing efficacies Hence the available dataindicate that primary structure-related influences are lesslikely to bias sRNA-seq data if the protocol relies onRNA 30-end tailing However recent observations sug-gested that the RNA 50-end phosphorylation state influ-ences the resulting 30-end homopolymer tail length ofotherwise identical test RNAs (39)

RNA 30-end modification impacts tailing efficienciesThe 20-O-methylation of RNA 30-ends is also reported toreduce the efficacy of RNA tailing reactions (36) Theefficacy of oligo(A) addition can drop by as much as80 and even after extended incubation periods certainRNAs remain virtually unmodified (Figure 1) Therefore20-O-methylation at RNA 30-ends renders RNAs lessaccessible to RNA tailing (36) and such RNAs wouldeventually appear underrepresented in sRNA-seqNotably most plant miRNAs and vertebrate piRNAsare reported to be substrates for 20-O-methylation(4344) The drop in tailing efficacy and the correspondingdistortion in expression level might be particularlyrelevant when such RNAs are subjected to RNA tailingprocedures

RNA ligation

Today RNA ligase-catalyzed reactions are most oftenutilized to modify RNA termini for cDNA constructionThe genome of bacteriophage T4 harbors two differentRNA ligases T4Rnl1 and T4Rnl2 T4Rnl1 compensatesfor host defense mechanisms by sealing breaks introducedinto the anticodon loop of host tRNALys during infection(4546) T4Rnl2 has been detected more recentlyAlthough phylogenetically related to various classes ofDNA and RNA ligases the biological function ofT4Rnl2 remains unknown (334748) Both RNA ligasesas well as derivatives of T4Rnl2 are frequently utilized incDNA library construction In particular the truncatedversion of RNA ligase 2 has been reported to minimizeside reactions (4950) Variations in the compositions of

substrates and reagents during ligation are likely to intro-duce biases

RNA ligation-the reaction pathwayRNA ligation leads to ATP-dependent 30-50 phospho-diester bond formation Two molecular features aredistinguished (51) donor molecules that provide50-monophosphorylated RNA termini (5253) and ac-ceptors that contain 30-hydroxyl functional groups(52ndash54) (Figure 2) Two reaction intermediates areisolated suggesting a three-step reaction mechanismDuring the initial step of ligation accumulation ofmonoadenylated RNA ligase is observed (5556) Theenzyme-catalyzed adenosyl-transfer to the RNA ligaserequires phosphoramide bond formation and isaccompanied by pyrophosphate release (Figure 2) In thesecond step the AMP moiety is transferred to a50-monophosphorylated RNA donor molecule to form a50-adenylated RNA intermediate (50-AppRNA) (5253)(Figure 2) The 50-50 anhydride linkage is required toactivate donor molecules for the subsequent sealingreaction Finally an acceptor RNA 30-end hydroxylgroup (30-OH) attacks the alpha phosphorus of the 50-AppRNA donor molecule leading to covalent strandsealing and AMP release (5253) (Figure 2) The factthat pre-adenylated donors perform efficiently in reactionsconducted without ATP supplement provides formalproof of the reaction pathway and also is of major import-ance to practical applications (4957) The 50-App-modified adapters are now the reagents of choice tominimize side reactions (294957)

Side products in ligation reactionsA side reaction common to RNA ligation is the intramo-lecular RNA circularization (Figures 1 and 2) (29) Thereaction pathway is identical to the intermolecular ligationdiscussed above and is generally preferred due to closeproximity of the RNA ends Thus reactions that permitintra- and intermolecular products are dominated byintramolecular circularization (5158) It is of major im-portance to minimize side products in RNA ligation tomaintain reaction yields Generally the application of50-pre-adenylated adapter molecules in RNA 30-end liga-tions is presumed to abolish RNA circularization (5960)However the reversal of donor adenylation as a source ofcircularization was recently reported (293961) (Figure 2)In reactions performed with pre-adenylated donor mol-ecules 50-monophosphorylated RNAs were circularizedeven in the absence of ATP (2939) Remarkably forsome test RNAs the circularization constituted themajor product (39) Therefore preferred circularizationmight render certain RNAs inaccessible to cDNA con-struction The yield of circularization also depends onthe type of RNA-ligase enzyme utilized T4Rnl1-catalyzedreactions display the highest level of circularization (29)The tendency of T4Rnl1 to increase the amount of circularproduct might be explained in part by its biologicalfunction in intramolecular sealing reactions A C-termin-ally truncated version of T4Rnl2 displays minimal adenyl-transfer activity while maintaining the ability to form30 50 phosphodiester bonds (52) Ligase-mediated

4 Nucleic Acids Research 2013

RNA circularization depends on AMP removal from thepre-adenylated 50-App-adapter and its subsequent transferto 50-monophosphorylated RNAs therefore the applica-tion of truncated T4Rnl2 (T4Rnl2tr) to minimize this sidereaction is highly suggested (4950) In addition theexchange of lysine with glutamine (K227Q) in T4Rnl2trfurther reduced adenyl-transfer competence but retainedactivity to catalyze phosphodiester bond formation (48)Therefore the application of T4Rnl2 derivatives is bene-ficial for decreasing bias in sRNA-seq analysis (29)

Alternatively phosphatase pretreatment to generateRNA 50-OH termini that are inert to ligase-catalyzedadenylation helps to minimize side reactions The proced-ure ultimately requires the application of polynucleotidekinase to generate monophosphorylated RNA endsHowever tRNA rRNA and mRNA processing inter-mediates are reported to contain 50-hydroxylated termini(2962) Polynucleotide kinase treatment as a necessaryprerequisite to RNA 50-adapter ligation leads to the inclu-sion of abundant processing intermediates in cDNAlibraries The alternative techniques to avoid RNA circu-larization might distort cDNA representation and aretherefore likely to reduce the complexity of sRNA-seqexperiments (2962)

30-end ligation depends on RNAndashadapter combinationRNA 30-end ligation using T4Rnl1 was recently reportedas a potential source of bias in deep sRNA sequencing(39) Comparisons of the product yields of ligation reac-tions were made using three test RNAs and variousadapter sequences Depending on the specific RNAndashadapter combination the products differed remarkably

Similarly Jayaprakash et al (63) reported variations inproduct yields depending on specific RNAndashadapter com-binations in Rnl2tr-catalyzed reactions Because the RNA30-end ligation experiments were conducted with 50 pre-adenylated adapters the results also indicate thatadenyl-transfer is less likely the cause of the observedbias (63) RNA 30-adapter ligation was enhanced whenadapter pools displaying nucleotide variation at the two50 terminal positions were used (63) Therefore increasedvariability at adapter 50-ends enhances RNA ligation yieldand in general helps to minimize distortion in sRNA-seq(333963) RNA 50-phosphorylation also influences theefficacy of adapter ligation to RNA 30-ends (39) Theligation efficacies of T4 Rnl1-mediated reactions werecompared for test RNAs featuring different phosphoryl-ation states at RNA 50-ends and different adaptersincluding those with 50-App-modification UnexpectedlyRNAs that performed well when monophosphorylateddisplayed weak acceptor characteristics whentriphosphorylated Therefore the efficacy of RNA 30-endmodification might also be a function of the RNA 50-phos-phorylation state (Figure 1)

Ligation of adapters to modified RNAsIn ligation assays T4Rnl1 and T4Rnl2tr performedalmost identically using 30-unmodified test RNA sub-strates However the efficacy of T4Rnl1-catalyzed reac-tions dropped remarkably when 20-O-methylated RNAswere used (36) In contrast T4Rnl2tr performed almostequally well irrespective of RNA modification (36) Theadditions of 25 (wv) PEG 8000 to the T4Rnl2tr-mediated reactions significantly increased reaction yields

Figure 2 Schematic representation of the three-step mechanism of the bacteriophage T4 RNA ligase (T4Rnl) reaction with the potential sideproducts Encircled Arabic numbers indicate the order of ligation steps Step 1 adenylation of the T4Rnl active site Step 2 donor 50-adenylation(side reaction circularization of 30-end-unprotected donor molecules) Step 3 phosphodiester bond formation between 30-hydroxylated (OH) acceptorand donor molecules (side reaction reverse adenylation of donor and circularization of acceptor molecules) PPi pyrophosphate p monophosphateAMP adenosine monophosphate (Ap) App 50-adenylated termini

Nucleic Acids Research 2013 5

(5064) Hence additives that increase the effective mo-lecular concentration have a major influence on productyields and meta-analyses of data generated by differentligases or even in different buffer compositions are error-proneAs an aside specific enrichment of 30-end 20-

O-methylated RNAs like the piwi-interacting RNAs(piRNAs) or plant miRNAs is achieved by periodatetreatment The reaction relies on vicinal hydroxyl groupsto oxidize sugar moieties and is accompanied by pentosering opening The resulting vicinal dialdehyde reacts withwater and forms a chemical equilibrium with thecorresponding dioxane derivative upon ring closure The20-O-methyl group-containing RNAs are insensitive tooxidation and remain accessible for adapter ligation ortailing (65ndash67)

Impact of RNA secondary structure and adapter-RNAcofolding on ligationRecent analysis indicates that RNA secondary structureimpacts sRNA-seq bias (29) During kinetic profiling ofRNAndashadapter ligation test miRNAs were trapped in non-reactive states Denaturing the remaining non-ligatedRNAs resulted in increased ligation yield indirectly sug-gesting the influence of RNA secondary structure (29) Tofurther analyze the impact of secondary structure on RNA30-end ligation a pool of substrate RNAs with 21randomized nucleotides at their 30-ends was ligated withindividual pre-adenylated DNA adapters and reactionyields were subsequently quantified by deep sequencing(33) To avoid bias caused by template-specific RNA50-end modification(s) all test RNAs contained identical50-termini (33) RNAs that harbored stems at their 30-endswere underrepresented in the sequencing data whereasRNAs containing a minimum of three unstructured nu-cleotides at the 30-ends were overrepresented The dataindicate that RNA ligase-mediated reactions in generalare more efficacious with RNA substrates that harborstructurally accessible 30-ends (33)In addition the cofolding of RNA and adapter mol-

ecules impacts ligation (33) Adapter sequence variationthat interfered with the RNA 30-stem formation andpermitted intermolecular base-pairing increased ligationyields (33) Interestingly this data contradicts prioranalysis emphasizing a predominant primary sequence-related influence on data distortion (3363) Althoughthe actual source of discrepancy remains to be analyzedprotocol-specific variation probably due to changes in de-naturation is likely to have caused the observed difference(see supplementary information) Irrespective of the actualreason the advice for minimizing this bias is identical andrelies on the application of adapter pools to increase thelikelihood of productive interactions (33) Thus inaddition to the RNA nucleotide composition RNAndashadapter cofolding and secondary structure-related influ-ences contribute to sRNA-seq bias

RNA 50-end ligationThe chemical properties of RNA 50-ends display structuralvariations Depending on the starting material and experi-mental design it may be necessary to alter the chemistry

of RNA 50-termini prior to cDNA generation Forexample neither 50-triphosphorylated RNAs in prokary-otes nor primary eukaryotic RNA polymerase III tran-scripts are accessible for direct 5-adapter ligationSimilarly RNA polymerase II-specific trimethylguanosinecap structures render RNA 50-ends inaccessible Thusenzymatic pretreatment is usually conducted to avoidbias in the starting material (Figure 1 SupplementaryData and Supplementary Table S1)

Specific biochemistry of RNA 50-termini and enrichmentof various RNA classesAlthough the various chemical properties of RNA 50-endsestablish limitations for unbiased representation they alsopermit specific enrichments (41) (Figure 1 SupplementaryData and Supplementary Table S1) RNAs that containmonophosphorylated 50-ends and 30-hydroxyl groups aredirectly accessible to cDNA generation proceduresConsequently unaltered RNA starting material leadsto its specific enrichment (Figure 1 SupplementaryTable S1) In contrast triphosphorylated or 50-cappedRNAs require TAP treatment to generate 50-monophosphorylated RNAs and to permit 50-end ligation

Biochemical alternatives to specifically cleave pyrophos-phate bonds in 50-triphophorylated RNAs were reportedrecently (68) whereby TAP-mediated hydrolysis issubstituted by polyphosphatase (69) or pyrophospho-hydrolase (7071) treatment In vitro analysis to investigatethe substrate specificity of E coli rppH pyrophospho-hydrolase indicated that pyrophosphate hydrolysisproceeds with identical efficacy irrespective of the RNA50-terminal nucleotide (71) Furthermore a subtle declinein the enzymatic activity in response to potentially inhibi-tory RNA 50-terminal stem loop structures was reported(71) These strategies are particularly appealing when bac-terial primary transcripts or nascent eukaryotic RNA poly-merase III transcripts are central to the investigation Inaddition to targeting triphophorylated RNA 50-ends poly-phosphatase also accepts diphophorylated RNA 50-terminito catalyze pyrophosphate bond cleavage It should beemphasized that 50-capped RNA polymerase II transcriptsare not amenable to polyphosphatase or pyrophospho-hydrolase treatment and therefore escape identification

The application of a TerminatorTM 50-phosphate-dependent exonuclease is frequently utilized to enrich for50-capped and 50-triphospharylated RNAs (417273)The inherent 50 30 exonuclease activity digests50-monophosphorylated RNAs and leaves other tran-scripts unaltered (Figure 1 Supplementary Data andSupplementary Table S1) Therefore different chemistriesof RNA 50-ends not only distort sRNA-seq but also allowfor specific enrichment protocols Subsequent analysispermits the distinction among RNA subsets such asprimary transcripts capped RNAs and other productsof RNA 50-end processing (417273)

Nucleotide preference of RNA 50 ligationApart from specific differences in RNA 50-ends the influ-ence of the adapter sequence itself on ligation efficacy iswell documented Biochemical data indicated that ligationreactions proceeded most effectively when an adenosine

6 Nucleic Acids Research 2013

residue occupied the 30-terminal position of adapter(acceptor) molecules (5174) Recently adapter pools dis-playing sequence identity at all except the two 30-terminalnucleotides were used to investigate sequence-dependentvariation in acceptor function Based on deep sequencingas the readout the relative ranking of miRNA frequenciesvaried as a consequence of the specific adapterndashRNA com-binations (63) Once more miRNA expression profiles arenot only a function of true RNA abundance but alsoreflect varying ligation efficacies (63) Interestingly theuse of adapter pools highly improved the correlation ofmiRNA levels between replicates (63)

Influence of RNA secondary structure on RNA 50-endligationTo investigate potential influences on 50-end modificationderived from RNA secondary structure ligation yieldswere compared among test sets of fully randomizedRNAs A remarkable underrepresentation of double-strand folds at 50-ligation sites indicated strong preferencesfor single-stranded RNA substrates in ligation reactionsconducted with T4Rnl1 (75)

Sugar moiety-dependent change in ligation efficacyDNA-based adapter oligonucleotides performwell in RNA30-end ligation reactions (especially when 50-adenylated)whereas RNA-based adapters are preferred for RNA50-end modification (76) Kinetic analysis of DNA-basedadapter molecules for RNA 50-ligation demonstratedslower reactions compared with their RNA counterparts(77) Recent analyses revealed more complex substrate-specific effects between test RNAs and different adaptersWhen DNA- and RNA-based adapters of identical se-quences were investigated in ligation assays no generalpreference with regard to the sugar moiety of the adaptermolecule was observed in RNA 50-end ligation (3963) Inconclusion to avoid bias adapter pools are recommendedto display sequence variation The chance of productiveligation might be enhanced using RNA DNA or mixedadapter oligonucleotides (3963)

Adapter barcoding influences ligationIn addition to nucleotide preferences of the reactingresidues upstream sequences also exert influence onligation yields Adapters harboring different Roche MIDtags placed 19 nucleotides upstream from the reacting30-ends performed entirely differently when assayed withidentical test RNAs (39) Depending on the combinationof in vitro transcribed non-protein coding RNAs andMID-modified adapters ligation product levels (calculatedas non-protein coding RNA input) ranged from 60ndash70to undetectable (39) Recent reports emphasized thatbarcoding for multiplex sequencing leads to significantbias in miRNA expression profiles (78ndash80) This bias wasdiminished when barcodes were introduced by PCRamplification of cDNA (78ndash81)

Quantitative considerations and barcodingIn general the discovery of novel RNAs and analyses oftranscripts that are expressed at very low levels requiresubstantially more cDNA reads than experiments

directed toward investigating differential gene expressionsHence the design of sRNA-seq experiments needs totake into account the rationale of the investigationContigs formed by continuously overlapping readsduring genomic mapping procedures represent thebio-computational equivalent of individual RNAentities Any selected read during cDNA mapping eithercontributes to pre-existing already populated RNAs ormight generate novel contigs The probability of eitherprocess is a function of RNA size as well as the corres-ponding RNA expression value Of course contigs repre-senting longer rather than shorter RNAs are more likelyto be populated by chance RNAs of higher relative ex-pression levels are generally more densely packed withcDNA reads than RNAs that display medium or low tran-scription levels Consequently if multiplexing is chosenfor economic reasons the necessary sequencing depthmust be approximated prior to sequence runs in order togenerate adequate sample sizes The two fixed parametersfor estimating admissible multiplexing are therefore (i) ex-perimental goal and (ii) lane capacity According toIllumina TruSeq Small RNA Sample Prep Kitguidance 5 million sRNA-seq reads are considered appro-priate for novel miRNA discovery within human-sizedgenomes For miRNA profiling usually about 25 millionreads are recommended These approximations permit re-calculation geared toward specific needs and requirements

Alternative cloning strategies to avoid 50-end modificationBecause the 30-terminal position of cDNA corresponds tothe RNA 50-ends Pak et al (82) used a technique to avoidmodification of RNA 50-termini and therefore to minimizethe need for enzymatic pretreatment The strategy intro-duces adapter sequences not to RNA 50-termini butrather to 30-ends of the corresponding cDNAs (82) Theadvantage of omitting enzymatic treatment of RNA50-ends might be offset by incomplete reverse transcriptionthat precludes full-length cDNA cloning or appropriateassessment of RNA 50-endsRecently ligation to cDNA 30-ends was investigated in

more detail (36) Perfect double-stranded DNA-RNA testhybrids and single-stranded DNAs were utilized to mimicpotential products of reverse transcription All T4 DNA-and RNA-ligases including T4Rnl2 truncated derivativeswere compared to delineate optimal conditions for cDNA30-end modification Surprisingly ligation reactionscarried out with pre-adenylated DNA donor moleculesstrongly indicated that double-stranded DNA-RNAchimera constituted the most reactive substrates forT4Rnl1-mediated ligation In addition supplements ofPEG 8000 strikingly influenced ligation final concentra-tions of 25 PEG 8000 increased reaction yields Identicalsingle-stranded test cDNAs exhibited only poor substratecharacteristics The A-form-like conformation of theDNA-RNA hybrid might have compensated for theprimary structure-related influences and permitted moresimilar ligation outcomes among substrates (36)However the analysis examined only one test substrateand therefore might not be amenable to drawing generalconclusions (36) A further interesting observation indi-cates that cDNA 50-terminal nucleotides influence cDNA

Nucleic Acids Research 2013 7

30-end ligation yields (39) As the cDNA 50-terminalsequence is a consequence of specific RNA 30-end modifi-cation differences between protocols exert different influ-ences on cDNA 30-end adapter ligation and in turn mightalso cause expression profile bias (39)The lsquoSmartTM Approachrsquo (83) offers another interesting

alternative to capture mature RNA 50-ends and to avoidRNA 50-end adapter ligation The technology is based onthe template switching capacity of Moloney murineleukemia virus (M-MLV) reverse transcriptase Oncefirst strand synthesis approaches the RNA 50-end theterminal transferase activity of M-MLV reverse transcript-ase adds oligo(C)-stretches to the growing cDNAOligonucleotides that 30-terminate with oligo(G) stretchesserve to extend templates and permit reverse transcriptionto proceed Thus M-MLV reverse transcriptase enablesthe generation of cDNAs with known terminal sequenceelements (83) and thereby eliminates the need to modifythe RNA 50-ends Recently the technology was adapted tothe specific needs of Ilumina RNA-seq analysis (84)

Adapterndashadapter ligation disturbs sRNA-seqA side effect common to sRNA-seq protocols that rely onligase-mediated sequential RNA end-modification is PCRamplification of empty adapterndashadapter ligation products(8185) Adapterndashdimer formation is a direct consequenceof the molecular adapter excess employed to ensure suffi-cient product yield and might dominate in reactions wherethe concentration of total RNA starting material is low(8186) Adapter molecules that are 50-pre-adenylatedremain in reaction mixtures after RNA 30-end modifica-tion and react with RNA 50-adapter oligonucleotideswhich subsequently favors dimer formation Thereforepurifying products away from excess adapters via gelelectrophoresis is recommended (78) However for prep-aration of tiny RNAs such as miRNAs separation of thesmall size difference between ligated product and adaptersis not trivial Alternative approaches to avoid size frac-tionation rely on application of complementary lsquodimer-eliminator-LNA-oligonucleotidesrsquo (8586) The interferingoligonucleotide is designed to overlap the adapterndashadapterligation junction and thereby distinguish between targetedside product and adapter ligated RNA (86) The comple-mentarity of the interfering oligonucleotides extends intothe RNA 30-adapter sequence thus blocking primerannealing during reverse transcription and as a resultexcludes adapter dimers from RNA sequencing (86)Vigneault et al reported a similar strategy to overcomeunintended dimer amplification (81) Pre-annealing of theRT primer after RNA 30-adapter ligation but prior toRNA 50-end modification efficiently decreased adapterdimer formation The excess of non-ligated free RNA30-adapters was captured by complementary reverse tran-scriptase primers and sequestered in less-reactive double-stranded structures (81) The latter approach permitseffective reduction of dimer amplification and is reportedto increase cDNA library complexity (81) ObviouslyRNA tailing strategies (see above) are superior becausethey completely circumvent the problem of adapter-dimer formation To avoid adapter tailing as an analogous

side reaction to dimer formation it is advisable to carryout 30-tailing prior to 50-adapter ligation

Depending on the experimental goal sRNA-seqrequires varying amounts of cDNA reads (see above)Therefore techniques to reduce the amount of readsthat represent empty adapter dimers are of central import-ance as they permit an increase in sequencing depthreducing costs and ultimately enabling novel smallnon-protein coding RNA gene discovery

PCR AMPLIFICATION BIAS

Apart from novel third-generation sequencingtechnologies that permit direct sequence analysis of un-amplified cDNA all commonly used protocols implementPCR amplification (87) In addition the introduction ofbarcode identifiers by PCR amplification may also avoidbias introduced by differences in RNAndashadapter-specificligation efficacies (78) Therefore multiplexing of differentsamples during PCR is widely regarded to permit meta-analysis (78) However even PCR amplification itself isknown to introduce expression bias into datasets(303288) In particular when many different templatesare amplified in parallel variations not only in templatelength but also in base composition might lead topreferred template-specific amplifications (30) Extremeexamples of allelic dropout due to single base-exchangewere reported for PCR-mediated genotyping (89) TheRNA GC-content is especially responsible for data dis-tortion Generally cDNAs of high GC-content are lessreadily amplified and thus remain underrepresented Forexample read distributions calculated over the entire 28SrRNA were inversely correlated with the correspondingregional GC-content (32) As argued earlier base com-positions of different small non-protein coding RNAs(miRNAs piRNAs snoRNAs scaRNAs etc) varywidely The parallel amplification of complex templatemixtures that vary in their respective GC-content is there-fore a challenging procedure and likely to influence thecomplexity of the resulting cDNA library (30) Changesin complexity do not only influence distortion of quanti-tative measurements but certain RNAs might also lsquodropoutrsquo completely and remain unidentified Extended periodsof denaturation are considered to effectively decrease thelikelihood of uneven amplification (90) PCR amplifica-tion does not only change the overall GC-content of theresulting cDNA pool but is also reported to introducelength bias (30) which might be an important factorwhen RNAs of different sizes are subjected to parallelcDNA preparations Shorter sequences of significantlylower GC-content are likely to be overrepresented

Recent analysis also emphasized the impact of PCRbuffer compositions on PCR amplification bias (30)Remarkably PCR-mediated bias was already observedin some cases within 15 rounds of PCR amplification(30) While the popular Phusion PCR system surprisinglyperformed the worst (30) data indicate that someoptimized PCR buffers are capable of significantlyreducing PCR-mediated distortion (3090)

8 Nucleic Acids Research 2013

sRNA-SEQ AND PERSONALIZED MEDICINE

In recent years increasing attention has focused on thepotential of miRNA transcriptional profiling to diagnosedisease or to monitor progression and success in treatment(161791) The decreasing costs of high-throughputsequence analysis favors sRNA-seq as a common toolfor medical diagnostics and personalized medicine Aftercell death the total RNA of circulating cancer cells isreleased into the blood stream (92) These extracellularRNAs (exRNAs) are packaged within vesicles such asexosomes where they are complexed with differentproteins for example miRNAs in the RISC complexVesicles protect exRNA from ribonucleases and preventdegradation Among the various exRNAs miRNAs aresome of the most stable molecules (93) Easy accessibilityof patient blood samples and the stability of miRNAsmake them effective biomarkers (94) Diagnostics basedon miRNA were successfully conducted on biologicalsamples of low quality including dried blood (95) Inaddition miRNAs remain stable even when samples aresubjected to repeated rounds of tissue freezing andthawing which usually degrades RNAs (96) A furtherappealing feature of miRNAs in medical research is thatchanges in their expression profiles are observed duringdisease onset (97) In addition miRNAs often displaytissue-specific expression Therefore it might be possibleto identify the cellular origin of most metastases (91)possibly providing decision guidance for appropriatetreatment However as elaborated in detail miRNA ex-pression profiles detected by deep sequencing do not onlyreflect changes in relative expression but are alsoinfluenced by miRNA-adapter-specific ligation efficaciesespecially in cost-effective parallel analyses that oftenrequire multiplexing Different barcode introduced byligation generate bias in miRNA representation and abun-dance (78) Even when incorporated by PCR barcodingmight still lead to data distortion (30ndash32) Third-gener-ation sequencing techniques such as the Helicos (InNovember 2012 Helicos filed bankruptcy (httpbizyahoocome121115hlcs8-khtml)) system avoid PCRamplification steps in cDNA preparation and permitdirect sequencing of unamplified samples (87) The RNA30-end modification to prime cDNA synthesis (or directsequencing) in addition to relatively high error frequencies(15) might still lead to bias and so would not beespecially applicable in personalized medicine (98)Collectively deep sequencing to monitor change inmedical samples is of importance and certainly ofinterest but requires resolution of serious standardizationissues prior to general application

qRT-PCR a valid alternative to deep sequencing

Alternative techniques to quantify specific miRNA testsets rely on quantitative real-time reverse transcriptionPCR (qRT-PCR) (99) In particular diagnostic screensin cancer medicine rely on the application of qRT-PCR(99ndash101) In cases where biomarker disease correlation iswell established and when quantification of limitednumbers of biomarkers is sufficient for diagnosis and tomonitor disease progression or treatment qRT-PCR is

a valuable tool (99) Recently the convenience ofqRT-PCR-based medical diagnosis was suggested tomake it superior in routine clinical applications (99)Interestingly novel developments in qRT-PCR allowsample processing in 384-well formats and liken thismethod to cost-effective high-throughput applications(99102) A further advantage of qRT-PCR is the reduc-tion in downstream bio-computation (99103) The timebetween qRT-PCR and diagnostic result is shorter thanfor sRNA-seq or microarray-based inference Howeverthe analysis of miRNA expression profiles by qRT-PCRis technically challenging due to RNA-inherent pitfalls(103104)The miRNAs lack structural features such as mRNA

poly(A) tails that could serve as class-specific enrichmentor priming of reverse transcription (103) Alternative tech-niques for miRNA sample enrichment would rely onimmune-precipitation but most of them are not suitablefor typical small size samples in medical analysis (105) Asmentioned above the heterogeneity of the RNA GC-content translates into a wide range of primer annealingtemperatures for a population of miRNAs and mighttherefore complicate parallel analysis of multiple RNAs(103) Furthermore the qRT-PCR template is presentnot only within the mature miRNA molecule but also ispart of various precursor transcripts (103) It is thereforeimportant to ensure that primary or precursor-miRNAs(pri- or pre-miRNAs) do not provide templates inqRT-PCR amplification and as a consequence do notcontribute to the signal intensities detected Difficultiesthat relate to the similarity displayed between differentmiRNA isoforms are mentioned above within thecontext of array-based inference to quantify differentialgene expressionThus far two alternative approaches are commonly

utilized to reverse transcribe miRNA templates(103106) miRNA-specific stem-loop primers are oftenused when only limited numbers of RNAs are subject toquantification A major advantage of using miRNA-specific primers for reverse transcription is an increase insensitivity and specificity However RNA 30-end-tailing-based approaches enable bulk reverse transcription ofmultiple targets in parallel and are therefore advantageouswhen many miRNAs are analyzed (see above) A recentcomparative analysis to investigate reverse transcriptionefficacies of the two methods uncovered only minor dif-ferences (106) The analysis involved only a limitedmiRNA test set and therefore the results cannot begeneralized As an aside the oligo d(T)-based designmight also enable reverse transcription of miRNA andthe corresponding mRNA target in single test tube reac-tions (106) Ligation-mediated PCR or padlock probes incombination with rolling circle amplification are technicalalternatives for avoiding reverse transcription (107108)The methods rely on ligation-mediated template gener-ation and are therefore not devoid of bias (see above)Both approaches permit generation of miRNA signaturesacross samples and do not require elaborate analyticalpost-processing or expensive technical equipment(107108) However as elaborated here and abovereverse transcription is not devoid of template-specific

Nucleic Acids Research 2013 9

bias In particular RNA secondary structure and RNAmodifications both influence reverse transcription efficacy(see above) Similar to PCRs reverse transcription is nega-tively affected by increasing RNA GC-contentBecause the relative RNA GC-content dictates primer

Tm qRT-PCRs are also strongly influenced by forwardprimer design thereby affecting both qRT-PCR specificityand sensitivity Altering the length of primer-templatecomplementarity permits adjustment of the actualprimer Tm Conversely in cases where RNA nucleotidecomposition would dictate very low melting temperaturesLNA-modified nucleotides might be incorporated into theprimer sequence to increase the corresponding primer Tm(18) However the design of LNA-containing oligonucleo-tides is not always trivial (109) For the actual measure-ment of miRNA in real time two alternative fluorescentreporter molecules are frequently utilized SYBR Greenand the Taqman assay (103) Detection of the fluores-cence signal emitted by SYBR Green increases about100-fold as the molecule intercalates in double-strandedDNA However the mechanism of DNA intercalationsuggests the dye is not target-specific and thereforecannot discriminate primer dimers unspecific productsand target amplification (103) Because the intercalatingdye is simply added to the assay it is often possible toeasily adapt the qRT-PCR assay to already establishedand validated analytical PCRs (104) In comparison theTaqman assay provides increased specificity as the de-tection is based on sequence-specific probendashtarget inter-action Certainly the qRT-PCR assays offer manyadvantages but require evaluation of various factors thatinfluence and potentially bias analysis Thereforeinternational standardization to ensure comparable ex-perimental design and to permit meta-analysis amongdatasets is of prime importance for establishing clinicalqRT-PCR-based personal medicine The MinimumInformation for Publication of Quantitative Real-TimePCR experiments guidelines established standards for bet-ter experimental practice and suggest routes to data ana-lysis that increase reliability and data comparison (110)

mRNA-SEQ

Deep sequencing is not limited to expression profiling ofmiRNA or other small non-protein coding RNAs Evenentire transcriptomes including all protein coding andlong non-protein coding RNAs are subjected to high-throughput sequencing methods Currently as the lengthof single sequence runs is limited (from 100 to 500 nt de-pending on the sequencing platform) longer RNAsrequire fragmentation prior to cDNA synthesis (25)Millions of small reads resulting from mRNA-seq areanalyzed to identify novel transcripts or splice variants(111112) Relative quantification of reads enables one tomonitor differential gene expression among samples andto identify changes in isoform distribution (111) As withsRNA-seq the digital nature of RNA deep sequencing isthought to simplify meta-analysis of mRNA or long non-protein coding RNA expression data (25) Some protocolsfor modification of RNA fragment ends rely on

chemistries identical to sRNA-seq (84113) Therefore ex-pression biases observed in sRNA-seq analysis are alsolikely to cause distortion in whole transcriptome profilingVariations in nucleotide composition and changes in RNAsecondary structure of different RNA fragments mightcorrelate with uneven read distribution Unequal readcoverage generates uncertainty in isoform detection andin extreme cases exons representing unfavorable nucleo-tide compositions might escape analysis As fragmentationis suggested to generate many different but at least par-tially overlapping reads bias in mRNA-seq might be lessprominent Recently a novel technique based on circligasewas introduced to avoid RNA 50-end ligation (114)Reverse transcription is conducted with anchoredoligo(dT) primers that contain an additional sequence topermit later PCR amplification Following first strandcDNA synthesis circligase-mediated ligation is carriedout The resulting circular molecules are subjected toPCR amplification The strategy is of general interestand might be a valuable tool for sRNA-seq as well

CONCLUSION

Bias in general describes systematic errors that reflectmethod-related distortion from the truth Bias might bedetected easily when data generated by different methodsare subjected to analytical comparison Because it is sys-tematic in nature bias is not subject to variation inrepeated experiments Here based on biochemical anddeep sequencing data analyses we have presentedvarious steps in sRNA-seq protocols likely to causesevere distortions in the relative expression levels of indi-vidual sRNAs

The efficacy of RNA end-modification to permit cDNAconstruction depends not only on differences in strategiesand reagents but also on changes in among others buffercompositions and additives that could increase molecularcrowding Therefore meta-analysis of data generated withnon-identical methods reagents or even buffers is likely tobe error-prone (36396378) To achieve comparableresults and to evaluate relative changes in the quantifica-tion of the resulting data experiments and analyses shouldbe conducted under strictly identical conditions It has tobe stressed that bias in RNA-specific variation is not onlyrestricted to inter-sample comparison but also effects therelative ranking of intra-sample RNA expression

In addition to the more obvious quantitative aspects ofbias there are also qualitative aspects because RNA-seqmethods are also employed for RNA discovery IndividualRNAs might lsquodrop outrsquo from detection because they arenot amenable to cDNA construction under chosen condi-tions Certainly to increase the likelihood of productiveRNA end-modification and increase the complexity ofsRNA-seq library content the parallel application of dif-ferent strategies and reagents is required (33396379) Inparticular increased variability of adapter sequences (ieadapter pools) helps to increase the diversity of RNAsaccessed (63) A recent report emphasized the need forparallel application of different RNA- and DNA-basedadapter oligonucleotides (39)

10 Nucleic Acids Research 2013

The source of expression level distortion and the failureto detect certain RNA species at all appear not to belimited to reactions of RNAcDNA-end modificationbut also apply to cDNA amplification by PCR (30ndash32)Differences in reaction buffers RNA GC-content sec-ondary structures RNA length and primers might eachlead to bias during PCR Therefore the analysis ofsRNA-seq data to examine relative changes in geneexpression or identification reflects both true RNA abun-dance and biases related to the methods applied

The sRNA-seq approach is also attractive to medicalresearch Deep sequencing might permit the screening ofmultiple patient samples to monitor the progression ofdisease or treatment Therefore it is urgently necessaryto establish rigid international standardizations tominimize distortion and to avoid misleading conclusionsduring RNA-seq data interpretation In summary quali-tative and quantitative RNomics remain more challengingthan anticipated

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online

FUNDING

National Genome Research Network [NGFNIII01GS0808 to JB and TSR] the Deutsche Forschungs-gemeinschaft [BR 7544-1 to JB] USM ResearchUniversity Grant [1001CIPPT81305 to THT]Funding for open access charge The licence was fundedby Dr Brosiusrsquos research budget

Conflict of interest statement None declared

REFERENCES

1 BrosiusJ (2009) The fragmented gene Ann N Y Acad Sci1178 186ndash193

2 BrosiusJ and TiedgeH (2004) RNomenclature RNA Biol 181ndash83

3 LiuQ and ParooZ (2010) Biochemical principles of small RNApathways Annu Rev Biochem 79 295ndash319

4 BrosiusJ (2012) In ErdmannVA and BarciszewskiJ (eds)From Nucleic Acids Sequences to Molecular Medicine RNATechnologies Springer Berlin Heidelberg pp 1ndash18

5 HuttenhoferA BrosiusJ and BachellerieJP (2002) RNomicsidentification and function of small non-messenger RNAs CurrOpin Chem Biol 6 835ndash843

6 AnS and SongJJ (2011) The coded functions of noncodingRNAs for gene regulation Mol Cells 31 491ndash496

7 BushatiN and CohenSM (2007) microRNA functions AnnuRev Cell Dev Biol 23 175ndash205

8 ChiangHR SchoenfeldLW RubyJG AuyeungVCSpiesN BaekD JohnstonWK RussC LuoS BabiarzJEet al (2010) Mammalian microRNAs experimental evaluation ofnovel and previously annotated genes Genes Dev 24 992ndash1009

9 AmbrosV (2001) microRNAs tiny regulators with greatpotential Cell 107 823ndash826

10 LauNC LimLP WeinsteinEG and BartelDP (2001) Anabundant class of tiny RNAs with probable regulatory roles inCaenorhabditis elegans Science 294 858ndash862

11 RuvkunG (2001) Molecular biology Glimpses of a tiny RNAworld Science 294 797ndash799

12 RuvkunGB (2003) The tiny RNA world Harvey Lect 991ndash21

13 GurtanAM and SharpPA (2013) The Role of miRNAs inRegulating Gene Expression Networks J Mol Biol 4253582ndash3600

14 LiangH ZhangJ ZenK ZhangCY and ChenX (2013)Nuclear microRNAs and their unconventional role in regulatingnon-coding RNAs Protein Cell 4 325ndash330

15 HuangV and LiLC (2012) miRNA goes nuclear RNA Biol 9269ndash273

16 WangZ HanJ CuiY FanK and ZhouX (2013) CirculatingmicroRNA-21 as noninvasive predictive biomarker for response incancer immunotherapy Med Hypotheses 81 41ndash43

17 ZhaoC ZhangJ ZhangS YuD ChenY LiuQ ShiMNiC and ZhuM (2013) Diagnostic and biological significance ofmicroRNA-192 in pancreatic ductal adenocarcinoma Oncol Rep30 276ndash284

18 van RooijE (2011) The art of microRNA research Circ Res108 219ndash234

19 LeeLW ZhangS EtheridgeA MaL MartinD GalasDand WangK (2010) Complexity of the microRNA repertoirerevealed by next-generation sequencing RNA 16 2170ndash2180

20 OkoniewskiMJ and MillerCJ (2006) Hybridization interactionsbetween probesets in short oligo microarrays lead to spuriouscorrelations BMC Bioinformatics 7 276

21 MortazaviA WilliamsBA McCueK SchaefferL andWoldB (2008) Mapping and quantifying mammaliantranscriptomes by RNA-Seq Nat Methods 5 621ndash628

22 LiuS LinL JiangP WangD and XingY (2011) Acomparison of RNA-Seq and high-density exon array fordetecting differential gene expression between closely relatedspecies Nucleic Acids Res 39 578ndash588

23 RaghavachariN BarbJ YangY LiuP WoodhouseKLevyD OrsquoDonnellC MunsonPJ and KatoG (2012) Asystematic comparison and evaluation of high density exon arraysand RNA-seq technology used to unravel the peripheral bloodtranscriptome of sickle cell disease BMC Med Genomics 5 28

24 MartinJA and WangZ (2011) Next-generation transcriptomeassembly Nat Rev Genet 12 671ndash682

25 WangZ GersteinM and SnyderM (2009) RNA-Seq arevolutionary tool for transcriptomics Nat Rev Genet 1057ndash63

26 KozomaraA and Griffiths-JonesS (2011) miRBase integratingmicroRNA annotation and deep-sequencing data Nucleic AcidsRes 39 D152ndashD157

27 LinsenSE de WitE JanssensG HeaterS ChapmanLParkinRK FritzB WymanSK de BruijnE VoestEE et al(2009) Limitations and possibilities of small RNA digital geneexpression profiling Nat Methods 6 474ndash476

28 BakerM (2010) MicroRNA profiling separating signal fromnoise Nat Methods 7 687ndash692

29 HafnerM RenwickN BrownM MihailovicA HolochDLinC PenaJT NusbaumJD MorozovP LudwigJ et al(2011) RNA-ligase-dependent biases in miRNA representationin deep-sequenced small RNA cDNA libraries RNA 171697ndash1712

30 DabneyJ and MeyerM (2012) Length and GC-biases duringsequencing library amplification a comparison of variouspolymerase-buffer systems with ancient and modern DNAsequencing libraries Biotechniques 52 87ndash94

31 OrpanaAK HoTH and StenmanJ (2012) Multiple heatpulses during PCR extension enabling amplification of GC-richsequences and reducing amplification bias Anal Chem 842081ndash2087

32 SendlerE JohnsonGD and KrawetzSA (2011) Local andglobal factors affecting RNA sequencing analysis Anal Biochem419 317ndash322

33 ZhuangF FuchsRT SunZ ZhengY and RobbGB (2012)Structural bias in T4 RNA ligase-mediated 30-adapter ligationNucleic Acids Res 40 e54

34 DeChiaraTM and BrosiusJ (1987) Neural BC1 RNA cDNAclones reveal nonrepetitive sequence content Proc Natl AcadSci USA 84 2624ndash2628

Nucleic Acids Research 2013 11

35 WickensM and KwakJE (2008) Molecular biology A tail talefor U Science 319 1344ndash1345

36 MunafoDB and RobbGB (2010) Optimization of enzymaticreaction conditions for generating representative pools of cDNAfrom small RNA RNA 16 2537ndash2552

37 SarkarN (1997) Polyadenylation of mRNA in prokaryotesAnnu Rev Biochem 66 173ndash197

38 Yehudai-ResheffS and SchusterG (2000) Characterization of theEcoli poly(A) polymerase nucleotide specificity RNA-bindingaffinities and RNA structure dependence Nucleic Acids Res 281139ndash1144

39 RaabeCA HoeCH RandauG BrosiusJ TangTH andRozhdestvenskyTS (2011) The rocks and shallows of deep RNAsequencing Examples in the Vibrio cholerae RNome RNA 171357ndash1366

40 RaabeCA SanchezCP RandauG RobeckT SkryabinBVChinniSV KubeM ReinhardtR NgGH ManickamRet al (2010) A global view of the nonprotein-codingtranscriptome in Plasmodium falciparum Nucleic Acids Res 38608ndash617

41 SharmaCM HoffmannS DarfeuilleF ReignierJ FindeissSSittkaA ChabasS ReicheK HackermullerJ ReinhardtRet al (2010) The primary transcriptome of the major humanpathogen Helicobacter pylori Nature 464 250ndash255

42 SharmaCM and VogelJ (2009) Experimental approaches forthe discovery and characterization of regulatory small RNACurr Opin Microbiol 12 536ndash546

43 YuB YangZ LiJ MinakhinaS YangM PadgettRWStewardR and ChenX (2005) Methylation as a crucial step inplant microRNA biogenesis Science 307 932ndash935

44 HouwingS KammingaLM BerezikovE CronemboldDGirardA van den ElstH FilippovDV BlaserH RazEMoensCB et al (2007) A role for Piwi and piRNAs in germcell maintenance and transposon silencing in Zebrafish Cell 12969ndash82

45 AckermannHW (2003) Bacteriophage observations andevolution Res Microbiol 154 245ndash251

46 RossmannMG MesyanzhinovVV ArisakaF andLeimanPG (2004) The bacteriophage T4 DNA injectionmachine Curr Opin Struct Biol 14 171ndash180

47 HoCK and ShumanS (2002) Bacteriophage T4 RNA ligase 2(gp241) exemplifies a family of RNA ligases found in allphylogenetic domains Proc Natl Acad Sci USA 9912709ndash12714

48 YinS HoCK and ShumanS (2003) Structure-function analysisof T4 RNA ligase 2 J Biol Chem 278 17601ndash17608

49 AravinA and TuschlT (2005) Identification and characterizationof small RNAs involved in RNA silencing FEBS Lett 5795830ndash5840

50 ViolletS FuchsRT MunafoDB ZhuangF and RobbGB(2011) T4 RNA ligase 2 truncated active site mutants improvedtools for RNA analysis BMC Biotechnol 11 72

51 UhlenbeckOC and GumportRI (1982) T4 RNA LigaseIn BoyerPD (ed) The Enzymes Nucleic Acids Part BAcademic Press Inc New York pp 31ndash58

52 HoCK WangLK LimaCD and ShumanS (2004) Structureand mechanism of RNA ligase Structure 12 327ndash339

53 El OmariK RenJ BirdLE BonaMK KlarmannGLeGriceSF and StammersDK (2006) Molecular architectureand ligand recognition determinants for T4 RNA ligase J BiolChem 281 1573ndash1579

54 StarkMR PleissJA DerasM ScaringeSA and RaderSD(2006) An RNA ligase-mediated method for the efficient creationof large synthetic RNAs RNA 12 2014ndash2019

55 CranstonJW SilberR MalathiVG and HurwitzJ (1974)Studies on ribonucleic acid ligase Characterization of anadenosine triphosphate-inorganic pyrophosphate exchangereaction and demonstration of an enzyme-adenylate complex withT4 bacteriophage-induced enzyme J Biol Chem 2497447ndash7456

56 HigginsNP GeballeAP SnopekTJ SuginoA andCozzarelliNR (1977) Bacteriophage T4 RNA ligase preparationof a physically homogeneous nuclease-free enzyme fromhyperproducing infected cells Nucleic Acids Res 4 3175ndash3186

57 PfefferS Lagos-QuintanaM and TuschlT (2005) Cloning ofsmall RNA molecules Curr Protoc Mol Biol Chapter 26 Unit26 24

58 BruceAG and UhlenbeckOC (1978) Reactions at the terminiof tRNA with T4 RNA ligase Nucleic Acids Res 5 3665ndash3677

59 VigneaultF SismourAM and ChurchGM (2008) EfficientmicroRNA capture and bar-coding via enzymatic oligonucleotideadenylation Nat Methods 5 777ndash779

60 LiuJM LivnyJ LawrenceMS KimballMD WaldorMKand CamilliA (2009) Experimental discovery of sRNAs in Vibriocholerae by direct cloning 5StRNA depletion and parallelsequencing Nucleic Acids Res 37 e46

61 KrugM and UhlenbeckOC (1982) Reversal of T4 RNA ligaseBiochemistry 21 1858ndash1864

62 HafnerM LandgrafP LudwigJ RiceA OjoT LinCHolochD LimC and TuschlT (2008) Identification ofmicroRNAs and other small regulatory RNAs using cDNAlibrary sequencing Methods 44 3ndash12

63 JayaprakashAD JabadoO BrownBD and SachidanandamR(2011) Identification and remediation of biases in the activity ofRNA ligases in small-RNA deep sequencing Nucleic Acids Res39 e141

64 YewCW and KumarSV (2012) Isolation and cloning ofmicroRNAs from recalcitrant plant tissues with small amounts oftotal RNA a step-by step approach Mol Biol Rep 391783ndash1790

65 KurthHM and MochizukiK (2009) 2rsquo-O-methylation stabilizesPiwi-associated small RNAs and ensures DNA elimination inTetrahymena RNA 15 675ndash685

66 EbhardtHA ThiEP WangMB and UnrauPJ (2005)Extensive 30 modification of plant small RNAs is modulated byhelper component-proteinase expression Proc Natl Acad SciUSA 102 13398ndash13403

67 WangJ CzechB CrunkA WallaceA MitrevaMHannonGJ and DavisRE (2011) Deep small RNA sequencingfrom the nematode Ascaris reveals conservation functionaldiversification and novel developmental profiles Genome Res 211462ndash1477

68 ZhuangF FuchsRT and RobbGB (2012) Small RNAexpression profiling by high-throughput sequencing implicationsof enzymatic manipulation J Nucleic Acids 2012 360358

69 FujimuraT and EstebanR (2012) Cap snatching of yeast L-Adouble-stranded RNA virus can operate in trans and requiresviral polymerase actively engaging in transcription J Biol Chem287 12797ndash12804

70 BessmanMJ WalshJD DunnCA SwaminathanJWeldonJE and ShenJ (2001) The gene ygdP associated withthe invasiveness of Escherichia coli K1 designates a Nudixhydrolase Orf176 active on adenosine (50)-pentaphospho-(50)-adenosine (Ap5A) J Biol Chem 276 37834ndash37838

71 DeanaA CelesnikH and BelascoJG (2008) The bacterialenzyme RppH triggers messenger RNA degradation by50 pyrophosphate removal Nature 451 355ndash358

72 BlewettN CollerJ and GoldstrohmA (2011) A quantitativeassay for measuring mRNA decapping by splinted ligation reversetranscription polymerase chain reaction qSL-RT-PCR RNA 17535ndash543

73 KrogerC DillonSC CameronAD PapenfortKSivasankaranSK HokampK ChaoY SittkaA HebrardMHandlerK et al (2012) The transcriptional landscape and smallRNAs of Salmonella enterica serovar Typhimurium Proc NatlAcad Sci USA 109 E1277ndashE1286

74 EnglandTE and UhlenbeckOC (1978) Enzymaticoligoribonucleotide synthesis with T4 RNA ligase Biochemistry17 2069ndash2076

75 SorefanK PaisH HallAE KozomaraA Griffiths-JonesSMoultonV and DalmayT (2012) Reducing ligation bias of smallRNAs in libraries for next generation sequencing Silence 3 4

76 ZhelkovskyAM and McReynoldsLA (2011) Simple andefficient synthesis of 50 pre-adenylated DNA using thermostableRNA ligase Nucleic Acids Res 39 e117

77 McCoyMI and GumportRI (1980) T4 ribonucleic acid ligasejoins single-strand oligo(deoxyribonucleotides) Biochemistry 19635ndash642

12 Nucleic Acids Research 2013

78 AlonS VigneaultF EminagaS ChristodoulouDCSeidmanJG ChurchGM and EisenbergE (2011) Barcodingbias in high-throughput multiplex sequencing of miRNA GenomeRes 21 1506ndash1511

79 SunG WuX WangJ LiH LiX GaoH RossiJ andYenY (2011) A bias-reducing strategy in profiling small RNAsusing Solexa RNA 17 2256ndash2262

80 Van NieuwerburghF SoetaertS PodshivalovaK Ay-LinWangE SchafferL DeforceD SalomonDR HeadSR andOrdoukhanianP (2011) Quantitative bias in Illumina TruSeq anda novel post amplification barcoding strategy for multiplexedDNA and small RNA deep sequencing PLoS One 6 e26969

81 VigneaultF Ter-OvanesyanD AlonS EminagaSC ChristodoulouD SeidmanJG EisenbergE and MChurchG (2012) High-throughput multiplex sequencingof miRNA Curr Protoc Hum Genet Chapter 11 Unit 1112 11-10

82 PakJ and FireA (2007) Distinct populations of primary andsecondary effectors during RNAi in C elegans Science 315241ndash244

83 ZhuYY MachlederEM ChenchikA LiR and SiebertPD(2001) Reverse transcriptase template switching a SMARTapproach for full-length cDNA library constructionBiotechniques 30 892ndash897

84 LevinJZ YassourM AdiconisX NusbaumCThompsonDA FriedmanN GnirkeA and RegevA (2010)Comprehensive comparative analysis of strand-specific RNAsequencing methods Nat Methods 7 709ndash715

85 AndoY BurroughsAM KawanoM de HoonMJ andHayashizakiY (2012) Targeted methods to improve small RNAprofiles generated by deep sequencing In MallickB andGhoshZ (eds) Regulatory RNAs Basics Methods andApplications Springer-Verlag Heidelberg Germany pp 253ndash271

86 KawanoM KawazuC LizioM KawajiH CarninciPSuzukiH and HayashizakiY (2010) Reduction of non-insertsequence reads by dimer eliminator LNA oligonucleotide forsmall RNA deep sequencing Biotechniques 49 751ndash755

87 KapranovP OzsolakF and MilosPM (2012) Profiling of shortRNAs using Helicos single-molecule sequencing Methods MolBiol 822 219ndash232

88 KanagawaT (2003) Bias and artifacts in multitemplatepolymerase chain reactions (PCR) J Biosci Bioeng 96 317ndash323

89 DayDJ SpeiserPW SchulzeE BettendorfM FitnessJBaranyF and WhitePC (1996) Identification of non-amplifyingCYP21 genes when using PCR-based diagnosis of 21-hydroxylasedeficiency in congenital adrenal hyperplasia (CAH) affectedpedigrees Hum Mol Genet 5 2039ndash2048

90 AirdD RossMG ChenWS DanielssonM FennellTRussC JaffeDB NusbaumC and GnirkeA (2011) Analyzingand minimizing PCR amplification bias in Illumina sequencinglibraries Genome Biol 12 R18

91 UllahMF and AatifM (2009) The footprints of cancerdevelopment cancer biomarkers Cancer Treat Rev 35 193ndash200

92 MoMH ChenL FuY WangW and FuSW (2012) Cell-freeCirculating miRNA Biomarkers in Cancer J Cancer 3 432ndash448

93 ZenK and ZhangCY (2012) Circulating microRNAs a novelclass of biomarkers to diagnose and monitor human cancersMed Res Rev 32 326ndash348

94 KimT and ReitmairA (2013) Non-Coding RNAs FunctionalAspects and Diagnostic Utility in Oncology Int J Mol Sci 144934ndash4968

95 PatnaikSK MallickR and YendamuriS (2010) Detection ofmicroRNAs in dried serum blots Anal Biochem 407 147ndash149

96 MitchellPS ParkinRK KrohEM FritzBR WymanSKPogosova-AgadjanyanEL PetersonA NoteboomJOrsquoBriantKC AllenA et al (2008) Circulating microRNAs as

stable blood-based markers for cancer detection Proc NatlAcad Sci USA 105 10513ndash10518

97 CieslaM SkrzypekK KozakowskaM LobodaAJozkowiczA and DulakJ (2011) MicroRNAs as biomarkers ofdisease onset Anal Bioanal Chem 401 2051ndash2061

98 OzsolakF (2012) Third-generation sequencing techniques andapplications to drug discovery Expert Opin Drug Discov 7231ndash243

99 TsongalisGJ CalinG CordelierP CroceC MonzonF andSzafranska-SchwarzbachAE (2013) MicroRNA analysis is itready for prime time Clin Chem 59 343ndash347

100 AmiraN Cancel-TassinG BernardiniS Cochand-PriolletBBittardH ManginP FournierG LatilA and CussenotO(2004) Expression in bladder transitional cell carcinoma by real-time quantitative reverse transcription polymerase chain reactionarray of 65 genes at the tumor suppressor locus 9q341-2identification of 5 candidates tumor suppressor genes Int JCancer 111 539ndash542

101 ZhangX (2007) Biomarker validation movement towardspersonalized medicine Expert Rev Mol Diagn 7 469ndash471

102 SchmittgenTD LeeEJ and JiangJ (2008) High-throughputreal-time PCR Methods Mol Biol 429 89ndash98

103 BenesV and CastoldiM (2010) Expression profiling ofmicroRNA using real-time quantitative PCR how to use it andwhat is available Methods 50 244ndash249

104 BustinSA and NolanT (2004) Pitfalls of quantitative real-timereverse-transcription polymerase chain reaction J Biomol Tech15 155ndash166

105 BeitzingerM and MeisterG (2011) Experimental identificationof microRNA targets by immunoprecipitation of Argonauteprotein complexes Methods Mol Biol 732 153ndash167

106 FiedlerSD CarlettiMZ and ChristensonLK (2010)Quantitative RT-PCR methods for mature microRNA expressionanalysis Methods Mol Biol 630 49ndash64

107 ArefianE KianiJ SoleimaniM ShariatiSA Aghaee-BakhtiariSH AtashiA GheisariY AhmadbeigiN Banaei-MoghaddamAM NaderiM et al (2011) Analysis ofmicroRNA signatures using size-coded ligation-mediated PCRNucleic Acids Res 39 e80

108 JonstrupSP KochJ and KjemsJ (2006) A microRNAdetection system based on padlock probes and rolling circleamplification RNA 12 1747ndash1752

109 LatorraD ArarK and HurleyJM (2003) Designconsiderations and effects of LNA in PCR primers Mol CellProbes 17 253ndash259

110 BustinSA BenesV GarsonJA HellemansJ HuggettJKubistaM MuellerR NolanT PfafflMW ShipleyGLet al (2009) The MIQE guidelines minimum information forpublication of quantitative real-time PCR experiments ClinChem 55 611ndash622

111 TrapnellC WilliamsBA PerteaG MortazaviA KwanGvan BarenMJ SalzbergSL WoldBJ and PachterL (2010)Transcript assembly and quantification by RNA-Seq revealsunannotated transcripts and isoform switching during celldifferentiation Nat Biotechnol 28 511ndash515

112 TrapnellC PachterL and SalzbergSL (2009) TopHatdiscovering splice junctions with RNA-Seq Bioinformatics 251105ndash1111

113 ListerR OrsquoMalleyRC Tonti-FilippiniJ GregoryBDBerryCC MillarAH and EckerJR (2008) Highly integratedsingle-base resolution maps of the epigenome in ArabidopsisCell 133 523ndash536

114 PolidorosAN PasentsisK and TsaftarisAS (2006) Rollingcircle amplification-RACE a method for simultaneous isolationof 50 and 30 cDNA ends from amplified cDNA templatesBiotechniques 41 35ndash42

Nucleic Acids Research 2013 13

Page 5: SURVEY AND SUMMARY Biases in small RNA deep sequencing data

RNA circularization depends on AMP removal from thepre-adenylated 50-App-adapter and its subsequent transferto 50-monophosphorylated RNAs therefore the applica-tion of truncated T4Rnl2 (T4Rnl2tr) to minimize this sidereaction is highly suggested (4950) In addition theexchange of lysine with glutamine (K227Q) in T4Rnl2trfurther reduced adenyl-transfer competence but retainedactivity to catalyze phosphodiester bond formation (48)Therefore the application of T4Rnl2 derivatives is bene-ficial for decreasing bias in sRNA-seq analysis (29)

Alternatively phosphatase pretreatment to generateRNA 50-OH termini that are inert to ligase-catalyzedadenylation helps to minimize side reactions The proced-ure ultimately requires the application of polynucleotidekinase to generate monophosphorylated RNA endsHowever tRNA rRNA and mRNA processing inter-mediates are reported to contain 50-hydroxylated termini(2962) Polynucleotide kinase treatment as a necessaryprerequisite to RNA 50-adapter ligation leads to the inclu-sion of abundant processing intermediates in cDNAlibraries The alternative techniques to avoid RNA circu-larization might distort cDNA representation and aretherefore likely to reduce the complexity of sRNA-seqexperiments (2962)

30-end ligation depends on RNAndashadapter combinationRNA 30-end ligation using T4Rnl1 was recently reportedas a potential source of bias in deep sRNA sequencing(39) Comparisons of the product yields of ligation reac-tions were made using three test RNAs and variousadapter sequences Depending on the specific RNAndashadapter combination the products differed remarkably

Similarly Jayaprakash et al (63) reported variations inproduct yields depending on specific RNAndashadapter com-binations in Rnl2tr-catalyzed reactions Because the RNA30-end ligation experiments were conducted with 50 pre-adenylated adapters the results also indicate thatadenyl-transfer is less likely the cause of the observedbias (63) RNA 30-adapter ligation was enhanced whenadapter pools displaying nucleotide variation at the two50 terminal positions were used (63) Therefore increasedvariability at adapter 50-ends enhances RNA ligation yieldand in general helps to minimize distortion in sRNA-seq(333963) RNA 50-phosphorylation also influences theefficacy of adapter ligation to RNA 30-ends (39) Theligation efficacies of T4 Rnl1-mediated reactions werecompared for test RNAs featuring different phosphoryl-ation states at RNA 50-ends and different adaptersincluding those with 50-App-modification UnexpectedlyRNAs that performed well when monophosphorylateddisplayed weak acceptor characteristics whentriphosphorylated Therefore the efficacy of RNA 30-endmodification might also be a function of the RNA 50-phos-phorylation state (Figure 1)

Ligation of adapters to modified RNAsIn ligation assays T4Rnl1 and T4Rnl2tr performedalmost identically using 30-unmodified test RNA sub-strates However the efficacy of T4Rnl1-catalyzed reac-tions dropped remarkably when 20-O-methylated RNAswere used (36) In contrast T4Rnl2tr performed almostequally well irrespective of RNA modification (36) Theadditions of 25 (wv) PEG 8000 to the T4Rnl2tr-mediated reactions significantly increased reaction yields

Figure 2 Schematic representation of the three-step mechanism of the bacteriophage T4 RNA ligase (T4Rnl) reaction with the potential sideproducts Encircled Arabic numbers indicate the order of ligation steps Step 1 adenylation of the T4Rnl active site Step 2 donor 50-adenylation(side reaction circularization of 30-end-unprotected donor molecules) Step 3 phosphodiester bond formation between 30-hydroxylated (OH) acceptorand donor molecules (side reaction reverse adenylation of donor and circularization of acceptor molecules) PPi pyrophosphate p monophosphateAMP adenosine monophosphate (Ap) App 50-adenylated termini

Nucleic Acids Research 2013 5

(5064) Hence additives that increase the effective mo-lecular concentration have a major influence on productyields and meta-analyses of data generated by differentligases or even in different buffer compositions are error-proneAs an aside specific enrichment of 30-end 20-

O-methylated RNAs like the piwi-interacting RNAs(piRNAs) or plant miRNAs is achieved by periodatetreatment The reaction relies on vicinal hydroxyl groupsto oxidize sugar moieties and is accompanied by pentosering opening The resulting vicinal dialdehyde reacts withwater and forms a chemical equilibrium with thecorresponding dioxane derivative upon ring closure The20-O-methyl group-containing RNAs are insensitive tooxidation and remain accessible for adapter ligation ortailing (65ndash67)

Impact of RNA secondary structure and adapter-RNAcofolding on ligationRecent analysis indicates that RNA secondary structureimpacts sRNA-seq bias (29) During kinetic profiling ofRNAndashadapter ligation test miRNAs were trapped in non-reactive states Denaturing the remaining non-ligatedRNAs resulted in increased ligation yield indirectly sug-gesting the influence of RNA secondary structure (29) Tofurther analyze the impact of secondary structure on RNA30-end ligation a pool of substrate RNAs with 21randomized nucleotides at their 30-ends was ligated withindividual pre-adenylated DNA adapters and reactionyields were subsequently quantified by deep sequencing(33) To avoid bias caused by template-specific RNA50-end modification(s) all test RNAs contained identical50-termini (33) RNAs that harbored stems at their 30-endswere underrepresented in the sequencing data whereasRNAs containing a minimum of three unstructured nu-cleotides at the 30-ends were overrepresented The dataindicate that RNA ligase-mediated reactions in generalare more efficacious with RNA substrates that harborstructurally accessible 30-ends (33)In addition the cofolding of RNA and adapter mol-

ecules impacts ligation (33) Adapter sequence variationthat interfered with the RNA 30-stem formation andpermitted intermolecular base-pairing increased ligationyields (33) Interestingly this data contradicts prioranalysis emphasizing a predominant primary sequence-related influence on data distortion (3363) Althoughthe actual source of discrepancy remains to be analyzedprotocol-specific variation probably due to changes in de-naturation is likely to have caused the observed difference(see supplementary information) Irrespective of the actualreason the advice for minimizing this bias is identical andrelies on the application of adapter pools to increase thelikelihood of productive interactions (33) Thus inaddition to the RNA nucleotide composition RNAndashadapter cofolding and secondary structure-related influ-ences contribute to sRNA-seq bias

RNA 50-end ligationThe chemical properties of RNA 50-ends display structuralvariations Depending on the starting material and experi-mental design it may be necessary to alter the chemistry

of RNA 50-termini prior to cDNA generation Forexample neither 50-triphosphorylated RNAs in prokary-otes nor primary eukaryotic RNA polymerase III tran-scripts are accessible for direct 5-adapter ligationSimilarly RNA polymerase II-specific trimethylguanosinecap structures render RNA 50-ends inaccessible Thusenzymatic pretreatment is usually conducted to avoidbias in the starting material (Figure 1 SupplementaryData and Supplementary Table S1)

Specific biochemistry of RNA 50-termini and enrichmentof various RNA classesAlthough the various chemical properties of RNA 50-endsestablish limitations for unbiased representation they alsopermit specific enrichments (41) (Figure 1 SupplementaryData and Supplementary Table S1) RNAs that containmonophosphorylated 50-ends and 30-hydroxyl groups aredirectly accessible to cDNA generation proceduresConsequently unaltered RNA starting material leadsto its specific enrichment (Figure 1 SupplementaryTable S1) In contrast triphosphorylated or 50-cappedRNAs require TAP treatment to generate 50-monophosphorylated RNAs and to permit 50-end ligation

Biochemical alternatives to specifically cleave pyrophos-phate bonds in 50-triphophorylated RNAs were reportedrecently (68) whereby TAP-mediated hydrolysis issubstituted by polyphosphatase (69) or pyrophospho-hydrolase (7071) treatment In vitro analysis to investigatethe substrate specificity of E coli rppH pyrophospho-hydrolase indicated that pyrophosphate hydrolysisproceeds with identical efficacy irrespective of the RNA50-terminal nucleotide (71) Furthermore a subtle declinein the enzymatic activity in response to potentially inhibi-tory RNA 50-terminal stem loop structures was reported(71) These strategies are particularly appealing when bac-terial primary transcripts or nascent eukaryotic RNA poly-merase III transcripts are central to the investigation Inaddition to targeting triphophorylated RNA 50-ends poly-phosphatase also accepts diphophorylated RNA 50-terminito catalyze pyrophosphate bond cleavage It should beemphasized that 50-capped RNA polymerase II transcriptsare not amenable to polyphosphatase or pyrophospho-hydrolase treatment and therefore escape identification

The application of a TerminatorTM 50-phosphate-dependent exonuclease is frequently utilized to enrich for50-capped and 50-triphospharylated RNAs (417273)The inherent 50 30 exonuclease activity digests50-monophosphorylated RNAs and leaves other tran-scripts unaltered (Figure 1 Supplementary Data andSupplementary Table S1) Therefore different chemistriesof RNA 50-ends not only distort sRNA-seq but also allowfor specific enrichment protocols Subsequent analysispermits the distinction among RNA subsets such asprimary transcripts capped RNAs and other productsof RNA 50-end processing (417273)

Nucleotide preference of RNA 50 ligationApart from specific differences in RNA 50-ends the influ-ence of the adapter sequence itself on ligation efficacy iswell documented Biochemical data indicated that ligationreactions proceeded most effectively when an adenosine

6 Nucleic Acids Research 2013

residue occupied the 30-terminal position of adapter(acceptor) molecules (5174) Recently adapter pools dis-playing sequence identity at all except the two 30-terminalnucleotides were used to investigate sequence-dependentvariation in acceptor function Based on deep sequencingas the readout the relative ranking of miRNA frequenciesvaried as a consequence of the specific adapterndashRNA com-binations (63) Once more miRNA expression profiles arenot only a function of true RNA abundance but alsoreflect varying ligation efficacies (63) Interestingly theuse of adapter pools highly improved the correlation ofmiRNA levels between replicates (63)

Influence of RNA secondary structure on RNA 50-endligationTo investigate potential influences on 50-end modificationderived from RNA secondary structure ligation yieldswere compared among test sets of fully randomizedRNAs A remarkable underrepresentation of double-strand folds at 50-ligation sites indicated strong preferencesfor single-stranded RNA substrates in ligation reactionsconducted with T4Rnl1 (75)

Sugar moiety-dependent change in ligation efficacyDNA-based adapter oligonucleotides performwell in RNA30-end ligation reactions (especially when 50-adenylated)whereas RNA-based adapters are preferred for RNA50-end modification (76) Kinetic analysis of DNA-basedadapter molecules for RNA 50-ligation demonstratedslower reactions compared with their RNA counterparts(77) Recent analyses revealed more complex substrate-specific effects between test RNAs and different adaptersWhen DNA- and RNA-based adapters of identical se-quences were investigated in ligation assays no generalpreference with regard to the sugar moiety of the adaptermolecule was observed in RNA 50-end ligation (3963) Inconclusion to avoid bias adapter pools are recommendedto display sequence variation The chance of productiveligation might be enhanced using RNA DNA or mixedadapter oligonucleotides (3963)

Adapter barcoding influences ligationIn addition to nucleotide preferences of the reactingresidues upstream sequences also exert influence onligation yields Adapters harboring different Roche MIDtags placed 19 nucleotides upstream from the reacting30-ends performed entirely differently when assayed withidentical test RNAs (39) Depending on the combinationof in vitro transcribed non-protein coding RNAs andMID-modified adapters ligation product levels (calculatedas non-protein coding RNA input) ranged from 60ndash70to undetectable (39) Recent reports emphasized thatbarcoding for multiplex sequencing leads to significantbias in miRNA expression profiles (78ndash80) This bias wasdiminished when barcodes were introduced by PCRamplification of cDNA (78ndash81)

Quantitative considerations and barcodingIn general the discovery of novel RNAs and analyses oftranscripts that are expressed at very low levels requiresubstantially more cDNA reads than experiments

directed toward investigating differential gene expressionsHence the design of sRNA-seq experiments needs totake into account the rationale of the investigationContigs formed by continuously overlapping readsduring genomic mapping procedures represent thebio-computational equivalent of individual RNAentities Any selected read during cDNA mapping eithercontributes to pre-existing already populated RNAs ormight generate novel contigs The probability of eitherprocess is a function of RNA size as well as the corres-ponding RNA expression value Of course contigs repre-senting longer rather than shorter RNAs are more likelyto be populated by chance RNAs of higher relative ex-pression levels are generally more densely packed withcDNA reads than RNAs that display medium or low tran-scription levels Consequently if multiplexing is chosenfor economic reasons the necessary sequencing depthmust be approximated prior to sequence runs in order togenerate adequate sample sizes The two fixed parametersfor estimating admissible multiplexing are therefore (i) ex-perimental goal and (ii) lane capacity According toIllumina TruSeq Small RNA Sample Prep Kitguidance 5 million sRNA-seq reads are considered appro-priate for novel miRNA discovery within human-sizedgenomes For miRNA profiling usually about 25 millionreads are recommended These approximations permit re-calculation geared toward specific needs and requirements

Alternative cloning strategies to avoid 50-end modificationBecause the 30-terminal position of cDNA corresponds tothe RNA 50-ends Pak et al (82) used a technique to avoidmodification of RNA 50-termini and therefore to minimizethe need for enzymatic pretreatment The strategy intro-duces adapter sequences not to RNA 50-termini butrather to 30-ends of the corresponding cDNAs (82) Theadvantage of omitting enzymatic treatment of RNA50-ends might be offset by incomplete reverse transcriptionthat precludes full-length cDNA cloning or appropriateassessment of RNA 50-endsRecently ligation to cDNA 30-ends was investigated in

more detail (36) Perfect double-stranded DNA-RNA testhybrids and single-stranded DNAs were utilized to mimicpotential products of reverse transcription All T4 DNA-and RNA-ligases including T4Rnl2 truncated derivativeswere compared to delineate optimal conditions for cDNA30-end modification Surprisingly ligation reactionscarried out with pre-adenylated DNA donor moleculesstrongly indicated that double-stranded DNA-RNAchimera constituted the most reactive substrates forT4Rnl1-mediated ligation In addition supplements ofPEG 8000 strikingly influenced ligation final concentra-tions of 25 PEG 8000 increased reaction yields Identicalsingle-stranded test cDNAs exhibited only poor substratecharacteristics The A-form-like conformation of theDNA-RNA hybrid might have compensated for theprimary structure-related influences and permitted moresimilar ligation outcomes among substrates (36)However the analysis examined only one test substrateand therefore might not be amenable to drawing generalconclusions (36) A further interesting observation indi-cates that cDNA 50-terminal nucleotides influence cDNA

Nucleic Acids Research 2013 7

30-end ligation yields (39) As the cDNA 50-terminalsequence is a consequence of specific RNA 30-end modifi-cation differences between protocols exert different influ-ences on cDNA 30-end adapter ligation and in turn mightalso cause expression profile bias (39)The lsquoSmartTM Approachrsquo (83) offers another interesting

alternative to capture mature RNA 50-ends and to avoidRNA 50-end adapter ligation The technology is based onthe template switching capacity of Moloney murineleukemia virus (M-MLV) reverse transcriptase Oncefirst strand synthesis approaches the RNA 50-end theterminal transferase activity of M-MLV reverse transcript-ase adds oligo(C)-stretches to the growing cDNAOligonucleotides that 30-terminate with oligo(G) stretchesserve to extend templates and permit reverse transcriptionto proceed Thus M-MLV reverse transcriptase enablesthe generation of cDNAs with known terminal sequenceelements (83) and thereby eliminates the need to modifythe RNA 50-ends Recently the technology was adapted tothe specific needs of Ilumina RNA-seq analysis (84)

Adapterndashadapter ligation disturbs sRNA-seqA side effect common to sRNA-seq protocols that rely onligase-mediated sequential RNA end-modification is PCRamplification of empty adapterndashadapter ligation products(8185) Adapterndashdimer formation is a direct consequenceof the molecular adapter excess employed to ensure suffi-cient product yield and might dominate in reactions wherethe concentration of total RNA starting material is low(8186) Adapter molecules that are 50-pre-adenylatedremain in reaction mixtures after RNA 30-end modifica-tion and react with RNA 50-adapter oligonucleotideswhich subsequently favors dimer formation Thereforepurifying products away from excess adapters via gelelectrophoresis is recommended (78) However for prep-aration of tiny RNAs such as miRNAs separation of thesmall size difference between ligated product and adaptersis not trivial Alternative approaches to avoid size frac-tionation rely on application of complementary lsquodimer-eliminator-LNA-oligonucleotidesrsquo (8586) The interferingoligonucleotide is designed to overlap the adapterndashadapterligation junction and thereby distinguish between targetedside product and adapter ligated RNA (86) The comple-mentarity of the interfering oligonucleotides extends intothe RNA 30-adapter sequence thus blocking primerannealing during reverse transcription and as a resultexcludes adapter dimers from RNA sequencing (86)Vigneault et al reported a similar strategy to overcomeunintended dimer amplification (81) Pre-annealing of theRT primer after RNA 30-adapter ligation but prior toRNA 50-end modification efficiently decreased adapterdimer formation The excess of non-ligated free RNA30-adapters was captured by complementary reverse tran-scriptase primers and sequestered in less-reactive double-stranded structures (81) The latter approach permitseffective reduction of dimer amplification and is reportedto increase cDNA library complexity (81) ObviouslyRNA tailing strategies (see above) are superior becausethey completely circumvent the problem of adapter-dimer formation To avoid adapter tailing as an analogous

side reaction to dimer formation it is advisable to carryout 30-tailing prior to 50-adapter ligation

Depending on the experimental goal sRNA-seqrequires varying amounts of cDNA reads (see above)Therefore techniques to reduce the amount of readsthat represent empty adapter dimers are of central import-ance as they permit an increase in sequencing depthreducing costs and ultimately enabling novel smallnon-protein coding RNA gene discovery

PCR AMPLIFICATION BIAS

Apart from novel third-generation sequencingtechnologies that permit direct sequence analysis of un-amplified cDNA all commonly used protocols implementPCR amplification (87) In addition the introduction ofbarcode identifiers by PCR amplification may also avoidbias introduced by differences in RNAndashadapter-specificligation efficacies (78) Therefore multiplexing of differentsamples during PCR is widely regarded to permit meta-analysis (78) However even PCR amplification itself isknown to introduce expression bias into datasets(303288) In particular when many different templatesare amplified in parallel variations not only in templatelength but also in base composition might lead topreferred template-specific amplifications (30) Extremeexamples of allelic dropout due to single base-exchangewere reported for PCR-mediated genotyping (89) TheRNA GC-content is especially responsible for data dis-tortion Generally cDNAs of high GC-content are lessreadily amplified and thus remain underrepresented Forexample read distributions calculated over the entire 28SrRNA were inversely correlated with the correspondingregional GC-content (32) As argued earlier base com-positions of different small non-protein coding RNAs(miRNAs piRNAs snoRNAs scaRNAs etc) varywidely The parallel amplification of complex templatemixtures that vary in their respective GC-content is there-fore a challenging procedure and likely to influence thecomplexity of the resulting cDNA library (30) Changesin complexity do not only influence distortion of quanti-tative measurements but certain RNAs might also lsquodropoutrsquo completely and remain unidentified Extended periodsof denaturation are considered to effectively decrease thelikelihood of uneven amplification (90) PCR amplifica-tion does not only change the overall GC-content of theresulting cDNA pool but is also reported to introducelength bias (30) which might be an important factorwhen RNAs of different sizes are subjected to parallelcDNA preparations Shorter sequences of significantlylower GC-content are likely to be overrepresented

Recent analysis also emphasized the impact of PCRbuffer compositions on PCR amplification bias (30)Remarkably PCR-mediated bias was already observedin some cases within 15 rounds of PCR amplification(30) While the popular Phusion PCR system surprisinglyperformed the worst (30) data indicate that someoptimized PCR buffers are capable of significantlyreducing PCR-mediated distortion (3090)

8 Nucleic Acids Research 2013

sRNA-SEQ AND PERSONALIZED MEDICINE

In recent years increasing attention has focused on thepotential of miRNA transcriptional profiling to diagnosedisease or to monitor progression and success in treatment(161791) The decreasing costs of high-throughputsequence analysis favors sRNA-seq as a common toolfor medical diagnostics and personalized medicine Aftercell death the total RNA of circulating cancer cells isreleased into the blood stream (92) These extracellularRNAs (exRNAs) are packaged within vesicles such asexosomes where they are complexed with differentproteins for example miRNAs in the RISC complexVesicles protect exRNA from ribonucleases and preventdegradation Among the various exRNAs miRNAs aresome of the most stable molecules (93) Easy accessibilityof patient blood samples and the stability of miRNAsmake them effective biomarkers (94) Diagnostics basedon miRNA were successfully conducted on biologicalsamples of low quality including dried blood (95) Inaddition miRNAs remain stable even when samples aresubjected to repeated rounds of tissue freezing andthawing which usually degrades RNAs (96) A furtherappealing feature of miRNAs in medical research is thatchanges in their expression profiles are observed duringdisease onset (97) In addition miRNAs often displaytissue-specific expression Therefore it might be possibleto identify the cellular origin of most metastases (91)possibly providing decision guidance for appropriatetreatment However as elaborated in detail miRNA ex-pression profiles detected by deep sequencing do not onlyreflect changes in relative expression but are alsoinfluenced by miRNA-adapter-specific ligation efficaciesespecially in cost-effective parallel analyses that oftenrequire multiplexing Different barcode introduced byligation generate bias in miRNA representation and abun-dance (78) Even when incorporated by PCR barcodingmight still lead to data distortion (30ndash32) Third-gener-ation sequencing techniques such as the Helicos (InNovember 2012 Helicos filed bankruptcy (httpbizyahoocome121115hlcs8-khtml)) system avoid PCRamplification steps in cDNA preparation and permitdirect sequencing of unamplified samples (87) The RNA30-end modification to prime cDNA synthesis (or directsequencing) in addition to relatively high error frequencies(15) might still lead to bias and so would not beespecially applicable in personalized medicine (98)Collectively deep sequencing to monitor change inmedical samples is of importance and certainly ofinterest but requires resolution of serious standardizationissues prior to general application

qRT-PCR a valid alternative to deep sequencing

Alternative techniques to quantify specific miRNA testsets rely on quantitative real-time reverse transcriptionPCR (qRT-PCR) (99) In particular diagnostic screensin cancer medicine rely on the application of qRT-PCR(99ndash101) In cases where biomarker disease correlation iswell established and when quantification of limitednumbers of biomarkers is sufficient for diagnosis and tomonitor disease progression or treatment qRT-PCR is

a valuable tool (99) Recently the convenience ofqRT-PCR-based medical diagnosis was suggested tomake it superior in routine clinical applications (99)Interestingly novel developments in qRT-PCR allowsample processing in 384-well formats and liken thismethod to cost-effective high-throughput applications(99102) A further advantage of qRT-PCR is the reduc-tion in downstream bio-computation (99103) The timebetween qRT-PCR and diagnostic result is shorter thanfor sRNA-seq or microarray-based inference Howeverthe analysis of miRNA expression profiles by qRT-PCRis technically challenging due to RNA-inherent pitfalls(103104)The miRNAs lack structural features such as mRNA

poly(A) tails that could serve as class-specific enrichmentor priming of reverse transcription (103) Alternative tech-niques for miRNA sample enrichment would rely onimmune-precipitation but most of them are not suitablefor typical small size samples in medical analysis (105) Asmentioned above the heterogeneity of the RNA GC-content translates into a wide range of primer annealingtemperatures for a population of miRNAs and mighttherefore complicate parallel analysis of multiple RNAs(103) Furthermore the qRT-PCR template is presentnot only within the mature miRNA molecule but also ispart of various precursor transcripts (103) It is thereforeimportant to ensure that primary or precursor-miRNAs(pri- or pre-miRNAs) do not provide templates inqRT-PCR amplification and as a consequence do notcontribute to the signal intensities detected Difficultiesthat relate to the similarity displayed between differentmiRNA isoforms are mentioned above within thecontext of array-based inference to quantify differentialgene expressionThus far two alternative approaches are commonly

utilized to reverse transcribe miRNA templates(103106) miRNA-specific stem-loop primers are oftenused when only limited numbers of RNAs are subject toquantification A major advantage of using miRNA-specific primers for reverse transcription is an increase insensitivity and specificity However RNA 30-end-tailing-based approaches enable bulk reverse transcription ofmultiple targets in parallel and are therefore advantageouswhen many miRNAs are analyzed (see above) A recentcomparative analysis to investigate reverse transcriptionefficacies of the two methods uncovered only minor dif-ferences (106) The analysis involved only a limitedmiRNA test set and therefore the results cannot begeneralized As an aside the oligo d(T)-based designmight also enable reverse transcription of miRNA andthe corresponding mRNA target in single test tube reac-tions (106) Ligation-mediated PCR or padlock probes incombination with rolling circle amplification are technicalalternatives for avoiding reverse transcription (107108)The methods rely on ligation-mediated template gener-ation and are therefore not devoid of bias (see above)Both approaches permit generation of miRNA signaturesacross samples and do not require elaborate analyticalpost-processing or expensive technical equipment(107108) However as elaborated here and abovereverse transcription is not devoid of template-specific

Nucleic Acids Research 2013 9

bias In particular RNA secondary structure and RNAmodifications both influence reverse transcription efficacy(see above) Similar to PCRs reverse transcription is nega-tively affected by increasing RNA GC-contentBecause the relative RNA GC-content dictates primer

Tm qRT-PCRs are also strongly influenced by forwardprimer design thereby affecting both qRT-PCR specificityand sensitivity Altering the length of primer-templatecomplementarity permits adjustment of the actualprimer Tm Conversely in cases where RNA nucleotidecomposition would dictate very low melting temperaturesLNA-modified nucleotides might be incorporated into theprimer sequence to increase the corresponding primer Tm(18) However the design of LNA-containing oligonucleo-tides is not always trivial (109) For the actual measure-ment of miRNA in real time two alternative fluorescentreporter molecules are frequently utilized SYBR Greenand the Taqman assay (103) Detection of the fluores-cence signal emitted by SYBR Green increases about100-fold as the molecule intercalates in double-strandedDNA However the mechanism of DNA intercalationsuggests the dye is not target-specific and thereforecannot discriminate primer dimers unspecific productsand target amplification (103) Because the intercalatingdye is simply added to the assay it is often possible toeasily adapt the qRT-PCR assay to already establishedand validated analytical PCRs (104) In comparison theTaqman assay provides increased specificity as the de-tection is based on sequence-specific probendashtarget inter-action Certainly the qRT-PCR assays offer manyadvantages but require evaluation of various factors thatinfluence and potentially bias analysis Thereforeinternational standardization to ensure comparable ex-perimental design and to permit meta-analysis amongdatasets is of prime importance for establishing clinicalqRT-PCR-based personal medicine The MinimumInformation for Publication of Quantitative Real-TimePCR experiments guidelines established standards for bet-ter experimental practice and suggest routes to data ana-lysis that increase reliability and data comparison (110)

mRNA-SEQ

Deep sequencing is not limited to expression profiling ofmiRNA or other small non-protein coding RNAs Evenentire transcriptomes including all protein coding andlong non-protein coding RNAs are subjected to high-throughput sequencing methods Currently as the lengthof single sequence runs is limited (from 100 to 500 nt de-pending on the sequencing platform) longer RNAsrequire fragmentation prior to cDNA synthesis (25)Millions of small reads resulting from mRNA-seq areanalyzed to identify novel transcripts or splice variants(111112) Relative quantification of reads enables one tomonitor differential gene expression among samples andto identify changes in isoform distribution (111) As withsRNA-seq the digital nature of RNA deep sequencing isthought to simplify meta-analysis of mRNA or long non-protein coding RNA expression data (25) Some protocolsfor modification of RNA fragment ends rely on

chemistries identical to sRNA-seq (84113) Therefore ex-pression biases observed in sRNA-seq analysis are alsolikely to cause distortion in whole transcriptome profilingVariations in nucleotide composition and changes in RNAsecondary structure of different RNA fragments mightcorrelate with uneven read distribution Unequal readcoverage generates uncertainty in isoform detection andin extreme cases exons representing unfavorable nucleo-tide compositions might escape analysis As fragmentationis suggested to generate many different but at least par-tially overlapping reads bias in mRNA-seq might be lessprominent Recently a novel technique based on circligasewas introduced to avoid RNA 50-end ligation (114)Reverse transcription is conducted with anchoredoligo(dT) primers that contain an additional sequence topermit later PCR amplification Following first strandcDNA synthesis circligase-mediated ligation is carriedout The resulting circular molecules are subjected toPCR amplification The strategy is of general interestand might be a valuable tool for sRNA-seq as well

CONCLUSION

Bias in general describes systematic errors that reflectmethod-related distortion from the truth Bias might bedetected easily when data generated by different methodsare subjected to analytical comparison Because it is sys-tematic in nature bias is not subject to variation inrepeated experiments Here based on biochemical anddeep sequencing data analyses we have presentedvarious steps in sRNA-seq protocols likely to causesevere distortions in the relative expression levels of indi-vidual sRNAs

The efficacy of RNA end-modification to permit cDNAconstruction depends not only on differences in strategiesand reagents but also on changes in among others buffercompositions and additives that could increase molecularcrowding Therefore meta-analysis of data generated withnon-identical methods reagents or even buffers is likely tobe error-prone (36396378) To achieve comparableresults and to evaluate relative changes in the quantifica-tion of the resulting data experiments and analyses shouldbe conducted under strictly identical conditions It has tobe stressed that bias in RNA-specific variation is not onlyrestricted to inter-sample comparison but also effects therelative ranking of intra-sample RNA expression

In addition to the more obvious quantitative aspects ofbias there are also qualitative aspects because RNA-seqmethods are also employed for RNA discovery IndividualRNAs might lsquodrop outrsquo from detection because they arenot amenable to cDNA construction under chosen condi-tions Certainly to increase the likelihood of productiveRNA end-modification and increase the complexity ofsRNA-seq library content the parallel application of dif-ferent strategies and reagents is required (33396379) Inparticular increased variability of adapter sequences (ieadapter pools) helps to increase the diversity of RNAsaccessed (63) A recent report emphasized the need forparallel application of different RNA- and DNA-basedadapter oligonucleotides (39)

10 Nucleic Acids Research 2013

The source of expression level distortion and the failureto detect certain RNA species at all appear not to belimited to reactions of RNAcDNA-end modificationbut also apply to cDNA amplification by PCR (30ndash32)Differences in reaction buffers RNA GC-content sec-ondary structures RNA length and primers might eachlead to bias during PCR Therefore the analysis ofsRNA-seq data to examine relative changes in geneexpression or identification reflects both true RNA abun-dance and biases related to the methods applied

The sRNA-seq approach is also attractive to medicalresearch Deep sequencing might permit the screening ofmultiple patient samples to monitor the progression ofdisease or treatment Therefore it is urgently necessaryto establish rigid international standardizations tominimize distortion and to avoid misleading conclusionsduring RNA-seq data interpretation In summary quali-tative and quantitative RNomics remain more challengingthan anticipated

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online

FUNDING

National Genome Research Network [NGFNIII01GS0808 to JB and TSR] the Deutsche Forschungs-gemeinschaft [BR 7544-1 to JB] USM ResearchUniversity Grant [1001CIPPT81305 to THT]Funding for open access charge The licence was fundedby Dr Brosiusrsquos research budget

Conflict of interest statement None declared

REFERENCES

1 BrosiusJ (2009) The fragmented gene Ann N Y Acad Sci1178 186ndash193

2 BrosiusJ and TiedgeH (2004) RNomenclature RNA Biol 181ndash83

3 LiuQ and ParooZ (2010) Biochemical principles of small RNApathways Annu Rev Biochem 79 295ndash319

4 BrosiusJ (2012) In ErdmannVA and BarciszewskiJ (eds)From Nucleic Acids Sequences to Molecular Medicine RNATechnologies Springer Berlin Heidelberg pp 1ndash18

5 HuttenhoferA BrosiusJ and BachellerieJP (2002) RNomicsidentification and function of small non-messenger RNAs CurrOpin Chem Biol 6 835ndash843

6 AnS and SongJJ (2011) The coded functions of noncodingRNAs for gene regulation Mol Cells 31 491ndash496

7 BushatiN and CohenSM (2007) microRNA functions AnnuRev Cell Dev Biol 23 175ndash205

8 ChiangHR SchoenfeldLW RubyJG AuyeungVCSpiesN BaekD JohnstonWK RussC LuoS BabiarzJEet al (2010) Mammalian microRNAs experimental evaluation ofnovel and previously annotated genes Genes Dev 24 992ndash1009

9 AmbrosV (2001) microRNAs tiny regulators with greatpotential Cell 107 823ndash826

10 LauNC LimLP WeinsteinEG and BartelDP (2001) Anabundant class of tiny RNAs with probable regulatory roles inCaenorhabditis elegans Science 294 858ndash862

11 RuvkunG (2001) Molecular biology Glimpses of a tiny RNAworld Science 294 797ndash799

12 RuvkunGB (2003) The tiny RNA world Harvey Lect 991ndash21

13 GurtanAM and SharpPA (2013) The Role of miRNAs inRegulating Gene Expression Networks J Mol Biol 4253582ndash3600

14 LiangH ZhangJ ZenK ZhangCY and ChenX (2013)Nuclear microRNAs and their unconventional role in regulatingnon-coding RNAs Protein Cell 4 325ndash330

15 HuangV and LiLC (2012) miRNA goes nuclear RNA Biol 9269ndash273

16 WangZ HanJ CuiY FanK and ZhouX (2013) CirculatingmicroRNA-21 as noninvasive predictive biomarker for response incancer immunotherapy Med Hypotheses 81 41ndash43

17 ZhaoC ZhangJ ZhangS YuD ChenY LiuQ ShiMNiC and ZhuM (2013) Diagnostic and biological significance ofmicroRNA-192 in pancreatic ductal adenocarcinoma Oncol Rep30 276ndash284

18 van RooijE (2011) The art of microRNA research Circ Res108 219ndash234

19 LeeLW ZhangS EtheridgeA MaL MartinD GalasDand WangK (2010) Complexity of the microRNA repertoirerevealed by next-generation sequencing RNA 16 2170ndash2180

20 OkoniewskiMJ and MillerCJ (2006) Hybridization interactionsbetween probesets in short oligo microarrays lead to spuriouscorrelations BMC Bioinformatics 7 276

21 MortazaviA WilliamsBA McCueK SchaefferL andWoldB (2008) Mapping and quantifying mammaliantranscriptomes by RNA-Seq Nat Methods 5 621ndash628

22 LiuS LinL JiangP WangD and XingY (2011) Acomparison of RNA-Seq and high-density exon array fordetecting differential gene expression between closely relatedspecies Nucleic Acids Res 39 578ndash588

23 RaghavachariN BarbJ YangY LiuP WoodhouseKLevyD OrsquoDonnellC MunsonPJ and KatoG (2012) Asystematic comparison and evaluation of high density exon arraysand RNA-seq technology used to unravel the peripheral bloodtranscriptome of sickle cell disease BMC Med Genomics 5 28

24 MartinJA and WangZ (2011) Next-generation transcriptomeassembly Nat Rev Genet 12 671ndash682

25 WangZ GersteinM and SnyderM (2009) RNA-Seq arevolutionary tool for transcriptomics Nat Rev Genet 1057ndash63

26 KozomaraA and Griffiths-JonesS (2011) miRBase integratingmicroRNA annotation and deep-sequencing data Nucleic AcidsRes 39 D152ndashD157

27 LinsenSE de WitE JanssensG HeaterS ChapmanLParkinRK FritzB WymanSK de BruijnE VoestEE et al(2009) Limitations and possibilities of small RNA digital geneexpression profiling Nat Methods 6 474ndash476

28 BakerM (2010) MicroRNA profiling separating signal fromnoise Nat Methods 7 687ndash692

29 HafnerM RenwickN BrownM MihailovicA HolochDLinC PenaJT NusbaumJD MorozovP LudwigJ et al(2011) RNA-ligase-dependent biases in miRNA representationin deep-sequenced small RNA cDNA libraries RNA 171697ndash1712

30 DabneyJ and MeyerM (2012) Length and GC-biases duringsequencing library amplification a comparison of variouspolymerase-buffer systems with ancient and modern DNAsequencing libraries Biotechniques 52 87ndash94

31 OrpanaAK HoTH and StenmanJ (2012) Multiple heatpulses during PCR extension enabling amplification of GC-richsequences and reducing amplification bias Anal Chem 842081ndash2087

32 SendlerE JohnsonGD and KrawetzSA (2011) Local andglobal factors affecting RNA sequencing analysis Anal Biochem419 317ndash322

33 ZhuangF FuchsRT SunZ ZhengY and RobbGB (2012)Structural bias in T4 RNA ligase-mediated 30-adapter ligationNucleic Acids Res 40 e54

34 DeChiaraTM and BrosiusJ (1987) Neural BC1 RNA cDNAclones reveal nonrepetitive sequence content Proc Natl AcadSci USA 84 2624ndash2628

Nucleic Acids Research 2013 11

35 WickensM and KwakJE (2008) Molecular biology A tail talefor U Science 319 1344ndash1345

36 MunafoDB and RobbGB (2010) Optimization of enzymaticreaction conditions for generating representative pools of cDNAfrom small RNA RNA 16 2537ndash2552

37 SarkarN (1997) Polyadenylation of mRNA in prokaryotesAnnu Rev Biochem 66 173ndash197

38 Yehudai-ResheffS and SchusterG (2000) Characterization of theEcoli poly(A) polymerase nucleotide specificity RNA-bindingaffinities and RNA structure dependence Nucleic Acids Res 281139ndash1144

39 RaabeCA HoeCH RandauG BrosiusJ TangTH andRozhdestvenskyTS (2011) The rocks and shallows of deep RNAsequencing Examples in the Vibrio cholerae RNome RNA 171357ndash1366

40 RaabeCA SanchezCP RandauG RobeckT SkryabinBVChinniSV KubeM ReinhardtR NgGH ManickamRet al (2010) A global view of the nonprotein-codingtranscriptome in Plasmodium falciparum Nucleic Acids Res 38608ndash617

41 SharmaCM HoffmannS DarfeuilleF ReignierJ FindeissSSittkaA ChabasS ReicheK HackermullerJ ReinhardtRet al (2010) The primary transcriptome of the major humanpathogen Helicobacter pylori Nature 464 250ndash255

42 SharmaCM and VogelJ (2009) Experimental approaches forthe discovery and characterization of regulatory small RNACurr Opin Microbiol 12 536ndash546

43 YuB YangZ LiJ MinakhinaS YangM PadgettRWStewardR and ChenX (2005) Methylation as a crucial step inplant microRNA biogenesis Science 307 932ndash935

44 HouwingS KammingaLM BerezikovE CronemboldDGirardA van den ElstH FilippovDV BlaserH RazEMoensCB et al (2007) A role for Piwi and piRNAs in germcell maintenance and transposon silencing in Zebrafish Cell 12969ndash82

45 AckermannHW (2003) Bacteriophage observations andevolution Res Microbiol 154 245ndash251

46 RossmannMG MesyanzhinovVV ArisakaF andLeimanPG (2004) The bacteriophage T4 DNA injectionmachine Curr Opin Struct Biol 14 171ndash180

47 HoCK and ShumanS (2002) Bacteriophage T4 RNA ligase 2(gp241) exemplifies a family of RNA ligases found in allphylogenetic domains Proc Natl Acad Sci USA 9912709ndash12714

48 YinS HoCK and ShumanS (2003) Structure-function analysisof T4 RNA ligase 2 J Biol Chem 278 17601ndash17608

49 AravinA and TuschlT (2005) Identification and characterizationof small RNAs involved in RNA silencing FEBS Lett 5795830ndash5840

50 ViolletS FuchsRT MunafoDB ZhuangF and RobbGB(2011) T4 RNA ligase 2 truncated active site mutants improvedtools for RNA analysis BMC Biotechnol 11 72

51 UhlenbeckOC and GumportRI (1982) T4 RNA LigaseIn BoyerPD (ed) The Enzymes Nucleic Acids Part BAcademic Press Inc New York pp 31ndash58

52 HoCK WangLK LimaCD and ShumanS (2004) Structureand mechanism of RNA ligase Structure 12 327ndash339

53 El OmariK RenJ BirdLE BonaMK KlarmannGLeGriceSF and StammersDK (2006) Molecular architectureand ligand recognition determinants for T4 RNA ligase J BiolChem 281 1573ndash1579

54 StarkMR PleissJA DerasM ScaringeSA and RaderSD(2006) An RNA ligase-mediated method for the efficient creationof large synthetic RNAs RNA 12 2014ndash2019

55 CranstonJW SilberR MalathiVG and HurwitzJ (1974)Studies on ribonucleic acid ligase Characterization of anadenosine triphosphate-inorganic pyrophosphate exchangereaction and demonstration of an enzyme-adenylate complex withT4 bacteriophage-induced enzyme J Biol Chem 2497447ndash7456

56 HigginsNP GeballeAP SnopekTJ SuginoA andCozzarelliNR (1977) Bacteriophage T4 RNA ligase preparationof a physically homogeneous nuclease-free enzyme fromhyperproducing infected cells Nucleic Acids Res 4 3175ndash3186

57 PfefferS Lagos-QuintanaM and TuschlT (2005) Cloning ofsmall RNA molecules Curr Protoc Mol Biol Chapter 26 Unit26 24

58 BruceAG and UhlenbeckOC (1978) Reactions at the terminiof tRNA with T4 RNA ligase Nucleic Acids Res 5 3665ndash3677

59 VigneaultF SismourAM and ChurchGM (2008) EfficientmicroRNA capture and bar-coding via enzymatic oligonucleotideadenylation Nat Methods 5 777ndash779

60 LiuJM LivnyJ LawrenceMS KimballMD WaldorMKand CamilliA (2009) Experimental discovery of sRNAs in Vibriocholerae by direct cloning 5StRNA depletion and parallelsequencing Nucleic Acids Res 37 e46

61 KrugM and UhlenbeckOC (1982) Reversal of T4 RNA ligaseBiochemistry 21 1858ndash1864

62 HafnerM LandgrafP LudwigJ RiceA OjoT LinCHolochD LimC and TuschlT (2008) Identification ofmicroRNAs and other small regulatory RNAs using cDNAlibrary sequencing Methods 44 3ndash12

63 JayaprakashAD JabadoO BrownBD and SachidanandamR(2011) Identification and remediation of biases in the activity ofRNA ligases in small-RNA deep sequencing Nucleic Acids Res39 e141

64 YewCW and KumarSV (2012) Isolation and cloning ofmicroRNAs from recalcitrant plant tissues with small amounts oftotal RNA a step-by step approach Mol Biol Rep 391783ndash1790

65 KurthHM and MochizukiK (2009) 2rsquo-O-methylation stabilizesPiwi-associated small RNAs and ensures DNA elimination inTetrahymena RNA 15 675ndash685

66 EbhardtHA ThiEP WangMB and UnrauPJ (2005)Extensive 30 modification of plant small RNAs is modulated byhelper component-proteinase expression Proc Natl Acad SciUSA 102 13398ndash13403

67 WangJ CzechB CrunkA WallaceA MitrevaMHannonGJ and DavisRE (2011) Deep small RNA sequencingfrom the nematode Ascaris reveals conservation functionaldiversification and novel developmental profiles Genome Res 211462ndash1477

68 ZhuangF FuchsRT and RobbGB (2012) Small RNAexpression profiling by high-throughput sequencing implicationsof enzymatic manipulation J Nucleic Acids 2012 360358

69 FujimuraT and EstebanR (2012) Cap snatching of yeast L-Adouble-stranded RNA virus can operate in trans and requiresviral polymerase actively engaging in transcription J Biol Chem287 12797ndash12804

70 BessmanMJ WalshJD DunnCA SwaminathanJWeldonJE and ShenJ (2001) The gene ygdP associated withthe invasiveness of Escherichia coli K1 designates a Nudixhydrolase Orf176 active on adenosine (50)-pentaphospho-(50)-adenosine (Ap5A) J Biol Chem 276 37834ndash37838

71 DeanaA CelesnikH and BelascoJG (2008) The bacterialenzyme RppH triggers messenger RNA degradation by50 pyrophosphate removal Nature 451 355ndash358

72 BlewettN CollerJ and GoldstrohmA (2011) A quantitativeassay for measuring mRNA decapping by splinted ligation reversetranscription polymerase chain reaction qSL-RT-PCR RNA 17535ndash543

73 KrogerC DillonSC CameronAD PapenfortKSivasankaranSK HokampK ChaoY SittkaA HebrardMHandlerK et al (2012) The transcriptional landscape and smallRNAs of Salmonella enterica serovar Typhimurium Proc NatlAcad Sci USA 109 E1277ndashE1286

74 EnglandTE and UhlenbeckOC (1978) Enzymaticoligoribonucleotide synthesis with T4 RNA ligase Biochemistry17 2069ndash2076

75 SorefanK PaisH HallAE KozomaraA Griffiths-JonesSMoultonV and DalmayT (2012) Reducing ligation bias of smallRNAs in libraries for next generation sequencing Silence 3 4

76 ZhelkovskyAM and McReynoldsLA (2011) Simple andefficient synthesis of 50 pre-adenylated DNA using thermostableRNA ligase Nucleic Acids Res 39 e117

77 McCoyMI and GumportRI (1980) T4 ribonucleic acid ligasejoins single-strand oligo(deoxyribonucleotides) Biochemistry 19635ndash642

12 Nucleic Acids Research 2013

78 AlonS VigneaultF EminagaS ChristodoulouDCSeidmanJG ChurchGM and EisenbergE (2011) Barcodingbias in high-throughput multiplex sequencing of miRNA GenomeRes 21 1506ndash1511

79 SunG WuX WangJ LiH LiX GaoH RossiJ andYenY (2011) A bias-reducing strategy in profiling small RNAsusing Solexa RNA 17 2256ndash2262

80 Van NieuwerburghF SoetaertS PodshivalovaK Ay-LinWangE SchafferL DeforceD SalomonDR HeadSR andOrdoukhanianP (2011) Quantitative bias in Illumina TruSeq anda novel post amplification barcoding strategy for multiplexedDNA and small RNA deep sequencing PLoS One 6 e26969

81 VigneaultF Ter-OvanesyanD AlonS EminagaSC ChristodoulouD SeidmanJG EisenbergE and MChurchG (2012) High-throughput multiplex sequencingof miRNA Curr Protoc Hum Genet Chapter 11 Unit 1112 11-10

82 PakJ and FireA (2007) Distinct populations of primary andsecondary effectors during RNAi in C elegans Science 315241ndash244

83 ZhuYY MachlederEM ChenchikA LiR and SiebertPD(2001) Reverse transcriptase template switching a SMARTapproach for full-length cDNA library constructionBiotechniques 30 892ndash897

84 LevinJZ YassourM AdiconisX NusbaumCThompsonDA FriedmanN GnirkeA and RegevA (2010)Comprehensive comparative analysis of strand-specific RNAsequencing methods Nat Methods 7 709ndash715

85 AndoY BurroughsAM KawanoM de HoonMJ andHayashizakiY (2012) Targeted methods to improve small RNAprofiles generated by deep sequencing In MallickB andGhoshZ (eds) Regulatory RNAs Basics Methods andApplications Springer-Verlag Heidelberg Germany pp 253ndash271

86 KawanoM KawazuC LizioM KawajiH CarninciPSuzukiH and HayashizakiY (2010) Reduction of non-insertsequence reads by dimer eliminator LNA oligonucleotide forsmall RNA deep sequencing Biotechniques 49 751ndash755

87 KapranovP OzsolakF and MilosPM (2012) Profiling of shortRNAs using Helicos single-molecule sequencing Methods MolBiol 822 219ndash232

88 KanagawaT (2003) Bias and artifacts in multitemplatepolymerase chain reactions (PCR) J Biosci Bioeng 96 317ndash323

89 DayDJ SpeiserPW SchulzeE BettendorfM FitnessJBaranyF and WhitePC (1996) Identification of non-amplifyingCYP21 genes when using PCR-based diagnosis of 21-hydroxylasedeficiency in congenital adrenal hyperplasia (CAH) affectedpedigrees Hum Mol Genet 5 2039ndash2048

90 AirdD RossMG ChenWS DanielssonM FennellTRussC JaffeDB NusbaumC and GnirkeA (2011) Analyzingand minimizing PCR amplification bias in Illumina sequencinglibraries Genome Biol 12 R18

91 UllahMF and AatifM (2009) The footprints of cancerdevelopment cancer biomarkers Cancer Treat Rev 35 193ndash200

92 MoMH ChenL FuY WangW and FuSW (2012) Cell-freeCirculating miRNA Biomarkers in Cancer J Cancer 3 432ndash448

93 ZenK and ZhangCY (2012) Circulating microRNAs a novelclass of biomarkers to diagnose and monitor human cancersMed Res Rev 32 326ndash348

94 KimT and ReitmairA (2013) Non-Coding RNAs FunctionalAspects and Diagnostic Utility in Oncology Int J Mol Sci 144934ndash4968

95 PatnaikSK MallickR and YendamuriS (2010) Detection ofmicroRNAs in dried serum blots Anal Biochem 407 147ndash149

96 MitchellPS ParkinRK KrohEM FritzBR WymanSKPogosova-AgadjanyanEL PetersonA NoteboomJOrsquoBriantKC AllenA et al (2008) Circulating microRNAs as

stable blood-based markers for cancer detection Proc NatlAcad Sci USA 105 10513ndash10518

97 CieslaM SkrzypekK KozakowskaM LobodaAJozkowiczA and DulakJ (2011) MicroRNAs as biomarkers ofdisease onset Anal Bioanal Chem 401 2051ndash2061

98 OzsolakF (2012) Third-generation sequencing techniques andapplications to drug discovery Expert Opin Drug Discov 7231ndash243

99 TsongalisGJ CalinG CordelierP CroceC MonzonF andSzafranska-SchwarzbachAE (2013) MicroRNA analysis is itready for prime time Clin Chem 59 343ndash347

100 AmiraN Cancel-TassinG BernardiniS Cochand-PriolletBBittardH ManginP FournierG LatilA and CussenotO(2004) Expression in bladder transitional cell carcinoma by real-time quantitative reverse transcription polymerase chain reactionarray of 65 genes at the tumor suppressor locus 9q341-2identification of 5 candidates tumor suppressor genes Int JCancer 111 539ndash542

101 ZhangX (2007) Biomarker validation movement towardspersonalized medicine Expert Rev Mol Diagn 7 469ndash471

102 SchmittgenTD LeeEJ and JiangJ (2008) High-throughputreal-time PCR Methods Mol Biol 429 89ndash98

103 BenesV and CastoldiM (2010) Expression profiling ofmicroRNA using real-time quantitative PCR how to use it andwhat is available Methods 50 244ndash249

104 BustinSA and NolanT (2004) Pitfalls of quantitative real-timereverse-transcription polymerase chain reaction J Biomol Tech15 155ndash166

105 BeitzingerM and MeisterG (2011) Experimental identificationof microRNA targets by immunoprecipitation of Argonauteprotein complexes Methods Mol Biol 732 153ndash167

106 FiedlerSD CarlettiMZ and ChristensonLK (2010)Quantitative RT-PCR methods for mature microRNA expressionanalysis Methods Mol Biol 630 49ndash64

107 ArefianE KianiJ SoleimaniM ShariatiSA Aghaee-BakhtiariSH AtashiA GheisariY AhmadbeigiN Banaei-MoghaddamAM NaderiM et al (2011) Analysis ofmicroRNA signatures using size-coded ligation-mediated PCRNucleic Acids Res 39 e80

108 JonstrupSP KochJ and KjemsJ (2006) A microRNAdetection system based on padlock probes and rolling circleamplification RNA 12 1747ndash1752

109 LatorraD ArarK and HurleyJM (2003) Designconsiderations and effects of LNA in PCR primers Mol CellProbes 17 253ndash259

110 BustinSA BenesV GarsonJA HellemansJ HuggettJKubistaM MuellerR NolanT PfafflMW ShipleyGLet al (2009) The MIQE guidelines minimum information forpublication of quantitative real-time PCR experiments ClinChem 55 611ndash622

111 TrapnellC WilliamsBA PerteaG MortazaviA KwanGvan BarenMJ SalzbergSL WoldBJ and PachterL (2010)Transcript assembly and quantification by RNA-Seq revealsunannotated transcripts and isoform switching during celldifferentiation Nat Biotechnol 28 511ndash515

112 TrapnellC PachterL and SalzbergSL (2009) TopHatdiscovering splice junctions with RNA-Seq Bioinformatics 251105ndash1111

113 ListerR OrsquoMalleyRC Tonti-FilippiniJ GregoryBDBerryCC MillarAH and EckerJR (2008) Highly integratedsingle-base resolution maps of the epigenome in ArabidopsisCell 133 523ndash536

114 PolidorosAN PasentsisK and TsaftarisAS (2006) Rollingcircle amplification-RACE a method for simultaneous isolationof 50 and 30 cDNA ends from amplified cDNA templatesBiotechniques 41 35ndash42

Nucleic Acids Research 2013 13

Page 6: SURVEY AND SUMMARY Biases in small RNA deep sequencing data

(5064) Hence additives that increase the effective mo-lecular concentration have a major influence on productyields and meta-analyses of data generated by differentligases or even in different buffer compositions are error-proneAs an aside specific enrichment of 30-end 20-

O-methylated RNAs like the piwi-interacting RNAs(piRNAs) or plant miRNAs is achieved by periodatetreatment The reaction relies on vicinal hydroxyl groupsto oxidize sugar moieties and is accompanied by pentosering opening The resulting vicinal dialdehyde reacts withwater and forms a chemical equilibrium with thecorresponding dioxane derivative upon ring closure The20-O-methyl group-containing RNAs are insensitive tooxidation and remain accessible for adapter ligation ortailing (65ndash67)

Impact of RNA secondary structure and adapter-RNAcofolding on ligationRecent analysis indicates that RNA secondary structureimpacts sRNA-seq bias (29) During kinetic profiling ofRNAndashadapter ligation test miRNAs were trapped in non-reactive states Denaturing the remaining non-ligatedRNAs resulted in increased ligation yield indirectly sug-gesting the influence of RNA secondary structure (29) Tofurther analyze the impact of secondary structure on RNA30-end ligation a pool of substrate RNAs with 21randomized nucleotides at their 30-ends was ligated withindividual pre-adenylated DNA adapters and reactionyields were subsequently quantified by deep sequencing(33) To avoid bias caused by template-specific RNA50-end modification(s) all test RNAs contained identical50-termini (33) RNAs that harbored stems at their 30-endswere underrepresented in the sequencing data whereasRNAs containing a minimum of three unstructured nu-cleotides at the 30-ends were overrepresented The dataindicate that RNA ligase-mediated reactions in generalare more efficacious with RNA substrates that harborstructurally accessible 30-ends (33)In addition the cofolding of RNA and adapter mol-

ecules impacts ligation (33) Adapter sequence variationthat interfered with the RNA 30-stem formation andpermitted intermolecular base-pairing increased ligationyields (33) Interestingly this data contradicts prioranalysis emphasizing a predominant primary sequence-related influence on data distortion (3363) Althoughthe actual source of discrepancy remains to be analyzedprotocol-specific variation probably due to changes in de-naturation is likely to have caused the observed difference(see supplementary information) Irrespective of the actualreason the advice for minimizing this bias is identical andrelies on the application of adapter pools to increase thelikelihood of productive interactions (33) Thus inaddition to the RNA nucleotide composition RNAndashadapter cofolding and secondary structure-related influ-ences contribute to sRNA-seq bias

RNA 50-end ligationThe chemical properties of RNA 50-ends display structuralvariations Depending on the starting material and experi-mental design it may be necessary to alter the chemistry

of RNA 50-termini prior to cDNA generation Forexample neither 50-triphosphorylated RNAs in prokary-otes nor primary eukaryotic RNA polymerase III tran-scripts are accessible for direct 5-adapter ligationSimilarly RNA polymerase II-specific trimethylguanosinecap structures render RNA 50-ends inaccessible Thusenzymatic pretreatment is usually conducted to avoidbias in the starting material (Figure 1 SupplementaryData and Supplementary Table S1)

Specific biochemistry of RNA 50-termini and enrichmentof various RNA classesAlthough the various chemical properties of RNA 50-endsestablish limitations for unbiased representation they alsopermit specific enrichments (41) (Figure 1 SupplementaryData and Supplementary Table S1) RNAs that containmonophosphorylated 50-ends and 30-hydroxyl groups aredirectly accessible to cDNA generation proceduresConsequently unaltered RNA starting material leadsto its specific enrichment (Figure 1 SupplementaryTable S1) In contrast triphosphorylated or 50-cappedRNAs require TAP treatment to generate 50-monophosphorylated RNAs and to permit 50-end ligation

Biochemical alternatives to specifically cleave pyrophos-phate bonds in 50-triphophorylated RNAs were reportedrecently (68) whereby TAP-mediated hydrolysis issubstituted by polyphosphatase (69) or pyrophospho-hydrolase (7071) treatment In vitro analysis to investigatethe substrate specificity of E coli rppH pyrophospho-hydrolase indicated that pyrophosphate hydrolysisproceeds with identical efficacy irrespective of the RNA50-terminal nucleotide (71) Furthermore a subtle declinein the enzymatic activity in response to potentially inhibi-tory RNA 50-terminal stem loop structures was reported(71) These strategies are particularly appealing when bac-terial primary transcripts or nascent eukaryotic RNA poly-merase III transcripts are central to the investigation Inaddition to targeting triphophorylated RNA 50-ends poly-phosphatase also accepts diphophorylated RNA 50-terminito catalyze pyrophosphate bond cleavage It should beemphasized that 50-capped RNA polymerase II transcriptsare not amenable to polyphosphatase or pyrophospho-hydrolase treatment and therefore escape identification

The application of a TerminatorTM 50-phosphate-dependent exonuclease is frequently utilized to enrich for50-capped and 50-triphospharylated RNAs (417273)The inherent 50 30 exonuclease activity digests50-monophosphorylated RNAs and leaves other tran-scripts unaltered (Figure 1 Supplementary Data andSupplementary Table S1) Therefore different chemistriesof RNA 50-ends not only distort sRNA-seq but also allowfor specific enrichment protocols Subsequent analysispermits the distinction among RNA subsets such asprimary transcripts capped RNAs and other productsof RNA 50-end processing (417273)

Nucleotide preference of RNA 50 ligationApart from specific differences in RNA 50-ends the influ-ence of the adapter sequence itself on ligation efficacy iswell documented Biochemical data indicated that ligationreactions proceeded most effectively when an adenosine

6 Nucleic Acids Research 2013

residue occupied the 30-terminal position of adapter(acceptor) molecules (5174) Recently adapter pools dis-playing sequence identity at all except the two 30-terminalnucleotides were used to investigate sequence-dependentvariation in acceptor function Based on deep sequencingas the readout the relative ranking of miRNA frequenciesvaried as a consequence of the specific adapterndashRNA com-binations (63) Once more miRNA expression profiles arenot only a function of true RNA abundance but alsoreflect varying ligation efficacies (63) Interestingly theuse of adapter pools highly improved the correlation ofmiRNA levels between replicates (63)

Influence of RNA secondary structure on RNA 50-endligationTo investigate potential influences on 50-end modificationderived from RNA secondary structure ligation yieldswere compared among test sets of fully randomizedRNAs A remarkable underrepresentation of double-strand folds at 50-ligation sites indicated strong preferencesfor single-stranded RNA substrates in ligation reactionsconducted with T4Rnl1 (75)

Sugar moiety-dependent change in ligation efficacyDNA-based adapter oligonucleotides performwell in RNA30-end ligation reactions (especially when 50-adenylated)whereas RNA-based adapters are preferred for RNA50-end modification (76) Kinetic analysis of DNA-basedadapter molecules for RNA 50-ligation demonstratedslower reactions compared with their RNA counterparts(77) Recent analyses revealed more complex substrate-specific effects between test RNAs and different adaptersWhen DNA- and RNA-based adapters of identical se-quences were investigated in ligation assays no generalpreference with regard to the sugar moiety of the adaptermolecule was observed in RNA 50-end ligation (3963) Inconclusion to avoid bias adapter pools are recommendedto display sequence variation The chance of productiveligation might be enhanced using RNA DNA or mixedadapter oligonucleotides (3963)

Adapter barcoding influences ligationIn addition to nucleotide preferences of the reactingresidues upstream sequences also exert influence onligation yields Adapters harboring different Roche MIDtags placed 19 nucleotides upstream from the reacting30-ends performed entirely differently when assayed withidentical test RNAs (39) Depending on the combinationof in vitro transcribed non-protein coding RNAs andMID-modified adapters ligation product levels (calculatedas non-protein coding RNA input) ranged from 60ndash70to undetectable (39) Recent reports emphasized thatbarcoding for multiplex sequencing leads to significantbias in miRNA expression profiles (78ndash80) This bias wasdiminished when barcodes were introduced by PCRamplification of cDNA (78ndash81)

Quantitative considerations and barcodingIn general the discovery of novel RNAs and analyses oftranscripts that are expressed at very low levels requiresubstantially more cDNA reads than experiments

directed toward investigating differential gene expressionsHence the design of sRNA-seq experiments needs totake into account the rationale of the investigationContigs formed by continuously overlapping readsduring genomic mapping procedures represent thebio-computational equivalent of individual RNAentities Any selected read during cDNA mapping eithercontributes to pre-existing already populated RNAs ormight generate novel contigs The probability of eitherprocess is a function of RNA size as well as the corres-ponding RNA expression value Of course contigs repre-senting longer rather than shorter RNAs are more likelyto be populated by chance RNAs of higher relative ex-pression levels are generally more densely packed withcDNA reads than RNAs that display medium or low tran-scription levels Consequently if multiplexing is chosenfor economic reasons the necessary sequencing depthmust be approximated prior to sequence runs in order togenerate adequate sample sizes The two fixed parametersfor estimating admissible multiplexing are therefore (i) ex-perimental goal and (ii) lane capacity According toIllumina TruSeq Small RNA Sample Prep Kitguidance 5 million sRNA-seq reads are considered appro-priate for novel miRNA discovery within human-sizedgenomes For miRNA profiling usually about 25 millionreads are recommended These approximations permit re-calculation geared toward specific needs and requirements

Alternative cloning strategies to avoid 50-end modificationBecause the 30-terminal position of cDNA corresponds tothe RNA 50-ends Pak et al (82) used a technique to avoidmodification of RNA 50-termini and therefore to minimizethe need for enzymatic pretreatment The strategy intro-duces adapter sequences not to RNA 50-termini butrather to 30-ends of the corresponding cDNAs (82) Theadvantage of omitting enzymatic treatment of RNA50-ends might be offset by incomplete reverse transcriptionthat precludes full-length cDNA cloning or appropriateassessment of RNA 50-endsRecently ligation to cDNA 30-ends was investigated in

more detail (36) Perfect double-stranded DNA-RNA testhybrids and single-stranded DNAs were utilized to mimicpotential products of reverse transcription All T4 DNA-and RNA-ligases including T4Rnl2 truncated derivativeswere compared to delineate optimal conditions for cDNA30-end modification Surprisingly ligation reactionscarried out with pre-adenylated DNA donor moleculesstrongly indicated that double-stranded DNA-RNAchimera constituted the most reactive substrates forT4Rnl1-mediated ligation In addition supplements ofPEG 8000 strikingly influenced ligation final concentra-tions of 25 PEG 8000 increased reaction yields Identicalsingle-stranded test cDNAs exhibited only poor substratecharacteristics The A-form-like conformation of theDNA-RNA hybrid might have compensated for theprimary structure-related influences and permitted moresimilar ligation outcomes among substrates (36)However the analysis examined only one test substrateand therefore might not be amenable to drawing generalconclusions (36) A further interesting observation indi-cates that cDNA 50-terminal nucleotides influence cDNA

Nucleic Acids Research 2013 7

30-end ligation yields (39) As the cDNA 50-terminalsequence is a consequence of specific RNA 30-end modifi-cation differences between protocols exert different influ-ences on cDNA 30-end adapter ligation and in turn mightalso cause expression profile bias (39)The lsquoSmartTM Approachrsquo (83) offers another interesting

alternative to capture mature RNA 50-ends and to avoidRNA 50-end adapter ligation The technology is based onthe template switching capacity of Moloney murineleukemia virus (M-MLV) reverse transcriptase Oncefirst strand synthesis approaches the RNA 50-end theterminal transferase activity of M-MLV reverse transcript-ase adds oligo(C)-stretches to the growing cDNAOligonucleotides that 30-terminate with oligo(G) stretchesserve to extend templates and permit reverse transcriptionto proceed Thus M-MLV reverse transcriptase enablesthe generation of cDNAs with known terminal sequenceelements (83) and thereby eliminates the need to modifythe RNA 50-ends Recently the technology was adapted tothe specific needs of Ilumina RNA-seq analysis (84)

Adapterndashadapter ligation disturbs sRNA-seqA side effect common to sRNA-seq protocols that rely onligase-mediated sequential RNA end-modification is PCRamplification of empty adapterndashadapter ligation products(8185) Adapterndashdimer formation is a direct consequenceof the molecular adapter excess employed to ensure suffi-cient product yield and might dominate in reactions wherethe concentration of total RNA starting material is low(8186) Adapter molecules that are 50-pre-adenylatedremain in reaction mixtures after RNA 30-end modifica-tion and react with RNA 50-adapter oligonucleotideswhich subsequently favors dimer formation Thereforepurifying products away from excess adapters via gelelectrophoresis is recommended (78) However for prep-aration of tiny RNAs such as miRNAs separation of thesmall size difference between ligated product and adaptersis not trivial Alternative approaches to avoid size frac-tionation rely on application of complementary lsquodimer-eliminator-LNA-oligonucleotidesrsquo (8586) The interferingoligonucleotide is designed to overlap the adapterndashadapterligation junction and thereby distinguish between targetedside product and adapter ligated RNA (86) The comple-mentarity of the interfering oligonucleotides extends intothe RNA 30-adapter sequence thus blocking primerannealing during reverse transcription and as a resultexcludes adapter dimers from RNA sequencing (86)Vigneault et al reported a similar strategy to overcomeunintended dimer amplification (81) Pre-annealing of theRT primer after RNA 30-adapter ligation but prior toRNA 50-end modification efficiently decreased adapterdimer formation The excess of non-ligated free RNA30-adapters was captured by complementary reverse tran-scriptase primers and sequestered in less-reactive double-stranded structures (81) The latter approach permitseffective reduction of dimer amplification and is reportedto increase cDNA library complexity (81) ObviouslyRNA tailing strategies (see above) are superior becausethey completely circumvent the problem of adapter-dimer formation To avoid adapter tailing as an analogous

side reaction to dimer formation it is advisable to carryout 30-tailing prior to 50-adapter ligation

Depending on the experimental goal sRNA-seqrequires varying amounts of cDNA reads (see above)Therefore techniques to reduce the amount of readsthat represent empty adapter dimers are of central import-ance as they permit an increase in sequencing depthreducing costs and ultimately enabling novel smallnon-protein coding RNA gene discovery

PCR AMPLIFICATION BIAS

Apart from novel third-generation sequencingtechnologies that permit direct sequence analysis of un-amplified cDNA all commonly used protocols implementPCR amplification (87) In addition the introduction ofbarcode identifiers by PCR amplification may also avoidbias introduced by differences in RNAndashadapter-specificligation efficacies (78) Therefore multiplexing of differentsamples during PCR is widely regarded to permit meta-analysis (78) However even PCR amplification itself isknown to introduce expression bias into datasets(303288) In particular when many different templatesare amplified in parallel variations not only in templatelength but also in base composition might lead topreferred template-specific amplifications (30) Extremeexamples of allelic dropout due to single base-exchangewere reported for PCR-mediated genotyping (89) TheRNA GC-content is especially responsible for data dis-tortion Generally cDNAs of high GC-content are lessreadily amplified and thus remain underrepresented Forexample read distributions calculated over the entire 28SrRNA were inversely correlated with the correspondingregional GC-content (32) As argued earlier base com-positions of different small non-protein coding RNAs(miRNAs piRNAs snoRNAs scaRNAs etc) varywidely The parallel amplification of complex templatemixtures that vary in their respective GC-content is there-fore a challenging procedure and likely to influence thecomplexity of the resulting cDNA library (30) Changesin complexity do not only influence distortion of quanti-tative measurements but certain RNAs might also lsquodropoutrsquo completely and remain unidentified Extended periodsof denaturation are considered to effectively decrease thelikelihood of uneven amplification (90) PCR amplifica-tion does not only change the overall GC-content of theresulting cDNA pool but is also reported to introducelength bias (30) which might be an important factorwhen RNAs of different sizes are subjected to parallelcDNA preparations Shorter sequences of significantlylower GC-content are likely to be overrepresented

Recent analysis also emphasized the impact of PCRbuffer compositions on PCR amplification bias (30)Remarkably PCR-mediated bias was already observedin some cases within 15 rounds of PCR amplification(30) While the popular Phusion PCR system surprisinglyperformed the worst (30) data indicate that someoptimized PCR buffers are capable of significantlyreducing PCR-mediated distortion (3090)

8 Nucleic Acids Research 2013

sRNA-SEQ AND PERSONALIZED MEDICINE

In recent years increasing attention has focused on thepotential of miRNA transcriptional profiling to diagnosedisease or to monitor progression and success in treatment(161791) The decreasing costs of high-throughputsequence analysis favors sRNA-seq as a common toolfor medical diagnostics and personalized medicine Aftercell death the total RNA of circulating cancer cells isreleased into the blood stream (92) These extracellularRNAs (exRNAs) are packaged within vesicles such asexosomes where they are complexed with differentproteins for example miRNAs in the RISC complexVesicles protect exRNA from ribonucleases and preventdegradation Among the various exRNAs miRNAs aresome of the most stable molecules (93) Easy accessibilityof patient blood samples and the stability of miRNAsmake them effective biomarkers (94) Diagnostics basedon miRNA were successfully conducted on biologicalsamples of low quality including dried blood (95) Inaddition miRNAs remain stable even when samples aresubjected to repeated rounds of tissue freezing andthawing which usually degrades RNAs (96) A furtherappealing feature of miRNAs in medical research is thatchanges in their expression profiles are observed duringdisease onset (97) In addition miRNAs often displaytissue-specific expression Therefore it might be possibleto identify the cellular origin of most metastases (91)possibly providing decision guidance for appropriatetreatment However as elaborated in detail miRNA ex-pression profiles detected by deep sequencing do not onlyreflect changes in relative expression but are alsoinfluenced by miRNA-adapter-specific ligation efficaciesespecially in cost-effective parallel analyses that oftenrequire multiplexing Different barcode introduced byligation generate bias in miRNA representation and abun-dance (78) Even when incorporated by PCR barcodingmight still lead to data distortion (30ndash32) Third-gener-ation sequencing techniques such as the Helicos (InNovember 2012 Helicos filed bankruptcy (httpbizyahoocome121115hlcs8-khtml)) system avoid PCRamplification steps in cDNA preparation and permitdirect sequencing of unamplified samples (87) The RNA30-end modification to prime cDNA synthesis (or directsequencing) in addition to relatively high error frequencies(15) might still lead to bias and so would not beespecially applicable in personalized medicine (98)Collectively deep sequencing to monitor change inmedical samples is of importance and certainly ofinterest but requires resolution of serious standardizationissues prior to general application

qRT-PCR a valid alternative to deep sequencing

Alternative techniques to quantify specific miRNA testsets rely on quantitative real-time reverse transcriptionPCR (qRT-PCR) (99) In particular diagnostic screensin cancer medicine rely on the application of qRT-PCR(99ndash101) In cases where biomarker disease correlation iswell established and when quantification of limitednumbers of biomarkers is sufficient for diagnosis and tomonitor disease progression or treatment qRT-PCR is

a valuable tool (99) Recently the convenience ofqRT-PCR-based medical diagnosis was suggested tomake it superior in routine clinical applications (99)Interestingly novel developments in qRT-PCR allowsample processing in 384-well formats and liken thismethod to cost-effective high-throughput applications(99102) A further advantage of qRT-PCR is the reduc-tion in downstream bio-computation (99103) The timebetween qRT-PCR and diagnostic result is shorter thanfor sRNA-seq or microarray-based inference Howeverthe analysis of miRNA expression profiles by qRT-PCRis technically challenging due to RNA-inherent pitfalls(103104)The miRNAs lack structural features such as mRNA

poly(A) tails that could serve as class-specific enrichmentor priming of reverse transcription (103) Alternative tech-niques for miRNA sample enrichment would rely onimmune-precipitation but most of them are not suitablefor typical small size samples in medical analysis (105) Asmentioned above the heterogeneity of the RNA GC-content translates into a wide range of primer annealingtemperatures for a population of miRNAs and mighttherefore complicate parallel analysis of multiple RNAs(103) Furthermore the qRT-PCR template is presentnot only within the mature miRNA molecule but also ispart of various precursor transcripts (103) It is thereforeimportant to ensure that primary or precursor-miRNAs(pri- or pre-miRNAs) do not provide templates inqRT-PCR amplification and as a consequence do notcontribute to the signal intensities detected Difficultiesthat relate to the similarity displayed between differentmiRNA isoforms are mentioned above within thecontext of array-based inference to quantify differentialgene expressionThus far two alternative approaches are commonly

utilized to reverse transcribe miRNA templates(103106) miRNA-specific stem-loop primers are oftenused when only limited numbers of RNAs are subject toquantification A major advantage of using miRNA-specific primers for reverse transcription is an increase insensitivity and specificity However RNA 30-end-tailing-based approaches enable bulk reverse transcription ofmultiple targets in parallel and are therefore advantageouswhen many miRNAs are analyzed (see above) A recentcomparative analysis to investigate reverse transcriptionefficacies of the two methods uncovered only minor dif-ferences (106) The analysis involved only a limitedmiRNA test set and therefore the results cannot begeneralized As an aside the oligo d(T)-based designmight also enable reverse transcription of miRNA andthe corresponding mRNA target in single test tube reac-tions (106) Ligation-mediated PCR or padlock probes incombination with rolling circle amplification are technicalalternatives for avoiding reverse transcription (107108)The methods rely on ligation-mediated template gener-ation and are therefore not devoid of bias (see above)Both approaches permit generation of miRNA signaturesacross samples and do not require elaborate analyticalpost-processing or expensive technical equipment(107108) However as elaborated here and abovereverse transcription is not devoid of template-specific

Nucleic Acids Research 2013 9

bias In particular RNA secondary structure and RNAmodifications both influence reverse transcription efficacy(see above) Similar to PCRs reverse transcription is nega-tively affected by increasing RNA GC-contentBecause the relative RNA GC-content dictates primer

Tm qRT-PCRs are also strongly influenced by forwardprimer design thereby affecting both qRT-PCR specificityand sensitivity Altering the length of primer-templatecomplementarity permits adjustment of the actualprimer Tm Conversely in cases where RNA nucleotidecomposition would dictate very low melting temperaturesLNA-modified nucleotides might be incorporated into theprimer sequence to increase the corresponding primer Tm(18) However the design of LNA-containing oligonucleo-tides is not always trivial (109) For the actual measure-ment of miRNA in real time two alternative fluorescentreporter molecules are frequently utilized SYBR Greenand the Taqman assay (103) Detection of the fluores-cence signal emitted by SYBR Green increases about100-fold as the molecule intercalates in double-strandedDNA However the mechanism of DNA intercalationsuggests the dye is not target-specific and thereforecannot discriminate primer dimers unspecific productsand target amplification (103) Because the intercalatingdye is simply added to the assay it is often possible toeasily adapt the qRT-PCR assay to already establishedand validated analytical PCRs (104) In comparison theTaqman assay provides increased specificity as the de-tection is based on sequence-specific probendashtarget inter-action Certainly the qRT-PCR assays offer manyadvantages but require evaluation of various factors thatinfluence and potentially bias analysis Thereforeinternational standardization to ensure comparable ex-perimental design and to permit meta-analysis amongdatasets is of prime importance for establishing clinicalqRT-PCR-based personal medicine The MinimumInformation for Publication of Quantitative Real-TimePCR experiments guidelines established standards for bet-ter experimental practice and suggest routes to data ana-lysis that increase reliability and data comparison (110)

mRNA-SEQ

Deep sequencing is not limited to expression profiling ofmiRNA or other small non-protein coding RNAs Evenentire transcriptomes including all protein coding andlong non-protein coding RNAs are subjected to high-throughput sequencing methods Currently as the lengthof single sequence runs is limited (from 100 to 500 nt de-pending on the sequencing platform) longer RNAsrequire fragmentation prior to cDNA synthesis (25)Millions of small reads resulting from mRNA-seq areanalyzed to identify novel transcripts or splice variants(111112) Relative quantification of reads enables one tomonitor differential gene expression among samples andto identify changes in isoform distribution (111) As withsRNA-seq the digital nature of RNA deep sequencing isthought to simplify meta-analysis of mRNA or long non-protein coding RNA expression data (25) Some protocolsfor modification of RNA fragment ends rely on

chemistries identical to sRNA-seq (84113) Therefore ex-pression biases observed in sRNA-seq analysis are alsolikely to cause distortion in whole transcriptome profilingVariations in nucleotide composition and changes in RNAsecondary structure of different RNA fragments mightcorrelate with uneven read distribution Unequal readcoverage generates uncertainty in isoform detection andin extreme cases exons representing unfavorable nucleo-tide compositions might escape analysis As fragmentationis suggested to generate many different but at least par-tially overlapping reads bias in mRNA-seq might be lessprominent Recently a novel technique based on circligasewas introduced to avoid RNA 50-end ligation (114)Reverse transcription is conducted with anchoredoligo(dT) primers that contain an additional sequence topermit later PCR amplification Following first strandcDNA synthesis circligase-mediated ligation is carriedout The resulting circular molecules are subjected toPCR amplification The strategy is of general interestand might be a valuable tool for sRNA-seq as well

CONCLUSION

Bias in general describes systematic errors that reflectmethod-related distortion from the truth Bias might bedetected easily when data generated by different methodsare subjected to analytical comparison Because it is sys-tematic in nature bias is not subject to variation inrepeated experiments Here based on biochemical anddeep sequencing data analyses we have presentedvarious steps in sRNA-seq protocols likely to causesevere distortions in the relative expression levels of indi-vidual sRNAs

The efficacy of RNA end-modification to permit cDNAconstruction depends not only on differences in strategiesand reagents but also on changes in among others buffercompositions and additives that could increase molecularcrowding Therefore meta-analysis of data generated withnon-identical methods reagents or even buffers is likely tobe error-prone (36396378) To achieve comparableresults and to evaluate relative changes in the quantifica-tion of the resulting data experiments and analyses shouldbe conducted under strictly identical conditions It has tobe stressed that bias in RNA-specific variation is not onlyrestricted to inter-sample comparison but also effects therelative ranking of intra-sample RNA expression

In addition to the more obvious quantitative aspects ofbias there are also qualitative aspects because RNA-seqmethods are also employed for RNA discovery IndividualRNAs might lsquodrop outrsquo from detection because they arenot amenable to cDNA construction under chosen condi-tions Certainly to increase the likelihood of productiveRNA end-modification and increase the complexity ofsRNA-seq library content the parallel application of dif-ferent strategies and reagents is required (33396379) Inparticular increased variability of adapter sequences (ieadapter pools) helps to increase the diversity of RNAsaccessed (63) A recent report emphasized the need forparallel application of different RNA- and DNA-basedadapter oligonucleotides (39)

10 Nucleic Acids Research 2013

The source of expression level distortion and the failureto detect certain RNA species at all appear not to belimited to reactions of RNAcDNA-end modificationbut also apply to cDNA amplification by PCR (30ndash32)Differences in reaction buffers RNA GC-content sec-ondary structures RNA length and primers might eachlead to bias during PCR Therefore the analysis ofsRNA-seq data to examine relative changes in geneexpression or identification reflects both true RNA abun-dance and biases related to the methods applied

The sRNA-seq approach is also attractive to medicalresearch Deep sequencing might permit the screening ofmultiple patient samples to monitor the progression ofdisease or treatment Therefore it is urgently necessaryto establish rigid international standardizations tominimize distortion and to avoid misleading conclusionsduring RNA-seq data interpretation In summary quali-tative and quantitative RNomics remain more challengingthan anticipated

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online

FUNDING

National Genome Research Network [NGFNIII01GS0808 to JB and TSR] the Deutsche Forschungs-gemeinschaft [BR 7544-1 to JB] USM ResearchUniversity Grant [1001CIPPT81305 to THT]Funding for open access charge The licence was fundedby Dr Brosiusrsquos research budget

Conflict of interest statement None declared

REFERENCES

1 BrosiusJ (2009) The fragmented gene Ann N Y Acad Sci1178 186ndash193

2 BrosiusJ and TiedgeH (2004) RNomenclature RNA Biol 181ndash83

3 LiuQ and ParooZ (2010) Biochemical principles of small RNApathways Annu Rev Biochem 79 295ndash319

4 BrosiusJ (2012) In ErdmannVA and BarciszewskiJ (eds)From Nucleic Acids Sequences to Molecular Medicine RNATechnologies Springer Berlin Heidelberg pp 1ndash18

5 HuttenhoferA BrosiusJ and BachellerieJP (2002) RNomicsidentification and function of small non-messenger RNAs CurrOpin Chem Biol 6 835ndash843

6 AnS and SongJJ (2011) The coded functions of noncodingRNAs for gene regulation Mol Cells 31 491ndash496

7 BushatiN and CohenSM (2007) microRNA functions AnnuRev Cell Dev Biol 23 175ndash205

8 ChiangHR SchoenfeldLW RubyJG AuyeungVCSpiesN BaekD JohnstonWK RussC LuoS BabiarzJEet al (2010) Mammalian microRNAs experimental evaluation ofnovel and previously annotated genes Genes Dev 24 992ndash1009

9 AmbrosV (2001) microRNAs tiny regulators with greatpotential Cell 107 823ndash826

10 LauNC LimLP WeinsteinEG and BartelDP (2001) Anabundant class of tiny RNAs with probable regulatory roles inCaenorhabditis elegans Science 294 858ndash862

11 RuvkunG (2001) Molecular biology Glimpses of a tiny RNAworld Science 294 797ndash799

12 RuvkunGB (2003) The tiny RNA world Harvey Lect 991ndash21

13 GurtanAM and SharpPA (2013) The Role of miRNAs inRegulating Gene Expression Networks J Mol Biol 4253582ndash3600

14 LiangH ZhangJ ZenK ZhangCY and ChenX (2013)Nuclear microRNAs and their unconventional role in regulatingnon-coding RNAs Protein Cell 4 325ndash330

15 HuangV and LiLC (2012) miRNA goes nuclear RNA Biol 9269ndash273

16 WangZ HanJ CuiY FanK and ZhouX (2013) CirculatingmicroRNA-21 as noninvasive predictive biomarker for response incancer immunotherapy Med Hypotheses 81 41ndash43

17 ZhaoC ZhangJ ZhangS YuD ChenY LiuQ ShiMNiC and ZhuM (2013) Diagnostic and biological significance ofmicroRNA-192 in pancreatic ductal adenocarcinoma Oncol Rep30 276ndash284

18 van RooijE (2011) The art of microRNA research Circ Res108 219ndash234

19 LeeLW ZhangS EtheridgeA MaL MartinD GalasDand WangK (2010) Complexity of the microRNA repertoirerevealed by next-generation sequencing RNA 16 2170ndash2180

20 OkoniewskiMJ and MillerCJ (2006) Hybridization interactionsbetween probesets in short oligo microarrays lead to spuriouscorrelations BMC Bioinformatics 7 276

21 MortazaviA WilliamsBA McCueK SchaefferL andWoldB (2008) Mapping and quantifying mammaliantranscriptomes by RNA-Seq Nat Methods 5 621ndash628

22 LiuS LinL JiangP WangD and XingY (2011) Acomparison of RNA-Seq and high-density exon array fordetecting differential gene expression between closely relatedspecies Nucleic Acids Res 39 578ndash588

23 RaghavachariN BarbJ YangY LiuP WoodhouseKLevyD OrsquoDonnellC MunsonPJ and KatoG (2012) Asystematic comparison and evaluation of high density exon arraysand RNA-seq technology used to unravel the peripheral bloodtranscriptome of sickle cell disease BMC Med Genomics 5 28

24 MartinJA and WangZ (2011) Next-generation transcriptomeassembly Nat Rev Genet 12 671ndash682

25 WangZ GersteinM and SnyderM (2009) RNA-Seq arevolutionary tool for transcriptomics Nat Rev Genet 1057ndash63

26 KozomaraA and Griffiths-JonesS (2011) miRBase integratingmicroRNA annotation and deep-sequencing data Nucleic AcidsRes 39 D152ndashD157

27 LinsenSE de WitE JanssensG HeaterS ChapmanLParkinRK FritzB WymanSK de BruijnE VoestEE et al(2009) Limitations and possibilities of small RNA digital geneexpression profiling Nat Methods 6 474ndash476

28 BakerM (2010) MicroRNA profiling separating signal fromnoise Nat Methods 7 687ndash692

29 HafnerM RenwickN BrownM MihailovicA HolochDLinC PenaJT NusbaumJD MorozovP LudwigJ et al(2011) RNA-ligase-dependent biases in miRNA representationin deep-sequenced small RNA cDNA libraries RNA 171697ndash1712

30 DabneyJ and MeyerM (2012) Length and GC-biases duringsequencing library amplification a comparison of variouspolymerase-buffer systems with ancient and modern DNAsequencing libraries Biotechniques 52 87ndash94

31 OrpanaAK HoTH and StenmanJ (2012) Multiple heatpulses during PCR extension enabling amplification of GC-richsequences and reducing amplification bias Anal Chem 842081ndash2087

32 SendlerE JohnsonGD and KrawetzSA (2011) Local andglobal factors affecting RNA sequencing analysis Anal Biochem419 317ndash322

33 ZhuangF FuchsRT SunZ ZhengY and RobbGB (2012)Structural bias in T4 RNA ligase-mediated 30-adapter ligationNucleic Acids Res 40 e54

34 DeChiaraTM and BrosiusJ (1987) Neural BC1 RNA cDNAclones reveal nonrepetitive sequence content Proc Natl AcadSci USA 84 2624ndash2628

Nucleic Acids Research 2013 11

35 WickensM and KwakJE (2008) Molecular biology A tail talefor U Science 319 1344ndash1345

36 MunafoDB and RobbGB (2010) Optimization of enzymaticreaction conditions for generating representative pools of cDNAfrom small RNA RNA 16 2537ndash2552

37 SarkarN (1997) Polyadenylation of mRNA in prokaryotesAnnu Rev Biochem 66 173ndash197

38 Yehudai-ResheffS and SchusterG (2000) Characterization of theEcoli poly(A) polymerase nucleotide specificity RNA-bindingaffinities and RNA structure dependence Nucleic Acids Res 281139ndash1144

39 RaabeCA HoeCH RandauG BrosiusJ TangTH andRozhdestvenskyTS (2011) The rocks and shallows of deep RNAsequencing Examples in the Vibrio cholerae RNome RNA 171357ndash1366

40 RaabeCA SanchezCP RandauG RobeckT SkryabinBVChinniSV KubeM ReinhardtR NgGH ManickamRet al (2010) A global view of the nonprotein-codingtranscriptome in Plasmodium falciparum Nucleic Acids Res 38608ndash617

41 SharmaCM HoffmannS DarfeuilleF ReignierJ FindeissSSittkaA ChabasS ReicheK HackermullerJ ReinhardtRet al (2010) The primary transcriptome of the major humanpathogen Helicobacter pylori Nature 464 250ndash255

42 SharmaCM and VogelJ (2009) Experimental approaches forthe discovery and characterization of regulatory small RNACurr Opin Microbiol 12 536ndash546

43 YuB YangZ LiJ MinakhinaS YangM PadgettRWStewardR and ChenX (2005) Methylation as a crucial step inplant microRNA biogenesis Science 307 932ndash935

44 HouwingS KammingaLM BerezikovE CronemboldDGirardA van den ElstH FilippovDV BlaserH RazEMoensCB et al (2007) A role for Piwi and piRNAs in germcell maintenance and transposon silencing in Zebrafish Cell 12969ndash82

45 AckermannHW (2003) Bacteriophage observations andevolution Res Microbiol 154 245ndash251

46 RossmannMG MesyanzhinovVV ArisakaF andLeimanPG (2004) The bacteriophage T4 DNA injectionmachine Curr Opin Struct Biol 14 171ndash180

47 HoCK and ShumanS (2002) Bacteriophage T4 RNA ligase 2(gp241) exemplifies a family of RNA ligases found in allphylogenetic domains Proc Natl Acad Sci USA 9912709ndash12714

48 YinS HoCK and ShumanS (2003) Structure-function analysisof T4 RNA ligase 2 J Biol Chem 278 17601ndash17608

49 AravinA and TuschlT (2005) Identification and characterizationof small RNAs involved in RNA silencing FEBS Lett 5795830ndash5840

50 ViolletS FuchsRT MunafoDB ZhuangF and RobbGB(2011) T4 RNA ligase 2 truncated active site mutants improvedtools for RNA analysis BMC Biotechnol 11 72

51 UhlenbeckOC and GumportRI (1982) T4 RNA LigaseIn BoyerPD (ed) The Enzymes Nucleic Acids Part BAcademic Press Inc New York pp 31ndash58

52 HoCK WangLK LimaCD and ShumanS (2004) Structureand mechanism of RNA ligase Structure 12 327ndash339

53 El OmariK RenJ BirdLE BonaMK KlarmannGLeGriceSF and StammersDK (2006) Molecular architectureand ligand recognition determinants for T4 RNA ligase J BiolChem 281 1573ndash1579

54 StarkMR PleissJA DerasM ScaringeSA and RaderSD(2006) An RNA ligase-mediated method for the efficient creationof large synthetic RNAs RNA 12 2014ndash2019

55 CranstonJW SilberR MalathiVG and HurwitzJ (1974)Studies on ribonucleic acid ligase Characterization of anadenosine triphosphate-inorganic pyrophosphate exchangereaction and demonstration of an enzyme-adenylate complex withT4 bacteriophage-induced enzyme J Biol Chem 2497447ndash7456

56 HigginsNP GeballeAP SnopekTJ SuginoA andCozzarelliNR (1977) Bacteriophage T4 RNA ligase preparationof a physically homogeneous nuclease-free enzyme fromhyperproducing infected cells Nucleic Acids Res 4 3175ndash3186

57 PfefferS Lagos-QuintanaM and TuschlT (2005) Cloning ofsmall RNA molecules Curr Protoc Mol Biol Chapter 26 Unit26 24

58 BruceAG and UhlenbeckOC (1978) Reactions at the terminiof tRNA with T4 RNA ligase Nucleic Acids Res 5 3665ndash3677

59 VigneaultF SismourAM and ChurchGM (2008) EfficientmicroRNA capture and bar-coding via enzymatic oligonucleotideadenylation Nat Methods 5 777ndash779

60 LiuJM LivnyJ LawrenceMS KimballMD WaldorMKand CamilliA (2009) Experimental discovery of sRNAs in Vibriocholerae by direct cloning 5StRNA depletion and parallelsequencing Nucleic Acids Res 37 e46

61 KrugM and UhlenbeckOC (1982) Reversal of T4 RNA ligaseBiochemistry 21 1858ndash1864

62 HafnerM LandgrafP LudwigJ RiceA OjoT LinCHolochD LimC and TuschlT (2008) Identification ofmicroRNAs and other small regulatory RNAs using cDNAlibrary sequencing Methods 44 3ndash12

63 JayaprakashAD JabadoO BrownBD and SachidanandamR(2011) Identification and remediation of biases in the activity ofRNA ligases in small-RNA deep sequencing Nucleic Acids Res39 e141

64 YewCW and KumarSV (2012) Isolation and cloning ofmicroRNAs from recalcitrant plant tissues with small amounts oftotal RNA a step-by step approach Mol Biol Rep 391783ndash1790

65 KurthHM and MochizukiK (2009) 2rsquo-O-methylation stabilizesPiwi-associated small RNAs and ensures DNA elimination inTetrahymena RNA 15 675ndash685

66 EbhardtHA ThiEP WangMB and UnrauPJ (2005)Extensive 30 modification of plant small RNAs is modulated byhelper component-proteinase expression Proc Natl Acad SciUSA 102 13398ndash13403

67 WangJ CzechB CrunkA WallaceA MitrevaMHannonGJ and DavisRE (2011) Deep small RNA sequencingfrom the nematode Ascaris reveals conservation functionaldiversification and novel developmental profiles Genome Res 211462ndash1477

68 ZhuangF FuchsRT and RobbGB (2012) Small RNAexpression profiling by high-throughput sequencing implicationsof enzymatic manipulation J Nucleic Acids 2012 360358

69 FujimuraT and EstebanR (2012) Cap snatching of yeast L-Adouble-stranded RNA virus can operate in trans and requiresviral polymerase actively engaging in transcription J Biol Chem287 12797ndash12804

70 BessmanMJ WalshJD DunnCA SwaminathanJWeldonJE and ShenJ (2001) The gene ygdP associated withthe invasiveness of Escherichia coli K1 designates a Nudixhydrolase Orf176 active on adenosine (50)-pentaphospho-(50)-adenosine (Ap5A) J Biol Chem 276 37834ndash37838

71 DeanaA CelesnikH and BelascoJG (2008) The bacterialenzyme RppH triggers messenger RNA degradation by50 pyrophosphate removal Nature 451 355ndash358

72 BlewettN CollerJ and GoldstrohmA (2011) A quantitativeassay for measuring mRNA decapping by splinted ligation reversetranscription polymerase chain reaction qSL-RT-PCR RNA 17535ndash543

73 KrogerC DillonSC CameronAD PapenfortKSivasankaranSK HokampK ChaoY SittkaA HebrardMHandlerK et al (2012) The transcriptional landscape and smallRNAs of Salmonella enterica serovar Typhimurium Proc NatlAcad Sci USA 109 E1277ndashE1286

74 EnglandTE and UhlenbeckOC (1978) Enzymaticoligoribonucleotide synthesis with T4 RNA ligase Biochemistry17 2069ndash2076

75 SorefanK PaisH HallAE KozomaraA Griffiths-JonesSMoultonV and DalmayT (2012) Reducing ligation bias of smallRNAs in libraries for next generation sequencing Silence 3 4

76 ZhelkovskyAM and McReynoldsLA (2011) Simple andefficient synthesis of 50 pre-adenylated DNA using thermostableRNA ligase Nucleic Acids Res 39 e117

77 McCoyMI and GumportRI (1980) T4 ribonucleic acid ligasejoins single-strand oligo(deoxyribonucleotides) Biochemistry 19635ndash642

12 Nucleic Acids Research 2013

78 AlonS VigneaultF EminagaS ChristodoulouDCSeidmanJG ChurchGM and EisenbergE (2011) Barcodingbias in high-throughput multiplex sequencing of miRNA GenomeRes 21 1506ndash1511

79 SunG WuX WangJ LiH LiX GaoH RossiJ andYenY (2011) A bias-reducing strategy in profiling small RNAsusing Solexa RNA 17 2256ndash2262

80 Van NieuwerburghF SoetaertS PodshivalovaK Ay-LinWangE SchafferL DeforceD SalomonDR HeadSR andOrdoukhanianP (2011) Quantitative bias in Illumina TruSeq anda novel post amplification barcoding strategy for multiplexedDNA and small RNA deep sequencing PLoS One 6 e26969

81 VigneaultF Ter-OvanesyanD AlonS EminagaSC ChristodoulouD SeidmanJG EisenbergE and MChurchG (2012) High-throughput multiplex sequencingof miRNA Curr Protoc Hum Genet Chapter 11 Unit 1112 11-10

82 PakJ and FireA (2007) Distinct populations of primary andsecondary effectors during RNAi in C elegans Science 315241ndash244

83 ZhuYY MachlederEM ChenchikA LiR and SiebertPD(2001) Reverse transcriptase template switching a SMARTapproach for full-length cDNA library constructionBiotechniques 30 892ndash897

84 LevinJZ YassourM AdiconisX NusbaumCThompsonDA FriedmanN GnirkeA and RegevA (2010)Comprehensive comparative analysis of strand-specific RNAsequencing methods Nat Methods 7 709ndash715

85 AndoY BurroughsAM KawanoM de HoonMJ andHayashizakiY (2012) Targeted methods to improve small RNAprofiles generated by deep sequencing In MallickB andGhoshZ (eds) Regulatory RNAs Basics Methods andApplications Springer-Verlag Heidelberg Germany pp 253ndash271

86 KawanoM KawazuC LizioM KawajiH CarninciPSuzukiH and HayashizakiY (2010) Reduction of non-insertsequence reads by dimer eliminator LNA oligonucleotide forsmall RNA deep sequencing Biotechniques 49 751ndash755

87 KapranovP OzsolakF and MilosPM (2012) Profiling of shortRNAs using Helicos single-molecule sequencing Methods MolBiol 822 219ndash232

88 KanagawaT (2003) Bias and artifacts in multitemplatepolymerase chain reactions (PCR) J Biosci Bioeng 96 317ndash323

89 DayDJ SpeiserPW SchulzeE BettendorfM FitnessJBaranyF and WhitePC (1996) Identification of non-amplifyingCYP21 genes when using PCR-based diagnosis of 21-hydroxylasedeficiency in congenital adrenal hyperplasia (CAH) affectedpedigrees Hum Mol Genet 5 2039ndash2048

90 AirdD RossMG ChenWS DanielssonM FennellTRussC JaffeDB NusbaumC and GnirkeA (2011) Analyzingand minimizing PCR amplification bias in Illumina sequencinglibraries Genome Biol 12 R18

91 UllahMF and AatifM (2009) The footprints of cancerdevelopment cancer biomarkers Cancer Treat Rev 35 193ndash200

92 MoMH ChenL FuY WangW and FuSW (2012) Cell-freeCirculating miRNA Biomarkers in Cancer J Cancer 3 432ndash448

93 ZenK and ZhangCY (2012) Circulating microRNAs a novelclass of biomarkers to diagnose and monitor human cancersMed Res Rev 32 326ndash348

94 KimT and ReitmairA (2013) Non-Coding RNAs FunctionalAspects and Diagnostic Utility in Oncology Int J Mol Sci 144934ndash4968

95 PatnaikSK MallickR and YendamuriS (2010) Detection ofmicroRNAs in dried serum blots Anal Biochem 407 147ndash149

96 MitchellPS ParkinRK KrohEM FritzBR WymanSKPogosova-AgadjanyanEL PetersonA NoteboomJOrsquoBriantKC AllenA et al (2008) Circulating microRNAs as

stable blood-based markers for cancer detection Proc NatlAcad Sci USA 105 10513ndash10518

97 CieslaM SkrzypekK KozakowskaM LobodaAJozkowiczA and DulakJ (2011) MicroRNAs as biomarkers ofdisease onset Anal Bioanal Chem 401 2051ndash2061

98 OzsolakF (2012) Third-generation sequencing techniques andapplications to drug discovery Expert Opin Drug Discov 7231ndash243

99 TsongalisGJ CalinG CordelierP CroceC MonzonF andSzafranska-SchwarzbachAE (2013) MicroRNA analysis is itready for prime time Clin Chem 59 343ndash347

100 AmiraN Cancel-TassinG BernardiniS Cochand-PriolletBBittardH ManginP FournierG LatilA and CussenotO(2004) Expression in bladder transitional cell carcinoma by real-time quantitative reverse transcription polymerase chain reactionarray of 65 genes at the tumor suppressor locus 9q341-2identification of 5 candidates tumor suppressor genes Int JCancer 111 539ndash542

101 ZhangX (2007) Biomarker validation movement towardspersonalized medicine Expert Rev Mol Diagn 7 469ndash471

102 SchmittgenTD LeeEJ and JiangJ (2008) High-throughputreal-time PCR Methods Mol Biol 429 89ndash98

103 BenesV and CastoldiM (2010) Expression profiling ofmicroRNA using real-time quantitative PCR how to use it andwhat is available Methods 50 244ndash249

104 BustinSA and NolanT (2004) Pitfalls of quantitative real-timereverse-transcription polymerase chain reaction J Biomol Tech15 155ndash166

105 BeitzingerM and MeisterG (2011) Experimental identificationof microRNA targets by immunoprecipitation of Argonauteprotein complexes Methods Mol Biol 732 153ndash167

106 FiedlerSD CarlettiMZ and ChristensonLK (2010)Quantitative RT-PCR methods for mature microRNA expressionanalysis Methods Mol Biol 630 49ndash64

107 ArefianE KianiJ SoleimaniM ShariatiSA Aghaee-BakhtiariSH AtashiA GheisariY AhmadbeigiN Banaei-MoghaddamAM NaderiM et al (2011) Analysis ofmicroRNA signatures using size-coded ligation-mediated PCRNucleic Acids Res 39 e80

108 JonstrupSP KochJ and KjemsJ (2006) A microRNAdetection system based on padlock probes and rolling circleamplification RNA 12 1747ndash1752

109 LatorraD ArarK and HurleyJM (2003) Designconsiderations and effects of LNA in PCR primers Mol CellProbes 17 253ndash259

110 BustinSA BenesV GarsonJA HellemansJ HuggettJKubistaM MuellerR NolanT PfafflMW ShipleyGLet al (2009) The MIQE guidelines minimum information forpublication of quantitative real-time PCR experiments ClinChem 55 611ndash622

111 TrapnellC WilliamsBA PerteaG MortazaviA KwanGvan BarenMJ SalzbergSL WoldBJ and PachterL (2010)Transcript assembly and quantification by RNA-Seq revealsunannotated transcripts and isoform switching during celldifferentiation Nat Biotechnol 28 511ndash515

112 TrapnellC PachterL and SalzbergSL (2009) TopHatdiscovering splice junctions with RNA-Seq Bioinformatics 251105ndash1111

113 ListerR OrsquoMalleyRC Tonti-FilippiniJ GregoryBDBerryCC MillarAH and EckerJR (2008) Highly integratedsingle-base resolution maps of the epigenome in ArabidopsisCell 133 523ndash536

114 PolidorosAN PasentsisK and TsaftarisAS (2006) Rollingcircle amplification-RACE a method for simultaneous isolationof 50 and 30 cDNA ends from amplified cDNA templatesBiotechniques 41 35ndash42

Nucleic Acids Research 2013 13

Page 7: SURVEY AND SUMMARY Biases in small RNA deep sequencing data

residue occupied the 30-terminal position of adapter(acceptor) molecules (5174) Recently adapter pools dis-playing sequence identity at all except the two 30-terminalnucleotides were used to investigate sequence-dependentvariation in acceptor function Based on deep sequencingas the readout the relative ranking of miRNA frequenciesvaried as a consequence of the specific adapterndashRNA com-binations (63) Once more miRNA expression profiles arenot only a function of true RNA abundance but alsoreflect varying ligation efficacies (63) Interestingly theuse of adapter pools highly improved the correlation ofmiRNA levels between replicates (63)

Influence of RNA secondary structure on RNA 50-endligationTo investigate potential influences on 50-end modificationderived from RNA secondary structure ligation yieldswere compared among test sets of fully randomizedRNAs A remarkable underrepresentation of double-strand folds at 50-ligation sites indicated strong preferencesfor single-stranded RNA substrates in ligation reactionsconducted with T4Rnl1 (75)

Sugar moiety-dependent change in ligation efficacyDNA-based adapter oligonucleotides performwell in RNA30-end ligation reactions (especially when 50-adenylated)whereas RNA-based adapters are preferred for RNA50-end modification (76) Kinetic analysis of DNA-basedadapter molecules for RNA 50-ligation demonstratedslower reactions compared with their RNA counterparts(77) Recent analyses revealed more complex substrate-specific effects between test RNAs and different adaptersWhen DNA- and RNA-based adapters of identical se-quences were investigated in ligation assays no generalpreference with regard to the sugar moiety of the adaptermolecule was observed in RNA 50-end ligation (3963) Inconclusion to avoid bias adapter pools are recommendedto display sequence variation The chance of productiveligation might be enhanced using RNA DNA or mixedadapter oligonucleotides (3963)

Adapter barcoding influences ligationIn addition to nucleotide preferences of the reactingresidues upstream sequences also exert influence onligation yields Adapters harboring different Roche MIDtags placed 19 nucleotides upstream from the reacting30-ends performed entirely differently when assayed withidentical test RNAs (39) Depending on the combinationof in vitro transcribed non-protein coding RNAs andMID-modified adapters ligation product levels (calculatedas non-protein coding RNA input) ranged from 60ndash70to undetectable (39) Recent reports emphasized thatbarcoding for multiplex sequencing leads to significantbias in miRNA expression profiles (78ndash80) This bias wasdiminished when barcodes were introduced by PCRamplification of cDNA (78ndash81)

Quantitative considerations and barcodingIn general the discovery of novel RNAs and analyses oftranscripts that are expressed at very low levels requiresubstantially more cDNA reads than experiments

directed toward investigating differential gene expressionsHence the design of sRNA-seq experiments needs totake into account the rationale of the investigationContigs formed by continuously overlapping readsduring genomic mapping procedures represent thebio-computational equivalent of individual RNAentities Any selected read during cDNA mapping eithercontributes to pre-existing already populated RNAs ormight generate novel contigs The probability of eitherprocess is a function of RNA size as well as the corres-ponding RNA expression value Of course contigs repre-senting longer rather than shorter RNAs are more likelyto be populated by chance RNAs of higher relative ex-pression levels are generally more densely packed withcDNA reads than RNAs that display medium or low tran-scription levels Consequently if multiplexing is chosenfor economic reasons the necessary sequencing depthmust be approximated prior to sequence runs in order togenerate adequate sample sizes The two fixed parametersfor estimating admissible multiplexing are therefore (i) ex-perimental goal and (ii) lane capacity According toIllumina TruSeq Small RNA Sample Prep Kitguidance 5 million sRNA-seq reads are considered appro-priate for novel miRNA discovery within human-sizedgenomes For miRNA profiling usually about 25 millionreads are recommended These approximations permit re-calculation geared toward specific needs and requirements

Alternative cloning strategies to avoid 50-end modificationBecause the 30-terminal position of cDNA corresponds tothe RNA 50-ends Pak et al (82) used a technique to avoidmodification of RNA 50-termini and therefore to minimizethe need for enzymatic pretreatment The strategy intro-duces adapter sequences not to RNA 50-termini butrather to 30-ends of the corresponding cDNAs (82) Theadvantage of omitting enzymatic treatment of RNA50-ends might be offset by incomplete reverse transcriptionthat precludes full-length cDNA cloning or appropriateassessment of RNA 50-endsRecently ligation to cDNA 30-ends was investigated in

more detail (36) Perfect double-stranded DNA-RNA testhybrids and single-stranded DNAs were utilized to mimicpotential products of reverse transcription All T4 DNA-and RNA-ligases including T4Rnl2 truncated derivativeswere compared to delineate optimal conditions for cDNA30-end modification Surprisingly ligation reactionscarried out with pre-adenylated DNA donor moleculesstrongly indicated that double-stranded DNA-RNAchimera constituted the most reactive substrates forT4Rnl1-mediated ligation In addition supplements ofPEG 8000 strikingly influenced ligation final concentra-tions of 25 PEG 8000 increased reaction yields Identicalsingle-stranded test cDNAs exhibited only poor substratecharacteristics The A-form-like conformation of theDNA-RNA hybrid might have compensated for theprimary structure-related influences and permitted moresimilar ligation outcomes among substrates (36)However the analysis examined only one test substrateand therefore might not be amenable to drawing generalconclusions (36) A further interesting observation indi-cates that cDNA 50-terminal nucleotides influence cDNA

Nucleic Acids Research 2013 7

30-end ligation yields (39) As the cDNA 50-terminalsequence is a consequence of specific RNA 30-end modifi-cation differences between protocols exert different influ-ences on cDNA 30-end adapter ligation and in turn mightalso cause expression profile bias (39)The lsquoSmartTM Approachrsquo (83) offers another interesting

alternative to capture mature RNA 50-ends and to avoidRNA 50-end adapter ligation The technology is based onthe template switching capacity of Moloney murineleukemia virus (M-MLV) reverse transcriptase Oncefirst strand synthesis approaches the RNA 50-end theterminal transferase activity of M-MLV reverse transcript-ase adds oligo(C)-stretches to the growing cDNAOligonucleotides that 30-terminate with oligo(G) stretchesserve to extend templates and permit reverse transcriptionto proceed Thus M-MLV reverse transcriptase enablesthe generation of cDNAs with known terminal sequenceelements (83) and thereby eliminates the need to modifythe RNA 50-ends Recently the technology was adapted tothe specific needs of Ilumina RNA-seq analysis (84)

Adapterndashadapter ligation disturbs sRNA-seqA side effect common to sRNA-seq protocols that rely onligase-mediated sequential RNA end-modification is PCRamplification of empty adapterndashadapter ligation products(8185) Adapterndashdimer formation is a direct consequenceof the molecular adapter excess employed to ensure suffi-cient product yield and might dominate in reactions wherethe concentration of total RNA starting material is low(8186) Adapter molecules that are 50-pre-adenylatedremain in reaction mixtures after RNA 30-end modifica-tion and react with RNA 50-adapter oligonucleotideswhich subsequently favors dimer formation Thereforepurifying products away from excess adapters via gelelectrophoresis is recommended (78) However for prep-aration of tiny RNAs such as miRNAs separation of thesmall size difference between ligated product and adaptersis not trivial Alternative approaches to avoid size frac-tionation rely on application of complementary lsquodimer-eliminator-LNA-oligonucleotidesrsquo (8586) The interferingoligonucleotide is designed to overlap the adapterndashadapterligation junction and thereby distinguish between targetedside product and adapter ligated RNA (86) The comple-mentarity of the interfering oligonucleotides extends intothe RNA 30-adapter sequence thus blocking primerannealing during reverse transcription and as a resultexcludes adapter dimers from RNA sequencing (86)Vigneault et al reported a similar strategy to overcomeunintended dimer amplification (81) Pre-annealing of theRT primer after RNA 30-adapter ligation but prior toRNA 50-end modification efficiently decreased adapterdimer formation The excess of non-ligated free RNA30-adapters was captured by complementary reverse tran-scriptase primers and sequestered in less-reactive double-stranded structures (81) The latter approach permitseffective reduction of dimer amplification and is reportedto increase cDNA library complexity (81) ObviouslyRNA tailing strategies (see above) are superior becausethey completely circumvent the problem of adapter-dimer formation To avoid adapter tailing as an analogous

side reaction to dimer formation it is advisable to carryout 30-tailing prior to 50-adapter ligation

Depending on the experimental goal sRNA-seqrequires varying amounts of cDNA reads (see above)Therefore techniques to reduce the amount of readsthat represent empty adapter dimers are of central import-ance as they permit an increase in sequencing depthreducing costs and ultimately enabling novel smallnon-protein coding RNA gene discovery

PCR AMPLIFICATION BIAS

Apart from novel third-generation sequencingtechnologies that permit direct sequence analysis of un-amplified cDNA all commonly used protocols implementPCR amplification (87) In addition the introduction ofbarcode identifiers by PCR amplification may also avoidbias introduced by differences in RNAndashadapter-specificligation efficacies (78) Therefore multiplexing of differentsamples during PCR is widely regarded to permit meta-analysis (78) However even PCR amplification itself isknown to introduce expression bias into datasets(303288) In particular when many different templatesare amplified in parallel variations not only in templatelength but also in base composition might lead topreferred template-specific amplifications (30) Extremeexamples of allelic dropout due to single base-exchangewere reported for PCR-mediated genotyping (89) TheRNA GC-content is especially responsible for data dis-tortion Generally cDNAs of high GC-content are lessreadily amplified and thus remain underrepresented Forexample read distributions calculated over the entire 28SrRNA were inversely correlated with the correspondingregional GC-content (32) As argued earlier base com-positions of different small non-protein coding RNAs(miRNAs piRNAs snoRNAs scaRNAs etc) varywidely The parallel amplification of complex templatemixtures that vary in their respective GC-content is there-fore a challenging procedure and likely to influence thecomplexity of the resulting cDNA library (30) Changesin complexity do not only influence distortion of quanti-tative measurements but certain RNAs might also lsquodropoutrsquo completely and remain unidentified Extended periodsof denaturation are considered to effectively decrease thelikelihood of uneven amplification (90) PCR amplifica-tion does not only change the overall GC-content of theresulting cDNA pool but is also reported to introducelength bias (30) which might be an important factorwhen RNAs of different sizes are subjected to parallelcDNA preparations Shorter sequences of significantlylower GC-content are likely to be overrepresented

Recent analysis also emphasized the impact of PCRbuffer compositions on PCR amplification bias (30)Remarkably PCR-mediated bias was already observedin some cases within 15 rounds of PCR amplification(30) While the popular Phusion PCR system surprisinglyperformed the worst (30) data indicate that someoptimized PCR buffers are capable of significantlyreducing PCR-mediated distortion (3090)

8 Nucleic Acids Research 2013

sRNA-SEQ AND PERSONALIZED MEDICINE

In recent years increasing attention has focused on thepotential of miRNA transcriptional profiling to diagnosedisease or to monitor progression and success in treatment(161791) The decreasing costs of high-throughputsequence analysis favors sRNA-seq as a common toolfor medical diagnostics and personalized medicine Aftercell death the total RNA of circulating cancer cells isreleased into the blood stream (92) These extracellularRNAs (exRNAs) are packaged within vesicles such asexosomes where they are complexed with differentproteins for example miRNAs in the RISC complexVesicles protect exRNA from ribonucleases and preventdegradation Among the various exRNAs miRNAs aresome of the most stable molecules (93) Easy accessibilityof patient blood samples and the stability of miRNAsmake them effective biomarkers (94) Diagnostics basedon miRNA were successfully conducted on biologicalsamples of low quality including dried blood (95) Inaddition miRNAs remain stable even when samples aresubjected to repeated rounds of tissue freezing andthawing which usually degrades RNAs (96) A furtherappealing feature of miRNAs in medical research is thatchanges in their expression profiles are observed duringdisease onset (97) In addition miRNAs often displaytissue-specific expression Therefore it might be possibleto identify the cellular origin of most metastases (91)possibly providing decision guidance for appropriatetreatment However as elaborated in detail miRNA ex-pression profiles detected by deep sequencing do not onlyreflect changes in relative expression but are alsoinfluenced by miRNA-adapter-specific ligation efficaciesespecially in cost-effective parallel analyses that oftenrequire multiplexing Different barcode introduced byligation generate bias in miRNA representation and abun-dance (78) Even when incorporated by PCR barcodingmight still lead to data distortion (30ndash32) Third-gener-ation sequencing techniques such as the Helicos (InNovember 2012 Helicos filed bankruptcy (httpbizyahoocome121115hlcs8-khtml)) system avoid PCRamplification steps in cDNA preparation and permitdirect sequencing of unamplified samples (87) The RNA30-end modification to prime cDNA synthesis (or directsequencing) in addition to relatively high error frequencies(15) might still lead to bias and so would not beespecially applicable in personalized medicine (98)Collectively deep sequencing to monitor change inmedical samples is of importance and certainly ofinterest but requires resolution of serious standardizationissues prior to general application

qRT-PCR a valid alternative to deep sequencing

Alternative techniques to quantify specific miRNA testsets rely on quantitative real-time reverse transcriptionPCR (qRT-PCR) (99) In particular diagnostic screensin cancer medicine rely on the application of qRT-PCR(99ndash101) In cases where biomarker disease correlation iswell established and when quantification of limitednumbers of biomarkers is sufficient for diagnosis and tomonitor disease progression or treatment qRT-PCR is

a valuable tool (99) Recently the convenience ofqRT-PCR-based medical diagnosis was suggested tomake it superior in routine clinical applications (99)Interestingly novel developments in qRT-PCR allowsample processing in 384-well formats and liken thismethod to cost-effective high-throughput applications(99102) A further advantage of qRT-PCR is the reduc-tion in downstream bio-computation (99103) The timebetween qRT-PCR and diagnostic result is shorter thanfor sRNA-seq or microarray-based inference Howeverthe analysis of miRNA expression profiles by qRT-PCRis technically challenging due to RNA-inherent pitfalls(103104)The miRNAs lack structural features such as mRNA

poly(A) tails that could serve as class-specific enrichmentor priming of reverse transcription (103) Alternative tech-niques for miRNA sample enrichment would rely onimmune-precipitation but most of them are not suitablefor typical small size samples in medical analysis (105) Asmentioned above the heterogeneity of the RNA GC-content translates into a wide range of primer annealingtemperatures for a population of miRNAs and mighttherefore complicate parallel analysis of multiple RNAs(103) Furthermore the qRT-PCR template is presentnot only within the mature miRNA molecule but also ispart of various precursor transcripts (103) It is thereforeimportant to ensure that primary or precursor-miRNAs(pri- or pre-miRNAs) do not provide templates inqRT-PCR amplification and as a consequence do notcontribute to the signal intensities detected Difficultiesthat relate to the similarity displayed between differentmiRNA isoforms are mentioned above within thecontext of array-based inference to quantify differentialgene expressionThus far two alternative approaches are commonly

utilized to reverse transcribe miRNA templates(103106) miRNA-specific stem-loop primers are oftenused when only limited numbers of RNAs are subject toquantification A major advantage of using miRNA-specific primers for reverse transcription is an increase insensitivity and specificity However RNA 30-end-tailing-based approaches enable bulk reverse transcription ofmultiple targets in parallel and are therefore advantageouswhen many miRNAs are analyzed (see above) A recentcomparative analysis to investigate reverse transcriptionefficacies of the two methods uncovered only minor dif-ferences (106) The analysis involved only a limitedmiRNA test set and therefore the results cannot begeneralized As an aside the oligo d(T)-based designmight also enable reverse transcription of miRNA andthe corresponding mRNA target in single test tube reac-tions (106) Ligation-mediated PCR or padlock probes incombination with rolling circle amplification are technicalalternatives for avoiding reverse transcription (107108)The methods rely on ligation-mediated template gener-ation and are therefore not devoid of bias (see above)Both approaches permit generation of miRNA signaturesacross samples and do not require elaborate analyticalpost-processing or expensive technical equipment(107108) However as elaborated here and abovereverse transcription is not devoid of template-specific

Nucleic Acids Research 2013 9

bias In particular RNA secondary structure and RNAmodifications both influence reverse transcription efficacy(see above) Similar to PCRs reverse transcription is nega-tively affected by increasing RNA GC-contentBecause the relative RNA GC-content dictates primer

Tm qRT-PCRs are also strongly influenced by forwardprimer design thereby affecting both qRT-PCR specificityand sensitivity Altering the length of primer-templatecomplementarity permits adjustment of the actualprimer Tm Conversely in cases where RNA nucleotidecomposition would dictate very low melting temperaturesLNA-modified nucleotides might be incorporated into theprimer sequence to increase the corresponding primer Tm(18) However the design of LNA-containing oligonucleo-tides is not always trivial (109) For the actual measure-ment of miRNA in real time two alternative fluorescentreporter molecules are frequently utilized SYBR Greenand the Taqman assay (103) Detection of the fluores-cence signal emitted by SYBR Green increases about100-fold as the molecule intercalates in double-strandedDNA However the mechanism of DNA intercalationsuggests the dye is not target-specific and thereforecannot discriminate primer dimers unspecific productsand target amplification (103) Because the intercalatingdye is simply added to the assay it is often possible toeasily adapt the qRT-PCR assay to already establishedand validated analytical PCRs (104) In comparison theTaqman assay provides increased specificity as the de-tection is based on sequence-specific probendashtarget inter-action Certainly the qRT-PCR assays offer manyadvantages but require evaluation of various factors thatinfluence and potentially bias analysis Thereforeinternational standardization to ensure comparable ex-perimental design and to permit meta-analysis amongdatasets is of prime importance for establishing clinicalqRT-PCR-based personal medicine The MinimumInformation for Publication of Quantitative Real-TimePCR experiments guidelines established standards for bet-ter experimental practice and suggest routes to data ana-lysis that increase reliability and data comparison (110)

mRNA-SEQ

Deep sequencing is not limited to expression profiling ofmiRNA or other small non-protein coding RNAs Evenentire transcriptomes including all protein coding andlong non-protein coding RNAs are subjected to high-throughput sequencing methods Currently as the lengthof single sequence runs is limited (from 100 to 500 nt de-pending on the sequencing platform) longer RNAsrequire fragmentation prior to cDNA synthesis (25)Millions of small reads resulting from mRNA-seq areanalyzed to identify novel transcripts or splice variants(111112) Relative quantification of reads enables one tomonitor differential gene expression among samples andto identify changes in isoform distribution (111) As withsRNA-seq the digital nature of RNA deep sequencing isthought to simplify meta-analysis of mRNA or long non-protein coding RNA expression data (25) Some protocolsfor modification of RNA fragment ends rely on

chemistries identical to sRNA-seq (84113) Therefore ex-pression biases observed in sRNA-seq analysis are alsolikely to cause distortion in whole transcriptome profilingVariations in nucleotide composition and changes in RNAsecondary structure of different RNA fragments mightcorrelate with uneven read distribution Unequal readcoverage generates uncertainty in isoform detection andin extreme cases exons representing unfavorable nucleo-tide compositions might escape analysis As fragmentationis suggested to generate many different but at least par-tially overlapping reads bias in mRNA-seq might be lessprominent Recently a novel technique based on circligasewas introduced to avoid RNA 50-end ligation (114)Reverse transcription is conducted with anchoredoligo(dT) primers that contain an additional sequence topermit later PCR amplification Following first strandcDNA synthesis circligase-mediated ligation is carriedout The resulting circular molecules are subjected toPCR amplification The strategy is of general interestand might be a valuable tool for sRNA-seq as well

CONCLUSION

Bias in general describes systematic errors that reflectmethod-related distortion from the truth Bias might bedetected easily when data generated by different methodsare subjected to analytical comparison Because it is sys-tematic in nature bias is not subject to variation inrepeated experiments Here based on biochemical anddeep sequencing data analyses we have presentedvarious steps in sRNA-seq protocols likely to causesevere distortions in the relative expression levels of indi-vidual sRNAs

The efficacy of RNA end-modification to permit cDNAconstruction depends not only on differences in strategiesand reagents but also on changes in among others buffercompositions and additives that could increase molecularcrowding Therefore meta-analysis of data generated withnon-identical methods reagents or even buffers is likely tobe error-prone (36396378) To achieve comparableresults and to evaluate relative changes in the quantifica-tion of the resulting data experiments and analyses shouldbe conducted under strictly identical conditions It has tobe stressed that bias in RNA-specific variation is not onlyrestricted to inter-sample comparison but also effects therelative ranking of intra-sample RNA expression

In addition to the more obvious quantitative aspects ofbias there are also qualitative aspects because RNA-seqmethods are also employed for RNA discovery IndividualRNAs might lsquodrop outrsquo from detection because they arenot amenable to cDNA construction under chosen condi-tions Certainly to increase the likelihood of productiveRNA end-modification and increase the complexity ofsRNA-seq library content the parallel application of dif-ferent strategies and reagents is required (33396379) Inparticular increased variability of adapter sequences (ieadapter pools) helps to increase the diversity of RNAsaccessed (63) A recent report emphasized the need forparallel application of different RNA- and DNA-basedadapter oligonucleotides (39)

10 Nucleic Acids Research 2013

The source of expression level distortion and the failureto detect certain RNA species at all appear not to belimited to reactions of RNAcDNA-end modificationbut also apply to cDNA amplification by PCR (30ndash32)Differences in reaction buffers RNA GC-content sec-ondary structures RNA length and primers might eachlead to bias during PCR Therefore the analysis ofsRNA-seq data to examine relative changes in geneexpression or identification reflects both true RNA abun-dance and biases related to the methods applied

The sRNA-seq approach is also attractive to medicalresearch Deep sequencing might permit the screening ofmultiple patient samples to monitor the progression ofdisease or treatment Therefore it is urgently necessaryto establish rigid international standardizations tominimize distortion and to avoid misleading conclusionsduring RNA-seq data interpretation In summary quali-tative and quantitative RNomics remain more challengingthan anticipated

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online

FUNDING

National Genome Research Network [NGFNIII01GS0808 to JB and TSR] the Deutsche Forschungs-gemeinschaft [BR 7544-1 to JB] USM ResearchUniversity Grant [1001CIPPT81305 to THT]Funding for open access charge The licence was fundedby Dr Brosiusrsquos research budget

Conflict of interest statement None declared

REFERENCES

1 BrosiusJ (2009) The fragmented gene Ann N Y Acad Sci1178 186ndash193

2 BrosiusJ and TiedgeH (2004) RNomenclature RNA Biol 181ndash83

3 LiuQ and ParooZ (2010) Biochemical principles of small RNApathways Annu Rev Biochem 79 295ndash319

4 BrosiusJ (2012) In ErdmannVA and BarciszewskiJ (eds)From Nucleic Acids Sequences to Molecular Medicine RNATechnologies Springer Berlin Heidelberg pp 1ndash18

5 HuttenhoferA BrosiusJ and BachellerieJP (2002) RNomicsidentification and function of small non-messenger RNAs CurrOpin Chem Biol 6 835ndash843

6 AnS and SongJJ (2011) The coded functions of noncodingRNAs for gene regulation Mol Cells 31 491ndash496

7 BushatiN and CohenSM (2007) microRNA functions AnnuRev Cell Dev Biol 23 175ndash205

8 ChiangHR SchoenfeldLW RubyJG AuyeungVCSpiesN BaekD JohnstonWK RussC LuoS BabiarzJEet al (2010) Mammalian microRNAs experimental evaluation ofnovel and previously annotated genes Genes Dev 24 992ndash1009

9 AmbrosV (2001) microRNAs tiny regulators with greatpotential Cell 107 823ndash826

10 LauNC LimLP WeinsteinEG and BartelDP (2001) Anabundant class of tiny RNAs with probable regulatory roles inCaenorhabditis elegans Science 294 858ndash862

11 RuvkunG (2001) Molecular biology Glimpses of a tiny RNAworld Science 294 797ndash799

12 RuvkunGB (2003) The tiny RNA world Harvey Lect 991ndash21

13 GurtanAM and SharpPA (2013) The Role of miRNAs inRegulating Gene Expression Networks J Mol Biol 4253582ndash3600

14 LiangH ZhangJ ZenK ZhangCY and ChenX (2013)Nuclear microRNAs and their unconventional role in regulatingnon-coding RNAs Protein Cell 4 325ndash330

15 HuangV and LiLC (2012) miRNA goes nuclear RNA Biol 9269ndash273

16 WangZ HanJ CuiY FanK and ZhouX (2013) CirculatingmicroRNA-21 as noninvasive predictive biomarker for response incancer immunotherapy Med Hypotheses 81 41ndash43

17 ZhaoC ZhangJ ZhangS YuD ChenY LiuQ ShiMNiC and ZhuM (2013) Diagnostic and biological significance ofmicroRNA-192 in pancreatic ductal adenocarcinoma Oncol Rep30 276ndash284

18 van RooijE (2011) The art of microRNA research Circ Res108 219ndash234

19 LeeLW ZhangS EtheridgeA MaL MartinD GalasDand WangK (2010) Complexity of the microRNA repertoirerevealed by next-generation sequencing RNA 16 2170ndash2180

20 OkoniewskiMJ and MillerCJ (2006) Hybridization interactionsbetween probesets in short oligo microarrays lead to spuriouscorrelations BMC Bioinformatics 7 276

21 MortazaviA WilliamsBA McCueK SchaefferL andWoldB (2008) Mapping and quantifying mammaliantranscriptomes by RNA-Seq Nat Methods 5 621ndash628

22 LiuS LinL JiangP WangD and XingY (2011) Acomparison of RNA-Seq and high-density exon array fordetecting differential gene expression between closely relatedspecies Nucleic Acids Res 39 578ndash588

23 RaghavachariN BarbJ YangY LiuP WoodhouseKLevyD OrsquoDonnellC MunsonPJ and KatoG (2012) Asystematic comparison and evaluation of high density exon arraysand RNA-seq technology used to unravel the peripheral bloodtranscriptome of sickle cell disease BMC Med Genomics 5 28

24 MartinJA and WangZ (2011) Next-generation transcriptomeassembly Nat Rev Genet 12 671ndash682

25 WangZ GersteinM and SnyderM (2009) RNA-Seq arevolutionary tool for transcriptomics Nat Rev Genet 1057ndash63

26 KozomaraA and Griffiths-JonesS (2011) miRBase integratingmicroRNA annotation and deep-sequencing data Nucleic AcidsRes 39 D152ndashD157

27 LinsenSE de WitE JanssensG HeaterS ChapmanLParkinRK FritzB WymanSK de BruijnE VoestEE et al(2009) Limitations and possibilities of small RNA digital geneexpression profiling Nat Methods 6 474ndash476

28 BakerM (2010) MicroRNA profiling separating signal fromnoise Nat Methods 7 687ndash692

29 HafnerM RenwickN BrownM MihailovicA HolochDLinC PenaJT NusbaumJD MorozovP LudwigJ et al(2011) RNA-ligase-dependent biases in miRNA representationin deep-sequenced small RNA cDNA libraries RNA 171697ndash1712

30 DabneyJ and MeyerM (2012) Length and GC-biases duringsequencing library amplification a comparison of variouspolymerase-buffer systems with ancient and modern DNAsequencing libraries Biotechniques 52 87ndash94

31 OrpanaAK HoTH and StenmanJ (2012) Multiple heatpulses during PCR extension enabling amplification of GC-richsequences and reducing amplification bias Anal Chem 842081ndash2087

32 SendlerE JohnsonGD and KrawetzSA (2011) Local andglobal factors affecting RNA sequencing analysis Anal Biochem419 317ndash322

33 ZhuangF FuchsRT SunZ ZhengY and RobbGB (2012)Structural bias in T4 RNA ligase-mediated 30-adapter ligationNucleic Acids Res 40 e54

34 DeChiaraTM and BrosiusJ (1987) Neural BC1 RNA cDNAclones reveal nonrepetitive sequence content Proc Natl AcadSci USA 84 2624ndash2628

Nucleic Acids Research 2013 11

35 WickensM and KwakJE (2008) Molecular biology A tail talefor U Science 319 1344ndash1345

36 MunafoDB and RobbGB (2010) Optimization of enzymaticreaction conditions for generating representative pools of cDNAfrom small RNA RNA 16 2537ndash2552

37 SarkarN (1997) Polyadenylation of mRNA in prokaryotesAnnu Rev Biochem 66 173ndash197

38 Yehudai-ResheffS and SchusterG (2000) Characterization of theEcoli poly(A) polymerase nucleotide specificity RNA-bindingaffinities and RNA structure dependence Nucleic Acids Res 281139ndash1144

39 RaabeCA HoeCH RandauG BrosiusJ TangTH andRozhdestvenskyTS (2011) The rocks and shallows of deep RNAsequencing Examples in the Vibrio cholerae RNome RNA 171357ndash1366

40 RaabeCA SanchezCP RandauG RobeckT SkryabinBVChinniSV KubeM ReinhardtR NgGH ManickamRet al (2010) A global view of the nonprotein-codingtranscriptome in Plasmodium falciparum Nucleic Acids Res 38608ndash617

41 SharmaCM HoffmannS DarfeuilleF ReignierJ FindeissSSittkaA ChabasS ReicheK HackermullerJ ReinhardtRet al (2010) The primary transcriptome of the major humanpathogen Helicobacter pylori Nature 464 250ndash255

42 SharmaCM and VogelJ (2009) Experimental approaches forthe discovery and characterization of regulatory small RNACurr Opin Microbiol 12 536ndash546

43 YuB YangZ LiJ MinakhinaS YangM PadgettRWStewardR and ChenX (2005) Methylation as a crucial step inplant microRNA biogenesis Science 307 932ndash935

44 HouwingS KammingaLM BerezikovE CronemboldDGirardA van den ElstH FilippovDV BlaserH RazEMoensCB et al (2007) A role for Piwi and piRNAs in germcell maintenance and transposon silencing in Zebrafish Cell 12969ndash82

45 AckermannHW (2003) Bacteriophage observations andevolution Res Microbiol 154 245ndash251

46 RossmannMG MesyanzhinovVV ArisakaF andLeimanPG (2004) The bacteriophage T4 DNA injectionmachine Curr Opin Struct Biol 14 171ndash180

47 HoCK and ShumanS (2002) Bacteriophage T4 RNA ligase 2(gp241) exemplifies a family of RNA ligases found in allphylogenetic domains Proc Natl Acad Sci USA 9912709ndash12714

48 YinS HoCK and ShumanS (2003) Structure-function analysisof T4 RNA ligase 2 J Biol Chem 278 17601ndash17608

49 AravinA and TuschlT (2005) Identification and characterizationof small RNAs involved in RNA silencing FEBS Lett 5795830ndash5840

50 ViolletS FuchsRT MunafoDB ZhuangF and RobbGB(2011) T4 RNA ligase 2 truncated active site mutants improvedtools for RNA analysis BMC Biotechnol 11 72

51 UhlenbeckOC and GumportRI (1982) T4 RNA LigaseIn BoyerPD (ed) The Enzymes Nucleic Acids Part BAcademic Press Inc New York pp 31ndash58

52 HoCK WangLK LimaCD and ShumanS (2004) Structureand mechanism of RNA ligase Structure 12 327ndash339

53 El OmariK RenJ BirdLE BonaMK KlarmannGLeGriceSF and StammersDK (2006) Molecular architectureand ligand recognition determinants for T4 RNA ligase J BiolChem 281 1573ndash1579

54 StarkMR PleissJA DerasM ScaringeSA and RaderSD(2006) An RNA ligase-mediated method for the efficient creationof large synthetic RNAs RNA 12 2014ndash2019

55 CranstonJW SilberR MalathiVG and HurwitzJ (1974)Studies on ribonucleic acid ligase Characterization of anadenosine triphosphate-inorganic pyrophosphate exchangereaction and demonstration of an enzyme-adenylate complex withT4 bacteriophage-induced enzyme J Biol Chem 2497447ndash7456

56 HigginsNP GeballeAP SnopekTJ SuginoA andCozzarelliNR (1977) Bacteriophage T4 RNA ligase preparationof a physically homogeneous nuclease-free enzyme fromhyperproducing infected cells Nucleic Acids Res 4 3175ndash3186

57 PfefferS Lagos-QuintanaM and TuschlT (2005) Cloning ofsmall RNA molecules Curr Protoc Mol Biol Chapter 26 Unit26 24

58 BruceAG and UhlenbeckOC (1978) Reactions at the terminiof tRNA with T4 RNA ligase Nucleic Acids Res 5 3665ndash3677

59 VigneaultF SismourAM and ChurchGM (2008) EfficientmicroRNA capture and bar-coding via enzymatic oligonucleotideadenylation Nat Methods 5 777ndash779

60 LiuJM LivnyJ LawrenceMS KimballMD WaldorMKand CamilliA (2009) Experimental discovery of sRNAs in Vibriocholerae by direct cloning 5StRNA depletion and parallelsequencing Nucleic Acids Res 37 e46

61 KrugM and UhlenbeckOC (1982) Reversal of T4 RNA ligaseBiochemistry 21 1858ndash1864

62 HafnerM LandgrafP LudwigJ RiceA OjoT LinCHolochD LimC and TuschlT (2008) Identification ofmicroRNAs and other small regulatory RNAs using cDNAlibrary sequencing Methods 44 3ndash12

63 JayaprakashAD JabadoO BrownBD and SachidanandamR(2011) Identification and remediation of biases in the activity ofRNA ligases in small-RNA deep sequencing Nucleic Acids Res39 e141

64 YewCW and KumarSV (2012) Isolation and cloning ofmicroRNAs from recalcitrant plant tissues with small amounts oftotal RNA a step-by step approach Mol Biol Rep 391783ndash1790

65 KurthHM and MochizukiK (2009) 2rsquo-O-methylation stabilizesPiwi-associated small RNAs and ensures DNA elimination inTetrahymena RNA 15 675ndash685

66 EbhardtHA ThiEP WangMB and UnrauPJ (2005)Extensive 30 modification of plant small RNAs is modulated byhelper component-proteinase expression Proc Natl Acad SciUSA 102 13398ndash13403

67 WangJ CzechB CrunkA WallaceA MitrevaMHannonGJ and DavisRE (2011) Deep small RNA sequencingfrom the nematode Ascaris reveals conservation functionaldiversification and novel developmental profiles Genome Res 211462ndash1477

68 ZhuangF FuchsRT and RobbGB (2012) Small RNAexpression profiling by high-throughput sequencing implicationsof enzymatic manipulation J Nucleic Acids 2012 360358

69 FujimuraT and EstebanR (2012) Cap snatching of yeast L-Adouble-stranded RNA virus can operate in trans and requiresviral polymerase actively engaging in transcription J Biol Chem287 12797ndash12804

70 BessmanMJ WalshJD DunnCA SwaminathanJWeldonJE and ShenJ (2001) The gene ygdP associated withthe invasiveness of Escherichia coli K1 designates a Nudixhydrolase Orf176 active on adenosine (50)-pentaphospho-(50)-adenosine (Ap5A) J Biol Chem 276 37834ndash37838

71 DeanaA CelesnikH and BelascoJG (2008) The bacterialenzyme RppH triggers messenger RNA degradation by50 pyrophosphate removal Nature 451 355ndash358

72 BlewettN CollerJ and GoldstrohmA (2011) A quantitativeassay for measuring mRNA decapping by splinted ligation reversetranscription polymerase chain reaction qSL-RT-PCR RNA 17535ndash543

73 KrogerC DillonSC CameronAD PapenfortKSivasankaranSK HokampK ChaoY SittkaA HebrardMHandlerK et al (2012) The transcriptional landscape and smallRNAs of Salmonella enterica serovar Typhimurium Proc NatlAcad Sci USA 109 E1277ndashE1286

74 EnglandTE and UhlenbeckOC (1978) Enzymaticoligoribonucleotide synthesis with T4 RNA ligase Biochemistry17 2069ndash2076

75 SorefanK PaisH HallAE KozomaraA Griffiths-JonesSMoultonV and DalmayT (2012) Reducing ligation bias of smallRNAs in libraries for next generation sequencing Silence 3 4

76 ZhelkovskyAM and McReynoldsLA (2011) Simple andefficient synthesis of 50 pre-adenylated DNA using thermostableRNA ligase Nucleic Acids Res 39 e117

77 McCoyMI and GumportRI (1980) T4 ribonucleic acid ligasejoins single-strand oligo(deoxyribonucleotides) Biochemistry 19635ndash642

12 Nucleic Acids Research 2013

78 AlonS VigneaultF EminagaS ChristodoulouDCSeidmanJG ChurchGM and EisenbergE (2011) Barcodingbias in high-throughput multiplex sequencing of miRNA GenomeRes 21 1506ndash1511

79 SunG WuX WangJ LiH LiX GaoH RossiJ andYenY (2011) A bias-reducing strategy in profiling small RNAsusing Solexa RNA 17 2256ndash2262

80 Van NieuwerburghF SoetaertS PodshivalovaK Ay-LinWangE SchafferL DeforceD SalomonDR HeadSR andOrdoukhanianP (2011) Quantitative bias in Illumina TruSeq anda novel post amplification barcoding strategy for multiplexedDNA and small RNA deep sequencing PLoS One 6 e26969

81 VigneaultF Ter-OvanesyanD AlonS EminagaSC ChristodoulouD SeidmanJG EisenbergE and MChurchG (2012) High-throughput multiplex sequencingof miRNA Curr Protoc Hum Genet Chapter 11 Unit 1112 11-10

82 PakJ and FireA (2007) Distinct populations of primary andsecondary effectors during RNAi in C elegans Science 315241ndash244

83 ZhuYY MachlederEM ChenchikA LiR and SiebertPD(2001) Reverse transcriptase template switching a SMARTapproach for full-length cDNA library constructionBiotechniques 30 892ndash897

84 LevinJZ YassourM AdiconisX NusbaumCThompsonDA FriedmanN GnirkeA and RegevA (2010)Comprehensive comparative analysis of strand-specific RNAsequencing methods Nat Methods 7 709ndash715

85 AndoY BurroughsAM KawanoM de HoonMJ andHayashizakiY (2012) Targeted methods to improve small RNAprofiles generated by deep sequencing In MallickB andGhoshZ (eds) Regulatory RNAs Basics Methods andApplications Springer-Verlag Heidelberg Germany pp 253ndash271

86 KawanoM KawazuC LizioM KawajiH CarninciPSuzukiH and HayashizakiY (2010) Reduction of non-insertsequence reads by dimer eliminator LNA oligonucleotide forsmall RNA deep sequencing Biotechniques 49 751ndash755

87 KapranovP OzsolakF and MilosPM (2012) Profiling of shortRNAs using Helicos single-molecule sequencing Methods MolBiol 822 219ndash232

88 KanagawaT (2003) Bias and artifacts in multitemplatepolymerase chain reactions (PCR) J Biosci Bioeng 96 317ndash323

89 DayDJ SpeiserPW SchulzeE BettendorfM FitnessJBaranyF and WhitePC (1996) Identification of non-amplifyingCYP21 genes when using PCR-based diagnosis of 21-hydroxylasedeficiency in congenital adrenal hyperplasia (CAH) affectedpedigrees Hum Mol Genet 5 2039ndash2048

90 AirdD RossMG ChenWS DanielssonM FennellTRussC JaffeDB NusbaumC and GnirkeA (2011) Analyzingand minimizing PCR amplification bias in Illumina sequencinglibraries Genome Biol 12 R18

91 UllahMF and AatifM (2009) The footprints of cancerdevelopment cancer biomarkers Cancer Treat Rev 35 193ndash200

92 MoMH ChenL FuY WangW and FuSW (2012) Cell-freeCirculating miRNA Biomarkers in Cancer J Cancer 3 432ndash448

93 ZenK and ZhangCY (2012) Circulating microRNAs a novelclass of biomarkers to diagnose and monitor human cancersMed Res Rev 32 326ndash348

94 KimT and ReitmairA (2013) Non-Coding RNAs FunctionalAspects and Diagnostic Utility in Oncology Int J Mol Sci 144934ndash4968

95 PatnaikSK MallickR and YendamuriS (2010) Detection ofmicroRNAs in dried serum blots Anal Biochem 407 147ndash149

96 MitchellPS ParkinRK KrohEM FritzBR WymanSKPogosova-AgadjanyanEL PetersonA NoteboomJOrsquoBriantKC AllenA et al (2008) Circulating microRNAs as

stable blood-based markers for cancer detection Proc NatlAcad Sci USA 105 10513ndash10518

97 CieslaM SkrzypekK KozakowskaM LobodaAJozkowiczA and DulakJ (2011) MicroRNAs as biomarkers ofdisease onset Anal Bioanal Chem 401 2051ndash2061

98 OzsolakF (2012) Third-generation sequencing techniques andapplications to drug discovery Expert Opin Drug Discov 7231ndash243

99 TsongalisGJ CalinG CordelierP CroceC MonzonF andSzafranska-SchwarzbachAE (2013) MicroRNA analysis is itready for prime time Clin Chem 59 343ndash347

100 AmiraN Cancel-TassinG BernardiniS Cochand-PriolletBBittardH ManginP FournierG LatilA and CussenotO(2004) Expression in bladder transitional cell carcinoma by real-time quantitative reverse transcription polymerase chain reactionarray of 65 genes at the tumor suppressor locus 9q341-2identification of 5 candidates tumor suppressor genes Int JCancer 111 539ndash542

101 ZhangX (2007) Biomarker validation movement towardspersonalized medicine Expert Rev Mol Diagn 7 469ndash471

102 SchmittgenTD LeeEJ and JiangJ (2008) High-throughputreal-time PCR Methods Mol Biol 429 89ndash98

103 BenesV and CastoldiM (2010) Expression profiling ofmicroRNA using real-time quantitative PCR how to use it andwhat is available Methods 50 244ndash249

104 BustinSA and NolanT (2004) Pitfalls of quantitative real-timereverse-transcription polymerase chain reaction J Biomol Tech15 155ndash166

105 BeitzingerM and MeisterG (2011) Experimental identificationof microRNA targets by immunoprecipitation of Argonauteprotein complexes Methods Mol Biol 732 153ndash167

106 FiedlerSD CarlettiMZ and ChristensonLK (2010)Quantitative RT-PCR methods for mature microRNA expressionanalysis Methods Mol Biol 630 49ndash64

107 ArefianE KianiJ SoleimaniM ShariatiSA Aghaee-BakhtiariSH AtashiA GheisariY AhmadbeigiN Banaei-MoghaddamAM NaderiM et al (2011) Analysis ofmicroRNA signatures using size-coded ligation-mediated PCRNucleic Acids Res 39 e80

108 JonstrupSP KochJ and KjemsJ (2006) A microRNAdetection system based on padlock probes and rolling circleamplification RNA 12 1747ndash1752

109 LatorraD ArarK and HurleyJM (2003) Designconsiderations and effects of LNA in PCR primers Mol CellProbes 17 253ndash259

110 BustinSA BenesV GarsonJA HellemansJ HuggettJKubistaM MuellerR NolanT PfafflMW ShipleyGLet al (2009) The MIQE guidelines minimum information forpublication of quantitative real-time PCR experiments ClinChem 55 611ndash622

111 TrapnellC WilliamsBA PerteaG MortazaviA KwanGvan BarenMJ SalzbergSL WoldBJ and PachterL (2010)Transcript assembly and quantification by RNA-Seq revealsunannotated transcripts and isoform switching during celldifferentiation Nat Biotechnol 28 511ndash515

112 TrapnellC PachterL and SalzbergSL (2009) TopHatdiscovering splice junctions with RNA-Seq Bioinformatics 251105ndash1111

113 ListerR OrsquoMalleyRC Tonti-FilippiniJ GregoryBDBerryCC MillarAH and EckerJR (2008) Highly integratedsingle-base resolution maps of the epigenome in ArabidopsisCell 133 523ndash536

114 PolidorosAN PasentsisK and TsaftarisAS (2006) Rollingcircle amplification-RACE a method for simultaneous isolationof 50 and 30 cDNA ends from amplified cDNA templatesBiotechniques 41 35ndash42

Nucleic Acids Research 2013 13

Page 8: SURVEY AND SUMMARY Biases in small RNA deep sequencing data

30-end ligation yields (39) As the cDNA 50-terminalsequence is a consequence of specific RNA 30-end modifi-cation differences between protocols exert different influ-ences on cDNA 30-end adapter ligation and in turn mightalso cause expression profile bias (39)The lsquoSmartTM Approachrsquo (83) offers another interesting

alternative to capture mature RNA 50-ends and to avoidRNA 50-end adapter ligation The technology is based onthe template switching capacity of Moloney murineleukemia virus (M-MLV) reverse transcriptase Oncefirst strand synthesis approaches the RNA 50-end theterminal transferase activity of M-MLV reverse transcript-ase adds oligo(C)-stretches to the growing cDNAOligonucleotides that 30-terminate with oligo(G) stretchesserve to extend templates and permit reverse transcriptionto proceed Thus M-MLV reverse transcriptase enablesthe generation of cDNAs with known terminal sequenceelements (83) and thereby eliminates the need to modifythe RNA 50-ends Recently the technology was adapted tothe specific needs of Ilumina RNA-seq analysis (84)

Adapterndashadapter ligation disturbs sRNA-seqA side effect common to sRNA-seq protocols that rely onligase-mediated sequential RNA end-modification is PCRamplification of empty adapterndashadapter ligation products(8185) Adapterndashdimer formation is a direct consequenceof the molecular adapter excess employed to ensure suffi-cient product yield and might dominate in reactions wherethe concentration of total RNA starting material is low(8186) Adapter molecules that are 50-pre-adenylatedremain in reaction mixtures after RNA 30-end modifica-tion and react with RNA 50-adapter oligonucleotideswhich subsequently favors dimer formation Thereforepurifying products away from excess adapters via gelelectrophoresis is recommended (78) However for prep-aration of tiny RNAs such as miRNAs separation of thesmall size difference between ligated product and adaptersis not trivial Alternative approaches to avoid size frac-tionation rely on application of complementary lsquodimer-eliminator-LNA-oligonucleotidesrsquo (8586) The interferingoligonucleotide is designed to overlap the adapterndashadapterligation junction and thereby distinguish between targetedside product and adapter ligated RNA (86) The comple-mentarity of the interfering oligonucleotides extends intothe RNA 30-adapter sequence thus blocking primerannealing during reverse transcription and as a resultexcludes adapter dimers from RNA sequencing (86)Vigneault et al reported a similar strategy to overcomeunintended dimer amplification (81) Pre-annealing of theRT primer after RNA 30-adapter ligation but prior toRNA 50-end modification efficiently decreased adapterdimer formation The excess of non-ligated free RNA30-adapters was captured by complementary reverse tran-scriptase primers and sequestered in less-reactive double-stranded structures (81) The latter approach permitseffective reduction of dimer amplification and is reportedto increase cDNA library complexity (81) ObviouslyRNA tailing strategies (see above) are superior becausethey completely circumvent the problem of adapter-dimer formation To avoid adapter tailing as an analogous

side reaction to dimer formation it is advisable to carryout 30-tailing prior to 50-adapter ligation

Depending on the experimental goal sRNA-seqrequires varying amounts of cDNA reads (see above)Therefore techniques to reduce the amount of readsthat represent empty adapter dimers are of central import-ance as they permit an increase in sequencing depthreducing costs and ultimately enabling novel smallnon-protein coding RNA gene discovery

PCR AMPLIFICATION BIAS

Apart from novel third-generation sequencingtechnologies that permit direct sequence analysis of un-amplified cDNA all commonly used protocols implementPCR amplification (87) In addition the introduction ofbarcode identifiers by PCR amplification may also avoidbias introduced by differences in RNAndashadapter-specificligation efficacies (78) Therefore multiplexing of differentsamples during PCR is widely regarded to permit meta-analysis (78) However even PCR amplification itself isknown to introduce expression bias into datasets(303288) In particular when many different templatesare amplified in parallel variations not only in templatelength but also in base composition might lead topreferred template-specific amplifications (30) Extremeexamples of allelic dropout due to single base-exchangewere reported for PCR-mediated genotyping (89) TheRNA GC-content is especially responsible for data dis-tortion Generally cDNAs of high GC-content are lessreadily amplified and thus remain underrepresented Forexample read distributions calculated over the entire 28SrRNA were inversely correlated with the correspondingregional GC-content (32) As argued earlier base com-positions of different small non-protein coding RNAs(miRNAs piRNAs snoRNAs scaRNAs etc) varywidely The parallel amplification of complex templatemixtures that vary in their respective GC-content is there-fore a challenging procedure and likely to influence thecomplexity of the resulting cDNA library (30) Changesin complexity do not only influence distortion of quanti-tative measurements but certain RNAs might also lsquodropoutrsquo completely and remain unidentified Extended periodsof denaturation are considered to effectively decrease thelikelihood of uneven amplification (90) PCR amplifica-tion does not only change the overall GC-content of theresulting cDNA pool but is also reported to introducelength bias (30) which might be an important factorwhen RNAs of different sizes are subjected to parallelcDNA preparations Shorter sequences of significantlylower GC-content are likely to be overrepresented

Recent analysis also emphasized the impact of PCRbuffer compositions on PCR amplification bias (30)Remarkably PCR-mediated bias was already observedin some cases within 15 rounds of PCR amplification(30) While the popular Phusion PCR system surprisinglyperformed the worst (30) data indicate that someoptimized PCR buffers are capable of significantlyreducing PCR-mediated distortion (3090)

8 Nucleic Acids Research 2013

sRNA-SEQ AND PERSONALIZED MEDICINE

In recent years increasing attention has focused on thepotential of miRNA transcriptional profiling to diagnosedisease or to monitor progression and success in treatment(161791) The decreasing costs of high-throughputsequence analysis favors sRNA-seq as a common toolfor medical diagnostics and personalized medicine Aftercell death the total RNA of circulating cancer cells isreleased into the blood stream (92) These extracellularRNAs (exRNAs) are packaged within vesicles such asexosomes where they are complexed with differentproteins for example miRNAs in the RISC complexVesicles protect exRNA from ribonucleases and preventdegradation Among the various exRNAs miRNAs aresome of the most stable molecules (93) Easy accessibilityof patient blood samples and the stability of miRNAsmake them effective biomarkers (94) Diagnostics basedon miRNA were successfully conducted on biologicalsamples of low quality including dried blood (95) Inaddition miRNAs remain stable even when samples aresubjected to repeated rounds of tissue freezing andthawing which usually degrades RNAs (96) A furtherappealing feature of miRNAs in medical research is thatchanges in their expression profiles are observed duringdisease onset (97) In addition miRNAs often displaytissue-specific expression Therefore it might be possibleto identify the cellular origin of most metastases (91)possibly providing decision guidance for appropriatetreatment However as elaborated in detail miRNA ex-pression profiles detected by deep sequencing do not onlyreflect changes in relative expression but are alsoinfluenced by miRNA-adapter-specific ligation efficaciesespecially in cost-effective parallel analyses that oftenrequire multiplexing Different barcode introduced byligation generate bias in miRNA representation and abun-dance (78) Even when incorporated by PCR barcodingmight still lead to data distortion (30ndash32) Third-gener-ation sequencing techniques such as the Helicos (InNovember 2012 Helicos filed bankruptcy (httpbizyahoocome121115hlcs8-khtml)) system avoid PCRamplification steps in cDNA preparation and permitdirect sequencing of unamplified samples (87) The RNA30-end modification to prime cDNA synthesis (or directsequencing) in addition to relatively high error frequencies(15) might still lead to bias and so would not beespecially applicable in personalized medicine (98)Collectively deep sequencing to monitor change inmedical samples is of importance and certainly ofinterest but requires resolution of serious standardizationissues prior to general application

qRT-PCR a valid alternative to deep sequencing

Alternative techniques to quantify specific miRNA testsets rely on quantitative real-time reverse transcriptionPCR (qRT-PCR) (99) In particular diagnostic screensin cancer medicine rely on the application of qRT-PCR(99ndash101) In cases where biomarker disease correlation iswell established and when quantification of limitednumbers of biomarkers is sufficient for diagnosis and tomonitor disease progression or treatment qRT-PCR is

a valuable tool (99) Recently the convenience ofqRT-PCR-based medical diagnosis was suggested tomake it superior in routine clinical applications (99)Interestingly novel developments in qRT-PCR allowsample processing in 384-well formats and liken thismethod to cost-effective high-throughput applications(99102) A further advantage of qRT-PCR is the reduc-tion in downstream bio-computation (99103) The timebetween qRT-PCR and diagnostic result is shorter thanfor sRNA-seq or microarray-based inference Howeverthe analysis of miRNA expression profiles by qRT-PCRis technically challenging due to RNA-inherent pitfalls(103104)The miRNAs lack structural features such as mRNA

poly(A) tails that could serve as class-specific enrichmentor priming of reverse transcription (103) Alternative tech-niques for miRNA sample enrichment would rely onimmune-precipitation but most of them are not suitablefor typical small size samples in medical analysis (105) Asmentioned above the heterogeneity of the RNA GC-content translates into a wide range of primer annealingtemperatures for a population of miRNAs and mighttherefore complicate parallel analysis of multiple RNAs(103) Furthermore the qRT-PCR template is presentnot only within the mature miRNA molecule but also ispart of various precursor transcripts (103) It is thereforeimportant to ensure that primary or precursor-miRNAs(pri- or pre-miRNAs) do not provide templates inqRT-PCR amplification and as a consequence do notcontribute to the signal intensities detected Difficultiesthat relate to the similarity displayed between differentmiRNA isoforms are mentioned above within thecontext of array-based inference to quantify differentialgene expressionThus far two alternative approaches are commonly

utilized to reverse transcribe miRNA templates(103106) miRNA-specific stem-loop primers are oftenused when only limited numbers of RNAs are subject toquantification A major advantage of using miRNA-specific primers for reverse transcription is an increase insensitivity and specificity However RNA 30-end-tailing-based approaches enable bulk reverse transcription ofmultiple targets in parallel and are therefore advantageouswhen many miRNAs are analyzed (see above) A recentcomparative analysis to investigate reverse transcriptionefficacies of the two methods uncovered only minor dif-ferences (106) The analysis involved only a limitedmiRNA test set and therefore the results cannot begeneralized As an aside the oligo d(T)-based designmight also enable reverse transcription of miRNA andthe corresponding mRNA target in single test tube reac-tions (106) Ligation-mediated PCR or padlock probes incombination with rolling circle amplification are technicalalternatives for avoiding reverse transcription (107108)The methods rely on ligation-mediated template gener-ation and are therefore not devoid of bias (see above)Both approaches permit generation of miRNA signaturesacross samples and do not require elaborate analyticalpost-processing or expensive technical equipment(107108) However as elaborated here and abovereverse transcription is not devoid of template-specific

Nucleic Acids Research 2013 9

bias In particular RNA secondary structure and RNAmodifications both influence reverse transcription efficacy(see above) Similar to PCRs reverse transcription is nega-tively affected by increasing RNA GC-contentBecause the relative RNA GC-content dictates primer

Tm qRT-PCRs are also strongly influenced by forwardprimer design thereby affecting both qRT-PCR specificityand sensitivity Altering the length of primer-templatecomplementarity permits adjustment of the actualprimer Tm Conversely in cases where RNA nucleotidecomposition would dictate very low melting temperaturesLNA-modified nucleotides might be incorporated into theprimer sequence to increase the corresponding primer Tm(18) However the design of LNA-containing oligonucleo-tides is not always trivial (109) For the actual measure-ment of miRNA in real time two alternative fluorescentreporter molecules are frequently utilized SYBR Greenand the Taqman assay (103) Detection of the fluores-cence signal emitted by SYBR Green increases about100-fold as the molecule intercalates in double-strandedDNA However the mechanism of DNA intercalationsuggests the dye is not target-specific and thereforecannot discriminate primer dimers unspecific productsand target amplification (103) Because the intercalatingdye is simply added to the assay it is often possible toeasily adapt the qRT-PCR assay to already establishedand validated analytical PCRs (104) In comparison theTaqman assay provides increased specificity as the de-tection is based on sequence-specific probendashtarget inter-action Certainly the qRT-PCR assays offer manyadvantages but require evaluation of various factors thatinfluence and potentially bias analysis Thereforeinternational standardization to ensure comparable ex-perimental design and to permit meta-analysis amongdatasets is of prime importance for establishing clinicalqRT-PCR-based personal medicine The MinimumInformation for Publication of Quantitative Real-TimePCR experiments guidelines established standards for bet-ter experimental practice and suggest routes to data ana-lysis that increase reliability and data comparison (110)

mRNA-SEQ

Deep sequencing is not limited to expression profiling ofmiRNA or other small non-protein coding RNAs Evenentire transcriptomes including all protein coding andlong non-protein coding RNAs are subjected to high-throughput sequencing methods Currently as the lengthof single sequence runs is limited (from 100 to 500 nt de-pending on the sequencing platform) longer RNAsrequire fragmentation prior to cDNA synthesis (25)Millions of small reads resulting from mRNA-seq areanalyzed to identify novel transcripts or splice variants(111112) Relative quantification of reads enables one tomonitor differential gene expression among samples andto identify changes in isoform distribution (111) As withsRNA-seq the digital nature of RNA deep sequencing isthought to simplify meta-analysis of mRNA or long non-protein coding RNA expression data (25) Some protocolsfor modification of RNA fragment ends rely on

chemistries identical to sRNA-seq (84113) Therefore ex-pression biases observed in sRNA-seq analysis are alsolikely to cause distortion in whole transcriptome profilingVariations in nucleotide composition and changes in RNAsecondary structure of different RNA fragments mightcorrelate with uneven read distribution Unequal readcoverage generates uncertainty in isoform detection andin extreme cases exons representing unfavorable nucleo-tide compositions might escape analysis As fragmentationis suggested to generate many different but at least par-tially overlapping reads bias in mRNA-seq might be lessprominent Recently a novel technique based on circligasewas introduced to avoid RNA 50-end ligation (114)Reverse transcription is conducted with anchoredoligo(dT) primers that contain an additional sequence topermit later PCR amplification Following first strandcDNA synthesis circligase-mediated ligation is carriedout The resulting circular molecules are subjected toPCR amplification The strategy is of general interestand might be a valuable tool for sRNA-seq as well

CONCLUSION

Bias in general describes systematic errors that reflectmethod-related distortion from the truth Bias might bedetected easily when data generated by different methodsare subjected to analytical comparison Because it is sys-tematic in nature bias is not subject to variation inrepeated experiments Here based on biochemical anddeep sequencing data analyses we have presentedvarious steps in sRNA-seq protocols likely to causesevere distortions in the relative expression levels of indi-vidual sRNAs

The efficacy of RNA end-modification to permit cDNAconstruction depends not only on differences in strategiesand reagents but also on changes in among others buffercompositions and additives that could increase molecularcrowding Therefore meta-analysis of data generated withnon-identical methods reagents or even buffers is likely tobe error-prone (36396378) To achieve comparableresults and to evaluate relative changes in the quantifica-tion of the resulting data experiments and analyses shouldbe conducted under strictly identical conditions It has tobe stressed that bias in RNA-specific variation is not onlyrestricted to inter-sample comparison but also effects therelative ranking of intra-sample RNA expression

In addition to the more obvious quantitative aspects ofbias there are also qualitative aspects because RNA-seqmethods are also employed for RNA discovery IndividualRNAs might lsquodrop outrsquo from detection because they arenot amenable to cDNA construction under chosen condi-tions Certainly to increase the likelihood of productiveRNA end-modification and increase the complexity ofsRNA-seq library content the parallel application of dif-ferent strategies and reagents is required (33396379) Inparticular increased variability of adapter sequences (ieadapter pools) helps to increase the diversity of RNAsaccessed (63) A recent report emphasized the need forparallel application of different RNA- and DNA-basedadapter oligonucleotides (39)

10 Nucleic Acids Research 2013

The source of expression level distortion and the failureto detect certain RNA species at all appear not to belimited to reactions of RNAcDNA-end modificationbut also apply to cDNA amplification by PCR (30ndash32)Differences in reaction buffers RNA GC-content sec-ondary structures RNA length and primers might eachlead to bias during PCR Therefore the analysis ofsRNA-seq data to examine relative changes in geneexpression or identification reflects both true RNA abun-dance and biases related to the methods applied

The sRNA-seq approach is also attractive to medicalresearch Deep sequencing might permit the screening ofmultiple patient samples to monitor the progression ofdisease or treatment Therefore it is urgently necessaryto establish rigid international standardizations tominimize distortion and to avoid misleading conclusionsduring RNA-seq data interpretation In summary quali-tative and quantitative RNomics remain more challengingthan anticipated

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online

FUNDING

National Genome Research Network [NGFNIII01GS0808 to JB and TSR] the Deutsche Forschungs-gemeinschaft [BR 7544-1 to JB] USM ResearchUniversity Grant [1001CIPPT81305 to THT]Funding for open access charge The licence was fundedby Dr Brosiusrsquos research budget

Conflict of interest statement None declared

REFERENCES

1 BrosiusJ (2009) The fragmented gene Ann N Y Acad Sci1178 186ndash193

2 BrosiusJ and TiedgeH (2004) RNomenclature RNA Biol 181ndash83

3 LiuQ and ParooZ (2010) Biochemical principles of small RNApathways Annu Rev Biochem 79 295ndash319

4 BrosiusJ (2012) In ErdmannVA and BarciszewskiJ (eds)From Nucleic Acids Sequences to Molecular Medicine RNATechnologies Springer Berlin Heidelberg pp 1ndash18

5 HuttenhoferA BrosiusJ and BachellerieJP (2002) RNomicsidentification and function of small non-messenger RNAs CurrOpin Chem Biol 6 835ndash843

6 AnS and SongJJ (2011) The coded functions of noncodingRNAs for gene regulation Mol Cells 31 491ndash496

7 BushatiN and CohenSM (2007) microRNA functions AnnuRev Cell Dev Biol 23 175ndash205

8 ChiangHR SchoenfeldLW RubyJG AuyeungVCSpiesN BaekD JohnstonWK RussC LuoS BabiarzJEet al (2010) Mammalian microRNAs experimental evaluation ofnovel and previously annotated genes Genes Dev 24 992ndash1009

9 AmbrosV (2001) microRNAs tiny regulators with greatpotential Cell 107 823ndash826

10 LauNC LimLP WeinsteinEG and BartelDP (2001) Anabundant class of tiny RNAs with probable regulatory roles inCaenorhabditis elegans Science 294 858ndash862

11 RuvkunG (2001) Molecular biology Glimpses of a tiny RNAworld Science 294 797ndash799

12 RuvkunGB (2003) The tiny RNA world Harvey Lect 991ndash21

13 GurtanAM and SharpPA (2013) The Role of miRNAs inRegulating Gene Expression Networks J Mol Biol 4253582ndash3600

14 LiangH ZhangJ ZenK ZhangCY and ChenX (2013)Nuclear microRNAs and their unconventional role in regulatingnon-coding RNAs Protein Cell 4 325ndash330

15 HuangV and LiLC (2012) miRNA goes nuclear RNA Biol 9269ndash273

16 WangZ HanJ CuiY FanK and ZhouX (2013) CirculatingmicroRNA-21 as noninvasive predictive biomarker for response incancer immunotherapy Med Hypotheses 81 41ndash43

17 ZhaoC ZhangJ ZhangS YuD ChenY LiuQ ShiMNiC and ZhuM (2013) Diagnostic and biological significance ofmicroRNA-192 in pancreatic ductal adenocarcinoma Oncol Rep30 276ndash284

18 van RooijE (2011) The art of microRNA research Circ Res108 219ndash234

19 LeeLW ZhangS EtheridgeA MaL MartinD GalasDand WangK (2010) Complexity of the microRNA repertoirerevealed by next-generation sequencing RNA 16 2170ndash2180

20 OkoniewskiMJ and MillerCJ (2006) Hybridization interactionsbetween probesets in short oligo microarrays lead to spuriouscorrelations BMC Bioinformatics 7 276

21 MortazaviA WilliamsBA McCueK SchaefferL andWoldB (2008) Mapping and quantifying mammaliantranscriptomes by RNA-Seq Nat Methods 5 621ndash628

22 LiuS LinL JiangP WangD and XingY (2011) Acomparison of RNA-Seq and high-density exon array fordetecting differential gene expression between closely relatedspecies Nucleic Acids Res 39 578ndash588

23 RaghavachariN BarbJ YangY LiuP WoodhouseKLevyD OrsquoDonnellC MunsonPJ and KatoG (2012) Asystematic comparison and evaluation of high density exon arraysand RNA-seq technology used to unravel the peripheral bloodtranscriptome of sickle cell disease BMC Med Genomics 5 28

24 MartinJA and WangZ (2011) Next-generation transcriptomeassembly Nat Rev Genet 12 671ndash682

25 WangZ GersteinM and SnyderM (2009) RNA-Seq arevolutionary tool for transcriptomics Nat Rev Genet 1057ndash63

26 KozomaraA and Griffiths-JonesS (2011) miRBase integratingmicroRNA annotation and deep-sequencing data Nucleic AcidsRes 39 D152ndashD157

27 LinsenSE de WitE JanssensG HeaterS ChapmanLParkinRK FritzB WymanSK de BruijnE VoestEE et al(2009) Limitations and possibilities of small RNA digital geneexpression profiling Nat Methods 6 474ndash476

28 BakerM (2010) MicroRNA profiling separating signal fromnoise Nat Methods 7 687ndash692

29 HafnerM RenwickN BrownM MihailovicA HolochDLinC PenaJT NusbaumJD MorozovP LudwigJ et al(2011) RNA-ligase-dependent biases in miRNA representationin deep-sequenced small RNA cDNA libraries RNA 171697ndash1712

30 DabneyJ and MeyerM (2012) Length and GC-biases duringsequencing library amplification a comparison of variouspolymerase-buffer systems with ancient and modern DNAsequencing libraries Biotechniques 52 87ndash94

31 OrpanaAK HoTH and StenmanJ (2012) Multiple heatpulses during PCR extension enabling amplification of GC-richsequences and reducing amplification bias Anal Chem 842081ndash2087

32 SendlerE JohnsonGD and KrawetzSA (2011) Local andglobal factors affecting RNA sequencing analysis Anal Biochem419 317ndash322

33 ZhuangF FuchsRT SunZ ZhengY and RobbGB (2012)Structural bias in T4 RNA ligase-mediated 30-adapter ligationNucleic Acids Res 40 e54

34 DeChiaraTM and BrosiusJ (1987) Neural BC1 RNA cDNAclones reveal nonrepetitive sequence content Proc Natl AcadSci USA 84 2624ndash2628

Nucleic Acids Research 2013 11

35 WickensM and KwakJE (2008) Molecular biology A tail talefor U Science 319 1344ndash1345

36 MunafoDB and RobbGB (2010) Optimization of enzymaticreaction conditions for generating representative pools of cDNAfrom small RNA RNA 16 2537ndash2552

37 SarkarN (1997) Polyadenylation of mRNA in prokaryotesAnnu Rev Biochem 66 173ndash197

38 Yehudai-ResheffS and SchusterG (2000) Characterization of theEcoli poly(A) polymerase nucleotide specificity RNA-bindingaffinities and RNA structure dependence Nucleic Acids Res 281139ndash1144

39 RaabeCA HoeCH RandauG BrosiusJ TangTH andRozhdestvenskyTS (2011) The rocks and shallows of deep RNAsequencing Examples in the Vibrio cholerae RNome RNA 171357ndash1366

40 RaabeCA SanchezCP RandauG RobeckT SkryabinBVChinniSV KubeM ReinhardtR NgGH ManickamRet al (2010) A global view of the nonprotein-codingtranscriptome in Plasmodium falciparum Nucleic Acids Res 38608ndash617

41 SharmaCM HoffmannS DarfeuilleF ReignierJ FindeissSSittkaA ChabasS ReicheK HackermullerJ ReinhardtRet al (2010) The primary transcriptome of the major humanpathogen Helicobacter pylori Nature 464 250ndash255

42 SharmaCM and VogelJ (2009) Experimental approaches forthe discovery and characterization of regulatory small RNACurr Opin Microbiol 12 536ndash546

43 YuB YangZ LiJ MinakhinaS YangM PadgettRWStewardR and ChenX (2005) Methylation as a crucial step inplant microRNA biogenesis Science 307 932ndash935

44 HouwingS KammingaLM BerezikovE CronemboldDGirardA van den ElstH FilippovDV BlaserH RazEMoensCB et al (2007) A role for Piwi and piRNAs in germcell maintenance and transposon silencing in Zebrafish Cell 12969ndash82

45 AckermannHW (2003) Bacteriophage observations andevolution Res Microbiol 154 245ndash251

46 RossmannMG MesyanzhinovVV ArisakaF andLeimanPG (2004) The bacteriophage T4 DNA injectionmachine Curr Opin Struct Biol 14 171ndash180

47 HoCK and ShumanS (2002) Bacteriophage T4 RNA ligase 2(gp241) exemplifies a family of RNA ligases found in allphylogenetic domains Proc Natl Acad Sci USA 9912709ndash12714

48 YinS HoCK and ShumanS (2003) Structure-function analysisof T4 RNA ligase 2 J Biol Chem 278 17601ndash17608

49 AravinA and TuschlT (2005) Identification and characterizationof small RNAs involved in RNA silencing FEBS Lett 5795830ndash5840

50 ViolletS FuchsRT MunafoDB ZhuangF and RobbGB(2011) T4 RNA ligase 2 truncated active site mutants improvedtools for RNA analysis BMC Biotechnol 11 72

51 UhlenbeckOC and GumportRI (1982) T4 RNA LigaseIn BoyerPD (ed) The Enzymes Nucleic Acids Part BAcademic Press Inc New York pp 31ndash58

52 HoCK WangLK LimaCD and ShumanS (2004) Structureand mechanism of RNA ligase Structure 12 327ndash339

53 El OmariK RenJ BirdLE BonaMK KlarmannGLeGriceSF and StammersDK (2006) Molecular architectureand ligand recognition determinants for T4 RNA ligase J BiolChem 281 1573ndash1579

54 StarkMR PleissJA DerasM ScaringeSA and RaderSD(2006) An RNA ligase-mediated method for the efficient creationof large synthetic RNAs RNA 12 2014ndash2019

55 CranstonJW SilberR MalathiVG and HurwitzJ (1974)Studies on ribonucleic acid ligase Characterization of anadenosine triphosphate-inorganic pyrophosphate exchangereaction and demonstration of an enzyme-adenylate complex withT4 bacteriophage-induced enzyme J Biol Chem 2497447ndash7456

56 HigginsNP GeballeAP SnopekTJ SuginoA andCozzarelliNR (1977) Bacteriophage T4 RNA ligase preparationof a physically homogeneous nuclease-free enzyme fromhyperproducing infected cells Nucleic Acids Res 4 3175ndash3186

57 PfefferS Lagos-QuintanaM and TuschlT (2005) Cloning ofsmall RNA molecules Curr Protoc Mol Biol Chapter 26 Unit26 24

58 BruceAG and UhlenbeckOC (1978) Reactions at the terminiof tRNA with T4 RNA ligase Nucleic Acids Res 5 3665ndash3677

59 VigneaultF SismourAM and ChurchGM (2008) EfficientmicroRNA capture and bar-coding via enzymatic oligonucleotideadenylation Nat Methods 5 777ndash779

60 LiuJM LivnyJ LawrenceMS KimballMD WaldorMKand CamilliA (2009) Experimental discovery of sRNAs in Vibriocholerae by direct cloning 5StRNA depletion and parallelsequencing Nucleic Acids Res 37 e46

61 KrugM and UhlenbeckOC (1982) Reversal of T4 RNA ligaseBiochemistry 21 1858ndash1864

62 HafnerM LandgrafP LudwigJ RiceA OjoT LinCHolochD LimC and TuschlT (2008) Identification ofmicroRNAs and other small regulatory RNAs using cDNAlibrary sequencing Methods 44 3ndash12

63 JayaprakashAD JabadoO BrownBD and SachidanandamR(2011) Identification and remediation of biases in the activity ofRNA ligases in small-RNA deep sequencing Nucleic Acids Res39 e141

64 YewCW and KumarSV (2012) Isolation and cloning ofmicroRNAs from recalcitrant plant tissues with small amounts oftotal RNA a step-by step approach Mol Biol Rep 391783ndash1790

65 KurthHM and MochizukiK (2009) 2rsquo-O-methylation stabilizesPiwi-associated small RNAs and ensures DNA elimination inTetrahymena RNA 15 675ndash685

66 EbhardtHA ThiEP WangMB and UnrauPJ (2005)Extensive 30 modification of plant small RNAs is modulated byhelper component-proteinase expression Proc Natl Acad SciUSA 102 13398ndash13403

67 WangJ CzechB CrunkA WallaceA MitrevaMHannonGJ and DavisRE (2011) Deep small RNA sequencingfrom the nematode Ascaris reveals conservation functionaldiversification and novel developmental profiles Genome Res 211462ndash1477

68 ZhuangF FuchsRT and RobbGB (2012) Small RNAexpression profiling by high-throughput sequencing implicationsof enzymatic manipulation J Nucleic Acids 2012 360358

69 FujimuraT and EstebanR (2012) Cap snatching of yeast L-Adouble-stranded RNA virus can operate in trans and requiresviral polymerase actively engaging in transcription J Biol Chem287 12797ndash12804

70 BessmanMJ WalshJD DunnCA SwaminathanJWeldonJE and ShenJ (2001) The gene ygdP associated withthe invasiveness of Escherichia coli K1 designates a Nudixhydrolase Orf176 active on adenosine (50)-pentaphospho-(50)-adenosine (Ap5A) J Biol Chem 276 37834ndash37838

71 DeanaA CelesnikH and BelascoJG (2008) The bacterialenzyme RppH triggers messenger RNA degradation by50 pyrophosphate removal Nature 451 355ndash358

72 BlewettN CollerJ and GoldstrohmA (2011) A quantitativeassay for measuring mRNA decapping by splinted ligation reversetranscription polymerase chain reaction qSL-RT-PCR RNA 17535ndash543

73 KrogerC DillonSC CameronAD PapenfortKSivasankaranSK HokampK ChaoY SittkaA HebrardMHandlerK et al (2012) The transcriptional landscape and smallRNAs of Salmonella enterica serovar Typhimurium Proc NatlAcad Sci USA 109 E1277ndashE1286

74 EnglandTE and UhlenbeckOC (1978) Enzymaticoligoribonucleotide synthesis with T4 RNA ligase Biochemistry17 2069ndash2076

75 SorefanK PaisH HallAE KozomaraA Griffiths-JonesSMoultonV and DalmayT (2012) Reducing ligation bias of smallRNAs in libraries for next generation sequencing Silence 3 4

76 ZhelkovskyAM and McReynoldsLA (2011) Simple andefficient synthesis of 50 pre-adenylated DNA using thermostableRNA ligase Nucleic Acids Res 39 e117

77 McCoyMI and GumportRI (1980) T4 ribonucleic acid ligasejoins single-strand oligo(deoxyribonucleotides) Biochemistry 19635ndash642

12 Nucleic Acids Research 2013

78 AlonS VigneaultF EminagaS ChristodoulouDCSeidmanJG ChurchGM and EisenbergE (2011) Barcodingbias in high-throughput multiplex sequencing of miRNA GenomeRes 21 1506ndash1511

79 SunG WuX WangJ LiH LiX GaoH RossiJ andYenY (2011) A bias-reducing strategy in profiling small RNAsusing Solexa RNA 17 2256ndash2262

80 Van NieuwerburghF SoetaertS PodshivalovaK Ay-LinWangE SchafferL DeforceD SalomonDR HeadSR andOrdoukhanianP (2011) Quantitative bias in Illumina TruSeq anda novel post amplification barcoding strategy for multiplexedDNA and small RNA deep sequencing PLoS One 6 e26969

81 VigneaultF Ter-OvanesyanD AlonS EminagaSC ChristodoulouD SeidmanJG EisenbergE and MChurchG (2012) High-throughput multiplex sequencingof miRNA Curr Protoc Hum Genet Chapter 11 Unit 1112 11-10

82 PakJ and FireA (2007) Distinct populations of primary andsecondary effectors during RNAi in C elegans Science 315241ndash244

83 ZhuYY MachlederEM ChenchikA LiR and SiebertPD(2001) Reverse transcriptase template switching a SMARTapproach for full-length cDNA library constructionBiotechniques 30 892ndash897

84 LevinJZ YassourM AdiconisX NusbaumCThompsonDA FriedmanN GnirkeA and RegevA (2010)Comprehensive comparative analysis of strand-specific RNAsequencing methods Nat Methods 7 709ndash715

85 AndoY BurroughsAM KawanoM de HoonMJ andHayashizakiY (2012) Targeted methods to improve small RNAprofiles generated by deep sequencing In MallickB andGhoshZ (eds) Regulatory RNAs Basics Methods andApplications Springer-Verlag Heidelberg Germany pp 253ndash271

86 KawanoM KawazuC LizioM KawajiH CarninciPSuzukiH and HayashizakiY (2010) Reduction of non-insertsequence reads by dimer eliminator LNA oligonucleotide forsmall RNA deep sequencing Biotechniques 49 751ndash755

87 KapranovP OzsolakF and MilosPM (2012) Profiling of shortRNAs using Helicos single-molecule sequencing Methods MolBiol 822 219ndash232

88 KanagawaT (2003) Bias and artifacts in multitemplatepolymerase chain reactions (PCR) J Biosci Bioeng 96 317ndash323

89 DayDJ SpeiserPW SchulzeE BettendorfM FitnessJBaranyF and WhitePC (1996) Identification of non-amplifyingCYP21 genes when using PCR-based diagnosis of 21-hydroxylasedeficiency in congenital adrenal hyperplasia (CAH) affectedpedigrees Hum Mol Genet 5 2039ndash2048

90 AirdD RossMG ChenWS DanielssonM FennellTRussC JaffeDB NusbaumC and GnirkeA (2011) Analyzingand minimizing PCR amplification bias in Illumina sequencinglibraries Genome Biol 12 R18

91 UllahMF and AatifM (2009) The footprints of cancerdevelopment cancer biomarkers Cancer Treat Rev 35 193ndash200

92 MoMH ChenL FuY WangW and FuSW (2012) Cell-freeCirculating miRNA Biomarkers in Cancer J Cancer 3 432ndash448

93 ZenK and ZhangCY (2012) Circulating microRNAs a novelclass of biomarkers to diagnose and monitor human cancersMed Res Rev 32 326ndash348

94 KimT and ReitmairA (2013) Non-Coding RNAs FunctionalAspects and Diagnostic Utility in Oncology Int J Mol Sci 144934ndash4968

95 PatnaikSK MallickR and YendamuriS (2010) Detection ofmicroRNAs in dried serum blots Anal Biochem 407 147ndash149

96 MitchellPS ParkinRK KrohEM FritzBR WymanSKPogosova-AgadjanyanEL PetersonA NoteboomJOrsquoBriantKC AllenA et al (2008) Circulating microRNAs as

stable blood-based markers for cancer detection Proc NatlAcad Sci USA 105 10513ndash10518

97 CieslaM SkrzypekK KozakowskaM LobodaAJozkowiczA and DulakJ (2011) MicroRNAs as biomarkers ofdisease onset Anal Bioanal Chem 401 2051ndash2061

98 OzsolakF (2012) Third-generation sequencing techniques andapplications to drug discovery Expert Opin Drug Discov 7231ndash243

99 TsongalisGJ CalinG CordelierP CroceC MonzonF andSzafranska-SchwarzbachAE (2013) MicroRNA analysis is itready for prime time Clin Chem 59 343ndash347

100 AmiraN Cancel-TassinG BernardiniS Cochand-PriolletBBittardH ManginP FournierG LatilA and CussenotO(2004) Expression in bladder transitional cell carcinoma by real-time quantitative reverse transcription polymerase chain reactionarray of 65 genes at the tumor suppressor locus 9q341-2identification of 5 candidates tumor suppressor genes Int JCancer 111 539ndash542

101 ZhangX (2007) Biomarker validation movement towardspersonalized medicine Expert Rev Mol Diagn 7 469ndash471

102 SchmittgenTD LeeEJ and JiangJ (2008) High-throughputreal-time PCR Methods Mol Biol 429 89ndash98

103 BenesV and CastoldiM (2010) Expression profiling ofmicroRNA using real-time quantitative PCR how to use it andwhat is available Methods 50 244ndash249

104 BustinSA and NolanT (2004) Pitfalls of quantitative real-timereverse-transcription polymerase chain reaction J Biomol Tech15 155ndash166

105 BeitzingerM and MeisterG (2011) Experimental identificationof microRNA targets by immunoprecipitation of Argonauteprotein complexes Methods Mol Biol 732 153ndash167

106 FiedlerSD CarlettiMZ and ChristensonLK (2010)Quantitative RT-PCR methods for mature microRNA expressionanalysis Methods Mol Biol 630 49ndash64

107 ArefianE KianiJ SoleimaniM ShariatiSA Aghaee-BakhtiariSH AtashiA GheisariY AhmadbeigiN Banaei-MoghaddamAM NaderiM et al (2011) Analysis ofmicroRNA signatures using size-coded ligation-mediated PCRNucleic Acids Res 39 e80

108 JonstrupSP KochJ and KjemsJ (2006) A microRNAdetection system based on padlock probes and rolling circleamplification RNA 12 1747ndash1752

109 LatorraD ArarK and HurleyJM (2003) Designconsiderations and effects of LNA in PCR primers Mol CellProbes 17 253ndash259

110 BustinSA BenesV GarsonJA HellemansJ HuggettJKubistaM MuellerR NolanT PfafflMW ShipleyGLet al (2009) The MIQE guidelines minimum information forpublication of quantitative real-time PCR experiments ClinChem 55 611ndash622

111 TrapnellC WilliamsBA PerteaG MortazaviA KwanGvan BarenMJ SalzbergSL WoldBJ and PachterL (2010)Transcript assembly and quantification by RNA-Seq revealsunannotated transcripts and isoform switching during celldifferentiation Nat Biotechnol 28 511ndash515

112 TrapnellC PachterL and SalzbergSL (2009) TopHatdiscovering splice junctions with RNA-Seq Bioinformatics 251105ndash1111

113 ListerR OrsquoMalleyRC Tonti-FilippiniJ GregoryBDBerryCC MillarAH and EckerJR (2008) Highly integratedsingle-base resolution maps of the epigenome in ArabidopsisCell 133 523ndash536

114 PolidorosAN PasentsisK and TsaftarisAS (2006) Rollingcircle amplification-RACE a method for simultaneous isolationof 50 and 30 cDNA ends from amplified cDNA templatesBiotechniques 41 35ndash42

Nucleic Acids Research 2013 13

Page 9: SURVEY AND SUMMARY Biases in small RNA deep sequencing data

sRNA-SEQ AND PERSONALIZED MEDICINE

In recent years increasing attention has focused on thepotential of miRNA transcriptional profiling to diagnosedisease or to monitor progression and success in treatment(161791) The decreasing costs of high-throughputsequence analysis favors sRNA-seq as a common toolfor medical diagnostics and personalized medicine Aftercell death the total RNA of circulating cancer cells isreleased into the blood stream (92) These extracellularRNAs (exRNAs) are packaged within vesicles such asexosomes where they are complexed with differentproteins for example miRNAs in the RISC complexVesicles protect exRNA from ribonucleases and preventdegradation Among the various exRNAs miRNAs aresome of the most stable molecules (93) Easy accessibilityof patient blood samples and the stability of miRNAsmake them effective biomarkers (94) Diagnostics basedon miRNA were successfully conducted on biologicalsamples of low quality including dried blood (95) Inaddition miRNAs remain stable even when samples aresubjected to repeated rounds of tissue freezing andthawing which usually degrades RNAs (96) A furtherappealing feature of miRNAs in medical research is thatchanges in their expression profiles are observed duringdisease onset (97) In addition miRNAs often displaytissue-specific expression Therefore it might be possibleto identify the cellular origin of most metastases (91)possibly providing decision guidance for appropriatetreatment However as elaborated in detail miRNA ex-pression profiles detected by deep sequencing do not onlyreflect changes in relative expression but are alsoinfluenced by miRNA-adapter-specific ligation efficaciesespecially in cost-effective parallel analyses that oftenrequire multiplexing Different barcode introduced byligation generate bias in miRNA representation and abun-dance (78) Even when incorporated by PCR barcodingmight still lead to data distortion (30ndash32) Third-gener-ation sequencing techniques such as the Helicos (InNovember 2012 Helicos filed bankruptcy (httpbizyahoocome121115hlcs8-khtml)) system avoid PCRamplification steps in cDNA preparation and permitdirect sequencing of unamplified samples (87) The RNA30-end modification to prime cDNA synthesis (or directsequencing) in addition to relatively high error frequencies(15) might still lead to bias and so would not beespecially applicable in personalized medicine (98)Collectively deep sequencing to monitor change inmedical samples is of importance and certainly ofinterest but requires resolution of serious standardizationissues prior to general application

qRT-PCR a valid alternative to deep sequencing

Alternative techniques to quantify specific miRNA testsets rely on quantitative real-time reverse transcriptionPCR (qRT-PCR) (99) In particular diagnostic screensin cancer medicine rely on the application of qRT-PCR(99ndash101) In cases where biomarker disease correlation iswell established and when quantification of limitednumbers of biomarkers is sufficient for diagnosis and tomonitor disease progression or treatment qRT-PCR is

a valuable tool (99) Recently the convenience ofqRT-PCR-based medical diagnosis was suggested tomake it superior in routine clinical applications (99)Interestingly novel developments in qRT-PCR allowsample processing in 384-well formats and liken thismethod to cost-effective high-throughput applications(99102) A further advantage of qRT-PCR is the reduc-tion in downstream bio-computation (99103) The timebetween qRT-PCR and diagnostic result is shorter thanfor sRNA-seq or microarray-based inference Howeverthe analysis of miRNA expression profiles by qRT-PCRis technically challenging due to RNA-inherent pitfalls(103104)The miRNAs lack structural features such as mRNA

poly(A) tails that could serve as class-specific enrichmentor priming of reverse transcription (103) Alternative tech-niques for miRNA sample enrichment would rely onimmune-precipitation but most of them are not suitablefor typical small size samples in medical analysis (105) Asmentioned above the heterogeneity of the RNA GC-content translates into a wide range of primer annealingtemperatures for a population of miRNAs and mighttherefore complicate parallel analysis of multiple RNAs(103) Furthermore the qRT-PCR template is presentnot only within the mature miRNA molecule but also ispart of various precursor transcripts (103) It is thereforeimportant to ensure that primary or precursor-miRNAs(pri- or pre-miRNAs) do not provide templates inqRT-PCR amplification and as a consequence do notcontribute to the signal intensities detected Difficultiesthat relate to the similarity displayed between differentmiRNA isoforms are mentioned above within thecontext of array-based inference to quantify differentialgene expressionThus far two alternative approaches are commonly

utilized to reverse transcribe miRNA templates(103106) miRNA-specific stem-loop primers are oftenused when only limited numbers of RNAs are subject toquantification A major advantage of using miRNA-specific primers for reverse transcription is an increase insensitivity and specificity However RNA 30-end-tailing-based approaches enable bulk reverse transcription ofmultiple targets in parallel and are therefore advantageouswhen many miRNAs are analyzed (see above) A recentcomparative analysis to investigate reverse transcriptionefficacies of the two methods uncovered only minor dif-ferences (106) The analysis involved only a limitedmiRNA test set and therefore the results cannot begeneralized As an aside the oligo d(T)-based designmight also enable reverse transcription of miRNA andthe corresponding mRNA target in single test tube reac-tions (106) Ligation-mediated PCR or padlock probes incombination with rolling circle amplification are technicalalternatives for avoiding reverse transcription (107108)The methods rely on ligation-mediated template gener-ation and are therefore not devoid of bias (see above)Both approaches permit generation of miRNA signaturesacross samples and do not require elaborate analyticalpost-processing or expensive technical equipment(107108) However as elaborated here and abovereverse transcription is not devoid of template-specific

Nucleic Acids Research 2013 9

bias In particular RNA secondary structure and RNAmodifications both influence reverse transcription efficacy(see above) Similar to PCRs reverse transcription is nega-tively affected by increasing RNA GC-contentBecause the relative RNA GC-content dictates primer

Tm qRT-PCRs are also strongly influenced by forwardprimer design thereby affecting both qRT-PCR specificityand sensitivity Altering the length of primer-templatecomplementarity permits adjustment of the actualprimer Tm Conversely in cases where RNA nucleotidecomposition would dictate very low melting temperaturesLNA-modified nucleotides might be incorporated into theprimer sequence to increase the corresponding primer Tm(18) However the design of LNA-containing oligonucleo-tides is not always trivial (109) For the actual measure-ment of miRNA in real time two alternative fluorescentreporter molecules are frequently utilized SYBR Greenand the Taqman assay (103) Detection of the fluores-cence signal emitted by SYBR Green increases about100-fold as the molecule intercalates in double-strandedDNA However the mechanism of DNA intercalationsuggests the dye is not target-specific and thereforecannot discriminate primer dimers unspecific productsand target amplification (103) Because the intercalatingdye is simply added to the assay it is often possible toeasily adapt the qRT-PCR assay to already establishedand validated analytical PCRs (104) In comparison theTaqman assay provides increased specificity as the de-tection is based on sequence-specific probendashtarget inter-action Certainly the qRT-PCR assays offer manyadvantages but require evaluation of various factors thatinfluence and potentially bias analysis Thereforeinternational standardization to ensure comparable ex-perimental design and to permit meta-analysis amongdatasets is of prime importance for establishing clinicalqRT-PCR-based personal medicine The MinimumInformation for Publication of Quantitative Real-TimePCR experiments guidelines established standards for bet-ter experimental practice and suggest routes to data ana-lysis that increase reliability and data comparison (110)

mRNA-SEQ

Deep sequencing is not limited to expression profiling ofmiRNA or other small non-protein coding RNAs Evenentire transcriptomes including all protein coding andlong non-protein coding RNAs are subjected to high-throughput sequencing methods Currently as the lengthof single sequence runs is limited (from 100 to 500 nt de-pending on the sequencing platform) longer RNAsrequire fragmentation prior to cDNA synthesis (25)Millions of small reads resulting from mRNA-seq areanalyzed to identify novel transcripts or splice variants(111112) Relative quantification of reads enables one tomonitor differential gene expression among samples andto identify changes in isoform distribution (111) As withsRNA-seq the digital nature of RNA deep sequencing isthought to simplify meta-analysis of mRNA or long non-protein coding RNA expression data (25) Some protocolsfor modification of RNA fragment ends rely on

chemistries identical to sRNA-seq (84113) Therefore ex-pression biases observed in sRNA-seq analysis are alsolikely to cause distortion in whole transcriptome profilingVariations in nucleotide composition and changes in RNAsecondary structure of different RNA fragments mightcorrelate with uneven read distribution Unequal readcoverage generates uncertainty in isoform detection andin extreme cases exons representing unfavorable nucleo-tide compositions might escape analysis As fragmentationis suggested to generate many different but at least par-tially overlapping reads bias in mRNA-seq might be lessprominent Recently a novel technique based on circligasewas introduced to avoid RNA 50-end ligation (114)Reverse transcription is conducted with anchoredoligo(dT) primers that contain an additional sequence topermit later PCR amplification Following first strandcDNA synthesis circligase-mediated ligation is carriedout The resulting circular molecules are subjected toPCR amplification The strategy is of general interestand might be a valuable tool for sRNA-seq as well

CONCLUSION

Bias in general describes systematic errors that reflectmethod-related distortion from the truth Bias might bedetected easily when data generated by different methodsare subjected to analytical comparison Because it is sys-tematic in nature bias is not subject to variation inrepeated experiments Here based on biochemical anddeep sequencing data analyses we have presentedvarious steps in sRNA-seq protocols likely to causesevere distortions in the relative expression levels of indi-vidual sRNAs

The efficacy of RNA end-modification to permit cDNAconstruction depends not only on differences in strategiesand reagents but also on changes in among others buffercompositions and additives that could increase molecularcrowding Therefore meta-analysis of data generated withnon-identical methods reagents or even buffers is likely tobe error-prone (36396378) To achieve comparableresults and to evaluate relative changes in the quantifica-tion of the resulting data experiments and analyses shouldbe conducted under strictly identical conditions It has tobe stressed that bias in RNA-specific variation is not onlyrestricted to inter-sample comparison but also effects therelative ranking of intra-sample RNA expression

In addition to the more obvious quantitative aspects ofbias there are also qualitative aspects because RNA-seqmethods are also employed for RNA discovery IndividualRNAs might lsquodrop outrsquo from detection because they arenot amenable to cDNA construction under chosen condi-tions Certainly to increase the likelihood of productiveRNA end-modification and increase the complexity ofsRNA-seq library content the parallel application of dif-ferent strategies and reagents is required (33396379) Inparticular increased variability of adapter sequences (ieadapter pools) helps to increase the diversity of RNAsaccessed (63) A recent report emphasized the need forparallel application of different RNA- and DNA-basedadapter oligonucleotides (39)

10 Nucleic Acids Research 2013

The source of expression level distortion and the failureto detect certain RNA species at all appear not to belimited to reactions of RNAcDNA-end modificationbut also apply to cDNA amplification by PCR (30ndash32)Differences in reaction buffers RNA GC-content sec-ondary structures RNA length and primers might eachlead to bias during PCR Therefore the analysis ofsRNA-seq data to examine relative changes in geneexpression or identification reflects both true RNA abun-dance and biases related to the methods applied

The sRNA-seq approach is also attractive to medicalresearch Deep sequencing might permit the screening ofmultiple patient samples to monitor the progression ofdisease or treatment Therefore it is urgently necessaryto establish rigid international standardizations tominimize distortion and to avoid misleading conclusionsduring RNA-seq data interpretation In summary quali-tative and quantitative RNomics remain more challengingthan anticipated

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online

FUNDING

National Genome Research Network [NGFNIII01GS0808 to JB and TSR] the Deutsche Forschungs-gemeinschaft [BR 7544-1 to JB] USM ResearchUniversity Grant [1001CIPPT81305 to THT]Funding for open access charge The licence was fundedby Dr Brosiusrsquos research budget

Conflict of interest statement None declared

REFERENCES

1 BrosiusJ (2009) The fragmented gene Ann N Y Acad Sci1178 186ndash193

2 BrosiusJ and TiedgeH (2004) RNomenclature RNA Biol 181ndash83

3 LiuQ and ParooZ (2010) Biochemical principles of small RNApathways Annu Rev Biochem 79 295ndash319

4 BrosiusJ (2012) In ErdmannVA and BarciszewskiJ (eds)From Nucleic Acids Sequences to Molecular Medicine RNATechnologies Springer Berlin Heidelberg pp 1ndash18

5 HuttenhoferA BrosiusJ and BachellerieJP (2002) RNomicsidentification and function of small non-messenger RNAs CurrOpin Chem Biol 6 835ndash843

6 AnS and SongJJ (2011) The coded functions of noncodingRNAs for gene regulation Mol Cells 31 491ndash496

7 BushatiN and CohenSM (2007) microRNA functions AnnuRev Cell Dev Biol 23 175ndash205

8 ChiangHR SchoenfeldLW RubyJG AuyeungVCSpiesN BaekD JohnstonWK RussC LuoS BabiarzJEet al (2010) Mammalian microRNAs experimental evaluation ofnovel and previously annotated genes Genes Dev 24 992ndash1009

9 AmbrosV (2001) microRNAs tiny regulators with greatpotential Cell 107 823ndash826

10 LauNC LimLP WeinsteinEG and BartelDP (2001) Anabundant class of tiny RNAs with probable regulatory roles inCaenorhabditis elegans Science 294 858ndash862

11 RuvkunG (2001) Molecular biology Glimpses of a tiny RNAworld Science 294 797ndash799

12 RuvkunGB (2003) The tiny RNA world Harvey Lect 991ndash21

13 GurtanAM and SharpPA (2013) The Role of miRNAs inRegulating Gene Expression Networks J Mol Biol 4253582ndash3600

14 LiangH ZhangJ ZenK ZhangCY and ChenX (2013)Nuclear microRNAs and their unconventional role in regulatingnon-coding RNAs Protein Cell 4 325ndash330

15 HuangV and LiLC (2012) miRNA goes nuclear RNA Biol 9269ndash273

16 WangZ HanJ CuiY FanK and ZhouX (2013) CirculatingmicroRNA-21 as noninvasive predictive biomarker for response incancer immunotherapy Med Hypotheses 81 41ndash43

17 ZhaoC ZhangJ ZhangS YuD ChenY LiuQ ShiMNiC and ZhuM (2013) Diagnostic and biological significance ofmicroRNA-192 in pancreatic ductal adenocarcinoma Oncol Rep30 276ndash284

18 van RooijE (2011) The art of microRNA research Circ Res108 219ndash234

19 LeeLW ZhangS EtheridgeA MaL MartinD GalasDand WangK (2010) Complexity of the microRNA repertoirerevealed by next-generation sequencing RNA 16 2170ndash2180

20 OkoniewskiMJ and MillerCJ (2006) Hybridization interactionsbetween probesets in short oligo microarrays lead to spuriouscorrelations BMC Bioinformatics 7 276

21 MortazaviA WilliamsBA McCueK SchaefferL andWoldB (2008) Mapping and quantifying mammaliantranscriptomes by RNA-Seq Nat Methods 5 621ndash628

22 LiuS LinL JiangP WangD and XingY (2011) Acomparison of RNA-Seq and high-density exon array fordetecting differential gene expression between closely relatedspecies Nucleic Acids Res 39 578ndash588

23 RaghavachariN BarbJ YangY LiuP WoodhouseKLevyD OrsquoDonnellC MunsonPJ and KatoG (2012) Asystematic comparison and evaluation of high density exon arraysand RNA-seq technology used to unravel the peripheral bloodtranscriptome of sickle cell disease BMC Med Genomics 5 28

24 MartinJA and WangZ (2011) Next-generation transcriptomeassembly Nat Rev Genet 12 671ndash682

25 WangZ GersteinM and SnyderM (2009) RNA-Seq arevolutionary tool for transcriptomics Nat Rev Genet 1057ndash63

26 KozomaraA and Griffiths-JonesS (2011) miRBase integratingmicroRNA annotation and deep-sequencing data Nucleic AcidsRes 39 D152ndashD157

27 LinsenSE de WitE JanssensG HeaterS ChapmanLParkinRK FritzB WymanSK de BruijnE VoestEE et al(2009) Limitations and possibilities of small RNA digital geneexpression profiling Nat Methods 6 474ndash476

28 BakerM (2010) MicroRNA profiling separating signal fromnoise Nat Methods 7 687ndash692

29 HafnerM RenwickN BrownM MihailovicA HolochDLinC PenaJT NusbaumJD MorozovP LudwigJ et al(2011) RNA-ligase-dependent biases in miRNA representationin deep-sequenced small RNA cDNA libraries RNA 171697ndash1712

30 DabneyJ and MeyerM (2012) Length and GC-biases duringsequencing library amplification a comparison of variouspolymerase-buffer systems with ancient and modern DNAsequencing libraries Biotechniques 52 87ndash94

31 OrpanaAK HoTH and StenmanJ (2012) Multiple heatpulses during PCR extension enabling amplification of GC-richsequences and reducing amplification bias Anal Chem 842081ndash2087

32 SendlerE JohnsonGD and KrawetzSA (2011) Local andglobal factors affecting RNA sequencing analysis Anal Biochem419 317ndash322

33 ZhuangF FuchsRT SunZ ZhengY and RobbGB (2012)Structural bias in T4 RNA ligase-mediated 30-adapter ligationNucleic Acids Res 40 e54

34 DeChiaraTM and BrosiusJ (1987) Neural BC1 RNA cDNAclones reveal nonrepetitive sequence content Proc Natl AcadSci USA 84 2624ndash2628

Nucleic Acids Research 2013 11

35 WickensM and KwakJE (2008) Molecular biology A tail talefor U Science 319 1344ndash1345

36 MunafoDB and RobbGB (2010) Optimization of enzymaticreaction conditions for generating representative pools of cDNAfrom small RNA RNA 16 2537ndash2552

37 SarkarN (1997) Polyadenylation of mRNA in prokaryotesAnnu Rev Biochem 66 173ndash197

38 Yehudai-ResheffS and SchusterG (2000) Characterization of theEcoli poly(A) polymerase nucleotide specificity RNA-bindingaffinities and RNA structure dependence Nucleic Acids Res 281139ndash1144

39 RaabeCA HoeCH RandauG BrosiusJ TangTH andRozhdestvenskyTS (2011) The rocks and shallows of deep RNAsequencing Examples in the Vibrio cholerae RNome RNA 171357ndash1366

40 RaabeCA SanchezCP RandauG RobeckT SkryabinBVChinniSV KubeM ReinhardtR NgGH ManickamRet al (2010) A global view of the nonprotein-codingtranscriptome in Plasmodium falciparum Nucleic Acids Res 38608ndash617

41 SharmaCM HoffmannS DarfeuilleF ReignierJ FindeissSSittkaA ChabasS ReicheK HackermullerJ ReinhardtRet al (2010) The primary transcriptome of the major humanpathogen Helicobacter pylori Nature 464 250ndash255

42 SharmaCM and VogelJ (2009) Experimental approaches forthe discovery and characterization of regulatory small RNACurr Opin Microbiol 12 536ndash546

43 YuB YangZ LiJ MinakhinaS YangM PadgettRWStewardR and ChenX (2005) Methylation as a crucial step inplant microRNA biogenesis Science 307 932ndash935

44 HouwingS KammingaLM BerezikovE CronemboldDGirardA van den ElstH FilippovDV BlaserH RazEMoensCB et al (2007) A role for Piwi and piRNAs in germcell maintenance and transposon silencing in Zebrafish Cell 12969ndash82

45 AckermannHW (2003) Bacteriophage observations andevolution Res Microbiol 154 245ndash251

46 RossmannMG MesyanzhinovVV ArisakaF andLeimanPG (2004) The bacteriophage T4 DNA injectionmachine Curr Opin Struct Biol 14 171ndash180

47 HoCK and ShumanS (2002) Bacteriophage T4 RNA ligase 2(gp241) exemplifies a family of RNA ligases found in allphylogenetic domains Proc Natl Acad Sci USA 9912709ndash12714

48 YinS HoCK and ShumanS (2003) Structure-function analysisof T4 RNA ligase 2 J Biol Chem 278 17601ndash17608

49 AravinA and TuschlT (2005) Identification and characterizationof small RNAs involved in RNA silencing FEBS Lett 5795830ndash5840

50 ViolletS FuchsRT MunafoDB ZhuangF and RobbGB(2011) T4 RNA ligase 2 truncated active site mutants improvedtools for RNA analysis BMC Biotechnol 11 72

51 UhlenbeckOC and GumportRI (1982) T4 RNA LigaseIn BoyerPD (ed) The Enzymes Nucleic Acids Part BAcademic Press Inc New York pp 31ndash58

52 HoCK WangLK LimaCD and ShumanS (2004) Structureand mechanism of RNA ligase Structure 12 327ndash339

53 El OmariK RenJ BirdLE BonaMK KlarmannGLeGriceSF and StammersDK (2006) Molecular architectureand ligand recognition determinants for T4 RNA ligase J BiolChem 281 1573ndash1579

54 StarkMR PleissJA DerasM ScaringeSA and RaderSD(2006) An RNA ligase-mediated method for the efficient creationof large synthetic RNAs RNA 12 2014ndash2019

55 CranstonJW SilberR MalathiVG and HurwitzJ (1974)Studies on ribonucleic acid ligase Characterization of anadenosine triphosphate-inorganic pyrophosphate exchangereaction and demonstration of an enzyme-adenylate complex withT4 bacteriophage-induced enzyme J Biol Chem 2497447ndash7456

56 HigginsNP GeballeAP SnopekTJ SuginoA andCozzarelliNR (1977) Bacteriophage T4 RNA ligase preparationof a physically homogeneous nuclease-free enzyme fromhyperproducing infected cells Nucleic Acids Res 4 3175ndash3186

57 PfefferS Lagos-QuintanaM and TuschlT (2005) Cloning ofsmall RNA molecules Curr Protoc Mol Biol Chapter 26 Unit26 24

58 BruceAG and UhlenbeckOC (1978) Reactions at the terminiof tRNA with T4 RNA ligase Nucleic Acids Res 5 3665ndash3677

59 VigneaultF SismourAM and ChurchGM (2008) EfficientmicroRNA capture and bar-coding via enzymatic oligonucleotideadenylation Nat Methods 5 777ndash779

60 LiuJM LivnyJ LawrenceMS KimballMD WaldorMKand CamilliA (2009) Experimental discovery of sRNAs in Vibriocholerae by direct cloning 5StRNA depletion and parallelsequencing Nucleic Acids Res 37 e46

61 KrugM and UhlenbeckOC (1982) Reversal of T4 RNA ligaseBiochemistry 21 1858ndash1864

62 HafnerM LandgrafP LudwigJ RiceA OjoT LinCHolochD LimC and TuschlT (2008) Identification ofmicroRNAs and other small regulatory RNAs using cDNAlibrary sequencing Methods 44 3ndash12

63 JayaprakashAD JabadoO BrownBD and SachidanandamR(2011) Identification and remediation of biases in the activity ofRNA ligases in small-RNA deep sequencing Nucleic Acids Res39 e141

64 YewCW and KumarSV (2012) Isolation and cloning ofmicroRNAs from recalcitrant plant tissues with small amounts oftotal RNA a step-by step approach Mol Biol Rep 391783ndash1790

65 KurthHM and MochizukiK (2009) 2rsquo-O-methylation stabilizesPiwi-associated small RNAs and ensures DNA elimination inTetrahymena RNA 15 675ndash685

66 EbhardtHA ThiEP WangMB and UnrauPJ (2005)Extensive 30 modification of plant small RNAs is modulated byhelper component-proteinase expression Proc Natl Acad SciUSA 102 13398ndash13403

67 WangJ CzechB CrunkA WallaceA MitrevaMHannonGJ and DavisRE (2011) Deep small RNA sequencingfrom the nematode Ascaris reveals conservation functionaldiversification and novel developmental profiles Genome Res 211462ndash1477

68 ZhuangF FuchsRT and RobbGB (2012) Small RNAexpression profiling by high-throughput sequencing implicationsof enzymatic manipulation J Nucleic Acids 2012 360358

69 FujimuraT and EstebanR (2012) Cap snatching of yeast L-Adouble-stranded RNA virus can operate in trans and requiresviral polymerase actively engaging in transcription J Biol Chem287 12797ndash12804

70 BessmanMJ WalshJD DunnCA SwaminathanJWeldonJE and ShenJ (2001) The gene ygdP associated withthe invasiveness of Escherichia coli K1 designates a Nudixhydrolase Orf176 active on adenosine (50)-pentaphospho-(50)-adenosine (Ap5A) J Biol Chem 276 37834ndash37838

71 DeanaA CelesnikH and BelascoJG (2008) The bacterialenzyme RppH triggers messenger RNA degradation by50 pyrophosphate removal Nature 451 355ndash358

72 BlewettN CollerJ and GoldstrohmA (2011) A quantitativeassay for measuring mRNA decapping by splinted ligation reversetranscription polymerase chain reaction qSL-RT-PCR RNA 17535ndash543

73 KrogerC DillonSC CameronAD PapenfortKSivasankaranSK HokampK ChaoY SittkaA HebrardMHandlerK et al (2012) The transcriptional landscape and smallRNAs of Salmonella enterica serovar Typhimurium Proc NatlAcad Sci USA 109 E1277ndashE1286

74 EnglandTE and UhlenbeckOC (1978) Enzymaticoligoribonucleotide synthesis with T4 RNA ligase Biochemistry17 2069ndash2076

75 SorefanK PaisH HallAE KozomaraA Griffiths-JonesSMoultonV and DalmayT (2012) Reducing ligation bias of smallRNAs in libraries for next generation sequencing Silence 3 4

76 ZhelkovskyAM and McReynoldsLA (2011) Simple andefficient synthesis of 50 pre-adenylated DNA using thermostableRNA ligase Nucleic Acids Res 39 e117

77 McCoyMI and GumportRI (1980) T4 ribonucleic acid ligasejoins single-strand oligo(deoxyribonucleotides) Biochemistry 19635ndash642

12 Nucleic Acids Research 2013

78 AlonS VigneaultF EminagaS ChristodoulouDCSeidmanJG ChurchGM and EisenbergE (2011) Barcodingbias in high-throughput multiplex sequencing of miRNA GenomeRes 21 1506ndash1511

79 SunG WuX WangJ LiH LiX GaoH RossiJ andYenY (2011) A bias-reducing strategy in profiling small RNAsusing Solexa RNA 17 2256ndash2262

80 Van NieuwerburghF SoetaertS PodshivalovaK Ay-LinWangE SchafferL DeforceD SalomonDR HeadSR andOrdoukhanianP (2011) Quantitative bias in Illumina TruSeq anda novel post amplification barcoding strategy for multiplexedDNA and small RNA deep sequencing PLoS One 6 e26969

81 VigneaultF Ter-OvanesyanD AlonS EminagaSC ChristodoulouD SeidmanJG EisenbergE and MChurchG (2012) High-throughput multiplex sequencingof miRNA Curr Protoc Hum Genet Chapter 11 Unit 1112 11-10

82 PakJ and FireA (2007) Distinct populations of primary andsecondary effectors during RNAi in C elegans Science 315241ndash244

83 ZhuYY MachlederEM ChenchikA LiR and SiebertPD(2001) Reverse transcriptase template switching a SMARTapproach for full-length cDNA library constructionBiotechniques 30 892ndash897

84 LevinJZ YassourM AdiconisX NusbaumCThompsonDA FriedmanN GnirkeA and RegevA (2010)Comprehensive comparative analysis of strand-specific RNAsequencing methods Nat Methods 7 709ndash715

85 AndoY BurroughsAM KawanoM de HoonMJ andHayashizakiY (2012) Targeted methods to improve small RNAprofiles generated by deep sequencing In MallickB andGhoshZ (eds) Regulatory RNAs Basics Methods andApplications Springer-Verlag Heidelberg Germany pp 253ndash271

86 KawanoM KawazuC LizioM KawajiH CarninciPSuzukiH and HayashizakiY (2010) Reduction of non-insertsequence reads by dimer eliminator LNA oligonucleotide forsmall RNA deep sequencing Biotechniques 49 751ndash755

87 KapranovP OzsolakF and MilosPM (2012) Profiling of shortRNAs using Helicos single-molecule sequencing Methods MolBiol 822 219ndash232

88 KanagawaT (2003) Bias and artifacts in multitemplatepolymerase chain reactions (PCR) J Biosci Bioeng 96 317ndash323

89 DayDJ SpeiserPW SchulzeE BettendorfM FitnessJBaranyF and WhitePC (1996) Identification of non-amplifyingCYP21 genes when using PCR-based diagnosis of 21-hydroxylasedeficiency in congenital adrenal hyperplasia (CAH) affectedpedigrees Hum Mol Genet 5 2039ndash2048

90 AirdD RossMG ChenWS DanielssonM FennellTRussC JaffeDB NusbaumC and GnirkeA (2011) Analyzingand minimizing PCR amplification bias in Illumina sequencinglibraries Genome Biol 12 R18

91 UllahMF and AatifM (2009) The footprints of cancerdevelopment cancer biomarkers Cancer Treat Rev 35 193ndash200

92 MoMH ChenL FuY WangW and FuSW (2012) Cell-freeCirculating miRNA Biomarkers in Cancer J Cancer 3 432ndash448

93 ZenK and ZhangCY (2012) Circulating microRNAs a novelclass of biomarkers to diagnose and monitor human cancersMed Res Rev 32 326ndash348

94 KimT and ReitmairA (2013) Non-Coding RNAs FunctionalAspects and Diagnostic Utility in Oncology Int J Mol Sci 144934ndash4968

95 PatnaikSK MallickR and YendamuriS (2010) Detection ofmicroRNAs in dried serum blots Anal Biochem 407 147ndash149

96 MitchellPS ParkinRK KrohEM FritzBR WymanSKPogosova-AgadjanyanEL PetersonA NoteboomJOrsquoBriantKC AllenA et al (2008) Circulating microRNAs as

stable blood-based markers for cancer detection Proc NatlAcad Sci USA 105 10513ndash10518

97 CieslaM SkrzypekK KozakowskaM LobodaAJozkowiczA and DulakJ (2011) MicroRNAs as biomarkers ofdisease onset Anal Bioanal Chem 401 2051ndash2061

98 OzsolakF (2012) Third-generation sequencing techniques andapplications to drug discovery Expert Opin Drug Discov 7231ndash243

99 TsongalisGJ CalinG CordelierP CroceC MonzonF andSzafranska-SchwarzbachAE (2013) MicroRNA analysis is itready for prime time Clin Chem 59 343ndash347

100 AmiraN Cancel-TassinG BernardiniS Cochand-PriolletBBittardH ManginP FournierG LatilA and CussenotO(2004) Expression in bladder transitional cell carcinoma by real-time quantitative reverse transcription polymerase chain reactionarray of 65 genes at the tumor suppressor locus 9q341-2identification of 5 candidates tumor suppressor genes Int JCancer 111 539ndash542

101 ZhangX (2007) Biomarker validation movement towardspersonalized medicine Expert Rev Mol Diagn 7 469ndash471

102 SchmittgenTD LeeEJ and JiangJ (2008) High-throughputreal-time PCR Methods Mol Biol 429 89ndash98

103 BenesV and CastoldiM (2010) Expression profiling ofmicroRNA using real-time quantitative PCR how to use it andwhat is available Methods 50 244ndash249

104 BustinSA and NolanT (2004) Pitfalls of quantitative real-timereverse-transcription polymerase chain reaction J Biomol Tech15 155ndash166

105 BeitzingerM and MeisterG (2011) Experimental identificationof microRNA targets by immunoprecipitation of Argonauteprotein complexes Methods Mol Biol 732 153ndash167

106 FiedlerSD CarlettiMZ and ChristensonLK (2010)Quantitative RT-PCR methods for mature microRNA expressionanalysis Methods Mol Biol 630 49ndash64

107 ArefianE KianiJ SoleimaniM ShariatiSA Aghaee-BakhtiariSH AtashiA GheisariY AhmadbeigiN Banaei-MoghaddamAM NaderiM et al (2011) Analysis ofmicroRNA signatures using size-coded ligation-mediated PCRNucleic Acids Res 39 e80

108 JonstrupSP KochJ and KjemsJ (2006) A microRNAdetection system based on padlock probes and rolling circleamplification RNA 12 1747ndash1752

109 LatorraD ArarK and HurleyJM (2003) Designconsiderations and effects of LNA in PCR primers Mol CellProbes 17 253ndash259

110 BustinSA BenesV GarsonJA HellemansJ HuggettJKubistaM MuellerR NolanT PfafflMW ShipleyGLet al (2009) The MIQE guidelines minimum information forpublication of quantitative real-time PCR experiments ClinChem 55 611ndash622

111 TrapnellC WilliamsBA PerteaG MortazaviA KwanGvan BarenMJ SalzbergSL WoldBJ and PachterL (2010)Transcript assembly and quantification by RNA-Seq revealsunannotated transcripts and isoform switching during celldifferentiation Nat Biotechnol 28 511ndash515

112 TrapnellC PachterL and SalzbergSL (2009) TopHatdiscovering splice junctions with RNA-Seq Bioinformatics 251105ndash1111

113 ListerR OrsquoMalleyRC Tonti-FilippiniJ GregoryBDBerryCC MillarAH and EckerJR (2008) Highly integratedsingle-base resolution maps of the epigenome in ArabidopsisCell 133 523ndash536

114 PolidorosAN PasentsisK and TsaftarisAS (2006) Rollingcircle amplification-RACE a method for simultaneous isolationof 50 and 30 cDNA ends from amplified cDNA templatesBiotechniques 41 35ndash42

Nucleic Acids Research 2013 13

Page 10: SURVEY AND SUMMARY Biases in small RNA deep sequencing data

bias In particular RNA secondary structure and RNAmodifications both influence reverse transcription efficacy(see above) Similar to PCRs reverse transcription is nega-tively affected by increasing RNA GC-contentBecause the relative RNA GC-content dictates primer

Tm qRT-PCRs are also strongly influenced by forwardprimer design thereby affecting both qRT-PCR specificityand sensitivity Altering the length of primer-templatecomplementarity permits adjustment of the actualprimer Tm Conversely in cases where RNA nucleotidecomposition would dictate very low melting temperaturesLNA-modified nucleotides might be incorporated into theprimer sequence to increase the corresponding primer Tm(18) However the design of LNA-containing oligonucleo-tides is not always trivial (109) For the actual measure-ment of miRNA in real time two alternative fluorescentreporter molecules are frequently utilized SYBR Greenand the Taqman assay (103) Detection of the fluores-cence signal emitted by SYBR Green increases about100-fold as the molecule intercalates in double-strandedDNA However the mechanism of DNA intercalationsuggests the dye is not target-specific and thereforecannot discriminate primer dimers unspecific productsand target amplification (103) Because the intercalatingdye is simply added to the assay it is often possible toeasily adapt the qRT-PCR assay to already establishedand validated analytical PCRs (104) In comparison theTaqman assay provides increased specificity as the de-tection is based on sequence-specific probendashtarget inter-action Certainly the qRT-PCR assays offer manyadvantages but require evaluation of various factors thatinfluence and potentially bias analysis Thereforeinternational standardization to ensure comparable ex-perimental design and to permit meta-analysis amongdatasets is of prime importance for establishing clinicalqRT-PCR-based personal medicine The MinimumInformation for Publication of Quantitative Real-TimePCR experiments guidelines established standards for bet-ter experimental practice and suggest routes to data ana-lysis that increase reliability and data comparison (110)

mRNA-SEQ

Deep sequencing is not limited to expression profiling ofmiRNA or other small non-protein coding RNAs Evenentire transcriptomes including all protein coding andlong non-protein coding RNAs are subjected to high-throughput sequencing methods Currently as the lengthof single sequence runs is limited (from 100 to 500 nt de-pending on the sequencing platform) longer RNAsrequire fragmentation prior to cDNA synthesis (25)Millions of small reads resulting from mRNA-seq areanalyzed to identify novel transcripts or splice variants(111112) Relative quantification of reads enables one tomonitor differential gene expression among samples andto identify changes in isoform distribution (111) As withsRNA-seq the digital nature of RNA deep sequencing isthought to simplify meta-analysis of mRNA or long non-protein coding RNA expression data (25) Some protocolsfor modification of RNA fragment ends rely on

chemistries identical to sRNA-seq (84113) Therefore ex-pression biases observed in sRNA-seq analysis are alsolikely to cause distortion in whole transcriptome profilingVariations in nucleotide composition and changes in RNAsecondary structure of different RNA fragments mightcorrelate with uneven read distribution Unequal readcoverage generates uncertainty in isoform detection andin extreme cases exons representing unfavorable nucleo-tide compositions might escape analysis As fragmentationis suggested to generate many different but at least par-tially overlapping reads bias in mRNA-seq might be lessprominent Recently a novel technique based on circligasewas introduced to avoid RNA 50-end ligation (114)Reverse transcription is conducted with anchoredoligo(dT) primers that contain an additional sequence topermit later PCR amplification Following first strandcDNA synthesis circligase-mediated ligation is carriedout The resulting circular molecules are subjected toPCR amplification The strategy is of general interestand might be a valuable tool for sRNA-seq as well

CONCLUSION

Bias in general describes systematic errors that reflectmethod-related distortion from the truth Bias might bedetected easily when data generated by different methodsare subjected to analytical comparison Because it is sys-tematic in nature bias is not subject to variation inrepeated experiments Here based on biochemical anddeep sequencing data analyses we have presentedvarious steps in sRNA-seq protocols likely to causesevere distortions in the relative expression levels of indi-vidual sRNAs

The efficacy of RNA end-modification to permit cDNAconstruction depends not only on differences in strategiesand reagents but also on changes in among others buffercompositions and additives that could increase molecularcrowding Therefore meta-analysis of data generated withnon-identical methods reagents or even buffers is likely tobe error-prone (36396378) To achieve comparableresults and to evaluate relative changes in the quantifica-tion of the resulting data experiments and analyses shouldbe conducted under strictly identical conditions It has tobe stressed that bias in RNA-specific variation is not onlyrestricted to inter-sample comparison but also effects therelative ranking of intra-sample RNA expression

In addition to the more obvious quantitative aspects ofbias there are also qualitative aspects because RNA-seqmethods are also employed for RNA discovery IndividualRNAs might lsquodrop outrsquo from detection because they arenot amenable to cDNA construction under chosen condi-tions Certainly to increase the likelihood of productiveRNA end-modification and increase the complexity ofsRNA-seq library content the parallel application of dif-ferent strategies and reagents is required (33396379) Inparticular increased variability of adapter sequences (ieadapter pools) helps to increase the diversity of RNAsaccessed (63) A recent report emphasized the need forparallel application of different RNA- and DNA-basedadapter oligonucleotides (39)

10 Nucleic Acids Research 2013

The source of expression level distortion and the failureto detect certain RNA species at all appear not to belimited to reactions of RNAcDNA-end modificationbut also apply to cDNA amplification by PCR (30ndash32)Differences in reaction buffers RNA GC-content sec-ondary structures RNA length and primers might eachlead to bias during PCR Therefore the analysis ofsRNA-seq data to examine relative changes in geneexpression or identification reflects both true RNA abun-dance and biases related to the methods applied

The sRNA-seq approach is also attractive to medicalresearch Deep sequencing might permit the screening ofmultiple patient samples to monitor the progression ofdisease or treatment Therefore it is urgently necessaryto establish rigid international standardizations tominimize distortion and to avoid misleading conclusionsduring RNA-seq data interpretation In summary quali-tative and quantitative RNomics remain more challengingthan anticipated

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online

FUNDING

National Genome Research Network [NGFNIII01GS0808 to JB and TSR] the Deutsche Forschungs-gemeinschaft [BR 7544-1 to JB] USM ResearchUniversity Grant [1001CIPPT81305 to THT]Funding for open access charge The licence was fundedby Dr Brosiusrsquos research budget

Conflict of interest statement None declared

REFERENCES

1 BrosiusJ (2009) The fragmented gene Ann N Y Acad Sci1178 186ndash193

2 BrosiusJ and TiedgeH (2004) RNomenclature RNA Biol 181ndash83

3 LiuQ and ParooZ (2010) Biochemical principles of small RNApathways Annu Rev Biochem 79 295ndash319

4 BrosiusJ (2012) In ErdmannVA and BarciszewskiJ (eds)From Nucleic Acids Sequences to Molecular Medicine RNATechnologies Springer Berlin Heidelberg pp 1ndash18

5 HuttenhoferA BrosiusJ and BachellerieJP (2002) RNomicsidentification and function of small non-messenger RNAs CurrOpin Chem Biol 6 835ndash843

6 AnS and SongJJ (2011) The coded functions of noncodingRNAs for gene regulation Mol Cells 31 491ndash496

7 BushatiN and CohenSM (2007) microRNA functions AnnuRev Cell Dev Biol 23 175ndash205

8 ChiangHR SchoenfeldLW RubyJG AuyeungVCSpiesN BaekD JohnstonWK RussC LuoS BabiarzJEet al (2010) Mammalian microRNAs experimental evaluation ofnovel and previously annotated genes Genes Dev 24 992ndash1009

9 AmbrosV (2001) microRNAs tiny regulators with greatpotential Cell 107 823ndash826

10 LauNC LimLP WeinsteinEG and BartelDP (2001) Anabundant class of tiny RNAs with probable regulatory roles inCaenorhabditis elegans Science 294 858ndash862

11 RuvkunG (2001) Molecular biology Glimpses of a tiny RNAworld Science 294 797ndash799

12 RuvkunGB (2003) The tiny RNA world Harvey Lect 991ndash21

13 GurtanAM and SharpPA (2013) The Role of miRNAs inRegulating Gene Expression Networks J Mol Biol 4253582ndash3600

14 LiangH ZhangJ ZenK ZhangCY and ChenX (2013)Nuclear microRNAs and their unconventional role in regulatingnon-coding RNAs Protein Cell 4 325ndash330

15 HuangV and LiLC (2012) miRNA goes nuclear RNA Biol 9269ndash273

16 WangZ HanJ CuiY FanK and ZhouX (2013) CirculatingmicroRNA-21 as noninvasive predictive biomarker for response incancer immunotherapy Med Hypotheses 81 41ndash43

17 ZhaoC ZhangJ ZhangS YuD ChenY LiuQ ShiMNiC and ZhuM (2013) Diagnostic and biological significance ofmicroRNA-192 in pancreatic ductal adenocarcinoma Oncol Rep30 276ndash284

18 van RooijE (2011) The art of microRNA research Circ Res108 219ndash234

19 LeeLW ZhangS EtheridgeA MaL MartinD GalasDand WangK (2010) Complexity of the microRNA repertoirerevealed by next-generation sequencing RNA 16 2170ndash2180

20 OkoniewskiMJ and MillerCJ (2006) Hybridization interactionsbetween probesets in short oligo microarrays lead to spuriouscorrelations BMC Bioinformatics 7 276

21 MortazaviA WilliamsBA McCueK SchaefferL andWoldB (2008) Mapping and quantifying mammaliantranscriptomes by RNA-Seq Nat Methods 5 621ndash628

22 LiuS LinL JiangP WangD and XingY (2011) Acomparison of RNA-Seq and high-density exon array fordetecting differential gene expression between closely relatedspecies Nucleic Acids Res 39 578ndash588

23 RaghavachariN BarbJ YangY LiuP WoodhouseKLevyD OrsquoDonnellC MunsonPJ and KatoG (2012) Asystematic comparison and evaluation of high density exon arraysand RNA-seq technology used to unravel the peripheral bloodtranscriptome of sickle cell disease BMC Med Genomics 5 28

24 MartinJA and WangZ (2011) Next-generation transcriptomeassembly Nat Rev Genet 12 671ndash682

25 WangZ GersteinM and SnyderM (2009) RNA-Seq arevolutionary tool for transcriptomics Nat Rev Genet 1057ndash63

26 KozomaraA and Griffiths-JonesS (2011) miRBase integratingmicroRNA annotation and deep-sequencing data Nucleic AcidsRes 39 D152ndashD157

27 LinsenSE de WitE JanssensG HeaterS ChapmanLParkinRK FritzB WymanSK de BruijnE VoestEE et al(2009) Limitations and possibilities of small RNA digital geneexpression profiling Nat Methods 6 474ndash476

28 BakerM (2010) MicroRNA profiling separating signal fromnoise Nat Methods 7 687ndash692

29 HafnerM RenwickN BrownM MihailovicA HolochDLinC PenaJT NusbaumJD MorozovP LudwigJ et al(2011) RNA-ligase-dependent biases in miRNA representationin deep-sequenced small RNA cDNA libraries RNA 171697ndash1712

30 DabneyJ and MeyerM (2012) Length and GC-biases duringsequencing library amplification a comparison of variouspolymerase-buffer systems with ancient and modern DNAsequencing libraries Biotechniques 52 87ndash94

31 OrpanaAK HoTH and StenmanJ (2012) Multiple heatpulses during PCR extension enabling amplification of GC-richsequences and reducing amplification bias Anal Chem 842081ndash2087

32 SendlerE JohnsonGD and KrawetzSA (2011) Local andglobal factors affecting RNA sequencing analysis Anal Biochem419 317ndash322

33 ZhuangF FuchsRT SunZ ZhengY and RobbGB (2012)Structural bias in T4 RNA ligase-mediated 30-adapter ligationNucleic Acids Res 40 e54

34 DeChiaraTM and BrosiusJ (1987) Neural BC1 RNA cDNAclones reveal nonrepetitive sequence content Proc Natl AcadSci USA 84 2624ndash2628

Nucleic Acids Research 2013 11

35 WickensM and KwakJE (2008) Molecular biology A tail talefor U Science 319 1344ndash1345

36 MunafoDB and RobbGB (2010) Optimization of enzymaticreaction conditions for generating representative pools of cDNAfrom small RNA RNA 16 2537ndash2552

37 SarkarN (1997) Polyadenylation of mRNA in prokaryotesAnnu Rev Biochem 66 173ndash197

38 Yehudai-ResheffS and SchusterG (2000) Characterization of theEcoli poly(A) polymerase nucleotide specificity RNA-bindingaffinities and RNA structure dependence Nucleic Acids Res 281139ndash1144

39 RaabeCA HoeCH RandauG BrosiusJ TangTH andRozhdestvenskyTS (2011) The rocks and shallows of deep RNAsequencing Examples in the Vibrio cholerae RNome RNA 171357ndash1366

40 RaabeCA SanchezCP RandauG RobeckT SkryabinBVChinniSV KubeM ReinhardtR NgGH ManickamRet al (2010) A global view of the nonprotein-codingtranscriptome in Plasmodium falciparum Nucleic Acids Res 38608ndash617

41 SharmaCM HoffmannS DarfeuilleF ReignierJ FindeissSSittkaA ChabasS ReicheK HackermullerJ ReinhardtRet al (2010) The primary transcriptome of the major humanpathogen Helicobacter pylori Nature 464 250ndash255

42 SharmaCM and VogelJ (2009) Experimental approaches forthe discovery and characterization of regulatory small RNACurr Opin Microbiol 12 536ndash546

43 YuB YangZ LiJ MinakhinaS YangM PadgettRWStewardR and ChenX (2005) Methylation as a crucial step inplant microRNA biogenesis Science 307 932ndash935

44 HouwingS KammingaLM BerezikovE CronemboldDGirardA van den ElstH FilippovDV BlaserH RazEMoensCB et al (2007) A role for Piwi and piRNAs in germcell maintenance and transposon silencing in Zebrafish Cell 12969ndash82

45 AckermannHW (2003) Bacteriophage observations andevolution Res Microbiol 154 245ndash251

46 RossmannMG MesyanzhinovVV ArisakaF andLeimanPG (2004) The bacteriophage T4 DNA injectionmachine Curr Opin Struct Biol 14 171ndash180

47 HoCK and ShumanS (2002) Bacteriophage T4 RNA ligase 2(gp241) exemplifies a family of RNA ligases found in allphylogenetic domains Proc Natl Acad Sci USA 9912709ndash12714

48 YinS HoCK and ShumanS (2003) Structure-function analysisof T4 RNA ligase 2 J Biol Chem 278 17601ndash17608

49 AravinA and TuschlT (2005) Identification and characterizationof small RNAs involved in RNA silencing FEBS Lett 5795830ndash5840

50 ViolletS FuchsRT MunafoDB ZhuangF and RobbGB(2011) T4 RNA ligase 2 truncated active site mutants improvedtools for RNA analysis BMC Biotechnol 11 72

51 UhlenbeckOC and GumportRI (1982) T4 RNA LigaseIn BoyerPD (ed) The Enzymes Nucleic Acids Part BAcademic Press Inc New York pp 31ndash58

52 HoCK WangLK LimaCD and ShumanS (2004) Structureand mechanism of RNA ligase Structure 12 327ndash339

53 El OmariK RenJ BirdLE BonaMK KlarmannGLeGriceSF and StammersDK (2006) Molecular architectureand ligand recognition determinants for T4 RNA ligase J BiolChem 281 1573ndash1579

54 StarkMR PleissJA DerasM ScaringeSA and RaderSD(2006) An RNA ligase-mediated method for the efficient creationof large synthetic RNAs RNA 12 2014ndash2019

55 CranstonJW SilberR MalathiVG and HurwitzJ (1974)Studies on ribonucleic acid ligase Characterization of anadenosine triphosphate-inorganic pyrophosphate exchangereaction and demonstration of an enzyme-adenylate complex withT4 bacteriophage-induced enzyme J Biol Chem 2497447ndash7456

56 HigginsNP GeballeAP SnopekTJ SuginoA andCozzarelliNR (1977) Bacteriophage T4 RNA ligase preparationof a physically homogeneous nuclease-free enzyme fromhyperproducing infected cells Nucleic Acids Res 4 3175ndash3186

57 PfefferS Lagos-QuintanaM and TuschlT (2005) Cloning ofsmall RNA molecules Curr Protoc Mol Biol Chapter 26 Unit26 24

58 BruceAG and UhlenbeckOC (1978) Reactions at the terminiof tRNA with T4 RNA ligase Nucleic Acids Res 5 3665ndash3677

59 VigneaultF SismourAM and ChurchGM (2008) EfficientmicroRNA capture and bar-coding via enzymatic oligonucleotideadenylation Nat Methods 5 777ndash779

60 LiuJM LivnyJ LawrenceMS KimballMD WaldorMKand CamilliA (2009) Experimental discovery of sRNAs in Vibriocholerae by direct cloning 5StRNA depletion and parallelsequencing Nucleic Acids Res 37 e46

61 KrugM and UhlenbeckOC (1982) Reversal of T4 RNA ligaseBiochemistry 21 1858ndash1864

62 HafnerM LandgrafP LudwigJ RiceA OjoT LinCHolochD LimC and TuschlT (2008) Identification ofmicroRNAs and other small regulatory RNAs using cDNAlibrary sequencing Methods 44 3ndash12

63 JayaprakashAD JabadoO BrownBD and SachidanandamR(2011) Identification and remediation of biases in the activity ofRNA ligases in small-RNA deep sequencing Nucleic Acids Res39 e141

64 YewCW and KumarSV (2012) Isolation and cloning ofmicroRNAs from recalcitrant plant tissues with small amounts oftotal RNA a step-by step approach Mol Biol Rep 391783ndash1790

65 KurthHM and MochizukiK (2009) 2rsquo-O-methylation stabilizesPiwi-associated small RNAs and ensures DNA elimination inTetrahymena RNA 15 675ndash685

66 EbhardtHA ThiEP WangMB and UnrauPJ (2005)Extensive 30 modification of plant small RNAs is modulated byhelper component-proteinase expression Proc Natl Acad SciUSA 102 13398ndash13403

67 WangJ CzechB CrunkA WallaceA MitrevaMHannonGJ and DavisRE (2011) Deep small RNA sequencingfrom the nematode Ascaris reveals conservation functionaldiversification and novel developmental profiles Genome Res 211462ndash1477

68 ZhuangF FuchsRT and RobbGB (2012) Small RNAexpression profiling by high-throughput sequencing implicationsof enzymatic manipulation J Nucleic Acids 2012 360358

69 FujimuraT and EstebanR (2012) Cap snatching of yeast L-Adouble-stranded RNA virus can operate in trans and requiresviral polymerase actively engaging in transcription J Biol Chem287 12797ndash12804

70 BessmanMJ WalshJD DunnCA SwaminathanJWeldonJE and ShenJ (2001) The gene ygdP associated withthe invasiveness of Escherichia coli K1 designates a Nudixhydrolase Orf176 active on adenosine (50)-pentaphospho-(50)-adenosine (Ap5A) J Biol Chem 276 37834ndash37838

71 DeanaA CelesnikH and BelascoJG (2008) The bacterialenzyme RppH triggers messenger RNA degradation by50 pyrophosphate removal Nature 451 355ndash358

72 BlewettN CollerJ and GoldstrohmA (2011) A quantitativeassay for measuring mRNA decapping by splinted ligation reversetranscription polymerase chain reaction qSL-RT-PCR RNA 17535ndash543

73 KrogerC DillonSC CameronAD PapenfortKSivasankaranSK HokampK ChaoY SittkaA HebrardMHandlerK et al (2012) The transcriptional landscape and smallRNAs of Salmonella enterica serovar Typhimurium Proc NatlAcad Sci USA 109 E1277ndashE1286

74 EnglandTE and UhlenbeckOC (1978) Enzymaticoligoribonucleotide synthesis with T4 RNA ligase Biochemistry17 2069ndash2076

75 SorefanK PaisH HallAE KozomaraA Griffiths-JonesSMoultonV and DalmayT (2012) Reducing ligation bias of smallRNAs in libraries for next generation sequencing Silence 3 4

76 ZhelkovskyAM and McReynoldsLA (2011) Simple andefficient synthesis of 50 pre-adenylated DNA using thermostableRNA ligase Nucleic Acids Res 39 e117

77 McCoyMI and GumportRI (1980) T4 ribonucleic acid ligasejoins single-strand oligo(deoxyribonucleotides) Biochemistry 19635ndash642

12 Nucleic Acids Research 2013

78 AlonS VigneaultF EminagaS ChristodoulouDCSeidmanJG ChurchGM and EisenbergE (2011) Barcodingbias in high-throughput multiplex sequencing of miRNA GenomeRes 21 1506ndash1511

79 SunG WuX WangJ LiH LiX GaoH RossiJ andYenY (2011) A bias-reducing strategy in profiling small RNAsusing Solexa RNA 17 2256ndash2262

80 Van NieuwerburghF SoetaertS PodshivalovaK Ay-LinWangE SchafferL DeforceD SalomonDR HeadSR andOrdoukhanianP (2011) Quantitative bias in Illumina TruSeq anda novel post amplification barcoding strategy for multiplexedDNA and small RNA deep sequencing PLoS One 6 e26969

81 VigneaultF Ter-OvanesyanD AlonS EminagaSC ChristodoulouD SeidmanJG EisenbergE and MChurchG (2012) High-throughput multiplex sequencingof miRNA Curr Protoc Hum Genet Chapter 11 Unit 1112 11-10

82 PakJ and FireA (2007) Distinct populations of primary andsecondary effectors during RNAi in C elegans Science 315241ndash244

83 ZhuYY MachlederEM ChenchikA LiR and SiebertPD(2001) Reverse transcriptase template switching a SMARTapproach for full-length cDNA library constructionBiotechniques 30 892ndash897

84 LevinJZ YassourM AdiconisX NusbaumCThompsonDA FriedmanN GnirkeA and RegevA (2010)Comprehensive comparative analysis of strand-specific RNAsequencing methods Nat Methods 7 709ndash715

85 AndoY BurroughsAM KawanoM de HoonMJ andHayashizakiY (2012) Targeted methods to improve small RNAprofiles generated by deep sequencing In MallickB andGhoshZ (eds) Regulatory RNAs Basics Methods andApplications Springer-Verlag Heidelberg Germany pp 253ndash271

86 KawanoM KawazuC LizioM KawajiH CarninciPSuzukiH and HayashizakiY (2010) Reduction of non-insertsequence reads by dimer eliminator LNA oligonucleotide forsmall RNA deep sequencing Biotechniques 49 751ndash755

87 KapranovP OzsolakF and MilosPM (2012) Profiling of shortRNAs using Helicos single-molecule sequencing Methods MolBiol 822 219ndash232

88 KanagawaT (2003) Bias and artifacts in multitemplatepolymerase chain reactions (PCR) J Biosci Bioeng 96 317ndash323

89 DayDJ SpeiserPW SchulzeE BettendorfM FitnessJBaranyF and WhitePC (1996) Identification of non-amplifyingCYP21 genes when using PCR-based diagnosis of 21-hydroxylasedeficiency in congenital adrenal hyperplasia (CAH) affectedpedigrees Hum Mol Genet 5 2039ndash2048

90 AirdD RossMG ChenWS DanielssonM FennellTRussC JaffeDB NusbaumC and GnirkeA (2011) Analyzingand minimizing PCR amplification bias in Illumina sequencinglibraries Genome Biol 12 R18

91 UllahMF and AatifM (2009) The footprints of cancerdevelopment cancer biomarkers Cancer Treat Rev 35 193ndash200

92 MoMH ChenL FuY WangW and FuSW (2012) Cell-freeCirculating miRNA Biomarkers in Cancer J Cancer 3 432ndash448

93 ZenK and ZhangCY (2012) Circulating microRNAs a novelclass of biomarkers to diagnose and monitor human cancersMed Res Rev 32 326ndash348

94 KimT and ReitmairA (2013) Non-Coding RNAs FunctionalAspects and Diagnostic Utility in Oncology Int J Mol Sci 144934ndash4968

95 PatnaikSK MallickR and YendamuriS (2010) Detection ofmicroRNAs in dried serum blots Anal Biochem 407 147ndash149

96 MitchellPS ParkinRK KrohEM FritzBR WymanSKPogosova-AgadjanyanEL PetersonA NoteboomJOrsquoBriantKC AllenA et al (2008) Circulating microRNAs as

stable blood-based markers for cancer detection Proc NatlAcad Sci USA 105 10513ndash10518

97 CieslaM SkrzypekK KozakowskaM LobodaAJozkowiczA and DulakJ (2011) MicroRNAs as biomarkers ofdisease onset Anal Bioanal Chem 401 2051ndash2061

98 OzsolakF (2012) Third-generation sequencing techniques andapplications to drug discovery Expert Opin Drug Discov 7231ndash243

99 TsongalisGJ CalinG CordelierP CroceC MonzonF andSzafranska-SchwarzbachAE (2013) MicroRNA analysis is itready for prime time Clin Chem 59 343ndash347

100 AmiraN Cancel-TassinG BernardiniS Cochand-PriolletBBittardH ManginP FournierG LatilA and CussenotO(2004) Expression in bladder transitional cell carcinoma by real-time quantitative reverse transcription polymerase chain reactionarray of 65 genes at the tumor suppressor locus 9q341-2identification of 5 candidates tumor suppressor genes Int JCancer 111 539ndash542

101 ZhangX (2007) Biomarker validation movement towardspersonalized medicine Expert Rev Mol Diagn 7 469ndash471

102 SchmittgenTD LeeEJ and JiangJ (2008) High-throughputreal-time PCR Methods Mol Biol 429 89ndash98

103 BenesV and CastoldiM (2010) Expression profiling ofmicroRNA using real-time quantitative PCR how to use it andwhat is available Methods 50 244ndash249

104 BustinSA and NolanT (2004) Pitfalls of quantitative real-timereverse-transcription polymerase chain reaction J Biomol Tech15 155ndash166

105 BeitzingerM and MeisterG (2011) Experimental identificationof microRNA targets by immunoprecipitation of Argonauteprotein complexes Methods Mol Biol 732 153ndash167

106 FiedlerSD CarlettiMZ and ChristensonLK (2010)Quantitative RT-PCR methods for mature microRNA expressionanalysis Methods Mol Biol 630 49ndash64

107 ArefianE KianiJ SoleimaniM ShariatiSA Aghaee-BakhtiariSH AtashiA GheisariY AhmadbeigiN Banaei-MoghaddamAM NaderiM et al (2011) Analysis ofmicroRNA signatures using size-coded ligation-mediated PCRNucleic Acids Res 39 e80

108 JonstrupSP KochJ and KjemsJ (2006) A microRNAdetection system based on padlock probes and rolling circleamplification RNA 12 1747ndash1752

109 LatorraD ArarK and HurleyJM (2003) Designconsiderations and effects of LNA in PCR primers Mol CellProbes 17 253ndash259

110 BustinSA BenesV GarsonJA HellemansJ HuggettJKubistaM MuellerR NolanT PfafflMW ShipleyGLet al (2009) The MIQE guidelines minimum information forpublication of quantitative real-time PCR experiments ClinChem 55 611ndash622

111 TrapnellC WilliamsBA PerteaG MortazaviA KwanGvan BarenMJ SalzbergSL WoldBJ and PachterL (2010)Transcript assembly and quantification by RNA-Seq revealsunannotated transcripts and isoform switching during celldifferentiation Nat Biotechnol 28 511ndash515

112 TrapnellC PachterL and SalzbergSL (2009) TopHatdiscovering splice junctions with RNA-Seq Bioinformatics 251105ndash1111

113 ListerR OrsquoMalleyRC Tonti-FilippiniJ GregoryBDBerryCC MillarAH and EckerJR (2008) Highly integratedsingle-base resolution maps of the epigenome in ArabidopsisCell 133 523ndash536

114 PolidorosAN PasentsisK and TsaftarisAS (2006) Rollingcircle amplification-RACE a method for simultaneous isolationof 50 and 30 cDNA ends from amplified cDNA templatesBiotechniques 41 35ndash42

Nucleic Acids Research 2013 13

Page 11: SURVEY AND SUMMARY Biases in small RNA deep sequencing data

The source of expression level distortion and the failureto detect certain RNA species at all appear not to belimited to reactions of RNAcDNA-end modificationbut also apply to cDNA amplification by PCR (30ndash32)Differences in reaction buffers RNA GC-content sec-ondary structures RNA length and primers might eachlead to bias during PCR Therefore the analysis ofsRNA-seq data to examine relative changes in geneexpression or identification reflects both true RNA abun-dance and biases related to the methods applied

The sRNA-seq approach is also attractive to medicalresearch Deep sequencing might permit the screening ofmultiple patient samples to monitor the progression ofdisease or treatment Therefore it is urgently necessaryto establish rigid international standardizations tominimize distortion and to avoid misleading conclusionsduring RNA-seq data interpretation In summary quali-tative and quantitative RNomics remain more challengingthan anticipated

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online

FUNDING

National Genome Research Network [NGFNIII01GS0808 to JB and TSR] the Deutsche Forschungs-gemeinschaft [BR 7544-1 to JB] USM ResearchUniversity Grant [1001CIPPT81305 to THT]Funding for open access charge The licence was fundedby Dr Brosiusrsquos research budget

Conflict of interest statement None declared

REFERENCES

1 BrosiusJ (2009) The fragmented gene Ann N Y Acad Sci1178 186ndash193

2 BrosiusJ and TiedgeH (2004) RNomenclature RNA Biol 181ndash83

3 LiuQ and ParooZ (2010) Biochemical principles of small RNApathways Annu Rev Biochem 79 295ndash319

4 BrosiusJ (2012) In ErdmannVA and BarciszewskiJ (eds)From Nucleic Acids Sequences to Molecular Medicine RNATechnologies Springer Berlin Heidelberg pp 1ndash18

5 HuttenhoferA BrosiusJ and BachellerieJP (2002) RNomicsidentification and function of small non-messenger RNAs CurrOpin Chem Biol 6 835ndash843

6 AnS and SongJJ (2011) The coded functions of noncodingRNAs for gene regulation Mol Cells 31 491ndash496

7 BushatiN and CohenSM (2007) microRNA functions AnnuRev Cell Dev Biol 23 175ndash205

8 ChiangHR SchoenfeldLW RubyJG AuyeungVCSpiesN BaekD JohnstonWK RussC LuoS BabiarzJEet al (2010) Mammalian microRNAs experimental evaluation ofnovel and previously annotated genes Genes Dev 24 992ndash1009

9 AmbrosV (2001) microRNAs tiny regulators with greatpotential Cell 107 823ndash826

10 LauNC LimLP WeinsteinEG and BartelDP (2001) Anabundant class of tiny RNAs with probable regulatory roles inCaenorhabditis elegans Science 294 858ndash862

11 RuvkunG (2001) Molecular biology Glimpses of a tiny RNAworld Science 294 797ndash799

12 RuvkunGB (2003) The tiny RNA world Harvey Lect 991ndash21

13 GurtanAM and SharpPA (2013) The Role of miRNAs inRegulating Gene Expression Networks J Mol Biol 4253582ndash3600

14 LiangH ZhangJ ZenK ZhangCY and ChenX (2013)Nuclear microRNAs and their unconventional role in regulatingnon-coding RNAs Protein Cell 4 325ndash330

15 HuangV and LiLC (2012) miRNA goes nuclear RNA Biol 9269ndash273

16 WangZ HanJ CuiY FanK and ZhouX (2013) CirculatingmicroRNA-21 as noninvasive predictive biomarker for response incancer immunotherapy Med Hypotheses 81 41ndash43

17 ZhaoC ZhangJ ZhangS YuD ChenY LiuQ ShiMNiC and ZhuM (2013) Diagnostic and biological significance ofmicroRNA-192 in pancreatic ductal adenocarcinoma Oncol Rep30 276ndash284

18 van RooijE (2011) The art of microRNA research Circ Res108 219ndash234

19 LeeLW ZhangS EtheridgeA MaL MartinD GalasDand WangK (2010) Complexity of the microRNA repertoirerevealed by next-generation sequencing RNA 16 2170ndash2180

20 OkoniewskiMJ and MillerCJ (2006) Hybridization interactionsbetween probesets in short oligo microarrays lead to spuriouscorrelations BMC Bioinformatics 7 276

21 MortazaviA WilliamsBA McCueK SchaefferL andWoldB (2008) Mapping and quantifying mammaliantranscriptomes by RNA-Seq Nat Methods 5 621ndash628

22 LiuS LinL JiangP WangD and XingY (2011) Acomparison of RNA-Seq and high-density exon array fordetecting differential gene expression between closely relatedspecies Nucleic Acids Res 39 578ndash588

23 RaghavachariN BarbJ YangY LiuP WoodhouseKLevyD OrsquoDonnellC MunsonPJ and KatoG (2012) Asystematic comparison and evaluation of high density exon arraysand RNA-seq technology used to unravel the peripheral bloodtranscriptome of sickle cell disease BMC Med Genomics 5 28

24 MartinJA and WangZ (2011) Next-generation transcriptomeassembly Nat Rev Genet 12 671ndash682

25 WangZ GersteinM and SnyderM (2009) RNA-Seq arevolutionary tool for transcriptomics Nat Rev Genet 1057ndash63

26 KozomaraA and Griffiths-JonesS (2011) miRBase integratingmicroRNA annotation and deep-sequencing data Nucleic AcidsRes 39 D152ndashD157

27 LinsenSE de WitE JanssensG HeaterS ChapmanLParkinRK FritzB WymanSK de BruijnE VoestEE et al(2009) Limitations and possibilities of small RNA digital geneexpression profiling Nat Methods 6 474ndash476

28 BakerM (2010) MicroRNA profiling separating signal fromnoise Nat Methods 7 687ndash692

29 HafnerM RenwickN BrownM MihailovicA HolochDLinC PenaJT NusbaumJD MorozovP LudwigJ et al(2011) RNA-ligase-dependent biases in miRNA representationin deep-sequenced small RNA cDNA libraries RNA 171697ndash1712

30 DabneyJ and MeyerM (2012) Length and GC-biases duringsequencing library amplification a comparison of variouspolymerase-buffer systems with ancient and modern DNAsequencing libraries Biotechniques 52 87ndash94

31 OrpanaAK HoTH and StenmanJ (2012) Multiple heatpulses during PCR extension enabling amplification of GC-richsequences and reducing amplification bias Anal Chem 842081ndash2087

32 SendlerE JohnsonGD and KrawetzSA (2011) Local andglobal factors affecting RNA sequencing analysis Anal Biochem419 317ndash322

33 ZhuangF FuchsRT SunZ ZhengY and RobbGB (2012)Structural bias in T4 RNA ligase-mediated 30-adapter ligationNucleic Acids Res 40 e54

34 DeChiaraTM and BrosiusJ (1987) Neural BC1 RNA cDNAclones reveal nonrepetitive sequence content Proc Natl AcadSci USA 84 2624ndash2628

Nucleic Acids Research 2013 11

35 WickensM and KwakJE (2008) Molecular biology A tail talefor U Science 319 1344ndash1345

36 MunafoDB and RobbGB (2010) Optimization of enzymaticreaction conditions for generating representative pools of cDNAfrom small RNA RNA 16 2537ndash2552

37 SarkarN (1997) Polyadenylation of mRNA in prokaryotesAnnu Rev Biochem 66 173ndash197

38 Yehudai-ResheffS and SchusterG (2000) Characterization of theEcoli poly(A) polymerase nucleotide specificity RNA-bindingaffinities and RNA structure dependence Nucleic Acids Res 281139ndash1144

39 RaabeCA HoeCH RandauG BrosiusJ TangTH andRozhdestvenskyTS (2011) The rocks and shallows of deep RNAsequencing Examples in the Vibrio cholerae RNome RNA 171357ndash1366

40 RaabeCA SanchezCP RandauG RobeckT SkryabinBVChinniSV KubeM ReinhardtR NgGH ManickamRet al (2010) A global view of the nonprotein-codingtranscriptome in Plasmodium falciparum Nucleic Acids Res 38608ndash617

41 SharmaCM HoffmannS DarfeuilleF ReignierJ FindeissSSittkaA ChabasS ReicheK HackermullerJ ReinhardtRet al (2010) The primary transcriptome of the major humanpathogen Helicobacter pylori Nature 464 250ndash255

42 SharmaCM and VogelJ (2009) Experimental approaches forthe discovery and characterization of regulatory small RNACurr Opin Microbiol 12 536ndash546

43 YuB YangZ LiJ MinakhinaS YangM PadgettRWStewardR and ChenX (2005) Methylation as a crucial step inplant microRNA biogenesis Science 307 932ndash935

44 HouwingS KammingaLM BerezikovE CronemboldDGirardA van den ElstH FilippovDV BlaserH RazEMoensCB et al (2007) A role for Piwi and piRNAs in germcell maintenance and transposon silencing in Zebrafish Cell 12969ndash82

45 AckermannHW (2003) Bacteriophage observations andevolution Res Microbiol 154 245ndash251

46 RossmannMG MesyanzhinovVV ArisakaF andLeimanPG (2004) The bacteriophage T4 DNA injectionmachine Curr Opin Struct Biol 14 171ndash180

47 HoCK and ShumanS (2002) Bacteriophage T4 RNA ligase 2(gp241) exemplifies a family of RNA ligases found in allphylogenetic domains Proc Natl Acad Sci USA 9912709ndash12714

48 YinS HoCK and ShumanS (2003) Structure-function analysisof T4 RNA ligase 2 J Biol Chem 278 17601ndash17608

49 AravinA and TuschlT (2005) Identification and characterizationof small RNAs involved in RNA silencing FEBS Lett 5795830ndash5840

50 ViolletS FuchsRT MunafoDB ZhuangF and RobbGB(2011) T4 RNA ligase 2 truncated active site mutants improvedtools for RNA analysis BMC Biotechnol 11 72

51 UhlenbeckOC and GumportRI (1982) T4 RNA LigaseIn BoyerPD (ed) The Enzymes Nucleic Acids Part BAcademic Press Inc New York pp 31ndash58

52 HoCK WangLK LimaCD and ShumanS (2004) Structureand mechanism of RNA ligase Structure 12 327ndash339

53 El OmariK RenJ BirdLE BonaMK KlarmannGLeGriceSF and StammersDK (2006) Molecular architectureand ligand recognition determinants for T4 RNA ligase J BiolChem 281 1573ndash1579

54 StarkMR PleissJA DerasM ScaringeSA and RaderSD(2006) An RNA ligase-mediated method for the efficient creationof large synthetic RNAs RNA 12 2014ndash2019

55 CranstonJW SilberR MalathiVG and HurwitzJ (1974)Studies on ribonucleic acid ligase Characterization of anadenosine triphosphate-inorganic pyrophosphate exchangereaction and demonstration of an enzyme-adenylate complex withT4 bacteriophage-induced enzyme J Biol Chem 2497447ndash7456

56 HigginsNP GeballeAP SnopekTJ SuginoA andCozzarelliNR (1977) Bacteriophage T4 RNA ligase preparationof a physically homogeneous nuclease-free enzyme fromhyperproducing infected cells Nucleic Acids Res 4 3175ndash3186

57 PfefferS Lagos-QuintanaM and TuschlT (2005) Cloning ofsmall RNA molecules Curr Protoc Mol Biol Chapter 26 Unit26 24

58 BruceAG and UhlenbeckOC (1978) Reactions at the terminiof tRNA with T4 RNA ligase Nucleic Acids Res 5 3665ndash3677

59 VigneaultF SismourAM and ChurchGM (2008) EfficientmicroRNA capture and bar-coding via enzymatic oligonucleotideadenylation Nat Methods 5 777ndash779

60 LiuJM LivnyJ LawrenceMS KimballMD WaldorMKand CamilliA (2009) Experimental discovery of sRNAs in Vibriocholerae by direct cloning 5StRNA depletion and parallelsequencing Nucleic Acids Res 37 e46

61 KrugM and UhlenbeckOC (1982) Reversal of T4 RNA ligaseBiochemistry 21 1858ndash1864

62 HafnerM LandgrafP LudwigJ RiceA OjoT LinCHolochD LimC and TuschlT (2008) Identification ofmicroRNAs and other small regulatory RNAs using cDNAlibrary sequencing Methods 44 3ndash12

63 JayaprakashAD JabadoO BrownBD and SachidanandamR(2011) Identification and remediation of biases in the activity ofRNA ligases in small-RNA deep sequencing Nucleic Acids Res39 e141

64 YewCW and KumarSV (2012) Isolation and cloning ofmicroRNAs from recalcitrant plant tissues with small amounts oftotal RNA a step-by step approach Mol Biol Rep 391783ndash1790

65 KurthHM and MochizukiK (2009) 2rsquo-O-methylation stabilizesPiwi-associated small RNAs and ensures DNA elimination inTetrahymena RNA 15 675ndash685

66 EbhardtHA ThiEP WangMB and UnrauPJ (2005)Extensive 30 modification of plant small RNAs is modulated byhelper component-proteinase expression Proc Natl Acad SciUSA 102 13398ndash13403

67 WangJ CzechB CrunkA WallaceA MitrevaMHannonGJ and DavisRE (2011) Deep small RNA sequencingfrom the nematode Ascaris reveals conservation functionaldiversification and novel developmental profiles Genome Res 211462ndash1477

68 ZhuangF FuchsRT and RobbGB (2012) Small RNAexpression profiling by high-throughput sequencing implicationsof enzymatic manipulation J Nucleic Acids 2012 360358

69 FujimuraT and EstebanR (2012) Cap snatching of yeast L-Adouble-stranded RNA virus can operate in trans and requiresviral polymerase actively engaging in transcription J Biol Chem287 12797ndash12804

70 BessmanMJ WalshJD DunnCA SwaminathanJWeldonJE and ShenJ (2001) The gene ygdP associated withthe invasiveness of Escherichia coli K1 designates a Nudixhydrolase Orf176 active on adenosine (50)-pentaphospho-(50)-adenosine (Ap5A) J Biol Chem 276 37834ndash37838

71 DeanaA CelesnikH and BelascoJG (2008) The bacterialenzyme RppH triggers messenger RNA degradation by50 pyrophosphate removal Nature 451 355ndash358

72 BlewettN CollerJ and GoldstrohmA (2011) A quantitativeassay for measuring mRNA decapping by splinted ligation reversetranscription polymerase chain reaction qSL-RT-PCR RNA 17535ndash543

73 KrogerC DillonSC CameronAD PapenfortKSivasankaranSK HokampK ChaoY SittkaA HebrardMHandlerK et al (2012) The transcriptional landscape and smallRNAs of Salmonella enterica serovar Typhimurium Proc NatlAcad Sci USA 109 E1277ndashE1286

74 EnglandTE and UhlenbeckOC (1978) Enzymaticoligoribonucleotide synthesis with T4 RNA ligase Biochemistry17 2069ndash2076

75 SorefanK PaisH HallAE KozomaraA Griffiths-JonesSMoultonV and DalmayT (2012) Reducing ligation bias of smallRNAs in libraries for next generation sequencing Silence 3 4

76 ZhelkovskyAM and McReynoldsLA (2011) Simple andefficient synthesis of 50 pre-adenylated DNA using thermostableRNA ligase Nucleic Acids Res 39 e117

77 McCoyMI and GumportRI (1980) T4 ribonucleic acid ligasejoins single-strand oligo(deoxyribonucleotides) Biochemistry 19635ndash642

12 Nucleic Acids Research 2013

78 AlonS VigneaultF EminagaS ChristodoulouDCSeidmanJG ChurchGM and EisenbergE (2011) Barcodingbias in high-throughput multiplex sequencing of miRNA GenomeRes 21 1506ndash1511

79 SunG WuX WangJ LiH LiX GaoH RossiJ andYenY (2011) A bias-reducing strategy in profiling small RNAsusing Solexa RNA 17 2256ndash2262

80 Van NieuwerburghF SoetaertS PodshivalovaK Ay-LinWangE SchafferL DeforceD SalomonDR HeadSR andOrdoukhanianP (2011) Quantitative bias in Illumina TruSeq anda novel post amplification barcoding strategy for multiplexedDNA and small RNA deep sequencing PLoS One 6 e26969

81 VigneaultF Ter-OvanesyanD AlonS EminagaSC ChristodoulouD SeidmanJG EisenbergE and MChurchG (2012) High-throughput multiplex sequencingof miRNA Curr Protoc Hum Genet Chapter 11 Unit 1112 11-10

82 PakJ and FireA (2007) Distinct populations of primary andsecondary effectors during RNAi in C elegans Science 315241ndash244

83 ZhuYY MachlederEM ChenchikA LiR and SiebertPD(2001) Reverse transcriptase template switching a SMARTapproach for full-length cDNA library constructionBiotechniques 30 892ndash897

84 LevinJZ YassourM AdiconisX NusbaumCThompsonDA FriedmanN GnirkeA and RegevA (2010)Comprehensive comparative analysis of strand-specific RNAsequencing methods Nat Methods 7 709ndash715

85 AndoY BurroughsAM KawanoM de HoonMJ andHayashizakiY (2012) Targeted methods to improve small RNAprofiles generated by deep sequencing In MallickB andGhoshZ (eds) Regulatory RNAs Basics Methods andApplications Springer-Verlag Heidelberg Germany pp 253ndash271

86 KawanoM KawazuC LizioM KawajiH CarninciPSuzukiH and HayashizakiY (2010) Reduction of non-insertsequence reads by dimer eliminator LNA oligonucleotide forsmall RNA deep sequencing Biotechniques 49 751ndash755

87 KapranovP OzsolakF and MilosPM (2012) Profiling of shortRNAs using Helicos single-molecule sequencing Methods MolBiol 822 219ndash232

88 KanagawaT (2003) Bias and artifacts in multitemplatepolymerase chain reactions (PCR) J Biosci Bioeng 96 317ndash323

89 DayDJ SpeiserPW SchulzeE BettendorfM FitnessJBaranyF and WhitePC (1996) Identification of non-amplifyingCYP21 genes when using PCR-based diagnosis of 21-hydroxylasedeficiency in congenital adrenal hyperplasia (CAH) affectedpedigrees Hum Mol Genet 5 2039ndash2048

90 AirdD RossMG ChenWS DanielssonM FennellTRussC JaffeDB NusbaumC and GnirkeA (2011) Analyzingand minimizing PCR amplification bias in Illumina sequencinglibraries Genome Biol 12 R18

91 UllahMF and AatifM (2009) The footprints of cancerdevelopment cancer biomarkers Cancer Treat Rev 35 193ndash200

92 MoMH ChenL FuY WangW and FuSW (2012) Cell-freeCirculating miRNA Biomarkers in Cancer J Cancer 3 432ndash448

93 ZenK and ZhangCY (2012) Circulating microRNAs a novelclass of biomarkers to diagnose and monitor human cancersMed Res Rev 32 326ndash348

94 KimT and ReitmairA (2013) Non-Coding RNAs FunctionalAspects and Diagnostic Utility in Oncology Int J Mol Sci 144934ndash4968

95 PatnaikSK MallickR and YendamuriS (2010) Detection ofmicroRNAs in dried serum blots Anal Biochem 407 147ndash149

96 MitchellPS ParkinRK KrohEM FritzBR WymanSKPogosova-AgadjanyanEL PetersonA NoteboomJOrsquoBriantKC AllenA et al (2008) Circulating microRNAs as

stable blood-based markers for cancer detection Proc NatlAcad Sci USA 105 10513ndash10518

97 CieslaM SkrzypekK KozakowskaM LobodaAJozkowiczA and DulakJ (2011) MicroRNAs as biomarkers ofdisease onset Anal Bioanal Chem 401 2051ndash2061

98 OzsolakF (2012) Third-generation sequencing techniques andapplications to drug discovery Expert Opin Drug Discov 7231ndash243

99 TsongalisGJ CalinG CordelierP CroceC MonzonF andSzafranska-SchwarzbachAE (2013) MicroRNA analysis is itready for prime time Clin Chem 59 343ndash347

100 AmiraN Cancel-TassinG BernardiniS Cochand-PriolletBBittardH ManginP FournierG LatilA and CussenotO(2004) Expression in bladder transitional cell carcinoma by real-time quantitative reverse transcription polymerase chain reactionarray of 65 genes at the tumor suppressor locus 9q341-2identification of 5 candidates tumor suppressor genes Int JCancer 111 539ndash542

101 ZhangX (2007) Biomarker validation movement towardspersonalized medicine Expert Rev Mol Diagn 7 469ndash471

102 SchmittgenTD LeeEJ and JiangJ (2008) High-throughputreal-time PCR Methods Mol Biol 429 89ndash98

103 BenesV and CastoldiM (2010) Expression profiling ofmicroRNA using real-time quantitative PCR how to use it andwhat is available Methods 50 244ndash249

104 BustinSA and NolanT (2004) Pitfalls of quantitative real-timereverse-transcription polymerase chain reaction J Biomol Tech15 155ndash166

105 BeitzingerM and MeisterG (2011) Experimental identificationof microRNA targets by immunoprecipitation of Argonauteprotein complexes Methods Mol Biol 732 153ndash167

106 FiedlerSD CarlettiMZ and ChristensonLK (2010)Quantitative RT-PCR methods for mature microRNA expressionanalysis Methods Mol Biol 630 49ndash64

107 ArefianE KianiJ SoleimaniM ShariatiSA Aghaee-BakhtiariSH AtashiA GheisariY AhmadbeigiN Banaei-MoghaddamAM NaderiM et al (2011) Analysis ofmicroRNA signatures using size-coded ligation-mediated PCRNucleic Acids Res 39 e80

108 JonstrupSP KochJ and KjemsJ (2006) A microRNAdetection system based on padlock probes and rolling circleamplification RNA 12 1747ndash1752

109 LatorraD ArarK and HurleyJM (2003) Designconsiderations and effects of LNA in PCR primers Mol CellProbes 17 253ndash259

110 BustinSA BenesV GarsonJA HellemansJ HuggettJKubistaM MuellerR NolanT PfafflMW ShipleyGLet al (2009) The MIQE guidelines minimum information forpublication of quantitative real-time PCR experiments ClinChem 55 611ndash622

111 TrapnellC WilliamsBA PerteaG MortazaviA KwanGvan BarenMJ SalzbergSL WoldBJ and PachterL (2010)Transcript assembly and quantification by RNA-Seq revealsunannotated transcripts and isoform switching during celldifferentiation Nat Biotechnol 28 511ndash515

112 TrapnellC PachterL and SalzbergSL (2009) TopHatdiscovering splice junctions with RNA-Seq Bioinformatics 251105ndash1111

113 ListerR OrsquoMalleyRC Tonti-FilippiniJ GregoryBDBerryCC MillarAH and EckerJR (2008) Highly integratedsingle-base resolution maps of the epigenome in ArabidopsisCell 133 523ndash536

114 PolidorosAN PasentsisK and TsaftarisAS (2006) Rollingcircle amplification-RACE a method for simultaneous isolationof 50 and 30 cDNA ends from amplified cDNA templatesBiotechniques 41 35ndash42

Nucleic Acids Research 2013 13

Page 12: SURVEY AND SUMMARY Biases in small RNA deep sequencing data

35 WickensM and KwakJE (2008) Molecular biology A tail talefor U Science 319 1344ndash1345

36 MunafoDB and RobbGB (2010) Optimization of enzymaticreaction conditions for generating representative pools of cDNAfrom small RNA RNA 16 2537ndash2552

37 SarkarN (1997) Polyadenylation of mRNA in prokaryotesAnnu Rev Biochem 66 173ndash197

38 Yehudai-ResheffS and SchusterG (2000) Characterization of theEcoli poly(A) polymerase nucleotide specificity RNA-bindingaffinities and RNA structure dependence Nucleic Acids Res 281139ndash1144

39 RaabeCA HoeCH RandauG BrosiusJ TangTH andRozhdestvenskyTS (2011) The rocks and shallows of deep RNAsequencing Examples in the Vibrio cholerae RNome RNA 171357ndash1366

40 RaabeCA SanchezCP RandauG RobeckT SkryabinBVChinniSV KubeM ReinhardtR NgGH ManickamRet al (2010) A global view of the nonprotein-codingtranscriptome in Plasmodium falciparum Nucleic Acids Res 38608ndash617

41 SharmaCM HoffmannS DarfeuilleF ReignierJ FindeissSSittkaA ChabasS ReicheK HackermullerJ ReinhardtRet al (2010) The primary transcriptome of the major humanpathogen Helicobacter pylori Nature 464 250ndash255

42 SharmaCM and VogelJ (2009) Experimental approaches forthe discovery and characterization of regulatory small RNACurr Opin Microbiol 12 536ndash546

43 YuB YangZ LiJ MinakhinaS YangM PadgettRWStewardR and ChenX (2005) Methylation as a crucial step inplant microRNA biogenesis Science 307 932ndash935

44 HouwingS KammingaLM BerezikovE CronemboldDGirardA van den ElstH FilippovDV BlaserH RazEMoensCB et al (2007) A role for Piwi and piRNAs in germcell maintenance and transposon silencing in Zebrafish Cell 12969ndash82

45 AckermannHW (2003) Bacteriophage observations andevolution Res Microbiol 154 245ndash251

46 RossmannMG MesyanzhinovVV ArisakaF andLeimanPG (2004) The bacteriophage T4 DNA injectionmachine Curr Opin Struct Biol 14 171ndash180

47 HoCK and ShumanS (2002) Bacteriophage T4 RNA ligase 2(gp241) exemplifies a family of RNA ligases found in allphylogenetic domains Proc Natl Acad Sci USA 9912709ndash12714

48 YinS HoCK and ShumanS (2003) Structure-function analysisof T4 RNA ligase 2 J Biol Chem 278 17601ndash17608

49 AravinA and TuschlT (2005) Identification and characterizationof small RNAs involved in RNA silencing FEBS Lett 5795830ndash5840

50 ViolletS FuchsRT MunafoDB ZhuangF and RobbGB(2011) T4 RNA ligase 2 truncated active site mutants improvedtools for RNA analysis BMC Biotechnol 11 72

51 UhlenbeckOC and GumportRI (1982) T4 RNA LigaseIn BoyerPD (ed) The Enzymes Nucleic Acids Part BAcademic Press Inc New York pp 31ndash58

52 HoCK WangLK LimaCD and ShumanS (2004) Structureand mechanism of RNA ligase Structure 12 327ndash339

53 El OmariK RenJ BirdLE BonaMK KlarmannGLeGriceSF and StammersDK (2006) Molecular architectureand ligand recognition determinants for T4 RNA ligase J BiolChem 281 1573ndash1579

54 StarkMR PleissJA DerasM ScaringeSA and RaderSD(2006) An RNA ligase-mediated method for the efficient creationof large synthetic RNAs RNA 12 2014ndash2019

55 CranstonJW SilberR MalathiVG and HurwitzJ (1974)Studies on ribonucleic acid ligase Characterization of anadenosine triphosphate-inorganic pyrophosphate exchangereaction and demonstration of an enzyme-adenylate complex withT4 bacteriophage-induced enzyme J Biol Chem 2497447ndash7456

56 HigginsNP GeballeAP SnopekTJ SuginoA andCozzarelliNR (1977) Bacteriophage T4 RNA ligase preparationof a physically homogeneous nuclease-free enzyme fromhyperproducing infected cells Nucleic Acids Res 4 3175ndash3186

57 PfefferS Lagos-QuintanaM and TuschlT (2005) Cloning ofsmall RNA molecules Curr Protoc Mol Biol Chapter 26 Unit26 24

58 BruceAG and UhlenbeckOC (1978) Reactions at the terminiof tRNA with T4 RNA ligase Nucleic Acids Res 5 3665ndash3677

59 VigneaultF SismourAM and ChurchGM (2008) EfficientmicroRNA capture and bar-coding via enzymatic oligonucleotideadenylation Nat Methods 5 777ndash779

60 LiuJM LivnyJ LawrenceMS KimballMD WaldorMKand CamilliA (2009) Experimental discovery of sRNAs in Vibriocholerae by direct cloning 5StRNA depletion and parallelsequencing Nucleic Acids Res 37 e46

61 KrugM and UhlenbeckOC (1982) Reversal of T4 RNA ligaseBiochemistry 21 1858ndash1864

62 HafnerM LandgrafP LudwigJ RiceA OjoT LinCHolochD LimC and TuschlT (2008) Identification ofmicroRNAs and other small regulatory RNAs using cDNAlibrary sequencing Methods 44 3ndash12

63 JayaprakashAD JabadoO BrownBD and SachidanandamR(2011) Identification and remediation of biases in the activity ofRNA ligases in small-RNA deep sequencing Nucleic Acids Res39 e141

64 YewCW and KumarSV (2012) Isolation and cloning ofmicroRNAs from recalcitrant plant tissues with small amounts oftotal RNA a step-by step approach Mol Biol Rep 391783ndash1790

65 KurthHM and MochizukiK (2009) 2rsquo-O-methylation stabilizesPiwi-associated small RNAs and ensures DNA elimination inTetrahymena RNA 15 675ndash685

66 EbhardtHA ThiEP WangMB and UnrauPJ (2005)Extensive 30 modification of plant small RNAs is modulated byhelper component-proteinase expression Proc Natl Acad SciUSA 102 13398ndash13403

67 WangJ CzechB CrunkA WallaceA MitrevaMHannonGJ and DavisRE (2011) Deep small RNA sequencingfrom the nematode Ascaris reveals conservation functionaldiversification and novel developmental profiles Genome Res 211462ndash1477

68 ZhuangF FuchsRT and RobbGB (2012) Small RNAexpression profiling by high-throughput sequencing implicationsof enzymatic manipulation J Nucleic Acids 2012 360358

69 FujimuraT and EstebanR (2012) Cap snatching of yeast L-Adouble-stranded RNA virus can operate in trans and requiresviral polymerase actively engaging in transcription J Biol Chem287 12797ndash12804

70 BessmanMJ WalshJD DunnCA SwaminathanJWeldonJE and ShenJ (2001) The gene ygdP associated withthe invasiveness of Escherichia coli K1 designates a Nudixhydrolase Orf176 active on adenosine (50)-pentaphospho-(50)-adenosine (Ap5A) J Biol Chem 276 37834ndash37838

71 DeanaA CelesnikH and BelascoJG (2008) The bacterialenzyme RppH triggers messenger RNA degradation by50 pyrophosphate removal Nature 451 355ndash358

72 BlewettN CollerJ and GoldstrohmA (2011) A quantitativeassay for measuring mRNA decapping by splinted ligation reversetranscription polymerase chain reaction qSL-RT-PCR RNA 17535ndash543

73 KrogerC DillonSC CameronAD PapenfortKSivasankaranSK HokampK ChaoY SittkaA HebrardMHandlerK et al (2012) The transcriptional landscape and smallRNAs of Salmonella enterica serovar Typhimurium Proc NatlAcad Sci USA 109 E1277ndashE1286

74 EnglandTE and UhlenbeckOC (1978) Enzymaticoligoribonucleotide synthesis with T4 RNA ligase Biochemistry17 2069ndash2076

75 SorefanK PaisH HallAE KozomaraA Griffiths-JonesSMoultonV and DalmayT (2012) Reducing ligation bias of smallRNAs in libraries for next generation sequencing Silence 3 4

76 ZhelkovskyAM and McReynoldsLA (2011) Simple andefficient synthesis of 50 pre-adenylated DNA using thermostableRNA ligase Nucleic Acids Res 39 e117

77 McCoyMI and GumportRI (1980) T4 ribonucleic acid ligasejoins single-strand oligo(deoxyribonucleotides) Biochemistry 19635ndash642

12 Nucleic Acids Research 2013

78 AlonS VigneaultF EminagaS ChristodoulouDCSeidmanJG ChurchGM and EisenbergE (2011) Barcodingbias in high-throughput multiplex sequencing of miRNA GenomeRes 21 1506ndash1511

79 SunG WuX WangJ LiH LiX GaoH RossiJ andYenY (2011) A bias-reducing strategy in profiling small RNAsusing Solexa RNA 17 2256ndash2262

80 Van NieuwerburghF SoetaertS PodshivalovaK Ay-LinWangE SchafferL DeforceD SalomonDR HeadSR andOrdoukhanianP (2011) Quantitative bias in Illumina TruSeq anda novel post amplification barcoding strategy for multiplexedDNA and small RNA deep sequencing PLoS One 6 e26969

81 VigneaultF Ter-OvanesyanD AlonS EminagaSC ChristodoulouD SeidmanJG EisenbergE and MChurchG (2012) High-throughput multiplex sequencingof miRNA Curr Protoc Hum Genet Chapter 11 Unit 1112 11-10

82 PakJ and FireA (2007) Distinct populations of primary andsecondary effectors during RNAi in C elegans Science 315241ndash244

83 ZhuYY MachlederEM ChenchikA LiR and SiebertPD(2001) Reverse transcriptase template switching a SMARTapproach for full-length cDNA library constructionBiotechniques 30 892ndash897

84 LevinJZ YassourM AdiconisX NusbaumCThompsonDA FriedmanN GnirkeA and RegevA (2010)Comprehensive comparative analysis of strand-specific RNAsequencing methods Nat Methods 7 709ndash715

85 AndoY BurroughsAM KawanoM de HoonMJ andHayashizakiY (2012) Targeted methods to improve small RNAprofiles generated by deep sequencing In MallickB andGhoshZ (eds) Regulatory RNAs Basics Methods andApplications Springer-Verlag Heidelberg Germany pp 253ndash271

86 KawanoM KawazuC LizioM KawajiH CarninciPSuzukiH and HayashizakiY (2010) Reduction of non-insertsequence reads by dimer eliminator LNA oligonucleotide forsmall RNA deep sequencing Biotechniques 49 751ndash755

87 KapranovP OzsolakF and MilosPM (2012) Profiling of shortRNAs using Helicos single-molecule sequencing Methods MolBiol 822 219ndash232

88 KanagawaT (2003) Bias and artifacts in multitemplatepolymerase chain reactions (PCR) J Biosci Bioeng 96 317ndash323

89 DayDJ SpeiserPW SchulzeE BettendorfM FitnessJBaranyF and WhitePC (1996) Identification of non-amplifyingCYP21 genes when using PCR-based diagnosis of 21-hydroxylasedeficiency in congenital adrenal hyperplasia (CAH) affectedpedigrees Hum Mol Genet 5 2039ndash2048

90 AirdD RossMG ChenWS DanielssonM FennellTRussC JaffeDB NusbaumC and GnirkeA (2011) Analyzingand minimizing PCR amplification bias in Illumina sequencinglibraries Genome Biol 12 R18

91 UllahMF and AatifM (2009) The footprints of cancerdevelopment cancer biomarkers Cancer Treat Rev 35 193ndash200

92 MoMH ChenL FuY WangW and FuSW (2012) Cell-freeCirculating miRNA Biomarkers in Cancer J Cancer 3 432ndash448

93 ZenK and ZhangCY (2012) Circulating microRNAs a novelclass of biomarkers to diagnose and monitor human cancersMed Res Rev 32 326ndash348

94 KimT and ReitmairA (2013) Non-Coding RNAs FunctionalAspects and Diagnostic Utility in Oncology Int J Mol Sci 144934ndash4968

95 PatnaikSK MallickR and YendamuriS (2010) Detection ofmicroRNAs in dried serum blots Anal Biochem 407 147ndash149

96 MitchellPS ParkinRK KrohEM FritzBR WymanSKPogosova-AgadjanyanEL PetersonA NoteboomJOrsquoBriantKC AllenA et al (2008) Circulating microRNAs as

stable blood-based markers for cancer detection Proc NatlAcad Sci USA 105 10513ndash10518

97 CieslaM SkrzypekK KozakowskaM LobodaAJozkowiczA and DulakJ (2011) MicroRNAs as biomarkers ofdisease onset Anal Bioanal Chem 401 2051ndash2061

98 OzsolakF (2012) Third-generation sequencing techniques andapplications to drug discovery Expert Opin Drug Discov 7231ndash243

99 TsongalisGJ CalinG CordelierP CroceC MonzonF andSzafranska-SchwarzbachAE (2013) MicroRNA analysis is itready for prime time Clin Chem 59 343ndash347

100 AmiraN Cancel-TassinG BernardiniS Cochand-PriolletBBittardH ManginP FournierG LatilA and CussenotO(2004) Expression in bladder transitional cell carcinoma by real-time quantitative reverse transcription polymerase chain reactionarray of 65 genes at the tumor suppressor locus 9q341-2identification of 5 candidates tumor suppressor genes Int JCancer 111 539ndash542

101 ZhangX (2007) Biomarker validation movement towardspersonalized medicine Expert Rev Mol Diagn 7 469ndash471

102 SchmittgenTD LeeEJ and JiangJ (2008) High-throughputreal-time PCR Methods Mol Biol 429 89ndash98

103 BenesV and CastoldiM (2010) Expression profiling ofmicroRNA using real-time quantitative PCR how to use it andwhat is available Methods 50 244ndash249

104 BustinSA and NolanT (2004) Pitfalls of quantitative real-timereverse-transcription polymerase chain reaction J Biomol Tech15 155ndash166

105 BeitzingerM and MeisterG (2011) Experimental identificationof microRNA targets by immunoprecipitation of Argonauteprotein complexes Methods Mol Biol 732 153ndash167

106 FiedlerSD CarlettiMZ and ChristensonLK (2010)Quantitative RT-PCR methods for mature microRNA expressionanalysis Methods Mol Biol 630 49ndash64

107 ArefianE KianiJ SoleimaniM ShariatiSA Aghaee-BakhtiariSH AtashiA GheisariY AhmadbeigiN Banaei-MoghaddamAM NaderiM et al (2011) Analysis ofmicroRNA signatures using size-coded ligation-mediated PCRNucleic Acids Res 39 e80

108 JonstrupSP KochJ and KjemsJ (2006) A microRNAdetection system based on padlock probes and rolling circleamplification RNA 12 1747ndash1752

109 LatorraD ArarK and HurleyJM (2003) Designconsiderations and effects of LNA in PCR primers Mol CellProbes 17 253ndash259

110 BustinSA BenesV GarsonJA HellemansJ HuggettJKubistaM MuellerR NolanT PfafflMW ShipleyGLet al (2009) The MIQE guidelines minimum information forpublication of quantitative real-time PCR experiments ClinChem 55 611ndash622

111 TrapnellC WilliamsBA PerteaG MortazaviA KwanGvan BarenMJ SalzbergSL WoldBJ and PachterL (2010)Transcript assembly and quantification by RNA-Seq revealsunannotated transcripts and isoform switching during celldifferentiation Nat Biotechnol 28 511ndash515

112 TrapnellC PachterL and SalzbergSL (2009) TopHatdiscovering splice junctions with RNA-Seq Bioinformatics 251105ndash1111

113 ListerR OrsquoMalleyRC Tonti-FilippiniJ GregoryBDBerryCC MillarAH and EckerJR (2008) Highly integratedsingle-base resolution maps of the epigenome in ArabidopsisCell 133 523ndash536

114 PolidorosAN PasentsisK and TsaftarisAS (2006) Rollingcircle amplification-RACE a method for simultaneous isolationof 50 and 30 cDNA ends from amplified cDNA templatesBiotechniques 41 35ndash42

Nucleic Acids Research 2013 13

Page 13: SURVEY AND SUMMARY Biases in small RNA deep sequencing data

78 AlonS VigneaultF EminagaS ChristodoulouDCSeidmanJG ChurchGM and EisenbergE (2011) Barcodingbias in high-throughput multiplex sequencing of miRNA GenomeRes 21 1506ndash1511

79 SunG WuX WangJ LiH LiX GaoH RossiJ andYenY (2011) A bias-reducing strategy in profiling small RNAsusing Solexa RNA 17 2256ndash2262

80 Van NieuwerburghF SoetaertS PodshivalovaK Ay-LinWangE SchafferL DeforceD SalomonDR HeadSR andOrdoukhanianP (2011) Quantitative bias in Illumina TruSeq anda novel post amplification barcoding strategy for multiplexedDNA and small RNA deep sequencing PLoS One 6 e26969

81 VigneaultF Ter-OvanesyanD AlonS EminagaSC ChristodoulouD SeidmanJG EisenbergE and MChurchG (2012) High-throughput multiplex sequencingof miRNA Curr Protoc Hum Genet Chapter 11 Unit 1112 11-10

82 PakJ and FireA (2007) Distinct populations of primary andsecondary effectors during RNAi in C elegans Science 315241ndash244

83 ZhuYY MachlederEM ChenchikA LiR and SiebertPD(2001) Reverse transcriptase template switching a SMARTapproach for full-length cDNA library constructionBiotechniques 30 892ndash897

84 LevinJZ YassourM AdiconisX NusbaumCThompsonDA FriedmanN GnirkeA and RegevA (2010)Comprehensive comparative analysis of strand-specific RNAsequencing methods Nat Methods 7 709ndash715

85 AndoY BurroughsAM KawanoM de HoonMJ andHayashizakiY (2012) Targeted methods to improve small RNAprofiles generated by deep sequencing In MallickB andGhoshZ (eds) Regulatory RNAs Basics Methods andApplications Springer-Verlag Heidelberg Germany pp 253ndash271

86 KawanoM KawazuC LizioM KawajiH CarninciPSuzukiH and HayashizakiY (2010) Reduction of non-insertsequence reads by dimer eliminator LNA oligonucleotide forsmall RNA deep sequencing Biotechniques 49 751ndash755

87 KapranovP OzsolakF and MilosPM (2012) Profiling of shortRNAs using Helicos single-molecule sequencing Methods MolBiol 822 219ndash232

88 KanagawaT (2003) Bias and artifacts in multitemplatepolymerase chain reactions (PCR) J Biosci Bioeng 96 317ndash323

89 DayDJ SpeiserPW SchulzeE BettendorfM FitnessJBaranyF and WhitePC (1996) Identification of non-amplifyingCYP21 genes when using PCR-based diagnosis of 21-hydroxylasedeficiency in congenital adrenal hyperplasia (CAH) affectedpedigrees Hum Mol Genet 5 2039ndash2048

90 AirdD RossMG ChenWS DanielssonM FennellTRussC JaffeDB NusbaumC and GnirkeA (2011) Analyzingand minimizing PCR amplification bias in Illumina sequencinglibraries Genome Biol 12 R18

91 UllahMF and AatifM (2009) The footprints of cancerdevelopment cancer biomarkers Cancer Treat Rev 35 193ndash200

92 MoMH ChenL FuY WangW and FuSW (2012) Cell-freeCirculating miRNA Biomarkers in Cancer J Cancer 3 432ndash448

93 ZenK and ZhangCY (2012) Circulating microRNAs a novelclass of biomarkers to diagnose and monitor human cancersMed Res Rev 32 326ndash348

94 KimT and ReitmairA (2013) Non-Coding RNAs FunctionalAspects and Diagnostic Utility in Oncology Int J Mol Sci 144934ndash4968

95 PatnaikSK MallickR and YendamuriS (2010) Detection ofmicroRNAs in dried serum blots Anal Biochem 407 147ndash149

96 MitchellPS ParkinRK KrohEM FritzBR WymanSKPogosova-AgadjanyanEL PetersonA NoteboomJOrsquoBriantKC AllenA et al (2008) Circulating microRNAs as

stable blood-based markers for cancer detection Proc NatlAcad Sci USA 105 10513ndash10518

97 CieslaM SkrzypekK KozakowskaM LobodaAJozkowiczA and DulakJ (2011) MicroRNAs as biomarkers ofdisease onset Anal Bioanal Chem 401 2051ndash2061

98 OzsolakF (2012) Third-generation sequencing techniques andapplications to drug discovery Expert Opin Drug Discov 7231ndash243

99 TsongalisGJ CalinG CordelierP CroceC MonzonF andSzafranska-SchwarzbachAE (2013) MicroRNA analysis is itready for prime time Clin Chem 59 343ndash347

100 AmiraN Cancel-TassinG BernardiniS Cochand-PriolletBBittardH ManginP FournierG LatilA and CussenotO(2004) Expression in bladder transitional cell carcinoma by real-time quantitative reverse transcription polymerase chain reactionarray of 65 genes at the tumor suppressor locus 9q341-2identification of 5 candidates tumor suppressor genes Int JCancer 111 539ndash542

101 ZhangX (2007) Biomarker validation movement towardspersonalized medicine Expert Rev Mol Diagn 7 469ndash471

102 SchmittgenTD LeeEJ and JiangJ (2008) High-throughputreal-time PCR Methods Mol Biol 429 89ndash98

103 BenesV and CastoldiM (2010) Expression profiling ofmicroRNA using real-time quantitative PCR how to use it andwhat is available Methods 50 244ndash249

104 BustinSA and NolanT (2004) Pitfalls of quantitative real-timereverse-transcription polymerase chain reaction J Biomol Tech15 155ndash166

105 BeitzingerM and MeisterG (2011) Experimental identificationof microRNA targets by immunoprecipitation of Argonauteprotein complexes Methods Mol Biol 732 153ndash167

106 FiedlerSD CarlettiMZ and ChristensonLK (2010)Quantitative RT-PCR methods for mature microRNA expressionanalysis Methods Mol Biol 630 49ndash64

107 ArefianE KianiJ SoleimaniM ShariatiSA Aghaee-BakhtiariSH AtashiA GheisariY AhmadbeigiN Banaei-MoghaddamAM NaderiM et al (2011) Analysis ofmicroRNA signatures using size-coded ligation-mediated PCRNucleic Acids Res 39 e80

108 JonstrupSP KochJ and KjemsJ (2006) A microRNAdetection system based on padlock probes and rolling circleamplification RNA 12 1747ndash1752

109 LatorraD ArarK and HurleyJM (2003) Designconsiderations and effects of LNA in PCR primers Mol CellProbes 17 253ndash259

110 BustinSA BenesV GarsonJA HellemansJ HuggettJKubistaM MuellerR NolanT PfafflMW ShipleyGLet al (2009) The MIQE guidelines minimum information forpublication of quantitative real-time PCR experiments ClinChem 55 611ndash622

111 TrapnellC WilliamsBA PerteaG MortazaviA KwanGvan BarenMJ SalzbergSL WoldBJ and PachterL (2010)Transcript assembly and quantification by RNA-Seq revealsunannotated transcripts and isoform switching during celldifferentiation Nat Biotechnol 28 511ndash515

112 TrapnellC PachterL and SalzbergSL (2009) TopHatdiscovering splice junctions with RNA-Seq Bioinformatics 251105ndash1111

113 ListerR OrsquoMalleyRC Tonti-FilippiniJ GregoryBDBerryCC MillarAH and EckerJR (2008) Highly integratedsingle-base resolution maps of the epigenome in ArabidopsisCell 133 523ndash536

114 PolidorosAN PasentsisK and TsaftarisAS (2006) Rollingcircle amplification-RACE a method for simultaneous isolationof 50 and 30 cDNA ends from amplified cDNA templatesBiotechniques 41 35ndash42

Nucleic Acids Research 2013 13