Wikipedia RAN Seq

RNA-SeqFrom Wikipedia, the free encyclopedia

RNA-seq (RNA Sequencing), also called "Whole Transcriptome Shotgun Sequencing"[1] ("WTSS"), is a technology that uses the capabilities of next-generation sequencing toreveal a snapshot of RNA presence and quantity from a genome at a given moment in

time.[2]

Contents

1 Introduction

2 Methods

2.1 RNA 'Poly(A)' Library

2.2 Small RNA/Non-coding RNA sequencing

2.3 Direct RNA Sequencing

2.4 Transcriptome Assembly

2.5 Experimental Considerations

3 Analysis

3.1 Gene expression

3.2 Single nucleotide variation discovery

3.3 Post-transcriptional SNVs

3.4 Fusion gene detection

3.5 Coexpression Networks

4 Application to Genomic Medicine

4.1 History

4.2 ENCODE and TCGA

5 External links

6 References

Introduction

The transcriptome of a cell is dynamic; it continually changes as opposed to a staticgenome. The recent developments of Next-Generation Sequencing (NGS) allow forincreased base coverage of a DNA sequence, as well as higher sample throughput. Thisfacilitates sequencing of the RNA transcripts in a cell, providing the ability to look atalternative gene spliced transcripts, post-transcriptional changes, gene fusion,

mutations/SNPs and changes in gene expression.[3] In addition to mRNA transcripts,RNA-Seq can look at different populations of RNA to include total RNA, small RNA,

RNA-Seq - Wikipedia, the free encyclopedia http://en.wikipedia.org/wiki/RNA-Seq

1 of 17 05/07/2014 06:13 PM

such as miRNA, tRNA, and ribosomal profiling.[4] RNA-Seq can also be used todetermine exon/intron boundaries and verify or amend previously annotated 5’ and 3’gene boundaries. Ongoing RNA-Seq research includes observing cellular pathway

alterations during infection,[5] and gene expression level changes in cancer studies.[6]

Prior to NGS, transcriptomics and gene expression studies were previously done withexpression microarrays, which contain thousands of DNA sequences that probe for amatch in the target sequence, making available a profile of all transcripts beingexpressed. This was later done with Serial Analysis of Gene Expression (SAGE).

One deficiency with microarrays that makes RNA-Seq more attractive has been limitedcoverage; such arrays target the identification of known common alleles that representapproximately 500,000 to 2,000,000 SNPs of the more than 10,000,000 in the

genome.[7] As such, libraries aren’t usually available to detect and evaluate rare allele

variant transcripts,[8] and the arrays are only as good as the SNP databases they’re

designed from, so they have limited application for research purposes.[9] Many cancersfor example are caused by rare <1% mutations and would go undetected. However,arrays still have a place for targeted identification of already known common allelevariants, making them ideal for regulatory body-approved diagnostics such as cysticfibrosis.

Methods

RNA 'Poly(A)' Library

See also: Polyadenylation

Creation of a sequence library can change from platform

to platform in high throughput sequencing,[10] whereeach has several kits designed to build different types oflibraries and adapting the resulting sequences to thespecific requirements of their instruments. However, dueto the nature of the template being analyzed, there arecommonalities within each technology. Frequently, inmRNA analysis the 3' polyadenylated (poly(A)) tail istargeted in order to ensure that coding RNA is separatedfrom noncoding RNA. This can be accomplished simplywith poly (T) oligos covalently attached to a givensubstrate. Presently many studies utilize magnetic beads

for this step.[1][11] The Protocol Online website[12]

provides a list of several protocols relating to mRNAisolation.

Studies including portions of the transcriptome outsidepoly(A) RNAs have shown that when using poly(T) magnetic beads, the flow-throughRNA (non-poly(A) RNA) can yield important noncoding RNA gene discovery which

would have otherwise gone unnoticed.[1] Also, since ribosomal RNA represents over90% of the RNA within a given cell, studies have shown that its removal via probe


2 of 17 05/07/2014 06:13 PM

RNA-seq mapping of shortreads in exon-exon junctions.

hybridization increases the capacity to retrieve data from the remaining portion of thetranscriptome.

The next step is reverse transcription. Due to the 5' bias of randomly primed-reverse

transcription as well as secondary structures influencing primer binding sites,[11]

hydrolysis of RNA into 200-300 nucleotides prior to reverse transcription reduces bothproblems simultaneously. However, there are trade-offs with this method wherealthough the overall body of the transcripts are efficiently converted to DNA, the 5' and3' ends are less so. Depending on the aim of the study, researchers may choose to applyor ignore this step.

Once the cDNA is synthesized it can be further fragmented to reach the desiredfragment length of the sequencing system.

Small RNA/Non-coding RNA sequencing

When sequencing RNA other than mRNA, the library preparation is modified. Thecellular RNA is selected based on the desired size range. For small RNA targets, suchas miRNA, the RNA is isolated through size selection. This can be performed with asize exclusion gel, through size selection magnetic beads, or with a commerciallydeveloped kit. Once isolated, linkers are added to the 3’ and 5’ end then purified. Thefinal step is cDNA generation through reverse transcription.

Direct RNA Sequencing

As converting RNA into cDNA using reversetranscriptase has been shown to introduce biases andartifacts that may interfere with both the proper

characterization and quantification of transcripts,[13]

single molecule Direct RNA Sequencing (DRSTM)technology was under development by Helicos (nowbankrupt). DRSTM sequences RNA molecules directly ina massively-parallel manner without RNA conversion tocDNA or other biasing sample manipulations such asligation and amplification.

Transcriptome Assembly

See also: Sequence alignment software § Short-Read Sequence Alignment

Two different assembly methods are used for producing a transcriptome from rawsequence reads: de-novo and genome-guided.

The first approach does not rely on the presence of a reference genome in order toreconstruct the nucleotide sequence. Due to the small size of the short reads de novoassembly may be difficult though some software does exist ( Velvet (algorithm), Oases(http://www.ebi.ac.uk/~zerbino/oases/), and Trinity

(http://trinityrnaseq.sourceforge.net)[14] to mention a few), as there cannot be largeoverlaps between each read needed to easily reconstruct the original sequences. The


3 of 17 05/07/2014 06:13 PM

deep coverage also makes the computing power to track all the possible alignments

prohibitive.[15] This deficit can be improved using longer sequences obtained from thesame sample using other techniques such as Sanger sequencing, and using larger readsas a "skeleton" or a "template" to help assemble reads in difficult regions (e.g. regionswith repetitive sequences).

An “easier” and relatively computationally cheaper approach is that of aligning themillions of reads to a "reference genome". There are many tools available for aligninggenomic reads to a reference genome (sequence alignment tools), however, specialattention is needed when alignment of a transcriptome to a genome, mainly whendealing with genes having intronic regions. Several software packages exist for shortread alignment, and recently specialized algorithms for transcriptome alignment have

been developed, e.g. Bowtie for RNA-seq short read alignment,[16] TopHat for aligning

reads to a reference genome to discover splice sites,[17] Cufflinks to assemble the

transcripts and compare/merge them with others,[18] or FANSe.[19] These tools can

also be combined to form a comprehensive system.[20]

Although numerous solutions to the assembly quest have been proposed, there is stilllots of room for improvement given the resulting variability of the approaches. A groupfrom the Center for Computational Biology at the East China Normal University inShanghai compared different de novo and genome-guided approaches for RNA-Seqassembly. They noted that, although most of the problems can be solved using graphtheory approaches, there is still a consistent level of variability in all of them. Somealgorithms outperformed the common standards for some species while still strugglingfor others. The authors suggest that the “most reliable” assembly could be then

obtained by combining different approaches.[21] Interestingly, these results areconsistent with NGS-genome data obtained in a recent contest called Assemblathonwhere 21 contestants analyzed sequencing data from three different vertebrates (fish,snake and bird) and handed in a total of 43 assemblies. Using a metric made of 100different measures for each assembly, the reviewers concluded that 1) assembly qualitycan vary a lot depending on which metric is used and 2) assemblies that scored well in

one species didn’t really perform well in the other species.[22]

As discussed above, sequence libraries are created by extracting mRNA using itspoly(A) tail, which is added to the mRNA molecule post-transcriptionally and thussplicing has taken place. Therefore, the created library and the short reads obtainedcannot come from intronic sequences, so library reads spanning the junction of two ormore exons will not align to the genome.

A possible method to work around this is to try to align the unaligned short reads usinga proxy genome generated with known exonic sequences. This need not cover wholeexons, only enough so that the short reads can match on both sides of the exon-exonjunction with minimum overlap. Some experimental protocols allow the production of

strand specific reads.[11]

Experimental Considerations

The information gathered when sequencing a sample's transcriptome in this way has


4 of 17 05/07/2014 06:13 PM

many of the same limitations and advantages as other RNA expression analysispipelines. The main pros and cons of this approach can be summarized as:

a) Tissue specificity: Gene expression is not uniform throughout an organism's cells, itis strongly dependent on the tissue type being measured; RNA-Seq, as any othersequencing technology that analyzes homogeneous samples, can provide a completesnapshot of all the transcripts being available at that precise moment in the cell. Thisapproach is unlikely to be biased like an oligonucleotide microarray approach thatinstead analyzes a selected number of previously defined transcripts.

b) Time dependent: During a cell's lifetime and context, its gene expression levelschange. As previously mentioned any single sequencing experiment will offerinformation regarding one point in time. Time course experiments are so far the onlysolution that would allow a complete overview of the circadian transcriptome so thatresearchers could obtain a precise description of the physiological changes happeningover time. However, this approach is unfeasible for patient samples since it is quiteimprobable that biopsies will be collected serially in short time intervals. A possiblework-around could be the use of urine, blood or saliva samples that won’t require anyinvasive procedure.

c) Coverage: coverage/depth can affect the mutations seen. Given that everything isexpression-centric, an allele might not be detected, either because it is not in thegenome, or because it is not being expressed. At the same time, RNA-seq can yieldadditional information rather than just the existence of a heterozygous gene as it canalso help in estimating the expression of each allele. In association studies, genotypesare associated to disease and expression levels can also be associated with disease.Using RNA-seq, we can measure the relationship between these two associatedvariables, that is, in what relation are each of the alleles being expressed.

The depth of sequencing required for specific applications can be extrapolated from a

pilot experiment.[23]

d) Subjectivity of the analysis: As described above, numerous attempts have been takento uniformly analyze the data. However, the results can vary due to the multitude ofalgorithms and pipelines available. Most of the approaches are correct, but have to betailored to the needs of the investigators in order to better capture the desired effect.This variability in methods, although in smaller scale, is still present in other RNAprofiling approaches where reagents, personnel and techniques can lead to similar,although statistically different, results. Because of this, care must be taken whendrawing conclusions from the sequencing experiment, as some information gatheredmight not be representative of the individual.

e) Data management: The main issue with NGS data is the volume of data produced.Microarray data occupy up to one thousand times less disk space than NGS datatherefore requiring smaller storage units. The high capacity storage units required byRNA-Seq data are, however, directly proportional to the volume of information that goeswith it. The payoff of “more complete” big scale datasets have to be evaluated prior tostarting the experiment.

f) Downstream interpretation of the data: Different layers of interpretations have to be


5 of 17 05/07/2014 06:13 PM

considered when analyzing RNA-Seq data. Biological, clinical and regulatory functionsof the results are what allow clinicians and investigators to draw meaningfulconclusions (i.e. the sequence of an RNA molecule presents, although identified withdifferent read depths, might not perfectly mirror the initial DNA sequence). An exampleof this would be during SNV discovery as the mutations discovered are more preciselythe mutations being expressed. Observing a homozygote location to a non-referenceallele in an organism does not necessarily mean that this is the individual's genotype, itcould just mean that the gene copy with the reference allele is not being expressed inthat tissue and/or at the time snapshot the sample was acquired.

Analysis

See also List of RNA-Seq bioinformatics tools

Gene expression

The characterization of gene expression in cells via measurement of mRNA levels haslong been of interest to researchers, both in terms of which genes are expressed inwhat tissues, and at what levels. Even though it has been shown that due to other posttranscriptional gene regulation events (such as RNA interference) there is notnecessarily always a strong correlation between the abundance of mRNA and the

related proteins,[24] measuring mRNA concentration levels is still a useful tool indetermining how the transcriptional machinery of the cell is affected in the presence ofexternal signals (e.g. drug treatment), or how cells differ between a healthy state and adiseased state.

Expression can be deduced via RNA-seq to the extent at which a sequence is retrieved.

Transcriptome studies in yeast [25] show that in this experimental setting, a fourfoldcoverage is required for amplicons to be classified and characterized as an expressedgene. When the transcriptome is fragmented prior to cDNA synthesis, the number ofreads corresponding to the particular exon normalized by its length in vivo yields gene

expression levels which correlate with those obtained through qPCR.[23] This isfrequently further normalized by the total number of mapped reads so that expressionlevels are expressed as Fragments Per Kilobase of transcript per Million mapped reads

(FPKM).[18]

The only way to be absolutely sure of the individual's mutations is to compare thetranscriptome sequences to the germline DNA sequence. This enables the distinction ofhomozygous genes versus skewed expression of one of the alleles and it can alsoprovide information about genes that were not expressed in the transcriptomic

experiment. An R-based statistical package known as CummeRbund[26] can be used togenerate expression comparison charts for visual analysis.

Single nucleotide variation discovery

See also: single nucleotide polymorphism


6 of 17 05/07/2014 06:13 PM

RNA-seq mapping of shortreads over exon-exonjunctions, depending onwhere each end maps to, itcould be defined a Trans or aCis event.

Transcriptome single nucleotide variation has been analyzed in maize on the Roche 454

sequencing platform.[27] Directly from the transcriptome analysis, around 7000 singlenucleotide polymorphisms (SNPs) were recognized. Following Sanger sequencevalidation, the researchers were able to conservatively obtain almost 5000 valid SNPscovering more than 2400 maize genes. RNA-seq is limited to transcribed regionshowever, since it will only discover sequence variations in exon regions. This missesmany subtle but important intron alleles that affect disease such as transcriptionregulators, leaving analysis to only large effectors. While some correlation existsbetween exon to intron variation, only whole genome sequencing would be able to

capture the source of all relevant SNPs.[28]

Post-transcriptional SNVs

Having the matching genomic and transcriptomic sequences of an individual can also

help in detecting post-transcriptional edits,[10] where, if the individual is homozygousfor a gene, but the gene's transcript has a different allele, then a post-transcriptionalmodification event is determined.

mRNA centric single nucleotide variants (SNVs) are generally not considered as arepresentative source of functional variation in cells, mainly due to the fact that thesemutations disappear with the mRNA molecule, however the fact that efficient DNAcorrection mechanisms do not apply to RNA molecules can cause them to appear more

often. This has been proposed as the source of certain prion diseases,[29] also known asTSE or transmissible spongiform encephalopathies.

Fusion gene detection

See also: Fusion gene

Caused by different structural modifications in thegenome, fusion genes have gained attention because of

their relationship with cancer.[30] The ability of RNA-seqto analyze a sample's whole transcriptome in anunbiased fashion makes it an attractive tool to find these

kinds of common events in cancer.[31]

The idea follows from the process of aligning the shorttranscriptomic reads to a reference genome. Most of theshort reads will fall within one complete exon, and asmaller but still large set would be expected to map toknown exon-exon junctions. The remaining unmappedshort reads would then be further analyzed to determinewhether they match an exon-exon junction where theexons come from different genes. This would be evidenceof a possible fusion event, however, because of thelength of the reads, this could prove to be very noisy. An alternative approach is to usepair-end reads, when a potentially large number of paired reads would map each end toa different exon, giving better coverage of these events (see figure). Nonetheless, the


7 of 17 05/07/2014 06:13 PM

end result consists of multiple and potentially novel combinations of genes providing anideal starting point for further validation.

Coexpression Networks

Coexpression networks are data-derived representations of genes behaving in a similar

way across tissues and experimental conditions.[32] Their main purpose lies inhypothesis generation and guilt-by-association approaches for inferring functions of

previously unknown genes.[32] RNASeq data has been recently used to infer genes

involved in specific pathways based on Pearson correlation, both in plants [33] and

mammals.[34] The main advantage of RNASeq data in this kind of analysis over themicroarray platforms is the capability to cover the entire transcriptome, thereforeallowing the possibility to unravel more complete representations of the generegulatory networks. Differential regulation of the splice isoforms of the same gene can

be detected and used to predict and their biological functions.[35] Weighted geneco-expression network analysis has been successfully used to identify co-expressionmodules and intramodular hub genes based on RNA seq data. Co-expression modulesmay corresponds to cell types or pathways. Highly connected intramodular hubs can beinterpreted as representatives of their respective module. Variance-StabilizingTransformation approaches for estimating correlation coefficients based on RNA seq

data have been proposed.[33]

Application to Genomic Medicine

History

The past five years have seen a flourishing of NGS-based methods for genome analysisleading to the discovery of a number of new mutations and fusion transcripts in cancer.RNA-Seq data could help researchers interpreting the “personalized transcriptome” sothat it will help understanding the transcriptomic changes happening therefore, ideally,identifying gene drivers for a disease. The feasibility of this approach is howeverdictated by the costs in term of money and time.

A basic search on PubMed reveals that the term RNA Seq, queried as “rna Seq ORRNA-Seq OR rna sequencing OR RNASeq” in order to capture the most common waysof phrasing it, gives 147.525 hits demonstrating the exponentially increasing usage rateof this technology. A few examples will be taken into consideration to explain thatRNA-Seq applications to the clinic have the potentials to significantly affect patient’slife and, on the other hand, requires a team of specialists (bioinformaticians,physicians/clinicians, basic researchers, technicians) to fully interpret the huge amountof data generated by this analysis.

As an example of excellent clinical applications, researchers at the Mayo Clinic used anRNA-Seq approach to identify differentially expressed transcripts between oral cancerand normal tissue samples. They also accurately evaluated the allelic imbalance (AI),ratio of the transcripts produced by the single alleles, within a subgroup of genes

involved in cell differentiation, adhesion, cell motility and muscle contraction[36]


8 of 17 05/07/2014 06:13 PM

identifying a unique transcriptomic and genomic signature in oral cancer patients.Novel insight on skin cancer (melanoma) also come from RNA-Seq of melanomapatients. This approach led to the identification of eleven novel gene fusion transcriptsoriginated from previously unknown chromosomal rearrangements. Twelve novelchimeric transcripts were also reported, including seven of those that confirmed

previously identified data in multiple melanoma samples.[37] Furthermore, thisapproach is not limited to cancer patients. RNA-Seq has been used to study otherimportant chronic diseases such as Alzheimer (AD) and diabetes. In the former case,Twine and colleagues compared the transcriptome of different lobes of deceased AD’spatient’s brain with the brain of healthy individuals identifying a lower number of splicevariants in AD’s patients and differential promoter usage of the APOE-001 and -002

isoforms in AD’s brains.[38] In the latter case, different groups showed the unicity of thebeta-cells transcriptome in diabetic patients in terms of transcripts accumulation and

differential promoter usage[39] and long non coding RNAs (lncRNAs) signature.[40]

Compared with microarrays, NGS technology has identified novel and low frequencyRNAs associated with disease processes. This advantage aids in the diagnosis andpossible future treatments of diseases, including cancer. For example, NGS technologyidentified several previously undocumented differentially-expressed transcripts in ratstreated with AFB1, a potent hepatocarcinogen. Nearly 50 new differentially-expressedtranscriptions were identified between the controls and AFB1-treated rats. Additionallypotential new exons were identified, including some that are responsive to AFB1. Thenext-generation sequencing pipeline identified more differential gene expressionscompared with microarrays, particularly when DESeq software was utilized. Cufflinksidentified two novel transcripts that were not previously annotated in the Ensembl

database; these transcripts were confirmed using cloning PCR.[41] Numerous otherstudies have demonstrated NGS's ability to detect aberrant mRNA and smallnon-coding RNA expression in disease processes above that provided by microarrays.The lower cost and higher throughput offered by NGS confers another advantage toresearchers.

The role of small non-coding RNAs in disease processes has also been explored inrecent years. For example, Han et al. (2011) examined microRNA expressiondifferences in bladder cancer patients in order to understand how changes anddysregulation in microRNA can influence mRNA expression and function. SeveralmicroRNAs were differentially expressed in the bladder cancer patients. Upregulationin the aberrant microRNAs was more common than downregulation in the cancerpatients. One of the upregulated microRNAs, hsa-miR-96, has been associated withcarcinogenesis, and several of the overexpressed microRNAs have also been observedin other cancers, including ovarian and cervical. Some of the downregulated

microRNAs in cancer samples were hypothesized to have inhibitory roles.[42]

ENCODE and TCGA

A lot of emphasis has been given to RNA-Seq data after the Encyclopedia of theregulatory elements (ENCODE) and The Cancer Genome Atlas (TCGA) projects have

used this approach to characterize dozens of cell lines[43] and thousands of primary

tumor samples,[44] respectively. The former aimed to identify genome-wide regulatory


9 of 17 05/07/2014 06:13 PM

regions in different cohort of cell lines and transcriptomic data are paramount in orderto understand the downstream effect of those epigenetic and genetic regulatory layers.The latter project, instead, aimed to collect and analyze thousands of patient’s samplesfrom 30 different tumor types in order to understand the underlying mechanisms ofmalignant transformation and progression. In this context RNA-Seq data provide aunique snapshot of the transcriptomic status of the disease and look at an unbiasedpopulation of transcripts that allows the identification of novel transcripts, fusiontranscripts and non-coding RNAs that could be undetected with different technologies.

External links

RNA-Seq for Everyone (http://rnaseq.uoregon.edu/index.html): a high-level guide

to designing and implementing an RNA-Seq experiment.

ChIPBase database (http://deepbase.sysu.edu.cn/chipbase/expression.php):

provides expression profiles of protein-coding genes and lncRNAs (lincRNAs) from

RNA-Seq data across 22 tissues.

Martin A. Perdacher (September 2011) Next-Generation Sequencing and its

Applications in RNA-Seq (http://aboutme.biobyte.org/wp-content/uploads/2011/10

/Next-Generation-Sequencing-and-its-Applications-in-RNA-Seq.pdf). Theory part of

the Bachelorthesis, Hagenberg.

The RNA-Seq Blog (http://www.rna-seqblog.com/)

OMICtools (http://omictools.com/mrna-seq/): a didactic directory for RNA-seq data

analysis.

References

^ a b c Ryan D. Morin, Matthew

Bainbridge, Anthony Fejes, Martin Hirst,

Martin Krzywinski, Trevor J. Pugh, Helen

McDonald, Richard Varhol, Steven J.M.

Jones, and Marco A. Marra. (2008).

"Profiling the HeLa S3 transcriptome

using randomly primed cDNA and

massively parallel short-read sequencing"

(http://www.bcgsc.ca/about/pubann

/biotechniques-publication-2008-44-8).

BioTechniques 45 (1): 81–94.

doi:10.2144/000112900 (http://dx.doi.org

/10.2144%2F000112900).

PMID 18611170

(https://www.ncbi.nlm.nih.gov/pubmed

1. /18611170).

^ Chu Y, Corey DR (August 2012). "RNA

sequencing: platform selection,

experimental design, and data

interpretation"

(https://www.ncbi.nlm.nih.gov

/pmc/articles/PMC3426205). Nucleic Acid

Ther 22 (4): 271–4.

doi:10.1089/nat.2012.0367

(http://dx.doi.org

/10.1089%2Fnat.2012.0367).

PMC 3426205


/pmc/articles/PMC3426205).

PMID 22830413

2.


10 of 17 05/07/2014 06:13 PM


/22830413).

^ Maher CA, Kumar-Sinha C, Cao X, et al.

(March 2009). "Transcriptome sequencing

to detect gene fusions in cancer"


/pmc/articles/PMC2725402). Nature 458

(7234): 97–101. doi:10.1038/nature07638

(http://dx.doi.org

/10.1038%2Fnature07638). PMC 2725402



PMID 19136943


/19136943).

3.

^ Ingolia NT, Brar GA, Rouskin S,

McGeachy AM, Weissman JS (August

2012). "The ribosome profiling strategy

for monitoring translation in vivo by deep

sequencing of ribosome-protected mRNA

fragments" (https://www.ncbi.nlm.nih.gov

/pmc/articles/PMC3535016). Nat Protoc 7

(8): 1534–50. doi:10.1038/nprot.2012.086

(http://dx.doi.org

/10.1038%2Fnprot.2012.086).

PMC 3535016



PMID 22836135


/22836135).

4.

^ Qian F, Chung L, Zheng W, et al. (2013).

"Identification of Genes Critical for

Resistance to Infection by West Nile Virus

Using RNA-Seq Analysis". Viruses 5 (7):

1664–81. doi:10.3390/v5071664

(http://dx.doi.org/10.3390%2Fv5071664).

PMID 23881275


/23881275).

5.

^ Beane J, Vick J, Schembri F, et al. (June6.

2011). "Characterizing the impact of

smoking and lung cancer on the airway

transcriptome using RNA-Seq"


/pmc/articles/PMC3694393). Cancer Prev

Res (Phila) 4 (6): 803–17.

doi:10.1158/1940-6207.CAPR-11-0212

(http://dx.doi.org

/10.1158%2F1940-6207.CAPR-11-0212).

PMC 3694393



PMID 21636547


/21636547).

^ "HapMap: About the Project"

(http://hapmap.ncbi.nlm.nih.gov

/abouthapmap.html). Retrieved

2013-07-28.

7.

^ Marioni JC, Mason CE, Mane SM,

Stephens M, Gilad Y (September 2008).

"RNA-seq: an assessment of technical

reproducibility and comparison with gene

expression arrays"


/pmc/articles/PMC2527709). Genome Res.

18 (9): 1509–17.

doi:10.1101/gr.079558.108

(http://dx.doi.org

/10.1101%2Fgr.079558.108).

PMC 2527709



PMID 18550803


/18550803).

8.

^ Siu H, Zhu Y, Jin L, Xiong M (2011).

"Implication of next-generation

sequencing on association studies"


/pmc/articles/PMC3148210). BMC

9.


11 of 17 05/07/2014 06:13 PM

Genomics 12: 322.

doi:10.1186/1471-2164-12-322

(http://dx.doi.org

/10.1186%2F1471-2164-12-322).

PMC 3148210



PMID 21682891


/21682891).

^ a b Wang Z, Gerstein M, Snyder M.

(January 2009). "RNA-Seq: a revolutionary

tool for transcriptomics"

(http://www.nature.com/nrg/journal

/v10/n1/abs/nrg2484.html). Nature

Reviews Genetics 10 (1): 57–63.

doi:10.1038/nrg2484 (http://dx.doi.org

/10.1038%2Fnrg2484). PMC 2949280



PMID 19015660


/19015660).

10.

^ a b c Mortazavi A, Williams BA, McCue

K, Schaeffer L, Wold B. (2008). "Mapping

and quantifying mammalian

transcriptomes by RNA-seq"

(http://www.nature.com/nmeth/journal

/v5/n7/abs/nmeth.1226.html). Nature

Methods 5 (7): 621–628.

doi:10.1038/nmeth.1226 (http://dx.doi.org

/10.1038%2Fnmeth.1226).

PMID 18516045


/18516045).

11.

^ http://www.protocol-online.org

/prot/Molecular_Biology

/RNA/RNA_Extraction/mRNA_Isolation

/index.html

12.

^ Liu D, Graber JH (2006). "Quantitative

comparison of EST libraries requires

13.

compensation for systematic biases in

cDNA generation"


/pmc/articles/PMC1431573). BMC

Bioinformatics 7: 77.

doi:10.1186/1471-2105-7-77

(http://dx.doi.org

/10.1186%2F1471-2105-7-77).

PMC 1431573



PMID 16503995


/16503995).

^ Grabherr MG, Haas BJ, Yassour M, et al.

(July 2011). "Full-length transcriptome

assembly from RNA-Seq data without a

reference genome"


/pmc/articles/PMC3571712). Nat.

Biotechnol. 29 (7): 644–52.

doi:10.1038/nbt.1883 (http://dx.doi.org

/10.1038%2Fnbt.1883). PMC 3571712



PMID 21572440


/21572440).

14.

^ Zerbino DR, Birney E (2008). "Velvet:

Algorithms for de novo short read

assembly using de Bruijn graphs"

(http://genome.cshlp.org/content

/18/5/821.full). Genome Research 18 (5):

821–829. doi:10.1101/gr.074492.107

(http://dx.doi.org

/10.1101%2Fgr.074492.107).

PMC 2336801



PMID 18349386


15.


12 of 17 05/07/2014 06:13 PM

/18349386).

^ Langmead B, Trapnell C, Pop M,

Salzberg SL (2009). "Ultrafast and

memory-efficient alignment of short DNA

sequences to the human genome"


/pmc/articles/PMC2690996). Genome Biol.

10 (3): R25. doi:10.1186/gb-2009-10-3-r25

(http://dx.doi.org/10.1186%2Fgb-

2009-10-3-r25). PMC 2690996



PMID 19261174


/19261174).

16.

^ Cole Trapnell, Lior Pachter and Steven

Salzberg (2009). "TopHat: discovering

splice junctions with RNA-Seq"

(http://bioinformatics.oxfordjournals.org

/cgi/content/abstract/25/9/1105?etoc).

Bioinformatics 25 (9): 1105–1111.

doi:10.1093/bioinformatics/btp120

(http://dx.doi.org

/10.1093%2Fbioinformatics%2Fbtp120).

PMC 2672628



PMID 19289445


/19289445).

17.

^ a b Trapnell, Cole; Williams, Brian A;

Pertea, Geo; Mortazavi, Ali; Kwan,

Gordon; van Baren, Marijke J; Salzberg,

Steven L; Wold, Barbara J; Pachter, Lior

(May 2010). "Transcript assembly and

quantification by RNA-Seq reveals

unannotated transcripts and isoform

switching during cell differentiation"


/pmc/articles/PMC3146043). Nat

Biotechnol 28 (5): 511–515.

18.

doi:10.1038/nbt.1621 (http://dx.doi.org

/10.1038%2Fnbt.1621). PMC 3146043



PMID 20436464


/20436464).

^ "FANSe: introduction"

(http://bioinformatics.jnu.edu.cn/software

/fanse). Retrieved 2013-07-28.

19.

^ Trapnell C, Roberts A, Goff L, et al.

(March 2012). "Differential gene and

transcript expression analysis of RNA-seq

experiments with TopHat and Cufflinks"


/pmc/articles/PMC3334321). Nat Protoc 7

(3): 562–78. doi:10.1038/nprot.2012.016

(http://dx.doi.org

/10.1038%2Fnprot.2012.016).

PMC 3334321



PMID 22383036


/22383036).

20.

^ Lu B, Zeng Z, Shi T (February 2013).

"Comparative study of de novo assembly

and genome-guided assembly strategies

for transcriptome reconstruction based on

RNA-Seq". Science China Life Sciences 56

(2): 143–55.

doi:10.1007/s11427-013-4442-z

(http://dx.doi.org

/10.1007%2Fs11427-013-4442-z).

PMID 23393030


/23393030).

21.

^ Bradnam KR, Fass JN, Alexandrov A, et

al. (July 2013). "Assemblathon 2:

evaluating de novo methods of genome

assembly in three vertebrate species".

22.


13 of 17 05/07/2014 06:13 PM

Gigascience 2 (1): 10.

doi:10.1186/2047-217X-2-10

(http://dx.doi.org/10.1186%2F2047-217X-

2-10). PMID 23870653


/23870653).

^ a b Li H, Lovci MT, Kwon YS, Rosenfeld

MG, Fu XD, Yeo GW (2008).

"Determination of tag density required for

digital transcriptome analysis: Application

to an androgen-sensitive prostate cancer

model" (http://www.pnas.org/content

/105/51/20179.long). Proc Natl Acad Sci

USA 105 (51): 20179–84.

doi:10.1073/pnas.0807121105

(http://dx.doi.org

/10.1073%2Fpnas.0807121105).

PMC 2603435



PMID 19088194


/19088194).

23.

^ Greenbaum D, Colangelo C, Williams K,

Gerstein M. (2003). "Comparing protein

abundance and mRNA expression levels

on a genomic scale"

(http://genomebiology.com/2003/4/9/117).

Genome Biology 4 (9): 117.

doi:10.1186/gb-2003-4-9-117

(http://dx.doi.org/10.1186%2Fgb-

2003-4-9-117). PMC 193646



PMID 12952525


/12952525).

24.

^ Nagalakshmi U, Wang Z, Waern K, Shou

C, Raha D, Gerstein M, Snyder M (2008).

"The Transcriptional Landscape of the

Yeast Genome Defined by RNA

25.

Sequencing" (http://www.sciencemag.org

/cgi/content/abstract/320/5881/1344).

Science 320 (5881): 1344–1349.

doi:10.1126/science.1158441

(http://dx.doi.org

/10.1126%2Fscience.1158441).

PMC 2951732



PMID 18451266


/18451266).

^ "CummeRbund - An R package for

persistent storage, analysis, and

visualization of RNA-Seq from cufflinks

output" (http://compbio.mit.edu

/cummeRbund). Retrieved 2013-07-28.

26.

^ Barbazuk WB, Emrich SJ, Chen HD, Li

L, Schnable PS (2007). "SNP discovery via

454 transcriptome sequencing"

(http://www3.interscience.wiley.com

/journal/118488674/abstract). The Plant

Journal 51 (5): 910–918.

doi:10.1111/j.1365-313X.2007.03193.x

(http://dx.doi.org

/10.1111%2Fj.1365-313X.2007.03193.x).

PMC 2169515



PMID 17662031


/17662031).

27.

^ Lalonde E, Ha KC, Wang Z, et al. (April

2011). "RNA sequencing reveals the role

of splicing polymorphisms in regulating

human gene expression"



21 (4): 545–54.

doi:10.1101/gr.111211.110

(http://dx.doi.org

28.


14 of 17 05/07/2014 06:13 PM

/10.1101%2Fgr.111211.110).

PMC 3065702



PMID 21173033


/21173033).

^ Garcion E, Wallace B, Pelletier L, Wion

D. (2004). "RNA mutagenesis and

sporadic prion diseases". Journal of

Theoretical Biology 230 (2): 271–274.

doi:10.1016/j.jtbi.2004.05.014

(http://dx.doi.org

/10.1016%2Fj.jtbi.2004.05.014).

PMID 15302558


/15302558).

29.

^ Teixeira MR (2006). "Recurrent fusion

oncogenes in carcinomas"

(http://www.begellhouse.com/journals

/439f422d0783386a,1371844864dc630c,7

499e7bf0e7ad511.html). Ciritical Reviews

in Oncogenesis 12 (3–4): 257–271.

PMID 17425505


/17425505).

30.

^ Maher CA, Kumar-Sinha C, Cao X,

Kalyana-Sundaram S, Han B, Jing X, Sam

L, Barrette T, Palanisamy N, Chinnaiyan

AM (January 2009). "Transcriptome

Sequencing to Detect Gene Fusions in

Cancer" (http://www.nature.com/nature

/journal/vaop/ncurrent

/abs/nature07638.html). Nature 458

(7234): 97–101. doi:10.1038/nature07638

(http://dx.doi.org

/10.1038%2Fnature07638). PMC 2725402



PMID 19136943


31.

/19136943).

^ a b Marcotte, EM.; Pellegrini, M.;

Thompson, MJ.; Yeates, TO.; Eisenberg, D.

(Nov 1999). "A combined algorithm for

genome-wide prediction of protein

function.". Nature 402 (6757): 83–6.

doi:10.1038/47048 (http://dx.doi.org

/10.1038%2F47048). PMID 10573421


/10573421).

32.

^ a b Giorgi Federico Manuel (2013).

"Comparative study of RNA-seq- and

Microarray-derived coexpression

networks in Arabidopsis thaliana"


/content/29/6/717.short). Bioinformatics

29 (6): 717–724.

doi:10.1093/bioinformatics/btt053

(http://dx.doi.org

/10.1093%2Fbioinformatics%2Fbtt053).

PMID 23376351


/23376351).

33.

^ Iancu Ovidiu D (2012). "Utilizing

RNA-Seq data for de novo coexpression

network inference"


/content/28/12/1592.short). Bioinformatics

28 (12): 1592–1597.

doi:10.1093/bioinformatics/bts245

(http://dx.doi.org

/10.1093%2Fbioinformatics%2Fbts245).

PMID 22556371


/22556371).

34.

^ Eksi, R; Li, HD; Menon, R; Wen, Y;

Omenn, GS; Kretzler, M; Guan, Y (Nov

2013). "Systematically differentiating

functions for alternatively spliced

isoforms through integrating RNA-seq

data.". PLoS computational biology 9 (11):

35.


15 of 17 05/07/2014 06:13 PM

e1003314. PMID 24244129


/24244129).

^ Tuch BB, Laborde RR, Xu X, et al.

(2010). "Tumor transcriptome sequencing

reveals allelic expression imbalances

associated with copy number alterations"


/pmc/articles/PMC2824832). PLoS ONE 5

(2): e9317.

doi:10.1371/journal.pone.0009317

(http://dx.doi.org

/10.1371%2Fjournal.pone.0009317).

PMC 2824832



PMID 20174472


/20174472).

36.

^ Berger MF, Levin JZ, Vijayendran K, et

al. (April 2010). "Integrative analysis of

the melanoma transcriptome"



20 (4): 413–27.

doi:10.1101/gr.103697.109

(http://dx.doi.org

/10.1101%2Fgr.103697.109).

PMC 2847744



PMID 20179022


/20179022).

37.

^ Twine NA, Janitz K, Wilkins MR, Janitz

M (2011). "Whole transcriptome

sequencing reveals gene expression and

splicing differences in brain regions

affected by Alzheimer's disease"


/pmc/articles/PMC3025006). PLoS ONE 6

38.

(1): e16266.

doi:10.1371/journal.pone.0016266

(http://dx.doi.org

/10.1371%2Fjournal.pone.0016266).

PMC 3025006



PMID 21283692


/21283692).

^ Ku GM, Kim H, Vaughn IW, et al.

(October 2012). "Research resource:

RNA-Seq reveals unique features of the

pancreatic β-cell transcriptome". Mol.

Endocrinol. 26 (10): 1783–92.

doi:10.1210/me.2012-1176

(http://dx.doi.org

/10.1210%2Fme.2012-1176).

PMID 22915829


/22915829).

39.

^ Morán I, Akerman I, van de Bunt M, et

al. (October 2012). "Human β cell

transcriptome analysis uncovers lncRNAs

that are tissue-specific, dynamically

regulated, and abnormally expressed in

type 2 diabetes". Cell Metab. 16 (4):

435–48. doi:10.1016/j.cmet.2012.08.010

(http://dx.doi.org

/10.1016%2Fj.cmet.2012.08.010).

PMID 23040067


/23040067).

40.

^ Merrick, B. A., Phadke, D. P., Auerbach,

S. S., Mav, D., Stiegelmeyer, S. M., Shah,

R. R., & Tice, R. R. (2013). RNA-seq

reveals novel hepatic gene expression

pattern in Aflatoxin B1 treated rats. PLoS

ONE, 8, e61768.

41.

^ Han, y., Chen, J., Zhao, X., Liang, C.,

Wang, Y., Sun, L., Jiang, Z., Zhang, Z.,

42.


16 of 17 05/07/2014 06:13 PM

Yang, R., Chen, J., Li, Z., Tang, A., Li, X.,

Ye, J., Guan, Z., Gui, Y., & Cai, Z. (2011).

MicroRNA expression signatures of

bladder cancer revealed by deep

sequencing. PLos ONE, 6, e18286.

^ "ENCODE Data Matrix"

(http://genome.ucsc.edu/ENCODE

43.

/dataMatrix

/encodeDataMatrixHuman.html).

Retrieved 2013-07-28.

^ "The Cancer Genome Atlas - Data

Portal" (https://tcga-data.nci.nih.gov

/tcga/tcgaHome2.jsp). Retrieved

2013-07-28.

44.

Retrieved from "http://en.wikipedia.org/w/index.php?title=RNA-Seq&oldid=604992202"Categories: Molecular biology RNA Gene expression

This page was last modified on 20 April 2014 at 08:49.Text is available under the Creative Commons Attribution-ShareAlike License;additional terms may apply. By using this site, you agree to the Terms of Use andPrivacy Policy. Wikipedia® is a registered trademark of the Wikimedia Foundation,Inc., a non-profit organization.


17 of 17 05/07/2014 06:13 PM

Documents

Wikipedia RAN Seq