Click here to load reader
Upload
jubatuslibro
View
213
Download
1
Embed Size (px)
Citation preview
RNA-SeqFrom Wikipedia, the free encyclopedia
RNA-seq (RNA Sequencing), also called "Whole Transcriptome Shotgun Sequencing"[1] ("WTSS"), is a technology that uses the capabilities of next-generation sequencing toreveal a snapshot of RNA presence and quantity from a genome at a given moment in
time.[2]
Contents
1 Introduction
2 Methods
2.1 RNA 'Poly(A)' Library
2.2 Small RNA/Non-coding RNA sequencing
2.3 Direct RNA Sequencing
2.4 Transcriptome Assembly
2.5 Experimental Considerations
3 Analysis
3.1 Gene expression
3.2 Single nucleotide variation discovery
3.3 Post-transcriptional SNVs
3.4 Fusion gene detection
3.5 Coexpression Networks
4 Application to Genomic Medicine
4.1 History
4.2 ENCODE and TCGA
5 External links
6 References
Introduction
The transcriptome of a cell is dynamic; it continually changes as opposed to a staticgenome. The recent developments of Next-Generation Sequencing (NGS) allow forincreased base coverage of a DNA sequence, as well as higher sample throughput. Thisfacilitates sequencing of the RNA transcripts in a cell, providing the ability to look atalternative gene spliced transcripts, post-transcriptional changes, gene fusion,
mutations/SNPs and changes in gene expression.[3] In addition to mRNA transcripts,RNA-Seq can look at different populations of RNA to include total RNA, small RNA,
RNA-Seq - Wikipedia, the free encyclopedia http://en.wikipedia.org/wiki/RNA-Seq
1 of 17 05/07/2014 06:13 PM
such as miRNA, tRNA, and ribosomal profiling.[4] RNA-Seq can also be used todetermine exon/intron boundaries and verify or amend previously annotated 5’ and 3’gene boundaries. Ongoing RNA-Seq research includes observing cellular pathway
alterations during infection,[5] and gene expression level changes in cancer studies.[6]
Prior to NGS, transcriptomics and gene expression studies were previously done withexpression microarrays, which contain thousands of DNA sequences that probe for amatch in the target sequence, making available a profile of all transcripts beingexpressed. This was later done with Serial Analysis of Gene Expression (SAGE).
One deficiency with microarrays that makes RNA-Seq more attractive has been limitedcoverage; such arrays target the identification of known common alleles that representapproximately 500,000 to 2,000,000 SNPs of the more than 10,000,000 in the
genome.[7] As such, libraries aren’t usually available to detect and evaluate rare allele
variant transcripts,[8] and the arrays are only as good as the SNP databases they’re
designed from, so they have limited application for research purposes.[9] Many cancersfor example are caused by rare <1% mutations and would go undetected. However,arrays still have a place for targeted identification of already known common allelevariants, making them ideal for regulatory body-approved diagnostics such as cysticfibrosis.
Methods
RNA 'Poly(A)' Library
See also: Polyadenylation
Creation of a sequence library can change from platform
to platform in high throughput sequencing,[10] whereeach has several kits designed to build different types oflibraries and adapting the resulting sequences to thespecific requirements of their instruments. However, dueto the nature of the template being analyzed, there arecommonalities within each technology. Frequently, inmRNA analysis the 3' polyadenylated (poly(A)) tail istargeted in order to ensure that coding RNA is separatedfrom noncoding RNA. This can be accomplished simplywith poly (T) oligos covalently attached to a givensubstrate. Presently many studies utilize magnetic beads
for this step.[1][11] The Protocol Online website[12]
provides a list of several protocols relating to mRNAisolation.
Studies including portions of the transcriptome outsidepoly(A) RNAs have shown that when using poly(T) magnetic beads, the flow-throughRNA (non-poly(A) RNA) can yield important noncoding RNA gene discovery which
would have otherwise gone unnoticed.[1] Also, since ribosomal RNA represents over90% of the RNA within a given cell, studies have shown that its removal via probe
RNA-Seq - Wikipedia, the free encyclopedia http://en.wikipedia.org/wiki/RNA-Seq
2 of 17 05/07/2014 06:13 PM
RNA-seq mapping of shortreads in exon-exon junctions.
hybridization increases the capacity to retrieve data from the remaining portion of thetranscriptome.
The next step is reverse transcription. Due to the 5' bias of randomly primed-reverse
transcription as well as secondary structures influencing primer binding sites,[11]
hydrolysis of RNA into 200-300 nucleotides prior to reverse transcription reduces bothproblems simultaneously. However, there are trade-offs with this method wherealthough the overall body of the transcripts are efficiently converted to DNA, the 5' and3' ends are less so. Depending on the aim of the study, researchers may choose to applyor ignore this step.
Once the cDNA is synthesized it can be further fragmented to reach the desiredfragment length of the sequencing system.
Small RNA/Non-coding RNA sequencing
When sequencing RNA other than mRNA, the library preparation is modified. Thecellular RNA is selected based on the desired size range. For small RNA targets, suchas miRNA, the RNA is isolated through size selection. This can be performed with asize exclusion gel, through size selection magnetic beads, or with a commerciallydeveloped kit. Once isolated, linkers are added to the 3’ and 5’ end then purified. Thefinal step is cDNA generation through reverse transcription.
Direct RNA Sequencing
As converting RNA into cDNA using reversetranscriptase has been shown to introduce biases andartifacts that may interfere with both the proper
characterization and quantification of transcripts,[13]
single molecule Direct RNA Sequencing (DRSTM)technology was under development by Helicos (nowbankrupt). DRSTM sequences RNA molecules directly ina massively-parallel manner without RNA conversion tocDNA or other biasing sample manipulations such asligation and amplification.
Transcriptome Assembly
See also: Sequence alignment software § Short-Read Sequence Alignment
Two different assembly methods are used for producing a transcriptome from rawsequence reads: de-novo and genome-guided.
The first approach does not rely on the presence of a reference genome in order toreconstruct the nucleotide sequence. Due to the small size of the short reads de novoassembly may be difficult though some software does exist ( Velvet (algorithm), Oases(http://www.ebi.ac.uk/~zerbino/oases/), and Trinity
(http://trinityrnaseq.sourceforge.net)[14] to mention a few), as there cannot be largeoverlaps between each read needed to easily reconstruct the original sequences. The
RNA-Seq - Wikipedia, the free encyclopedia http://en.wikipedia.org/wiki/RNA-Seq
3 of 17 05/07/2014 06:13 PM
deep coverage also makes the computing power to track all the possible alignments
prohibitive.[15] This deficit can be improved using longer sequences obtained from thesame sample using other techniques such as Sanger sequencing, and using larger readsas a "skeleton" or a "template" to help assemble reads in difficult regions (e.g. regionswith repetitive sequences).
An “easier” and relatively computationally cheaper approach is that of aligning themillions of reads to a "reference genome". There are many tools available for aligninggenomic reads to a reference genome (sequence alignment tools), however, specialattention is needed when alignment of a transcriptome to a genome, mainly whendealing with genes having intronic regions. Several software packages exist for shortread alignment, and recently specialized algorithms for transcriptome alignment have
been developed, e.g. Bowtie for RNA-seq short read alignment,[16] TopHat for aligning
reads to a reference genome to discover splice sites,[17] Cufflinks to assemble the
transcripts and compare/merge them with others,[18] or FANSe.[19] These tools can
also be combined to form a comprehensive system.[20]
Although numerous solutions to the assembly quest have been proposed, there is stilllots of room for improvement given the resulting variability of the approaches. A groupfrom the Center for Computational Biology at the East China Normal University inShanghai compared different de novo and genome-guided approaches for RNA-Seqassembly. They noted that, although most of the problems can be solved using graphtheory approaches, there is still a consistent level of variability in all of them. Somealgorithms outperformed the common standards for some species while still strugglingfor others. The authors suggest that the “most reliable” assembly could be then
obtained by combining different approaches.[21] Interestingly, these results areconsistent with NGS-genome data obtained in a recent contest called Assemblathonwhere 21 contestants analyzed sequencing data from three different vertebrates (fish,snake and bird) and handed in a total of 43 assemblies. Using a metric made of 100different measures for each assembly, the reviewers concluded that 1) assembly qualitycan vary a lot depending on which metric is used and 2) assemblies that scored well in
one species didn’t really perform well in the other species.[22]
As discussed above, sequence libraries are created by extracting mRNA using itspoly(A) tail, which is added to the mRNA molecule post-transcriptionally and thussplicing has taken place. Therefore, the created library and the short reads obtainedcannot come from intronic sequences, so library reads spanning the junction of two ormore exons will not align to the genome.
A possible method to work around this is to try to align the unaligned short reads usinga proxy genome generated with known exonic sequences. This need not cover wholeexons, only enough so that the short reads can match on both sides of the exon-exonjunction with minimum overlap. Some experimental protocols allow the production of
strand specific reads.[11]
Experimental Considerations
The information gathered when sequencing a sample's transcriptome in this way has
RNA-Seq - Wikipedia, the free encyclopedia http://en.wikipedia.org/wiki/RNA-Seq
4 of 17 05/07/2014 06:13 PM
many of the same limitations and advantages as other RNA expression analysispipelines. The main pros and cons of this approach can be summarized as:
a) Tissue specificity: Gene expression is not uniform throughout an organism's cells, itis strongly dependent on the tissue type being measured; RNA-Seq, as any othersequencing technology that analyzes homogeneous samples, can provide a completesnapshot of all the transcripts being available at that precise moment in the cell. Thisapproach is unlikely to be biased like an oligonucleotide microarray approach thatinstead analyzes a selected number of previously defined transcripts.
b) Time dependent: During a cell's lifetime and context, its gene expression levelschange. As previously mentioned any single sequencing experiment will offerinformation regarding one point in time. Time course experiments are so far the onlysolution that would allow a complete overview of the circadian transcriptome so thatresearchers could obtain a precise description of the physiological changes happeningover time. However, this approach is unfeasible for patient samples since it is quiteimprobable that biopsies will be collected serially in short time intervals. A possiblework-around could be the use of urine, blood or saliva samples that won’t require anyinvasive procedure.
c) Coverage: coverage/depth can affect the mutations seen. Given that everything isexpression-centric, an allele might not be detected, either because it is not in thegenome, or because it is not being expressed. At the same time, RNA-seq can yieldadditional information rather than just the existence of a heterozygous gene as it canalso help in estimating the expression of each allele. In association studies, genotypesare associated to disease and expression levels can also be associated with disease.Using RNA-seq, we can measure the relationship between these two associatedvariables, that is, in what relation are each of the alleles being expressed.
The depth of sequencing required for specific applications can be extrapolated from a
pilot experiment.[23]
d) Subjectivity of the analysis: As described above, numerous attempts have been takento uniformly analyze the data. However, the results can vary due to the multitude ofalgorithms and pipelines available. Most of the approaches are correct, but have to betailored to the needs of the investigators in order to better capture the desired effect.This variability in methods, although in smaller scale, is still present in other RNAprofiling approaches where reagents, personnel and techniques can lead to similar,although statistically different, results. Because of this, care must be taken whendrawing conclusions from the sequencing experiment, as some information gatheredmight not be representative of the individual.
e) Data management: The main issue with NGS data is the volume of data produced.Microarray data occupy up to one thousand times less disk space than NGS datatherefore requiring smaller storage units. The high capacity storage units required byRNA-Seq data are, however, directly proportional to the volume of information that goeswith it. The payoff of “more complete” big scale datasets have to be evaluated prior tostarting the experiment.
f) Downstream interpretation of the data: Different layers of interpretations have to be
RNA-Seq - Wikipedia, the free encyclopedia http://en.wikipedia.org/wiki/RNA-Seq
5 of 17 05/07/2014 06:13 PM
considered when analyzing RNA-Seq data. Biological, clinical and regulatory functionsof the results are what allow clinicians and investigators to draw meaningfulconclusions (i.e. the sequence of an RNA molecule presents, although identified withdifferent read depths, might not perfectly mirror the initial DNA sequence). An exampleof this would be during SNV discovery as the mutations discovered are more preciselythe mutations being expressed. Observing a homozygote location to a non-referenceallele in an organism does not necessarily mean that this is the individual's genotype, itcould just mean that the gene copy with the reference allele is not being expressed inthat tissue and/or at the time snapshot the sample was acquired.
Analysis
See also List of RNA-Seq bioinformatics tools
Gene expression
The characterization of gene expression in cells via measurement of mRNA levels haslong been of interest to researchers, both in terms of which genes are expressed inwhat tissues, and at what levels. Even though it has been shown that due to other posttranscriptional gene regulation events (such as RNA interference) there is notnecessarily always a strong correlation between the abundance of mRNA and the
related proteins,[24] measuring mRNA concentration levels is still a useful tool indetermining how the transcriptional machinery of the cell is affected in the presence ofexternal signals (e.g. drug treatment), or how cells differ between a healthy state and adiseased state.
Expression can be deduced via RNA-seq to the extent at which a sequence is retrieved.
Transcriptome studies in yeast [25] show that in this experimental setting, a fourfoldcoverage is required for amplicons to be classified and characterized as an expressedgene. When the transcriptome is fragmented prior to cDNA synthesis, the number ofreads corresponding to the particular exon normalized by its length in vivo yields gene
expression levels which correlate with those obtained through qPCR.[23] This isfrequently further normalized by the total number of mapped reads so that expressionlevels are expressed as Fragments Per Kilobase of transcript per Million mapped reads
(FPKM).[18]
The only way to be absolutely sure of the individual's mutations is to compare thetranscriptome sequences to the germline DNA sequence. This enables the distinction ofhomozygous genes versus skewed expression of one of the alleles and it can alsoprovide information about genes that were not expressed in the transcriptomic
experiment. An R-based statistical package known as CummeRbund[26] can be used togenerate expression comparison charts for visual analysis.
Single nucleotide variation discovery
See also: single nucleotide polymorphism
RNA-Seq - Wikipedia, the free encyclopedia http://en.wikipedia.org/wiki/RNA-Seq
6 of 17 05/07/2014 06:13 PM
RNA-seq mapping of shortreads over exon-exonjunctions, depending onwhere each end maps to, itcould be defined a Trans or aCis event.
Transcriptome single nucleotide variation has been analyzed in maize on the Roche 454
sequencing platform.[27] Directly from the transcriptome analysis, around 7000 singlenucleotide polymorphisms (SNPs) were recognized. Following Sanger sequencevalidation, the researchers were able to conservatively obtain almost 5000 valid SNPscovering more than 2400 maize genes. RNA-seq is limited to transcribed regionshowever, since it will only discover sequence variations in exon regions. This missesmany subtle but important intron alleles that affect disease such as transcriptionregulators, leaving analysis to only large effectors. While some correlation existsbetween exon to intron variation, only whole genome sequencing would be able to
capture the source of all relevant SNPs.[28]
Post-transcriptional SNVs
Having the matching genomic and transcriptomic sequences of an individual can also
help in detecting post-transcriptional edits,[10] where, if the individual is homozygousfor a gene, but the gene's transcript has a different allele, then a post-transcriptionalmodification event is determined.
mRNA centric single nucleotide variants (SNVs) are generally not considered as arepresentative source of functional variation in cells, mainly due to the fact that thesemutations disappear with the mRNA molecule, however the fact that efficient DNAcorrection mechanisms do not apply to RNA molecules can cause them to appear more
often. This has been proposed as the source of certain prion diseases,[29] also known asTSE or transmissible spongiform encephalopathies.
Fusion gene detection
See also: Fusion gene
Caused by different structural modifications in thegenome, fusion genes have gained attention because of
their relationship with cancer.[30] The ability of RNA-seqto analyze a sample's whole transcriptome in anunbiased fashion makes it an attractive tool to find these
kinds of common events in cancer.[31]
The idea follows from the process of aligning the shorttranscriptomic reads to a reference genome. Most of theshort reads will fall within one complete exon, and asmaller but still large set would be expected to map toknown exon-exon junctions. The remaining unmappedshort reads would then be further analyzed to determinewhether they match an exon-exon junction where theexons come from different genes. This would be evidenceof a possible fusion event, however, because of thelength of the reads, this could prove to be very noisy. An alternative approach is to usepair-end reads, when a potentially large number of paired reads would map each end toa different exon, giving better coverage of these events (see figure). Nonetheless, the
RNA-Seq - Wikipedia, the free encyclopedia http://en.wikipedia.org/wiki/RNA-Seq
7 of 17 05/07/2014 06:13 PM
end result consists of multiple and potentially novel combinations of genes providing anideal starting point for further validation.
Coexpression Networks
Coexpression networks are data-derived representations of genes behaving in a similar
way across tissues and experimental conditions.[32] Their main purpose lies inhypothesis generation and guilt-by-association approaches for inferring functions of
previously unknown genes.[32] RNASeq data has been recently used to infer genes
involved in specific pathways based on Pearson correlation, both in plants [33] and
mammals.[34] The main advantage of RNASeq data in this kind of analysis over themicroarray platforms is the capability to cover the entire transcriptome, thereforeallowing the possibility to unravel more complete representations of the generegulatory networks. Differential regulation of the splice isoforms of the same gene can
be detected and used to predict and their biological functions.[35] Weighted geneco-expression network analysis has been successfully used to identify co-expressionmodules and intramodular hub genes based on RNA seq data. Co-expression modulesmay corresponds to cell types or pathways. Highly connected intramodular hubs can beinterpreted as representatives of their respective module. Variance-StabilizingTransformation approaches for estimating correlation coefficients based on RNA seq
data have been proposed.[33]
Application to Genomic Medicine
History
The past five years have seen a flourishing of NGS-based methods for genome analysisleading to the discovery of a number of new mutations and fusion transcripts in cancer.RNA-Seq data could help researchers interpreting the “personalized transcriptome” sothat it will help understanding the transcriptomic changes happening therefore, ideally,identifying gene drivers for a disease. The feasibility of this approach is howeverdictated by the costs in term of money and time.
A basic search on PubMed reveals that the term RNA Seq, queried as “rna Seq ORRNA-Seq OR rna sequencing OR RNASeq” in order to capture the most common waysof phrasing it, gives 147.525 hits demonstrating the exponentially increasing usage rateof this technology. A few examples will be taken into consideration to explain thatRNA-Seq applications to the clinic have the potentials to significantly affect patient’slife and, on the other hand, requires a team of specialists (bioinformaticians,physicians/clinicians, basic researchers, technicians) to fully interpret the huge amountof data generated by this analysis.
As an example of excellent clinical applications, researchers at the Mayo Clinic used anRNA-Seq approach to identify differentially expressed transcripts between oral cancerand normal tissue samples. They also accurately evaluated the allelic imbalance (AI),ratio of the transcripts produced by the single alleles, within a subgroup of genes
involved in cell differentiation, adhesion, cell motility and muscle contraction[36]
RNA-Seq - Wikipedia, the free encyclopedia http://en.wikipedia.org/wiki/RNA-Seq
8 of 17 05/07/2014 06:13 PM
identifying a unique transcriptomic and genomic signature in oral cancer patients.Novel insight on skin cancer (melanoma) also come from RNA-Seq of melanomapatients. This approach led to the identification of eleven novel gene fusion transcriptsoriginated from previously unknown chromosomal rearrangements. Twelve novelchimeric transcripts were also reported, including seven of those that confirmed
previously identified data in multiple melanoma samples.[37] Furthermore, thisapproach is not limited to cancer patients. RNA-Seq has been used to study otherimportant chronic diseases such as Alzheimer (AD) and diabetes. In the former case,Twine and colleagues compared the transcriptome of different lobes of deceased AD’spatient’s brain with the brain of healthy individuals identifying a lower number of splicevariants in AD’s patients and differential promoter usage of the APOE-001 and -002
isoforms in AD’s brains.[38] In the latter case, different groups showed the unicity of thebeta-cells transcriptome in diabetic patients in terms of transcripts accumulation and
differential promoter usage[39] and long non coding RNAs (lncRNAs) signature.[40]
Compared with microarrays, NGS technology has identified novel and low frequencyRNAs associated with disease processes. This advantage aids in the diagnosis andpossible future treatments of diseases, including cancer. For example, NGS technologyidentified several previously undocumented differentially-expressed transcripts in ratstreated with AFB1, a potent hepatocarcinogen. Nearly 50 new differentially-expressedtranscriptions were identified between the controls and AFB1-treated rats. Additionallypotential new exons were identified, including some that are responsive to AFB1. Thenext-generation sequencing pipeline identified more differential gene expressionscompared with microarrays, particularly when DESeq software was utilized. Cufflinksidentified two novel transcripts that were not previously annotated in the Ensembl
database; these transcripts were confirmed using cloning PCR.[41] Numerous otherstudies have demonstrated NGS's ability to detect aberrant mRNA and smallnon-coding RNA expression in disease processes above that provided by microarrays.The lower cost and higher throughput offered by NGS confers another advantage toresearchers.
The role of small non-coding RNAs in disease processes has also been explored inrecent years. For example, Han et al. (2011) examined microRNA expressiondifferences in bladder cancer patients in order to understand how changes anddysregulation in microRNA can influence mRNA expression and function. SeveralmicroRNAs were differentially expressed in the bladder cancer patients. Upregulationin the aberrant microRNAs was more common than downregulation in the cancerpatients. One of the upregulated microRNAs, hsa-miR-96, has been associated withcarcinogenesis, and several of the overexpressed microRNAs have also been observedin other cancers, including ovarian and cervical. Some of the downregulated
microRNAs in cancer samples were hypothesized to have inhibitory roles.[42]
ENCODE and TCGA
A lot of emphasis has been given to RNA-Seq data after the Encyclopedia of theregulatory elements (ENCODE) and The Cancer Genome Atlas (TCGA) projects have
used this approach to characterize dozens of cell lines[43] and thousands of primary
tumor samples,[44] respectively. The former aimed to identify genome-wide regulatory
RNA-Seq - Wikipedia, the free encyclopedia http://en.wikipedia.org/wiki/RNA-Seq
9 of 17 05/07/2014 06:13 PM
regions in different cohort of cell lines and transcriptomic data are paramount in orderto understand the downstream effect of those epigenetic and genetic regulatory layers.The latter project, instead, aimed to collect and analyze thousands of patient’s samplesfrom 30 different tumor types in order to understand the underlying mechanisms ofmalignant transformation and progression. In this context RNA-Seq data provide aunique snapshot of the transcriptomic status of the disease and look at an unbiasedpopulation of transcripts that allows the identification of novel transcripts, fusiontranscripts and non-coding RNAs that could be undetected with different technologies.
External links
RNA-Seq for Everyone (http://rnaseq.uoregon.edu/index.html): a high-level guide
to designing and implementing an RNA-Seq experiment.
ChIPBase database (http://deepbase.sysu.edu.cn/chipbase/expression.php):
provides expression profiles of protein-coding genes and lncRNAs (lincRNAs) from
RNA-Seq data across 22 tissues.
Martin A. Perdacher (September 2011) Next-Generation Sequencing and its
Applications in RNA-Seq (http://aboutme.biobyte.org/wp-content/uploads/2011/10
/Next-Generation-Sequencing-and-its-Applications-in-RNA-Seq.pdf). Theory part of
the Bachelorthesis, Hagenberg.
The RNA-Seq Blog (http://www.rna-seqblog.com/)
OMICtools (http://omictools.com/mrna-seq/): a didactic directory for RNA-seq data
analysis.
References
^ a b c Ryan D. Morin, Matthew
Bainbridge, Anthony Fejes, Martin Hirst,
Martin Krzywinski, Trevor J. Pugh, Helen
McDonald, Richard Varhol, Steven J.M.
Jones, and Marco A. Marra. (2008).
"Profiling the HeLa S3 transcriptome
using randomly primed cDNA and
massively parallel short-read sequencing"
(http://www.bcgsc.ca/about/pubann
/biotechniques-publication-2008-44-8).
BioTechniques 45 (1): 81–94.
doi:10.2144/000112900 (http://dx.doi.org
/10.2144%2F000112900).
PMID 18611170
(https://www.ncbi.nlm.nih.gov/pubmed
1. /18611170).
^ Chu Y, Corey DR (August 2012). "RNA
sequencing: platform selection,
experimental design, and data
interpretation"
(https://www.ncbi.nlm.nih.gov
/pmc/articles/PMC3426205). Nucleic Acid
Ther 22 (4): 271–4.
doi:10.1089/nat.2012.0367
(http://dx.doi.org
/10.1089%2Fnat.2012.0367).
PMC 3426205
(https://www.ncbi.nlm.nih.gov
/pmc/articles/PMC3426205).
PMID 22830413
2.
RNA-Seq - Wikipedia, the free encyclopedia http://en.wikipedia.org/wiki/RNA-Seq
10 of 17 05/07/2014 06:13 PM
(https://www.ncbi.nlm.nih.gov/pubmed
/22830413).
^ Maher CA, Kumar-Sinha C, Cao X, et al.
(March 2009). "Transcriptome sequencing
to detect gene fusions in cancer"
(https://www.ncbi.nlm.nih.gov
/pmc/articles/PMC2725402). Nature 458
(7234): 97–101. doi:10.1038/nature07638
(http://dx.doi.org
/10.1038%2Fnature07638). PMC 2725402
(https://www.ncbi.nlm.nih.gov
/pmc/articles/PMC2725402).
PMID 19136943
(https://www.ncbi.nlm.nih.gov/pubmed
/19136943).
3.
^ Ingolia NT, Brar GA, Rouskin S,
McGeachy AM, Weissman JS (August
2012). "The ribosome profiling strategy
for monitoring translation in vivo by deep
sequencing of ribosome-protected mRNA
fragments" (https://www.ncbi.nlm.nih.gov
/pmc/articles/PMC3535016). Nat Protoc 7
(8): 1534–50. doi:10.1038/nprot.2012.086
(http://dx.doi.org
/10.1038%2Fnprot.2012.086).
PMC 3535016
(https://www.ncbi.nlm.nih.gov
/pmc/articles/PMC3535016).
PMID 22836135
(https://www.ncbi.nlm.nih.gov/pubmed
/22836135).
4.
^ Qian F, Chung L, Zheng W, et al. (2013).
"Identification of Genes Critical for
Resistance to Infection by West Nile Virus
Using RNA-Seq Analysis". Viruses 5 (7):
1664–81. doi:10.3390/v5071664
(http://dx.doi.org/10.3390%2Fv5071664).
PMID 23881275
(https://www.ncbi.nlm.nih.gov/pubmed
/23881275).
5.
^ Beane J, Vick J, Schembri F, et al. (June6.
2011). "Characterizing the impact of
smoking and lung cancer on the airway
transcriptome using RNA-Seq"
(https://www.ncbi.nlm.nih.gov
/pmc/articles/PMC3694393). Cancer Prev
Res (Phila) 4 (6): 803–17.
doi:10.1158/1940-6207.CAPR-11-0212
(http://dx.doi.org
/10.1158%2F1940-6207.CAPR-11-0212).
PMC 3694393
(https://www.ncbi.nlm.nih.gov
/pmc/articles/PMC3694393).
PMID 21636547
(https://www.ncbi.nlm.nih.gov/pubmed
/21636547).
^ "HapMap: About the Project"
(http://hapmap.ncbi.nlm.nih.gov
/abouthapmap.html). Retrieved
2013-07-28.
7.
^ Marioni JC, Mason CE, Mane SM,
Stephens M, Gilad Y (September 2008).
"RNA-seq: an assessment of technical
reproducibility and comparison with gene
expression arrays"
(https://www.ncbi.nlm.nih.gov
/pmc/articles/PMC2527709). Genome Res.
18 (9): 1509–17.
doi:10.1101/gr.079558.108
(http://dx.doi.org
/10.1101%2Fgr.079558.108).
PMC 2527709
(https://www.ncbi.nlm.nih.gov
/pmc/articles/PMC2527709).
PMID 18550803
(https://www.ncbi.nlm.nih.gov/pubmed
/18550803).
8.
^ Siu H, Zhu Y, Jin L, Xiong M (2011).
"Implication of next-generation
sequencing on association studies"
(https://www.ncbi.nlm.nih.gov
/pmc/articles/PMC3148210). BMC
9.
RNA-Seq - Wikipedia, the free encyclopedia http://en.wikipedia.org/wiki/RNA-Seq
11 of 17 05/07/2014 06:13 PM
Genomics 12: 322.
doi:10.1186/1471-2164-12-322
(http://dx.doi.org
/10.1186%2F1471-2164-12-322).
PMC 3148210
(https://www.ncbi.nlm.nih.gov
/pmc/articles/PMC3148210).
PMID 21682891
(https://www.ncbi.nlm.nih.gov/pubmed
/21682891).
^ a b Wang Z, Gerstein M, Snyder M.
(January 2009). "RNA-Seq: a revolutionary
tool for transcriptomics"
(http://www.nature.com/nrg/journal
/v10/n1/abs/nrg2484.html). Nature
Reviews Genetics 10 (1): 57–63.
doi:10.1038/nrg2484 (http://dx.doi.org
/10.1038%2Fnrg2484). PMC 2949280
(https://www.ncbi.nlm.nih.gov
/pmc/articles/PMC2949280).
PMID 19015660
(https://www.ncbi.nlm.nih.gov/pubmed
/19015660).
10.
^ a b c Mortazavi A, Williams BA, McCue
K, Schaeffer L, Wold B. (2008). "Mapping
and quantifying mammalian
transcriptomes by RNA-seq"
(http://www.nature.com/nmeth/journal
/v5/n7/abs/nmeth.1226.html). Nature
Methods 5 (7): 621–628.
doi:10.1038/nmeth.1226 (http://dx.doi.org
/10.1038%2Fnmeth.1226).
PMID 18516045
(https://www.ncbi.nlm.nih.gov/pubmed
/18516045).
11.
^ http://www.protocol-online.org
/prot/Molecular_Biology
/RNA/RNA_Extraction/mRNA_Isolation
/index.html
12.
^ Liu D, Graber JH (2006). "Quantitative
comparison of EST libraries requires
13.
compensation for systematic biases in
cDNA generation"
(https://www.ncbi.nlm.nih.gov
/pmc/articles/PMC1431573). BMC
Bioinformatics 7: 77.
doi:10.1186/1471-2105-7-77
(http://dx.doi.org
/10.1186%2F1471-2105-7-77).
PMC 1431573
(https://www.ncbi.nlm.nih.gov
/pmc/articles/PMC1431573).
PMID 16503995
(https://www.ncbi.nlm.nih.gov/pubmed
/16503995).
^ Grabherr MG, Haas BJ, Yassour M, et al.
(July 2011). "Full-length transcriptome
assembly from RNA-Seq data without a
reference genome"
(https://www.ncbi.nlm.nih.gov
/pmc/articles/PMC3571712). Nat.
Biotechnol. 29 (7): 644–52.
doi:10.1038/nbt.1883 (http://dx.doi.org
/10.1038%2Fnbt.1883). PMC 3571712
(https://www.ncbi.nlm.nih.gov
/pmc/articles/PMC3571712).
PMID 21572440
(https://www.ncbi.nlm.nih.gov/pubmed
/21572440).
14.
^ Zerbino DR, Birney E (2008). "Velvet:
Algorithms for de novo short read
assembly using de Bruijn graphs"
(http://genome.cshlp.org/content
/18/5/821.full). Genome Research 18 (5):
821–829. doi:10.1101/gr.074492.107
(http://dx.doi.org
/10.1101%2Fgr.074492.107).
PMC 2336801
(https://www.ncbi.nlm.nih.gov
/pmc/articles/PMC2336801).
PMID 18349386
(https://www.ncbi.nlm.nih.gov/pubmed
15.
RNA-Seq - Wikipedia, the free encyclopedia http://en.wikipedia.org/wiki/RNA-Seq
12 of 17 05/07/2014 06:13 PM
/18349386).
^ Langmead B, Trapnell C, Pop M,
Salzberg SL (2009). "Ultrafast and
memory-efficient alignment of short DNA
sequences to the human genome"
(https://www.ncbi.nlm.nih.gov
/pmc/articles/PMC2690996). Genome Biol.
10 (3): R25. doi:10.1186/gb-2009-10-3-r25
(http://dx.doi.org/10.1186%2Fgb-
2009-10-3-r25). PMC 2690996
(https://www.ncbi.nlm.nih.gov
/pmc/articles/PMC2690996).
PMID 19261174
(https://www.ncbi.nlm.nih.gov/pubmed
/19261174).
16.
^ Cole Trapnell, Lior Pachter and Steven
Salzberg (2009). "TopHat: discovering
splice junctions with RNA-Seq"
(http://bioinformatics.oxfordjournals.org
/cgi/content/abstract/25/9/1105?etoc).
Bioinformatics 25 (9): 1105–1111.
doi:10.1093/bioinformatics/btp120
(http://dx.doi.org
/10.1093%2Fbioinformatics%2Fbtp120).
PMC 2672628
(https://www.ncbi.nlm.nih.gov
/pmc/articles/PMC2672628).
PMID 19289445
(https://www.ncbi.nlm.nih.gov/pubmed
/19289445).
17.
^ a b Trapnell, Cole; Williams, Brian A;
Pertea, Geo; Mortazavi, Ali; Kwan,
Gordon; van Baren, Marijke J; Salzberg,
Steven L; Wold, Barbara J; Pachter, Lior
(May 2010). "Transcript assembly and
quantification by RNA-Seq reveals
unannotated transcripts and isoform
switching during cell differentiation"
(https://www.ncbi.nlm.nih.gov
/pmc/articles/PMC3146043). Nat
Biotechnol 28 (5): 511–515.
18.
doi:10.1038/nbt.1621 (http://dx.doi.org
/10.1038%2Fnbt.1621). PMC 3146043
(https://www.ncbi.nlm.nih.gov
/pmc/articles/PMC3146043).
PMID 20436464
(https://www.ncbi.nlm.nih.gov/pubmed
/20436464).
^ "FANSe: introduction"
(http://bioinformatics.jnu.edu.cn/software
/fanse). Retrieved 2013-07-28.
19.
^ Trapnell C, Roberts A, Goff L, et al.
(March 2012). "Differential gene and
transcript expression analysis of RNA-seq
experiments with TopHat and Cufflinks"
(https://www.ncbi.nlm.nih.gov
/pmc/articles/PMC3334321). Nat Protoc 7
(3): 562–78. doi:10.1038/nprot.2012.016
(http://dx.doi.org
/10.1038%2Fnprot.2012.016).
PMC 3334321
(https://www.ncbi.nlm.nih.gov
/pmc/articles/PMC3334321).
PMID 22383036
(https://www.ncbi.nlm.nih.gov/pubmed
/22383036).
20.
^ Lu B, Zeng Z, Shi T (February 2013).
"Comparative study of de novo assembly
and genome-guided assembly strategies
for transcriptome reconstruction based on
RNA-Seq". Science China Life Sciences 56
(2): 143–55.
doi:10.1007/s11427-013-4442-z
(http://dx.doi.org
/10.1007%2Fs11427-013-4442-z).
PMID 23393030
(https://www.ncbi.nlm.nih.gov/pubmed
/23393030).
21.
^ Bradnam KR, Fass JN, Alexandrov A, et
al. (July 2013). "Assemblathon 2:
evaluating de novo methods of genome
assembly in three vertebrate species".
22.
RNA-Seq - Wikipedia, the free encyclopedia http://en.wikipedia.org/wiki/RNA-Seq
13 of 17 05/07/2014 06:13 PM
Gigascience 2 (1): 10.
doi:10.1186/2047-217X-2-10
(http://dx.doi.org/10.1186%2F2047-217X-
2-10). PMID 23870653
(https://www.ncbi.nlm.nih.gov/pubmed
/23870653).
^ a b Li H, Lovci MT, Kwon YS, Rosenfeld
MG, Fu XD, Yeo GW (2008).
"Determination of tag density required for
digital transcriptome analysis: Application
to an androgen-sensitive prostate cancer
model" (http://www.pnas.org/content
/105/51/20179.long). Proc Natl Acad Sci
USA 105 (51): 20179–84.
doi:10.1073/pnas.0807121105
(http://dx.doi.org
/10.1073%2Fpnas.0807121105).
PMC 2603435
(https://www.ncbi.nlm.nih.gov
/pmc/articles/PMC2603435).
PMID 19088194
(https://www.ncbi.nlm.nih.gov/pubmed
/19088194).
23.
^ Greenbaum D, Colangelo C, Williams K,
Gerstein M. (2003). "Comparing protein
abundance and mRNA expression levels
on a genomic scale"
(http://genomebiology.com/2003/4/9/117).
Genome Biology 4 (9): 117.
doi:10.1186/gb-2003-4-9-117
(http://dx.doi.org/10.1186%2Fgb-
2003-4-9-117). PMC 193646
(https://www.ncbi.nlm.nih.gov
/pmc/articles/PMC193646).
PMID 12952525
(https://www.ncbi.nlm.nih.gov/pubmed
/12952525).
24.
^ Nagalakshmi U, Wang Z, Waern K, Shou
C, Raha D, Gerstein M, Snyder M (2008).
"The Transcriptional Landscape of the
Yeast Genome Defined by RNA
25.
Sequencing" (http://www.sciencemag.org
/cgi/content/abstract/320/5881/1344).
Science 320 (5881): 1344–1349.
doi:10.1126/science.1158441
(http://dx.doi.org
/10.1126%2Fscience.1158441).
PMC 2951732
(https://www.ncbi.nlm.nih.gov
/pmc/articles/PMC2951732).
PMID 18451266
(https://www.ncbi.nlm.nih.gov/pubmed
/18451266).
^ "CummeRbund - An R package for
persistent storage, analysis, and
visualization of RNA-Seq from cufflinks
output" (http://compbio.mit.edu
/cummeRbund). Retrieved 2013-07-28.
26.
^ Barbazuk WB, Emrich SJ, Chen HD, Li
L, Schnable PS (2007). "SNP discovery via
454 transcriptome sequencing"
(http://www3.interscience.wiley.com
/journal/118488674/abstract). The Plant
Journal 51 (5): 910–918.
doi:10.1111/j.1365-313X.2007.03193.x
(http://dx.doi.org
/10.1111%2Fj.1365-313X.2007.03193.x).
PMC 2169515
(https://www.ncbi.nlm.nih.gov
/pmc/articles/PMC2169515).
PMID 17662031
(https://www.ncbi.nlm.nih.gov/pubmed
/17662031).
27.
^ Lalonde E, Ha KC, Wang Z, et al. (April
2011). "RNA sequencing reveals the role
of splicing polymorphisms in regulating
human gene expression"
(https://www.ncbi.nlm.nih.gov
/pmc/articles/PMC3065702). Genome Res.
21 (4): 545–54.
doi:10.1101/gr.111211.110
(http://dx.doi.org
28.
RNA-Seq - Wikipedia, the free encyclopedia http://en.wikipedia.org/wiki/RNA-Seq
14 of 17 05/07/2014 06:13 PM
/10.1101%2Fgr.111211.110).
PMC 3065702
(https://www.ncbi.nlm.nih.gov
/pmc/articles/PMC3065702).
PMID 21173033
(https://www.ncbi.nlm.nih.gov/pubmed
/21173033).
^ Garcion E, Wallace B, Pelletier L, Wion
D. (2004). "RNA mutagenesis and
sporadic prion diseases". Journal of
Theoretical Biology 230 (2): 271–274.
doi:10.1016/j.jtbi.2004.05.014
(http://dx.doi.org
/10.1016%2Fj.jtbi.2004.05.014).
PMID 15302558
(https://www.ncbi.nlm.nih.gov/pubmed
/15302558).
29.
^ Teixeira MR (2006). "Recurrent fusion
oncogenes in carcinomas"
(http://www.begellhouse.com/journals
/439f422d0783386a,1371844864dc630c,7
499e7bf0e7ad511.html). Ciritical Reviews
in Oncogenesis 12 (3–4): 257–271.
PMID 17425505
(https://www.ncbi.nlm.nih.gov/pubmed
/17425505).
30.
^ Maher CA, Kumar-Sinha C, Cao X,
Kalyana-Sundaram S, Han B, Jing X, Sam
L, Barrette T, Palanisamy N, Chinnaiyan
AM (January 2009). "Transcriptome
Sequencing to Detect Gene Fusions in
Cancer" (http://www.nature.com/nature
/journal/vaop/ncurrent
/abs/nature07638.html). Nature 458
(7234): 97–101. doi:10.1038/nature07638
(http://dx.doi.org
/10.1038%2Fnature07638). PMC 2725402
(https://www.ncbi.nlm.nih.gov
/pmc/articles/PMC2725402).
PMID 19136943
(https://www.ncbi.nlm.nih.gov/pubmed
31.
/19136943).
^ a b Marcotte, EM.; Pellegrini, M.;
Thompson, MJ.; Yeates, TO.; Eisenberg, D.
(Nov 1999). "A combined algorithm for
genome-wide prediction of protein
function.". Nature 402 (6757): 83–6.
doi:10.1038/47048 (http://dx.doi.org
/10.1038%2F47048). PMID 10573421
(https://www.ncbi.nlm.nih.gov/pubmed
/10573421).
32.
^ a b Giorgi Federico Manuel (2013).
"Comparative study of RNA-seq- and
Microarray-derived coexpression
networks in Arabidopsis thaliana"
(http://bioinformatics.oxfordjournals.org
/content/29/6/717.short). Bioinformatics
29 (6): 717–724.
doi:10.1093/bioinformatics/btt053
(http://dx.doi.org
/10.1093%2Fbioinformatics%2Fbtt053).
PMID 23376351
(https://www.ncbi.nlm.nih.gov/pubmed
/23376351).
33.
^ Iancu Ovidiu D (2012). "Utilizing
RNA-Seq data for de novo coexpression
network inference"
(http://bioinformatics.oxfordjournals.org
/content/28/12/1592.short). Bioinformatics
28 (12): 1592–1597.
doi:10.1093/bioinformatics/bts245
(http://dx.doi.org
/10.1093%2Fbioinformatics%2Fbts245).
PMID 22556371
(https://www.ncbi.nlm.nih.gov/pubmed
/22556371).
34.
^ Eksi, R; Li, HD; Menon, R; Wen, Y;
Omenn, GS; Kretzler, M; Guan, Y (Nov
2013). "Systematically differentiating
functions for alternatively spliced
isoforms through integrating RNA-seq
data.". PLoS computational biology 9 (11):
35.
RNA-Seq - Wikipedia, the free encyclopedia http://en.wikipedia.org/wiki/RNA-Seq
15 of 17 05/07/2014 06:13 PM
e1003314. PMID 24244129
(https://www.ncbi.nlm.nih.gov/pubmed
/24244129).
^ Tuch BB, Laborde RR, Xu X, et al.
(2010). "Tumor transcriptome sequencing
reveals allelic expression imbalances
associated with copy number alterations"
(https://www.ncbi.nlm.nih.gov
/pmc/articles/PMC2824832). PLoS ONE 5
(2): e9317.
doi:10.1371/journal.pone.0009317
(http://dx.doi.org
/10.1371%2Fjournal.pone.0009317).
PMC 2824832
(https://www.ncbi.nlm.nih.gov
/pmc/articles/PMC2824832).
PMID 20174472
(https://www.ncbi.nlm.nih.gov/pubmed
/20174472).
36.
^ Berger MF, Levin JZ, Vijayendran K, et
al. (April 2010). "Integrative analysis of
the melanoma transcriptome"
(https://www.ncbi.nlm.nih.gov
/pmc/articles/PMC2847744). Genome Res.
20 (4): 413–27.
doi:10.1101/gr.103697.109
(http://dx.doi.org
/10.1101%2Fgr.103697.109).
PMC 2847744
(https://www.ncbi.nlm.nih.gov
/pmc/articles/PMC2847744).
PMID 20179022
(https://www.ncbi.nlm.nih.gov/pubmed
/20179022).
37.
^ Twine NA, Janitz K, Wilkins MR, Janitz
M (2011). "Whole transcriptome
sequencing reveals gene expression and
splicing differences in brain regions
affected by Alzheimer's disease"
(https://www.ncbi.nlm.nih.gov
/pmc/articles/PMC3025006). PLoS ONE 6
38.
(1): e16266.
doi:10.1371/journal.pone.0016266
(http://dx.doi.org
/10.1371%2Fjournal.pone.0016266).
PMC 3025006
(https://www.ncbi.nlm.nih.gov
/pmc/articles/PMC3025006).
PMID 21283692
(https://www.ncbi.nlm.nih.gov/pubmed
/21283692).
^ Ku GM, Kim H, Vaughn IW, et al.
(October 2012). "Research resource:
RNA-Seq reveals unique features of the
pancreatic β-cell transcriptome". Mol.
Endocrinol. 26 (10): 1783–92.
doi:10.1210/me.2012-1176
(http://dx.doi.org
/10.1210%2Fme.2012-1176).
PMID 22915829
(https://www.ncbi.nlm.nih.gov/pubmed
/22915829).
39.
^ Morán I, Akerman I, van de Bunt M, et
al. (October 2012). "Human β cell
transcriptome analysis uncovers lncRNAs
that are tissue-specific, dynamically
regulated, and abnormally expressed in
type 2 diabetes". Cell Metab. 16 (4):
435–48. doi:10.1016/j.cmet.2012.08.010
(http://dx.doi.org
/10.1016%2Fj.cmet.2012.08.010).
PMID 23040067
(https://www.ncbi.nlm.nih.gov/pubmed
/23040067).
40.
^ Merrick, B. A., Phadke, D. P., Auerbach,
S. S., Mav, D., Stiegelmeyer, S. M., Shah,
R. R., & Tice, R. R. (2013). RNA-seq
reveals novel hepatic gene expression
pattern in Aflatoxin B1 treated rats. PLoS
ONE, 8, e61768.
41.
^ Han, y., Chen, J., Zhao, X., Liang, C.,
Wang, Y., Sun, L., Jiang, Z., Zhang, Z.,
42.
RNA-Seq - Wikipedia, the free encyclopedia http://en.wikipedia.org/wiki/RNA-Seq
16 of 17 05/07/2014 06:13 PM
Yang, R., Chen, J., Li, Z., Tang, A., Li, X.,
Ye, J., Guan, Z., Gui, Y., & Cai, Z. (2011).
MicroRNA expression signatures of
bladder cancer revealed by deep
sequencing. PLos ONE, 6, e18286.
^ "ENCODE Data Matrix"
(http://genome.ucsc.edu/ENCODE
43.
/dataMatrix
/encodeDataMatrixHuman.html).
Retrieved 2013-07-28.
^ "The Cancer Genome Atlas - Data
Portal" (https://tcga-data.nci.nih.gov
/tcga/tcgaHome2.jsp). Retrieved
2013-07-28.
44.
Retrieved from "http://en.wikipedia.org/w/index.php?title=RNA-Seq&oldid=604992202"Categories: Molecular biology RNA Gene expression
This page was last modified on 20 April 2014 at 08:49.Text is available under the Creative Commons Attribution-ShareAlike License;additional terms may apply. By using this site, you agree to the Terms of Use andPrivacy Policy. Wikipedia® is a registered trademark of the Wikimedia Foundation,Inc., a non-profit organization.
RNA-Seq - Wikipedia, the free encyclopedia http://en.wikipedia.org/wiki/RNA-Seq
17 of 17 05/07/2014 06:13 PM