DTU Aqua14. juni 2019 2
SUMMARY
1. Definitions
2. Sample collections and RNA integrity
3. Library preparation
4. Data analyses
– Reads mapping/assembly
– Normalization
– Read count
– Differential expression
– Functional enrichment
DTU Aqua14. juni 2019 3
TRANSCRIPTOME
Complete set of transcripts in a cell and their
quantity, for a specific developmental stage or a
physiological condition.
DTU Aqua14. juni 2019 4
RNA classification
•Ribosomal RNA (rRNA): catalytic
component of ribosomes (about 80-85%)
•Transfer RNA (tRNA): transfers amino acids
to polypeptide chain at the ribosomal site of
protein synthesis (about 15%)
•Coding RNA(mRNA): carries information
about a protein sequence to the ribosomes
(about 5%)
•Other Non coding regulatory RNAs
DTU Aqua14. juni 2019 5
Delpu et al. 2016. Drug Discovery in Cancer Epigenetics
Other non coding regulatory RNAs
3. RNA classification
DTU Aqua14. juni 2019 6
Long RNAs: splicing
DNA
RNA
mRNA
lncRNA
1. Definitions
DTU Aqua14. juni 2019 7
RNA-seq
• Abundance estimation/differential expression
• Alternative splicing
• RNA editing
• Novel transcripts
• Allele specific expression
• Fusion transcripts
High-throughput sequencing technology used
for probing the transcriptome of a sample
DTU Aqua14. juni 2019 8
1. Definitions
2. Sample collections and RNA integrity
3. Library preparation
4. Data analyses
– Read mapping/assembly
– Normalization
– Read count
– Differential expression
– Functional enrichment
DTU Aqua14. juni 2019 9
Before RNA extraction
RNA is more unstable than DNA, therefore higher
precautions are needed to avoid degradation
TISSUE COLLECTION:
• Liquid nitrogen
• RNA later (for solid tissues)
• Tempus/Pax tubes (for blood)
DTU Aqua14. juni 2019 10
After RNA extraction
RIN (RNA integrity number): algorithm for
assigning integrity values to RNA measurements.
10: maximum
0: minimum
Integrity RIN>7 ok
DTU Aqua14. juni 2019 11
RNA quality (RIN)
and quantification:
Bioanalyzer
DTU Aqua14. juni 2019 12
1. Definitions
2. Sample collections and RNA integrity
3. Library preparation
4. Data analyses
– Reads mapping/assembly
– Normalization
– Read count
– Differential expression
– Functional enrichment
DTU Aqua14. juni 2019 13
Different steps for different
RNAs
Total RNA seq (DNase treatment, Ribosomal
depletion, fragmentation, library preparation)
mRNA+lnc (polyA+) RNA seq (DNase
treatment, polyA enrichment, fragmentation,
library preparation)
shortRNA seq (DNase treatment, Size selection,
library preparation)
DTU Aqua14. juni 2019 14
LIBRARY PREPARATION
DTU Aqua14. juni 2019 15
LIBRARY PREPARATION…
with 3rd gen. sequencing
DTU Aqua14. juni 2019 16
SUMMARY
1. Definitions
2. Sample collections and RNA integrity
3. Library preparation
4. Data analyses
– Read mapping/assembly
– Normalization
– Read count
– Differential expression
– Functional enrichment
DTU Aqua14. juni 2019 17
Transcriptome assembly strategies
Reference-based
De novo
Pseudoalignment
DTU Aqua14. juni 2019 18
Martin et al. 2011, Nature Review Genetics
DTU Aqua14. juni 2019 1919
Reference-based: Most common tools
• Unspliced read aligner
BWA
Bowtie2
Novoalign
• Spliced read aligner
Tophat2/Hisat2
STAR
• Splice-junction not
considered
• Ideal for mapping against
cDNA databases
• Novel splice-junction
detected
• Better performance for
polymorphic regions and
pseudogenes
DTU Aqua14. juni 2019 20
• (Cufflinks/StringTie)
1) First you map all the reads from your experiment
to the reference sequence.
2) Then you run another step where you use the
mapped reads to assemble potential transcripts
and identify the genomic locations of introns and
exons.
REFERENCE-GUIDED
ASSEMBLY
DTU Aqua14. juni 2019 21
Splice junctions view through IGV (Integrative Genomics Viewer)
DTU Aqua14. juni 2019 22
• Velvet
Genomics and transcriptomics
• Trinity
Transcriptomics
De novo assembly: Most common tools
DTU Aqua14. juni 2019
KALLISTO-PSEUDOALIGMENT
• Most RNA seq tools do RNA seq analysis in two
parts-
• Alignment
• Quantification
• Kallisto fuses the two steps
N. Bray et al., Nature Biotechnology (2016)
DTU Aqua14. juni 2019
...
... ...
...
...
... ...
...
...
... ...
...
∩∩ =
a
b
c
d
e
• Create every k-mer in the transcriptome, build de Bruin
Graph and mark each k-mer
• Preprocess the transcriptome to create the T-DBG
• Indexing is faster
Target de Bruijn Graph (T-DBG)
DTU Aqua14. juni 2019
Target de Bruijn Graph (T-DBG)
...
... ...
...
...
... ...
...
...
... ...
...
∩∩ =
a
b
c
d
e
...
... ...
...
...
... ...
...
...
... ...
...
∩∩ =
a
b
c
d
e
...
... ...
...
...
... ...
...
...
... ...
...
∩∩ =
a
b
c
d
e
• Use k-mers in read to find which transcript it came
from
• pseudoalignment : which transcripts the read (pair) is
compatible
DTU Aqua14. juni 2019
Target de Bruijn Graph (T-DBG)
...
... ...
...
...
... ...
...
...
... ...
...
∩∩ =
a
b
c
d
e
• Each k-mer appears in a set of transcripts
• The intersection of all sets is our pseudoalignment
http://arxiv.org/pdf/1505.02710v2.pdf
DTU Aqua14. juni 2019 27
NORMALIZATION
• Longer genes will have more reads mapping to
them (within samples)
• Sequencing run with more depth will have more
reads mapping on each gene (between
samples)
DTU Aqua14. juni 2019 28
MAIN FACTORS DURING
NORMALIZATION
Sequencing depth
DTU Aqua14. juni 2019 29
MAIN FACTORS DURING
NORMALIZATION
Gene length
DTU Aqua14. juni 2019 30
MAIN FACTORS DURING
NORMALIZATION
RNA composition Anders and Huber , 2010 Genome Biol.
DTU Aqua14. juni 2019 31
NORMALIZATION
Normalization method Description Accounted factorsRecommendations for
use
TPM (transcripts per
kilobase million)
counts per length of
transcript (kb) per million
reads mapped
sequencing depth and
gene length
gene count comparisons
within a sample or
between samples of the
same sample group; NOT
for DE analysis
RPKM/FPKM(reads/frag
ments per kilobase of
exon per million
reads/fragments
mapped)
similar to TPMsequencing depth and
gene length
gene count comparisons
between genes within a
sample; NOT for
between sample
comparisons or DE
analysis
DESeq2’s median of
ratios
counts divided by
sample-specific size
factors determined by
median ratio of gene
counts relative to
geometric mean per gene
sequencing depth and
RNA composition
gene count comparisons
between samples and
for DE analysis; NOT for
within sample
comparisons
Common normalization methods
DTU Aqua14. juni 2019 32
READ COUNT
Count the
number of reads
aligned to each
known
transcripts/isofor
m
E.g HTSeq-count
DTU Aqua14. juni 2019 33
DIFFERENTIAL EXPRESSION
DTU Aqua14. juni 2019 34
FUNCTIONAL ENRICHMENT
ANALYSIS
Identification of classes of genes that are over-
represented among the differentially expressed genes,
and may have an association with the
disease/phenotype investigated
Gene Ontology project provides an ontology of defined terms representing
gene product properties. The ontology covers three domains:
•Molecular function: molecular activities of gene products
•Cellular component: where gene products are active
•Biological process: pathways and larger processes made up of the
activities of multiple gene products.
DTU Aqua14. juni 2019 35
Some GO and pathway analyses
websites
http://amp.pharm.mssm.edu/Enrichr/
http://cbl-gorilla.cs.technion.ac.il/
https://david.ncifcrf.gov/
https://cytoscape.org/