
RNA-Seq

Francesco Favero

27626: Next Generation Sequencing Analysis, CBS - DTU

What is RNA-seq?

• RNA-seq is the high-throughput sequencing of cDNA

• It is used to measure RNA expression

• It is the NGS equivalent of microarray gene-expression profiling

RNA-seq applications

• Discovery

• new transcripts

• transcript boundaries

• splice junctions

• Comparison (between different samples)

• evaluate gene expression

• evaluate differences in splice patterns and isoform abundance


RNA families

• Coding
  • PolyA mRNA
  • Non-PolyA mRNA
• Non-coding
  • Structural
    • DNA associated: replisome, DNA repair, telomeric, DNA methylation (piRNA)
    • RNA associated: ribosome associated (rRNA)
  • Regulatory: micro RNA, TSS associated, anti-sense, enhancer RNA

RNA-seq and poly-A

• RNA-seq preparation protocols usually include poly-A selection.

• an attempt to remove rRNA

• mRNA is not the only RNA that is poly-adenylated

• Always check which library preparation protocol was used (other approaches are possible, e.g. an rRNA depletion kit)

RNA-seq vs microarray

• Microarray

• Pro: cost, well-established methods, small data

• Cons: hybridization bias, sequences must be known in advance

• RNA-seq

• Pro: highly reproducible across technical replicates (biological replicates are still needed to compare conditions), measures the real transcriptome

• Pro: information rich, not limited to expression

• Cons: complexity (many steps are needed before getting actual results)

• Cons: data size and computational requirements

Figure: differentially expressed genes called by microarray and RNA-seq (Marioni J C et al. Genome Res. 2008;18:1509-1517)

Alignment methods

• Two different approaches are possible:

• Align against the transcriptome

• faster, easier

• Align against the whole genome

• gives the complete information

Alignment tools

• Common NGS alignment programs:

• BWA

• Bowtie (Bowtie2)

• Novoalign

• Splice-aware tools (take splice junctions into account):

• TopHat (used together with Cufflinks)
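Why splice junctions matter can be illustrated with a toy example. The sketch below is not how TopHat works internally; the sequences, the `find_spliced` helper and its `min_anchor` parameter are all made up for illustration. It only shows that a read spanning an exon-exon junction matches the transcript contiguously but has to be split in two to match the genome.

```python
# Toy gene: two exons separated by an intron (all sequences are invented).
exon1 = "ATGGCCATCT"
intron = "GTAAGTTTTTCTCTAG"
exon2 = "CCGGATACTA"

genome = exon1 + intron + exon2   # what a genome aligner sees
transcript = exon1 + exon2        # mature mRNA after splicing

# A read spanning the junction: last 6 bases of exon1 + first 6 bases of exon2.
read = exon1[-6:] + exon2[:6]

print(read in transcript)  # the read matches the transcript contiguously
print(read in genome)      # but not the genome, because the intron is in between


def find_spliced(read, genome, min_anchor=3):
    """Hypothetical helper: try every split point of the read and return
    (left_pos, right_pos, split) for the first split where both halves match
    the genome exactly and the right half lies downstream of the left half."""
    for split in range(min_anchor, len(read) - min_anchor + 1):
        left, right = read[:split], read[split:]
        left_pos = genome.find(left)
        if left_pos == -1:
            continue
        right_pos = genome.find(right, left_pos + split)
        if right_pos > left_pos + split:
            return left_pos, right_pos, split
    return None


hit = find_spliced(read, genome)
if hit is not None:
    left_pos, right_pos, split = hit
    # The gap between the two halves is the implied intron.
    print("implied intron length:", right_pos - (left_pos + split))
```

A non-spliced aligner would either discard such a read or align only part of it, which is why junction-spanning reads need a splice-aware tool or an alignment against the transcriptome.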

Transcriptome assembly

Alternative splicing

Alternative splicing is a normal biological phenomenon. One gene can encode different proteins by changing the combination of exons included in the mature transcript.

De novo Assembly

• Transcriptomic content is more variable than genomic DNA content

• Isoforms, alternative splicing

• gene fusions

• Mapping reads to a reference genome copes poorly with such structural alterations.

• De novo transcriptome assembly

De novo Assembly

• Underlying assumptions related to RNA expression

• sequence coverage is similar across reads of the same transcript

• strand specific (sense and antisense transcripts)

• Assemblers:

• Velvet (genomic and transcriptomic)

• Trinity (transcriptomic)

• Cufflinks (transcriptomic; assembles transcripts from pre-aligned reads to find alternative splicing based on differential expression)

RNA-seq and “reads”

• Reads, counts, call them as you wish. The number of reads mapped to a region reflects its expression level.

• Different ways to count the reads

• each read = 1 count

• FPKM (fragments per kilobase of exon per million mapped fragments)

• The aim of FPKM is to deal with the fact that many reads map to several transcripts. Each read influences the FPKM values of all these transcripts, but does not increase each count by one.

• FPKM is calculated by software like Cufflinks (http://cufflinks.cbcb.umd.edu/)

• It is not possible to convert FPKM back to read counts

• transcript length × FPKM ≠ reads, because each read can match several transcripts

• Useful to compare the abundance of different transcripts within the same sample

• Might be able to detect alternative splicing
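The unit itself can be sketched with the plain FPKM formula. This is a simplification: it assumes every fragment is assigned to exactly one transcript, whereas Cufflinks estimates abundances with a statistical model that shares multi-mapping fragments between transcripts. The gene names, counts, lengths and library size below are invented.

```python
def fpkm(fragments, length_bp, total_mapped_fragments):
    """Fragments per kilobase of exon per million mapped fragments.

    fragments / (length_bp / 1e3) / (total_mapped_fragments / 1e6)
    simplifies to fragments * 1e9 / (length_bp * total_mapped_fragments).
    """
    return fragments * 1e9 / (length_bp * total_mapped_fragments)


# Hypothetical data: fragment counts and exon lengths for three transcripts.
counts = {"geneA": 500, "geneB": 500, "geneC": 2000}
lengths = {"geneA": 1000, "geneB": 4000, "geneC": 2000}
total_mapped = 10_000_000  # hypothetical number of mapped fragments in the sample

values = {gene: fpkm(counts[gene], lengths[gene], total_mapped) for gene in counts}
for gene, value in values.items():
    print(gene, value)
```

In this toy data geneA and geneB have the same number of fragments, but geneB is four times longer, so its FPKM is four times lower. This length correction is what makes transcripts comparable within one sample.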

reads-count

• When counting reads we need to be sure that the alignment is unambiguous.

• Software like HTSeq counts reads, discarding ambiguous or non-unique matches.
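The counting rule can be sketched in a few lines. This is not HTSeq's code or API, only a toy version of the idea, assuming inclusive coordinates on a single chromosome and ignoring strand. The gene intervals, the reads and the category labels are all hypothetical.

```python
from collections import Counter

# Hypothetical gene intervals (start, end), inclusive; geneA and geneB overlap.
genes = {"geneA": (100, 500), "geneB": (450, 900), "geneC": (1500, 2000)}

# Hypothetical reads: (start, end, number of reported alignments).
reads = [
    (120, 170, 1),    # unique, overlaps geneA only
    (460, 490, 1),    # unique, but overlaps both geneA and geneB
    (1600, 1650, 1),  # unique, overlaps geneC only
    (1700, 1750, 3),  # aligned to 3 places
    (3000, 3050, 1),  # unique, overlaps no gene
    (600, 650, 1),    # unique, overlaps geneB only
]


def overlapping_genes(start, end, genes):
    """Names of all genes whose interval overlaps [start, end]."""
    return [name for name, (g_start, g_end) in genes.items()
            if start <= g_end and end >= g_start]


def count_reads(reads, genes):
    """Count a read only if it aligns uniquely and overlaps exactly one gene."""
    counts = Counter({name: 0 for name in genes})
    skipped = Counter()
    for start, end, n_alignments in reads:
        if n_alignments > 1:
            skipped["not_unique"] += 1
            continue
        hits = overlapping_genes(start, end, genes)
        if not hits:
            skipped["no_feature"] += 1
        elif len(hits) > 1:
            skipped["ambiguous"] += 1
        else:
            counts[hits[0]] += 1
    return counts, skipped


counts, skipped = count_reads(reads, genes)
print(dict(counts))
print(dict(skipped))
```

Only three of the six toy reads end up in a gene count. Keeping track of the skipped categories is useful, because a large fraction of discarded reads usually points to an annotation or alignment problem.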

Differential expression

• Reads obtained from different samples

• Compare reads for the same transcript/gene in the various samples

• Challenges:

• Annotation

• Statistics

Challenges

• Annotation

• Alignment to the transcriptome (transcript_id).

• Alignment to the reference genome

• use a GTF file to map the desired feature types onto the genome (HTSeq does this)

• Statistics

• R/Bioconductor (edgeR, DESeq... more)
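For the annotation step, a GTF file has nine tab-separated columns (seqname, source, feature, start, end, score, strand, frame, attributes). The sketch below shows how features of one type could be collected per gene. It is a minimal reader for illustration, not HTSeq's parser; the embedded GTF lines and the `parse_gtf` function with its parameter names are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical GTF content (tab-separated, 9 columns per line).
GTF_TEXT = """\
chr1\texample\texon\t100\t200\t.\t+\t.\tgene_id "geneA"; transcript_id "txA1";
chr1\texample\texon\t300\t400\t.\t+\t.\tgene_id "geneA"; transcript_id "txA1";
chr1\texample\tCDS\t120\t200\t.\t+\t0\tgene_id "geneA"; transcript_id "txA1";
chr2\texample\texon\t50\t150\t.\t-\t.\tgene_id "geneB"; transcript_id "txB1";
"""

# Matches attribute pairs of the form: key "value";
ATTR_RE = re.compile(r'(\S+) "([^"]*)";')


def parse_gtf(text, feature_type="exon", id_attribute="gene_id"):
    """Group (seqname, start, end, strand) of the chosen feature type by ID."""
    features = defaultdict(list)
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        (seqname, source, feature, start, end,
         score, strand, frame, attributes) = line.split("\t")
        if feature != feature_type:
            continue
        attrs = dict(ATTR_RE.findall(attributes))
        features[attrs[id_attribute]].append((seqname, int(start), int(end), strand))
    return dict(features)


exons = parse_gtf(GTF_TEXT)
for gene_id, intervals in exons.items():
    print(gene_id, intervals)
```

Choosing the feature type (here exons) and the attribute used as identifier (here `gene_id`) decides what a "count" refers to, so the same alignment can give gene-level or transcript-level tables.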

Statistics

• A series of observations can be associated with a distribution function.

• For count data, the most appropriate function is generally a binomial or a Poisson distribution.

• The advantage of using a distribution is that, given a few parameters (e.g. size and mean), we can describe the whole data

Statistics

• When RNA-seq data are fitted with a pure Poisson distribution, the observed variance is higher than expected (overdispersion)

• A negative binomial distribution is similar to a Poisson distribution, but allows a higher variance.

• The negative binomial is implemented in several R packages; it is a better fit for count models such as RNA-seq
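Overdispersion can be checked directly on replicate counts. The sketch below assumes the common negative binomial parameterization variance = mean + alpha × mean², and estimates alpha with a simple method of moments. This is an illustration, not the estimator used by edgeR or DESeq, and the replicate counts are invented.

```python
from statistics import mean, variance

# Hypothetical counts for two genes across five replicates.
replicates = {
    "geneA": [10, 14, 6, 12, 8],        # variance close to the mean
    "geneB": [100, 180, 60, 140, 20],   # variance much larger than the mean
}


def dispersion_summary(counts):
    """Return (mean, sample variance, method-of-moments alpha).

    Under a Poisson model variance == mean, so alpha == 0.
    Under the assumed negative binomial model variance = mean + alpha * mean**2.
    """
    m = mean(counts)
    v = variance(counts)
    alpha = max(0.0, (v - m) / (m * m))
    return m, v, alpha


for gene, counts in replicates.items():
    m, v, alpha = dispersion_summary(counts)
    print(f"{gene}: mean={m}, variance={v}, alpha={alpha:.2f}")
```

For geneA the variance equals the mean, so a Poisson model is adequate. For geneB the variance is 40 times the mean, which a Poisson model cannot describe and the extra parameter of the negative binomial can.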

Figure: Poisson and negative binomial parameters (source: http://www.ats.ucla.edu/stat/stata/seminars/count_presentation/count.htm)

• The negative binomial has an additional parameter, the overdispersion. When overdispersion = 0, the negative binomial is the same as the Poisson.

Statistics

• Normalization:

• Different samples have different numbers of total reads (library size)

• Normalization for library size (each package implements a different method)
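The simplest form of library-size normalization is scaling to counts per million (CPM). The sketch below uses total-count scaling only; it is an assumption chosen for illustration, and packages such as edgeR and DESeq apply more refined normalization factors. The sample names and counts are invented.

```python
# Hypothetical raw counts; the treated library is twice as deep as the control.
raw_counts = {
    "control": {"geneA": 100, "geneB": 300, "geneC": 600},
    "treated": {"geneA": 400, "geneB": 600, "geneC": 1000},
}


def counts_per_million(sample_counts):
    """Scale each count by the sample's library size (total counts)."""
    library_size = sum(sample_counts.values())
    return {gene: count * 1_000_000 / library_size
            for gene, count in sample_counts.items()}


cpm = {sample: counts_per_million(counts) for sample, counts in raw_counts.items()}
for sample, values in cpm.items():
    print(sample, values)
```

In this toy data the raw count of geneB doubles between the samples, but its CPM is identical: the apparent difference comes only from sequencing depth.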

edgeR

• From the authors of limma (linear models for microarray data)

• Negative binomial distribution

• Normalizes for library size (normalization factor)

Columns of the result table (figure annotations):

• Ensembl gene ID

• Log fold change: log(r_gene_i,DHT) - log(r_gene_i,Control)

• Log counts per million (convertible to RPKM/FPKM)

• Statistical scores
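The two numeric quantities in this table can be sketched as follows. The base of the logarithm is not stated on the slide; base 2 is assumed here. The function names, the `pseudocount` parameter and the example values are hypothetical, and real packages compute moderated estimates rather than this direct difference.

```python
import math


def log_fold_change(rate_treated, rate_control, pseudocount=0.0):
    """log2(treated) - log2(control); a small pseudocount avoids log(0)."""
    return math.log2(rate_treated + pseudocount) - math.log2(rate_control + pseudocount)


def cpm_to_rpkm(cpm, length_bp):
    """Convert counts per million to reads per kilobase per million."""
    return cpm * 1000 / length_bp


# Hypothetical normalized rates for one gene in the DHT and control samples.
lfc = log_fold_change(200.0, 50.0)
print("log2 fold change:", round(lfc, 3))

# Hypothetical gene with 50 CPM and 2000 bp of exon sequence.
print("RPKM:", cpm_to_rpkm(50.0, 2000))
```

A positive log fold change means higher expression in the DHT sample than in the control, and the difference of logs is the same as the log of the ratio.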

RNA-seq

Thanks!
