RNA-seq

Francesca Bertolini, DTU Aqua, [email protected]

14 June 2019

teaching.healthtech.dtu.dk/material/22126/NGS_RNA-seq...

SUMMARY

1. Definitions

2. Sample collections and RNA integrity

3. Library preparation

4. Data analyses

– Read mapping/assembly

– Normalization

– Read count

– Differential expression

– Functional enrichment

TRANSCRIPTOME

Complete set of transcripts in a cell and their quantity, for a specific developmental stage or physiological condition.

RNA classification

• Ribosomal RNA (rRNA): catalytic component of ribosomes (about 80-85%)

• Transfer RNA (tRNA): transfers amino acids to the polypeptide chain at the ribosomal site of protein synthesis (about 15%)

• Coding RNA (mRNA): carries information about a protein sequence to the ribosomes (about 5%)

• Other non-coding regulatory RNAs

Other non-coding regulatory RNAs

[Figure from Delpu et al. 2016, Drug Discovery in Cancer Epigenetics]

Long RNAs: splicing

[Figure with panel labels: DNA, RNA, mRNA, lncRNA]

RNA-seq

High-throughput sequencing technology used for probing the transcriptome of a sample. Typical applications:

• Abundance estimation/differential expression

• Alternative splicing

• RNA editing

• Novel transcripts

• Allele-specific expression

• Fusion transcripts

2. Sample collections and RNA integrity

Before RNA extraction

RNA is less stable than DNA, so extra precautions are needed to avoid degradation.

TISSUE COLLECTION:

• Liquid nitrogen

• RNAlater (for solid tissues)

• Tempus/PAXgene tubes (for blood)

After RNA extraction

RIN (RNA integrity number): algorithm for assigning integrity values to RNA measurements.

RIN ranges from 1 (completely degraded) to 10 (intact).

A RIN > 7 is generally considered acceptable integrity.

RNA quality (RIN) and quantification: Bioanalyzer

3. Library preparation

Different steps for different RNAs

• Total RNA-seq: DNase treatment, ribosomal RNA depletion, fragmentation, library preparation

• mRNA + lncRNA (polyA+) RNA-seq: DNase treatment, polyA enrichment, fragmentation, library preparation

• Short RNA-seq: DNase treatment, size selection, library preparation

LIBRARY PREPARATION

LIBRARY PREPARATION… with 3rd-generation sequencing

4. Data analyses

Transcriptome assembly strategies

Reference-based

De novo

Pseudoalignment

[Figure from Martin et al. 2011, Nature Reviews Genetics]

Reference-based: Most common tools

• Unspliced read aligners: BWA, Bowtie2, Novoalign

  – Splice junctions are not considered

  – Ideal for mapping against cDNA databases

• Spliced read aligners: Tophat2/Hisat2, STAR

  – Novel splice junctions are detected

  – Better performance for polymorphic regions and pseudogenes

REFERENCE-GUIDED ASSEMBLY (Cufflinks/StringTie)

1) First, map all the reads from your experiment to the reference sequence.

2) Then, in a second step, use the mapped reads to assemble potential transcripts and identify the genomic locations of introns and exons.
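
The two reference-guided steps can be sketched as a small Python helper that only builds the command lines and does not run them. This is an illustration under assumptions: HISAT2 and StringTie stand in for the aligner and assembler named on the slides, the file names and the index prefix are hypothetical, and the options used (`--dta`, `-x`, `-1`, `-2`, `-S`, `-G`, `-o`) should be checked against the installed versions of the tools.

```python
import shlex

def reference_guided_commands(index_prefix, reads_1, reads_2, annotation_gtf, sample):
    """Return the command lines for map -> sort -> assemble, without running them."""
    sam = f"{sample}.sam"
    bam = f"{sample}.sorted.bam"
    gtf = f"{sample}.transcripts.gtf"
    return [
        # Step 1: spliced alignment of paired-end reads to the reference genome.
        ["hisat2", "--dta", "-x", index_prefix, "-1", reads_1, "-2", reads_2, "-S", sam],
        # The assembler works on coordinate-sorted alignments.
        ["samtools", "sort", "-o", bam, sam],
        # Step 2: assemble transcripts from the mapped reads, guided by an annotation.
        ["stringtie", bam, "-G", annotation_gtf, "-o", gtf],
    ]

# Hypothetical input names, for illustration only.
commands = reference_guided_commands(
    "genome_index", "sample_R1.fastq.gz", "sample_R2.fastq.gz", "annotation.gtf", "sample"
)
for cmd in commands:
    print(shlex.join(cmd))
```

Building the commands as lists keeps each argument separate, so the same lists could be passed to `subprocess.run` once the paths are real.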

[Figure: splice junctions viewed in IGV (Integrative Genomics Viewer)]

De novo assembly: Most common tools

• Velvet: genomics and transcriptomics

• Trinity: transcriptomics

KALLISTO: PSEUDOALIGNMENT

• Most RNA-seq tools perform the analysis in two steps:

  – Alignment

  – Quantification

• Kallisto fuses the two steps

N. Bray et al., Nature Biotechnology (2016)

Target de Bruijn Graph (T-DBG)

[Figure: T-DBG with nodes labelled a to e]

• Create every k-mer in the transcriptome, build the de Bruijn graph, and mark each k-mer with the transcripts it occurs in

• Preprocess the transcriptome to create the T-DBG

• Indexing is faster

Target de Bruijn Graph (T-DBG)

[Figure: T-DBG with nodes labelled a to e]

• Use the k-mers in a read to find which transcript it came from

• Pseudoalignment: the set of transcripts the read (pair) is compatible with

Target de Bruijn Graph (T-DBG)

[Figure: T-DBG with nodes labelled a to e; set intersection (∩) of the k-mer transcript sets]

• Each k-mer appears in a set of transcripts

• The intersection of all sets is our pseudoalignment

http://arxiv.org/pdf/1505.02710v2.pdf
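
The intersection idea can be illustrated with a toy sketch. It is not kallisto's implementation: a plain dictionary from k-mers to transcript sets is assumed in place of the T-DBG, k-mers missing from the index are simply skipped, and the transcript names and sequences are made up.

```python
from collections import defaultdict

def build_kmer_index(transcripts, k):
    """Map every k-mer in the transcriptome to the set of transcripts containing it."""
    index = defaultdict(set)
    for name, seq in transcripts.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(name)
    return index

def pseudoalign(read, index, k):
    """Intersect the transcript sets of all indexed k-mers found in the read."""
    compatible = None
    for i in range(len(read) - k + 1):
        hits = index.get(read[i:i + k])
        if hits is None:
            # Simplifying assumption: ignore k-mers absent from the index.
            continue
        compatible = set(hits) if compatible is None else compatible & hits
    return compatible or set()

# Made-up transcript sequences, for illustration only.
transcripts = {
    "t1": "ACGTACGGA",
    "t2": "ACGTACTTA",
    "t3": "GGGTACGGA",
}
index = build_kmer_index(transcripts, k=4)

shared = pseudoalign("ACGTAC", index, k=4)    # compatible with t1 and t2
unique = pseudoalign("ACGTACGG", index, k=4)  # compatible with t1 only
print(sorted(shared), sorted(unique))
```

The read is never placed at a base-level position; only the set of compatible transcripts is reported, which is what the quantification step needs.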

NORMALIZATION

• Longer genes will have more reads mapping to them (within samples)

• A sequencing run with more depth will have more reads mapping to each gene (between samples)
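
A minimal sketch of how both biases can be corrected within one sample, using the usual RPKM and TPM formulas. The gene names, counts, and lengths are made up for illustration; the three toy genes have counts exactly proportional to their lengths, so they come out equally expressed after normalization.

```python
def rpkm(counts, lengths_bp):
    """Reads per kilobase of transcript per million mapped reads."""
    total = sum(counts.values())
    return {g: counts[g] * 1e9 / (lengths_bp[g] * total) for g in counts}

def tpm(counts, lengths_bp):
    """Transcripts per million: length-normalize first, then scale to one million."""
    rate = {g: counts[g] / (lengths_bp[g] / 1000) for g in counts}
    scale = sum(rate.values()) / 1e6
    return {g: r / scale for g, r in rate.items()}

# Made-up raw counts and gene lengths (bp) for one sample.
counts = {"geneA": 100, "geneB": 200, "geneC": 700}
lengths_bp = {"geneA": 1000, "geneB": 2000, "geneC": 7000}

rpkm_values = rpkm(counts, lengths_bp)
tpm_values = tpm(counts, lengths_bp)
print(rpkm_values)
print(tpm_values)
```

TPM values always sum to one million within a sample, whereas the sum of RPKM values differs from sample to sample.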

MAIN FACTORS DURING NORMALIZATION

• Sequencing depth

• Gene length

• RNA composition (Anders and Huber, 2010, Genome Biol.)

NORMALIZATION

Common normalization methods

TPM (transcripts per million)
  Description: counts per length of transcript (kb) per million reads mapped
  Accounted factors: sequencing depth and gene length
  Recommendations for use: gene count comparisons within a sample or between samples of the same sample group; NOT for DE analysis

RPKM/FPKM (reads/fragments per kilobase of exon per million reads/fragments mapped)
  Description: similar to TPM
  Accounted factors: sequencing depth and gene length
  Recommendations for use: gene count comparisons between genes within a sample; NOT for between-sample comparisons or DE analysis

DESeq2’s median of ratios
  Description: counts divided by sample-specific size factors determined by the median ratio of gene counts relative to the geometric mean per gene
  Accounted factors: sequencing depth and RNA composition
  Recommendations for use: gene count comparisons between samples and for DE analysis; NOT for within-sample comparisons
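
The median-of-ratios idea can be sketched in plain Python. This is a simplified illustration, not a call into DESeq2: the sample names, gene names, and counts are made up, and genes with a zero count in any sample are skipped when the per-gene geometric mean is computed.

```python
import math
from statistics import median

def size_factors(counts):
    """counts: {sample: {gene: raw count}}. Returns {sample: size factor}."""
    samples = list(counts)
    genes = list(counts[samples[0]])
    # Log of the geometric mean per gene across samples.
    log_geo_mean = {}
    for g in genes:
        values = [counts[s][g] for s in samples]
        if all(v > 0 for v in values):
            log_geo_mean[g] = sum(math.log(v) for v in values) / len(values)
    factors = {}
    for s in samples:
        # Ratio of each gene's count to its geometric mean, on the log scale.
        log_ratios = [math.log(counts[s][g]) - log_geo_mean[g] for g in log_geo_mean]
        factors[s] = math.exp(median(log_ratios))
    return factors

# Made-up counts: sample B is sequenced twice as deep, and g4 is strongly induced in B.
counts = {
    "A": {"g1": 10, "g2": 20, "g3": 30, "g4": 40},
    "B": {"g1": 20, "g2": 40, "g3": 60, "g4": 8000},
}
factors = size_factors(counts)
normalized = {s: {g: c / factors[s] for g, c in counts[s].items()} for s in counts}
print(factors)
```

In this toy data the total counts differ about 80-fold because of g4 alone, yet the size factors differ only 2-fold, so g1 to g3 end up with equal normalized counts. This is the robustness to RNA composition listed above.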

READ COUNT

Count the number of reads aligned to each known transcript/isoform.

E.g. HTSeq-count
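
The counting step can be sketched as a simple interval-overlap tally. This is a toy illustration and not HTSeq-count itself: the gene coordinates and alignments are made up, coordinates are assumed to be 0-based and half-open, and a read is counted only when it overlaps exactly one gene. The `__no_feature` and `__ambiguous` labels follow the names of the special counters HTSeq-count reports.

```python
from collections import Counter

def count_reads(alignments, features):
    """alignments: (chrom, start, end) tuples; features: {gene: (chrom, start, end)}."""
    counts = Counter({gene: 0 for gene in features})
    for chrom, start, end in alignments:
        # Genes on the same chromosome whose interval overlaps the read.
        hits = [g for g, (c, s, e) in features.items()
                if c == chrom and start < e and s < end]
        if len(hits) == 1:
            counts[hits[0]] += 1
        elif not hits:
            counts["__no_feature"] += 1
        else:
            counts["__ambiguous"] += 1
    return counts

# Made-up annotation and alignments.
features = {"geneA": ("chr1", 100, 500), "geneB": ("chr1", 450, 900)}
alignments = [
    ("chr1", 120, 170),  # geneA
    ("chr1", 460, 510),  # overlaps geneA and geneB
    ("chr1", 600, 650),  # geneB
    ("chr2", 10, 60),    # no annotated gene
    ("chr1", 300, 350),  # geneA
]
read_counts = count_reads(alignments, features)
print(dict(read_counts))
```

Raw counts of this kind, not TPM or RPKM values, are the usual input for differential expression tools.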

DIFFERENTIAL EXPRESSION

FUNCTIONAL ENRICHMENT ANALYSIS

Identification of classes of genes that are over-represented among the differentially expressed genes, and may have an association with the disease/phenotype investigated.

The Gene Ontology project provides an ontology of defined terms representing gene product properties. The ontology covers three domains:

• Molecular function: molecular activities of gene products

• Cellular component: where gene products are active

• Biological process: pathways and larger processes made up of the activities of multiple gene products
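
Over-representation of one gene class among the differentially expressed (DE) genes is often assessed with a hypergeometric test. The sketch below assumes that choice and uses made-up numbers. Individual enrichment tools may use different statistics, and the multiple-testing correction needed when many terms are tested is not shown.

```python
from math import comb

def enrichment_p_value(n_background, n_in_term, n_de, n_de_in_term):
    """P(X >= n_de_in_term) when n_de genes are drawn at random from the background."""
    upper = min(n_in_term, n_de)
    total = comb(n_background, n_de)
    tail = sum(
        comb(n_in_term, k) * comb(n_background - n_in_term, n_de - k)
        for k in range(n_de_in_term, upper + 1)
    )
    return tail / total

# Made-up example: 20 background genes, 5 annotated to the term,
# 4 DE genes, 3 of which carry the term.
p_value = enrichment_p_value(n_background=20, n_in_term=5, n_de=4, n_de_in_term=3)
print(round(p_value, 4))
```

A small p-value means that seeing this many term members among the DE genes would be unlikely if the DE genes were a random draw from the background.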

Some GO and pathway analysis websites

http://amp.pharm.mssm.edu/Enrichr/

http://cbl-gorilla.cs.technion.ac.il/

https://david.ncifcrf.gov/

https://cytoscape.org/