
INTRODUCTION TO NEXT GENERATION SEQUENCING

Claude Thermes

Genome Analysis, Centre de Génétique Moléculaire

Gif-sur-Yvette 18/11/2013

BIOINFORMATICS SCHOOL

INTRODUCTION TO THE PROCESSING OF GENOMIC DATA OBTAINED BY HIGH-THROUGHPUT SEQUENCING

14-18 JANUARY 2013 - STATION BIOLOGIQUE - ROSCOFF

Step 1: sample preparation

Step 2: sequencing (Illumina)

Step 3: data analysis

(with permission of ABIMS)

Step 1: sample preparation

[Figure: situation in 2009]

[Figure: situation in 2013. Input amounts: 1 µg total RNA; 0.1 µg with Ribozero purification; 1 ng with the Totalscript protocol; 1-2 ng; 50 ng with the Nextera protocol. Protocols: paired end, Moleculo/LRSeq, mate pair, RAD-seq, CLIP-seq, NET-seq, ...]

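DNA-Seq library

Genomic DNA → cleavage (sonication) → fragmented DNA → adaptor ligation → PCR → PCR product

Paired end sequencing: the 1st read is taken from one end of the fragment, the 2nd read from the other end.

[Figure: single read density (with unresolved regions marked "?") versus paired end density, for genome or transcript assembly]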

Comparison of single read versus paired end sequencing

Paired end sequencing:

•   improves genome assembly

•   gives better identification of RNA 5’ and 3’ ends

•   but requires good control of DNA fragmentation (purification on gels/columns)

•   is time consuming and requires large quantities (1-5 µg)
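The gain for assembly can be illustrated with a single read pair: a read that falls in a repeat maps to several loci, but if its mate maps uniquely, only the placements compatible with the fragment length remain. This is a minimal sketch, assuming inward-facing pairs, a known insert-size window and exact 0-based positions; `resolve_with_mate` is a hypothetical helper, not a tool from these slides.

```python
def resolve_with_mate(read1_candidates, read2_pos, read_len, min_insert, max_insert):
    """Keep the read-1 placements compatible with a uniquely mapped mate.

    Assumes read 1 on the forward strand and read 2 on the reverse strand,
    downstream of read 1 (inward-facing pair). Positions are 0-based starts.
    """
    consistent = []
    for pos in read1_candidates:
        # The fragment runs from the start of read 1 to the end of read 2.
        insert = read2_pos + read_len - pos
        if min_insert <= insert <= max_insert:
            consistent.append(pos)
    return consistent


# Read 1 (100 bp) lies in a repeat present at three loci; its mate maps once.
candidates = [1_000, 50_000, 120_000]
print(resolve_with_mate(candidates, 50_300, 100, 200, 500))  # [50000]
```

With single reads, the three loci would remain indistinguishable; the mate reduces them to one.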

Nextera “tagmentation”: a new method for constructing paired end libraries

Tagmentation (transposomes / Tagment Enzyme): the Tagment Enzyme fragments DNA and attaches junction adapters (blue and green in the figure) to both ends of the tagmented molecule.

•   rapid (about 2 hours) and requires small quantities (50 ng)

•   dual barcode approach: up to 96 indexed samples
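One way a dual barcode scheme reaches 96 indexed samples is combinatorial: pairing each of 8 barcodes on one adapter with each of 12 on the other gives 8 × 12 = 96 unique pairs from only 20 barcode sequences. The sketch below assumes that 8 × 12 split and uses hypothetical barcode names; real kits define their own index sets.

```python
from itertools import product

# Hypothetical names for the two barcode sets (one set per adapter end).
barcodes_a = [f"A{i:02d}" for i in range(1, 9)]    # 8 barcodes
barcodes_b = [f"B{j:02d}" for j in range(1, 13)]   # 12 barcodes

# Each sample receives a unique (barcode_a, barcode_b) combination.
plate = {
    f"sample{k + 1}": pair
    for k, pair in enumerate(product(barcodes_a, barcodes_b))
}

print(len(plate))          # 96
print(plate["sample1"])    # ('A01', 'B01')
print(plate["sample96"])   # ('A08', 'B12')
```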

A recent improvement of mate pair libraries:

Illumina “Moleculo/LRSeq” technology

Genomic DNA is:

- sheared into 6–8 kb fragments

- partitioned into several 96-well plates

- further fragmented to 600–800 bp

- barcoded and sequenced separately

•   limiting the number of DNA molecules per well makes it possible to study INDIVIDUAL FRAGMENTED MOLECULES

•   this almost eliminates the chance of having a repeated or duplicated sequence within a given partition

•   since each well is over-sequenced, the high coverage reduces the error rate

Voskoboynik et al. eLife Sciences 2013

•  assembly of complex, repeat-rich genomes

•  identification of alternative transcripts
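Why over-sequencing lowers the error rate can be sketched with a majority vote: if each read covering a base carries an error with probability e, the consensus of n reads is wrong only when more than half of them are wrong. This assumes independent errors, odd coverage and, pessimistically, that all erroneous reads agree on the same wrong base; the function names are hypothetical and the slides do not say which consensus method Moleculo uses.

```python
from collections import Counter
from math import comb


def consensus(aligned_reads):
    """Majority-vote consensus of equal-length, already aligned reads."""
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*aligned_reads)
    )


def majority_error(e, n):
    """P(consensus wrong): more than n/2 of n reads (n odd) carry an error.

    Worst case assumed: all erroneous reads agree on the same wrong base.
    """
    return sum(
        comb(n, k) * e**k * (1 - e) ** (n - k) for k in range(n // 2 + 1, n + 1)
    )


print(consensus(["ACGT", "ACGA", "TCGT"]))  # ACGT
print(majority_error(0.01, 1))              # 0.01 (a single read)
print(majority_error(0.01, 5))              # about 1e-5
```

Going from 1 to 5 reads per base lowers the worst-case error by roughly three orders of magnitude under these assumptions.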

Paired end fragments are too short, in particular for assembling large genomes with many repeated elements: hence mate pair libraries, whose fragments span several kilobases.

“Classical” Illumina mate pair library

Problems: low coverage (few fragments), over-amplified

Nextera Mate Pair: a new method for constructing mate pair fragments

- the Tagment Enzyme fragments DNA and attaches a biotinylated junction adapter (green) to both ends of the tagmented molecule

- circularization

- fragmentation, enrichment via the biotin tag

- adapter ligation at both ends

•   rapid (a few hours) and requires small quantities (50 ng)
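After circularization, the junction adapter can end up inside a read, and the bases beyond it come from the other end of the original fragment, so reads are usually cut at the junction before mapping. A minimal exact-match trimming sketch: the junction sequence is a parameter, and the example value `CTGTCTCTTATACACATCT` (the sequence commonly trimmed from Nextera reads) is an assumption to check against the kit documentation. `trim_at_junction` is hypothetical, and real trimmers also tolerate mismatches.

```python
def trim_at_junction(read, junction, min_overlap=5):
    """Cut a read at the junction adapter, if present (exact matching only).

    First looks for the full junction inside the read, then for a partial
    junction (at least `min_overlap` bases) at the 3' end of the read.
    """
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    idx = read.find(junction)
    if idx != -1:
        return read[:idx]
    # Partial junction: longest suffix of the read equal to a junction prefix.
    for k in range(min(len(junction) - 1, len(read)), min_overlap - 1, -1):
        if read.endswith(junction[:k]):
            return read[:-k]
    return read


JUNCTION = "CTGTCTCTTATACACATCT"  # assumed example; use your kit's sequence
print(trim_at_junction("ACGTACGTAA" + JUNCTION + "GGGG", JUNCTION))  # ACGTACGTAA
print(trim_at_junction("ACGTACGTAA" + JUNCTION[:8], JUNCTION))       # ACGTACGTAA
```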

RAD-seq: Restriction site Associated DNA sequencing

Genome sub-sampling that makes it possible to simultaneously discover and score large numbers of SNP markers in many (hundreds of) individuals for minimal investment

Widely applied to genetic mapping in a variety of organisms

Baird et al. (2008) PLoS ONE
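The sub-sampling can be pictured as an in-silico digest: only the sequence flanking each restriction site is sequenced, one tag on each side of the site. The sketch below assumes the SbfI recognition site `CCTGCAGG` as an example (swap in your enzyme) and a hypothetical tag length; it locates sites and extracts flanks, ignoring the exact cut position inside the site.

```python
def rad_tags(genome, site="CCTGCAGG", tag_len=10):
    """Return (site_position, upstream_tag, downstream_tag) for each site.

    CCTGCAGG is its own reverse complement, so scanning one strand finds all
    sites. The search restarts one base after each hit, so overlapping
    occurrences are also reported.
    """
    tags = []
    start = genome.find(site)
    while start != -1:
        upstream = genome[max(0, start - tag_len):start]
        downstream = genome[start + len(site):start + len(site) + tag_len]
        tags.append((start, upstream, downstream))
        start = genome.find(site, start + 1)
    return tags


# Toy sequence with two sites, at positions 10 and 33.
seq = "AAAAATTTTT" + "CCTGCAGG" + "GGGGGCCCCC" + "AAAAA" + "CCTGCAGG" + "TTTTT"
for pos, up, down in rad_tags(seq, tag_len=5):
    print(pos, up, down)
```

Only these short flanks are sequenced, which is why many individuals can be genotyped at the same loci for a small sequencing effort.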

[Figure: RAD adapter design, adapter sequence AGAACAA / TCTTGTT, panels labeled "Amplification primer" and "No amplification primer". This design prevents amplification of genomic fragments lacking a P1 adapter.]

CLIP-seq: cross-linking immunoprecipitation sequencing

Sequencing of the RNAs that interact with a particular RNA-binding protein:

•  UV cross-linking between the RNA and the protein

•  immunoprecipitation with antibodies against the protein

•  fragmentation

•  sequencing

Sanford et al. Genome Research (2009)
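Once the reads are mapped, candidate binding sites appear as clusters of overlapping reads. A minimal clustering sketch, assuming reads given as half-open (start, end) intervals on one strand and a hypothetical minimum number of reads per cluster; dedicated CLIP analysis tools use more elaborate statistics.

```python
def read_clusters(intervals, min_reads=3):
    """Merge overlapping reads into clusters; keep clusters with >= min_reads."""
    clusters = []  # each entry: [start, end, read_count]
    for start, end in sorted(intervals):
        if clusters and start < clusters[-1][1]:  # overlaps the current cluster
            clusters[-1][1] = max(clusters[-1][1], end)
            clusters[-1][2] += 1
        else:
            clusters.append([start, end, 1])
    return [(s, e, n) for s, e, n in clusters if n >= min_reads]


reads = [(100, 130), (110, 140), (120, 150), (400, 430), (1000, 1030), (1010, 1040)]
print(read_clusters(reads))  # [(100, 150, 3)]
```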

NET-seq: Native Elongating Transcript sequencing

•  sequencing of the 3’ ends of nascent RNAs still associated with elongating polymerase complexes

•  detects the distribution of transcribing polymerases along the genome in a strand-specific manner

Churchman and Weissman, 2011

[Figure: Pol II complexes along the DNA]

Cells in desired condition → RNA polymerase II immunoprecipitation → recovery of nascent transcripts associated with the polymerase → RNA-seq and mapping on the genome
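Since each mapped NET-seq read marks one transcribing polymerase, a polymerase density profile is a count of read positions kept separately for each strand. A sketch assuming the relevant nascent-RNA end coordinate and the strand of each read have already been extracted from the alignments; the input format and `polymerase_density` are hypothetical.

```python
from collections import Counter


def polymerase_density(read_ends):
    """Count reads per position, separately for each strand.

    read_ends: iterable of (position, strand) with strand "+" or "-".
    Keeping the strands apart is what makes the profile strand-specific.
    """
    plus = Counter(pos for pos, strand in read_ends if strand == "+")
    minus = Counter(pos for pos, strand in read_ends if strand == "-")
    return plus, minus


plus, minus = polymerase_density([(500, "+"), (500, "+"), (502, "+"), (500, "-")])
print(plus[500], plus[502], minus[500])  # 2 1 1
```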

Some problems encountered when preparing libraries

DNA-Seq library: read coverage correlates with GC content

[Figure: GC content (%) and read coverage versus position (bp)]

Are these fluctuations reproducible between replicates?
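The GC effect can be quantified by computing GC content and mean coverage in consecutive windows and correlating the two. A minimal sketch with a hand-written Pearson coefficient; the window size and function names are hypothetical, and the code assumes neither series is constant (otherwise the coefficient is undefined).

```python
def gc_percent(seq):
    """Percentage of G or C bases in a sequence."""
    return 100.0 * sum(base in "GCgc" for base in seq) / len(seq)


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def gc_coverage_correlation(genome, coverage, window=100):
    """Correlate windowed GC% with mean per-base coverage in the same windows."""
    gc, cov = [], []
    for start in range(0, len(genome) - window + 1, window):
        gc.append(gc_percent(genome[start:start + window]))
        cov.append(sum(coverage[start:start + window]) / window)
    return pearson(gc, cov)


# Toy data: coverage rises with GC content, window by window.
genome = "AAAA" + "GAAA" + "GGAA" + "GGGA"
coverage = [10] * 4 + [20] * 4 + [30] * 4 + [40] * 4
print(gc_coverage_correlation(genome, coverage, window=4))
```

The same `pearson` function, applied to the coverage vectors of two replicates, gives one measure of whether the fluctuations are reproducible.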

Multiplexed replicates, to avoid differences due to sequencing

Multiplexing:

different DNA samples → cleavage → fragmented DNA samples → ligation of tagged adaptors → PCR amplification (12-18 cycles) → calibration → sequencing of the mixed libraries in the same lane

Multiplexing before PCR:

the same steps, except that the fragmented samples, once ligated to their tagged adaptors, are mixed before PCR amplification (12-18 cycles), then calibrated and sequenced in the same lane
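After the mixed libraries are sequenced in one lane, each read is assigned back to its sample through the tag carried by its adaptor. A sketch assuming the tag occupies the first bases of the read and allowing one mismatch; tags and sample names are hypothetical, and reads matching no tag, or more than one, are set aside as undetermined.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def demultiplex(reads, barcodes, max_mismatch=1):
    """Assign each read to a sample by its leading tag, then strip the tag.

    barcodes: dict mapping tag sequence -> sample name.
    """
    bins = {sample: [] for sample in barcodes.values()}
    bins["undetermined"] = []
    for read in reads:
        matches = [
            (bc, sample)
            for bc, sample in barcodes.items()
            if len(read) >= len(bc) and hamming(read[:len(bc)], bc) <= max_mismatch
        ]
        if len(matches) == 1:
            bc, sample = matches[0]
            bins[sample].append(read[len(bc):])
        else:  # no match, or ambiguous between two samples
            bins["undetermined"].append(read)
    return bins


bins = demultiplex(
    ["ACGTTTTT", "ACGATTTT", "TGCAGGGG", "CCCCAAAA"],
    {"ACGT": "sample1", "TGCA": "sample2"},
)
print(bins)
```

Allowing a mismatch is only safe when the tags differ from each other at enough positions, which is why tag sets are designed with a minimum distance between members.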

[Figure: multiplex ligation before PCR; normalized read coverage versus position (bp) for sample1 to sample8]
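Comparing coverage profiles between samples requires removing differences in sequencing depth first. The slides do not state which normalization was used; the sketch below assumes a simple choice, dividing each profile by its mean so that every sample has mean coverage 1. The sample data are hypothetical.

```python
def normalize(coverage):
    """Divide a coverage profile by its mean, so that the mean becomes 1."""
    mean = sum(coverage) / len(coverage)
    return [c / mean for c in coverage]


# Two samples with the same profile shape but a tenfold difference in depth.
samples = {"sample1": [10, 20, 30], "sample2": [100, 200, 300]}
norm = {name: normalize(cov) for name, cov in samples.items()}
print(norm["sample1"])  # [0.5, 1.0, 1.5]
print(norm["sample2"])  # [0.5, 1.0, 1.5]
```

After this scaling, profiles that differ only in depth coincide, so remaining differences between samples reflect library preparation rather than the number of reads.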

Libraries prepared from very small amounts of DNA or RNA (<< 1 ng)

•   ChIP-seq with very small amounts of immunoprecipitated material

•   RNA from small amounts of tissue (laser dissection)

Typical problem: accumulation of dimers of the two adaptors

•   adaptor dimers are amplified more rapidly than other fragments and “invade” the libraries

•   they constitute the majority of sequenced reads

•   rare fragments then tend to be amplified non-homogeneously
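A simple exponential model shows why a small efficiency advantage lets adaptor dimers take over: after c cycles, a template amplified with efficiency E is multiplied by (1 + E)^c, so the dimer-to-insert ratio grows as ((1 + E_dimer) / (1 + E_insert))^c. The starting amounts and efficiencies below are hypothetical, and the model ignores the PCR plateau.

```python
def dimer_fraction(dimers0, inserts0, eff_dimer, eff_insert, cycles):
    """Fraction of adaptor-dimer molecules after `cycles` PCR cycles.

    Each molecule type grows as N0 * (1 + efficiency) ** cycles
    (exponential phase only, no plateau).
    """
    dimers = dimers0 * (1 + eff_dimer) ** cycles
    inserts = inserts0 * (1 + eff_insert) ** cycles
    return dimers / (dimers + inserts)


# Hypothetical: dimers start at 10 % of molecules and amplify slightly better.
for cycles in (0, 12, 18):
    print(cycles, round(dimer_fraction(1, 9, 0.95, 0.80, cycles), 2))
```

The smaller the input, the larger the starting share of dimers, and the fewer cycles they need to dominate the library.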

[Figure: sequencing of very small amounts of genome fragments (<< 1 ng); panels labeled 13 kb and 43 kb, from small input DNA to increasing input DNA]

Comparison of two RNA-seq library protocols:

SOLiD™ Whole Transcriptome Analysis Kit (RNase III fragmentation)

versus

Illumina’s directional mRNA-Seq Library (zinc fragmentation)

SOLiD™ Whole Transcriptome Analysis Kit: RNase III fragmentation

RiboMinus RNA → RNase III → fragmented RNA → hybridization with adapters (degenerate NNNNNN sequence), ligation → reverse transcription → PCR amplification → size selection → sequencing on SOLiD

[Figure: reads over the YBR078W intron, sequencing on SOLiD versus sequencing on Illumina]

Very heterogeneous pattern, due not to the sequencing technology but to library preparation: is RNase III fragmentation not so random?
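One way to put a number on "very heterogeneous" is the coefficient of variation (standard deviation divided by mean) of per-base coverage over a feature such as the intron: a perfectly uniform profile gives 0, and spiky profiles give large values. This metric is our choice for illustration, not necessarily the one used by the authors.

```python
from statistics import mean, pstdev


def coverage_cv(coverage):
    """Coefficient of variation of a per-base coverage profile."""
    return pstdev(coverage) / mean(coverage)


print(coverage_cv([20, 20, 20, 20]))  # 0.0 (perfectly uniform)
print(coverage_cv([0, 0, 80, 0]))     # about 1.73 (all reads piled on one base)
```

Comparing this value for the same region across libraries with the same number of reads separates protocol effects from depth effects.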

Illumina directional mRNA-Seq Library: zinc fragmentation

Total RNA → depletion of ribosomal RNA → ribo- RNA → zinc fragmentation → fragmented RNA → ligation → RT → PCR → ds PCR product

[Figure: YBR078W intron, RNase III versus zinc fragmentation]

[Figure: correlation between nucleotides versus distance between nucleotides, RNase III versus zinc fragmentation, same number of reads]

M. Wery, M. Descrimes, C. Thermes, D. Gautheret & A. Morillon (submitted)
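Reading "correlation between nucleotides" as the correlation between the coverage of two nucleotides separated by a distance d (our interpretation of the plot axes), the curve can be computed as a lag-d autocorrelation of per-base coverage. A minimal sketch; `coverage_autocorrelation` is hypothetical and assumes the coverage is not constant at any lag.

```python
def coverage_autocorrelation(coverage, max_distance):
    """Pearson correlation between coverage at i and at i + d, for d = 1..max_distance."""

    def pearson(x, y):
        n = len(x)
        mx, my = sum(x) / n, sum(y) / n
        cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
        sx = sum((a - mx) ** 2 for a in x) ** 0.5
        sy = sum((b - my) ** 2 for b in y) ** 0.5
        return cov / (sx * sy)

    return {
        d: pearson(coverage[:-d], coverage[d:]) for d in range(1, max_distance + 1)
    }


# Toy profile alternating every base: neighbours anticorrelate, d = 2 correlates.
print(coverage_autocorrelation([0, 10, 0, 10, 0, 10, 0, 10], 2))
```

Libraries with the same number of reads should be compared, as in the figure, because depth alone changes how noisy these correlations are.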

Support: CNRS, ACI IMPBio, ANR
