INTRODUCTION TO NEXT GENERATION SEQUENCING
Claude Thermes
Analyse du génome, Centre de Génétique Moléculaire, Gif-sur-Yvette, 18/11/2013
ECOLE DE BIOINFORMATIQUE (bioinformatics school): introduction to the processing of genomics data obtained by high-throughput sequencing
14-18 JANUARY 2013 - STATION BIOLOGIQUE - ROSCOFF

biow.sb-roscoff.fr/.../NGS_intro_Claude_Thermes.pdf


Step 1: sample preparation

Step 2: sequencing (Illumina)

Step 3: data analysis

(with permission of ABIMS)


Step 1: sample preparation

Situation in 2009


Step 1: sample preparation

Input amounts:
- 1 µg total RNA
- 0.1 µg with Ribo-Zero purification; 1 ng with the TotalScript protocol
- 1-2 ng
- 50 ng with the Nextera protocol

Library types: paired end, Moleculo/LRSeq, mate pair, RAD-seq, CLIP-seq, NET-seq, ...

Situation in 2013


DNA-Seq library

Genomic DNA → cleavage (sonication) → fragmented DNA → ligation → PCR → PCR product
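The library starts with random breakage of genomic DNA followed by selection of a usable size range. The minimal Python sketch below assumes uniformly random breakpoints (real sonication is not perfectly uniform) and uses hypothetical parameters: a 1 Mb molecule, 3,000 breaks and a 200-500 bp size window.

```python
import random


def sonicate(genome_length: int, n_breaks: int, seed: int = 0) -> list[tuple[int, int]]:
    """Break a molecule of genome_length bp at n_breaks distinct, uniformly random positions.

    Returns the fragments as half-open (start, end) intervals. Uniform breakage
    is a simplifying assumption, not a model of real sonication.
    """
    rng = random.Random(seed)
    cuts = sorted(rng.sample(range(1, genome_length), n_breaks))
    bounds = [0] + cuts + [genome_length]
    return [(bounds[i], bounds[i + 1]) for i in range(len(bounds) - 1)]


def size_select(fragments: list[tuple[int, int]], min_len: int, max_len: int) -> list[tuple[int, int]]:
    """Keep only fragments whose length lies in [min_len, max_len] (gel or column selection)."""
    return [(s, e) for s, e in fragments if min_len <= e - s <= max_len]


if __name__ == "__main__":
    fragments = sonicate(genome_length=1_000_000, n_breaks=3_000)
    selected = size_select(fragments, min_len=200, max_len=500)
    print(f"{len(fragments)} fragments, {len(selected)} kept after size selection")
```

Only a fraction of the fragments falls in the selected window, which is one reason why fragmentation has to be well controlled.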


Adaptor ligation


Paired end sequencing

1st read: from one end of the fragment

2nd read: from the other end, on the opposite strand
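The two reads of a pair can be illustrated on a toy fragment. This is a minimal, error-free sketch; the 20 bp fragment and the 5 bp read length are hypothetical.

```python
# Complement table for DNA bases; N (unknown base) maps to itself.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def paired_end_reads(fragment: str, read_len: int) -> tuple[str, str]:
    """Simulate an idealized read pair from one fragment.

    Read 1 is the first read_len bases of the forward strand; read 2 is the first
    read_len bases of the reverse strand, i.e. the reverse complement of the
    fragment's last read_len bases.
    """
    read1 = fragment[:read_len]
    read2 = reverse_complement(fragment[-read_len:])
    return read1, read2


fragment = "ACGTTGCAAGGCTTAACCGT"  # toy 20 bp fragment
r1, r2 = paired_end_reads(fragment, read_len=5)
print(r1, r2)  # ACGTT ACGGT
```

When both reads are mapped, they land on opposite strands, separated by roughly the fragment length.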


Comparison of single read versus paired end sequencing

[Figure: single read density versus paired end density, for genome or transcript assembly]


Paired end sequencing:

•   improves genome assembly

•   better identification of RNA 5’ and 3’ ends

•   but requires good control of DNA fragmentation (purification on gels/columns)

•   time-consuming and requires large quantities (1-5 µg)


Nextera “tagmentation”: a new methodology for construction of paired end libraries

Transposomes / Tagment Enzyme

• Tagmentation: the Tagment Enzyme fragments DNA and attaches junction adapters (blue and green) to both ends of the tagmented molecule

• Dual barcode approach: up to 96 indexed samples

• Rapid (2 hours) and requires small quantities (50 ng)
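With a dual barcode approach, each sample is identified by the combination of two indexes. The sketch below assumes, for illustration only, 12 first indexes and 8 second indexes (12 × 8 = 96 combinations). The index sequences are hypothetical 6-mers chosen at least 3 mismatches apart; they are not the real Nextera indexes. One mismatch per index is tolerated.

```python
from itertools import product
from typing import Optional


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def greedy_index_set(n: int, length: int = 6, min_dist: int = 3) -> list[str]:
    """Pick n hypothetical index sequences pairwise >= min_dist mismatches apart."""
    chosen: list[str] = []
    for bases in product("ACGT", repeat=length):
        candidate = "".join(bases)
        if all(hamming(candidate, c) >= min_dist for c in chosen):
            chosen.append(candidate)
            if len(chosen) == n:
                break
    return chosen


# Assumed layout: 12 first indexes x 8 second indexes = 96 sample combinations.
INDEX1 = greedy_index_set(12)
INDEX2 = greedy_index_set(8)
SAMPLE_OF = {pair: f"sample{k + 1}" for k, pair in enumerate(product(INDEX1, INDEX2))}


def match_index(observed: str, indexes: list[str], max_mismatch: int = 1) -> Optional[str]:
    """Return the unique index within max_mismatch of observed, or None."""
    hits = [ix for ix in indexes if hamming(observed, ix) <= max_mismatch]
    return hits[0] if len(hits) == 1 else None


def demultiplex(reads: list[tuple[str, str, str]]) -> dict[str, list[str]]:
    """Assign (index1, index2, sequence) reads to samples; unresolved reads go to 'undetermined'."""
    bins: dict[str, list[str]] = {}
    for idx1, idx2, seq in reads:
        pair = (match_index(idx1, INDEX1), match_index(idx2, INDEX2))
        sample = SAMPLE_OF.get(pair, "undetermined")
        bins.setdefault(sample, []).append(seq)
    return bins
```

Because any two indexes differ at 3 or more positions, a single sequencing error in an index still points to only one candidate.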


A recent improvement of mate pair libraries:

Illumina “Moleculo/LRSeq” technology


Genomic DNA is:

- sheared into 6–8 kb fragments

- partitioned into several 96-well plates

- further fragmented to 600–800 bp

- barcoded and sequenced separately

Limiting the number of DNA molecules per well makes it possible to study INDIVIDUAL FRAGMENTED MOLECULES.

This almost eliminates the chance of having a repeated or duplicate sequence within a given partition.

Since each well is sequenced to high coverage, the high coverage reduces the error rate.

Voskoboynik et al. eLife Sciences 2013

•  assembly of complex, repeat-rich genomes

•  identification of alternative transcripts
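The statement that over-sequencing each well reduces the error rate can be made concrete with a majority-vote consensus. This is a minimal sketch under simplifying, pessimistic assumptions: reads err independently, every error produces the same wrong base, and ties count as errors. The 1% per-read error rate is hypothetical, and the actual Moleculo processing may differ.

```python
from math import comb


def consensus_error(per_read_error: float, coverage: int) -> float:
    """Probability that a majority-vote consensus base is wrong.

    Assumes independent read errors with probability per_read_error, all toward
    the same wrong base; ties are counted as errors.
    """
    e, c = per_read_error, coverage
    return sum(
        comb(c, k) * e**k * (1 - e) ** (c - k)
        for k in range(c + 1)
        if 2 * k >= c  # at least half of the reads carry the wrong base
    )


for depth in (1, 3, 5, 9):
    print(depth, consensus_error(0.01, depth))
```

Odd depths are compared because an even depth allows ties, which this conservative rule counts as errors.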


Paired end fragments are too short, in particular for assembling large genomes with many repeated elements

Hence: mate pair libraries
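Why short fragments fail on repeats can be shown with a simple geometric criterion: a pair helps to place a repeat only if both mates fall in unique sequence on either side of it. This minimal sketch assumes an ideal fragment with reads of fixed length; the 1,000 bp repeat, 100 bp reads and insert sizes are hypothetical.

```python
def spans_repeat(insert_size: int, read_len: int, repeat_len: int) -> bool:
    """True if a read pair can anchor both mates in unique sequence on either side
    of a repeat: the fragment must cover the whole repeat plus one read length per side."""
    return insert_size >= repeat_len + 2 * read_len


def spanning_start_positions(insert_size: int, read_len: int, repeat_len: int) -> int:
    """Number of fragment start positions (relative to the repeat) that give a spanning pair."""
    return max(0, insert_size - repeat_len - 2 * read_len + 1)


for insert in (300, 3000):  # a typical paired end insert versus a mate pair insert
    print(insert, spans_repeat(insert, 100, 1000), spanning_start_positions(insert, 100, 1000))
```

With a 300 bp insert no placement of the fragment bridges the 1,000 bp repeat, whereas a 3 kb mate pair insert can.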


“Classical” Illumina mate pair library (fragments of several kilobases)

Problems: low coverage (few fragments), over-amplified


Nextera Mate Pair: a new methodology for construction of mate pair fragments

• The Tagment Enzyme fragments DNA and attaches a biotinylated junction adapter (green) to both ends of the tagmented molecule

• circularization

• fragmentation, then enrichment via the biotin tag

• adapter ligation at both ends

Rapid (a few hours) and requires small quantities (50 ng)
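Circularization is what brings the two distant ends of a long fragment next to each other. This is a minimal, idealized sketch: the 5 kb random "long fragment", the 19 N placeholder used for the junction adapter and the shearing positions are all hypothetical, and the real adapter sequence is not modeled.

```python
import random

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def mate_pair_reads(long_fragment: str, junction: str, left: int, right: int, read_len: int) -> tuple[str, str]:
    """Idealized mate pair read pair from one circularized long fragment.

    After circularization the two ends of long_fragment are joined through the
    junction adapter; shearing then yields a short piece made of the last `left`
    bases, the junction, and the first `right` bases. left and right must be
    >= read_len so that neither read runs into the junction.
    """
    short = long_fragment[-left:] + junction + long_fragment[:right]
    read1 = short[:read_len]            # forward strand, near the END of long_fragment
    read2 = revcomp(short[-read_len:])  # reverse strand, near the START of long_fragment
    return read1, read2


rng = random.Random(1)
genome = "".join(rng.choice("ACGT") for _ in range(5000))  # toy 5 kb long fragment
r1, r2 = mate_pair_reads(genome, junction="N" * 19, left=150, right=150, read_len=50)
pos1 = genome.find(r1)           # read 1 maps on the forward strand, near the right end
pos2 = genome.find(revcomp(r2))  # read 2 maps on the reverse strand, near the left end
print(pos1, pos2)
```

On the reference, the two mates point away from each other and are separated by about the length of the long fragment, which is what gives mate pairs their long range.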


RAD-seq: Restriction site Associated DNA sequencing

Genome subsampling that makes it possible to discover and score large numbers of SNP markers simultaneously in many (hundreds of) individuals for minimal investment

widely applied to genetic mapping in a variety of organisms

Baird et al. (2008) PLoS ONE
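The genome subsampling relies on reading only the sequence next to each restriction site. The minimal in silico sketch below uses EcoRI (G^AATTC) as an illustrative enzyme choice and a hypothetical toy sequence. It returns the tags read away from the site on both sides.

```python
import re

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def rad_tags(genome: str, site: str = "GAATTC", cut_offset: int = 1, tag_len: int = 20) -> list[tuple[int, str, str]]:
    """In silico RAD tags: the tag_len bases read away from each restriction site, on both sides.

    Defaults model EcoRI (G^AATTC), a palindromic site cut after its first base.
    Returns (site_position, strand, tag) tuples.
    """
    tags = []
    for match in re.finditer(f"(?={site})", genome):  # lookahead also finds overlapping sites
        p = match.start()
        top_cut = p + cut_offset                 # cut position on the forward strand
        bottom_cut = p + len(site) - cut_offset  # cut position on the reverse strand
        downstream = genome[top_cut:top_cut + tag_len]
        upstream = revcomp(genome[max(0, bottom_cut - tag_len):bottom_cut])
        tags.append((p, "+", downstream))
        tags.append((p, "-", upstream))
    return tags


toy = "A" * 10 + "GAATTC" + "T" * 10
print(rad_tags(toy, tag_len=8))  # [(10, '+', 'AATTCTTT'), (10, '-', 'AATTCTTT')]
```

Both tags start with the AATTC remnant of the cut site, and comparing the same tags across individuals is what reveals the SNP markers.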


[Figure: amplification primer versus no amplification primer; adapter sequence AGAACAA / TCTTGTT]

The adapter design prevents amplification of genomic fragments lacking a P1 adapter


CLIP-Seq: cross-linking immunoprecipitation sequencing

•  Sequencing of RNAs that interact with a particular RNA-binding protein:

•  UV cross-linking between RNA and the protein

•  immunoprecipitation with antibodies against the protein

•  fragmentation

•  sequencing

Sanford et al. Genome Research (2009)


NET-seq: Native Elongating Transcript sequencing

•  sequencing of the 3’ ends of nascent RNAs still associated with the elongating polymerase complexes

•  detects the distribution of transcribing polymerases along the genome in a strand-specific manner

Churchman and Weissman, 2011

[Figure: Pol II complexes along the genome]

Cells in desired condition → RNA polymerase II immunoprecipitation → recovery of nascent transcripts associated with the polymerase → RNA-seq and mapping on the genome
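The strand-specific readout amounts to counting, on each strand, how many captured nascent transcripts end at each genomic position, taking that end as the polymerase position. This is a minimal sketch, assuming reads have already been mapped and reduced to hypothetical (position, strand) pairs.

```python
from collections import Counter


def polymerase_profile(transcript_ends, region_start: int, region_end: int) -> dict[str, list[int]]:
    """Strand-specific counts of nascent-transcript ends per position in [region_start, region_end).

    transcript_ends: iterable of (position, strand) pairs, strand "+" or "-",
    one per sequenced nascent RNA (hypothetical input format).
    """
    counts = {"+": Counter(), "-": Counter()}
    for pos, strand in transcript_ends:
        if region_start <= pos < region_end:
            counts[strand][pos] += 1
    return {strand: [c[p] for p in range(region_start, region_end)] for strand, c in counts.items()}


ends = [(2, "+"), (2, "+"), (3, "-"), (9, "+")]
print(polymerase_profile(ends, 0, 5))  # {'+': [0, 0, 2, 0, 0], '-': [0, 0, 0, 1, 0]}
```

Keeping the two strands separate is what allows sense and antisense transcription to be distinguished at the same locus.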


Some problems encountered when preparing libraries


DNA-Seq library: read coverage correlates with GC content

[Figure: GC content (%) and read coverage versus position (bp)]

Are these fluctuations reproducible between replicates?

Multiplexed replicates to avoid differences due to sequencing
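The link between GC content and coverage can be quantified by computing both in the same windows and correlating them. This is a minimal sketch with hypothetical toy data; the 50 bp window is an arbitrary choice.

```python
from math import sqrt


def gc_per_window(seq: str, window: int) -> list[float]:
    """GC fraction in consecutive, non-overlapping windows (a trailing partial window is dropped)."""
    return [
        (seq[i:i + window].count("G") + seq[i:i + window].count("C")) / window
        for i in range(0, len(seq) - window + 1, window)
    ]


def mean_per_window(coverage: list[float], window: int) -> list[float]:
    """Mean read coverage in the same windows."""
    return [sum(coverage[i:i + window]) / window for i in range(0, len(coverage) - window + 1, window)]


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Toy data (hypothetical): an AT-rich half followed by a GC-rich half with higher coverage.
seq = "AT" * 50 + "GC" * 50
coverage = [5] * 100 + [15] * 100
r = pearson(gc_per_window(seq, 50), mean_per_window(coverage, 50))
print(round(r, 3))  # 1.0
```

The same `pearson` function applied to the coverage profiles of two replicates gives a first answer to whether the fluctuations are reproducible.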


Multiplexing before PCR

Different DNA samples: cleavage → fragmented DNA samples → ligation of tagged adaptors → PCR amplification (12-18 cycles) → calibration → sequencing of the mixed libraries in the same lane


Multiplex ligation before PCR

[Figure: normalized read coverage versus position (bp) for sample1 to sample8]
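Comparing multiplexed samples requires putting their coverage on a common scale. This minimal sketch assumes counts-per-million normalization (one common choice; the normalization used for the figure is not specified) and then correlates every pair of samples. The sample profiles are hypothetical.

```python
from math import sqrt


def cpm(coverage: list[float], total_reads: int) -> list[float]:
    """Scale a coverage profile to counts per million sequenced reads of that sample."""
    scale = 1_000_000 / total_reads
    return [c * scale for c in coverage]


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def pairwise_correlations(profiles: dict[str, list[float]]) -> dict[tuple[str, str], float]:
    """Pearson correlation for every pair of samples (each pair listed once)."""
    names = sorted(profiles)
    return {(a, b): pearson(profiles[a], profiles[b]) for i, a in enumerate(names) for b in names[i + 1:]}


# Hypothetical samples: sample2 was sequenced twice as deeply as sample1.
profiles = {
    "sample1": cpm([1, 2, 3, 4], total_reads=1_000_000),
    "sample2": cpm([2, 4, 6, 8], total_reads=2_000_000),
}
print(profiles["sample1"] == profiles["sample2"])  # True
print(pairwise_correlations(profiles))
```

The correlation itself does not depend on scaling. Normalization matters when profiles are overlaid, as in the figure, where differences in sequencing depth would otherwise dominate.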


Libraries prepared from very small amounts of DNA or RNA (<< 1 ng)

•   ChIP-seq with very small amounts of immunoprecipitated material

•   RNA from small amounts of tissue (laser dissection)

Typical problem: accumulation of dimers of the two adaptors

•   adaptor dimers are amplified more rapidly than other fragments and “invade” the libraries

•   they constitute the majority of sequenced reads

•   rare fragments then tend to be amplified non-homogeneously
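A small difference in amplification efficiency is enough for adaptor dimers to take over a library, because the difference compounds at every cycle. This is a minimal sketch of exponential PCR that ignores the plateau phase. The starting proportion (10% dimers) and the efficiencies (0.95 for dimers, 0.80 for inserts) are hypothetical.

```python
def pcr_fraction(start_a: float, start_b: float, eff_a: float, eff_b: float, cycles: int) -> float:
    """Fraction of species A among products after `cycles` of exponential PCR.

    Each species is multiplied by (1 + efficiency) per cycle; plateau effects and
    primer depletion are ignored.
    """
    a = start_a * (1 + eff_a) ** cycles
    b = start_b * (1 + eff_b) ** cycles
    return a / (a + b)


# Hypothetical numbers: adaptor dimers are 10% of starting molecules and amplify
# slightly more efficiently (0.95) than library inserts (0.80).
for n in (0, 6, 12, 18):
    print(n, round(pcr_fraction(1, 9, 0.95, 0.80, n), 3))
```

The lower the input, the more cycles are needed and the larger the share of dimers at the end, which matches the problem described for libraries from less than 1 ng.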


Sequencing of very small amounts of genome fragments (<< 1 ng)

[Figure: panels labeled 13 kb and 43 kb; small input DNA versus increasing input DNA]


Comparison of two RNA-seq library protocols:

SOLiD™ Whole Transcriptome Analysis Kit (RNase III fragmentation)

versus

Illumina’s directional mRNA-Seq Library (zinc fragmentation)


SOLiD™ Whole Transcriptome Analysis Kit: RNase III fragmentation

Steps shown: RiboMinus RNA, RNase III, fragmented RNA, hybridization with adapters and ligation, reverse transcription, PCR amplification, size selection

[Figure: adapters with N and NNNNNN positions at the 5’ and 3’ ends]


Sequencing on SOLiD

[Figure: SOLiD reads over the YBR078W intron]


Sequencing on Illumina

[Figure: SOLiD and Illumina reads over the YBR078W intron]

Very heterogeneous pattern, not due to the sequencing technology but to library preparation:

RNase III fragmentation not so random?
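One way to test whether fragmentation is "not so random" is to compare how read starts are distributed with what purely random cutting would give. This is a minimal sketch using the variance-to-mean ratio of read-start counts, which is close to 1 for Poisson-like random starts. The simulated and clustered profiles are hypothetical.

```python
import random
from collections import Counter
from statistics import mean, pvariance


def dispersion_index(counts: list[int]) -> float:
    """Variance-to-mean ratio of read-start counts per position.

    If read starts fall at uniformly random positions, counts are close to Poisson
    and the ratio is about 1; values far above 1 indicate clustered, non-random starts.
    """
    m = mean(counts)
    return pvariance(counts, mu=m) / m


def random_starts(n_reads: int, n_positions: int, seed: int = 0) -> list[int]:
    """Simulate read-start counts for perfectly random fragmentation."""
    rng = random.Random(seed)
    hits = Counter(rng.randrange(n_positions) for _ in range(n_reads))
    return [hits[p] for p in range(n_positions)]


uniform = random_starts(10_000, 1_000)
clustered = [1_000] * 10 + [0] * 990  # same 10,000 reads piled onto 10 positions
print(round(dispersion_index(uniform), 2), round(dispersion_index(clustered), 1))
```

Counting read starts rather than per-base depth matters here: depth at neighboring positions is shared by overlapping reads, so it is not expected to be Poisson even under random fragmentation.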


Illumina directional mRNA-Seq Library: zinc fragmentation

Total RNA → depletion of ribosomal RNA → ribo-depleted RNA → zinc → fragmented RNA → ligation → RT → PCR → ds PCR product


Illumina directional mRNA-Seq Library: zinc fragmentation

[Figure: RNase III versus zinc fragmentation over the YBR078W intron, same number of reads]


[Figure: correlation between nucleotides versus distance between nucleotides, RNase III versus zinc fragmentation]

M. Wery, M. Descrimes, C. Thermes, D. Gautheret & A. Morillon (submitted)
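A correlation between nucleotides as a function of their distance is an autocorrelation of the coverage profile. The exact statistic used in the cited work is not given here, so the following is a minimal sketch using the common biased sample autocorrelation on a hypothetical toy profile.

```python
def autocorrelation(values: list[float], max_lag: int) -> list[float]:
    """Sample autocorrelation of a coverage profile for lags 0..max_lag.

    Uses the common biased estimator: sum of products of mean-centred values
    `lag` positions apart, divided by the total sum of squares (so lag 0 gives 1).
    """
    n = len(values)
    m = sum(values) / n
    d = [v - m for v in values]
    total = sum(v * v for v in d)
    return [sum(d[i] * d[i + lag] for i in range(n - lag)) / total for lag in range(max_lag + 1)]


# Toy profile (hypothetical): coverage alternating between two levels.
profile = [0, 1] * 10
print([round(c, 2) for c in autocorrelation(profile, 3)])  # [1.0, -0.95, 0.9, -0.85]
```

For random fragmentation, the coverage correlation is expected to decay smoothly with distance (roughly on the scale of the read length), whereas a site-biased fragmentation would show structure at specific distances.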


Supports: CNRS, ACI IMPBio, ANR