Mining hidden information from your 454 data using modular and database oriented methods Joachim De Schrijver


Page 1: Mining hidden information from your 454 data using  modular and database oriented methods

Mining hidden information from your 454 data using modular and database oriented methods

Joachim De Schrijver

Overview

Short introduction on 454 sequencing
Variant Identification pipeline
Possibilities of a DB oriented pipeline
Examples
  ◦ Coverage
  ◦ Improving PCR
  ◦ Fast Q assessment
  ◦ Homopolymers

Introduction (i)

Roche/454 GS-FLX sequencing:
  ◦ Pyrosequencing
  ◦ ± 400,000 reads/run
  ◦ Average length: 200-250 bp
Applications:
  ◦ Resequencing: Variant identification
  ◦ De novo (genome) sequencing: Assembly of new regions, plasmids or entire genomes
Standard Software:
  ◦ Variants: Amplicon Variant Analyzer (AVA)
  ◦ Assembly: Standard 454 assembler

Introduction (ii)

Standard software
  ◦ + Easy to use
  ◦ + Reproducible results on similar datasets
  ◦ + GUI (graphical user interface)
  ◦ - No answer for ‘non-standard’ questions
      Methylation experiments
      Different types of experiments grouped together
      …
  ◦ - What about ‘hidden’ information?
      Homopolymer error rates
      Quality score ~ length of sequenced read
      ‘Multirun’ information
      …

Variant Identification Pipeline (i)

Modular and database oriented pipeline
Modular:
  ◦ Efficient planning
  ◦ Scalable
Database (DB):
  ◦ No loss of data
  ◦ Grouping several runs together

Variant Identification Pipeline (ii)

Basic idea: Data is processed and stored in the DB. Results (reports) are calculated ‘on the fly’ using the DB data.
  ◦ Fast & efficient
  ◦ Calculations only happen once
  ◦ Everybody can access the database without risk of data modification
  ◦ Reporting is independent from the data processing
Paper: De Schrijver et al. 2009. Analysing 454 sequences with a modular and database oriented Variant Identification Pipeline
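The split between processing and reporting can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the VIP implementation: SQLite stands in for the database, and the table `observed_variant`, its columns, and the functions `process_run` and `variant_report` are hypothetical names chosen for this example. The report is opened through a read-only connection, so a reader cannot modify the stored data.

```python
import os
import sqlite3
import tempfile

# Hypothetical schema: one row per variant observed in a mapped read.
SCHEMA = """
CREATE TABLE IF NOT EXISTS observed_variant (
    run_id   TEXT,
    amplicon TEXT,
    position INTEGER,
    ref_base TEXT,
    alt_base TEXT
)
"""


def process_run(db_path, rows):
    """Processing step: executed once per run, results are stored in the DB."""
    con = sqlite3.connect(db_path)
    with con:  # commits the transaction on success
        con.execute(SCHEMA)
        con.executemany(
            "INSERT INTO observed_variant VALUES (?, ?, ?, ?, ?)", rows
        )
    con.close()


def variant_report(db_path):
    """Reporting step: computed on demand over a read-only connection."""
    con = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)
    try:
        return con.execute(
            "SELECT amplicon, position, ref_base, alt_base, COUNT(*) "
            "FROM observed_variant "
            "GROUP BY amplicon, position, ref_base, alt_base "
            "ORDER BY amplicon, position, alt_base"
        ).fetchall()
    finally:
        con.close()


# Made-up toy data for two runs, stored in a temporary database file.
db_path = os.path.join(tempfile.mkdtemp(), "vip_demo.sqlite")
process_run(db_path, [("run1", "ampA", 101, "A", "G"),
                      ("run1", "ampA", 101, "A", "G")])
process_run(db_path, [("run2", "ampA", 101, "A", "G"),
                      ("run2", "ampB", 57, "C", "T")])

report = variant_report(db_path)
print(report)
```

Because the report is just a query, a new type of report needs no reprocessing of the raw reads, and data from several runs is grouped automatically.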

Possibilities of a DB oriented pipeline

VIP originally developed for variant identification
Now being used in:
  ◦ Amplicon resequencing
  ◦ De novo shotgun
  ◦ Methylation
  ◦ ~ Solexa experiments
‘Hidden’ data can be extracted using intelligent querying strategies
Results per lane/Multiplex MID/run…

Example: Detailed coverage

Coverage can be calculated per
  ◦ Lane
  ◦ MID
  ◦ Amplicon
  ◦ Base position
Assessment of errors (PCR dropouts vs. human errors)
[Figure: MID frequency (unmapped); x-axis 1 to 12, y-axis 0.00% to 14.00%]
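The coverage levels listed above map naturally onto grouped queries. The sketch below assumes a hypothetical table `mapped_read` with one row per mapped read and inclusive start and end coordinates on the amplicon; the schema, the function names and the toy data are illustrative and not taken from the pipeline itself.

```python
import sqlite3
from collections import Counter

con = sqlite3.connect(":memory:")
# Hypothetical table: one row per mapped read.
con.execute(
    "CREATE TABLE mapped_read ("
    "lane INTEGER, mid TEXT, amplicon TEXT, start_pos INTEGER, end_pos INTEGER)"
)
con.executemany(
    "INSERT INTO mapped_read VALUES (?, ?, ?, ?, ?)",
    [
        (1, "MID1", "ampA", 1, 5),
        (1, "MID1", "ampA", 3, 8),
        (1, "MID2", "ampA", 1, 8),
        (2, "MID1", "ampB", 1, 4),
    ],
)

# Column names cannot be bound as SQL parameters, so they are whitelisted.
GROUPABLE = {"lane", "mid", "amplicon"}


def read_count_by(con, column):
    """Number of mapped reads per lane, MID or amplicon."""
    if column not in GROUPABLE:
        raise ValueError(f"cannot group by {column!r}")
    rows = con.execute(
        f"SELECT {column}, COUNT(*) FROM mapped_read "
        f"GROUP BY {column} ORDER BY {column}"
    )
    return dict(rows.fetchall())


def base_coverage(con, amplicon):
    """Read depth at every base position of one amplicon."""
    depth = Counter()
    query = "SELECT start_pos, end_pos FROM mapped_read WHERE amplicon = ?"
    for start, end in con.execute(query, (amplicon,)):
        depth.update(range(start, end + 1))  # end_pos is inclusive
    return dict(sorted(depth.items()))


print(read_count_by(con, "lane"))
print(read_count_by(con, "mid"))
print(base_coverage(con, "ampA"))
```

A lane, MID or amplicon with an unexpectedly low count in such a table is a candidate for the error assessment mentioned on the slide.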

Example: Improving PCR

Amplicon Resequencing experiment
Goal: Variant identification
Length distributions
  ◦ Mapped
  ◦ Unmapped
  ◦ ‘Short’ mapped
Additional length separation + Improved PCR
Result: Improved efficiency
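A length distribution per read category could be computed along the following lines. The slide does not define ‘short’ mapped reads, so the cutoff `SHORT_CUTOFF`, the bin width and the input format (read length plus a mapped flag) are assumptions made purely for illustration.

```python
from collections import Counter, defaultdict

SHORT_CUTOFF = 100  # hypothetical threshold (bp) for a 'short' mapped read
BIN_WIDTH = 50      # hypothetical histogram bin width (bp)


def classify(length, is_mapped):
    """Assign a read to one of the three categories named on the slide."""
    if not is_mapped:
        return "unmapped"
    return "short mapped" if length < SHORT_CUTOFF else "mapped"


def length_distributions(reads):
    """Histogram of read lengths per category, keyed by the bin start."""
    hist = defaultdict(Counter)
    for length, is_mapped in reads:
        bin_start = (length // BIN_WIDTH) * BIN_WIDTH
        hist[classify(length, is_mapped)][bin_start] += 1
    return {cat: dict(sorted(bins.items())) for cat, bins in hist.items()}


# Made-up toy reads: (length in bp, mapped?)
reads = [(240, True), (235, True), (60, True),
         (45, False), (30, False), (210, True)]

dist = length_distributions(reads)
print(dist)
```

A large share of short unmapped reads in such a histogram would point towards by-products that an additional length separation step could remove.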

Example: Homopolymers

Can the length of a homopolymer be assessed using the Q score?
Yes, when homopolymer length < 6 bp
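One way to explore this question is to tabulate Q scores by homopolymer length. The sketch below is not the analysis behind the slide: it simply finds runs of identical bases and averages the per-base Q scores for each run length. The function name `homopolymer_quality`, the `min_len` parameter and the toy reads are hypothetical.

```python
from collections import defaultdict
from itertools import groupby


def homopolymer_quality(reads, min_len=2):
    """Mean per-base Q score inside homopolymer runs, keyed by run length."""
    totals = defaultdict(lambda: [0, 0])  # run length -> [sum of Q, base count]
    for seq, quals in reads:
        pos = 0
        for _base, run in groupby(seq):
            n = len(list(run))
            if n >= min_len:
                totals[n][0] += sum(quals[pos:pos + n])
                totals[n][1] += n
            pos += n
    return {n: q_sum / count for n, (q_sum, count) in sorted(totals.items())}


# Made-up toy reads: (sequence, per-base Q scores)
reads = [
    ("ACGGTTTA", [40, 40, 38, 30, 36, 28, 20, 40]),
    ("AAATCC", [36, 30, 24, 40, 34, 26]),
]

profile = homopolymer_quality(reads)
print(profile)
```

On real data, a table like this shows for which run lengths the Q score still separates one homopolymer length from the next.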

Example: Q assessment

Fast assessment of the quality of a run
[Figure: two plots titled "Q value ~ position" (y-axis: Q value), labelled "Lab work OK" and "Errors in lab work"]
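A "Q value ~ position" profile is the mean Q score at each read position across all reads of a run. A minimal sketch, assuming the per-base Q scores of each read are available as a list of integers (the function name and toy data are hypothetical):

```python
def mean_q_by_position(quality_lists):
    """Mean Q score at each read position; reads may differ in length."""
    sums, counts = [], []
    for quals in quality_lists:
        for i, q in enumerate(quals):
            if i == len(sums):  # first read that reaches this position
                sums.append(0)
                counts.append(0)
            sums[i] += q
            counts[i] += 1
    return [s / c for s, c in zip(sums, counts)]


# Made-up toy run with three reads of different lengths.
run = [[40, 38, 30], [36, 34], [32, 30, 20, 10]]

q_profile = mean_q_by_position(run)
print(q_profile)
```

Plotting such a profile for a new run next to the profile of a known good run gives the kind of quick comparison the slide describes.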

Acknowledgements

Biobix – Ugent
  Wim Van Criekinge
  Tim De Meyer
  Geert Trooskens
  Tom Vandekerkhove
  Leander Van Neste
  Gerben Mensschaert

CMG – UZ Gent
  Jo Vandesompele
  Jan Hellemans
  Filip Pattyn
  Steve Lefever
  Kim Deleeneer
  Jean-Pierre Renard

NXT-GNT
  Paul Coucke
  Sofie Bekaert
  Filip Van Nieuwerburgh
  Dieter Deforce
  Wim Van Criekinge
  Jo Vandesompele

Questions?

[email protected]