Transcript
Page 1: Mining hidden information from your 454 data using modular and database oriented methods

Mining hidden information from your 454 data using modular and database oriented methods

Joachim De Schrijver

Page 2

Short introduction on 454 sequencing
Variant Identification Pipeline
Possibilities of a DB oriented pipeline
Examples
◦ Coverage
◦ Improving PCR
◦ Fast Q assessment
◦ Homopolymers

Overview

Page 3

Roche/454 GS-FLX sequencing:
◦ Pyrosequencing
◦ ± 400,000 reads/run
◦ Average length: 200-250 bp

Applications:
◦ Resequencing: Variant identification
◦ De novo (genome) sequencing: Assembly of new regions, plasmids or entire genomes

Standard Software:
◦ Variants: Amplicon Variant Analyzer (AVA)
◦ Assembly: Standard 454 assembler

Introduction (i)

Page 4

Standard software
◦ + Easy to use
◦ + Reproducible results on similar datasets
◦ + GUI (graphical user interface)
◦ - No answer for ‘non-standard’ questions
    Methylation experiments
    Different types of experiments grouped together
    …
◦ - What about ‘hidden’ information?
    Homopolymer error rates
    Quality score ~ length of sequenced read
    ‘Multirun’ information
    …

Introduction (ii)

Page 5

Modular and database oriented pipeline

Modular:
◦ Efficient planning
◦ Scalable

Database (DB):
◦ No loss of data
◦ Grouping several runs together

Variant Identification Pipeline (i)

Page 6

Basic idea: Data is processed and stored in the DB. Results (reports) are calculated ‘on the fly’ using the DB data.
◦ Fast & efficient
◦ Calculations only happen once
◦ Everybody can access the database without risk of data modification
◦ Reporting is independent from the data processing

Paper: De Schrijver et al. 2009. Analysing 454 sequences with a modular and database oriented Variant Identification Pipeline

Variant Identification Pipeline (ii)
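The "store once, report on the fly" idea can be illustrated with a minimal sketch. It is not the actual VIP implementation: the use of SQLite, the `reads` table, its columns, and the function names `build_db` and `report_reads_per` are all assumptions made for this example.

```python
import sqlite3

def build_db(reads):
    """Processing step: store each read once (hypothetical schema)."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE reads (read_id TEXT PRIMARY KEY, run TEXT, lane INTEGER, "
        "mid TEXT, amplicon TEXT, length INTEGER)"
    )
    con.executemany("INSERT INTO reads VALUES (?, ?, ?, ?, ?, ?)", reads)
    con.commit()
    return con

def report_reads_per(con, column):
    """Reporting step: aggregate on the fly; stored rows are only read."""
    allowed = {"run", "lane", "mid", "amplicon"}
    if column not in allowed:
        raise ValueError(f"unsupported grouping column: {column}")
    query = (
        f"SELECT {column}, COUNT(*) FROM reads "
        f"GROUP BY {column} ORDER BY {column}"
    )
    return con.execute(query).fetchall()

# Made-up example rows: two runs grouped together in one database.
example_reads = [
    ("r1", "run1", 1, "MID1", "ampA", 240),
    ("r2", "run1", 1, "MID2", "ampA", 231),
    ("r3", "run1", 2, "MID1", "ampB", 198),
    ("r4", "run2", 1, "MID1", "ampA", 251),
]
con = build_db(example_reads)
per_run = report_reads_per(con, "run")
per_mid = report_reads_per(con, "mid")
```

Because the reports are plain queries, a new question (per lane, per MID, per run) needs a new query rather than a new processing pass.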

Page 7

VIP originally developed for variant identification

Now being used in:
◦ Amplicon resequencing
◦ De novo shotgun
◦ Methylation
◦ ~ Solexa experiments

‘Hidden’ data can be extracted using intelligent querying strategies

Results per lane/Multiplex MID/run…

Possibilities of a DB oriented pipeline

Page 8

Coverage can be calculated per
◦ Lane
◦ MID
◦ Amplicon
◦ Base position

Assessment of errors (PCR dropouts vs. human errors)

Example: Detailed coverage

[Figure: MID frequency (unmapped). Horizontal axis ticks 1 to 12; vertical axis from 0.00% to 14.00%.]
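Coverage at the granularities listed on this slide could be computed along the following lines. This is a sketch under assumptions: the tuple layout `(lane, mid, amplicon, start, end)` with an exclusive `end`, and the helper names `coverage_by` and `per_base_coverage`, are hypothetical and not taken from the pipeline.

```python
from collections import Counter

# Hypothetical mapped reads: (lane, mid, amplicon, start, end), end exclusive.
mapped = [
    (1, "MID1", "ampA", 0, 5),
    (1, "MID2", "ampA", 2, 6),
    (2, "MID1", "ampB", 0, 4),
]

def coverage_by(reads, field):
    """Count reads per lane, MID or amplicon."""
    index = {"lane": 0, "mid": 1, "amplicon": 2}[field]
    return Counter(read[index] for read in reads)

def per_base_coverage(reads, amplicon, length):
    """For each base position of one amplicon, count the reads covering it."""
    depth = [0] * length
    for _, _, amp, start, end in reads:
        if amp != amplicon:
            continue
        for pos in range(max(start, 0), min(end, length)):
            depth[pos] += 1
    return depth

by_mid = coverage_by(mapped, "mid")
base_depth = per_base_coverage(mapped, "ampA", 6)
```

An amplicon with zero coverage in every MID would point towards a PCR dropout, whereas a single empty MID would rather suggest a handling error; that interpretation is a reading of the slide, not a rule from the source.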

Page 9

Amplicon Resequencing experiment

Goal: Variant identification

Length distributions
◦ Mapped
◦ Unmapped
◦ ‘Short’ mapped

Additional length separation + Improved PCR

Result: Improved efficiency

Example: Improving PCR
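The three length distributions could be derived as in the sketch below. The 100 bp cutoff for a ‘short’ mapped read, the 50 bp bin size, and the function names are assumptions chosen for illustration; the slides do not state them.

```python
from collections import Counter

SHORT_CUTOFF = 100  # assumed threshold in bp, not given in the slides

def classify(read_length, is_mapped):
    """Assign a read to one of the three classes shown on the slide."""
    if not is_mapped:
        return "unmapped"
    return "short mapped" if read_length < SHORT_CUTOFF else "mapped"

def length_distributions(reads, bin_size=50):
    """Return {class: Counter({bin_start: n_reads})} for (length, is_mapped) pairs."""
    dist = {}
    for length, is_mapped in reads:
        label = classify(length, is_mapped)
        bin_start = (length // bin_size) * bin_size
        dist.setdefault(label, Counter())[bin_start] += 1
    return dist

# Made-up reads: (length in bp, mapped?)
reads = [(240, True), (235, True), (60, True), (45, False), (55, False)]
dist = length_distributions(reads)
```

A large share of short unmapped reads would be consistent with by-products such as primer dimers, which is the kind of signal that motivates an extra length separation step.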

Page 10

Can the length of a homopolymer be assessed using the Q score?

Yes, when homopolymer length < 6 bp

Example: Homopolymers
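To relate homopolymer length to quality, each run of identical bases has to be paired with its Q scores first. The following is a minimal sketch of that step; the function name `homopolymer_quality` and the choice of the mean Q per run as summary are assumptions, not the method used in the talk.

```python
from itertools import groupby

def homopolymer_quality(seq, quals, min_len=2):
    """Return (base, run_length, mean_q) for each run of identical bases."""
    runs = []
    pos = 0
    for base, group in groupby(seq):
        run_len = len(list(group))
        if run_len >= min_len:
            run_q = quals[pos:pos + run_len]
            runs.append((base, run_len, sum(run_q) / run_len))
        pos += run_len
    return runs

# Made-up read with one CCC run and one TT run.
runs = homopolymer_quality("ACCCGTT", [30, 28, 24, 20, 31, 27, 23])
```

Collecting these tuples over a whole run and grouping them by `run_length` gives the Q score per homopolymer length that the question on this slide asks about.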

Page 11

Fast assessment of the quality of a run

Example: Q assessment

[Figure: two plots of Q value ~ position (Q value against read position), one labelled ‘Lab work OK’ and one labelled ‘Errors in lab work’.]
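A Q value ~ position profile of this kind can be obtained by averaging the quality scores per read position. The sketch below assumes the qualities are available as one list of integers per read; the function name `mean_q_by_position` is hypothetical.

```python
def mean_q_by_position(quality_lists):
    """Average Q per read position; reads may differ in length."""
    totals, counts = [], []
    for quals in quality_lists:
        for i, q in enumerate(quals):
            if i == len(totals):
                # First read that reaches this position: extend the profile.
                totals.append(0)
                counts.append(0)
            totals[i] += q
            counts[i] += 1
    return [t / c for t, c in zip(totals, counts)]

# Made-up quality strings for three reads of different length.
profile = mean_q_by_position([[40, 30, 20], [30, 30], [20]])
```

Comparing the profile of a new run with that of a known good run is one quick way to spot a deviating run; which deviation counts as an error is left open here.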

Page 12

Biobix – UGent: Wim Van Criekinge, Tim De Meyer, Geert Trooskens, Tom Vandekerkhove, Leander Van Neste, Gerben Mensschaert

CMG – UZ Gent: Jo Vandesompele, Jan Hellemans, Filip Pattyn, Steve Lefever, Kim Deleeneer, Jean-Pierre Renard

NXT-GNT: Paul Coucke, Sofie Bekaert, Filip Van Nieuwerburgh, Dieter Deforce, Wim Van Criekinge, Jo Vandesompele

Acknowledgements

Page 13

Questions?

[email protected]

