In the Pursuit of Optimal Sequence Trimming Parameters for EST Projects Fabiano C. Peixoto & J. Miguel Ortega LCC-CENAPAD A T G C BIOINFORMÁTICA UFMG

Page 1

In the Pursuit of Optimal Sequence Trimming Parameters for EST Projects

Fabiano C. Peixoto & J. Miguel Ortega

LCC-CENAPAD

ATGC BIOINFORMÁTICA UFMG

Page 2

Noticed:

• BLAST results
• Phred 15
• Too much trimming


Page 3

Query: 469  TTAGGAGGATCGTTTTTAGAATCCCCTGCAACGTTACCACGGTGGATTTCACTGACTGCG 528
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 1038 ttaggaggatcgtttttagaatcccctgcaacgttaccacggtggatttcactgactgcg 979

Query: 529  ACGTTCTTAACGTTGAATCCAACGTTGCTACCAgggagagcctcagtaagtgcttcatga 588
            ||||||||||||||||| || |||||||||||||||||| ||||||||||||||||||||
Sbjct: 978  acgttcttaacgttgaagcccacgttgctaccagggagaccctcagtaagtgcttcatga 919

Query: 589  tgcatttcgacagaattgacttcagtcgacaaaccttgcggagcaaaagtgacgaccata 648
            |||||||||||||| |||||||||| |||| ||||||||||| |||||||||||||||||
Sbjct: 918  tgcatttcgacagacttgacttcagccgaccaaccttgcggaccaaaagtgacgaccata 859

Query: 649  ccaggcttgatgataccagtttcaacgc 676
            ||||||||||||||||||||||||||||
Sbjct: 858  ccaggcttgatgataccagtttcaacgc 831

.TGAAGCTTTCAGCTTCTTTAGGAGGATCGTTTTTAGAATCCCCTGCAACGTTACCACGGTGGATTTCACTGACTGCGACGTTCTTAACGTTGAATCCAACGttGCTACCAgggagagcctcagtaagtgcttcatgatgcatttcgacagaattgacttcagtcgacaaaccttgcggagcaaaagtgacgaccataccaggcttgatgataccagtttcaacgcctcggggccaggctggcgtgaacagggcctagcgggtccgcgggggaagggtcccggctcaatccaccaatagagcggagctaaagtgacgggggcgcca

Phred 15

Page 4

Experimental approach

Sequences:

• pUC18 plasmid vector (published sequence)
• Sequencing reaction:
    • Single pool, 3 plates (96 samples)
    • MegaBACE sequencer
    • 3 reads for each plate, ESD processing: 846 reads

Processing:

• BLAST (MegaBLAST, as in UniGene)
• Phred
    • trim: a chromatogram analyzer
    • trim_alt: trim_cutoff parameter from 1% up to 25%
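To make the role of trim_cutoff concrete, here is a minimal sketch of quality-based trimming. It assumes trim_alt follows the modified Mott scheme described in the phred documentation: each base scores trim_cutoff minus its error probability, and the highest-scoring contiguous segment is kept. The function names `error_prob` and `mott_trim` are hypothetical, and this is an illustration, not phred's own code.

```python
def error_prob(q):
    """Convert a phred quality value into an error probability."""
    return 10 ** (-q / 10)


def mott_trim(quals, cutoff=0.05):
    """Return (start, end) of the highest-scoring segment, end exclusive.

    Each base contributes cutoff - error_prob(q). The running score is
    reset to zero whenever it becomes non-positive, so the kept segment
    never starts inside a stretch of poor bases.
    """
    best_score = 0.0
    best = (0, 0)
    score = 0.0
    start = 0
    for i, q in enumerate(quals):
        score += cutoff - error_prob(q)
        if score <= 0:
            # Segment is no longer worth extending; restart after this base.
            score = 0.0
            start = i + 1
        elif score > best_score:
            best_score = score
            best = (start, i + 1)
    return best


# Hypothetical quality values: one Q10 base inside a run of Q30 bases.
quals = [30, 30, 10, 30, 30, 30]
print(mott_trim(quals, cutoff=0.01))  # strict cutoff
print(mott_trim(quals, cutoff=0.05))  # permissive cutoff
```

In this toy read, the strict cutoff drops everything up to and including the Q10 base, while the permissive cutoff keeps the whole read. That is the trade-off the experiment explores: a larger trim_cutoff includes more bases, at the risk of including miscalled ones.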

Page 5

[Figure: Number of bases (y axis, -500 to 200) versus trim_cutoff parameter value (x axis, 1% to 25%). Series: Included (trim), Discarded (trim), Included (trim_alt), Discarded (trim_alt).]

Page 6

[Figure: Number of bases x 1000 (y axis, -500 to 200) versus trim_cutoff parameter value (x axis, 1% to 25%). Series: Included (trim), Discarded (trim), Included (trim_alt), Discarded (trim_alt), Total (trim_alt).]

Page 7

[Figure: Number of sequences (y axis, 0 to 900) versus trim_cutoff parameter value (x axis, 1% to 25%). Series: Included (trim), Discarded (trim), Included (trim_alt), Discarded (trim_alt).]

Page 8

[Figure: BLAST gaps/mismatches (% of bases; y axis, 0.00% to 30.00%) in trim_alt sequences, plotted over x-axis values from 1% to 25%. Series: total miscall, stepwise miscall. Annotations on the plot: "16% 17%", "Additional bases", "3%".]
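The "gaps/mismatches (% of bases)" measure can be illustrated with a small sketch. It assumes the two rows of an alignment block are available as equal-length strings, that "-" marks a gap, and that case is ignored. The function name `miscall_percent` is hypothetical, and the slides do not define how "total" and "stepwise" miscall differ, so this computes only a plain per-block percentage.

```python
def miscall_percent(query, subject):
    """Percentage of aligned columns that are mismatches or gaps."""
    if len(query) != len(subject):
        raise ValueError("aligned rows must have the same length")
    if not query:
        return 0.0
    # A gap character never equals a base, so gaps count as differences too.
    bad = sum(1 for q, s in zip(query.upper(), subject.upper()) if q != s)
    return 100.0 * bad / len(query)


# Second alignment block from the BLAST output on Page 3 (60 columns).
query = "ACGTTCTTAACGTTGAATCCAACGTTGCTACCAgggagagcctcagtaagtgcttcatga"
subject = "acgttcttaacgttgaagcccacgttgctaccagggagaccctcagtaagtgcttcatga"
print(miscall_percent(query, subject))
```

Applied to a known reference such as the published pUC18 sequence, every difference can be attributed to a base-calling error rather than to biological variation, which is what makes the vector a convenient test case.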

Page 9

Conclusions

• The trim_alt algorithm can be used with trim_cutoff values up to 18% without including miscalled bases

• With proper parameters, the trim_alt algorithm recovers more information than the trim algorithm

• Other trimming algorithms, such as window-based ones, may also be analyzed in the same way
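As an illustration of what a window-based trimmer might look like, the sketch below keeps the span from the first to the last window whose mean quality reaches a threshold. The function name `window_trim`, the window size, and the threshold are all assumptions chosen for the example; the slides do not specify any particular window-based algorithm.

```python
def window_trim(quals, window=10, min_mean=20):
    """Return (start, end) spanning the first to last passing window.

    A window passes when its mean quality is at least min_mean.
    Returns (0, 0) when the read is shorter than one window or when
    no window passes. The end index is exclusive.
    """
    n = len(quals)
    if n < window:
        return (0, 0)
    good = [
        sum(quals[i:i + window]) / window >= min_mean
        for i in range(n - window + 1)
    ]
    if not any(good):
        return (0, 0)
    first = good.index(True)
    last = len(good) - 1 - good[::-1].index(True)
    return (first, last + window)


# Hypothetical read: 10 good bases flanked by 5 poor bases on each side.
quals = [5] * 5 + [30] * 10 + [5] * 5
print(window_trim(quals, window=5, min_mean=20))
```

Because a window is judged by its average, a few poor bases at each edge can survive trimming. Sweeping `window` and `min_mean` against a known sequence, as done here for trim_cutoff, would show where miscalled bases start to slip in.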

Page 10

Acknowledgements

Sequences:

• Laboratório de Genética e Bioquímica
• Laboratório de Imunologia de Doenças Infecciosas
• Laboratório de Biodiversidade e Evolução Molecular
• Marina M. Mourão, Lucila Grossi and Renata A. Ribeiro (UFMG, Rede Genoma de Minas Gerais)

Computing facilities:

•CENAPAD-MG/CO