27
The 8th annual Bioinformatics Open Source Conference (BOSC 2007) 18 th July, Vienna, Austria Peter Cock, MOAC Doctoral Training Centre, University of Warwick, UK Biopython Project Update

Biopython

  • Upload
    bosc

  • View
    3.642

  • Download
    5

Embed Size (px)

DESCRIPTION

Title: Biopython UpdateAuthor: Peter Cock

Citation preview

Page 1: Biopython

The 8th annualBioinformatics Open Source

Conference(BOSC 2007)

18th July, Vienna, Austria

Peter Cock,MOAC Doctoral Training Centre,University of Warwick, UK

Biopython Project Update

Page 2: Biopython

The 8th annual Bioinformatics Open Source ConferenceBiopython Project Update @ BOSC 2007, Vienna,

Austria

Talk OutlineTalk Outline What is python?What is python? What is Biopython?What is Biopython? Short historyShort history Project organisationProject organisation What can you do with it?What can you do with it? How can you contribute?How can you contribute? AcknowledgementsAcknowledgements

Page 3: Biopython

What is Python?What is Python?

High level programming High level programming languagelanguage

Object orientatedObject orientated Open Source, free ($$$) Open Source, free ($$$) Cross platform:Cross platform:

Linux, Windows, Mac OS X, …Linux, Windows, Mac OS X, … Extensible in C, C++, …Extensible in C, C++, …

The 8th annual Bioinformatics Open Source ConferenceBiopython Project Update @ BOSC 2007, Vienna,

Austria

Page 4: Biopython

What is Biopython?What is Biopython?

Set of libraries for computational Set of libraries for computational biologybiology

Open Source, free ($$$)Open Source, free ($$$) Cross platform:Cross platform:

Linux, Windows, Mac OS X, … Linux, Windows, Mac OS X, … Sibling project to BioPerl, BioRuby, Sibling project to BioPerl, BioRuby,

BioJava, …BioJava, …

The 8th annual Bioinformatics Open Source ConferenceBiopython Project Update @ BOSC 2007, Vienna,

Austria

Page 5: Biopython

Popularity by Google HitsPopularity by Google Hits PythonPython

PerlPerl

RubyRuby

JavaJava

The 8th annual Bioinformatics Open Source ConferenceBiopython Project Update @ BOSC 2007, Vienna,

Austria

98 million98 million

101 101 millionmillion

101 101 millionmillion

289 289 millionmillion

BiopythonBiopython

BioPerlBioPerl

BioRubyBioRuby

BioJavaBioJava

252,000252,000

610,000610,000

122,000122,000

185,000185,000

BioPerl BioPerl 610,000610,000

Both Perl and Python are strong at textBoth Perl and Python are strong at text Python may have the edge for numerical Python may have the edge for numerical

work (with the Numerical python libraries)work (with the Numerical python libraries)

Page 6: Biopython

Biopython historyBiopython history 1999 : Started by Jeff Chang & Andrew 1999 : Started by Jeff Chang & Andrew

DalkeDalke 2000 : Biopython 0.90, first release2000 : Biopython 0.90, first release 2001 : Biopython 1.00, “semi-complete”2001 : Biopython 1.00, “semi-complete” 2002 : Biopython 1.10, “semi-stable”2002 : Biopython 1.10, “semi-stable” 2003 : Biopython 1.20, 1.21, 1.22 and 1.232003 : Biopython 1.20, 1.21, 1.22 and 1.23 2004 : Biopython 1.24 and 1.302004 : Biopython 1.24 and 1.30 2005 : Biopython 1.40 and 1.412005 : Biopython 1.40 and 1.41 2006 : Biopython 1.422006 : Biopython 1.42 2007 : Biopython 1.432007 : Biopython 1.43

The 8th annual Bioinformatics Open Source ConferenceBiopython Project Update @ BOSC 2007, Vienna,

Austria

Page 7: Biopython

The 8th annual Bioinformatics Open Source ConferenceBiopython Project Update @ BOSC 2007, Vienna,

Austria

Biopython Project Biopython Project OrganisationOrganisation

Releases:Releases: No fixed scheduleNo fixed schedule Currently once or twice a yearCurrently once or twice a year Work from a stable CVS baseWork from a stable CVS base

Bugs:Bugs: Online bugzillaOnline bugzilla Some small changes handled on mailing listSome small changes handled on mailing list

Tests:Tests: Many based on unittest python libraryMany based on unittest python library Also simple scripts where output is verifiedAlso simple scripts where output is verified

Page 8: Biopython

What can you do with What can you do with Biopython?Biopython?

Read, write & manipulate sequencesRead, write & manipulate sequences Restriction enzymesRestriction enzymes BLAST (local and online)BLAST (local and online) Web databases (e.g. NCBI’s EUtils)Web databases (e.g. NCBI’s EUtils) Call command line tools (e.g. clustalw)Call command line tools (e.g. clustalw) Clustering (Bio.Cluster)Clustering (Bio.Cluster) Phylogenetics (Bio.Nexus)Phylogenetics (Bio.Nexus) Protein Structures (Bio.PDB)Protein Structures (Bio.PDB)

The 8th annual Bioinformatics Open Source ConferenceBiopython Project Update @ BOSC 2007, Vienna,

Austria

Page 9: Biopython

Manipulating SequencesManipulating Sequences

Use Biopython’s Seq object, holds:Use Biopython’s Seq object, holds: Sequence data (string like)Sequence data (string like) Alphabet (can include list of letters)Alphabet (can include list of letters)

Alphabet allows type checking, Alphabet allows type checking, preventing errors like appending preventing errors like appending DNA to ProteinDNA to Protein

The 8th annual Bioinformatics Open Source ConferenceBiopython Project Update @ BOSC 2007, Vienna,

Austria

Page 10: Biopython

Manipulating SequencesManipulating Sequencesfrom Bio.Seq import Seqfrom Bio.Alphabet.IUPAC import unambiguous_dna

my_dna=Seq('CTAAACATCCTTCAT', unambiguous_dna)print 'Original:'print my_dnaprint 'Reverse complement:'print my_dna.reverse_complement()

Original:Seq('CTAAACATCCTTCAT', IUPACUnambiguousDNA())Reverse complement:Seq('ATGAAGGATGTTTAG', IUPACUnambiguousDNA())

The 8th annual Bioinformatics Open Source ConferenceBiopython Project Update @ BOSC 2007, Vienna,

Austria

Page 11: Biopython

Translating SequencesTranslating Sequencesfrom Bio import Translatebact_trans=Translate.unambiguous_dna_by_id[11]

print 'Forward translation'print bact_trans.translate(my_dna)print 'Reverse complement translation'print bact_trans.translate( \ my_dna.reverse_complement())

Forward translationSeq('LNILH', HasStopCodon(IUPACProtein(), '*'))Reverse complement translationSeq('MKDV*', HasStopCodon(IUPACProtein(), '*'))

The 8th annual Bioinformatics Open Source ConferenceBiopython Project Update @ BOSC 2007, Vienna,

Austria

Page 12: Biopython

Sequence Input/OutputSequence Input/Output

Bio.SeqIO is new in Biopython 1.43Bio.SeqIO is new in Biopython 1.43 Inspired by BioPerl’s SeqIOInspired by BioPerl’s SeqIO Works with SeqRecord objectsWorks with SeqRecord objects

(not format specific representations)(not format specific representations) Builds on existing Biopython parsersBuilds on existing Biopython parsers

The 8th annual Bioinformatics Open Source ConferenceBiopython Project Update @ BOSC 2007, Vienna,

Austria

Page 13: Biopython

The 8th annual Bioinformatics Open Source ConferenceBiopython Project Update @ BOSC 2007, Vienna,

Austria

SeqIO – Sequence InputSeqIO – Sequence Inputfrom Bio import SeqIOhandle = open('ls_orchid.fasta')format = 'fasta'for rec in SeqIO.parse(handle, format) : print "%s, len %i" % (rec.id, len(rec.seq)) print rec.seq[:40].tostring() + "..."handle.close()

gi|2765658|emb|Z78533.1|CIZ78533, len 740CGTAACAAGGTTTCCGTAGGTGAACCTGCGGAAGGATCAT...gi|2765657|emb|Z78532.1|CCZ78532, len 753CGTAACAAGGTTTCCGTAGGTGAACCTGCGGAAGGATCAT......

Page 14: Biopython

The 8th annual Bioinformatics Open Source ConferenceBiopython Project Update @ BOSC 2007, Vienna,

Austria

SeqIO – Sequence InputSeqIO – Sequence Inputfrom Bio import SeqIOhandle = open('ls_orchid.gbk')format = 'genbank'for rec in SeqIO.parse(handle, format) : print "%s, len %i" % (rec.id, len(rec.seq)) print rec.seq[:40].tostring() + "..."handle.close()

Z78533.1, len 740CGTAACAAGGTTTCCGTAGGTGAACCTGCGGAAGGATCAT...Z78532.1, len 753CGTAACAAGGTTTCCGTAGGTGAACCTGCGGAAGGATCAT......

Page 15: Biopython

The 8th annual Bioinformatics Open Source ConferenceBiopython Project Update @ BOSC 2007, Vienna,

Austria

SeqIO – Extracting DataSeqIO – Extracting Datafrom Bio import SeqIOhandle = open('ls_orchid.gbk')format = 'genbank'from sets import Setprint Set([rec.annotations['organism'] \ for rec in SeqIO.parse(handle, format)])handle.close()

Set(['Cypripedium acaule', 'Paphiopedilum primulinum', 'Phragmipedium lindenii', 'Paphiopedilum papuanum', 'Paphiopedilum stonei', 'Paphiopedilum urbanianum', 'Paphiopedilum dianthum', ...])

Page 16: Biopython

The 8th annual Bioinformatics Open Source ConferenceBiopython Project Update @ BOSC 2007, Vienna,

Austria

SeqIO – Filtering OutputSeqIO – Filtering Outputi_handle = open('ls_orchid.gbk')o_handle = open('small_orchid.faa', 'w')SeqIO.write([rec for rec in \ SeqIO.parse(i_handle, 'genbank') \ if len(rec.seq) < 600], o_handle, 'fasta')i_handle.close()o_handle.close()

>Z78481.1 P.insigne 5.8S rRNA gene and ITS1 ...CGTAACAAGGTTTCCGTAGGTGAACCTGCGGAAGGATCATTGTT... >Z78480.1 P.gratrixianum 5.8S rRNA gene and ...CGTAACAAGGTTTCCGTAGGTGAACCTGCGGAAGGATCATTGTT......

Page 17: Biopython

3D Structures3D Structures

Bio.Nexus was added in Biopython Bio.Nexus was added in Biopython 1.30 by 1.30 by Frank Kauff and Cymon CoxFrank Kauff and Cymon Cox

Reads Nexus alignments and treesReads Nexus alignments and trees Also parses Newick format treesAlso parses Newick format trees

The 8th annual Bioinformatics Open Source ConferenceBiopython Project Update @ BOSC 2007, Vienna,

Austria

Page 18: Biopython

The 8th annual Bioinformatics Open Source ConferenceBiopython Project Update @ BOSC 2007, Vienna,

Austria

Newick Tree ParsingNewick Tree Parsing(Bovine:0.69395,(Gibbon:0.36079,(Orang:0.33636,(Gorilla:0.17147,(Chimp:0.19268, Human:0.11927):0.08386):0.06124):0.15057):0.54939,Mouse:1.21460):0.10;

from Bio.Nexus.Trees import Treetree_str = open("simple.tree").read()tree_obj = Tree(tree_str)print tree_obj

tree a_tree = (Bovine,(Gibbon,(Orang,(Gorilla,(Chimp,Human)))),Mouse);

Page 19: Biopython

The 8th annual Bioinformatics Open Source ConferenceBiopython Project Update @ BOSC 2007, Vienna,

Austria

Newick Tree ParsingNewick Tree Parsing

# taxon prev succ brlen blen (sum) support0 - None [1,2,11] 0.0 0.0 -1 Bovine 0 [] 0.69395 0.69395 -2 - 0 [3,4] 0.54939 0.54939 -3 Gibbon 2 [] 0.36079 0.91018 -4 - 2 [5,6] 0.15057 0.69996 -5 Orang 4 [] 0.33636 1.03632 -6 - 4 [7,8] 0.06124 0.7612 -7 Gorilla 6 [] 0.17147 0.93267 -8 - 6 [9,10] 0.08386 0.84506 -9 Chimp 8 [] 0.19268 1.03774 -10 Human 8 [] 0.11927 0.96433 -11 Mouse 0 [] 1.2146 1.2146 -

Root: 0

tree_obj.display()

Page 20: Biopython

3D Structures3D Structures

Bio.PDB was added in Biopython 1.24 Bio.PDB was added in Biopython 1.24 by Thomas Hamelryckby Thomas Hamelryck

Reads PDB and CIF format filesReads PDB and CIF format files

The 8th annual Bioinformatics Open Source ConferenceBiopython Project Update @ BOSC 2007, Vienna,

Austria

Page 21: Biopython

Working with 3D StructuresWorking with 3D Structures

http://www.warwick.ac.uk/go/peter_cock/http://www.warwick.ac.uk/go/peter_cock/

python/protein_superposition/python/protein_superposition/

Before: After:This example (online) uses Bio.PDB to align 21 alternative X-Ray crystal structures for PDB structure 1JOY.

The 8th annual Bioinformatics Open Source ConferenceBiopython Project Update @ BOSC 2007, Vienna,

Austria

Page 22: Biopython

Population Genetics Population Genetics (planned)(planned)

Tiago Antão (with Ralph Haygood) plans Tiago Antão (with Ralph Haygood) plans to start a Population Genetics moduleto start a Population Genetics module

See also:See also: PyPop: Python for Population GeneticsPyPop: Python for Population Genetics

Alex Lancaster Alex Lancaster et al.et al. (2003) (2003)www.pypop.orgwww.pypop.org

The 8th annual Bioinformatics Open Source ConferenceBiopython Project Update @ BOSC 2007, Vienna,

Austria

Page 23: Biopython

The 8th annual Bioinformatics Open Source ConferenceBiopython Project Update @ BOSC 2007, Vienna,

Austria

Areas for ImprovementAreas for Improvement Documentation!Documentation! I’m interested in sequences & I’m interested in sequences &

alignments:alignments: Seq objects – more like strings?Seq objects – more like strings? Alignment objects – more like arrays?Alignment objects – more like arrays? SeqIO – support for more formatsSeqIO – support for more formats AlignIO? – alignment equivalent to SeqIOAlignIO? – alignment equivalent to SeqIO

Move from Numeric to NumPyMove from Numeric to NumPy Move from CVS to SVN?Move from CVS to SVN?

Page 24: Biopython

The 8th annual Bioinformatics Open Source ConferenceBiopython Project Update @ BOSC 2007, Vienna,

Austria

How can you Contribute?How can you Contribute? Users:Users:

Discussions on the mailing listDiscussions on the mailing list Report bugsReport bugs Documentation improvementDocumentation improvement

Coders:Coders: Suggest bug fixesSuggest bug fixes New/extended test casesNew/extended test cases Adopt modules with no current ‘owner’Adopt modules with no current ‘owner’ New modulesNew modules

Page 25: Biopython

Biopython Biopython AcknowledgementsAcknowledgements

Open Bioinformatics Foundation or O|B|F Open Bioinformatics Foundation or O|B|F for web hosting, CVS servers, mailing listfor web hosting, CVS servers, mailing list

Biopython developers, including:Biopython developers, including:Jeff Chang, Andrew Dalke, Brad Jeff Chang, Andrew Dalke, Brad Chapman, Iddo Chapman, Iddo FriedbergFriedberg, , Michiel de Michiel de Hoon, Frank Kauff, Cymon Cox, Thomas Hoon, Frank Kauff, Cymon Cox, Thomas Hamelryck, meHamelryck, me

Contributors who report bugs & join in Contributors who report bugs & join in the mailing list discussionsthe mailing list discussions

The 8th annual Bioinformatics Open Source ConferenceBiopython Project Update @ BOSC 2007, Vienna,

Austria

Page 26: Biopython

Personal Personal AcknowledgementsAcknowledgements

Everyone for listeningEveryone for listening Open Bioinformatics Foundation or O|B|Open Bioinformatics Foundation or O|B|

F for the BOSC 2007 invitationF for the BOSC 2007 invitation Iddo Iddo Friedberg &Friedberg & Michiel de Hoon for Michiel de Hoon for

their encouragementtheir encouragement The EPSRC for my PhD funding via theThe EPSRC for my PhD funding via the

MOAC Doctoral Training Centre, MOAC Doctoral Training Centre, WarwickWarwick

http://www.warwick.ac.uk/go/moachttp://www.warwick.ac.uk/go/moacThe 8th annual Bioinformatics Open Source ConferenceBiopython Project Update @ BOSC 2007, Vienna,

Austria

Page 27: Biopython

The 8th annual Bioinformatics Open Source ConferenceBiopython Project Update @ BOSC 2007, Vienna,

Austria

Questions?Questions?