Upload
bosc
View
3.642
Download
5
Tags:
Embed Size (px)
DESCRIPTION
Title: Biopython UpdateAuthor: Peter Cock
Citation preview
The 8th annualBioinformatics Open Source
Conference(BOSC 2007)
18th July, Vienna, Austria
Peter Cock,MOAC Doctoral Training Centre,University of Warwick, UK
Biopython Project Update
The 8th annual Bioinformatics Open Source ConferenceBiopython Project Update @ BOSC 2007, Vienna,
Austria
Talk OutlineTalk Outline What is python?What is python? What is Biopython?What is Biopython? Short historyShort history Project organisationProject organisation What can you do with it?What can you do with it? How can you contribute?How can you contribute? AcknowledgementsAcknowledgements
What is Python?What is Python?
High level programming High level programming languagelanguage
Object orientatedObject orientated Open Source, free ($$$) Open Source, free ($$$) Cross platform:Cross platform:
Linux, Windows, Mac OS X, …Linux, Windows, Mac OS X, … Extensible in C, C++, …Extensible in C, C++, …
The 8th annual Bioinformatics Open Source ConferenceBiopython Project Update @ BOSC 2007, Vienna,
Austria
What is Biopython?What is Biopython?
Set of libraries for computational Set of libraries for computational biologybiology
Open Source, free ($$$)Open Source, free ($$$) Cross platform:Cross platform:
Linux, Windows, Mac OS X, … Linux, Windows, Mac OS X, … Sibling project to BioPerl, BioRuby, Sibling project to BioPerl, BioRuby,
BioJava, …BioJava, …
The 8th annual Bioinformatics Open Source ConferenceBiopython Project Update @ BOSC 2007, Vienna,
Austria
Popularity by Google HitsPopularity by Google Hits PythonPython
PerlPerl
RubyRuby
JavaJava
The 8th annual Bioinformatics Open Source ConferenceBiopython Project Update @ BOSC 2007, Vienna,
Austria
98 million98 million
101 101 millionmillion
101 101 millionmillion
289 289 millionmillion
BiopythonBiopython
BioPerlBioPerl
BioRubyBioRuby
BioJavaBioJava
252,000252,000
610,000610,000
122,000122,000
185,000185,000
BioPerl BioPerl 610,000610,000
Both Perl and Python are strong at textBoth Perl and Python are strong at text Python may have the edge for numerical Python may have the edge for numerical
work (with the Numerical python libraries)work (with the Numerical python libraries)
Biopython historyBiopython history 1999 : Started by Jeff Chang & Andrew 1999 : Started by Jeff Chang & Andrew
DalkeDalke 2000 : Biopython 0.90, first release2000 : Biopython 0.90, first release 2001 : Biopython 1.00, “semi-complete”2001 : Biopython 1.00, “semi-complete” 2002 : Biopython 1.10, “semi-stable”2002 : Biopython 1.10, “semi-stable” 2003 : Biopython 1.20, 1.21, 1.22 and 1.232003 : Biopython 1.20, 1.21, 1.22 and 1.23 2004 : Biopython 1.24 and 1.302004 : Biopython 1.24 and 1.30 2005 : Biopython 1.40 and 1.412005 : Biopython 1.40 and 1.41 2006 : Biopython 1.422006 : Biopython 1.42 2007 : Biopython 1.432007 : Biopython 1.43
The 8th annual Bioinformatics Open Source ConferenceBiopython Project Update @ BOSC 2007, Vienna,
Austria
The 8th annual Bioinformatics Open Source ConferenceBiopython Project Update @ BOSC 2007, Vienna,
Austria
Biopython Project Biopython Project OrganisationOrganisation
Releases:Releases: No fixed scheduleNo fixed schedule Currently once or twice a yearCurrently once or twice a year Work from a stable CVS baseWork from a stable CVS base
Bugs:Bugs: Online bugzillaOnline bugzilla Some small changes handled on mailing listSome small changes handled on mailing list
Tests:Tests: Many based on unittest python libraryMany based on unittest python library Also simple scripts where output is verifiedAlso simple scripts where output is verified
What can you do with What can you do with Biopython?Biopython?
Read, write & manipulate sequencesRead, write & manipulate sequences Restriction enzymesRestriction enzymes BLAST (local and online)BLAST (local and online) Web databases (e.g. NCBI’s EUtils)Web databases (e.g. NCBI’s EUtils) Call command line tools (e.g. clustalw)Call command line tools (e.g. clustalw) Clustering (Bio.Cluster)Clustering (Bio.Cluster) Phylogenetics (Bio.Nexus)Phylogenetics (Bio.Nexus) Protein Structures (Bio.PDB)Protein Structures (Bio.PDB)
The 8th annual Bioinformatics Open Source ConferenceBiopython Project Update @ BOSC 2007, Vienna,
Austria
Manipulating SequencesManipulating Sequences
Use Biopython’s Seq object, holds:Use Biopython’s Seq object, holds: Sequence data (string like)Sequence data (string like) Alphabet (can include list of letters)Alphabet (can include list of letters)
Alphabet allows type checking, Alphabet allows type checking, preventing errors like appending preventing errors like appending DNA to ProteinDNA to Protein
The 8th annual Bioinformatics Open Source ConferenceBiopython Project Update @ BOSC 2007, Vienna,
Austria
Manipulating SequencesManipulating Sequencesfrom Bio.Seq import Seqfrom Bio.Alphabet.IUPAC import unambiguous_dna
my_dna=Seq('CTAAACATCCTTCAT', unambiguous_dna)print 'Original:'print my_dnaprint 'Reverse complement:'print my_dna.reverse_complement()
Original:Seq('CTAAACATCCTTCAT', IUPACUnambiguousDNA())Reverse complement:Seq('ATGAAGGATGTTTAG', IUPACUnambiguousDNA())
The 8th annual Bioinformatics Open Source ConferenceBiopython Project Update @ BOSC 2007, Vienna,
Austria
Translating SequencesTranslating Sequencesfrom Bio import Translatebact_trans=Translate.unambiguous_dna_by_id[11]
print 'Forward translation'print bact_trans.translate(my_dna)print 'Reverse complement translation'print bact_trans.translate( \ my_dna.reverse_complement())
Forward translationSeq('LNILH', HasStopCodon(IUPACProtein(), '*'))Reverse complement translationSeq('MKDV*', HasStopCodon(IUPACProtein(), '*'))
The 8th annual Bioinformatics Open Source ConferenceBiopython Project Update @ BOSC 2007, Vienna,
Austria
Sequence Input/OutputSequence Input/Output
Bio.SeqIO is new in Biopython 1.43Bio.SeqIO is new in Biopython 1.43 Inspired by BioPerl’s SeqIOInspired by BioPerl’s SeqIO Works with SeqRecord objectsWorks with SeqRecord objects
(not format specific representations)(not format specific representations) Builds on existing Biopython parsersBuilds on existing Biopython parsers
The 8th annual Bioinformatics Open Source ConferenceBiopython Project Update @ BOSC 2007, Vienna,
Austria
The 8th annual Bioinformatics Open Source ConferenceBiopython Project Update @ BOSC 2007, Vienna,
Austria
SeqIO – Sequence InputSeqIO – Sequence Inputfrom Bio import SeqIOhandle = open('ls_orchid.fasta')format = 'fasta'for rec in SeqIO.parse(handle, format) : print "%s, len %i" % (rec.id, len(rec.seq)) print rec.seq[:40].tostring() + "..."handle.close()
gi|2765658|emb|Z78533.1|CIZ78533, len 740CGTAACAAGGTTTCCGTAGGTGAACCTGCGGAAGGATCAT...gi|2765657|emb|Z78532.1|CCZ78532, len 753CGTAACAAGGTTTCCGTAGGTGAACCTGCGGAAGGATCAT......
The 8th annual Bioinformatics Open Source ConferenceBiopython Project Update @ BOSC 2007, Vienna,
Austria
SeqIO – Sequence InputSeqIO – Sequence Inputfrom Bio import SeqIOhandle = open('ls_orchid.gbk')format = 'genbank'for rec in SeqIO.parse(handle, format) : print "%s, len %i" % (rec.id, len(rec.seq)) print rec.seq[:40].tostring() + "..."handle.close()
Z78533.1, len 740CGTAACAAGGTTTCCGTAGGTGAACCTGCGGAAGGATCAT...Z78532.1, len 753CGTAACAAGGTTTCCGTAGGTGAACCTGCGGAAGGATCAT......
The 8th annual Bioinformatics Open Source ConferenceBiopython Project Update @ BOSC 2007, Vienna,
Austria
SeqIO – Extracting DataSeqIO – Extracting Datafrom Bio import SeqIOhandle = open('ls_orchid.gbk')format = 'genbank'from sets import Setprint Set([rec.annotations['organism'] \ for rec in SeqIO.parse(handle, format)])handle.close()
Set(['Cypripedium acaule', 'Paphiopedilum primulinum', 'Phragmipedium lindenii', 'Paphiopedilum papuanum', 'Paphiopedilum stonei', 'Paphiopedilum urbanianum', 'Paphiopedilum dianthum', ...])
The 8th annual Bioinformatics Open Source ConferenceBiopython Project Update @ BOSC 2007, Vienna,
Austria
SeqIO – Filtering OutputSeqIO – Filtering Outputi_handle = open('ls_orchid.gbk')o_handle = open('small_orchid.faa', 'w')SeqIO.write([rec for rec in \ SeqIO.parse(i_handle, 'genbank') \ if len(rec.seq) < 600], o_handle, 'fasta')i_handle.close()o_handle.close()
>Z78481.1 P.insigne 5.8S rRNA gene and ITS1 ...CGTAACAAGGTTTCCGTAGGTGAACCTGCGGAAGGATCATTGTT... >Z78480.1 P.gratrixianum 5.8S rRNA gene and ...CGTAACAAGGTTTCCGTAGGTGAACCTGCGGAAGGATCATTGTT......
3D Structures3D Structures
Bio.Nexus was added in Biopython Bio.Nexus was added in Biopython 1.30 by 1.30 by Frank Kauff and Cymon CoxFrank Kauff and Cymon Cox
Reads Nexus alignments and treesReads Nexus alignments and trees Also parses Newick format treesAlso parses Newick format trees
The 8th annual Bioinformatics Open Source ConferenceBiopython Project Update @ BOSC 2007, Vienna,
Austria
The 8th annual Bioinformatics Open Source ConferenceBiopython Project Update @ BOSC 2007, Vienna,
Austria
Newick Tree ParsingNewick Tree Parsing(Bovine:0.69395,(Gibbon:0.36079,(Orang:0.33636,(Gorilla:0.17147,(Chimp:0.19268, Human:0.11927):0.08386):0.06124):0.15057):0.54939,Mouse:1.21460):0.10;
from Bio.Nexus.Trees import Treetree_str = open("simple.tree").read()tree_obj = Tree(tree_str)print tree_obj
tree a_tree = (Bovine,(Gibbon,(Orang,(Gorilla,(Chimp,Human)))),Mouse);
The 8th annual Bioinformatics Open Source ConferenceBiopython Project Update @ BOSC 2007, Vienna,
Austria
Newick Tree ParsingNewick Tree Parsing
# taxon prev succ brlen blen (sum) support0 - None [1,2,11] 0.0 0.0 -1 Bovine 0 [] 0.69395 0.69395 -2 - 0 [3,4] 0.54939 0.54939 -3 Gibbon 2 [] 0.36079 0.91018 -4 - 2 [5,6] 0.15057 0.69996 -5 Orang 4 [] 0.33636 1.03632 -6 - 4 [7,8] 0.06124 0.7612 -7 Gorilla 6 [] 0.17147 0.93267 -8 - 6 [9,10] 0.08386 0.84506 -9 Chimp 8 [] 0.19268 1.03774 -10 Human 8 [] 0.11927 0.96433 -11 Mouse 0 [] 1.2146 1.2146 -
Root: 0
tree_obj.display()
3D Structures3D Structures
Bio.PDB was added in Biopython 1.24 Bio.PDB was added in Biopython 1.24 by Thomas Hamelryckby Thomas Hamelryck
Reads PDB and CIF format filesReads PDB and CIF format files
The 8th annual Bioinformatics Open Source ConferenceBiopython Project Update @ BOSC 2007, Vienna,
Austria
Working with 3D StructuresWorking with 3D Structures
http://www.warwick.ac.uk/go/peter_cock/http://www.warwick.ac.uk/go/peter_cock/
python/protein_superposition/python/protein_superposition/
Before: After:This example (online) uses Bio.PDB to align 21 alternative X-Ray crystal structures for PDB structure 1JOY.
The 8th annual Bioinformatics Open Source ConferenceBiopython Project Update @ BOSC 2007, Vienna,
Austria
Population Genetics Population Genetics (planned)(planned)
Tiago Antão (with Ralph Haygood) plans Tiago Antão (with Ralph Haygood) plans to start a Population Genetics moduleto start a Population Genetics module
See also:See also: PyPop: Python for Population GeneticsPyPop: Python for Population Genetics
Alex Lancaster Alex Lancaster et al.et al. (2003) (2003)www.pypop.orgwww.pypop.org
The 8th annual Bioinformatics Open Source ConferenceBiopython Project Update @ BOSC 2007, Vienna,
Austria
The 8th annual Bioinformatics Open Source ConferenceBiopython Project Update @ BOSC 2007, Vienna,
Austria
Areas for ImprovementAreas for Improvement Documentation!Documentation! I’m interested in sequences & I’m interested in sequences &
alignments:alignments: Seq objects – more like strings?Seq objects – more like strings? Alignment objects – more like arrays?Alignment objects – more like arrays? SeqIO – support for more formatsSeqIO – support for more formats AlignIO? – alignment equivalent to SeqIOAlignIO? – alignment equivalent to SeqIO
Move from Numeric to NumPyMove from Numeric to NumPy Move from CVS to SVN?Move from CVS to SVN?
The 8th annual Bioinformatics Open Source ConferenceBiopython Project Update @ BOSC 2007, Vienna,
Austria
How can you Contribute?How can you Contribute? Users:Users:
Discussions on the mailing listDiscussions on the mailing list Report bugsReport bugs Documentation improvementDocumentation improvement
Coders:Coders: Suggest bug fixesSuggest bug fixes New/extended test casesNew/extended test cases Adopt modules with no current ‘owner’Adopt modules with no current ‘owner’ New modulesNew modules
Biopython Biopython AcknowledgementsAcknowledgements
Open Bioinformatics Foundation or O|B|F Open Bioinformatics Foundation or O|B|F for web hosting, CVS servers, mailing listfor web hosting, CVS servers, mailing list
Biopython developers, including:Biopython developers, including:Jeff Chang, Andrew Dalke, Brad Jeff Chang, Andrew Dalke, Brad Chapman, Iddo Chapman, Iddo FriedbergFriedberg, , Michiel de Michiel de Hoon, Frank Kauff, Cymon Cox, Thomas Hoon, Frank Kauff, Cymon Cox, Thomas Hamelryck, meHamelryck, me
Contributors who report bugs & join in Contributors who report bugs & join in the mailing list discussionsthe mailing list discussions
The 8th annual Bioinformatics Open Source ConferenceBiopython Project Update @ BOSC 2007, Vienna,
Austria
Personal Personal AcknowledgementsAcknowledgements
Everyone for listeningEveryone for listening Open Bioinformatics Foundation or O|B|Open Bioinformatics Foundation or O|B|
F for the BOSC 2007 invitationF for the BOSC 2007 invitation Iddo Iddo Friedberg &Friedberg & Michiel de Hoon for Michiel de Hoon for
their encouragementtheir encouragement The EPSRC for my PhD funding via theThe EPSRC for my PhD funding via the
MOAC Doctoral Training Centre, MOAC Doctoral Training Centre, WarwickWarwick
http://www.warwick.ac.uk/go/moachttp://www.warwick.ac.uk/go/moacThe 8th annual Bioinformatics Open Source ConferenceBiopython Project Update @ BOSC 2007, Vienna,
Austria
The 8th annual Bioinformatics Open Source ConferenceBiopython Project Update @ BOSC 2007, Vienna,
Austria
Questions?Questions?