Biopython Project Update (BOSC 2012)

Preview:

DESCRIPTION

Highlights of the Biopython project for computational biology, 2011-2012: Artemis-like genome track comparison with GenomeDiagram, new formats for SeqIO, phylogenetics with Bio.Phylo, Bio.PDB improvements, and an update on Google Summer of Code (GSoC) projects.

Citation preview

Project Update

Eric Talevich, Peter Cock, Brad Chapman, João Rodrigues,

and Biopython contributors

Bioinformatics Open Source Conference (BOSC)July 14, 2012

Long Beach, California, USA

Hello, BOSC

Biopython is a freely available Python library for biological computation, and a long-running, distributed collaboration to produce and maintain it [1].● Supported by the Open Bioinformatics Foundation

(OBF)● "This is Python's Bio* library. There are several Bio*

libraries like it, but this one is ours."● http://biopython.org/_____[1] Cock, P.J.A., Antao, T., Chang, J.T., Chapman, B.A., Cox, C.J., Dalke, A., Friedberg, I., Hamelryck, T., Kauff, F., Wilczynski, B., de Hoon, M.J. (2009) Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics 25(11) 1422-3. doi:10.1093/bioinformatics/btp163

Bio.Graphics (Biopython 1.59, February 2012)

New features in...BasicChromosome:

● Draw simple sub-features on chromosome segments● Show the position of genes, SNPs or other loci

GenomeDiagram [2]:● Cross-links between tracks● Track-specific start/end positions for showing regions

_____[2] Pritchard, L., White, J.A., Birch, P.R., Toth, I. (2010) GenomeDiagram: a python package for the visualization of large-scale genomic data. Bioinformatics 2(5) 616-7.doi:10.1093/bioinformatics/btk021

BasicChromosome: Potato NB-LRRs

Jupe et al. (2012) BMC Genomics

GenomeDiagram imitatesArtemis Comparison Tool (ACT)

SeqIO and AlignIO(Biopython 1.58, August 2011)

● SeqXML format [3]

● Read support for ABI chromatogram files (Wibowo A.)

● "phylip-relaxed" format (Connor McCoy, Brandon I.)○ Relaxes the 10-character limit on taxon names○ Space-delimited instead○ Used in RAxML, PhyML, PAML, etc.

_____[3] Schmitt et al. (2011) SeqXML and OrthoXML: standards for sequence and orthology information. Briefings in Bioinformatics 12(5): 485-488. doi:10.1093/bib/bbr025

Bio.Phylo & pypaml

● PAML interop: wrappers, I/O, glue○ Merged Brandon Invergo’s pypaml as

Bio.Phylo.PAML (Biopython 1.58, August 2011)

● Phylo.draw improvements

● RAxML wrapper (Biopython 1.60, June 2012)

● Paper in review [4]

_____[4] Talevich, E., Invergo, B.M., Cock, P.J.A., Chapman, B.A. (2012) Bio.Phylo: a unified toolkit for processing, analysis and visualization of phylogenetic data in Biopython. BMC Bioinformatics 13:209. doi:10.1186/1471-2105-13-209

Phylo.draw and matplotlib

Bio.bgzf (Blocked GNU Zip Format)

● BGZF is a GZIP variant that compresses blocks of a fixed, known size

● Used in Next Generation Sequencing for efficient random access to compressed files○ SAM + BGZF = BAM

Bio.SeqIO can now index BGZF compressed sequence files. (Biopython 1.60, June 2012)

TogoWS(Biopython 1.59, February 2012)

● TogoWS is an integrated web resource for bioinformatics databases and services

● Provided by the Database Center for Life Science in Japan

● Usage is similar to NCBI Entrez

_____http://togows.dbcls.jp/

PyPy and Python 3

Biopython:● works well on PyPy 1.9

(excluding NumPy & C extensions)● works on Python 3 (excluding some C

extensions), but concerns remain about performance in default unicode mode.○ Currently 'beta' level support.

Bio.PDB

● mmCIF parser restored (Biopython 1.60, June 2012)○ Lenna Peterson fixed a 4-year-old lex/yacc-related

compilation issue○ That was awesome○ Now she's a GSoC student○ Py3/PyPy/Jython compatibility in progress

● Merging GSoC results incrementally○ Atom element names & weights (João Rodrigues,

GSoC 2010)○ Lots of feature branches remaining...

Bio.PDB feature branches

'10 '11 '12 ...

GSOC

mmCIF Parser

Bio.Struct

InterfaceAnalysis

Mocapy++Generic Features

PDBParser

Google Summer of Code (GSoC)

In 2011, Biopython had three projects funded via the OBF:● Mikael Trellet (Bio.PDB)● Michele Silva (Bio.PDB, Mocapy++)● Justinas Daugmaudis (Mocapy++)

In 2012, we have two projects via the OBF:● Wibowo Arindrarto: (SearchIO)● Lenna Peterson: (Variants)

_____http://biopython.org/wiki/Google_Summer_of_Codehttp://www.open-bio.org/wiki/Google_Summer_of_Codehttps://www.google-melange.com/

GSoC 2011: Mikael Trellet

Biomolecular interfaces in Bio.PDBMentor: João Rodrigues

● Representation of protein-protein interfaces: SM(I)CRA

● Determining interfaces from PDB coordinates● Analyses of these objects

_____http://biopython.org/wiki/GSoC2011_mtrellet

GSoC 2011: Michele Silva

Python/Biopython bindings for Mocapy++Mentor: Thomas Hamelryck

Michele Silva wrote a Python bridge for Mocapy++ and linked it to Bio.PDB to enable statistical analysis of protein structures.

More-or-less ready to merge after the next Mocapy++ release._____http://biopython.org/wiki/GSOC2011_Mocapy

Mocapy extensions in PythonMentor: Thomas Hamelryck

Enhance Mocapy++ in a complementary way, developing a plugin system for Mocapy++ allowing users to easily write new nodes (probability distribution functions) in Python.

He's finishing this as part of his master's thesis project with Thomas Hamelryck._____http://biopython.org/wiki/GSOC2011_MocapyExt

GSoC 2011: Justinas Daugmaudis

GSoC 2012: Lenna Peterson

Diff My DNA: Development of a Genomic Variant Toolkit for BiopythonMentors: Brad Chapman, James Casbon

● I/O for VCF, GVF formats● internal schema for variant data

_____http://arklenna.tumblr.com/tagged/gsoc2012

GSoC 2012: Wibowo Arindrarto

SearchIO implementation in BiopythonMentor: Peter Cock

Unified, BioPerl-like API for search results from BLAST, HMMer, FASTA, etc.

_____http://biopython.org/wiki/SearchIOhttp://bow.web.id/blog/tag/gsoc/

Thanks

● OBF● BOSC organizers● Biopython contributors● Scientists like you

Check us out:● Website: http://biopython.org● Code: https://github.com/biopython/biopython

Recommended