Project Update Eric Talevich, Peter Cock, Brad Chapman, João Rodrigues, and Biopython contributors Bioinformatics Open Source Conference (BOSC) July 14, 2012 Long Beach, California, USA

Biopython Project Update (BOSC 2012)

Highlights of the Biopython project for computational biology, 2011-2012: Artemis-like genome track comparison with GenomeDiagram, new formats for SeqIO, phylogenetics with Bio.Phylo, Bio.PDB improvements, and an update on Google Summer of Code (GSoC) projects.

Page 1: Biopython Project Update (BOSC 2012)

Project Update

Eric Talevich, Peter Cock, Brad Chapman, João Rodrigues,

and Biopython contributors

Bioinformatics Open Source Conference (BOSC)

July 14, 2012

Long Beach, California, USA

Page 2: Biopython Project Update (BOSC 2012)

Hello, BOSC

Biopython is a freely available Python library for biological computation, and a long-running, distributed collaboration to produce and maintain it [1].

● Supported by the Open Bioinformatics Foundation (OBF)
● "This is Python's Bio* library. There are several Bio* libraries like it, but this one is ours."
● http://biopython.org/

_____[1] Cock, P.J.A., Antao, T., Chang, J.T., Chapman, B.A., Cox, C.J., Dalke, A., Friedberg, I., Hamelryck, T., Kauff, F., Wilczynski, B., de Hoon, M.J. (2009) Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics 25(11) 1422-3. doi:10.1093/bioinformatics/btp163

Page 3: Biopython Project Update (BOSC 2012)

Bio.Graphics (Biopython 1.59, February 2012)

New features in...

BasicChromosome:
● Draw simple sub-features on chromosome segments
● Show the position of genes, SNPs or other loci

GenomeDiagram [2]:
● Cross-links between tracks
● Track-specific start/end positions for showing regions

_____[2] Pritchard, L., White, J.A., Birch, P.R., Toth, I. (2006) GenomeDiagram: a python package for the visualization of large-scale genomic data. Bioinformatics 22(5) 616-7. doi:10.1093/bioinformatics/btk021

Page 4: Biopython Project Update (BOSC 2012)

BasicChromosome: Potato NB-LRRs

Jupe et al. (2012) BMC Genomics

Page 6: Biopython Project Update (BOSC 2012)

GenomeDiagram imitates Artemis Comparison Tool (ACT)

Page 7: Biopython Project Update (BOSC 2012)

SeqIO and AlignIO (Biopython 1.58, August 2011)

● SeqXML format [3]

● Read support for ABI chromatogram files (Wibowo A.)

● "phylip-relaxed" format (Connor McCoy, Brandon I.)
  ○ Relaxes the 10-character limit on taxon names
  ○ Space-delimited instead
  ○ Used in RAxML, PhyML, PAML, etc.

_____[3] Schmitt et al. (2011) SeqXML and OrthoXML: standards for sequence and orthology information. Briefings in Bioinformatics 12(5): 485-488. doi:10.1093/bib/bbr025
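
To make the "phylip-relaxed" bullets concrete, here is a minimal standard-library sketch of how strict and relaxed sequential PHYLIP lines differ. The function name `write_phylip` and the sample taxon names are hypothetical, chosen for illustration. This is not Biopython's implementation. In Biopython itself the format is selected by passing the format name "phylip-relaxed" to AlignIO.

```python
def write_phylip(records, relaxed=True):
    """Return sequential PHYLIP text for a list of (name, sequence) pairs.

    Hypothetical helper for illustration only.
    """
    lengths = {len(seq) for _, seq in records}
    if len(lengths) != 1:
        raise ValueError("all sequences in an alignment must have equal length")
    # Header line: number of taxa, then alignment length.
    lines = [f" {len(records)} {lengths.pop()}"]
    for name, seq in records:
        if relaxed:
            # Relaxed: the full name is kept; whitespace ends the name,
            # so the name itself must not contain any.
            if any(ch.isspace() for ch in name):
                raise ValueError(f"taxon name contains whitespace: {name!r}")
            lines.append(f"{name} {seq}")
        else:
            # Strict: the name occupies exactly 10 columns,
            # truncated or space-padded as needed.
            lines.append(f"{name[:10]:<10}{seq}")
    return "\n".join(lines) + "\n"


records = [("Homo_sapiens_chr1", "ACGT"), ("Mus_musculus", "ACGA")]
print(write_phylip(records, relaxed=False))
print(write_phylip(records, relaxed=True))
```

In the strict output both names are cut to 10 characters, which is how long taxon labels can collide or become unreadable. The relaxed output keeps them whole.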

Page 8: Biopython Project Update (BOSC 2012)

Bio.Phylo & pypaml

● PAML interop: wrappers, I/O, glue
  ○ Merged Brandon Invergo's pypaml as Bio.Phylo.PAML (Biopython 1.58, August 2011)

● Phylo.draw improvements

● RAxML wrapper (Biopython 1.60, June 2012)

● Paper in review [4]

_____[4] Talevich, E., Invergo, B.M., Cock, P.J.A., Chapman, B.A. (2012) Bio.Phylo: a unified toolkit for processing, analysis and visualization of phylogenetic data in Biopython. BMC Bioinformatics 13:209. doi:10.1186/1471-2105-13-209

Page 9: Biopython Project Update (BOSC 2012)

Phylo.draw and matplotlib

Page 10: Biopython Project Update (BOSC 2012)

Bio.bgzf (Blocked GNU Zip Format)

● BGZF is a GZIP variant that compresses data in independent blocks of bounded size (at most 64 KiB each)

● Used in Next Generation Sequencing for efficient random access to compressed files
  ○ SAM + BGZF = BAM

Bio.SeqIO can now index BGZF compressed sequence files. (Biopython 1.60, June 2012)
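
The random-access idea behind BGZF can be sketched with the standard library alone. The example below is an illustration under stated assumptions, not the Bio.bgzf implementation: it writes data as concatenated gzip members and addresses a position with a "virtual offset" that packs the compressed start of a block into the high bits and the position inside the decompressed block into the low 16 bits. The helper names and the tiny 16-byte block size are hypothetical. Real BGZF also records each block's size in the gzip header, which this sketch omits.

```python
import gzip
import zlib


def write_blocks(data, block_size=16):
    """Compress data as concatenated gzip members.

    Returns the compressed bytes and the start offset of each block.
    The 16-byte default is only for demonstration.
    """
    blob = bytearray()
    starts = []
    for i in range(0, len(data), block_size):
        starts.append(len(blob))
        blob += gzip.compress(data[i:i + block_size])
    return bytes(blob), starts


def make_virtual_offset(block_start, within_block):
    """Pack a compressed block start and an in-block offset into one integer."""
    if not 0 <= within_block < 65536:
        raise ValueError("within-block offset must fit in 16 bits")
    return (block_start << 16) | within_block


def split_virtual_offset(virtual_offset):
    """Inverse of make_virtual_offset."""
    return virtual_offset >> 16, virtual_offset & 0xFFFF


def read_at(blob, virtual_offset, size):
    """Read up to `size` bytes from one block, decompressing only that block."""
    block_start, within_block = split_virtual_offset(virtual_offset)
    # wbits = MAX_WBITS | 16 tells zlib to expect a gzip header; the
    # decompressor stops at the end of the first member it sees.
    decompressor = zlib.decompressobj(zlib.MAX_WBITS | 16)
    block = decompressor.decompress(blob[block_start:])
    # This sketch does not continue into the next block.
    return block[within_block:within_block + size]


data = b"ACGT" * 10
blob, starts = write_blocks(data)
offset = make_virtual_offset(starts[1], 3)
print(read_at(blob, offset, 5))
```

Because every block is a complete gzip member, the whole file is still readable by ordinary gzip tools, while an index of virtual offsets allows jumping straight to a record.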

Page 11: Biopython Project Update (BOSC 2012)

TogoWS (Biopython 1.59, February 2012)

● TogoWS is an integrated web resource for bioinformatics databases and services

● Provided by the Database Center for Life Science in Japan

● Usage is similar to NCBI Entrez

_____http://togows.dbcls.jp/

Page 12: Biopython Project Update (BOSC 2012)

PyPy and Python 3

Biopython:
● works well on PyPy 1.9 (excluding NumPy & C extensions)
● works on Python 3 (excluding some C extensions), but concerns remain about performance in default Unicode mode.
  ○ Currently 'beta' level support.

Page 13: Biopython Project Update (BOSC 2012)

Bio.PDB

● mmCIF parser restored (Biopython 1.60, June 2012)
  ○ Lenna Peterson fixed a 4-year-old lex/yacc-related compilation issue
  ○ That was awesome
  ○ Now she's a GSoC student
  ○ Py3/PyPy/Jython compatibility in progress

● Merging GSoC results incrementally
  ○ Atom element names & weights (João Rodrigues, GSoC 2010)
  ○ Lots of feature branches remaining...

Page 14: Biopython Project Update (BOSC 2012)

Bio.PDB feature branches

[Figure: timeline of Bio.PDB feature branches by GSoC year ('10, '11, '12, ...). Branches shown: mmCIF Parser, Bio.Struct, InterfaceAnalysis, Mocapy++, Generic Features, PDBParser]

Page 15: Biopython Project Update (BOSC 2012)

Google Summer of Code (GSoC)

In 2011, Biopython had three projects funded via the OBF:
● Mikael Trellet (Bio.PDB)
● Michele Silva (Bio.PDB, Mocapy++)
● Justinas Daugmaudis (Mocapy++)

In 2012, we have two projects via the OBF:
● Wibowo Arindrarto (SearchIO)
● Lenna Peterson (Variants)

_____http://biopython.org/wiki/Google_Summer_of_Code
http://www.open-bio.org/wiki/Google_Summer_of_Code
https://www.google-melange.com/

Page 16: Biopython Project Update (BOSC 2012)

GSoC 2011: Mikael Trellet

Biomolecular interfaces in Bio.PDB
Mentor: João Rodrigues

● Representation of protein-protein interfaces: SM(I)CRA
● Determining interfaces from PDB coordinates
● Analyses of these objects

_____http://biopython.org/wiki/GSoC2011_mtrellet
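
The slides do not say how the project determines an interface from coordinates. One common approach, shown here purely as an assumption, is a distance cutoff: two residues from different chains are in contact if any pair of their atoms lies within a few ångströms. The function name, the input layout (a dict of residue id to atom coordinates) and the 5.0 Å default are all hypothetical and are not the Bio.PDB API.

```python
import math
from itertools import product


def interface_residues(chain_a, chain_b, cutoff=5.0):
    """Return the set of (residue_a, residue_b) pairs in contact.

    Each chain is a dict mapping a residue id to a list of (x, y, z)
    atom coordinates in angstroms. Hypothetical helper for illustration.
    """
    contacts = set()
    for (res_a, atoms_a), (res_b, atoms_b) in product(chain_a.items(), chain_b.items()):
        # A residue pair counts as a contact if any atom pair is within the cutoff.
        if any(math.dist(p, q) <= cutoff for p, q in product(atoms_a, atoms_b)):
            contacts.add((res_a, res_b))
    return contacts


# Toy coordinates, not taken from a real structure.
chain_a = {"A:1": [(0.0, 0.0, 0.0)], "A:2": [(20.0, 0.0, 0.0)]}
chain_b = {"B:1": [(3.0, 0.0, 0.0)], "B:2": [(50.0, 0.0, 0.0)]}
print(interface_residues(chain_a, chain_b))
```

The all-against-all comparison is quadratic in the number of atoms. For whole structures a spatial index would be the usual choice instead.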

Page 17: Biopython Project Update (BOSC 2012)

GSoC 2011: Michele Silva

Python/Biopython bindings for Mocapy++
Mentor: Thomas Hamelryck

Michele Silva wrote a Python bridge for Mocapy++ and linked it to Bio.PDB to enable statistical analysis of protein structures.

More-or-less ready to merge after the next Mocapy++ release.

_____http://biopython.org/wiki/GSOC2011_Mocapy

Page 18: Biopython Project Update (BOSC 2012)

GSoC 2011: Justinas Daugmaudis

Mocapy extensions in Python
Mentor: Thomas Hamelryck

Enhance Mocapy++ in a complementary way by developing a plugin system that lets users easily write new nodes (probability distribution functions) in Python.

Justinas is finishing this as part of his master's thesis project with Thomas Hamelryck.

_____http://biopython.org/wiki/GSOC2011_MocapyExt

Page 19: Biopython Project Update (BOSC 2012)

GSoC 2012: Lenna Peterson

Diff My DNA: Development of a Genomic Variant Toolkit for Biopython
Mentors: Brad Chapman, James Casbon

● I/O for VCF, GVF formats
● Internal schema for variant data

_____http://arklenna.tumblr.com/tagged/gsoc2012

Page 20: Biopython Project Update (BOSC 2012)

GSoC 2012: Wibowo Arindrarto

SearchIO implementation in Biopython
Mentor: Peter Cock

Unified, BioPerl-like API for search results from BLAST, HMMer, FASTA, etc.

_____http://biopython.org/wiki/SearchIO
http://bow.web.id/blog/tag/gsoc/

Page 21: Biopython Project Update (BOSC 2012)

Thanks

● OBF
● BOSC organizers
● Biopython contributors
● Scientists like you

Check us out:
● Website: http://biopython.org
● Code: https://github.com/biopython/biopython