Bioinformatics Open Source Conference 2013 @ Berlin, Germany (July 19)

BioRuby updates
Power of modularity in the community-based open source development model

Toshiaki Katayama
http://jp.linkedin.com/in/toshiakikatayama
Database Center for Life Science (DBCLS), Research Organization of Information and Systems (ROIS), Japan

Pjotr Prins, University Medical Center Utrecht, Netherlands
Raoul Bonnal, Istituto Nazionale Genetica Molecolare, Italy
Francesco Strozzi, Parco Tecnologico Padano, Italy
Naohisa Goto, Osaka University, Japan



Page 2: BioRuby stairway to freedom

[Timeline figure, 2000 to 2013]

Milestones shown on the timeline:
• Project started (http://bioruby.org/)
• O|B|F BioHackathons
• Open Bio Japan (http://open-bio.jp/)
• Grant by IPA
• BioRuby-1.0
• NBDC/DBCLS BioHackathons (http://biohackathon.org/)
• BioRuby paper
• Biogem paper
• BioRuby-1.5.0
• GitHub
• Biogem
• GSoC
• Joined to O|B|F

Steps of the stairway:
• Free to Use/Copy/Modify
• Fork (GitHub)
• Extend/Distribute (Biogem)
• Join w/ approval (CVS/SVN commits)

Page 3: Biogem ecosystem and beyond

[Figure: layers of the ecosystem with their starting years]
• Ruby (1995~)
• BioRuby (2000~)
• Biogem (2010~)
• BioInterchange (2012~)
• BaseSpace (2013~)

Page 4: Biogem lowered the entry barrier

• Users can freely develop their own libs/apps and distribute them.
• BioRuby core can concentrate on its stability and compatibility.

Biogem developer:
  % gem install bio-gem
  % biogem yourapp
  % cd bioruby-yourapp
  # develop lib/*.rb and/or bin/* as you like
  % bundle exec release
  # will make yourapp available on GitHub.com and Rubygems.org

Biogem user:
  % gem install bio-yourapp

That's it!

Page 5: http://biogems.info

>80 Biogems have been released so far

• BioRuby: >44K DLs
• Biogems vary from ~100 to 100K DLs

Page 6: Biogem packages break-down

[Bar chart: number of biogems per category, axis from 0 to 40]
Categories: Core/Meta, NGS, Seq analysis, Tools, Web services, Semantic Web, Phyloinformatics, Chemoinformatics, Visualization, Grid computing

>20 new biogems have been developed since BOSC2012

Page 7: Biogem packages

NGS
• bio-ngs
• bio-samtools
• bio-gadget
• bio-faster
• bio-sambamba
• bio-tabix
• bio-bgzf
• bio-gngm
• ...

Web services
• bio-ucsc-api
• intermine
• entrez, eutils
• ruby-ensembl-api
• bio-dbsnp
• bio-chembl
• ...

Phyloinformatics
• biodiversity
• name-spotter
• bio-nexml
• bio-phyloxml
• ...

Seq analysis
• bio-gff3
• bio-hmmer3
• bio-maf
• bio-phyta
• bio-alignment
• bio-signalp
• bio-tm-hmm
• bio-isoelectric-point
• bio-genomic-interval
• bio-kmer-counter
• bio-restriction-enzyme
• ...

Others
• bio-svgenes
• biointerchange
• ...

Page 8: bio-ucsc-api by Mishima H.

  require 'bio-ucsc'

Automatically maps the UCSC MySQL schema to Ruby classes via ActiveRecord (Rails):
Bio::Ucsc::DB::Table (e.g., Bio::Ucsc::Hg19::Snp132)

Now integrated in TogoWS (you don't need to code!)

http://togows.org/api/ucsc/database/chromosomal-position[.format]
→ http://togows.org/api/ucsc/hg19/chr1:107,599,267-107,601,915.fasta

http://togows.org/api/ucsc/database/table/[column=]query[.format]/offset,limit
→ http://togows.org/api/ucsc/hg19/refGene/name2=UVSSA.json
→ http://togows.org/api/ucsc/hg19/snp137/chrom=chr22;refUCSC=A/1,10
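The two URL patterns above can be assembled mechanically. The following is a minimal sketch that only builds the URL strings and does not contact TogoWS. The helper names `ucsc_region_url` and `ucsc_table_url` are hypothetical and are not part of bio-ucsc-api or of any TogoWS client.

```ruby
# Hypothetical helpers: they only assemble strings that follow the two
# TogoWS URL patterns shown on the slide. No request is sent.
TOGOWS_UCSC = 'http://togows.org/api/ucsc'

# database/chromosomal-position[.format]
def ucsc_region_url(database, position, format = nil)
  url = "#{TOGOWS_UCSC}/#{database}/#{position}"
  format ? "#{url}.#{format}" : url
end

# database/table/[column=]query[.format]/offset,limit
def ucsc_table_url(database, table, conditions, format: nil, offset: nil, limit: nil)
  # Several column=value conditions are joined with ';' as in the snp137 example.
  query = conditions.map { |column, value| "#{column}=#{value}" }.join(';')
  url = "#{TOGOWS_UCSC}/#{database}/#{table}/#{query}"
  url += ".#{format}" if format
  url += "/#{offset},#{limit}" if offset && limit
  url
end

region = ucsc_region_url('hg19', 'chr1:107,599,267-107,601,915', 'fasta')
gene   = ucsc_table_url('hg19', 'refGene', { 'name2' => 'UVSSA' }, format: 'json')
snps   = ucsc_table_url('hg19', 'snp137', { 'chrom' => 'chr22', 'refUCSC' => 'A' },
                        offset: 1, limit: 10)

puts region, gene, snps
```

The three printed URLs reproduce the three examples on the slide, so each one can be pasted into a browser or handed to any HTTP client.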

Page 9: bio-gadget by Katayama S. (not me:)

NGS analysis package to handle RNA-Seq data with UMI+barcode+adaptor reads

References:
• Islam S, Kjällquist U, Moliner A, Zajac P, Fan J-B, Lönnerberg P, et al. Highly multiplexed and strand-specific single-cell RNA 5' end sequencing. Nat Protoc. 2012 May;7(5):813-828. DOI: 10.1038/nprot.2012.022  PMID: 22481528
  http://www.nature.com/nprot/journal/v7/n5/full/nprot.2012.022.html
• Kivioja T, Vähärautio A, Karlsson K, Bonke M, Enge M, Linnarsson S, et al. Counting absolute numbers of molecules using unique molecular identifiers. Nat Methods. 2011 Nov 20;9(1):72-74. DOI: 10.1038/nmeth.1778  PMID: 22101854
  http://www.nature.com/nmeth/journal/v9/n1/full/nmeth.1778.html

  % gem install bio-gadget
  % bio-gadget <task>

Available tasks
• dedup :: Deduplicate fastq (via STDIN)
• demlt :: Demultiplex fastq by barcodes
• fqxz :: automatic (re)compression of *.fq(.gz|.bz2) files
• qvstat :: Statistics of quality values in *.qual file
• rgt2mtx :: Convert cuffdiff read group tracking file into tab-separated matrix
• wig5p :: Convert bam-format alignments into wig-format table
• wigchr :: Extract wiggle track on specified chromosome

[Figure labels: mixed-reads, demultiplexed]
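To make the idea behind a task like demlt concrete, here is a small sketch of barcode demultiplexing in plain Ruby. It is an illustration only and not bio-gadget's implementation. It assumes that every read starts with its barcode, that all barcodes have the same length, and that reads are 4-line FASTQ records. The reads, barcodes and sample names are made up.

```ruby
require 'stringio'

# Illustration only, not bio-gadget's code. Assumes the barcode is the
# leading part of each read and that all barcodes share one length.
def demultiplex(io, barcode_to_sample)
  barcode_length = barcode_to_sample.keys.first.length
  bins = Hash.new { |hash, sample| hash[sample] = [] }
  # A FASTQ record is four lines: header, sequence, separator, quality.
  io.each_line.each_slice(4) do |record|
    header, sequence, _plus, quality = record.map(&:chomp)
    sample = barcode_to_sample.fetch(sequence[0, barcode_length], 'undetermined')
    # Store the read with the barcode trimmed from sequence and quality.
    bins[sample] << [header, sequence[barcode_length..-1], quality[barcode_length..-1]]
  end
  bins
end

# Made-up reads for the example.
fastq = StringIO.new(<<~'FASTQ')
  @read1
  ACGTTTTTGGGG
  +
  IIIIHHHHGGGG
  @read2
  TGCAAAAACCCC
  +
  IIIIHHHHGGGG
  @read3
  NNNNACGTACGT
  +
  ####IIIIIIII
FASTQ

bins = demultiplex(fastq, { 'ACGT' => 'sample_A', 'TGCA' => 'sample_B' })
bins.each { |sample, reads| puts "#{sample}: #{reads.map(&:first).join(', ')}" }
```

Reads whose leading bases match no known barcode end up in an "undetermined" bin instead of being dropped, which makes it easy to see how many reads were lost to sequencing errors in the barcode.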

Page 10: bio-gngm by MacLean D. et al.

Another NGS analysis package to detect causative SNPs affecting WT/mutant phenotypes

http://bar.utoronto.ca/ngm/description.html

GNGM = Generalised NGM (Next-generation EMS mutation mapping)

1. Mapping to reference genome
2. Calculating and grouping allele frequencies
3. Find candidate positions of causative SNPs
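Steps 2 and 3 can be pictured with a toy calculation. The sketch below is not the bio-gngm algorithm or API. It assumes that per-position read counts are already available from the mapping step, computes the alternative-allele frequency for each SNP, and counts high-frequency SNPs per fixed-size window. The `Snp` struct, the window size, the 0.9 threshold and all numbers are assumptions chosen for illustration.

```ruby
# Toy model, not bio-gngm: a SNP with read counts for both alleles.
Snp = Struct.new(:position, :ref_reads, :alt_reads) do
  # Fraction of reads that support the alternative allele.
  def allele_frequency
    alt_reads.to_f / (ref_reads + alt_reads)
  end
end

# Group SNPs with a high alternative-allele frequency into fixed windows
# and rank the windows by how many such SNPs they contain.
def candidate_windows(snps, window: 1_000, min_frequency: 0.9)
  snps.select { |snp| snp.allele_frequency >= min_frequency }
      .group_by { |snp| snp.position / window }
      .map { |index, hits| [index * window, hits.size] }
      .sort_by { |_start, count| -count }
end

# Made-up read counts.
snps = [
  Snp.new(120, 18, 2), Snp.new(1_050, 1, 19), Snp.new(1_400, 0, 25),
  Snp.new(1_900, 2, 28), Snp.new(2_300, 10, 10), Snp.new(3_100, 0, 12)
]

windows = candidate_windows(snps)
windows.each { |start, count| puts "#{start}\t#{count}" }
```

With these numbers the window starting at position 1000 collects three high-frequency SNPs and ranks first, which is the kind of signal that would be followed up as a candidate region.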

Page 11: bio-svgenes by MacLean D.

Bio::Graphics for BioRuby to generate SVG images w/ an intuitive API and w/o dependencies

  page = Bio::Graphics::Page.new(opts)          # sizes etc.
  gene = Bio::Graphics::MiniFeature.new(opts)   # positions etc.
  gene_track = page.add_track(opts)             # glyphs etc.
  gene_track.add(obj)

  page.draw   # => generate an SVG image

Page 12: bio-diversity by Mozzherin D. et al.

Taxonomic scientific name parser that normalizes species names from the literature at the best quality

  % gem install biodiversity19
  % nnparse find_scientific_names.txt

Coeloglossum viride (L.) Hartman x Dactylorhiza majalis (Rchb. f.) P.F. Hunt & Summerhayes ssp. praetermissa (Druce) D.M. Moore & Soó

{"scientificName":{"parsed":true,"parser_version":"2.1.0","verbatim":"Coeloglossum viride (L.) Hartman x Dactylorhiza majalis (Rchb. f.) P.F. Hunt & Summerhayes ssp. praetermissa (Druce) D.M. Moore & Soó","normalized":"Coeloglossum viride (L.) Hartman × Dactylorhiza majalis (Rchb. f.) P.F. Hunt & Summerhayes ssp. praetermissa (Druce) D.M. Moore & Soó","canonical":"Coeloglossum viride × Dactylorhiza majalis praetermissa","hybrid":true,"details":[{"genus":{"string":"Coeloglossum"},"species":{"string":"viride","authorship":"(L.) Hartman","combinationAuthorTeam":{"authorTeam":"Hartman","author":["Hartman"]},"basionymAuthorTeam":{"authorTeam":"L.","author":["L."]}}},{"genus":{"string":"Dactylorhiza"},"species":{"string":"majalis","authorship":"(Rchb. f.) P.F. Hunt & Summerhayes","combinationAuthorTeam":{"authorTeam":"P.F. Hunt & Summerhayes","author":["P.F. Hunt","Summerhayes"]},"basionymAuthorTeam":{"authorTeam":"Rchb. f.","author":["Rchb. f."]}},"infraspecies":[{"string":"praetermissa","rank":"ssp.","authorship":"(Druce) D.M. Moore & Soó","combinationAuthorTeam":{"authorTeam":"D.M. Moore & Soó","author":["D.M. Moore","Soó"]},"basionymAuthorTeam":{"authorTeam":"Druce","author":["Druce"]}}]}],"parser_run":1,"positions":{"0":["genus",12],"13":["species",19],"21":["author_word",23],"25":["author_word",32],"35":["genus",47],"48":["species",55],"57":["author_word",62],"63":["author_word",65],"67":["author_word",71],"72":["author_word",76],"79":["author_word",90],"91":["infraspecific_type",95],"96":["infraspecies",108],"110":["author_word",115],"117":["author_word",121],"122":["author_word",127],"130":["author_word",133]}}}
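Because the parser emits JSON, its output can be consumed with Ruby's standard library alone. The sketch below reads an abridged copy of the record above (authorship and position fields are left out to keep it short) and pulls out the canonical name, the hybrid flag and the two parent taxa. The variable names are illustrative and not part of the biodiversity gem.

```ruby
require 'json'

# Abridged copy of the parser output shown above; the authorship and
# "positions" fields are omitted for brevity.
output = <<~'JSON'
  {"scientificName": {
    "parsed": true,
    "canonical": "Coeloglossum viride × Dactylorhiza majalis praetermissa",
    "hybrid": true,
    "details": [
      {"genus": {"string": "Coeloglossum"}, "species": {"string": "viride"}},
      {"genus": {"string": "Dactylorhiza"}, "species": {"string": "majalis"},
       "infraspecies": [{"string": "praetermissa", "rank": "ssp."}]}
    ]
  }}
JSON

name = JSON.parse(output).fetch('scientificName')

# Rebuild a short label for each parent of the hybrid from "details".
taxa = name['details'].map do |part|
  words = [part.dig('genus', 'string'), part.dig('species', 'string')]
  words += (part['infraspecies'] || []).map { |infra| "#{infra['rank']} #{infra['string']}" }
  words.compact.join(' ')
end

puts name['canonical']
puts "hybrid: #{name['hybrid']}"
taxa.each { |taxon| puts taxon }
```

The same approach applies to the full output: the "details" array has one entry per parent of the hybrid, each with its own genus, species and optional infraspecies.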

Developed for the Global Names Index (http://gni.globalnames.org/), supported by GBIF/EOL/NSF
(Global Biodiversity Information Facility / Encyclopedia of Life)

Sister products:
• name-spotter -- Wrapper for name-finding libraries, TaxonFinder (EOL) and NetiNeti (for OCRed text)
• taxamatch_rb -- Tony Rees' algorithm for fuzzy matching of scientific names (compare with corpus)
• dwc-archive -- parser/generator for DarwinCore Archive (CSV + XML) format

Top downloaded Biogem

Page 13: Join us!

We welcome your contributions, especially on:
• Statistics
• Semantic Web
• Command line apps
• Web apps and visualization tools
• and something new!

To find biogems -- http://biogems.info/
To create a biogem -- http://biogems.info/howto.html
Interviews w/ biogem developers -- coming soon ...

Without the Biogem system, the core BioRuby community alone could not have accumulated this variety of apps/libs!

BioRuby -- is a core library
Biogem -- can extend BioRuby, use BioRuby, or also provide apps!

Page 14: BioInterchange by Baran J. et al.

RDF converters for TSV, XML, GFF3, GVF, Newick and other files

Spin-off project from the BioHackathons in 2012 and 2013
• Developed ontologies for GFF and GVF

Next release:
• Utilizes FALDO location ontology and Identifiers.org URIs

http://biointerchange.org/

We are also working on converters for GTF, VCF, PubMed, and INSDC data w/ appropriate ontologies
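As a rough picture of what converting a feature file to RDF involves, the sketch below turns one GFF3 line into N-Triples statements. It is not BioInterchange code and does not use the GFF/GVF ontologies or FALDO. Every URI lives on the reserved example.org domain and is a placeholder, and the function name `gff3_line_to_ntriples` is hypothetical. It also assumes that the GFF3 line carries an ID attribute and that values need no escaping.

```ruby
# Placeholder namespaces on the reserved example.org domain. These are NOT
# the ontologies or URIs that BioInterchange emits.
FEATURE_BASE = 'http://example.org/feature/'
VOCABULARY   = 'http://example.org/vocab#'

# Hypothetical helper: one GFF3 feature line in, an array of N-Triples
# statements (plain string literals) out.
def gff3_line_to_ntriples(line)
  seqid, _source, type, first, last, _score, strand, _phase, attributes =
    line.chomp.split("\t")
  # Column 9 holds ';'-separated key=value pairs; ID names the feature.
  id = attributes.split(';').map { |pair| pair.split('=', 2) }.to_h.fetch('ID')
  subject = "<#{FEATURE_BASE}#{id}>"
  properties = { 'seqid' => seqid, 'type' => type, 'start' => first,
                 'end' => last, 'strand' => strand }
  properties.map { |property, value| %(#{subject} <#{VOCABULARY}#{property}> "#{value}" .) }
end

# Made-up feature line.
line = "chr1\texample\tgene\t1000\t2000\t.\t+\t.\tID=gene0001;Name=demo"
triples = gff3_line_to_ntriples(line)
puts triples
```

A real converter would replace the placeholder predicates with ontology terms and the literals with typed values or resource URIs, which is where the ontologies mentioned above come in.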

Page 15: Creating BaseSpace apps

[Figure: BaseSpace data model]
• Run (.bcl)
• Basecall
• Sample (.fq): FASTQ
• App launch
• AppSession
• AppResult: BAM, VCF
• Project = Samples + AppResults

Workflow
• Mapping: BAM - BWA, Bowtie, SW, iSAAC, ELAND
• Variation: VCF - GATK, Somatic Variant Caller, Starling 2
• ...

https://developer.basespace.illumina.com/
https://basespace.illumina.com/

Page 16: BaseSpace Ruby SDK

BaseSpace, Illumina's cloud solution, comes w/ Python, Java, and R SDKs.
The Ruby version of the SDK was developed by the BioRuby group in 2013.
  git clone https://github.com/joejimbo/basespace-ruby-sdk.git
  (will be available on Illumina's web site shortly)

Developers can create their own apps
- You can easily utilize your NGS biogem w/ BaseSpace Ruby SDK
- You will easily reach many more users

Users can use your app without coding
- No need to learn programming. Just a click!

BioBaseSpace for non-Ruby programmers
During the Codefest 2013, we found that it can be a burden to create a new Web app from scratch on top of your NGS program.
So we started a new project to provide a Web-app scaffold for BS.

Page 17: BioBaseSpace

  % biobasespace create my_cool_bs_app

[Figure: architecture of the generated Web app]

Web app (my_cool_bs_app), a Rails/Rack app:
• View: provides UI
  - file browser
  - parameters
  - show the results (text/html/image etc.)
• Controller
  - talk to BS
  - fetch files
  - execute
• BaseSpace Ruby SDK + just configure the program and parameters to be executed in the app

Your NGS tool:
• Ruby script (which may utilize Biogems), or any command line tool written in another language (Java, R, Python, C etc.)
• results: report texts and/or image files

  % biobasespace deploy my_cool_bs_app --to (AWS|Heroku|others|localhost)

Page 18: BioBaseSpace by Bonnal R. et al.

Scaffold your BS Web app w/ BaseSpace Ruby SDK inside

Done: Authentication & file browsing

Todo: Configuration of your tool to execute & showing the result

Launch your app

Codename: basespace-dojo & basespace-ninja

Page 19: Acknowledgements

BioRuby core
• Naohisa Goto + panel members + many contributors
  http://bioruby.open-bio.org/wiki/Contributors

Biogem system
• Raoul Bonnal, Pjotr Prins, Francesco Strozzi

Biogem developers
• Many! http://biogems.info/

BioInterchange
• Joachim Baran et al.

BaseSpace Ruby SDK
• Toshiaki Katayama, Joachim Baran, Eri Kibukawa, Raoul Bonnal, Francesco Strozzi