26
Sequence analysis and bioinformatics using Debian GNU/Linux Andreas Tille Libre Software Meeting LSM, Nantes 2009 1 / 27

Sequence analysis and bioinformatics using Debian GNU/Linuxtille/talks/200907a_sequence_analysis/... · Programming language support BioPerl Collection of Perl tools for computational

  • Upload
    others

  • View
    9

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Sequence analysis and bioinformatics using Debian GNU/Linuxtille/talks/200907a_sequence_analysis/... · Programming language support BioPerl Collection of Perl tools for computational

Sequence analysis and bioinformatics usingDebian GNU/Linux

Andreas Tille

Libre Software Meeting

LSM, Nantes 2009

1 / 27

Page 2: Sequence analysis and bioinformatics using Debian GNU/Linuxtille/talks/200907a_sequence_analysis/... · Programming language support BioPerl Collection of Perl tools for computational

Overview

1 Debian MedDebian Pure Blend for medical care and health scienceWhy Debian

2 ImplementationAvailable packagesBiological databases

3 Looking beyondAlternatives and prospectus

2 / 27

Page 3: Sequence analysis and bioinformatics using Debian GNU/Linuxtille/talks/200907a_sequence_analysis/... · Programming language support BioPerl Collection of Perl tools for computational

Scope of Debian Med

Free management systems for patients in medical practiceand hospitals are rare

GNUmed Patient record documentation for generalpracticiants

MedinTux Practice management system written forFrench health care system

Vista Comprehensive software suite for hospitals(U.S. Department of Veterans Affairs)

Care2x Web based hospital management systemOthers . . .

However, people who hear the sound “Debian Med” justassume we provide a practice management system . . .. . . even if you tell them explicitly it is notSo what are the real strengths of Debian Med?

3 / 27

Page 4: Sequence analysis and bioinformatics using Debian GNU/Linuxtille/talks/200907a_sequence_analysis/... · Programming language support BioPerl Collection of Perl tools for computational

Medical imaging

Debian Med can only include existing softwareFair amount of high quality Free Software for medicalimaging

Aeskulap, Amide: Medical image viewersDcmtk: OFFIS DICOM toolkitSofa: Simulation Open Framework ArchitectureFsl: analysis tools for brain imaging. . .

Complete overview on Debian Med imaging tasks page

4 / 27

Page 5: Sequence analysis and bioinformatics using Debian GNU/Linuxtille/talks/200907a_sequence_analysis/... · Programming language support BioPerl Collection of Perl tools for computational

Molecular and structural biology, bioinformatics

Most established branch of Debian Med because of goodcoverage by upstream softwareFostering

Development at universitiesOrganised funding

HinderingAdvertising for proprietary softwareDifferent preferences of initiators

5 / 27

Page 6: Sequence analysis and bioinformatics using Debian GNU/Linuxtille/talks/200907a_sequence_analysis/... · Programming language support BioPerl Collection of Perl tools for computational

Selected metapackages of Debian Med

2002 2003 2004 2005 2006 2007 2008 2009

Number of dependencies of selected Debian Med metapackages0

2040

6080 Microbiology

ImagingPractice

6 / 27

Page 7: Sequence analysis and bioinformatics using Debian GNU/Linuxtille/talks/200907a_sequence_analysis/... · Programming language support BioPerl Collection of Perl tools for computational

Top 10 posters on [email protected]

2002 2003 2004 2005 2006 2007 2008 2009

010

020

030

040

0 Andreas.TilleCharles.PlessyDavid.PaleinoKarsten.HilbertSteffen.MoellerNelson.A..de.OliveiraMathieu.MalaterreMichael.HankeDaniel.LeidertSteve.M..Robbins

7 / 27

Page 8: Sequence analysis and bioinformatics using Debian GNU/Linuxtille/talks/200907a_sequence_analysis/... · Programming language support BioPerl Collection of Perl tools for computational

Differences to commercial distributions

Commercial distributor Debian

Company Structure Organisation

Employees Persons Volunteers

CDs, Service Sells Nothing

Business plan Release If 0 RC-bugs

Certified Oracle, SAP, etc. Runs in principle

Beginners Preferred by Administrators

Rpm Packages Deb

Market Customisation Do-O-Cracy

8 / 27

Page 9: Sequence analysis and bioinformatics using Debian GNU/Linuxtille/talks/200907a_sequence_analysis/... · Programming language support BioPerl Collection of Perl tools for computational

Customising Debian

Debian > 20000 packagesFocus on medical subset of those packagesEasy installation and configurationAutomatic installation→ cloud computingMaintaining a general infrastructure for medical usersPropagate the idea of Free Software in medicineCompletely integrated into Debian - no fork

Basic idea: Do not make a separatedistribution but make Debian fit for medical

care instead

9 / 27

Page 10: Sequence analysis and bioinformatics using Debian GNU/Linuxtille/talks/200907a_sequence_analysis/... · Programming language support BioPerl Collection of Perl tools for computational

Debian - adaptable for any purpose?

Developed by about 1000 volunteersFlexible, not bound on commercial interestStrict rules (policy) glue all things togetherCommon interest of each individual developer:Get the best operating system for himself.Developers have children in real life or work in the field ofmedicine etc.In contrast to employees of companies every single Debiandeveloper has the freedom and ability to realize his visionEvery developer is able to influence the development ofDebian - he just has to do it.

Do-O-Cracy = "The doer decides"

10 / 27

Page 11: Sequence analysis and bioinformatics using Debian GNU/Linuxtille/talks/200907a_sequence_analysis/... · Programming language support BioPerl Collection of Perl tools for computational

Programming language support

BioPerl Collection of Perl tools for computationalmolecular biology

BioPython Python library for computational molecular biologyBioRuby Ruby tools for computational molecular biologyBioJava Java API to biological data and applications

BioSQUID library of C code functions for sequence analysis

11 / 27

Page 12: Sequence analysis and bioinformatics using Debian GNU/Linuxtille/talks/200907a_sequence_analysis/... · Programming language support BioPerl Collection of Perl tools for computational

Widely used software

BLAST2 Basic Local Alignment Search Toolofficial NCBI version of this famous sequencealignment program (Note that databases are notincluded in Debian; they must be retrievedmanually.)

EMBOSS European Molecular Biology Open Software SuiteEMBOSS is a free Open Source software analysispackage specially developed for the needs of themolecular biology (e.g. EMBnet) user community

12 / 27

Page 13: Sequence analysis and bioinformatics using Debian GNU/Linuxtille/talks/200907a_sequence_analysis/... · Programming language support BioPerl Collection of Perl tools for computational

Statistics using GNU RR-cran-genetics GNU R package for population genetics

The package provides a library for the statisticsenvironment R that contains classes to representgenotypes and haplotypes at single markers up tomultiple markers on multiple chromosomes.

R-cran-haplo.stats GNU R package for haplotype analysisThe package provides routines for the GNU Rstatistics environment for statistical Analysis ofindirectly measured Haplotypes with Traits andCovariates when Linkage Phase is Ambiguous

Bioconductor GNU R tools for the analysis and comprehensionof genomic data.Not yet packaged for Debian but work in progressto automate packaging of CRAN and Bioconductorpackages.

There are some more general R packages recommended bymed-statistics

13 / 27

Page 14: Sequence analysis and bioinformatics using Debian GNU/Linuxtille/talks/200907a_sequence_analysis/... · Programming language support BioPerl Collection of Perl tools for computational

Phylogenetic analysis

Altree Perform phylogeny based analysesfastdnaml Construction of phylogenetic trees of DNA

sequencesNjplot phylogenetic tree drawing program

Tree-puzzle Reconstruction of phylogenetic trees by maximumlikelihood

Treeviewx Displays and prints phylogenetic trees

Phylip Package of programs for inferring phylogeniesTreetool Interactive tool for displaying phylogenetic trees

14 / 27

Page 15: Sequence analysis and bioinformatics using Debian GNU/Linuxtille/talks/200907a_sequence_analysis/... · Programming language support BioPerl Collection of Perl tools for computational

Genetics and analysis of RNA sequences

Genetics:Fastlink Faster version of pedigree programs of Linkage

Loki MCMC linkage analysis on general pedigreesPlink Whole-genome association analysis toolset

R-cran-qtl GNU R package for genetic marker linkageanalysis

Analysis of RNA sequences:

Infernal Inference of RNA secondary structural alignmentsRnahybrid Fast and effective prediction of microRNA/target

duplexes

15 / 27

Page 16: Sequence analysis and bioinformatics using Debian GNU/Linuxtille/talks/200907a_sequence_analysis/... · Programming language support BioPerl Collection of Perl tools for computational

Sequence alignments and related programs

amap-alig Protein multiple alignment by sequence annealingBoxshade Pretty-printing of multiple sequence alignments

Dialign(-tx) Segment-based multiple sequence alignmentExonerate Generic tool for pairwise sequence comparisonGff2aplot Pair-wise alignment-plots for genomic sequences

in PostScriptHmmer Profile hidden Markov models for protein

sequence analysisKalign Global and progressive multiple sequence

alignmentMafft Multiple alignment program for amino acid or

nucleotide sequencesMummer Efficient sequence alignment of full genomes

Muscle Multiple alignment program of protein sequences

16 / 27

Page 17: Sequence analysis and bioinformatics using Debian GNU/Linuxtille/talks/200907a_sequence_analysis/... · Programming language support BioPerl Collection of Perl tools for computational

Sequence alignments and related programs (cont.)Poa Partial Order Alignment for multiple sequence

alignmentProbcons PROBabilistic CONSistency-based multiple

sequence alignmentProda Multiple alignment of protein sequences

Seaview Multiplatform interface for sequence alignment andphylogeny

Sibsim4 Align expressed RNA sequences on a DNAtemplate

Sigma-align Simple greedy multiple alignment of non-codingDNA sequences

Sim4 Tool for aligning cDNA and genomic DNAT-coffee Multiple Sequence Alignment

Wise Comparison of biopolymers, commonly DNA andprotein sequences

17 / 27

Page 18: Sequence analysis and bioinformatics using Debian GNU/Linuxtille/talks/200907a_sequence_analysis/... · Programming language support BioPerl Collection of Perl tools for computational

Molecular modelling and molecular dynamics

Adun.app Molecular Simulator for GNUstepAutogrid Pre-calculate binding of ligands to their receptor

Gamgi Construct, view and analyse atomic structuresGarlic Visualisation program for biomoleculesGdpc Visualiser of molecular dynamic simulations

Ghemical GNOME molecular modelling environmentGromacs Molecular dynamics simulator, with building and

analysis toolsPymol Molecular Graphics System

R-other-bio3d GNU R package for biological structure analysisRasmol Visualise biological macromolecules

Autodocktools GUI to help set up, launch and analyseAutoDock dockings

18 / 27

Page 19: Sequence analysis and bioinformatics using Debian GNU/Linuxtille/talks/200907a_sequence_analysis/... · Programming language support BioPerl Collection of Perl tools for computational

High-throughput sequencing

“Next-generation sequencing”Chip-systems to sequence a genomeReads are very short (40 nucleotides rather thantraditionally about 600)Enormous amount of chromosomal regions covered

Last-align Genome-scale comparison of biologicalsequences

Maq Maps short fixed-length polymorphic DNAsequence reads to reference sequences

Ssake Genomics application for assembling millions ofvery short DNA sequences

Velvet Nucleic acid sequence assembler for very shortreads

19 / 27

Page 20: Sequence analysis and bioinformatics using Debian GNU/Linuxtille/talks/200907a_sequence_analysis/... · Programming language support BioPerl Collection of Perl tools for computational

Mikrobiological packages

More than 80 PackagesOverview ataccording tasks page of Debian Med projectSoftware developed by

National Center for Biotechnology Information (NCBI)Sanger InstituteThe Institute for Genomic Research (TIGR). . .

20 / 27

Page 21: Sequence analysis and bioinformatics using Debian GNU/Linuxtille/talks/200907a_sequence_analysis/... · Programming language support BioPerl Collection of Perl tools for computational

DebTags

udd=# SELECT tag, COUNT(*) FROM debtagsWHERE tag LIKE ’%bio%’GROUP BY tag ORDER BY tag;

tag | count-------------------------------+-------biology::emboss | 2biology::format:aln | 9biology::format:fasta | 9biology::nuceleic-acids | 11biology::peptidic | 12field::biology | 174field::biology:bioinformatics | 86field::biology:molecular | 8field::biology:structural | 16

(9 rows)

21 / 27

Page 22: Sequence analysis and bioinformatics using Debian GNU/Linuxtille/talks/200907a_sequence_analysis/... · Programming language support BioPerl Collection of Perl tools for computational

How to install large databases

Bundling into Debian package makes no senseSize costs bandwidths and mirror spaceMoving target: Stable distribution will be out of date soonRemote service seems appropriate

Solution also works for astronomy and meteorology

getDataObtain data from external sourceMove data to local mirrorPreparation of configuration file for particular system thatdeals with the databasegetData is still in a proof of concept stagePeople are much welcome to join this development(Google Summer of Code project)

22 / 27

Page 23: Sequence analysis and bioinformatics using Debian GNU/Linuxtille/talks/200907a_sequence_analysis/... · Programming language support BioPerl Collection of Perl tools for computational

Open Database License (ODbL) v1.0

Open Data CommonsLicense agreement intended to allow users to freely share,modify, and use this Database while maintaining this samefreedom for othersDatabases can contain a wide variety of types of content(images, audiovisual material, and sounds all in the samedatabase, for example)ODbL only governs the rights over the Database, and notthe contents of the Database individually

23 / 27

Page 24: Sequence analysis and bioinformatics using Debian GNU/Linuxtille/talks/200907a_sequence_analysis/... · Programming language support BioPerl Collection of Perl tools for computational

Alternatives

BioLinuxBased on DebianCreate a policy incompatible structure in/usr/local/biolinux

Some software not yet available in Debian but really sloppywith licensesWe try to include the missing stuff in Debian to create apolicy compliant, really free systemHope BioLinux people will adopt this

FreeBSD ports collection BiologyAlso contains a fair amount of biological softwareOnly a few unimportant missing in Debian

24 / 27

Page 25: Sequence analysis and bioinformatics using Debian GNU/Linuxtille/talks/200907a_sequence_analysis/... · Programming language support BioPerl Collection of Perl tools for computational

Prospectus

There are good requisites in DebianMost important tools of molecular biology, structuralbiology and bioinformatics for use in life sciences areincludedFurther increase of interest of developers and users andgetting them involved in the projectTurning Debian into the distribution of choice for peopleworking in the field of medicine because there is bestsupport for free medical software

25 / 27

Page 26: Sequence analysis and bioinformatics using Debian GNU/Linuxtille/talks/200907a_sequence_analysis/... · Programming language support BioPerl Collection of Perl tools for computational

This talk is available athttp://people.debian.org/˜ tille/talks/Andreas Tille <[email protected]>