

Peter Rice and Mahmut Uludag

EMBOSS as an Efficient DAS Annotation Source

Peter Rice, EBI ([email protected])

Mahmut Uludag, EBI ([email protected])

10th March 2009

EMBOSS: History

• European Molecular Biology Open Software Suite
• 1996: Started at Sanger Centre
• 2000: Release 1.0.0 and moved to HGMP
• 2005: Moved to EBI (HGMP closed)
• 2008: Release 6.0.0

http://emboss.sourceforge.net

EMBOSS: Status

• Open source package
• Sequence analysis
• 200 applications
• 100 third-party applications

• Reads 40 sequence formats
• Writes 40 sequence formats
• Reads 6 feature formats
• Writes 10 feature formats

EMBOSS: Interfaces

• Over 100 interfaces / packages containing EMBOSS

• Command line
• Web interfaces
• GUIs
• SOAP Web services (EMBRACE)
• Taverna workflows
• Galaxy

Overview

• EMBOSS produces annotations in DASGFF format
  • Protein sequence referencing using UniProt protein identifiers
  • Nucleotide sequence referencing using Ensembl gene identifiers
• MyDAS-based annotation server
  • Executes EMBOSS programs based on the incoming requests
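To make the DASGFF output concrete, here is a minimal sketch of serialising feature records into a DASGFF document. The element layout follows the DAS 1.5x features response (DASGFF, GFF, SEGMENT, FEATURE, TYPE, METHOD, START, END, SCORE, ORIENTATION, PHASE). The function name `build_dasgff`, the dictionary keys, and the sample feature are assumptions made for illustration; this is not the code used by the server described in these slides.

```python
import xml.etree.ElementTree as ET


def build_dasgff(segment_id, seq_length, features, href="features"):
    """Serialise feature dicts as a DASGFF document (hypothetical helper)."""
    root = ET.Element("DASGFF")
    gff = ET.SubElement(root, "GFF", version="1.0", href=href)
    # One SEGMENT per reference sequence, e.g. a UniProt accession.
    segment = ET.SubElement(
        gff, "SEGMENT", id=segment_id, start="1", stop=str(seq_length)
    )
    for f in features:
        feat = ET.SubElement(
            segment, "FEATURE", id=f["id"], label=f.get("label", f["id"])
        )
        ET.SubElement(feat, "TYPE", id=f["type"]).text = f["type"]
        # METHOD records which program produced the feature.
        ET.SubElement(feat, "METHOD", id=f["method"]).text = f["method"]
        ET.SubElement(feat, "START").text = str(f["start"])
        ET.SubElement(feat, "END").text = str(f["end"])
        ET.SubElement(feat, "SCORE").text = str(f.get("score", "-"))
        ET.SubElement(feat, "ORIENTATION").text = f.get("orientation", "0")
        ET.SubElement(feat, "PHASE").text = f.get("phase", "-")
    return ET.tostring(root, encoding="unicode")


# Made-up example: one coiled-coil prediction on a placeholder accession.
example = build_dasgff(
    "P12345",
    430,
    [{"id": "pepcoil_1", "type": "coiled_coil", "method": "pepcoil",
      "start": 35, "end": 62, "score": 1.4}],
)
print(example)
```

Keeping the program name in METHOD lets a client tell apart features that come from different EMBOSS programs on the same segment.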

Protein sequence annotation: EMBOSS programs used so far

• pepcoil; predicted coiled coil regions in protein sequences
• patmatmotifs; motifs from the PROSITE database
• helixturnhelix; nucleic acid-binding motifs in protein sequences
• garnier; predicted protein secondary structures using the GOR method
• sigcleave; predicted signal cleavage sites in protein sequences
• digest; protein proteolytic enzyme or reagent cleavage sites
• antigenic; predicted antigenic regions in protein sequences

Nucleotide sequence annotation: EMBOSS programs used so far

• equicktandem, etandem; tandem repeats in nucleotide sequences
• silent; restriction enzyme sites in a nucleotide sequence which can be inserted (mutated) without changing the translation
• jaspscan; transcription factor binding sites from the JASPAR database
• marscan; matrix/scaffold recognition (MRS) signatures in DNA sequences
• restrict; restriction enzyme cleavage sites in nucleotide sequences
• tcode; protein-coding regions identified using the Fickett TESTCODE statistic
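Before results from programs like these can be served, they have to be read into feature records. The sketch below assumes the program output has been written as GFF (one of the feature formats EMBOSS writes) with the usual tab-separated columns. The `Feature` type, the `parse_gff` helper, and the sample lines are hypothetical and only illustrate the idea.

```python
from typing import List, NamedTuple, Optional


class Feature(NamedTuple):
    seqid: str
    source: str
    type: str
    start: int
    end: int
    score: Optional[float]
    strand: str


def parse_gff(text: str) -> List[Feature]:
    """Read tab-separated GFF feature lines into Feature records."""
    features = []
    for line in text.splitlines():
        # Skip blank lines and "#"/"##" comment or directive lines.
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) < 8:
            continue  # not a complete feature line
        seqid, source, ftype, start, end, score, strand = cols[:7]
        features.append(
            Feature(seqid, source, ftype, int(start), int(end),
                    None if score == "." else float(score), strand)
        )
    return features


# Made-up sample: values are placeholders, not real program output.
SAMPLE = (
    "##gff-version 3\n"
    "ENSG00000139618\tequicktandem\trepeat_region\t120\t155\t24\t+\t.\tID=rep1\n"
    "ENSG00000139618\trestrict\tsequence_feature\t301\t306\t.\t+\t.\tID=site1\n"
)

for feature in parse_gff(SAMPLE):
    print(feature.source, feature.start, feature.end, feature.score)
```

The `source` column carries the program name, so records from several programs can be kept in one list and separated again later.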

Other EMBOSS programs that can be used for annotation

• 26 EMBOSS programs producing graphical outputs
  • Possibly using stylesheet support in Ensembl & DAS
• 13 EMBOSS alignment programs
  • DAS 1.53E has alignment extension

Test clients used

• Dasty2; for protein annotations
  • Good at displaying individual features
  • Useful links for further exploration
    • Links to ontology terms used
    • Links to original DAS responses
• Ensembl; for gene and protein annotations
  • Displays features in genomic context
  • Possible to use DAS resources that are not in the registry

Example Dasty screen:

Example Ensembl screen:

Work in progress

• Need to register on dasregistry.org
• Experimental DAS server available at http://wwwdev.ebi.ac.uk/soaplab/das
• DAS servers as data sources
  • Common coordinate systems
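A client such as Dasty2 or Ensembl asks a DAS server for annotations with a features request on a segment. The sketch below only builds such a request URL, following the DAS 1.5x pattern `<base>/<source>/features?segment=<id>:<start>,<end>`. The data source name `emboss_pepcoil` is hypothetical (the slides do not list the source names on the experimental server), and `features_url` is an illustrative helper, not part of any DAS library.

```python
from urllib.parse import quote


def features_url(base, source, segment, start=None, end=None):
    """Build a DAS features request URL for one segment (hypothetical helper)."""
    # The range is optional; without it the whole segment is requested.
    if start is None or end is None:
        seg = segment
    else:
        seg = f"{segment}:{start},{end}"
    return (
        f"{base.rstrip('/')}/{quote(source)}"
        f"/features?segment={quote(seg, safe=':,')}"
    )


# Base URL from the slide; source name and accession are placeholders.
print(features_url("http://wwwdev.ebi.ac.uk/soaplab/das",
                   "emboss_pepcoil", "P12345", 1, 100))
```

Because the segment identifier is a UniProt accession or an Ensembl gene identifier, a shared coordinate system is what lets an unmodified client line these features up with annotations from other sources.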

The EMBOSS Team

• Peter Rice
• Alan Bleasby
• Jon Ison
• Mahmut Uludag