
April 2006 March 2007 Xosé Mª Fernández European Bioinformatics Institute Browsing Genomes with Ensembl


Outline of talk

• Overview of Ensembl
  – Ensembl - Project
  – Exploring genomes
  – Gene annotation
• Making genomes useful
• Beyond Ensembl

Ensembl - Project

• Joint project
  – EMBL – European Bioinformatics Institute (EBI)
  – Wellcome Trust Sanger Institute
• Produce accurate, automatic genome annotation
• Focused on selected eukaryotic genomes
• Integrate external (distributed) biological data
• Presentation of the analysis to all via the Web at http://www.ensembl.org
• Open distribution of the analysis to the community
• Development of open, collaborative software (databases and APIs)


Beyond classical ab initio gene prediction

• Ensembl automatic gene prediction relies on homology ‘supporting evidence’ to avoid overprediction.
• Classical ab initio gene prediction (e.g. GENSCAN) relies partly on global statistics of protein-coding potential, which the cell itself does not use.
• Genes are just a series of short signals
  – Transcription start site
  – Translation start site
  – 5’ & 3’ intron splicing signals
  – Termination signals
• Short signal sequences are difficult to recognise over background noise in large genomes
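
The last point can be illustrated with a rough back-of-the-envelope sketch that is not part of the talk. It assumes a random sequence with independent, equally likely bases, in which one specific signal of k bases is expected to turn up by chance at about N / 4^k positions on one strand of an N-base genome. The function name and the example sizes are illustrative assumptions.

```python
def expected_chance_matches(genome_length: int, signal_length: int) -> float:
    """Expected chance occurrences of one specific signal on a single strand.

    Assumes independent, equally likely bases (a deliberate simplification;
    real genomes have skewed composition and repeats).
    """
    # Number of start positions at which a signal of this length can fit.
    positions = max(genome_length - signal_length + 1, 0)
    # Each position matches a given signal with probability (1/4)^k.
    return positions * 0.25 ** signal_length


if __name__ == "__main__":
    genome_length = 3_000_000_000  # roughly human-sized; illustrative only
    for k in (2, 6, 10, 16):
        print(k, f"{expected_chance_matches(genome_length, k):.3g}")
```

Under these assumptions, a two-base signal matches at a large fraction of all positions, so short signals alone cannot pinpoint genes. This is one way to see why supporting evidence from homology helps.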


Ensembl v43


DAS Registry

http://www.dasregistry.org

DAS


Pre! and Archive! sites

http://pre.ensembl.org

http://www.ensembl.org

http://archive.ensembl.org


Open source open standards

• Object model
  – standard interface makes it easy for others to build custom applications on top of Ensembl data
• Open discussion of design ([email protected])
• Most major pharma and many academics represented on mailing list, and code is being actively developed externally
• Ensembl locally
  – Both industry & academia

Ensembl – Open source


APIs

• Used to retrieve data from and to store data in Ensembl databases.
• Ensembl Perl API:
  – Written in object-oriented Perl
  – Foundation for the Ensembl Pipeline and Ensembl Web interface


Making genomes useful

• Interpretation
  – Where are the interesting parts of the genome?
  – What do they do?
  – How are they related to elements in other genomes?
• Access
  – for bench biologists
  – for non-programming mid-scale groups
  – for good programming groups

Access… bench biologists

• Mainly via the web
• Web site designed for non-programming biologists who are not especially genome-aware
  – Simple things to find are simple to find
  – Graphical displays and overviews
  – Consistency of layout, colour and text

Ensembl

[Diagram; labels: Analysis DB, CPU, Final DB, Supporting Databases, SNP, Manual Annotation]

Genome browsing: why present the whole genome?

• Explore what is in a chromosome region
• See features in and around a specific gene
• Search & retrieve across the whole genome
• Investigate genome organization
• Compare to other genomes

Introduction to the Ensembl web site

Ensembl …
• takes genomic sequence assemblies (human build 36, mouse, rat, mosquito…)
• adds annotation and links (automated process)
• presents all the data on a web site

Basic Genome Annotation

• Genes
  – Genomic location
  – Gene model structures
    • Exons
    • Introns
    • UTRs
  – Transcript(s)
    • Pseudogenes
    • Non-coding RNA
  – Protein(s)
  – Links to other sources of information
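
As an illustration of how the parts of a gene model relate to each other, here is a minimal sketch that is not taken from the talk or from the Ensembl API. It assumes a forward-strand transcript described by its exon coordinates and the first and last coding base, and derives introns and UTRs from those. The class name, field names and coordinates are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive (start, end), forward strand assumed


@dataclass
class Transcript:
    name: str
    exons: List[Interval]  # hypothetical coordinates, in any order
    cds_start: int         # first coding base
    cds_end: int           # last coding base

    def sorted_exons(self) -> List[Interval]:
        return sorted(self.exons)

    def introns(self) -> List[Interval]:
        """Gaps between consecutive exons."""
        ex = self.sorted_exons()
        return [(a_end + 1, b_start - 1)
                for (_, a_end), (b_start, _) in zip(ex, ex[1:])]

    def utrs(self) -> List[Interval]:
        """Exonic stretches lying outside the coding region."""
        out = []
        for start, end in self.sorted_exons():
            if start < self.cds_start:  # part of this exon precedes the CDS
                out.append((start, min(end, self.cds_start - 1)))
            if end > self.cds_end:      # part of this exon follows the CDS
                out.append((max(start, self.cds_end + 1), end))
        return out


if __name__ == "__main__":
    t = Transcript("demo", [(300, 400), (100, 200), (500, 600)], 150, 550)
    print("introns:", t.introns())
    print("UTRs:", t.utrs())
```

Only exons and the coding boundaries are stored here; introns and UTRs are computed on demand, so the derived features cannot drift out of sync with the exon list.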

Advanced Genome Annotation

• Cytogenetic bands
• Polymorphic markers
  – Sequence Tagged Sites (STS)
• Genetic variation
  – Single Nucleotide Polymorphisms (SNPs)
  – Deletion-Insertion Polymorphisms (DIPs)
  – Short Tandem Repeats (STRs)
• Repetitive sequences
• Expressed Sequence Tags (ESTs)
• cDNAs or mRNAs from related species
• Regions of sequence homology

How to get started …

• Species homepage

• Map View

• Text search

• BLAST

• SSAHA

Homepage

MapView

BLAST and SSAHA

See BLAST hit on genome

Regions, maps and markers

MarkerView

SNPView

GeneSNPView

ContigView

CytoView

SyntenyView

MultiContigView

Ensembl ContigView

ContigView close-up

Transcripts
• Red & black (Ensembl predictions)
• Blue (Vega) & gold (HAVANA, only in human)

Pop-up menu

ContigView - Navigation

Click and drag mouse to select region

CytoView

GeneSNPView

SNPView

MarkerView

MultiContigView

Genes & gene products

GeneView

TransView

ExonView

ProteinView

FamilyView

GOView

Ensembl GeneView

ExonView

TransView

ProteinView

FamilyView

GOView

Data retrieval

BioMart

Data sets on ftp site

MySQL queries of databases

Perl API access to databases

Export View

ExportView

Help!

• context sensitive help pages - click

• access other documentation via generic home page

• email the helpdesk

Ensembl Team, July 2006

Ensembl Team, March 2007

Leaders: Ewan Birney (EBI), Tim Hubbard (Sanger Institute)

Database Schema and Core API: Glenn Proctor, Andreas Kähäri, Ian Longden, Patrick Meidl

BioMart: Arek Kasprzyk, Damian Smedley, Richard Holland, Syed Haider

Distributed Annotation System (DAS): Eugene Kulesha

Outreach: Xosé M Fernández, Bert Overduin, Giulietta Spudich, Michael Schuster

Web Team: James Smith, Bethan Pritchard, Fiona Cunningham, Anne Parker, Stephen Rice, Steve Trevanion (VEGA), Matt Wood

Comparative Genomics: Abel Ureta-Vidal, Kathryn Beal, Benoît Ballester, Stephen Fitzgerald, Javier Herrero Sánchez, Albert Vilella

Analysis and Annotation Pipeline: Val Curwen, Steve Searle, Bronwen Aken, Julio Banet, Laura Clarke, Sarah Dyer, Jan-Hinnerck Vogel, Kevin Howe, Felix Kokocinski, Stephen Rice, Simon White

Functional Genomics: Paul Flicek, Yuan Chen, Stefan Gräf, Nathan Johnson, Daniel Rios

Zebrafish Annotation: Kerstin Howe, Mario Caccamo, Tina Eyre, Ian Sealy

VectorBase Annotation: Martin Hammond, Dan Lawson, Karyn Megy

Systems & Support: Guy Coates, Tim Cutts, Shelley Goddard

Research: Damian Keefe, Guy Slater, Michael Hoffman, Alison Meynert, Benedict Paten, Daniel Zerbino, Dace Ruklisa

Training... Somewhere near you