April 2006 / March 2007
Xosé Mª Fernández
European Bioinformatics Institute
Browsing Genomes with Ensembl
Outline of talk
• Overview of Ensembl
  – Ensembl - Project
  – Exploring genomes
  – Gene annotation
• Making genomes useful
• Beyond Ensembl
Ensembl - Project
• Joint project
  – EMBL – European Bioinformatics Institute (EBI)
  – Wellcome Trust Sanger Institute
• Produce accurate, automatic genome annotation
• Focused on selected eukaryotic genomes
• Integrate external (distributed) biological data
• Presentation of the analysis to all via the Web at http://www.ensembl.org
• Open distribution of the analysis to the community
• Development of open, collaborative software (databases and APIs)
Beyond classical ab initio gene prediction
• Ensembl automatic gene prediction relies on homology ‘supporting evidence’ to avoid overprediction.
• Classical ab initio gene prediction (e.g. GENSCAN) relies partly on global statistics of protein coding potentials, not used in the cell
• Genes are just a series of short signals
  – Transcription start site
  – Translation start site
  – 5’ & 3’ intron splicing signals
  – Termination signals
• Short signal sequences difficult to recognise over background noise in large genomes
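The point about short signals and background noise can be illustrated with a rough back-of-the-envelope sketch. It assumes a uniformly random genome (every base equally likely, which real genomes are not) and uses 3,000,000,000 bp as a round figure for a large genome; the function name is hypothetical and chosen only for this illustration.

```python
def expected_chance_matches(genome_length: int, signal_length: int) -> float:
    """Expected number of exact matches to one fixed signal of the given
    length on one strand of a uniformly random sequence (an assumption)."""
    positions = genome_length - signal_length + 1  # possible start positions
    if positions <= 0:
        return 0.0
    # Each position matches a fixed signal with probability (1/4) ** length.
    return positions / 4 ** signal_length


if __name__ == "__main__":
    genome = 3_000_000_000  # round figure, assumed for illustration only
    for k in (2, 6, 10, 16):
        matches = expected_chance_matches(genome, k)
        print(f"{k:>2}-base signal: ~{matches:,.0f} chance matches")
```

Under these assumptions a 6-base signal is expected by chance several hundred thousand times, which is why signal-only prediction tends to overpredict and why supporting evidence helps.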
Ensembl v43
DAS Registry
http://www.dasregistry.org
DAS
Pre! and Archive! sites
http://pre.ensembl.org
http://www.ensembl.org
http://archive.ensembl.org
Open source open standards
• Object model
  – standard interface makes it easy for others to build custom applications on top of Ensembl data
• Open discussion of design ([email protected])
• Most major pharma and many academics represented on mailing list and code is being actively developed externally
• Ensembl locally
  – Both industry & academia
Ensembl – Open source
APIs
• Used to retrieve data from and to store data in Ensembl databases.
• Ensembl Perl API:
  – Written in Object-Oriented Perl,
  – Foundation for the Ensembl Pipeline and Ensembl Web interface.
Making genomes useful
• Interpretation
  – Where are the interesting parts of the genome?
  – What do they do?
  – How are they related to elements in other genomes?
• Access
  – for bench biologists
  – for non-programming mid-scale groups
  – for good programming groups
Access… bench biologists
• Mainly via the web
• Web site designed for non-programming biologists who are not that genome-aware
  – Simple things to find are simple to find
  – Graphical displays and overviews
  – Consistency of layout, colour and text
Ensembl
[Diagram; labels: Analysis DB, CPU, Final DB, Supporting Databases, SNP, Manual Annotation]
Genome browsing: why present the whole genome?
• Explore what is in a chromosome region
• See features in and around a specific gene
• Search & retrieve across the whole genome
• Investigate genome organization
• Compare to other genomes
Introduction to the Ensembl web site
Ensembl …
takes genomic sequence assemblies: human build 36, mouse, rat, mosquito…
adds annotation and links (automated process)
presents all the data on a web site
Basic Genome Annotation
• Genes
  – Genomic location
  – Gene model structures
    • Exons
    • Introns
    • UTRs
  – Transcript(s)
    • Pseudogenes
    • Non-coding RNA
  – Protein(s)
  – Links to other sources of information
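As a minimal sketch of how the pieces of a gene model listed above relate to each other, the following Python classes derive introns and UTRs from a transcript's exons and coding region. This is not the Ensembl schema or the Perl API object model; all class and field names are hypothetical, and coordinates are assumed to be 1-based and inclusive.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Exon:
    start: int  # 1-based, inclusive genomic coordinate (assumed convention)
    end: int    # 1-based, inclusive genomic coordinate


@dataclass
class Transcript:
    exons: List[Exon]
    coding_start: int  # lowest genomic coordinate of the coding region
    coding_end: int    # highest genomic coordinate of the coding region

    def sorted_exons(self) -> List[Exon]:
        return sorted(self.exons, key=lambda e: e.start)

    def introns(self) -> List[Tuple[int, int]]:
        """Gaps between consecutive exons, as inclusive (start, end) pairs."""
        ex = self.sorted_exons()
        return [(a.end + 1, b.start - 1) for a, b in zip(ex, ex[1:])]

    def utrs(self) -> List[Tuple[int, int]]:
        """Exonic stretches outside the coding region. Which end is the
        5' UTR and which the 3' UTR depends on the strand."""
        out = []
        for e in self.sorted_exons():
            if e.start < self.coding_start:
                out.append((e.start, min(e.end, self.coding_start - 1)))
            if e.end > self.coding_end:
                out.append((max(e.start, self.coding_end + 1), e.end))
        return out


@dataclass
class Gene:
    name: str
    chromosome: str
    strand: int  # +1 or -1
    transcripts: List[Transcript]
```

For example, a transcript with exons at 100-200, 300-400 and 500-600 and a coding region of 150-550 has introns at 201-299 and 401-499, and untranslated stretches at 100-149 and 551-600.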
Advanced Genome Annotation
• Cytogenetic bands
• Polymorphic markers
  – Sequence Tagged Sites (STS)
• Genetic variation
  – Single Nucleotide Polymorphisms (SNPs)
  – Deletion-Insertion Polymorphisms (DIPs)
  – Short Tandem Repeats (STRs)
• Repetitive sequences
• Expressed Sequence Tags (ESTs)
• cDNAs or mRNAs from related species
• Regions of sequence homology
How to get started …
• Species homepage
• Map View
• Text search
• BLAST
• SSAHA
Homepage
MapView
BLAST and SSAHA
See blast hit on genome
Regions, maps and markers
MarkerView
SNPView
GeneSNPView
ContigView
CytoView
SyntenyView
MultiContigView
Ensembl ContigView
ContigView close-up
Transcripts: red & black (Ensembl predictions), blue (Vega) & gold (HAVANA, only in human)
Pop-up menu
ContigView - Navigation
Click and drag mouse to select region
CytoView
GeneSNPView
SNPView
MarkerView
MultiContigView
Genes & gene products
GeneView
TransView
ExonView
ProteinView
FamilyView
GOView
Ensembl GeneView
ExonView
TransView
ProteinView
FamilyView
GOView
Data retrieval
BioMart
Data sets on ftp site
MySQL queries of databases
Perl API access to databases
Export View
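To give a flavour of the "queries of databases" route, here is a minimal, self-contained sketch of a region query. It uses Python's built-in sqlite3 module as a stand-in for a MySQL server, and a toy table whose name, columns and rows are hypothetical; the real Ensembl schema is considerably richer and is not reproduced here.

```python
import sqlite3

# Toy stand-in for a genome annotation database (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE gene ("
    "name TEXT, chromosome TEXT, gene_start INTEGER, gene_end INTEGER)"
)
conn.executemany(
    "INSERT INTO gene VALUES (?, ?, ?, ?)",
    [
        ("geneA", "1", 1_000, 5_000),    # made-up rows for illustration
        ("geneB", "1", 20_000, 26_000),
        ("geneC", "2", 3_000, 9_000),
    ],
)


def genes_in_region(conn, chromosome, start, end):
    """Names of genes overlapping chromosome:start-end (inclusive),
    ordered by start position."""
    rows = conn.execute(
        "SELECT name FROM gene "
        "WHERE chromosome = ? AND gene_start <= ? AND gene_end >= ? "
        "ORDER BY gene_start",
        (chromosome, end, start),
    )
    return [name for (name,) in rows]


if __name__ == "__main__":
    print(genes_in_region(conn, "1", 4_000, 21_000))
```

The overlap test (a gene starts before the region ends and ends after the region starts) is the same idea behind "what is in a chromosome region", whichever access route is used.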
ExportView
Help!
• context sensitive help pages - click
• access other documentation via generic home page
• email the helpdesk
Ensembl Team
July 2006
Leaders: Ewan Birney (EBI), Tim Hubbard (Sanger Institute)
Database Schema and Core API: Glenn Proctor, Andreas Kähäri, Ian Longden, Patrick Meidl
BioMart: Arek Kasprzyk, Damian Smedley, Richard Holland, Syed Haider
Distributed Annotation System (DAS): Eugene Kulesha
Outreach: Xosé M Fernández, Bert Overduin, Giulietta Spudich, Michael Schuster
Web Team: James Smith, Bethan Pritchard, Fiona Cunningham, Anne Parker, Stephen Rice, Steve Trevanion (VEGA), Matt Wood
Comparative Genomics: Abel Ureta-Vidal, Kathryn Beal, Benoît Ballester, Stephen Fitzgerald, Javier Herrero Sánchez, Albert Vilella
Analysis and Annotation Pipeline: Val Curwen, Steve Searle, Bronwen Aken, Julio Banet, Laura Clarke, Sarah Dyer, Jan-Hinnerck Vogel, Kevin Howe, Felix Kokocinski, Stephen Rice, Simon White
Functional Genomics: Paul Flicek, Yuan Chen, Stefan Gräf, Nathan Johnson, Daniel Rios
Zebrafish Annotation: Kerstin Howe, Mario Caccamo, Tina Eyre, Ian Sealy
VectorBase Annotation: Martin Hammond, Dan Lawson, Karyn Megy
Systems & Support: Guy Coates, Tim Cutts, Shelley Goddard
Research: Damian Keefe, Guy Slater, Michael Hoffman, Alison Meynert, Benedict Paten, Daniel Zerbino, Dace Ruklisa
Ensembl Team
March 2007
Training... Somewhere near you