Genome annotation techniques: new approaches and challenges. Drug Discovery Today, Volume 7, Issue 11, 6 May 2002, Pages 570-576. Alistair G. Rust, Emmanuel Mongin and Ewan Birney.


Page 1

Genome annotation techniques: new approaches and challenges

Presented by Haili Ping

Page 2

The exponential increase in the amount of genomic sequence from human and other species needs to be matched by increases in the accurate annotation of this huge variety of genomes

Accurate annotation of the human genome and other species is an essential element in supporting current drug discovery efforts

Bioinformatics solutions are increasingly required to develop automatic annotation techniques to support and complement the manual curation process

Page 3

Automatic genome annotation pipelines

Primary goal: to deliver highly accurate and reliable genome annotations, using the widest range of evidence from available databases

Essence: pipelines integrate suites of bioinformatics software tools with multiple databases to manage the analysis and storage of genomic sequence automatically

Trend: from single-algorithm methods to consensus-based approaches

The combined results of gene predictors and similarity search methods are used
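As a rough illustration of the consensus idea, the sketch below keeps only exons that are reported by at least two independent evidence sources. The source names, coordinates and the two-vote threshold are hypothetical and are not taken from the paper; a production pipeline would combine evidence in a far more nuanced way.

```python
from collections import Counter


def consensus_exons(predictions, min_support=2):
    """Return exons (start, end) reported by at least `min_support` sources.

    `predictions` maps a source name to a list of (start, end) coordinates.
    Exact coordinate agreement is required here; a real pipeline would
    usually tolerate small boundary differences.
    """
    votes = Counter()
    for exons in predictions.values():
        # Count each source at most once per exon.
        for exon in set(exons):
            votes[exon] += 1
    return sorted(exon for exon, n in votes.items() if n >= min_support)


# Hypothetical evidence from three sources for one genomic region.
predictions = {
    "ab_initio": [(100, 250), (400, 520), (900, 1000)],
    "protein_match": [(100, 250), (400, 520)],
    "cdna_match": [(400, 520), (700, 780)],
}

supported = consensus_exons(predictions)
print(supported)
```

Raising `min_support` trades sensitivity for specificity: fewer exons survive, but each one is backed by more evidence.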

Page 4

The generic structure of an automatic genome annotation pipeline and delivery system

Page 5

Box 1. Useful human genome annotation and browser URLs

Automated annotation pipelines

EBI/Sanger Institute Ensembl Project: http://www.ensembl.org/Homo_sapiens/

NCBI Human Genome Browser: http://proxy.library.uiuc.edu:3367/genome/guide/human/

The Oak Ridge National Laboratories Genome Channel: http://compbio.ornl.gov/channel/

Celera Discovery System: http://cds.celera.com/

Incyte Genomics - Genomics Knowledge Platform: http://www.incyte.com/incyte_science/technology/gkp/

Paracel GeneMatcher2 System: http://www.paracel.com/products/gm2.html

Human genome browsers

UCSC Human Genome Browser: http://genome.cse.ucsc.edu/cgi-bin/hgGateway/

Softberry Genome Explorer: http://www.softberry.com/berry.phtml?topic=genomexp

Viaken Enterprise Ensembl Solution: http://www.viaken.com/ns/solutions/ensembl.html

LabBook Inc. Genomic Explorer Suite: http://www.labbook.com/products/ExplorerSuite.asp

University of Tokyo Gene Resource Locator Browser: http://grl.gi.k.u-tokyo.ac.jp/

Other useful sites

The Institute for Genomic Research (TIGR): http://www.tigr.org/

Human Genome Central: http://www.ensembl.org/genome/central/ and http://proxy.library.uiuc.edu:3528/genome/central/

Page 6

From raw sequence to gene predictions

Raw sequence pre-processing

Masking known repeats and low-complexity sequences using RepeatMasker

Identifying homology matches using BLAST

Scanning for other features, such as sequence tagged site (STS) markers and CpG islands

Gene prediction

Predictions based on protein matches

Predictions based on DNA sequence

Ab initio gene prediction programs
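To make the feature-scanning step more concrete, here is a minimal sliding-window sketch for flagging CpG-rich regions. The thresholds (200 bp window, GC fraction above 0.5, observed/expected CpG ratio above 0.6) are commonly quoted rule-of-thumb values and are an assumption here; the slides do not say which criteria any particular pipeline uses, and the sequence is an invented toy example.

```python
def cpg_stats(window):
    """Return (GC fraction, observed/expected CpG ratio) for a DNA string."""
    window = window.upper()
    n = len(window)
    c = window.count("C")
    g = window.count("G")
    cpg = window.count("CG")
    gc_fraction = (c + g) / n if n else 0.0
    # Expected number of CpG dinucleotides if C and G occurred independently.
    expected = (c * g) / n if n else 0.0
    ratio = cpg / expected if expected else 0.0
    return gc_fraction, ratio


def cpg_windows(seq, size=200, step=50, min_gc=0.5, min_ratio=0.6):
    """Yield (start, end) for every window that passes both thresholds."""
    for start in range(0, len(seq) - size + 1, step):
        gc, ratio = cpg_stats(seq[start:start + size])
        if gc > min_gc and ratio > min_ratio:
            yield start, start + size


# Toy sequence: an AT-rich stretch followed by a CpG-rich stretch.
seq = "AT" * 150 + "CG" * 150
hits = list(cpg_windows(seq))
print(hits)
```

Overlapping windows that pass would normally be merged into a single reported island; that step is left out to keep the sketch short.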

Page 7

A simplified schematic of algorithmic gene prediction

Page 8

Gene function characterization

Mapping to known genes

RefSeq and SWISS-PROT

HUGO (NCBI, UCSC and Ensembl)

Protein domain annotation

Pfam, PRINTS, PROSITE, ProDom, BLOCKS and SMART

InterPro project: creating a unique characterization (signature) for a given protein family, domain or functional site. Domains of protein sequences can then be identified using this signature method. The use of InterPro provides the least redundant and most extensive annotation currently available

Gene ontology

The Gene Ontology (GO) project aims to define common terms to specify molecular function, biological process and cellular location
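The signature idea behind protein domain annotation can be sketched with a pattern written in PROSITE-style syntax and translated into a regular expression. This is only one of several signature types (others use profiles or hidden Markov models), and the pattern and protein sequence below are illustrative assumptions, not entries quoted from any database.

```python
import re


def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into a Python regular expression."""
    pattern = pattern.rstrip(".")
    parts = []
    for element in pattern.split("-"):
        element = element.replace("{", "[^").replace("}", "]")  # excluded residues
        element = element.replace("(", "{").replace(")", "}")   # repeat counts
        element = element.replace("x", ".")                     # any residue
        element = element.replace("<", "^").replace(">", "$")   # sequence termini
        parts.append(element)
    return "".join(parts)


# Illustrative zinc-finger-like pattern in PROSITE syntax (assumed example).
signature = "C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H"
regex = prosite_to_regex(signature)

# Invented protein fragment containing one occurrence of the pattern.
protein = "MK" + "CAACAAALAAAAAAAAHAAAH" + "GG"
match = re.search(regex, protein)
print(regex, match.span() if match else None)
```

A hit reported this way would then be attached to the gene as a domain annotation, alongside any GO terms associated with that signature.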

Page 9

Sharing genome annotations

Website display and ftp sites

Figure: Chromosome 20 overview

Page 10

Page 11

Pros: website display does not require expert bioinformatics skills and is thus accessible to a wide range of researchers wishing to gain access to genomic annotation

Cons: it makes large-scale data mining difficult

Solution: enabling more experienced users to retrieve the data they require and to run analyses locally

Open annotation

The need for researchers to have access to annotations available in the community and to share their own contributions with the community

The need for a common protocol between systems that enables genome data to be freely exchanged

Examples: the AGAVE (Architecture for Genomic Annotation, Visualization and Exchange) and the Distributed Annotation System (DAS) projects
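A common exchange protocol lets a client ask any compliant server for the features on a region and read them with the same code. The sketch below is loosely modelled on a DAS-style features request, where annotations are served over HTTP as XML. The server address, data source name, URL layout as written here and the XML sample are all simplified assumptions for illustration; consult the DAS specification for the exact request and response formats.

```python
import xml.etree.ElementTree as ET


def features_url(server, source, segment, start, stop):
    """Build a DAS-style features request URL (layout assumed, see lead-in)."""
    return f"{server}/das/{source}/features?segment={segment}:{start},{stop}"


# Invented, simplified response document for one chromosome segment.
SAMPLE_RESPONSE = """<DASGFF>
  <GFF>
    <SEGMENT id="20" start="1" stop="100000">
      <FEATURE id="gene1" label="example gene">
        <TYPE id="gene">gene</TYPE>
        <START>1200</START>
        <END>5400</END>
        <ORIENTATION>+</ORIENTATION>
      </FEATURE>
      <FEATURE id="cpg1" label="example CpG island">
        <TYPE id="cpg_island">cpg_island</TYPE>
        <START>900</START>
        <END>1150</END>
        <ORIENTATION>0</ORIENTATION>
      </FEATURE>
    </SEGMENT>
  </GFF>
</DASGFF>"""


def parse_features(xml_text):
    """Extract (id, type, start, end) tuples from the XML response."""
    root = ET.fromstring(xml_text)
    features = []
    for feature in root.iter("FEATURE"):
        features.append((
            feature.get("id"),
            feature.find("TYPE").get("id"),
            int(feature.find("START").text),
            int(feature.find("END").text),
        ))
    return features


# Hypothetical server and data source names.
url = features_url("http://das.example.org", "human", "20", 1, 100000)
features = parse_features(SAMPLE_RESPONSE)
print(url)
print(features)
```

Because the client depends only on the agreed format, annotations from several independent servers can be fetched and overlaid on the same reference sequence.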

Page 12

Challenges facing automatic annotation systems

Data warehousing: a solution for large-scale data mining

First, the desired query statement might be too complex to implement

Second, the computing power needed might be too expensive in most cases for queries performed on large, monolithic databases

Solution:

Data warehousing, as used in the business sector, segregates information into denormalized databases, enabling fast querying and data retrieval

A large variety of data-mining tools can then extract datasets of interest efficiently, feeding subsequent stages of statistical analysis or data mining
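The denormalization step can be sketched with an in-memory SQLite database: normalized source tables are pre-joined once into a single wide table, so that later mining queries need no joins. Table names, column names and rows are hypothetical and chosen only to illustrate the principle.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Normalized source tables, as a pipeline database might store them.
cur.executescript("""
CREATE TABLE gene (gene_id INTEGER PRIMARY KEY, name TEXT, chromosome TEXT);
CREATE TABLE domain_hit (gene_id INTEGER, domain TEXT);
CREATE TABLE go_term (gene_id INTEGER, term TEXT);
INSERT INTO gene VALUES (1, 'geneA', '20'), (2, 'geneB', '20'), (3, 'geneC', '1');
INSERT INTO domain_hit VALUES (1, 'kinase'), (2, 'kinase'), (3, 'zinc_finger');
INSERT INTO go_term VALUES (1, 'signal transduction'), (2, 'metabolism'), (3, 'transcription');
""")

# Warehouse step: pre-join everything into one wide, denormalized table.
cur.execute("""
CREATE TABLE gene_mart AS
SELECT g.name, g.chromosome, d.domain, t.term
FROM gene g
JOIN domain_hit d ON d.gene_id = g.gene_id
JOIN go_term t ON t.gene_id = g.gene_id
""")

# Mining queries now read a single table with simple filters.
rows = cur.execute(
    "SELECT name FROM gene_mart WHERE chromosome = ? AND domain = ? ORDER BY name",
    ("20", "kinase"),
).fetchall()
print(rows)
```

The trade-off is redundancy: the wide table repeats data and must be rebuilt when the sources change, in exchange for simpler and faster queries.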

Page 13

The requirement to remain flexible

The development of automated annotation pipelines is an evolving process.

The quality of sequences and assemblies continues to improve, and redundant sequences are replaced with new, superior sequences. This demands a flexible system in which new, individual sequences can be added and analysed without disrupting the whole system

New, improved algorithms and methodologies demand a pipeline architecture flexible enough to incorporate them into the analysis process without redesign of the system
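One way to meet both demands is a registry of independent analysis steps: a new algorithm is added by registering one function, and a new sequence is analysed on its own without touching earlier results. The sketch below is a hypothetical design, not the architecture of any pipeline named in these slides; all function and sequence names are invented.

```python
ANALYSES = {}


def register(name):
    """Decorator that adds an analysis function to the pipeline registry."""
    def decorator(func):
        ANALYSES[name] = func
        return func
    return decorator


@register("length")
def sequence_length(seq):
    return len(seq)


@register("gc_content")
def gc_content(seq):
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def annotate(seq_id, seq, results):
    """Run every registered analysis on one sequence and store the output."""
    results[seq_id] = {name: func(seq) for name, func in ANALYSES.items()}


results = {}
annotate("contig1", "ATGCGC", results)
# A later sequence is analysed independently; contig1's results are untouched.
annotate("contig2", "ATAT", results)
print(results)
```

Replacing a superseded sequence amounts to calling `annotate` again with the same identifier, which overwrites only that entry.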

Page 14

Future opportunities

Comparative genomics

As more genomes are sequenced and become publicly available in the next few years, comparative genomics will become one of the greatest areas of development

Cross-species analysis: human-mouse

Protein-coding genes are likely to be highly conserved between closely related species (e.g. mouse and human), and other regions, such as RNA genes and regulatory regions, could also be elucidated

Need for:

The development of bioinformatics tools, such as Vista, Synplot and FamilyJewels

The integration of such tools with the current automated approaches

The design of genome browsers and websites that can intelligently display and annotate comparative results
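Cross-species conservation is often summarized as percent identity in a window sliding along a pairwise alignment, which is the kind of profile that comparative plots display. The sketch below is a generic version of that calculation and is not the algorithm of any of the tools named above; the two aligned sequences are invented toy strings.

```python
def windowed_identity(aln_a, aln_b, window=10, step=5):
    """Return (start, percent identity) for windows along a pairwise alignment.

    Both inputs are rows of the same alignment, with '-' marking gaps.
    Gap columns are counted as mismatches.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    profile = []
    for start in range(0, len(aln_a) - window + 1, step):
        pairs = zip(aln_a[start:start + window], aln_b[start:start + window])
        matches = sum(1 for a, b in pairs if a == b and a != "-")
        profile.append((start, 100.0 * matches / window))
    return profile


# Invented aligned fragments standing in for human and mouse sequence.
human = "ATGGCCAAGTTTGACCTGAA"
mouse = "ATGGCCAAGTATG-CCTCAA"

profile = windowed_identity(human, mouse)
print(profile)
```

Windows that stay above a chosen identity threshold are candidates for conserved elements, whether coding or regulatory.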

Page 15

Integrating and delivering new data

Horizontal integration

Genomic systems should be able to cross-match species that can be sensibly compared

Vertical integration

New flows of data coming from proteomics and microarray sources will soon have to be incorporated

Page 16

Concluding remarks

The role of automatic genome annotation systems has increased and is increasing

They are grounded upon central cores of bioinformatics software tools and associated relational databases

As more genomes are sequenced, new genomes need to be integrated into the current systems

There is a demand for openness in the distribution of annotation data

Genomic data need to be delivered in forms suitable for large-scale data mining

Page 17

References

1. Alistair G. Rust, Emmanuel Mongin and Ewan Birney. Genome annotation techniques: new approaches and challenges. Drug Discovery Today, Volume 7, Issue 11, 6 May 2002, Pages 570-576.

2. Weizhong Li and Adam Godzik. Discovering new genes with advanced homology detection. Trends in Biotechnology, Volume 20, Issue 8, 1 August 2002, Pages 315-316.

3. Biswas M, O'Rourke JF, Camon E, Fraser G, Kanapin A, Karavidopoulou Y, Kersey P, Kriventseva E, Mittard V, Mulder N, Phan I, Servant F, Apweiler R. Applications of InterPro in protein annotation and genome analysis. Brief Bioinform. 2002 Sep;3(3):285-95. PMID: 12230037. http://www.ebi.ac.uk/interpro/

4. Loraine AE, Helt GA. Visualizing the genome: techniques for presenting human genome data and annotations. BMC Bioinformatics. 2002 Jul 30;3(1):19. http://www.pubmedcentral.gov/articlerender.fcgi?tool=pubmed&pubmedid=12149135

5. Oshiro G, Wodicka LM, Washburn MP, Yates JR 3rd, Lockhart DJ, Winzeler EA. Parallel identification of new genes in Saccharomyces cerevisiae. Genome Res. 2002 Aug;12(8):1210-20. PMID: 12176929. http://www.genome.org/cgi/content/full/12/8/1210