Comparing the Complete S. cerevisiae and C. elegans Proteomes: Orthology and Divergence
Stephen A. Chervitz
Saccharomyces Genome Database
NCBI
Boston University
2
Goals of this study
Explore protein sequence and domain conservation between S. cerevisiae and C. elegans.
  Unicellular vs. multicellular lifestyles
Classify yeast and worm similarity groups using functional annotation of yeast genes.
Enhance the SGD website and add value to the worm genomic sequence.
3
Organization of this study
Shared core biology: whole protein sequence comparisons
Divergence: protein domain comparisons
No gene predictions
No mitochondrial sequence
4
Definitions
Orthologs: Genes from different species that perform the same biological function and likely evolved from a common ancestral gene.
Paralogs: Genes in the same species that perform different biological functions and likely arose by duplication and divergence from a common ancestral gene.
5
Genome Scorecards
                         Saccharomyces cerevisiae   Caenorhabditis elegans
No. of cells:            1                          ~1000
Size (Mbp):              12                         97
Chromosomes:             16                         6
Predicted ORFs:          6,217                      19,099
Percent coding:          72%                        27%
ORFs with gene names:    3,344 (53%)                688 (4%)
Image labels: x200, x20,000
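The scorecard figures can be turned into two derived quantities that make the contrast easier to see: predicted ORFs per Mbp and the fraction of ORFs with gene names. The sketch below uses only the numbers on this slide; the function names are illustrative and not part of the original study.

```python
# Figures copied from the Genome Scorecards slide.
GENOMES = {
    "S. cerevisiae": {"size_mbp": 12, "predicted_orfs": 6217, "named_orfs": 3344},
    "C. elegans": {"size_mbp": 97, "predicted_orfs": 19099, "named_orfs": 688},
}


def orfs_per_mbp(genome):
    """Predicted ORFs per megabase pair of genome sequence."""
    return genome["predicted_orfs"] / genome["size_mbp"]


def named_fraction(genome):
    """Fraction of predicted ORFs that carry a gene name."""
    return genome["named_orfs"] / genome["predicted_orfs"]


for name, genome in GENOMES.items():
    density = orfs_per_mbp(genome)
    named = named_fraction(genome)
    print(f"{name}: {density:.0f} ORFs/Mbp, {named:.1%} with gene names")
```

The much lower ORF density in the worm is in line with its lower percent coding (27% vs. 72%).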
6
Core biology is carried out by similar numbers of proteins
7
Building a Biological Rosetta Stone
P-value   Yeast ORFs with           Worm orthologs with
          functional description    functional description
1e-10     86%                       64%
1e-20     89%                       69%
1e-40     93%                       61%
1e-60     96%                       74%
1e-80     96%                       74%
1e-100    98%                       77%
1e-200    98%                       88%
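One way a table like this could be produced is to keep only the yeast-worm hits at or below each P-value cutoff and report what percentage of the surviving yeast ORFs have a functional description. The sketch below assumes that reading of the table; the `Hit` record, the ORF and gene names, and the P-values are all made up for illustration and are not data from the study.

```python
from typing import NamedTuple, Optional


class Hit(NamedTuple):
    yeast_orf: str         # hypothetical identifier
    worm_gene: str         # hypothetical identifier
    p_value: float         # similarity P-value of the best hit
    yeast_annotated: bool  # True if the yeast ORF has a functional description


# Hypothetical hits, for illustration only.
HITS = [
    Hit("YFG1", "W01A1.1", 1e-120, True),
    Hit("YFG2", "W02B2.2", 1e-45, True),
    Hit("YFG3", "W03C3.3", 1e-15, False),
    Hit("YFG4", "W04D4.4", 1e-12, True),
]


def percent_annotated(hits, threshold) -> Optional[float]:
    """Percent of hits with p_value <= threshold whose yeast ORF is annotated."""
    passing = [h for h in hits if h.p_value <= threshold]
    if not passing:
        return None  # no hits survive this cutoff
    return 100.0 * sum(h.yeast_annotated for h in passing) / len(passing)


for threshold in (1e-10, 1e-20, 1e-40, 1e-100):
    print(f"{threshold:g}: {percent_annotated(HITS, threshold)}")
```

The same function could be applied to a worm-side annotation flag to fill the second column.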
8
Distribution of core biological functions conserved in both yeast and worm
9
Core Biological Functions
Signal Transduction: kinases, phosphatases, Ras superfamily and other GTP-binding proteins, GDP/GTP exchange factors, ADP-ribosylation factors, adenylyl/guanylyl cyclases, phosphatidylinositol kinases, EF-hand proteins
DNA/RNA Metabolism: polymerases, helicases, topoisomerases, repair/recombination-related, nucleases, primases, splicing factors, initiation/elongation factors (transcription & translation), tRNA synthetases, histone acetylases/deacetylases
Transport & Secretion: ABC transporters, permeases, vesicle coat & fusion proteins, clathrin-associated, protein targeting, signal recognition particle, nuclear pore-associated
Cytoskeletal: Actin, myosin, tubulin, actin-related proteins, actin-interacting proteins, septins, cytokinesis-related proteins
10
Core Biological Functions (cont’d)
Ribosomal: ribosomal proteins (small & large subunit), ribosome processing proteins
Protein Folding and Degradation: heat shock proteins, chaperonins, proteasome subunits, ubiquitin-related, peptidyl prolyl cis-trans isomerase, protein disulfide isomerases, aminopeptidases, post-translational modifying enzymes (farnesyltransferase, myristoyltransferase, glycosylation, GPI-anchoring)
Intermediary Metabolism: dehydrogenases, reductases, mutases, lyases, isomerases, carboxylases, decarboxylases, nucleotide biosynthetic enzymes, transaminases, deaminases, epimerases, oxygenases, cytochromes, flavoproteins
11
Constructing Sequence Similarity Groups
12
Similarity Groups: MCM DNA replication initiator complex
13
Similarity Groups: Tubulin
14
Multiple Sequence Alignments
15
Domain Analysis
122 common eukaryotic protein domains.
  Associated with regulation of gene expression and signal transduction.
Compare occurrence and domain architectures in yeast and worm protein sequences.
Position-dependent weight matrices (profiles) to detect domains (PSI-BLAST).
Classify worm-only, yeast-only, and shared domains.
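Once domains have been detected in each proteome, the three-way classification comes down to set operations. The following is a minimal sketch of that final step only; the detection step (profile searches) is not shown, the function name is illustrative, and the example sets are a small hand-picked subset of domain names taken from the domain lists later in this deck.

```python
def classify_domains(yeast_domains, worm_domains):
    """Split detected domains into yeast-only, worm-only, and shared sets."""
    yeast, worm = set(yeast_domains), set(worm_domains)
    return {
        "yeast_only": yeast - worm,  # found in yeast proteins only
        "worm_only": worm - yeast,   # found in worm proteins only
        "shared": yeast & worm,      # found in both proteomes
    }


# Illustrative subset of domain names from the slides that follow.
yeast_detected = {"C6", "SH3", "Ubiquitin", "AAA ATPase"}
worm_detected = {"Cadherin", "T-box", "SH3", "Ubiquitin", "AAA ATPase"}

groups = classify_domains(yeast_detected, worm_detected)
for label in ("yeast_only", "worm_only", "shared"):
    print(label, sorted(groups[label]))
```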
16
Worm-Only Domains
Nuclear hormone receptors
Epidermal growth factor
Degenerins
FMRFamides (neuropeptides)
Cadherin
PTB (phosphotyrosine binding)
T-box, SMAD (transcription factor domains)
Insulin-like peptides
Laminin NT
17
Yeast-Only Domains
C6 (Zn-binding cluster)
APSES (DNA-binding)
18
Shared Domains (Yeast & Worm)
Protein kinase catalytic
C2H2 Finger
AAA ATPase
DAG Kinase
Arrestin
Ankyrin
SWI/SNF helicase
RING-finger
bHLH
RHO GAP/GEF
Pleckstrin homology
SH3
Ubiquitin
SH2
cNMP-signaling domains
CaM EF-hands
Homeodomains
Potassium channels
7TM receptors
HINT
Immunoglobulin
LRR
vWA
MATH
POZ
LIM
19
Frequency of occurrence of common domains
Domain counts are normalized to the number of proteins with a given domain per 1000 genes.
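Normalizing per 1000 genes lets domain frequencies be compared across two genomes of very different size. A minimal sketch of that normalization follows, using the predicted ORF totals from the Genome Scorecards slide as the gene counts (an assumption about which denominator was used); the raw domain counts are hypothetical placeholders, not results from the study.

```python
YEAST_ORFS = 6217   # predicted ORFs, from the Genome Scorecards slide
WORM_ORFS = 19099   # predicted ORFs, from the Genome Scorecards slide


def per_thousand_genes(proteins_with_domain, total_genes):
    """Number of proteins carrying a domain, scaled to a 1000-gene genome."""
    if total_genes <= 0:
        raise ValueError("total_genes must be positive")
    return 1000.0 * proteins_with_domain / total_genes


# Hypothetical raw counts for a single domain, for illustration only.
raw_yeast, raw_worm = 120, 400

yeast_norm = per_thousand_genes(raw_yeast, YEAST_ORFS)
worm_norm = per_thousand_genes(raw_worm, WORM_ORFS)
print(f"yeast: {yeast_norm:.1f} per 1000 genes")
print(f"worm:  {worm_norm:.1f} per 1000 genes")
```

With these placeholder counts the worm has over three times as many proteins with the domain in absolute terms, yet the normalized frequencies are close.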
20
Conclusions
Core biological functions are carried out by orthologous proteins occurring in comparable numbers in yeast and worm.
These represent approx. 40% of the predicted yeast ORFs and 20% of the predicted worm ORFs.
Regulatory and signaling proteins in worm do not have orthologs in yeast but often share domains.
Complete results are available online at SGD at http://genome-www.stanford.edu/Saccharomyces/worm
21
Future Directions
Incorporate more sensitive sequence search results.
More sophisticated clustering scheme.
  Multi-domain proteins and weak similarities.
Keep up to date with changes in the genomic datasets.
  Add/remove protein coding regions
  Correction of errors in the genomic sequence
  Sequence name changes
Extended annotation support.
  Controlled vocabularies, gene function ontologies.
Comparative genomics framework for additional genomes.
More flexible browsing of genome-wide similarities.
  Prototype yeast genome protein similarity Java viewer
22
Genome-wide protein similarity view
Explore protein sequence similarities within or between genomes
Graphical user interface
Available at SGD for the yeast genome
  Sequence Resources, Protein Similarity View
23
24
Acknowledgements
Saccharomyces Genome Database (Stanford)
Gavin Sherlock, Cathy Ball, Selina Dwight, Midori Harris, Kara Dolinski, Shuai Weng, Eric Hester, Mike Cherry, David Botstein
25
Acknowledgements (cont’d)
NCBI (Nat’l Library of Medicine)
L. Aravind, Eugene Koonin
Boston University
Scott Mohr, James Freeman, Temple Smith
Neomorphic Software (Berkeley)
www.neomorphic.com
26
Extra slides
27
Single-linkage clustering and multi-domain proteins
“Chaining”
[Figure: three numbered panels, 1 to 3]
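Chaining can be shown with a small single-linkage example: if protein B shares one domain with A and a different domain with C, then A and C land in the same group even though they share no similarity with each other. The sketch below uses a union-find structure as one common way to implement single-linkage grouping; it is not necessarily how the study built its groups, and the protein names and similarity pairs are hypothetical.

```python
from collections import defaultdict


def single_linkage_groups(proteins, similar_pairs):
    """Group proteins so that any chain of pairwise hits joins a group."""
    parent = {p: p for p in proteins}

    def find(p):
        # Follow parent links to the group representative, halving the path.
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    for a, b in similar_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two groups

    groups = defaultdict(set)
    for p in proteins:
        groups[find(p)].add(p)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical case: A has domain X, B has domains X and Y, C has domain Y.
# A-B and B-C are significant hits; A-C is not. D has no hits at all.
proteins = ["A", "B", "C", "D"]
similar_pairs = [("A", "B"), ("B", "C")]

groups = single_linkage_groups(proteins, similar_pairs)
print(groups)
```

A and C end up grouped together only through the multi-domain protein B, which is the chaining effect named on this slide.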
28
Whole genomic DNA microarray
DeRisi et al. (1997) Science 278: 680
29
Building a Biological Rosetta Stone