Annotation

Traditional genome annotation

BLAST Similarities

Protein Families

Gene Ontology

Ontology
• A “hierarchy” of functions
• Does not need to be linear

Directed Acyclic Graph

Controlled Vocabulary
• Decides which words or phrases to use

GO

Gene Ontology
• A eukaryotic focus
  – Drosophila, Mus, Saccharomyces, Homo

GO

• Cellular component: the parts of a cell
• Molecular function: e.g. ligand binding
• Biological process: what things do

GO Terms

[GO ID, function], e.g.:

GO:0004743
Ontology: molecular function
Name: pyruvate kinase activity

Mainly assigned by BLAST/HMMER/... etc

Directed Acyclic Graph

Molecular function
→ Catalytic activity
→ Transferase activity
→ Transferase activity, transferring phosphorus-containing groups
→ Kinase activity; phosphotransferase activity, alcohol group as acceptor
→ Pyruvate kinase activity
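The path above can be sketched as a small graph to show why GO is a directed acyclic graph rather than a tree. This is a minimal illustration, not the GO data model: term names stand in for GO IDs, the third-level term is spelled with its full GO name ("transferring phosphorus-containing groups"), and the two terms that share a line on the slide are assumed to both be parents of pyruvate kinase activity. `PARENTS` and `ancestors` are names invented for this sketch.

```python
# Parent ("is_a") links for the terms named on the slide.
# Assumption: term names are used as keys for readability; real GO data
# would key on GO IDs such as GO:0004743.
PHOSPHORUS = "transferase activity, transferring phosphorus-containing groups"
ALCOHOL = "phosphotransferase activity, alcohol group as acceptor"

PARENTS = {
    "molecular function": [],
    "catalytic activity": ["molecular function"],
    "transferase activity": ["catalytic activity"],
    PHOSPHORUS: ["transferase activity"],
    "kinase activity": [PHOSPHORUS],
    ALCOHOL: [PHOSPHORUS],
    # Two parents: this is what makes the structure a DAG and not a tree.
    "pyruvate kinase activity": ["kinase activity", ALCOHOL],
}


def ancestors(term, parents=PARENTS):
    """Return every term reachable by following parent links upward."""
    seen = set()
    stack = list(parents[term])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen
```

A gene annotated with pyruvate kinase activity is implicitly annotated with every term in `ancestors("pyruvate kinase activity")`, which is how a query for "kinase activity" can still find it.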

Problems

• Annotation by committee
• Eukaryotic focus
  – Some efforts to counter that: Owen White, Ariane Toussaint
• Not very deep
• Strict controlled vocabulary

Alternatives

Basic biology

[Figure: the lac operon genes lacI, lacZ, lacY, lacA]

Jacob & Monod, 1961

Different types of clustering

[Figure: gene clusters compared across genomes; each comparison is labeled “< 80%”]

Purine metabolism

Heme / chlorophyll metabolism is conserved

They are both porphyrins

Occurrence of clustering in different genomes

[Figure: chart by phylum: Actinobacteria, Aquificae, Bacteroidetes, Chlamydiae, Chloroflexi, Cyanobacteria, Deinococcus-Thermus, Firmicutes, Spirochaetes, Thermotogae, Proteobacteria, and the average. Axes: fraction of genes in clusters (0 to 1); number of genomes (0 to 120). Series: clusters of genes w/ maximum 80% identity; genes in subsystems in clusters; total number of genomes in group.]

The Subsystems Approach to Annotation

• Subsystem is a generalization of “pathway”
  – collection of functional roles jointly involved in a biological process or complex
• Functional Role is the abstract biological function of a gene product
  – atomic, or user-defined
  – examples: 6-phosphofructokinase (EC 2.7.1.11); LSU ribosomal protein L31p; Streptococcal virulence factors
  – should not contain “putative”, “thermostable”, etc.
• Populated subsystem is complete spreadsheet of functions and roles
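The naming rule for functional roles can be illustrated with a short check. This is a sketch under stated assumptions: `FORBIDDEN_QUALIFIERS` lists only the two words the slide names (the "etc." implies more), the EC pattern covers only fully specified four-part numbers, and the helper names `ec_number` and `is_clean_role` are invented here.

```python
import re

# Qualifiers a functional role should not contain. Assumption: only the two
# words named on the slide are listed; a real list would be longer.
FORBIDDEN_QUALIFIERS = ("putative", "thermostable")

# Matches a fully specified EC number written as "(EC a.b.c.d)".
EC_PATTERN = re.compile(r"\(EC (\d+\.\d+\.\d+\.\d+)\)")


def ec_number(role):
    """Return the EC number embedded in a role name, or None if absent."""
    match = EC_PATTERN.search(role)
    return match.group(1) if match else None


def is_clean_role(role):
    """True if the role name carries none of the forbidden qualifiers."""
    lowered = role.lower()
    return not any(word in lowered for word in FORBIDDEN_QUALIFIERS)
```

For example, `ec_number("6-phosphofructokinase (EC 2.7.1.11)")` returns `"2.7.1.11"`, while a name such as "Putative 6-phosphofructokinase (EC 2.7.1.11)" fails `is_clean_role` because the qualifier describes confidence in the annotation, not the function itself.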

Subsystem: Histidine Degradation

#  Abbrev.  Functional role
1  HutH     Histidine ammonia-lyase (EC 4.3.1.3)
2  HutU     Urocanate hydratase (EC 4.2.1.49)
3  HutI     Imidazolonepropionase (EC 3.5.2.7)
4  GluF     Glutamate formiminotransferase (EC 2.1.2.5)
5  HutG     Formiminoglutamase (EC 3.5.3.8)
6  NfoD     N-formylglutamate deformylase (EC 3.5.1.68)
7  ForI     Formiminoglutamic iminohydrolase (EC 3.5.3.13)

Histidine Degradation

• Conversion of histidine to glutamate
• Functional roles defined in table
• Inclusion in subsystem is only by functional role
• Controlled vocabulary …
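"Inclusion in subsystem is only by functional role" can be read as an exact-match rule on the controlled vocabulary. The sketch below assumes that reading. The role names and abbreviations come from the Histidine Degradation table; the gene IDs and their annotations in `example` are hypothetical, and `subsystem_members` is a name invented for illustration.

```python
# Functional roles of the Histidine Degradation subsystem, keyed by the exact
# role string and mapped to the abbreviation used as a column header.
HISTIDINE_DEGRADATION_ROLES = {
    "Histidine ammonia-lyase (EC 4.3.1.3)": "HutH",
    "Urocanate hydratase (EC 4.2.1.49)": "HutU",
    "Imidazolonepropionase (EC 3.5.2.7)": "HutI",
    "Glutamate formiminotransferase (EC 2.1.2.5)": "GluF",
    "Formiminoglutamase (EC 3.5.3.8)": "HutG",
    "N-formylglutamate deformylase (EC 3.5.1.68)": "NfoD",
    "Formiminoglutamic iminohydrolase (EC 3.5.3.13)": "ForI",
}


def subsystem_members(annotations, roles=HISTIDINE_DEGRADATION_ROLES):
    """Map role abbreviation -> gene IDs whose annotation matches a role exactly."""
    members = {}
    for gene_id, role in annotations.items():
        if role in roles:
            members.setdefault(roles[role], []).append(gene_id)
    return members


# Hypothetical gene IDs and annotations, for illustration only.
example = {
    "gene_0001": "Histidine ammonia-lyase (EC 4.3.1.3)",
    "gene_0002": "Urocanate hydratase (EC 4.2.1.49)",
    "gene_0003": "putative histidine ammonia-lyase",
    "gene_0004": "LSU ribosomal protein L31p",
}
```

Here `gene_0003` is left out even though its annotation resembles a subsystem role, because its text is not one of the controlled role names.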

• Column headers taken from table of functional roles
• Rows are selected genomes or organisms
• Cells are populated with specific, annotated genes
• Functional variants defined by the annotated roles
• Variant code -1 indicates subsystem is not functional
• Clustering shown by color

Organism Variant HutH HutU HutI GluF HutG NfoD ForI

Bacteroides thetaiotaomicron 1 Q8A4B3 Q8A4A9 Q8A4B1 Q8A4B0

Desulfotalea psychrophila 1 gi51246205 gi51246204 gi51246203 gi51246202

Halobacterium sp. 2 Q9HQD5 Q9HQD8 Q9HQD6 Q9HQD7

Deinococcus radiodurans 2 Q9RZ06 Q9RZ02 Q9RZ05 Q9RZ04

Bacillus subtilis 2 P10944 P25503 P42084 P42068

Caulobacter crescentus 3 P58082 Q9A9M1 P58079 Q9A9M0 Q9A9L9

Pseudomonas putida 3 Q88CZ7 Q88CZ6 Q88CZ9 Q88D00 Q88CZ3

Xanthomonas campestris 3 Q8PAA7 P58988 Q8PAA6 Q8PAA8 Q8PAA5

Listeria monocytogenes -1

Subsystem Spreadsheet
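A few spreadsheet rows can be held in a simple structure to show how the variant code is used. This is a sketch, not the storage format of any annotation system: only a subset of the rows is copied, the genes are kept as flat lists without assigning them to role columns, and `functional_rows` and `organisms_by_variant` are invented names.

```python
from collections import defaultdict

# (organism, variant code, annotated genes) for a subset of the rows above.
# Assumption: genes are stored as a flat list; this sketch does not place
# them under individual functional-role columns.
ROWS = [
    ("Bacteroides thetaiotaomicron", 1, ["Q8A4B3", "Q8A4A9", "Q8A4B1", "Q8A4B0"]),
    ("Halobacterium sp.", 2, ["Q9HQD5", "Q9HQD8", "Q9HQD6", "Q9HQD7"]),
    ("Bacillus subtilis", 2, ["P10944", "P25503", "P42084", "P42068"]),
    ("Pseudomonas putida", 3, ["Q88CZ7", "Q88CZ6", "Q88CZ9", "Q88D00", "Q88CZ3"]),
    ("Listeria monocytogenes", -1, []),
]


def functional_rows(rows):
    """Keep only rows whose variant code marks a functional subsystem."""
    return [row for row in rows if row[1] != -1]


def organisms_by_variant(rows):
    """Group organisms with a functional subsystem by their variant code."""
    groups = defaultdict(list)
    for organism, variant, _genes in functional_rows(rows):
        groups[variant].append(organism)
    return dict(groups)
```

Grouping by variant code makes the different routes through the subsystem visible: organisms sharing a code carry the same set of roles, and Listeria monocytogenes (code -1) drops out entirely.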

“The Populated Subsystem”

Subsystems developed based on

• Wet lab
• Chromosomal context
• Metabolic context
• Phylogenetic context
• Microarray data
• Proteomics data

Three-level “hierarchy”

• Amino Acids and Derivatives
  – Alanine, serine, and glycine
    • Serine Biosynthesis
• Amino Acids and Derivatives
  – Lysine, threonine, methionine, and cysteine
    • Methionine Biosynthesis
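The three levels can be sketched as a nested mapping, which makes it easy to look up where a subsystem sits. The level names "category" and "subcategory" are labels chosen for this sketch, and `classification_of` is an invented helper; only the two examples from the slide are included.

```python
# Top level -> second level -> list of subsystems, using the slide's examples.
HIERARCHY = {
    "Amino Acids and Derivatives": {
        "Alanine, serine, and glycine": ["Serine Biosynthesis"],
        "Lysine, threonine, methionine, and cysteine": ["Methionine Biosynthesis"],
    },
}


def classification_of(subsystem, hierarchy=HIERARCHY):
    """Return (category, subcategory) for a subsystem, or None if not listed."""
    for category, subcategories in hierarchy.items():
        for subcategory, subsystems in subcategories.items():
            if subsystem in subsystems:
                return category, subcategory
    return None
```

Calling `classification_of("Methionine Biosynthesis")` returns the pair `("Amino Acids and Derivatives", "Lysine, threonine, methionine, and cysteine")`.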

Make your own subsystems!

About 2,500 Subsystems

Growth in Subsystems Over Time

Classification                                      # SS
Experimental Subsystems                             498
Clustering-based subsystems                         352
Carbohydrates                                       160
Cofactors, Vitamins, Prosthetic Groups, Pigments    123
Amino Acids and Derivatives                         96
Protein Metabolism                                  95
Virulence, Disease, Defense                         70
Miscellaneous                                       70
RNA Metabolism                                      65
Membrane Transport                                  65
Respiration                                         62
Cell Wall and Capsule                               62
Fatty Acids, Lipids, and Isoprenoids                60
Regulation and Cell signaling                       51
Virulence                                           49
Stress Response                                     43
DNA Metabolism                                      41
Aromatic Compounds                                  38
Phages                                              36
Secondary Metabolism                                34
Iron acquisition and metabolism                     31
Nucleosides and Nucleotides                         24
Sulfur Metabolism                                   20
Dormancy and Sporulation                            17
Plant-prokaryote                                    12
Nitrogen Metabolism                                 12
Motility and Chemotaxis                             11
Plant cell walls and outer surfaces                 10
Phages                                              10
Cell Division and Cell Cycle                        10
Photosynthesis                                      9
Metabolite damage                                   8
Phosphorus Metabolism                               7
Potassium metabolism                                4
Transcriptional regulation                          2
Plasmids                                            2
Central metabolism                                  2
Autotrophy                                          2
Arabinose Transport                                 1

RAST usage grows...

RAST coverage....

RASTtk

RAST 2.0
• Customizable choice of pipelines to run
• Same behind-the-scenes infrastructure

RASTtk

Recommended