A brief on: Domain Families & Classification


• The discovery of domains in protein structures

• Domains at the sequence level

• Examples of “Domain Resources”

• Domain fusion

• Supra-domains

• Signaling domains and cell function

• InterPro

Evolution by Protein Domains

Classification to Families

We can classify proteins into families by:

– A. Sequence (motifs; proteins)

– B. Structure

– C. Function (annotation)

– D. Evolution

Automatic, large scale

Manual, high quality


Sequence Based Classification

• Proteins as a unit

• Proteins as a combination of domains (functional, structural, sequence)

The goal:

1. New annotation, new family, family connections (sub/super) …

2. Predictive power (given a new unknown sequence)


Protein Multiple Alignment (Structurally supported)


Q: What is the best way to ‘represent’ this low sequence similarity of ~ 70 aa

Domains can be recognized through sequence similarity


Misannotation due to multidomain proteins

Smith and Zhang. Nat Biotechnol 1997 15:1222-3

[Figure: protein A carries a domain of known function (kinase); protein B carries a domain of unknown function; multidomain protein C carries both. The annotation "kinase-like" passes from A to C and then to B.]

A is similar to C, and C is similar to B, but A is not similar to B
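The misannotation scenario can be sketched in a few lines of Python. This is only an illustration: the domain names and the rule "similar means sharing at least one domain" are assumptions, not part of the cited paper.

```python
# Hypothetical domain compositions (names are illustrative assumptions).
proteins = {
    "A": {"kinase"},
    "B": {"unknown_domain"},
    "C": {"kinase", "unknown_domain"},  # the multidomain protein
}


def similar(p, q):
    """Assumed rule: two proteins are similar if they share at least one domain."""
    return bool(proteins[p] & proteins[q])


def naive_transfer(chain, label):
    """Pass a label along a chain of proteins as long as neighbours are similar."""
    for p, q in zip(chain, chain[1:]):
        if not similar(p, q):
            return None
    return label


def domain_aware_transfer(chain, label, domain):
    """Pass the label only if every protein in the chain carries the annotated domain."""
    for p in chain:
        if domain not in proteins[p]:
            return None
    return label


naive = naive_transfer(["A", "C", "B"], "kinase-like")
careful = domain_aware_transfer(["A", "C", "B"], "kinase-like", "kinase")
print(naive, careful)
```

The whole-protein rule labels B as kinase-like although B has no kinase domain; checking the domain itself blocks the transfer.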


Q: What is the best way to ‘represent’ this low sequence similarity of ~ 70 aa

‘Profile’ PSSM

Regular Expression

HMM

And more…

Multidomain protein families

Impossible to find ‘evolutionary relatedness’ without adding DOMAIN information…


• Domains are the evolutionary units of sequence that comprise the gene coding regions.

• Most genes are built from more than one domain.

• Novel genes can be created by recombination of domains into new domain arrangements.

How is a novel gene born?

From glycolysis: correspondence between functional associations and genes linked by the fusion method

[Figure: pathway Glycerone-P → Glyceraldehyde-3P (TIM) → Glycerate-1,3P2 (GAPDH) → Glycerate-3P (PGK1)]

Thermotoga maritima: PGK+TIM (fused)

Phytophthora infestans: TIM+GAPDH (fused)

M. genitalium: PGK, TIM, GAPDH (separate genes)
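The fusion method on this slide can be sketched as follows. The gene lists are taken from the slide labels; the data layout and the function name `fusion_links` are assumptions made for illustration, and each separate gene is treated as encoding a single enzyme.

```python
from itertools import combinations

# Each gene is a tuple of the enzyme components it encodes (from the slide labels).
genomes = {
    "Thermotoga maritima": [("PGK", "TIM")],
    "Phytophthora infestans": [("TIM", "GAPDH")],
    "M. genitalium": [("PGK",), ("TIM",), ("GAPDH",)],
}


def fusion_links(genomes, target):
    """Find pairs that are separate genes in `target` but fused elsewhere.

    Returns a dict mapping a sorted component pair to the organisms
    in which the two components occur in one fused gene.
    """
    separate = {gene[0] for gene in genomes[target] if len(gene) == 1}
    links = {}
    for organism, genes in genomes.items():
        if organism == target:
            continue
        for gene in genes:
            for a, b in combinations(gene, 2):
                if a in separate and b in separate:
                    pair = tuple(sorted((a, b)))
                    links.setdefault(pair, []).append(organism)
    return links


links = fusion_links(genomes, "M. genitalium")
for pair, organisms in links.items():
    print(pair, "fused in", organisms)
```

Each reported pair is a predicted functional association between two separate M. genitalium genes.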

False Transitivity of Local Alignment

Input: CSKP_HUMAN, DLG3_MOUSE, MPP3_HUMAN, K6A1_MOUSE

BLAST values (pairwise similarities better than an E-value of 1e-40): 8e-78, 2e-47, 9e-41, 1e-42

If we cluster these proteins, assuming transitivity of local alignment scores, we will cluster K6A1_MOUSE with MPP3_HUMAN
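A minimal sketch of how transitive (single-linkage) clustering produces this effect. The slide does not say which protein pair each BLAST value belongs to, so the list of pairs passing the threshold below is an assumption for illustration only.

```python
PROTEINS = ["CSKP_HUMAN", "DLG3_MOUSE", "MPP3_HUMAN", "K6A1_MOUSE"]

# ASSUMPTION: illustrative pairs taken to pass the 1e-40 threshold;
# the real pairing of E-values to proteins is not recoverable from the slide.
HITS = [
    ("K6A1_MOUSE", "CSKP_HUMAN"),
    ("CSKP_HUMAN", "DLG3_MOUSE"),
    ("DLG3_MOUSE", "MPP3_HUMAN"),
]


def single_linkage(items, hits):
    """Cluster items by merging every pair with a significant hit (union-find)."""
    parent = {x: x for x in items}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for a, b in hits:
        parent[find(a)] = find(b)

    clusters = {}
    for x in items:
        clusters.setdefault(find(x), set()).add(x)
    return list(clusters.values())


clusters = single_linkage(PROTEINS, HITS)
pair = ("K6A1_MOUSE", "MPP3_HUMAN")
directly_linked = pair in HITS or pair[::-1] in HITS
print(clusters, "direct hit:", directly_linked)
```

K6A1_MOUSE and MPP3_HUMAN end up in the same cluster without any direct hit between them, which is the false transitivity the slide warns about.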

Used terms:

Motif = Domain = Signature = Profile = Seed

Family = Cluster

These terms are used interchangeably; they are very (too) flexible

Domain Classification

(intro to few systems)


Protein Sequence Domain Classification

DOMO

ADDA

EVEREST

InterPro

CDD

MetaFam

ProSite

Pfam

Blocks+

Profile

SBASE

TigrFam

eMotif

SMART

PRINTS

ProDom

Based on different principles and a different focus


Integration: Data Fusion

InterPro 13,000 entries

Based on UniProt DB


Expert system Pfam

InterPro - >13,000 entries

2006 >8000

Sequence coverage:

Pfam-A: 75%

Pfam-B: 19%

Other

Examples: complexity in domains

Identification ? Boundary ? Composition ?


Why domains and not proteins

Reducing false transitivity.

Exposing Mix and Match evolution

Immediate relevance to structural domain-families

Suggesting evolutionary ‘robust units’

Providing models for a family

Why automatic?

Overcoming large amounts of data

Unbiased identification of new families (even without an identified seed / without 3D structural information)

Domains are the building blocks of evolution: some facts

Pyruvate kinase, PDB:1pkn

3 domains

Each occurs in diverse sets of protein families

Number of domains in proteins ranges from 1 up to tens

Structure-based domains are ~150 aa long

Length varies: some are very short (30-40 aa), others are long (>500 aa)

Domain definition is somewhat blurred

Domain boundary is an unsolved problem


What is a domain? You know it when you see one


Automatic vs Manual

>13,000 entries

General approaches

• Motif-based databases
  • Prosite, Prints, Blocks, eMotif, InterPro

• Domain-based databases
  • Pfam, ProDom, Domo, Smart

• Manual/semi-manual
  • Prosite

• Semi-automatic
  • Pfam, Smart

• Fully automatic
  • ProDom, Blocks, Domo, eMotif

• Use different models (regular expressions, profiles, HMMs)

• Based on each other

Example of semi-automatic

Pfam: Nucleic Acids Research, 2007, 1–8

1. The Pfam release (22.0) contains 9318 protein families, which cover 73.2% of sequences and 50.8% of residues.

2. Pfam is now based on UniProtKB, NCBI GenPept and metagenomics projects.

3. ~500 new Pfam-A families for PDB sequences and SCOP entries, increasing the amino acid coverage.

4. Clans are built manually (supported by literature, SCOP..): a total of 283 clans comprising 1808 Pfam-A families.
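Coverage can be counted per sequence (does a protein have at least one family match?) or per amino acid (what fraction of all residues lies inside a match?). The sketch below shows the difference on made-up toy data; the input layout and the function name `coverage` are assumptions, not Pfam's actual procedure.

```python
def coverage(proteins):
    """Return (sequence coverage, residue coverage).

    `proteins` is a list of (length, hits), where hits is a list of
    1-based inclusive (start, end) domain matches on that protein.
    """
    covered_sequences = sum(1 for _, hits in proteins if hits)
    total_residues = sum(length for length, _ in proteins)
    covered_residues = 0
    for _, hits in proteins:
        positions = set()
        for start, end in hits:
            positions.update(range(start, end + 1))  # overlaps counted once
        covered_residues += len(positions)
    return covered_sequences / len(proteins), covered_residues / total_residues


# Toy data (hypothetical): three proteins, one without any match.
toy = [
    (100, [(1, 50)]),
    (200, [(11, 60), (41, 110)]),
    (100, []),
]
seq_cov, res_cov = coverage(toy)
print(f"sequences: {seq_cov:.1%}, residues: {res_cov:.1%}")
```

Residue coverage is typically the lower of the two numbers, because a protein counts as covered even when only part of its length is matched.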

The Power of Integration

Pfam, Prosite, SMART, PRINTS, TigrFam, ProDom

InterPro

SCOP, CATH, FSSP

GO, ENZ, KEGG

TRANSFERASE (METHYLTRANSFERASE) 1adm

Proteins were found to have spatially distinct structural units

Structure domains provide a “clean” definition

In 1974, Michael Rossmann observed that structural domains can recur in different structural contexts

1ht0 – an alcohol dehydrogenase
1i0z – a lactate dehydrogenase

Rossmann fold

Domains can recur in multiple copies in the same protein

Fibronectin protein – 1fnf


A distinct, compact, and stable protein structural unit that folds independently of other such units.

Structural definition of domains


Recurrent domains in diphtheria toxin (1ddt)

The diphtheria toxin is made up of three domains, each of which is involved in a different stage of infection (receptor binding, membrane penetration, and catalysis of ADP-ribosylation of elongation factor 2). A structural neighbor is depicted next to each domain of diphtheria toxin (middle).

Dominant domain fold types.

Holm and Sander. PROTEINS: Structure, Function, and Genetics 33:88–96 (1998)

[Figure: SCOP hierarchy levels, with counts of 701, 1,110, 1,940 and 44,327]

SCOP – a structural classification of proteins

Updated from Murzin et al. J. Mol. Biol. 247, 536-540.

Families are in turn grouped into superfamilies where sequence similarity is still recognizable and basic biochemical properties are conserved. Superfamilies and families are monophyletic (derived from a common ancestor).


Sequence Biology predominantly proceeds by decomposing proteins into their domains

Protein sequence families are constructed at the domain level

Prosite

A dictionary of functional and structural motifs and domains

Valuable biological information on each family

Each motif/domain/family is represented as a regular expression, a rule or a profile

Models are generated from (usually published) multiple alignments, manually calibrated to ensure selectivity and sensitivity

Patterns do not always cover complete domains whereas profiles usually span the whole domain

As of June 2002, Prosite contains 1800 patterns and profiles describing 1200 families or domains

G-x(2,3)-[MLIV]-x-P-{K,H}-x(2)-C
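A Prosite-style pattern like the one above maps directly onto an ordinary regular expression. The converter below is a minimal sketch covering only the syntax used on this slide; `prosite_to_regex` is a hypothetical helper, not part of any Prosite tool, and it reads `{K,H}` as "any residue except K or H". The test sequences are made up.

```python
import re

PATTERN = "G-x(2,3)-[MLIV]-x-P-{K,H}-x(2)-C"


def prosite_to_regex(pattern):
    """Translate a simple Prosite-style pattern into a Python regex string."""
    parts = []
    for element in pattern.split("-"):
        # Separate the residue spec from an optional repeat such as (2) or (2,3).
        match = re.fullmatch(r"(.+?)(?:\((\d+(?:,\d+)?)\))?", element)
        core, repeat = match.group(1), match.group(2)
        if core == "x":
            core = "."  # any residue
        elif core.startswith("{") and core.endswith("}"):
            core = "[^" + core[1:-1].replace(",", "") + "]"  # excluded residues
        # Bracketed alternatives and single letters are already valid regex.
        parts.append(core + ("{" + repeat + "}" if repeat else ""))
    return "".join(parts)


regex = prosite_to_regex(PATTERN)
hit = re.search(regex, "TTGAAMAPAAACTT")   # made-up sequence containing the motif
miss = re.search(regex, "GAAMAPKAAC")      # K after P violates {K,H}
print(regex, hit.group(0), miss)
```

A regular expression gives a yes/no answer at each position, which is why patterns are quick to scan but less sensitive than profiles.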

OR

     1     2     3     4     5     6     7     8     9     10    11
A    0     0.25  0.25  1     0.5   0     1     0.5   0     0.25  0
C    0     0     0.25  0     0     0     0     0     0.25  0.25  1
G    1     0.5   0     0     0     0.25  0     0.5   0.75  0.25  0
T    0     0.25  0.5   0     0.5   0.75  0     0     0     0.25  0
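The frequency table can be used to score any 11-letter window. Note that the slide's example profile is over the DNA alphabet (A, C, G, T). The sketch below uses log-odds scoring; the uniform background of 0.25 and the pseudocount of 0.01 are assumptions added for illustration, as is the test sequence.

```python
import math

# Column frequencies copied from the table (positions 1 to 11).
PROFILE = {
    "A": [0, 0.25, 0.25, 1, 0.5, 0, 1, 0.5, 0, 0.25, 0],
    "C": [0, 0, 0.25, 0, 0, 0, 0, 0, 0.25, 0.25, 1],
    "G": [1, 0.5, 0, 0, 0, 0.25, 0, 0.5, 0.75, 0.25, 0],
    "T": [0, 0.25, 0.5, 0, 0.5, 0.75, 0, 0, 0, 0.25, 0],
}
WIDTH = 11
BACKGROUND = 0.25   # assumed uniform base frequency
PSEUDOCOUNT = 0.01  # assumed; avoids log(0) for unseen bases


def log_odds(window):
    """Sum of per-position log2(frequency / background) for one window."""
    score = 0.0
    for i, base in enumerate(window):
        freq = (PROFILE[base][i] + PSEUDOCOUNT) / (1 + 4 * PSEUDOCOUNT)
        score += math.log2(freq / BACKGROUND)
    return score


def best_hit(sequence):
    """Return (score, start) of the best-scoring window in the sequence."""
    return max(
        (log_odds(sequence[i:i + WIDTH]), i)
        for i in range(len(sequence) - WIDTH + 1)
    )


consensus = "GGTAATAAGAC"  # one most-frequent base per column (ties broken arbitrarily)
score, start = best_hit("CCCC" + consensus + "CCCC")
print(round(score, 2), start)
```

Unlike a regular expression, the profile gives a graded score, so near-matches can still be ranked.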


From the SMART database

Detecting domains at the sequence level

Fusion links: glycyl-tRNA synthetase

E. coli: glyQ, glyS (separate genes)

C. trachomatis: CT796 (fusion protein)

The fact that glyQ and glyS interact could have been predicted from the fusion protein CT796

InterPro

An integrated resource of protein sites and functional domains

The good thing about standards is that there are so many of them to choose from…

Introducing InterPro…

http://www.ebi.ac.uk/interpro/

InterPro entry for a zinc finger domain

• Search by taxonomy:

Example search results for the human protein Sirt1:

Alignment view.

HMM logo view.

iPfam: a database of domain-domain interactions based on PDB entries.

Notable advantages:

• Linked from the leading databases: UniProt, PDB, InterPro.
• Manual curation of the division into families.
• HMM-based search for global and local sequence matches.
• A collection of the domain architectures in which the protein occurs.
• Phylogenetic and taxonomic trees for finding known homologous proteins.
• Graphical display of HMMs and alignments.
• The entire database can be downloaded.

Superfamilies of domains in InterPro (analogous to superfamilies in SCOP)


Some domains actually contain other domains!

Signaling and Multicellularity

One of the key problems of becoming a multicellular organism is cell signaling.

[Figure: an enzyme cycling through inactive, active, and inactive states; labels: protein kinase, phosphatase]

Phosphorylation can reversibly alter the activity of an enzyme through the combined action of a protein kinase and a protein phosphatase.

signal transduction


Tyrosine phosphorylation is a major mechanism of transmembrane signaling.

Pawson and Scott. Scientific American (2000)

Protein tyrosine kinases (PTKs) add phosphate to tyrosines

SH2 domains (Src-homology 2)

SH2 domains are modules of ~100 amino acids that bind to specific phosphotyrosine (pY)-containing peptide motifs

The Pawson Lab http://www.mshri.on.ca/pawson/domains.html


Pawson, T. et al., Trends in Cell Biology Vol.11 No.12 December 2001

The SH2 domain is found embedded in a wide variety of metazoan proteins that regulate functionally diverse processes.


Several modular domains have been identified that recognize specific sequences on their target acceptor proteins.

Protein modules for the assembly of signaling complexes

Pawson & Scott. Science (1997) 278 2075-2080


Pawson & Scott. Science (1997) 278 2075-2080

One way receptors may amplify their signaling is to use adaptor proteins that provide additional docking sites for modular signaling proteins.

Adaptor proteins

The Order of Domains in the Polypeptide Chains of Src and Abl, and Diagrams of Their Assembled, Autoinhibited States

In both cases, the SH3-SH2 clamp fixes the bilobed kinase domain in an inactive conformation. The domain color codes are SH3, yellow; SH2, green; kinase small lobe, dark blue; kinase large lobe, light blue. The activation loop in the large lobe is red. Connector, linker, and N- and C-terminal extensions are black. In Bcr/Abl, gene fusion has replaced the Abl cap by a long segment of Bcr.

Harrison, S. C. (2003). Cell, 112, 737–740.

Supra-domains in Src and Abl


A supra-domain is defined as a domain combination in a particular N-to-C-terminal orientation that occurs in at least two different domain architectures in different proteins with: (i) different types of domains at the N and C-terminal end of the combination; or (ii) different types of domains at one end and no domain at the other.

Supra-domains

Evolutionary units larger than single domains

Vogel C. J Mol Biol. 2004 336 (3) :809-23

[Figure: supra-domains of size 2 and 3, drawn from the N-terminal end to the C-terminal end; each depicted protein represents a different domain architecture]
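The definition above can be turned into a simple search over domain architectures. The sketch below uses a simplified reading: a combination qualifies if it occurs in at least two different architectures with at least two different flanking contexts, without distinguishing cases (i) and (ii). The function name and the toy architectures are hypothetical.

```python
from collections import defaultdict


def supra_domains(architectures, sizes=(2, 3)):
    """Find candidate supra-domains in N-to-C ordered domain architectures.

    `architectures` is an iterable of tuples of domain names.
    Simplified criterion (an assumption): the combination appears in at
    least two distinct architectures with at least two distinct
    (left neighbour, right neighbour) contexts; None means no domain.
    """
    contexts = defaultdict(set)  # combination -> flanking contexts
    seen_in = defaultdict(set)   # combination -> architectures containing it
    for arch in set(architectures):
        for k in sizes:
            for i in range(len(arch) - k + 1):
                combo = arch[i:i + k]
                left = arch[i - 1] if i > 0 else None
                right = arch[i + k] if i + k < len(arch) else None
                contexts[combo].add((left, right))
                seen_in[combo].add(arch)
    return sorted(
        combo
        for combo in contexts
        if len(seen_in[combo]) >= 2 and len(contexts[combo]) >= 2
    )


# Toy architectures with placeholder domain names.
toy = [
    ("X", "A", "B", "Y"),
    ("Z", "A", "B"),
    ("A", "C"),
]
found = supra_domains(toy)
print(found)
```

Here the pair A-B is reported because it keeps its N-to-C order while its neighbours change between architectures.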

Chothia C. Science 2003 300: 1701-1703
Vogel C. J Mol Biol. 2004 336 (3) :809-23

Supra-domains

Evolutionary units larger than single domains

The P-loop containing nucleoside triphosphate (NTP) hydrolase domain and the translation protein domain occur as one combination in several different translation factors.

This supra-domain occurs in 35 different domain architectures, and five of these are given here.


The building blocks: modular interaction domains in signal transduction

Pawson & Nash. Science (2003) 300 445-452
