

Acknowledgements

Comparative Analysis of Novel Proteins from the CATH Family of Zinc Peptidases

Debanu Das1,2, Abhinav Kumar1,2, Lukasz Jaroszewski1,3 and Ashley Deacon1,2

1Joint Center for Structural Genomics, 2Stanford Synchrotron Radiation Laboratory, Menlo Park, CA 94025, 3Burnham Institute, La Jolla, CA, 92037

III. General structure and biochemistry

These metallopeptidases show a high degree of structural conservation in the CATH domain, which has an α/β/α sandwich architecture. The active site usually comprises histidines and carboxylates interacting with two zinc ions. Despite the variety of molecular functions and substrate specificities of these proteins, catalysis most likely involves a hydroxide ion ligand that carries out a nucleophilic attack. The full proteins often oligomerize and differ in their oligomerization state; however, the exact role of the oligomer in molecular function is still unclear. In some cases, dimer formation results in assembly of a productive catalytic site. Dimerization is usually mediated by a dimerization domain. Higher oligomeric forms such as tetramers or octamers are also observed for some proteins.

Figure of the representative CATH structure from http://cathwww.biochem.ucl.ac.uk/cgi-bin/cath/GotoCath.pl?cath=3.40.630.10

II. Background and Significance

CATH 3.40.630.10 proteins belong to Pfam clan CL0035 (Peptidase MH/MC/MF) and to clan MH/MC/MF of metallopeptidases in the MEROPS database of peptidases (also termed proteases, proteinases or proteolytic enzymes). CL0035 has 7591 proteins in 8 Pfams:

These proteins are involved in a variety of proteolytic activities, have a range of substrate specificities and are present in numerous microbial organisms, many of which are important human pathogens such as S. aureus, S. typhimurium, T. vaginalis, M. tuberculosis, N. gonorrhoeae, N. meningitidis, C. trachomatis, G. intestinalis, and E. coli. Several of these proteins have been investigated for their therapeutic potential and disease roles (Canavan’s disease, cancer therapy and prohormone/propeptide processing).

V. Structures solved by JCSG

IV. Progress of structure determination

XII. Inferences and further work

• In the quest for increasing structural coverage across protein families, it is expected that proteins similar in sequence within a protein family will be similar in structure. Increasing structural coverage provides better templates for modeling other proteins. The comparative structural analysis presented here provides experimental verification of the validity of this approach.

• The structures of the proteins HP10645A and HP10645E suggest that they should be assigned to PF00246 in PfamA instead of to PF04952, as currently suggested by PfamB.

• The 7 structures presented here provide a basis for enhancing the modeling of 2177 out of 7591 proteins (~29%) belonging to this Pfam clan. Furthermore, 3 of these JCSG structures provide the first examples of structures for proteins within a particular sequence cluster (2QYV, 2QJ8 and 3B2Y) and thus provide the basis for modeling 384 unique proteins (10 from organisms listed as top human pathogens) belonging to these 3 clusters from 2 different Pfams (PF01546 and PF04952).

• 2QYV/HP9625C represents the first crystal structure of a dipeptidase PepD showing a dimer.

• Further analysis will be performed to try to understand evolutionary relationships between these proteins based on sequence-based phylogenetic trees and structure-based trees.

• Attempts will be made to investigate use of these structures and their comparative analyses in understanding structural basis for enzyme function and substrate specificities by analysis of active site amino acids, and to attempt to exploit information for therapeutic purposes.

2RB7.pdb (HP1666A), 1.6 Å, R/Rfree = 15.4/18.0%. Unknown function, PF01546. 48 close homologs from important human pathogens. Potential in cancer therapy.

2QYV.pdb (HP9625C), 2.11 Å, R/Rfree = 22.0/24.4%. Putative Xaa-His dipeptidase, PF01546, Zn2+ bound. 7 close homologs from important human pathogens.

2FVG.pdb (TM1049), 2.01 Å, R/Rfree = 20.3/24.4%. Endoglucanase, PF05343. 27 close homologs from important human pathogens.

HP10625B, 2.3 Å, work in progress. PF01546. 50 close homologs from important human pathogens. Potential in cancer therapy.

Pfam; family; number of proteins; SG structures; non-SG structures
PF04952; Succinylglutamate desuccinylase / Aspartoacylase family (AstE-AspA); 458; 2 JCSG, 5 all other SG; -
PF02127; Aminopeptidase I zinc metalloprotease M18; 227; 4 all other SG; -
PF01546; Peptidase family M20/M25/M40; 3779; 4 JCSG, 7 all other SG; 6 non-SG
PF00246; Zinc carboxypeptidase M14; 1013; -; 10 non-SG
PF04389; Peptidase family M28; 812; -; 5 non-SG
PF00883; Cytosol aminopeptidase family, catalytic domain; 827; 1 all other SG; 1 non-SG
PF05343; M42 glutamyl aminopeptidase; 427; 1 JCSG, 1 all other SG; 1 non-SG
PF05450; Nicastrin (eukaryotic, not known to be a peptidase, part of the γ-secretase complex, no structures); 48; None; None
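The per-family protein counts can be cross-checked against the clan total of 7591 and against the modeling coverage quoted in the inferences (2177 of 7591 proteins). The following is a minimal sketch, assuming the counts are read from the table exactly as listed; the variable names are illustrative only.

```python
# Protein counts per Pfam family, copied from the table above.
pfam_protein_counts = {
    "PF04952": 458,
    "PF02127": 227,
    "PF01546": 3779,
    "PF00246": 1013,
    "PF04389": 812,
    "PF00883": 827,
    "PF05343": 427,
    "PF05450": 48,
}

# Total number of proteins in clan CL0035 according to the table.
clan_total = sum(pfam_protein_counts.values())

# Number of proteins whose modeling is improved, as quoted in the inferences.
modelable = 2177
coverage_percent = 100 * modelable / clan_total

print(clan_total)                  # 7591
print(round(coverage_percent, 1))  # 28.7
```

The sum reproduces the stated clan size, and the coverage rounds to the ~29% given in the text.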

VII. Comparison of two proteins with >30% sequence identity within the same Pfam

PF01546: 1CG2, 2RB7

1CG2: Carboxypeptidase G2, which cleaves the C-terminal glutamate moiety from folic acid and its analogues, such as methotrexate

2RB7: Unknown function, JCSG

Common core ~290 aa, RMSD ~3.0 Å

For structures that cluster together at the 30% level, structural conservation in the common core is the highest. Generally only slight rearrangement of secondary structural elements is observed (within the domain).

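PF04952: 32 assigned in PfamA, 3 unassigned in PfamA
PF02127: 0 assigned in PfamA, 0 unassigned in PfamA
PF01546: 56 assigned in PfamA, 1 unassigned in PfamA
PF00246: 9 assigned in PfamA, 8 unassigned in PfamA
PF04389: 10 assigned in PfamA, 7** unassigned in PfamA
PF00883: 2 assigned in PfamA, 0 unassigned in PfamA
PF05343: 5 assigned in PfamA, 1 unassigned in PfamA
PF05450: 0 assigned in PfamA, 7** unassigned in PfamA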

Figure: bar chart of the number of targets (axis 0 to 140) at each stage: Selected, Cloned, Expressed, Purified, Crystallized, Diffracted, Solved, PDB Deposit.

VI. Phylogenetic tree and structure tree

http://www.phylogeny.fr, fatcat.burnham.org/POSA

Sequences with >30% identity within a particular Pfam also cluster together in structure space.

2QVP.pdb (HP10645A), 2.0 Å, R/Rfree = 16.1/21.3%. Unknown function, PF04952. Structure suggests the target may be closer in homology to PF00246 proteins.

VIII. Proteins with <30% sequence identity within the same Pfam

PF01546: 2RB7, 2QYV (green). Common core ~250 aa, RMSD ~3.0 Å

PF04952: 2QJ8, 3B2Y (cyan). Common core ~190 aa, RMSD ~3.0 Å

Larger rearrangements and extensions of secondary structural elements. Inserts and novel features more common.

* Pfam assigned based on sequence homology detected with FFAS (http://ffas.ljcrf.edu/ffas-cgi/cgi/ffas.pl). There are 3 targets not assigned by PfamA or FFAS.

** 7 targets indicated show significant FFAS match to both PF04389 and PF05450, possibly distant bacterial homologs to the eukaryotic nicastrin family.

Distribution of selected targets across Pfam families

Targets assigned in PfamA

Targets unassigned in PfamA *
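The per-Pfam target counts and the two footnotes can be reconciled with the total of 137 selected targets. The sketch below assumes one reading of the footnotes: the 7 targets marked ** are the same targets listed under both PF04389 and PF05450, so they are counted once, and the 3 targets with no PfamA or FFAS assignment are added separately. The variable names are illustrative only.

```python
# (assigned in PfamA, unassigned in PfamA but matched by FFAS), per Pfam.
targets_by_pfam = {
    "PF04952": (32, 3),
    "PF02127": (0, 0),
    "PF01546": (56, 1),
    "PF00246": (9, 8),
    "PF04389": (10, 7),  # 7 marked ** in the figure
    "PF00883": (2, 0),
    "PF05343": (5, 1),
    "PF05450": (0, 7),   # the same 7 ** targets (assumed, per the footnote)
}

assigned = sum(a for a, _ in targets_by_pfam.values())
ffas_listed = sum(u for _, u in targets_by_pfam.values())

# Assumption: the ** targets appear under two Pfams, so subtract one copy.
double_listed = 7
# Footnote *: targets assigned by neither PfamA nor FFAS.
no_assignment = 3

total_targets = assigned + (ffas_listed - double_listed) + no_assignment
print(assigned, ffas_listed - double_listed, no_assignment, total_targets)
# 114 20 3 137
```

Under this reading the counts add up to the 137 targets reported for the project.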

Current status of 137 targets

All targets selected in March 2007

3B2Y.pdb (HP10645E), 1.74 Å, R/Rfree = 17.45/21.51%. Unknown function, PF04952, Ni2+ bound. Structure suggests the target may be closer in homology to PF00246 proteins.

2QJ8.pdb (HP10622H), 2.0 Å, R/Rfree = 20.7/25.4%. Unknown function, PF04952. Homolog involved in Canavan’s disease.

UCSD & Burnham (Bioinformatics Core)

John Wooley, Adam Godzik, Lukasz Jaroszewski, Slawomir Grzechnik, Lian Duan, Sri Krishna Subramanian, Natasha Sefcovic, Piotr Kozbial, Andrew Morse, Prasad Burra, Tamara Astakhova, Josie Alaoen, Cindy Cook, Dana Weekes

TSRI (NMR Core)

Kurt Wüthrich, Reto Horst, Maggie Johnson, Amaranth Chatterjee, Michael Geralt, Wojtek Augustyniak, Pedro Serrano, Bill Pedrini, William Placzek

Stanford/SSRL (Structure Determination Core)

Keith Hodgson, Ashley Deacon, Mitchell Miller, Debanu Das, Hsiu-Ju (Jessica) Chiu, Kevin Jin, Christopher Rife, Qingping Xu, Silvya Oommachen, Scott Talafuse, Henry van den Bedem, Ronald Reyes, Christine Trame

Scientific Advisory Board

Sir Tom Blundell, Univ. Cambridge
Robert Stroud, Center for Structure of Membrane Proteins and Membrane Protein Expression Center, UC San Francisco
Homme Hellinga, Duke University Medical Center
James Naismith, The Scottish Structural Proteomics Facility, Univ. St. Andrews
James Paulson, Consortium for Functional Glycomics, The Scripps Research Institute
Soichi Wakatsuki, Photon Factory, KEK, Japan
Todd Yeates, UCLA-DOE Inst. for Genomics and Proteomics
James Wells, UC San Francisco

The JCSG is supported by the NIH Protein Structure Initiative (PSI) Grant U54 GM074898 from NIGMS (www.nigms.nih.gov). Portions of this research were carried out at the Stanford Synchrotron Radiation Laboratory (SSRL). The SSRL is a national user facility operated by Stanford University on behalf of the U.S. Department of Energy, Office of Basic Energy Sciences. The SSRL Structural Molecular Biology Program is supported by the Department of Energy, Office of Biological and Environmental Research, and by the NIH.

GNF & TSRI (Crystallomics Core)

Scott Lesley, Mark Knuth, Heath Klock, Dennis Carlton, Thomas Clayton, Kevin D. Murphy, Marc Deller, Daniel McMullan, Christina Trout, Polat Abdubek, Claire Acosta, Linda M. Columbus, Julie Feuerhelm, Joanna C. Hale, Thamara Janaratne, Hope Johnson, Linda Okach, Edward Nigoghossian, Sebastian Sudek, Aprilfawn White, Bernhard Geierstanger, Glen Spraggon, Ylva Elias, Sanjay Agarwalla, Charlene Cho, Bi-Ying Yeh, Anna Grzechnik, Jessica Canseco, Mimmi Brown

TSRI (Admin Core)

Ian Wilson, Marc Elsliger, Gye Won Han, David Marciano, Henry Tien, Xiaoping Dai, Lisa van Veen

Annual meeting with SAB 2007

I. Introduction

As part of its mission to increase structural coverage of protein families, JCSG is targeting proteins from the large CATH homologous superfamily 3.40.630.10 of zinc peptidases, which belong to the phosphorylase/hydrolase-like fold in SCOP and comprise proteins from several Pfam families (the peptidase_MH clan).

Hidden Markov Models from the CATH database were used to identify sequences in the JCSG genome pool. PSI-BLAST searches seeded with sequences of these CATH family members were used to find additional proteins. These two sets contained 226 unique targets. After removing targets with more than 30% sequence identity to any PDB structure or to any crystallized target from a structural genomics center, 161 targets remained. Further clustering at 90% sequence identity (to avoid nearly identical sequences) yielded a set of 137 targets.
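The two filtering steps described above (removing targets already covered at the 30% level, then collapsing near-identical sequences at the 90% level) can be sketched as follows. This is only an illustration under stated assumptions: the function name, the input dictionaries and the toy identity values are hypothetical, and the greedy clustering shown here stands in for whatever clustering procedure was actually used, which the poster does not specify.

```python
def select_targets(candidates, max_identity_to_known, pairwise_identity,
                   known_cutoff=30.0, redundancy_cutoff=90.0):
    """Filter candidates, then greedily pick one representative per cluster.

    candidates: target names in priority order (hypothetical input).
    max_identity_to_known: name -> highest % identity to any PDB structure
        or crystallized SG target (assumed to be precomputed).
    pairwise_identity: frozenset({a, b}) -> % identity between two candidates;
        missing pairs are treated as unrelated (0%).
    """
    # Step 1: drop targets with more than 30% identity to a known structure.
    novel = [t for t in candidates if max_identity_to_known[t] <= known_cutoff]

    # Step 2: greedy clustering at 90%; keep a target only if it is not
    # nearly identical to a representative kept earlier.
    representatives = []
    for t in novel:
        redundant = any(
            pairwise_identity.get(frozenset((t, r)), 0.0) >= redundancy_cutoff
            for r in representatives
        )
        if not redundant:
            representatives.append(t)
    return representatives


# Toy data, invented for illustration only.
candidates = ["A", "B", "C", "D"]
max_identity_to_known = {"A": 12.0, "B": 45.0, "C": 25.0, "D": 28.0}
pairwise_identity = {
    frozenset(("A", "C")): 95.0,  # C is nearly identical to A
    frozenset(("A", "D")): 40.0,
}

selected = select_targets(candidates, max_identity_to_known, pairwise_identity)
print(selected)  # ['A', 'D']
```

In the toy run, B is dropped because it is already covered by a known structure, and C is dropped as a near-duplicate of A, which mirrors the 226 to 161 to 137 reduction in spirit.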

Prior to commencing work on these proteins in March 2007, there were ~40 unique structures from these Pfams from global SG and non-SG efforts. We have contributed 6 new structures and 7 other targets have been crystallized.

We present our progress towards complete structural coverage of this family, highlighting common and variant structural features that support different molecular and cellular roles, focusing on active site residues, ligand binding, protein size and oligomerization state. This analysis may provide insights into structural themes that dictate protein function and also allows modeling of protein structures related by sequence. Our structures serve as a nucleation point for the design of further structure-based experiments to probe the biochemical and biomedical roles of these proteins.

IX. Suggestion of PfamA assignment based on structure

HP10645A (2QVP) and HP10645E (3B2Y) are assigned to PF04952 in PfamB. However, structural comparisons of only the CATH domain show a stronger similarity to a PF00246 protein (1QMU, left) than to a PF04952 protein (2QJ8, center), and this is also supported by structure and phylogenetic trees and by FFAS. Also, like 1QMU, HP10645A/E lack an ~70 amino acid insertion that forms a “C-terminal domain” (left, black circle), which is present in PF04952 proteins and is important for biochemical function. These two pieces of evidence support assigning HP10645A/E to PF00246 in PfamA. Alternatively, HP10645A/E could be novel members of PF04952, although sequence and structure suggest PF00246.

Common core of 226 aa, RMSD 2.45 Å

Common core of 191 aa, RMSD 2.49 Å
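For reference, the RMSD values quoted with each common core follow the usual definition over the N aligned residue pairs of the superimposed structures. The poster does not state which atoms were compared; taking x_i and y_i as the coordinates of matched Cα atoms after optimal superposition is an assumption made here for illustration.

```latex
\mathrm{RMSD} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left\lVert \mathbf{x}_i - \mathbf{y}_i \right\rVert^{2}}
```

With this definition, the comparison to 1QMU covers more aligned residues (N = 226) at a similar RMSD (2.45 Å) than the comparison to 2QJ8 (N = 191, 2.49 Å), which is the sense in which the similarity to the PF00246 protein is stronger.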

X. Active site study may lead to structural basis of substrate specificity

2RB7 (cyan) and 1CG2, PF01546. Proteins in this Pfam with solved structures and >30% sequence identity with one another have functions that include succinyl-diaminopimelate desuccinylase activity; carboxypeptidase G2 activity, which cleaves the C-terminal glutamate moiety from folic acid and its analogues, such as methotrexate; N-acetyl-L-citrulline deacetylase activity; and peptidase T tripeptidase activity.

The active site in 1CG2 is H112, D141, E200, E176, H385. Based on this, the putative active site in 2RB7 is H72, D99, D100, E138, E139, D162.

Hydrolysis of methotrexate by 1CG2

Based on this information, it would now be possible to perform targeted biochemical assays to determine the substrate of 2RB7, to try to understand the structural basis for substrate selection and specificity, and to exploit this information for its therapeutic potential. For example, can 2RB7 hydrolyse methotrexate? Can it do so more efficiently? Can active site engineering based on structural information produce a more potent enzyme?

Active site in 2RB7

XI. Elucidation of a unique oligomeric form

The 2QYV (PepD, MEROPS M20.007, clan MH, subfamily C) monomer is very similar in structure to the 1LFW monomer (PepV, MEROPS M20.004, subfamily A). Both are dipeptidases belonging to PF01546. However, 1LFW is known to function as a monomer in which the molecular structure mimics that of a dimer seen in most other proteins in this Pfam. PepD from E. coli and from Prevotella albensis are seen to function as dimers. 2QYV represents the first crystal structure of a PepD; it is dimeric in the crystal (see panel above) as well as by size exclusion chromatography, and the structure shows the structural nature of the dimer. This novel structure serves as a starting point for further experiments to probe the effect of this unique dimer formation on protein function.

Superimposition of all 6 structures in PF04952: 1YW4, 1YW6, 2BCO, 2G9D, 2GU2 and 2QJ8