
UniProt and the PDB: Matching sequence and structure at the residue level

Paul J. Gane (1) and UniProt Consortium (1, 2, 3)

(1) EMBL-European Bioinformatics Institute, Cambridge, UK

(2) SIB Swiss Institute of Bioinformatics, Geneva, Switzerland

(3) Protein Information Resource, Georgetown University, Washington DC & University of Delaware, USA

Email: [email protected] URL: www.uniprot.org

Funding

UniProt is funded by the European Molecular Biology Laboratory, National Institutes of Health, European Union, Swiss Federal Government, British Heart Foundation and National Science Foundation.

UniProt

ID CRYAB_HUMAN Reviewed; 175 AA.

AC P02511; B0YIX0; O43416; Q9UC37; Q9UC38; Q9UC39; Q9UC40; Q9UC41;

DT 21-JUL-1986, integrated into UniProtKB/Swiss-Prot.

DE RecName: Full=Alpha-crystallin B chain;

DE AltName: Full=Alpha(B)-crystallin;

DE AltName: Full=Heat shock protein beta-5;

DE Short=HspB5;

DE AltName: Full=Renal carcinoma antigen NY-REN-27;

DE AltName: Full=Rosenthal fiber component;

GN Name=CRYAB; Synonyms=CRYA2;

OS Homo sapiens (Human).

OX NCBI_TaxID=9606;

DR PDB; 2KLR; NMR; -; A/B=1-175.

DR PDB; 2WJ7; X-ray; 2.63 A; A/B/C/D/E=67-157.

DR PDB; 2Y1Y; X-ray; 2.00 A; A=71-157.

DR PDB; 2Y1Z; X-ray; 2.50 A; A/B=67-157.

DR PDB; 2Y22; X-ray; 3.70 A; A/B/C/D/E/F=67-157.

DR PDB; 2YGD; EM; 9.40 A; A/B/C/D/E/F/G/H/I/J/K/L/M/N/O/P/Q/R/S/T/U/V/W/X=1-175.

DR PDB; 3L1G; X-ray; 3.32 A; A=68-162.

DR PDB; 3SGM; X-ray; 1.70 A; A/B/C/D=90-100.

DR PDB; 3SGN; X-ray; 2.81 A; A/B=90-100.

DR PDB; 3SGO; X-ray; 2.56 A; A=90-100.

DR PDB; 3SGP; X-ray; 1.40 A; A/B/C/D=92-100.

DR PDB; 3SGR; X-ray; 2.17 A; A/B/C/D/E/F=92-100.

DR PDB; 3SGS; X-ray; 1.70 A; A=95-100.

FT CHAIN 1 175 Alpha-crystallin B chain.

FT /FTId=PRO_0000125907.

FT METAL 104 104 Zinc.

FT METAL 111 111 Zinc.

FT METAL 119 119 Zinc.

FT SITE 48 48 Susceptible to oxidation.

FT SITE 60 60 Susceptible to oxidation.

FT SITE 68 68 Susceptible to oxidation.

FT MOD_RES 1 1 N-acetylmethionine (Probable).

FT MOD_RES 19 19 Phosphoserine.

FT MOD_RES 45 45 Phosphoserine.

FT MOD_RES 59 59 Phosphoserine.

FT MOD_RES 92 92 N6-acetyllysine; partial.

FT MOD_RES 166 166 N6-acetyllysine.

FT CARBOHYD 170 170 O-linked (GlcNAc) (By similarity).

FT VARIANT 41 41 S -> Y (in dbSNP:rs2234703).

FT /FTId=VAR_014607.

FT VARIANT 51 51 P -> L (in dbSNP:rs2234704).

FT /FTId=VAR_014608.

FT VARIANT 120 120 R -> G (in MFM2; decreased interactions

FT with wild-type CRYAA and CRYAB but

FT increased interactions with wild-type

FT CRYBB2 and CRYGC; dbSNP:rs28929489).

FT /FTId=VAR_007899.

FT CONFLICT 165 165 E -> K (in Ref. 4; AAC19161).

FT CONFLICT 175 175 K -> KKMPFLELHFLKQESFPTSE (in Ref. 4;

FT AAC19161).

SQ SEQUENCE 175 AA; 20159 MW; AE08BED46B7849CB CRC64;

MDIAIHHPWI RRPFFPFHSP SRLFDQFFGE HLLESDLFPT STSLSPFYLR PPSFLRAPSW

FDTGLSEMRL EKDRFSVNLD VKHFSPEELK VKVLGDVIEV HGKHEERQDE HGFISREFHR

KYRIPADVDP LTITSSLSSD GVLTVNGPRK QVSGPERTIP ITREEKPAVT AAPKK

HEADER CHAPERONE 22-MAY-09 2WJ7

TITLE HUMAN ALPHAB CRYSTALLIN

COMPND MOL_ID: 1;

COMPND 2 MOLECULE: ALPHA-CRYSTALLIN B CHAIN;

COMPND 3 CHAIN: A, B, C, D, E;

COMPND 4 FRAGMENT: ALPHA-CRYSTALLIN DOMAIN, RESIDUES 67-157;

COMPND 5 SYNONYM: ALPHA(B)-CRYSTALLIN, ROSENTHAL FIBER COMPONENT,

COMPND 6 HEAT SHOCK PROTEIN BETA-5, HSPB5, RENAL CARCINOMA ANTIGEN

COMPND 7 NY-REN-27, HUMAN ALPHAB;

COMPND 8 ENGINEERED: YES

SOURCE MOL_ID: 1;

SOURCE 2 ORGANISM_SCIENTIFIC: HOMO SAPIENS;

REMARK 999 THE SEQUENCE IS RESIDUES 67-157 PRECEDED BY A GAM TAG

DBREF 2WJ7 A 1 3 PDB 2WJ7 2WJ7 1 3

DBREF 2WJ7 A 4 94 UNP P02511 CRYAB_HUMAN 67 157

SEQRES 1 A 94 GLY ALA MET GLU MET ARG LEU GLU LYS ASP ARG PHE SER

SEQRES 2 A 94 VAL ASN LEU ASP VAL LYS HIS PHE SER PRO GLU GLU LEU

SEQRES 3 A 94 LYS VAL LYS VAL LEU GLY ASP VAL ILE GLU VAL HIS GLY

SEQRES 4 A 94 LYS HIS GLU GLU ARG GLN ASP GLU HIS GLY PHE ILE SER

SEQRES 5 A 94 ARG GLU PHE HIS ARG LYS TYR ARG ILE PRO ALA ASP VAL

SEQRES 6 A 94 ASP PRO LEU THR ILE THR SER SER LEU SER SER ASP GLY

SEQRES 7 A 94 VAL LEU THR VAL ASN GLY PRO ARG LYS GLN VAL SER GLY

SEQRES 8 A 94 PRO GLU ARG

ATOM 1 N MET A 3 23.981 -7.754 15.338 1.00 40.49 N

ATOM 2 CA MET A 3 23.218 -8.749 14.574 1.00119.83 C

ATOM 3 C MET A 3 24.149 -9.916 14.119 1.00 98.08 C

ATOM 4 O MET A 3 25.254 -10.069 14.670 1.00 80.95 O

ATOM 5 CB MET A 3 22.414 -8.092 13.403 1.00 45.41 C

ATOM 6 N GLU A 4 23.686 -10.723 13.149 1.00 63.18 N

ATOM 7 CA GLU A 4 24.418 -11.846 12.545 1.00 46.19 C

ATOM 8 C GLU A 4 25.920 -11.668 12.409 1.00 83.92 C

ATOM 9 O GLU A 4 26.425 -10.547 12.237 1.00 48.25 O

ATOM 10 CB GLU A 4 23.895 -12.164 11.147 1.00 37.69 C

ATOM 11 CG GLU A 4 24.844 -13.184 10.250 1.00 95.40 C

ATOM 12 CD GLU A 4 25.982 -12.580 9.254 1.00125.93 C

ATOM 13 OE1 GLU A 4 26.166 -11.344 9.039 1.00 77.38 O

ATOM 14 OE2 GLU A 4 26.731 -13.391 8.642 1.00 87.82 O

ATOM 15 N MET A 5 26.659 -12.773 12.451 1.00 36.27 N

ATOM 16 CA MET A 5 28.054 -12.609 12.120 1.00 59.07 C

ATOM 17 C MET A 5 28.601 -13.893 11.567 1.00 59.32 C

PDB

>sp|P02511|CRYAB_HUMAN Alpha-crystallin B chain OS=Homo sapiens GN=CRYAB

MDIAIHHPWIRRPFFPFHSPSRLFDQFFGEH

LLESDLFPTSTSLSPFYLRPPSFLRAPSW

FDTGLSEMRLEKDRFSVNLDVKHFSPEELKV

KVLGDVIEVHGKHEERQDEHGFISREFHR

KYRIPADVDPLTITSSLSSDGVLTVNGPRKQV

SGPERTIPITREEKPAVTAAPKK

>2WJ7:A|PDBID|CHAIN|SEQUENCE

GAMEMRLEKDRFSVNLDVKHFSPEE

LKVKVLGDVIEVHGKHEERQDEHGFI

SREFHRKYRIPADVDPLTITSSLSSDG

VLTVNGPRKQVSGPER

PDB

UniProtKB-PDB residue level mapping

Why Residue Level Mapping?

The UniProt Knowledgebase (UniProtKB), the worldwide protein sequence resource, contains over 32 million sequences (as of release 2013-03). Of these, 539,616 have been manually annotated. The ‘added value’ of this annotation implies a degree of certainty about the quality of the sequence, as well as a large amount of additional information collated from a wide variety of sources. One of these sources is the Protein Data Bank (PDB), which holds experimentally determined 3D structures revealing protein folds, active- and binding-site residues, and ligand, metal and cofactor binding, from which mechanisms of action can be deduced. This information is invaluable for drug design, homology modelling, assessing the impact of SNPs, mutation studies, novel protein design and more.

The PDB held 87,681 solved structures in March 2013 (fewer once redundancy is taken into account). Set against more than 32 million UniProtKB sequences this is a very small fraction (under 0.3%), and it would appear to have little impact on improving UniProtKB annotation. However, this structural and functional information can be extended to the many homologous and orthologous sequences related to these PDB entries.

The mappings are mostly generated automatically and updated weekly by a Java application called getafix. The number of new PDB structures deposited each week varies but is typically between 200 and 500, and each one must be matched to its specific UniProtKB entry. Problematic matches inevitably occur, and these are curated manually. Examples include chimaeras, N- and C-terminal tags, missing sections and domains, short sequences and peptides, antibodies and immunoglobulin folds, and modified and non-standard residues. Merged, demerged and deleted UniProtKB entries are a frequent source of error in automated mapping and also need manual inspection.
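To illustrate why tags and short peptides need attention, here is a minimal Python sketch. It is not getafix's algorithm, which is not described here. It places a PDB chain sequence on a candidate UniProtKB sequence using the longest exact common substring from the standard-library `difflib`, then flags leftover N- or C-terminal residues, short peptides and low coverage for manual curation. The function name and the thresholds (`min_length`, `min_coverage`) are hypothetical. An exact-substring search stops at any engineered point mutation, so a production pipeline would use a proper sequence alignment instead.

```python
from difflib import SequenceMatcher


def locate_chain(pdb_seq: str, unp_seq: str,
                 min_length: int = 20, min_coverage: float = 0.9) -> dict:
    """Place a PDB chain sequence on a UniProtKB sequence (hypothetical helper).

    Uses the longest exact common substring; thresholds are illustrative.
    """
    matcher = SequenceMatcher(None, pdb_seq, unp_seq, autojunk=False)
    m = matcher.find_longest_match(0, len(pdb_seq), 0, len(unp_seq))
    n_extra = pdb_seq[:m.a]              # residues before the match, e.g. a tag
    c_extra = pdb_seq[m.a + m.size:]     # residues after the match
    flags = []
    if len(pdb_seq) < min_length:
        flags.append("short peptide")
    if n_extra:
        flags.append(f"N-terminal extension {n_extra}")
    if c_extra:
        flags.append(f"C-terminal extension {c_extra}")
    if m.size / len(pdb_seq) < min_coverage:
        flags.append("low coverage")
    return {
        # 1-based inclusive ranges, as used by DBREF and DR PDB lines
        "pdb_range": (m.a + 1, m.a + m.size),
        "unp_range": (m.b + 1, m.b + m.size),
        "flags": flags,
    }


# Sequences from the FASTA panel above (P02511 and 2WJ7 chain A)
CRYAB_HUMAN = ("MDIAIHHPWIRRPFFPFHSPSRLFDQFFGEHLLESDLFPTSTSLSPFYLRPPSFLRAPSW"
               "FDTGLSEMRLEKDRFSVNLDVKHFSPEELKVKVLGDVIEVHGKHEERQDEHGFISREFHR"
               "KYRIPADVDPLTITSSLSSDGVLTVNGPRKQVSGPERTIPITREEKPAVTAAPKK")
PDB_2WJ7_A = ("GAMEMRLEKDRFSVNLDVKHFSPEE" "LKVKVLGDVIEVHGKHEERQDEHGFI"
              "SREFHRKYRIPADVDPLTITSSLSSDG" "VLTVNGPRKQVSGPER")

if __name__ == "__main__":
    print(locate_chain(PDB_2WJ7_A, CRYAB_HUMAN))
```

For 2WJ7 chain A this reports PDB residues 4-94 against UniProtKB residues 67-157 with the single flag `N-terminal extension GAM`, which agrees with the entry's DBREF lines.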

Mapped PDB text files contain one or more DBREF lines, which indicate which residues of the structure correspond to which residues of a UniProtKB sequence. Where an entry contains several different proteins, or a chimaera, one PDB entry will point to a number of UniProtKB identifiers.
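The DBREF mechanism can be sketched in Python under simplifying assumptions. The code below (all names hypothetical, not part of getafix) parses single-line DBREF records such as the two in entry 2WJ7 and expands each UniProtKB segment into a per-residue lookup. Because a merged UniProtKB entry keeps the absorbed entry's accession as a secondary accession on its AC line, the sketch also redirects secondary accessions to the primary one; demerged and deleted entries would still need manual handling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DbRef:
    """One DBREF record: a PDB residue range and its database counterpart."""
    pdb_id: str
    chain: str
    pdb_start: int
    pdb_end: int
    database: str   # "UNP" = UniProtKB; "PDB" = the entry itself (e.g. a tag)
    accession: str
    db_code: str
    db_start: int
    db_end: int


def parse_dbref(line):
    """Parse a single-line DBREF record by whitespace splitting.

    Simplification: insertion codes and the two-line DBREF1/DBREF2 form
    are not handled.
    """
    f = line.split()
    if len(f) != 10 or f[0] != "DBREF":
        raise ValueError(f"not a simple DBREF record: {line!r}")
    return DbRef(f[1], f[2], int(f[3]), int(f[4]), f[5], f[6], f[7],
                 int(f[8]), int(f[9]))


def accession_aliases(ac_lines):
    """Map every accession on one entry's AC lines to the primary (first) one."""
    accs = [a.strip() for line in ac_lines
            for a in line[2:].split(";") if a.strip()]
    return {a: accs[0] for a in accs}


def residue_map(refs, aliases=None, database="UNP"):
    """Map (chain, PDB residue number) -> (primary accession, UniProtKB residue)."""
    aliases = aliases or {}
    mapping = {}
    for r in refs:
        if r.database != database:
            continue  # skip self-references such as the GAM tag
        if r.pdb_end - r.pdb_start != r.db_end - r.db_start:
            raise ValueError("segment lengths differ; an alignment is needed")
        accession = aliases.get(r.accession, r.accession)
        offset = r.db_start - r.pdb_start
        for n in range(r.pdb_start, r.pdb_end + 1):
            mapping[(r.chain, n)] = (accession, n + offset)
    return mapping


DBREF_LINES = [
    "DBREF 2WJ7 A 1 3 PDB 2WJ7 2WJ7 1 3",
    "DBREF 2WJ7 A 4 94 UNP P02511 CRYAB_HUMAN 67 157",
]
AC_LINES = ["AC P02511; B0YIX0; O43416; Q9UC37; Q9UC38; Q9UC39; Q9UC40; Q9UC41;"]

if __name__ == "__main__":
    mapping = residue_map(map(parse_dbref, DBREF_LINES),
                          accession_aliases(AC_LINES))
    print(mapping[("A", 4)], mapping[("A", 94)], len(mapping))
```

Here chain A residues 4 and 94 resolve to P02511 residues 67 and 157, and the three tag residues are left unmapped. A fixed offset per segment is only valid when the segment has no insertions or deletions relative to UniProtKB, which is why the sketch refuses segments of unequal length.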

Conversely, UniProtKB entries cross-reference one or more PDB records in their DR PDB lines.
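In the other direction, DR PDB lines can be parsed to find which structures cover a residue of interest. This Python sketch assumes the line layout shown in the CRYAB_HUMAN record, and assumes that several chain groups with different ranges, if present, are comma-separated; the function names are illustrative. It is applied to five of the record's DR PDB lines to ask which structures include Arg120, the residue altered in the MFM2 variant R120G.

```python
def parse_dr_pdb(line: str) -> dict:
    """Parse a UniProtKB 'DR PDB' cross-reference (layout as in CRYAB_HUMAN)."""
    body = line[2:].strip() if line.startswith("DR") else line.strip()
    fields = [f.strip() for f in body.rstrip(".").split(";")]
    if len(fields) != 5 or fields[0] != "PDB":
        raise ValueError(f"not a DR PDB line: {line!r}")
    _, pdb_id, method, resolution, chain_field = fields
    chains = {}
    # Assumed: several chain groups are comma-separated, e.g. "A=1-100, B=5-90"
    for group in chain_field.split(","):
        ids, span = group.strip().split("=")
        start, end = (int(x) for x in span.split("-"))
        for chain in ids.split("/"):
            chains[chain] = (start, end)
    return {
        "pdb_id": pdb_id,
        "method": method,
        # "-" means no resolution (e.g. NMR); otherwise e.g. "2.63 A"
        "resolution": None if resolution == "-" else float(resolution.split()[0]),
        "chains": chains,
    }


def entries_covering(dr_lines, position):
    """PDB IDs with at least one chain spanning a UniProtKB residue position."""
    hits = []
    for line in dr_lines:
        ref = parse_dr_pdb(line)
        if any(s <= position <= e for s, e in ref["chains"].values()):
            hits.append(ref["pdb_id"])
    return hits


DR_LINES = [
    "DR PDB; 2KLR; NMR; -; A/B=1-175.",
    "DR PDB; 2WJ7; X-ray; 2.63 A; A/B/C/D/E=67-157.",
    "DR PDB; 3L1G; X-ray; 3.32 A; A=68-162.",
    "DR PDB; 3SGM; X-ray; 1.70 A; A/B/C/D=90-100.",
    "DR PDB; 3SGS; X-ray; 1.70 A; A=95-100.",
]

if __name__ == "__main__":
    # Which of these structures include Arg120 (MFM2 variant R120G)?
    print(entries_covering(DR_LINES, 120))
```

Of these five lines, 2KLR, 2WJ7 and 3L1G span residue 120, while the short peptide structures 3SGM and 3SGS do not. Coverage of a range is not the same as observed coordinates, so a residue-level check still needs the PDB file itself.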

A direct link from each amino acid in a UniProtKB sequence to a PDB entry may appear trivial but, as the simple example above shows, the N and C termini are absent from this crystallised construct (red). Note also that the SEQRES lines of the PDB entry indicate a three-residue N-terminal tag, ‘GAM’. The actual coordinates, however, start with the final methionine of the tag, a residue that is not part of the UniProtKB sequence. Likewise, the SEQRES lines state that the sequence ends in QVSGPER, whereas these residues are in fact missing from the 3D coordinates (purple).

Final mapped sequence (UniProtKB P02511 residues 67-157):

EMRLEKDRFSVNLDVKHFSPEELKV

KVLGDVIEVHGKHEERQDEHGFISRE

FHRKYRIPADVDPLTITSSLSSDGVLT

VNGPRKQVSGPER
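The comparison in this worked example, what SEQRES declares versus what the coordinates contain, can be sketched in Python. SEQRES residues are converted to one-letter codes (non-standard residues become X), residue numbers are read from the fixed columns of ATOM records, and positions without coordinates are reported as runs. Two assumptions are labelled in the code: the ATOM records are synthetic CA-only lines for residues 3-87, chosen to match the description above rather than read from the real 2WJ7 file (which may contain internal gaps), and residue numbers are taken to equal SEQRES positions, which is consistent with this chain's DBREF ranges but not true in general.

```python
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}
ONE_TO_THREE = {v: k for k, v in THREE_TO_ONE.items()}


def seqres_sequence(lines, chain):
    """One-letter sequence of a chain from SEQRES records (non-standard -> X)."""
    residues = []
    for line in lines:
        f = line.split()
        if len(f) > 4 and f[0] == "SEQRES" and f[2] == chain:
            residues.extend(f[4:])
    return "".join(THREE_TO_ONE.get(r, "X") for r in residues)


def observed_residues(atom_lines, chain):
    """Residue numbers with coordinates: chain ID in column 22, resSeq in 23-26."""
    return {int(l[22:26]) for l in atom_lines
            if l.startswith("ATOM  ") and l[21] == chain}


def missing_ranges(length, observed):
    """Runs of positions 1..length that have no coordinates, as (start, end)."""
    runs, start = [], None
    for pos in range(1, length + 2):
        absent = pos <= length and pos not in observed
        if absent and start is None:
            start = pos
        elif not absent and start is not None:
            runs.append((start, pos - 1))
            start = None
    return runs


def ca_record(serial, resname, chain, resseq):
    """Hypothetical helper: a synthetic CA-only ATOM line in fixed-column layout."""
    return (f"ATOM  {serial:5d}  CA  {resname} {chain}{resseq:4d}    "
            f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}{1.0:6.2f}{0.0:6.2f}")


SEQRES_2WJ7_A = [
    "SEQRES 1 A 94 GLY ALA MET GLU MET ARG LEU GLU LYS ASP ARG PHE SER",
    "SEQRES 2 A 94 VAL ASN LEU ASP VAL LYS HIS PHE SER PRO GLU GLU LEU",
    "SEQRES 3 A 94 LYS VAL LYS VAL LEU GLY ASP VAL ILE GLU VAL HIS GLY",
    "SEQRES 4 A 94 LYS HIS GLU GLU ARG GLN ASP GLU HIS GLY PHE ILE SER",
    "SEQRES 5 A 94 ARG GLU PHE HIS ARG LYS TYR ARG ILE PRO ALA ASP VAL",
    "SEQRES 6 A 94 ASP PRO LEU THR ILE THR SER SER LEU SER SER ASP GLY",
    "SEQRES 7 A 94 VAL LEU THR VAL ASN GLY PRO ARG LYS GLN VAL SER GLY",
    "SEQRES 8 A 94 PRO GLU ARG",
]

if __name__ == "__main__":
    seq = seqres_sequence(SEQRES_2WJ7_A, "A")
    # Assumption: residues 3-87 observed, as described in the text above
    atoms = [ca_record(i, ONE_TO_THREE[seq[n - 1]], "A", n)
             for i, n in enumerate(range(3, 88), 1)]
    for start, end in missing_ranges(len(seq), observed_residues(atoms, "A")):
        print(start, end, seq[start - 1:end])
```

Under these assumptions the script prints `1 2 GA` and `88 94 QVSGPER`: the first two tag residues and the C-terminal QVSGPER lack coordinates, while Met3 is observed but lies outside the UniProtKB segment.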

The Binding of Biological Molecules in Protein Structures

The binding of biologically important molecules in a PDB structure is captured and added automatically to unreviewed UniProtKB/TrEMBL entries, where it appears in the various FT lines. Reviewed, manually annotated UniProtKB/Swiss-Prot entries can be updated with similar information from the PDB using an in-house curator tool, also part of the getafix suite.

FT METAL 167 185 Manganese[1ATP].

FT NP_BIND 50 58 ATP.

FT NP_BIND 122 128 ATP.

FT NP_BIND 169 172 ATP.

FT ACT_SITE 167 167 Proton acceptor.

FT BINDING 73 73 ATP.
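How getafix derives FT lines is not described here. As a hypothetical Python sketch, assume a structure analysis has already produced the PDB residue numbers that contact a ligand; converting them to UniProtKB numbering with the DBREF offset and collapsing consecutive positions into ranges gives lines in the FT style shown above. The function and argument names are illustrative, and real flat files use fixed column widths rather than single spaces. The demo uses the 2WJ7 DBREF segment (PDB 4-94 to UniProtKB 67-157) and feeds in PDB residues 41, 48 and 56, which are His in both sequences and which that offset places on the zinc sites listed in the CRYAB_HUMAN record. It is a round-trip check of the numbering, not a claim about what the 2WJ7 coordinates contain.

```python
def contiguous_runs(positions):
    """Collapse residue numbers into sorted (start, end) runs."""
    runs = []
    for p in sorted(set(positions)):
        if runs and p == runs[-1][1] + 1:
            runs[-1][1] = p          # extend the current run
        else:
            runs.append([p, p])      # start a new run
    return [tuple(r) for r in runs]


def ft_lines(pdb_residues, pdb_start, pdb_end, unp_start, key, description):
    """FT-style lines in UniProtKB numbering for residues of one DBREF segment."""
    offset = unp_start - pdb_start
    # Residues outside the segment (e.g. tag residues) have no UniProtKB position
    inside = [r for r in pdb_residues if pdb_start <= r <= pdb_end]
    return [f"FT {key} {s + offset} {e + offset} {description}."
            for s, e in contiguous_runs(inside)]


if __name__ == "__main__":
    for line in ft_lines([41, 48, 56], 4, 94, 67, "METAL", "Zinc"):
        print(line)
```

This prints `FT METAL 104 104 Zinc.`, `FT METAL 111 111 Zinc.` and `FT METAL 119 119 Zinc.`, matching the METAL lines of the CRYAB_HUMAN record. A run of consecutive contact residues would instead produce a single ranged line, as in the NP_BIND examples.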

Collaborations and Applications

Maintaining up-to-date mappings relies on close collaboration with PDBe and good communication with the RCSB. The mapping data is integral to the SIFTS database (Structure Integration with Function, Taxonomy and Sequences), which provides mappings, at the residue level where applicable, to the IntEnz, GO, Pfam, InterPro, SCOP, CATH and PubMed databases. The Enzyme Portal is another resource that uses residue-level mappings, combining enzyme sequence and structure information with small-molecule substrates and drugs, biochemical pathways and function.

Figure: getafix workflow diagram (layout not recoverable). Labels: runPdbReleaseCheck; RunWeeklyPdbRelease.sh; get new and modified PDB files; PDBe repository for all PDB entries; getafix; FASTA and XML files; pdbReleaseMapping; Editor; DBREF.txt; PDBe mappings (PDBe cronjob); UniProtPdbXrefs.txt; makeUniProtPdbXrefs; monthly buildGetafixDB; weekly; email SIB (SwissProtAddLogFile.txt, SwissProtCuratedMoveLogFile.txt, SwissProtDeletedXrefLogfile.txt); ftp XREF files; obsolete etc. files; errors in PDB entries emailed to RCSB, PDBj or PDBe; SIFTS; GO annotation XML; CSA catalytic site annotations; FT lines to TrEMBL; UniProtKB/Swiss-Prot and TrEMBL.