10
Protein Structure Database Introduction Database of Comparative Protein Structure Models ModBase 生生生 g934251 生生生

Protein Structure Database Introduction Database of Comparative Protein Structure Models ModBase 生資所 g934251 詹濠先

Embed Size (px)

Citation preview

Page 1: Protein Structure Database Introduction Database of Comparative Protein Structure Models ModBase 生資所 g934251 詹濠先

Protein Structure Database Introduction

Database of Comparative Protein Structure Models

ModBase

生資所 g934251 詹濠先

Page 2: Protein Structure Database Introduction Database of Comparative Protein Structure Models ModBase 生資所 g934251 詹濠先

General Information• MODBASE is a queryable database of annotated protein structure

models. The models are derived by ModPipe, an automated modeling pipeline relying on the programs PSI-BLAST and MODELLER. The database also includes fold assignments and alignments on which the models were based. MODBASE contains theoretically calculated models, which may contain significant errors, not experimentally determined structures. Thus, special care is taken to assess the quality of the models.

More information about MODBASE can be found in the HTML or PDF version of ModBase: A database of annotated comparative protein structure models. R. Sánchez, U. Pieper, N. Mirkovic, P.I.W. deBakker, E. Wittenstein & A. Sali. Nucl. Acids Res. 28, 250-253, 2000.

Page 3: Protein Structure Database Introduction Database of Comparative Protein Structure Models ModBase 生資所 g934251 詹濠先

MODBASE core

• Models in MODBASE are calculated using MODPIPE, our entirely automated software pipeline for comparative modeling (16). MODPIPE can calculate comparative models for a large number of protein sequences, using many different template structures and sequence–structure alignments. MODPIPE relies on the various modules of MODELLER for its functionality and is streamlined for large-scale operation on a cluster of PCs using scripts written in PERL.

Page 4: Protein Structure Database Introduction Database of Comparative Protein Structure Models ModBase 生資所 g934251 詹濠先

• Models in MODBASE are organized into data sets. The largest data set contains models of all sequences in the Swiss-Prot/TrEMBL database that are detectably related to at least one known structure in the PDB.

• Currently, there are 1,262,629 models for domains in 659,495 of the 1,182,126 sequences in the Swiss-Prot/TrEMBL database, with an average length of 235 residues per model.

Human 32,985:sequences, Arabidopsis thaliana : 22 880 sequences, Drosophila melanogaster : 15,195 sequences

Escherichia coli : 9,691 sequences • Because the sequence databases contain sequence info

rmation of different strains and mutations, the number of unique sequences for a given organism exceeds the number of genes in the genome.

Page 5: Protein Structure Database Introduction Database of Comparative Protein Structure Models ModBase 生資所 g934251 詹濠先

Specialities

• Predicted interacting proteins Residue contacts between the two models are predicted based on a

match of both modeled sequences to different parts of a single PDB file.

The residue contacts in a hypothetical interface are scored by their

propensities to span an interface. False positive ratio : 25%

Page 6: Protein Structure Database Introduction Database of Comparative Protein Structure Models ModBase 生資所 g934251 詹濠先

Specialities

• Predicted Ligand Binding Sites ModBase contains a list of the binding sites of known structure for ∼

50 000 ligands found in the PDB.

Forty-four percent of the models in MODBASE have at least one predicted binding site for a small ligand.

• Application of MODBASE to Structural Genomics

NYSGXRC structures, PSI-BLAST E-value

Page 7: Protein Structure Database Introduction Database of Comparative Protein Structure Models ModBase 生資所 g934251 詹濠先

Access and Interface• MODBASE is queryable http://salilab.org/modbase PDB codes, Swiss-Prot/TrEMBL and GenPept

accession numbers, annotation keywords, model reliability, model size, target–template sequence identity, alignment significance, and sequence similarity to the modeled sequences as detected by BLAST

• The output of a search is displayed on pages with varying amounts of information about the modeled sequences, template structures, alignments and functional annotations. These tables also contain links to other sequence, structure and function annotation databases, such as PDB (4), GenBank (3), Swiss-Prot/TrEMBL (2), CATH (32), Pfam (33), ProDom (34), and UCSC Genome Browser (35). In addition, MODBASE models are directly accessible from the Swiss-Prot/TrEMBL sequence pages at http://www.expasy.org and UCSC Genome Browser at http://genome.ucsc.edu.

Enter PDB, Swiss-ProtGenPept…etc here.

Press “Search”!

1

2

Page 8: Protein Structure Database Introduction Database of Comparative Protein Structure Models ModBase 生資所 g934251 詹濠先

Search

Select any of your interests.

Select your purpose, e.g., “FindLigand binding site”

Page 9: Protein Structure Database Introduction Database of Comparative Protein Structure Models ModBase 生資所 g934251 詹濠先

Homolog Structure

Predictionof Ligand Binding Sites

Page 10: Protein Structure Database Introduction Database of Comparative Protein Structure Models ModBase 生資所 g934251 詹濠先

Reference• Ursula, Pieper et al. MODBASE, a database of annotated comparative protein

structure models, and associated resources. Nucleic Acids Res. 2004 January 1; 32(Database issue): D217–D222.