Modeling the Structures of Proteins and Macromolecular Assemblies

Preview:

DESCRIPTION

Modeling the Structures of Proteins and Macromolecular Assemblies. Depts. Of Biopharmaceutical Sciences and Pharmaceutical Chemistry California Institute for Quantitative Biomedical Research University of California at San Francisco. Andrej Š ali http://salilab.org/. 3/25/03. - PowerPoint PPT Presentation

Citation preview

Modeling the Structures of Proteins andMacromolecular Assemblies

Depts. Of Biopharmaceutical Sciences and Pharmaceutical ChemistryCalifornia Institute for Quantitative Biomedical Research

University of California at San Francisco

Andrej Šali

http://salilab.org/

3/25/03

1. Yeast and E. coli ribosomes: Electron microscopy, comparative modeling, and structural genomics.

2. Yeast Nuclear Pore Complex: Low-resolution modeling of large assemblies; bridging the gaps between structural biology, proteomics, and system biology.

3. Comments.

4/6/03

S. cerevisiae ribosome

C. Spahn, R. Beckmann, N. Eswar, P. Penczek, A. Sali, G. Blobel, J. Frank. Cell 107, 361-372, 2001.

Fitting of comparative models into 15Å cryo- electron density map.

43 proteins could be modeled on 20-56% seq.id. to a known structure.

The modeled fraction of the proteins ranges from 34-99%.

3/25/03

E. coli ribosomeH.Gao, J.Sengupta, M.Valle, A. Korostelev, N.Eswar, S.Stagg, P.Van Roey, R.Agrawal, S.Harvey, A.Sali, M.Chapman, and J.Frank. Cell, in press.

Upon EF-G binding, the

ribosome becomes less

compact. In contrast to

mRNA, many protein

contacts undergo large

conformational changes,

suggesting ribosomal

proteins facilitate the

dynamics of translation.

4/6/03

STARTSTART

Get profile for sequence (NR)Get profile for sequence (NR)

Scan sequence profile against representative PDB chains

Scan sequence profile against representative PDB chains

Scan PDB chain profiles against sequence

Scan PDB chain profiles against sequence

PS

I-B

LA

ST

MODPIPE: Large-Scale Comparative Protein Structure Modeling

Select templates using permissive E-value

cutoff

Select templates using permissive E-value

cutoff

1

Expand match to cover complete

domains

Expand match to cover complete

domains

1

Build model for target segment by satisfaction of spatial

restraints

Build model for target segment by satisfaction of spatial

restraints

Evaluate modelEvaluate model

Align matched parts of sequence and structure

Align matched parts of sequence and structure

MO

DE

LL

ER

R. Sánchez & A. Šali, Proc. Natl. Acad. Sci. USA 95, 13597, 1998.N. Eswar, M. Marti-Renom, M.S. Madhusudhan, B. John, A. Fiser, R. Sánchez, F. Melo, N. Mirkovic, A. Šali.

Fo

r ea

ch t

arg

et s

equ

ence

Fo

r ea

ch t

emp

late

str

uct

ure

3/25/03

ENDEND

http://salilab.org/modbasePieper et al., Nucl. Acids Res. 2002.

3/25/03

Comparative modeling of the TrEMBL database

Unique sequences processed: 733,239

Sequences with fold assignments or models: 415,937 (57%)

4/03/02 ~4 weeks on 500 Pentium III CPUs

70% of models based on <30% sequence identity to template.

On average, only a domain per protein is modeled(an “average” protein has 2.5 domains of 175 aa).

3/25/03

Model AccuracyMarti-Renom et al. Annu.Rev.Biophys.Biomol.Struct. 29, 291-325, 2000.

MEDIUM ACCURACY LOW ACCURACYHIGH ACCURACY

% Sequence Identity

0 10 20 30 40 50 60 70 80 90 100

% S

tru

ctu

ral O

verl

ap

10

20

30

40

50

60

70

80

90

100

Target-ModelTarget-Template

NM23Seq id 77%

CRABPSeq id 41%

EDNSeq id 33%

X-RAY / MODEL

SidechainsCore backboneLoops

C equiv 147/148RMSD 0.41Å

SidechainsCore backboneLoopsAlignment

C equiv 122/137RMSD 1.34Å

SidechainsCore backboneLoopsAlignmentFold assignment

C equiv 90/134RMSD 1.17Å

4/6/03

Future directions

Make sure we have building blocks (structural genomics).

Develop methods for simultaneous fitting of proteins into

EM density and conformational modeling (induced fit,

comparative modeling, ab initio).

Need large computing (eg, cluster of hundreds of nodes

with >1GB memories).

4/6/03

Rockefeller University, New York

*UCSF

F. Alber*, T. Suprapto, J. Kipper, W. Zhang, L. Veenhoff, S. Dokudovskaya, M. Rout, B. Chait, A. Šali*

Modeling of the yeast nuclear pore complex by satisfaction of spatial restraints (MODELLER)

4/6/03

Modeling macromolecular assemblies by satisfaction of spatial restraints

1) Representation of a system.

2) Scoring function (spatial restraints).3) Optimization.

Sali, Ernst, Glaeser, Baumeister. From words to literature in structural proteomics. Nature 422, 216-225, 2003.3/25/03

There is nothing but points and restraints on them.

Modeling of NPC

Stochiometry.

Parse proteins into domains.

Protein and “sub-complex’ shapes from Stokes radii.

Excluded volume of proteins.

Symmetry of NPC (EM).

Radial and axial localization of proteins (IEM).

Protein-protein proximity (immuno-purification).

Binary protein-protein contacts from “overlay” experiments.

Modeling in the context of the nuclear envelope.

Comparative models for some domains.

Structural genomics of nucleoporins.4/6/03

spokehalf-spoke

cytosolic side

nuclear side

Schematic structure of yeast NPC

nuclear membrane

TOP VIEW SIDE VIEW

half-spoke contains~30 nucleoporin proteins (NUPs).

~480 NUPs in NPC.

C.W. Akey, M. Rout

3/25/03

[sum of residues]

com

ple

x d

imen

sio

n f

(res

) [A

]

pulloutN

i

resii

resP

resP

ABP nSNwithNftoleranceD

1

;

Protein-protein “contacts”

For each protein pair within a half-spoke, the upper bound on the center-center distance is the estimated maximal complex diameter:

Complications:- multiple protein copies in half-spoke - inter-spoke interactions.

3/25/03

Successful optimization11/18/023/25/03

Scan of figureNUP120NUP84

NUP85

NUP145C

SEH1

Cdc31

Experiment

Rappsilber J, Siniossoglou S, Hurt EC, Mann M. Anal Chem 2000 Jan 15;72(2):267-75.

Model

NUP84-complex

4/6/03

Structural proteomics aims to characterize structures of most macromolecular complexes, in space and time.

On average, a domain may interact with a few other domains. The function of a complex is determined by its structure and dynamics.

There are too many complexes to be determined directly by high-resolution experimental structure determination.

Thus, just like in structural genomics, an efficient combination of experiment and computation is required.

4/6/03

Structural Proteomics versus Structural Genomics

Potential targets not clear clear

Target selection not clear clear

Scope not clear clear

Structure determination hybrid X-ray or NMR

Functional Annotation essential not a major

focus

There are additional sciences and technologies in structural proteomics, relative to structural genomics.

4/6/03

Structural Genomics

Characterize most protein sequences based on related known structures.

There are ~16,000 30% seq id families (90%)(Vitkup et al. Nat. Struct. Biol. 8, 559, 2001)

Sali. Nat. Struct. Biol. 5, 1029, 1998.Sali et al. Nat. Struct. Biol., 7, 986, 2000.Sali. Nat. Struct. Biol. 7, 484, 2001.Baker & Sali. Science 294, 93, 2001.

The number of “families” is much smaller than the number of proteins.

Any one of the members of a family is fine.

Characterize most protein sequences based on related known structures:

3/25/03

Target selection for structural proteomics(scope)

Comprehensive coverage.

What targets:

Stable assemblies? Transient complexes? Pairs of domains?

Can they be organized into groups?

How many such groups are there?

4/6/03

Target selection for structural proteomics?

• There are ~3,000 folds containing ~90% of all sequences.

• A target: Binary domain-domain interface.

• There may be a finite number of domain-domain interfaces, largely defined by the domain fold types.

• Given pairwise interactions, a large assembly may be reconstructed.

4/6/03

Protein and assembly structure by experiment and computation

3/25/3Sali, Ernst, Glaeser, Baumeister. From words to literature in structural proteomics. Nature 422, 216-225, 2003.

3/25/03

Modeling macromolecular assemblies by satisfaction of spatial restraints

1) Representation of a system.

2) Scoring function (spatial restraints).3) Optimization.

Sali, Ernst, Glaeser, Baumeister. From words to literature in structural proteomics. Nature 422, 216-225, 2003.3/25/03

There is nothing but points and restraints on them.

Future directions

Development of general/flexible/hierarchical representation of assemblies.

Quantifying spatial information from experiments.

Optimization of structure.

Toy models.

Space, time.

4/6/03

Role of NIH

Structural proteomics is timely and feasible.

Humongous integration of concepts, sciences, methods, tools, people.

Thus, need research centers, but also “R01” research.

Significant computing.

Bridging the gaps between structural biology, proteomics, and system biology.

4/6/03

Recommended