25
Modeling the Structures of Proteins and Macromolecular Assemblies Depts. Of Biopharmaceutical Sciences and Pharmaceutical Chemistry California Institute for Quantitative Biomedical Research University of California at San Francisco Andrej Šali http:// salilab.org/ 3/25/03

Modeling the Structures of Proteins and Macromolecular Assemblies

  • Upload
    denver

  • View
    41

  • Download
    0

Embed Size (px)

DESCRIPTION

Modeling the Structures of Proteins and Macromolecular Assemblies. Depts. Of Biopharmaceutical Sciences and Pharmaceutical Chemistry California Institute for Quantitative Biomedical Research University of California at San Francisco. Andrej Š ali http://salilab.org/. 3/25/03. - PowerPoint PPT Presentation

Citation preview

Page 1: Modeling the Structures of Proteins and Macromolecular Assemblies

Modeling the Structures of Proteins andMacromolecular Assemblies

Depts. Of Biopharmaceutical Sciences and Pharmaceutical ChemistryCalifornia Institute for Quantitative Biomedical Research

University of California at San Francisco

Andrej Šali

http://salilab.org/

3/25/03

Page 2: Modeling the Structures of Proteins and Macromolecular Assemblies

1. Yeast and E. coli ribosomes: Electron microscopy, comparative modeling, and structural genomics.

2. Yeast Nuclear Pore Complex: Low-resolution modeling of large assemblies; bridging the gaps between structural biology, proteomics, and system biology.

3. Comments.

4/6/03

Page 3: Modeling the Structures of Proteins and Macromolecular Assemblies

S. cerevisiae ribosome

C. Spahn, R. Beckmann, N. Eswar, P. Penczek, A. Sali, G. Blobel, J. Frank. Cell 107, 361-372, 2001.

Fitting of comparative models into 15Å cryo- electron density map.

43 proteins could be modeled on 20-56% seq.id. to a known structure.

The modeled fraction of the proteins ranges from 34-99%.

3/25/03

Page 4: Modeling the Structures of Proteins and Macromolecular Assemblies

E. coli ribosomeH.Gao, J.Sengupta, M.Valle, A. Korostelev, N.Eswar, S.Stagg, P.Van Roey, R.Agrawal, S.Harvey, A.Sali, M.Chapman, and J.Frank. Cell, in press.

Upon EF-G binding, the

ribosome becomes less

compact. In contrast to

mRNA, many protein

contacts undergo large

conformational changes,

suggesting ribosomal

proteins facilitate the

dynamics of translation.

4/6/03

Page 5: Modeling the Structures of Proteins and Macromolecular Assemblies

STARTSTART

Get profile for sequence (NR)Get profile for sequence (NR)

Scan sequence profile against representative PDB chains

Scan sequence profile against representative PDB chains

Scan PDB chain profiles against sequence

Scan PDB chain profiles against sequence

PS

I-B

LA

ST

MODPIPE: Large-Scale Comparative Protein Structure Modeling

Select templates using permissive E-value

cutoff

Select templates using permissive E-value

cutoff

1

Expand match to cover complete

domains

Expand match to cover complete

domains

1

Build model for target segment by satisfaction of spatial

restraints

Build model for target segment by satisfaction of spatial

restraints

Evaluate modelEvaluate model

Align matched parts of sequence and structure

Align matched parts of sequence and structure

MO

DE

LL

ER

R. Sánchez & A. Šali, Proc. Natl. Acad. Sci. USA 95, 13597, 1998.N. Eswar, M. Marti-Renom, M.S. Madhusudhan, B. John, A. Fiser, R. Sánchez, F. Melo, N. Mirkovic, A. Šali.

Fo

r ea

ch t

arg

et s

equ

ence

Fo

r ea

ch t

emp

late

str

uct

ure

3/25/03

ENDEND

Page 6: Modeling the Structures of Proteins and Macromolecular Assemblies

http://salilab.org/modbasePieper et al., Nucl. Acids Res. 2002.

3/25/03

Page 7: Modeling the Structures of Proteins and Macromolecular Assemblies

Comparative modeling of the TrEMBL database

Unique sequences processed: 733,239

Sequences with fold assignments or models: 415,937 (57%)

4/03/02 ~4 weeks on 500 Pentium III CPUs

70% of models based on <30% sequence identity to template.

On average, only a domain per protein is modeled(an “average” protein has 2.5 domains of 175 aa).

3/25/03

Page 8: Modeling the Structures of Proteins and Macromolecular Assemblies

Model AccuracyMarti-Renom et al. Annu.Rev.Biophys.Biomol.Struct. 29, 291-325, 2000.

MEDIUM ACCURACY LOW ACCURACYHIGH ACCURACY

% Sequence Identity

0 10 20 30 40 50 60 70 80 90 100

% S

tru

ctu

ral O

verl

ap

10

20

30

40

50

60

70

80

90

100

Target-ModelTarget-Template

NM23Seq id 77%

CRABPSeq id 41%

EDNSeq id 33%

X-RAY / MODEL

SidechainsCore backboneLoops

C equiv 147/148RMSD 0.41Å

SidechainsCore backboneLoopsAlignment

C equiv 122/137RMSD 1.34Å

SidechainsCore backboneLoopsAlignmentFold assignment

C equiv 90/134RMSD 1.17Å

4/6/03

Page 9: Modeling the Structures of Proteins and Macromolecular Assemblies

Future directions

Make sure we have building blocks (structural genomics).

Develop methods for simultaneous fitting of proteins into

EM density and conformational modeling (induced fit,

comparative modeling, ab initio).

Need large computing (eg, cluster of hundreds of nodes

with >1GB memories).

4/6/03

Page 10: Modeling the Structures of Proteins and Macromolecular Assemblies

Rockefeller University, New York

*UCSF

F. Alber*, T. Suprapto, J. Kipper, W. Zhang, L. Veenhoff, S. Dokudovskaya, M. Rout, B. Chait, A. Šali*

Modeling of the yeast nuclear pore complex by satisfaction of spatial restraints (MODELLER)

4/6/03

Page 11: Modeling the Structures of Proteins and Macromolecular Assemblies

Modeling macromolecular assemblies by satisfaction of spatial restraints

1) Representation of a system.

2) Scoring function (spatial restraints).3) Optimization.

Sali, Ernst, Glaeser, Baumeister. From words to literature in structural proteomics. Nature 422, 216-225, 2003.3/25/03

There is nothing but points and restraints on them.

Page 12: Modeling the Structures of Proteins and Macromolecular Assemblies

Modeling of NPC

Stochiometry.

Parse proteins into domains.

Protein and “sub-complex’ shapes from Stokes radii.

Excluded volume of proteins.

Symmetry of NPC (EM).

Radial and axial localization of proteins (IEM).

Protein-protein proximity (immuno-purification).

Binary protein-protein contacts from “overlay” experiments.

Modeling in the context of the nuclear envelope.

Comparative models for some domains.

Structural genomics of nucleoporins.4/6/03

Page 13: Modeling the Structures of Proteins and Macromolecular Assemblies

spokehalf-spoke

cytosolic side

nuclear side

Schematic structure of yeast NPC

nuclear membrane

TOP VIEW SIDE VIEW

half-spoke contains~30 nucleoporin proteins (NUPs).

~480 NUPs in NPC.

C.W. Akey, M. Rout

3/25/03

Page 14: Modeling the Structures of Proteins and Macromolecular Assemblies

[sum of residues]

com

ple

x d

imen

sio

n f

(res

) [A

]

pulloutN

i

resii

resP

resP

ABP nSNwithNftoleranceD

1

;

Protein-protein “contacts”

For each protein pair within a half-spoke, the upper bound on the center-center distance is the estimated maximal complex diameter:

Complications:- multiple protein copies in half-spoke - inter-spoke interactions.

3/25/03

Page 15: Modeling the Structures of Proteins and Macromolecular Assemblies

Successful optimization11/18/023/25/03

Page 16: Modeling the Structures of Proteins and Macromolecular Assemblies

Scan of figureNUP120NUP84

NUP85

NUP145C

SEH1

Cdc31

Experiment

Rappsilber J, Siniossoglou S, Hurt EC, Mann M. Anal Chem 2000 Jan 15;72(2):267-75.

Model

NUP84-complex

4/6/03

Page 17: Modeling the Structures of Proteins and Macromolecular Assemblies

Structural proteomics aims to characterize structures of most macromolecular complexes, in space and time.

On average, a domain may interact with a few other domains. The function of a complex is determined by its structure and dynamics.

There are too many complexes to be determined directly by high-resolution experimental structure determination.

Thus, just like in structural genomics, an efficient combination of experiment and computation is required.

4/6/03

Page 18: Modeling the Structures of Proteins and Macromolecular Assemblies

Structural Proteomics versus Structural Genomics

Potential targets not clear clear

Target selection not clear clear

Scope not clear clear

Structure determination hybrid X-ray or NMR

Functional Annotation essential not a major

focus

There are additional sciences and technologies in structural proteomics, relative to structural genomics.

4/6/03

Page 19: Modeling the Structures of Proteins and Macromolecular Assemblies

Structural Genomics

Characterize most protein sequences based on related known structures.

There are ~16,000 30% seq id families (90%)(Vitkup et al. Nat. Struct. Biol. 8, 559, 2001)

Sali. Nat. Struct. Biol. 5, 1029, 1998.Sali et al. Nat. Struct. Biol., 7, 986, 2000.Sali. Nat. Struct. Biol. 7, 484, 2001.Baker & Sali. Science 294, 93, 2001.

The number of “families” is much smaller than the number of proteins.

Any one of the members of a family is fine.

Characterize most protein sequences based on related known structures:

3/25/03

Page 20: Modeling the Structures of Proteins and Macromolecular Assemblies

Target selection for structural proteomics(scope)

Comprehensive coverage.

What targets:

Stable assemblies? Transient complexes? Pairs of domains?

Can they be organized into groups?

How many such groups are there?

4/6/03

Page 21: Modeling the Structures of Proteins and Macromolecular Assemblies

Target selection for structural proteomics?

• There are ~3,000 folds containing ~90% of all sequences.

• A target: Binary domain-domain interface.

• There may be a finite number of domain-domain interfaces, largely defined by the domain fold types.

• Given pairwise interactions, a large assembly may be reconstructed.

4/6/03

Page 22: Modeling the Structures of Proteins and Macromolecular Assemblies

Protein and assembly structure by experiment and computation

3/25/3Sali, Ernst, Glaeser, Baumeister. From words to literature in structural proteomics. Nature 422, 216-225, 2003.

3/25/03

Page 23: Modeling the Structures of Proteins and Macromolecular Assemblies

Modeling macromolecular assemblies by satisfaction of spatial restraints

1) Representation of a system.

2) Scoring function (spatial restraints).3) Optimization.

Sali, Ernst, Glaeser, Baumeister. From words to literature in structural proteomics. Nature 422, 216-225, 2003.3/25/03

There is nothing but points and restraints on them.

Page 24: Modeling the Structures of Proteins and Macromolecular Assemblies

Future directions

Development of general/flexible/hierarchical representation of assemblies.

Quantifying spatial information from experiments.

Optimization of structure.

Toy models.

Space, time.

4/6/03

Page 25: Modeling the Structures of Proteins and Macromolecular Assemblies

Role of NIH

Structural proteomics is timely and feasible.

Humongous integration of concepts, sciences, methods, tools, people.

Thus, need research centers, but also “R01” research.

Significant computing.

Bridging the gaps between structural biology, proteomics, and system biology.

4/6/03