10
Structure Review Protein Modeling: What Happened to the ‘‘Protein Structure Gap’’? Torsten Schwede 1,2, * 1 Biozentrum, University of Basel, Klingelbergstrasse 50-70, 4056 Basel, Switzerland 2 Computational Structural Biology, SIB Swiss Institute of Bioinformatics, Klingelbergstrasse 50-70, 4056 Basel, Switzerland *Correspondence: [email protected] http://dx.doi.org/10.1016/j.str.2013.08.007 Computational modeling of three-dimensional macromolecular structures and complexes from their sequence has been a long-standing vision in structural biology. Over the last 2 decades, a paradigm shift has occurred: starting from a large ‘‘structure knowledge gap’’ between the huge number of protein sequences and small number of known structures, today, some form of structural information, either exper- imental or template-based models, is available for the majority of amino acids encoded by common model organism genomes. With the scientific focus of interest moving toward larger macromolecular complexes and dynamic networks of interactions, the integration of computational modeling methods with low-resolu- tion experimental techniques allows the study of large and complex molecular machines. One of the open challenges for computational modeling and prediction techniques is to convey the underlying assumptions, as well as the expected accuracy and structural variability of a specific model, which is crucial to understand- ing its limitations. All macromolecular structures are, to some degree, models with a variable ratio between experimental data and computational prediction. Typically, the atomic coordinates of heavy atoms in very high-resolution crystal structures are overdetermined by the diffraction data, while methods with a lower ratio of para- meters to experimental observables increasingly rely on com- putational tools to construct structural models for the spatial interpretation of the data (e.g., nuclear magnetic resonance [NMR], electron microscopy [EM], small-angle X-ray scattering [SAXS], fluorescence resonance energy transfer [FRET]) (Kalinin et al., 2012; Read et al., 2011; Rieping et al., 2005; Trewhella et al., 2013). At the other end of the spectrum are methods for the de novo prediction of macromolecular structures, which aim to predict the native three-dimensional (3D) structure, typi- cally a domain of a protein, directly from its amino acid sequence without experimental data by using various ab initio or knowl- edge-based computational approaches (Baker and Sali, 2001; Das and Baker, 2008). With the focus of interest in structural biology moving toward larger macromolecular complexes and dynamic networks of interactions, integrative structure solution techniques that can combine experimental data from heterogeneous sources and are able to handle ambiguous or conflicting information are becoming essential. The combination of computational modeling with low-resolution experimental constraints has espe- cially proven powerful (Ward et al., 2013). In retrospect, the most famous 3D structure of a biological macromolecule could, in today’s terms, be considered as an ‘‘integrative low-resolution model’’: When Watson and Crick published the structure of the DNA double helix, their model was based on fiber diffraction data at low resolution and additional constraints about chemistry and stoichiometry. Although atomic high-resolution diffraction data only became available much later, this low-resolution model suggested ‘‘a possible copying mechanism for the genetic mate- rial’’ (Watson and Crick, 1953, p. 737) and has initiated a revolu- tion in molecular biology and biomedical research (Collins et al., 2003). Obviously, it is not the atomic resolution or precision of a model that determines its usefulness but the understanding which interpretations and conclusions can be supported by the model at hand. In contrast to the regular structure of the DNA double helix, the structural biology of proteins is much more complex, where each protein has its own unique 3D structure. Since small changes in the sequence of a protein can have strong effects on its biophysical properties, experimental determination of protein structures is a laborious and often unpredictable endeavor. The computational modeling of a protein’s structure has therefore attracted substantial interest in the field of bioinformatics to complement experimental structural biology efforts to characterize the protein universe (Baker and Sali, 2001; Levitt, 2009). In the following paragraphs, I provide an updated view on the ‘‘protein structure gap,’’ arguing that structure information for the majority of amino acids in common model organism proteomes can be provided by a combination of computational and exper- imental techniques. I highlight some applications of structure modeling and prediction techniques in various areas of life sciences and discuss limitations and challenges in communi- cating model information to potential users of models. The Protein Structure Gap Is Disappearing Advances in DNA sequencing techniques are giving rise to an unprecedented avalanche of new sequences (UniProt Con- sortium, 2013), and it is obvious that it will be impossible to deter- mine the structures of all proteins of interest experimentally with current techniques (Figure 1). With more sensitive next-gen- eration sequencing techniques becoming available, noncultivat- able organisms also come within reach, widening the protein Structure 21, September 3, 2013 ª2013 Elsevier Ltd All rights reserved 1531

Protein Modeling: What Happened to the ``Protein Structure ...organism genomes. With the scientific focus of interest moving toward larger macromolecular complexes and dynamic networks

  • Upload
    others

  • View
    0

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Protein Modeling: What Happened to the ``Protein Structure ...organism genomes. With the scientific focus of interest moving toward larger macromolecular complexes and dynamic networks

Structure

Review

Protein Modeling: What Happenedto the ‘‘Protein Structure Gap’’?

Torsten Schwede1,2,*1Biozentrum, University of Basel, Klingelbergstrasse 50-70, 4056 Basel, Switzerland2Computational Structural Biology, SIB Swiss Institute of Bioinformatics, Klingelbergstrasse 50-70, 4056 Basel, Switzerland*Correspondence: [email protected]://dx.doi.org/10.1016/j.str.2013.08.007

Computational modeling of three-dimensional macromolecular structures and complexes from theirsequence has been a long-standing vision in structural biology. Over the last 2 decades, a paradigm shifthas occurred: starting from a large ‘‘structure knowledge gap’’ between the huge number of proteinsequences and small number of known structures, today, some form of structural information, either exper-imental or template-based models, is available for the majority of amino acids encoded by common modelorganism genomes. With the scientific focus of interest moving toward larger macromolecular complexesand dynamic networks of interactions, the integration of computational modeling methods with low-resolu-tion experimental techniques allows the study of large and complex molecular machines. One of the openchallenges for computational modeling and prediction techniques is to convey the underlying assumptions,as well as the expected accuracy and structural variability of a specificmodel, which is crucial to understand-ing its limitations.

All macromolecular structures are, to some degree, models with

a variable ratio between experimental data and computational

prediction. Typically, the atomic coordinates of heavy atoms in

very high-resolution crystal structures are overdetermined by

the diffraction data, while methods with a lower ratio of para-

meters to experimental observables increasingly rely on com-

putational tools to construct structural models for the spatial

interpretation of the data (e.g., nuclear magnetic resonance

[NMR], electron microscopy [EM], small-angle X-ray scattering

[SAXS], fluorescence resonance energy transfer [FRET]) (Kalinin

et al., 2012; Read et al., 2011; Rieping et al., 2005; Trewhella

et al., 2013). At the other end of the spectrum are methods for

the de novo prediction of macromolecular structures, which

aim to predict the native three-dimensional (3D) structure, typi-

cally a domain of a protein, directly from its amino acid sequence

without experimental data by using various ab initio or knowl-

edge-based computational approaches (Baker and Sali, 2001;

Das and Baker, 2008).

With the focus of interest in structural biology moving toward

larger macromolecular complexes and dynamic networks of

interactions, integrative structure solution techniques that can

combine experimental data from heterogeneous sources and

are able to handle ambiguous or conflicting information

are becoming essential. The combination of computational

modelingwith low-resolution experimental constraints has espe-

cially proven powerful (Ward et al., 2013). In retrospect, the most

famous 3D structure of a biological macromolecule could, in

today’s terms, be considered as an ‘‘integrative low-resolution

model’’: When Watson and Crick published the structure of the

DNA double helix, their model was based on fiber diffraction

data at low resolution and additional constraints about chemistry

and stoichiometry. Although atomic high-resolution diffraction

data only became availablemuch later, this low-resolutionmodel

suggested ‘‘a possible copyingmechanism for the geneticmate-

Structure

rial’’ (Watson and Crick, 1953, p. 737) and has initiated a revolu-

tion in molecular biology and biomedical research (Collins et al.,

2003). Obviously, it is not the atomic resolution or precision of

a model that determines its usefulness but the understanding

which interpretations and conclusions can be supported by the

model at hand.

In contrast to the regular structure of the DNA double helix, the

structural biology of proteins is much more complex, where

each protein has its own unique 3D structure. Since small

changes in the sequence of a protein can have strong effects

on its biophysical properties, experimental determination of

protein structures is a laborious and often unpredictable

endeavor. The computational modeling of a protein’s structure

has therefore attracted substantial interest in the field of

bioinformatics to complement experimental structural biology

efforts to characterize the protein universe (Baker and Sali,

2001; Levitt, 2009).

In the following paragraphs, I provide an updated view on the

‘‘protein structure gap,’’ arguing that structure information for the

majority of amino acids in common model organism proteomes

can be provided by a combination of computational and exper-

imental techniques. I highlight some applications of structure

modeling and prediction techniques in various areas of life

sciences and discuss limitations and challenges in communi-

cating model information to potential users of models.

The Protein Structure Gap Is DisappearingAdvances in DNA sequencing techniques are giving rise to an

unprecedented avalanche of new sequences (UniProt Con-

sortium, 2013), and it is obvious that it will be impossible to deter-

mine the structures of all proteins of interest experimentally

with current techniques (Figure 1). With more sensitive next-gen-

eration sequencing techniques becoming available, noncultivat-

able organisms also come within reach, widening the protein

21, September 3, 2013 ª2013 Elsevier Ltd All rights reserved 1531

Page 2: Protein Modeling: What Happened to the ``Protein Structure ...organism genomes. With the scientific focus of interest moving toward larger macromolecular complexes and dynamic networks

Figure 1. Mind the GapThe number of entries in the SwissProt and trEMBL sequence databases(UniProt Consortium, 2013) and the PDB (Berman et al., 2007) are growingexponentially, while the protein structure gap between sequence and struc-tures is widening dramatically. Inset: growth of PDB holdings from 1972 to2013.

Table 1. Commonly Used Tools and Services for Protein

Structure Modeling and Prediction

Tool or Service Web Site

Protein Model

Portal

http://www.proteinmodelportal.org

(Arnold et al., 2009; Haas et al., 2013)

Model Archive http://modelarchive.org

HHpred http://toolkit.tuebingen.mpg.de/hhpred

(Hildebrand et al., 2009)

IMP http://www.salilab.org/imp (Russel et al., 2012;

Yang et al., 2012)

IntFOLD http://www.reading.ac.uk/bioinf/IntFOLD/

(Roche et al., 2011)

I-Tasser http://zhanglab.ccmb.med.umich.edu/

I-TASSER/ (Zhang, 2013)

ModBase http://salilab.org/modbase/

(Pieper et al., 2011)

Modeler/ModWeb http://salilab.org/modeller/

(Pieper et al., 2011; Yang et al., 2012)

Pcons.net http://pcons.net/ (Larsson et al., 2011)

PHYRE2 http://www.sbg.bio.ic.ac.uk/phyre2/

(Kelley and Sternberg, 2009)

Robetta http://robetta.bakerlab.org/

(Raman et al., 2009)

Rosetta https://www.rosettacommons.org

(Das and Baker, 2008)

SWISS-MODEL

Repository

http://swissmodel.expasy.org/repository

(Kiefer et al., 2009)

SWISS-MODEL

Workspace

http://swissmodel.expasy.org/workspace/

(Arnold et al., 2006; Bordoli and Schwede, 2012)

Structure

Review

structure gap even further, despite tremendous progress in auto-

mating experimental structure determination techniques.

Fortunately, homologous proteins that share detectable

sequence similarity have similar 3D structures, and their struc-

tural diversity is increasing with evolutionary distance, as out-

lined by Chothia and Lesk in their seminal paper, ‘‘The Relation

between the Divergence of Sequence and Structure in Proteins’’

(Chothia and Lesk, 1986). Based on this observation, methods

were developed 2 decades ago for the comparative modeling

(a.k.a. homology or template-based modeling) of protein struc-

tures, which allow extrapolation of the available experimental

structure information to as-yet uncharacterized protein se-

quences (Guex et al., 2009; Peitsch, 1995; Sanchez and Sali,

1998; Sutcliffe et al., 1987). Today, comparative modeling

techniques have matured into fully automated stable pipelines

that provide reliable 3Dmodels accessible also to nonspecialists

(Table 1).

The apparent complexity of the protein sequence universe can

be explained by multidomain architectures formed by combina-

tions of single domains characterized by approximately 15,000

sequence family profiles (Levitt, 2009). Thanks to efforts by the

experimental structural biology community andworldwide struc-

tural genomics efforts (Terwilliger, 2011), an increasing fraction

of protein families has at least one member with an experimental

structure in the Protein Data Bank (PDB) (Berman et al., 2007). At

the same time, sensitive and accurate profile HMM methods

have been developed and allow researchers to take advantage

of the available sequence information for detection of remote

template relationships (Remmert et al., 2012). As a result, a para-

digm shift has occurred during the last decade: starting from a

situation where the ‘‘structure knowledge gap’’ between the

number of protein sequences and small number of known struc-

tures has hampered the widespread use of structure-based

approaches in life science research, today, some form of struc-

tural information, either experimental or computational, is avail-

able for the majority of amino acids encoded by common model

organism genomes (Figure 2). The widespread availability of 3D

structure information enables rational structure-based ap-

1532 Structure 21, September 3, 2013 ª2013 Elsevier Ltd All rights r

proaches in a broad range of applications in life science research

(Schwede et al., 2009).

However, computational models often represent only fractions

of the full-length of a protein of interest, and one of the unre-

solved questions in template-basedmodeling is how to combine

information from multiple templates, e.g., different structural

domains, into larger complex assemblies. Current techniques

are not able to reliably predict the relative orientation of domains

of such multitemplate models. Also, comparative models still

resemble more closely the template than the target structure,

and refinement methods are not able to consistently refine

models closer to the target structure (MacCallum et al., 2011).

The development of reproducible and reliable methods for

refinement that consistently improve the accuracy of models

by shifting the coordinates closer to the native state is one of

the pressing challenges in the field.

The function of a protein almost always involves motions and

conformational changes, and a molecular understanding of its

mechanism requires a detailed description of the different func-

tional states the structure can explore dynamically. Typical

examples include allosteric conformational changes on binding

events, intermediate excited states in reaction cycles, trans-

port, and motion phenomena. Frequently, however, these

states are not directly observable experimentally at high reso-

lution but can only be characterized at low resolution, e.g., by

changes in chemical shifts, FRET, SAXS, or limited electron

density in X-ray crystallography (Hennig et al., 2013; Kalinin

eserved

Page 3: Protein Modeling: What Happened to the ``Protein Structure ...organism genomes. With the scientific focus of interest moving toward larger macromolecular complexes and dynamic networks

Figure 2. Structural Template Coverage of the Human ProteomeThe fraction of amino acids in the human proteome showing sequence simi-larity to proteins with known structures in the PDB is shown over time, wherecolors indicate levels of sequence identity as detected by PSI-BLAST (Altschulet al., 1997). The area shaded in red indicates the fraction of about 30% ofintrinsically unstructured residues estimated in the human proteome (Colaket al., 2013; Ward et al., 2004). Models built on templates sharing lowsequence identity <20% are often of poor quality due to evolutionary diver-gence between target and template structures, as well as limitations of themodeling and refinement methods (illustrated in Figure 3B). Prokaryotic pro-teomes have, in general, a higher structural coverage than eukaryotic ones(Guex et al., 2009; Zhang et al., 2009).

Structure

Review

et al., 2012; Ochi et al., 2012). Modeling and simulation will play

a central role in exploring these alternative conformations and

describing the dynamics of the transitions (Ma et al., 2011;

Nygaard et al., 2013; Weinkam et al., 2012).

Modeling Protein Complexes and InteractionsAlthough structural protein domains often reflect functional

modules (Lees et al., 2012), they are rarely found in isolation:

many proteins form intricate multidomain architectures, often

assemble to stable oligomeric quaternary states, and frequently

incorporate low-molecular-weight ligands and metal ions, e.g.,

as cofactors or structural components. Since molecular inter-

actions are not easily recognizable from the sequence of the pro-

tein, different structure-based prediction strategies have been

explored for modeling these interactions. Assuming that the

structures of the proteins participating in a complex are known,

docking programs aim to align them in an orientation favorable

for interaction (Janin, 2010). However, the details of the struc-

ture-affinity relationships are not yet fully understood, and for

complex systems, the estimates for the binding free energy are

often only approximate (Kastritis and Bonvin, 2013).

The fraction of protein heterocomplexes in the PDB is small

compared to the number of monomeric or homo-oligomeric

structures. Nevertheless, it seems that, for almost all known pro-

tein-protein interactions for which the individual components are

structurally characterized, structures of complexes can be iden-

tified in the PDB that can be used for template-based prediction

approaches (Kundrotas et al., 2012; Stein et al., 2011; Xu and

Dunbrack, 2011). In combination with homology modeling of

the target proteins, this opens the opportunity for the structural

prediction of protein-protein interactions on a genomewide scale

(Stein et al., 2011; Vakser, 2013). Although structure-based

methods for predicting protein-protein interactions might have

Structure

a rather high noise level, accuracy comparable to other high-

throughput methods can be achieved in combination with

orthogonal information (Aloy and Russell, 2006; Zhang et al.,

2012). The current state of the art in protein docking and predic-

tion of complexes is regularly assessed in the CAPRI blind pre-

diction experiment (Janin, 2010).

In contrast to structurally well-characterized stable protein

complexes, transient protein-protein interactions often involve

significant structural changes of the partners on binding. In

fact, a large fraction estimated around 30% of the proteome

encoded by higher eukaryotes is assumed to be highly flexible

or even intrinsically disordered and supposed to form a well-

defined 3-dimensional structure only on binding to a partner

molecule (Colak et al., 2013; Janin and Sternberg, 2013). These

intrinsically disordered protein (IDP) regions enable interactions

with many different proteins and are attributed to have a function

in tissue-specific rewiring of protein-protein interactions by spe-

cific modifications of IDP regions; for example, by posttransla-

tional modifications (PTMs) or alternative splicing (Buljan et al.,

2013). Sequence variations in protein-protein interfaces are

often associatedwith human diseases, as they have the potential

to disrupt regulatory networks in the cell (David et al., 2012; Vidal

et al., 2011; Wei et al., 2013). Prediction of these interactions and

structural modeling of mutations and PTMs remains an open

challenge of highest interest (Uversky and Dunker, 2013; Wass

et al., 2011).

Know Your Limits‘‘Essentially, all models are wrong, but some are useful’’ (Box

and Draper, 1987, p. 424). Different applications of macromolec-

ular structure information have different requirements with

respect to accuracy and resolution of a model. While atomistic

molecular modeling only works with highly accurate sets of

coordinates, models of lower resolution are still useful for the

rational-design site-directed mutagenesis experiments, epitope

mapping, or supporting experimental structure determination

(Schwede et al., 2009). In order to be able to decide if a model

or prediction is useful for a specific application, knowing its

expected accuracy and quality is essential. The assessment of

techniques for structure prediction in retrospective experiments

such as CASP (Moult et al., 2011), EVA (Koh et al., 2003), Life-

Bench (Rychlewski and Fischer, 2005), or CAMEO (Haas et al.,

2013) allowed comparison of different methods on the same

data set, thereby establishing the current state of the art and indi-

cating areas that require further development of improved

methods. However, the accuracy differences between the best

predictionmethods on the same protein target are typically small

in comparison to the differences between easy and difficult

protein targets (Figure 3). Therefore, reliable local error estimates

for the atomic coordinates predicted by a model are crucial for

judging its applicability for a specific question. Unfortunately,

only very few modeling methods today deliver reliable confi-

dence measures for their predictions (Kryshtafovych et al.,

2013; Mariani et al., 2011). This is a serious limitation of current

modeling techniques, which hinders the more widespread appli-

cation of models in biomedical research.

Several independent tools for model validation and quality

estimation have been developed to overcome this problem,

which assess certain structural features of a model such as

21, September 3, 2013 ª2013 Elsevier Ltd All rights reserved 1533

Page 4: Protein Modeling: What Happened to the ``Protein Structure ...organism genomes. With the scientific focus of interest moving toward larger macromolecular complexes and dynamic networks

Figure 3. Examples of Blind StructurePredictionsTwo proteins of different modeling difficulty aredisplayed, highlighting the importance of qualityestimation. Reliable estimation of the expectedcoordinate errors of individual models is crucial forjudging their suitability for specific applications.Consensus between independent predictionmethods has been shown to be a good indicator ofmodel accuracy in general.(A) Crystal structure of the acyl-CoA dehydroge-nase from Slackia heliotrinireducens solved by theMidwest Center for Structural Genomics insuperposition with the ten best blind predictions inthe CASP10 experiment (T0758). Obviously, in thiscase, all predictions agree well with the experi-mental reference structure, and the differencesbetween methods are small.(B) In more difficult cases, such as the crystalstructure of a hypothetical protein from Rumino-coccus gnavus solved at Joint Center for Struc-tural Genomics (PDB ID 4GL6) shown here (CASPtarget T0684-d2), no suitable template structurecould be identified, and the ten best predictionsshow large deviations from the reference structureand among each other.

Structure

Review

stereo-chemical correctness (Chen et al., 2010; Hooft et al.,

1996; Laskowski et al., 1993) or apply a combination of knowl-

edge-based statistical measures derived from high-resolution

crystals structures to provide estimates of model accuracy (Ben-

kert et al., 2011; Ray et al., 2012; Wiederstein and Sippl, 2007). In

case where many independent predictions for the same protein

by different independent methods are available, consensus

approaches have proven powerful to identify reliable areas in

models and to identify segments with likely deviations from the

actual structure (Ginalski et al., 2003; Kryshtafovych et al.,

2013; McGuffin et al., 2013).

Validating the accuracy and reliability of a model in order to

estimate its suitability for a specific application obviously

requires access to the model coordinates and information about

the procedures and underlying assumptions that were applied to

generate the model. However, since coordinates derived by

theoretical modeling cannot be deposited in the PDB (Berman

et al., 2006), many manuscripts reporting results of theoretical

modeling or simulations are published without the models being

made available. This makes it impossible for the reader and

reviewer of the manuscript to judge if the experiment is repro-

ducible and if conclusions are justified. In order to alleviate this

situation, a public archive of macromolecular structure models

(http://modelarchive.org) is currently being established as part

of Protein Model Portal (Haas et al., 2013). The model archive

provides a unique stable accession code (i.e., a digital object

identifier, or DOI) for each deposited model, which can be

directly referenced in the corresponding manuscripts. Besides

the actual model coordinates, archiving of models should

include sufficient details about assumptions, parameters, and

constraints applied in the simulation to allow the user of a model

to assess, and, if necessary, reproduce the simulation. In an ideal

situation, it should be possible to download a deposition from a

model archive and continue the simulation, e.g., by adding more

experimental data as constraints or by applying advanced simu-

lation methods that may have been developed in the meantime.

1534 Structure 21, September 3, 2013 ª2013 Elsevier Ltd All rights r

Over the last 2 decades, protein structure modeling and

prediction methods have matured to a point where reliable

models for many proteins can be generated and successfully

used as substitutes for direct experimental structures in a broad

variety of applications. In the following text, a few recent exam-

ples are highlighted.

Application of Homology Models in Structure-BasedDrug DiscoveryThe rational development of drugs increasingly relies on struc-

ture-based strategies for identifying potent and selective low-

molecular-weight chemical compounds. The usefulness of

homology models in structure-based virtual screening has

been demonstrated in various retrospective analyses on a broad

variety of different targets (Costanzi, 2013; Kairys et al., 2006;

McGovern and Shoichet, 2003; Oshiro et al., 2004; Skolnick

et al., 2013). One class of proteins garnering particular interest

for structure-based drug discovery are G protein-coupled recep-

tors (GPCRs), where recent advances in experimental structure

determination have brought many receptors within range for

comparative modeling techniques (Carlsson et al., 2011; Kobilka

andSchertler, 2008). Similar to protein structure prediction, com-

munity efforts for blind and independent validation of prediction

techniques are crucial for assessment of the expected accuracy

and reliability of modeled protein-ligand interactions (Damm-Ga-

namet et al., 2013; Kufareva et al., 2011; SAMPL, 2010).

Of note, experimental structures will not necessarily give

better results than models in structure-based drug discovery.

Bajorath and coworkers have analyzed 322 prospective virtual

screening campaigns in the scientific literature (Ripphausen

et al., 2010), out of which a total of 73 studies successfully

utilized homology models. It is surprising that the potency of

the hits identified using homology models was, on average,

higher than for hits identified by docking into X-ray structures.

The observation that an X-ray structure is not necessarily the

best possible representation of a particular structural state is

eserved

Page 5: Protein Modeling: What Happened to the ``Protein Structure ...organism genomes. With the scientific focus of interest moving toward larger macromolecular complexes and dynamic networks

Structure

Review

illustrated by a recent study for predicting the substrate site of

metabolism in cytochrome P450 CYP2D6. A model of CYP2D6

was generated based on the X-ray crystal structures of sub-

strate-bound CYP2C5 (Unwalla et al., 2010). During the study,

the structure of apo-CYP2D6 also became available. Both the

homology model and the experimental structure were used as

receptors in docking calculations. While overall the homology

model was in good agreement with the CYP2D6 crystal struc-

ture, the model consistently outperformed the experimental

structure in the docking calculations. This observation can be

attributed to structural differences in the substrate recognition

sites and demonstrates the importance of correctly describing

substrate-induced conformational changes that occur upon

ligand binding, which must be taken into account. Computa-

tional modeling techniques play a crucial role in exploring such

alternative receptor conformations.

Structural modeling can not only support the development of

new drugs but also predict the likely effect of amino acid

sequence variation in the proximity of the binding site; for ex-

ample, mutations that disrupt drug binding leading to drug re-

sistance. Computational techniques that can predict resistance

a priori are expected to become useful tools for drug discovery

and design of treatment in the clinics (Safi and Lilien, 2012).

De Novo Structure Prediction Techniques Guide ProteinEngineeringThe de novo structure prediction of naturally occurring proteins

is considered as the ‘‘holy grail’’ in computational structural

biology. Still, despite a few remarkable exceptions, de novo

techniques are limited to small proteins, and the overall accuracy

remains typically rather low (Kinch et al., 2011). It is interesting,

however, that the very same techniques appear to work remark-

ably well for the inverse problem—the design of protein

sequences with specific properties, e.g., that will fold into a spe-

cific structure or will form specific interactions. Baker and

coworkers have successfully designed a library of idealized

protein folds that allowed them to derive a set of rules guiding

future designs (Koga et al., 2012). Others have engineered natu-

rally occurring tandem-repeat proteins, e.g., ankyrin repeat

proteins, into a system that now allows researchers to rationally

design proteins with specific binding properties and finds a wide

range of applications from research in structural biology to,

possibly in the future, therapeutics (Javadi and Itzhaki, 2013;

Tamaskovic et al., 2012).

Due to limitations in the computational design methodology,

these approaches are often combined with high-throughput

screening and in vitro maturation techniques to diagnose

modeling inaccuracies and generate high-activity binders

(Whitehead et al., 2013). This approach was recently applied to

design proteins that bind a conserved surface patch on the

stem of the influenza hemagglutinin (HA) from the 1918 H1N1

pandemic virus. After affinity maturation, two of the designed

proteins, HB36 and HB80, bind H1 and H5 HAs with low nano-

molar affinity (Fleishman et al., 2011). Special cases of designing

specific interactions are proteins that self-assemble to a desired

symmetric architecture. The experimental validation of a de-

signed 24-subunit, 13-nm diameter complex with octahedral

symmetry and a 12-subunit, 11-nm diameter complex with tetra-

hedral symmetry confirmed that the resulting materials closely

Structure

matched the design models (King et al., 2012). The approach

opens interesting perspectives for the development of self-

assembling protein nanomaterials.

While, in the previous example, the self-assembly into larger

aggregates was a desired design goal, the self-assembly of

proteins into amyloid fibrils in the human body is associated

with numerous pathologies. Recently, several structures of

amyloid fibrils have been determined experimentally (Eisenberg

and Jucker, 2012) and provide an interesting target for the devel-

opment of specific inhibitors of pathological amyloid fibril for-

mation. Computer-aided structure-based design approaches

have been successfully applied to develop highly specific pep-

tide inhibitors of amyloid formation of the tau protein associated

with Alzheimer’s disease and of an amyloid fibril that enhances

sexual transmission of the HIV (Sievers et al., 2011). It is worth

noting that such peptide inhibitors are not limited to the 20 natu-

rally occurring amino acids.

Coevolution Information in Sequence Data HelpsPredict Membrane Protein StructuresThe experimental determination of the structures of membrane

proteins by X-ray crystallography is challenging and requires a

series of sophisticated techniques for protein expression, purifi-

cation, and crystallization. Recent technological advances have

led to a strong increase in the number of membrane proteins

characterized experimentally. Prominent examples include

pharmacologically highly relevant drug targets such as GPCRs

(Kobilka and Schertler, 2008), drug efflux transporters (Aller

et al., 2009; Nakashima et al., 2013), or ion channels (Gouaux

and Mackinnon, 2005). However, the number of membrane

proteins is still small compared to soluble proteins, and homol-

ogy modeling of membrane proteins is, therefore, only possible

for a small fraction of membrane proteins.

Recently, computational methods using coevolution infor-

mation have been developed to predict contacts between pairs

of amino acid residues based on deep multiple sequence align-

ments. Although similar approaches had been proposed before,

recent breakthroughs in the handling of phylogenetic information

and in disentangling indirect relationships have resulted in an

improved capacity to predict contacts between different protein

residues (Burger and van Nimwegen, 2010; de Juan et al., 2013;

Hopf et al., 2012; Lapedes et al., 2002; Morcos et al., 2011;

Nugent and Jones, 2012). These approaches were shown to be

effective in inferring evolutionary covariation in pairs of sequence

positions within families of membrane proteins and were shown

to provide sufficiently accurate pairwise distance constraints

to generate all-atom models. These methods are expected to

greatly expand the range of membrane proteins amenable to

modeling due to the rapid increase sequence information, which

will allowderivation ofmore comprehensive information on evolu-

tionary constraints (Hopf et al., 2012; Nugent and Jones, 2012).

Combining Experimental Methods and ComputationalModeling for Integrative Structure DeterminationIn the simplest case of combining modeling with experimental

data, homology models are being used as search models for

phasing diffraction data in X-ray crystallography by searching

for placements of a starting model within the crystallographic

unit cell that best accounts for the measured diffraction

21, September 3, 2013 ª2013 Elsevier Ltd All rights reserved 1535

Page 6: Protein Modeling: What Happened to the ``Protein Structure ...organism genomes. With the scientific focus of interest moving toward larger macromolecular complexes and dynamic networks

Figure 4. Integrative Structure Model of the NPCThe molecular architecture of the approximately 50 MDa transmembrane NPCconsist of 456 constituent proteins that selectively transport cargoes acrossthe nuclear envelope (Alber et al., 2007a, 2007b). Image courtesy of AndrejSali, UCSF (http://salilab.org).

Structure

Review

amplitudes. This approach however, requires relatively accurate

starting coordinates and often fails for starting models based on

remote homologs. This limitation can be overcome by using

modeling algorithms for sampling near-native conformations of

the initial starting model (DiMaio et al., 2011) or even allow

human protein folding game players to sample different protein

conformations (Khatib et al., 2011).

Often, however, preparing protein crystals of sufficient quality

to collect high-resolution diffraction data turns out to be the

limiting step, and complementary methods to characterize the

sample in solution have to be explored. Many of these methods

provide data of relatively low resolution, and computational

modeling techniques are required to generate likely structural

representations compatible with the data. For example, NMR

chemical shifts provide important local structural information

for proteins, and consistent structure generation from NMR

chemical shift data has recently become feasible for proteins

with sizes of up to 130 residues at an accuracy comparable to

those obtained with the standard NMR protocol (Shen et al.,

2009). SAXS is a robust and easily accessible technique for the

structural characterizations of biological macromolecular com-

plexes in solution under physiological conditions. This low-reso-

lution technique is specifically useful for characterizing large and

transient complexes andmovements in flexible macromolecules

(Graewert and Svergun, 2013; Rambo and Tainer, 2013;

Schneidman-Duhovny et al., 2012a; Trewhella et al., 2013) and

provides valuable constraints, e.g., for restrained macromolec-

ular docking experiments (Schneidman-Duhovny et al., 2011).

Recent advances in the measurement and interpretation of

single-molecule FRET experiments allow the generation of highly

accurate distance constraints as input for restrained modeling of

biomolecules and complexes (Kalinin et al., 2012). Conceptually

similar constraints about the spatial proximity of pairs of lysine

residues at the surface of a protein can be derived by chemical

crosslinking and analysis by peptide mass spectrometry. This

type of information appears useful for reconstructing large

macromolecular complexes (Walzthoeni et al., 2013).

While, traditionally, the main focus of macromolecular struc-

ture modeling and prediction has been on proteins, RNA struc-

ture prediction has recently gained significant attention. Of

note, many of the approaches developed originally for protein

modeling such as homology modeling, de novo prediction, qual-

ity estimation, and blind assessment of prediction methods,

appear to be applicable with small adaptations to RNA structure

modeling (Cruz et al., 2012; Rother et al., 2011; Seetin andMath-

ews, 2012; Sim et al., 2012). In contrast to proteins, RNA second-

ary structure can be directly characterized by an experimental

approach called SHAPE-Seq, which uses selective 20-hydroxylacylation analyzed by primer extension sequencing to inform

the modeling of RNA tertiary structure (Aviran et al., 2011).

With the focus of interest in structural biology moving toward

larger macromolecular complexes and dynamic networks of

interactions, individual experimental techniques are often no

longer able to generate sufficient data that would allow gener-

ating a unique high-resolution atomic model of the system. The

combination computational modeling with a variety of hetero-

geneous (low-resolution) experimental constraints has proven

extremely powerful (Alber et al., 2008; Ward et al., 2013). One

essential feature of such integrative structure solution tech-

1536 Structure 21, September 3, 2013 ª2013 Elsevier Ltd All rights r

niques is that they must be able to handle ambiguous or conflict-

ing information with different levels of accuracy (Alber et al.,

2008; Rieping et al., 2005; Schneidman-Duhovny et al., 2012b).

In this context, cryo-electron microscopy techniques play a cen-

tral role for determining the molecular architecture of large

macromolecular complexes (Lasker et al., 2012b; Velazquez-

Muriel et al., 2012; Zhao et al., 2013). Recent successful exam-

ples determined by integrative (a.k.a. hybrid) techniques include

the molecular architecture of the nuclear pore complex (NPC;

Figure 4), a transmembrane complex of approximately 50 MDa

with 456 constituent proteins that selectively transport cargoes

across the nuclear envelope (Alber et al., 2007a, 2007b). In a

similar approach, the molecular architecture of the 26S protea-

some holocomplex was determined (Lasker et al., 2012a).

Beyond Individual Models: Structural Biology of CellularProcessesAlthough many equilibrium models treat cells as a ‘‘bag of

enzymes,’’ living cells actively maintain a higher order internal

structure, and the functional aspects of this topological organiza-

tion in 3D space are gradually being discovered. New technolo-

gies for cryo-electron tomography hold the promise for direct

observation of cellular processes in situ (Briggs, 2013; Robinson

et al., 2007; Yahav et al., 2011). Higher order 3D cellular organiza-

tion also includes genome topology. Genomewide biochemical

analysis methods in combination with functional data provide

insights into how genome topology is maintained and the influ-

ence that it exhibits on gene expression and genome mainte-

nance. The intricate interplay between transcriptional activity

and spatial organization indicates a self-organizing and self-

perpetuating system that uses epigenetic dynamics to regulate

genome function in response to regulatory cues and topropagate

cell fate memory. Computational modeling based on data from

recently developed chromosome conformation capture tech-

nology provides detailed insights into the spatial organization of

genomes (Cavalli and Misteli, 2013; Dekker et al., 2013; Engreitz

et al., 2013; Gibcus and Dekker, 2013; Kimura et al., 2013).

eserved

Page 7: Protein Modeling: What Happened to the ``Protein Structure ...organism genomes. With the scientific focus of interest moving toward larger macromolecular complexes and dynamic networks

Figure 5. Speculative Data-Driven 3DModel of the Bacterial DivisionMachineryThe model was created with the GraphiteLifeExplorer modeling tool (Hornuset al., 2013). The FtsZ tubulin-like protein (in dark blue and yellow) is shapedinto a double ring. A short filament of the FtsA actin-like protein (in light blue) isshown on the Z ring. One FtsKmotor (in gray) pumps the DNA. This translocaseis linked to the membrane (data not shown) by six linkers (Vendeville et al.,2011). Image courtesy of Damien Lariviere (http://www.lifeexplorer.eu/).

Structure

Review

The integrative modeling of complex molecular machines

combines data from a broad range of experimental techniques,

such as EM, chemical crosslinking, proteomics, FRET, or

SAXS in an attempt to build an ensemble of models that is

consistent with the available data (Webb et al., 2011). Often,

this is an iterative process where ambiguities in the models indi-

cate lack (or inconsistency) of the data, motivating new experi-

ments to resolve these ambiguities. Software packages that

integrate modeling capability with powerful visualization tools

in projects such as IMP/Chimera (Yang et al., 2012), Mesoscope

(Al-Amoudi et al., 2011), or Life Explorer (Figure 5) (Hornus et al.,

2013) will greatly facilitate the generation and interpretation of

the resulting models.

Conveying the underlying assumptions of a computational

technique, as well as estimating the expected accuracy and vari-

ability of a model, will be crucial in making these computational

techniques useful for the nonexpert scientist. Understanding the

limitations of a model is the first step in deciding which interpre-

tations and conclusions can be supported. One of the key chal-

lenges in this respect will be to develop an open environment for

sharing of data, models, and algorithms that will allow us to

continuously and collaboratively refine our current model of the

‘‘molecular sociology of the cell’’ (Robinson et al., 2007).

ACKNOWLEDGMENTS

Special thanks to Andrej Sali (University of California, San Francisco) forproviding the image of the NPC, Damien Lariviere (Fondation Fourmentin)for the screen shot of Life Explorer, and Jurgen Haas and Stefan Bienert forhelp with the template coverage plot. Model Archive was supported by the Na-tional Institute of General Medical Science of the National Institutes of Healthas a subaward to Rutgers University under award number U01 GM093324 andthe SIB Swiss Institute of Bioinformatics.

REFERENCES

Al-Amoudi, A., Castano-Diez, D., Devos, D.P., Russell, R.B., Johnson, G.T.,and Frangakis, A.S. (2011). The three-dimensional molecular structure of thedesmosomal plaque. Proc. Natl. Acad. Sci. USA 108, 6480–6485.

Structure

Alber, F., Dokudovskaya, S., Veenhoff, L.M., Zhang, W., Kipper, J., Devos, D.,Suprapto, A., Karni-Schmidt, O., Williams, R., Chait, B.T., et al. (2007a). Deter-mining the architectures of macromolecular assemblies. Nature 450, 683–694.

Alber, F., Dokudovskaya, S., Veenhoff, L.M., Zhang, W., Kipper, J., Devos, D.,Suprapto, A., Karni-Schmidt, O., Williams, R., Chait, B.T., et al. (2007b). Themolecular architecture of the nuclear pore complex. Nature 450, 695–701.

Alber, F., Forster, F., Korkin, D., Topf, M., and Sali, A. (2008). Integratingdiverse data for structure determination of macromolecular assemblies.Annu. Rev. Biochem. 77, 443–477.

Aller, S.G., Yu, J., Ward, A., Weng, Y., Chittaboina, S., Zhuo, R., Harrell, P.M.,Trinh, Y.T., Zhang, Q., Urbatsch, I.L., and Chang, G. (2009). Structure of P-glycoprotein reveals a molecular basis for poly-specific drug binding. Science323, 1718–1722.

Aloy, P., and Russell, R.B. (2006). Structural systems biology: modelling pro-tein interactions. Nat. Rev. Mol. Cell Biol. 7, 188–197.

Altschul, S.F., Madden, T.L., Schaffer, A.A., Zhang, J., Zhang, Z., Miller, W.,and Lipman, D.J. (1997). Gapped BLAST and PSI-BLAST: a new generationof protein database search programs. Nucleic Acids Res. 25, 3389–3402.

Arnold, K., Bordoli, L., Kopp, J., and Schwede, T. (2006). The SWISS-MODELworkspace: a web-based environment for protein structure homology model-ling. Bioinformatics 22, 195–201.

Arnold, K., Kiefer, F., Kopp, J., Battey, J.N., Podvinec, M., Westbrook, J.D.,Berman, H.M., Bordoli, L., and Schwede, T. (2009). The Protein Model Portal.J. Struct. Funct. Genomics 10, 1–8.

Aviran, S., Trapnell, C., Lucks, J.B., Mortimer, S.A., Luo, S., Schroth, G.P.,Doudna, J.A., Arkin, A.P., and Pachter, L. (2011). Modeling and automationof sequencing-based characterization of RNA structure. Proc. Natl. Acad.Sci. USA 108, 11069–11074.

Baker, D., and Sali, A. (2001). Protein structure prediction and structural geno-mics. Science 294, 93–96.

Benkert, P., Biasini, M., and Schwede, T. (2011). Toward the estimation of theabsolute quality of individual protein structure models. Bioinformatics 27,343–350.

Berman, H.M., Burley, S.K., Chiu, W., Sali, A., Adzhubei, A., Bourne, P.E.,Bryant, S.H., Dunbrack, R.L., Jr., Fidelis, K., Frank, J., et al. (2006). Outcomeof a workshop on archiving structural models of biological macromolecules.Structure 14, 1211–1217.

Berman, H., Henrick, K., Nakamura, H., and Markley, J.L. (2007). The world-wide Protein Data Bank (wwPDB): ensuring a single, uniform archive of PDBdata. Nucleic Acids Res. 35(Database issue), D301–D303.

Bordoli, L., and Schwede, T. (2012). Automated protein structure modelingwith SWISS-MODEL Workspace and the Protein Model Portal. MethodsMol. Biol. 857, 107–136.

Box, G.E.P., and Draper, N.R. (1987). Empirical Model-Building and ResponseSurfaces (New York: Wiley).

Briggs, J.A. (2013). Structural biology in situ—the potential of subtomogramaveraging. Curr. Opin. Struct. Biol. 23, 261–267.

Buljan, M., Chalancon, G., Dunker, A.K., Bateman, A., Balaji, S., Fuxreiter, M.,and Babu, M.M. (2013). Alternative splicing of intrinsically disordered regionsand rewiring of protein interactions. Curr. Opin. Struct. Biol. 23, 443–450.

Burger, L., and van Nimwegen, E. (2010). Disentangling direct from indirect co-evolution of residues in protein alignments. PLoS Comput. Biol. 6, e1000633.

Carlsson, J., Coleman, R.G., Setola, V., Irwin, J.J., Fan, H., Schlessinger, A.,Sali, A., Roth, B.L., and Shoichet, B.K. (2011). Ligand discovery from a dopa-mine D3 receptor homology model and crystal structure. Nat. Chem. Biol. 7,769–778.

Cavalli, G., and Misteli, T. (2013). Functional implications of genome topology.Nat. Struct. Mol. Biol. 20, 290–299.

Chen, V.B., Arendall, W.B., 3rd, Headd, J.J., Keedy, D.A., Immormino, R.M.,Kapral, G.J., Murray, L.W., Richardson, J.S., and Richardson, D.C. (2010).MolProbity: all-atom structure validation for macromolecular crystallography.Acta Crystallogr. D Biol. Crystallogr. 66, 12–21.

21, September 3, 2013 ª2013 Elsevier Ltd All rights reserved 1537

Page 8: Protein Modeling: What Happened to the ``Protein Structure ...organism genomes. With the scientific focus of interest moving toward larger macromolecular complexes and dynamic networks

Structure

Review

Chothia, C., and Lesk, A.M. (1986). The relation between the divergence ofsequence and structure in proteins. EMBO J. 5, 823–826.

Colak, R., Kim, T., Michaut, M., Sun, M., Irimia, M., Bellay, J., Myers, C.L.,Blencowe, B.J., and Kim, P.M. (2013). Distinct types of disorder in the humanproteome: functional implications for alternative splicing. PLoS Comput. Biol.9, e1003030.

Collins, F.S., Green, E.D., Guttmacher, A.E., and Guyer, M.S.; US NationalHuman Genome Research Institute. (2003). A vision for the future of genomicsresearch. Nature 422, 835–847.

Costanzi, S. (2013). Modeling G protein-coupled receptors and their interac-tions with ligands. Curr. Opin. Struct. Biol. 23, 185–190.

Cruz, J.A., Blanchet, M.F., Boniecki, M., Bujnicki, J.M., Chen, S.J., Cao, S.,Das, R., Ding, F., Dokholyan, N.V., Flores, S.C., et al. (2012). RNA-Puzzles: aCASP-like evaluation of RNA three-dimensional structure prediction. RNA18, 610–625.

Damm-Ganamet, K.L., Smith, R.D., Dunbar, J.B., Jr., Stuckey, J.A., and Carl-son, H.A. (2013). CSAR Benchmark Exercise 2011-2012: Evaluation of resultsfrom docking and relative ranking of blinded congeneric series. J. Chem. Inf.Model. Published online April 2, 2013. http://dx.doi.org/10.1021/ci400025f.

Das, R., and Baker, D. (2008). Macromolecular modeling with rosetta. Annu.Rev. Biochem. 77, 363–382.

David, A., Razali, R., Wass, M.N., and Sternberg, M.J. (2012). Protein-proteininteraction sites are hot spots for disease-associated nonsynonymous SNPs.Hum. Mutat. 33, 359–363.

de Juan, D., Pazos, F., and Valencia, A. (2013). Emerging methods in proteinco-evolution. Nat. Rev. Genet. 14, 249–261.

Dekker, J., Marti-Renom, M.A., and Mirny, L.A. (2013). Exploring the three-dimensional organization of genomes: interpreting chromatin interactiondata. Nat. Rev. Genet. 14, 390–403.

DiMaio, F., Terwilliger, T.C., Read, R.J.,Wlodawer, A., Oberdorfer, G.,Wagner,U., Valkov, E., Alon, A., Fass, D., Axelrod, H.L., et al. (2011). Improved molec-ular replacement by density- and energy-guided protein structure optimiza-tion. Nature 473, 540–543.

Eisenberg, D., and Jucker, M. (2012). The amyloid state of proteins in humandiseases. Cell 148, 1188–1203.

Engreitz, J.M., Pandya-Jones, A., McDonel, P., Shishkin, A., Sirokman, K.,Surka, C., Kadri, S., Xing, J., Goren, A., Lander, E.S., et al. (2013). The XistlncRNA exploits three-dimensional genome architecture to spread acrossthe X chromosome. Science. Published online July 4, 2013. http://dx.doi.org/10.1126/science.1237973.

Fleishman, S.J., Whitehead, T.A., Ekiert, D.C., Dreyfus, C., Corn, J.E., Strauch,E.M., Wilson, I.A., and Baker, D. (2011). Computational design of proteinstargeting the conserved stem region of influenza hemagglutinin. Science332, 816–821.

Gibcus, J.H., and Dekker, J. (2013). The hierarchy of the 3D genome. Mol. Cell49, 773–782.

Ginalski, K., Elofsson, A., Fischer, D., and Rychlewski, L. (2003). 3D-Jury: asimple approach to improve protein structure predictions. Bioinformatics 19,1015–1018.

Gouaux, E., and Mackinnon, R. (2005). Principles of selective ion transport inchannels and pumps. Science 310, 1461–1465.

Graewert, M.A., and Svergun, D.I. (2013). Impact and progress in small andwide angle X-ray scattering (SAXS and WAXS). Curr. Opin. Struct. Biol. Pub-lished online July 5, 2013. http://dx.doi.org/10.1016/j.sbi.2013.06.007.

Guex, N., Peitsch, M.C., and Schwede, T. (2009). Automated comparative pro-tein structure modeling with SWISS-MODEL and Swiss-PdbViewer: a histori-cal perspective. Electrophoresis 30(Suppl 1 ), S162–S173.

Haas, J., Roth, S., Arnold, K., Kiefer, F., Schmidt, T., Bordoli, L., and Schwede,T. (2013). The Protein Model Portal—a comprehensive resource for proteinstructure and model information. Database (Oxford) 2013, bat031.

Hennig, J., Wang, I., Sonntag, M., Gabel, F., and Sattler, M. (2013). CombiningNMR and small angle X-ray and neutron scattering in the structural analysis ofa ternary protein-RNA complex. J. Biomol. NMR 56, 17–30.

1538 Structure 21, September 3, 2013 ª2013 Elsevier Ltd All rights r

Hildebrand, A., Remmert, M., Biegert, A., and Soding, J. (2009). Fast andaccurate automatic structure prediction with HHpred. Proteins 77(Suppl 9 ),128–132.

Hooft, R.W., Vriend, G., Sander, C., and Abola, E.E. (1996). Errors in proteinstructures. Nature 381, 272.

Hopf, T.A., Colwell, L.J., Sheridan, R., Rost, B., Sander, C., and Marks, D.S.(2012). Three-dimensional structures of membrane proteins from genomicsequencing. Cell 149, 1607–1621.

Hornus, S., Levy, B., Lariviere, D., and Fourmentin, E. (2013). Easy DNAmodeling and more with GraphiteLifeExplorer. PLoS ONE 8, e53609.

Janin, J. (2010). Protein-protein docking tested in blind predictions: the CAPRIexperiment. Mol. Biosyst. 6, 2351–2362.

Janin, J., and Sternberg, M.J. (2013). Protein flexibility, not disorder, is intrinsicto molecular recognition. F1000 Biol. Rep. 5, 2.

Javadi, Y., and Itzhaki, L.S. (2013). Tandem-repeat proteins: regularity plusmodularity equals design-ability. Curr. Opin. Struct. Biol. 23, 622–631.

Kairys, V., Fernandes, M.X., and Gilson, M.K. (2006). Screening drug-likecompounds by docking to homology models: a systematic study. J. Chem.Inf. Model. 46, 365–379.

Kalinin, S., Peulen, T., Sindbert, S., Rothwell, P.J., Berger, S., Restle, T.,Goody, R.S., Gohlke, H., and Seidel, C.A. (2012). A toolkit and benchmarkstudy for FRET-restrained high-precision structural modeling. Nat. Methods9, 1218–1225.

Kastritis, P.L., and Bonvin, A.M. (2013). On the binding affinity of macromolec-ular interactions: daring to ask why proteins interact. J. R. Soc. Interface 10,20120835.

Kelley, L.A., and Sternberg, M.J. (2009). Protein structure prediction on theWeb: a case study using the Phyre server. Nat. Protoc. 4, 363–371.

Khatib, F., DiMaio, F., Cooper, S., Kazmierczyk, M., Gilski, M., Krzywda, S.,Zabranska, H., Pichova, I., Thompson, J., Popovi�c, Z., et al.; Foldit ContendersGroup; Foldit Void Crushers Group. (2011). Crystal structure of a monomericretroviral protease solved by protein folding game players. Nat. Struct. Mol.Biol. 18, 1175–1177.

Kiefer, F., Arnold, K., Kunzli, M., Bordoli, L., and Schwede, T. (2009). TheSWISS-MODEL Repository and associated resources. Nucleic Acids Res.37(Database issue), D387–D392.

Kimura, H., Shimooka, Y., Nishikawa, J.I., Miura, O., Sugiyama, S., Yamada,S., and Ohyama, T. (2013). The genome folding mechanism in yeast.J. Biochem. 154, 137–147.

Kinch, L., Yong Shi, S., Cong, Q., Cheng, H., Liao, Y., and Grishin, N.V. (2011).CASP9 assessment of freemodeling target predictions. Proteins 79(Suppl 10 ),59–73.

King, N.P., Sheffler, W., Sawaya, M.R., Vollmar, B.S., Sumida, J.P., Andre, I.,Gonen, T., Yeates, T.O., and Baker, D. (2012). Computational design of self-assembling protein nanomaterials with atomic level accuracy. Science 336,1171–1174.

Kobilka, B., and Schertler, G.F. (2008). New G-protein-coupled receptor crys-tal structures: insights and limitations. Trends Pharmacol. Sci. 29, 79–83.

Koga, N., Tatsumi-Koga, R., Liu, G., Xiao, R., Acton, T.B., Montelione, G.T.,and Baker, D. (2012). Principles for designing ideal protein structures. Nature491, 222–227.

Koh, I.Y., Eyrich, V.A., Marti-Renom,M.A., Przybylski, D., Madhusudhan, M.S.,Eswar, N., Grana, O., Pazos, F., Valencia, A., Sali, A., and Rost, B. (2003). EVA:Evaluation of protein structure prediction servers. Nucleic Acids Res. 31,3311–3315.

Kryshtafovych, A., Barbato, A., Fidelis, K., Monastyrskyy, B., Schwede, T., andTramontano, A. (2013). Assessment of the assessment: Evaluation of themodel quality estimates in CASP10. Proteins. http://dx.doi.org/10.1002/prot.24347, Published June 18, 2013.

Kufareva, I., Rueda, M., Katritch, V., Stevens, R.C., and Abagyan, R.; GPCRDock 2010 participants. (2011). Status of GPCR modeling and docking asreflected by community-wide GPCR Dock 2010 assessment. Structure 19,1108–1126.

eserved

Page 9: Protein Modeling: What Happened to the ``Protein Structure ...organism genomes. With the scientific focus of interest moving toward larger macromolecular complexes and dynamic networks

Structure

Review

Kundrotas, P.J., Zhu, Z., Janin, J., and Vakser, I.A. (2012). Templates are avail-able tomodel nearly all complexes of structurally characterized proteins. Proc.Natl. Acad. Sci. USA 109, 9438–9441.

Lapedes, A., Giraud, B., and Jarzynski, C. (2002). Using sequence alignmentsto predict protein structure and stability with high accuracy. arXiv,arXiv:1207.2484, http://arxiv.org/abs/1207.2484.

Larsson, P., Skwark, M.J., Wallner, B., and Elofsson, A. (2011). Improvedpredictions by Pcons.net using multiple templates. Bioinformatics 27,426–427.

Lasker, K., Forster, F., Bohn, S., Walzthoeni, T., Villa, E., Unverdorben, P.,Beck, F., Aebersold, R., Sali, A., and Baumeister, W. (2012a). Molecular archi-tecture of the 26S proteasome holocomplex determined by an integrativeapproach. Proc. Natl. Acad. Sci. USA 109, 1380–1387.

Lasker, K., Velazquez-Muriel, J.A., Webb, B.M., Yang, Z., Ferrin, T.E., and Sali,A. (2012b). Macromolecular assembly structures by comparative modelingand electron microscopy. Methods Mol. Biol. 857, 331–350.

Laskowski, R.A., Macarthur, M.W., Moss, D.S., and Thornton, J.M. (1993).Procheck - a program to check the stereochemical quality of protein struc-tures. J. Appl. Crystallogr. 26, 283–291.

Lees, J., Yeats, C., Perkins, J., Sillitoe, I., Rentzsch, R., Dessailly, B.H., andOrengo, C. (2012). Gene3D: a domain-based resource for comparative geno-mics, functional annotation and protein network analysis. Nucleic Acids Res.40(Database issue), D465–D471.

Levitt, M. (2009). Nature of the protein universe. Proc. Natl. Acad. Sci. USA106, 11079–11084.

Ma, B., Tsai, C.J., Halilo�glu, T., and Nussinov, R. (2011). Dynamic allostery:linkers are not merely flexible. Structure 19, 907–917.

MacCallum, J.L., Perez, A., Schnieders, M.J., Hua, L., Jacobson, M.P., andDill, K.A. (2011). Assessment of protein structure refinement in CASP9. Pro-teins 79(Suppl 10 ), 74–90.

Mariani, V., Kiefer, F., Schmidt, T., Haas, J., and Schwede, T. (2011). Assess-ment of template based protein structure predictions in CASP9. Proteins79(Suppl 10 ), 37–58.

McGovern, S.L., and Shoichet, B.K. (2003). Information decay in moleculardocking screens against holo, apo, and modeled conformations of enzymes.J. Med. Chem. 46, 2895–2907.

McGuffin, L.J., Buenavista, M.T., and Roche, D.B. (2013). The ModFOLD4server for the quality assessment of 3D protein models. Nucleic Acids Res.41(W1), W368–W372.

Morcos, F., Pagnani, A., Lunt, B., Bertolino, A., Marks, D.S., Sander, C., Zec-china, R., Onuchic, J.N., Hwa, T., and Weigt, M. (2011). Direct-couplinganalysis of residue coevolution captures native contacts across many proteinfamilies. Proc. Natl. Acad. Sci. USA 108, E1293–E1301.

Moult, J., Fidelis, K., Kryshtafovych, A., and Tramontano, A. (2011). Criticalassessment of methods of protein structure prediction (CASP)—round IX. Pro-teins 79(Suppl 10 ), 1–5.

Nakashima, R., Sakurai, K., Yamasaki, S., Hayashi, K., Nagata, C., Hoshino,K., Onodera, Y., Nishino, K., and Yamaguchi, A. (2013). Structural basis forthe inhibition of bacterial multidrug exporters. Nature 500, 102–106.

Nugent, T., and Jones, D.T. (2012). Accurate de novo structure prediction oflarge transmembrane protein domains using fragment-assembly and corre-lated mutation analysis. Proc. Natl. Acad. Sci. USA 109, E1540–E1547.

Nygaard, R., Zou, Y., Dror, R.O., Mildorf, T.J., Arlow, D.H., Manglik, A., Pan,A.C., Liu, C.W., Fung, J.J., Bokoch, M.P., et al. (2013). The dynamic processof b(2)-adrenergic receptor activation. Cell 152, 532–542.

Ochi, T., Wu, Q., Chirgadze, D.Y., Grossmann, J.G., Bolanos-Garcia, V.M.,and Blundell, T.L. (2012). Structural insights into the role of domain flexibilityin human DNA ligase IV. Structure 20, 1212–1222.

Oshiro, C., Bradley, E.K., Eksterowicz, J., Evensen, E., Lamb, M.L., Lanctot,J.K., Putta, S., Stanton, R., and Grootenhuis, P.D. (2004). Performance of3D-database molecular docking studies into homology models. J. Med.Chem. 47, 764–767.

Structure

Peitsch, M.C. (1995). Protein modeling by e-mail. Nat. Biotechnol. 13,658–660.

Pieper, U., Webb, B.M., Barkan, D.T., Schneidman-Duhovny, D., Schles-singer, A., Braberg, H., Yang, Z., Meng, E.C., Pettersen, E.F., Huang, C.C.,et al. (2011). ModBase, a database of annotated comparative protein structuremodels, and associated resources. Nucleic Acids Res. 39(Database issue),D465–D474.

Raman, S., Vernon, R., Thompson, J., Tyka, M., Sadreyev, R., Pei, J., Kim, D.,Kellogg, E., DiMaio, F., Lange, O., et al. (2009). Structure prediction for CASP8with all-atom refinement using Rosetta. Proteins 77(Suppl 9 ), 89–99.

Rambo, R.P., and Tainer, J.A. (2013). Super-resolution in solution X-ray scat-tering and its applications to structural systems biology. Annu. Rev. Biophys.42, 415–441.

Ray, A., Lindahl, E., and Wallner, B. (2012). Improved model quality assess-ment using ProQ2. BMC Bioinformatics 13, 224.

Read, R.J., Adams, P.D., Arendall, W.B., 3rd, Brunger, A.T., Emsley, P., Joos-ten, R.P., Kleywegt, G.J., Krissinel, E.B., Lutteke, T., Otwinowski, Z., et al.(2011). A new generation of crystallographic validation tools for the proteindata bank. Structure 19, 1395–1412.

Remmert, M., Biegert, A., Hauser, A., and Soding, J. (2012). HHblits: lightning-fast iterative protein sequence searching by HMM-HMM alignment. Nat.Methods 9, 173–175.

Rieping, W., Habeck, M., and Nilges, M. (2005). Inferential structure determi-nation. Science 309, 303–306.

Ripphausen, P., Nisius, B., Peltason, L., and Bajorath, J. (2010). Quo vadis,virtual screening? A comprehensive survey of prospective applications.J. Med. Chem. 53, 8461–8467.

Robinson, C.V., Sali, A., and Baumeister, W. (2007). The molecular sociologyof the cell. Nature 450, 973–982.

Roche, D.B., Buenavista, M.T., Tetchner, S.J., and McGuffin, L.J. (2011). TheIntFOLD server: an integrated web resource for protein fold recognition, 3Dmodel quality assessment, intrinsic disorder prediction, domain predictionand ligand binding site prediction. Nucleic Acids Res. 39(Web Server issue),W171–W176.

Rother, M., Rother, K., Puton, T., and Bujnicki, J.M. (2011). RNA tertiary struc-ture prediction with ModeRNA. Brief. Bioinform. 12, 601–613.

Russel, D., Lasker, K., Webb, B., Velazquez-Muriel, J., Tjioe, E., Schneidman-Duhovny, D., Peterson, B., and Sali, A. (2012). Putting the pieces together:integrative modeling platform software for structure determination of macro-molecular assemblies. PLoS Biol. 10, e1001244.

Rychlewski, L., and Fischer, D. (2005). LiveBench-8: the large-scale, contin-uous assessment of automated protein structure prediction. Protein Sci. 14,240–245.

Safi, M., and Lilien, R.H. (2012). Efficient a priori identification of drug resistantmutations using Dead-End Elimination and MM-PBSA. J. Chem. Inf. Model.52, 1529–1541.

SAMPL. (2010). Proceedings of the Third Annual Statistical Assessment of theModeling of Proteins and Ligands (SAMPL) Challenge and Workshop. June2009. Montreal, Canada. J. Comput. Aided Mol. Des. 24, 257–383.

Sanchez, R., and Sali, A. (1998). Large-scale protein structure modeling of theSaccharomyces cerevisiae genome. Proc. Natl. Acad. Sci. USA 95, 13597–13602.

Schneidman-Duhovny, D., Hammel, M., and Sali, A. (2011). Macromoleculardocking restrained by a small angle X-ray scattering profile. J. Struct. Biol.173, 461–471.

Schneidman-Duhovny, D., Kim, S.J., and Sali, A. (2012a). Integrative structuralmodeling with small angle X-ray scattering profiles. BMC Struct. Biol. 12, 17.

Schneidman-Duhovny, D., Rossi, A., Avila-Sakar, A., Kim, S.J., Velazquez-Muriel, J., Strop, P., Liang, H., Krukenberg, K.A., Liao, M., Kim, H.M., et al.(2012b). A method for integrative structure determination of protein-proteincomplexes. Bioinformatics 28, 3282–3289.

Schwede, T., Sali, A., Honig, B., Levitt, M., Berman, H.M., Jones, D., Brenner,S.E., Burley, S.K., Das, R., Dokholyan, N.V., et al. (2009). Outcome of a

21, September 3, 2013 ª2013 Elsevier Ltd All rights reserved 1539

Page 10: Protein Modeling: What Happened to the ``Protein Structure ...organism genomes. With the scientific focus of interest moving toward larger macromolecular complexes and dynamic networks

Structure

Review

workshop on applications of protein models in biomedical research. Structure17, 151–159.

Seetin, M.G., and Mathews, D.H. (2012). RNA structure prediction: an over-view of methods. Methods Mol. Biol. 905, 99–122.

Shen, Y., Vernon, R., Baker, D., and Bax, A. (2009). De novo protein structuregeneration from incomplete chemical shift assignments. J. Biomol. NMR 43,63–78.

Sievers, S.A., Karanicolas, J., Chang, H.W., Zhao, A., Jiang, L., Zirafi, O.,Stevens, J.T., Munch, J., Baker, D., and Eisenberg, D. (2011). Structure-baseddesign of non-natural amino-acid inhibitors of amyloid fibril formation. Nature475, 96–100.

Sim, A.Y., Minary, P., and Levitt, M. (2012). Modeling nucleic acids. Curr. Opin.Struct. Biol. 22, 273–278.

Skolnick, J., Zhou, H., and Gao, M. (2013). Are predicted protein structures ofany value for binding site prediction and virtual ligand screening? Curr. Opin.Struct. Biol. 23, 191–197.

Stein, A., Mosca, R., and Aloy, P. (2011). Three-dimensional modeling ofprotein interactions and complexes is going ’omics. Curr. Opin. Struct. Biol.21, 200–208.

Sutcliffe, M.J., Haneef, I., Carney, D., and Blundell, T.L. (1987). Knowledgebased modelling of homologous proteins, Part I: Three-dimensional frame-works derived from the simultaneous superposition of multiple structures.Protein Eng. 1, 377–384.

Tamaskovic, R., Simon, M., Stefan, N., Schwill, M., and Pluckthun, A. (2012).Designed ankyrin repeat proteins (DARPins) from research to therapy.Methods Enzymol. 503, 101–134.

Terwilliger, T.C. (2011). The success of structural genomics. J. Struct. Funct.Genomics 12, 43–44.

Trewhella, J., Hendrickson, W.A., Kleywegt, G.J., Sali, A., Sato, M., Schwede,T., Svergun, D.I., Tainer, J.A., Westbrook, J., and Berman, H.M. (2013). Reportof the wwPDB Small-Angle Scattering Task Force: data requirements forbiomolecular modeling and the PDB. Structure 21, 875–881.

UniProt Consortium. (2013). Update on activities at the Universal ProteinResource (UniProt) in 2013. Nucleic Acids Res. 41(Database issue), D43–D47.

Unwalla, R.J., Cross, J.B., Salaniwal, S., Shilling, A.D., Leung, L., Kao, J., andHumblet, C. (2010). Using a homology model of cytochrome P450 2D6 to pre-dict substrate site of metabolism. J. Comput. Aided Mol. Des. 24, 237–256.

Uversky, V.N., and Dunker, A.K. (2013). The case for intrinsically disorderedproteins playing contributory roles in molecular recognition without a stable3D structure. F1000 Biol. Rep. 5, 1.

Vakser, I.A. (2013). Low-resolution structural modeling of protein interactome.Curr. Opin. Struct. Biol. 23, 198–205.

Velazquez-Muriel, J., Lasker, K., Russel, D., Phillips, J., Webb, B.M., Schneid-man-Duhovny, D., and Sali, A. (2012). Assembly of macromolecular com-plexes by satisfaction of spatial restraints from electron microscopy images.Proc. Natl. Acad. Sci. USA 109, 18821–18826.

Vendeville, A., Lariviere, D., and Fourmentin, E. (2011). An inventory of the bac-terial macromolecular components and their spatial organization. FEMSMicrobiol. Rev. 35, 395–414.

Vidal, M., Cusick, M.E., and Barabasi, A.L. (2011). Interactome networks andhuman disease. Cell 144, 986–998.

1540 Structure 21, September 3, 2013 ª2013 Elsevier Ltd All rights r

Walzthoeni, T., Leitner, A., Stengel, F., and Aebersold, R. (2013). Mass spec-trometry supported determination of protein complex structure. Curr. Opin.Struct. Biol. 23, 252–260.

Ward, J.J., Sodhi, J.S., McGuffin, L.J., Buxton, B.F., and Jones, D.T. (2004).Prediction and functional analysis of native disorder in proteins from the threekingdoms of life. J. Mol. Biol. 337, 635–645.

Ward, A.B., Sali, A., andWilson, I.A. (2013). Biochemistry. Integrative structuralbiology. Science 339, 913–915.

Wass, M.N., David, A., and Sternberg, M.J. (2011). Challenges for the predic-tion of macromolecular interactions. Curr. Opin. Struct. Biol. 21, 382–390.

Watson, J.D., and Crick, F.H. (1953). Molecular structure of nucleic acids; astructure for deoxyribose nucleic acid. Nature 171, 737–738.

Webb, B., Lasker, K., Schneidman-Duhovny, D., Tjioe, E., Phillips, J., Kim,S.J., Velazquez-Muriel, J., Russel, D., and Sali, A. (2011). Modeling of proteinsand their assemblies with the integrative modeling platform. Methods Mol.Biol. 781, 377–397.

Wei, Q., Xu, Q., and Dunbrack, R.L., Jr. (2013). Prediction of phenotypes ofmissense mutations in human proteins from biological assemblies. Proteins81, 199–213.

Weinkam, P., Pons, J., and Sali, A. (2012). Structure-based model of allosterypredicts coupling between distant sites. Proc. Natl. Acad. Sci. USA 109, 4875–4880.

Whitehead, T.A., Baker, D., and Fleishman, S.J. (2013). Computational designof novel protein binders and experimental affinity maturation. Methods Enzy-mol. 523, 1–19.

Wiederstein, M., and Sippl, M.J. (2007). ProSA-web: interactive web servicefor the recognition of errors in three-dimensional structures of proteins.Nucleic Acids Res. 35(Web Server issue), W407–W410.

Xu, Q., and Dunbrack, R.L., Jr. (2011). The protein common interface database(ProtCID)—a comprehensive database of interactions of homologous proteinsin multiple crystal forms. Nucleic Acids Res. 39(Database issue), D761–D770.

Yahav, T., Maimon, T., Grossman, E., Dahan, I., and Medalia, O. (2011). Cryo-electron tomography: gaining insight into cellular processes by structuralapproaches. Curr. Opin. Struct. Biol. 21, 670–677.

Yang, Z., Lasker, K., Schneidman-Duhovny, D., Webb, B., Huang, C.C., Pet-tersen, E.F., Goddard, T.D., Meng, E.C., Sali, A., and Ferrin, T.E. (2012).UCSF Chimera, MODELLER, and IMP: an integrated modeling system.J. Struct. Biol. 179, 269–278.

Zhang, Q.C., Petrey, D., Deng, L., Qiang, L., Shi, Y., Thu, C.A., Bisikirska, B.,Lefebvre, C., Accili, D., Hunter, T., et al. (2012). Structure-based predictionof protein-protein interactions on a genome-wide scale. Nature 490, 556–560.

Zhang, Y. (2013). Interplay of I-TASSER and QUARK for template-based andab initio protein structure prediction in CASP10. Proteins. http://dx.doi.org/10.1002/prot.24341, Published June 13, 2013.

Zhang, Y., Thiele, I., Weekes, D., Li, Z., Jaroszewski, L., Ginalski, K., Deacon,A.M., Wooley, J., Lesley, S.A., Wilson, I.A., et al. (2009). Three-dimensionalstructural view of the central metabolic network of Thermotoga maritima.Science 325, 1544–1549.

Zhao, G., Perilla, J.R., Yufenyuy, E.L., Meng, X., Chen, B., Ning, J., Ahn, J.,Gronenborn, A.M., Schulten, K., Aiken, C., and Zhang, P. (2013). MatureHIV-1 capsid structure by cryo-electron microscopy and all-atom moleculardynamics. Nature 497, 643–646.

eserved