1. Interdisciplinary Applied Mathematics
Volume 21
Series Editors: S.S. Antman, J.E. Marsden, L. Sirovich
Series Advisors: C.L. Bris, L. Glass, P.S. Krishnaprasad, R.V. Kohn, J.D. Murray, S.S. Sastry
Geophysics and Planetary Sciences
Imaging, Vision, and Graphics: D. Geman
Mathematical Biology: L. Glass, J.D. Murray
Mechanics and Materials: R.V. Kohn
Systems and Control: S.S. Sastry, P.S. Krishnaprasad
Problems in engineering, computational science, and
the physical and biological sciences are using increasingly
sophisticated mathematical techniques. Thus, the bridge between the
mathematical sciences and other disciplines is heavily traveled.
The correspondingly increased dialog between the disciplines has
led to the establishment of the series: Interdisciplinary Applied
Mathematics. The purpose of this series is to meet the current and
future needs for the interaction between various science and
technology areas on the one hand and mathematics on the other. This
is done, firstly, by encouraging the ways that mathematics may be
applied in traditional areas, as well as point towards new and
innovative areas of applications; and, secondly, by encouraging
other scientific disciplines to engage in a dialog with
mathematicians outlining their problems to both access new methods and
suggest innovative developments within mathematics itself. The
series will consist of monographs and high-level texts from
researchers working on the interplay between mathematics and other
fields of science and technology. For further volumes:
http://www.springer.com/series/1390
2. Reviews of First Edition (continued from back cover) The
interdisciplinary structural biology community has waited long for
a book of this kind which provides an excellent introduction to
molecular modeling. (Harold A. Scheraga, Cornell University) A
uniquely valuable introduction to the modeling of biomolecular
structure and dynamics. A rigorous and up-to-date treatment of the
foundations, enlivened by engaging anecdotes and historical notes.
(J. Andrew McCammon, Howard Hughes Medical Institute, University of
California at San Diego) The text is beautifully illustrated with
many color illustrations. Even part of the text is typeset in
color. Not only the illustrations interrupt the very readable text,
there are also many box-insertions . . . (Adhemar Bultheel,
Bulletin of the Belgian Mathematical Society, Vol. 11 (4), 2004)
This textbook evolved from a graduate course in molecular modeling,
and was expanded to serve as an introduction to the field for
scientists in other disciplines. . . . The book is unique in that
it combines introductory molecular biology with advanced topics in
modern simulation algorithms . . . . the author provides 1000+
references, and additionally includes reading lists complementing
the main text. This is an excellent introductory text that is a
pleasure to read. (Henry van den Bedem, MathSciNet, September,
2004) This book provides an excellent introduction to the modeling
of biomolecular structures and dynamics. . . . The book's appendices
complement the material in the main text through homework
assignments, reading lists, and other information useful for
teaching molecular modeling. The book is intended for students of
an interdisciplinary graduate course in molecular modeling as well
as for researchers (physicists, mathematicians and engineers) to
get them started in computational molecular biology. (Ivan Krivy,
University of Ostrava, Czech Republic, Zentralblatt MATH, Issue
1011, 2003) The book . . . is the outcome of the author Tamar
Schlick's teaching experience at New York University. It is a
fantastic graduate textbook to get into structural biology. . . .
even the most sophisticated problems are part of a gradual approach
. . . . The book will obviously be of great interest to students
and teachers but it should also be very valuable for research
scientists, especially newcomers to the field . . . as a reference
book and a point of entry in the more specialised literature.
(Benjamin Audit, Bioinformatics, January, 2003) The basic goal of
this new text is to introduce students to molecular modelling and
simulation and to the wide range of biomolecular problems being
attacked by computational techniques. . . . the text emphasises
that the field is changing very rapidly and that it is full of
exciting discoveries. . . . This book stimulates this excitement,
while still providing students many computational details. . . . It
contains detailed illustrations throughout ... . It should appeal
to beginning graduate students . . . in many scientific departments
... . (Biotech International, Vol. 15 (2), 2003)
3. Tamar Schlick
Molecular Modeling and Simulation: An Interdisciplinary Guide
2nd edition
4. Prof. Tamar Schlick New York University Courant Institute of
Mathematical Sciences and Department of Chemistry 251 Mercer Street
New York, NY 10012 USA [email protected] Editors S.S. Antman
Department of Mathematics and Institute for Physical Science and
Technology University of Maryland College Park, MD 20742, USA
[email protected] J.E. Marsden Control and Dynamical Systems Mail
Code 107-81 California Institute of Technology Pasadena, CA 91125,
USA [email protected] L. Sirovich Department of
Biomathematics Laboratory of Applied Mathematics Mt. Sinai School
of Medicine Box 1012 New York, NY 10029 USA
[email protected] ISSN 0939-6047 ISBN 978-1-4419-6350-5
e-ISBN 978-1-4419-6351-2 DOI 10.1007/978-1-4419-6351-2 Springer New
York Dordrecht Heidelberg London Library of Congress Control
Number: 2010929799 Mathematics Subject Classification (2010): MSC
2010: 62P10, 65C05, 65C10, 65C20, 68U20, 92B05, 92C05, 92C40,
92E10, 97M60 © Springer Science+Business Media, LLC 2010 All rights
reserved. This work may not be translated or copied in whole or in
part without the written permission of the publisher (Springer
Science+Business Media, LLC, 233 Spring Street, New York, NY 10013,
USA), except for brief excerpts in connection with reviews or
scholarly analysis. Use in connection with any form of information
storage and retrieval, electronic adaptation, computer software, or
by similar or dissimilar methodology now known or hereafter
developed is forbidden. The use in this publication of trade names,
trademarks, service marks, and similar terms, even if they are not
identified as such, is not to be taken as an expression of opinion as
to whether or not they are subject to proprietary rights. Printed
on acid-free paper Springer is part of Springer Science+Business
Media (www.springer.com)
5. About the Cover Molecular modelers are artists in some
respects. Their subjects are complex, irregular, multiscaled,
highly dynamic, and sometimes multifarious, with diverse states and
functions. To study these complex phenomena, modelers must apply
computer programs based on precise algorithms that stem from solid
laws and theories from mathematics, physics, and chemistry. Like
innovative chefs, they also borrow their inspiration from other
fields and blend the ingredients and ideas to create appealing
inventive dishes. The West-Coast-inspired landscape paintings of
artist Wayne Thiebaud, whose work Reservoir Study decorated the
cover of the first edition of this book, embodied that productive
blend of nonuniformity with orderliness as well as the multiplicity
in perspectives and interpretations central to molecular modeling.
For this edition, the collage on the back cover (created with
Shereef Elmetwaly) reflects such an amalgam of foundations,
techniques, and applications. The computer salad image on the
front cover (created with Namhee Kim and James Van Arsdale) further
reflects a vision for the near future when modeling and simulation
techniques will be reliable enough to compute folded structures and
other desired aspects of biomolecular structure, motion, and
function. I hope such creative blends will trigger readers' appetite
for more creations to come.
7. To the memory of my beloved aunt Cecilia, who filled my life
with love, joy, beauty, and courage which I will forever carry with
me.
8. Book URLs For Text: www.biomath.nyu.edu/index/book.html For
Course: www.biomath.nyu.edu/index/course/IndexMM.html
9. Preface
As I update parts of this textbook seven years after
the original edition, I find the progress in the field to be
overwhelming, almost unfitting to justify maintaining the same
book. In fact, the sports analogy "Bigger, faster, stronger" seems
most appropriate to the field of biomolecular modeling. Indeed, as
modeling and simulation are used to probe more biological and
chemical processes, with improved force fields and algorithms and
faster computational platforms, new discoveries are being made that
help interpret as well as extend experimental data. To
experimentalists and theoreticians alike, modeling remains a
valuable, albeit challenging, tool for probing numerous
conformational, dynamic, and thermodynamic questions. We can
certainly anticipate more exciting developments in biomolecular
modeling as the first decade of this new century has ended and
another has begun. At the same time, we should be reminded by the wisdom
of the great French mathematician and scientist Pierre Simon de
Laplace, whom I quote more than once in this text, who also said: "Ce
que nous connaissons est peu de chose; ce que nous ignorons est
immense." (What we know is little; what we do not know is immense.)
Besides small additions and revisions made throughout the text and
displayed materials to reflect the latest literature and field
developments, some chapters have undergone more extensive revisions
for this second edition. These include Chapters 1 and 2, which
provide a historical perspective and an overview of current
applications to biomolecular systems; Chapter 4, which reflects
modified protein classification with new protein examples and sequence
statistics; the chapter "Topics in Nucleic Acids" (now expanded
into two chapters, 6 and 7), which includes recent developments in
RNA structure and function; the force field chapters 8–10, which
contain new sections on enhanced sampling methods; Chapter 15,
which includes an update on pharmacogenomics
developments; and Appendices B and C, which list key papers in the
field and reference books, respectively. As in the original book, the
focus is on a broad and critical introduction to the field rather
than a comprehensive view, though some algorithmic topics are
presented in more depth. Many books published since the first
edition was written provide more details on various aspects of
biomolecular modeling and simulation (see Appendix C). I would like
to thank my many lab members and colleagues who have contributed
to this effort by providing scientific and technical information,
making figures, and/or reading various versions of this text,
including Lisa Chase, Rosana Collepardo, Ron Dror, Shereef
Elmetwaly, Meredith Foley, Joachim Frank, Hin Hark Gan, Joe Izzo,
Namhee Kim, Itzhak Krinsky, Christian Laing, Pierre L'Ecuyer, Connie
Lee, Rubisco Li, Michael Overton, Vijay Pande, Ogi Perisic, Giulio
Quarta, Klaus Schulten, Rick Solway, James Van Arsdale, Arieh
Warshel, Michael Watters, Ada Yonath, and Yingkai Zhang. As before,
I invite readers to share their comments and thoughts with me
directly via email; I enjoy reading them all.
Tamar Schlick
New York, NY
March 2, 2010
11. Preface to the 2002 Edition
Science is a way of looking, reverencing. And the purpose of all science, like
living, which amounts to the same thing, is not the accumulation
of gnostic power, the fixing of formulas for the name of God, the
stockpiling of brutal efficiency, accomplishing the sadistic myth of
progress. The purpose of science is to revive and cultivate a
perpetual state of wonder. For nothing deserves wonder so much as
our capacity to experience it. Roald Hoffmann and Shira Leibowitz
Schmidt, in Old Wine, New Flasks: Reflections on Science and Jewish
Tradition (W.H. Freeman, 1997).
Challenges in Teaching Molecular Modeling
This textbook evolved from a graduate course termed
Molecular Modeling, introduced in the fall of 1996 at New York
University. The primary goal of the course is to stimulate
excitement for molecular modeling research, much in the spirit of
Hoffmann and Leibowitz Schmidt above, while providing grounding in
the discipline. Such knowledge is valuable for research dealing
with many practical problems in both the academic and industrial
sectors, from developing treatments for AIDS (via inhibitors to the
protease enzyme of the human immunodeficiency virus, HIV-1) to
designing potatoes that yield spot-free potato chips (via
transgenic potatoes with altered carbohydrate metabolism). In the course
of writing this text, the notes have expanded to function also as
an introduction to the field for scientists in other disciplines by
providing a global perspective into problems and approaches, rather
than a comprehensive survey. As a textbook, my intention is to
provide a framework for teachers rather than a rigid guide, with
material to be supplemented or substituted as appropriate for the
audience. As a reference book, scientists who are interested in
learning about biomolecular modeling may view the book as a broad
introduction to an exciting new field with a host of challenging,
interdisciplinary problems. The intended audience for the course is
beginning graduate students in medical schools and in all scientific
departments: biology, chemistry, physics, mathematics, computer
science, and others. This interdisciplinary audience presents a
special challenge: it requires a broad presentation of the field but
also good coverage of specialized topics to keep experts
interested. Ideally, a good grounding in basic biochemistry,
chemical physics, statistical and quantum mechanics, scientific
computing (i.e., numerical methods), and programming techniques is
desired. The rarity of such a background required me to offer
tutorials in both biological and mathematical areas.
12. The introductory chapters on biomolecular
structure are included in this book (after much thought) and are
likely to be of interest to physical and mathematical scientists.
Chapters 3 and 4 on proteins, together with Chapters 5 and 6 on
nucleic acids, are thus highly abbreviated versions of what can be
found in numerous texts specializing in these subjects. The
selections in these tutorials also reflect some of my group's areas of
interest. Because many introductory and up-to-date texts exist for
protein structure, only the basics in protein structure are
provided, while a somewhat more expanded treatment is devoted to
nucleic acids. Similarly, the introductory material on mathematical
subjects such as basic optimization theory (Chapter 10) and
random number generators (Chapter 11) is likely to be of use more
to readers in the biological and chemical disciplines. General
readers, as well as course instructors, can skip around this book
as appropriate and fill in necessary gaps through other texts (e.g.,
in protein structure or programming techniques).
Text Limitations
By construction, this book is very broad in scope and thus no
subjects are covered in great depth. References to the literature
are only representative. The material presented is necessarily
selective, unbalanced in parts, and reflects some of my areas of
interest and expertise. This text should thus be viewed as an
attempt to introduce the discipline of molecular modeling to
students and to scientists from disparate fields, and should be taken
together with other related texts, such as those listed in Appendix
C, and the representative references cited. The book format is
somewhat unusual for a textbook in that it is nonlinear in parts.
For example, protein folding is introduced early (before protein
basics are discussed) to illustrate challenging problems in the field
and to interest more advanced readers; the introduction to
molecular dynamics incorporates illustrations that require more
advanced techniques for analysis; some specialized topics are also
included throughout. For this reason, I recommend that students
re-read certain parts of the book (e.g., the first two chapters) after
covering others (e.g., the biomolecular tutorial chapters). Still,
I hope most of all to grab the reader's attention with exciting and
current topics. Given the many caveats of introducing and teaching
such a broad and interdisciplinary subject as molecular modeling,
the book aims to introduce selected biomolecular modeling and
simulation techniques, as well as the wide range of biomolecular
problems being tackled with these methods. Throughout these
presentations, the central goal is to develop in students a good
understanding of the inherent approximations and errors in the field
so that they can adequately assess modeling results. Diligent
students should emerge with basic knowledge in modeling and
simulation techniques, an appreciation of the fundamental
problems (such as force field approximations, nonbonded evaluation
protocols, and size and timestep limitations in simulations), and a
healthy critical eye for research. A historical perspective and a
discussion of future challenges are also offered.
13. Dazzling Modeling Advances Demand Perspective
The topics I chose for this course are based on my own unorthodox
introduction to the field of modeling. As an applied mathematician,
I became interested in the field during my graduate work, hearing
from Professor Suse Broyde (whose path I crossed thanks to Courant
Professor Michael Overton) about the fascinating problem of modeling
carcinogen/DNA adducts. The goal was to understand some structural
effects induced by certain compounds on the DNA (deduced by
energy minimization); such alterations can render DNA more
sensitive to replication errors, which in turn can eventually lead
to mutagenesis and carcinogenesis. I had to roam through many
references to obtain a grasp of some of the underlying concepts
involving force fields and simulation protocols, so many of which
seemed so approximate and not fully physically grounded. By now,
however, I have learned to appreciate the practical procedures and
compromises that computational chemists have formulated out of
sheer necessity to obtain answers and insights into important
biological processes that cannot be tackled by instrumentation. In
fact, approximations and simplifications are not only tolerated
when dealing with biomolecules; they often lead to insights that
cannot easily be obtained from more detailed representations.
Furthermore, it is often the neglect of certain factors that teaches
us their importance, sometimes in subtle ways. For example, when
Suse Broyde and I viewed in the mid-1980s her intriguing
carcinogen/modified DNA models, we used a large Evans and Sutherland
computer while wearing special stereoviewers; the hard-copy
drawings were ball and stick models, though the dimensionality
projected out nicely in black and white. (Today, we still use
stereo glasses, but current hardware stereo capabilities are much
better, and marvelous molecular renderings are available.) At that
time, only small pieces of DNA could be modeled, and the
surrounding salt and solvent environment was approximated. Still,
structural and functional insights arose from those earlier works,
many of which were validated later by more comprehensive
computation, as well as laboratory experiments.
Book Overview
The
book provides an overview of three broad topics: (a) biomolecular
structure and modeling: current problems and state of
computations (Chapters 1–6); (b) molecular mechanics: force field
origin, composition, and evaluation techniques (Chapters 7–9); and
(c) simulation techniques: conformational sampling by geometry
optimization, Monte Carlo, and molecular dynamics approaches
(Chapters 10–13). Chapter 14 on the similarity and diversity
problems in chemical design introduces some of the challenges in
the growing field related to combinatorial chemistry. Specifically,
Chapters 1 and 2 give a historical perspective of biomolecular
modeling, outlining progress in experimental techniques, the
current computational challenges, and the practical
applications of this enterprise to convey the immense interest in,
and support of, the discipline. Since these chapters discuss
rapidly changing subjects (e.g., genome projects, disease
treatments), they will be updated as possible on the text website.
General readers may find these chapters useful as an introduction to
biomolecular modeling and its applications. Chapters 3 and 4 review
the basic elements in protein structure, and Chapter 5 similarly
presents the basic building blocks and conformational flexibility in
nucleic acids. Chapter 6 presents additional topics in nucleic
acids, such as DNA sequence effects, DNA/protein interactions,
departures from the canonical DNA helix forms, RNA structure, and
DNA supercoiling. The second part of the book begins in Chapter 7
with a view of the discipline of molecular mechanics as an
offspring of quantum mechanics and discusses the basic premises of
molecular mechanics formulations. A detailed presentation of the
force field terms (origin, variation, and parameterization) is given in
Chapter 8. Chapter 9 is then devoted to the computation of the
nonbonded energy terms, including cutoff techniques, Ewald and
multipole schemes, and continuum solvation alternatives. The third
part of the book, simulation algorithms (see footnote 1), begins with a description
of optimization methods for multivariate functions in Chapter 10,
emphasizing the tradeoff between algorithm complexity and
performance. Basic issues of Monte Carlo techniques, appropriate to
a motivated novice, are detailed in Chapter 11, such as
pseudorandom number generators, Gaussian random variates, Monte
Carlo sampling, and the Metropolis algorithm. Chapters 12 and 13
describe the algorithmic challenges in biomolecular dynamics
simulations and present various categories of integration
techniques, from the popular Verlet algorithm to multiple-timestep
techniques and Brownian dynamics protocols. Chapter 14 outlines
the challenges in similarity and diversity sampling in the field of
chemical design, related to the new field of combinatorial chemistry.
The book appendices complement the material in the main text
through homework assignments, reading lists, and other information
useful for teaching molecular modeling. Instructors may find the
sample course syllabus in Appendix A helpful. Important also to
teaching is an introduction to the original literature; a
representative reading list of articles used for the course is
collected in Appendix B. An annotated general reference list is
given in Appendix C. Selected biophysics applications are
highlighted through the homework assignments (Appendix D). Humor
in the assignments stimulates creativity in many students. These
homeworks are a central component of learning molecular
modeling, as they provide hands-on experience,
extend upon subjects covered in the chapters, and expose the
students to a wide range of current topics in biomolecular
structure. Advanced students may use these homework assignments to
learn about molecular modeling through independent research.
Footnote 1: The word algorithm is named after the ninth-century Persian (Iranian in
present-day terminology) mathematician al-Khwarizmi (nicknamed
after his home town of Khwarizm, now Khiva in the Uzbek Republic),
who stressed the importance of methodical procedures for solving
problems in his algebra textbook. The term has evolved to mean the
systematic process of solving problems by machine execution.
Many
homework assignments involve a molecular modeling software package.
I selected the Insight program in conjunction with our Silicon
Graphics computer laboratory, but other suitable modeling programs
can be used. Students also learn other basic research tools (such
as programming and literature searches) through the homeworks. Our
memorable force field debate (see homework 7 in Appendix D) even
brought the AMBER team to class in white lab coats, each accented
with a name tag corresponding to one of AMBER's original authors.
The late Peter Kollman would have been pleased. Harold Scheraga
would have been no less impressed by the long list of ECEPP
successes prepared by his loyal troopers. Martin Karplus would not
have been disappointed by the strong proponents of the CHARMM
approach. I only hope to have as much spunk and talent in my future
molecular modeling classes. Extensive use of web resources is
encouraged, while keeping in mind the caveat of lack of general
quality control. I was amazed to find some of my students' discoveries
regarding interesting molecular modeling topics mentioned in the
classroom, especially in the context of the term project, which
requires them to find outstanding examples of the successes and/or
failures of molecular modeling. Interested readers might also want
to glance at additional course information as part of my group's
home page, monod.biomath.nyu.edu/. Supplementary text information
(such as program codes and figure files) can also be obtained. To
future teachers of molecular modeling who plan to design similar
assignments and material, I share with you my following
experience regarding student reactions to this discipline: what
excited students the most about the subject matter and led to
enthusiasm and excellent feedback in the classroom were the rapid
pace at which the field is developing, its exciting discoveries, and
the medical and technological breakthroughs made possible by
important findings in the field. In more practical terms, a mathematics
graduate student, Brynja Kohler, expressed this enthusiasm
succinctly in the introduction to her term project: "As I was doing
research for this assignment, I found that one interesting
article led to another. Communication via e-mail with some
researchers around the world about their current investigations
made me eagerly anticipate new results. The more I learned the more
easy it became to put off writing a final draft because my curiosity
would lead me on yet another line of inquiry. However, alas, there
comes a time when even the greatest procrastinator must face the
music, and evaluate what it is that we know and not linger upon
what we hope to find out." Future teachers are thus likely to have an
enjoyable experience with any good group of students.
16. Acknowledgments
I am indebted to Jing Huang
for her devoted assistance with the manuscript preparation, file
backups, data collection, and figure design. I also thank Wei Xu and
Mulin Ding for important technical assistance. I am grateful to my
other devoted current and former group members who helped read book
segments, collect data, prepare the figures found throughout this
book, and run to libraries throughout New York City often: Karunesh
Arora, Danny Barash, Paul Batcho, Dan Beard, Mulin Ding, Hin Hark
Gan, Jennifer Isbell, Joyce Noah, Xiaoliang Qian, Sonia Rivera,
Adrian Sandu, Dan Strahs, Dexuan Xie, Linjing Yang, and Qing Zhang.
Credits for each book figure and table are listed on the text's
website. I thank my colleagues Ruben Abagyan, Helen Berman, Dave
Case, Jonathan Goodman, Andrej Sali, and Harold Scheraga, who gave
excellent guest lectures in the course; and my course assistants
Karunesh Arora, Margaret Mandziuk, Qing Zhang, and Zhongwei Zhu for
their patient, dedicated assistance to the students with their
homework and queries. I am also very appreciative of the following
colleagues for sharing reprints, information, and unpublished data
and/or for their willingness to comment on segments of the book:
Lou Allinger, Nathan Baker, Mike Beer, Helen Berman, Suse Broyde,
John Board, Dave Beveridge, Ken Breslauer, Steve Burley, Dave Case,
Philippe Derreumaux, Ron Elber, Eugene Fluder, Leslie Greengard,
Steve Harvey, Jan Hermans, the late Peter Kollman, Robert Krasny,
Michael Levitt, Xiang-Jun Lu, Pierre L'Ecuyer, Neocles Leontis, the
late Shneior Lifson, Kenny Lipkowitz, Jerry Manning, Andy McCammon,
Mihaly Mezei, Jorge Nocedal, Wilma Olson, Michael Overton, Vijay
Pande, Dinshaw Patel, Harold Scheraga, Shulamith Schlick, Klaus
Schulten, Suresh Singh, Bob Skeel, A.R. Srinivasan, Emad
Tajkhorshid, Yuri Ushkaryov, Wilfred van Gunsteren, Arieh Warshel,
Eric Westhof, Weitao Yang, and Darren York. Of special note are the
extremely thorough critiques which I received from Lou Allinger,
Steve Harvey, Jerry Manning, Robert Krasny, Wilma Olson, and Bob
Skeel; their extensive comments and suggestions led to
enlightening discussions and helped me see the field from many
perspectives. I thank my colleague and friend Suse Broyde for
introducing me to the field and for reading nearly every page of this
book's draft. To my family (parents Haim and Shula, sisters Yael and
Daphne, aunt Cecilia, and especially Rick and Duboni), I am grateful
for tolerating my long months on this project. Finally, I thank my
excellent students for making the course enjoyable and inspiring.
Tamar Schlick
New York, NY
June 10, 2002
17. Prelude
Every sentence I utter must be understood not as an
affirmation but as a question. Niels Bohr (1885–1962).
Only rarely
does science undergo a dramatic transformation that can be likened
to a tectonic rumble, as its character is transfigured under the
weights of changing forces. We are now in such an exciting time.
The discovery of the DNA double helix in the early 1950s prefigured
the rise of molecular biology and its many offspring in the next
half century, just as the rise of Internet technology in the 1980s
and 1990s has molded, and is still reshaping, nearly every aspect
of contemporary life. With completion of the first draft of the human
genome sequence trumpeting the beginning of the twenty-first century,
triumphs in the biological sciences are competing with geopolitics
and the economy for prominent newspaper headlines. The genomic
sciences now occupy the center stage, linking basic to applied
(medical) research, applied research to commercial success and
economic growth, and the biological sciences to the chemical,
physical, mathematical and computer sciences. The subject of this
text, molecular modeling, represents a subfield of this successful
marriage. In this text, I attempt to draw to the field newcomers from
other disciplines and to share basic knowledge in a modern context
and interdisciplinary perspective. Though many details on current
investigations and projects will undoubtedly become obsolete as
soon as this book goes to press, the basic foundations of
modeling will remain similar. Over the next decades, we will surely
witness a rapid growth in the field of molecular modeling, as well as
many success stories in its application.
18. Contents
About the Cover
Book URLs
Preface
Prelude
Table of Contents
List of Figures
List of Tables
Acronyms, Abbreviations, and Units
1 Biomolecular Structure and Modeling: Historical Perspective
1.1 A Multidisciplinary Enterprise
1.1.1 Consilience
1.1.2 What is Molecular Modeling?
1.1.3 Need For Critical Assessment
1.1.4 Text Overview
1.2 The Roots of Molecular Modeling in Molecular Mechanics
1.2.1 The Theoretical Pioneers
1.2.2 Biomolecular Simulation Perspective
37. Acronyms, Abbreviations, and Units

A
A: adenine (purine nitrogenous base)
Å: angstrom (10^-10 m)
AdMLP: adenovirus major late promoter (protein)
AIDS: acquired immune deficiency syndrome
Ala (A): alanine
Arg (R): arginine
Asn (N): asparagine
Asp (D): aspartic acid
AS: Altona/Sundaralingam (sugar description)
ATP: adenosine triphosphate (energy source)
AZT: zidovudine (AIDS drug)

B
bp: base pair
bps: base pairs
BAC: bacterial artificial chromosome
BOES: Born-Oppenheimer energy surfaces
BPTI: bovine pancreatic trypsin inhibitor
BSE: bovine spongiform encephalopathy (mad cow disease)

C
cm: centimeter (10^-2 m)
C: cytosine (pyrimidine nitrogenous base)
CAP: catabolite gene activator protein
CASP: Critical Assessment of Techniques for Protein Structure Prediction
CG: Conjugate gradient method (for minimization)
CJD: Creutzfeldt-Jakob disease (brain disorder, human version of BSE)
CN: Crigler-Najjar (debilitating disease, gene therapy applications)
CP: Cremer/Pople (sugar description)
CPU: central processing units
Cys (C): cysteine

D
DFT: density functional theory (quantum mechanics approach)
DH: Debye-Hückel
DNA: deoxyribonucleic acid (also A-, B-, C-, D-, P-, S-, T-, and Z-DNA)
DOE: Department of Energy

E
erg: energy unit (10^-7 J)
EM: electron microscopy

F
fs: femtosecond (10^-15 s)
FFT: Fast Fourier Transforms

G
G: guanine (purine nitrogenous base)
Gln (Q): glutamine
Glu (E): glutamic acid
Gly (G): glycine
GSS: Gerstmann-Sträussler-Scheinker disease (brain disorder similar to CJD)

H
HDV: hepatitis delta helper virus
His (H): histidine
HIV: human immunodeficiency virus
HMC: hybrid Monte Carlo
HTH: helix/turn/helix (motif)
Hz: hertz (inverse second)

I
Ile (I): isoleucine
IHF: integration host factor (protein)
K
kbp: kilobase pairs
kcal/mol: kilocalories per mole (energy unit)
kDa: kilodaltons (mass unit used for proteins)
KR: Kirkwood-Riseman

L
Leu (L): leucine
Lys (K): lysine
LCG: linear congruential generator

M
m: meter
mgr: minor groove
ms: millisecond (10^-3 s)
μs: microsecond (10^-6 s)
mm: millimeter (10^-3 m)
MAD: multiwavelength anomalous diffraction (crystallography technique)
MC: Monte Carlo
MD: molecular dynamics
Met (M): methionine
Mgr: major groove
MIR: multiple isomorphous replacement (crystallography technique)
MLCG: multiplicative linear congruential generator
MTS: multiple-timestep methods (for MD)

N
nm: nanometer (10^-9 m)
ns: nanosecond (10^-9 s)
NCBI: National Center for Biotechnology Information
NASA: National Aeronautics and Space Administration
NDB: nucleic acid database
NIH: National Institutes of Health
NMR: nuclear magnetic resonance
NSF: National Science Foundation

O
OTC: ornithine transcarbamylase (chronic ailment, gene therapy applications)

P
pN: piconewton (force unit)
ps: picosecond (10^-12 s)
PB: Poisson-Boltzmann
PBE: Poisson-Boltzmann equation
PC: principal component
PCA: principal component analysis
PCR: polymerase chain reaction
PDB: protein databank
Phe (F): phenylalanine
PIR: Protein Information Resource
PME: particle-mesh Ewald
PNA: peptide nucleic acid (DNA mimic)
Pro (P): proline
PrP^C: prion protein cellular (harmless)
PrP^Sc: harmful isoform of PrP^C, causes scrapie in sheep
Pur: purine (base)
Pyr: pyrimidine (base)

Q
QM: quantum mechanics
QN: quasi Newton method (for minimization)
QSAR: quantitative structure/activity relationships

R
RCSB: Research Collaboratory for Structural Bioinformatics
RMS (rms): root-mean-square
RMSD: root-mean-square deviations
RNA: ribonucleic acid (also cRNA, gRNA, mRNA, rRNA, snRNA, tRNA)
RT: reverse transcriptase (AIDS protein)

S
s: second
Ser (S): serine
SAR: structure/activity relationships
SCF: self-consistent field (quantum mechanical approach)
SCOP: structural classification of proteins
SD: steepest descent method (for minimization)
SGI: Silicon Graphics Inc.
SNPs: single-nucleotide polymorphisms (snips)
SRY: sex determining region Y (protein)
STS: single-timestep methods (for MD)
SVD: singular value decomposition
T
T: thymine (pyrimidine nitrogenous base)
Thr (T): threonine
Trp (W): tryptophan
Tyr (Y): tyrosine
TBP: TATA-box DNA binding protein (transcription regulator)
TE: transcription efficiency
TMD: targeted molecular dynamics
TN: truncated Newton method (for minimization)
2D: two-dimensional
3D: three-dimensional

U
U: uracil (pyrimidine nitrogenous base)
URL: uniform resource locator
UV: ultraviolet spectroscopy

V
Val (V): valine

W
WC: Watson/Crick base pairing
42. 1 Biomolecular Structure and Modeling: Historical Perspective

T. Schlick, Molecular Modeling and Simulation: An Interdisciplinary
Guide, Interdisciplinary Applied Mathematics 21, DOI
10.1007/978-1-4419-6351-2_1, © Springer Science+Business Media, LLC
2010

Chapter 1 Notation

SYMBOL: DEFINITION
Vectors
h: unit cell identifier (crystallography)
r: position
F_h: structure factor (crystallography)
φ_h: phase angle (crystallography)
Scalars
d: distance between parallel planes in the crystal
I_h: intensity, magnitude of structure factor (crystallography)
V: cell volume (crystallography)
θ: reflection angle (crystallography)
λ: wavelength of the X-ray beam (crystallography)

. . . physics, chemistry, and biology have been connected by a web
of causal explanation organized by induction-based theories that
telescope into one another. . . . Thus, quantum theory underlies
atomic physics, which is the foundation of reagent chemistry and its
specialized offshoot biochemistry, which interlock with molecular
biology (essentially, the chemistry of organic macromolecules) and
hence, through successively higher levels of organization, cellular,
organismic, and evolutionary biology. . . . Such is the
unifying and highly productive understanding of the world that has
evolved in the natural sciences.
Edward O. Wilson: "Resuming the Enlightenment Quest", in The Wilson
Quarterly, Winter 1998.

1.1 A Multidisciplinary Enterprise

1.1.1 Consilience

The exciting field of modeling molecular systems by computer has
been steadily drawing increasing attention from scientists in varied
disciplines. In particular, modeling large biological polymers
(proteins, nucleic acids, and lipids) is a truly multidisciplinary
enterprise. Biologists describe the cellular picture; chemists fill
in the atomic and molecular details; physicists extend these views
to the electronic level and the underlying forces; mathematicians
analyze and formulate appropriate numerical models and algorithms;
and computer scientists and engineers provide the crucial
implementational support for running large computer programs on
high-speed and extended-communication platforms. The many names for
the field (and related disciplines) underscore its cross-disciplinary
nature: computational biology, computational chemistry, in silico
biology, computational structural biology, computational
biophysics, theoretical biophysics, theoretical chemistry, and the
list goes on.

As the pioneer of sociobiology Edward O. Wilson
reflects in the opening quote, some scholars believe in a unifying
knowledge for understanding our universe and ourselves, or
consilience¹, that merges all disciplines in a biologically-grounded
framework [1377]. Though this link is most striking between
genetics and human behavior (through the neurobiological
underpinnings of states of mind and mental activity, with shaping
by the environment and lifestyle factors), such a unification that
Wilson advocates might only be achieved by a close interaction
among the varied scientists at many stages of study. The genomic
era has such immense ramifications on every aspect of our lives,
from health to technology to law, that it is not difficult to
appreciate the effects of the biomolecular revolution on our
21st-century society. Undoubtedly, a more integrated synthesis of
biological
elements is needed to decode life [584].

¹ Consilience was coined in 1840 by the theologian and polymath
William Whewell in his synthesis The Philosophy of the Inductive
Sciences. It literally means the alignment, or jumping together, of
knowledge from different disciplines. The sociobiologist Edward O.
Wilson took this notion further recently by advocating in his
1998 book Consilience [1377] that the world is orderly and can be
explained by a set of natural laws that are fundamentally rooted in
biology.

In biomolecular modeling, a multidisciplinary approach is important
not only because of the many aspects involved, from problem
formulation to solution, but also since the best computational
approach is often closely tailored to the biological problem. In the
same spirit, close connections between theory and experiment are
essential: computational models evolve as experimental data become
available, and biological theories are formulated and new
experiments are performed as a result of computational insights.²
Although few theoreticians in the field have expertise in
experimental work as
well, the classic example of Werner Heisenberg's genius in
theoretical physics but naïveté in experimental physics is a case
in point: Heisenberg required the resolving power of the microscope
to derive the uncertainty relations. In fact, an error in the
experimental interpretations was pointed out by Niels Bohr, and
this eventually led to the Copenhagen interpretation of quantum
mechanics.

If Wilson's vision is correct, the interlocking web of
scientific fields rooted in the biological sciences will succeed
ultimately in explaining not only the functioning of a
biomolecule and the workings of the brain, but also many aspects of
modern society, through the connections between our biological
makeup and human behavior.

1.1.2 What is Molecular Modeling?

Molecular modeling is the science and art of studying molecular
structure and function through model building and computation. The
model building can be as simple as plastic templates or metal rods,
or as sophisticated as interactive, animated color stereographics
and laser-made wooden sculptures. The computations encompass ab
initio and semi-empirical quantum mechanics, empirical (molecular)
mechanics, molecular dynamics, Monte Carlo, free energy and
solvation methods, structure/activity relationships (SAR),
chemical/biochemical information and databases, and many other
established procedures. The refinement of experimental data, such as
from nuclear magnetic resonance (NMR) or X-ray crystallography, is
also a component of biomolecular modeling.

I often remind my
students of Pablo Picasso's statement on art: "Art is the lie that
helps tell the truth." This view applies aptly to biomolecular
modeling. Though our models represent a highly simplified version of
the complex cellular environment, systematic studies based on
tractable quantitative tools can help discern patterns and add
insights that are otherwise difficult to observe. The key in modeling
is to develop and apply models that are appropriate for the
questions being examined with them. Thus, the model's regime of
applicability must be clearly defined and its predictive power
demonstrated. A case in point is the use of limited historical data
on home prices for extrapolative modeling of mortgage-backed
securities and credit derivatives; the resulting mispricing of risk
was a contributor to the U.S. subprime loan crisis that started in
2007.

² See [176,362,395,396,948], for example, in connection to the
characterization of protein folding mechanisms.

The questions being addressed by computational approaches
today are as intriguing and as complex as the biological systems
themselves. They range from understanding the equilibrium structure
of a small
biopolymer subunit, to the energetics of hydrogen-bond formation in
proteins and nucleic acids, to the kinetics of protein folding, to
the complex functioning of a supramolecular aggregate. As
experimental triumphs are being reported in structure
determination, from ion channel proteins, signaling receptor
proteins (receptors), membrane transport proteins (transporters),
ribosomes (see Figs. 1.1 and 1.2), various nucleosomes (see figures
in Chapter 6), and non-coding RNAs, including new methodologies for
their solution, such as advanced NMR, cryo-electron microscopy, and
single-molecule biochemistry techniques, modeling approaches are
needed to pursue many fundamental questions concerning their
biological motions and functions. Modeling provides a way to
systematically explore structural/dynamical/thermodynamic patterns,
test and develop hypotheses, interpret and extend experimental
data, and help better understand and extend basic laws that govern
molecular structure, flexibility, and function. In tandem with
experimental advances, algorithmic and computer technological
advances, especially concerning distributed, loosely-coupled
computer networks, have made problems and approaches that were
insurmountable a few years ago now possible.

Figure 1.1. The inter-subunit interface of the two eubacterial
ribosomal subunits at 3 Å resolution, showing their main
architectural features. D50S is the large ribosomal subunit from
Deinococcus radiodurans [516], and T30S is the small ribosomal
subunit from Thermus thermophilus [1135], showing the head,
platform, shoulder and latch (H, P, S, L, respectively). The cyan
dots indicate the approximate mRNA channel; A, P, and E are the
approximate positions of the anti-codon loops (on T30S) and the
edges of the tRNAs' acceptor stems (on D50S) of the three tRNA
substrates: aminoacylated-tRNA (A), peptidyl-tRNA (P), and Exit
tRNA (E). Image was kindly provided by Ada Yonath.

1.1.3 Need For Critical Assessment

The field of biomolecular modeling is relatively young, having
started in the 1960s and gained momentum only since the mid-1980s
with the advent of supercomputers. Yet the field is developing with
astonishing speed. Advances are driven by improvements in
instrumentational resolution and genomic and structural databases,
as well as in force fields, algorithms for conformational sampling
and molecular dynamics, computer graphics, and the increased
computer power and memory capabilities. These impressive
technological and modeling advances are steadily establishing the
field of theoretical modeling as a partner to experiment and a
widely used tool for research and development.

[Figure 1.2 labels: Small Subunit; Large Subunit; P-site transfer
RNA; A/T-site transfer RNA; Elongation factor Tu]
Figure 1.2. Cryo-EM view of the 70S ribosome particle solved by
J. Frank's group at 6.7 Å resolution in a complex with the
EF-Tu-aa-tRNA ternary complex, GDP, and the antibiotic kirromycin
[710]. Images were kindly provided by Michael Watters and Joachim
Frank.

Yet as we witness the tantalizing
progress, a cautious use of molecular modeling tools as well as
a critical perspective of the field's strengths and limitations are
warranted. This is because the current generation of users and
application scientists in the industrial and academic sectors may
not be familiar with some of the caveats and inherent
approximations in biomolecular modeling and simulation approaches
that the field pioneers clearly recognized. Indeed, the tools and
programs developed by a handful of researchers several decades ago
have now resulted in extensive profit-making software for genomic
information, drug design, and every aspect of modeling. More than
ever, a comprehensive background in the methodology framework is
necessary for sound studies in the exciting era of computational
biophysics that lies on the horizon.