Interdisciplinary Applied Mathematics

Volume 21

Editors S.S. Antman J.E. Marsden L. Sirovich S. Wiggins

Geophysics and Planetary Sciences

Mathematical Biology L. Glass, J.D. Murray

Mechanics and Materials R.V. Kohn

Systems and Control S.S. Sastry, P.S. Krishnaprasad

Problems in engineering, computational science, and the physical and biological sciences are using increasingly sophisticated mathematical techniques. Thus, the bridge between the mathematical sciences and other disciplines is heavily traveled. The correspondingly increased dialog between the disciplines has led to the establishment of the series: Interdisciplinary Applied Mathematics.

The purpose of this series is to meet the current and future needs for the interaction between various science and technology areas on the one hand and mathematics on the other. This is done, firstly, by encouraging the ways that mathematics may be applied in traditional areas, as well as pointing towards new and innovative areas of application; and, secondly, by encouraging other scientific disciplines to engage in a dialog with mathematicians outlining their problems to both access new methods and suggest innovative developments within mathematics itself.

The series will consist of monographs and high-level texts from researchers working on the interplay between mathematics and other fields of science and technology.

Interdisciplinary Applied Mathematics Volumes published are listed at the end of this book.

Springer Science+Business Media, LLC

Tamar Schlick

Molecular Modeling and Simulation: An Interdisciplinary Guide

With 147 Full-Color Illustrations

Springer

Tamar Schlick Department of Mathematics and Chemistry Courant Institute of Mathematical Sciences New York University New York, NY 10012 USA [email protected]

Editors J.E. Marsden Control and Dynamical Systems Mail Code 107-81 California Institute of Technology Pasadena, CA 91125 USA marsden@cds.caltech.edu S. Wiggins School of Mathematics University of Bristol Bristol, BS8 1TW United Kingdom [email protected]

L. Sirovich Division of Applied Mathematics Brown University Providence, RI 02912 USA chico@camelot.mssm.edu

S.S. Antman Department of Mathematics and Institute for Physical Science and Technology University of Maryland College Park, MD 20742-4015 USA [email protected]

Cover illustration: © Wayne Thiebaud/Licensed by VAGA, New York, NY. Courtesy of the Allan Stone Gallery, NYC.

Mathematics Subject Classification (2000): 92-01, 92Exx, 92C40

Library of Congress Cataloging-in-Publication Data Schlick, Tamar.

Molecular modeling and simulation: an interdisciplinary guide / Tamar Schlick. p. cm. - (Interdisciplinary applied mathematics ; 21)

Includes bibliographical references and index. ISBN 978-1-4757-5893-1 ISBN 978-0-387-22464-0 (eBook) DOI 10.1007/978-0-387-22464-0

1. Biomolecules-Models. 2. Biomolecules-Models-Computer simulation. I. Title. II. Interdisciplinary applied mathematics ; v. 21. QD480 .S37 2002 572'.33'015118-dc21 2002016003

ISBN 978-1-4757-5893-1 Printed on acid-free paper.

© 2002 Springer Science+Business Media New York Originally published by Springer-Verlag New York, Inc. in 2002 Softcover reprint of the hardcover 1st edition 2002

All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher Springer Science+Business Media, LLC, except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden. The use in this publication of trade names, trademarks, service marks and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights.

9 8 7 6 5 4 3 2

Typesetting: Pages created by author using a Springer TeX macro package.

www.springer-ny.com

About the Cover

Molecular modelers are artists in some respects. Their subjects are complex, irregular, multiscaled, highly dynamic, and sometimes multifarious, with diverse states and functions. To study these complex phenomena, modelers must apply computer programs based on precise algorithms that stem from solid laws and theories from mathematics, physics, and chemistry.

Many of Wayne Thiebaud's landscape paintings, like Reservoir Study shown on the cover, embody this productive blend of nonuniformity with orderliness. Thiebaud's body of water - organic, curvy, and multilayered (reminiscent of a cell) - is surrounded by apparently ordered fields and land sections. Upon close inspection, the multiplicity in perspectives and interpretations emerges. This artwork thus mirrors the challenging cross-disciplinary interplay, as well as the blend of science and art, central to biomolecular modeling.

To my grandparents - Fanny and Iancu Iosupovici, Lucy and Charles Schlick - whose love and courage I carry forever.

Book URLs

For Text: monod.biomath.nyu.edu/index/book.html

For Course: monod.biomath.nyu.edu/index/course/lndexMM.html

Preface

Science is a way of looking, reverencing. And the purpose of all science, like living, which amounts to the same thing, is not the accumulation of gnostic power, the fixing of formulas for the name of God, the stockpiling of brutal efficiency, accomplishing the sadistic myth of progress. The purpose of science is to revive and cultivate a perpetual state of wonder. For nothing deserves wonder so much as our capacity to experience it.

Roald Hoffmann and Shira Leibowitz Schmidt, in Old Wine, New Flasks: Reflections on Science and Jewish Tradition (W.H. Freeman, 1997).

Challenges in Teaching Molecular Modeling

This textbook evolved from a graduate course termed Molecular Modeling introduced in the fall of 1996 at New York University. The primary goal of the course is to stimulate excitement for molecular modeling research - much in the spirit of Hoffmann and Leibowitz Schmidt above - while providing grounding in the discipline. Such knowledge is valuable for research dealing with many practical problems in both the academic and industrial sectors, from developing treatments for AIDS (via inhibitors of the protease enzyme of the human immunodeficiency virus, HIV-1) to designing potatoes that yield spot-free potato chips (via transgenic potatoes with altered carbohydrate metabolism). In the course of writing this text, the notes have expanded to function also as an introduction to the field for scientists in other disciplines by providing a global perspective into problems and approaches, rather than a comprehensive survey.

As a textbook, this book is intended to provide a framework for teachers rather than a rigid guide, with material to be supplemented or substituted as appropriate for the audience. As a reference, it may serve scientists who are interested in learning about biomolecular modeling as a broad introduction to an exciting new field with a host of challenging, interdisciplinary problems.

The intended audience for the course is beginning graduate students in medical schools and in all scientific departments: biology, chemistry, physics, mathematics, computer science, and others. This interdisciplinary audience presents a special challenge: it requires a broad presentation of the field but also good coverage of specialized topics to keep experts interested. Ideally, a good grounding in basic biochemistry, chemical physics, statistical and quantum mechanics, scientific computing (i.e., numerical methods), and programming techniques is desired. The rarity of such a background required me to offer tutorials in both biological and mathematical areas.

The introductory chapters on biomolecular structure are included in this book (after much thought) and are likely to be of interest to physical and mathematical scientists. Chapters 3 and 4 on proteins, together with Chapters 5 and 6 on nucleic acids, are thus highly abbreviated versions of what can be found in numerous texts specializing in these subjects. The selections in these tutorials also reflect some of my group's areas of interest. Because many introductory and up-to-date texts exist for protein structure, only the basics in protein structure are provided, while a somewhat more expanded treatment is devoted to nucleic acids.

Similarly, the introductory material on mathematical subjects such as basic optimization theory (Chapter 10) and random number generators (Chapter 11) is likely to be of more use to readers in the biological/chemical disciplines. General readers, as well as course instructors, can skip around this book as appropriate and fill in necessary gaps through other texts (e.g., in protein structure or programming techniques).

Text Limitations

By construction, this book is very broad in scope and thus no subjects are covered in great depth. References to the literature are only representative. The material presented is necessarily selective, unbalanced in parts, and reflects some of my areas of interest and expertise. This text should thus be viewed as an attempt to introduce the discipline of molecular modeling to students and to scientists from disparate fields, and should be taken together with other related texts, such as those listed in Appendix C, and the representative references cited.

The book format is somewhat unusual for a textbook in that it is nonlinear in parts. For example, protein folding is introduced early (before protein basics are discussed) to illustrate challenging problems in the field and to interest more advanced readers; the introduction to molecular dynamics incorporates illustrations that require more advanced techniques for analysis; some specialized topics are also included throughout. For this reason, I recommend that students reread certain parts of the book (e.g., the first two chapters) after covering others (e.g., the biomolecular tutorial chapters). Still, I hope most of all to grab the reader's attention with exciting and current topics.

Given the many caveats of introducing and teaching such a broad and interdisciplinary subject as molecular modeling, the book aims to introduce selected biomolecular modeling and simulation techniques, as well as the wide range of biomolecular problems being tackled with these methods. Throughout these presentations, the central goal is to develop in students a good understanding of the inherent approximations and errors in the field so that they can adequately assess modeling results. Diligent students should emerge with basic knowledge in modeling and simulation techniques, an appreciation of the fundamental problems - such as force field approximations, nonbonded evaluation protocols, size and timestep limitations in simulations - and a healthy critical eye for research. A historical perspective and a discussion of future challenges are also offered.

Dazzling Modeling Advances Demand Perspective

The topics I chose for this course are based on my own unorthodox introduction to the field of modeling. As an applied mathematician, I became interested in the field during my graduate work, hearing from Professor Suse Broyde - whose path I crossed thanks to Courant Professor Michael Overton - about the fascinating problem of modeling carcinogen/DNA adducts.

The goal was to understand some structural effects induced by certain compounds on the DNA (deduced by energy minimization); such alterations can render DNA more sensitive to replication errors, which in turn can eventually lead to mutagenesis and carcinogenesis. I had to roam through many references to obtain a grasp of some of the underlying concepts involving force fields and simulation protocols, many of which seemed highly approximate and not fully grounded in physics. By now, however, I have learned to appreciate the practical procedures and compromises that computational chemists have formulated out of sheer necessity to obtain answers and insights into important biological processes that cannot be tackled by instrumentation. In fact, approximations and simplifications are not only tolerated when dealing with biomolecules; they often lead to insights that cannot easily be obtained from more detailed representations. Furthermore, it is often the neglect of certain factors that teaches us their importance, sometimes in subtle ways.

For example, when Suse Broyde and I viewed her intriguing carcinogen/modified DNA models in the mid 1980s, we used a large Evans and Sutherland computer while wearing special stereoviewers; the hard-copy drawings were ball and stick models, though the dimensionality projected out nicely in black and white. (Today, we still use stereo glasses, but current hardware stereo capabilities are much better, and marvelous molecular renderings are available.) At that time, only small pieces of DNA could be modeled, and the surrounding salt and solvent environment was approximated. Still, structural and functional insights arose from those earlier works, many of which were validated later by more comprehensive computation, as well as laboratory experiments.

Book Overview

The book provides an overview of three broad topics: (a) biomolecular structure and modeling: current problems and state of computations (Chapters 1-6); (b) molecular mechanics: force field origin, composition, and evaluation techniques (Chapters 7-9); and (c) simulation techniques: conformational sampling by geometry optimization, Monte Carlo, and molecular dynamics approaches (Chapters 10-13). Chapter 14, on the similarity and diversity problems in chemical design, introduces some of the challenges in the growing field of combinatorial chemistry.

Specifically, Chapters 1 and 2 give a historical perspective of biomolecular modeling, outlining progress in experimental techniques, the current computational challenges, and the practical applications of this enterprise - to convey the immense interest in, and support of, the discipline. Since these chapters discuss rapidly changing subjects (e.g., genome projects, disease treatments), they will be updated as possible on the text website. General readers may find these chapters useful as an introduction to biomolecular modeling and its applications.

Chapters 3 and 4 review the basic elements in protein structure, and Chapter 5 similarly presents the basic building blocks and conformational flexibility in nucleic acids. Chapter 6 presents additional topics in nucleic acids, such as DNA sequence effects, DNA/protein interactions, departures from the canonical DNA helix forms, RNA structure, and DNA supercoiling.

The second part of the book begins in Chapter 7 with a view of the discipline of molecular mechanics as an offspring of quantum mechanics and discusses the basic premises of molecular mechanics formulations. A detailed presentation of the force field terms - origin, variation, and parameterization - is given in Chapter 8. Chapter 9 is then devoted to the computation of the nonbonded energy terms, including cutoff techniques, Ewald and multipole schemes, and continuum solvation alternatives.

The third part of the book, simulation algorithms,¹ begins with a description of optimization methods for multivariate functions in Chapter 10, emphasizing the tradeoff between algorithm complexity and performance. Basic issues of Monte Carlo techniques, appropriate to a motivated novice, are detailed in Chapter 11, such as pseudorandom number generators, Gaussian random variates, Monte Carlo sampling, and the Metropolis algorithm. Chapters 12 and 13 describe the algorithmic challenges in biomolecular dynamics simulations and present various categories of integration techniques, from the popular Verlet algorithm to multiple-timestep techniques and Brownian dynamics protocols. Chapter 14 outlines the challenges in similarity and diversity sampling in the field of chemical design, related to the new field of combinatorial chemistry.

¹The word algorithm is named after the ninth-century Arab mathematician al-Khwarizmi (nicknamed after his home town of Khwarizm, now Khiva in the Uzbek Republic), who stressed the importance of methodical procedures for solving problems in his algebra textbook. The term has evolved to mean the systematic process of solving problems by machine execution.

The book appendices complement the material in the main text through homework assignments, reading lists, and other information useful for teaching molecular modeling.

Instructors may find the sample course syllabus in Appendix A helpful. Important also to teaching is an introduction to the original literature; a representative reading list of articles used for the course is collected in Appendix B. An annotated general reference list is given in Appendix C.

Selected biophysics applications are highlighted through the homework assignments (Appendix D). Humor in the assignments stimulates creativity in many students. These homeworks are a central component of learning molecular modeling, as they provide hands-on experience, extend upon subjects covered in the chapters, and expose the students to a wide range of current topics in biomolecular structure. Advanced students may use these homework assignments to learn about molecular modeling through independent research.

Many homework assignments involve a molecular modeling software package. I selected the Insight program in conjunction with our Silicon Graphics computer laboratory, but other suitable modeling programs can be used. Students also learn other basic research tools (such as programming and literature searches) through the homeworks.

Our memorable "force field debate" (see homework 7 in Appendix D) even brought the AMBER team to class in white lab coats, each accented with a name tag corresponding to one of AMBER's original authors. The late Peter Kollman would have been pleased. Harold Scheraga would have been no less impressed by the long list of ECEPP successes prepared by his loyal troopers. Martin Karplus would not have been disappointed by the strong proponents of the CHARMM approach. I only hope to have as much spunk and talent in my future molecular modeling classes.

Extensive use of web resources is encouraged, while keeping in mind the caveat that web content lacks general quality control. I was amazed by some of the discoveries my students made about interesting molecular modeling topics and shared in the classroom, especially in the context of the term project, which requires them to find outstanding examples of the successes and/or failures of molecular modeling.

Interested readers might also want to glance at additional course information as part of my group's home page, monod.biomath.nyu.edu/. Supplementary text information (such as program codes and figure files) can also be obtained.


To future teachers of molecular modeling who plan to design similar assignments and material, I share with you my following experience regarding student reactions to this discipline: what excited students the most about the subject matter and led to enthusiasm and excellent feedback in the classroom were the rapid pace at which the field is developing, its exciting discoveries, and the medical and technological breakthroughs made possible by important findings in the field.

In more practical terms, a mathematics graduate student, Brynja Kohler, expressed this enthusiasm succinctly in the introduction to her term project:

As I was doing research for this assignment, I found that one interesting article led to another. Communication via e-mail with some researchers around the world about their current investigations made me eagerly anticipate new results. The more I learned the more easy it became to put off writing a final draft because my curiosity would lead me on yet another line of inquiry. However, alas, there comes a time when even the greatest procrastinator must face the music, and evaluate what it is that we know and not linger upon what we hope to find out.

Future teachers are thus likely to have an enjoyable experience with any good group of students.

Acknowledgments

I am indebted to Jing Huang for her devoted assistance with the manuscript preparation, file backups, data collection, and figure design. I also thank Wei Xu and Mulin Ding for important technical assistance. I am grateful to my other devoted current and former group members who helped read book segments, collect data, prepare the figures found throughout this book, and run to libraries throughout New York City often: Karunesh Arora, Danny Barash, Paul Batcho, Dan Beard, Mulin Ding, Hin Hark Gan, Jennifer Isbell, Joyce Noah, Xiaoliang Qian, Sonia Rivera, Adrian Sandu, Dan Strahs, Dexuan Xie, Linjing Yang, and Qing Zhang. Credits for each book figure and table are listed on the text's website.

I thank my colleagues Ruben Abagyan, Helen Berman, Dave Case, Jonathan Goodman, Andrej Sali, and Harold Scheraga, who gave excellent guest lectures in the course; and my course assistants Karunesh Arora, Margaret Mandziuk, Qing Zhang, and Zhongwei Zhu for their patient, dedicated assistance to the students with their homework and queries.

I am also very appreciative of the following colleagues for sharing reprints, information, and unpublished data and/or for their willingness to comment on segments of the book: Lou Allinger, Nathan Baker, Mike Beer, Helen Berman, Suse Broyde, John Board, Dave Beveridge, Ken Breslauer, Steve Burley, Dave Case, Philippe Derreumaux, Ron Elber, Eugene Fluder, Leslie Greengard, Steve Harvey, Jan Hermans, the late Peter Kollman, Robert Krasny, Michael Levitt, Xiang-Jun Lu, Pierre L'Ecuyer, Neocles Leontis, the late Shneior Lifson, Kenny Lipkowitz, Jerry Manning, Andy McCammon, Mihaly Mezei, Jorge Nocedal, Wilma Olson, Michael Overton, Vijay Pande, Dinshaw Patel, Harold Scheraga, Shulamith Schlick, Klaus Schulten, Suresh Singh, Bob Skeel, A. R. Srinivasan, Emad Tajkhorshid, Yuri Ushkaryov, Wilfred van Gunsteren, Arieh Warshel, Erle Westhof, Weitao Yang, and Darren York. Of special note are the extremely thorough critiques which I received from Lou Allinger, Steve Harvey, Jerry Manning, Robert Krasny, Wilma Olson, and Bob Skeel; their extensive comments and suggestions led to enlightening discussions and helped me see the field from many perspectives. I thank my colleague and friend Suse Broyde for introducing me to the field and for reading nearly every page of this book's draft.

To my family - parents Haim and Shula, sisters Yael and Daphne, aunt Cecilia, and especially Rick and Duboni - I am grateful for tolerating my long months on this project.

Finally, I thank my excellent students for making the course enjoyable and inspiring.

Tamar Schlick

New York, New York

June 10, 2002

Prelude

Every sentence I utter must be understood not as an affirmation but as a question.

Niels Bohr (1885-1962).

Only rarely does science undergo a dramatic transformation that can be likened to a tectonic rumble, as its character is transfigured under the weights of changing forces. We are now in such an exciting time. The discovery of the DNA double helix in the early 1950s prefigured the rise of molecular biology and its many offspring in the next half century, just as the rise of Internet technology in the 1980s has molded, and is still reshaping, nearly every aspect of contemporary life. With completion of the first draft of the human genome sequence trumpeting the beginning of the twenty-first century, triumphs in the biological sciences are competing with geopolitics and the economy for prominent newspaper headlines. The genomic sciences now occupy the center stage, linking basic to applied (medical) research, applied research to commercial success and economic growth, and the biological sciences to the chemical, physical, mathematical and computer sciences.

The subject of this text, molecular modeling, represents a subfield of this successful marriage. In this text, I attempt to draw to the field newcomers from other disciplines and to share basic knowledge in a modern context and interdisciplinary perspective. Though many details on current investigations and projects will undoubtedly become obsolete as soon as this book goes to press, the basic foundations of modeling will remain similar. Over the next decades, we will surely witness a rapid growth in the field of molecular modeling, as well as many success stories in its application.

Contents

About the Cover

Book URLs

Preface

Prelude

List of Figures

List of Tables

Acronyms, Abbreviations, and Units

1 Biomolecular Structure and Modeling: Historical Perspective 1.1 A Multidisciplinary Euterprise . . . .

1.1.1 Consilience . . . . . . . . . . 1.1.2 What is Molecular Modeling? 1.1.3 Need For Critical Assessment . 1.1.4 Text Overview .

1.2 Molecular Mechanics . . . . . 1.2.1 Pioneers . . . . . . . . 1.2.2 Simulation Perspective

1.3 Experimental Progress . . . . . 1.3.1 Protein Crystallography .

V

ix

xi

xix

xxxi

xxxvüi

xli

1 2 2 3 4 6 7 7

10 12 12

xxii Contents

1.3.2 DNA Structure . . 1.3.3 Crystallography . . 1.3 .4 NMR Spectroscopy

1.4 Modern Era . . . . . . . 1.4.1 Biotechnology 1.4.2 PCR and Beyond

1.5 Genome Sequencing . . . 1.5 .1 Sequencing Overview . 1.5.2 Human Genome ....

2 Biomolecular Structure and Modeling: Problem and Application

14 16 18 19 19 20 23 23 27

Perspective 33 2.1 Computational Challenges . . . . 33

2.1.1 Bioinformatics . . . . . 33 2.1.2 Structure Prom Sequence 35

2.2 Protein Polding . . . . . . 37 2.2.1 Polding Views . . . . 37 2.2.2 Polding Challenges 39 2.2.3 Polding Simulations . 40 2.2.4 Chaperones . . . . . 42 2.2.5 Unstructured Proteins . 42

2.3 Protein Misfolding . . . . . . 44 2.3.1 Prions . . . . . . . . 44 2.3.2 Infectious Proteins? . 44 2.3.3 Hypotheses . . . . . 45 2.3.4 Other Misfolding Processes . 46 2.3.5 Punction Prom Structure 47

2.4 Practical Applications 4 7 2.4.1 Drug Design . 48 2.4.2 AIDS Drugs . 49 2.4.3 Other Drugs . 53 2.4.4 A Long Way To Go 54 2.4.5 Better Genes . . . . 54 2.4.6 Designer Poods . . 56 2.4. 7 Designer Materials 59 2.4.8 Cosmeceuticals 59

3 Protein Structure Introduction 61 3.1 Machinery ofLife . . . . . . . . . . 61

3.1.1 Prom Tissues to Hormones 61 3.1.2 Size and Punction Variability 3.1.3 Chapter Overview ..

3.2 Amino Acid Building Blocks . . . . 3.2.1 Basic ca Unit . . . . . . . . 3.2.2 Essential and Nonessential Amino Acids .

62 63 66 66 67

Contents xxiii

3.2.3 Linking Amino Acids . . . . 69 3.2.4 The Amino Acid Repertoire. 72

3.3 Sequence Variations in Proteins . . . 74 3.3.1 Globular Proteins . . . . . . 74 3.3.2 Membrane and Fibrous Proteins 75 3.3.3 Ernerging Patterns from Genome Databases 76 3.3.4 Sequence Similarity . . . . . . . . . . . . . 77

3.4 Protein Conformation Framework . . . . . . . . . . 80 3.4.1 The Flexible <P and 'ljJ and Rigid w Dihedral Angles 80 3.4.2 Rotamerk Structures . . . 84 3.4.3 Ramachandran Plots. . . . 84 3.4.4 Conformational Hierarchy 86

4 Protein Structure Hierarchy 91 4.1 Structure Hierarchy . . 92 4.2 Helices . . . . . . . . . 92

4.2.1 Classic n:-Helix 92 4.2.2 310 and 1r Helices 93 4.2.3 Left-Handed n:-Helix 96 4.2.4 Collagen Helix . . . 96

4.3 ß-Sheets: A Common Secondary Structural Element . 96 4.4 Turns and Loops . . . . . . . . . . . 96 4.5 Supersecondary and Tertiary Structure 99

4.5.1 Complex 3D Networks . . . . 99 4.5.2 Classes in Protein Architecture 99 4.5.3 Classes areFurther Divided into Folds 100

4.6 n:-Class Folds . . . . 100 4.6.1 Bundles . . . . 100 4.6.2 Folded Leafs. . 101 4.6.3 Hairpin Arrays 101

4.7 ß-Class Folds . . . . . 102 4.7.1 Anti-Parallel ß Domains 102 4. 7.2 Parallel and Antiparallel Combinations . 103

4.8 a/ß and n:+ß-Class Folds . . . . 103 4.8.1 a/ß Barrels . . . . . . . 104 4.8.2 Open Twisted a/ß Folds 104 4.8.3 Leueine-Rich a/ß Folds . 104 4.8.4 a+ß Folds . . . 104

4.9 NumberofFolds. . . . . . . . . 105 4.9.1 Finite Number? . . . . . 105 4.9.2 Concerted Target Selection: Structural Genolllies 105

4.10 Quatemary Structure. . . . . . . . . . . . . . . 106 4.10.1 Viruses. . . . . . . . . . . . . . . . . . 106 4.10.2 From Ribosomes to Dynamic Networks 110

4.11 Structure C1assification . . . . . . . . . . . . . 111

xxiv Contents

5 Nucleic Acids Structure Minitutorlai 5.1 DNA, Life's Blueprint ............. .

5 .1.1 The Kindled Field of Molecular Biology . 5.1.2 DNA Processes ........... . 5.1.3 Challenges in Nucleic Acid Structure. 5.1.4 Chapter Overview .

5.2 Basic Building Blocks ... 5.2.1 Nitrogenaus Bases 5.2.2 Hydrogen Bonds 5.2.3 Nucleotides . . . . 5.2.4 Polynucleotides . . 5.2.5 Stabilizing Polynucleotide Interactions . 5.2.6 Chain Notation . . . . . 5.2.7 Atmnic Labeling .... 5.2.8 Torsion Angle Labeling

5.3 Conformational Flexibility . . . 5.3.1 The Furanase Ring ... 5.3.2 Backhone Torsional Flexibility 5.3.3 The Glycosyl Rotation . . . . 5.3.4 Sugar/Glycosyl Combinations 5.3.5 Basic Helical Descriptors 5.3.6 Base-Pair Parameters

5.4 Canonical DNA Forms 5.4.1 B-DNA 5.4.2 5.4.3 5.4.4

A-DNA ... . Z-DNA ... . Comparative Features .

113 114 114 116 117 118 118 119 119 119 121 122 124 125 125 126 126 131 131 131 133 135 139 141 142 145 146

6 Topics in Nucleic Acids Structure
6.1 Introduction
6.2 DNA Sequence Effects
6.2.1 Local Deformations
6.2.2 Orientation Preferences in Dinucleotide Steps
6.2.3 Intrinsic DNA Bending in A-Tracts
6.2.4 Sequence Deformability Analysis Continues

6.3 DNA Hydration and Ion Interactions
6.3.1 Resolution Difficulties
6.3.2 Basic Patterns

6.4 DNA/Protein Interactions
6.5 Variations on a Theme
6.5.1 Hydrogen Bonding Patterns in Polynucleotides
6.5.2 Hybrid Helical/Nonhelical Forms
6.5.3 Overstretched and Understretched DNA

6.6 RNA Structure
6.6.1 RNA Chains Fold Upon Themselves
6.6.2 RNA's Diversity
6.6.3 RNA at Atomic Resolution
6.6.4 Emerging Themes in RNA Structure and Folding

6.7 Cellular Organization of DNA
6.7.1 Compaction of Genomic DNA
6.7.2 Coiling of the DNA Helix Itself
6.7.3 Chromosomal Packaging of Coiled DNA

6.8 Mathematical Characterization of DNA Supercoiling
6.8.1 DNA Topology and Geometry

6.9 Computational Treatments of DNA Supercoiling
6.9.1 DNA as a Flexible Polymer
6.9.2 Elasticity Theory Framework
6.9.3 Simulations of DNA Supercoiling

7 Theoretical and Computational Approaches to Biomolecular Structure
7.1 Merging of Theory and Experiment
7.1.1 Exciting Times for Computationalists!
7.1.2 The Future of Biocomputations
7.1.3 Chapter Overview

7.2 QM Foundations
7.2.1 The Schrödinger Wave Equation
7.2.2 The Born-Oppenheimer Approximation
7.2.3 Ab Initio
7.2.4 Semi-Empirical QM
7.2.5 Recent Advances in Quantum Mechanics
7.2.6 From Quantum to Molecular Mechanics

7.3 Molecular Mechanics Principles
7.3.1 The Thermodynamic Hypothesis
7.3.2 Additivity
7.3.3 Transferability

7.4 Molecular Mechanics Formulation
7.4.1 Configuration Space
7.4.2 Functional Form
7.4.3 Some Current Limitations

8 Force Fields
8.1 Formulation of the Model and Energy
8.2 Normal Modes
8.2.1 Characteristic Motions
8.2.2 Spectra of Biomolecules
8.2.3 Spectra As Force Constant Sources
8.2.4 In-Plane and Out-of-Plane Bending

8.3 Bond Length Potentials
8.3.1 Harmonic Term


8.3.2 Morse Term
8.3.3 Cubic and Quartic Terms

8.4 Bond Angle Potentials
8.4.1 Harmonic and Trigonometric Terms
8.4.2 Cross Bond Stretch/Angle Bend Terms

8.5 Torsional Potentials
8.5.1 Origin of Rotational Barriers
8.5.2 Fourier Terms
8.5.3 Torsional Parameter Assignment
8.5.4 Improper Torsion
8.5.5 Cross Dihedral/Bond Angle and Improper/Improper Dihedral Terms

8.6 van der Waals Potential
8.6.1 Rapidly Decaying Potential
8.6.2 Parameter Fitting From Experiment
8.6.3 Two Parameter Calculation Protocols

8.7 Coulomb Potential
8.7.1 Coulomb's Law: Slowly Decaying Potential
8.7.2 Dielectric Function
8.7.3 Partial Charges

8.8 Parameterization
8.8.1 A Package Deal
8.8.2 Force Field Performance

9 Nonbonded Computations
9.1 Computational Bottleneck
9.2 Reducing Computational Cost
9.2.1 Simple Cutoff Schemes
9.2.2 Ewald and Multipole Schemes

9.3 Spherical Cutoff Techniques
9.3.1 Technique Categories
9.3.2 Guidelines for Cutoff Functions
9.3.3 General Cutoff Formulations
9.3.4 Potential Switch
9.3.5 Force Switch
9.3.6 Shift Functions

9.4 Ewald Method
9.4.1 Periodic Boundary Conditions
9.4.2 Ewald Sum and Crystallography
9.4.3 Morphing A Conditionally Convergent Sum
9.4.4 Finite-Dielectric Correction
9.4.5 Ewald Sum Complexity
9.4.6 Resulting Ewald Summation
9.4.7 Practical Implementation

9.5 Multipole Method
9.5.1 Basic Hierarchical Strategy
9.5.2 Historical Perspective
9.5.3 Expansion in Spherical Coordinates
9.5.4 Biomolecular Implementations
9.5.5 Other Variants

9.6 Continuum Solvation
9.6.1 Need for Simplification!
9.6.2 Potential of Mean Force
9.6.3 Stochastic Dynamics
9.6.4 Continuum Electrostatics

10 Multivariate Minimization in Computational Chemistry
10.1 Optimization Applications
10.1.1 Algorithmic Understanding Needed
10.1.2 Chapter Overview

10.2 Fundamentals
10.2.1 Problem Formulation
10.2.2 Independent Variables
10.2.3 Function Characteristics
10.2.4 Local and Global Minima
10.2.5 Derivatives
10.2.6 Hessian Matrix

10.3 Basic Algorithms
10.3.1 Greedy Descent
10.3.2 Line Searches
10.3.3 Trust Region Methods
10.3.4 Convergence Criteria

10.4 Newton's Method
10.4.1 Newton in One Dimension
10.4.2 Newton's Method for Minimization
10.4.3 Multivariate Newton

10.5 Large-Scale Methods
10.5.1 Quasi-Newton (QN)
10.5.2 Conjugate Gradient (CG)
10.5.3 Truncated-Newton (TN)
10.5.4 Simple Example

10.6 Software
10.6.1 Popular Newton and CG
10.6.2 CHARMM's ABNR
10.6.3 CHARMM's TN
10.6.4 Comparative Performance on Molecular Systems

10.7 Recommendations
10.8 Future Outlook


11 Monte Carlo Techniques
11.1 Monte Carlo Popularity
11.1.1 A Winning Combination
11.1.2 From Needles to Bombs
11.1.3 Chapter Overview
11.1.4 Importance of Error Bars

11.2 Random Number Generators
11.2.1 What is Random?
11.2.2 Properties of Generators?
11.2.3 Linear Congruential Generators
11.2.4 Other Generators
11.2.5 Artifacts
11.2.6 Recommendations

11.3 Gaussian Random Variates
11.3.1 Manipulation of Uniform Random Variables
11.3.2 Normal Variates in Molecular Simulations
11.3.3 Odeh/Evans
11.3.4 Box/Muller/Marsaglia

11.4 Monte Carlo Means
11.4.1 Expected Values
11.4.2 Error Bars
11.4.3 Batch Means

11.5 Monte Carlo Sampling
11.5.1 Probability Density Function
11.5.2 Equilibria or Dynamics
11.5.3 Ensembles
11.5.4 Importance Sampling

11.6 Hybrid MC
11.6.1 MC and MD
11.6.2 Basic Idea
11.6.3 Variants and Other Hybrid Approaches

12 Molecular Dynamics: Basics
12.1 Introduction
12.1.1 Why Molecular Dynamics?
12.1.2 Background
12.1.3 Outline of MD Chapters

12.2 Laplace's Vision
12.2.1 The Dream
12.2.2 Deterministic Mechanics
12.2.3 Neglect of Electronic Motion
12.2.4 Critical Frequencies
12.2.5 Electron/Nuclear Treatment

12.3 Basics
12.3.1 Following Motion


12.3.2 Trajectory Quality
12.3.3 Initial System Settings
12.3.4 Trajectory Sensitivity
12.3.5 Simulation Protocol
12.3.6 High-Speed Implementations
12.3.7 Analysis and Visualization
12.3.8 Reliable Numerical Integration
12.3.9 Computational Complexity

12.4 Verlet Algorithm
12.4.1 Position and Velocity Propagation
12.4.2 Leapfrog, Velocity Verlet, and Position Verlet

12.5 Constrained Dynamics
12.6 Various MD Ensembles
12.6.1 Ensemble Types
12.6.2 Simple Algorithms
12.6.3 Extended System Methods

13 Molecular Dynamics: Further Topics
13.1 Introduction
13.2 Symplectic Integrators
13.2.1 Symplectic Transformation
13.2.2 Harmonic Oscillator Example
13.2.3 Linear Stability
13.2.4 Timestep-Dependent Rotation in Phase Space
13.2.5 Resonance Condition for Periodic Motion
13.2.6 Resonance Artifacts

13.3 Multiple-Timestep (MTS) Methods
13.3.1 Basic Idea
13.3.2 Extrapolation
13.3.3 Impulses
13.3.4 Resonances in Impulse Splitting
13.3.5 Resonance Artifacts in MTS
13.3.6 Resonance Consequences

13.4 Langevin Dynamics
13.4.1 Uses
13.4.2 Heat Bath
13.4.3 Effect of γ
13.4.4 Generalized Verlet for Langevin Dynamics
13.4.5 LN Method

13.5 Brownian Dynamics (BD)
13.5.1 Brownian Motion
13.5.2 Brownian Framework
13.5.3 General Propagation Framework
13.5.4 Hydrodynamics
13.5.5 BD Propagation

13.6 Implicit Integration
13.6.1 Implicit vs. Explicit Euler
13.6.2 Intrinsic Damping
13.6.3 Computational Time
13.6.4 Resonance Artifacts

13.7 Future Outlook
13.7.1 Integration Ingenuity
13.7.2 Current Challenges

14 Similarity and Diversity in Chemical Design
14.1 Introduction to Drug Design
14.1.1 Chemical Libraries
14.1.2 Early Days
14.1.3 Rational Drug Design
14.1.4 Automated Technology
14.1.5 Chapter Overview

14.2 Database Problems
14.2.1 Database Analysis
14.2.2 Similarity and Diversity Sampling
14.2.3 Bioactivity

14.3 General Problem Definitions
14.3.1 The Dataset
14.3.2 The Compound Descriptors
14.3.3 Biological Activity
14.3.4 The Target Function
14.3.5 Scaling Descriptors
14.3.6 The Similarity and Diversity Problems

14.4 Data Compression and Cluster Analysis
14.4.1 PCA Compression
14.4.2 SVD Compression
14.4.3 PCA and SVD
14.4.4 Projection Application
14.4.5 Example

14.5 Future Perspectives

Epilogue

Appendix A. Molecular Modeling Sample Syllabus

Appendix B. Article Reading List

Appendix C. Supplementary Course Texts

Appendix D. Homework Assignments

Index

List of Figures

1.1 Simulation evolution (3D version)
1.2 Simulation evolution (2D version)
1.3 Cryo-EM view of α-latrotoxin

2.1 Sequence and structure data
2.2 Paracelsus' Janus
2.3 GroEL/GroES chaperonin/co-chaperonin complex
2.4 Prion protein
2.5 AIDS drugs

3.1 An amino acid
3.2 Water clusters
3.3 Dipeptide formation
3.4 Peptide formula
3.5 Aspartame
3.6 The amino acid repertoire
3.7 Amino acids structures
3.8 Amino acid frequencies
3.9 Fibrous proteins
3.10 Rop
3.11 EF proteins
3.12 Protein-structure variants
3.13 Gauche and trans orientations
3.14 Dihedral angle


3.15 Rotations in polypeptides
3.16 Lysine rotamers
3.17 Amino acids rotamers
3.18 Ramachandran plots
3.19 Further study of Ramachandran plots

4.1 The α-helix and β-sheet motifs
4.2 α-helical proteins (a)
4.3 α-helical proteins (b)
4.4 β-helical proteins (a)
4.5 β-helical proteins (b)
4.6 α/β proteins
4.7 α + β proteins
4.8 Tomato bushy stunt virus

5.1 The DNA double helix
5.2 Nucleic acid components
5.3 Watson-Crick base pairing
5.4 The polynucleotide chain and labeling
5.5 Sugar envelope and twist puckers
5.6 Sugar pseudorotation cycle
5.7 Common sugar puckers
5.8 Sugar pucker clustering
5.9 Torsion angle wheel
5.10 Deoxyadenosine adiabatic map
5.11 Base-pair coordinate system
5.12 Base-pair step and base-pair parameters
5.13 Model A, B, and Z-DNA
5.14 Model A, B, and Z-DNA, stereo side
5.15 Model A, B, and Z-DNA, stereo top

6.1 Bending in long DNA
6.2 Net DNA bending examples
6.3 A-tract DNA dodecamer
6.4 Sequence-dependent local DNA hydration
6.5 DNA/protein binding motifs
6.6 Various hydrogen-bonding schemes
6.7 DNA/protein complex with Hoogsteen bp
6.8 Oligonucleotide analogues
6.9 Various nucleotide-chain folding motifs
6.10 RNAs with pseudoknots
6.11 Interwound and toroidal supercoiling
6.12 Nucleosome core particle
6.13 Schematic view of DNA levels of folding
6.14 Supercoiling topology and geometry


6.15 Brownian dynamics snapshots of DNA
6.16 Site juxtaposition measurements
6.17 Polynucleosome modeling

7.1 DNA quantum-mechanically derived electrostatic potentials
7.2 Enolase active site
7.3 Molecular geometry
7.4 CHARMM atom types

8.1 Normal modes of a water molecule
8.2 Computed protein and water spectra
8.3 Vibrational modes types
8.4 Bond-length potentials
8.5 Bond angles
8.6 Bond-angle potentials
8.7 Stretch/bend cross terms
8.8 Butane torsional orientations
8.9 Torsion-angle potentials
8.10 Model compounds for torsional parameterization
8.11 Wilson angle
8.12 Van der Waals potentials
8.13 Coulomb potentials

9.1 CPU time for nonbonded calculations
9.2 Cutoff schemes
9.3 Switch and shift functions
9.4 Periodic domains
9.5 Various periodic domains
9.6 Space-filling polyhedra
9.7 Ewald's trick of Gaussian masking
9.8 CPU time for PME vs. fast multipole
9.9 Fast multipole schemes
9.10 Screened Coulomb potential
9.11 Poisson-Boltzmann rendering of the 30S ribosome

10.1 One-dimensional function
10.2 2D contour curves for quadratic functions
10.3 3D curves for quadratic functions
10.4 Sparse Hessians
10.5 Sparse Hessians, continued
10.6 Line search minimization
10.7 Newton's method, simple illustration
10.8 Newton's method, quadratic example output
10.9 Newton's method, cubic example output
10.10 Minimization paths


10.11 Minimization progress

11.1 Lattice structure for simple random number generators
11.2 Structures for linear congruential generators
11.3 MC computation of π
11.4 Boltzmann probabilities
11.5 MC moves for DNA
11.6 MC and BD DNA distributions
11.7 Bad MC protocol

12.1 Sampling methods
12.2 Equilibration
12.3 Chaos in MD
12.4 Butane's end-to-end distance
12.5 Butane's end-to-end distance convergence
12.6 Energy drift

13.1 Effective Verlet phase space rotation
13.2 Verlet resonance for a Morse oscillator
13.3 Extrapolative vs. Impulse MTS
13.4 Impulse vs. extrapolative force splitting
13.5 Resonance from force splitting
13.6 Harmonic oscillator Langevin trajectories
13.7 BPTI means and variances by Langevin and Newtonian MTS
13.8 LN algorithm
13.9 Manhattan plots for polymerase/DNA
13.10 Polymerase/DNA system
13.11 BPTI spectral densities
13.12 Polymerase/DNA spectral densities
13.13 Polymerase/DNA geometry
13.14 Cholesky vs. Chebyshev approaches for random force
13.15 Implicit and explicit Euler
13.16 Verlet and implicit-midpoint energies
13.17 Stochastic-path approach snapshots

14.1 Sample drugs
14.2 Related pairs of drugs
14.3 Chemical library
14.4 SVD/refinement performance
14.5 SVD-based database projection in 2D and 3D
14.6 Cluster analysis
14.7 PCA projection in 2D, with similar pairs
14.8 PCA projection in 2D, with diverse pairs

D.1 Sample histogram for protein/DNA interaction analysis
D.2 Biphenyl
D.3 Structure for linear congruential generators
D.4 Hydrogen bond geometry

List of Tables

1.1 Structural biology chronology
1.2 Biomolecular simulation evolution

2.1 Protein databases

3.1 Amino acid frequency

5.1 Genetic code
5.2 Nucleic acid torsion angle definitions
5.3 Mean properties of representative DNA forms
5.4 Selected parameters for model DNA helices

6.1 Base-pair step parameters for free and protein-bound DNA
6.2 Protein/DNA complexes
6.3 DNA content of representative genomes

7.1 Some CHARMM atom types

8.1 Characteristic stretching vibrational frequencies
8.2 Characteristic bending and torsional vibrational frequencies
8.3 Examples of torsional potentials

9.1 CPU time for nonbonded calculations

10.1 Optimization software
10.2 Minimization comparisons

11.1 MC calculations for π

12.1 Biomolecular sampling methods
12.2 High-frequency modes
12.3 Biomolecular timescales

13.1 Verlet timestep restriction timescales
13.2 Stability limits

Acronyms, Abbreviations, and Units

A
A: adenine (purine nitrogenous base)
Å: angstrom (10⁻¹⁰ m)
AdMLP: adenovirus major late promoter (protein)
AIDS: acquired immune deficiency syndrome
Ala (A): alanine
Arg (R): arginine
Asn (N): asparagine
Asp (D): aspartic acid
AS: Altona/Sundaralingam (sugar description)
ATP: adenosine triphosphate (energy source)
AZT: zidovudine (AIDS drug)

B
bp: base pair
bps: base pairs
BAC: bacterial artificial chromosome
BOES: Born-Oppenheimer energy surfaces
BPTI: bovine pancreatic trypsin inhibitor
BSE: bovine spongiform encephalopathy ('mad cow disease')

C
cm: centimeter (10⁻² m)
C: cytosine (pyrimidine nitrogenous base)
CAP: catabolite gene activator protein
CASP: Critical Assessment of Techniques for Protein Structure Prediction
CG: conjugate gradient method (for minimization)
CJD: Creutzfeldt-Jakob disease (brain disorder, human version of BSE)
CN: Crigler-Najjar (debilitating disease, gene therapy applications)
CP: Cremer/Pople (sugar description)
CPU: central processing units
Cys (C): cysteine

D
DFT: density functional theory (quantum mechanics approach)
DH: Debye-Hückel
DNA: deoxyribonucleic acid (also A-, B-, C-, D-, P-, S-, T-, and Z-DNA)
DOE: Department of Energy

E
erg: energy unit (10⁻⁷ J)
EM: electron microscopy

F
fs: femtosecond (10⁻¹⁵ s)
FFT: Fast Fourier Transforms

G
G: guanine (purine nitrogenous base)
Gln (Q): glutamine
Glu (E): glutamic acid
Gly (G): glycine
GSS: Gerstmann-Straussler-Scheinker disease (brain disorder similar to CJD)

H
HDV: hepatitis delta helper virus
His (H): histidine
HIV: human immunodeficiency virus
HMC: hybrid Monte Carlo
HTH: helix/turn/helix (motif)
Hz: hertz (inverse second)

I
Ile (I): isoleucine
IHF: integration host factor (protein)

K
kbp: kilobase pairs
kcal/mol: kilocalories per mole (energy unit)
kDa: kilodaltons (mass unit used for proteins)
KR: Kirkwood-Riseman

L
Leu (L): leucine
Lys (K): lysine
LCG: linear congruential generator

M
m: meter
mgr: minor groove
ms: millisecond (10⁻³ s)
μs: microsecond (10⁻⁶ s)
mm: millimeter (10⁻³ m)
MAD: multiwavelength anomalous diffraction (crystallography technique)
MC: Monte Carlo
MD: molecular dynamics
Met (M): methionine
Mgr: major groove
MIR: multiple isomorphous replacement (crystallography technique)
MLCG: multiplicative linear congruential generator
MTS: multiple-timestep methods (for MD)

N
nm: nanometer (10⁻⁹ m)
ns: nanosecond (10⁻⁹ s)
NCBI: National Center for Biotechnology Information
NASA: National Aeronautics and Space Administration
NDB: nucleic acid database (ndbserver.rutgers.edu/)
NIH: National Institutes of Health
NMR: nuclear magnetic resonance
NSF: National Science Foundation

O
OTC: ornithine transcarbamylase (chronic ailment, gene therapy applications)

P
pN: piconewton (force unit)
ps: picosecond (10⁻¹² s)
PB: Poisson-Boltzmann
PBE: Poisson-Boltzmann equation
PC: principal component
PCA: principal component analysis
PCR: polymerase chain reaction
PDB: protein databank (www.rcsb.org/pdb)
Phe (F): phenylalanine
PIR: Protein Information Resource (pir.georgetown.edu)
PME: particle-mesh Ewald
PNA: peptide nucleic acid (DNA mimic)
Pro (P): proline
PrPC: prion protein cellular (harmless)
PrPSc: harmful isoform of PrPC, causes scrapie in sheep
Pur: purine (base)
Pyr: pyrimidine (base)

Q
QM: quantum mechanics
QN: quasi-Newton method (for minimization)
QSAR: quantitative structure/activity relationships

R
RCSB: Research Collaboratory for Structural Bioinformatics (www.rcsb.org)
RMS (rms): root-mean-square
RMSD: root-mean-square deviations
RNA: ribonucleic acid (also cRNA, gRNA, mRNA, rRNA, snRNA, tRNA)
RT: reverse transcriptase (AIDS protein)

S
s: second
Ser (S): serine
SAR: structure/activity relationships
SCF: self-consistent field (quantum mechanical approach)
SCOP: structural classification of proteins (scop.mrc-lmb.cam.ac.uk/scop/)
SD: steepest descent method (for minimization)
SGI: Silicon Graphics Inc.
SNPs: single-nucleotide polymorphisms ("snips")
SRY: sex determining region Y (protein)
STS: single-timestep methods (for MD)
SVD: singular value decomposition

T
T: thymine (pyrimidine nitrogenous base)
Thr (T): threonine
Trp (W): tryptophan
Tyr (Y): tyrosine
TBP: TATA-box DNA binding protein (transcription regulator)
TE: transcription efficiency
TMD: targeted molecular dynamics
TN: truncated Newton method (for minimization)
2D: two-dimensional
3D: three-dimensional

U
U: uracil (pyrimidine nitrogenous base)
URL: uniform resource locator
UV: ultraviolet spectroscopy

V
Val (V): valine

W
WC: Watson/Crick base pairing