Structure and Motion. Jean-Claude Latombe Computer Science Department Stanford University NSF-ITR Meeting on. November 14, 2002. Stanford’s Participants. PI’s: L. Guibas, J.C. Latombe, M. Levitt Research Associate: P. Koehl Postdocs: F. Schwarzer, A. Zomorodian - PowerPoint PPT Presentation
Structure and Motion
Jean-Claude Latombe
Computer Science Department Stanford University
NSF-ITR Meeting on
November 14, 2002
Stanford’s Participants
PIs: L. Guibas, J.C. Latombe, M. Levitt
Research Associate: P. Koehl
Postdocs: F. Schwarzer, A. Zomorodian
Graduate students: S. Apaydin (EE), S. Ieong (CS), R. Kolodny (CS), I. Lotan (CS), A. Nguyen (Sc. Comp.), D. Russel (CS), R. Singh (CS), C. Varma (CS)
Undergraduate students: J. Greenberg (CS), E. Berger (CS)
Collaborating faculty: A. Brunger (Molecular & Cellular Physiology), D. Brutlag (Biochemistry), D. Donoho (Statistics), J. Milgram (Math), V. Pande (Chemistry)
Problems Addressed
Biological functions derive from the structures (shapes) achieved by molecules through motions
Determination, classification, and prediction of 3D protein structures
Modeling of molecular energy and simulation of folding and binding motion
What’s New for Computer Science?
Massive amount of experimental data
Importance of similarities
Multiple representations of structure
Continuous energy functions
Many objects forming deformable chains
Many degrees of freedom
Ensemble properties of pathways
Massive amount of experimental data
Abstract/simplify data sets into compact data structures
E.g.: Electron density map → Medial axis
Importance of similarities
Segmentation/matching/scoring techniques
data set → clustered data → small library
E.g.: Libraries of protein fragments [Kolodny, Koehl, Guibas, Levitt, JMB (2002)]
Approximations of 1tim (compared with the real protein):
Complexity 10 (100 fragments of length 5): 0.9146 Å cRMS
Complexity 2.26 (50 fragments of length 7): 2.7805 Å cRMS
Alignment of Structural Motifs [Singh and Saha; Kolodny and Linial]
Problem: Determine if two structures share common motifs:
• 2 (labelled) structures in $\mathbb{R}^3$: $A = \{a_1, a_2, \dots, a_n\}$, $B = \{b_1, b_2, \dots, b_m\}$
• Find subsequences $s_a$ and $s_b$ s.t. the substructures $\{a_{s_a(1)}, a_{s_a(2)}, \dots, a_{s_a(l)}\}$ and $\{b_{s_b(1)}, b_{s_b(2)}, \dots, b_{s_b(l)}\}$ are similar
Twofold problem: alignment and correspondence
Score, approximation, complexity
Iterative Closest Point (Besl-McKay) for alignment:
[R. Singh and M. Saha. Identifying Structural Motifs in Proteins.Pacific Symp. on Biocomputing, Jan. 2003.]
Score: RMSD distance
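A minimal sketch of how ICP alignment with an RMSD score can work is given here. It is not the Singh-Saha implementation. It assumes brute-force closest-point matching and a quaternion-based superposition (Horn's closed form, with the dominant eigenvector found by power iteration); the function names `optimal_rotation`, `superpose`, and `icp` are illustrative only.

```python
import math

def centroid(points):
    n = len(points)
    return tuple(sum(p[k] for p in points) / n for k in range(3))

def optimal_rotation(P, Q):
    """Rotation R minimizing sum |R p_i - q_i|^2 for centered, paired P and Q."""
    S = [[sum(p[a] * q[b] for p, q in zip(P, Q)) for b in range(3)] for a in range(3)]
    (Sxx, Sxy, Sxz), (Syx, Syy, Syz), (Szx, Szy, Szz) = S
    # Horn's symmetric 4x4 matrix; its top eigenvector is the unit quaternion.
    N = [
        [Sxx + Syy + Szz, Syz - Szy, Szx - Sxz, Sxy - Syx],
        [Syz - Szy, Sxx - Syy - Szz, Sxy + Syx, Szx + Sxz],
        [Szx - Sxz, Sxy + Syx, -Sxx + Syy - Szz, Syz + Szy],
        [Sxy - Syx, Szx + Sxz, Syz + Szy, -Sxx - Syy + Szz],
    ]
    # Shift by the Frobenius norm so power iteration finds the largest eigenvalue.
    shift = math.sqrt(sum(v * v for row in N for v in row)) or 1.0
    q = [1.0, 0.5, 0.25, 0.125]
    for _ in range(500):
        q = [sum(N[r][c] * q[c] for c in range(4)) + shift * q[r] for r in range(4)]
        norm = math.sqrt(sum(v * v for v in q))
        q = [v / norm for v in q]
    w, x, y, z = q
    return [
        [w*w + x*x - y*y - z*z, 2 * (x*y - w*z), 2 * (x*z + w*y)],
        [2 * (y*x + w*z), w*w - x*x + y*y - z*z, 2 * (y*z - w*x)],
        [2 * (z*x - w*y), 2 * (z*y + w*x), w*w - x*x - y*y + z*z],
    ]

def superpose(P, Q):
    """Rigid transform (R, t) minimizing sum |R p_i + t - q_i|^2."""
    cp, cq = centroid(P), centroid(Q)
    Pc = [tuple(p[k] - cp[k] for k in range(3)) for p in P]
    Qc = [tuple(q[k] - cq[k] for k in range(3)) for q in Q]
    R = optimal_rotation(Pc, Qc)
    t = tuple(cq[r] - sum(R[r][c] * cp[c] for c in range(3)) for r in range(3))
    return R, t

def transform(R, t, p):
    return tuple(sum(R[r][c] * p[c] for c in range(3)) + t[r] for r in range(3))

def rmsd(P, Q):
    return math.sqrt(sum(math.dist(p, q) ** 2 for p, q in zip(P, Q)) / len(P))

def icp(A, B, iterations=30, tol=1e-9):
    """Align point set A to B; returns (moved copy of A, final RMSD to matches)."""
    moved, prev, score = list(A), float("inf"), float("inf")
    for _ in range(iterations):
        # Correspondence step: closest point of B for every moved point of A.
        matches = [min(B, key=lambda b: math.dist(a, b)) for a in moved]
        # Alignment step: best rigid motion of the original A onto the matches.
        R, t = superpose(A, matches)
        moved = [transform(R, t, a) for a in A]
        score = rmsd(moved, matches)
        if abs(prev - score) < tol:
            break
        prev = score
    return moved, score
```

ICP only converges to a local optimum, so in practice the result depends on the starting pose; the sketch makes no attempt to handle that.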
[Figure: trypsin and the trypsin active site]
Trypsin active site against 42 trypsin-like proteins
Multiple representations of structure
ProShape software [Koehl, Levitt (Stanford), Edelsbrunner (Duke)]
Decoys generated using “physical” potentials
Select best decoys using distance information
Statistical potentials for proteins based on alpha complex [Guibas, Koehl, Zomorodian]
Many pairs of objects, but relatively few are close enough to interact
Data structures that capture proximity, but undergo small or rare changes
During motion simulation:
- detect steric clashes (self-collisions)
- find pairs of atoms closer than cutoff
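For reference, the baseline way to find all atom pairs closer than a cutoff is a uniform grid (the "grid method" that ChainTrees are later compared against). The sketch below assumes atoms are given as (x, y, z) tuples and uses a cell size equal to the cutoff; the function name `close_pairs_grid` is illustrative.

```python
import math
from collections import defaultdict
from itertools import product

def close_pairs_grid(atoms, cutoff):
    """Return the set of index pairs (i, j), i < j, with distance < cutoff."""
    # Bin every atom into a cubic cell whose side equals the cutoff.
    cells = defaultdict(list)
    for idx, (x, y, z) in enumerate(atoms):
        key = (math.floor(x / cutoff), math.floor(y / cutoff), math.floor(z / cutoff))
        cells[key].append(idx)
    pairs = set()
    # Two atoms closer than the cutoff must lie in the same or adjacent cells.
    for (cx, cy, cz), members in cells.items():
        for dx, dy, dz in product((-1, 0, 1), repeat=3):
            for j in cells.get((cx + dx, cy + dy, cz + dz), ()):
                for i in members:
                    if i < j and math.dist(atoms[i], atoms[j]) < cutoff:
                        pairs.add((i, j))
    return pairs
```

The grid is rebuilt from scratch at every step, which is the cost that proximity structures exploiting the chain topology try to avoid when only a few degrees of freedom change.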
Continuous energy functions
Many objects in deformable chains
Other application domains:
Modular reconfigurable robots
Reconstructive surgery
Fixed Bounding-Volume hierarchies don’t work
Instead, exploit what doesn’t change: chain topology
Adaptive BV hierarchies [Guibas, Nguyen, Russel, Zhang] [Lotan, Schwarzer, Halperin, Latombe] (SoCG’02)
Wrapped bounding sphere hierarchies [Guibas, Nguyen, Russel, Zhang] (SoCG 2002)
• WBSH undergoes a small number of changes
• Self-collision: $O(n \log n)$ in $\mathbb{R}^2$, $O(n^{2-2/d})$ in $\mathbb{R}^d$, $d \ge 3$
ChainTrees [Lotan, Schwarzer, Halperin, Latombe] (SoCG’02)
Assumption: Few degrees of freedom change at each motion step (e.g., Monte Carlo simulation)
Find all pairs of atoms closer than a given cutoff
Find which energy terms can be reused
ChainTrees [Lotan, Schwarzer, Halperin, Latombe] (SoCG’02)
Updating: $O\left(m \log \frac{N}{m}\right)$
Finding interacting pairs: $\Theta(N^{4/3})$ (in practice, sublinear)
ChainTrees: Application to MC simulation (comparison to grid method)
[Figure: two panels, m = 1 and m = 5, each showing the cases labelled (68), (144), (374), (755)]
Future work: ChainTrees
Open problem: How to find good moves to make when the conformation is compact and random moves are rejected with high probability?
Run a new series of experiments with a more complex energy field: EEF1 [Lazaridis & Karplus] (with Pande)
Use library of fragments (with Koehl)
Capture proximity information with a sparse spanner
3HVT
Future Work: Spanner for deformable chain [Agarwal, Gao, Duke; Nguyen, Zhang, Stanford]
Many degrees of freedom
Tools to explore a high-dimensional conformation space:
- Sampling strategies
- Nearest neighbors
Sampling structures by combining fragments [Kolodny, Levitt]
[Figure: fragments labelled a, b, c, d; a candidate structure written as the string cabbbc]
Library of protein fragments
Discrete set of candidate structures
Find k nearest neighbors of a given protein conformation in a set of n conformations (cRMS, dRMS)
[Figure: backbone points labelled a0, a1, a2, a3, a4, a5, a6, …, am]
Idea: Cut backbone into m equal subsequences
Nearest neighbors in high-dimensional space [Lotan and Schwarzer]
Full rep., dRMS (brute force): ~84 h
Ave. rep., dRMS (brute force): ~4.8 h
SVD red. rep., dRMS (brute force): 41 min
SVD red. rep., dRMS (kd-tree): 19 min
100,000 decoys of 1CTF (Park-Levitt set)
Computation of 100 NN of each conformation
~80% of computed NNs are true NNs
kd-tree software from ANN library (U. Maryland)
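The averaged representation and the dRMS search can be sketched as follows. This is an illustration under stated assumptions, not the Lotan-Schwarzer code: the backbone length is assumed to be a multiple of m, the search is brute force, the SVD reduction step is omitted, and all function names are hypothetical.

```python
import math

def average_representation(backbone, m):
    """Cut the backbone into m equal subsequences and average each one."""
    size = len(backbone) // m  # assumes len(backbone) is a multiple of m
    return [
        tuple(sum(p[k] for p in backbone[s * size:(s + 1) * size]) / size
              for k in range(3))
        for s in range(m)
    ]

def intra_distances(points):
    """All pairwise distances within one conformation, in a fixed order."""
    n = len(points)
    return [math.dist(points[i], points[j]) for i in range(n) for j in range(i + 1, n)]

def drms(da, db):
    """dRMS between two conformations given their intra-distance vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(da, db)) / len(da))

def k_nearest(query_index, conformations, k, m):
    """Indices of the k conformations closest (in dRMS) to the query."""
    feats = [intra_distances(average_representation(c, m)) for c in conformations]
    q = feats[query_index]
    scored = sorted((drms(q, f), i) for i, f in enumerate(feats) if i != query_index)
    return [i for _, i in scored[:k]]
```

Because dRMS is a scaled Euclidean distance between the intra-distance vectors, those vectors can be handed to a kd-tree instead of being compared by brute force.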
Ensemble properties of pathways
Stochastic nature of molecular motion requires characterizing average properties of many pathways
Example #1: Probability of Folding pfold
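[Figure: from a given conformation, the folded set is reached first with probability pfold and the unfolded set with probability 1 - pfold]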
“We stress that we do not suggest using pfold as a transition coordinate for practical purposes as it is very computationally intensive.” Du, Pande, Grosberg, Tanaka, and Shakhnovich, “On the Transition Coordinate for Protein Folding,” Journal of Chemical Physics (1998).
HIV integrase [Du et al. ’98]
Example #2: Ligand-Protein Interaction
[Sept, Elcock and McCammon `99]
10K to 30K independent simulations
[Figure: roadmap nodes $v_i$ and $v_j$ connected by an edge with transition probability $P_{ij}$]
Probabilistic Roadmap [Apaydin, Brutlag, Hsu, Guestrin, Latombe] (RECOMB’02, ECCB’02)
Idea: Capture the stochastic nature of molecular motion by a network of randomly selected conformations and by assigning probabilities to edges
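F: Folded set; U: Unfolded set
[Figure: node i with self-transition probability $P_{ii}$ and edges to neighbors j, k, l, m with probabilities $P_{ij}$, $P_{ik}$, $P_{il}$, $P_{im}$]
Let $f_i = \mathrm{pfold}(i)$
After one step: $f_i = P_{ii} f_i + P_{ij} f_j + P_{ik} f_k + P_{il} f_l + P_{im} f_m$
(two of the neighbor values are marked $= 1$ on the slide, i.e., those neighbors lie in the folded set F)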
One linear equation per node
Solution gives pfold for all nodes
No explicit simulation run
All pathways are taken into account
Sparse linear system
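The linear system can be solved without any simulation run. The sketch below assumes the roadmap is stored as a dictionary of sparse rows (node to neighbor to transition probability) and uses Gauss-Seidel sweeps as the solver; both the data layout and the solver are illustrative choices, not necessarily those of the cited papers.

```python
def pfold(P, folded, unfolded, sweeps=10_000, tol=1e-12):
    """Solve f_i = sum_j P[i][j] * f_j with f = 1 on folded and f = 0 on unfolded.

    P maps each node to a dict {neighbor: transition probability}; each row is
    assumed to sum to 1 and may include a self-transition P[i][i].
    """
    f = {i: (1.0 if i in folded else 0.0) for i in P}
    for _ in range(sweeps):
        change = 0.0
        for i, row in P.items():
            if i in folded or i in unfolded:
                continue  # boundary values stay fixed
            new = sum(p * f[j] for j, p in row.items())
            change = max(change, abs(new - f[i]))
            f[i] = new
        if change < tol:
            break
    return f
```

For example, on a five-node chain where each interior node stays put with probability 0.5 and moves to either neighbor with probability 0.25, with node 0 unfolded and node 4 folded, the solution is $f_i = i/4$.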
Probabilistic Roadmap: Correlation with MC Approach
• 1ROP (repressor of primer)
• 2 helices
• 6 DOF
Monte Carlo:
• 49 conformations
• Over 11 days of computer time
• Over $10^6$ energy computations
Roadmap:
• 5000 conformations
• 1 - 1.5 hours of computer time
• ~15,000 energy computations
~4 orders of magnitude speedup!
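As a rough check of the speedup figure, assuming it is measured as computer time per conformation (the slide does not say how the ratio was taken), the numbers above give:

```latex
\begin{align*}
t_{\mathrm{MC}} &\ge \frac{11 \times 24 \times 3600\ \mathrm{s}}{49}
  \approx 1.9 \times 10^{4}\ \mathrm{s\ per\ conformation} \\
t_{\mathrm{roadmap}} &\le \frac{1.5 \times 3600\ \mathrm{s}}{5000}
  \approx 1.1\ \mathrm{s\ per\ conformation} \\
\frac{t_{\mathrm{MC}}}{t_{\mathrm{roadmap}}} &\gtrsim 1.8 \times 10^{4}
\end{align*}
```

This is consistent with the stated speedup of about four orders of magnitude.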
Probabilistic Roadmap
Computation Times (1ROP)
Future work: Probabilistic Roadmap
Non-uniform sampling strategies
Encoding molecular dynamics into probabilistic roadmaps (with V. Pande)
Quantitative experiments with ligand-protein binding (with V. Pande)
Bio-X – Clark Center
The following slides relate to non-research issues. I do not plan to present them. Jack and Leo may want to use the contents of some of them for their own presentations.
• Tutorial on Delaunay, Alpha-Shape and Pockets (Koehl)
• A biocomputing Notebook (Koehl)
• Biocomputation lectures in pre-existing classes:
– CS326 – motion planning: molecular motion, probabilistic roadmaps, self-collision detection (Latombe)
– CS468 – intro to computational topology: finding pockets and tunnels in molecules, computing surface areas and volumes and their derivatives (Zomorodian)
• New class on Algorithmic Biology (Batzoglou, Guibas, Latombe)
• Graduate Curriculum Committee, Bio-Engineering Dept., Stanford (Latombe)
Education
PhD students:
Serkan Apaydin, EE
An Nguyen, Scientific Computing
Carlos Guestrin, CS (Daphne Koller’s group)
Itay Lotan, CS
Rachel Kolodny, CS
Daniel Russel, CS
Samuel Ieong, CS
Trained Students (1/2)
Most graduate students have a principal advisor in CS and a secondary one in a bio-related department (Levitt, Brutlag, Pande)
Graduated Master’s students:
Rohit Singh, finding motifs in proteins, best Stanford CS master’s thesis, June ’02 [current position: bioinformatics company in San Diego]
Chris Varma, study of ligand-protein interaction with probabilistic roadmaps, June ’02 [current position: PhD student, Harvard/MIT Biomedical program]
Current Master’s student:
Ben Wong, modeling T cell activity
Undergraduates:
Eric Berger, CS, Stanford, summer internship
Julie Greenberg, CS, Harvard, summer internship
Trained Students (2/2)
• Prof. Alberto Munoz, Math Dept., University of Yucatan, Mexico; 3 months, Summer ’02; haptic interaction and probabilistic roadmaps
• Prof. Ileana Streinu, Smith College; 6 months, from Sept. ’02; protein folding
Visitors
- Guibas and Levitt, with J. Milgram (Math): topology of configuration spaces of chains
- Guibas, with V. Pande (Chemistry) and D. Donoho (Statistics): non-linear multi-resolution analysis of molecular motions
- Latombe and Apaydin, with D. Brutlag (Biochemistry) and V. Pande: probabilistic roadmaps
- Latombe and Lotan, with V. Pande: efficient MC simulation
Interactions Within Stanford
- Collision Detection for Deforming Necklaces, P. Agarwal, L. Guibas, A. Nguyen, D. Russel, and L. Zhang. Invited to special issue of Comp. Geom., Theory and Applications, following presentation at SoCG’02.
- Kinetic Medians and kd-Trees, P. Agarwal, J. Gao, and L. Guibas. Proc. 10th European Symp. Algorithms, LNCS 2461, Springer-Verlag, pp. 5-16, 2002.
- Stochastic Roadmap Simulation: An Efficient Representation and Algorithm for Analyzing Molecular Motion, M.S. Apaydin, D.L. Brutlag, C. Guestrin, D. Hsu, and J.C. Latombe. Proc. RECOMB’02, Washington D.C., pp. 12-21, 2002.
- Efficient Maintenance and Self-Collision Testing for Kinematic Chains, I. Lotan, F. Schwarzer, D. Halperin, and J.C. Latombe. SoCG’02, pp. 43-52, June 2002.
- Stochastic Conformational Roadmaps for Computing Ensemble Properties of Molecular Motion, M.S. Apaydin, D.L. Brutlag, C. Guestrin, D. Hsu, and J.C. Latombe. Workshop on Algorithmic Foundations of Robotics (WAFR), Nice, Dec. 2002.
Interactions Outside Stanford
- BCATS ’01 and ’02 [Bio-Computation At Stanford]
- RECOMB ’02 [Int. Conf. on Research in Computational Biology]
- ISMB ’02 [Int. Conf. on Intelligent Syst. for Molecular Biology]
- ECCB 2002 [European Conf. on Computational Biology]
- Biophysical Society Symp. on Molecular Simulations in Structural Biology, 2002
- SoCG 2002 [ACM Symp. on Computational Geometry]
Attendance at Conferences
- Latombe and Levitt serve as members of the Scientific Leadership Council of Stanford’s Bio-X program
- Presentations: Stanford’s Bio-X Symposium (3/02), Stanford’s Computer Forum (3/02), Berkeley’s Broad Area Seminar (4/02)
- Conference committees:
Guibas, program committee, WAFR’02 and SoCG’03
Latombe, program committee, 1st IEEE Bioinformatics Conf. ’03
Apaydin, organization committee of BCATS’02
Outreach
The following slides are extra slides that I removed from my presentation for lack of time
General Goals
Larger proteins considered → computational efficiency
Diversity of molecules and interactions → computational abstractions
Extension of in-silico experiments → computational correctness
Enable biological studies that were not possible before, more systematically
Approach
Select hard problems
Close interaction between computer scientists (Guibas, Koehl, Latombe) and biologists (Koehl, Levitt, Brutlag, Pande, Brunger)
Most graduate students are CS students with a secondary advisor in biology
Perform extensive tests
Electron density map → Medial axis [Guibas, Brunger, Russel]
Medial axis of iso-surfaces to estimate backbone
Cleaning and simplification of axis to filter noise out
Persistence of features across multiple iso-surfaces
Continuous energy function
Essential for protein structure prediction and molecular motion simulation:
- Statistical potentials based on alpha complex
- Maintenance of energy values during simulation
Instead, exploit what doesn’t change: chain topology
Adaptive BV hierarchies
Balanced binary trees of constant topology
Efficient repair of position/size of BVs
[Guibas, Nguyen, Russel, Zhang] [Lotan, Schwarzer, Halperin, Latombe] (SoCG’02)
Future Work: Spanner for deformable chain [Agarwal, Gao, Duke; Nguyen, Zhang, Stanford]
• 1ROP (repressor of primer)
• 2 helices
• 6 DOF
• 1HDD (Engrailed homeodomain)
• 3 helices
• 12 DOF
H-P energy model with steric clash exclusion [Sun et al., 95]
Probabilistic Roadmap