Using Motion Planning to Study Ligand Binding and Protein Folding

Nancy Amato, Guang Song and Burchan Bayazit

Department of Computer Science
Texas A&M University

{amato,gsong,burchanb}@cs.tamu.edu
http://www.cs.tamu.edu/faculty/amato


Motion Planning

Given: an environment (descriptions of a movable object A and obstacles B), and start and goal positions of A

Find: a valid path (a continuous sequence of valid configurations of A) from start to goal

[Figure: start and goal positions separated by obstacles]

Motivation: Paper Folding

Box: 12 => 5 dof
Periscope: 11 dof

A Motion planning approach to ligand binding. Singh, Latombe, Brutlag, ISMB’99

Outline

• PRMs for Computational Biology (PRM: Probabilistic Roadmap Method)
  - Conformation Space
  - Modeling
  - Potential Functions
  - Roadmap Construction

• Protein Folding Pathways

• Ligand Binding

Probabilistic Roadmap Methods (PRMs) [Kavraki, Svestka, Latombe, Overmars 1995]

Roadmap Construction (Pre-processing)

1. Randomly generate configurations (nodes)
   - discard nodes that are in collision (collision check)

2. Connect pairs of nodes to form roadmap
   - simple, deterministic local planner (e.g., straight line)
   - discard paths that are in collision (collision check)

Query processing

1. Connect start and goal to roadmap

2. Find path in roadmap between start and goal
   - regenerate plans for edges in roadmap

[Figure: roadmap in configuration space, with C-obst regions and the start and goal configurations]
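The two phases above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the world is a hypothetical point robot in the unit square with one circular obstacle, and the names `OBSTACLES`, `collision_free`, `straight_line_free`, `connect`, `build_roadmap` and `query` are invented for this example. The query uses breadth-first search, which returns some collision-free path rather than an optimal one.

```python
import math
import random
from collections import deque

# Hypothetical 2D world: (center, radius) of circular obstacles in the unit square.
OBSTACLES = [((0.5, 0.5), 0.2)]

def collision_free(q):
    """A configuration is valid if it lies outside every obstacle."""
    return all(math.dist(q, center) > radius for center, radius in OBSTACLES)

def straight_line_free(a, b, steps=20):
    """Straight-line local planner: check evenly spaced points between a and b."""
    return all(
        collision_free((a[0] + (b[0] - a[0]) * t / steps,
                        a[1] + (b[1] - a[1]) * t / steps))
        for t in range(steps + 1)
    )

def connect(nodes, edges, i, k):
    """Try to link node i to its k nearest neighbors with the local planner."""
    nearest = sorted((j for j in range(len(nodes)) if j != i),
                     key=lambda j: math.dist(nodes[i], nodes[j]))[:k]
    for j in nearest:
        if straight_line_free(nodes[i], nodes[j]):
            edges[i].add(j)
            edges[j].add(i)

def build_roadmap(n_nodes=200, k=8, seed=0):
    """Pre-processing: sample valid nodes, then connect nearby pairs."""
    rng = random.Random(seed)
    nodes = []
    while len(nodes) < n_nodes:
        q = (rng.random(), rng.random())
        if collision_free(q):  # discard nodes that are in collision
            nodes.append(q)
    edges = {i: set() for i in range(n_nodes)}
    for i in range(n_nodes):
        connect(nodes, edges, i, k)
    return nodes, edges

def query(nodes, edges, start, goal, k=8):
    """Query: attach start and goal to the roadmap, then search for a path."""
    for q in (start, goal):
        nodes.append(q)
        edges[len(nodes) - 1] = set()
        connect(nodes, edges, len(nodes) - 1, k)
    s, g = len(nodes) - 2, len(nodes) - 1
    prev = {s: None}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        if u == g:
            break
        for v in edges[u]:
            if v not in prev:
                prev[v] = u
                queue.append(v)
    if g not in prev:
        return None  # start and goal lie in different roadmap components
    path, node = [], g
    while node is not None:
        path.append(nodes[node])
        node = prev[node]
    return path[::-1]
```

A call such as `query(*build_roadmap(), (0.05, 0.05), (0.95, 0.95))` returns a list of waypoints or `None`. The sketch returns only the waypoints; a full planner would rerun the local planner along each edge to regenerate the intermediate configurations.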

System Structure of PRM: object oriented approach

PRM motion planning framework

[Diagram: framework components Node Generation, Node Validation, Node Connection (build edges), Local Connection, and Query, with modules Dynamics, Potential Functions, and Collision Detector; label: collision-free edge]

Motion Planning in robotics and computational biology

Robotics:
• Configuration Space
• A robot
• Collision Detector
• Roadmap Construction (a connectivity graph)
• Query (find collision-free path)

Computational biology:
• Conformation Space
• Model ‘ligand’ or protein as articulated robots
• Potential Calculation
• Roadmap Construction (a connectivity graph)
• Query (find energetically feasible path)

Although the problems appear different, they can both use the same motion planning framework, which works in the abstract C-space.

PRM: Node Generation

1. Randomly generate a conformation and determine all atoms’ coordinates.

2. Compute the potential energy E of the conformation and retain the node with probability P(E). (Note: this is just one way to do it.)
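The text above does not spell out P(E). As one possible illustration, the sketch below assumes a linear ramp between two thresholds: conformations below `e_min` are always kept, those above `e_max` are never kept, and the probability falls linearly in between. The thresholds, the toy torsion-angle conformation, and the toy potential are all assumptions made for this example, not values from the slides.

```python
import math
import random

def acceptance_probability(energy, e_min, e_max):
    """Assumed linear ramp: 1 below e_min, 0 above e_max, linear in between."""
    if energy < e_min:
        return 1.0
    if energy > e_max:
        return 0.0
    return (e_max - energy) / (e_max - e_min)

def random_conformation(rng, n_angles=4):
    """Toy conformation: a list of torsion angles in radians."""
    return [rng.uniform(-math.pi, math.pi) for _ in range(n_angles)]

def toy_potential(q):
    """Toy potential with its minimum at all angles equal to zero."""
    return sum(1.0 - math.cos(a) for a in q)

def sample_nodes(n_nodes, e_min, e_max, rng, max_tries=100_000):
    """Generate conformations and retain each one with probability P(E)."""
    nodes = []
    for _ in range(max_tries):
        if len(nodes) == n_nodes:
            break
        q = random_conformation(rng)
        e = toy_potential(q)
        if rng.random() < acceptance_probability(e, e_min, e_max):
            nodes.append((q, e))
    return nodes
```

With this kind of filter the roadmap is denser in low-energy regions while still keeping a thinner sample of moderately high-energy conformations, which may be needed to cross barriers.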

PRM: Roadmap Connection

1. Find the k closest nodes for each roadmap node

2. Calculate the weight of the straight-line path [Singh, Latombe, Brutlag, 1999] as a function of the probabilities P_i, where P_i is the probability of moving between intermediate configurations i and i+1

Low weight <=> more energetically feasible.
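The weight formula itself is not reproduced in the text above. The sketch below shows one form that is consistent with "low weight <=> more energetically feasible": it assumes a Metropolis-style step probability (downhill moves have probability 1, uphill moves have probability exp(-ΔE/kT)) and sums -log(P_i) along the straight line. Both choices, along with the names `step_probability` and `edge_weight` and the value of `kT`, are assumptions for illustration only.

```python
import math

def step_probability(e_from, e_to, kT=0.6):
    """Assumed Metropolis-style rule for one step between intermediate conformations."""
    delta = e_to - e_from
    return 1.0 if delta <= 0 else math.exp(-delta / kT)

def edge_weight(q_a, q_b, potential, steps=10, kT=0.6):
    """Sum of -log(P_i) over consecutive points on the straight line from q_a to q_b."""
    energies = []
    for t in range(steps + 1):
        s = t / steps
        q = [a + (b - a) * s for a, b in zip(q_a, q_b)]
        energies.append(potential(q))
    weight = 0.0
    for e0, e1 in zip(energies, energies[1:]):
        # -log(step_probability(e0, e1)) equals max(0, e1 - e0) / kT; the closed
        # form avoids log(0) when exp() underflows for very large energy jumps.
        weight += max(0.0, e1 - e0) / kT
    return weight
```

Under these assumptions the weight is direction-dependent: a purely downhill edge costs 0, while the same edge traversed uphill costs the total energy climb divided by kT.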

PRM: Querying the Roadmap (a 10-ALA folding case)

• Add start and goal to the roadmap

• Extract the smallest-weight path (energetically most feasible) between them

[Figure: roadmap with start and goal]
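Extracting the smallest-weight path is a standard shortest-path problem on a graph with non-negative edge weights. A minimal sketch using Dijkstra's algorithm follows; the adjacency-dictionary representation and the function name `smallest_weight_path` are assumptions for this example, and the slides do not say which search algorithm was used.

```python
import heapq

def smallest_weight_path(edges, start, goal):
    """Dijkstra's algorithm. edges maps node -> {neighbor: non-negative weight}."""
    best = {start: 0.0}
    prev = {}
    heap = [(0.0, start)]
    while heap:
        cost, u = heapq.heappop(heap)
        if u == goal:
            break
        if cost > best[u]:
            continue  # stale heap entry
        for v, w in edges.get(u, {}).items():
            new_cost = cost + w
            if new_cost < best.get(v, float("inf")):
                best[v] = new_cost
                prev[v] = u
                heapq.heappush(heap, (new_cost, v))
    if goal not in best:
        return None, float("inf")  # goal is not reachable from start
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    path.reverse()
    return path, best[goal]

# Toy roadmap with made-up weights.
roadmap = {
    "start": {"a": 1.0, "b": 4.0},
    "a": {"b": 1.0, "goal": 5.0},
    "b": {"goal": 1.0},
    "goal": {},
}
print(smallest_weight_path(roadmap, "start", "goal"))
```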


I. Protein Folding Pathways

Key Issues

• Goal: Study folding pathways to the known native fold.

• Validation
  - PRM roadmaps give us folding pathways, but how can we test if they are (close to) the natural folding path?

• Potential Functions
  - The paths in the PRM roadmaps are selected based on the potential function used to create the map.
  - How accurate does the potential function need to be to produce roadmaps with good paths?
  - More accurate => more time and fewer nodes (less coverage)

Validation of Folding Pathways

• Compare the order in which secondary structures form in our paths with experimental results
  - We used pulse labeling & N-state exchange results from Woodward et al. 1999, as suggested by Dr. Marty Scholtz at Texas A&M.

Protein GB1
  - 56 residues
  - 1 alpha helix
  - 4 beta-strands

Pulse-labeling result
• alpha helix (yellow) and beta-4 (red, and beta-3) form first

Potential Functions

Goal: simple, yet accurate enough for PRMs

Strategy: study importance of various terms and design a potential for PRM usage
  - may also yield insight into which terms are most influential in governing the folding process

Potentials we have used (derivations of the potential function in Levitt 1983)
  - van der Waals only
  - van der Waals + hydrogen bonds
  - van der Waals + hydrogen bonds + disulphide bonds
  - … + hydrophobic effect

Protein G results

PHI vs. PSI distribution. Nodes are sampled around the native fold using a set of normal distributions: {5, 10, 20, 40, 80, 160}.

Protein G’s native fold has one alpha-helix and four beta strands, which is reflected here by the dense samples around the alpha and beta regions.
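Sampling around the native fold can be sketched as perturbing each native phi/psi angle with Gaussian noise. The slide gives the set {5, 10, 20, 40, 80, 160} without units; the sketch assumes these are standard deviations in degrees. The function names and the equal number of samples per distribution are also assumptions for this example.

```python
import random

# Values from the slide; assumed here to be standard deviations in degrees.
STD_DEVS = [5, 10, 20, 40, 80, 160]

def wrap_angle(deg):
    """Map an angle in degrees to the interval [-180, 180)."""
    return (deg + 180.0) % 360.0 - 180.0

def sample_around_native(native_angles, per_std, rng):
    """Perturb every native phi/psi angle with Gaussian noise of each width."""
    samples = []
    for sd in STD_DEVS:
        for _ in range(per_std):
            samples.append([wrap_angle(rng.gauss(a, sd)) for a in native_angles])
    return samples

# Toy native fold: three residues with ideal alpha-helix angles (phi, psi).
native = [-57.0, -47.0] * 3
nodes = sample_around_native(native, per_std=4, rng=random.Random(0))
```

Narrow distributions concentrate nodes near the native fold, while the wide ones spread nodes toward unfolded conformations, which is one way to obtain the dense alpha and beta regions described above.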

Distribution of roadmap nodes

RMSD is the distance from protein G’s native fold.

The sample (10000 nodes) suggests a funnel structure around the native fold.
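For reference, RMSD (root-mean-square deviation) between two conformations with the same atom ordering can be computed as below. This is a minimal sketch that assumes the two structures are already superimposed; it does not perform an optimal alignment, and the slides do not state which variant was used.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally ordered lists of 3D points."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(total / len(coords_a))
```

Plotting each roadmap node's potential against its RMSD from the native fold gives the kind of distribution in which a funnel shape can be looked for.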

Potential profiles for roadmaps of different sizes

Peaks show where some atoms are close and the van der Waals term dominates (which is approximated by a constant).

Bigger roadmaps have smoother paths.

Peaks right before the goal indicate the tight packing of the native fold.

More samples around the peaks

Some peaks result from a locally sparse distribution of nodes. To optimize the final folding path, adding more samples around the peaks can improve the path.

Folding pathways for different start conformations

[Figure: snapshots a to f along folding paths. Letter labels: folding from the extended conformation (reverse order). Numerical labels: folding path from a random node in the roadmap.]


Protein GB1 Movie

Results of Protein A

Nodes are sampled around the native fold using a set of normal distributions.

Protein A is an all-alpha protein, which is shown by dense sampling at the alpha region.

Protein A: Potential vs. RMSD distribution

The overall ‘funnel’ shape is different from that of protein G, which possibly reflects different folding behaviors.

Note also the narrow ‘funnel’ for RMSD < 10 Å.

This may suggest that this region involves only the packing of secondary structures into the native fold (so the potential changes little).

Folding from different start conformations

• the packing of secondary structure.
• Both paths share some segments near the goal, suggesting that the folding path near the native fold may be largely independent of the start conformation.

[Figure: folding snapshots a to f]

Protein Folding Pathways: Summary of Preliminary Results

• Our results seem to be in accordance with the pulse-labeling experiments for proteins GB1 and A
  - more proteins must be studied

• Potential Energy Function…
  - van der Waals terms + hydrogen & disulphide bonds can capture some (high level) characteristics of the folding process
  - but they are probably not enough: what next?

• How can we analyze the paths contained in the PRM roadmaps?

http://www.cs.tamu.edu/faculty/amato/dsmft/


II. Ligand Binding

Ligand Binding

• Automated docking algorithms
  - AutoDock, Dock, FlexX, FLOG, FTDock, Gold, etc.
  - Often simplified by rigid ligand assumption

• PRM Approach (Singh et al., 1999)
  - rapidly explores high dimensional space
  - We use a PRM variant better suited for ligand binding

• User-Computer interaction
  - haptics (sense of touch) helps the user to understand molecular interaction
  - User may improve the PRM by suggesting candidate site regions.

Our Approach

• Generate Binding Site Candidates
  - Generate sample nodes (automated or user collected)
  - Push nodes to local minima
  - Connect nodes

• Recognize Binding Sites
  - Choose largest connected component (accessibility)
  - Discard nodes with potential larger than Emax
  - Score remaining nodes
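The first two recognition steps (largest connected component, then the Emax cutoff) can be sketched as follows. The adjacency-dictionary roadmap, the node energies, and the function names are assumptions for illustration. The idea behind the accessibility filter is that a candidate site disconnected from the main roadmap is probably unreachable for the ligand.

```python
from collections import deque

def connected_components(edges):
    """edges maps every node to an iterable of its neighbors (undirected graph)."""
    seen = set()
    components = []
    for start in edges:
        if start in seen:
            continue
        component = []
        queue = deque([start])
        seen.add(start)
        while queue:
            u = queue.popleft()
            component.append(u)
            for v in edges[u]:
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
        components.append(component)
    return components

def candidate_sites(edges, energy, e_max):
    """Keep nodes of the largest component whose potential does not exceed e_max."""
    largest = max(connected_components(edges), key=len)
    return [n for n in largest if energy[n] <= e_max]

# Toy roadmap: nodes 0-2 form the largest component, nodes 3-4 are isolated from it.
toy_edges = {0: [1], 1: [0, 2], 2: [1], 3: [4], 4: [3]}
toy_energy = {0: -5.0, 1: 2.0, 2: -1.0, 3: -9.0, 4: -9.0}
print(candidate_sites(toy_edges, toy_energy, e_max=0.0))
```

In the toy data, nodes 3 and 4 have the lowest energies but are dropped because they are not in the largest component.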

Generate Candidates: Automatic Node Generation

• Generate a collision free base

• Generate random values for other joint angles

• Keep this configuration if the potential is less than Emax

[Figure: protein and ligand, with the ligand base marked]

Grid Potential/Force

• Precalculate
  - Create a grid in space
  - Calculate the contribution of protein atoms to each grid point

• Real-Time
  - Find the grid points where ligand atoms are located
  - Calculate the potential/force
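A minimal sketch of the grid idea: the expensive sum over protein atoms is done once per grid point, and at run time each ligand atom only needs a table lookup. The pair potential (a Lennard-Jones 12-6 form with unit parameters), the nearest-grid-point lookup, and all names here are assumptions for illustration. Interpolating between grid points and computing forces are left out.

```python
import math

def pair_potential(r, r_min=0.5):
    """Assumed Lennard-Jones 12-6 term with unit parameters; r is clamped near zero."""
    r = max(r, r_min)
    return 4.0 * ((1.0 / r) ** 12 - (1.0 / r) ** 6)

def build_grid(protein_atoms, origin, spacing, shape):
    """Precalculate: total contribution of all protein atoms at every grid point."""
    grid = {}
    for i in range(shape[0]):
        for j in range(shape[1]):
            for k in range(shape[2]):
                point = (origin[0] + i * spacing,
                         origin[1] + j * spacing,
                         origin[2] + k * spacing)
                grid[(i, j, k)] = sum(pair_potential(math.dist(point, atom))
                                      for atom in protein_atoms)
    return grid

def ligand_potential(grid, ligand_atoms, origin, spacing, shape):
    """Real time: look up the nearest grid point of each ligand atom and sum."""
    total = 0.0
    for atom in ligand_atoms:
        index = tuple(
            min(max(round((atom[d] - origin[d]) / spacing), 0), shape[d] - 1)
            for d in range(3)
        )
        total += grid[index]
    return total

# Toy setup: one protein atom at the origin, a 5 x 5 x 5 grid with unit spacing.
grid = build_grid([(0.0, 0.0, 0.0)], origin=(-2.0, -2.0, -2.0), spacing=1.0, shape=(5, 5, 5))
value = ligand_potential(grid, [(1.1, 0.0, 0.0)], (-2.0, -2.0, -2.0), 1.0, (5, 5, 5))
```

The lookup cost depends only on the number of ligand atoms, not on the size of the protein, which is what makes this kind of table usable for interactive force feedback.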

Generate Candidates: Haptic Interaction

[Figure: PHANToM haptic device]

• User attaches haptic device to ligand, and moves it around
  - user feels the forces on ligand
  - ligand is rigid
  - force calculation is too slow, so use extrapolation techniques (grid potential)

• Ligand configurations (candidate sites) passed to planner
  - automatically sampled at regular intervals

Approximate Gradient Descent

• Push nodes to local minima
  - For each node, sample n close nodes
  - Choose the node with the lowest potential among them
  - Repeat until a local minimum or the iteration limit is reached
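The loop described above needs no analytic gradient, only potential evaluations. A minimal sketch, assuming a configuration is a list of floats and that "close" means within a box of half-width `step` per coordinate (both assumptions, as are the parameter names and defaults):

```python
import random

def push_to_local_minimum(q, potential, rng, n_samples=20, step=0.1, max_iters=100):
    """Repeatedly move to the best of n nearby samples until none improves."""
    energy = potential(q)
    for _ in range(max_iters):
        neighbors = [[x + rng.uniform(-step, step) for x in q]
                     for _ in range(n_samples)]
        best = min(neighbors, key=potential)
        best_energy = potential(best)
        if best_energy >= energy:
            break  # no sampled neighbor is lower: treat q as a local minimum
        q, energy = best, best_energy
    return q, energy

# Toy potential: a bowl with its minimum at the origin.
def bowl(q):
    return sum(x * x for x in q)

q_min, e_min = push_to_local_minimum([1.0, -1.0], bowl, random.Random(0))
```

Because neighbors are sampled at random, the stopping test is only approximate: a node may stop slightly above the true minimum if none of its n samples happens to be lower.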

Recognizing Binding Sites

• Pick low energy configurations in the largest connected component

• For each selected configuration, uniformly sample n configurations within a distance r (i.e., construct a local roadmap)

• Score each selected configuration by the average potential of its sampled configurations (local roadmap)
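The scoring step can be sketched as an average of the potential over a local sample. Here "within a distance r" is approximated by a uniform box of half-width `radius` around each coordinate, which is an assumption, as are the function name and the default values.

```python
import random

def score_configuration(q, potential, rng, n_samples=50, radius=0.5):
    """Average potential of n configurations sampled uniformly near q."""
    total = 0.0
    for _ in range(n_samples):
        sample = [x + rng.uniform(-radius, radius) for x in q]
        total += potential(sample)
    return total / n_samples

# Toy potentials: a narrow well and a broad well, both with minimum 0 at the origin.
def narrow_well(q):
    return sum(100.0 * x * x for x in q)

def broad_well(q):
    return sum(x * x for x in q)

rng = random.Random(0)
narrow_score = score_configuration([0.0, 0.0], narrow_well, rng)
broad_score = score_configuration([0.0, 0.0], broad_well, rng)
```

Both toy wells have the same potential at their minimum, but the broad well gets the lower average. This suggests why a neighborhood score can rank candidate sites differently from the raw potential.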

Potential vs. Scoring Function

1STP (protein = streptavidin, ligand = biotin, 11 dof)

[Figure panels: Potential, Score]


Experiments

Questions:

•What is the effect of rigid vs. flexible ligands?

•Can OBPRM identify binding sites?

•Can user provide helpful information?

Experiments

• First Experiment
  - treat ligand as rigid body

• Second Experiment
  - user collects and automated planner fine tunes

• Third Experiment
  - treat ligand as articulated (i.e., flexible)

Complexes (protein : ligand : degrees of freedom)
• 1A5Z: L-Lactate Dehydrogenase : Oxamate : 7
• 1LDM: M4-Lactate Dehydrogenase : Oxamate : 7
• 1STP: Streptavidin : Biotin : 11
} In Singh et al., 1999

Results (1A5Z and 1STP)

[Figure: for each complex, a timing chart (time in seconds, axis 0 to 2500) for the Rigid, User, and Articulated experiments, and an RMSD vs Score plot with series Rigid, User, Articulated, and Binding. Panels: RMSD vs Score (1A5Z, 7 dof); RMSD vs Score (1STP, 11 dof).]

Results (1LDM)

[Figure: a timing chart (time in seconds, axis 0 to 700) for the User and Articulated experiments, and an RMSD vs Score plot (7 dof) with series User, Articulated, and Binding.]

Conclusion

• In our examples, we could generate and identify configurations in the true binding site

• The scoring function may be improved

• Our results may be used as input for other automated docking programs

• User input improves efficiency, and haptic feedback helps the user better understand the problem

• We need a better user interface