47
Exploring Folding Landscapes with Motion Planning Techniques Bonnie Kirkpatrick Montana State University Dr. Nancy Amato Guang Song Xinyu Tang Texas A&M University Parallel Architectures, Algorithms, and Optimizations Laboratory

Exploring Folding Landscapes with Motion Planning Techniques

Embed Size (px)

DESCRIPTION

Exploring Folding Landscapes with Motion Planning Techniques. Bonnie Kirkpatrick Montana State University. Dr. Nancy Amato Guang Song Xinyu Tang Texas A&M University. Parallel Architectures, Algorithms, and Optimizations Laboratory. Outline. Motivation: Biopolymers - PowerPoint PPT Presentation

Citation preview

Page 1: Exploring Folding Landscapes with Motion Planning Techniques

Exploring Folding Landscapes with Motion Planning Techniques

Bonnie Kirkpatrick

Montana State University

Dr. Nancy Amato

Guang Song

Xinyu Tang

Texas A&M UniversityParallel Architectures, Algorithms, and Optimizations Laboratory

Page 2: Exploring Folding Landscapes with Motion Planning Techniques

Outline

Motivation: Biopolymers

Goal: Folding Landscapes

Method: Motion Planning

Application 1: Protein Folding

Application 2: RNA Folding

Page 3: Exploring Folding Landscapes with Motion Planning Techniques

Motivation: Biopolymers

Page 4: Exploring Folding Landscapes with Motion Planning Techniques

Protein and RNA Molecules

Protein and RNA molecules are a complex 3-dimensional folding of a sequence of bases.

Primary Structure– Sequence of bases– Each base is represented by a

letter of the alphabet– i.e. ACGUGCCAUCG– Obtained from experiment

Tertiary Structure– The sequence loops back on itself

and folds in 3-dimensions. Tertiary representation of an RNA molecule.

Page 5: Exploring Folding Landscapes with Motion Planning Techniques

Tertiary Structure

Chemical bonds (or contacts) form between complementary bases in close proximity.

There are many possible conformations of the primary sequence.

– Example sequence: CACAGAGUGU– Two possible conformations are shown.

Potential energy calculations based on number and types of bonds are used to classify conformations.

– The lowest energy conformation is known as the native structure.

– Conformations with few bonds and high energy are referred to as unfolded.

Two possible conformations of the sequence.

Bonds are blue.

Page 6: Exploring Folding Landscapes with Motion Planning Techniques

Goal: Folding Landscapes

Page 7: Exploring Folding Landscapes with Motion Planning Techniques

The Folding Process (The Black Box)

AGGCUACUGGGAGCCUUCUCCCC

Physical Laws

cause folding

Unfolded Conformation (high energy)

Native Conformation (low energy)

Page 8: Exploring Folding Landscapes with Motion Planning Techniques

Folding Landscapes

Description of the “black box” A space in which every point

corresponds to a conformation (or set of conformations) and its associated potential energy value (C-space).

A complete folding landscape contains a point for every possible conformation of a given sequence.

Tetrahymena Ribozyme Landscape [Russell, Zhuang, Babcock, Millett, Doniach, Chu, and Herschlag, 2002]

Native State

Conformation Space

Potential

Page 9: Exploring Folding Landscapes with Motion Planning Techniques

Folding Landscapes (cont.)

Conformational changes describe how a molecule changes physically to fold from one conformation to another– Continuous

Protein Folding Model Bond angles change with continuous rotations

– Discrete RNA Folding Model Bonds either exist or do not exist

Page 10: Exploring Folding Landscapes with Motion Planning Techniques

Native State

Features of Folding Landscapes

Folding pathways consist of the set of conformational changes a molecule is likely to fold though when moving from one conformation to another.

– N to X to Y Energy barriers are areas of the

landscape with high energy that separate groups of conformations.

– Y is separated from X and N Intermediate states are conformations

lying on the folding pathway represent local minimums of potential.

– Y and X Mutant α mRNA fragment [Chen and Dill, 2000]

Page 11: Exploring Folding Landscapes with Motion Planning Techniques

A Protein Folding Pathway

unfolded folded

Page 12: Exploring Folding Landscapes with Motion Planning Techniques

A RNA Folding Pathway

Phenylalanine tRNA [Hofacker, 1998]

Energy Barrier

Native State

Unfolded

Page 13: Exploring Folding Landscapes with Motion Planning Techniques

Mapping Folding Landscapes

• Existing techniques for mapping landscapes are limited to relatively short sequences (~200 nucleotides).

• A robotics motion planning technique called PRM has successfully been applied to protein folding.

Page 14: Exploring Folding Landscapes with Motion Planning Techniques

Method: Motion Planning

Page 15: Exploring Folding Landscapes with Motion Planning Techniques

Motion Planning

start

goal obstacles

(Basic) Motion Planning (in a nutshell):

Given a movable object, find a sequence of valid configurations that moves the object from the start to the goal.

Motion Planning for Foldable Objects:

Given a foldable object, find a valid folding sequence that transforms the object from one folded state to another.

Page 16: Exploring Folding Landscapes with Motion Planning Techniques

Native state

Construct the roadmap:1. Generate nodes.2. Connect to form roadmap

The Roadmap is like a net being laid down on protein’s potential landscape.

A conformation

Conformation space

Potential

Now the roadmap can be used:1. To find a path2. To extract multiple paths

Probabilistic Roadmap Method (PRM)

[Kavraki, Svestka, and Latombe, 1996]

Page 17: Exploring Folding Landscapes with Motion Planning Techniques

Application 1: Protein Folding

Page 18: Exploring Folding Landscapes with Motion Planning Techniques

Outline

Probabilistic Roadmap Methods (PRMs) for Protein Folding

– the native fold is known – bias sampling around native fold

Results– Protein folding landscapes– Secondary structure formation order and validation

timed contact map– Folding kinetics

Page 19: Exploring Folding Landscapes with Motion Planning Techniques

Model of a protein

[Song and Amato, 2001]

amino acid: pair of phi/psi angles protein: a sequence of amino acids.

– conformation node is:

Page 20: Exploring Folding Landscapes with Motion Planning Techniques

PRM: Node Generation N

Take advantage of the known native state.– map the potential landscape/funnel leading to it.– sample around it and gradually grow out. – generate conformations by randomly selecting phi/psi

angles

Criterion for accepting a node: Compute potential energy E of each node and retain it with probability P(E):

Page 21: Exploring Folding Landscapes with Motion Planning Techniques

PRM: Node Generation

Start with native structure. Gradually grow out.

Denser distribution around native state

Native state

Page 22: Exploring Folding Landscapes with Motion Planning Techniques

PRM: Roadmap Connection

1. Find k closest nodes for each roadmap node

2. Assign edge weight to reflect energetic feasibility:

lower weight more feasible [Singh, Latombe, Brutlag, 1999]

Native state

Page 23: Exploring Folding Landscapes with Motion Planning Techniques

Energy Computation

where

Potential (ref. Levitt’83)– van der Waals + hydrogen bonds + disulphide bonds +

hydrophobic effect– All-atom model

Free Energy (ref. Fiebig & Dill ’93, Munoz & Eaton’99)

Page 24: Exploring Folding Landscapes with Motion Planning Techniques

PRMs for Protein Folding: Key Issues

Validation– In RECOMB ‘01 (Song & Amato), our results

validated with hydrogen exchange experiments. [Li & Woodward 1999]

Energy Functions– The degree to which the roadmap accurately

reflects folding landscape depends on the quality of energy calculation.

Page 25: Exploring Folding Landscapes with Motion Planning Techniques

Analysis of Landscape

Folding Potential Landscape

Secondary structure formation order– timed contact map– experimental validation

Studying Folding Kinetics– 2-state folding kinetics– calculation of folding rates– identifying 2-state, 3-state, … k-state kinetics

Page 26: Exploring Folding Landscapes with Motion Planning Techniques

Distributions for different types:Potential Energy vs. RMSD for roadmap nodes

all alpha alpha + beta all beta

Page 27: Exploring Folding Landscapes with Motion Planning Techniques

Timed Contact Map:formation order for a Path

protein G (domain B1)

(IV: 1-4)

140 143

140 143 140

141 142 144

139 143 143 114

142135

131

1-4

3-4Average T = 142

Formation order:, 3-4, 1-2, 1-4

residue #

resid

ue #

1-2

Page 28: Exploring Folding Landscapes with Motion Planning Techniques

Validating Folding Pathways

Protein GB1 (56 amino acids)— 1 alpha helix & 4 beta-strands

Hydrogen Exchange Results first helix, and beta 3-4

Our Paths 80%: helix, beta 3-4, beta 1-2, beta 1-420%: helix, beta 1-2, beta 3-4, beta 1-4

[Li & Woodward 1999]

• our paths are: from all the nodes with little structure to the native state• secondary structure formation order checked on each path w/ timed contact map

Page 29: Exploring Folding Landscapes with Motion Planning Techniques

Secondary structure formation order and validation

•Proteins primarily from [Munoz & Eaton PNAS’99] for comparison purposes

•Contact us if you want us to analyze your proteins!

PDB name Num of Residues

2nd structures Comparison w/ Exp.

[Li & Woodward ’99]

1GB1 56 1 alpha + 4 beta Agreed

1BDD 60 3 alpha Agreed

1SHG 62 5 beta N/a

1COA 64 1 alpha + 4beta Agreed

1SRL 64 5 beta N/a

1CSP 67 7 beta N/a

1NYF 67 3 beta N/a

1MJC 69 7 beta N/a

2AIT 74 7 beta N/a

1UBQ 76 1alpha + 5 beta Agreed

1PKS 79 1 alpha + 5 beta N/a

1PBA 81 3 alpha + 3 beta N/a

2ABD 86 5 alpha N/a

1BRN 110 3 alpha + 7 beta Not sure

Page 30: Exploring Folding Landscapes with Motion Planning Techniques

Folding kinetics:statistical mechanical model

Proteins treated as statistical system Define interactions => partition function Free energy as a function of reaction coordinate (R) Then decide folding kinetics and folding rate [Munoz & Eaton, Alm & Baker. PNAS’99]

Assumption and limitation of statistical model– Very limited interactions to simplify partition function calculation– Assume selected reaction coordinate good (monotonically increasing)– Cannot provide folding trajectories

Strength: as a theoretical model, it is good for analysis

R

FU

N

Page 31: Exploring Folding Landscapes with Motion Planning Techniques

Free Energy Landscape:2-state folding kinetics

Our method can produce similar results (plus more).– 2-state folding kinetics indicated

Both plots ‘imply’ nativelikeness should always increase Plots like these lose info because of averaging effect

statistical model [Munoz & Eaton PNAS’99]

Native ContactsF

ree

Ene

rgy our

roadmapmodel

Blue line: free energy average Blue, red and green lines are three levels of approximation

Page 32: Exploring Folding Landscapes with Motion Planning Techniques

Studying folding kinetics at pathway level

A B

cluster paths into several groups

extract 2-state,3-state, …, k-state kinetics from same roadmap

not possible with statistical mechanical models

A B Average

2-state 3-state 2-state

Page 33: Exploring Folding Landscapes with Motion Planning Techniques

Studying folding kinetics at pathway level

Native contacts

Free energy

• trajectories not available from statistical model

• native contact not monotonically increasing

•Diverse free energy profiles

Protein G

Page 34: Exploring Folding Landscapes with Motion Planning Techniques

Protein Folding:Conclusion & Future Work

• PRM roadmaps approximate folding landscapes• Efficiently produce multiple folding pathways

– Secondary structure formation order– better than trajectory-based simulation methods, such as

Monte Carlo, molecular dynamics

• Provide a good way to study folding kinetics – multiple folding kinetics in same landscape (roadmap)– natural way to study the statistical behavior of folding– more realistic than statistical models (e.g. Lattice models,

Baker’s model PNAS’99, Munoz’s model, PNAS’99)

Page 35: Exploring Folding Landscapes with Motion Planning Techniques

Application 2: RNA Folding

Page 36: Exploring Folding Landscapes with Motion Planning Techniques

Outline

RNA Model using secondary structure Conformation Space Node Generation Node Evaluation Node Connection Edge Weights

Page 37: Exploring Folding Landscapes with Motion Planning Techniques

RNA Secondary Structure

Two-dimensional representation of the tertiary structure Planar representation Sufficient structural information

Pseudo knots are considered a tertiary structure, rather than a secondary structure

Page 38: Exploring Folding Landscapes with Motion Planning Techniques

Violates criteria (2)Violates criteria (1)

Secondary Structure Formalized

A secondary structure conformation is specified by a set of intra-chain contacts (bonds between base pairs) that follow certain rules.

Given any two intra-chain contacts [i, j] with i < j and [k, l] with k < l, then:1) If i = k, then j = l

• Each base can appear in only one contact pair

2) If k < j, then i < k < l < j• No pseudo-knots

Page 39: Exploring Folding Landscapes with Motion Planning Techniques

PRM: Conformation Space

Let U be the set of every possible combination of contact pairs.

Let C-space (the conformation space), C, be the sub-set of U containing only valid secondary structures.

C-space is smaller than U, but is still very large.– Sequence: (ACGU)10

– Length: 40 nucleotides– C-Space: 1.6x108 structures

Purpose: generate nodes in C-space that describe the space without covering it

C-space

Where n is the number of possible contact pairs

Page 40: Exploring Folding Landscapes with Motion Planning Techniques

PRM: Node Generation

Random Node Generation Algorithm– Starting with an empty configuration, c, random contacts

are added to c one at a time.– Each step preserves the condition that c contains a valid set

of base pair contacts.– Contacts are added until there are no more contacts that do

not conflict with the contact set of c. Every node generated has valid secondary structure

and is a member of C-space. Since every generated node has the maximal

number of contacts, the sampling is biased toward the area of C-space near the native state.

C-space

Page 41: Exploring Folding Landscapes with Motion Planning Techniques

PRM: Node Evaluation

Evaluation of Nodes– Potential energy determines how good a node is.– Only add a node to the roadmap if it has a low

energy.– Probability of a node q being added to the

roadmap:

Page 42: Exploring Folding Landscapes with Motion Planning Techniques

PRM: Node Connection

Given two nodes in C-Space, C1 and C2, find a path between them consisting of a sequence of nodes: { C1 = S1, S2, …, Sn-1, Sn = C2 }

The path must have the property that for each i, 1 < i < n, the set of contact pair of Si differs from that of Si-1 by the application of one transformation operation:

(1) open or (2) close a single contact pair.

C1 = S1 S2 Sn-1 Sn = C2…

Page 43: Exploring Folding Landscapes with Motion Planning Techniques

Node Connection (cont.)

There exists a path between any two nodes in C-Space.

Not just any path will do; we want a good one. Bad paths have high energy nodes in them. How do we find the lowest energy path?

Page 44: Exploring Folding Landscapes with Motion Planning Techniques

Node Connection (cont.)

more contacts less potential energy Heuristic: if a contact is opened by the

transition from one node to another, try to close a contact in the next transition

c1 = s1: ..(.((..))).. open

s2: ..(.(....)).. close

s3: ..(.((.).)).. open

s4: ..(..(.)..).. close

C2 = s5: ..(.((.)).)..

Page 45: Exploring Folding Landscapes with Motion Planning Techniques

Edge Weight

Depends on the nodes generated in the node connection phase.

Difference in potential energy ΔEi = E(si+1) – E(si)

Page 46: Exploring Folding Landscapes with Motion Planning Techniques

Future Work

Analysis of the roadmap– Finding the low energy folding pathways– Shortest path algorithm

Validation– How do we know if our results agree with experimental results?– Proposal

Compare a fully enumerated roadmap to experimental folding rates Compare a more sparse roadmap with a fully enumerated roadmap

– Proposal Solve the master equation using stacking pairs (they are representative of all

the dynamics) and our model Compare our results with results from the Zhang and Chen’s statistical

mechanical model [2002]

Page 47: Exploring Folding Landscapes with Motion Planning Techniques

References

Shi-Jie Chen and Ken A. Dill. Rna folding energy landscapes. PNAS, 97:646-651, 2000.Ivo L. Hofacker. Rna secondary structures: A tractable model of biopolymer folding. J.Theor.Biol.,

212:35-46, 1998.Ivo L. Hofacker Jan Cupal and Peter F. Stadler. Dynamic programming algorithm for the density of

states of rna secondry structures. Computer Science and Biology 96, 96:184-186, 1996.L. Kavraki, P. Svestka, J. C. Latombe, and M. Overmars. Probabilistic roadmaps for path planning in

high-dimensional conguration spaces. IEEE Trans. Robot. Automat., 12(4):566-580, August 1996.J. C. Latombe. Robot Motion Planning. Kluwer Academic Publishers, Boston, MA, 1991.R. Li and C. Woodward. The hydrogen exchange core and protein folding. Protein Sci., 8:1571-1591,

1999.John S. McCaskill. The equilibrium partition function and base pair binding probabilities for rna

secondary structure. Biopolymers, 29:1105-1119, 1990.Ruth Nussinov, George Piecznik, Jerrold R. Griggs, and Danel J. Kleitman. Algorithms for loop

matching. SIAM J. Appl. Math., 35:68-82, 1972.R. Russell, X. Zhuang, H. Babcock, I. Millet, S. Doniach, S. Chu, and D. Herschlag. Exploring the

folding landscape of a structured RNA. Proc. Natl. Acad. Sci., 99:155-60., 2002.Proc. Natl. Acad. Sci. U.S.A. 99, 155-60.D. Sanko and J.B. Kruskal. Time warps, string edits and macromolecules: the theory and practice of

sequence comparison. Addison Wesley, London, 1983.A.P. Singh, J.C. Latombe, and D.L. Brutlag. A motion planning approach to exible ligand binding. In

7th Int. Conf. on Intelligent Systems for Molecular Biology (ISMB), pages 252-261, 1999.G. Song and N. M. Amato. Using motion planning to study protein folding pathways. In Proc. Int. Conf.

Comput. Molecular Biology (RECOMB), pages 287-296, 2001.Stefan Wuchty. Suboptimal secondary structures of rna. Master Thesis, 1998.