INFORMS 2004 Hyun-suk Yoon Joel Sokol School of Industrial and Systems Engineering Georgia Institute of Technology Optimization Approaches to HP Lattice Protein Folding


Page 1: INFORMS 2004

INFORMS 2004

Hyun-suk Yoon

Joel Sokol

School of Industrial and Systems Engineering

Georgia Institute of Technology

Optimization Approaches to HP Lattice Protein Folding

Page 2: INFORMS 2004

Table of contents

Introduction to Protein Folding

Integer Programming (IP) Approach

Introduction to Constraint Programming (CP)

CP Approach

Discussion

Page 3: INFORMS 2004

Protein

• Sequence of amino acids

• Size: 30 ~ 10,000 amino acids, a few hundred amino acids on average

• Folds quickly into a compact 3-D structure in its minimum-energy state.

• Exponential number of possible 3-D structures.

Page 4: INFORMS 2004

Problem description

How can we find a 3D structure of a protein

given a sequence of amino acids?

Page 5: INFORMS 2004

Motivation

1. Design drugs

• Most drugs work by attaching themselves to a protein.

• Knowing the 3-D shapes of proteins will help to design drugs.

2. Detect misfolding

• Proteins occasionally do not fold into the correct 3-D shape.

• Misfolded proteins are known causes of a number of diseases, e.g., Alzheimer’s disease and Parkinson’s disease.

Page 6: INFORMS 2004

Protein folding

How to figure out protein folding

• Experimental techniques: X-ray crystallography and NMR spectroscopy

• Computational techniques: e.g., Folding@Home

Protein Data Bank (PDB)

• http://www.rcsb.org/pdb

• Worldwide repository for 3-D structure data of large biological molecules (proteins and nucleic acids).

Page 7: INFORMS 2004

HP model and Lattice model

HP model

• Each amino acid is classified as Hydrophobic (H) or Polar (P).

• 20 types of amino acids: 8 H’s and 12 P’s

Lattice model

• Locate each amino acid on a point of a cubic lattice.

• Parity problem of the cubic lattice: avoided by a triangular or diagonal lattice model.

Page 8: INFORMS 2004

HP lattice model

• HP model + Lattice model: the simplest protein model

- Advantage: enumeration techniques can be used to locate amino acids.

- Disadvantage: low resolution, no explicit local interactions, equal bond lengths

• Lau and Dill (1989): minimizing total energy in the HP lattice model = maximizing the number of H-H contacts.

Page 9: INFORMS 2004

Example of HP lattice model

(Figure legend: hydrophobic amino acid, polar amino acid, peptide bond, H-H contact.)

Number of H-H contacts = number of adjacencies between hydrophobic amino acids (except for peptide bonds)
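The contact count defined above can be sketched in a few lines. This is a minimal illustration, assuming a 2-D square lattice, a sequence given as a string of `H` and `P`, and one `(x, y)` coordinate per amino acid; the function name and the four-residue example are hypothetical.

```python
def count_hh_contacts(sequence, coords):
    """Count H-H contacts of a fold on the 2-D square lattice.

    sequence: string over {'H', 'P'}; coords: list of (x, y), one per residue.
    """
    position_of = {point: k for k, point in enumerate(coords)}
    contacts = 0
    for k, (x, y) in enumerate(coords):
        if sequence[k] != "H":
            continue
        # Look only right and up so each adjacent pair is counted once.
        for neighbor in ((x + 1, y), (x, y + 1)):
            m = position_of.get(neighbor)
            # abs(m - k) > 1 excludes peptide bonds (chain neighbors).
            if m is not None and sequence[m] == "H" and abs(m - k) > 1:
                contacts += 1
    return contacts


# Hypothetical example: four residues on a unit square, so residues 0 and 3 touch.
square_fold = [(0, 0), (0, 1), (1, 1), (1, 0)]
print(count_hh_contacts("HPPH", square_fold))
```

Scanning only two of the four neighbors is a design choice that avoids counting each contact twice.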

Page 10: INFORMS 2004

Literature review

Protein topology

• Levitt and Chothia (1976) represent the 2-D structural topology of proteins in diagrammatic form.

• Richardson (1977) presents the first systematic survey of protein topology.

HP lattice model

• Lau and Dill (1989) study an HP model on the square and cubic lattices.

• Berger and Leighton (1998) and Crescenzi et al. (1998) prove that protein folding in the HP lattice model is NP-complete.

Page 11: INFORMS 2004

Table of contents

Introduction to Protein Folding

Integer Programming (IP) Approach

Introduction to Constraint Programming (CP)

CP Approach

Discussion

Page 12: INFORMS 2004

General model

Max  The number of H-H contacts

s.t. 1. (Assignment) Each amino acid must occupy exactly one lattice point.

2. (Non-overlapping) No two amino acids may share the same lattice point.

3. (Connectivity) Every two amino acids that are consecutive in the protein's sequence must also occupy adjacent lattice points.
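For very short sequences the general model can be solved by plain enumeration, which makes the three constraints concrete. The sketch below is not the IP or CP approach of the talk; it is a brute-force search over self-avoiding walks on the 2-D square lattice, with hypothetical function names, meant only to show what a feasible and an optimal fold are.

```python
MOVES = ((1, 0), (-1, 0), (0, 1), (0, -1))  # Right, Left, Up, Down


def hh_contacts(sequence, coords):
    """H-H adjacencies between residues that are not chain neighbors."""
    index = {p: k for k, p in enumerate(coords)}
    total = 0
    for k, (x, y) in enumerate(coords):
        for q in ((x + 1, y), (x, y + 1)):  # each adjacent pair counted once
            m = index.get(q)
            if m is not None and abs(m - k) > 1 and sequence[k] == "H" and sequence[m] == "H":
                total += 1
    return total


def best_fold(sequence):
    """Enumerate every self-avoiding walk and keep one with the most contacts."""
    best_score, best_coords = -1, None
    coords, used = [(0, 0)], {(0, 0)}

    def extend():
        nonlocal best_score, best_coords
        if len(coords) == len(sequence):      # assignment: every residue placed
            score = hh_contacts(sequence, coords)
            if score > best_score:
                best_score, best_coords = score, list(coords)
            return
        x, y = coords[-1]
        for dx, dy in MOVES:                  # connectivity: step to a neighbor
            p = (x + dx, y + dy)
            if p not in used:                 # non-overlapping: point must be free
                coords.append(p)
                used.add(p)
                extend()
                coords.pop()
                used.remove(p)

    extend()
    return best_score, best_coords


score, fold = best_fold("HPPHPPH")  # hypothetical 7-residue sequence
print(score, fold)
```

The number of walks grows exponentially with the sequence length, so this only works for toy instances.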

Page 13: INFORMS 2004

Two IP models

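• Model IP-1: Uses the coordinates of each amino acid.

• Model IP-2: Uses the direction (Up, Down, Left, Right) from each amino acid to the next.

(Figure: a three-amino-acid chain. IP-1 places amino acids 1, 2, 3 at (0,0), (0,1), (1,1); IP-2 encodes the same chain as the moves Up, Right.)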
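The two encodings describe the same fold and can be converted into each other. The following sketch assumes the 2-D case with Up = +y and Right = +x, matching the three-residue example on this slide; the helper names are hypothetical.

```python
STEP = {"Up": (0, 1), "Down": (0, -1), "Left": (-1, 0), "Right": (1, 0)}


def directions_to_coords(directions, start=(0, 0)):
    """Direction encoding -> coordinate encoding."""
    coords = [start]
    for d in directions:
        dx, dy = STEP[d]
        x, y = coords[-1]
        coords.append((x + dx, y + dy))
    return coords


def coords_to_directions(coords):
    """Coordinate encoding -> direction encoding (consecutive points must be adjacent)."""
    name_of = {step: name for name, step in STEP.items()}
    return [name_of[(x2 - x1, y2 - y1)] for (x1, y1), (x2, y2) in zip(coords, coords[1:])]


print(directions_to_coords(["Up", "Right"]))
print(coords_to_directions([(0, 0), (0, 1), (1, 1)]))
```

A direction sequence satisfies connectivity by construction, but it can still revisit a lattice point, so the non-overlapping condition has to be enforced separately.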

Page 14: INFORMS 2004

2-D vs 3-D

• Researchers often use a 2-D model instead of 3-D and then attempt to extend it to 3-D.

• Our models extend easily from 2-D to 3-D:

- Model IP-1: (x,y) → (x,y,z)

- Model IP-2: add two more directions (forward, backward).

Page 15: INFORMS 2004

Solving IP Models

Defining decision variables → Formulating the problem → Preprocessing → Running it with CPLEX

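$$
\begin{aligned}
\max\ & \sum_{i,j}\sum_{d} y_{ijd} \\
\text{s.t.}\ & \sum_{i,j} x_{ijk} = 1 && \forall k && \text{(Assignment)} \\
& \sum_{k} x_{ijk} \le 1 && \forall i,j && \text{(Non-overlapping)} \\
& x_{ijk} - x_{(i+1)j(k+1)} - x_{(i-1)j(k+1)} - x_{i(j+1)(k+1)} - x_{i(j-1)(k+1)} \le 0 && \forall i,j,k && \text{(Connectivity)} \\
& y_{ijd} \le \sum_{k} h_k x_{ijk}, \quad y_{ijd} \le \sum_{k} h_k x_{(i,j)+d,\,k} && \forall i,j,d && \text{(Define y)} \\
& x_{ijk},\ y_{ijd}\ \text{binary}
\end{aligned}
$$

x_{ijk} = 1 if the kth amino acid is located at (i,j), 0 otherwise.

y_{ijd} = 1 if the adjacent lattice points (i,j) and (i,j)+d are both occupied by hydrophobic amino acids, 0 otherwise.

h_k = 1 if the kth amino acid is hydrophobic, 0 otherwise.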
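To make the roles of x and y concrete, the sketch below builds x from a given fold and checks the assignment, non-overlapping, and connectivity constraints directly. Several points are assumptions rather than details from the slides: h_k marks hydrophobic residues, y may be 1 only when both lattice points hold a hydrophobic residue, d ranges over Right and Up only so that each pair is counted once, and the grid is a `size` by `size` square. The function name is hypothetical.

```python
from itertools import product

DIRS = ((1, 0), (0, 1))  # assumption: d in {Right, Up}, so each pair is counted once


def check_ip_solution(sequence, coords, size):
    """Check the constraints for x built from coords; return the largest feasible sum of y."""
    n = len(sequence)
    cells = list(product(range(size), repeat=2))
    x = {(i, j, k): 0 for (i, j) in cells for k in range(n)}
    for k, (i, j) in enumerate(coords):
        x[i, j, k] = 1
    h = [1 if a == "H" else 0 for a in sequence]  # assumed meaning of h_k

    # Assignment: each amino acid occupies exactly one lattice point.
    assert all(sum(x[i, j, k] for i, j in cells) == 1 for k in range(n))
    # Non-overlapping: each lattice point holds at most one amino acid.
    assert all(sum(x[i, j, k] for k in range(n)) <= 1 for i, j in cells)
    # Connectivity: amino acid k+1 sits on a neighbor of amino acid k.
    for (i, j), k in product(cells, range(n - 1)):
        neighbors = ((i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1))
        assert x[i, j, k] <= sum(x.get((a, b, k + 1), 0) for a, b in neighbors)

    def h_at(i, j):
        # 1 if a hydrophobic amino acid sits at (i, j), else 0.
        return sum(h[k] * x.get((i, j, k), 0) for k in range(n))

    # Define y: each y is bounded by the H-indicators of both end points.
    return sum(min(h_at(i, j), h_at(i + di, j + dj)) for (i, j) in cells for di, dj in DIRS)


square_fold = [(0, 0), (0, 1), (1, 1), (1, 0)]
print(check_ip_solution("HPPH", square_fold, size=2))
print(check_ip_solution("HHPH", square_fold, size=2))
```

Under these assumptions the sum of y also counts hydrophobic pairs joined by a peptide bond. Their number is fixed by the sequence, so it shifts the objective by a constant without changing which folds are optimal.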

Page 16: INFORMS 2004

Computational results

Instance: 1PSV

• 28 amino acids: one of the smallest human proteins.

• Obtained data from PDB.

• Truncate to different sizes: 12, 18, 23, 28.

• Optimal solution:

Page 17: INFORMS 2004

Computational results (cont)

• CPLEX running times (seconds)

        N = 12   N = 18   N = 23   N = 28
IP-1    9.32     72.61    30000+   30000+
IP-2    13.85    30000+   30000+   30000+

- IP does not work well.

- It takes a long time to solve the 23 and 28 amino acid instances.

Page 18: INFORMS 2004

IP did not work well

• Why?

- High degeneracy: many structures have the same minimum energy.

- Symmetry: the IP formulation contains much symmetry.

• CP is known to perform better than IP when the IP formulation contains much symmetry.

• So we move on to CP.

Page 19: INFORMS 2004

Table of contents

Introduction to Protein Folding

Integer Programming (IP) Approach

Introduction to Constraint Programming (CP)

CP Approach

Discussion

Page 20: INFORMS 2004

Concepts of CP

Constraint programming (CP)

• Study of modeling and solving a system of logical constraints using search techniques.

• Began in the 1980s as part of artificial intelligence research.

• Two main procedures: domain reduction and constraint propagation
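Domain reduction and constraint propagation can be illustrated on a single all-different constraint. The sketch below is a deliberately simple propagator written for this note, with hypothetical variable names; production CP solvers typically apply stronger filtering.

```python
def propagate_all_different(domains):
    """Remove every fixed value from the other variables' domains until nothing changes."""
    domains = {var: set(dom) for var, dom in domains.items()}
    changed = True
    while changed:
        changed = False
        for var, dom in domains.items():
            if len(dom) == 1:
                (value,) = dom
                for other, other_dom in domains.items():
                    if other != var and value in other_dom:
                        other_dom.discard(value)  # domain reduction
                        changed = True            # may fix another variable: propagate
    return domains


# Hypothetical variables: lattice-point codes for three amino acids.
reduced = propagate_all_different({"a": {1}, "b": {1, 2}, "c": {1, 2, 3}})
print(reduced)
```

Fixing `a` removes 1 from `b`, which fixes `b` and in turn removes 2 from `c`, so all three variables are decided without any search.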

Page 21: INFORMS 2004

CP vs IP

• Advantages and disadvantages

     Advantages                                      Disadvantages
CP   More expressive; more effective in some cases   Less predictable; a lower bound may not exist
IP   A lower bound always exists                     Less expressive

• Unified methodologies combining CP and IP have been designed in recent years.

Page 22: INFORMS 2004

CP previous research

• Smith (1996) shows environments where CP may work better than IP.

• Barták (1999), Smith (1995), and the ILOG Solver 5.0 manual (2000) report successful applications of CP in many areas.

• Easton (2003) and Milano (2004) deal with combining CP and IP.

Page 23: INFORMS 2004

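Three CP models

• Model CP-1, CP-2: Use the direction (Up, Down, Left, Right).

• Model CP-3: Uses the combination of coordinates.

(Figure: the same three-amino-acid chain. CP-1 and CP-2 encode it as the moves Up, Right. CP-3 encodes each lattice point as a single integer: (0,0) → 0·3+0 = 0, (0,1) → 0·3+1 = 1, (1,1) → 1·3+1 = 4.)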
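The coordinate combination in the CP-3 example maps a lattice point (x, y) to the single integer 3x + y. A minimal sketch of that encoding follows, assuming a grid of width 3 as in the example and hypothetical helper names; with one integer per amino acid, the non-overlapping condition becomes a plain all-different check.

```python
def encode(point, width):
    """Combine (x, y) into one integer; requires 0 <= y < width."""
    x, y = point
    return x * width + y


def decode(code, width):
    """Inverse of encode."""
    return divmod(code, width)


def is_valid_fold(codes, width):
    """All codes differ, and consecutive amino acids sit on adjacent lattice points."""
    points = [decode(c, width) for c in codes]
    all_different = len(set(codes)) == len(codes)
    connected = all(
        abs(x1 - x2) + abs(y1 - y2) == 1
        for (x1, y1), (x2, y2) in zip(points, points[1:])
    )
    return all_different and connected


codes = [encode(p, 3) for p in [(0, 0), (0, 1), (1, 1)]]
print(codes)
print(is_valid_fold(codes, 3))
```

Adjacency is checked on the decoded points because two codes that differ by 1 can lie on different rows of the grid, for example codes 2 and 3 when the width is 3.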

Page 24: INFORMS 2004

Models      Description

Model CP-1  Similar to the IP models, but uses the max function and the if-then function.

Model CP-2  Similar to CP-1, but simplifies the formulation using Boolean functions and absolute values.

Model CP-3  Uses the alldifferent function.

Page 25: INFORMS 2004

How to solve the problem faster

CP strategies to solve the problem faster

• Use a known solution.

• Fix the direction from the first amino acid to the next.

• Two amino acids whose positions in the sequence are an even distance apart cannot be adjacent on the lattice.

• The lattice distance between two amino acids is bounded above by their distance in the sequence.

• Variable ordering: choose first the variables with the smallest domain.
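The parity rule can be used before any search to list the only pairs of hydrophobic amino acids that could ever form a contact. This is a small sketch assuming a square (or cubic) lattice and 0-based sequence positions; the function name is hypothetical.

```python
def candidate_contacts(sequence):
    """H-H pairs that can touch on the lattice: sequence separation odd and at least 3."""
    hs = [k for k, a in enumerate(sequence) if a == "H"]
    return [
        (a, b)
        for i, a in enumerate(hs)
        for b in hs[i + 1:]
        # Even separation: same lattice parity, never adjacent.
        # Separation 1: a peptide bond, which does not count as a contact.
        if (b - a) % 2 == 1 and b - a >= 3
    ]


pairs = candidate_contacts("HPPHPPH")  # hypothetical sequence
print(pairs)
```

The length of this list is a simple, usually loose, upper bound on the number of H-H contacts, and any variable that models a contact outside the list can be fixed to zero.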

Page 26: INFORMS 2004

Computational results

• Same instance as IP (1PSV): 12, 18, 23, 28 amino acids.

• Use ILOG Solver to run CP.

(Figure panels: N = 23 and N = 28.)

Page 27: INFORMS 2004

Computational result - IP vs CP

• IP vs CP best running times (seconds)

          IP (CPLEX)   CP (Solver)
N = 12    9.32         0.18
N = 18    72.61        18.83
N = 23    30,000+      7347.74 (≈ 2 hrs)
N = 28    30,000+      209,127.89 (≈ 58 hrs)

- Models used: IP → IP-1, CP → CP-1 (with strategies).

- CP is faster than IP with our models.

Page 28: INFORMS 2004

Proposed research

1. Try other CP approaches such as dual modeling and dynamic variable ordering.

2. Consider a unified methodology of IP and CP

- Decompose the problem, and apply IP to one part and CP to the other part.

3. Attempt other approaches such as heuristic algorithms to find better bounds.

Page 29: INFORMS 2004

Contribution

1. Optimization field

• Help to show how CP can be an alternative to or a complement of IP.

2. Biological field

• Success of our research can help in the prediction of 3-D protein structures, which may assist in medical development.

Page 30: INFORMS 2004

Any questions?

Hyun-suk Yoon

Industrial and Systems Engineering, Georgia Tech
