
Determining Protein Backbone from H and H-alpha Short Interatomic Distances

PEDRO NUCCI
Dept. of Information Technology
Navy Arsenal of Rio de Janeiro
Pça. Barão de Ladário, ed.11
Rio de Janeiro, RJ
BRAZIL
[email protected]

LOANA T. NOGUEIRA
Computing Institute
Fluminense Federal University
R. Passo da Pátria, 156
Niterói, RJ
BRAZIL
[email protected]

CARLILE LAVOR
Applied Mathematics Institute
State University of Campinas
R. Sérgio B. de Holanda, 651
Campinas, SP
BRAZIL
[email protected]

Abstract: In this work, we present a method based on the Discretizable Molecular Distance Geometry Problem (DMDGP), which consists of determining the coordinates of the atoms of a molecule so that they satisfy given inter-atomic distances. Aiming at practical applicability in Nuclear Magnetic Resonance (NMR) protein spectroscopy, we present a sequence of atoms, not chemically bonded, that can be solved by using only distances between hydrogen atoms. Once the structures for this chain are determined, one determines the backbone of the protein. Unlike the existing method in the literature, our method makes no use of Hβ atoms (whose distances are measured less precisely), has a smaller cost for exploring the search space, and has an easier process for determining the backbone.

Keywords: Protein Structure Calculation, Nuclear Magnetic Resonance, Molecular Distance Geometry Problem.

1 Introduction

In the Discretizable Molecular Distance Geometry Problem (DMDGP) [1], the main goal is to find feasible three-dimensional structures of a molecule, given some of its inter-atomic distances, when the set of pairs of atoms with known distance satisfies some specific assumptions. In practice, the DMDGP is well suited to protein NMR spectroscopy, since this technique can provide conformational distance restraints compatible with the problem [5]. The DMDGP is classified as NP-complete [1], and a Branch-and-Prune (BP) algorithm is proposed in [4] for solving it.

Many works in the literature have tried to use the DMDGP to solve protein backbones directly, but the problem's restrictions required the knowledge of inter-atomic distances that, in practice, would be difficult to obtain. Recently, however, a specific sequence of atoms was identified in the literature [5], composing an "artificial chain" that occurs along the residues of any usual polypeptide, with three important properties: (1) it consists of hydrogen atoms only, (2) the set of inter-atomic distances inside a cut-off radius compatible with NMR experiments satisfies all DMDGP assumptions, yielding problem instances, and (3) from the hydrogen atom coordinates in each solution for the artificial chain, it is possible to obtain the coordinates of the protein backbone atoms.

In this paper, we identify a new artificial chain, whose inter-atomic distances are detectable in the most basic homonuclear ¹H NMR experiments, as well as an algorithm that solves it. We compare it with the artificial chain from the literature on both theoretical and practical grounds, with computational experiments on real proteins.

2 An extension of the DMDGP

The artificial chain we present in this work has particular features that the DMDGP cannot handle, which motivates us to extend the problem as follows.

Given a sequence of atoms $1, 2, \ldots, n$, a set $S_1$ of atom pairs $(i,j)$ whose Euclidean distance $d_{i,j}$ has a known value $\delta_{i,j}$, and a set $S_2$ of atom pairs $(i,j)$ whose Euclidean distance $d_{i,j}$ is one of the two values $\delta'_{i,j}$ and $\delta''_{i,j}$, the Extended DMDGP (DMDGPe) consists of finding $\{x_1, \ldots, x_n\} \subset \mathbb{R}^3$, the Cartesian coordinates of each atom, such that

$$\|x_i - x_j\| = \delta_{i,j}, \ \forall (i,j) \in S_1 \quad \text{and} \quad \|x_i - x_j\| \in \{\delta'_{i,j}, \delta''_{i,j}\}, \ \forall (i,j) \in S_2,$$


Advances in Biology, Bioengineering and Environment

ISBN: 978-960-474-261-5 43


where the following two assumptions are satisfied:

1. all pairs of atoms $(i,j)$ with $1 \le j - i \le 3$ belong to $S = S_1 \cup S_2$ (in other words, their distance $d_{i,j}$ has either one or two possible values),

2. the angles between the vectors $(x_{i+2} - x_{i+1})$ and $(x_{i+1} - x_i)$, where $1 \le i \le n-3$, are not a multiple of $\pi$.
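As an illustration of the definition above, the following minimal Python sketch checks whether a candidate set of coordinates satisfies the DMDGPe distance constraints and whether assumption 1 holds. The function names, the dictionary layout for $S_1$ and $S_2$, the 0-based atom indices and the tolerance are all assumptions made for this example; they are not part of the paper.

```python
import math

def satisfies_dmdgp_e(x, s1, s2, tol=1e-6):
    """Check the DMDGPe distance constraints for candidate coordinates x.

    x  : list of 3D points (0-based atom indices, an assumption of this sketch)
    s1 : {(i, j): delta}            pairs with one known distance
    s2 : {(i, j): (delta1, delta2)} pairs with two admissible distances
    """
    for (i, j), delta in s1.items():
        if abs(math.dist(x[i], x[j]) - delta) > tol:
            return False
    for (i, j), (delta1, delta2) in s2.items():
        d = math.dist(x[i], x[j])
        if min(abs(d - delta1), abs(d - delta2)) > tol:
            return False
    return True

def covers_e_pairs(n, s1, s2):
    """Assumption 1: every pair (i, j) with 1 <= j - i <= 3 is in S1 or S2."""
    known = set(s1) | set(s2)
    return all((i, j) in known
               for i in range(n)
               for j in range(i + 1, min(i + 4, n)))
```

Assumption 2 (non-collinear consecutive atoms) is not checked here; it could be tested on the bond angles once they are derived from the distances.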

The set $S$ may be partitioned according to the index difference of its pairs: the set $E$, corresponding to all pairs of atoms $(i,j)$ where $1 \le j - i \le 3$, and the set $F$ of pairs $(i,j) \in S$ where $j - i > 3$ (Figure 1-a illustrates the analogous situation in NMR experiments, where nearby atoms have known distances). We call E-type distances all distances $d_{i,j}$ related to pairs $(i,j) \in E$, and F-type distances all $d_{i,j}$ related to pairs $(i,j) \in F$.

[Figure 1: (a) Sets of atom pairs whose distance has a known value. (b) Bond and torsion angles.]

For each choice of values considered for the E-type distances, we can obtain the cosine of the bond angles ($\theta_i$), by the cosine law, and the cosine of the torsion angles ($\omega_i$), by the cosine law for torsion angles [4] (see Figure 1-b):

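$$\cos\omega_i = \frac{\cos\gamma - \cos\alpha\cos\beta}{\sqrt{1-\cos^2\alpha}\,\sqrt{1-\cos^2\beta}}, \quad \text{where} \tag{1}$$

$$\cos\alpha = \frac{d_{i-2,i}^2 + d_{i-2,i-1}^2 - d_{i-1,i}^2}{2\,d_{i-2,i}\,d_{i-2,i-1}}, \quad \cos\beta = \frac{d_{i-3,i-2}^2 + d_{i-2,i-1}^2 - d_{i-3,i-1}^2}{2\,d_{i-3,i-2}\,d_{i-2,i-1}} \quad \text{and} \quad \cos\gamma = \frac{d_{i-3,i-2}^2 + d_{i-2,i}^2 - d_{i-3,i}^2}{2\,d_{i-3,i-2}\,d_{i-2,i}}.$$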
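Equation (1) can be evaluated directly from the six distances among four consecutive atoms. The sketch below is a minimal Python transcription; the function name and the argument naming (atoms $i-3, i-2, i-1, i$ relabelled 0, 1, 2, 3) are assumptions made for this example.

```python
import math

def cos_torsion(d01, d12, d23, d02, d13, d03):
    """Cosine of the torsion angle of atoms 0, 1, 2, 3 (= i-3, i-2, i-1, i).

    dXY is the Euclidean distance between atoms X and Y. The atoms 0, 1, 2
    and 1, 2, 3 are assumed not to be collinear (assumption 2 of the problem),
    otherwise the denominator vanishes.
    """
    cos_alpha = (d13**2 + d12**2 - d23**2) / (2 * d13 * d12)
    cos_beta = (d01**2 + d12**2 - d02**2) / (2 * d01 * d12)
    cos_gamma = (d01**2 + d13**2 - d03**2) / (2 * d01 * d13)
    sin_alpha = math.sqrt(1 - cos_alpha**2)
    sin_beta = math.sqrt(1 - cos_beta**2)
    return (cos_gamma - cos_alpha * cos_beta) / (sin_alpha * sin_beta)
```

For instance, four atoms lying in one plane on the same side of the central bond give a cosine of 1, and rotating the last atom by a quarter turn around that bond gives 0.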

This information is enough for obtaining the torsion matrices $B_i$ for each atom $i$ [3]. While the torsion matrices for the first three atoms can be determined uniquely, for each atom $i \ge 4$ we obtain two torsion matrices, $B_i^1$ and $B_i^2$, one for each possible value of $\sin\omega_i = \pm\sqrt{1-\cos^2\omega_i}$ in

$$B_i = \begin{bmatrix} -\cos\theta_i & -\sin\theta_i & 0 & -d_{i-1,i}\cos\theta_i \\ \sin\theta_i\cos\omega_i & -\cos\theta_i\cos\omega_i & -\sin\omega_i & d_{i-1,i}\sin\theta_i\cos\omega_i \\ \sin\theta_i\sin\omega_i & -\cos\theta_i\sin\omega_i & \cos\omega_i & d_{i-1,i}\sin\theta_i\sin\omega_i \\ 0 & 0 & 0 & 1 \end{bmatrix}, \quad 4 \le i \le n. \tag{2}$$

By the product $B_1 B_2 \cdots B_i$, we easily obtain a position $x_i$ consistent with the E-type distance values being considered. This will be useful for solving the artificial chain that we are going to present.
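The following Python sketch shows how the matrix of equation (2) and the cumulative product give atom positions. The paper does not list the matrices of the first three atoms; the forms used in `initial_matrices` (identity for $B_1$, a translation along the negative x axis for $B_2$, and equation (2) with $\omega = 0$ for $B_3$) follow a convention common in the DMDGP literature and are an assumption here, as are all function names.

```python
import math

def matmul(a, b):
    """Product of two 4x4 matrices given as nested lists."""
    return [[sum(a[r][k] * b[k][c] for k in range(4)) for c in range(4)]
            for r in range(4)]

def torsion_matrix(d, cos_theta, cos_omega, sin_sign=+1):
    """Matrix B_i of equation (2); sin_sign picks sin(omega) = +/- sqrt(1 - cos^2)."""
    ct, co = cos_theta, cos_omega
    st = math.sqrt(max(0.0, 1 - ct * ct))
    so = sin_sign * math.sqrt(max(0.0, 1 - co * co))
    return [[-ct, -st, 0.0, -d * ct],
            [st * co, -ct * co, -so, d * st * co],
            [st * so, -ct * so, co, d * st * so],
            [0.0, 0.0, 0.0, 1.0]]

def initial_matrices(d12, d23, cos_theta3):
    """B_1, B_2, B_3 (assumed convention, not given in the paper)."""
    b1 = [[float(r == c) for c in range(4)] for r in range(4)]
    b2 = [[-1.0, 0.0, 0.0, -d12],
          [0.0, 1.0, 0.0, 0.0],
          [0.0, 0.0, -1.0, 0.0],
          [0.0, 0.0, 0.0, 1.0]]
    b3 = torsion_matrix(d23, cos_theta3, 1.0)
    return [b1, b2, b3]

def positions(matrices):
    """x_i is the last column of the product B_1 B_2 ... B_i."""
    prod = [[float(r == c) for c in range(4)] for r in range(4)]
    out = []
    for b in matrices:
        prod = matmul(prod, b)
        out.append((prod[0][3], prod[1][3], prod[2][3]))
    return out
```

With this convention the first three atoms lie in the plane z = 0, so the two sign choices for the fourth atom give positions that are mirror images across that plane.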

3 Solving Protein Backbones with only H and H-alpha interatomic distances

3.1 The artificial chain induced by $H_N^r - H_\alpha^r - C^r$

Consider the sequence of atoms $H_N^r - H_\alpha^r - C^r$, repeated for every residue $r$ of a monomeric protein with $m$ residues (see Figure 2). Although it neither consists of hydrogen atoms only nor is chemically bonded, we will show that the chain induced by this sequence, $(H_N^1, H_\alpha^1, C^1, H_N^2, H_\alpha^2, C^2, \ldots, H_N^m, H_\alpha^m, C^m)$, can be solved by using only distances between hydrogen atoms and distances known a priori.

[Figure 2: Artificial chain induced by $H_N^r - H_\alpha^r - C^r$.]

In order to calculate the torsion matrices, we need to know the distances $d_{i,j}$ for pairs $(i,j) \in E$, along the residues $r$, of the following types: $(H_N^r, H_\alpha^r)$, $(H_N^r, C^r)$, $(H_N^{r-1}, H_N^r)$, $(H_\alpha^r, C^r)$, $(H_\alpha^{r-1}, H_N^r)$, $(H_\alpha^{r-1}, H_\alpha^r)$, $(C^{r-1}, H_N^r)$, $(C^{r-1}, H_\alpha^r)$, $(C^{r-1}, C^r)$. Almost all of them represent pairs of nearby hydrogen atoms (whose distances are detectable by NMR experiments) or pairs of atoms with distances known a priori (due to the very low fluctuation of covalent bond lengths and angles), so we consider them as $(i,j) \in (E \cap S_1)$ (i.e. their distances are known uniquely). For $(H_N^r, C^r)$, $(C^{r-1}, H_\alpha^r)$ and $(C^{r-1}, C^r)$, in contrast, we will make use of the result presented below, which lets us consider these as $(i,j) \in (E \cap S_2)$.

Let $q = (i-3, i-2, i-1, i)$ be a quadruple of atoms with

• (a) known Euclidean distances $d_{j,k}$, where $j \ge i-3$, $k \le i$ and $k - j \le 2$,

• (b) a known cosine of their torsion angle, $\cos\omega$,

whose distance $d_{i-3,i}$ we want to find. We can achieve this by rearranging the cosine law for torsion angles:

$$d_{i-3,i} = \sqrt{d_{i-3,i-2}^2 + d_{i-2,i}^2 - 2\,d_{i-3,i-2}\,d_{i-2,i}\left[(\cos\omega)\sqrt{1-\cos^2\alpha}\,\sqrt{1-\cos^2\beta} + \cos\alpha\cos\beta\right]}. \tag{3}$$
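Equation (3) can be sketched in Python as follows, using the same definitions of $\cos\alpha$ and $\cos\beta$ as in equation (1). The function name and the relabelling of atoms $i-3, i-2, i-1, i$ as 0, 1, 2, 3 are assumptions made for this example.

```python
import math

def distance_from_torsion(d01, d12, d23, d02, d13, cos_omega):
    """Distance d03 between the end atoms of a quadruple 0, 1, 2, 3.

    Uses the five distances with index difference <= 2 (condition (a))
    and the cosine of the torsion angle (condition (b)).
    """
    cos_alpha = (d13**2 + d12**2 - d23**2) / (2 * d13 * d12)
    cos_beta = (d01**2 + d12**2 - d02**2) / (2 * d01 * d12)
    sin_alpha = math.sqrt(1 - cos_alpha**2)
    sin_beta = math.sqrt(1 - cos_beta**2)
    # Bracketed term of equation (3); it equals cos(gamma) of equation (1).
    cos_gamma = cos_omega * sin_alpha * sin_beta + cos_alpha * cos_beta
    return math.sqrt(d01**2 + d13**2 - 2 * d01 * d13 * cos_gamma)
```

Because only $\cos\omega$ enters the formula, the two mirror-image torsion angles $\pm\omega$ give the same distance; distinct distances arise only when the cosine itself has two candidate values.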

Pairs of types $(H_N^r, C^r)$, $(C^{r-1}, H_\alpha^r)$ and $(C^{r-1}, C^r)$ satisfy condition (a) when combined with atoms from the main chain, forming different quadruples $q$ of atoms whose Euclidean distances $d_{j,k}$ are known a priori, where $j \ge i-3$, $k \le i$ and $k - j \le 2$, as can be seen in Figure 3.

[Figure 3: Quadruples of atoms used to infer distances related to $(H_N^r, C^r)$, $(C^{r-1}, H_\alpha^r)$ and $(C^{r-1}, C^r)$. Panels: $q = (H_N^r, N^r, C_\alpha^r, C^r)$; $q = (C^{r-1}, N^r, C_\alpha^r, H_\alpha^r)$; $q = (C^{r-1}, N^r, C_\alpha^r, C^r)$.]

Moreover, for all three cases, consider the same auxiliary quadruple of atoms $q_{aux}^r = (H_N^r, N^r, C_\alpha^r, H_\alpha^r)$, whose torsion angle is $\omega_{aux}^r$. Since all inter-atomic distances of $q_{aux}^r$ are known, equation (1) gives the cosine of this angle, which leaves two possible values for $\omega_{aux}^r$. Since bond angles and bond lengths are fixed for carbon and nitrogen atoms in the main chain, for each quadruple $q$ in question the difference $\Delta\omega$ between the torsion angles $\omega_{aux}^r$ and $\omega$ is the same for any residue or protein, implying $\omega = \omega_{aux}^r + \Delta\omega$. For each of the two possible values of $\omega_{aux}^r$, we obtain a different value for $\omega$, and so for $\cos\omega$. Thus, condition (b) is satisfied, but not uniquely, yielding two possible values for $d_{i-3,i}$.
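As a worked step, and assuming that the two candidate angles are $\pm\omega_{aux}^r$ (the two signs of the sine compatible with the known cosine, as for the torsion matrices of Section 2), the two cosines that enter equation (3) can be written with the angle-addition formula:

$$\cos\omega^{\pm} = \cos\left(\pm\omega_{aux}^r + \Delta\omega\right) = \cos\omega_{aux}^r\cos\Delta\omega \mp \sin\omega_{aux}^r\sin\Delta\omega, \qquad \sin\omega_{aux}^r = \sqrt{1-\cos^2\omega_{aux}^r}.$$

The two values coincide only when $\sin\omega_{aux}^r\sin\Delta\omega = 0$, so in general each pair in $E \cap S_2$ receives two distinct candidate distances.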


3.2 A new BP algorithm for solving the artificial chain

Since the three types of atom pairs $(i,j) \in (E \cap S_2)$ depend on the same auxiliary torsion angle $\omega_{aux}^r$, we only need to form torsion matrices with distance values derived from the same $\omega_{aux}^r$ value. Hence, we can explore the different combinations of torsion matrices respecting the F-type distances as a tree search, as in the original BP in [4], but branching on $\omega_{aux}^r$ before the matrices that derive from it. Algorithm 1 summarizes this method.

Algorithm 1 Extended Branch-And-Prune

Initialization:
    create one node H_N^1 with the torsion matrix corresponding to the origin
    ExtendedBP( H_N^1 )

ExtendedBP( parent )
    if (parent ≠ C^m) then
        if parent is like H_N^r then
            create one node, n1 and n2, for each possible ω_aux^r value
            branch ← {n1, n2}
        else
            obtain the product P of torsion matrices from the root until parent
            compute torsion matrices B^1 and B^2 for the next atom using distances derived from the ancestral ω_aux^r
            create one node, n1 and n2, for each product PB^1 and PB^2
            branch ← {n1, n2}
            remove (prune) any node in branch if its position does not satisfy some F-type distance
        end if
        for each node b in branch do
            connect b with parent
            ExtendedBP( b )
        end for
    else
        print the solution formed by the positions of each node in the path to the root
    end if
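The branching and pruning pattern of Algorithm 1 can be illustrated with a simplified Python sketch. It is not the ExtendedBP itself: it assumes that every E-type distance is known uniquely ($S_2$ empty), so the extra branching level on $\omega_{aux}^r$ is omitted and only the two signs of $\sin\omega_i$ are branched, as in the original BP of [4]. Atom indices are 0-based, `dist` is a hypothetical dictionary `{(i, j): distance}` with `i < j`, and the first three matrices follow a convention assumed from the DMDGP literature.

```python
import math

def matmul(a, b):
    return [[sum(a[r][k] * b[k][c] for k in range(4)) for c in range(4)]
            for r in range(4)]

def torsion_matrix(d, ct, co, sign):
    """Matrix of equation (2) for bond length d, cos(theta)=ct, cos(omega)=co."""
    st = math.sqrt(max(0.0, 1 - ct * ct))
    so = sign * math.sqrt(max(0.0, 1 - co * co))
    return [[-ct, -st, 0.0, -d * ct],
            [st * co, -ct * co, -so, d * st * co],
            [st * so, -ct * so, co, d * st * so],
            [0.0, 0.0, 0.0, 1.0]]

def cos_bond(dist, i):
    """Cosine law at atom i-1 for the bond angle theta_i."""
    a, b, c = dist[i - 2, i - 1], dist[i - 1, i], dist[i - 2, i]
    return (a * a + b * b - c * c) / (2 * a * b)

def cos_torsion(dist, i):
    """Equation (1) for atoms i-3, i-2, i-1, i."""
    d01, d12, d23 = dist[i - 3, i - 2], dist[i - 2, i - 1], dist[i - 1, i]
    d02, d13, d03 = dist[i - 3, i - 1], dist[i - 2, i], dist[i - 3, i]
    ca = (d13**2 + d12**2 - d23**2) / (2 * d13 * d12)
    cb = (d01**2 + d12**2 - d02**2) / (2 * d01 * d12)
    cg = (d01**2 + d13**2 - d03**2) / (2 * d01 * d13)
    value = (cg - ca * cb) / (math.sqrt(1 - ca * ca) * math.sqrt(1 - cb * cb))
    return max(-1.0, min(1.0, value))  # guard against rounding

def branch_and_prune(n, dist, tol=1e-5):
    """Return all position lists consistent with the distances in dist."""
    solutions = []
    b2 = [[-1.0, 0.0, 0.0, -dist[0, 1]], [0.0, 1.0, 0.0, 0.0],
          [0.0, 0.0, -1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
    p3 = matmul(b2, torsion_matrix(dist[1, 2], cos_bond(dist, 2), 1.0, +1))
    start = [(0.0, 0.0, 0.0), (b2[0][3], b2[1][3], b2[2][3]),
             (p3[0][3], p3[1][3], p3[2][3])]

    def feasible(pos, i, xs):
        # F-type pairs (j, i) have i - j > 3, i.e. j <= i - 4.
        return all(abs(math.dist(pos, xs[j]) - dist[j, i]) <= tol
                   for j in range(i - 3) if (j, i) in dist)

    def recurse(i, prod, xs):
        if i == n:
            solutions.append(xs)
            return
        ct, co = cos_bond(dist, i), cos_torsion(dist, i)
        for sign in (+1, -1):  # branch on the sign of sin(omega_i)
            p = matmul(prod, torsion_matrix(dist[i - 1, i], ct, co, sign))
            pos = (p[0][3], p[1][3], p[2][3])
            if feasible(pos, i, xs):  # prune on F-type distances
                recurse(i + 1, p, xs + [pos])

    recurse(3, p3, start)
    return solutions
```

Adapting the sketch to Algorithm 1 would mean inserting, before each $H_N^r$ atom's successors, a branching node that fixes one of the two $\omega_{aux}^r$ values and derives the $E \cap S_2$ distances from it.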

3.3 From the artificial chain to the protein backbone

With artificial chain solutions at hand, we can obtain the corresponding protein backbone by solving small subproblems of finding the coordinates of a point $x$ in $\mathbb{R}^3$ when we know the positions of, and the distances to, four fixed points $b_1$, $b_2$, $b_3$ and $b_4$, as in [5]. However, we propose an alternative way to accomplish this: instead of solving linear systems as in [2], a small DMDGP instance is created, starting with three of the fixed points (leaving one point $b'$ out of the instance), while the fourth atom is the unknown point. The two torsion matrices for $x$ yield one position each, which we can test against the known distance $d_{x,b'}$. Figure 4 illustrates the described situation.
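The idea of producing two mirror candidates from three fixed points and selecting one with the fourth can be sketched in Python. Note the swapped-in technique: instead of building torsion matrices as the text describes, this sketch computes the two candidates by direct trilateration (intersection of three spheres), which yields the same pair of mirror-image positions. All function names are hypothetical.

```python
import math

def sub(a, b): return tuple(p - q for p, q in zip(a, b))
def add(a, b): return tuple(p + q for p, q in zip(a, b))
def scale(a, s): return tuple(p * s for p in a)
def dot(a, b): return sum(p * q for p, q in zip(a, b))
def norm(a): return math.sqrt(dot(a, a))
def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def two_candidates(p1, p2, p3, r1, r2, r3):
    """Both points at distances r1, r2, r3 from the non-collinear p1, p2, p3."""
    d = norm(sub(p2, p1))
    ex = scale(sub(p2, p1), 1 / d)
    i = dot(ex, sub(p3, p1))
    ey_raw = sub(sub(p3, p1), scale(ex, i))
    ey = scale(ey_raw, 1 / norm(ey_raw))
    ez = cross(ex, ey)
    j = dot(ey, sub(p3, p1))
    x = (r1**2 - r2**2 + d**2) / (2 * d)
    y = (r1**2 - r3**2 + i**2 + j**2) / (2 * j) - (i / j) * x
    z = math.sqrt(max(0.0, r1**2 - x**2 - y**2))
    base = add(p1, add(scale(ex, x), scale(ey, y)))
    return add(base, scale(ez, z)), add(base, scale(ez, -z))

def locate(b1, b3, b4, b_out, d1, d3, d4, d_out):
    """Pick the candidate whose distance to the left-out point b' fits best."""
    candidates = two_candidates(b1, b3, b4, d1, d3, d4)
    return min(candidates, key=lambda c: abs(math.dist(c, b_out) - d_out))
```

The left-out point must not lie in the plane of the other three, otherwise both candidates are equally distant from it and the test cannot discriminate.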

[Figure 4: Determining a position from four fixed others.]

In the artificial chain, for each residue $r$, the atoms $H_N^r$, $H_\alpha^r$ and $C^r$ have distances to $C_\alpha^r$ known a priori, determining two symmetric positions for $C_\alpha^r$. For each one, we reach the situation described above, therefore determining one position for $N^r$. In this way, by using information from the artificial chain for each residue $r$ independently, we obtain two enantiomeric structures (mirrored over the plane defined by $H_N^r$, $H_\alpha^r$ and $C^r$) for the backbone of the same residue. We can choose the correct one by testing their chiralities.

4 Computational Experiments and Results

We created a test set consisting of 30 randomly selected monomeric protein structures from the Protein Data Bank (PDB), with different sizes and similarity under 30%, solved by NMR spectroscopy and added to the database between 2008 and 2010. All distances known a priori were obtained as mean values measured over the entire test set. Specifically for proline residues, which do not have $H_N$ atoms, we considered that the $H_N^r$ sequence element represents the $C_\delta$ atom of the proline ring.

We implemented the proposed artificial chain (referred to in the results as Artificial Chain A). Instances of the DMDGPe were created from known protein structures and then processed by the ExtendedBP algorithm. For each solution found, we determined the corresponding protein backbone and measured its least root mean square deviation (lRMSD) with respect to the backbone of the PDB structure. We also implemented the original BP algorithm for the artificial chain proposed in [5] (referred to in the results as Artificial Chain B). Both implementations were written in Java. The experiments were run on a workstation with the Linux operating system (kernel 2.6) and JRE 1.6, on an Intel Core2Duo 2.2 GHz processor with 2 GB of RAM. We let both algorithms run for a maximum of 120 minutes, with a pruning tolerance of $10^{-5}$ Å. The computational results are summarized in Figure 5, showing the execution time and the lRMSD of the best solution found.

[Figure 5: Computational results for the proposed artificial chain and the chain from [5]. Series: Artificial Chain A and Artificial Chain B; axes: best least RMSD (Å), from 0 to 1, and time (min), from 0 to 120.]


5 Conclusion

In this paper, we have brought NMR data and the DMDGP one step closer together by proposing a new artificial chain, an extension of the DMDGP (DMDGPe) that handles it, a new Branch-and-Prune algorithm for the problem, and an alternative method to infer the protein backbone from an obtained artificial chain structure. We also tested the previous artificial chain from the literature against real proteins, which was not done in [5].

As major theoretical advantages over the artificial chain proposed in [5], the presented method does not use Hβ-related distances (which are less precise in practical NMR experiments, due to the high degree of freedom of the side chain in many residues), it has an easier process for obtaining the backbone structure (with no error propagation among residues, no solution of linear systems and an easier implementation), and it adds 4 levels per residue (one of them for torsion nodes) to the ExtendedBP tree, while the other adds 5 levels per residue to the BP tree.

Quantitatively, the chain of [5] produced a large number of nodes and identical solutions, which prevented an efficient exploration of the search space and resulted in worse execution times and worse quality of the best solutions for the vast majority of instances (it performed better on 2K4T, where it did find solutions and the new chain did not). These good results were obtained for the new chain using less information than the chain of [5], since Hβ-related distances could also have been used as pruning criteria.

Acknowledgements: This work was partially supported by CNPq, FAPERJ, and FAPESP.

References:

[1] Lavor, C., Liberti, L., Maculan, N. The discretizable molecular distance geometry problem, arXiv:q-bio/0608012v1, 2006.

[2] Wu, D. and Wu, Z. An Updated Geometric Build-Up Algorithm for Solving the Molecular Distance Geometry Problem with Sparse Distance Data, Journal of Global Optimization 37, 661–673, 2007.

[3] Phillips, A.T., Rosen, J.B., Walke, V.H. Molecular structure determination by convex underestimation of local energy minima. DIMACS Series in Discrete Mathematics and Theoretical Computer Science 23, 181–198, American Mathematical Society, 1996.

[4] Liberti, L., Lavor, C., Maculan, N. A branch-and-prune algorithm for the molecular distance geometry problem. International Transactions in Operational Research 15, 1–17, 2008.

[5] Lavor, C., Mucherino, A., Liberti, L., Maculan, N. On the Computation of Protein Backbones by using Artificial Backbones of Hydrogens. Journal of Global Optimization, 2010.
